
In computer vision, object detection (locating and classifying objects in an image) and image segmentation (delineating object boundaries at the pixel level) are two core techniques. This tutorial, built on the Python ecosystem, walks you through implementing YOLO (real-time detection), Faster R-CNN (high-accuracy detection), and U-Net (semantic segmentation), covering environment setup, model usage, and hands-on training. It is aimed at beginning developers and CV enthusiasts.

I. Prerequisites: Environment Setup and Dataset Preparation

1. Installing the Core Dependencies

Object detection and segmentation rely on a deep learning framework (PyTorch or TensorFlow), plus tools such as OpenCV (image processing) and ultralytics (the official YOLO library). Creating a conda virtual environment is recommended to avoid version conflicts.

# 1. Create a virtual environment (Python 3.9-3.10 recommended)
conda create -n cv_env python=3.9
conda activate cv_env

# 2. Install PyTorch (prefer the GPU build, matched to your local CUDA; use the CPU build if you have no GPU)
# GPU build (CUDA 11.8 as an example; check the PyTorch website for the command matching your setup)
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# CPU build
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

# 3. Install the core detection/segmentation libraries
pip install ultralytics opencv-python pillow matplotlib tqdm
# Faster R-CNN dependency (provided by torchvision)
pip install torchvision --upgrade
# U-Net training helper
pip install albumentations  # data augmentation

2. Verifying the Environment

Run the following code to verify that PyTorch works and whether a GPU is available:

import torch
import cv2
from ultralytics import YOLO  # verifies that ultralytics installed correctly

print(f"PyTorch version: {torch.__version__}")
print(f"GPU available: {torch.cuda.is_available()}")
print(f"OpenCV version: {cv2.__version__}")
print("Environment configured successfully!")

3. Dataset Preparation (Key Step)

  • Object detection: use a public dataset such as VOC or COCO, or build your own annotated dataset (YOLO txt or Pascal VOC XML format; an example label file is shown after this list).
    Custom dataset layout (YOLO format):
    dataset/
    ├── images/  # all images (jpg/png)
    │   ├── train/
    │   └── val/
    └── labels/  # annotation files (txt, same base name as the image)
        ├── train/
        └── val/

  • Image segmentation: Cityscapes (urban scenes) and Carvana (car segmentation) are recommended, or build your own dataset (masks annotated as grayscale images, where each pixel value corresponds to a class).
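For reference, each line of a YOLO txt label file describes one object as class_id x_center y_center width height, with all coordinates normalized to [0, 1] by the image dimensions. A hypothetical labels/train/000001.txt (using the person/helmet class ids from the dataset.yaml shown later; the numbers are made up for illustration) would look like:

0 0.512 0.430 0.120 0.250
1 0.515 0.310 0.060 0.080

The first line is a person box centered slightly left of the image center; the second is a smaller helmet box above it.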

II. Object Detection in Practice, Part 1: YOLOv8 (First Choice for Real-Time Detection)

YOLO (You Only Look Once) is a single-stage detector: fast, reasonably accurate, and well suited to real-time scenarios such as video surveillance and autonomous driving. The ultralytics library wraps YOLOv8 so that a pretrained model can be loaded in a single line of code.

1. Inference with a Pretrained Model (Quick Start)

No training required; detect objects in an image or video directly with the official pretrained weights:

# File: yolov8_detect.py
from ultralytics import YOLO
import cv2

# Load a YOLOv8 pretrained model (n/s/m/l/x: larger models are more accurate but slower)
model = YOLO("yolov8n.pt")  # yolov8n is the lightweight variant, suitable for CPU

# 1. Detect objects in a single image
img_path = "test.jpg"
results = model(img_path, save=True, conf=0.5)  # conf is the confidence threshold
# save=True: results are saved automatically under runs/detect

# 2. Detect in a video (0 for the live webcam, or a file path for a video)
cap = cv2.VideoCapture(0)  # 0 = built-in camera
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    # Run inference on the frame
    results = model(frame, conf=0.5)
    # Visualize the detections
    annotated_frame = results[0].plot()
    cv2.imshow("YOLOv8 Detection", annotated_frame)
    # Press q to quit
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
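If you need the raw detections rather than a rendered image, the Results object exposes them as tensors. A minimal sketch (attribute names follow the current ultralytics Results API: boxes.xyxy, boxes.conf, boxes.cls; verify against your installed version):

from ultralytics import YOLO

model = YOLO("yolov8n.pt")
results = model("test.jpg", conf=0.5)

for r in results:
    for box in r.boxes:
        x1, y1, x2, y2 = box.xyxy[0].tolist()  # corner coordinates in pixels
        conf = float(box.conf[0])               # confidence score
        cls_id = int(box.cls[0])                # class index
        print(f"{model.names[cls_id]}: {conf:.2f} at ({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f})")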

2. Training on a Custom Dataset (Adapting to Your Own Scenario)

To detect domain-specific objects (e.g., safety helmets or face masks), train on your own dataset:

# File: yolov8_train.py
from ultralytics import YOLO

# Load a pretrained model (transfer learning speeds up training)
model = YOLO("yolov8n.pt")

# Train the model (key parameters explained)
results = model.train(
    data="dataset.yaml",    # path to the dataset config file
    epochs=100,             # number of training epochs
    batch=16,               # batch size (reduce if GPU memory is tight)
    imgsz=640,              # input image size
    lr0=0.01,               # initial learning rate
    device=0,               # GPU index (use "cpu" if no GPU)
    save=True,              # save the best checkpoint
    project="runs/train",   # where training results are stored
)

# Validate the model
model.val()  # evaluate accuracy on the validation set

Example of the dataset configuration file dataset.yaml:

# Class names
names:
  0: person
  1: helmet
  2: mask
# Number of classes
nc: 3
# Training/validation image paths (relative or absolute)
train: dataset/images/train
val: dataset/images/val

3. Key Parameters and Tuning Tips

| Parameter | Purpose | Recommended value for beginners |
|---|---|---|
| epochs | number of training epochs | 50-200 (avoid overfitting) |
| batch | batch size | 8-32 (limited by GPU memory) |
| imgsz | input image size | 640 (common choice, balances speed and accuracy) |
| conf | confidence threshold | 0.5 (filters low-confidence boxes) |
| iou | NMS threshold | 0.45 (removes duplicate boxes) |
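After training, the best checkpoint can be loaded back for inference, applying the conf and iou thresholds from the table above at predict time. A minimal sketch; the weights path below is an example and depends on the run directory ultralytics creates under project:

from ultralytics import YOLO

# Example path; check runs/train/<run_name>/weights/ from your own training run
model = YOLO("runs/train/exp/weights/best.pt")
results = model.predict("test.jpg", conf=0.5, iou=0.45, save=True)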

III. Object Detection in Practice, Part 2: Faster R-CNN (First Choice for High-Accuracy Detection)

Faster R-CNN is a two-stage detector. It is more accurate than YOLO and suits accuracy-critical scenarios such as medical imaging and industrial quality inspection. With torchvision you can build a Faster R-CNN model quickly.

1. Inference with a Pretrained Model

# File: faster_rcnn_detect.py
import torch
import torchvision
from torchvision.models.detection import fasterrcnn_resnet50_fpn
import cv2
import numpy as np

# Load the pretrained Faster R-CNN model
model = fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()  # switch to inference mode

# COCO class names (91 entries in total; truncated here, see the torchvision docs
# for the full list, which is required for correct label lookup beyond index 5)
COCO_INSTANCE_CATEGORY_NAMES = [
    '__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane'
]

# Preprocess the image (convert to a Tensor)
def preprocess_img(img_path):
    img = cv2.imread(img_path)
    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    transform = torchvision.transforms.ToTensor()
    img_tensor = transform(img_rgb)
    return img, img_tensor.unsqueeze(0)  # add a batch dimension

# Detect objects in a single image
img_path = "test.jpg"
img, img_tensor = preprocess_img(img_path)

# Inference (disable gradient computation for speed)
with torch.no_grad():
    predictions = model(img_tensor)

# Visualize the detections
boxes = predictions[0]["boxes"].numpy()    # box coordinates
labels = predictions[0]["labels"].numpy()  # class labels
scores = predictions[0]["scores"].numpy()  # confidence scores

# Filter out low-confidence results
threshold = 0.5
for box, label, score in zip(boxes, labels, scores):
    if score < threshold:
        continue
    # Draw the bounding box
    x1, y1, x2, y2 = map(int, box)
    cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
    # Add the class name and confidence
    class_name = COCO_INSTANCE_CATEGORY_NAMES[label]
    cv2.putText(img, f"{class_name} {score:.2f}", (x1, y1 - 10),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

# Save and display the result
cv2.imwrite("faster_rcnn_result.jpg", img)
cv2.imshow("Faster R-CNN Detection", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
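The script above runs on the CPU. To run on a GPU, move the model and the input tensor to the device and copy predictions back before calling .numpy(); a minimal adaptation of the snippet above:

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

with torch.no_grad():
    predictions = model(img_tensor.to(device))

# Bring results back to the CPU before converting to NumPy
boxes = predictions[0]["boxes"].cpu().numpy()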

2. Training on a Custom Dataset

Training Faster R-CNN requires a custom dataset loader. The core is to subclass torch.utils.data.Dataset and return the image tensor together with boxes and labels. A training-loop sketch follows the dataset class below.

# Core idea (adapt the full code to your dataset format)
import os
import cv2
import torch

class CustomDataset(torch.utils.data.Dataset):
    def __init__(self, img_dir, label_dir, transforms=None):
        self.img_dir = img_dir
        self.label_dir = label_dir
        self.transforms = transforms
        self.imgs = sorted(os.listdir(img_dir))

    def __getitem__(self, idx):
        # 1. Load the image and its annotation file
        img_path = os.path.join(self.img_dir, self.imgs[idx])
        label_path = os.path.join(self.label_dir, self.imgs[idx].replace(".jpg", ".txt"))
        img = cv2.cvtColor(cv2.imread(img_path), cv2.COLOR_BGR2RGB)
        img_height, img_width = img.shape[:2]
        # 2. Parse annotations into boxes and labels (YOLO format -> xyxy)
        boxes = []
        labels = []
        with open(label_path, "r") as f:
            for line in f.readlines():
                cls, x, y, w, h = map(float, line.strip().split())
                # YOLO format (normalized xywh) -> xyxy (pixel coordinates)
                x1 = (x - w / 2) * img_width
                y1 = (y - h / 2) * img_height
                x2 = (x + w / 2) * img_width
                y2 = (y + h / 2) * img_height
                boxes.append([x1, y1, x2, y2])
                labels.append(int(cls) + 1)  # 0 is background; classes start at 1
        # 3. Convert to tensors
        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        labels = torch.as_tensor(labels, dtype=torch.int64)
        target = {"boxes": boxes, "labels": labels}
        if self.transforms:
            img = self.transforms(img)
        return img, target

    def __len__(self):
        return len(self.imgs)
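To fine-tune on your own classes, the standard torchvision recipe is to swap the box-predictor head for one with the right number of classes and feed the model (images, targets) pairs; in training mode the model returns a dict of losses. A minimal sketch, assuming the CustomDataset above and the 3 foreground classes from dataset.yaml (hyperparameters are illustrative):

import torch
from torch.utils.data import DataLoader
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.transforms import ToTensor

num_classes = 4  # 3 foreground classes + background

# Replace the pretrained COCO head with one sized for our classes
model = fasterrcnn_resnet50_fpn(pretrained=True)
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

dataset = CustomDataset("dataset/images/train", "dataset/labels/train", transforms=ToTensor())
# Detection targets vary in size per image, so batch them as tuples
loader = DataLoader(dataset, batch_size=4, shuffle=True,
                    collate_fn=lambda batch: tuple(zip(*batch)))

optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9, weight_decay=0.0005)

model.train()
for epoch in range(10):
    for images, targets in loader:
        images = [img.to(device) for img in images]
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
        loss_dict = model(images, targets)  # dict of losses in train mode
        loss = sum(loss_dict.values())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f"Epoch {epoch + 1}, last batch loss: {loss.item():.4f}")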

IV. Image Segmentation in Practice: U-Net (The Classic Semantic Segmentation Model)

U-Net has an encoder-decoder architecture. Originally designed for medical image segmentation, it was later applied to general domains such as satellite imagery and industrial defect segmentation. Its hallmark is a symmetric structure with skip connections, which lets it recover pixel-level detail accurately.

1. Building the U-Net Model (Core Code)

# File: unet_model.py
import torch
import torch.nn as nn
import torch.nn.functional as F

# Convolution block (Conv2d + BatchNorm + ReLU, applied twice)
class DoubleConv(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.double_conv = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True)
        )

    def forward(self, x):
        return self.double_conv(x)

# Downsampling block (MaxPool + DoubleConv)
class Down(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.maxpool_conv = nn.Sequential(
            nn.MaxPool2d(2),
            DoubleConv(in_channels, out_channels)
        )

    def forward(self, x):
        return self.maxpool_conv(x)

# Upsampling block (Upsample + DoubleConv)
class Up(nn.Module):
    def __init__(self, in_channels, out_channels, bilinear=True):
        super().__init__()
        if bilinear:
            # Bilinear interpolation upsampling
            self.up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True)
        else:
            # Transposed-convolution upsampling (halves the channel count)
            self.up = nn.ConvTranspose2d(in_channels, in_channels // 2, kernel_size=2, stride=2)
        self.conv = DoubleConv(in_channels, out_channels)

    def forward(self, x1, x2):
        x1 = self.up(x1)
        # Pad x1 so its size matches x2 (the encoder feature map)
        diffY = x2.size()[2] - x1.size()[2]
        diffX = x2.size()[3] - x1.size()[3]
        x1 = F.pad(x1, [diffX // 2, diffX - diffX // 2,
                        diffY // 2, diffY - diffY // 2])
        # Concatenate encoder and decoder feature maps (skip connection)
        x = torch.cat([x2, x1], dim=1)
        return self.conv(x)

# Output layer (1x1 convolution, maps channels to the number of classes)
class OutConv(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(OutConv, self).__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):
        return self.conv(x)

# Full U-Net model
class UNet(nn.Module):
    def __init__(self, n_channels, n_classes, bilinear=True):
        super(UNet, self).__init__()
        self.n_channels = n_channels  # input channels (3 for RGB)
        self.n_classes = n_classes    # output classes (1 for binary segmentation)
        self.bilinear = bilinear
        self.inc = DoubleConv(n_channels, 64)
        self.down1 = Down(64, 128)
        self.down2 = Down(128, 256)
        self.down3 = Down(256, 512)
        factor = 2 if bilinear else 1
        self.down4 = Down(512, 1024 // factor)
        self.up1 = Up(1024, 512 // factor, bilinear)
        self.up2 = Up(512, 256 // factor, bilinear)
        self.up3 = Up(256, 128 // factor, bilinear)
        self.up4 = Up(128, 64, bilinear)
        self.outc = OutConv(64, n_classes)

    def forward(self, x):
        x1 = self.inc(x)
        x2 = self.down1(x1)
        x3 = self.down2(x2)
        x4 = self.down3(x3)
        x5 = self.down4(x4)
        x = self.up1(x5, x4)
        x = self.up2(x, x3)
        x = self.up3(x, x2)
        x = self.up4(x, x1)
        logits = self.outc(x)
        return logits
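A quick sanity check: the output should have the same spatial size as the input, with one channel per class. A minimal sketch:

import torch
from unet_model import UNet

model = UNet(n_channels=3, n_classes=1)
x = torch.randn(1, 3, 224, 224)  # dummy RGB batch
with torch.no_grad():
    y = model(x)
print(y.shape)  # expected: torch.Size([1, 1, 224, 224])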

2. U-Net Training and Inference (Binary Segmentation, e.g., Cell Segmentation)

# File: unet_train.py
import os
import cv2
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from unet_model import UNet
import albumentations as A
from albumentations.pytorch import ToTensorV2

# 1. Define the dataset (binary segmentation; masks are grayscale images)
class SegmentationDataset(torch.utils.data.Dataset):
    def __init__(self, img_dir, mask_dir, transforms=None):
        self.img_dir = img_dir
        self.mask_dir = mask_dir
        self.transforms = transforms
        self.imgs = sorted(os.listdir(img_dir))

    def __getitem__(self, idx):
        img_path = os.path.join(self.img_dir, self.imgs[idx])
        mask_path = os.path.join(self.mask_dir, self.imgs[idx].replace(".jpg", ".png"))
        # Load the image and its mask
        img = cv2.imread(img_path)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        mask = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE)
        # Binarize the mask (0 = background, 1 = target)
        mask[mask > 0] = 1
        if self.transforms:
            augmented = self.transforms(image=img, mask=mask)
            img = augmented["image"]
            mask = augmented["mask"]
        return img, mask.unsqueeze(0)  # add a channel dimension to the mask

    def __len__(self):
        return len(self.imgs)

# 2. Data augmentation and preprocessing
train_transform = A.Compose([
    A.Resize(height=256, width=256),
    A.RandomCrop(height=224, width=224),
    A.HorizontalFlip(p=0.5),
    A.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)),
    ToTensorV2(),
])
val_transform = A.Compose([
    A.Resize(height=224, width=224),
    A.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)),
    ToTensorV2(),
])

# 3. Load the datasets
train_dataset = SegmentationDataset("train_imgs", "train_masks", train_transform)
val_dataset = SegmentationDataset("val_imgs", "val_masks", val_transform)
train_loader = DataLoader(train_dataset, batch_size=8, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=8, shuffle=False)

# 4. Initialize the model, loss function, and optimizer
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = UNet(n_channels=3, n_classes=1).to(device)
# Binary segmentation loss: BCEWithLogitsLoss (a Dice loss can be added on top; see the sketch after this script)
criterion = nn.BCEWithLogitsLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-4)

# 5. Training function
def train_epoch(model, loader, criterion, optimizer, device):
    model.train()
    total_loss = 0.0
    for imgs, masks in loader:
        imgs, masks = imgs.to(device), masks.to(device).float()
        optimizer.zero_grad()
        outputs = model(imgs)
        loss = criterion(outputs, masks)
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * imgs.size(0)
    return total_loss / len(loader.dataset)

# 6. Run training
epochs = 50
for epoch in range(epochs):
    train_loss = train_epoch(model, train_loader, criterion, optimizer, device)
    print(f"Epoch [{epoch+1}/{epochs}], Train Loss: {train_loss:.4f}")
    # Save a checkpoint every 10 epochs
    if (epoch + 1) % 10 == 0:
        torch.save(model.state_dict(), f"unet_epoch_{epoch+1}.pth")

# 7. Inference (single image)
def predict_img(model, img_path, device):
    model.eval()
    img = cv2.imread(img_path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    transform = A.Compose([
        A.Resize(height=224, width=224),
        A.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)),
        ToTensorV2(),
    ])
    img = transform(image=img)["image"].unsqueeze(0).to(device)
    with torch.no_grad():
        output = model(img)
    # Convert logits to probabilities, then binarize at 0.5
    pred_mask = torch.sigmoid(output).cpu().numpy() > 0.5
    return pred_mask.squeeze()
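The script above trains with BCEWithLogitsLoss alone, though the comment mentions Dice loss as well. Dice loss directly optimizes region overlap and helps when the foreground is small relative to the background. A minimal sketch of a combined BCE + Dice criterion under the same binary setup (the class name and smooth constant are illustrative):

import torch
import torch.nn as nn

class BCEDiceLoss(nn.Module):
    def __init__(self, smooth=1.0):
        super().__init__()
        self.bce = nn.BCEWithLogitsLoss()
        self.smooth = smooth  # avoids division by zero on empty masks

    def forward(self, logits, targets):
        probs = torch.sigmoid(logits)
        # Flatten per sample to compute overlap
        probs_flat = probs.view(probs.size(0), -1)
        targets_flat = targets.view(targets.size(0), -1)
        intersection = (probs_flat * targets_flat).sum(dim=1)
        dice = (2 * intersection + self.smooth) / (
            probs_flat.sum(dim=1) + targets_flat.sum(dim=1) + self.smooth)
        dice_loss = 1 - dice.mean()
        return self.bce(logits, targets) + dice_loss

# Drop-in replacement for the criterion above:
# criterion = BCEDiceLoss()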

V. Comparing the Three Models and Choosing by Scenario

| Model | Type | Core strength | Typical scenarios | Speed | Accuracy |
|---|---|---|---|---|---|
| YOLOv8 | single-stage detection | strong real-time performance, easy to deploy | video surveillance, autonomous driving, real-time inspection | fast | medium-high |
| Faster R-CNN | two-stage detection | high accuracy, good at small objects | medical imaging, industrial defect detection, high-precision recognition | slow | high |
| U-Net | semantic segmentation | pixel-level masks with precise boundaries | medical image segmentation, satellite imagery analysis, industrial surface defect segmentation | medium | high |
