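C++ Deployment Reference: Inference Code for the Three OCR Modules

1. Introduction

1.1 OCR System Architecture Overview

Optical character recognition (OCR) plays a key role in modern document processing, image understanding, and automated information extraction. A complete end-to-end OCR system usually consists of three core modules: text detection, text orientation classification, and text recognition. Working together, they turn a raw image into structured text.

This article builds on the OCR models provided by the cv_resnet18_ocr-detection image and walks through their inference logic in C++. The image bundles a DBNet text detection model with a ResNet18 backbone, supports ONNX export, and provides a WebUI, which makes it suitable for industrial deployment. We focus on the C++ inference code of the three modules, covering preprocessing, model invocation, post-processing, and result assembly.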

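1.2 Technology Choices

The system uses the following combination of classic lightweight models:

DBNet: text region detection with high accuracy under real-time constraints;
ShuffleNetV2: orientation classifier that keeps latency low while still classifying well;
CRNN: a sequence recognizer that combines a CNN with a BiLSTM and CTC, well suited to short text.

All models were trained in PyTorch and exported to ONNX, so they can be run efficiently across platforms with ONNX Runtime. The rest of this article explains how to integrate these three ONNX models in C++.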

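2. Text Detection Module: DBNet Inference

2.1 Input Preprocessing

The DBNet model takes a 3-channel color image as input. The image must be resized, normalized, and converted to the tensor layout the model expects. Note that OpenCV loads images in BGR order; whether a conversion to RGB is needed depends on how the model was trained. A typical preprocessing implementation: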

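```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Resize to the fixed network input size and scale pixel values to [0, 1].
cv::Mat TextDetector::preprocess(const cv::Mat& srcimg) {
    cv::Mat dstimg;
    cv::resize(srcimg, dstimg, cv::Size(inpWidth, inpHeight));
    dstimg.convertTo(dstimg, CV_32F, 1.0 / 255.0);
    return dstimg;
}

// Copy an HWC float image into a newly allocated planar CHW buffer.
// The caller owns `blob` and must release it with delete[].
void TextDetector::normalize_(const cv::Mat& img, float*& blob) {
    const int channels = img.channels();  // 3 for the detector's color input
    const int height = img.rows;
    const int width = img.cols;
    blob = new float[channels * height * width];
    // Each chw[i] is a header over one plane of `blob`, so cv::split
    // writes channel i directly into that plane.
    std::vector<cv::Mat> chw(channels);
    for (int i = 0; i < channels; ++i) {
        chw[i] = cv::Mat(height, width, CV_32F, blob + i * height * width);
    }
    cv::split(img, chw);
}
```

The code resizes the image, scales pixel values to [0, 1] by dividing by 255, and uses cv::split to convert the HWC layout to the CHW layout that the ONNX model expects. If the model was trained with per-channel mean/std normalization, that step has to be applied here as well.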
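The same layout conversion can be written without OpenCV, which makes it easier to see what cv::split does and where a mean/std step would go. The sketch below is a hypothetical helper, not part of the article's code; it assumes a continuous 8-bit interleaved image and takes the per-channel mean and standard deviation as parameters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: converts an interleaved HWC uint8 image (such as the
// bytes of a continuous CV_8UC3 cv::Mat) into a planar CHW float buffer.
// Each value is scaled to [0, 1], then normalized as (x - mean[c]) / stdv[c].
// Passing mean = 0 and stdv = 1 reproduces plain /255 scaling.
std::vector<float> hwcToChw(const std::vector<std::uint8_t>& hwc,
                            int height, int width, int channels,
                            const std::vector<float>& mean,
                            const std::vector<float>& stdv) {
    const std::size_t plane = static_cast<std::size_t>(height) * width;
    assert(hwc.size() == plane * channels);
    assert(mean.size() == static_cast<std::size_t>(channels));
    assert(stdv.size() == static_cast<std::size_t>(channels));

    std::vector<float> chw(plane * channels);
    for (std::size_t p = 0; p < plane; ++p) {                 // pixel index, row-major
        for (int c = 0; c < channels; ++c) {
            const float x = hwc[p * channels + c] / 255.0f;   // HWC: channels interleaved
            chw[c * plane + p] = (x - mean[c]) / stdv[c];     // CHW: one plane per channel
        }
    }
    return chw;
}
```

If the exported model was trained with ImageNet statistics (an assumption to check against its training configuration), the call would pass mean {0.485f, 0.456f, 0.406f} and stdv {0.229f, 0.224f, 0.225f}. Those values are in RGB order, so a BGR image would first go through cv::cvtColor with cv::COLOR_BGR2RGB.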

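2.2 Inference with ONNX Runtime

Load the DBNet model with ONNX Runtime and run the forward pass:

```cpp
#include <onnxruntime_cxx_api.h>
#include <opencv2/opencv.hpp>
#include <algorithm>
#include <cstring>
#include <vector>

std::vector<std::vector<cv::Point2f>> TextDetector::detect(cv::Mat& srcimg) {
    float* blob = nullptr;
    const int h = srcimg.rows;  // original size, used to map boxes back later
    const int w = srcimg.cols;
    cv::Mat dstimg = this->preprocess(srcimg);
    this->normalize_(dstimg, blob);

    // NCHW input: batch 1, 3 channels, network input height and width.
    std::vector<int64_t> inputTensorShape{1, 3, dstimg.rows, dstimg.cols};
    size_t inputTensorSize = utils::vectorProduct(inputTensorShape);
    std::vector<float> inputTensorValues(blob, blob + inputTensorSize);

    // CreateTensor wraps inputTensorValues without copying it, so the vector
    // must stay alive until session.Run() returns.
    Ort::MemoryInfo memoryInfo = Ort::MemoryInfo::CreateCpu(
        OrtAllocatorType::OrtArenaAllocator, OrtMemType::OrtMemTypeDefault);
    auto inputTensor = Ort::Value::CreateTensor<float>(
        memoryInfo, inputTensorValues.data(), inputTensorSize,
        inputTensorShape.data(), inputTensorShape.size());

    // inputNames / outputNames are assumed to be std::vector<const char*>
    // members filled in when the session was created.
    std::vector<Ort::Value> outputTensors = session.Run(
        Ort::RunOptions{nullptr}, inputNames.data(), &inputTensor, 1,
        outputNames.data(), outputNames.size());
    // ... post-processing continues in section 2.3
```

session.Run() executes the forward pass; for DBNet the first output is the probability map, typically of shape [1, 1, H, W] for a single input image.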
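The detect() code calls utils::vectorProduct, which the article does not show. A plausible reconstruction, offered as an assumption, multiplies the dimensions of a shape vector. The same helper is useful on the output side: sizing buffers from outputTensors[0].GetTensorTypeAndShapeInfo().GetShape() rather than from the input size avoids copying the wrong number of elements when the input and output shapes differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

namespace utils {
// Hypothetical reconstruction: the number of elements in a tensor of the
// given shape, i.e. the product of all dimensions. ONNX Runtime reports
// dynamic dimensions as -1, so those are rejected instead of multiplied.
inline std::size_t vectorProduct(const std::vector<std::int64_t>& shape) {
    std::size_t count = 1;
    for (std::int64_t dim : shape) {
        if (dim < 0) throw std::invalid_argument("shape has an unresolved dynamic dimension");
        count *= static_cast<std::size_t>(dim);
    }
    return count;
}
}  // namespace utils
```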

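2.3 Post-processing: Contour Extraction and Coordinate Mapping

Threshold the probability map output by the model, find contours, and map their coordinates back to the original image:

```cpp
    // Output 0 is the probability map. It has a single channel, so it holds
    // rows * cols values (inputTensorSize counts 3 channels and is too large).
    const float* floatArray = outputTensors[0].GetTensorData<float>();
    cv::Mat binary(dstimg.rows, dstimg.cols, CV_32FC1);
    std::memcpy(binary.data, floatArray,
                static_cast<size_t>(dstimg.rows) * dstimg.cols * sizeof(float));

    // Threshold the probabilities; findContours needs an 8-bit single-channel image.
    cv::Mat bitmap;
    cv::threshold(binary, bitmap, binaryThreshold, 255, cv::THRESH_BINARY);
    bitmap.convertTo(bitmap, CV_8UC1);

    std::vector<std::vector<cv::Point>> contours;
    cv::findContours(bitmap, contours, cv::RETR_LIST, cv::CHAIN_APPROX_SIMPLE);

    // Factors that map network-input coordinates back to the original image.
    const float scaleHeight = static_cast<float>(h) / dstimg.rows;
    const float scaleWidth = static_cast<float>(w) / dstimg.cols;

    std::vector<std::vector<cv::Point2f>> results;
    for (const auto& contour : contours) {
        if (contour.size() < 4) continue;  // too few points to form a box
        std::vector<cv::Point2f> polygon;
        for (const auto& pt : contour) {
            polygon.emplace_back(pt.x * scaleWidth, pt.y * scaleHeight);
        }
        // Reduce the contour to its minimum-area rotated rectangle; the
        // cropping step in section 3.1 expects exactly 4 corners per box.
        cv::RotatedRect box = cv::minAreaRect(polygon);
        const float longSide = std::max(box.size.width, box.size.height);
        if (longSide > longSideThresh) {
            cv::Point2f corners[4];
            box.points(corners);
            results.emplace_back(corners, corners + 4);
        }
    }
    delete[] blob;
    return results;
}
```

The function returns 4-corner boxes in original-image coordinates, ready for cropping or visualization. This simplified version omits DBNet's unclip (box expansion) step: DBNet is trained to predict shrunk text regions, so boxes taken directly from the thresholded map tend to be tighter than the actual text.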
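One way to add the expansion step, sketched under simplifying assumptions: DBNet's reference post-processing offsets the text polygon outward by D = A × r / L, where A is the area, L the perimeter and r the unclip ratio (commonly 1.5), using polygon clipping. Once a contour has been reduced to a rotated rectangle of size w × h, offsetting every edge by D grows both sides by 2D. The hypothetical helper below computes that without any library; center and angle are left unchanged.

```cpp
#include <cassert>

// Hypothetical box size type (width/height of a rotated rectangle).
struct BoxSize {
    float width;
    float height;
};

// Simplified "unclip" for a rectangle: offset each edge outward by
// D = area * unclipRatio / perimeter, so each side grows by 2 * D.
BoxSize unclipRect(BoxSize s, float unclipRatio) {
    const float area = s.width * s.height;
    const float perimeter = 2.0f * (s.width + s.height);
    if (perimeter <= 0.0f) return s;  // degenerate box: nothing to expand
    const float d = area * unclipRatio / perimeter;
    return {s.width + 2.0f * d, s.height + 2.0f * d};
}
```

In detect(), the result would be written back with box.size = cv::Size2f(expanded.width, expanded.height) before the corners are extracted. Because x and y are rescaled by different factors, expanding in network-input coordinates before rescaling is closer to the reference behaviour than expanding afterwards.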

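3. Orientation Classification Module: ShuffleNetV2 Inference

3.1 Box Cropping and Perspective Rectification

Each detected 4-point text box is rectified into an axis-aligned crop:

```cpp
#include <opencv2/opencv.hpp>
#include <algorithm>
#include <cmath>
#include <vector>

cv::Mat TextClassifier::get_rotate_crop_image(const cv::Mat& frame,
                                              std::vector<cv::Point2f> vertices) {
    // Sort by y: the first two points form the top edge, the last two the bottom edge.
    std::sort(vertices.begin(), vertices.end(),
              [](const cv::Point2f& a, const cv::Point2f& b) { return a.y < b.y; });
    if (vertices[0].x > vertices[1].x) std::swap(vertices[0], vertices[1]);  // top-left, top-right
    if (vertices[2].x > vertices[3].x) std::swap(vertices[2], vertices[3]);  // bottom-left, bottom-right
    // Order: top-left, bottom-left, bottom-right, top-right.
    std::vector<cv::Point2f> correctedVertices = {vertices[0], vertices[2], vertices[3], vertices[1]};

    // Crop the bounding rectangle, clipped to the image so frame(rect) cannot throw.
    cv::Rect rect = cv::boundingRect(correctedVertices) & cv::Rect(0, 0, frame.cols, frame.rows);
    if (rect.empty()) return cv::Mat();
    cv::Mat crop_img = frame(rect).clone();
    for (auto& v : correctedVertices) {
        v -= cv::Point2f(static_cast<float>(rect.x), static_cast<float>(rect.y));
    }

    // Size the output from the box's own edges rather than its bounding
    // rectangle, so tilted boxes are not stretched.
    auto dist = [](const cv::Point2f& a, const cv::Point2f& b) {
        return std::hypot(a.x - b.x, a.y - b.y);
    };
    const int outW = std::max(1, static_cast<int>(std::max(
        dist(correctedVertices[0], correctedVertices[3]),     // top edge
        dist(correctedVertices[1], correctedVertices[2]))));  // bottom edge
    const int outH = std::max(1, static_cast<int>(std::max(
        dist(correctedVertices[0], correctedVertices[1]),     // left edge
        dist(correctedVertices[3], correctedVertices[2]))));  // right edge

    std::vector<cv::Point2f> targetVertices = {
        cv::Point2f(0.f, 0.f),
        cv::Point2f(0.f, outH - 1.f),
        cv::Point2f(outW - 1.f, outH - 1.f),
        cv::Point2f(outW - 1.f, 0.f)};
    cv::Mat M = cv::getPerspectiveTransform(correctedVertices, targetVertices);

    cv::Mat result;
    // The 5th argument is the interpolation flag, the 6th the border mode.
    cv::warpPerspective(crop_img, result, M, cv::Size(outW, outH),
                        cv::INTER_LINEAR, cv::BORDER_CONSTANT);
    return result;
}
```

Because the vertices are sorted first, the result does not depend on the order in which the 4 corners arrive, and the function produces a horizontally aligned rectangular crop. The y-based sort assumes the box is not tilted close to 45°; at such angles the "top two" points become ambiguous and the corner order can come out wrong.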
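A different, commonly used way to order the corners is the sum/difference heuristic, shown here as an alternative to the y-sort rather than as the article's method. The sketch uses a hypothetical plain Pt struct instead of cv::Point2f so that it runs without OpenCV, and it also derives the warp's output size from the box edges.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>

struct Pt {      // hypothetical stand-in for cv::Point2f
    float x;
    float y;
};

struct CropSize {
    float width;
    float height;
};

// Orders 4 corners as {top-left, top-right, bottom-right, bottom-left}:
// TL has the smallest x + y, BR the largest; TR has the smallest y - x,
// BL the largest. Assumes the box is tilted well below 45 degrees.
std::array<Pt, 4> orderCorners(const std::array<Pt, 4>& p) {
    auto bySum = [](const Pt& a, const Pt& b) { return a.x + a.y < b.x + b.y; };
    auto byDiff = [](const Pt& a, const Pt& b) { return a.y - a.x < b.y - b.x; };
    std::array<Pt, 4> out{};
    out[0] = *std::min_element(p.begin(), p.end(), bySum);   // top-left
    out[1] = *std::min_element(p.begin(), p.end(), byDiff);  // top-right
    out[2] = *std::max_element(p.begin(), p.end(), bySum);   // bottom-right
    out[3] = *std::max_element(p.begin(), p.end(), byDiff);  // bottom-left
    return out;
}

// Output size for the perspective warp: the longer of each pair of opposite edges.
CropSize cropSize(const std::array<Pt, 4>& c) {
    auto dist = [](const Pt& a, const Pt& b) { return std::hypot(a.x - b.x, a.y - b.y); };
    return {std::max(dist(c[0], c[1]), dist(c[3], c[2])),   // top vs. bottom edge
            std::max(dist(c[0], c[3]), dist(c[1], c[2]))};  // left vs. right edge
}
```

The ordered corners correspond to destination points (0, 0), (w - 1, 0), (w - 1, h - 1), (0, h - 1) in cv::getPerspectiveTransform. Like the y-sort, this heuristic breaks down for boxes rotated by about 45°.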

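3.2 Classification Inference and Output

Feed the cropped image into the ShuffleNetV2 model to predict its orientation:

```cpp
#include <onnxruntime_cxx_api.h>
#include <opencv2/opencv.hpp>
#include <vector>

int TextClassifier::predict(cv::Mat cv_image) {
    cv::Mat resized;
    cv::resize(cv_image, resized, cv::Size(inpWidth, inpHeight));
    resized.convertTo(resized, CV_32F, 1.0 / 255.0);

    // HWC to CHW: each chw[c] wraps one plane of the buffer, so cv::split fills it in place.
    // std::vector owns the buffer, so it is released even if session.Run throws.
    const size_t planeSize = static_cast<size_t>(inpHeight) * inpWidth;
    std::vector<float> blob(3 * planeSize);
    std::vector<cv::Mat> chw(3);
    for (int c = 0; c < 3; ++c) {
        chw[c] = cv::Mat(inpHeight, inpWidth, CV_32F, blob.data() + c * planeSize);
    }
    cv::split(resized, chw);

    std::vector<int64_t> inputShape{1, 3, inpHeight, inpWidth};
    Ort::MemoryInfo memoryInfo = Ort::MemoryInfo::CreateCpu(
        OrtAllocatorType::OrtArenaAllocator, OrtMemType::OrtMemTypeDefault);
    auto inputTensor = Ort::Value::CreateTensor<float>(
        memoryInfo, blob.data(), blob.size(), inputShape.data(), inputShape.size());
    auto outputTensors = session.Run(Ort::RunOptions{nullptr}, inputNames.data(),
                                     &inputTensor, 1, outputNames.data(), outputNames.size());

    // Argmax over the class scores.
    const float* probs = outputTensors[0].GetTensorData<float>();
    int max_id = 0;
    float max_prob = probs[0];
    for (int i = 1; i < num_classes; ++i) {
        if (probs[i] > max_prob) {
            max_prob = probs[i];
            max_id = i;
        }
    }
    return max_id;  // orientation class index, e.g. 0°/90°/180°/270° for a 4-class model
}
```

The returned index tells the pipeline whether and how to rotate the cropped text image before recognition; the mapping from index to angle depends on how the classifier was trained.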
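The argmax above ignores how confident the classifier is. A common refinement, sketched here with hypothetical names, converts the scores to probabilities with a numerically stable softmax and only applies a rotation when the winning class clears a threshold such as 0.9. It assumes the model outputs raw logits; if the exported graph already ends in a softmax, the scores can be used as probabilities directly, and the argmax is the same either way.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct AngleResult {
    int classId;       // index of the most likely orientation class
    float confidence;  // softmax probability of that class
};

// Numerically stable softmax (subtract the max before exp) followed by argmax.
// Assumes `scores` are raw logits.
AngleResult classifyAngle(const std::vector<float>& scores) {
    assert(!scores.empty());
    const float maxScore = *std::max_element(scores.begin(), scores.end());
    std::vector<float> probs(scores.size());
    float total = 0.0f;
    for (std::size_t i = 0; i < scores.size(); ++i) {
        probs[i] = std::exp(scores[i] - maxScore);
        total += probs[i];
    }
    const auto best = std::max_element(probs.begin(), probs.end());
    return {static_cast<int>(best - probs.begin()), *best / total};
}

// Hypothetical policy: act on the predicted class only when the classifier
// is confident enough; otherwise treat the crop as upright (class 0).
int angleToApply(const AngleResult& r, float minConfidence) {
    return r.confidence >= minConfidence ? r.classId : 0;
}
```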

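4. Text Recognition Module: CRNN Inference

4.1 Image Preprocessing and Tensor Construction

The CRNN model expects a grayscale input with a fixed height (for example 48). The code below resizes every crop to a fixed inpWidth × inpHeight, so text lines whose aspect ratio differs from the network input are stretched or squeezed:

```cpp
#include <opencv2/opencv.hpp>

// Convert to single-channel grayscale, resize to the network input size,
// and scale pixel values to [0, 1].
cv::Mat TextRecognizer::preprocess(const cv::Mat& srcimg) {
    cv::Mat gray;
    if (srcimg.channels() == 3) {
        cv::cvtColor(srcimg, gray, cv::COLOR_BGR2GRAY);
    } else {
        gray = srcimg.clone();
    }
    cv::Mat resized;
    cv::resize(gray, resized, cv::Size(inpWidth, inpHeight));
    resized.convertTo(resized, CV_32F, 1.0 / 255.0);
    return resized;
}
```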
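An alternative to stretching is to resize to the target height while keeping the aspect ratio, then zero-pad on the right up to the network width (for example with cv::copyMakeBorder). The hypothetical helper below only computes the resized width; whether padded inputs are acceptable depends on how the CRNN model was trained, which is an assumption here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Width to resize a crop to so that its height becomes targetHeight with the
// aspect ratio preserved, clamped to [1, maxWidth]. Columns between this
// width and maxWidth would then be padded instead of stretching the text.
int resizedWidth(int srcWidth, int srcHeight, int targetHeight, int maxWidth) {
    assert(srcWidth > 0 && srcHeight > 0 && targetHeight > 0 && maxWidth > 0);
    const double ratio = static_cast<double>(srcWidth) / srcHeight;  // keep w/h constant
    const int width = static_cast<int>(std::ceil(targetHeight * ratio));
    return std::max(1, std::min(width, maxWidth));
}
```

For a 100 × 32 crop and a 48-pixel network height this gives a width of 150, which then fits inside a 320-pixel input with padding.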

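4.2 Sequence Decoding and CTC Post-processing

The CRNN output is a per-timestep probability distribution over characters; CTC decoding turns it into the final text:

```cpp
#include <onnxruntime_cxx_api.h>
#include <opencv2/opencv.hpp>
#include <string>
#include <vector>

std::string TextRecognizer::predict_text(cv::Mat cv_image) {
    float* blob = nullptr;
    cv::Mat dstimg = this->preprocess(cv_image);
    this->normalize_(dstimg, blob);  // grayscale image -> 1-plane float buffer

    std::vector<int64_t> inputShape{1, 1, inpHeight, inpWidth};  // single-channel input
    size_t inputSize = utils::vectorProduct(inputShape);
    Ort::MemoryInfo memoryInfo = Ort::MemoryInfo::CreateCpu(
        OrtAllocatorType::OrtArenaAllocator, OrtMemType::OrtMemTypeDefault);
    auto inputTensor = Ort::Value::CreateTensor<float>(
        memoryInfo, blob, inputSize, inputShape.data(), inputShape.size());
    auto outputTensors = session.Run(Ort::RunOptions{nullptr}, inputNames.data(),
                                     &inputTensor, 1, outputNames.data(), outputNames.size());

    // Assumes the output layout is [batch, seq_len, num_classes]; some CRNN
    // exports emit [seq_len, batch, num_classes] instead, so check the model.
    const float* logits = outputTensors[0].GetTensorData<float>();
    std::vector<int64_t> outShape = outputTensors[0].GetTensorTypeAndShapeInfo().GetShape();
    const int seq_len = static_cast<int>(outShape[1]);
    const int num_classes = static_cast<int>(outShape[2]);

    // Greedy CTC: argmax per timestep, collapse repeats relative to the
    // previous timestep (blank included), then drop blanks (index 0).
    std::string text;
    int prev_idx = 0;
    for (int t = 0; t < seq_len; ++t) {
        const float* row = logits + static_cast<size_t>(t) * num_classes;
        int max_idx = 0;
        for (int c = 1; c < num_classes; ++c) {
            if (row[c] > row[max_idx]) max_idx = c;
        }
        if (max_idx != 0 && max_idx != prev_idx) {
            text += alphabet[max_idx - 1];  // alphabet excludes the blank
        }
        prev_idx = max_idx;
    }
    delete[] blob;
    return text;
}
```

This implements CTC greedy (best-path) decoding: take the most likely class at each timestep, merge consecutive repeats, then drop blanks. Repeats must be merged against the previous timestep, not against the last emitted character; otherwise a blank-separated double letter, such as the "ll" in "hello", collapses into a single character. If the alphabet contains multi-byte characters (for example Chinese in UTF-8), store it as std::vector<std::string> rather than a single std::string, so that each alphabet entry is a whole character.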
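The decoding logic can be tested in isolation on a flat [seq_len, num_classes] score array. This stand-alone sketch implements greedy CTC decoding with repeats collapsed per timestep, assumes blank index 0, and stores the alphabet as std::vector<std::string> so that multi-byte UTF-8 characters stay intact; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Greedy CTC decoding over a row-major [seqLen, numClasses] score matrix.
// Index 0 is the blank; class c > 0 maps to alphabet[c - 1].
std::string ctcGreedyDecode(const float* scores, int seqLen, int numClasses,
                            const std::vector<std::string>& alphabet) {
    std::string text;
    int prev = 0;  // label chosen at the previous timestep
    for (int t = 0; t < seqLen; ++t) {
        const float* row = scores + static_cast<std::size_t>(t) * numClasses;
        int best = 0;
        for (int c = 1; c < numClasses; ++c) {
            if (row[c] > row[best]) best = c;
        }
        // Emit only non-blank labels that differ from the previous timestep,
        // skipping indices outside the alphabet.
        if (best != 0 && best != prev &&
            static_cast<std::size_t>(best - 1) < alphabet.size()) {
            text += alphabet[best - 1];
        }
        prev = best;
    }
    return text;
}
```

With the best path h, h, e, l, blank, l, o, o this returns "hello"; collapsing against the last emitted character instead would give "helo".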

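5. System Integration and Engineering Optimization

5.1 Chaining the Modules

A complete OCR pipeline runs the steps in this order:

Detect all text regions with DBNet;
Crop each detected box with get_rotate_crop_image;
Classify each crop's orientation with ShuffleNetV2 and rotate it if needed;
Recognize the rectified crop with CRNN;
Sort the text lines by spatial position and output them.

```cpp
// detected_boxes: output of TextDetector::detect; results: a container of
// {box, text} pairs whose type is defined by the surrounding application.
for (auto& box : detected_boxes) {
    cv::Mat cropped = classifier.get_rotate_crop_image(frame, box);
    if (cropped.empty()) continue;
    int angle_id = classifier.predict(cropped);
    // Assumed label convention: 0 = 0°, 1 = 90°, 2 = 180°, 3 = 270°. Which
    // rotation undoes class 1 vs. class 3 depends on how the labels were defined in training.
    if (angle_id == 1) {
        cv::rotate(cropped, cropped, cv::ROTATE_90_CLOCKWISE);
    } else if (angle_id == 2) {
        cv::rotate(cropped, cropped, cv::ROTATE_180);
    } else if (angle_id == 3) {
        cv::rotate(cropped, cropped, cv::ROTATE_90_COUNTERCLOCKWISE);
    }
    std::string text = recognizer.predict_text(cropped);
    results.push_back({box, text});
}
```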
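The last pipeline step, sorting by spatial position, is not part of the loop above. A minimal sketch, assuming each result has been reduced to the top-left corner of its box: sort by y, then swap neighbours whose tops lie within a small tolerance so that boxes on the same row read left to right. TextLine and sortReadingOrder are hypothetical names, and the tolerance (roughly 10 pixels is a common starting point) should be tuned to the image resolution.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct TextLine {
    float x;           // left edge of the box, e.g. min x over its corners
    float y;           // top edge of the box, e.g. min y over its corners
    std::string text;  // recognized text
};

// Reading order: top-to-bottom, and left-to-right among boxes whose top
// edges differ by less than rowTolerance (i.e. boxes on the same row).
void sortReadingOrder(std::vector<TextLine>& lines, float rowTolerance) {
    std::sort(lines.begin(), lines.end(), [](const TextLine& a, const TextLine& b) {
        return a.y < b.y || (a.y == b.y && a.x < b.x);
    });
    // Insertion-style pass: move a box left past same-row neighbours with larger x.
    for (std::size_t i = 1; i < lines.size(); ++i) {
        for (std::size_t j = i; j > 0; --j) {
            TextLine& prev = lines[j - 1];
            TextLine& cur = lines[j];
            if (std::fabs(cur.y - prev.y) < rowTolerance && cur.x < prev.x) {
                std::swap(prev, cur);
            } else {
                break;
            }
        }
    }
}
```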

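5.2 Performance Optimization Tips

Memory reuse: avoid frequent new/delete; manage blob buffers with an object pool or reuse one buffer per model instance;
Batching: for multiple images, merge the inputs into one batched tensor to improve GPU utilization (the ONNX model needs a dynamic or matching batch dimension);
Asynchronous inference: use ONNX Runtime's multithreading to run the stages as a concurrent pipeline;
Model quantization: convert FP32 models to INT8 to reduce compute cost;
Input size tuning: adjust inpWidth/inpHeight to the actual scenario to balance speed and accuracy.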
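For the memory-reuse point, the simplest form is not a full object pool but a per-instance buffer that only grows. The sketch below is a hypothetical class. Since Ort::Value::CreateTensor wraps a caller-provided pointer without copying, the buffer must outlive session.Run(), and each instance should be used from one thread at a time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Grow-only float buffer: repeated requests of the same or smaller size
// reuse the existing allocation instead of calling new/delete each time.
class TensorBuffer {
public:
    // Returns a pointer to at least `count` floats; contents are not cleared.
    float* acquire(std::size_t count) {
        if (data_.size() < count) data_.resize(count);
        return data_.data();
    }
    std::size_t capacity() const { return data_.size(); }

private:
    std::vector<float> data_;
};
```

In TextClassifier::predict, for example, blob = buffer.acquire(3 * inpHeight * inpWidth) would replace a per-call allocation, with buffer stored as a member of the classifier.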

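6. Summary

This article analyzed how the three core OCR modules, DBNet text detection, ShuffleNetV2 orientation classification, and CRNN text recognition, run inference in C++ with ONNX Runtime. By breaking the code down into preprocessing, model invocation, and post-processing, it showed how to integrate deep learning models into a high-performance production system.

Key points:
1. DBNet outputs a probability map; text boxes are obtained by thresholding and contour extraction.
2. ShuffleNetV2 quickly determines text orientation, which improves recognition accuracy.
3. CRNN relies on CTC for end-to-end sequence recognition, and its output must be decoded correctly.
4. Chaining the modules requires care with coordinate mapping, image cropping, and the ordering of results.

This approach has been verified to work with the cv_resnet18_ocr-detection image. It is portable and extensible, and suits a range of deployment targets such as embedded devices and edge servers.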
