Installing and Using PaddleOCR v5 on Windows

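I recently wanted to run OCR locally and tried PaddleOCR v5. The mobile models strike a good balance between speed and accuracy. In my use case the server models improved accuracy only slightly, while inference took about 20 times longer. This post walks through the whole process, from installation to handling the inference results.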

Environment setup

My machine:

CPU: AMD Ryzen 9 8945HX (16 cores / 32 threads)
GPU: NVIDIA GeForce RTX 5060 Laptop (8GB)
RAM: 32GB
OS: Windows 10
Python: 3.11

Create a virtual environment first. I use uv:

uv init -p 3.11
uv venv

Installing the inference engine, PaddlePaddle

Just follow the official documentation: https://www.paddleocr.ai/latest/version3.x/paddlepaddle_installation.html

I have a 50-series card, so I need the build for CUDA 12.9 + cuDNN 9.9 + TensorRT 10.5:

uv pip install https://paddle-qa.bj.bcebos.com/paddle-pipeline/Develop-TagBuild-Training-Windows-Gpu-Cuda12.9-Cudnn9.9-Trt10.5-Mkl-Avx-VS2019-SelfBuiltPypiUse/86d658f56ebf3a5a7b2b33ace48f22d10680d311/paddlepaddle_gpu-3.0.0.dev20250717-cp311-cp311-win_amd64.whl

Alternatively, use uv add so the dependency is recorded and locked in the project:

uv add "paddlepaddle-gpu @ https://paddle-qa.bj.bcebos.com/paddle-pipeline/Develop-TagBuild-Training-Windows-Gpu-Cuda12.9-Cudnn9.9-Trt10.5-Mkl-Avx-VS2019-SelfBuiltPypiUse/86d658f56ebf3a5a7b2b33ace48f22d10680d311/paddlepaddle_gpu-3.0.0.dev20250717-cp311-cp311-win_amd64.whl ; sys_platform == 'win32' and python_version == '3.11'"

Installing PaddleOCR

uv pip install paddleocr

Verifying the installation

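import paddleocr
print(f"PaddleOCR version: {paddleocr.__version__}")

import paddle
print(f"Paddle version: {paddle.__version__}")
# is_compiled_with_cuda() reports how the wheel was built, not whether a GPU is usable
print(f"Compiled with CUDA: {paddle.is_compiled_with_cuda()}")
print(f"GPU count: {paddle.device.cuda.device_count()}")

The output looks roughly like this:

PaddleOCR version: 3.5.0
Paddle version: 3.0.0
Compiled with CUDA: True
GPU count: 1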

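You may see a ccache warning along the way, because ccache is not installed on this machine. We are only running inference, not building the package from source, so it can be ignored.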
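A CUDA-enabled build and a visible device do not by themselves prove that a kernel can run on the card, which matters on a new GPU generation. The following is a minimal sketch of a smoke test, not part of the official instructions: the function name paddle_gpu_smoke_test and the shape of the returned dictionary are my own choices, and the import is guarded so the snippet also runs where paddle is missing.

```python
import importlib.util


def paddle_gpu_smoke_test():
    """Return a small report dict, or None when paddle is not installed."""
    if importlib.util.find_spec("paddle") is None:
        return None
    import paddle

    report = {
        "compiled_with_cuda": paddle.is_compiled_with_cuda(),
        "gpu_count": paddle.device.cuda.device_count(),
    }
    if report["compiled_with_cuda"] and report["gpu_count"] > 0:
        # Run one tiny matmul on the first GPU; a build that lacks kernels
        # for this card would be expected to fail here rather than later.
        paddle.device.set_device("gpu:0")
        x = paddle.ones([2, 2])
        report["matmul_sum"] = float(paddle.matmul(x, x).sum())
        report["device"] = paddle.device.get_device()
    return report


print(paddle_gpu_smoke_test())
```

PaddlePaddle also ships paddle.utils.run_check(), which performs a more thorough self-test and is worth running once after installation.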

Inference test

The official example code works as is. On the first run, the models are downloaded automatically to C:\Users\{your username}\.paddlex\official_models:

from paddleocr import PaddleOCR

ocr = PaddleOCR(
    device="gpu",
    text_detection_model_name="PP-OCRv5_mobile_det",
    text_recognition_model_name="PP-OCRv5_mobile_rec",
    use_doc_orientation_classify=True,
    use_doc_unwarping=True,
    use_textline_orientation=True,
)

result = ocr.predict("./general_ocr_002.png")
for res in result:
    res.print()
    res.save_to_img("output")
    res.save_to_json("output")

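Loading models from a local directory

If you would rather not go online for the models, copy the ones that were downloaded earlier into a model_dir directory in the project (the paths below are relative to the working directory) and point each *_model_dir parameter at them: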

ocr = PaddleOCR(
    device="gpu",
    text_detection_model_name="PP-OCRv5_mobile_det",
    text_detection_model_dir="./model_dir/PP-OCRv5_mobile_det",
    text_recognition_model_name="PP-OCRv5_mobile_rec",
    text_recognition_model_dir="./model_dir/PP-OCRv5_mobile_rec",
    use_doc_orientation_classify=True,
    doc_orientation_classify_model_name="PP-LCNet_x1_0_doc_ori",
    doc_orientation_classify_model_dir="./model_dir/PP-LCNet_x1_0_doc_ori",
    use_doc_unwarping=True,
    doc_unwarping_model_name="UVDoc",
    doc_unwarping_model_dir="./model_dir/UVDoc",
    use_textline_orientation=True,
    textline_orientation_model_name="PP-LCNet_x1_0_textline_ori",
    textline_orientation_model_dir="./model_dir/PP-LCNet_x1_0_textline_ori",
)
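The copy step can be scripted. This is a minimal sketch under two assumptions: the cache lives in .paddlex\official_models under your home directory, as described earlier, and each model sits in a subfolder named after the model. The helper copy_models and the constants are names I made up for illustration, not part of PaddleOCR.

```python
import shutil
from pathlib import Path

# Assumed cache location (from the download path mentioned earlier).
DEFAULT_CACHE = Path.home() / ".paddlex" / "official_models"

# The five models referenced by the PaddleOCR(...) call above.
MODEL_NAMES = [
    "PP-OCRv5_mobile_det",
    "PP-OCRv5_mobile_rec",
    "PP-LCNet_x1_0_doc_ori",
    "UVDoc",
    "PP-LCNet_x1_0_textline_ori",
]


def copy_models(names, src_root=DEFAULT_CACHE, dst_root=Path("model_dir")):
    """Copy each model folder from src_root to dst_root; return names not found."""
    src_root, dst_root = Path(src_root), Path(dst_root)
    dst_root.mkdir(parents=True, exist_ok=True)
    missing = []
    for name in names:
        src = src_root / name
        if not src.is_dir():
            missing.append(name)
            continue
        # dirs_exist_ok=True lets the script be re-run without failing.
        shutil.copytree(src, dst_root / name, dirs_exist_ok=True)
    return missing


# Example (not executed here): missing = copy_models(MODEL_NAMES)
```

If the returned list is not empty, those models have not been downloaded yet; running the inference test once with the matching switches enabled should fetch them.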

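Filtering inference results by confidence

In practice you usually don't want low-confidence recognition results in the output. Add a filter function that drops everything below a threshold and joins what is left into plain text: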

from paddleocr import PaddleOCR
import time


def ocr_result_text_by_confidence(
    data: dict,
    min_score: float,
    *,
    join_lines: str = "\n",
) -> str:
    """Keep the rec_texts whose rec_scores >= min_score and join them with join_lines."""
    texts = data.get("rec_texts") or []
    scores = data.get("rec_scores") or []
    kept = [t for t, s in zip(texts, scores) if float(s) >= min_score]
    return join_lines.join(kept)


ocr = PaddleOCR(
    device="gpu",
    text_detection_model_name="PP-OCRv5_mobile_det",
    text_detection_model_dir="./model_dir/PP-OCRv5_mobile_det",
    text_recognition_model_name="PP-OCRv5_mobile_rec",
    text_recognition_model_dir="./model_dir/PP-OCRv5_mobile_rec",
    use_doc_orientation_classify=True,
    doc_orientation_classify_model_name="PP-LCNet_x1_0_doc_ori",
    doc_orientation_classify_model_dir="./model_dir/PP-LCNet_x1_0_doc_ori",
    use_doc_unwarping=True,
    doc_unwarping_model_name="UVDoc",
    doc_unwarping_model_dir="./model_dir/UVDoc",
    use_textline_orientation=True,
    textline_orientation_model_name="PP-LCNet_x1_0_textline_ori",
    textline_orientation_model_dir="./model_dir/PP-LCNet_x1_0_textline_ori",
)

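print("Models initialized, starting inference")
# perf_counter() is a monotonic clock intended for measuring elapsed time
start_time = time.perf_counter()
result = ocr.predict("./image.png")
end_time = time.perf_counter()

for res in result:
    res.save_to_json("output")
    # Each result is dict-like and exposes rec_texts / rec_scores
    content = ocr_result_text_by_confidence(res, min_score=0.85)
    print(content)

print(f"Inference time: {end_time - start_time:.2f} s")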
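To see what the filter does without running a model, here is a small sketch on hand-written data. The sample dictionary is invented for illustration and only mimics the rec_texts / rec_scores fields used above; the function body is repeated so the snippet runs on its own.

```python
def ocr_result_text_by_confidence(data, min_score, *, join_lines="\n"):
    """Keep the rec_texts whose rec_scores >= min_score and join them."""
    texts = data.get("rec_texts") or []
    scores = data.get("rec_scores") or []
    kept = [t for t, s in zip(texts, scores) if float(s) >= min_score]
    return join_lines.join(kept)


# Made-up sample: the middle line is a typical low-confidence misread.
sample = {
    "rec_texts": ["Invoice No. 1024", "T0tal: 3l.5O", "Thank you"],
    "rec_scores": [0.98, 0.61, 0.85],
}

filtered = ocr_result_text_by_confidence(sample, min_score=0.85)
print(filtered)
```

This prints the first and third lines only. A score exactly equal to the threshold is kept because the comparison is >=, and an empty or missing field yields an empty string rather than an error.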

A few findings from real use

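The accuracy gap between the mobile and server models is smaller than I expected. On the same image, the server models took 17 seconds and the mobile models only 0.82 seconds. The accuracy gain was marginal for my use case and the mobile results were already good enough, so I use mobile for everyday work.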

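Trade-offs among the three preprocessing switches. use_doc_orientation_classify and use_textline_orientation have little effect on latency, so there is no harm in leaving them on. The expensive one is use_doc_unwarping, which flattens the image: enabling it raised the time from 0.82 seconds to 1.7 seconds. If you mostly recognize photographed documents, the accuracy gain is worth it, especially when the paper is creased or shot at an angle.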
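The script above times a single predict call, so the figures can include one-off warm-up cost. If you want to repeat the comparison on your own machine, a sketch like the following may give steadier numbers. The helpers time_call and slowdown are hypothetical names of mine, and the warm-up and repeat counts are arbitrary defaults.

```python
import statistics
import time


def time_call(fn, *, warmup=1, repeats=5):
    """Return the median wall-clock seconds of fn() after warm-up runs."""
    for _ in range(warmup):
        fn()  # discarded: first calls may pay one-off initialization cost
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def slowdown(baseline_s, other_s):
    """How many times slower other_s is than baseline_s."""
    return other_s / baseline_s


# Ratios for the timings quoted above.
print(round(slowdown(0.82, 17), 1))   # server vs mobile -> 20.7
print(round(slowdown(0.82, 1.7), 1))  # unwarping on vs off -> 2.1
```

With a configured pipeline you would call it as time_call(lambda: ocr.predict("./image.png")), once per configuration you want to compare.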

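If you also have a 50-series card, note that PaddlePaddle's 50-series support is still at the dev stage and the whl link may change. If you run into problems, check the official documentation for the latest installation instructions.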