Error when ppStructure calls ocr_engine() concurrently #14528
Unanswered
IeohMingChan asked this question in Q&A
Replies: 2 comments
-
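In effect you have started only one model instance, and paddleocr runs inference with a batch size of 1. If the previous file has not finished processing when the next one arrives, an exception is bound to be thrown. I suggest adding a queue to solve this problem.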
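The queue suggestion above could look roughly like the following sketch. It is not taken from PaddleOCR; `SerializedEngine` is a hypothetical wrapper name, and it only assumes that the engine is a callable taking one argument (as `ocr_engine(file_path)` is in the question). All calls are pushed onto a `queue.Queue` and executed one at a time by a single worker thread, so the shared instance never handles two files at once.

```python
import queue
import threading
from concurrent.futures import Future


class SerializedEngine:
    """Hypothetical wrapper: run every call to `engine` on one worker thread, in FIFO order."""

    def __init__(self, engine):
        self._engine = engine  # any one-argument callable, e.g. a PPStructure instance
        self._jobs = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        # The only place the engine is ever invoked, so calls cannot overlap.
        while True:
            item, future = self._jobs.get()
            try:
                future.set_result(self._engine(item))
            except Exception as exc:
                # Hand the error back to the thread that submitted the job.
                future.set_exception(exc)
            finally:
                self._jobs.task_done()

    def __call__(self, item):
        future = Future()
        self._jobs.put((item, future))
        # Block the calling (request) thread until the worker has finished this job.
        return future.result()
```

Under these assumptions, the service would wrap its engine once at startup, for example `self.ocr_engine = SerializedEngine(PPStructure(table=True, ocr=True, show_log=False))`, and keep calling `self.ocr_engine(file_path)` as before. Requests then wait their turn instead of running in parallel, which trades throughput for correctness.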
-
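Based on your problem description and error log, this looks like a problem caused by concurrent calls to ppStructure. Here is an analysis and some possible solutions: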
InvalidArgumentError: The shape of input[0] and input[1] is expected to be equal.But received input[0]'s shape = [], input[1]'s shape = [1].
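from multiprocessing import Pool

from paddleocr import PPStructure


def process_pdf(file_path):
    # Each call builds its own engine, so no instance is shared between workers.
    ocr_engine = PPStructure(...)  # constructor arguments elided in the original
    result = ocr_engine(file_path)
    return result


def parallel_process_pdfs(file_paths):
    # One task per file, distributed over a pool of worker processes.
    with Pool() as pool:
        results = pool.map(process_pdf, file_paths)
    return results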
@app.post("/ocr")
def ocr(file_path: str = Form(...)):
    try:
        # Create a new instance for every request
        ocr_engine = PPStructure(table=True, ocr=True, show_log=False)
        result = ocr_engine(file_path)
        return JSONResponse(content={"status": "success", "dict_list": result})
    except Exception as e:
        logger.error(f"OCR processing failed: {e}")
        return JSONResponse(content={"status": "error", "message": str(e)})
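Building a new engine on every request reloads the models each time. A possible middle ground, sketched here as an assumption rather than anything PaddleOCR provides, is a small fixed pool of pre-built engines: each request borrows one, uses it exclusively, and returns it. `EnginePool`, `factory`, and `size` are hypothetical names. The sketch also assumes that separate PPStructure instances can safely run at the same time in different threads of one process, which this thread does not confirm.

```python
import queue


class EnginePool:
    """Hypothetical fixed-size pool: each call borrows one engine exclusively."""

    def __init__(self, factory, size):
        # `factory` is a zero-argument callable that builds one engine.
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(factory())

    def __call__(self, item):
        engine = self._idle.get()  # blocks while every engine is in use
        try:
            return engine(item)
        finally:
            # Always return the engine, even if inference raised.
            self._idle.put(engine)
```

With the constructor arguments from the endpoint above, this might be created once at startup as `ocr_engine = EnginePool(lambda: PPStructure(table=True, ocr=True, show_log=False), size=2)`. The pool size bounds both the number of concurrent inferences and the number of model copies held in memory.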
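I suggest adopting any of the approaches above to avoid the problems caused by concurrent access. If your use case has high performance requirements, consider the process pool approach.
Response generated by feifei-bot | claude-3-5-sonnet-latest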
-
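🐛 Bug (problem description)
I use a FastAPI endpoint to receive the path of a PDF file and then call ocr_engine to parse the PDF. If a second PDF arrives while the first is still being parsed, and the same ppStructure instance is used to start parsing it, an error occurs after the two parsing tasks have run in parallel for a while. The error is: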
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/data/local/anaconda3/envs/loader/lib/python3.10/site-packages/uvicorn/protocols/http/h11_impl.py", line 406, in run_asgi
result = await app( # type: ignore[func-returns-value]
File "/data/local/anaconda3/envs/loader/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
return await self.app(scope, receive, send)
File "/data/local/anaconda3/envs/loader/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
await super().__call__(scope, receive, send)
File "/data/local/anaconda3/envs/loader/lib/python3.10/site-packages/starlette/applications.py", line 113, in __call__
await self.middleware_stack(scope, receive, send)
File "/data/local/anaconda3/envs/loader/lib/python3.10/site-packages/starlette/middleware/errors.py", line 187, in __call__
raise exc
File "/data/local/anaconda3/envs/loader/lib/python3.10/site-packages/starlette/middleware/errors.py", line 165, in __call__
await self.app(scope, receive, _send)
File "/data/local/anaconda3/envs/loader/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/data/local/anaconda3/envs/loader/lib/python3.10/site-packages/starlette/_exception_handler.py", line 62, in wrapped_app
raise exc
File "/data/local/anaconda3/envs/loader/lib/python3.10/site-packages/starlette/_exception_handler.py", line 51, in wrapped_app
await app(scope, receive, sender)
File "/data/local/anaconda3/envs/loader/lib/python3.10/site-packages/starlette/routing.py", line 715, in __call__
await self.middleware_stack(scope, receive, send)
File "/data/local/anaconda3/envs/loader/lib/python3.10/site-packages/starlette/routing.py", line 735, in app
await route.handle(scope, receive, send)
File "/data/local/anaconda3/envs/loader/lib/python3.10/site-packages/starlette/routing.py", line 288, in handle
await self.app(scope, receive, send)
File "/data/local/anaconda3/envs/loader/lib/python3.10/site-packages/starlette/routing.py", line 76, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/data/local/anaconda3/envs/loader/lib/python3.10/site-packages/starlette/_exception_handler.py", line 62, in wrapped_app
raise exc
File "/data/local/anaconda3/envs/loader/lib/python3.10/site-packages/starlette/_exception_handler.py", line 51, in wrapped_app
await app(scope, receive, sender)
File "/data/local/anaconda3/envs/loader/lib/python3.10/site-packages/starlette/routing.py", line 73, in app
response = await f(request)
File "/data/local/anaconda3/envs/loader/lib/python3.10/site-packages/fastapi/routing.py", line 301, in app
raw_response = await run_endpoint_function(
File "/data/local/anaconda3/envs/loader/lib/python3.10/site-packages/fastapi/routing.py", line 214, in run_endpoint_function
return await run_in_threadpool(dependant.call, **values)
File "/data/local/anaconda3/envs/loader/lib/python3.10/site-packages/starlette/concurrency.py", line 39, in run_in_threadpool
return await anyio.to_thread.run_sync(func, *args)
File "/data/local/anaconda3/envs/loader/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "/data/local/anaconda3/envs/loader/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2405, in run_sync_in_worker_thread
return await future
File "/data/local/anaconda3/envs/loader/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 914, in run
result = context.run(func, *args)
File "/home/chenlm/srdkb-loader/loaders/pdf_loader/ocr_service.py", line 422, in ocr
dict_list, minioClinet = ocr_service.process_file(file_path, isMinio)
File "/home/chenlm/srdkb-loader/loaders/pdf_loader/ocr_service.py", line 52, in process_file
result = self.ocr_engine(file_path)
File "/data/local/anaconda3/envs/loader/lib/python3.10/site-packages/paddleocr/paddleocr.py", line 850, in __call__
res, _ = super().__call__(
File "/data/local/anaconda3/envs/loader/lib/python3.10/site-packages/paddleocr/ppstructure/predict_system.py", line 135, in __call__
text_res, ocr_time_dict = self._predict_text(img)
File "/data/local/anaconda3/envs/loader/lib/python3.10/site-packages/paddleocr/ppstructure/predict_system.py", line 189, in _predict_text
filter_boxes, filter_rec_res, ocr_time_dict = self.text_system(img)
File "/data/local/anaconda3/envs/loader/lib/python3.10/site-packages/paddleocr/tools/infer/predict_system.py", line 144, in __call__
rec_res, elapse = self.text_recognizer(img_crop_list)
File "/data/local/anaconda3/envs/loader/lib/python3.10/site-packages/paddleocr/tools/infer/predict_rec.py", line 698, in __call__
rec_res[indices[beg_img_no + rno]] = rec_result[rno]
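IndexError: index 58 is out of bounds for axis 0 with size 58
The value 58 is not fixed; it changes from run to run. The relevant code and a screenshot of the error are below. Thanks for your help.
🏃‍♂️ Environment (runtime environment)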
paddleocr 2.8.1
paddlepaddle-gpu 2.6.2
python 3.10
Ubuntu 22.04.5 LTS
🌰 Minimal Reproducible Example (smallest demo that reproduces the problem)
import io
import json
import uuid
import re
import time
from fastapi import FastAPI, Form, BackgroundTasks, APIRouter
from fastapi.responses import JSONResponse
from io import BytesIO
from paddleocr import PPStructure
from loguru import logger
from ocr_config import *
app = FastAPI()
from typing import List, Dict
import pandas as pd
from bs4 import BeautifulSoup
import httpx
class OCRService:
    def __init__(self):
        ...  # body elided in the original post

ocr_service = OCRService()

@app.post("/ocr")
def ocr(file_path: str = Form(...), isMinio: bool = Form(...)):
    try:
        ...  # body and except clause elided in the original post

if __name__ == "__main__":
    import uvicorn
    uvicorn.run('ocr_service:app', host="0.0.0.0", port=1300, workers=1)