Great questions! Here are my recommendations based on production experience:

1. Batch Size Optimization

For 16GB RAM, I recommend:

  • 10-15 pages per batch for high-resolution scans (300+ DPI)
  • 20-30 pages per batch for standard resolution (150 DPI)
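from german_ocr import OCREngine
import gc

# `engine` is an OCREngine instance created by the caller; the original
# snippet used it without defining it, so it is passed in explicitly here.
def process_in_batches(engine, pages, batch_size=15):
    """Run OCR over `pages` in fixed-size batches to cap peak memory."""
    results = []
    for i in range(0, len(pages), batch_size):
        batch = pages[i:i + batch_size]
        results.extend(engine.process(batch))
        gc.collect()  # Reclaim memory before loading the next batch
    return results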
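
If it helps, here is a rough sketch of how the batch sizes above could be wired into the loop. `pick_batch_size` and `iter_batches` are hypothetical helpers of mine, not part of `german_ocr`. The 300 DPI cutoff and the exact values picked inside each recommended range are assumptions, so tune them against your own memory measurements.

```python
def pick_batch_size(dpi: int) -> int:
    """Hypothetical helper: map scan resolution to a batch size for ~16 GB RAM."""
    # 300+ DPI: 10-15 pages recommended; 15 matches the default used above.
    if dpi >= 300:
        return 15
    # Standard resolution (around 150 DPI): 20-30 pages recommended;
    # assumption: start at the conservative end of that range.
    return 20


def iter_batches(pages, batch_size):
    """Yield consecutive slices of at most `batch_size` pages."""
    for start in range(0, len(pages), batch_size):
        yield pages[start:start + batch_size]


pages = [f"page_{n:03d}.png" for n in range(50)]  # placeholder file names
sizes = [len(batch) for batch in iter_batches(pages, pick_batch_size(300))]
print(sizes)  # [15, 15, 15, 5]
```

Using a generator here means only one batch slice is alive at a time, which fits the goal of keeping peak memory flat.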

2. Multiprocessing vs Async

Use multiprocessing for CPU-bound OCR tasks:

from concurrent.futures import ProcessPoolExecutor
import os

workers = 
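
As a minimal sketch of the multiprocessing approach with `ProcessPoolExecutor`: `ocr_batch` below is a stand-in for the real OCR call, and `make_batches` and `run_parallel` are hypothetical names. The worker count (one core left free) is my assumption, not a value from the snippet above.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def ocr_batch(batch):
    """Stand-in for the real OCR call; returns one fake result per page."""
    # In real use, build the OCR engine inside the worker and run it on `batch`.
    return [f"text:{page}" for page in batch]


def make_batches(pages, batch_size):
    """Split `pages` into consecutive lists of at most `batch_size` items."""
    return [pages[i:i + batch_size] for i in range(0, len(pages), batch_size)]


def run_parallel(pages, batch_size=15, workers=None):
    # Assumption: leave one core free for the OS and I/O.
    if workers is None:
        workers = max(1, (os.cpu_count() or 2) - 1)
    results = []
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so page order is preserved.
        for batch_result in pool.map(ocr_batch, make_batches(pages, batch_size)):
            results.extend(batch_result)
    return results


if __name__ == "__main__":
    pages = [f"page_{n}.png" for n in range(40)]  # placeholder file names
    print(len(run_parallel(pages)))
```

Keep in mind that every worker holds its own batch in memory, so peak usage grows roughly with workers times batch size. On 16GB you may need to lower one when raising the other.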

Answer selected by DefcoGit