Feature Request: Alternative PDF parsing options for faster processing

## Feature Request
Add alternative PDF parsing options so that conversion does not depend solely on the MinerU OCR stage, which can hang.

## Problem
- The MinerU OCR stage hangs on large PDFs (>10 pages)
- Users are stuck waiting >20 minutes with no sign of progress
- There is no fallback option when OCR fails

## Proposed Solutions

### Option 1: Add alternative parsers
Support multiple PDF parsing backends:
- pdfplumber (Python-based, no OCR)
- PyMuPDF (fast, lightweight)
- pdf2image + OCR (render pages to images, then run a different OCR engine)
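
One way the backend selection could look is a small registry keyed by backend name. This is only a sketch: the names `PARSERS`, `parse_pdf`, and the `backend` parameter are hypothetical and not part of the project's current code. It assumes PyMuPDF and pdfplumber are installed, and imports each one lazily so that neither is a hard dependency.

```python
from typing import Callable, Dict


def parse_with_pymupdf(path: str) -> str:
    """Extract plain text with PyMuPDF (no OCR)."""
    import fitz  # PyMuPDF; imported lazily so the backend stays optional

    with fitz.open(path) as doc:
        return "\n".join(page.get_text() for page in doc)


def parse_with_pdfplumber(path: str) -> str:
    """Extract plain text with pdfplumber (no OCR)."""
    import pdfplumber  # imported lazily for the same reason

    with pdfplumber.open(path) as pdf:
        # extract_text() may return None or "" for pages without a text layer
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


# Hypothetical registry: backend name -> function taking a path, returning text
PARSERS: Dict[str, Callable[[str], str]] = {
    "pymupdf": parse_with_pymupdf,
    "pdfplumber": parse_with_pdfplumber,
}


def parse_pdf(path: str, backend: str = "pymupdf") -> str:
    """Dispatch to the requested backend, failing early on unknown names."""
    try:
        parser = PARSERS[backend]
    except KeyError:
        known = ", ".join(sorted(PARSERS))
        raise ValueError(f"unknown backend {backend!r}; expected one of: {known}")
    return parser(path)
```

Both backends only read an existing text layer, so scanned PDFs would still need an OCR path such as pdf2image plus an OCR engine.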

### Option 2: Add timeout + retry
- Add a configurable timeout for the parsing stage
- Automatically retry with an alternative parsing method on failure or timeout
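
A rough sketch of the timeout-and-fallback idea, assuming each parser is a plain Python callable that takes a path and returns text. `ParseTimeout`, `call_with_timeout`, and `parse_with_fallback` are hypothetical names, and the 300-second default simply mirrors the 5-minute target in the use case. Note the limitation: Python cannot kill a thread, so a hung parser keeps running in the background. A real implementation would probably run the parser in a subprocess that can be terminated.

```python
import threading
from typing import Callable, List, Sequence


class ParseTimeout(Exception):
    """Raised when a parser does not finish within the time limit."""


def call_with_timeout(fn: Callable[[str], str], path: str, timeout_s: float) -> str:
    """Run fn(path) in a daemon thread and stop waiting after timeout_s seconds."""
    outcome = {}

    def worker() -> None:
        try:
            outcome["value"] = fn(path)
        except Exception as exc:  # hand the error back to the calling thread
            outcome["error"] = exc

    # daemon=True so an abandoned, still-hung worker does not block interpreter exit
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)
    if thread.is_alive():
        raise ParseTimeout(f"parser did not finish within {timeout_s} s")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


def parse_with_fallback(
    path: str,
    parsers: Sequence[Callable[[str], str]],
    timeout_s: float = 300.0,
) -> str:
    """Try each parser in order; move on when one fails or times out."""
    errors: List[str] = []
    for parser in parsers:
        name = getattr(parser, "__name__", repr(parser))
        try:
            return call_with_timeout(parser, path, timeout_s)
        except Exception as exc:
            errors.append(f"{name}: {exc!r}")
    raise RuntimeError("all parsers failed: " + "; ".join(errors))
```

With this shape, the existing OCR parser could stay first in the list and a lightweight text-only parser could act as the fallback.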

### Option 3: Skip parsing option
- Add a `--skip-parsing` flag
- Allow users to provide pre-parsed markdown content instead of a PDF
- Useful for users who only want slide generation
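
The flag could be wired up roughly as follows. This is a sketch using the standard `argparse` module: only `--skip-parsing` comes from this request, while the `--markdown` option, the positional `input` argument, and the `load_content` helper are assumptions made for illustration and do not reflect the project's actual CLI.

```python
import argparse
from pathlib import Path
from typing import Callable


def build_arg_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Convert a paper to slides")
    # Positional PDF path; optional because --skip-parsing does not need it
    parser.add_argument("input", nargs="?", help="path to the source PDF")
    parser.add_argument(
        "--skip-parsing",
        action="store_true",
        help="skip the PDF parsing stage and use pre-parsed markdown",
    )
    # Hypothetical companion option carrying the pre-parsed content
    parser.add_argument("--markdown", type=Path, help="pre-parsed markdown file")
    return parser


def load_content(args: argparse.Namespace, parse_pdf: Callable[[str], str]) -> str:
    """Return markdown text, either read from disk or produced by parse_pdf."""
    if args.skip_parsing:
        if args.markdown is None:
            raise SystemExit("--skip-parsing requires --markdown FILE")
        return args.markdown.read_text(encoding="utf-8")
    if args.input is None:
        raise SystemExit("a PDF path is required unless --skip-parsing is set")
    return parse_pdf(args.input)
```

A call such as `build_arg_parser().parse_args(["--skip-parsing", "--markdown", "paper.md"])` would then bypass the parser entirely and hand the markdown straight to slide generation.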

## Use Case
Converting arXiv papers (10-50 pages) to slides should complete in <5 minutes, not >20 minutes.

## Environment
- Titan (MacBook Pro M2), macOS
- Tested with arXiv 2602.11865 (42 pages)

Feature Request: Alternative PDF parsing options for faster processing #29
