Merged
8 changes: 8 additions & 0 deletions README.md
@@ -67,6 +67,8 @@ Once installed, you can use the `parxy` command to:
- `parxy parse`: Extract text content from documents with customizable granularity levels and output formats. Process individual files or entire folders, use multiple drivers, and control output with progress bars.
- `parxy preview`: Interactive document viewer showing metadata, table of contents, and content preview in a scrollable interface
- `parxy markdown`: Convert documents into Markdown format, with optional combining of multiple documents
- `parxy pdf:merge`: Merge multiple PDF files into one, with support for selecting specific page ranges
- `parxy pdf:split`: Split a PDF file into individual pages
- `parxy drivers`: List available document processing drivers
- `parxy env`: Create a configuration file with default settings
- `parxy docker`: Generate a Docker Compose configuration for self-hosted services
@@ -89,6 +91,12 @@ parxy preview document.pdf
# Convert multiple PDFs to markdown and combine them
parxy markdown --combine -o output/ doc1.pdf doc2.pdf

# Merge multiple PDFs with page ranges
parxy pdf:merge cover.pdf doc1.pdf[1:10] doc2.pdf -o merged.pdf

# Split a PDF into individual pages
parxy pdf:split document.pdf -o ./pages

# List available drivers
parxy drivers
```
255 changes: 255 additions & 0 deletions docs/howto/pdf_manipulation.md
@@ -0,0 +1,255 @@
# How to Manipulate PDFs with Parxy

Parxy provides **PDF manipulation commands** that merge multiple PDF files into one or split a single PDF into one file per page, all from the command line.

These commands are useful for:
- Combining multiple PDF documents into a single file
- Extracting specific page ranges from PDFs
- Splitting large PDFs into smaller, manageable files
- Reorganizing PDF pages

## Merging PDFs

The `pdf:merge` command combines multiple PDF files into a single output file, with support for selecting specific page ranges.

### Basic Merging

Merge two or more PDF files:

```bash
parxy pdf:merge file1.pdf file2.pdf -o merged.pdf
```

If you don't specify an output file, you'll be prompted to enter one:

```bash
parxy pdf:merge file1.pdf file2.pdf
# Prompts: Enter output filename or path: merged.pdf
```

### Merging Entire Folders

You can merge all PDFs in a folder (non-recursively):

```bash
parxy pdf:merge /path/to/folder -o combined.pdf
```

Files from folders are included in alphabetical order.
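
The folder behaviour described above (non-recursive, alphabetical order) can be approximated in Python as follows. This is a minimal sketch, not Parxy's actual implementation: the helper name `list_pdfs_in_folder` is hypothetical, and it assumes that "alphabetical" means a case-insensitive sort on the file name and that only files with a `.pdf` extension are picked up.

```python
from pathlib import Path
from typing import List


def list_pdfs_in_folder(folder: str) -> List[Path]:
    """Return the PDF files directly inside `folder`, sorted by name.

    Hypothetical helper: the case-insensitive ordering and the `.pdf`
    suffix check are assumptions, not confirmed Parxy behaviour.
    """
    directory = Path(folder)
    # iterdir() does not descend into subfolders, so the listing is non-recursive.
    pdfs = [
        entry
        for entry in directory.iterdir()
        if entry.is_file() and entry.suffix.lower() == ".pdf"
    ]
    return sorted(pdfs, key=lambda path: path.name.lower())
```

Running the helper on a folder before merging is a quick way to preview the order in which its files would likely be appended.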

### Combining Files and Folders

Mix individual files and folders:

```bash
parxy pdf:merge cover.pdf /path/to/chapters appendix.pdf -o book.pdf
```

### Selecting Specific Pages

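Use square brackets to specify page ranges (1-based indexing). Square brackets are glob characters in most shells, so quote the argument (for example `'document.pdf[1:3]'`) if your shell rejects it, as zsh does with a `no matches found` error: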

**Single page:**
```bash
parxy pdf:merge document.pdf[1] -o first_page.pdf
```

**Page range:**
```bash
parxy pdf:merge document.pdf[1:3] -o first_three_pages.pdf
```

**From start to page N:**
```bash
parxy pdf:merge document.pdf[:5] -o first_five_pages.pdf
```

**From page N to end:**
```bash
parxy pdf:merge document.pdf[10:] -o from_page_10.pdf
```
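
To make the four forms above concrete, here is a small Python sketch that splits a specification such as `document.pdf[1:3]` into a path and a 1-based, inclusive page range. It illustrates the syntax only: `parse_page_spec` is a hypothetical name, and the validation rules (rejecting page numbers below 1 and reversed ranges) are assumptions rather than documented Parxy behaviour.

```python
from typing import Optional, Tuple


def parse_page_spec(spec: str) -> Tuple[str, Optional[int], Optional[int]]:
    """Split a spec into (path, first_page, last_page), 1-based and inclusive.

    None stands for an open end: "from the first page" or "to the last page".
    Hypothetical helper; the error handling below is an assumption.
    """
    if not spec.endswith("]") or "[" not in spec:
        return spec, None, None  # plain path, no page range

    # Drop the closing bracket, then split on the last opening bracket.
    path, _, inner = spec[:-1].rpartition("[")
    if ":" in inner:
        first_text, _, last_text = inner.partition(":")
        first = int(first_text) if first_text else None
        last = int(last_text) if last_text else None
    else:
        first = last = int(inner)  # single page, e.g. "[1]"

    if (first is not None and first < 1) or (last is not None and last < 1):
        raise ValueError(f"page numbers start at 1: {spec!r}")
    if first is not None and last is not None and last < first:
        raise ValueError(f"range end precedes range start: {spec!r}")
    return path, first, last
```

For example, `parse_page_spec("document.pdf[:5]")` yields the path `document.pdf` with an open start and a last page of 5.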

### Advanced Merging Examples

**Combine specific pages from multiple documents:**
```bash
parxy pdf:merge doc1.pdf[1] doc2.pdf[2:4] doc3.pdf[:2] -o selected_pages.pdf
```

**Mix full files with page ranges:**
```bash
parxy pdf:merge cover.pdf report.pdf[1:10] summary.pdf appendix.pdf[5:] -o final_report.pdf
```

**Merge chapter files:**
```bash
parxy pdf:merge intro.pdf chapter1.pdf chapter2.pdf chapter3.pdf conclusion.pdf -o complete_book.pdf
```

### Output Path Handling

- If you provide a full path, the file is created there
- If you provide just a filename, it's created in the same directory as the first input file
- The `.pdf` extension is added automatically if not provided

```bash
# Creates merged.pdf in the same directory as file1.pdf
parxy pdf:merge file1.pdf file2.pdf -o merged

# Creates in specified directory
parxy pdf:merge file1.pdf file2.pdf -o /output/dir/merged.pdf
```
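
The three rules above can be expressed as a short Python sketch. It is an illustration under stated assumptions, not the code Parxy runs: `resolve_output_path` is a hypothetical name, "just a filename" is taken to mean a path with no directory component, and the first input is assumed to be a plain file path without a page range.

```python
from pathlib import Path


def resolve_output_path(output: str, first_input: str) -> Path:
    """Apply the output path rules listed above (hypothetical helper)."""
    target = Path(output)
    # Assumption: a bare filename has "." as its parent directory.
    if target.parent == Path("."):
        target = Path(first_input).parent / target.name
    # Append ".pdf" rather than replacing an existing suffix such as ".v2".
    if target.suffix.lower() != ".pdf":
        target = target.with_name(target.name + ".pdf")
    return target
```

With these assumptions, `-o merged` next to `/data/in/file1.pdf` resolves to `/data/in/merged.pdf`, while a full path is returned unchanged.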

## Splitting PDFs

The `pdf:split` command divides a single PDF into individual pages, with each page becoming a separate PDF file.

### Basic Splitting

Split a PDF into individual pages:

```bash
parxy pdf:split document.pdf
```

This creates a folder named `document_split/` containing:
- `document_page_1.pdf`
- `document_page_2.pdf`
- `document_page_3.pdf`
- etc.

### Custom Output Directory

Specify where to save the split files:

```bash
parxy pdf:split document.pdf --output /path/to/output
```

### Custom Filename Prefix

Change the prefix of output filenames:

```bash
parxy pdf:split book.pdf --prefix chapter
```

Creates files named:
- `chapter_page_1.pdf`
- `chapter_page_2.pdf`
- etc.
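
The naming scheme shown so far follows the pattern `{prefix}_page_{N}.pdf`. The Python sketch below computes the expected output paths for a given page count. The function name `split_output_paths` is hypothetical, and placing the default `{filename}_split/` folder next to the input file is an assumption.

```python
from pathlib import Path
from typing import List, Optional


def split_output_paths(
    input_file: str,
    page_count: int,
    output_dir: Optional[str] = None,
    prefix: Optional[str] = None,
) -> List[Path]:
    """List the files a split is expected to produce (hypothetical helper)."""
    source = Path(input_file)
    # Assumption: the default folder is created alongside the input file.
    directory = Path(output_dir) if output_dir else source.parent / f"{source.stem}_split"
    # The prefix defaults to the input filename without its extension.
    name_prefix = prefix if prefix else source.stem
    # Pages are numbered from 1, matching the 1-based page ranges of pdf:merge.
    return [
        directory / f"{name_prefix}_page_{number}.pdf"
        for number in range(1, page_count + 1)
    ]
```

This can help when a later script needs to know in advance which files to look for after a split.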

### Complete Examples

**Split with custom output directory:**
```bash
parxy pdf:split annual_report.pdf -o ./pages
```

**Split with custom prefix:**
```bash
parxy pdf:split presentation.pdf --prefix slide
```

Creates:
- `slide_page_1.pdf`
- `slide_page_2.pdf`
- etc.

**Split with both custom output and prefix:**
```bash
parxy pdf:split document.pdf -o ./individual_pages -p page
```

## Combining Merge and Split

You can chain operations together using the CLI:

**Example: Extract specific pages and split them:**
```bash
# First, extract pages 10-20
parxy pdf:merge document.pdf[10:20] -o extracted.pdf

# Then split into individual pages
parxy pdf:split extracted.pdf -o ./individual_pages
```

**Example: Merge and organize:**
```bash
# Merge selected pages from multiple documents
parxy pdf:merge doc1.pdf[1:5] doc2.pdf[3:8] -o combined.pdf

# Split the combined result into individual pages
parxy pdf:split combined.pdf -o ./pages -p combined_page
```
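
The same two-step workflow can be scripted from Python. The sketch below only builds the argument lists from the options documented in this guide and runs them with `subprocess`. The function names are hypothetical, and it assumes that `parxy` is on the `PATH` and exits with a non-zero status on failure. Passing arguments as a list bypasses the shell, so the square brackets need no quoting.

```python
import subprocess
from typing import List


def build_extract_and_split(
    source: str, first: int, last: int, extracted: str, pages_dir: str
) -> List[List[str]]:
    """Build the merge and split commands for an extract-then-split run."""
    merge_cmd = ["parxy", "pdf:merge", f"{source}[{first}:{last}]", "-o", extracted]
    split_cmd = ["parxy", "pdf:split", extracted, "-o", pages_dir]
    return [merge_cmd, split_cmd]


def run_pipeline(commands: List[List[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        # Assumption: parxy signals errors through a non-zero exit code,
        # which check=True turns into a CalledProcessError.
        subprocess.run(command, check=True)
```

Give `extracted` an explicit directory (for example `./out/extracted.pdf`): a bare filename is written next to the first input file, which may not be where the split step looks for it.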

## Tips and Best Practices

### Page Numbering
- All page ranges use **1-based indexing** (first page is page 1, not 0)
- Ranges are **inclusive** (e.g., `[1:3]` includes pages 1, 2, and 3)
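
For readers used to Python's 0-based, end-exclusive slices, the following sketch shows how a 1-based inclusive range maps onto a slice. It is a generic illustration of the two conventions with a hypothetical helper name, not a description of Parxy's internals.

```python
from typing import Optional


def to_zero_based_slice(first: Optional[int], last: Optional[int]) -> slice:
    """Convert a 1-based inclusive page range into a Python slice.

    None means an open end, as in `[:5]` or `[10:]`.
    """
    # Page 1 is index 0, so the start shifts down by one.
    start = 0 if first is None else first - 1
    # The inclusive last page N equals the exclusive stop index N.
    stop = None if last is None else last
    return slice(start, stop)
```

So `[1:3]` corresponds to `pages[0:3]`, which selects three pages, whereas a Python slice written as `pages[1:3]` would select only two.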

### File Organization
- Use folders to keep merged/split files organized
- Use descriptive prefixes to make file purposes clear
- Split creates a dedicated folder by default to avoid clutter

### Performance
- Both commands are optimized for speed
- Large PDFs are processed efficiently
- Progress information is displayed during processing

### Error Handling
- Invalid page ranges are reported with warnings
- Missing files are detected before processing starts
- The commands validate input before making changes
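
A script that wraps these commands can apply the same "check first, then process" idea before invoking them. The sketch below reports inputs that do not exist on disk, ignoring any trailing page range. `find_missing_inputs` is a hypothetical helper and does not reproduce Parxy's own validation.

```python
from pathlib import Path
from typing import Iterable, List


def find_missing_inputs(specs: Iterable[str]) -> List[str]:
    """Return the input paths that do not exist (hypothetical helper)."""
    missing = []
    for spec in specs:
        # Strip a trailing "[...]" page range, if present, to get the path.
        if spec.endswith("]") and "[" in spec:
            path_text = spec.rpartition("[")[0]
        else:
            path_text = spec
        # exists() is true for both files and folders, which are both valid inputs.
        if not Path(path_text).exists():
            missing.append(path_text)
    return missing
```

An empty result means every file or folder was found, so the merge can be started without a partial run caused by a misspelled path.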

## Command Reference

### pdf:merge

```bash
parxy pdf:merge [FILES...] --output OUTPUT
```

**Arguments:**
- `FILES`: One or more PDF files or folders. Supports page ranges: `file.pdf[1:3]`

**Options:**
- `--output, -o`: Output file path (prompted if not provided)

**Examples:**
```bash
parxy pdf:merge file1.pdf file2.pdf -o merged.pdf
parxy pdf:merge folder1/ file.pdf folder2/ -o combined.pdf
parxy pdf:merge doc.pdf[1:10] doc.pdf[20:30] -o selections.pdf
```

### pdf:split

```bash
parxy pdf:split INPUT_FILE [OPTIONS]
```

**Arguments:**
- `INPUT_FILE`: PDF file to split into individual pages

**Options:**
- `--output, -o`: Output directory (default: `{filename}_split/`)
- `--prefix, -p`: Output filename prefix (default: input filename)

**Examples:**
```bash
parxy pdf:split document.pdf
parxy pdf:split document.pdf -o ./pages
parxy pdf:split document.pdf -o ./pages -p page
```

## Getting Help

For detailed command usage, use the `--help` flag:

```bash
parxy pdf:merge --help
parxy pdf:split --help
```
59 changes: 59 additions & 0 deletions docs/tutorials/using_cli.md
@@ -13,6 +13,8 @@ The Parxy CLI lets you:
| `parxy parse` | Extract text content from documents with customizable detail levels and output formats. Process files or folders with multiple drivers. |
| `parxy preview` | Interactive document viewer with metadata, table of contents, and scrollable content preview |
| `parxy markdown` | Convert parsed documents into Markdown format (optionally combine multiple files) |
| `parxy pdf:merge` | Merge multiple PDF files into one, with support for page ranges |
| `parxy pdf:split` | Split a PDF file into individual pages |
| `parxy drivers` | List available document processing drivers |
| `parxy env` | Generate a default `.env` configuration file |
| `parxy docker` | Create a Docker Compose configuration for running Parxy-related services |
@@ -207,6 +209,61 @@ parxy markdown --combine -o output/ doc1.pdf doc2.pdf doc3.pdf
This will generate a file named `combined_output.md` in the output directory.


## Manipulating PDFs

Parxy provides two powerful commands for PDF manipulation: merging multiple PDFs into one and splitting a single PDF into multiple files.

### Merging PDFs

The `pdf:merge` command combines multiple PDF files into a single output file. You can merge entire files, specific page ranges, or folders of PDFs.

**Basic merge:**
```bash
parxy pdf:merge file1.pdf file2.pdf -o merged.pdf
```

**Merge with page ranges:**
```bash
parxy pdf:merge doc1.pdf[1:5] doc2.pdf[3:7] -o combined.pdf
```

Page range syntax (1-based indexing):
- `file.pdf[1]` - Single page (page 1)
- `file.pdf[1:5]` - Pages 1 through 5
- `file.pdf[:3]` - First 3 pages
- `file.pdf[5:]` - From page 5 to the end

**Merge entire folders:**
```bash
parxy pdf:merge /path/to/pdfs -o combined.pdf
```

**Mix files, folders, and page ranges:**
```bash
parxy pdf:merge cover.pdf /chapters doc.pdf[10:20] appendix.pdf -o book.pdf
```

### Splitting PDFs

The `pdf:split` command divides a PDF file into individual pages, with each page becoming a separate PDF file.

**Split into individual pages:**
```bash
parxy pdf:split document.pdf
```

This creates a `document_split/` folder containing `document_page_1.pdf`, `document_page_2.pdf`, etc.

**Specify output directory and prefix:**
```bash
parxy pdf:split report.pdf -o ./pages -p page
```

Creates `page_page_1.pdf`, `page_page_2.pdf`, etc. in the `./pages` directory, because output files are named `{prefix}_page_{N}.pdf`.

For more detailed examples and use cases, see the [PDF Manipulation How-to Guide](../howto/pdf_manipulation.md).


## Managing Drivers

To view the list of supported document parsing drivers:
@@ -271,6 +328,8 @@ With the CLI, you can use Parxy as a **standalone document parsing tool** — id
| `parxy parse` | Extract text from documents with multiple formats & drivers |
| `parxy preview` | Interactive document viewer with metadata and TOC |
| `parxy markdown` | Generate Markdown output |
| `parxy pdf:merge` | Merge multiple PDF files with page range support |
| `parxy pdf:split` | Split PDF files into individual pages |
| `parxy drivers` | List supported drivers |
| `parxy env` | Create default configuration file |
| `parxy docker` | Generate Docker Compose setup |
2 changes: 2 additions & 0 deletions src/parxy_cli/cli.py
@@ -16,6 +16,7 @@
from parxy_cli.commands.env import app as env_command
from parxy_cli.commands.version import app as version_command
from parxy_cli.commands.markdown import app as markdown_command
from parxy_cli.commands.pdf import app as pdf_command


# Create typer app
@@ -71,6 +72,7 @@ def main(
app.add_typer(env_command)
app.add_typer(version_command)
app.add_typer(markdown_command)
app.add_typer(pdf_command)


def main():