Merged
100 changes: 100 additions & 0 deletions DEVELOPMENT_ROADMAP.md
@@ -0,0 +1,100 @@
# Development Roadmap - Nutrient DWS Python Client

## 📊 Issue Review & Recommendations

After reviewing all open issues and analyzing the codebase, here are my recommendations for what to tackle next:

### 🥇 **Top Priority: Quick Wins (1-2 days each)**

#### 1. **Issue #11: Image Watermark Support** ⭐⭐⭐⭐⭐
- **Why**: 80% already implemented! Just needs file upload support
- **Current**: Supports `image_url` parameter
- **Add**: `image_file` parameter for local image files
- **Effort**: Very Low - mostly parameter handling
- **Value**: High - common user request

#### 2. **Issue #10: Multi-Language OCR Support** ⭐⭐⭐⭐
- **Why**: Small change with big impact
- **Current**: Single language string
- **Add**: Accept `List[str]` for multiple languages
- **Effort**: Low - update parameter handling and validation
- **Value**: High - enables multi-lingual document processing
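
A minimal sketch of the parameter handling this item describes. The helper name `normalize_ocr_languages` and its validation rules are hypothetical and not part of the client; they only illustrate accepting either a single string or a `List[str]` while keeping the existing single-string call working.

```python
from typing import List, Union


def normalize_ocr_languages(language: Union[str, List[str]]) -> List[str]:
    """Return a non-empty, de-duplicated list of language names.

    Hypothetical helper: the name and the validation rules are illustrative.
    """
    # A single string stays supported for backward compatibility.
    languages = [language] if isinstance(language, str) else list(language)

    cleaned: List[str] = []
    for item in languages:
        if not isinstance(item, str) or not item.strip():
            raise ValueError(f"Invalid OCR language: {item!r}")
        name = item.strip().lower()
        if name not in cleaned:  # drop duplicates, keep first-seen order
            cleaned.append(name)

    if not cleaned:
        raise ValueError("At least one OCR language is required")
    return cleaned


print(normalize_ocr_languages("english"))
print(normalize_ocr_languages(["English", "german"]))
```

Normalizing to a list at the boundary would let the rest of the method treat both call styles the same way.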

### 🥈 **Second Priority: Core Features (3-5 days each)**

#### 3. **Issue #13: Create Redactions Method** ⭐⭐⭐⭐
- **Why**: Complements existing `apply_redactions()`
- **Value**: Complete redaction workflow
- **Complexity**: Medium - new API patterns for search strategies
- **Use cases**: Compliance, privacy, legal docs

#### 4. **Issue #12: Selective Annotation Flattening** ⭐⭐⭐
- **Why**: Enhancement to existing `flatten_annotations()`
- **Add**: `annotation_ids` parameter
- **Effort**: Low-Medium
- **Value**: More control over flattening

### 🥉 **Third Priority: High-Value Features (1 week each)**

#### 5. **Issue #16: Convert to PDF/A** ⭐⭐⭐⭐
- **Why**: Critical for archival/compliance
- **Value**: Legal requirement for many organizations
- **Complexity**: Medium - new output format handling

#### 6. **Issue #17: Convert PDF to Images** ⭐⭐⭐⭐
- **Why**: Very common use case
- **Value**: Thumbnails, previews, web display
- **Complexity**: Medium - handle multiple output files

### 📋 **Issues to Defer**

- **Issue #20: AI-Powered Redaction** - Requires AI endpoint investigation
- **Issue #21: Digital Signatures** - Complex, needs certificate handling
- **Issue #22: Batch Processing** - Client-side enhancement, do after core features
- **Issue #19: Office Formats** - Lower priority, complex format handling

### 🎯 **Recommended Implementation Order**

**Sprint 1 (Week 1):**
1. Image Watermark Support (1 day)
2. Multi-Language OCR (1 day)
3. Selective Annotation Flattening (2 days)

**Sprint 2 (Week 2):**
4. Create Redactions Method (4 days)

**Sprint 3 (Week 3):**
5. Convert to PDF/A (3 days)
6. Convert PDF to Images (3 days)

### 💡 **Why This Order?**

1. **Quick Wins First**: Build momentum with easy enhancements
2. **Complete Workflows**: Redaction creation completes the redaction workflow
3. **High User Value**: PDF/A and image conversion are frequently requested
4. **Incremental Complexity**: Start simple, build up to more complex features
5. **API Coverage**: These 6 features would increase API coverage significantly

### 📈 **Expected Outcomes**

After implementing these 6 features:
- **Methods**: 18 total (up from 12)
- **API Coverage**: ~50% (up from ~30%)
- **User Satisfaction**: Address most common feature requests
- **Time**: ~3 weeks of development
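
As a quick sanity check on the time estimate, the per-item day counts from the recommended implementation order can be totalled. The five-day working week is an assumption; the roadmap does not state it.

```python
# Day estimates copied from the recommended implementation order.
sprint_days = {
    "Sprint 1": [1, 1, 2],  # image watermark, multi-language OCR, selective flattening
    "Sprint 2": [4],        # create redactions
    "Sprint 3": [3, 3],     # PDF/A, PDF to images
}
WORKING_DAYS_PER_WEEK = 5  # assumption: not stated in the roadmap

totals = {name: sum(days) for name, days in sprint_days.items()}
total_days = sum(totals.values())

for name, days in totals.items():
    flag = " (over one week)" if days > WORKING_DAYS_PER_WEEK else ""
    print(f"{name}: {days} days{flag}")
print(f"Total: {total_days} days, about {total_days / WORKING_DAYS_PER_WEEK:.1f} weeks")
```

Under that assumption the 14-day total fits the roughly three weeks quoted, although Sprint 3's two 3-day items add up to slightly more than one working week.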

## 🚀 Current Status

As of the last update:
- **PR #7 (Direct API Methods)**: ✅ Merged - Added 5 new methods
- **PR #23 (OpenAPI Compliance)**: ✅ Merged - Added comprehensive documentation
- **Current Methods**: 12 Direct API methods
- **Test Coverage**: 94%
- **Python Support**: 3.8 - 3.12

## 📝 Notes

- All features should maintain backward compatibility
- Each feature should include comprehensive tests
- Documentation should reference OpenAPI spec where applicable
- Integration tests should be added for each new method
33 changes: 33 additions & 0 deletions README.md
@@ -128,6 +128,28 @@ client.watermark_pdf(
    opacity=0.5,
    position="center"
)

# Add image watermark from URL
client.watermark_pdf(
    input_file="document.pdf",
    output_path="watermarked.pdf",
    image_url="https://example.com/logo.png",
    width=150,
    height=75,
    opacity=0.8,
    position="bottom-right"
)

# Add image watermark from local file (NEW!)
client.watermark_pdf(
    input_file="document.pdf",
    output_path="watermarked.pdf",
    image_file="logo.png",  # Can be path, bytes, or file-like object
    width=150,
    height=75,
    opacity=0.8,
    position="bottom-right"
)
```
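
The "path, bytes, or file-like object" comment in the example above can be illustrated with a small standalone helper. This is only a sketch: `read_image_bytes` is a hypothetical name, and the library's own handling lives in its file handler module, which may behave differently.

```python
import io
import os
from typing import BinaryIO, Union

ImageInput = Union[str, os.PathLike, bytes, BinaryIO]


def read_image_bytes(image: ImageInput) -> bytes:
    """Return raw bytes for a path, a bytes object, or a binary file-like object."""
    if isinstance(image, bytes):
        return image
    if isinstance(image, (str, os.PathLike)):
        # Treat strings and path objects as a location on disk.
        with open(image, "rb") as handle:
            return handle.read()
    if hasattr(image, "read"):
        data = image.read()
        if not isinstance(data, bytes):
            raise TypeError("file-like object must be opened in binary mode")
        return data
    raise TypeError(f"Unsupported image input: {type(image).__name__}")


# Both calls yield the same bytes, whichever input form is used.
print(read_image_bytes(b"\x89PNG"))
print(read_image_bytes(io.BytesIO(b"\x89PNG")))
```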

## Builder API Examples
@@ -150,6 +172,17 @@ result = client.build(input_file="raw-scan.pdf") \
        optimize=True
    ) \
    .execute(output_path="final.pdf")

# Using image file in builder API
result = client.build(input_file="document.pdf") \
    .add_step("watermark-pdf", {
        "image_file": "company-logo.png",  # Local file
        "width": 100,
        "height": 50,
        "opacity": 0.5,
        "position": "bottom-left"
    }) \
    .execute()
```

## File Input Options
59 changes: 59 additions & 0 deletions issue_comments.md
@@ -0,0 +1,59 @@
# Issue Comments for PR #7

## For Issue #3: Add support for missing Nutrient DWS API tools

**Status**: Partially addressed by PR #7

PR #7 implements 5 of the high-priority PDF processing tools from this issue:
- ✅ split_pdf - Split PDF into multiple files by page ranges
- ✅ duplicate_pdf_pages - Duplicate and reorder specific pages
- ✅ delete_pdf_pages - Delete specific pages from PDFs
- ✅ add_page - Add blank pages to PDFs
- ✅ set_page_label - Set page labels/numbering

Once merged, the library will expand from 7 to 12 Direct API methods.

---

## For Issue #15: Feature: Extract Page Range Method

**Status**: Addressed by PR #7's split_pdf implementation

The `split_pdf()` method in PR #7 provides the functionality requested:

```python
# Extract pages 5-10 (0-based indexing)
result = client.split_pdf(
    "document.pdf",
    page_ranges=[{"start": 4, "end": 10}]
)

# Extract from page 10 to end
result = client.split_pdf(
    "document.pdf",
    page_ranges=[{"start": 9}]  # Omit 'end' to go to end of document
)
```
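
The two calls above suggest that `start` is a 0-based index and that `end` is exclusive (pages 5-10 map to `{"start": 4, "end": 10}`). Assuming that reading is right, a small hypothetical helper, `to_page_range`, could convert the 1-based page numbers people usually think in; it is not part of the client.

```python
from typing import Dict, Optional


def to_page_range(first_page: int, last_page: Optional[int] = None) -> Dict[str, int]:
    """Convert 1-based, inclusive page numbers into a 0-based range dict.

    Assumes 'end' is exclusive, as the example above implies. Omit last_page
    to run to the end of the document.
    """
    if first_page < 1:
        raise ValueError("first_page must be 1 or greater")
    page_range = {"start": first_page - 1}
    if last_page is not None:
        if last_page < first_page:
            raise ValueError("last_page must not be before first_page")
        # A 1-based inclusive last page equals a 0-based exclusive end.
        page_range["end"] = last_page
    return page_range


print(to_page_range(5, 10))  # {'start': 4, 'end': 10}
print(to_page_range(10))     # {'start': 9}
```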

While the method name is `split_pdf` rather than `extract_pages`, it provides the exact functionality described in this issue:
- Single range extraction ✅
- Support for "to end" extraction ✅
- Clear error messages for invalid ranges ✅
- Memory efficient implementation ✅

Consider closing this issue once PR #7 is merged.

---

## PR #7 Summary

**Title**: feat: integrate fork features with comprehensive Direct API methods

**New Methods**:
1. `split_pdf()` - Split PDFs by page ranges (addresses issue #15)
2. `duplicate_pdf_pages()` - Duplicate and reorder pages
3. `delete_pdf_pages()` - Remove specific pages
4. `add_page()` - Insert blank pages
5. `set_page_label()` - Apply page labels

**Status**: All CI checks passing ✅ Ready for merge!
5 changes: 5 additions & 0 deletions pyproject.toml
@@ -104,6 +104,11 @@ disallow_any_unimported = true
[[tool.mypy.overrides]]
module = "tests.*"
disallow_untyped_defs = false
disallow_any_unimported = false

[[tool.mypy.overrides]]
module = "PIL.*"
ignore_missing_imports = true

# Pytest configuration moved to pytest.ini

54 changes: 50 additions & 4 deletions src/nutrient_dws/api/direct.py
@@ -159,6 +159,7 @@ def watermark_pdf(
         output_path: str | None = None,
         text: str | None = None,
         image_url: str | None = None,
+        image_file: FileInput | None = None,
         width: int = 200,
         height: int = 100,
         opacity: float = 1.0,
@@ -172,8 +173,10 @@ def watermark_pdf(
         Args:
             input_file: Input file (PDF or Office document).
             output_path: Optional path to save the output file.
-            text: Text to use as watermark. Either text or image_url required.
+            text: Text to use as watermark. One of text, image_url, or image_file required.
             image_url: URL of image to use as watermark.
+            image_file: Local image file to use as watermark (path, bytes, or file-like object).
+                Supported formats: PNG, JPEG, TIFF.
             width: Width of the watermark in points (required).
             height: Height of the watermark in points (required).
             opacity: Opacity of the watermark (0.0 to 1.0).
@@ -187,11 +190,54 @@ def watermark_pdf(
         Raises:
             AuthenticationError: If API key is missing or invalid.
             APIError: For other API errors.
-            ValueError: If neither text nor image_url is provided.
+            ValueError: If none of text, image_url, or image_file is provided.
         """
-        if not text and not image_url:
-            raise ValueError("Either text or image_url must be provided")
+        if not text and not image_url and not image_file:
+            raise ValueError("Either text, image_url, or image_file must be provided")
 
+        # For image file uploads, we need to use the builder directly
+        if image_file:
+            from nutrient_dws.file_handler import prepare_file_for_upload, save_file_output
+
+            # Prepare files for upload
+            files = {}
+
+            # Main PDF file
+            file_field, file_data = prepare_file_for_upload(input_file, "file")
+            files[file_field] = file_data
+
+            # Watermark image file
+            image_field, image_data = prepare_file_for_upload(image_file, "watermark")
+            files[image_field] = image_data
+
+            # Build instructions with watermark action
+            action = {
+                "type": "watermark",
+                "width": width,
+                "height": height,
+                "opacity": opacity,
+                "position": position,
+                "image": "watermark",  # Reference to the uploaded image file
+            }
+
+            instructions = {"parts": [{"file": "file"}], "actions": [action]}
+
+            # Make API request
+            # Type checking: at runtime, self is NutrientClient which has _http_client
+            result = self._http_client.post(  # type: ignore[attr-defined]
+                "/build",
+                files=files,
+                json_data=instructions,
+            )
+
+            # Handle output
+            if output_path:
+                save_file_output(result, output_path)
+                return None
+            else:
+                return result  # type: ignore[no-any-return]
+
+        # For text and URL watermarks, use the existing _process_file approach
         options = {
             "width": width,
             "height": height,
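
The image-file branch in this hunk uploads two files and sends a JSON instructions document whose watermark action refers to the image by its multipart field name. A standalone sketch of that payload follows. `build_watermark_instructions` is a hypothetical helper (the diff builds the dictionary inline), the `"center"` default for `position` is an assumption, and the opacity check is an addition based on the documented 0.0 to 1.0 range.

```python
import json
from typing import Any, Dict


def build_watermark_instructions(
    width: int = 200,
    height: int = 100,
    opacity: float = 1.0,
    position: str = "center",        # assumed default, not shown in the diff
    image_field: str = "watermark",  # multipart field name of the uploaded image
    file_field: str = "file",        # multipart field name of the main document
) -> Dict[str, Any]:
    """Build the instructions dict that accompanies the two uploaded files."""
    if not 0.0 <= opacity <= 1.0:
        raise ValueError("opacity must be between 0.0 and 1.0")
    action = {
        "type": "watermark",
        "width": width,
        "height": height,
        "opacity": opacity,
        "position": position,
        "image": image_field,  # refers to the upload by field name, not by path
    }
    return {"parts": [{"file": file_field}], "actions": [action]}


print(json.dumps(build_watermark_instructions(150, 75, 0.8, "bottom-right"), indent=2))
```

Keeping the field names as parameters makes the link between the uploaded files and the `parts` and `image` references explicit.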
8 changes: 8 additions & 0 deletions src/nutrient_dws/builder.py
@@ -211,6 +211,14 @@ def _map_tool_to_action(self, tool: str, options: dict[str, Any]) -> dict[str, A
                    action["text"] = options["text"]
                elif "image_url" in options:
                    action["image"] = {"url": options["image_url"]}  # type: ignore
                elif "image_file" in options:
                    # Handle image file upload
                    image_file = options["image_file"]
                    # Add the image as a file part
                    watermark_name = f"watermark_{len(self._files)}"
                    self._files[watermark_name] = image_file
                    # Reference the uploaded file
                    action["image"] = watermark_name  # type: ignore
                else:
                    # Default to text watermark if neither specified
                    action["text"] = "WATERMARK"
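
The bookkeeping in this hunk can be tried in isolation. The class below is a toy stand-in, not the real builder: `MiniBuilder` and `watermark_action` are hypothetical names, and `_files` starts empty here, whereas the real builder may already hold the main input file.

```python
from typing import Any, Dict


class MiniBuilder:
    """Toy stand-in for the builder, showing only the file bookkeeping."""

    def __init__(self) -> None:
        self._files: Dict[str, Any] = {}

    def watermark_action(self, options: Dict[str, Any]) -> Dict[str, Any]:
        action: Dict[str, Any] = {"type": "watermark"}
        if "text" in options:
            action["text"] = options["text"]
        elif "image_url" in options:
            action["image"] = {"url": options["image_url"]}
        elif "image_file" in options:
            # Register the image under a name derived from the current file count,
            # then point the action at that name.
            name = f"watermark_{len(self._files)}"
            self._files[name] = options["image_file"]
            action["image"] = name
        else:
            action["text"] = "WATERMARK"
        return action


builder = MiniBuilder()
first = builder.watermark_action({"image_file": "logo.png"})
second = builder.watermark_action({"image_file": b"raw-bytes"})
print(first["image"], second["image"])  # watermark_0 watermark_1
```

Because the name depends on how many files are already registered, two image watermarks in one pipeline receive distinct field names.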