PSPDFKit · jdrhyne · Jun 25, 2025 · Jun 20, 2025 · Jun 21, 2025 · Jun 23, 2025
diff --git a/DEVELOPMENT_ROADMAP.md b/DEVELOPMENT_ROADMAP.md
@@ -0,0 +1,100 @@
+# Development Roadmap - Nutrient DWS Python Client
+
+## 📊 Issue Review & Recommendations
+
+After reviewing all open issues and analyzing the codebase, here are my recommendations for what to tackle next:
+
+### 🥇 **Top Priority: Quick Wins (1-2 days each)**
+
+#### 1. **Issue #11: Image Watermark Support** ⭐⭐⭐⭐⭐
+- **Why**: 80% already implemented! Just needs file upload support
+- **Current**: Supports `image_url` parameter
+- **Add**: `image_file` parameter for local image files
+- **Effort**: Very Low - mostly parameter handling
+- **Value**: High - common user request
+
+#### 2. **Issue #10: Multi-Language OCR Support** ⭐⭐⭐⭐
+- **Why**: Small change with big impact
+- **Current**: Single language string
+- **Add**: Accept `List[str]` for multiple languages
+- **Effort**: Low - update parameter handling and validation
+- **Value**: High - enables multi-lingual document processing
+
+### 🥈 **Second Priority: Core Features (3-5 days each)**
+
+#### 3. **Issue #13: Create Redactions Method** ⭐⭐⭐⭐
+- **Why**: Complements existing `apply_redactions()`
+- **Value**: Complete redaction workflow
+- **Complexity**: Medium - new API patterns for search strategies
+- **Use cases**: Compliance, privacy, legal docs
+
+#### 4. **Issue #12: Selective Annotation Flattening** ⭐⭐⭐
+- **Why**: Enhancement to existing `flatten_annotations()`
+- **Add**: `annotation_ids` parameter
+- **Effort**: Low-Medium
+- **Value**: More control over flattening
+
+### 🥉 **Third Priority: High-Value Features (about 3 days each)**
+
+#### 5. **Issue #16: Convert to PDF/A** ⭐⭐⭐⭐
+- **Why**: Critical for archival/compliance
+- **Value**: Legal requirement for many organizations
+- **Complexity**: Medium - new output format handling
+
+#### 6. **Issue #17: Convert PDF to Images** ⭐⭐⭐⭐
+- **Why**: Very common use case
+- **Value**: Thumbnails, previews, web display
+- **Complexity**: Medium - handle multiple output files
+
+### 📋 **Issues to Defer**
+
+- **Issue #20: AI-Powered Redaction** - Requires AI endpoint investigation
+- **Issue #21: Digital Signatures** - Complex, needs certificate handling
+- **Issue #22: Batch Processing** - Client-side enhancement, do after core features
+- **Issue #19: Office Formats** - Lower priority, complex format handling
+
+### 🎯 **Recommended Implementation Order**
+
+**Sprint 1 (Week 1):**
+1. Image Watermark Support (1 day)
+2. Multi-Language OCR (1 day)
+3. Selective Annotation Flattening (2 days)
+
+**Sprint 2 (Week 2):**
+4. Create Redactions Method (4 days)
+
+**Sprint 3 (Week 3):**
+5. Convert to PDF/A (3 days)
+6. Convert PDF to Images (3 days)
+
+### 💡 **Why This Order?**
+
+1. **Quick Wins First**: Build momentum with easy enhancements
+2. **Complete Workflows**: Redaction creation completes the redaction workflow
+3. **High User Value**: PDF/A and image conversion are frequently requested
+4. **Incremental Complexity**: Start simple, build up to more complex features
+5. **API Coverage**: These 6 features would raise API coverage from roughly 30% to roughly 50%
+
+### 📈 **Expected Outcomes**
+
+After implementing these 6 features:
+- **Methods**: 18 total (up from 12)
+- **API Coverage**: ~50% (up from ~30%)
+- **User Satisfaction**: Address most common feature requests
+- **Time**: ~3 weeks of development
+
+## 🚀 Current Status
+
+As of the last update:
+- **PR #7 (Direct API Methods)**: ✅ Merged - Added 5 new methods
+- **PR #23 (OpenAPI Compliance)**: ✅ Merged - Added comprehensive documentation
+- **Current Methods**: 12 Direct API methods
+- **Test Coverage**: 94%
+- **Python Support**: 3.8 - 3.12
+
+## 📝 Notes
+
+- All features should maintain backward compatibility
+- Each feature should include comprehensive tests
+- Documentation should reference OpenAPI spec where applicable
+- Integration tests should be added for each new method
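
The roadmap's OCR item asks for a `List[str]` alongside the current single language string. One way this could stay backward compatible is to normalize both forms to a list before building the request. The helper below is only a sketch: `normalize_languages` is a hypothetical name and is not part of the client shown in this PR.

```python
from typing import List, Union


def normalize_languages(language: Union[str, List[str]]) -> List[str]:
    """Return the OCR languages as a non-empty list of strings.

    Hypothetical helper: the name and behavior are assumptions for illustration.
    """
    # A single string stays valid, so existing callers keep working.
    languages = [language] if isinstance(language, str) else list(language)
    if not languages:
        raise ValueError("At least one OCR language must be provided")
    for item in languages:
        # Reject non-strings and blank entries with a clear message.
        if not isinstance(item, str) or not item.strip():
            raise ValueError(f"Invalid OCR language: {item!r}")
    return [item.strip() for item in languages]
```

With this shape, `normalize_languages("english")` and `normalize_languages(["english", "german"])` both produce a list, so the rest of the method would only need to handle one type.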
diff --git a/README.md b/README.md
@@ -128,6 +128,28 @@ client.watermark_pdf(
     opacity=0.5,
     position="center"
 )
+
+# Add image watermark from URL
+client.watermark_pdf(
+    input_file="document.pdf",
+    output_path="watermarked.pdf",
+    image_url="https://example.com/logo.png",
+    width=150,
+    height=75,
+    opacity=0.8,
+    position="bottom-right"
+)
+
+# Add image watermark from local file (NEW!)
+client.watermark_pdf(
+    input_file="document.pdf",
+    output_path="watermarked.pdf",
+    image_file="logo.png",  # Can be path, bytes, or file-like object
+    width=150,
+    height=75,
+    opacity=0.8,
+    position="bottom-right"
+)
 ```
 
 ## Builder API Examples
@@ -150,6 +172,17 @@ result = client.build(input_file="raw-scan.pdf") \
         optimize=True
     ) \
     .execute(output_path="final.pdf")
+
+# Using image file in builder API
+result = client.build(input_file="document.pdf") \
+    .add_step("watermark-pdf", {
+        "image_file": "company-logo.png",  # Local file
+        "width": 100,
+        "height": 50,
+        "opacity": 0.5,
+        "position": "bottom-left"
+    }) \
+    .execute()
 ```
 
 ## File Input Options
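
The README examples above say `image_file` can be a path, bytes, or a file-like object. As a rough illustration of what accepting those three forms involves, the sketch below reduces each of them to raw bytes. `ImageInput` and `read_image_bytes` are hypothetical names; the client's own `FileInput` type and `prepare_file_for_upload` helper may behave differently.

```python
from pathlib import Path
from typing import BinaryIO, Union

# Hypothetical alias for illustration; not the library's FileInput type.
ImageInput = Union[str, Path, bytes, BinaryIO]


def read_image_bytes(image_file: ImageInput) -> bytes:
    """Reduce a path, bytes, or binary file-like object to raw bytes."""
    if isinstance(image_file, (bytes, bytearray)):
        return bytes(image_file)
    if isinstance(image_file, (str, Path)):
        path = Path(image_file)
        if not path.is_file():
            raise FileNotFoundError(f"Image file not found: {path}")
        return path.read_bytes()
    if hasattr(image_file, "read"):
        data = image_file.read()
        # Text-mode streams return str, which cannot be uploaded as image data.
        if not isinstance(data, (bytes, bytearray)):
            raise TypeError("File-like object must be opened in binary mode")
        return bytes(data)
    raise TypeError(f"Unsupported image input type: {type(image_file).__name__}")
```

Checking for bytes first matters here, because a `bytes` value should never be mistaken for a filesystem path.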

diff --git a/issue_comments.md b/issue_comments.md
@@ -0,0 +1,59 @@
+# Issue Comments for PR #7
+
+## For Issue #3: Add support for missing Nutrient DWS API tools
+
+**Status**: Partially addressed by PR #7
+
+PR #7 implements 5 of the high-priority PDF processing tools from this issue:
+- ✅ split_pdf - Split PDF into multiple files by page ranges
+- ✅ duplicate_pdf_pages - Duplicate and reorder specific pages  
+- ✅ delete_pdf_pages - Delete specific pages from PDFs
+- ✅ add_page - Add blank pages to PDFs
+- ✅ set_page_label - Set page labels/numbering
+
+Once merged, the library will expand from 7 to 12 Direct API methods.
+
+---
+
+## For Issue #15: Feature: Extract Page Range Method
+
+**Status**: Addressed by PR #7's split_pdf implementation
+
+The `split_pdf()` method in PR #7 provides the functionality requested:
+
+```python
+# Extract pages 5-10 (0-based indexing)
+result = client.split_pdf(
+    "document.pdf",
+    page_ranges=[{"start": 4, "end": 10}]
+)
+
+# Extract from page 10 to end
+result = client.split_pdf(
+    "document.pdf", 
+    page_ranges=[{"start": 9}]  # Omit 'end' to go to end of document
+)
+```
+
+While the method name is `split_pdf` rather than `extract_pages`, it provides the exact functionality described in this issue:
+- Single range extraction ✅
+- Support for "to end" extraction ✅
+- Clear error messages for invalid ranges ✅
+- Memory efficient implementation ✅
+
+Consider closing this issue once PR #7 is merged.
+
+---
+
+## PR #7 Summary
+
+**Title**: feat: integrate fork features with comprehensive Direct API methods
+
+**New Methods**:
+1. `split_pdf()` - Split PDFs by page ranges (addresses issue #15)
+2. `duplicate_pdf_pages()` - Duplicate and reorder pages
+3. `delete_pdf_pages()` - Remove specific pages
+4. `add_page()` - Insert blank pages
+5. `set_page_label()` - Apply page labels
+
+**Status**: All CI checks passing ✅ Ready for merge!
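
The issue comment above describes page ranges with a 0-based `start` and an optional `end`. The sketch below shows how such a range might be resolved and validated on the client side. It assumes `end` is exclusive, which is what the "pages 5-10" example with `{"start": 4, "end": 10}` suggests but does not state outright. `resolve_page_range` is a hypothetical helper, not a method from PR #7.

```python
from typing import Dict


def resolve_page_range(page_range: Dict[str, int], page_count: int) -> range:
    """Turn {"start": ..., "end": ...} into a range of 0-based page indices.

    Assumption: 'end' is exclusive, and omitting it means "to the last page".
    """
    start = page_range.get("start", 0)
    end = page_range.get("end", page_count)
    if not 0 <= start < page_count:
        raise ValueError(
            f"start {start} is outside the document (valid: 0..{page_count - 1})"
        )
    if not start < end <= page_count:
        raise ValueError(
            f"end {end} must be greater than start {start} and at most {page_count}"
        )
    return range(start, end)
```

Under that assumption, `{"start": 4, "end": 10}` resolves to indices 4 through 9, which are pages 5 through 10 in 1-based numbering.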
diff --git a/pyproject.toml b/pyproject.toml
@@ -104,6 +104,11 @@ disallow_any_unimported = true
 [[tool.mypy.overrides]]
 module = "tests.*"
 disallow_untyped_defs = false
+disallow_any_unimported = false
+
+[[tool.mypy.overrides]]
+module = "PIL.*"
+ignore_missing_imports = true
 
 # Pytest configuration moved to pytest.ini
 

diff --git a/src/nutrient_dws/api/direct.py b/src/nutrient_dws/api/direct.py
@@ -159,6 +159,7 @@ def watermark_pdf(
         output_path: str | None = None,
         text: str | None = None,
         image_url: str | None = None,
+        image_file: FileInput | None = None,
         width: int = 200,
         height: int = 100,
         opacity: float = 1.0,
@@ -172,8 +173,10 @@ def watermark_pdf(
         Args:
             input_file: Input file (PDF or Office document).
             output_path: Optional path to save the output file.
-            text: Text to use as watermark. Either text or image_url required.
+            text: Text to use as watermark. One of text, image_url, or image_file required.
             image_url: URL of image to use as watermark.
+            image_file: Local image file to use as watermark (path, bytes, or file-like object).
+                       Supported formats: PNG, JPEG, TIFF.
             width: Width of the watermark in points (required).
             height: Height of the watermark in points (required).
             opacity: Opacity of the watermark (0.0 to 1.0).
@@ -187,11 +190,54 @@ def watermark_pdf(
         Raises:
             AuthenticationError: If API key is missing or invalid.
             APIError: For other API errors.
-            ValueError: If neither text nor image_url is provided.
+            ValueError: If none of text, image_url, or image_file is provided.
         """
-        if not text and not image_url:
-            raise ValueError("Either text or image_url must be provided")
+        if not text and not image_url and not image_file:
+            raise ValueError("One of text, image_url, or image_file must be provided")
 
+        # For image file uploads, post to the /build endpoint directly
+        if image_file:
+            from nutrient_dws.file_handler import prepare_file_for_upload, save_file_output
+
+            # Prepare files for upload
+            files = {}
+
+            # Main PDF file
+            file_field, file_data = prepare_file_for_upload(input_file, "file")
+            files[file_field] = file_data
+
+            # Watermark image file
+            image_field, image_data = prepare_file_for_upload(image_file, "watermark")
+            files[image_field] = image_data
+
+            # Build instructions with watermark action
+            action = {
+                "type": "watermark",
+                "width": width,
+                "height": height,
+                "opacity": opacity,
+                "position": position,
+                "image": "watermark",  # Reference to the uploaded image file
+            }
+
+            instructions = {"parts": [{"file": "file"}], "actions": [action]}
+
+            # Make API request
+            # Type checking: at runtime, self is NutrientClient which has _http_client
+            result = self._http_client.post(  # type: ignore[attr-defined]
+                "/build",
+                files=files,
+                json_data=instructions,
+            )
+
+            # Handle output
+            if output_path:
+                save_file_output(result, output_path)
+                return None
+            else:
+                return result  # type: ignore[no-any-return]
+
+        # For text and URL watermarks, use the existing _process_file approach
         options = {
             "width": width,
             "height": height,
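
The image-file branch in this hunk builds its instructions inline. Pulling that step into a pure function would make it easy to unit test without an HTTP call. The sketch below mirrors the payload shape from the diff; the function name, the `file_part` and `image_part` parameters, and the opacity check are assumptions added for illustration.

```python
from typing import Any, Dict


def build_image_watermark_instructions(
    width: int,
    height: int,
    opacity: float,
    position: str,
    file_part: str = "file",
    image_part: str = "watermark",
) -> Dict[str, Any]:
    """Build the instructions payload for an uploaded image watermark.

    Hypothetical helper: the payload shape follows the diff above, while the
    range check on opacity is an added assumption based on the docstring.
    """
    if not 0.0 <= opacity <= 1.0:
        raise ValueError("opacity must be between 0.0 and 1.0")
    action = {
        "type": "watermark",
        "width": width,
        "height": height,
        "opacity": opacity,
        "position": position,
        # Name of the multipart field that carries the uploaded image.
        "image": image_part,
    }
    return {"parts": [{"file": file_part}], "actions": [action]}
```

The `image` value has to match the field name used when the image is attached to the request, so keeping both in one place reduces the chance of the two drifting apart.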

diff --git a/src/nutrient_dws/builder.py b/src/nutrient_dws/builder.py
@@ -211,6 +211,14 @@ def _map_tool_to_action(self, tool: str, options: dict[str, Any]) -> dict[str, A
                     action["text"] = options["text"]
                 elif "image_url" in options:
                     action["image"] = {"url": options["image_url"]}  # type: ignore
+                elif "image_file" in options:
+                    # Handle image file upload
+                    image_file = options["image_file"]
+                    # Add the image as a file part
+                    watermark_name = f"watermark_{len(self._files)}"
+                    self._files[watermark_name] = image_file
+                    # Reference the uploaded file
+                    action["image"] = watermark_name  # type: ignore
                 else:
                     # Default to text watermark if neither specified
                     action["text"] = "WATERMARK"
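
The builder change registers each watermark image under a name derived from `len(self._files)`. The self-contained sketch below reproduces that naming scheme in a stand-in class to show that consecutive image watermarks get distinct names. `MiniBuilder` is hypothetical, and it assumes the files mapping already holds the main input under `"file"` and that the first branch tests for `"text"`; neither is visible in the hunk.

```python
from typing import Any, Dict


class MiniBuilder:
    """Hypothetical stand-in for the builder; it only tracks uploaded files."""

    def __init__(self, input_file: Any) -> None:
        # Assumption: the main document is stored under the "file" key.
        self._files: Dict[str, Any] = {"file": input_file}

    def map_watermark(self, options: Dict[str, Any]) -> Dict[str, Any]:
        action: Dict[str, Any] = {"type": "watermark"}
        if "text" in options:
            action["text"] = options["text"]
        elif "image_url" in options:
            action["image"] = {"url": options["image_url"]}
        elif "image_file" in options:
            # The name depends on how many files are already registered,
            # so each new image gets a fresh key as long as none are removed.
            name = f"watermark_{len(self._files)}"
            self._files[name] = options["image_file"]
            action["image"] = name
        else:
            action["text"] = "WATERMARK"
        return action
```

Because the name is derived from the current size of the mapping, uniqueness relies on entries never being deleted between steps; a counter would be a sturdier choice if that ever changes.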