Ahnaf19 commented Jan 11, 2026

PDF Export Support for JobSnap v2.0

Overview

Adds PDF export to both the CLI and the browser extension. The extension picks its PDF generator per browser to get the best available quality and file size. The PR also standardizes date formatting, improves the PDF layout, and speeds up CI.

Features

🖨️ CLI PDF/HTML Export

  • Export job descriptions to PDF using Puppeteer
  • Export to HTML for web viewing
  • Professional styling with consistent branding
  • Vector-based PDFs (~200KB)
  • Automatic index metadata updates (has_pdf: true)

Usage:

jobsnap export jobs/1234567 --format pdf
jobsnap export jobs/1234567 --format html
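A minimal sketch of what the two output paths could look like, assuming the job's HTML has already been rendered from the saved Markdown. The helper names `writeHtml`/`writePdf` and the A4 page size are assumptions for illustration, not JobSnap's actual code; the Puppeteer calls (`launch`, `newPage`, `setContent`, `pdf`, `close`) are standard Puppeteer API. Puppeteer is required lazily so HTML export never needs Chrome.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: write the rendered job HTML as a standalone file.
function writeHtml(html, outPath) {
  fs.mkdirSync(path.dirname(outPath), { recursive: true });
  fs.writeFileSync(outPath, html, 'utf8');
  return outPath;
}

// Hypothetical helper: print the same HTML to a vector PDF with headless Chrome.
async function writePdf(html, outPath) {
  // Loaded lazily so the HTML path works even if Chrome was never downloaded.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Wait for network activity (styles, fonts) to settle before printing.
    await page.setContent(html, { waitUntil: 'networkidle0' });
    await page.pdf({ path: outPath, format: 'A4', printBackground: true });
  } finally {
    await browser.close(); // always release the Chrome process
  }
  return outPath;
}
```

Because `page.pdf()` goes through Chrome's print engine, text stays as vector glyphs, which is why the CLI output is small compared with the canvas-based fallback.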

🌐 Extension PDF Export

  • Chrome/Edge (80% of users): Native vector PDFs using chrome.tabs.printToPDF (~363KB)
  • Firefox/Safari (20% of users): Fallback to html2pdf.js (~400KB)
  • Progressive enhancement: 80% of users get perfect quality
  • ~50% smaller files than the initial html2pdf-only implementation (728KB → 363KB on Chrome/Edge)

📅 Date Format Standardization

  • Changed from ambiguous numeric dates (1/2/2026) to an unambiguous format: Jan 02, 2026 (MMM DD, YYYY)
  • Prevents DD/MM vs MM/DD confusion for Bangladesh users
  • Applied to all user-facing dates:
    • CLI export command (PDF/HTML output)
    • CLI list command (deadline display)
    • Extension PDF export
  • Original scraped markdown content preserved unchanged
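A minimal formatter for the MMM DD, YYYY format, written by hand rather than with `toLocaleDateString` so the output does not depend on the runtime's locale data. The name `formatDate`, the assumption that inputs are ISO dates or `Date` objects, and returning unparseable input unchanged are all illustrative choices, not the project's actual helper.

```javascript
const MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'];

// Format a date as "MMM DD, YYYY", e.g. "Jan 02, 2026".
// Assumes ISO date strings ("2026-01-02") or Date objects. UTC getters keep the
// calendar day from shifting with the machine's time zone (date-only ISO strings parse as UTC).
function formatDate(value) {
  const d = value instanceof Date ? value : new Date(value);
  if (Number.isNaN(d.getTime())) return String(value); // leave unparseable input as-is
  const day = String(d.getUTCDate()).padStart(2, '0');
  return `${MONTHS[d.getUTCMonth()]} ${day}, ${d.getUTCFullYear()}`;
}
```

For example, `formatDate('2026-01-02')` yields `Jan 02, 2026`, which reads the same for DD/MM and MM/DD readers.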

🎨 Improved PDF Layout

  • Fixed metadata section overflow (long company names no longer cut off dates)
  • Job ID and Company display on separate lines
  • Better spacing with line-height: 1.8
  • Robust text wrapping for long content
  • Consistent layout between CLI and extension
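The layout above can be sketched as a small template function, using the sample values from the PDF Output section. The field names, the `metadata` class, and the exact CSS are illustrative assumptions; the PR's shared template may be structured differently.

```javascript
// Escape text before interpolating it into HTML (company names can contain & or <).
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Generous line height, plus wrapping so long words cannot push dates off the page.
const METADATA_CSS = `
.metadata { line-height: 1.8; word-wrap: break-word; overflow-wrap: break-word; }
.metadata .dates span + span { margin-left: 1.5em; }
`;

// Hypothetical input shape: { jobId, company, saved, deadline, published },
// with dates already formatted as "MMM DD, YYYY".
function renderMetadataHeader(meta) {
  const dates = [['Saved', meta.saved], ['Deadline', meta.deadline], ['Published', meta.published]]
    .filter(([, value]) => value) // skip dates the job does not have
    .map(([label, value]) => `<span>${label}: ${escapeHtml(value)}</span>`)
    .join('');
  return [
    '<div class="metadata">',
    `  <div>Job ID: ${escapeHtml(meta.jobId)}</div>`, // own line
    `  <div>Company: ${escapeHtml(meta.company)}</div>`, // own line, may wrap
    `  <div class="dates">${dates}</div>`, // the three dates share one line
    '</div>',
  ].join('\n');
}
```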

Technical Implementation

CLI Export Command

  • Uses Puppeteer for headless Chrome rendering
  • Generates vector PDFs with native browser print engine
  • Graceful error handling with helpful messages
  • Updates index.jsonl with has_pdf metadata
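Because index.jsonl stores one JSON object per line, the has_pdf update can be a line-by-line rewrite. A sketch, assuming entries are keyed by a `job_id` field (a guess at the schema) and that the caller handles reading and writing the file:

```javascript
// Hypothetical sketch: set has_pdf: true on the index.jsonl entry for one job.
// Assumes each non-empty line is a JSON object with a `job_id` field.
function markHasPdf(jsonlText, jobId) {
  return jsonlText
    .split('\n')
    .map((line) => {
      if (!line.trim()) return line; // keep blank lines (e.g. trailing newline) untouched
      const entry = JSON.parse(line);
      if (String(entry.job_id) !== String(jobId)) return line; // other jobs unchanged
      return JSON.stringify({ ...entry, has_pdf: true });
    })
    .join('\n');
}
```

Leaving non-matching lines byte-for-byte identical keeps diffs of the index minimal.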

Extension Architecture

Dual PDF Generation Strategy:

  1. Native Path (Chrome/Edge - 80% of users):

    • Service worker (background.js) handles chrome.tabs.printToPDF requests
    • Creates temporary hidden tab with HTML content
    • Generates vector PDF using browser's print engine
    • Returns PDF blob to popup for download
    • Result: ~363KB vector PDF (same quality as CLI)
  2. Fallback Path (Firefox/Safari - 20% of users):

    • Client-side html2pdf.js library
    • Converts HTML to canvas (html2canvas) → JPEG → PDF
    • Result: ~400KB raster PDF (still good quality)
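One detail of the native path: in Chrome, `chrome.runtime` messages must be JSON-serializable, so a Blob generally cannot travel from the service worker to the popup as-is. A common workaround, assumed here rather than taken from the PR, is to send the PDF bytes as base64 and rebuild the Blob in the popup. `base64ToPdfBlob` is a hypothetical name.

```javascript
// Hypothetical helper: rebuild a PDF Blob in the popup from a base64 string
// that the service worker sent in its message response.
function base64ToPdfBlob(base64) {
  const binary = atob(base64); // base64 -> binary string, one char per byte
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i += 1) {
    bytes[i] = binary.charCodeAt(i);
  }
  return new Blob([bytes], { type: 'application/pdf' });
}
```

The popup can then hand the Blob to `URL.createObjectURL()` and trigger the download.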

Browser Detection:

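function hasNativePDFSupport() {
  // Check `chrome` first: where the global is absent, reading chrome.tabs throws a ReferenceError.
  return typeof chrome !== 'undefined' &&
         typeof chrome.tabs !== 'undefined' &&
         typeof chrome.tabs.printToPDF === 'function';
}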
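The routing between the two strategies could then look like this sketch. It is the same feature test with the API object made injectable, plus a router; `native` and `fallback` are hypothetical stand-ins for the service-worker path and the html2pdf.js path. Injecting them keeps the decision testable outside a browser.

```javascript
// Same check as above; `api` defaults to the real `chrome` global when present.
function hasNativePDFSupport(api = globalThis.chrome) {
  return typeof api?.tabs?.printToPDF === 'function';
}

// Hypothetical router: prefer the native generator, otherwise use the fallback.
async function generatePdf(html, { native, fallback }, api = globalThis.chrome) {
  return hasNativePDFSupport(api) ? native(html) : fallback(html);
}
```

Feature detection (checking for the function) rather than parsing the user agent means any browser that later gains the API gets the native path automatically.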

Shared HTML Template

  • Both CLI and extension use identical HTML template
  • Consistent styling and professional layout
  • Responsive design with print media queries
  • Exports generateJobHTML() and stripMetadata() for reuse
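The commit notes describe the cleanup as stripping YAML frontmatter and duplicated metadata. A sketch of the frontmatter part only, assuming the frontmatter is a leading `---` ... `---` block; this is not the PR's actual `stripMetadata` body.

```javascript
// Remove a leading YAML frontmatter block from saved Markdown.
// Lazy match stops at the first closing "---" line; handles \n and \r\n line endings.
function stripMetadata(markdown) {
  return markdown.replace(/^---\r?\n[\s\S]*?\r?\n---\r?\n?/, '');
}
```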

File Size Comparison

Method                         Size   Type    Quality
CLI (Puppeteer)                197KB  Vector  ⭐⭐⭐⭐⭐ Perfect
Extension (Chrome native)      363KB  Vector  ⭐⭐⭐⭐⭐ Perfect
Extension (html2pdf.js)        400KB  Raster  ⭐⭐⭐⭐ Good
Extension (old html2pdf-only)  728KB  Raster  ⭐⭐⭐⭐ Good

Improvement: for Chrome/Edge users (80% of users), PDFs are ~50% smaller and vector instead of raster
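For reference, the reductions relative to the old 728KB html2pdf-only output work out as follows; the 50% figure applies to the Chrome/Edge path, while the Firefox/Safari fallback saves about 45%:

```latex
\text{native: } \frac{728 - 363}{728} = \frac{365}{728} \approx 50.1\%
\qquad
\text{html2pdf.js: } \frac{728 - 400}{728} = \frac{328}{728} \approx 45.1\%
```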

Dependencies

Added

  • Puppeteer (^24.34.0) - Required dependency for CLI PDF export
    • Provides headless Chrome for PDF generation
    • ~300MB Chrome binary downloaded on first install
    • CI optimized to skip download (tests don't need it)
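Since CI installs with PUPPETEER_SKIP_DOWNLOAD, a local user can end up in the same state (package present, browser absent). A sketch of how the export command's "helpful messages" might map such failures to hints. `explainPdfError` is a hypothetical name, and the match on "Could not find Chrome" assumes the wording recent Puppeteer versions use in their missing-browser error.

```javascript
// Hypothetical sketch: turn common PDF-export failures into actionable hints.
function explainPdfError(err) {
  const message = err && err.message ? err.message : String(err);
  if (err && err.code === 'MODULE_NOT_FOUND' && /puppeteer/.test(message)) {
    return 'Puppeteer is not installed. Run `npm install`, or use --format html.';
  }
  if (/Could not find Chrome/i.test(message)) {
    // Typical after installing with PUPPETEER_SKIP_DOWNLOAD=true.
    return 'Chrome for Puppeteer is missing. Run `npx puppeteer browsers install chrome`.';
  }
  return `PDF export failed: ${message}`;
}
```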

Bundled (Extension)

  • html2pdf.js (v0.10.1) - Bundled as static file (885KB)
    • Fallback for Firefox/Safari
    • No npm dependency needed

CI/CD Changes

GitHub Actions Optimization

Updated workflows to skip Puppeteer Chrome download in CI:

- name: Install dependencies
  run: npm ci
  env:
    PUPPETEER_SKIP_DOWNLOAD: 'true'

Benefits:

  • ✅ Saves ~300MB download per CI run
  • ✅ Saves 30-60 seconds build time
  • ✅ Tests still pass (don't require Puppeteer)
  • ✅ Puppeteer fully functional for local users

Files updated:

  • .github/workflows/test.yml
  • .github/workflows/lint.yml

Testing

  • ✅ All existing tests pass (22/22 tests)
  • ✅ CLI PDF generation tested with Puppeteer
  • ✅ Extension native PDF tested in Chrome (vector quality verified)
  • ✅ Extension fallback tested (html2pdf.js)
  • ✅ Metadata layout verified (no text cutoff)
  • ✅ Date format consistency verified across all outputs
  • ✅ CI workflows validated

UI Changes

Extension Popup

  • Moved "Download format" selector above main button (better UX flow)
  • Radio buttons for Markdown/PDF selection
  • Status messages show last downloaded filename

PDF Output

Header metadata layout:

Job ID: 1445561
Company: Eden Study Abroad
Saved: Jan 10, 2026  Deadline: Feb 02, 2026  Published: Jan 03, 2026

Migration Notes

  • ✅ No breaking changes
  • ✅ Extension users: Automatically benefit from improved PDF quality
  • ✅ CLI users: npm install now includes Puppeteer automatically
  • ✅ All existing markdown files remain unchanged
  • ✅ v1.0 output contract preserved

Key commits (the initial CLI export commit and the .eslintignore fix appear only in the commit log below)

  1. a1dede8 - feat(extension): add PDF download support
  2. e3500bb - feat(pdf): improve date format and metadata layout
  3. e6feafe - feat(extension): add native PDF generation for Chrome/Edge
  4. efa2cac - chore: make Puppeteer a required dependency
  5. 70a850b - ci: skip Puppeteer Chrome download in CI

Breaking Changes

None. Fully backward compatible with v1.0.

Checklist

  • Unit tests pass locally (npm test)
  • Lint checks pass (npm run lint)
  • v1.0 fixtures still pass
  • No breaking changes to output format
  • Manual testing performed (CLI + Extension)
  • Code follows project coding standards
  • Self-review completed
  • No debugging code left in
  • Commit messages are clear and descriptive
  • Branch is up to date with main

Documentation Updates Needed (Follow-up PR)

  • README.md - Add export command
  • docs/cli-reference.md - Document export command
  • CHANGELOG.md - Document v2.0 changes
  • package.json - Bump to v2.0.0

Future Enhancements

  • Add PDF quality settings (high/medium/low)
  • Support custom PDF templates
  • Batch export multiple jobs to PDF
  • Add watermark option

Ahnaf19 and others added 7 commits January 8, 2026 13:55
Implements jobsnap export command to generate PDF or HTML from saved jobs.

Features:
- PDF generation using Puppeteer (optional dependency)
- HTML export fallback when Puppeteer not available
- Smart .md file detection (works with custom template names)
- Clean metadata output (strips YAML frontmatter and duplicates)
- Single-line metadata header with consistent date formatting
- Index.jsonl tracking with has_pdf flag
- Professional styling with print-optimized CSS
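A sketch of the "smart .md file detection" idea: instead of assuming a fixed file name, take whichever Markdown file the job directory contains. `findJobMarkdown` and the alphabetical tie-break are assumptions, not the commit's code.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: locate the job's Markdown file regardless of template-generated name.
function findJobMarkdown(jobDir) {
  const candidates = fs
    .readdirSync(jobDir)
    .filter((name) => name.toLowerCase().endsWith('.md'))
    .sort(); // deterministic choice if several exist
  if (candidates.length === 0) {
    throw new Error(`No .md file found in ${jobDir}`);
  }
  return path.join(jobDir, candidates[0]);
}
```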

Usage:
  jobsnap export <job_dir>              # Generate PDF (default)
  jobsnap export <job_dir> --format html # Generate HTML

Closes #5

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Adds client-side PDF generation to browser extension using html2pdf.js.

Features:
- Format selector in popup (Markdown/PDF)
- Client-side PDF generation with html2pdf.js library
- Same styling and metadata as CLI export
- Works from current tab or pasted URL
- No server required - all processing in browser

Changes:
- Added html2pdf.js (minified third-party library)
- Created core/exportPdf.js for PDF generation logic
- Updated popup UI with format radio buttons
- Updated popup.js to handle PDF downloads
- Added .eslintignore to exclude minified files
- Updated lint-staged config to skip .min.js files

Usage:
1. Select "PDF" format in extension popup
2. Click "Download from current tab" or paste URL
3. PDF downloads automatically
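html2pdf.js is driven through its chained `html2pdf().set(options).from(element).save()` API. A sketch of the fallback call, assuming the bundled library provides the `html2pdf` factory; the option values (margin, JPEG quality, canvas scale, A4) are guesses rather than the extension's real settings, and the factory is passed in so the wiring runs outside a browser.

```javascript
// Option keys follow html2pdf.js's documented set() options; values are illustrative.
function html2pdfOptions(filename) {
  return {
    margin: 10, // in jsPDF units, i.e. mm here
    filename,
    image: { type: 'jpeg', quality: 0.95 }, // the canvas -> JPEG step
    html2canvas: { scale: 2 }, // sharper text, but a larger raster PDF
    jsPDF: { unit: 'mm', format: 'a4', orientation: 'portrait' },
  };
}

// `html2pdf` is the library's factory; `element` is the rendered job DOM node.
function downloadWithHtml2Pdf(html2pdf, element, filename) {
  return html2pdf().set(html2pdfOptions(filename)).from(element).save();
}
```

The `html2canvas.scale` and JPEG `quality` settings are the main levers on the raster fallback's file size.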

Related to #5

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Date format changes:
- Change all user-facing dates from ambiguous numeric format
  (1/2/2026) to unambiguous month-name format (Jan 02, 2026)
- Prevents confusion between DD/MM/YYYY and MM/DD/YYYY
- Applies to CLI export, CLI list, and extension PDF export
- Original markdown content preserves scraped dates unchanged

Metadata layout fixes:
- Fix issue where long company names caused "Published" date
  to be cut off at page edge
- Display Job ID and Company on separate lines
- Add line-height: 1.8 for better spacing
- Add word-wrap and overflow-wrap for long text
- Keep dates (Saved, Deadline, Published) on same line

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Implement chrome.tabs.printToPDF for Chrome/Edge users to
generate high-quality vector PDFs (~363KB) matching CLI output quality.
Falls back to html2pdf.js for Firefox/Safari (~400KB).

Implementation:
- New service worker (background.js) handles printToPDF requests
- Browser detection routes to appropriate PDF generator
- Progressive enhancement: 80% of users get perfect quality
- Export generateJobHTML + stripMetadata from exportPdf.js

Results:
- Chrome/Edge: 363KB vector PDF (vs CLI 197KB)
- Firefox/Safari: 400KB raster PDF (html2pdf.js fallback)
- Previous html2pdf-only: 728KB raster PDF
- 50% file size reduction while improving quality

Also fixes:
- Blank PDF issue (proper DOMParser HTML rendering)
- UI: move format selector above download button

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Move Puppeteer from optionalDependencies to dependencies
to ensure PDF export functionality is always available.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add PUPPETEER_SKIP_DOWNLOAD env var to workflows since:
- Current tests don't require Puppeteer/Chrome
- Saves ~300MB download and 30-60s build time per run
- Puppeteer still required for local PDF generation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add .eslintignore file to exclude third-party minified libraries
from ESLint checks. Fixes CI lint failures caused by html2pdf.bundle.min.js.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Ahnaf19 merged commit 097a550 into main Jan 11, 2026
5 checks passed