
v0.3.0 — config resilience, richer metadata, performance, CLI polish #7

Merged
seanbrar merged 8 commits into main from release/v0.3
Feb 17, 2026

@seanbrar seanbrar commented Feb 16, 2026

Summary

v0.3 makes paperweight faster, more resilient to partial configs, richer in output, and more polished as a CLI tool.

Config resilience

  • DEFAULT_CONFIG ensures partial/minimal configs never crash — paperweight run works with only arxiv.categories set
  • Triage disabled by default (opt-in via config)
  • Log file optional (stderr-only by default when logging.file omitted)
  • Lower default min_score (10 → 3) for better out-of-box results
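The partial-config behavior above can be illustrated with a recursive merge of user settings over defaults. This is a minimal sketch, not the project's actual code: the key layout of `DEFAULT_CONFIG` and the `deep_merge` helper name are assumptions; only the default values (`min_score` of 3, triage off, no log file) come from this description.

```python
from copy import deepcopy
from typing import Any, Dict

# Assumed layout. Only min_score=3, triage disabled, and the absence of
# logging.file are taken from the PR description.
DEFAULT_CONFIG: Dict[str, Any] = {
    "arxiv": {"categories": []},
    "processor": {"min_score": 3},
    "triage": {"enabled": False},
    "logging": {"level": "INFO"},  # no "file" key means stderr only
}


def deep_merge(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict with overrides layered on top of defaults."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Recurse so a partial section keeps its sibling defaults.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# A minimal user config that sets only arxiv.categories.
user_config = {"arxiv": {"categories": ["cs.CL"]}}
config = deep_merge(DEFAULT_CONFIG, user_config)
```

Because the merge copies rather than mutates, `DEFAULT_CONFIG` stays untouched across repeated loads.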

Performance

  • Lazy imports for heavy dependencies (pollux, psycopg) — eliminates ~1s startup cost for non-AI/non-DB paths
  • Parallel category fetching
  • Abstract mode skips content hydration entirely (fast-by-default pipeline)
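The lazy-import item follows a common pattern: move the import of a heavy dependency from module level into the function that needs it. The sketch below assumes a hypothetical `connect_db` helper; `module_loaded` is added only to make the effect observable.

```python
import sys


def connect_db(dsn: str):
    """Open a database connection, importing the driver only when needed."""
    # Deferred import: code paths that never call this function do not pay
    # the cost of loading psycopg at startup.
    import psycopg

    return psycopg.connect(dsn)


def module_loaded(name: str) -> bool:
    """Report whether a module has already been imported in this process."""
    return name in sys.modules
```

One side effect, which the test-hardening notes in the commits also mention, is that mock patch paths change once the import moves to the call site.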

Richer metadata & output

  • Text digest: authors (truncated at 3), matched keywords
  • Atom feed: <author> elements, <category> terms
  • JSON digest: full structured schema — arxiv_id, authors, categories, pdf_url, keywords_matched, conditional triage_score/triage_rationale/summary
  • docs/CLI.md updated with accurate JSON schema (10 always-present + 3 conditional fields)
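As a rough illustration of the always-present versus conditional split in the JSON digest, the sketch below builds one entry from a paper dict. It covers only the fields named in this description (the full schema in docs/CLI.md has more); the function names and the "et al." suffix used for truncated author lists are assumptions.

```python
import json
from typing import Any, Dict, List

CONDITIONAL_FIELDS = ("triage_score", "triage_rationale", "summary")


def format_authors(authors: List[str], limit: int = 3) -> str:
    """Join author names, truncating after `limit` (suffix is an assumption)."""
    if len(authors) <= limit:
        return ", ".join(authors)
    return ", ".join(authors[:limit]) + " et al."


def to_json_entry(paper: Dict[str, Any]) -> Dict[str, Any]:
    """Build a digest entry; conditional fields appear only when populated."""
    entry: Dict[str, Any] = {
        "arxiv_id": paper["arxiv_id"],
        "authors": paper.get("authors", []),
        "categories": paper.get("categories", []),
        "pdf_url": paper.get("pdf_url"),
        "keywords_matched": paper.get("keywords_matched", []),
    }
    for key in CONDITIONAL_FIELDS:
        if paper.get(key) is not None:
            entry[key] = paper[key]
    return entry


# Placeholder data for illustration only.
paper = {
    "arxiv_id": "0000.00000",
    "authors": ["A. One", "B. Two", "C. Three", "D. Four"],
    "categories": ["cs.CL"],
    "pdf_url": "https://example.org/paper.pdf",
    "keywords_matched": ["retrieval"],
    "triage_score": 0.8,
}
entry = to_json_entry(paper)
print(json.dumps(entry, indent=2))
```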

CLI polish & API surface

  • --version flag
  • --quiet suppresses progress status lines on stderr
  • paperweight init prints clean error to stderr (no traceback) when config exists
  • __init__.py exposes __version__ and key public functions (load_config, get_recent_papers, score_papers, etc.)
  • python -c "import paperweight; print(paperweight.__version__)" now works
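For readers unfamiliar with how such flags are usually wired, here is a minimal `argparse` sketch of `--version` and `--quiet`. It is not the project's parser (the commits mention a `_parse_args` wrapper that is not shown here); `build_parser` and `status` are hypothetical names.

```python
import argparse
import sys

__version__ = "0.3.0"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="paperweight")
    # The built-in "version" action prints the string and exits with code 0.
    parser.add_argument(
        "--version", action="version", version=f"%(prog)s {__version__}"
    )
    parser.add_argument(
        "--quiet",
        action="store_true",
        help="suppress progress status lines on stderr",
    )
    return parser


def status(message: str, quiet: bool) -> None:
    """Write a progress line to stderr unless --quiet was given."""
    if not quiet:
        print(message, file=sys.stderr)
```

Keeping status lines on stderr means the digest on stdout stays clean for piping, with or without `--quiet`.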

Pipeline improvements

  • Profile switching via --profile NAME flag and PAPERWEIGHT_PROFILE env var
  • Metadata cache (metadata_cache config section) to skip repeated arXiv API calls within a TTL window
  • Async per-paper triage (same semaphore pattern as summaries, concurrency = 3)
  • Compact triage rationales (max 20 words, whitespace-normalized, truncated to 160 chars)
  • Per-call LLM timeout (45s) for triage and summary to prevent hanging runs
  • Progress logging during triage and summary LLM calls
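The async triage and per-call timeout items combine naturally: a semaphore caps how many LLM calls are in flight, and `asyncio.wait_for` bounds each one. The following is a sketch under assumptions, with `triage_all` and the `triage_one` callable as hypothetical names; returning `None` on timeout is also an assumption about how a timed-out call might be handled.

```python
import asyncio
from typing import Any, Awaitable, Callable, List, Optional


async def triage_all(
    papers: List[Any],
    triage_one: Callable[[Any], Awaitable[Any]],
    concurrency: int = 3,
    timeout_s: float = 45.0,
) -> List[Optional[Any]]:
    """Triage papers concurrently, at most `concurrency` calls at a time."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(paper: Any) -> Optional[Any]:
        async with semaphore:
            try:
                # Bound each call so one stuck request cannot hang the run.
                return await asyncio.wait_for(triage_one(paper), timeout=timeout_s)
            except asyncio.TimeoutError:
                return None  # assumed policy: skip papers whose call timed out

    # gather preserves input order in its result list.
    return await asyncio.gather(*(guarded(p) for p in papers))
```

A caller would run this with `asyncio.run(triage_all(papers, triage_one))`.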

Docs

  • ROADMAP.md: v0.3 gains CLI polish section, v0.4 gains typed data structures / immutable processing
  • CLI.md: synced with implementation (JSON schema, --quiet, --version, init behavior)
  • CHANGELOG.md: updated with all v0.3.0 additions and changes
  • TODOs added for _parse_args fallback simplification and __main__ redundant exception handler

Test plan

  • python -m pytest tests/ -v --ignore=tests/api — 114 passed, 1 skipped
  • paperweight --version prints paperweight 0.3.0
  • python -c "import paperweight; print(paperweight.__version__)" prints 0.3.0
  • paperweight init with existing config → clean error, no traceback
  • Manual: paperweight doctor --profile fast with a profiles section
  • Manual: paperweight run --force-refresh with metadata_cache.enabled: true — second run hits cache

Deferred to v0.4

  • Typed data structures (Dict[str, Any] → Paper dataclass/Pydantic)
  • Immutable processing (eliminate in-place mutation in process_papers)
  • AI enrichment polish and feedback loop

Sean Brar added 8 commits February 14, 2026 22:55
…progress logging

Add profile switching (--profile / PAPERWEIGHT_PROFILE), metadata cache
to skip repeated arXiv API calls within a TTL window, per-paper async
triage with semaphore (replacing run_many), compact triage rationales,
per-call LLM timeouts (45 s), and progress logging for triage/summary.

Drops run_many dependency. No new user-facing config knobs for LLM
internals. ~240 net new lines of source, all tests green.
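The compact-rationale rule (at most 20 words, whitespace-normalized, at most 160 characters) could look roughly like this. The function name and the order in which the limits are applied are assumptions.

```python
def compact_rationale(text: str, max_words: int = 20, max_chars: int = 160) -> str:
    """Normalize whitespace, then cap the word count and character length."""
    # str.split() with no arguments collapses any run of whitespace.
    words = text.split()
    compact = " ".join(words[:max_words])
    if len(compact) > max_chars:
        compact = compact[:max_chars].rstrip()
    return compact
```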
… lockfile version sync

- Fix implicit Optional in _doctor() profile parameter (mypy)
- Suppress C901 complexity warnings for get_recent_papers and load_config
- Clean up kick_tires.py: remove unused imports, sort imports, strip blank-line whitespace
- Update uv.lock to match v0.3.0
… parallel hydration, progress UX

Reorder the pipeline to score on title+abstract before fetching content,
eliminating the dominant bottleneck (50-200s of PDF/source downloads) for
the default abstract analyzer. Content is now fetched only in summary mode
and only for papers that survive both triage and scoring.

- Fix metadata cache bug: move cache lookup before days==0 early return
  so same-day repeat runs hit cache instead of returning empty
- Enable metadata cache by default (code + config)
- Decompose pipeline: triage → score → conditional hydrate → summarize
- Parallelize content fetching with ThreadPoolExecutor (default 6 workers)
- Add config-driven concurrency section (content_fetch, triage, summary)
- Add --quiet flag and ProgressReporter for stderr status lines
- Update tests for new pipeline flow (abstract mode skips hydration)
- Populate __init__.py with __version__ and key function re-exports
  (load_config, get_recent_papers, score_papers, etc.)
- Add --version flag to CLI parser (routes through _parse_args)
- Catch ValueError in init when config exists; print clean message
  to stderr instead of traceback, return exit code 1
- Update ROADMAP.md: v0.3 gains CLI polish section, v0.4 gains
  typed data structures and immutable processing
- Update tests for new init behavior; add version + public API tests
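The pipeline reorder described in the earlier commit (score on title and abstract first, then fetch content only in summary mode and only for survivors, in parallel) can be sketched as follows. This is an illustration, not the project's code: `run_pipeline`, `score`, and `fetch_content` are hypothetical names, and triage is omitted for brevity.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

Paper = Dict[str, Any]


def run_pipeline(
    papers: List[Paper],
    score: Callable[[Paper], float],
    fetch_content: Callable[[Paper], str],
    mode: str = "abstract",
    min_score: float = 3,
    max_workers: int = 6,
) -> List[Paper]:
    """Score first, then hydrate only the surviving papers in summary mode."""
    kept = [p for p in papers if score(p) >= min_score]
    if mode != "summary" or not kept:
        # Abstract mode never downloads PDFs or sources.
        return kept
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order.
        contents = list(pool.map(fetch_content, kept))
    # Build new dicts instead of mutating the inputs.
    return [{**p, "content": c} for p, c in zip(kept, contents)]
```

Threads suit this step because the work is dominated by network I/O rather than CPU.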

Addresses review feedback on library ergonomics and CLI polish.

- CLI.md: document accurate JSON schema (10 always-present + 3
  conditional fields), add --quiet and --version, update init behavior
- CHANGELOG.md: add --version, public API surface, init safety
  to v0.3.0 section
- main.py: add TODO for _parse_args fallback simplification and
  redundant __main__ exception handler

Config resilience:
- triage disabled by default (opt-in via config)
- log file optional (stderr-only when logging.file omitted)
- lower default min_score (10 → 3) for better out-of-box experience
- DEFAULT_CONFIG ensures partial/minimal configs never crash

Performance — lazy imports:
- pollux (Config, RetryPolicy, Source, run) imported at call site
- psycopg imported at call site in db.py
- eliminates ~1s startup cost for non-AI/non-DB paths

Richer digest rendering:
- text digest: authors (truncated at 3), matched keywords
- atom feed: author elements, category terms
- json digest: full structured schema (arxiv_id, authors, categories,
  pdf_url, keywords_matched, conditional triage_score/triage_rationale)

Scraper & utils:
- parallel category fetching
- config deep-merge with DEFAULT_CONFIG
- triage_enabled default flipped from True to False

Test hardening:
- notifier tests cover authors, keywords_matched, rich JSON schema
- processor tests cover count_keywords and keywords_matched propagation
- config tests cover DEFAULT_CONFIG merge and triage-disabled default
- db tests updated for lazy import patch paths
- analyzer tests updated for lazy import patch paths
- CLI integration tests cover zero-state hint and rich JSON fields
@seanbrar seanbrar changed the title from "feat(release): ship v0.3.0 — profiles, metadata cache, async triage" to "v0.3.0 — config resilience, richer metadata, performance, CLI polish" Feb 17, 2026
@seanbrar seanbrar merged commit c525d2c into main Feb 17, 2026
3 checks passed