v0.3.0 — config resilience, richer metadata, performance, CLI polish by seanbrar · Pull Request #7 · seanbrar/paperweight

seanbrar · 2026-02-16T01:52:34Z

Summary

v0.3 makes paperweight faster, more resilient to partial configs, richer in output, and more polished as a CLI tool.

Config resilience

DEFAULT_CONFIG ensures partial/minimal configs never crash — paperweight run works with only arxiv.categories set
Triage disabled by default (opt-in via config)
Log file optional (stderr-only by default when logging.file omitted)
Lower default min_score (10 → 3) for better out-of-box results
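
A minimal sketch of how a deep merge against DEFAULT_CONFIG can keep a partial config from crashing. The key layout below (`max_results`, a `processor` section) and the helper name `deep_merge` are hypothetical and used only for illustration; the real defaults in paperweight may be organized differently.

```python
from copy import deepcopy
from typing import Any, Dict

# Hypothetical defaults; the real DEFAULT_CONFIG keys may differ.
DEFAULT_CONFIG: Dict[str, Any] = {
    "arxiv": {"categories": [], "max_results": 50},
    "triage": {"enabled": False},       # triage is opt-in
    "processor": {"min_score": 3},      # lowered from 10
    "logging": {"level": "INFO"},       # no "file" key: stderr only
}


def deep_merge(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where overrides win, recursing into nested dicts."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# A minimal user config that only sets arxiv.categories.
user_config = {"arxiv": {"categories": ["cs.CL"]}}
config = deep_merge(DEFAULT_CONFIG, user_config)
```

Because the merge recurses, setting one key inside `arxiv` leaves the sibling defaults in that section intact, and the defaults themselves are never mutated.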

Performance

Lazy imports for heavy dependencies (pollux, psycopg) — eliminates ~1s startup cost for non-AI/non-DB paths
Parallel category fetching
Abstract mode skips content hydration entirely (fast-by-default pipeline)
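
For readers unfamiliar with the lazy-import pattern, here is a rough sketch. It uses the standard-library `sqlite3` module as a stand-in for a heavy dependency such as psycopg or pollux; the function names `connect_db` and `main` are illustrative and not the actual paperweight code.

```python
from typing import List


def connect_db(path: str = ":memory:"):
    """Open a database connection, importing the driver only when needed."""
    # Deferred import: the cost is paid on first call, not at CLI startup.
    # sqlite3 is a stand-in; the PR applies this idea to psycopg and pollux.
    import sqlite3

    return sqlite3.connect(path)


def main(argv: List[str]) -> int:
    if "--version" in argv:
        # This path never touches the database driver.
        print("paperweight 0.3.0")
        return 0
    conn = connect_db()
    conn.close()
    return 0
```

One consequence noted in the commit messages is that tests patching these modules need to target the call site rather than a module-level name.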

Richer metadata & output

Text digest: authors (truncated at 3), matched keywords
Atom feed: <author> elements, <category> terms
JSON digest: full structured schema — arxiv_id, authors, categories, pdf_url, keywords_matched, conditional triage_score/triage_rationale/summary
docs/CLI.md updated with accurate JSON schema (10 always-present + 3 conditional fields)
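
As an illustration of the output rules above, the sketch below truncates authors after three names and emits the conditional fields only when they are set. It covers just the fields named in this description, not the full schema in docs/CLI.md; the helper names and the ", et al." suffix are assumptions.

```python
import json
from typing import Any, Dict, List

CONDITIONAL_FIELDS = ("triage_score", "triage_rationale", "summary")


def format_authors(authors: List[str], limit: int = 3) -> str:
    """Join authors for the text digest, truncating after `limit` names."""
    if len(authors) <= limit:
        return ", ".join(authors)
    # The exact truncation marker is an assumption.
    return ", ".join(authors[:limit]) + ", et al."


def to_json_entry(paper: Dict[str, Any]) -> Dict[str, Any]:
    """Build a digest entry; conditional fields appear only when set."""
    entry = {
        "arxiv_id": paper["arxiv_id"],
        "authors": paper.get("authors", []),
        "categories": paper.get("categories", []),
        "pdf_url": paper.get("pdf_url"),
        "keywords_matched": paper.get("keywords_matched", []),
    }
    for field in CONDITIONAL_FIELDS:
        if paper.get(field) is not None:
            entry[field] = paper[field]
    return entry


paper = {
    "arxiv_id": "0000.00000",  # placeholder ID
    "authors": ["A. One", "B. Two", "C. Three", "D. Four"],
    "categories": ["cs.CL"],
    "pdf_url": "https://arxiv.org/pdf/0000.00000",
    "keywords_matched": ["transformer"],
}
print(format_authors(paper["authors"]))
print(json.dumps(to_json_entry(paper), indent=2))
```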

CLI polish & API surface

--version flag
--quiet suppresses progress status lines on stderr
paperweight init prints clean error to stderr (no traceback) when config exists
__init__.py exposes __version__ and key public functions (load_config, get_recent_papers, score_papers, etc.)
python -c "import paperweight; print(paperweight.__version__)" now works

Pipeline improvements

Profile switching via --profile NAME flag and PAPERWEIGHT_PROFILE env var
Metadata cache (metadata_cache config section) to skip repeated arXiv API calls within a TTL window
Async per-paper triage (same semaphore pattern as summaries, concurrency = 3)
Compact triage rationales (max 20 words, whitespace-normalized, truncated to 160 chars)
Per-call LLM timeout (45s) for triage and summary to prevent hanging runs
Progress logging during triage and summary LLM calls
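
The compact-rationale rule (max 20 words, whitespace-normalized, truncated to 160 chars) could look roughly like this. The function name and the order of the two limits (words first, then characters) are assumptions.

```python
def compact_rationale(text: str, max_words: int = 20, max_chars: int = 160) -> str:
    """Normalize whitespace, keep the first max_words words, cap the length."""
    # str.split() with no arguments collapses runs of spaces, tabs, and newlines.
    words = text.split()
    compact = " ".join(words[:max_words])
    # Hard character cap; strip a trailing space left by the cut, if any.
    return compact[:max_chars].rstrip()
```

For example, `compact_rationale("  relevant \n to\tagents ")` returns `"relevant to agents"`.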

Docs

ROADMAP.md: v0.3 gains CLI polish section, v0.4 gains typed data structures / immutable processing
CLI.md: synced with implementation (JSON schema, --quiet, --version, init behavior)
CHANGELOG.md: updated with all v0.3.0 additions and changes
TODOs added for _parse_args fallback simplification and __main__ redundant exception handler

Test plan

python -m pytest tests/ -v --ignore=tests/api — 114 passed, 1 skipped
paperweight --version prints paperweight 0.3.0
python -c "import paperweight; print(paperweight.__version__)" prints 0.3.0
paperweight init with existing config → clean error, no traceback
Manual: paperweight doctor --profile fast with a profiles section
Manual: paperweight run --force-refresh with metadata_cache.enabled: true — second run hits cache

Deferred to v0.4

Typed data structures (Dict[str, Any] → Paper dataclass/Pydantic)
Immutable processing (eliminate in-place mutation in process_papers)
AI enrichment polish and feedback loop

…progress logging

Add profile switching (--profile / PAPERWEIGHT_PROFILE), a metadata cache to skip repeated arXiv API calls within a TTL window, per-paper async triage with semaphore (replacing run_many), compact triage rationales, per-call LLM timeouts (45 s), and progress logging for triage/summary. Drops run_many dependency. No new user-facing config knobs for LLM internals. ~240 net new lines of source, all tests green.
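
The semaphore-plus-timeout pattern described in this commit can be sketched as follows. `triage_all` and `fake_triage` are hypothetical names, and returning `None` on timeout is an assumption about the fallback; the actual code may handle a timed-out call differently.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, Optional

Paper = Dict[str, Any]


async def triage_all(
    papers: List[Paper],
    triage_one: Callable[[Paper], Awaitable[Dict[str, Any]]],
    concurrency: int = 3,
    timeout_s: float = 45.0,
) -> List[Optional[Dict[str, Any]]]:
    """Triage papers with bounded concurrency and a per-call timeout."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(paper: Paper) -> Optional[Dict[str, Any]]:
        async with semaphore:  # at most `concurrency` calls in flight
            try:
                return await asyncio.wait_for(triage_one(paper), timeout=timeout_s)
            except asyncio.TimeoutError:
                return None  # assumed fallback: skip the paper, keep the run alive

    # gather preserves input order, so results line up with papers.
    return await asyncio.gather(*(guarded(p) for p in papers))


async def fake_triage(paper: Paper) -> Dict[str, Any]:
    """Stand-in for an LLM call; sleeps for the paper's configured delay."""
    await asyncio.sleep(paper["delay"])
    return {"id": paper["id"], "score": 1.0}


demo_papers = [{"id": "a", "delay": 0.0}, {"id": "b", "delay": 0.5}]
results = asyncio.run(triage_all(demo_papers, fake_triage, timeout_s=0.05))
```

In the demo, the second paper exceeds the shortened timeout and yields `None` while the first completes normally.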

… lockfile version sync

- Fix implicit Optional in _doctor() profile parameter (mypy)
- Suppress C901 complexity warnings for get_recent_papers and load_config
- Clean up kick_tires.py: remove unused imports, sort imports, strip blank-line whitespace
- Update uv.lock to match v0.3.0

… parallel hydration, progress UX

Reorder the pipeline to score on title+abstract before fetching content, eliminating the dominant bottleneck (50-200s of PDF/source downloads) for the default abstract analyzer. Content is now fetched only in summary mode and only for papers that survive both triage and scoring.

- Fix metadata cache bug: move cache lookup before days==0 early return so same-day repeat runs hit cache instead of returning empty
- Enable metadata cache by default (code + config)
- Decompose pipeline: triage → score → conditional hydrate → summarize
- Parallelize content fetching with ThreadPoolExecutor (default 6 workers)
- Add config-driven concurrency section (content_fetch, triage, summary)
- Add --quiet flag and ProgressReporter for stderr status lines
- Update tests for new pipeline flow (abstract mode skips hydration)
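
A simplified sketch of the reordered flow: score first, then hydrate only the survivors, and only in summary mode. Triage and summarization are omitted for brevity, and `run_pipeline`, `score`, and `fetch_content` are hypothetical names rather than the real function signatures.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

Paper = Dict[str, Any]


def run_pipeline(
    papers: List[Paper],
    score: Callable[[Paper], float],
    fetch_content: Callable[[Paper], str],
    min_score: float = 3,
    analyzer: str = "abstract",
    workers: int = 6,
) -> List[Paper]:
    """Score on title + abstract, then fetch content only if it is needed."""
    survivors = [p for p in papers if score(p) >= min_score]

    # Abstract mode (the default) never downloads PDFs or sources.
    if analyzer != "summary" or not survivors:
        return survivors

    # Summary mode: fetch content for survivors in parallel threads.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        contents = list(pool.map(fetch_content, survivors))
    return [{**p, "content": c} for p, c in zip(survivors, contents)]


fetched: List[int] = []


def fake_fetch(paper: Paper) -> str:
    """Stand-in for a PDF/source download; records which papers were fetched."""
    fetched.append(paper["id"])
    return "body"


def keyword_score(paper: Paper) -> float:
    return 5 if "llm" in paper["title"] else 0


demo = [{"id": 1, "title": "llm agents"}, {"id": 2, "title": "graph theory"}]
abstract_out = run_pipeline(demo, keyword_score, fake_fetch)
fetched_after_abstract = list(fetched)
summary_out = run_pipeline(demo, keyword_score, fake_fetch, analyzer="summary")
```

`pool.map` returns results in input order, which is what lets the contents be zipped back onto the surviving papers.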

- Populate __init__.py with __version__ and key function re-exports (load_config, get_recent_papers, score_papers, etc.)
- Add --version flag to CLI parser (routes through _parse_args)
- Catch ValueError in init when config exists; print clean message to stderr instead of traceback, return exit code 1
- Update ROADMAP.md: v0.3 gains CLI polish section, v0.4 gains typed data structures and immutable processing
- Update tests for new init behavior; add version + public API tests

Addresses review feedback on library ergonomics and CLI polish.
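
The init behavior described above (catch ValueError, print to stderr, return exit code 1) might be sketched like this. `write_default_config` and `cmd_init` are hypothetical names, and the config file contents and error wording are placeholders.

```python
import sys
import tempfile
from pathlib import Path


def write_default_config(path: Path) -> None:
    """Create a starter config, refusing to overwrite an existing file."""
    if path.exists():
        raise ValueError(f"config already exists: {path}")
    path.write_text("arxiv:\n  categories: []\n", encoding="utf-8")


def cmd_init(path: Path) -> int:
    """Return a process exit code instead of letting the exception escape."""
    try:
        write_default_config(path)
    except ValueError as exc:
        # One clean line on stderr, no traceback.
        print(f"error: {exc}", file=sys.stderr)
        return 1
    return 0


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "config.yaml"
    first = cmd_init(target)   # creates the file
    second = cmd_init(target)  # file exists: clean error, exit code 1
```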

- CLI.md: document accurate JSON schema (10 always-present + 3 conditional fields), add --quiet and --version, update init behavior
- CHANGELOG.md: add --version, public API surface, init safety to v0.3.0 section
- main.py: add TODO for _parse_args fallback simplification and redundant __main__ exception handler

Config resilience:
- triage disabled by default (opt-in via config)
- log file optional (stderr-only when logging.file omitted)
- lower default min_score (10 → 3) for better out-of-box experience
- DEFAULT_CONFIG ensures partial/minimal configs never crash

Performance (lazy imports):
- pollux (Config, RetryPolicy, Source, run) imported at call site
- psycopg imported at call site in db.py
- eliminates ~1s startup cost for non-AI/non-DB paths

Richer digest rendering:
- text digest: authors (truncated at 3), matched keywords
- atom feed: author elements, category terms
- json digest: full structured schema (arxiv_id, authors, categories, pdf_url, keywords_matched, conditional triage_score/triage_rationale)

Scraper & utils:
- parallel category fetching
- config deep-merge with DEFAULT_CONFIG
- triage_enabled default flipped from True to False

Test hardening:
- notifier tests cover authors, keywords_matched, rich JSON schema
- processor tests cover count_keywords and keywords_matched propagation
- config tests cover DEFAULT_CONFIG merge and triage-disabled default
- db tests updated for lazy import patch paths
- analyzer tests updated for lazy import patch paths
- CLI integration tests cover zero-state hint and rich JSON fields

Sean Brar added 8 commits February 14, 2026 22:55

Simplify LLM fallback behavior and triage batching

1501023

Simplify content fetching loop and max-items flow

1df7e92

seanbrar changed the title ~~feat(release): ship v0.3.0 — profiles, metadata cache, async triage~~ v0.3.0 — config resilience, richer metadata, performance, CLI polish Feb 17, 2026

seanbrar merged commit c525d2c into main Feb 17, 2026
3 checks passed

seanbrar merged 8 commits into main from release/v0.3
