v0.3.0: config resilience, richer metadata, performance, CLI polish #7
Merged
added 8 commits
February 14, 2026 22:55
…progress logging

Add profile switching (--profile / PAPERWEIGHT_PROFILE), a metadata cache that skips repeated arXiv API calls within a TTL window, per-paper async triage with a semaphore (replacing run_many), compact triage rationales, per-call LLM timeouts (45 s), and progress logging for triage/summary. Drops the run_many dependency. No new user-facing config knobs for LLM internals. ~240 net new lines of source, all tests green.
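The per-paper async triage with a semaphore and per-call timeout described in this commit could look roughly like the sketch below. It is not paperweight's code: `triage_one`, `triage_all`, the concurrency limit of 4, and the choice to keep a paper when its call times out are all assumptions made for illustration. Only the 45 s timeout comes from the commit message.

```python
import asyncio


async def triage_one(paper: dict) -> dict:
    # Hypothetical stand-in for the real LLM triage call.
    await asyncio.sleep(0.01)
    return {"id": paper["id"], "keep": "llm" in paper["title"].lower()}


async def triage_all(papers, max_concurrency=4, timeout_s=45.0):
    # The semaphore caps how many triage calls are in flight at once.
    sem = asyncio.Semaphore(max_concurrency)

    async def guarded(paper):
        async with sem:
            try:
                # Each call gets its own timeout, so one slow paper
                # cannot stall the whole batch.
                return await asyncio.wait_for(triage_one(paper), timeout=timeout_s)
            except asyncio.TimeoutError:
                # Assumption: on timeout, keep the paper rather than drop it.
                return {"id": paper["id"], "keep": True}

    # gather() returns results in the same order as the input papers.
    return await asyncio.gather(*(guarded(p) for p in papers))


papers = [
    {"id": "a", "title": "LLM agents"},
    {"id": "b", "title": "Graph theory"},
]
results = asyncio.run(triage_all(papers))
```

Handling the timeout inside `guarded` keeps a single failure from cancelling the other tasks in the `gather` call.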
… lockfile version sync

- Fix implicit Optional in _doctor() profile parameter (mypy)
- Suppress C901 complexity warnings for get_recent_papers and load_config
- Clean up kick_tires.py: remove unused imports, sort imports, strip blank-line whitespace
- Update uv.lock to match v0.3.0
… parallel hydration, progress UX

Reorder the pipeline to score on title+abstract before fetching content, eliminating the dominant bottleneck (50-200 s of PDF/source downloads) for the default abstract analyzer. Content is now fetched only in summary mode and only for papers that survive both triage and scoring.

- Fix metadata cache bug: move cache lookup before the days==0 early return so same-day repeat runs hit the cache instead of returning empty
- Enable metadata cache by default (code + config)
- Decompose pipeline: triage → score → conditional hydrate → summarize
- Parallelize content fetching with ThreadPoolExecutor (default 6 workers)
- Add config-driven concurrency section (content_fetch, triage, summary)
- Add --quiet flag and ProgressReporter for stderr status lines
- Update tests for new pipeline flow (abstract mode skips hydration)
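As a rough illustration of the parallel hydration step, the sketch below fetches content for the surviving papers with a `ThreadPoolExecutor`. The names `fetch_content` and `hydrate`, the dict shape, and the placeholder IDs are hypothetical. Only the executor and the default of 6 workers come from the commit message.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_content(paper: dict) -> str:
    # Hypothetical placeholder for a blocking PDF/source download.
    return f"content for {paper['arxiv_id']}"


def hydrate(papers, max_workers=6):
    """Fetch content for each paper in parallel, preserving input order."""
    if not papers:
        return []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, whichever finishes first.
        contents = list(pool.map(fetch_content, papers))
    # Return new dicts instead of mutating the inputs.
    return [{**p, "content": c} for p, c in zip(papers, contents)]


# Placeholder IDs, not real papers.
survivors = [{"arxiv_id": "0000.00001"}, {"arxiv_id": "0000.00002"}]
hydrated = hydrate(survivors)
```

Threads suit this step because downloads are I/O-bound. Calling `hydrate` only after triage and scoring matches the reordered pipeline, so papers that are filtered out never trigger a download.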
- Populate __init__.py with __version__ and key function re-exports (load_config, get_recent_papers, score_papers, etc.)
- Add --version flag to CLI parser (routes through _parse_args)
- Catch ValueError in init when config exists; print clean message to stderr instead of traceback, return exit code 1
- Update ROADMAP.md: v0.3 gains CLI polish section, v0.4 gains typed data structures and immutable processing
- Update tests for new init behavior; add version + public API tests

Addresses review feedback on library ergonomics and CLI polish.
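A minimal sketch of the two CLI behaviors above, assuming an argparse-based parser. The real implementation routes through `_parse_args`, which is not shown here. `build_parser`, `write_default_config`, and the `config.yaml` path are hypothetical names used only for this example.

```python
import argparse
import sys

__version__ = "0.3.0"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="paperweight")
    # argparse's built-in "version" action prints the string and exits.
    parser.add_argument(
        "--version", action="version", version=f"%(prog)s {__version__}"
    )
    sub = parser.add_subparsers(dest="command")
    sub.add_parser("init")
    return parser


def write_default_config(path: str) -> None:
    # Hypothetical helper: here it always behaves as if the config exists.
    raise ValueError(f"config already exists: {path}")


def main(argv=None) -> int:
    args = build_parser().parse_args(argv)
    if args.command == "init":
        try:
            write_default_config("config.yaml")
        except ValueError as exc:
            # Clean one-line message on stderr instead of a traceback.
            print(f"error: {exc}", file=sys.stderr)
            return 1
    return 0


exit_code = main(["init"])
```

Returning an integer from `main` instead of calling `sys.exit` inside it makes the exit code easy to assert on in tests.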
- CLI.md: document accurate JSON schema (10 always-present + 3 conditional fields), add --quiet and --version, update init behavior
- CHANGELOG.md: add --version, public API surface, init safety to v0.3.0 section
- main.py: add TODO for _parse_args fallback simplification and redundant __main__ exception handler
Config resilience:
- triage disabled by default (opt-in via config)
- log file optional (stderr-only when logging.file omitted)
- lower default min_score (10 → 3) for better out-of-box experience
- DEFAULT_CONFIG ensures partial/minimal configs never crash

Performance (lazy imports):
- pollux (Config, RetryPolicy, Source, run) imported at call site
- psycopg imported at call site in db.py
- eliminates ~1s startup cost for non-AI/non-DB paths

Richer digest rendering:
- text digest: authors (truncated at 3), matched keywords
- atom feed: author elements, category terms
- json digest: full structured schema (arxiv_id, authors, categories, pdf_url, keywords_matched, conditional triage_score/triage_rationale)

Scraper & utils:
- parallel category fetching
- config deep-merge with DEFAULT_CONFIG
- triage_enabled default flipped from True to False

Test hardening:
- notifier tests cover authors, keywords_matched, rich JSON schema
- processor tests cover count_keywords and keywords_matched propagation
- config tests cover DEFAULT_CONFIG merge and triage-disabled default
- db tests updated for lazy import patch paths
- analyzer tests updated for lazy import patch paths
- CLI integration tests cover zero-state hint and rich JSON fields
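The config deep-merge with DEFAULT_CONFIG could be sketched as below. The key layout here (`processor.min_score`, `triage.enabled`, `logging.level`) is an assumption for the example, not paperweight's actual schema, and `deep_merge` is a hypothetical name.

```python
import copy

# Illustrative subset only; the real DEFAULT_CONFIG layout is assumed here.
DEFAULT_CONFIG = {
    "processor": {"min_score": 3},
    "triage": {"enabled": False},
    "logging": {"level": "INFO"},
}


def deep_merge(defaults: dict, overrides: dict) -> dict:
    """Return defaults overlaid with overrides, recursing into nested dicts."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are sections: merge key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists, and new sections replace the default outright.
            merged[key] = copy.deepcopy(value)
    return merged


# A minimal user config that sets only categories and one logging option.
user_config = {"arxiv": {"categories": ["cs.CL"]}, "logging": {"level": "DEBUG"}}
config = deep_merge(DEFAULT_CONFIG, user_config)
```

Because every default section survives the merge, code that reads a key such as `config["triage"]["enabled"]` does not raise `KeyError` on a partial config.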
Summary
v0.3 makes paperweight faster, more resilient to partial configs, richer in output, and more polished as a CLI tool.
Config resilience
- DEFAULT_CONFIG ensures partial/minimal configs never crash; paperweight run works with only arxiv.categories set
- Log file optional (stderr-only when logging.file omitted)
- Lower default min_score (10 → 3) for better out-of-box results

Performance
- Lazy imports (pollux, psycopg): eliminates ~1s startup cost for non-AI/non-DB paths

Richer metadata & output
- Atom feed: <author> elements, <category> terms
- JSON digest: arxiv_id, authors, categories, pdf_url, keywords_matched, conditional triage_score / triage_rationale / summary
- docs/CLI.md updated with accurate JSON schema (10 always-present + 3 conditional fields)

CLI polish & API surface
- --version flag
- --quiet suppresses progress status lines on stderr
- paperweight init prints clean error to stderr (no traceback) when config exists
- __init__.py exposes __version__ and key public functions (load_config, get_recent_papers, score_papers, etc.)
- python -c "import paperweight; print(paperweight.__version__)" now works

Pipeline improvements
- Profile switching: --profile NAME flag and PAPERWEIGHT_PROFILE env var
- Metadata cache (metadata_cache config section) to skip repeated arXiv API calls within a TTL window

Docs
- ROADMAP.md: v0.3 gains CLI polish section, v0.4 gains typed data structures / immutable processing
- CLI.md: synced with implementation (JSON schema, --quiet, --version, init behavior)
- CHANGELOG.md: updated with all v0.3.0 additions and changes
- main.py: TODO added for _parse_args fallback simplification and __main__ redundant exception handler

Test plan
- python -m pytest tests/ -v --ignore=tests/api: 114 passed, 1 skipped
- paperweight --version prints paperweight 0.3.0
- python -c "import paperweight; print(paperweight.__version__)" prints 0.3.0
- paperweight init with existing config → clean error, no traceback
- paperweight doctor --profile fast with a profiles section
- paperweight run --force-refresh with metadata_cache.enabled: true; second run hits cache

Deferred to v0.4
- Typed data structures (Dict[str, Any] → Paper dataclass/Pydantic)
- Immutable processing (process_papers)
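For the deferred typed-data-structures item, one possible shape is a frozen dataclass, sketched below. The field set, the `with_score` helper, and the placeholder values are assumptions. The v0.4 design is not settled in this PR and may use Pydantic instead.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Paper:
    # Hypothetical field set; the eventual v0.4 model may differ.
    arxiv_id: str
    title: str
    authors: Tuple[str, ...] = ()
    categories: Tuple[str, ...] = ()
    score: Optional[float] = None


def with_score(paper: Paper, score: float) -> Paper:
    # Immutable processing: return a new Paper instead of mutating the input.
    return replace(paper, score=score)


# Placeholder record in the current Dict[str, Any] style.
raw = {
    "arxiv_id": "0000.00001",
    "title": "Example",
    "authors": ["A. Author"],
    "categories": ["cs.CL"],
}
paper = Paper(
    arxiv_id=raw["arxiv_id"],
    title=raw["title"],
    authors=tuple(raw["authors"]),
    categories=tuple(raw["categories"]),
)
scored = with_score(paper, 4.0)
```

With `frozen=True`, assigning to a field raises `dataclasses.FrozenInstanceError`, so each pipeline stage has to return a new object.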