Releases: thiswillbeyourgithub/wdoc
Release 4.0.1
What's new
This release focuses on langfuse v3 compatibility and improved error handling.
🐛 Fixes
- Langfuse v3 compatibility
- Document loading robustness
📝 Documentation
- [56866d1] Add warning for using youtube audio backend instead of whisper or deepgram
🔧 Maintenance
- [fb49e60] Bump version 4.0.0 → 4.0.1
Commits details since the last release
- [fb49e60] by @thiswillbeyourgithub, 13 seconds ago:
bump version 4.0.0 -> 4.0.1
bumpver.toml
docs/source/conf.py
setup.py
wdoc/wdoc.py
- [07257e0] by @thiswillbeyourgithub, 3 minutes ago:
fix: use langfuse opentelemetry for v3
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/misc.py
- [56866d1] by @thiswillbeyourgithub, 11 minutes ago:
doc: add warning for using the youtube audio backend instead of whisper or deepgram
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/loaders/youtube.py
- [89f5132] by @thiswillbeyourgithub, 14 minutes ago:
fix: langfuse callback import changed for langfuse v3
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/misc.py
- [3039bcf] by @thiswillbeyourgithub, 20 minutes ago:
fix: do not crash if no documents after transform_documents is ran
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/loaders/__init__.py
- [101c7f7] by @thiswillbeyourgithub, 29 minutes ago:
add assert that docs were found
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/loaders/__init__.py
Release 4.0.0
What's new
This release focuses on major performance improvements through lazy loading and deferred imports, extensive code refactoring for better maintainability, and improved testing infrastructure.
⚡ Performance
- Significantly faster startup time through deferred imports and lazy loading [52985d5, dce3c24, 3ffaec3]
- Moved litellm imports to run only when needed [52985d5]
- Deferred requests import [0b4c2fb]
- Removed eager imports from `__init__.py` files [306d4ca]
- Moved imports in loaders, embeddings, and core modules [de1cecc, 08b9206, fd2dcba, 1838e0f, 1bd4ced, f1740c4, 2b3d9e8, 6fbe51d, 6c74d8e, f306325]
- Added lazy loading for document loaders with `WDOC_LAZY_LOAD` env var [7fc5fad, ce10c4b]
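The deferred-import idea behind these changes can be sketched as follows. This is a minimal illustration, not wdoc's actual code: the `LazyModule` class and `load_module` helper are hypothetical names, and the way `WDOC_LAZY_LOAD` is parsed here (any of `1`, `true`, `yes`) is an assumption.

```python
import importlib
import os
from types import ModuleType
from typing import Optional


class LazyModule:
    """Proxy that postpones the real import until an attribute is first used."""

    def __init__(self, name: str) -> None:
        self._name = name
        self._module: Optional[ModuleType] = None

    def __getattr__(self, attr: str):
        # __getattr__ only runs for names not found on the proxy itself,
        # so reading _name or _module never triggers an import.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)


def load_module(name: str):
    """Return a lazy proxy or the real module, depending on the env var."""
    # Assumption: these truthy strings enable lazy loading; real parsing may differ.
    lazy = os.environ.get("WDOC_LAZY_LOAD", "true").lower() in ("1", "true", "yes")
    return LazyModule(name) if lazy else importlib.import_module(name)


json_mod = LazyModule("json")    # the proxy has not imported anything yet
data = json_mod.loads("[1, 2]")  # first attribute access performs the import
```

The cost of the import is paid only by code paths that actually touch the module, which is what makes startup faster when most loaders are never used in a given run.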
🔧 Fixes
- Fixed forward reference type hints across multiple modules [fd6a7e7, 22b44b4, 15a2746]
- Fixed signature wrapping for parse function [29dbf5d]
- Fixed API tests for DuckDuckGo and OpenRouter [8b9ebc2, 8f511dd, 32e036d, 048f99e]
- Fixed missing filetype handling in edge cases [0422dec]
- Fixed error for Word document loading [8cad00d]
- Fixed lazy loading logic (was reversed) [a35446f]
- Fixed query_task and search_task output handling [6f633e8, 8b95a81]
- Fixed error when summary doesn't output to file using pipe [2a85a6b]
- Fixed imports in loaders [ebd4558, af85343, 986abd2, 4e61a6f]
- Added missing `audioop-lts` requirement for Python 3.13+ [56bd634]
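Forward-reference hints and deferred imports tend to go together: once a heavy module is no longer imported at the top of a file, annotations that name its classes have to be written as strings. A small sketch of the pattern, using the standard-library `decimal` module as a stand-in for a heavy dependency (the function name is hypothetical):

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen by static type checkers only; never executed at runtime.
    from decimal import Decimal


def double(value: "Decimal") -> "Decimal":
    """The quoted annotation is stored as a string and not evaluated at import time."""
    return value * 2
```

Runtime type checkers need to resolve such strings later, which is one place where forward references can break if the name is not importable at that point.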
♻️ Refactoring
- Modularized loaders: Split monolithic loader file into separate modules [df1a0ad, d3ed873, f0a3fce, b249068, 984a8d3, def441f, fb421cc]
- Created dedicated files for PDF, Anki, URL, audio, HTML, and other loaders
- Enabled lazy loading of loader modules [7fc5fad]
- Extracted task-specific functions to separate modules:
  - Moved `parse_doc` to `utils/tasks/parse.py` [1c7c6e4]
  - Moved query/search retrieval logic to task modules [7982051, c2e6142]
  - Moved `evaluate_doc_chain` to `shared_query_search.py` [8965c48]
  - Extracted query splitting logic to shared utility [4bb54a5]
  - Moved `source_replace` to query.py [0ce5f4f]
  - Moved `autoincrease_top_k` to query.py [38e82b4]
- Split search and query task methods with better type hints [1d94644, 824f395, 319b8eb]
- Moved `debug_exceptions` to logger module [99cc99f]
- Moved VectorStore filtering code to filters.py [de4ce57]
- Added `wdocSummary` dataclass for type hinting [9fc51c0, 92f5c47]
- Added lazy caching for `all_texts` property [79b1661, 7b45948]
- Removed obsolete `import_tricks.py` [5116616]
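One common way to get lazy caching for a property such as `all_texts` is `functools.cached_property`. Whether wdoc uses this exact mechanism is not stated here, and the `Corpus` class below is hypothetical; the sketch only shows that the value is computed on first access and reused afterwards.

```python
from functools import cached_property
from typing import List


class Corpus:
    """Hypothetical container whose derived text list is expensive to build."""

    def __init__(self, docs: List[str]) -> None:
        self.docs = docs
        self.computations = 0  # counts how often the property body runs

    @cached_property
    def all_texts(self) -> List[str]:
        # Runs once; the result is then stored on the instance.
        self.computations += 1
        return [doc.strip() for doc in self.docs]


corpus = Corpus(["  first  ", "second "])
texts_a = corpus.all_texts  # computed here
texts_b = corpus.all_texts  # served from the cache
```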
🧪 Testing
- Improved test cleanup and temp folder removal [a768642, 35ef63e, c149f5d, 913378a]
- Better verbose output in cost tests [342ad3f]
- Use Mistral for OpenRouter API tests (zero data retention) [8f511dd]
- Added shell-based CLI test script for more reliable testing [cc74a84, 4170567]
- Added check for `wdoc[full]` installation [7cb9a3c]
- Updated Ollama embedding test to use `embeddingsgemma` [4d47631]
- Improved test assertions with more info [3d0f947]
📦 Dependencies
- Bumped langchain version [98fd2cb]
- Bumped litellm version [7aa2ce1]
- Bumped langfuse version (litellm bug fix) [fc16e5e]
- Updated general dependencies [616457c]
- Added unstructured to required dependencies [c98d0e9]
- Added bumpver to dev packages [54be0e2]
✨ Features
- Added `wdoc[full]` installation option for all optional dependencies [6321942]
- Added beartype runtime type validation for numpy arrays [691dbff]
- Prioritize throughput and Groq when using OpenRouter [f049846]
- Enable lazy loading of imports by default [7c2e397]
📝 Documentation
- Updated default models to latest Gemini in README and help [761ddd1, 0868086, 78e562f]
- Clarified that binary embeddings are not always better [fb611c4]
- Added link explaining fixed cache of LLM issue [fdc3c64]
- Improved docstrings for summarization functions [a06f570]
- Added docstring for VectorStore filtering [2bd8dcb]
🎨 Code Quality
- PEP8 formatting improvements [559fcc0, 13dd7d1, e10fb05]
- Removed unused imports [16c518e, cb9563e, b5a42ef]
- Improved type hints [78d399f, a801718, 0875b23]
- Import logger first to set log level [2dead91]
- Removed `if True` statement [263a45d]
🔖 Version
- Bumped version 3.3.1 → 4.0.0 [e1548c4]
Commits details since the last release
- [e1548c4] by @thiswillbeyourgithub, 12 seconds ago:
bump version 3.3.1 -> 4.0.0
bumpver.toml
docs/source/conf.py
setup.py
wdoc/wdoc.py
- [54be0e2] by @thiswillbeyourgithub, 47 seconds ago:
add bumpver to dev packages
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
setup.py
- [37e80a7] by @thiswillbeyourgithub, 2 minutes ago:
doc: todo
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
README.md
- [a768642] by @thiswillbeyourgithub, 3 minutes ago:
better trash removal
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
tests/run_all_tests.sh
- [35ef63e] by @thiswillbeyourgithub, 16 minutes ago:
less verbose test removal of temp folders
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
tests/run_all_tests.sh
- [c149f5d] by @thiswillbeyourgithub, 30 minutes ago:
enh: delete cache dir at start of test
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
tests/run_all_tests.sh
- [ae6f28a] by @thiswillbeyourgithub, 32 minutes ago:
minor: name of a test
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
tests/test_wdoc.py
- [342ad3f] by @thiswillbeyourgithub, 39 minutes ago:
better verbose output in cost test
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
tests/test_wdoc.py
- [8f511dd] by @thiswillbeyourgithub, 63 minutes ago:
fix: use mistral instead of openai when testing api from openrouter because it supports zero data retention
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
tests/test_wdoc.py
- [8b9ebc2] by @thiswillbeyourgithub, 65 minutes ago:
fix: api test for ddg from the shell
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
tests/test_cli.sh
- [fd6a7e7] by @thiswillbeyourgithub, 72 minutes ago:
fix forward reference for typehinting
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/wdoc.py
- [1428497] by @thiswillbeyourgithub, 72 minutes ago:
fix forgot to test api using cli script
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
tests/run_all_tests.sh
- [22b44b4] by @thiswillbeyourgithub, 85 minutes ago:
fix forward reference type hints
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/tasks/query.py
wdoc/wdoc.py
- [15a2746] by @thiswillbeyourgithub, 89 minutes ago:
fix forward reference type hints
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/llm.py
wdoc/utils/retrievers.py
wdoc/utils/tasks/query.py
wdoc/utils/tasks/summarize.py
- [29dbf5d] by @thiswillbeyourgithub, 2 hours ago:
fix: signature wrapping for parse
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/main.py
wdoc/wdoc.py
- [7aa2ce1] by @thiswillbeyourgithub, 2 hours ago:
bump litellm
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
setup.py
- [98fd2cb] by @thiswillbeyourgithub, 2 hours ago:
bump langchain version
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
setup.py
- [616457c] by @thiswillbeyourgithub, 2 hours ago:
bump dependencies
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
setup.py
- [dce3c24] by @thiswillbeyourgithub, 2 hours ago:
actually use lazy import for litellm
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/misc.py
- [52985d5] by @thiswillbeyourgithub, 3 hours ago:
better startup time by defering litellm import
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/customs/litellm_embeddings.py
wdoc/utils/embeddings.py
wdoc/utils/llm.py
wdoc/utils/loaders/shared_audio.py
wdoc/utils/misc.py
wdoc/utils/retrievers.py
wdoc/utils/tasks/query.py
wdoc/utils/tasks/summarize.py
wdoc/wdoc.py
- [559fcc0] by @thiswillbeyourgithub, 3 hours ago:
pep8
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/__init__.py
- [0b4c2fb] by @thiswillbeyourgithub, 3 hours ago:
defer requests import
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/loaders/shared_audio.py
wdoc/utils/misc.py
- [78d399f] by @thiswillbeyourgithub, 3 hours ago:
type hint for multiquery retriever
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/retrievers.py
- [13dd7d1] by @thiswillbeyourgithub, 3 hours ago:
pep8
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/main.py
wdoc/utils/customs/binary_faiss_vectorstore.py
wdoc/utils/embeddings.py
wdoc/utils/env.py
wdoc/utils/interact.py
wdoc/utils/llm.py
wdoc/utils/misc.py
wdoc/utils/prompts.py
wdoc/utils/tasks/parse.py
wdoc/utils/tasks/query.py
wdoc/utils/tasks/search.py
- [6ce23d4] by @thiswillbeyourgithub, 4 hours ago:
fix import statements
Signed-off-by: thiswillbeyourgithub <26625900+thiswillbeyourgithub@users.noreply...
Release 3.3.1
What's new
This release focuses on improving code quality through comprehensive type hint fixes and enhanced testing infrastructure.
🔧 Fixes
- Type Hints: Comprehensive type hint improvements across the codebase
  - Binary FAISS vectorstore type hints ([e46ed4a], [ac02a65], [da864cd], [95c3705], [81b36cb], [be3f352])
  - Loader function type hints ([e6fcad8], [e65abad], [b624373])
  - Semantic batching type hints ([f3a5289], [dd6ad29])
  - Prompt template type hints ([1b4ec86], [bc56beb])
  - General type hint fixes ([d4e99fd])
- Model Compatibility: Fixed issue where some models consider `<answer>` as implying `</think>` ([09684bb])
- Langchain Integration: Fixed callable_chain compatibility by creating runnables without decorators ([0c89cac])
✨ Enhancements
- Type Checking: Replaced manual type checking with import hook system ([56b353a], [6b3ddab])
- Logging: Reduced verbosity of litellm logging ([9a4a69c])
- Search: Added duplicate check for DuckDuckGo search results ([f68c8a4])
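A duplicate check on search results usually amounts to an order-preserving de-duplication of URLs. The sketch below is an assumption about the general technique, not the code in `batch_file_loader.py`; the `dedupe_urls` name and the trailing-slash normalization are illustrative choices.

```python
from typing import Iterable, List


def dedupe_urls(urls: Iterable[str]) -> List[str]:
    """Drop repeated URLs while keeping the first occurrence and the original order."""
    seen = set()
    unique = []
    for url in urls:
        key = url.rstrip("/")  # treat "https://a.example/" and "https://a.example" alike
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique


results = dedupe_urls(
    ["https://a.example/", "https://a.example", "https://b.example"]
)
```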
🧪 Tests
- Added comprehensive test for DuckDuckGo search functionality ([7dbd3c2])
- Fixed existing CLI tests ([781f6d6])
📦 Version
- Bumped version from 3.3.0 to 3.3.1 ([0690df9])
Commits details since the last release
- [0690df9] by @thiswillbeyourgithub, 41 seconds ago:
bump version 3.3.0 -> 3.3.1
bumpver.toml
docs/source/conf.py
setup.py
wdoc/wdoc.py
- [7dbd3c2] by @thiswillbeyourgithub, 8 hours ago:
test: add test for DuckDuckGo search functionality
Co-authored-by: aider (openrouter/anthropic/claude-sonnet-4) aider@aider.chat
tests/test_wdoc.py
- [781f6d6] by @thiswillbeyourgithub, 9 hours ago:
fix: test
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
tests/test_cli.py
- [e46ed4a] by @thiswillbeyourgithub, 14 hours ago:
fix: typehint for marginal score search
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/customs/binary_faiss_vectorstore.py
- [ac02a65] by @thiswillbeyourgithub, 18 hours ago:
fix: type hint of binary faiss
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/customs/binary_faiss_vectorstore.py
- [f68c8a4] by @thiswillbeyourgithub, 19 hours ago:
add check for duplicate ddg result
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/batch_file_loader.py
- [da864cd] by @thiswillbeyourgithub, 19 hours ago:
fix: binary faiss type hints
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/customs/binary_faiss_vectorstore.py
- [9a4a69c] by @thiswillbeyourgithub, 19 hours ago:
enh: tune down the verbosity of litellm
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/wdoc.py
- [e6fcad8] by @thiswillbeyourgithub, 19 hours ago:
fix: type hint of load_one_doc
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/loaders.py
- [e65abad] by @thiswillbeyourgithub, 20 hours ago:
Revert "fix: typehint of load_one_doc"
This reverts commit f0037b54ac5ce317442e672f12e1da266b58c5c1.
wdoc/utils/loaders.py
- [b624373] by @thiswillbeyourgithub, 20 hours ago:
fix: typehint of load_one_doc
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/loaders.py
- [95c3705] by @thiswillbeyourgithub, 20 hours ago:
fix: typehints for binary faiss
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/customs/binary_faiss_vectorstore.py
- [f3a5289] by @thiswillbeyourgithub, 20 hours ago:
fix: type hints for semantic batching
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/tasks/query.py
- [d4e99fd] by @thiswillbeyourgithub, 20 hours ago:
fix: type hints
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/wdoc.py
- [81b36cb] by @thiswillbeyourgithub, 21 hours ago:
forgot some type hint for binary faiss
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/customs/binary_faiss_vectorstore.py
- [dd6ad29] by @thiswillbeyourgithub, 21 hours ago:
fix: wrong typehint in semantic_batch
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/tasks/query.py
- [09684bb] by @thiswillbeyourgithub, 21 hours ago:
fix: some models consider than implied
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/misc.py
- [0c89cac] by @thiswillbeyourgithub, 22 hours ago:
fix: actually callable_chain does not work for langchain so we have to make runnables without decorators
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/customs/callable_runnable.py
wdoc/utils/misc.py
wdoc/utils/tasks/query.py
wdoc/wdoc.py
- [6b3ddab] by @thiswillbeyourgithub, 22 hours ago:
new: remove the ubiquitous optional_typecheck decorator
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/main.py
wdoc/utils/batch_file_loader.py
wdoc/utils/embeddings.py
wdoc/utils/interact.py
wdoc/utils/llm.py
wdoc/utils/loaders.py
wdoc/utils/logger.py
wdoc/utils/misc.py
wdoc/utils/prompts.py
wdoc/utils/retrievers.py
wdoc/utils/tasks/query.py
wdoc/utils/tasks/summarize.py
wdoc/utils/typechecker.py
wdoc/wdoc.py
- [56b353a] by @thiswillbeyourgithub, 22 hours ago:
new: neutralize manual type checking and instead use the import hook
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/__init__.py
wdoc/utils/customs/callable_runnable.py
wdoc/utils/misc.py
wdoc/utils/tasks/query.py
wdoc/utils/typechecker.py
wdoc/wdoc.py
- [be3f352] by @thiswillbeyourgithub, 22 hours ago:
fix: type hints in binary faiss
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/customs/binary_faiss_vectorstore.py
- [1b4ec86] by @thiswillbeyourgithub, 23 hours ago:
fix: wrong type for Prompts class
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/prompts.py
- [bc56beb] by @thiswillbeyourgithub, 23 hours ago:
add type checking to prompt template
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/prompts.py
Release 3.3.0
What's new
This release focuses on adding DuckDuckGo web search capabilities and introducing binary embeddings support for more efficient vector storage.
✨ New Features
DuckDuckGo Web Search Integration
- [372fe57] Add DuckDuckGo search support with URL extraction and metadata
- [273195e] Support `wdoc wdb "your query"` shorthand for web search
- [03bfe08] Add DuckDuckGo search tests and documentation
Binary Embeddings Support
- [c528bad] Add support for binary embeddings with 8x memory reduction
- [8f65197] Enable FAISS vectorstore compression by default
- [37ebd97] Create CompressedFAISS subclass with zlib compression
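For readers unfamiliar with binary embeddings, the core idea is to keep only the sign of each dimension and pack eight of those bits into one byte, then compare vectors by Hamming distance. The following is a pure-Python sketch of that idea under those assumptions; the function names are hypothetical and it does not reproduce wdoc's vectorstore code.

```python
from typing import Sequence


def binarize(vector: Sequence[float]) -> bytes:
    """Pack the sign bit of each dimension into bytes, most significant bit first."""
    packed = bytearray((len(vector) + 7) // 8)
    for i, value in enumerate(vector):
        if value > 0:
            packed[i // 8] |= 1 << (7 - i % 8)
    return bytes(packed)


def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equally sized packed vectors."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


query = binarize([0.5, -1.0, 2.0, -3.0, 0.1, 0.2, -0.3, 0.4])
other = binarize([1.0] * 8)
distance = hamming(query, other)
```

Because only signs survive, retrieval quality depends on the embedding model, which is consistent with the later note that binary embeddings are not always better.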
🐛 Bug Fixes
Core Functionality
- [0d72efd] Fix wrong decorator used for `load_one_doc`
- [edcf671] Fix `ddg_region` type (str not int)
- [66ab177] Fix type hints for `ddg_safesearch` and `loading_failure`
- [957936c] Use keyword arguments instead of fire when calling wdoc
Testing Environment
- [d3de58e] Fix piped input/output handling in pytest environment
- [42ff516] Prevent pipe usage in pytest environment
- [c78dc0b] Add pytest environment detection
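Detecting a pytest run can be done in a few ways. The sketch below relies on `PYTEST_CURRENT_TEST`, which pytest itself sets while a test is running, plus a check of the loaded modules; the commit list suggests wdoc sets its own variable from `tests/conftest.py`, whose name is not given here, so treat this as an illustration of the technique only.

```python
import os
import sys
from typing import Mapping, Optional


def running_under_pytest(
    env: Optional[Mapping[str, str]] = None,
    modules: Optional[Mapping[str, object]] = None,
) -> bool:
    """Best-effort check; arguments default to the real environment and module table."""
    env = os.environ if env is None else env
    modules = sys.modules if modules is None else modules
    return "PYTEST_CURRENT_TEST" in env or "pytest" in modules


def allow_piped_output(in_pytest: bool, stdout_is_tty: bool) -> bool:
    """Hypothetical policy: ignore piped output when pytest captures stdout."""
    return (not stdout_is_tty) and (not in_pytest)
```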
🧪 Testing Improvements
- [1b09996] Fix the `run_all_test` script
- [8ed1d0c] Add comprehensive DuckDuckGo search functionality tests
- [b184177] Split CLI tests into separate `test_cli.py` file
- [9d7fe9c] Split parsing tests into separate `test_parsing.py` file
- [12b012d] Move vector store tests to dedicated test file
📚 Documentation
- [d7d6b04] Explain how to run tests in README
- [dc15001] Clarify how to disable parallel processing
- [df4b79f] Document debug mode's effect on `loading_failure` default
- [1832299] Add shell examples for DuckDuckGo usage
🔧 Enhancements
CLI/UX Improvements
- [7e994a6] Rename `parse_file` function to `parse_doc`
- [4aa247e] Re-ask for input when empty query provided in CLI
- [57d5d5f] Fix Fire's pager issue in CLI
Performance & Reliability
- [68d4c75] Bump LiteLLM to latest version for improved startup time
- [ab9c5e9] Add parallel processing option for Whisper audio splits
- [6b13044] Add loop counter and crash protection for recursive file processing
🔄 Version Update
- [6435133] Bump version from 3.2.5 → 3.3.0
Commits details since the last release
- [6435133] by @thiswillbeyourgithub, 36 minutes ago:
bump version 3.2.5 -> 3.3.0
bumpver.toml
docs/source/conf.py
setup.py
wdoc/wdoc.py
- [1b09996] by @thiswillbeyourgithub, 24 hours ago:
test: fix the run_all_test script
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
tests/run_all_tests.sh
- [d7d6b04] by @thiswillbeyourgithub, 24 hours ago:
doc: explain how to run the tests
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
README.md
- [62cc2ce] by @thiswillbeyourgithub, 24 hours ago:
fix: ddg test
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
tests/test_cli.py
- [0d72efd] by @thiswillbeyourgithub, 24 hours ago:
fix: wrong decorator used for load_one_doc
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/loaders.py
- [dc15001] by @thiswillbeyourgithub, 24 hours ago:
doc: clarify how to disable parallel processing
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/docs/help.md
- [e0453cb] by @thiswillbeyourgithub, 24 hours ago:
minor: mention a type hint
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/batch_file_loader.py
- [edcf671] by @thiswillbeyourgithub, 24 hours ago:
fix: ddg_region is actually a str not an int
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/misc.py
- [df4b79f] by @thiswillbeyourgithub, 24 hours ago:
doc: mention that debug changes the default value for loading_failure
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/docs/help.md
- [66ab177] by @thiswillbeyourgithub, 25 hours ago:
fix: type of ddg_safesearch and loading_failure should be Literal
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/misc.py
- [98b0867] by @thiswillbeyourgithub, 25 hours ago:
doc: explain that loading_failure defaultto crash when parsing
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/docs/help.md
- [90eacb3] by @thiswillbeyourgithub, 25 hours ago:
test: ddg should use us region by default
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
tests/test_cli.py
- [c8b1944] by @thiswillbeyourgithub, 25 hours ago:
test: less severe check for pipes
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
tests/test_cli.py
- [6e12e5c] by @thiswillbeyourgithub, 25 hours ago:
test: remove one -n auto arg
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
tests/run_all_tests.sh
- [d3de58e] by @thiswillbeyourgithub, 2 days ago:
fix: actually inside pytest we should not bypass piped input but only piped output
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/env.py
wdoc/utils/misc.py
- [5715bc4] by @thiswillbeyourgithub, 2 days ago:
test: add env variable to detect if being called by pytest
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
tests/conftest.py
- [42ff516] by @thiswillbeyourgithub, 2 days ago:
new: do not allow using pipe input or output in pytest environment
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/misc.py
- [c78dc0b] by @thiswillbeyourgithub, 2 days ago:
new: detect when wdoc is called in pytest environment
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
tests/test_wdoc.py
wdoc/utils/env.py
wdoc/utils/misc.py
- [fca39c0] by @thiswillbeyourgithub, 2 days ago:
test: missing oneoff and failsafe when testing ddg
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
tests/test_cli.py
- [b2b4cf1] by @thiswillbeyourgithub, 2 days ago:
test: fix missing quotation sign for args
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
tests/test_cli.py
- [13409b1] by @thiswillbeyourgithub, 2 days ago:
test: fix a timeout not long enough
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
tests/test_cli.py
- [957936c] by @thiswillbeyourgithub, 2 days ago:
fix: use keyword aguments instead of fire when calling wdoc
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/main.py
- [b034337] by @thiswillbeyourgithub, 2 days ago:
minor
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/main.py
- [b44d730] by @thiswillbeyourgithub, 2 days ago:
fix: replacing ddg_max_result
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/main.py
- [dfcaf3b] by @thiswillbeyourgithub, 2 days ago:
fix: wrong way to replace ddg_max_result to ddg_max_results
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/main.py
- [adc991a] by @thiswillbeyourgithub, 2 days ago:
actually no
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/loaders.py
- [9ab4cbf] by @thiswillbeyourgithub, 2 days ago:
fix: type hint of load_one_doc can be a list of string in case of error
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/loaders.py
- [5f7fcf4] by @thiswillbeyourgithub, 2 days ago:
typo: Nvidia instead of NVidia
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
README.md
tests/test_cli.py
wdoc/docs/examples.md
- [03bfe08] by @thiswillbeyourgithub, 2 days ago:
test: add test for ddg search
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
tests/run_all_tests.sh
- [48165fa] by @thiswillbeyourgithub, 2 days ago:
test: clearer echo
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
tests/run_all_tests.sh
- [08cac94] by @thiswillbeyourgithub, 2 days ago:
remove unused import
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
tests/test_cli.py
- [3aabd2d] by @thiswillbeyourgithub, 2 days ago:
style: format test_cli.py with linter
Co-authored-by: aider (openrouter/anthropic/claude-sonnet-4) aider@aider.chat
tests/test_cli.py
- [8ed1d0c] by @thiswillbeyourgithub, 2 days ago:
feat: add test for DuckDuckGo search functionality with NVIDIA query
Co-authored-by: aider (openrouter/anthropic/claude-sonnet-4) aider@aider.chat
tests/test_cli.py
- [a8e3e04] by @thiswillbeyourgithub, 2 days ago:
test: add test for DuckDuckGo search with NVIDIA query
Co-authored-by: aider (openrouter/anthropic/claude-sonnet-4) aider@aider.chat
tests/test_cli.py
- [1832299] by @thiswillbeyourgithub, 2 days ago:
doc: add shell example for using duckduckgo
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/docs/examples.md
- [e6c4641] by @thiswillbeyourgithub, 2 days ago:
typo
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/docs/examples.md
- [917ee51] by @...
Release 3.2.5
What's new
This release brings several improvements to command-line argument handling and filetype detection, along with key bug fixes and build process enhancements.
✨ Features
- CLI & Filetype Detection:
- Build Process:
  - Integrated `sphinx-apidoc` into the ReadTheDocs build process via a pre-build job in `.readthedocs.yaml` ([cc86c7b]).
🐛 Fixes
- Corrected an issue with `sys.argv` handling that led to duplicated arguments ([e7cf185]).
- Updated `litellm` dependency to resolve crashes experienced on Windows environments ([cfff0ac]), see #20.
🛠️ Improvements & Refactoring
- Filetype Detection Internals:
- Code Quality:
  - Improved documentation by adding docstrings to custom exception classes ([8e6ca1a]).
Chores
- Version bumped to 3.2.5 ([82b7f81]).
Commits details since the last release
- [82b7f81] by @thiswillbeyourgithub, 19 minutes ago:
bump version 3.2.4 -> 3.2.5
bumpver.toml
docs/source/conf.py
setup.py
wdoc/wdoc.py
- [e7cf185] by @thiswillbeyourgithub, 3 minutes ago:
fix: badly handled sys.argv was duplicating args
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/main.py
- [cfff0ac] by @thiswillbeyourgithub, 19 minutes ago:
fix: bump version of litellm because windows crashes
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
setup.py
- [cc86c7b] by @thiswillbeyourgithub (aider), 2 days ago:
feat: add pre-build job to run sphinx-apidoc in .readthedocs.yaml
.readthedocs.yaml
- [ab76610] by @thiswillbeyourgithub, 2 days ago:
new: use the filetype detector to infer what to do in case of multiple implicit arguments from the cli
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/main.py
- [05966c6] by @thiswillbeyourgithub, 2 days ago:
enh: add debug prints to the filetype detector
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/batch_file_loader.py
- [520f4ce] by @thiswillbeyourgithub, 2 days ago:
new: use a specific exception when we can't infer the filetype
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/batch_file_loader.py
- [8e6ca1a] by @thiswillbeyourgithub, 2 days ago:
add docstring to some exceptions
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/errors.py
- [39af223] by @thiswillbeyourgithub, 2 days ago:
add an error for undetectable filetype
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/errors.py
- [b453748] by @thiswillbeyourgithub, 2 days ago:
new: put the filetype detection code in a separate function
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/batch_file_loader.py
Release 3.2.4
What's new
This release focuses on documentation enhancements, bug fixes for stability and the build process, and updated dependencies and tokenization.
✨ New Features
- Upgraded default token estimation to use `gpt-4o-mini` tokenizer, replacing `gpt-3.5-turbo` ([6d41817]).
- Integrated the latest `yt-dlp` for YouTube downloads ([ab207b4]).
- Environment variable documentation is now automatically added to the `EnvDataclass` class `__doc__` ([ed9dd38]).
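Attaching per-variable documentation to a dataclass docstring can be done by storing the text in field metadata and appending it to `__doc__` after the class is defined. This is a sketch of that general approach, not the `EnvDataclass` implementation; the `EnvSettings` class, its fields, and the `EXAMPLE_` prefix are all hypothetical.

```python
from dataclasses import dataclass, field, fields


@dataclass
class EnvSettings:
    """Settings read from environment variables."""

    debug: bool = field(default=False, metadata={"doc": "Enable verbose debugging."})
    lazy_load: bool = field(default=True, metadata={"doc": "Defer heavy imports."})


# Build one line per field from its metadata and append it to the class docstring,
# so documentation tools that read __doc__ pick it up automatically.
_lines = [
    f"- EXAMPLE_{f.name.upper()}: {f.metadata['doc']}" for f in fields(EnvSettings)
]
EnvSettings.__doc__ = (
    (EnvSettings.__doc__ or "") + "\n\nEnvironment variables:\n" + "\n".join(_lines)
)
```

Keeping the description next to the field definition means the docstring cannot drift out of sync with the list of supported variables.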
🐛 Bug Fixes
- Resolved a crash on ReadTheDocs caused by missing `yt-dlp` dependency ([f5068a3]).
- Fixed an issue where accessing `env.__class__` on ReadTheDocs could cause a crash ([4e180f0]).
- Corrected relative import paths in `wdoc` that were preventing Sphinx API documentation builds ([ade5930]).
- Fixed issues with the Sphinx API command in the FAQ section of the README ([38008aa], [ff093a2]).
- Ensured collapsible bars in documentation function correctly ([3cef833]).
📚 Documentation & Refinements
- Extensive updates and fixes to Sphinx documentation generation and content:
  - Addressed outdated Sphinx documentation files ([90bde99]).
  - Improved API autodoc parameters for clearer documentation ([243de66]).
  - Excluded private and special members from documentation ([7abedd4]).
  - Added Sphinx command to FAQ in README ([1e6602e]) and removed private members from it ([11ae11b]).
  - Updated copyright year to 2025 ([bd7e3c5]).
- Streamlined documentation structure and configuration:
  - Removed unused make files (`Makefile`, `make.bat`) for documentation ([07b0a7d]).
  - Removed unused argument for theme flyout display ([17bc5e6]).
  - Removed unused templates path ([6bffa20]) and CSS ([712df08]).
  - Removed duplicate README from the documentation source ([2b93162]).
  - Added a documentation table to the main index ([1dfe2b3]).
⚙️ Build & Chores
- Bumped version to 3.2.4 ([ed7a9c7]).
Commits details since the last release
- [ed7a9c7] by @thiswillbeyourgithub, 20 seconds ago:
bump version 3.2.3 -> 3.2.4
bumpver.toml
docs/source/conf.py
setup.py
wdoc/wdoc.py
- [f5068a3] by @thiswillbeyourgithub, 13 minutes ago:
fix: missing yt-dlp makes readthedock crash
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
setup.py
- [17bc5e6] by @thiswillbeyourgithub, 19 minutes ago:
remove unused argument for theme flyout display
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
docs/source/conf.py
- [4e180f0] by @thiswillbeyourgithub, 22 minutes ago:
fix: class attribute of env is accessed by readthedocks and should not crash
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/env.py
- [243de66] by @thiswillbeyourgithub, 2 hours ago:
saner api autodoc parameters
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
docs/source/conf.py
- [ed9dd38] by @thiswillbeyourgithub, 3 hours ago:
new: add the environment variable documentation to the doc of the EnvDataclass class
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/docs/help.md
wdoc/utils/env.py
- [07b0a7d] by @thiswillbeyourgithub, 3 hours ago:
doc: remove unused make files for doc
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
docs/Makefile
docs/make.bat
- [7abedd4] by @thiswillbeyourgithub, 4 hours ago:
doc: dont include private nor special
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
docs/source/conf.py
- [38008aa] by @thiswillbeyourgithub, 2 hours ago:
fix: sphinx api command of faq
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
README.md
- [11ae11b] by @thiswillbeyourgithub, 4 hours ago:
remove private from sphinx command
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
README.md
- [90bde99] by @thiswillbeyourgithub, 4 hours ago:
fix outdated sphinx doc
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
docs/source/wdoc.rst
docs/source/wdoc.utils.batch_file_loader.rst
docs/source/wdoc.utils.customs.compressed_embeddings_cache.rst
docs/source/wdoc.utils.customs.fix_llm_caching.rst
docs/source/wdoc.utils.customs.rst
docs/source/wdoc.utils.embeddings.rst
docs/source/wdoc.utils.env.rst
docs/source/wdoc.utils.errors.rst
docs/source/wdoc.utils.flags.rst
docs/source/wdoc.utils.import_tricks.rst
docs/source/wdoc.utils.interact.rst
docs/source/wdoc.utils.llm.rst
docs/source/wdoc.utils.loaders.rst
docs/source/wdoc.utils.logger.rst
docs/source/wdoc.utils.misc.rst
docs/source/wdoc.utils.prompts.rst
docs/source/wdoc.utils.retrievers.rst
docs/source/wdoc.utils.rst
docs/source/wdoc.utils.tasks.query.rst
docs/source/wdoc.utils.tasks.rst
docs/source/wdoc.utils.tasks.summarize.rst
docs/source/wdoc.utils.typechecker.rst
docs/source/wdoc.wdoc.rst
- [ff093a2] by @thiswillbeyourgithub, 4 hours ago:
fix: sphinx api command of faq
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
README.md
- [ade5930] by @thiswillbeyourgithub, 4 hours ago:
fix: relative wdoc imports were stopping sphinx api build
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/__init__.py
wdoc/__main__.py
wdoc/utils/__init__.py
wdoc/utils/batch_file_loader.py
wdoc/utils/customs/__init__.py
wdoc/utils/embeddings.py
wdoc/utils/env.py
wdoc/utils/import_tricks.py
wdoc/utils/interact.py
wdoc/utils/llm.py
wdoc/utils/loaders.py
wdoc/utils/logger.py
wdoc/utils/misc.py
wdoc/utils/prompts.py
wdoc/utils/retrievers.py
wdoc/utils/tasks/__init__.py
wdoc/utils/tasks/query.py
wdoc/utils/tasks/summarize.py
wdoc/utils/typechecker.py
wdoc/wdoc.py
- [1e6602e] by @thiswillbeyourgithub, 5 hours ago:
doc: add to faq the sphinx command
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
README.md
- [bd7e3c5] by @thiswillbeyourgithub, 5 hours ago:
update copyright year to 2025
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
docs/source/conf.py
- [6bffa20] by @thiswillbeyourgithub, 6 hours ago:
remove unused templates path in doc
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
docs/source/conf.py
- [2b93162] by @thiswillbeyourgithub, 6 hours ago:
remove duplicate readme from the doc
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
docs/source/index.rst
- [3cef833] by @thiswillbeyourgithub, 6 hours ago:
fix collapsible bar
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
docs/source/conf.py
- [712df08] by @thiswillbeyourgithub, 6 hours ago:
remove unused css from the doc
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
docs/source/_static/custom.css
docs/source/conf.py
- [1dfe2b3] by @thiswillbeyourgithub, 6 hours ago:
documentation table
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
docs/source/index.rst
- [6d41817] by @thiswillbeyourgithub, 25 hours ago:
new: use gpt-4o-mini tokenizer by default to estimate tokens
previously we used the ageing gpt-3.5-turbo
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/docs/help.md
wdoc/utils/misc.py
- [ab207b4] by @thiswillbeyourgithub, 25 hours ago:
new: use the latest yt-dl install from yt-dlp
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
setup.py
Release 3.2.3
What's new
This release primarily focuses on enhancing context management for embedding models, improving debugging utilities, and updating documentation for better clarity. It also includes several important bug fixes and feature additions.
✨ Features
- Introduced a new environment variable, WDOC_MAX_EMBED_CONTEXT, to allow capping the context size for embedding models ([d9e200f8]).
  - Documentation for this new variable has been added ([a2408fd0]).
- Enhanced debugging by ensuring debug prints are always active when md_printer is used. This helps in retrieving LLM answers from logs if they weren't saved to a file ([69db1916]).
- Added the current date to summary metadata and headers to help reduce potential LLM hallucinations ([64ca4665]).
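To illustrate how a cap such as WDOC_MAX_EMBED_CONTEXT could bound chunk sizes, here is a minimal sketch. The function name, the fallback default, and the validation rule are assumptions made for illustration; they are not taken from wdoc's code.

```python
import os

# Placeholder fallback; the real default used by wdoc is not stated here.
DEFAULT_MAX_EMBED_CONTEXT = 8192


def effective_chunk_tokens(llm_context: int, environ=os.environ) -> int:
    """Return the token budget for a chunk, capped by the embedding limit."""
    raw = environ.get("WDOC_MAX_EMBED_CONTEXT", "")
    cap = int(raw) if raw.strip() else DEFAULT_MAX_EMBED_CONTEXT
    if cap <= 0:
        raise ValueError("WDOC_MAX_EMBED_CONTEXT must be a positive integer")
    # A chunk must fit both the LLM and the embedding model.
    return min(llm_context, cap)


# A long-context LLM is capped to what the embedding model can take.
print(effective_chunk_tokens(1_000_000, {"WDOC_MAX_EMBED_CONTEXT": "4096"}))
```

Taking the minimum of the two limits keeps chunks embeddable even when the LLM itself could accept much longer inputs.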
🐛 Fixes
- Text Splitting & Context Handling:
  - Addressed an issue where large language models have more context than embedding models by setting a max_tokens limit for the text splitter ([dac6802d]).
  - Fixed an edge case where the wdoc max chunk setting could be ignored ([196b3a00]).
  - Corrected an old variable name within the text splitting logic ([767bc754]).
- Updated the default model to gemini 2.5 preview to reflect its renaming on OpenRouter ([22978609]).
- Improved the mechanism for ignoring initial "breathing" or placeholder lines in summaries ([4dbcf158]).
📚 Documentation
- Clarity and Enhancements:
  - Clarified the usage of save and load functionalities ([9d9642d4]) and specifically advised against using them simultaneously ([5270c350]).
  - Made multiple clarifications to the README for better understanding ([9284ff54], [cb4cb519], [f677e5a2], [39e0da55]).
  - Updated Ollama examples to recommend snowflake-arctic-embed2 instead of bge-m3 ([d045702b]).
  - Added documentation for the WDOC_MAX_EMBED_CONTEXT environment variable ([a2408fd0]).
- Removed a documentation file (summary_rag.md) that was not yet ready for release ([6d20c220]).
⚙️ Chore & Maintenance
- Version bumped to 3.2.3 ([f62a2322]), following an earlier bump to 3.2.2 ([71ac503c]).
- README Updates:
  - Updated TODO items ([8f2cbfd7], [5d090421]).
  - Added a PyPI badge for better project visibility ([60ef4112]).
Commits details since the last release
- [f62a232] by @thiswillbeyourgithub, 46 seconds ago:
bump version 3.2.2 -> 3.2.3
bumpver.toml
docs/source/conf.py
setup.py
wdoc/wdoc.py
- [6d20c22] by @thiswillbeyourgithub, 76 seconds ago:
doc: removed file not yet ready
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
summary_rag.md
- [71ac503] by @thiswillbeyourgithub, 4 minutes ago:
bump version 3.2.1 -> 3.2.2
bumpver.toml
docs/source/conf.py
setup.py
wdoc/wdoc.py
- [8f2cbfd] by @thiswillbeyourgithub, 3 minutes ago:
todo
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
README.md
- [69db191] by @thiswillbeyourgithub, 40 minutes ago:
new: now debug print is used anyway when md_printer is used
this is to make you able to go to the logs to fetch and answer form the
LLM if you have forgotten to store it to a file
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/logger.py
wdoc/wdoc.py
- [a2408fd] by @thiswillbeyourgithub (aider), 66 minutes ago:
docs: Add documentation for WDOC_MAX_EMBED_CONTEXT variable
wdoc/docs/help.md
- [d9e200f] by @thiswillbeyourgithub, 66 minutes ago:
feat: add new env var to cap the context size for embedding models
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/env.py
wdoc/utils/misc.py
- [196b3a0] by @thiswillbeyourgithub, 72 minutes ago:
fix: edge case where wdoc max chunk would be ignored
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/misc.py
- [dac6802] by @thiswillbeyourgithub, 76 minutes ago:
fix: set a limit to max_tokens for the text splitter as large LLM have more context than embeddings models nowadays
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/misc.py
- [767bc75] by @thiswillbeyourgithub, 80 minutes ago:
fix: forgot to rename an old variable name
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/misc.py
- [2297860] by @thiswillbeyourgithub, 86 minutes ago:
fix: set default model to gemini 2.5 preview without date timestamp
openrouter renamed that model apparently
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
README.md
wdoc/utils/env.py
- [9d9642d] by @thiswillbeyourgithub, 22 hours ago:
doc: clarify save and load
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/docs/help.md
- [5270c35] by @thiswillbeyourgithub, 22 hours ago:
doc: clarify that load and save shouldnt be used at the same time
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/docs/help.md
- [d045702] by @thiswillbeyourgithub, 23 hours ago:
doc: use snowflake-arctic-embed2 instead of bge-m3 for ollama examples
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/docs/examples.md
- [60ef411] by @thiswillbeyourgithub, 26 hours ago:
add a pypi badge
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
README.md
- [5d09042] by @thiswillbeyourgithub, 7 days ago:
update todo
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
README.md
- [9284ff5] by @thiswillbeyourgithub, 7 days ago:
doc: clarify
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
README.md
- [cb4cb51] by @thiswillbeyourgithub, 7 days ago:
doc: clarify
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
README.md
- [f677e5a] by @thiswillbeyourgithub, 7 days ago:
doc: clarify
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
README.md
- [39e0da5] by @thiswillbeyourgithub, 7 days ago:
doc: clarify
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
README.md
- [64ca466] by @thiswillbeyourgithub (aider), 10 days ago:
feat: Add current date to summary metadata and header to reduce hallucinations
wdoc/wdoc.py
- [4dbcf15] by @thiswillbeyourgithub, 10 days ago:
enh: better ignoring of first line of summary if just breathing
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/tasks/summarize.py
Release 3.2.1
What's new
This small patch release primarily focuses on integrating OpenRouter for model pricing/metadata and refining cost calculations.
✨ Features
- Set default models to use OpenRouter ([915699c]).
- Fetch model prices and metadata automatically from OpenRouter, improving reliability ([7f840b7]).
🐛 Fixes & Enhancements
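- Much improved price calculation and handling:
  - Reworked how prices are computed to take internal thinking into account ([c0b90d8]).
  - Improved the way model prices are retrieved ([a17b41c]).
  - Fixed the error message shown when a model price is missing ([2b29a9d]).
- Updated litellm dependency ([179b589]).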
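Commit c0b90d8 in this release reworks pricing to take internal thinking into account. The sketch below shows one plausible way to do that. It assumes that reasoning tokens are reported separately from completion tokens and billed at the output rate; both are assumptions for illustration, as are the function and parameter names, and none of it is wdoc's actual logic.

```python
def request_cost(
    prompt_tokens: int,
    completion_tokens: int,
    reasoning_tokens: int,
    input_price: float,
    output_price: float,
) -> float:
    """Cost of one request, with prices expressed in dollars per token."""
    # Assumption: hidden reasoning tokens are billed like visible output.
    billed_output = completion_tokens + reasoning_tokens
    return prompt_tokens * input_price + billed_output * output_price


# 1000 input tokens, 200 visible output tokens, 300 reasoning tokens.
cost = request_cost(1000, 200, 300, input_price=1e-6, output_price=2e-6)
print(f"{cost:.6f}")
```

Ignoring the reasoning tokens in this example would undercount the output side by more than half, which is why a per-provider check of how thinking is billed matters.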
🧪 Tests
- API integration tests now fail faster if an underlying API call fails ([9a0c856]).
Commits details since the last release
- [03aeab2] by @thiswillbeyourgithub, 2 minutes ago:
bump version 3.2.0 -> 3.2.1
bumpver.toml
docs/source/conf.py
setup.py
wdoc/wdoc.py
- [915699c] by @thiswillbeyourgithub, 6 minutes ago:
new: set the default models to use openrouter
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
README.md
wdoc/utils/env.py
- [c0b90d8] by @thiswillbeyourgithub, 64 minutes ago:
fix: reworked how pricing are computed to take internal thinking into account
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/llm.py
wdoc/utils/misc.py
wdoc/utils/tasks/summarize.py
wdoc/wdoc.py
- [a17b41c] by @thiswillbeyourgithub, 80 minutes ago:
enh: better way to get the model prices
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/misc.py
wdoc/wdoc.py
- [9a0c856] by @thiswillbeyourgithub, 22 minutes ago:
test: crash early if one api crash fails
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
tests/run_all_tests.sh
- [7f840b7] by @thiswillbeyourgithub, 2 hours ago:
feat: automatically fetch the price and metadata from openrouter instead of waiting for litellm
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/misc.py
wdoc/wdoc.py
- [2b29a9d] by @thiswillbeyourgithub, 2 hours ago:
fix: error message on missing model price
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/misc.py
- [179b589] by @thiswillbeyourgithub, 2 hours ago:
bump litellm version
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
setup.py
Release 3.2.0
What's new
This release focuses on improving the command-line interface (especially handling piped input/output), enhancing language model interactions (switching defaults, better caching, Ollama support), and increasing overall stability through numerous bug fixes and testing improvements.
✨ Features
- Added arguments to set specific keyword arguments (kwargs) for language models (--model_kwargs, --query_eval_model_kwargs) ([1392553]).
- Introduced WDOC_LLM_REQUEST_TIMEOUT environment variable for LLM request timeouts (default 600s), useful for Ollama ([ec3c0c5]).
- Switched default models from Claude Sonnet/Haiku to Gemini 1.5 Pro/Flash ([82ef10d]).
- Unified LLM handling to primarily use ChatLiteLLM, removing direct ChatOpenAI usage ([30a0f0c]).
- Enabled cost tracking for queries, storing the cost in the output ([e7753af]).
- Added automatic download of nltk punkt tokenizer during post-installation ([44f5bf8]).
- Overhauled Command Line Interface (CLI) argument parsing for wdoc and wdoc parse using fire ([7c51ed2], [2f4748d]).
- Removed the --pipe argument, relying on automatic stdin/stdout detection ([b03e79a], [2e6c1dd], [838f164]).
- Removed the separate wdoc_parse_file entry point; use wdoc parse instead ([2e878d2]).
- Added a new script media_url_finder.py ([beaf8fa]).
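Automatic stdin/stdout detection, which replaces the --pipe argument, is commonly done by checking whether each stream is attached to a terminal. The sketch below shows that general technique with the standard library; the detect_pipes helper is hypothetical and may differ from how wdoc performs the check.

```python
import io
import sys


def detect_pipes(stdin=sys.stdin, stdout=sys.stdout) -> dict:
    """Report whether input or output is attached to something other than a terminal."""
    return {
        "input_piped": not stdin.isatty(),
        "output_piped": not stdout.isatty(),
    }


# In-memory streams are never terminals, so both flags are True here.
fake = io.StringIO("piped text")
print(detect_pipes(stdin=fake, stdout=fake))
```

In a shell, a command such as `cat file.txt | tool` makes stdin a pipe, while `tool > out.txt` makes stdout a file; in both cases isatty() returns False for the affected stream.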
🐛 Fixes
- LLM & Caching:
  - Resolved issues with LLM caching, including invalidation when kwargs change and LangChain's SQLite cache ([cb785da], [3e3e753]).
  - Fixed edge cases in thinking block parsing for models like Gemini and updated tags (<thinking> -> <think>) ([e111bdb], [d0ae21a], [ca9245b], [99ed332]).
  - Corrected underflow errors in cost calculation due to low LLM prices ([3f18f5d], [95a1984]).
  - Addressed issues specific to Ollama: API key requirement relaxation, price assumption (zero), litellm naming (ollama_chat -> ollama), and context window estimation ([d2f92a3], [5784b25], [43c6340], [c3c15e1]).
  - Fixed handling of testing/testing models and associated parameters ([b995197], [91b5e67], [7cf840c], [9a7b95b]).
  - Fixed query_retrievers parsing ([02d7412]).
  - Pinned litellm version for stability ([1b17c78]).
- CLI & Piping:
  - Improved detection and handling of piped input/output ([2e6c1dd], [509626a], [db2fa0f]).
  - Fixed crashes and hangs when using pipes, especially with long inputs or specific test commands ([f59f34b], [414de8d], [b95b125], [826e7aa], [b6f7fd7], [177be6b]).
  - Corrected argument parsing issues affecting the --help command ([c909337]).
  - Ensured logs are not colorized and Markdown rendering is disabled when outputting to a pipe ([f1d63cd], [fe2665c]).
  - Fixed issues where debug prints or warnings were incorrectly suppressed or handled ([64fcd60], [a7724ff]).
- General:
  - Fixed various bugs in task execution, parameter handling, and attribute declarations ([27a8d35], [91d8df3], [a0eaf51], [a6effc0], [5dce2f3], [4623fcc], [b17f567], [8cc9190], [e91ed3b], [c3649ab]).
  - Corrected import path in __main__ ([0ef5e4d]).
  - Suppressed excessive INFO logs from faiss ([a17a8d1]).
  - Handled BrokenPipeError gracefully ([b40832b]).
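The thinking-block parsing fixes above can be pictured with a small sketch. It assumes a non-greedy regular expression over <think> tags and a fallback for output that contains only the closing tag; the split_thinking name and the exact rules are hypothetical and are not a copy of wdoc's parser.

```python
import re

# Non-greedy, so each <think>...</think> pair is matched separately.
THINK_RE = re.compile(r"<think>(.*?)</think>", flags=re.DOTALL)


def split_thinking(text: str) -> tuple[str, str]:
    """Return (thinking, answer) extracted from raw model output."""
    # Edge case: the model only emitted the closing tag.
    if "</think>" in text and "<think>" not in text:
        thinking, _, answer = text.partition("</think>")
        return thinking.strip(), answer.strip()
    thoughts = [m.strip() for m in THINK_RE.findall(text)]
    answer = THINK_RE.sub("", text).strip()
    return "\n".join(thoughts), answer


print(split_thinking("<think>check the date</think>The answer is 42."))
```

A greedy pattern would swallow everything between the first opening tag and the last closing tag, including any answer text in between, which is the failure mode the non-greedy form avoids.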
🧪 Testing
- Improved test setup for caching, using separate directories and disabling cache where necessary ([9104f86], [89f4859], [085a87e], [6935fe7]).
- Added tests for OpenRouter/default models, piping functionality, summary/query tasks with testing models, and environment variable handling ([06e35b0], [bbb8371], [caae34c], [cb9d237], [eaafafd], [1f835eb]).
- Refactored pipe tests to use subprocess explicitly and fixed related issues (stderr redirection, pytest capture, shell usage) ([38a3571], [7f3249a], [573acf9]). Note: some pipe tests were later commented out ([45cf419]).
⚡ Enhancements
- Reworked logic for detecting and modifying model parameters based on the task ([564c4f9]).
- Improved load_media function to handle online media more robustly by finding and clicking appropriate buttons ([049c9cb], [67772f8], [c5828d3]).
- Added checks to prevent exceeding total token limits during summarization ([9bdcabc]).
- Refined logging levels and Markdown printing logic ([edfec82], [4ca394c], [895a60f]).
📚 Documentation
- Updated examples for Ollama arguments, model usage (Gemma -> Qwen2), and general clarity ([0087117], [49437ec], [4083dda], [404bbe4]).
- Clarified behavior related to LLM caching and model kwargs in help documentation ([c3e0219], [3e3e753], [1392553], [7db844f]).
- Updated README and help files reflecting changes in default models, CLI arguments, and entry points ([82ef10d], [b03e79a], [2e878d2], [a30bccf]).
⚙️ Build & Chore
- Bumped version to 3.2.0 ([7d69d79]).
- Added nltk to dependencies ([44f5bf8]).
- Updated .gitignore ([84aa559], [5374ee1], [39e4106], [a25e3d4]).
- Renamed embed_kwargs to embed_model_kwargs ([431efcb]).
Commits details since the last release
- [7d69d79] by @thiswillbeyourgithub, 77 seconds ago:
bump version 3.1.0 -> 3.2.0
bumpver.toml
docs/source/conf.py
setup.py
wdoc/wdoc.py
- [30a0f0c] by @thiswillbeyourgithub, 24 minutes ago:
new: stop using both ChatOpenAI and ChatLiteLLM
ChatLiteLLM seems to now work reliably
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
tests/test_wdoc.py
wdoc/docs/help.md
wdoc/utils/llm.py
wdoc/utils/retrievers.py
wdoc/utils/tasks/query.py
- [9104f86] by @thiswillbeyourgithub, 41 minutes ago:
fix: in the pytest we should delete the cache dir regularly
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
tests/conftest.py
- [e111bdb] by @thiswillbeyourgithub, 46 minutes ago:
fix: fix edge case for gemini models that only end their thinking block
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/misc.py
- [84aa559] by @thiswillbeyourgithub, 78 minutes ago:
test: ignore cache dir
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
.gitignore
- [89f4859] by @thiswillbeyourgithub, 79 minutes ago:
test: use a separate user dir for the cache when running the tests
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
tests/test_wdoc.py
wdoc/utils/misc.py
- [3f18f5d] by @thiswillbeyourgithub, 79 minutes ago:
fix: underflow error in cost
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/tasks/summarize.py
wdoc/wdoc.py
- [27a8d35] by @thiswillbeyourgithub, 2 hours ago:
fix: latest cost attribute was not declared
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/wdoc.py
- [c3c15e1] by @thiswillbeyourgithub, 2 hours ago:
enh: if ollama is used, lower the estimate of the context window
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/llm.py
- [91d8df3] by @thiswillbeyourgithub, 2 hours ago:
fix: wrong indentation in an if
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/wdoc.py
- [a0eaf51] by @thiswillbeyourgithub, 2 hours ago:
fix: wrong deepcopy for eval llm
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/wdoc.py
- [29c9c4e] by @thiswillbeyourgithub, 2 hours ago:
fix: test
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
tests/test_wdoc.py
- [cb785da] by @thiswillbeyourgithub, 2 hours ago:
fix: make the sqlite cache already patched for langchain s stupid cache
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/customs/fix_llm_caching.py
- [3e3e753] by @thiswillbeyourgithub, 2 hours ago:
fix: try to make it so that changing the kwargs does not reuse the cache
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/docs/help.md
wdoc/utils/llm.py
- [c3e0219] by @thiswillbeyourgithub, 2 hours ago:
doc: explain that changing the kwargs will not invalidate the cache
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/docs/help.md
- [827d563] by @thiswillbeyourgithub, 2 hours ago:
test: improved test to also test caching
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
tests/test_wdoc.py
- [e7753af] by @thiswillbeyourgithub, 2 hours ago:
new: store the cost of the query in the output now
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/wdoc.py
- [d0ae21a] by @thiswillbeyourgithub, 2 hours ago:
fix: reworked and improved how thinking_answer_parser works
some weak models could fail despite usable results
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/misc.py
- [ca9245b] by @thiswillbeyourgithub, 3 hours ago:
fix: dont make the thinking block parser greedy
I'm sure some models can nest thoughts
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/misc.py
- [99ed332] by @thiswillbeyourgithub, 3 hours ago:
fix: most models nowadays use <think> not <thinking>
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/misc.py
wdoc/utils/prompts.py
wdoc/utils/tasks/query.py
- [085a87e] by @thiswillbeyourgithub, 3 hours ago:
test: disable the embedding cache
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
tests/test_wdoc.py
...
Release 3.1.0
What's new
This release primarily focuses on enhancing logging capabilities and fixing issues related to piping behavior.
Version bump to 3.1.0 ([e93dcad6]).
✨ New Features
- Logging:
  - Always display the default log location ([2fe2c431]).
  - Set log level to debug for log files and critical when used in a pipe ([130058a1]).
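The logging behavior described above (debug level for the log file, critical level for the console when piped, log location always shown) can be sketched with the standard logging module. This is an illustration under assumptions: wdoc may use a different logging library, and the build_logger helper, logger name, and file path are hypothetical.

```python
import logging
import sys
import tempfile
from pathlib import Path


def build_logger(log_file: Path, is_piped: bool) -> logging.Logger:
    """File handler always at DEBUG; console at CRITICAL when piped."""
    logger = logging.getLogger("wdoc_sketch")
    logger.setLevel(logging.DEBUG)  # let the handlers do the filtering
    logger.propagate = False
    for old in list(logger.handlers):  # avoid duplicate handlers on re-entry
        logger.removeHandler(old)
        old.close()

    file_handler = logging.FileHandler(log_file, encoding="utf-8")
    file_handler.setLevel(logging.DEBUG)
    logger.addHandler(file_handler)

    console = logging.StreamHandler(sys.stderr)
    console.setLevel(logging.CRITICAL if is_piped else logging.INFO)
    logger.addHandler(console)

    # Always tell the user where the log file lives.
    print(f"Logging to {log_file}", file=sys.stderr)
    return logger


log_path = Path(tempfile.gettempdir()) / "wdoc_sketch.log"
logger = build_logger(log_path, is_piped=not sys.stdout.isatty())
logger.debug("only written to the log file")
```

Keeping the file handler at debug level means a full trace is available afterwards even when the console stays quiet for the benefit of downstream commands in a pipe.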
🚀 Enhancements
- Logging:
  - Improved log format ([61465aff], [dc06ccfd]).
  - Increased probability of early logger initialization ([01f01ac7]).
  - Clearer error messages from python-magic ([c846dafa]).
🐛 Fixes
- Piping:
  - Resolved confusion between input and output during piping ([e175b7d5]).
  - Corrected initialization of is_piped variable ([e4532d30]).
- Logging & Environment:
  - Fixed default handler issue in logger ([43c859dd]).
  - Prevented potential crash related to environment variable handling ([d3b1e2bc]).
🧹 Minor Changes
- Removed unused imports ([f3c05962]).
- Adjusted test imports structure ([69738119]).
- Removed commented code ([86b51102]).
- Removed unused disable_md_printing argument ([b3af430e]).
✅ Testing
- Added test for exception handling ([dfbfad54]).
- Added environment variable tests ([0fba8a13]).
Commits details since the last release
- [e93dcad] by @thiswillbeyourgithub, 10 minutes ago:
bump version 3.0.2 -> 3.1.0
bumpver.toml
docs/source/conf.py
setup.py
wdoc/wdoc.py
- [e175b7d] by @thiswillbeyourgithub, 31 minutes ago:
fix: piping behavior was confusing input and output
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/batch_file_loader.py
wdoc/utils/env.py
wdoc/utils/loaders.py
wdoc/utils/logger.py
wdoc/utils/misc.py
wdoc/wdoc.py
- [b3af430] by @thiswillbeyourgithub, 34 minutes ago:
forgot to remove the arg disable_md_printing
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/docs/help.md
wdoc/wdoc.py
- [61465af] by @thiswillbeyourgithub, 36 minutes ago:
enh: better log format
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/logger.py
- [2fe2c43] by @thiswillbeyourgithub, 37 minutes ago:
new: print the default log location always
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/logger.py
- [43c859d] by @thiswillbeyourgithub, 37 minutes ago:
fix: default handler
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/logger.py
- [e4532d3] by @thiswillbeyourgithub, 47 minutes ago:
fix: is_piped variable was wrong
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/env.py
wdoc/utils/misc.py
- [01f01ac] by @thiswillbeyourgithub, 66 minutes ago:
enh: increase chances of logger beint initialized asap
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/__init__.py
wdoc/__main__.py
- [dc06ccf] by @thiswillbeyourgithub, 89 minutes ago:
better log format
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/logger.py
- [f3c0596] by @thiswillbeyourgithub, 2 hours ago:
remove unused imports
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/logger.py
- [e637c2f] by @thiswillbeyourgithub, 2 hours ago:
new: the log level now is always at debug level for the logfile and using --debug only modifyed the stdout of user
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/logger.py
- [86b5110] by @thiswillbeyourgithub, 2 hours ago:
minor: remove commented line
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/logger.py
- [130058a] by @thiswillbeyourgithub, 2 hours ago:
new: if wdoc is used in a pipe, we set the log level to critical
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/env.py
wdoc/utils/logger.py
- [dfbfad5] by @thiswillbeyourgithub, 2 hours ago:
test: add test for exception handling
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
tests/test_wdoc.py
- [6973811] by @thiswillbeyourgithub, 2 hours ago:
minor: move the test imports higher up
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
tests/test_wdoc.py
- [0fba8a1] by @thiswillbeyourgithub, 2 hours ago:
test: add an unexpected env variable to test that it works
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
tests/test_wdoc.py
- [d3b1e2b] by @thiswillbeyourgithub, 2 hours ago:
fix: env variable handling could cause a crash
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/env.py
- [c846daf] by @thiswillbeyourgithub, 3 hours ago:
better error message from python-magic
Signed-off-by: thiswillbeyourgithub 26625900+thiswillbeyourgithub@users.noreply.github.com
wdoc/utils/loaders.py