Add SQLite caching with --no-cache and --force-check flags (#2219) #2608
base: master
Implements SQLite-based result caching to improve performance and reduce rate limiting. Results are cached for 24 hours by default and stored in ~/.sherlock/cache.db.

Features:
- Automatic caching of username lookup results with configurable TTL
- --no-cache flag to disable caching completely
- --force-check flag to ignore cached results and force fresh lookups
- --cache-duration flag to customize cache expiration (default: 86400s)
- sherlock-cache CLI utility for cache management (stats, clear, cleanup)
- Comprehensive test suite for cache functionality

Technical Details:
- Cache stored in SQLite database at ~/.sherlock/cache.db
- Automatic cleanup of expired entries on each run
- Caches both CLAIMED and AVAILABLE status results
- Thread-safe database operations
- Zero dependencies (uses built-in sqlite3 module)

Resolves sherlock-project#2219
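For readers skimming the thread, here is a minimal sketch of the TTL idea described in this commit message, using only the built-in sqlite3 module. The class name `ResultCache`, the table layout, and the `now` parameter are assumptions made for illustration; they are not taken from the PR's cache.py.

```python
import sqlite3
import time


class ResultCache:
    """Hypothetical TTL cache keyed by (username, site); not the PR's class."""

    def __init__(self, path=":memory:", ttl=86400):
        self.ttl = ttl
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS results ("
            " username TEXT NOT NULL,"
            " site TEXT NOT NULL,"
            " status TEXT NOT NULL,"
            " stored_at REAL NOT NULL,"
            " PRIMARY KEY (username, site))"
        )

    def set(self, username, site, status, now=None):
        # `now` is injectable so callers (and tests) can control the clock.
        now = time.time() if now is None else now
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO results VALUES (?, ?, ?, ?)",
                (username, site, status, now),
            )

    def get(self, username, site, now=None):
        now = time.time() if now is None else now
        row = self.conn.execute(
            "SELECT status, stored_at FROM results"
            " WHERE username = ? AND site = ?",
            (username, site),
        ).fetchone()
        # A missing row or an entry at least `ttl` seconds old is a cache miss.
        if row is None or now - row[1] >= self.ttl:
            return None
        return row[0]
```

With `ttl=86400`, an entry written at time t is served until t + 86400 seconds and treated as a miss afterwards.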
This PR is part of Hacktoberfest 2025 contribution efforts.
Additionally, have any checks been performed for the possibility of SQLi?
This was not a complete review, as that'll take some time -- just some comments at first glance.
[PEP 8] (second bullet point, primarily, with a blank line separator between sections and two lines after the final import)
Addresses all feedback from PR sherlock-project#2608 review by @ppfeister

Security Hardening:
- Implement SQL injection protection via parameterized queries in all database operations (get, set, clear, cleanup_expired, get_stats)
- Add comprehensive input validation (null bytes, control characters, length limits) to prevent injection attacks
- Implement path traversal protection restricting cache to ~/.sherlock
- Add URL validation (max 2048 chars, no null bytes)
- Store cache_duration per entry to prevent TTL drift across runs

Code Quality (PEP 8 Compliance):
- Fix import ordering: stdlib → third-party → local with blank line separators in cache.py, cache_cli.py, and sherlock.py
- Replace Any type hints with specific unions (str|int, QueryStatus)
- Remove shebang and __main__ block from cache_cli.py to prevent unsupported direct script execution

Testing Improvements:
- Replace file-based tests with unittest.mock (no disk I/O)
- Remove time.sleep() calls, use mocked timestamps instead
- Add security-specific tests (SQL injection, path traversal, null bytes)
- Verify parameterized query usage in all database operations
- Follow maintainer's testing patterns from feat/better_waf branch
- Fix unused variable linting warnings (F841)

Database Migration:
- Add automatic schema migration for existing cache databases
- Detect and handle old schema missing cache_duration column
- Gracefully drop and recreate incompatible cache tables

Platform Compatibility:
- Verify Windows compatibility (Path.home() behavior documented)
- Test Docker container build and execution
- Confirm cross-platform path separator handling

Test Results:
- Linting: ✓ All checks passed
- Cache tests: ✓ 14/14 passed
- Docker build: ✓ Verified with act
- Integration tests: 38/39 passed (1 flaky external site WAF)
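The "mocked timestamps instead of time.sleep()" point can be illustrated with a short sketch. The helper `is_expired` and the concrete timestamps are hypothetical and only stand in for whatever expiry check the PR's tests exercise.

```python
import time
from unittest import mock


def is_expired(stored_at, ttl):
    """Hypothetical helper: an entry expires once `ttl` seconds have elapsed."""
    return time.time() - stored_at >= ttl


def check_expiry_without_sleeping():
    # Pin the clock instead of sleeping: 10 seconds after the entry was stored.
    with mock.patch("time.time", return_value=1_000.0):
        fresh = is_expired(stored_at=990.0, ttl=60)
    # Jump the clock forward: 110 seconds after the entry was stored.
    with mock.patch("time.time", return_value=1_100.0):
        stale = is_expired(stored_at=990.0, ttl=60)
    return fresh, stale
```

Patching `time.time` keeps the test deterministic and fast, since no real time has to pass for an entry to cross its TTL.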
✅ Fixed - All database operations now use parameterized queries with ? placeholders. Added comprehensive input validation (null bytes, control characters, length limits). Tests verify parameterized query usage.
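To make the SQLi answer concrete, here is a small sketch of what ? placeholders buy. The table name, columns, and the function `store_and_fetch` are assumptions for illustration, not the PR's schema or API.

```python
import sqlite3


def store_and_fetch(username):
    """Insert a username via ? placeholders and read it back unchanged."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE results (username TEXT, site TEXT, status TEXT)")
    # The value is bound as data; it is never spliced into the SQL text.
    conn.execute(
        "INSERT INTO results VALUES (?, ?, ?)",
        (username, "GitHub", "AVAILABLE"),
    )
    rows = conn.execute(
        "SELECT username FROM results WHERE username = ?", (username,)
    ).fetchall()
    # Confirm the table survived a hostile-looking input.
    tables = conn.execute(
        "SELECT COUNT(*) FROM sqlite_master"
        " WHERE type = 'table' AND name = 'results'"
    ).fetchone()[0]
    conn.close()
    return rows, tables
```

Calling `store_and_fetch("alice'; DROP TABLE results; --")` stores that exact string as a username and leaves the table in place, because the driver passes bound values separately from the statement.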
Attached a couple additional thoughts
Appreciate your patience and cooperation on this one. As a larger architectural change, I'd rather make sure it's right the first time rather than have things unexpectedly break.
Additional nice-to-have

If we could configure cache settings by environment variable, that would be extremely useful in terms of automations and containerized environments i.e. (can't imagine a need for the others)

Easy enough for this PR or should we break that out into a separate RFE?
Addresses all review comments from @ppfeister on PR sherlock-project#2608

CLI Changes:
- Rename --no-cache → --skip-cache (clearer semantics)
- Rename --force-check → --ignore-cache (removes ambiguity)
- Fix argument names: args.skip_cache, args.ignore_cache

Cache Path Improvements:
- Use platformdirs for OS-specific cache locations
  - Linux/macOS: ~/.cache/sherlock/cache.sqlite3 (XDG spec)
  - Windows: %LOCALAPPDATA%\sherlock\cache.sqlite3
- Change extension .db → .sqlite3
- Support SHERLOCK_CACHE_PATH environment variable

Database Migration:
- Implement PRAGMA user_version for schema versioning
- Extract migration logic to _migrate_schema() function
- Support incremental migrations from version 0 → 1

Concurrency Fix:
- Move cache writes from per-check to post-run bulk insert
- Add set_batch() method for efficient bulk caching
- Prevents race conditions

Environment Variable Support:
- SHERLOCK_CACHE_DISABLE: Disable caching entirely
- SHERLOCK_CACHE_PATH: Custom cache location
- SHERLOCK_CACHE_TTL: Custom TTL in seconds

Dependencies:
- Add platformdirs for cross-platform cache directory detection

Tests:
- All cache tests passing (14/14)
- Update mocks to use user_cache_dir
- Fix test_stats_calculation mock values
- Remove unused pathlib.Path import

Known Issues:
- test_probes.py::AllMyLinks test is flaky (site returns WAF)
- This is an external dependency issue, not related to cache

Test Results:
- Cache tests: 14/14 passed ✓
- Integration tests: 38/39 passed (97.4%)
- Linting: passed ✓
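A rough sketch of the two database-side changes in this commit message: schema versioning through PRAGMA user_version and a single post-run bulk insert. The function bodies, table layout, and signatures below are assumptions; the PR's actual _migrate_schema() and set_batch() may look quite different.

```python
import sqlite3

SCHEMA_VERSION = 1  # assumed current version, matching the "0 → 1" migration


def migrate_schema(conn):
    """Hypothetical migration runner driven by PRAGMA user_version."""
    version = conn.execute("PRAGMA user_version").fetchone()[0]
    if version < 1:
        # Version 0 means an unversioned file, possibly with an older layout,
        # so the cache table is dropped and recreated (cached data is disposable).
        conn.execute("DROP TABLE IF EXISTS results")
        conn.execute(
            "CREATE TABLE results ("
            " username TEXT NOT NULL,"
            " site TEXT NOT NULL,"
            " status TEXT NOT NULL,"
            " stored_at REAL NOT NULL,"
            " cache_duration INTEGER NOT NULL,"
            " PRIMARY KEY (username, site))"
        )
        # PRAGMA does not accept bound parameters; the value is a trusted constant.
        conn.execute(f"PRAGMA user_version = {SCHEMA_VERSION}")
        conn.commit()
    return conn.execute("PRAGMA user_version").fetchone()[0]


def set_batch(conn, results, now, ttl):
    """Write all (username, site, status) results in one transaction."""
    with conn:  # one commit for the whole batch
        conn.executemany(
            "INSERT OR REPLACE INTO results VALUES (?, ?, ?, ?, ?)",
            [(user, site, status, now, ttl) for user, site, status in results],
        )
```

Collecting results during the run and writing them once at the end means only one writer touches the database, which is one way to avoid the per-check write contention the commit message refers to.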
✅ Implemented all three environment variables!
All tested and working. Perfect for Docker/CI environments! Example:
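The snippet below is not the PR's own example; it is a minimal Python sketch of how the three variables named earlier (SHERLOCK_CACHE_DISABLE, SHERLOCK_CACHE_PATH, SHERLOCK_CACHE_TTL) might be read. The accepted truthy values, the 86400-second fallback, and the function name `cache_settings` are all assumptions.

```python
import os

DEFAULT_TTL = 86400  # assumed fallback, matching the 24-hour default in this PR


def cache_settings(env=None):
    """Hypothetical reader for the three cache-related environment variables."""
    env = os.environ if env is None else env

    # Assumption: "1", "true", or "yes" (any case) disables the cache.
    flag = env.get("SHERLOCK_CACHE_DISABLE", "").strip().lower()
    disabled = flag in {"1", "true", "yes"}

    # An empty or missing path means "use the default location".
    path = env.get("SHERLOCK_CACHE_PATH") or None

    # Fall back to the default when the TTL is missing, non-numeric, or zero.
    raw_ttl = env.get("SHERLOCK_CACHE_TTL", "").strip()
    ttl = int(raw_ttl) if raw_ttl.isdigit() and int(raw_ttl) > 0 else DEFAULT_TTL

    return {"disabled": disabled, "path": path, "ttl": ttl}
```

Passing a plain dict as `env` keeps the function easy to exercise in tests or CI without touching the real process environment.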
Easy enough - all done!
I don't have the time to validate this tonight, but the hacktoberfest-accepted label has been added so we don't have to rush testing & merging. This is looking good so far though. Great addition to the project imo.
No rush, just wanted to let you know it was done.
Leaving this comment just to let you know that I also plan to review this. I’ll take a closer look at it soon.
Description
Implements SQLite-based result caching as discussed in #2219 to improve performance and reduce rate limiting when performing multiple username lookups.
Changes
Core Caching Implementation
- sherlock_project/cache.py - SQLite cache manager with TTL support
- sherlock_project/sherlock.py - Integrated cache into main sherlock function
- Cache stored at ~/.sherlock/cache.db (automatic directory creation)
CLI Arguments Added
- --no-cache - Disable caching completely (don't read or write to cache)
- --force-check - Ignore cached results and force fresh checks for all sites
- --cache-duration SECONDS - Customize cache expiration time (default: 86400)
Cache Management Utility
- sherlock-cache with subcommands:
  - stats - Show cache statistics (total/valid/expired entries, cache path)
  - clear - Clear cache entries (all, by username, or by site)
  - cleanup - Remove only expired entries
Testing & Documentation
- tests/test_cache.py - Comprehensive cache functionality tests
- docs/README.md - Added cache usage and management section
Usage Examples
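A minimal sketch of how the three flags listed above could be declared and combined with argparse. The parser setup, the helper `cache_mode`, and in particular the choice that --force-check still writes fresh results back to the cache are assumptions, not the PR's code.

```python
import argparse


def build_parser():
    """Hypothetical parser exposing only the cache-related flags."""
    parser = argparse.ArgumentParser(prog="sherlock")
    parser.add_argument("username", nargs="+")
    parser.add_argument(
        "--no-cache", action="store_true",
        help="Disable caching completely (don't read or write to cache)",
    )
    parser.add_argument(
        "--force-check", action="store_true",
        help="Ignore cached results and force fresh checks for all sites",
    )
    parser.add_argument(
        "--cache-duration", type=int, default=86400, metavar="SECONDS",
        help="Cache expiration time in seconds (default: 86400)",
    )
    return parser


def cache_mode(args):
    """Return (read_cache, write_cache) for the parsed arguments."""
    if args.no_cache:
        return (False, False)  # cache is bypassed entirely
    if args.force_check:
        return (False, True)   # assumption: fresh results refresh the cache
    return (True, True)
```

For example, `build_parser().parse_args(["alice", "--force-check"])` yields arguments for which `cache_mode` returns `(False, True)`.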
Performance Impact
--no-cache
Implementation Details
Cache Strategy
- Caches CLAIMED and AVAILABLE status results
- Does not cache UNKNOWN, ILLEGAL, or WAF statuses
- (username, site) tuple (prevents conflicts)
Database Schema
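The following is a hypothetical table and write path that would be consistent with the strategy above; it is a sketch, not the PR's schema. The `Status` enum is a local stand-in for the project's QueryStatus, and the column names are assumptions.

```python
import sqlite3
from enum import Enum


class Status(Enum):
    """Stand-in for the project's QueryStatus; member names follow the PR text."""
    CLAIMED = "CLAIMED"
    AVAILABLE = "AVAILABLE"
    UNKNOWN = "UNKNOWN"
    ILLEGAL = "ILLEGAL"
    WAF = "WAF"


# Only definitive answers are worth remembering.
CACHEABLE = {Status.CLAIMED, Status.AVAILABLE}


def open_cache():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE results ("
        " username TEXT NOT NULL,"
        " site TEXT NOT NULL,"
        " status TEXT NOT NULL,"
        " stored_at REAL NOT NULL,"
        " PRIMARY KEY (username, site))"  # one row per (username, site)
    )
    return conn


def store(conn, username, site, status, now):
    """Store a result if its status is cacheable; return whether it was stored."""
    if status not in CACHEABLE:
        return False
    with conn:
        # The composite key makes a second write replace the first.
        conn.execute(
            "INSERT OR REPLACE INTO results VALUES (?, ?, ?, ?)",
            (username, site, status.name, now),
        )
    return True
```

Skipping UNKNOWN, ILLEGAL, and WAF results means a transient failure is retried on the next run instead of being served from the cache until the TTL expires.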
Testing
Test Results: ✅ All 5 cache tests passing
- test_cache_set_and_get - Basic cache operations
- test_cache_expiration - TTL functionality
- test_cache_clear_all - Clear entire cache
- test_cache_clear_username - Clear by username
- test_cache_stats - Statistics generation
Related Issue
Closes #2219
Checklist
- --no-cache flag
- --force-check flag
- sherlock-cache management utility
Code of Conduct