docpull follows OWASP Top 10, OpenSSF guidelines, and supply chain security standards.
docpull applies defense in depth, layering independent safeguards to protect users when downloading documentation from the web:
- All network requests require HTTPS
- HTTP URLs are automatically rejected
- Prevents man-in-the-middle attacks
- SSL certificate verification enabled by default
- All output paths are validated and resolved
- Files must be written within the specified output directory
- Prevents directory traversal attacks (e.g., `../../etc/passwd`)
- Filenames are sanitized to remove dangerous characters
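The path checks above can be sketched as follows. This is a minimal illustration, not docpull's actual code: the function name `safe_output_path` is hypothetical, and the sketch assumes that "validated and resolved" means resolving the candidate path and confirming it stays under the output directory.

```python
from pathlib import Path


def safe_output_path(output_dir: str, relative_name: str) -> Path:
    """Resolve relative_name under output_dir, rejecting any escape (hypothetical helper)."""
    base = Path(output_dir).resolve()
    # resolve() collapses ".." segments and follows symlinks that exist
    candidate = (base / relative_name).resolve()
    try:
        # relative_to() raises ValueError when candidate is not inside base
        candidate.relative_to(base)
    except ValueError:
        raise ValueError(f"path escapes output directory: {relative_name!r}") from None
    return candidate
```

Resolving before comparing is what catches both `..` segments and absolute paths such as `/etc/passwd`, since joining an absolute path onto `base` discards `base` entirely.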
- Maximum file size: 50MB per document
- Prevents memory exhaustion attacks
- Protects against zip bombs and decompression bombs
- Size checked before and during download
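One way to check the size both before and during a download is sketched below. The names `read_limited` and `ResponseTooLarge` are hypothetical, and the sketch assumes the body arrives as an iterable of byte chunks (for example from a streaming HTTP response) alongside an optional `Content-Length` header value.

```python
from typing import Iterable, Optional

MAX_BYTES = 50 * 1024 * 1024  # 50MB limit stated above


class ResponseTooLarge(ValueError):
    """Raised when a response exceeds the size limit (hypothetical name)."""


def read_limited(
    chunks: Iterable[bytes],
    content_length: Optional[str] = None,
    limit: int = MAX_BYTES,
) -> bytes:
    # Check before download: trust the declared size only enough to reject early
    if content_length is not None and content_length.isdigit():
        if int(content_length) > limit:
            raise ResponseTooLarge(f"declared size {content_length} exceeds {limit}")
    # Check during download: the header may be missing or wrong
    body = bytearray()
    for chunk in chunks:
        body.extend(chunk)
        if len(body) > limit:
            raise ResponseTooLarge(f"body exceeded {limit} bytes")
    return bytes(body)
```

Counting bytes as they arrive matters because a server can omit or understate `Content-Length`.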
- XML parser configured to reject external entities
- Prevents XXE injection attacks
- Protects against billion laughs attack (XML bomb)
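A standard-library sketch of one way to get this behavior is to refuse any document that declares a DOCTYPE, since both external entities and nested entity expansion require one. Which parser and settings docpull actually uses is not stated here, so `parse_untrusted_xml` and `ForbiddenXML` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class ForbiddenXML(ValueError):
    """Raised when a document contains a DOCTYPE declaration (hypothetical name)."""


def _reject_doctype(name, system_id, public_id, has_internal_subset):
    raise ForbiddenXML(f"DOCTYPE declarations are not allowed: {name!r}")


def parse_untrusted_xml(data: bytes) -> ET.Element:
    # First pass: a bare expat parser that aborts as soon as a DOCTYPE starts,
    # before any entity declared inside it can be expanded
    scanner = expat.ParserCreate()
    scanner.StartDoctypeDeclHandler = _reject_doctype
    scanner.Parse(data, True)
    # Second pass: build the element tree from input known to have no DTD
    return ET.fromstring(data)
```

The trade-off is that legitimate documents carrying a DOCTYPE (some XHTML, for instance) are rejected too; a dedicated hardened parser may be preferable where that matters.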
- URLs validated before any network request
- Scheme must be HTTPS
- Domain must be present
- Prevents SSRF (Server-Side Request Forgery) attacks
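The scheme and domain checks above might look like the following sketch. The function name `validate_url` is hypothetical; only the two rules stated above (HTTPS scheme, host present) are enforced.

```python
from urllib.parse import urlsplit


def validate_url(url: str) -> str:
    """Return url unchanged if it passes both checks, else raise ValueError."""
    parts = urlsplit(url)
    # urlsplit() lowercases the scheme, so "HTTPS://..." is accepted
    if parts.scheme != "https":
        raise ValueError(f"only https URLs are allowed, got scheme {parts.scheme!r}")
    # hostname is None when the URL has no network location
    if not parts.hostname:
        raise ValueError("URL has no domain")
    return url
```

Running this before any network call means a rejected URL never triggers DNS resolution or a connection attempt.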
- Maximum of 5 redirects per request
- Prevents infinite redirect loops
- Protects against redirect-based attacks
- All HTTP requests have 30-second timeout
- Prevents hanging on slow/malicious servers
- Resource exhaustion protection
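The redirect and timeout limits above can be illustrated with the standard library. docpull lists `requests` as a dependency, where the corresponding knobs are `Session.max_redirects` and the `timeout` argument; the sketch below swaps in `urllib` so it runs without third-party packages. The names `LimitedRedirectHandler`, `build_limited_opener`, and `fetch` are hypothetical.

```python
import urllib.request
from typing import Optional

MAX_REDIRECTS = 5             # limit stated above
REQUEST_TIMEOUT_SECONDS = 30  # limit stated above


class LimitedRedirectHandler(urllib.request.HTTPRedirectHandler):
    # urllib raises HTTPError once a request chain reaches this many redirects
    max_redirections = MAX_REDIRECTS


def build_limited_opener() -> urllib.request.OpenerDirector:
    # Passing a subclass replaces the default redirect handler
    return urllib.request.build_opener(LimitedRedirectHandler)


def fetch(url: str, opener: Optional[urllib.request.OpenerDirector] = None) -> bytes:
    opener = opener or build_limited_opener()
    # The timeout bounds each blocking socket operation, not the whole transfer
    with opener.open(url, timeout=REQUEST_TIMEOUT_SECONDS) as response:
        return response.read()
```

Because the timeout applies per socket operation, a server that trickles data can still hold a connection open longer than 30 seconds overall; a size limit or an overall deadline covers that case.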
- Configurable delay between requests (default: 0.5s)
- Prevents hammering target servers
- Respectful scraping behavior
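A delay between requests can be enforced with a small helper like the one below. This is a sketch under assumptions: the class name `RateLimiter` is hypothetical, and the clock and sleep functions are injectable only to make the behavior easy to test.

```python
import time
from typing import Callable, Optional


class RateLimiter:
    """Ensure at least `delay` seconds pass between consecutive requests."""

    def __init__(
        self,
        delay: float = 0.5,  # default delay stated above
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._delay = delay
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        """Block until the next request is allowed, then record its time."""
        now = self._clock()
        if self._last is not None:
            remaining = self._delay - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `limiter.wait()` immediately before each request sleeps only for the part of the delay that has not already elapsed.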
- Filenames sanitized to alphanumeric, dash, dot, underscore
- Maximum filename length: 200 characters
- Special characters removed
- Prevents command injection via filenames
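The filename rules above could be implemented roughly as follows. The name `sanitize_filename` is hypothetical, and two details are assumptions not stated above: leading dots are stripped, and an empty result falls back to `untitled`.

```python
import re

MAX_FILENAME_LENGTH = 200  # limit stated above
_UNSAFE_CHARS = re.compile(r"[^A-Za-z0-9._-]")


def sanitize_filename(name: str) -> str:
    # Remove everything outside alphanumeric, dot, underscore, and dash
    cleaned = _UNSAFE_CHARS.sub("", name)
    # Assumption: strip leading dots so the result is never ".." or a hidden file
    cleaned = cleaned.lstrip(".")
    cleaned = cleaned[:MAX_FILENAME_LENGTH]
    # Assumption: fall back to a fixed name when nothing survives
    return cleaned or "untitled"
```

For example, `sanitize_filename("my doc; rm -rf.md")` yields `mydocrm-rf.md`: spaces and the semicolon are dropped, so the name cannot smuggle shell metacharacters.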
- No use of `eval()`, `exec()`, or `os.system()`
- No dynamic code generation
- No shell command execution
- Safe file operations only
- Only accepts HTML, XML, and feed content types
- Rejects unexpected file types (executables, archives, etc.)
- Prevents malicious file download attacks
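A content-type allowlist along these lines is sketched below. The exact MIME types docpull accepts are not listed above, so the set in `ALLOWED_CONTENT_TYPES` is an assumption covering common HTML, XML, and feed types; the function name is hypothetical as well.

```python
from typing import Optional

# Assumed allowlist: common HTML, XML, and feed media types
ALLOWED_CONTENT_TYPES = frozenset({
    "text/html",
    "application/xhtml+xml",
    "text/xml",
    "application/xml",
    "application/rss+xml",
    "application/atom+xml",
})


def is_allowed_content_type(header_value: Optional[str]) -> bool:
    """Check a Content-Type header value against the allowlist."""
    if not header_value:
        # A missing header is treated as untrusted
        return False
    # Drop parameters such as "; charset=utf-8" and normalize case
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type in ALLOWED_CONTENT_TYPES
```

The header is supplied by the server, so this check filters unexpected responses but does not by itself prove what the body contains.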
- Blocks localhost (127.0.0.1, localhost)
- Blocks RFC1918 private IPs (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16)
- Prevents SSRF attacks on internal networks
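The localhost and private-range blocking above can be approximated with the standard `ipaddress` module. The function name `is_blocked_host` is hypothetical, and this sketch only inspects the literal host string: it does not resolve DNS names, so a hostname that resolves to an internal address would need an additional check on the resolved addresses.

```python
import ipaddress


def is_blocked_host(host: str) -> bool:
    """Return True for localhost and for loopback or private IP literals."""
    if host.lower().rstrip(".") == "localhost":
        return True
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal; this sketch does not resolve hostnames
        return False
    # is_private covers the RFC1918 ranges along with other non-public ranges
    return address.is_loopback or address.is_private
```

Using `ipaddress` instead of string prefixes avoids mistakes such as treating all of `172.x` as private when only `172.16.0.0/12` is.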
- Optional domain allowlist feature
- Restricts fetching to approved domains only
- Zero-trust security model
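A domain allowlist check might be sketched as below. The function name `is_allowed_domain` is hypothetical, and treating subdomains of an approved domain as approved is an assumption; a stricter variant would require an exact match.

```python
from typing import Iterable
from urllib.parse import urlsplit


def is_allowed_domain(url: str, allowlist: Iterable[str]) -> bool:
    """Return True if the URL's host is an approved domain or a subdomain of one."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    host = host.lower().rstrip(".")
    for domain in allowlist:
        domain = domain.lower()
        # Match on a label boundary so "evil-example.com" does not match "example.com"
        if host == domain or host.endswith("." + domain):
            return True
    return False
```

Matching on the leading dot is the important detail: a plain suffix test would approve look-alike hosts that merely end with an approved name.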
- Error messages sanitized
- No stack traces exposed to users
- Minimal logging of sensitive data
- Man-in-the-middle attacks (HTTPS-only)
- Path traversal and directory escape
- XML External Entity (XXE) attacks
- XML bomb and billion laughs attack
- Zip bombs and decompression bombs (size limits)
- Memory exhaustion (file size limits)
- SSRF - External (HTTPS-only, private IP blocking)
- SSRF - Internal (localhost, RFC1918 blocking)
- Infinite redirects
- Request timeout attacks
- Command injection via filenames
- Code injection (no dynamic execution)
- Symlink attacks (path resolution)
- Content-type spoofing (validation)
- Information disclosure (sanitized errors)
- Supply chain attacks (pinned dependencies, scanning)
- Malicious content within documentation (XSS in markdown)
- DNS rebinding attacks
- Compromised upstream documentation sources
- Social engineering
- Only fetch from trusted sources
- Run in isolated environments when possible
- Review downloaded content before use
- Use specific output directories
- Monitor resource usage during large fetches
- Never disable SSL verification
- Validate all user inputs
- Keep dependencies updated
Report security vulnerabilities to support@raintree.technology.
Include:
- Description of the vulnerability
- Steps to reproduce
- Potential impact
- Suggested fix (if applicable)
Do not open public GitHub issues for security vulnerabilities.
Security updates will be released as patch versions (e.g., 1.0.1).
Check the Releases page for security advisories.
- Exact version pinning in requirements.txt
- Automated security scanning with pip-audit
- Dependabot enabled for automated updates
- Weekly dependency reviews
- requests - HTTP library with SSL/TLS support
- beautifulsoup4 - HTML parser
- html2text - HTML to Markdown converter
- certifi - SSL certificates
All dependencies are actively maintained and scanned weekly for CVEs.
- Bandit - Static security analysis
- pip-audit - Dependency vulnerability scanner
- CodeQL - Semantic code analysis
- Dependency Review - PR-based scanning
- OWASP Top 10: Protected against injection, XXE, insecure deserialization
- CWE-22: Path Traversal Prevention
- CWE-611: XXE Prevention
- CWE-918: SSRF Prevention
- CWE-400: Resource Exhaustion Prevention