🕸️ ToS-Crawl: Terms of Service Crawler

ToS-Crawl is a stealth-enabled crawler that extracts and recursively collects Terms of Service (ToS), Privacy Policies, and other legal agreements from major websites. It is designed for academic research, NLP dataset creation, and comparative policy analysis.

ToS-Crawl is part of the following research paper:

"TOSense: We Read, You Click"
Xinzhang Chen, Hassan Ali, Arash Shaghaghi, Salil S. Kanhere, Sanjay Jha
IEEE/IFIP DSN (Dependable Systems and Networks) 2025 (IEEE Xplore abstract)

"Demo: TOSense – What Did You Just Agree to?"
Xinzhang Chen, Hassan Ali, Arash Shaghaghi, Salil S. Kanhere, Sanjay Jha
IEEE LCN (Local Computer Networks) 2025 (IEEE Xplore abstract)


📌 Features

  • Multi-mode browser fallback: supports Real-Browser → Stealth → Standard Puppeteer
  • Heuristic-based link scoring for recursive discovery of legal documents
  • Readability and Unfluff dual-mode extraction engines
  • Auto-expand content and dismiss cookie banners
  • Cleans HTML to Markdown with Turndown
  • Filters non-HTML and duplicate fragment URLs
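The last feature above, filtering non-HTML and duplicate fragment URLs, can be illustrated with a minimal sketch. This is not the code in tos-crawl.js: the helper names `normalizeUrl` and `filterLinks` and the extension list are assumptions made for this example.

```javascript
// Hypothetical helpers; the extension list is an assumption, not the project's list.
const NON_HTML_EXTENSIONS = /\.(pdf|docx?|xlsx?|pptx?|zip|png|jpe?g|gif|svg|mp4|mp3)$/i;

// Resolve an href against the page URL and drop the "#fragment" part,
// because fragments point into the same document.
function normalizeUrl(raw, base) {
  try {
    const url = new URL(raw, base);
    if (url.protocol !== 'http:' && url.protocol !== 'https:') return null;
    url.hash = '';
    return url.href;
  } catch {
    return null; // unparsable href
  }
}

// Keep each normalized URL once and skip links that look like non-HTML files.
function filterLinks(hrefs, base) {
  const seen = new Set();
  const kept = [];
  for (const href of hrefs) {
    const normalized = normalizeUrl(href, base);
    if (normalized === null) continue;
    if (NON_HTML_EXTENSIONS.test(new URL(normalized).pathname)) continue;
    if (seen.has(normalized)) continue;
    seen.add(normalized);
    kept.push(normalized);
  }
  return kept;
}
```

With this sketch, `/legal/privacy-policy#top` and `/legal/privacy-policy` collapse into a single entry, while a link ending in `.pdf` or a `mailto:` link is dropped.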

🚀 Installation

git clone https://github.com/Xinzhang-Chen/tos-crawl.git
cd tos-crawl
npm install

Requires Node.js ≥ 18. Puppeteer installs Chromium automatically, so no system Chrome is needed.


🧪 Usage

node tos-crawl.js --urls '["<seed_url_1>", "<seed_url_2>"...]' --output <output_path.md>

▶️ Example

node tos-crawl.js --urls '["https://www.linkedin.com/legal/l/service-terms"]' --output ./output/Linkedin.md

⚙️ Parameters

| Option | Description | Default |
| --- | --- | --- |
| --urls | (Required) JSON array of seed URLs | – |
| --output | Output .md file to store extracted content | ./output/tos.md |
| --maxDepth | Maximum recursive depth | 3 |
| --visitBudget | Max successful documents to extract | 6 |
| --maxPages | Page budget (stop after visiting N pages; -1 = unlimited) | -1 |
| --language | Accept-Language HTTP header value | en-US |
| --includeArchive | Whether to allow crawling archived/history pages | false |
| --extractor | Extraction engine: readability or unfluff | readability |
| --fallbackExtractor | Try unfluff if readability fails | false |
| --quiet | Suppress logs to console | false |
| --headless | Run browsers in headless mode | true |
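To show how the defaults listed above combine with values given on the command line, here is a minimal sketch of option parsing. The default values are copied from the parameters list. The parsing logic, the `parseOptions` name, and the `--flag value` form for booleans (for example `--quiet true`) are assumptions for this example and may differ from what tos-crawl.js actually does.

```javascript
// Defaults copied from the parameters list; everything else here is an assumption.
const DEFAULTS = {
  output: './output/tos.md',
  maxDepth: 3,
  visitBudget: 6,
  maxPages: -1, // -1 means unlimited
  language: 'en-US',
  includeArchive: false,
  extractor: 'readability',
  fallbackExtractor: false,
  quiet: false,
  headless: true,
};

// Parse ["--name", "value", ...] pairs and coerce each value to the type of its default.
function parseOptions(argv) {
  const options = { ...DEFAULTS };
  for (let i = 0; i < argv.length; i += 2) {
    const flag = argv[i];
    const value = argv[i + 1];
    if (!flag.startsWith('--') || value === undefined) {
      throw new Error(`Expected "--name value" pairs, got: ${flag}`);
    }
    const name = flag.slice(2);
    if (name === 'urls') {
      options.urls = JSON.parse(value); // --urls is a JSON array of seed URLs
    } else if (!Object.hasOwn(DEFAULTS, name)) {
      throw new Error(`Unknown option: ${flag}`);
    } else if (typeof DEFAULTS[name] === 'number') {
      options[name] = Number(value);
    } else if (typeof DEFAULTS[name] === 'boolean') {
      options[name] = value === 'true';
    } else {
      options[name] = value;
    }
  }
  if (!Array.isArray(options.urls) || options.urls.length === 0) {
    throw new Error('--urls is required and must be a non-empty JSON array');
  }
  return options;
}
```

In a script this would typically be called as `parseOptions(process.argv.slice(2))`.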

Browser Fallback Strategy

ToS-Crawl automatically retries failed pages using 3 fallback modes in order:

  1. Real Browser via puppeteer-real-browser (best anti-detection)
  2. Stealth Puppeteer via puppeteer-extra-plugin-stealth
  3. Plain Puppeteer with manual headers and Accept-Language
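The ordered retry described above can be sketched as a simple loop. This is an illustration only: `fetchWithFallback` is a hypothetical name, and each entry in `modes` stands in for one of the three browser modes as a plain async function that takes a URL and returns HTML. No real Puppeteer calls are made here.

```javascript
// Hypothetical sketch: try each mode in order and return the first success.
// `modes` is an array of async functions (url) => html supplied by the caller.
async function fetchWithFallback(url, modes) {
  const errors = [];
  for (let mode = 0; mode < modes.length; mode++) {
    try {
      const html = await modes[mode](url);
      return { mode, html }; // report which mode succeeded
    } catch (err) {
      errors.push(`mode=${mode}: ${err.message}`);
    }
  }
  throw new Error(`All modes failed for ${url}\n${errors.join('\n')}`);
}
```

Keeping the modes as interchangeable functions means the order (most evasive first, plainest last) is just the order of the array.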

Heuristic Link Scoring

Links are filtered using a weighted scoring system defined in heuristics.js.
Scores are based on:

  • Positive Signals: e.g., path matches /terms-of-service, anchor text contains “privacy policy”, etc.
  • Negative Signals: e.g., promotional, refund, job/career pages, PDFs, date-like paths
  • Placement Signals: footer, nav, aria-labels

Only links scoring ≥ 10 are followed during the recursive crawl.
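As a rough illustration of how such a weighted score could be computed, consider the sketch below. Only the threshold of 10 comes from this README. Every pattern, weight, and name (`scoreLink`, `shouldFollow`) is an assumption made for the example and does not reflect the actual contents of heuristics.js.

```javascript
// Illustrative rules only; the real patterns and weights live in heuristics.js.
const POSITIVE = [
  { where: 'path', pattern: /terms-of-service|terms|user-agreement/i, weight: 6 },
  { where: 'path', pattern: /privacy|cookie/i, weight: 6 },
  { where: 'text', pattern: /privacy policy|terms of service|user agreement/i, weight: 5 },
];
const NEGATIVE = [
  { where: 'path', pattern: /\.pdf$/i, weight: -10 },
  { where: 'path', pattern: /careers?|jobs?|refund|promo/i, weight: -6 },
  { where: 'path', pattern: /\/\d{4}\/\d{2}\//, weight: -4 }, // date-like paths
];
const PLACEMENT = { footer: 3, nav: 1 }; // assumed bonuses for link position
const FOLLOW_THRESHOLD = 10; // threshold stated in this README

// Sum the weights of every rule that matches the link's path or anchor text.
function scoreLink({ href, text = '', placement = '' }) {
  const path = new URL(href).pathname;
  let score = 0;
  for (const rule of [...POSITIVE, ...NEGATIVE]) {
    const subject = rule.where === 'path' ? path : text;
    if (rule.pattern.test(subject)) score += rule.weight;
  }
  score += PLACEMENT[placement] ?? 0;
  return score;
}

function shouldFollow(link) {
  return scoreLink(link) >= FOLLOW_THRESHOLD;
}
```

Under these assumed weights, a footer link to `/legal/privacy-policy` with the anchor text "Privacy Policy" passes the threshold, while a PDF under `/careers/` does not.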


🌍 Test URLs Table

Use the following well-known platform links to test the crawler:

| Platform | Terms of Service URL |
| --- | --- |
| Facebook | https://www.facebook.com/terms.php |
| YouTube | https://www.youtube.com/t/terms |
| TikTok | https://www.tiktok.com/legal/page/row/terms-of-service/en |
| LinkedIn | https://www.linkedin.com/legal/l/service-terms |
| Google | https://policies.google.com/terms |

You can copy any of the above into --urls (as a JSON array) to test the crawler on that site.


📊 Sample Crawl Summary

Loaded 1 unique seed URL(s)
Crawling 1 URL(s) | depth≤3
[0] https://www.linkedin.com/legal/l/service-terms | visited=1 succeed=0 failed=0
[LAUNCH] mode=0
[EXTRACT] https://www.linkedin.com/legal/l/service-terms | mode=0 | depth=0
[SCORE] https://www.linkedin.com | score=1
[SCORE] https://www.linkedin.com/signup/cold-join | score=1
[SCORE] https://www.linkedin.com/uas/login | score=1
[SCORE] https://www.linkedin.com/help/recruiter/answer/a411940 | score=0
[SCORE] https://www.linkedin.com/help/recruiter/answer/50181/recruiter-inmail-policy | score=7
[SCORE] https://www.linkedin.com/legal/l/jobs-policies | score=1
[SCORE] https://www.linkedin.com/help/linkedin/answer/a7449391 | score=0
[SCORE] https://www.linkedin.com/help/recruiter/answer/a411940 | score=0
[SCORE] https://www.linkedin.com/legal/l/lmsprogramterms | score=4
[SCORE] https://www.linkedin.com/accessibility | score=2
[SCORE] https://www.linkedin.com/legal/user-agreement | score=14
[SCORE] https://www.linkedin.com/legal/privacy-policy | score=14
[SCORE] https://www.linkedin.com/legal/cookie-policy | score=14
[SCORE] https://www.linkedin.com/legal/copyright-policy | score=10
[SCORE] https://www.linkedin.com/psettings/guest-controls | score=2
[SCORE] https://www.linkedin.com/help/linkedin/answer/34593 | score=2
[1] https://www.linkedin.com/legal/user-agreement | visited=2 succeed=1 failed=0
[LAUNCH] mode=0
[1] https://www.linkedin.com/legal/privacy-policy | visited=3 succeed=1 failed=0
[LAUNCH] mode=0
[1] https://www.linkedin.com/legal/cookie-policy | visited=4 succeed=1 failed=0
[LAUNCH] mode=0
[EXTRACT] https://www.linkedin.com/legal/cookie-policy | mode=0 | depth=1
[SCORE] https://www.linkedin.com/legal/privacy/usa | score=7
[SCORE] https://www.linkedin.com/legal/privacy/usa | score=10
[SCORE] https://www.linkedin.com/legal/privacy-policy | score=11
[SCORE] https://www.linkedin.com/help/linkedin/answer/65521 | score=0
[SCORE] https://www.linkedin.com/legal/l/cookie-table | score=7
[SCORE] https://www.linkedin.com/legal/privacy-policy | score=11
[SCORE] https://www.linkedin.com/legal/privacy-policy | score=11
[SCORE] https://www.linkedin.com/legal/privacy-policy | score=6
[SCORE] https://www.linkedin.com/legal/privacy-policy | score=11
[SCORE] https://www.linkedin.com/help/linkedin/answer/83251 | score=0
[SCORE] https://www.linkedin.com/legal/privacy-policy | score=11
[SCORE] https://www.linkedin.com/psettings/advertising | score=1
[SCORE] https://www.linkedin.com/psettings/guest-controls/retargeting-opt-out | score=1
[SCORE] https://www.linkedin.com/accessibility | score=2
[SCORE] https://www.linkedin.com/legal/user-agreement | score=14
[SCORE] https://www.linkedin.com/legal/privacy-policy | score=14
[SCORE] https://www.linkedin.com/legal/cookie-policy | score=14
[SCORE] https://www.linkedin.com/legal/copyright-policy | score=10
[SCORE] https://www.linkedin.com/psettings/guest-controls | score=2
[SCORE] https://www.linkedin.com/help/linkedin/answer/34593 | score=2
[1] https://www.linkedin.com/legal/copyright-policy | visited=5 succeed=2 failed=0
[LAUNCH] mode=0
[EXTRACT] https://www.linkedin.com/legal/copyright-policy | mode=0 | depth=1
[SCORE] https://www.linkedin.com/legal/privacy/usa | score=7
[SCORE] https://www.linkedin.com/legal/privacy/usa | score=10
[SCORE] http://www.linkedin.com/help/legacy/redirect/app/ask/path/TS-NCI/loc/na/trk/d_microsites-frontend_legal_copyright-policy | score=7
[SCORE] http://www.linkedin.com/help/legacy/redirect/app/ask/path/TS-NCI/loc/na/trk/d_microsites-frontend_legal_copyright-policy | score=4
[SCORE] http://www.linkedin.com/help/legacy/redirect/app/ask/path/TS-CNRCCI/loc/na/trk/d_microsites-frontend_legal_copyright-policy | score=7
[SCORE] http://www.linkedin.com/help/legacy/redirect/app/ask/path/TS-CNRCCI/loc/na/trk/d_microsites-frontend_legal_copyright-policy | score=4
[SCORE] https://www.linkedin.com/help/linkedin | score=1
[SCORE] https://www.linkedin.com/help/linkedin/answer/146 | score=1
[SCORE] https://www.linkedin.com/help/linkedin/answer/30365 | score=4
[SCORE] https://www.linkedin.com/help/linkedin/answer/30200 | score=4
[SCORE] https://www.linkedin.com/accessibility | score=2
[SCORE] https://www.linkedin.com/legal/user-agreement | score=14
[SCORE] https://www.linkedin.com/legal/privacy-policy | score=14
[SCORE] https://www.linkedin.com/legal/cookie-policy | score=14
[SCORE] https://www.linkedin.com/legal/copyright-policy | score=10
[SCORE] https://www.linkedin.com/psettings/guest-controls | score=2
[SCORE] https://www.linkedin.com/help/linkedin/answer/34593 | score=2
[2] https://www.linkedin.com/legal/privacy/usa | visited=6 succeed=3 failed=0
[LAUNCH] mode=0
[EXTRACT] https://www.linkedin.com/legal/user-agreement | mode=0 | depth=1
[EXTRACT] https://www.linkedin.com/legal/privacy-policy | mode=0 | depth=1
[SCORE] https://www.linkedin.com/legal/privacy/usa | score=7
[SCORE] https://www.linkedin.com/legal/privacy/usa | score=10
[SCORE] https://www.linkedin.com/legal/preview/user-agreement | score=6
[SCORE] https://www.linkedin.com/help/linkedin/answer/a8059228 | score=0
[SCORE] https://www.linkedin.com/help/linkedin/answer/a1341216/updates-to-user-agreement-and-privacy-policy | score=4
[SCORE] https://www.linkedin.com/legal/privacy-policy | score=11
[SCORE] https://www.linkedin.com/legal/cookie-policy | score=11
[SCORE] https://www.linkedin.com/legal/privacy-policy | score=11
[SCORE] https://www.linkedin.com/legal/privacy-policy | score=11
[SCORE] https://www.linkedin.com/help/linkedin/answer/63 | score=1
[SCORE] https://www.linkedin.com/help/linkedin/answer/a6854067 | score=0
[SCORE] https://www.linkedin.com/legal/professional-community-policies | score=4
[SCORE] https://www.linkedin.com/help/linkedin/answer/89880 | score=1
[SCORE] https://www.linkedin.com/legal/pop/terms-for-paid-services | score=7
[SCORE] https://www.linkedin.com/help/linkedin/answer/a1340105 | score=0
[SCORE] https://www.linkedin.com/help/linkedin/answer/50 | score=1
[SCORE] https://www.linkedin.com/help/linkedin/answer/5704 | score=0
[SCORE] https://www.linkedin.com/payments/purchasehistory | score=-3
[SCORE] https://www.linkedin.com/mypreferences/d/categories/sign-in-and-security | score=0
[SCORE] https://www.linkedin.com/help/linkedin/answer/67 | score=1
[SCORE] https://www.linkedin.com/help/linkedin/answer/86529 | score=1
[SCORE] https://www.linkedin.com/legal/privacy-policy | score=11
[SCORE] https://www.linkedin.com/psettings/news-mention-broadcast | score=-3
[SCORE] https://www.linkedin.com/help/linkedin/answer/50021 | score=1
[SCORE] https://www.linkedin.com/services | score=0
[SCORE] https://www.linkedin.com/services | score=0
[SCORE] https://www.linkedin.com/help/linkedin/answer/89880 | score=1
[SCORE] https://www.linkedin.com/help/linkedin/answer/89880 | score=1
[SCORE] https://www.linkedin.com/legal/professional-community-policies | score=4
[SCORE] https://www.linkedin.com/help/linkedin/answer/a1342754 | score=0
[SCORE] https://www.linkedin.com/help/linkedin/answer/a1339724 | score=0
[SCORE] https://www.linkedin.com/legal/professional-community-policies | score=4
[SCORE] https://www.linkedin.com/help/linkedin/answer/63 | score=1
[SCORE] https://www.linkedin.com/legal/professional-community-policies | score=4
[SCORE] https://www.linkedin.com/legal/professional-community-policies | score=4
[SCORE] https://www.linkedin.com/help/linkedin/answer/a6857122 | score=3
[SCORE] https://www.linkedin.com/legal/professional-community-policies | score=4
[SCORE] https://www.linkedin.com/legal/copyright-policy | score=7
[SCORE] https://www.linkedin.com/help/linkedin/answer/a1337296 | score=0
[SCORE] https://www.linkedin.com/help/linkedin | score=1
[SCORE] https://www.linkedin.com/help/linkedin/answer/79728 | score=1
[SCORE] https://www.linkedin.com/accessibility | score=2
[SCORE] https://www.linkedin.com/legal/user-agreement | score=14
[SCORE] https://www.linkedin.com/legal/privacy-policy | score=14
[SCORE] https://www.linkedin.com/legal/cookie-policy | score=14
[SCORE] https://www.linkedin.com/legal/copyright-policy | score=10
[SCORE] https://www.linkedin.com/psettings/guest-controls | score=2
[SCORE] https://www.linkedin.com/help/linkedin/answer/34593 | score=2
[SCORE] https://www.linkedin.com/legal/privacy/usa | score=7
[SCORE] https://www.linkedin.com/legal/privacy/usa | score=10
[SCORE] https://www.linkedin.com/legal/preview/privacy-policy | score=6
[SCORE] https://www.linkedin.com/help/linkedin/answer/a8059228 | score=0
[SCORE] https://www.linkedin.com/help/linkedin/answer/a1341216/updates-to-user-agreement-and-privacy-policy | score=4
[SCORE] https://www.linkedin.com/legal/cookie-policy | score=11
[SCORE] https://www.linkedin.com/help/linkedin | score=1
[SCORE] https://www.linkedin.com/psettings/privacy | score=6
[SCORE] https://www.linkedin.com/help/linkedin/answer/89876 | score=1
[SCORE] https://www.linkedin.com/legal/privacy/eu | score=9
[SCORE] https://www.linkedin.com/legal/california-privacy-disclosure | score=7
[SCORE] https://www.linkedin.com/help/linkedin/answer/63 | score=1
[SCORE] https://www.linkedin.com/profile/edit | score=0
[SCORE] https://www.linkedin.com/help/linkedin/answer/38594 | score=1
[SCORE] https://www.linkedin.com/help/linkedin/answer/a1359065 | score=0
[SCORE] https://www.linkedin.com/help/linkedin/answer/77 | score=1
[SCORE] https://www.linkedin.com/help/linkedin/answer/50021 | score=-2
[SCORE] https://www.linkedin.com/legal/cookie-policy | score=11
[SCORE] https://www.linkedin.com/help/linkedin/answer/a7154788 | score=0
[SCORE] https://www.linkedin.com/help/linkedin/answer/62931 | score=1
[SCORE] https://www.linkedin.com/mypreferences/g/guest-controls | score=0
[SCORE] https://www.linkedin.com/help/linkedin/answer/79855 | score=1
[SCORE] https://www.linkedin.com/help/linkedin/answer/34593 | score=1
[SCORE] https://www.linkedin.com/help/linkedin/answer/a540854 | score=1
[SCORE] https://www.linkedin.com/psettings | score=0
[SCORE] https://www.linkedin.com/blog/member/trust-and-safety/responsible-ai-principles | score=-3
[SCORE] https://www.linkedin.com/help/linkedin/answer/a5538339 | score=0
[SCORE] https://www.linkedin.com/help/linkedin/answer/a1337820 | score=0
[SCORE] https://www.linkedin.com/help/linkedin/answer/a517610 | score=0
[SCORE] https://www.linkedin.com/psettings | score=0
[SCORE] https://www.linkedin.com/psettings/connections-visibility | score=0
[SCORE] https://www.linkedin.com/psettings/guest-controls | score=0
[SCORE] https://www.linkedin.com/psettings | score=0
[SCORE] https://www.linkedin.com/help/linkedin/answer/50021 | score=-2
[SCORE] https://www.linkedin.com/help/linkedin/answer/67405 | score=1
[SCORE] https://www.linkedin.com/psettings | score=0
[SCORE] https://www.linkedin.com/help/linkedin/answer/79855 | score=1
[SCORE] https://www.linkedin.com/psettings/data-sharing | score=0
[SCORE] https://www.linkedin.com/help/sales-navigator/answer/50216/teamlink-overview | score=-5
[SCORE] https://www.linkedin.com/help/linkedin/answer/a540854 | score=0
[SCORE] https://www.linkedin.com/psettings/messages | score=0
[SCORE] https://www.linkedin.com/help/linkedin/answer/137 | score=1
[SCORE] https://www.linkedin.com/help/linkedin/answer/115 | score=1
[SCORE] https://www.linkedin.com/help/linkedin/answer/1584 | score=1
[SCORE] https://www.linkedin.com/help/linkedin/answer/1164 | score=1
[SCORE] https://www.linkedin.com/help/linkedin/answer/1645 | score=1
[SCORE] https://www.linkedin.com/help/lms/answer/a427660 | score=0
[SCORE] https://www.linkedin.com/help/linkedin/answer/1164 | score=1
[SCORE] https://www.linkedin.com/help/linkedin/answer/85809 | score=1
[SCORE] https://www.linkedin.com/help/linkedin/answer/a1337820 | score=0
[SCORE] https://www.linkedin.com/help/linkedin/answer/a426264 | score=0
[SCORE] https://www.linkedin.com/legal/privacy/eu | score=9
[SCORE] https://www.linkedin.com/help/linkedin/answer/39575 | score=1
[SCORE] https://www.linkedin.com/psettings/data-visibility | score=0
[SCORE] https://www.linkedin.com/mypreferences/d/categories/ads | score=0
[SCORE] https://www.linkedin.com/psettings/guest-controls | score=0
[SCORE] https://www.linkedin.com/help/linkedin/answer/a6882244 | score=0
[SCORE] https://www.linkedin.com/psettings/research-invitations | score=0
[SCORE] https://www.linkedin.com/help/linkedin/answer/116090 | score=0
[SCORE] https://www.linkedin.com/mypreferences/d/settings/ads-related-actions | score=0
[SCORE] https://www.linkedin.com/help/linkedin/answer/a711358 | score=0
[SCORE] https://www.linkedin.com/help/linkedin/answer/a547077 | score=0
[SCORE] https://www.linkedin.com/legal/user-agreement | score=11
[SCORE] https://www.linkedin.com/psettings/privacy | score=6
[SCORE] https://www.linkedin.com/help/linkedin/answer/77 | score=1
[SCORE] https://www.linkedin.com/help/linkedin/answer/52950 | score=1
[SCORE] https://www.linkedin.com/psettings/privacy | score=6
[SCORE] https://www.linkedin.com/help/linkedin/answer/1164 | score=1
[SCORE] https://www.linkedin.com/help/linkedin/answer/197 | score=1
[SCORE] https://www.linkedin.com/help/linkedin/answer/71013 | score=1
[SCORE] https://www.linkedin.com/help/linkedin/answer/47992 | score=1
[SCORE] https://www.linkedin.com/help/linkedin/answer/a6270863 | score=0
[SCORE] https://www.linkedin.com/psettings/privacy | score=6
[SCORE] https://www.linkedin.com/public-profile/settings | score=0
[SCORE] https://www.linkedin.com/mypreferences/d/data-sharing-for-permitted-services | score=0
[SCORE] https://www.linkedin.com/help/linkedin/answer/a1340507 | score=0
[SCORE] https://www.linkedin.com/mypreferences/d/categories/profile-visibility | score=0
[SCORE] https://www.linkedin.com/help/linkedin/answer/a6215608 | score=0
[SCORE] https://www.linkedin.com/help/linkedin/answer/16880 | score=1
[SCORE] https://www.linkedin.com/legal/transparency | score=4
[SCORE] https://www.linkedin.com/help/linkedin/answer/66 | score=1
[SCORE] https://www.linkedin.com/profile/edit | score=0
[SCORE] https://www.linkedin.com/help/linkedin/answer/431 | score=1
[SCORE] https://www.linkedin.com/help/linkedin/answer/62931 | score=1
[SCORE] https://www.linkedin.com/psettings/messages | score=0
[SCORE] https://www.linkedin.com/psettings/privacy | score=6
[SCORE] https://www.linkedin.com/help/linkedin/answer/a1342613 | score=0
[SCORE] https://www.linkedin.com/help/linkedin/answer/a1336621 | score=0
[SCORE] https://www.linkedin.com/help/linkedin/answer/a1338610 | score=3
[SCORE] https://www.linkedin.com/help/linkedin/answer/a1339364 | score=0
[SCORE] https://www.linkedin.com/mypreferences/g/guest-controls | score=0
[SCORE] https://www.linkedin.com/legal/privacy/eu | score=6
[SCORE] https://www.linkedin.com/help/linkedin/answer/63 | score=1
[SCORE] https://www.linkedin.com/help/linkedin/answer/82934 | score=1
[SCORE] https://www.linkedin.com/help/linkedin/answer/34593 | score=1
[SCORE] https://www.linkedin.com/help/linkedin/answer/531 | score=1
[SCORE] https://www.linkedin.com/help/linkedin/answer/62533 | score=1
[SCORE] https://www.linkedin.com/help/linkedin/answer/89878 | score=1
[SCORE] https://www.linkedin.com/help/linkedin/answer/89878 | score=1
[SCORE] https://www.linkedin.com/help/linkedin/ask/TSO-DPO | score=0
[SCORE] https://www.linkedin.com/legal/privacy/eu | score=9
[SCORE] https://www.linkedin.com/help/linkedin/answer/68763 | score=1
[SCORE] http://www.linkedin.com/help/legacy/redirect/app/ask/path/ppq/loc/na/trk/microsites-frontend_legal_privacy-policy | score=4
[SCORE] https://www.linkedin.com/help/linkedin/answer/79728 | score=1
[SCORE] https://www.linkedin.com/help/linkedin/answer/80432 | score=1
[SCORE] https://www.linkedin.com/help/linkedin/answer/89877 | score=1
[SCORE] https://www.linkedin.com/help/linkedin/ask/TSO-DPO | score=0
[SCORE] https://www.linkedin.com/help/linkedin/answer/80432 | score=1
[SCORE] https://www.linkedin.com/accessibility | score=2
[SCORE] https://www.linkedin.com/legal/user-agreement | score=14
[SCORE] https://www.linkedin.com/legal/privacy-policy | score=14
[SCORE] https://www.linkedin.com/legal/cookie-policy | score=14
[SCORE] https://www.linkedin.com/legal/copyright-policy | score=10
[SCORE] https://www.linkedin.com/psettings/guest-controls | score=2
[SCORE] https://www.linkedin.com/help/linkedin/answer/34593 | score=2
[SKIP] visited https://www.linkedin.com/legal/copyright-policy
[SKIP] visited https://www.linkedin.com/legal/privacy/usa
[EXTRACT] https://www.linkedin.com/legal/privacy/usa | mode=0 | depth=2
[SCORE] https://www.linkedin.com/legal/privacy-policy | score=12
[SCORE] https://www.linkedin.com/legal/privacy-policy | score=12
[SCORE] https://www.linkedin.com/legal/privacy-policy | score=11
[SCORE] https://www.linkedin.com/legal/california-privacy-disclosure | score=7
[SCORE] https://www.linkedin.com/legal/privacy-policy | score=11
[SCORE] https://www.linkedin.com/mypreferences/d/settings/ads-interactions-with-business | score=0
[SCORE] https://www.linkedin.com/accessibility | score=2
[SCORE] https://www.linkedin.com/legal/user-agreement | score=14
[SCORE] https://www.linkedin.com/legal/privacy-policy | score=14
[SCORE] https://www.linkedin.com/legal/cookie-policy | score=14
[SCORE] https://www.linkedin.com/legal/copyright-policy | score=10
[SCORE] https://www.linkedin.com/psettings/guest-controls | score=2
[SCORE] https://www.linkedin.com/help/linkedin/answer/34593 | score=2
6 visited | succeed:6 skipped:2 failed:0
[1] https://www.linkedin.com/legal/l/service-terms
[2] https://www.linkedin.com/legal/cookie-policy
[3] https://www.linkedin.com/legal/copyright-policy
[4] https://www.linkedin.com/legal/user-agreement
[5] https://www.linkedin.com/legal/privacy-policy
[6] https://www.linkedin.com/legal/privacy/usa
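When reading a long log like the one above, it can help to reduce the [SCORE] lines to the links that reach the follow threshold. The sketch below parses lines in the `[SCORE] <url> | score=<n>` form shown in the sample output. The helper names are hypothetical, and keeping the highest score seen per URL is an assumption made for this example, not a documented behavior of the crawler.

```javascript
// Matches lines such as "[SCORE] https://example.com/terms | score=14".
const SCORE_LINE = /^\[SCORE\] (\S+) \| score=(-?\d+)$/;

// Collect the highest score seen for each URL (an assumption for this sketch).
function bestScores(logText) {
  const best = new Map();
  for (const line of logText.split('\n')) {
    const match = SCORE_LINE.exec(line.trim());
    if (!match) continue; // ignore [EXTRACT], [LAUNCH], [SKIP] and other lines
    const [, url, rawScore] = match;
    const score = Number(rawScore);
    if (!best.has(url) || score > best.get(url)) best.set(url, score);
  }
  return best;
}

// List the URLs whose best score reaches the threshold (10 by default, as in this README).
function followable(logText, threshold = 10) {
  return [...bestScores(logText)]
    .filter(([, score]) => score >= threshold)
    .map(([url]) => url);
}
```

For example, saving the console output to a file and passing its contents to `followable` yields the set of candidate legal pages discovered during the run.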

⚠️ Disclaimer

This tool is intended solely for non-commercial, academic research, and educational purposes. It is the user's responsibility to ensure compliance with applicable laws, website terms of service, and ethical research guidelines. This repository does not promote or encourage violating any platform's policies.

Additionally, the maintainers do not guarantee the accuracy, completeness, or continued availability of the extracted data. Websites may change their structure or access policies at any time. The responsibility for verifying the correctness and relevance of the collected content lies solely with the user.


📄 License

This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0).
See the LICENSE file for more information.


👩‍💻 Maintainer

🚧🚧🚧🚧🚧🚧🚧🚧🚧🚧

This project is under active development 🛠️.

New features and improvements are being added continuously. Stay tuned!
