Collecting broken links of an entire domain #25

Open
BAG-OF-CHIPS-XX opened this issue Apr 12, 2018 · 1 comment
Comments

@BAG-OF-CHIPS-XX

Can someone please help me understand this module? When I type "pylinkvalidate.py -P http://www.example.com/" it only scrapes that one URL. Is this script capable of collecting the broken links of an entire domain? I am not the best with Python :/

@kowalcj0

Yes @cdangerdouglas, it can scan an entire site. Just have a look at the documentation and play with the parameter values. Here's a config that works for me:

pylinkvalidate.py \
    --progress \
    --timeout=35 \
    --depth=2 \
    --workers=10 \
    --types=a \
    --strict \
    --test-outside \
    --parser=lxml \
    --header="User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; FSL 7.0.6.01001)" \
    --ignore="comma,separated,list,of,domains,or,prefixes,that,should,be,ignored,during,the,scan" \
    http://www.example.com
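
The flag that matters most for your question is --depth, which controls how many levels of links the crawler follows from the start URL; --workers sets the number of parallel threads, --types=a restricts the check to anchor links, and --test-outside also tests links that point outside the start domain (that's my reading of the flag names; pylinkvalidate.py --help has the authoritative descriptions). A minimal sketch under those assumptions, stripped down to the whole-site essentials:

# Follow links up to 2 levels deep from the start URL, check pages with
# 10 parallel workers, and print progress as each page is tested.
pylinkvalidate.py \
    --progress \
    --depth=2 \
    --workers=10 \
    http://www.example.com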
