wizhi/LinkCrawler

LinkCrawler

Simple C# console application that crawls a given webpage for image tags and hyperlinks. If any of them are broken, the details are written to the output.

Why?

Because it is useful to know when a webpage you are responsible for shows broken links to its users. I have this running continuously, but you don't have to. For instance, after upgrading your CMS, changing the database schema, or migrating content, it can be relevant to know whether the change introduced broken links. Run this tool once and you will know exactly how many links are broken, where they point to, and on which pages they are located.

App.Settings

| Key | Usage |
| --- | --- |
| BaseUrl | Base URL of the site to crawl |
| SuccessHttpStatusCodes | HTTP status codes that are considered "successful". Example: "1xx,2xx,302,303" |
| CheckImages | If true, `<img src="..">` will be checked |
| ValidUrlRegex | Regex to match valid URLs |
| Slack.WebHook.Url | URL of the Slack webhook. If empty, no message is sent to Slack |
| Slack.WebHook.Bot.Name | Custom name for the Slack bot |
| Slack.WebHook.Bot.IconEmoji | Custom emoji for the Slack bot |
| OnlyReportBrokenLinksToOutput | If true, only broken links will be reported to output |
| Slack.WebHook.Bot.MessageFormat | String format of the message that will be sent to Slack |
| Csv.Enabled | Enable/disable CSV output |
| Csv.FilePath | File path for the CSV file |
| Csv.Overwrite | Whether to overwrite or append (if the file exists) |
| Csv.Delimiter | Delimiter between columns in the CSV file (like ',' or ';') |
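
As a rough illustration, some of the keys above might be set like this. The sketch assumes the settings live in a standard .NET `appSettings` section. Every value shown (the URL, the regex, the bot name, the emoji, the file path) is a placeholder chosen for the example, not a default shipped with the project. `Slack.WebHook.Bot.MessageFormat` is left out because its placeholders are not described here.

```xml
<appSettings>
  <!-- Placeholder values for illustration only -->
  <add key="BaseUrl" value="https://www.example.com" />
  <add key="SuccessHttpStatusCodes" value="1xx,2xx,302,303" />
  <add key="CheckImages" value="true" />
  <add key="ValidUrlRegex" value="^https?://.+" />
  <!-- Left empty: no message is sent to Slack -->
  <add key="Slack.WebHook.Url" value="" />
  <add key="Slack.WebHook.Bot.Name" value="LinkCrawler" />
  <add key="Slack.WebHook.Bot.IconEmoji" value=":ghost:" />
  <add key="OnlyReportBrokenLinksToOutput" value="true" />
  <add key="Csv.Enabled" value="true" />
  <add key="Csv.FilePath" value="C:\temp\crawl.csv" />
  <add key="Csv.Overwrite" value="true" />
  <add key="Csv.Delimiter" value=";" />
</appSettings>
```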

Build

Clone repo 👉 open solution in Visual Studio 👉 build 👊

Output to console

Example run on www.github.com

Output to file

Running LinkCrawler.exe >> crawl.log appends the output to the file crawl.log.

Output to slack

If configured correctly, the defined Slack webhook will be notified about broken links.
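
For readers unfamiliar with incoming webhooks, here is a minimal sketch of what such a notification could look like. It is not taken from LinkCrawler's source: the class and method names are hypothetical, and it assumes a modern .NET runtime where `System.Text.Json` is available. The `text`, `username` and `icon_emoji` fields are the ones Slack's incoming webhooks accept; newer Slack apps may ignore the name and emoji overrides.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical helper, for illustration only.
public static class SlackNotifierSketch
{
    // Builds the JSON body for a Slack incoming webhook.
    public static string BuildPayload(string text, string botName, string iconEmoji)
    {
        return JsonSerializer.Serialize(new
        {
            text = text,
            username = botName,      // e.g. the Slack.WebHook.Bot.Name setting
            icon_emoji = iconEmoji   // e.g. the Slack.WebHook.Bot.IconEmoji setting
        });
    }

    // Posts the payload to the webhook. Returns false without making a
    // request when the URL is empty, matching the "if empty, do not send"
    // behaviour described for Slack.WebHook.Url.
    public static async Task<bool> NotifyAsync(HttpClient client, string webHookUrl, string payload)
    {
        if (string.IsNullOrEmpty(webHookUrl))
        {
            return false;
        }

        using var content = new StringContent(payload, Encoding.UTF8, "application/json");
        using var response = await client.PostAsync(webHookUrl, content);
        return response.IsSuccessStatusCode;
    }
}
```

A caller would build the payload once per broken link (or once per run) and pass it to `NotifyAsync` together with the configured webhook URL.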

How I use it

I have it running as a WebJob in Azure, scheduled every 4 days. It notifies the Slack channel used by the website's editors.

Creating a WebJob is simple. Just put your compiled project files (/bin/) inside a .zip and upload it.

Schedule it.

The output of a WebJob is available because Azure saves it in log files.

Read more about Azure Webjobs: https://azure.microsoft.com/en-us/documentation/articles/web-sites-create-web-jobs/

Read more about Slack incoming webhooks: https://api.slack.com/incoming-webhooks
