A simple C# console application that crawls a given webpage for image tags and hyperlinks. If any of them are broken, the details are written to the output.
It is useful to know when a webpage you are responsible for shows broken links to its users. I have this running continuously, but you don't have to. For instance, after upgrading your CMS, changing a database schema, or migrating content, it can be relevant to know whether the change introduced broken links. Run this tool once and you will know exactly how many links are broken, where they link to, and where they are located.
Key | Usage |
---|---|
BaseUrl | Base url for site to crawl |
SuccessHttpStatusCodes | HTTP status codes that are considered "successful". Example: "1xx,2xx,302,303" |
CheckImages | If true, `<img src="..">` will be checked |
ValidUrlRegex | Regex to match valid urls |
Slack.WebHook.Url | Url to the slack webhook. If empty, it will not try to send message to slack |
Slack.WebHook.Bot.Name | Custom name for slack bot |
Slack.WebHook.Bot.IconEmoji | Custom Emoji for slack bot |
OnlyReportBrokenLinksToOutput | If true, only broken links will be reported to output. |
Slack.WebHook.Bot.MessageFormat | String format message that will be sent to slack |
Csv.Enabled | Enable/disable CSV output |
Csv.FilePath | File path for the CSV file |
Csv.Overwrite | Whether to overwrite or append (if file exists) |
Csv.Delimiter | Delimiter between columns in the CSV file (like ',' or ';') |
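
The `SuccessHttpStatusCodes` example above mixes exact codes with wildcard patterns. As a rough illustration of how such a list could be interpreted, here is a minimal sketch that assumes `x` stands for any single digit. The class and method names are hypothetical and are not taken from the LinkCrawler source.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical helper, not part of LinkCrawler itself.
public static class StatusCodePatterns
{
    // Returns true when statusCode matches any pattern in a comma-separated
    // list such as "1xx,2xx,302,303". Assumption: 'x' matches any one digit.
    public static bool IsSuccess(int statusCode, string patterns)
    {
        string code = statusCode.ToString(CultureInfo.InvariantCulture);
        return patterns
            .Split(',')
            .Select(p => p.Trim())
            .Any(p => Matches(code, p));
    }

    // Compares the code and the pattern character by character.
    private static bool Matches(string code, string pattern)
    {
        if (pattern.Length != code.Length) return false;
        for (int i = 0; i < code.Length; i++)
        {
            bool wildcard = pattern[i] == 'x' || pattern[i] == 'X';
            if (!wildcard && pattern[i] != code[i]) return false;
        }
        return true;
    }
}
```

Under these assumptions, `StatusCodePatterns.IsSuccess(204, "1xx,2xx,302,303")` is true, while a 301 or a 404 would be reported as broken.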
Clone repo 👉 open solution in Visual Studio 👉 build 👊
`LinkCrawler.exe >> crawl.log` will append the output to the file `crawl.log`.
If configured correctly, the defined slack-webhook will be notified about broken links.
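
To make the Slack settings more concrete, the following sketch shows one way a message could be formatted and posted to an incoming webhook. It is not LinkCrawler's actual implementation. The class name, the parameter names, and the placeholder order in the format string (`{0}` broken URL, `{1}` status code, `{2}` page containing the link) are assumptions, and it relies on `System.Text.Json`, which ships with .NET Core 3.0 and later. The `text`, `username` and `icon_emoji` fields are those of Slack's incoming-webhook payload; newer app-based webhooks may ignore the last two.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical sketch; names are illustrative only.
public static class SlackNotifier
{
    // Builds the JSON body for a Slack incoming webhook.
    // Assumed placeholder order: {0} = broken URL, {1} = status code,
    // {2} = page on which the link was found.
    public static string BuildPayload(string messageFormat, string botName, string iconEmoji,
                                      string brokenUrl, int statusCode, string referrerUrl)
    {
        var payload = new
        {
            text = string.Format(messageFormat, brokenUrl, statusCode, referrerUrl),
            username = botName,
            icon_emoji = iconEmoji
        };
        return JsonSerializer.Serialize(payload);
    }

    // Posts the payload. Returns false without sending anything when no
    // webhook URL is configured, mirroring the Slack.WebHook.Url description.
    public static async Task<bool> SendAsync(HttpClient client, string webhookUrl, string payload)
    {
        if (string.IsNullOrWhiteSpace(webhookUrl)) return false;
        using var content = new StringContent(payload, Encoding.UTF8, "application/json");
        using HttpResponseMessage response = await client.PostAsync(webhookUrl, content);
        return response.IsSuccessStatusCode;
    }
}
```

Keeping payload construction separate from the HTTP call makes the formatting easy to check without touching the network.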
## How I use it

I have it running as a WebJob in Azure, scheduled every 4 days. It notifies the Slack channel where the editors of the website dwell.
Creating a webjob is simple. Just put your compiled project files (/bin/) inside a .zip, and upload it.
Schedule it.
The output of a webjob is available because Azure saves it in log files.
Read more about Azure Webjobs: https://azure.microsoft.com/en-us/documentation/articles/web-sites-create-web-jobs/
Read more about Slack incoming webhooks: https://api.slack.com/incoming-webhooks