A Node.js application that scans websites for broken links using a breadth-first approach.
- Scans websites recursively up to a specified depth
- Checks for broken links within the same domain
- Provides detailed reports of broken links
- Configurable scan depth and timeout settings
- Pretty-printed console output with statistics
You will need:

- Node.js (v14 or higher)
- npm (Node Package Manager)
To install:

- Clone the repository:

  ```bash
  git clone https://github.com/akhilarsh/broken-link-checker.git
  cd broken-link-checker
  ```

- Install dependencies:

  ```bash
  npm install
  ```

- Create environment file:

  ```bash
  cp .sample.env .env
  ```

  Then edit the `.env` file with your configuration.
| Variable | Description | Default | Required |
|---|---|---|---|
| HOME_PAGE | Starting URL for the scan | - | Yes |
| MAX_DEPTH | Maximum depth for recursive scanning | 1 | No |
| TIMEOUT_MS | Request timeout in milliseconds | 10000 | No |
Example `.env` file:

```
HOME_PAGE=https://www.example.com
MAX_DEPTH=2
TIMEOUT_MS=10000
```
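For reference, here is a minimal sketch of how these settings could be loaded and validated at startup. It assumes the `dotenv` package and the variable names from the table above; the project's actual configuration loader may differ.

```js
// config.js - illustrative config loader, not necessarily how this project reads its settings
require('dotenv').config();

function loadConfig() {
  const { HOME_PAGE, MAX_DEPTH, TIMEOUT_MS } = process.env;

  if (!HOME_PAGE) {
    throw new Error('HOME_PAGE is required (set it in your .env file)');
  }

  return {
    homePage: new URL(HOME_PAGE).href,              // throws if the URL is malformed
    maxDepth: parseInt(MAX_DEPTH ?? '1', 10),       // default: 1
    timeoutMs: parseInt(TIMEOUT_MS ?? '10000', 10), // default: 10000 ms
  };
}

module.exports = { loadConfig };
```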
Run the link checker:

```bash
npm start
```

The application will:
- Start from the specified HOME_PAGE
- Scan all links on the page
- Follow internal links up to MAX_DEPTH levels
- Generate a report of any broken links found
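Conceptually, the scan is a breadth-first traversal: every page at the current depth is fetched, its links are collected, and same-domain links are queued for the next level until MAX_DEPTH is reached. The sketch below illustrates that loop; it assumes Node 18+ for the global `fetch` and a hypothetical `extractLinks(html, baseUrl)` helper, and is not the project's actual implementation.

```js
// Illustrative breadth-first scan. extractLinks(html, baseUrl) is a hypothetical
// helper that returns the absolute URLs found on a page.
async function scan(homePage, maxDepth, timeoutMs) {
  const origin = new URL(homePage).origin;
  const visited = new Set([homePage]);
  const broken = [];
  let frontier = [homePage];                          // pages to scan at the current depth

  for (let depth = 0; depth <= maxDepth && frontier.length > 0; depth++) {
    const nextFrontier = [];

    for (const pageUrl of frontier) {
      let res;
      try {
        res = await fetch(pageUrl, { signal: AbortSignal.timeout(timeoutMs) });
      } catch (err) {
        broken.push({ url: pageUrl, error: err.name }); // timeout or network failure
        continue;
      }
      if (!res.ok) {
        broken.push({ url: pageUrl, error: res.status }); // 400-599 response
        continue;
      }

      // Queue unvisited same-domain links for the next depth level.
      for (const link of extractLinks(await res.text(), pageUrl)) {
        if (!visited.has(link) && new URL(link).origin === origin) {
          visited.add(link);
          nextFrontier.push(link);
        }
      }
    }

    frontier = nextFrontier;                          // breadth-first: advance one level
  }

  return broken;
}
```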
The scanner provides:
- Real-time progress updates
- Summary of links found per page
- Table of broken links with details
- Final statistics, including:
  - Total pages scanned
  - Total working links
  - Total broken links
  - Total unique URLs found
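As an illustration only, a summary like the one described above could be printed with Node's built-in `console.table`; the field names here are hypothetical and only show the shape of such a report.

```js
// Hypothetical report printer; the field names are illustrative only.
function printReport({ pagesScanned, workingLinks, brokenLinks, uniqueUrls }) {
  if (brokenLinks.length > 0) {
    console.table(brokenLinks);          // e.g. rows of { url, foundOn, error }
  }
  console.log('Pages scanned    :', pagesScanned);
  console.log('Working links    :', workingLinks);
  console.log('Broken links     :', brokenLinks.length);
  console.log('Unique URLs found:', uniqueUrls);
}
```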
The following error codes may be encountered:
| Code | Description |
|---|---|
| 400-599 | HTTP error status returned by the server |
| TIMEOUT | Request exceeded timeout limit |
| NETWORK_ERROR | Network connectivity issues |
| UNKNOWN_ERROR | Unspecified errors |
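For illustration, a failed request might be mapped onto these codes roughly as follows; the error names and properties assume a standard Node HTTP client, and the project's actual mapping may differ.

```js
// Illustrative mapping of a failed request onto the codes above.
function classifyError(err, response) {
  if (response && response.status >= 400 && response.status <= 599) {
    return String(response.status);      // HTTP error status (400-599)
  }
  if (err && (err.name === 'TimeoutError' || err.name === 'AbortError')) {
    return 'TIMEOUT';                    // request exceeded TIMEOUT_MS
  }
  if (err && ['ECONNREFUSED', 'ENOTFOUND', 'ECONNRESET', 'EAI_AGAIN'].includes(err.code)) {
    return 'NETWORK_ERROR';              // connectivity problems
  }
  return 'UNKNOWN_ERROR';                // anything else
}
```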
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.