Merge pull request #40 from milistu/scraper
Scraper Update
milistu authored May 16, 2024
2 parents 786e9eb + 88d54eb commit a37770d
Showing 4 changed files with 167 additions and 275 deletions.
27 changes: 27 additions & 0 deletions scraper/README.md
@@ -0,0 +1,27 @@
# Amazon Laws Scraper

This script scrapes law articles from a single URL or a list of URLs and saves the results as JSON files, one per URL.

## Usage

To run the script, use the following command:

```bash
python scraper/scraper.py --file scraper/urls.txt --output-dir laws_test
```

## Arguments

- `--url`: A single URL to scrape.
- `--file`: Path to a text file containing URLs, one per line.
- `--output-dir`: Directory in which to save the JSON files (default: `scraper/laws`).
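The following is a minimal sketch of how these three options could be parsed with Python's standard `argparse` module. It is a hypothetical illustration, not the actual code of `scraper/scraper.py`. The helpers `parse_args` and `collect_urls` are invented names. The sketch also assumes that `--url` and `--file` may be combined and that blank lines in the URL file are skipped.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    # Hypothetical CLI mirroring the documented options; the real
    # scraper/scraper.py may define them differently.
    parser = argparse.ArgumentParser(description="Scrape law articles into JSON files.")
    parser.add_argument("--url", help="A single URL to scrape.")
    parser.add_argument("--file", type=Path, help="Text file with one URL per line.")
    parser.add_argument(
        "--output-dir",
        type=Path,
        default=Path("scraper/laws"),
        help="Directory for the JSON output (default: scraper/laws).",
    )
    return parser.parse_args(argv)


def collect_urls(args):
    """Gather URLs from --url and/or --file, ignoring blank lines (assumed behavior)."""
    urls = []
    if args.url:
        urls.append(args.url.strip())
    if args.file:
        text = args.file.read_text(encoding="utf-8")
        urls.extend(line.strip() for line in text.splitlines() if line.strip())
    return urls
```

For example, `collect_urls(parse_args(["--url", "https://example.com/laws/labor-act.html"]))` returns a one-element list. Because `--output-dir` is omitted in that call, `args.output_dir` falls back to `Path("scraper/laws")`.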

## Example

To scrape law articles from the URLs listed in `scraper/urls.txt` and save the output to the `scraper/laws` directory:

```bash
python scraper/scraper.py --file scraper/urls.txt --output-dir scraper/laws
```
> ⚠️ _**Note**: Run the script from the project's root directory; the paths in the commands above are relative to it._

## Output

The JSON files are saved to the specified output directory, one file per URL, each named after the stem of the corresponding URL.
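As an illustration of the naming rule, here is a minimal sketch that assumes "the URL's stem" means the last segment of the URL path without its extension. The helpers `output_path_for` and `save_article` are hypothetical, and so are the example URL and the shape of the `article` dictionary. None of these is taken from the actual scraper.

```python
import json
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def output_path_for(url: str, output_dir: Path) -> Path:
    # Assumed naming rule: stem of the URL's path component,
    # e.g. ".../laws/labor-act.html" -> "labor-act.json".
    stem = PurePosixPath(urlparse(url).path).stem
    return Path(output_dir) / f"{stem}.json"


def save_article(url: str, article: dict, output_dir: Path) -> Path:
    """Write one scraped result as UTF-8 JSON and return the file path."""
    path = output_path_for(url, output_dir)
    path.parent.mkdir(parents=True, exist_ok=True)  # create the output dir if missing
    # ensure_ascii=False keeps non-ASCII characters readable in the file.
    path.write_text(json.dumps(article, ensure_ascii=False, indent=2), encoding="utf-8")
    return path
```

Under these assumptions, `https://example.com/laws/labor-act.html` scraped with `--output-dir scraper/laws` would be written to `scraper/laws/labor-act.json`. Note that two URLs with the same final path segment would map to the same file name under this rule.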
275 changes: 0 additions & 275 deletions scraper/scraper-dev.ipynb

This file was deleted.
