Structure of the app 📂
Install the dependencies with pip by running:
$ pip install -r requeriments.txt
In the file run.py, you should set the site to crawl (see the example at line 50 of the file) along with the filename and format of the output. Below is the code snippet:
# example
# declare the URL the crawler will fetch
url_to_scrape = 'https://elixir-lang.org/'
# create the crawler
new_crawler = Crawler(url_to_scrape)
# to run the crawler and store the results as tables, uncomment the line below
# new_crawler.storage_data('my_scraped_assets.txt', 'my_scraped_relations.txt')
"""
To run and plot the graph, uncomment the three lines
below (and comment the two lines above); afterwards
you can see the result in a network map."""
# get_relations = new_crawler.run()
# json_file = save_json(get_relations, 'data.json')
# plot_map(json_file)
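For example, with the graph-plotting path enabled, the relevant part of run.py would look roughly like the sketch below. The imports are an assumption: it is assumed here that Crawler lives in crawler.py and that save_json and plot_map live in utils.py.

# assumed imports from the project's own modules (crawler.py and utils.py)
from crawler import Crawler
from utils import save_json, plot_map

# URL the crawler will fetch
url_to_scrape = 'https://elixir-lang.org/'
new_crawler = Crawler(url_to_scrape)

# crawl the site, save the relations to JSON, and plot the network map
get_relations = new_crawler.run()
json_file = save_json(get_relations, 'data.json')
plot_map(json_file)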
Make sure you have installed all the dependencies. Run the file:
$ python run.py
If everything goes well, you will have this result in your folder:
.
├── .gitignore                # List of files ignored by Git
├── crawler.py                # Module with the crawler
├── data.json                 # Your scraped data in JSON (created if you chose to plot the graph)
├── README.md                 # Readme explaining how to use the crawler
├── requeriments.txt          # Dependencies file
├── my_scraped_assets.txt     # Your scraped assets, in table form
├── my_scraped_relations.txt  # Your scraped relations, in table form
├── run.py                    # File to run the crawler
└── utils.py                  # Helper functions
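As a quick way to confirm the run produced its output files, a small hypothetical check script (not part of the repository, and assuming the filenames used in the snippet above) could look like this:

from pathlib import Path

# Hypothetical sanity check: confirm the crawler produced its output files.
expected = ['my_scraped_assets.txt', 'my_scraped_relations.txt', 'data.json']
for name in expected:
    status = 'found' if Path(name).exists() else 'missing'
    print(f'{name}: {status}')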