Scrape Resident Advisor (RA) to find out where and when your favourite (techno) artists are playing.
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
The code is run and tested with Python 3.7.5 on macOS 10.14.6 and Ubuntu 18.04.5.
Clone the repo to your local machine. Create a virtual environment for Python 3 with:
virtualenv -p python3 env
Activate the virtual environment with:
source env/bin/activate
Install the required Python packages with:
pip3 install -r requirements.txt
Within the scraper/artists.txt
file specify all artists you want to scrape events for (one per line).
Artist names should be in a format such that a valid RA url https://www.residentadvisor.net/dj/ARTIST_NAME
exists.
This project uses the scrapy Python library (tutorial here).
You can find the spider with the name ra_artist_spider
in the file scraper/scraper/spiders/ra_artist_spider.py
.
To start the spider and save results locally please run:
make build
If you want to recursively scrape events of artists mentioned in the lineup section on the event pages scraped for the initially given artists, set the following value in scraper/scraper/settings.py
:
RECURSIVE = True
This setting can result in a very large number of requests, which can be controlled by CLOSESPIDER_ITEMCOUNT
or CLOSESPIDER_PAGECOUNT
.
Every run of make build
produces three JSON Line files (data models in scraper/scraper/items.py
).
The first one contains general event information for the given artists, the second and third ones contain the lineup and price information for these events, if available.
For example (one artist only, no recursive scraping):
EventItem.jsonl
(four events on artist page Solomun)
{"id": "1319532", "artist": "Solomun", "date": "Tue, 31 Dec 2019", "title": "NYE with Solomun & Jamie Jones by Link Miami Rebels", "link": "https://www.residentadvisor.net/events/1319532", "venue": "Space", "city": "Miami"}
{"id": "1363163", "artist": "Solomun", "date": "Sat, 04 Jan 2020", "title": "Solomun 1 Maceo Plex Tulum", "link": "https://www.residentadvisor.net/events/1363163", "venue": "Templo Xunanha Tulum", "city": "South"}
{"id": "1363420", "artist": "Solomun", "date": "Sat, 25 Apr 2020", "title": "Solomun Paris 2020", "link": "https://www.residentadvisor.net/events/1363420", "venue": "La Seine Musicale / Seguin", "city": "Paris"}
{"id": "1352870", "artist": "Solomun", "date": "Sat, 20 Jun 2020", "title": "Diynamic - Off Week Festival", "link": "https://www.residentadvisor.net/events/1352870", "venue": "Parc Del Forum", "city": "Barcelona"}
EventLineupItem.jsonl
(four events with lineup section available)
{"id": "1319532", "lineup": ["Solomun", "Jamie Jones", "Danyelino", "Thunderpony"]}
{"id": "1363163", "lineup": ["Solomun", "Maceo Plex"]}
{"id": "1363420", "lineup": ["Solomun"]}
{"id": "1352870", "lineup": ["Solomun"]}
EventPriceItem.jsonl
(three events with price section available, both sold-out and on-sale event prices)
{"id": "1319532", "closed_prices": [["$20.00", "1st release (entry before 12AM)"], ["$50.00", "2nd release (entry before 12AM)"], ["$80.00", "3rd release (entry before 12AM)"], ["$150.00", "4th release (entry before 12AM)"], ["$80.00", "1st release (entry before 3 AM)"], ["$100.00", "2nd release (entry before 3 AM)"], ["$120.00", "3rd release (entry before 3 AM)"], ["$100.00", "1st release (entry ANYTIME)"], ["$120.00", "2nd release (entry ANYTIME)"], ["$150.00", "3rd release (entry ANYTIME)"], ["$250.00", "4th release (entry ANYTIME)"], ["$80.00", "1st release (entry AFTER 10 AM)"], ["$100.00", "2nd release (entry AFTER 10 AM)"], ["$120.00", "3rd release (entry AFTER 10 AM)"], ["$70.00", "1st release (entry AFTER 12 PM)"], ["$60.00", "1st release (entry AFTER 2 PM)"], ["$80.00", "2nd release (entry AFTER 2 PM)"]], "onsale_prices": [["$100.00 + $12.50", "2nd release (entry AFTER 12 PM)"]]}
{"id": "1363420", "closed_prices": [], "onsale_prices": [["35,00 € + 4,40 €", "1st release"], ["100,00 € + 12,50 €", "Backstage - VIP"]]}
{"id": "1352870", "closed_prices": [["19,50 €", "Early bird "], ["29,50 €", "1st release"]], "onsale_prices": [["39,50 € + 4,95 €", "2nd release"], ["150,00 € + 18,75 €", "Backstage experience"], ["49,00 € + 6,10 €", "FRIDAY ALL PARK TICKET (Afterlife+Arpiar+Solid Grooves)"], ["49,00 € + 6,10 €", "SATURDAY ALL PARK TICKET (Diynamic+FRRC+Secretsundaze)"], ["99,00 € + 12,40 €", "WEEKEND ALL PARK TICKET (All Events)"], ["300,00 € + 37,50 €", "Weekend backstage experience "]]}
In order to additionally create a JSONL or CSV file with all results, set the following values in scraper/scraper/settings.py
:
FEED_EXPORTERS = {"jsonlines": "scrapy.exporters.JsonLinesItemExporter"}
FEED_FORMAT = "jsonlines"
FEED_URI = "results.jsonl"
FEED_EXPORTERS = {"csv": "scrapy.exporters.CsvItemExporter"}
FEED_FORMAT = "csv"
FEED_URI = "results.csv"
To send the content in EventItem.jsonl
via eMail to your Google mail account run:
make notify
You need to set the environment variables ACCOUNT
(eMail address) and SECRET
(password) before.
Further settings are available in scraper/notifications.py
.
For daily notifications, you can use GitHub Actions and deploy the workflow with the .github/workflows/app.yml
file.
- Event lineup and price collection
- Email notifications
- Visualizations/stats (Dash front-end?)
- Database connection
- Resident Advisor API
- Manuel Zander
See LICENSE