Here you can find the code which scrapes and saves data from the Shopify App Store.
The scraper is used to collect the Shopify App Store dataset published on Kaggle. The dataset includes these files:
apps
apps_categories
categories
key_benefits
pricing_plan_features
pricing_plans
reviews
While the dataset published on Kaggle is regularly updated, this repository lets you keep a local copy up to date independently of the released version.
A detailed description of the dataset can be found here.
Authenticate to GitHub Container Registry (if not already authenticated)
docker login ghcr.io -u USERNAME -p TOKEN
Pull the container image
docker pull ghcr.io/usernam3/shopify-app-store-scraper
Run container
docker run -v `pwd`/output/:/app/output/ ghcr.io/usernam3/shopify-app-store-scraper
After the container finishes, check the output folder (in the current directory)
ls -la output/
Install requirements
pip install -r requirements.txt
Run scraper
scrapy crawl app_store
After the scraper finishes, check the output folder (in the current directory)
ls -la output/
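Beyond listing the folder, a quick check can confirm that every dataset file was produced. The following is a minimal sketch, not part of the scraper itself. It assumes each file from the list above is written to output/ as a CSV named after the table (for example output/apps.csv); the function name summarize_output and the .csv extension are assumptions made for illustration, so adjust them if your output differs.

```python
import csv
from pathlib import Path

# Table names taken from the file list at the top of this README.
EXPECTED_TABLES = [
    "apps",
    "apps_categories",
    "categories",
    "key_benefits",
    "pricing_plan_features",
    "pricing_plans",
    "reviews",
]


def summarize_output(output_dir="output", extension=".csv"):
    """Return {table_name: record_count or None} for each expected table.

    Assumption: each table is stored as <table_name><extension> in output_dir.
    The count includes a header row if the file has one.
    None means the file was not found.
    """
    summary = {}
    for table in EXPECTED_TABLES:
        path = Path(output_dir) / f"{table}{extension}"
        if not path.is_file():
            summary[table] = None
            continue
        # csv.reader handles quoted fields that span several lines,
        # which plain line counting would get wrong.
        with path.open(newline="", encoding="utf-8") as handle:
            summary[table] = sum(1 for _ in csv.reader(handle))
    return summary


if __name__ == "__main__":
    for table, count in summarize_output().items():
        status = "missing" if count is None else f"{count} records"
        print(f"{table}: {status}")
```

Saved under a name of your choice and run from the repository root, this prints one line per expected file, which makes an incomplete crawl easy to spot.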
Please don't hesitate to open issues or PRs at any time if you need help with anything.