
Amazon Product Scraper using Django and Selenium

This project is a Django-based web application that scrapes product data from Amazon's search results using Selenium. It stores the scraped data in an SQLite database, provides an API to access the data, and exposes an endpoint that starts a scrape when it is hit.
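For orientation, the sketch below shows what the underlying Django models might look like. The model and field names (Keyword, ScrapedProduct, etc.) are illustrative assumptions, not necessarily the names used in this repository.

# models.py - illustrative sketch only; the actual models in the repo may differ
from django.db import models

class Keyword(models.Model):
    # Search term added via the admin panel, e.g. "Bike" or "Cat Food"
    name = models.CharField(max_length=255, unique=True)

    def __str__(self):
        return self.name

class ScrapedProduct(models.Model):
    # One row per product found in Amazon's search results for a keyword
    keyword = models.ForeignKey(Keyword, on_delete=models.CASCADE, related_name="products")
    title = models.CharField(max_length=500)
    price = models.CharField(max_length=50, blank=True)
    url = models.URLField(max_length=1000)
    scraped_at = models.DateTimeField(auto_now_add=True)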

Getting Started

To set up the project locally, follow these steps:

Installation

  1. Clone the repository:
git clone https://github.com/dostogircse171/amazon_product_scrap.git
  2. Change into the project directory:
cd amazon_product_scrap
  3. Create and activate a virtual environment:
python3 -m venv venv
source venv/bin/activate
  4. Install the required packages:
pip install -r requirements.txt
  5. Apply database migrations:
python manage.py migrate
  6. Create an admin user (we need to add some keywords from the admin panel before we can run the scraper):
python manage.py createsuperuser

*Here you will need to provide a username, an optional email, a password, etc.

  7. Start the development server:
python manage.py runserver
  8. Go to the admin panel and log in as the admin user:
http://127.0.0.1:8000/admin
  9. Under the Keywords section, add as many keywords as you want to scrape products for (you can also add them from the Django shell, as sketched below):
E.g. Bike, Car Parts, Cat Food, Dog Food, etc.
  10. That's it, we are ready to run the scraper.

The application should now be running at http://127.0.0.1:8000/ (unless you specified a different port).
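If you prefer not to use the admin panel, keywords can also be added from the Django shell. This is only a sketch: the app label (scraper) and a Keyword model with a name field are assumptions about the project layout.

#python manage.py shell
from scraper.models import Keyword  # app and model names are assumptions

for term in ["Bike", "Car Parts", "Cat Food", "Dog Food"]:
    Keyword.objects.get_or_create(name=term)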

Run the Scraper from the Command Line

We have created a custom management command in Django so we can start the scraping manually using manage.py:

python manage.py run_scraper

**This will scrape through all the keywords in the DB one by one.**
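For reference, a custom management command like this typically lives in an app's management/commands/run_scraper.py. The sketch below shows the general shape; the scrape_keyword helper and the scraper app/module names are assumptions, not the repository's exact code.

# management/commands/run_scraper.py - structural sketch only
from django.core.management.base import BaseCommand
from scraper.models import Keyword           # app and model names are assumptions
from scraper.scraper import scrape_keyword   # hypothetical Selenium helper

class Command(BaseCommand):
    help = "Scrape Amazon search results for every keyword in the database"

    def handle(self, *args, **options):
        for keyword in Keyword.objects.all():
            self.stdout.write(f"Scraping keyword: {keyword.name}")
            scrape_keyword(keyword)  # runs Selenium and saves the results
        self.stdout.write(self.style.SUCCESS("Scraping finished"))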

Using API Endpoints

We also have an API endpoint to fetch all the scraped data, and we can filter the data by keyword and date:

#Endpoint to fetch all data
http://127.0.0.1:8000/api/scraped-data/
#Query filter to filter data by keyword
http://127.0.0.1:8000/api/scraped-data/?keyword=bike
#Or by date
http://127.0.0.1:8000/api/scraped-data/?date=2023-05-09
#Or using keyword and date
http://127.0.0.1:8000/api/scraped-data/?keyword=bike&date=2023-05-09
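The same filters can be used from code, for example with the requests library (assuming the endpoint returns a JSON list of scraped items):

import requests

BASE = "http://127.0.0.1:8000/api/scraped-data/"

# Fetch everything scraped for "bike" on a given date
resp = requests.get(BASE, params={"keyword": "bike", "date": "2023-05-09"})
resp.raise_for_status()
for item in resp.json():
    print(item)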

This endpoint will start the scraping:

#Run the scraping using the endpoint
http://127.0.0.1:8000/api/trigger-scraper
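Under the hood, a trigger endpoint like this usually just kicks off the same work as the run_scraper command (or hands it to a Celery task, see the next section). A minimal sketch, not the repository's exact implementation:

# views.py - illustrative sketch of a trigger view
from django.core.management import call_command
from django.http import JsonResponse

def trigger_scraper_view(request):
    # Runs the same logic as `python manage.py run_scraper`, synchronously
    call_command("run_scraper")
    return JsonResponse({"status": "scraping finished"})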

Run the Task Scheduler using Celery and RabbitMQ

We also have a feature to scrape the data every 24 hours. To run this we need RabbitMQ; follow this link: https://www.rabbitmq.com/download.html to set it up for your device. Once RabbitMQ is running, we can start the Celery processes using these commands.

#Navigate to your project directory and start the Celery worker
celery -A amazon_product_scraper worker --loglevel=info
#In another terminal window, start the Celery beat
celery -A amazon_product_scraper beat --loglevel=info

Make sure to run the commands in separate terminal windows, as they need to run concurrently. The first command starts the Celery worker, which processes the tasks. The second command starts Celery beat, which schedules tasks to be executed by the worker at specified intervals. In this project, the trigger_scraper task is executed every 24 hours.
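The 24-hour schedule itself is declared through Celery beat. The sketch below shows one common way to configure it; the celery.py contents and the scraper.tasks.trigger_scraper task path are assumptions based on the project name used in the commands above.

# amazon_product_scraper/celery.py - sketch of a 24-hour beat schedule
import os
from datetime import timedelta
from celery import Celery

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "amazon_product_scraper.settings")

app = Celery("amazon_product_scraper")
app.config_from_object("django.conf:settings", namespace="CELERY")
app.autodiscover_tasks()

app.conf.beat_schedule = {
    "run-scraper-every-24-hours": {
        "task": "scraper.tasks.trigger_scraper",  # task path is an assumption
        "schedule": timedelta(hours=24),
    },
}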

Run Tests

We define some basic unit tests to check our scraping functions, DB schema, and API endpoints. They can be run using the following command.

#Run all tests
python manage.py test
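As an illustration of what these tests can look like (the app, model, and URL names below are assumptions, not the repository's actual test code):

# tests.py - illustrative sketch only
from django.test import TestCase
from scraper.models import Keyword  # app and model names are assumptions

class ScrapedDataApiTests(TestCase):
    def setUp(self):
        Keyword.objects.create(name="bike")

    def test_scraped_data_endpoint_returns_ok(self):
        response = self.client.get("/api/scraped-data/", {"keyword": "bike"})
        self.assertEqual(response.status_code, 200)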