This project demonstrates web scraping techniques for extracting product data from Amazon. The primary goal is to gather and analyze attributes of product listings such as price, rating, and number of reviews. It showcases my ability to use web scraping for data collection, a skill relevant to roles in data analysis and data engineering.
- Web Scraping: Utilizes Python libraries to scrape product data from Amazon's website.
- Data Collection: Gathers comprehensive product information, including title, price, rating, and number of reviews.
- Pagination Handling: Effectively navigates multiple pages to collect extensive product data.
- Data Storage: Saves scraped data in structured formats (CSV or JSON) for further analysis.
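
The parsing, pagination, and storage steps listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the HTML snippets, the CSS class names (`product`, `title`, `price`, `rating`, `reviews`, `next`), and the helper names are all assumptions made for the example, and Amazon's real markup differs and changes often. It uses in-memory sample pages so it runs without network access, and the standard-library `csv` module for output.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical sample pages standing in for fetched HTML.
# The class names below are invented for this sketch.
SAMPLE_PAGES = {
    1: """
    <div class="product"><h2 class="title">USB-C Cable</h2>
      <span class="price">$9.99</span><span class="rating">4.5</span>
      <span class="reviews">1,204</span></div>
    <a class="next" href="?page=2">Next</a>
    """,
    2: """
    <div class="product"><h2 class="title">Wireless Mouse</h2>
      <span class="price">$24.50</span><span class="rating">4.2</span>
      <span class="reviews">87</span></div>
    """,
}

FIELDS = ["title", "price", "rating", "reviews"]


def parse_products(html):
    """Return (list of product dicts, whether a 'next page' link exists)."""
    soup = BeautifulSoup(html, "html.parser")
    products = []
    for card in soup.select("div.product"):
        price_text = card.select_one(".price").get_text(strip=True)
        reviews_text = card.select_one(".reviews").get_text(strip=True)
        products.append({
            "title": card.select_one(".title").get_text(strip=True),
            "price": float(price_text.lstrip("$")),
            "rating": float(card.select_one(".rating").get_text(strip=True)),
            "reviews": int(reviews_text.replace(",", "")),
        })
    has_next = soup.select_one("a.next") is not None
    return products, has_next


def scrape_all(fetch_page, max_pages=10):
    """Follow pages until there is no 'next' link or max_pages is reached."""
    rows, page = [], 1
    while page <= max_pages:
        products, has_next = parse_products(fetch_page(page))
        rows.extend(products)
        if not has_next:
            break
        page += 1
    return rows


def to_csv(rows):
    """Serialize the scraped rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# fetch_page here is just a dictionary lookup over the sample pages.
rows = scrape_all(SAMPLE_PAGES.__getitem__)
print(to_csv(rows))
```

Passing the page fetcher in as a function keeps the parsing logic testable offline; a real run would supply a function that downloads each page instead.
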
- Python: The primary programming language for the scraping scripts.
- Beautiful Soup: A library for parsing HTML and XML documents to extract data.
- Requests: A library for making HTTP requests to fetch web pages.
- Pandas: A library for data manipulation and analysis, used to store and process scraped data.
- Jupyter Notebook: For interactive coding and data exploration.
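
As a rough sketch of how Requests fits in, the snippet below builds a paginated search URL and defines a fetch helper. The base URL, the query parameter names (`k`, `page`), the `User-Agent` value, and the function names are placeholders chosen for illustration, not values taken from the notebook. The fetch helper is defined but not called, so the example runs without network access.

```python
import requests

# Placeholder values for illustration only.
BASE_URL = "https://example.com/search"
HEADERS = {"User-Agent": "example-scraper/0.1"}


def build_search_url(query, page):
    """Build the URL for one page of search results without sending a request."""
    prepared = requests.Request(
        "GET", BASE_URL, params={"k": query, "page": page}
    ).prepare()
    return prepared.url


def fetch_page(query, page):
    """Download one results page and return its HTML text."""
    response = requests.get(
        build_search_url(query, page), headers=HEADERS, timeout=10
    )
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


url = build_search_url("laptop", 2)
print(url)
```
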
The project addresses key questions regarding product trends and market analysis, including:
- Price Trends: How do prices of similar products vary across different categories?
- Customer Ratings: What is the distribution of customer ratings for top-selling products?
- Review Sentiment: What are the common sentiments expressed in product reviews?
- Market Comparison: How do products in different categories compare in terms of features and pricing?
- Sales Insights: What patterns can be identified from the sales data of high-ranking products?
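
Questions like the price and rating ones above map naturally onto Pandas group-by and counting operations. The sketch below assumes a table with `category`, `price`, and `rating` columns. The rows are made-up placeholders for illustration, not results from this project.

```python
import pandas as pd

# Made-up rows for illustration only; not scraped data.
df = pd.DataFrame({
    "category": ["cables", "cables", "mice", "mice"],
    "price": [9.99, 12.01, 24.50, 35.50],
    "rating": [4.5, 4.0, 4.5, 3.5],
})

# Price trends: spread of prices within each category.
price_by_category = df.groupby("category")["price"].agg(["min", "mean", "max"])

# Customer ratings: how many products received each rating value.
rating_counts = df["rating"].value_counts().sort_index()

print(price_by_category)
print(rating_counts)
```
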
Detailed explanations for each analysis and the scraping methodology are included in the Jupyter Notebook.
- `amazon_scraping.ipynb`: Contains the main code for scraping Amazon product data and analysis.
- `requirements.txt`: Lists the necessary Python libraries for the project.
- Clone this repository.
- Install the required libraries using `pip install -r requirements.txt`.