This project aims to address two critical issues in e-commerce: the accurate calculation of product ratings and the effective sorting of product reviews. By solving these problems, we enhance customer satisfaction, improve product visibility for sellers, and ensure a smooth shopping experience for buyers.
In e-commerce, accurately calculating post-sale product ratings is essential for customer satisfaction and for making products stand out. Properly sorting product reviews is also crucial. Misleading reviews can directly affect sales, leading to financial loss and customer attrition. Addressing these issues will help e-commerce sites and sellers increase their sales while providing a seamless purchasing journey for customers.
The dataset contains Amazon product data, including various metadata and product categories. It features user ratings and reviews for the most reviewed products in the Electronics category.
- reviewerID: User ID
- asin: Product ID
- reviewerName: User name
- helpful: Helpful rating score
- reviewText: Review text
- overall: Product rating
- summary: Review summary
- unixReviewTime: Review time (Unix timestamp)
- reviewTime: Raw review time
- day_diff: Number of days since the review
- helpful_yes: Number of times the review was marked as helpful
- total_vote: Total number of votes for the review
Task 1: Calculate the Average Rating Based on Recent Reviews and Compare It with the Existing Average Rating
Load the Dataset and Calculate the Product's Average Rating
- The dataset includes user ratings and reviews. Calculate the average rating based on these reviews and compare it with the existing average rating.
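A minimal pandas sketch of this step. The file name amazon_review.csv is a hypothetical placeholder (the README does not name the data file), and a few made-up rows stand in for the dataset so the snippet runs on its own.

```python
import pandas as pd

# Hypothetical path; replace with the actual location of the dataset.
# df = pd.read_csv("amazon_review.csv")

# Illustrative rows only (not real data), using the dataset's column names.
df = pd.DataFrame({
    "overall": [5.0, 5.0, 4.0, 3.0, 1.0],
    "reviewTime": ["2014-12-07", "2014-12-01", "2014-10-01", "2014-06-01", "2013-01-01"],
})

# Plain (unweighted) average rating across all reviews.
average_rating = df["overall"].mean()
print(f"Average rating: {average_rating:.2f}")
```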
Calculate the Weighted Average Rating Based on Date
- Convert the date columns to datetime format and calculate the weighted average rating based on the date of each review.
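One way to sketch the date-weighted average, assuming reviewTime parses with pd.to_datetime and that "today" is the date of the newest review. The dataset already includes day_diff; recomputing it here only shows the conversion. The day_diff cutoffs (30, 90, 180 days) and the weights (28, 26, 24, 22) are hypothetical choices for illustration, not values taken from the project; the result is the weighted mean of the per-period average ratings.

```python
import pandas as pd

def add_day_diff(df):
    """Convert reviewTime to datetime and count days back from the newest review."""
    df = df.copy()
    df["reviewTime"] = pd.to_datetime(df["reviewTime"])
    current_date = df["reviewTime"].max()  # assumption: the newest review marks "today"
    df["day_diff"] = (current_date - df["reviewTime"]).dt.days
    return df

def time_based_weighted_average(df, cutoffs=(30, 90, 180), weights=(28, 26, 24, 22)):
    """Weighted mean of per-period average ratings; recent periods weigh more.

    cutoffs and weights are hypothetical; len(weights) must be len(cutoffs) + 1.
    """
    # Integer period index: 0 = most recent (day_diff <= 30), 3 = oldest (> 180).
    periods = pd.cut(df["day_diff"], bins=[-1, *cutoffs, float("inf")], labels=False)
    total = weight_sum = 0.0
    for period, weight in enumerate(weights):
        ratings = df.loc[periods == period, "overall"]
        if ratings.empty:
            continue  # skip empty periods and renormalize over the remaining weights
        total += ratings.mean() * weight
        weight_sum += weight
    return total / weight_sum

# Illustrative rows only (not real data).
df = pd.DataFrame({
    "overall": [5.0, 5.0, 4.0, 3.0, 1.0],
    "reviewTime": ["2014-12-07", "2014-12-01", "2014-10-01", "2014-06-01", "2013-01-01"],
})
df = add_day_diff(df)
print("Plain average:", df["overall"].mean())
print("Date-weighted average:", time_based_weighted_average(df))
```

With these illustrative rows the plain mean is 3.6, while the date-weighted average is higher (about 3.79) because the newest reviews carry the best ratings.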
Determine the Time Zone of the Review Timestamps
- Determine the time zone in which the review times were recorded to ensure accurate date calculations.
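Unix timestamps count seconds since 1970-01-01 00:00:00 UTC, so one hedged way to check the time zone is to decode unixReviewTime as UTC and compare it with the raw reviewTime. The helper below (a hypothetical name) assumes reviewTime parses with pd.to_datetime, e.g. "2014-07-23"; if every timestamp falls on UTC midnight of the same calendar day, the raw dates are most likely UTC dates.

```python
import pandas as pd

def timestamps_match_utc_dates(df):
    """True if every unixReviewTime is UTC midnight on the same day as reviewTime."""
    utc_times = pd.to_datetime(df["unixReviewTime"], unit="s", utc=True)
    raw_dates = pd.to_datetime(df["reviewTime"])
    at_midnight = utc_times == utc_times.dt.normalize()
    # Drop the tz info (wall-clock values stay in UTC) so they compare with naive dates.
    same_day = utc_times.dt.tz_localize(None).dt.normalize() == raw_dates
    return bool(at_midnight.all() and same_day.all())

sample = pd.DataFrame({
    "unixReviewTime": [1406073600, 1406160000],  # 2014-07-23 and 2014-07-24, 00:00 UTC
    "reviewTime": ["2014-07-23", "2014-07-24"],
})
print(timestamps_match_utc_dates(sample))  # True
```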
Check Whether Recent Ratings Are Higher
- Analyze whether recent ratings are higher than older ones, which might indicate the product's popularity.
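A minimal sketch for this check: group ratings into day_diff periods and compare the per-period means. The period cutoffs are hypothetical, and the rows are illustrative rather than real data.

```python
import pandas as pd

def mean_rating_by_period(df, cutoffs=(30, 90, 180)):
    """Average `overall` per day_diff period, most recent period first.

    The cutoffs (in days) are hypothetical; periods with no reviews are omitted.
    """
    periods = pd.cut(df["day_diff"], bins=[-1, *cutoffs, float("inf")])
    return df.groupby(periods, observed=True)["overall"].mean()

# Illustrative data (not real): day_diff counts days since each review.
df = pd.DataFrame({"day_diff": [0, 6, 67, 189, 705], "overall": [5.0, 5.0, 4.0, 3.0, 1.0]})
by_period = mean_rating_by_period(df)
print(by_period)
# Recent ratings are higher if the per-period means fall as the reviews get older.
print("Recent ratings higher:", by_period.is_monotonic_decreasing)
```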
Create the helpful_no Variable
- The total_vote variable represents the total number of up and down votes for a review. Create the helpful_no variable to hold the number of unhelpful (down) votes.
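Since total_vote counts both up and down votes and helpful_yes counts the up votes, helpful_no follows by subtraction. A minimal sketch with illustrative values:

```python
import pandas as pd

# Illustrative values only (not real data).
df = pd.DataFrame({"helpful_yes": [10, 0, 3], "total_vote": [12, 0, 3]})

# Down votes = all votes minus up votes.
df["helpful_no"] = df["total_vote"] - df["helpful_yes"]
print(df)
```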
Calculate and Add the score_pos_neg_diff, score_average_rating, and wilson_lower_bound Scores to the Data
- Compute score_pos_neg_diff, score_average_rating, and wilson_lower_bound for each review and add these scores to the dataset.
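A sketch of the three scores under common definitions, treating helpful_yes as up votes and helpful_no as down votes: score_pos_neg_diff is up minus down, score_average_rating is the share of up votes, and wilson_lower_bound is the lower bound of the Wilson score interval for that share. The 95% confidence level is an assumption, and the z-value comes from the standard library's statistics.NormalDist rather than SciPy; the project's own implementation may differ in these details.

```python
import math
from statistics import NormalDist

import pandas as pd

def score_pos_neg_diff(up, down):
    """Up votes minus down votes."""
    return up - down

def score_average_rating(up, down):
    """Share of up votes among all votes; 0 when the review has no votes."""
    if up + down == 0:
        return 0.0
    return up / (up + down)

def wilson_lower_bound(up, down, confidence=0.95):
    """Lower bound of the Wilson score interval for the up-vote proportion."""
    n = up + down
    if n == 0:
        return 0.0
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    phat = up / n
    return (phat + z * z / (2 * n)
            - z * math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)) / (1 + z * z / n)

# Illustrative vote counts (not real data).
df = pd.DataFrame({"helpful_yes": [10, 1, 600, 6], "helpful_no": [2, 0, 400, 4]})
for name, func in [("score_pos_neg_diff", score_pos_neg_diff),
                   ("score_average_rating", score_average_rating),
                   ("wilson_lower_bound", wilson_lower_bound)]:
    # apply runs immediately, so each pass uses the current `func`.
    df[name] = df.apply(lambda row: func(row["helpful_yes"], row["helpful_no"]), axis=1)
print(df)
```

The Wilson bound penalizes reviews with few votes: 1 up and 0 down gives a perfect average rating but a low lower bound (about 0.21), while 600 up and 400 down scores higher than 6 up and 4 down even though both are 60% positive.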
Identify 20 Reviews and Interpret the Results
- Select the top 20 reviews based on the calculated scores and provide an interpretation of the results.
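A minimal sketch of selecting the reviews to display, assuming wilson_lower_bound is the ranking score (a common choice, but an assumption here). The helper name top_reviews and the sample rows are illustrative.

```python
import pandas as pd

def top_reviews(df, n=20, by="wilson_lower_bound"):
    """Return the n reviews with the highest value in column `by`."""
    return df.sort_values(by, ascending=False).head(n)

# Illustrative scores (not real data).
scores = pd.DataFrame({
    "summary": ["A", "B", "C"],
    "score_pos_neg_diff": [8, 1, 200],
    "score_average_rating": [0.83, 1.0, 0.6],
    "wilson_lower_bound": [0.55, 0.21, 0.57],
})
print(top_reviews(scores, n=2))
```

In this sample, review B has the best score_average_rating (1.0) from a single vote yet ranks last by wilson_lower_bound, while C, with many votes at 60% positive, ranks first. Contrasting the rankings produced by the different scores is one way to interpret the top 20.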
To run this project, ensure you have the following dependencies installed:
- Python 3.x
- Pandas
- NumPy
- scikit-learn
You can install the required packages using pip:
pip install pandas numpy scikit-learn
- Clone the repository:
git clone <repository-url>
cd Rating_Product_-_Sorting_Reviews_in_Amazon
The results of the analysis are saved in the results directory, which contains the following files:
- average_rating_comparison.csv: A CSV file comparing the calculated average ratings with the existing average ratings.
- review_scores.csv: A CSV file containing the calculated score_pos_neg_diff, score_average_rating, and wilson_lower_bound for each review.
- top_reviews.csv: A CSV file with the top 20 reviews based on the calculated scores.
- Ensure that the dataset is correctly formatted before running the analysis.
- If you encounter any issues, please check the dataset for missing or malformed data.
- The script assumes that the dataset has been preprocessed correctly; otherwise, errors may occur.
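A hedged sketch of basic pre-run checks for missing or malformed data. The list of required columns is an assumption based on the steps above, the 1 to 5 range reflects Amazon's star ratings, and check_dataset is a hypothetical helper; adjust all of these to the actual file.

```python
import pandas as pd

# Assumed set of columns the analysis steps rely on (from the variable list).
REQUIRED_COLUMNS = ["overall", "reviewTime", "unixReviewTime", "helpful_yes", "total_vote"]

def check_dataset(df):
    """Return a list of readable problems; an empty list means the checks passed."""
    problems = [f"missing column: {col}" for col in REQUIRED_COLUMNS if col not in df.columns]
    present = [col for col in REQUIRED_COLUMNS if col in df.columns]
    for col, count in df[present].isna().sum().items():
        if count:
            problems.append(f"{col}: {count} missing value(s)")
    if "overall" in df.columns and not df["overall"].dropna().between(1, 5).all():
        problems.append("overall: ratings outside the 1-5 range")
    if {"helpful_yes", "total_vote"} <= set(df.columns) and (df["helpful_yes"] > df["total_vote"]).any():
        problems.append("helpful_yes exceeds total_vote in some rows")
    return problems

# Illustrative row (not real data).
sample = pd.DataFrame({
    "overall": [5.0], "reviewTime": ["2014-07-23"], "unixReviewTime": [1406073600],
    "helpful_yes": [1], "total_vote": [2],
})
print(check_dataset(sample))  # []
```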
This project is licensed under the MIT License. See the LICENSE file for details.