This project cleans and transforms the Myntra product dataset, which contains product details such as names, brands, prices, discounts, ratings, and other attributes.
The initial dataset includes the following columns:
- product_name: Name of the product.
- brand_name: Brand of the product.
- rating: Average rating of the product.
- rating_count: Number of ratings received.
- marked_price: Original price of the product.
- discounted_price: Price after discount.
- sizes: Available sizes.
- product_link: URL to the product page.
- img_link: URL to the product image.
- product_tag: Category tag (e.g., "wallets," "flip-flops").
- brand_tag: Brand tag for filtering purposes.
- discount_amount: Discount amount applied.
- discount_percent: Discount percentage applied.

Adding product_id Column

Extracted a unique product_id from the product_link URL by isolating the ID portion. This helps in uniquely identifying products.
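
The extraction described above could be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes the ID is the last path segment made up only of digits (for example `.../some-product-slug/1234567/buy`), and both the function name `extract_product_id` and the sample links are hypothetical.

```python
from typing import Optional
from urllib.parse import urlparse


def extract_product_id(product_link: str) -> Optional[str]:
    """Return the last all-digit path segment of the link, or None.

    Assumption: the product ID appears as its own numeric path segment.
    The real link layout in the dataset may differ.
    """
    # Split the URL path into non-empty segments; this also works for
    # relative links that have no scheme or host.
    segments = [s for s in urlparse(product_link).path.split("/") if s]
    for segment in reversed(segments):
        if segment.isdigit():
            return segment
    return None


# Hypothetical example link, used only to show the call.
print(extract_product_id("wallets/acme/acme-men-brown-leather-wallet/1234567/buy"))
```

Returning `None` for links without a numeric segment keeps malformed rows visible so they can be inspected instead of silently receiving a wrong ID.
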

Creating product_description Column

Generated a product_description column by extracting descriptive information from product_link. This provides a brief description of each product.
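
One way the description could be derived from the link is sketched below. It assumes the link contains a hyphen-separated slug in the path segment immediately before the numeric ID. That layout, the function name `extract_product_description`, and the sample link are illustrative assumptions, not taken from the project.

```python
from typing import Optional
from urllib.parse import urlparse


def extract_product_description(product_link: str) -> Optional[str]:
    """Return the slug before the numeric ID segment with hyphens as spaces.

    Assumption: the descriptive slug sits directly before the all-digit
    ID segment. Returns None if no such pair of segments is found.
    """
    segments = [s for s in urlparse(product_link).path.split("/") if s]
    # Walk backwards, skipping index 0 because it has no preceding segment.
    for index in range(len(segments) - 1, 0, -1):
        if segments[index].isdigit():
            return segments[index - 1].replace("-", " ")
    return None


# Hypothetical example link, used only to show the call.
print(extract_product_description("wallets/acme/acme-men-brown-leather-wallet/1234567/buy"))
```
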

Adding price_range Column

Added a price_range column based on discounted_price to categorize products into price bands:

- "Rs 0 - Rs 500" for products priced up to 500.
- "Rs 500 - Rs 1000" for products priced between 500 and 1000.
- "Rs 1000 - Rs 2000" for products priced between 1000 and 2000.
- "Rs 2000 - Rs 5000" for products priced above 2000.

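
The banding rule above can be written as a small function. This is a sketch under one assumption the text does not settle: a price exactly on a boundary (500, 1000, 2000) falls into the lower band. The function name `price_range` is hypothetical.

```python
def price_range(discounted_price: float) -> str:
    """Map a discounted price to one of the four price-band labels.

    Assumption: upper bounds are inclusive, so 500 falls in the first band.
    Every price above 2000 gets the last label, as the text states.
    """
    if discounted_price <= 500:
        return "Rs 0 - Rs 500"
    if discounted_price <= 1000:
        return "Rs 500 - Rs 1000"
    if discounted_price <= 2000:
        return "Rs 1000 - Rs 2000"
    return "Rs 2000 - Rs 5000"


for price in (299, 500, 750, 1999, 2500):
    print(price, "->", price_range(price))
```

If the data is held in a pandas DataFrame (an assumption; the tooling is not named here), the same function could be applied with something like `df["discounted_price"].map(price_range)`.
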

Final Dataset Structure

After processing, the dataset has 16 columns:

- The original 13 columns.
- product_id: Unique identifier for each product.
- product_description: A brief description generated from product_link.
- price_range: Categorized price range.

Contributions are welcome! If you have ideas for enhancements or spot any issues, feel free to submit a pull request.