RUG Advanced Programming Project
Download the following file and extract it into this GitHub repo: H&M data source
Install requirements (python libraries) using:
pip install -r requirements.txt
Alternatively (depending on what you use):
conda install --file requirements.txt
Afterwards, start Jupyter Notebook and open Main notebook.ipynb.
Working on the project went really well. We met in person and online to decide who did what and to formulate the research questions. We decided that everyone should work in their own notebook to minimize merge conflicts. In the end we merged all notebooks into one main notebook.
We try to answer the following main research question: "Can we use the H&M dataset to explore data about its latest fashion trends and customer base?" This main research question is divided into 3 subcategories:
- Sales --> What is popular?
- Customer base --> Who is our customer?
- Personalised fashion recommendation --> Can we recommend products based on other customers?
The sales section tries to answer the following questions:
- What are the most sold articles?
- What is the most sold article?
- What are the most sold types of articles?
- What are the worst selling articles?
- What is the least sold article?
- What are the worst selling types of articles?
- What color is the most popular?
- What was the most successful week in sales?
- Do expensive articles (or categories) sell better than cheaper ones?
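As a rough sketch of how questions like these can be approached with pandas: the snippet below counts sales per article on a small made-up table. The column names (`article_id`, `price`) and the data are assumptions for illustration and are not taken from the notebooks.

```python
import pandas as pd

# Toy stand-in for the transactions table; column names and values are assumed.
transactions = pd.DataFrame({
    "article_id": [101, 101, 102, 103, 101, 102],
    "price": [0.02, 0.02, 0.05, 0.01, 0.02, 0.05],
})

# Number of sales per article, sorted from most sold to least sold.
sales_per_article = transactions["article_id"].value_counts()

most_sold_article = sales_per_article.idxmax()
least_sold_article = sales_per_article.idxmin()

print(sales_per_article)
print("most sold:", most_sold_article, "| least sold:", least_sold_article)
```

The same pattern (group or count, then sort) should carry over to article types, colors and weeks once the transactions are joined with the article metadata.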
In the customer section of this project we have explored the following research questions:
- What is the most frequent age of H&M customers?
- What is the distribution of the ages of H&M customers?
- What is the most frequent postal code?
- How many customers have a club membership?
- What kind of fashion news frequency is most popular with what club membership?
- How many people receive fashion news?
- What is the spread of fashion news in contrast to club member status?
- What is the relation between receiving fashion news and the customer's age?
We have expressed the data in different ways, varying from plots to tables. With this research, we try to create a better overview of the customers shopping at H&M.
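A minimal sketch of two of the customer questions, assuming a customers table with `age`, `club_member_status` and `fashion_news_frequency` columns. The column names, category values and rows below are made up for illustration.

```python
import pandas as pd

# Toy stand-in for the customers table; names and values are assumed.
customers = pd.DataFrame({
    "age": [25, 31, 25, 47, 25, 31],
    "club_member_status": ["ACTIVE", "ACTIVE", "PRE-CREATE", "ACTIVE", "ACTIVE", "PRE-CREATE"],
    "fashion_news_frequency": ["NONE", "Regularly", "NONE", "Regularly", "NONE", "NONE"],
})

# Most frequent age: mode() returns all modes, so take the first one.
most_frequent_age = customers["age"].mode().iloc[0]

# Table of fashion news frequency per club membership status.
news_by_membership = pd.crosstab(
    customers["club_member_status"],
    customers["fashion_news_frequency"],
)

print("most frequent age:", most_frequent_age)
print(news_by_membership)
```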
In this chapter we use an item-based collaborative filtering approach to recommend items to users. The recommendations can be shown to users while they are shopping for products. The main idea is to find products that are frequently bought together.
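The idea can be sketched on a toy example: build a customer-by-article purchase matrix and compare article columns with cosine similarity. This is an illustration under assumptions, not the code from the notebook; the data, the column names and the `recommend` helper are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy purchases; customer and article ids are made up for illustration.
purchases = pd.DataFrame({
    "customer_id": ["a", "a", "b", "b", "c", "c", "d"],
    "article_id": [1, 2, 1, 2, 2, 3, 3],
})

# Customer x article matrix: 1 if the customer bought the article, else 0.
matrix = pd.crosstab(purchases["customer_id"], purchases["article_id"]).clip(upper=1)

# Cosine similarity between articles: normalize each article vector, then
# take dot products. Every article here has at least one buyer, so no norm is zero.
item_vectors = matrix.to_numpy().T.astype(float)
norms = np.linalg.norm(item_vectors, axis=1, keepdims=True)
normalized = item_vectors / norms
similarity = pd.DataFrame(
    normalized @ normalized.T,
    index=matrix.columns,
    columns=matrix.columns,
)


def recommend(article_id, top_n=2):
    """Return the top_n articles most similar to article_id (hypothetical helper)."""
    scores = similarity[article_id].drop(article_id)
    return scores.sort_values(ascending=False).head(top_n).index.tolist()


print(recommend(1, top_n=1))
```

Articles 1 and 2 are bought by the same two customers, so article 2 ranks highest for article 1, while article 3 shares no buyers with article 1 and gets a similarity of zero.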
We first reduce the dataset even further. The current approach of using cosine similarity could not be applied to the entire dataset, since the matrix would become too large to fit into memory. We reduced the dataset to only contain data after 09/01/2020 and selected the first 20 000 items that users bought in H&M stores. This probably impacts accuracy, since a lot of earlier transaction data is not taken into account.
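A hedged sketch of such a reduction on toy data. Several things here are assumptions: the column names (`t_dat`, `article_id`), reading 09/01/2020 as 1 September 2020, and interpreting "the first 20 000 items" as the first distinct articles in order of appearance. The `dense_matrix_gib` helper is hypothetical and only estimates the size of a dense item-by-item matrix.

```python
import pandas as pd

# Toy transactions; column names and values are assumed.
transactions = pd.DataFrame({
    "t_dat": pd.to_datetime(
        ["2020-08-30", "2020-09-02", "2020-09-03", "2020-09-05", "2020-09-10"]
    ),
    "article_id": [1, 2, 3, 2, 4],
})

CUTOFF = pd.Timestamp("2020-09-01")  # assumed reading of 09/01/2020
MAX_ITEMS = 2  # stands in for 20 000 on this toy data

# Keep only transactions after the cutoff date.
recent = transactions[transactions["t_dat"] > CUTOFF]

# Keep the first MAX_ITEMS distinct articles in order of appearance.
kept_articles = recent["article_id"].drop_duplicates().head(MAX_ITEMS)
reduced = recent[recent["article_id"].isin(kept_articles)]


def dense_matrix_gib(n_items, bytes_per_value=8):
    """Approximate size in GiB of a dense n_items x n_items float64 matrix."""
    return n_items ** 2 * bytes_per_value / 1024 ** 3


print(reduced)
print(round(dense_matrix_gib(20_000), 2), "GiB for 20 000 items")
```

Because the similarity matrix grows with the square of the number of items, limiting the item count has a much larger effect on memory use than limiting the number of transactions.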