- Implementation of a well-commented recommender system in Python
- Writing of project report.
The goal of this project is to recommend images based on the preferences of the user. You have three practical sessions to build this system. You must ensure that all the tasks related to data acquisition, annotation, analysis, and visualization are automated.
The main tasks of the project are given below:
- Data Collection
- Labeling and Annotation
- Data Analyses
- Data Visualization
- Recommendation System
- Tests
- Report
You have to collect and download a set of images. You have the following tasks to program, automating the process as much as possible:
- Create a folder called images.
- Download open-licensed images to the folder images (minimum 100 images).
- Save the metadata of every image, such as image size, image format (.jpeg, .png, etc.), image orientation (landscape, portrait, square, etc.), creation date, and camera model, in one or more JSON files. You can use the EXIF information present in the image files.
For this task, you should look for sources that provide additional information such as tags and categories.
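The metadata step above can be sketched as follows. This is a minimal sketch, not a reference solution: it assumes the Pillow library is available for reading images, and the function names (`orientation`, `read_metadata`, `save_metadata`) and the JSON field names are hypothetical choices. The download step is left out because it depends on the image source you select.

```python
import json
from pathlib import Path


def orientation(width, height):
    """Classify an image as landscape, portrait, or square from its pixel size."""
    if width > height:
        return "landscape"
    if height > width:
        return "portrait"
    return "square"


def read_metadata(path):
    """Collect basic metadata and a few EXIF fields for one image file."""
    # Assumption: Pillow is installed. It is imported here so that the other
    # helpers remain usable without it.
    from PIL import Image

    with Image.open(path) as img:
        width, height = img.size
        exif = img.getexif()
        return {
            "file": Path(path).name,
            "width": width,
            "height": height,
            "format": img.format,
            "orientation": orientation(width, height),
            "camera_model": exif.get(0x0110),  # EXIF "Model" tag, None if absent
            "created": exif.get(0x0132),       # EXIF "DateTime" tag, None if absent
        }


def save_metadata(records, json_path):
    """Write a list of metadata records to a JSON file, creating folders as needed."""
    json_path = Path(json_path)
    json_path.parent.mkdir(parents=True, exist_ok=True)
    json_path.write_text(json.dumps(records, indent=2, default=str), encoding="utf-8")
```

A possible use is `save_metadata([read_metadata(p) for p in Path("images").glob("*.jpg")], "images/metadata.json")`. Many downloaded images carry no EXIF data, so the `camera_model` and `created` fields may often be empty.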
In this task, you may need to label, annotate, and save additional information about every image. You may analyze the images using clustering algorithms to find the predominant colors.
You already have some metadata from the EXIF of images from the previous task. In this task, your goal is to obtain additional information, such as the predominant colors and tags. How about asking users to tag the images? E.g., color names, #cat, #flower, #sunflower, rose, etc. How are you planning to process the user tags? Is it possible to automate this process?
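One way to find predominant colors with a clustering algorithm is k-means on the pixel values. The sketch below assumes NumPy and scikit-learn are installed and that the pixels have already been loaded into an array of RGB triples (for example with `numpy.asarray(Image.open(path).convert("RGB"))` from Pillow). The function name `predominant_colors` and the default of three clusters are hypothetical choices.

```python
import numpy as np
from sklearn.cluster import KMeans


def predominant_colors(pixels, k=3, seed=0):
    """Return the k cluster-center colors with their share of pixels, largest first."""
    # Flatten any (height, width, 3) image array into a list of RGB rows.
    pixels = np.asarray(pixels, dtype=float).reshape(-1, 3)
    model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(pixels)
    # Count how many pixels were assigned to each cluster.
    counts = np.bincount(model.labels_, minlength=k)
    order = np.argsort(counts)[::-1]
    return [
        (
            tuple(int(round(c)) for c in model.cluster_centers_[i]),
            float(counts[i]) / len(pixels),
        )
        for i in order
    ]
```

The returned RGB tuples could then be mapped to color names and stored next to the other metadata. Downscaling large images before clustering usually keeps the run time reasonable.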
Ask the user to select some images and add tags. For every user, you are now ready to build a user-preference profile based on this selection. You may collect the following information manually, but the objective of this task is to obtain it from the selected images in an automated manner:
- Favorite colors
- Favorite image orientation
- Favorite image sizes (thumbnail images, large images, medium-size images, etc.)
- Favorite tags
- ...
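The profile items listed above can be derived by counting over the metadata of the selected images. The following is a minimal sketch that assumes each selected image is a dictionary with `colors`, `tags`, `orientation`, `width`, and `height` keys. These key names, the function names, and the size thresholds are illustrative assumptions, not part of the assignment.

```python
from collections import Counter


def size_category(width, height):
    """Bucket an image by its longest side (thresholds are arbitrary assumptions)."""
    longest = max(width, height)
    if longest < 300:
        return "thumbnail"
    if longest < 1200:
        return "medium"
    return "large"


def build_profile(selected_images, top_n=3):
    """Summarize a user's selected images into a preference profile."""
    colors, orientations, sizes, tags = Counter(), Counter(), Counter(), Counter()
    for image in selected_images:
        colors.update(image.get("colors", []))
        tags.update(image.get("tags", []))
        orientations[image["orientation"]] += 1
        sizes[size_category(image["width"], image["height"])] += 1
    return {
        "favorite_colors": [c for c, _ in colors.most_common(top_n)],
        "favorite_orientation": orientations.most_common(1)[0][0] if orientations else None,
        "favorite_size": sizes.most_common(1)[0][0] if sizes else None,
        "favorite_tags": [t for t, _ in tags.most_common(top_n)],
    }
```

Keeping the raw counts instead of only the top entries would preserve more information for the later recommendation step.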
Now, with your knowledge of different types of classifiers and clustering algorithms, what more information will you add for every image?
Your next objective is to analyze the user information and their favorite images. How did you create random users? How many users did you create? What information did you store for every user? What types of analyses did you perform?
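If you simulate users rather than collecting real selections, one simple option is to draw a random set of favorite images for each user. The sketch below is only one possible approach. The function name `create_random_users`, the user-id format, and the default numbers are hypothetical, and a fixed seed is used so that the analyses can be reproduced.

```python
import random


def create_random_users(image_ids, n_users=10, n_favorites=5, seed=42):
    """Create simulated users, each with a random set of favorite image ids."""
    rng = random.Random(seed)  # local generator, so results are reproducible
    image_ids = list(image_ids)
    n_favorites = min(n_favorites, len(image_ids))
    return {
        f"user_{i:03d}": sorted(rng.sample(image_ids, n_favorites))
        for i in range(1, n_users + 1)
    }
```

Purely uniform selections contain no real preferences, so any pattern a recommender finds in them is noise. Biasing each simulated user toward a few tags or colors would give more meaningful data to analyze.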
In this task, your goal is to visualize the different characteristics of all the downloaded images.
- The available number of images for every year
- The available number of images for different types: image size, image orientation, camera models, etc.
- Color characteristics
The users may also like to visualize the above information related to their favorite images. In this task, you must also add functionality to let the users visualize information related to their own user profile.
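As an illustration of the first visualization, the sketch below counts images per year and draws a bar chart with matplotlib. It assumes each metadata record has a `created` field in the usual EXIF style (`"YYYY:MM:DD HH:MM:SS"`). The field name, function names, and output file name are hypothetical. The same functions can be applied to a single user's favorite images by passing only those records.

```python
from collections import Counter


def images_per_year(metadata):
    """Count images per creation year; records without a usable date are skipped."""
    years = Counter()
    for record in metadata:
        created = str(record.get("created") or "")
        if created[:4].isdigit():
            years[int(created[:4])] += 1
    return dict(sorted(years.items()))


def plot_images_per_year(metadata, output_path="images_per_year.png"):
    """Save a bar chart of the number of images per year and return its path."""
    # Assumption: matplotlib is installed. It is imported here so that the
    # counting helper stays usable in environments without it.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    counts = images_per_year(metadata)
    fig, ax = plt.subplots()
    ax.bar([str(year) for year in counts], list(counts.values()))
    ax.set_xlabel("Year")
    ax.set_ylabel("Number of images")
    ax.set_title("Available images per year")
    fig.savefig(output_path)
    plt.close(fig)
    return output_path
```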
Are you now ready to recommend images to a user? In this task, your goal is to build the recommendation system. Which approach did you decide to take? Collaborative filtering, content-based, or a hybrid approach? For every user, are you now in a position to build a user-preference profile? What type of information did you use for building a user profile? What's missing? What are the limitations of your proposed approach?
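To make one of these options concrete, here is a minimal content-based sketch: each image is turned into a bag of features (tags, colors, orientation), the user profile is the sum of the features of the liked images, and unseen images are ranked by cosine similarity to that profile. The data layout and the function names are assumptions for illustration. A collaborative or hybrid approach would look different.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse feature vectors stored as dicts."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def features(image):
    """Bag of features for one image: its tags, colors, and orientation."""
    items = [f"tag:{t}" for t in image.get("tags", [])]
    items += [f"color:{c}" for c in image.get("colors", [])]
    if image.get("orientation"):
        items.append(f"orientation:{image['orientation']}")
    return Counter(items)


def recommend(catalog, liked_ids, k=5):
    """Rank images the user has not liked yet by similarity to the liked ones."""
    profile = Counter()
    for image_id in liked_ids:
        profile.update(features(catalog[image_id]))
    scores = {
        image_id: cosine(profile, features(image))
        for image_id, image in catalog.items()
        if image_id not in liked_ids
    }
    # Highest score first; ties are broken by image id for stable output.
    return sorted(scores, key=lambda image_id: (-scores[image_id], image_id))[:k]
```

A known limitation of this kind of approach is that it can only recommend images similar to what the user already liked, and it has nothing to work with for a new user with no selections.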
Your next task is to develop and run different tests on your proposed system. Are different functions functional? How did you test your project? How are you verifying that your recommender system is working?
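As one possible starting point, functional tests can be written with the standard `unittest` module, and recommendation quality can be checked with a simple metric such as precision at k on images a user is known to like. The helper `precision_at_k`, the test class, and `run_tests` below are illustrative names, not a prescribed test plan.

```python
import unittest


def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that the user actually likes."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


class PrecisionAtKTests(unittest.TestCase):
    def test_all_hits(self):
        self.assertEqual(precision_at_k(["a", "b"], {"a", "b"}, 2), 1.0)

    def test_partial_hits(self):
        self.assertAlmostEqual(precision_at_k(["a", "x", "b", "y"], {"a", "b"}, 4), 0.5)

    def test_only_top_k_counts(self):
        self.assertEqual(precision_at_k(["x", "a"], {"a"}, 1), 0.0)

    def test_invalid_k(self):
        with self.assertRaises(ValueError):
            precision_at_k(["a"], {"a"}, 0)


def run_tests():
    """Run the test case programmatically and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(PrecisionAtKTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

To evaluate the recommender itself, a common pattern is to hide some of each user's favorite images, recommend from the rest, and measure how many hidden favorites appear in the top k.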
Your final task is to prepare a 4-page project report (in French or English) in PDF format detailing the following:
- The goal of your project
- Data sources of your images and license.
- Size of your data.
- Information that you decided to store for each image.
- Information concerning user preferences
- Data mining and/or machine learning models that you used along with the metrics obtained.
- Self-evaluation of your work.
- Remarks concerning the practical sessions, exercises, and scope for improvement.
- Conclusion
Note: Please do not add any program (or code) in this report.
- Please do not submit your images.
- Rename your project report as Name1_Name2_[Name3].pdf, where Name1, Name2, etc. are your names.
- Add your project report in your project folder.
- Compress and rename your project work as Name1_Name2_[Name3].zip, where Name1, Name2 are your names.
- Submit your project work online.
The criteria for the project evaluation are given below:
- Data Collection
  - Automated approaches to data collection
  - Use of open-licensed images
  - Storage and management of images and the associated metadata
- Labeling and Annotation
  - Automated approaches to labeling
  - Storage and management of labels and annotations of images
  - Use of classification and clustering algorithms
- Data Analyses
  - Types of analyses used
  - Use of Pandas and Scikit-learn
  - Use of data mining algorithms
- Data Visualization
  - Types of visualization techniques used
  - Use of matplotlib
- Recommendation System
  - Storage and management of user preferences and user-profile
  - Use of recommendation algorithms
- Tests
  - Presence of functional tests
  - Presence of user tests
- Report
  - Clarity of presentation
  - Presence of a clear introduction and conclusion, architecture diagrams, a summary of different tasks achieved, and limitations
  - Bibliography
Note: You can check supplementary examples of notebooks.
The goal of this part is to modularize the different tasks done in Part 1 and create installable components for each of them using Docker.
For this part, you have to identify the different independent tasks of Part 1 of your project. Once these tasks have been identified, perform the following operations on each task:
- Identify the repeating portion (for example, loops).
- Replace the repeating portion with map-reduce, lambda expressions, and pyspark.
- Create a Docker container for this task.
- Use Docker volumes to share files (CSV, JSON, images, etc.) among containers.
You must have a minimum of three containers for your project. For example, a Docker container for the data acquisition, another for data analysis and another for recommendation. You may have additional containers.
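To illustrate replacing a loop with map-reduce and lambda expressions, the sketch below counts images per orientation using only Python's built-in `map` and `functools.reduce`. It runs without Spark. With pyspark, the same shape would typically be expressed on an RDD with `map` followed by `reduceByKey`. The function name and the `orientation` key are assumptions for illustration.

```python
from functools import reduce


def count_by_orientation(metadata):
    """Count images per orientation with a map step and a reduce step."""
    # Map: emit one (key, 1) pair per image record.
    pairs = map(lambda record: (record["orientation"], 1), metadata)
    # Reduce: fold the pairs into a dictionary of totals per key.
    return reduce(
        lambda totals, pair: {**totals, pair[0]: totals.get(pair[0], 0) + pair[1]},
        pairs,
        {},
    )
```

The loop it replaces would iterate over the records and increment a counter per orientation. Writing the step as independent map and reduce functions is what allows a framework to distribute the work.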
The criteria for the project evaluation are given below:
- Creation of different independent tasks.
- Use of Docker volumes for data sharing between different containers.
- Use of map-reduce framework and lambda expressions.
- Use of pyspark.
Note: You can check supplementary examples of Docker containers.