In this project, we count the number of objects in bins. Our goal is to build a machine learning pipeline with AWS tools.
Note: This repository relates to the AWS Machine Learning Engineer Nanodegree program offered by Udacity.
The work is organized into five steps:
- Collect data from the original source and organize it in an S3 bucket.
- Perform exploratory data analysis (EDA) on the dataset using SageMaker Studio.
- Design a model and tune its hyperparameters using SageMaker.
- Train and evaluate the model using SageMaker.
- Monitor the model's resource utilization using SageMaker Debugger.
We used an AWS SageMaker instance of type ml.t3.medium with the following configuration:
- 2 virtual CPUs
- 4 GiB of memory
The main software prerequisites for the project are:
- Python 3.8
- PyTorch 1.12
To run the project:
- Clone the repository.
- Run the cells of sagemaker.ipynb in order and follow its instructions.
We use the Amazon Bin Image Dataset. It contains 536,435 JPEG images of bins, together with their metadata, taken from the pods of an operating Amazon Fulfillment Center. The images are captured as robotic units carry pods during normal fulfillment center operations. We perform EDA on the dataset to understand its structure. All image files and their metadata are organized in list and metadatalist.
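A natural first EDA step is to check how many objects each bin holds, since that count is the prediction target. The sketch below assumes each metadata file is a JSON document with an `EXPECTED_QUANTITY` field giving the number of items in the bin; the sample documents are made up for illustration and trimmed to that single field.

```python
import json
from collections import Counter
from typing import Iterable


def quantity_distribution(metadata_docs: Iterable[str]) -> Counter:
    """Count how many bins hold each number of objects.

    Assumption: each document is the JSON text of one bin's metadata file
    and stores the object count under the "EXPECTED_QUANTITY" key.
    """
    counts = Counter()
    for doc in metadata_docs:
        meta = json.loads(doc)
        counts[meta["EXPECTED_QUANTITY"]] += 1
    return counts


# Hypothetical metadata snippets, reduced to the one field used here.
docs = [
    '{"EXPECTED_QUANTITY": 3}',
    '{"EXPECTED_QUANTITY": 5}',
    '{"EXPECTED_QUANTITY": 3}',
]
dist = quantity_distribution(docs)
for quantity, n_bins in sorted(dist.items()):
    print(f"{quantity} objects: {n_bins} bin(s)")
```

Running the same function over the real metadata files shows how skewed the full dataset is, which is one reason to work with a balanced subset.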
The following picture shows a sample from the dataset (a bin containing five objects):
We used file_list.json, which defines a well-balanced, representative subset of the whole dataset.
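To turn the subset into a labeled dataset, each entry has to be mapped to an image name and an object count. The sketch below assumes file_list.json maps object counts as string keys (for example `"1"` to `"5"`) to lists of metadata file paths whose base names match the corresponding bin images; that structure and the sample paths are assumptions, not confirmed by this README.

```python
import json
import os
from typing import Dict, List, Tuple


def labeled_images(file_list: Dict[str, List[str]]) -> List[Tuple[str, int]]:
    """Turn a {label: [metadata paths]} mapping into (image name, label) pairs.

    Assumption: keys are object counts stored as strings, and each metadata
    file "<id>.json" describes the bin image "<id>.jpg".
    """
    pairs = []
    for label, paths in file_list.items():
        for path in paths:
            stem = os.path.splitext(os.path.basename(path))[0]
            pairs.append((stem + ".jpg", int(label)))
    return pairs


# Hypothetical file_list.json content with two classes.
raw = '{"1": ["metadata/100313.json"], "2": ["metadata/09915.json", "metadata/10441.json"]}'
pairs = labeled_images(json.loads(raw))
print(pairs)
```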
After splitting the data into training, validation, and test sets, we store them in an S3 bucket as shown below:
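Splitting class by class keeps the balance of the subset intact in every split. The following is a minimal sketch of such a stratified split, assuming an 80/10/10 ratio, a fixed random seed, and a hypothetical `<prefix>/<split>/<label>/<image>` key layout in S3 (the folder-per-label layout that PyTorch's `ImageFolder` expects); the actual ratios and layout used in this project may differ.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def stratified_split(
    pairs: List[Tuple[str, int]],
    fractions: Tuple[float, float] = (0.8, 0.1),
    seed: int = 0,
) -> Dict[str, List[Tuple[str, int]]]:
    """Split (image, label) pairs into train/valid/test, one label at a time.

    The first fraction of each class goes to train, the second to valid,
    and the remainder to test, so every split keeps the class balance.
    """
    by_label = defaultdict(list)
    for item in pairs:
        by_label[item[1]].append(item)

    rng = random.Random(seed)  # fixed seed makes the split reproducible
    splits = {"train": [], "valid": [], "test": []}
    for label in sorted(by_label):
        items = by_label[label][:]
        rng.shuffle(items)
        n_train = int(len(items) * fractions[0])
        n_valid = int(len(items) * fractions[1])
        splits["train"] += items[:n_train]
        splits["valid"] += items[n_train:n_train + n_valid]
        splits["test"] += items[n_train + n_valid:]
    return splits


def s3_key(prefix: str, split: str, image: str, label: int) -> str:
    """Hypothetical S3 key layout: <prefix>/<split>/<label>/<image>."""
    return f"{prefix}/{split}/{label}/{image}"


# 100 synthetic images spread evenly over labels 1..5.
pairs = [(f"{i}.jpg", 1 + i % 5) for i in range(100)]
splits = stratified_split(pairs)
print({name: len(items) for name, items in splits.items()})
print(s3_key("bin-images", "train", "00001.jpg", 3))
```

The files can then be uploaded under those keys, for example with boto3's `upload_file(Filename, Bucket, Key)` or with `aws s3 sync` on a local folder that mirrors the same layout.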
Use hpo.py and hpo_improved.py to tune the hyperparameters of the benchmark and refined models, respectively. Likewise, use train.py and train_improved.py to train and evaluate the benchmark and refined models.
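In SageMaker script mode, an entry point such as hpo.py or train.py receives its hyperparameters as command-line flags and finds its data and output locations through `SM_*` environment variables (for example `SM_CHANNEL_TRAINING` for a channel named `training`, and `SM_MODEL_DIR`). The sketch below shows that pattern with argparse; the flag names, defaults, and the `training` channel name are hypothetical and are not taken from the project's scripts.

```python
import argparse
import os
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse hyperparameters and data locations for a script-mode entry point.

    Hyperparameters arrive as flags such as "--learning-rate 0.01"; data and
    model paths fall back to local folders when the SM_* variables are unset.
    """
    parser = argparse.ArgumentParser()
    # Hypothetical hyperparameters; the tuner would override these per job.
    parser.add_argument("--learning-rate", type=float, default=1e-3)
    parser.add_argument("--batch-size", type=int, default=64)
    parser.add_argument("--epochs", type=int, default=5)
    # Locations injected by SageMaker inside the training container.
    parser.add_argument("--data-dir", default=os.environ.get("SM_CHANNEL_TRAINING", "data"))
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR", "model"))
    return parser.parse_args(argv)


# Simulate the flags a tuning job might pass to one training run.
args = parse_args(["--learning-rate", "0.01", "--batch-size", "32"])
print(args.learning_rate, args.batch_size, args.epochs)
```

Because argparse turns `--learning-rate` into `args.learning_rate`, the same script runs unchanged both locally and inside a SageMaker training or tuning job.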
Finally, sagemaker.ipynb acts as the orchestrator that runs all of the above scripts to create the pipeline in SageMaker.
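When a notebook drives hyperparameter tuning with a custom PyTorch script, SageMaker's `HyperparameterTuner` reads the objective metric from the training logs through regular expressions passed as `metric_definitions` (a list of `{"Name": ..., "Regex": ...}` dictionaries). The sketch below checks such a regex locally before it is handed to the tuner. The metric name and the `Test Loss: <value>` log format are assumptions, not taken from hpo.py, and keeping the last match is a choice made here for illustration.

```python
import re
from typing import Iterable, Optional

# Hypothetical metric definition; the training script is assumed to print
# a line such as "Test Loss: 1.3872" after evaluation.
METRIC_DEFINITIONS = [
    {"Name": "test_loss", "Regex": r"Test Loss: ([0-9]+\.?[0-9]*)"},
]


def extract_metric(log_lines: Iterable[str], regex: str) -> Optional[float]:
    """Return the value captured by `regex` in the last matching line, or None."""
    value = None
    for line in log_lines:
        match = re.search(regex, line)
        if match:
            value = float(match.group(1))
    return value


logs = ["Epoch 1 Train Loss: 1.60", "Epoch 2 Train Loss: 1.41", "Test Loss: 1.3872"]
print(extract_metric(logs, METRIC_DEFINITIONS[0]["Regex"]))
```

Once the regex matches the script's output, the same list can be passed as `metric_definitions=METRIC_DEFINITIONS` together with `objective_metric_name="test_loss"` and `objective_type="Minimize"` when constructing the tuner.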
The SageMaker profiler reports are organized in benchmark profiler reports and improved profiler reports for the benchmark and improved models, respectively.
The introduction and development phase of the project are described in proposal.pdf and report.pdf.