Improving multi-modal food detection system with transfer learning

sgowdaks/food-detection

Multi-Modal Food Classification in a Diet Tracking System

This repository contains the official implementation of "Multi-Modal Food Classification in a Diet Tracking System with Spoken and Visual Inputs" (ICASSP 2023).

Overview

We investigate multi-modal transfer learning approaches on a novel, food-specific image-text dataset. Our aim is to provide new insight into developing domain-specific, multi-modal deep learning models when only a small dataset is available.

Model-I

A baseline vision-and-language model, trained without pre-trained weights, that merges a CNN with an LSTM.
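
As a rough illustration of what merging a CNN and an LSTM can look like, here is a minimal PyTorch sketch. It is not the architecture from the paper: the class name `CnnLstmClassifier`, the layer sizes, the vocabulary size, and the choice of fusing by concatenation are all assumptions made for this example. The CNN encodes the image, the LSTM encodes the tokenized text, and the two feature vectors are concatenated before a linear classifier.

```python
import torch
from torch import nn


class CnnLstmClassifier(nn.Module):
    """Toy CNN + LSTM fusion classifier (hypothetical, not the paper's exact model)."""

    def __init__(self, vocab_size: int, num_classes: int,
                 embed_dim: int = 64, hidden_dim: int = 128):
        super().__init__()
        # Image branch: a small CNN trained from scratch (no pre-trained weights).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # -> (batch, 32, 1, 1)
            nn.Flatten(),             # -> (batch, 32)
        )
        # Text branch: embedding + LSTM; the final hidden state summarizes the text.
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Fusion (assumed): concatenate both feature vectors, then classify.
        self.classifier = nn.Linear(32 + hidden_dim, num_classes)

    def forward(self, images: torch.Tensor, token_ids: torch.Tensor) -> torch.Tensor:
        img_feat = self.cnn(images)                      # (batch, 32)
        _, (h_n, _) = self.lstm(self.embed(token_ids))   # h_n: (1, batch, hidden_dim)
        txt_feat = h_n[-1]                               # (batch, hidden_dim)
        fused = torch.cat([img_feat, txt_feat], dim=1)   # (batch, 32 + hidden_dim)
        return self.classifier(fused)


# Random stand-in data: 4 RGB images of 64x64 and 4 token sequences of length 12.
model = CnnLstmClassifier(vocab_size=1000, num_classes=10)
images = torch.randn(4, 3, 64, 64)
token_ids = torch.randint(1, 1000, (4, 12))
logits = model(images, token_ids)
print(logits.shape)  # torch.Size([4, 10])
```

Concatenation is used here only because it is the simplest way to merge two encoders; other fusion schemes would fit the same description.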

Model-II

A Vision-and-Language Transformer (ViLT) initialized with pre-trained weights.
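
One way to set up this kind of transfer learning is sketched below, assuming the Hugging Face `transformers` implementation of ViLT. The checkpoint name `dandelin/vilt-b32-mlm`, the class `ViltFoodClassifier`, the helper `load_pretrained_vilt`, and the single linear head are assumptions for illustration. The repository may use a different checkpoint, head, or training setup.

```python
import torch
from torch import nn


class ViltFoodClassifier(nn.Module):
    """Linear classification head on a ViLT-style backbone (hypothetical sketch)."""

    def __init__(self, backbone: nn.Module, num_classes: int):
        super().__init__()
        self.backbone = backbone
        self.head = nn.Linear(backbone.config.hidden_size, num_classes)

    def forward(self, **encoding) -> torch.Tensor:
        # ViltModel exposes a pooled joint image-text representation.
        pooled = self.backbone(**encoding).pooler_output
        return self.head(pooled)


def load_pretrained_vilt(num_classes: int, checkpoint: str = "dandelin/vilt-b32-mlm"):
    """Load a processor and a classifier whose backbone starts from pre-trained weights."""
    # Imported lazily so that defining the classifier needs only PyTorch.
    from transformers import ViltModel, ViltProcessor

    processor = ViltProcessor.from_pretrained(checkpoint)
    backbone = ViltModel.from_pretrained(checkpoint)  # downloads the checkpoint
    return processor, ViltFoodClassifier(backbone, num_classes)


# Example usage (requires `transformers`, network access, and a PIL image `image`):
# processor, model = load_pretrained_vilt(num_classes=10)
# encoding = processor(images=image, text="a bowl of oatmeal", return_tensors="pt")
# logits = model(**encoding)
```

Only the linear head is new here; the backbone starts from the pre-trained checkpoint and can be fine-tuned or frozen depending on how much labeled data is available.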

Citation

@INPROCEEDINGS{10095762,
  author={Gowda, Shivani and Hu, Yifan and Korpusik, Mandy},
  booktitle={ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  title={Multi-Modal Food Classification in a Diet Tracking System with Spoken and Visual Inputs},
  year={2023},
  pages={1-5},
  doi={10.1109/ICASSP49357.2023.10095762}
}
