From Part 2 of my Medium post titled "Neural Image Captioning for Mortals"...
We're talking about how mortals (i.e. the author, an undergraduate intern) can implement machine learning models that tackle frontier-AI problems, like automatically generating captions to describe the scenes in images.
These are the mini-projects into which I divided the problem:
- Rating how relevant an image and caption are to each other (Part 1)
- Given an image of a handwritten digit, generating the word to describe it character-by-character (e.g. “z-e-r-o”) (Part 2)
- Given a natural scene photo, generating the sentence to describe it word-by-word (Part 2)
The most important point I’m trying to get across is that I would not have had the knowledge, tools, or resources to even begin building these models without the help of brilliant, generous people sharing their ideas, code, and models. So, with that perspective in mind, let’s get to it!