forked from allenai/unified-io-inference
Home
Daniel Napierski edited this page Nov 4, 2022 · 3 revisions
Welcome to the unified-io-inference wiki!
- Develop an iterative approach to object detection and captioning.
  - Improve and extend the current results by issuing iterative prompts to unified-IO.
  - Adapt follow-up prompts based on the results of the preliminary round of prompts.
- Gather text results from a variety of VQA prompts, including captioning and categorization.
  - Captioning: "What does the image describe ?"
  - Categorization: "What is in this image ?"
  - Test others, including "What is happening in the image ?", "Describe the scene.", and "List the objects."
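The prompt-gathering step above can be sketched as follows. Here `ask` is a hypothetical callable standing in for the actual unified-IO inference call (not a confirmed API of this repo):

```python
# Candidate VQA prompts, including the captioning and categorization
# prompts listed above.
PROMPTS = [
    "What does the image describe ?",
    "What is in this image ?",
    "What is happening in the image ?",
    "Describe the scene.",
    "List the objects.",
]

def gather_answers(ask, image):
    """Run each prompt against one image and keep non-empty answers.

    `ask(image, prompt)` is a hypothetical wrapper around the model;
    returns a dict mapping prompt -> stripped answer text.
    """
    results = {}
    for prompt in PROMPTS:
        answer = ask(image, prompt)
        if answer:  # drop empty or failed generations
            results[prompt] = answer.strip()
    return results
```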
- Parse text answers using spaCy
- Identify parts of speech
- Collect noun-phrases ("soccer player", "police officer", etc.)
  - Collect templates of noun-phrases, possibly including "[] sitting down", "[] holding a []"
- Work to get long phrases containing multiple nouns: "a man in uniform talks to people"
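Turning parsed noun-phrases into templates could look like the sketch below, assuming the noun-phrases have already been collected (e.g. from spaCy's `doc.noun_chunks`):

```python
def make_template(caption, noun_phrases):
    """Blank out each noun phrase to leave a reusable template."""
    template = caption
    # Replace longer phrases first so a short phrase does not clobber
    # a longer phrase that contains it as a substring.
    for np in sorted(noun_phrases, key=len, reverse=True):
        template = template.replace(np, "[]")
    return template
```

For the long-phrase example above, `make_template("a man in uniform talks to people", ["a man in uniform", "people"])` yields `"[] talks to []"`.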
- Use unified-IO `refexp(...)`
  - find bounding boxes of noun-phrases
  - add error handling for cases where it fails
  - `refexp` can also return `<extra_id_[N]>` tokens and plain text tokens ("person", "fire", etc.)
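Separating the two kinds of refexp output could be handled as below; the exact output format is an assumption based on the `<extra_id_[N]>` tokens noted above, and the empty-string guard covers the failure case:

```python
import re

# Location tokens emitted by refexp, e.g. "<extra_id_100>".
EXTRA_ID = re.compile(r"<extra_id_(\d+)>")

def parse_refexp_output(text):
    """Split a refexp answer into location-token ids and plain words."""
    if not text:  # guard against a failed/empty generation
        return [], []
    ids = [int(n) for n in EXTRA_ID.findall(text)]
    words = EXTRA_ID.sub(" ", text).split()
    return ids, words
```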
- Submit customized captioning prompts:
- "Describe the scene, including the soccer player."
- Explore additional prompts.
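Customized prompts like the one above can be generated from the collected noun-phrases; a minimal sketch, with the wording taken from the example above:

```python
def customized_prompts(noun_phrases):
    """Build one scene-description prompt per collected noun phrase."""
    return [f"Describe the scene, including the {np}." for np in noun_phrases]
```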
- Review the literature for captioning and question answering.