-
Notifications
You must be signed in to change notification settings - Fork 1
Midge generator
mmitchellai/Midge
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
### Copyright 2011, 2012 Margaret Mitchell ### Distributed under the terms of the GNU General Public License ### ### This file is part of the vision-to-language system Midge. ### ### Midge is free software: you can redistribute it and/or modify ### it under the terms of the GNU General Public License as published by ### the Free Software Foundation, either version 3 of the License, or ### (at your option) any later version. ### ### Midge is distributed in the hope that it will be useful, ### but WITHOUT ANY WARRANTY; without even the implied warranty of ### MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the ### GNU General Public License for more details. ### ### You should have received a copy of the GNU General Public License ### along with Midge. If not, see <http://www.gnu.org/licenses/>. ### ### Please cite the relevant work: ### Mitchell et al. (2012). "Midge: Generating Image Descriptions From Computer Vision Detections." Proceedings of EACL 2012. ### ### Questions/Comments, send to m.mitchell@abdn.ac.uk README for Midge generator. REQUIREMENTS: - Python 2.5 or later - NLTK and NLTK's version of WordNet - Python's yaml library for reading yaml files To use, python generate.py --data-file=Blah.yaml To hallucinate verbs, python generate.py --data-file=Blah.yaml --hallucinate=verb This program currently generates sentences for the objects listed in KB/reserved_words. More words will be added to this as detectors are trained for more objects. ### Options ### --data_file=path/to/vision_out Reads in vision out in yaml format. --vision-objects=path/to/detected_objects Reads in the objects the vision system is detecting (helps constrain search space). --word-thresh=X Blanket probability threshold for word-coocurrence rules. --vision-thresh=X Blanket score threshold for vision detections. --post-id=X Prints out descriptions for a specific post-id (if available from read in output). --hallucinate=[verb|noun|adj] 'Hallucinates' open class things. Currently just supports verbs. --count-cutoff=X How many times something must be observed before considering it as an option. --with-preps Input includes a preposition specification, and output will be constrained in accordance (calculated from bounding box; true for BabyTalk input). --verb-forms=path/to/vision_verbs Reads in mapping of vision detection verbs to their lexical forms. --not-pickled Do not read in saved pickle files. ### Note on Downloading ### The "KB" directory is quite large. Unless you're adding new object detections, you will not need any of the nyt_stats or flickr_stats subdirectories, as the relevant information has already been stored in the pickled files. ### Yaml file format ### Computer vision output needs to be formatted in the way listed below. Note that the only things REQUIRED are the id, label, score, and postid; everything else is optional. - id: 1 type: 1 label: girl score: -0.674386 post_id: 2008_000335.txt bbox: [98.0, 98.0, 242.0, 145.0] attrs: {'blue': 0.028448, 'brown': 0.029408, 'colorful': 0.096156, 'gray': 0.010711, 'wooden': 0.045647, 'furry': 0.160361, 'clear': 0.033369, 'golden': 0.506392, 'yellow': 0.038993, 'rectangular': 0.025997, 'rusty': 0.090902, 'pink': 0.008614, 'black': 0.039571, 'dirty': 0.268903, 'orange': 0.031291, 'green': 0.013342, 'white': 0.003223, 'feathered': 0.345758, 'shiny': 0.411671, 'striped': 0.030062, 'red': 0.014437} - id: 2 type: 1 label: boy score: 0.949139 post_id: 2008_000335.txt bbox: [64.0, 38.0, 377.0, 147.0] attrs: {'blue': 0.571586, 'brown': 0.018853, 'colorful': 0.061227, 'gray': 0.570315, 'wooden': 0.009729, 'furry': 0.000332, 'clear': 0.759303, 'golden': 0.073179, 'yellow': 0.006765, 'rectangular': 0.067174, 'rusty': 0.050299, 'pink': 0.002161, 'black': 0.012524, 'dirty': 0.027361, 'orange': 0.005298, 'green': 0.007537, 'white': 0.012733, 'feathered': 0.069583, 'shiny': 0.003181, 'striped': 0.002878, 'red': 0.00454} preps: {'1,2': 'by'} Definitions: 'id': A unique identifier 'type': 1 for object or 2 for action/pose 'label': The detection label 'score': The vision score 'post_id': The image 'bbox': The bounding box of the detected object (xmin,ymin,width,height) 'attrs': The attributes that fire within the bounding box, along with their scores 'preps': A basic computed spatial relation derived from the bounding boxes. MUST be either 'above', 'below', 'by', or 'against', as defined in KB/preps. Possible prepositions from these basic ones are determined by corpus frequencies of the given objects. ### Output ### By default, Midge outputs a set of possible parse trees for each image. Nodes in the tree are formatted as (TAG, word, likelihood) triples. Likelihood is defined from the corpus frequencies, with the given head noun object.
About
Midge generator
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published