Evaluation code for caption generation and machine translation.
Original code: COCO Caption
Also includes code from Visual Question Answering.
Update: now supports TER (using tercom), sentence-level TER, and sentence-level BLEU from Nematus.
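To give a rough feel for what a sentence-level edit rate measures, here is a minimal sketch. It is not the tercom implementation used by this package: real TER also allows block shifts, which this sketch leaves out, and the function name `word_edit_rate` is made up for illustration.

```python
def word_edit_rate(hypothesis, reference):
    """Word-level edit distance divided by the reference length.

    Simplification: TER as computed by tercom also counts block shifts;
    this sketch only counts insertions, deletions and substitutions.
    """
    hyp = hypothesis.split()
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one token")

    # prev[j] = edits needed to turn the hypothesis words seen so far
    # (initially none) into the first j reference words.
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete h
                            curr[j - 1] + 1,      # insert r
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1] / float(len(ref))
```

A hypothesis identical to its reference scores 0.0, and lower is better. Scores can exceed 1.0 when the hypothesis is much longer than the reference.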
- java 1.8.0
- python 2.7
export PYTHONPATH="/path/to/coco-caption:$PYTHONPATH"
- You will first need to download the Stanford CoreNLP 3.6.0 code and models for use by SPICE. To do this, run: ./get_stanford_models.sh
- Note: SPICE will try to create a cache of parsed sentences in ./pycocoevalcap/spice/cache/. This dramatically speeds up repeated evaluations. The cache directory can be moved by setting 'CACHE_DIR' in ./pycocoevalcap/spice. In the same file, caching can be turned off by removing the '-cache' argument to 'spice_cmd'.
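One way to confirm that the PYTHONPATH export took effect is to try importing the package from Python. The helper below is only a sketch, and `can_import` is a hypothetical name, not part of this repository.

```python
def can_import(module_name):
    """Return True if `module_name` can be imported by this interpreter."""
    try:
        __import__(module_name)
        return True
    except ImportError:
        return False

# Example usage (assumes the package directory is named 'pycocoevalcap',
# as in the SPICE cache path above):
#   print("pycocoevalcap importable: %s" % can_import("pycocoevalcap"))
```

If this reports False, check that the path given in the export line points at the directory that contains `pycocoevalcap`.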
- Microsoft COCO Captions: Data Collection and Evaluation Server
- PTBTokenizer: We use the Stanford Tokenizer which is included in Stanford CoreNLP 3.4.1.
- BLEU: BLEU: a Method for Automatic Evaluation of Machine Translation
- Meteor: Project page with related publications. We use the latest version (1.5) of the code. Changes have been made to the source code to properly aggregate the statistics for the entire corpus.
- Rouge-L: ROUGE: A Package for Automatic Evaluation of Summaries
- CIDEr: CIDEr: Consensus-based Image Description Evaluation
- TER: Translation Edit Rate with Targeted Human Annotation
- SPICE: SPICE: Semantic Propositional Image Caption Evaluation
- Xinlei Chen (CMU)
- Hao Fang (University of Washington)
- Tsung-Yi Lin (Cornell)
- Ramakrishna Vedantam (Virginia Tech)
- Álvaro Peris (Universitat Politècnica de València)
- David Chiang (University of Notre Dame)
- Michael Denkowski (CMU)
- Alexander Rush (Harvard University)