
Comparison over reference documents

pluteski edited this page May 2, 2017 · 2 revisions

Comparison on reference documents.

These comparisons cover 245 reference documents. Each reference transcript was produced with speech-to-text software trained to my voice, recorded in a quiet environment with a hand-held, medium-quality wired microphone, and then manually corrected for most errors.

Google generated a transcript for 210 of the 245 reference documents (86%), and IBM generated a transcript for 243 of the 245 (99%). The BLEU scores of the two services over these reference documents are fairly comparable.

Figure: BLEU score deciles
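
This page does not say which BLEU implementation produced the scores above. For readers unfamiliar with the metric, the following is a minimal sketch of unsmoothed, single-reference BLEU, assuming whitespace tokenization, lowercasing, and n-grams up to length 4. The function names `ngram_counts` and `bleu` are hypothetical and are not taken from the original analysis.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(reference, hypothesis, max_n=4):
    """Unsmoothed single-reference BLEU (illustrative assumption, not the
    scorer used for the figures on this page)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not hyp:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngram_counts(hyp, n)
        ref_counts = ngram_counts(ref, n)
        total = sum(hyp_counts.values())
        # Clip each hypothesis n-gram count by its count in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        if total == 0 or matched == 0:
            # Without smoothing, any zero precision makes the score zero.
            return 0.0
        log_precisions.append(math.log(matched / total))

    # Brevity penalty: penalize hypotheses shorter than the reference.
    if len(hyp) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(hyp))

    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    reference = "the quick brown fox jumps over the lazy dog"
    hypothesis = "the quick brown fox jumped over the lazy dog"
    print(bleu(reference, hypothesis))
```

Because this version is unsmoothed, a short transcript with no matching 4-gram scores zero. A smoothed variant may be a better fit for short documents.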

When measured using Ratcliff-Obershelp similarity, Google fares slightly better across the board.

Figure: Ratcliff-Obershelp score deciles
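
As a rough illustration of how such a similarity score and its deciles could be computed, the sketch below uses Python's standard-library `difflib.SequenceMatcher`, whose matching algorithm is closely related to Ratcliff-Obershelp "gestalt pattern matching". Whether the original scores were computed this way, and whether they compared characters or words, is not stated on this page. The character-level comparison and the helper names `ratcliff_similarity` and `score_deciles` are assumptions made for this example.

```python
from difflib import SequenceMatcher
from statistics import quantiles


def ratcliff_similarity(reference, hypothesis):
    """Similarity in [0, 1]: 2 * matched characters / total characters.

    autojunk=False turns off difflib's popularity heuristic, which would
    otherwise ignore very frequent characters in long transcripts.
    """
    matcher = SequenceMatcher(None, reference, hypothesis, autojunk=False)
    return matcher.ratio()


def score_deciles(scores):
    """Return the nine cut points that split the scores into ten bins."""
    return quantiles(scores, n=10)


if __name__ == "__main__":
    # Made-up transcript pairs, for illustration only.
    pairs = [
        ("call me tomorrow morning", "call me tomorrow morning"),
        ("call me tomorrow morning", "call me to morrow morning"),
        ("send the report by friday", "send a report on friday"),
        ("book a table for two", "look at a table for you"),
    ]
    scores = [ratcliff_similarity(ref, hyp) for ref, hyp in pairs]
    print(score_deciles(scores))
```

Comparing word lists instead of raw strings (for example `reference.split()`) would score word-level agreement rather than character-level agreement.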
