announce.eml

From: Nima Pourdamghani <damghani@isi.edu>
To: nlg-seminar@isi.edu, nlg-plus@isi.edu
Subject: Reminder: TODAY! - Beyond Parallel Data - A Decipherment Approach for Better Quality Machine Translation

*********** NL Seminar Announcement ***********

 Speaker: Qing Dou (USC / ISI)
    Date: 14 Aug 2015
    Time: 3:00 pm - 4:00 pm
Location: 6th Floor Large Conference Room [689]

    Note: Outside visitors should go to the tenth floor lobby where
          they will be met and escorted to the appropriate location
          five minutes before the talk.

   Title: Beyond Parallel Data - A Decipherment Approach for Better Quality Machine Translation

Abstract:

Thanks to the availability of parallel data and advances in machine
learning techniques, we have seen tremendous improvement in the field
of machine translation over the past 20 years. However, due to a lack
of parallel data, the quality of machine translation is still far from
satisfactory for many language pairs and domains. In general, it is
easier to obtain non-parallel data, and much work has tried to learn
translations from non-parallel data. Nonetheless, improvements to
machine translation have been limited. In this work, I follow a
decipherment approach to learn translations from non-parallel data and
achieve significant gains in machine translation.

I apply slice sampling to Bayesian decipherment. Compared with the
state-of-the-art algorithm, the new approach is highly scalable and
accurate, making it possible to decipher billions of tokens with
hundreds of thousands of word types at high accuracy for the first
time. When it comes to deciphering foreign languages, I introduce
dependency relations to address the problems of word reordering,
insertion, and deletion. Experiments show that dependency relations
help improve Spanish/English deciphering accuracy more than five-fold.
Moreover, this accuracy is further doubled when word embeddings are
used to incorporate more contextual information.

In addition, I decipher large amounts of monolingual data to improve
state-of-the-art machine translation systems in the scenarios of
domain adaptation and low-density languages. Through experiments, I
show that decipherment finds high-quality translations for
out-of-vocabulary words in the task of domain adaptation, and helps
improve word alignment when the amount of parallel data is limited. I
observe gains of up to 3.8 and 1.9 BLEU points in Spanish/French and
Malagasy/English machine translation experiments, respectively.

Bio. Qing is a PhD candidate at USC. His research interests focus on
applying machine learning techniques to help computers better
understand human languages. He is working with Kevin Knight on various
problems related to Machine Translation and Decipherment. Prior to
that, he worked on computational phonology, including stress
prediction and transliteration. He is interested in continuing his
research in industrial settings to solve exciting large-scale problems.


***********************************************

Remember, future seminars can be found at:
  http://www.isi.edu/natural-language/nl-seminar/