ICLR 2018 Poster, arXiv:1801.07736
This paper (1) proposes to improve text generation quality using a Generative Adversarial Network (GAN), an approach that has been successful in many image generation tasks, and (2) claims that the "validation perplexity" measure alone cannot be used to directly evaluate the quality of generated text. Compared to conventional text generation models such as seq2seq, where each word is sampled conditioned on the previous words, their model is based on an actor-critic conditional GAN that fills in missing text conditioned on the surrounding context. They show noticeably more natural and realistic samples compared to conventional maximum-likelihood-based models.
The problem with previous text generation models is that, since they are trained to maximize the likelihood of each word conditioned on the previous ground-truth words (teacher forcing), the model behaves unpredictably when conditioned on word sequences it never saw during training, as happens when its own samples are fed back in, resulting in lower sample quality. There have been approaches to make the model more robust to its own samples (e.g. professor forcing, SeqGAN).
Since it is infeasible to propagate gradients from the discriminator back to the generator in the discrete text-generation setting, the authors use a reinforcement learning (RL) based approach to train the generator. They also train the generator on an "in-filling" task, where the model is expected to complete a sentence in which some words are blanked out; a small sketch of this setup follows below. This setting provides more feedback (loss signals) to the generator, resulting in better training stability and less mode collapse.
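A minimal sketch of the masking step, in illustrative Python (not the authors' code; the blank token and mask rate are assumptions for illustration):

import random

def mask_sequence(tokens, mask_rate=0.5, blank="_"):
    # m_t = True means token t is hidden and must be imputed by the generator
    m = [random.random() < mask_rate for _ in tokens]
    masked = [blank if hide else tok for tok, hide in zip(tokens, m)]
    return masked, m

tokens = "the movie was surprisingly good".split()
masked, m = mask_sequence(tokens)
# e.g. masked == ['the', '_', 'was', '_', 'good']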
Generator: Given a discrete sequence x = (x_1, ..., x_T) and a binary mask m = (m_1, ..., m_T), the masked sequence m(x) replaces each selected token with a blank. The generator, a seq2seq model, fills in the missing tokens autoregressively, sampling x̂_t ~ G(x̂_t | x̂_1, ..., x̂_{t-1}, m(x)).
Figure - Generator architecture: The encoder reads in the masked sequence (underscores represent masked words), then the decoder imputes the missing tokens using the encoder hidden states. The dotted line represents the sampling operation.
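A sketch of the decoder's fill-in loop under this formulation (the generator.sample helper is hypothetical, standing in for sampling from the seq2seq decoder's softmax):

def fill_in(generator, masked_tokens, mask):
    filled = list(masked_tokens)
    for t, hidden in enumerate(mask):
        if hidden:
            # x̂_t is sampled conditioned on the masked context m(x) and on
            # the tokens imputed so far (the dotted line in the figure)
            filled[t] = generator.sample(context=masked_tokens, prefix=filled[:t])
    return filled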
Discriminator: The discriminator uses the same seq2seq architecture as the generator. Given the filled-in sequence along with the original masked context, it outputs, for every token, the probability that the token is real rather than generated, D(x̃_t | x̃_{1:T}, m(x)).
Critic: The critic network estimates the value function, i.e. the discounted sum of future rewards from the current step, which serves as a baseline for the policy gradient.
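Concretely, with per-token rewards coming from the discriminator, the quantities involved are (following the paper's actor-critic formulation):

r_t = log D(x̃_t | x̃_{1:T}, m(x))    (reward for the token at step t)
R_t = Σ_{s=t}^{T} γ^s r_s             (discounted return from step t)
b_t = V(x_{1:t}) ≈ E[R_t]             (critic's baseline estimate)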
Since the model is not fully differentiable due to the discrete sampling operations in the generator, the parameter gradients are estimated via policy gradients during training. Using a variant of the REINFORCE algorithm, the generator's gradient contribution for a single token x̂_t is (R_t − b_t) ∇_θ log G_θ(x̂_t), where subtracting the critic's baseline b_t reduces the variance of the gradient estimate without biasing it.
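A minimal PyTorch sketch of this per-token update (not the authors' implementation; the scalars log_prob_t, reward_to_go, and value_t are stand-ins for the generator's log-probability, the discounted return, and the critic's estimate):

import torch

# log G_θ(x̂_t | context): produced by the generator's softmax in practice
log_prob_t = torch.tensor(-1.2, requires_grad=True)
reward_to_go = torch.tensor(0.7)  # R_t, built from discriminator rewards
value_t = torch.tensor(0.4)       # b_t = V(x_{1:t}), the critic's estimate

advantage = (reward_to_go - value_t).detach()  # no gradient through baseline
loss = -advantage * log_prob_t  # minimizing this ascends (R_t − b_t)·∇ log G
loss.backward()                 # accumulates the policy-gradient contribution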
Table - Mechanical Turk blind evaluation between model pairs (trained on IMDB reviews)
MaskGAN denotes the GAN-trained variant and MaskMLE the maximum-likelihood-trained variant. The authors used the PTB and IMDB datasets at the word level. The generative models are evaluated by unbiased human evaluation: blind pairwise comparisons between models on Amazon Mechanical Turk.
Jan. 31 2018, Janghoon Choi