Language Translation with Fragment Shaders

NOTE: This was built and tested with Unity 2019.4.29f1 using the built-in render pipeline; there may be shader compatibility issues with other versions.

Table of Contents

  • Overview
  • Problems
  • Transformers
  • Setup
  • Python, C++ Code
  • Resources
  • Datasets

Overview

One of the most influential papers in natural language processing is "Attention Is All You Need". This project is a recreation of the Transformer model described in that paper, built without depending on deep learning libraries like TensorFlow or PyTorch. Because it runs entirely in fragment shaders for VRChat, it will run as fast as your frame rate allows in VR.

Problems

  • To keep the network simple, the maximum sentence length is 20 words. Including the special SOS and EOS (start/end of sentence) tokens, it's 22.
  • This is a very simple implementation of Transformers, so it can't handle numbers well. It only remembers what it has seen before; for example, it might know 2000 but not 2001.
  • The character buffer only holds 80 characters, which averages out to 4 characters per word, so keep sentences short.
  • And of course, this uses cameras, which other players in VRChat can't see unless they're your friends.

Transformers

To briefly go over how Transformers work: the input sentence is turned into a sequence of integers. For English to Japanese, every English word has a unique integer representation, and on the output side every Japanese character has its own unique integer representation as well.
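
A minimal sketch of that tokenization step in Python. The word-to-id mapping and the SOS/EOS ids below are made up for illustration; the real vocabularies are learned from the training dataset.

```python
# Illustrative vocabulary only; the real mapping comes from the dataset.
eng_vocab = {"<sos>": 1, "<eos>": 2, "i": 3, "like": 4, "cats": 5}

def encode(sentence, vocab):
    # Lowercase, split on spaces, and wrap with start/end-of-sentence tokens.
    tokens = ["<sos>"] + sentence.lower().split() + ["<eos>"]
    return [vocab[t] for t in tokens]

print(encode("I like cats", eng_vocab))  # [1, 3, 4, 5, 2]
```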

These integers are used to fetch a trained embedding vector that represents each word. Positional encoding is then added to each vector so the model knows where the word sits in the sentence.
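
As a rough sketch of this step, here is the sinusoidal positional encoding from "Attention Is All You Need" in NumPy. The embedding table and the dimensions are placeholders for illustration, not the trained values used by this project.

```python
import numpy as np

def positional_encoding(max_len, d_model):
    # Sinusoidal positional encoding: sin on even dimensions, cos on odd ones.
    pos = np.arange(max_len)[:, None]        # (max_len, 1)
    i = np.arange(d_model)[None, :]          # (1, d_model)
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angle[:, 0::2])
    pe[:, 1::2] = np.cos(angle[:, 1::2])
    return pe

# Placeholder sizes and a random stand-in for the trained embedding table.
d_model, seq_len, vocab_size = 128, 22, 5000
embedding = np.random.randn(vocab_size, d_model) * 0.01
token_ids = np.array([1, 3, 4, 5, 2])
x = embedding[token_ids] + positional_encoding(seq_len, d_model)[: len(token_ids)]
```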

The job of the encoder layer is to learn how each word relates to the other words in the input sentence. This is called self-attention. Encoder layers can be stacked multiple times to add capacity.
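
A minimal single-head version of the scaled dot-product self-attention at the core of each encoder layer, with random placeholder weights standing in for the trained ones:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wq, Wk, Wv):
    # Every position attends to every other position in the same sequence.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])   # scaled dot products
    return softmax(scores) @ v                # weighted sum of values

# Placeholder dimensions and weights; the real values come from training.
d_model = 128
rng = np.random.default_rng(0)
x = rng.standard_normal((22, d_model))        # one encoded sentence
Wq, Wk, Wv = (rng.standard_normal((d_model, d_model)) * 0.01 for _ in range(3))
out = self_attention(x, Wq, Wk, Wv)           # (22, d_model)
```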

The job of the decoder layer is to take the relationships the encoder has learned from the input sentence and use them to generate the target sentence.

To generate a Japanese sentence, the decoder picks the most probable Japanese character given the inputs, appends it to the string of previous outputs, and feeds that string back into the decoder to predict the next character. This loops until an EOS (end of sentence) token is predicted or the maximum length is reached.
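
The decoding loop itself is tiny. This sketch assumes a hypothetical `transformer(input_ids, output_ids)` function that returns a probability distribution over the next Japanese character; the real project does this inside fragment shaders rather than Python.

```python
# Illustrative token ids; MAX_LEN matches the 22-token limit mentioned above.
SOS, EOS, MAX_LEN = 1, 2, 22

def greedy_decode(transformer, input_ids):
    output_ids = [SOS]
    for _ in range(MAX_LEN - 1):
        probs = transformer(input_ids, output_ids)  # distribution over next character
        next_id = int(probs.argmax())               # pick the most probable one
        output_ids.append(next_id)
        if next_id == EOS:                          # stop at end-of-sentence
            break
    return output_ids
```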

Setup

CLONING THE REPO WILL NOT WORK

  1. Download the latest .unitypackage from Releases and import it.
  2. Look in the Prefab folders and find TranslatorEng2Jp.prefab or TranslatorJp2Eng.prefab.
  3. If adding it to an avatar, drop the prefab into the Scene first.
  4. Then drag the prefab into the avatar hierarchy. This keeps the prefab at the same size if the avatar or its bones are not scaled to 1.
  5. Check that the network works in Play Mode.

Python, C++ Code

  • Python
    • https://www.tensorflow.org/text/tutorials/transformer
    • My Python code is a copy of the tutorial linked above, but modified to spit out intermediate layer outputs and trained on a different dataset. It's better to follow that tutorial than to try to run mine.
  • C++
    • Built for Windows; for non-Windows platforms, remove the Windows.h include.
    • Depending on which model you want to run, download eng2jp_weights.bytes or jp2eng_weights.bytes and put it in the same folder as the compiled executable.

Resources

Datasets

Thanks to Merlin and orels1 for the help.

If you have questions or comments, you can reach me on Discord: SCRN#8008 or Twitter: https://twitter.com/SCRNinVR