Check out our website!

Lantern

Lantern is a prototype machine learning framework implemented in Scala. Its design builds on two important and well-studied programming language concepts: delimited continuations and multi-stage programming (staging for short). Delimited continuations provide a very concise view of reverse-mode automatic differentiation (AD), which permits implementing reverse-mode AD purely via operator overloading and without any auxiliary data structures. Multi-stage programming leads to a highly efficient implementation that combines the performance benefits of deep learning frameworks based on explicit reified computation graphs (e.g., TensorFlow) with the expressiveness of pure library approaches (e.g., PyTorch).

An accompanying technical paper is here: Demystifying Differentiable Programming: Shift/Reset the Penultimate Backpropagator.

A Python front-end that JIT-compiles TensorFlow and PyTorch code to Lantern is currently under development, based on Snek-LMS.

Automatic differentiation in Lantern

Lantern is based on a novel implementation of reverse-mode AD, the algorithm that underlies backpropagation in neural networks. It is well known that forward-mode AD can be implemented using operator overloading:

// differentiable number type
class NumF(val x: Double, val d: Double) {
  def +(that: NumF) =
    new NumF(this.x + that.x, this.d + that.d)
  def *(that: NumF) =
    new NumF(this.x * that.x,
             this.d * that.x + that.d * this.x)
  ...
}

// differentiation operator
def grad(f: NumF => NumF)(x: Double) = {
  val y = f(new NumF(x, 1.0))
  y.d
}

// example
val df = grad(x => 2*x + x*x*x)
forAll { x =>
  df(x) == 2 + 3*x*x }
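
The snippet above elides some details, such as how the constant `2` becomes a `NumF`. The following is a minimal self-contained sketch of the same idea in plain Scala. The object name `ForwardAD` and the helper `const` are assumptions made for illustration and are not part of Lantern.

```scala
object ForwardAD {
  // Differentiable number: x is the value, d is the derivative carried along with it.
  class NumF(val x: Double, val d: Double) {
    def +(that: NumF): NumF =
      new NumF(this.x + that.x, this.d + that.d)
    def *(that: NumF): NumF =
      new NumF(this.x * that.x, this.d * that.x + that.d * this.x) // product rule
  }

  // Hypothetical helper: lift a constant, whose derivative is 0.
  def const(c: Double): NumF = new NumF(c, 0.0)

  // Differentiation operator: seed the input with derivative 1 and read off the result.
  def grad(f: NumF => NumF)(x: Double): Double =
    f(new NumF(x, 1.0)).d

  // f(x) = 2*x + x*x*x, so f'(x) = 2 + 3*x*x
  def f(x: NumF): NumF = const(2.0) * x + x * x * x

  def main(args: Array[String]): Unit =
    println(grad(f)(3.0)) // derivative of f at x = 3
}
```

Each operator computes the value and its derivative together, so a single forward pass is enough.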

Although forward-mode and reverse-mode AD work differently internally, we implement reverse-mode AD in the same operator-overloading style as forward-mode AD. Delimited continuations make this possible.

// differentiable number type
class NumR(val x: Double, var d: Double) {
  def +(that: NumR) = shift { (k:NumR=>Unit)=>
    val y = new NumR(x + that.x, 0.0)
    k(y)
    this.d += y.d
    that.d += y.d
  }
  def *(that: NumR) = shift { (k:NumR=>Unit)=>
    val y = new NumR(x * that.x, 0.0)
    k(y)
    this.d += that.x * y.d
    that.d += this.x * y.d
  }
  ...
}

// differentiation operator
def grad(f: NumR => NumR@cps[Unit])(x: Double) = {
  val z = new NumR(x, 0.0)
  reset { f(z).d = 1.0 }
  z.d
}

// example
val df = grad(x => 2*x + x*x*x)
forAll { x =>
  df(x) == 2 + 3*x*x
}
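
The `shift`/`reset` operators above rely on Scala's continuations support. To show what they do behind the scenes, here is a minimal sketch that passes the continuation `k` explicitly instead, so it runs in plain Scala. The names `ReverseAD`, `Cps`, `add`, and `mul` are assumptions for illustration; Lantern itself uses the overloaded operators shown above.

```scala
object ReverseAD {
  // A computation that yields a NumR once it is handed "the rest of the program".
  type Cps = (NumR => Unit) => Unit

  // Differentiable number: x is the value, d accumulates the gradient.
  class NumR(val x: Double, var d: Double) {
    def add(that: NumR): Cps = k => {
      val y = new NumR(x + that.x, 0.0)
      k(y)                    // run the rest of the forward pass and its backward pass
      this.d += y.d           // then propagate y's gradient to the operands
      that.d += y.d
    }
    def mul(that: NumR): Cps = k => {
      val y = new NumR(x * that.x, 0.0)
      k(y)
      this.d += that.x * y.d
      that.d += this.x * y.d
    }
  }

  // Differentiation operator: the final continuation seeds the output gradient with 1.
  def grad(f: NumR => Cps)(x: Double): Double = {
    val z = new NumR(x, 0.0)
    f(z)(r => r.d = 1.0)
    z.d
  }

  // f(x) = 2*x + x*x*x, written with explicit continuations
  def f(x: NumR): Cps = k =>
    new NumR(2.0, 0.0).mul(x) { a =>
      x.mul(x) { b =>
        b.mul(x) { c =>
          a.add(c)(k)
        }
      }
    }

  def main(args: Array[String]): Unit =
    println(grad(f)(3.0)) // derivative of f at x = 3
}
```

The code after each call to `k` runs only once the rest of the computation has returned, so gradients flow back in reverse order of the forward pass without a separate tape or graph.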

Staging in Lantern

Efficiency is a major concern for practical deep learning tasks. Because Lantern is hosted in Scala, a user's deep learning model would otherwise run on the JVM, which is not efficient enough for such tasks. Lantern addresses this by staging: high-level Scala code is transformed into low-level code for efficient back-ends such as C++.

We take advantage of the compatibility between continuations and multi-stage programming and introduce staging in two steps.

The first step is to extend the data types to staged types.

// Staged Num (variable of type double with AD)
class Num(val x: Rep[Double], val d: Rep[Var[Double]]) {...}

// Staged Tensor and Tensor with AD
class Tensor(val data: Rep[Array[Double]], val shape: Array[Int]) {...}
class TensorR(val x: Tensor, val d: Tensor) {...}
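
To give a feel for what a staged type means, here is a toy sketch that does not use LMS. A staged value does not hold a number; it holds the name of a variable in the program being generated, and each operation emits a line of C++. Everything here (`ToyStaging`, `RepDouble`, `compile`) is a hypothetical stand-in for illustration and not Lantern's or LMS's actual API.

```scala
object ToyStaging {
  import scala.collection.mutable.ArrayBuffer

  // Generated C++ statements and a counter for fresh variable names.
  private val stmts = ArrayBuffer.empty[String]
  private var count = 0

  // Hypothetical stand-in for Rep[Double]: names a C++ variable, holds no value.
  final case class RepDouble(name: String) {
    def +(that: RepDouble): RepDouble = emit(s"$name + ${that.name}")
    def *(that: RepDouble): RepDouble = emit(s"$name * ${that.name}")
  }

  // Bind the expression to a fresh variable and record the statement.
  private def emit(rhs: String): RepDouble = {
    count += 1
    val v = s"x$count"
    stmts += s"double $v = $rhs;"
    RepDouble(v)
  }

  // Run f once at staging time and return the C++ function it generates.
  def compile(fname: String)(f: RepDouble => RepDouble): String = {
    stmts.clear(); count = 0
    val res = f(RepDouble("in"))
    val body = (stmts :+ s"return ${res.name};").map("  " + _).mkString("\n")
    s"double $fname(double in) {\n$body\n}"
  }

  def main(args: Array[String]): Unit =
    println(compile("f")(x => x * x + x))
}
```

Running the Scala function `x => x * x + x` on a staged input computes nothing numerically. It produces the text of a C++ function that performs the computation later.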

The second step is to define basic control structures using delimited continuations with LMS support.
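
As a rough illustration of how a control structure fits into continuation-based AD, the sketch below defines a conditional in plain, unstaged Scala with explicit continuations. The names `CpsControlFlow`, `Cps`, `IF`, and `mul` are assumptions made for this example. It shows only how branches and continuations compose, not how Lantern stages them with LMS.

```scala
object CpsControlFlow {
  // A computation that yields a NumR once it is handed its continuation.
  type Cps = (NumR => Unit) => Unit

  class NumR(val x: Double, var d: Double) {
    def mul(that: NumR): Cps = k => {
      val y = new NumR(x * that.x, 0.0)
      k(y)                       // rest of the forward pass, then its backward pass
      this.d += that.x * y.d
      that.d += this.x * y.d
    }
  }

  // Conditional: only the selected branch runs, and it receives the continuation.
  def IF(c: Boolean)(a: => Cps)(b: => Cps): Cps =
    k => if (c) a(k) else b(k)

  def grad(f: NumR => Cps)(x: Double): Double = {
    val z = new NumR(x, 0.0)
    f(z)(r => r.d = 1.0)         // seed the output gradient with 1
    z.d
  }

  // f(x) = if (x > 0) x*x else 2*x
  def f(x: NumR): Cps =
    IF(x.x > 0)(x.mul(x))(new NumR(2.0, 0.0).mul(x))

  def main(args: Array[String]): Unit = {
    println(grad(f)(3.0))   // takes the x*x branch
    println(grad(f)(-1.0))  // takes the 2*x branch
  }
}
```

Because the branch receives the same continuation `k`, the backward pass of whichever branch ran is executed automatically when `k` returns.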

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.

Prerequisites

  • JDK
  • sbt

Directory Organization

Write deep learning models in Lantern

Lantern's automatic differentiation support makes writing deep learning models straightforward. Here is a code snippet of a simple CNN model in Lantern:

val pars = ... // all trainable parameters
def trainFun(input: TensorR, target: Rep[Int]) = { (dummy: TensorR) =>
  val resL1 = input.conv(pars(0)).maxPool(stride).relu()
  val resL2 = resL1.conv(pars(1)).maxPool(stride).relu()
  val resL3 = ((pars(2) dot resL2.resize(in3)) + pars(3)).relu().dropout(0.5f)
  val resL4 = (pars(4) dot resL3) + pars(5)
  val res = resL4.logSoftmax()
  res.nllLoss(target)
}

Each layer is constructed and nested in a very elegant way. This is thanks to the functional implementation of automatic differentiation in Lantern.

Compile deep learning models to C++ programs

Once you have cloned this repo, enter the root directory of the Lantern repo ($PATH_REPO/).

If you want to compile our demo code, execute:

$ sbt
sbt> testOnly lantern.$TEST_instance

Here $TEST_instance can be one of the following 4 test instances:

  • VanillaRNN
  • LSTMTest
  • SentimentTreeLSTM
  • MnistCNN

You can also run all 4 test cases, along with a set of basic tests, in one command:

$ sbt
sbt> test

All generated C++ code is placed in the corresponding subdirectory under the evaluation code directory (./src/out/ICFP18evaluation/).

Running the deep learning model

Once you have compiled the deep learning model that you want to try, the generated C++ code is in the corresponding directory. All you need to do is compile that C++ program and run it. For example, suppose you want to try the Vanilla RNN language model, and you have already compiled the model and have the generated code in its directory. You can take the following steps to train it:

## make sure you are in the root directory of repo
cd ./src/out/ICFP18evaluation/evaluationRNN/
g++ -std=c++11 -O3 -march=native Lantern.cpp -o Lantern
./Lantern result.txt

Running the evaluations and plotting results

Running the evaluations for CNN and TreeLSTM can take a long time. We suggest trying VanillaRNN and LSTMTest first.

To run all test cases and plot their results, change the working directory to the root of the repo and execute the following commands:

## suppose you already have all 4 models compiled
## make sure you are in the root directory of repo
cd ./src/out/ICFP18evaluation/
./run_exp.sh

The resulting plots are generated in the $PATH_REPO/src/out/ICFP18evaluation/save_fig/ directory.