# Tensorflow within Peloton
Tensorflow (TF) within Peloton can be used to create and use deep learning models for miscellaneous learning tasks. Currently it is used by the workload forecasting modules (`brain/workload`). The diagram below summarizes the process involved in setting up and running a TF model:

An important point to keep in mind is that there is a multi-language TF dependency. The actual TF models need to be written in Python (chosen because it is the best-documented and most stable TF API compared to other languages), after which they are serialized to Protobuf. This serialized export can then be imported into C++ and used for training and prediction.
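To make the C++ side of this concrete, here is a minimal sketch of what importing a Protobuf-serialized graph through the raw TF C API looks like, independent of Peloton's wrappers (the `model.pb` path is a placeholder and error handling is trimmed):

```cpp
// Minimal sketch: import a serialized GraphDef via the TF C API.
// "model.pb" is a placeholder path; error handling is trimmed.
#include <tensorflow/c/c_api.h>

#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>

int main() {
  // Read the protobuf export produced by the Python script
  std::ifstream file("model.pb", std::ios::binary);
  std::string bytes((std::istreambuf_iterator<char>(file)),
                    std::istreambuf_iterator<char>());

  TF_Buffer *graph_def = TF_NewBufferFromString(bytes.data(), bytes.size());
  TF_Graph *graph = TF_NewGraph();
  TF_Status *status = TF_NewStatus();
  TF_ImportGraphDefOptions *opts = TF_NewImportGraphDefOptions();

  // Deserialize the GraphDef into an in-memory graph
  TF_GraphImportGraphDef(graph, graph_def, opts, status);
  if (TF_GetCode(status) != TF_OK) {
    std::fprintf(stderr, "Import failed: %s\n", TF_Message(status));
  }

  // ... create a TF_Session over `graph` and run its named nodes ...

  TF_DeleteImportGraphDefOptions(opts);
  TF_DeleteStatus(status);
  TF_DeleteGraph(graph);
  TF_DeleteBuffer(graph_def);
  return 0;
}
```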
We need both the C and Python Tensorflow APIs:
- Python (3.6): `pip install --upgrade tensorflow`
- C API: The TF C API can be installed as explained here. It is a native API, described on the TF website as suitable for building bindings for other languages. Since Google offers prebuilt binaries for the C API, its installation is relatively lightweight and fast; a quick sanity check is sketched below.
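Once installed, a tiny program can verify that the library is usable (a minimal sketch; the exact link flags may vary with your setup):

```cpp
// Sanity check that the TF C library is installed and linkable.
// Build with something like: g++ check_tf.cpp -ltensorflow
#include <tensorflow/c/c_api.h>

#include <cstdio>

int main() {
  std::printf("TensorFlow C library version: %s\n", TF_Version());
  return 0;
}
```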
Getting TF to work within Travis/Jenkins is a bit tricky, mainly because the prebuilt binaries often put constraints on the OS environment:

- Protobuf 3.4.0+ is needed for the newest versions of Tensorflow. It is available through Mac's `brew` but not Ubuntu's `apt-get`, so it has to be built from source on Ubuntu. For Ubuntu 18.04 onwards, Maarten Fonville's PPA can be used.
- Newer versions of TF (>= 1.5.0) don't work correctly on Ubuntu 14.04, so TF 1.4.0 has to be used there.
One alternative we explored (and one worth revisiting in the future) is the TF C++ API. It traditionally requires a long bazel installation, a build, and additional setup. An easier way of installing it is described here, but this still requires installing bazel and building Tensorflow with it. The overall process is tricky to get right and time-consuming.
The main DL model has to be written in Python. An example LSTM model is available at `src/brain/modelgen/LSTM.py`. When building a model, there are three important things to follow:

- Named input/output graph nodes: Graph nodes that accept some sort of input (placeholders) and nodes that return a result (prediction output/error metric) should be given appropriate names. These names are used to reference the nodes from C++ and pass data to them. For example, you'll notice the `data_`, `target_` and `lossOp_` names in the LSTM code.
- Passing arguments by CLI: The Python script should be runnable from the command line, with arguments passed to the script (using a library such as `argparse`). All relevant parameters for the model should be set this way.
- Protobuf export code: The `write_graph` method should be reused directly and called in the main function as:
```python
args = parser.parse_args()
model = LSTM(args.nfeats, args.nencoded, args.nhid,
             args.nlayers, args.lr, args.dropout_ratio,
             args.clip_norm)
model.tf_init()
model.write_graph(' '.join(args.graph_out_path))
```
To give perspective on why we are doing all this, here is a little lookahead at how things work within C++: first, we execute the Python script to build and serialize the Tensorflow graph on the fly in the C++ model constructor; immediately after, we import the model and use the C API to call its nodes, passing inputs and getting outputs; finally, we clean up the exported model in the C++ model destructor.
The TF Session Entity uses the TF C API for fine-grained model control, providing clients higher-level functions that abstract away the implementation details. There are three main classes under `brain/util/tf_session_entity`:

- TF Session Entity Input: Used to define inputs, allowing the TF graph to accept C++ data types. It currently works with native arrays and `std::vector`. Example usage:
```cpp
// Inputs for backprop
std::vector<TfFloatIn *> inputs_optimize{
    // flattened C++ vectors (works even without flattening,
    // if the flattened dimensions are passed separately)
    new TfFloatIn(data_batch.data(), dims, "data_"),
    new TfFloatIn(target_batch.data(), dims, "target_"),
    // single float value
    new TfFloatIn(dropout_ratio_, "dropout_ratio_")};
```
- TF Session Entity Output: Used to define outputs, allowing results from the TF graph to be returned as C++ data types. Example usage:
```cpp
// loss output
auto output_loss = new TfFloatOut("lossOp_");
// prediction output
auto output_predict = new TfFloatOut("pred_");
```
- TF Session Entity: This class accepts the input and output types, evaluates the graph, and returns the relevant result. Example usage:
```cpp
auto out = this->tf_session_entity_->Eval(inputs_loss, output_loss);
```
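For completeness, the three pieces combine into a single prediction call along these lines. This is a hedged sketch assembled from the snippets above; the scalar `TfFloatIn` overload and the `1.0f` value for disabling dropout at inference are assumptions, not the verified Peloton API:

```cpp
// Hypothetical prediction call assembled from the snippets above.
std::vector<TfFloatIn *> inputs_predict{
    new TfFloatIn(data_batch.data(), dims, "data_"),
    // assumed: dropout disabled (ratio 1.0) at inference time
    new TfFloatIn(1.0f, "dropout_ratio_")};
auto output_predict = new TfFloatOut("pred_");
auto out = this->tf_session_entity_->Eval(inputs_predict, output_predict);
```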
The `BaseTFModel` C++ class can be inherited to obtain all the required functionality: automatically constructing the serialized model, using the TF session entity for training/prediction, and finally destroying the model. You can refer to `brain/workload/lstm.cpp` for an example implementation.

`BaseTFModel` needs the path to the Python script, the directory in which to generate the model, and the name of the final generated model. It resolves relative paths to absolute paths automatically (see the member variables).
- Calling the Python script: Simply call `GenerateModel` with all the arguments passed as `--argname argvalue`.
- Importing the model: Simply call `tf_session_entity_->ImportGraph(graph_path_);`.
- Initializing the TF session: Simply call `TFInit()`.
How these are used is completely up to the client. The `tf_session_entity_` is then available to call upon the graph nodes for backpropagation/loss/prediction; a consolidated sketch follows.
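Putting the three steps together, a subclass constructor typically looks something like the sketch below. The class name, the argument-string helper, and the exact `BaseTFModel` constructor signature and header location are assumptions for illustration; see `brain/workload/lstm.cpp` for the real implementation:

```cpp
#include <string>

#include "brain/workload/base_tf.h"  // assumed header location

// A hypothetical BaseTFModel subclass, following the steps above.
class MyForecastModel : public BaseTFModel {
 public:
  MyForecastModel(int nfeats, int nhid)
      // assumed argument order: python script, model directory, model name
      : BaseTFModel("src/brain/modelgen/my_model.py",
                    "src/brain/modelgen",
                    "my_model.pb"),
        nfeats_(nfeats),
        nhid_(nhid) {
    // 1. Run the python script with CLI args to emit the protobuf export
    GenerateModel(ConstructModelArgsString());
    // 2. Import the serialized graph
    tf_session_entity_->ImportGraph(graph_path_);
    // 3. Initialize the TF session
    TFInit();
  }

 private:
  // Builds "--nfeats <n> --nhid <n>" for GenerateModel (hypothetical helper)
  std::string ConstructModelArgsString() const {
    return "--nfeats " + std::to_string(nfeats_) +
           " --nhid " + std::to_string(nhid_);
  }

  int nfeats_;
  int nhid_;
};
```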