This repository has been archived by the owner on Mar 6, 2023. It is now read-only.

Wrapping Python Models for Running in DIMR

Tom Evans edited this page Oct 3, 2019 · 2 revisions

Controlling the Python Interpreter in Pybind11

To create OE BMI-compliant models from Python scripts, we wrap the Python code in BMI-compliant DLLs written in C++, using the Pybind11 library to drive the Python modules and objects that make up the model. Typically, those modules and objects are held in global variables so that they can be shared among initialize(), update() and the other BMI functions.

Our first attempts looked something like this:

#include "bmi.h"
#include <pybind11/embed.h>

namespace py = pybind11;

// start the interpreter and keep it alive
py::scoped_interpreter guard{};

// import the model's Python module
py::module pyModule = py::module::import("model_module");
// create an object that will do the modelling
py::object pyObj = pyModule.attr("model_object")();

int initialize(const char* config_file){
    // call pyObj's initialize method to
    // configure it for the model run
    pyObj.attr("initialize")(config_file);
    return 0;
}
// etc.

This approach worked when models were run one at a time, but failed when models were run in sequence in DIMR. The cause was the global pybind11::scoped_interpreter, which launches the Python interpreter and keeps it running. The first model in the sequence would launch the interpreter successfully, but the second model would fail at load time because, as stated in the interpreter lifetime section of the Pybind11 documentation, Pybind11 does not permit a single process (DIMR in this case) to run more than one Python interpreter at the same time. The scoped_interpreter object opens an interpreter and keeps it open until the scope in which it was created closes. In this example pyModule and pyObj are initialized at global scope, so the interpreter is started when the DLL is loaded and must stay open for as long as the DLL remains loaded.

To prevent the DLLs from trying (and failing) to launch multiple Python interpreters, it is possible to test whether an interpreter is already running before attempting to instantiate Pybind11 objects. This requires adjustments to scoping like this:

#include "bmi.h"
#include <pybind11/embed.h>

namespace py = pybind11;

// declare, but do not instantiate, the model's Python module
py::module pyModule;
// similarly declare an object to do the modelling
py::object pyObj;

int initialize(const char* config_file){

    // Only launch the Python interpreter if one isn't running already 
    if (!Py_IsInitialized()) {
        py::initialize_interpreter();
    }

    pyModule = py::module::import("model_module");
    pyObj = pyModule.attr("model_object")();

    // call pyObj's initialize method to
    // configure it for the model run
    pyObj.attr("initialize")(config_file);
    return 0;
}
// etc.

Since this approach uses an explicit call to pybind11::initialize_interpreter to launch the interpreter, there must also be an explicit call to pybind11::finalize_interpreter to shut it down. There must be exactly one of each call within a run of DIMR. Even if the models are run in sequence rather than in parallel, a single Python interpreter should be kept running continuously until all Python models have finished. Some Python modules, notably NumPy, do not clear Python's memory space completely when the interpreter shuts down and may throw exceptions if the interpreter is shut down and restarted within a single process.

While it is easy to tell whether a Python interpreter is running when a model's initialize function is called, it is very difficult for a model to tell, when its finalize function is called, whether another model later in the DIMR run will still need the interpreter. To avoid requiring each model to know every other model's past and future Python requirements within a DIMR run, a program that shuts down the Python interpreter can be placed last in the DIMR model sequence. The bmi_PySwitch DLL contains a minimally BMI-compliant "model" that does nothing but start or shut down a Python interpreter.

Another memory-management problem can occur when global variables are destroyed as a DLL is unloaded. In the example where the Python components of the C++ program are declared globally, but instantiated in the initialize function, it is necessary to explicitly decrement the Python references held by the globally declared variables before the Python interpreter is shut down. This can be done in the model's finalize function, like this:

int finalize(){
    pyObj.attr("finalize")();
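    // Drop the references held by the globals; release() also empties each
    // handle so its destructor does not decrement again when the DLL unloads
    pyObj.release().dec_ref();
    pyModule.release().dec_ref();
    // The Python interpreter is shut down elsewhere (for example by
    // bmi_PySwitch at the end of the DIMR sequence), not here:
    // py::finalize_interpreter();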
    return 0;
}