doframework is a testing framework for decision-optimization model learning algorithms. Such algorithms learn part or all of a decision-optimization model from data and solve the model to produce a predicted optimal solution.
doframework randomly generates multiple optimization problems (f,O,D,x*) for your algorithm to learn and solve:
- f is a continuous piece-wise linear function defined over a domain in d-dimensional space (d>1),
- O is a feasibility region in dom(f) defined by linear constraints,
- D = (X,y) is a dataset derived from f,
- x* is the true optimum of f in O (minimum or maximum).
doframework feeds your algorithm constraints and data (O,D) and collects its predicted optimum. The algorithm's predicted optimal value can then be compared to the true optimal value f(x*). By comparing the two over multiple randomly generated optimization problems, doframework produces a prediction profile for your algorithm.
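The comparison described above can be sketched in a few lines. This is only an illustration, not how doframework computes its profile: the function names, the relative-gap measure, and the thresholds are all assumptions made for the example.

```python
def relative_gap(predicted_value, true_value, eps=1e-12):
    """Relative distance between a predicted optimal value and f(x*)."""
    return abs(predicted_value - true_value) / max(abs(true_value), eps)


def prediction_profile(pairs, thresholds=(0.01, 0.05, 0.1)):
    """Fraction of problems whose relative gap falls within each threshold.

    `pairs` is a list of (predicted optimal value, true optimal value),
    one pair per randomly generated optimization problem.
    """
    gaps = [relative_gap(p, t) for p, t in pairs]
    return {th: sum(g <= th for g in gaps) / len(gaps) for th in thresholds}


# Three hypothetical problems: exact, slightly off, far off.
profile = prediction_profile([(1.0, 1.0), (1.04, 1.0), (2.0, 1.0)])
```

The more problems the pairs cover, the more stable such a summary becomes, which is why the comparison is run over many generated problems.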
doframework integrates with your algorithm (written in Python).
doframework was designed for optimal cloud distribution following an event-driven approach.
doframework was built on top of ray for cloud distribution and rayvens for event-driven management.
doframework was written for Python version >= 3.8.0.
doframework can run either locally or remotely. For optimal performance, run it on a Kubernetes cluster. Cloud configuration is currently available for AWS and IBM Cloud OpenShift clusters.
The framework uses storage (local or S3) to interact with simulation products. Configuration is currently available for AWS or IBM Cloud Object Storage (COS).
To run doframework locally, install with
$ pip install doframework
Storage specifications are provided in a configs.yaml. You'll find examples under ./configs/*.
The configs.yaml includes the list of source and target bucket names (under buckets). If necessary, S3 credentials are added under designated fields.
Here is the format of the configs.yaml, either for local storage
local:
    buckets:
        inputs: '<inputs-folder>'
        inputs_dest: '<inputs-dest-folder>'
        objectives: '<objectives-folder>'
        objectives_dest: '<objectives-dest-folder>'
        data: '<data-folder>'
        data_dest: '<data-dest-folder>'
        solutions: '<solutions-folder>'
or S3
s3:
    buckets:
        inputs: '<inputs-bucket>'
        inputs_dest: '<inputs-dest-bucket>'
        objectives: '<objectives-bucket>'
        objectives_dest: '<objectives-dest-bucket>'
        data: '<data-bucket>'
        data_dest: '<data-dest-bucket>'
        solutions: '<solutions-bucket>'
    aws_secret_access_key: 'xxxx'
    aws_access_key_id: 'xxxx'
    endpoint_url: 'https://xxx.xxx.xxx'
    region: 'xx-xxxx'
    cloud_service_provider: 'aws'
Currently, two S3 providers are available under s3:cloud_service_provider: either aws or ibm. The endpoint_url is optional for AWS.
Bucket / folder names must be distinct.
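Since bucket / folder names must be distinct, it can help to check a configuration before starting an experiment. The helper below is a hypothetical sketch, not part of doframework. It assumes the configs.yaml has already been parsed into a plain dict (for example with a YAML parser) and that the layout follows the format shown above.

```python
def check_distinct_buckets(configs):
    """Return the buckets mapping, or raise ValueError on a reused name.

    `configs` is the parsed configs.yaml: a dict with a single
    top-level key, either 's3' or 'local'.
    """
    storage = configs.get("s3") or configs.get("local") or {}
    buckets = storage.get("buckets", {})
    seen = {}  # bucket/folder name -> role that first used it
    for role, name in buckets.items():
        if name in seen:
            raise ValueError(
                f"{role!r} and {seen[name]!r} both use {name!r}"
            )
        seen[name] = role
    return buckets


# Example with a local configuration using placeholder folder names.
example = {
    "local": {
        "buckets": {
            "inputs": "inputs",
            "inputs_dest": "inputs-dest",
            "objectives": "objectives",
            "objectives_dest": "objectives-dest",
            "data": "data",
            "data_dest": "data-dest",
            "solutions": "solutions",
        }
    }
}
checked = check_distinct_buckets(example)
```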
input.json files provide the necessary metadata for the random generation of optimization problems.
doframework will run end to end once input.json files are uploaded to <inputs-bucket> / <inputs-folder>.
The Jupyter notebook ./notebooks/inputs.ipynb allows you to automatically generate input files and upload them to <inputs-bucket>.
Here is an example of an input file (see the sample input_basic.json under ./inputs).
{
    "f": {
        "vertices": {
            "num": 7,
            "range": [[5.0,20.0],[0.0,10.0]]
        },
        "values": {
            "range": [0.0,5.0]
        }
    },
    "omega" : {
        "ratio": 0.8
    },
    "data" : {
        "N": 750,
        "noise": 0.01,
        "policy_num": 2,
        "scale": 0.4
    },
    "input_file_name": "input_basic.json"
}
f:vertices:num: number of vertices in the piece-wise linear graph of f.
f:vertices:range: f domain will be inside this range.
f:values:range: range of f values.
omega:ratio: vol(O) / vol(dom(f)) >= ratio.
data:N: number of data points to sample.
data:noise: response variable noise.
data:policy_num: number of centers in Gaussian mix distribution of data.
data:scale: max STD of Gaussian mix distribution of data (as a ratio of domain diameter).
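If you prefer to build input files programmatically rather than by hand, the fields listed above map directly onto a nested dict. The sketch below is an illustration only: `make_input` is a hypothetical helper, and its defaults simply mirror the sample file. Serializing with the standard `json` module also guarantees syntactically valid JSON.

```python
import json


def make_input(file_name, num_vertices=7,
               vertex_range=((5.0, 20.0), (0.0, 10.0)),
               value_range=(0.0, 5.0), ratio=0.8,
               N=750, noise=0.01, policy_num=2, scale=0.4):
    """Build the metadata dict for one input file."""
    return {
        "f": {
            "vertices": {
                "num": num_vertices,
                # one [low, high] pair per dimension of the domain
                "range": [list(r) for r in vertex_range],
            },
            "values": {"range": list(value_range)},
        },
        "omega": {"ratio": ratio},
        "data": {
            "N": N,
            "noise": noise,
            "policy_num": policy_num,
            "scale": scale,
        },
        "input_file_name": file_name,
    }


# Serialize to text; write it to a file before uploading.
text = json.dumps(make_input("input_basic.json"), indent=4)
```

The number of pairs in `vertex_range` sets the dimension of the problem, so a two-pair range like the one here gives a 2-dimensional domain.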
It's a good idea to start experimenting on low-dimensional problems.
Your algorithm will be integrated into doframework once it is decorated with doframework.resolve.
A doframework experiment runs with doframework.run(). The run() utility accepts the decorated model and an absolute path to the configs.yaml.
Here is an example of a user application module.py.
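import numpy as np
import doframework as dof

@dof.resolve
def alg(data: np.ndarray, constraints: np.ndarray, **kwargs):
    ...  # learn a model from data and optimize it under the constraints
    return optimal_arg, optimal_val, regression_model

if __name__ == '__main__':
    # kwargs stands for any further run() arguments
    dof.run(alg, 'configs.yaml', objectives=5, datasets=3, **kwargs)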
doframework provides the following inputs to your algorithm:
data: 2D np.array with features X = data[:, :-1] and response variable y = data[:, -1].
constraints: linear constraints as a 2D numpy array A. A data point x satisfies the constraints when A[:, :-1]*x + A[:, -1] <= 0.
It feeds your algorithm additional inputs in kwargs:
lower_bound: lower bound per feature variable.
upper_bound: upper bound per feature variable.
init_value: optional initial value.
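To make the input layout concrete, here is a minimal baseline sketch that consumes data and constraints in the form described above. It is not a recommended algorithm and not part of doframework: it fits an ordinary least-squares linear model and returns the best feasible data point. It assumes the product in the constraint expression is a matrix-vector product. The `is_minimum` and `tol` parameters are hypothetical, introduced only for this example. In a real application the function would be decorated with doframework.resolve.

```python
import numpy as np


def feasible_mask(X, constraints, tol=1e-9):
    """Rows x of X satisfying A[:, :-1] @ x + A[:, -1] <= 0."""
    A, b = constraints[:, :-1], constraints[:, -1]
    return np.all(X @ A.T + b <= tol, axis=1)


def baseline_alg(data, constraints, is_minimum=True, **kwargs):
    """Fit y ~ X w + c, then pick the best feasible data point."""
    X, y = data[:, :-1], data[:, -1]
    # Append a column of ones for the intercept term.
    design = np.hstack([X, np.ones((X.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    mask = feasible_mask(X, constraints)
    if not mask.any():
        return None, None, coef  # no feasible data point found
    preds = design[mask] @ coef
    idx = np.argmin(preds) if is_minimum else np.argmax(preds)
    return X[mask][idx], float(preds[idx]), coef


# Toy data: y = x0 + 2*x1, constraints x0 <= 1 and x1 <= 1.
points = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 2]], dtype=float)
toy_data = np.column_stack([points, points[:, 0] + 2 * points[:, 1]])
toy_constraints = np.array([[1.0, 0.0, -1.0], [0.0, 1.0, -1.0]])
arg_min, val_min, model = baseline_alg(toy_data, toy_constraints)
```

Restricting the search to observed data points keeps the sketch short. A serious algorithm would optimize the learned model over the whole feasibility region, for instance with a linear or mixed-integer solver.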
The run() utility accepts the arguments:
objectives: number of objective targets to generate per input file.
datasets: number of datasets to generate per objective target.
distribute: True to run distributively, False to run sequentially.
logger: True to see doframework logs, False otherwise.
after_idle_for: stop running when event stream is idle after this many seconds.
alg_num_cpus: number of CPUs to dedicate to your algorithm on each optimization task.
data_num_cpus: number of CPUs to dedicate to data generation (useful in high dimensions).
Once you are done running a doframework experiment, run the notebook notebooks/profile.ipynb. It will fetch the relevant experiment products from the target buckets and produce the algorithm's prediction profile and prediction probabilities.
doframework produces three types of experiment product files:
- objective.json: containing information on (f,O,x*)
- data.csv: containing the dataset the algorithm accepts as input
- solution.json: containing the algorithm's predicted optimum
See sample files under ./outputs.
To run doframework on a K8S cluster, make sure you are on the cluster's local kubectl context. Log into your cluster, if necessary (applicable to OpenShift, see ./doc/openshift.md).
You can check your local kubectl context and change it if necessary with
$ kubectl config current-context
$ kubectl config get-contexts
$ kubectl config use-context cluster_name
>> Switched to context "cluster_name".
Now cd into your project's folder and run the setup bash script doframework-setup.sh. The setup script will generate the cluster configuration file doframework.yaml in your project's folder. The setup script requires the absolute path to your configs.yaml. Running the setup script will establish the ray cluster.
$ cd <user_project_folder>
$ doframework-setup.sh --configs ~/path/to/configs.yaml
You have the option to adapt doframework.yaml to fit your application.
Use the flag --project-requirements to specify the absolute path to your requirements.txt file. The requirements will be installed with pip install -r requirements.txt on your cluster nodes.
Use the flag --project-dir to specify the absolute path to your project. It will be pip installed on your cluster nodes.
$ doframework-setup.sh --configs ~/path/to/configs.yaml --project-requirements <absolute_requirements_path> --project-dir <absolute_project_path>
Use the --skip flag to skip re-generating the doframework.yaml.
$ doframework-setup.sh --skip
Or, in case you are familiar with ray, run instead
$ ray up doframework.yaml --no-config-cache --yes
Upload input.json file(s) to your <inputs-bucket>. Now you can submit your application module.py to the cluster
$ ray submit doframework.yaml module.py
To observe the ray dashboard, connect to http://localhost:8265 in your browser. See ./doc/openshift.md for OpenShift-specific instructions.
Some useful health-check commands:
- Check the status of ray pods
$ kubectl get pods -n ray
- Check the status of the ray head node
$ kubectl describe pod rayvens-cluster-head-xxxxx -n ray
- Monitor autoscaling with
$ ray exec doframework.yaml 'tail -n 100 -f /tmp/ray/session_latest/logs/monitor*'
- Connect to a terminal on the head node
$ ray attach doframework.yaml
$ ...
$ exit
- Get a remote shell to the cluster manually (find the head node ID with kubectl describe)
$ kubectl -n ray exec -it rayvens-cluster-head-z97wc -- bash
After introducing manual changes to doframework.yaml, update with
$ ray up doframework.yaml --no-config-cache --yes
Shut down the ray cluster with
$ ray down -y doframework.yaml
Run the setup bash script doframework-setup.sh with the --example flag to generate the test script doframework_example.py in your project folder.
$ cd <user_project_folder>
$ doframework-setup.sh --configs ~/path/to/configs.yaml --example
To run the test script locally, use
$ python doframework_example.py --configs ~/path/to/configs.yaml
To run the test script on your K8S cluster, use
$ ray submit doframework.yaml doframework_example.py --configs configs.yaml
[NOTE: we are using the path to the configs.yaml that was mounted on cluster nodes under $HOME.]
Make sure to upload input json files to <inputs-bucket> / <inputs-folder> once you run doframework_example.py.