Manage and track machine learning and deep learning experiments in a systematic way on Kubernetes infrastructure using a command line interface, without having to worry about scheduling, monitoring, and managing experiments.
- Startups that are building AI-first products and want to conduct research in a well-defined manner.
- Companies that want to adopt practices for fast prototyping of AI research in a meaningful way.
- Engineers interested in machine learning workflows and automation with open source frameworks.
With more and more AI-first companies coming up, it is important for these companies to be able to conduct data science experiments in a convenient, trackable fashion. Following a standard way to run experiments also helps the team scale and build components better. This is the motivation behind sentinels. The name refers to the autonomous beings from the movie The Matrix.
Here is the proposed architecture:
Data Scientist:
- As a data scientist, I want to be able to get the status of the currently running jobs via the command line interface
- As a data scientist, I want to be able to get the next jobs to be executed in the queue via the command line interface
- As a data scientist, I want to be able to push a Docker container and its required configuration to the queue
- As a data scientist, given a job ID, I should be able to get the status of the job
- As a data scientist, I want to track all the previous experiments via the MLflow server
- As a data scientist, I want to view logs of the experiment via the ELK server
- As a data scientist, I want to track the infrastructure usage of an experiment via the Prometheus dashboard
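The data scientist stories above map naturally onto a small command line tool. Below is a minimal sketch of what that surface could look like, written with the click library; the command names (`push`, `status`, `queue`), options, and the queue API endpoint are assumptions for illustration, not a defined interface.

```python
# Hypothetical CLI sketch for the data scientist stories above.
# Command names, options, and the queue API endpoint are assumptions.
import click
import requests

QUEUE_API = "http://sentinels-queue:8080"  # assumed job queue API endpoint


@click.group()
def cli():
    """sentinels - manage ML experiments on Kubernetes."""


@cli.command()
@click.argument("image")  # Docker image URL/ID
@click.option("--config", type=click.Path(exists=True), required=True,
              help="Path to the experiment configuration YAML.")
def push(image, config):
    """Push a container and its configuration to the job queue."""
    with open(config) as f:
        resp = requests.post(f"{QUEUE_API}/jobs",
                             files={"config": f},
                             data={"image": image})
    click.echo(f"Queued job: {resp.json()['job_id']}")


@cli.command()
@click.argument("job_id")
def status(job_id):
    """Get the status of a job by its ID."""
    resp = requests.get(f"{QUEUE_API}/jobs/{job_id}")
    click.echo(resp.json()["status"])


@cli.command()
def queue():
    """List running jobs and the next jobs waiting in the queue."""
    resp = requests.get(f"{QUEUE_API}/jobs")
    for job in resp.json():
        click.echo(f"{job['job_id']}\t{job['status']}")


if __name__ == "__main__":
    cli()
```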
Data Engineer:
- As a data engineer, I want to be able to view all the current jobs via command line
- As a data engineer, I want to be alerted on Slack if a job completes or fails
- As a data engineer, I want to be alerted if infrastructure resources are used up
- As a data engineer, I should have the capability to prioritize certain experiments using configuration
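One straightforward way to implement the Slack alerts described above is an incoming webhook that the scheduler or monitor calls whenever a job finishes or fails. The sketch below uses Slack's incoming-webhook API; the webhook URL and message text are placeholders.

```python
# Sketch of a Slack notification on job completion/failure via an incoming
# webhook. The webhook URL is a placeholder; the payload follows Slack's
# incoming-webhook format ({"text": ...}).
import requests

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder


def notify_job_event(job_id: str, state: str) -> None:
    """Post a short message to Slack when a job completes or fails."""
    message = {"text": f"Job {job_id} {state}"}
    requests.post(SLACK_WEBHOOK_URL, json=message, timeout=5)


# Example: called by the scheduler/monitor when a job's state changes.
# notify_job_event("job-42", "failed")
```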
DevOps:
- As a DevOps engineer, I want to be able to set up a Kubernetes cluster with all the components via a single Helm chart
- As a DevOps engineer, I should be able to add Kubernetes nodes at will and remove them when not required
- As a DevOps engineer, I should be able to monitor the cluster and set up alerts and notifications with integrations to Slack
Components:
- Command line interface that data scientists can use to push jobs, get status, and manage the queue
- Message broker that queues up all the jobs
- Resource monitoring system that checks resources on Kubernetes
- Job scheduler that runs jobs on an available node via Kubernetes Jobs
- MLflow server, ELK server, and Prometheus probing to be set up on a dedicated node
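As one possible concrete shape for what the CLI pushes onto the message broker, the sketch below models a job message as a small dataclass. The field names and the choice of JSON serialization are assumptions, not a defined schema.

```python
# Sketch of the job message the CLI could push to the message broker.
# Field names and the use of JSON serialization are assumptions.
import json
import uuid
from dataclasses import dataclass, field, asdict


@dataclass
class JobMessage:
    image: str                      # Docker image URL/ID to run
    config_path: str                # path to the experiment config YAML
    data_path: str                  # path into the shared data store
    priority: int = 0               # higher values scheduled first
    job_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def to_json(self) -> str:
        """Serialize the message for the broker (e.g. a Redis/RabbitMQ queue)."""
        return json.dumps(asdict(self))


# Example: what a queued message might look like.
msg = JobMessage(image="registry.example.com/team/experiment:v1",
                 config_path="configs/exp-01.yaml",
                 data_path="/mnt/shared/datasets/imagenet")
print(msg.to_json())
```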
Workflow:
- Data scientists push a Docker image URL and the configuration to run the experiment to the queue via the command line interface. In return, they receive a job ID.
- The job queue API receives the job and adds it to the queue.
- The resource monitor checks for available resources to schedule a job.
- If resources are available, the job scheduler schedules the job on the available nodes.
- Track the status of the job via the MLflow, ELK, and Prometheus dashboards.
- Data scientists can query the status of the job via the command line interface.
- On completion of the job, store the artifacts in storage for further perusal.
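For the scheduling step in the workflow above, a minimal sketch using the official Kubernetes Python client is shown below. The namespace, labels, and resource requests are assumptions; the real scheduler would derive them from the job's configuration.

```python
# Sketch of the "schedule the queued job as a Kubernetes Job" step using the
# official Kubernetes Python client. Namespace and resource requests are
# assumed defaults, not part of the design.
from kubernetes import client, config


def schedule_job(job_id: str, image: str) -> None:
    config.load_kube_config()  # or config.load_incluster_config() inside the cluster
    container = client.V1Container(
        name=f"experiment-{job_id}",
        image=image,
        resources=client.V1ResourceRequirements(
            requests={"cpu": "2", "memory": "4Gi"}),  # assumed defaults
    )
    job = client.V1Job(
        metadata=client.V1ObjectMeta(name=f"sentinels-{job_id}"),
        spec=client.V1JobSpec(
            template=client.V1PodTemplateSpec(
                spec=client.V1PodSpec(containers=[container],
                                      restart_policy="Never")),
            backoff_limit=0,  # do not retry failed experiments automatically
        ),
    )
    client.BatchV1Api().create_namespaced_job(namespace="experiments", body=job)
```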
Assumptions:
- Data is accessed via a shared read store, so experiments are given the path to the data store directory. Currently, only files and directories will be supported.
- Docker containers are present in a Docker registry.
- A user would send, from the command line, a Docker container ID/URL along with a path to a configuration YAML and other flags.
- The job is queued.
- The job runs as a Kubernetes Job when resources are present.
- All metrics and logs are pushed to the MLflow server and ELK.
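The last assumption implies that each experiment logs against the MLflow tracking server from inside its container. A minimal sketch using the MLflow Python API is below; the tracking URI, experiment name, parameter and metric names, and artifact file are placeholders.

```python
# Sketch of an experiment logging its parameters and metrics to the MLflow
# tracking server, per the assumption above. Server address, experiment name,
# and metric names are placeholders.
import mlflow

mlflow.set_tracking_uri("http://mlflow-server:5000")  # assumed server address
mlflow.set_experiment("sentinels-demo")

with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.001)
    for epoch in range(10):
        # In a real experiment this value would come from the training loop.
        mlflow.log_metric("loss", 1.0 / (epoch + 1), step=epoch)
    mlflow.log_artifact("model.pkl")  # assumes the run produced this file
```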
Team:
- @raghothams
- @shrayasr
- @nischalhp