The MI project collects data from GitHub repositories. You can use it to collect data and store it either locally or in Amazon S3 cloud storage. For personal usage, check out the Usage section.
Together with mi-scheduler, we provide an automated data extraction pipeline for data mining of requested repositories and organizations. This pipeline can be run on a custom schedule, e.g. daily or weekly.
To request data extraction for a repository or organization, create a Data Extraction Issue in the MI-Scheduler repository. Use this link: TODO
The MI pipeline is simple to understand; see the diagram below.
    ConfigMap
        |
        v
    mi-scheduler
        |
        v   (Argo Workflows)
    GitHub repositories
        thoth-station: solver, amun, adviser, ...
        AICoE: ...
        your org: your repos
        |
        v
    Entities Analysis (Issues | Pull Requests | Readmes | etc.) ---> Knowledge
        |
        v
    Knowledge Processing
        |
        v
    thoth-station/mi (Meta-information Indicators)
        |
        +---> Visualization: Project Health (dashboard)
        +---> Recommendation: thoth
MI analyses the entities specified on the srcopsmetrics/entities page. An entity is essentially a piece of repository metadata that is inspected (e.g. an Issue or a Pull Request), from which the specified features are extracted and stored in a dataframe.
MI is essentially a wrapper around the PyGithub module that provides hassle-free data extraction with API rate limit handling and data updating.
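To illustrate what rate limit handling means in practice, here is a minimal sketch, not MI's actual code. The GitHub API reports how many requests remain and when the quota resets; a client can pause until that reset time once the quota is exhausted. The function names `seconds_until_reset` and `wait_if_exhausted` are hypothetical and exist only for this example.

```python
import time
from datetime import datetime, timedelta, timezone


def seconds_until_reset(reset_at, now=None):
    """Return how long to wait (never negative) until the quota resets."""
    now = now or datetime.now(timezone.utc)
    return max(0.0, (reset_at - now).total_seconds())


def wait_if_exhausted(remaining, reset_at, sleep=time.sleep, now=None):
    """Sleep until the reset time when no requests remain.

    Returns the number of seconds waited (0.0 if requests remain).
    The sleep function is injectable so the logic can be tested quickly.
    """
    if remaining > 0:
        return 0.0
    delay = seconds_until_reset(reset_at, now)
    sleep(delay)
    return delay


if __name__ == "__main__":
    # Pretend the quota resets in two seconds; use a no-op sleep for the demo.
    reset = datetime.now(timezone.utc) + timedelta(seconds=2)
    waited = wait_if_exhausted(remaining=0, reset_at=reset, sleep=lambda s: None)
    print(f"Would have waited about {waited:.1f} seconds")
```

In a real client, `remaining` and `reset_at` would come from the API's rate limit information rather than being passed in by hand.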
MI is available through PyPI, so you can install it with:
pip install srcopsmetrics
Alternatively, you can install srcopsmetrics by cloning the repository:
git clone https://github.com/thoth-station/mi.git
cd mi
pipenv install --dev
To extract data from GitHub, an access token must be configured. To generate one, see GitHub's documentation on creating a personal access token.
To use the token with mi, set the GITHUB_ACESS_TOKEN environment variable to the token value, for example:
export GITHUB_ACESS_TOKEN=<token_string>
or
GITHUB_ACESS_TOKEN=<token_string> python -m srcopsmetrics.cli ...
and so on.
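A quick way to confirm the token is visible to Python before starting a long extraction is sketched below. The variable name is copied verbatim from the instructions above; the helper `token_is_configured` is a hypothetical name introduced for this example and is not part of srcopsmetrics. The check only verifies that the variable is set, not that GitHub accepts the token.

```python
import os

# Variable name copied verbatim from the instructions above.
TOKEN_VARIABLE = "GITHUB_ACESS_TOKEN"


def token_is_configured(environ=None):
    """Return True if the token variable is set to a non-empty value."""
    environ = os.environ if environ is None else environ
    return bool(environ.get(TOKEN_VARIABLE, "").strip())


if __name__ == "__main__":
    if token_is_configured():
        print(f"{TOKEN_VARIABLE} is set.")
    else:
        print(f"{TOKEN_VARIABLE} is not set; export it before running MI.")
```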
To store data locally, use -l when calling the CLI, or set is_local=True when using MI as a module.
By default, MI will try to store the data on Ceph. To store data on Ceph, you need to provide the following environment variables:
S3_ENDPOINT_URL: Ceph host name
CEPH_BUCKET: Ceph bucket name
CEPH_BUCKET_PREFIX: Ceph prefix
CEPH_KEY_ID: Ceph key ID
CEPH_SECRET_KEY: Ceph secret key
For more information about storing data on Ceph, look here.
To view all of the available commands and their descriptions, use:
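Since Ceph storage depends on several environment variables, it can help to check them all at once before a run. The sketch below uses only the variable names listed above; the helper `missing_ceph_variables` is a hypothetical name for this example and is not part of srcopsmetrics.

```python
import os

# Variable names taken from the list above.
REQUIRED_CEPH_VARIABLES = (
    "S3_ENDPOINT_URL",
    "CEPH_BUCKET",
    "CEPH_BUCKET_PREFIX",
    "CEPH_KEY_ID",
    "CEPH_SECRET_KEY",
)


def missing_ceph_variables(environ=None):
    """Return the required variable names that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_CEPH_VARIABLES if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_ceph_variables()
    if missing:
        print("Missing Ceph settings: " + ", ".join(missing))
    else:
        print("All Ceph settings are present.")
```

If any variable is missing, storing locally with -l avoids the Ceph configuration altogether.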
python -m srcopsmetrics.cli --help
See some general usage examples below:
python -m srcopsmetrics.cli --create --is-local --repository foo_repo --entities PullRequest
which is equivalent to
python -m srcopsmetrics.cli -clr foo_repo -e PullRequest
python -m srcopsmetrics.cli -clo foo_org -e PullRequest
python -m srcopsmetrics.cli -clr foo_repo,bar_repo -e PullRequest
python -m srcopsmetrics.cli -clr foo_repo -e PullRequest,Issue,Commit
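When these commands are generated from a script, the argument list can be assembled programmatically. The sketch below only uses flags that appear in the examples above (--create, --is-local, --repository, --entities, and -o for an organization). The helper `build_mi_command` is hypothetical, and its rule of accepting exactly one target (repositories or an organization) is a simplification of this sketch, not a documented CLI restriction.

```python
import shlex


def build_mi_command(entities, repositories=None, organization=None, is_local=True):
    """Build the argument list for a srcopsmetrics CLI call without running it."""
    if bool(repositories) == bool(organization):
        raise ValueError("Pass exactly one of repositories or organization.")
    command = ["python", "-m", "srcopsmetrics.cli", "--create"]
    if is_local:
        command.append("--is-local")
    if repositories:
        # Several repositories are passed as one comma-separated value.
        command += ["--repository", ",".join(repositories)]
    else:
        command += ["-o", organization]
    # Several entities are passed as one comma-separated value as well.
    command += ["--entities", ",".join(entities)]
    return command


if __name__ == "__main__":
    cmd = build_mi_command(["PullRequest", "Issue"], repositories=["foo_repo", "bar_repo"])
    print(shlex.join(cmd))
```

The returned list can be handed to `subprocess.run` as is, which avoids shell quoting issues.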
To learn more about the indicators that are extracted from the data, check out Meta-Information Indicators.
Always feel free to open new Issues or engage in already existing ones!
If you want to contribute by adding a new entity or metric to be analysed from GitHub repositories, feel free to open an Issue and describe why you think this new entity should be analysed and what the benefits of doing so are with respect to the goal of the thoth-station/mi project.
After creating the Issue, you can wait for a response from the thoth-station devs.
Do not forget to reference the Issue in your Pull Request.
Look at the Template entity to get an idea of the requirements that need to be satisfied for a custom entity implementation.