Python Continuous Integration Command Line Engine
A simple command line tool to poll github for Pull Requests on a project for a single base branch(master by default)
and trigger builds when they change, or when the base branch of a PR changes. Projects are assumed to be C++
,
use CMake
for configuration and CTest
for testing with results submitted to a CDash
dasboard.
The project was/is created for use with HPX and the HPX (CDash) dashboard associated with the project is Here, other (non HPX) projects are supported.
When running, pycicle will poll github once every N seconds and look for open pull requests on your base branch using the pygithub API. A list of open PRs is generated, and for each that is mergeable, pycicle looks at the latest SHA on the PR and the latest SHA on base branch, and if either has changed since last time it looked, it marks that PR as needing an update (rebuild).
A build is triggered either from the current shell or by ssh-ing into a remote machine and calling
ctest -S dashboard-script.cmake <args>
to spawn a build, or by calling ctest -S dashboard-<scheduler>.cmake <args>
if the machine is using a scheduler for job control. Currently slurm and pbs are supported.
The scheduler version of the dashboard script does nothing more than wrap the call to the base dashboard script inside an job dispath script wrapper so that the build is triggered by on a compute node, rather than on the login node.
The build script will checkout the latest base branch (as specified in command line arguments or config file),
merge the PR (branch) into it, then do ctest configure/build/test with submit steps after each configure/build/test step respectively
to produce an entry in the dashboard that is updated as the build progresses.
Note that if a pull request is modified whilst a previous build is still going, a scancel
of the existing job is used to terminate the first before starting the second.
Every M seconds, pycicle will find (scrape) a small log file generated in each build dir that contains a summary of config/build/test results and update the github PR status based on it so that failures flag the PR as not ready for merging.
Running pycicle is relatively simple, and can be done by a user, manually or in a cron job.
It does not require elevated system privledges and uses the
same permissions as the user starting it. When a build fails, the user can ssh
into the machine, cd
into
the build dir, manually (re)start the build, see the errors, cd
into the source dir and inspect the repo/branch that
is being tested and tweak anything necessary to get it working - even fix the build/test errors and update the PR
from the test build's copy of the repo.
You can run it inside a screen session, at startup or just leave a terminal open with it and start and stop it
on demand.
CDash supports the display of build information from many sites, so pycicle can be run at several institutions with results from machines at each location being submitted to a single CDash dashboard. Machines at each location may be configured by different users and no central coordination is required - it is this aspect that makes it attractive to projects like HPX that have developers in several locations and compute resources ditributed worldwide with different architectures/hardware.
Pycicle allows users to:
- contribute to more complete CI
- running CI on exactly the systems they care about
- easily run CI on forked repos
To run locally and use machine daint for builds of the hpx project
python ./pycicle.py -m daint -P hpx
or for builds of the dca project
python ./pycicle.py -m daint -P hpx
usage: pycicle.py [-h] [-s] [--no-slurm] [-d] [-r PYCICLE_ROOT] [-t USER_TOKEN]
[-m MACHINES [MACHINES ...]] [-p PULL_REQUEST] [-c]
-P PROJECT, --project : Project name (case sensitive)
This is the name of the project to be tested, it should be the same as the name/name.cmake
file that holds the settings.
-s, --slurm : Use slurm for job launching (default).
When slurm is enabled, builds are triggered by launching a slurm script that in turn launches the ctest build script
--no-slurm : Disable slurm job launching
When disabled, the script is executed directly, you might want to do this when setting up a build script
and using a login node for test purposes.
-d, --debug : Enable debug mode
When using debug mode, remote commands are echoed to the screen instead of being executed. This is useful
when setting up your first build and trying to get commands right or debugging pycicle itself.
-r PYCICLE_ROOT, --pycicle-root : pycicle root path/directory
The environment variable $PYCICLE_ROOT should be set on the machine you are running pycicle on,
and also on the machine where builds are being triggered - but when supplied on the command line,
it overrides the environment variable for the local machine (not the remote one, but we could add that).
It is the root of the build/src tree where pycicle will write all its files.
-t PYCICLE_GITHUB_TOKEN, --github-token PYCICLE_GITHUB_TOKEN : github token used to authenticate access
To access github (and set status of PRs) you need to generate a developer token on the github website and use it
when initializing the pygithub object. Set the environment variable $PYCICLE_GITHUB_TOKEN or pass it on the
command line.
Make sure you give yourself write permission if you want to set the status of PRs using pycicle.
-m MACHINES [MACHINES ...], --machines MACHINES [MACHINES ...] list of machines to use for testing
Currently pycicle only supports a single machine at a time, but the plan is to allow spawning build on several
machines from a single pycicle instance. Currently we run one instance on a login node of each machine or run
one on a local terminal and use ssh
to spawn on a single remote machine.
-p PULL_REQUEST, --pull-request PULL_REQUEST : A single PR number for limited testing
When debugging pycicle, or your build scripts, to avoid spamming github, use a known PR numbe to tell pycicle
to ignore all other PRs apart from that one.
-c, --scrape-only : Only scrape results and set github status (no building)
When this is set, pycicle will not trigger any builds, it will only look for completed build logs on the remote
machine and scrape them for the status it needs to set github PRs to enabled or disabled.
Create a pycicle directory on a machine, set $PYCICLE_ROOT to the path and add it
to your bash
startup so that a machine that ssh's in will have it set.
(Note that this mode of operation might not work as it hasn't been used for a while but the setup steps are still valid for both modes).
# setup for machine that will do builds and run pycicle script
PYCICLE_ROOT=/user/biddisco/pycicle
mkdir -p $PYCICLE_ROOT
cd $PYCICLE_ROOT
# clone pycicle into the root
git clone https://github.com/biddisco/pycicle.git pycicle
# create a directory called `repos` where projects to be tested will go
# make a copy of your project git repository in the `repos` folder
cp -r /path/to/your/project/hpx $PYCICLE_ROOT/repos/hpx
# alternatively, clone your project into the `repos` folder
# git -C ./repos/ clone git@github.com:STEllAR-GROUP/hpx.git repos/
Note that if you are testing more than one project using the same tree then
it is only necessary to clone/copy a second project into the repos
folder.
Follow the steps above on the machine that will do builds. On the machine that will run the scripts and trigger builds (on the remote machine)
# setup for machine that runs the pycicle script only
PYCICLE_ROOT=/user/biddisco/pycicle
mkdir -p $PYCICLE_ROOT
cd $PYCICLE_ROOT
# clone pycicle into the root
git clone https://github.com/biddisco/pycicle.git pycicle
Why do we keep a copy of the project repository in the pycicle root dir? The reason is that when initially developing pycicle on a laptop, using wifi internet access, it turned out to be very painful to git clone the entire HPX project for each PR being tested, and so pycicle will copy the repo from it's own private copy for each PR rather than cloning - this is much faster when the repo is many GBs. Note that each branch being tested will still be pulled from the origin (github), but this is much faster than a full clone. (NB. Doing a shallow clone isn't a great solution because you need to go back far enough to ensure the merge-base between the PR and the base branch is in the history).
After using the above setup, pycicle can be started using a command like
python $PYCICLE_ROOT/pycicle/pycicle.py -m MACHINE -P project
When it runs, two directories will be created
$PYCICLE_ROOT/src
$PYCICLE_ROOT/build
and these will be populated with source trees and build trees for PRs and the base branch when they need to be built.
see:
config/dca_local/condaGPUTrunk_local.cmake
An example is in config/dca_local
- still debugging this feature * Assuming cdash is setup on a httpd running on localhost:8080 of machine B. From machine B
ssh -fN -R38080:localhost:8080 you@clusterA-login-node
If you have load balancing on log in nodes make sure to explicitly raise the reverse tunnel on the login node pycicle is running on.
The HPX project runs a tool called inspect
on the code (similar to clang-format/style checks etc) to ensure
that #includes
are set correctly and basic format checks pass.
Currently this is hardcoded into the ctest script as a prebuild step to do
an extra configure and submit step to a different dashboard track.
If you use pycicle to test non HPX projects, the inspect step will be skipped - at some point the scripts
will be updated to allow a custom tool per project to be run.
Not yet implemented, but adding a doc build step to pycicle.py
or the ctest scripts should be straightforward.
The config directory contains examples of two slurm operated machines {greina/daint}, these can be
copied/modified to create new configurations for other machines.
The machine name passed on the commandline python pycicle.py -m daint
(in this example daint
)
must correspond to the name of a cmake configuratiuon file for the machine in the config
directory.
Details of the CMake Vars that need to be set will follow. Most is self explanatory for developers familiar with CMake/CTest.
In the $PYCICLE_ROOT directory of the machine that runs the pycicle script you can delete the file that holds the last checked SHA from github. This will trigger a new build for all PRs.
cd $PYCICLE_ROOT
find src -maxdepth 2 -name last_pr_sha.txt -delete
If you only want to force a rebuild for PR 3042, then
cd $PYCICLE_ROOT
rm -f src/${project-name}-3042/last_pr_sha.txt
NB. A command line param should be added to allow this to be done without manual deletion.
I don't really know anything about python, so have no real idea if this works with python2 and python3. I think it does and I added a few imports to make it work, but it isn't tested.