Clustering build logs to analyze common build issues

When your company attempts to build Lossless Semantic Trees (LSTs) for all of your repositories, you may find that some of them do not build successfully. While you could go through each of those by hand and attempt to figure out common patterns, there is a better way: cluster analysis.

You can think of cluster analysis as a way of grouping data into easily identifiable chunks. In other words, it can take in all of your build failures and then find what issues are the most common – so you can prioritize what to fix first.

This repository will walk you through everything you need to do to perform a cluster analysis on your build failures. By the end, you will have produced two HTML files:

- one that visually displays the clusters
- one that contains samples for each cluster

NOTE: Clustering is currently limited to Maven, Gradle, .NET, and Bazel builds because our heuristic-based extraction of build errors is specific to these build types. Build failures from other build types won’t cause errors during clustering, but the heuristic extraction may overlook valuable parts of the stack trace.

Prerequisites

Note

This repository contains a devcontainer specification, which is the recommended way to get set up because it ensures a consistent developer experience. If you prefer, you can install the necessary components locally instead. Running without Docker may be faster if your local machine has GPU or Metal support. See LOCAL_INSTALL.md for how to get started.

Please ensure you have the following tools installed on your system:

A Devcontainer-compatible client (GitHub Codespaces, GitPod, DevPod, Docker Desktop, Visual Studio Code, etc.)
Optional:
- Git

Instructions

Clone this project

Most Devcontainer clients will perform the clone on your behalf and initialize the workspace for you. If your specific client requires you to clone the workspace locally first, use the following commands:

git clone git@github.com:moderneinc/moderne-cluster-build-logs.git
cd moderne-cluster-build-logs

Gather build logs

To analyze your build logs, copy all of them into this directory (Clustering), inside a folder named repos.

You will also need a builds.xlsx file that provides details about the builds, such as where the build logs are located, what the outcome was, and what the path to the project is. This file should exist inside the repos directory.

Here is an example of what your directory should look like if everything is set up correctly:

moderne-cluster-build-logs
│
├───scripts
│       (4 files)
│
└───repos
    │   builds.xlsx
    │
    ├───Org1
    │   ├───Repo1
    │   │   └───main
    │   │           build.log
    │   │
    │   └───Repo2
    │       └───master
    │               build.log
    │
    ├───Org2
    │   ├───Repo1
    │   │   └───main
    │   │           build.log
    │   │
    │   └───Repo2
    │       └───master
    │               build.log
    │
    └───Org3
        └───Repo1
            └───main
                    build.log
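
Before running anything, it can help to confirm that the layout matches the listing above. The following is a minimal sketch, not part of this repository: `check_layout` is a hypothetical helper, and it assumes the spreadsheet is named as in the listing and that logs sit exactly at `repos/<org>/<repo>/<branch>/build.log`.

```python
from pathlib import Path


def check_layout(root: Path, spreadsheet: str = "builds.xlsx") -> list[str]:
    """Return a list of problems found in the expected repos/ layout."""
    repos = root / "repos"
    if not repos.is_dir():
        return [f"missing directory: {repos}"]

    problems = []
    # The spreadsheet name is taken from the directory listing (assumption).
    if not (repos / spreadsheet).is_file():
        problems.append(f"missing spreadsheet: repos/{spreadsheet}")

    # Assumed depth: repos/<org>/<repo>/<branch>/build.log
    if not list(repos.glob("*/*/*/build.log")):
        problems.append("no build.log files under repos/<org>/<repo>/<branch>/")
    return problems
```

Calling `check_layout(Path("."))` from the project root returns an empty list when both the spreadsheet and at least one log are in place.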

Using Moderne mass ingest logs

If you want to use Moderne's mass ingest logs to run these scripts, you can use the following script to download a sample.

python scripts/00.download_ingest_samples.py

You will be prompted to choose which slice to download. Enter the corresponding number and press Enter.

Run the scripts

Warning

These scripts won't work correctly unless you have copied the logs and the builds.xlsx file into the repos directory you're working out of.

Run the following scripts in order:

Step 1

Before running the other scripts for the first time, run 01.extract_failures.py to extract the log paths of the failed builds only.

python scripts/01.extract_failures.py
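
To illustrate the idea behind this step, here is a rough sketch of filtering build records down to failures and writing them out as CSV with a `Solved` flag. It is not the actual script: the column names `logpath` and `outcome`, and the outcome value `Failure`, are assumptions made for the example, and the rows are passed in as plain dictionaries rather than read from the spreadsheet.

```python
import csv
import io


def extract_failures(builds):
    """Keep only failed builds and add a Solved flag that defaults to False."""
    failures = []
    for row in builds:
        # "outcome" and the value "failure" are hypothetical names.
        if row["outcome"].strip().lower() != "failure":
            continue
        failures.append({"logpath": row["logpath"], "Solved": "False"})
    return failures


def to_csv(rows):
    """Serialize the failure rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["logpath", "Solved"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```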

Step 2

Load the logs and extract relevant error messages and stacktraces from the logs:

python scripts/02.load_logs_and_extract.py
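
As an illustration of what heuristic extraction can look like, the sketch below keeps only lines that resemble Maven `[ERROR]` output, Gradle failure headers, `Caused by:` lines, and Java stack frames. The patterns and the `extract_errors` helper are assumptions for this example. The heuristics in the actual script may differ.

```python
import re

# Lines that usually carry the failure in Maven and Gradle output,
# plus "Caused by:" lines and indented Java stack frames.
ERROR_LINE = re.compile(
    r"^\[ERROR\]|^FAILURE:|^\* What went wrong:|^Caused by:|^\s+at\s+\S+\("
)


def extract_errors(log_text, max_lines=50):
    """Return up to max_lines lines of the log that look like errors or stack frames."""
    kept = [line for line in log_text.splitlines() if ERROR_LINE.search(line)]
    return "\n".join(kept[:max_lines])
```

Capping the number of lines keeps very long stack traces from dominating the text that is later embedded.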

Step 3

Embed logs and cluster:

python scripts/03.embed_summaries_and_cluster.py
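
To give a feel for what "embed and cluster" means, here is a deliberately simplified stand-in that uses only the standard library. It replaces the embedding model with bag-of-words counts and the clustering algorithm with a greedy cosine-similarity threshold, so it is not the approach the script takes. `embed`, `cosine`, and `cluster` are hypothetical names.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: bag-of-words counts over lowercase word tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster(texts, threshold=0.5):
    """Greedy clustering: join the first cluster whose seed is similar enough,
    otherwise start a new cluster seeded by this text."""
    seeds, labels = [], []
    for text in texts:
        vec = embed(text)
        for label, seed in enumerate(seeds):
            if cosine(vec, seed) >= threshold:
                labels.append(label)
                break
        else:
            seeds.append(vec)
            labels.append(len(seeds) - 1)
    return labels
```

Two dependency-resolution errors that differ only in the project name end up with the same label, while an unrelated compilation error starts a new cluster.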

Analyze the results

Once you've run the scripts, you should find that two files were produced: clusters_scatter.html and clusters_logs.html. Open them in the browser of your choice to get detailed information about your build failures. To serve them over HTTP, you can start Python's built-in web server:

python -m http.server 8080

Success! These can now be viewed in your browser at http://localhost:8080/clusters_scatter.html and http://localhost:8080/clusters_logs.html.

Optional: Marking a certain repository as "solved"

As you work through the build failures, you might want to exclude logs that you have already resolved from the clustering process. To do this, open the failures.csv file and set the Solved column to True for the logs you want to ignore. Alternatively, you can delete or rename the build.log file for that repository. After making these changes, re-run the clustering process starting at step 2. You can repeat steps 2 and 3 as many times as needed to update the graphics.
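
If you have many logs to mark, editing the file by hand gets tedious. The sketch below shows one way to flip the `Solved` column programmatically. Only the `Solved` column name comes from this README. The `logpath` column used to identify rows is an assumption, so adjust `path_column` to whatever your failures.csv actually contains.

```python
import csv
import io


def mark_solved(csv_text, solved_paths, path_column="logpath"):
    """Return failures.csv content with Solved set to True for the given log paths."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return csv_text
    for row in rows:
        if row[path_column] in solved_paths:
            row["Solved"] = "True"
    out = io.StringIO()
    # Preserve the original column order from the header.
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Read the file's text, pass it through `mark_solved`, write the result back, and then restart at step 2.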

Example results

Below you can see some examples of the HTML files produced by following the above steps.

`clusters_scatter.html`

This file is a visual representation of the build failure clusters. Each dot is one build log, so clusters with the most dots should generally be prioritized over ones with fewer. You can hover over a dot to see part of its build log.
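
The same prioritization can be expressed numerically. Assuming you have one cluster label per failed build (the `labels` list here is hypothetical input, not a file these scripts are documented to produce), ranking clusters by size is a short sketch:

```python
from collections import Counter


def rank_clusters(labels):
    """Return (cluster_label, size) pairs, largest cluster first."""
    return Counter(labels).most_common()
```

The first entry in the result is the cluster that affects the most builds, which is usually the one to investigate first.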

`clusters_logs.html`

Use this file to see the full extracted logs. It lists all the logs that belong to each cluster.
