This repo is a bit dusty while we prepare for the OSS launch. We will have it in working order soon.
GiGL is an open-source library for training and inference of Graph Neural Networks at very large (billion) scale.
See 📖 Documentation for more details
- 🧠 Versatile GNN Applications: Supports easy customization for using GNNs in supervised and unsupervised ML applications such as node classification and link prediction.
- 🚀 Designed for Scalability: The architecture is built with horizontal scaling in mind, ensuring cost-effective performance throughout data preprocessing and transformation, model training, and inference.
- 🎛️ Easy Orchestration: Simplified end-to-end orchestration, making it easy for developers to implement, scale, and manage their GNN projects.
GiGL contains six components, each designed to facilitate the platform's end-to-end graph machine learning (ML) tasks. The components are as follows:
Component | Source Code | Documentation |
---|---|---|
Config Populator | here | here |
Data Preprocessor | here | here |
Subgraph Sampler | here | here |
Split Generator | here | here |
Trainer | here | here |
Inferencer | here | here |
The figure below illustrates at a high level how all the components work together in an end-to-end GiGL pipeline.
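As a complementary textual sketch of that flow (purely illustrative, not an executable pipeline definition; the stage descriptions paraphrase the table above), the components run in the following order, each consuming the outputs of the previous stage:

```python
# Conceptual ordering of an end-to-end GiGL run; see each component's
# documentation above for its actual API and configuration.
PIPELINE_STAGES = [
    "config_populator",   # populates/finalizes the task config for later stages
    "data_preprocessor",  # prepares and transforms the raw graph data
    "subgraph_sampler",   # samples subgraphs from the preprocessed graph
    "split_generator",    # splits samples into train / validation / test sets
    "trainer",            # trains the GNN model
    "inferencer",         # runs inference with the trained model
]
```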
There are various ways to use GiGL. The recommended solution is to set up a conda environment and use some handy commands:
From the root directory:
make initialize_environment
conda activate gnn
This creates a Python 3.9 environment with some basic utilities. Next, to install all user dependencies:
make install_deps
If you instead want a developer install, which includes some extra tooling useful for contributions:
make install_dev_deps
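As a quick sanity check that the install worked (assuming only that the gigl package is importable from the activated gnn environment), you can try:

```python
# Confirm the gigl package resolves from the activated conda environment.
import gigl

print("gigl imported from:", gigl.__file__)
```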
Local Repo Setup
For developing on GiGL, see our development guide and contribution guidelines.
Using Docker
todo

Before getting started with running components in GiGL, it's important to set up your config files; each component requires them in order to operate. The two required files are:
- Resource Config: Details the resource allocation and environmental settings across all GiGL components. This encompasses shared resources for all components, as well as component-specific settings.
- Task Config: Specifies task-related configurations, guiding the behavior of components according to the needs of your machine learning task.
To configure these files and customize your GiGL setup, follow our step-by-step guides:
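Once both files are in place, components reference them by URI. The placeholder values below mirror the CLI example later in this README and are also used by the Python usage example:

```python
# Placeholder GCS URIs; replace with the locations of your own populated
# task and resource configs.
task_config_uri = "gs://your_project_bucket/task_config.yaml"
resource_config_uri = "gs://your_project_bucket/resource_config.yaml"
```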
GiGL offers three primary ways to run the components for your graph machine learning tasks.
To get started quickly or to incorporate GiGL into your existing workflows, you can simply import gigl and call the .run() method on its components.
Example
from gigl.src.training.trainer import Trainer

# Uses the task_config_uri / resource_config_uri placeholders shown above,
# together with a unique name for this job.
job_name = "your_job_name"
trainer = Trainer()
trainer.run(task_config_uri, resource_config_uri, job_name)
Each GiGL component can be executed as a standalone module from the command line. This method is useful for batch processing or when integrating into shell scripts.
Example
python -m \
  gigl.src.training.trainer \
  --job_name your_job_name \
  --task_config_uri gs://your_project_bucket/task_config.yaml \
  --resource_config_uri gs://your_project_bucket/resource_config.yaml
GiGL also supports pipeline orchestration using Kubeflow. This allows you to easily kick off an end-to-end run with little to no code. See Kubeflow Orchestration for more information
The best way to get more familiar with GiGL is to go through the various examples, or see our user guide for specific details.
Testing in GiGL is designed to ensure reliability and robustness across different components of the library. We support three types of tests: unit tests, local integration tests, and cloud integration end-to-end tests.
GiGL's unit tests focus on validating the functionality of individual components and high-level utilities. They also check for proper formatting, typing, and linting standards.
More Details
- No external assets or a GCP project are required.
- Unit tests run on every pull request commit via GitHub Actions
To run unit tests locally, execute the following command:
# Runs both Scala and Python unit tests.
make unit_test
# Runs just Python unit tests
make unit_test_py
# Runs just Scala unit tests
make unit_test_scala
GiGL's local integration tests simulate the pipeline behavior of GiGL components. These tests are crucial for verifying that components function correctly in sequence and that outputs from one component are correctly handled by the next.
More Details
- Utilizes mocked/synthetic data publicly hosted in GCS (see: Public Assets)
- Requires access to and runs on cloud services such as BigQuery, Dataflow, etc.
- Required to pass before merging a PR (pre-merge check)
To run integration tests locally, you need to provide your own resource config and run the following command:
make integration_test resource_config_uri="gs://your-project-bucket/resource_config.yaml"
Cloud integration tests run a full end-to-end GiGL pipeline within GCP, also leveraging cloud services such as Dataflow, Dataproc, and Vertex AI.
More Details
- Utilizes mocked/synthetic data publicly hosted in GCS (see: Public Assets)
- Requires access to and runs on cloud services such as BigQuery, Dataflow, etc.
- Required to pass before merging a PR (pre-merge check). Access to the orchestration, logs, etc., is restricted to authorized internal engineers to maintain security. Failures will be reported back to the contributor as needed.
To test cloud integration functionality, you can replicate it by running an end-to-end pipeline yourself, following along with one of our Cora examples (see: Examples).
Your contributions are always welcome and appreciated. Here are some ways you can contribute to this project:
- Report a bug: If you think you have encountered a bug, please feel free to report it here and someone from the team will take a look.
- Request a feature: Feature requests are always welcome! You can request a feature by adding it here.
- Create a pull request: Pull requests are always greatly appreciated. You can get started by picking up any open issues from here and making a pull request.
If you are new to open source, make sure to read more about it here and learn more about creating a pull request here.
For more information, see our Contributing Guide
You may still have unanswered questions or may be facing issues. If so, please see our FAQ or our User Guide for further guidance.