
GiGL Logo

GiGL: Gigantic Graph Learning

This repo is a bit dusty while we prepare for the OSS launch; we will have it in working order soon.

GiGL is an open-source library for training and inference of Graph Neural Networks at very large (billion) scale.

See 📖 Documentation for more details

Key Features 🌟

  • 🧠 Versatile GNN Applications: Supports easy customization of GNNs for supervised and unsupervised ML applications such as node classification and link prediction.

  • 🚀 Designed for Scalability: The architecture is built with horizontal scaling in mind, ensuring cost-effective performance throughout the process of data preprocessing and transformation, model training, and inference.

  • 🎛️ Easy Orchestration: Simplified end-to-end orchestration, making it easy for developers to implement, scale, and manage their GNN projects.


GiGL Components ⚡️

GiGL contains six components, each designed to facilitate the platform's end-to-end graph machine learning (ML) tasks. The components are as follows:

Component           Source Code   Documentation
Config Populator    here          here
Data Preprocessor   here          here
Subgraph Sampler    here          here
Split Generator     here          here
Trainer             here          here
Inferencer          here          here

The figure below illustrates at a high level how all the components work together in an end-to-end GiGL pipeline.

gigl-framework

Installation ⚙️

There are various ways to use GiGL. The recommended approach is to set up a conda environment using the provided make commands.

From the root directory:

make initialize_environment
conda activate gnn

This creates a Python 3.9 environment with some basic utilities. Next, to install all user dependencies:

make install_deps
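
After installing, a minimal sanity check (assuming the gnn environment created above is active) is to import the package:

# Should succeed without errors once `make install_deps` has completed.
import gigl
print(gigl.__name__)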

If you instead want a developer install, which includes extra tooling useful for contributions:

make install_dev_deps

Local Repo Setup

For developing on GiGL, see our development guide and contribution guidelines.

Using Docker (TODO)

Configuration 📄

Before running components in GiGL, you need to set up your config files; each component requires them to operate. The two required files are:

  • Resource Config: Details the resource allocation and environment settings for all GiGL components, covering both resources shared across components and component-specific settings.

  • Task Config: Specifies task-related configurations, guiding the behavior of components according to the needs of your machine learning task.

To configure these files and customize your GiGL setup, follow our step-by-step guides.
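
Both files are referenced by URI, together with a job name, whenever a component is run. The values used throughout the Usage examples below are placeholders along these lines (substitute your own bucket, file names, and job identifier):

# Placeholders only: point these at your own config files and job name.
task_config_uri = "gs://your_project_bucket/task_config.yaml"
resource_config_uri = "gs://your_project_bucket/resource_config.yaml"
job_name = "your_job_name"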

Usage 🚀

GiGL offers three primary ways to run the components for your graph machine learning tasks.

1. Importable gigl

To easily get started or incorporate gigl into your existing workflows, you can simply import gigl and call the .run() method on its components.

Example
from gigl.src.training.trainer import Trainer

trainer = Trainer()
trainer.run(task_config_uri, resource_config_uri, job_name)

2. Command-Line Execution

Each GiGL component can be executed as a standalone module from the command line. This method is useful for batch processing or when integrating into shell scripts.

Example
python -m \
    gigl.src.training.trainer \
    --job_name your_job_name \
    --task_config_uri "gs://your_project_bucket/task_config.yaml" \
    --resource_config_uri "gs://your_project_bucket/resource_config.yaml"

3. Kubeflow Pipeline Orchestration

GiGL also supports pipeline orchestration using Kubeflow, which lets you kick off an end-to-end run with little to no code. See Kubeflow Orchestration for more information.
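
As a rough, generic illustration of submitting a compiled Kubeflow pipeline, the sketch below uses the plain KFP SDK rather than GiGL's own orchestration entry points; the endpoint, package path, and parameter names are placeholders, and the actual interface is described in the Kubeflow Orchestration docs.

import kfp

# Placeholders throughout: endpoint, compiled pipeline package, and parameter
# names are illustrative; see the Kubeflow Orchestration docs for GiGL's flow.
client = kfp.Client(host="https://your-kfp-endpoint")
client.create_run_from_pipeline_package(
    "gigl_pipeline.yaml",
    arguments={
        "job_name": "your_job_name",
        "task_config_uri": "gs://your_project_bucket/task_config.yaml",
        "resource_config_uri": "gs://your_project_bucket/resource_config.yaml",
    },
)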


The best way to get more familiar with GiGL is to go through the various examples, or see our user guide for specific details.

Tests 🔧

Testing in GiGL is designed to ensure reliability and robustness across different components of the library. We support three types of tests: unit tests, local integration tests, and cloud integration end-to-end tests.

Unit Tests

GiGL's unit tests focus on validating the functionality of individual components and high-level utilities. They also check for proper formatting, typing, and linting standards.

More Details
  • No external assets or GCP project is required.
  • Unit tests run on every commit to a pull request via GitHub Actions.

To run unit tests locally, execute the following command:

# Runs both Scala and Python unit tests.
make unit_test

# Runs just Python unit tests
make unit_test_py

# Runs just Scala unit tests
make unit_test_scala

Local Integration Test

GiGL's local integration tests simulate the pipeline behavior of GiGL components. These tests are crucial for verifying that components function correctly in sequence and that outputs from one component are correctly handled by the next.

More Details
  • Utilize mocked/synthetic data publicly hosted in GCS (see: Public Assets)
  • Require access to, and run on, cloud services such as BigQuery, Dataflow, etc.
  • Required to pass before merging a PR (pre-merge check)

To run integration tests locally, provide your own resource config and run the following command:

make integration_test resource_config_uri="gs://your-project-bucket/resource_config.yaml"

Cloud Integration Test (End-to-End)

Cloud integration tests run a full end-to-end GiGL pipeline within GCP, also leveraging cloud services such as Dataflow, Dataproc, and Vertex AI.

More Details
  • Utilize mocked/synthetic data publicly hosted in GCS (see: Public Assets)
  • Require access to, and run on, cloud services such as BigQuery, Dataflow, etc.
  • Required to pass before merging a PR (pre-merge check). Access to the orchestration, logs, etc., is restricted to authorized internal engineers to maintain security; failures will be reported back to the contributor as needed.

To test cloud integration functionality yourself, you can replicate it by running an end-to-end pipeline, following one of our Cora examples (see: Examples).


Contribution 🔥

Your contributions are always welcome and appreciated. Here are some of the ways you can contribute to this project:

  1. Report a bug
    If you think you have encountered a bug, please feel free to report it here and someone from the team will take a look.

  2. Request a feature
    Feature requests are always welcome! You can request a feature by adding it here.

  3. Create a pull request
    Pull requests are always greatly appreciated. You can get started by picking up any open issues from here and making a pull request.

If you are new to open source, make sure to read more about it here and learn more about creating a pull request here.

For more information, see our Contributing Guide.

Additional Resources ❗

You may still have unanswered questions or may be facing issues. If so, please see our FAQ or our User Guide for further guidance.

License 🔒

MIT License
