This project shows how to use Deep Graph Library (DGL) on Amazon SageMaker to train a graph neural network (GNN) model to perform entity resolution on customer identity graphs. See the project detail page to learn more about the techniques used.
You will need an AWS account to use this solution. Sign up for an account here.
To run this JumpStart 1P Solution and have the infrastructure deploy to your AWS account you will need to create an active SageMaker Studio instance (see Onboard to Amazon SageMaker Studio). When your Studio instance is Ready, use the instructions in SageMaker JumpStart to 1-Click Launch the solution.
The solution artifacts are included in this GitHub repository for reference.
Note: Solutions are available in most regions including us-west-2, and us-east-1.
Caution: Cloning this GitHub repository and running the code manually could lead to unexpected issues! Use the AWS CloudFormation template. You'll get an Amazon SageMaker Notebook instance that's been correctly setup and configured to access the other resources in the solution.
The project architecture deployed by the cloud formation template is shown here.
deployment/
sagemaker-graph-entity-resolution.yaml
: Creates AWS CloudFormation Stack for solution
source/
lambda/
data-preprocessing/
index.py
: Lambda function script for invoking SageMaker processing
graph-modelling/
index.py
: Lambda function script for invoking SageMaker training
sagemaker/
data-preprocessing/
data_preparation_script.py
: Custom script used to prepare CIKM cup data to solution input formatdata_preprocessing_script.py
: Custom script used by SageMaker Processing for data processing/feature engineering
dgl-entity-resolution/
model.py
: Implements the various graph neural network models used in the project with the pytorch backenddata.py
: Contains functions for reading edges and node featuresestimator_fns.py
: Contains functions for parsing input from SageMaker estimator objectsgraph.py
: Contains functions for constructing DGL Graphs with node features and edge listsrequirements.txt
: Describes Python package requirements of the Amazon SageMaker training instancesampler.py
: Contains functions for graph sampling for mini-batch trainingtrain_dgl_pytorch_entry_point.py
: python entry point used by the notebook for GNN training with DGL pytorch backendutils.py
: python script with utility functions for computing metrics and plots
dgl-entity-resolution.ipynb
: Orchestrates the solution. Triggers preprocessing and model training
This project is licensed under the Apache-2.0 License.