AutoG: Towards automatic graph construction from tabular data

Introduction

Overview

AutoG is a novel framework that addresses the critical challenge of automatically constructing high-quality graphs from tabular data for graph machine learning (GML) applications. While GML has seen tremendous growth, the crucial step of converting tabular data into meaningful graphs remains largely manual and unstandardized. AutoG leverages Large Language Models (LLMs) to automate this process, producing graphs that rival those created by human experts.

Key Features

Automatic graph schema generation without human intervention
LLM-based solution for high-quality graph construction

Installation

Step 1: Clone the Repository

git clone https://github.com/amazon-science/Automatic-Table-to-Graph-Generation

cd Automatic-Table-to-Graph-Generation/

Step 2: Install Core Dependencies

Recommend using docker to prepare the environment.

docker build -t autog .

Then create a mapping for consistent installation

docker run --gpus all -it -v ../Automatic-Table-to-Graph-Generation:/workspace -v ./opt:/opt autog /bin/bash

Then

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O /tmp/miniconda.sh && bash /tmp/miniconda.sh -b -p /opt/conda && rm /tmp/miniconda.sh
conda init && source ~/.bashrc && cd /workspace/multi-table-benchmark && bash conda/create_conda_env.sh -s -g 11.7 -p 3.9 -t 1.13.1 
source ~/.bashrc
pip install codetiming humanfriendly sentence_transformers==3.3.0 transformers==4.44.2 nltk==3.9.1

# Install 4dbinfer-related libraries
cd multi-table-benchmark
bash conda/create_conda_env.sh

# Clone DeepJoin to download the language model
git clone https://github.com/mutong184/deepjoin

Step 3: Optional Dependencies

These are required for development but not necessary if using cached LLM outputs:

pip install llama-index-llms-bedrock
pip install llama-index
pip install valentine

Usage

1. Dataset Preparation

Generate the preprocessing dataset:

bash scripts/download.sh

This creates two dataset versions:

Old Version: A baseline preprocessed version using basic heuristics
Expert Version: Human expert-generated version with optimized column naming

Note: AutoG uses the 'old' version as input while ignoring the schema information.

2. Running AutoG

To run the AutoG pipeline:

bash scripts/autog.sh

For detailed configuration options, see scripts/autog.sh.

3. Running Graph Machine Learning

Execute GML tasks on the constructed graphs:

bash scripts/run.sh

Update: now we stick to a non-api version of the llm, you can directly run the script and then paste the output from your browser-end LLMs for quick testing.

Using AutoG with Custom Datasets

Follow these steps to apply AutoG to your own data:

Generate metadata information:

from models.llm.gconstruct import analyze_dataframes
metadata = analyze_dataframes(your_dataframe)

Generate initial type predictions:

from prompts import identify
types = identify(metadata)

Create a DBBRDBDataset wrapper for your data.
Generate first-round prompts using AutoG.

Citation

If you use AutoG in your research, please cite:

@inproceedings{
chen2025autog,
title={AutoG: Towards automatic graph construction from tabular data},
author={Zhikai Chen and Han Xie and Jian Zhang and Xiang song and Jiliang Tang and Huzefa Rangwala and George Karypis},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=hovDbX4Gh6}
}

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
configs		configs
dbinfer		dbinfer
main		main
models		models
multi-table-benchmark		multi-table-benchmark
newdatasets		newdatasets
prompts		prompts
scripts		scripts
test		test
utils		utils
.gitignore		.gitignore
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
Dockerfile		Dockerfile
LICENSE		LICENSE
NOTICE		NOTICE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

AutoG: Towards automatic graph construction from tabular data

Introduction

Overview

Key Features

Installation

Step 1: Clone the Repository

Step 2: Install Core Dependencies

Step 3: Optional Dependencies

Usage

1. Dataset Preparation

2. Running AutoG

3. Running Graph Machine Learning

Update: now we stick to a non-api version of the llm, you can directly run the script and then paste the output from your browser-end LLMs for quick testing.

Using AutoG with Custom Datasets

Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 3

Uh oh!

Languages

License

amazon-science/Automatic-Table-to-Graph-Generation

Folders and files

Latest commit

History

Repository files navigation

AutoG: Towards automatic graph construction from tabular data

Introduction

Overview

Key Features

Installation

Step 1: Clone the Repository

Step 2: Install Core Dependencies

Step 3: Optional Dependencies

Usage

1. Dataset Preparation

2. Running AutoG

3. Running Graph Machine Learning

Update: now we stick to a non-api version of the llm, you can directly run the script and then paste the output from your browser-end LLMs for quick testing.

Using AutoG with Custom Datasets

Citation

About

Resources

License

Code of conduct

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 3

Uh oh!

Languages

Packages