AutoG is a novel framework that addresses the critical challenge of automatically constructing high-quality graphs from tabular data for graph machine learning (GML) applications. While GML has seen tremendous growth, the crucial step of converting tabular data into meaningful graphs remains largely manual and unstandardized. AutoG leverages Large Language Models (LLMs) to automate this process, producing graphs that rival those created by human experts.
- Automatic graph schema generation without human intervention
- LLM-based solution for high-quality graph construction
git clone https://github.com/amazon-science/Automatic-Table-to-Graph-Generation
cd Automatic-Table-to-Graph-Generation/
Recommend using docker to prepare the environment.
docker build -t autog .
Then create a mapping for consistent installation
docker run --gpus all -it -v ../Automatic-Table-to-Graph-Generation:/workspace -v ./opt:/opt autog /bin/bash
Then
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O /tmp/miniconda.sh && bash /tmp/miniconda.sh -b -p /opt/conda && rm /tmp/miniconda.sh
conda init && source ~/.bashrc && cd /workspace/multi-table-benchmark && bash conda/create_conda_env.sh -s -g 11.7 -p 3.9 -t 1.13.1
source ~/.bashrc
pip install codetiming humanfriendly sentence_transformers==3.3.0 transformers==4.44.2 nltk==3.9.1
# Install 4dbinfer-related libraries
cd multi-table-benchmark
bash conda/create_conda_env.sh
# Clone DeepJoin to download the language model
git clone https://github.com/mutong184/deepjoin
These are required for development but not necessary if using cached LLM outputs:
pip install llama-index-llms-bedrock
pip install llama-index
pip install valentine
Generate the preprocessing dataset:
bash scripts/download.sh
This creates two dataset versions:
- Old Version: A baseline preprocessed version using basic heuristics
- Expert Version: Human expert-generated version with optimized column naming
Note: AutoG uses the 'old' version as input while ignoring the schema information.
To run the AutoG pipeline:
bash scripts/autog.sh
For detailed configuration options, see scripts/autog.sh
.
Execute GML tasks on the constructed graphs:
bash scripts/run.sh
Update: now we stick to a non-api version of the llm, you can directly run the script and then paste the output from your browser-end LLMs for quick testing.
Follow these steps to apply AutoG to your own data:
-
Generate metadata information:
from models.llm.gconstruct import analyze_dataframes metadata = analyze_dataframes(your_dataframe)
-
Generate initial type predictions:
from prompts import identify types = identify(metadata)
-
Create a DBBRDBDataset wrapper for your data.
-
Generate first-round prompts using AutoG.
If you use AutoG in your research, please cite:
@inproceedings{
chen2025autog,
title={AutoG: Towards automatic graph construction from tabular data},
author={Zhikai Chen and Han Xie and Jian Zhang and Xiang song and Jiliang Tang and Huzefa Rangwala and George Karypis},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=hovDbX4Gh6}
}