Graph ID is a universal identifier system for atomistic structures including crystals and molecules. It generates unique, deterministic identifiers based on the topological and compositional properties of atomic structures, enabling efficient structure comparison, database indexing, and materials discovery.
Graph ID works by:
- Converting atomic structures into graph representations where atoms are nodes and bonds are edges
- Analyzing the local chemical environment around each atom using compositional sequences
- Computing a hash-based identifier that captures both topology and composition
- Supporting various modes including topology-only comparisons and Wyckoff position analysis
Key features:
- Universal Structure Identification: Generate unique IDs for any crystal or molecular structure
- Topological Analysis: Option to generate topology-only IDs for structure type comparison
- Wyckoff Position Support: Include crystallographic symmetry information in ID generation
- Distance Clustering: Advanced clustering-based analysis for complex structures
- C++ Performance: High-performance C++ backend with Python bindings
- Multiple Neighbor Detection: Support for various neighbor-finding algorithms (MinimumDistanceNN, CrystalNN, etc.)
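The graph-and-hash procedure described above can be illustrated with a small, self-contained toy. This is only a sketch under stated assumptions, not the Graph ID implementation: the helper names (`compositional_sequence`, `toy_graph_id`), the shell depth, and the choice of BLAKE2 are all hypothetical, and the toy works on a plain adjacency dictionary, so it ignores neighbor detection, periodic boundaries, and Wyckoff positions. It shows why an ID built from sorted per-atom environment hashes does not depend on atom ordering, and how a topology-only mode can be obtained by discarding element labels.

```python
import hashlib


def _digest(text):
    """Short, stable hex digest of a string (hash choice is an assumption)."""
    return hashlib.blake2b(text.encode("utf-8"), digest_size=8).hexdigest()


def compositional_sequence(adjacency, labels, start, depth=3):
    """Sorted element labels at each bond distance 0..depth from `start`."""
    seen = {start}
    frontier = [start]
    shells = []
    for _ in range(depth + 1):
        # Sorting makes the shell independent of neighbor ordering.
        shells.append("".join(sorted(labels[n] for n in frontier)))
        next_frontier = []
        for node in frontier:
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return "|".join(shells)


def toy_graph_id(adjacency, labels, topology_only=False, depth=3):
    """Hash the sorted per-node environment hashes into one identifier."""
    if topology_only:
        # Replace every element with a placeholder so only connectivity matters.
        labels = {node: "X" for node in labels}
    node_hashes = sorted(
        _digest(compositional_sequence(adjacency, labels, node, depth))
        for node in adjacency
    )
    return _digest(";".join(node_hashes))


# A four-membered ring with alternating Na and Cl atoms.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
nacl_labels = {0: "Na", 1: "Cl", 2: "Na", 3: "Cl"}

# The same ring with the atoms numbered differently.
ring_renumbered = {3: [0, 2], 0: [3, 1], 1: [0, 2], 2: [1, 3]}
nacl_renumbered = {3: "Na", 0: "Cl", 1: "Na", 2: "Cl"}

# The same connectivity with different elements.
kbr_labels = {0: "K", 1: "Br", 2: "K", 3: "Br"}

same_id = toy_graph_id(ring, nacl_labels) == toy_graph_id(ring_renumbered, nacl_renumbered)
different_composition = toy_graph_id(ring, nacl_labels) != toy_graph_id(ring, kbr_labels)
same_topology = (
    toy_graph_id(ring, nacl_labels, topology_only=True)
    == toy_graph_id(ring, kbr_labels, topology_only=True)
)
print(same_id, different_composition, same_topology)
```

In this toy, renumbering the atoms leaves the ID unchanged, swapping the elements changes it, and the topology-only variant treats the NaCl and KBr rings as the same structure type.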
Install from PyPI:
pip install graph-id-core
pip install graph-id-db # optional database component

Or install from source:
git clone https://github.com/kmu/graph-id-core.git
cd graph-id-core
git submodule update --init --recursive
pip install -e .

from pymatgen.core import Structure, Lattice
from graph_id import GraphIDMaker
# Create a structure (NaCl)
structure = Structure.from_spacegroup(
"Fm-3m",
Lattice.cubic(5.692),
["Na", "Cl"],
[[0, 0, 0], [0.5, 0.5, 0.5]]
)
# Generate Graph ID
maker = GraphIDMaker()
graph_id = maker.get_id(structure)
print(graph_id) # Output: NaCl-88c8e156db1b0fd9

from pymatgen.core import Structure
from graph_id_cpp import GraphIDGenerator
# Load structure from file
structure = Structure.from_file("path/to/structure.cif")
generator = GraphIDGenerator()
graph_id = generator.get_id(structure)

from graph_id_cpp import GraphIDGenerator
from pymatgen.analysis.local_env import CrystalNN
# Topology-only comparison (ignores composition)
topo_gen = GraphIDGenerator(topology_only=True)
topo_id = topo_gen.get_id(structure)
# Include Wyckoff positions
wyckoff_gen = GraphIDGenerator(wyckoff=True)
wyckoff_id = wyckoff_gen.get_id(structure)
# Use different neighbor detection
crystal_gen = GraphIDGenerator(nn=CrystalNN()) # Faster CrystalNN using C++ is also available
crystal_id = crystal_gen.get_id(structure)

Use graph-id-db to search for structures in the Materials Project using the precomputed Graph IDs it stores:
# pip install graph-id-db
from graph_id_cpp import GraphIDGenerator
from pymatgen.core import Structure, Lattice
structure = Structure.from_spacegroup(
"Fm-3m",
Lattice.cubic(5.692),
["Na", "Cl"],
[[0, 0, 0], [0.5, 0.5, 0.5]]
).get_primitive_structure()
gen = GraphIDGenerator()
graph_id = gen.get_id(structure)
print(f"Graph ID of NaCl is {graph_id}")
from graph_id_db import Finder
# Search for structures in graph-id-db using GraphID
finder = Finder()
finder.find(graph_id)

More comprehensive examples can be found in the tests/ and examples/ directories.
Graph ID is particularly useful for:
- Materials Databases: Efficient indexing and deduplication of structure databases
- High-throughput Screening: Rapid identification of unique structures in computational workflows
- Polymorph Identification: Distinguishing between different polymorphs of the same composition
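As a hedged illustration of the deduplication use case, the sketch below groups entries by identifier and reports the groups with more than one member. The helpers `group_by_graph_id` and `find_duplicates` are hypothetical names, not part of graph-id-core. The identifier function is passed in as `id_fn`, so in practice one would pass something like `GraphIDGenerator().get_id`; here a stand-in function is used so that the example runs without the library or any structure files.

```python
from collections import defaultdict


def group_by_graph_id(structures, id_fn):
    """Map each identifier to the names of the entries that produced it."""
    groups = defaultdict(list)
    for name, structure in structures.items():
        groups[id_fn(structure)].append(name)
    return dict(groups)


def find_duplicates(groups):
    """Keep only identifiers shared by more than one entry."""
    return {gid: names for gid, names in groups.items() if len(names) > 1}


# Stand-in for a real identifier function (assumption for illustration only).
# With graph-id-core installed, this would be replaced by
# GraphIDGenerator().get_id applied to pymatgen Structure objects.
def fake_id(structure):
    return structure.strip().lower()


entries = {
    "entry-a": "NaCl-rocksalt",
    "entry-b": "nacl-rocksalt ",
    "entry-c": "NaCl-CsCl-type",
}

groups = group_by_graph_id(entries, fake_id)
duplicate_groups = find_duplicates(groups)
print(duplicate_groups)
```

Because the identifier is a plain string, the same grouping works as a dictionary key or a database index column; entries that share a composition but land in different groups are candidates for distinct polymorphs.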
You can search materials using Graph ID at matfinder.net.
This repository is managed with Poetry. To set up a development environment:
- Clone the repository:
git clone https://github.com/kmu/graph-id-core.git
cd graph-id-core
- Initialize git submodules (required for the C++ build):
git submodule update --init --recursive
- Install the package and dependencies using Poetry:
poetry install
- Install the pre-commit hooks:
pre-commit install

Note: The git submodules (library/pybind11, library/eigen, library/gtl) are required for building the C++ extension. Without them, the installation will fail during the CMake build step.
poetry run pytest
If you have made changes to the C++ code, run poetry run pip install -e . --force-reinstall to rebuild the extension and apply the changes before running the tests.
- Bump the version in pyproject.toml.
- Create a new PR from the main branch to the release branch.