GitHub - ajgallego/grammatical-inference-of-graphs: Grammatical inference of directed acyclic graph languages with polynomial time complexity

The code of this repository was used for the following publication. If you find this code useful please cite our paper:

@article{Gallego2017,
title = "Grammatical inference of directed acyclic graph languages with polynomial time complexity",
author = "Antonio-Javier Gallego and Damián López and Jorge Calera-Rubio",
journal = "Journal of Computer and System Sciences",
year = "2017",
issn = "0022-0000",
doi = "https://doi.org/10.1016/j.jcss.2017.12.002",
url = "http://www.sciencedirect.com/science/article/pii/S0022000017302866"
}

Below we include instructions to reproduce the experiments.

Requirements

First you have to install version 1.11 of the Python NetworkX library. For this you can run the following command:

pip install networkx==1.11

Usage

The grammin.py script runs the grammatical inference algorithm. The parameters of this script are the following:

Parameter	Default	Description
`-path`		Path to the dataset
`-ftype`	grf	Input file type: grf, gxl
`-nlabel`		Tag used for node labels (only for gxl files)
		Mutagenicity: chem, GREC: type, RNA and NIST do not have
`-db`	-1	Dataset name: grec, mutagen, rna, nist
`-c`	-1	Class to validate. -1 to validate all
`-values`	2,3,4	Comma-separated list of k values to test
`-limit`		Limit number of samples per class
`-step`	100	Step size between tests. Use -1 to test only at the end
`--remove`		Remove graphs with more than one initial node

The only mandatory parameter is -path, the rest are optional. The -nlabel parameter allows to specify the attribute that contains the node's label (only used for the gxl files). The labels of the samples are specified in the name of the file, whose name pattern can change depending on the dataset, for this reason it is necessary to indicate the name of the dataset that will be loaded with the -db option. The -c parameter allows to validate the inclusion of all classes (value -1) or of the indicated class. The -values option indicates the list of k values that will be evaluated.

For example, to classify all the classes in the Mutagenicity dataset, you have to run the following command:

python grammin.py -c -1 -ftype gxl -path datasets/Mutagenicity -nlabel chem

Datasets

The datasets folder includes the Hairpin loops dataset and the graphs extracted from the NIST dataset (Neighbor, Grid, and Skeleton graphs). The rest can be downloaded from the following address:

IAM Graph Database Repository

http://www.fki.inf.unibe.ch/databases/iam-graph-database

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
datasets		datasets
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
grammin.py		grammin.py
utilGraph.py		utilGraph.py
utilGraphIO.py		utilGraphIO.py
utilGraphNGram.py		utilGraphNGram.py
utilGraphNGramTransition.py		utilGraphNGramTransition.py
utilSubGraph.py		utilSubGraph.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Requirements

Usage

Datasets

About

Releases

Packages

Languages

License

ajgallego/grammatical-inference-of-graphs

Folders and files

Latest commit

History

Repository files navigation

Requirements

Usage

Datasets

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages