Quill is a WIP attempt at a structurally faithful neural representation of Agda terms, with a current application in premise selection.
1. Clone or download this repository.
git clone git@github.com:konstantinosKokos/quill.git
2. Initialize a fresh python environment (e.g. using Miniconda).
conda create --name quill python=3.12
3. Activate the environment, and install the package and its dependencies.
conda activate quill
pip install .
Note: This will install the cpu version of pytorch by default. For CUDA acceleration, you will need to manually find and install compatible pytorch and pytorch geometric versions.
4. Download a pretrained model's weights through here.
Note: Optional. Only relevant for inference/evaluation.
5. Download the tokenized and processed Agda files from here.
Note: Optional. Only relevant for training/evaluation.
You can uninstall by removing the environment from your system, and deleting all downloaded files.
🚧 Note: Work in Progress 🚧
You should be able to export an Agda file (possibly containing holes) into json format
(see agda2train), and then invoke a script that looks like /scripts/inference.py
to obtain lemma suggestions.
The second part of this process is partially streamlined through a CLI API.
First, deploy a server running the model in inference mode:
agda-quill serve -config PATH_TO_MODEL_CONFIG -weights PATH_TO_MODEL_WEIGHTS
You can then optionally precompute representations of the lemmas defined in various files/libraries:
agda-quill cache -files ./data/stdlib/Data.List.*
Finally, you can query the model for suggestions, optionally using the cached representations:
agda-quill query -file ./data/stdlib/Algebra.Construct.NaturalChoice.Min.json --max_suggestions 2 --use_cache
Adjust the training configuration file (/data/config.json
) and run the training script (/scripts/train.py
), after any necessary modifications.
Gather a bunch of JSON exports over an Agda library of your choice (again, using agda2train).
Run the preprocessing script (/scripts/preprocess.py
) to have Python parse and tokenize the exports.
Then train as you would normally.
Run the evaluation script (/scripts/validate.py
), after any necessary modifications.
The Python definitions of Agda file- and term-structures can be found in src/quill/data/agda/syntax.py
.
The definitions of tokenized files can be found in /src/quill/data/tokenization.py
.
Use them as you see fit to navigate and process the parsed & tokenized files.
You can load these with:
import pickle
with open('./data/tokenized_sample.p', 'rb') as f:
samples = pickle.load(f)
For a low-dimensional projection of the neural representations of (most) lemmas in the stdlib, together with their undirectional reference structure, look here: https://konstantinoskokos.github.io/quill/viz.html.
Feel free to open and issue or get in touch.
You can cite the following entry if using our work in a scholarly context.
@inproceedings{kogkalidis_learning_2024,
author = {Konstantinos Kogkalidis and Orestis Melkonian and Jean-Philippe Bernardy},
title = {Learning Structure-Aware Representations of Dependent Types},
booktitle = {The Thirty-eighth Annual Conference on Neural Information Processing Systems},
year = {2024},
}