Skip to content

Commit

Permalink
Finished first draft of paper
Browse files Browse the repository at this point in the history
  • Loading branch information
MatthewDaggitt committed Sep 3, 2024
1 parent b3a6ad9 commit 8afa07a
Show file tree
Hide file tree
Showing 2 changed files with 109 additions and 60 deletions.
34 changes: 34 additions & 0 deletions paper/paper.bib
Original file line number Diff line number Diff line change
Expand Up @@ -68,10 +68,44 @@ @article{daggitt2023routing
publisher={IEEE}
}

@inproceedings{carette2020leveraging,
title={Leveraging the information contained in theory presentations},
author={Carette, Jacques and Farmer, William M and Sharoda, Yasmine},
booktitle={Intelligent Computer Mathematics: 13th International Conference, CICM 2020, Bertinoro, Italy, July 26--31, 2020, Proceedings 13},
pages={55--70},
year={2020},
organization={Springer}
}

@book{paulson1994isabelle,
title={Isabelle: A generic theorem prover},
author={Paulson, Lawrence C},
year={1994},
publisher={Springer}
}

@misc{coq2024manual,
title = "The {Coq} Reference Manual -- Release 8.19.0",
author = "{The Coq Development Team}",
year = "2024",
howpublished = "\url{https://coq.inria.fr/doc/V8.19.0/refman}"
}

@inproceedings{van2020maintaining,
title={Maintaining a library of formal mathematics},
author={van Doorn, Floris and Ebner, Gabriel and Lewis, Robert Y},
booktitle={International Conference on Intelligent Computer Mathematics},
pages={251--267},
year={2020},
organization={Springer}
}

@inproceedings{allais2019generic,
title={Generic level polymorphic n-ary functions},
author={Allais, Guillaume},
booktitle={Proceedings of the 4th ACM SIGPLAN International Workshop on Type-Driven Development},
pages={14--26},
year={2019}
}


135 changes: 75 additions & 60 deletions paper/paper.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,113 +8,128 @@ tags:
authors:
- name: Daggitt, Matthew L.
orcid: 0000-0000-0000-0000
equal-contrib: true
corresponding: true
affiliation: 1
- name: Allais, Guillaume
orcid: 0000-0000-0000-0000
equal-contrib: true
affiliation: 2
- name: Carrete, Jacques
orcid: 0000-0000-0000-0000
equal-contrib: true # (This is how you can denote equal contributions between multiple authors)
affiliation: 3
- name: McKinna, James
orcid: 0000-0000-0000-0000
corresponding: true # (This is how to denote the corresponding author)
affiliation: 4
- name: van Doorn, Nathan
orcid: 0000-0000-0000-0000
affiliation: 5
- name: Others to come
orcid: 0000-0000-0000-0000
affiliation: 6
affiliations:
- name: University of Western Australia, Australia
index: 1
- name: Institution Name, Country
- name: University of Strathclyde, UK
index: 2
- name: Independent Researcher, Country
- name: McMaster University, Canada
index: 3
date: 13 August 2017
- name: Heriot-Watt University, UK
index: 4
- name: TODO
index: 5
date: 03 September 2024
bibliography: paper.bib
---

# Summary

Agda[@norell2009dependently] is a functional, dependently-typed language that doubles as both a traditional programming language and as an interactive theorem prover (ITP).
The Agda standard library provides the necessary basics for users to develop Agda programs and proofs.
Unlike the standard libraries of traditional programming languages, the Agda standard library provides not only standard utilities and data structures, but also a large part of the basic mathematics that is essential for proving the correctness of programs.
Agda[@norell2009dependently] is a functional, dependently-typed language that serves both as a traditional programming language and as an interactive theorem prover (ITP).
This paper introduces the Agda standard library, which offers many of the fundamental definitions and results necessary for users to quickly begin developing Agda programs and proofs.
Unlike the standard libraries of traditional programming languages, the Agda standard library not only provides standard utilities and data structures, but also a substantial portion of the basic mathematics essential for proving the correctness of programs.

# Statement of need

Almost all programming languages have some notion of a ``standard'' library which provides a basic set of algorithms, data structures and OS operations, in order to allow users to more quickly develop more complex programs and libraries.
However, there are two reasons why the Agda standard library is perhaps more essential than a traditional standard library.
Most programming languages include a "standard" library that offers a basic set of algorithms, data structures, and operating system operations.
However, there are two reasons why a standard library is particularly important in Agda compared to traditional programming languages.

Firstly, as with most ITPs, in order to decrease the complexity of the compiler implementation and therefore increases its trustworthiness, the Agda language only provides a minimal set of core primitives from which programs can be constructed.
Consequently many concepts that are traditionally thought of as part of the programming language, instead have to be defined.
For example, in a fresh Agda environment, there is no notion of an integer or a string, let alone advanced data structures such as arrays or maps.
First, like other theorem provers, the Agda language provides only a minimal core set of primitives from which programs can be constructed.
As a result, many concepts traditionally considered part of a language must be defined within the program itself.
While this approach reduces compiler complexity and enhances its reliability, it also means that users have access to fewer built-in definitions initially.
For example, in a fresh Agda environment, there is no predefined notion of an integer or a string, let alone more complex data structures such as arrays or maps.

Furthermore, because Agda is a theorem prover, users want to be able to prove that their algorithms they build using these basic data types and data structures are correct.
Therefore, not only do they need the operations they also need proofs that the corresponding operations such ad addition, concatenation etc. are correctly implemented as well as many of their properties.
Without the Agda standard library, something as simple as defining and proving the correctness of a function that reverses strings would require hundreds of lines of code.
Second, Agda users often seek to prove that programs constructed using data types from the standard library are "correct." Therefore, the standard library must provide not only the operations for these data types but also proofs of their correctness (e.g., proving that integer addition is commutative or that string concatenation is associative).
Without the Agda standard library, something as simple as defining a function to reverse a string and proving that it preserves the lenght of the string would require hundreds of lines of code.

This comment has been minimized.

Copy link
@Taneb

Taneb Sep 13, 2024

Member

Where does this claim come from?

  • We neither define reverse on String nor prove it's length preserving
  • Without the standard library, I think this is impossible to prove without postulate: Builtins do not provide any proof that primStringFromList and primStringToList correspond, for example.
  • If we did have some fairly obvious laws on strings provided, I believe this can be proven in roughly fifty lines

This comment has been minimized.

Copy link
@jamesmckinna

jamesmckinna Sep 13, 2024

Contributor

I'd be happy to see L60 (and its typo 'lenght') removed, but it did at least prompt the text about vector reversal.

Can't comment on 'roughly fifty lines', but I trust you ;-)

This comment has been minimized.

Copy link
@Taneb

Taneb Sep 13, 2024

Member

The line as stands in the latest version of the document (at time of writing) is:

Starting from just the language, something as simple as defining a string-reversing function and proving that it preserves the length of the string would require hundreds of lines of code.

So the typos are gone, but the incorrect claim remains. Perhaps we could replace it with one about sorting a list?

This comment has been minimized.

Copy link
@MatthewDaggitt

MatthewDaggitt Sep 16, 2024

Author Contributor

Changed to list sorting in ec830af


# Impact

The focus of the standard library is predominantly on discrete mathematics, which is result of the fact that one of the most popular uses of Agda is as a tool for programming language design and research.
The library has been used as a basis in a wide variety of projects, far too many to be exhaustively listed here. A diverse selection, which is in no-way the meant to be an endorsement of these projects above others, are as follows:
The Agda standard library has been used in a wide range of projects, too numerous to list exhaustively.
A diverse selection, not intended as endorsements of these projects over others, includes:

- Formalisation of Category Theory [@hu2021categories]
- Formalisation of category theory [@hu2021categories]

- Intrinsically typed [@bach2017intrinsically]
- Intrinsically typed interpreters for imperative languages [@bach2017intrinsically]

- Calculus for Esterel [@florence2019esterel]
- Formally verified calculus for the reactive programming language Esterel [@florence2019esterel]

- Verification of hardware [@pizani2018pi]
- Verification of hardware circuit design [@pizani2018pi]

- Verification of routing protocols [@daggitt2023routing]

As one of the largest Agda developments, the library has also had a synergistic relationship with Agda itself, and has driven the implementation of several features in the language.
For example, thanks to the library Agda now has the ability to add custom compilation messages when using a given definition, and the ability to specify library-wide options in the generic library file that accompanies every library.
Perhaps the biggest impact, has been the separation of Agda language options into the categories of ''infective'' and ''coinfective'', to allow the library to safely partition code that is natively defined in Agda (and therefore), and code that uses assumptions or operating system calls and therefore cannot be viewed as safe.
As one of the largest existing Agda libraries, the standard library has had a synergistic relationship with the development of Agda itself, prompting the implementation of several new language features.
For example, the standard library is designed to be compatible with several different compiler options, including `--cubical` and `--safe`.
To enable this, in 2019 Agda categorised all language options into two categories of ''infective'' and ''coinfective'', allowing the library to precisely partition code that can be used under certain flag combinations
This categorisation enables the library to integrate safe code natively defined in Agda with code that uses unsafe operating system calls, while maintaining the safety guarantees of the former.

Additionally, the library has directly influenced the language by requesting the ability to attach custom messages to definitions, which are then displayed by the compiler when the definitions are used, enabling the implementation of deprecation warnings.

# Design

Designing a standard libary for interactive theorem provers, and Agda in particular, is typically a bigger challenge than those for standard programming languages for several reasons:

Firstly, not only does the standard library have to provide all the basics of programming (e.g. data structures, file-system operations), they also have to provide a large fraction of basic mathematics.
Organising mathematics is relatively hard~\cite{} Jacques, cite packing/unpacking of records.

Secondly, Agda was the first ITP to take dependently-typed programming seriously.
A lot of learning of how to do dependently-typed design of large-scale formalisation. Still somewhat in a state of flux as we're learning new ways to organise concepts.

Thirdly, and related to the previous point, the standard library has been used as a test bed for the design of the Agda language itself. For example, the library contains three different notions of co-inductive definitions.

Practical design concerns

Version 2.0 of the library has ironed out many of the design wrinkles found in earlier versions.
This includes:
- minimising the dependency graphs so that core, commonly used modules depend on far fewer parts of the library, and hence load far faster for the user during interactive development.
- standardisation of how mathematical objects such as groups, rings, orders, equivalences etc. are constructed from their subparts.
- the introduction of tactics
- tactics - although tend to be slow (Agda)
- testing infrastructure
Designing a standard libary for an ITP such as Agda presents several challenges.

Firstly, as discussed, the standard library contains many of the foundational mathematical results used to prove program correctness.
Even though the library currently focuses on discrete mathematics - reflecting the bias in its user base towards programming language theory - organizing this material into a coherent and logical structure is extremely challenging[@carette2020leveraging].
There is a constant tension between being as general as possible (e.g., defining operations over general algebraic structures) and providing clear, straightforward, and intuitive definitions (e.g., defining operations concretely over integers).
Additionally, there is a persistent temptation to introduce new representations of existing mathematical objects that are easier to work with for a particular problem, which comes at the cost of duplicating the theory for the new representation.
Theorem provers like Isabelle[@paulson1994isabelle] and Coq[@coq2024manual] approach these problems by having very minimal standard libraries and encouraging the use of external libraries developed by the community, which reduces the emphasis on ensuring the existence of canonical definitions for certain concepts.
On the other hand, MathLib[@van2020maintaining] for Lean aims to provide a repository of canonical definitions.
The design of the Agda standard library leans more towards the Lean approach, with a high bar set for adding alternative formalisations of the same result.

A second challenge is that Agda was the first major ITP to fully embrace dependently-typed programming as the default.
With the exception of Idris, a more recent entrant to the field, other major theorem provers either do not support dependent types or encourage them only to be used sparingly.
In contrast, nearly everything in the Agda standard library makes use of dependent types, with proofs consisting of evidence-bearing terms of the relevant dependent types.
As a result, the library provides relatively sophisticated features like polymorphic n-ary functions[@allais2019generic], regular expressions which provide proof of membership when compiled and applied, and proof-carrying `All` and `Any` predicates for containers.
While this provides powerful tools for users, learning how to design such a large-scale, dependently-typed library is an on-going journey.
Relatedly, the standard library has been used as a test bed for the design of the Agda language itself, as evidenced by the library's inclusion of three different notions of co-inductive data types.

Agda’s unique support for dependently-parameterized modules has also significantly influenced the library’s design.
Although type classes are a common mechanism for creating interfaces and overloading syntax in other functional languages, the Agda standard library has so far found little need to use them extensively.
While Agda supports a very general form of type classes via instance search, the ability to use qualified, parameterized modules as first-class objects appears to reduce their necessity compared to other functional languages.
Additionally, module parameters enable the safe and scalable embedding of non-constructive mathematics into a constructive system.
Since Agda is entirely constructive, the vast majority of the standard library is also constructive.
However, the library provides the option to perform non-constructive, classical reasoning, such as the law of excluded middle, by passing the relevant axioms as module parameters.
This allows users to write such code without directly having to postulate the non-constructive axioms, which would prevent them from using the code with the `--safe` compiler flag.

# Testing

One of the advantages of working in a thereom prover, is that much of the core functionality of the library is __proven__ to be correct, and therefore we do not need correctness tests.
One of the advantages of creating a standard library for an ITP is that proving the correctness of the defined operations is an integral part of the library itself.
As a result, there is no need for test suites to verify functional correctness.
However, the library’s test suite does address two critical areas.
The first area is the interface with the underlying operating system (e.g., reading from the command line, file access, timers).
Since the correctness of the underlying OS cannot be reasoned about in Agda itself, these operations are included in the test suite.
The second area is performance.
The performance of a program cannot be analyzed within Agda, making it necessary to include performance tests.
Although the library currently includes a few performance tests, this is not a major priority for the community, and remains an area in need of further work.

These leaves two crucial areas that do need to be tested.
The first is that the standard library provides several modules that provide an interface with the underlying operating system (e.g. reading from the command line, file access, timers).
As the correctness of the underlying OS cannot be reasoned about in Agda itself, these functionalities do have tests.
The second area, is performance. Again, the performance of the Agda compiler cannot be reasoned about in Agda, and therefore . While the library does have a couple of tests, performance is still very much an area that needs work.
# Notable achievements in version 2.0

# Similar libraries
This short paper outlines the state of version 2.0 of the Agda standard library, in which we believe we have successfully addressed some of the significant design challenges present in version 1.0. Key improvements include:

There are many older, and more mature theorem proving systems than Agda, and each with other standard libraries out there. However, they all have some crucial differences.
- Minimized Dependency Graphs: We have reduced the depth of dependency graphs within the library, ensuring that the most commonly used modules rely on fewer parts of the library. This change has resulted in significantly faster load times for users during interactive development.

Isabelle[], Coq[] both have very minimal standard libraries and encourage the use of external libraries developed by the community, and hence there is comparatively less effort on ensuring there are canonical definitions for certain concepts.
The closest perhaps in this respect to the Agda standard library, is MathLib for Lean which .
- Standardisation of Mathematical Object Construction: We have standardised the construction of mathematical objects such as groups, rings, orders, equivalences, etc., from their sub-objects, enhancing consistency and usability.

Additionally, with the exception of Idris which is a relatively newcomer to the space, other major theorem provers do not lean heavily into dependent-types, use classical axioms and therefore contain many non-constructive results.
Agda is the first library system that uses full-fledged dependent types, and the standard library is the one that uses dependent types seriously. For example, `All`, `Any` predicates, n-ary functions~\cite{}, regular expressions which decide membership. Generic polymorphic n-ary functions[@allais2019generic].
- Introduction of a Tactics Library: We’ve introduced a small but growing tactics library. Experiments have shown that these tactics are currently slower than those in comparable systems, indicating a potential area for future improvements in Agda itself.

Another interesting key difference from other ITPs is the Agda standard library's relative lack of use of type-classes, a common mechanism for creating interfaces in functional languages and overloading syntax.
It has turned out that the ability to use qualified, parameterised modules as first class objects in Agda, means that type-classes are not as essential as in other languages.
It will be interesting to see if this state of affairs, continues as the library continues to grow and scale.
- Introduction of a Testing Framework: We have also introduced a testing framework that allows users to write their own test suites, providing a structured way to check the performance and correctness of their non-native Agda code.

# Acknowledgements

Expand Down

0 comments on commit 8afa07a

Please sign in to comment.