-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathch4.tex
106 lines (65 loc) · 8.02 KB
/
ch4.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
\chapter{Conclusion and Future Work}
\section{Conclusion}
We have shown that it is possible to efficiently model the type system of the C language and to model most of the language's constructs in the language of the lambda calculus variant derived from Haskell.
Section \ref{ex:list} then showcases a code that might be used in some projects using the CHM language. See appendix \ref{stl} for more code examples that implement some common standard library features (functions we can encounter in the C++ standard library, for example).
The implementation of the CHM compiler provides several test files, each beginning with a description stating what the file tests and what is the program's expected behavior.
The CHM compiler, as it is currently implemented, serves only demonstrative purposes. Its implementation is partially incomplete and not fit for any use in software development. Yet, all code examples in this thesis are correctly parsed and interpreted based on the definitions in the previous chapters.
\section{Possible Use in System Development}
\label{ex:list}
Here we will show the usefulness of this language in some existing code from popular system development projects.
One area where it would be especially useful to support polymorphic code is \emph{polymorphic data structures}. We have discussed their implementation in C using mainly macros and a possibility of alternative CHM type-safe implementations in chapter \ref{chap:infPolyC}. A concrete example of this is polymorphic lists.
We can encounter such lists, for example, in the linux kernel, their implementation is defined in \emph{./include/linux/list.h} \cite{torvalds2018linux}.
Simple polymorphic lists can be implemented as follows (note that this implementation is simplified and the push function is suboptimal for nontrivial data):
\begin{lstlisting}
<a : b ~ struct List : <a> >
struct List
{
b *next;
a value;
};
\end{lstlisting}
Then we can implement its functions as follows:
\begin{lstlisting}
<a : b ~ struct List : <a> >
b *push(b *head, a value)
{
b *item = new();
// new() is an STL function returning malloc(sizeof(b))
item->value = value;
item->next = head;
return item;
}
<a : b ~ struct List : <a> >
b *pull(b *head)
{
b* next = head->next;
delete(head);
return next;
}
\end{lstlisting}
In C we could create a similar structure using BSD \lstinline{<queue.h>}'s \cite{pages2007queue} \lstinline{SLIST_} macros, which is yet another example, quite commonly used. Using this library requires us to create each list type instance separately and in invocations of its functions it requires us to pass additional parameters. For example SLIST's equivalent of our \lstinline{pull} function, \lstinline{SLIST_REMOVE_HEAD}, takes as an additional \lstinline{entries} argument (a structure member of the entry node that connects the elements constructed by the \lstinline{SLIST_ENTRY} macro) on top of the expected \lstinline{head} argument.
\section{Future Work}
If this project gains popularity, future work should extend the STL library, and a new compiler should be implemented, which would not rely on GCC. That would make it easier to implement new features and optimize the process of compilation.
\paragraph{Incomplete Implementation}
The provided CHM compiler implements the features crucial for the demonstration of the effect type inference and polymorphism would have on development in C. Those features are (generic) functions, pointers, structs, control statements, and almost all expressions.
There are currently implemented only some of the built-in types (the main ones) as their support is not detrimental for the demonstration, and it was not possible to make a complete implementation in time.
Some more notable constructs not implemented by the current compiler are typedefs, enums, and type-checking of zero-comparability in (control statement) conditions. All of those should be implemented in future implementations of the compiler.
\paragraph{Compiler Errors}
The current implementation of the CHM compiler outputs error messages stating the reason behind the error and the location in the code that was the source of the error. That is not the case for errors that happen during the type inference phase of compilation as the THIH \cite{jones1999typing} project would need some more changes to the definition of expressions (that model the code) to know from where in the code they originate. But without some extensive rework of the compiler, such an implementation would break the compiler's modular nature as the locations are implemented in the language-chm project (language-c, see \cite{visq2018language-c}).
Future implementations of the compiler should massively refactor the code, change the representation of the compiled code, and make the errors more readable and more specific.
\subsection{Future Features}
\paragraph{Quality of Life Changes}
In the current implementation of the compiler, we allow declaration of type variables only at the CHM headers of the various definitions. Which is enough for demonstration, but in a more convenient-to-use implementation, the language should allow declaration of local type variables even inside function definitions. Also, it should provide a notion of anonymous/implicit type variables (similar to C++'s \lstinline[language=c++]{auto}) which would decrease the verbosity of the code in many cases.
\paragraph{Better support for record fields}
The current specification of record fields in CHM is that if two record fields of different records have the same name, they shall have the same type as well (which can be ambiguous), see the section \ref{ssec:recordFieldsCHM} \emph{Record Fields}.
In the future language specification, the specification for the polymorphic fields should be that their types are equally dependent on the type parameters of the struct which they are fields of and for all possible types of those parameters, they still share the type (example: if one field has a type that is the last type parameter of the parent struct, every other field sharing its name have to be typed after the last type parameter of its parent struct as well, ditto for second to last, etc.; and similarly for all composite types).
There should be a compiler option that would forbid fields from sharing their names, at all, and another one that would just trigger a warning unless the compiler is not able to determine which one to use in a certain context.
Then another sharing of names of fields should be defining fields in type family definitions, where all specializations would have these fields without any warning or error even under the effect of the option stated above. Different specializations would then still be able to add their own field on the top of the shared ones. The implicit \lstinline{Has_<field>} class would apply here as well.
\paragraph{Modules}
Having many structures with even more fields and procedures would create a high probability of name collisions. The "include" system from the C language then becomes an issue for larger projects or even smaller ones wanting to use some foreign code.
And therefore, this language could prosper from having a module system akin to many modern languages (like Haskell), which would make it possible to export only parts of the code we want to export or give them qualified types (both to avoid the name collisions).
\paragraph{Data type specializations}
Data type specialization is not currently supported, but a great candidate matching the current nature of the language would be type families (for example, used in Haskell).
Their addition would be hugely beneficial for the language, just like the partial specialization of struct templates is for C++, as those mechanisms can be considered equivalent in many cases of use.
Their use would be especially helpful for optimizing implementations, for example:
If we want to remember arbitrary integral numbers with repetition, we could use a map from numbers to their counts, however for some very limited ranges (or for booleans), we could implement the semantically equivalent data structure using static arrays few counters.