Thank you for your interest in contributing to Shsc! We welcome contributions from the community to make Shsc better.
Before you get started, please take a moment to read and follow these guidelines.
- Project Overview
- Syntax
- Module Structure
- Linker Errors
- Naming Conventions
- The eval functions
- Memory Management
- Why Lists are in-fact Matrices
- Why Maps Internally Use Lists
- Using the RT_VTABLE_ACC macro
- Address Sanitizer
- Clangd LSP
- Devtools Directory
- Contributing Guidelines
- Lexer Guidelines
- Language Docs
Shsc is an interpreter written in C for a custom-designed programming language.
The syntax for this language can be found in the file LanguageDocs.md.
Modules in this C project have the .c.h extension and are directly included in the top-level .c file from lower-level directories.
Note that .c.h files are not compiled separately by the Makefile.
To avoid linker errors, ensure that you include the appropriate .c.h files in the respective top-level .c files.
If you encounter linker errors, double-check whether you have missed any necessary .c.h files.
- Lexer functions are prefixed with lex_.
- Parser functions are prefixed with parse_.
- Input-output wrappers are prefixed with io_.
- AST (Abstract Syntax Tree) functions are prefixed with ast_.
- Runtime functions are prefixed with rt_.
- Built-in API functions are prefixed with fn_.
- Operator API functions are prefixed with op_.
- Custom-defined types are suffixed with _t.
- Struct types follow the convention prefix_TypeName_t, where TypeName uses camel case and prefix can be any prefix associated with the type.
- Struct declarations are placed in files based on whether their inner member definitions need to be localized to that single file or visible globally.
- Constructors for struct types must be named prefix_TypeName().
- Specialized constructors and functions can have names like prefix_TypeName_something_something, indicating that the function something_something is related to TypeName_t within the prefix namespace.
- The underscore (_) in function names and global variables is generally significant and is commonly used for namespacing purposes.
- You can even do prefixA_prefixB_prefixC_type_or_function.
- For headers and source files dedicated to a struct type, name the file after the struct name.
- For example, if the struct type is rt_Data_t, the files may be named Data.h and Data.c.h.
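The naming rules above can be sketched as follows. This is only an illustration: the demo_ prefix, the demo_Point_t type and every function in it are hypothetical names that do not exist in the Shsc sources.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical struct type following prefix_TypeName_t (prefix "demo"). */
typedef struct {
    int x;
    int y;
} demo_Point_t;

/* Constructor named prefix_TypeName(). The caller must free the result. */
demo_Point_t *demo_Point(int x, int y)
{
    demo_Point_t *point = malloc(sizeof *point);
    if (!point) return NULL;
    point->x = x;
    point->y = y;
    return point;
}

/* Specialized constructor named prefix_TypeName_something_something(). */
demo_Point_t *demo_Point_from_origin(void)
{
    return demo_Point(0, 0);
}

/* A function related to demo_Point_t inside the demo namespace. */
int demo_Point_manhattan_len(const demo_Point_t *point)
{
    assert(point != NULL);
    return abs(point->x) + abs(point->y);
}
```

Following the file naming rule, a type like this would live in Point.h and Point.c.h.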
These are the functions listed in src/runtime/eval.h and defined in the src/runtime/*.c.h files.
Each of these functions evaluates a part of the AST.
The following functions evaluate code and return an rt_ControlStatus_t. The control status is used to control the flow of the program.
rt_eval_Statements
rt_eval_Statements_newscope
rt_eval_Statement
rt_eval_CompoundSt
rt_eval_IfBlock
rt_eval_ElseBlock
rt_eval_WhileBlock
rt_eval_ForBlock
The following functions evaluate expressions and MUST make a call to either rt_VarTable_acc_setadr() or rt_VarTable_acc_setval() to set the accumulator.
rt_eval_Expression
rt_eval_Literal
rt_eval_Identifier
rt_eval_CommaSepList
rt_eval_AssociativeList
The RT_VTABLE_ACC macro is used to get the accumulator data ONLY if one of the above functions was called immediately before.
If a function can call free on a pointer without any casts and without causing an error or warning, the function is said to own it.
- If a function returns a new heap pointer, it should be documented that the returned pointer is to be freed.
- Constructor and destructor functions should be used to allocate and free composite or struct-based heap objects.
- A destructor function should take a pointer to a pointer and set the original pointer to NULL once freed.
- If a function does not modify a passed heap pointer or object, the pointer MUST be marked as const in the formal parameters.
- A pointer should be passed as non-const ONLY if the function must make changes to the heap object.
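A minimal sketch of these ownership rules is shown below. The demo_Person_t type and its functions are hypothetical and exist only for illustration. The naming of the destructor (demo_Person_destroy) is also an assumption, since the guidelines above do not prescribe one.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical struct-based heap object. */
typedef struct {
    char *name;
    int   age;
} demo_Person_t;

/* Constructor. The returned pointer is owned by the caller and must be
   released with demo_Person_destroy(). */
demo_Person_t *demo_Person(const char *name, int age)
{
    assert(name != NULL);
    demo_Person_t *person = malloc(sizeof *person);
    if (!person) return NULL;
    size_t len = strlen(name) + 1;
    person->name = malloc(len);
    if (!person->name) {
        free(person);
        return NULL;
    }
    memcpy(person->name, name, len);
    person->age = age;
    return person;
}

/* Does not modify the object, so the pointer is const. */
int demo_Person_is_adult(const demo_Person_t *person)
{
    return person->age >= 18;
}

/* Modifies the object, so the pointer is non-const. */
void demo_Person_birthday(demo_Person_t *person)
{
    ++person->age;
}

/* Destructor: takes a pointer to a pointer and resets the caller's
   pointer to NULL once the object is freed. */
void demo_Person_destroy(demo_Person_t **ptr)
{
    if (!ptr || !*ptr) return;
    free((*ptr)->name);
    free(*ptr);
    *ptr = NULL;
}
```

Because the destructor resets the caller's pointer, calling it a second time on the same variable is harmless.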
If you have a pointer to memory and want to explicitly pass ownership, even if you don't currently own it, you may explicitly cast it to non-const.
However, be cautious, as this can result in poor code quality, so use it judiciously.
For example, a list of const struct pointers may be created, but the functions of the list can't free them because they may still have other references.
So you may need to explicitly cast to non-const and free them from the list only if you're SURE that there is no other reference.
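The case described above might look like the sketch below. The demo_Item_t and demo_ItemList_t types are hypothetical, and the fixed capacity of 4 is chosen only to keep the example short.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical element type. */
typedef struct {
    int value;
} demo_Item_t;

/* Hypothetical list holding const pointers: its functions cannot free the
   elements without a cast, because they do not own them. */
typedef struct {
    const demo_Item_t *items[4];
    size_t length;
} demo_ItemList_t;

/* Frees every element and returns how many were freed. The caller takes
   responsibility for there being no other reference to any element, which
   is why the explicit cast to non-const is acceptable here. */
size_t demo_ItemList_free_items(demo_ItemList_t *list)
{
    assert(list != NULL);
    size_t freed = 0;
    for (size_t i = 0; i < list->length; ++i) {
        free((demo_Item_t *) list->items[i]); /* explicit ownership transfer */
        list->items[i] = NULL;
        ++freed;
    }
    list->length = 0;
    return freed;
}
```

The cast is the only place where ownership changes hands, which makes it easy to find during review.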
From v1.8 and v2.3 onwards, lists are implemented internally as a rt_Data_t** instead of a rt_Data_t*.
Earlier, lists were implemented as a rt_Data_t*, which resulted in full reallocation of the list when a new element was added. Apart from the performance hit, this also caused existing direct pointers to list elements to become invalid. The result was heap-use-after-free errors.
The matrix implementation solves this problem as follows:
- The rt_Data_t** is an array of rt_Data_t*.
- A row of rt_Data_t of some constant size is allocated, and a pointer to it is held in the rt_Data_t**.
- When a row is full, a new row is allocated and the pointer to it is stored in the rt_Data_t**.
- Existing rows are not reallocated, so any rt_Data_t* remains valid.
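The row-based layout can be sketched as below. This is not the actual rt_DataList_t code: the demo_ names, the int payload and the row length of 4 are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_ROW_LEN 4 /* assumed row size; the real constant may differ */

/* Hypothetical stand-in for rt_Data_t. */
typedef struct {
    int value;
} demo_Data_t;

typedef struct {
    demo_Data_t **rows;  /* array of row pointers; only this array is reallocated */
    size_t row_count;
    size_t length;
} demo_DataList_t;

/* Appends a value and returns a pointer to the stored element, or NULL on
   allocation failure. The returned pointer stays valid across later appends. */
demo_Data_t *demo_DataList_append(demo_DataList_t *list, int value)
{
    assert(list != NULL);
    if (list->length == list->row_count * DEMO_ROW_LEN) {
        /* All rows are full: grow the row-pointer array and add one row. */
        demo_Data_t **rows = realloc(list->rows, (list->row_count + 1) * sizeof *rows);
        if (!rows) return NULL;
        list->rows = rows;
        demo_Data_t *row = malloc(DEMO_ROW_LEN * sizeof *row);
        if (!row) return NULL;
        list->rows[list->row_count++] = row;
    }
    demo_Data_t *slot = &list->rows[list->length / DEMO_ROW_LEN][list->length % DEMO_ROW_LEN];
    slot->value = value;
    ++list->length;
    return slot;
}

/* Returns the element at idx, or NULL if idx is out of range. */
demo_Data_t *demo_DataList_at(const demo_DataList_t *list, size_t idx)
{
    if (idx >= list->length) return NULL;
    return &list->rows[idx / DEMO_ROW_LEN][idx % DEMO_ROW_LEN];
}

/* Frees all rows and resets the list to the empty state. */
void demo_DataList_clear(demo_DataList_t *list)
{
    for (size_t r = 0; r < list->row_count; ++r) free(list->rows[r]);
    free(list->rows);
    list->rows = NULL;
    list->row_count = 0;
    list->length = 0;
}
```

Growing the list only moves the array of row pointers, never the rows themselves, so a pointer taken from an early append still refers to the same element after many more appends.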
From v1.8 and v2.3 onwards, maps are implemented internally as rt_DataList_t*. Previously, data was stored directly in the map. Currently, data is stored in a list and the index into the list is stored in the map.
This fixes the same issue that is addressed in Why Lists are in-fact Matrices.
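A rough sketch of the idea follows. It is not the real map implementation: the demo_ names are hypothetical, the lookup is a linear scan instead of a hash table, keys are borrowed rather than copied, and the capacities are fixed to keep the code short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_ROW_LEN  4   /* assumed row size of the value storage */
#define DEMO_MAX_ROWS 16  /* fixed capacity, only to keep the sketch short */
#define DEMO_MAX_KEYS (DEMO_ROW_LEN * DEMO_MAX_ROWS)

/* Hypothetical stand-in for rt_Data_t. */
typedef struct {
    int value;
} demo_Data_t;

/* A map entry stores an index into the value storage, not the value. */
typedef struct {
    const char *key;  /* borrowed, must outlive the map */
    size_t idx;
} demo_MapEntry_t;

typedef struct {
    demo_MapEntry_t entries[DEMO_MAX_KEYS]; /* may be reordered or rehashed freely */
    size_t length;
    demo_Data_t *rows[DEMO_MAX_ROWS];       /* row-based storage, rows never move */
} demo_Map_t;

void demo_Map_init(demo_Map_t *map)
{
    map->length = 0;
    for (size_t r = 0; r < DEMO_MAX_ROWS; ++r) map->rows[r] = NULL;
}

/* Returns the value stored under key, or NULL if the key is absent. */
demo_Data_t *demo_Map_get(const demo_Map_t *map, const char *key)
{
    for (size_t i = 0; i < map->length; ++i) {
        if (strcmp(map->entries[i].key, key) == 0) {
            size_t idx = map->entries[i].idx;
            return &map->rows[idx / DEMO_ROW_LEN][idx % DEMO_ROW_LEN];
        }
    }
    return NULL;
}

/* Inserts or updates a key and returns a stable pointer to its value,
   or NULL if the map is full or allocation fails. */
demo_Data_t *demo_Map_set(demo_Map_t *map, const char *key, int value)
{
    demo_Data_t *slot = demo_Map_get(map, key);
    if (!slot) {
        if (map->length == DEMO_MAX_KEYS) return NULL;
        size_t idx = map->length;
        size_t row = idx / DEMO_ROW_LEN;
        if (!map->rows[row]) {
            map->rows[row] = malloc(DEMO_ROW_LEN * sizeof(demo_Data_t));
            if (!map->rows[row]) return NULL;
        }
        map->entries[map->length].key = key;
        map->entries[map->length].idx = idx;
        ++map->length;
        slot = &map->rows[row][idx % DEMO_ROW_LEN];
    }
    slot->value = value;
    return slot;
}

/* Frees the value storage and empties the map. */
void demo_Map_clear(demo_Map_t *map)
{
    for (size_t r = 0; r < DEMO_MAX_ROWS; ++r) {
        free(map->rows[r]);
        map->rows[r] = NULL;
    }
    map->length = 0;
}
```

Since entries hold only an index, the lookup structure can be rearranged without invalidating pointers to the values.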
Definition
rt_VarTable_acc_get()->adr
? rt_VarTable_acc_get()->adr
: &rt_VarTable_acc_get()->val
Clearly, this is a very dangerous macro to use. This macro must be used IF AND ONLY IF it is immediately preceded by a call to one of the functions listed in src/runtime/eval.h and defined in the src/runtime/*.c.h files.
This is because the accumulator is a two-faced serpent. It can either be a pointer to a heap object or a direct value copy. This takes care of both l-values and r-values, but in a very dangerous way.
If it is a pointer to a heap object and the scope that owns the object is popped, the pointer is left dangling, and dereferencing it is a use-after-free that can crash the program.
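The two faces of the accumulator can be modelled with the sketch below. It is a simplified stand-in, not the runtime's code: all demo_ names are hypothetical, and the assumption that setting a value clears the address is made here for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for rt_Data_t. */
typedef struct {
    int value;
} demo_Data_t;

/* Simplified accumulator: either an address (l-value) or a copy (r-value). */
typedef struct {
    demo_Data_t *adr;
    demo_Data_t  val;
} demo_Acc_t;

static demo_Acc_t demo_acc;

demo_Acc_t *demo_acc_get(void)
{
    return &demo_acc;
}

/* Points the accumulator at existing data, for example a variable. */
void demo_acc_setadr(demo_Data_t *adr)
{
    assert(adr != NULL);
    demo_acc.adr = adr;
}

/* Stores a copy in the accumulator, for example a literal or temporary.
   Assumption: this clears the address so the copy is what gets read. */
void demo_acc_setval(demo_Data_t val)
{
    demo_acc.adr = NULL;
    demo_acc.val = val;
}

/* Same shape as the macro above: the address if set, else the stored copy. */
#define DEMO_ACC \
    (demo_acc_get()->adr ? demo_acc_get()->adr : &demo_acc_get()->val)
```

After demo_acc_setadr(&variable), DEMO_ACC yields &variable, so writes through it change the variable. If the variable's storage is released before DEMO_ACC is read, the result is a dangling pointer, which is the hazard described above.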
Using an address sanitizer is mandatory to ensure code quality and catch memory issues early. Make sure to run the address sanitizer during development and testing.
The Makefile has a target run-sanitize for this purpose.
Debug builds are built with -fsanitize=address.
WARNING: Using the address sanitizer will significantly increase memory usage (about 500+ MiB for a single loop).
For more information, see Memory Usage Tests.
Note that you cannot run the address sanitizer on Windows.
This section applies to you only if you're using the clangd LSP.
You'll need to ignore clangd errors and warnings in bison-generated files.
Clangd will use the compile_flags.txt file.
The devtools directory contains scripts and tools to help with development.
List of tools:
- mk_header_guards.sh: Auto-generate header guards for all files, given a header prefix and the directory.
- ren_files.sh: Bulk rename files by replacing a string in old filenames with a new string. Only works on files with the extensions .c, .h or .c.h.
- ren_idfs.sh: Bulk rename identifiers in files by replacing a string in old identifiers with a new string. Only works on files with the extensions .c, .h or .c.h.
- rm_trailing_ws.sh: Remove trailing whitespace from all .c, .h or .c.h files.
- run_clangd.sh: Runs clangd --check on all .c, .h or .c.h files. This produces a rather large output, so it is recommended to redirect the output to a file.
The functions ast_Expression_Literal(), ast_Expression_Identifier() and ast_Expression_CommaSepList() produce an ast_Expression_t with op set to TOKOP_NOP.
However, ast_Expression() extracts the data from the returned ast_Expression_t and stores it inside the internal union ast_ExpressionUnion_t.
Therefore, you will never find TOKOP_NOP in any ast_Expression_t in the final generated AST.
To contribute to Shsc, please follow these guidelines:
- Fork the repository and create a new branch for your contribution.
- Make your changes or additions while adhering to the coding conventions and guidelines mentioned above.
- Write clear and concise commit messages (alternatively, email the repo owner explaining your changes and include the commit link from the GitHub site in that email).
- Submit a pull request, documenting the changes made and providing any necessary context.
We appreciate your contributions and look forward to working with you to improve Shsc!
If you have any questions or need assistance, feel free to reach out to the project maintainers.