Skip to content

Compiler for my functional language FLang written in Haskell

License

Notifications You must be signed in to change notification settings

RyanAlameddine/FLang

Repository files navigation

FLang

FLang is a low(ish)-level functional programming language (designed by me) and inspired by the book "Implementing Functional Languages" by Simon Peyton Jones and David R Lester.

Parser

1. Tokenization

type Token = (LineNumber, String)
type LineNumber = Int

tokens :: String -> [Token] 

The tokens function performs a lexical analysis on the input string and returns a list of tokens based upon their conformity to the following patterns:

isNewl    = (=='\n')                               --if a newline character
isSpace   = isSpace                                --if a whitespace char
isCChar   = (=='#')                                --if the comment character
isParen   = matchAny [(=='('), (==')')]            --if a parenthesis character
isDigit   = isDigit                                --if a digit
isAlpha   = isAlpha                                --if an letter of the alphabet
isIdChar  = matchAny [isAlpha, isNumber, (=='_')]  --if part of an identifier (alpha, number, or underscore)
isOper x  = not (isParen x) && isSymbol x          --if an operator token (symbol not parenthesis) 

2. Parsing

The monadic parser combinator is defined as follows:

--Returns a list of (output, remaining tokens) which represents
--the list of all possible parses (ambiguities to be collapsed)
newtype Parser a = Parser { runParser :: [Token] -> [(a, [Token])] }

instance Functor Parser where
    fmap f (Parser p) = Parser (fmap transform p)
        where
            transform = map $ first f

instance Applicative Parser where
    pure val = Parser $ \input -> [(val, input)]
    (Parser f) <*> (Parser a) = Parser $ \input -> do
                                            (func, input' ) <- f input 
                                            (val , input'') <- a input'
                                            return (func val, input'')

instance Monad Parser where
    (Parser a) >>= f = Parser $ \input -> do
                            (val , input') <- a input
                            runParser (f val) input'

instance Alternative Parser where
    empty = Parser $ const []
    (Parser p1) <|> (Parser p2) = Parser $ \input -> p1 input <|> p2 input

The base parser pSat (which parses for a satisfied condition) is defined and all other parsers are defined as combinations of pSat using the Monadic functions.

pSat :: (String -> Bool) -> Parser String
pSat f = Parser satisfy
    where
        satisfy ((_, t):ts) | f t = [(t, ts)]
        satisfy _ = []

A parser combinator library is built off of this satisfy function. Here are some sample parsers:

--takes in a string and returns a parser which parses that literal
pLit :: String -> Parser String
pLit = pSat . (==)

--parses a variable identifier
pVar :: Parser String
pVar = pSat check
    where
        --starts with alphanumeric and is not a keyword
        check token@(c:_) = isAlpha c && token `notElem` keywords

--parses a number
pNum :: Parser Int
pNum = read <$> pSat check
    where 
        check = isJust . (readMaybe :: String -> Maybe Int)

--zero or more instances of a parser where all values are combined into a list
pZeroOrMore :: Parser a -> Parser [a]
pZeroOrMore p = return [] <|> pOneOrMore p

--one or more instances of a parser where all values are combined into a list
pOneOrMore :: Parser a -> Parser [a]
pOneOrMore p = do
            val <- p
            others <- pZeroOrMore p
            return $ val : others

To solve the issues with left recursion (and implement operator precedence), this grammar is updated to the following:

After these basic parsers are defined, all other parsers are just simple transliteration of the language syntax (using simple function composition):

pProgram :: Parser CoreProgram
pProgram = pOneOrMoreSep pScDefn (pLit ";")

pSc :: Parser CoreScDefn
pSc = pair3 pVar (pZeroOrMore pVar) (pLit "=" *> pExpr)

pExpr1 = pExprLayer pExpr2 [("|", pExpr1)]
pExpr2 = pExprLayer pExpr3 [("&", pExpr2)]
pExpr3 = pExprLayer pExpr4 $ map (,pExpr4) relOps
pExpr4 = pExprLayer pExpr5 [("+", pExpr4), ("-", pExpr5)]
pExpr5 = pExprLayer pExpr6 [("*", pExpr5), ("/", pExpr6)]

pDefns :: Parser [CoreDefn]
pDefns = pOneOrMoreSep pDefn (pLit ";")

pDefn :: Parser CoreDefn
pDefn = pair2 pVar (pLit "=" *> pExpr)

pAlts :: Parser [CoreAlt]
pAlts = pOneOrMoreSep pAlt (pLit ";")

pAlt :: Parser CoreAlt
pAlt = pair3 cnum cvars pExpr
    where
        cnum  = pLit "<" *> pNum <* pLit ">"
        cvars = pZeroOrMore pVar <* pLit "->"

Internal Representation

After being parsed, the program will be represented

type Program a = [ScDefn a]
type CoreProgram = Program Name

--name of supercombinator, arguments, and body
type ScDefn a = (Name, [a], Expr a)
type CoreScDefn = ScDefn Name

data Expr a = 
    EVar Name               -- Variable
    | ENum Int              -- Number
    | EConstr Int Int       -- Constructor: tag, arity
    | EAp (Expr a) (Expr a) -- Application
    | ELet                  -- Let(rec) expression
        IsRec               --   True=recursive
        [Defn]              --   Definitions
        (Expr a)            --   Expression Body
    | ECase                 -- Case expression
        (Expr a)            --   Expression being checked
        [Alter a]           --   Alternatives
    | Elam [a] (Expr a)     -- Lambda
    deriving Show

type CoreExpr = Expr Name 

type Name = String

type IsRec = Bool

--(variable name being bound, expression to which it is bound)
type Defn a = (a, Expr a)
type CoreDefn = Defn Name

--case expressions: expression to analyze and list of alternatives
--each alternative has tag, bound variables, and expression to the right of the arrow
type Alter a = (Int, [a], Expr a)
type CoreAlt = Alter Name

FLang Prelude

The Prelude (standard library) of FLang consists of the following functions:

id x   = x ;
K  x y = x ;
K1 x y = y ;
S f g x       = (f x) (g x) ;
compose f g x = f (g x)     ;
twice f       = compose f f ;

Pretty Printer

This project also includes a pretty printer which can transform the CoreProgram data into a formatted string which represents the program.

data Seq = SNil
        | SStr String
        | SAppend Seq Seq
        | SIndent Seq
        | SNewline

prnt :: CoreProgram -> String

To combat the inefficiency of the (++) operator (which would otherwise be needed often), the prnt function generates the string using Seq, a binary tree with values only in the leaf nodes (but also holds additional formatting information). These Seq trees are eventually flattened into a regular string once at the end of pretty printing.

About

Compiler for my functional language FLang written in Haskell

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published