Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New UCS desugarer and rudimentary PreTyper #194

Merged
merged 140 commits into from
May 23, 2024
Merged

Conversation

chengluyu
Copy link
Member

@chengluyu chengluyu commented Nov 23, 2023

TL;DR

This PR adds new UCS desugarer and a rudimentary implementation of PreTyper.

Features

Desugarer

The new UCS desugarer replaces the old desugarer. The facilities old desugarer were also removed. The new desugarer has following facilities.

  • Two Sets of Syntax. We use two sets of syntax tree nodes: source abstract syntax and core abstract syntax. Both of them are implement according to our paper precisely in package mlscript.ucs.syntax.
  • Stage Classes. The new desugarer consists of five stages, which correspond five classes in package mlscript.ucs.stages. The stages are the following.
    • Transformation: transforming parsed AST to source abstract syntax.
    • Desugaring: generating let bindings and expand patterns.
    • Normalization: removing backtracking and convert core abstract syntax to case branches.
    • PostProcessing: merging case branches with the same scrutinee.
    • CoverageChecking: detecting unreachable cases and duplicated cases.
  • Scrutinee Management. We tracks useful information of each scrutinee's patterns by Context, Scrutinee and Pattern.
    • Each If term is associated with a unique Context. A Context keeps track of all scrutinee and is in charge of allocating temporary variable names.
    • Scrutinee represents a term that is being matched in UCS expressions. It keeps the tally of all patterns the scrutinee has been matched against as well as occurrences and sub-scrutinees of each pattern.

We support two kinds of new patterns in this PR.

  • Simple Tuple Pattern is a syntax sugar for selecting fields. It cannot be used together with other patterns.
  • Alias Pattern allows people name the entire term represented by a pattern.

PreTyper

  • PreTyper traverses the AST and associate each variable with symbol it refers to.
  • A Hierarchy of Symbols. Each symbol class maintains certain information useful to desugarer.
  • Scope maintains an association from variables to symbols.

DiffTests

  • Local flag :ducs controls the new Desugarer right now. It supports sub-flags.
    • :ducs:[stage] where [stage] is one of transform, desugar, normalize, postprocess, coverage will display the debug messages from the specified stage.
    • :ducs:[stage].result shows the pretty-printed outcome of the specified stage.
    • :ducs:desugared shows the desugared MLscript term.
    • :ducs:A,B,C shows debug messages for all A, B, and C.
  • Global flag :ShowPreTyperErrors emits warnings and errors reported by PreTyper.

Others

  • The utility trait Traceable abstracts debugging methods used in Typer, including trace, println, etc.
  • Implement freeVars for Located. The original implementation is broken. But my implementation is not perfect either.
  • Support parsing _ then ... and else in IfOpsApp.
  • Fix code generation for case expression of Num, Int, Bool, and Str.
  • Fix display of escaped characters in StrLit.
  • Add utility methods foldLeft and foreach to CaseBranchesImpl.

Future Works

This PR does not implement all features, but it lays the foundation for some of the following features.

  • Branch deduplication. Currently, we duplicate branches during normalization, which generates many code in some examples (e.g., JSON parser). We can reduce branches by abstracting duplicated branches into local functions.
  • Better tuple patterns. As mentioned before, we only implement "simple" tuple patterns. It is not real tuple patterns because the underlying MLscript CaseOf doesn't support tuples right now. However, it paves the road for complicated tuple patterns even tuple splices.

Besides, we can still improve location reporting in many places.

- Very minor but important: should use post-processed term as the
  desugared term.
- Add incomplete AVL tree test cases which reveals many problems.
- Cache operator split partial terms for efficiency.
- `Traceable` now supports filtering debugging by topics.
- Skip speicalization Boolean condition scrutinees.
We used to generate let bindings which destruct class parameters
during normalization. This works fine but it exposes some
inflexibility. For example, in the previous commit, I added a case
where free variables from a absorbed case branch are shadowed.
If we moving let bindings generation to desugaring stage, the
problem can be solved easily. This change also paves the road for
duplicated branch lifting.
Normalization stage is greatly simplified from now!
also support traversing let bindings
This is a refactor and not test changes were found.
@chengluyu
Copy link
Member Author

I'm fixing not being able to report some unreachable cases and conflicting cases. I'm handling the situation we discussed before: some duplicated branches can also trigger these checks. The solution I take is to mark the duplicated syntax tree nodes, so we will not trigger warnings if these nodes are detected.

@LPTK
Copy link
Contributor

LPTK commented Apr 29, 2024

The solution I take is to mark the duplicated syntax tree nodes, so we will not trigger warnings if these nodes are detected.

That's a good enough short-term solution, but after this PR is merged we need to think of a more principled and correct way of handling this!

@chengluyu
Copy link
Member Author

That's a good enough short-term solution,

This solution is more limited than I thought. I expected it to be able to detect all the duplicated cases. Because a split might be concatenated as a fallback in the specialization for different scrutinees, a Boolean flag is not enough. I tried to turn this boolean flag into a map, but I need to modify the map in many places. I haven't been able to make it work well until now, and the code will also become very messy.

I feel it can figure out most of the duplicated cases, but it can't detect some of the covered cases, such as the one below.

fun f(x) = if
  x is A and
    x is B then 1
  x is B then 2

but after this PR is merged we need to think of a more principled and correct way of handling this!

I think a neater approach is to check on the desugared (rather than normalized) syntax tree. Although we can know which branches are removed during specialization, we don't know if they are removed due to duplication and which branches they overlap with because they need non-local information.

Lastly, I've also improved the warning message for different duplicated cases at the same time. There're three types of duplication.

  1. Simple duplication (mainly caused by identical patterns)

     ```
     :w
     fun f(x) = if x is
     	"a" then 1
     	"b" then 2
     	"a" then 3
     //│ ╔══[WARNING] found a duplicated case
     //│ ║  l.10: 	  "a" then 3
     //│ ║        	  ^^^
     //│ ╟── there is an identical pattern "a"
     //│ ║  l.8: 	  "a" then 1
     //│ ╙──     	  ^^^
     //│ fun f: ("a" | "b") -> (1 | 2)
     ```
    
  2. Duplication caused by inheritance

     ```
     :w
     fun f(x) = if
     	x is A then 1
     	x is B then 2
     //│ ╔══[WARNING] found a duplicated case
     //│ ║  l.27: 	  x is B then 2
     //│ ║        	       ^
     //│ ╟── the case is covered by pattern A
     //│ ║  l.26: 	  x is A then 1
     //│ ║        	       ^
     //│ ╟── due to the subtyping relation
     //│ ║  l.7: 	class B() extends A()
     //│ ╙──     	^^^^^^^^^^^^^^^^^^^^^
     //│ fun f: A -> 1
     ```
    
  3. Always-true pattern (mainly due to subtyping)

     ```
     :w
     fun f(x) = if
     	x is B and
     		x is A then 1
     		p(x) then 2
     	x is A then 31
     	x is B then 3
     	else 4
     //│ ╔══[WARNING] the pattern always matches
     //│ ║  l.67: 	    x is A then 1
     //│ ║        	         ^
     //│ ╟── the scrutinee was matched against B
     //│ ║  l.66: 	  x is B and
     //│ ║        	       ^
     //│ ╟── which is a subtype of A
     //│ ║  l.4: 	class B() extends A()
     //│ ╙──     	^^^^^^^^^^^^^^^^^^^^^
     //│ fun f: (B | Object & ~#B) -> (1 | 31 | 4)
     ```
    
  4. Always-false pattern (mainly due to matching a scrutinee against unrelated constructors)

    :w
      fun f(x) = if
      	x is X and
      		x is Y and
      			x is Z then 1
      	else 2
      //│ ╔══[WARNING] possibly conflicted patterns
      //│ ║  l.105: 	    x is Y and
      //│ ║         	         ^
      //│ ╟── the scrutinee was matched against X
      //│ ║  l.104: 	  x is X and
      //│ ║         	       ^
      //│ ╟── which is unrelated with Y
      //│ ║  l.105: 	    x is Y and
      //│ ╙──       	         ^
      //│ fun f: Object -> 2
    

@LPTK
Copy link
Contributor

LPTK commented May 8, 2024

I think a neater approach is to check on the desugared (rather than normalized) syntax tree. Although we can know which branches are removed during specialization, we don't know if they are removed due to duplication and which branches they overlap with because they need non-local information.

Makes sense. This can go in a future PR!

@LPTK
Copy link
Contributor

LPTK commented May 8, 2024

It seems there's something wrong in the handling of refined with multiple branches.

#194 (comment)

BTW was this fixed or at least reproduced in the tests?

@chengluyu
Copy link
Member Author

BTW was this fixed or at least reproduced in the tests?

Yes, this was fixed.

Copy link
Contributor

@LPTK LPTK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@LPTK LPTK merged commit 8927be7 into hkust-taco:mlscript May 23, 2024
1 check passed
@LPTK LPTK deleted the pretyper branch May 23, 2024 03:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants