
Project: DSL Syntax Simplifications and Improvements

SimonCockx edited this page Mar 18, 2022 · 14 revisions

Rosetta has grown over the years: new features have appeared and new kinds of expressions have been added. In hindsight, some features could have been implemented better and some are unnecessarily complex. To simplify the upcoming work described in the Roadmap, we intend to clean up the syntax of Rosetta in a few non-intrusive ways. Furthermore, we introduce a new, explicit way to instantiate data types with a JSON-like syntax.

Phase 2: add JSON syntax and clean up Rosetta syntax to simplify upcoming work

Improve the implementation of the current syntax of Rosetta

These changes should not have a noticeable effect, but they will simplify upcoming improvements to the type system and code generators of Rosetta.

  1. The argument of an only exists operation should always end in a feature call ->.

Currently, it is possible to write an expression such as myVariable only exists, which is unintended. Arguments of only exists should always end in a feature call, such as myVariable -> myAttribute only exists. As Simon Cockx discusses in his thesis, this may look like a minor change, but it would reduce the complexity of the code generators in a non-trivial way.

  2. The only-element operation can be used in any context.

Currently, the only-element operator is only valid in two locations: after a feature call (a -> b only-element) and after a function call (MyFunc() only-element). This restriction is unnecessary and adds complexity.

  3. Allow empty list literals [].

Empty lists [] are currently not allowed syntactically. We intend to lift this restriction.

  4. Remove parentheses from the internal abstract syntax tree (AST) of Rosetta.

Parentheses currently are parsed as a separate node in the AST of a Rosetta model. This is redundant: they are only relevant in the concrete syntax of a Rosetta model, and representing them in the AST complicates downstream processing such as type checking and code generation.
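
To illustrate why a parentheses node is redundant, here is a minimal sketch in Python of a parser for a toy arithmetic grammar. It is not the Rosetta grammar or its implementation, and the names Num, BinOp and parse are hypothetical. Parentheses steer how the tree is built, but the parser returns the inner expression directly, so no node for them survives in the AST.

```python
import re
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"


# Hypothetical toy AST: note that there is no "Parenthesized" node type.
Expr = Union[Num, BinOp]


def parse(text: str) -> Expr:
    # Toy tokenizer: integers, '+', '*', '(' and ')'; other characters are skipped.
    tokens: List[str] = re.findall(r"\d+|[()+*]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def parse_sum():
        node = parse_product()
        while peek() == "+":
            advance()
            node = BinOp("+", node, parse_product())
        return node

    def parse_product():
        node = parse_atom()
        while peek() == "*":
            advance()
            node = BinOp("*", node, parse_atom())
        return node

    def parse_atom():
        token = advance()
        if token == "(":
            inner = parse_sum()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            # Return the inner expression itself: grouping is already
            # captured by the shape of the tree.
            return inner
        if not token.isdigit():
            raise SyntaxError(f"unexpected token {token!r}")
        return Num(int(token))

    tree = parse_sum()
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing input")
    return tree
```

In this sketch, parse("((1)) + 2") and parse("1 + 2") yield the same tree, while parse("(1 + 2) * 3") and parse("1 + 2 * 3") differ only in how the BinOp nodes nest. A type checker or code generator working on such a tree never needs a case for parentheses.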

Remove redundant AST elements using syntactic sugar

This is only an internal change, and does not change the concrete syntax of Rosetta.

  1. Parse the empty literal as an empty list literal [] (i.e., make them equivalent for downstream processes such as the type checker and code generator).
  2. Parse a conditional expression with an absent else branch as else empty (i.e., make if condition then value equivalent to if condition then value else empty for downstream processes).

This reduces the number of cases that the type checker and code generators must handle.
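
The two rewrites above can be sketched as a small normalization function in Python. This is only an illustration under stated assumptions: the node classes Name, ListLiteral, EmptyLiteral and Conditional and the function desugar are hypothetical and are not taken from the Rosetta code base. The rewrite is also written here as a separate pass for readability, whereas the proposal is to perform it during parsing.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Name:
    ident: str


@dataclass(frozen=True)
class ListLiteral:
    elements: Tuple["Expr", ...] = ()


@dataclass(frozen=True)
class EmptyLiteral:
    pass


@dataclass(frozen=True)
class Conditional:
    condition: "Expr"
    then_branch: "Expr"
    else_branch: Optional["Expr"] = None  # None models an absent else branch


# Hypothetical surface AST, before desugaring.
Expr = Union[Name, ListLiteral, EmptyLiteral, Conditional]


def desugar(expr: Expr) -> Expr:
    """Rewrite `empty` to `[]` and add `else empty` to conditionals without else."""
    if isinstance(expr, EmptyLiteral):
        return ListLiteral(())
    if isinstance(expr, ListLiteral):
        return ListLiteral(tuple(desugar(e) for e in expr.elements))
    if isinstance(expr, Conditional):
        else_branch = expr.else_branch if expr.else_branch is not None else EmptyLiteral()
        return Conditional(
            desugar(expr.condition),
            desugar(expr.then_branch),
            desugar(else_branch),
        )
    return expr
```

After desugar, downstream code in this sketch only ever sees ListLiteral (never EmptyLiteral) and conditionals whose else branch is always present, so both `if condition then value` and `if condition then value else empty` map to the same tree.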

Add JSON-like syntax for instantiating data types

Problem: There is currently no way to explicitly instantiate a data type.

Example: Suppose we want to describe an employee of a company.

type Employee:
  name string (1..1)
  salary number (1..1)
  isSeniorMember boolean (1..1)
  mentor Employee (0..1)

Currently, the only way to make an instance of this type is to define a function that returns an Employee. This is quite verbose.

func CreateEmployee:
  inputs:
    name string (1..1)
    salary number (1..1)
    isSeniorMember boolean (1..1)
    mentor Employee (0..1)
  output:
    result Employee (1..1)
  set result->name: name
  set result->salary: salary
  set result->isSeniorMember: isSeniorMember
  set result->mentor: mentor

Now we can instantiate an Employee by calling CreateEmployee(...).

Additionally, as an edge case, it is currently impossible to instantiate a data type without any properties (such types have proven to be useful for some of our clients).

type Zero:

func CreateZero:
  output:
    result Zero (1..1)
  set result: ??? // impossible to implement this function

Solution: We intend to introduce an intuitive JSON-like syntax to instantiate a data type. For example, we could instantiate an Employee called Dwight Schrute who has a mentor called Michael Scott as follows.

Employee {
  name: "Dwight Schrute",
  salary: 4800.00,
  isSeniorMember: False,
  mentor: Employee {
    name: "Michael Scott",
    isSeniorMember: True,
    salary: 6300.00,
    mentor: empty
  }
}

Note that properties may be written in any order (see the salary and isSeniorMember properties of Michael Scott).

In many Rosetta models, such as the CDM, it is common practice to have data types with many optional properties, e.g., properties with the cardinality constraint (0..1). In such cases, it is inconvenient to set most of them to empty explicitly. We will therefore add a shorthand, ..., which stands for "assign empty to every other attribute". An example taken from the CDM:

type ProductIdentification:
  productQualifier productType (0..1)
  primaryAssetData AssetClassEnum (0..1)
  secondaryAssetData AssetClassEnum (0..*)
  externalProductType ExternalProductType (0..*)
  productIdentifier ProductIdentifier (0..*)

We could then write

ProductIdentification {
  primaryAssetData: AssetClassEnum->Credit,
  ...
}

which would be equivalent to

ProductIdentification {
  primaryAssetData: AssetClassEnum->Credit,
  productQualifier: empty,
  secondaryAssetData: empty,
  externalProductType: empty,
  productIdentifier: empty
}
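
The expansion of the ... shorthand can be sketched in a few lines of Python. This is a rough illustration, not the planned implementation: the function expand_ellipsis and the EMPTY marker are hypothetical, and the sketch assumes that omitting an attribute without writing ... is reported as an error, which this page does not specify.

```python
from typing import Dict, List

EMPTY = None  # hypothetical stand-in for Rosetta's `empty`


def expand_ellipsis(
    declared: List[str],
    assigned: Dict[str, object],
    has_ellipsis: bool,
) -> Dict[str, object]:
    """Return the full attribute assignment of an instantiation.

    declared:     attribute names of the data type, in declaration order
    assigned:     attributes written explicitly in the instantiation
    has_ellipsis: whether the instantiation ends with `...`
    """
    unknown = [name for name in assigned if name not in declared]
    if unknown:
        raise ValueError(f"unknown attributes: {unknown}")

    missing = [name for name in declared if name not in assigned]
    if missing and not has_ellipsis:
        # Assumption of this sketch: without `...`, every attribute must be listed.
        raise ValueError(f"missing attributes: {missing}")

    expanded = dict(assigned)
    for name in missing:
        expanded[name] = EMPTY  # "assign empty to every other attribute"
    return expanded


product_identification = [
    "productQualifier",
    "primaryAssetData",
    "secondaryAssetData",
    "externalProductType",
    "productIdentifier",
]

expanded = expand_ellipsis(
    product_identification,
    {"primaryAssetData": "AssetClassEnum->Credit"},
    has_ellipsis=True,
)
```

Here expanded contains all five attributes of ProductIdentification, with primaryAssetData set and the other four mapped to EMPTY, mirroring the two equivalent instantiations above. The sketch does not check cardinalities, so it would also fill a (1..1) attribute with EMPTY; whether that should be rejected is left open.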

Further suggestions

The following proposals still need to be discussed and might not make it to the final version of this project.

  1. Rethink the assign-output/set/add syntax for defining the output of a function.

Right now, the output of a function is defined as a series of assign-output/set/add statements, each of which has a so-called "assign path" consisting of an arbitrary number of feature calls together with optional indexing. This complicates checking the validity of a function: does this series of statements define the output completely? How should we type check an assign path involving features with plural cardinality? How should we check the cardinality constraint for a series of set and add statements?

Proposal 1: With the new JSON syntax to instantiate data types, we might be able to replace these multiple statements with a single set: <expression> statement that completely defines the output with a single expression. This would avoid the need for complex solutions to the problems above. On the downside, this would require changes to every existing Rosetta model.

Proposal 2: Create a validity rule to check that every attribute of the output is completely defined. If the implementation of a function is intended to be left abstract, then a user should make this intention explicit by writing abstract before the function declaration. Also add a validity rule that an assign path may not contain attributes with non-singular cardinality in the middle (i.e., only the last attribute of the path may have such a cardinality), and make it possible to index the root of an assign path. This is less intrusive; only the abstract modifier would need to be added to functions that are left abstract.

  2. Add a not operator.

There is currently no operator for boolean negation. This could be added quite simply.
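
Returning to Proposal 2 for assign paths: the rule that an attribute with non-singular cardinality may only appear at the end of a path could be checked roughly as follows. This is a sketch in Python under stated assumptions. The class Attribute, the function check_assign_path and the attribute names in the example are hypothetical, and indexing within a path is not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Attribute:
    name: str
    lower: int
    upper: Optional[int]  # None models an unbounded upper limit, as in (0..*)

    @property
    def is_plural(self) -> bool:
        return self.upper is None or self.upper > 1


def check_assign_path(path: List[Attribute]) -> List[str]:
    """Return one error message per plural attribute that is not last in the path."""
    errors = []
    for attribute in path[:-1]:
        if attribute.is_plural:
            errors.append(
                f"'{attribute.name}' has plural cardinality and may only "
                "appear at the end of an assign path"
            )
    return errors


# Hypothetical attributes, for illustration only.
team = Attribute("team", 1, 1)
members = Attribute("members", 0, None)
name = Attribute("name", 1, 1)

ok = check_assign_path([team, members])            # plural attribute at the end
bad = check_assign_path([team, members, name])     # plural attribute in the middle
```

In this sketch ok is an empty list, while bad contains a single message about members. A real validity rule would additionally need to decide how an index on a plural attribute affects the check.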