Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved naming of anonymous types and types from files outside of functions.ts #21

Merged
merged 2 commits into from
Mar 18, 2024

Conversation

daniel-chambers
Copy link
Collaborator

@daniel-chambers daniel-chambers commented Mar 15, 2024

Previously the way we named types was messy. We tried to use the name the user had given the type in code, and had a bunch of heuristics we used to try to ensure that duplicate type names wouldn't happen. For example, if the type was imported from a file other than the entrypoint file, we always prepended the file path to the type name. However, this was only done to object types and not to relaxed types, so imported relaxed types could have name conflicts (less of an issue when the entire type is relaxed and totally unvalidated, but still not great).

Given that we know people are probably going to want to split their functions across multiple files and just re-export them from the entrypoint file, we don't want their type names getting butchered with prefixes unless it is actually necessary (ie. there are two types trying to share the same name). To do this, we need to identify all types in use and name them at the end, where we can prefix those names that have multiple types trying to use the name with a disambiguating prefix.

Solution Design

In order to do this, we need to be able to uniquely identify each type, so we can detect when two unique types desire the same name. Turns out this is actually a massive pain in the ass because TypeScript is structurally typed (unlike NDC, which is nominally typed), and so TypeScript really doesn't care about giving types unique IDs. So we had to come up with our own unique identifier for a type. The strategy is as follows:

The type can be one of six different kinds of type (types of different kinds are (obviously) not the same type):

  • An intrinsic type (ie string, number, any, etc) identified by the name of the intrinsic
  • A literal type, either a string literal, number literal, or bigint literal, identified by the literal value itself
  • An anonymous object type, identified by the property names and types of the properties
  • An anonymous union type, identified by the types that make up the union
  • An anonymous intersection type, identified by the types that make up the intersection
  • A type with a symbol (ie. a name), identified by the location in the code where the symbol is declared (filename plus position), plus any type parameters used if the type has them.

(This algorithm can be found in inference.ts/makeUniqueTypeIdentifier)

As we encounter types, we register them with their unique ID and the name they'd prefer to have (inference.ts/CustomTypeNameRegistry.registerUniqueType). For types with a symbol, that'll be the symbol name, but for anonymous types we'll generate one based on how they're used (function name+param name, object type name+property name, etc).

Then, when we're done discovering all types, we determine the final names for all unique types (inference.ts/CustomTypeNameRegistry.determineFinalTypeNames), and replace all instances of the unique type name with the final name (inference.ts/applyFinalTypeNamesToFunctionsSchema).

When two (or more) types desire the same preferred name, nobody gets the preferred name (to help prevent that changing arbitrarily as the code is changed), and they all get a prefixed name. The prefix is derived from the path the source file where the type is declared. The path of the source file is truncated to only include the path from the project root. The project root is where the package.json file is (inference.ts/getProjectRootDirectory). If somehow two unique types still end up with same name somehow, then this is detected and a number is added to the end to force disambiguation.

Anonymous type naming has been changed to produce shorter names (inference.ts/generateTypeNameFromTypePath). Instead of being constructed from the full path from the function root to the type, they are now named after where the anonymous type is immediately used. This produces a shorter, more readable preferred type name. The three expected places to find anonymous types right now are:

  • Function parameters (the generated name is functionName_paramName)
  • Function return type (the generated name is functionName_output)
  • Object properties (the generated name is objectTypePreferredName_propertyName)

Anonymous type naming can result in unexpected type name changes where the same anonymous type (ie same unique id) is used in multiple places. The preferred name allocated will be derived from the first place that anonymous type is encountered. As such, it is recommended that users name all their types using aliases to retain stable type names.

Tests

All existing tests still pass, only a few existing tests have been modified. Namely:

  • basic-inference.test.ts - Anonymous types used now reflect the new naming convention
  • relaxed-types.test.ts - An error message gets a more readable error that reflects the type name in TypeScript not the preferred name
  • naming-conflicts tests have been moved into the new type-naming test suite and renamed to imported-types.

New tests have been added in the type-naming test suite.

JIRA: NLC-3

@daniel-chambers daniel-chambers self-assigned this Mar 15, 2024
@daniel-chambers daniel-chambers changed the title Rework naming of types Improved naming of anonymous types and types from files outside of functions.ts Mar 15, 2024
Copy link

@sordina sordina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking that two anonymous object types with different field orderings produce the same name?

Copy link

@sordina sordina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is slick. I'm worried that inference.ts is getting very complicated, but I suppose that is by necessity. Maybe at some point we could split out a "library" for the structural/nominal concerns in order to simplify the business logic aspects of schema inference. No holdup on merging this PR.

CHANGELOG.md Show resolved Hide resolved
function cloneTypeDerivationContext(context: TypeDerivationContext): TypeDerivationContext {
return {
objectTypeDefinitions: structuredClone(context.objectTypeDefinitions),
scalarTypeDefinitions: structuredClone(context.scalarTypeDefinitions),
customTypeNameRegistry: context.customTypeNameRegistry.clone(),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this cloned?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this cloning function is used in places (relaxed types) where we run the type derivation algorithm but throw away the results, so we clone the state and discard it after.

ndc-lambda-sdk/src/inference.ts Show resolved Hide resolved
ndc-lambda-sdk/src/inference.ts Show resolved Hide resolved
Comment on lines +844 to +864
const compareTypeId = (typeIdA: TypeId, typeIdB: TypeId): number => {
if (typeIdA.t === "d" && typeIdB.t == "d") {
return typeIdA.f.localeCompare(typeIdB.f)
|| typeIdA.s - typeIdB.s
|| compareList(compareTypeId, typeIdA.ta, typeIdB.ta);
} else if (typeIdA.t === "i" && typeIdB.t == "i") {
return typeIdA.i.localeCompare(typeIdB.i);
} else if (typeIdA.t === "l-n" && typeIdB.t == "l-n") {
return typeIdA.v - typeIdB.v;
} else if (typeIdA.t === "l-s" && typeIdB.t == "l-s") {
return typeIdA.v.localeCompare(typeIdB.v);
} else if (typeIdA.t === "l-bi" && typeIdB.t == "l-bi") {
const aStr = `${typeIdA.v.negative ? "-" : ""}${typeIdA.v.base10Value}`;
const bStr = `${typeIdB.v.negative ? "-" : ""}${typeIdB.v.base10Value}`;
return aStr.localeCompare(bStr);
} else if (typeIdA.t === "o" && typeIdB.t == "o") {
return compareList(
([nameA, pTypeIdA], [nameB, pTypeIdB]) => nameA.localeCompare(nameB) || compareTypeId(pTypeIdA, pTypeIdB),
typeIdA.p,
typeIdB.p
);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess these strings could be constants somewhere?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eh, they're encoded into the type as literals; you can't use any other strings. Not really worth making them constants IMO. And they're local to this function.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is getting pretty gnarly.

Copy link
Collaborator Author

@daniel-chambers daniel-chambers Mar 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, might pull it apart in another PR another time. There's still likely to be a biggish file though, I'd keep all the type derivation functions together.

At least it's not 2.5MB like TypeScript! 😂

@daniel-chambers
Copy link
Collaborator Author

daniel-chambers commented Mar 18, 2024

Checking that two anonymous object types with different field orderings produce the same name?

@sordina Yes, the fields are sorted before they are put into the TypeId, so the original field ordering does not matter.

@daniel-chambers daniel-chambers merged commit 2ab3967 into main Mar 18, 2024
6 checks passed
@daniel-chambers daniel-chambers deleted the type-naming branch March 18, 2024 00:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants