Value.Parse performance #1151
Replies: 2 comments
-
@denis-ilchishin Hi! Sorry, I meant to respond to this sooner via the discussion thread, but have been somewhat sidetracked getting https://github.com/sinclairzx81/typemap ready for Standard Schema 1.0.
The TypeBox Parse function isn't really designed for high performance. The function was written to be an all-in kitchen-sink pipeline which internally calls Clone, Clean, Default, Convert, Assert and Decode to process a value. Parse pays the operational cost of all of these functions running in sequence.

Parse Configuration

You can optimize the Parse pipeline by configuring it (or even replacing operations with more optimized versions). The following benchmarks a few variations of the default pipeline.

```typescript
import { Value } from '@sinclair/typebox/value'
import { Type } from '@sinclair/typebox'

function benchmark(operations: string[]) {
  const name = operations.length > 0 ? operations.join(', ') : 'Noop'
  console.time(name)
  const type = Type.Object({
    x: Type.Number(),
    y: Type.Number(),
    z: Type.Number()
  })
  const value = { x: 1, y: 2, z: 3 }
  for(let i = 0; i < 100000; i++) Value.Parse(operations, type, value)
  console.timeEnd(name)
}

benchmark(['Clone', 'Clean', 'Default', 'Convert', 'Assert', 'Decode']) // default pipeline
benchmark(['Clean', 'Default', 'Convert', 'Assert', 'Decode'])
benchmark(['Clean', 'Default', 'Convert', 'Assert'])
benchmark(['Clean', 'Default', 'Assert'])
benchmark(['Clean', 'Assert'])
benchmark(['Assert'])
benchmark([])
```

Results

```
Clone, Clean, Default, Convert, Assert, Decode: 176.815ms
Clean, Default, Convert, Assert, Decode: 126.326ms
Clean, Default, Convert, Assert: 87.026ms
Clean, Default, Assert: 70.661ms
Clean, Assert: 34.463ms
Assert: 18.255ms -- very comparable to Zod
Noop: 6.494ms
```

Any potential optimizations would need to be implemented at an operation level, not a Parse level (TypeBox narrows the scope for optimization to each operation by design).
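For example, one of the reduced pipelines benchmarked above can be wrapped as a reusable helper, using the same operations-list overload of Parse shown in the benchmark (the helper name is illustrative):

```typescript
import { Value } from '@sinclair/typebox/value'
import { Type, TSchema, Static } from '@sinclair/typebox'

// a leaner Parse that skips Clone, Convert and Decode
function ParseLean<T extends TSchema>(schema: T, value: unknown): Static<T> {
  return Value.Parse(['Clean', 'Default', 'Assert'], schema, value)
}

const T = Type.Object({ x: Type.Number(), y: Type.Number({ default: 0 }) })
const result = ParseLean(T, { x: 1, extra: true }) // { x: 1, y: 0 }
```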
Parse Extensions

Yeah, this is true. TypeBox Value functions are written as distinct / decoupled operations that require value traversal for each operation run. This is by design, as these functions may be JIT optimized in future (and it's easier to optimize a function that does one thing than a function that does many things). However, I do agree some optimizations would be possible by avoiding repeated traversal per operation. It would be technically possible to implement a single-traversal function in user space, but it would require implementing a new parsing system from scratch. This could be configured in the following way.

```typescript
import { Value, ParseRegistry } from '@sinclair/typebox/value'
import { Type, TSchema, Static } from '@sinclair/typebox'

// ------------------------------------------------------------------
// FastParse
// ------------------------------------------------------------------
ParseRegistry.Set('FastParse', (schema, references, value) => {
  return value // todo: implement all operations as single operation here
})

function FastParse<Type extends TSchema>(schema: Type, value: unknown): Static<Type> {
  return Value.Parse(['FastParse'], schema, value)
}

// ...
const result = FastParse(Type.String(), 'hello')
console.log(result)
```

Answers to Questions
Not using the current Value.* functions; these perform deep traversal by default. You would need to design a new set of functions that only operate at a single level of depth (traversal facilitated by some exterior recursive visitor, not within the functions themselves, which should be limited to mapping Input -> Output). This has partially been explored before, but I would be happy to assist community implementations.
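As a very rough illustration of that shape (not a TypeBox API; ParseNode and the handled cases are hypothetical and cover only object properties and default annotations), a single exterior visitor might look like this:

```typescript
import { Type, TSchema, Kind } from '@sinclair/typebox'
import { Value } from '@sinclair/typebox/value'

// hypothetical: one exterior recursive visitor applying per-node work
// (here just defaults) in a single pass, instead of one traversal per operation
function ParseNode(schema: TSchema, value: unknown): unknown {
  if (value === undefined && 'default' in schema) value = schema.default
  if (schema[Kind] === 'Object' && typeof value === 'object' && value !== null) {
    const properties = (schema as { properties?: Record<string, TSchema> }).properties ?? {}
    for (const key of Object.keys(properties)) {
      const record = value as Record<string, unknown>
      record[key] = ParseNode(properties[key], record[key])
    }
  }
  return value
}

const T = Type.Object({ x: Type.Number({ default: 1 }), y: Type.Number() })
const parsed = ParseNode(T, { y: 2 }) // { y: 2, x: 1 } after a single traversal
Value.Assert(T, parsed)               // final validation still delegated to TypeBox
```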
Caching would be possible external to TypeBox, but not internally. Caching optimizations internal to TypeBox have been attempted before but resulted in too much complexity and nuance to support; as such, caching of schematics should happen exterior to TypeBox (for example, caching via ...).

As for the following, TypeBox can't inject Compiler functionality into the Value.* functions. As mentioned, the Value.* functions may be JIT optimized in future, and I am quite hesitant to introduce coupling between the Compiler and Value.*, as this would complicate future JIT optimization work.

```typescript
// coupling is out of scope.
const checkFn = checks.get(subschema) ?? Check.bind(null, subschema)
```

Hope this brings some insight into things. Happy to discuss optimizations at the Value operation level, but not at a Parse level. The current functions should be fairly optimized, but there is likely room to improve performance on certain types (Union and Intersect especially), so I would be open to discussing better implementations if possible (it would likely require research). Again, sorry for the delay. Should I convert this back to a discussion?
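On the external-caching point above, a user-space cache of compiled checkers keyed by schema might look roughly like this (a sketch only; the cache and CheckCached helper are illustrative and not part of TypeBox):

```typescript
import { TypeCompiler, TypeCheck } from '@sinclair/typebox/compiler'
import { Type, TSchema } from '@sinclair/typebox'

// the cache lives entirely in user space, keyed by schema identity
const checks = new WeakMap<TSchema, TypeCheck<TSchema>>()

function CheckCached(schema: TSchema, value: unknown): boolean {
  let check = checks.get(schema)
  if (check === undefined) {
    check = TypeCompiler.Compile(schema) // one-time compile cost
    checks.set(schema, check)
  }
  return check.Check(value)
}

const T = Type.Object({ x: Type.Number() })
CheckCached(T, { x: 1 }) // compiles on first use
CheckCached(T, { x: 2 }) // reuses the compiled checker
```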
-
Yeah, let's move it back to discussions then. I'm not sure how discussions work and thought you just don't see any notifications or something.
-
Hi, I've been playing around with Typebox and I'm a bit concerned about the performance of Value.Default, Value.Clean, etc.
I've created a simple playground repo to do some benchmarking, and here's a preview:
I'm totally aware that micro-benchmarking is not very representative, and there are a ton of different things that can impact the performance, as well as fundamental differences between how Zod and TypeBox operate. Therefore, these numbers are just for reference and something to base the conversation on.
From my investigation, unions, nested objects and arrays have the biggest impact on performance, which sounds very reasonable. Why that is the case is also pretty obvious: by architectural design, all the actions (apply default values, clean extra properties, etc.) in TypeBox are built to be independently consumable. But this leads to not very satisfactory performance when, for some use case, it's required to pass a value through TypeBox's parse pipeline with multiple actions, most of which deeply traverse the value, therefore doing pretty expensive recursive traversal operations for EACH step. And this is one of the biggest differences from how Zod operates (it does all the validations, default value resolution, extra property cleanup, etc. in one go).
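For reference, the multi-pass shape described above looks roughly like this (the schema and value are just placeholders):

```typescript
import { Value } from '@sinclair/typebox/value'
import { Type } from '@sinclair/typebox'

const T = Type.Object({ x: Type.Number({ default: 0 }), y: Type.Number() })
let value: unknown = { y: '2', extra: true }

// each operation below performs its own recursive traversal of the value
value = Value.Clean(T, value)   // pass 1: drop unknown properties
value = Value.Default(T, value) // pass 2: apply default values
value = Value.Convert(T, value) // pass 3: coerce convertible values ('2' -> 2)
Value.Assert(T, value)          // pass 4: validate
```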
Another thing that I noticed is that, for example, for Value.Default to correctly resolve and apply default values for union types, it's necessary to run Check on the value against each subschema/subtype to figure out which schema to take the default value from. But in this case it always uses runtime validation, which can be significantly slower compared to a compiled validation (see the example in TypeBox's readme).
So I wonder, is there room for improvement here? Do you see any way to optimize it? Here are a couple of things I've been thinking about:
- for every action, make it so that the list of actions to be performed is passed to each type-specific function
- pre-create the check functions of all nested types/schemas, so they can be passed to Value.Default (and others) and be accessed there, just like you do with references, something like this (see the sketch below):

This, probably, should significantly improve performance for union and intersection types.
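A hedged sketch of that second idea follows; note Value.Default does not currently accept such a map, so this only shows how the pre-built checks could be constructed and looked up in user space:

```typescript
import { TypeCompiler } from '@sinclair/typebox/compiler'
import { Type, TSchema } from '@sinclair/typebox'

const A = Type.Object({ kind: Type.Literal('a'), value: Type.Number({ default: 0 }) })
const B = Type.Object({ kind: Type.Literal('b'), value: Type.String({ default: '' }) })
const U = Type.Union([A, B])

// pre-build a check function per union subschema, keyed by schema reference
const checks = new Map<TSchema, (value: unknown) => boolean>()
for (const subschema of U.anyOf) {
  const compiled = TypeCompiler.Compile(subschema)
  checks.set(subschema, value => compiled.Check(value))
}

// a Default-like routine could then look up the precompiled check for each
// subschema instead of re-validating dynamically, e.g.
// const checkFn = checks.get(subschema)
```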