refactor(web): split tokenization realignment from evaluateTransition #15191
jahorton merged 2 commits into epic/autocorrect from
Conversation
User Test Results (test specification and instructions): User tests are not required
readonly transitionEdits?: {
  addedNewTokens: boolean,
  removedOldTokens: boolean,
  // NOTE: slated for removal in an upcoming PR. Exists in this form to
With the various ways that tokenizations can transition depending upon which potential inputs are applied, it's possible for multiple different tokenizations to transition into the same one. As such, there will no longer be "just one" way that a tokenization is reached. Accordingly, it's best to perform word-boundary realignment operations (splits, merges) separately from text-editing operations (inserts, deletes).

Build-bot: skip build:web
Test-bot: skip
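The separation described above can be sketched as two independent phases: one that only moves word boundaries (splits and merges, leaving the underlying text unchanged) and one that only applies text edits. This is a minimal illustrative sketch; the type and function names here are hypothetical and not the actual Keyman predictive-text API.

```typescript
type Token = { text: string };

// Phase 1: word-boundary realignment only. Re-segments the tokens at the
// given character boundaries without inserting or deleting any text.
function realign(tokens: Token[], boundaries: number[]): Token[] {
  const text = tokens.map(t => t.text).join('');
  const out: Token[] = [];
  let start = 0;
  for (const b of [...boundaries, text.length]) {
    if (b > start) {
      out.push({ text: text.slice(start, b) });
      start = b;
    }
  }
  return out;
}

// Phase 2: text editing only. Applies a single insert/delete transform to
// the tail token, leaving all word boundaries where they were.
function applyEdit(tokens: Token[], deleteLeft: number, insert: string): Token[] {
  const tail = tokens[tokens.length - 1];
  const edited = tail.text.slice(0, tail.text.length - deleteLeft) + insert;
  return [...tokens.slice(0, -1), { text: edited }];
}
```

Because the phases are independent, any two tokenizations that converge on the same realigned state pass through identical edit logic afterward, regardless of which input produced them.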
web/src/engine/predictive-text/worker-thread/src/main/correction/context-tokenization.ts
// Assumption: inputs.length > 0. (There is at least one input transform.)
const inputTransformKeys = [...inputs[0].sample.keys()];
const baseTailIndex = (tailTokenization.length - 1);
Shouldn't this be done after removing the tokens from tailTokenization? Otherwise baseTailIndex might point to an index that is no longer valid.
No, this is still correct. Inputs to be applied are tokenized elsewhere, and those tokens are indexed relative to this specific index - the location of the last pre-edit context token. For such cases, the 'first' (and possibly more!) such token index (as obtained from `inputs[0].sample.keys()`) will be negative.
This is enforced in ContextTokenization.mapWhitespacedTokenization and .assembleTransforms, which together produce the key-values obtained by the block of code reviewed here.
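The indexing convention discussed in this thread can be illustrated with a small sketch. The names below are hypothetical (this is not the actual `ContextTokenization` code): per-token transform keys are offsets relative to `baseTailIndex`, the index of the last pre-edit context token, so a key of 0 targets the tail token itself while a negative key reaches back into tokens that already existed before the edit.

```typescript
type Transform = { insert: string; deleteLeft: number };

const contextTokens = ['the', 'quick', 'brow'];  // pre-edit context tokens
const baseTailIndex = contextTokens.length - 1;  // index of 'brow'

// Keys are offsets from baseTailIndex, not absolute indices: 0 edits the
// tail token; -1 would edit 'quick', and so on.
const perTokenTransforms = new Map<number, Transform>([
  [0, { insert: 'n', deleteLeft: 0 }],           // 'brow' -> 'brown'
]);

for (const [offset, t] of perTokenTransforms) {
  const i = baseTailIndex + offset;              // resolve to absolute index
  const old = contextTokens[i];
  contextTokens[i] = old.slice(0, old.length - t.deleteLeft) + t.insert;
}
```

This is why capturing `baseTailIndex` before removing tokens is safe: the transform keys were generated relative to that same pre-edit index, so both sides of the addition use the same frame of reference.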
…on/context-tokenization.ts
Co-authored-by: Eberhard Beilharz <ermshiperete@users.noreply.github.com>
With the various ways that tokenizations can transition depending upon which potential inputs are applied, it's possible for multiple different tokenizations to transition into the same one. As such, there will no longer be "just one" way that a tokenization is reached. Accordingly, it's best to perform word-boundary realignment operations (splits, merges) separately from text-editing operations (inserts, deletes).
Fortunately, it's possible to enact this before multi-tokenization. It may even be advantageous to do so for clarity's sake - this makes clear which portions of the operations are for context word-boundary realignment and which are for actual context transition.
Build-bot: skip build:web
Test-bot: skip