
@CPunisher (Member) commented Oct 19, 2025

Background:

In #10377 and #10399, we created a separate crate, swc_ecma_lexer, out of swc_ecma_parser.

  1. The main goal is to split TokenKind and TokenValue so that the lexer and parser run faster. This has already been done in swc_ecma_parser, whose Token is only 1 byte. It also means the lexer and the parser need to be refactored.
  2. SWC has always made Lexer and Token public, so changing Token introduces a large breaking change for Rust users. For compatibility, we therefore have to keep the legacy lexer and parser, which produce compatible Tokens.
  3. However, I made a wrong decision in refactor(es/parser): Split parser into also-lex/parse-only #10399. That is, we moved as many shared lexer/parser functions as possible into swc_ecma_lexer/common by introducing a set of complex traits (ParseTrait, LexerTrait, etc.), which made the project chaotic and harder to understand. You can see that swc_ecma_parser depends on swc_ecma_lexer and calls the common functions everywhere.
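A minimal sketch of the idea behind points 1 and 2, assuming a toy token set. `TokenKind`, `TokenValue`, `LegacyToken` and `to_legacy` are hypothetical names for illustration, not swc's actual definitions. A fieldless `#[repr(u8)]` enum fits in a single byte while payloads live elsewhere, and a compatibility layer can rebuild a fused, legacy-style token from the kind plus its value.

```rust
/// Hypothetical compact token kind: fieldless, so it occupies one byte.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
enum TokenKind {
    Ident,
    Num,
    Str,
    Plus,
    Semi,
}

/// Hypothetical side storage for the payloads that only some kinds carry.
#[derive(Clone, Debug, PartialEq)]
enum TokenValue {
    Word(String),
    Num(f64),
}

/// Hypothetical legacy-style token: kind and payload fused in one enum,
/// so every token is as large as its largest variant.
#[derive(Clone, Debug, PartialEq)]
enum LegacyToken {
    Ident(String),
    Num(f64),
    Str(String),
    Plus,
    Semi,
}

/// Rebuild a legacy token from the compact kind plus its optional value,
/// the kind of bridge a compatibility layer would need.
fn to_legacy(kind: TokenKind, value: Option<TokenValue>) -> Option<LegacyToken> {
    match (kind, value) {
        (TokenKind::Ident, Some(TokenValue::Word(s))) => Some(LegacyToken::Ident(s)),
        (TokenKind::Str, Some(TokenValue::Word(s))) => Some(LegacyToken::Str(s)),
        (TokenKind::Num, Some(TokenValue::Num(n))) => Some(LegacyToken::Num(n)),
        (TokenKind::Plus, None) => Some(LegacyToken::Plus),
        (TokenKind::Semi, None) => Some(LegacyToken::Semi),
        // Kind and value disagree: not a valid token.
        _ => None,
    }
}

fn main() {
    // The kind alone is a single byte, cheap to copy and compare.
    assert_eq!(std::mem::size_of::<TokenKind>(), 1);
    // The fused enum must be large enough to hold a String.
    assert!(std::mem::size_of::<LegacyToken>() > std::mem::size_of::<TokenKind>());

    let legacy = to_legacy(TokenKind::Ident, Some(TokenValue::Word("foo".to_string())));
    println!("{:?}", legacy);
}
```

The intent of such a split is that hot parser paths can branch on the 1-byte kind and only look up a value when a token actually carries one.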

Motivation:

  1. It clearly improves DX. You no longer need to jump to an empty trait method before reaching its implementation, or jump back and forth between two crates.
  2. It also opens up more room for performance optimization, because you are no longer constrained by the traits or by the legacy lexer and parser.

Description:

Now it's time to correct that decision. This PR makes swc_ecma_parser self-contained, so it no longer depends on swc_ecma_lexer. Instead, swc_ecma_lexer now depends on swc_ecma_parser for some simple shared data structures such as Syntax. For compatibility, I also moved the legacy Token into swc_ecma_parser and imported it there.

After this PR, swc_ecma_lexer is essentially marked as no longer maintained. All bug fixes and performance optimizations should be applied only to swc_ecma_parser.

Specifically, all this PR does is copy the common functions from swc_ecma_lexer/common into swc_ecma_parser and eliminate the trait-based generics. For example:

```rust
// Before
// crates/swc_ecma_lexer/common/...
pub trait Parser {
    fn xxx();
    fn yyy() { /* ... */ }
}

pub fn parse_xx<P: Parser>(p: &mut P) { /* ... */ }

// Impl for both the legacy Parser and the new performant Parser
impl common::Parser for Parser { /* ... */ }
```

```rust
// After
// crates/swc_ecma_parser/...
impl Parser {
    pub fn parse_xx(&mut self) {
        // Body copied and pasted from common::parse_xx
    }
}
```
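The before/after pattern can be made concrete with a small, compilable sketch. Everything in it is hypothetical: the char-buffer `Parser`, `peek`, `bump` and `parse_digits` are illustrative stand-ins, not swc's real API. It writes the same logic once as a trait-generic free function in a `common` module (before) and once as an inherent method (after), and checks that both give the same result.

```rust
// "Before": shared logic written once against a trait, in a common module.
mod common {
    pub trait Parser {
        /// Look at the next char without consuming it.
        fn peek(&self) -> Option<char>;
        /// Consume one char.
        fn bump(&mut self);
    }

    /// Generic over any implementor; callers reach it through trait methods.
    pub fn parse_digits<P: Parser>(p: &mut P) -> String {
        let mut out = String::new();
        while let Some(c) = p.peek() {
            if !c.is_ascii_digit() {
                break;
            }
            out.push(c);
            p.bump();
        }
        out
    }
}

/// Hypothetical concrete parser over a char buffer.
pub struct Parser {
    chars: Vec<char>,
    pos: usize,
}

impl Parser {
    pub fn new(src: &str) -> Self {
        Parser { chars: src.chars().collect(), pos: 0 }
    }

    /// "After": the same body pasted as an inherent method, no trait in between.
    pub fn parse_digits(&mut self) -> String {
        let mut out = String::new();
        while let Some(&c) = self.chars.get(self.pos) {
            if !c.is_ascii_digit() {
                break;
            }
            out.push(c);
            self.pos += 1;
        }
        out
    }
}

// Needed only by the "before" style.
impl common::Parser for Parser {
    fn peek(&self) -> Option<char> {
        self.chars.get(self.pos).copied()
    }

    fn bump(&mut self) {
        self.pos += 1;
    }
}

fn main() {
    let src = "123+4";
    let before = common::parse_digits(&mut Parser::new(src));
    let after = Parser::new(src).parse_digits();
    assert_eq!(before, after);
    println!("{before}");
}
```

With the inherent version, go-to-definition on `parse_digits` lands directly on its body instead of on a trait declaration in another crate, which is the DX point made in the motivation.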

Note that I changed almost nothing in swc_ecma_lexer, so the lexer and parser in that crate are still based on the traits and common functions.

Breaking Changes:

If you don't use the Token API, there are no breaking changes, which covers most Rust API users. Otherwise, you may need to remove your dependency on swc_ecma_lexer and the related trait imports.

Test in community crates:

Future Works:

When I finished copying all the code, performance actually regressed. It took me a lot of time to track down the cause, but I finally kept the regression to around -1%. To get there, I had to do some complex optimizations ahead of time, such as refactoring parse_subscripts.

@changeset-bot

changeset-bot bot commented Oct 19, 2025

🦋 Changeset detected

Latest commit: 3d13362

The changes in this PR will be included in the next version bump.


@CPunisher force-pushed the 10-11-refactor/parser branch from 8aef46c to d856923 on October 19, 2025 03:27
@codspeed-hq

codspeed-hq bot commented Oct 19, 2025

CodSpeed Performance Report

Merging #11148 will not alter performance

Comparing CPunisher:10-11-refactor/parser (3d13362) with main (e93ffde)

Summary

✅ 129 untouched
🆕 11 new
⏩ 11 skipped¹

Benchmarks breakdown

| Benchmark | BASE | HEAD | Change |
|---|---|---|---|
| 🆕 es/lexer/angular | N/A | 7.7 ms | N/A |
| 🆕 es/lexer/backbone | N/A | 1 ms | N/A |
| 🆕 es/lexer/cal-com | N/A | 13.5 ms | N/A |
| 🆕 es/lexer/colors | N/A | 28.9 µs | N/A |
| 🆕 es/lexer/jquery | N/A | 5.6 ms | N/A |
| 🆕 es/lexer/jquery mobile | N/A | 8.7 ms | N/A |
| 🆕 es/lexer/mootools | N/A | 4.4 ms | N/A |
| 🆕 es/lexer/three | N/A | 19.9 ms | N/A |
| 🆕 es/lexer/typescript | N/A | 110.2 ms | N/A |
| 🆕 es/lexer/underscore | N/A | 883 µs | N/A |
| 🆕 es/lexer/yui | N/A | 4.7 ms | N/A |

Footnotes

  1. 11 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, they can be archived to remove them from the performance reports.


@CPunisher force-pushed the 10-11-refactor/parser branch 4 times, most recently from d3b79fc to 236f45e on October 21, 2025 10:45
@CPunisher force-pushed the 10-11-refactor/parser branch from 20f5d1d to fc2e4e5 on October 27, 2025 08:41
@CPunisher marked this pull request as ready for review October 27, 2025 09:27
@CPunisher requested review from a team as code owners October 27, 2025 09:27
Copilot AI review requested due to automatic review settings October 27, 2025 09:27

Copilot AI left a comment


Pull Request Overview

This PR refactors the swc_ecma_parser to be self-contained by removing its dependency on swc_ecma_lexer. The code previously shared between the lexer and parser through trait-based abstractions has been copied directly into swc_ecma_parser and simplified by removing generic trait implementations. This improves developer experience by eliminating cross-crate navigation and provides opportunities for future performance optimizations.

Key changes:

  • Moved common parsing functions from swc_ecma_lexer into swc_ecma_parser
  • Eliminated trait-based generic abstractions in favor of direct implementations
  • Maintained backward compatibility for most users (no breaking changes unless using Token API directly)

Reviewed Changes

Copilot reviewed 45 out of 58 changed files in this pull request and generated no comments.

| File | Description |
|---|---|
| crates/swc_ecma_parser/src/parser/tests.rs | Moved expression parsing tests from expr/tests.rs and added new benchmarks |
| crates/swc_ecma_parser/src/parser/stmt/module_item.rs | Removed legacy decorator test (moved to module_item.rs) |
| crates/swc_ecma_parser/src/parser/stmt.rs | Copied statement parsing logic from swc_ecma_lexer, removing trait dependencies |
| crates/swc_ecma_parser/src/parser/state.rs | Added new State type and WithState helper for parser state management |
| crates/swc_ecma_parser/src/parser/pat.rs | Copied pattern parsing functions from swc_ecma_lexer |
| crates/swc_ecma_parser/src/parser/object.rs | Added object literal and pattern parsing functions |
| crates/swc_ecma_parser/src/parser/module_item.rs | Copied module item parsing with legacy test moved from stmt/module_item.rs |
| crates/swc_ecma_parser/src/parser/mod.rs | Removed trait dependencies, added utility methods directly to Parser |
| crates/swc_ecma_parser/src/parser/macros.rs | Added missing macros (expect, debug_tracing, peek, return_if_arrow) |
| crates/swc_ecma_parser/src/parser/jsx.rs | Copied JSX parsing functions from swc_ecma_lexer |
| crates/swc_ecma_parser/src/parser/input.rs | Simplified Tokens trait and Buffer implementation |
| crates/swc_ecma_parser/src/parser/ident.rs | Added identifier parsing functions |
| crates/swc_ecma_parser/src/parser/expr/tests.rs | Removed tests (moved to parser/tests.rs) |
| crates/swc_ecma_parser/src/parser/expr/ops.rs | Removed binary operator tests (moved to parser/tests.rs) |
| crates/swc_ecma_parser/src/parser/class_and_fn.rs | Copied class and function parsing from swc_ecma_lexer |
| crates/swc_ecma_parser/src/lib.rs | Updated exports to remove swc_ecma_lexer dependencies |
| crates/swc_ecma_parser/src/lexer/whitespace.rs | Added whitespace scanning implementation |
| crates/swc_ecma_parser/src/lexer/table.rs | Updated to use const generic for mul_mod method |
| crates/swc_ecma_parser/src/lexer/state.rs | Simplified State implementation removing trait dependencies |
| crates/swc_ecma_parser/src/lexer/number.rs | Added number parsing utilities |
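The summary mentions a State type and a WithState helper in parser/state.rs without showing them. A common shape for such a helper in Rust is an RAII guard that swaps in a temporary state and restores the original on `Drop`; the sketch below assumes that shape. The fields of `State`, the `with_state` method and this `WithState` layout are hypothetical, not taken from the PR.

```rust
use std::ops::{Deref, DerefMut};

/// Hypothetical parser state; the real fields are not shown in this summary.
#[derive(Clone, Debug, Default, PartialEq)]
struct State {
    in_generator: bool,
    label_depth: u32,
}

struct Parser {
    state: State,
}

/// Hypothetical guard: borrows the parser mutably, remembers the previous
/// state, and puts it back when the guard goes out of scope.
struct WithState<'a> {
    inner: &'a mut Parser,
    orig: State,
}

impl Parser {
    /// Install `state` temporarily; the returned guard restores the old one.
    fn with_state(&mut self, state: State) -> WithState<'_> {
        let orig = std::mem::replace(&mut self.state, state);
        WithState { inner: self, orig }
    }
}

// Deref lets code use the guard as if it were the parser itself.
impl Deref for WithState<'_> {
    type Target = Parser;
    fn deref(&self) -> &Parser {
        &*self.inner
    }
}

impl DerefMut for WithState<'_> {
    fn deref_mut(&mut self) -> &mut Parser {
        &mut *self.inner
    }
}

impl Drop for WithState<'_> {
    fn drop(&mut self) {
        // Restore the saved state, even on early return or `?`.
        self.inner.state = std::mem::take(&mut self.orig);
    }
}

fn main() {
    let mut p = Parser { state: State::default() };
    {
        let mut guard = p.with_state(State { in_generator: true, label_depth: 0 });
        // Nested parsing code sees the temporary state through the guard.
        guard.state.label_depth += 1;
        assert!(guard.state.in_generator);
    } // guard dropped here: original state restored
    assert_eq!(p.state, State::default());
    println!("{:?}", p.state);
}
```

Tying restoration to `Drop` means every exit path, including early returns on parse errors, puts the previous state back without explicit cleanup code.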
Comments suppressed due to low confidence (1)

crates/swc_ecma_parser/src/parser/tests.rs:1

  • Corrected spelling of 'THis' to 'This'.
use std::hint::black_box;


@kdy1 added this to the Planned milestone Oct 27, 2025
@kdy1 changed the title from "refactor(es/parser): detach swc_ecma_parser from swc_emca_lexer" to "refactor(es/parser): detach swc_ecma_parser from swc_ecma_lexer" Oct 27, 2025
@kdy1 changed the title from "refactor(es/parser): detach swc_ecma_parser from swc_ecma_lexer" to "refactor(es/parser): Detach swc_ecma_parser from swc_ecma_lexer" Oct 27, 2025
Copilot AI review requested due to automatic review settings October 27, 2025 10:08

Copilot AI left a comment


Pull Request Overview

Copilot reviewed 46 out of 59 changed files in this pull request and generated 1 comment.



@kdy1 merged commit 94f175d into swc-project:main Oct 27, 2025
173 checks passed
@CPunisher deleted the 10-11-refactor/parser branch October 27, 2025 11:12
@kdy1 modified the milestones: Planned, v1.14.0 Oct 29, 2025

Labels

None yet


2 participants