Scopehoisting contributor documentation (#8402)

parcel-bundler · Aug 11, 2023 · b6224fd
1 parent 88e6db4
commit b6224fd
Showing 6 changed files with 899 additions and 0 deletions.
diff --git a/docs/Deferring.md b/docs/Deferring.md
@@ -0,0 +1,77 @@
+# Deferring Assets
+
+(The core idea and benefits are described in [Scopehoisting](Scopehoisting.md)).
+
+Although deferring is usually described in terms of dependencies (and this is also how the API exposes it), the entity that actually gets deferred is the asset group node. This is because the dependency is just the dependency "request" (though not as in "request" graph) and doesn't know yet whether the resolved asset is side-effect free. That is only known after the resolver has run (and the resolver result is stored in the asset group node).
+
+## Deferring
+
+This might be the current state of the asset graph during transformation: only the "Button" reexport of the library is used so far, and the other reexport "Switch" hasn't been imported anywhere (yet). So the "Switch" asset was deferred.
+
+The `deferred`/`hasDeferred` labels in the diagram correspond to the asset graph node properties of the same name.
+
+```mermaid
+graph TD;
+  AssetA
+    -->DependencyLibA[DependencyLibA:Button]
+    -->AssetGroupLib[AssetGroupLib<br>hasDeferred]
+    -->AssetLib[AssetLib<br>hasDeferred];
+  AssetLib
+    -->DependencyLibButton[DependencyLibButton:Button]
+    -->AssetGroupLibButton
+    -->AssetLibButton;
+  AssetLib
+    -->DependencyLibSwitch[DependencyLibSwitch:Switch<br>hasDeferred]
+    -->AssetGroupLibSwitch[AssetGroupLibSwitch<br>deferred];
+
+  classDef asset fill:orange,stroke:orange;
+  classDef dep fill:lime,stroke:lime;
+  class AssetA asset;
+  class AssetLib asset;
+  class AssetLibButton asset;
+  class DependencyLibA dep;
+  class DependencyLibButton dep;
+  class DependencyLibSwitch dep;
+```
+
+This is detected in [`assetGraph.shouldVisitChild(DependencyLibSwitch, AssetGroupLibSwitch)`](https://github.com/parcel-bundler/parcel/blob/9e5d05586577e89991ccf90400f2c741dca11aa3/packages/core/core/src/AssetGraph.js#L305) which calls `assetGraph.shouldDeferDependency` (reads the symbol information and determines if the dependency is unused). Then `markParentsWithHasDeferred(DependencyLibSwitch)` is called to add the `hasDeferred=true` flags for the parent asset and asset group nodes.
+
+Because `shouldVisitChild` returns false, the graph traversal never visits the asset group node and also never transforms the corresponding asset.
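The deferral decision described above can be sketched on a toy graph. This is a minimal illustration under stated assumptions, not Parcel's implementation: the node shapes (`reexports`, `sideEffects`, the `ctx` object) and the function signatures are hypothetical, and only the function names and the `deferred`/`hasDeferred` flags come from the text.

```javascript
// Toy model only: node shapes and signatures are assumptions for illustration.

// A dependency can be deferred when the resolved asset is side-effect free
// and none of the symbols it forwards are used by the parent's importers.
function shouldDeferDependency(dep, parentUsedSymbols, sideEffects) {
  if (sideEffects !== false) return false;
  return !dep.reexports.some((name) => parentUsedSymbols.has(name));
}

// Flag the dependency and its parents so a later traversal knows to revisit them.
function markParentsWithHasDeferred(dep, parentAsset, parentGroup) {
  dep.hasDeferred = true;
  parentAsset.hasDeferred = true;
  parentGroup.hasDeferred = true;
}

// Returning false stops the traversal, so the asset is never transformed.
function shouldVisitChild(dep, group, ctx) {
  const defer = shouldDeferDependency(dep, ctx.parentUsedSymbols, group.sideEffects);
  group.deferred = defer;
  if (defer) markParentsWithHasDeferred(dep, ctx.parentAsset, ctx.parentGroup);
  return !defer;
}

// The situation from the diagram: only "Button" is used so far.
const assetGroupLib = { hasDeferred: false };
const assetLib = { hasDeferred: false };
const ctx = {
  parentUsedSymbols: new Set(["Button"]),
  parentAsset: assetLib,
  parentGroup: assetGroupLib,
};
const depButton = { reexports: ["Button"], hasDeferred: false };
const groupButton = { sideEffects: false, deferred: false };
const depSwitch = { reexports: ["Switch"], hasDeferred: false };
const groupSwitch = { sideEffects: false, deferred: false };

const visitButton = shouldVisitChild(depButton, groupButton, ctx);
const visitSwitch = shouldVisitChild(depSwitch, groupSwitch, ctx);
console.log({ visitButton, visitSwitch, groupSwitch, assetLib, assetGroupLib });
```

In this sketch the `Switch` branch ends with `deferred` set on its asset group and `hasDeferred` set on the dependency, the parent asset, and the parent asset group, matching the labels in the diagram.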
+
+### Undeferring
+
+Now another dependency is added/discovered during transformation, so the asset group should be undeferred and the asset should get transformed:
+
+```mermaid
+graph TD;
+  AssetA
+    -->DependencyLibA[DependencyLibA:Button]
+    -->AssetGroupLib;
+  AssetB
+    -->DependencyLibB[DependencyLibB:Switch]
+    -->AssetGroupLib;
+  AssetGroupLib[AssetGroupLib<br>hasDeferred]
+    -->AssetLib[AssetLib<br>hasDeferred];
+  AssetLib
+    -->DependencyLibButton[DependencyLibButton:Button]
+    -->AssetGroupLibButton
+    -->AssetLibButton;
+  AssetLib
+    -->DependencyLibSwitch[DependencyLibSwitch:Switch<br>hasDeferred]
+    -->AssetGroupLibSwitch[AssetGroupLibSwitch<br>deferred];
+  AssetGroupLibSwitch
+    -->AssetLibSwitch
+
+  classDef asset fill:orange,stroke:orange;
+  classDef dep fill:lime,stroke:lime;
+  class AssetA,AssetB,AssetLib,AssetLibButton asset;
+  class DependencyLibA,DependencyLibB,DependencyLibButton,DependencyLibSwitch dep;
+  style AssetLibSwitch fill:transparent,stroke-dasharray: 5 5,stroke:orange;
+  linkStyle 10 stroke-dasharray: 5 5,stroke-width: 1.5;
+```
+
+`DependencyLibB` got added to the graph and now all its children are considered: in the asset graph request traversal's `visitChildren` wrapper, there's [an override to revisit nodes if they have `hasDeferred=true`](https://github.com/parcel-bundler/parcel/blob/9e5d05586577e89991ccf90400f2c741dca11aa3/packages/core/core/src/requests/AssetGraphRequest.js#L169). This causes `AssetLib` and in turn `DependencyLibSwitch` to be revisited.
+
+`shouldVisitChild` and `shouldDeferDependency` then determine that `AssetLibSwitch` is now used and call `unmarkParentsWithHasDeferred(AssetGroupLibSwitch)`, which clears `DependencyLibSwitch.hasDeferred`, clears `AssetLib.hasDeferred` (but only if there is no other sibling dependency that is still deferred), and sets `AssetGroupLib.hasDeferred = AssetLib.hasDeferred`.
+
+`shouldVisitChild` returns true and `AssetGroupLibSwitch` gets visited for the first time, also transforming the asset and creating the asset node.
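The flag bookkeeping during undeferring can be sketched as follows. This is an illustration under assumptions: the real `unmarkParentsWithHasDeferred` takes the asset group node and looks up its parents in the graph, whereas this hypothetical version receives the affected nodes as explicit arguments.

```javascript
// Toy model only: the argument list is an assumption for illustration.
function unmarkParentsWithHasDeferred(dep, siblingDeps, parentAsset, parentGroup) {
  // The dependency itself no longer has a deferred child.
  dep.hasDeferred = false;
  // The parent asset only loses its flag if no sibling is still deferred.
  const siblingStillDeferred = siblingDeps.some((d) => d !== dep && d.hasDeferred);
  if (!siblingStillDeferred) parentAsset.hasDeferred = false;
  // The asset group mirrors whatever the asset ended up with.
  parentGroup.hasDeferred = parentAsset.hasDeferred;
}

// Case 1: "Switch" was the only deferred reexport, so all flags are cleared.
const libGroup1 = { hasDeferred: true };
const lib1 = { hasDeferred: true };
const switchDep1 = { hasDeferred: true };
const buttonDep1 = { hasDeferred: false };
unmarkParentsWithHasDeferred(switchDep1, [buttonDep1, switchDep1], lib1, libGroup1);

// Case 2: a hypothetical third reexport "Slider" is still deferred,
// so the parent asset and asset group keep hasDeferred = true.
const libGroup2 = { hasDeferred: true };
const lib2 = { hasDeferred: true };
const switchDep2 = { hasDeferred: true };
const sliderDep2 = { hasDeferred: true };
unmarkParentsWithHasDeferred(switchDep2, [switchDep2, sliderDep2], lib2, libGroup2);

console.log({ lib1, libGroup1, lib2, libGroup2 });
```

Keeping the flag on the parent while any sibling is still deferred is what lets a later traversal find the remaining deferred branches.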
diff --git a/docs/Scopehoisting Packager.md b/docs/Scopehoisting Packager.md
@@ -0,0 +1,63 @@
+# Scopehoisting Packager - Overview
+
+(The skipping of single assets is described in [Scopehoisting](Scopehoisting.md)).
+
+## Starting point `package()`:
+
+1. `loadAssets()`: Load the assets' contents from cache and determine which assets are wrapped.
+2. `processAsset()`/`visitAsset()` which call `buildAsset()`: These will recursively resolve dependency specifiers and inline dependencies, and append the result to the top level `res` string.
+3. Kick off the process by calling `processAsset()` for all assets (skipping any asset that was already inlined somewhere else, so that each asset is processed only once).
+
+## `buildAsset()`:
+
+1. If the asset should be skipped: ignore the current asset, call `buildAsset()` for dependency assets and concatenate only them together.
+2. Call `buildReplacements()`, generating the `Map`s used during the text replacement:
+   - The dependency map which is used to resolve `import "...";` declarations inserted by the transformer: `${assetId}:${specifier}${specifiertype} -> Dependency`
+   - Import replacements: the local part of a dependency symbol (`$id$import$foo`) -> result of `getSymbolResolution` (e.g. `$id$export$bar` or `parcelRequire("id").bar`)
+3. Call `buildAssetPrelude()`:
+   - generates `$parcel$defineInteropFlag($id$exports)` call for this asset if needed.
+   - synthesizes the exports object if needed (including generation of the `$parcel$export` and `$parcel$exportWildcard` calls only for used re/exports)
+4. Perform the replacements with `REPLACEMENT_RE` matching one of
+   - `import "id";`
+     - will be replaced with the source code of the asset (call `buildAsset()` recursively). If the referenced asset is wrapped, don't inline but place it after the current asset (into `depContent`).
+     - calls `getHoistedParcelRequires` to read the `hoistedRequires` list from `getSymbolResolution` and prepend needed requires.
+   - `$id$exports`
+     - `module.exports` inside the asset gets replaced with `$id$exports` in the transformer, but for wrapped assets, this has to be replaced back to `module.exports`
+   - `$id$import|importAsync|require$foo`
+     - will be looked up in the replacements and replaced with the resolved identifier
+5. If necessary, wrap the result up until now with `parcelRequire.register("id", ...)`.
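The text replacement in step 4 can be sketched with a single regular expression and two lookup maps. This is a simplified illustration, not the packager's code: `SKETCH_RE` is a hypothetical stand-in for `REPLACEMENT_RE`, the map shapes are assumptions, and wrapped assets, `$id$exports` handling, and hoisted requires are left out.

```javascript
// Hypothetical stand-in for REPLACEMENT_RE: matches either an inserted
// `import "...";` statement or a `$id$import|importAsync|require$...` identifier.
const SKETCH_RE =
  /import "([^"]+)";\n?|\$[0-9a-zA-Z]+\$(?:import|importAsync|require)\$[0-9a-zA-Z]+(?:\$[0-9a-zA-Z]+)?/g;

// assets: asset id -> { code }
// depMap: "assetId:specifier" -> resolved asset id (assumed shape)
// replacements: local import identifier -> resolved expression
function buildAssetSketch(assetId, assets, depMap, replacements) {
  return assets.get(assetId).code.replace(SKETCH_RE, (match, importKey) => {
    if (importKey !== undefined) {
      // `import "id:specifier";` is replaced by the inlined dependency.
      return buildAssetSketch(depMap.get(importKey), assets, depMap, replacements);
    }
    // Import identifiers are swapped for the resolved expression.
    return replacements.get(match) ?? match;
  });
}

const sketchAssets = new Map([
  ["a1", { code: 'import "a1:./b";\n$a1$import$b$b();\n' }],
  ["b1", { code: "let $b1$export$b = 2;\n" }],
]);
const sketchDepMap = new Map([["a1:./b", "b1"]]);
const sketchReplacements = new Map([["$a1$import$b$b", "$b1$export$b"]]);

const bundled = buildAssetSketch("a1", sketchAssets, sketchDepMap, sketchReplacements);
console.log(bundled);
```

Doing both kinds of replacement in one pass over the source is what allows the dependency's code to land exactly where its `import` statement was.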
+
+## `getSymbolResolution()`:
+
+This is a wrapper around `bundleGraph.getSymbolResolution()`.
+
+The additional dependency argument is used to determine whether CJS interop has to be applied (if it's an ESM import), or whether it's a non-conditional import (and a hoisted `parcelRequire` call has to be generated).
+
+Compared to the bundle graph's method, the `parentAsset` is used to make wrapped assets using their own namespace object refer to `module.exports` instead of `$id$exports`.
+
+- It returns the resolved expression for the specified symbol:
+  - `$id$export$bar` (e.g. same-bundle ESM import),
+  - `$id$exports` (e.g. same-bundle ESM import),
+  - `$id$exports.bar` (e.g. non statically analyzable exports) or
+  - `parcelRequire("id").bar` (wrapped/in another bundle)
+  - `$parcel$interopDefault` (if an ESM default import resolved to a non-statically analyzable CJS asset)
+- also handles interop (if the default symbol is imported and the resolved asset is CJS, use the namespace instead)
+- tracks imports of wrapped assets (which will need `parcelRequire` call) by mutating the `hoistedRequires` list
+
+## `bundleGraph.getSymbolResolution()`
+
+This method transitively/recursively traverses the reexports of the asset to find the specified export. This enables resolving some import to the actual value and not just some reexporting binding.
+
+The result is an `asset`, the `exportSymbol` string, and `symbol`. The value can be accessed from `$asset.id$exports[exportSymbol]`, which is potentially also already (or only) available via the top-level variable `symbol`. So for the add/square example above, `getSymbolResolution(math.js, "add")` would return `{asset: "math.js", exportSymbol: "add", symbol: "$fa6943ce8a6b29$export$add"}`.
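The transitive lookup can be sketched as a short recursion over a toy asset table. This is an illustration under assumptions: the `reexports`/`symbols` map shapes and the `index.js` entry are hypothetical, and the `boundary` parameter, wildcard reexports, and cycle detection are omitted.

```javascript
// Toy model only: the asset table shape is an assumption for illustration.
function resolveSymbolSketch(assets, assetId, exportSymbol) {
  const asset = assets.get(assetId);
  const reexport = asset.reexports.get(exportSymbol);
  if (reexport) {
    // Follow the reexport to the asset that actually defines the value.
    return resolveSymbolSketch(assets, reexport.from, reexport.imported);
  }
  // `symbol` is undefined when the export does not exist on this asset.
  return { asset: assetId, exportSymbol, symbol: asset.symbols.get(exportSymbol) };
}

const toyAssets = new Map([
  [
    "index.js",
    {
      reexports: new Map([["add", { from: "math.js", imported: "add" }]]),
      symbols: new Map(),
    },
  ],
  [
    "math.js",
    {
      reexports: new Map(),
      symbols: new Map([["add", "$fa6943ce8a6b29$export$add"]]),
    },
  ],
]);

const viaReexport = resolveSymbolSketch(toyAssets, "index.js", "add");
const missing = resolveSymbolSketch(toyAssets, "math.js", "nope");
console.log(viaReexport, missing);
```

Here an import of `add` from the hypothetical `index.js` resolves directly to the variable defined in `math.js`, skipping the reexporting binding.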
+
+While this improves code size, an imperfection of this system is that an asset A can use a value from asset B (which is usually modelled with a dependency from A to B) without there actually being a dependency between the two. Dependencies are also used to determine if an asset is required from another bundle and therefore has to be registered with `parcelRequire`. This discrepancy can be handled inside of a single bundle, but not across multiple bundles, so the `boundary` parameter makes the resolution stop once the bundle is left.
+
+There are four possible resolution results:
+
+- the export has been found (with top level variable `symbol`).
+- the export has not been found (`symbol === undefined`); this should have been caught already by symbol propagation
+- the export has been found and is unused (`symbol === false`)
+- it had to bail out because there are multiple possibilities (`symbol === null`), and the caller should fall back to `$resolvedAsset$exports[exportsSymbol]`. Some examples for bailouts are:
+
+  - `export * from "./nonstatic-cjs1.js"; export * from "./nonstatic-cjs2.js";`, so the decision between which reexport to follow should happen at runtime.
+  - if the `resolvedAsset` is a non-static cjs asset itself, then `module.exports[exportsSymbol]` should be used anyway.
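A caller can tell the result kinds apart purely by the value of `symbol`. The sketch below only classifies a result and builds the fallback expression mentioned above. The `kind` labels and the function name are hypothetical, and what the packager actually emits for the unused and missing cases is not shown here.

```javascript
// Hypothetical helper: classifies a resolution result by its `symbol` value.
function describeResolution({ asset, exportSymbol, symbol }) {
  if (symbol === undefined) return { kind: "missing" };
  if (symbol === false) return { kind: "unused" };
  if (symbol === null) {
    // Bailout: fall back to a property access on the namespace object.
    return {
      kind: "bailout",
      expr: `$${asset}$exports[${JSON.stringify(exportSymbol)}]`,
    };
  }
  // Found: the value is available via the top-level variable.
  return { kind: "found", expr: symbol };
}

const found = describeResolution({ asset: "abc", exportSymbol: "add", symbol: "$abc$export$add" });
const bailout = describeResolution({ asset: "abc", exportSymbol: "add", symbol: null });
const unused = describeResolution({ asset: "abc", exportSymbol: "add", symbol: false });
const notFound = describeResolution({ asset: "abc", exportSymbol: "add", symbol: undefined });
console.log(found, bailout, unused, notFound);
```

Note that `undefined`, `false`, and `null` must be compared with `===`, since a plain truthiness check would merge three distinct outcomes.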
diff --git a/docs/Scopehoisting Transformer.md b/docs/Scopehoisting Transformer.md
@@ -0,0 +1,142 @@
+# Scopehoisting Transformer
+
+(Be sure to read [swc Visitors](swc%20Visitors.md) beforehand.)
+
+("Non-static" refers to a variable being used in a way that cannot be optimized, such as `module.exports[someVariable] = 2`, or `import * as x from "..."; console.log(x[someVariable]);`.)
+
+The task of the hoist transformer is, in the simplest case, rewriting imports and exports and renaming the uses of the imports. The packager can then detect these `import "id:...";` statements to inline dependencies, replace `$id$import$foo` with the resolved expression, and generate necessary `$parcel$export(..., () => $id$export$b)` statements.
+
+<table>
+<tr><td>
+
+```js
+// a.js
+import {b} from './b';
+b();
+
+// b.js
+export let b = 2;
+```
+
+</td><td>
+
+```js
+// a.js
+import 'id:./b';
+$id$import$b$b();
+
+// b.js
+let $id$export$b = 2;
+```
+
+</td></tr>
+</table>
+
+While this is rather straightforward for pure ESM, a major source of complexity is having to handle arbitrary CJS while still optimizing as much as possible (non-static `module` accesses, non-top-level `require` calls, ...).
+
+In addition to the code, it sets the symbols and various meta properties on both the asset and the dependencies:
+
+- `asset.meta.id`: depending on which transformers run after the JS transformer, the value of `asset.id` will be different in packager from the id used for the various variables like `$id$export$foo`. The current asset id in the JS transformer is therefore stored.
+- `asset.meta.hasCJSExports`: true if there is at least one CJS export
+- `asset.meta.staticExports`: false if there is at least one CJS export that doesn't follow the pattern `module.exports.foo = ...`
+- `asset.meta.shouldWrap`: Some constructs require this asset being wrapped in a `parcelRequire.register` block: top-level returns, non-static uses of `module`, eval, reassigning `module` or `exports`
+- `dep.meta.shouldWrap`: this is a conditional require
+- `dep.meta.promiseSymbol`: see the "Dynamic Imports" section
+
+## Detecting non-static CJS imports/exports
+
+A commonly used pattern is detecting some special case patterns such as top-level `var x = require("...");` or `aNamespaceObject.foo` or top-level `module.exports.foo = ...;` as high up in the visitor functions as possible and not traversing the children at all if there's a match.
+
+So there is a check for static top-level requires in `visit_module`, and if the `visit_expr` visitor is reached for `require("...")`, it is definitely a non-static (and conditional) require.
+
+The `typeof` visitor doesn't traverse the children if the argument is `module`, so that `typeof module` doesn't count towards the non-static accesses to `module`.
+
+## Self References
+
+Because even `module.exports.foo = ...;` statements are detected and turned into symbols just like ESM exports, reading `module.exports` or `module.exports.foo` would naively not cause all of the exports to be preserved nor a namespace object to be generated (because looking at the graph and the symbol data, they are unused).
+
+So instead, reading `module.exports` is expressed just like it is in ESM: by adding an import to the asset itself with the symbols being used. This is called a "self reference".
+
+## Identifier Names
+
+There are names to uniquely identify an import. The actual format doesn't matter for the code, as long as it's used consistently (Parcel never re-parses these names to retrieve the parts again):
+
+- `$x$import$y` = Asset with id `x` imported the namespace of the dependency with hashed source `y`
+- `$x$import$y$z` = Asset with id `x` imported the hashed export `z` of the dependency with hashed source `y`
+- `$x$require$y` = Asset with id `x` required the namespace of the dependency with hashed source `y`
+
+and to uniquely identify an export:
+
+- `$x$exports` = The namespace exports object of the asset with id `x`
+- `$x$export$y` = The hashed export `y` of the asset with id `x`
+
+(The symbol names are hashed because it's possible to have export names that are invalid JavaScript identifiers: `module.exports["a b"] = 1;` or `export {x as "a b"}`, or via CSS modules.)
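To make the naming scheme concrete, the helpers below assemble names in the formats listed above. This is only a sketch: the helper names are hypothetical, and `hashSketch` is a stand-in (32-bit FNV-1a) chosen for brevity, not the hash Parcel uses.

```javascript
// Stand-in hash for illustration only (32-bit FNV-1a, hex encoded).
function hashSketch(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

// $x$import$y (namespace) or $x$import$y$z (single export)
function importName(assetId, source, exportName) {
  const base = `$${assetId}$import$${hashSketch(source)}`;
  return exportName === undefined ? base : `${base}$${hashSketch(exportName)}`;
}

// $x$export$y
function exportName(assetId, name) {
  return `$${assetId}$export$${hashSketch(name)}`;
}

// "a b" is not a valid identifier, but its hashed form is.
const weirdExport = exportName("abc", "a b");
const namespaceImport = importName("abc", "./b");
const namedImport = importName("abc", "./b", "a b");
console.log(weirdExport, namespaceImport, namedImport);
```

Because the hash output contains only hexadecimal digits, the resulting name is always a valid identifier regardless of what characters the export name contains.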
+
+## Dynamic Imports
+
+Dynamic imports such as `import("..").then(({ foo }) => log(foo));` will only cause `foo` to be used and not the entire asset. But at runtime, we still need a namespace object from which to access `foo`. For this reason, the following source:
+
+```js
+import('./other.js').then(({foo}) => log(foo));
+```
+
+the dependency:
+
+```
+{
+  promiseSymbol: '$assetId$importAsync$other',
+  symbols: {
+    'foo' => {
+      local: '$assetId$importAsync$other$90a7f3efeed30595',
+    }
+  }
+}
+```
+
+the generated code:
+
+```js
+import 'assetId:21eb38ddd81971f9';
+$assetId$importAsync$other.then(({foo}) => log(foo));
+```
+
+So `import()` is replaced by an identifier that isn't actually listed in the symbols (because otherwise a symbol for `*` would prevent removing unused symbols), and this is the identifier stored in `dep.meta.promiseSymbol` which is then used for replacement in the packager.
+
+## Preceding analysis pass: `Collect`
+
+[This analysis](https://github.com/parcel-bundler/parcel/blob/9e2d5d0d60d08d65b5ae6cd765c907a8753bbf39/packages/transformers/js/core/src/hoist.rs#L1291) is used even without scope-hoisting, to generate symbols in development for deferring.
+
+- collect which variable refers to an import/export
+- find evals, non-static accesses of `module`, `exports`, ...
+
+## Actual transformation pass: `Hoist`
+
+Some of the following steps are skipped when the asset was determined to be wrapped during `Collect` (stored in `self.collect.should_wrap`), since `module` and `exports` will be available in that case anyway and no rewriting has to happen for uses of these.
+
+[fold_module](https://github.com/parcel-bundler/parcel/blob/9e2d5d0d60d08d65b5ae6cd765c907a8753bbf39/packages/transformers/js/core/src/hoist.rs#L138):
+
+- match ESM import/export decls
+  - store in `self.hoisted_import` and `self.reexports`, `self.exported_symbols`
+  - imports are replaced with `import "...";`
+  - for exports, just a `var $id$export = xyz` is left, the info what is imported/exported is kept in the maps
+- match statically analyzable `var x = require("y");`.
+  - similarly, the whole statement gets removed and replaced with `import "...";`
+
+Then, various replacements happen:
+
+- [fold_ident](https://github.com/parcel-bundler/parcel/blob/9e2d5d0d60d08d65b5ae6cd765c907a8753bbf39/packages/transformers/js/core/src/hoist.rs#L756) looks up in `collect.imports` whether that identifier refers to an import (this renames expressions that refer to the variable as well as the names of the variable declarations themselves)
+
+- fold_assign_expr
+
+  - replace `module.exports = ...;` with `$id$exports = ...;`
+  - replace `module.exports.foo = ...;` with `$id$export$foo = ...;` and generate a corresponding hoisted `var $id$export$foo;` declaration.
+
+- fold_expr:
+  - replace `module.exports.foo` with `$id$export$foo` identifier
+  - replace `importedNs.foo` with `$id$import$foo` identifier
+  - replace `require("x").foo` with `$id$import$foo` identifier
+  - replace `require("x")` with `$id$import` identifier
+  - replace `import("x")` with `$id$import` identifier
+  - top-level `this` in ESM -> `undefined`
+  - top-level `this` in CJS -> `module.exports`
+  - wrap ESM imports with `(0, ...)` for correct `this`