Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLI commands are slow to start #102

Open
tegefaulkes opened this issue Jan 29, 2024 · 15 comments
Open

CLI commands are slow to start #102

tegefaulkes opened this issue Jan 29, 2024 · 15 comments
Labels
development Standard development r&d:polykey:supporting activity Supporting core activity

Comments

@tegefaulkes
Copy link
Contributor

Specification

There seems to be a lead time before running Polykey does anything. this is demonstrated by running polykey --help to get the help text. This alone takes about 0.5 seconds on my work laptop.

[nix-shell:~/workspace/Polykey-CLI]$ time node dist/polykey --help
Usage: polykey [options] [command]

Polykey CLI

<Help text helpfully redacted to save space in issue>

real    0m0.493s
user    0m0.465s
sys     0m0.078s

Compare this to polykey bootstrap command which does some password hashing and state creation.

[nix-shell:~/workspace/Polykey-CLI]$ time node dist/polykey bootstrap -pf tmp/pass -np tmp/asd2
<recovery code redacted>

real    0m1.854s
user    0m1.613s
sys     0m0.344s

So a good 500ms is spent doing something before anything really runs. We can start by checking that we're importing all of polykey before running any command. After that the likely culprit is commander.

Additional context

Related: #40

Tasks

  1. Check that we don't blindly import all of Polykey during the import stage of the code. Large parts of Polykey should only be imported when the command is run, no when it is defined.
  2. Look into Commander to see if that could be the source of the slow startup time.
  3. Fix anything causing the slow startup time when running Polykey-CLI.
@tegefaulkes tegefaulkes added the development Standard development label Jan 29, 2024
@tegefaulkes
Copy link
Contributor Author

It's also possible that pkg is doing a bunch of setup each time the command is run. So We'll need to isolate where most of the time is spent.

@tegefaulkes tegefaulkes self-assigned this Mar 26, 2024
@tegefaulkes
Copy link
Contributor Author

There are two parts to this one.

  1. Identify the source of the slow command usage. This includes working out a breakdown of where the time is spent. We can do this with profiling but I'll start with some simple AB testing to identify which part is taking up the most time. There are 2 main suspects, PKG unpacking before running and commander doing runtime setup that takes a while.
  2. After identifying what stage is the problem we need to find a way to optimise the time taken during that stage.

To do the simple AB testing we need to Obtain the following data points.

  1. Full time running the help command on the packaged executable.
  2. Full time running the help command with just the bundled code.
  3. Console.time() certain points in the program to gauge time taken for each step.

From that we can infer the time taken for pkg, commander, and parts of the code.

From there we can get a more detailed point of view by attaching a debugger to the code and running profiling on it. The profiling with be node specific so we can't get a picture of what the pkg stage is doing with this.

@tegefaulkes
Copy link
Contributor Author

Running the built linux executable using time stream-run ./result gives us 0.913s.

❯ time steam-run ./result 
Usage: polykey [options] [command]

Polykey CLI

Options:
  -np, --node-path <path>      Path to Node State (default: "/home/brian/.local/share/polykey", env: PK_NODE_PATH)
  -pf, --password-file <path>  Path to Password
  -f, --format <format>        Output Format (choices: "human", "json", default: "human")
  -v, --verbose                Log Verbose Messages (default: 0)
  -V, --version                output the version number
  -h, --help                   display help for command

Commands:
  bootstrap [options]          Bootstrap Keynode State
  agent [options]              Agent Operations
  nodes [options]              Nodes Operations
  secrets [options]            Secrets Operations
  keys [options]               Keys Operations
  vaults [options]             Vaults Operations
  identities [options]         Identities Operations
  notifications [options]      Notifications Operations
  help [command]               display help for command

real    0m0.913s
user    0m0.821s
sys     0m0.244s

Running the bundled code gives us 0.271s.

❯ time node ./dist/polykey.js
Usage: polykey [options] [command]

Polykey CLI

Options:
  -np, --node-path <path>      Path to Node State (default: "/home/brian/.local/share/polykey", env: PK_NODE_PATH)
  -pf, --password-file <path>  Path to Password
  -f, --format <format>        Output Format (choices: "human", "json", default: "human")
  -v, --verbose                Log Verbose Messages (default: 0)
  -V, --version                output the version number
  -h, --help                   display help for command

Commands:
  bootstrap [options]          Bootstrap Keynode State
  agent [options]              Agent Operations
  nodes [options]              Nodes Operations
  secrets [options]            Secrets Operations
  keys [options]               Keys Operations
  vaults [options]             Vaults Operations
  identities [options]         Identities Operations
  notifications [options]      Notifications Operations
  help [command]               display help for command

real    0m0.271s
user    0m0.287s
sys     0m0.042s

Just the built code with node gives us 0.301s.

❯ time node ./build/polykey.js
Usage: polykey [options] [command]

Polykey CLI

Options:
  -np, --node-path <path>      Path to Node State (default: "/home/brian/.local/share/polykey", env: PK_NODE_PATH)
  -pf, --password-file <path>  Path to Password
  -f, --format <format>        Output Format (choices: "human", "json", default: "human")
  -v, --verbose                Log Verbose Messages (default: 0)
  -V, --version                output the version number
  -h, --help                   display help for command

Commands:
  bootstrap [options]          Bootstrap Keynode State
  agent [options]              Agent Operations
  nodes [options]              Nodes Operations
  secrets [options]            Secrets Operations
  keys [options]               Keys Operations
  vaults [options]             Vaults Operations
  identities [options]         Identities Operations
  notifications [options]      Notifications Operations
  help [command]               display help for command

real    0m0.301s
user    0m0.336s
sys     0m0.062s

As a sanity check, ls takes about 0.009s, and running node that just prints helloworld runs in 0.046s.

❯ time ls
<output redacted>

real    0m0.009s
user    0m0.006s
sys     0m0.004s


❯ time node -e "console.log('helloworld')"
helloworld

real    0m0.046s
user    0m0.034s
sys     0m0.014s

❯ time steam-run ls
ADDITIONAL_TERMS  docs        jest.config.js  nodemon.json  package-lock.json  scripts  tests                tsconfig.json
build             flake.lock  LICENSE         npmDepsHash   README.md          secret   tmp                  utils.nix
dist              flake.nix   node_modules    package.json  result             src      tsconfig.build.json

real    0m0.121s
user    0m0.049s
sys     0m0.077s


So given these results

  1. 0.913s total 0.913
  2. 0.271s bundled only
  3. 0.121 steam-run

It seems most of the time is spent by steam-run + pkg. I'm estimating about 0.642 taken up by both and 0.521 take up by pkg alone.

So moving forward we need to see how we can optimise pkg at runtime.

@tegefaulkes
Copy link
Contributor Author

I did a quick and dirty package of a test.js script that just runs console.log('hello world'); This takes 0.142 seconds.

❯ time steam-run ./packaged
hello world!
steam-run ./packaged  0.07s user 0.08s system 101% cpu 0.142 total

Subtracting the steam-run startup time, we're looking at about 0.021s run time. So pkg can definitely run fast, it seems to be a function of the file size.

@tegefaulkes
Copy link
Contributor Author

tegefaulkes commented Mar 26, 2024

There is a compression option. It defaults to none and switching it to something else seems to increase the time.

None of the pkg options seems useful here. I think the only recourse here is to reduce the size of the of the native binaries in some way.

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Mar 26, 2024 via email

@tegefaulkes
Copy link
Contributor Author

base case

real 0m0.923s
user 0m0.839s
sys 0m0.226s

removed --no-bytecode

real 0m0.949s
user 0m0.875s
sys 0m0.221s

removed --no-bytecode and --no-native-build

real 0m0.980s
user 0m0.842s
sys 0m0.290s

No real advantages.

And yes, minified is true and I removed the source maps for esbuild.

@tegefaulkes
Copy link
Contributor Author

Doing manual testing on the office mac, I removed all the extra native modules for platforms we don't need. This brought execution down from 0.72 to 0.57s and the file size down from 100M to 77M

Polykey-CLI % time ./testout
Usage: polykey [options] [command]

Polykey CLI

Options:
  -np, --node-path <path>      Path to Node State (default: "/Users/amyyan/Library/Application Support/polykey",
                               env: PK_NODE_PATH)
  -pf, --password-file <path>  Path to Password
  -f, --format <format>        Output Format (choices: "human", "json", default: "human")
  -v, --verbose                Log Verbose Messages (default: 0)
  -V, --version                output the version number
  -h, --help                   display help for command

Commands:
  bootstrap [options]          Bootstrap Keynode State
  agent [options]              Agent Operations
  nodes [options]              Nodes Operations
  secrets [options]            Secrets Operations
  keys [options]               Keys Operations
  vaults [options]             Vaults Operations
  identities [options]         Identities Operations
  notifications [options]      Notifications Operations
  help [command]               display help for command
./testout  0.57s user 0.06s system 113% cpu 0.562 total

So that's a good chunk right there. While not a full solution, reducing the size of the final binary is definitely something we want to do. So I'll explore this a bit more.

@tegefaulkes
Copy link
Contributor Author

debuglog.txt
pkglog.txt

Here are some debug logs from testing on the mac. The debuglog outlines what is included in the virtualFS. while the pkglog outlines the packaging process.

We can see that there is 28.2MB of data being included. A good chunk of this, 19.8MB is just the quic and db universal binaries. But ignoring the size of the files for a sec, the number of files can have an impact too. So we should look into reducing all kinds of bloat here. Notably the sodium-native is including its full source code which makes up 1.3MB of assorted small files.

Moving forward I'm going to prune more of the unneeded files that are being included but also look into building the db and quic binaries with more optimisation if possible.

@tegefaulkes
Copy link
Contributor Author

Marking all of the optionalDependencies as --external in esbuild is resulting in a bunch of dependencies we don't want being pulled into the vfs. This seems problimatic since it's including stuff like async-init or timer in the vfs but also bundled inside of the polykey.js code. I'm pretty sure maintaining 2 versions of a dependency like this should be causing issues with anything that uses instanceOf. I'm really not sure why this hasn't broken timer.

We really need to be more selective about this. Maybe I can modify it to mark only .node files as external.

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Mar 27, 2024 via email

@tegefaulkes
Copy link
Contributor Author

OK, I'll put it on hold for now.

With some testing I got the mac execution down to 0.48s. There is definitely a lot we can do to reduce bloat in the final executable. The @matrixai/db has some issues with how it's structured. It doesn't play nice with esbuild. Given that it's native we need to mark it as external to preserve paths for loading the native modules. But doing so causes us to external all it's dependencies.

I think we need to consider updating @matrixai/db to use the same structure @matrixai/quic uses. That way we can bundle the main code while marking the native dependencies as external with no issue.

The sodium native source code is being included in the packaged vfs as well. So we'll need to exclude that. That alone is adding about 1.8MB to the output.

@CMCDragonkai
Copy link
Member

#84 is the high level issue surrounding this problem. Recommend solving that at higher level.

@tegefaulkes tegefaulkes removed their assignment Jul 31, 2024
@CMCDragonkai CMCDragonkai added the r&d:polykey:supporting activity Supporting core activity label Aug 13, 2024
@CMCDragonkai
Copy link
Member

A few ideas:

  1. Get all native libraries statically linked into the node binary as per Statically Linked version of PK CLI #84
  2. Which requires building node binary from scratch rather than bundling the prebuilt ones from yao/pkg
  3. Conversion to ESM and the ability to use dead code elimination through tree shaking would be important, and should reduce the bundle size quite a bit
  4. Much stronger minification - specifically attempting byte code generation https://nodejs.org/api/single-executable-applications.html#startup-snapshot-support and https://nodejs.org/api/single-executable-applications.html#v8-code-cache-support
  5. Avoid as much unpacking to disk and reloading stuff from disk, usually shared object dynamically loadedcode
  6. Review the PK CLI architecture to ensure that only relevant libraries are loaded and nothing else
  7. The native shared objects should be compiled for optimisation - I don't think they have optimisation enabled, and they are relatively debug related object files.
  8. Evne

Tagging @brynblack and @aryanjassal too.

@CMCDragonkai
Copy link
Member

CMCDragonkai commented Aug 24, 2024

We need to understand the V8 system internals a bit better (node --v8-options). And even optimise the memory utilisation too. Having the HTTP status page MatrixAI/Polykey#412 and improving the MatrixAI/Polykey#635 would be critical to enabling us to do this work empirically instead of just trying random stuff. It needs to be more methodical from this point onwards.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
development Standard development r&d:polykey:supporting activity Supporting core activity
Development

No branches or pull requests

2 participants