Replies: 8 comments 15 replies
-
Hi @Quarlos, In general, to get faster startups, you have a few options:
Using Julia for the backend is about much more than just parallelism. Julia basically gets you the same performance as C (https://julialang.org/benchmarks/) while still being a high-level language, so it's just as easy to develop as a Python package. It took me about a year of cumulative full-time work to write the current backend in Julia, whereas if I had written the backend in C++, it would probably have taken me 5+ years! (I used to do things like this in C++ 🙂) Julia also has a lot of metaprogramming tricks that let you generate and compile code on the fly. For example, you can define a custom operator or loss function in PySR by passing a string, and it will be compiled into a fast SIMD kernel by the Julia backend. That is something you simply cannot do with a statically compiled language like C.
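To give a rough feel for that runtime code generation, here is a loose pure-Python analogy (a sketch only; the `make_operator` helper and the operator string are hypothetical, and this is not PySR's actual mechanism). The key difference is that Julia compiles such a string all the way down to native, SIMD-capable machine code, whereas Python can only cache bytecode and still pays interpreter overhead on every call:

```python
import numpy as np

def make_operator(expr_string):
    # Compile the operator string once into cached bytecode.
    # (Loose analogy: Julia would instead JIT-compile this to native code.)
    code = compile(expr_string, "<operator>", "eval")
    return lambda x: eval(code, {"np": np, "x": x})

# A hypothetical custom operator, as one might pass as a string to PySR:
myop = make_operator("np.cos(x) * x**2")
out = myop(np.array([0.0, 1.0]))
```

Even with the bytecode cached, each call to `myop` still goes through the Python interpreter, which is the overhead the Julia backend avoids.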
Why not pure Python: speed. Genetic algorithms are pretty simple, and the only way to get good results is to make every part of them (expression evaluation, constant optimization, mutations, crossovers, migration, etc.) really, really fast, so they can churn through billions of expressions. For example, let's compare dynamic expression evaluation in numpy (which has a C backend) with SymbolicRegression.jl. Note the emphasis on dynamic: the expression itself is not allowed to be pre-compiled; only the operators are.

First, here's numpy (importantly, we pass a string defining the function so that the tokenization isn't cached!):

```python
import numpy as np

def f(X, expr_string):
    return eval(expr_string)

expr_string = "np.cos(X[:, 0] * 3.2 - 1.5) * np.exp(- X[:, 3] ** 2) - X[:, 4] * X[:, 2] + np.cos(X[:, 0] * 3.2 - 1.5) * np.exp(- X[:, 3] ** 2) - X[:, 4] * X[:, 2] + np.cos(X[:, 0] * 3.2 - 1.5) * np.exp(- X[:, 3] ** 2) - X[:, 4] * X[:, 2] + np.cos(X[:, 0] * 3.2 - 1.5) * np.exp(- X[:, 3] ** 2) - X[:, 4] * X[:, 2] + np.cos(X[:, 0] * 3.2 - 1.5) * np.exp(- X[:, 3] ** 2) - X[:, 4] * X[:, 2] + np.cos(X[:, 0] * 3.2 - 1.5) * np.exp(- X[:, 3] ** 2) - X[:, 4] * X[:, 2]"
X = np.random.randn(300, 5)

%%timeit
f(X, expr_string)
# 156 µs ± 184 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
```

Next, let's see SymbolicRegression.jl:

```julia
using SymbolicRegression
using BenchmarkTools

# Define the operator set:
options = Options(binary_operators=(*, -, +, /), unary_operators=(square, cos, exp))
x1, x2, x3, x4, x5 = Node("x1"), Node("x2"), Node("x3"), Node("x4"), Node("x5")
equation = cos(x1 * 3.2 - 1.5) * exp(-1.0 * square(x4)) - x5 * x3 + cos(x1 * 3.2 - 1.5) * exp(-1.0 * square(x4)) - x5 * x3 + cos(x1 * 3.2 - 1.5) * exp(-1.0 * square(x4)) - x5 * x3 + cos(x1 * 3.2 - 1.5) * exp(-1.0 * square(x4)) - x5 * x3 + cos(x1 * 3.2 - 1.5) * exp(-1.0 * square(x4)) - x5 * x3 + cos(x1 * 3.2 - 1.5) * exp(-1.0 * square(x4)) - x5 * x3
X = randn(Float64, 5, 300)  # Note that SymbolicRegression.jl has columns first
@btime eval_tree_array(equation, X, options)
# 45.334 μs (137 allocations: 69.92 KiB)
```

And this is comparing against numpy, which already has a C backend! The point I am trying to make is that the pure-Python overhead really hurts you, even in cases like this. For all the other parts of the genetic algorithm, where the comparison would be against pure-Python code, the speedup is even more apparent.

Why not only use Julia: compatibility. The vast majority of machine learning packages are in Python, so it just makes sense to have a Python frontend. Most ML packages have C backends with Python frontends, so PySR is pretty similar in that sense; it just has its backend written in a high-level language instead of a low-level one (for productivity reasons).

Cheers,
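For intuition about what "dynamic expression evaluation" means here, a toy pure-Python version of a tree walk over operator nodes might look like the sketch below (this `Node` class and `eval_tree` are hypothetical, not SymbolicRegression.jl's actual implementation). Every node visit pays Python interpreter dispatch overhead, which is exactly the cost the compiled Julia backend avoids:

```python
import numpy as np

# Toy expression-tree evaluator: walk a dynamic tree of operator nodes.
# (Hypothetical Node layout, for illustration only.)
class Node:
    def __init__(self, op=None, children=(), feature=None, value=None):
        self.op = op            # callable operator, e.g. np.multiply
        self.children = children
        self.feature = feature  # column index, for variable leaves
        self.value = value      # constant, for constant leaves

def eval_tree(node, X):
    if node.feature is not None:
        return X[:, node.feature]
    if node.value is not None:
        return np.full(X.shape[0], node.value)
    return node.op(*(eval_tree(c, X) for c in node.children))

# Evaluate cos(x0 * 3.2) over a batch of data:
tree = Node(op=np.cos, children=(
    Node(op=np.multiply, children=(Node(feature=0), Node(value=3.2))),
))
X = np.random.randn(300, 5)
result = eval_tree(tree, X)
```

The trees change constantly during a genetic search, so this walk cannot be compiled away per expression; only the operators themselves can be fast, which is why per-node overhead dominates in pure Python.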
-
Oh, and for using PySR over several machines at once, check out the `cluster_manager` parameter.
-
One other idea, if you are open to messing around with Julia code: there is a package called SnoopCompile.jl which I've been meaning to try with SymbolicRegression.jl. Using it would basically let Julia's precompilation (which happens at install time, rather than at every runtime) do a much better job of compiling the necessary functions. That would probably make startup much faster.
-
Thanks for the detailed answer, particularly on the point of Julia and how important it is here for being able to pass in custom operators on the fly with enough compilation/evaluation speed. Unfortunately, option 1 doesn't work well with how my system is set up. I preprocess the data through another, non-Python process and pipe it to a wrapper of PySR, so I can adjust the operators and loss according to the data. Sometimes that may also be done by different users needing different Python libraries to be loaded. So I guess I'll explore PackageCompiler.jl and SnoopCompile.jl as you suggested. Also, thanks for pointing to the cluster_manager param; I'll look into that as well. Best,
-
In fact, I had already explored PackageCompiler.jl a bit, and it did speed things up; that's how I got from ~2.5 min launch time down to ~1 min. Posting it here in case someone finds it helpful and/or can suggest a more appropriate way of doing this. I created a precompiled Julia system image as: And then, in julia_helper.py, in the init_julia function, I force it to go through the image as:
-
Thanks for posting the part about using PackageCompiler.jl! That's very useful to know. I wonder how hard it would be to set that up to be done automatically as part of the install step. If you figure out how to do this, feel free to submit a PR!
Thanks for explaining this use-case. One other thing you might want to try is DaemonMode.jl. It runs a Julia process in the background, which other Julia processes can execute code on. What this basically means is that you might (?) be able to connect to and execute code on the same Julia process from different Python processes, and reuse the same cached functions. I wonder if that could solve your issue.
-
For PackageCompiler.jl, it looks like you can pass the argument `precompile_execution_file`. So, I think you could basically create a script like this instead:

```julia
using PyCall
using SymbolicRegression

X = randn(Float32, 5, 100)
y = 2 * cos.(X[4, :]) + X[1, :] .^ 2 .- 2
options = SymbolicRegression.Options(;
    binary_operators=(+, *, /, -),
    unary_operators=(cos, exp),
    verbosity=0,
    max_evals=1
)
hall_of_fame = EquationSearch(X, y; options=options, multithreading=true)
```

and pass it in the `precompile_execution_file` argument.
-
Note for myself: the following script builds a decent sysimage:

```bash
PYSR_VERSION=0.11.4
SHARED_LIB="/Users/mcranmer/symbolic_regression.so"
julia -O3 --threads=auto -e 'using Pkg; Pkg.activate("pysr-'$PYSR_VERSION'"; shared=true); Pkg.add(["PackageCompiler", "PyCall", "Conda", "VersionParsing"]); open("/tmp/sr_precompile_file.jl", "w") do io; write(io, "import PyCall; include(joinpath(pkgdir(PyCall), \"test\", \"runtests.jl\")); using SymbolicRegression; X=randn(Float64, 5, 100); y=X[1, :] .+ 5.2 .* cos.(X[3, :] .* 1.2); options=Options(verbosity=0); EquationSearch(X, y; options=options, multithreading=true)"); end; using PackageCompiler; using SymbolicRegression; create_sysimage(["SymbolicRegression"]; sysimage_path="'$SHARED_LIB'", precompile_execution_file="/tmp/sr_precompile_file.jl")'
```

Then, one can simply run the following before starting a PySR run:

```python
from julia.api import LibJulia

api = LibJulia.load()
api.sysimage = "/Users/mcranmer/symbolic_regression.so"
api.init_julia()
```

This should be much more effective once MilesCranmer/SymbolicRegression.jl#147 gets merged, as changing the operators will not cause the library to be recompiled, only the expression evaluation scheme (which is small anyway).
-
Hi Miles,
Thanks for putting all of this together. Great work!
I’ve been using PySR to research several problems at the same time: different datasets, different search spaces, different params.
Every time, I need to launch a new PySR instance so it loads all the appropriate params, and every time PySR makes a call to launch Julia and load all the necessary libraries. Launching Julia generally takes a long time compared to how long SymbolicRegression.jl runs once it's loaded: on my machine it takes about 10 s for the symbolic regression algorithm to run, and about 1-2 minutes to launch Julia. Most of the time there's some issue with the features I pass in, and/or I want to try different params once it starts. If the machine is loaded with other processes, launching Julia can take 10x longer.
I’m not familiar with Julia, so at the moment I’m really stuck with the Python frontend.
Two questions:
Thanks!