Replies: 8 comments 15 replies
-
Hi @Quarlos, In general, to get faster startups, you have a few options:
Using Julia for the backend is about much more than just parallelism. Julia basically gets you the same performance as C (https://julialang.org/benchmarks/) while still being a high-level language, so it's just as easy to develop as a Python package. It took me about a year of cumulative full-time work to write the current backend in Julia, whereas if I had written the backend in C++, it would probably have taken me 5+ years! (I used to do things like this in C++ 🙂) Julia also has a lot of metaprogramming tricks that let you generate and compile code on the fly. For example, you can define a custom operator or loss function in PySR by passing a string, and it will be compiled into a fast SIMD kernel by the Julia backend. That is something you simply cannot do with a statically compiled language like C.
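To give a rough feel for that runtime code generation, here is a loose pure-Python analogy (a sketch only; the `make_operator` helper and the operator string are hypothetical, and this is not PySR's actual mechanism). The key difference is that Julia compiles such a string all the way down to native, SIMD-capable machine code, whereas Python can only cache bytecode and still pays interpreter overhead on every call:

```python
import numpy as np

def make_operator(expr_string):
    # Compile the operator string once into cached bytecode.
    # (Loose analogy: Julia would instead JIT-compile this to native code.)
    code = compile(expr_string, "<operator>", "eval")
    return lambda x: eval(code, {"np": np, "x": x})

# A hypothetical custom operator, as one might pass as a string to PySR:
myop = make_operator("np.cos(x) * x**2")
out = myop(np.array([0.0, 1.0]))
```

Even with the bytecode cached, each call to `myop` still goes through the Python interpreter, which is the overhead the Julia backend avoids.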
Why not pure Python: speed. Genetic algorithms are pretty simple, and the only way to get good results is to make every part of them (expression evaluation, constant optimization, mutations, crossovers, migration, etc.) really, really fast, so they can churn through billions of expressions. For example, let's compare dynamic expression evaluation in numpy (which has a C backend) with SymbolicRegression.jl. Note the emphasis on dynamic: the expression itself is not allowed to be pre-compiled; only the operators are.

First, here's numpy (importantly, we pass a string defining the function so that the tokenization isn't cached!):

```python
import numpy as np

def f(X, expr_string):
    return eval(expr_string)

expr_string = "np.cos(X[:, 0] * 3.2 - 1.5) * np.exp(- X[:, 3] ** 2) - X[:, 4] * X[:, 2] + np.cos(X[:, 0] * 3.2 - 1.5) * np.exp(- X[:, 3] ** 2) - X[:, 4] * X[:, 2] + np.cos(X[:, 0] * 3.2 - 1.5) * np.exp(- X[:, 3] ** 2) - X[:, 4] * X[:, 2] + np.cos(X[:, 0] * 3.2 - 1.5) * np.exp(- X[:, 3] ** 2) - X[:, 4] * X[:, 2] + np.cos(X[:, 0] * 3.2 - 1.5) * np.exp(- X[:, 3] ** 2) - X[:, 4] * X[:, 2] + np.cos(X[:, 0] * 3.2 - 1.5) * np.exp(- X[:, 3] ** 2) - X[:, 4] * X[:, 2]"
X = np.random.randn(300, 5)

%%timeit
f(X, expr_string)
# 156 µs ± 184 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
```

Next, let's see SymbolicRegression.jl:

```julia
using SymbolicRegression
using BenchmarkTools

# Define the operator set:
options = Options(binary_operators=(*, -, +, /), unary_operators=(square, cos, exp))
x1, x2, x3, x4, x5 = Node("x1"), Node("x2"), Node("x3"), Node("x4"), Node("x5")
equation = cos(x1 * 3.2 - 1.5) * exp(-1.0 * square(x4)) - x5 * x3 + cos(x1 * 3.2 - 1.5) * exp(-1.0 * square(x4)) - x5 * x3 + cos(x1 * 3.2 - 1.5) * exp(-1.0 * square(x4)) - x5 * x3 + cos(x1 * 3.2 - 1.5) * exp(-1.0 * square(x4)) - x5 * x3 + cos(x1 * 3.2 - 1.5) * exp(-1.0 * square(x4)) - x5 * x3 + cos(x1 * 3.2 - 1.5) * exp(-1.0 * square(x4)) - x5 * x3
X = randn(Float64, 5, 300)  # Note that SymbolicRegression.jl has columns first
@btime eval_tree_array(equation, X, options)
# 45.334 μs (137 allocations: 69.92 KiB)
```

And this is comparing against numpy, which already has a C backend! The point I am trying to make is that the pure-Python overhead really hurts you, even in cases like this. For all the other parts of the genetic algorithm, where the comparison would be against pure-Python code, the speedup is even more apparent.

Why not only use Julia: compatibility. The vast majority of machine learning packages are in Python, so it just makes sense to have a Python frontend. Most ML packages have C backends with Python frontends, so PySR is pretty similar in that sense; it just has its backend written in a high-level language instead of a low-level one (for productivity reasons).

Cheers,
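For intuition about what "dynamic expression evaluation" means here, a toy pure-Python version of a tree walk over operator nodes might look like the sketch below (this `Node` class and `eval_tree` are hypothetical, not SymbolicRegression.jl's actual implementation). Every node visit pays Python interpreter dispatch overhead, which is exactly the cost the compiled Julia backend avoids:

```python
import numpy as np

# Toy expression-tree evaluator: walk a dynamic tree of operator nodes.
# (Hypothetical Node layout, for illustration only.)
class Node:
    def __init__(self, op=None, children=(), feature=None, value=None):
        self.op = op            # callable operator, e.g. np.multiply
        self.children = children
        self.feature = feature  # column index, for variable leaves
        self.value = value      # constant, for constant leaves

def eval_tree(node, X):
    if node.feature is not None:
        return X[:, node.feature]
    if node.value is not None:
        return np.full(X.shape[0], node.value)
    return node.op(*(eval_tree(c, X) for c in node.children))

# Evaluate cos(x0 * 3.2) over a batch of data:
tree = Node(op=np.cos, children=(
    Node(op=np.multiply, children=(Node(feature=0), Node(value=3.2))),
))
X = np.random.randn(300, 5)
result = eval_tree(tree, X)
```

The trees change constantly during a genetic search, so this walk cannot be compiled away per expression; only the operators themselves can be fast, which is why per-node overhead dominates in pure Python.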
-
Oh, and for using PySR over several machines at once, check out the `cluster_manager` parameter.
-
One other idea, if you are open to messing around with Julia code: there is a package called SnoopCompile.jl which I've been meaning to try with SymbolicRegression.jl. Using it would basically let Julia's precompilation (which happens at install time, rather than at every runtime) do a much better job of compiling the necessary functions. That would probably make startup much faster.
-
Thanks for the detailed answer, particularly on the point of Julia and how important it is here for being able to pass in custom operators on the fly with enough compilation/evaluation speed. Unfortunately, option 1 doesn't work well with how my system is set up. I preprocess the data through another, non-Python process and pipe it to a wrapper of PySR, so I can adjust the operators and loss according to the data. Sometimes that may also be done by different users needing different Python libraries to be loaded. So I guess I'll explore PackageCompiler.jl and SnoopCompile.jl as you suggested. Also, thanks for pointing to the cluster_manager param; I'll look into that as well. Best,
-
In fact, I had already explored PackageCompiler.jl a bit, and it did speed things up; that's how I got from ~2.5 min launch time down to ~1 min. Posting it here in case someone finds it helpful and/or can suggest a more appropriate way of doing this. I created a precompiled Julia system image as: And then, in julia_helper.py, in the init_julia function, I force it to go through the image as:
-
Thanks for posting the part about using PackageCompiler.jl! That's very useful to know. I wonder how hard it would be to set that up to be done automatically as part of the install step. If you figure out how to do this, feel free to submit a PR!
Thanks for explaining this use-case. One other thing you might want to try is DaemonMode.jl. It runs a Julia process in the background, which other Julia processes can execute code on. What this basically means is that you might (?) be able to connect to and execute code on the same Julia process from different Python processes, and reuse the same cached functions. I wonder if that could solve your issue.
-
For PackageCompiler.jl, it looks like you can pass the argument `precompile_execution_file`. So, I think you could basically create a script like this instead:

```julia
using PyCall
using SymbolicRegression

X = randn(Float32, 5, 100)
y = 2 * cos.(X[4, :]) + X[1, :] .^ 2 .- 2
options = SymbolicRegression.Options(;
    binary_operators=(+, *, /, -),
    unary_operators=(cos, exp),
    verbosity=0,
    max_evals=1
)
hall_of_fame = EquationSearch(X, y; options=options, multithreading=true)
```

and pass it in the `precompile_execution_file` argument.
-
Note for myself: the following script builds a decent sysimage:

```bash
PYSR_VERSION=0.11.4
SHARED_LIB="/Users/mcranmer/symbolic_regression.so"
julia -O3 --threads=auto -e 'using Pkg; Pkg.activate("pysr-'$PYSR_VERSION'"; shared=true); Pkg.add(["PackageCompiler", "PyCall", "Conda", "VersionParsing"]); open("/tmp/sr_precompile_file.jl", "w") do io; write(io, "import PyCall; include(joinpath(pkgdir(PyCall), \"test\", \"runtests.jl\")); using SymbolicRegression; X=randn(Float64, 5, 100); y=X[1, :] .+ 5.2 .* cos.(X[3, :] .* 1.2); options=Options(verbosity=0); EquationSearch(X, y; options=options, multithreading=true)"); end; using PackageCompiler; using SymbolicRegression; create_sysimage(["SymbolicRegression"]; sysimage_path="'$SHARED_LIB'", precompile_execution_file="/tmp/sr_precompile_file.jl")'
```

Then, one can simply run the following before starting a PySR run:

```python
from julia.api import LibJulia

api = LibJulia.load()
api.sysimage = "/Users/mcranmer/symbolic_regression.so"
api.init_julia()
```

This should be much more effective once MilesCranmer/SymbolicRegression.jl#147 gets merged, as changing the operators will not cause the library to be recompiled, only the expression evaluation scheme (which is small anyway).
-
Hi Miles,
Thanks for putting all of this together. Great work!
I’ve been using PySR to research several problems at the same time: different datasets, different search spaces, different params.
Every time, I need to launch a new PySR instance so it loads all the appropriate params, and every time PySR makes a call to launch Julia and load all the necessary libraries. Launching Julia generally takes a long time compared to how long SymbolicRegression.jl runs once it's loaded: on my machine it takes about 10 s for the symbolic regression algorithm to run, and about 1-2 minutes to launch Julia. Most of the time there's some issue with the features I pass in, and/or I want to try different params once it starts. If the machine is loaded with other processes, launching Julia can take 10x longer.
I’m not familiar with Julia, so at the moment I’m really stuck with the Python frontend.
Two questions:
Thanks!