Physical symbolic optimization (
Source code: WassimTenachi/PhySO
Documentation: physo.readthedocs.io
physo
is able to leverage:
-
Physical units constraints, reducing the search space with dimensional analysis ([Tenachi et al 2023])
-
Class constraints, searching for a single analytical functional form that accurately fits multiple datasets - each governed by its own (possibly) unique set of fitting parameters ([Tenachi et al 2024])
demo_light.mp4
Performances on the standard Feynman benchmark from SRBench) comprising 120 expressions from the Feynman Lectures on Physics against popular SR packages.
The package has been tested on:
- Linux
- OSX (ARM & Intel)
- Windows
To install the package it is recommended to first create a conda virtual environment:
conda create -n PhySO python=3.8
And activate it:
conda activate PhySO
physo
can be downloaded using git:
git clone https://github.com/WassimTenachi/PhySO
Or by direct downloading a zip of the repository: here
From the repository root:
Installing dependencies :
conda install --file requirements.txt
In order to simplify the installation process, since its first version, physo
has been updated to have minimal very standard dependencies.
NOTE : physo
supports CUDA but it should be noted that since the bottleneck of the code is free constant optimization, using CUDA (even on a very high-end GPU) does not improve performances over a CPU and can actually hinder performances.
Installing physo
to the environment (from the repository root):
python -m pip install -e .
Import test:
python3
>>> import physo
This should result in physo
being successfully imported.
Unit tests:
Running all unit tests except parallel mode ones (from the repository root):
python -m unittest discover -p "*UnitTest.py"
This should result in all tests being successfully passed.
Running all unit tests (from the repository root):
python -m unittest discover -p "*Test.py"
This should take 5-15 min depending on your system (as if you have a lot of CPU cores, it will take longer to make the efficiency curves).
Uninstalling the package.
conda deactivate
conda env remove -n PhySO
In this tutorial, we show how to use physo
to perform Symbolic Regression (SR).
The reference notebook for this tutorial can be found here: sr_quick_start.ipynb.
Importing the necessary libraries:
# External packages
import numpy as np
import matplotlib.pyplot as plt
import torch
Importing physo
:
# Internal code import
import physo
import physo.learn.monitoring as monitoring
It is recommended to fix the seed for reproducibility:
# Seed
seed = 0
np.random.seed(seed)
torch.manual_seed(seed)
Making a toy synthetic dataset:
# Making toy synthetic data
z = np.random.uniform(-10, 10, 50)
v = np.random.uniform(-10, 10, 50)
X = np.stack((z, v), axis=0)
y = 1.234*9.807*z + 1.234*v**2
It should be noted that free constants search starts around 1. by default. Therefore when using default hyperparameters, normalizing the data around an order of magnitude of 1 is strongly recommended.
DA side notes:
On can consider the physical units of xx_units
arguments (or specify them as [0,0]
for all variables/constants) and physo
will perform a dimensionless symbolic regression task.
Datasets plot:
n_dim = X.shape[0]
fig, ax = plt.subplots(n_dim, 1, figsize=(10,5))
for i in range (n_dim):
curr_ax = ax if n_dim==1 else ax[i]
curr_ax.plot(X[i], y, 'k.',)
curr_ax.set_xlabel("X[%i]"%(i))
curr_ax.set_ylabel("y")
plt.show()
It should be noted that SR capabilities of physo
are heavily dependent on hyperparameters, it is therefore recommended to tune hyperparameters to your own specific problem for doing science.
Summary of currently available hyperparameters presets configurations:
Config | Recommended usecases | Speed | Effectiveness | Notes |
---|---|---|---|---|
config0 |
Demos | ★★★ | ★ | Light and fast config. |
config1 |
SR with DA $^$ ; Class SR with DA $^$ | ★ | ★★★ | Config used for Feynman Benchmark and MW streams Benchmark. |
config2 |
SR ; Class SR | ★★ | ★★ | Config used for Class Benchmark. |
Users are encouraged to edit configurations (they can be found in: physo/config/).
By default, config0
is used, however it is recommended to follow the upper recommendations for doing science.
DA side notes:
- During the first tens of iterations, the neural network is typically still learning the rules of dimensional analysis, resulting in most candidates being discarded and not learned on, effectively resulting in a much smaller batch size (typically 10x smaller), thus making the evaluation process much less computationally expensive. It is therefore recommended to compensate this behavior by using a higher batch size configuration which helps provide the neural network sufficient learning information.
Logging and visualisation setup:
save_path_training_curves = 'demo_curves.png'
save_path_log = 'demo.log'
run_logger = lambda : monitoring.RunLogger(save_path = save_path_log,
do_save = True)
run_visualiser = lambda : monitoring.RunVisualiser (epoch_refresh_rate = 1,
save_path = save_path_training_curves,
do_show = False,
do_prints = True,
do_save = True, )
Given variables data
DA side notes:
Here we are allowing the use of a fixed constant
It should be noted that here the units vector are of size 3 (eg: [1, 0, 0]
) as in this example the variables have units dependent on length, time and mass only.
However, units vectors can be of any size
# Running SR task
expression, logs = physo.SR(X, y,
# Giving names of variables (for display purposes)
X_names = [ "z" , "v" ],
# Associated physical units (ignore or pass zeroes if irrelevant)
X_units = [ [1, 0, 0] , [1, -1, 0] ],
# Giving name of root variable (for display purposes)
y_name = "E",
y_units = [2, -2, 1],
# Fixed constants
fixed_consts = [ 1. ],
fixed_consts_units = [ [0,0,0] ],
# Free constants names (for display purposes)
free_consts_names = [ "m" , "g" ],
free_consts_units = [ [0, 0, 1] , [1, -2, 0] ],
# Symbolic operations that can be used to make f
op_names = ["mul", "add", "sub", "div", "inv", "n2", "sqrt", "neg", "exp", "log", "sin", "cos"],
get_run_logger = run_logger,
get_run_visualiser = run_visualiser,
# Run config
run_config = physo.config.config0.config0,
# Parallel mode (only available when running from python scripts, not notebooks)
parallel_mode = False,
# Number of iterations
epochs = 20
)
Getting best expression:
The best expression found (in accuracy) is returned in the expression
variable:
best_expr = expression
print(best_expr.get_infix_pretty())
>>>
2
-g⋅m⋅z + -v⋅v⋅sin (1.0)⋅1.0⋅m
It can also be loaded later on from log files:
import physo
from physo.benchmark.utils import symbolic_utils as su
import sympy
# Loading pareto front expressions
pareto_expressions = physo.read_pareto_pkl("demo_curves_pareto.pkl")
# Most accurate expression is the last in the Pareto front:
best_expr = pareto_expressions[-1]
print(best_expr.get_infix_pretty())
Display:
The expression can be converted into...
A sympy expression:
best_expr.get_infix_sympy()
>>> -g*m*z - v*v*sin(1.0)**2*1.0*m
A sympy expression (with evaluated free constants values):
best_expr.get_infix_sympy(evaluate_consts=True)[0]
>>> 1.74275713004454*v**2*sin(1.0)**2 + 12.1018380702846*z
A latex string:
best_expr.get_infix_latex()
>>> '\\frac{m \\left(- 1000000000000000 g z - 708073418273571 v^{2}\\right)}{1000000000000000}'
A latex string (with evaluated free constants values):
sympy.latex(best_expr.get_infix_sympy(evaluate_consts=True))
>>> '\\mathtt{\\text{[1.74275713004454*v**2*sin(1.0)**2 + 12.1018380702846*z]}}'
Getting free constant values:
best_expr.free_consts
>>> FreeConstantsTable
-> Class consts (['g' 'm']) : (1, 2)
-> Spe consts ([]) : (1, 0, 1)
best_expr.free_consts.class_values
>>> tensor([[ 6.9441, -1.7428]], dtype=torch.float64)
# To sympy
best_expr = best_expr.get_infix_sympy(evaluate_consts=True)
best_expr = best_expr[0]
# Printing best expression simplified and with rounded constants
print("best_expr : ", su.clean_sympy_expr(best_expr, round_decimal = 4))
# Target expression was:
target_expr = sympy.parse_expr("1.234*9.807*z + 1.234*v**2")
print("target_expr : ", su.clean_sympy_expr(target_expr, round_decimal = 4))
# Check equivalence
print("\nChecking equivalence:")
is_equivalent, log = su.compare_expression(
trial_expr = best_expr,
target_expr = target_expr,
handle_trigo = True,
prevent_zero_frac = True,
prevent_inf_equivalence = True,
verbose = True,
)
print("Is equivalent:", is_equivalent)
>>> best_expr : 1.234*v**2 + 12.1018*z
target_expr : 1.234*v**2 + 12.1018*z
Checking equivalence:
-> Assessing if 1.234*v**2 + 12.101838*z (target) is equivalent to 1.74275713004454*v**2*sin(1.0)**2 + 12.1018380702846*z (trial)
-> Simplified expression : 1.23*v**2 + 12.1*z
-> Symbolic error : 0
-> Symbolic fraction : 1
-> Trigo symbolic error : 0
-> Trigo symbolic fraction : 1
-> Equivalent : True
Is equivalent: True
Further documentation can be found at physo.readthedocs.io.
Quick start guide for Symbolic Regression : HERE
Quick start guide for Class Symbolic Regression : HERE
Symbolic Regression with reinforcement learning & dimensional analysis
@ARTICLE{PhySO_RL_DA,
author = {{Tenachi}, Wassim and {Ibata}, Rodrigo and {Diakogiannis}, Foivos I.},
title = "{Deep Symbolic Regression for Physics Guided by Units Constraints: Toward the Automated Discovery of Physical Laws}",
journal = {ApJ},
year = 2023,
month = dec,
volume = {959},
number = {2},
eid = {99},
pages = {99},
doi = {10.3847/1538-4357/ad014c},
archivePrefix = {arXiv},
eprint = {2303.03192},
primaryClass = {astro-ph.IM},
adsurl = {https://ui.adsabs.harvard.edu/abs/2023ApJ...959...99T},
adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}
Class Symbolic Regression
@ARTICLE{PhySO_ClassSR,
author = {{Tenachi}, Wassim and {Ibata}, Rodrigo and {Fran{\c{c}}ois}, Thibaut L. and {Diakogiannis}, Foivos I.},
title = "{Class Symbolic Regression: Gotta Fit 'Em All}",
journal = {arXiv e-prints},
keywords = {Computer Science - Machine Learning, Astrophysics - Astrophysics of Galaxies, Astrophysics - Instrumentation and Methods for Astrophysics, Physics - Computational Physics},
year = 2023,
month = dec,
eid = {arXiv:2312.01816},
pages = {arXiv:2312.01816},
doi = {10.48550/arXiv.2312.01816},
archivePrefix = {arXiv},
eprint = {2312.01816},
primaryClass = {cs.LG},
adsurl = {https://ui.adsabs.harvard.edu/abs/2023arXiv231201816T},
adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}