Exploring custom complexities / 2D pareto fronts #791

MilesCranmer · 2024-12-28T00:35:55Z

MilesCranmer
Dec 28, 2024
Maintainer

Trying out some ideas including from @larsentom in MilesCranmer/SymbolicRegression.jl#364. Since it's not yet possible to have a 2D pareto "surface" instead of a pareto front, I'm currently just adding different complexity measures with different weightings (which, in effect, partially accomplishes the same thing?). Here's one idea that simply adds the depth (times 2) with complexity:

using SymbolicRegression

function my_custom_complexity(tree::AbstractExpression)
    # only do a single tree traversal but get both the
    # depth AND the number of nodes:
    (depth, num_nodes) = tree_mapreduce(
        leaf -> (1, 1),
        branch -> nothing,
        (_, children...) -> (1 + maximum(first, children), 1 + sum(last, children)),
        tree,
    )
    return 2 * depth + num_nodes
end

model = SymbolicRegression.SRRegressor(;
    binary_operators=[+, *, /, -],
    complexity_mapping=my_custom_complexity,
    maxsize=50,
)

It gives interesting differences in the search result compared to simply counting the number of nodes. Maybe this sort of thing is worth trying during hyperparameter tuning.

Since we can't yet do 2D pareto surfaces, I think a good alternative is simply adding up the various complexity metrics with different weightings, and using a larger maxsize. If the 1D pareto front is fully explored, this should in effect also explore more of the 2D pareto surface.

I'd be interested in hearing any ideas from @folivetti too!

folivetti · 2024-12-28T16:23:18Z

folivetti
Dec 28, 2024

what is the objective here? To improve diversity?

I had some good initial results exploring fitness sharing here https://dl.acm.org/doi/abs/10.1145/3583133.3590525

The key element here is how I measure the similarity, I compare the predictions of two models outside the boundaries of the training data. I'm assuming that all good solutions will have similar predictions in the training points (to minimize the MSE), they can differ in how they behave outside that boundary.

Bogdan's hash can also help to identify equivalent structures (tho it is inexact) and can also be used to measure similarity.

Now, if you want different ways to measure complexity, I like Kommenda's approach. He published a study in using different measures in MOO https://link.springer.com/chapter/10.1007/978-3-319-27340-2_51

0 replies

pukpr · 2024-12-29T01:48:30Z

pukpr
Dec 29, 2024

Regarding complexity measures/metrcs, this one I'm experimenting with called complexity invariant-distance (CID) is really simple to implement, A variation that builds off the correlation coefficient

CID = CC * min(Length(Model), Length(Data))/ max(Length(Model), Length(Data))

It essentially tries to stretch a model to match the complexity in the data

My experiments with it here:
https://geoenergymath.com/2024/12/24/qbo-metrics/

0 replies

MilesCranmer · 2024-12-29T02:42:23Z

MilesCranmer
Dec 29, 2024
Maintainer Author

@folivetti Yeah, to improve diversity in the hall of fame. Whenever I do hyperparameter tuning, it seems the algorithm really likes to resample from the hall of fame (which stores a pareto front of expressions). So I wanted a way to allow the hall of fame to store alternates.

I remember someone mentioning that many ways they had tried to implement a "thick" pareto front resulted in purely genotypic differences in population members with negligible phenotypic differences (like x * 2 vs x / 0.5). So I'm trying to find a way to improve diversity in phenotype specify. I remember Hod Lipson once gave a talk where he mentioned they had good results from keeping separate Pareto fronts for random subsets of the full dataset. So I wonder if that might be an option, rather than imposing some sort of complexity differential? Thanks for the links btw.

Thanks @pukpr.

0 replies

folivetti · 2025-01-03T17:31:27Z

folivetti
Jan 3, 2025

IMO the Pareto front may not be the best place to seek diversity as it will restrict to one choice per complexity level, so when facing two equally good alternatives with the same complexity it will pick one and discard (or hide away) the other.

Maybe a way to approach is keeping a list of distinct, but equally good solutions, per complexity level?
Assuming a discrete complexity level (like the number of nodes), for each level from 1 to max-size, you can keep the list of solutions with the best accuracy by epsilon-dominance. Then you can apply my idea of dissimilarity measure to discard all models that behave the same.

1 reply

MilesCranmer Jan 3, 2025
Maintainer Author

Very cool idea!!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Exploring custom complexities / 2D pareto fronts #791

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 4 comments 1 reply

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

Select a reply

Exploring custom complexities / 2D pareto fronts #791

MilesCranmer Dec 28, 2024 Maintainer

Replies: 4 comments · 1 reply

folivetti Dec 28, 2024

pukpr Dec 29, 2024

MilesCranmer Dec 29, 2024 Maintainer Author

folivetti Jan 3, 2025

MilesCranmer Jan 3, 2025 Maintainer Author

MilesCranmer
Dec 28, 2024
Maintainer

Replies: 4 comments 1 reply

folivetti
Dec 28, 2024

pukpr
Dec 29, 2024

MilesCranmer
Dec 29, 2024
Maintainer Author

folivetti
Jan 3, 2025

MilesCranmer Jan 3, 2025
Maintainer Author