
Speed up statistics #7390

Merged
merged 16 commits into develop from wip/jd/statistics on Jul 26, 2023
Conversation

Member

@jdunkerley jdunkerley commented Jul 25, 2023

Pull Request Description

  • Allow parse_to_columns to take a Regex object.
  • Add pattern to the Regex object.
  • Add column_names to the Row object.
  • Improve statistics performance.
  • Add benchmarks for stats.
| Benchmark | Reference | New | Improvement |
| --- | --- | --- | --- |
| Max (by reduce) | 16.4ms | 16.3ms | - |
| Max (stats) | 703ms | 224ms | 68% |
| Sum (by reduce) | 38ms | 38ms | - |
| Sum (stats) | 753ms | 420ms | 44% |
| Variance (stats) | 745ms | 553ms | 26% |

Also tried using a Ref approach for stats, but it was slower (7e13c45).
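The single-pass idea behind this kind of stats speedup can be sketched in Java (a hypothetical illustration, not the actual Enso implementation): count, sum, min, max, and variance (via Welford's algorithm) are all folded in one traversal, so each statistic avoids a separate pass over the data.

```java
// Illustrative single-pass statistics accumulator (hypothetical sketch,
// not the Enso code from this PR). All statistics are updated per element
// in one traversal.
final class StatsAccumulator {
    long count = 0;
    double sum = 0.0;
    double min = Double.POSITIVE_INFINITY;
    double max = Double.NEGATIVE_INFINITY;
    double mean = 0.0;
    double m2 = 0.0; // running sum of squared deviations (Welford's algorithm)

    void accept(double x) {
        count++;
        sum += x;
        if (x < min) min = x;
        if (x > max) max = x;
        double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean);
    }

    double variance() { // sample variance
        if (count < 2) throw new IllegalStateException("need at least 2 values");
        return m2 / (count - 1);
    }
}
```

Welford's update keeps the variance numerically stable without storing the input, which is what makes a streaming accumulator viable for large vectors.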

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • The documentation has been updated, if necessary.
  • Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
  • All code follows the Scala, Java, and Rust style guides. In case you are using a language not listed above, follow the Rust style guide.
  • All code has been tested:
    • Unit tests have been written where possible.
    • If GUI codebase was changed, the GUI was tested when built using ./run ide build.

@jdunkerley jdunkerley added the CI: No changelog needed Do not require a changelog entry for this PR. label Jul 25, 2023
@jdunkerley jdunkerley marked this pull request as ready for review July 25, 2023 16:34
Member

@radeusgd radeusgd left a comment


Good to hear about the performance improvements and in general it is much nicer to have this kind of code in Enso and to hear that it's faster!

I have some reservations about the Accumulator - the logic is pretty dense, so it is not easy to understand at first glance. After a thorough read it makes sense, and since it's not a huge piece of logic I guess it can stay that way, although a slight refinement of names, or a few comments explaining what the less obvious methods (comparator, comparator_error) do, would help.

I think the most confusing part to me was the handling of the edge cases [] and [Nothing] - we have an early counter.count == 0 check that is decoupled from the Accumulator, but that check is really part of the Accumulator's logic. Could we maybe move it into it? I think it would be easier to understand if this logic was in a single place. But I'm fine with keeping it as is - it would be nice to have it slightly clearer, but that is not the highest priority.

The only thing that I think needs to be done (either before merging, or after if we can have a follow-up PR) is ensuring there are no regressions on a few edge cases that unfortunately were not tested before, as I noted in the comments.

  1. Handling of non-empty vectors of only missing values - [Nothing, Nothing] or [Number.nan] etc.
  2. Handling of the possibility of a comparator throwing a dataflow error other than Incomparable_Values (I think this might have been impossible in the past but that has changed).

Especially (1) is a priority, because if I read the code correctly, the old code was throwing Empty_Error, which makes sense, but now I think it will throw Incomparable_Values - let's make sure this works correctly. We should check both compute and running. This seems to be the old behaviour that we should not lose:

[Nothing, Nothing] . compute Statistic.Minimum . should_fail_with Empty_Error
[Nothing, Nothing] . running Statistic.Minimum [Nothing, Nothing]
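The edge case in (1) can be sketched in Java terms (a hypothetical analogue, not the Enso implementation): an accumulator that counts only the non-missing values it has actually seen, so a vector containing nothing but missing values reports "empty" rather than "incomparable".

```java
// Hypothetical sketch of the edge case under discussion: missing values
// (null here, Nothing in Enso) are skipped and not counted, so [] and
// [null, null] both yield an empty result that the caller can map to an
// "empty" error instead of an "incomparable values" error.
import java.util.OptionalDouble;

final class MinAccumulator {
    private long count = 0;
    private double min = Double.POSITIVE_INFINITY;

    void accept(Double x) {
        if (x == null) return; // missing value: skipped, not counted
        count++;
        if (x < min) min = x;
    }

    OptionalDouble result() {
        // count == 0 covers both the empty vector and all-missing vectors
        return count == 0 ? OptionalDouble.empty() : OptionalDouble.of(min);
    }
}
```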

As for (2), it would be great to test it:

type My_Type
    Value x

type My_Comparator
    compare x y =
        _ = [x, y]
        Error.throw (Illegal_State.Error "TEST")
    hash x = x

Comparable.from (_:My_Type) = My_Comparator

...
    x = My_Type.Value 1
    y = My_Type.Value 2
    v = [x, y]
    v.compute Statistic.Count . should_equal 2
    # This is IMO the preferred behaviour:
    v.compute Statistic.Maximum . should_fail_with Illegal_State
    # This is the OLD behaviour:
    v.compute Statistic.Maximum . should_fail_with Incomparable_Values

I guess it's ok if we are only able to reproduce the old behaviour, but while we are at it, IMO it would be ideal to ensure that examples like this one propagate the error that was actually thrown in the comparator rather than converting it to Incomparable_Values, which essentially hides the underlying issue and makes it harder to debug.

I appreciate that custom comparators are rare, and them throwing errors is an even rarer occurrence, so if we need to proceed fast, I imagine we could do this as a separate ticket.
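The propagation behaviour asked for in (2) can be sketched in Java terms (a hypothetical analogue, not the Enso code): translate only the known "cannot compare these types" failure, and let any other error thrown inside a comparator surface unchanged.

```java
// Hypothetical Java analogue of the error-propagation question: only the
// known incompatible-types failure is translated (analogous to
// Incomparable_Values); any other exception a custom comparator throws
// (e.g. an IllegalStateException) propagates to the caller untouched,
// so the underlying issue stays visible and debuggable.
final class SafeCompare {
    @SuppressWarnings("unchecked")
    static int compareOrPropagate(Object a, Object b) {
        try {
            return ((Comparable<Object>) a).compareTo(b);
        } catch (ClassCastException e) {
            // the one case we convert, like Incomparable_Values
            throw new IllegalArgumentException("Incomparable values", e);
        }
    }
}
```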

Member

@radeusgd radeusgd left a comment


Approving in case you need to proceed, just please add these few tests for (1) if possible.

The rest would be nice, but I think we can live with it for now (although I think it would be good to at least get a ticket for (2)).

@jdunkerley
Member Author

I have some reservations about the Accumulator - the logic is pretty dense, so it is not easy to understand at first glance. After a thorough read it makes sense, and since it's not a huge piece of logic I guess it can stay that way, although a slight refinement of names, or a few comments explaining what the less obvious methods (comparator, comparator_error) do, would help.

I think the most confusing part to me was the handling of the edge cases [] and [Nothing] - we have an early counter.count == 0 check that is decoupled from the Accumulator, but that check is really part of the Accumulator's logic. Could we maybe move it into it? I think it would be easier to understand if this logic was in a single place. But I'm fine with keeping it as is - it would be nice to have it slightly clearer, but that is not the highest priority.

The empty values check has been moved into the Accumulator and the error handling simplified as the accumulator will now fail on incomparable values. Hopefully this makes it clearer.

Member

@radeusgd radeusgd left a comment


Looks all good

@jdunkerley jdunkerley added the CI: Ready to merge This PR is eligible for automatic merge label Jul 26, 2023
@jdunkerley jdunkerley linked an issue Jul 26, 2023 that may be closed by this pull request
@mergify mergify bot merged commit 7345f0f into develop Jul 26, 2023
23 of 24 checks passed
@mergify mergify bot deleted the wip/jd/statistics branch July 26, 2023 10:01
@JaroslavTulach
Member

JaroslavTulach commented Aug 9, 2023

I finally got a bit of time to dig into this further. I believe we want Max (stats) to be as fast as Max (by reduce), right?

| Benchmark | Takes |
| --- | --- |
| Max (by reduce) | 16.3ms |
| Max (stats) | 224ms |

If so, then we are more than ten times off. I've reported #7525 as a small result of my findings.

Main Findings

Overall it seems that the graph for Max (stats) is way bigger than the one for Max (by reduce). My eyes got attracted by the compilation of TruffleAST::Text.starts_with. Why are we compiling a text manipulation when the goal is to find the maximum number in an array?

[info]  at <enso> case_branch<arg-1>(../../distribution/lib/Standard/Base/0.0.0-dev/src/Data/Text/Extensions.enso:806:31770-31791)
[info]  at <enso> case_branch(../../distribution/lib/Standard/Base/0.0.0-dev/src/Data/Text/Extensions.enso:806:31758-31803)
[info]  at <enso> Text.starts_with(../../distribution/lib/Standard/Base/0.0.0-dev/src/Data/Text/Extensions.enso:803-808:31607-31944)
[info]  at <enso> case_branch<arg-0>(../../distribution/lib/Standard/Base/0.0.0-dev/src/Data/Filter_Condition.enso:306:13499-13534)
[info]  at <enso> case_branch(../../distribution/lib/Standard/Base/0.0.0-dev/src/Data/Filter_Condition.enso:306:13499-13577)
[info]  at <enso> Filter_Condition.handle_constructor_missing_arguments(../../distribution/lib/Standard/Base/0.0.0-dev/src/Data/Filter_Condition.enso:296-307:12954-13596)
[info]  at <enso> case_branch(../../distribution/lib/Standard/Base/0.0.0-dev/src/Data/Filter_Condition.enso:282:12314-12369)
[info]  at <enso> Filter_Condition.unify_condition_or_predicate(../../distribution/lib/Standard/Base/0.0.0-dev/src/Data/Filter_Condition.enso:280-282:12199-12369)
[info]  at <enso> Vector.any(../../distribution/lib/Standard/Base/0.0.0-dev/src/Data/Vector.enso:379:14314-14351)
[info]  at <enso> Statistic.type.compute_bulk(../../distribution/lib/Standard/Base/0.0.0-dev/src/Data/Statistics.enso:156:5147-5215)
[info]  at <enso> Vector.compute_bulk(../../distribution/lib/Standard/Base/0.0.0-dev/src/Data/Statistics.enso:309:12093-12130)
[info]  at <enso> Vector.compute<arg-0>(../../distribution/lib/Standard/Base/0.0.0-dev/src/Data/Statistics.enso:301:11791-11819)
[info]  at <enso> Vector.compute(../../distribution/lib/Standard/Base/0.0.0-dev/src/Data/Statistics.enso:301:11791-11827)
[info]  at <enso> Operations.collect_benches<arg-1>(../../test/Benchmarks/src/Vector/Operations.enso:42:1691-1726)
[info]  at <enso> <anonymous>(../../distribution/lib/Standard/Test/0.0.0-dev/src/Bench.enso:33:925-933)
[info]  at org.graalvm.sdk/org.graalvm.polyglot.Value.execute(Value.java:881)
[info]  at org.enso.benchmarks.generated.Vector_Operations.Max_Stats(Vector_Operations.java:214)

In the end I decided to use VisualVM and its polyglot profiler. The following snapshot shows that the problem is caused more by the various abstractions in the Statistics implementation than by a slow engine runtime:

[screenshot: Max (stats) profiler snapshot]

Clearly, this many calls can hardly be compiled into something faster than Math.max.
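For contrast, here is roughly what the fast Max (by reduce) path boils down to on the JVM (an illustrative sketch, not the actual benchmark code): a tight loop over a primitive array that the JIT can compile to one comparison and one move per element, with no per-element allocation or dynamic dispatch.

```java
// Sketch of the tight loop a reduce-based max should compile down to.
// This is the baseline the stats path is being compared against.
final class ReduceMax {
    static double maxByReduce(double[] xs) {
        if (xs.length == 0) throw new IllegalArgumentException("empty input");
        double max = xs[0];
        for (int i = 1; i < xs.length; i++) {
            max = Math.max(max, xs[i]); // one compare + conditional move per element
        }
        return max;
    }
}
```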

@radeusgd
Member

radeusgd commented Aug 9, 2023

My eyes got attracted by the compilation of TruffleAST::Text.starts_with. Why are we compiling a text manipulation when the goal is to find the maximum number in an array?

[info]  at <enso> case_branch<arg-1>(../../distribution/lib/Standard/Base/0.0.0-dev/src/Data/Text/Extensions.enso:806:31770-31791)
[info]  at <enso> case_branch(../../distribution/lib/Standard/Base/0.0.0-dev/src/Data/Text/Extensions.enso:806:31758-31803)
[info]  at <enso> Text.starts_with(../../distribution/lib/Standard/Base/0.0.0-dev/src/Data/Text/Extensions.enso:803-808:31607-31944)
[info]  at <enso> case_branch<arg-0>(../../distribution/lib/Standard/Base/0.0.0-dev/src/Data/Filter_Condition.enso:306:13499-13534)
[info]  at <enso> case_branch(../../distribution/lib/Standard/Base/0.0.0-dev/src/Data/Filter_Condition.enso:306:13499-13577)
[info]  at <enso> Filter_Condition.handle_constructor_missing_arguments(../../distribution/lib/Standard/Base/0.0.0-dev/src/Data/Filter_Condition.enso:296-307:12954-13596)
[info]  at <enso> case_branch(../../distribution/lib/Standard/Base/0.0.0-dev/src/Data/Filter_Condition.enso:282:12314-12369)
[info]  at <enso> Filter_Condition.unify_condition_or_predicate(../../distribution/lib/Standard/Base/0.0.0-dev/src/Data/Filter_Condition.enso:280-282:12199-12369)
[info]  at <enso> Vector.any(../../distribution/lib/Standard/Base/0.0.0-dev/src/Data/Vector.enso:379:14314-14351)
[info]  at <enso> Statistic.type.compute_bulk(../../distribution/lib/Standard/Base/0.0.0-dev/src/Data/Statistics.enso:156:5147-5215)
[info]  at <enso> Vector.compute_bulk(../../distribution/lib/Standard/Base/0.0.0-dev/src/Data/Statistics.enso:309:12093-12130)
...

The Text.starts_with is a heuristic check that allows us to better handle errors in case of constructors missing arguments. Introduced by #7148 when Vector.any started allowing Filter_Condition. It should not affect performance though, because it should only be run once when computing Vector.any, as a 'pre-flight check' verifying the given predicate/filter condition is valid. Looking at the code of statistics, it should also be run O(1) times there. Is this starts_with influencing the execution time in some measurable way or was it just unexpected here?

@radeusgd
Member

radeusgd commented Aug 9, 2023

In the end I decided to use VisualVM and its polyglot profiler. The following snapshot shows that the problem is caused more by the various abstractions in the Statistics implementation than by a slow engine runtime:

[screenshot: Max (stats) profiler snapshot]

Clearly, this many calls can hardly be compiled into something faster than Math.max.

Could you show me the code of the two approaches you were comparing? Were you using Math.max from java.lang.Math or from Enso?

I think ideally we'd want perform_comparison, which seems to be the bottleneck here, to be more or less as fast as Math.max. Do you think it could be because, instead of doing a < b, it does Ordering.compare a b == bound?

If you want to look into this further, it could be worth comparing:

IO.println "<"
vector_a.zip vector_b a-> b-> 
    a < b

IO.println "compare"
vector_a.zip vector_b a-> b->
    Ordering.compare a b == Ordering.Less

I imagine compare will be slightly slower, as it is obviously doing more, but both really should boil down to a simple integer comparison - so the speed difference should be smaller than 2x, I think. I wonder what it is in practice, and whether the graphs of the latter are somehow not 'simplified' enough by the compiler.
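The two styles can be illustrated in Java terms (a hypothetical analogue of the Enso snippets above, not the Enso code itself): on the JVM both forms should JIT-compile to a single integer comparison, so a large gap observed in Enso would point at overhead in the Ordering abstraction rather than in the comparison itself.

```java
// Hypothetical Java analogue of the two Enso comparison styles: a direct `<`
// versus a three-way compare followed by a sign test, as in
// `Ordering.compare a b == Ordering.Less`. Both should reduce to one
// integer comparison after JIT compilation.
final class CompareStyles {
    static boolean lessDirect(long a, long b) {
        return a < b;                    // direct comparison
    }

    static boolean lessViaCompare(long a, long b) {
        return Long.compare(a, b) < 0;   // three-way compare, then test the sign
    }
}
```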

@JaroslavTulach
Member

Both tests are in Operations.enso and can be executed as:

sbt:std-benchmarks> withDebug benchOnly --dumpGraphs -- Vector_Operations.Max

or just

sbt:std-benchmarks> benchOnly Vector_Operations.Max

@radeusgd
Member

radeusgd commented Aug 9, 2023

Both tests are in Operations.enso and can be executed as:

sbt:std-benchmarks> withDebug benchOnly --dumpGraphs -- Vector_Operations.Max

or just

sbt:std-benchmarks> benchOnly Vector_Operations.Max

Thanks, so it looks like it's the Enso implementation.

Then I'm really curious what the perf difference is between the two vector zips I posted above - < vs Ordering.compare.

Successfully merging this pull request may close these issues.

Calculating sum is 10x slower using Statistics
4 participants