
Speed up statistics #7390

Merged
merged 16 commits into develop from wip/jd/statistics on Jul 26, 2023
Conversation

Member

@jdunkerley jdunkerley commented Jul 25, 2023

Pull Request Description

  • Allow parse_to_columns to take a Regex object.
  • Add pattern to the Regex object.
  • Add column_names to the Row object.
  • Improve statistics performance.
  • Add benchmarks for stats.
| Benchmark | Reference | New | Improvement |
| --- | --- | --- | --- |
| Max (by reduce) | 16.4ms | 16.3ms | - |
| Max (stats) | 703ms | 224ms | 68% |
| Sum (by reduce) | 38ms | 38ms | - |
| Sum (stats) | 753ms | 420ms | 44% |
| Variance (stats) | 745ms | 553ms | 26% |

Also tried using a Ref approach for stats, but it was slower (7e13c45).
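The single-pass idea behind this kind of stats speedup can be sketched in Java (a hypothetical illustration, not the actual Enso implementation): count, sum, min, max, and variance (via Welford's algorithm) are all folded in one traversal, so each statistic avoids a separate pass over the data.

```java
// Illustrative single-pass statistics accumulator (hypothetical sketch,
// not the Enso code from this PR). All statistics are updated per element
// in one traversal.
final class StatsAccumulator {
    long count = 0;
    double sum = 0.0;
    double min = Double.POSITIVE_INFINITY;
    double max = Double.NEGATIVE_INFINITY;
    double mean = 0.0;
    double m2 = 0.0; // running sum of squared deviations (Welford's algorithm)

    void accept(double x) {
        count++;
        sum += x;
        if (x < min) min = x;
        if (x > max) max = x;
        double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean);
    }

    double variance() { // sample variance
        if (count < 2) throw new IllegalStateException("need at least 2 values");
        return m2 / (count - 1);
    }
}
```

Welford's update keeps the variance numerically stable without storing the input, which is what makes a streaming accumulator viable for large vectors.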

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • The documentation has been updated, if necessary.
  • Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
  • All code follows the Scala, Java, and Rust style guides. In case you are using a language not listed above, follow the Rust style guide.
  • All code has been tested:
    • Unit tests have been written where possible.
    • If GUI codebase was changed, the GUI was tested when built using ./run ide build.

@jdunkerley jdunkerley added the CI: No changelog needed Do not require a changelog entry for this PR. label Jul 25, 2023
@jdunkerley jdunkerley marked this pull request as ready for review July 25, 2023 16:34
Member

@radeusgd radeusgd left a comment


Good to hear about the performance improvements and in general it is much nicer to have this kind of code in Enso and to hear that it's faster!

I have some reservations about the Accumulator - the logic is pretty dense, so it is not easy to understand at first glance. After a thorough read it makes sense, and since it's not a huge piece of logic I guess it can stay that way, although a slight refinement of names, or a few comments explaining what the less obvious methods (comparator, comparator_error) do, would help.

I think the most confusing part to me was the handling of the edge cases [] and [Nothing] - we have an early counter.count == 0 check that is decoupled from the Accumulator, but that check is really part of the Accumulator's logic. Could we maybe move it into it? I think it would be easier to understand if this logic was in a single place. But I'm fine with keeping it as is - it would be nice to have it slightly clearer, but that is not the highest priority.

The only thing that I think needs to be done (either before merging, or after if we can have a follow-up PR) is ensuring there are no regressions on a few edge cases that unfortunately were not tested before, as I noted in the comments.

  1. Handling of non-empty vectors of only missing values - [Nothing, Nothing] or [Number.nan] etc.
  2. Handling of the possibility of a comparator throwing a dataflow error other than Incomparable_Values (I think this might have been impossible in the past but that has changed).

Especially (1) is a priority, because if I read the code correctly, the old code was throwing Empty_Error, which makes sense, but now I think it will throw Incomparable_Values - let's make sure this works correctly. We should check both compute and running. This seems to be the old behaviour that we should not lose:

[Nothing, Nothing] . compute Statistic.Minimum . should_fail_with Empty_Error
[Nothing, Nothing] . running Statistic.Minimum [Nothing, Nothing]
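The edge case in (1) can be sketched in Java terms (a hypothetical analogue, not the Enso implementation): an accumulator that counts only the non-missing values it has actually seen, so a vector containing nothing but missing values reports "empty" rather than "incomparable".

```java
// Hypothetical sketch of the edge case under discussion: missing values
// (null here, Nothing in Enso) are skipped and not counted, so [] and
// [null, null] both yield an empty result that the caller can map to an
// "empty" error instead of an "incomparable values" error.
import java.util.OptionalDouble;

final class MinAccumulator {
    private long count = 0;
    private double min = Double.POSITIVE_INFINITY;

    void accept(Double x) {
        if (x == null) return; // missing value: skipped, not counted
        count++;
        if (x < min) min = x;
    }

    OptionalDouble result() {
        // count == 0 covers both the empty vector and all-missing vectors
        return count == 0 ? OptionalDouble.empty() : OptionalDouble.of(min);
    }
}
```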

As for (2), it would be great to test it:

type My_Type
    Value x

type My_Comparator
    compare x y =
        _ = [x, y]
        Error.throw (Illegal_State.Error "TEST")
    hash x = x

Comparable.from (_:My_Type) = My_Comparator

...
    x = My_Type.Value 1
    y = My_Type.Value 2
    v = [x, y]
    v.compute Statistic.Count . should_equal 2
    # This is IMO the preferred behaviour:
    v.compute Statistic.Maximum . should_fail_with Illegal_State
    # This is the OLD behaviour:
    v.compute Statistic.Maximum . should_fail_with Incomparable_Values

I guess it's ok if we are only able to reproduce the old behaviour, but while we are at it, IMO it would be ideal to ensure that examples like this one propagate the error that was actually thrown in the comparator rather than converting it to Incomparable_Values, which essentially hides the underlying issue and makes it harder to debug.

I appreciate that custom comparators are rare, and them throwing errors is an even rarer occurrence, so if we need to proceed fast, I imagine we could do this as a separate ticket.
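The propagation behaviour asked for in (2) can be sketched in Java terms (a hypothetical analogue, not the Enso code): translate only the known "cannot compare these types" failure, and let any other error thrown inside a comparator surface unchanged.

```java
// Hypothetical Java analogue of the error-propagation question: only the
// known incompatible-types failure is translated (analogous to
// Incomparable_Values); any other exception a custom comparator throws
// (e.g. an IllegalStateException) propagates to the caller untouched,
// so the underlying issue stays visible and debuggable.
final class SafeCompare {
    @SuppressWarnings("unchecked")
    static int compareOrPropagate(Object a, Object b) {
        try {
            return ((Comparable<Object>) a).compareTo(b);
        } catch (ClassCastException e) {
            // the one case we convert, like Incomparable_Values
            throw new IllegalArgumentException("Incomparable values", e);
        }
    }
}
```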

Member

@radeusgd radeusgd left a comment


Approving in case you need to proceed, just please add these few tests for (1) if possible.

The rest would be nice, but I think we can live with it for now (although I think it would be good to at least get a ticket for (2)).

@jdunkerley
Member Author

I have some reservations about the Accumulator - the logic is pretty dense, so it is not easy to understand at first glance. After a thorough read it makes sense, and since it's not a huge piece of logic I guess it can stay that way, although a slight refinement of names, or a few comments explaining what the less obvious methods (comparator, comparator_error) do, would help.

I think the most confusing part to me was the handling of the edge cases [] and [Nothing] - we have an early counter.count == 0 check that is decoupled from the Accumulator, but that check is really part of the Accumulator's logic. Could we maybe move it into it? I think it would be easier to understand if this logic was in a single place. But I'm fine with keeping it as is - it would be nice to have it slightly clearer, but that is not the highest priority.

The empty values check has been moved into the Accumulator and the error handling simplified as the accumulator will now fail on incomparable values. Hopefully this makes it clearer.

Member

@radeusgd radeusgd left a comment


Looks all good

@jdunkerley jdunkerley added the CI: Ready to merge This PR is eligible for automatic merge label Jul 26, 2023
@jdunkerley jdunkerley linked an issue Jul 26, 2023 that may be closed by this pull request
@mergify mergify bot merged commit 7345f0f into develop Jul 26, 2023
23 of 24 checks passed
@mergify mergify bot deleted the wip/jd/statistics branch July 26, 2023 10:01
@JaroslavTulach
Member

JaroslavTulach commented Aug 9, 2023

I finally got a bit of time to dig into this further. I believe we want Max (stats) to be as fast as Max (by reduce), right?

| Benchmark | Takes |
| --- | --- |
| Max (by reduce) | 16.3ms |
| Max (stats) | 224ms |

If so, then we are more than ten times off. I've reported #7525 as a small result of my findings.

Main Findings

Overall it seems that the graph for Max (stats) is way bigger than the one for Max (by reduce). My eyes got attracted by the compilation of TruffleAST::Text.starts_with. Why are we compiling a text manipulation when the goal is to find the maximum number in an array?

[info]  at <enso> case_branch<arg-1>(../../distribution/lib/Standard/Base/0.0.0-dev/src/Data/Text/Extensions.enso:806:31770-31791)
[info]  at <enso> case_branch(../../distribution/lib/Standard/Base/0.0.0-dev/src/Data/Text/Extensions.enso:806:31758-31803)
[info]  at <enso> Text.starts_with(../../distribution/lib/Standard/Base/0.0.0-dev/src/Data/Text/Extensions.enso:803-808:31607-31944)
[info]  at <enso> case_branch<arg-0>(../../distribution/lib/Standard/Base/0.0.0-dev/src/Data/Filter_Condition.enso:306:13499-13534)
[info]  at <enso> case_branch(../../distribution/lib/Standard/Base/0.0.0-dev/src/Data/Filter_Condition.enso:306:13499-13577)
[info]  at <enso> Filter_Condition.handle_constructor_missing_arguments(../../distribution/lib/Standard/Base/0.0.0-dev/src/Data/Filter_Condition.enso:296-307:12954-13596)
[info]  at <enso> case_branch(../../distribution/lib/Standard/Base/0.0.0-dev/src/Data/Filter_Condition.enso:282:12314-12369)
[info]  at <enso> Filter_Condition.unify_condition_or_predicate(../../distribution/lib/Standard/Base/0.0.0-dev/src/Data/Filter_Condition.enso:280-282:12199-12369)
[info]  at <enso> Vector.any(../../distribution/lib/Standard/Base/0.0.0-dev/src/Data/Vector.enso:379:14314-14351)
[info]  at <enso> Statistic.type.compute_bulk(../../distribution/lib/Standard/Base/0.0.0-dev/src/Data/Statistics.enso:156:5147-5215)
[info]  at <enso> Vector.compute_bulk(../../distribution/lib/Standard/Base/0.0.0-dev/src/Data/Statistics.enso:309:12093-12130)
[info]  at <enso> Vector.compute<arg-0>(../../distribution/lib/Standard/Base/0.0.0-dev/src/Data/Statistics.enso:301:11791-11819)
[info]  at <enso> Vector.compute(../../distribution/lib/Standard/Base/0.0.0-dev/src/Data/Statistics.enso:301:11791-11827)
[info]  at <enso> Operations.collect_benches<arg-1>(../../test/Benchmarks/src/Vector/Operations.enso:42:1691-1726)
[info]  at <enso> <anonymous>(../../distribution/lib/Standard/Test/0.0.0-dev/src/Bench.enso:33:925-933)
[info]  at org.graalvm.sdk/org.graalvm.polyglot.Value.execute(Value.java:881)
[info]  at org.enso.benchmarks.generated.Vector_Operations.Max_Stats(Vector_Operations.java:214)

In the end I decided to use VisualVM and its polyglot profiler. The following snapshot shows that the problem is caused more by the various abstractions in the Statistics implementation than by a slow engine runtime:

[screenshot: Max (stats) profiler snapshot]

Clearly, this many calls can hardly be compiled into something faster than Math.max.
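For contrast, here is roughly what the fast Max (by reduce) path boils down to on the JVM (an illustrative sketch, not the actual benchmark code): a tight loop over a primitive array that the JIT can compile to one comparison and one move per element, with no per-element allocation or dynamic dispatch.

```java
// Sketch of the tight loop a reduce-based max should compile down to.
// This is the baseline the stats path is being compared against.
final class ReduceMax {
    static double maxByReduce(double[] xs) {
        if (xs.length == 0) throw new IllegalArgumentException("empty input");
        double max = xs[0];
        for (int i = 1; i < xs.length; i++) {
            max = Math.max(max, xs[i]); // one compare + conditional move per element
        }
        return max;
    }
}
```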

@radeusgd
Member

radeusgd commented Aug 9, 2023

My eyes got attracted by the compilation of TruffleAST::Text.starts_with. Why are we compiling a text manipulation when the goal is to find the maximum number in an array?

[info]  at <enso> case_branch<arg-1>(../../distribution/lib/Standard/Base/0.0.0-dev/src/Data/Text/Extensions.enso:806:31770-31791)
[info]  at <enso> case_branch(../../distribution/lib/Standard/Base/0.0.0-dev/src/Data/Text/Extensions.enso:806:31758-31803)
[info]  at <enso> Text.starts_with(../../distribution/lib/Standard/Base/0.0.0-dev/src/Data/Text/Extensions.enso:803-808:31607-31944)
[info]  at <enso> case_branch<arg-0>(../../distribution/lib/Standard/Base/0.0.0-dev/src/Data/Filter_Condition.enso:306:13499-13534)
[info]  at <enso> case_branch(../../distribution/lib/Standard/Base/0.0.0-dev/src/Data/Filter_Condition.enso:306:13499-13577)
[info]  at <enso> Filter_Condition.handle_constructor_missing_arguments(../../distribution/lib/Standard/Base/0.0.0-dev/src/Data/Filter_Condition.enso:296-307:12954-13596)
[info]  at <enso> case_branch(../../distribution/lib/Standard/Base/0.0.0-dev/src/Data/Filter_Condition.enso:282:12314-12369)
[info]  at <enso> Filter_Condition.unify_condition_or_predicate(../../distribution/lib/Standard/Base/0.0.0-dev/src/Data/Filter_Condition.enso:280-282:12199-12369)
[info]  at <enso> Vector.any(../../distribution/lib/Standard/Base/0.0.0-dev/src/Data/Vector.enso:379:14314-14351)
[info]  at <enso> Statistic.type.compute_bulk(../../distribution/lib/Standard/Base/0.0.0-dev/src/Data/Statistics.enso:156:5147-5215)
[info]  at <enso> Vector.compute_bulk(../../distribution/lib/Standard/Base/0.0.0-dev/src/Data/Statistics.enso:309:12093-12130)
...

The Text.starts_with is a heuristic check that allows us to better handle errors in case of constructors missing arguments. Introduced by #7148 when Vector.any started allowing Filter_Condition. It should not affect performance though, because it should only be run once when computing Vector.any, as a 'pre-flight check' verifying the given predicate/filter condition is valid. Looking at the code of statistics, it should also be run O(1) times there. Is this starts_with influencing the execution time in some measurable way or was it just unexpected here?

@radeusgd
Member

radeusgd commented Aug 9, 2023

In the end I decided to use VisualVM and its polyglot profiler. The following snapshot shows that the problem is caused more by the various abstractions in the Statistics implementation than by a slow engine runtime:

[screenshot: Max (stats) profiler snapshot]

Clearly, this many calls can hardly be compiled into something faster than Math.max.

Could you show me the code of the two approaches you were comparing? Were you using Math.max from java.lang.Math or from Enso?

I think ideally we'd want perform_comparison, which seems to be the bottleneck here, to be more or less as fast as Math.max. Do you think it could be because, instead of doing a < b, it does Ordering.compare a b == bound?

If you want to look into this further, it could be worth comparing:

IO.println "<"
vector_a.zip vector_b a-> b-> 
    a < b

IO.println "compare"
vector_a.zip vector_b a-> b->
    Ordering.compare a b == Ordering.Less

I imagine compare will be slightly slower, as it is obviously doing more, but both really should boil down to a simple integer comparison - so the speed difference should be smaller than 2x, I think. I wonder what it is in practice, and whether the graphs of the latter are somehow not 'simplified' enough by the compiler.
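The two styles can be illustrated in Java terms (a hypothetical analogue of the Enso snippets above, not the Enso code itself): on the JVM both forms should JIT-compile to a single integer comparison, so a large gap observed in Enso would point at overhead in the Ordering abstraction rather than in the comparison itself.

```java
// Hypothetical Java analogue of the two Enso comparison styles: a direct `<`
// versus a three-way compare followed by a sign test, as in
// `Ordering.compare a b == Ordering.Less`. Both should reduce to one
// integer comparison after JIT compilation.
final class CompareStyles {
    static boolean lessDirect(long a, long b) {
        return a < b;                    // direct comparison
    }

    static boolean lessViaCompare(long a, long b) {
        return Long.compare(a, b) < 0;   // three-way compare, then test the sign
    }
}
```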

@JaroslavTulach
Member

Both tests are in Operations.enso and can be executed as:

sbt:std-benchmarks> withDebug benchOnly --dumpGraphs -- Vector_Operations.Max

or just

sbt:std-benchmarks> benchOnly Vector_Operations.Max

@radeusgd
Member

radeusgd commented Aug 9, 2023

Both tests are in Operations.enso and can be executed as:

sbt:std-benchmarks> withDebug benchOnly --dumpGraphs -- Vector_Operations.Max

or just

sbt:std-benchmarks> benchOnly Vector_Operations.Max

Thanks, so it looks like it's the Enso implementation.

Then I'm really curious what the perf difference is between the two vector zips I posted above - < vs Ordering.compare.

Successfully merging this pull request may close these issues.

Calculating sum is 10x slower using Statistics
4 participants