Potential missed optimization in C2 JIT compiler #10609

Open
zijian-yi opened this issue Jan 31, 2025 · 4 comments

zijian-yi commented Jan 31, 2025

Describe the issue

Potential missed optimization in GraalVM C2 JIT compiler

Steps to reproduce the issue

Here is the program:

public final class Sum {
    private double sum;
    private double comp;

    public Sum(final double initialValue) {
        sum = initialValue;
        comp = 0.1;
    }
    static double twoSumLow(double a, double b, double sum) {
        final double bVirtual = sum - a;
        return (a - (sum - bVirtual)) + (b - bVirtual);
    }
    public void add(final double t) {
        final double newSum = (sum % comp);
        comp += twoSumLow(t, comp, newSum);
        sum += comp;
    }
    public static void main(String[] args) {
        int N = 50000000;
        Sum s = new Sum(1.0);
        for (int i = 0; i < N; ++i) {
            s.add(0.1);
        }
        // System.out.println(s.sum);
    }
}

Run the program with C1 and C2 respectively:

# Set $JAVA_HOME to corresponding JDKs before running
javac Sum.java
time java -XX:TieredStopAtLevel=1 Sum
time java -XX:TieredStopAtLevel=4 Sum

Below are the results from my machine. The exact numbers vary by machine, but the performance difference should be noticeable; try increasing N if it is not:

Oracle 21:
java -XX:TieredStopAtLevel=1 Sum  5.52s user 0.02s system 100% cpu 5.535 total
java -XX:TieredStopAtLevel=4 Sum  5.58s user 0.01s system 100% cpu 5.573 total

Oracle 23:
java -XX:TieredStopAtLevel=1 Sum  0.68s user 0.01s system 100% cpu 0.692 total
java -XX:TieredStopAtLevel=4 Sum  0.72s user 0.02s system 100% cpu 0.737 total

Graal 25:
java -XX:TieredStopAtLevel=1 Sum  0.71s user 0.03s system 102% cpu 0.714 total
java -XX:TieredStopAtLevel=4 Sum  5.82s user 0.02s system 101% cpu 5.774 total

It looks like Oracle 23 (HotSpot JIT compiler) adds one or more new optimizations that make the program run much faster. These optimizations do not appear to be present in GraalVM yet.

Describe GraalVM and your environment:

  • GraalVM version: CE 25.0.0-dev-20250122_1329, 23.0.1+11.1
  • JDK major version: 25, 23
  • OS: Ubuntu 20.04
  • Architecture: AMD64

zijian-yi added the bug label Jan 31, 2025
zijian-yi changed the title "Potential missed optimization in C2 JIT compiler." to "Potential missed optimization in C2 JIT compiler" Jan 31, 2025

davleopo (Member) commented Feb 3, 2025

Thanks @zijian-yi for the report. We will have a look.

A minor side note for future performance reports: for Java benchmarking, especially microbenchmarks, there is the JMH harness (https://openjdk.org/projects/code-tools/jmh/), which helps you write benchmarks like the one you shared. JMH has too many advantages to enumerate in a short comment, but the most important one is that it makes benchmarking very small programs more reliable. The reproducer you shared is very small, a microbenchmark, and such programs sometimes behave very non-intuitively on JVMs. If you have not used JMH yet, consider a short tutorial: https://www.baeldung.com/java-microbenchmark-harness

Do you have free cycles to port your reproducer to a JMH microbenchmark before we have a look?
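
To illustrate two of the pitfalls behind that advice, here is a minimal hand-rolled sketch in plain Java. It is not a substitute for JMH. It warms the code up before timing and consumes every result so that the JIT cannot discard the work. The class name TinyBench, the helper avgNanos, and the iteration counts are hypothetical and chosen only for illustration; JMH handles both concerns, and others such as forking, far more rigorously.

```java
import java.util.function.DoubleSupplier;

public final class TinyBench {
    // Results are published here so the measured work stays observable.
    static volatile double sink;

    // Hypothetical helper: rough average nanoseconds per call after a warmup phase.
    static double avgNanos(DoubleSupplier work, int warmup, int measured) {
        double local = 0.0;
        for (int i = 0; i < warmup; i++) {
            local += work.getAsDouble();   // give the JIT a chance to compile the hot path
        }
        long start = System.nanoTime();
        for (int i = 0; i < measured; i++) {
            local += work.getAsDouble();   // consume every result
        }
        long elapsed = System.nanoTime() - start;
        sink = local;                      // keep the accumulated value live
        return (double) elapsed / measured;
    }

    public static void main(String[] args) {
        // state[0] plays the role of sum, state[1] the role of comp (assumed values).
        double[] state = {1.0, 0.1};
        double ns = avgNanos(() -> {
            state[0] += 0.1;
            return state[0] % state[1];
        }, 1_000_000, 10_000_000);
        System.out.println(ns + " ns/op (rough, single run)");
    }
}
```

Note that the original reproducer leaves its println commented out, so nothing reads s.sum after the loop; whether a given compiler exploits that is an open question, which is part of why a harness is preferable.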

rmosaner self-assigned this Feb 3, 2025

zijian-yi (Author) commented Feb 3, 2025

Thanks for the advice, @davleopo. I have heard of the tool but haven't used it much.
Here is a reproducer using JMH:

package org.sample;

import org.openjdk.jmh.annotations.*;
import java.util.concurrent.TimeUnit;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(3)
public class Sum {
    private static double sum = 1.0;
    private static double comp = 0.1;

    static double twoSumLow(double a, double b, double sum) {
        final double bVirtual = sum - a;
        return (a - (sum - bVirtual)) + (b - bVirtual);
    }

    @Benchmark
    public void add() {
        final double newSum = (sum % comp);
        comp += twoSumLow(0.1, comp, newSum);
        sum += comp;
    }
}

Results:

graalvm-community-openjdk-25+5.1:
Benchmark  Mode  Cnt    Score   Error  Units
Sum.add    avgt   15  102.889 ± 0.098  ns/op

HotSpot build 23.0.2+7-58:
Benchmark  Mode  Cnt   Score   Error  Units
Sum.add    avgt   15  13.122 ± 0.091  ns/op

davleopo (Member) commented Feb 4, 2025

@zijian-yi thanks for porting this to jmh. We will have a look.

rmosaner (Member) commented Feb 5, 2025

Thank you for the reproducer. We found that the floating-point modulo operation causes the slowdown.

This is tracked internally as [GR-61951]
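
For readers who want to observe this themselves, one rough way is to split the reproducer's work into two loops, one that performs only the double remainder and one that performs only the add/subtract arithmetic of twoSumLow. The sketch below is an illustration under stated assumptions, not the analysis the maintainers performed: the class name RemainderIsolation, the methods remOnly and twoSumOnly, and the operand values are hypothetical, and the operands do not follow the same trajectory as in the original program. In Java, % on doubles is a truncating remainder (the result takes the sign of the dividend), analogous to C's fmod.

```java
public final class RemainderIsolation {
    // Copied from the reproducer: recovers the rounding error of a two-sum.
    static double twoSumLow(double a, double b, double sum) {
        final double bVirtual = sum - a;
        return (a - (sum - bVirtual)) + (b - bVirtual);
    }

    // Loop that performs only the floating-point remainder.
    static double remOnly(int n) {
        double sum = 1.0;
        double acc = 0.0;
        for (int i = 0; i < n; i++) {
            acc += sum % 0.1;
            sum += 0.1;
        }
        return acc;
    }

    // Loop that performs only the add/subtract arithmetic.
    static double twoSumOnly(int n) {
        double sum = 1.0;
        double acc = 0.0;
        for (int i = 0; i < n; i++) {
            final double next = sum + 0.1;
            acc += twoSumLow(0.1, sum, next);  // rounding error of sum + 0.1
            sum = next;
        }
        return acc;
    }

    public static void main(String[] args) {
        final int n = 50_000_000;
        // Warm up both loops so that compiled code is what gets timed.
        double sink = remOnly(n) + twoSumOnly(n);

        long t0 = System.nanoTime();
        sink += remOnly(n);
        long t1 = System.nanoTime();
        sink += twoSumOnly(n);
        long t2 = System.nanoTime();

        System.out.printf("remainder only: %d ms%n", (t1 - t0) / 1_000_000);
        System.out.printf("twoSumLow only: %d ms%n", (t2 - t1) / 1_000_000);
        System.out.println("sink = " + sink);  // keep the results live
    }
}
```

Running it with the same -XX:TieredStopAtLevel settings as in the original report should give a rough indication of which loop dominates on a given JDK; a JMH benchmark with one @Benchmark method per loop would be the more reliable version of the same idea.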
