Performance of eager computations in Project initialization #221

Open
johannesduesing opened this issue Sep 17, 2024 · 3 comments
Labels
documentation Improvements or additions to documentation question Further information is requested

Comments

@johannesduesing
Collaborator

Problem Statement

As discussed in our recent OPAL meeting, we want to understand what operations are performed (eagerly) when initializing a Project instance, and their respective impact on the overall performance. I had a first look and identified the following relevant operations:

  • O1 Building the class hierarchy: This is done in a separate Future using the Scala global execution context.
  • O2 Process project class files: Processes every project class file, adding it to the relevant data structures and updating values such as code sizes and counter variables. This includes virtual class files supplied by the caller. Also processes modules and nesting information and prints warnings about project inconsistencies.
  • O3 Process library class files: Same as O2, but for the library class files.
  • O4 Compute instance methods: This is done in a separate Future as well.
  • O5 Compute overriding methods: This is done in a separate Future as well.
  • O6 Validate the project instance: Checks for some fundamental issues with project consistency.
  • O7 Compute classes-per-package map: This is a val definition and is therefore evaluated on Project instantiation.
  • O8 Computing functional interfaces: This is a lazy val, so not really relevant in this context. However, it already features the following annotation:
    // TODO Consider extracting to a ProjectInformationKey
    final lazy val functionalInterfaces: UIDSet[ObjectType] = [..]

O1 runs concurrently with O2 and O3 and is awaited after O3 completes. O4 and O5 run concurrently while the main thread performs some array manipulations; both are awaited when the actual Project instance is created, which is also when O7 is triggered. O6 runs after instantiation has completed, and then the Project instance is returned.
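The ordering described above can be sketched roughly as follows. This is only an illustration of the scheduling pattern, not the actual OPAL code: the object name and all method names are hypothetical stand-ins, and the real computations take inputs that are omitted here.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object InitOrderSketch {
  // Hypothetical stand-ins for the real computations (names are assumptions).
  def buildClassHierarchy(): String = "classHierarchy"
  def processClassFiles(kind: String): String = s"$kind class files"
  def computeInstanceMethods(): String = "instanceMethods"
  def computeOverridingMethods(): String = "overridingMethods"

  def initialize(): Seq[String] = {
    // O1 is started in the background on the global execution context.
    val o1 = Future(buildClassHierarchy())
    // O2 and O3 run on the calling thread while O1 is in flight.
    val o2 = processClassFiles("project")
    val o3 = processClassFiles("library")
    // O1 is awaited only after O3 has completed.
    val hierarchy = Await.result(o1, Duration.Inf)
    // O4 and O5 are started concurrently.
    val o4 = Future(computeInstanceMethods())
    val o5 = Future(computeOverridingMethods())
    // (The main thread would do its array manipulations here.)
    // Both are awaited when the instance is assembled; O7 and O6 would follow.
    val instanceMethods = Await.result(o4, Duration.Inf)
    val overridingMethods = Await.result(o5, Duration.Inf)
    Seq(hierarchy, o2, o3, instanceMethods, overridingMethods)
  }
}
```

With this structure, the wall-clock cost of O1 is hidden behind O2 and O3 only if O1 finishes first; otherwise the caller blocks on the first `Await`.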

Empirical Evaluation

I implemented a small patch to OPAL that extracts the runtime of the operations mentioned above. Based on that, I wrote an analysis that iterates over Maven Central and does the following:

  1. Locate project JAR based on GAV and open a stream for download
  2. Download project JAR and parse it to OPAL ClassFile representation
  3. Download all transitive dependency JARs and parse them to OPAL ClassFile representation (interfaces only)
  4. Initialize a Project instance based on those project- and library class files
  5. Extract performance values for the operations mentioned above
  6. Write the following values into a CSV file: GAV, #ProjectClasses, #Libraries, #LibraryClasses, StreamTime, LoadAndParseProjectCFsTime, LoadAndParseLibraryCFsTime, TotalProjectInitTime, O4Time, O1Time, O5Time, O7Time, O2Time, O3Time, O6Time

A first very basic run on ~1000 GAVs produced the following results: stats.csv. Note that all times are in milliseconds and the LoadAndParse[Project|Library]CFsTime depends on my local internet connection at home.
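For illustration, measuring a single operation and emitting one CSV row could look roughly like the sketch below. It is not the actual patch: `TimingSketch`, `timed` and `csvRow` are hypothetical names, and the row format assumes that no field contains a comma (true for GAVs written as `group:artifact:version`).

```scala
object TimingSketch {
  // Runs `op` once and returns its result together with the elapsed
  // wall-clock time in milliseconds.
  def timed[T](op: => T): (T, Long) = {
    val start = System.nanoTime()
    val result = op
    val elapsedMs = (System.nanoTime() - start) / 1000000L
    (result, elapsedMs)
  }

  // Joins the GAV and the measured values into one comma-separated row.
  // Assumption: no field contains a comma, so no quoting is done.
  def csvRow(gav: String, values: Seq[Long]): String =
    (gav +: values.map(_.toString)).mkString(",")
}
```

A call such as `TimingSketch.timed(someOperation())` returns both the result and its duration, so the measurement does not change what the caller receives.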

Let me know if you have any ideas or additional input for me. After that, I'll run the analysis on our servers and post the evaluation results under this issue.

@johannesduesing johannesduesing added documentation Improvements or additions to documentation question Further information is requested labels Sep 17, 2024
@johannesduesing
Collaborator Author

johannesduesing commented Sep 19, 2024

Today I ran the analysis on one of our servers (4 cores, 30 GB heap space). Unfortunately, it crashed after ~5000 GAVs. I have restarted it with a different configuration and hope to obtain more results. Nevertheless, I did a preliminary evaluation of the results for those 5000 GAVs. Here's an overview:

| Operation | AVG Time [ms] | MEDIAN Time [ms] | 75% Quantile [ms] |
|---|---|---|---|
| Project Classes Download & Init | 64 | 11 | 30 |
| Library Classes Download & Init | 1685 | 594 | 2092 |
| Project Instance Init | 446 | 37 | 145 |
| - O1 | 131 | 11 | 41 |
| - O2 | ~0 | 0 | 0 |
| - O3 | 17 | 3 | 13 |
| - O4 | 301 | 18 | 80 |
| - O5 | 84 | 14 | 63 |
| - O6 | 1 | 0 | 1 |
| - O7 | 3 | 0 | 4 |

As you can see, the most relevant operations seem to be O4 (computing instance methods) and O5 (computing overriding methods).
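For reference, the three summary statistics in the table can be computed along the lines of the sketch below. The quantile definition used here (nearest rank) is an assumption; the evaluation above may have used a different one, and `StatsSketch` is a hypothetical name.

```scala
object StatsSketch {
  // Arithmetic mean of the measured times.
  def mean(xs: Seq[Long]): Double = {
    require(xs.nonEmpty, "need at least one measurement")
    xs.sum.toDouble / xs.size
  }

  // Nearest-rank quantile: the smallest measured value such that at least
  // a fraction q of all measurements is less than or equal to it.
  def quantile(xs: Seq[Long], q: Double): Long = {
    require(xs.nonEmpty && q > 0.0 && q <= 1.0, "need data and 0 < q <= 1")
    val sorted = xs.sorted
    val rank = math.ceil(q * sorted.size).toInt
    sorted(rank - 1)
  }
}
```

A single slow outlier pulls the mean up much more than the median or the 75% quantile, which would be consistent with the averages in the table sitting well above the medians.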

@errt
Collaborator

errt commented Sep 20, 2024

Thank you for looking into this. I had a glance at the CSV, but didn't yet gain deeper insights. I think the steps that we expected to be the most expensive also ended up dominating the project creation time, with some differences between projects. Are there any insights you gained that would suggest a course of action besides a general "let's try not to compute everything all the time but just when needed"? Keep in mind that this would probably increase latency: right now, some of the steps can be started right away and run in parallel, but if they were lazy, neither would be possible.

@johannesduesing
Collaborator Author

I do think it's rather tricky to optimize. While instance and overriding methods are the last things to be computed before the project is created - and therefore could maybe be made lazy - that would impact project validation, which could then only be performed in a reduced fashion, or not at all. Maybe we want to come back to the LazyProject / UnsafeProject idea, with a separate class for use cases where you e.g. only need the class hierarchy. Before we come to any final conclusions, I'd like to a) gather some more data and b) try the same experiments with your additions from #215, just to see the performance impact.
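The trade-off between the two approaches discussed in this thread can be illustrated with a minimal sketch. The classes below are hypothetical and are not the proposed LazyProject / UnsafeProject design; `expensive` is a stand-in for something like computing instance methods.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object EagerVsLazySketch {
  // Hypothetical stand-in for an expensive computation.
  def expensive(name: String): String = s"computed $name"

  // Eager variant: the computation starts in the background as soon as the
  // holder is constructed, so it can overlap with other initialization work.
  class EagerHolder {
    private val pending: Future[String] = Future(expensive("instanceMethods"))
    def instanceMethods: String = Await.result(pending, Duration.Inf)
  }

  // Lazy variant: nothing is computed until the first access, which then
  // runs on the accessing thread and pays the full cost at that point.
  class LazyHolder {
    lazy val instanceMethods: String = expensive("instanceMethods")
  }
}
```

The eager variant pays for the computation even if the value is never used, while the lazy variant avoids that cost but cannot overlap the work with the rest of the initialization.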


2 participants