Problem Statement
As discussed in our recent OPAL meeting, we want to understand what operations are performed (eagerly) when initializing a Project instance, and their respective impact on the overall performance. I had a first look and identified the following relevant operations:
O1 Building the class hierarchy: This is done in a separate future using the Scala global execution context.
O2 Process project class files: Processes every project class file, adding it to the relevant data structures and updating values such as code sizes and count variables. This includes virtual class files supplied by the caller. Also processes modules and nesting information and prints warnings about inconsistent projects.
O3 Process library class files: Same thing as above, but for the library class files.
O4 Compute instance methods: This is done in a separate future as well.
O5 Compute overriding methods: This is done in a separate future as well.
O6 Validate the project instance: Checks for some fundamental issues with project consistency
O7 Compute classes-per-package map. This is a val definition and happens on Project instantiation
O8 Computing functional interfaces: This is a lazy val, so not really relevant in this context. However, it already features the following annotation:
// TODO Consider extracting to a ProjectInformationKey
final lazy val functionalInterfaces: UIDSet[ObjectType] = [..]
O1 runs concurrently with O2 and O3 and is awaited after O3 completes. O4 and O5 run concurrently while the main thread performs some array manipulations; both are awaited when the actual Project instance is created, which is also when O7 is triggered. O6 runs after the instantiation has completed, and then the Project instance is returned.
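The ordering described above can be sketched with plain Scala futures. This is only an illustration of the sequencing, not OPAL's code: `InitOrderSketch`, `ProjectSketch`, `record` and `initProject` are hypothetical names, and each operation is reduced to writing its label into a log.

```scala
import scala.collection.mutable.ListBuffer
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

// Hypothetical stand-ins; none of these names are OPAL's actual API.
object InitOrderSketch {

  final class ProjectSketch(record: String => Unit) {
    // O7: a plain val, so it is evaluated while the instance is constructed.
    val classesPerPackage: Map[String, Set[String]] = {
      record("O7")
      Map.empty
    }
  }

  /** Runs the seven steps in the described order and returns the completion log. */
  def initProject(): List[String] = {
    val log = ListBuffer.empty[String]
    def record(name: String): Unit = log.synchronized { log += name }

    val o1 = Future(record("O1"))   // class hierarchy, concurrent with O2 and O3
    record("O2")                    // project class files, on the calling thread
    record("O3")                    // library class files, on the calling thread
    Await.result(o1, Duration.Inf)  // O1 is awaited once O3 is done

    val o4 = Future(record("O4"))   // instance methods
    val o5 = Future(record("O5"))   // overriding methods
    // (the calling thread would do its array manipulations here)
    Await.result(o4, Duration.Inf)
    Await.result(o5, Duration.Inf)

    val project = new ProjectSketch(record) // construction triggers O7
    record("O6")                            // validation after instantiation
    log.synchronized(log.toList)
  }
}
```

Calling `InitOrderSketch.initProject()` returns the labels in completion order. The relative order of O1 versus O2/O3 and of O4 versus O5 may vary between runs, while O7 followed by O6 always comes last.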
Empirical Evaluation
I implemented a small patch to OPAL that extracts the runtime of the operations mentioned above. Based on that, I wrote an analysis that iterates over Maven Central and does the following:
Locate project JAR based on GAV and open a stream for download
Download project JAR and parse it to OPAL ClassFile representation
Download all transitive dependency JARs and parse them to OPAL ClassFile representation (interfaces only)
Initialize a Project instance based on those project- and library class files
Extract performance values for the operations mentioned above
Write the following values into a CSV file: GAV, #ProjectClasses, #Libraries, #LibraryClasses, StreamTime, LoadAndParseProjectCFsTime, LoadAndParseLibraryCFsTime, TotalProjectInitTime, O4Time, O1Time, O5Time, O7Time, O2Time, O3Time, O6Time
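As a rough sketch of the measurement and output steps, a timing wrapper and a CSV line builder could look like the following. The names `TimingSketch`, `timed` and `csvLine` are hypothetical and not taken from the patch; only the column names come from the list above, and the sketch assumes that no value contains a comma.

```scala
// Hypothetical helpers; not the actual patch.
object TimingSketch {

  /** Evaluates `body` and returns its result with the elapsed wall-clock time in ms. */
  def timed[T](body: => T): (T, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1000000L)
  }

  // Column names in the order given for the CSV file.
  val header: Seq[String] = Seq(
    "GAV", "#ProjectClasses", "#Libraries", "#LibraryClasses",
    "StreamTime", "LoadAndParseProjectCFsTime", "LoadAndParseLibraryCFsTime",
    "TotalProjectInitTime", "O4Time", "O1Time", "O5Time", "O7Time",
    "O2Time", "O3Time", "O6Time"
  )

  /** Joins values into one CSV line; assumes no value contains a comma. */
  def csvLine(values: Seq[Any]): String = values.mkString(",")
}
```

For example, `val (result, ms) = TimingSketch.timed { someOperation() }` (with `someOperation` standing in for one of O1 to O7) yields the time for a single operation, and `TimingSketch.csvLine(TimingSketch.header)` produces the header line.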
A first very basic run on ~1000 GAVs produced the following results: stats.csv. Note that all times are in milliseconds and the LoadAndParse[Project|Library]CFsTime depends on my local internet connection at home.
Let me know if you have any ideas or additional input for me. After that, I'll run the analysis on our servers and post the evaluation results under this issue.
Today I ran the analysis on one of our servers (4 cores, 30 GB heap space). Unfortunately, it crashed after ~5000 GAVs. I just restarted it with different configurations and hope to obtain some more results. Nevertheless, I did a preliminary evaluation of the results for those 5000 GAVs. Here's an overview:
| Operation | AVG Time [ms] | MEDIAN Time [ms] | 75% Quantile [ms] |
|---|---|---|---|
| Project Classes Download & Init | 64 | 11 | 30 |
| Library Classes Download & Init | 1685 | 594 | 2092 |
| Project Instance Init | 446 | 37 | 145 |
| - O1 | 131 | 11 | 41 |
| - O2 | ~0 | 0 | 0 |
| - O3 | 17 | 3 | 13 |
| - O4 | 301 | 18 | 80 |
| - O5 | 84 | 14 | 63 |
| - O6 | 1 | 0 | 1 |
| - O7 | 3 | 0 | 4 |
As you can see, the most relevant operations seem to be O4 (computing instance methods) and O5 (computing overriding methods).
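For reference, the three summary columns could be derived from one timing column of the CSV roughly as follows. The thread does not say how the quantile was computed, so this sketch assumes the nearest-rank definition (for the median as well); `SummarySketch`, `mean` and `quantile` are hypothetical names.

```scala
// Hypothetical helpers; the quantile definition (nearest rank) is an assumption.
object SummarySketch {

  /** Arithmetic mean of the given times in ms. */
  def mean(timesMs: Seq[Long]): Double = {
    require(timesMs.nonEmpty, "need at least one value")
    timesMs.sum.toDouble / timesMs.size
  }

  /** Nearest-rank quantile for 0 < q <= 1: the value at position ceil(q * n) in sorted order. */
  def quantile(timesMs: Seq[Long], q: Double): Long = {
    require(timesMs.nonEmpty && q > 0.0 && q <= 1.0, "need values and 0 < q <= 1")
    val sorted = timesMs.sorted
    val rank = math.ceil(q * sorted.size).toInt
    sorted(rank - 1)
  }
}
```

With the made-up input `Seq(1000L, 0L, 37L, 11L)`, the mean is 262.0 while `quantile(_, 0.5)` gives 11 and `quantile(_, 0.75)` gives 37. A single slow project is enough to pull the average far above the median, which would be consistent with the gap between the AVG and MEDIAN columns above.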
Thank you for looking into this. I had a glance at the CSV, but haven't gained deeper insights yet. I think the steps that we expected to be the most expensive also ended up dominating the project creation time, with some differences between projects. Are there any insights you gained that would suggest a course of action besides a general "let's try not to compute everything all the time but just when needed"? Keep in mind that this would probably increase latency: right now, some of the steps can be started right away and run in parallel, but if they are lazy, neither is possible.
I do think it's rather tricky to optimize. While instance and overriding methods are the last things to be computed before the project is created, and could therefore maybe be made lazy, that would impact project validation, which could then only be performed in a reduced fashion, or not at all. Maybe we want to come back to the LazyProject / UnsafeProject idea, with a separate class for use cases where you, e.g., only need the class hierarchy. Before we come to any final conclusions, I'd like to a) gather some more data and b) try the same experiments with your additions from #215, just to see the performance impact.
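The trade-off raised in the two comments above can be illustrated with a small sketch. It is not a design proposal for OPAL: `EagerSketch`, `LazySketch` and `instanceMethodCount` are hypothetical, and the computation is reduced to a function returning an `Int`.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

// Hypothetical classes; they only contrast the two evaluation strategies.
object EagerVsLazySketch {

  /** Starts the computation on construction, so it can overlap with other work. */
  final class EagerSketch(compute: () => Int) {
    private val pending: Future[Int] = Future(compute())
    def instanceMethodCount: Int = Await.result(pending, Duration.Inf)
  }

  /** Defers the computation to the first access, where it runs on the caller's thread. */
  final class LazySketch(compute: () => Int) {
    lazy val instanceMethodCount: Int = compute()
  }
}
```

In the eager variant, the cost is always paid, but it may already be finished by the time a client asks for the value. In the lazy variant, nothing is computed if nobody asks, but the first caller waits for the full computation.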