Skip to content

Conversation

maximilianruesch
Copy link
Collaborator

Uses the results generated in #1 by:

  • Integrating them into the ClassForName reflective call analysis
  • Adding a string analysis demo along with a demo package showcasing the ability of the now integrated analysis to resolve calls across the Apache Xerces Framework

This PR contains some compromises and bugfixes that still need to be ironed out.

@errt errt self-requested a review September 9, 2024 08:28
@maximilianruesch maximilianruesch changed the title Use string analysis results for reflective calls Use string analysis results for Class.forName reflective calls Sep 12, 2024
@maximilianruesch maximilianruesch marked this pull request as ready for review September 24, 2024 10:08
@maximilianruesch maximilianruesch marked this pull request as draft September 24, 2024 10:34
@maximilianruesch
Copy link
Collaborator Author

This PR will still fail unit (and probably integration) tests since some computation for StringConstancy is missing (often times in the call graph computations that depend on the ClassForName analysis.

@errt errt force-pushed the feature/StringAnalysis branch from 8de9909 to 56bc6b9 Compare June 23, 2025 13:15
@errt errt force-pushed the feature/StringAnalysis branch from 88af66c to b2f4233 Compare July 2, 2025 13:11
Base automatically changed from feature/StringAnalysis to develop August 29, 2025 09:04
@errt errt changed the title Use string analysis results for Class.forName reflective calls Use string analysis results for reflective calls Aug 29, 2025
@errt errt force-pushed the migration/reflective-calls-to-string-analysis branch from 1d0dab5 to 0731697 Compare September 1, 2025 09:14
@errt
Copy link
Collaborator

errt commented Sep 1, 2025

This PR will still fail unit (and probably integration) tests since some computation for StringConstancy is missing (often times in the call graph computations that depend on the ClassForName analysis.

Should we force users to schedule a string analysis or should we provide a fallback for the StringConstancy property?

@maximilianruesch
Copy link
Collaborator Author

Should we force users to schedule a string analysis or should we provide a fallback for the StringConstancy property?

Afaik the only sound fallback we can provide is the LB of the property (i.e. ANY string) which could potentially blow up the runtime of analyses that previously had more conservative fallbacks or computed the possible strings on the fly in a very limited scope.

I am not sure what the best solution is. Can we configure the framework such that, if not specified otherwise, e.g. L1String runs? We now effectively standardized the ways to obtain strings into different levels of detail, but it seems difficult to choose one over the over...

@errt
Copy link
Collaborator

errt commented Sep 1, 2025

Well, the analyses use "ANY string" or "NO string" depending on the soundness mode now when they cannot determine a string's value, right? So that (maybe combined with a very, very trivial (would have to be bytecode based, not TAC) local analysis) could be a fallback. Yes, it would be very unsound. But so is the string analysis in regular mode.

Can we configure the framework such that, if not specified otherwise, e.g. L1String runs?

In general, no, that goes against the design principles of the framework. For clients based on the new CLI framework (#226), we can enforce that an analysis is chosen or provide a default value if none is picked. For call graphs acquired via a CallGraphKey, it would also be possible to force a particular (hard-coded or configurable) analysis to run. For all other situations, no, we can't. That's one of the reasons most analyses do provide fallback values.
That was also the reason for my question: If we can provide a fallback that has some actual value (in the sense of benefit, not result), that would be the easiest solution by far. If not, at least for the CLI I would enforce an analysis to be scheduled (although already that brings several difficult questions, such as whether this should be the case when a call graph is configured but without the reflection analyses that need string values) but I'm already not sure whether to also do it for the CallGraphKeys.

@maximilianruesch
Copy link
Collaborator Author

Well, the analyses use "ANY string" or "NO string" depending on the soundness mode now when they cannot determine a string's value, right? So that (maybe combined with a very, very trivial (would have to be bytecode based, not TAC) local analysis) could be a fallback. Yes, it would be very unsound. But so is the string analysis in regular mode.

I agree that these would be the reasonable fallbacks, and I am starting to think that they are the only reasonable ones. In case of a very rudimentary bytecode analysis (to at least derive constant string literals etc.), would this be performed inline in the fallback function that is passed at PropertyKey creation?

If we can provide a fallback that has some actual value (in the sense of benefit, not result)

Whether there is such a value depends on the consuming analysis, but in most cases even just analyzing string literals (as in the above) should suffice for some rudimentary results. Imo we should not necessarily force users to specify a string analysis level everywhere (since the call graphs are basically used everywhere), as that would require them to read into and try to distinguish the different levels of string analysis to be able to pick an appropriate one. Or we should, if we decide that we want every user to go to this level of detail.

Perhaps this is rather a documentation / usability issue in the end, but if there is a fallback that can to some degree approximate the results from the different local analyses before, we should supply it here.

@errt
Copy link
Collaborator

errt commented Sep 2, 2025

would this be performed inline in the fallback function that is passed at PropertyKey creation?

Yes, exactly. I think we should have a look whether we can implement a simple bytecode analysis. Potential for finding constants would be to look at LDC instructions and the ConstantValue attribute for fields.

@maximilianruesch
Copy link
Collaborator Author

and the ConstantValue attribute for fields

This reminds me of #1 (comment). Could we generalise reading constant-value attributes such that code is reused, or is it little enough code to be reasonable duplicated?

Further, I would suggest that this bytecode analysis stays private to the string package, and instead we encourage consuming analyses to actually deal with a StringConstancy property.

@errt
Copy link
Collaborator

errt commented Sep 2, 2025

Well, it is not much more code than field.constantFieldValue.isDefined and field.constantFieldValue.get.

It would be the (open) fallback computation of the property, which cannot be restricted. And the framework wouldn't push you to using an actual analysis, since it wouldn't complain about no analysis being scheduled. I am not exactly sure what your concern is, though. Either we force users to schedule a string analysis when the property is needed or we don't, I don't see much of a middle ground there.

@maximilianruesch
Copy link
Collaborator Author

maximilianruesch commented Sep 2, 2025

It would be the (open) fallback computation of the property, which cannot be restricted. And the framework wouldn't push you to using an actual analysis, since it wouldn't complain about no analysis being scheduled. I am not exactly sure what your concern is, though. Either we force users to schedule a string analysis when the property is needed or we don't, I don't see much of a middle ground there.

I am not suggesting something about end-users scheduling analyses, but rather users that are developing new analyses which directly require string information should not be tempted to use the fallback computation directly, but rather rely on StringConstancy (which then might fallback if needed). This implicitly standardizes the handling of "this string could be here" to use string trees, which is imo a good idea.

But in a sense, copying the code directly into the fallback function already scopes it such that what I propose is given, so it might not need further action.

@errt
Copy link
Collaborator

errt commented Sep 2, 2025

That's the general expectation in OPAL anyway that properties are queried from the property store.

@errt
Copy link
Collaborator

errt commented Sep 4, 2025

@maximilianruesch Is a fallback computation something you would like to look into?

@maximilianruesch
Copy link
Collaborator Author

I could have a look next week.

@johannesduesing johannesduesing added this to the Release 6.0.0 milestone Sep 12, 2025
@maximilianruesch maximilianruesch self-assigned this Sep 12, 2025
@maximilianruesch maximilianruesch force-pushed the migration/reflective-calls-to-string-analysis branch from 0731697 to f22d87a Compare September 15, 2025 15:45
@maximilianruesch
Copy link
Collaborator Author

I had another look at this. There is some difficulty implementing the bytecode analysis, since the entities for which the property is queried is one of VariableDefinition, VariableContext or MethodParameterContext which are defined in the TAC package. The property key is defined in the BR package, and thus we cannot type-safely access fields from those entities to find the PCs.

How do you suggest we proceed @errt? Simply return LB if in high soundness mode and UB otherwise? Move the property key to TAC (even the trivial string analysis is in TAC)?

@errt
Copy link
Collaborator

errt commented Sep 15, 2025

If the entities are all defined in tac, maybe the key should be as well? If I understand you correctly, it is impossible to query the key for any sensible entity without depending on tac anyway.

@maximilianruesch maximilianruesch force-pushed the migration/reflective-calls-to-string-analysis branch from aed23d7 to ec22731 Compare September 17, 2025 12:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants