Use string analysis results for reflective calls #214

maximilianruesch · 2024-09-08T21:52:29Z

Uses the results generated in #1 by:

Integrating them into the ClassForName reflective call analysis
Adding a string analysis demo along with a demo package showcasing the ability of the now integrated analysis to resolve calls across the Apache Xerces Framework

This PR contains some compromises and bugfixes that still need to be ironed out.

maximilianruesch · 2024-09-24T10:35:21Z

This PR will still fail unit (and probably integration) tests since some computation for StringConstancy is missing (often times in the call graph computations that depend on the ClassForName analysis.

errt · 2025-09-01T09:28:49Z

This PR will still fail unit (and probably integration) tests since some computation for StringConstancy is missing (often times in the call graph computations that depend on the ClassForName analysis.

Should we force users to schedule a string analysis or should we provide a fallback for the StringConstancy property?

maximilianruesch · 2025-09-01T16:05:07Z

Should we force users to schedule a string analysis or should we provide a fallback for the StringConstancy property?

Afaik the only sound fallback we can provide is the LB of the property (i.e. ANY string) which could potentially blow up the runtime of analyses that previously had more conservative fallbacks or computed the possible strings on the fly in a very limited scope.

I am not sure what the best solution is. Can we configure the framework such that, if not specified otherwise, e.g. L1String runs? We now effectively standardized the ways to obtain strings into different levels of detail, but it seems difficult to choose one over the over...

errt · 2025-09-01T17:32:30Z

Well, the analyses use "ANY string" or "NO string" depending on the soundness mode now when they cannot determine a string's value, right? So that (maybe combined with a very, very trivial (would have to be bytecode based, not TAC) local analysis) could be a fallback. Yes, it would be very unsound. But so is the string analysis in regular mode.

Can we configure the framework such that, if not specified otherwise, e.g. L1String runs?

In general, no, that goes against the design principles of the framework. For clients based on the new CLI framework (#226), we can enforce that an analysis is chosen or provide a default value if none is picked. For call graphs acquired via a CallGraphKey, it would also be possible to force a particular (hard-coded or configurable) analysis to run. For all other situations, no, we can't. That's one of the reasons most analyses do provide fallback values.
That was also the reason for my question: If we can provide a fallback that has some actual value (in the sense of benefit, not result), that would be the easiest solution by far. If not, at least for the CLI I would enforce an analysis to be scheduled (although already that brings several difficult questions, such as whether this should be the case when a call graph is configured but without the reflection analyses that need string values) but I'm already not sure whether to also do it for the CallGraphKeys.

maximilianruesch · 2025-09-01T18:39:30Z

Well, the analyses use "ANY string" or "NO string" depending on the soundness mode now when they cannot determine a string's value, right? So that (maybe combined with a very, very trivial (would have to be bytecode based, not TAC) local analysis) could be a fallback. Yes, it would be very unsound. But so is the string analysis in regular mode.

I agree that these would be the reasonable fallbacks, and I am starting to think that they are the only reasonable ones. In case of a very rudimentary bytecode analysis (to at least derive constant string literals etc.), would this be performed inline in the fallback function that is passed at PropertyKey creation?

If we can provide a fallback that has some actual value (in the sense of benefit, not result)

Whether there is such a value depends on the consuming analysis, but in most cases even just analyzing string literals (as in the above) should suffice for some rudimentary results. Imo we should not necessarily force users to specify a string analysis level everywhere (since the call graphs are basically used everywhere), as that would require them to read into and try to distinguish the different levels of string analysis to be able to pick an appropriate one. Or we should, if we decide that we want every user to go to this level of detail.

Perhaps this is rather a documentation / usability issue in the end, but if there is a fallback that can to some degree approximate the results from the different local analyses before, we should supply it here.

errt · 2025-09-02T09:07:33Z

would this be performed inline in the fallback function that is passed at PropertyKey creation?

Yes, exactly. I think we should have a look whether we can implement a simple bytecode analysis. Potential for finding constants would be to look at LDC instructions and the ConstantValue attribute for fields.

maximilianruesch · 2025-09-02T09:37:48Z

and the ConstantValue attribute for fields

This reminds me of #1 (comment). Could we generalise reading constant-value attributes such that code is reused, or is it little enough code to be reasonable duplicated?

Further, I would suggest that this bytecode analysis stays private to the string package, and instead we encourage consuming analyses to actually deal with a StringConstancy property.

errt · 2025-09-02T10:52:16Z

Well, it is not much more code than field.constantFieldValue.isDefined and field.constantFieldValue.get.

It would be the (open) fallback computation of the property, which cannot be restricted. And the framework wouldn't push you to using an actual analysis, since it wouldn't complain about no analysis being scheduled. I am not exactly sure what your concern is, though. Either we force users to schedule a string analysis when the property is needed or we don't, I don't see much of a middle ground there.

maximilianruesch · 2025-09-02T11:20:39Z

It would be the (open) fallback computation of the property, which cannot be restricted. And the framework wouldn't push you to using an actual analysis, since it wouldn't complain about no analysis being scheduled. I am not exactly sure what your concern is, though. Either we force users to schedule a string analysis when the property is needed or we don't, I don't see much of a middle ground there.

I am not suggesting something about end-users scheduling analyses, but rather users that are developing new analyses which directly require string information should not be tempted to use the fallback computation directly, but rather rely on StringConstancy (which then might fallback if needed). This implicitly standardizes the handling of "this string could be here" to use string trees, which is imo a good idea.

But in a sense, copying the code directly into the fallback function already scopes it such that what I propose is given, so it might not need further action.

errt · 2025-09-02T11:48:48Z

That's the general expectation in OPAL anyway that properties are queried from the property store.

errt · 2025-09-04T11:35:17Z

@maximilianruesch Is a fallback computation something you would like to look into?

maximilianruesch · 2025-09-04T15:17:38Z

I could have a look next week.

…lysis

maximilianruesch · 2025-09-15T15:51:44Z

I had another look at this. There is some difficulty implementing the bytecode analysis, since the entities for which the property is queried is one of VariableDefinition, VariableContext or MethodParameterContext which are defined in the TAC package. The property key is defined in the BR package, and thus we cannot type-safely access fields from those entities to find the PCs.

How do you suggest we proceed @errt? Simply return LB if in high soundness mode and UB otherwise? Move the property key to TAC (even the trivial string analysis is in TAC)?

errt · 2025-09-15T15:56:14Z

If the entities are all defined in tac, maybe the key should be as well? If I understand you correctly, it is impossible to query the key for any sensible entity without depending on tac anyway.

…ysis

errt self-requested a review September 9, 2024 08:28

maximilianruesch changed the title ~~Use string analysis results for reflective calls~~ Use string analysis results for Class.forName reflective calls Sep 12, 2024

maximilianruesch marked this pull request as ready for review September 24, 2024 10:08

maximilianruesch marked this pull request as draft September 24, 2024 10:34

errt mentioned this pull request Jun 12, 2025

Use SystemPropertiesAnalysis to get more statically known Strings #121

Closed

errt force-pushed the feature/StringAnalysis branch from 8de9909 to 56bc6b9 Compare June 23, 2025 13:15

errt force-pushed the feature/StringAnalysis branch from 88af66c to b2f4233 Compare July 2, 2025 13:11

Base automatically changed from feature/StringAnalysis to develop August 29, 2025 09:04

errt changed the title ~~Use string analysis results for Class.forName reflective calls~~ Use string analysis results for reflective calls Aug 29, 2025

errt force-pushed the migration/reflective-calls-to-string-analysis branch from 1d0dab5 to 0731697 Compare September 1, 2025 09:14

johannesduesing added this to the Release 6.0.0 milestone Sep 12, 2025

maximilianruesch self-assigned this Sep 12, 2025

maximilianruesch added 6 commits September 12, 2025 13:35

Integrate string analysis in reflection related calls

c72fd9d

Fix some code style

922ce9b

Update integration into reflection related calls

d4205d5

Harden integration into for name analysis for invalid string information

6b43194

Fix code style

5fb4572

Improve soundness mode handling in reflection related calls

685b310

maximilianruesch and others added 14 commits September 12, 2025 13:39

Exclude all but constant string trees in low soundness call graph ana…

04fe685

…lysis

Improve regex matching performance

aadb92d

Remove pattern compilation checks

c279d06

Fix formatting

374da68

Add string analysis demo

cd60c6b

Fix reflection related field accesses

306cf35

Simplify matching types in reflective call accesses

3385a28

Fix compilation after merge

df0de05

Remove recursion guards for specific functions

3983f66

Add demo parameter for level to run

ee321b1

Fix hash code definition of PD vars

98477cf

Fix string analysis demo after merge

e6d29fd

Remove inaccessible string analysis demo parts

ae96b1f

Use string analysis for more strings in reflect analysis

17c8cdc

maximilianruesch mentioned this pull request Sep 15, 2025

Fix exception for empty parameters #313

Merged

Fix compile time exceptions and adapt to new CLI framework

f22d87a

maximilianruesch force-pushed the migration/reflective-calls-to-string-analysis branch from 0731697 to f22d87a Compare September 15, 2025 15:45

Merge branch 'develop' into migration/reflective-calls-to-string-anal…

ec22731

…ysis

maximilianruesch force-pushed the migration/reflective-calls-to-string-analysis branch from aed23d7 to ec22731 Compare September 17, 2025 12:57

maximilianruesch added 2 commits September 17, 2025 15:03

Add string constancy to requirements for reflection-related analyses

245914b

Fix formatting

5bac893

johannesduesing modified the milestones: Release 6.0.0, Release 8.0.0 Oct 8, 2025

Use string analysis results for reflective calls #214

Are you sure you want to change the base?

Use string analysis results for reflective calls #214

Uh oh!

Conversation

maximilianruesch commented Sep 8, 2024

Uh oh!

maximilianruesch commented Sep 24, 2024

Uh oh!

errt commented Sep 1, 2025

Uh oh!

maximilianruesch commented Sep 1, 2025

Uh oh!

errt commented Sep 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

maximilianruesch commented Sep 1, 2025

Uh oh!

errt commented Sep 2, 2025

Uh oh!

maximilianruesch commented Sep 2, 2025

Uh oh!

errt commented Sep 2, 2025

Uh oh!

maximilianruesch commented Sep 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

errt commented Sep 2, 2025

Uh oh!

errt commented Sep 4, 2025

Uh oh!

maximilianruesch commented Sep 4, 2025

Uh oh!

maximilianruesch commented Sep 15, 2025

Uh oh!

errt commented Sep 15, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

errt commented Sep 1, 2025 •

edited

Loading

maximilianruesch commented Sep 2, 2025 •

edited

Loading