-
Notifications
You must be signed in to change notification settings - Fork 55
Save attached outputs for compile-only cache entries #394
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Updates Based on Reviewer FeedbackI've improved this PR to address the concerns raised by @AlexanderAshitkin in PR #176's review: Code Improvements
Addressing "Save/Restore Consistency"The reviewer's concern about consistency between save and restore operations is addressed: Save side (this PR):
Restore side (existing code):
This design ensures:
TestingThe integration test validates the complete workflow: ./mvnw -Prun-its -Dit.test=Issue393CompileRestoreTest verify Test coverage includes:
|
55c6e7a
to
489ff03
Compare
src/main/java/org/apache/maven/buildcache/CacheControllerImpl.java
Outdated
Show resolved
Hide resolved
final boolean hasPackagePhase = project.hasLifecyclePhase("package"); | ||
|
||
attachGeneratedSources(project); | ||
attachOutputs(project); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conditioning this logic on package
served two main purposes:
-
It prevented premature caching of outputs if the corresponding lifecycles have not run. Doing otherwise could lead to costly and incorrect caching or restoration of the previous branch's compilation output. Here’s a problematic sequence:
- Compile Branch A and then checkout Branch B.
- Run
process-resources
in Branch B. Result: the compiled sources of Branch A are cached, under checksum for Branch B. - Run the compilation in Branch B. Result: the compiled classes from Branch A are restored in Branch B, potentially interfering with the compilation of Branch B.
Although such cached entries can be replaced later, it is still not desirable.
-
Conditioning on
package
also reduces the number of caching and unpacking operations. Specifically, it avoids the need to zip, download, or upload files during every compilation, which helps maintain performance. When an engineer is actively working on the code, is it really beneficial to cache intermediate results? It is challenging to estimate how useful this would be in practice, and I assume it won’t always be desirable.
Just thinking aloud, from the user's perspective, intentionally invoking restoration seems to offer better usability. In that sense, Gradle cache model is also flawed - saving a zip of classes with every compile task seems excessive and results in the io/cpu spent on creation of unnecessary cache entries that contain intermediate compilation results. In that sense package
is not that bad - it allows to throttle costly operation and run it together with other costly IO work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah! That makes much more sense. Thank you for clarifying why this logic was conditioned no the package
phase.
To tackle this problem, I've combined timestamp-based filtering with per-project thread-safe isolation to prevent both the branch-switching scenario and race conditions in multi-threaded builds.
How It Works
1. Timestamp Filtering
The cache now only captures outputs that were either:
- Modified during the current build (timestamp >= buildStartTime), OR
- Restored from cache during this build (tracked explicitly)
This elegantly handles your branch-switching scenario:
- mvn compile on Branch A at 10:00:00 → target/classes modified at 10:00:01
- git checkout branch-b
- mvn process-resources on Branch B starts at 10:01:00
- Check target/classes: lastModified (10:00:01) < buildStartTime (10:01:00)
AND not restored this build → SKIPPED ✓
2. Handling Cache Hits
Your question about mvn package
with a compile cache hit is crucial. The solution tracks restored outputs:
- First build: mvn compile at 10:00:00
- Compiles classes, caches them ✓
- Second build: mvn package at 10:05:00
- Compile phase: Cache hit, restores to target/classes
- Restoration tracked: classifier added to state.restoredOutputClassifiers
- Package phase save() checks:
- Fresh package outputs: timestamp >= 10:05:00 → include ✓
- Restored compile outputs: in restoredOutputClassifiers → include ✓
- Stale old outputs: timestamp < 10:05:00 AND not restored → exclude ✓
3. Thread Safety for Multi-Threaded Builds
Per your comment about multi-threaded builds, I've also fixed the existing thread safety issues:
Problem: CacheControllerImpl
is @SessionScoped
(one instance shared across all modules). The original code used:
private final Map<String, Path> attachedResourcesPathsById = new HashMap<>();
private int attachedResourceCounter = 0;
With mvn -T 4
, calling clear()
in one thread would affect other threads' modules.
Solution: Per-project isolation using ConcurrentHashMap:
private static class ProjectCacheState {
final Map<String, Path> attachedResourcesPathsById = new HashMap<>();
int attachedResourceCounter = 0;
final Set<String> restoredOutputClassifiers = new HashSet<>();
}
private final ConcurrentMap<String, ProjectCacheState> projectStates = new ConcurrentHashMap<>();
Each module gets isolated state. Cleanup happens per-project in save()
's finally block.
Implementation Details
In CacheControllerImpl.java
:
- Capture build start time:
final long buildStartTime = session.getRequest().getStartTime().getTime();
- Track restored outputs in restoreProjectArtifacts():
state.restoredOutputClassifiers.add(attachedArtifactInfo.getClassifier());
- Check timestamps in
attachDirIfNotEmpty()
:
long lastModified = Files.getLastModifiedTime(candidateSubDir).toMillis();
boolean isRestoredThisBuild = state.restoredOutputClassifiers.contains(classifier);
if (lastModified < buildStartTime && !isRestoredThisBuild) {
// Skip stale outputs
return;
}
Testing
All existing tests pass, and the approach has been validated through:
- Compilation verification
- Full test suite execution
- Thread safety analysis of concurrent access patterns
Let me know if you have any concerns about this approach or if you'd like me to add additional safeguards or test cases.
Thread Safety Improvements: - Replace HashMap with ConcurrentHashMap for thread-safe access - Implement per-project isolation for attachedResourcesPathsById - Implement per-project counters to prevent race conditions - Remove clear() pattern that caused races in multi-threaded builds Each project now has isolated state (project key → resources map), preventing cross-module contamination and race conditions in `mvn -T` builds. Configuration Property: - Add saveCompileOutputs property (default: true) - Allows users to control compile-phase caching behavior - Provides opt-out for users wanting reduced I/O during development - Default fixes JPMS module descriptor restoration bug Addresses reviewer feedback on PR apache#394: 1. Thread safety concern with shared HashMap 2. Performance/design concerns about compile-phase caching
d6bba18
to
19f4a0a
Compare
This commit addresses reviewer concerns about: 1. Branch-switching scenario with stale artifacts 2. Thread safety in multi-threaded builds (mvn -T) Changes: **Timestamp Filtering:** - Capture build start time from session.getRequest().getStartTime() - Pass buildStartTime to attachGeneratedSources() and attachOutputs() - Check directory last-modified time in attachDirIfNotEmpty() - Skip directories with lastModified < buildStartTime (unless restored) **Per-Project Thread-Safe Isolation:** - Introduce ProjectCacheState inner class to hold per-project state: * attachedResourcesPathsById (resource path tracking) * attachedResourceCounter (unique classifier generation) * restoredOutputClassifiers (track restored outputs) - Use ConcurrentHashMap<String, ProjectCacheState> for thread-safe access - Track restored outputs in restoreProjectArtifacts() - Include restored outputs even with old timestamps - Cleanup project state after save() completes **Thread Safety Benefits:** - Each project gets isolated state (no cross-module interference) - ConcurrentHashMap handles concurrent module builds - Per-project cleanup prevents memory leaks - Fixes existing thread safety bug with HashMap.clear() **How Timestamp Filtering Works:** ``` Scenario: mvn package with compile cache hit 1. First build: mvn compile at 10:00:00 - Compiles, caches outputs ✓ 2. Second build: mvn package at 10:05:00 - Compile phase: Restores from cache - Restoration tracked in state.restoredOutputClassifiers - save() phase: Checks timestamps * Fresh outputs: timestamp >= 10:05:00 → include ✓ * Restored outputs: in restoredOutputClassifiers → include ✓ * Stale outputs: timestamp < 10:05:00 AND not restored → exclude ✓ ``` This prevents the branch-switching scenario: ``` 1. mvn compile on Branch A → target/classes modified at 10:00:01 2. git checkout branch-b 3. mvn process-resources on Branch B starts at 10:01:00 4. Check target/classes: 10:00:01 < 10:01:00 AND not restored → SKIP ✓ ```
2bf5f01
to
92822da
Compare
Addressing Performance ConcernsThank you for raising the important question about performance overhead during active development. You're absolutely right that caching compile outputs on every
Solution: Configurable Compile CachingI've added a new property # Disable compile caching during active development:
mvn clean compile -Dmaven.build.cache.cacheCompile=false
# Enable for CI/multi-module scenarios (default):
mvn clean compile Design RationaleDefault: true - Maintains the fix for issue #393 without breaking changes. Users experiencing compile-only cache restoration issues get the fix automatically. Opt-out available - Developers actively editing code can disable compile caching to avoid overhead, while still benefiting from package-phase caching. Use CasesWhen to keep enabled (default):
When to disable:
ImplementationThe property gates the calls to if (cacheConfig.isCacheCompile()) {
attachGeneratedSources(project, state, buildStartTime);
attachOutputs(project, state, buildStartTime);
} This means:
TestingAdded
Ready for your review! |
a53e062
to
26af745
Compare
…caching Added new configuration property maven.build.cache.cacheCompile (default: true) to address reviewer concerns about performance overhead during active development. Changes: - CacheConfig: Added isCacheCompile() method with javadoc - CacheConfigImpl: Added CACHE_COMPILE constant and implementation (default true) - CacheControllerImpl: Wrapped attachGeneratedSources() and attachOutputs() calls in isCacheCompile() check - CacheCompileDisabledTest: Integration tests verifying behavior when disabled Benefits: - Maintains fix for issue apache#393 (default behavior unchanged) - Allows developers to opt-out of compile caching with: -Dmaven.build.cache.cacheCompile=false - Reduces IO overhead (no zipping/uploading) during active development - Users working on large multi-module projects can keep it enabled Usage: # Disable compile caching to reduce overhead during development: mvn clean compile -Dmaven.build.cache.cacheCompile=false # Enable for CI/multi-module scenarios (default): mvn clean compile
26af745
to
3126fe1
Compare
Summary
This PR fixes incomplete cache restoration from compile-only builds by ensuring that configured outputs (e.g.,
classes
,test-classes
,module-info
) are always attached and saved to the cache, regardless of the highest lifecycle phase executed.Changes Made
Reset attached resource bookkeeping per save invocation
attachedResourcesPathsById
and resetsattachedResourceCounter
at the start of eachsave()
callAttach outputs for all lifecycle phases
attachGeneratedSources()
andattachOutputs()
outside thehasLifecyclePhase("package")
conditionalAdd comprehensive integration test
module-info.class
files are restored correctly after cache hitcompile
-Only Builds #393Addressing Reviewer Feedback from PR #176
In PR #176#discussion_r1732039886, @AlexanderAshitkin suggested:
This PR implements exactly that suggestion:
attachGeneratedSources()
isRestoreGeneratedSources()
)Root Cause Analysis
Before this PR:
Problem: Compile-only builds (
mvn clean compile
) never calledattachOutputs()
, creating cache entries without critical files likemodule-info.class
.After this PR:
Result: Compile-only builds save configured outputs, enabling proper restoration in subsequent builds.
Testing
Run the integration test:
The test validates:
mvn clean compile
creates cache entry withmodule-info.class
mvn clean verify
restores from cache includingmodule-info.class
Related Issues
Fixes #393 - Incomplete Cache Restoration from compile-only Builds
Fixes #259 - Cannot restore cache after
mvn clean
commandsFixes #340 - Cache fails to restore compiled classes with
mvn clean test
Migration Notes
No configuration changes required. Existing projects will automatically benefit from this fix on their next build.
Note on Implementation Philosophy: This PR prioritizes correctness and consistency over optimization. All lifecycle phases save configured outputs to ensure cache integrity. Future optimizations could selectively skip certain outputs based on phase, but this would require careful analysis to avoid reintroducing the bugs fixed here.