@cowwoc cowwoc commented Oct 8, 2025

Summary

This PR fixes incomplete cache restoration from compile-only builds by ensuring that configured outputs (e.g., classes, test-classes, module-info) are always attached and saved to the cache, regardless of the highest lifecycle phase executed.

Changes Made

  1. Reset attached resource bookkeeping per save invocation

    • Clears attachedResourcesPathsById and resets attachedResourceCounter at the start of each save() call
    • Prevents stale artifacts from previous builds in multi-module projects from polluting cache entries
  2. Attach outputs for all lifecycle phases

    • Moves attachGeneratedSources() and attachOutputs() outside the hasLifecyclePhase("package") conditional
    • Ensures compile-only builds persist configured output directories to enable cache hits in subsequent builds
    • Maintains consistency between save and restore operations: what gets cached can be restored
  3. Add an integration test (Issue393CompileRestoreTest) for the compile-only → verify workflow

Addressing Reviewer Feedback from PR #176

In PR #176#discussion_r1732039886, @AlexanderAshitkin suggested:

"More accurate implementation could support pre-compile phases - 'save' and restore generated sources and continue with compile."

This PR implements exactly that suggestion:

  • Pre-compile phase support: Compile-only builds now save all configured outputs
  • Generated sources: Already supported via attachGeneratedSources()
  • Continue with compile: Subsequent builds restore outputs and continue seamlessly with later phases
  • Save/Restore consistency: Attached outputs are saved unconditionally and restored based on configuration (controlled by isRestoreGeneratedSources())

Root Cause Analysis

Before this PR:

```java
if (project.hasLifecyclePhase("package")) {
    attachGeneratedSources(project);  // Only for package phase
    attachOutputs(project);           // Only for package phase
    // ...
}
```

Problem: Compile-only builds (mvn clean compile) never called attachOutputs(), creating cache entries without critical files like module-info.class.

After this PR:

```java
attachGeneratedSources(project);  // Always called
attachOutputs(project);           // Always called
```

Result: Compile-only builds save configured outputs, enabling proper restoration in subsequent builds.
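The before/after difference can be illustrated with a small self-contained model. This is not the extension's code. `Project`, `saveBefore`, `saveAfter` and the string "cache entry" are hypothetical stand-ins for MavenProject, CacheControllerImpl.save() and the attached outputs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class SaveFlowModel {
    // Hypothetical stand-in for a MavenProject: only the lifecycle phases that ran.
    record Project(Set<String> executedPhases) {
        boolean hasLifecyclePhase(String phase) {
            return executedPhases.contains(phase);
        }
    }

    // Models attachGeneratedSources()/attachOutputs() by recording what would be cached.
    static void attachOutputs(List<String> cacheEntry) {
        cacheEntry.add("generated-sources");
        cacheEntry.add("classes"); // in a JPMS module this directory holds module-info.class
    }

    // Before the PR: outputs are attached only when the package phase ran.
    static List<String> saveBefore(Project project) {
        List<String> cacheEntry = new ArrayList<>();
        if (project.hasLifecyclePhase("package")) {
            attachOutputs(cacheEntry);
        }
        return cacheEntry;
    }

    // After the PR: outputs are attached whatever the highest phase was.
    static List<String> saveAfter(Project project) {
        List<String> cacheEntry = new ArrayList<>();
        attachOutputs(cacheEntry);
        return cacheEntry;
    }

    public static void main(String[] args) {
        Project compileOnly = new Project(Set.of("validate", "compile"));
        System.out.println("before: " + saveBefore(compileOnly)); // before: []
        System.out.println("after:  " + saveAfter(compileOnly));  // after:  [generated-sources, classes]
    }
}
```

In the model, a `mvn clean compile` run produces an empty entry before the change. That empty entry is exactly what a later `mvn clean verify` would try to restore from.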

Testing

Run the integration test:

```
./mvnw -Prun-its -Dit.test=Issue393CompileRestoreTest verify
```

The test validates:

  1. mvn clean compile creates a cache entry containing module-info.class
  2. mvn clean verify restores from the cache, including module-info.class
  3. The consumer module can compile against the cached JPMS module descriptors
  4. Tests execute successfully with the restored artifacts

Related Issues

Fixes #393 - Incomplete Cache Restoration from compile-only Builds
Fixes #259 - Cannot restore cache after mvn clean commands
Fixes #340 - Cache fails to restore compiled classes with mvn clean test

Migration Notes

No configuration changes required. Existing projects will automatically benefit from this fix on their next build.


Note on Implementation Philosophy: This PR prioritizes correctness and consistency over optimization. All lifecycle phases save configured outputs to ensure cache integrity. Future optimizations could selectively skip certain outputs based on phase, but this would require careful analysis to avoid reintroducing the bugs fixed here.

cowwoc commented Oct 8, 2025

Updates Based on Reviewer Feedback

I've improved this PR to address the concerns raised by @AlexanderAshitkin in PR #176's review:

Code Improvements

  1. Added detailed inline comments explaining:

    • Why bookkeeping reset is necessary (prevents stale artifacts in multi-module builds)
    • That outputs are attached for ALL lifecycle phases (including compile-only)
    • How this addresses the JPMS module descriptor restoration issues
  2. Added debug logging to make compile-only cache saves visible in build logs

  3. Enhanced PR description with:

    • Clear explanation of how this implements the "pre-compile phase" suggestion
    • Root cause analysis comparing before/after behavior
    • Explicit mapping to the reviewer's feedback

Addressing "Save/Restore Consistency"

The reviewer's concern about consistency between save and restore operations is addressed:

Save side (this PR):

  • Always attaches configured outputs (classes, test-classes, etc.)
  • Creates cache entries that contain all necessary artifacts for restoration

Restore side (existing code):

  • Restores attached outputs based on configuration (isRestoreGeneratedSources())
  • The configuration controls what gets restored, not what gets saved

This design ensures:

  • ✅ Compile-only builds can be restored by subsequent builds
  • ✅ Users retain control over restoration behavior via configuration
  • ✅ No cache entries are created without necessary outputs

Testing

The integration test validates the complete workflow:

```
./mvnw -Prun-its -Dit.test=Issue393CompileRestoreTest verify
```

Test coverage includes:

  • JPMS module descriptors (module-info.class)
  • Multi-module projects with inter-module dependencies
  • Compile-only → verify workflow (the reported bug scenario)

@cowwoc cowwoc force-pushed the issue-393-compile-cache branch 2 times, most recently from 55c6e7a to 489ff03 October 8, 2025 04:34
```java
final boolean hasPackagePhase = project.hasLifecyclePhase("package");

attachGeneratedSources(project);
attachOutputs(project);
```
Contributor

@AlexanderAshitkin AlexanderAshitkin Oct 11, 2025

Conditioning this logic on package served two main purposes:

  1. It prevented premature caching of outputs when the corresponding lifecycle phases have not run. Doing otherwise could lead to costly and incorrect caching or restoration of the previous branch's compilation output. Here’s a problematic sequence:

    • Compile Branch A and then checkout Branch B.
    • Run process-resources in Branch B. Result: the compiled output of Branch A is cached under the checksum for Branch B.
    • Run the compilation in Branch B. Result: the compiled classes from Branch A are restored in Branch B, potentially interfering with the compilation of Branch B.
      Although such cache entries can be replaced later, they are still undesirable.
  2. Conditioning on package also reduces the number of caching and unpacking operations. Specifically, it avoids the need to zip, download, or upload files during every compilation, which helps maintain performance. When an engineer is actively working on the code, is it really beneficial to cache intermediate results? It is challenging to estimate how useful this would be in practice, and I assume it won’t always be desirable.
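The reviewer's sequence can be reproduced with a toy model. It assumes a cache keyed only by a source checksum and a save step with no phase or timestamp guard. The class, the map and the string "snapshots" are hypothetical simplifications, not the extension's data structures.

```java
import java.util.HashMap;
import java.util.Map;

public class StaleOutputScenario {
    // Hypothetical cache: source checksum -> snapshot of target/classes at save time.
    static final Map<String, String> cache = new HashMap<>();
    // Models target/classes, which `git checkout` leaves untouched.
    static String targetClasses = "";

    // One build of the current branch. If compile runs, a cache hit restores the cached
    // classes; otherwise the freshly compiled classes are used. The save at the end is
    // unconditional, i.e. it has no phase or timestamp guard.
    static void build(String sourceChecksum, boolean runsCompile, String compiledClasses) {
        if (runsCompile) {
            String cached = cache.get(sourceChecksum);
            targetClasses = (cached != null) ? cached : compiledClasses;
        }
        cache.put(sourceChecksum, targetClasses);
    }

    public static void main(String[] args) {
        build("checksum-A", true, "classes-of-A");   // compile Branch A
        build("checksum-B", false, null);            // checkout B, run process-resources
        System.out.println(cache.get("checksum-B")); // classes-of-A, cached under B's checksum
        build("checksum-B", true, "classes-of-B");   // compile Branch B: cache hit
        System.out.println(targetClasses);           // classes-of-A, restored into Branch B
    }
}
```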

Just thinking aloud: from the user's perspective, intentionally invoking restoration seems to offer better usability. In that sense, Gradle's cache model is also flawed. Saving a zip of classes with every compile task seems excessive, and it spends I/O and CPU on creating unnecessary cache entries that contain intermediate compilation results. In that sense, package is not that bad: it allows throttling this costly operation and running it together with other costly I/O work.

Author

Ah! That makes much more sense. Thank you for clarifying why this logic was conditioned on the package phase.

To tackle this problem, I've combined timestamp-based filtering with per-project thread-safe isolation to prevent both the branch-switching scenario and race conditions in multi-threaded builds.

How It Works

1. Timestamp Filtering

The cache now only captures outputs that were either:

  • Modified during the current build (timestamp >= buildStartTime), OR
  • Restored from cache during this build (tracked explicitly)

This elegantly handles your branch-switching scenario:

  1. mvn compile on Branch A at 10:00:00 → target/classes modified at 10:00:01
  2. git checkout branch-b
  3. mvn process-resources on Branch B starts at 10:01:00
  4. Check target/classes: lastModified (10:00:01) < buildStartTime (10:01:00)
    AND not restored this build → SKIPPED ✓
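The filtering rule above reduces to a single predicate. Here is a minimal sketch of it, applied to the branch-switching timeline. `shouldAttach` and `at` are hypothetical names, and the calendar date is arbitrary; only the times come from the example.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class TimestampFilter {
    // Attach an output if it was written during this build, or if it was
    // restored from the cache during this build; otherwise treat it as stale.
    static boolean shouldAttach(long lastModifiedMillis, long buildStartMillis, boolean restoredThisBuild) {
        return lastModifiedMillis >= buildStartMillis || restoredThisBuild;
    }

    // Converts an ISO local date-time (interpreted as UTC) to epoch milliseconds.
    static long at(String isoLocalDateTime) {
        return LocalDateTime.parse(isoLocalDateTime).toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    public static void main(String[] args) {
        long classesModified = at("2025-10-11T10:00:01"); // mvn compile on Branch A
        long buildStart = at("2025-10-11T10:01:00");      // mvn process-resources on Branch B
        System.out.println(shouldAttach(classesModified, buildStart, false)); // false: skipped
    }
}
```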

2. Handling Cache Hits

Your question about mvn package with a compile cache hit is crucial. The solution tracks restored outputs:

  1. First build: mvn compile at 10:00:00
    • Compiles classes, caches them ✓
  2. Second build: mvn package at 10:05:00
    • Compile phase: Cache hit, restores to target/classes
    • Restoration tracked: classifier added to state.restoredOutputClassifiers
    • Package phase save() checks:
      • Fresh package outputs: timestamp >= 10:05:00 → include ✓
      • Restored compile outputs: in restoredOutputClassifiers → include ✓
      • Stale old outputs: timestamp < 10:05:00 AND not restored → exclude ✓
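The cache-hit walkthrough can be sketched as a selection over candidate outputs. The sketch assumes that restored outputs may carry timestamps older than the build start, which is why they are tracked by classifier. The `Output` record, the classifier names and `selectForSave` are hypothetical; only `restoredOutputClassifiers` is named in the PR.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RestoredOutputTracking {
    // Hypothetical output descriptor: attached-artifact classifier plus last-modified time.
    record Output(String classifier, long lastModifiedMillis) {}

    // Per-project state, reduced to the one field this walkthrough needs.
    static final class ProjectCacheState {
        final Set<String> restoredOutputClassifiers = new HashSet<>();
    }

    // Called when an output is restored from the cache (cf. restoreProjectArtifacts()).
    static void recordRestored(ProjectCacheState state, String classifier) {
        state.restoredOutputClassifiers.add(classifier);
    }

    // Returns the classifiers save() would attach: fresh outputs plus outputs restored this build.
    static List<String> selectForSave(ProjectCacheState state, List<Output> outputs, long buildStartMillis) {
        List<String> selected = new ArrayList<>();
        for (Output o : outputs) {
            boolean fresh = o.lastModifiedMillis() >= buildStartMillis;
            boolean restored = state.restoredOutputClassifiers.contains(o.classifier());
            if (fresh || restored) {
                selected.add(o.classifier());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        long buildStart = 1_000_000L; // stands in for 10:05:00
        ProjectCacheState state = new ProjectCacheState();
        recordRestored(state, "compile-classes"); // compile phase: cache hit
        List<Output> outputs = List.of(
                new Output("package-output", buildStart + 5_000),    // written by this build
                new Output("compile-classes", buildStart - 300_000), // restored, old timestamp
                new Output("stale-output", buildStart - 300_000));   // left over, not restored
        System.out.println(selectForSave(state, outputs, buildStart)); // [package-output, compile-classes]
    }
}
```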

3. Thread Safety for Multi-Threaded Builds

Per your comment about multi-threaded builds, I've also fixed the existing thread safety issues:

Problem: CacheControllerImpl is @SessionScoped (one instance shared across all modules). The original code used:

```java
private final Map<String, Path> attachedResourcesPathsById = new HashMap<>();
private int attachedResourceCounter = 0;
```

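With mvn -T 4, modules are built concurrently. Calling clear() in one thread would wipe entries that other threads' modules still need, and unsynchronized concurrent writes to a plain HashMap are not thread-safe.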

Solution: Per-project isolation using ConcurrentHashMap:

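```java
// Bookkeeping for a single module; each module's builds touch only their own instance
private static class ProjectCacheState {
    final Map<String, Path> attachedResourcesPathsById = new HashMap<>();
    int attachedResourceCounter = 0;
    final Set<String> restoredOutputClassifiers = new HashSet<>();
}

// Shared across modules (the controller is session-scoped), so it must be concurrent
private final ConcurrentMap<String, ProjectCacheState> projectStates = new ConcurrentHashMap<>();
```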

Each module gets isolated state. Cleanup happens per-project in save()'s finally block.
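A minimal sketch of that lifecycle follows: obtain the state with `computeIfAbsent`, use it, and remove it in `finally`. It assumes that a given module's restore and save run on a single thread. That assumption is what makes the plain `HashMap`/`HashSet` inside `ProjectCacheState` acceptable. `save`, `projectKey`, the key format and the `output-N` ids are hypothetical simplifications, not the PR's code.

```java
import java.nio.file.Path;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PerProjectStateSketch {
    private static class ProjectCacheState {
        final Map<String, Path> attachedResourcesPathsById = new HashMap<>();
        int attachedResourceCounter = 0;
        final Set<String> restoredOutputClassifiers = new HashSet<>();
    }

    private final ConcurrentMap<String, ProjectCacheState> projectStates = new ConcurrentHashMap<>();

    // Hypothetical key format; the PR only refers to a "project key".
    static String projectKey(String groupId, String artifactId, String version) {
        return groupId + ":" + artifactId + ":" + version;
    }

    // Attaches the given directories for one project and returns how many were attached.
    int save(String projectKey, Path... outputDirs) {
        ProjectCacheState state = projectStates.computeIfAbsent(projectKey, k -> new ProjectCacheState());
        try {
            for (Path dir : outputDirs) {
                String id = "output-" + (++state.attachedResourceCounter);
                state.attachedResourcesPathsById.put(id, dir);
            }
            return state.attachedResourcesPathsById.size();
        } finally {
            projectStates.remove(projectKey); // per-project cleanup; other modules are untouched
        }
    }

    int trackedProjects() {
        return projectStates.size();
    }

    public static void main(String[] args) throws InterruptedException {
        PerProjectStateSketch controller = new PerProjectStateSketch();
        ExecutorService pool = Executors.newFixedThreadPool(4); // comparable to mvn -T 4
        ConcurrentMap<String, Integer> results = new ConcurrentHashMap<>();
        for (int i = 0; i < 8; i++) {
            String key = projectKey("org.example", "module-" + i, "1.0");
            pool.execute(() -> {
                results.put(key, controller.save(key, Path.of("target/classes"), Path.of("target/test-classes")));
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        // Every module should see exactly its own two outputs, and no state should remain.
        System.out.println(results.values().stream().allMatch(n -> n == 2));
        System.out.println(controller.trackedProjects() == 0);
    }
}
```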

Implementation Details

In CacheControllerImpl.java:

  1. Capture build start time:
    final long buildStartTime = session.getRequest().getStartTime().getTime();
  2. Track restored outputs in restoreProjectArtifacts():
    state.restoredOutputClassifiers.add(attachedArtifactInfo.getClassifier());
  3. Check timestamps in attachDirIfNotEmpty():
```java
long lastModified = Files.getLastModifiedTime(candidateSubDir).toMillis();
boolean isRestoredThisBuild = state.restoredOutputClassifiers.contains(classifier);

if (lastModified < buildStartTime && !isRestoredThisBuild) {
    // Skip stale outputs
    return;
}
```
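The check above can be exercised against a real directory. The sketch below is a simplified, hypothetical `attachDirIfNotEmpty` that returns a boolean instead of attaching anything. It reads the directory's own timestamp, as the snippet does.

One caveat worth verifying: on common filesystems, a directory's modification time changes only when entries directly inside it are created, deleted or renamed. It does not change when an existing file is rewritten in place or when files in nested package directories change. A directory-level timestamp may therefore not reflect recompilation of existing classes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.Set;
import java.util.stream.Stream;

public class AttachDirSketch {
    // Hypothetical simplification: returns true if the directory would be attached.
    static boolean attachDirIfNotEmpty(Path candidateSubDir, String classifier,
                                       Set<String> restoredOutputClassifiers,
                                       long buildStartTime) throws IOException {
        if (!Files.isDirectory(candidateSubDir)) {
            return false;
        }
        try (Stream<Path> entries = Files.list(candidateSubDir)) {
            if (entries.findAny().isEmpty()) {
                return false; // nothing to attach
            }
        }
        long lastModified = Files.getLastModifiedTime(candidateSubDir).toMillis();
        boolean isRestoredThisBuild = restoredOutputClassifiers.contains(classifier);
        return lastModified >= buildStartTime || isRestoredThisBuild; // skip stale outputs
    }

    public static void main(String[] args) throws IOException {
        long buildStartTime = System.currentTimeMillis();
        Path classes = Files.createTempDirectory("classes");
        Files.createFile(classes.resolve("Foo.class"));
        // Simulate output left behind by a build that ran one hour earlier.
        Files.setLastModifiedTime(classes, FileTime.fromMillis(buildStartTime - 3_600_000));

        System.out.println(attachDirIfNotEmpty(classes, "classes", Set.of(), buildStartTime));          // false
        System.out.println(attachDirIfNotEmpty(classes, "classes", Set.of("classes"), buildStartTime)); // true
    }
}
```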

Testing

All existing tests pass, and the approach has been validated through:

  • Compilation verification
  • Full test suite execution
  • Thread safety analysis of concurrent access patterns

Let me know if you have any concerns about this approach or if you'd like me to add additional safeguards or test cases.

cowwoc added a commit to cowwoc/maven-build-cache-extension that referenced this pull request Oct 11, 2025
Thread Safety Improvements:
- Replace HashMap with ConcurrentHashMap for thread-safe access
- Implement per-project isolation for attachedResourcesPathsById
- Implement per-project counters to prevent race conditions
- Remove clear() pattern that caused races in multi-threaded builds

Each project now has isolated state (project key → resources map), preventing
cross-module contamination and race conditions in `mvn -T` builds.

Configuration Property:
- Add saveCompileOutputs property (default: true)
- Allows users to control compile-phase caching behavior
- Provides opt-out for users wanting reduced I/O during development
- Default fixes JPMS module descriptor restoration bug

Addresses reviewer feedback on PR apache#394:
1. Thread safety concern with shared HashMap
2. Performance/design concerns about compile-phase caching
@cowwoc cowwoc force-pushed the issue-393-compile-cache branch 2 times, most recently from d6bba18 to 19f4a0a October 11, 2025 23:43
This commit addresses reviewer concerns about:
1. Branch-switching scenario with stale artifacts
2. Thread safety in multi-threaded builds (mvn -T)

Changes:

**Timestamp Filtering:**
- Capture build start time from session.getRequest().getStartTime()
- Pass buildStartTime to attachGeneratedSources() and attachOutputs()
- Check directory last-modified time in attachDirIfNotEmpty()
- Skip directories with lastModified < buildStartTime (unless restored)

**Per-Project Thread-Safe Isolation:**
- Introduce ProjectCacheState inner class to hold per-project state:
  * attachedResourcesPathsById (resource path tracking)
  * attachedResourceCounter (unique classifier generation)
  * restoredOutputClassifiers (track restored outputs)
- Use ConcurrentHashMap<String, ProjectCacheState> for thread-safe access
- Track restored outputs in restoreProjectArtifacts()
- Include restored outputs even with old timestamps
- Cleanup project state after save() completes

**Thread Safety Benefits:**
- Each project gets isolated state (no cross-module interference)
- ConcurrentHashMap handles concurrent module builds
- Per-project cleanup prevents memory leaks
- Fixes existing thread safety bug with HashMap.clear()

**How Timestamp Filtering Works:**
```
Scenario: mvn package with compile cache hit
1. First build: mvn compile at 10:00:00
   - Compiles, caches outputs ✓
2. Second build: mvn package at 10:05:00
   - Compile phase: Restores from cache
   - Restoration tracked in state.restoredOutputClassifiers
   - save() phase: Checks timestamps
     * Fresh outputs: timestamp >= 10:05:00 → include ✓
     * Restored outputs: in restoredOutputClassifiers → include ✓
     * Stale outputs: timestamp < 10:05:00 AND not restored → exclude ✓
```

This prevents the branch-switching scenario:
```
1. mvn compile on Branch A → target/classes modified at 10:00:01
2. git checkout branch-b
3. mvn process-resources on Branch B starts at 10:01:00
4. Check target/classes: 10:00:01 < 10:01:00 AND not restored → SKIP ✓
```
@cowwoc cowwoc force-pushed the issue-393-compile-cache branch from 2bf5f01 to 92822da October 12, 2025 03:56
cowwoc commented Oct 12, 2025

Addressing Performance Concerns

Thank you for raising the important question about performance overhead during active development. You're absolutely right that caching compile outputs on every mvn compile has costs:

  • IO overhead: Zipping classes directories
  • CPU overhead: Computing hashes
  • Network overhead: Potentially uploading to remote cache

Solution: Configurable Compile Caching

I've added a new property maven.build.cache.cacheCompile (default: true) to give users control:

```
# Disable compile caching during active development:
mvn clean compile -Dmaven.build.cache.cacheCompile=false

# Enable for CI/multi-module scenarios (default):
mvn clean compile
```
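The PR does not show how `isCacheCompile()` reads the property. The sketch below is a hypothetical resolution with the documented default of `true`. In it, user properties (as set by `-D`) take precedence over system properties. Only the property name and the `CACHE_COMPILE` constant name come from this PR.

```java
import java.util.Properties;

public class CacheCompileProperty {
    static final String CACHE_COMPILE = "maven.build.cache.cacheCompile";

    // Hypothetical lookup order: user properties, then system properties, then default true.
    static boolean isCacheCompile(Properties userProperties, Properties systemProperties) {
        String value = userProperties.getProperty(CACHE_COMPILE,
                systemProperties.getProperty(CACHE_COMPILE));
        return value == null || Boolean.parseBoolean(value);
    }

    public static void main(String[] args) {
        Properties none = new Properties();
        Properties disabled = new Properties();
        disabled.setProperty(CACHE_COMPILE, "false");
        System.out.println(isCacheCompile(none, none));     // true (default)
        System.out.println(isCacheCompile(disabled, none)); // false (-Dmaven.build.cache.cacheCompile=false)
    }
}
```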

Design Rationale

Default: true - Maintains the fix for issue #393 without breaking changes. Users experiencing compile-only cache restoration issues get the fix automatically.

Opt-out available - Developers actively editing code can disable compile caching to avoid overhead, while still benefiting from package-phase caching.

Use Cases

When to keep enabled (default):

  • Large multi-module projects (editing 1 of 50 modules)
  • CI/CD pipelines for quick feedback
  • Branch switching scenarios
  • Working on test code only

When to disable:

  • Actively editing a single module
  • Rapid edit-compile-test cycles
  • Development environments where package phase is always run anyway

Implementation

The property gates the calls to attachGeneratedSources() and attachOutputs() in CacheControllerImpl.save():

```java
if (cacheConfig.isCacheCompile()) {
    attachGeneratedSources(project, state, buildStartTime);
    attachOutputs(project, state, buildStartTime);
}
```

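This means that with maven.build.cache.cacheCompile=false, save() does not attach generated sources or outputs.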

Testing

Added CacheCompileDisabledTest to verify:

  • ✅ With property disabled: No cache entries created, no restoration on second compile
  • ✅ With property enabled: Cache entries created, restoration works on second compile

Ready for your review!

@cowwoc cowwoc force-pushed the issue-393-compile-cache branch from a53e062 to 26af745 October 12, 2025 05:35
…caching

Added new configuration property maven.build.cache.cacheCompile (default: true)
to address reviewer concerns about performance overhead during active development.

Changes:
- CacheConfig: Added isCacheCompile() method with javadoc
- CacheConfigImpl: Added CACHE_COMPILE constant and implementation (default true)
- CacheControllerImpl: Wrapped attachGeneratedSources() and attachOutputs()
  calls in isCacheCompile() check
- CacheCompileDisabledTest: Integration tests verifying behavior when disabled

Benefits:
- Maintains fix for issue apache#393 (default behavior unchanged)
- Allows developers to opt-out of compile caching with:
  -Dmaven.build.cache.cacheCompile=false
- Reduces IO overhead (no zipping/uploading) during active development
- Users working on large multi-module projects can keep it enabled

Usage:
  # Disable compile caching to reduce overhead during development:
  mvn clean compile -Dmaven.build.cache.cacheCompile=false

  # Enable for CI/multi-module scenarios (default):
  mvn clean compile
@cowwoc cowwoc force-pushed the issue-393-compile-cache branch from 26af745 to 3126fe1 October 12, 2025 05:42