Use batching instead of individual runs per file#55
Conversation
Somewhere along Codacy's lifetime swiflint was changed to run individually for each file which was resulting in constant an prevasive timeouts at 15 minutes (from testing, the tool would take close to 2 hours to be able to analyse a frequent workload -- under version `0.61.0`). This was happening because starting up all the swiflint machinery was quite expensive and time consuming (~1 minute) and was being done for each indivitual file. Seemingly, with a bump to version `0.63.2` this problem basically all but vanishes, as a workload that was takin 1:55:33 to be performed under `0.61.0` would take 00:01:29 under `0.63.2` therefore greatly mitigating the original issue. Still, batching (giving the tool a list of all the files to be analysed) significan performance improvemnts, reducing the analysis time from 90 seconds to 10 seconds. Assuming the same workload (100 files): 0.61.0 sequential: `01:55:33` (~2 hours) 0.63.2 sequential: `00:01:26` (~1:30 minutes) 0.63.2 batch: `00:00:09` (~10 seconds) The generated results accross versions and approaches seem to be similar, so the change is apparently a net positive. Unfortunately it was not really possible to fully confirm the motivations that originally had moved swiflint from batch to sequential. TAROT-3633
Codacy's Analysis Summary0 new issue (≤ 1 medium issue) Review Pull Request in Codacy →
|
Now that we run files all together, we can't return a FileError, since we don't know which file caused the error. We return a global error instead.
There was a problem hiding this comment.
Pull Request Overview
The PR is 'up to standards'. Moving to batching provides a significant performance gain (from ~2 hours to ~10 seconds), effectively addressing timeout issues. However, there is a discrepancy: the description mentions a version bump to SwiftLint 0.63.2 as a key part of the fix, but this change (typically in a Dockerfile) is missing from the diff. Additionally, the implementation trades off granular error reporting for speed.
About this PR
- The PR description highlights a version bump to SwiftLint 0.63.2 as a major factor in performance improvement, but the diff does not include changes to the Dockerfile or any build configuration where this version is defined. Ensure the version update is part of this PR.
💡 Codacy uses AI. Check for mistakes.
| runToolCommand(command, source, cfgOpt) | ||
| }.flatten |
There was a problem hiding this comment.
🟡 MEDIUM RISK
The move to batch execution removes the resilience provided by the previous per-file error handling. Previously, a crash or failure on a single file was captured as a FileError, allowing results for other files to still be reported. In the new batch mode, any failure during execution will cause the entire apply method to return a global Failure. While this trade-off is likely acceptable for the 100x performance gain, it changes how partial failures are communicated.
Try running the following prompt in your IDE agent:
Check if
runToolCommandcan be adapted to handle cases where SwiftLint returns a non-zero exit code but still produces partial JSON results on stdout, allowing us to report available violations even if the tool encountered an error on some files.
Somewhere along Codacy's lifetime swiftlint was changed to run individually for each file which was resulting in constant an pervasive timeouts at 15 minutes (from testing, the tool would take close to 2 hours to be able to analyse a frequent workload -- under version
0.61.0). This was happening because starting up all the swiftlint machinery was quite expensive and time consuming (~1 minute) and was being done for each individual file.Seemingly, with a bump to version
0.63.2this problem basically all but vanishes, as a workload that was takin 1:55:33 to be performed under0.61.0would take 00:01:29 under0.63.2therefore greatly mitigating the original issue.Still, batching (giving the tool a list of all the files to be analysed) significant performance improvements, reducing the analysis time from 90 seconds to 10 seconds.
Assuming the same workload (100 files):
0.61.0 sequential:
01:55:33(~2 hours)docker run --rm -t -e TIMEOUT_SECONDS="18000" --net=none --cap-drop=ALL - 0.20s user 0.07s system 0% cpu 1:55:33.54 total0.63.2 sequential:
00:01:26(~1:30 minutes)docker run --rm -t --net=none --cap-drop=ALL --security-opt=no-new-privileges 0.08s user 0.04s system 0% cpu 1:26.32 total0.63.2 batch:
00:00:09(~10 seconds)The generated results across versions and approaches seem to be similar, so the change is apparently a net positive. Unfortunately it was not really possible to fully confirm the motivations that originally had moved swiflint from batch to sequential.
TAROT-3633

Metabase dashboard:
