Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return a Unique Exit Code for Out of Memory Incidents #1339

Draft
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

zahidblackduck
Copy link
Collaborator

@zahidblackduck zahidblackduck commented Jan 16, 2025

Ticket

IDETECT-3872

Summary

This pull request implements a solution for handling "Out of Memory" (OOM) incidents in sub-processes by introducing a unique exit code. The changes ensure that OOM-related failures are appropriately identified and prioritized in the application's exit code logic.

Key Changes

  1. Detection of OOM Incidents

    • A new method, hasOutOfMemoryIssue, was added to the DetectIssuePublisher class.

    • This method inspects DetectorDirectoryReport objects to identify if any sub-process has crashed due to an OOM incident (mapped to DetectorStatusCode.EXECUTABLE_TERMINATED_LIKELY_OUT_OF_MEMORY).

      public boolean hasOutOfMemoryIssue(List<DetectorDirectoryReport> reports) {
          logger.info("Checking for Out of Memory (OOM) Error.");
          return reports.stream()
              .flatMap(report -> report.getNotExtractedDetectors().stream())
              .flatMap(notExtracted -> notExtracted.getAttemptedDetectables().stream())
              .anyMatch(attemptedDetectableReport -> {
                  logger.debug("Attempted Detectable Status Code: {}", attemptedDetectableReport.getStatusCode());
                  return attemptedDetectableReport.getStatusCode() == DetectorStatusCode.EXECUTABLE_TERMINATED_LIKELY_OUT_OF_MEMORY;
              });
      }
  2. Publishing a New Exit Code

    • The performDetectors method in the DetectorTool class was updated to use the hasOutOfMemoryIssue method.

    • If an OOM incident is detected, a new exit code (FAILURE_OUT_OF_MEMORY) is published via the ExitCodePublisher.

      boolean outOfMemoryIssueFound = detectorIssuePublisher.hasOutOfMemoryIssue(reports);
      if (outOfMemoryIssueFound) {
          logger.error("Detected an issue: EXECUTABLE_TERMINATED_LIKELY_OUT_OF_MEMORY.");
          exitCodePublisher.publishExitCode(ExitCodeType.FAILURE_OUT_OF_MEMORY, "Executable terminated likely due to out of memory.");
      }
  3. Handling Exit Code Prioritization

    • The unique exit code for OOM incidents needed to take precedence in the application's exit code evaluation.

    • The getWinningExitCode method in ExitCodeManager was modified to handle this special case, ensuring FAILURE_OUT_OF_MEMORY is correctly prioritized.

      public ExitCodeType getWinningExitCode() {
          ExitCodeType winningExitCodeType = ExitCodeType.SUCCESS;
          for (ExitCodeRequest exitCodeRequest : exitCodeRequests) {
              winningExitCodeType = ExitCodeType.getWinningExitCodeType(winningExitCodeType, exitCodeRequest.getExitCodeType());
          }
      
          if (!winningExitCodeType.isSuccess() &&
              exitCodeRequests.stream().anyMatch(request -> request.getExitCodeType() == ExitCodeType.FAILURE_OUT_OF_MEMORY)) {
              return ExitCodeType.FAILURE_OUT_OF_MEMORY;
          }
      
          return winningExitCodeType;
      }
  4. Challenges and Resolutions

    • Initially, a lower value exit code (e.g., -1) was tested to prioritize OOM errors, but it was deemed unsuitable for representing the failure state.
    • Instead, the prioritization logic was incorporated into getWinningExitCode to allow for a meaningful and unique exit code without conflicting with existing exit codes.

Testing

  • Simulated various OOM scenarios to verify detection and publishing of the custom exit code.
  • Confirmed that the modified getWinningExitCode logic correctly prioritizes the OOM exit code over other failure types.

Impact

  • Ensures better visibility and handling of OOM errors in sub-processes.
  • Avoids exit code conflicts by utilizing prioritization logic rather than relying solely on numeric values.

Notes

Please review the changes, especially the modifications in getWinningExitCode, to confirm alignment with our design principles.

@@ -25,6 +25,7 @@ public enum ExitCodeType {
),

FAILURE_ACCURACY_NOT_MET(15, "Detect was unable to meet the required accuracy."),
FAILURE_OUT_OF_MEMORY(16, "Detect encountered an Out of Memory error. Please review memory settings and system resources."),
Copy link
Contributor

@andrian-sevastyanov andrian-sevastyanov Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An idea for your consideration: we could modify the enum to have a new double priority field.

    private final int exitCode;
    private final String description;
    private final double priority;

    ExitCodeType(int exitCode, String description) {
        ExitCodeType(exitCode, description, (double) exitCode);
    }

    ExitCodeType(int exitCode, String description, double priority) {
        this.exitCode = exitCode;
        this.description = description;
        this.priority = priority;
    }

Then you could define the new exit code as FAILURE_OUT_OF_MEMORY(16, "...", 0.5), and start sorting exit codes not by their value but by their priority. This would also allow us to insert any priority we want anywhere going forward.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's an interesting insight. Surely it can be done. Let's have team's opinion on it whether we should go with this approach.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the ExitCodeType enum as per your suggestion and it worked as expected in my end. As this is planned for 10.4.0 release, the team might have their opinions whether we should go with this approach.

logger.error("Detected an issue: EXECUTABLE_TERMINATED_LIKELY_OUT_OF_MEMORY");
exitCodePublisher.publishExitCode(ExitCodeType.FAILURE_OUT_OF_MEMORY, "Executable terminated likely due to out of memory.");
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might just be me but I find this error checking a bit distracting in the overall flow of the performDetectors method. Perhaps consider refactoring these lines to a small method you can call.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored this error checking into a function called checkAndHandleOutOfMemoryIssue

@zahidblackduck zahidblackduck force-pushed the dev/zahidblackduck/IDETECT-3872-v2 branch from 6cd05c2 to 435951a Compare January 30, 2025 12:05
return second;
} else {
return (first.getExitCode() < second.getExitCode()) ? first : second;
Copy link
Contributor

@andrian-sevastyanov andrian-sevastyanov Jan 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For your consideration: we could ensure that all exit codes have unique priorities by adding a test that iterates through all enum values and checks that. Then, this last check would not be necessary as comparing priorities would be enough to ensure deterministic results. Additionally, this would ensure that someone does not accidentally add an exit code without considering what priorities are meant for. We could also perform a similar check to ensure that exit code values are also unique.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great suggestion! I've implemented the proposed change and added tests to validate it.

One concern I have is how we determine the priority value for an exit code. Currently, we are assigning an arbitrary lower double value to ensure that a specific exit code takes precedence. For example:

FAILURE_OUT_OF_MEMORY(16, "Detect encountered an Out of Memory error. Please review memory settings and system resources.", 0.5)

I believe we should establish clear rules for setting priority values when deviating from the default (i.e., using the exit code itself as the priority). Assigning priorities arbitrarily doesn't seem ideal to me. What are your thoughts on defining a more structured approach?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we should try to get to that point eventually and it's on a TODO list. If you have some concrete ideas already, feel free to write them down and share with the team!

//Completed.
logger.debug("Finished running detectors.");
detectorEventPublisher.publishDetectorsComplete(toolResult);

return toolResult;
}

private void checkAndHandleOutOfMemoryIssue(List<DetectorDirectoryReport> reports) {
if (detectorIssuePublisher.hasOutOfMemoryIssue(reports)) {
logger.error("Detected an issue: EXECUTABLE_TERMINATED_LIKELY_OUT_OF_MEMORY");
Copy link
Contributor

@andrian-sevastyanov andrian-sevastyanov Jan 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Instead of hard-coding EXECUTABLE_TERMINATED_LIKELY_OUT_OF_MEMORY you can get it from the enum name (EXECUTABLE_TERMINATED_LIKELY_OUT_OF_MEMORY.toString()).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. It should be done this way rather than hard-coding.

Copy link
Contributor

@andrian-sevastyanov andrian-sevastyanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but added some comments for your consideration.

@gopannabd
Copy link
Contributor

MR is still in draft?

@andrian-sevastyanov
Copy link
Contributor

andrian-sevastyanov commented Jan 30, 2025

MR is still in draft?

@gopannabd , I believe this is so that we do not merge it too soon accidentally. This is destined for Detect 10.4.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants