@leba-atr leba-atr commented Mar 6, 2025

Description

Currently, the plugin calculates an md5 hash over the JSON-serialized test definition for a given atomic test and then uses the hash value as the ability id. While this is fine for static datasets, Atomic tests receive updates every now and then. These updates cause the md5 hash to change and any reference in Caldera to break (e.g. abilities referenced from the stockpile plugin; check the debug output of Caldera for current examples).

This PR introduces a breaking change in how this plugin generates ability ids. Instead of calculating a hash over the test data, the plugin now prefers the auto-generated unique UUID that each test is assigned. This also affects how one references Atomic abilities in plugins like stockpile, as well as in custom Adversaries created manually via API calls.

Context: when creating adversaries via manual API calls, one cannot just use the Atomic test uuid to link Abilities with a given Adversary but instead the custom hash value must be retrieved from the API beforehand. This makes it more tedious than I expected to reference Atomic tests in custom Adversaries.
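To illustrate the problem described above, here is a minimal sketch of both ID schemes. The hashing expression and the `auto_generated_guid` key come from the diff in this PR; the function names and the sample test definition are hypothetical and only serve the illustration.

```python
import hashlib
import json


def legacy_ability_id(test: dict) -> str:
    """Hash-based ID: MD5 over the JSON-serialized test definition."""
    return hashlib.md5(json.dumps(test).encode()).hexdigest()


def stable_ability_id(test: dict) -> str:
    """Prefer the test's own GUID; fall back to the hash if it is missing."""
    return test.get('auto_generated_guid') or legacy_ability_id(test)


# Hypothetical test definition, for illustration only.
test = {
    'name': 'Example test',
    'auto_generated_guid': '11111111-2222-3333-4444-555555555555',
    'description': 'original wording',
}

# Simulate an upstream update that only rewords the description.
updated = dict(test, description='reworded upstream')

print(legacy_ability_id(test) == legacy_ability_id(updated))
print(stable_ability_id(test) == stable_ability_id(updated))
```

Any edit to the test definition changes the hash-based ID, while the GUID-based ID stays the same as long as upstream keeps the GUID.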

Type of change

  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

How Has This Been Tested?

  • caldera setup completes successfully
  • all warnings from Stockpile plugin regarding missing abilities in adversaries are gone
  • visual verification that especially the Stockpile adversaries contain only abilities which 'make sense' (see "use atomic UUIDs instead of md5 hash", stockpile#579)

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works

@mkultraWasHere
Contributor

We will need to marinate on this change. Not against it logically, but not sure of the fallout from the breaking change.

@leba-atr
Author

I thought about that for a minute yesterday evening and came up with an idea that would on the one hand not be breaking but on the other hand require refactoring in the main Caldera repository.

Roughly, the idea is as follows:

  • store both the UUID and the hash sum
  • when looking up abilities, the REST api call handler looks up the id from the request as uuid first; if nothing is found, the handler falls back to the hash sum
  • alternatively, the api call handler could use a regex to check if the id is either a uuid (e.g. [a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}) or an md5 hash (e.g. [a-fA-F0-9]{32}, since an md5 hex digest is 32 characters long) and then look up the requested object by the corresponding property
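The regex-based variant could look roughly like the sketch below. Note that `hashlib.md5(...).hexdigest()` returns 32 hexadecimal characters, so the sketch matches `{32}` for the hash. The `uuid` and `md5` keys on the ability records and both function names are assumptions made for this example, not part of Caldera's API.

```python
import re

UUID_RE = re.compile(
    r'[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}'
)
MD5_RE = re.compile(r'[a-fA-F0-9]{32}')


def classify_id(ability_id: str) -> str:
    """Return 'uuid', 'md5' or 'unknown' depending on the shape of the ID."""
    if UUID_RE.fullmatch(ability_id):
        return 'uuid'
    if MD5_RE.fullmatch(ability_id):
        return 'md5'
    return 'unknown'


def find_ability(abilities, ability_id: str):
    """Look up an ability by the property that matches the ID's shape.

    `abilities` is an iterable of dicts with hypothetical 'uuid' and
    'md5' keys; returns None if nothing matches.
    """
    kind = classify_id(ability_id)
    if kind == 'unknown':
        return None
    return next((a for a in abilities if a.get(kind) == ability_id), None)
```

Using `fullmatch` rather than `match` avoids accepting longer strings that merely start with a valid-looking ID.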

Maybe that opens another way forward with this MR.

@uruwhy
Contributor

uruwhy commented May 19, 2025

Here's an idea that wouldn't require refactoring:

  • Have a setting specific to the atomic plugin that enables/disables legacy IDs.
  • If enabled (default setting), the atomic plugin will generate 2 copies of each ability - one copy with the MD5 hash ID (legacy ID) and (legacy) appended to the ability name, and the 2nd copy using the atomic test UUID and original name.
  • This way, users can fully migrate on their own when they are ready by simply disabling the setting
  • Publicly released content moving forward can reference the atomic test UUIDs only, but anything referencing the old MD5-based ID won't break unless the legacy atomic setting is disabled
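A rough sketch of the duplicated-ability idea, assuming every test carries an `auto_generated_guid`. The `build_abilities` function, the `legacy_ids_enabled` parameter and the dict layout are hypothetical; the real plugin would write ability YAML files instead.

```python
import hashlib
import json


def build_abilities(test: dict, legacy_ids_enabled: bool = True) -> list:
    """Return one ability per ID scheme for a single atomic test.

    The UUID-based copy is always produced; the MD5-based copy is only
    added while the (hypothetical) legacy setting is enabled.
    """
    abilities = [{'id': test['auto_generated_guid'], 'name': test['name']}]
    if legacy_ids_enabled:
        legacy_id = hashlib.md5(json.dumps(test).encode()).hexdigest()
        abilities.append({'id': legacy_id, 'name': f"{test['name']} (legacy)"})
    return abilities


# Hypothetical test definition, for illustration only.
example = {
    'name': 'Example test',
    'auto_generated_guid': '11111111-2222-3333-4444-555555555555',
}
for ability in build_abilities(example):
    print(ability['id'], ability['name'])
```

Disabling the setting simply drops the second copy, which is what makes the migration opt-in.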

Thoughts?

@uruwhy
Contributor

uruwhy commented May 30, 2025

Alright, I have an alternative idea based on @leba-atr's suggestion that will require some adjustments to core code and ability format, but will actually provide users with more flexibility down the line.

Proposal: add an alternative_ids field to the ability object structure. This will be a list of ID strings that represent alternative/backup IDs that users can specify for a single ability. So if users want to label abilities with their own ID schema in addition to the typical UUIDs, they can do so, and CALDERA will be able to search for and manage abilities using any of those IDs (users will just need to make sure all alternative IDs are globally unique within their CALDERA installation)

For the atomic plugin in particular, the plugin service would grab the atomic test ID and use that as the primary ability ID, and use the hash as an alternate ID. This way, we don't break any user's custom adversary profiles that rely on the hash-based IDs, and users can use the more stable atomic test ID going forward or switch at their own pace

Rather than looking up the primary ID first, CALDERA would simply look up whatever ID is referenced in the adversary profile. When abilities get imported, we'll have all of the associated IDs point to the same underlying ability object, but we'll still maintain the notion of a primary ID since that ID will be used for the yaml file name
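The lookup behaviour proposed here can be sketched as an index in which every ID, primary or alternative, points to the same ability object. The `alternative_ids` field name is from the proposal; the `Ability` and `AbilityIndex` classes are hypothetical stand-ins and not Caldera's actual object model.

```python
from dataclasses import dataclass, field


@dataclass
class Ability:
    ability_id: str  # primary ID, also used for the yaml file name
    name: str
    alternative_ids: list = field(default_factory=list)


class AbilityIndex:
    """Maps every known ID (primary or alternative) to one Ability object."""

    def __init__(self):
        self._by_id = {}

    def add(self, ability: Ability) -> None:
        ids = [ability.ability_id, *ability.alternative_ids]
        # All IDs must be globally unique within the installation.
        if len(set(ids)) != len(ids) or any(i in self._by_id for i in ids):
            raise ValueError(f'duplicate ability ID among {ids}')
        for ability_id in ids:
            self._by_id[ability_id] = ability

    def get(self, any_id: str):
        """Return the ability registered under this ID, or None."""
        return self._by_id.get(any_id)
```

With this layout an adversary profile can reference either the atomic test UUID or the old hash, and both resolve to the same object without a "primary first, then fallback" lookup order.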

@leba-atr
Author

leba-atr commented Jun 2, 2025

Personally, I'm in favor of the second approach, especially because it opens up the possibility to use custom identifiers in addition to the ones used by Caldera internally. It also allows postponing the breaking change for users until the next major release, where the deprecated approach can literally just be deleted from the code without the need to rewrite lots of functionality.

@uruwhy
Contributor

uruwhy commented Jun 11, 2025

As much as I'd like to introduce a secondary ID(s) field, doing so would require quite significant updates to the core code in order to accommodate looking up and referencing abilities (and potentially other objects) with more than 1 ID. Similar amounts of effort would also be required to store and manage both atomic ability IDs as "official" ability object fields. However, I do have a middle approach in the works that looks like the following:

  • Incorporate your atomic plugin fix to prioritize the pre-generated atomic UUID and only use the MD5 hash if that UUID field isn't there for whatever reason. This will apply to any new installations of the atomic plugin (e.g. fresh install of caldera, first time enabling the atomic plugin, or recloning and re-processing the atomic repo to populate the abilities directory).
    • Generated abilities will also have the corresponding MD5-hash-based legacy ID appended to the ability description so that users can easily look up what the correct UUID should be when transitioning any custom adversary profiles
  • Add a flag file (e.g. .processed_atomic) that indicates whether or not the "new" method of processing the atomic plugin has been performed. If this file does not exist in the data folder, then either the user has never used the atomic plugin, or only has legacy-style atomic abilities in an existing atomic plugin installation
  • When the atomic plugin runs, it will check if the abilities directory exists.
    • If the directory does not exist, then we can safely assume that the user has no legacy abilities with the MD5 hash and can proceed to only generate atomic abilities with the real UUIDs.
    • If the directory exists, but the processing flag file does not exist, then we can assume that the user has some legacy abilities with the MD5 hash. The atomic service will then generate atomic abilities with the real UUIDs but still keep the legacy abilities to avoid breaking any custom profiles that the user has that might depend on those MD5 hash IDs.
    • If both the directory and processing flag file exist, then we don't need to reprocess anything
  • If the atomic plugin detects any MD5 hash legacy ID yaml files in the abilities directory, it will provide a warning that notifies the user that they have legacy atomic abilities and should retire them when possible. It will also append (legacy ID) or a similar suffix to each ability name, so that users easily know which copy of the ability is legacy or not when viewing them in the GUI
    • during this transition period, users will have an artificially inflated number of abilities in their caldera installation since every atomic ability will essentially be duplicated
  • we will provide a bash script that users can run to move all the MD5-hash-based abilities to a backup directory in the atomic plugin, effectively "retiring" them
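The directory and flag-file checks above can be sketched as follows. The `.processed_atomic` name comes from the comment; the `data/abilities` layout, the `.yml` extension, the `<id>.yml` file naming and both function names are assumptions for this illustration.

```python
import re
from pathlib import Path

FLAG_NAME = '.processed_atomic'        # flag file name from the comment above
MD5_STEM = re.compile(r'[a-f0-9]{32}')  # hexdigest() output is lowercase hex


def plan_processing(data_dir: Path) -> str:
    """Decide what the plugin should do on startup.

    Assumes abilities live in data_dir / 'abilities' and the flag file
    sits directly in data_dir.
    """
    if not (data_dir / 'abilities').is_dir():
        return 'generate'              # no legacy abilities can exist
    if not (data_dir / FLAG_NAME).exists():
        return 'generate_keep_legacy'  # keep MD5-based files untouched
    return 'skip'                      # already processed the new way


def find_legacy_files(abilities_dir: Path) -> list:
    """Return ability files whose name (without extension) is an MD5 hex digest."""
    return sorted(
        p for p in abilities_dir.rglob('*.yml') if MD5_STEM.fullmatch(p.stem)
    )
```

The list returned by `find_legacy_files` is what a warning, the "(legacy ID)" renaming, or a retirement script would operate on.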

@deacon-mp deacon-mp requested review from Copilot and removed request for mkultraWasHere October 6, 2025 22:53

Copilot AI left a comment

Pull Request Overview

This PR changes how ability IDs are generated for Atomic tests by preferring the auto-generated UUID over the calculated MD5 hash. This addresses issues where hash-based IDs break references when test definitions are updated.

  • Replaces MD5 hash-based ability ID generation with UUID-based approach for stability
  • Maintains backward compatibility by falling back to hash when UUID is unavailable
  • Introduces breaking changes for existing ability references

  Return True if an ability was saved.
  """
- ability_id = hashlib.md5(json.dumps(test).encode()).hexdigest()
+ ability_id = test.get('auto_generated_guid') or hashlib.md5(json.dumps(test).encode()).hexdigest()

Copilot AI Oct 6, 2025

[nitpick] The fallback to MD5 hash maintains the old behavior but could lead to inconsistent ID types. Consider documenting the expected format of 'auto_generated_guid' and whether it should be validated before use.
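One way to address this would be to validate the GUID before using it. The sketch below assumes `auto_generated_guid` is expected to be a canonical hyphenated UUID string; that expectation, and the helper names, are assumptions and not something this PR states.

```python
import hashlib
import json
import uuid


def is_valid_guid(value) -> bool:
    """True only for strings in canonical 8-4-4-4-12 hyphenated UUID form."""
    if not isinstance(value, str):
        return False
    try:
        # uuid.UUID also accepts braces, 'urn:uuid:' and unhyphenated input,
        # so compare against the canonical rendering to reject those forms.
        return str(uuid.UUID(value)) == value.lower()
    except ValueError:
        return False


def ability_id_for(test: dict) -> str:
    """Use the GUID when it is well-formed, otherwise fall back to MD5."""
    guid = test.get('auto_generated_guid')
    if is_valid_guid(guid):
        return guid
    return hashlib.md5(json.dumps(test).encode()).hexdigest()
```

Compared with the plain `or` fallback in the diff, a malformed or non-string GUID value here results in the hash-based ID instead of being used as the ability ID as-is.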

@deacon-mp
Contributor

Can you address the above and resubmit for review?
