
Segment: Collectors

William W. Kimball, Jr., MBA, MSIS edited this page Nov 14, 2022 · 6 revisions

Collectors and Collector math are among YAML Path's most powerful capabilities. They let users gather virtual node sets from the source data, using any segment types, and perform operations against those sets. A Collector always returns an Array of nodes. Collector math adds subsequent Collectors to, or subtracts them from, their left-hand counterparts. Collectors can be nested, and the same math operators apply to nested Collectors, enabling sophisticated data tuning.

While every other segment type moves the internal DOM scanner deeper into the structure of the source data, Collectors do not. This allows multiple subsequent Collectors to be "rooted" at the same point within the source data structure, which makes composing complex YAML Paths more efficient by reducing path prefix duplication.

Some illustrations may help demonstrate how Collectors are used.

Convert Results To Lists

Many YAML Paths produce multiple discrete results. A trivial application of Collectors is to convert such output into one List.

consoles:
  - ColecoVision
  - Atari 2600
  - Atari 4800
  - Nintendo Entertainment System
  - SEGA Master System
  - SEGA Genesis
  - Nintendo SNES
  - SEGA CD
  - TurboGrafx 16
  - SEGA 32X
  - NeoGeo
  - SEGA Saturn
  - Sony PlayStation
  - Nintendo 64
  - SEGA DreamCast
  - Sony PlayStation 2
  - Microsoft Xbox
  - Sony PlayStation 3
  - Nintendo Wii
  - Microsoft Xbox 360
  - Sony PlayStation 4
  - Nintendo Wii-U
  - Microsoft Xbox One
  - Microsoft Xbox One S
  - Sony PlayStation 4 Pro
  - Microsoft Xbox One X
  - Nintendo Switch

Were a user curious as to which consoles were made by SEGA, they might use an Array Element Search such as consoles[. % SEGA]. This would result in multiple responses from YAML Path:

SEGA Master System
SEGA Genesis
SEGA CD
SEGA 32X
SEGA Saturn
SEGA DreamCast

But what if the user needs only the very first result? While this can be done outside of YAML Path, say by chaining multiple tools together, YAML Path can solve the problem natively with a Collector. First, wrap the original search expression in a Collector to convert the multi-result output into a single List result: (consoles[. % SEGA]) produces ["SEGA Master System", "SEGA Genesis", "SEGA CD", "SEGA 32X", "SEGA Saturn", "SEGA DreamCast"]. Then, append an Array Element index to the Collector to select the single desired result: (consoles[. % SEGA])[0] produces only SEGA Master System.
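To make the difference between multiple discrete results and one collected List concrete, here is a minimal Python sketch over plain lists. It is a model, not YAML Path's implementation: the helpers `search_contains` and `collect` are hypothetical names, `%` is modelled as a simple substring ("contains") test, and the console list is an abridged copy of the sample data that keeps the SEGA entries in their original order.

```python
from typing import Iterator, List

# Abridged copy of the sample data; SEGA entries keep their original order.
consoles: List[str] = [
    "ColecoVision",
    "Atari 2600",
    "SEGA Master System",
    "SEGA Genesis",
    "Nintendo SNES",
    "SEGA CD",
    "SEGA 32X",
    "SEGA Saturn",
    "Sony PlayStation",
    "SEGA DreamCast",
]


def search_contains(elements: List[str], needle: str) -> Iterator[str]:
    """Hypothetical model of consoles[. % SEGA]: yield each match as a separate result."""
    for element in elements:
        if needle in element:
            yield element


def collect(results: Iterator[str]) -> List[str]:
    """Hypothetical model of wrapping an expression in ( ): gather all results into one List."""
    return list(results)


# Without a Collector: six discrete results, printed one per line.
for result in search_contains(consoles, "SEGA"):
    print(result)

# With a Collector plus an Array Element index: (consoles[. % SEGA])[0]
sega = collect(search_contains(consoles, "SEGA"))
print(sega[0])  # SEGA Master System
```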

Collector Math

Collectors are more than a way to convert multi-result output into single List results: they can also be combined using simple set math. The + operator performs a union, - calculates the difference, and, as of version 3.7.0, & calculates the intersection. The next several illustrations use this contrived automation sample data:
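The three operators can be modelled on plain Python lists as a rough mental model. This is a hedged sketch rather than YAML Path's code: the function names are hypothetical, each result keeps the left-hand order, `+` is modelled as an order-preserving union as the text describes, and how YAML Path itself treats duplicate values is not covered here.

```python
from typing import Any, List


def collector_add(left: List[Any], right: List[Any]) -> List[Any]:
    """Model of +: left elements first, then right elements not already present."""
    result = list(left)
    for item in right:
        if item not in result:
            result.append(item)
    return result


def collector_subtract(left: List[Any], right: List[Any]) -> List[Any]:
    """Model of -: keep only left elements that do not appear in right."""
    return [item for item in left if item not in right]


def collector_intersect(left: List[Any], right: List[Any]) -> List[Any]:
    """Model of & (3.7.0+): keep only left elements that also appear in right."""
    return [item for item in left if item in right]


a = [1, 2, 3]
b = [3, 4]
print(collector_add(a, b))        # [1, 2, 3, 4]
print(collector_subtract(a, b))   # [1, 2]
print(collector_intersect(a, b))  # [3]
```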

standard:
  setup:
    - id: 0
      step: 1
      action: Initialize
    - id: 1
      step: 2
      action: Provision
  teardown:
    - id: 2
      step: 1
      action: Deprovision
    - id: 3
      step: 2
      action: Terminate

change:
  - id: 4
    step: 1
    action: Do something
  - id: 5
    step: 2
    action: Do something else

rollback:
  data_error:
    - id: 6
      step: 1
      action: Flush
  app_error:
    - id: 7
      step: 1
      action: Abend
    - id: 8
      step: 2
      action: Shutdown

disabled_ids:
  - 3
  - 5
  - 8

Imagine a user wants to know every action, in any order, that might be performed when an execution goes merrily along the "happy path". YAML Path Collectors help reduce the data down to that answer. Because a Collector can wrap any other YAML Path segment, let's use Array of Hashes Pass-Through Selection to help form the desired output.

We know the user wants all "happy path" actions. We'll need actions from steps in the standard setup and teardown sections along with the actions in the planned change section. That's 3 queries:

  1. /standard/setup/action
  2. /standard/teardown/action
  3. /change/action

If we run all 3 queries separately, we'll end up with 3 answers, each an unrelated list of actions. However, the user wants just one answer. We can use Collectors to group each list, like so:

  1. (/standard/setup/action)
  2. (/standard/teardown/action)
  3. (/change/action)

But we still end up with 3 separate answers. To produce just one answer, we use Collector Math to add all 3 Collectors together: (/standard/setup/action) + (/standard/teardown/action) + (/change/action) produces ["Initialize", "Provision", "Deprovision", "Terminate", "Do something", "Do something else"].
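The same query can be mirrored in Python over the sample data, abridged to the keys it touches. `pass_through` is a hypothetical helper that models Array of Hashes Pass-Through Selection by pulling one field from every mapping in a list. Plain list concatenation stands in for the union here, which gives the same answer because no action repeats.

```python
from typing import Any, Dict, List

# The "happy path" portions of the sample data as plain Python structures.
data: Dict[str, Any] = {
    "standard": {
        "setup": [
            {"id": 0, "step": 1, "action": "Initialize"},
            {"id": 1, "step": 2, "action": "Provision"},
        ],
        "teardown": [
            {"id": 2, "step": 1, "action": "Deprovision"},
            {"id": 3, "step": 2, "action": "Terminate"},
        ],
    },
    "change": [
        {"id": 4, "step": 1, "action": "Do something"},
        {"id": 5, "step": 2, "action": "Do something else"},
    ],
}


def pass_through(records: List[Dict[str, Any]], key: str) -> List[Any]:
    """Hypothetical model of .../action on an Array of Hashes: take `key` from each element."""
    return [record[key] for record in records]


# (/standard/setup/action) + (/standard/teardown/action) + (/change/action)
happy_actions = (
    pass_through(data["standard"]["setup"], "action")
    + pass_through(data["standard"]["teardown"], "action")
    + pass_through(data["change"], "action")
)
print(happy_actions)
```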

Aside: We can simplify this query further with a small Hash Attribute Search trick. If we ask YAML Path for all children of /standard that have non-empty key names, then apply Array of Hashes Pass-Through Selection to the result, we can reduce the first two Collectors to just one and get exactly the same result: (/standard[.!='']/action) + (/change/action)
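A Python sketch of the shortcut follows, assuming the search matches every child of /standard whose key name is non-empty (here, both setup and teardown, in document order). `children_with_nonempty_keys` is a hypothetical helper, not part of YAML Path, and list concatenation again stands in for the union.

```python
from typing import Any, Dict, Iterator, List

standard: Dict[str, List[Dict[str, Any]]] = {
    "setup": [
        {"id": 0, "step": 1, "action": "Initialize"},
        {"id": 1, "step": 2, "action": "Provision"},
    ],
    "teardown": [
        {"id": 2, "step": 1, "action": "Deprovision"},
        {"id": 3, "step": 2, "action": "Terminate"},
    ],
}
change: List[Dict[str, Any]] = [
    {"id": 4, "step": 1, "action": "Do something"},
    {"id": 5, "step": 2, "action": "Do something else"},
]


def children_with_nonempty_keys(mapping: Dict[str, Any]) -> Iterator[Any]:
    """Hypothetical model of /standard[.!='']: yield every child whose key name is not empty."""
    for key, value in mapping.items():
        if key != "":
            yield value


# (/standard[.!='']/action): pass-through applied to each matched child, all collected.
standard_actions = [
    record["action"]
    for records in children_with_nonempty_keys(standard)
    for record in records
]

# ... + (/change/action)
shortcut = standard_actions + [record["action"] for record in change]
print(shortcut)
```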

Now, what if the user must eliminate any disabled steps from the "happy path"? The data provides a unique identifier for each step along with an Array of disabled identifiers. Subtraction removes the elements of one Collector from the result that precedes it. While we can't return the remaining actions from this sample data (a more elaborate data set would use Anchored Hashes to reuse the same steps in more than one place), we can still produce the set of permissible step identifiers.

All "happy path" step identifiers can be gathered with addition: (/standard[.!='']/id) + (/change/id). All disabled identifiers can be gathered with a single Hash Keys query, because its value is already an Array: /disabled_ids. To subtract the disabled identifiers from the collected "happy path" identifiers, every operand must be a Collector, so we wrap the disabled identifiers as (/disabled_ids) and apply the math: (/standard[.!='']/id) + (/change/id) - (/disabled_ids) produces the expected result, [0, 1, 2, 4].
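The arithmetic behind [0, 1, 2, 4] can be checked with a short Python sketch. The sample data is trimmed to the `id` and `action` fields, and the comprehensions are an illustrative stand-in for the Collectors, not YAML Path's own evaluation: the happy path yields ids 0 through 5, and removing 3, 5 and 8 (8 is not present, so it has no effect) leaves 0, 1, 2 and 4.

```python
from typing import Any, Dict, List

standard: Dict[str, List[Dict[str, Any]]] = {
    "setup": [{"id": 0, "action": "Initialize"}, {"id": 1, "action": "Provision"}],
    "teardown": [{"id": 2, "action": "Deprovision"}, {"id": 3, "action": "Terminate"}],
}
change: List[Dict[str, Any]] = [
    {"id": 4, "action": "Do something"},
    {"id": 5, "action": "Do something else"},
]
disabled_ids: List[int] = [3, 5, 8]

# (/standard[.!='']/id) + (/change/id)
happy_ids = [r["id"] for key, records in standard.items() if key != "" for r in records]
happy_ids += [r["id"] for r in change]

# ... - (/disabled_ids): drop every collected id that appears in the disabled list
permitted_ids = [i for i in happy_ids if i not in disabled_ids]
print(permitted_ids)  # [0, 1, 2, 4]
```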

Order of Operations

Collector math is evaluated from left to right. However, you can override this order by wrapping sets of Collectors within other Collectors; the nested Collectors are evaluated first. For example:

list1:
  - 1
  - 2
  - 3
list2:
  - 4
  - 5
  - 6
exclude:
  - 3
  - 4

  • (/list1) + (/list2) produces [1, 2, 3, 4, 5, 6]
  • (/list1) - (/exclude) produces [1, 2]
  • (/list2) - (/exclude) produces [5, 6]
  • (/list1) + (/list2) - (/exclude) produces [1, 2, 5, 6]
  • ((/list1) - (/exclude)) + (/list2) produces [1, 2, 4, 5, 6]
  • (/list1) + ((/list2) - (/exclude)) produces [1, 2, 3, 5, 6]

Only Collectors can be used in Collector Math. You cannot, for example, execute YAML Paths like /list1 - /exclude or (/list1) - /exclude.
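To illustrate left-to-right evaluation, grouping by nesting, and the rule that every operand must itself be a Collector, here is a small recursive-descent evaluator for a toy subset of Collector math. It is a hypothetical model, not YAML Path's parser: it only understands `(/name)` operands that look up a top-level list, nested parentheses, and the `+`, `-` and `&` operators, with `+` modelled as an order-preserving union.

```python
from typing import Dict, List


class CollectorError(ValueError):
    """Raised when an expression is not valid toy Collector math."""


def evaluate(expression: str, data: Dict[str, List[int]]) -> List[int]:
    text = expression.replace(" ", "")
    pos = 0

    def expect(char: str) -> None:
        nonlocal pos
        if pos >= len(text) or text[pos] != char:
            raise CollectorError(f"expected {char!r} at position {pos} of {text!r}")
        pos += 1

    def parse_collector() -> List[int]:
        # collector := "(" ( "/" name | expr ) ")"; bare paths are rejected by expect("(")
        nonlocal pos
        expect("(")
        if pos < len(text) and text[pos] == "/":
            end = pos + 1
            while end < len(text) and (text[end].isalnum() or text[end] == "_"):
                end += 1
            values = list(data[text[pos + 1:end]])
            pos = end
        else:
            values = parse_expr()  # nested Collectors are evaluated first
        expect(")")
        return values

    def parse_expr() -> List[int]:
        # expr := collector (op collector)*, folded strictly from left to right
        nonlocal pos
        result = parse_collector()
        while pos < len(text) and text[pos] in "+-&":
            op = text[pos]
            pos += 1
            right = parse_collector()
            if op == "+":
                # append right-hand values not already in the left-hand result
                result = result + [v for v in right if v not in result]
            elif op == "-":
                result = [v for v in result if v not in right]
            else:
                result = [v for v in result if v in right]
        return result

    value = parse_expr()
    if pos != len(text):
        raise CollectorError(f"unexpected {text[pos]!r} at position {pos} of {text!r}")
    return value


lists = {"list1": [1, 2, 3], "list2": [4, 5, 6], "exclude": [3, 4]}
print(evaluate("(/list1) + (/list2) - (/exclude)", lists))    # [1, 2, 5, 6]
print(evaluate("(/list1) + ((/list2) - (/exclude))", lists))  # [1, 2, 3, 5, 6]
try:
    evaluate("(/list1) - /exclude", lists)
except CollectorError as err:
    print("rejected:", err)
```

Requiring an opening parenthesis before every operand is what makes /list1 - /exclude and (/list1) - /exclude fail in this model, mirroring the rule stated above.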
