
[pkg/ottl] enablement for an unroll function/array expansion #36507

Open

schmikei opened this issue Nov 22, 2024 · 2 comments
Labels: enhancement, pkg/ottl, processor/transform

@schmikei (Contributor)

Component(s)

pkg/ottl, processor/transform

Is your feature request related to a problem? Please describe.

The general problem I have is log data that arrives with multiple messages concatenated into a single string, which I'd like to split on a separator, in my case \n.

The transformprocessor lets me split the body on newlines, but the result is still a single log entry whose body is a slice:

receivers:
  filelog:
    include: [ ./test.json ]
    start_at: beginning
processors:
  transform:
    log_statements:
      - context: log
        statements:
          - set(body, Split(body, "\\n"))

What I'd like to be able to do, once I've split, is unroll the resulting array into new log entries.

Example log line:

<20>Oct 24 15:16:15 schmeler2853 inventore[8729]: We need to reboot the 1080p IB firewall!\n<162>Oct 24 15:16:16 ruecker1023 optio[97]: Navigating the microchip won't do anything, we need to program the multi-byte XML card!

After Split:
{
    "resourceLogs": [
        {
            "resource": {},
            "scopeLogs": [
                {
                    "scope": {},
                    "logRecords": [
                        {
                            "observedTimeUnixNano": "1732299779758008000",
                            "body": {
                                "arrayValue": {
                                    "values": [
                                        {
                                            "stringValue": "\u003c20\u003eOct 24 15:16:15 schmeler2853 inventore[8729]: We need to reboot the 1080p IB firewall!"
                                        },
                                        {
                                            "stringValue": "\u003c162\u003eOct 24 15:16:16 ruecker1023 optio[97]: Navigating the microchip won't do anything, we need to program the multi-byte XML card!"
                                        }
                                    ]
                                }
                            },
                            "attributes": [
                                {
                                    "key": "log.file.name",
                                    "value": {
                                        "stringValue": "test.json"
                                    }
                                }
                            ],
                            "traceId": "",
                            "spanId": ""
                        }
                    ]
                }
            ]
        }
    ]
}

What I'd like to do next is implement some kind of function that creates new events based on each element of that array, i.e.

- unroll(body)

Result
{
    "resourceLogs": [
        {
            "resource": {},
            "scopeLogs": [
                {
                    "scope": {},
                    "logRecords": [
                        {
                            "body": {
                                "arrayValue": {
                                    "values": [
                                        {
                                            "stringValue": "\u003c20\u003eOct 24 15:16:15 schmeler2853 inventore[8729]: We need to reboot the 1080p IB firewall!"
                                        }
                                    ]
                                }
                            },
                            "attributes": [
                                {
                                    "key": "log.file.name",
                                    "value": {
                                        "stringValue": "test.json"
                                    }
                                }
                            ],
                            "traceId": "",
                            "spanId": ""
                        },
                        {
                            "body": {
                                "arrayValue": {
                                    "values": [
                                        {
                                            "stringValue": "\u003c162\u003eOct 24 15:16:16 ruecker1023 optio[97]: Navigating the microchip won't do anything, we need to program the multi-byte XML card!"
                                        }
                                    ]
                                }
                            },
                            "attributes": [
                                {
                                    "key": "log.file.name",
                                    "value": {
                                        "stringValue": "test.json"
                                    }
                                }
                            ],
                            "traceId": "",
                            "spanId": ""
                        }
                    ]
                }
            ]
        }
    ]
}

Describe the solution you'd like

What I'm looking for is some kind of editor function that can take an event and expand the log slice based on each individual value of an array specified within the logs context; however, I imagine this could be useful in any of the telemetry contexts.

  • unroll(attributes["foo"])

This is sort of the inverse of what the aggregate_on_attributes function does in the metrics context, but for log slices.
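
To make the intent concrete, here's a minimal sketch of what such a function could do at the pdata level, written as a standalone helper rather than a real OTTL editor (unrollBodies is a hypothetical name; an actual OTTL function would need to fit the grammar and context machinery):

package unroll

import (
	"go.opentelemetry.io/collector/pdata/pcommon"
	"go.opentelemetry.io/collector/pdata/plog"
)

// unrollBodies (hypothetical) emits one copy of each record per element of
// its slice body, replacing the body with that element; records whose body
// is not a slice pass through unchanged.
func unrollBodies(records plog.LogRecordSlice) {
	expanded := plog.NewLogRecordSlice()
	for i := 0; i < records.Len(); i++ {
		lr := records.At(i)
		if lr.Body().Type() != pcommon.ValueTypeSlice {
			lr.CopyTo(expanded.AppendEmpty())
			continue
		}
		values := lr.Body().Slice()
		for j := 0; j < values.Len(); j++ {
			out := expanded.AppendEmpty()
			lr.CopyTo(out)                  // keep attributes, timestamps, trace/span IDs
			values.At(j).CopyTo(out.Body()) // swap the slice body for one element
		}
	}
	expanded.CopyTo(records) // replace the original records in place
}

Note that this sketch preserves element order within each slice, i.e. the "expand in place" ordering discussed in the comments below.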

Describe alternatives you've considered

I've glanced briefly at the transformprocessor directly and think we could maybe just solve it there; however, the reprocessing of log entries still makes me hesitant about whether that's the correct place (#36506). I'm not entirely sure where the best place to implement such a feature is; I've looked briefly at implementing it generically in OTTL and could not think of a good way to avoid re-iterating over the expanded logs with our current OTTL implementation. Ideally I'm looking for some guidance on whether this is something we can/should do with OTTL, or what alternative solution we could use to handle this potential processor problem!

Additional context

Important

I'm not saying there's anything inherently wrong with the implementation of OTTL; I'm just creating this issue seeking guidance on the recommended way of solving this processing scenario! If we want to solve it generically using the OTTL framework, I was hoping to start identifying next steps we could take to get an OTTL solution, if that's the correct place to add the desired functionality.

@schmikei added enhancement and needs triage labels Nov 22, 2024
@github-actions bot added pkg/ottl and processor/transform labels Nov 22, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@djaglowski (Member) commented Dec 2, 2024

I looked into this a bit and have some questions about the implementation.

The processor essentially executes with logic "for each log record, execute each statement", as opposed to "for each statement, transform each log record". This is completely natural given the way statement sequences are parsed and contexts are built, but it introduces a complication which I'll explain using an example:

Say you have a simple plog.Logs that contains just one resource with one scope with three log records:

records[0].body: []string{ "A", "B", "C" }
records[1].body: "M"
records[2].body: []string{ "X", "Y", "Z" }

In my opinion, the most intuitive result for end users would retain the order of items:

# Expand in place
records[0].body: "A"
records[1].body: "B"
records[2].body: "C"
records[3].body: "M"
records[4].body: "X"
records[5].body: "Y"
records[6].body: "Z"

However, you could also argue that either of the following would be acceptable:

# Append new records to the end, delete the original
records[0].body: "M"
records[1].body: "A"
records[2].body: "B"
records[3].body: "C"
records[4].body: "X"
records[5].body: "Y"
records[6].body: "Z"
# Append new records to the end, but reuse the original by overwriting with the first value of the slice
records[0].body: "A" // original, with overwritten value
records[1].body: "M" // unmodified
records[2].body: "X"  // original, with overwritten value
records[3].body: "B"
records[4].body: "C"
records[5].body: "Y"
records[6].body: "Z"

In any case, once you consider how iteration is currently managed, it forces our hand in some sense. Specifically, we determine the length of the LogRecordSlice once, so if I'm understanding correctly, we will touch the first N items only, regardless of how the slice is modified. (Example in playground.)
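
Since the playground example isn't reproduced here, the following is my reconstruction of the pitfall with a plain Go slice, under the assumption that the loop bound is captured once up front; the final ordering matches the "AMXBCYZ" outcome described below:

package main

import "fmt"

func main() {
	records := []string{"ABC", "M", "XYZ"}
	// The bound n is computed once, mirroring how the LogRecordSlice length
	// is determined before iteration, so appended records are never visited.
	for i, n := 0, len(records); i < n; i++ {
		if len(records[i]) > 1 {
			for _, c := range records[i][1:] {
				records = append(records, string(c)) // "unrolled" copies go to the end
			}
			records[i] = records[i][:1] // original keeps the first element
		}
		fmt.Println("visited:", records[i])
	}
	fmt.Println("final:", records) // [A M X B C Y Z]; only A, M, X were visited
}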

This means that we effectively MUST use the solution where the original is preserved and new records are appended to the end. (See "AMXBCYZ" solution above)

IMO this is pretty ugly in terms of disrupting the intuitive order of records, but there is a tougher problem:

Suppose you want to execute a sequence of statements that includes unroll:

- set(attributes["hello"], "world")
- unroll(body)
- set(attributes["test"], "pass")

Because we determine the slice to contain 3 records before any transformations are applied, we will actually get the following result:

records[0]: { body: "A", attributes: { "hello": "world", "test": "pass" } }
records[1]: { body: "M", attributes: { "hello": "world", "test": "pass" } }
records[2]: { body: "X", attributes: { "hello": "world", "test": "pass" } }
records[3]: { body: "B", attributes: { "hello": "world" } }
records[4]: { body: "C", attributes: { "hello": "world" } }
records[5]: { body: "Y", attributes: { "hello": "world" } }
records[6]: { body: "Z", attributes: { "hello": "world" } }

What's happened here is that statements before unroll are applied to all N records, then unroll changes the cardinality without actually updating N, and finally we apply the remaining statements after the copies are made, but only to the first N records.

I think a similar problem may occur when any function changes the length OR order of a slice. I'm curious if this has been discussed @evan-bradley, @TylerHelmuth.

One possible solution would be to introduce some notion of "this function modifies the slice", which could be used to isolate such functions into dedicated statement sequences. Having exactly one such statement per sequence ensures that changes to the slice are not interleaved with unrelated transformations, and that subsequent statement sequences execute on the updated number of items. This would work for arbitrary statement orderings while still allowing statements which do not modify the slice to be grouped:

- statement 0
- statement 1
- statement 2
- statement 3 // modifies slice
- statement 4 // modifies slice
- statement 5
- statement 6
- statement 7 // modifies slice
- statement 8
- statement 9

// compiles to the following statement sequences:
- { statement 0, statement 1, statement 2 }
- { statement 3 }
- { statement 4 }
- { statement 5, statement 6 }
- { statement 7 }
- { statement 8, statement 9 }
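
As a sketch of that compilation step (Statement and ModifiesSlice are illustrative stand-ins, not existing OTTL types), the grouping rule is a simple partition:

package main

import "fmt"

// Statement and ModifiesSlice are illustrative stand-ins for whatever
// metadata a parsed OTTL statement would carry.
type Statement struct {
	Text          string
	ModifiesSlice bool
}

// groupStatements splits a statement list into sequences such that every
// slice-modifying statement runs in a sequence of its own.
func groupStatements(stmts []Statement) [][]Statement {
	var seqs [][]Statement
	var current []Statement
	flush := func() {
		if len(current) > 0 {
			seqs = append(seqs, current)
			current = nil
		}
	}
	for _, s := range stmts {
		if s.ModifiesSlice {
			flush()
			seqs = append(seqs, []Statement{s}) // isolated sequence
		} else {
			current = append(current, s)
		}
	}
	flush()
	return seqs
}

func main() {
	stmts := []Statement{
		{"statement 0", false}, {"statement 1", false}, {"statement 2", false},
		{"statement 3", true}, {"statement 4", true},
		{"statement 5", false}, {"statement 6", false},
		{"statement 7", true},
		{"statement 8", false}, {"statement 9", false},
	}
	for _, seq := range groupStatements(stmts) {
		names := make([]string, len(seq))
		for i, s := range seq {
			names[i] = s.Text
		}
		fmt.Println(names) // prints the six sequences shown above
	}
}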
