
[pkg/ottl] enablement for an unroll function/array expansion #36507

Open

schmikei opened this issue Nov 22, 2024 · 2 comments
Labels: enhancement, pkg/ottl, processor/transform

@schmikei (Contributor)

Component(s)

pkg/ottl, processor/transform

Is your feature request related to a problem? Please describe.

The general problem I have is log data that arrives with multiple messages concatenated into a single string, which I'd like to split on a separator, in my case \n.

The transformprocessor lets me split the body on newlines, but the result is still a single log entry whose body is a slice:

receivers:
  filelog:
    include: [ ./test.json ]
    start_at: beginning
processors:
  transform:
    log_statements:
      - context: log
        statements:
          - set(body, Split(body, "\\n"))

What I'd like to be able to do, once I've split, is unroll the resulting array into new log entries.

Example log line:

<20>Oct 24 15:16:15 schmeler2853 inventore[8729]: We need to reboot the 1080p IB firewall!\n<162>Oct 24 15:16:16 ruecker1023 optio[97]: Navigating the microchip won't do anything, we need to program the multi-byte XML card!

After Split:
{
    "resourceLogs": [
        {
            "resource": {},
            "scopeLogs": [
                {
                    "scope": {},
                    "logRecords": [
                        {
                            "observedTimeUnixNano": "1732299779758008000",
                            "body": {
                                "arrayValue": {
                                    "values": [
                                        {
                                            "stringValue": "\u003c20\u003eOct 24 15:16:15 schmeler2853 inventore[8729]: We need to reboot the 1080p IB firewall!"
                                        },
                                        {
                                            "stringValue": "\u003c162\u003eOct 24 15:16:16 ruecker1023 optio[97]: Navigating the microchip won't do anything, we need to program the multi-byte XML card!"
                                        }
                                    ]
                                }
                            },
                            "attributes": [
                                {
                                    "key": "log.file.name",
                                    "value": {
                                        "stringValue": "test.json"
                                    }
                                }
                            ],
                            "traceId": "",
                            "spanId": ""
                        }
                    ]
                }
            ]
        }
    ]
}

What I'd like to do next is implement some kind of function that creates new events based on each element of that array, i.e.

- unroll(body)

Result
{
    "resourceLogs": [
        {
            "resource": {},
            "scopeLogs": [
                {
                    "scope": {},
                    "logRecords": [
                        {
                            "body": {
                                "arrayValue": {
                                    "values": [
                                        {
                                            "stringValue": "\u003c20\u003eOct 24 15:16:15 schmeler2853 inventore[8729]: We need to reboot the 1080p IB firewall!"
                                        }
                                    ]
                                }
                            },
                            "attributes": [
                                {
                                    "key": "log.file.name",
                                    "value": {
                                        "stringValue": "test.json"
                                    }
                                }
                            ],
                            "traceId": "",
                            "spanId": ""
                        },
                        {
                            "body": {
                                "arrayValue": {
                                    "values": [
                                        {
                                            "stringValue": "\u003c162\u003eOct 24 15:16:16 ruecker1023 optio[97]: Navigating the microchip won't do anything, we need to program the multi-byte XML card!"
                                        }
                                    ]
                                }
                            },
                            "attributes": [
                                {
                                    "key": "log.file.name",
                                    "value": {
                                        "stringValue": "test.json"
                                    }
                                }
                            ],
                            "traceId": "",
                            "spanId": ""
                        }
                    ]
                }
            ]
        }
    ]
}

Describe the solution you'd like

What I'm looking for is some kind of editor function that can take an event and expand the log slice based on each individual value of an array specified within the logs context; however, I imagine this could be useful in any of the telemetry contexts.

  • unroll(attributes["foo"])

This is sort of the inverse of what the aggregate_on_attributes function does in the metrics context, but for log slices.
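
To make the intent concrete, here's a minimal sketch of what such a function could do at the pdata level, written as a standalone helper rather than a real OTTL editor (unrollBodies is a hypothetical name; an actual OTTL function would need to fit the grammar and context machinery):

package unroll

import (
	"go.opentelemetry.io/collector/pdata/pcommon"
	"go.opentelemetry.io/collector/pdata/plog"
)

// unrollBodies (hypothetical) emits one copy of each record per element of
// its slice body, replacing the body with that element; records whose body
// is not a slice pass through unchanged.
func unrollBodies(records plog.LogRecordSlice) {
	expanded := plog.NewLogRecordSlice()
	for i := 0; i < records.Len(); i++ {
		lr := records.At(i)
		if lr.Body().Type() != pcommon.ValueTypeSlice {
			lr.CopyTo(expanded.AppendEmpty())
			continue
		}
		values := lr.Body().Slice()
		for j := 0; j < values.Len(); j++ {
			out := expanded.AppendEmpty()
			lr.CopyTo(out)                  // keep attributes, timestamps, trace/span IDs
			values.At(j).CopyTo(out.Body()) // swap the slice body for one element
		}
	}
	expanded.CopyTo(records) // replace the original records in place
}

Note that this sketch preserves element order within each slice, i.e. the "expand in place" ordering discussed in the comments below.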

Describe alternatives you've considered

I've glanced briefly at the transformprocessor directly and think we could maybe just solve it there; however, the reprocessing of log entries still makes me hesitant about whether that's the correct place (#36506). I'm not entirely sure where the best place to implement such a feature is; I've looked briefly at implementing it generically in OTTL and could not think of a good way to avoid re-iterating over the expanded logs with our current OTTL implementation. Ideally I'm looking for some guidance on whether this is something we can/should do with OTTL, or what alternative solution we could use to handle this potential processor problem!

Additional context

Important

I'm not saying there's anything inherently wrong with the implementation of OTTL; I'm just creating this issue seeking guidance on the recommended way of solving this processing scenario! If we want to solve it generically using the OTTL framework, I was hoping to start identifying next steps we could take to get an OTTL solution, if that's the correct place to add the desired functionality.

@schmikei added enhancement and needs triage labels Nov 22, 2024
@github-actions bot added pkg/ottl and processor/transform labels Nov 22, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@djaglowski (Member) commented Dec 2, 2024

I looked into this a bit and have some questions about the implementation.

The processor essentially executes with logic "for each log record, execute each statement", as opposed to "for each statement, transform each log record". This is completely natural given the way statement sequences are parsed and contexts are built, but it introduces a complication which I'll explain using an example:

Say you have a simple plog.Logs that contains just one resource with one scope with three log records:

records[0].body: []string{ "A", "B", "C" }
records[1].body: "M"
records[2].body: []string{ "X", "Y", "Z" }

In my opinion, the most intuitive result for end users would retain the order of items:

# Expand in place
records[0].body: "A"
records[1].body: "B"
records[2].body: "C"
records[3].body: "M"
records[4].body: "X"
records[5].body: "Y"
records[6].body: "Z"

However, you could also argue that either of the following would be acceptable:

# Append new records to the end, delete the original
records[0].body: "M"
records[1].body: "A"
records[2].body: "B"
records[3].body: "C"
records[4].body: "X"
records[5].body: "Y"
records[6].body: "Z"
# Append new records to the end, but reuse the original by overwriting with the first value of the slice
records[0].body: "A" // original, with overwritten value
records[1].body: "M" // unmodified
records[2].body: "X"  // original, with overwritten value
records[3].body: "B"
records[4].body: "C"
records[5].body: "Y"
records[6].body: "Z"

In any case, once you consider how iteration is currently managed, it forces our hand in some sense. Specifically, we determine the length of the LogRecordSlice once, so if I'm understanding correctly, we will touch the first N items only, regardless of how the slice is modified. (Example in playground.)
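
Since the playground example isn't reproduced here, the following is my reconstruction of the pitfall with a plain Go slice, under the assumption that the loop bound is captured once up front; the final ordering matches the "AMXBCYZ" outcome described below:

package main

import "fmt"

func main() {
	records := []string{"ABC", "M", "XYZ"}
	// The bound n is computed once, mirroring how the LogRecordSlice length
	// is determined before iteration, so appended records are never visited.
	for i, n := 0, len(records); i < n; i++ {
		if len(records[i]) > 1 {
			for _, c := range records[i][1:] {
				records = append(records, string(c)) // "unrolled" copies go to the end
			}
			records[i] = records[i][:1] // original keeps the first element
		}
		fmt.Println("visited:", records[i])
	}
	fmt.Println("final:", records) // [A M X B C Y Z]; only A, M, X were visited
}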

This means that we effectively MUST use the solution where the original is preserved and new records are appended to the end. (See "AMXBCYZ" solution above)

IMO this is pretty ugly in terms of disrupting the intuitive order of records, but there is a tougher problem:

Suppose you want to execute a sequence of statements that includes unroll:

- set(attributes["hello"], "world")
- unroll(body)
- set(attributes["test"], "pass")

Because we determine the slice to contain 3 records before any transformations are applied, we will actually get the following result:

records[0]: { body: "A", attributes: { "hello": "world", "test": "pass" } }
records[1]: { body: "M", attributes: { "hello": "world", "test": "pass" } }
records[2]: { body: "X", attributes: { "hello": "world", "test": "pass" } }
records[3]: { body: "B", attributes: { "hello": "world" } }
records[4]: { body: "C", attributes: { "hello": "world" } }
records[5]: { body: "Y", attributes: { "hello": "world" } }
records[6]: { body: "Z", attributes: { "hello": "world" } }

What's happened here is that statements before unroll are applied to all N records, then unroll changes the cardinality without actually updating N, and finally we apply the remaining statements after the copies are made, but only to the first N records.

I think a similar problem may occur when any function changes the length OR order of a slice. I'm curious if this has been discussed @evan-bradley, @TylerHelmuth.

One possible solution would be to introduce some notion of "this function modifies the slice", which could be used to isolate such functions into dedicated statement sequences. Having exactly one such statement per sequence ensures that changes to the slice are not interleaved with unrelated transformations, and that subsequent statement sequences execute on the updated number of items. This would work for arbitrary statement orderings while still allowing statements which do not modify the slice to be grouped:

- statement 0
- statement 1
- statement 2
- statement 3 // modifies slice
- statement 4 // modifies slice
- statement 5
- statement 6
- statement 7 // modifies slice
- statement 8
- statement 9

// compiles to the following statement sequences:
- { statement 0, statement 1, statement 2 }
- { statement 3 }
- { statement 4 }
- { statement 5, statement 6 }
- { statement 7 }
- { statement 8, statement 9 }
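
As a sketch of that compilation step (Statement and ModifiesSlice are illustrative stand-ins, not existing OTTL types), the grouping rule is a simple partition:

package main

import "fmt"

// Statement and ModifiesSlice are illustrative stand-ins for whatever
// metadata a parsed OTTL statement would carry.
type Statement struct {
	Text          string
	ModifiesSlice bool
}

// groupStatements splits a statement list into sequences such that every
// slice-modifying statement runs in a sequence of its own.
func groupStatements(stmts []Statement) [][]Statement {
	var seqs [][]Statement
	var current []Statement
	flush := func() {
		if len(current) > 0 {
			seqs = append(seqs, current)
			current = nil
		}
	}
	for _, s := range stmts {
		if s.ModifiesSlice {
			flush()
			seqs = append(seqs, []Statement{s}) // isolated sequence
		} else {
			current = append(current, s)
		}
	}
	flush()
	return seqs
}

func main() {
	stmts := []Statement{
		{"statement 0", false}, {"statement 1", false}, {"statement 2", false},
		{"statement 3", true}, {"statement 4", true},
		{"statement 5", false}, {"statement 6", false},
		{"statement 7", true},
		{"statement 8", false}, {"statement 9", false},
	}
	for _, seq := range groupStatements(stmts) {
		names := make([]string, len(seq))
		for i, s := range seq {
			names[i] = s.Text
		}
		fmt.Println(names) // prints the six sequences shown above
	}
}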
