add line number option for filelogreceiver #33530

Merged
Changes from 11 commits (23 commits total)
b1bb218
add line number option for filelogreceiver
sfc-gh-jikim Jun 12, 2024
a9cfa21
update PR num and update file_input readme
sfc-gh-jikim Jun 12, 2024
a16d7c4
Merge branch 'main' into filelogreceiver-line-numbers
sfc-gh-jikim Jun 12, 2024
101a4cb
Merge branch 'main' into filelogreceiver-line-numbers
sfc-gh-jikim Jun 13, 2024
9e46283
remove restriction against multiline and rename attribute
sfc-gh-jikim Jun 13, 2024
2e9a851
Merge branch 'main' of github.com:sfc-gh-jikim/opentelemetry-collecto…
sfc-gh-jikim Jun 13, 2024
594072f
Merge branch 'main' of ssh://github.com/open-telemetry/opentelemetry-…
sfc-gh-jikim Jun 13, 2024
22f75d1
Merge branch 'filelogreceiver-line-numbers' of github.com:sfc-gh-jiki…
sfc-gh-jikim Jun 13, 2024
3650597
remove extra 'record'
sfc-gh-jikim Jun 13, 2024
662f00f
adjust increment syntax
sfc-gh-jikim Jun 13, 2024
cc44319
Merge branch 'main' into filelogreceiver-line-numbers
sfc-gh-jikim Jun 13, 2024
25af74e
typo fix
sfc-gh-jikim Jun 13, 2024
0dd3a16
Merge branch 'filelogreceiver-line-numbers' of github.com:sfc-gh-jiki…
sfc-gh-jikim Jun 13, 2024
259bec9
Merge branch 'main' into filelogreceiver-line-numbers
sfc-gh-jikim Jun 13, 2024
822dbdf
make generate
sfc-gh-jikim Jun 13, 2024
b58ec21
Merge branch 'filelogreceiver-line-numbers' of github.com:sfc-gh-jiki…
sfc-gh-jikim Jun 13, 2024
b6c56e4
include file record num in reader config
sfc-gh-jikim Jun 14, 2024
f1df1ea
Merge branch 'main' into filelogreceiver-line-numbers
sfc-gh-jikim Jun 14, 2024
b559d3b
formatting errors
sfc-gh-jikim Jun 14, 2024
d9f0eaa
Merge branch 'main' into filelogreceiver-line-numbers
sfc-gh-jikim Jun 14, 2024
d75a2ac
Merge branch 'main' into filelogreceiver-line-numbers
sfc-gh-jikim Jun 14, 2024
67fc2b0
Update pkg/stanza/fileconsumer/internal/reader/reader.go
djaglowski Jun 17, 2024
d330aac
Update pkg/stanza/fileconsumer/internal/reader/reader.go
djaglowski Jun 17, 2024
27 changes: 27 additions & 0 deletions .chloggen/add_include_file_record_number.yaml
@@ -0,0 +1,27 @@
# Use this changelog template to create an entry for release notes.

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: enhancement

# The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver)
component: filelogreceiver

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: If `include_file_record_number` is true, the record number within the file is added as the attribute `log.file.record_number`

# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists.
issues: [33530]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext:

# If your change doesn't affect end users or the exported elements of any package,
# you should instead start your pull request title with [chore] or use the "Skip Changelog" label.
# Optional: The change log or logs in which this entry should be included.
# e.g. '[user]' or '[user, api]'
# Include 'user' if the change is relevant to end users.
# Include 'api' if there is a change to a library API.
# Default: '[user]'
change_logs: [user]
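As a sketch of how an end user would exercise this enhancement, a minimal collector configuration might look like the following (the paths and the `debug` exporter pipeline are illustrative, not part of this change):

```yaml
receivers:
  filelog:
    include: [ /var/log/myapp/*.log ]
    start_at: beginning
    include_file_record_number: true

exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [debug]
```

With this configuration, each emitted log record should carry a `log.file.record_number` attribute counting the records read from its file.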
55 changes: 28 additions & 27 deletions pkg/stanza/docs/operators/file_input.md
@@ -4,35 +4,36 @@ The `file_input` operator reads logs from files. It will place the lines read in

### Configuration Fields

| Field | Default | Description |
| --- | --- | --- |
| `id` | `file_input` | A unique identifier for the operator. |
| `output` | Next in pipeline | The connected operator(s) that will receive all outbound entries. |
| `include` | required | A list of file glob patterns that match the file paths to be read. |
| `exclude` | [] | A list of file glob patterns to exclude from reading. |
| `poll_interval` | 200ms | The duration between filesystem polls. |
| `multiline` | | A `multiline` configuration block. See below for details. |
| `force_flush_period` | `500ms` | Time since the last read of data from the file, after which the currently buffered log should be sent to the pipeline. Takes `time.Time` as value. Zero means waiting for new data forever. |
| `encoding` | `utf-8` | The encoding of the file being read. See the list of supported encodings below for available options. |
| `include_file_name` | `true` | Whether to add the file name as the attribute `log.file.name`. |
| `include_file_path` | `false` | Whether to add the file path as the attribute `log.file.path`. |
| `include_file_name_resolved` | `false` | Whether to add the file name after symlink resolution as the attribute `log.file.name_resolved`. |
| `include_file_path_resolved` | `false` | Whether to add the file path after symlink resolution as the attribute `log.file.path_resolved`. |
| `include_file_owner_name` | `false` | Whether to add the file owner name as the attribute `log.file.owner.name`. Not supported on Windows. |
| `include_file_owner_group_name` | `false` | Whether to add the file group name as the attribute `log.file.owner.group.name`. Not supported on Windows. |
| `include_file_record_number` | `false` | Whether to add the record number within the file as the attribute `log.file.record_number`. |
| `preserve_leading_whitespaces` | `false` | Whether to preserve leading whitespaces. |
| `preserve_trailing_whitespaces` | `false` | Whether to preserve trailing whitespaces. |
| `start_at` | `end` | At startup, where to start reading logs from the file. Options are `beginning` or `end`. This setting is ignored if previously read file offsets are retrieved from a persistence mechanism. |
| `fingerprint_size` | `1kb` | The number of bytes with which to identify a file. The first bytes in the file are used as the fingerprint. Decreasing this value at any point will cause existing fingerprints to be forgotten, meaning that all files will be read from the beginning (one time). |
| `max_log_size` | `1MiB` | The maximum size of a log entry to read before failing. Protects against reading large amounts of data into memory. |
| `max_concurrent_files` | 1024 | The maximum number of log files from which logs will be read concurrently (minimum = 2). If the number of files matched in the `include` pattern exceeds half of this number, then files will be processed in batches. |
| `max_batches` | 0 | Only applicable when files must be batched in order to respect `max_concurrent_files`. This value limits the number of batches that will be processed during a single poll interval. A value of 0 indicates no limit. |
| `delete_after_read` | `false` | If `true`, each log file will be read and then immediately deleted. Requires that the `filelog.allowFileDeletion` feature gate is enabled. |
| `attributes` | {} | A map of `key: value` pairs to add to the entry's attributes. |
| `resource` | {} | A map of `key: value` pairs to add to the entry's resource. |
| `header` | nil | Specifies options for parsing header metadata. Requires that the `filelog.allowHeaderMetadataParsing` feature gate is enabled. See below for details. |
| `header.pattern` | required for header metadata parsing | A regex that matches every header line. |
| `header.metadata_operators` | required for header metadata parsing | A list of operators used to parse metadata from the header. |

Note that by default, no logs will be read unless the monitored file is actively being written to because `start_at` defaults to `end`.
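The steps above can be combined into a short `file_input` operator configuration enabling the new attribute (a minimal sketch; the paths are illustrative):

```yaml
- type: file_input
  include:
    - /var/log/myapp/*.log
  start_at: beginning
  include_file_record_number: true
```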

6 changes: 6 additions & 0 deletions pkg/stanza/fileconsumer/attrs/attrs.go
@@ -17,6 +17,7 @@ const (
LogFilePathResolved = "log.file.path_resolved"
LogFileOwnerName = "log.file.owner.name"
LogFileOwnerGroupName = "log.file.owner.group.name"
LogFileRecordNumber = "log.file.record_number"
)

type Resolver struct {
@@ -26,6 +27,7 @@ type Resolver struct {
IncludeFilePathResolved bool `mapstructure:"include_file_path_resolved,omitempty"`
IncludeFileOwnerName bool `mapstructure:"include_file_owner_name,omitempty"`
IncludeFileOwnerGroupName bool `mapstructure:"include_file_owner_group_name,omitempty"`
IncludeFileRecordNumber bool `mapstructure:"include_file_record_number,omitempty"`
}

func (r *Resolver) Resolve(file *os.File) (attributes map[string]any, err error) {
@@ -44,6 +46,10 @@ func (r *Resolver) Resolve(file *os.File) (attributes map[string]any, err error)
return nil, err
}
}
if r.IncludeFileRecordNumber {
// non-zero placeholder; the reader detects this key and overwrites it with the actual record number
attributes[LogFileRecordNumber] = 1
}
if !r.IncludeFileNameResolved && !r.IncludeFilePathResolved {
return attributes, nil
}
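The placeholder written by `Resolve` acts as a signal: downstream, the reader checks for the key's presence and replaces the value with the real per-record count. A standalone sketch of that pattern (the names `resolve` and `emit` are simplified stand-ins, not the actual package API):

```go
package main

import "fmt"

const logFileRecordNumber = "log.file.record_number"

// resolve builds the per-file attribute map. When record numbering is
// enabled it stores a non-zero placeholder so the reader knows to fill
// in the real value for each record.
func resolve(includeRecordNumber bool) map[string]any {
	attributes := map[string]any{"log.file.name": "app.log"}
	if includeRecordNumber {
		attributes[logFileRecordNumber] = 1 // placeholder, overwritten per record
	}
	return attributes
}

// emit stamps the current record number onto the attributes, but only
// when the placeholder key is present (i.e. the option is enabled).
func emit(attributes map[string]any, recordNum int64) {
	if _, enabled := attributes[logFileRecordNumber]; enabled {
		attributes[logFileRecordNumber] = recordNum
	}
}

func main() {
	attrs := resolve(true)
	for rec := int64(1); rec <= 3; rec++ {
		emit(attrs, rec)
		fmt.Println(attrs[logFileRecordNumber])
	}
}
```

The key's presence doubles as the enabled/disabled flag, so the reader needs no extra configuration plumbing.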
15 changes: 11 additions & 4 deletions pkg/stanza/fileconsumer/attrs/attrs_test.go
@@ -17,10 +17,10 @@ import (
func TestResolver(t *testing.T) {
t.Parallel()

for i := 0; i < 128; i++ {

// Create a 7 bit string where each bit represents the value of a config option
bitString := fmt.Sprintf("%07b", i)

// Create a resolver with a config that matches the bit pattern of i
r := Resolver{
@@ -30,6 +30,7 @@ func TestResolver(t *testing.T) {
IncludeFilePathResolved: bitString[3] == '1',
IncludeFileOwnerName: bitString[4] == '1' && runtime.GOOS != "windows",
IncludeFileOwnerGroupName: bitString[5] == '1' && runtime.GOOS != "windows",
IncludeFileRecordNumber: bitString[6] == '1',
}

t.Run(bitString, func(t *testing.T) {
@@ -53,8 +54,14 @@
} else {
assert.Empty(t, attributes[LogFilePath])
}
if r.IncludeFileRecordNumber {
expectLen++
assert.Equal(t, 1, attributes[LogFileRecordNumber])
} else {
assert.Empty(t, attributes[LogFileRecordNumber])
}

// We don't have an independent way to resolve the path, so the only meaningful validation
// is to ensure that the resolver returns nothing vs something based on the config.
if r.IncludeFileNameResolved {
expectLen++
1 change: 1 addition & 0 deletions pkg/stanza/fileconsumer/config_test.go
@@ -40,6 +40,7 @@ func TestNewConfig(t *testing.T) {
assert.False(t, cfg.IncludeFilePathResolved)
assert.False(t, cfg.IncludeFileOwnerName)
assert.False(t, cfg.IncludeFileOwnerGroupName)
assert.False(t, cfg.IncludeFileRecordNumber)
}

func TestUnmarshal(t *testing.T) {
8 changes: 8 additions & 0 deletions pkg/stanza/fileconsumer/internal/reader/reader.go
@@ -7,6 +7,7 @@ import (
"bufio"
"context"
"errors"
"github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/fileconsumer/attrs"
"os"

"go.opentelemetry.io/collector/component"
@@ -23,6 +24,7 @@
type Metadata struct {
Fingerprint *fingerprint.Fingerprint
Offset int64
RecordNum int64
FileAttributes map[string]any
HeaderFinalized bool
FlushState *flush.State
@@ -87,6 +89,12 @@ func (r *Reader) ReadToEnd(ctx context.Context) {
continue
}

_, fileRecordNumEnabled := r.FileAttributes[attrs.LogFileRecordNumber]
if fileRecordNumEnabled {
r.RecordNum++
r.FileAttributes[attrs.LogFileRecordNumber] = r.RecordNum
}

err = r.processFunc(ctx, token, r.FileAttributes)
if err == nil {
r.Offset = s.Pos() // successful emit, update offset
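Because `RecordNum` lives in the checkpointed `Metadata` alongside `Offset`, the numbering survives collector restarts. A toy sketch of scanning tokens while tracking both values (a simplified stand-in for the real reader, not its actual API):

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// metadata mirrors the idea of checkpointed reader state: the byte
// offset into the file and the count of records emitted so far.
type metadata struct {
	offset    int64
	recordNum int64
}

// readToEnd scans newline-delimited tokens, incrementing the record
// counter before each emit, as the reader does when numbering is enabled.
func readToEnd(m *metadata, input string) []string {
	var out []string
	s := bufio.NewScanner(strings.NewReader(input))
	for s.Scan() {
		token := s.Text()
		m.recordNum++
		m.offset += int64(len(token) + 1) // +1 for the consumed newline
		out = append(out, fmt.Sprintf("%d:%s", m.recordNum, token))
	}
	return out
}

func main() {
	m := &metadata{}
	fmt.Println(readToEnd(m, "alpha\nbeta\ngamma\n"))
}
```

Restoring `metadata` from a checkpoint and calling `readToEnd` on the remaining bytes would continue the numbering where it left off, which is the property the `Metadata` field provides.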