-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[chore][fileconsumer/archive] - Add archive read logic #35798
[chore][fileconsumer/archive] - Add archive read logic #35798
Conversation
@djaglowski can you take a fresh look? I've removed the |
|
||
func Mod(x, y int) int { | ||
return ((x % y) + y) % y | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is needed because in golang, the %
operator acts as a remainder, not a typical math modulus
and it can output negative integers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that it only returns negative if you apply it to a negative value, so can't we just avoid that and compute this inline?
mostRecentIndex = (x - 1 + y) % y
or
mostRecentIndex--
if mostRecentIndex < 0 {
mostRecentIndex += t.pollsToArchive
}
func (f *Fingerprint) GetFingerprint() *Fingerprint { | ||
return f | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added this so the Fingeprint
can implement the Matchable
interface, so that we can have unified return type for
opentelemetry-collector-contrib/pkg/stanza/fileconsumer/internal/tracker/tracker.go
Lines 194 to 195 in ad46f94
func (t *fileTracker) FindFiles(fps []*fingerprint.Fingerprint) []fileset.Matchable { | |
// To minimize disk access, we first access the index, then review unmatched files and update the metadata, if found. |
I think this is neater than returning []any
or creating a struct to capture the fingerprint/metadata.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to return the fingerprints that didn't match. Just return []*reader.Metadata
and leave the not found items empty.
"go.opentelemetry.io/collector/component/componenttest" | ||
) | ||
|
||
func TestFindFilesOrder(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know your thoughts over this test @djaglowski.
@djaglowski I imagine you were at KubeCon last week. Just checking in to see if you're back and reviewing. |
func (f *Fingerprint) GetFingerprint() *Fingerprint { | ||
return f | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to return the fingerprints that didn't match. Just return []*reader.Metadata
and leave the not found items empty.
|
||
mostRecentIndex := util.Mod(t.archiveIndex-1, t.pollsToArchive) | ||
matchedMetadata := make([]fileset.Matchable, len(fps)) | ||
indices := make(map[int]bool) // Track fp indices of original fps slice |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This complicates the code way more than necessary. The result slice can indicate just as well whether a fingerprint has been matched. Just check if the corresponding value is nil.
if md := data.Match(fps[index], fileset.StartsWith); md != nil { | ||
// update the matched metadata for this index | ||
matchedMetadata[index] = md | ||
delete(indices, index) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using a map and deleting keys from it may be marginally more performant but I don't buy that it's worth the complexity. Just loop over the result slice and only attempt to match an item if the corresponding result is nil.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing this out. I've updated the code.
matchedMetadata := make([]*reader.Metadata, len(fps)) | ||
|
||
// continue executing the loop until either all records are matched or all archive sets have been processed. | ||
for i := 0; i < t.pollsToArchive; i, mostRecentIndex = i+1, util.Mod(mostRecentIndex-1, t.pollsToArchive) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move the buffer index update out of the control loop to make this easily readable?
for i := 0; i < t.pollsToArchive; i, mostRecentIndex = i+1, util.Mod(mostRecentIndex-1, t.pollsToArchive) { | |
for i := 0; i < t.pollsToArchive; i++ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup. I've removed the Mod function and kept it in-place. Should be good now.
|
||
func Mod(x, y int) int { | ||
return ((x % y) + y) % y | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that it only returns negative if you apply it to a negative value, so can't we just avoid that and compute this inline?
mostRecentIndex = (x - 1 + y) % y
or
mostRecentIndex--
if mostRecentIndex < 0 {
mostRecentIndex += t.pollsToArchive
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code is making more sense to me now. This round of feedback is all nits but I think it will help readability.
@djaglowski does this look good to go? |
…y#35798) This PR follows open-telemetry#35098. ### Description - This PR adds core logic for matching from archive. Check [this out](open-telemetry#32727 (comment)) for the core logic. ### Future PRs - As of now, we don't keep track of most recently written index across collector restarts. This is simple to accomplish and we can use of persister for this. I haven't implemented this in current PR, as I want to guide your focus solely towards reading part. We can address this in this PR (later) or in a separate PR, independently. - Testing and Enabling: Once we establish common ground on _**reading from archive**_ matter, we can proceed with testing and enabling the configuration.
…y#35798) This PR follows open-telemetry#35098. ### Description - This PR adds core logic for matching from archive. Check [this out](open-telemetry#32727 (comment)) for the core logic. ### Future PRs - As of now, we don't keep track of most recently written index across collector restarts. This is simple to accomplish and we can use of persister for this. I haven't implemented this in current PR, as I want to guide your focus solely towards reading part. We can address this in this PR (later) or in a separate PR, independently. - Testing and Enabling: Once we establish common ground on _**reading from archive**_ matter, we can proceed with testing and enabling the configuration.
This PR follows #35098.
Description
Future PRs