Feature request: count unique things seen #123

Open
danp opened this issue Oct 29, 2013 · 7 comments

Comments

@danp
Contributor

danp commented Oct 29, 2013

I would like to have l2met produce a "unique things seen" metric from my log lines. Say I have an app that logs lines like this:

user_id=3
user_id=6
user_id=3
user_id=3
user_id=6
user_id=2

I would like a metric that counts how many unique user_ids were logged, in this case 3. A possible convention:

unique#user-id=1
unique#user-id=1
unique#user-id=3
unique#user-id=2
unique#user-id=3
unique#user-id=1

These lines would cause a user-id metric to be emitted every interval. In this case the value would again be 3, since three distinct values (1, 2 and 3) were seen.
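A minimal Go sketch of the counting described above, assuming the proposed `unique#<name>=<value>` convention. The `CountUnique` function and the `uniquePrefix` constant are hypothetical names for illustration, not part of l2met:

```go
package main

import (
	"fmt"
	"strings"
)

// uniquePrefix is the convention proposed in this issue (an assumption,
// not an existing l2met feature).
const uniquePrefix = "unique#"

// CountUnique scans logfmt-style lines for "unique#<name>=<value>" pairs
// and returns the number of distinct values seen for name.
func CountUnique(lines []string, name string) int {
	seen := make(map[string]struct{})
	key := uniquePrefix + name
	for _, line := range lines {
		// strings.Fields splits on whitespace, so values that contain
		// spaces are not handled by this sketch.
		for _, field := range strings.Fields(line) {
			k, v, ok := strings.Cut(field, "=")
			if ok && k == key {
				seen[v] = struct{}{}
			}
		}
	}
	return len(seen)
}

func main() {
	lines := []string{
		"unique#user-id=1",
		"unique#user-id=1",
		"unique#user-id=3",
		"unique#user-id=2",
		"unique#user-id=3",
		"unique#user-id=1",
	}
	fmt.Println(CountUnique(lines, "user-id")) // prints 3
}
```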

@aseemk

aseemk commented Oct 30, 2013

Neat idea. I've wondered where the line gets drawn between technical metrics and user analytics, but this is something we might use too if it were available. IP addresses, OAuth client IDs, and maybe User-Agent strings would be things we might track like this, along with authenticated user IDs.

@josephruscio

@dpiddy to better predict the worst-case memory usage, is there some sane max boundary on key length, e.g. 128 or 256 bytes?

@danp
Contributor Author

danp commented Oct 31, 2013

I think maybe even 64 bytes would be a reasonable limit; that would fit a hex-encoded sha256 sum.

@freeformz
Collaborator

@josephruscio wouldn't value length be more important, since you basically need to build a set of values for the key and then count them at the bucket boundary?
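The set-per-key idea could be sketched in Go roughly as follows. This is only an illustration under assumed names (`UniqueBucket`, `Add`, `Flush` are hypothetical and do not come from the l2met codebase):

```go
package main

import (
	"fmt"
	"sync"
)

// UniqueBucket is a hypothetical per-interval accumulator: one set of
// values per metric name, counted and reset at the bucket boundary.
type UniqueBucket struct {
	mu   sync.Mutex
	sets map[string]map[string]struct{}
}

// NewUniqueBucket returns an empty bucket ready for use.
func NewUniqueBucket() *UniqueBucket {
	return &UniqueBucket{sets: make(map[string]map[string]struct{})}
}

// Add records value for the metric name. Repeated values do not grow the set.
func (b *UniqueBucket) Add(name, value string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	set, ok := b.sets[name]
	if !ok {
		set = make(map[string]struct{})
		b.sets[name] = set
	}
	set[value] = struct{}{}
}

// Flush returns the distinct-value count per metric name and clears the
// bucket so the next interval starts empty.
func (b *UniqueBucket) Flush() map[string]int {
	b.mu.Lock()
	defer b.mu.Unlock()
	counts := make(map[string]int, len(b.sets))
	for name, set := range b.sets {
		counts[name] = len(set)
	}
	b.sets = make(map[string]map[string]struct{})
	return counts
}

func main() {
	b := NewUniqueBucket()
	for _, id := range []string{"3", "6", "3", "3", "6", "2"} {
		b.Add("user-id", id)
	}
	fmt.Println(b.Flush()["user-id"]) // prints 3
	fmt.Println(b.Flush()["user-id"]) // prints 0, the bucket was reset
}
```

Memory held between flushes grows with the number of distinct values times their length, which is why a cap on value length matters for the worst case.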

@collinvandyck
Collaborator

Yeah, I think the value is what we're talking about. @dpiddy, agreed that 64 bytes would be reasonable. Larger values can just be hashed down to that, and hashing would also be preferable if the values in the logfmt tuples contained whitespace. For example,

unique#full-name=Collin Van Dyck

Probably wouldn't be that useful in a unique tag but

unique#full-name=ac4780ff03d9df8fb067c67681e772d4

would be.

@danp
Contributor Author

danp commented Oct 31, 2013

Yeah. I'd say the limit should be 64 bytes, and the recommended usage when you are not sure whether your value will be under that is to hash it.

Another option would be to have l2met always hash, but that has tradeoffs too.

For my immediate use case for this feature I only need 10 bytes or less.


@josephruscio

@collinvandyck @freeformz correct, I accidentally used the wrong term; the max length of any value in the set was my concern.
