Feature request: count unique things seen #123

danp · 2013-10-29T18:41:49Z

I would like to have l2met produce a "unique things seen" metric from my log lines. Say I have an app that logs lines like this:

user_id=3
user_id=6
user_id=3
user_id=3
user_id=6
user_id=2

I would like a metric that counts how many unique user_ids were logged, in this case 3. A possible convention:

unique#user-id=1
unique#user-id=1
unique#user-id=3
unique#user-id=2
unique#user-id=3
unique#user-id=1

These lines would cause a user-id metric to be emitted every interval; in this case its value would again be 3.
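The proposed counting can be sketched roughly as follows. This is only an illustration of the idea, not l2met code: the `unique#` prefix is the convention suggested above, `count_unique` is a hypothetical helper, and the parser assumes simple whitespace-separated `key=value` tokens with no quoted values.

```python
from collections import defaultdict

# Prefix proposed in this issue; not an existing l2met feature.
UNIQUE_PREFIX = "unique#"


def count_unique(lines):
    """Return {metric_name: number of distinct values} for one interval's lines."""
    seen = defaultdict(set)
    for line in lines:
        # Assumes unquoted logfmt: whitespace-separated key=value tokens.
        for token in line.split():
            key, sep, value = token.partition("=")
            if sep and key.startswith(UNIQUE_PREFIX):
                # Collect each distinct value once per metric name.
                seen[key[len(UNIQUE_PREFIX):]].add(value)
    # The metric is the size of each set at the end of the interval.
    return {name: len(values) for name, values in seen.items()}


lines = [
    "unique#user-id=1",
    "unique#user-id=1",
    "unique#user-id=3",
    "unique#user-id=2",
    "unique#user-id=3",
    "unique#user-id=1",
]
print(count_unique(lines))  # {'user-id': 3}
```

The set would be discarded at each interval boundary, so memory grows with the number of distinct values seen in one interval rather than with the number of log lines.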

aseemk · 2013-10-30T22:36:44Z

Neat idea. I've wondered where the line gets drawn between technical metrics and user analytics, but this is something we might use too if it were available. IP addresses, OAuth client IDs, and maybe User-Agent strings would be things we might track like this, along with authenticated user IDs.

josephruscio · 2013-10-31T04:56:46Z

@dpiddy to better predict the worst-case memory usage, is there some sane max boundary on key length, e.g. 128 or 256 bytes?

danp · 2013-10-31T14:01:48Z

I think even 64 bytes would be a reasonable limit; that would handle a hex-encoded sha256 sum.

freeformz · 2013-10-31T17:34:27Z

@josephruscio wouldn't value length be more important, since you basically need to build a set of values for the key and then count them at the bucket boundary?

collinvandyck · 2013-10-31T21:44:35Z

Yeah, I think the value is what we're talking about. @dpiddy, agreed that 64 bytes would be reasonable. Larger values can just be hashed down to that, and hashing would also be preferable if the values for the logfmt tuples contained whitespace. For example,

unique#full-name=Collin Van Dyck

Probably wouldn't be that useful in a unique tag but

unique#full-name=ac4780ff03d9df8fb067c67681e772d4

would be.
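A client could apply that rule before logging, along these lines. This is a sketch under assumptions: `normalize_value` and `MAX_VALUE_BYTES` are hypothetical names, the 64-byte limit is the figure floated in this thread, and SHA-256 is chosen only because its hex digest is exactly 64 characters.

```python
import hashlib

# Limit suggested in this thread; not an l2met setting.
MAX_VALUE_BYTES = 64


def normalize_value(value):
    """Return value unchanged if it is short and whitespace-free, else its SHA-256 hex digest."""
    raw = value.encode("utf-8")
    has_whitespace = any(ch.isspace() for ch in value)
    if len(raw) <= MAX_VALUE_BYTES and not has_whitespace:
        return value
    # The hex digest is 64 ASCII characters, so it fits the limit and has no whitespace.
    return hashlib.sha256(raw).hexdigest()


print("unique#full-name=" + normalize_value("Collin Van Dyck"))
print("unique#user-id=" + normalize_value("42"))
```

Equal inputs always hash to the same digest, so the unique count is preserved as long as distinct inputs do not collide.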

danp · 2013-10-31T22:39:59Z

Yeah. I'd say the limit should be 64 bytes and the recommended usage when you are not sure if your value will be under that is to hash.

Another option would be to have l2met always hash but that has tradeoffs too.

For my immediate use case for this feature I only need 10 bytes or less.

josephruscio · 2013-10-31T22:40:16Z

@collinvandyck @freeformz correct, I accidentally used the wrong term; the max length for any value in the set was my concern.
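With a cap on value length, a rough upper bound on the stored value bytes per metric per interval is simple arithmetic. The sketch below assumes the 64-byte limit discussed above and counts only the value payload; it ignores set or hash-table overhead, which depends on the implementation. The function name is hypothetical.

```python
# Limit suggested in this thread; not an l2met setting.
MAX_VALUE_BYTES = 64


def worst_case_payload_bytes(distinct_values, max_value_bytes=MAX_VALUE_BYTES):
    """Upper bound on raw value bytes held for one metric in one interval."""
    return distinct_values * max_value_bytes


# One million distinct values at the 64-byte cap.
print(worst_case_payload_bytes(1_000_000))  # 64000000
```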
