Conversation

@ebarskiy

The COMMAND command is expensive to process (too many string operations),
but its result rarely changes, so it makes sense to add caching.

Signed-off-by: Evgeny Barskiy <ebarskiy@snapchat.com>
Member

@JimB123 JimB123 left a comment

What evidence is there to suggest that COMMAND

is expensive to process (too much string operations)

How meaningful is the benefit of caching this information? I can't tell if this represents premature optimization (adding complexity without sufficient value).

Signed-off-by: Evgeny Barskiy <ebarskiy@snapchat.com>
@ebarskiy
Author

What evidence is there to suggest that COMMAND

is expensive to process (too much string operations)

How meaningful is the benefit of caching this information? I can't tell if this represents premature optimization (adding complexity without sufficient value).

@PingXie can you share internal latency of COMMAND?

Collaborator

@hpatro hpatro left a comment

I have a similar question along the lines of what @JimB123 asked.

Why do we want to optimize this particular command? It is not a heavily used command AFAIK, apart from clients invoking it upfront to build certain metadata. Does this lead to tail latency on your end for datapath commands?

Member

@roshkhatri roshkhatri left a comment

I agree with the reviews; overall this looks good.
A few things:

  1. Can we also add some tests to test invalidation?
  2. Can we also add some benchmark numbers to see the improvements that we are seeing?

Signed-off-by: Evgeny Barskiy <ebarskiy@snapchat.com>
@ebarskiy
Author

Regarding the performance:
Internally, with default valkey it takes 2ms (little to no variance) to regenerate a result.
Add a bunch of modules (with extra commands) and it shoots up to 5ms.
For comparison, processing a cached result takes 50usec internally.

When it's important:

  1. Small cluster (low request distribution)
  2. Lots of clients (it adds up: 1000 clients per node makes 5 seconds of wall time)
  3. Clients do a rolling update/deployment (so COMMAND commands are localized in time)

In general, we have identified 3 similar issues responsible for the tail latency:

  1. Topology refresh (also takes ~5ms)
  2. AUTH + IAM token (vendor/module specific)
  3. COMMAND

While COMMAND has the lowest impact of these three, it's the easiest to address.

Signed-off-by: Evgeny Barskiy <ebarskiy@snapchat.com>