Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

meta command follow ups #546

Closed
4 of 6 tasks
dormando opened this issue Oct 1, 2019 · 33 comments
Closed
4 of 6 tasks

meta command follow ups #546

dormando opened this issue Oct 1, 2019 · 33 comments

Comments

@dormando
Copy link
Member

dormando commented Oct 1, 2019

Meta features (#484) are out in 1.5.19. There'll be a trickle of changes hopefully with every release to smooth things out. Keeping track of them with this issue.

  • LOGGER_LOG() entries specific to meta commands
  • mget input: return value if CAS match or not match (maybe invert v if supplied cas matches?)
  • meta get should work without flags supplied
  • meta delete has extraneous newline (meta get probably has the same bug)
  • O and k should be reflected on EN/miss.
  • remove the "meta_response_old" option.

edit: removed a bunch of completed items/notes.
edit2: (final?) todo list.
edit3: updated TODO list again.

@dormando
Copy link
Member Author

dormando commented Nov 8, 2019

Trying to implement a simplified client with this in a few more languages. Just ran into something backwards.

For sending flags to the server, capital letter flags always consume a token.
However for flags received by the client, there's no blind way of determining which flags associate with tokens. You have to know the full list of flags ahead of time; which means adding a new flag that returns a token could be dangerous.

Up until now I was writing the clients with tokens being passed around blindly... but this has accidentally made parsing the value part of the response impossible. In all my other tests I'd been looking for EN\r\n and overlooked this when reviewing. As did everyone else :)

Clients either need to test for "was there a v flag?", and if so, "which token position is the size?", or blind parse by looking for EN\r\n. Options:

  1. Box capital letter flags for any flag which returns a token. If 'v' and no 's', look for EN\r\n. If 's', pull by length. Could also box by which half of alphabet.
  2. If 'v', return code is: VA <size> <returnflags> <tokens>. If no 'v', return code is: HD <returnflags> <tokens>, VA for value, HD for header. With this clients should be able to blindly accept and return all flags/tokens/values.
  3. Do 2, but also box token'ed flags so tokens can be assigned to flags without the client knowing future flags. This is less useful with the VA/HD change, but would still allow for building map/hash/dict of flag -> token. The caller should usually know what it's expecting and shouldn't need to look at the flags at all, but it should be possible to build this list without thinking about it.

Hmm. Another possibility would be to always have capital means "consumes a token". So response flags get capitalized when they have a token. l -> L, etc. and request flags get lowercased. N -> n. Downside is this reduces the number of possible flags to just 26 without using special characters.

metaget, with the most flags, has 14ish. The W/X/Z flags can be swapped for !@# or similar characters, freeing up three. however I've tried to keep the flags meanings stable across commands... will have to draw up a flag list and see how it looks I guess.

@dormando
Copy link
Member Author

dormando commented Nov 8, 2019

a: none
A:
b: none
B:
c: return CAS
C: set/use supplied CAS
d: none
D:
e: none
E:
f: get flags
F: set flags
g: none
G:
h: return if item was hit
H: none
i: none
I: invalidation (FIX)
j: none
J:
k: return key as token
K: none
l: return last access
L: none
m: none
M: none
n: none                                                                                            
N: vivify on miss. token.                                                                          
o: none
O: take and return opaque                                                                          
p: none                                                                                            
P: none                                                                                            
q: quiet                                                                                           
Q: none
r: none
R: recache if less than token
s: return size
S: SET size 
t: return TTL
T: SET TTL 
u: don't bump LRU
U: none
v: return value 
V: none
w: none
W: WIN flag 
x: none
X: STALE flag
y: none
Y: none
z: none                                                                                            
Z: WIN already sent flag

9 flags currently free
~5 with "wasted" halves. either takes and doesn't return token or etc.
3 reclaimable (swap W/X/Z with ! # * maybe)

Known commands not used yet:

  • delta/arithmetic. needs initial value, incr or decr, token to delta
  • metaset: append, prepend, replace, add (4 flags?). could cheat with "mode" and use a number since they're all exclusive to each other: 12345
  • no other flags needed for the different set modes. ADD gives a return code already (EX/EXISTS)
  • delete: dunno what's missing.

that's to get to parity with binprot. so perhaps 3 more flags for delta, though N could be reused for the init since it means the same thing. Current incr/decr is specialized: incr key value\r\n. ma key N token V token etc. could just prefix - for decr vs incr. Now we're playing golf but not using any new flags.

That should leave 12 for future expansion. Right now it's convenient to be able have a shared token pre/post parser for all meta commands, so not having overlap is nice.

However, if CAPS means has a token (supplied or returned), lower doesn't, then the preparser doesn't necessarily have to be shared, then each command can have 26 flags. The shared preparser in memcached is nice since it sets bits for whether or not to leave the item locked during internal fetch... however so far the preparser is only useful for metaget.

I can't see much reason for clients to care about flags.. if they're pre-parsing response flags into a map/dict/hash and you're in a typed language you'd want to know ahead of time if it's a number or string, but even that has limited usefulness.

@dormando
Copy link
Member Author

dormando commented Nov 10, 2019

Think the problem here stems from flags being reflected in the response rather than representing what the tokens are, or strictly from extra data being returned.
mg sfv foo
could return
mg sf 2 0\r\nhi\r\n
... and everything would be fine.

building a response this way could suck in some languages. In C you're going to end up with a copy somewhere... in the server I reserve space at the front of the return buffer, then write extra flags in backwards. If building the response flags forwards I couldn't do that without a pre-parser, which honestly there already is one so it's probably fine to just handle that in the pre-parser.

I wonder if this would end up painful in any immutable language. :\

... but I can't seem to remember why I decided to reflect everything. I think earlier in the design nothing was reflected, but I added the flag return possibly only to make return parsing easier plus have the out of band data for WIN/STALE/WON tokens.

Edit: Why am I so dumb? Of course this doesn't still work: WIN/STALE/WON still don't consume tokens, so response parsing still can't be blind. If this is done on top of the CAPS fix it would work 100%?

Wonder if I could leave the flags in for responses but basically kill all of them off: "0" or "Z" just means no extra flags in response, then tokens match what was sent implicitly. This would be fine but clients still can't autoparse since the flag <-> token state is entirely in the client. Wireshark could parse it only if it understood all flags and saw both request + response.


My ancient "protocol V3" suggestion had flags prepending tokens: ms foo sSIZE cCAS and so on. that's certainly easier to parse and not that hard to write in response regardless of language, but without the out of band flags standalone flags would require space buffers, making them slower to parse, waste bytes, etc.

Might just try the code for:
mg foo sfvt N30 R15
with res:

VA [size] [flags] s2 f0 t300
hi
EN

where a [flags] of 0 or Z means no extra flags.

If [flags] were instead a token with "e[extra/bulk flags]" they could be appended which should make client code the simplest and remove my weird "front pointer" and backwards walk bullshittery.

with 's' tagged to the size we could keep with just VA rather than VA or HD but I honestly kind of like the distinction. I could also try code for adding VA vs HD, where VA means + data\r\nEN\r\n and HD just means HD is the whole line, then the client has to find the size token to decide if it's binary blob or parsing until EN\r\n.

benefits:

  • avoids flag waste
  • makes req/resp more human readable
  • doesn't complicate parse/response too much, but can make it a tiny bit slower vs blind copying the flags. probably same speed or faster than having to flip s -> S in response otherwise.
  • preserving order matters a bit less.

@dormando
Copy link
Member Author

dormando commented Nov 13, 2019

mg foo s f v t l h N30 R15

is... 5 extra bytes.

mg foo esfvtlh N30 R15

is 1 extra byte (e)

The only single flags that get returned that mean anything are W / X / Z. flags like q/u/etc get eaten. all other flags return a token. Meaning there isn't much point in flag compression for return:

VA [size] W X s2 f15 t299 l5 h1\r\n
hi\r\n
EN\r\n

(with size being redundant here; you probably only need it when not fetching the value)

The simplest approach would just be to always space delim flags for both request/response. I'll try adding an 'e' flag for client side compression and see how that complicates the parsing code. I expect a few more bare response flags to be added over time, but not enough to make a huge dent in bytes-on-wire :/ Avoiding mirroring the flags in the response saves a few bytes in the response overall, so it's a wash anyway.

doing the flag token prepend will make code a bit longer. might be able to be clever about prepending the token or at least use a macro.

@dormando
Copy link
Member Author

dormando#3 fix is being tested here. on top of the network-readbuf branch to avoid a merge nightmare

@dormando
Copy link
Member Author

dormando commented Feb 14, 2020

With the previous change to flags, command parsing is now fully agnostic to the commands used...

ie; a client can have convenience functions for collapsing/aliasing flags, but it doesn't need anything beyond a basic "send" and maybe a "send but it's a set" function. Different commands use unique 2-char codes.

The last thing I was thinking about (while doing the mctester work) was if the return codes could be collapsed further:

  • mg has VA HD EN as returns.
  • ms has ST NS NF EX
  • md has DE NF EX
  • mn has MN

then prefixes for SE/CL/ER are for full strings of SERVER_ERROR, ERROR, CLIENT_ERROR... which are unavoidable as the protocol is live-switchable with originall text.

NF == Not_Found
EX == EXISTS (CAS doesn't match)
ST/DE both mean "success"
NS == NOT_STORED (but not because of an error); means add or replace type commands failed.

Honestly I think adding an "OK" to replace ST, DE, maybe MN and maybe HD would be enough. We can always add codes when absolutely necessary, but any remaining commands to implement can likely use one of the existing codes, with flag sprinkles for command specific information.

In the test client code, all of the above are parsed the same way already. It should be possible to add the ma (arithmetic) command without adding any client code.

Edit: Think I want to keep MN for human-readability sake. HD can change to OK.

@dormando
Copy link
Member Author

#612 easy to talk myself into this one.

@dormando
Copy link
Member Author

need to add a new do_store_item routine or adjust it somehow...

in binprot, if you send an ADD with a CAS it'll change that to a SET.. whereas doing so doesn't actually make sense (ADD with CAS will fail if the item exists, and if the item doesn't exist). REPLACE sort of works (same effect as CAS), but in the ADD case it should autofail but doesn't.

in do_store_item CAS is a specific subcommand (NREAD_CAS). Except for append/prepend, which does a quick CAS check if there is one on the item. Auditing for calls to ITEM_set_cas show that it'll be 0 unless supplied (binprot) or via CAS command for ASCII protocol.

Could refactor to check for a CAS value at top of function and set an enum properly, then reorganize the various sub routines. This could change behavior slightly but means we can get the CAS counters for all use cases.

@dormando
Copy link
Member Author

Well I refactored do_store_item's rats nest of logic into something much more readable .. yet... haven't solved this specific issue.

If a CAS is supplied to metaset, it flips the current command from NREAD_SET to NREAD_CAS. I need to make ADD/REPLACE/APPEND/PREPEND work.

  • append/prepend should JFW. Except if you specify the CAS after setting an append/prepend flag the logic will set you to NREAD_CAS so need to restructure that.
  • ADD will always fail with CAS; but it should still do an LRU bump. Problem is you'll get a NOT_STORED result rather than EXISTS which is more proper for CAS. This might just be fine?
  • REPLACE with CAS should simply become NREAD_CAS since CAS cannot win unless an item exists already... except with current semantics you get a NOT_STORED from replace.

This just sort of sucks because do_store_item is shared with all 3 protocols. I'd prefer to keep it that way...

@dormando
Copy link
Member Author

dormando commented Mar 4, 2020

Organizing thoughts for ma (meta arithmetic):

  • N(token) (autovivify on miss, set TTL to etc)
  • T(token) (incr/decr-with-touch)
  • M[I|D] (mode switch for incr/decr. default incr. maybe A/S for add/subtract? then M/D for mul/div)
  • J(token) (set initial value during autovivify. default 1)
  • C (supply CAS)
  • c (return CAS. in binprot the cas value seems to get updated on change and a new one returned. not sure if that's too limiting or not)

should be trivial to add modes for multiplication and division, maybe shifting. need more feedback on how people could benefit from expanding the counter system.

ascii requires the incr/decr with add fallback for initial values. binprot fixed that by allowing an initial value with a seeded expiration time and CAS twiddles. meta will allow touch + binprot parity.

there's no win/recache/etc routine here.

edit: code's coming along. had to do an extra loop post-delta to get cas/TTL/etc return to work... so the code's not as copypasta as the other commands.

need to make a decision re: value return. If requested, do the full VA format or add as a return token since it's always a short numeric? Leaning toward VA but going to sleep on it then finish things.
(it sends the full VA on request. it's not too painful to add a special token to get the number directly if people request so)

@dormando
Copy link
Member Author

dormando commented Mar 7, 2020

At this point we have binary protocol parity, and the further fixes noted here can be done incrementally.

@dormando
Copy link
Member Author

@shivnagarajan are you still working on / interested in this, by chance? I'm working on the C client issue now.

@shivnagarajan
Copy link
Contributor

I've got some changes in libmemcached/libmemcached for the meta commands addition. I can push them up to a branch and see where we can take it from there. Happy to help where ever possible

@casperisfine
Copy link

👋 I'm experimenting with converting https://github.com/petergoldstein/dalli (the most popular ruby client, only supports the binary protocol) to the meta commands protocol. It's actually fairly easy and I got almost everything working.

However there is a major feature missing to have proper compatibility with the binary protocol, it's to return the CAS value on ms (as you identified it in the issue body).

Is this something you are still planning to add?

Another more minor missing feature is the NOOP command. Dalli uses it to mark the end of a multi-get pipeline. Without it I have no idea how I could use the mg q flag to save on the response size.

cc @tylermercier

@shivnagarajan
Copy link
Contributor

@casperisfine

Another more minor missing feature is the NOOP command. Dalli uses it to mark the end of a multi-get pipeline. Without it I have no idea how I could use the mg q flag to save on the response size.

Will the meta no-op be sufficient for this?

@casperisfine
Copy link

Will the meta no-op be sufficient for this?

@shivnagarajan 🤦 it totally is, I simply somehow managed not to see it...

@dormando
Copy link
Member Author

@casperisfine CAS from ms is something I punted on temporarily because the code structure didn't make it easy. That's the third TODO item down in the top post in this issue. I haven't gone back to do it yet :(

RE the other thing, MN is exactly what you want as noted :)

@casperisfine
Copy link

That's the third TODO item down in the top post in this issue

Yes I saw it. I just wanted to communicate it's importance 😉

@dormando
Copy link
Member Author

Got it :) I'm a bit slammed right now but will revisit when I can.

@dormando
Copy link
Member Author

@casperisfine do you make use of this feature or is it mostly for feature parity in the client?

@casperisfine
Copy link

We don't even use dalli right now, nor the binary protocol, we're actually still mostly on the old ASCII protocol using a libmemcached binding, so no we don't use this feature, and if we were really committed to use the meta commands ASAP we could do without it.

But our goal is more to get rid of our libmemcached binding, so in this context going directly to the newer protocol, with Dalli as an API to have compatibility with many other modern libraries is appealing. That's why I'm looking for feature parity. I'd bet most software would work fine without it, but "eventually compatible" is a bit of a hard sell...

Either way there is no rush.

@dormando
Copy link
Member Author

That's good to know, thanks! I need to hear from people to figure out priorities :)

I'm working on a C client thing that I'll share soon too.

@dormando
Copy link
Member Author

dormando commented Jun 19, 2020

I'm throwing an attempt in at making a new C client: https://github.com/dormando/mcmc
See: dormando/mcmc#1 for a general idea of where it's at.

I'm hammering out the "make it work" parts now, which it's mostly there. I'll get a few limited tests going then immediately start using it on a project I'm working on now. The API should evolve into something stable and we can release it as a full C client base others can quickly use in language wrappings.

libmemcached is a lot of code and completely unmaintained. It includes client/server protocols, stats parsers (that're constantly out of sync), mallocs and holds onto read/write buffers. It also doesn't integrate into event loops at all; async mode uses poll so far as I can tell.

mcmc is completely agnostic to how you use it: it should be trivial to write tiny wrappers to embed it into any event loop. It's also tiny.

@shivnagarajan @casperisfine this may not solve any of your problems, or it might. I'm not expecting a long development time since the client is highly simplified. There's no binary protocol. I've iterated on the API a dozen times already, and I could probably write a synchronous client wrapper in most languages in a couple days at most. (not that I'm volunteering to do so; just that it should be easy)

@casperisfine
Copy link

Thanks for the link. However the meta commands protocol is simple enough that I'd rather implement it directly in Ruby than to bind a C library. At least until I get to the benchmark stage.

@dormando
Copy link
Member Author

I don't expect you to use it for dalli, but wasn't there another client in use that depended on libmemcached?

@casperisfine
Copy link

The one we currently use, but we'd like to simply get rid of it as it's not really maintained anymore: https://rubygems.org/gems/memcached. And we're only using it because of historical reasons, all our newer apps use Dalli. And in the Ruby community as a whole, Dalli is really the standard.

@dormando
Copy link
Member Author

dormando commented Jun 22, 2020

Ahhah. I must've misremembered it as a python client. Yeah that's great, do it :)

@dormando
Copy link
Member Author

Couple annoying things:

  • I'd used HD as a response code to differentiate from OK in the system commands. I then forgot this and switched everything to OK when I did the return code standardization pass. Fixing this would definitely break my slowdown promise (and any clients...)
  • I didn't reserve any response codes for future expansion. This may be fine. A strictly written client could throw an error if an unknown status code is added at some future point.. the fix for this is that we never add return codes to pre-existing meta commands.
  • On an mg miss, EN is returned without any flags. Should O k flags be returned on miss? I believe opaque is always returned in binprot, so it would be a regression for status codes to lose O/k flags. Similar issue for ma/ms/md I think.

@dormando
Copy link
Member Author

Well that got dropped...

I'd like to stabilize meta and mark it as final ASAP. My list of changes after having played with it for a while:

  • OK -> HD (GD maybe for "good"?). Other ascii commands can respond with OK so it makes response parsing less flexible (as noted last june apparently).
  • ms should be ms <key> <datalen> <flags>*\r\n instead of requiring the S flag. This makes low level protocol handling simpler, and was done for the VA response code in the past already.

Looks like I had some code level follow-ups from above that I'd forgotten about. LOGGER, refcount checks, etc. I also want to change the internal parser but am struggling to come up with an optimal approach. It's a hot path or I wouldn't try so hard

I think it's safe to make these changes since the meta protocol is still marked as EXPERIMENTAL in the documentation, but I worry it's been out there long enough that someone's relying on it. I may just have to be bold in this one case.

@dormando
Copy link
Member Author

dormando commented Aug 5, 2021

Most of these changes are in. Keeping this issue open a while longer until the top two TODO items get done.

@zenbones
Copy link

zenbones commented Mar 8, 2022

My 2 cents is that yes, EN should absolutely return O and k if requested. I'm writing a non-blocking client and the fact that the O flag is ignore by EN responses came as a show stopping shock. If the change is truly non-breaking, please consider having EN respect at least the O flag, and I should think k flag as well for consistency's sake.

@dormando
Copy link
Member Author

dormando commented Mar 10, 2022

You're two years late, but yes looks like I forgot to add that in :) It's not a breaking change and looks like I'd already planned on doing it, so I'll add it as soon as I can. There're some cases in the code where it's not trivial to add the flags.

@dormando
Copy link
Member Author

#938 will probably close out this issue when merged.

@dormando dormando closed this as completed Nov 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants