v0.2.0: RFC-9562 Support #53

cch1 · 2024-07-01T16:18:39Z

This library has worked really well for us generating deterministic (v5) UUIDs. We also would benefit from using the new v7 UUIDs and would prefer to centralize UUID support in clj-uuid. Is there any plan to add support for UUID Version 7?

The text was updated successfully, but these errors were encountered:

dco-lentz · 2024-07-01T16:34:09Z

Yes, all of the tools to implement these should be in place. I’ll take a look Thanks, Dan

…

On Jul 1, 2024, at 12:19 PM, Chris Hapgood ***@***.***> wrote: This library has worked really well for us generating deterministic (v5) UUIDs. We also would benefit from using the new v7 UUIDs and would prefer to centralize UUID support in clj-uuid. Is there any plan to add support for UUID Version 7 <https://www.ietf.org/archive/id/draft-peabody-dispatch-new-uuid-format-04.html>? — Reply to this email directly, view it on GitHub <#53>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AC4YNXXHIXIOK7I5XQ5O6FLZKF6PJAVCNFSM6AAAAABKF4RTGSVHI2DSMVQWIX3LMV43ASLTON2WKOZSGM4DIMRTHE2DAOA>. You are receiving this because you are subscribed to this thread.

franks42 · 2024-09-02T00:28:49Z

Waiting for v7 support - I can see the code changes in branch "53-v7-support" from last July - any release date planned?
Or maybe tag that addition such that we can pull it out of github to test/use it more easily?

franks42 · 2024-09-05T19:49:50Z

This new RFC 9562 section 6.2 has a very cryptic prescription of the v7 implementation, which will yield many different implementations with different "guarantees" about the monotonicity and the associated lexicographical order of the generated uuidv7's - a recipe for many future obscure bugs!
I've looked at different clojure and java implementations of uuidv7, and yours is the first one that seems to guarantee that every newly generated uuidv7 compares "positive" (lexicographically speaking) to the previous one. (it took me awhile to notice that if you ask for uuid's within a millisecond you could get burned by the loss of monotonicity in some implementations...)
If I understand your code well, then the "monotonic-unix-time-and-random-counter" function maintains the millisecond timer and the counter of the previously generated uuid in an atom, increments the counter when a new uuid has the be generated within that same millisecond, and when you run out of counter "runway", you loop-recur until a new millisecond has arrived... is that correct?
Anyway... hat-tip to the coder who implemented this for clj-uuid!
Btw, you should document this monotonicity guarantee loudly, in bold and all caps, as folks will not realize what sets this implementation apart!!!
Let me know if I can help with the release in any way.

franks42 · 2024-09-06T19:17:22Z

A few more observations and suggestions:

The counter can run from 0 to 4095 but is initialized to a random start value between those values to add a little more random and to obfuscate correlations between uuid-values from the same generator for privacy reasons.
The time it takes to generate one uuidv7 is about 0.5 microsec for the bulk-generation of 1 million uuids on my MBP M1 max. Or within one millisecond, it could generate about 2000 uuids. Probably my laptop has an average CPU, the generation can probably be more optimized, but it seems to be in about the order of the maximum number of uuids that can be generated (roughly 4000) within one millisecond before the counter reaches its maximum.

When the counter runs out, the generator will spin in a loop-recur till the next millisecond arrives. This should be avoided as much as possible as that cpu-core will show 100% utilization just spinning for that duration.
For normal operation, you will hardly ever hit the counter limit... when you reinitialize the counter to 0.
However, when you randomly initialize the counter, then statistically you will "often" start with a high counter value such that the counter limit will be reach "easily", and the generator will end up in this loop-recur - worst case is spinning for the full one millisecond.

One option to limit the chance to loop-recur it to reinitialize the counter to 0 instead of randomly between 0-4095. In that way you will always be able to generate the maximum uuids within a millisecond and limit the chance to run out the counter. IMHO the loss of privacy and random is not substantial ... also the rfc is not very prescriptive about this...

Another option is to sleep for a number of nanoseconds instead of spinning in that loop-recur. You could use a java.time.Instant that has a nanosecond precision (although on my Mac it yields only relevant microseconds), and call Thread/sleep with the number of nanoseconds till the next millisecond arrives. This kind of works in my repl... but don't want to spend more time investigating unless it becomes more relevant.

franks42 · 2024-09-08T20:18:51Z

After reading some of the discussions about the other java based implementations of v7, I don't think that the limitations are clear about this guaranteed lexical monotonicity. As far as I understand, you have to maintain state in order to generate the monotonic uuids. When you randomize the 12-bit after the version number, then you can have only one uuid per ms and if an implementation generates more than one uuid per ms, then they essentially force the client to check and discard uuids when monotonicity is a hard requirement.
When you use the 12-bit for an extension of the clock to microseconds, you can have 1000 uuids per ms and could use 2 bits for a counter per microsecond.
When you use the full 12-bits for a counter you can generate a maximum of 4096 uuids per millisecond, which is the maximum number possible within the v7 scheme where you have an additional 12-bits to play with. In other words a uuid-generator can only guarantee a maximum of 4096 uuids per millisecond, or about 4 million per second - no matter how much cpu power you may have. Furthermore the monotonicity can only be guaranteed by a single single-threaded v7-generator. Hopefully my understanding here is correct...

I've tested the previously suggested code changes about using a nanosecond timer to avoid spinning in a loop-recur, and the following seems to work ok (?).
Please read the code as a suggestion - it should be a replacement of the monotonic-unix-time-and-random-counter - just trying to help:

(let [-state- (atom (->State 0 0))]
  (defn monotonic-unix-time-and-counter []
    (let [^State new-state
          (swap! -state-
                 (fn [^State current-state]
                   (loop [time-now (java.time.Instant/now)]
                     (let [time-now-epoch-millis (.toEpochMilli time-now)
                           nanos (.getNano time-now)
                           nanos-till-ms (min 999999 (- 1000000 (rem nanos 1000000)))]
                       (if-not (= (.millis current-state) time-now-epoch-millis)
                         (->State 0 time-now-epoch-millis)
                         (let [tt (.seqid current-state)]
                           (if (< tt +random-counter-resolution+) 
                             (->State (inc tt) time-now-epoch-millis)
                             (do ;; recur when counter is out of runway - sleep until new millisecond
                               (java.lang.Thread/sleep 0 nanos-till-ms)
                               (recur (java.time.Instant/now))))))))))]
      [(.millis new-state) (.seqid new-state)])))

cch1 · 2024-09-12T13:23:15Z

UUID v7 support is anxiously anticipated for use with Datomic! https://clojurians.slack.com/archives/C03RZMDSH/p1725976130910969

dco-lentz · 2024-09-14T11:24:17Z

ok -- sorry I was out of office. Lets see if we can get this done quickly

dco-lentz · 2024-09-15T13:42:42Z

@franks42 this is pretty cool. I was also not feeling comfortable about not using all of the available bits and initializing to random number. This makes sense to me -- does anyone disagree? "Monotonicity can only be guaranteed by a single thread UUID generator" seems like a big caveat -- is it actually better to use loop/recur if it provides a monotonicity guarantee?

dco-lentz · 2024-09-15T13:55:50Z

My thought is "correctness first". If everyone agrees that the current implementation is correct and continues to provide a monoticity guarantee?

franks42 · 2024-09-17T10:21:13Z

The rfc is difficult to parse about what a “correct” implementation of v7 should look like.It feels like they seem to leave the monotonicity guarantee open - up to the implementer or more correctly the v7 generator service.The service level guarantee for a particular v7 generator service could be that the monotonicity is only guaranteed for the time component and that it’s the client’s responsibility to keep track of possible collisions if the monotonicity is important for the client’s use case. The partial monotonicity may be good enough for clients for whom a clustering of uuids around the millisecond precision of the uuid’s time component is all they need.If you require true monotonicity, then the SLA of a dedicated v7 generator could guarantee that for the whole uuid… but the price to pay is an absolute maximum number of uuids that can be generated: 1 per millisecond till upto 4000 per millisecond… depending on how the 12 bits are used.Guess any v7 generator should come with an SLA such that the client knows what to expect and whether or not it meets its requirements.Is your understanding of the rfc “similar”?Regards, Frank.On Sep 15, 2024, at 9:56 PM, Dan Lentz ***@***.***> wrote: My thought is "correctness first". If everyone agrees that the current implementation is correct and continues to provide a monoticity guarantee? —Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you were mentioned.Message ID: ***@***.***>

dco-lentz · 2024-09-19T14:38:34Z

@franks42 and others. yes reading though chapter 6 again, but, also, yes -- I can locally reproduce the monotonicity failures in the v7 concurrency tests. I have not yet seen v6 fail in that same way.

edit -- nvm -- as soon as i said that I hit one in another window.

dco-lentz · 2024-09-19T14:57:42Z

@franks42 i did a criterium benchmark on monotonic-unix-time-and-counter and found that (at least in a single threaded example) it didn't perform as well. would you have expected that to be the case?

(require '[criterium.core :as criterium :refer [bench]])

(bench (and (repeatedly 100000000 monotonic-unix-time-and-random-counter) nil))

;; Evaluation count : 28688470740 in 60 samples of 478141179 calls.
;;              Execution time mean : 0.414950 ns
;;     Execution time std-deviation : 0.044658 ns
;;    Execution time lower quantile : 0.379115 ns ( 2.5%)
;;    Execution time upper quantile : 0.501328 ns (97.5%)
;;                    Overhead used : 1.640844 ns



(bench (and (repeatedly 100000000 monotonic-unix-time-and-counter) nil))

;; Evaluation count : 17791729260 in 60 samples of 296528821 calls.
;;              Execution time mean : 1.794180 ns
;;     Execution time std-deviation : 0.074041 ns
;;    Execution time lower quantile : 1.730718 ns ( 2.5%)
;;    Execution time upper quantile : 1.930633 ns (97.5%)
;;                    Overhead used : 1.640844 ns

franks42 · 2024-09-20T01:06:16Z

(Note that I'm traveling and won't be able to make code changes and run tests for another week...) Interesting results - seems like my version is about 4-5 times slower than yours when you run it in a tight loop for 100 million times. As far as I can see, the differences in the code are: * use of `java.time.Instant/now` vs `System/currentTimeMillis` Would be interesting to see a benchmark between those two calls to see if that could explain (?) * start counting from 0 vs random value between 0-4095 Doesn't feel that this difference can account for the benchmark results - probably easy to make your implementation start at 0 for a test (?) * `java.lang.Thread/sleep` vs loop-recur spinning till the next ms when you replace the sleep with the recur of your implementation... would that make a difference? When you run in such a tight loop without the need for random-number generation for the rest of the uuid fields (except for setting the initial counter), then both implementations should easily hit the 4095 threshold every ms when the average call-time is < 2ns. This would mean that both implementations will be called 4095 times in a ms and then would either be sleeping or spinning the rest of that ms in that single last call of that ms... Intuitively that should make both implementations yield the exact same number of calls per ms and the benchmark numbers should be identical (?). (4095 calls per ms or about 250ns per call which is very different from the benchmark) What am I missing? Intriguing ;-) Regards, Frank.

…

On Thu, Sep 19, 2024 at 10:58 PM Dan Lentz ***@***.***> wrote: @franks42 <https://github.com/franks42> i did a criterium benchmark on monotonic-unix-time-and-counter and found that (at least in a single threaded example) it didn't perform as well. would you have expected that to be the case? (require '[criterium.core :as criterium :refer [bench]]) (bench (and (repeatedly 100000000 monotonic-unix-time-and-random-counter) nil)) ;; Evaluation count : 28688470740 in 60 samples of 478141179 calls.;; Execution time mean : 0.414950 ns;; Execution time std-deviation : 0.044658 ns;; Execution time lower quantile : 0.379115 ns ( 2.5%);; Execution time upper quantile : 0.501328 ns (97.5%);; Overhead used : 1.640844 ns (bench (and (repeatedly 100000000 monotonic-unix-time-and-counter) nil)) ;; Evaluation count : 17791729260 in 60 samples of 296528821 calls.;; Execution time mean : 1.794180 ns;; Execution time std-deviation : 0.074041 ns;; Execution time lower quantile : 1.730718 ns ( 2.5%);; Execution time upper quantile : 1.930633 ns (97.5%);; Overhead used : 1.640844 ns — Reply to this email directly, view it on GitHub <#53 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AABMHXVVQZAS77X5KHAHCYDZXLQ75AVCNFSM6AAAAABKF4RTGSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGNRRGI2TGMZUGM> . You are receiving this because you were mentioned.Message ID: ***@***.***>

dco-lentz · 2024-09-23T17:41:14Z

Ok, Just an update. I've improved the handling of out-of-order timestamps and also the rigor of the tests, including multithreaded monotonicity for v1, v6, and v7 UUID's. In this version, I decreased the random initialization of the v7 subcounter to 8 bits, which should almost double the average number of UUID's per ms (and the spec reads "randomly initialized counter -- so I don't see that as necessarily requiring all of the counter bits to be used for that randomness).

Even so, the secure random aspects of v7 do have a cost.

user> (bench (uuid/v1))
Evaluation count : 599008800 in 60 samples of 9983480 calls.
             Execution time mean : 98.724581 ns

user> (bench (uuid/v6))
Evaluation count : 599782920 in 60 samples of 9996382 calls.
             Execution time mean : 98.799153 ns

user> (bench (uuid/v4))
Evaluation count : 231001080 in 60 samples of 3850018 calls.
             Execution time mean : 270.384250 ns

user> (bench (uuid/v7))
Evaluation count : 121301880 in 60 samples of 2021698 calls.
             Execution time mean : 500.258992 ns

If anyone has a time to look through, I'd appreciate any feedback. Otherwise I've got some more work to do on the documentation to get it ready for release.

dco-lentz · 2024-09-24T20:05:37Z

@cch1 note the updates in clj-uuid.clock/monotonic-unix-timestamp-and-random-counter. You should switch to this version if you're still using your original, as it adds correct handling for out-of-order system clock readings, which is an important fix.

dco-lentz · 2024-09-25T14:02:50Z

ok -- documentation has been updated and clj-uuid 0.2.0-SNAPSHOT is now available for use review on Clojars: https://clojars.org/danlentz/clj-uuid/versions/0.2.0-SNAPSHOT

Greatly appreciate all of the input (and encouragement!). I'm planning on merging and releasing 0.2.0 shortly, unless there is any feedback?

cch1 · 2024-09-26T13:39:34Z

Awesome -thanks for this work, @danlentz .

dco-lentz self-assigned this Jul 10, 2024

dco-lentz added the enhancement label Jul 10, 2024

dco-lentz linked a pull request Jul 16, 2024 that will close this issue

RFC9562 #54

Merged

3 tasks

dco-lentz changed the title ~~v7 Support~~ RFC-9562 Support Sep 23, 2024

dco-lentz mentioned this issue Sep 24, 2024

RFC9562 #54

Merged

3 tasks

dco-lentz added this to the 0.2.0 milestone Sep 24, 2024

dco-lentz pinned this issue Sep 24, 2024

danlentz closed this as completed in #54 Sep 25, 2024

danlentz mentioned this issue Sep 25, 2024

Incorrect lexical ordering #50

Closed

dco-lentz changed the title ~~RFC-9562 Support~~ v0.2.0: RFC-9562 Support Oct 15, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v0.2.0: RFC-9562 Support #53

v0.2.0: RFC-9562 Support #53

cch1 commented Jul 1, 2024

dco-lentz commented Jul 1, 2024 via email

franks42 commented Sep 2, 2024

franks42 commented Sep 5, 2024

franks42 commented Sep 6, 2024

franks42 commented Sep 8, 2024

cch1 commented Sep 12, 2024 •

edited

Loading

dco-lentz commented Sep 14, 2024

dco-lentz commented Sep 15, 2024 •

edited

Loading

dco-lentz commented Sep 15, 2024 •

edited

Loading

franks42 commented Sep 17, 2024 via email

dco-lentz commented Sep 19, 2024 •

edited

Loading

dco-lentz commented Sep 19, 2024

franks42 commented Sep 20, 2024 via email

dco-lentz commented Sep 23, 2024 •

edited

Loading

dco-lentz commented Sep 24, 2024

dco-lentz commented Sep 25, 2024

cch1 commented Sep 26, 2024

v0.2.0: RFC-9562 Support #53

v0.2.0: RFC-9562 Support #53

Comments

cch1 commented Jul 1, 2024

dco-lentz commented Jul 1, 2024 via email

franks42 commented Sep 2, 2024

franks42 commented Sep 5, 2024

franks42 commented Sep 6, 2024

franks42 commented Sep 8, 2024

cch1 commented Sep 12, 2024 • edited Loading

dco-lentz commented Sep 14, 2024

dco-lentz commented Sep 15, 2024 • edited Loading

dco-lentz commented Sep 15, 2024 • edited Loading

franks42 commented Sep 17, 2024 via email

dco-lentz commented Sep 19, 2024 • edited Loading

dco-lentz commented Sep 19, 2024

franks42 commented Sep 20, 2024 via email

dco-lentz commented Sep 23, 2024 • edited Loading

dco-lentz commented Sep 24, 2024

dco-lentz commented Sep 25, 2024

cch1 commented Sep 26, 2024

cch1 commented Sep 12, 2024 •

edited

Loading

dco-lentz commented Sep 15, 2024 •

edited

Loading

dco-lentz commented Sep 15, 2024 •

edited

Loading

dco-lentz commented Sep 19, 2024 •

edited

Loading

dco-lentz commented Sep 23, 2024 •

edited

Loading