Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v0.2.0: RFC-9562 Support #53

Closed
cch1 opened this issue Jul 1, 2024 · 17 comments · Fixed by #54
Closed

v0.2.0: RFC-9562 Support #53

cch1 opened this issue Jul 1, 2024 · 17 comments · Fixed by #54
Assignees
Milestone

Comments

@cch1
Copy link

cch1 commented Jul 1, 2024

This library has worked really well for us generating deterministic (v5) UUIDs. We also would benefit from using the new v7 UUIDs and would prefer to centralize UUID support in clj-uuid. Is there any plan to add support for UUID Version 7?

@dco-lentz
Copy link
Collaborator

dco-lentz commented Jul 1, 2024 via email

@dco-lentz dco-lentz self-assigned this Jul 10, 2024
@dco-lentz dco-lentz linked a pull request Jul 16, 2024 that will close this issue
3 tasks
@franks42
Copy link

franks42 commented Sep 2, 2024

Waiting for v7 support - I can see the code changes in branch "53-v7-support" from last July - any release date planned?
Or maybe tag that addition such that we can pull it out of github to test/use it more easily?

@franks42
Copy link

franks42 commented Sep 5, 2024

This new RFC 9562 section 6.2 has a very cryptic prescription of the v7 implementation, which will yield many different implementations with different "guarantees" about the monotonicity and the associated lexicographical order of the generated uuidv7's - a recipe for many future obscure bugs!
I've looked at different clojure and java implementations of uuidv7, and yours is the first one that seems to guarantee that every newly generated uuidv7 compares "positive" (lexicographically speaking) to the previous one. (it took me awhile to notice that if you ask for uuid's within a millisecond you could get burned by the loss of monotonicity in some implementations...)
If I understand your code well, then the "monotonic-unix-time-and-random-counter" function maintains the millisecond timer and the counter of the previously generated uuid in an atom, increments the counter when a new uuid has the be generated within that same millisecond, and when you run out of counter "runway", you loop-recur until a new millisecond has arrived... is that correct?
Anyway... hat-tip to the coder who implemented this for clj-uuid!
Btw, you should document this monotonicity guarantee loudly, in bold and all caps, as folks will not realize what sets this implementation apart!!!
Let me know if I can help with the release in any way.

@franks42
Copy link

franks42 commented Sep 6, 2024

A few more observations and suggestions:

The counter can run from 0 to 4095 but is initialized to a random start value between those values to add a little more random and to obfuscate correlations between uuid-values from the same generator for privacy reasons.
The time it takes to generate one uuidv7 is about 0.5 microsec for the bulk-generation of 1 million uuids on my MBP M1 max. Or within one millisecond, it could generate about 2000 uuids. Probably my laptop has an average CPU, the generation can probably be more optimized, but it seems to be in about the order of the maximum number of uuids that can be generated (roughly 4000) within one millisecond before the counter reaches its maximum.

When the counter runs out, the generator will spin in a loop-recur till the next millisecond arrives. This should be avoided as much as possible as that cpu-core will show 100% utilization just spinning for that duration.
For normal operation, you will hardly ever hit the counter limit... when you reinitialize the counter to 0.
However, when you randomly initialize the counter, then statistically you will "often" start with a high counter value such that the counter limit will be reach "easily", and the generator will end up in this loop-recur - worst case is spinning for the full one millisecond.

One option to limit the chance to loop-recur it to reinitialize the counter to 0 instead of randomly between 0-4095. In that way you will always be able to generate the maximum uuids within a millisecond and limit the chance to run out the counter. IMHO the loss of privacy and random is not substantial ... also the rfc is not very prescriptive about this...

Another option is to sleep for a number of nanoseconds instead of spinning in that loop-recur. You could use a java.time.Instant that has a nanosecond precision (although on my Mac it yields only relevant microseconds), and call Thread/sleep with the number of nanoseconds till the next millisecond arrives. This kind of works in my repl... but don't want to spend more time investigating unless it becomes more relevant.

@franks42
Copy link

franks42 commented Sep 8, 2024

After reading some of the discussions about the other java based implementations of v7, I don't think that the limitations are clear about this guaranteed lexical monotonicity. As far as I understand, you have to maintain state in order to generate the monotonic uuids. When you randomize the 12-bit after the version number, then you can have only one uuid per ms and if an implementation generates more than one uuid per ms, then they essentially force the client to check and discard uuids when monotonicity is a hard requirement.
When you use the 12-bit for an extension of the clock to microseconds, you can have 1000 uuids per ms and could use 2 bits for a counter per microsecond.
When you use the full 12-bits for a counter you can generate a maximum of 4096 uuids per millisecond, which is the maximum number possible within the v7 scheme where you have an additional 12-bits to play with. In other words a uuid-generator can only guarantee a maximum of 4096 uuids per millisecond, or about 4 million per second - no matter how much cpu power you may have. Furthermore the monotonicity can only be guaranteed by a single single-threaded v7-generator. Hopefully my understanding here is correct...

I've tested the previously suggested code changes about using a nanosecond timer to avoid spinning in a loop-recur, and the following seems to work ok (?).
Please read the code as a suggestion - it should be a replacement of the monotonic-unix-time-and-random-counter - just trying to help:

(let [-state- (atom (->State 0 0))]
  (defn monotonic-unix-time-and-counter []
    (let [^State new-state
          (swap! -state-
                 (fn [^State current-state]
                   (loop [time-now (java.time.Instant/now)]
                     (let [time-now-epoch-millis (.toEpochMilli time-now)
                           nanos (.getNano time-now)
                           nanos-till-ms (min 999999 (- 1000000 (rem nanos 1000000)))]
                       (if-not (= (.millis current-state) time-now-epoch-millis)
                         (->State 0 time-now-epoch-millis)
                         (let [tt (.seqid current-state)]
                           (if (< tt +random-counter-resolution+) 
                             (->State (inc tt) time-now-epoch-millis)
                             (do ;; recur when counter is out of runway - sleep until new millisecond
                               (java.lang.Thread/sleep 0 nanos-till-ms)
                               (recur (java.time.Instant/now))))))))))]
      [(.millis new-state) (.seqid new-state)])))

@cch1
Copy link
Author

cch1 commented Sep 12, 2024

UUID v7 support is anxiously anticipated for use with Datomic! https://clojurians.slack.com/archives/C03RZMDSH/p1725976130910969

@dco-lentz
Copy link
Collaborator

ok -- sorry I was out of office. Lets see if we can get this done quickly

@dco-lentz
Copy link
Collaborator

dco-lentz commented Sep 15, 2024

@franks42 this is pretty cool. I was also not feeling comfortable about not using all of the available bits and initializing to random number. This makes sense to me -- does anyone disagree? "Monotonicity can only be guaranteed by a single thread UUID generator" seems like a big caveat -- is it actually better to use loop/recur if it provides a monotonicity guarantee?

@dco-lentz
Copy link
Collaborator

dco-lentz commented Sep 15, 2024

My thought is "correctness first". If everyone agrees that the current implementation is correct and continues to provide a monoticity guarantee?

@franks42
Copy link

franks42 commented Sep 17, 2024 via email

@dco-lentz
Copy link
Collaborator

dco-lentz commented Sep 19, 2024

@franks42 and others. yes reading though chapter 6 again, but, also, yes -- I can locally reproduce the monotonicity failures in the v7 concurrency tests. I have not yet seen v6 fail in that same way.

edit -- nvm -- as soon as i said that I hit one in another window.

@dco-lentz
Copy link
Collaborator

@franks42 i did a criterium benchmark on monotonic-unix-time-and-counter and found that (at least in a single threaded example) it didn't perform as well. would you have expected that to be the case?

(require '[criterium.core :as criterium :refer [bench]])

(bench (and (repeatedly 100000000 monotonic-unix-time-and-random-counter) nil))

;; Evaluation count : 28688470740 in 60 samples of 478141179 calls.
;;              Execution time mean : 0.414950 ns
;;     Execution time std-deviation : 0.044658 ns
;;    Execution time lower quantile : 0.379115 ns ( 2.5%)
;;    Execution time upper quantile : 0.501328 ns (97.5%)
;;                    Overhead used : 1.640844 ns



(bench (and (repeatedly 100000000 monotonic-unix-time-and-counter) nil))

;; Evaluation count : 17791729260 in 60 samples of 296528821 calls.
;;              Execution time mean : 1.794180 ns
;;     Execution time std-deviation : 0.074041 ns
;;    Execution time lower quantile : 1.730718 ns ( 2.5%)
;;    Execution time upper quantile : 1.930633 ns (97.5%)
;;                    Overhead used : 1.640844 ns

@franks42
Copy link

franks42 commented Sep 20, 2024 via email

@dco-lentz
Copy link
Collaborator

dco-lentz commented Sep 23, 2024

Ok, Just an update. I've improved the handling of out-of-order timestamps and also the rigor of the tests, including multithreaded monotonicity for v1, v6, and v7 UUID's. In this version, I decreased the random initialization of the v7 subcounter to 8 bits, which should almost double the average number of UUID's per ms (and the spec reads "randomly initialized counter -- so I don't see that as necessarily requiring all of the counter bits to be used for that randomness).

Even so, the secure random aspects of v7 do have a cost.

user> (bench (uuid/v1))
Evaluation count : 599008800 in 60 samples of 9983480 calls.
             Execution time mean : 98.724581 ns

user> (bench (uuid/v6))
Evaluation count : 599782920 in 60 samples of 9996382 calls.
             Execution time mean : 98.799153 ns

user> (bench (uuid/v4))
Evaluation count : 231001080 in 60 samples of 3850018 calls.
             Execution time mean : 270.384250 ns

user> (bench (uuid/v7))
Evaluation count : 121301880 in 60 samples of 2021698 calls.
             Execution time mean : 500.258992 ns

If anyone has a time to look through, I'd appreciate any feedback. Otherwise I've got some more work to do on the documentation to get it ready for release.

@dco-lentz dco-lentz changed the title v7 Support RFC-9562 Support Sep 23, 2024
@dco-lentz dco-lentz mentioned this issue Sep 24, 2024
3 tasks
@dco-lentz dco-lentz added this to the 0.2.0 milestone Sep 24, 2024
@dco-lentz dco-lentz pinned this issue Sep 24, 2024
@dco-lentz
Copy link
Collaborator

@cch1 note the updates in clj-uuid.clock/monotonic-unix-timestamp-and-random-counter. You should switch to this version if you're still using your original, as it adds correct handling for out-of-order system clock readings, which is an important fix.

@dco-lentz
Copy link
Collaborator

ok -- documentation has been updated and clj-uuid 0.2.0-SNAPSHOT is now available for use review on Clojars: https://clojars.org/danlentz/clj-uuid/versions/0.2.0-SNAPSHOT

Greatly appreciate all of the input (and encouragement!). I'm planning on merging and releasing 0.2.0 shortly, unless there is any feedback?

@cch1
Copy link
Author

cch1 commented Sep 26, 2024

Awesome -thanks for this work, @danlentz .

@dco-lentz dco-lentz changed the title RFC-9562 Support v0.2.0: RFC-9562 Support Oct 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants