V02: Fixed Typos and added 32 to 36 bit padding clarification #21
Why padding from right? If the goal is to be able to lexicographically sort UUIDv7s then I think padding from the left would be better. Example:
If I sorted UUIDv7s from both systems lexicographically, I would get an invalid order. |
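The sorting concern above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `left_pad`, `right_pad`, and `native36` are hypothetical, the timestamp field is taken to be 36 bits (9 hex digits), and one system is assumed to hold only a 32-bit Unix timestamp while the other writes a native 36-bit value.

```python
# Hypothetical helpers: render a Unix timestamp as a 36-bit (9 hex digit) field.

def left_pad(ts32: int) -> str:
    """32-bit seconds, zero-filled on the left up to 36 bits."""
    return format(ts32, "09x")

def right_pad(ts32: int) -> str:
    """32-bit seconds followed by four zero bits (one trailing hex zero)."""
    return format(ts32 << 4, "09x")

def native36(ts: int) -> str:
    """Seconds written directly into a 36-bit field."""
    return format(ts, "09x")

# The 32-bit system emits first, the 36-bit system one second later.
t_early, t_late = 1_600_000_000, 1_600_000_001

mixed_left = sorted([native36(t_late), left_pad(t_early)])
mixed_right = sorted([native36(t_late), right_pad(t_early)])

print(mixed_left)   # earlier timestamp sorts first
print(mixed_right)  # later timestamp sorts first, so the order is wrong
```

With left padding both systems produce the same digits for the same second, so a mixed sort stays chronological. With right padding the 32-bit system's value is shifted by one hex digit and sorts after the later 36-bit value.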
Wouldn't you always get a sort error in this scenario since mixing 32-bit and 64-bit Unix Epoch introduces all kinds of differences? As for 32-bit start or end padding: That being said, I did toy around with both today while commenting on this thread: I came to the following conclusions. When using four zeros to start padding:
When using four zeros as end padding:
Python Test Code
Sample Output:
I am open to changing to the alternative as long as we are consistent, it is documented and it does not introduce more issues than it fixes. |
IMHO we should not add right padding, because RFC-4122 describes fields and operations on UUIDs in terms of integers, bits and bytes. The canonical format is just a string representation. On padding, RFC-4122 says: "Each field is treated as an integer and has its value printed as a zero-filled hexadecimal digit string with the most significant digit first." [1] Another document I read a few days ago says: "The hexadecimal representation of a byte value is the two-character string created by expressing value in hexadecimal using ASCII lower hex digits, left-padded with '0' to reach two characters." [2] [3] UUID3, UUID4, and UUID5 have a 1/16 chance of starting with a zero. If an implementation removes leading zeros after converting from integer or binary to hexadecimal, the resulting string will not be a canonical representation. I think the same applies to UUIDv7. The 4 bits represented by that leading zero will come into use after the year 2106. If we add right padding, the timestamp will not be able to represent dates after that year (although I don't expect to be here by then). The leading zero is what we see today. On January 1st, 1977, we would see 2 leading zeros ( The right-side padding recommendation creates 2 incompatible UUID7 sub-versions: one with padding and one without. If a user needs to extract the creation time from a UUID7, how do they know whether the timestamp uses right padding or not? Also, a 32-bit timestamp should be avoided due to the Year 2038 problem. |
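The leading-zero argument can be checked with a short Python sketch. It assumes a 36-bit seconds field printed as 9 zero-filled hex digits; the helper names `unixts_hex` and `leading_zeros` are made up for this illustration and do not come from the draft.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def unixts_hex(ts: int) -> str:
    """Zero-filled 9-digit hex string for a 36-bit seconds field."""
    return format(ts, "09x")

def leading_zeros(hex_str: str) -> int:
    """Count the zero digits at the start of the string."""
    return len(hex_str) - len(hex_str.lstrip("0"))

# Seconds since the epoch on 1977-01-01 00:00:00 UTC.
ts_1977 = int((datetime(1977, 1, 1, tzinfo=timezone.utc) - EPOCH).total_seconds())

# A timestamp from 2020, standing in for "today" in the comment above.
ts_2020 = 1_600_000_000

# Last second representable in an unsigned 32-bit field.
max_32 = EPOCH + timedelta(seconds=2**32 - 1)

print(unixts_hex(ts_1977), leading_zeros(unixts_hex(ts_1977)))
print(unixts_hex(ts_2020), leading_zeros(unixts_hex(ts_2020)))
print(max_32.isoformat())
```

The leading zero is simply the unused high nibble of the 36-bit integer. Once the value passes 2^32 - 1 seconds that nibble becomes non-zero, which is why left zero-filling keeps working past 2106 while a right-padded 32-bit value cannot.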
I would like to propose changing the timestamp length to 34 bits. It solves the problem with the leading zero. With a 34-bit timestamp, the maximum date is around 2514 AD. Far enough. It also leaves 2 bits free to increase entropy in the UUID7. I implemented this prototype in Java to test my proposal:

    package com.github.uuid6;

    import java.security.SecureRandom;
    import java.time.Instant;
    import java.util.UUID;

    public final class Uuid7with34bits {

        private static long counter = 0;
        private static long prevTime = 0;

        private static final int VERSION = 7;
        private static final int I64_BITS = 64;
        private static final int SEC_BITS = 34; // unixts bits
        private static final int SUBSEC_BITS = 10; // millisecond precision
        private static final int COUNTER_BITS = 16;

        private static final int SEC_SHIFT = I64_BITS - SEC_BITS; // 30
        private static final int SUBSEC_SHIFT = I64_BITS - SEC_BITS - SUBSEC_BITS; // 20

        private static final long PRECISION = 1000L;
        private static final long SUBSEC_FACTOR = (1 << SUBSEC_BITS); // 2^10 = 1024
        private static final long COUNTER_LIMIT = (1 << COUNTER_BITS); // 2^16 = 65536

        protected static final SecureRandom SECURE_RANDOM = new SecureRandom();

        public static synchronized UUID next() {

            // get the current time
            final long time = System.currentTimeMillis();

            // get seconds and sub-seconds
            final long sec = time / PRECISION;
            final long subsec = (long) (((time % PRECISION) / (double) PRECISION) * SUBSEC_FACTOR);

            // increment the counter when the time repeats
            if (time == prevTime) {
                if (++counter >= COUNTER_LIMIT) {
                    counter = 0;
                }
            } else {
                counter = 0;
            }

            // concatenate `sec` with `subsec` in the most significant bits;
            // the counter is split around the 4 version bits
            long msb = (sec << SEC_SHIFT) | (subsec << SUBSEC_SHIFT) //
                    | (counter & 0b1111_0000_0000_0000) << 4 | (counter & 0b0000_1111_1111_1111);

            // apply the version number
            msb = (msb & 0xffffffffffff0fffL) | (VERSION << 12);

            // get random least significant bits
            long lsb = SECURE_RANDOM.nextLong();

            // set the variant number
            lsb = (lsb & 0x3fffffffffffffffL) | 0x8000000000000000L;

            // save the time
            prevTime = time;

            return new UUID(msb, lsb);
        }

        public static void main(String[] args) {

            System.out.println("List of UUID7 with 34 bits time:\n");
            for (int i = 0; i < 10; i++) {
                UUID uuid = Uuid7with34bits.next();
                System.out.println(uuid);
            }

            final long maximumUNIXTS = 0x3ffffffffL; // all 34 bits set to 1
            System.out.println("\nMaximum date with 34 bits:\n");
            System.out.println(Instant.ofEpochSecond(maximumUNIXTS));
            System.out.println(Instant.ofEpochSecond(maximumUNIXTS >> 1) + " (signed)");
        }
    }

Output:
|
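To make the proposed 34-bit layout easier to inspect, here is a rough Python sketch of how the most significant 64 bits could be packed and unpacked. It mirrors the bit positions in the Java prototype above (34-bit seconds, 10-bit sub-second fraction, 16-bit counter split around the version nibble), but `pack_msb` and `unpack_msb` are hypothetical names, and the sub-second fraction is computed with integer arithmetic rather than the prototype's floating-point expression.

```python
from datetime import datetime, timedelta, timezone

SEC_BITS = 34
SUBSEC_BITS = 10
SEC_SHIFT = 64 - SEC_BITS               # 30
SUBSEC_SHIFT = SEC_SHIFT - SUBSEC_BITS  # 20
VERSION = 7

def pack_msb(unix_ms: int, counter: int) -> int:
    """Build the upper 64 bits from a millisecond timestamp and a 16-bit counter."""
    sec, ms = divmod(unix_ms, 1000)
    subsec = ms * (1 << SUBSEC_BITS) // 1000  # scale 0..999 ms to 0..1023
    msb = (sec << SEC_SHIFT) | (subsec << SUBSEC_SHIFT)
    # Top 4 counter bits sit above the version nibble, low 12 bits below it.
    msb |= ((counter & 0xF000) << 4) | (counter & 0x0FFF)
    return msb | (VERSION << 12)

def unpack_msb(msb: int):
    """Recover (seconds, subsec, counter, version) from the upper 64 bits."""
    sec = msb >> SEC_SHIFT
    subsec = (msb >> SUBSEC_SHIFT) & ((1 << SUBSEC_BITS) - 1)
    counter = ((msb >> 4) & 0xF000) | (msb & 0x0FFF)
    version = (msb >> 12) & 0xF
    return sec, subsec, counter, version

msb = pack_msb(1_600_000_000_500, 0xABCD)
print(unpack_msb(msb))

# Largest second that fits in 34 bits, as a UTC date.
max_34 = datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(seconds=2**34 - 1)
print(max_34.isoformat())
```

Because the seconds occupy a fixed position at the top of the integer, a reader who knows the layout can recover the creation time with a single shift, which is the extraction concern raised earlier in the thread.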
This one is also really important.
If they represent seconds in both cases then it does not matter until 2038.
Time in programming is always problematic. I cannot provide any reference and from my experience it's all over the place. I can provide some examples in programming languages I'm familiar with. Precision in bits:
JavaScript
PHP
Go
Edit: Removed fractional where I meant exponent. Edit 2: Fixed metric prefixes. Mixed up microseconds and milliseconds. |
@fabiolimace @nerg4l good points all around. I also went through some old email exchanged between Brad and me, and his original goal aligns with yours. The fault here is all mine for reading this wrong. I just updated the timestamp section to appropriately convey this. (Note: I am not sure why the commit doesn't show up in this PR, since it is on the same branch as the original PR. If it doesn't show up I will open another PR.) Brad's Quotes in my inbox:
I also fixed #22 along with a forward reference on using UUIDv8 if an implementation/application requires an untruncated 64 bit Unix timestamp vs UUIDv7's 36-bit variant. @fabiolimace Let's continue to discuss 34 vs 36 for |
@kyzer-davis I can confirm, your latest commit is not visible. |
Yeah, let me close this PR and re-pull. I am not sure what happened :/ |
Updated V02 .xml (use for diff), .txt, and .html file