I am sure I am not the first to try this, but I asked ChatGPT to implement a UUID v7 function based on RFC 9562. It did not get it right the first time, but after some back and forth, it gave me this answer:

import time, random, uuid

def generate_uuid_v7_fast():
    ts = int(time.time() * 1000) & ((1 << 48) - 1)
    upper = (ts << 16) | ((7 << 12) | random.getrandbits(12))
    lower = (0b10 << 62) | random.getrandbits(62)
    return str(uuid.UUID(int=(upper << 64) | lower))

Compared to the reference implementation, it is more than twice as slow (+117% on my system). To be fair, the reference implementation returns a bytearray, while this version returns a string after a call to uuid.UUID().

When I asked to explain why its code was slower, it replied the following:

Enhancing speed: The code leverages os.urandom and uses optimal bit manipulation for timestamp and version fields. This approach boosts system performance by being fast and efficient. This implementation is faster because it directly modifies a pre-allocated 16-byte array (using optimized C-level calls via os.urandom) and uses simple bitwise operations, avoiding the overhead of multiple Python-level arithmetic, struct packing, and conversions.

It did not see that the slowness came from the str() and uuid.UUID() calls. Without those two calls, its code is actually 40% faster than the reference implementation on my system.
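For what it's worth, that overhead is easy to measure directly with timeit. Here is a rough sketch (the function names and iteration count are mine; the exact ratio will vary by machine):

```python
import random, time, timeit, uuid

def v7_int():
    # Raw 128-bit UUIDv7 integer, no object/string conversion.
    ts = int(time.time() * 1000) & ((1 << 48) - 1)
    upper = (ts << 16) | (7 << 12) | random.getrandbits(12)
    lower = (0b10 << 62) | random.getrandbits(62)
    return (upper << 64) | lower

def v7_str():
    # Same value, plus the uuid.UUID() and str() conversions.
    return str(uuid.UUID(int=v7_int()))

n = 100_000
t_int = timeit.timeit(v7_int, number=n)
t_str = timeit.timeit(v7_str, number=n)
print(f"int only: {t_int:.3f}s, with str(uuid.UUID()): {t_str:.3f}s")
```

Running this should show the conversion path dominating the cost of the string-returning version.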

Then, for fun, I asked it to add sub-millisecond precision and it gave me the following code:

import time, random, uuid

def generate_uuid_v7_fast():
    # Get current time in nanoseconds.
    t_ns = time.time_ns()
    # Derive the millisecond portion (48 bits).
    ms = t_ns // 1_000_000
    # Derive the sub-millisecond remainder.
    rem = t_ns % 1_000_000
    # Map the remainder into 12 bits (0–4095).
    sub_ms = (rem * 4096) // 1_000_000
    # Build the 64-bit "upper" part:
    #  - Upper 48 bits: ms timestamp.
    #  - Next 4 bits: version (0x7).
    #  - Last 12 bits: sub-millisecond fraction.
    upper = (ms << 16) | ((7 << 12) | sub_ms)
    # Build the 64-bit "lower" part with 2-bit variant (0b10) and 62 random bits.
    lower = (0b10 << 62) | random.getrandbits(62)
    # Combine into a 128-bit integer.
    return (upper << 64) | lower

This is even faster than the previous one. I guess this is because the first random.getrandbits() call is replaced with simple arithmetic.

Note that this implementation uses random.getrandbits() and not os.urandom(). The latter provides a better source of randomness. However, since I do not intend to use this for cryptographic purposes, random.getrandbits() will do just fine.
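If OS-level entropy were ever needed, swapping in os.urandom() is straightforward. A sketch of one way to do it (the 10-byte read and the way I split it into the 12-bit and 62-bit fields are my own choices, not from ChatGPT's code):

```python
import os, time

def generate_uuid_v7_urandom():
    # 48-bit millisecond timestamp.
    ms = int(time.time() * 1000) & ((1 << 48) - 1)
    # One 80-bit read from the OS CSPRNG; we keep 74 bits of it.
    r = int.from_bytes(os.urandom(10), "big")
    rand_a = r & 0xFFF                  # 12 bits for rand_a
    rand_b = (r >> 12) & ((1 << 62) - 1)  # 62 bits for rand_b
    upper = (ms << 16) | (7 << 12) | rand_a
    lower = (0b10 << 62) | rand_b
    return (upper << 64) | lower
```

A single urandom read per UUID keeps the syscall overhead to one call, at the cost of discarding 6 of the 80 bits.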

PS: I did not check the correctness of the algorithm, but ChatGPT is convinced it is correct so… :-)
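A minimal sanity check is cheap, though: feed the integer to uuid.UUID and confirm the version and variant fields, plus that values generated at least a millisecond apart sort in order (the check itself is mine, not ChatGPT's):

```python
import random, time, uuid

def generate_uuid_v7_fast():
    # Sub-millisecond version from above.
    t_ns = time.time_ns()
    ms = t_ns // 1_000_000
    sub_ms = ((t_ns % 1_000_000) * 4096) // 1_000_000
    upper = (ms << 16) | (7 << 12) | sub_ms
    lower = (0b10 << 62) | random.getrandbits(62)
    return (upper << 64) | lower

u = uuid.UUID(int=generate_uuid_v7_fast())
print(u.version, u.variant)  # should report version 7 and the RFC variant

# Values generated at least a millisecond apart must sort in order.
a = generate_uuid_v7_fast()
time.sleep(0.002)
b = generate_uuid_v7_fast()
assert b > a
```

This does not prove the algorithm correct, but it does confirm the bits land where RFC 9562 says they should.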