UUID Generator

Generate RFC 4122-compliant UUIDs (v4), one at a time or in batches.

What is a UUID?

A UUID (Universally Unique Identifier), also known as a GUID (Globally Unique Identifier) in Microsoft ecosystems, is a 128-bit identifier designed to be unique across both space and time without requiring a central coordinating authority. The identifier is typically represented as 32 hexadecimal digits displayed in five groups separated by hyphens, in the pattern 8-4-4-4-12, for a total of 36 characters including hyphens.

The concept originated in the Open Software Foundation's Distributed Computing Environment (DCE) and was later codified as RFC 4122, which RFC 9562 superseded in 2024. The fundamental promise of UUIDs is that collisions (two independently generated UUIDs being identical) are so improbable that they can be treated as impossible in practice. This property lets distributed systems generate identifiers without coordination, a critical requirement for modern web-scale applications.

UUIDs solve a deceptively difficult problem: how do thousands of servers, devices, or users generate unique identifiers without talking to each other? Traditional approaches rely on a central authority handing out sequential numbers, but this creates a bottleneck and a single point of failure. UUIDs eliminate that bottleneck by trading sequential readability for uniqueness that is not strictly guaranteed but is overwhelmingly probable.

  • 128 bits of data, typically shown as 36 characters with hyphens
  • No central authority required for generation
  • Collision probability so low it is considered practically impossible
  • Standardized in RFC 4122 with multiple defined versions
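
The textual layout described above can be checked with Python's standard uuid module. This is a minimal sketch; the UUID string is an arbitrary example value, not one taken from this page.

```python
import uuid

# Arbitrary example value chosen for illustration.
text = "123e4567-e89b-12d3-a456-426614174000"
u = uuid.UUID(text)

group_lengths = [len(g) for g in text.split("-")]
print(group_lengths)     # lengths of the five hyphen-separated groups
print(len(text))         # characters including hyphens
print(len(u.hex))        # hexadecimal digits without hyphens
print(len(u.bytes) * 8)  # size of the underlying value in bits
```

The parsed object keeps only the 128-bit value; the hyphens are purely a display convention.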

The Different Versions of UUIDs

Not all UUIDs are created equal. The specification defines multiple versions, each using a different strategy to generate the 128 bits. Understanding the differences is essential for choosing the right version for your use case.

UUID v1 is time-based. It combines the current timestamp with the MAC address of the generating machine's network card. This embeds the creation time in the identifier and reveals the generating host's MAC address, which is a privacy concern in modern systems. Because v1 places the low-order timestamp bits first, v1 values do not sort chronologically when compared as strings or bytes. UUID v2 is a lesser-used variant reserved for DCE Security with POSIX UIDs; it is rarely encountered in practice.

UUID v3 and v5 are name-based. They generate a UUID by hashing a namespace UUID and a name string using MD5 (v3) or SHA-1 (v5). Given the same namespace and name, they always produce the same UUID, which is useful for deterministic identifier generation. UUID v4 is by far the most popular: it uses random numbers for all 122 variable bits, making collisions essentially impossible and providing no metadata about the generating source.

  • v1: Time-based, uses MAC address (privacy concern)
  • v2: DCE Security version with POSIX UID (rarely used)
  • v3: Name-based with MD5 hashing (deterministic)
  • v4: Random (most common, recommended for most uses)
  • v5: Name-based with SHA-1 hashing (preferred over v3)
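
The difference between name-based and random versions can be seen with Python's standard uuid module. In this small sketch, the name "example.com" and the choice of the DNS namespace are illustrative assumptions.

```python
import uuid

# Name-based: the same namespace and name always yield the same UUID.
a = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")
b = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")
c = uuid.uuid3(uuid.NAMESPACE_DNS, "example.com")  # MD5 instead of SHA-1

# Random: two calls give two unrelated values.
r1 = uuid.uuid4()
r2 = uuid.uuid4()

print(a == b)                            # deterministic
print(a == c)                            # different hash, different UUID
print(a.version, c.version, r1.version)  # version number stored in the UUID
print(r1 == r2)
```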

How UUID v4 Works Under the Hood

UUID v4 is the workhorse of modern applications because of its simplicity and strong uniqueness properties. Of the 128 bits in a v4 UUID, 6 bits are reserved to identify the version (4 bits) and variant (2 bits), leaving 122 bits of randomness. This means there are 2^122 (about 5.3 × 10^36) possible v4 UUIDs, a number so vast that generating a collision by chance is functionally impossible at realistic volumes.

To put this in perspective: if you generated one billion UUIDs every second for an entire year (about 3.2 × 10^16 values), the birthday-problem approximation puts the probability of even a single collision at roughly 1 in 10,000. At the volumes real applications reach, the probability is far smaller still, and that is what gives developers the confidence to use v4 UUIDs as primary keys in databases and distributed systems without a uniqueness-checking step.
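
The estimate can be reproduced with the standard birthday-problem approximation, p ≈ 1 - exp(-n(n-1) / (2 · 2^122)). The sketch below assumes a perfectly uniform random source; the function name collision_probability is hypothetical.

```python
import math

def collision_probability(n: int, random_bits: int = 122) -> float:
    """Birthday approximation for n uniformly random values of random_bits bits."""
    space = 2 ** random_bits
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-n * (n - 1) / (2 * space))

per_second = 10**9
seconds_per_year = 365 * 24 * 60 * 60
n = per_second * seconds_per_year  # one billion per second for one year

p = collision_probability(n)
print(f"{n:.3e} UUIDs, collision probability about {p:.2e}")
```

For this workload the result is on the order of 1e-4: small, but not literally zero. Scaling n down to a few billion identifiers drives the probability many orders of magnitude lower.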

The implementation of v4 generation relies on a cryptographically secure random number generator, or at least a high-quality pseudo-random source. Most programming languages provide built-in support: browsers and Node.js both expose crypto.randomUUID(), the third-party uuid package on npm covers other UUID versions and older runtimes, Python provides uuid.uuid4(), and most other languages have similar utilities. The version nibble is set to 4 and the variant bits are set to indicate the RFC 4122 layout, with the remaining bits filled by random data.
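
The bit manipulation described above can be sketched by hand in Python. In practice uuid.uuid4() already does this; the helper name make_uuid4 is hypothetical and exists only to show where the version and variant bits go.

```python
import secrets
import uuid

def make_uuid4() -> uuid.UUID:
    """Build a v4 UUID manually: 16 random bytes with version and variant bits fixed."""
    raw = bytearray(secrets.token_bytes(16))  # 128 bits from the OS CSPRNG
    raw[6] = (raw[6] & 0x0F) | 0x40           # high nibble of byte 6 = version 4
    raw[8] = (raw[8] & 0x3F) | 0x80           # top two bits of byte 8 = 10 (RFC 4122 variant)
    return uuid.UUID(bytes=bytes(raw))

u = make_uuid4()
print(u)
print(u.version, u.variant == uuid.RFC_4122)
```

In the text form, the version shows up as the first digit of the third group, and the variant constrains the first digit of the fourth group to 8, 9, a, or b.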

UUID Use Cases and Best Practices

UUIDs appear in countless places in modern software. They are commonly used as primary keys in databases, especially in distributed systems where multiple nodes insert records simultaneously. They serve as correlation IDs in microservices architectures, allowing a single logical request to be traced across many services. They identify devices, sessions, API tokens, uploaded files, and countless other entities where uniqueness without coordination is required.

When using UUIDs as database primary keys, store them in their native 16-byte binary format rather than as 36-character strings. Storing as strings wastes space, slows down indexes, and bloats foreign-key relationships. Most databases have a native UUID type or a binary type that can hold the 128 bits efficiently. If you must store as strings, consider normalizing the case and removing hyphens to reduce storage.
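
The size difference between the representations is easy to see in Python. This sketch only shows the conversions; how the 16 bytes are bound to a column depends on the database driver and is not covered here.

```python
import uuid

u = uuid.uuid4()

as_text = str(u)    # canonical form with hyphens
as_hex = u.hex      # lowercase hex digits, hyphens removed
as_bytes = u.bytes  # raw value, suitable for a binary or native UUID column

print(len(as_text), len(as_hex), len(as_bytes))

# All three forms round-trip to the same value, and parsing normalizes case.
from_bytes = uuid.UUID(bytes=as_bytes)
from_upper = uuid.UUID(as_text.upper())
print(from_bytes == u, from_upper == u)
```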

Be mindful of exposing UUIDs in URLs and public APIs. While UUIDs are not intended to be secret, using them as the only authorization mechanism is dangerous — an attacker who guesses or obtains a UUID could access resources they should not. Always pair UUIDs with proper authorization checks rather than treating unguessability as a security control.

  • Database primary keys in distributed or sharded systems
  • Correlation IDs for tracing requests across microservices
  • Session tokens, device identifiers, and anonymous user tracking
  • File names for user-uploaded content to avoid collisions
  • Event identifiers in event-sourced and CQRS architectures

UUID vs Auto-Increment IDs

The choice between UUIDs and auto-incrementing integers is one of the most debated topics in database design. Each approach has clear trade-offs that make it better suited to different scenarios. Auto-increment IDs are simple, compact (typically 4 or 8 bytes), naturally sortable, and human-readable. They work well for single-server databases and small-to-medium applications where sequential numbering is acceptable.

UUIDs shine in distributed environments where multiple nodes need to generate IDs without coordination. With auto-increment, you need a central authority to assign numbers, which becomes a bottleneck and a single point of failure. UUIDs also enable clients to generate IDs before saving, which simplifies offline-first applications and reduces round trips to the server. They reveal nothing about your database size or growth rate, whereas auto-increment IDs leak competitive intelligence — competitors can simply monitor your order numbers to estimate your sales volume.

The downsides of UUIDs are well documented. They are larger (16 bytes vs 4-8), which increases storage and index size. They are not naturally sortable, making time-ordered queries less efficient unless you use a time-ordered variant. They are less human-friendly in URLs and logs. Random UUIDs also cause index fragmentation in B-tree databases because inserts are scattered rather than sequential. Newer variants like UUID v7 and ULID address these issues by combining a timestamp with random bits, providing both uniqueness and sortability.

  • Auto-increment: compact, sortable, simple, leaks growth metrics
  • UUID: large, uncoordinated, privacy-preserving, distributed-friendly
  • UUID v7 / ULID: time-ordered, sortable, combines benefits of both
  • Consider sharding strategy and growth patterns when choosing

Performance Considerations with UUIDs

While UUIDs are mathematically elegant, they introduce real performance considerations that every architect should understand. The most significant impact is on database indexing. Random UUIDs (v4) cause B-tree index fragmentation because each insert goes to a random position rather than appending to the end. This leads to more page splits, larger indexes, and slower queries on large tables. Over time, a heavily-written UUID-keyed table can become noticeably slower than an equivalent integer-keyed table.
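
The scattering effect can be illustrated with a toy model. The sketch below uses a sorted Python list as a stand-in for a B-tree index (a simplifying assumption; real indexes work on pages, not single keys) and measures how often a new key lands at the very end. The function name fraction_appended is hypothetical.

```python
import bisect
import uuid

def fraction_appended(keys) -> float:
    """Insert keys in order into a sorted list; return the share that landed at the end."""
    index = []
    appended = 0
    for key in keys:
        pos = bisect.bisect_right(index, key)
        if pos == len(index):
            appended += 1  # append-only insert, no existing entries shifted
        index.insert(pos, key)
    return appended / len(keys)

n = 2000
sequential_keys = [i.to_bytes(8, "big") for i in range(n)]  # auto-increment style
random_keys = [uuid.uuid4().bytes for _ in range(n)]        # v4 style

seq_share = fraction_appended(sequential_keys)
rand_share = fraction_appended(random_keys)
print(f"sequential keys appended at the end: {seq_share:.1%}")
print(f"random v4 keys appended at the end:  {rand_share:.1%}")
```

Sequential keys always append, while random keys almost never do, which is the behavior that leads to page splits in a real index.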

Storage size compounds the problem. A UUID stored as a string is 36 bytes, compared to 4 bytes for a 32-bit integer or 8 bytes for a 64-bit integer. This 4.5x to 9x size increase affects not just the primary key column but every foreign-key column that references it. Indexes grow proportionally, caching becomes less effective, and I/O increases. The remedy is straightforward: use the native binary UUID type your database provides, which stores the value in 16 bytes.

For new systems where sortability matters, consider time-ordered identifiers such as UUID v7, ULID, or KSUID. These encode a timestamp in the high-order bits, so inserts append to the end of indexes rather than scattering randomly. They preserve the uncoordinated-generation property of UUIDs while delivering index performance closer to auto-increment integers. For existing systems already committed to v4 UUIDs, periodic index maintenance (rebuilding or reorganizing) can mitigate fragmentation, and partitioning by time ranges can keep working sets manageable.
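
A time-ordered identifier of this kind can be sketched as follows, using the UUID v7 field layout from RFC 9562: a 48-bit Unix timestamp in milliseconds, the 4-bit version, 12 random bits, the 2-bit variant, and 62 more random bits. The helper make_uuid7 is hypothetical and omits the optional monotonic counter, so two values created in the same millisecond are not ordered relative to each other.

```python
import secrets
import time
import uuid
from typing import Optional

def make_uuid7(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Sketch of a UUID v7: millisecond timestamp in the high bits, random bits below."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(secrets.token_bytes(10), "big")  # 80 random bits
    rand_a = rand >> 68                # top 12 random bits
    rand_b = rand & ((1 << 62) - 1)    # low 62 random bits

    value = (unix_ms & ((1 << 48) - 1)) << 80  # bits 127..80: timestamp
    value |= 0x7 << 76                         # bits 79..76: version 7
    value |= rand_a << 64                      # bits 75..64: random
    value |= 0b10 << 62                        # bits 63..62: variant
    value |= rand_b                            # bits 61..0: random
    return uuid.UUID(int=value)

earlier = make_uuid7(1_700_000_000_000)
later = make_uuid7(1_700_000_000_001)
print(earlier)
print(later)
print(earlier < later, str(earlier) < str(later))
```

Because the timestamp occupies the most significant bits, values from later milliseconds compare greater both as integers and as strings, so new keys cluster at the end of an index.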

  • Use native 16-byte UUID storage rather than 36-character strings
  • Random v4 UUIDs cause B-tree index fragmentation over time
  • Consider UUID v7 or ULID for time-ordered, index-friendly keys
  • Plan index maintenance and partitioning for large UUID-keyed tables