What Are Hash Functions
A hash function is a deterministic function that takes an input of arbitrary size and produces a fixed-size output called a hash, digest, or checksum. The same input always produces the same output. In a well-designed hash function, even a tiny change to the input produces a completely different hash, a property known as the avalanche effect.
Cryptographic hash functions add several important properties to this basic idea. They are preimage-resistant (one-way): given a hash, it is computationally infeasible to find any input that produces it. They are second-preimage-resistant: given one input, it is infeasible to find a different input with the same hash. And they are collision-resistant: it is infeasible to find any two distinct inputs that produce the same hash.
These properties make cryptographic hashes invaluable across software. They are used to verify data integrity, protect stored passwords, build blockchains, sign digital documents, deduplicate data, and build data structures like Merkle trees. (General-purpose hash tables usually rely on faster non-cryptographic hashes instead.) Despite the apparent simplicity of the concept, the engineering and cryptography behind secure hash functions are deep and continually evolving.
- Deterministic: same input always yields the same output
- Avalanche effect: small input change yields totally different hash
- One-way (preimage-resistant): infeasible to recover an input from its hash
- Collision-resistant: infeasible to find two inputs with the same hash
- Second-preimage-resistant: infeasible to find a different input that matches a given input's hash
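The determinism, fixed output size, and avalanche effect described above can be observed directly. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `sha256_hex` and `differing_bits` and the sample inputs are illustrative assumptions, not part of any standard.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


first = hashlib.sha256(b"hello world").digest()
second = hashlib.sha256(b"hello worle").digest()  # only the last character changed

# Deterministic: hashing the same bytes twice gives the same digest.
print(sha256_hex(b"hello world") == sha256_hex(b"hello world"))

# Fixed size: SHA-256 yields 32 bytes for both a short and a very long input.
print(len(first), len(hashlib.sha256(b"x" * 1_000_000).digest()))

# Avalanche effect: roughly half of the 256 output bits are expected to flip.
print(differing_bits(first, second), "of 256 bits differ")
```

The exact number of differing bits depends on the inputs, but for a sound hash function it should cluster around half the output size.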
Understanding Different Hash Algorithms
Many cryptographic hash algorithms have been developed over the decades, each with different trade-offs between speed, security, and output size. Choosing the right algorithm depends on the use case.
MD5 produces a 128-bit hash and was once the dominant algorithm for checksums and digital signatures. However, serious vulnerabilities were discovered in the early 2000s — researchers demonstrated practical collision attacks, meaning it became feasible to create two distinct inputs with the same MD5 hash. MD5 is now considered cryptographically broken and should not be used for security-sensitive purposes, though it remains common for non-security tasks like file checksums.
SHA-1 produces a 160-bit hash and was widely used in TLS, Git, and other systems. Like MD5, it has been broken by practical collision attacks; Google and CWI Amsterdam demonstrated a public collision in 2017. SHA-1 should be avoided for any new security-sensitive use. Git has been migrating away from SHA-1 in favor of SHA-256.
The SHA-2 family, including SHA-256 and SHA-512, is currently recommended for general-purpose cryptographic hashing. SHA-256 produces a 256-bit hash and is used in TLS, Bitcoin, digital signatures, and countless other systems. It has no known practical attacks and is widely supported across all platforms. SHA-3 is a newer standard based on a different internal structure (Keccak sponge construction), providing diversity in case weaknesses are found in SHA-2.
- MD5: 128-bit, broken — avoid for security use
- SHA-1: 160-bit, broken — migrate away
- SHA-256: 256-bit, currently recommended for most uses
- SHA-512: 512-bit, well suited to 64-bit systems because it operates on 64-bit words
- SHA-3: newer alternative based on Keccak construction
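To compare the output sizes listed above, here is a short sketch that hashes one message with each algorithm through `hashlib.new`. The algorithm list and the sample message are assumptions chosen for illustration, and `sha3_256` stands in for the SHA-3 family. On some hardened (for example FIPS-restricted) Python builds, MD5 may be unavailable.

```python
import hashlib

# Names as understood by hashlib.new(); sha3_256 is one member of the SHA-3 family.
ALGORITHMS = ["md5", "sha1", "sha256", "sha512", "sha3_256"]


def digest_bits(name: str) -> int:
    """Return the output size of the named algorithm in bits."""
    return hashlib.new(name).digest_size * 8


message = b"The quick brown fox"
for name in ALGORITHMS:
    digest = hashlib.new(name, message).hexdigest()
    print(f"{name:9} {digest_bits(name):4d} bits  {digest}")
```

The same input yields digests of different lengths and entirely unrelated contents, which is why hashes from different algorithms can never be compared with each other.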
Use Cases for Hashing
Hashing underpins a remarkable range of functionality in modern software. Understanding the major use cases clarifies why algorithm choice matters and where each algorithm fits.
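Data integrity verification is the classic application. When distributing files, the publisher provides a hash alongside the file; the recipient recomputes the hash and compares it to detect corruption or tampering. Tampering is only detected if the expected hash comes from a trusted channel, since an attacker who can replace the file may also be able to replace a hash published next to it. Package managers like npm, PyPI, and apt use hashes to verify that downloaded packages match expected values, which helps defend against supply-chain attacks.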
Password storage is a critical use case, but raw hashing is not sufficient. Passwords should be hashed with a slow, salted algorithm specifically designed for the purpose, such as bcrypt, scrypt, or Argon2. These algorithms add a per-password salt, which defeats precomputed rainbow tables, and a deliberately high computational cost, which makes brute-force guessing far more expensive. Never store passwords with plain SHA-256, and never store them in plaintext.
Hashing is also used for content-addressable storage (Git, IPFS), deduplication systems, Bloom filters, Merkle trees in blockchains, message authentication codes (HMAC), digital signatures, and proof-of-work systems. Each use case has different security requirements, which is why no single hash algorithm fits all situations.
- File integrity verification and package checksums
- Password storage with bcrypt, scrypt, or Argon2 (not raw SHA)
- Content-addressable storage (Git, IPFS)
- Data deduplication and dedup-aware backups
- HMAC for message authentication
- Digital signatures and blockchains
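As an illustration of the file integrity use case, the sketch below recomputes a file's SHA-256 digest in fixed-size chunks and compares it with a published value. The function names `file_sha256` and `verify_file` and the 64 KiB chunk size are assumptions made for this example; the expected hash is assumed to arrive as a hex string from a trusted source.

```python
import hashlib
import hmac


def file_sha256(path: str, chunk_size: int = 64 * 1024) -> str:
    """Hash a file in chunks so large files never need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path: str, expected_hex: str) -> bool:
    """Return True if the file's SHA-256 matches the expected hex digest."""
    actual = file_sha256(path)
    # Normalise case and whitespace, then compare without early exit.
    return hmac.compare_digest(actual, expected_hex.strip().lower())
```

A call such as `verify_file("download.tar.gz", published_hash)` (both names hypothetical) returns `False` for a corrupted or modified file. The check is only as trustworthy as the channel that delivered the expected hash.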
Security Implications of Hash Collisions
A hash collision occurs when two distinct inputs produce the same hash output. Because the set of possible inputs is infinite while the set of possible hashes is finite, collisions are mathematically inevitable — the question is how hard they are to find.
For a secure hash with n bits of output, finding a collision by brute force should take roughly 2^(n/2) operations (by the birthday paradox). For SHA-256, that is about 2^128 operations, which is computationally infeasible with current or foreseeable hardware. For MD5 (128-bit output), the birthday bound is 2^64, but known cryptographic attacks bring the actual cost of finding collisions down to a few seconds on modern hardware.
Collisions matter because they undermine the guarantees that hashing is supposed to provide. If an attacker can produce two documents with the same hash, they can substitute one for the other. This is the basis of the MD5 collision attack demonstrated against X.509 certificates in 2008: researchers crafted a rogue CA certificate with the same MD5 hash as a harmless-looking certificate, had a certificate authority sign the harmless one, and then transferred that signature to the rogue certificate.
For integrity verification against a trusted, published hash, second-preimage resistance is the property you depend on; collision resistance becomes essential whenever an attacker can influence the content that gets hashed or signed. For password storage, preimage resistance matters more (an attacker with the hash tries to find any input that hashes to it). For digital signatures, both collision and preimage resistance are required. Match the algorithm to the security property you actually depend on.
- Collisions are mathematically inevitable given finite output size
- Birthday bound: ~2^(n/2) operations to find a collision
- MD5 collisions can be found in seconds — do not use for security
- SHA-256 collisions remain computationally infeasible
- Choose algorithms based on which property you depend on
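The birthday bound can be demonstrated on a deliberately weakened hash. The sketch below truncates SHA-256 to 24 bits, an assumption made purely so that a collision is cheap to find, and searches for two inputs with the same truncated digest. With n = 24, the bound predicts a collision after roughly 2^12 (about four thousand) hashes rather than 2^24. The function names are illustrative.

```python
import hashlib


def truncated_hash(data: bytes, n_bytes: int) -> bytes:
    """Return only the first n_bytes of the SHA-256 digest (a toy, weak hash)."""
    return hashlib.sha256(data).digest()[:n_bytes]


def find_collision(n_bytes: int = 3):
    """Hash the strings "0", "1", "2", ... until two share a truncated digest."""
    seen = {}  # truncated digest -> first message that produced it
    counter = 0
    while True:
        message = str(counter).encode()
        tag = truncated_hash(message, n_bytes)
        if tag in seen:
            return seen[tag], message, counter + 1
        seen[tag] = message
        counter += 1


a, b, attempts = find_collision(3)
print(f"{a!r} and {b!r} collide after {attempts} hashes (2^12 = {2**12})")
```

Each extra byte of output multiplies the expected work by about 16, which is why the same search against the full 256-bit digest is out of reach.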
Hashing Best Practices
Using hashing correctly requires attention to a few key practices that, when ignored, undermine security even with a strong algorithm.
Always use a salt when hashing passwords. A salt is a random value unique to each password, stored alongside the hash. Salting prevents attackers from precomputing tables of common password hashes (rainbow tables) and from identifying users with the same password. Generate a fresh salt of at least 16 bytes for each password using a cryptographically secure random generator.
Use a slow hashing function for passwords. Algorithms like bcrypt, scrypt, and Argon2 are deliberately designed to be expensive to compute, and scrypt and Argon2 are additionally memory-hard, which makes large-scale brute-force attacks costly. Configure the cost parameter based on your hardware and acceptable login latency, and increase it over time as hardware improves.
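The salting and slow-hashing advice above can be sketched with `hashlib.scrypt` from Python's standard library (available when Python is built against a recent OpenSSL). The parameter values in `SCRYPT_PARAMS` and the function names are assumptions for illustration only, not tuned recommendations; a production system would more likely use a maintained password-hashing library.

```python
import hashlib
import hmac
import secrets

# Illustrative cost settings (assumed values); tune them for your own hardware.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "dklen": 32}


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) for storage alongside the user record."""
    salt = secrets.token_bytes(16)  # fresh 16-byte salt from a CSPRNG
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, key


def verify_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(key, expected_key)
```

Because every call to `hash_password` draws a new salt, two users with the same password end up with different stored keys.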
For non-password hashing, prefer SHA-256 or SHA-3 unless you have a specific reason to use something else. Never use MD5 or SHA-1 for security-sensitive purposes such as digital signatures, certificate fingerprints, or integrity verification of untrusted data. For HMAC, use HMAC-SHA-256 rather than trying to build your own construction.
Finally, never invent your own hash function or cryptographic construction. Cryptography is full of subtle pitfalls that even experts struggle with. Use well-vetted, widely-implemented libraries and standards, and keep them updated as old algorithms are deprecated and new attacks are discovered.
- Always salt password hashes with a unique 16+ byte random value
- Use bcrypt, scrypt, or Argon2 for password storage
- Prefer SHA-256 or SHA-3 for general-purpose hashing
- Use HMAC-SHA-256 for message authentication
- Never roll your own cryptography
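For message authentication, a minimal HMAC-SHA-256 sketch using the standard `hmac` module looks like the following. The function names `sign` and `verify`, the sample message, and the 32-byte key length are assumptions for illustration; key distribution and storage are outside the scope of this example.

```python
import hashlib
import hmac
import secrets


def sign(key: bytes, message: bytes) -> str:
    """Return the HMAC-SHA-256 tag of message under key, as hex."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(key: bytes, message: bytes, tag_hex: str) -> bool:
    """Recompute the tag and compare it without leaking timing information."""
    return hmac.compare_digest(sign(key, message), tag_hex)


key = secrets.token_bytes(32)  # shared secret known to sender and receiver
tag = sign(key, b"amount=100&to=alice")

print(verify(key, b"amount=100&to=alice", tag))   # untouched message
print(verify(key, b"amount=900&to=alice", tag))   # tampered message
```

Unlike a bare hash, the tag cannot be recomputed by someone who alters the message without knowing the key.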
Choosing the Right Hash Algorithm
Selecting the right hash algorithm depends on what you are trying to achieve. There is no single best choice for all situations, and using the wrong algorithm can lead to either poor security or unnecessary overhead.
For password storage, use Argon2id (the recommended variant of Argon2) if your platform supports it; otherwise use bcrypt or scrypt. These algorithms are purpose-built for the threat model of password hashing: they are deliberately slow and, in the case of Argon2 and scrypt, memory-hard, which raises the cost of GPU and ASIC attacks. Tune the parameters to take roughly 250 to 500 milliseconds on your production hardware.
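One way to approach the tuning step is to time the function at increasing cost settings and keep the first setting that reaches the latency target. The sketch below does this for scrypt via `hashlib.scrypt`, used here only because it ships with Python; Argon2id would need a third-party package. The candidate values, helper names, and memory headroom factor are all assumptions.

```python
import hashlib
import os
import time


def time_scrypt(n: int, r: int = 8, p: int = 1) -> float:
    """Return the wall-clock seconds one scrypt call takes at cost n."""
    salt = os.urandom(16)
    # scrypt needs roughly 128 * n * r bytes; allow twice that as headroom.
    maxmem = 128 * n * r * 2
    start = time.perf_counter()
    hashlib.scrypt(b"benchmark-password", salt=salt, n=n, r=r, p=p,
                   maxmem=maxmem, dklen=32)
    return time.perf_counter() - start


def pick_cost(target_seconds: float,
              candidates=(2**14, 2**15, 2**16, 2**17)) -> int:
    """Return the first candidate n whose measured time meets the target."""
    for n in candidates:
        if time_scrypt(n) >= target_seconds:
            return n
    return candidates[-1]  # nothing was slow enough; use the strongest tried
```

Running `pick_cost(0.25)` on production-like hardware gives a starting point. A single timing is noisy, so averaging several runs before settling on a value is advisable.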
For file integrity verification where the source is trusted, SHA-256 is a solid default. If you need maximum security margin and performance is not critical, SHA-384 or SHA-512 from the SHA-2 family, or SHA-3, offer additional headroom. For non-security uses like hash tables or deduplication of data that no attacker controls, non-cryptographic hashes like xxHash, FNV, or CityHash are appropriate and far quicker than cryptographic algorithms.
For digital signatures and certificates, follow current standards: SHA-256 or SHA-384 with RSA or ECDSA. Avoid SHA-1 entirely for new signatures. For HMAC, HMAC-SHA-256 is the ubiquitous default and is supported everywhere.
Our Hash Generator supports MD5, SHA-1, SHA-256, SHA-384, SHA-512, and SHA-3 so you can compute and compare hashes for any of these use cases. It runs entirely in your browser, which matters when you are hashing sensitive data like file contents or API payloads. Use it to verify checksums, inspect hash outputs side by side, or learn how different algorithms produce different digests from the same input.