What are the ingredients of effective cybersecurity? As with most tech verticals, non-techies charged with researching the best options will discover a world of jargon and specialized terms. Encryption and hashing come up a lot. At a glance, they sound similar—both are methods of scrambling data, so prying eyes can’t make sense of it.
But what do they actually do? Are they the same thing? Two different approaches to the same task? Do you need one, or the other, or both? And by the way, how did salting become involved? Are we protecting data, or cooking breakfast?
Let’s cut through the confusion and discuss how encryption, hashing, and salting are different, and how they relate to each other.
TL;DR: Encryption is a reversible process, whereas hashed data cannot be decrypted. Salting is a method to make hashing more secure.
How does encryption work?
When the data passes through an encryption algorithm, it gets scrambled into a version that is illegible to human eyes and resistant to computerized reverse-engineering.
The original, legible input data is sometimes referred to as plaintext. The scrambled code created from the plaintext by the encryption algorithm is called ciphertext. Unencrypted (and therefore unprotected) data is sometimes also called cleartext.
Put another way, plaintext gets input into an encryption algorithm, either for storage or transmission, and is turned into ciphertext. When an authorized user or recipient tries to open the file, either in its original stored location (e.g. an encrypted hard drive or cloud drive) or at its destination (e.g. an encrypted email inbox or text messaging platform), the ciphertext is converted back to the original plaintext.
How does one gain authorized access to the encrypted data? With a key. A digital encryption key is a code that reverses the encryption, rendering the ciphertext back into plaintext. Some encryption algorithms use one key and others use two, with keys distributed in various ways to add extra layers of security.
Whatever the situation, the important thing to understand is that encryption is a reversible process.
You can decrypt encrypted data using the right key. This is necessary, since encrypted data often has to return to a form that human eyes can read, like an email or text message.
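The round trip from plaintext to ciphertext and back can be sketched with a toy cipher. This is only an illustration of reversibility, not a secure algorithm: the names `keystream`, `toy_encrypt`, and `toy_decrypt`, and the choice of SHA-256 to stretch the key, are assumptions made for the example.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Stretch the key into `length` pseudo-random bytes (toy construction)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """XOR each plaintext byte with the keystream to produce ciphertext."""
    stream = keystream(key, len(plaintext))
    return bytes(p ^ s for p, s in zip(plaintext, stream))


def toy_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """XOR is its own inverse, so decryption repeats the same operation."""
    return toy_encrypt(key, ciphertext)


key = b"correct horse battery staple"
message = b"Meet me at noon"

ciphertext = toy_encrypt(key, message)                   # unreadable without the key
recovered = toy_decrypt(key, ciphertext)                 # right key: original text
garbage = toy_decrypt(b"some other key", ciphertext)     # wrong key: still scrambled
```

With the right key the original message comes back unchanged; with any other key the output stays scrambled. Real systems would use a vetted algorithm such as AES rather than a hand-rolled construction like this one.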
That said, the fact that a key exists that can unlock the door of encryption is an inherent vulnerability. Naturally, if an attacker can obtain the key, nothing can stop him/her from accessing the encrypted data.
Encryption predates digital computing. Coded messages have been used to protect sensitive information from enemy or unauthorized eyes since ancient times. They even worked the same way — people used algorithms to encode messages, and keys to decode them. Without computers, the algorithm and the key were limited in complexity to what a human brain could comprehend, memorize or write down and protect.
Mechanical encryption devices enabled the creation of much more complicated algorithms. One of the most famous encryption devices, the Enigma Machine, was used by the Nazis in World War II to devastating effect. Cracking the Enigma Machine using even more complex computing algorithms, designed by English mathematician Alan Turing and colleagues working for British Intelligence, is not only considered a turning point in the war, but also the birth of modern computing.
Today, computerized encryption algorithms can make ciphertext impossible for a human mind to decode. However, cybercriminals are in a constant arms race with cybersecurity thought leaders to circumvent encryption.
Types of encryption algorithms include:
- Asymmetric encryption algorithms
- Symmetric encryption algorithms
- Deterministic encryption algorithms
- Probabilistic encryption algorithms
Some of these types of encryption cross paths with each other; each has different uses and weaknesses.
Symmetric encryption algorithms
Symmetric encryption algorithms use only one secret key to both encrypt and decrypt the data. If the data is transmitted from one party to another, they must somehow come into possession of the same key.
Symmetric encryption algorithms come in two forms:
- Block ciphers. They encrypt bits in discrete blocks of equivalent sizes. During the encryption process, the algorithm holds data in its memory until enough data is acquired to create a block.
- Stream ciphers. Data is encrypted as it arrives and is not stored in memory.
List of encryption algorithms that use symmetric keys:
- AES (Advanced Encryption Standard)
- DES (Data Encryption Standard)
- IDEA (International Data Encryption Algorithm)
- Blowfish (Drop-in replacement for DES or IDEA)
- RC4 (Rivest Cipher 4)
- RC5 (Rivest Cipher 5)
- RC6 (Rivest Cipher 6)
Every use of the key “leaks” some information about the key.
This is an inherent vulnerability in symmetric encryption: attackers who gather enough leaked information may be able to reconstruct the key. A key may become “exhausted” by excessive use and have to be retired and replaced with a new one.
Asymmetric encryption algorithms
Asymmetric encryption algorithms use two keys, not one: a public key and a private key.
The private key is kept secret, like the key in a symmetric encryption algorithm. Unlike in symmetric encryption, however, that private key never needs to be transmitted or shared, making it inherently easier to protect.
Instead, the sender encrypts the data using the recipient’s public key, which is often distributed in a security certificate, and the recipient decrypts it with the matching private key, access to which may be gated by an identity check.
Some encryption algorithms that use asymmetric keys:
- Elliptic-curve algorithms
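The public/private key split can be sketched with “textbook” RSA and deliberately tiny numbers. The primes, the exponent, and the helper names `rsa_encrypt` and `rsa_decrypt` are assumptions chosen for readability. Real keys are thousands of bits long and use random padding, so treat this as an illustration only.

```python
from typing import Tuple

# Textbook RSA with tiny primes: for illustration only, never for real data.
p, q = 61, 53
n = p * q                      # public modulus (3233)
phi = (p - 1) * (q - 1)        # 3120, used only while generating the keys
e = 17                         # public exponent, coprime with phi
d = pow(e, -1, phi)            # private exponent: modular inverse (Python 3.8+)

public_key = (e, n)            # can be shared with anyone
private_key = (d, n)           # never leaves the recipient


def rsa_encrypt(message: int, key: Tuple[int, int]) -> int:
    """Anyone holding the public key can encrypt."""
    exponent, modulus = key
    return pow(message, exponent, modulus)


def rsa_decrypt(cipher: int, key: Tuple[int, int]) -> int:
    """Only the holder of the private key can decrypt."""
    exponent, modulus = key
    return pow(cipher, exponent, modulus)


message = 65                   # the message must be a number smaller than n
cipher = rsa_encrypt(message, public_key)
recovered = rsa_decrypt(cipher, private_key)
```

The sender only ever needs `public_key`; the value `d` that undoes the encryption is never transmitted.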
Deterministic encryption algorithms
Deterministic encryption algorithms always produce the same ciphertext whenever the same plaintext is encrypted with the same key. Examples of deterministic encryption algorithms include RSA without random padding (“textbook” RSA) and most block ciphers in ECB mode.
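Why determinism matters can be sketched with an ECB-style loop. The block function below is a stand-in (an XOR with a key-derived pad), not a real block cipher, and the names `toy_block_encrypt` and `ecb_encrypt` are hypothetical. The point is only that identical plaintext blocks come out as identical ciphertext blocks.

```python
import hashlib

BLOCK_SIZE = 16  # bytes per block, as in AES


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Stand-in for a real block cipher: XOR the block with a key-derived pad."""
    pad = hashlib.sha256(key).digest()[:BLOCK_SIZE]
    return bytes(b ^ k for b, k in zip(block, pad))


def ecb_encrypt(key: bytes, plaintext: bytes) -> list:
    """Encrypt every block independently, which is what ECB mode does."""
    blocks = [plaintext[i:i + BLOCK_SIZE]
              for i in range(0, len(plaintext), BLOCK_SIZE)]
    return [toy_block_encrypt(key, block) for block in blocks]


key = b"demo key"
# Three 16-byte blocks; the first two are identical.
plaintext = b"ATTACK AT DAWN!!" * 2 + b"RETREAT AT DUSK!"

cipher_blocks = ecb_encrypt(key, plaintext)
repeated_block_visible = cipher_blocks[0] == cipher_blocks[1]
```

Because the first two ciphertext blocks match, an eavesdropper learns that the message repeats itself without ever recovering the key.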
Probabilistic encryption algorithms
Probabilistic encryption algorithms use randomization to produce different ciphertext with each execution, even with identical inputs. The ciphertext can still be converted back to the original plaintext, even if two different ciphertexts were created from the same plaintext by the same algorithm and key.
Most asymmetric encryption algorithms and some symmetric encryption algorithms (e.g. block ciphers in a chaining mode with a random initialization vector) are probabilistic.
As you might expect, probabilistic encryption adds immensely to the security of the cryptosystem. In fact, encryption algorithms must be probabilistic to be considered semantically secure—that is, meaningfully prevent even part of the plaintext from being extracted from the ciphertext.
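The effect of randomization can be sketched by mixing a fresh random value (a nonce) into a toy cipher. As before, this is an illustration rather than a secure design. The `encrypt` and `decrypt` helpers, the 16-byte nonce, and the SHA-256 keystream are all assumptions made for the example.

```python
import hashlib
import secrets

NONCE_SIZE = 16  # bytes of randomness added to every encryption


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive pseudo-random bytes from the key and the per-message nonce."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Pick a fresh nonce, scramble the plaintext, and prepend the nonce."""
    nonce = secrets.token_bytes(NONCE_SIZE)
    stream = _keystream(key, nonce, len(plaintext))
    body = bytes(p ^ s for p, s in zip(plaintext, stream))
    return nonce + body


def decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """Read the nonce back from the ciphertext and reverse the scrambling."""
    nonce, body = ciphertext[:NONCE_SIZE], ciphertext[NONCE_SIZE:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


key = b"shared secret"
first = encrypt(key, b"same message")
second = encrypt(key, b"same message")
```

The two ciphertexts differ even though the key and plaintext are identical, yet both decrypt to the same message because the nonce travels with the ciphertext.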
What is hashing?
Hashing is similar to encryption in that it scrambles the input data into a randomized or near-randomized output data.
Hashing differs significantly from encryption, however, in that it is a one-way process.
There is no easy way to unscramble the data, interpret the output, or reverse-engineer the input. There’s no key, no system of two keys, no publicly-accessible keys, no certificates that will grant you access to the original data. At risk of over-extending the metaphor, once you bake the data into the hash, there’s no unbaking it.
So what good is this hashed data?
It can be stored securely and used to recognize when the same input recurs, for example a password. In fact, hashing is the go-to method of securing passwords. When a user creates a password on a site with strong security, it passes through a hashing algorithm and gets stored in the site’s database in its nonsensical, standardized hash format.
If a cybercriminal hacks the data, all (s)he has is a bundle of hashed passwords that can’t be used as login credentials because there is no key to unlock the data in its original form.
One characteristic of hashing algorithms, however, is that the same input produces the same hash. This is why hashing is useful for password storage. Users can access secure content by entering the correct password, which passes through the hashing algorithm to produce the same hashed output every time, which the system can then match with the user’s hashed password stored in the database.
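The matching step can be sketched in a few lines. The `stored_hashes` dictionary is a hypothetical stand-in for the site’s password store, and `register` and `login` are illustrative names. The sketch uses a plain, unsalted SHA-256 hash purely to show the comparison; the salting section explains why that alone is not enough.

```python
import hashlib
import hmac

# Hypothetical in-memory stand-in for the site's password store.
stored_hashes = {}


def hash_password(password: str) -> str:
    """Hash the password; the same input always gives the same output."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def register(username: str, password: str) -> None:
    """Store only the hash, never the password itself."""
    stored_hashes[username] = hash_password(password)


def login(username: str, password: str) -> bool:
    """Hash the attempt and compare it with the stored hash."""
    expected = stored_hashes.get(username)
    if expected is None:
        return False
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, hash_password(password))


register("alice", "correct horse")
good_attempt = login("alice", "correct horse")
bad_attempt = login("alice", "wrong horse")
```

The site never needs to recover the password. It only checks whether the hash of the attempt equals the hash on file.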
Hashing algorithms generate hashes of a fixed size, often 160 bits, 256 bits, 512 bits, etc. Whether the input data is longer or shorter than the fixed size, the resulting hash is this same uniform length.
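Hashes are often displayed as hexadecimal strings, with each byte written as two hexadecimal digits of 4 bits (16 possible values) each. A 256-bit hash, for example, appears as 64 hexadecimal characters. Whether the input is a large block of plaintext or a very short one, it is reduced or expanded to a hash of that fixed length.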
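The fixed output size and its hexadecimal form are easy to check with Python’s standard `hashlib` module. The inputs below are arbitrary examples.

```python
import hashlib

short_input = b"hi"
long_input = b"x" * 1_000_000   # one million bytes

short_digest = hashlib.sha256(short_input).digest()
long_digest = hashlib.sha256(long_input).digest()

# Both digests are 32 bytes (256 bits), regardless of input length.
short_bits = len(short_digest) * 8
long_bits = len(long_digest) * 8

# The hexadecimal form uses two characters per byte: 64 characters here.
hex_form = hashlib.sha256(short_input).hexdigest()

# Other algorithms have their own fixed sizes, in bytes.
sizes = {
    "sha1": hashlib.sha1().digest_size,
    "sha256": hashlib.sha256().digest_size,
    "sha512": hashlib.sha512().digest_size,
}
```

Converting `hex_form` back with `bytes.fromhex` gives the same 32 raw bytes, which shows that hexadecimal is only a way of writing the hash down, not part of the hashing itself.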
Hashes come in different flavors (no pun intended). Cybersecurity teams that want to hash passwords for storage have numerous hashing algorithms to choose from, including:
- MD5. The MD5 algorithm encodes a string of information into a 128-bit fingerprint. It’s one of the oldest and most widely-used hashing algorithms in the world, but it is showing its age: practical collision attacks against it are well known, so it is no longer considered safe for security purposes.
- SHA-2. Developed by the NSA, this family of cryptographic hash functions builds on the older SHA-1 algorithm, which produces a 160-bit hash. SHA-2 comprises six hash functions with 224-, 256-, 384-, or 512-bit outputs.
- CRC32. “CRC” stands for “cyclic redundancy check.” It’s a kind of code used to discover accidental errors or changes in a data set. Mostly used in zip files or for file integrity checks, CRC32 produces the same 32-bit output every time it is run on the same input. It is not a cryptographic hash and should not be used to protect passwords.
- RIPEMD. Available in multiple bit configurations with 160-bit the most popular, RIPEMD is a cryptographic hashing algorithm that is used in Bitcoin. Other cryptocurrencies use it as well.
- Tiger. Tiger is a hash function designed to run efficiently on 64-bit platforms. It produces a 192-bit hash; the 128-bit and 160-bit configurations are simply truncations of that value, with no distinguishing initialization values. When padding its input, Tiger appends a byte with hexadecimal value 0x01. A variation, Tiger2, appends a byte with hexadecimal value 0x80 instead.
- xxHash. xxHash is a non-cryptographic hash function known for its exceptional speed, working at RAM speed limits. The most up-to-date variation, XXH3, performs exceptionally well with small data.
- BCrypt. This hash function is designed to be slow, with the intention of making password cracking more time-consuming and discouraging cybercriminals attempting to execute quick attacks.
- Argon2. Argon2 uses an “adaptive” hashing algorithm that can be calibrated with a “work factor,” making the hash more or less complicated.
Which algorithm creates the most secure hashes?
There is little widespread agreement about the most secure hashing algorithm, as evidenced by the wide variation of hashing algorithms in use. Bitcoin and other cryptocurrencies rely on RIPEMD, while the tried-and-true MD5 standard still enjoys widespread use (although its propensity for hash collisions reveals it as more “tried” than “true”).
The most conservative answer is probably SHA-2. The National Institute of Standards and Technology (NIST) recommends SHA-2 over both SHA-1 and MD5. SHA-3 was released in 2015, but has not enjoyed widespread adoption yet.
Good hashing algorithms produce an avalanche effect. This means that if even one character (or one bit) of the input is changed, the result is a drastically different hash, unrecognizable from the first. If only one digit of the hash changed in response to a one-character change in the input, the hash would be far less secure and easier to crack.
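The avalanche effect can be observed directly by counting how many output bits change when one input bit changes. The two sample inputs and the `differing_bits` helper are assumptions for the sketch. The characters "0" and "1" differ by a single bit in ASCII.

```python
import hashlib


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# "0" is 0x30 and "1" is 0x31, so these inputs differ in exactly one bit.
first_input = b"password0"
second_input = b"password1"

first_hash = hashlib.sha256(first_input).digest()
second_hash = hashlib.sha256(second_input).digest()

input_change = differing_bits(first_input, second_input)
output_change = differing_bits(first_hash, second_hash)
```

For a well-designed 256-bit hash, roughly half of the output bits are expected to flip, so `output_change` should land somewhere near 128 even though `input_change` is 1.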
Hash algorithms are not perfect. One vulnerability to guard against is hash collision. Any scrambling algorithm carries with it the risk of collision, which is the production of the same scrambled output from two different inputs. It is also known as a clash.
Collisions are unavoidable in principle, because a fixed-size hash has to represent an unlimited number of possible inputs. What separates algorithms is how hard a collision is to find. Cryptographic hash algorithms are designed to make finding one computationally infeasible, whereas simple checksums only guard against accidental errors and are easy to collide on purpose. Locality-sensitive hashing goes further and deliberately groups similar inputs into the same “bucket,” which is useful for similarity search but unsuitable for protecting passwords.
This is a vulnerability attackers can potentially exploit when a weak algorithm is used: they do not need the original password, only some input that produces the same hashed output, to get a foot in the door.
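Why collisions must exist can be sketched by shrinking a hash until they become easy to find. The `tiny_hash` function below is a hypothetical 16-bit truncation of SHA-256, chosen only so the search finishes instantly. With 65,536 possible outputs, a collision is guaranteed within 65,537 distinct inputs.

```python
import hashlib


def tiny_hash(data: bytes) -> bytes:
    """SHA-256 truncated to 16 bits, so only 65,536 outputs exist."""
    return hashlib.sha256(data).digest()[:2]


def find_collision():
    """Hash the strings "0", "1", "2", ... until two share a tiny hash."""
    seen = {}
    counter = 0
    while True:
        candidate = str(counter).encode("ascii")
        digest = tiny_hash(candidate)
        if digest in seen:
            return seen[digest], candidate
        seen[digest] = candidate
        counter += 1


first, second = find_collision()
same_tiny_hash = tiny_hash(first) == tiny_hash(second)
same_full_hash = hashlib.sha256(first).digest() == hashlib.sha256(second).digest()
```

The two inputs collide under the 16-bit hash but not under full SHA-256, which illustrates why longer, well-designed hashes make the same search impractical.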
Merkle-Damgard hash functions
Merkle-Damgard hash functions were described in Ralph Merkle’s PhD thesis in 1979 and were independently confirmed by Ivan Damgard as a way to create collision-resistant hashes from one-way compression functions.
The function pads the input so that its total length is a multiple of the block size, typically 512 or 1024 bits, and includes the length of the original input in the padding. This is called length padding or Merkle-Damgard strengthening. The padded input is then processed one block at a time. A finalization function may be added to the hashing algorithm to compress the internal state, mix the hash better, and enhance the cryptographic hash function’s avalanche effect.
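The padding step can be sketched in the style used by SHA-256, which works on 512-bit (64-byte) blocks and stores the input length as a 64-bit big-endian number. The function names `md_pad` and `split_blocks` are assumptions. Other algorithms in this family differ in details such as block size and byte order.

```python
def md_pad(message: bytes, block_size: int = 64, length_bytes: int = 8) -> bytes:
    """Pad a message so its length is a multiple of block_size (SHA-256 style)."""
    bit_length = len(message) * 8
    # 1. Append a single 1 bit followed by seven 0 bits (the byte 0x80).
    padded = message + b"\x80"
    # 2. Append zero bytes until only the length field is missing from the block.
    zeros = (block_size - length_bytes - len(padded)) % block_size
    padded += b"\x00" * zeros
    # 3. Append the original length in bits (the "strengthening" step).
    padded += bit_length.to_bytes(length_bytes, "big")
    return padded


def split_blocks(padded: bytes, block_size: int = 64) -> list:
    """Cut the padded message into the blocks fed to the compression function."""
    return [padded[i:i + block_size] for i in range(0, len(padded), block_size)]


padded = md_pad(b"abc")
blocks = split_blocks(padded)
```

A 3-byte input becomes exactly one 64-byte block, while a 56-byte input no longer leaves room for the length field and spills into a second block.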
Can you spot the algorithm from the hashed password?
If you have the hash results for different passwords, can you determine which algorithm was used to hash them?
The short answer is no. While you can tell the bit encoding, hashing algorithms leave no obvious markers as to which one was used. A skilled attacker may be able to reverse-engineer the code, but without it, the attacker can only guess.
However, a hashing algorithm always produces the same hash for the same input.
That means that if an attacker discovers the hashing algorithm used and enters the right password, it will produce the same hash as stored in a secure drive. If the attacker can breach that drive and find a hash that matches the hash produced by that password, the criminal has identified a working password.
To guard against this kind of attack, cybersecurity experts recommend salting the hash. Salting is not a separate function from hashing; rather, it is an enhancement of hashing.
What is salting?
Salting isn’t an alternative to encryption or hashing; it is actually a function that can be added to the hash to make it more secure. It’s a way to defeat a rainbow table.
A rainbow table is a tool that a cybercriminal might use to try to get around a hashing algorithm. An attacker might steal a file full of hashed passwords. To try and reverse-engineer the original password, the attacker might take a list of several million statistically common passwords and run it through a hashing algorithm.
The attacker can then run an automated script to compare the two lists—the stolen hashed passwords and the hashed list of common passwords. The latter is known as a rainbow list or rainbow table. If the script reveals a match between the two lists, the attacker has found a working password.
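The comparison described above can be sketched as a simple precomputed lookup. The password list, user names, and the choice of unsalted SHA-256 are all made up for the example. A real rainbow table stores chains of hashes to save space, but the matching idea is the same.

```python
import hashlib


def sha256_hex(password: str) -> str:
    """Unsalted hash, as a site with weak password storage might use."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# The attacker's precomputed list: hash -> common password.
common_passwords = ["123456", "password", "qwerty", "letmein", "dragon"]
lookup_table = {sha256_hex(pw): pw for pw in common_passwords}

# The stolen file: user -> hashed password (illustrative data).
stolen_hashes = {
    "alice": sha256_hex("letmein"),
    "bob": sha256_hex("T7#vq!x2Lm0z"),
}

# Any stolen hash that appears in the table reveals a working password.
cracked = {
    user: lookup_table[hashed]
    for user, hashed in stolen_hashes.items()
    if hashed in lookup_table
}
```

Only the account with a common password is exposed; the uncommon password does not appear in the table, so its hash stays unmatched.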
The purpose of “salting the hash” is to add an extra layer of scrambling to the hash, making it impossible to match with a rainbow list.
“Salting the hash” is the process of adding extra, randomized characters to the password before it is hashed. Once the password is salted, its hash won’t match any output from a rainbow table, even if it is generated by the same hashing algorithm.
Yes, the attacker could start adding random characters to the rainbow list to try and get lucky and find the exact salt for the password hash, but at that point, the chances of scoring a match are so astronomical that it is usually not worth the attacker’s time.
You may wonder whether it’s bad for security to store the salts in the same database as the passwords.
It is, if the passwords are all salted in the same way. Some salting scripts apply a predictable or recurring salt (the exact same salt for every password, or a salt derived in some other predictable way). If a cybercriminal steals two passwords with the same string of salt characters, the thief can then identify the salt and discount it.
You can mitigate this risk by applying unique salts to each password. Some security experts even advocate for storing hashed passwords and their corresponding salts in separate databases.
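Per-password salting can be sketched with Python’s standard library. This example uses PBKDF2 from `hashlib` because it ships with Python; that choice, the 16-byte salt, the iteration count, and the function names are assumptions for the sketch rather than a recommendation over BCrypt or Argon2.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple

ITERATIONS = 100_000   # assumed work factor for this sketch


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, hash). A fresh random salt is generated unless one is given."""
    if salt is None:
        salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the attempt with the stored salt and compare in constant time."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, expected)


# Two users who happen to choose the same password.
salt_a, hash_a = hash_password("hunter2")
salt_b, hash_b = hash_password("hunter2")
```

Because each password gets its own random salt, the two stored hashes differ even though the passwords are identical, so a single precomputed table cannot match both.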
Why is salting important?
Hashing by itself is not particularly secure. While it adds a layer of security to password storage, most cybercriminals can easily circumvent an unsalted hash using tools like a rainbow list. Adding a unique salt to each hash adds exponentially to the security of the hash.
No security is perfect, but proper salting of a strong hash is the best show in town when it comes to guarding stored passwords.
So what is the difference between hashing and encryption?
Both hashing and encryption scramble data into nonsensical output to protect it from bad actors who would misuse the data to invade privacy, steal identities, or commit cybercrime.
But encryption and hashing differ in important ways, to wit:
- Encryption is a two-way process that depends on keys to unlock ciphertext and return it to its original form as readable plaintext. You can use probabilistic encryption algorithms to produce different ciphertext for each plaintext input, even if it’s identical.
- Hashing is a repeatable process that produces the same hash whenever you enter an equivalent input into the same hashing algorithm. However, hashing is a one-way process, with no key to unlock the input in its original format.
Neither encryption nor hashing is perfect. Cybercriminals are in a constant war of attrition to try and bypass security measures like hashing and encryption. Cybersecurity best practices include constant vigilance and improvement of security protocols, including advanced key and certificate protocols for encryption, and “salting” of hashes.
While encryption and hashing differ in key ways and share some core similarities, most organizations need a mix of both to comply with data security regulations and standards. This includes proper encryption of stored and transmitted data, and hashing (with adequate salt) of critical data like passwords, account numbers, etc.