What is Password Hashing and How Does It Work?

If you are a frequent denizen of the Internet like myself, there is a good chance you have received an email that goes something like this:

Dear valued customer,
Recently, our website fell victim to a cyberattack on our corporate network. All passwords were encrypted, but as a precaution we are requiring all of our customers to reset their passwords immediately.
Thank you.

So there was a breach, and some of your information, including your encrypted password, was leaked. Is your account at risk?

Short answer: YES, but why?

To understand this, you must understand the concept of “password hashing.”

What is a Hash?
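A hash is just a way to represent any data as a short, fixed-length string of characters that acts like a fingerprint. You can hash anything: music, movies, your name, or this article. Metaphorically speaking, hashing is a way of assigning a “name” to your data. It allows you to take an input of any length and turn it into a string of characters that is always the same length. Because there are far more possible inputs than possible hashes, two inputs can in principle share a hash, but a good algorithm makes such a pair extremely hard to find. There are many methods (algorithms) to do this.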

A few of the most popular hashing algorithms:

MD5 – Given any data, returns a 32-character hexadecimal hash (128 bits).
SHA1 – Given any data, returns a 40-character hexadecimal hash (160 bits).
SHA256 – Given any data, returns a 64-character hexadecimal hash (256 bits); designed by the National Security Agency.

Let's look at a simple example:
My name is “Jamin Becker”

The MD5 hash representation of my name is:
eeb7048c69b088739908f5f5144cd1f5


The SHA256 hash representation of my name is:
a477cc14eae5fd94fe4cb20b36ec80ac6983bad44973ae7f4f230010f01289b0
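
If you want to try this yourself, here is a minimal sketch using Python's standard hashlib module. The helper name hex_digests is my own invention for illustration, and the script prints each digest along with its length so you can see that the output size depends only on the algorithm, not on the input.

```python
import hashlib

def hex_digests(text):
    """Return the MD5, SHA1, and SHA256 hex digests of a string (UTF-8 encoded)."""
    data = text.encode("utf-8")
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }

if __name__ == "__main__":
    # Try a short input and a much longer one: the digest lengths stay the same.
    for text in ("Jamin Becker", "Jamin Becker" * 1000):
        for name, digest in hex_digests(text).items():
            print(name, len(digest), digest)
```

Whatever you feed in, the MD5 digest is 32 hexadecimal characters long, the SHA1 digest 40, and the SHA256 digest 64.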

Why is Hashing Secure?

The reason hashing is secure is simple: hashing is a one-way operation. A hash cannot practically be reversed. Given the string “eeb7048c69b088739908f5f5144cd1f5”, there is no known efficient way to run the MD5 algorithm backwards and recover “Jamin Becker”. This is because of the way the mathematicians and programmers structured the MD5 hashing algorithm, and it relates to a fundamental computer science problem called “P vs NP.” P and NP are just two classes of problems.

Computing a hash falls under P, which means it can be calculated quickly (in polynomial time). Un-hashing (i.e. “eeb7048c69b088739908f5f5144cd1f5” -> “Jamin Becker”) is different: if someone hands you a candidate input you can check it quickly, which is the hallmark of an NP problem, but no efficient algorithm is known for finding that input in the first place. In practice the only general approach is to guess inputs until one matches, and even quantum computers are expected to speed up that guessing rather than eliminate it.

So why is this good for security?

Say you subscribe to a website and choose the password “12345”. Immediately, that website will hash your password, probably with SHA1, and store the hash in a database. Now every time you log in, the website will rehash the password you type and compare it to the hash stored in the database. If they match, you will be successfully authenticated. If the website is ever breached and the password database is leaked, your password will appear as “8cb2237d0679ca88db6464eac60da96345513964” and not “12345”.
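
The sign-up and login flow described above can be sketched in a few lines of Python. This is only an illustration under some assumptions of mine: the in-memory dictionary password_db stands in for the website's database, and the function names register and login are hypothetical. It uses plain SHA1 because that is what this example assumes the website does, not because it is a recommended way to store passwords.

```python
import hashlib
import hmac

# Hypothetical in-memory "database" mapping usernames to stored SHA1 hashes.
password_db = {}

def sha1_hex(password):
    """Return the SHA1 hex digest of a password string (UTF-8 encoded)."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest()

def register(username, password):
    # Only the hash is stored, never the plain text password.
    password_db[username] = sha1_hex(password)

def login(username, password):
    stored = password_db.get(username)
    if stored is None:
        return False  # unknown user
    # Rehash the submitted password and compare it with the stored hash.
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(stored, sha1_hex(password))

if __name__ == "__main__":
    register("jamin", "12345")
    print(password_db["jamin"])      # what a leaked database would contain
    print(login("jamin", "12345"))   # correct password
    print(login("jamin", "54321"))   # wrong password
```

Notice that the website never needs to keep “12345” anywhere. Comparing hashes is enough to decide whether the login succeeds.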

Hash Attack Strategies

So, the attacker has the hashed version of my password and there is no way to reverse it to 12345. I have nothing to worry about, right? WRONG!

One method that is commonly used to get the plain text password from a hash is called a brute force attack; when the guesses come from a wordlist, it is more precisely known as a dictionary attack. In this attack, the attacker runs through a giant wordlist and hashes each word with the appropriate hashing algorithm. They then compare the hashes of the words to the ones they have obtained from the database. If a hash from the wordlist matches one in the database, the corresponding word in the wordlist is the plain text password. Experienced attackers use extremely large wordlists combined with powerful software to run through millions of password possibilities a second.
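
Here is a rough sketch of that wordlist attack in Python, assuming the leaked hashes are unsalted SHA1 as in the earlier example. The function name dictionary_attack and the four-word list are made up for illustration; real wordlists contain millions of entries.

```python
import hashlib

def dictionary_attack(target_hash, wordlist):
    """Return the word whose SHA1 hex digest equals target_hash, or None."""
    for word in wordlist:
        candidate = hashlib.sha1(word.encode("utf-8")).hexdigest()
        if candidate == target_hash:
            return word  # the hash matched, so this is the plain text password
    return None  # the password was not in the wordlist

if __name__ == "__main__":
    # The leaked hash of "12345" from the example above.
    leaked = "8cb2237d0679ca88db6464eac60da96345513964"
    tiny_wordlist = ["password", "letmein", "12345", "qwerty"]
    print(dictionary_attack(leaked, tiny_wordlist))
```

The attacker never reverses the hash. They simply guess until a guess produces the same hash, which is why a password that appears on a wordlist offers no protection.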


Another method of attack attempts to exploit the hashing algorithm itself by creating a hash collision. A hash collision occurs when two different sets of data resolve to the same hash, and while this is rare, it can be deadly. If an attacker could generate a string of characters that is not your password but produces the same hash, that string would still log in to your account. Finding a match for one specific hash is considerably harder than finding any two inputs that happen to collide, but a weakness of either kind is a reason to retire an algorithm.
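
To see what a collision looks like without breaking a real algorithm, here is a toy sketch in Python. It deliberately weakens SHA256 by keeping only the first 4 hexadecimal characters (16 bits) of the digest, which is my own assumption purely for demonstration; the names tiny_hash and find_collision are hypothetical. With so few possible outputs, two different inputs with the same toy hash turn up almost immediately.

```python
import hashlib
from itertools import count

def tiny_hash(text, hex_chars=4):
    """Toy hash: the first hex_chars characters of SHA256. NOT secure."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:hex_chars]

def find_collision(hex_chars=4):
    """Hash "0", "1", "2", ... until two different inputs share a toy hash."""
    seen = {}  # maps toy hash -> first input that produced it
    for i in count():
        text = str(i)
        h = tiny_hash(text, hex_chars)
        if h in seen:
            return seen[h], text, h
        seen[h] = text

if __name__ == "__main__":
    first, second, shared = find_collision()
    print(first, second, shared)
```

Real algorithms like SHA256 have so many possible outputs that this kind of search is hopeless, which is exactly the property that makes them useful.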

Conclusion
Hashing algorithms are becoming more and more advanced. Mathematicians and computer scientists are constantly designing cryptographic hashing algorithms with lower probabilities of collisions. However, it is important to remember that no matter how strong the hashing algorithm is, a weak password behind it can always be cracked using a brute force attack. The good news is that you can easily defend against these attacks by following a best-practice password policy:

Size does matter – the longer the original password, the less likely it is to appear on a wordlist
Do not be predictable – avoid using words like “password” and “myname123”
Use a mixture of special characters, numbers, upper and lowercase letters
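
To put rough numbers on the first and last tips, here is a small Python sketch that counts how many passwords an attacker would have to try if they guessed every possible combination. The rate of one million guesses per second is an assumption taken from the low end of the “millions a second” figure mentioned earlier, and the alphabet sizes (10 digits, 26 lowercase letters, 94 printable ASCII characters) are my own choices for illustration.

```python
def search_space(alphabet_size, length):
    """Number of possible passwords of exactly this length."""
    return alphabet_size ** length

def seconds_to_exhaust(alphabet_size, length, guesses_per_second=1_000_000):
    """Worst-case time to try every password at an assumed guessing rate."""
    return search_space(alphabet_size, length) / guesses_per_second

if __name__ == "__main__":
    cases = [
        ("5 digits", 10, 5),
        ("8 lowercase letters", 26, 8),
        ("12 mixed characters", 94, 12),
    ]
    for label, alphabet_size, length in cases:
        seconds = seconds_to_exhaust(alphabet_size, length)
        print(f"{label}: {search_space(alphabet_size, length)} candidates, "
              f"{seconds:.3g} seconds")
```

Each extra character multiplies the work by the size of the alphabet, so length and variety together quickly push a password out of reach of exhaustive guessing.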