Introduction to Strong Cryptography

Jonathan FisherFebruary 22nd, 2012Last Updated: October 21st, 2012

4 83 13 minutes read

One thing that amazes me is that the most developers are not familiar with strong cryptography. In my career, I’ve seen all sort of mistakes that lead to leaked data, guessable passwords, unfortunate disclosures, and worse. The nice thing is, you don’t have to understand the ridiculously complex math behind the algorithms, you only have to know the rules for using them correctly. By the end of this series, my goal is to de-mystify the magic, so you can start using the primitives in your code right away!

But first, when I say Strong Cryptography, what the hell am I referring to anyway?

Strong cryptography or cryptographically strong are general terms applied cryptographic systems or components that are considered highly resistant to cryptanalysis.
http://en.wikipedia.org/wiki/Strong_cryptography

So Strong Cryptography is not some esoteric concept you are not privy to: Strong Cryptography is simply a set of definitions and algorithms that have been reviewed by experts, secret government agencies, and third-party organizations and found to be hard to break.

One thing I’ve seen repeatedly done is that developer ‘invents’ a cryptography scheme for a particular purpose. Here’s the thing, cryptography is thousands of years old. If you’ve ever ‘invented’ your own way to ‘encrypt’ data, chances are you’ve just re-invented something that has been discovered thousandsof years ago. If you want to avoid the mistakes that WEP made with wireless, Microsoft did with the XBox, or Sony made with the PS3, this blog series should help you avoid embarrassment, AND give you something impressive to say at the next cocktail party.

Finally, I just wanted to mention this is actually a very personal subject that I have a long history with. I found my first need for cryptography was passing notes to my friends as we played “Spies” in the neighborhood and needed to keep the locations of our secret forts safe. Unfortunately, my single letter substitution cipher must have been broken by some whiz kid as our treehouse was destroyed that summer… After reading Alvin’s Secret Code, we then created 2-3 sets of Caesar wheels and never lost a secret fort again!

Below we’ll be covering one of the most useful Strong Cryptography primitives: The Hash function. These little beasts are amazing and have a multitudes of uses!
So, what is a hash function? Glad you asked, lets quote the unabated Wikipedia:

A hash function is any well-defined procedure or mathematical function that converts a large, possibly variable-sized amount of data into a small datum, usually a single integer that may serve as an index to an array (cf. associative array). The values returned by a hash function are called hash values, hash codes, hash sums, checksums or simply hashes.
http://en.wikipedia.org/wiki/Hash_function

Sigh, Wikipedia used to have nice user-friendly definitions for common folk, since that’s not the case, let me give my own definition:

A hash function takes some input bytes, then calculates a fixed value that identifies that sequence of input bytes. Sorta like a checksum… Wait, a checksum IS a hash function! Cool!

-Jonathan on cryptography.

A (very bad) hash function could be as simple as “divide everything by two until you have one digit left, rounding up as you go along.” We could this function ƒ (ƒ stands for ƒailure of a hash function), and so ƒ(48) = 6 or ƒ(451) = 8.

A simple way to use this function ƒ is to protect a website. When a user creates an account, they choose a number as password (p). During the account creation process we calculate a new number (c = ƒ(p)) and store it for the user. We end up with (c) number, 0-9, in a database for each user.
To check a supplied password (s) against the database, we simply evaluate if (c ==ƒ(s)).
This hash function ƒ isn’t really very useful in the real world, as our keyspace is only 9 possibilities, it’s scales horribly, and it’s trivial to reverse engineer, but if we substitute fail function ƒ for something better, we end up with a secure system!

Technical Properties of an Ideal Hash Algorithm

It should represent a true many-to-one function. It should take an input, and produce an identity output that can be reproduced exactly every time. For every plaintext, there is exactly one hash value. For every hash value, there are infinite plaintexts.
The input plaintext can be any length. Yes, but see the next point:
The output hash value is always the same size, meaning it’s fixed length. So if you gave the algorithm a plaintext of 50 bytes or 765kbytes, the length of the hash value (the output) would as be say, 16 bytes. The exact length of the output depends on the algorithm design.
The always produce the same hash value (output) for the same plaintext (input). So if you hash the quote, ”I go on the principle that a public debt is a public curse” you should get the value, vjPWnnLiB0BLrqUuJjtpEM5KClg= every time. This is called identifying, fingerprinting, or hashing the plaintext.
A small change in the plaintext will result in a dramatic change to the hash value. So if you change to Madison’s quote to “I go on the principle that a public debt is not a public curse” the hash value is now: g+8o7vlQAGjvlst+XsOEwIzF0Qc=.
Permanent and “one way”, meaning that the hashValue cannot be reversed to plaintext.
Given a particular hashValue, it is extremely difficult to find a plaintext that will produce that desired hashValue. These last two points run together… Basically, our function ƒ is far from ideal. Lets say you have (p = 5). It’s very easy to calculate an input (s) that would produce (5 = ƒ(s)). You could use any of (s = { 10, 20, 40} ). An ideal function should make this extremely difficult!!!

SHA1, the standard hash algorithm

SHA-1 (Secure Hash Algorithm – 1) is a hash algorithm that is quite complex, and is known to be secure at the time of this writing, for most applications (the paranoid should use SHA-256 or SHA-512). We can supply any length plaintext to SHA-1 and receive a consistent 20 byte hash value in return. Here’s how to do this is Ruby:

exabrial@arclight:~$ irb
>> require 'base64'
=> true
>> require 'openssl'
=> true
>> shaBytes = Digest::SHA1.digest('Give me liberty, or give me death!')
=> "\367\241\266`30 \2656\214G343\266\276\204\225cB\370\r"
>>  puts Base64.encode64(shaBytes)
96G2YBggtTaMRxwztr6ElWNC+A0=
=> nil
>> exit

You can use an online hash calculator to check to see if Patrick Henry’s quote was hashed correctly. Go ahead, I’ll wait…

SHA-1(Base64)    : 96G2YBggtTaMRxwztr6ElWNC+A0=

If we make a slight change to the plaintext, lets say: “Give me liberty, or let me eat cake!“, we will get a dramatically different output:

SHA-1(Base64)    : M2vPzwTPc7ALM+OkiGAwCkS1DY4=

Actually, just changing the case of the first character “give me liberty, or give me death!“ will produce a completely different hashValue:

SHA-1(Base64)    : g1UFdWJfXWfkIVz42uLLxrJv58g=

Hash Algorithm Input is indeed infinte

Remember how we said the input to the hash function can be infinite? How about the entire text of The Declaration of Independence (12kb)? If you upload the file to the online calculator we’ve been using, the result is still 20 bytes:

SHA-1(Base64): mGnTR5dnrXrMqEMVoLKCMzWEHjU=

Here’s a quick Java program to crunch the same calculation:

import java.io.InputStreamReader;
import java.nio.CharBuffer;
import java.security.MessageDigest;
import org.apache.commons.codec.binary.Base64;

public class SHA1Hasher {
 public static void main(String[] args) throws Exception {
  MessageDigest md = MessageDigest.getInstance("SHA1");
  InputStreamReader insr = new InputStreamReader(SHA1Hasher.class.getResourceAsStream("/usdeclar.txt"));
  CharBuffer charBuff = CharBuffer.allocate(16000);
  String plainText;
  byte[] hashBytes;
  String hashValue;
  insr.read(charBuff);

  plainText = charBuff.flip().toString();
  hashBytes = md.digest(plainText.getBytes());
  hashValue = Base64.encodeBase64String(hashBytes);
  System.out.println("SHA-1(Base64): " + hashValue);
 }
}

Real World Usage of Hash Algorithms, Scenario 1: Storing account passwords in a database in a secure manner

You’ve been tasked with creating a website for the American colonial resistance. The British are abound and are randomly inspecting everyones’ databases for suspicious activity. The Founding Fathers have an unfortunate habit of choosing patriotic passwords like “I<3America”. How can mask passwords from British patrols so the secrets of who is loyal to The Crown, and who is a revolutionary is kept secret?
The best thing to do is created a ‘salted password table’ like this:

USERNAME	PASSWORD_HASH	SALT
tjefferson43	HiE0AZhRWAs6Mmd7dVqppM1WtJc=	WTg68cVxPI

What we will do is store the password hash, plus some random text called a salt (to add entropy to the user’s password). So Thomas Jefferson’s original password: I<3America now becomes I<3AmericaWTg68cVxPI and is hashed permanently to: HiE0AZhRWAs6Mmd7dVqppM1WtJc=.

So how do you process a login, now that the plaintext is gone? You perform the same calculation: Take the username from the login form, and lookup the salt. Then take the plaintext password from the login form and concatenated the salt from the database to reform the original string. Hash that string, and if matches the database hash, you know the correct password was supplied! Jefferson’s original patriotic password is completely masked, so those pesky Tories will never know his true allegiance! Yarr!

But what if another patriot (Paul Revere) chooses the exact same password, but is caught by the British? Couldn’t the British Regulars then just infer that anyone with the same password hash is also a patriot (and be hung)?

Because we added salt (entropy) to the plaintext password, we’re guaranteed to never have the same hash twice in our database. Lets say Paul Revere creates an account with the same password: I<3America, but we followed the required practice of using SecureRandom to create a new salt:

USERNAME	PASSWORD_HASH	SALT
tjefferson43	HiE0AZhRWAs6Mmd7dVqppM1WtJc=	WTg68cVxPI
paulrevere1776	aktJ/0cn69Y41vssNfZDHY1CsdE=	sWVUaTGa6e

So Paul Revere’s password I<3America becomes I<3AmericasWVUaTGa6e, which is hashed to: aktJ/0cn69Y41vssNfZDHY1CsdE= and stored. Later, when Paul Revere was caught and was revealed to be a patriot, Jefferson’s secret is kept safe by salting and hashing.
To summarize the benefits:

A database breach means your user accounts are still secure.
It’s impossible to tell if two people have the same password.
There are no restrictions to the length of a user’s password.
Your database column can be a fixed length instead of a varchar.
SHA1 was created by people who are smarter than you (and I). You can deflate your ego and leverage their brains.

Sidenote: The salt column is extremely important! Never ever, store, or send, a hash without a salt! Furthermore, the salt value must be securely generated with a random generator. You must use your language’s SecureRandom generator! You’ve been warned on these two points: they are CRITICALLY important (I can’t stress this enough)!

Real World Usage of Hash Algorithms, Scenario 2: Noah Webster

Yes, Noah Webster was an American revolutionary: He was a member of Connecticut militia, an editor of the Federalist newspaper, the author of American copyright, among many other things. He went on later in life to write some dictionary thing, but the important part here is he was a real hard ass when it came to teaching.

So say Noah Webster has hired you to write a program to test his students spelling. Because he so adamantly believed in spelling correctly, it’s important each student get 100%. Knowing this is a bit unrealistic though, he will allow the students to retake the test as many times as they want until they get 100%. The test consists of Mr. Webster dictating a small passage of text to the class and having them enter it correctly into your program. The program should tell them whether they spelled everything correctly. This will allow them to figure out the correct answer before turning in their final answer.

Another important piece of American history was that Noah Webster was a closet Ruby fan; you have to write this in Ruby. This presents a problem, because the students will be running your program, you can’t just embed the correct answer in your program, or they could look at the source code.
By using a hash, you can embed the hash of the correct text in the program and compare the students input to the hash.
Mr. Webster begins the test by saying to his students:

Before a standing army can rule, the people must be disarmed; as they are in almost every kingdom of Europe. The supreme power in America cannot enforce unjust laws by the sword; because the whole body of the people are armed, and constitute a force superior to any bands of regular troops that can be, on any pretense, raised in the United States.

The correct hash is: 6+IHAAJ1i0KmoPaowU1i0xSjcO4=
Knowing what you know now, can you finish the program for Mr. Webster? Post your answer in the comments!

Today we’ll dive into some deeper to some Hashing topics. Also I’m taking suggestions for subjects in my posts… Last time it was US Founding Fathers, this week it’s tragedies.
We learned in our last article what a hash function is and how for every input, there is exactly one corresponding output. We’re going to apply that concept to two commons situations where confidentiality is needed.

Common Situation #1: A situation in Verona

Lets say you have Romeo who needs to email the following message to Mercutio: “I dreamt a dream tonight.” Unfortunately for Romeo, all the ISPs in Verona are owned by the Montague family (Not much better than Comcast). Romeo doesn’t mind if the Montagues can read his message, but he certainly has an issue of they change his message. How could Romeo be sure his message was delivered without being modified by the Montague family?

Well you could run the message through a hash function and attached the hash to the message. The Montagues however, could simply modify the message, and attached the new hash value. To get around that, I suppose Romeo could calculate the hash, then tell Mercutio in private what the hash value was. However, this would defeat the purpose of emailing him, since in private he could just tell Mercutio the contents of the message.

Any ideas?

Introducing the new MAC (not what you think)

Fortunately, there is an easy solution: Message Authentication Codes or MAC. A MAC function is a ‘keyed hash function’, so you could call it a cousin of a normal hash function. In fact, MAC functions and hash functions are nearly interchangeable! In the last article, we covered the properties of a hash function. So what makes a MAC function different from a hash function?

You must posses a secret key can produce a hash value.
A MAC function shouldn’t reveal ‘hints’ about the secret key, even if an attacker chooses the input.

Because you must posses a secret key to produce the hash value, they are often called “Keyed Hash Functions.” So lets run the quote “The lady doth protest too much, methinks” through a MAC function:

private static final byte[] keybytes = new byte[] { (byte) 0xfc, (byte) 0x4f, (byte) 0xbe, (byte) 0x23,
  (byte) 0x59, (byte) 0xf2, (byte) 0x42, (byte) 0x37, (byte) 0x4c, (byte) 0x80, (byte) 0x44, (byte) 0x31,
  (byte) 0x20, (byte) 0xda, (byte) 0x20, (byte) 0x0c };

 public static void main(String[] args) throws Exception {
  Mac mac = Mac.getInstance("HMACSHA1");
  Key key = new SecretKeySpec(keybytes, "HMACSHA1");
  mac.init(key);
  byte[] hashValue = mac.doFinal("The lady doth protest too much, methinks".getBytes());
  String encodedHashValue = Base64.encodeBase64String(hashValue);
  System.out.println(encodedHashValue);
 }

The above program produces this output:

oOUKVR5hDRh4n0H+beVO4JvMw64=

If you use a different key (lets change the last byte to 0x0d) you’ll get a completely different hash:

cDdJwEBm9qIni9A7QfIQ1e9G8qo=

So how does apply to Romeo and Mercutio? First, they meet in private and create a secret key. Both Romeo and Mercutio will know the secret key. Next, when they want to send an email, they run the message through the MAC function and produce a signature. Finally, the recipient runs the message through the same MAC function, using the same key. It the calculated signature matches the signature on the message, they can be assured the message was not tampered with during transmission.

A MAC is actually just a wrapper

Take note of the following line in the above implementation:

Mac mac = Mac.getInstance("HMACSHA1");

You may remember from previously that SHA1 is a fairly standard hash algorithm. Well here’s the whole story: a MAC function is just a wrapper around a standard hash function. The HMAC part of HMACSHA1 is the method of wrapping. There are other MAC implementations, such as VMAC, but they are less commonly used. VMAC performs a little better than HMAC. Since HMAC and VMAC are just wrappers, you can take almost any hash function and turn them into a keyed hash function. Using SHA1?s older brothers, we can run the same quote through the program:

HMACSHA256: POP9cefgoEC9pUaWXqQ8lBbW9CdMi1k3t7LXAGYl87s=
HMACSHA512:ALFIpPMbphQJ7KZQHacGIFH3T2qI5AKUqaD8lDilNnjajGL29cZdp68sLeQTjDKD+cIAfZN86udfRdecGeTm0A==

Common Situation #2: Revisiting storing passwords in a database

Previously we decided on the following table structure

USERNAME	PASSWORD_HASH	SALT
tjefferson43	HiE0AZhRWAs6Mmd7dVqppM1WtJc=	WTg68cVxPI
paulrevere1776	aktJ/0cn69Y41vssNfZDHY1CsdE=	sWVUaTGa6e

Lets add one additional column:

USERNAME	PASSWORD_HASH	SALT	MAC_KEY
tjefferson43	RFu51fVI/0y8gmPXkjo0Op9FRHU=	WTg68cVxPI	KvzpFe690vsVcW1jLqQ1vg==
paulrevere1776	SmZedewg1JD7kRhndEW2cRmcyGw=	sWVUaTGa6e	FgIpIDFi9kJgXvkp44ua4Q==

So what did we do here? Instead of using straight sha-1 function to store a representation of the user’s password, we now use HMACSHA1. What does this do for you? Not much. The better question is what does it to for hackers… It makes their life extremely difficult!

A MAC defends you from hackers (again, not what you’re thinking)

In our original table setup, a hacker could use two forms of attacks to crack our passwords. He could use a rainbow attack, (yes it’s really called that… No I don’t know why) or a brute force attack (not to be confused with a Bruce force attack).

A rainbow attack uses a multi-terrabyte database to do a reverse lookup on our hash algorithm. The database is generated ahead of time, possibly by cooperating parties and thousands of separate computers to divide up the workload. Even with the combined effort, this still takes months to generate useful tables. However, with the finished rainbow table, an attacker could lookup a hash function output to get an arbitrary input that produces the same hash as the user’s real password. Yikes! The nice thing is, using HMACSHA1 makes this infeasible for a hacker. Because we have keyed our hash algorithm differently for every user’s password, they would have to collectively regenerate the rainbow table for every individual entry, making the rainbow very very very difficult in practice!

A brute force attack is related to the rainbow attack. A brute force attack tries random combinations of letters and numbers, or even common english words until it finds an input that produces the correct output. Unfortunately, HMACSHA1 provides little additional security against brute force attacks. However, because they require powerful computing systems to run, they are less common. There are only two ways to defend against brute force attacks: