Cryptography & Theory is series of blog posts on things I learned in coursera stanford online crypto class. The class contained just right mixture of theory, math and programming and I enjoyed it a lot.
This first part explains what is meant by expression “good cipher”. It contains definition of a cipher and multiple definitions of cipher security. Although it does not sounds like much, it is one of the most important things I learned there.
It is important because ciphers are designed to protect against well defined attacks and are vulnerable to anything else. One can not just take random cipher from build-in ciphers list of his favorite framework and call it a day. One has to understand what is he choosing and why.
A cipher is something that uses secret key to encrypt and decrypt data. Encryption takes key and data as input and produces ciphertext. Decryption takes key and ciphertext and produces data. That is it.
Ciphers are divided into two big groups: symmetric ciphers and asymmetric ciphers. The difference between them is in how they treat the key.
Symmetric cipher uses the same key to encrypt and decrypt data. If you encrypt the message with a key, you have to use the same key do decrypt it. The key must always be kept secret.
Formal definition of symmetric cipher:
Definition: A cipher is a pair of efficient algorithms (E, D) defined over a triple (K, M, C) where
- E: K×M → C
- C: K×C → M
- ∀ m ∈ M, k ∈ K: D(k, E(k,m)) = m.
K in the above definition is set of all possible keys, M is a set of messages and C is a set of all possible ciphertexts.
The last property is called ‘consistency property’. It says that if you encrypt a message, you must be able to decrypt it with the same key.
Asymmetric cipher uses two different keys. One is used to encrypt data and another is used to decrypt them. It is impossible to use the encryption key to decrypt the ciphertext. Encryption key can be safely published on the internet.
Asymmetric ciphers usually use symmetric ciphers inside them. This post is about basics behind
symmetric ciphers, so there is no need to formally define asymmetric ciphers.
The above definition of cipher includes both good and horrible ciphers. Even cipher that ignores the key and returns the plaintext “as is” satisfies that definition. Therefore, we need to define what is good secure cipher.
To make things complicated, the meaning of “good cipher” depends a lot on the context. How are you using the cipher and attacker strength are as important as the cipher itself. Can the attacker modify messages? Can you generate new key for every single message or do you have to reuse them?
The consequence is that there are multiple cipher security definitions. They all model some real world cipher usage and attacker. Some of them have been created after new type of attack has been found and previous definition stopped to be useful.
First cipher security definition was invented by Shannon and defines something called perfect secrecy. The idea of perfect secrecy is that the ciphertext should reveal no information about the original message except its length. No matter how strong the attacker is, ciphertext without a key should be as good as nothing.
The strongest possible attacker is the one able to try all possible keys. To protect against him, an attempt to try all keys must generate all plaintexts of the same length. If there is a key that decrypts the ciphertext into “yes”, then there is another key that decrypts it into “no!” and yet another that decrypts it into “jaj”.
Not only that, each plaintext must appear in the the result exactly the same number of times. If there are two keys that decrypt into “yes”, exactly two of them will decrypt it into “no!”.
Definition: A cipher (E, D) over (K, M, C) has perfect secrecy if ∀ c ∈ C and ∀ m0, m1 ∈ M; |m0|=|m1|:
- Pr[E(k,m0)=c]=Pr[E(k,m1)=c] where k is uniform in K (k←K).
Note: Pr stands for “probability” and k←K means that the key k is randomly selected. |m0|=|m1| means that both messages have the same length.
Any attacker trying to guess which message is encrypted in a ciphertext is as accurate as a coin flip. All messages with the same length have the same chance to be encrypted in the same ciphertext.
Perfect secrecy does not imply perfectly usable cipher. Perfect secrecy has two unfortunate properties:
- It does not guarantee security if you use the same key twice.
- It requires key at least as long as the original message.
The first property is visible from careful reading of the definition. The definition states only what happen when the attacker has exactly one ciphertext available. It ignores any possibility that he found two ciphertexts encrypted by the same key.
Proving second property is not that difficult, at least if you know the right trick. Since it is such a nice exercise, I will not spoil it here. The box below states lemma you should try to prove if you take the challenge. It also links pdf with the answer for those who do not want to waste time with silly puzzles.
Theorem: If the cipher (E, D) over (K, M, C) has perfect secrecy, then |K|≥|M| e.g. the number of keys must be bigger than the number of messages.
Proof: available here (pdf)
Therefore, you have to transfer key as long as the message and you have to do it for each secret message. Of course, if you have secure channel able to transfer all those keys, chances are that you could have transferred secret messages through it too.
The core problem is that the definition of perfect secrecy does not take into account attacker abilities. The definition assumes that attacker knows everything there is to know except key and encrypted message. Real attackers are not that strong and they are not able to try all possible keys.
Semantic security is an attempt to define cipher security in more practical way. As before, the ciphertext should reveal no information about the original message except its length and the attacker can see only one encrypted message.
However, “reveal” is defined differently. It must be impossible to write reasonably fast computer program able to guess which message is encrypted in ciphertext. “Reasonably fast” rules out attempt to try all possible keys. That would be too slow.
Formal definition of semantic security involves two player game. One player is named challenger and the other is called adversary. Challenger encrypts messages and adversary is trying to break the cipher. Game rules:
- Adversary creates two messages of the same length and sends them to the challenger.
- Challenger generates random key.
- Challenger encrypts one of these messages and sends the cipher to the adversary.
- Adversary guesses which message was sent back.
If the adversary can guess which message was encrypted, then the cipher is not semantically secure. The adversary does not have to be correct all the time, he just must be better then a coin flip. His accuracy is called “semantic security advantage over the cipher under test”.
Definition: Let us describe two experiments EXP(0) and EXP(1). Each experiment EXP(b) requires a challenger and an adversary. The adversary tries to guess which experiment it is in and challenger knows that. They are allowed only following communication:
- Adversary creates two messages m0 and m1 of the same length and sends them to the challenger.
- Challenger generates random key.
- Challenger encrypts the message mb and sends E(k,mb) to the adversary.
- Adversary guesses b.
Semantic security advantage of the adversary A over the encryption E is
- Advss[A, E] = |Pr[W0] – Pr[W1]| ∈ [0, 1] where Wb = [A(EXP(b))=1] e.g. event Wb happens when adversary is in experiment b and guesses message 1.
Definition: The encryption E is semantically secure, if for all “efficient” A Advss[A, E] is negligible.
Otherwise said, the encryption E is semantically secure, if the adversary’s answer is not influenced by the experiment he is in. His chance to say “I’m in experiment 1″ is the same in experiment 1 and experiment 0.
Seemingly small difference between “objective probability” and “ability to write computer program” has one important consequence: the key does not have to be as long as the message. However, it still can be used only once and the attacker still gets to see only one encrypted message.
The attacker in this model knows what might be encrypted in the message. However, he is unable to use that knowledge and extract more information from the ciphertext. Therefore, the definition guarantees something like this: no matter how much the attacker knows about encrypted text, he is not going to learn anything new from the ciphertext.
Lastly, it is attacker who composes messages. Any semantically secure cipher provides security even if the attacker can compose part of the message you are going to send. It is impossible for him to write his part in a way that would compromise the rest of the message.
Chosen Plaintext Attack (CPA)
One time keys are not much practical, so the third definition of secrecy removes one-time-key-usage condition from the semantic security definition.
Challenger and adversary play the same game as previously. There is only one change: the attacker can compose as many messages as he wants to and see as may ciphertexts as he wants to. Of course, all ciphertexts are encrypted with the same key.
The adversary and challenger communicate in iterations. At each iteration, the adversary generates two messages m0 and m1 of equal length and sends them to the challenger. The challenger encrypts one of them and sends the cipher back. The challenger has two limitations:
- It has to use the same key for each iteration.
- It has to send back message with the same index at each iteration.
Adversary has to guess whether challenger sends back messages with index 0 or those with index 1. He does not have to be correct all the time, we only require him to have non-zero accuracy.
Define: Let (E,D) be a cipher over (K,M,C). For b=0,1 define EXP(b) as:
- Challenger generates random key.
- Adversary queries the challenger any number of times. In each iteration i:
- Adversary creates two messages mi,0 and mi,1 of the same length and sends them to the challenger.
- Challenger encrypts the message mi,b and sends E(k, mi,b) to the adversary.
- Adversary guesses b.
Define: A cipher (E,D) over (K,M,C) is secure under the chosen plaintext attack if the advantage of all “efficient” A:
- AdvPRF[A,E]=|Pr[W0]-Pr[W1]| is negligible.
Since event Wb happens when adversary is in experiment b and guesses message 1, the above definition can be rewritten as:
- AdvPRF[A,E]=|Pr[A(EXP(0))=1]-Pr[A(EXP(1))=1]| is negligible.
Otherwise said, the encryption E is CPA secure, if the adversary’s answer is not influenced by the experiment he is in. His chance to say “I’m in experiment 1″ is the same in experiment 1 and experiment 0.
The adversary in chosen plaintext attack definition is so powerful, that the encryption algorithm must be randomized in order to withstand his attacks. If you encrypt the same message twice, you should obtain two different ciphertexts. In short, cpa secure cipher can not be deterministic.
Claim: Let (E,D) be a cipher over (K,M,C). If E(k,m) is deterministic, then the cipher is unsecure under the chosen plaintext attack.
Proof: Adversary needs two queries to the challenger. First, adversary sends the same message twice e.g., m1,0=m1,1. Challenger encrypts one of these messages and sends back the ciphertext. It does not matter which message is encrypted, the ciphertext is the same in both cases.
Second query contains one copy of the same same message as before and one different message e.g., m2,0=m1,1 and m2,1≠m1,1.
If the challenger sends back same ciphertext as before, then the adversary knows he is in experiment 0. If the challenger sends back different ciphertext as before, then the adversary knows he is in experiment 1.
The fact that it is attacker who composes messages to be encrypted is more important then it might look at the first sight. Some ciphers are perfectly secure against passive attacker, but fail miserably if the attacker can compose messages.
As before, the attacker is unable to write part of the message in a way that would compromise the rest of it. Plus, no matter how much he knows about encrypted message, he is not going to learn anything new from the ciphertext.
From all three stated definitions of security, this one is the only suitable for disk encryption. Neither semantically secure cipher not perfectly secure cipher is good enough to encrypt files. At least, if you do not want to change the key after each disk write operation.
Last two definitions used terms “efficient” and “reasonably fast” to define adversary abilities. We left them undefined, so here goes one practical estimate: fast enough is faster than 290 operations.
If there is an attack that requires less then 290 operations, then the cipher is not secure.
Final Warning – Integrity
Strongest cipher security definition explained in this post is “chosen plaintext attack”. As far as I know, any cipher that satisfies this definition is able to keep your secrets secret.
Unfortunately, it still does not mean that such cipher is able to keep your data secure. While the attacker can not extract any new information from the ciphertext, he could be able to modify encrypted text. For example, he still could change your “yes” into “no” or modify account number inside an encrypted bill.
That brings back the context mentioned in the beginning of this post. A cipher perfectly secure in one context may very easily fail in another. Ciphers are designed to protect against well defined attacks and vulnerable to anything else.
Of course, there are ways to protect against these kind of attacks too, but that is beyond what I planned to write about.
Next part will be about pseudorandom generators. They are important, because all ciphers have to use one if they want to protect against chosen plaintext attack. There is no way around it.