As was concluded in the first part of this series, security without randomness is impossible. Deterministic ciphers are unable to protect against strong attackers and true random generators are impractical or hard to get, so cryptography is build on pseudorandom generators.
First two chapters of this post define what they are and explain what kind of pseudorandom generators secure cryptography needs. Third chapter introduces yet another way how to talk and think about pseudorandom generators and ciphers in general.
Pseudorandom Generator Definition
Pseudorandom generator (PRG) is function that generates random looking numbers. It takes small number as an input and generates huge, maybe even infinite, stream of numbers. The input is called a seed.
Definition: Pseudorandom generator (PRG) is an efficient function G:{0, 1}^{s} → {0, 1}^{n} where s ≪ n (s is much less than n)
- {0, 1}^{s} – seed space,
- {0, 1}^{n} – output.
Even function generating only zeros is pseudorandom generator according to this definition. Of course, using such pseudorandom generator for security is as good as using no pseudorandom generator.
Secure Pseudorandom Generator
Pseudorandom generator good enough to be used for security purposes is called secure pseudorandom generator. Unfortunately, all we can do is to define them. Finding secure pseudorandom generator or proving that a function is secure pseudorandom generator is hard problem. It is not even known whether they exist or not.
If you can prove that the generator is secure, than P≠NP.
This chapter shows two ways how to define them and proves that they are equivalent to each other.
Definition: Unpredictability
Pseudorandom generator is secure if it is unpredictable. If you are able to watch numbers coming out of the generator, then suddenly say “stop, next bit is going to be x” and be right, then you are not watching secure pseudorandom generator.
The generator counts as predictable if it is possible to guess the next bit with a non-zero accuracy. As usually, the prediction does not have to be correct all the time, it just needs to be better then a coin flip.
Definition: A pseudorandom generator G is predictable if exists an efficient algorithm A and a number i such that:
- Pr[A(G(k)|_{1,..,i}) = G(k)|_{i+1}] ≥ 1/2 + ε for some non-negligible ε.
where:
- |_{1,..,i} denotes prefix of length i
- |_{i+1} denotes i+1th bit.
Translation: The algorithm A takes first i output bits of the pseudogenerator G as an input and prints its i+1th bit with non zero accuracy.
Definition: A pseudorandom generator is unpredictable, if it is not predictable.
Definition: Statistical Test and its Advantage
The second way to define secure pseudorandom generator is longer and more complicated. It is also more practical for mathematical proofs constructions, so we have to explain it anyway.
According to this definition, a pseudorandom generator is secure if it is not possible to write a computer program that would distinguish between its outputs and “truly random” stream of numbers. That computer program would be given a lot of streams and then decide whether they are truly random or generated by the pseudorandom generator.
What is Statistical Test
First, we have to define statistical test. Statistical test is an efficient algorithm that takes a stream of numbers as an input and outputs either 0 or 1. It usually checks some property random streams are supposed to have and outputs 1 if input stream looks random.
For example, a statistical test may compare the number of zeros and the number of ones in its input. If they occur at similar rate, test will output 1.
Definition: Statistical test is an efficient function A: {0, 1}^{n} → {0, 1}. It takes a stream as a parameter and outputs 0 if it is not random and 1 if it is random.
As before, “efficient algorithm” means that it is possible to write corresponding reasonably fast computer program.
Using Statistical Test
We will use statistical tests to distinguish between a random stream and a pseudogenerators output.
Take a lot of random seeds and use them as input for the pseudorandom generator. Run the statistical test on all it outputs. Then take a lot of truly random streams and run the same statistical test on them.
If the statistical test behaves differently, then it is possible to use that test to distinguish between generators output and truly random streams and the pseudorandom generator is bad.
Quality of Statistical Test
We will use something called “the advantage of a statistical test over a pseudorandom generator” to measure how good the statistical test is. The standard way to compute it is to calculate a difference of two probabilities:
- A probability that the statical test says 1 on the pseudorandom generator output for a random seed k.
- A probability that the statistical test says 1 on the truly random stream.
The higher the advantage is, the better the statistical tests is in distinguishing between random and pseudorandom stream. It is a real number between 0 and 1:
- 1 – distinguish perfectly,
- 0 – unable to distinguish.
Definition: Let G: K → {0, 1}^{n} be a pseudorandom generator and A a statistical test on {0, 1} ^{n}. The advantage of G over A is:
- Adv_{PRG}[A, G] := |Pr_{k →r K}[A(G(k))=1] – Pr_{r →r {0, 1}n}[A(r)=1]| ∈ [0, 1].
Secure Pseudorandom Generator
Finally, we are ready to define secure pseudorandom generator. A pseudorandom generator is secure, if the advantage of all efficient statistical tests is negligible e.g, very very small. There must be no statistical test that would allow us to distinguish between the output of pseudorandom generator and a truly random stream.
Definition: the pseudorandom generator G: K -> {0,1}^{n} is a secure PRG if the advantage of all efficient statistical tests is negligible:
- ∀ A: Adv_{PRG}[A,G] is negligible.
Unpredictability vs Statistical Tests
Both definitions of secure randomness are equal. Any secure pseudorandom generator is unpredictable and any unpredictable pseudorandom generator is secure.
If you are able to predict the next bit of output, then you are able to create good statistical test. The test would read the beginning of the stream and predict the predictable bit. It would output 1 if the prediction is correct and 0 if it is not.
As the prediction is somewhat accurate, the statistical test is somewhat able to distinguish between pseudogenerators outputs and truly random streams.
Theorem: A secure pseudorandom generator is unpredictable.
Proof: We prove that if the pseudorandom generator G is predictable, then it is insecure. Suppose that A is an efficient algorithm able to predict some bit of G output:
- Pr[A(G(k)|_{1..i}) = G(k)|_{i+1}] ≤ 1/2+ε for some non-negligible ε.
We define a statistical test B as a function that runs A and returns 1 iff A was correct:
- B(x)=1 iff A(x|_{1..i})=x_{i+1}
We will prove that if A can predict the pseudorandom generator G, then B can distinguish it from random and G is not secure.
The advantage of B is Adv
_{PRG}[B, G] := |Pr[B(G(k))=1] – Pr[B(r)=1]| = |1/2 – (1/2 + ε)| = |ε| where ε is non-negligible. Therefore the the advantage of B over G is non-negligible. B can distinguish the generator G from random with an advantage epsilon.
The second claim, the one that says unpredictable pseudorandom generator is secure was left as an exercise.
Theorem: An unpredictable pseudorandom generator is secure.
Proof-puzzle: Hint: If the pseudorandom generator G is unpredictable ∀ i ∈ {0, …, n-1}, then G is a secure pseudorandom generator.
Computationally Indistinguishable
The idea of statistical test can be generalized and used to compare any two streams of numbers. The generalization is what makes the second definition of secure pseudorandom generator useful. It allows us to analyze ciphertexts and other otherwise hard to analyze distribution of numbers.
Definition
Two distributions are computationally indistinguishable if it is not possible to write a computer program that would distinguish between them. That computer program would be given a lot of streams and then decide whether they came from one distribution or the other.
The formal definition is exactly the same as in the previous chapter. Define a statistical test as a function that measures something and outputs 0 or 1.
Define the advantage of the statistical test as a difference of two probabilities:
- A probability that the statical test says 1 on a stream generated by the first distribution.
- A probability that the statical test says 1 on a stream generated by the second distribution.
Two distributions are computationally indistinguishable, if the advantage of all efficient statistical tests is negligible. There must be no statistical test that would allow us to distinguish between those two distributions.
Definition: Let P1 and P2 be two distributions over {0, 1}^{n}. P1 and P2 are computationally indistinguishable (denoted P1 ≈ P2) if for all statistical tests A:
- |Pr_{p1 ← P1}[A(p1)=1] – Pr_{p2 ← P1}[A(p2)=1]| is negligible.
Yet Another Definition of Secure Pseudorandom Generator
This allows as to create a third equivalent definition of secure pseudorandom generator. A pseudorandom generator is secure, if its outputs are computationally indistinguishable from truly random streams.
Definition: A pseudorandom generator is secure, if {k ←^{R}: G(k)} ≈_{p} uniform({0, 1} ^{n}).
Yet Another Definition of Semantically Secure Cipher
We can also redefine a semantically secure cipher. The cipher is semantically secure, if the next two sets are computationally indistinguishable:
- the set of all possible ciphertexts corresponding to one message and
- the set of all possible ciphertexts corresponding to any other message.
Definition: The encryption E is semantically secure, if ∀ m
_{0}, m
_{1} ∈ M: {E(k, m
_{0})} ≈
_{p} {E(k, m
_{1})} e.g. the two distributions are computationally equivalent.
The previous result is very important. It basically says, that no computer program is able to crack ciphertext encrypted with a semantically secure cipher. In other words, if we are willing to change the key often, semantically secure ciphers are all we need.
Next …
Next part will explain how to build ciphers from pseudorandom generators.
Web Application Security; How to Minimize Prevalent Risk of AttacksVulnerabilities in web applications are now the largest vector of enterprise security attacks. Stories about exploits that compromise sensitive data frequently mention culprits such as cross-site scripting, SQL injection, and buffer overflow. Vulnerabilities like these fall often outside the traditional expertise of network security managers. |