title | filename | chapternum |
---|---|---|
Zero Knowledge proofs | lec_14_zero_knowledge | 13 |
The notion of proof is central to so many fields. In mathematics, we want to prove that a certain assertion is correct. In other sciences, we often want to accumulate a preponderance of evidence (or statistical significance) to reject certain hypotheses. In criminal law the prosecution famously needs to prove its case "beyond a reasonable doubt". Cryptography turns out to give some new twists on this ancient notion.
Typically a proof that some assertion X is true, also reveals some information about why X is true.
When Hercule Poirot proves that Norman Gale killed Madame Giselle, he does so by showing how Gale committed the murder by dressing up as a flight attendant and stabbing Madame Giselle with a poisoned dart.
Could Hercule convince us beyond a reasonable doubt that Gale did the crime without giving any information on how the crime was committed?
Can the Russians prove to the U.S. that a sealed box contains an authentic nuclear warhead without revealing anything about its design?
Can I prove to you that the number
Zero knowledge proofs are proofs that fully convince that a statement is true without yielding any additional knowledge.
So, after seeing a zero knowledge proof that
::: { .pause } This chapter will rely on the notion of NP completeness, as well as the view of NP as proof systems. For a review of this notion, please see this chapter of my introduction to TCS text. :::
Before we talk about how to achieve zero knowledge, let us discuss some of its potential applications:
The United States and Russia have reached a dangerous and expensive equilibrium where each has about 7000 nuclear warheads, much more than is needed to decimate each other's population (and the population of much of the rest of the world).1 Having so many weapons increases the chance of "leakage" of weapons, or of an accidental launch (which can result in an all out war) through a fault in communications or a rogue commander. This also threatens the delicate balance of the Non-Proliferation Treaty, which at its core is a bargain where non-weapons states agree not to pursue nuclear weapons and the five nuclear weapon states agree to make progress on nuclear disarmament. These huge quantities of nuclear weapons are not only dangerous but also extremely expensive to maintain.
For all of these reasons, in 2009 U.S. President Obama called for setting a "world without nuclear weapons" as a long term goal, and in 2012 he spoke concretely about talking to Russia about reducing "not only our strategic nuclear warheads, but also tactical weapons and warheads in reserve". On the other side, Russian President Putin said as early as 2000 that he sees "no obstacles that could hamper future deep cuts of strategic offensive armaments". (Though as of 2018, political winds on both sides have shifted away from disarmament and more toward armament.)
There are many reasons why progress on nuclear disarmament has been so slow, and most of them have nothing to do with zero knowledge or any other piece of technology. But there are some technical hurdles as well. One of those hurdles is that for the U.S. and Russia to go beyond restricting the number of deployed weapons to significantly reducing the stockpiles, they need to find a way for one country to verifiably prove that it has dismantled warheads. As mentioned in my work with Glaser and Goldston (see also this page), a key stumbling block is that the design of a nuclear warhead is of course highly classified and about the last thing in the world that the U.S. would like to share with Russia and vice versa. So, how can the U.S. convince the Russians that it has destroyed a warhead, when it cannot let Russian experts anywhere near it?
Electronic voting has been of great interest for many reasons. One potential advantage is that it could allow completely transparent vote counting, where every citizen could verify that the votes were counted correctly. For example, Chaum suggested an approach to do so by publishing an encryption of every vote and then having the central authority prove that the final outcome corresponds to the counts of all the plaintexts. But of course to maintain voter privacy, we need to prove this without actually revealing those plaintexts. Can we do so?
I chose these two examples above precisely because they are hardly the first that come to mind when thinking about zero knowledge.
Zero knowledge has been used for many cryptographic applications.
One such application (originating from work of Fiat and Shamir) is the use for identification protocols.
Here Alice knows a solution
Another very generic application is for "compiling protocols". As we've seen time and again, it is often much easier to handle passive adversaries than active ones. (For example, it's much easier to get CPA security against the eavesdropping Eve than CCA security against the person-in-the-middle Mallory.) Thus it would be wonderful if we could "compile" a protocol that is secure with respect to passive attacks into one that is secure with respect to active ones. As was first shown by Goldreich, Micali, and Wigderson, zero knowledge proofs yield a very general such compiler. The idea is that all parties prove in zero knowledge that they follow the protocol's specifications. Normally, such proofs might require the parties to reveal their secret inputs, hence violating security, but zero knowledge precisely guarantees that we can verify correct behaviour without access to these inputs.
So, zero knowledge proofs are wonderful objects, but how do we get them? In fact, we haven't answered the even more basic question of how we define zero knowledge. We have to start with the most basic task of defining what we mean by a proof.
A proof system can be thought of as an algorithm
- In Euclidean geometry, statements are geometric facts such as "in any triangle the angles sum to 180 degrees" and the proofs are step by step derivations of the statements from the five basic postulates.
- In Zermelo-Fraenkel + Axiom of Choice (ZFC) a statement is some purported fact about sets (e.g., the Riemann Hypothesis3), and a proof is a step by step derivation of it from the axioms.
- We can define many other "theories". For example, a theory where the statements are pairs $(x,m)$ such that $x$ is a quadratic residue modulo $m$ and a proof for $x$ is the number $s$ such that $x=s^2 \pmod{m}$, or a theory where the theorems are Hamiltonian graphs $G$ (graphs on $n$ vertices that contain an $n$-long cycle) and the proofs are the description of the cycle.
All these proof systems have the property that the verifying algorithm
To achieve the notion of zero knowledge proofs, Goldwasser and Micali had to consider a generalization of proofs from static sequences of symbols to interactive probabilistic protocols between a prover and a verifier. Let's start with an informal example. The vast majority of humans have three types of cone cells in their eyes. The reason why we perceive the sky as blue (see also this), despite its color being quite a different spectrum than the blue of the rainbow, is that the projection of the sky's color to our cones is closest to the projection of blue. It has been suggested that a tiny fraction of the human population might have four functioning cones (in fact, only women, as it would require two X chromosomes and a certain mutation). How would a person prove to another that she is in fact such a tetrachromat?
Proof of tetrachromacy:
Suppose that Alice is a tetrachromat and can distinguish between the colors of two pieces of plastic that would be identical to a trichromat. She wants to prove to a trichromat Bob that the two pieces are not identical. She can do this as follows:
Alice and Bob will repeat the following experiment
If Alice is successful in all of the
A similar "proof" inspired the influential notion of hypothesis testing in statistics. Dr. Muriel Bristol said that she prefers the taste of tea when the milk is put first into the cup and tea later, rather than vice versa. The statistician Ronald Fisher did not believe her. William Roach (like Bristol, a chemist, and her future husband) proposed a probabilistic test, whereby eight cups would be poured for Bristol, each randomly chosen to either be "milk first" or "tea first". Bristol correctly identified all 8 cups. Pondering this experiment, and the level of confidence with which it allowed rejecting the "null hypothesis" that Bristol simply guessed randomly, led to Fisher's development of hypothesis testing and the now ubiquitous "$p$ values".
We now consider a more "mathematical" example along similar lines.
Recall that if
- We have two parties: Alice and Bob. The common input is $(m,x)$ and Alice wants to convince Bob that $NQR(m,x)=1$. (That is, that $x$ is not a quadratic residue modulo $m$.)
- We assume that Alice can compute $NQR(m,w)$ for every $w\in \{0,\ldots,m-1\}$ but Bob is polynomial time.
- The protocol will work as follows:
    - Bob will pick some random $s\in \Z^*_m$ (e.g., by picking a random number in $\{1,\ldots,m-1\}$ and discarding it if it has nontrivial g.c.d. with $m$) and toss a coin $b\in\{0,1\}$. If $b=0$ then Bob will send $s^2 \pmod{m}$ to Alice and otherwise he will send $xs^2 \pmod{m}$ to Alice.
    - Alice will use her ability to compute $NQR(m,\cdot)$ to respond with $b'=0$ if Bob sent a quadratic residue and with $b'=1$ otherwise.
    - Bob accepts the proof if $b=b'$.
To see that Bob will indeed accept the proof, note that if
Moreover, if
Please stop and make sure you see the similarities between this protocol and the one for demonstrating that the two pieces of plastic do not have identical colors.
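The protocol above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the modulus is tiny, `is_qr` brute-forces all units (standing in for Alice's unbounded computation of $NQR(m,\cdot)$), and the names `is_qr`, `random_unit`, `qnr_round`, and `qnr_proof` are ours rather than part of the text.

```python
import math
import random


def is_qr(m, w):
    """Brute-force check that w is a square of some unit modulo m.

    Exponential in the bit length of m; it plays the role of the
    computationally unbounded prover Alice.
    """
    return any(pow(s, 2, m) == w % m
               for s in range(1, m) if math.gcd(s, m) == 1)


def random_unit(m, rng):
    """Sample from Z*_m by rejection, as described in the protocol."""
    while True:
        s = rng.randrange(1, m)
        if math.gcd(s, m) == 1:
            return s


def qnr_round(m, x, rng):
    """One round: returns True iff Bob accepts Alice's answer."""
    # Bob: pick s and a coin b, send s^2 or x*s^2.
    s = random_unit(m, rng)
    b = rng.randrange(2)
    msg = pow(s, 2, m) if b == 0 else (x * pow(s, 2, m)) % m
    # Alice: answer 0 if the message is a residue, 1 otherwise.
    b_prime = 0 if is_qr(m, msg) else 1
    # Bob: accept iff Alice recovered his coin.
    return b == b_prime


def qnr_proof(m, x, rounds=20, seed=0):
    """Repeat the round several times; accept only if every round accepts."""
    rng = random.Random(seed)
    return all(qnr_round(m, x, rng) for _ in range(rounds))
```

For instance, modulo $21$ the squares of units are $\{1,4,16\}$, so `qnr_proof(21, 2)` accepts in every round, whereas for a residue such as $x=4$ Bob's message is a residue regardless of his coin and Alice can do no better than guess it.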
Let us now make the formal definition:
::: {.definition title="Proof systems" #proofsystemdef}
Let
- Completeness: If $f(x)=1$ then, if $P$ and $V$ are given input $x$ and interact, at the end of the interaction $V$ will output `Accept` with probability at least $0.9$.
- Soundness: If $f(x)=0$ then for any arbitrary (efficient or non efficient) algorithm $P^*$, if $P^*$ and $V$ are given input $x$ and interact then at the end $V$ will output `Accept` with probability at most $0.1$.
:::
In many texts proof systems are defined with respect to languages as opposed to functions. That is, instead of talking about a function
Note that we don't necessarily require the prover to be efficient (and indeed, in some cases it might not be).
On the other hand, our soundness condition holds even if the prover uses a non efficient strategy.4
We say that a proof system has an efficient prover if there is an NP-type proof system
Up until now, we always considered cryptographic protocols where Alice and Bob trusted one another, but were worried about some adversary controlling the channel between them. Now we are in a somewhat more "suspicious" setting where the parties do not fully trust one another. In such protocols there is always a "prescribed" or honest strategy that a particular party should follow, but we generally don't want the other parties' security to rely on someone else's good intention, and hence analyze also the case where a party uses an arbitrary malicious strategy. We sometimes also consider the honest but curious case where the adversary is passive and only collects information, but does not deviate from the prescribed strategy.
Protocols typically only guarantee security for party A when it behaves honestly; a party can always choose to violate its own security and there is not much we can (or should?) do about it.
So far we merely defined the notion of an interactive proof system, but we need to define what it means for a proof to be zero knowledge.
Before we attempt a definition, let us consider an example.
Going back to the notion of quadratic residuosity, suppose that
Protocol ZK-QR: Public input for Alice and Bob:
- Alice will pick a random $s'$ and send to Bob $x' = xs'^2 \pmod{m}$.
- Bob will pick a random bit $b\in\{0,1\}$ and send $b$ to Alice.
- If $b=0$ then Alice reveals $ss'$, hence giving out a root for $x'$; if $b=1$ then Alice reveals $s'$, hence showing a root for $x'x^{-1}$.
- Bob checks that the value $s''$ revealed by Alice is indeed a root of $x'x^{-b}$, if so then it "accepts" the proof.
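As a rough sketch of a single honest round of this protocol, the following Python code runs both parties in one function. The toy modulus, the helper names (`random_unit`, `zkqr_round`), and the use of Python's `random` module instead of a cryptographic source of randomness are all assumptions made for illustration; `pow(x, -1, m)` needs Python 3.8 or later and assumes $x$ is invertible modulo $m$.

```python
import math
import random


def random_unit(m, rng):
    """Sample a uniformly random element of Z*_m by rejection."""
    while True:
        s = rng.randrange(1, m)
        if math.gcd(s, m) == 1:
            return s


def zkqr_round(m, x, s, rng):
    """One honest round; s is Alice's secret with s*s = x (mod m).

    Returns True iff Bob accepts.
    """
    # Alice: blind x by a random square and send x'.
    s_prime = random_unit(m, rng)
    x_prime = (x * pow(s_prime, 2, m)) % m
    # Bob: send a random challenge bit.
    b = rng.randrange(2)
    # Alice: reveal a root of x' (b = 0) or of x' * x^{-1} (b = 1).
    s_dd = (s * s_prime) % m if b == 0 else s_prime
    # Bob: check that s'' is a root of x' * x^{-b}.
    target = x_prime if b == 0 else (x_prime * pow(x, -1, m)) % m
    return pow(s_dd, 2, m) == target
```

With $m=77$, $s=9$ and $x=81 \bmod 77=4$, every call to `zkqr_round(77, 4, 9, rng)` accepts, whichever bit Bob happens to pick.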
If
On the other hand, we claim that we didn't really reveal anything about
To define zero knowledge mathematically we follow the following intuition:
A proof system is zero knowledge if the verifier did not learn anything after the interaction that he could not have learned on his own.
Despite the name "zero knowledge", we do not claim that the verifier does not know anything about the private input
Here is how we formally define zero knowledge:
::: {.definition title="Zero knowledge proofs" #zkpdef}
A proof system
- The output of $V^*$ after interacting with $P$ on input $x$.
- The output of $S^*$ on input $x$.
:::
That is, we can show the verifier does not gain anything from the interaction, because no matter what algorithm $V^*$ he uses, whatever he learned as a result of interacting with the prover, he could have just as equally learned by simply running the standalone algorithm $S^*$ on the same input.
::: {.remark title="The simulation paradigm" #simulationrem} The natural way to define security is to say that a system is secure if some "laundry list" of bad outcomes X,Y,Z can't happen. The definition of zero knowledge is different. Rather than giving a list of the events that are not allowed to occur, it gives a maximalist simulation condition.
At its heart the definition of zero knowledge says the following: clearly, we cannot prevent the verifier from running an efficient algorithm
This simulation paradigm has become the standard way to define security of a great many cryptographic applications. That is, we bound what an adversary Eve can learn by postulating some hypothetical adversary Lilith that is under much harsher conditions (e.g., does not get to interact with the prover) and ensuring that Eve cannot learn anything that Lilith couldn't have learned either. This has the advantage of being the most conservative definition possible, and also of phrasing security in positive terms (there exists a simulation) as opposed to the typical negative terms (events X,Y,Z can't happen). Since it's often easier for us to think in positive terms, paradoxically this stronger security condition is sometimes easier to prove. Zero knowledge is in some sense the simplest setting of the simulation paradigm and we'll see it time and again in dealing with more advanced notions. :::
The definition of zero knowledge is confusing since intuitively if the verifier gained confidence that the statement is true then surely he must have learned something. This is another one of those cases where cryptography is counterintuitive. To understand it better, it is worthwhile to see the formal proof that the protocol above for quadratic residuosity is zero knowledge:
Protocol ZK-QR above is a zero knowledge protocol.
::: {.proof data-ref="zkqrthm"}
Let
- $V_1(x,m,x')$ outputs the bit $b$ that Bob chooses on input $x,m$ and after Alice's first message is $x'$.
- $V_2(x,m,x',s'')$ is whatever Bob outputs after seeing Alice's response $s''$ to the bit $b$.
Both
The simulator
1. Pick $b'\leftarrow_R\{0,1\}$.
2. Pick $s''$ at random in $\Z^*_m$. If $b'=0$ then let $x'={s''}^2 \pmod{m}$. Otherwise let $x'=x{s''}^2 \pmod{m}$.
3. Let $b=V_1(x,m,x')$. If $b \neq b'$ then go back to step 1.
4. Output $V_2(x,m,x',s'')$.
The correctness of the simulator follows from the following claims (all of which assume that
Claim 1: The distribution of
Claim 2: With probability at least
Claim 3: Conditioned on
Together these three claims imply that in expectation $S^*$ only invokes $V_1$ and $V_2$ a constant number of times (since every time it goes back to step 1 with probability at most $1/2$).
They also imply that the output of $S^*$ is in fact identical to the output of
Proof of Claim 1: In both cases,
Proof of Claim 2: This is a corollary of Claim 1; since the distribution of
Proof of Claim 3: This follows from a direct calculation. The value
Together these complete the proof of the theorem. :::
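The rewinding simulator from the proof can be sketched as follows. Here `V1` and `V2` are passed in as plain Python functions with the signatures used in the proof; the function names `simulate` and `random_unit`, the toy modulus in the usage note, and the use of Python's `random` module are assumptions for illustration only. Note that the simulator never touches the secret root $s$.

```python
import math
import random


def random_unit(m, rng):
    """Sample a uniformly random element of Z*_m by rejection."""
    while True:
        s = rng.randrange(1, m)
        if math.gcd(s, m) == 1:
            return s


def simulate(m, x, V1, V2, rng):
    """Simulate an interaction with a (possibly malicious) verifier.

    V1(x, m, x_prime) returns the verifier's challenge bit and
    V2(x, m, x_prime, s_dd) returns the verifier's final output.
    """
    while True:
        # Step 1: guess the challenge in advance.
        b_guess = rng.randrange(2)
        # Step 2: choose the response first, then a first message
        # that this response will correctly answer if the guess is right.
        s_dd = random_unit(m, rng)
        if b_guess == 0:
            x_prime = pow(s_dd, 2, m)
        else:
            x_prime = (x * pow(s_dd, 2, m)) % m
        # Step 3: ask the verifier for its challenge; rewind on a wrong guess.
        b = V1(x, m, x_prime)
        if b != b_guess:
            continue
        # Step 4: output whatever the verifier outputs.
        return V2(x, m, x_prime, s_dd)
```

For example, with a verifier whose challenge is the parity of $x'$ and whose output is the whole transcript, every transcript returned by `simulate` passes Bob's check that $s''$ is a root of $x'x^{-b}$, even though no root of $x$ was supplied.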
zkqrthm{.ref} is interesting but not yet good enough to guarantee security in practice.
After all, the protocol that we really need to show is zero knowledge is the one where we repeat this procedure
We now show a proof for another language.
Suppose that Alice and Bob know an
Protocol ZK-Ham:
- Common input: graph $H$ (in the form of an $n\times n$ adjacency matrix). Alice's private input: a Hamiltonian cycle $C=(C_1,\ldots,C_n)$, which are distinct vertices such that $(C_\ell,C_{\ell+1})$ is an edge in $H$ for all $\ell\in\{1,\ldots,n-1\}$ and $(C_n,C_1)$ is an edge as well. Below we assume that $G:\{0,1\}^n \rightarrow\{0,1\}^{3n}$ is a pseudorandom generator.
- Bob chooses a random string $z\in \{0,1\}^{3n}$.
- Alice chooses a random permutation $\pi$ on $\{1,\ldots, n\}$ and lets $M$ be the $\pi$-permuted adjacency matrix of $H$ (i.e., $M_{\pi(i),\pi(j)}=1$ iff $(i,j)$ is an edge in $H$). For every $i,j$, Alice chooses a random string $x_{i,j} \in \{0,1\}^n$ and lets $y_{i,j}=G(x_{i,j})\oplus M_{i,j}z$. She sends $\{ y_{i,j} \}_{i,j \in [n]}$ to Bob.
- Bob chooses a bit $b\in\{0,1\}$.
- If $b=0$ then Alice sends out $\pi$ and the strings $\{ x_{i,j} \}$ for all $i,j$; if $b=1$ then Alice sends out the $n$ strings $x_{\pi(C_1),\pi(C_2)},\ldots,x_{\pi(C_n),\pi(C_1)}$ together with their indices.
- If $b=0$ then Bob computes $M$ to be the $\pi$-permuted adjacency matrix of $H$ and verifies that all the $y_{i,j}$'s were computed from the $x_{i,j}$'s appropriately. If so then Bob accepts the proof, and otherwise he rejects it. If $b=1$ then Bob verifies that the indices of the strings $\{ x_{i,j} \}$ sent by Alice form a cycle and that indeed $y_{i,j}=G(x_{i,j})\oplus z$ for every string $x_{i,j}$ that was sent by Alice. If so then Bob accepts the proof and otherwise he rejects it.
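A minimal sketch of one honest round of the protocol above, with both parties run inside a single function, might look as follows. Several things here are assumptions for illustration rather than part of the protocol as stated: SHAKE-256 is used as a heuristic stand-in for the pseudorandom generator $G$, seeds are 16 bytes long, vertices are numbered from $0$, and all function names are ours.

```python
import hashlib
import random
import secrets

N_BYTES = 16  # seed length; G stretches it to 3 * N_BYTES (assumed sizes)


def G(x):
    """Heuristic stand-in for a length-tripling pseudorandom generator."""
    return hashlib.shake_256(x).digest(3 * len(x))


def xor(a, b):
    return bytes(p ^ q for p, q in zip(a, b))


def commit(bit, z):
    """Commit to a bit: y = G(x) if bit == 0, and G(x) xor z if bit == 1."""
    x = secrets.token_bytes(N_BYTES)
    y = G(x) if bit == 0 else xor(G(x), z)
    return x, y


def check_opening(bit, x, y, z):
    return y == (G(x) if bit == 0 else xor(G(x), z))


def permuted_matrix(H, pi):
    """M[pi[i]][pi[j]] = H[i][j]."""
    n = len(H)
    M = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            M[pi[i]][pi[j]] = H[i][j]
    return M


def zk_ham_round(H, cycle, b):
    """Run one honest round with Bob's challenge fixed to b; True iff Bob accepts."""
    n = len(H)
    z = secrets.token_bytes(3 * N_BYTES)          # Bob's first message
    pi = list(range(n))
    random.SystemRandom().shuffle(pi)             # Alice's random permutation
    M = permuted_matrix(H, pi)
    pairs = [[commit(M[i][j], z) for j in range(n)] for i in range(n)]
    X = [[p[0] for p in row] for row in pairs]    # openings, kept by Alice
    Y = [[p[1] for p in row] for row in pairs]    # commitments, sent to Bob
    if b == 0:
        # Alice reveals pi and every opening; Bob recomputes M himself.
        M_bob = permuted_matrix(H, pi)
        return all(check_opening(M_bob[i][j], X[i][j], Y[i][j], z)
                   for i in range(n) for j in range(n))
    # b == 1: Alice opens only the n entries along the permuted cycle.
    edges = [(pi[cycle[k]], pi[cycle[(k + 1) % n]]) for k in range(n)]
    forms_cycle = (sorted(u for u, _ in edges) == list(range(n)) and
                   all(edges[k][1] == edges[(k + 1) % n][0] for k in range(n)))
    return forms_cycle and all(check_opening(1, X[u][v], Y[u][v], z)
                               for u, v in edges)
```

On a 4-vertex graph containing the cycle `[0, 1, 2, 3]`, both `zk_ham_round(H, cycle, 0)` and `zk_ham_round(H, cycle, 1)` accept. In a real execution Bob would of course choose $b$ himself after seeing the commitments, and Alice would learn it only then.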
Protocol ZK-Ham is a zero knowledge proof system for the language of Hamiltonian graphs.5
::: {.proof data-ref="zkhamthm"} We need to prove completeness, soundness, and zero knowledge.
Completeness can be easily verified, and so we leave this to the reader.
For soundness, we recall that (as we've seen before) with extremely high probability the sets
We split into two cases.
The first case is that there exists some permutation
We now turn to showing zero knowledge. For this we need to build a simulator $S^*$ for an arbitrary efficient strategy $V^*$ of Bob. Recall that $S^*$ gets as input the graph $H$ (but not the Hamiltonian cycle $C$) and needs to produce an output that is indistinguishable from the output of $V^*$. It will do so as follows:
0. Pick $b'\in\{0,1\}$.
1. Let $z\in \{0,1\}^{3n}$ be the first message computed by $V^*$ on input $H$.
2. If $b'=0$ then $S^*$ computes the second message as Alice does: it chooses a random permutation $\pi$ on $\{1,\ldots, n\}$ and lets $M$ be the $\pi$-permuted adjacency matrix of $H$ (i.e., $M_{\pi(i),\pi(j)}=1$ iff $(i,j)$ is an edge in $H$). In contrast, if $b'=1$ then $S^*$ lets $M$ be the all $1$s matrix. For every $i,j$, $S^*$ chooses a random string $x_{i,j} \in \{0,1\}^n$ and lets $y_{i,j}=G(x_{i,j})\oplus M_{i,j}z$, where $G:\{0,1\}^n\rightarrow\{0,1\}^{3n}$ is a pseudorandom generator.
3. Let $b$ be the output of $V^*$ when given the input $H$ and the first message $\{ y_{i,j} \}$ computed as above. If $b\neq b'$ then go back to step 0.
4. We compute the fourth message of the protocol similarly to how Alice does it: if $b=0$ then it consists of $\pi$ and the strings $\{ x_{i,j} \}$ for all $i,j$; if $b=1$ then we pick a random length-$n$ cycle $C'$ and the message consists of the $n$ strings $x_{C'_1,C'_2},\ldots,x_{C'_n,C'_1}$ together with their indices.
5. Output whatever $V^*$ outputs when given the prior message.
We prove the output of the simulator is indistinguishable from the output of
Claim 1: The message
Claim 2: The probability that
Claim 3: The fourth message computed by
We will simply sketch here the proofs (see Goldreich's book for example for full proofs):
For Claim 1, note that if
Claim 2 is a corollary of Claim 1. If $V^*$ managed to pick a message $b$ such that $\Pr[ b=b' ] < 1/2 - negl(n)$ then in particular it could distinguish the first message of Alice (that is computed independently of $b'$ and hence contains no information about it) from the first message of $S^*$.
For Claim 3, note that again if
This completes the proof of the theorem. :::
The reason that a protocol for Hamiltonicity is more interesting than a protocol for quadratic residuosity is that Hamiltonicity is an NP-complete problem. Specifically recall the following:
- A function $F:\{0,1\}^* \rightarrow \{0,1\}$ is in NP if there exists a polynomial-time algorithm $V_F$ and some integer $c$ such that for every $x\in \{0,1\}^*$, $F(x)=1$ iff there exists $y \in \{0,1\}^{|x|^c}$ such that $V_F(x,y)=1$. Many functions of interest in all areas of math, science, engineering, and more are in the class NP.
- Let $HAM:\{0,1\}^* \rightarrow \{0,1\}$ be the function that maps a graph $G$ to $1$ if and only if $G$ contains a Hamiltonian cycle. Then $HAM \in NP$. Indeed, this is demonstrated by the function $V_{HAM}$ such that $V_{HAM}(G,C)=1$ iff $C$ is a Hamiltonian cycle in the graph $G$.
- The function $HAM$ is NP-complete. Specifically for every $F,V_F$ as above, there are efficiently computable functions $r, r_{Encode}, r_{Decode}$ that satisfy the following:
    a. (Completeness of reduction.) For every $x,y$ such that $V_F(x,y)=1$, $V_{HAM}( r(x), r_{Encode}(x,y))=1$. In particular this means that for every $x$ such that $F(x)=1$, $HAM(r(x))=1$. (Can you see why?)
    b. (Soundness of reduction.) For every $x \in \{0,1\}^*$, if there exists $C$ such that $V_{HAM}(r(x),C)=1$ then $V_F(x,r_{Decode}(x,C))=1$. In particular this means that for every $x$ such that $HAM(r(x))=1$, $F(x)=1$. (Can you see why?)
Using the reduction above, we can transform the zero-knowledge proof for Hamiltonicity into a zero knowledge proof for every
- Public input: $x$. Prover's private input: $y$ such that $V_F(x,y)=1$.
- Verifier and prover will compute $G=r(x)$. Prover will compute $C=r_{Encode}(x,y)$.
- Verifier and prover run the Hamiltonicity zero knowledge protocol, with public input $G$ and prover's private input $C$. The verifier's output is the output in this protocol.
::: { .pause }
Please make sure that you understand why this will give a zero knowledge proof for
Note that while the NP completeness of Hamiltonicity (and the Cook-Levin Theorem in general) is usually perceived as a negative result (showing evidence for the non-existence of an algorithm), in this context we use it to obtain a positive result (zero knowledge proof systems for many interesting functions). :::
This means that for every other NP language
- The language of numbers $m$ such that there exists a prime $p$ dividing $m$ whose remainder modulo $10$ is $7$.
- The language of tuples $X,e,c_1,\ldots,c_n$ such that $c_i$ is an encryption of a number $x_i$ with $\sum x_i = X$. (This is essentially what we needed in the voting example above.)
- For every efficient function $G$, the language of pairs $x,y$ such that there exists some input $r$ satisfying $y=G(x|r)$. (This is what we often need in the "protocol compiling" applications to show that a particular output was produced by the correct program $G$ on public input $x$ and private input $r$.)
While we talked about amplifying zero knowledge proofs by running them
However, Fiat and Shamir showed that in protocols (such as the ones we showed here) where the verifier only sends random bits, if we replace this verifier by a random function then both soundness and zero knowledge are preserved.
This suggests a non-interactive version of these protocols in the random oracle model, and this is indeed widely used.
Schnorr designed signatures based on this non interactive version.
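To make the idea concrete, here is a hedged sketch of how the Fiat-Shamir transformation could be applied to $k$ parallel repetitions of the quadratic residuosity protocol ZK-QR from earlier in this chapter. The challenge bits are derived by hashing the statement together with all first messages, with SHAKE-256 standing in heuristically for the random function. The function names, the encoding fed to the hash, and the toy modulus in the usage note are assumptions of this sketch, not a standardized scheme.

```python
import hashlib
import math
import secrets


def random_unit(m):
    """Sample a uniformly random element of Z*_m by rejection."""
    while True:
        s = secrets.randbelow(m - 1) + 1
        if math.gcd(s, m) == 1:
            return s


def challenge_bits(m, x, commitments, k):
    """Derive k challenge bits from the statement and the first messages."""
    data = repr((m, x, commitments)).encode()
    digest = hashlib.shake_256(data).digest((k + 7) // 8)
    return [(digest[i // 8] >> (i % 8)) & 1 for i in range(k)]


def prove(m, x, s, k=64):
    """Non-interactive proof that x is a residue, given s with s*s = x (mod m)."""
    s_primes = [random_unit(m) for _ in range(k)]
    commitments = [(x * pow(sp, 2, m)) % m for sp in s_primes]
    bits = challenge_bits(m, x, commitments, k)
    responses = [(s * sp) % m if b == 0 else sp
                 for sp, b in zip(s_primes, bits)]
    return commitments, responses


def verify(m, x, proof, k=64):
    """Recompute the challenges and check each response as Bob would."""
    commitments, responses = proof
    if len(commitments) != k or len(responses) != k:
        return False
    bits = challenge_bits(m, x, commitments, k)
    x_inv = pow(x, -1, m)  # requires Python 3.8+ and gcd(x, m) = 1
    return all(pow(r, 2, m) == (c * pow(x_inv, b, m)) % m
               for c, r, b in zip(commitments, responses, bits))
```

With the toy values $m=77$, $x=4$, $s=9$, `verify(77, 4, prove(77, 4, 9))` returns `True`. Hashing the statement $(m,x)$ along with the first messages is a design choice of this sketch, made so that a proof cannot simply be reused for a different statement.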
The following properties of zero knowledge systems are used in the literature. We might cover some in class, but mention them here. These are covered in Chapter 20 of Boneh-Shoup.
- Proof of knowledge - it can be shown that the proof above of Hamiltonicity yields more than soundness. We can "extract" from a prover strategy that succeeds in convincing the verifier that $G$ is Hamiltonian with probability larger than 1/2 an actual Hamiltonian cycle. This means that the prover didn't just convince the verifier that there exists a Hamiltonian cycle in the graph $G$ but also that the prover "knows" it. This notion is known as a "proof of knowledge".
- Arguments - if a proof system only satisfies the soundness condition with respect to polynomial-time provers, then it is called an argument system.
- Succinct proofs - proofs that $F(x)=1$ where total communication is a fixed polynomial in $n$ independently of the time to verify $F$.
Combining succinct zero-knowledge proofs with the Fiat-Shamir heuristic for non-interactivity leads to the notion of zero-knowledge succinct arguments or ZK-SNARG. If these also satisfy a "proof of knowledge" property then they are called ZK-SNARKs. These have recently been of great interest for crypto-currencies. See lectures 16-18 in Stanford CS 251, as well as this blog post.
Footnotes
1. To be fair, "only" about 170 million Americans live in the 50 largest metropolitan areas and so arguably many people will survive at least the initial impact of a nuclear war, though it has been estimated that even a "small" nuclear war involving detonation of 100 not too large warheads could have devastating global consequences.
2. As we'll see, technically what Alice needs to do in such a scenario is use a zero knowledge proof of knowledge of a solution for $P$.
3. Integers can be coded as sets in various ways. For example, one can encode $0$ as $\emptyset$ and if $N$ is the set encoding $n$, we can encode $n+1$ using the $n+1$-element set $\{ N \} \cup N$.
4. People have considered the notion of zero knowledge systems where soundness holds only with respect to efficient provers; these are known as argument systems.
5. Goldreich, Micali and Wigderson were the first to come up with a zero knowledge proof for an NP complete problem, though the Hamiltonicity protocol here is from a later work by Blum. We use Naor's commitment scheme.