In mathematics and theoretical computer science, analysis of Boolean functions is the study of real-valued functions on $\{0,1\}^n$ or $\{-1,1\}^n$ (such functions are sometimes known as pseudo-Boolean functions) from a spectral perspective.[1] The functions studied are often, but not always, Boolean-valued, making them Boolean functions. The area has found many applications in combinatorics, social choice theory, random graphs, and theoretical computer science, especially in hardness of approximation, property testing, and PAC learning.
We will mostly consider functions defined on the domain $\{-1,1\}^n$. Sometimes it is more convenient to work with the domain $\{0,1\}^n$ instead. If $f$ is defined on $\{-1,1\}^n$, then the corresponding function defined on $\{0,1\}^n$ is
$$g(x_1,\ldots,x_n) = f(1-2x_1,\ldots,1-2x_n),$$
obtained by identifying $0 \in \{0,1\}$ with $1 \in \{-1,1\}$ and $1 \in \{0,1\}$ with $-1 \in \{-1,1\}$.
Similarly, for us a Boolean function is a $\{-1,1\}$-valued function, though often it is more convenient to consider $\{0,1\}$-valued functions instead.
Every real-valued function $f\colon \{-1,1\}^n \to \mathbb{R}$ has a unique expansion as a multilinear polynomial:
$$f(x) = \sum_{S \subseteq [n]} \hat{f}(S)\,\chi_S(x), \qquad \text{where } \chi_S(x) = \prod_{i \in S} x_i.$$
(Note that even if the function is 0-1 valued this is not a sum mod 2, but just an ordinary sum of real numbers.)
This is the Hadamard transform of the function $f$, which is the Fourier transform in the group $\mathbb{Z}_2^n$. The coefficients $\hat{f}(S)$ are known as Fourier coefficients, and the entire sum is known as the Fourier expansion of $f$. The functions $\chi_S$ are known as Fourier characters, and they form an orthonormal basis for the space of all functions over $\{-1,1\}^n$, with respect to the inner product $\langle f,g \rangle = 2^{-n} \sum_{x \in \{-1,1\}^n} f(x) g(x)$.
The Fourier coefficients can be calculated using an inner product:
$$\hat{f}(S) = \langle f, \chi_S \rangle = 2^{-n} \sum_{x \in \{-1,1\}^n} f(x)\,\chi_S(x).$$
In particular, this shows that $\hat{f}(\emptyset) = \operatorname{E}[f]$, where the expected value is taken with respect to the uniform distribution over $\{-1,1\}^n$. Parseval's identity states that
$$\operatorname{E}[f^2] = \sum_{S \subseteq [n]} \hat{f}(S)^2.$$
If we skip $S = \emptyset$, then we get the variance of $f$:
$$\operatorname{Var}[f] = \sum_{S \neq \emptyset} \hat{f}(S)^2.$$
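As an illustration of these formulas, the following sketch (in Python; the helper names and the choice of the 3-bit majority example are ours) computes all Fourier coefficients of a small Boolean function by taking inner products with the characters, and checks Parseval's identity. It is a brute-force demonstration, not an efficient transform.

```python
import itertools
from math import prod

def fourier_coefficients(f, n):
    """Brute force: f_hat(S) = 2^{-n} * sum_x f(x) * chi_S(x)."""
    points = list(itertools.product([-1, 1], repeat=n))
    subsets = [S for k in range(n + 1) for S in itertools.combinations(range(n), k)]
    return {S: sum(f(x) * prod(x[i] for i in S) for x in points) / len(points)
            for S in subsets}

maj3 = lambda x: 1 if sum(x) > 0 else -1       # 3-bit majority, {-1,1}-valued
coeffs = fourier_coefficients(maj3, 3)
print(coeffs)                                  # weight 1/2 on each singleton, -1/2 on {0,1,2}
print(sum(c ** 2 for c in coeffs.values()))    # Parseval: equals E[f^2] = 1 for Boolean f
```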
The degree of a function $f\colon \{-1,1\}^n \to \mathbb{R}$ is the maximum $d$ such that $\hat{f}(S) \neq 0$ for some set $S$ of size $d$. In other words, the degree of $f$ is its degree as a multilinear polynomial.
It is convenient to decompose the Fourier expansion into levels: the Fourier coefficient $\hat{f}(S)$ is on level $|S|$.
The degree $d$ part of $f$ is
$$f^{=d} = \sum_{|S|=d} \hat{f}(S)\,\chi_S.$$
It is obtained from $f$ by zeroing out all Fourier coefficients not on level $d$.
We similarly define $f^{>d}, f^{<d}, f^{\geq d}, f^{\leq d}$.
The $i$'th influence of a function $f\colon \{-1,1\}^n \to \mathbb{R}$ can be defined in two equivalent ways:
$$\operatorname{Inf}_i[f] = \sum_{S \ni i} \hat{f}(S)^2 = \operatorname{E}_x\left[\left(\frac{f(x) - f(x^{\oplus i})}{2}\right)^2\right],$$
where $x^{\oplus i}$ denotes $x$ with the $i$'th coordinate flipped.
If $f$ is Boolean then $\operatorname{Inf}_i[f]$ is the probability that flipping the $i$'th coordinate flips the value of the function:
$$\operatorname{Inf}_i[f] = \Pr_x\left[f(x) \neq f(x^{\oplus i})\right].$$
If $\operatorname{Inf}_i[f] = 0$ then $f$ doesn't depend on the $i$'th coordinate.
The total influence of $f$ is the sum of all of its influences:
$$\operatorname{Inf}[f] = \sum_{i=1}^n \operatorname{Inf}_i[f] = \sum_{S} |S|\,\hat{f}(S)^2.$$
The total influence of a Boolean function is also the average sensitivity of the function. The sensitivity of a Boolean function $f$ at a given point is the number of coordinates $i$ such that if we flip the $i$'th coordinate, the value of the function changes. The average value of this quantity is exactly the total influence.
The total influence can also be defined using the discrete Laplacian of the Hamming graph, suitably normalized: $\operatorname{Inf}[f] = \langle f, Lf \rangle$.
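A short sketch (Python; the helper names and the 3-bit majority example are ours) computes each influence as the probability that flipping a coordinate changes the value, and sums them to obtain the total influence, i.e. the average sensitivity.

```python
import itertools

def influences(f, n):
    """For Boolean f: Inf_i[f] = Pr_x[f(x) != f(x with coordinate i flipped)]."""
    points = list(itertools.product([-1, 1], repeat=n))
    result = []
    for i in range(n):
        flips = sum(f(x) != f(x[:i] + (-x[i],) + x[i + 1:]) for x in points)
        result.append(flips / len(points))
    return result

maj3 = lambda x: 1 if sum(x) > 0 else -1
inf = influences(maj3, 3)
print(inf)         # [0.5, 0.5, 0.5]: a coordinate is pivotal iff the other two disagree
print(sum(inf))    # total influence = average sensitivity = 1.5
```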
A generalized form of influence is the $\rho$-stable influence, defined by:
$$\operatorname{Inf}_i^{(\rho)}[f] = \sum_{S \ni i} \rho^{|S|-1}\,\hat{f}(S)^2.$$
The corresponding total influence is
$$\operatorname{Inf}^{(\rho)}[f] = \sum_{i=1}^n \operatorname{Inf}_i^{(\rho)}[f] = \sum_{S} |S|\,\rho^{|S|-1}\,\hat{f}(S)^2.$$
One can prove that a function $f\colon \{-1,1\}^n \to \{-1,1\}$ has at most "constantly" many "stably-influential" coordinates:
$$\left|\left\{i \in [n] : \operatorname{Inf}_i^{(1-\delta)}[f] \geq \epsilon\right\}\right| \leq \frac{1}{\delta\epsilon}.$$
Given $-1 \leq \rho \leq 1$, we say that two random vectors $x, y \in \{-1,1\}^n$ are $\rho$-correlated if the marginal distributions of $x, y$ are uniform, and $\operatorname{E}[x_i y_i] = \rho$. Concretely, we can generate a pair of $\rho$-correlated random variables by first choosing $x, z \in \{-1,1\}^n$ uniformly at random, and then choosing $y$ according to one of the following two equivalent rules, applied independently to each coordinate:
$$y_i = \begin{cases} x_i & \text{with probability } \rho \\ z_i & \text{with probability } 1-\rho \end{cases}
\qquad \text{or} \qquad
y_i = \begin{cases} x_i & \text{with probability } \tfrac{1+\rho}{2} \\ -x_i & \text{with probability } \tfrac{1-\rho}{2}. \end{cases}$$
We denote this distribution by $y \sim N_\rho(x)$.
The noise stability of a function $f\colon \{-1,1\}^n \to \mathbb{R}$ at $\rho$ can be defined in two equivalent ways:
$$\operatorname{Stab}_\rho[f] = \operatorname{E}_{\substack{x \\ y \sim N_\rho(x)}}\left[f(x)\,f(y)\right] = \sum_{S \subseteq [n]} \rho^{|S|}\,\hat{f}(S)^2.$$
For $0 \leq \delta \leq 1$, the noise sensitivity of $f$ at $\delta$ is
$$\operatorname{NS}_\delta[f] = \tfrac{1}{2} - \tfrac{1}{2}\operatorname{Stab}_{1-2\delta}[f].$$
If $f$ is Boolean, then this is the probability that the value of $f$ changes if we flip each coordinate with probability $\delta$, independently.
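The following sketch (Python; the names are illustrative) estimates the noise stability of the 3-bit majority by sampling $\rho$-correlated pairs and compares the estimate with the Fourier-side formula $\sum_S \rho^{|S|}\hat{f}(S)^2$.

```python
import random

def sample_rho_correlated(n, rho):
    """x uniform on {-1,1}^n; y_i = x_i w.p. (1+rho)/2 and -x_i otherwise."""
    x = [random.choice([-1, 1]) for _ in range(n)]
    y = [xi if random.random() < (1 + rho) / 2 else -xi for xi in x]
    return x, y

def noise_stability_mc(f, n, rho, trials=200_000):
    """Monte Carlo estimate of Stab_rho[f] = E[f(x) f(y)]."""
    total = 0
    for _ in range(trials):
        x, y = sample_rho_correlated(n, rho)
        total += f(x) * f(y)
    return total / trials

maj3 = lambda x: 1 if sum(x) > 0 else -1
rho = 0.5
print(noise_stability_mc(maj3, 3, rho))
print(3 * 0.25 * rho + 0.25 * rho ** 3)   # Fourier formula for Maj_3: 3*(1/2)^2*rho + (1/2)^2*rho^3
```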
The noise operator $T_\rho$ is an operator taking a function $f\colon \{-1,1\}^n \to \mathbb{R}$ and returning another function $T_\rho f\colon \{-1,1\}^n \to \mathbb{R}$ given by
$$(T_\rho f)(x) = \operatorname{E}_{y \sim N_\rho(x)}[f(y)] = \sum_{S \subseteq [n]} \rho^{|S|}\,\hat{f}(S)\,\chi_S(x).$$
When $\rho > 0$, the noise operator can also be defined using a continuous-time Markov chain in which each bit is flipped independently with rate 1. The operator $T_\rho$ corresponds to running this Markov chain for time $\frac{1}{2}\log\frac{1}{\rho}$ starting at $x$, and taking the average value of $f$ at the final state. This Markov chain is generated by the Laplacian of the Hamming graph, and this relates total influence to the noise operator.
Noise stability can be defined in terms of the noise operator: $\operatorname{Stab}_\rho[f] = \langle f, T_\rho f \rangle$.
For $1 \leq q < \infty$, the $L_q$-norm of a function $f\colon \{-1,1\}^n \to \mathbb{R}$ is defined by
$$\|f\|_q = \left(\operatorname{E}_x\left[|f(x)|^q\right]\right)^{1/q} = \left(2^{-n}\sum_{x \in \{-1,1\}^n} |f(x)|^q\right)^{1/q}.$$
We also define $\|f\|_\infty = \max_{x \in \{-1,1\}^n} |f(x)|$.
The hypercontractivity theorem states that for all $p > q > 1$, if $|\rho| \leq \sqrt{\frac{q-1}{p-1}}$ then
$$\|T_\rho f\|_p \leq \|f\|_q.$$
Hypercontractivity is closely related to the logarithmic Sobolev inequalities of functional analysis.[2]
A similar result for $1 > p > q$ is known as reverse hypercontractivity.[3] It states that if $|\rho| \leq \sqrt{\frac{1-p}{1-q}}$ then
$$\|T_\rho f\|_p \geq \|f\|_q$$
for all nonnegative functions $f$.
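The forward inequality can be checked numerically on small examples. The sketch below (Python; all helper names are ours) applies $T_\rho$ through the Fourier expansion, with $\rho = \sqrt{(q-1)/(p-1)}$, and verifies $\|T_\rho f\|_p \le \|f\|_q$ for a few random real-valued functions on $\{-1,1\}^4$.

```python
import itertools
import random
from math import prod

def fourier(f, pts, subsets):
    return {S: sum(f[x] * prod(x[i] for i in S) for x in pts) / len(pts) for S in subsets}

def T_rho(fhat, rho, pts, subsets):
    """Apply the noise operator via T_rho f = sum_S rho^|S| f_hat(S) chi_S."""
    return {x: sum(rho ** len(S) * fhat[S] * prod(x[i] for i in S) for S in subsets)
            for x in pts}

def lq_norm(g, pts, q):
    return (sum(abs(g[x]) ** q for x in pts) / len(pts)) ** (1 / q)

n, p, q = 4, 4.0, 2.0
rho = ((q - 1) / (p - 1)) ** 0.5                 # the critical rho = sqrt((q-1)/(p-1))
pts = list(itertools.product([-1, 1], repeat=n))
subsets = [S for k in range(n + 1) for S in itertools.combinations(range(n), k)]

for _ in range(5):
    f = {x: random.gauss(0, 1) for x in pts}     # a random real-valued function
    Tf = T_rho(fourier(f, pts, subsets), rho, pts, subsets)
    assert lq_norm(Tf, pts, p) <= lq_norm(f, pts, q) + 1e-9
print("||T_rho f||_p <= ||f||_q held in all trials")
```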
In many situations the input to the function is not uniformly distributed over $\{-1,1\}^n$, but instead has a bias toward $-1$ or $1$. In these situations it is customary to consider functions over the domain $\{0,1\}^n$. For $0 < p < 1$, the p-biased measure $\mu_p$ is given by
$$\mu_p(x) = p^{\sum_i x_i}\,(1-p)^{n - \sum_i x_i}.$$
This measure can be generated by choosing each coordinate independently to be 1 with probability p {\displaystyle p} and 0 with probability 1 − p {\displaystyle 1-p} .
The classical Fourier characters are no longer orthogonal with respect to this measure. Instead, we use the following characters:
$$\omega_S(x) = \prod_{i \in S} \frac{x_i - p}{\sqrt{p(1-p)}}.$$
The p-biased Fourier expansion of $f$ is the expansion of $f$ as a linear combination of p-biased characters:
$$f = \sum_{S \subseteq [n]} \hat{f}(S)\,\omega_S.$$
We can extend the definitions of influence and the noise operator to the p-biased setting by using their spectral definitions.
The $i$'th influence is given by
$$\operatorname{Inf}_i[f] = \sum_{S \ni i} \hat{f}(S)^2.$$
The total influence is the sum of the individual influences:
$$\operatorname{Inf}[f] = \sum_{i=1}^n \operatorname{Inf}_i[f] = \sum_S |S|\,\hat{f}(S)^2.$$
A pair of $\rho$-correlated random variables can be obtained by choosing $x, z \sim \mu_p$ independently and $y \sim N_\rho(x)$, where $N_\rho$ is given by
$$y_i = \begin{cases} x_i & \text{with probability } \rho \\ z_i & \text{with probability } 1-\rho. \end{cases}$$
The noise operator is then given by
$$(T_\rho f)(x) = \operatorname{E}_{y \sim N_\rho(x)}[f(y)] = \sum_{S \subseteq [n]} \rho^{|S|}\,\hat{f}(S)\,\omega_S(x).$$
Using this we can define the noise stability and the noise sensitivity, as before.
The Russo–Margulis formula (also called the Margulis–Russo formula[1]) states that for monotone Boolean functions $f\colon \{0,1\}^n \to \{0,1\}$,
$$\frac{d}{dp}\operatorname{Pr}_{x \sim \mu_p}[f(x)=1] = \frac{\operatorname{Inf}[f]}{p(1-p)}.$$
Both the influence and the probabilities are taken with respect to $\mu_p$, and on the right-hand side we have the average sensitivity of $f$. If we think of $f$ as a property, then the formula states that as $p$ varies, the derivative of the probability that $f$ occurs at $p$ equals the average sensitivity at $p$.
The Russo–Margulis formula is key for proving sharp threshold theorems such as Friedgut's.
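As a concrete check of the formula (a Python sketch; the helpers mu_p_prob and avg_sensitivity and the choice of 3-bit majority are ours), one can compare a numerical derivative of $\Pr_{x\sim\mu_p}[f(x)=1]$ with the average sensitivity, i.e. the sum over coordinates of the probability that the coordinate is pivotal.

```python
import itertools

def mu_p_prob(f, n, p):
    """Pr_{x ~ mu_p}[f(x) = 1] for f on {0,1}^n, by exhaustive summation."""
    total = 0.0
    for x in itertools.product([0, 1], repeat=n):
        weight = 1.0
        for b in x:
            weight *= p if b == 1 else 1 - p
        total += weight * f(x)
    return total

def avg_sensitivity(f, n, p):
    """Sum over i of Pr_{x ~ mu_p}[coordinate i is pivotal for f]."""
    total = 0.0
    for i in range(n):
        for x in itertools.product([0, 1], repeat=n):
            weight = 1.0
            for b in x:
                weight *= p if b == 1 else 1 - p
            hi, lo = list(x), list(x)
            hi[i], lo[i] = 1, 0
            if f(tuple(hi)) != f(tuple(lo)):
                total += weight
    return total

maj3 = lambda x: 1 if sum(x) >= 2 else 0        # a monotone {0,1}-valued function
p, h = 0.3, 1e-6
deriv = (mu_p_prob(maj3, 3, p + h) - mu_p_prob(maj3, 3, p - h)) / (2 * h)
print(deriv, avg_sensitivity(maj3, 3, p))       # both are 1.26: the Russo-Margulis formula
```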
One of the deepest results in the area, the invariance principle, connects the distribution of functions on the Boolean cube $\{-1,1\}^n$ to their distribution on Gaussian space, which is the space $\mathbb{R}^n$ endowed with the standard $n$-dimensional Gaussian measure.
Many of the basic concepts of Fourier analysis on the Boolean cube have counterparts in Gaussian space: for example, the Fourier characters correspond to Hermite polynomials, the noise operator $T_\rho$ corresponds to the Ornstein–Uhlenbeck operator, and noise stability is defined analogously.
Gaussian space is more symmetric than the Boolean cube (for example, it is rotation invariant), and supports continuous arguments which can be harder to carry out in the discrete setting of the Boolean cube. The invariance principle links the two settings, and allows deducing results on the Boolean cube from results on Gaussian space.
If $f\colon \{-1,1\}^n \to \{-1,1\}$ has degree at most 1, then $f$ is either constant, equal to a coordinate, or equal to the negation of a coordinate. In particular, $f$ is a dictatorship: a function depending on at most one coordinate.
The Friedgut–Kalai–Naor theorem,[4] also known as the FKN theorem, states that if $f$ almost has degree 1 then it is close to a dictatorship. Quantitatively, if $f\colon \{-1,1\}^n \to \{-1,1\}$ and $\|f^{>1}\|^2 < \varepsilon$, then $f$ is $O(\varepsilon)$-close to a dictatorship, that is, $\|f - g\|^2 = O(\varepsilon)$ for some Boolean dictatorship $g$, or equivalently, $\Pr[f \neq g] = O(\varepsilon)$ for some Boolean dictatorship $g$.
Similarly, a Boolean function of degree at most $d$ depends on at most $C_W 2^d$ coordinates, making it a junta (a function depending on a constant number of coordinates), where $C_W$ is an absolute constant that is at least 1.5 and at most 4.41, as shown by Wellens.[5] The Kindler–Safra theorem[6] generalizes the Friedgut–Kalai–Naor theorem to this setting. It states that if $f\colon \{-1,1\}^n \to \{-1,1\}$ satisfies $\|f^{>d}\|^2 < \varepsilon$ then $f$ is $O(\varepsilon)$-close to a Boolean function of degree at most $d$.
The Poincaré inequality for the Boolean cube (which follows from formulas appearing above) states that for a function $f\colon \{-1,1\}^n \to \mathbb{R}$,
$$\operatorname{Var}[f] \leq \operatorname{Inf}[f].$$
This implies that $\max_i \operatorname{Inf}_i[f] \geq \frac{\operatorname{Var}[f]}{n}$.
The Kahn–Kalai–Linial theorem,[7] also known as the KKL theorem, states that if $f$ is Boolean then $\max_i \operatorname{Inf}_i[f] = \Omega\left(\operatorname{Var}[f]\,\frac{\log n}{n}\right)$.
The bound given by the Kahn–Kalai–Linial theorem is tight, and is achieved by the Tribes function of Ben-Or and Linial:[8] an OR of ANDs ("tribes") over disjoint blocks of coordinates, each block of width roughly $\log_2 n - \log_2 \ln n$, for which every coordinate has influence $\Theta\left(\frac{\log n}{n}\right)$.
The Kahn–Kalai–Linial theorem was one of the first results in the area, and it introduced hypercontractivity into the context of Boolean functions.
If $f\colon \{-1,1\}^n \to \{-1,1\}$ is an $M$-junta (a function depending on at most $M$ coordinates) then $\operatorname{Inf}[f] \leq M$, since each influence is at most 1 and at most $M$ of them are nonzero.
Friedgut's theorem[9] is a converse to this result. It states that for any $\varepsilon > 0$, the function $f$ is $\varepsilon$-close to a Boolean junta depending on $\exp(\operatorname{Inf}[f]/\varepsilon)$ coordinates.
Combined with the Russo–Margulis lemma, Friedgut's junta theorem implies that for every $p$, every monotone function is close to a junta with respect to $\mu_q$ for some $q \approx p$.
The invariance principle[10] generalizes the Berry–Esseen theorem to non-linear functions.
The Berry–Esseen theorem states (among other things) that if $f = \sum_{i=1}^n c_i x_i$ and no $c_i$ is too large compared to the rest, then the distribution of $f$ over $\{-1,1\}^n$ is close to a normal distribution with the same mean and variance.
The invariance principle (in a special case) informally states that if $f$ is a multilinear polynomial of bounded degree over $x_1,\ldots,x_n$ and all influences of $f$ are small, then the distribution of $f$ under the uniform measure over $\{-1,1\}^n$ is close to its distribution in Gaussian space.
More formally, let $\psi$ be a univariate Lipschitz function, let $f = \sum_{S \subseteq [n]} \hat{f}(S)\chi_S$, let $k = \deg f$, and let $\varepsilon = \max_i \sum_{S \ni i} \hat{f}(S)^2$. Suppose that $\sum_{S \neq \emptyset} \hat{f}(S)^2 \leq 1$. Then $\operatorname{E}_x[\psi(f(x))]$, where $x$ is uniform over $\{-1,1\}^n$, and $\operatorname{E}_g[\psi(f(g))]$, where $g$ is a standard Gaussian vector in $\mathbb{R}^n$, differ by at most a quantity that depends only on $k$, $\varepsilon$, and the Lipschitz constant of $\psi$, and that tends to $0$ as $\varepsilon \to 0$.
By choosing appropriate $\psi$, this implies that the distributions of $f$ under both measures are close in CDF distance, which is given by $\sup_t |\Pr[f(x) < t] - \Pr[f(g) < t]|$.
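The following sketch (Python with NumPy; the particular degree-2 polynomial is an arbitrary example of ours) illustrates the phenomenon empirically: a low-degree multilinear polynomial with small, evenly spread influences has nearly the same distribution whether its inputs are uniform $\pm 1$ bits or standard Gaussians.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

def F(z):
    """A degree-2 multilinear polynomial with small influences, normalized to variance 1."""
    linear = z[: n // 2].sum() / np.sqrt(n // 2)
    quadratic = (z[0::2] * z[1::2]).sum() / np.sqrt(n // 2)
    return (linear + quadratic) / np.sqrt(2)

boolean_samples = np.array([F(rng.choice([-1.0, 1.0], size=n)) for _ in range(20_000)])
gaussian_samples = np.array([F(rng.standard_normal(n)) for _ in range(20_000)])

# Empirical CDF (Kolmogorov) distance between the two distributions: close to 0
grid = np.linspace(-3, 3, 61)
cdf = lambda s, t: float((s < t).mean())
print(max(abs(cdf(boolean_samples, t) - cdf(gaussian_samples, t)) for t in grid))
```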
The invariance principle was the key ingredient in the original proof of the Majority is Stablest theorem.
A Boolean function $f\colon \{-1,1\}^n \to \{-1,1\}$ is linear if it satisfies $f(xy) = f(x)f(y)$, where $xy = (x_1 y_1, \ldots, x_n y_n)$. It is not hard to show that the Boolean linear functions are exactly the characters $\chi_S$.
In property testing we want to test whether a given function is linear. It is natural to try the following test: choose $x, y \in \{-1,1\}^n$ uniformly at random, and check that $f(xy) = f(x)f(y)$. If $f$ is linear then it always passes the test. Blum, Luby and Rubinfeld[11] showed that if the test passes with probability $1 - \varepsilon$ then $f$ is $O(\varepsilon)$-close to a Fourier character. Their proof was combinatorial.
Bellare et al.[12] gave an extremely simple Fourier-analytic proof that also shows that if the test succeeds with probability $1/2 + \varepsilon$, then $f$ is correlated with a Fourier character. Their proof relies on the following formula for the success probability of the test:
$$\Pr_{x,y}\left[f(x)f(y) = f(xy)\right] = \frac{1}{2} + \frac{1}{2}\sum_{S \subseteq [n]} \hat{f}(S)^3.$$
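A sketch of the test and of this formula (Python; the helper names and the example function, a character with a single value flipped, are ours):

```python
import itertools
import random
from math import prod

def fourier_coefficients(f, n):
    pts = list(itertools.product([-1, 1], repeat=n))
    subsets = [S for k in range(n + 1) for S in itertools.combinations(range(n), k)]
    return {S: sum(f(x) * prod(x[i] for i in S) for x in pts) / len(pts) for S in subsets}

def blr_pass_probability(f, n, trials=100_000):
    """Empirical probability that f(x) f(y) = f(xy) for uniform x, y."""
    passes = 0
    for _ in range(trials):
        x = tuple(random.choice([-1, 1]) for _ in range(n))
        y = tuple(random.choice([-1, 1]) for _ in range(n))
        xy = tuple(a * b for a, b in zip(x, y))
        passes += (f(x) * f(y) == f(xy))
    return passes / trials

def g(x):
    """The character chi_{{0,1}} with its value flipped at the single point (1,1,1)."""
    return -x[0] * x[1] if x == (1, 1, 1) else x[0] * x[1]

print(blr_pass_probability(g, 3))                          # about 0.656
fhat = fourier_coefficients(g, 3)
print(0.5 + 0.5 * sum(c ** 3 for c in fhat.values()))      # exactly 21/32 = 0.65625
```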
Arrow's impossibility theorem states that for three or more candidates, the only unanimous voting rule for which there is always a Condorcet winner is a dictatorship.
The usual proof of Arrow's theorem is combinatorial. Kalai[13] gave an alternative proof of this result in the case of three candidates using Fourier analysis. If $f\colon \{-1,1\}^n \to \{-1,1\}$ is the rule that assigns a winner among two candidates given their relative orders in the votes, then the probability that there is a Condorcet winner given a uniformly random vote is $\frac{3}{4} - \frac{3}{4}\operatorname{Stab}_{-1/3}[f]$, from which the theorem easily follows.
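For three voters and majority rule this can be verified directly (a Python sketch; the brute-force enumeration and helper names are ours): counting preference profiles with a Condorcet winner gives 17/18, matching Kalai's formula with $f = \operatorname{Maj}_3$.

```python
import itertools
from math import prod

def stab(f, n, rho):
    """Stab_rho[f] = sum_S rho^|S| f_hat(S)^2, computed by brute force."""
    pts = list(itertools.product([-1, 1], repeat=n))
    total = 0.0
    for k in range(n + 1):
        for S in itertools.combinations(range(n), k):
            coef = sum(f(x) * prod(x[i] for i in S) for x in pts) / len(pts)
            total += rho ** len(S) * coef ** 2
    return total

def condorcet_probability(n_voters=3):
    """Probability of a Condorcet winner for 3 candidates, uniform rankings,
    pairwise contests decided by majority (brute force over all profiles)."""
    rankings = list(itertools.permutations("abc"))
    hits = 0
    for profile in itertools.product(rankings, repeat=n_voters):
        def beats(u, v):
            return sum(r.index(u) < r.index(v) for r in profile) > n_voters / 2
        if any(all(beats(w, o) for o in "abc" if o != w) for w in "abc"):
            hits += 1
    return hits / len(rankings) ** n_voters

maj3 = lambda x: 1 if sum(x) > 0 else -1
print(condorcet_probability(3))                       # 17/18 = 0.9444...
print(3 / 4 - 3 / 4 * stab(maj3, 3, -1 / 3))          # Kalai's formula gives the same value
```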
The FKN theorem implies that if $f$ is a rule for which there is almost always a Condorcet winner, then $f$ is close to a dictatorship.
A classical result in the theory of random graphs states that the probability that a $G(n,p)$ random graph is connected tends to $e^{-e^{-c}}$ if $p \sim \frac{\log n + c}{n}$. This is an example of a sharp threshold: the width of the "threshold window", which is $O(1/n)$, is asymptotically smaller than the threshold itself, which is roughly $\frac{\log n}{n}$. In contrast, the probability that a $G(n,p)$ graph contains a triangle tends to $e^{-c^3/6}$ when $p \sim \frac{c}{n}$. Here both the threshold window and the threshold itself are $\Theta(1/n)$, and so this is a coarse threshold.
Friedgut's sharp threshold theorem[14] states, roughly speaking, that a monotone graph property (a graph property is a property which doesn't depend on the names of the vertices) has a sharp threshold unless it is correlated with the appearance of small subgraphs. This theorem has been widely applied to analyze random graphs and percolation.
On a related note, the KKL theorem implies that the width of the threshold window is always at most $O(1/\log n)$.[15]
Let $\operatorname{Maj}_n\colon \{-1,1\}^n \to \{-1,1\}$ denote the majority function on $n$ coordinates. Sheppard's formula gives the asymptotic noise stability of majority:
$$\lim_{n \to \infty} \operatorname{Stab}_\rho[\operatorname{Maj}_n] = 1 - \frac{2}{\pi}\arccos\rho.$$
This is related to the probability that if we choose $x \in \{-1,1\}^n$ uniformly at random and form $y \in \{-1,1\}^n$ by flipping each bit of $x$ with probability $\frac{1-\rho}{2}$, then the majority stays the same:
$$\Pr[\operatorname{Maj}_n(x) = \operatorname{Maj}_n(y)] = \frac{1}{2} + \frac{1}{2}\operatorname{Stab}_\rho[\operatorname{Maj}_n] \longrightarrow 1 - \frac{\arccos\rho}{\pi}.$$
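A quick numerical illustration (Python with NumPy; the parameters are arbitrary choices of ours): estimating $\operatorname{E}[\operatorname{Maj}_n(x)\operatorname{Maj}_n(y)]$ for moderately large odd $n$ already comes close to Sheppard's limiting value.

```python
import numpy as np

def majority_stability(n, rho, trials=200_000, seed=1):
    """Monte Carlo estimate of Stab_rho[Maj_n] = E[Maj_n(x) Maj_n(y)]."""
    rng = np.random.default_rng(seed)
    x = rng.choice([-1, 1], size=(trials, n))
    flip = rng.random((trials, n)) < (1 - rho) / 2     # flip each bit with prob (1-rho)/2
    y = np.where(flip, -x, x)
    return float(np.mean(np.sign(x.sum(axis=1)) * np.sign(y.sum(axis=1))))

rho = 0.6
print(majority_stability(101, rho))                    # n odd, so majorities are never tied
print(1 - (2 / np.pi) * np.arccos(rho))                # Sheppard's limit: about 0.4097
```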
There are Boolean functions with larger noise stability. For example, a dictatorship $x_i$ has noise stability $\rho$.
The Majority is Stablest theorem states, informally, that the only functions having noise stability larger than majority have influential coordinates. Formally, for every $\varepsilon > 0$ there exists $\tau > 0$ such that if $f\colon \{-1,1\}^n \to \{-1,1\}$ has expectation zero and $\max_i \operatorname{Inf}_i[f] \leq \tau$, then $\operatorname{Stab}_\rho[f] \leq 1 - \frac{2}{\pi}\arccos\rho + \varepsilon$.
The first proof of this theorem used the invariance principle in conjunction with an isoperimetric theorem of Borell in Gaussian space; since then, more direct proofs have been devised.[16][17]
Majority is Stablest implies that the Goemans–Williamson approximation algorithm for MAX-CUT is optimal, assuming the unique games conjecture. This implication, due to Khot et al.,[18] was the impetus behind proving the theorem.