# Inequalities

An introduction to probability and expectation inequalities.

# Probability Inequalities

Theorem 1 (The Gaussian Tail Inequality). Let $X\sim N(0,1)$. Then

$\mathbb{P}(\mid X\mid >\epsilon) \leq \frac{2 e^{-\epsilon^{2} / 2}}{\epsilon}.$

If $X_{1}, \ldots, X_{n} \sim N(0,1)$ then

$\mathbb{P}\left(\mid \bar{X}_{n}\mid >\epsilon\right) \leq \frac{2}{\sqrt{n} \epsilon} e^{-n \epsilon^{2} / 2} \stackrel{\text { large n}}{\leq} e^{-n \epsilon^{2} / 2}.$

Proof. The density of $X$ is $\phi(x)=(2\pi)^{-1/2}e^{-x^2/2}$. For one tail:

\begin{aligned} \mathbb{P}(X>\epsilon)&=\int_{\epsilon}^{\infty} \phi(s) d s=\int_{\epsilon}^{\infty} \frac{s}{s} \phi(s) d s \stackrel{s>\epsilon}{\leq} \frac{1}{\epsilon} \int_{\epsilon}^{\infty} s \phi(s) d s\\ &\stackrel{\phi^{\prime}(x)=-x\phi(x)}{=}\frac{1}{\epsilon}\left(-\phi(s)\right)\Big\vert^{\infty}_{\epsilon}=\frac{\phi(\epsilon)}{\epsilon} \leq \frac{e^{-\epsilon^{2} / 2}}{\epsilon}, \end{aligned}

where the last inequality uses $(2\pi)^{-1/2}<1$.

By symmetry,

$\mathbb{P}(\mid X\mid >\epsilon) \leq \frac{2 e^{-\epsilon^{2} / 2}}{\epsilon}.$

Let $X_{1}, \ldots, X_{n} \sim N(0,1)$. Then $\bar{X}_{n}=n^{-1} \sum_{i=1}^{n} X_{i} \sim N(0,1 / n)$. Thus $\bar{X}_{n} \stackrel{d}{=} n^{-1 / 2} Z$, where $Z \sim N(0,1)$, and the inequality above gives:

$\mathbb{P}\left(\mid \bar{X}_{n}\mid >\epsilon\right)=\mathbb{P}\left(n^{-1 / 2}\mid Z\mid >\epsilon\right)=\mathbb{P}(\mid Z\mid >\sqrt{n} \epsilon) \leq \frac{2}{\sqrt{n} \epsilon} e^{-n \epsilon^{2} / 2}.$
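As a quick sanity check, the bound can be compared numerically with the exact tail (a sketch using SciPy; the grid of $\epsilon$ values is arbitrary). Note the bound exceeds $1$ for small $\epsilon$ and only becomes informative as $\epsilon$ grows:

```python
# Compare the exact Gaussian tail P(|X| > eps) = 2 * norm.sf(eps)
# with the bound 2 e^{-eps^2/2} / eps from Theorem 1.
import numpy as np
from scipy.stats import norm

for eps in [0.5, 1.0, 2.0, 3.0]:
    exact = 2 * norm.sf(eps)                 # exact two-sided tail
    bound = 2 * np.exp(-eps**2 / 2) / eps    # Gaussian tail bound
    print(f"eps={eps}: exact={exact:.5f}, bound={bound:.5f}")
```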

Theorem 2 (Markov Inequality). Let $X$ be a non-negative random variable and suppose that $\mathbb{E}(X)$ exists. For any $t>0$,

$\mathbb{P}(X>t) \leq \frac{\mathbb{E}(X)}{t}.$

Proof. Let $p$ denote the density of $X$. Since $X\geq0$,

\begin{aligned} \mathbb{E}(X)&=\int_{0}^{\infty} x p(x) d x=\int_{0}^{t} x p(x) d x+\int_{t}^{\infty} x p(x) d x\\ &\geq\int_{t}^{\infty} x p(x) d x\stackrel{x\geq t}{\geq}t\int_{t}^{\infty}p(x) d x=t\mathbb{P}(X>t). \end{aligned}
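A minimal Monte Carlo check of Markov's inequality (a sketch; the Exp(1) distribution, seed, and sample size are arbitrary choices, with $\mathbb{E}(X)=1$ and exact tail $e^{-t}$):

```python
# Monte Carlo check: P(X > t) <= E(X) / t for X ~ Exp(1).
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)
for t in [1.0, 2.0, 5.0]:
    empirical = (x > t).mean()   # estimate of P(X > t)
    bound = x.mean() / t         # Markov bound E(X) / t
    print(f"t={t}: P(X>t) ~ {empirical:.4f} <= {bound:.4f}")
```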

Theorem 3 (Chebyshev’s inequality). Let $\mu=\mathbb{E}(X)$ and $\sigma^{2}=\operatorname{Var}(X)$. Then

$\mathbb{P}(\mid X-\mu\mid \geq t) \leq \frac{\sigma^{2}}{t^{2}}\text{ and }\mathbb{P}(\mid Z\mid \geq k) \leq \frac{1}{k^{2}},$

where $Z=(X-\mu) / \sigma$. In particular, $\mathbb{P}(\mid Z\mid>2) \leq 1 / 4 \text { and } \mathbb{P}(\mid Z\mid>3) \leq 1 / 9$.

Proof.

$\mathbb{P}(\mid X-\mu\mid \geq t)=\mathbb{P}\left(\mid X-\mu\mid ^{2} \geq t^{2}\right) \stackrel{Markov}{\leq} \frac{\mathbb{E}(X-\mu)^{2}}{t^{2}}=\frac{\sigma^{2}}{t^{2}}.$

Setting $t=k\sigma$,

$\mathbb{P}(\mid Z\mid \geq k)=\mathbb{P}(\mid X-\mu\mid \geq k\sigma)\leq\frac{1}{k^2}.$
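The same kind of check works for Chebyshev's inequality (a sketch; any distribution with finite variance would do, here a Gamma sample):

```python
# Monte Carlo check of Chebyshev: P(|Z| >= k) <= 1/k^2.
import numpy as np

rng = np.random.default_rng(1)
x = rng.gamma(shape=2.0, scale=1.0, size=1_000_000)
z = (x - x.mean()) / x.std()     # standardized sample
for k in [2, 3, 4]:
    empirical = (np.abs(z) >= k).mean()
    print(f"k={k}: P(|Z|>={k}) ~ {empirical:.4f} <= {1 / k**2:.4f}")
```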

# Hoeffding’s Inequality

Lemma 1 (Chernoff's method). Let $X$ be a random variable. Then

$\mathbb{P}(X>\epsilon) \leq \inf _{t \geq 0} e^{-t \epsilon} \mathbb{E}\left(e^{t X}\right).$

Proof. For any $t>0$,

$\mathbb{P}(X>\epsilon)=\mathbb{P}(tX>t\epsilon)=\mathbb{P}(e^{tX}>e^{t\epsilon})\stackrel{Markov}{\leq}\frac{\mathbb{E}(e^{tX})}{e^{t\epsilon}}.$

Taking the infimum over $t$ gives the result; at $t=0$ the bound equals $1$, which holds trivially.
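To see the method in action, a sketch for $X\sim N(0,1)$: the MGF is $\mathbb{E}(e^{tX})=e^{t^2/2}$, so the infimum is attained at $t=\epsilon$ with value $e^{-\epsilon^2/2}$, which a grid search over $t$ recovers:

```python
# Minimize e^{-t*eps} * E(e^{tX}) over t for X ~ N(0, 1),
# where the MGF is E(e^{tX}) = e^{t^2 / 2}.
import numpy as np

eps = 2.0
ts = np.linspace(0.01, 5.0, 1000)
bounds = np.exp(-ts * eps + ts**2 / 2)   # e^{-t eps} E(e^{tX})
print(f"optimal t ~ {ts[np.argmin(bounds)]:.3f} (theory: {eps})")
print(f"min bound ~ {bounds.min():.5f} vs e^(-eps^2/2) = {np.exp(-eps**2 / 2):.5f}")
```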

Lemma 2 (Hoeffding's lemma). Suppose that $a\leq X\leq b$. Then

$\mathbb{E}\left(e^{t X}\right) \leq e^{t \mu} e^{\frac{t^{2}(b-a)^{2}}{8}},$

where $\mu=\mathbb{E}[X]$.

Proof. Assume $\mu=0$ for simplicity. Since $a\leq X\leq b$, $X$ can be written as a convex combination of $a$ and $b$: $X=\alpha b+(1-\alpha)a$, where $\alpha=(X-a)/(b-a)$. Since $e^x$ is convex,

$e^{tX}\stackrel{convex}{\leq}\alpha e^{tb}+(1-\alpha)e^{ta}=\frac{X-a}{b-a} e^{t b}+\frac{b-X}{b-a} e^{t a}.$

Taking the expectation of both sides and using $\mathbb{E}(X)=\mu=0$,

$\mathbb{E}(e^{tX})\leq -\frac{a}{b-a} e^{t b}+\frac{b}{b-a} e^{t a}=e^{g(u)},$

where $u=t(b-a)$, $g(u)=-\gamma u+\log(1-\gamma+\gamma e^u)$ and $\gamma=-a/(b-a)$.

Why introduce $g$? Because its Taylor expansion yields a clean upper bound. First, $g(0)=0$; the first derivative is

$g^{\prime}(u)=-\gamma+\frac{\gamma e^u}{1-\gamma+\gamma e^u}=-\gamma+1-\frac{1-\gamma}{1-\gamma+\gamma e^u},$

so $g^{\prime}(0)=0$. The second derivative is

$g^{\prime\prime}(u)=\frac{\gamma(1-\gamma)e^u}{(1-\gamma+\gamma e^u)^2},$

which satisfies $g^{\prime\prime}(u)\leq1/4$ for all $u$, since $vw\leq(v+w)^2/4$ with $v=1-\gamma$ and $w=\gamma e^u$. By Taylor's theorem to the second order, there is a $\xi \in(0, u)$ such that

$g(u)=g(0)+ug^{\prime}(0)+\frac{u^2}{2}g^{\prime\prime}(\xi)\leq\frac{u^2}{8}=\frac{t^2(b-a)^2}{8}.$

Therefore,

$\mathbb{E}(e^{tX})\leq e^{g(u)}\leq e^{\frac{t^2(b-a)^2}{8}}.$
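The lemma can be checked numerically (a sketch: a centered Bernoulli$(p)$ variable $X=B-p$ lies in $[a,b]=[-p,1-p]$, so $b-a=1$ and the bound is $e^{t^2/8}$; its MGF has a closed form):

```python
# Check E(e^{tX}) <= e^{t^2 (b-a)^2 / 8} for X = B - p, B ~ Bernoulli(p).
import numpy as np

p = 0.3
for t in [0.5, 1.0, 2.0, 4.0]:
    mgf = (1 - p) * np.exp(-t * p) + p * np.exp(t * (1 - p))  # E(e^{tX})
    print(f"t={t}: E(e^tX) = {mgf:.4f} <= {np.exp(t**2 / 8):.4f}")
```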

Theorem 4 (Hoeffding’s Inequality). Let $Y_1,\dots,Y_n$ be iid observations such that $\mathbb{E}(Y_i)=\mu$ and $a\leq Y_i\leq b$. Then, for any $\epsilon>0$,

$\mathbb{P}(\mid \bar{Y}_n-\mu\mid\geq\epsilon)\leq2e^{-2n\epsilon^2/(b-a)^2}.$

Proof. Assume $\mu=0$ without loss of generality (otherwise replace $Y_i$ by $Y_i-\mu$; the range $b-a$ is unchanged). For the tail $\mathbb{P}(\bar{Y}_n\geq\epsilon)$ and any $t>0$, we have

\begin{aligned} \mathbb{P}(\bar{Y}_n\geq\epsilon)&=\mathbb{P}\left(\sum_{i=1}^nY_i\geq n\epsilon\right)=\mathbb{P}\left(e^{t\sum_{i=1}^nY_i}\geq e^{tn\epsilon}\right)\stackrel{Markov}{\leq}\frac{\mathbb{E}(e^{t\sum_{i=1}^nY_i})}{e^{tn\epsilon}}\\ &\stackrel{indep.}{=}e^{-tn\epsilon}\prod_{i=1}^n\mathbb{E}(e^{tY_i})=e^{-tn\epsilon}\left(\mathbb{E}(e^{tY_1})\right)^n. \end{aligned}

From Lemma 2, $\mathbb{E}(e^{tY_i})\leq e^{t^2(b-a)^2/8}$. So

$\mathbb{P}(\bar{Y}_n\geq\epsilon)\leq e^{-tn\epsilon+nt^2(b-a)^2/8},$

which is minimized at $t=4\epsilon/(b-a)^2$, giving

$\mathbb{P}(\bar{Y}_n\geq\epsilon)\leq e^{-2n\epsilon^2/(b-a)^2}.$

Applying the same argument to $-Y_1,\dots,-Y_n$ bounds the other tail; combining the two yields Hoeffding's inequality.

Example. Let $X_1,\dots,X_n\sim\text{Bernoulli}(p)$. Since $0\leq X_i\leq1$, Hoeffding's inequality gives, for any $\epsilon>0$,

$\mathbb{P}(\mid\bar{X}_n-p\mid\geq\epsilon)\leq2e^{-2n\epsilon^2}.$
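A simulation makes the bound concrete (a sketch; $n$, $p$, $\epsilon$, the seed, and the replication count are illustrative):

```python
# Compare the empirical tail of Bernoulli sample means with the
# Hoeffding bound 2 e^{-2 n eps^2}.
import numpy as np

rng = np.random.default_rng(2)
n, p, eps, reps = 100, 0.5, 0.1, 100_000
means = rng.binomial(n, p, size=reps) / n        # sample means of n draws
empirical = (np.abs(means - p) >= eps).mean()
bound = 2 * np.exp(-2 * n * eps**2)
print(f"P(|mean-p| >= {eps}) ~ {empirical:.4f} <= {bound:.4f}")
```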

# Expectation Inequalities

Theorem 5 (Cauchy-Schwarz inequality). If $X$ and $Y$ have finite variances, then

$\mid\mathbb{E}(XY)\mid\leq\mathbb{E}\mid XY\mid\leq\sqrt{\mathbb{E}X^2\mathbb{E}Y^2}.$

Applied to the centered variables $X-\mathbb{E}(X)$ and $Y-\mathbb{E}(Y)$, it can be written as:

$\text{Cov}^2(X,Y)\leq\sigma^2_X\sigma^2_Y.$
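A quick empirical check of the covariance form (a sketch; the linear model generating $y$ is an arbitrary choice):

```python
# Check Cov^2(X, Y) <= Var(X) Var(Y) on simulated correlated data.
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=100_000)
y = 0.6 * x + rng.normal(size=100_000)   # y correlated with x
cov2 = np.cov(x, y)[0, 1] ** 2
print(f"Cov^2 = {cov2:.4f} <= Var(X)*Var(Y) = {x.var() * y.var():.4f}")
```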

Theorem 6 (Jensen's inequality). If $g$ is convex, then

$\mathbb{E}(g(X))\geq g(\mathbb{E}(X));$

if $g$ is concave, then

$\mathbb{E}(g(X))\leq g(\mathbb{E}(X)).$

Example. Let $g(x)=x^2$, which is convex. Then

$\mathbb{E}(X^2)\geq(\mathbb{E}(X))^2.$
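Numerically (a sketch; the Exp(2) sample is an arbitrary choice, with $\mathbb{E}(X)=2$ and $\mathbb{E}(X^2)=8$):

```python
# Check Jensen's inequality with g(x) = x^2 on an exponential sample.
import numpy as np

rng = np.random.default_rng(4)
x = rng.exponential(scale=2.0, size=1_000_000)
print(f"E(X^2) = {np.mean(x**2):.3f} >= (E X)^2 = {np.mean(x)**2:.3f}")
```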

Example. The Kullback-Leibler (KL) divergence is defined as

$D_{KL}(p, q)=\int p(x) \log \left(\frac{p(x)}{q(x)}\right) d x.$

If $p=q$, then $D_{KL}(p,p)=0$. In general, $D_{KL}(p,q)\geq0$: taking the expectation over $X\sim p$,

\begin{aligned} -D_{KL}(p, q)&=\mathbb{E} \log \left(\frac{q(X)}{p(X)}\right) \stackrel{Jensen}{\leq} \log \mathbb{E}\left(\frac{q(X)}{p(X)}\right)\\ &=\log\int p(x)\frac{q(x)}{p(x)}dx=\log 1=0, \end{aligned}

where Jensen's inequality applies because $\log$ is concave.
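A small discrete example confirms this (a sketch; the distributions $p$ and $q$ are arbitrary):

```python
# KL divergence between two discrete distributions is non-negative,
# and zero when p = q.
import numpy as np

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.4, 0.4, 0.2])
print(f"D_KL(p, q) = {np.sum(p * np.log(p / q)):.4f}")  # >= 0
print(f"D_KL(p, p) = {np.sum(p * np.log(p / p)):.4f}")  # = 0
```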
