Mathematical Terms and Identities


Thanks to Andy Nguyen and Julie Tibshirani for their help writing this document!

This handout covers mathematical concepts, notation, and identities that may be useful over the course of CS166. The topics included here are usually taught over a combination of CS103, CS107, CS109, and CS161, though because the content in those classes shifts from quarter to quarter there’s a chance that you may not have seen everything here. If that’s the case, no worries! We’ll try to refresh the concepts as they arise in the quarter. On the other hand, if the majority of the topics here seem unfamiliar, you might want to reach out to us to make sure that you have the right prerequisites for the course.

Set Theory

The set of all natural numbers is denoted $\naturals\text.$ We include 0 as a natural number, so $\naturals = \set{0, 1, 2, 3, \ldots}\text.$

The set $\integers$ consists of all integers: $\integers = \set{ \ldots, -2, -1, 0, 1, 2, \ldots}\text.$

The set $\reals$ consists of all real numbers.

The set $\emptyset$ is the empty set consisting of no elements.

If $x$ is an element of set $S\text,$ we write $x \in S\text.$ If $x$ is not an element of $S\text,$ we write $x \notin S\text.$

The union of two sets $S_1$ and $S_2$ is denoted $S_1 \cup S_2\text.$ Their intersection is denoted $S_1 \cap S_2\text,$ their difference is denoted $S_1 - S_2$ or $S_1 \setminus S_2\text,$ and their symmetric difference is denoted $S_1 \symdiff S_2\text.$

If $S_1$ is a subset of $S_2\text,$ we write $S_1 \subseteq S_2\text.$ If $S_1$ is a strict subset of $S_2\text,$ we denote this by writing $S_1 \subsetneq S_2\text.$

The power set of a set $S$ (denoted $\powerset{S}\text)$ is the set of all subsets of $S\text.$ That is, $\powerset{S} = \set{ T \suchthat T \subseteq S }\text.$

The Cartesian product of two sets $S_1$ and $S_2$ is the set $S_1 \times S_2 = \set{ (a, b) \suchthat a \in S_1 \land b \in S_2 }\text.$
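
For example, if $S = \set{1, 2}\text,$ then

\[\powerset{S} = \set{\emptyset, \set{1}, \set{2}, \set{1, 2}} \quad \text{and} \quad S \times S = \set{(1, 1), (1, 2), (2, 1), (2, 2)}\text.\]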

First-Order Logic

The negations of the basic propositional connectives are as follows:

\[\begin{array}{lcl} \lnot(\lnot p) & \equiv & p \\ \lnot(p \land q) & \equiv & \lnot p \lor \lnot q \\ \lnot(p \lor q) & \equiv & \lnot p \land \lnot q \\ \lnot(p \to q) & \equiv & p \land \lnot q \\ \lnot(p \leftrightarrow q) & \equiv & p \leftrightarrow \lnot q \end{array}\]

The negations of the $\forall$ and $\exists$ quantifiers are as follows:

\[\begin{aligned} \lnot \forall x. \phi & \ \ \equiv \ \ \exists x. \lnot \phi \\ \lnot \exists x. \phi & \ \ \equiv \ \ \forall x. \lnot \phi \\ \end{aligned}\]
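
For example, pushing negations inward one quantifier at a time, for any predicate $P$ we have

\[\lnot \forall x. \exists y. P(x, y) \ \ \equiv \ \ \exists x. \forall y. \lnot P(x, y)\text.\]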

The abbreviation “iff” stands for “if and only if.”

Summations

The sum of the first $n$ natural numbers $(0 + 1 + 2 + \ldots + (n - 1))$ is given by

\[\sum_{i = 0}^{n-1}i = \frac{n(n - 1)}{2}\text.\]

The sum of the first $n$ terms of the arithmetic series $a, a + b, a + 2b, \ldots, a + (n - 1)b$ is

\[\sum_{i = 0}^{n-1}{\left(a + bi\right)} = an + \frac{bn(n - 1)}{2}\text.\]

The sum of the first $n$ terms of the geometric series $r^0 + r^1 + r^2 + \ldots + r^{n-1}$ is given (for $r \ne 1\text)$ by

\[\sum_{i = 0}^{n-1}{r^i} = \frac{r^n - 1}{r - 1}\text.\]

As a useful special case, when $r = 2\text,$ we have

\[\sum_{i = 0}^{n-1}{2^i} = 2^n - 1\text.\]
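
For example, $1 + 2 + 4 + 8 = 2^4 - 1 = 15\text.$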

In the case that $\abs{r} \lt 1\text,$ the sum of the infinite geometric series is given by

\[\sum_{i = 0}^{\infty}{r^i} = \frac{1}{1 - r}\text.\]

The following summation often arises in the analysis of algorithms: when $\abs{r} \lt 1\text,$ we have

\[\sum_{i = 0}^{\infty}{ir^i} = \frac{r}{\left(1 - r\right)^2}\text.\]
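
For example, setting $r = \frac{1}{2}$ gives

\[\sum_{i = 0}^{\infty}{\frac{i}{2^i}} = \frac{1/2}{(1/2)^2} = 2\text.\]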

Classical Inequalities

The following identities are useful for manipulating inequalities:

  • If $A \le B$ and $B \le C\text,$ then $A \le C\text.$
  • If $A \le B$ and $C \ge 0\text,$ then $CA \le CB\text.$
  • If $A \le B$ and $C \le 0\text,$ then $CA \ge CB\text.$
  • If $A \le C$ and $B \le D\text,$ then $A + B \le C + D\text.$
  • If $A, B \in \integers\text,$ then $A \le B$ iff $A \lt B + 1\text.$
  • If $f$ is a monotonically increasing function and $A \le B\text,$ then $f(A) \le f(B)\text.$
  • If $f$ is a monotonically decreasing function and $A \le B\text,$ then $f(A) \ge f(B)\text.$

The following inequalities are often useful in algorithmic analysis (the first holds for all real $x\text;$ the second, the AM-GM inequality, requires $x_1, \ldots, x_n \ge 0\text)\text:$

\[e^x \ge 1 + x\text.\] \[\sqrt[n]{x_1 x_2 \dots x_n} \le \frac{x_1 + x_2 + \dots + x_n}{n}\text.\]
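
For example, taking $n = 2$ in the second inequality gives $\sqrt{xy} \le \frac{x + y}{2}$ for nonnegative $x$ and $y\text;$ with $x = 4$ and $y = 16\text,$ this says that $8 \le 10\text.$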

Floors and Ceilings

The floor function $\floor{x}$ denotes the largest integer less than or equal to $x\text.$ The ceiling function $\ceil{x}$ denotes the smallest integer greater than or equal to $x\text.$ These functions obey the rules

\[\floor{x} \le x \lt \floor{x} + 1 \quad \text{and} \quad \floor{x} \in \integers\text.\] \[\ceil{x} - 1 \lt x \le \ceil{x} \quad \text{and} \quad \ceil{x} \in \integers\text.\]

Additionally, $\floor{x + n} = \floor{x} + n$ and $\ceil{x + n} = \ceil{x} + n$ for any $n \in \integers\text.$
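
For example, note in particular how these functions behave on negative inputs:

\[\floor{2.5} = 2, \qquad \ceil{2.5} = 3, \qquad \floor{-2.5} = -3, \qquad \ceil{-2.5} = -2\text.\]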

Asymptotic Notation

Let $f, g : \naturals \to \naturals\text.$ Then we say that

  • $f = O(g)$ when

    \[\exists n_0 \in \naturals. \exists c \gt 0. \forall n \in \naturals. (n \ge n_0 \to f(n) \le cg(n))\text,\]
  • $f = \Omega(g)$ when

    \[\exists n_0 \in \naturals. \exists c \gt 0. \forall n \in \naturals. (n \ge n_0 \to f(n) \ge cg(n))\text,\]
  • $f = \Theta(g)$ when $f = O(g)$ and $f = \Omega(g)\text,$

  • $f = o(g)$ when

    \[\forall \varepsilon \gt 0.\ \exists n_0 \in \naturals. \forall n \in \naturals. (n \ge n_0 \to f(n) \lt \varepsilon \cdot g(n))\text{, and}\]
  • $f = \omega(g)$ when

    \[\forall c \gt 0.\ \exists n_0 \in \naturals. \forall n \in \naturals. (n \ge n_0 \to f(n) \gt c \cdot g(n))\text.\]
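
For example, $2n + 3 = O(n)\text,$ as witnessed by $c = 3$ and $n_0 = 3\text:$ for any $n \ge 3\text,$ we have $2n + 3 \le 3n\text.$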

When multiple variables are involved in an expression, big-O notation generalizes as follows: we say that $f(x_1, \ldots, x_n) = O(g(x_1, \ldots, x_n))$ if there are constants $N$ and $c$ such that for any $x_1 \ge N, x_2 \ge N, \ldots, x_n \ge N\text,$ we have $f(x_1, \ldots, x_n) \le c \cdot g(x_1, \ldots, x_n)\text.$

We also have the following identities, which work equally well for $\Omega\text,$ $\Theta\text,$ $\omega\text,$ and $o\text:$

  • If $f = O(g)$ and $g = O(h)\text,$ then $f = O(h)\text.$
  • If $f_1 = O(g)$ and $f_2 = O(g)\text,$ then $f_1 + f_2 = O(g)\text.$
  • If $f_1 = O(g_1)$ and $f_2 = O(g_2)\text,$ then $f_1 f_2 = O(g_1 g_2)\text.$

For any fixed constants $a, b \gt 1\text,$ we have $\log_a n = \Theta(\log_b n)\text.$

Any polynomial of degree $k$ with positive leading coefficient is $\Theta(n^k)\text.$

For any $k \gt 0\text,$ we have $\log n = o(n^k)\text.$

For any $k \in \reals$ and any $b \gt 1\text,$ we have $n^k = o(b^n)\text.$

For any $1 \lt b \lt c\text,$ we have $b^n = o(c^n)\text.$
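
For example, the facts above give

\[3n^2 + 10n + 100 = \Theta(n^2), \qquad \log_2 n = \Theta(\ln n), \qquad n^{10} = o(2^n)\text.\]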

Graphs

In a graph $G = (V, E)\text,$ we write $n$ for the number of nodes $\abs{V}$ and $m$ for the number of edges $\abs{E}\text.$

In any graph, $m = O(n^2)\text.$ A dense graph is one where $m = \Theta(n^2)$ and a sparse graph is one where $m = o(n^2)\text.$
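
For example, a tree on $n$ nodes has exactly $n - 1$ edges and is therefore sparse, while the complete graph on $n$ nodes is dense, since it has

\[m = \binom{n}{2} = \frac{n(n - 1)}{2} = \Theta(n^2)\]

edges.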

The Master Theorem

If $a \ge 1\text,$ $b \gt 1\text,$ and $d \ge 0$ are constants, then the recurrence relation

\[T(n) = aT\left(\frac{n}{b}\right) + O(n^d)\]

solves as follows:

\[T(n) = \begin{cases} O\left(n^d\right) & \text{ if } \log_b a \lt d \\ O\left(n^d \log n\right) & \text { if } \log_b a = d \\ O\left(n^{\log_b a}\right) & \text { if } \log_b a \gt d \end{cases}\]
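
For example, applying the theorem to three common recurrences (here $b = 2\text,$ so $\log_b a$ is $0\text,$ $1\text,$ and $2\text,$ respectively):

\[\begin{array}{lcl} T(n) = T\left(\frac{n}{2}\right) + O(1) & \implies & T(n) = O(\log n) \\ T(n) = 2T\left(\frac{n}{2}\right) + O(n) & \implies & T(n) = O(n \log n) \\ T(n) = 4T\left(\frac{n}{2}\right) + O(n) & \implies & T(n) = O\left(n^2\right) \end{array}\]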

Logarithms and Exponents

Logarithms and exponents are inverses of one another:

\[a^{\log_a x} = x \qquad \log_a a^x = x\text.\]

The change-of-base formula for logarithms states that

\[\log_a x = \frac{\log_b x}{\log_b a}\text.\]

Sums and differences of logarithms translate into logarithms of products and quotients:

\[\log ab = \log a + \log b \qquad \log \frac{a}{b} = \log a - \log b\text.\]

The power rule for logarithms states

\[\log x^c = c \log x\text.\]

In some cases, exponents may be interchanged:

\[(a^b)^c = a^{bc} = (a^c)^b\text.\]

We can change the base of an exponent using the fact that logarithms and exponents are inverses:

\[a^n = b^{n \log_b a}\text.\]
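
For example,

\[3^n = 2^{n \log_2 3} \approx 2^{1.585n}\text.\]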

We will sometimes make use of the fact that

\[\lim_{n \to \infty} \left(1 - \frac{1}{n}\right)^n = \frac{1}{e}\text.\]

In particular, note that for any $n \ge 1\text,$

\[0 \le \left(1 - \frac{1}{n}\right)^n \le \frac{1}{e}\text.\]
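
One way to see the upper bound is to apply $e^x \ge 1 + x$ with $x = -\frac{1}{n}\text,$ which gives $1 - \frac{1}{n} \le e^{-1/n}\text;$ raising both (nonnegative) sides to the $n$th power yields

\[\left(1 - \frac{1}{n}\right)^n \le \left(e^{-1/n}\right)^n = \frac{1}{e}\text.\]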

Probability

If $\mathcal{E}_1$ and $\mathcal{E}_2$ are mutually exclusive events, then

\[\Pr[\mathcal{E}_1] + \Pr[\mathcal{E}_2] = \Pr[\mathcal{E}_1 \cup \mathcal{E}_2]\text.\]

For any events $\mathcal{E}_1, \mathcal{E}_2, \mathcal{E}_3, \ldots,$ including overlapping events, the union bound states that

\[\Pr[\bigcup_{i=1}^\infty \mathcal{E}_i] \le \sum_{i=1}^\infty \Pr[\mathcal{E}_i]\text.\]

The probability of $A$ given $B$ (defined when $\Pr[B] \gt 0\text)$ is denoted $\Pr[A \suchthat B]$ and is given by

\[\Pr[A \suchthat B] = \frac{\Pr[A \cap B]}{\Pr[B]}\text.\]

The chain rule for conditional probability states that

\[\Pr[A \cap B] = \Pr[A \suchthat B] \Pr[B]\text.\]

Two events $\mathcal{E}_1$ and $\mathcal{E}_2$ are called independent if

\[\Pr[\mathcal{E}_1 \cap \mathcal{E}_2] = \Pr[\mathcal{E}_1] \Pr[\mathcal{E}_2]\text.\]

For any event $\mathcal{E}\text,$ the complement of that event (denoted $\overline{\mathcal{E}}\text)$ represents the event that $\mathcal{E}$ does not occur. $\mathcal{E}$ and $\overline{\mathcal{E}}$ are mutually exclusive, and

\[\Pr[\mathcal{E}] + \Pr[\overline{\mathcal{E}}] = 1\text.\]
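
For example, if we flip $n$ fair coins independently, then complements and independence together give the probability of seeing at least one head:

\[\Pr[\text{at least one head}] = 1 - \Pr[\text{all tails}] = 1 - \left(\frac{1}{2}\right)^n\text.\]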

Expected Value

The expected value of a discrete random variable $X$ is defined as

\[\E[X] = \sum_{i} i \cdot \Pr[X = i]\text.\]

The expected value operator is linear: for any $a, b \in \reals$ and any random variable $X\text,$ we have

\[\E[aX + b] = a\E[X] + b\text.\]

If $X_1, X_2, X_3, \ldots, X_n$ is a finite collection of random variables, linearity of expectation tells us that

\[\E\left[\sum_{i=1}^n X_i\right] = \sum_{i=1}^n \E[X_i]\text.\]
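
For example, if $X_i$ is $1$ when the $i$th of $n$ fair coin flips comes up heads and $0$ otherwise (so $\E[X_i] = \frac{1}{2}\text),$ then the expected total number of heads is

\[\E\left[\sum_{i=1}^n X_i\right] = \sum_{i=1}^n \frac{1}{2} = \frac{n}{2}\text.\]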

Variance and Covariance

The variance of a random variable $X$ is defined as

\[\Var[X] = \E[(X - \E[X])^2] = \E[X^2] - \E[X]^2\text.\]

Accordingly, for any random variable $X\text,$ notice that

\[\Var[X] \le \E[X^2]\text.\]

Given two random variables $X$ and $Y\text,$ the covariance of $X$ and $Y$ is defined as

\[\Cov[X, Y] = \E[(X - \E[X])(Y - \E[Y])] = \E[XY] - \E[X]\E[Y]\text.\]

Accordingly,

\[\Var[X] = \Cov[X, X]\text.\]

Variance is not a linear operator. Instead, for any constants $a, b \in \reals$ and any random variables $X$ and $Y\text,$

\[\Var[aX + bY] = a^2 \Var[X] + 2ab \Cov[X, Y] + b^2 \Var[Y]\text.\]

The variance of a summation of random variables, including dependent variables, can be simplified using the following rule:

\[\Var\left[\sum_{i=1}^n{X_i}\right] = \sum_{i=1}^n \Var[X_i] + \sum_{i \ne j}{\Cov[X_i, X_j]}\text.\]

Two random variables $X$ and $Y$ are called uncorrelated if

\[\E[XY] = \E[X]\E[Y]\text.\]

Equivalently, the random variables $X$ and $Y$ are uncorrelated if $\Cov[X, Y] = 0\text.$

Any two independent random variables are uncorrelated, but uncorrelated random variables need not be independent of one another.

If $X_1, X_2, \ldots, X_n$ are uncorrelated random variables, then

\[\Var\left[\sum_{i=1}^n{X_i}\right] = \sum_{i=1}^n \Var[X_i]\text.\]
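
For example, a $\text{Binom}(n, p)$ random variable is a sum of $n$ independent (and hence uncorrelated) Bernoulli variables, each of which has variance $p(1 - p)\text,$ so its variance is $np(1 - p)\text.$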

Concentration Inequalities

Markov's inequality says that for any nonnegative random variable $X$ with finite expected value and any $c \gt 0\text,$ we have

\[\Pr[X \ge c] \le \frac{\E[X]}{c}\text.\]
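
For example, when $\E[X] \gt 0\text,$ taking $c = 2\E[X]$ shows that a nonnegative random variable is at least twice its expected value at most half the time:

\[\Pr[X \ge 2\E[X]] \le \frac{\E[X]}{2\E[X]} = \frac{1}{2}\text.\]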

Chebyshev's inequality states that for any random variable $X$ with finite expected value and variance, and for any $c \gt 0\text,$ we have

\[\Pr[\abs{X - \E[X]} \ge c] \le \frac{\Var[X]}{c^2}\text.\]

A special case of the Chernoff bound says that if $X \sim \text{Binom}(n, p)$ for $p \le \frac{1}{2}\text,$ then

\[\Pr\left[X \ge \frac{n}{2}\right] \le e^{\frac{-n\left(\frac{1}{2} - p\right)^2}{2p}}\text.\]

In the case where $p$ is a fixed constant with $p \lt \frac{1}{2}\text,$ notice that the right-hand side is $e^{-cn}$ for some constant $c \gt 0\text.$

Indicator Variables

An indicator random variable for an event $\mathcal{E}$ is a random variable $\mathbb{1}_\mathcal{E}$ where

\[\mathbb{1}_\mathcal{E} = \begin{cases} 1 & \text{ if event } \mathcal{E} \text{ occurs } \\ 0 & \text{ otherwise} \end{cases}\]

For any indicator variable, we have $\E[\indicator{\mathcal{E}}] = \Pr[\mathcal{E}]\text.$

Every indicator satisfies $\indicator{\mathcal{E}}^2 = \indicator{\mathcal{E}}\text,$ which means that

\[\Var[\mathbb{1}_\mathcal{E}] \le \Pr[\mathcal{E}]\text.\]
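
Indicators combine nicely with linearity of expectation. For example, if $n$ items are each assigned one of $k$ values uniformly and independently, and $\mathcal{E}_{ij}$ is the event that items $i$ and $j$ receive the same value, then the expected number of colliding pairs is

\[\E\left[\sum_{i \lt j} \mathbb{1}_{\mathcal{E}_{ij}}\right] = \sum_{i \lt j} \Pr[\mathcal{E}_{ij}] = \binom{n}{2} \cdot \frac{1}{k}\text.\]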

Harmonic Numbers

The $n$th harmonic number, denoted $H_n\text,$ is given by

\[H_n = \sum_{i=1}^n{\frac{1}{i}}\text.\]

The harmonic numbers are close in value to $\ln n\text:$ for any $n \ge 1\text,$ we have

\[\ln (n + 1) \le H_n \le \ln n + 1\text.\]

This implies that $H_n = \Theta(\log n)\text.$
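
As a quick check, $H_4 = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} = \frac{25}{12} \approx 2.08\text,$ which indeed lies between $\ln 5 \approx 1.61$ and $\ln 4 + 1 \approx 2.39\text.$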

Binary XOR

The xor (“exclusive or”) operation, denoted $x \oplus y\text,$ takes in two bits $x$ and $y$ and evaluates to $1$ if $x$ and $y$ are different and to $0$ if $x$ and $y$ are the same.

The xor operator has 0 as an identity:

\[x \oplus 0 = x = 0 \oplus x\text.\]

The xor operator is self-inverting:

\[x \oplus x = 0\text.\]

Xor is also commutative and associative:

\[x \oplus y = y \oplus x\text.\] \[(x \oplus y) \oplus z = x \oplus (y \oplus z)\text.\]

While xor is nominally defined on pairs of bits, it can be extended to work on pairs of integers as well. That operation works by xoring the corresponding pairs of bits of the numbers in question, and has the same algebraic properties described above (where, in this case, 0 would be interpreted as “a number whose bits are all zeroes”).
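
For example, $12 \oplus 10 = 1100_2 \oplus 1010_2 = 0110_2 = 6\text,$ and, since xor is self-inverting, $(12 \oplus 10) \oplus 10 = 12\text.$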