Left on the cutting-room floor from my probability notes.
$\newcommand \E {\mathbb E}$ $\newcommand \abs [1] {\left| #1 \right|}$
This technique is best appreciated when we’ve understood the two phenomena below:
(1) Randomly picking an object is decent on average but performs badly near the “tails”:
Problem. (The stamp collector problem) McDonald’s has decided to release a series of $n$ different Pokemon Happy Meal toys.
- Show that the expected number of Happy Meals required to collect 90% of all the toys is around $n\ln 10$.
- But if we truly ``gotta catch ’em all’’, show that the expected number of Happy Meals required is instead around $n\ln n$.
For large $n$, these are very different!
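Both claims are easy to sanity-check numerically (a sketch; the helper `expected_draws` and the choice $n = 10^6$ are mine, not part of the problem). By linearity of expectation, after $i$ distinct toys the next new toy takes $n/(n-i)$ meals in expectation, so the exact answer is a harmonic sum we can compare against $n\ln 10$ and $n\ln n$:

```python
import math

def expected_draws(n, k):
    """Expected number of Happy Meals to see k distinct toys out of n.

    After i distinct toys are collected, the next new toy arrives after a
    Geometric((n - i)/n) number of draws, i.e. n/(n - i) in expectation;
    summing these uses linearity of expectation.
    """
    return sum(n / (n - i) for i in range(k))

n = 1_000_000
ninety_pct = expected_draws(n, (9 * n) // 10)   # should be close to n ln 10
all_toys = expected_draws(n, n)                 # should be close to n ln n
```

The 90% figure matches $n\ln 10$ almost exactly, while the full-collection figure exceeds $n\ln n$ by the Euler–Mascheroni correction $\gamma n$, which is lower-order but visible at this scale.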
(2) Stupid-smart duality.
How might we exploit something that does well on average then? Here’s a non-probabilistic method to get you thinking…
Problem. (Iran?) Show that if $A\subseteq\{1,2,\cdots, n\}$ is a set in which every pair of elements $i,j$ satisfies $\mathrm{lcm} (i,j) \le n$, then $|A| \le 2\sqrt{n}$.
The key idea is that $\{n, n/2, n/3, \cdots\}$ (rounded down) is a fairly good construction (at least on first glance), and we can't do much better: in the interval $(n/2, n]$, we can pick at most one element of $A$. Indeed, for distinct $i,j$ in that interval, $\gcd(i,j)$ divides $|i-j|$, so $\mathrm{lcm} (i,j) = \frac{ij}{\gcd(i,j)}\ge \frac{ij}{|i-j|} = \frac{1}{|i^{-1} - j^{-1}|} > n$, since both $i^{-1}$ and $j^{-1}$ lie in $[\frac{1}{n}, \frac{2}{n})$ and hence differ by less than $\frac{1}{n}$.
Great, so we chop $\{1,\ldots,n\}$ into intervals $(n/2,n], (n/3,n/2],\ldots$, each of which contains at most one element of $A$ by the same argument. But then we get $n$ intervals! What happened?
Turns out, some of the later intervals don't contain any integers at all (e.g. $(\frac{n}{n-1}, \frac{n}{n-2}]$). So below some cutoff, we should switch over and do something stupid: above $n/k$, we use the $k$ intervals; at or below $n/k$, we just count everything.
This gives $|A|\le k + (n/k)$, which is minimized at around $k\approx \sqrt{n}$, yielding $|A|\le 2\sqrt{n}$.
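The "$n$ intervals, but only $\approx 2\sqrt{n}$ useful ones" claim can be verified directly (a quick numerical sketch; the helper name is mine): the interval $(\frac{n}{m+1}, \frac{n}{m}]$ contains an integer exactly when $\lfloor n/m \rfloor > \lfloor n/(m+1) \rfloor$, so we can just tally the nonempty intervals.

```python
import math

def nonempty_intervals(n):
    # (n/(m+1), n/m] contains an integer iff floor(n/m) > floor(n/(m+1)).
    return sum(1 for m in range(1, n + 1) if n // m > n // (m + 1))

for n in (100, 10_000):
    print(n, nonempty_intervals(n), 2 * math.sqrt(n))
```

The count sits just under $2\sqrt{n}$ every time (it equals the number of distinct values of $\lfloor n/m\rfloor$), matching the cutoff heuristic.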
You could imagine an analogous situation where the key idea is probabilistic, doing well on average but not near the end. There, too, we can switch over to some stupid (but failsafe) argument to get a decent overall bound.
Example. (Weak Turán) A graph $G$ has $n$ vertices and average degree $d .$ Prove that it is possible to select an independent set of size at least $\frac{n}{2 d}.$ (An independent set is a subset of vertices with no edges between them.)
Solution. Select each vertex independently with probability $p$. The resulting set has a very bad shot at being independent: we need every edge to avoid having both endpoints included, which comes to roughly $(1-p^2)^{|E|} = (1-p^2)^{nd/2}$ (the degree sum $nd$ counts each edge twice). You can get to around $O(\sqrt{nd})$ with this, but it's not really our thing.
Instead, once we randomly pick our vertex set (having $V$ vertices and $E$ induced edges), we can force it to be an independent set $S$ by sequentially taking each edge and deleting one of its endpoints. This removes at most $E$ vertices, so $|S|\ge V-E$.
Usually, this would be a terrible bound, but we can play around with the probability $p$:
$$\E \abs{S} \ge \E[V-E] = np - \frac{1}{2}ndp^2 = np\left(1-\frac{1}{2}dp\right)$$
Optimizing our $p$ (best at $p=1/d$) gives $\E\abs{S} \ge \frac{n}{2d}$; since the average is at least $\frac{n}{2d}$, some outcome of the random process achieves it. $\blacksquare$
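The whole argument is easy to simulate (a sketch; the random graph, seed, trial count, and helper names are my own choices, not from the problem): sample each vertex with $p = 1/d$, delete one endpoint of every edge that survives inside the sample, and check that the average size clears $n/(2d)$.

```python
import random

def independent_set_by_deletion(num_vertices, edges, p, rng):
    """Alteration method: keep each vertex with probability p,
    then delete one endpoint of every edge inside the kept set."""
    kept = {v for v in range(num_vertices) if rng.random() < p}
    for u, v in edges:
        if u in kept and v in kept:
            kept.discard(u)  # deleting either endpoint kills the edge
    return kept

rng = random.Random(0)
n, m = 200, 1000                      # average degree d = 2m/n = 10
edges = set()
while len(edges) < m:
    u, v = rng.randrange(n), rng.randrange(n)
    if u != v:
        edges.add((min(u, v), max(u, v)))
edges = sorted(edges)

d = 2 * m / n
p = 1 / d
trials = 2000
avg_size = sum(len(independent_set_by_deletion(n, edges, p, rng))
               for _ in range(trials)) / trials
# The bound promises E|S| >= n/(2d) = 10; avg_size should be at least that.
```

In practice the average comes out a bit above $n/(2d)$, since deleting a vertex for one edge often kills its other surviving edges for free.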
Here’s the framework for the general argument: suppose that we want to find a set of objects $S$ of a certain size $N$ satisfying some property. Instead of picking a random set of size $N$ and hoping that it works (which it will not), we will