In the Guide to Induction, we outlined a number of specific issues and concepts to be mindful of when writing inductive proofs. This proofwriting checklist distills those concepts down to a smaller number of specific points that you should keep an eye out for when writing up your inductive proofs:
- Make $P(n)$ a predicate, not a number or function.
- Watch your variable scoping in $P(n)$.
- “Build up” if $P(n)$ is existentially-quantified; “build down” if it’s universally-quantified.
- Choose the simplest base cases possible, and avoid redundant base cases.
The remainder of this page goes into more detail about what each of these rules means.
Make $P(n)$ a Predicate, Not a Number or Function
The principle of induction states that if you have a predicate $P$ and the following are true:
- $P(0)$
- $\forall k \in \naturals. (P(k) \to P(k+1))$
then you can conclude that $\forall n \in \naturals. P(n)$ must be true.
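Induction itself is a proof technique, not a computation, but the two obligations above can be mirrored as a finite sanity check in code. Here's a minimal Python sketch (the name `holds_by_induction` and the sample predicate are illustrative, not anything from the course materials); checking finitely many cases proves nothing, but it mirrors the template:

```python
def holds_by_induction(P, n_max):
    """Sanity-check the two induction obligations for P on 0..n_max.

    Not a proof (it only checks finitely many cases), but it mirrors the
    template: verify P(0), then verify each step P(k) -> P(k+1).
    """
    if not P(0):                      # base case
        return False
    for k in range(n_max):            # inductive steps P(k) -> P(k+1)
        if P(k) and not P(k + 1):
            return False
    return True

# Example predicate: P(n) is the claim "0 + 1 + ... + n = n(n+1)/2".
P = lambda n: sum(range(n + 1)) == n * (n + 1) // 2
```

With this `P`, the call `holds_by_induction(P, 100)` checks the base case and one hundred inductive steps; note that `P` only makes sense here because each `P(n)` is a truth value.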
It’s important to note that $P$ has to be a predicate for any of the above statements to be syntactically valid. Forgetting for the moment that we’re dealing with induction, in First-Order Logic Land the statement $P(0)$ can only be true or false if $P$ is a predicate, and the statement $P(k) \to P(k+1)$ only makes sense if $P(k)$ and $P(k+1)$ evaluate to truth values, which only happens when $P$ is a predicate.
One of the most common mistakes we see in inductive proofs is to define $P$ as something that isn’t a predicate. For example, on the recurrence relations problem from Problem Set Five, if you want to prove that $a_n = 2^n$ for every $n \in \naturals$, you should not define $P(n)$ like this:
⚠ $P(n) = 2^n$ ⚠
This doesn’t work because $P$ isn’t a predicate; the quantity $2^n$ isn’t something that evaluates to true or false. If we try using this $P$ in the definition of mathematical induction given above, we’d say that if $2^0$ is true and $\forall k \in \naturals. (2^k \to 2^{k+1})$ is true, then $\forall n \in \naturals. 2^n$ is true. The statement “$2^0$ is true” isn’t mathematically meaningful, and we can’t apply the $\to$ connective between the terms $2^k$ and $2^{k+1}$ because they’re quantities, not Booleans.
So before you submit your proofs, double-check that you’ve actually chosen $P$ as a predicate. Just ask whether it’s something that can be true or false (good!), or whether it’s a quantity of some sort (bad!).
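In programming terms, this is the difference between a function that returns a number and one that returns a Boolean. A quick Python sketch (the recurrence $a_0 = 1$, $a_{n+1} = 2a_n$ is a stand-in chosen because it satisfies $a_n = 2^n$, not necessarily the recurrence from Problem Set Five):

```python
# Hypothetical recurrence (a stand-in): a_0 = 1 and a_{n+1} = 2 * a_n,
# which satisfies a_n = 2^n.
def a(n):
    return 1 if n == 0 else 2 * a(n - 1)

# Wrong: this "P" is a quantity. P_bad(3) is the number 8, which is
# neither true nor false.
def P_bad(n):
    return 2 ** n

# Right: this P is a predicate. P_good(3) is the claim "a_3 = 2^3",
# which evaluates to True or False.
def P_good(n):
    return a(n) == 2 ** n
```

The "ask whether it can be true or false" test is exactly the question of which return type you'd give `P` if you coded it up.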
Watch Your Variable Scoping in $P(n)$
Another common mistake we see people make when defining the predicate $P$ is to write something like this:
⚠ $P(n)$ is “for any $n \in \naturals$, we have $a_n = 2^n$.” ⚠
Why exactly isn’t this correct? After all, $P$ is indeed a predicate: it either evaluates to true or false.
The issue here is one of variable scoping. To illustrate this, consider the following piece of C++/Java-esque code:
```cpp
void doSomething(int n) {
    for (int n = 0; n < 137; n++) {
        // ... do something with n ...
    }
}
```
There’s something weird about this code. This function takes in a parameter called `n`, which is supposed to be set by the person calling the function, but it then immediately proceeds to declare a new variable `n` inside the for loop. Depending on your programming language, this is either (1) really bad style or (2) a compile-time error.
The predicate $P$ defined above makes the same mistake as this code. The problem is that there are two different variables $n$ here: there’s the variable $n$ that’s the argument of the predicate (kinda like the parameter `n` to the `doSomething` function), and then there’s the variable $n$ introduced by “for any $n$” (kinda like the variable `n` defined in the `for` loop). And just as the meaning of the code is either “uh, that’s really weird” or “that’s not even legal” (depending on which language you’re using), mathematically the predicate $P$ defined above is either “uh, that’s really weird” or “that’s not even legal” (depending on who you ask).
Treat the argument to the predicate $P$ like an argument to a function. The caller specifies it, and you shouldn’t say something like “for any $n$” or “for some $n$” inside of the definition of $P(n)\text.$ Remember, induction ultimately lets you conclude that $P(n)$ is true for all $n \in \naturals$, and so the “for all $n \in \naturals$” part is provided externally to the predicate $P$.
“Build Up” if $P(n)$ is Existentially Quantified; “Build Down” if $P(n)$ is Universally Quantified
Let’s look at two of the inductive proofs we did in lecture: the proof that any square can be subdivided into $n$ squares for any $n \ge 6$, and that in any group of $3^n$ coins a counterfeit can be found in $n$ weighings. If you look at the high-level structure of those proofs, you’ll see that their inductive steps differ in a key specific way. In the proof that a square can be subdivided into smaller squares, the inductive step starts off with a subdivision of the square into $k$ squares, then uses that to form a subdivision of the square into $k+3$ squares. In the proof about counterfeit coins, the inductive step starts with a group of $3^{k+1}$ coins, then breaks it apart into three groups of $3^k$ coins each.
It seems like these proofs are going in opposite directions. In the square case, we start with a smaller object (size $k$) and grow it into a larger object (size $k+3$). In the coin case, we start with a larger object (size $3^{k+1}$) and shrink it into smaller objects (sizes $3^k$). What’s going on here?
The answer has to do with how the predicate $P$ is defined in each case. In the case of the square subdivision problem, the predicate $P$ is (implicitly) existentially-quantified. That is, if we write it in a hybrid English/FOL notation, we’d get something like this:
$P(n)$ is “$\exists S. (S \text{ is a subdivision of a square } \land S \text{ has } n \text{ squares})$”
In the case of the counterfeit coin problem, the predicate $P$ is (implicitly) universally-quantified. Written in a hybrid English/FOL notation, it looks like this:
$P(n)$ is “$\forall C. (C \text{ is a group of } 3^n \text{ coins } \to C\text{’s counterfeit can be found in } n \text{ weighings})$”
(Please don’t actually write your predicates this way – it’s just for expository purposes.)
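To make the coin predicate concrete, here's a Python sketch of the recursive weighing strategy that the proof describes: split the coins into thirds, weigh two thirds against each other, and recurse on the group that must contain the counterfeit. (This is a sketch under the assumption that the counterfeit is heavier than the genuine coins; the function name and coin representation are ours.)

```python
def find_counterfeit(coins):
    """Locate the single heavy counterfeit among 3^n coins in n weighings.

    `coins` maps each coin index to its weight; exactly one coin is heavier.
    Returns (index_of_counterfeit, number_of_weighings_used).
    """
    indices = list(coins)
    weighings = 0
    while len(indices) > 1:
        third = len(indices) // 3
        left = indices[:third]
        right = indices[third:2 * third]
        off_scale = indices[2 * third:]
        weighings += 1                          # one use of the balance
        left_weight = sum(coins[i] for i in left)
        right_weight = sum(coins[i] for i in right)
        if left_weight > right_weight:
            indices = left                      # counterfeit on the left pan
        elif right_weight > left_weight:
            indices = right                     # counterfeit on the right pan
        else:
            indices = off_scale                 # pans balance: it's off-scale
    return indices[0], weighings

# 27 = 3^3 coins, with the counterfeit (weight 2) hidden at index 19.
coins = {i: 1 for i in range(27)}
coins[19] = 2
```

Running `find_counterfeit(coins)` on these $3^3$ coins locates index 19 using exactly 3 weighings, matching what $P(3)$ promises.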
Notice that in the square case, the predicate $P$ is existentially-quantified, while in the coin case, it’s universally-quantified. That’s a really important distinction, and it explains why the two proofs proceed in opposite directions.
To see this, imagine we have some predicate $P$ that looks like this:
$P(n)$ is “there is an $X$ of size $n$ where $(\dots)$”
If we want to prove that $P(n)$ is true for all $n \in \naturals$, then in our inductive step we’d need to prove that
\[\begin{aligned} &\text{If} \\ &\quad \text{there is an } X \text{ of size } k\text{ where } (\dots) \text, \\ & \text{then} \\ & \quad \text{there is an } X' \text{ of size } k+1 \text{ where } (\dots) \text. \end{aligned}\]

Take a step back for a minute and think, abstractly, about how you prove something like this. We begin by assuming the antecedent. Since the antecedent is existentially-quantified, it means that we’re assuming we have some honest-to-goodness object $X$ lying around of size $k$ that has some specific property. In other words, we can think of assuming the antecedent in this case as starting off with some concrete object in hand. (This is the standard procedure for assuming an existentially-quantified statement; check the Guide to Proofs on Discrete Structures for more information.)
We then need to prove the consequent, which is an existential statement. This means that we need to somehow show that, somewhere out there, there’s an object $X'$ of size $k+1$ with some property. Given that we already have a handy object $X$ lying around, it’s reasonable to check whether we can start with that object $X$ and then extend it into something larger.
In other words, we start with a smaller object (one of size $k$), then show how to convert it to a bigger object (one of size $k+1$).
On the other hand, if we have a predicate $P$ that looks like this:
$P(n)$ is “for any object $X$ of size $n$, $X$ has property $(\dots)$”
then the inductive step would look like this:
\[\begin{aligned} &\text{If} \\ &\quad \text{for any } X \text{ of size } k\text{, } X \text { has property } (\dots) \text, \\ & \text{then} \\ & \quad \text{for any } X' \text{ of size } k+1\text{, } X' \text { has property } (\dots) \text. \end{aligned}\]

Think about how you prove a statement like this. We’d begin by assuming the antecedent. Since the antecedent is universally-quantified, we can think of it as a tool. It says “hey, as soon as you can actually find me an $X$ of size $k$, I can step in and help you show that it has some key property.” However, at the moment we don’t actually have an object of size $k$ lying around. Think of it as if we have a hammer rather than a nail – once we find a nail we can hammer it, but assuming the antecedent doesn’t actually give us a nail to hit. (You’ve seen this before – the general principle for assuming a universally-quantified statement is to hold tight and keep an eye out for a place to apply it.)
Next, we go to prove the consequent. The consequent is universally-quantified, and so if we want to prove it by a direct proof, we’d begin by choosing some object $X'$ of size $k+1$, then trying to prove it has some property. So now we have an actual choice of object $X'$ in hand. It’s not quite the right shape for our antecedent (the antecedent only works when we get objects of size $k$, not $k+1$), so it’s reasonable to try to see if we can produce some smaller object $X$ out of our object $X'$. That way, we can use our antecedent to learn something new about $X$, which might in turn tell us something about $X'$.
In other words, we start with a larger object (one of size $k+1$) and then try to convert it to a smaller object (one of size $k$).
The difference between these two approaches is purely a difference in what quantifier is at the front of the predicate $P$. If it’s existentially-quantified, we start with something smaller and then try to grow it. If it’s universally-quantified, we start with something larger and shrink it down.
When you’re writing up your proofs – and when you’re still in the problem-solving stage – make sure that you keep these points in mind. Reread your proofs and check for the directionality in your induction – if you’re going in a different direction than what we’re proposing here, it likely means that there’s a logic error in your proof.
The above advice applies to cases where your predicate $P(n)$ is universally-quantified or existentially-quantified. But what if it’s neither of these? For example, what happens if you have a predicate like this?
$P(n)$ is the statement “the sum of the first $n$ powers of two is $2^n - 1$.”
This statement has no quantifiers on it at all. But not to worry! The general rule here is to write out the implication we’d need to prove in the inductive step. And that implication is this one here:
\[\begin{aligned} &\text{If} \\ &\quad 2^0 + 2^1 + 2^2 + \dots + 2^{k-1} = 2^k - 1 \text, \\ & \text{then} \\ & \quad 2^0 + 2^1 + 2^2 + \dots + 2^{k-1} + 2^k = 2^{k+1} - 1 \text. \end{aligned}\]

What would we need to do to prove a result like this one? There are actually several options. First, there’s the approach we took in lecture:
- Begin by writing out the first $k+1$ powers of two: $2^0 + 2^1 + \dots + 2^{k-1} + 2^k \text.$
- Recognize that we can group this as $(2^0 + 2^1 + \dots + 2^{k-1}) + 2^k \text.$
- Use our inductive hypothesis, that $2^0 + 2^1 + \dots + 2^{k-1} = 2^k - 1 \text,$ to rewrite this as $2^k + 2^k - 1\text.$
- Invoke the Great Power of Algebra to simplify this to $2^{k+1} - 1 \text.$
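No amount of numerical checking substitutes for the proof, but both the overall claim and the regrouping identity used in the inductive step are easy to spot-check. A quick Python sketch:

```python
# Check the claim "2^0 + ... + 2^{n-1} = 2^n - 1" for the first few n.
for n in range(20):
    assert sum(2 ** i for i in range(n)) == 2 ** n - 1

# Check the inductive step's regrouping: (2^0 + ... + 2^{k-1}) + 2^k,
# rewritten via the hypothesis as (2^k - 1) + 2^k, equals 2^(k+1) - 1.
for k in range(20):
    first_k = sum(2 ** i for i in range(k))
    assert first_k + 2 ** k == (2 ** k - 1) + 2 ** k == 2 ** (k + 1) - 1
```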
Here, we start with an unknown quantity (the sum of the first $k+1$ powers of two), then use a chain of equational reasoning to turn it into a different quantity ($2^{k+1} - 1$). This approach is fine, since at no point do we assume something we don’t yet know to be true.
However, this isn’t the only way we could have proved this result. A different route would be to do something like this:
- Begin with the inductive hypothesis ($2^0 + 2^1 + \dots + 2^{k-1} = 2^k - 1$), which is an equation we’re assuming to be true.
- Add $2^k$ to both sides, giving $2^0 + 2^1 + \dots + 2^{k-1} + 2^k = 2^k - 1 + 2^k$.
- Recognize that the left-hand side is the sum of the first $k+1$ powers of two and that the right-hand side is (via Clever Algebra) equal to $2^{k+1} - 1 \text.$
This would also be perfectly fine, since at no point do we assume some equality we don’t know to be true.
In summary, the real rule here is to make sure you’re always following the general principles for how to prove an implication: assume the antecedent is true, then prove the consequent. The shorthands of “induct up” versus “induct down” are just that – shorthands – rather than some new rule that applies only in the case of induction. When in doubt, write it out!
Choose the Simplest Base Cases Possible, and Avoid Redundant Base Cases
All inductive proofs need to kick off the induction somewhere, and that’s the job of the base case. Choosing the “right” base case is important to the proof, both in terms of correctness and in terms of proofwriting style. At the same time, choosing the right base case can be tricky, because inductive base cases often consider cases that are so small or degenerate that they bear little resemblance to the overall problem at hand.
The Guide to Induction goes into a bunch of detail about the importance and value of choosing the right base case, and in the interest of space we won’t repeat all of those details here. What we can do, though, is provide a series of questions you should think through to make sure that you’ve picked your base cases intelligently:
1. Are all the numbers you need covered actually covered? For example, if you’re going to prove that something is true for all natural numbers, you’ll need to ensure that you’ve covered the zero case, so you’d probably want to pick a base case like $n = 0$. If you start later than this, it’s likely a logic error.

2. Have you thought through degenerate cases? Oftentimes, the base case of an inductive proof involves an extreme sort of edge case (a set with no elements, an implication that’s vacuously true, a sum of no numbers, a graph with no nodes, etc.). It can feel really, really weird working with cases like these the first time you’re exposed to them, but it gets a lot easier with practice. If you’re uncertain whether the result is true in some extreme case, it’s best to ask the course staff for input. You’ll build a good intuition for what works and what doesn’t over time.

3. Does the logic in your inductive step apply to your base case? Once you’ve picked your base case, look at the reasoning in it. Is that reasoning just a special case of the reasoning in your inductive step? If so, that might indicate that you haven’t actually picked the simplest possible base case and that you might be able to pick an even simpler one.

4. If you have multiple base cases, are they truly necessary? Some proofs by induction really do need multiple base cases, such as the square subdivision problem from lecture (where we need one base case for each remainder modulo three). However, most of the time we see proofs that have multiple base cases in them, we find that at least one of them isn’t necessary. Oftentimes, that extra base case can be eliminated by asking question (3) from this list.
Elaborating more on points (3) and (4), here’s a lovely quote often attributed to Albert Einstein:
"Everything should be as simple as possible, but no simpler."
This quite nicely summarizes our philosophy on base cases in inductive proofs: include as many of them as are necessary to get the proof going, no more, and no fewer.
Most, but not all, inductive proofs will need just a single base case. For example, when proving that the sum of the first $n$ powers of two is $2^n - 1$, we just proved one base case, that the sum of zero powers of two is equal to $2^0 - 1$. Similarly, our proof about finding the counterfeit coin used only a single base case, namely, that with zero weighings you can find the counterfeit out of a group of one coin.
These proofs can’t be simplified by removing those base cases, since they’re needed to kick off the induction. And since their inductive steps work just fine starting from the $n=0$ case, there’s no reason to add in other base cases. And, in fact, adding in more base cases here wouldn’t contribute anything new. It would just lengthen the proof and increase the surface area in the proof where a mistake could occur.
Some proof approaches do require multiple base cases. Our proof about square subdivisions needed three base cases because we were taking steps of size three and needed to cover all the possible remainders modulo three. If we removed any of the proofs of $P(6)$, $P(7)$, or $P(8)$, then we wouldn’t cover all natural numbers $n \ge 6$. However, adding in more base cases (say, $P(9)$, $P(10)$, or $P(11)$) wouldn’t accomplish anything new. It would just add length to the proof and increase the likelihood of an error.
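The arithmetic behind "three base cases plus steps of size three" is easy to check directly. A small Python sketch (`reachable` is our name for the helper):

```python
def reachable(base_cases, step, limit):
    """All numbers up to `limit` reachable from `base_cases` by repeatedly
    adding `step` (i.e., the numbers the induction actually covers)."""
    covered = set()
    for b in base_cases:
        covered.update(range(b, limit + 1, step))
    return covered

# Base cases 6, 7, 8 with steps of +3 cover every n >= 6 ...
assert reachable({6, 7, 8}, 3, 100) == set(range(6, 101))
# ... but dropping any one of them leaves a whole residue class uncovered:
assert 9 not in reachable({7, 8}, 3, 100)
# ... and extra base cases like 9, 10, 11 add nothing new:
assert reachable({6, 7, 8, 9, 10, 11}, 3, 100) == reachable({6, 7, 8}, 3, 100)
```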