Who wrote the Federalist Papers?

Written by Lisa Yan and Chris Piech

Introduction: Publius

The Federalist Papers are a collection of 85 essays advocating ratification of the US Constitution. They were published under the pseudonym "Publius", which actually referred to three authors: Alexander Hamilton, James Madison, and John Jay.

Using probability, we can estimate who most likely wrote each essay in the Federalist Papers by comparing the words in the essay against the word distributions in known writings from Hamilton, Madison, and Jay. This approach is known more generally as the "Bag of Words" model: we ignore sentence structure and word ordering and compare only word frequencies.

In this demo, we decide whether James Madison or Alexander Hamilton wrote Federalist No. 53 (unknown.txt). We have one known writing sample from each author, madison.txt and hamilton.txt, from which we generate author-specific word frequencies. We then model the word counts of the unknown document as a multinomial: each author has some probability of generating each word, and these probabilities differ by author. Given the document's word counts, if Madison is the more likely author, we report Madison; otherwise we report Hamilton.


  • madison.txt: Federalist No. 10
  • hamilton.txt: Federalist No. 11
  • unknown.txt: Federalist No. 53
In [1]:
import csv
import operator
import math

# The below functions are not shown
from helper import makeWordProbMap, makeWordCountMap

Step 1: Generate two probability lookups from known writings.

Do once each for the two writers, Madison and Hamilton:

  • Go through a document and make a count of how many times each word appears.
  • Create a probability lookup wordProbMap that stores $P(word|writer)$.

  • makeWordProbMap(textfile): creates a map of word -> probability.

  • Use getWordProb(wordProbMap, word) to return $P(word|writer)$, where $writer$ is the author corresponding to wordProbMap.
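
The helper functions themselves are not shown in this demo. The two steps above could be sketched roughly as follows, assuming lowercase tokenization on runs of letters and plain relative frequencies. The names `sketchWordCountMap` and `sketchWordProbMap` are hypothetical stand-ins that take a string rather than a file name, and the real `helper` module may tokenize differently.

```python
import re
from collections import Counter

def sketchWordCountMap(text):
    """Hypothetical stand-in: return (word -> count map, total word count)."""
    # Assumption: words are runs of letters, compared case-insensitively.
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(words), len(words)

def sketchWordProbMap(text):
    """Hypothetical stand-in: return a map of word -> P(word | writer)."""
    countMap, nWords = sketchWordCountMap(text)
    # Relative frequency: occurrences of the word divided by total words.
    return {word: count / nWords for word, count in countMap.items()}

sample = "The people and the states"
countMap, nWords = sketchWordCountMap(sample)
probMap = sketchWordProbMap(sample)
print(nWords, countMap["the"], probMap["the"])
```

Because every probability is a count divided by the same total, the values in the map sum to 1 for each writer.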
In [2]:
# Build the P(word | writer) lookup for each known writer
madisonWordProb = makeWordProbMap("madison.txt")
hamiltonWordProb = makeWordProbMap("hamilton.txt")

# Used when retrieving words from our two word maps.
EPSILON = 0.000001

def getWordProb(wordProbMap, word):
    """
    Return probability of a given word.
    If the word was not found, return some small probability epsilon.
    """
    if word in wordProbMap:
        return wordProbMap[word]
    return EPSILON

print("P(congress|madison) =", getWordProb(madisonWordProb, "congress"))
print("P(congress|hamilton) =", getWordProb(hamiltonWordProb, "congress"))

print("P(lisa|madison) =", getWordProb(madisonWordProb, "lisa"))
print("P(the|madison) =", getWordProb(madisonWordProb, "the"))
print("P(the|hamilton) =", getWordProb(hamiltonWordProb, "the"))
P(congress|madison) = 0.00016016229779509903
P(congress|hamilton) = 0.0011592117360195067
P(lisa|madison) = 1e-06
P(the|madison) = 0.09337461961454274
P(the|hamilton) = 0.07950593596354479
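
One way to see why a nonzero fallback is useful: a word with probability 0 would zero out any product it appears in, and its logarithm is undefined. The following is a minimal sketch using a hypothetical toy map `toyProbMap` with made-up values, not the real Madison data; `getWordProbSketch` is an illustrative name that mirrors the fallback rule above.

```python
import math

EPSILON = 0.000001
toyProbMap = {"the": 0.09, "congress": 0.0002}  # hypothetical toy values

def getWordProbSketch(wordProbMap, word):
    # Same fallback rule as getWordProb: unseen words get EPSILON, not 0.
    return wordProbMap.get(word, EPSILON)

# If unseen words were given probability 0, taking the log would fail.
try:
    math.log(toyProbMap.get("lisa", 0))
    logOfZeroFailed = False
except ValueError:
    logOfZeroFailed = True

print("log(0) raises ValueError:", logOfZeroFailed)
print("log P(lisa) with epsilon:", math.log(getWordProbSketch(toyProbMap, "lisa")))
```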

Step 2: Generate the word counts from the unknown document.

  • makeWordCountMap(textfile): creates a map of word -> count.
In [3]:
unknownDocCount, nDocWords = makeWordCountMap('unknown.txt')
print('# words in unknown.txt:', nDocWords)
print('# unique words in unknown.txt:', len(unknownDocCount))
print("# of times \"congress\" appears in unknown.txt:", unknownDocCount["congress"])
print("# of times \"the\" appears in unknown.txt:", unknownDocCount["the"])
# words in unknown.txt: 2172
# unique words in unknown.txt: 637
# of times "congress" appears in unknown.txt: 1
# of times "the" appears in unknown.txt: 193

Step 3: Bayes' Theorem simplification: compute $P(unknownDoc|writer)$ for each writer.

Bayes' Theorem says:

$P(writer|unknownDoc) = \dfrac{P(unknownDoc|writer)P(writer)}{P(unknownDoc)}$

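  • However, since we only need to compare the two writers, we can take the ratio of their posteriors and cancel terms: $P(unknownDoc)$ appears in both and cancels, and if we assume both writers are equally likely a priori, $P(Madison) = P(Hamilton)$, the priors cancel too.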

    $\dfrac{P(unknownDoc|Madison)}{P(unknownDoc|Hamilton)} > 1 \rightarrow \text{Madison wrote document}$

The distribution of word counts in an unknown document (conditioned on knowing the writer) is a Multinomial RV. Since the multinomial coefficients are identical in both numerator and denominator, these also cancel.
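
Written out, the cancellation looks like the following. The symbols are introduced here for illustration: $c_i$ is the number of appearances of word $i$ in the unknown document, $n = \sum_{i=1}^m c_i$ is its total word count, and $p_{\text{M}, i}$ and $p_{\text{H}, i}$ are the probabilities of word $i$ under Madison and Hamilton.

$\dfrac{P(unknownDoc|Madison)}{P(unknownDoc|Hamilton)} = \dfrac{\frac{n!}{c_1! \cdots c_m!} \prod_{i=1}^m p_{\text{M}, i}^{c_i}}{\frac{n!}{c_1! \cdots c_m!} \prod_{i=1}^m p_{\text{H}, i}^{c_i}} = \prod_{i=1}^m \left( \dfrac{p_{\text{M}, i}}{p_{\text{H}, i}} \right)^{c_i}$

The coefficient depends only on the word counts of the unknown document, not on the writer, which is why it drops out.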

  • Ultimately, we can compute a ratio of the product of probabilities of observing each word given each author wrote it:

    $P(unknownDoc|Madison) \propto \prod_{i=1}^m \left( p_{\text{M}, i}^{\text{# appearances of word }i \text{ in unknown}} \right)$

In [4]:
def calcProbDoc(wordProbMap, countMap):
    prob = 1
    for i, word_i in enumerate(countMap):
        c_i = countMap[word_i]
        p_i = getWordProb(wordProbMap, word_i)
        if i < 10:
            print(word_i, "appeared", c_i, "times. prob:", math.pow(p_i, c_i))
        prob *= math.pow(p_i, c_i)
    return prob

print('P(doc|madison) is proportional to:')
pMadison = calcProbDoc(madisonWordProb, unknownDocCount)
print('P(doc|hamilton) is proportional to:')
pHamilton = calcProbDoc(hamiltonWordProb, unknownDocCount)
print('madison: \t\t',pMadison)
print('hamilton: \t\t', pHamilton)
print('madison/hamilton:\t',pMadison/pHamilton)
P(doc|madison) is proportional to:
to appeared 73 times. prob: 1.1250905510117391e-110
the appeared 193 times. prob: 1.7954247353201409e-199
people appeared 7 times. prob: 3.0763612651710167e-18
of appeared 128 times. prob: 2.2176346721581293e-161
state appeared 8 times. prob: 2.1932135953491616e-21
new appeared 3 times. prob: 1.4092076516160994e-09
york appeared 1 times. prob: 0.0006406491911803961
i appeared 3 times. prob: 2.3775909250082666e-09
shall appeared 1 times. prob: 0.0005338743259836634
here appeared 2 times. prob: 7.125544898612772e-08
P(doc|hamilton) is proportional to:
to appeared 73 times. prob: 6.2977692751420065e-106
the appeared 193 times. prob: 5.985687391614642e-213
people appeared 7 times. prob: 2.183281419899546e-19
of appeared 128 times. prob: 1.0591759406755472e-158
state appeared 8 times. prob: 1.2495592438577611e-21
new appeared 3 times. prob: 1.7244796687353517e-09
york appeared 1 times. prob: 0.0003597553663508814
i appeared 3 times. prob: 1.3117478143896215e-08
shall appeared 1 times. prob: 0.0014390214654035256
here appeared 2 times. prob: 1.4380435957584098e-08
madison: 		 0.0
hamilton: 		 0.0
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-4-2c05d17162d4> in <module>()
     17 print('madison: \t\t',pMadison)
     18 print('hamilton: \t\t', pHamilton)
---> 19 print('madison/hamilton:\t',pMadison/pHamilton)

ZeroDivisionError: float division by zero

Step 3 (tractable): Compute log probabilities for each writer.

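  • Multiplying many small probabilities leads to underflow: each product above is smaller than the smallest positive floating-point number, so both round to 0.0 and the ratio cannot be computed.

  • A tractable version computes the sum of log probabilities instead of the product of probabilities.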

  • An equivalent comparison would then be as follows:

    $\log{P(unknownDoc|Madison)} - \log{P(unknownDoc|Hamilton)} > 0 \rightarrow \text{Madison wrote document},$

  • where

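    $\log{P(unknownDoc|Madison)} = \sum_{i=1}^m \left( (\text{# appearances of word }i \text{ in unknown}) \log( p_{\text{M}, i}) \right) + \text{constant}$

    The constant is the log of the multinomial coefficient. It is identical for both writers, so it cancels in the difference.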
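
To see the underflow in isolation, here is a small sketch that multiplies just two of the per-word factors printed for Madison in the output above and compares the result with the same quantity in log space. The variable names are illustrative only.

```python
import math

# Two of the per-word factors printed for Madison above ("the" and "of").
pThe = 1.7954247353201409e-199
pOf = 2.2176346721581293e-161

# The true product is about 4e-360, below the smallest positive float64
# (about 5e-324), so it rounds to exactly 0.0.
product = pThe * pOf

# The same quantity in log space is an ordinary negative number.
logSum = math.log(pThe) + math.log(pOf)

print("product:", product)
print("sum of logs:", logSum)
```

Two factors are already enough to lose the product, while the log sum stays well within floating-point range even after hundreds of terms.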

In [5]:
def calcLogProbDoc(wordProbMap, countMap):
    logprob = 0
    for word_i in countMap:
        c_i = countMap[word_i]
        p_i = getWordProb(wordProbMap, word_i)
        logprob += c_i * math.log(p_i)
    return logprob

logpMadison = calcLogProbDoc(madisonWordProb, unknownDocCount)
logpHamilton = calcLogProbDoc(hamiltonWordProb, unknownDocCount)
print('log madison: \t\t',logpMadison)
print('log hamilton: \t\t', logpHamilton)
print('log madison/hamilton:\t',logpMadison - logpHamilton)
log madison: 		 -12898.983382081531
log hamilton: 		 -14257.381189681942
log madison/hamilton:	 1358.3978076004114
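
The difference is positive, so by the comparison rule above we report Madison as the writer. As a rough sketch of how the log difference could be turned into a decision and, under the added assumption of equal priors, a posterior probability (the function names `decideWriter` and `posteriorMadison` are hypothetical):

```python
import math

def decideWriter(logDiff):
    # logDiff = log P(doc|Madison) - log P(doc|Hamilton).
    # A tie (logDiff == 0) falls to Hamilton here; that choice is arbitrary.
    return "Madison" if logDiff > 0 else "Hamilton"

def posteriorMadison(logDiff):
    # Assumes equal priors, so P(Madison|doc) = 1 / (1 + exp(-logDiff)).
    # Branch on the sign so math.exp never receives a large positive argument.
    if logDiff >= 0:
        return 1.0 / (1.0 + math.exp(-logDiff))
    e = math.exp(logDiff)
    return e / (1.0 + e)

logDiff = 1358.3978076004114  # value printed above
print(decideWriter(logDiff), posteriorMadison(logDiff))
```

With a gap this large, the posterior is indistinguishable from 1 in floating point.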