Python for Probability

We'll hold a Python review session to get you up to speed on what you'll need for the problem sets. It will be held Friday, Sept 27th, from 5-7pm and will be recorded. The first hour is basic Python review; the second hour covers numpy and other tools helpful for data science. You are welcome to arrive right at 6 for just the second part.

This handout covers only the probability functions available in Python. We'll cover the underlying concepts throughout the quarter. For the basics of Python, there are many good online tutorials.

Installing Python
Probability Basics

Installing Python

You should install Python on your computer if you haven't already done so. Alternatively, you can use an online Python environment like repl.it.

Install Python and PyCharm

You will need to install Python and PyCharm (or have a code editor you are comfortable with) on your computer. If you've installed Python as part of CS106A, you're good to go for CS109 and can stop reading here :-).

Use repl.it

Use the free online service repl.it in your browser. This does not require you to install anything on your computer, but it will require an internet connection during development. You will need to make a repl.it account and a Github account.

Use your existing installation of Python 3

If you have your own installation of Python 3.7 or higher, you can stop reading here. :-)

Option 1: Install Python and an IDE

Please read the CS106A Installing PyCharm Guide. This will install Python 3.8 and PyCharm, a Python IDE. Another popular (free) IDE is VSCode which is very simple to install.

Once you've installed an IDE, watch this short video to install numpy.

Make sure to follow all of the installation instructions. Remember to always open the folder containing the .py files in PyCharm, not the .py files themselves.

Option 2: Use repl.it

If you do not have a Github account, go to https://github.com and click 'Sign Up'. Follow the prompts to make an account.
Go to https://education.github.com/pack and enhance your Github account with the "student developer pack". You may need to wait to be approved.
Go to https://repl.it/login and click the Github icon to log in through Github.

Since you set up your account with the student developer pack, you can make private projects.
Projects are called "repls". You can create a repl from your home page by clicking "new repl".

Set Python as the programming language and set the name to whatever you wish.
Important: you must set your repl to private for any class-related work. Failing to do so may violate the Stanford Honor Code.

Once your repl is created, you should see several panels. The middle panel is for code, and the right panel is for code output. Paste print('Hello World!') into the middle panel, and click "run". You should see Hello World! output in the right panel. Congratulations! You are all set up!

Repl uses Python version 3.8. This suffices for our purposes.
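
Whichever option you chose, a quick check can confirm the setup. The sketch below is an illustration rather than part of the official installation guides: it checks that the interpreter is at least version 3.7 (the minimum mentioned above) and reports whether numpy and scipy can be found. The variable names are made up for this example.

```python
import importlib.util
import sys

# The handout asks for Python 3.7 or higher.
version_ok = sys.version_info >= (3, 7)
print("Python", sys.version.split()[0], "is fine" if version_ok else "is too old")

# find_spec returns None when a package is not installed.
found = {name: importlib.util.find_spec(name) is not None
         for name in ("numpy", "scipy")}
for name, is_installed in found.items():
    print(name, "found" if is_installed else "missing")
```

If a package is reported as missing, install it before running the SciPy examples later in this handout.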

Counting Operations

Factorial

Compute $n!$ as an integer. This example computes $20!$:

import math
print(math.factorial(20))
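
As a small illustration (an addition to the handout, not a required technique), the sketch below checks `math.factorial` against the definition $n! = 1 \cdot 2 \cdots n$. It assumes Python 3.8 or higher, since it uses `math.prod`.

```python
import math

n = 20
# n! is the product of the integers 1 through n.
by_product = math.prod(range(1, n + 1))
print(by_product == math.factorial(n))  # True

# Python integers have arbitrary precision, so large factorials stay exact.
print(math.factorial(30))
```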

Choose

As of Python 3.8, you can compute $n \choose m$ from the math module. This example computes $10 \choose 5$:

import math
print(math.comb(10, 5))
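
If you are on a Python version older than 3.8, one possible fallback is to compute the same quantity from the definition ${n \choose m} = \frac{n!}{m!(n-m)!}$. The sketch below compares the two approaches; the variable names are illustrative only.

```python
import math

n, m = 10, 5
# Definition of "n choose m": n! / (m! * (n - m)!)
# Integer division (//) keeps the result an exact integer.
by_formula = math.factorial(n) // (math.factorial(m) * math.factorial(n - m))
by_comb = math.comb(n, m)  # requires Python 3.8 or higher
print(by_formula, by_comb)  # 252 252
```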

Using SciPy

SciPy is a free and open source library for scientific computing that is built on top of NumPy. You may find it helpful to use SciPy to check the answers you obtain in the written section of your problem sets. NumPy has the capability of drawing samples from many common distributions (type `help(np.random)` in the python interpreter), but SciPy has the added capability of computing the probability of observing events, and it can perform computations directly on the probability mass/density functions.
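
To make the contrast concrete, here is a minimal sketch, assuming a $\text{Bin}(10, 0.2)$ random variable as the running example. NumPy draws samples, from which we can only estimate $P(X = 3)$ by counting; SciPy evaluates the probability mass function directly. The seed and sample size are arbitrary choices made for this illustration.

```python
import numpy as np
from scipy import stats

n, p = 10, 0.2
rng = np.random.default_rng(seed=109)  # arbitrary seed, for repeatability

# NumPy: draw many samples from Bin(n, p) and estimate P(X = 3) by counting.
samples = rng.binomial(n, p, size=100_000)
estimate = np.mean(samples == 3)

# SciPy: evaluate the probability mass function directly.
exact = stats.binom.pmf(3, n, p)

print(estimate, exact)  # the two values should be close
```

The estimate fluctuates with the seed and improves as the number of samples grows, while the SciPy value does not depend on sampling at all.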

Binomial

Make a Binomial Random variable $X$ and compute its probability mass function (PMF) or cumulative distribution function (CDF). We love the scipy stats library because it defines all the functions you would care about for a random variable, including expectation, variance, and even things we haven't talked about in CS109, like entropy. This example declares $X \sim \text{Bin}(n = 10, p = 0.2)$. It calculates $P(X = 3)$ and $P(X \leq 4)$, then a few statistics of $X$. Finally, it generates a few random samples from $X$:

from scipy import stats
X = stats.binom(10, 0.2) # Declare X to be a binomial random variable
print(X.pmf(3))           # P(X = 3)
print(X.cdf(4))           # P(X <= 4)
print(X.mean())           # E[X]
print(X.var())            # Var(X)
print(X.std())            # Std(X)
print(X.rvs())            # Get a random sample from X
print(X.rvs(10))          # Get 10 random samples from X
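
One way to use this for checking written answers is to compare SciPy's output with the formula you would write by hand. The sketch below does this for $P(X = 3) = {n \choose 3} p^3 (1-p)^{n-3}$ and for $P(X \leq 4)$ as a sum of PMF values. The names `pmf_by_hand` and `cdf_by_hand` are made up for this example, and `math.comb` assumes Python 3.8 or higher.

```python
import math
from scipy import stats

n, p = 10, 0.2
X = stats.binom(n, p)

# P(X = 3) from the binomial PMF formula.
pmf_by_hand = math.comb(n, 3) * p**3 * (1 - p) ** (n - 3)
print(pmf_by_hand, X.pmf(3))

# P(X <= 4) is the sum of P(X = k) for k = 0, 1, 2, 3, 4.
cdf_by_hand = sum(X.pmf(k) for k in range(5))
print(cdf_by_hand, X.cdf(4))
```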

From the Python interpreter you can always call the built-in "help" function to see a full list of methods defined on a variable (or for a package):

from scipy import stats
X = stats.binom(10, 0.2) # Declare X to be a binomial random variable
help(X)                  # List all methods defined for X

Poisson

Make a Poisson Random variable $Y$. This example declares $Y \sim \text{Poi}(\lambda = 2)$. It then calculates $P(Y = 3)$:

from scipy import stats
Y = stats.poisson(2)  # Declare Y to be a poisson random variable
print(Y.pmf(3))       # P(Y = 3)
print(Y.rvs())        # Get a random sample from Y
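
For comparison, the same probability can be computed by hand from the Poisson PMF, $P(Y = k) = \frac{\lambda^k e^{-\lambda}}{k!}$. This sketch uses only the standard library; the variable names are illustrative.

```python
import math

lam, k = 2, 3
# Poisson PMF: P(Y = k) = e^(-lam) * lam^k / k!
pmf_by_hand = math.exp(-lam) * lam**k / math.factorial(k)
print(pmf_by_hand)  # roughly 0.1804, which should agree with Y.pmf(3)
```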

Geometric

Make a Geometric Random variable $X$, the number of trials until a success. This example declares $X \sim \text{Geo}(p = 0.75)$:

from scipy import stats
X = stats.geom(0.75)  # Declare X to be a geometric random variable
print(X.pmf(3))       # P(X = 3)
print(X.rvs())        # Get a random sample from X
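
Since geometric random variables come in two conventions (number of trials versus number of failures), it can be worth confirming which one a library uses. The sketch below checks that `stats.geom` counts trials, so $P(X = k) = (1-p)^{k-1} p$ for $k \geq 1$ and $E[X] = 1/p$. The name `pmf_by_hand` is made up for this example.

```python
from scipy import stats

p = 0.75
X = stats.geom(p)

# P(X = 3): two failures followed by one success.
pmf_by_hand = (1 - p) ** 2 * p
print(pmf_by_hand, X.pmf(3))

# At least one trial is always needed, so P(X = 0) is zero.
print(X.pmf(0))

# E[X] = 1/p under the "number of trials" convention.
print(X.mean(), 1 / p)
```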

Normal

Make a Normal Random variable $A$. This example declares $A \sim N(\mu = 3, \sigma^2 = 16)$. It then calculates $f_A(4)$ and $F_A(2)$. Very Important!!! In class, the second parameter to a normal was the variance ($\sigma^2$). In the scipy library, the second parameter is the standard deviation ($\sigma$):

import math
from scipy import stats
A = stats.norm(3, math.sqrt(16)) # Declare A to be a normal random variable
print(A.pdf(4))                  # f(4), the probability density at 4
print(A.cdf(2))                  # F(2), which is also P(A < 2)
print(A.rvs())                   # Get a random sample from A
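
The variance versus standard deviation mix-up is easy to make, so here is a short sketch of what goes wrong. It builds the intended $N(3, 16)$ alongside a deliberately mistaken version, then checks $P(A < 2)$ against the standard normal via $\Phi\left(\frac{2 - 3}{4}\right)$. The names `wrong` and `Z` are introduced only for this illustration.

```python
import math
from scipy import stats

variance = 16
A = stats.norm(3, math.sqrt(variance))  # correct: pass sigma = 4
wrong = stats.norm(3, variance)         # mistake: this is N(3, 16^2)

print(A.var(), wrong.var())             # variance 16 versus 256

# P(A < 2) via the standard normal: Phi((2 - 3) / 4)
Z = stats.norm()                        # defaults to mean 0, sigma 1
print(A.cdf(2), Z.cdf((2 - 3) / 4))
```

Printing `A.var()` after constructing a normal is a cheap way to confirm you passed the parameter you intended.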

Exponential

Make an Exponential Random variable $B$. This example declares $B \sim \text{Exp}(\lambda = 4)$:

from scipy import stats
# `λ` is a common parameterization for the exponential,
#  but `scipy` uses `scale` which is `1/λ`
B = stats.expon(scale=1/4)
print(B.pdf(1))             # f(1), the probability density at 1
print(B.cdf(2))             # F(2) which is also P(B < 2)
print(B.rvs())              # Get a random sample from B
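
To confirm that `scale=1/λ` gives the distribution you expect, you can compare against the closed forms $E[B] = 1/\lambda$, $f(x) = \lambda e^{-\lambda x}$, and $F(x) = 1 - e^{-\lambda x}$. A minimal sketch, with `lam` standing in for $\lambda$:

```python
import math
from scipy import stats

lam = 4
B = stats.expon(scale=1 / lam)

# E[B] = 1/lam
print(B.mean(), 1 / lam)

# f(1) = lam * e^(-lam * 1)
print(B.pdf(1), lam * math.exp(-lam * 1))

# F(2) = 1 - e^(-lam * 2)
print(B.cdf(2), 1 - math.exp(-lam * 2))
```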

Beta

Make a Beta Random variable $X$. This example declares $X \sim \text{Beta}(\alpha = 1, \beta = 3)$:

from scipy import stats
X = stats.beta(1, 3)  # Declare X to be a beta random variable
print(X.pdf(0.5))     # f(0.5), the probability density at 0.5
print(X.cdf(0.7))     # F(0.7) which is also P(X < 0.7)
print(X.rvs())        # Get a random sample from X
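
For this particular choice of parameters the Beta density simplifies enough to check by hand: with $\alpha = 1, \beta = 3$ we get $f(x) = 3(1-x)^2$ and $F(x) = 1 - (1-x)^3$ on $[0, 1]$, and $E[X] = \frac{\alpha}{\alpha + \beta}$. The sketch below compares these with SciPy; the names `a` and `b` stand in for $\alpha$ and $\beta$.

```python
from scipy import stats

a, b = 1, 3
X = stats.beta(a, b)

# For Beta(1, 3) the density is f(x) = 3 * (1 - x)^2 on [0, 1].
print(X.pdf(0.5), 3 * (1 - 0.5) ** 2)

# The CDF is F(x) = 1 - (1 - x)^3.
print(X.cdf(0.7), 1 - (1 - 0.7) ** 3)

# E[X] = a / (a + b)
print(X.mean(), a / (a + b))
```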