Lecture 7: Variability

STATS 60 / STATS 160 / PSYCH 10

Concepts and Learning Goals:

Variability of distributions
Intuition for variability from histograms
Common measures of variability:
- Variance and Standard Deviation
- Quantiles

Announcements:

You will have 1 week for quiz regrade requests.
The 4:30-5:20PM discussion section only has 6 students.

Variability#

Last week, you learned about the mean and median.

Both measure where the center of a distribution (the data) is, for different notions of centering.

Many times, we don’t just care where the center of the distribution is; we also want to know about the variability of the data.

Are most of the samples close to the center (mean/median), or not?
What is the “typical range” the data falls into?

Why care about variability?#

Question: think of examples of scenarios where you care not only where the data is centered, but also what the variability is.

Medicine: you know the average life expectancy, given a diagnosis. But what are the best/worst case scenarios?
Exams: you know the class average, and you know your score. But how do you really compare to the rest of the class?
Investments: you are trying to decide if you should invest in a stock. You know the historical average annual rate of return. But is it possible that there will be a big loss?

Example 1: daily temperatures in different cities#

Recalling the example from lecture 5, below are the overlaid histograms of daily average temperatures in two cities in 2024-2025.

The means of the two cities are very close, but the distributions are very different.

Qualitatively, the temperature in Chicago exhibits greater variability.

Example 2: stock prices#

These histograms show the daily closing prices of Visa (V) and Tesla (TSLA) stock for the last 5 years.

The means are very close, but qualitatively, the TSLA price exhibits greater variability.

How should we measure variability?#

We saw two examples of distributions with similar means, but different levels of variability.

Question: how could we measure variability? Suggest a quantitative measure.

Variance#

A common quantitative summary of variability is the variance.

If our datapoints are $x_1,\ldots,x_n$, and their mean is $\bar{x} = \frac{x_1+x_2 + \cdots + x_n}{n}$,

the variance is the average squared distance to the mean:

\[ \overline{\sigma}^2 = \text{variance} = \frac{(x_1-\bar x)^2 + (x_2 - \bar x)^2 + \dots + (x_n - \bar x)^2}{n}. \]
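
As a minimal sketch of this formula in Python (the function name `variance` and the sample values are illustrative assumptions, not data from the lecture), the computation follows the definition term by term:

```python
def variance(xs):
    """Average squared distance to the mean (divides by n, as in the formula above)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n

# Hypothetical datapoints, chosen so the arithmetic is easy to check by hand.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(data))  # mean is 5, squared distances sum to 32, so this prints 4.0
```

Python's standard library offers `statistics.pvariance`, which also divides by $n$; `statistics.variance` divides by $n-1$ instead and so gives a slightly larger value.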

Practice with the variance#

The variance is the average squared distance to the mean:

\[ \overline{\sigma}^2 = \text{variance} = \frac{(x_1-\bar x)^2 + (x_2 - \bar x)^2 + \dots + (x_n - \bar x)^2}{n} \]

Question: Calculate the variance of the rowers’ heights. What are the units?

$\bar{x}= 70.55\mathrm{in}$ ; $\bar{\sigma}^2 = 14.47 \mathrm{in}^2$.

Standard Deviation#

The standard deviation is the square root of the variance:

\[ \bar \sigma = \text{standard deviation} = \sqrt{\bar \sigma^2}. \]

If the data has the units $u$, then the variance has the units $u^2$.

The units of the variance are incompatible with the units of the data.

For this reason, if you want a measure of variability that you can compare to the mean, you should use the standard deviation rather than the variance.

Question: Calculate the standard deviation of the rowers’ heights.

$\bar \sigma = 3.80\mathrm{in}$.
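
The point about units can be checked with a short sketch. The heights below are hypothetical (they are not the rowers' data), and `std_dev` is an illustrative helper name. Converting inches to centimeters rescales the standard deviation by the same factor of 2.54 as the data, which is why it can be compared directly to the mean.

```python
import math

def std_dev(xs):
    """Square root of the variance (the variance here divides by n)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return math.sqrt(var)

# Hypothetical heights in inches (not the rowers' data from the lecture).
heights_in = [66, 68, 70, 72, 74]
heights_cm = [h * 2.54 for h in heights_in]

sd_in = std_dev(heights_in)  # in inches, the same units as the data
sd_cm = std_dev(heights_cm)  # in centimeters

print(sd_in, sd_cm, sd_cm / sd_in)  # the ratio is approximately 2.54
```

The variance, by contrast, would be rescaled by $2.54^2$, since its units are squared.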

Variability and risk#

Suppose someone offers you a choice between:

A gift of $100
The chance to flip a fair coin: you win $300 on heads and nothing on tails.

What would you choose, and why?

We can think of the outcomes in each scenario as datapoints in two different distributions:

Scenario 1 is a distribution containing exactly one datapoint: $100.
Scenario 2 is a distribution with two datapoints: $0 (tails), $300 (heads)

Question: calculate the mean and standard deviation of your earnings in each scenario.

Scenario	Mean	Standard Deviation
1	$100	$ 0
2	$150	$ 150
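
The table can be reproduced with a few lines of Python. This is a sketch that treats each outcome as one equally weighted datapoint, which matches a fair coin; the dictionary keys are illustrative labels.

```python
import statistics

scenarios = {
    "gift": [100],          # Scenario 1: a guaranteed 100 dollars
    "coin flip": [0, 300],  # Scenario 2: tails pays 0, heads pays 300
}

for name, outcomes in scenarios.items():
    mean = statistics.mean(outcomes)
    sd = statistics.pstdev(outcomes)  # population standard deviation: divides by n
    print(name, mean, sd)
```

The coin flip has the higher mean, but all of the variability: the standard deviation captures the risk that the mean alone hides.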

Example 1: daily temperature#

Mean and Standard Deviation in temperature in 2024-2025:

City	Mean Temperature	Standard Deviation
Seattle	$51.7^{\circ} F$	$10.3^{\circ} F$
Chicago	$54.3^{\circ} F$	$19.0^{\circ} F$

The standard deviation of temperature in Chicago is nearly twice that of Seattle.

Example 2: stock prices#

Mean and Standard Deviation in closing value for the last 5 years:

Stock	Mean Value	Standard Deviation
TSLA	$274.60	$83.24
V	$258.92	$53.54

The standard deviation of Tesla stock is about 30% of its mean value.

The standard deviation of Visa stock is about 20% of its mean value.

Aside: the ratio of the standard deviation to the mean only makes sense as a measurement of variability for non-negative data.
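
The percentages above come from dividing each standard deviation by the corresponding mean. A small sketch using the table values (the helper name `sd_to_mean_ratio` is an illustrative assumption):

```python
# Mean and standard deviation of closing value, copied from the table above.
stocks = {
    "TSLA": {"mean": 274.60, "sd": 83.24},
    "V": {"mean": 258.92, "sd": 53.54},
}

def sd_to_mean_ratio(mean, sd):
    """Standard deviation expressed as a fraction of the mean."""
    return sd / mean

for ticker, s in stocks.items():
    percent = 100 * sd_to_mean_ratio(s["mean"], s["sd"])
    print(ticker, round(percent, 1), "%")
```

This ratio is commonly called the coefficient of variation. If the data could be negative, the mean could be zero or close to it, and the ratio would blow up or change sign, which is why the aside restricts it to non-negative data.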

Standard deviation & outliers#

Sometimes, the standard deviation can be large because of one outlier.

Example: The following dataset gives section attendance for each of the 5 sections of STATS60 this week:

TA	Attendance
Cole	20
Junyi	6
Leda	27
Skyler	25
Valerie	21

The mean is 19.8, the standard deviation is 7.4.

If we remove the outlier of Junyi’s section:

the mean is 23.3, the standard deviation is only 2.9.
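
The effect of the outlier can be checked directly from the attendance table. This sketch assumes the population standard deviation (dividing by $n$), consistent with the variance formula used in this lecture; the variable names are illustrative.

```python
import statistics

# Section attendance, copied from the table above.
attendance = {"Cole": 20, "Junyi": 6, "Leda": 27, "Skyler": 25, "Valerie": 21}

all_sections = list(attendance.values())
without_outlier = [count for ta, count in attendance.items() if ta != "Junyi"]

for label, values in [("all sections", all_sections),
                      ("without outlier", without_outlier)]:
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # divides by n
    print(label, round(mean, 2), round(sd, 2))
```

Dropping a single section out of five cuts the standard deviation by more than half, because the squared distance of the outlier dominates the sum.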

Discussion#

Question: Do you think the standard deviation is a satisfying measure of variability? What is it conveying? What is it not conveying?

The standard deviation can be large because of the influence of outliers. It can be a “pessimistic” notion of variability.

A guarantee for the standard deviation#

Most samples are within a few standard deviations of the mean!

The following fact is called Chebyshev’s inequality:

For any $t > 0$, at most a $1/t^2$ fraction of datapoints are more than $t$ standard deviations away from the mean.

For example, this implies that at least 75% of the datapoints are no more than $2$ standard deviations away from the mean (Chebyshev’s inequality with the choice $t = 2$, since at most $1/2^2 = 25\%$ can be farther away).

You’d see how to prove and use this fact in an intro probability course, like STATS 117/118.
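
Chebyshev’s inequality holds for every dataset, so it can be checked numerically. The sketch below uses a hypothetical dataset with one far-away point; `fraction_beyond` is an illustrative helper, and the standard deviation divides by $n$.

```python
import statistics

def fraction_beyond(xs, t):
    """Fraction of datapoints more than t standard deviations from the mean."""
    mean = statistics.mean(xs)
    sd = statistics.pstdev(xs)
    return sum(abs(x - mean) > t * sd for x in xs) / len(xs)

# Hypothetical dataset with one far-away point.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 30]

for t in [1.5, 2, 3]:
    # The observed fraction never exceeds the Chebyshev bound 1 / t^2.
    print(t, fraction_beyond(data, t), "<=", 1 / t**2)
```

The bound is usually far from tight: here only one point in ten lies more than 2 standard deviations from the mean, while the inequality allows up to one in four.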

Quantiles#

Quantiles tell us the fraction of the data that falls in each range. They give us a more complete picture of variability.

The $k$-quantiles of a distribution are the $k-1$ numbers which partition the histogram into $k$ equal-sized parts:

Depicted here are the $10$-quantiles, also known as deciles.

Other commonly used quantiles are the quartiles ($4$-quantiles), and percentiles ($100$-quantiles).
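
In Python, one way to compute $k$-quantiles is `statistics.quantiles` from the standard library (Python 3.8 or later), which returns the $k-1$ cut points. The dataset below is a hypothetical example.

```python
import statistics

# Hypothetical dataset: the integers 1 through 100.
data = list(range(1, 101))

quartiles = statistics.quantiles(data, n=4)      # 3 cut points
deciles = statistics.quantiles(data, n=10)       # 9 cut points
percentiles = statistics.quantiles(data, n=100)  # 99 cut points

print(len(quartiles), len(deciles), len(percentiles))  # 3 9 99
print(quartiles)
```

Different tools use slightly different conventions for placing a cut point between two neighboring datapoints, so the exact values can differ a little from one program to another.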

Using quantiles to measure variability#

Question: How can we use quantiles to measure variability?

We can measure distance between quantiles (the “width” of quantiles), or between quantiles and the mean.

For example, the distance from the $10$th percentile to $90$th percentile:

City	Mean Temp	Std. Dev.	10th Percentile	90th Percentile	10-90 Percentile Window
Seattle	$51.7^{\circ} F$	$10.3^{\circ} F$	$39.0^{\circ} F$	$66.5^{\circ} F$	$27.5^{\circ} F$
Chicago	$54.3^{\circ} F$	$19.0^{\circ} F$	$28.0^{\circ} F$	$77.0^{\circ} F$	$49.0^{\circ} F$
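
A sketch of how such a window could be computed from raw data. The two datasets are hypothetical (they are not the Seattle and Chicago readings), and `percentile_window` is an illustrative helper built on `statistics.quantiles`.

```python
import statistics

def percentile_window(xs, lo=10, hi=90):
    """Distance between the lo-th and hi-th percentiles."""
    cuts = statistics.quantiles(xs, n=100)  # cuts[i] is the (i+1)-th percentile
    return cuts[hi - 1] - cuts[lo - 1]

# Hypothetical temperature readings with the same center but different spread.
narrow = list(range(40, 61))  # 40, 41, ..., 60
wide = list(range(20, 81))    # 20, 21, ..., 80

print(percentile_window(narrow), percentile_window(wide))
```

Both datasets are centered at 50, but the second has a much wider window, mirroring the comparison in the table.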

Another way to think about quantiles#

Question: How does the information we get from the standard deviation differ from the information we get from the quantiles?

The quantiles give us a better sense of the shape of the distribution.
They also tell us exactly what percent of datapoints fall in a range.

For example: 80% of data points in the histogram fall between the 10th and 90th percentile.

Question: Why?

City	10th Percentile	90th Percentile	Window Size
Seattle	$39.0^{\circ} F$	$66.5^{\circ} F$	$27.5^{\circ} F$
Chicago	$28.0^{\circ} F$	$77.0^{\circ} F$	$49.0^{\circ} F$

In each city, you can reasonably expect that 80% of the time, the temperature will be in the 10-90th percentile window.
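
The 80% figure follows from the definition: 10% of datapoints lie below the 10th percentile and 10% lie above the 90th, leaving 80% in between. A quick check on simulated data (the readings below are randomly generated and purely hypothetical):

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible
# Hypothetical data: 1000 simulated temperature readings.
temps = [random.gauss(55, 15) for _ in range(1000)]

cuts = statistics.quantiles(temps, n=10)  # the 9 deciles
p10, p90 = cuts[0], cuts[-1]              # 10th and 90th percentiles

inside = sum(p10 <= t <= p90 for t in temps) / len(temps)
print(inside)  # close to 0.8
```

The same check works for any dataset, regardless of the shape of its histogram, which is what makes quantile windows easy to interpret.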

This also gives us a sense of the variability.

Recap#

Concept of variability
Common measures of variability:
- Variance and Standard Deviation
- Quantiles

