Lecture 7: Variability#

STATS 60 / STATS 160 / PSYCH 10

Concepts and Learning Goals:

  • Variability of distributions

  • Intuition for variability from histograms

  • Common measures of variability:

    • Variance and Standard Deviation

    • Quantiles

Announcements:

  • You will have 1 week for quiz regrade requests.

  • The 4:30-5:20PM discussion section only has 6 students.

Variability#

Last week, you learned about the mean and median.

Both measure where the center of a distribution (the data) is, for different notions of centering.

Many times, we don’t just care where the center of the distribution is; we also want to know about the variability of the data.

  • Are most of the samples close to the center (mean/median), or not?

  • What is the “typical range” the data falls into?

Why care about variability?#

Question: think of examples of scenarios where you care not only where the data is centered, but also what the variability is.

  • Medicine: you know the average life expectancy, given a diagnosis. But what are the best/worst case scenarios?

  • Exams: you know the class average, and you know your score. But how do you really compare to the rest of the class?

  • Investments: you are trying to decide if you should invest in a stock. You know the historical average annual rate of return. But is it possible that there will be a big loss?

Example 1: daily temperatures in different cities#

Recalling the example from Lecture 5, below are the overlaid histograms of daily average temperatures in two cities, Seattle and Chicago, in 2024-2025.

The means of the two cities are very close, but the distributions are very different.

Qualitatively, the temperature in Chicago exhibits greater variability.

Example 2: stock prices#

These histograms show the daily closing prices of Visa (V) and Tesla (TSLA) stock for the last 5 years.

The means are very close, but qualitatively, the TSLA price exhibits greater variability.

How should we measure variability?#

We saw two examples of distributions with similar means, but different levels of variability.

Question: how could we measure variability? Suggest a quantitative measure.

Variance#

A common quantitative summary of variability is the variance.

If our datapoints are \(x_1,\ldots,x_n\), and their mean is \(\bar{x} = \frac{x_1+x_2 + \cdots + x_n}{n}\),

the variance is the average squared distance to the mean:

\[ \overline{\sigma}^2 = \text{variance} = \frac{(x_1-\bar x)^2 + (x_2 - \bar x)^2 + \dots + (x_n - \bar x)^2}{n}. \]
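As a minimal sketch, the formula above can be computed directly in Python. The dataset here is hypothetical and chosen only so the arithmetic is easy to follow; the helper name `variance` is also just an illustration. Note that this version divides by \(n\), matching the definition above, which corresponds to `statistics.pvariance` in the standard library (`statistics.variance` divides by \(n-1\) instead).

```python
import statistics


def variance(data):
    """Average squared distance to the mean (divides by n, as in the formula above)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / n


# Hypothetical data, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(variance(data))              # mean is 5, squared distances sum to 32, so 32 / 8 = 4.0
print(statistics.pvariance(data))  # the standard library's divide-by-n variance agrees
```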

Practice with the variance#

The variance is the average squared distance to the mean:

\[ \overline{\sigma}^2 = \text{variance} = \frac{(x_1-\bar x)^2 + (x_2 - \bar x)^2 + \dots + (x_n - \bar x)^2}{n} \]

Question: Calculate the variance of the rowers’ heights. What are the units?

\(\bar{x}= 70.55\mathrm{in}\) ; \(\bar{\sigma}^2 = 14.47 \mathrm{in}^2\).

Standard Deviation#

The standard deviation is the square root of the variance:

\[ \bar \sigma = \text{standard deviation} = \sqrt{\bar \sigma^2}. \]

If the data has the units \(u\), then the variance has the units \(u^2\).

The units of the variance are incompatible with the units of the data.

For this reason, if you want a measure of variability that you can compare to the mean, you should use the standard deviation rather than the variance.

Question: Calculate the standard deviation of the rowers’ heights.

\(\bar{\sigma} = 3.80\mathrm{in}\).
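A quick arithmetic check of this answer, using only the variance reported earlier (\(14.47 \mathrm{in}^2\)); the variable names are illustrative:

```python
import math

variance_in2 = 14.47             # variance of the rowers' heights, in square inches
sd_in = math.sqrt(variance_in2)  # the square root brings the units back to inches

print(round(sd_in, 2))
```

Because the standard deviation is back in inches, it can be compared directly with the mean height of \(70.55\mathrm{in}\).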

Variability and risk#

Suppose someone offers you a choice between:

  1. A gift of $100

  2. The chance to flip a fair coin: heads you win $300, tails you win nothing.

What would you choose, and why?

We can think of the outcomes in each scenario as datapoints in two different distributions:

  • Scenario 1 is a distribution containing exactly one datapoint: $100.

  • Scenario 2 is a distribution with two datapoints: $0 (tails), $300 (heads)

Question: calculate the mean and standard deviation of your earnings in each scenario.

| Scenario | Mean | Standard Deviation |
|----------|------|--------------------|
| 1        | $100 | $0                 |
| 2        | $150 | $150               |
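The entries in this table can be reproduced with a short sketch that treats each scenario as a tiny dataset of equally likely outcomes. The variable names are illustrative; `statistics.pstdev` is the standard-library standard deviation that divides by \(n\), as in the definition used in this lecture.

```python
import statistics

# Each scenario is a list of equally likely outcomes, in dollars.
scenarios = {
    "1 (gift)": [100],          # one certain outcome
    "2 (coin flip)": [0, 300],  # tails, heads
}

summary = {}
for name, outcomes in scenarios.items():
    mean = statistics.mean(outcomes)
    sd = statistics.pstdev(outcomes)  # divides by n, not n - 1
    summary[name] = (mean, sd)
    print(f"Scenario {name}: mean = ${mean}, standard deviation = ${sd}")
```

The coin flip has the higher mean, but the standard deviation is what captures the risk of walking away with nothing.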

Example 1: daily temperature#

Mean and Standard Deviation in temperature in 2024-2025:

| City    | Mean Temperature   | Standard Deviation |
|---------|--------------------|--------------------|
| Seattle | \(51.7^{\circ} F\) | \(10.3^{\circ} F\) |
| Chicago | \(54.3^{\circ} F\) | \(19.0^{\circ} F\) |

The standard deviation of temperature in Chicago is nearly twice that of Seattle.

Example 2: stock prices#

Mean and Standard Deviation in closing value for the last 5 years:

| Stock | Mean Value | Standard Deviation |
|-------|------------|--------------------|
| TSLA  | $274.60    | $83.24             |
| V     | $258.92    | $53.54             |

The standard deviation of Tesla stock is about 30% of its mean value.

The standard deviation of Visa stock is about 20% of its mean value.

Aside: the ratio of the standard deviation to the mean only makes sense as a measurement of variability for non-negative data.
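The two percentages above come from dividing each standard deviation by the corresponding mean (a ratio often called the coefficient of variation). A small sketch of that arithmetic, using the values reported for the two stocks; the dictionary layout is just one way to organize it:

```python
# (mean, standard deviation) of the closing price, in dollars, as reported above.
stocks = {
    "TSLA": (274.60, 83.24),
    "V": (258.92, 53.54),
}

# Ratio of standard deviation to mean for each stock.
ratios = {ticker: sd / mean for ticker, (mean, sd) in stocks.items()}

for ticker, ratio in ratios.items():
    print(f"{ticker}: standard deviation is {ratio:.0%} of the mean")
```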

Standard deviation & outliers#

Sometimes, the standard deviation can be large because of one outlier.

Example: The following dataset gives section attendance for each of the 5 sections of STATS60 this week:

| TA      | Attendance |
|---------|------------|
| Cole    | 20         |
| Junyi   | 6          |
| Leda    | 27         |
| Skyler  | 25         |
| Valerie | 21         |

The mean is 19.8 and the standard deviation is about 7.4.

If we remove the outlier (Junyi’s section), the mean is about 23.3 and the standard deviation is only about 2.9.
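This comparison can be reproduced with a short sketch using the attendance numbers from the example. It uses `statistics.pstdev`, which divides by \(n\) as in the definition from this lecture; the variable names are illustrative. With all five sections the standard deviation comes out to about 7.36, and without the outlier to about 2.86.

```python
import statistics

attendance = {"Cole": 20, "Junyi": 6, "Leda": 27, "Skyler": 25, "Valerie": 21}

all_sections = list(attendance.values())
# Drop the one section with unusually low attendance.
without_outlier = [n for ta, n in attendance.items() if ta != "Junyi"]

for label, data in [("all sections", all_sections), ("without outlier", without_outlier)]:
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)  # divides by n
    print(f"{label}: mean = {mean:.2f}, standard deviation = {sd:.2f}")
```

Removing a single datapoint shifts the mean by a few students but cuts the standard deviation by more than half.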

Discussion#

Question: Do you think the standard deviation is a satisfying measure of variability? What is it conveying? What is it not conveying?

  • The standard deviation can be large because of the influence of outliers. It can be a “pessimistic” notion of variability.

A guarantee for the standard deviation#

Most samples are within a few standard deviations of the mean!

The following fact is called Chebyshev’s inequality:

For any \(t > 0\), at most a \(1/t^2\) fraction of datapoints are more than \(t\) standard deviations away from the mean.

For example, this implies that at least 75% of the datapoints are no more than \(2\) standard deviations away from the mean (Chebyshev’s inequality with the choice \(t = 2\), since \(1 - 1/2^2 = 3/4\)).

You’d see how to prove and use this fact in an intro probability course, like STATS 117/118.
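The inequality can be checked empirically on any dataset. The sketch below does this for the section attendance numbers from the outlier example; it is a sanity check rather than a proof, and the helper `fraction_beyond` is a hypothetical name introduced here.

```python
import statistics


def fraction_beyond(data, t):
    """Fraction of datapoints more than t standard deviations away from the mean."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)  # divides by n
    return sum(abs(x - mean) > t * sd for x in data) / len(data)


attendance = [20, 6, 27, 25, 21]

for t in (1.5, 2, 3):
    observed = fraction_beyond(attendance, t)
    bound = 1 / t**2
    print(f"t = {t}: observed fraction = {observed:.2f}, Chebyshev bound = {bound:.2f}")
```

The observed fraction is often far below the bound: Chebyshev’s inequality is a worst-case guarantee that holds for every dataset, so it tends to be loose for any particular one.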

Quantiles#

Quantiles tell us the fraction of the data that falls in each range. They give us a more complete picture of variability.

The \(k\)-quantiles of a distribution are the \(k-1\) numbers which partition the histogram into \(k\) parts, each containing the same fraction (\(1/k\)) of the data:

Depicted here are the \(10\)-quantiles, also known as deciles.

Other commonly used quantiles are the quartiles (\(4\)-quantiles), and percentiles (\(100\)-quantiles).
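As a rough sketch, quantiles can be computed with `statistics.quantiles` from the Python standard library (available in Python 3.8 and later), which returns the \(k-1\) cut points for a given \(k\). The dataset is hypothetical. Different software packages use slightly different conventions for interpolating between datapoints, so small discrepancies between tools are normal.

```python
import statistics

# Hypothetical dataset: the whole numbers 1 through 99.
data = list(range(1, 100))

quartiles = statistics.quantiles(data, n=4)      # 3 cut points, 4 equal parts
deciles = statistics.quantiles(data, n=10)       # 9 cut points, 10 equal parts
percentiles = statistics.quantiles(data, n=100)  # 99 cut points, 100 equal parts

print("quartiles:", quartiles)
print("deciles:", deciles)
print("number of percentile cut points:", len(percentiles))
```

The middle quartile is the median, which ties quantiles back to the measures of center from last week.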

Using quantiles to measure variability#

Question: How can we use quantiles to measure variability?

  • We can measure distance between quantiles (the “width” of quantiles), or between quantiles and the mean.

For example, the distance from the \(10\)th percentile to the \(90\)th percentile:

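| City    | Mean Temp          | Std. Dev           | 10th Percentile    | 90th Percentile    | 10-90 Percentile Window |
|---------|--------------------|--------------------|--------------------|--------------------|-------------------------|
| Seattle | \(51.7^{\circ} F\) | \(10.3^{\circ} F\) | \(39^{\circ} F\)   | \(66.5^{\circ} F\) | \(27.5^{\circ} F\)      |
| Chicago | \(54.3^{\circ} F\) | \(19.0^{\circ} F\) | \(28.0^{\circ} F\) | \(77.0^{\circ} F\) | \(49^{\circ} F\)        |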
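The window column is simply the 90th percentile minus the 10th percentile. The sketch below first redoes that subtraction with the percentiles reported in the table, then shows how the same window could be computed from raw data. The raw dataset and the helper `percentile_window` are hypothetical, since the underlying temperature data is not reproduced here.

```python
import statistics


def percentile_window(data):
    """Return (10th percentile, 90th percentile, width of the window between them)."""
    deciles = statistics.quantiles(data, n=10)  # 9 cut points
    p10, p90 = deciles[0], deciles[-1]
    return p10, p90, p90 - p10


# Percentiles reported in the table above, in degrees Fahrenheit.
reported = {"Seattle": (39.0, 66.5), "Chicago": (28.0, 77.0)}
widths = {city: p90 - p10 for city, (p10, p90) in reported.items()}
print(widths)

# Hypothetical raw data, used only to demonstrate the helper.
values = list(range(1, 100))
p10, p90, width = percentile_window(values)
inside = sum(p10 <= x <= p90 for x in values) / len(values)
print(f"window = [{p10}, {p90}], width = {width}, fraction inside = {inside:.2f}")
```

By construction, roughly 80% of the datapoints land inside the window, whatever the shape of the distribution.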

Another way to think about quantiles#

Question: How does the information we get from the standard deviation differ from the information we get from the quantiles?

  • The quantiles give us a better sense of the shape of the distribution.

  • They also tell us exactly what percentage of datapoints fall in a given range.

For example: 80% of data points in the histogram fall between the 10th and 90th percentile.

Question: Why?

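| City    | 10th               | 90th               | Window Size        |
|---------|--------------------|--------------------|--------------------|
| Seattle | \(39^{\circ} F\)   | \(66.5^{\circ} F\) | \(27.5^{\circ} F\) |
| Chicago | \(28.0^{\circ} F\) | \(77.0^{\circ} F\) | \(49^{\circ} F\)   |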

In each city, you can reasonably expect that 80% of the time, the temperature will be in the 10th-90th percentile window.

This also gives us a sense of the variability.

Recap#

  • Concept of variability

  • Common measures of variability:

    • Variance and Standard Deviation

    • Quantiles