Mean, Variance and Distributions


The Mean-variance Paradigm

The world is, unhappily, very complex. Before one can analyze, one must abstract. The time-state paradigm provides a procedure for doing so. Its power lies in the straightforward way that it accommodates time, risk, and options. But this power comes at a price. In general, one must assume a relatively simple structure (e.g. two possible outcomes in each trading period) and the existence of markets that are sufficiently complete to allow replication and valuation of desired patterns of payments and/or consumption over time.

Despite these limitations, the time-state paradigm is eminently practical in a number of settings. Dynamic strategies involving broad asset classes are frequently analyzed using it. It is also the paradigm of choice when derivative securities are the focus of attention. However, when the goal is to consider many possible combinations of many different financial instruments, use of the time-state approach poses a number of problems. One must either assume a limited number of outcomes in each trading interval, making most of the securities redundant, or many such outcomes, making the assumption of complete markets unrealistic. Clearly, a Hobson's choice.

In 1952, Markowitz proposed a paradigm for dealing with issues concerning choices which involve many possible financial instruments. Formally, it deals with only two discrete time periods (e.g. "now" and "a year from now"), or, equivalently, one accounting period (e.g. "one year"). In this scheme, the goal of an Investor is to select the portfolio of securities that will provide the best distribution of future consumption, given his or her investment budget. Two measures of the prospects provided by such a portfolio are assumed to be sufficient for evaluating its desirability: the expected or mean value at the end of the accounting period and the standard deviation or its square, the variance, of that value. If the initial investment budget is positive, there will be a one-to-one relationship between these end-of-period measures and comparable measures relating to the percentage change in value, or return over the period. Thus Markowitz' approach is often framed in terms of the expected return of a portfolio and its standard deviation of return, with the latter serving as a measure of risk.

The Markowitz paradigm is often characterized as dealing with portfolio risk and (expected) return or, more simply, risk and return. More precisely, it can be termed the mean-variance paradigm.

Expected Value

Assume that a portfolio will have a future (end-of-period) value of v1 in state 1, v2 in state 2, etc. Let v = [v1,v2,...,vm] be a {1*m} element vector, where m is the number of possible states of the world. To compute the portfolio's expected future value, we need someone's estimate of the probabilities associated with the states. Let pr = [pr1,pr2,...,prm] be such a vector. The expected value is, as usual, a weighted average of the possible outcomes, with the probabilities of the outcomes used as weights:

     ev = pr*v'

If the current value of the portfolio is p, we can compute a vector of value-relatives (future/present values):

     vr = v/p

And a vector of returns (proportional changes in value):

     r = (v-p)/p

The portfolio's expected value-relative can be computed either directly or indirectly:

     evr = (pr*v')/p = ev/p

Similarly, the portfolio's expected return will be:

     er = (pr*(v-p)')/p = ((pr*v')-p)/p = (ev-p)/p
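
As a numerical check of these formulas, here is a minimal sketch. The three values, the probabilities and the current value of 100 are made-up inputs chosen for illustration, not data for any actual portfolio:

     v  = [90 110 130];        % assumed end-of-period values, by state
     pr = [0.2 0.5 0.3];       % assumed state probabilities (they sum to 1)
     p  = 100;                 % assumed current value

     ev  = pr*v';              % expected future value: 112
     vr  = v/p;                % value-relatives: 0.90 1.10 1.30
     r   = (v-p)/p;            % returns: -0.10 0.10 0.30
     evr = ev/p;               % expected value-relative: 1.12
     er  = (ev-p)/p;           % expected return: 0.12

The direct route, pr*r', gives the same expected return as (ev-p)/p.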

In the special case in which every outcome is equally likely, the expected value can be computed by simply taking the (arithmetic) mean of the possible values. In MATLAB:

     ev = mean(v)

Note that the mean function can be used with a matrix -- the result will be a row vector in which each element is the mean of the corresponding column in the original. This can prove handy with a matrix in which each column represents a different asset and each row a different state of the world, with the latter assumed to be equally likely.
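
A small sketch of that layout, with hypothetical numbers (three equally likely states in rows, two assets in columns):

     V = [ 90 100
          110 104
          130 108];            % assumed values: rows are states, columns are assets
     ev  = mean(V)             % row vector of expected values: 110 104
     pr  = ones(1,3)/3;        % the equal probabilities, made explicit
     ev2 = pr*V                % same result as mean(V)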


Probability estimates are essential in the mean-variance approach. Unless all Investors agree about such probabilities, one cannot talk about "the" expected value or expected return (or risk, for that matter) of a portfolio, security, asset class or investment plan. Two different Analysts might well provide different estimates of expected values for the same investment product. Indeed, one of the key functions that an Analyst can perform for an Investor is the provision of informed estimates of the probabilities of various outcomes and the associated risks and expected values of alternative investment strategies.

Normative applications of the mean-variance paradigm often accept the possibility of disagreement among Investors and Analysts concerning probability estimates. Positive applications usually assume either that there is agreement concerning such probabilities or that prices are set as if there were agreement on a set of consensus probability estimates.

It is important to emphasize the fact that the mean-variance approach calls for the use of estimates of the probabilities of alternative future possible events in the next period. Historic frequencies of such events in past periods may prove helpful when forming such forward-looking estimates, but one should consider taking into account any additional information that might prove helpful. The world changes, and the future need not be like the past, even probabilistically. Issues concerning ways to implement the mean-variance approach can and should be separated from issues concerning its structure, assumptions, and implications.

Standard Deviation

If the future value of a portfolio will be vs in state s and the expected future value is ev, the deviation, or surprise, in state s will equal (vs-ev). More generally, if v is the vector of possible future values, the vector of deviations, state by state, will be:

     d = v - ev

In this vector, a positive deviation represents a happy surprise, a negative deviation an unhappy surprise, and a zero deviation no surprise at all. Roughly: the greater the "spread" of the possible deviations, the greater the uncertainty about the actual outcome.

To measure risk in a fully useful manner we need to take into account not only the possible surprises, but also the probabilities associated with them. Simply weighting each deviation by its probability and summing won't do, since the positive and negative deviations cancel and the answer will always equal zero.

One alternative uses the expected or mean absolute deviation (mad):

     mad = pr*abs(d)'

In practice, it is difficult to use mad measures when considering combinations of securities and portfolios. Mean-variance theory thus utilizes the expected squared deviation, known as the variance:

     var = pr*(d.^2)'

Variance is often the preferred measure for calculation, but for communication (e.g. between an Analyst and an Investor), variance is usually inferior to its square root, the standard deviation:

     sd = sqrt(var) = sqrt(pr*(d.^2)')

Standard deviation is measured in the same units as the original outcomes (e.g. future values or returns), while variance is measured in such units squared (e.g. values squared or returns squared).
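
The measures above can be compared on a single example. This is a sketch with made-up values and probabilities; the variance is stored under the name variance (our choice) so that the script does not shadow MATLAB's built-in var function:

     v  = [90 110 130];        % assumed future values
     pr = [0.2 0.5 0.3];       % assumed probabilities
     ev = pr*v';               % expected value: 112
     d  = v - ev;              % deviations: -22 -2 18
     wd = pr*d';               % probability-weighted deviations: zero, up to rounding
     mad = pr*abs(d)';         % mean absolute deviation: 10.8
     variance = pr*(d.^2)';    % variance: 196 (in value units squared)
     sd = sqrt(variance);      % standard deviation: 14 (in value units)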

We again emphasize that standard deviation is used in this context as a forward-looking measure of risk, since it is based on probabilities of future outcomes, however derived. One can assume that future risk is similar to past variability, but this is neither required nor, in certain cases, desirable.

MATLAB provides a function for computing the standard deviation of a series of values, and one that can be used to compute the variance of such values. In each case, the computations assume that the outcomes are equally probable. In addition, it is assumed that the values are a sample drawn from a larger population, and that the variance and standard deviation of the population are to be estimated.

For reasons that we will not cover here, an unbiased estimate of the population variance equals the sample variance times n/(n-1), where n is the number of sample values. Correspondingly, the usual estimate of the population standard deviation equals the sample standard deviation times the square root of n/(n-1). MATLAB's functions make this correction automatically, as do many functions included with spreadsheet software. When estimates of this type are desired, one can use std(v) to find the estimated population standard deviation, where v is a vector of sample values. Alternatively, one can use cov(v) to find the estimated population variance. Note that both functions are inherently designed to process historic data in order to make predictions about future results and hence implicitly assume that future "samples" will be drawn from the same "population" as were prior ones. In some cases this assumption may be entirely justified; in others it may not.
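
The correction can be checked directly. A sketch with three hypothetical sample values; the second argument in std(v,1) asks MATLAB to divide by n rather than n-1:

     v = [90 110 130];                 % assumed sample values
     n = length(v);
     svar = mean((v-mean(v)).^2);      % uncorrected sample variance: about 266.67
     evar = cov(v);                    % estimated population variance: 400
     esd  = std(v);                    % estimated population standard deviation: 20
     chk1 = svar*n/(n-1);              % 400, matching cov(v)
     chk2 = std(v,1)*sqrt(n/(n-1));    % 20, matching std(v)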

Continuous and Discrete Outcomes

Thus far, we have dealt with a world in which a future value can take on one of a discrete set of specified values, with a probability associated with each value. The mean-variance approach can be utilized in such a setting, and we will do this from time to time for expository purposes. However, its natural setting is in a world in which outcomes can lie at any point along a continuum of values. Statisticians use the term random variable to denote a variable that can take on any of a number of such values.

In a discrete setting, the actual value of a variable will be drawn from a vector (e.g. v) having a finite number of possible outcomes, with the probability of drawing each value given by the corresponding entry in an associated probability vector (e.g. pr). The set of values (v) and the associated probabilities (pr) constitute a discrete probability distribution.

In a continuous setting, a value will be drawn from a continuous probability distribution, the parameters and form of which indicate the range of outcomes and the associated probabilities.

Cumulative Distributions

The most informative way to portray a distribution utilizes a plot of the probability that the actual outcome will be less than or equal to each of a set of possible values.

Let v be a vector of values, sorted in ascending order, and pr a vector of the probabilities associated with each of the corresponding values. For example:

     v =  [   10   20   30];
     pr = [ 0.20 0.30 0.50];

The probability that the actual outcome will be less than or equal to 10 is 0.20. The probability that the actual outcome will be less than or equal to 20 is (0.20+0.30), or 0.50, and the probability that the outcome will be less than or equal to 30 is 1.00. To produce a vector of these probabilities we can use the MATLAB cumsum function, which creates a new vector in which each element is the cumulative sum of all the elements up to and including the comparable position in the original vector. In this case:

    cumsum(pr) = 
         0.2000    0.5000    1.0000  

The figure below shows the associated cumulative probability distribution. Note that it is a step function, reflecting the discrete nature of the outcomes.
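
One way to draw such a step function is with MATLAB's stairs function. In this sketch the end points 0 and 40 are arbitrary padding, added here so that the flat segments before the first outcome and after the last one are visible:

     v  = [  10   20   30];
     pr = [0.20 0.30 0.50];
     x  = [0 v 40];                    % outcome axis, padded on both sides (assumed limits)
     c  = [0 cumsum(pr) 1];            % cumulative probability, held from each x to the next
     stairs(x,c);
     xlabel('Outcome');
     ylabel('Probability actual <= outcome');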

It is, of course, much simpler to simply plot the points, and let MATLAB connect them with straight lines. Here are the required statements:

     plot(v,cumsum(pr));
     ylabel('Probability actual <= outcome');

In this case the result is:

The greater the number of points and the nearer together they are, the closer will be this type of plot to the more accurate step function. In the case of a continuous distribution, there will be no difference at all.

Normal Distributions

A uniformly-distributed random variable can take on any value within a specified range (e.g., zero to one) with equal probability. Most programming languages and spreadsheets provide functions that can generate close approximations to such variables (purists would, however, call them pseudo-random variables, since they are not completely random). In MATLAB, the function rand(r,c) generates an {r*c} element matrix of such numbers.

Consider the process of generating 1000 sets of 1000 such numbers, then taking the mean (unweighted average) of each set. In MATLAB:

     z = mean(rand(1000,1000))

A histogram showing the frequency distribution of the mean values in each of 25 "bins" can be obtained with the statement:
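
A call that fits this description is sketched here, assuming the means are held in z as above. hist is MATLAB's long-standing histogram function; in recent releases histogram(z,25) is the recommended equivalent:

     z = mean(rand(1000,1000));        % 1000 column means of uniform random numbers
     hist(z,25)                        % frequency counts in 25 equally spaced bins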


The figure below shows the results obtained in this manner in one experiment.

Note that the distribution is approximately "bell-shaped" and roughly symmetric. This is not surprising since the central limit theorem holds that the distribution of the sum or average of a set of unrelated variables will approach a particular form as the number of variables increases. The form is that of the normal distribution, given by the equations:

     nd = (x - ev)/sd;
     p(x) = (1/sqrt(2*pi))*exp(-(nd^2)/2)

where p(x) is proportional to the probability that the actual value will equal x; ev and sd stand for the expected value and standard deviation, respectively, of the distribution, and nd is the deviation of x from ev in standard deviation units.

The figure below plots p(x) for various values of nd.
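
A sketch of how such a plot could be produced from the formula above, using element-wise operations so that nd can be a vector. The range of -3 to +3 is our choice:

     nd = -3:0.1:3;                            % standardized deviations (assumed range)
     p  = (1/sqrt(2*pi))*exp(-(nd.^2)/2);      % height of the curve at each nd
     plot(nd,p);
     xlabel('nd');
     ylabel('p(x)');

The curve peaks at nd = 0, where p equals 1/sqrt(2*pi), or about 0.3989.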

More practical is the cumulative normal distribution. MATLAB does not provide such a function, but it offers the next best thing. The expression erf(x/sqrt(2)) gives the probability that a normally-distributed random variable will fall between -x and +x standard deviations of the mean. This forms the basis for our function cnd(nd), where nd is a standardized deviation and cnd(nd) is the probability that the actual outcome will be less than nd.
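
The cnd function itself is not listed here. A minimal sketch of one possible implementation, written as an anonymous function and relying on the standard identity that links the cumulative normal distribution to erf, is:

     cnd = @(nd) 0.5*(1 + erf(nd/sqrt(2)));    % probability of an outcome below nd (our sketch)
     c0 = cnd(0)                                % 0.5: half the probability lies below the mean
     c1 = cnd(1.645)                            % roughly 0.95

Because erf works element by element, this version also accepts a vector of standardized deviations.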

The figure below shows the values of cnd(nd) for nd from -3 to +3 (in steps of 0.1), using the MATLAB statements:

    nd = -3:0.1:3;
    pr = cnd(nd);
    plot(nd,pr);
    ylabel('Probability actual <= outcome');

The cumulative normal distribution can be used to determine probabilities that a normally-distributed outcome will lie within a given range. For example, the probability that an outcome will lie within one standard deviation of the mean is:
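
Assuming cnd is built from erf as 0.5*(1+erf(nd/sqrt(2))), which is our sketch of the function rather than a listing of it, the calculation can be written as:

     cnd = @(nd) 0.5*(1 + erf(nd/sqrt(2)));    % assumed implementation of cnd
     p1 = cnd(1) - cnd(-1)                      % approximately 0.6827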


Thus there are roughly two chances out of three that the outcome will lie within this range. Some characterize an investment's prospects by giving its mean and standard deviation in the form: e +/- sd (read as e plus or minus sd); thus an asset mix might be said to offer returns of 10+/-15. If the return can be assumed to be normally-distributed, this means that there are roughly two chances out of three that the actual return will lie between -5% (10-15) and 25% (10+15).

The probability that a normally-distributed return will be within two standard deviations of the mean is given by:
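
With the same assumed erf-based sketch of cnd, the calculation can be written as:

     cnd = @(nd) 0.5*(1 + erf(nd/sqrt(2)));    % assumed implementation of cnd
     p2 = cnd(2) - cnd(-2)                      % approximately 0.9545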


Thus if a normally-distributed investment is characterized by 10+/-15, the chances are roughly 95% that its actual return will lie between -20% (10 - 2*15) and 40% (10+2*15).

In MATLAB one can produce normally-distributed random variables with an expected value of zero and a standard deviation of 1.0 directly using the function randn. Thus:

     z = ev + randn(100,10)*sd 

will produce a {100*10} matrix z of random numbers from a distribution with a mean of ev and a standard deviation of sd.
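
A quick check, with ev and sd set to assumed values of 10 and 15. The sample statistics of the generated numbers should come out near, though not exactly at, the requested parameters:

     ev = 10;  sd = 15;                % assumed parameters
     z  = ev + randn(100,10)*sd;       % 1000 draws in a 100-by-10 matrix
     sz = size(z)                      % 100 10
     mz = mean(z(:))                   % close to 10
     sdz = std(z(:))                   % close to 15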

Joint Normality

While the central limit theorem provides a powerful inducement to assume that investment returns and values are normally distributed, it is not sufficient in its own right. While most investment results depend on many events and most portfolios contain many securities, it is unlikely that the influences on overall results are unrelated. If, for example, the health of an economy is not normally distributed, and if it affects most securities to at least some extent, even the value of a diversified portfolio will have a non-normal distribution.

To solve this problem at a formal level, Analysts often assume that the return or value of every investment is normally distributed as is the value or return of any possible combination of investments. Since knowledge of the expected value and standard deviation of a normal distribution is sufficient to calculate the probability of every possible outcome, this very convenient assumption implies that the expected value and standard deviation are sufficient statistics for investment choices in which an end-of-period value or return is the sole source of an Investor's utility.

If the value or return of every possible investment and combination of investments is normally distributed, we say that the set of such variables is jointly normally distributed. The mean-variance approach is well suited for application in such an environment.

Shortfall Measures

Some argue that standard deviation is a flawed measure of risk since it takes into account both happy and unhappy surprises, while most people associate the concept of risk with only the latter. Alternative measures focus on "downside risk" or likely "shortfall". Each requires the specification of an additional parameter -- the point from which shortfall is to be measured. This threshold may be zero, a riskless rate of return, or some level below which the Investor's disappointment with the outcome is assumed to be especially great.

Shortfall Probability

The simplest shortfall measure is the probability of a shortfall below a stated threshold. This can be read directly from a graph of the associated cumulative distribution. For example, assume that the probability that a return will be less than 10% is desired. In the figure below, find 10% on the horizontal axis. Go up to the curve, then over to the vertical axis. The result is 0.5. Thus there is a 50% probability that the return will fall below the selected threshold of 10%.
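
The same kind of probability can be computed rather than read from a graph. This sketch covers two cases: a discrete distribution with made-up returns and probabilities, and a normal distribution whose mean of 10 and standard deviation of 15 are assumptions chosen for illustration. The cnd expression is our erf-based sketch of the cumulative normal distribution:

     threshold = 10;

     % discrete case (assumed returns and probabilities)
     r   = [-10 0 10 20];
     pr  = [.1 .2 .3 .4];
     psf = sum(pr(r<threshold))                 % 0.3

     % normal case (assumed ev and sd)
     cnd  = @(nd) 0.5*(1 + erf(nd/sqrt(2)));
     ev   = 10;  sd = 15;
     psfn = cnd((threshold-ev)/sd)              % 0.5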

Measures of Likely Shortfall

More complex shortfall measures take into account all possible outcomes below the selected threshold and their probabilities to obtain an estimate of the "likely" magnitude of the shortfall. Let r be a vector of possible returns and pr a vector of the associated probabilities. For example:

    r =  [-10 0 10 20]
    pr = [.1 .2 .3 .4]

Assume that the desired threshold is 10 (%). The positions in r which contain returns below the threshold can be found simply using the MATLAB expression:

    r<threshold
          1     1     0     0

To produce a vector of shortfalls we subtract the threshold from each return, then multiply the resulting vector, element by element, by the vector that contains ones in the positions with a shortfall and zeros in all positions in which the difference is zero or positive:

    sf = (r-threshold).*(r<threshold)
           -20   -10     0     0

To find the expected shortfall multiply each of these values times the associated probability:
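
A sketch of that calculation, with the example values and the threshold of 10 restated so that it runs on its own (the name esf is ours):

     r  = [-10 0 10 20];
     pr = [.1 .2 .3 .4];
     threshold = 10;
     sf  = (r-threshold).*(r<threshold);        % shortfalls: -20 -10 0 0
     esf = pr*sf'                               % expected shortfall: -4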


An alternative is the semi-variance, which is the expected squared shortfall:
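
In the same style as the variance calculation, a sketch using the example values (the name semivar is ours):

     r  = [-10 0 10 20];
     pr = [.1 .2 .3 .4];
     threshold = 10;
     sf = (r-threshold).*(r<threshold);         % shortfalls: -20 -10 0 0
     semivar = pr*(sf.^2)'                      % semi-variance: 60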


The square root of the semi-variance is termed the semi-standard deviation. In a sense, it is the "downside" counterpart of the standard deviation. In the case at hand:
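
A sketch of the calculation, again with the example values restated (the name semisd is ours):

     r  = [-10 0 10 20];
     pr = [.1 .2 .3 .4];
     threshold = 10;
     sf = (r-threshold).*(r<threshold);         % shortfalls: -20 -10 0 0
     semisd = sqrt(pr*(sf.^2)')                 % semi-standard deviation: about 7.746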


The expected shortfall, the semi-variance and the semi-standard deviation are all unconditional measures. For example, the expected shortfall is the expected value of the shortfall, whether there is one or not. All outcomes that exceed the threshold are treated equally (as zero shortfalls), no matter what their magnitude. Alternative measures answer a somewhat different set of questions. For example, one might wish to know the size of the expected shortfall if there is one. More directly: conditional on the existence of a shortfall, how large is it likely to be?

To compute a conditional measure, only states of the world in which a shortfall occurs are considered. The desired probabilities are those conditional on such a situation arising. In our example, only the first two states of the world produce shortfalls. The associated unconditional probabilities are 0.1 and 0.2. Thus the probability of a shortfall is 0.3. The conditional probabilities for the two states are 0.3333 (=0.1/0.3) and 0.6667 (=0.2/0.3). More generally, we divide each unconditional probability by the probability of a shortfall. To find the latter we need a vector of the unconditional probabilities for states in which there is a shortfall:

   pr.*(r<threshold)
       0.1000    0.2000         0         0

The sum of these values is the probability of a shortfall:

   prsf = sum(pr.*(r<threshold))

To find the conditional expected shortfall, we could divide each unconditional probability by this value, then multiply by the shortfall vector. Equivalently, we could simply divide the unconditional expected shortfall by the probability of a shortfall:
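
Both routes can be sketched with the example values. The names esf, cpr and cesf are ours:

     r  = [-10 0 10 20];
     pr = [.1 .2 .3 .4];
     threshold = 10;
     sf   = (r-threshold).*(r<threshold);       % shortfalls: -20 -10 0 0
     esf  = pr*sf';                             % unconditional expected shortfall: -4
     prsf = sum(pr.*(r<threshold));             % probability of a shortfall: 0.3
     cpr  = pr.*(r<threshold)/prsf;             % conditional probabilities: 0.3333 0.6667 0 0
     cesf1 = cpr*sf'                            % first route: about -13.33
     cesf2 = esf/prsf                           % second route: the same value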


Earlier we found that the expected shortfall is 4%. However, if there is a shortfall, the expected amount is 13.33%.

Similarly, the conditional semi-variance equals the unconditional semi-variance divided by the probability of a shortfall. From this it follows that the conditional semi-standard deviation equals the unconditional semi-standard deviation divided by the square root of the probability of a shortfall.
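
These two relationships can be checked with the example values used above. The variable names are ours:

     r  = [-10 0 10 20];
     pr = [.1 .2 .3 .4];
     threshold = 10;
     sf   = (r-threshold).*(r<threshold);       % shortfalls: -20 -10 0 0
     prsf = sum(pr.*(r<threshold));             % probability of a shortfall: 0.3
     semivar  = pr*(sf.^2)';                    % unconditional semi-variance: 60
     csemivar = semivar/prsf                    % conditional semi-variance: 200
     csemisd  = sqrt(semivar)/sqrt(prsf)        % conditional semi-standard deviation: about 14.14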

Value at Risk

Another measure of downside risk is based on a specified probability. In effect one asks the question: what is the (almost) worst thing that can happen? A probability px is selected (e.g. 1%). The associated (almost) worst thing that can happen is given by a return or future value x, such that there is only a probability px that the actual outcome will be worse than x.

Assume, for example, that a bad outcome is specified as one that will not be underperformed more than 10% (px) of the time. In the case shown in the previous figure, this is easily determined. Locate 0.1 (10%) on the vertical axis. Then go over to the curve and down to the horizontal axis. The result is -10%. Thus the (10%) worst case involves a return of -10%.

When the result of this kind of calculation involves a negative change in value, the change is often termed the value at risk. Thus, in our example, if the current amount invested were $500,000, we would say that the value at risk is $50,000.
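
For a discrete distribution the graphical lookup can be imitated with cumsum. The sketch below is one convention among several: it takes the first outcome whose cumulative probability reaches px. The returns and probabilities are made-up inputs, and the names k, invested and valueatrisk are ours:

     r  = [-10 0 10 20];                % assumed returns (%), sorted in ascending order
     pr = [.1 .2 .3 .4];                % assumed probabilities
     px = 0.10;                         % chosen probability
     k  = find(cumsum(pr) >= px, 1);    % first outcome whose cumulative probability reaches px
     x  = r(k)                          % -10
     invested = 500000;                 % current amount invested ($)
     valueatrisk = -x/100*invested      % 50000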

Value at risk is often calculated for short holding periods (e.g. a day or a week). In such cases the expected return is often assumed to be zero. This allows the Analyst to concentrate on the shape of the distribution of returns and its standard deviation, thereby lending at least a somewhat greater sense of objectivity to the result.
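
A sketch of such a calculation under the additional assumption that returns are normally distributed. The daily standard deviation of 1%, the probability of 1% and the amount invested are made-up inputs. The standardized deviation is obtained from erfinv, the inverse of erf:

     px = 0.01;                         % chosen probability (assumed)
     sd = 0.01;                         % assumed daily standard deviation of return
     ev = 0;                            % expected return assumed to be zero
     invested = 500000;                 % assumed amount invested ($)
     nd = sqrt(2)*erfinv(2*px-1);       % standardized deviation with probability px below it: about -2.326
     x  = ev + nd*sd;                   % return at the px level: about -0.0233
     valueatrisk = -x*invested          % roughly 11,630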

Shortfall and other Risk Measures

In many cases it proves helpful to summarize the prospects of an investment strategy in terms of (1) its expected outcome and (2) a measure of downside risk or likely shortfall, even though the analysis leading to its choice utilized standard deviation as a measure of risk.

Among strategies with equal expected outcomes there is often a one-to-one correspondence between standard deviation and each of several alternative risk measures, including downside ones. Since calculations are far easier when standard deviation is utilized, we follow common practice by utilizing it in much of what follows. When issues of communication are paramount, however, we will include transformations to alternative measures that focus attention on bad outcomes rather than all outcomes.