Contents:

- A Generic Linear Factor Model
- Terminology
- Decomposing Returns
- Matrix Representation of Factor Models

A linear factor model relates the return on an asset (be it a stock, bond, mutual fund
or something else) to the values of a limited number of *factors*, with the
relationship described by a linear equation. In its most generic form, such a model
can be written as:

r_{i} = b_{i1}*f_{1} + b_{i2}*f_{2} + .... + b_{im}*f_{m} + e_{i}

where:

r_{i} = the return on asset i

b_{i1} = the change in the return on asset i per unit change in factor 1

f_{1} = the value of factor 1

b_{i2} = the change in the return on asset i per unit change in factor 2

f_{2} = the value of factor 2

... = terms of the form b_{ij}*f_{j} with j going from 3 to m-1

b_{im} = the change in the return on asset i per unit change in factor m

f_{m} = the value of factor m

m = the number of factors

e_{i} = the portion of the return on asset i not related to the m factors

For emphasis, the equation is sometimes written so that variables assumed to be known before the fact are distinguished from those whose values are generally not known until after the fact. For example:

r_{i}^{~} = b_{i1}*f_{1}^{~} + b_{i2}*f_{2}^{~} + .... + b_{im}*f_{m}^{~} + e_{i}^{~}

In this version, a *tilde* after a variable indicates that its value is not
generally known in advance. The values of such *stochastic variables* are
uncertain. Thus we do not know what the return on the asset (r_{i}^{~}) will be,
since we do not know the values that the factors (f_{1}^{~}, f_{2}^{~}, ....,
f_{m}^{~}) will take on, nor do we know the amount of the asset's return
that will come from other sources (e_{i}^{~}).
On the other hand, we do know (or at least assume that we know) the
sensitivities of the return on the asset to each of the factors (b_{i1}, b_{i2},
...., b_{im}) -- these are *deterministic* (not subject to
uncertainty). Somewhat differently put, they are *parameters* in the
model.

Purists will note that it is unusual to place tildes after stochastic variables rather than over them. The latter is indeed the convention in media that are not typographically challenged. Our approach is simply a pragmatic response to the limitations of standard browser formats.

The factor model equation may appear to make a significant statement about the
relationship between an asset's return and the values of the enumerated factors, but this
is not so. For example, one could choose any arbitrary set of b_{ij}'s and
f_{j}'s, then simply define the residual as:

e_{i}^{~} = r_{i}^{~} - [b_{i1}*f_{1}^{~} + b_{i2}*f_{2}^{~} + .... + b_{im}*f_{m}^{~}]

The factor equation would then hold precisely, but could have no economic content at all. To make the equation have meaning, two assumptions are made. One is relatively innocuous. The other is not.

First, the residual return (e_{i}^{~}) is assumed to be
uncorrelated with each of the factors:

corr (e_{i}^{~}, f_{j}^{~}) = 0 : for every j from 1 to m

This is not as restrictive as it may seem. Consider, for example, a case in which
the residual return is correlated with factor 1. By adjusting the factor exposure (b_{i1})
appropriately, the correlation of the residual with the factor can be made to equal zero.
Moreover, this can be done for every factor. In fact, in simple settings
using historic data, multiple regression procedures can be used to find a set of factor
exposures (b_{ij}'s) that will give residual returns that are uncorrelated with
each of the factors. Why? Because standard linear multiple regression methods
select slope coefficients (here, the b_{ij}'s) that minimize the variance of the
residual (here e_{i}). But this will ensure that the residual is
uncorrelated with each of the independent variables (here, the f_{j}'s), since the
removal of any such correlation by changing one or more b_{ij}'s will reduce the
variance of the residual.
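The regression argument can be checked numerically. The following is a minimal sketch for a single factor, using made-up data and assuming the regression includes an intercept (written here as `a`, a term the text has not yet introduced); the helper `covariance` is defined only for this illustration.

```python
from statistics import mean

# Hypothetical data: factor values and asset returns over six periods (percent).
f = [4.0, 3.0, -1.0, 6.0, 2.0, 5.0]
r = [7.1, 4.2, -0.5, 9.8, 3.9, 7.0]

def covariance(x, y):
    """Population covariance of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / len(x)

# Least-squares slope (the factor exposure) and intercept for r = a + b*f + e.
b = covariance(f, r) / covariance(f, f)
a = mean(r) - b * mean(f)

# Residuals implied by the fitted exposure.
e = [ri - (a + b * fi) for ri, fi in zip(r, f)]

# The covariance (and hence the correlation) between residual and factor
# is zero, up to floating-point rounding.
print(abs(covariance(e, f)) < 1e-9)
```

Any other choice of `b` would leave some covariance between `e` and `f`, and a larger residual variance.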

Thus the assumption that the residual is uncorrelated with each of the factors is convenient, but does not give the linear factor model much power. However, the second assumption does.

The *key* assumption of a linear factor model is that the residual for one
asset's return is uncorrelated with that of any other:

corr (e_{i}^{~}, e_{j}^{~}) = 0 : for every i not equal to j, with i and j running over all of the assets

This means that the only sources of correlations among asset total returns are those
that arise from their exposures to the factors and the covariances among the factors.
The residual component of an asset's return is assumed to be unrelated to that of
any other asset, and hence totally *specific* to that asset. In other words,
the risk associated with the residual return is *idiosyncratic* to the asset in
question.
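One consequence of the two assumptions taken together is that the covariance between two assets' returns depends only on their exposures and on the covariances among the factors, while each asset's variance adds its own residual variance. The sketch below illustrates this with two assets and two factors; all of the numbers, and the helper name `factor_cov`, are hypothetical.

```python
# Hypothetical exposures of two assets to two factors.
b1 = [0.5, 1.2]
b2 = [0.9, 0.3]

# Hypothetical covariance matrix of the two factors (symmetric).
cov_f = [[4.0, 1.0],
         [1.0, 9.0]]

# Hypothetical residual variances of the two assets.
res_var = [2.0, 3.0]

def factor_cov(bi, bk, cov_f):
    """Sum over j and l of b_ij * b_kl * cov(f_j, f_l)."""
    return sum(bi[j] * bk[l] * cov_f[j][l]
               for j in range(len(bi))
               for l in range(len(bk)))

# With uncorrelated residuals, the covariance between the assets comes
# entirely from the factors.
cov_12 = factor_cov(b1, b2, cov_f)

# Each variance is the factor-related part plus the asset's residual variance.
var_1 = factor_cov(b1, b1, cov_f) + res_var[0]
var_2 = factor_cov(b2, b2, cov_f) + res_var[1]

print(round(cov_12, 2), round(var_1, 2), round(var_2, 2))
```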

This assumption makes a linear factor model powerful in the sense that it rules out many possible combinations of outcomes. But greater power comes at a cost. The more restrictive a model, the greater the chance that it may be inconsistent with reality. For this reason it is incumbent on the Analyst to try to capture the most important sources of correlations among asset returns by including a sufficient number of factors and attempting to focus on the most important ones. This being said, as in the construction of any model, parsimony is a virtue, since the goal is to include "signals" and avoid "noise".

We have termed the standard factor model linear which, strictly speaking, it is.
However, this is far less restrictive than it might first seem. There are no
restrictions on correlations among the enumerated factors, so it is perfectly possible to
include some that are correlated with others or are transforms of others. For example,
assume that the desired relationship is a quadratic one in which r_{i} is related
to two factors, f_{a} and f_{b} as follows:

r_{i} = b_{i1}*f_{a} + b_{i2}*f_{b} + b_{i3}*f_{a}^{2} + b_{i4}*(f_{a}*f_{b}) + e_{i}

To put this in our standard format, define:

f_{1} = f_{a}

f_{2} = f_{b}

f_{3} = f_{a}^{2}

f_{4} = f_{a}*f_{b}

Then the relationship can be written as a linear function of these new variables:

r_{i} = b_{i1}*f_{1} + b_{i2}*f_{2} + b_{i3}*f_{3} + b_{i4}*f_{4} + e_{i}

In cases of this sort it may be difficult to estimate the values of the sensitivities
(b_{ij}'s) from historic data because the new factors are highly correlated with
each other, but there is no reason why such a format cannot be employed if good estimates
can be obtained.
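As a small illustration of the redefinition, the sketch below builds the four transformed factors from two raw factor values and confirms that the linear form reproduces the quadratic one. The sensitivities and factor values are hypothetical.

```python
# Hypothetical sensitivities b_i1 .. b_i4 and raw factor values f_a, f_b.
b = [0.5, 0.2, 0.01, -0.03]
f_a, f_b = 4.0, 7.0

# Transformed factors: f_1 = f_a, f_2 = f_b, f_3 = f_a^2, f_4 = f_a*f_b.
f = [f_a, f_b, f_a ** 2, f_a * f_b]

# Factor-related return written as a linear function of the new factors.
linear_form = sum(bj * fj for bj, fj in zip(b, f))

# The same quantity written directly in its quadratic form.
quadratic_form = (b[0] * f_a + b[1] * f_b
                  + b[2] * f_a ** 2 + b[3] * (f_a * f_b))

print(round(linear_form, 2), round(quadratic_form, 2))
```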

To avoid needless carping on the need to define factors to allow a linear format for
the overall relationship, we henceforth will use the shorter term: *factor models*.

Thus far we have imposed no restrictions on the expected returns of the factors or on
the asset's residual returns (e_{i}'s). In general, we will not do so. This
allows the expected return of e_{i} to be positive, negative, or zero for any
asset. However, in some applications it is useful to divide the non-factor
return into two components -- a known expected value and an unknown residual
component with an expected value of zero. As typically written, the equation
becomes:

r_{i}^{~} = b_{i1}*f_{1}^{~} + b_{i2}*f_{2}^{~} + .... + b_{im}*f_{m}^{~} + (a_{i} + e_{i}^{~})

where the expected value of e_{i}^{~} is 0.

As the choice of letter suggests, the equation is often written with the a_{i} term
first:

r_{i}^{~} = a_{i} + b_{i1}*f_{1}^{~} + b_{i2}*f_{2}^{~} + .... + b_{im}*f_{m}^{~} + e_{i}^{~}

In some cases the first term is called the asset's *alpha* value, but at this
point we use the more humble notation of "a".
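To make the split concrete, the sketch below takes a hypothetical series of non-factor returns and separates it into a constant "a" and a zero-mean residual. Using the sample mean as the value of "a" is an assumption made only for this illustration; in the model itself a_{i} is an expected value taken as known.

```python
from statistics import mean

# Hypothetical non-factor returns (r - b*f) for one asset over five periods.
non_factor = [1.5, -2.9, 0.8, 2.1, 0.5]

# Assumption for illustration: take "a" to be the average non-factor return.
a = mean(non_factor)

# What remains after removing "a" averages to zero by construction.
e = [x - a for x in non_factor]

print(round(a, 2), abs(mean(e)) < 1e-9)
```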

Factor models are used in many domains in the field of investments, so it should not be surprising that different factors are used and different terms employed to describe the key components.

Factors (the f_{j}'s) may be:

- macro-economic variables,
- returns on pre-specified portfolios,
- returns on zero-investment strategies (long and short positions of equal value) giving maximum exposure to a fundamental or macro-economic factor,
- returns on benchmark portfolios representing asset classes,
- or something else.

The b_{ij} coefficients may be called:

- factor exposures,
- factor sensitivities,
- factor loadings,
- factor betas,
- asset exposures,
- style,
- or something else.

The e_{i} term may be called:

- idiosyncratic return,
- security-specific return,
- non-factor return,
- residual return,
- selection return,
- or something else.

Different problems require different factors and emphasize different economic relationships. The job of the Analyst is to either construct and apply an appropriate factor model for the task at hand or to at least understand the underlying structures and economic meanings of models constructed by others.

A factor model is especially useful when analyzing historic asset returns, since such a model allows the Analyst to separate components of the overall return of the asset. For such purposes it is useful to write the underlying model as:

r_{it} = b_{i1}*f_{1t} + b_{i2}*f_{2t} + .... + b_{im}*f_{mt} + e_{it}

where:

r_{it} = the return on asset i in period t

b_{i1} = the change in the return on asset i per unit change in factor 1

f_{1t} = the value of factor 1 in period t

b_{i2} = the change in the return on asset i per unit change in factor 2

f_{2t} = the value of factor 2 in period t

... = terms of the form b_{ij}*f_{jt} with j going from 3 to m-1

b_{im} = the change in the return on asset i per unit change in factor m

f_{mt} = the value of factor m in period t

m = the number of factors

e_{it} = the residual return on asset i in period t

While the subscript *t* and the term *period* suggest the traditional
application in which each period represents a different historic realization (for example,
a different month in the past), the concepts can be used as well in *ex ante*
analyses, in which each period (t) represents a different possible *scenario* or *realization*
that could occur in the next (future) period. To emphasize the context, we will
sometimes use the subscript s (for scenario) instead of t (for time period). With
scenarios, there are S of them. With time periods, there are T.

Note that in this representation the b_{ij} terms are not given a t (period) or
s (scenario) subscript. This is innocuous in the scenario case, since every scenario
involves the same future period. However, in the historic case, the assumption is
quite restrictive, since it indicates that the asset's exposures to the factors were the
same in every period. In some cases involving ex post returns, different exposures
will be estimated for different time periods, with b_{ij} values replaced
with b_{ijt} values.

To simplify notation and to facilitate computation it is useful to switch from a subscripted notation to a matrix representation. As usual, we utilize Matlab conventions. We consider several cases in turn, focusing on decomposition of returns, be they over time or over scenarios. For simplicity we cast our examples in terms of historic returns over different time periods, but the interpretations can easily be adapted to cases involving different possible scenarios over a single future time period.

First consider the case in which there is one asset and one time period (looking backward) or scenario (looking forward). Let b be a {1*m} vector of the asset's factor exposures, let f be an {m*1} vector of actual factor values, r a scalar representing the asset's return and e a scalar representing its residual return. The factor model equation can then be written as:

r = b*f + e

For example, let the asset's exposures to the factors be:

b = [ 0.1 0.3 0.6 ]

Assume that the realized values of the factors in a given year were:

f = [ 4 ; 7 ; 20 ]

If the total return on the asset (r) was 16.0 percent, then:

e = r - b*f = 16.0 - 14.5 = 1.5

Thus, in the year in question, the asset's residual (non-factor related) return was 1.5%, while its factor-related return was 14.5%.
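The same arithmetic can be reproduced in a few lines. This is only a sketch in plain Python (the text itself follows Matlab conventions), with the vectors held as ordinary lists.

```python
# Exposures, factor values and total return from the example above.
b = [0.1, 0.3, 0.6]
f = [4.0, 7.0, 20.0]
r = 16.0

# b*f: the factor-related return (row vector times column vector).
factor_return = sum(bj * fj for bj, fj in zip(b, f))

# e = r - b*f: the residual return.
e = r - factor_return

print(round(factor_return, 1), round(e, 1))  # prints: 14.5 1.5
```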

Next, consider a case in which there are many historic periods (looking backward) or scenarios (looking forward). Let there be T such alternatives. For each one, there will be a return on the asset, so that the scalar r will be replaced by T values, which can be written as a {1*T} (row) vector. Similarly, for every alternative there will be a set of factor values, so that the vector f will be replaced by an {m*T} matrix, F. This will give a residual return for each of the periods or cases, so that the scalar e will be replaced by a {1*T} vector. We assume that the asset's factor exposures will be the same in each case, so that b will remain a {1*m} vector.

The relationships among these variables can then be written with the following succinct equation:

r = b*F + e

Given r, b and F, the residual returns can be found by performing the operation:

e = r - b*F

For example, assume that in the last two years the realized returns for the asset were:

r = [ 16 4 ]

while the factor values were:

F = [ 4 3 ; 7 2 ; 20 10 ]

This implies that the factor-related returns were:

b*F = [ 14.5 6.9 ]

and the residual returns were:

e = r - b*F = [ 1.5 -2.9 ]

In each case the first column corresponds to the previous one-period example, which is a special case (with T=1) of this present version.
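A sketch of the two-period calculation follows, again in plain Python, with F stored as a list of rows (one row per factor, one column per period).

```python
# One asset, three factors, two periods (values from the example above).
b = [0.1, 0.3, 0.6]
F = [[4.0, 3.0],
     [7.0, 2.0],
     [20.0, 10.0]]
r = [16.0, 4.0]

T = len(r)  # number of periods

# b*F: one factor-related return per period.
bF = [sum(b[j] * F[j][t] for j in range(len(b))) for t in range(T)]

# e = r - b*F: one residual return per period.
e = [r[t] - bF[t] for t in range(T)]

print([round(x, 1) for x in bF])  # prints: [14.5, 6.9]
print([round(x, 1) for x in e])   # prints: [1.5, -2.9]
```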

An even more general case can subsume both of the prior ones as special cases. Assume that there are N assets and T realizations. Let:

R = an {N*T} matrix, where R(i,t) is the return on asset i in realization t

B = an {N*m} matrix, where B(i,j) is the exposure of asset i to factor j

F = an {m*T} matrix, where F(j,t) is the value of factor j in realization t

E = an {N*T} matrix, where E(i,t) is the residual return on asset i in realization t

The factor model then becomes:

R = B*F + E

and the matrix of residual returns can be found by computing:

E = R - B*F

As an example, assume that we have four assets, with exposures to three factors given by:

B = [ 0.1 0.3 0.6 ; 0.2 0.8 0 ; 0 0.7 0.3 ; 0 0 1.0 ]

If the factor values (F) were those of the prior example and the assets' returns in the two years were:

R = [ 16 4 ; 7 1 ; 8 6 ; 22 7 ]

Then the residual returns were:

E = [ 1.5 -2.9 ; 0.6 -1.2 ; -2.9 1.6 ; 2.0 -3.0 ]

Not surprisingly, the first security is the one used in the prior case.

Note that three different dimensions are involved here (two periods, three factors, and four assets). The matrices, with row and column labels are as follows:

Returns (R):

| | period 1 | period 2 |
|---|---|---|
| security 1 | 16.0 | 4.0 |
| security 2 | 7.0 | 1.0 |
| security 3 | 8.0 | 6.0 |
| security 4 | 22.0 | 7.0 |

Asset Exposures (B):

| | factor 1 | factor 2 | factor 3 |
|---|---|---|---|
| security 1 | 0.1 | 0.3 | 0.6 |
| security 2 | 0.2 | 0.8 | 0.0 |
| security 3 | 0.0 | 0.7 | 0.3 |
| security 4 | 0.0 | 0.0 | 1.0 |

Factor realizations (F):

| | period 1 | period 2 |
|---|---|---|
| factor 1 | 4.0 | 3.0 |
| factor 2 | 7.0 | 2.0 |
| factor 3 | 20.0 | 10.0 |

Factor-related Returns (B*F):

| | period 1 | period 2 |
|---|---|---|
| security 1 | 14.5 | 6.9 |
| security 2 | 6.4 | 2.2 |
| security 3 | 10.9 | 4.4 |
| security 4 | 20.0 | 10.0 |

Residual returns (E = R - B*F):

| | period 1 | period 2 |
|---|---|---|
| security 1 | 1.5 | -2.9 |
| security 2 | 0.6 | -1.2 |
| security 3 | -2.9 | 1.6 |
| security 4 | 2.0 | -3.0 |
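The four-asset example can be reproduced with a short matrix routine. The sketch below uses plain Python lists of rows in place of Matlab matrices; the helper name `matmul` is ours and is defined only for this illustration.

```python
def matmul(A, B):
    """Multiply matrix A (n x k) by matrix B (k x p), both lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

# Exposures B (4 assets x 3 factors), factor values F (3 factors x 2 periods)
# and returns R (4 assets x 2 periods), all from the example above.
B = [[0.1, 0.3, 0.6],
     [0.2, 0.8, 0.0],
     [0.0, 0.7, 0.3],
     [0.0, 0.0, 1.0]]
F = [[4.0, 3.0],
     [7.0, 2.0],
     [20.0, 10.0]]
R = [[16.0, 4.0],
     [7.0, 1.0],
     [8.0, 6.0],
     [22.0, 7.0]]

# Factor-related returns B*F and residual returns E = R - B*F.
BF = matmul(B, F)
E = [[R[i][t] - BF[i][t] for t in range(len(R[0]))] for i in range(len(R))]

# One row per security, rounded to one decimal for display.
for row in E:
    print([round(x, 1) for x in row])
```

The first row of `E` matches the single-asset case, and the remaining rows match the residual returns listed for securities 2 through 4.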