Linear Factor Models

A Generic Linear Factor Model

The Equation

A linear factor model relates the return on an asset (be it a stock, bond, mutual fund or something else) to the values of a limited number of factors, with the relationship described by a linear equation. In its most generic form, such a model can be written as:

r_i = b_i1*f₁ + b_i2*f₂ + .... + b_im*f_m + e_i

where:

r_i = the return on asset i
b_i1 = the change in the return on asset i per unit change in factor 1
f₁ = the value of factor 1
b_i2 = the change in the return on asset i per unit change in factor 2
f₂ = the value of factor 2
... = terms of the form b_ij*f_j with j going from 3 to m-1
b_im = the change in the return on asset i per unit change in factor m
f_m = the value of factor m
m = the number of factors
e_i = the portion of the return on asset i not related to the m factors

For emphasis, the equation is sometimes written so that variables assumed to be known before the fact are distinguished from those whose values are generally not known until after the fact. For example:

r_i~ = b_i1*f₁~ + b_i2*f₂~ + .... + b_im*f_m~ + e_i~

In this version, a tilde after a variable indicates that its value is not generally known in advance. The values of such stochastic variables are uncertain. Thus we do not know what the return on the asset (r_i~) will be, since we do not know the values that the factors (f₁~, f₂~, ...., f_m~) will take on, nor do we know the amount of the asset's return that will come from other sources (e_i~). On the other hand, we do know (or at least assume that we know) the sensitivities of the return on the asset to each of the factors (b_i1, b_i2, ...., b_im) -- these are deterministic (not subject to uncertainty). Somewhat differently put, they are parameters in the model.

Purists will note that it is unusual to place tildes after stochastic variables rather than over them. The latter is indeed the convention in media that are not typographically challenged. Our approach is simply a pragmatic response to the limitations of standard browser formats.

The Key Assumption

The factor model equation may appear to make a significant statement about the relationship between an asset's return and the values of the enumerated factors, but this is not so. For example, one could choose any arbitrary set of b_ij's and f_j's, then simply define the residual as:

e_i~ = r_i~ - [b_i1*f₁~ + b_i2*f₂~ + .... + b_im*f_m~]

The factor equation would then hold precisely, but could have no economic content at all. To make the equation have meaning, two assumptions are made. One is relatively innocuous. The other is not.

First, the residual return (e_i~) is assumed to be uncorrelated with each of the factors:

corr (e_i~, f_j~) = 0 : for every j from 1 to m

This is not as restrictive as it may seem. Consider, for example, a case in which the residual return is correlated with factor 1. By adjusting the factor exposure (b_i1) appropriately, the correlation of the residual with the factor can be made to equal zero. Moreover, this can be done for every factor. In fact, in simple settings using historic data, multiple regression procedures can be used to find a set of factor exposures (b_ij's) that will give residual returns that are uncorrelated with each of the factors. Why? Because standard linear multiple regression methods select slope coefficients (here, the b_ij's) that minimize the variance of the residual (here, e_i). This ensures that the residual is uncorrelated with each of the independent variables (here, the f_j's): if any such correlation remained, changing one or more b_ij's to remove it would reduce the variance of the residual further.
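The regression argument can be checked numerically. The following is a minimal sketch in Python for a single factor. The data are made up for illustration, and the inclusion of an intercept (so that the fitted residual has a zero mean) is our assumption, not something taken from the text above.

```python
# Illustrative data only: made-up factor values and asset returns.
f = [4.0, 3.0, -1.0, 6.0, 2.0]
r = [16.0, 4.0, -2.0, 11.0, 5.0]

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Population covariance of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

# Least-squares slope and intercept for one factor.
b = cov(r, f) / cov(f, f)
a = mean(r) - b * mean(f)

# Residuals implied by the fitted exposure.
e = [ri - (a + b * fi) for ri, fi in zip(r, f)]
cov_ef = cov(e, f)   # zero, up to rounding error
var_e = cov(e, e)

# Any other exposure leaves the residual correlated with the factor
# and gives it a larger variance.
b_other = b + 0.5
e_other = [ri - (a + b_other * fi) for ri, fi in zip(r, f)]
cov_ef_other = cov(e_other, f)
var_e_other = cov(e_other, e_other)

print(cov_ef, var_e, cov_ef_other, var_e_other)
```

Moving the exposure away from the least-squares value in either direction raises the residual variance and reintroduces a covariance with the factor, which is the point made in the paragraph above.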

Thus the assumption that the residual is uncorrelated with each of the factors is convenient, but does not give the linear factor model much power. However, the second assumption does.

The key assumption of a linear factor model is that the residual for one asset's return is uncorrelated with that of any other:

corr (e_i~, e_j~) = 0 : for every i not equal to j, with i and j here both indexing assets (running from 1 to N, the number of assets)

This means that the only sources of correlations among asset total returns are those that arise from their exposures to the factors and the covariances among the factors. The residual component of an asset's return is assumed to be unrelated to that of any other asset, and hence totally specific to that asset. In other words, the risk associated with the residual return is idiosyncratic to the asset in question.
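The implication for covariances can be written out explicitly. The following is a sketch of the standard derivation under the two assumptions stated above. We write k for the second asset so that j can continue to index factors, and we place tildes over the stochastic variables, as is conventional where the typography allows.

```latex
% Covariance between the returns on two different assets i and k:
% every term involving a residual vanishes.
\operatorname{cov}(\tilde r_i, \tilde r_k)
  = \sum_{j=1}^{m} \sum_{l=1}^{m} b_{ij}\, b_{kl}\,
    \operatorname{cov}(\tilde f_j, \tilde f_l),
  \qquad i \neq k

% Variance of the return on asset i: the residual contributes
% only its own variance.
\operatorname{var}(\tilde r_i)
  = \sum_{j=1}^{m} \sum_{l=1}^{m} b_{ij}\, b_{il}\,
    \operatorname{cov}(\tilde f_j, \tilde f_l)
  + \operatorname{var}(\tilde e_i)
```

The cross terms between residuals and factors drop out because of the first assumption, and the term cov(e_i, e_k) drops out because of the second.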

This assumption makes a linear factor model powerful in the sense that it rules out many possible combinations of outcomes. But greater power comes at a cost. The more restrictive a model, the greater the chance that it may be inconsistent with reality. For this reason it is incumbent on the Analyst to try to capture the most important sources of correlations among asset returns by including a sufficient number of factors and attempting to focus on the most important ones. This being said, as in the construction of any model, parsimony is a virtue, since the goal is to include "signals" and avoid "noise".

Non-linear Relationships

We have termed the standard factor model linear which, strictly speaking, it is. However, this is far less restrictive than it might first seem. There are no restrictions on correlations among the enumerated factors, so it is perfectly possible to include some that are correlated with others or are transforms of others. For example, assume that the desired relationship is a quadratic one in which r_i is related to two factors, f_a and f_b, as follows:

r_i = b_i1*f_a + b_i2*f_b + b_i3*f_a² + b_i4*(f_a*f_b) + e_i

To put this in our standard format, define:

f₁ = f_a
f₂ = f_b
f₃ = f_a²
f₄ = f_a*f_b

Then the relationship can be written as a linear function of these new variables:

r_i = b_i1*f₁ + b_i2*f₂ + b_i3*f₃ + b_i4*f₄ + e_i

In cases of this sort it may be difficult to estimate the values of the sensitivities (b_ij's) from historic data because the new factors are highly correlated with each other, but there is no reason why such a format cannot be employed if good estimates can be obtained.
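The construction of the transformed factors can be sketched in a few lines of Python. All of the numbers below (the factor values and the exposures) are hypothetical and serve only to show the mechanics. The last step computes the correlation between f₁ and f₃ to illustrate why estimation may be difficult.

```python
# Hypothetical values of the two underlying factors over five periods.
f_a = [4.0, 3.0, -1.0, 6.0, 2.0]
f_b = [7.0, 2.0, 5.0, 1.0, 3.0]

# Define the new factors so that the relationship is linear in them.
f1 = list(f_a)
f2 = list(f_b)
f3 = [x * x for x in f_a]
f4 = [x * y for x, y in zip(f_a, f_b)]

# Hypothetical exposures b_i1 .. b_i4 for one asset.
b = [0.5, 0.2, 0.1, -0.05]

# Factor-related return in each period: a linear function of f1 .. f4.
factor_return = [
    b[0] * w + b[1] * x + b[2] * y + b[3] * z
    for w, x, y, z in zip(f1, f2, f3, f4)
]

def corr(xs, ys):
    """Sample correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# A factor and its square tend to move together.
corr_f1_f3 = corr(f1, f3)
print(factor_return, corr_f1_f3)
```

In this made-up sample the correlation between f₁ and f₃ is high, which is the kind of situation in which the individual sensitivities are hard to pin down from historic data.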

To avoid needless carping on the need to define factors to allow a linear format for the overall relationship, we henceforth will use the shorter term: factor models.

Expected Residual Returns

Thus far we have imposed no restrictions on the expected returns of the factors or on the asset's residual returns (e_i's). In general, we will not do so. This allows the expected return of e_i to be positive, negative, or zero for any asset. However, in some applications it is useful to divide the expected non-factor return into two components -- a known expected value and an unknown residual component with an expected value of zero. As typically written, the equation becomes:

r_i~ = b_i1*f₁~ + b_i2*f₂~ + .... + b_im*f_m~ + (a_i + e_i~)

where the expected value of e_i~ = 0.

As the choice of letter suggests, the equation is often written with the a_i term first:

r_i~ = a_i + b_i1*f₁~ + b_i2*f₂~ + .... + b_im*f_m~ + e_i~

In some cases the first term is called the asset's alpha value, but at this point we use the more humble notation of "a".
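As a rough illustration of this split, the sketch below takes a hypothetical series of non-factor returns for one asset and separates it into a constant and a zero-mean remainder. Using the sample mean as a stand-in for the known expected value a_i is our assumption for the example. The text above treats a_i as given.

```python
# Hypothetical non-factor returns (a_i + e_i) for one asset in four periods.
non_factor = [1.5, -2.9, 0.8, 2.2]

# Stand-in for the expected value a_i: the sample mean (an assumption).
a = sum(non_factor) / len(non_factor)

# What remains has a mean of zero by construction.
e = [x - a for x in non_factor]
mean_e = sum(e) / len(e)

print(a, e, mean_e)
```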

Terminology

Factor models are used in many domains in the field of investments, so it should not be surprising that different factors are used and different terms employed to describe the key components.

Factors (the f_j's) may be:

macro-economic variables,
returns on pre-specified portfolios,
returns on zero-investment strategies (long and short positions of equal value) giving maximum exposure to a fundamental or macro-economic factor,
returns on benchmark portfolios representing asset classes,
or something else.

The b_ij coefficients may be called:

factor exposures,
factor sensitivities,
factor loadings,
factor betas,
asset exposures,
style,
or something else.

The e_i term may be called:

idiosyncratic return,
security-specific return,
non-factor return,
residual return,
selection return,
or something else.

Different problems require different factors and emphasize different economic relationships. The job of the Analyst is to either construct and apply an appropriate factor model for the task at hand or to at least understand the underlying structures and economic meanings of models constructed by others.

Decomposing Returns

A factor model is especially useful when analyzing historic asset returns, since such a model allows the Analyst to separate components of the overall return of the asset. For such purposes it is useful to write the underlying model as:

r_it = b_i1*f_1t + b_i2*f_2t + .... + b_im*f_mt + e_it

where:

r_it = the return on asset i in period t
b_i1 = the change in the return on asset i per unit change in factor 1
f_1t = the value of factor 1 in period t
b_i2 = the change in the return on asset i per unit change in factor 2
f_2t = the value of factor 2 in period t
... = terms of the form b_ij*f_jt with j going from 3 to m-1
b_im = the change in the return on asset i per unit change in factor m
f_mt = the value of factor m in period t
m = the number of factors
e_it = the residual return on asset i in period t

While the subscript t and the term period suggest the traditional application in which each period represents a different historic realization (for example, a different month in the past), the concepts can be used as well in ex ante analyses, in which each period (t) represents a different possible scenario or realization that could occur in the next (future) period. To emphasize the context, we will sometimes use the subscript s (for scenario) instead of t (for time period). When scenarios are used, there are S of them. When time periods are used, there are T.

Note that in this representation the b_ij terms are not given a t (period) or s (scenario) subscript. This is innocuous in the scenario case, since every scenario involves the same future period. However, in the time-period case, the assumption is quite restrictive, since it indicates that the asset's exposures to the factors were the same in every period. In some cases involving ex post returns, different exposures will be estimated for different time periods, with b_ij values replaced with b_ijt values.

Matrix Representations of Factor Models

To simplify notation and to facilitate computation it is useful to switch from a subscripted notation to a matrix representation. As usual, we utilize Matlab conventions. We consider several cases in turn, focusing on decomposition of returns, be they over time or over scenarios. For simplicity we cast our examples in terms of historic returns over different time periods, but the interpretations can easily be adapted to cases involving different possible scenarios over a single future time period.

One asset, one realization

First consider the case in which there is one asset and one time period (looking backward) or scenario (looking forward). Let b be a {1*m} vector of the asset's factor exposures, let f be an {m*1} vector of actual factor values, r a scalar representing the asset's return and e a scalar representing its residual return. The factor model equation can then be written as:

r = b*f + e

For example, let the asset's exposures to the factors be:

b =  [ 0.1    0.3    0.6 ]

Assume that the realized values of the factors in a given year were:

f = [ 4
      7
     20 ]

If the total return on the asset (r) was 16.0 percent, then:

e = r - b*f
  = 16.0 - 14.5
  =  1.5

Thus, in the year in question, the asset's residual (non-factor related) return was 1.5%, while its factor-related return was 14.5%.
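The same arithmetic can be reproduced in a few lines of plain Python. This is only a sketch of the calculation above (the text itself assumes Matlab conventions). The variable name factor_return is ours.

```python
# Exposures, factor values and total return from the example above.
b = [0.1, 0.3, 0.6]
f = [4.0, 7.0, 20.0]
r = 16.0

# b*f: the factor-related return (a {1*m} vector times an {m*1} vector).
factor_return = sum(bj * fj for bj, fj in zip(b, f))

# e = r - b*f: the residual return.
e = r - factor_return

print(round(factor_return, 1), round(e, 1))
```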

One Asset, Multiple Realizations

Next, consider a case in which there are many historic periods (looking backward) or scenarios (looking forward). Let there be T such alternatives. For each one, there will be a return on the asset, so that the scalar r will be replaced by T values, which can be written as a {1*T} (row) vector. Similarly, for every alternative there will be a set of factor values, so that the vector f will be replaced by an {m*T} matrix F. This will give a residual return for each of the periods or cases, so that the scalar e will be replaced by a {1*T} vector. We assume that the asset's factor exposures will be the same in each case, so that b will remain a {1*m} vector.

The relationships among these variables can then be written with the following succinct equation:

r = b*F + e

Given r, b and F, the residual returns can be found by performing the operation:

e = r - b*F

For example, assume that in the last two years the realized returns for the asset were:

r =  [ 16     4 ]

while the factor values were:

F =  [ 4     3
       7     2
      20    10  ]

This implies that the factor-related returns were:

b*F = [ 14.5    6.9 ]

and the residual returns were:

e = r - b*F = [ 1.5   -2.9 ]

In each case the first column corresponds to the previous one-period example, which is a special case (with T=1) of this present version.
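A plain-Python sketch of the two-period calculation follows. Storing F as a list of rows (one row per factor, one column per period) is our choice of representation for the example.

```python
# Exposures (1*m), factor values (m*T) and returns (1*T) from the example.
b = [0.1, 0.3, 0.6]
F = [
    [4.0, 3.0],    # factor 1 in periods 1 and 2
    [7.0, 2.0],    # factor 2
    [20.0, 10.0],  # factor 3
]
r = [16.0, 4.0]

m, T = len(b), len(r)

# b*F: the factor-related return in each period.
bF = [sum(b[j] * F[j][t] for j in range(m)) for t in range(T)]

# e = r - b*F: the residual return in each period.
e = [r[t] - bF[t] for t in range(T)]

print([round(x, 1) for x in bF], [round(x, 1) for x in e])
```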

Multiple Assets, Multiple Realizations

An even more general case can subsume both of the prior ones as special cases. Assume that there are N assets and T realizations. Let:

R = an {N*T} matrix, where R(i,t) is the return on asset i in realization t

B = an {N*m} matrix, where B(i,j) is the exposure of asset i to factor j

F = an {m*T} matrix, where F(j,t) is the value of factor j in realization t

E = an {N*T} matrix, where E(i,t) is the residual return on asset i in realization t

The factor model then becomes:

R = B*F + E

and the matrix of residual returns can be found by computing:

E = R - B*F

As an example, assume that we have four assets, with exposures to three factors given by:

B =[ 0.1    0.3    0.6
     0.2    0.8     0
      0     0.7    0.3
      0      0     1.0 ]

If the assets' returns in the two years were:

R =[  16     4
       7     1
       8     6
      22     7  ]

then, with the factor values (F) given in the previous example, the residual returns were:

E = [ 1.5   -2.9
      0.6   -1.2
     -2.9    1.6
      2.0   -3.0 ]

Not surprisingly, the first security is the one used in the prior case.
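For readers who wish to verify the matrix calculation, here is a minimal sketch in plain Python. The helper matmul is our own illustrative function, written out so that the example needs no external library. In Matlab the same result would come directly from R - B*F.

```python
# Exposures (N*m), factor values (m*T) and returns (N*T) from the example.
B = [
    [0.1, 0.3, 0.6],
    [0.2, 0.8, 0.0],
    [0.0, 0.7, 0.3],
    [0.0, 0.0, 1.0],
]
F = [
    [4.0, 3.0],
    [7.0, 2.0],
    [20.0, 10.0],
]
R = [
    [16.0, 4.0],
    [7.0, 1.0],
    [8.0, 6.0],
    [22.0, 7.0],
]

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    inner, cols = len(Y), len(Y[0])
    return [
        [sum(row[k] * Y[k][j] for k in range(inner)) for j in range(cols)]
        for row in X
    ]

# Factor-related returns (B*F) and residual returns (E = R - B*F).
BF = matmul(B, F)
E = [
    [R[i][t] - BF[i][t] for t in range(len(R[0]))]
    for i in range(len(R))
]

for row in E:
    print([round(x, 1) for x in row])
```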

Note that three different dimensions are involved here (two periods, three factors, and four assets). The matrices, with row and column labels, are as follows:

Returns (R):

                 period 1    period 2
security 1           16.0         4.0
security 2            7.0         1.0
security 3            8.0         6.0
security 4           22.0         7.0

Asset Exposures (B):

                 factor 1    factor 2    factor 3
security 1            0.1         0.3         0.6
security 2            0.2         0.8         0.0
security 3            0.0         0.7         0.3
security 4            0.0         0.0         1.0

Factor realizations (F):

               period 1    period 2
factor 1            4.0         3.0
factor 2            7.0         2.0
factor 3           20.0        10.0

Factor-related Returns (B*F):

                 period 1    period 2
security 1           14.5         6.9
security 2            6.4         2.2
security 3           10.9         4.4
security 4           20.0        10.0

Residual returns (E = R - B*F):

                 period 1    period 2
security 1            1.5        -2.9
security 2            0.6        -1.2
security 3           -2.9         1.6
security 4            2.0        -3.0