GLM
From VISTA LAB WIKI
- Back to Functional
The General Linear Model (GLM) is a standard approach for analyzing fMRI experimental data. The method is well-suited to experiments that include more than one condition (+ baseline). The method is used commonly to test the relative contribution of different experimental conditions to the BOLD time course in both block or rapid-event related measurements.
The GLM idea is to explain the time series measured in the BOLD response as the weighted sum of a collection of vectors (basis functions). These vectors make up the columns of a design matrix. The art of GLM analysis is to create these vectors so that they represent meaningful components of the time series. The basic linear methods used here are the same as those used in nearly all branches of science that rely on linear decompositions or transformations of data sets.
The GLM is used for analyzing Event Related as well as block-design experiments. Both are described here.
Contents |
GLM: Background
The GLM design matrix specifies how to decompose the observed fMRI time series into the weighted sum of a few, fixed time series. These fixed time series are called basis functions, though in neuroimaging they are usually called predictors. The predictors are the predicted signals from different experimental conditions. To perform a GLM analysis we define the predictor functions and determine how much each predictor contributes to the measured time series. That is, we express the time series, y, as a weighted sum of the predictors,
y = sum (b_i * D_i) + e
In this formula, y is the time series, D_i are the predictor functions, and b_i are the weights (these are often called the 'betas'). The term e is the error, sometimes called the residual. To fit a glm, we find the set of b_i that minimizes the squared error (voxel by voxel).
It is useful to know how effectively the predictors account for the time series. If the entire linear model does a poor job at predicting the time series, it is less meaningful to know that one predictor term (measured by its beta weight) is reliably larger than another. We use the percent variance explained within a region of interest to evaluate the quality of the linear model.
We can also ask more detailed questions about the predictors and weights. For example, we might ask whether a predictor weight differs significantly from zero. Or, we can ask whether the beta for one predictor exceeds that from another. These comparisons are called contrasts. The statistical significance of these contrast tests is estimated using a t-statistic.
We explain the mathematics and various statistical tests in the "The design matrix" and "Estimating the linear model weights" sections below.
Comparisons with SPM2
We compared the implementation in our GLM model, specifically the stimated weights and the t-values using mrVista, with the values calculated in SPM2. They are identical in most ways. We note some differences.
- The weights returned by SPM2 are a scale factor away from the weights in mrVista. This is due to the way SPM2 scales the design matrix, we believe. We don't care.
- The t-values are identical within the brain. SPM2 masks the t-values outside of the brain in a variety of ways. Hence, there can be differences between SPM2 and mrVista when doing whole brain statistics; the point-wise comparisons are identical.
- SPM2 allows you to set a cluster size threshold. We don't.
- See the code in EventRelated\glmTest for the data and methods of doing this comparison (MBS and BW).
The design matrix
To fit a linear model to fMRI data, we first specify predictors, D_i. Each predictor is a time series that represents what you expect the consequence of a particular event to be. If the measurements at each pixel comprise 72 measurements, the predictor is a a column vector with length 72.
A typical experiment includes several predictors. These are grouped into the columns of the design matrix, D. A design matrix that predicts four types of events has four columns. Hence, D is 72 x 4 in this example.
Normally, there is a predictor for each experimental condition. It is further possible to specify predictors that describe unwanted measurement artifacts, such as a slow drift in the signal. These predictors are called 'nuisance' terms. In many cases in the literature design matrices include predictors for nuisance factors.
In mrVista, we remove nuisance factors prior to performing the GLM analysis. We also express our time series as a modulation around the mean (percent modulation). We would prefer to use real units, but the BOLD signal does not yet have physical units. By using percent modulation, we reduce the main effect of unwanted effects such as the signal drop-off with distance from the RF coil.
Estimating the linear model weights (beta-weights)
We estimate the size of the contribution (beta) from each predictor using conventional linear methods. At each voxel we find the least-squares solution, b, to the formula
y = D * b
where y is the observed time series and D is the design matrix. The term b is a vector of beta-weights.
There are a variety of ways to calculate the least-squares estimate of b given y and D. The specific formula differs depending on assumptions concerning the noise and covariance of the predictors. These different methods are described in the signal processing literature (see XXX, pp. YYY in Oppenheim and Shafer? where else? for a tutorial).
In the case where the noise is Gaussian and the predictors are independent, the least-squares solution is the one-line Matlab command
b = pinv(D)*y
where pinv(D) is the Moore-Penrose pseudoinverse ((D^t*D)^-1*(D^t))).
In Matlab it is faster to compute using the operator b = D\y. It is also efficient to calculate in a single matrix, Y, whose columns are the time series at the voxels of interest. In this case, B = D\Y, where the columns of B are the beta-weights at each voxel in the region of interest.
The weights are a least-squared errors estimate of the signal given the design matrix. Hence, D*b approximates y, but it is not precisely equal. We can divide express the time series at a voxel as comprising two parts,
y = D*b + E
where D*b is the linear model prediction and E = (y - D*b) is the error. The vectors, D*b and E, are always orthogonal (i.e., the dot product of D*b and E is zero). Thus, the variance in the data, y, is the sum of the variance in D*b (V_D) plus the variance in E (V_E). We can summarize how well the linear model predicts the data using the percent variance explained, namely
100 * (V_D / (V_D + V_E))
The unexplained variance may be due to instrument noise or an inadequate model (i.e., we don't know what the heck us going on). Some people are happy when the variance explained, V_D, is significantly different from 0. Others are happy when the percent variance accounted for reaches some level, such as 20-30%.
Because of a widespread preference for using obscure statistical tests, very few labs report the percent variance explained. Rather they say that the linear model meets some statistical significance p(V_D > 0) > 0, which means the model is better than nothing at all. We think it is better to know how well the linear model explains the data, and thus we think one should report the percent variance explained.
SPM2 comparison
We (MBS and BW) verified that the linear model weights estimated by SPM2 and mrVista code agree to within a scale factor. The analyses for verifying this agreement are described in the scripts within the distribution at: mrLoadRet/EventRelated/glmTest.
The data files needed to run those scripts are not part of the usual distribution but can be obtained from SPM Files
Statistical analysis of the weights
If the linear model explains a significant portion of the time series, then it is meaningful to ask which of the terms in the linear model explains this variance. For example, when there are predictors for two different stimuli or tasks, A and B, a voxel may be well predicted by predictor(A) but not predictor(B). This is captured in the linear model because the weight corresponding to the predictor(A) differs significantly from 0, but not the weight for the predictor(B).
In many experiments, the predictors are not orthogonal functions. An unfortunate consequence of the lack of orthogonality is that the uncertainty associated with the weight you estimate for one predictor is correlated with the noise in the other predictors. Suppose we estimate two weights, b_1 and b_2. We know that the estimates contain noise. Thus b_1 = w_1 + N1, and b_2 = w_2 + N2, where w_1 and w_2 are the true weights, and N1 and N2 are two noise samples. When the predictors are not orthogonal, the noise in the two estimates will be correlated (positively or negatively). Statistical tests comparing these estimated weights must account for the correlation in the noise.
GLM: mrVista implementation
The general procedure for performing a GLM analysis in mrVista is
- Specify the design matrix:
- Create .par files specifying the experimental protocol
- Assign .par files to each scan
- Group related scans together
- Set parameters, e.g., HRF model used to smooth the predictors.
- Estimate the beta weights (apply the glm model)
- Test hypotheses about the weights (compute contrast map)
- Select Regions of Interest based on the contrast maps
- Verify effects / test orthogonal hypotheses about ROI using by viewing time courses
Specifying the design matrix in the mrVista GUI
Create .par files for each scan
To define a design matrix in mrVista, you need to create text files, known as 'paradigm files' or .par files, which specify the experimental protocol for each scan. Such a file can have 2 to 4 columns: EventStart, ConditionCode, ConditionName, ConditionColor. These columns should be delimited by a tab (which can be produced by sprintf('\t')
in MATLAB, or exported as tab-delimited ASCII in e.g. a spreadsheet program).
- EventStart: the time (in seconds) when the event started.
- ConditionCode: an integer for each condition. Set the baseline as condition 0. Number the conditions in the order you would like to see them plotted in, say, bar plots
- ConditionName: a string describing the event name
- ConditionColor: an [R G B] triplet associating a color with a given condition; each intensity can range from 0 to 1. E.g., magenta is [1 0 1], cyan is [0 1 1], dark red is [.5 0 0].
Only the first two columns, EventStart and ConditionCode, are required. For ConditionName and ConditionColor, the code only looks for the first occurence of each condition in a par file to get the name/color. If omitted, it will assign default names and colors.
For example, in a typical LO localizer, with a 12 sec off-12 sec on stimulus, we may have the following RhymingWordsLines1.par protocol file.The information is created using a text editor and saved in a file named [expCodeRunNumber].par.
Save the file in the directory: sessionDir/stim/parfiles/.. (sessionDir/Stimuli/Parfiles/ will also work.)
Assign .par files to each scan
After you create these files for each scan in an experiment, assign the files to the scans from the mrVista GUI. EventRelated | Assign parfiles to scans....
- In the first dialog, choose the scan(s) and click ok.
- In the second dialog, choose the par file(s) and click ok.
- If more than one item is selected, the par files will be assigned by the order that they appear. If you select many scans and one parfile, it will be assigned to all of them.
Group Scans Together
Select the menu option EventRelated | Grouping | Group Scans Together. Select first the data type of the scans you wish to group, then the set of scans you wish to analyze together. This information will be stored in the dataTYPES variable. Later operations, such as applying a GLM to the data set, computing contrast maps, and plotting time course data in ROIs will be facilitated by this. (The new GLM implementation requires that you do this before applying a GLM).
You can go back later and re-group scans; for instance, if one scan out of a set of 6 has excessive motion, you can re-analyze the data in the scans exlcuding the bad one.
When you are done, it is a good idea to check that the assignment of scans and .par files is correct by selecting Event-Related | Show scan group / parfiles.
Set Event-Related analysis parameters
To create the predictor functions, mrVista combines the specification of the stimulus event times with a model of the hemodynamic response function (HRF). You can choose from several models or load a file with the HRF. Choose this (and other) parameters in EventRelated | Edit event related parameters.
The event-related analysis parameters are described in detail on a page describing the mrVista code conventions. These parameters affect not just GLM calculations but how time course data is visualized. The relevant fields for the HRF specification relate to the shape of the HRF and the duration of each event. The HRF type flags are as follows:
0: "deconvolve": estimate the peri-stimulus time course using separate predictors for each time point, as per Burock and Dale, HBM, 2000. (This is often used in rapid event-related designs, because the close spacing of events means a pre-computed HRF may not closely match the shape of response to different event types.)
1: Use the event-triggered average for the conditions set in snrConds.
2: Use the Boynton et al Boynton et al 96 gamma HRF. [2=default value]
3: Use the SPM difference-of-gammas HRF. (requires that you have a version spm installed).
4: Use the Dale + Buckner Dale et al 99 HRF.
5: provide a dialog to select a saved HRF file.
(string): path to a saved HRF file.
In addition to selecting the HRF type, you can also specify event durations for block-design scans, by setting the "TRs / block" field.
Computing the beta weights in the mrVista GUI
Event-Related > GLM / contrast, new code > apply GLM (scan group)
preliminary requisites: .par files in the appropriate path, group all scans on which you wish to apply the model.
Relevant mrVista functions:
- applyGlm
- applyGlmSlice
- glm_createDesMtx
Each time you apply a GLM, a new scan is created in the "GLMs" data type (which is created the first time you run a GLM). This "scan" doesn't actually contain time series, but rather contains all the fields of the model once it is solved. (The code could save the concatenated time series from the original scan group, but we omit this because you can access these time series already, by loading the scan group time series, which points back at the source scans.) Scans in the GLMs data type more properly reflect separate analyses. For instance, if you collected data on 5 scans, and the quality of the last scan is questionable, you can run two GLMs, one on all 5 scans, and one on the first 4 only, and the results would be kept separately for each scan.
Several paramter maps are produced when a GLM is applied. This includes maps for the proportion variance explained by the model for each voxel; the residual variance at each voxel, and the beta weights for each predictor at each voxel. This last one is stored under GLMs/RawMaps, while the former is stored under GLMs/. When you compute statistical contrast maps (below), those maps are also stored in the GLMs/ directory. If the same map is computed for different GLM analyses, there will be a separate entry in that map for the corresponding scan. (E.g., if you compute "All Conditions" > "Baseline" for two analyses, and the specified contrast name is the same, then you can use the scan slider to switch between analyses and look at the resulting map under the different conditions).
Each scan in the GLMs data type has as its assigned scan group the original data used during the solution of the GLM, and the event-related parameters assigned to the GLM are the same used during the application. Thus, if you are in the GLMs data type, have a map loaded, and have selected an ROI whose time course you wish to see, you can plot the time course UI for the scan group (Ctrl-G) and the original data will load.
Computing a contrast map
How to compute a contrast map (t-statistic) in mrVista:
Event-Related > GLM/contrast, new code > compute contrast map
in the GUI, choose active and control conditions, and specify their weights (if not all equal). you can rename the contrast at the bottom, and press GO to compute. You can be more specific about which statistic to compute etc, under "More..."
Relevant mrVista functions:
- computeContrastMap2
The contrast map code saves the maps as parameter maps, which can be loaded and saved under the File | Parameter Map menu. The maps are saved in the GLMs data type of each scan.
Visualizing Time Course Data
Along with the GLM tools, there are also tools to examine time course data within an ROI. The Time Course UI allows you to plot and analyze the mean time series across voxels in an ROI, while the Multi Voxel UI allows analysis and visualization of time series data for each voxel in an ROI. Each interface may be invoked from the EventRelated menu in a mrVista window.