Lab Manual: Data Analysis

From SpanLabWiki

Welcome to part 1 of the data analysis pathway. The lab manual will cover each step in the diagram below in order, beginning with the pre-processing column and moving on to the deconvolution column. This is an AFNI-centric data-processing pipeline; other popular programs include FSL (recommended) and SPM.


Data Pre-Processing[edit]

Back at the lab, hop on a computer with AFNI capabilities – dmthal and amygdala are probably your best bet, though you can also use a terminal on another computer to ssh to dmthal.

$ ssh -Y span@dmthal
[enter password – obtain this from the lab coordinator]
$ cd /
$ ls
[and navigate from here into the experiment directory]

Once you’ve got a terminal open, locate your subject’s data folder in the experiment directory. Note that the terminal does not by default open to the root directory. If you’re unable to find your data folder it may be up a few levels; type “cd /” to return to the root directory, then continue your search from there.

Reconstructing Anatomicals[edit]

You needn’t do this first if you record the I & S values at the scanner (The I & S values give you the inferior and superior location of the subject’s head). If you don’t have those values, though, you’ll want to reconstruct the anatomicals first thing and use 3dinfo on the anatomical files to obtain the I & S values.

To reconstruct the anatomical scans, go into the subject’s folder for their specific scan

$ cd SCANEXAM#_MMDDYEAR 

Go into the folder (number depends on series order in scan protocol) with the high-resolution anatomical files (in our 3dVolume scan [the last long scan at the scanner] there should be 116 .dcm files)

$ cd 005 

Use AFNI’s “to3d” command to create 3D datasets from the 2D I-files. You don’t usually need to specify coordinates with .dcm files, as to3d can read the information from the file. This will create two files in AFNI format with the specified prefix “anat”: anat+orig.BRIK and anat+orig.HEAD, where anat = axi24 for the low-res scan, and anat = sag116 for the high-res scan. (The “+orig” denotes that the images have not been warped into ac-pc aligned or Talairach space).

$ to3d -prefix anat1 I*  (where “anat1” = axi24 or sag116)

The anatomical folder with only 24 files is your low-res axi file; in that folder your anat1 will be axi24 ($ to3d -prefix axi24 I*). The anatomical folder with 116 files is your high-res sag file; in that folder you’ll run the to3d command with the sag116 prefix ($ to3d -prefix sag116 I*).

You should check that the anatomical images have been created successfully by opening the AFNI interface. Set sag116 as your underlay. Bring up and scroll through images in all three planes by hitting the Image button on the left-hand side of the AFNI GUI (graphical user interface).

$ afni

This is what the brain viewing part of the AFNI interface looks like. The first image is the "axial" view. The second image is the "sagittal" view. And the image on the right is the "coronal" view.

Afninav.jpg

After you’ve made sure that these images are correct and don’t look weird (e.g., have large stripes of missing image), you’ll want to copy or move them to the level of your functional data.

$ mv anat* ../..

So the entire process start to finish will look something like this:

dmthal 002] $ to3d -prefix axi24 I*
dmthal 002] $ mv axi* ../..
dmthal 002] $ cd ../006
dmthal 006] $ to3d -prefix sag116 I*
dmthal 006] $ mv sag116* ../..

Remember, use ‘sag116’ to name the high-resolution anatomical scans, ‘axi24’ to name the low-res in-plane scans.

Now, if you forgot to write down the I & S values and need to obtain them, apply the 3dinfo command to your constructed anatomical file.

dmthal jp120406] $ 3dinfo axi24+orig.*

This will display a lot of numbers; the only ones you care about are the I and S values. Round to the nearest tenth and record the I value, ignoring the minus sign, as well as the S value, both in the subject’s note file.

Editing the “process” script[edit]

The process script is responsible for whipping your data into usable shape. It does so by:

  • Reconstructing the data: the data is recorded in k-space and needs to be reconstructed into image space so that the result actually looks like an image of a brain. The files in k-space are the P#####.7 files and the reconstructed files are the P#####.7.mag files. We use the "grecons12" script (written by Gary Glover) for this transformation. (For more info on k-space see Huettel’s Functional Magnetic Resonance Imaging textbook.)
  • Combining the spiral in and spiral out images: we use a spiral in/out pulse sequence. This means that two separate images of the brain are taken at each and every scanner acquisition. These images are then averaged, using a special weighting scheme, in order to make one image per acquisition. We use the "spiralioadd" script (written by Gary Glover) to do this.
  • Converting to AFNI format: the "to3d" afni command is used to convert the data to afni format.
  • Cutting and pasting the data from multiple blocks: this includes snipping out data that is not of interest to you (for example, data acquired at the very beginning as the scanner is adjusting) and pasting together your assembled blocks into one long time series. (uses the 3dTcat afni command)
  • Correcting the Slice Timing: Corrects for the fact that slices are taken at slightly different times within each TR. (It does this by interpolating the response curve for each voxel.) This is critical for event-related designs – for block designs, which generally gather data on timescales of 20 seconds or more, being off by less than 2s won’t make such a significant difference. (uses the 3dTshift afni command)
  • Coregistering the volumes: Makes sure each acquired full volume (full brain) is lined up (‘registered’) with every other volume. This is the main method of motion correction. This is the process that creates output files that allow you to check to see how much the subject moved on a graph. (uses the 3dvolreg afni command)
  • Applying spatial smoothing: Applying a slight Gaussian blur to your data. Note that most volumes have something on the order of 100,000 voxels that will be evaluated for significance. Even with a threshold of p < 0.05, you would expect to see around 5,000 spurious ‘activations.’ By blurring your data a bit, you minimize these false alarms, as surrounding voxels in these regions will not show similar ‘activation.’ Regions of real interest will tend to have clusters of significant voxels, so true activations won’t be smoothed out assuming you’ve chosen an intelligent amount of blurring – too much and you could attenuate true activation below the statistical threshold. Keep in mind that applying a Gaussian blur that is too large (e.g., a FWHM of >10mm) can cause bilateral midbrain activation to pool together into the ventricles down the center, obscuring the fact that you had bilateral activation to begin with. (uses the 3dmerge afni command)
  • Normalizing your data: creates data with units of percent-signal-change rather than ‘mag’ units, which are difficult for us to interpret. When you normalize your data you also specify a baseline. In imaging your baseline activation is to some degree arbitrary, so choice of baseline is not trivial. In our lab we average the activation of each voxel over the entire task and use that number as a baseline for that voxel. This doesn’t matter for our regressions, but it does matter for our timecourse for the entire task. A voxel in a brain region that is frequently recruited over the course of a task will have a higher baseline than a baseline in the physiological sense on account of the fact that you are averaging a voxel that is frequently ‘on’ during that time period. Another way of saying this is that this type of average is higher over the period of time in which you’re doing the task than it would be if you averaged over 24 hours of just sitting there. (uses a combination of the 3dTstat and 3dcalc afni commands)
  • Applying a temporal filter: Filters your data in frequency space (to remove variations caused by heart rate, pressing buttons, breathing, et cetera) and removing linear trends (as might be caused by your subject’s head slowly settling into the pillow). There’s a really worthwhile discussion of this starting on page 274 of Huettel’s Functional Magnetic Resonance Imaging textbook. (uses the 3dFourier afni command)
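The normalization step above can be sketched numerically. This is a hedged illustration of the arithmetic only (mean baseline, then percent signal change), not the lab’s actual 3dTstat/3dcalc invocation; the voxel values are made up.

```python
import numpy as np

def percent_signal_change(timeseries):
    """Convert one voxel's timeseries to percent signal change, using
    that voxel's mean over the whole task as its baseline (the same
    arithmetic the 3dTstat-then-3dcalc step performs per voxel)."""
    ts = np.asarray(timeseries, dtype=float)
    baseline = ts.mean()                       # 3dTstat: mean over the task
    return 100.0 * (ts - baseline) / baseline  # 3dcalc: percent change

# A toy voxel: baseline around 100 'mag' units with a small task response
voxel = [100, 100, 102, 104, 102, 100]
psc = percent_signal_change(voxel)
```

By construction the normalized timecourse averages to zero over the task, which is why the choice of baseline matters for interpreting timecourses but not for the regressions.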

In your data folder you should find a folder with your subject’s initials and the date on which he/she was run (e.g., “jp120406”). Copy the process script into your subject’s directory from the scripts directory (or, if this is not your first subject, from another subject’s folder within the same study). Before running the script, however, you’ll need to edit it. To do so, open the script with emacs:

dmthal jp120406] $ emacs process

This section of the lab manual will walk you through the process script in detail.

Below is a screen capture of one such script, with comments and notes. If you’re modifying the process script from an entirely different experiment you’ll need to change all the circled values on each page of the script. Here we’ve only circled the first instance of each number, so be sure to do a find-replace to catch them all. [Find-replace on a mac is shift+apple+x, then type replace-string, press enter, type the string you’re replacing, press enter, then the replacement string, press enter.] If you are only modifying from one subject to another within the same experiment, you need only change the p-file numbers and I and S values (see above; in this case the p-file names are P43520.7 and P44032.7, while the I & S values are 15.2I-76.8S). Again, be sure to use find-replace to replace all the numbers at the same time; otherwise you are guaranteed to miss a few. (See the emacs section for information on navigating the program.)

Process1.jpg

Process2.jpg

Process3.jpg

Process4.jpg

If you’ve edited the process script as needed, you’re ready to run it. The script is run from within the directory of the subject you’re processing.

dmthal jp120406] $ ./process

Note on grecons: Grecons has gone through several versions. Older process scripts will use older versions of the command; older data must be processed with the version of grecons that was current at the time (grecons, grecons11, or grecons12). Keep this in mind if you’re going to try to reprocess old data.

The -G flag in the grecons command sets the gain. Gain is a scalar coefficient that each voxel intensity (in ‘mag’ units) is multiplied by. Most of our scripts set gain to 1, thereby leaving the data as-is. Because gain is a pure rescaling, changing it alters the overall intensity scale of your data without changing the underlying signal. If you get timecourse data that looks out of proportion, check the subject’s gain flag in the process script.

(NOTE: If you would like to create one process script that does not need to be edited for each individual subject, ask Greg how to do so. You can do this using a combination of existing lab scripts and scripts written by Gary Glover. This method was used for the FINRA studies.)

Checking the motion[edit]

One output of the process script is a 3dmotion file that plots the motion of your subject in all three Cartesian directions.

It’s necessary to check your subject’s motion to make sure there aren’t any big jumps that could really fudge up your data. Generally, gradual drifts and movements under 2mm are acceptable (be sure to check your units; small jumps can be stretched to look quite large). If you’re not sure whether the data is okay, check with Brian.

The 3dmotion files will normally be deposited in your subject’s main directory.

Using afni from the terminal[edit]

There are two possible commands to use to view the motion for the subject using afni from the command line.

(a) 1dplot

$ 1dplot '3dmotion12.1D[4]' 			lets you see the z dimension alone 
$ 1dplot '3dmotion12.1D[5]' 			lets you see the x dimension alone 
$ 1dplot '3dmotion12.1D[6]' 			lets you see the y dimension alone 

(The quotation marks keep the shell from interpreting the brackets as a wildcard pattern.)

(b) 1dplot -volreg

$ 1dplot -volreg '3dmotion12.1D[1-6]' 	

Lets you see all the dimensions at once. If you want to print this, save the ps file as “XXmotion.ps,” close the window, and type “lpr XXmotion.ps”

1dplot.jpg

Record the largest movement in each direction (z, x, y) in the note file.

Note that if you set the functional data (output by your script into the subject’s main directory in a format that looks something like mid12b in the case of the mid task) as underlay in AFNI, you can graph it as a time series using the graph buttons. We’ll go into more detail on this with respect to the AFNI interface shortly. [INSERT UNDERLAY EXAMPLE SCREEN CAPTURE]

Using matlab[edit]

There is a matlab function called "inspectMotion" that you can use to view all 6 motion directions at once and get an exact measure of the largest jumps in your data. (Note: inspectMotion is not a built-in function. See the lab manual scripts page if matlab doesn't recognize this command.)

To use it, open matlab and run:

>>inspectMotion('3dmotion12.1D')

This will result in a plot like the one below. The highlighted values are used to point out questionable jumps. The orange highlights mean the jump is < 1mm but > .5mm (which is ok) and the red highlights mean the jump is > 1mm (which may or may not be ok depending on the study). The title of each plot tells you the direction of that motion vector as well as the corresponding afni index that you would use with 1dplot.

InspectMotionplot.jpg

Note: The default is for inspectMotion to look across 3 TRs at a time when it looks for "jumps" in your data. If you would like to include more or fewer TRs (for example 5 TRs) you can use a second argument which would look like this:

>>inspectMotion('3dmotion12.1D', 5)

You can also use inspectMotion to test for correlations between your 6 motion vectors and a regressor from your model. For more details, see the help file for inspectMotion.
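If matlab isn’t handy, the windowed "largest jump" check that inspectMotion is described as doing can be sketched in a few lines of Python. This illustrates the logic only (the window semantics here are an assumption based on the note above); it is not the lab’s inspectMotion code.

```python
import numpy as np

def largest_jump(motion_vector, window=3):
    """Largest absolute change in a motion vector across any span
    within `window` consecutive TRs (window semantics assumed)."""
    v = np.asarray(motion_vector, dtype=float)
    return max(abs(v[i + w] - v[i])
               for w in range(1, window)
               for i in range(len(v) - w))

# Toy z-displacement trace (mm): slow drift plus one abrupt jump
z = [0.0, 0.05, 0.1, 0.12, 1.32, 1.30, 1.28]
jump = largest_jump(z)   # > 1 mm here, so this run would need a closer look
```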

Creating a Note File[edit]

Having looked at your subject’s motion using 1dplot and AFNI, create a “note” file in their directory. When you “emacs filename” emacs will create that file if it does not already exist:

span@dmthal jp120406]$ emacs note

A typical note file will look something like this:

Notefile.jpg

Make sure to include subject initials, date run, experiment name, ethnicity, gender, winnings (if applicable), the I & S values, information on movement obtained through the 3dmotion plot (here “ns” for “not significant”), date/initials of Nudge (including S,L,P values), date/initials of warp, and any notes (including information on subject behavior, strategies, technical problems with the scan or eprime, issues or questions regarding data processing, et cetera).

3dinfo.[edit]

The 3dinfo command can be applied to .BRIK and .HEAD files. It is useful because it will tell you when the file was made and, with the axial anatomical .BRIK and .HEAD files, what your subject’s I and S values are. On group maps the command will allow you to see which subjects went into that particular analysis as well as which particular model was used.

3dinfo will tell you how much you’ve nudged your anatomical orig file if you fail to record that information when you nudge.

You should use 3dinfo on the z-file output of your reg script to find out what the beta (regressor) coefficients are for each contrast of interest. These are necessary for the 3dttest scripts.

$ 3dinfo sag116+orig.HEAD | more

Here, using a pipe | and the more command allows us to view the 3dinfo output page-by-page.

Sanity-checking your data.[edit]

Setting Overlay as Underlay.[edit]

Setting overlay as underlay at this stage in data processing is a tool for checking your data for weirdness, like movement and artifacts. Later on it will also be useful for looking at group-averaged timecourses (see the reordertcdump section later in the manual, as well as the reordertcdumpC script that’s commented in dmthal:/data3/bipomid/scripts/).

For now, however, check your data by setting the functional, blurred data as the underlay (this is the blurred data output by the process script – for example, “mid12b”). Then hit a Graph button next to the Image buttons to display the time series of the crosshair-selected voxel. You can resize the graph with Ctrl+minus and Ctrl+Shift+plus; to move through time, use the keyboard arrows.

Note that the “Graph” buttons are located to the right of the Axial, Sagittal, and Coronal view options.

As you move through your data in time using the keyboard arrow keys (make sure the mouse is over the graphed data; AFNI won’t let you do this otherwise), look for weird jumps or other graphical anomalies. The jumps are probably due to movement, but could be due to other issues, like data processing problems or even missing slices. Be on the lookout for large spikes in the graphed data – these definitely need to be investigated further if found.


My data looks funny.[edit]

Data can look scary because of problems with artifacts (material or scanner-related), processing problems, or modeling problems.

One of the most important things to do when checking processed data is to set the blurred data as the underlay in AFNI (which will display grayscale views of the functional data), and then hit the ‘graph’ button next to one of the image buttons. This will plot the MR signal throughout the whole scan. With the cursor over the graph window, arrows will scroll through time. Look for jumps in the graph and also strange patterns in the brain views.

Structural artifacts[edit]

The structural scan can reveal tumors, etc. Usually large problems are visible at the scanner, but some may turn up in pre-processing. These must be reported if significant; ask the lab manager or Anne Sawyer about reporting. The scan below revealed what was determined to be a non-clinical potential meningioma. It looks very large and possibly dangerous to the naked eye, but a neurologist determined it was not harmful after examination of the images.

Artifact1.jpg

In the structural view, if the subject moved a lot during the sag116 scan, a ‘ringing’ artifact may appear. This is not a problem beyond making the landmarks hard to find.

Functional Artifacts[edit]

I. Minor Problems[edit]

  • Dropout: this is evident when you don’t have signal in a given region (e.g. NAcc). It happens in lower parts of the brain, but it shouldn’t happen here at Stanford on the 1.5T with spiral-in-out acquisition. This can be quantified by running a signal-to-noise analysis.
  • Hair pins, necklaces, etc: you will see these as big holes in the initial localizer scans on the scanner computer. It is a good practice to flip through the localizer and inplane scans to see if there are any holes. If so, you didn’t screen your subject for metal correctly! Take them out and remove the metal (unless it is implanted, of course, in which case they shouldn’t be in the scanner in the first place!).
  • Dreadlocks: we screen for these because years back a subject with dreadlocks had a huge dropout surrounding the skull. This resulted from accumulated material in the hair.

II. Spiral Artifact[edit]

Artifact2.jpg

The above spiral artifact results from unspecific problems with the spiral acquisition sequence (using our normal 24-slice, 2sec TR scan). This example was late at night, so perhaps the scanner was heating up; however, the following run looked fine. It is important to check data for this type of artifact, but it may be hard to characterize at first – primarily, every z-score map (whether it is of the motion regressors or task regressors) will have ‘striping’ outside the head (above left). When looking at ‘activation’ inside the brain, these stripes will often curve inwards along gyri, as visible in coronal views. A spiral artifact is obviously problematic for detecting activations. Answer: this happens randomly, so just beware (possibly dub runs, replacing the bad one with a good one).

This artifact can be easily confused with the slice artifact shown below. You may want to try the troubleshooting techniques listed there.

After you've double-checked your processing script for errors, this is the right type of artifact to try asking Gary or Anne about. It may have resulted from a scanner malfunction or a problem with shimming. Be sure to tell them where (i.e. the 1.5 or 3T magnet) and when the scan took place.

III. Slice artifact[edit]

Artifact3.jpg

Possible causes:

  • Novel scan parameters – 12 slices, 1sec TR.
  • Bad reconstruction during preprocessing which may be due to specifying the wrong number of TRs in the process script.

Diagnosing/fixing the problem:

  • go back and rerun the process script. Be sure to either watch the output of grecons12 and spiralioadd carefully or record the output in a file. Check for any unusual output/errors. These scripts are usually perfectly happy to give you garbage if there is an error during processing (garbage in, garbage out).
  • If there is an error, double-check and then triple-check to make sure that the number of TRs and the number of slices you pass to spiralioadd are correct.
  • Another thing to check is the size of the .mag file. Use "ls -l" to find out the number of bytes in your mag file.
correct number of bytes in a .mag file = 8192 (size of one slice) x number-of-slices x 2 (for spiral in and out) x number-of-TRs
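You can sanity-check the byte count with a quick calculation before comparing against "ls -l". The 8192-byte slice size and the factor of 2 come from the formula above; the slice and TR counts below are illustrative example numbers, not values from any particular scan.

```python
# Expected .mag size: 8192 bytes/slice x slices x 2 (spiral in+out) x TRs
def expected_mag_bytes(n_slices, n_trs):
    return 8192 * n_slices * 2 * n_trs

# e.g. a 24-slice scan with 240 TRs (example numbers):
size = expected_mag_bytes(24, 240)
```

If the number reported by "ls -l" doesn’t match, something went wrong during reconstruction.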

IV. Ring artifact[edit]

Artifact4.jpg

Another odd artifact on a rapid 1.5T scan. What looks like nice NAcc activation in the coronal view (not shown) is revealed to be artifactual in the axial view – real activation never forms nice rings like this. This is probably due to a combination of motion and the spiral out pulse sequence. Diagnosing the problem:

  • Try looking through the motion regressors (particularly pitch) to see if this ring is showing up in one of them.
  • Use the matlab function inspectMotion to test to see if any motion vectors are correlated with your task regressors.

Possible fixes:

  • Try reprocessing your data using only the spiral in pulse sequence (you can add an argument to grecons12 to do this). This will probably get rid of the ring, but it may decrease your general signal-to-noise ratio.

For future scans:

  • Problem may be improved by changing tilt of the head or other scan parameters.

V. Dropped slices[edit]

Artifact5.jpg

This artifact results when the scanner for some reason doesn’t acquire data from a whole slice for a given TR. The above example is from a 12-slice, 1sec TR scan, and the dropped slice artifact moves over time from the base of the brain to the top. Answer: The temporary solution is to drop the offending early scans; the long-term solution is to change parameters or ask Gary to tell you how to avoid this.

You should also try the fixes listed in the slice artifact section.

VI. Dropout due to field strength[edit]

Artifact6.jpg

Data may be better in some regions on one scanner or the other. The above example is from a 14-slice, 1sec TR scan on the 1.5T scanner (left) and 3T scanner (right). Note the ring of dropout/artifact in the ventral striatum for the 3T. Answer: change scanners.

VII. Dropout due to scan plane – sagittal is bad[edit]

Artifact7.jpg

Colors represent signal-to-noise in a 14-slice sagittal scan (as a percent of max achieved in a 24-slice axial) – basically <20% of max throughout the whole brain. Areas with no color have the worst drop-out. Answer: do not use a sagittal acquisition.

VIII. Processing issues[edit]

If experimenting with a new pulse sequence, acquisition plane (sagittal is a horrible acquisition plane!), number of slices, or TR length, the resulting data from to3d (or the output of Gary’s makebrick script, which runs to3d) may be upside down, flipped, etc. This can be fixed by running a few commands (e.g. 3drotate -rotate -90L 0A 0I -zpad 20 …), or by making sure you have the right parameters in to3d.

IX. Modeling[edit]

The whole point of learning to model, and science in general, is to produce unfunny data. If your maps look funny or aren’t giving the expected result, try try again.

X. Float vs Short[edit]

Precision.jpg

Convolution and General Linear Modeling[edit]

Making regressors[edit]

There are 2 main ways to make regressors starting from e-prime .txt output files. One method is simpler but more rigid; the other is somewhat more complex but much more flexible. If you are using a simple, well-designed factorial task such as the MID task, you may want to use the method outlined in Data analysis appendix D. If you want to include more complex features of the task or of the subject's behavior, you will probably want to use the method explained below.

Here is the general idea:

  • Start with an eprime .txt file
  • Make a large matrix that records all of the relevant task events and behavior by TR
  • Add any extra data to the master matrix such as preference ratings taken after the scan. (This may have to be done by hand/with excel)
  • Filter out the events that you want to make regressors out of using the makeVec.py script and a customized vector description file.


If you'd like to play with an example dataset to test out the scripts, go to the /commonscripts/txt2matrix/ directory on dmthal. Run the cleanAll script to get rid of leftover data and start fresh.

The txt2matrix Script[edit]

To set up the fields you want, copy "txt2matrix.py" to your study directory and make the appropriate changes to the "USER EDIT" section.

To run the example txt2matrix script use:

./txt2matrix.py EndowRun

You will end up with a file called EndowRunMATRIX.csv. This is a csv file with a column for every field you listed in "FieldsOfIntrest" (plus a column for the trial number and TR number) and a row for every TR in the scan.

In this example both of the blocks were appended together. If you want to run it for one block use:

./txt2matrix.py EndowRun1

Using makeVec and vector description files[edit]

After you have the master vector you can make regressors using "makeVec.py" and a vector description file. You do not need to make any changes to the "makeVec.py" script or even copy it, but you will need to make your own vector description files. There are a few files in /commonscripts/txt2matrix/ on dmthal that you can use as examples:

  • perVecs.txt - this describes 3 period regressors
  • behVecs.txt - this description file results in regressors that are individualized by subject responses.
  • comboVecs.txt - this gives examples of combining info from different fields and using '>' and '<' signs.


Here is an example of running makeVec to get vector files:

$ makeVec.py perVecs.txt

(Note: if you are on dmthal you do not need to use "./makeVec.py" because the script is already on the path)

(after running this example run "./cleanAll" to remove the vectors you made)


Notes on Vector description syntax:

Example of making a Buy vs. Sell regressor in the product period (i.e. 1st and 2nd TRs):

BEGIN_VEC
INPUT: "EndowRunMATRIX.csv"
OUTPUT: "SellvsBuy_prodPer.1D"

MARK condition = SELL_? AND
     TR = 1,2 WITH 1
MARK condition = BUY_? AND
     TR = 1,2 WITH -1

END_VEC

Example of making a reaction time regressor in the choice period (i.e. 5th and 6th TRs):

BEGIN_VEC
INPUT: "EndowRunMATRIX.csv"
OUTPUT: "RT_choicePer.1D"

MARK TR = 5, 6 WITH $Decision.RT

END_VEC

General form of a vector description:

BEGIN_VEC
INPUT: "{inputfile}"
OUTPUT: "{outputfile}"

MARK {expression} AND
     {expression} AND 
      ... WITH {code}

MARK {expression} WITH {code}

END_VEC

BEGIN_VEC and END_VEC mark the beginning and end of a *single* vector description. Anything outside of these tags will not be read by makeVec.py. You may include as many vector descriptions in a file as you like.

INPUT: - after this keyword, enter the input matrix file in quotation marks.
OUTPUT: - after this keyword, enter the output vector name in quotation marks.
MARK - this keyword starts the description. For every MARK keyword there must be one WITH keyword.
AND - specifies the start of a second expression. You can have as many of these as you like between the MARK and WITH tags.
WITH - indicates what code to mark the conditions with.

{expression}: must be of the form:
[column title] = [column value]
or
[column title] > [column value]
or
[column title] < [column value]

{code}: must be of the form:
[static value]
or
$[column title]
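To make the MARK/WITH semantics concrete, here is a small Python sketch of what a single rule does to the master matrix. This is an illustration only, not the actual makeVec.py (for one thing, it omits the ? wildcard seen in SELL_?); the column names come from the examples above, and the matrix rows are made up.

```python
import csv, io

def apply_mark(rows, conditions, code):
    """Sketch of one MARK ... WITH rule: for each TR row of the master
    matrix, emit `code` if every (column, op, value) expression holds,
    else 0. A code of '$column' pulls the value from that row, as in
    the $Decision.RT example."""
    out = []
    for row in rows:
        ok = all((op == '=' and row[col] == val) or
                 (op == '>' and float(row[col]) > float(val)) or
                 (op == '<' and float(row[col]) < float(val))
                 for col, op, val in conditions)
        if not ok:
            out.append(0.0)
        elif str(code).startswith('$'):
            out.append(float(row[str(code)[1:]]))  # $column: copy row value
        else:
            out.append(float(code))                # static value
    return out

# A tiny made-up master matrix, one row per TR
matrix = """TR,condition,Decision.RT
1,SELL_A,512
2,BUY_A,430
1,BUY_B,388
"""
rows = list(csv.DictReader(io.StringIO(matrix)))

# MARK TR = 1 WITH 1
vec = apply_mark(rows, [('TR', '=', '1')], 1)
# MARK condition = BUY_A WITH $Decision.RT
rt_vec = apply_mark(rows, [('condition', '=', 'BUY_A')], '$Decision.RT')
```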

Reg (Regression) Scripts[edit]

SUMMARY: There are many variants of the reg script – usually one final version for each experiment, and many others along the way – but they all do essentially the same thing. The script compares your actual data to an ideal data set (where the ideal data set is a fake set that mimics what you would expect your data to look like if your hypothesis is correct for any given brain area), providing you with a statistical output that tells you how closely the two sets (real and imagined) match. It then gives you coefficients and z-scores for each voxel for every given contrast.
INPUT: .vec files (a trial type column for each subject), motion files (e.g. 3dmotion12.1D), m2v lookup tables, and functional MRI data.
OUTPUT: A file in a format similar to zriskbasicreg+orig.*; c.1D files (but the latter are generally not used again outside this script)
RUN FROM: the scripts folder in the experiment home directory
RUN ON: Run on everybody in each group of interest. If you add one new subject, you can comment out the other subjects; you needn’t run the script on everybody all over again. If you add a new regressor, however, you will need to run the script on everyone again.
NOTE: If you run the reg script before nudging and warping, you don’t need to go back and rerun it: the reg script’s output is in +orig.* space, so AFNI will warp the overlay on demand, using the tlrc file from your warp as a roadmap. That is to say, AFNI is smart enough to take your functional overlay file in its +orig format and use its tlrc file from the warp to warp the functional overlay from orig space to tlrc space. This feature is known as “warp overlay on demand.”

The reg script and its variants (usually with a “reg” in the title, e.g. “antoutnormreg”) can be found in the “scripts” folder in most experiment home directories.

The script applies the lookup tables to each subject’s vector file (as delineated/defined by the master vector into trial type and TR). If you look at the script you’ll notice that it calls each contrast/regressor (m2v) file. This means that after you’ve created the m2v files you don’t need to run them individually on each subject, but can do it all at once with the reg script.

Reg also convolves an idealized HRF (hemodynamic response function) with the step-function .1D regressors (remember, these are the output of applying m2vs to vec files), producing a convolved xxc.1D file.


Convolution is a mathematical operation which takes two functions f and g and produces a third function that in a sense represents the amount of overlap between f and a reversed and translated version of g. In English, this means that we’re taking the step function (b) and combining it with what a typical HRF looks like. The math behind this step involves calculus and, let’s be honest, even Newton probably didn’t really understand all of that. But the gist is that we’re essentially smoothing out the step function and making it look more like an HRF.
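The smoothing can be seen in a few lines of numpy. This is a hedged sketch of what waver-style convolution does to a step-function regressor, using an illustrative gamma-shaped HRF (the shape parameters here are made up for demonstration, not waver’s actual -GAM defaults).

```python
import numpy as np

TR = 2.0                          # seconds, our usual TR
t = np.arange(0, 16, TR)          # HRF sampled once per TR

# Illustrative gamma-variate HRF, peaking a few seconds after the event
hrf = (t ** 2) * np.exp(-t / 1.6)
hrf /= hrf.max()

# Step-function regressor: events at TRs 2 and 8
step = np.zeros(15)
step[[2, 8]] = 1.0

# Convolve and trim back to the scan length
convolved = np.convolve(step, hrf)[:len(step)]
```

The result rises and falls smoothly after each event instead of switching on and off, and its peak lands a couple of TRs after each event onset: the hemodynamic lag.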

Waver.[edit]

The waver command follows each call to an m2v. AFNI’s waver command convolves your HRF with each lookup table to create that combined function that looks more like what we’d expect to see in a real HRF in the brain. Waver is found in any model script where you use 3dDeconvolve. In addition to convolving your HRF with each lookup table, waver pushes forward your regressors so that they match up with the HRF lagged data. This convolution (the “c” in xxc.1D) lags the expected response by the hemodynamic lag of 4-6 seconds. That means that your output will be aligned to real-time stimulus onset.

We do a 6 second lag because we use the gamma HRF function (this is the -GAM flag that's inside waver). This HRF peaks earlier than some other default HRFs used in fMRI research; choice of an HRF function is a matter of personal preference. Your convolved file is output in the format xxc.1D for use later in the script.

Note: Dealing with HRF Lag. When you view timecourse data, most will begin at the beginning of the trial. The timecourse should therefore start at a baseline, then deviate; remember that due to the HRF lag you'll want to look 2 or 3 TRs (4-6 sec) ahead of each event onset of interest to see what's happening in response to the event. While most researchers publish lagged timecourse data, some may chop things up, pulling back the BOLD response to correspond graphically to the stimulus onset. Read carefully.

3dDeconvolve: Regression[edit]

When we want to know how much of the variance in one variable is accounted for by another variable, we perform a regression. If, for example, I wanted to know how much I could know about someone’s height based on their weight, I would regress height on weight. Theoretically, there would be a strong relationship between weight and height and we could say that weight is a good (or strong) predictor of someone’s height. In mathematical terms, the regression equation is just like the equation for a line:

Reg1.jpg

Where y is the variable that we’re trying to predict (in this case, ‘height’), alpha is the intercept, beta is the mathematical weight of the x (predictor variable, which in this case is ‘weight’), plus some error term that represents unique variability for each case. So, if I wanted to try to predict Brian’s height, I would take some intercept (say, a baseline height of 5’7”), add to that his weight (170 lbs) multiplied by the beta of weight (say, about 0.018 inches per pound), and then add to that some unique error term for Brian, and in the end I’d come up with his height: roughly 5’10”.
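The height-on-weight example can be run as a one-predictor regression in numpy. The data points below are invented for illustration; only the shape of the calculation matters.

```python
# Toy regression of height on weight, mirroring the example in the text.
# The five data points are made up for illustration.
import numpy as np

weight = np.array([130, 150, 170, 190, 210], dtype=float)   # lbs
height = np.array([64, 67, 70, 71, 74], dtype=float)        # inches

# Fit y = alpha + beta * x by least squares
beta, alpha = np.polyfit(weight, height, 1)

predicted = alpha + beta * 170          # predicted height for a 170 lb person
print(round(beta, 3), round(predicted, 1))
```

With these numbers the fitted beta is 0.12 inches per pound and the prediction for a 170 lb person is 69.2 inches, a little under 5’10”.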

We’re going to run a regression on our data using the command 3dDeconvolve. Our outcome variable (the variable we’re trying to predict) will be the activation in each voxel in the brain (the model, with all its regressors, is run on each voxel). Our predictor variables will be our regressors, or contrasts of interest. (See the txt2master and lookup table sections for examples of contrast regressors.)

3dDeconvolve[edit]

Now that you have the ideal response to each stimulus coded in xxc.1D files, the reg script calls the AFNI program 3dDeconvolve. 3dDeconvolve does regressions; that is, it compares your actual data to what you would expect in a perfect world. [The perfect world is here represented by the idealized HRF convolved with the step function that was created by waver as a xxc.1D file.] In more concrete terms: 3dDeconvolve compares the fake hypothesis timecourse you’ve created with the waver command to your actual timecourse in each voxel of the brain. So for each voxel you’re comparing its real time series (or timecourse – a graph that plots its up and down movements over the course of the entire task) with the hypothesized time series to see how closely they match.

There are a lot of voxels in the brain, and it has to do this process individually for each, so the script tends to take a while on this step. Here, instead of dealing with height and weight, our predictor variables are the idealized sets of fake data that have been created with waver and the dependent variable is the actual data (scalar values) that we have in each voxel at each TR.

The better each regressor predicts your actual data for a given voxel, the stronger the beta-weighting coefficient will be for a given regressor. The script will give you a coefficient and t-statistic across the whole brain (that is, for each voxel) for each regressor (also known as a contrast) of interest.
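The per-voxel regression can be sketched conceptually with ordinary least squares. This is not AFNI’s actual estimator (3dDeconvolve also folds in polynomial drift terms and other refinements); the regressors and voxel timecourse below are synthetic.

```python
# Conceptual sketch of what 3dDeconvolve does in each voxel: least-squares
# regression of the voxel's timecourse on the convolved regressors.
# All data here are synthetic.
import numpy as np

rng = np.random.default_rng(0)
n_trs = 100

# Design matrix: an intercept column plus two (fake) convolved regressors
reg1 = rng.standard_normal(n_trs)
reg2 = rng.standard_normal(n_trs)
X = np.column_stack([np.ones(n_trs), reg1, reg2])

# Simulated voxel timecourse: responds to reg1 (true beta = 2) but not reg2
y = 10 + 2.0 * reg1 + rng.standard_normal(n_trs) * 0.5

beta, residuals, rank, _ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta, 2))   # roughly [10, 2, 0]
```

The recovered beta for reg1 is strong and the beta for reg2 hovers near zero, which is the per-voxel pattern the script’s coefficient maps summarize across the whole brain.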

This is what your regression looks like mathematically for multiple regressors. Note that it’s very similar to your equation for only one regressor.

Reg2.jpg

Because Y and each X are actually column vectors, the equation can be rewritten as:

Reg3.jpg

Where we’ve set epsilon, a term that contains our Y-intercept value plus an error value or some other kind of experimental fudge factor, equal to zero.

Here, each value of Y corresponds to the activation (in normalized units from your scanner output) of a single voxel during a single TR. So Y-sub-1 is the activation of that one single voxel during the first TR, Y-sub-2 is the activation of that same single voxel during the second TR, and so on. The “k” variable denotes your total number of TRs, so your last Y-sub-k is simply the value of that one single voxel during the last TR.

Each X column vector corresponds to an individual regressor. The first subscript entry denotes which regressor it is (here we’ve just labeled them 1 through N, but each of those numbers in reality would correspond with a different regressor). The second subscript entry, k, once again denotes the TR, with the first TR at the top and the last TR at the bottom.

Here, to simplify matters, all our values are 1, -1, or 0 (this is the step function given to you by your lookup tables; in your reg script you go a step further and convolve that step function with an HRF, so your real-life regressor column vectors will usually contain non-integer values. These column vectors are the xxc.1D files created by the reg script). For example, if the system of equations above represented the MID task, our first column vector might be our reward anticipation contrast. The numeral 1 is our first entry because our very first TR in the task is a reward cue, that is, a reward anticipation period.

Just like the Y-column vector, each new row in all of the X-regressor vectors is a sequential TR (the first entry is TR 1, the second entry is TR 2, et cetera). An X column vector’s value at each TR is the value of your ‘hypothesis’ curve for that contrast at that TR – usually a non-integer value because you’ve convolved your step function with an HRF, but remember here we’ve left out that step for simplicity’s sake and we’re just looking at our regressor before it’s been convolved with the HRF.

Associated with each regressor column vector is a coefficient known as the beta coefficient. We can solve this system of equations for the beta coefficients, which in effect tell us how much each regressor column vector is contributing to the values of your Y column vector. If the beta coefficient is small, that regressor does not contribute much. If it is large, it indicates that that regressor is responsible for a significant amount of the variance you’re seeing in Y, your voxel activation at each TR timepoint k.

Note that there are certain circumstances in which you will have multiple solutions for the beta values. In fact, there are circumstances in which you could have an infinite number of beta value solutions. This circumstance is called collinearity. In the strictest mathematical sense it means that at least one of your column vectors (regressors) is a linear combination of the others. That is, if you add two other columns together you get a third column; a collinear regressor could be a linear (additive) combination of any number of other regressors, however, it needn’t be just two.

Here's an example of collinear regressors. Our example consists of three regressors (regressor 1, regressor 2, and regressor 3) for an experiment only 3 TRs long (the top entry is TR 1, the middle entry is TR 2, and the last entry is TR 3).

Reg4.jpg

We can re-write this in matrix format and, just for kicks, give our Y entries real values for different time points, just like we'd find if this were real data in a single voxel in the brain:

Reg5.jpg

Here it's clear that column three, our third regressor, is a linear combination of column regressors one and two because by adding one and two together we get column three.

It's also clear that this is not a good state of affairs for our attempts to solve this system for the values of our beta coefficients. We'll find when we try to solve that there are in fact an infinite number of values the beta coefficients could take on and still satisfy the system of equations.

Reg6.jpg

Note above how you can assign beta3 any value you want and still solve the system of equations for beta1 and beta2. The beta values are not in any way constrained; there is no unique answer for this system of equations. Were we running these regressors in our analysis, the computer would choose just one solution where many are possible.

With real data in this sort of collinear situation, we wouldn't know for sure how much each regressor is contributing to the variance in the dependent variable (brain activation).
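The infinite-solutions problem above can be verified numerically: a collinear design matrix is rank-deficient, and different beta vectors produce identical fitted data. The small matrix below mirrors the three-regressor, three-TR example, though the specific numbers are chosen for illustration.

```python
# Demonstration that a collinear design matrix is rank-deficient, so the
# betas are not uniquely determined. Column 3 equals column 1 + column 2.
import numpy as np

X = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])

print(np.linalg.matrix_rank(X))   # 2, not 3: one column is redundant

# Two different beta vectors that produce the identical fitted timecourse:
b1 = np.array([1.0, 1.0, 0.0])
b2 = np.array([0.0, 0.0, 1.0])
print(np.allclose(X @ b1, X @ b2))   # True
```

Since two distinct beta vectors fit the data equally well, no solver can tell you which one reflects the brain’s true contribution from each regressor.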

Now, let's review what we know about the regression equation, then take a look at the above situation in regressors that are generalized to our much longer experiments (as you'll recall, the example regressor above would only be a 3 TR experiment!). We'll also look at reasons why we might get something resembling collinearity in our data when our regressors are not actually collinear in the strictest algebraic sense. In statistics the term 'collinearity' is usually used in a much looser manner than its well-defined pure math counterpart. Collinearity of any flavor can mean bad things for your data.

First, a quick review. The regression equation can be written as follows

Reg7.jpg

Also recall that regression equations will often have a scalar correction term (epsilon) tacked onto the end; for the purposes of our example we'll set ours equal to zero.

Reg8.jpg

Onward.[edit]

For the sake of the example let's say that this is a 4-regressor experiment with k total TRs and 4 TRs per trial with only one trial type. Now, if we make a column vector out of each of our regressors (as we do with our .1D contrast files), we can write the above equation as:

Reg8.jpg

Note that each entry in the column vectors represents a different time point, starting with TR 1 at the top, then TR 2, TR 3, and so on. The Y values represent the activation of a single voxel at each of those TR time points (Y1 is the activation of that one voxel at TR 1, Y2 is the activation of that same voxel at TR 2, et cetera). The beta coefficient is a scalar quantity -- that is to say, it is a single number, not a vector.

The equation above can be re-written in matrix format (factor your betas out into a column vector and combine your regressor vectors into a matrix):

Reg9.jpg

Here we see that the fourth regressor is collinear with the 1st and 3rd regressors, since with only 4 TRs per trial and one type of trial, each vector will repeat its first four entries for the entirety of the experiment.

However, in practice your regressors need not be strictly collinear to manifest what appears to be collinear behavior in your analysis. There are at least three reasons for this.

(1) Computers do not work with infinite precision. For example, in standard floating-point arithmetic 0.1 + 0.2 does not equal 0.3 exactly (the computer arrives at 0.30000000000000004). Rounding error of this kind means the computer could make regressors that are not strictly collinear appear approximately collinear, with repercussions in your maps.
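You can see finite precision directly in any Python prompt:

```python
# Finite floating-point precision in action: sums that are equal on paper
# can differ in the machine, which is how "approximate" collinearity creeps in.
a = 0.1 + 0.2
print(a)            # 0.30000000000000004
print(a == 0.3)     # False
```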

(2) Study design can force collinearity on your data. For example, if you were studying the effects of a number of independent variables on perceived pleasantness of music, but used sound files from real concerts whose volumes tended to scale with music type (e.g., quiet ballads, medium country, and ear-splitting death metal), you would have introduced collinearity in your design between your volume and music type regressors.

(3) Correlation serious enough to matter between predictor (independent) variables or contrasts of interest can cause collinear results. Some psychologists and neuroscientists will call any such correlation collinearity, others will not. It may not be obvious that it exists when you design your study and run your regressors, but it will hopefully be obvious in your output, which may have large standard errors, unexpected signs, and which may not show effects that have been robustly proven in past experiments.

In our own data it would be easy to create collinear regressors. For example, imagine that we have a regressor 'rout' (we want to see areas of the brain with more activation for a rewarding outcome than for anything else). However, we also want to look at hvlout (areas of the brain coding for high versus low reward outcomes). These two regressors are highly correlated since receiving *any* reward is correlated with receiving a high reward.

The take home message is that the word “collinearity” as used in fMRI and general statistics for psychology is NOT the same as the stricter definition used in mathematics. It IS, however, something you need to be very careful of when designing an experiment and when making your trial types and lookup tables.

NOTE: Truly orthogonal regressor vectors will have a dot product equal to 0. Fully orthogonalizing your regressors means that the dot product should be zero between all of them; you’ll note that sometimes this requires using step function values in your lookup tables other than 1, -1, and 0.

Another Math Moment[edit]

For those of you unfamiliar with dot products, a dot product is simply a rule for multiplying two vectors. If a and b are our vectors, their dot product is a scalar value that’s computed by taking the sum of the products of each entry in a with the corresponding entry in b:

a· b = a1b1 + a2b2 + a3b3 + … + aNbN
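The orthogonality check is a one-liner in numpy. The vectors below are made up for illustration; in practice you would load your actual xx.1D step functions.

```python
# Checking regressor orthogonality with dot products. Example vectors are
# invented; real regressors would come from your xx.1D files.
import numpy as np

a = np.array([1.0, -1.0, 0.0, 0.0])
b = np.array([0.0, 0.0, 1.0, -1.0])
c = np.array([1.0, 0.0, 1.0, 0.0])

print(np.dot(a, b))   # 0.0 -> a and b are orthogonal
print(np.dot(a, c))   # 1.0 -> a and c overlap (some degree of collinearity)
```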

You can think of correlations as existing in vector space, with each vector representing a regressor. It’s difficult for us to think of our regressors in this space, however, because with N timepoints we have an N-dimensional vector space that we can’t represent graphically.


If you do some derivations using the Pythagorean Theorem and the definition of cosine, you discover that the absolute value of A times the cosine of theta is the projection (a scalar value) of A onto vector B. From the diagram above, if A and B are at a 90 degree angle to one another, there is no element of A that’s shared with B (i.e., no relationship between the two): if we were to arbitrarily say that B lies along the X-axis, A would have only a Y component and B only an X component. That is to say, you cannot predict anything about B based on what you know about A. The two vectors are orthogonal, and therefore their dot product is zero since cos 90º is 0. If the dot product is not zero, however, it means we have some degree of collinearity, which could be bad for our data.

The non-math explanation.[edit]

Practically speaking, collinearity means that some of your independent variables are not only related to one another, but that they are partially or fully predicted by one another (for example, in our previous example receiving a high reward overlaps in large part with receiving any reward). Sometimes such overlap is obvious and sometimes it is not. This state of affairs is problematic because the regression equation can no longer determine how much one independent variable uniquely affects the dependent variable. The scary consequence of collinearity is that the estimates of how much each independent variable affects the dependent variables will be unreliable. If there are two regressors you really want to run that are collinear, you will need to split them up and run them in separate models. (Meaning you’ll start back toward the beginning with a separate reg script for each.)

How do I know if my regressors are “collinear enough” for bad things to happen?[edit]

Collinearity will most often manifest itself as wildly significant scores in either direction. The output of 3dDeconvolve will also tell you you’ve got it, but this doesn’t always mean anything bad for your data (it certainly means you should check, however).

The best way to check is to take a look at your contrasts. Here’s an example discovered during supplemental analysis for Knutson and Wimmer 2007.

Figure A. Illustration of two collinear regressors in the MID task modeling outcomes. Here, gain outcome is generated by the AD model, which calculates outcomes as the outcome value minus the expected value (cue*.66 win probability). The TD model includes a reward regressor that is positive and the same for all rewards.



Figure B. The effects of collinear regressors on regression output. Putamen and SMA activity are obviously flipped between the two regressors, a strong indication of model specification problems.


Even collinearity that is far from complete (and not flagged as a problem by AFNI when running the model), as shown in the first figure, can result in very obvious problems in the regression output. These maps are largely inverses of one another (excluding the MPFC). Such a model should not be relied upon. Indeed, in the published report of this work (Knutson and Wimmer, 2007), including both regressors in the same model (a strong test of best fit) was rejected because of collinearity. Instead, two separate models were specified, one with the AD gain outcome regressor modeling the outcome period, and the other with the TD reward outcome regressor modeling the outcome period. A paired t-test comparison (3dttest -paired) showed that the AD gain outcome regressor indeed was a better fit for MPFC activity than the TD r(t) regressor.

Okay, back to our regularly scheduled reg script programming.

3dMerge[edit]

After 3dDeconvolve, the reg script calls 3dmerge, which converts all t-scores to z-scores. A z-score is a way of standardizing some sort of measure or statistic; it tells us how many standard deviations away from the mean our data is (a z-score of 3 means 3 standard deviations). Importantly, it allows us to compare a given score on some variable to the mean score on that variable. In terms of brain imaging, a z-score lets us know how much greater (or lesser) than baseline a given region of the brain is showing activation. The normal distribution is important here:



So let’s say we’re testing the dorkiness of graduate students and we’re wondering if Hal is as dorky as the rest of the Stanford graduate population. Let’s also say that the average graduate student at Stanford has a dorkiness score of 100 and there is a standard deviation of 20. And, for the sake of this example, let’s pretend that Hal’s score is 50 (less dorky than average).

The question, then, is the following:

Is Hal’s score sufficiently lower than the mean to assume that he comes from a population of non-dorky graduate students (where would we find such people?).

The null hypothesis is that Hal’s score does come from the dorky population. So then, let’s calculate the probability that a score as low as Hal’s does actually come from this population. If the probability is very low, we can reject the null hypothesis and conclude that he comes from a cooler, non-helmet wearing graduate student population. On the other hand, if the probability is not very low, then we would have no reason to doubt that Hal actually comes from the dorky population.

To do this, all we have to do is calculate a z score and then refer to a z-score table.

Z = (X – μ) / σ = (50 – 100) / 20 = –50 / 20 = –2.5

From a table of z scores, we see that the probability of a z score of –2.5 is .0062. This is our p value = .0062. Because this number is so low (much lower than our standard cutoff of p = .05), we can say that Hal comes from a different population of graduate students (perhaps he’s actually an undergrad).
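The table lookup can be reproduced with the standard normal CDF, written here with the math library’s error function so nothing beyond the standard library is needed:

```python
# Hal's z-score and one-tailed p-value, computed with the standard normal
# CDF instead of a z-score table.
import math

def normal_cdf(z):
    # P(Z <= z) for the standard normal distribution
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

z = (50 - 100) / 20          # (X - mu) / sigma
p = normal_cdf(z)
print(z, round(p, 4))        # -2.5 0.0062
```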

The preceding logic is very important and is at play in t-tests, which are very similar to z-scores, except that t-tests take into account the size of a sample, whereas z-scores do not. When a sample gets to be ‘large’ (roughly 120 or more people or measurements), t-values become essentially equivalent to z-values.

After completing 3dmerge, the reg script writes out your new z-file to the subject’s directory and the script ends. You will use this z-file later to grab coefficient tags for your 3dttest script.

Editing the reg script[edit]

Let’s now take a look at the actual script and what you’ll need to modify for each new experiment/regressor:




Non-task-related activations such as signal drift over the whole scan (e.g. a linear, nonlinear, or periodic increase in global activation values) are effectively removed by the -polort flag in 3dDeconvolve (a value of 2 models baseline drift with polynomials up to second order, i.e. quadratic). Such drift is almost unavoidable in fMRI and is the reason why we can’t compare task activation to a prior ‘baseline’ block as is done in PET: the baseline block’s relation to the values of the task block is essentially unknowable because the drifts are unpredictable.
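The idea behind the polynomial baseline can be sketched as fitting and subtracting a low-order polynomial from a voxel timecourse. This is only the intuition: AFNI actually includes the polynomial terms as baseline regressors inside the regression itself rather than detrending beforehand, and the signal and drift below are synthetic.

```python
# Rough sketch of what a -polort 2 baseline accomplishes: removing a slow
# quadratic scanner drift from a voxel timecourse. Synthetic data; AFNI
# folds these polynomial terms into the regression rather than pre-detrending.
import numpy as np

n_trs = 120
t = np.arange(n_trs, dtype=float)

signal = np.sin(t / 4.0)                 # stand-in for task-related signal
drift = 0.01 * t + 0.0005 * t ** 2       # slow quadratic scanner drift
data = signal + drift

coeffs = np.polyfit(t, data, 2)          # order-2 polynomial, like -polort 2
detrended = data - np.polyval(coeffs, t)

# The drift dominates the raw data's variance; after detrending, what's
# left is close to the task-related signal
print(float(np.std(data)) > float(np.std(detrended)))
```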

When your script is ready to go, run it from the scripts directory:

[span@dmthal scripts]$ ./riskbasicreg

Your output should be a file in the format z*reg+orig.*, which provides the sub-brik tags needed by the 3dttest script (each sub-brik corresponds to a particular set of coefficients for a single contrast), as well as xxc.1D files that are your convolved regressors (these are generally only used here in the reg script). The coefficients for each contrast take the form of a matrix covering every voxel in the brain, indicating how closely the timecourse of each voxel followed the hypothesized timecourse for each individual regressor.


Summary of the reg script[edit]

The reg script calls each of the contrasts (also known as lookup tables) that you created. These files are in the format m2v*1D (xx.1D). Once each is convolved with an HRF (denoted xxc.1D), the script compares this idealized ‘hypothesis’ data curve with the real data curve in every single voxel (that is, it compares what we actually have to what we would expect to have if our hypothesis fit the real-world timecourse of that voxel exactly) and gives us a significance statistic in the form of a coefficient and a z-score (and associated p-value) for every single voxel in the brain. (See Schultz 1997.) In most of each subject’s brain voxels, the real activation timecourse will not be very similar to our idealized curve (regressor) for a given contrast of interest, so the beta coefficient will be very weak and the computer will not translate it into a color for that voxel. If the relationship between the real curve and the hypothesis curve in any given voxel is very strong, however, this will be translated into an activation color that corresponds to your statistical significance for that voxel.

MORE ON VECTORS AND .1D FILES[edit]

Your reg script will output a number of different types of .1D files.

c files are convolved, time-shifted regressors: hence the non-integer values
x files are design-matrix files (the matrix of regressors as assembled by 3dDeconvolve)
Other .1D files should just be your original regressor files.

Some older experiment vector folders will have lots of .1D files. This is because for these experiments it was easier not to personalize the vectors (they were common to all subjects – for example, some ant and out files) and instead have them all in one place, modifying the reg script to look in the vectors folder for these .1Ds accordingly.