Lab Manual: Data Analysis Appendices


APPENDIX A: AFNI PROGRAMS

sprlioadd

Compiles P-files and labels them; averages the in-out slices (run1, run2) so that there is only one image; whichever run is first should always go first. "-B": byteswap; "-O": averaging; 187: number of TRs (total time of run 1 / pulse time); 24: number of slices.

$ sprlioadd -B -O -m 0 P18944.7.mag npp1 187 24

to3d

This converts image files into 3D datasets; specifically, it converts I-files (2D slices) into AFNI's .BRIK format (3D volume datasets). After you run this command, you should see that all your I-files have been used to create .HEAD and .BRIK files. cd to the directory containing the anatomical I-files (SPGR_data) and run to3d from there. For functional (EPI) data the call looks like:

$ to3d -epan -prefix npp1 -xFOV 120R-120L -yFOV 120A-120P -zSLAB 25.8I-66.2S -time:tz 187 24 2.0s seqplus 3D:0:0:64:64:4488:'npp1'

For anatomical (SPGR) data:

$ to3d -prefix anat1 I.*

$ to3d -spgr -prefix anat1 -xFOV 120A-120P -yFOV 120S-120I -zSLAB 89.8R-82.7L 3D:-1:0:256:256:1:I.*.dcm

“-spgr”, “-epan”: specifies the type of data.

“-prefix”: what to call the new file

“-2swap”: byte-swap the input files

“-session” which directory to write to

"-xFOV", "-yFOV", "-zSLAB": These are the field-of-view dimensions, i.e. the extent in x, y, and z in reference to the 0,0,0 point set by the scanner. FOV indicates that you're specifying from the middle of the first slice to the middle of the last, and SLAB means you're specifying the outer edges of each slice. In most cases, you'll use FOV for x and y, and SLAB for z.

• To find which axes are which for the FOV dimensions: As you look at a slice as it was collected (a coronal slice if you're collecting coronally), the x-dimension is what runs Right to Left across the slice, the y-dimension runs Top to Bottom, and the z-dimension is the direction you move as you go from the first to the last slice. So, for sagittal acquisition, x is A to P or P to A, y is S to I or I to S, and z is L to R or R to L. The direction is determined by the prescription and is on the scanlog.

• Reading info from a P-file: If you're unsure of the direction the data were collected, or if you want more info about a P-file, you can print out the scanning parameters stored in the P-file header. cd to the directory containing the P-files and type "ppi P######" (print P-file information) or "rrh P######" (read raw header). Info concerning the direction and extent of the acquisition is in the line starting with "Image center (1st slice)." From the scanlog, you should know which axes correspond to x, y, and z. The numbers for the x and y axes correspond to the offsets of the extent. The sign of the number corresponds to the direction of the offset (a negative number after L/R is a left offset and a positive one a right offset; a negative number after S/I is an inferior offset, and vice versa). In the z-direction there is no offset; the number is the coordinate of the first slice (the first number in the extent of the z-direction), so the sign of the number tells you in which direction the slices were acquired.

• A quicker way to find the FOV dimensions is to cd to the directory containing the P-file and type "read_FOV_Pfile all P######.7"; the FOV dimensions will print to the screen. The "all" specifies that you want all 3 dimensions; you could enter x, y, or z to get just one dimension. (Note: this may not work correctly for certain oblique and reverse-spiral prescriptions.)


3dvolreg

If you have multiple runs, you'll want to do a volume registration on the data, which brings the images you collected at different times (or with different methods) into alignment with one "representative" image. "Misalignment" of image sequences is mostly due to movement, and the goal here is to align to a base image that requires the least amount of change in all the other images.

Usage: $ 3dvolreg -prefix <prefix name> -base <base file> <target file>

First you must select a base image J[x] to which the remaining target images I[x] will be aligned. Given J[x] and I[x], find a geometric transformation T[x] such that the difference between T(I[x]) and J[x] is minimized (i.e., each image I[x] is rotated and shifted to lie on top of the base image J[x]). How do you choose a base image? Your best choice is one that was collected very close in time to when the SPGR image was collected.

$ 3dvolreg -Fourier -twopass -prefix 3dnpp1234 -base 90 -dfile 3dmotion1234.1D npp1234ts+orig

3dDeconvolve

For each voxel, 3dDeconvolve takes the reference function and "fits" it to the observed fMRI signal, fitting a baseline value and a linear trend as well. It can be used with both shifted and waver-ed reference functions. It assumes linearity, time invariance, and independent error. A typical run takes a specific registered dataset, uses three shifted reference functions and 3 motion regressors (roll, pitch, yaw), and writes out statistics and coefficients. It can analyze: multiple runs at the same time (computing separate baselines and linear trends for each run); both block and event-related designs; subject-specific reference functions; and over-sampled time series. It can also censor data points with movement artifact and determine the efficiency of the experimental design. The "-censor" option allows you to remove outliers: create a .1D file with unwanted TRs coded "0" and all else coded "1," then replace each unwanted TR with a neighbor, or the average of its neighbors, using 3dcalc and 3dTcat. First use the 3dDeconvolve -nodata option (instead of -input) to evaluate blocked or event-related experimental designs for lack of multicollinearity (i.e., the ability to compute the inverse matrix (X^T X)^-1), to conduct a comparative analysis of IRF coefficient estimation accuracy (i.e., reduced standard deviations) for different designs, and to examine the effect of increasing the number of reps or observations on IRF coefficient estimation accuracy.

-glt: Used to construct within-subject contrasts (average time-shifted block-design RFs, area-under-the-curve time-shifted event-related RFs, contrasts between stimulus types); requires a matrix (.mat) file (1 row for each contrast, 1 column for each parameter in the model); the significance of GLTs can be tested at the group level with 1-sample t-tests.
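For concreteness, here is a minimal sketch of a 3dDeconvolve call in the spirit of the description above. The dataset and file names (3drun12+orig, hvlant.1D, censor.1D, winvloss.mat) and the number of regressors are hypothetical; adapt them to your own design.

$ 3dDeconvolve -input 3drun12+orig -polort 1 \
    -num_stimts 4 \
    -stim_file 1 hvlant.1D -stim_label 1 hvlant \
    -stim_file 2 3dmotion12.1D'[1]' -stim_label 2 roll \
    -stim_file 3 3dmotion12.1D'[2]' -stim_label 3 pitch \
    -stim_file 4 3dmotion12.1D'[3]' -stim_label 4 yaw \
    -censor censor.1D \
    -glt 1 winvloss.mat -glt_label 1 winvloss \
    -fout -tout -bucket decon12

Here -stim_file/-stim_label name each regressor, -censor drops the TRs coded 0 in censor.1D, and -fout/-tout/-bucket control which statistics are written out. Replacing -input (and its dataset) with -nodata lets you evaluate the design matrix before any data are collected, as described above.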


3dcalc

This allows the user to perform arithmetic on 3D datasets, voxel-by-voxel (no inter-voxel computation).

Usage: $ 3dcalc [-options]

3dTstat – SNR analysis: computes voxel-wise statistics for a 3d+time dataset. The program defaults to mean (if not given an option), and other examples include sum, slope, stdev, cvar. This command is used in calculating signal-to-noise ratio, a useful way of characterizing whether signal is present in a given brain region or across the brain for a new acquisition sequence.

Usage (SNR calculation using 3dTstat):

$ 3dTstat -prefix mean input+orig
$ 3dTstat -stdev -prefix stdev input+orig
$ 3dcalc -a mean+orig -b stdev+orig -expr '(a/b)' -prefix snr

An SNR calculation script called snrcalc is available in the scripts directory of many tasks.


waver

convolves the reference function with an idealized hemodynamic response function, and writes out a modified reference function. Options include type of idealized hemodynamic response, sample time, etc…

Usage: $ waver [-options] > output_filename
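For example, a (hypothetical) reference function in hvlant.1D, sampled every 2.0 s, could be convolved with the gamma-variate ideal response like this:

$ waver -GAM -dt 2.0 -input hvlant.1D > hvlant_w.1D

Here -GAM selects the gamma-variate hemodynamic response and -dt gives the sampling interval in seconds; the convolved reference function is written to hvlant_w.1D.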

3ddelay: This estimates the time delay between each voxel time series in a 3D+time dataset and a reference time series.

Usage: $ 3ddelay -input <filename> -ideal_file <ideal time series file name> -fs <sampling frequency in Hz> -T <stim period in seconds> [-prefix bucket]

The estimated delays are relative to the reference time series. For example, a delay of 4 seconds means that the voxel time series is delayed by 4 seconds with respect to the reference time series. The program uses a computationally efficient way of estimating the response delay of each voxel along with its cross-correlation coefficient which is used to determine whether a voxel is activated by the stimulus or not. The response delay is estimated using the Hilbert Transform of the cross correlation function.


3dIntracranial

This performs automatic segmentation of the intracranial region (skull stripping).

Usage: $ 3dIntracranial -anat <anat filename> [-options] -prefix <prefix name>

With the skull stripped, the dataset can be viewed in the volume rendering plugin available in AFNI.


Participants often move between acquisition of functional and anatomical datasets. There are two solutions: 3dAnatNudge (automatic) or the Nudge Dataset plugin (requires user input). For both, use 3dbuc2fim to create a single EPI sub-brick (use the 3dvolreg base image).


3dAnatNudge (aligning functional and anatomical datasets)

You need to have made an intracranial (skull-stripped) anatomical dataset for each subject. After running the commands, go into AFNI to compare the overlay of the EPI onto the nudged and the original anatomical; if the nudged version looks good, delete the other and use the 3drefit command stored in refitcommand.txt to change the position.
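A minimal sketch of the call, assuming a skull-stripped anatomical from 3dIntracranial and a single EPI sub-brick made with 3dbuc2fim (both dataset names here are hypothetical):

$ 3dAnatNudge -anat anat1_strip+orig -epi epibase+orig -prefix anat1_nudge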


Nudge Dataset Plug-in (aligning functional and anatomical datasets, option 2)

Use this if you can't make intracranials, want more user control, or think rotations will be necessary. Set the number of colors to 9 and open two locked AFNI windows (one with the 3dbuc2fim functional overlay and one without). Select the dataset that you want to move (anatomical or functional); remember, the anatomical has finer resolution and the EPI is warped. Make the desired movements by entering numbers and clicking NUDGE. If rotations are needed, use the DO ALL button; if only translations are needed, use 3drefit.

3drefit

Makes translation movements without interpolation by changing the header file. You need to decide which volume you want to move: EPI or anatomical. Copy the sign and magnitude of each desired shift directly from the Nudge Dataset plugin or the output of 3dAnatNudge and enter them for -dxorigin, -dyorigin, and/or -dzorigin.
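For example (the shift values and dataset name here are hypothetical; take the real numbers from the plugin or the 3dAnatNudge output):

$ 3drefit -dxorigin 2.5 -dyorigin 1.5 -dzorigin 0.0 anat1+orig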

Talairach-ing: done before or after calculation of % signal change or filtering. Transforms the brain into standard atlas coordinates, which allows group statistics and more robust localization even at the individual level; adwarp is used to transform the functionals (see the example after the marker steps below).

1. Define markers for the AC-PC transformation (straightens and centers the head; the middle of the crosshairs should go through the point, and it should be centered in all orientations).

2. Define markers for the TLRC transformation (fits the brain, not the head, into the "shoe box"; find the far edge in the relevant orientation and confirm with the other orientations).
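Once the anatomical is in +tlrc space, adwarp applies that transform to a functional dataset. A minimal sketch (the dataset names and the 3.75 mm output grid are hypothetical):

$ adwarp -apar anat1+tlrc -dpar decon12+orig -dxyz 3.75 -prefix decon12_tlrc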


3dttest

There are 3 options (1-sample, paired samples, independent samples); all use the same basic command line, with slightly different options. The output has 2 sub-bricks: the mean difference and the t-value. Used with the output of 3dfim+, LC, or the -glt output from 3dDeconvolve.
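A sketch of a paired-samples call on per-subject contrast bricks (the subject dataset names are hypothetical; in practice these would be the relevant coefficient sub-bricks):

$ 3dttest -paired -prefix grp_winvloss \
    -set1 s1_win+tlrc s2_win+tlrc s3_win+tlrc \
    -set2 s1_loss+tlrc s2_loss+tlrc s3_loss+tlrc

For a 1-sample test of a single set of contrast values against zero, use -base1 0 with only a -set2 list.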

3dANOVA

Uses a standard ANOVA model to determine, on a voxel-by-voxel basis, the effect of an independent variable on the dependent variable (the change in signal induced by a task). Several types of 3dANOVA are available (an example call follows the list below):

-3dANOVA: single factor ANOVA- factor levels are fixed

-3dANOVA2: two-way ANOVA factor levels are: 1-fixed/fixed (gender/handedness), 2-random/random, 3-fixed/random (effect of drug doses on subjects); random refers to the fact that the exemplars in the analysis are representatives of a larger group (it changes the denominator in the F-statistic)

-3dANOVA3: three-way ANOVA factor levels are: 1-f/f/f, 2-r/r/r, 3-f/r/r, 4-f/f/r, 5-f/f/r (third factor is nested in first); limitations—all groups must have same number of subjects, differences can’t be calculated for random factors across levels of different factors
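As a sketch, a type 3 (fixed/random) 3dANOVA2 with condition as the fixed factor and subject as the random factor might look like the following; the three subjects, two conditions, and all dataset names are hypothetical:

$ 3dANOVA2 -type 3 -alevels 2 -blevels 3 \
    -dset 1 1 s1_win+tlrc -dset 2 1 s1_loss+tlrc \
    -dset 1 2 s2_win+tlrc -dset 2 2 s2_loss+tlrc \
    -dset 1 3 s3_win+tlrc -dset 2 3 s3_loss+tlrc \
    -fa cond_F -adiff 1 2 winvloss -bucket anova_winvloss

Here -fa writes the F statistic for the condition factor, -adiff 1 2 writes the win-minus-loss difference, and -bucket collects everything into one output dataset.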


3dRegAna

Regression analysis, used when the explanatory variables (EVs) are continuous; can use multiple EVs but you must hand-code interactions; multiple output options, defined by the user (regression coefficients for parameters, t-stats for parameters, F-stat and R^2 for the full model); can do hypothesis testing.

AlphaSim: Determining significance; the cluster-threshold method assumes real activation occurs in groups of voxels, not isolated voxels; Monte Carlo simulations determine the probability that clusters of size N occur by chance; provides 2-tiered alpha protection (3 ways to set alpha).

Monte Carlo Simulations: Define size of brain or ROI (TLRC Brain = 134mm * 174mm * 120mm); Define voxel-wise alpha (1-tailed; randomly assign each voxel as active or not); repeat 1000+ times; Determine frequency of various cluster sizes (calculate alpha)
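A sketch of an AlphaSim run over a whole-brain mask; the mask name, voxel-wise p of .01, 4 mm smoothness, and 6.93 mm connection radius are all hypothetical values to adapt:

$ AlphaSim -mask mask+tlrc -rmm 6.93 -pthr 0.01 -fwhm 4.0 -iter 1000 > alphasim_p01.txt

The output table gives, for each cluster size, the probability of a cluster that large appearing anywhere in the mask by chance; choose the smallest cluster size whose corrected alpha falls below .05.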

Clustering: Once the voxel-wise statistical analysis has been completed there are several post-ANOVA processing steps to yield clusters of significant activation (clustering and thresholding; cluster tables; removal of activation patterns associated with variable brain coverage), then extraction of cluster-wise activation for each subject for post-ANOVA analyses.

3dclust: a simple clustering program to extract basic info (cluster number, center of mass (xyz), average value, max value, min/max (xyz)); with the -prefix option, it outputs the clustered and thresholded BRIK (you can also use 3dmerge for this).
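For example (the threshold, connection radius rmm, and minimum cluster volume vmul below are hypothetical; use the values from your own AlphaSim results):

$ 3dclust -1thresh 3.5 -prefix winvloss_clust 6.93 200 grp_winvloss+tlrc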

Post-ANOVA Masking: Partial brain coverage of 3d+t files; the goal is removal of clusters that are due to a subset of subjects where data were available for a given area.

Possible Uses of Region of Interest Data: Statistical comparison of mean activation of two groups within an anatomical ROI; bar graphs of mean fit coefficient or % signal change in two groups in each functional ROI to help interpret simple effects; visualization of the mean time series in an ROI; scatter plot of performance against mean fit coefficient in each functional ROI (from 3dRegAna perhaps); correlation between mean activation in distributed ROIs (connectivity analysis).

-Anatomical ROIs—Focused hypotheses give more power to detect activation; search region approach; individually-tailored approach;

Making Anatomical ROI Masks: Draw Dataset plug-in: draw by hand or import Talairach regions (from the Talairach Daemon); you can visualize in 3D with the Render plugin and the Show Thru option. 3dfractionize: resamples the mask into a coarser resolution to match your functional data.
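A sketch of resampling a hand-drawn mask down to the functional grid (dataset names hypothetical); -clip 0.5 keeps only voxels that are at least half covered by the original mask:

$ 3dfractionize -template 3drun12+orig -input nacc_mask+orig -prefix nacc_mask_lowres -clip 0.5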

-Functional ROIs—Follow-up analyses after whole-brain; better visualization of cluster results

Making Functional ROI Masks: 3dmerge: use the same cluster size, threshold, and radius that you used to make your clustered and thresholded image, but add the -1clust_order option; this creates a mask in which voxels within the largest cluster are labeled 1, the next largest 2, etc.
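For instance, if the group map had been clustered at t > 3.5 with rmm = 6.93 and vmul = 200 (hypothetical values), the ordered mask could be made with:

$ 3dmerge -1thresh 3.5 -1clust_order 6.93 200 -prefix func_roi_mask grp_winvloss+tlrc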

-Applying ROI Masks: Mask and data file must be at same resolution;

3dcalc: zeros out values outside the anatomical ROI; use this to make a picture or to limit clustering and thresholding to a search region of interest (use the -mask option in AlphaSim to find a less stringent cluster/threshold combination within the ROI).
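A sketch, assuming a mask already resampled to the same grid as the data (names hypothetical); step(b) is 1 wherever the mask is nonzero and 0 elsewhere, so everything outside the ROI is zeroed out:

$ 3dcalc -a func+tlrc -b roi_mask+tlrc -expr 'a*step(b)' -prefix func_roi_only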

3dROIstats: calculates statistics within anatomical or functional ROIs; can also be used with 3d+time datasets (which have to be put into TLRC space if the mask is in TLRC space); the output will have a column for each ROI and a row for each time point for each subject; the output can be imported into Excel or SPSS as a space- or tab-delimited text file.
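For example, to pull the mean time series within each labeled cluster of a (hypothetical) functional ROI mask from one subject's Talairached time series:

$ 3dROIstats -mask func_roi_mask+tlrc subj1_ts+tlrc > subj1_roistats.txt

The redirected text file is tab-delimited and can be opened directly in Excel or SPSS.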

APPENDIX B: MORE DETAILED DISCUSSION OF THE PROCESS SCRIPT

      • NOTE: Almost every lab will create a process script to run these commands as a batch file; to make it specific to a given subject, you just change the P-file numbers and I&S values (recorded at the scanner, found in the subject info/note file). The following are the necessary pre-processing steps, which can be run via the process script or as separate commands (a skeleton process script is sketched after step 9).

1. Reconstruct the .mag file from the P#####.7 file (optional) using grecons.

$ grecons11 -b -O P12345.7

2. Average the in-out slices to correct for slice timing in voxel sampling with sprlioadd; compiles P-files and labels them.

$ sprlioadd -B -O -m 0 P12345.7.mag run1 337 24

3. Convert 2D slices from the magnet into AFNI’s .BRIK format (3dVolume dataset) and make .HEAD file (contains header information) using to3d; individualize the I&S values

$ to3d -epan -prefix run1 -xFOV 120R-120L -yFOV 120A-120P -zSLAB 25.8I-66.2S -time:tz 337 24 2.0s seqplus 3D:0:0:64:64:8088:'run1'

4. Remove magnet stabilization periods using 3dTcat

$ 3dTcat -prefix run1tc 'run1+orig[7..332]'

5. Shift voxel time series from the input dataset so that the separate slices are aligned to the same temporal origin using 3dTshift; This can also be done within 3dvolreg; this is most important in slow event-related designs, and unnecessary in block designs

$ 3dTshift -slice 0 -prefix run1ts run1tc+orig

6. Concatenate runs using 3dTcat. You may also choose to remove linear trends separately for each run before concatenation, especially if you are going to subsequently normalize your data (convert to % Signal Change).

$ 3dTcat -prefix run12ts 'run1ts+orig[0..325]' 'run2ts+orig[0..325]'

7. Register each 3D sub-brick from the input dataset to a specific base brick using 3dvolreg. Choose a base brick that’s not too far toward the beginning or the end of a run. Can also use UCSD reg_briks program (from Philippe) to find and register to optimal base for each subject. Will create a .1D file that you can view ($ 1dplot 3dmotion12.1D[1..6]) in order to see sudden motions, between-run position changes, and overall subject drift over the session in 6 dimensions (A-P, L-R, I-S, roll, pitch, yaw; units are mm). You can ignore specific TRs with large movements using the censor option in 3dDeconvolve. You should use the six movement dimensions in the .1D file as regressors in 3dDeconvolve.

$ 3dvolreg -Fourier -twopass -prefix 3drun12 -base 90 -dfile 3dmotion12.1D run12ts+orig

8. If you decide you want to spatially smooth your data, you can use the command 3dmerge. Here, we apply a Gaussian blur with FWHM of 4 (about the size of one voxel)

$ 3dmerge -prefix run12b -1blur_fwhm 4 -doall 3drun12+orig

9. You probably want to apply a temporal filter to your data, and you will do this using the 3dFourier command. Definitely apply a high-pass filter to remove low-frequency noise (the AFNI group decided that a good rule is 2x the characteristic frequency of your design; if you have an event-related design, you can divide one run by 3 or 4). Some people choose to bandpass and remove high-frequency noise as well, and the frequency you select will depend on your sampling rate. You will be removing the linear trend at this point as well, unless you add the -retrend flag. Also, you should perhaps filter runs separately.

$ 3dFourier -prefix run1f -highpass .011 'run12b+orig[0..325]'

$ 3dFourier -prefix run2f -highpass .011 'run12b+orig[326..651]'

$ 3dTcat -prefix run12f run1f+orig run2f+orig
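Putting steps 1-9 together, a process script is just these commands collected in a plain shell script. The skeleton below is a hypothetical single-run sketch; the P-file name, I&S values, and TR counts are placeholders you would edit per subject, as described in the Dummy Guide below.

#!/bin/sh
# hypothetical process-script skeleton: edit the P-file name and the I/S values for each subject
sprlioadd -B -O -m 0 P12345.7.mag run1 337 24
to3d -epan -prefix run1 -xFOV 120R-120L -yFOV 120A-120P -zSLAB 25.8I-66.2S \
    -time:tz 337 24 2.0s seqplus 3D:0:0:64:64:8088:'run1'
3dTcat -prefix run1tc 'run1+orig[7..332]'
3dTshift -slice 0 -prefix run1ts run1tc+orig
# ...repeat the steps above for run2, then concatenate, register, and blur
# (add the 3dFourier filtering from step 9 if desired)
3dTcat -prefix run12ts 'run1ts+orig[0..325]' 'run2ts+orig[0..325]'
3dvolreg -Fourier -twopass -prefix 3drun12 -base 90 -dfile 3dmotion12.1D run12ts+orig
3dmerge -prefix run12b -1blur_fwhm 4 -doall 3drun12+orig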

      • NOTE: You should always visually inspect your data to make sure the signal is clean, and you can do this as soon as you’ve run the to3d command. Set the functional data as underlay. Bring out the axial, coronal, and sagittal views, and use the Index (upper left of the AFNI interface) to scroll through the full session, watching for ghosts, shadows, stripes, and nods. After that, use the GRAPH option to examine the time series for different voxels throughout the brain (you’re looking for spikes here). If you have abnormal spiking, you can use Gary’s imgzitfix program (specify cutoff in # Std. Devs. above mean) or AFNI’s 3dDespike program.


DUMMY GUIDE PRE-PROCESSING:

$ cd XXmmddyy (go into the subject's folder where the P-files are)

$ cp ../scripts/process . (names may reflect the specific study; e.g., emoscripts, emoprocess)

$ emacs process (opens process script for editing in emacs text editor)

ctrl-z; $ bg (puts emacs in background so you can still use terminal)

$ less info (opens info file so you can retrieve P-file numbers and I&S values for subject) … keep info open, but make emacs the active window

Alt-x replace-string 11111 <enter> 12345 <enter> (replaces each P-file with subject-specific P-file 12345 as recorded in info file; cursor must be above the first occurrence of each P-file in order to replace all occurrences)

Alt-x replace-string 25.8I-66.2S<enter> 25.0I-67.0S<enter> (replaces I and S values in to3d command; retrieve from info file)

Ctrl-x-s (save) Ctrl-x Ctrl-c (close)

$ ./process (runs process script; you’ll end up with 3dmotion12.1D, run12b+orig.BRIK, run12b+orig.HEAD)

$ 1dplot 3dmotion12.1D[4] (plot session motion; repeat for columns 4 (I-S), 5 (L-R), and 6 (A-P))

ctrl-z; $ bg (puts 1dplot in the background so you can still use the terminal)

$ emacs info (opens info file for editing in emacs)

Scroll down to appropriate place; Type “MOTION” and record largest sudden movements in I-S, L-R, and A-P directions; Initial and date.

Ctrl-x-s (save) Ctrl-x Ctrl-c (close)


APPENDIX C. STATS TUTORIAL


When we’re undertaking any sort of research, there are 6 basic steps:

1. Start with a research hypothesis (e.g., Stanford guys are smarter than guys in general).

2. Set up the null hypothesis (e.g., Stanford guys don't differ from the rest of the population).

3. Construct the sampling distribution operating on the assumption that the null hypothesis is true (i.e., don't sample a group of Mensa guys from Stanford and a group of mentally challenged guys from the rest of the population).

4. Collect some data (e.g., give "smart" tests to Stanford guys and other guys).

5. Compare the sample statistic to that distribution (i.e., is the mean intelligence of our Stanford men statistically different from the mean of the population's intelligence?).

6. Reject or retain your null hypothesis depending on the probability, under the distribution of the null hypothesis, of obtaining a sample statistic as extreme as the one we have obtained.

II. What you be talkin about Willis, or What do statistics actually tell us?

First off, it’s important to know what exactly we’re doing when we use statistics to test differences between or within groups. Simply put, when we test statistical significance, we’re testing the probability that an observed relationship or difference occurred by pure chance, or in other words, “How “true” is our result?”

The p-value that we obtain from our tests tells us the chance (or probability of an error) that our result is actually a fluke. So, when p = .05 and we claim that there is a significant difference between our two samples, we're really saying that if the null hypothesis were true and we ran our specific test 20 times, about 1 time in those 20 we'd see a difference this large purely by chance.

III. Variance and Standard Deviation

A word about samples and populations:

Sample: The finite data set before us (Stanford students) Population: The infinite data set from which the sample was obtained (all college students)

Before we do any specific statistical tests, we should get to know our data. To continue with our example from Wednesday, say we're cookie makers and we want to make sure that all of our chocolate chip cookies have an equal amount of chocolate chips in them (if they didn't, I could foresee annoying children complaining that their cookies had fewer choc. chips than other cookies).


Ideally, we’d like to keep the variability of the amount of choc chips per cookie (in the industry, we call that CCPC) to a minimum.

Variance: To calculate variance, we sum our squared deviations and then divide that sum by (N - 1). [N - 1 is the divisor because it gives a better estimate of the population's variance.]

So then, variance: s²x = Σ(X - Xbar)² / (N - 1)

Returning to our example:


Chips per cookie 5 6 6 3 4 4 4 4 7 5


Xbar = 4.8

s²x = Σ(X - Xbar)² / (N - 1)

= [(5 - 4.8)² + (6 - 4.8)² + (6 - 4.8)² + … + (7 - 4.8)² + (5 - 4.8)²] / (10 - 1) = 1.51

That’s not that bad of a variance (though I feel bad for the kid who got 3 chips if he’s standing next to the kid who got 7 chips).

So, the variance is an average of the squared deviations from the mean (if we didn't square the deviations, they would sum to zero, which would tell us nothing).

The standard deviation, then, is the square root of the variance, and is a measure of the average deviation from the mean. In this case, our S.D. is √1.51 ≈ 1.23.

IN SPSS:

Analyze → Descriptive Statistics → Descriptives

Enter your target variable in the “variable” box, click on options and check the descriptives that you want, and click “continue” and then “OK”.

Descriptive Statistics

	N	Mean	Std. Deviation	Variance

CHIPS	10	4.8000	1.22927	1.511
Valid N (listwise)	10

This is a healthy S.D. and variance; something very large would indicate that our data was funky – this will be addressed later.

IV. The Normal Distribution

Now, that we know about variance and standard deviation, what can we say about the distribution of our sample?

First, for any given sample, as we increase our N, the distribution of our sample will come closer and closer to the population distribution of whatever we're measuring.






E.g., If I measure the intelligence of 5 Stanford students, they will most likely not represent the entire range of the intelligence scale. If, however, I test all of Roble, I will come closer to capturing the range of intelligence.

A normal distribution is depicted in the diagram above.

We can use the normal distribution to test hypotheses about an individual observation or about the mean of a number of observations.

An example: say we're testing the dorkiness of graduate students and we're wondering if Hal is as dorky as the rest of the Stanford graduate population. Let's say that the average graduate student at Stanford has a dorkiness score of 100 and there is a standard deviation of 20. And, for the sake of this example, let's pretend that Hal's score is 50 (less dorky than average).

The question, then, is the following:

Is Hal’s score sufficiently lower than the mean to assume that he comes from a population of non-dorky graduate students (where would we find such people?).

The null hypothesis is that Hal's score does come from the dorky population. So then, let's calculate the probability that a score as low as Hal's actually comes from this population. If the probability is very low, we can reject the null hypothesis and conclude that he comes from a cooler, non-helmet-wearing graduate student population. On the other hand, if the probability is not very low, then we would have no reason to doubt that Hal actually comes from the dorky population.

To do this, all we have to do is calculate a z score and then refer to a z-score table.

z = (X - μ) / σ = (50 - 100) / 20 = -50 / 20 = -2.5

From a table of z scores, we see that the probability of a z score of –2.5 is .0062. This is our p value = .0062. Because this number is so low (much lower than our standard cutoff of p = .05), we can say that Hal comes from a different population of graduate students (perhaps he’s actually an undergrad).

The preceding logic is very important and is at play in t-tests.




V. CHOOSING YOUR TEST!

When should we use the different statistical tests in our toolbox?


Dependent Variable	IV: None (i.e., 1-sample)	IV: Discrete	IV: Quantitative
Discrete	χ2 goodness-of-fit test	χ2 test of independence	Logistic Regression
Quantitative	z- and t-tests	z- and t-tests, ANOVA	Regression & Correlation

REMEMBER:

Discrete Variables: Have a limited number of values, like gender. E.g.: “Is John ugly?” This is a discrete variable b/c we have two options, “yes” or “no”, “no” being the correct answer.

Quantitative Variables: Have many different values, going from a low point to a high point. E.g.: “How ugly is John on a 1-10 scale?” This is a quantitative variable b/c we can use any value on the scale, the correct answer being “10”, very ugly.


VI. Hypothesis testing – One sample t-tests

We use a one sample t-test when we are comparing the mean of one sample (say, the average number of chocolate chips in one batch of John’s famous cookies) to the mean of the population at large (say, average number of choc. chips made by everybody) and we don’t know the population’s variance, which is almost always the case.

The major difference between t-tests and z-tests, then, is that we use an estimate of the population’s variance, and take this estimate into account in our t-value table. Essentially, t values have a different distribution than z values, and so we evaluate our sample statistic accordingly (none of this actually has to be done by hand anymore, it’s just good info to know).

Going back to our example, the same logic that we used for z-tests applies to our one sample t-test.

Using SPSS for a one-sample t-test: Analyze → Compare Means → One sample t-test. Enter the target variable in the variable box, and then set the test value to the appropriate number. Say, for example, that we assume the normal population puts 5 chips in per cookie. Do John's cookies bring all the boys to the yard? Is the mean number of chips in his cookies different from the mean number of chips in the average cookie?

YES!

One-Sample Test

	Test Value = 5
	T	Df	Sig. (2-tailed)	Mean Difference	95% Confidence Interval of the Difference
	 	 	 	 	Lower	Upper

dan's cookies	6.384	9	.000	2.7000	1.7432	3.6568



VII. Hypothesis testing – Independent Samples t-tests

Oftentimes, we’ll want to compare two groups to one another on some dependent variable.

For example, do CS honors students score significantly higher on the Purity Scale than Psych Honors students? I.V.: Group membership (CS vs. Psych) D.V.: Purity Scale score

1. We’ll expect that these groups will differ a bit, but the question is:

“Is this difference between the two groups large enough to justify the conclusion that the two groups were drawn from different populations?”

2. So then, we’re asking whether the difference between the two population means is different from 0. That is, the null hypothesis is:

CS Students’ Purity Score – Psych Students’ Purity Score = 0

3. The t-test statistic is derived from the z-test statistic, which, as we mentioned previously, is some point on the distribution minus the mean of the distribution, divided by the standard error of the distribution (which, in this case, we estimate from the sample variances). For the independent samples t-test:

t = (Xbar1 - Xbar2) / √(s²1/n1 + s²2/n2)

4. We might want to first look at the distribution of our data to see if there are any outliers. To do that in SPSS: Graphs → Boxplots → Simple

Then, enter your target variable in the variable box and define your category appropriately. These diagrams will show you if you have any outliers:


As you can see, there is one psych student that seems to score really low on the scale. This would be a good indicator that you’d want to take a look at this participant’s data and possibly not use him in your analyses, but that decision is up to you.

5. Before we run our statistic, it's important that our data be in the right format in SPSS. We want the data to be in one column and the group membership to be in another. The group membership will be 1, 2, 3, etc., and you will define what each of those numbers corresponds to in the "Variable View" section of SPSS. To do that, go to "Variable View" and, underneath "Values", assign your appropriate value (e.g., 1 = CS student, 2 = Psych Student).

6. Now we’re ready to run our test:

Analyze → Compare Means → Independent Samples t-test

Enter the appropriate dependent variable in the “variable” box and the independent variable in the grouping box. Click on “Define Groups” and enter your appropriate group numbers that you want to compare (in this case, it’s “1” and “2”). Click on OK…

The output shows us that CS Students score significantly higher on the purity scale than psych students:



How do we know that? We can see that, underneath the "Sig." column, the p-value is .013, meaning there is only about a 1 percent chance of seeing a difference this large if the two groups did not actually differ.


VIII. Hypothesis testing – One-way ANalysis Of VAriance

1. Sometimes, we'll have a situation where we'll want to compare more than two groups to each other on some dependent variable. In such a case, we are going to be using ANOVA. REMEMBER: For an ANOVA, the dependent variable is continuous and the independent variable is discrete!

For example, do the different classes at Stanford (i.e., frosh, sophomores, juniors, seniors) have different amounts of facebook friends?

I.V.: Class Year D.V.: Number of Facebook Friends

2. The logic behind the ANOVA is a bit too much to cover today, but I’m always available to discuss it…


3. To run the ANOVA, we have a few options. In the simplest case (i.e., this one), we go to:

Analyze → Compare Means → One-way ANOVA

Enter the appropriate dependent variable in the “dependent list” box, and enter the appropriate independent variable in the “factor” box. Under Options click Descriptive Statistics so that you can see your means and standard deviations, etc.

The output shows that there is a significant difference between the class years in terms of number of facebook friends:



I know this difference is significant because our p-value is .001, meaning that there is only a .1% chance of seeing group differences this large if the class years did not actually differ from one another (i.e., that this would be a flukey result).

4. But what if we want to know more about our groups than simply “there is a difference between them”? Well, we could run independent-samples t-tests between all of the groups (if we had a hypothesis about where the differences were), OR we could run “post-hoc analyses” in SPSS, which will explore the differences between all of the groups. To do so, run the ANOVA again, and this time, click on Post-Hoc and then Bonferroni. See what happens:


Look under the significance column and compare all of the years to each other. If the significance level is less than .05, that’s a significant difference.

5. OK, but now say that we want to “control” for some other variable. Say we think “Yeah, OK seniors have more friends than freshmen, but could this simply be the result of some other variable that’s at play? Could age really be accounting for the difference between these two classes on the number of facebook friends that they have?”.

To examine this question empirically, we'd first want to make sure that we did in fact measure this other variable. In this case, it's simple – we just look at how old everyone is. Now, to run this test in SPSS, we go to:

Analyze → General Linear Model → Univariate

In the Dependent Variable box, put “Friends”. In the Fixed Factors box (this is the same as your independent variable), put “Year”. Then, finally, in the Covariate box, put the variable for which you are controlling; in this case, it’s “Age”. Click on Options and then drag your variables from the left box to the right box and then click on Descriptives. Hit OK, and then in the main dialog box, hit OK again. Here are the results:



We can see that class year is still a significant predictor of number of facebook friends, even when we control for age. Age, however, is not a significant predictor of number of facebook friends.

IX. Hypothesis Testing – Factorial ANOVA (Sometimes, a 2x2 ANOVA)

1. Perhaps you’re interested in testing the effects of two independent variables on one dependent variable. Some of you are testing such designs using, say, one factor that is ethnicity and another factor that is experimental condition. Formally, we’d say this design has two between-subjects factors, and we’d write it out as such: 2(Ethnicity: Asian American, Euro. American) x 2(Condition: Threat, No Threat). In such a case, we are testing the main effect of each of the factors AND also the interaction between the two of them.

2. To take a more hypothetical, way less realistic example…Say you're interested in the relationship between the type of computer people use, the coast that they're from, and how cool they are. In this case, our independent variables are Computer Type and Coast, and our dependent variable is some measure of coolness; let's operationalize it by saying that it's a score on a coolness scale. Here, we're interested in the main effect of each independent variable (e.g., are Mac users cooler than PC users AND are west coast people cooler than east coast people). We're also interested in whether or not there is an interaction between the two independent variables. Can you come up with some possible interactions?

3. To test this factorial ANOVA in SPSS, go to:

Analyze → General Linear Model → Univariate

Put your dependent variable (‘coolness’) in the Dependent Variable box and put your two independent variables (‘Comp_Type’ and ‘coast’) in the Fixed Factors box.

Click Options and then move all of your factors over to the Display Means for box, and then check off Descriptive Statistics below. Click Continue and then OK.

In the output, you’ll get a table that looks like this:


From that table, we can look at each independent variable and see that computer type is significant, but ‘coast’ is not. We also see that there is a (just barely) significant interaction between computer type and coast. What is the nature of these significant results? To find out, let’s consult the table of means that you generated:




We can see here that mac users are cooler than pc users, and that there is an interaction such that west coast mac users are cooler than east coast mac users, but east coast pc users are cooler than west coast pc users.

X. Hypothesis Testing – Chi-squared test of independence

1. Use a chi-squared test when you’re dealing with two categorical variables. This is a useful test when you are dealing with counts of something. For example, are there more men who like sports than women who like sports?

2. To test this relationship in SPSS, go to:

Analyze → Descriptive Statistics → Crosstabs and then put gender in the Rows box and ‘sportslike’ in the Columns box. It doesn’t matter which variable goes in which box here.

Click on Statistics and then click on Chi-square. Then click OK.

You’re first given a frequency table which has the counts for the variables of interest:



Just from a quick examination of this table, it looks like there are more males who like sports than there are females who like sports. Our chi-square statistic confirms this assumption:


XI. Hypothesis Testing – Correlation and Regression

1. Sometimes we might want to see how two continuous variables are related to each other (correlation), or we might want to see how one continuous variable predicts or explains the variance in another continuous variable (regression). REMEMBER: For correlation and regression both variables need to be QUANTITATIVE!

To start with correlation, perhaps we’re interested in whether or not the number of facebook friends someone has is related to the number of dates that that person gets during the year.

2. The first thing we might want to do is look at a scatterplot to see if there is any graphical relationship between ‘friends’ and ‘dates’. To do so, go to:

Graphs → Scatter/Dot → Simple Scatter and click Define.

Put your variables in the Y Axis and X Axis boxes. It doesn’t matter which way you put them in. Double click the graph and click on the Add Fit Line at Total. Here’s your graph:


It looks like there is a positive relationship between the two variables – that is, more dates equals more friends and vice versa. But, let's see if there's actually a statistically significant correlation between 'friends' and 'dates'. To do so, go to:

Analyze → Correlate → Bivariate and put your two variables into the Variables Box. Click on OK. Here’s the output:


We see that there is in fact a statistically significant positive correlation b/w number of facebook friends and number of dates: r = .31, and p < .001.

3. OK, but now we want to know whether or not the number of friends someone has will actually predict how many dates that they go on. To answer this sort of question, we need to do a regression.

First, what is our dependent variable? In this case, it’s ‘dates’ because that’s what we’re trying to predict. And, our independent variable is ‘friends’. In the language of the statistician, we are going to regress ‘dates’ on ‘friends’. Here’s how we do it in SPSS:

Analyze → Regression → Linear

Put your dependent variable in the Dependent box and your independent variable in the Independent box. Hit OK.





The output not only shows us that ‘friends’ is a significant predictor of ‘dates’, but it allows us to create an equation so that we can predict ‘dates’:

Dates = 6.738 + .038*Friends


XII. Frequently or Unfrequently Asked Questions

When was my Pfile created? You can find the information about when your Pfile was created in the Efile. If you do not have the Efile, you can reconstruct it from the Pfile using Gary Glover's program writeihdr12 (see the next question). The Efiles have lots of interesting information about your scan, such as when the scan started and the prescription of the scan.

How do I recreate an Efile from a Pfile? You can use Gary Glover's program writeihdr12 to recreate an Efile from the Pfile. The program is stored in /usr/local/bin on dmthal and nac, and in /home/span/bin/ on mpfc. The command is "writeihdr12 #Pfile".

APPENDIX D. ALTERNATIVES FOR MAKING REGRESSORS

The txt2master Script

SUMMARY: txt2master is a python script that creates a 1 column file that labels every single TR with its (a) trial type and (b) position within that type (eg, its identity as the first TR, second TR, third TR, etc, within that trial) INPUT: .txt Eprime output files (NOT .edat)
OUTPUT: a .vec file and .outkey file for each subject
RUN FROM: each individual subject’s directory
RUN ON: one subject at a time from his or her subject directory:

subjectname]$ ./../scripts/txt2master.py -outkey "exptname*.txt"

NOTE: txt2master is of course not the only way to achieve the same end result; the lab is moving to Matlab for more complex tasks with highly varying regressors (e.g. price differential)
ALSO NOTE: The wildcard (*) in the txt2master command tells it to search for n number of blocks and paste them together sequentially. Watch out for temporary files with a tilde after them – txt2master will pull those in as well, lengthening your task by another block. Always check the output of txt2master: it should have the same number of lines as TRs in your task.

The txt2master script can be pulled from the “scripts” folder in most experiment home directories. One of the first things you’ll want, as mentioned, is something that allows you to quickly pull data from TRs of interest within specific trial types (for example, you want to look at what happened in the brain during the choice period of a RISK task $1 win trial). The txt2master script is the first step toward making this information readily available to your analysis scripts.

Txt2master looks in the eprime .txt data output file and pulls information from each trial that enables it to figure out what type of trial it was. [You can reference the commented txt2master script at the end of this section.] It further divides the trial into the trial TRs relative to the beginning of the trial rather than the beginning of the block; that is, the numbers will repeat, as they only tag your position within a trial, not within the block as a whole. The script then summarizes this information in the simplest format possible in an output file of the form .vec, which will make it easy for us to reference the list of trial types later when we have other scripts 'call' that information for use in analysis. Each line of the .vec file is a TR; the top line represents your first TR in the experiment and the last line represents your last TR in the experiment. The letter and number combination on each line gives you all the information you'll need to know about each TR for your models.

The input file for your txt2master script should be in each subject’s folder. It looks something like this:

RiskStraightUp-428-1.txt	(taskname-subjectnumber-block.txt)

To run the script, which lives in the script directory, you need only type

$ ./../scripts/txt2master.py -outkey "exptname*.txt"

from each subject’s home directory. The script will only run on that subject, so you need to do this for each subject from each separate directory. The nice thing, as previously mentioned, about the wildcard * is that the script will look for all .txt files in that format, so it can deal with multiple blocks automatically. (For example, you might have –outkey “BipoMIDKLC*.txt”, which would search for all .txt eprime files and would intelligently paste them together.) If you don’t have multiple blocks it will only do it for the one block. But remember that having a ~ file (backup) for any of your eprime .txt files can mess things up … for some reason it will pull it in as an additional block even though it’s a replicate and you don’t have a wildcard at the end of the file name.

Note that the -outkey flag is telling the script to create your outkey file. Outkey is a key that will tell you which letters correspond to which trial types. The filename in quotation marks is your eprime .txt input file.

The output of your text2master script comes in the form:

xx.vec (where xx are the subject’s initials) or something_else.vec

If your output is not already in the format xx.vec, you may want to change the name appropriately. This will make it easy to distinguish between .vec files belonging to different subjects in the experiment.

span@dmthal riskfmri]$ mv something_else.vec xx.vec

Open the .vec file in emacs to check it

span@dmthal riskfmri]$ emacs filename.vec

What you find inside the .vec file should look something like this:

B1
B2
B3
B4
A1
A2
A3
A4
B1
B2
B3
B4
B5
B6
C1
C2
C3
C4
A1
A2
A3
A4
A5
A6
B1
B2
B3
B4
B5
B6
C1
C2
C3
C4
C5
Etc.

Here we’re viewing a .vec file from the variable ITI RISK task. In this case our trial types are as follows

A: +$0.10 win trial
B: +$1 win trial
C: -$0.10 loss trial
D: -$1.00 loss trial

The number next to the trial type letter corresponds to each TR within that particular trial relative to the beginning of that trial (NOT the beginning of the block!). So in the case of the risk task:

A1: 1st TR, anticipation period +$0.10 win trial
A2: 2nd TR, choice period +$0.10 win trial
A3: 3rd TR, outcome period +$0.10 win trial
A4: 4th TR, ITI period (Inter-Trial-Interval, aka fixation) +$0.10 win trial

So the first set of Bs comprises trial 1, the next set of As comprises trial 2, et cetera, relative to the beginning of the block. However, as previously mentioned, do not confuse the numbers in the .vec with the TR or trial number relative to the beginning of the block.

Note that in the long list of A,B,C, and D above some of the trials have a 5th and 6th TR ITI period. This is because the experiment is “jittered,” meaning that the ITIs between trials vary in length. In a non-variable ITI experiment there should be the same number of TRs in all trials.

Keep in mind that your “trial types” are arbitrary insofar as you can specify what is meant by “trial type.” You could also decide to use upper and lowercase letters to correspond to wins and losses, respectively (e.g., A1 for $1 wins and a2 for $1 losses); the important thing for the scripts that will use the .vec information is to leave the integrity of the “letter-number” format in place.

Sanity Check: to check and make sure you had the appropriate number of trials of each type, you can go to each subject’s directory and type

$ grep -c A1 xx.vec

Where A1 is a stand-in for your trial type of interest (change it to reflect what you want to look at, of course), and xx.vec is your subject’s .vec file (change xx to his or her initials). The grep command will print out the number of entries of that name (eg, A1) in the file … QED, the number of trials of that type.

Sometimes txt2master will spit out a .vec file that has some letters you weren't expecting, often lowercase l, m, n, and o's. Go look at your edat file; most likely these correspond to trials in which the subject failed to respond. They could also correspond to trials in which the subject failed to meet a specific criterion (e.g., loss trials on the MID task).


When creating your trial types it's important to avoid collinearity – that is, trial types that share some attribute or set of attributes. For more about this, see the section covering the riskbasicreg script a few pages further on.


In summary, txt2master takes your eprime .txt data file as input and creates a .vec file and a .outkey file as output. The .vec file tags each experiment TR by trial type, while the .outkey provides a key to the .vec file, listing the associations between letter-number pairs and trial types/TRs within a trial.

One thing more – in the case of the risk task, we had to create an intermediary script to get the nice .vec list you see above. We knew we’d have to do this as soon as we opened our outkey file. Here’s why.

When we opened .outkey for the risk task

span@dmthal subjectfolder]$ emacs Risk.outkey

We got

+$0.10(1TRs):A
+$0.10(2TRs):B
+$0.10(3TRs):C
+$1.00(1TRs):D
+$1.00(2TRs):E
+$1.00(3TRs):F
-$0.10(1TRs):G
-$0.10(2TRs):H
-$0.10(3TRs):I
-$1.00(1TRs):J
-$1.00(2TRs):K
-$1.00(3TRs):L

The script is smart; it’s differentiating between the 1, 2, and 3 TR ITIs that occur for each trial type. If this happens to you, make sure you recode something somewhere (in eprime, or txt2master, or at this stage with a rescripting script) to deal with this fact so that you end up with the appropriate number of trial types (unless, of course, you want to consider the length of the ITI in your trial type). We created a script called “rescript” to recode these into four categories collapsed across ITI length, creating a more appropriate relationship between the letters and trial types:

+$0.10 (1,2, and 3 TR ITIs): A
+$1.00 (1,2, and 3 TR ITIs): B
-$0.10 (1,2, and 3 TR ITIs): C
-$1.00 (1,2, and 3 TR ITIs): D

Rescript doesn’t create a new outkey key, however – it just creates a new .vec file with the correct trial-letter associations.

Rescript (if necessary for variable ITI experiments)

The rescript script can be found on dmthal in data3, riskfmri:

span@dmthal]$ cd /data3/riskfmri/scripts/
span@dmthal scripts]$ emacs rescript

Taking a look now at the script, you'll note that it first re-assigns each ITI variant of each trial type to an intermediate letter that is still available (not already in use), THEN re-assigns those to A, B, C, and D. This is because we could run into trouble going straight to ABCD. For example, if the script first hits a G, which it reassigns to C, it could then hit that same C again and reassign it to A – the wrong trial type in our end result! Be careful with what you're putting in and getting out. Take a moment to make sure there isn't a chance you'll be recoding to the wrong trial type.

Lookup tables and their scripts

SUMMARY: each m2v* script tags TRs of interest for a given contrast; there is a separate m2v script for each contrast
INPUT: .vec files
OUTPUT: contrastname.1D file for each subject – a column of 0s, 1s, and -1s
This output is a step function; the 1 and -1 entries are tagging events of interest from particular trial types. If you were to place the .1D file next to the subject’s .vec file each line in the first would correspond to the same line in the second, and therefore that trial type and TR.
RUN FROM: the reg script (see next section) calls and runs these scripts, so you don’t need to do so
RUN ON: the reg script calls and runs each contrast on each subject’s data


Great. So we have our list of trial types divided up by TRs, each of which corresponds to a part of each trial: anticipation, choice, decision, ITI, et cetera. Now we want to create what's called a 'lookup table.' The lookup table will be used to 'flag' only the TRs of interest for a particular analysis we want to run – for example, we may want to see whether there are brain regions that show greater activity when a subject wins $1 as compared to trials in which the subject loses $1. In this case we would want to flag the $1 win trial outcome TR, as that is the TR during which the subject learns of the win or loss. Lumping in data from anticipation and choice periods would confuse our analysis for obvious reasons.

Lookup tables flag these trial TRs of interest by marking each TR with a 1 or -1 if we care about it, and a 0 if we don’t (the 1 and -1 correspond to the contrasts: in the case of activation greater for X than for Y we might choose to flag X with 1 and Y with -1). You’ll see why it’s done in this way in the discussion of the riskbasicreg script, our next step, below.
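As a hypothetical fragment (using the RISK coding above, where the third TR of each trial is the outcome period), a subject's .vec file and the .1D output for a $1-win-versus-$1-loss outcome contrast would line up like this:

.vec	contrast .1D
B1	0
B2	0
B3	1	(outcome TR of a +$1.00 win trial, flagged +1)
B4	0
D1	0
D2	0
D3	-1	(outcome TR of a -$1.00 loss trial, flagged -1)
D4	0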

The generic letter-number representation of the .vec format is important. It allows us to replace these letter-number pairs with any set of numbers, at any particular timepoint in any trial type. If we created number-only regressors first, we would be stuck with those numbers and have no flexibility in creating other regressor types without starting from scratch.

A lookup table creates, in effect, a step function. It is like a fancy find-and-replace.

The lookup table scripts can be pulled from the “scripts” folder in most experiment home directories. Lookup scripts begin with the prefix “m2v” (Master to Vector). Each contrast (also known as a regressor) has its own lookup table script. Examples of contrasts/regressors for our RISK example include:

[M2vtables.jpg: table of example RISK contrasts/regressors and their m2v lookup-table scripts]

Switching. Oftentimes we’re interested in what is going on when subjects decide to switch their behavior or pursue a new strategy.

To do this we'd want to edit our lookup tables such that only switches are tagged. This is done by hand by going into the m2v file and zeroing out non-switches. You'd then save that file under a new name denoting it as a switch regressor.


Your output should be a contrastname.1D file (e.g., hvlant.1D).

Summary of lookup tables.

A “contrast” is a comparison of interest. The word “regressor” is used to mean the same thing as “contrast”, since the contrast is what we will be regressing (comparing) to an ideal outcome in our statistics. Lookup tables flag TRs of interest by multiplying the functions by 1 or -1; TRs that are not of interest are flattened by multiplying them by 0.