Output background-corrected probe intensities


Overview

This page describes how to generate files of background-corrected probe intensities. These files can then be processed by Perl and R scripts to produce GeneBASE estimates.

Step 1: Download the ProbeEffects program

Instructions are available on the ProbeEffects portion of the download page.

Step 2: Create a parameter file

A parameter file specifies the following types of information:
A log file that records the progress of the computation and any error messages.
Exon array annotation, including the pgf, clf and probeset annotation files. These can be downloaded from the annotation page.
The exon array data. We recommend a diverse set of samples for probe selection. The Affymetrix tissue panel data may be combined with a small number of other exon arrays.
An output file that stores the resulting background-corrected, normalized probe intensities.
Model parameters that specify the choice of background correction and normalization method.

Examples

We provide several sample parameter files which can be modified. Detailed descriptions of the parameters are given below.

***Note that flags and parameter values are separated by tabs***

MAT background correction, scalar normalization.
MAT background correction, no normalization.
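
As a rough illustration of the tab-separated layout, the following Python sketch writes a parameter file for MAT background correction with scalar normalization. It assumes that the bracketed section names listed in the Description appear as lines of their own in the file, which should be checked against the sample files. Every file and folder name is a placeholder, not a file shipped with ProbeEffects, and the helper names render_parameter_file and write_parameter_file are hypothetical.

```python
# Sketch: build a ProbeEffects-style parameter file with one tab between
# each flag and its value. The bracketed section lines and all file names
# are assumptions for illustration; compare with the provided sample files.

cel_files = ["tissue1.CEL", "tissue2.CEL", "tissue3.CEL"]  # placeholder names

sections = {
    "log": {"logfile": "probe_effects.log"},
    "exon_annotation": {
        "probeset_annotation": "probeset_annotation.csv",
        "pgf_file": "exon_array.pgf",
        "clf_file": "exon_array.clf",
    },
    "exon_data": {
        "folder": "cel_files/",
        # Arrays are joined by a single "," with no spaces.
        "exon_cel_files": ",".join(cel_files),
    },
    "output": {
        "output_model_fit": "model_fit.txt",
        "output_all_bkgd_correct_norm_probes": "true",
        "bkgd_correct_norm_probes_file": "bkgd_corrected",
    },
    "model": {
        "array_type": "exon",
        "method": "mat",
        "train_model": "true",
        "mat_training_probe_type": "background",
        "normalization_method": "core_probe_scaling",
    },
}


def render_parameter_file(sections):
    """Return the parameter file text, one tab between flag and value."""
    lines = []
    for name, params in sections.items():
        lines.append(f"[{name}]")
        for flag, value in params.items():
            lines.append(f"{flag}\t{value}")
    return "\n".join(lines) + "\n"


def write_parameter_file(path, sections):
    """Write the rendered parameter text to the given path."""
    with open(path, "w") as handle:
        handle.write(render_parameter_file(sections))


parameter_text = render_parameter_file(sections)
```

Calling write_parameter_file("parameterFile.txt", sections) would produce a file that can be edited by hand. Generating the file from a script mainly guards against spaces slipping in where tabs are required.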

Description
[log]
logfile
The name of the file to log progress, errors, etc.
[exon_annotation]
probeset_annotation
The probeset annotation file specifies the grouping of probesets into transcript clusters and the level of annotation supporting each probeset. See the annotation page to download.
pgf_file
The pgf file specifies the grouping of probes into probesets. See the annotation page to download.
clf_file
The clf file describes the position of each probe on the chip. Clf files whose description contains "crosshyb_x" also include the mapping of probes to off-targets, allowing an edit distance of "x" base pairs. To generate GeneBASE-xhyb estimates, a clf file with "crosshyb_x" must be specified. See the annotation page for details.
[exon_data]
folder
The folder storing the array cel files.
exon_cel_files
A list of cel files, each array separated by a single "," and no spaces.
[output]
output_model_fit
A file storing the MAT fitted coefficients and R-squared values.
output_all_bkgd_correct_norm_probes
When set to "true", one file containing background-corrected, normalized probe intensities is output for each array.
bkgd_correct_norm_probes_file
A prefix for the set of files created for each array.
[model]
array_type
The type of array analyzed. Here a value of "exon" should be specified.
method
Background-correction method. One of (mat, median_gc, none)
train_model
This value should be set to "true" to output background-corrected normalized probe intensities.
mat_training_probe_type
The probes used for training the background model. One of (background, full). Defaults to background.
normalization_method
Normalization method. One of (core_probe_scaling, none, quantile). The core_probe_scaling method applies a scalar to each array so that the median of background-corrected core probe intensities is equal to 100. The none method applies no normalization in addition to the background correction. The quantile method applies a quantile normalization (followed by background correction).
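
To make the core_probe_scaling description concrete, here is a small Python sketch of the arithmetic: one scalar per array, chosen so that the median of the background-corrected core probe intensities becomes 100. The function name and the toy intensities are hypothetical. This is not the ProbeEffects implementation, only an illustration of the definition above.

```python
from statistics import median

TARGET_MEDIAN = 100.0  # target value stated in the description above


def core_probe_scale_factor(core_intensities):
    """Scalar that maps the median of the core probe intensities to 100."""
    m = median(core_intensities)
    if m <= 0:
        raise ValueError("median of core probe intensities must be positive")
    return TARGET_MEDIAN / m


# Toy background-corrected intensities for one array (made-up values).
core_probes = [40.0, 50.0, 60.0]
all_probes = [10.0, 40.0, 50.0, 60.0, 200.0]

factor = core_probe_scale_factor(core_probes)  # 100 / 50 = 2.0
scaled = [x * factor for x in all_probes]      # the same scalar for every probe
```

Because the factor is computed separately for each array, arrays with different overall brightness end up with the same core-probe median.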

Step 3: Run the ProbeEffects program

Run the program from the command line, passing the parameter file (here "parameterFile.txt") with the -par flag:

./ProbeEffects2.0 -par parameterFile.txt
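
When several parameter files need to be processed, the same command can be launched from a script. The Python sketch below only wraps the command shown above. The function names are hypothetical, and it assumes, without confirmation from the ProbeEffects documentation, that the program returns a nonzero exit status on failure.

```python
import subprocess


def build_command(parameter_file, executable="./ProbeEffects2.0"):
    """Return the argument list for the command shown above."""
    return [executable, "-par", parameter_file]


def run_probe_effects(parameter_file):
    """Run ProbeEffects; raises CalledProcessError on a nonzero exit status."""
    return subprocess.run(build_command(parameter_file), check=True)
```

A call such as run_probe_effects("parameterFile.txt") should be made from the directory that contains the executable. The log file is still worth checking afterwards, since the exit status alone may not reflect every problem.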

Step 4: Examine results

The program outputs a log file which should be checked for errors.
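
For long logs, a simple scan can help locate problems. The Python sketch below lists every line that contains a keyword, ignoring case. The exact wording of ProbeEffects log messages is not documented here, so the default keyword "error" is an assumption, and the function name is hypothetical.

```python
def find_error_lines(log_path, keyword="error"):
    """Return (line_number, text) pairs for log lines containing the keyword."""
    hits = []
    with open(log_path, "r", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if keyword.lower() in line.lower():
                hits.append((number, line.rstrip("\n")))
    return hits
```

An empty result does not prove that the run succeeded, so it is still worth reading the end of the log by eye.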

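The background-corrected probe intensities will be output in one file per array, each named with the prefix given by the "bkgd_correct_norm_probes_file" parameter in the parameter file. The MAT fitted coefficients and R-squared values are written to the file given by the "output_model_fit" parameter.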

Step 5: Download ProbeSelection

Download the ProbeSelection program from the download page.

Step 6: Run ProbeSelection

Detailed instructions for running ProbeSelection can be found in the file ReadMe_ProbeSelection.