fileMuncher {AnnBuilder} | R Documentation |
This function takes a base file, a source file, and a segment of Perl script specifying how the source file will be parsed, and then generates a fully executable Perl script that is called to parse the source file.
fileMuncher(outName, baseFile, dataFile, parser, isDir = FALSE)
mergeRowByKey(mergeMe, keyCol = 1, sep = ";")
outName |
outName a character string for the name of the file
where the parsed data will be stored |
baseFile |
baseFile a character string for the name of the
file that is going to be used as the base to process the source
file. Only data corresponding to the ids defined in the
base file will be processed and mapped |
dataFile |
dataFile a character string for the name of the
source data file |
parser |
parser a character string for the name of the
file containing a segment of a Perl script for parsing the
source file. An output connection to OUT for storing parsed
data, an input connection to BASE for importing the base file, and an
input connection to DATA for reading the source data file are
assumed to be open. parser should define how BASE and DATA will be
used to extract data and then store them in OUT |
pathForPerl |
A character string for the path where temporary Perl scripts will be stored. |
isDir |
isDir a boolean indicating whether dataFile is the
name of a directory (TRUE) or not (FALSE) |
mergeMe |
mergeMe a data matrix that is going to be
processed to merge rows with duplicate keys |
keyCol |
keyCol an integer for the index of the column
containing the keys based on which entries will be merged |
sep |
sep a character string for the separator used to
separate multiple values |
The system is assumed to be able to run Perl. Dynamically generated Perl scripts are removed after execution.
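As a rough illustration of the script generation described above, the sketch below wraps a parser segment in code that opens the BASE, DATA and OUT file handles the segment expects. The helper buildPerlScript and the one-line parser segment are hypothetical and are not part of AnnBuilder; the script that fileMuncher actually generates may be laid out differently.

```r
# Hypothetical helper (not part of AnnBuilder): wrap a parser segment in the
# code that opens the BASE, DATA and OUT file handles it expects.
buildPerlScript <- function(outName, baseFile, dataFile, parser) {
    header <- c(
        "#!/usr/bin/perl",
        sprintf('open(BASE, "<", "%s") or die "Cannot open base file: $!";', baseFile),
        sprintf('open(DATA, "<", "%s") or die "Cannot open data file: $!";', dataFile),
        sprintf('open(OUT, ">", "%s") or die "Cannot open output file: $!";', outName)
    )
    segment <- readLines(parser)   # user-supplied parsing logic
    footer <- c("close(BASE);", "close(DATA);", "close(OUT);")
    c(header, segment, footer)
}

# A trivial parser segment (assumed for illustration) that copies DATA to OUT
parserFile <- tempfile("parser")
writeLines("print OUT $_ while (<DATA>);", parserFile)

# Assemble the script, write it to a temporary file and display it
script <- buildPerlScript("temp", "tempBase", "srcData", parserFile)
scriptFile <- tempfile("munch", fileext = ".pl")
writeLines(script, scriptFile)
cat(script, sep = "\n")

# On a system with Perl, the script could then be run before being removed:
# system(paste("perl", shQuote(scriptFile)))
unlink(c(scriptFile, parserFile))
```

The file names "temp", "tempBase" and "srcData" are placeholders; nothing is read from or written to them unless the commented system call is run.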
mergeRowByKey
merges data based on common keys.
Multiple values for a given key will be separated by "sep".
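The merging behaviour can be pictured with a small base R sketch. The function mergeRowsSketch below is a hypothetical re-implementation for illustration only, not the code used by mergeRowByKey; in particular, dropping repeated values within a key is an assumption.

```r
# Hypothetical re-implementation for illustration; not AnnBuilder's code.
mergeRowsSketch <- function(mergeMe, keyCol = 1, sep = ";") {
    keys <- unique(mergeMe[, keyCol])
    rowsByKey <- lapply(keys, function(key) {
        rows <- mergeMe[mergeMe[, keyCol] == key, , drop = FALSE]
        # Collapse the distinct values found in each column for this key
        # (dropping repeats is an assumption of this sketch)
        apply(rows, 2, function(col) paste(unique(col), collapse = sep))
    })
    # One merged row per distinct key
    do.call(rbind, rowsByKey)
}

# Two rows share the key "32469_at"; their values are joined with ";"
temp <- matrix(c("32469_at", "L00693",
                 "32469_at", "D90278",
                 "33825_at", "X68733"), ncol = 2, byrow = TRUE)
merged <- mergeRowsSketch(temp, keyCol = 1, sep = ";")
print(merged)
```

The sketch assumes a character matrix with at least one row and no missing keys.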
fileMuncher
returns a character string for the name of
the output file
mergeRowByKey
returns a matrix with merged data.
This function is part of the Bioconductor project at Dana-Farber Cancer Institute to provide bioinformatics functionalities through R.
Jianhua Zhang
if(interactive()){
    path <- file.path(.path.package("pubRepo"), "data")
    temp <- matrix(c("32469_f_at", "D90278", "32469_at", "L00693",
                     "33825_at", "X68733", "35730_at", "X03350",
                     "38912_at", "D90042", "38936_at", "M16652"),
                   ncol = 2, byrow = TRUE)
    write.table(temp, "tempBase", sep = "\t", quote = FALSE,
                row.names = FALSE, col.names = FALSE)
    # Parse a truncated version of LL_tmpl.gz from Bioconductor
    srcFile <- loadFromUrl("http://www.bioconductor.org/datafiles/wwwsources/Tll_tmpl.gz")
    fileMuncher(outName = "temp", baseFile = "tempBase", dataFile = srcFile,
                parser = file.path(path, "basedLLParser"), isDir = FALSE)
    # Show the parsed data
    read.table(file = "temp", sep = "\t", header = FALSE)
    unlink("tempBase")
    unlink("temp")
}