fileMuncher              package:AnnBuilder              R Documentation

Dynamically create a Perl script to parse a source file based on user specifications

Description:

     This function takes a base file, a source file, and a segment of
     Perl script specifying how the source file will be parsed, and then
     generates a fully executable Perl script that is called to parse
     the source file.

Usage:

     fileMuncher(outName, baseFile, dataFile, parser, isDir = FALSE)
     mergeRowByKey(mergeMe, keyCol = 1, sep = ";")

Arguments:

 outName: a character string for the name of the file where the parsed
          data will be stored.

baseFile: a character string for the name of the file that is used as
          the base to process the source file. Only data corresponding
          to the ids defined in the base file will be processed and
          mapped.

dataFile: a character string for the name of the source data file.

  parser: a character string for the name of the file containing a
          segment of a Perl script for parsing the source file. An
          output connection to OUT for storing parsed data, an input
          connection to BASE for importing the base file, and an input
          connection to DATA for reading the source data file are
          assumed to be open. The segment should define how BASE and
          DATA are used to extract data and then store them in OUT.

pathForPerl: a character string for the path to which temporary Perl
          scripts will be stored.

   isDir: a boolean indicating whether dataFile is the name of a
          directory (TRUE) or not (FALSE).

 mergeMe: a data matrix to be processed so that rows with duplicate
          keys are merged.

  keyCol: an integer for the index of the column containing the keys
          on which entries will be merged.

     sep: a character string for the separator used to separate
          multiple values.

Details:

     The system is assumed to be able to run Perl.
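     The wrapping step can be pictured with the base-R sketch below. It
     is not the AnnBuilder implementation: buildPerlScript and
     runPerlSketch are hypothetical helpers, and the exact open/close
     statements, a "perl" executable on the search path, and the use of
     tempfile() for the script location are all assumptions. It only
     shows the idea that the parser segment is placed between statements
     that open OUT, BASE and DATA and close them afterwards, and that the
     generated script is deleted once Perl has run. The isDir case is
     not handled.

```r
# Hypothetical helper (not part of AnnBuilder): wrap a user-supplied Perl
# parsing segment with the statements that open OUT, BASE and DATA,
# which the segment assumes are already open.
buildPerlScript <- function(outName, baseFile, dataFile, parserLines) {
  c("#!/usr/bin/perl",
    sprintf('open(OUT, ">%s") || die "Cannot open %s";', outName, outName),
    sprintf('open(BASE, "<%s") || die "Cannot open %s";', baseFile, baseFile),
    sprintf('open(DATA, "<%s") || die "Cannot open %s";', dataFile, dataFile),
    parserLines,                 # the user's parsing segment, unchanged
    "close(OUT);",
    "close(BASE);",
    "close(DATA);")
}

# Hypothetical runner (assumes "perl" is on the PATH): write the assembled
# script to a temporary file, execute it, and remove it afterwards.
runPerlSketch <- function(outName, baseFile, dataFile, parserFile) {
  script <- tempfile(fileext = ".pl")
  on.exit(unlink(script), add = TRUE)   # generated script is removed
  writeLines(buildPerlScript(outName, baseFile, dataFile,
                             readLines(parserFile)), script)
  status <- system2("perl", script)     # returns the exit status
  if (status != 0) stop("Perl script exited with status ", status)
  invisible(outName)                    # name of the output file
}
```

     Only buildPerlScript is needed to inspect the generated script;
     runPerlSketch additionally requires a working Perl installation.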
     Perl scripts generated dynamically will also be removed after
     execution.

     `mergeRowByKey' merges data based on common keys. Multiple values
     for a given key will be separated by `sep'.

Value:

     `fileMuncher' returns a character string for the name of the
     output file.

     `mergeRowByKey' returns a matrix with merged data.

Note:

     This function is part of the Bioconductor project at Dana-Farber
     Cancer Institute to provide bioinformatics functionality through R.

Author(s):

     Jianhua Zhang

See Also:

     `resolveMaps'

Examples:

     if(interactive()){
     path <- file.path(find.package("pubRepo"), "data")
     temp <- matrix(c("32469_f_at", "D90278", "32469_at", "L00693",
                      "33825_at", "X68733", "35730_at", "X03350",
                      "38912_at", "D90042", "38936_at", "M16652"),
                    ncol = 2, byrow = TRUE)
     write.table(temp, "tempBase", sep = "\t", quote = FALSE,
                 row.names = FALSE, col.names = FALSE)
     # Parse a truncated version of LL_tmpl.gz from Bioconductor
     srcFile <- loadFromUrl(
         "http://www.bioconductor.org/datafiles/wwwsources/Tll_tmpl.gz")
     fileMuncher(outName = "temp", baseFile = "tempBase",
                 dataFile = srcFile,
                 parser = file.path(path, "basedLLParser"), isDir = FALSE)
     # Show the parsed data
     read.table(file = "temp", sep = "\t", header = FALSE)
     unlink("tempBase")
     unlink("temp")
     }
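     The key-merging behaviour described for `mergeRowByKey' can be
     approximated in base R as below. This is a sketch, not the package
     code: mergeRowsByKeySketch is a hypothetical name, and keeping keys
     in order of first appearance and dropping repeated values within a
     column (via unique()) are assumptions about how the real function
     behaves.

```r
# Hypothetical sketch of merging rows that share a key; not AnnBuilder code.
mergeRowsByKeySketch <- function(mergeMe, keyCol = 1, sep = ";") {
  mergeMe <- as.matrix(mergeMe)
  keys <- unique(mergeMe[, keyCol])      # keys in order of first appearance
  pieces <- lapply(keys, function(k) {
    rows <- mergeMe[mergeMe[, keyCol] == k, , drop = FALSE]
    # Assumption: repeated values in a column are collapsed with unique();
    # the remaining values are joined with `sep`.
    apply(rows, 2, function(col) paste(unique(col), collapse = sep))
  })
  merged <- do.call(rbind, pieces)       # one row per key
  rownames(merged) <- NULL
  merged
}
```

     For example, merging the rows ("k1", "a"), ("k2", "b") and
     ("k1", "c") on the first column gives ("k1", "a;c") and ("k2", "b").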