ABPkgBuilder           package:AnnBuilder           R Documentation

Functions that support a single API for building data packages

Description:

     These functions support a single API, represented by
     ABPkgBuilder, that allows users to build annotation data
     packages by providing a limited number of parameters. The
     supporting functions work out the remaining parameters.

Usage:

     ABPkgBuilder(baseName, srcUrls, baseMapType = c("gb", "ug", "ll"),
                  otherSrc = NULL, pkgName, pkgPath,
                  organism = c("human", "mouse", "rat"),
                  version = "1.1.0", makeXML = TRUE,
                  author = c(name = "who", address = "who@email.com"),
                  fromWeb = TRUE)
     getBaseParsers(baseMapType = c("gb", "ug"))
     createEmptyDPkg(pkgName, pkgPath, force = TRUE)
     getDirContent(dirName, exclude = NULL)
     getMultiColNames()
     getUniColNames()
     getTypeColNames()
     splitEntry(dataRow, sep = ";", asNumeric = FALSE)
     twoStepSplit(dataRow, entrySep = ";", eleSep = "@", asNumeric = FALSE)

Arguments:

baseName: a character string for the name of a file to be used as the
          base file for the source data. The file is assumed to have
          two tab-separated ("\t") columns: the first holds the names
          of the genes (probes) to be annotated and the second holds
          the mappings to GenBank accession numbers, UniGene ids, or
          LocusLink ids

 srcUrls: a vector of named character strings for the URLs from which
          source data files will be retrieved. Valid sources are
          LocusLink, UniGene, Golden Path, Gene Ontology, and KEGG.
          The names for the character strings should be LL, UG, GP,
          GO, and KEGG, respectively.
          LL and UG are required

baseMapType: a character string that is either "gb", "ug", or "ll" to
          indicate whether the probe ids in baseName are mapped to
          GenBank accession numbers, UniGene ids, or LocusLink ids

otherSrc: a vector of named character strings for the names of files
          that contain mappings between the probe ids of baseName and
          LocusLink ids. These files are used to obtain the unified
          mappings between the probe ids of baseName and LocusLink
          ids based on all the sources. The strings should not
          contain any numbers, and the files must have the same
          structure as baseName

 pkgName: a character string for the name of the data package to be
          built (e.g. hgu95a, rgu34a)

 pkgPath: a character string for the full path of an existing
          directory where the built package will be stored

organism: a character string for the name of the organism of concern
          (currently only "human", "mouse", or "rat")

 version: a character string for the version number

 makeXML: a boolean to indicate whether an XML version will also be
          generated

  author: a named vector of character strings with a `name' element
          for the name of the author and an `address' element for the
          email address of the author

   force: a boolean that is set to TRUE if the package to be created
          will replace an existing package with the same name

 dirName: a character string for the name of a directory whose
          contents are of interest

 exclude: a character string for a pattern-matching parameter that
          will be used to exclude the contents of a directory that
          match the pattern

 dataRow: a character string containing data elements, with elements
          separated by `sep' or `entrySep' and a descriptive string
          attached to each element following `eleSep'

     sep: a character string for a separator

entrySep: a character string for a separator

  eleSep: a character string for a separator

asNumeric: a boolean that is TRUE when the
          split values will be returned as numeric values

 fromWeb: a boolean to indicate whether the source data will be
          downloaded from the web or read from a local file

Details:

     These functions are the result of an effort to make data package
     building easier for users. As a result, users have limited
     control over the process and its inputs. Additionally, some of
     the built-in functions that figure out the URLs for source data
     may fail when maintainers of the data source web sites change
     the name, structure, etc. of the source data. When that happens,
     users may have to follow the instructions in the vignette named
     AnnBuilder to build data packages.

     `getBaseParsers' figures out which of the built-in parsers to
     use to parse the source data, based on the type of mapping done
     for the probes.

     `createEmptyDPkg' creates an empty package with the required
     subdirectories for data to be stored.

     `getMultiColNames' figures out which data elements for
     annotation have many-to-one relations with a probe. The many
     parts are separated by a separator in the parsed annotation
     data.

     `getUniColNames' figures out which data elements for annotation
     have one-to-one relations with a probe.

     `getTypeColNames' figures out which data elements for annotation
     have many-to-one relations with a probe and additional
     information appended to the end of each element following a
     separator. The many parts are also separated by a separator in
     the parsed annotation data.

     `splitEntry' splits entries by a separator.

     `twoStepSplit' splits entries by the separator specified by
     `entrySep' and the descriptive information of each element by
     `eleSep'.

Value:

     `getBaseParsers' returns a named vector for the names of the
     parsers to use to parse the source data.

     `getDirContent' returns a vector of character strings for the
     contents of a directory of interest.

     `getMultiColNames' returns a vector of character strings.

     `getUniColNames' returns a vector of character strings.
     `getTypeColNames' returns a vector of character strings.

     `splitEntry' returns a vector of character strings.

     `twoStepSplit' returns a named vector of character strings. The
     names are the descriptive information appended to each element
     by `eleSep'.

Note:

     The functions are part of the Bioconductor project at
     Dana-Farber Cancer Institute to provide bioinformatics
     functionality through R.

Author(s):

     Jianhua Zhang

References:

     HowTo and AnnBuilder vignettes

See Also:

     `GOPkgBuilder', `KEGGPkgBuilder'

Examples:

     # Create a temporary directory for the data
     myDir <- tempdir()
     # Create a temp base data file
     geneNMap <- matrix(c("32468_f_at", "D90278",
                          "32469_at", "L00693",
                          "32481_at", "AL031663",
                          "33825_at", " X68733",
                          "35730_at", "X03350",
                          "36512_at", "L32179",
                          "38912_at", "D90042",
                          "38936_at", "M16652",
                          "39368_at", "AL031668"), ncol = 2, byrow = TRUE)
     write.table(geneNMap, file = file.path(myDir, "geneNMap"),
                 sep = "\t", quote = FALSE, row.names = FALSE,
                 col.names = FALSE)
     # Urls for truncated versions of source data
     mySrcUrls <- c(LL = "http://www.bioconductor.org/datafiles/wwwsources/Tll_tmpl.gz",
                    UG = "http://www.bioconductor.org/datafiles/wwwsources/Ths.data.gz",
                    GO = "http://www.bioconductor.org/datafiles/wwwsources/Tgo.xml")
     # Create temp files for other sources
     temp <- matrix(c("32468_f_at", NA,
                      "32469_at", "2",
                      "32481_at", NA,
                      "33825_at", " 9",
                      "35730_at", "1576",
                      "36512_at", NA,
                      "38912_at", "10",
                      "38936_at", NA,
                      "39368_at", NA), ncol = 2, byrow = TRUE)
     write.table(temp, file = file.path(myDir, "srcone"), sep = "\t",
                 quote = FALSE, row.names = FALSE, col.names = FALSE)
     temp <- matrix(c("32468_f_at", NA,
                      "32469_at", NA,
                      "32481_at", "7051",
                      "33825_at", NA,
                      "35730_at", NA,
                      "36512_at", "1084",
                      "38912_at", NA,
                      "38936_at", NA,
                      "39368_at", "89"), ncol = 2, byrow = TRUE)
     write.table(temp, file = file.path(myDir, "srctwo"), sep = "\t",
                 quote = FALSE, row.names = FALSE, col.names = FALSE)
     otherMapping <- c(srcone = file.path(myDir, "srcone"),
                       srctwo = file.path(myDir, "srctwo"))
     # Runs only upon user's request
     if(interactive()){
         ABPkgBuilder(baseName = file.path(myDir, "geneNMap"),
                      srcUrls = mySrcUrls, baseMapType = "gb",
                      otherSrc = otherMapping, pkgName = "myPkg",
                      pkgPath = myDir, organism = "human",
                      version = "1.1.0", makeXML = TRUE,
                      author = c(name = "myname",
                                 address = "myname@myemail.com"))
         # Output files
         list.files(myDir)
         # Content of the data package
         list.files(file.path(myDir, "myPkg"))
         list.files(file.path(myDir, "myPkg", "data"))
         list.files(file.path(myDir, "myPkg", "man"))
         list.files(file.path(myDir, "myPkg", "R"))
         # Remove the package directory (recursive) and the XML files
         unlink(file.path(myDir, "myPkg"), TRUE)
         unlink(c(file.path(myDir, "myPkg.xml"),
                  file.path(myDir, "myPkgByNum.xml")))
     }
     # Remove the temporary input files
     unlink(c(file.path(myDir, "geneNMap"), file.path(myDir, "srcone"),
              file.path(myDir, "srctwo")))
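
     The splitting behaviour described for `splitEntry' and
     `twoStepSplit' can be illustrated without the package installed.
     The following is a minimal sketch in base R, not the AnnBuilder
     implementation: the function names `splitEntrySketch' and
     `twoStepSplitSketch' are hypothetical, the input strings are
     made up for illustration, and the handling of an element that
     has no `eleSep' part (its name becomes NA) is an assumption.

```r
# Hypothetical stand-ins written in base R. They are NOT the AnnBuilder
# functions; they only mirror the behaviour described in this help page.
splitEntrySketch <- function(dataRow, sep = ";", asNumeric = FALSE) {
    # Split one string on a literal (non-regex) separator
    parts <- unlist(strsplit(dataRow, sep, fixed = TRUE))
    if (asNumeric) as.numeric(parts) else parts
}

twoStepSplitSketch <- function(dataRow, entrySep = ";", eleSep = "@",
                               asNumeric = FALSE) {
    # Step 1: split the row into entries
    entries <- unlist(strsplit(dataRow, entrySep, fixed = TRUE))
    # Step 2: split each entry into a value and its descriptive string
    pieces <- strsplit(entries, eleSep, fixed = TRUE)
    values <- vapply(pieces, function(p) p[1], character(1))
    # Assumption: an entry without a descriptive part gets an NA name
    tags <- vapply(pieces,
                   function(p) if (length(p) > 1) p[2] else NA_character_,
                   character(1))
    if (asNumeric) values <- as.numeric(values)
    # The descriptive strings become the names of the returned vector
    names(values) <- tags
    values
}

# Made-up input strings, used for illustration only
ids <- splitEntrySketch("1576;1084;89", asNumeric = TRUE)
typed <- twoStepSplitSketch("GO:0005515@IEA;GO:0006810@TAS")
print(ids)
print(typed)
```

     Both sketches split with `fixed = TRUE' so that the separators
     are treated as literal strings rather than regular expressions;
     whether the package functions do the same is not stated here.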