Changes between Initial Version and Version 1 of XgapObjectModel

Timestamp: 2010-10-01T23:38:13+02:00
Author: trac
Comment: --

+= '''User manual for XGAP''' =
+== Introduction ==
+The core product of the dbGG project is:
+ * a '''data model for genetical genomics''' that researchers can use to describe relevant information on genetical genomics investigations in a standard way. We refer to the dbGG manuscript (submitted) and the ‘description of data model’.
+A software infrastructure is generated from the data model so that the model can be used directly:
+ * a '''database for genetical genomics (dbGG)''' that researchers can use to store and retrieve actual investigation data in the data model on a large scale.
+ * a tab/comma '''delimited flat file format''' that researchers can use to exchange investigation data between dbGG instances.
+ * a '''graphical user interface''' that researchers can use to navigate, search and update individual data in the database.
+ * several '''programmatic interfaces''', currently in R-project, Java and web services, that programming biologists can use to automate data uploads/downloads on a large scale.
+ * a '''commandline import/export program''' that can be used to upload/download complete investigations from/to the delimited flat file format.
+This document describes ''use of the software infrastructure.''
+= Using the graphical user interface =
+TODO.
+= Using the R interface =
+The R-interface of dbGG distinguishes between two classes of data types:
+. ''Annotations''.
+Annotations are lists of records stored as a data.frame in which each row describes one annotation, e.g. a Marker. Each column name refers to a particular property, e.g. ‘name’ or ‘molgenisid’. Rownames are ignored. For example:
+||name||chr||cm||
+||PVV4||1||0||
+||AXR-1||1||6.398||
+||HH.335C-Col||1||10.786||
+||DF.162L/164C-Col||1||12.913||
+||EC.480C||1||15.059||
+||EC.66C||1||21.846||
+||GD.86L||1||23.802||
+||g2395||1||27.749||
+||CC.98L-Col/101C||1||31.212||
+||AD.121C||1||41.271||
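+The annotation table above can be sketched in plain R as follows. This is a minimal illustration that only uses base R; it copies the first rows of the table and does not contact a dbGG server.

```r
# Marker annotations as a data.frame: one row per marker, one column per property.
# The values are the first rows of the example table above.
markers <- data.frame(
  name = c("PVV4", "AXR-1", "HH.335C-Col", "DF.162L/164C-Col"),
  chr  = c(1, 1, 1, 1),
  cm   = c(0, 6.398, 10.786, 12.913),
  stringsAsFactors = FALSE
)

# Properties are addressed by column name; rownames carry no meaning here.
print(colnames(markers))
print(markers$name[markers$cm > 5])
```

+Because rownames are ignored, the marker name has to be an ordinary column of the data.frame.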
+. ''Data matrices.''
+A data matrix contains data in tabular format, e.g. rownames refer to Marker, colnames refer to Probe and values indicate QTL p-values. Both rownames and colnames refer to annotations and are required. For example (note how the first row has one element less because of the rownames column):
+||X1||X3||X4||X5||X6||X7||X8||||
+||PVV4||1||1||2||1||2||2||1||
+||AXR-1||1||1||2||1||2||2||1||
+||HH.335C-Col||1||1||1||1||2||2||1||
+||DF.162L/164C-Col||1||1||1||1||2||2||1||
+||EC.480C||1||1||1||1||2||2||1||
+||EC.66C||1||1||1||1||2||2||1||
+||GD.86L||1||1||1||1||2||2||1||
+||g2395||2||1||1||1||2||2||1||
+||CC.98L-Col/101C||1||1||1||NA||2||2||1||
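+The shorter first row can be reproduced with base R alone. The sketch below builds a small matrix with made-up values in the same layout and shows that write.table() emits a header with one field less than the data lines, which read.table() then uses to restore the rownames. It illustrates the file layout only and does not use the dbGG functions.

```r
# A small genotype-style matrix: rows are markers, columns are individuals.
geno <- matrix(
  c(1, 1, 2,
    1, 1, 2,
    1, 1, NA),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("PVV4", "AXR-1", "EC.480C"), c("X1", "X3", "X4"))
)

# write.table() with rownames emits a header that is one field shorter
# than the data lines, because the rowname column has no header.
lines <- capture.output(write.table(geno, sep = "\t", quote = FALSE))
fields <- lengths(strsplit(lines, "\t", fixed = TRUE))
print(lines)
print(fields)

# read.table() detects the short header and uses the first column as rownames.
geno_back <- as.matrix(read.table(text = lines, sep = "\t", header = TRUE))
print(identical(rownames(geno_back), rownames(geno)))
```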
+The sections below describe how to use the R interface and its annotation and data matrix facilities.
+== Connect to dbGG ==
+Connect to your dbGG server with the following command (replace the host with your own server name):
+source("!http://<yourhost>:8080/dbgg/api/R/")
+#e.g. using demonstration server
+source("!http://gbicserver1.biol.rug.nl:8080/dbgg/api/R/")
+#e.g. using local install
+source("!http://localhost:8080/dbgg/api/R/")
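+When the server is unreachable, a bare source() call stops with an error. The sketch below wraps the connection step so that a failure is reported as a message instead. Both dbgg_api_url() and connect_dbgg() are hypothetical helpers introduced for this illustration; they are not part of dbGG, and the URL layout is simply taken from the commands above.

```r
# Hypothetical helper (not part of dbGG): build the R API URL for a host.
dbgg_api_url <- function(host, port = 8080, app = "dbgg") {
  sprintf("http://%s:%d/%s/api/R/", host, as.integer(port), app)
}

# Hypothetical helper: source the API and report a failure as a readable message.
connect_dbgg <- function(host, ...) {
  url <- dbgg_api_url(host, ...)
  tryCatch({
    source(url)  # evaluates in the global environment by default
    invisible(TRUE)
  }, error = function(e) {
    message("Could not load the dbGG R API from ", url, ": ", conditionMessage(e))
    invisible(FALSE)
  })
}

print(dbgg_api_url("localhost"))
# connect_dbgg("localhost")  # uncomment when a dbGG server is running
```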
+== Download and upload annotations ==
+This section describes how to work with annotation data.
+ * All annotations are handled inside R in tabular form using data.frames.
+ * Each annotation has a name and a molgenisid.
+ * See document ‘TAB delimited format’ for details.
+ * For each annotation type there are ‘find’, ‘add’ and ‘remove’ functions. E.g. there are:
+   * find.investigation(), add.investigation(), remove.investigation()
+   * find.marker(), add.marker(), remove.marker()
+ * See all methods by calling ls()
+ * Find results can be limited by setting search parameters:
+# limit to only markers from investigation 1
+find.marker(investigation=1)
+ * Default find parameters can be set. These parameters are then always used as a filter.
+# use only data from investigation 1
+use.investigation(molgenisid=1)
+# also can be done using investigation name
+use.investigation(name="My investigation")
+find.marker()
+# identical results to find.marker(investigation=1)
+ * Add or remove annotations either by setting the properties individually or by passing them all in one data.frame. Note that the result of ‘add’ is a data.frame with the added information, now including any default or autogenerated values (e.g. molgenisid).
+my_investigations = add.investigation(name=c("Inv1","Inv2"))
+remove.investigation(my_investigations)
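+The effect of a default find parameter can be illustrated without a server. The sketch below uses toy in-memory stand-ins (toy_use_investigation and toy_find_marker, both hypothetical) that only mimic the filter behaviour described above; they are not the dbGG functions and say nothing about how dbGG implements them.

```r
# Toy in-memory stand-ins, NOT the dbGG functions: they only mimic the
# documented filter behaviour so the example runs without a server.
toy_state <- new.env()
toy_markers <- data.frame(
  name = c("PVV4", "AXR-1", "g2395"),
  investigation = c(1, 1, 2),
  stringsAsFactors = FALSE
)

toy_use_investigation <- function(molgenisid) {
  assign("investigation", molgenisid, envir = toy_state)
}

toy_find_marker <- function(investigation = NULL) {
  # Fall back to the stored default when no explicit filter is given.
  if (is.null(investigation) &&
      exists("investigation", envir = toy_state, inherits = FALSE)) {
    investigation <- get("investigation", envir = toy_state)
  }
  if (is.null(investigation)) return(toy_markers)
  toy_markers[toy_markers$investigation == investigation, , drop = FALSE]
}

explicit <- toy_find_marker(investigation = 1)  # explicit filter
toy_use_investigation(molgenisid = 1)           # set the default filter
implicit <- toy_find_marker()                   # default filter applies
print(identical(explicit, implicit))
```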
+== Download and upload data matrices ==
+The dbGG data model has a flexible structure to deal with data matrices.
+In the database these are stored using Data and !DataElement:
+ * ‘Data’ to store the properties of the matrix (rowtype, coltype, valuetype).
+ * ‘!DoubleDataElement’ or ‘!TextDataElement’ to store the double or text values of the matrix.
+ * Each record of Double/!TextDataElement must refer to !DimensionElement annotations (e.g. Probe, Strain, Individual).
+A convenient interface to deal with data matrices has been added. Instead of using find/add/remove.Data and find/add/remove.!DataElement, one can use find.datamatrix, add.datamatrix and remove.datamatrix:
+=== add.datamatrix ===
+add.datamatrix(.data_matrix, name=, investigation= , rowtype= , coltype= , valuetype=)
+Description of parameters:
+'''.data_matrix''' The first parameter is the data matrix to be stored (as.matrix).
+'''name''' The name of the data set. Should be unique within an investigation.
+'''investigation''' The molgenisid of the investigation. Doesn’t need to be set if use.investigation() has been called before.
+'''rowtype''' The type of the rows. Each rowname '''must''' refer to an instance of this type. E.g. rowtype="marker" means that for each rowname a matching marker$name can be found.
+'''coltype''' The type of the columns. Each colname '''must''' refer to an instance of this type. E.g. coltype="individual" means that for each colname a matching individual$name can be found.
+'''valuetype''' The type of the values in the matrix, either ‘text’ or ‘double’. If ‘text’, each matrix cell is added as one row in !TextDataElement. If ‘double’, each matrix cell is added as one row in !DoubleDataElement.
+When executed successfully, one row is added to Data and many rows to either !DoubleDataElement or !TextDataElement.
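+The "one row per matrix cell" storage described above can be pictured with base R. The sketch below flattens a small text matrix into one record per cell. The column names row, col and value are assumptions made for this illustration and are not the actual fields of !TextDataElement.

```r
# Example matrix: rows are markers, columns are individuals.
m <- matrix(
  c("A", "B", "B",
    "A", "A", NA),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("PVV4", "AXR-1"), c("X1", "X3", "X4"))
)

# One record per cell; the column names of this table are hypothetical.
cells <- data.frame(
  row   = rownames(m)[as.vector(row(m))],
  col   = colnames(m)[as.vector(col(m))],
  value = as.vector(m),
  stringsAsFactors = FALSE
)
print(cells)
print(nrow(cells) == nrow(m) * ncol(m))
```

+A matrix with r rows and c columns therefore yields r times c element records in addition to the single record that describes the matrix itself.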
+=== find.datamatrix / remove.datamatrix ===
+Functions:
+find.datamatrix(molgenisid=, name=, investigation=)
+#retrieves a data matrix
+remove.datamatrix(molgenisid=, name=, investigation=)
+#removes a data matrix
+Description of parameters:
+'''molgenisid''' The unique id of the data set. Use ‘find.data()’ to get a list of the available data matrices.
+'''name''' The name of the data set (unique within this investigation).
+'''investigation''' The molgenisid of the investigation.
+Note: to search, one must provide either the {molgenisid} or the {name and investigation id}.
+=== Examples of data matrix functions ===
+Use find.datamatrix, add.datamatrix, remove.datamatrix:
+#add text matrix with rows referring to Marker and columns to Individual
+add.datamatrix(matrix, name="my genotypes", rowtype="marker", coltype="individual", valuetype="text")
+#add double matrix with rows referring to Probe and columns to Individual
+add.datamatrix(matrix, name="my gene expression", rowtype="probe", coltype="individual", valuetype="double")
+#add double matrix with rows referring to Probe and columns to Marker
+#assume Probe and Marker are not known
+add.marker(name=colnames(matrix)) #adds markers without annotation
+add.probe(name=rownames(matrix)) #adds probes without annotation
+add.datamatrix(matrix, name="my QTLs", rowtype="probe", coltype="marker", valuetype="double")
+#find a data matrix
+#note: max one result, in contrast to find.annotation
+geno <- find.datamatrix(name="my genotypes")
+#remove a data matrix
+remove.datamatrix(name="my gene expression")
+#list existing data matrices
+#note: is a normal annotation function
+find.data()
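+Because every rowname and colname must refer to an existing annotation, it can help to check the names before uploading. The sketch below is one way to do that in base R; missing_names() is a hypothetical helper, not a dbGG function, and the annotation data.frames are made up for the example.

```r
# Hypothetical helper (not part of dbGG): report dimension names that have
# no matching annotation name.
missing_names <- function(dim_names, annotation) {
  setdiff(dim_names, annotation$name)
}

markers <- data.frame(name = c("PVV4", "AXR-1"), stringsAsFactors = FALSE)
probes  <- data.frame(name = c("probe1", "probe2"), stringsAsFactors = FALSE)
qtl <- matrix(
  c(0.01, 0.2, 0.5, 0.03, 0.9, 0.7),
  nrow = 2,
  dimnames = list(c("probe1", "probe2"), c("PVV4", "AXR-1", "EC.66C"))
)

print(missing_names(rownames(qtl), probes))   # all probes are known
print(missing_names(colnames(qtl), markers))  # markers that still need adding
```

+Any names reported as missing would first have to be added as annotations, as in the "my QTLs" example above.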
+= Using the web services interface =
+TODO
+= Using the commandline client =
+== Import whole investigation data from tab delimited files ==
+== Export whole investigation as tab delimited files ==
+TODO
+= Appendix: a complete R script using dbGG =
+Example code that is ready to copy and paste, provided that you '''update the host''' (first line).
+(Tested on R 2.4.1 and 2.7.0)
+#connect to dbGG
+#source("!http://gbicserver1.biol.rug.nl:8080/molgenis4dbgg/api/R")
+#Uncomment if RCurl is missing
+#source("!http://bioconductor.org/biocLite.R")
+#biocLite("RCurl")
+#use existing data from !MetaNetwork for example
+#install from zipfile from !http://gbic.biol.rug.nl/spip.php?rubrique48
+library(!MetaNetwork)
+#
+#ADD DATA
+#-first annotations
+#-second data matrices (referring to annotations)
+#
+#add investigation
+investigation_return = add.investigation(name="Example investigation !MetaNetwork", start="2008-05-31", end="2009-05-31")
+use.investigation(name="Example investigation !MetaNetwork")
+#use sets a global parameter so we don't need to pass parameter 'investigation=<number>' on every call
+#add markers
+data(markers)
+markers = as.data.frame(markers)
+markers_return = add.marker(name=rownames(markers), chr=markers$chr, cm=markers$cm)
+#add individuals (take name from genotypes)
+data(genotypes)
+individuals = data.frame(name=colnames(genotypes))
+individuals_return = add.individual(individuals)
+#add metabolites (take name from traits)
+data(traits)
+metabolites = data.frame(name=rownames(traits))
+metabolites_return = add.metabolite(metabolites)
+#add data matrices for genotypes, metabolite expression and qtl profiles
+#data(traits)
+#data(genotypes)
+data(qtlProfiles)
+add.datamatrix(genotypes, name="the genotypes", rowtype="marker", coltype="individual", valuetype="text")
+add.datamatrix(traits, name="the metabolite expression", rowtype="metabolite", coltype="individual", valuetype="text")
+add.datamatrix(qtlProfiles, name="the QTL profiles", rowtype="metabolite", coltype="marker", valuetype="double")
+#
+# VERIFY that uploaded and downloaded data match
+#
+#retrieve the uploaded data
+geno2   <- find.datamatrix(name="the genotypes")
+traits2 <- find.datamatrix(name="the metabolite expression")
+qtls2   <- find.datamatrix(name="the QTL profiles")
+#is it identical???
+identical(genotypes,geno2)
+identical(traits,traits2)
+identical(qtlProfiles,qtls2)
+#ai, there is rounding going on somewhere!
+format(qtlProfiles[12,1],digits=20)
+format(qtls2[12,1],digits=20)
+#as this already happens during write.csv this seems partly due to R itself !!!
+#write.table(qtlProfiles, file="!c:/test.txt")
+#qtlProfiles_copy = read.table(file="!c:/test.txt")
+#identical(qtlProfiles,qtlProfiles_copy)
+#
+all.equal(qtlProfiles,qtls2)
+#compare annotations
+identical(markers_return$name,rownames(markers))
+identical(markers_return$name,rownames(genotypes))
+identical(markers_return$name,colnames(qtlProfiles))
+identical(metabolites_return$name,rownames(traits))
+identical(individuals_return$name,colnames(genotypes))
+identical(individuals_return$name,colnames(traits))
+#
+# REMOVE DATA again
+# in reverse order
+#
+#remove matrices
+remove.datamatrix(name="the genotypes")
+remove.datamatrix(name="the metabolite expression")
+remove.datamatrix(name="the QTL profiles")
+#remove annotations
+remove.metabolite(metabolites_return)
+remove.individual(individuals_return)
+remove.marker(markers_return)
+remove.investigation(investigation_return)
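+The rounding remark in the verification step of the script can be reproduced without a server. The sketch below assumes that the values pass through a text representation with 15 significant digits somewhere on the way; that is an assumption made for this illustration, not a confirmed description of dbGG. It shows why identical() fails after such a round trip while all.equal() still succeeds.

```r
# Made-up double matrix with values that have no short decimal form.
q <- matrix(c(1/3, 2/3, pi, exp(1)), nrow = 2,
            dimnames = list(c("m1", "m2"), c("t1", "t2")))

# Simulate a text round trip that keeps 15 significant digits.
q_back <- q
q_back[] <- as.numeric(sprintf("%.15g", q))

print(identical(q, q_back))          # exact, bit-for-bit comparison
print(isTRUE(all.equal(q, q_back)))  # comparison within a numeric tolerance
print(max(abs(q - q_back)))          # size of the largest difference
```

+For double matrices, all.equal() is therefore the more useful check after a download, whereas identical() remains appropriate for names and text values.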