User manual for XGAP
Introduction
The core product of the dbGG project is:
- a data model for genetical genomics that researchers can use to describe relevant information on genetical genomics investigations in a standard way. For details, see the dbGG manuscript (submitted) and the document ‘description of data model’.
A software infrastructure is generated from the data model so that the model can be used directly:
- a database for genetical genomics (dbGG) that researchers can use to store and retrieve actual investigation data in the data model on a large scale.
- a tab/comma delimited flat file format that researchers can use to exchange investigation data between dbGG instances.
- a graphical user interface that researchers can use to navigate, search and update individual data in the database.
- several programmatic interfaces, currently in R-project, Java and web services, that can be used by programming biologists to automate data uploads/downloads on a large scale.
- a commandline import/export program that can be used to upload/download complete investigations from/to the delimited flat file format.
This document describes use of the software infrastructure.
Using the graphical user interface
TODO.
Using the R interface
The R-interface of dbGG distinguishes between two classes of data types:
- Annotations.
Annotations are lists of data that are stored as a data.frame, e.g., each row describes a Marker. Each column name refers to a particular property, e.g. ‘name’ or ‘molgenisid’. Rownames are ignored. For example:
name | chr | cm |
PVV4 | 1 | 0 |
AXR-1 | 1 | 6.398 |
HH.335C-Col | 1 | 10.786 |
DF.162L/164C-Col | 1 | 12.913 |
EC.480C | 1 | 15.059 |
EC.66C | 1 | 21.846 |
GD.86L | 1 | 23.802 |
g2395 | 1 | 27.749 |
CC.98L-Col/101C | 1 | 31.212 |
AD.121C | 1 | 41.271 |
- Data matrices.
A data matrix contains data in tabular format, e.g. rownames refer to Marker, colnames refer to Probe, and values indicate QTL p-values. Both rownames and colnames refer to annotations, and both are required. For example
(note how the first row has one element fewer because of the rownames column):
X1 | X3 | X4 | X5 | X6 | X7 | X8 | |
PVV4 | 1 | 1 | 2 | 1 | 2 | 2 | 1 |
AXR-1 | 1 | 1 | 2 | 1 | 2 | 2 | 1 |
HH.335C-Col | 1 | 1 | 1 | 1 | 2 | 2 | 1 |
DF.162L/164C-Col | 1 | 1 | 1 | 1 | 2 | 2 | 1 |
EC.480C | 1 | 1 | 1 | 1 | 2 | 2 | 1 |
EC.66C | 1 | 1 | 1 | 1 | 2 | 2 | 1 |
GD.86L | 1 | 1 | 1 | 1 | 2 | 2 | 1 |
g2395 | 2 | 1 | 1 | 1 | 2 | 2 | 1 |
CC.98L-Col/101C | 1 | 1 | 1 | NA | 2 | 2 | 1 |
The sections below describe how to use the R interface and its annotation and data matrix facilities.
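As a minimal sketch of these two shapes in plain R (no dbGG connection needed), the annotation below is a data.frame with one row per marker, and the data matrix is a matrix whose rownames and colnames carry the annotation names. The values are copied from the first rows of the examples above. The tab-delimited text is an assumption made for illustration; it relies on the base R behaviour that read.table() uses the first column as row names when the header has one field fewer than the data rows.

```r
# Annotation: one row per marker, one column per property; rownames are not used.
markers <- data.frame(
  name = c("PVV4", "AXR-1", "HH.335C-Col"),
  chr  = c(1, 1, 1),
  cm   = c(0, 6.398, 10.786),
  stringsAsFactors = FALSE
)

# Data matrix as tab-delimited text: the header has one field fewer than the
# data rows, so read.table() treats the first column as row names.
txt <- paste(
  "X1\tX3\tX4",
  "PVV4\t1\t1\t2",
  "AXR-1\t1\t1\t2",
  "HH.335C-Col\t1\t1\t1",
  sep = "\n"
)
genotypes <- as.matrix(read.table(text = txt, sep = "\t", header = TRUE))

# Every rowname of the matrix should match a marker name in the annotation.
all(rownames(genotypes) %in% markers$name)
```

Checking the rownames against the annotation names before uploading catches mismatches early, since each rowname is expected to refer to an existing annotation.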
Connect to dbGG
Connect to your dbGG server using the following command (replace the host with your own server name):
source("http://<yourhost>:8080/dbgg/api/R/")
#e.g. using demonstration server
source("http://gbicserver1.biol.rug.nl:8080/dbgg/api/R/")
#e.g. using local install
source("http://localhost:8080/dbgg/api/R/")
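When the same script has to run against several servers, the connection step can be wrapped in a small function. The sketch below is only an illustration: dbgg_api_url() and connect_dbgg() are hypothetical helpers, not part of dbGG, and the port 8080 and path are simply taken from the commands above.

```r
# Hypothetical helper: builds the API URL from a host name and port.
dbgg_api_url <- function(host, port = 8080) {
  sprintf("http://%s:%d/dbgg/api/R/", host, as.integer(port))
}

# Hypothetical helper: sources the API and reports a failure instead of stopping.
# source() evaluates in the global environment by default, so the dbGG
# functions become available to the rest of the session.
connect_dbgg <- function(host, port = 8080) {
  url <- dbgg_api_url(host, port)
  tryCatch({
    source(url)
    TRUE
  }, error = function(e) {
    message("Could not load the dbGG R API from ", url, ": ", conditionMessage(e))
    FALSE
  })
}

dbgg_api_url("localhost")  # "http://localhost:8080/dbgg/api/R/"
```

Calling connect_dbgg("localhost") then returns TRUE or FALSE, which a script can check before attempting any upload.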
Download and upload annotations
Annotation data is described in this section.
- All annotations are handled inside R in tabular form using data.frames.
- Each annotation has a name and a molgenisid.
- See document ‘TAB delimited format’ for details.
- For each annotation type there are ‘find’, ‘add’, and ‘remove’ functions. E.g. there are
- find.investigation(), add.investigation(), remove.investigation()
- find.marker(), add.marker(), remove.marker()
- See all methods by calling ls()
- Find results can be limited by setting search parameters:
# limit to only markers from investigation 1.
find.marker(investigation=1)
- Default find parameters can be set. These parameters are then always used as filter.
# use only data from investigation 1
use.investigation(molgenisid=1)
# also can be done using investigation name
use.investigation(name="My investigation")
find.marker()
# identical results to find.marker(investigation=1)
- Add or remove annotations either by setting the properties individually or by passing them all in one data.frame. Note that the result of ‘add’ is a data.frame with the added information, now including any default or autogenerated values (e.g. molgenisid).
my_investigations = add.investigation(name=c("Inv1","Inv2"))
remove.investigation(my_investigations)
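The data.frame variant can be sketched as follows. The marker values are taken from the annotation example earlier in this manual. The exists() guard is an addition made for illustration, so that the snippet can be run even when the dbGG R API has not been sourced; it is not something dbGG requires.

```r
# Properties for two new markers, collected in one data.frame.
new_markers <- data.frame(
  name = c("PVV4", "AXR-1"),
  chr  = c(1, 1),
  cm   = c(0, 6.398),
  stringsAsFactors = FALSE
)

# add.marker() and remove.marker() only exist after the dbGG R API has been
# sourced, so the calls are skipped when no connection has been made.
if (exists("add.marker") && exists("remove.marker")) {
  added <- add.marker(new_markers)  # result includes generated values such as molgenisid
  remove.marker(added)              # clean up again
} else {
  message("dbGG R API not loaded; nothing was uploaded.")
}
```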
Download and upload data matrices
The dbGG data model has a flexible structure to deal with data matrices.
In the database these are stored using Data and DataElement:
- ‘Data’ to store the properties of the matrix (rowtype, coltype, valuetype).
- ‘DoubleDataElement’ or ‘TextDataElement’ to store the double or text values of the matrix.
- Each record of Double/TextDataElement must refer to DimensionElement annotations (e.g. Probe, Strain, Individual).
A convenient interface to deal with data matrices has been added. Instead of using find/add/remove.Data and find/add/remove.DataElement, one can use find.datamatrix, add.datamatrix and remove.datamatrix:
add.datamatrix
add.datamatrix(.data_matrix, name=, investigation= , rowtype= , coltype= , valuetype=)
Description of parameters:
.data_matrix First parameter is the data matrix to be stored (as.matrix)
name The name of the data set. Should be unique within an investigation.
investigation The molgenisid of the investigation. Doesn’t need to be set if use.investigation() has been called before.
rowtype The type of the rows. Each rowname must refer to an instance of this type. E.g. rowtype="marker" means that for each rowname a matching marker$name can be found.
coltype The type of the columns.
Each colname must refer to an instance of this type. E.g. coltype="individual" means that for each colname a matching individual$name can be found.
valuetype The type of the values in the matrix, either ‘text’ or ‘double’.
If ‘text’ then each matrix cell is added as one row in TextDataElement. If ‘double’ each matrix cell is added as one row in DoubleDataElement.
When executed successfully, one row is added to Data, and many rows to either DoubleDataElement or TextDataElement.
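Because every rowname and colname must refer to an existing annotation, it can help to check a matrix locally before calling add.datamatrix. The following is a minimal sketch in plain R; missing_dimension_names() is a hypothetical helper written for this manual, not a dbGG function, and the annotation names are passed in as plain character vectors.

```r
# Hypothetical helper: returns the rownames/colnames of `m` that have no
# matching annotation name; empty vectors mean all names could be resolved.
missing_dimension_names <- function(m, row_names, col_names) {
  stopifnot(is.matrix(m), !is.null(rownames(m)), !is.null(colnames(m)))
  list(
    rows = setdiff(rownames(m), row_names),
    cols = setdiff(colnames(m), col_names)
  )
}

m <- matrix(c(1, 2, 1, 1), nrow = 2,
            dimnames = list(c("PVV4", "AXR-1"), c("X1", "X3")))

# Here "X3" has no matching annotation, so it is reported under $cols.
res <- missing_dimension_names(m,
                               row_names = c("PVV4", "AXR-1"),
                               col_names = c("X1"))
res
```

With a live connection, the name vectors would typically come from the ‘name’ column of the matching find results, e.g. find.marker()$name.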
find.datamatrix / remove.datamatrix
Functions:
find.datamatrix(molgenisid=, name=, investigation=)
#retrieves a data matrix
remove.datamatrix(molgenisid=, name=, investigation=)
#removes a data matrix
Description of parameters:
molgenisid the unique id of the data set.
Use ‘find.data()’ to get a list of data matrices available.
name the name of the dataset (unique within this investigation).
investigation the molgenisid of the investigation
Note: to search one must provide either the {molgenisid} or the {name and investigation id}.
Examples of data matrix functions
Use find.datamatrix, add.datamatrix, remove.datamatrix:
#add text matrix with rows referring to Marker and columns to Individual
add.datamatrix(matrix, name="my genotypes", rowtype="Marker", coltype="Individual", valuetype="text")
#add double matrix with rows referring to Probe and columns to Individual
add.datamatrix(matrix, name="my gene expression", rowtype="Probe", coltype="Individual", valuetype="double")
#add double matrix with rows referring to Probe and columns to Marker
#assume Probe and Marker are not known yet
add.marker(name=colnames(matrix)) #adds markers without annotation
add.probe(name=rownames(matrix)) #adds probes without annotation
add.datamatrix(matrix, name="my QTLs", rowtype="Probe", coltype="Marker", valuetype="double")
#find a data matrix
#note: max one result, in contrast to find.annotation
geno <- find.datamatrix(name="my genotypes")
#remove a data matrix
remove.datamatrix(name="my gene expression")
#list existing data matrices
#note: is a normal annotation function
find.data()
Using the web services interface
TODO
Using the commandline client
Import whole investigation data from tab delimited files
Export whole investigation as tab delimited files.
TODO
Appendix: a complete R script using dbGG
Copy-and-paste-ready example code, provided that you uncomment the source() line at the top and update its host.
(Tested on R 2.4.1 and 2.7.0)
#connect to dbGG
#source("http://gbicserver1.biol.rug.nl:8080/molgenis4dbgg/api/R")
#Uncomment if RCurl is missing
#source("http://bioconductor.org/biocLite.R")
#biocLite("RCurl")
#use existing data from MetaNetwork for example
#install from zipfile from http://gbic.biol.rug.nl/spip.php?rubrique48
library(MetaNetwork)
#
#ADD DATA
#-first annotations
#-second data matrices (referring to annotations)
#
#add investigation
investigation_return = add.investigation(name="Example investigation MetaNetwork", start="2008-05-31", end="2009-05-31")
use.investigation(name="Example investigation MetaNetwork")
#use.investigation sets a global parameter so we don't need to pass 'investigation=<number>' on every call
#add markers
data(markers)
markers = as.data.frame(markers)
markers_return = add.marker(name=rownames(markers), chr=markers$chr, cm=markers$cm)
#add individuals (take name from genotypes)
data(genotypes)
individuals = data.frame(name=colnames(genotypes))
individuals_return = add.individual(individuals)
#add metabolites (take name from traits)
data(traits)
metabolites = data.frame(name=rownames(traits))
metabolites_return = add.metabolite(metabolites)
#add data matrices for genotypes, metabolite expression and qtl profiles
#data(traits)
#data(genotypes)
data(qtlProfiles)
add.datamatrix(genotypes, name="the genotypes", rowtype="marker", coltype="individual", valuetype="text")
add.datamatrix(traits, name="the metabolite expression", rowtype="metabolite", coltype="individual", valuetype="text")
add.datamatrix(qtlProfiles, name="the QTL profiles", rowtype="metabolite", coltype="marker", valuetype="double")
#
# VERIFY uploaded and downloaded data
#
#retrieve the uploaded data
geno2 <- find.datamatrix(name="the genotypes")
traits2 <- find.datamatrix(name="the metabolite expression")
qtls2 <- find.datamatrix(name="the QTL profiles")
#is it identical???
identical(genotypes,geno2)
identical(traits,traits2)
identical(qtlProfiles,qtls2)
#ai, there is rounding going on somewhere!
format(qtlProfiles[12,1],digits=20)
format(qtls2[12,1],digits=20)
#as this already happens during write.csv this seems partly due to R itself !!!
#write.table(qtlProfiles, file="c:/test.txt")
#qtlProfiles_copy = read.table(file="c:/test.txt")
#identical(qtlProfiles,qtlProfiles_copy)
#
all.equal(qtlProfiles,qtls2)
#compare annotations
identical(markers_return$name,rownames(markers))
identical(markers_return$name,rownames(genotypes))
identical(markers_return$name,colnames(qtlProfiles))
identical(metabolites_return$name,rownames(traits))
identical(individuals_return$name,colnames(genotypes))
identical(individuals_return$name,colnames(traits))
#
# REMOVE DATA again
# in reverse order
#
#remove matrices
remove.datamatrix(name="the genotypes")
remove.datamatrix(name="the metabolite expression")
remove.datamatrix(name="the QTL profiles")
#remove annotations
remove.metabolite(metabolites_return)
remove.individual(individuals_return)
remove.marker(markers_return)
remove.investigation(investigation_return)
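The rounding noted in the script is what one would expect whenever double values pass through a text representation with a limited number of digits. A minimal sketch in plain R, assuming 15 significant digits (that precision is an assumption for illustration; the number of digits used by dbGG is not stated here):

```r
x <- 1 / 3

# Write the value with 15 significant digits and parse it back,
# as happens when a number travels through a text format.
y <- as.numeric(sprintf("%.15g", x))

identical(x, y)          # exact, bit-for-bit comparison
isTRUE(all.equal(x, y))  # comparison with a small numeric tolerance
```

For this reason all.equal() is the more suitable check for double matrices such as the QTL profiles, while identical() remains appropriate for names and other text values.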