| 1 | = '''User manual for XGAP''' = |
| 2 | == Introduction == |
| 3 | The core product of the dbGG project is: |
| 4 | |
| 5 | * a '''data model for genetical genomics''' that researchers can use to describe relevant information on genetical genomics investigations in a standard way. See the dbGG manuscript (submitted) and ‘description of data model’. |
| 6 | |
| 7 | From the data model a software infrastructure is generated to directly start using the model: |
| 8 | |
| 9 | * a '''database for genetical genomics (dbGG) '''that researchers can use to store and retrieve actual investigation data in the data model on a large scale. |
| 10 | |
| 11 | * a tab/comma '''delimited flat file format '''that researchers can use to exchange investigation data between dbGG instances. |
| 12 | |
| 13 | * a '''graphical user interface''' that researchers can use to navigate, search and update individual data in the database. |
| 14 | |
| 15 | * several '''programmatic interfaces''', currently in R-project, Java and web services, that can be used by programming biologists to automate data uploads/downloads on a large scale. |
| 16 | |
| 17 | * a '''commandline import/export program''' that can be used to upload/download complete investigations from/to the delimited flat file format. |
| 18 | |
| 19 | This document describes ''use of the software infrastructure.'' |
| 20 | |
| 21 | = Using the graphical user interface = |
| 22 | TODO. |
| 23 | |
| 24 | = Using the R interface = |
| 25 | The R-interface of dbGG distinguishes between two classes of data types: |
| 26 | |
| 27 | 1. ''Annotations''. |
| 28 | |
| 29 | Annotations are lists of data that are stored as a data.frame in which each row describes one item, e.g. a Marker. Each column name refers to a particular property, e.g. ‘name’ or ‘molgenisid’. Rownames are ignored. For example: |
| 30 | |
| 31 | ||name||chr||cm|| |
| 32 | ||PVV4||1||0|| |
| 33 | ||AXR-1||1||6.398|| |
| 34 | ||HH.335C-Col||1||10.786|| |
| 35 | ||DF.162L/164C-Col||1||12.913|| |
| 36 | ||EC.480C||1||15.059|| |
| 37 | ||EC.66C||1||21.846|| |
| 38 | ||GD.86L||1||23.802|| |
| 39 | ||g2395||1||27.749|| |
| 40 | ||CC.98L-Col/101C||1||31.212|| |
| 41 | ||AD.121C||1||41.271|| |
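
As a minimal sketch of this layout in plain R (no dbGG connection needed), the first rows of the table above can be built as a data.frame. The variable name `markers` is only an illustration.

```r
# Marker annotations as a data.frame: one row per Marker, one column per property.
# stringsAsFactors = FALSE keeps 'name' as character in older R versions too.
markers <- data.frame(
  name = c("PVV4", "AXR-1", "HH.335C-Col", "DF.162L/164C-Col", "EC.480C"),
  chr  = c(1, 1, 1, 1, 1),
  cm   = c(0, 6.398, 10.786, 12.913, 15.059),
  stringsAsFactors = FALSE
)

# Properties are addressed by column name; rownames carry no meaning here.
print(colnames(markers))
print(markers$name[markers$cm > 10])
```

Selecting by column name rather than by position keeps a script working when additional properties (such as molgenisid) appear in the data.frame.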
| 42 | |
| 43 | 2. ''Data matrices. '' |
| 44 | |
| 45 | A data matrix contains data in tabular format, e.g. rownames refer to Marker, colnames refer to Probe, and values indicate QTL p-values. Both rownames and colnames refer to annotations, and both are required. For example: |
| 46 | |
| 47 | (note how first row has one element less because of the rownames column): |
| 48 | |
| 49 | ||X1||X3||X4||X5||X6||X7||X8|||| |
| 50 | ||PVV4||1||1||2||1||2||2||1|| |
| 51 | ||AXR-1||1||1||2||1||2||2||1|| |
| 52 | ||HH.335C-Col||1||1||1||1||2||2||1|| |
| 53 | ||DF.162L/164C-Col||1||1||1||1||2||2||1|| |
| 54 | ||EC.480C||1||1||1||1||2||2||1|| |
| 55 | ||EC.66C||1||1||1||1||2||2||1|| |
| 56 | ||GD.86L||1||1||1||1||2||2||1|| |
| 57 | ||g2395||2||1||1||1||2||2||1|| |
| 58 | ||CC.98L-Col/101C||1||1||1||NA||2||2||1|| |
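
The shorter header row can be reproduced in plain R. The sketch below assumes a small made-up matrix (three markers, three individuals) and uses only base R functions; it is an illustration of the layout, not the dbGG exchange code itself.

```r
# A small genotype-like matrix: rows are Markers, columns are Individuals.
genotypes <- matrix(
  c(1, 1, 2,
    1, 1, 2,
    2, 1, NA),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("PVV4", "AXR-1", "g2395"), c("X1", "X3", "X4"))
)

# Write as tab-delimited text; with row names included, write.table emits a
# header that holds only the column names.
txt <- capture.output(write.table(genotypes, sep = "\t", quote = FALSE))
fields <- sapply(strsplit(txt, "\t"), length)
print(txt)
print(fields)  # the header has one field less than the data rows

# read.table recognises this layout and uses the first column as row names.
restored <- as.matrix(read.table(text = txt, sep = "\t", header = TRUE))
print(rownames(restored))
```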
| 59 | |
| 60 | The sections below describe how to use the R interface and its annotation and data matrix facilities. |
| 61 | |
| 62 | == Connect to dbGG == |
| 63 | Connect to your dbGG server using the following command (edit it to match your server name!): |
| 64 | |
| 65 | source("!http://<yourhost>:8080/dbgg/api/R/") |
| 66 | |
| 67 | #e.g. using demonstration server |
| 68 | |
| 69 | source("!http://gbicserver1.biol.rug.nl:8080/dbgg/api/R/") |
| 70 | |
| 71 | #e.g. using local install |
| 72 | |
| 73 | source("!http://localhost:8080/dbgg/api/R/") |
| 74 | |
| 75 | == Download and upload annotations == |
| 76 | Annotation data is described in this section. |
| 77 | |
| 78 | * All annotations are handled inside R in tabular form using data.frames. |
| 79 | |
| 80 | * Each has a name and molgenisid |
| 81 | |
| 82 | * See document ‘TAB delimited format’ for details. |
| 83 | |
| 84 | * For each annotation type there are ‘find’, ‘add’, and ‘remove’ functions. E.g. there are |
| 85 | |
| 86 | * find.investigation(), add.investigation(), remove.investigation() |
| 87 | |
| 88 | * find.marker(), add.marker(), remove.marker() |
| 89 | |
| 90 | * See all methods by calling ls() |
| 91 | |
| 92 | * Find results can be limited by setting search parameters: |
| 93 | |
| 94 | # limit to only markers from investigation 1. |
| 95 | |
| 96 | find.marker(investigation=1) |
| 97 | |
| 98 | * Default find parameters can be set. These parameters are then always used as filter. |
| 99 | |
| 100 | # use only data from investigation 1 |
| 101 | |
| 102 | use.investigation(molgenisid=1) |
| 103 | |
| 104 | # also can be done using investigation name |
| 105 | |
| 106 | use.investigation(name="My investigation") |
| 107 | |
| 108 | find.marker() |
| 109 | |
| 110 | # identical results to find.marker(investigation=1) |
| 111 | |
| 112 | * Add or remove annotations either by setting the properties individually or by passing them all in one data.frame. Note that the result of ‘add’ is a data.frame with the added information, now including any default or autogenerated values (e.g. molgenisid). |
| 113 | |
| 114 | my_investigations = add.investigation(name=c("Inv1","Inv2")) |
| 115 | |
| 116 | remove.investigation(my_investigations) |
| 117 | |
| 118 | == Download and upload data matrices == |
| 119 | The dbGG data model has a flexible structure to deal with data matrices. |
| 120 | |
| 121 | In the database these are stored using Data and !DataElement: |
| 122 | |
| 123 | * ‘Data’ to store the properties of the matrix (rowtype, coltype, valuetype). |
| 124 | |
| 125 | * ‘!DoubleDataElement’ or ‘!TextDataElement’ to store the double or text values of the matrix. |
| 126 | |
| 127 | * Each record of Double/!TextDataElement must refer to !DimensionElement annotations (e.g. Probe, Strain, Individual). |
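
To make the storage idea concrete, the sketch below unfolds a matrix into one record per cell in plain R and rebuilds it again. The column names `row`, `col` and `value` are illustrative assumptions and not the actual !DataElement fields.

```r
# Example QTL-like matrix: rows refer to Probes, columns refer to Markers.
m <- matrix(
  c(0.01, 0.20,
    0.50, 0.03),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("probe1", "probe2"), c("PVV4", "AXR-1"))
)

# One record per cell, each naming its row and column annotation.
# The column names 'row', 'col' and 'value' are illustrative only.
elements <- data.frame(
  row   = rownames(m)[as.vector(row(m))],
  col   = colnames(m)[as.vector(col(m))],
  value = as.vector(m),
  stringsAsFactors = FALSE
)
print(elements)

# The matrix can be rebuilt from the records, so no information is lost.
rebuilt <- matrix(NA_real_, nrow(m), ncol(m), dimnames = dimnames(m))
rebuilt[cbind(elements$row, elements$col)] <- elements$value
print(identical(rebuilt, m))
```

A matrix with r rows and c columns therefore yields r times c element records, which is why large matrices produce many rows in the database.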
| 128 | |
| 129 | A convenient interface to deal with data matrices has been added. Instead of using find/add/remove.Data and find/add/remove.!DataElement, one can use find.datamatrix, add.datamatrix and remove.datamatrix: |
| 130 | |
| 131 | === add.datamatrix === |
| 132 | add.datamatrix(.data_matrix, name=, investigation= , rowtype= , coltype= , valuetype=) |
| 133 | |
| 134 | Description of parameters: |
| 135 | |
| 136 | '''.data_matrix '''First parameter is the data matrix to be stored (as.matrix). |
| 137 | |
| 138 | '''name '''The name of the data set. Should be unique within an investigation. |
| 139 | |
| 140 | '''investigation '''The molgenisid of the investigation. Doesn’t need to be set if use.investigation() has been called before. |
| 141 | |
| 142 | '''rowtype '''The type of the rows. Each rowname '''must''' refer to an instance of this type. E.g. rowtype="marker" means that for each rowname there can be a marker$name found. |
| 143 | |
| 144 | '''coltype '''The type of the columns. |
| 145 | |
| 146 | Each colname '''must''' refer to an instance of this type. E.g. coltype="individual" means that for each colname there can be an individual$name found. |
| 147 | |
| 148 | '''valuetype '''The type of the values in the matrix, either ‘text’ or ‘double’. |
| 149 | |
| 150 | If ‘text’ then each matrix cell is added as one row in !TextDataElement. If ‘double’ each matrix cell is added as one row in !DoubleDataElement. |
| 151 | |
| 152 | When executed successfully, one row is added to Data, and many rows to either !DoubleDataElement or !TextDataElement. |
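
Because every rowname and colname must match an existing annotation name, it can help to check this locally before uploading. The sketch below is plain R; `missing_annotations` is a hypothetical helper and not part of the dbGG R interface.

```r
# Hypothetical helper (not part of dbGG): report dimension names that have no
# matching annotation name, so the problem surfaces before any upload.
missing_annotations <- function(dim_names, annotation_names) {
  setdiff(dim_names, annotation_names)
}

# Made-up example data: two known markers, but three marker rows in the matrix.
markers <- data.frame(name = c("PVV4", "AXR-1"), stringsAsFactors = FALSE)
genotypes <- matrix(
  c("A", "B", "A", "B", "A", "A"),
  nrow = 3,
  dimnames = list(c("PVV4", "AXR-1", "g2395"), c("ind1", "ind2"))
)

# "g2395" has no marker annotation yet, so the rowtype rule is not satisfied.
print(missing_annotations(rownames(genotypes), markers$name))
```

An empty result would indicate that all rownames are covered; the same check applies to colnames and the coltype annotation.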
| 153 | |
| 154 | === find.datamatrix / remove.datamatrix === |
| 155 | Functions: |
| 156 | |
| 157 | find.datamatrix(molgenisid=, name=, investigation=) |
| 158 | |
| 159 | #retrieves a data matrix |
| 160 | |
| 161 | remove.datamatrix(molgenisid=, name=, investigation=) |
| 162 | |
| 163 | #removes a data matrix |
| 164 | |
| 165 | Description of parameters: |
| 166 | |
| 167 | '''molgenisid '''the unique id of the data set. |
| 168 | |
| 169 | Use ‘find.data()’ to get a list of data matrices available. |
| 170 | |
| 171 | '''name '''the name of the dataset (unique within this investigation). |
| 172 | |
| 173 | '''investigation '''the molgenisid of the investigation |
| 174 | |
| 175 | Note: to search one must provide either a {molgenisid} or the {name and investigation id}. |
| 176 | |
| 177 | === Examples of data matrix functions === |
| 178 | Use find.datamatrix, add.datamatrix, remove.datamatrix: |
| 179 | |
| 180 | #add text matrix with rows referring to Marker and columns to Individual |
| 181 | |
| 182 | add.datamatrix(matrix, name="my genotypes", rowtype="Marker", coltype="Individual", valuetype="text") |
| 183 | |
| 184 | #add double matrix with rows referring to Probe and columns to Individual |
| 185 | |
| 186 | add.datamatrix(matrix, name="my gene expression", rowtype="Probe", coltype="Individual", valuetype="double") |
| 187 | |
| 188 | #add double matrix with rows referring to Probe and columns to Marker |
| 189 | |
| 190 | #assume Probe and Marker are not known |
| 191 | |
| 192 | add.marker(name=colnames(matrix)) #adds markers without annotation |
| 193 | |
| 194 | add.probe(name=rownames(matrix)) #adds probes without annotation |
| 195 | |
| 196 | add.datamatrix(matrix, name="my QTLs", rowtype="Probe", coltype="Marker", valuetype="double") |
| 197 | |
| 198 | #find a data matrix |
| 199 | |
| 200 | #note: max one result, in contrast to find.annotation |
| 201 | |
| 202 | geno <- find.datamatrix(name="my genotypes") |
| 203 | |
| 204 | #remove a data matrix |
| 205 | |
| 206 | remove.datamatrix(name="my gene expression") |
| 207 | |
| 208 | #list existing data matrices |
| 209 | |
| 210 | #note: is a normal annotation function |
| 211 | |
| 212 | find.data() |
| 213 | |
| 214 | = Using the web services interface = |
| 215 | TODO |
| 216 | |
| 217 | = Using the commandline client = |
| 218 | == Import whole investigation data from tab delimited files == |
| 219 | == Export whole investigation as tab delimited files == |
| 220 | TODO |
| 221 | |
| 222 | = Appendix: a complete R script using dbGG = |
| 223 | Example code that is ready to copy and paste, provided that you '''update the host''' (first line). |
| 224 | |
| 225 | (Tested on R 2.4.1 and 2.7.0) |
| 226 | |
| 227 | #connect to dbGG |
| 228 | |
| 229 | #source("!http://gbicserver1.biol.rug.nl:8080/molgenis4dbgg/api/R") |
| 230 | |
| 231 | #Uncomment if RCurl is missing |
| 232 | |
| 233 | #source("!http://bioconductor.org/biocLite.R") |
| 234 | |
| 235 | #biocLite("RCurl") |
| 236 | |
| 237 | #use existing data from !MetaNetwork for example |
| 238 | |
| 239 | #install from zipfile from !http://gbic.biol.rug.nl/spip.php?rubrique48 |
| 240 | |
| 241 | library(!MetaNetwork) |
| 242 | |
| 243 | # |
| 244 | |
| 245 | #ADD DATA |
| 246 | |
| 247 | #-first annotations |
| 248 | |
| 249 | #-second data matrices (referring to annotations) |
| 250 | |
| 251 | # |
| 252 | |
| 253 | #add investigation |
| 254 | |
| 255 | investigation_return = add.investigation(name="Example investigation !MetaNetwork", start="2008-05-31", end="2009-05-31") |
| 256 | |
| 257 | use.investigation(name="Example investigation !MetaNetwork") |
| 258 | |
| 259 | #use.investigation sets a global parameter so we don't need to pass the parameter 'investigation=<number>' on every call |
| 260 | |
| 261 | #add markers |
| 262 | |
| 263 | data(markers) |
| 264 | |
| 265 | markers = as.data.frame(markers) |
| 266 | |
| 267 | markers_return = add.marker(name=rownames(markers), chr=markers$chr, cm=markers$cm) |
| 268 | |
| 269 | #add individuals (take name from genotypes) |
| 270 | |
| 271 | data(genotypes) |
| 272 | |
| 273 | individuals = data.frame(name=colnames(genotypes)) |
| 274 | |
| 275 | individuals_return = add.individual(individuals) |
| 276 | |
| 277 | #add metabolites (take name from traits) |
| 278 | |
| 279 | data(traits) |
| 280 | |
| 281 | metabolites = data.frame(name=rownames(traits)) |
| 282 | |
| 283 | metabolites_return = add.metabolite(metabolites) |
| 284 | |
| 285 | #add data matrices for genotypes, metabolite expression and qtl profiles |
| 286 | |
| 287 | #data(traits) |
| 288 | |
| 289 | #data(genotypes) |
| 290 | |
| 291 | data(qtlProfiles) |
| 292 | |
| 293 | add.datamatrix(genotypes, name="the genotypes", rowtype="marker", coltype="individual", valuetype="text") |
| 294 | |
| 295 | add.datamatrix(traits, name="the metabolite expression", rowtype="metabolite", coltype="individual", valuetype="text") |
| 296 | |
| 297 | add.datamatrix(qtlProfiles, name="the QTL profiles", rowtype="metabolite", coltype="marker", valuetype="double") |
| 298 | |
| 299 | # |
| 300 | |
| 301 | # VERIFY uploaded and downloaded data |
| 302 | |
| 303 | # |
| 304 | |
| 305 | #retrieve the uploaded data |
| 306 | |
| 307 | geno2 <- find.datamatrix(name="the genotypes") |
| 308 | |
| 309 | traits2 <- find.datamatrix(name="the metabolite expression") |
| 310 | |
| 311 | qtls2 <- find.datamatrix(name="the QTL profiles") |
| 312 | |
| 313 | #is it identical??? |
| 314 | |
| 315 | identical(genotypes,geno2) |
| 316 | |
| 317 | identical(traits,traits2) |
| 318 | |
| 319 | identical(qtlProfiles,qtls2) |
| 320 | |
| 321 | #ai, there is rounding going on somewhere! |
| 322 | |
| 323 | format(qtlProfiles[12,1],digits=20) |
| 324 | |
| 325 | format(qtls2[12,1],digits=20) |
| 326 | |
| 327 | #as this already happens during write.csv this seems partly due to R itself !!! |
| 328 | |
| 329 | #write.table(qtlProfiles, file="!c:/test.txt") |
| 330 | |
| 331 | #qtlProfiles_copy = read.table(file="!c:/test.txt") |
| 332 | |
| 333 | #identical(qtlProfiles,qtlProfiles_copy) |
| 334 | |
| 335 | # |
| 336 | |
| 337 | all.equal(qtlProfiles,qtls2) |
| 338 | |
| 339 | #compare annotations |
| 340 | |
| 341 | identical(markers_return$name,rownames(markers)) |
| 342 | |
| 343 | identical(markers_return$name,rownames(genotypes)) |
| 344 | |
| 345 | identical(markers_return$name,colnames(qtlProfiles)) |
| 346 | |
| 347 | identical(metabolites_return$name,rownames(traits)) |
| 348 | |
| 349 | identical(individuals_return$name,colnames(genotypes)) |
| 350 | |
| 351 | identical(individuals_return$name,colnames(traits)) |
| 352 | |
| 353 | # |
| 354 | |
| 355 | # REMOVE DATA again |
| 356 | |
| 357 | # in reverse order |
| 358 | |
| 359 | # |
| 360 | |
| 361 | #remove matrices |
| 362 | |
| 363 | remove.datamatrix(name="the genotypes") |
| 364 | |
| 365 | remove.datamatrix(name="the metabolite expression") |
| 366 | |
| 367 | remove.datamatrix(name="the QTL profiles") |
| 368 | |
| 369 | #remove annotations |
| 370 | |
| 371 | remove.metabolite(metabolites_return) |
| 372 | |
| 373 | remove.individual(individuals_return) |
| 374 | |
| 375 | remove.marker(markers_return) |
| 376 | |
| 377 | remove.investigation(investigation_return) |
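
The rounding noted in the verification step of the script can be reproduced without a server. The sketch below is plain R and assumes, purely for illustration, that values pass through a text representation with 15 significant digits somewhere on the way; it does not claim to describe what dbGG does internally.

```r
# A matrix with values that have no short decimal representation.
qtl <- matrix(c(1/3, 2/3, 1/7, 0.5), nrow = 2,
              dimnames = list(c("m1", "m2"), c("p1", "p2")))

# Simulate a text round trip that keeps 15 significant digits (an assumption
# about the transport, not a statement about dbGG internals).
qtl2 <- qtl
qtl2[] <- as.numeric(sprintf("%.15g", as.vector(qtl)))

print(identical(qtl, qtl2))          # exact comparison
print(isTRUE(all.equal(qtl, qtl2)))  # comparison within a small tolerance
print(max(abs(qtl - qtl2)))          # size of the rounding error
```

For double matrices, all.equal() is therefore likely the more useful check after a download, while identical() remains appropriate for text matrices and annotation names.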