| | 1 | = '''User manual for XGAP''' = |
| | 2 | == Introduction == |
| | 3 | The core product of the dbGG project is: |
| | 4 | |
| | 5 | * a '''data model for genetical genomics''' that researchers can use to describe relevant information on genetical genomics investigations in a standard way. See the dbGG manuscript (submitted) and the document ‘description of data model’. |
| | 6 | |
| | 7 | From the data model a software infrastructure is generated to directly start using the model: |
| | 8 | |
| | 9 | * a '''database for genetical genomics (dbGG) '''that researchers can use to store and retrieve actual investigation data in the data model on a large scale. |
| | 10 | |
| | 11 | * a tab/comma '''delimited flat file format '''that researchers can use to exchange investigation data between dbGG instances. |
| | 12 | |
| | 13 | * a '''graphical user interface''' that researchers can use to navigate, search and update individual data in the database. |
| | 14 | |
| | 15 | * several '''programmatic interfaces''', currently in R-project, Java and web services, that can be used by programming biologists to automate data uploads/downloads on a large scale. |
| | 16 | |
| | 17 | * a '''commandline import/export program''' that can be used to upload/download complete investigations from/to the delimited flat file format. |
| | 18 | |
| | 19 | This document describes ''use of the software infrastructure.'' |
| | 20 | |
| | 21 | = Using the graphical user interface = |
| | 22 | TODO. |
| | 23 | |
| | 24 | = Using the R interface = |
| | 25 | The R-interface of dbGG distinguishes between two classes of data types: |
| | 26 | |
| | 27 | 1. ''Annotations''. |
| | 28 | |
| | 29 | Annotations are lists of data that are stored as a data.frame, e.g., each row describes a Marker. Each column name refers to a particular property, e.g. ‘name’ or ‘molgenisid’. Rownames are ignored. For example: |
| | 30 | |
| | 31 | ||name||chr||cm|| |
| | 32 | ||PVV4||1||0|| |
| | 33 | ||AXR-1||1||6.398|| |
| | 34 | ||HH.335C-Col||1||10.786|| |
| | 35 | ||DF.162L/164C-Col||1||12.913|| |
| | 36 | ||EC.480C||1||15.059|| |
| | 37 | ||EC.66C||1||21.846|| |
| | 38 | ||GD.86L||1||23.802|| |
| | 39 | ||g2395||1||27.749|| |
| | 40 | ||CC.98L-Col/101C||1||31.212|| |
| | 41 | ||AD.121C||1||41.271|| |
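As a minimal sketch, an annotation table like the one above can be built in plain R as a data.frame. The variable name `markers` is only illustrative, and nothing in this snippet contacts a dbGG server.

```r
# Build a small Marker annotation table as a data.frame.
# Each row is one Marker; each column is one property.
markers <- data.frame(
  name = c("PVV4", "AXR-1", "HH.335C-Col"),
  chr  = c(1, 1, 1),
  cm   = c(0, 6.398, 10.786),
  stringsAsFactors = FALSE
)

# Row names carry no meaning for annotations, so reset them to the default.
rownames(markers) <- NULL

print(markers)
```

Keeping `name` as a character column (rather than a factor) makes it easier to compare names against the rownames of a data matrix later on.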
| | 42 | |
| | 43 | 2. ''Data matrices. '' |
| | 44 | |
| | 45 | A data matrix contains data in tabular format, e.g. rownames refer to Marker, colnames refer to Probe, and values indicate QTL p-values. Both rownames and column names refer to annotations, and both are required. For example: |
| | 46 | |
| | 47 | (note how first row has one element less because of the rownames column): |
| | 48 | |
| | 49 | ||X1||X3||X4||X5||X6||X7||X8|||| |
| | 50 | ||PVV4||1||1||2||1||2||2||1|| |
| | 51 | ||AXR-1||1||1||2||1||2||2||1|| |
| | 52 | ||HH.335C-Col||1||1||1||1||2||2||1|| |
| | 53 | ||DF.162L/164C-Col||1||1||1||1||2||2||1|| |
| | 54 | ||EC.480C||1||1||1||1||2||2||1|| |
| | 55 | ||EC.66C||1||1||1||1||2||2||1|| |
| | 56 | ||GD.86L||1||1||1||1||2||2||1|| |
| | 57 | ||g2395||2||1||1||1||2||2||1|| |
| | 58 | ||CC.98L-Col/101C||1||1||1||NA||2||2||1|| |
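The note about the shorter first row can be reproduced in plain R. The sketch below builds a small matrix with rownames and colnames and writes it as tab-delimited text; the variable names are illustrative and no dbGG function is involved.

```r
# Build a small genotype matrix: rows are Markers, columns are Individuals.
genotypes <- matrix(
  c(1, 1, 2,
    1, 1, 2,
    1, 1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("PVV4", "AXR-1", "HH.335C-Col"), c("X1", "X3", "X4"))
)

# Write it as tab-delimited text and capture the lines instead of printing.
txt <- capture.output(write.table(genotypes, sep = "\t", quote = FALSE))

# The header holds only the column names; data rows start with the row name.
header_fields <- strsplit(txt[1], "\t")[[1]]
first_row_fields <- strsplit(txt[2], "\t")[[1]]

length(header_fields)     # one field per column
length(first_row_fields)  # one field per column, plus the row name
```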
| | 59 | |
| | 60 | The sections below describe how to use the R-interface and its annotation and data matrix facilities. |
| | 61 | |
| | 62 | == Connect to dbGG == |
| | 63 | Connect to your dbGG server using the following command (edit it to use your server name!): |
| | 64 | |
| | 65 | source("!http://<yourhost>:8080/dbgg/api/R/") |
| | 66 | |
| | 67 | #e.g. using demonstration server |
| | 68 | |
| | 69 | source("!http://gbicserver1.biol.rug.nl:8080/dbgg/api/R/") |
| | 70 | |
| | 71 | #e.g. using local install |
| | 72 | |
| | 73 | source("!http://localhost:8080/dbgg/api/R/") |
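If several servers are used, the URL can be assembled in one place. The helper below is a hypothetical convenience function, not part of dbGG; it assumes the port and path shown in the commands above.

```r
# Hypothetical helper: assemble the R API URL for a dbGG server.
# The default port and path follow the examples in this section.
dbgg_api_url <- function(host, port = 8080, path = "dbgg/api/R/") {
  paste0("http://", host, ":", port, "/", path)
}

api_url <- dbgg_api_url("localhost")
print(api_url)

# Loading the interface needs a running server, so it is left commented out:
# source(api_url)
```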
| | 74 | |
| | 75 | == Download and upload annotations == |
| | 76 | Annotation data is described in this section. |
| | 77 | |
| | 78 | * All annotations are handled inside R in tabular form using data.frames. |
| | 79 | |
| | 80 | * Each annotation has a name and a molgenisid. |
| | 81 | |
| | 82 | * See document ‘TAB delimited format’ for details. |
| | 83 | |
| | 84 | * For each annotation type there are ‘find’, ‘add’, and ‘remove’ functions. E.g. there are |
| | 85 | |
| | 86 | * find.investigation(), add.investigation(), remove.investigation() |
| | 87 | |
| | 88 | * find.marker(), add.marker(), remove.marker() |
| | 89 | |
| | 90 | * See all methods by calling ls() |
| | 91 | |
| | 92 | * Find results can be limited by setting search parameters: |
| | 93 | |
| | 94 | # limit to only markers from investigation 1 |
| | 95 | |
| | 96 | find.marker(investigation=1) |
| | 97 | |
| | 98 | * Default find parameters can be set. These parameters are then always used as filter. |
| | 99 | |
| | 100 | # use only data from investigation 1 |
| | 101 | |
| | 102 | use.investigation(molgenisid=1) |
| | 103 | |
| | 104 | # also can be done using investigation name |
| | 105 | |
| | 106 | use.investigation(name="My investigation") |
| | 107 | |
| | 108 | find.marker() |
| | 109 | |
| | 110 | # identical results to find.marker(investigation=1) |
| | 111 | |
| | 112 | * Add or remove annotations either by setting the properties individually or by passing them all in one data.frame. Note that the result of ‘add’ is a data.frame with the added information, now including any default or autogenerated values (e.g. molgenisid). |
| | 113 | |
| | 114 | my_investigations = add.investigation(name=c("Inv1","Inv2")) |
| | 115 | |
| | 116 | remove.investigation(my_investigations) |
| | 117 | |
| | 118 | == Download and upload data matrices == |
| | 119 | The dbGG data model has a flexible structure to deal with data matrices. |
| | 120 | |
| | 121 | In the database these are stored using Data and !DataElement: |
| | 122 | |
| | 123 | * ‘Data’ to store the properties of the matrix (rowtype, coltype, valuetype). |
| | 124 | |
| | 125 | * ‘!DoubleDataElement’ or ‘!TextDataElement’ to store the double or text values of the matrix. |
| | 126 | |
| | 127 | * Each record of Double/!TextDataElement must refer to !DimensionElement annotations (e.g. Probe, Strain, Individual). |
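To illustrate what storing one record per matrix cell means, the sketch below flattens a small matrix into one row per cell in plain R. The column names `row`, `col` and `value` are assumptions made for illustration; they are not the actual fields of the data element tables.

```r
# A small double matrix: rows are Probes, columns are Individuals.
expression <- matrix(
  c(0.5, 1.2,
    2.3, 0.1),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("probe1", "probe2"), c("ind1", "ind2"))
)

# Flatten to one record per cell. Each record names its row and column
# annotation and carries the cell value. Column names are illustrative only.
elements <- data.frame(
  row   = rownames(expression)[as.vector(row(expression))],
  col   = colnames(expression)[as.vector(col(expression))],
  value = as.vector(expression),
  stringsAsFactors = FALSE
)

print(elements)
```

A 2 x 2 matrix yields four records, which is why large matrices translate into many data element rows.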
| | 128 | |
| | 129 | A convenient interface to deal with data matrices has been added. Instead of using find/add/remove.Data and find/add/remove.!DataElement, one can use find.datamatrix, add.datamatrix and remove.datamatrix: |
| | 130 | |
| | 131 | === add.datamatrix === |
| | 132 | add.datamatrix(.data_matrix, name=, investigation= , rowtype= , coltype= , valuetype=) |
| | 133 | |
| | 134 | Description of parameters: |
| | 135 | |
| | 136 | '''.data_matrix '''First parameter is the data matrix to be stored (as.matrix). |
| | 137 | |
| | 138 | '''name '''The name of the data set. Should be unique within an investigation. |
| | 139 | |
| | 140 | '''investigation '''The molgenisid of the investigation. Doesn’t need to be set if use.investigation() has been called before. |
| | 141 | |
| | 142 | '''rowtype '''The type of the rows. Each rowname '''must''' refer to an instance of this type. E.g. rowtype="marker" means that for each rowname a matching marker$name must exist. |
| | 143 | |
| | 144 | '''coltype '''The type of the columns. |
| | 145 | |
| | 146 | Each colname '''must''' refer to an instance of this type. E.g. coltype="marker" means that for each colname a matching marker$name must exist. |
| | 147 | |
| | 148 | '''valuetype '''The type of the values in the matrix, either ‘text’ or ‘double’. |
| | 149 | |
| | 150 | If ‘text’ then each matrix cell is added as one row in !TextDataElement. If ‘double’ each matrix cell is added as one row in !DoubleDataElement. |
| | 151 | |
| | 152 | When executed successfully, one row is added to Data, and many rows to either !DoubleDataElement or !TextDataElement. |
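Because every rowname and colname must refer to an existing annotation, it can help to check the names before calling add.datamatrix. The sketch below does this in plain R; `missing_names` is a hypothetical helper, and the `markers` data.frame stands in for what find.marker() might return.

```r
# Hypothetical helper: names in the matrix that have no matching annotation.
missing_names <- function(matrix_names, annotation_names) {
  setdiff(matrix_names, annotation_names)
}

# Stand-in for Marker annotations already present in the database.
markers <- data.frame(name = c("PVV4", "AXR-1"), stringsAsFactors = FALSE)

# A text matrix whose second rowname has no Marker annotation yet.
geno <- matrix(c("1", "2", "1", "1"), nrow = 2,
               dimnames = list(c("PVV4", "EC.66C"), c("X1", "X3")))

unknown <- missing_names(rownames(geno), markers$name)
if (length(unknown) > 0) {
  message("Add these markers first: ", paste(unknown, collapse = ", "))
}
```

The same check applies to the colnames against the annotation type given as coltype.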
| | 153 | |
| | 154 | === find.datamatrix / remove.datamatrix === |
| | 155 | Functions: |
| | 156 | |
| | 157 | find.datamatrix(molgenisid=, name=, investigation=) |
| | 158 | |
| | 159 | #retrieves a data matrix |
| | 160 | |
| | 161 | remove.datamatrix(molgenisid=, name=, investigation=) |
| | 162 | |
| | 163 | #removes a data matrix |
| | 164 | |
| | 165 | Description of parameters: |
| | 166 | |
| | 167 | '''molgenisid '''the unique id of the data set. |
| | 168 | |
| | 169 | Use ‘find.data()’ to get a list of data matrices available. |
| | 170 | |
| | 171 | '''name '''the name of the dataset (unique within this investigation). |
| | 172 | |
| | 173 | '''investigation '''the molgenisid of the investigation |
| | 174 | |
| | 175 | Note: to search one must provide either a {molgenisid} or the {name and investigation id}. |
| | 176 | |
| | 177 | === Examples of data matrix functions === |
| | 178 | Use find.datamatrix, add.datamatrix, remove.datamatrix: |
| | 179 | |
| | 180 | #add text matrix with rows referring to Marker and columns to Individual |
| | 181 | |
| | 182 | add.datamatrix(matrix, name="my genotypes", rowtype="marker", coltype="individual", valuetype="text") |
| | 183 | |
| | 184 | #add double matrix with rows referring to Probe and columns to Individual |
| | 185 | |
| | 186 | add.datamatrix(matrix, name="my gene expression", rowtype="probe", coltype="individual", valuetype="double") |
| | 187 | |
| | 188 | #add double matrix with rows referring to Probe and columns to Marker |
| | 189 | |
| | 190 | #assume Probe and Marker are not known yet |
| | 191 | |
| | 192 | add.marker(name=colnames(matrix)) #adds markers without annotation |
| | 193 | |
| | 194 | add.probe(name=rownames(matrix)) #adds probes without annotation |
| | 195 | |
| | 196 | add.datamatrix(matrix, name="my QTLs", rowtype="probe", coltype="marker", valuetype="double") |
| | 197 | |
| | 198 | #find a data matrix |
| | 199 | |
| | 200 | #note: max one result, in contrast to find.annotation |
| | 201 | |
| | 202 | geno <- find.datamatrix(name="my genotypes") |
| | 203 | |
| | 204 | #remove a data matrix |
| | 205 | |
| | 206 | remove.datamatrix(name="my gene expression") |
| | 207 | |
| | 208 | #list existing data matrices |
| | 209 | |
| | 210 | #note: is a normal annotation function |
| | 211 | |
| | 212 | find.data() |
| | 213 | |
| | 214 | = Using the web services interface = |
| | 215 | TODO |
| | 216 | |
| | 217 | = Using the commandline client = |
| | 218 | == Import whole investigation data from tab delimited files == |
| | 219 | == Export whole investigation as tab delimited files. == |
| | 220 | TODO |
| | 221 | |
| | 222 | = Appendix: a complete R script using dbGG = |
| | 223 | Copy-and-paste-ready example code, provided that you '''update the host''' (first line). |
| | 224 | |
| | 225 | (Tested on R 2.4.1 and 2.7.0) |
| | 226 | |
| | 227 | #connect to dbGG |
| | 228 | |
| | 229 | source("!http://gbicserver1.biol.rug.nl:8080/molgenis4dbgg/api/R") |
| | 230 | |
| | 231 | #Uncomment if RCurl is missing |
| | 232 | |
| | 233 | #source("!http://bioconductor.org/biocLite.R") |
| | 234 | |
| | 235 | #biocLite("RCurl") |
| | 236 | |
| | 237 | #use existing data from !MetaNetwork for example |
| | 238 | |
| | 239 | #install from zipfile from !http://gbic.biol.rug.nl/spip.php?rubrique48 |
| | 240 | |
| | 241 | library(!MetaNetwork) |
| | 242 | |
| | 243 | # |
| | 244 | |
| | 245 | #ADD DATA |
| | 246 | |
| | 247 | #-first annotations |
| | 248 | |
| | 249 | #-second data matrices (referring to annotations) |
| | 250 | |
| | 251 | # |
| | 252 | |
| | 253 | #add investigation |
| | 254 | |
| | 255 | investigation_return = add.investigation(name="Example investigation !MetaNetwork", start="2008-05-31", end="2009-05-31") |
| | 256 | |
| | 257 | use.investigation(name="Example investigation !MetaNetwork") |
| | 258 | |
| | 259 | #use.investigation sets a global parameter so we don't need to pass 'investigation=<number>' on every call |
| | 260 | |
| | 261 | #add markers |
| | 262 | |
| | 263 | data(markers) |
| | 264 | |
| | 265 | markers = as.data.frame(markers) |
| | 266 | |
| | 267 | markers_return = add.marker(name=rownames(markers), chr=markers$chr, cm=markers$cm) |
| | 268 | |
| | 269 | #add individuals (take name from genotypes) |
| | 270 | |
| | 271 | data(genotypes) |
| | 272 | |
| | 273 | individuals = data.frame(name=colnames(genotypes)) |
| | 274 | |
| | 275 | individuals_return = add.individual(individuals) |
| | 276 | |
| | 277 | #add metabolites (take name from traits) |
| | 278 | |
| | 279 | data(traits) |
| | 280 | |
| | 281 | metabolites = data.frame(name=rownames(traits)) |
| | 282 | |
| | 283 | metabolites_return = add.metabolite(metabolites) |
| | 284 | |
| | 285 | #add data matrices for genotypes, metabolite expression and qtl profiles |
| | 286 | |
| | 287 | #data(traits) |
| | 288 | |
| | 289 | #data(genotypes) |
| | 290 | |
| | 291 | data(qtlProfiles) |
| | 292 | |
| | 293 | add.datamatrix(genotypes, name="the genotypes", rowtype="marker", coltype="individual", valuetype="text") |
| | 294 | |
| | 295 | add.datamatrix(traits, name="the metabolite expression", rowtype="metabolite", coltype="individual", valuetype="double") |
| | 296 | |
| | 297 | add.datamatrix(qtlProfiles, name="the QTL profiles", rowtype="metabolite", coltype="marker", valuetype="double") |
| | 298 | |
| | 299 | # |
| | 300 | |
| | 301 | # VERIFY uploaded and downloaded data |
| | 302 | |
| | 303 | # |
| | 304 | |
| | 305 | #retrieve the uploaded data |
| | 306 | |
| | 307 | geno2 <- find.datamatrix(name="the genotypes") |
| | 308 | |
| | 309 | traits2 <- find.datamatrix(name="the metabolite expression") |
| | 310 | |
| | 311 | qtls2 <- find.datamatrix(name="the QTL profiles") |
| | 312 | |
| | 313 | #is it identical??? |
| | 314 | |
| | 315 | identical(genotypes,geno2) |
| | 316 | |
| | 317 | identical(traits,traits2) |
| | 318 | |
| | 319 | identical(qtlProfiles,qtls2) |
| | 320 | |
| | 321 | #ai, there is rounding going on somewhere! |
| | 322 | |
| | 323 | format(qtlProfiles[12,1],digits=20) |
| | 324 | |
| | 325 | format(qtls2[12,1],digits=20) |
| | 326 | |
| | 327 | #as this already happens during write.table, this seems partly due to R itself |
| | 328 | |
| | 329 | #write.table(qtlProfiles, file="!c:/test.txt") |
| | 330 | |
| | 331 | #qtlProfiles_copy = read.table(file="!c:/test.txt") |
| | 332 | |
| | 333 | #identical(qtlProfiles,qtlProfiles_copy) |
| | 334 | |
| | 335 | # |
| | 336 | |
| | 337 | all.equal(qtlProfiles,qtls2) |
| | 338 | |
| | 339 | #compare annotations |
| | 340 | |
| | 341 | identical(markers_return$name,rownames(markers)) |
| | 342 | |
| | 343 | identical(markers_return$name,rownames(genotypes)) |
| | 344 | |
| | 345 | identical(markers_return$name,colnames(qtlProfiles)) |
| | 346 | |
| | 347 | identical(metabolites_return$name,rownames(traits)) |
| | 348 | |
| | 349 | identical(individuals_return$name,colnames(genotypes)) |
| | 350 | |
| | 351 | identical(individuals_return$name,colnames(traits)) |
| | 352 | |
| | 353 | # |
| | 354 | |
| | 355 | # REMOVE DATA again |
| | 356 | |
| | 357 | # in reverse order |
| | 358 | |
| | 359 | # |
| | 360 | |
| | 361 | #remove matrices |
| | 362 | |
| | 363 | remove.datamatrix(name="the genotypes") |
| | 364 | |
| | 365 | remove.datamatrix(name="the metabolite expression") |
| | 366 | |
| | 367 | remove.datamatrix(name="the QTL profiles") |
| | 368 | |
| | 369 | #remove annotations |
| | 370 | |
| | 371 | remove.metabolite(metabolites_return) |
| | 372 | |
| | 373 | remove.individual(individuals_return) |
| | 374 | |
| | 375 | remove.marker(markers_return) |
| | 376 | |
| | 377 | remove.investigation(investigation_return) |
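The rounding remark in the script above can be explored without a server. The sketch below writes a double matrix to a text file and reads it back using base R only; it illustrates why the script falls back on all.equal() for the QTL profiles. The matrix values and names are made up for illustration.

```r
# A double matrix with values that have no short decimal representation.
original <- matrix(c(1/3, 2/7, pi, exp(1)), nrow = 2,
                   dimnames = list(c("m1", "m2"), c("p1", "p2")))

# Round trip through a tab-delimited text file.
tmp <- tempfile(fileext = ".txt")
write.table(original, file = tmp, sep = "\t")
copy <- as.matrix(read.table(tmp, sep = "\t"))
unlink(tmp)

# identical() demands exact equality of every value, which a text
# round trip may not preserve for doubles.
print(identical(original, copy))

# all.equal() tolerates tiny numeric differences.
print(isTRUE(all.equal(original, copy)))
```

For double matrices, all.equal() is therefore the more forgiving comparison after any transfer that passes through text.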