[[TOC()]]
= How to convert between XGAP and other formats =
Below we describe existing and planned procedures for converting between XGAP and other formats.

== !HapMap format ==
A !HapMapParser is located at handwritten/java/convertors/!HapMapParser.java.

To parse a file, create a new instance of the class, passing the location of a !HapMap file as its argument ([http://www.xgap.org/attachment/wiki/XgapExchange/HapMap_format_example.txt example]).

For example:

{{{
#!java
// Each constructor call parses one HapMap file and writes its XGAP equivalents.
new HapMapParser("D:/data/xgapdata/HumanPublicSets/genotypes_chr1_CHD_r27_nr.b36_fwd.txt");
new HapMapParser("D:/data/xgapdata/HumanPublicSets/genotypes_chr8_LWK_r27_nr.b36_fwd.txt");
}}}

Each input file results in a new directory, named after the file, under xgapnized/ in the input file's directory. In this case:

{{{
D:/data/xgapdata/HumanPublicSets/xgapnized/genotypes_chr1_CHD_r27_nr.b36_fwd/
D:/data/xgapdata/HumanPublicSets/xgapnized/genotypes_chr8_LWK_r27_nr.b36_fwd/
}}}
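The directory naming in this example can be sketched as follows. This is a hypothetical helper, not code taken from !HapMapParser: it assumes the rule implied by the example paths, namely take the input file's directory, add an `xgapnized` subdirectory, and name the output directory after the input file with only its last extension removed. The class and method names are illustrative.

{{{
#!java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: derives the output directory implied by the example paths. */
class XgapOutputDirSketch {
    static Path outputDir(Path inputFile) {
        String fileName = inputFile.getFileName().toString();
        int dot = fileName.lastIndexOf('.');
        // Drop only the last extension: "x_nr.b36_fwd.txt" -> "x_nr.b36_fwd"
        String baseName = dot > 0 ? fileName.substring(0, dot) : fileName;
        Path parent = inputFile.getParent();
        if (parent == null) {
            parent = Paths.get(""); // bare file name: use the current directory
        }
        return parent.resolve("xgapnized").resolve(baseName);
    }
}
}}}

With the first example path above, this yields D:/data/xgapdata/HumanPublicSets/xgapnized/genotypes_chr1_CHD_r27_nr.b36_fwd (written with the platform's path separator).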

In each new directory, the program creates the following XGAP-format equivalents:

 * individual.txt
 * marker.txt
 * matrix.txt

The generated files look like this:

individual.txt
{{{
name
NA19028
NA19031
NA19035
NA19027
NA19041
NA19046
NA19308
NA19311
NA19317
...
}}}

marker.txt
{{{
name chr bpstart species_name seq
rs241846 8 81890 Homo sapiens C/T
rs2906360 8 151222 Homo sapiens C/G
rs6993172 8 155982 Homo sapiens C/T
rs2906364 8 158484 Homo sapiens C/T
rs2003497 8 166818 Homo sapiens A/G
rs17744505 8 169693 Homo sapiens G/T
rs17744517 8 172340 Homo sapiens A/G
rs6990702 8 173696 Homo sapiens C/G
rs2906326 8 174319 Homo sapiens C/T
... ... ... ...
}}}

matrix.txt
{{{
NA19028 NA19031 NA19035 NA19027 NA19041 NA19046 NA19308 NA19311 NA19317 NA19376 ...
rs241846 TT TT TT TT TT CT TT TT TT CT ...
rs2906360 GG CG GG GG CG GG CG CG GG GG ...
rs6993172 CC CC CC CC CC CC CC CC CC CC ...
rs2906364 TT TT TT CT CT CT TT TT CC CT ...
rs2003497 AG GG GG AG AG AG GG AG AA AG ...
rs17744505 GT GG GG GG GG GT GG GG GG GT ...
rs17744517 AG AA AA AA AA AG AA AA AA AG ...
rs6990702 CC CC CC CC CC CC CG CC CC CC ...
rs2906326 CT CT TT NN CT CT TT CT CC CT ...
... ... ... ...
}}}
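The mapping from a !HapMap genotype file to these three files can be sketched as below. This is not the !HapMapParser source but a minimal illustration under stated assumptions: the input is whitespace-separated, starts with a header row, and has the usual 11 annotation columns (`rs# alleles chrom pos strand assembly# center protLSID assayLSID panelLSID QCcode`) followed by one column per individual. The species name is hard-coded, the `chr` prefix is stripped from the chromosome, and the matrix header is written with an empty corner cell; the examples above do not show the exact delimiters, so that last point is an assumption too.

{{{
#!java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch, not the actual HapMapParser: HapMap genotype lines -> XGAP tables. */
class HapMapToXgapSketch {
    // Assumed leading annotation columns of a HapMap genotype file:
    // rs# alleles chrom pos strand assembly# center protLSID assayLSID panelLSID QCcode
    static final int FIXED_COLUMNS = 11;

    /** The three XGAP files as tab-delimited lines, header line first. */
    static final class Tables {
        final List<String> individual = new ArrayList<>();
        final List<String> marker = new ArrayList<>();
        final List<String> matrix = new ArrayList<>();
    }

    static Tables convert(List<String> hapMapLines) {
        Tables out = new Tables();
        String[] header = hapMapLines.get(0).trim().split("\\s+");
        List<String> individuals = Arrays.asList(header).subList(FIXED_COLUMNS, header.length);

        out.individual.add("name");
        out.individual.addAll(individuals);
        out.marker.add(String.join("\t", "name", "chr", "bpstart", "species_name", "seq"));
        // Assumption: the matrix header starts with an empty corner cell.
        out.matrix.add("\t" + String.join("\t", individuals));

        for (String line : hapMapLines.subList(1, hapMapLines.size())) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] f = line.trim().split("\\s+");
            String rs = f[0];
            String chr = f[2].replaceFirst("^chr", ""); // "chr8" -> "8"
            // marker row: name, chr, bpstart (pos), species_name, seq (the alleles column)
            out.marker.add(String.join("\t", rs, chr, f[3], "Homo sapiens", f[1]));

            List<String> row = new ArrayList<>();
            row.add(rs);
            // genotypes are copied as-is, e.g. "CT" or "NN"
            row.addAll(Arrays.asList(f).subList(FIXED_COLUMNS, f.length));
            out.matrix.add(String.join("\t", row));
        }
        return out;
    }
}
}}}

Writing each list to individual.txt, marker.txt and matrix.txt (one line per element) gives files shaped like the examples above.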

== PED and MAP format ==
The PED and MAP file formats are commonly used by GWAS toolkits such as [http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml PLINK].

A converter for the PED and MAP formats is located at handwritten/java/convertors/!PedMapParser.java.

To parse a pair of files, create a new instance of the class with two arguments:

 * The location of a [http://www.xgap.org/attachment/wiki/XgapExchange/Ped_format_example.txt PED file].
 * The location of a [http://www.xgap.org/attachment/wiki/XgapExchange/PedMap_format_example.txt MAP file].

For example:
{{{
#!java
// Arguments: the PED file first, then the matching MAP file.
new PedMapParser("D:/data/xgapdata/HumanPublicSets/193sgenome_sample.ped", "D:/data/xgapdata/HumanPublicSets/193sgenome.map");
}}}

Each PED/MAP pair results in a new directory, named after the PED file, under xgapnized/ in the input directory. In this case:

{{{
D:/data/xgapdata/HumanPublicSets/xgapnized/193sgenome_sample/
}}}

In this directory, the program creates the following XGAP-format equivalents:

 * strain.txt
 * individual.txt
 * marker.txt
 * matrix.txt

The generated files look like this:

strain.txt

{{{
name straintype
WGACON Natural
}}}

individual.txt

{{{
name strain_name father_name mother_name
Ind1 WGACON Ind0 Ind0
Ind6 WGACON Ind0 Ind0
Ind7 WGACON Ind0 Ind0
Ind9 WGACON Ind0 Ind0
Ind11 WGACON Ind0 Ind0
Ind12 WGACON Ind0 Ind0
Ind15 WGACON Ind0 Ind0
Ind17 WGACON Ind0 Ind0
Ind18 WGACON Ind0 Ind0
Ind20 WGACON Ind0 Ind0
... ... ... ...
}}}

marker.txt

{{{
name chr bpstart species_name seq
rs3094315 1 792429 Homo sapiens 0
rs6672353 1 817376 Homo sapiens 0
rs4040617 1 819185 Homo sapiens 0
rs2980300 1 825852 Homo sapiens 0
rs2905036 1 832343 Homo sapiens 0
rs4245756 1 839326 Homo sapiens 0
rs4075116 1 1043552 Homo sapiens 0
rs9442385 1 1137258 Homo sapiens 0
rs10907175 1 1170650 Homo sapiens 0
rs2887286 1 1196054 Homo sapiens 0
... ... ... ...
}}}

matrix.txt

{{{
rs3094315 rs6672353 rs4040617 rs2980300 rs2905036 rs4245756 rs4075116 rs9442385 rs10907175 rs2887286 ...
Ind1 CT GG AG AG TT CC AA GG AA TT ...
Ind6 CT GG AG AG 00 CC GG GG AC CT ...
Ind7 TT GG AA GG TT CC AG GG AC CT ...
Ind9 TT GG AA GG TT CC AG GG AA TT ...
Ind11 TT GG AA GG TT CC AA GT AA TT ...
Ind12 TT GG AA GG TT CC AA GG AA TT ...
Ind15 CC GG 00 00 TT CC AA GT AA TT ...
Ind17 TT GG AA GG TT CC AG GG AA CC ...
Ind18 TT GG AA GG 00 CC AA GG AC CT ...
Ind20 TT GG AA GG TT CC AA GG AA CT ...
... ... ... ...
}}}
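The PED/MAP mapping can be sketched in the same way. This is a hypothetical illustration, not the !PedMapParser source. It assumes the standard PLINK column layouts (MAP: chromosome, SNP id, genetic distance, base-pair position; PED: family, individual, father, mother, sex, phenotype, then two allele columns per SNP). From the example output it further assumes that the family ID becomes the strain (with straintype `Natural`), that IDs are prefixed with `Ind` (so an unknown parent `0` becomes `Ind0`), that the two alleles are concatenated (`0 0` becomes the missing genotype `00`), and that the matrix header has an empty corner cell. As in the example, this matrix has individuals as rows and markers as columns, the transpose of the !HapMap output.

{{{
#!java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch, not the actual PedMapParser: PED + MAP lines -> XGAP tables. */
class PedMapToXgapSketch {
    // Assumption inferred from the example output: "1" -> "Ind1", unknown parent "0" -> "Ind0".
    static final String ID_PREFIX = "Ind";

    /** The four XGAP files as tab-delimited lines, header line first. */
    static final class Tables {
        final List<String> strain = new ArrayList<>();
        final List<String> individual = new ArrayList<>();
        final List<String> marker = new ArrayList<>();
        final List<String> matrix = new ArrayList<>();
    }

    static Tables convert(List<String> pedLines, List<String> mapLines) {
        Tables out = new Tables();
        List<String> markerNames = new ArrayList<>();

        // MAP columns: chromosome, SNP id, genetic distance, base-pair position
        out.marker.add(String.join("\t", "name", "chr", "bpstart", "species_name", "seq"));
        for (String line : mapLines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] f = line.trim().split("\\s+");
            markerNames.add(f[1]);
            // seq is "0" as in the example; a MAP file has no allele column
            out.marker.add(String.join("\t", f[1], f[0], f[3], "Homo sapiens", "0"));
        }

        Set<String> families = new LinkedHashSet<>();
        out.individual.add(String.join("\t", "name", "strain_name", "father_name", "mother_name"));
        out.matrix.add("\t" + String.join("\t", markerNames)); // assumed empty corner cell

        // PED columns: family, individual, father, mother, sex, phenotype, then two alleles per SNP
        for (String line : pedLines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] f = line.trim().split("\\s+");
            families.add(f[0]);
            String name = ID_PREFIX + f[1];
            out.individual.add(String.join("\t", name, f[0], ID_PREFIX + f[2], ID_PREFIX + f[3]));

            List<String> row = new ArrayList<>();
            row.add(name);
            for (int i = 6; i + 1 < f.length; i += 2) {
                row.add(f[i] + f[i + 1]); // "C","T" -> "CT"; "0","0" -> "00" (missing)
            }
            out.matrix.add(String.join("\t", row));
        }

        out.strain.add(String.join("\t", "name", "straintype"));
        for (String family : families) {
            out.strain.add(family + "\tNatural"); // straintype value copied from the example
        }
        return out;
    }
}
}}}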

== !GeneNetwork format ==
!GeneNetwork allows upload and download of data using a proprietary format that is similar to XGAP. Below we describe how to produce XGAP files from a !GeneNetwork data file.

The !GeneNetwork data files look like this:
{{{
ProbeSetID CXB5 BXD31 BXD62 BXD73 BXD23 BXD60 B6D2F1 BXD92 BXD43 BXD48 ...
1415670_at 0.437 0.214 0.123 0.143 0.835 0.199 0.421 0.32 0.043 0.26 ...
1415671_at 0.145 0.155 0.278 0.108 0.381 0.139 0.475 0.021 0.145 0.102 ...
1415672_at 0.14 0.128 0.196 0.093 0.408 0.03 0.428 0.408 0.118 0.33 ...
1415673_at 0.349 0.18 0.211 0.199 0.266 0.056 0.232 0.044 0.156 0.294 ...
1415674_a_at 0.23 0.182 0.316 0.168 0.198 0.007 0.212 0.032 0.016 0.028 ...
1415675_at 0.415 0.051 0.008 0.062 0.255 0.058 0.15 0.208 0.016 0.195 ...
1415676_a_at 0.154 0.404 0.228 0.046 0.159 0.01 0.583 0.24 0.218 0.146 ...
1415677_at 0.19 0.047 0.431 0.001 0.396 0.053 0.595 0.033 0.06 0.033 ...
1415678_at 0.106 0.044 0.257 0.147 0.2 0.043 0.089 0.059 0.12 0.104 ...
1415679_at 0.143 0.026 0.373 0.211 0.42 0.127 0.299 0.095 0.016 0.155 ...
... ... ... ...
}}}

This is practically identical to the XGAP matrix format. The only change needed is to remove the first-column label from the header row:

{{{
ProbeSetID
}}}

After that, the format is the same as XGAP.
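A minimal sketch of this step, assuming the file has been read into memory as a list of lines; the class and method names are hypothetical. Only the `ProbeSetID` label is removed and the delimiter after it is kept, so the header row starts with an empty corner cell and the column names stay aligned with the data columns. That alignment is an assumption about what XGAP expects.

{{{
#!java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: turn GeneNetwork data lines into an XGAP matrix. */
class GeneNetworkMatrixSketch {
    static List<String> toXgapMatrix(List<String> geneNetworkLines) {
        List<String> out = new ArrayList<>(geneNetworkLines);
        // Remove only the "ProbeSetID" label at the start of the header row;
        // every data row is left untouched.
        out.set(0, out.get(0).replaceFirst("^ProbeSetID", ""));
        return out;
    }
}
}}}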

In addition, one would create annotation files for the rows and columns, e.g.:

probes.txt

{{{
name {properties}
1415670_at
1415671_at
1415672_at
...
}}}

individuals.txt

{{{
name {properties}
CXB5
BXD31
BXD62
...
}}}
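Both annotation files can be derived from the data file itself. The sketch below is hypothetical and assumes a tab-delimited file. It fills only the `name` column, because the `{properties}` columns depend on whatever annotation is available. It works whether or not the `ProbeSetID` label has already been removed, since the first header field is skipped either way.

{{{
#!java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: derive probes.txt and individuals.txt rows from GeneNetwork data lines. */
class GeneNetworkAnnotationSketch {
    /** Row annotation: a "name" header, then the first field of every data row. */
    static List<String> probes(List<String> geneNetworkLines) {
        List<String> out = new ArrayList<>();
        out.add("name");
        for (String line : geneNetworkLines.subList(1, geneNetworkLines.size())) {
            if (line.trim().isEmpty()) {
                continue;
            }
            int tab = line.indexOf('\t');
            out.add(tab >= 0 ? line.substring(0, tab) : line);
        }
        return out;
    }

    /** Column annotation: a "name" header, then every header field after the first. */
    static List<String> individuals(List<String> geneNetworkLines) {
        List<String> out = new ArrayList<>();
        out.add("name");
        String[] header = geneNetworkLines.get(0).split("\t");
        for (int i = 1; i < header.length; i++) {
            out.add(header[i]);
        }
        return out;
    }
}
}}}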

== MAGE-TAB and ISA-TAB format ==
XGAP is based on FuGE, which in turn is compatible with [http://www.mged.org/mage-tab/ MAGE-TAB] for microarray experiments and its generalized cousin [http://isatab.sourceforge.net/ ISA-TAB] for all kinds of experiments.
Although MAGE-TAB and ISA-TAB files are also tab-delimited, their format is more complex than XGAP's. In collaboration with the EBI, work has started on a converter, which is expected to be finished by the end of 2009.
Progress can be followed at http://magetab-om.sourceforge.net.
Code can be found in handwritten/java/convertor/

== dbGaP and EGA genotype archives ==
[http://www.ncbi.nlm.nih.gov/sites/entrez?Db=gap dbGaP] and [http://www.ebi.ac.uk/ega/page.php EGA] currently do not allow public download of genotype data. However, summary data on phenotypes can be downloaded, while data can be uploaded in . As with MAGE-TAB, collaborative efforts have started to enable data exchange, resulting in preliminary parsers. Moreover, dbGaP and EGA are themselves working on an exchange format, which we aim to support.
Progress can be followed at http://wwwdev.ebi.ac.uk/microarray-srv/pheno/
Code can be found in handwritten/java/convertor/