Changes between Initial Version and Version 1 of XgapExchange

Timestamp: 2010-10-01T23:38:13+02:00
Author: trac
Comment: --

XgapExchange

                       v1
+[[TOC()]]
+= How to convert between XGAP and other formats =
+Below we describe existing and planned procedures to convert between XGAP and other formats.
+== !HapMap format ==
+A !HapMapParser is located at handwritten/java/convertors/!HapMapParser.java.
+To parse a file, just create a new instance of the class with an argument denoting the location of a !HapMap file ([http://www.xgap.org/attachment/wiki/XgapExchange/HapMap_format_example.txt example]).
+For example:
+{{{
+#!java
+new HapMapParser("D:/data/xgapdata/HumanPublicSets/genotypes_chr1_CHD_r27_nr.b36_fwd.txt");
+new HapMapParser("D:/data/xgapdata/HumanPublicSets/genotypes_chr8_LWK_r27_nr.b36_fwd.txt");
+}}}
+Each input file results in a new directory, named after the input file and placed under `xgapnized/` next to it; in this case:
+{{{
+D:/data/xgapdata/HumanPublicSets/xgapnized/genotypes_chr1_CHD_r27_nr.b36_fwd/
+D:/data/xgapdata/HumanPublicSets/xgapnized/genotypes_chr8_LWK_r27_nr.b36_fwd/
+}}}
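+The naming rule visible in the paths above can be sketched as follows. This is a minimal illustration, not the code in !HapMapParser: the class `XgapOutputDir` and the method `forInput` are hypothetical names, and the rule itself (an `xgapnized` subdirectory next to the input, named after the file without its final extension) is an assumption inferred from the example paths only.

```java
/** Hypothetical helper for illustration; not part of the XGAP code base. */
final class XgapOutputDir {
    private XgapOutputDir() {}

    /**
     * Derives the output directory for an input file:
     * <input directory>/xgapnized/<file name without its final extension>/
     * Assumes '/' as the path separator, as in the examples above.
     */
    static String forInput(String inputPath) {
        int slash = inputPath.lastIndexOf('/');
        String parent = slash < 0 ? "" : inputPath.substring(0, slash + 1);
        String fileName = inputPath.substring(slash + 1);
        // Strip only the last extension, so "nr.b36_fwd.txt" keeps its inner dot.
        int dot = fileName.lastIndexOf('.');
        String baseName = dot <= 0 ? fileName : fileName.substring(0, dot);
        return parent + "xgapnized/" + baseName + "/";
    }

    public static void main(String[] args) {
        System.out.println(forInput(
            "D:/data/xgapdata/HumanPublicSets/genotypes_chr8_LWK_r27_nr.b36_fwd.txt"));
    }
}
```

+Only the final extension is stripped, which is why the inner dot in `nr.b36_fwd` survives in the directory name.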
+In each new directory, the program creates the following XGAP format equivalents:
+ * individual.txt
+ * marker.txt
+ * matrix.txt
+These files will have content such as:
+individual.txt
+{{{
+name
+NA19028
+NA19031
+NA19035
+NA19027
+NA19041
+NA19046
+NA19308
+NA19311
+NA19317
+ ...
+}}}
+marker.txt
+{{{
+name    chr     bpstart species_name    seq
+rs241846        8       81890   Homo sapiens    C/T
+rs2906360       8       151222  Homo sapiens    C/G
+rs6993172       8       155982  Homo sapiens    C/T
+rs2906364       8       158484  Homo sapiens    C/T
+rs2003497       8       166818  Homo sapiens    A/G
+rs17744505      8       169693  Homo sapiens    G/T
+rs17744517      8       172340  Homo sapiens    A/G
+rs6990702       8       173696  Homo sapiens    C/G
+rs2906326       8       174319  Homo sapiens    C/T
+ ... ... ... ...
+}}}
+matrix.txt
+{{{
+NA19028 NA19031 NA19035 NA19027 NA19041 NA19046 NA19308 NA19311 NA19317 NA19376 ...
+rs241846        TT      TT      TT      TT      TT      CT      TT      TT      TT      CT ...
+rs2906360       GG      CG      GG      GG      CG      GG      CG      CG      GG      GG ...
+rs6993172       CC      CC      CC      CC      CC      CC      CC      CC      CC      CC ...
+rs2906364       TT      TT      TT      CT      CT      CT      TT      TT      CC      CT ...
+rs2003497       AG      GG      GG      AG      AG      AG      GG      AG      AA      AG ...
+rs17744505      GT      GG      GG      GG      GG      GT      GG      GG      GG      GT ...
+rs17744517      AG      AA      AA      AA      AA      AG      AA      AA      AA      AG ...
+rs6990702       CC      CC      CC      CC      CC      CC      CG      CC      CC      CC ...
+rs2906326       CT      CT      TT      NN      CT      CT      TT      CT      CC      CT ...
+ ... ... ... ...
+}}}
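+How one !HapMap line maps onto the marker.txt and matrix.txt rows shown above can be sketched as follows. This is an illustration under stated assumptions, not the actual !HapMapParser code: the class and method names are hypothetical, the input is assumed to have eleven fixed columns (rs#, alleles, chrom, pos, strand, assembly#, center, protLSID, assayLSID, panelLSID, QCcode) before the genotypes, and the chromosome is assumed to be written as `chr8`.

```java
/** Hypothetical sketch; not the actual HapMapParser implementation. */
final class HapMapLineSketch {
    /** Fixed annotation columns before the first genotype column (assumed layout). */
    static final int FIXED_COLUMNS = 11;

    private HapMapLineSketch() {}

    /** Builds a marker.txt row: name, chr, bpstart, species_name, seq. */
    static String markerRow(String hapMapLine) {
        String[] f = fields(hapMapLine);
        // Assumption: the input writes "chr8" where the XGAP output shows "8".
        String chr = f[2].startsWith("chr") ? f[2].substring(3) : f[2];
        return String.join("\t", f[0], chr, f[3], "Homo sapiens", f[1]);
    }

    /** Builds a matrix.txt row: marker name followed by one genotype per individual. */
    static String matrixRow(String hapMapLine) {
        String[] f = fields(hapMapLine);
        StringBuilder row = new StringBuilder(f[0]);
        for (int i = FIXED_COLUMNS; i < f.length; i++) {
            row.append('\t').append(f[i]);
        }
        return row.toString();
    }

    /** Splits on whitespace and checks that at least one genotype column is present. */
    private static String[] fields(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length <= FIXED_COLUMNS) {
            throw new IllegalArgumentException(
                "Expected more than " + FIXED_COLUMNS + " columns");
        }
        return f;
    }

    public static void main(String[] args) {
        // Illustrative line: columns 6-11 are placeholders, not real HapMap values.
        String line = "rs241846 C/T chr8 81890 + x x x x x x TT TT TT";
        System.out.println(markerRow(line));
        System.out.println(matrixRow(line));
    }
}
```

+The species name is hard-coded here because every row of the sample marker.txt carries `Homo sapiens`; a real convertor may obtain it differently.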
+== PED and MAP format ==
+The PED and MAP file formats are commonly used by GWAS toolkits such as [http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml PLINK].
+A convertor for the PED and MAP formats is located at handwritten/java/convertors/!PedMapParser.java.
+To parse a file, just create a new instance of the class with two arguments:
+ * The location of a [http://www.xgap.org/attachment/wiki/XgapExchange/Ped_format_example.txt Ped file].
+ * The location of a [http://www.xgap.org/attachment/wiki/XgapExchange/PedMap_format_example.txt Map file].
+For example:
+{{{
+#!java
+new PedMapParser("D:/data/xgapdata/HumanPublicSets/193sgenome_sample.ped", "D:/data/xgapdata/HumanPublicSets/193sgenome.map");
+}}}
+Each PED/MAP pair results in a new directory, named after the PED file and placed under `xgapnized/` next to it; in this case:
+{{{
+D:/data/xgapdata/HumanPublicSets/xgapnized/193sgenome_sample/
+}}}
+In each new directory, the program creates the following XGAP format equivalents:
+ * strain.txt
+ * individual.txt
+ * marker.txt
+ * matrix.txt
+These files will have content such as:
+strain.txt
+{{{
+name    straintype
+WGACON  Natural
+}}}
+individual.txt
+{{{
+name    strain_name     father_name     mother_name
+Ind1    WGACON  Ind0    Ind0
+Ind6    WGACON  Ind0    Ind0
+Ind7    WGACON  Ind0    Ind0
+Ind9    WGACON  Ind0    Ind0
+Ind11   WGACON  Ind0    Ind0
+Ind12   WGACON  Ind0    Ind0
+Ind15   WGACON  Ind0    Ind0
+Ind17   WGACON  Ind0    Ind0
+Ind18   WGACON  Ind0    Ind0
+Ind20   WGACON  Ind0    Ind0
+ ... ... ... ...
+}}}
+marker.txt
+{{{
+name    chr     bpstart species_name    seq
+rs3094315       1       792429  Homo sapiens    0
+rs6672353       1       817376  Homo sapiens    0
+rs4040617       1       819185  Homo sapiens    0
+rs2980300       1       825852  Homo sapiens    0
+rs2905036       1       832343  Homo sapiens    0
+rs4245756       1       839326  Homo sapiens    0
+rs4075116       1       1043552 Homo sapiens    0
+rs9442385       1       1137258 Homo sapiens    0
+rs10907175      1       1170650 Homo sapiens    0
+rs2887286       1       1196054 Homo sapiens    0
+ ... ... ... ...
+}}}
+matrix.txt
+{{{
+rs3094315       rs6672353       rs4040617       rs2980300       rs2905036       rs4245756       rs4075116       rs9442385       rs10907175      rs2887286
+Ind1    CT      GG      AG      AG      TT      CC      AA      GG      AA      TT ...
+Ind6    CT      GG      AG      AG      00      CC      GG      GG      AC      CT ...
+Ind7    TT      GG      AA      GG      TT      CC      AG      GG      AC      CT ...
+Ind9    TT      GG      AA      GG      TT      CC      AG      GG      AA      TT ...
+Ind11   TT      GG      AA      GG      TT      CC      AA      GT      AA      TT ...
+Ind12   TT      GG      AA      GG      TT      CC      AA      GG      AA      TT ...
+Ind15   CC      GG      00      00      TT      CC      AA      GT      AA      TT ...
+Ind17   TT      GG      AA      GG      TT      CC      AG      GG      AA      CC ...
+Ind18   TT      GG      AA      GG      00      CC      AA      GG      AC      CT ...
+Ind20   TT      GG      AA      GG      TT      CC      AA      GG      AA      CT ...
+ ... ... ... ...
+}}}
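+The mapping from one PED line to the individual.txt and matrix.txt rows shown above can be sketched as follows. This is an illustration, not the actual !PedMapParser code: the class and method names are hypothetical, the line is assumed to follow the usual PED layout (family, individual, father, mother, sex, phenotype, then two alleles per marker), and using the family ID as `strain_name` is an assumption inferred from the sample output. IDs are copied verbatim; whether the real convertor renames them is not stated here.

```java
/** Hypothetical sketch; not the actual PedMapParser implementation. */
final class PedLineSketch {
    /** Leading PED columns: family, individual, father, mother, sex, phenotype. */
    static final int LEADING_COLUMNS = 6;

    private PedLineSketch() {}

    /** Builds an individual.txt row: name, strain_name, father_name, mother_name. */
    static String individualRow(String pedLine) {
        String[] f = fields(pedLine);
        // Assumption: the family ID is used as the strain name.
        return String.join("\t", f[1], f[0], f[2], f[3]);
    }

    /** Builds a matrix.txt row: individual name, then each allele pair joined ("C T" becomes "CT"). */
    static String matrixRow(String pedLine) {
        String[] f = fields(pedLine);
        StringBuilder row = new StringBuilder(f[1]);
        for (int i = LEADING_COLUMNS; i < f.length; i += 2) {
            row.append('\t').append(f[i]).append(f[i + 1]);
        }
        return row.toString();
    }

    /** Splits on whitespace and checks for six leading columns plus complete allele pairs. */
    private static String[] fields(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length < LEADING_COLUMNS || (f.length - LEADING_COLUMNS) % 2 != 0) {
            throw new IllegalArgumentException(
                "Expected 6 leading columns followed by allele pairs");
        }
        return f;
    }

    public static void main(String[] args) {
        // Illustrative line, not taken from a real PED file.
        String line = "WGACON Ind1 Ind0 Ind0 1 1 C T G G 0 0";
        System.out.println(individualRow(line));
        System.out.println(matrixRow(line));
    }
}
```

+Missing alleles (`0 0`) are joined like any other pair, which matches the `00` cells in the sample matrix.txt.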
+== !GeneNetwork format ==
+!GeneNetwork allows upload and download of data in its own format, which closely resembles XGAP. Here we describe how to produce a suitable file.
+The !GeneNetwork data files look like this:
+{{{
+ProbeSetID      CXB5    BXD31   BXD62   BXD73   BXD23   BXD60   B6D2F1  BXD92   BXD43   BXD48 ...
+1415670_at      0.437   0.214   0.123   0.143   0.835   0.199   0.421   0.32    0.043   0.26  ...
+1415671_at      0.145   0.155   0.278   0.108   0.381   0.139   0.475   0.021   0.145   0.102 ...
+1415672_at      0.14    0.128   0.196   0.093   0.408   0.03    0.428   0.408   0.118   0.33 ...
+1415673_at      0.349   0.18    0.211   0.199   0.266   0.056   0.232   0.044   0.156   0.294 ...
+1415674_a_at    0.23    0.182   0.316   0.168   0.198   0.007   0.212   0.032   0.016   0.028 ...
+1415675_at      0.415   0.051   0.008   0.062   0.255   0.058   0.15    0.208   0.016   0.195 ...
+1415676_a_at    0.154   0.404   0.228   0.046   0.159   0.01    0.583   0.24    0.218   0.146 ...
+1415677_at      0.19    0.047   0.431   0.001   0.396   0.053   0.595   0.033   0.06    0.033 ...
+1415678_at      0.106   0.044   0.257   0.147   0.2     0.043   0.089   0.059   0.12    0.104 ...
+1415679_at      0.143   0.026   0.373   0.211   0.42    0.127   0.299   0.095   0.016   0.155 ...
+ ... ... ... ...
+}}}
+This is practically identical to the XGAP matrix format. In this case, one would only have to remove the leading column label
+{{{
+ProbeSetID
+}}}
+and the format would be the same.
+In addition, one would create annotation files for the rows and columns, e.g.:
+probes.txt
+{{{
+name   {properties}
+1415670_at
+1415671_at
+1415672_at
+ ...
+}}}
+individuals.txt
+{{{
+name   {properties}
+CXB5
+BXD31
+BXD62
+ ...
+}}}
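+The two steps above (dropping the `ProbeSetID` label and writing one-column annotation files) can be sketched as follows. This is a minimal illustration; the class `GeneNetworkSketch` and its methods are hypothetical names, not existing XGAP code, and the annotation files are reduced to the `name` column only.

```java
import java.util.Arrays;

/** Hypothetical sketch of the GeneNetwork-to-XGAP steps described above. */
final class GeneNetworkSketch {
    /** Label of the first header cell in a GeneNetwork data file. */
    static final String ROW_LABEL = "ProbeSetID";

    private GeneNetworkSketch() {}

    /** Drops the leading "ProbeSetID" label so the header lists only the column names. */
    static String toXgapHeader(String geneNetworkHeader) {
        String[] f = geneNetworkHeader.trim().split("\\s+");
        if (!ROW_LABEL.equals(f[0])) {
            throw new IllegalArgumentException("Header does not start with " + ROW_LABEL);
        }
        return String.join("\t", Arrays.copyOfRange(f, 1, f.length));
    }

    /** Builds the text of a one-column annotation file with a "name" header line. */
    static String annotationFile(String... names) {
        return "name\n" + String.join("\n", names) + "\n";
    }

    public static void main(String[] args) {
        String header = toXgapHeader("ProbeSetID\tCXB5\tBXD31\tBXD62");
        System.out.println(header);
        // individuals.txt lists the column names; probes.txt lists the row names.
        System.out.print(annotationFile(header.split("\t")));
        System.out.print(annotationFile("1415670_at", "1415671_at", "1415672_at"));
    }
}
```

+Data rows need no change in this sketch, since only the header line differs between the two formats.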
+== MAGE-TAB and ISA-TAB format ==
+XGAP is based on FuGE, which in turn is compatible with [http://www.mged.org/mage-tab/ MAGE-TAB] for microarray experiments and its generalized cousin [http://isatab.sourceforge.net/ ISA-TAB] for all kinds of experiments.
+While MAGE-TAB and ISA-TAB files are also tab-delimited, their format is somewhat more complicated than XGAP's. In collaboration with EBI, work has started on a convertor, which is expected to be finished by the end of 2009.
+Progress can be found on http://magetab-om.sourceforge.net.
+Code can be found in handwritten/java/convertor/
+== dbGaP and EGA genotype archives ==
+[http://www.ncbi.nlm.nih.gov/sites/entrez?Db=gap dbGaP] and [http://www.ebi.ac.uk/ega/page.php EGA] currently do not allow public download of genotype data. However, summary data on phenotypes can be downloaded, while data upload can be done in . Just as with MAGE-TAB, collaborative efforts have been started to enable exchange, resulting in preliminary parsers. Moreover, dbGaP and EGA are working on an exchange format of their own, which we aim to support.
+Progress can be found on http://wwwdev.ebi.ac.uk/microarray-srv/pheno/
+Code can be found in handwritten/java/convertor/