Changes between Version 26 and Version 27 of SopConvertLifeLinesGenoData


Timestamp:
2012-04-10T21:35:26+02:00
Author:
Morris Swertz
  • SopConvertLifeLinesGenoData

    v26 v27  
    3131Example mapping file:
    3232{{{
    33 LL_WGA0001   STUDYPSEUDO1   0
    34 LL_WGA0002   STUDYPSEUDO2   0
    35 LL_WGA0003   STUDYPSEUDO3   0
     331   LL_WGA0001   1   STUDYPSEUDO1
     341   LL_WGA0002   1   STUDYPSEUDO2
     351   LL_WGA0003   1   STUDYPSEUDO3
    3636...
    3737}}}
    3838
    39  * So: Geno individual IDs - TAB - Study pseudonyms - TAB - Phenotypes (can be all 0's as TFAM will be generated later by the user)
     39 * So: Geno family IDs - TAB - Geno individual IDs - TAB - Study family pseudonyms - TAB - Study pseudonyms
    4040 * Items are TAB-separated and the file doesn't end with a newline
     41
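The format rules above (four TAB-separated columns in the v27 layout, no newline at the end of the file) can be checked before running any conversion. The following is a minimal sketch only: `check_mapping_file` is a hypothetical helper name, not part of the original SOP, and it assumes the four-column layout.

```shell
# Hypothetical helper (not part of the SOP): check a mapping file against
# the layout described above.
check_mapping_file() {
    file="$1"

    # Every line must have exactly four TAB-separated fields
    # (geno family ID, geno individual ID, study family pseudonym, study pseudonym).
    awk -F '\t' '
        NF != 4 { printf "line %d: expected 4 fields, found %d\n", NR, NF; bad = 1 }
        END     { exit bad }
    ' "$file" || return 1

    # The last byte must not be a newline. Command substitution strips a
    # trailing newline, so an empty result means the file ends with one.
    if [ -s "$file" ] && [ -z "$(tail -c 1 "$file")" ]; then
        echo "file ends with a newline"
        return 1
    fi

    return 0
}
```

Usage would be along the lines of `check_mapping_file subset.txt`, which returns non-zero and prints the offending line numbers if the file does not match.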
    4142== Procedure ==
    4243
    4344=== Step 1: create subset_study<n>.txt file for study<n> ===
    4445
    45  * In every MOLGENIS<n> schema for a study that has geno data, there is a VW_DICT_GENO_PSEUDONYMS view
     46 * In every STUDY<n> schema for a study that has geno data, there is a VW_DICT_GENO_PSEUDONYMS view
    4647 * In this view, PA_IDs (LL IDs) are related to GNO_IDs ("Marcel" IDs, the LL_WGA numbers)
    4748 * Export this view (tab separated, no enclosures, no headers) to subset_study<n>.txt
     
    5556}}}
    5657
    57 reformat mapping file:
    58 
     58reformat mapping file '''WHY IS THIS?''':
    5959{{{#!sh
    6060./formatsubsetfile.sh study<n>.txt
    6161}}}
    6262
    63 run convertor on TriTyper and Mapping file:
     63filter individuals (repeat per chr)
    6464{{{#!sh
    65 /target/gpfs2/lifelines_rp/tools/jdk1.6.0_22/bin/java -jar TriToPlinkLifeLines.jar P BeagleImputedTriTyper/ study<n> subset_study<n>.txt
     65#--file [file] is input file (expects .map and .ped)
     66#--keep [file] tells plink what individuals to keep (from txt file with fam + ind id)
     67#--recode tells plink to write results (otherwise no results!!! argh!)
     68#--out defines output prefix (here: temp_chr1.*)
     69#--update-ids [file] tells plink to update ids (used in the next step)
     70#result: temp_chr1.ped/map
     71
     72plink --file testdata_chr1 --keep subset.txt --recode --out temp_chr1
    6673}}}
    6774
    68 Note:
    69 * Convertor from TriTyper to PLINK resides on /target/gpfs2/lifelines_rp/releases/LL3
    70 * Correct Java version resides on /target/gpfs2/lifelines_rp/tools/jdk1.6.0_22/bin/
    71 * Estimated runtime: 4 hours (4 GB / 2 CPUs @ cluster.gcc.rug.nl)
    72 === Step 3: convert into binary plink format ===
     75update individuals ids (repeat per chr)
     76{{{
     77#--file [file] is input file
     78#--update-ids [file] tells plink what individuals to update
     79#(from txt file with OLD fam + ind id + NEW fam id + ind id)
     80#--recode tells plink to write results (otherwise no results!!! argh!)
     81# result: study2_chr1.map/ped
    7382
    74 Convert .tped into study<n>.bed, .bim, and .fam files:
    75  {{{#!sh
    76 plink --tfile study<n> --make-bed --out study<n>
     83plink --file temp_chr1 --update-ids subset.txt --recode --out study2_chr1
    7784}}}
    7885
    79 Split study<n>.bed, .bim, .fam per chromosome:
    8086
    81 >> this script is untested, awaiting account
     87#step 3:
     88#convert to bed (repeat per chr)
     89plink --file study2_chr1 --make-bed --out study2_chr1
    8290
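The three per-chromosome commands above (filter individuals, update IDs, convert to bed) are each marked "repeat per chr". A minimal sketch of chaining them in one loop follows. The function name `convert_chromosomes`, the `PLINK` override variable, and the explicit `--out` on the `--make-bed` call are assumptions added for illustration; only the plink flags themselves come from the steps above.

```shell
# Assumption: PLINK can be overridden (e.g. PLINK=echo) for a dry run.
PLINK="${PLINK:-plink}"

# Hypothetical wrapper around the three commands shown above.
#   $1 = input prefix without chromosome number (e.g. testdata_chr)
#   $2 = mapping file with OLD fam + ind id + NEW fam + ind id (e.g. subset.txt)
#   $3 = output prefix without chromosome number (e.g. study2_chr)
#   remaining arguments = chromosomes to process
convert_chromosomes() {
    in_prefix="$1"
    subset="$2"
    out_prefix="$3"
    shift 3

    for chr in "$@"; do
        # filter individuals: keep only those listed in the mapping file
        "$PLINK" --file "${in_prefix}${chr}" --keep "$subset" \
                 --recode --out "temp_chr${chr}" || return 1

        # update individual ids to the study pseudonyms
        "$PLINK" --file "temp_chr${chr}" --update-ids "$subset" \
                 --recode --out "${out_prefix}${chr}" || return 1

        # convert to binary format; --out is added here (assumption) so that
        # each chromosome gets its own output files
        "$PLINK" --file "${out_prefix}${chr}" --make-bed \
                 --out "${out_prefix}${chr}" || return 1
    done
}
```

For example, `PLINK=echo; convert_chromosomes testdata_chr subset.txt study2_chr 1 2` prints the six plink command lines without running them, which is a cheap way to check the file names before submitting the real run.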
    83 {{{#!sh
    84 #create variable holding study name
    85 study="study<n>"
    86 
    87 #get all chromosomes out of .bim file
    88 chrs=`awk '{print $1}' "${study}.bim" | sort -nur`
    89 echo "Chromosome in Map File: ${chrs}" | tr "\n" " "
    90 echo ""
    91 
    92 #use to split/convert
    93 for chr in $chrs; do
    94         echo "Processing chromosome ${chr}"
    95         plink --bfile "$study" --chr "$chr" --make-bed --out "${study}${chr}"; done
    96 }}}
    97 
    98 >NB: If this takes long, we should run these as cluster jobs!
    9991=== Step 4: convert into dosage format ===
    10092
    101 MISSING! ask Joeri?
     93TODO! ask Joeri?
    10294
    103 === Step 5: copy all study<n> files to the lifelines0<n> folder ===
     95=== Step 5: copy all study*<n> files to the lifelines0<n> folder ===
    10496
    10597{{{#!sh