Changes between Initial Version and Version 1 of Intranet/SequenceAnnotationPipeline


Timestamp:
2010-10-01T23:19:13+02:00 (14 years ago)
Author:
trac
Comment:

--

     1The purpose of this pipeline is to further annotate the SNPs from the samples of 39 celiac disease patients. The sequencing and downstream analysis were performed at the BGI institute in China. These samples may later be augmented with 6 patients sequenced in Groningen. The initial input is 39 GFF files. We first found an error in the GFF format: the attribute label "alleles" should be "allele", so this has to be corrected in all files (i.e. alleles=G/A --> allele=G/A).
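
The label correction described above can be sketched as follows. This is only an illustration, not the project's actual preprocessing script: the function names are hypothetical, and it assumes the label always appears as a key directly followed by "=" on non-comment lines.

```python
import re

# Match "alleles=" only when it is a whole attribute key, i.e. not the tail
# of a longer key such as a (hypothetical) "ref_alleles=".
_ALLELES_KEY = re.compile(r"(?<![A-Za-z0-9_])alleles=")


def fix_allele_label(line: str) -> str:
    """Rename the attribute key 'alleles' to 'allele' in one GFF line."""
    if line.startswith("#"):
        return line  # leave comment and header lines untouched
    return _ALLELES_KEY.sub("allele=", line)


def fix_gff_text(text: str) -> str:
    """Apply the label fix to every line of a GFF file's content."""
    return "".join(fix_allele_label(l) for l in text.splitlines(keepends=True))
```

Calling `fix_gff_text` on the content of each of the 39 input files and writing the result to a new file would leave the originals intact.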
     2
     3The first step of the pipeline was to annotate the GFF files with reference information from the HapMap3 and 1000 Genomes projects. To do this we selected the SeattleSeqAnnotation tool. It is a fast, stable and well-known tool. Its drawbacks are that it is a web application and that its source code is closed. The tool's webpage is http://gvs.gs.washington.edu/SeattleSeqAnnotation/index.jsp and its authors also provide a Java program that wraps the web forms so that the tool can be run from the command line: http://gvs.gs.washington.edu/SeattleSeqAnnotation/SubmitSeattleSeqAnnotationAutoJob.java
     4
     5The second step was to remove duplicates. The SeattleSeqAnnotation output contained several lines per position. Of each set of duplicate lines we kept the first one.
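
A minimal sketch of this deduplication, assuming tab-separated lines with the chromosome in column 1 and the position in column 2 (0-based, as in the header of the annotated files). The function name and parameters are hypothetical, not taken from the project's code.

```python
def keep_first_per_position(lines, chrom_col=1, pos_col=2):
    """Return the lines with only the first line kept for each position.

    A position is identified by (chromosome, position). Comment lines
    starting with '#' and blank lines are passed through unchanged.
    """
    seen = set()
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            kept.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        key = (fields[chrom_col], fields[pos_col])
        if key not in seen:  # first occurrence of this position
            seen.add(key)
            kept.append(line)
    return kept
```

Keying on the chromosome together with the position prevents equal coordinates on different chromosomes from being treated as duplicates.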
     6
     7The third step was to add annotation from Immuno_BeadChip.
     8
     9The fourth step was to add the rs codes of the SNPs. The SeattleSeqAnnotation output lacked this information for some SNPs. For these SNPs we copied it from the initial GFF files.
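
The back-filling of rs codes might look roughly like the sketch below. Several things here are assumptions made for illustration: rows are represented as dictionaries keyed by column name, the rs codes from the initial GFF files are already loaded into a dictionary keyed by (chromosome, position), and the set of placeholder values that mean "no rs code" is hypothetical.

```python
# Hypothetical placeholder values meaning "no rs code"; the real output
# may use a different marker.
MISSING = {"", "none", "0"}


def fill_missing_rs_ids(rows, gff_rs_ids, rs_field="rsID(dbSNP+1000genome)"):
    """Copy rs codes from the initial GFF data into rows that lack one.

    rows:       list of dicts with 'chromosome', 'position' and rs_field
    gff_rs_ids: dict mapping (chromosome, position) -> rs code
    Returns new dicts; the input rows are not modified.
    """
    filled = []
    for row in rows:
        row = dict(row)  # work on a copy
        if row[rs_field].strip().lower() in MISSING:
            key = (row["chromosome"], row["position"])
            # keep the original value when the GFF data has no entry either
            row[rs_field] = gff_rs_ids.get(key, row[rs_field])
        filled.append(row)
    return filled
```

Rows that already carry an rs code are left as they are, so the GFF value is only used as a fallback.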
     10
     11So far the header of the 39 annotated files is:
     12
     13{{{
     14# inDBSNPOrNot  chromosome      position        referenceBase   sampleGenotype  allelesMaq      allelesDBSNP    accession       functionGVS     functionDBSNP   rsID(dbSNP+1000genome)  aminoAcids      proteinPosition polyPhen        nickLab scorePhastCons  consScoreGERP   chimpAllele     CNV     geneList        AfricanHapMapFreq       EuropeanHapMapFreq      AsianHapMapFreq hasGenotypes    dbSNPValidation repeatMasker    tandemRepeat    clinicalAssociation     proteinSequence Immuno_BeadChip
     15}}}
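
For downstream scripts it can help to address fields by column name rather than by position. The sketch below shows one way to do that. It assumes that column names contain no whitespace (true for the header above) and that data lines are tab-separated; both helper names are hypothetical.

```python
def parse_header(header_line):
    """Map each column name in a '#'-prefixed header line to its index."""
    names = header_line.lstrip("#").split()
    return {name: index for index, name in enumerate(names)}


def get_field(data_line, columns, name):
    """Return the value of the named column from one tab-separated line."""
    fields = data_line.rstrip("\n").split("\t")
    return fields[columns[name]]


# Usage with a header shortened to its first four columns for brevity.
columns = parse_header("# inDBSNPOrNot\tchromosome\tposition\treferenceBase\n")
position = get_field("dbSNP\t1\t100\tA\n", columns, "position")
```

Looking up fields by name keeps such scripts working if further annotation columns are appended later.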
     16From Patrick Deelen: I have added the annotations to the Q20 files. The only thing still missing is the eQTL results: the RUG cluster has crashed, so I can't download them. I have tested my program with some old results and it works, so I hope they reset the cluster tomorrow. The analysis had already completed, so it is only a matter of downloading.
     17
     18From Patrick Deelen: I have added the eQTL results to the files. If the gene name is known, it is displayed; otherwise the probe-ID is displayed.
     19
     20These files were available via scp from Patrick. I've downloaded them from him and given them to Agata.
     21
     22Things to do:
     23
     24 * Add GO annotation (from GenBrowser2, DAVID or ...)
     25 * Add allele frequencies for 1KGP and HapMap3
     26
     27Peter added the following annotations:
     28
     29{{{
     30eQTL gene       Celiac loci     Immunochip      Source_SeattleSeq       Function DB-SNP PolyPhen        scorePhastCons  consScoreGERP   CNV
     31}}}
     32= Pipeline overview =
     33|| script || property || description || source ||
     34|| 1KGP || alleleFreq || allele freq in 1KG || 1KG ||
     35
     36----
     37|| '''Name of Script''' || '''Version''' || '''Description''' || '''link''' || '''Input1''' || '''Input2''' || '''Output1''' || '''Output2''' ||
     38||Prepare_BGI_GFF_for_SeattleSeqAnnotation || 24.9.2010 || Preprocesses GFF files for SeattleSeqAnnotation (changes alleles --> allele and adds the line "# autoFile testAuto.txt" at the top of each file) || http://www.bbmriwiki.nl/svn/SequenceAnnotation/Prepare_BGI_GFF_for_SeattleSeqAnnotation/Prepare_BGI_GFF_for_SeattleSeqAnnotation.py || GFF files (initial input) || ||  PreprocessedFilename || ||
     39||SubmitSeattleSeqAnnotationAutoJob || 26.9.2010 || This is a wrapper for the Java tool provided by the SeattleSeqAnnotation website: http://gvs.gs.washington.edu/SeattleSeqAnnotation/ The location of the Java wrapper is: http://gvs.gs.washington.edu/SeattleSeqAnnotation/SubmitSeattleSeqAnnotationAutoJob.java || http://www.bbmriwiki.nl/svn/SequenceAnnotation/SubmitSeattleSeqAnnotationAutoJob/ |||||||| Check the documentation in the source code for inputs and outputs ||
     40|| AddImmunoChipAnnotation || 26.9.2010 || Use this generic tool to add Immunochip annotation || http://www.bbmriwiki.nl/svn/SequenceAnnotation/AddImmunoChipAnnotation/ || Immuno_BeadChip_11419691_B_SNPinfo.txt || FileToBeAnnotated || FileWithAnnotation || ||