The purpose of this pipeline is to further annotate the SNPs from the 39 celiac disease patient samples. The sequencing and downstream analysis were performed at the BGI institute in China. These samples may be further augmented with 6 patients sequenced in Groningen.

The initial input is 39 GFF files. We first identified an error in the GFF format: the label "alleles" should be "allele", so this has to be corrected in all files (i.e. alleles=G/A --> allele=G/A).

The first step of the pipeline was to annotate the GFF files with reference information from the HapMap3 and 1000 Genomes projects. To do this we selected the SeattleSeqAnnotation tool. It is a fast, stable and well-known tool. Its drawbacks are that it is a web application and that its source code is closed. The tool's webpage is: http://gvs.gs.washington.edu/SeattleSeqAnnotation/index.jsp

They also provide a Java program that wraps the web forms in order to run the tool from the command line: http://gvs.gs.washington.edu/SeattleSeqAnnotation/SubmitSeattleSeqAnnotationAutoJob.java

The second step was to remove duplicates. The SeattleSeqAnnotation output contained several lines per position. We kept the first line of every set of duplicates.

The third step was to add annotation from Immuno_BeadChip.

The fourth step was to add the rs codes of the SNPs. The SeattleSeqAnnotation output lacked this information for some SNPs. For these SNPs we copied it from the initial GFF files.

So far the header of the 39 annotated files is:
{{{
# inDBSNPOrNot chromosome position referenceBase sampleGenotype allelesMaq allelesDBSNP accession functionGVS functionDBSNP rsID(dbSNP+1000genome) aminoAcids proteinPosition polyPhen nickLab scorePhastCons consScoreGERP chimpAllele CNV geneList AfricanHapMapFreq EuropeanHapMapFreq AsianHapMapFreq hasGenotypes dbSNPValidation repeatMasker tandemRepeat clinicalAssociation proteinSequence Immuno_BeadChip
}}}

From Patrick Deelen: I have added the annotations to the Q20 files.
The only thing missing is the eQTL results: the RUG cluster has crashed, so I can't download them. I have tested my program with some old results and it works, so I hope they reset the cluster tomorrow. The analysis was already completed, so it is only a matter of downloading.

From Patrick Deelen: I have added the eQTL results to the files. If the gene name is known, it is displayed; otherwise the probe-ID is displayed.

These files were available via scp from Patrick. I've downloaded them from him and given them to Agata.

Things to do:
 * Add GO annotation from (GenBrowser2 or DAVID or ...)
 * Add allele frequencies for 1KGP and HapMap3

Peter added the following annotations:
{{{
eQTL gene Celiac loci Immunochip Source_SeattleSeq Function DB-SNP PolyPhen scorePhastCons consScoreGERP CNV
}}}

= Pipeline overview =

|| script || property || description || source ||
|| 1KGP || alleleFreq || allele freq in 1KG || 1KG ||

----

|| '''Name of Script''' || '''Version''' || '''Description''' || '''link''' || '''Input1''' || '''Input2''' || '''Output1''' || '''Output2''' ||
|| Prepare_BGI_GFF_for_SeattleSeqAnnotation || 24.9.2010 || Preprocesses GFF files for SeattleSeqAnnotation (changes alleles --> allele and adds the line "# autoFile testAuto.txt" at the top of each file) || http://www.bbmriwiki.nl/svn/SequenceAnnotation/Prepare_BGI_GFF_for_SeattleSeqAnnotation/Prepare_BGI_GFF_for_SeattleSeqAnnotation.py || GFF files (initial input) || || PreprocessedFilename || ||
|| SubmitSeattleSeqAnnotationAutoJob || 26.9.2010 || This is a wrapper for the Java tool provided by the SeattleSeqAnnotation website: http://gvs.gs.washington.edu/SeattleSeqAnnotation/ The location of the Java wrapper is: http://gvs.gs.washington.edu/SeattleSeqAnnotation/SubmitSeattleSeqAnnotationAutoJob.java || http://www.bbmriwiki.nl/svn/SequenceAnnotation/SubmitSeattleSeqAnnotationAutoJob/ |||||||| Check the documentation in the source code for inputs and outputs ||
|| AddImmunoChipAnnotation || 26.9.2010 || Use this generic tool to add Immunochip annotation || http://www.bbmriwiki.nl/svn/SequenceAnnotation/AddImmunoChipAnnotation/ || Immuno_BeadChip_11419691_B_SNPinfo.txt || FileToBeAnnotated || FileWithAnnotation || ||
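
The GFF preprocessing and the duplicate-removal step described above could be sketched roughly as follows. This is only an illustration, not the code of the scripts in the table: the function names are hypothetical, and it assumes that the annotated files are tab-separated with chromosome and position in the second and third columns, as the header shown earlier suggests.

```python
import re

# Header line that the preprocessing step puts at the top of each GFF file.
AUTO_FILE_HEADER = "# autoFile testAuto.txt"


def prepare_gff_lines(lines):
    """Hypothetical helper: rename 'alleles=' to 'allele=' and prepend the header."""
    fixed = [AUTO_FILE_HEADER]
    for line in lines:
        line = line.rstrip("\n")
        if not line.startswith("#"):
            # Only the attribute key is renamed; other text is left untouched.
            line = re.sub(r"\balleles=", "allele=", line)
        fixed.append(line)
    return fixed


def keep_first_per_position(rows, chrom_col=1, pos_col=2):
    """Hypothetical helper: keep the first row for each (chromosome, position).

    Assumes tab-separated rows; the column indices are an assumption based on
    the header order (inDBSNPOrNot, chromosome, position, ...).
    """
    seen = set()
    kept = []
    for row in rows:
        if row.startswith("#"):
            # Comment and header lines are passed through unchanged.
            kept.append(row)
            continue
        fields = row.split("\t")
        key = (fields[chrom_col], fields[pos_col])
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept
```

Both functions work on lists of lines, so they can be applied to each of the 39 files in turn after reading it with `readlines()`.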