wiki:Intranet/SequenceAnnotationPipeline

The purpose of this pipeline is to further annotate the SNPs in the samples from 39 celiac disease patients. The sequencing and downstream analysis were performed at the BGI institute in China. These samples may later be augmented with 6 patients sequenced in Groningen. The initial input is 39 GFF files. We first found an error in the GFF format: the label "alleles" should be "allele", so this has to be corrected in all files (i.e. alleles=G/A --> allele=G/A).
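The label correction described above can be sketched in a few lines of Python. This is only an illustration, not the Prepare_BGI_GFF_for_SeattleSeqAnnotation script itself: the function names are hypothetical, and it assumes that the label appears as an attribute key that starts a field or follows a ";" or whitespace separator.

```python
import re

# Assumption: "alleles=" is an attribute key that either starts the line
# or follows a ';' or whitespace separator.
_ALLELES_KEY = re.compile(r"(^|[;\s])alleles=")


def fix_allele_label(line):
    """Rename the 'alleles=' key to 'allele=' in a single GFF line."""
    if line.startswith("#"):
        return line  # comment and directive lines are left untouched
    return _ALLELES_KEY.sub(r"\1allele=", line)


def fix_gff_text(text):
    """Apply the correction to every line of a GFF file's content."""
    return "".join(fix_allele_label(l) for l in text.splitlines(keepends=True))
```

Running the function twice on the same file should be harmless, because an already corrected "allele=" key no longer matches the pattern.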

The first step of the pipeline was to annotate the GFF files with reference information from HapMap3 and the 1000 Genomes Project. To do this we selected the SeattleSeqAnnotation tool. It is a fast, stable and well-known tool. The drawbacks are that it is a web application and that its source code is closed. The tool's webpage is: http://gvs.gs.washington.edu/SeattleSeqAnnotation/index.jsp The developers also provide a Java program that wraps the web forms so that the tool can be run from the command line: http://gvs.gs.washington.edu/SeattleSeqAnnotation/SubmitSeattleSeqAnnotationAutoJob.java

The second step was to remove duplicates. The SeattleSeqAnnotation output contained several lines per position. We kept the first line of each group of duplicates.
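A minimal sketch of this step in Python follows. It is not the script that was actually used. It assumes the annotated file is tab-delimited, that its header line starts with "#" and names the columns "chromosome" and "position", and that two lines count as duplicates when they share both values. The function name is hypothetical.

```python
def drop_duplicate_positions(lines):
    """Keep the first data line for every (chromosome, position) pair."""
    chrom_idx = pos_idx = None
    seen = set()
    kept = []
    for line in lines:
        if not line.strip():
            kept.append(line)  # blank lines pass through unchanged
            continue
        if line.startswith("#"):
            # Header or comment line: look for the key columns by name.
            names = line.lstrip("# ").rstrip("\n").split("\t")
            if "chromosome" in names and "position" in names:
                chrom_idx = names.index("chromosome")
                pos_idx = names.index("position")
            kept.append(line)
            continue
        if chrom_idx is None:
            raise ValueError("no header naming chromosome/position seen yet")
        fields = line.rstrip("\n").split("\t")
        key = (fields[chrom_idx], fields[pos_idx])
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept
```

Keeping the first line is the simplest rule, but it silently discards whatever differs on the later lines for that position (for example alternative transcript accessions), which is worth bearing in mind when interpreting the output.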

The third step was to add annotation from Immuno_BeadChip.

The fourth step was to add the rs codes of the SNPs. The output of SeattleSeqAnnotation lacked this information for some SNPs. For these SNPs we copied it from the initial GFF files.
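The back-filling of rs codes could look roughly like the Python sketch below. Several things here are assumptions rather than facts from this page: the function name, the values that mark a missing rs code (`missing`), and the idea that the rs codes from the GFF files have already been collected into a dictionary keyed by (chromosome, position). How the GFF files store the rs code is not described here, so building that dictionary is left to the caller.

```python
def fill_missing_rs_ids(rows, rs_lookup, chrom_idx, pos_idx, rs_idx,
                        missing=("", "none", "0")):
    """Return copies of rows with missing rs codes filled from rs_lookup.

    rows      -- list of rows, each a list of string fields
    rs_lookup -- dict mapping (chromosome, position) to an rs code,
                 assumed to have been built from the initial GFF files
    missing   -- values treated as "no rs code" (an assumption)
    """
    filled = []
    for row in rows:
        row = list(row)  # copy, so the caller's data is not modified
        if row[rs_idx] in missing:
            key = (row[chrom_idx], row[pos_idx])
            # Leave the cell as it is when the GFF has no rs code either.
            row[rs_idx] = rs_lookup.get(key, row[rs_idx])
        filled.append(row)
    return filled
```

Rows that already carry an rs code are never overwritten, so the SeattleSeqAnnotation value takes precedence wherever it exists.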

So far the header of the 39 annotated files is:

# inDBSNPOrNot	chromosome	position	referenceBase	sampleGenotype	allelesMaq	allelesDBSNP	accession	functionGVS	functionDBSNP	rsID(dbSNP+1000genome)	aminoAcids	proteinPosition	polyPhen	nickLab	scorePhastCons	consScoreGERP	chimpAllele	CNV	geneList	AfricanHapMapFreq	EuropeanHapMapFreq	AsianHapMapFreq	hasGenotypes	dbSNPValidation	repeatMasker	tandemRepeat	clinicalAssociation	proteinSequence	Immuno_BeadChip

From Patrick Deelen: I have added the annotations to the Q20 files. The only thing missing is the eQTL results: the RUG cluster has crashed, so I can't download them. I have tested my program with some old results and it works, so I hope they reset the cluster tomorrow. The analysis was already completed, so it is only a matter of downloading.

From Patrick Deelen: I have added the eQTL results to the files. If the gene name is known, then it is displayed; otherwise the probe ID is displayed.

These files were available via scp from Patrick. I've downloaded them from him and given them to Agata.

Things to do:

  • Add GO annotation from (GenBrowser2 or David or ...)
  • Add allele frequencies for 1KGP and HapMap3

Peter added the following annotations:

eQTL gene	Celiac loci	Immunochip	Source_SeattleSeq	Function DB-SNP	PolyPhen	scorePhastCons	consScoreGERP	CNV

Pipeline overview

script	property	description	source
1KGP	alleleFreq	allele freq in 1KG	1KG

Name of Script	Version	Description	link	Input1	Input2	Output1	Output2
Prepare_BGI_GFF_for_SeattleSeqAnnotation	24.9.2010	Preprocesses GFF files for SeattleAnnotationTool (changes alleles --> allele and adds the line "# autoFile testAuto.txt" at the top of the files)	http://www.bbmriwiki.nl/svn/SequenceAnnotation/Prepare_BGI_GFF_for_SeattleSeqAnnotation/Prepare_BGI_GFF_for_SeattleSeqAnnotation.py	GFF files (initial input)		PreprocessedFilename	
SubmitSeattleSeqAnnotationAutoJob	26.9.2010	This is a wrapper for the Java tool provided by the SeattleSeqAnnotation website: http://gvs.gs.washington.edu/SeattleSeqAnnotation/ The location of the Java wrapper is: http://gvs.gs.washington.edu/SeattleSeqAnnotation/SubmitSeattleSeqAnnotationAutoJob.java	http://www.bbmriwiki.nl/svn/SequenceAnnotation/SubmitSeattleSeqAnnotationAutoJob/	check documentation in source code for inputs and output			
AddImmunoChipAnnotation	26.9.2010	Use this generic tool to add Immunochip annotation	http://www.bbmriwiki.nl/svn/SequenceAnnotation/AddImmunoChipAnnotation/	Immuno_BeadChip_11419691_B_SNPinfo.txt	FileToBeAnnotated	FileWithAnnotation	
Last modified on 2010-10-01T23:19:13+02:00