
Placeholder for the genome-wide association study (GWAS) pipeline of the LifeLines project. BBMRI and NBIC may also contribute.

GwasPipeline

developers:	AndreDeVries, JorisLops, MorrisSwertz
state:	design

In general, genome-wide genotype data (SNPs) go through the following processing steps:

1. Genotype calling
2. Cleaning of the genotype data
3. Imputation (optional)
4. Analysis

Steps 1-3 can be regarded as preprocessing steps, whereas step 4 can be repeated many times on the single output of steps 1-3.

Steps 1 and 2 can be combined in a single software package.
Step 3 is performed using imputation software, such as IMPUTE, Beagle or MaCH.
Step 4 combines the cleaned (and optionally imputed) genotype data with phenotype data in an analysis.
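The split between one-off preprocessing (steps 1-3) and repeatable analysis (step 4) can be sketched as follows. This is a minimal, hypothetical illustration only: the class and function names, the dataset name `lifelines_batch1` and the phenotype names are invented for the example, and each step merely records its name instead of calling real genotype-calling, cleaning or imputation software.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PreprocessedGenotypes:
    """Output of steps 1-3; produced once and shared by every analysis."""
    source: str
    steps_applied: List[str] = field(default_factory=list)


def preprocess(raw_data: str, impute: bool = False) -> PreprocessedGenotypes:
    """Run steps 1-3 once (placeholders: each step only records its name)."""
    data = PreprocessedGenotypes(source=raw_data)
    data.steps_applied.append("genotype_calling")  # step 1
    data.steps_applied.append("cleaning")          # step 2
    if impute:
        data.steps_applied.append("imputation")    # step 3 (optional)
    return data


def analyze(genotypes: PreprocessedGenotypes, phenotype: str) -> str:
    """Step 4 (placeholder): combine the shared genotypes with one phenotype."""
    return f"{phenotype} ~ genotypes[{'+'.join(genotypes.steps_applied)}]"


# Preprocess once, then run as many analyses as needed on the same result.
genotypes = preprocess("lifelines_batch1", impute=True)
results = [analyze(genotypes, p) for p in ["bmi", "blood_pressure", "cholesterol"]]
for line in results:
    print(line)
```

The design choice being illustrated is that the preprocessed object is computed once and passed to every analysis, so adding a new phenotype never triggers a rerun of steps 1-3.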

An automated pipeline may be desirable. Steps 1 and 2 could be standardized and thus automated in a pipeline, and step 3 may be added to it.

07/09/2010: An imputation pipeline is desired. A conceptual design is presented below. The pipeline covers:

Setting up parameters for an imputation run
Running the job on a cluster
Administering running and finished jobs and their input and output files (track & trace)
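The track & trace part of the imputation pipeline could be modelled as a small job registry in which each job carries its parameters, its input and output files, and a state history. The sketch below is hypothetical: the states, the allowed transitions, the job ID `imp-001`, the parameter keys and the file names `chr22.gen` / `chr22.imputed.gen` are assumptions for illustration, and no real cluster scheduler is contacted.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class JobState(Enum):
    CREATED = "created"
    SUBMITTED = "submitted"
    RUNNING = "running"
    FINISHED = "finished"
    FAILED = "failed"


# Assumed life cycle of a cluster job: only these transitions are allowed.
TRANSITIONS = {
    JobState.CREATED: {JobState.SUBMITTED},
    JobState.SUBMITTED: {JobState.RUNNING, JobState.FAILED},
    JobState.RUNNING: {JobState.FINISHED, JobState.FAILED},
    JobState.FINISHED: set(),
    JobState.FAILED: set(),
}


@dataclass
class ImputationJob:
    job_id: str
    parameters: Dict[str, str]  # settings chosen when setting up the run
    input_files: List[str]
    output_files: List[str] = field(default_factory=list)
    state: JobState = JobState.CREATED
    history: List[JobState] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.history.append(self.state)  # record the initial state

    def move_to(self, new_state: JobState) -> None:
        """Change state, rejecting transitions outside the assumed life cycle."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.value} -> {new_state.value}")
        self.state = new_state
        self.history.append(new_state)


class JobRegistry:
    """Keeps every job so running and finished ones can be listed later."""

    def __init__(self) -> None:
        self.jobs: Dict[str, ImputationJob] = {}

    def register(self, job: ImputationJob) -> None:
        self.jobs[job.job_id] = job

    def in_state(self, state: JobState) -> List[str]:
        return [j.job_id for j in self.jobs.values() if j.state == state]


# Usage: one (hypothetical) imputation job going through its life cycle.
registry = JobRegistry()
job = ImputationJob("imp-001", {"tool": "IMPUTE", "chromosome": "22"}, ["chr22.gen"])
registry.register(job)
job.move_to(JobState.SUBMITTED)
job.move_to(JobState.RUNNING)
job.output_files.append("chr22.imputed.gen")
job.move_to(JobState.FINISHED)
print(registry.in_state(JobState.FINISHED))
```

Keeping the full state history on each job, rather than only the current state, is what makes later tracing of a run possible.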

Step 4 probably needs a separate pipeline. This would result in a kind of platform (possibly based on Molgenis) in which researchers compose instructions to run an analysis.
Results are returned to the platform, where they can be inspected.
An important ingredient of whole-genome SNP analysis is the command-line program PLINK. More information about it can be found in the attachments below.
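Among other things, PLINK can perform genotype cleaning. As a hedged sketch of how a pipeline might drive it, the helper below only assembles a PLINK 1.9 command line using the standard filtering flags `--bfile`, `--geno`, `--mind`, `--maf`, `--hwe`, `--make-bed` and `--out`. The function name, the fileset prefixes `lifelines_raw` / `lifelines_clean` and the threshold values are illustrative assumptions, not LifeLines settings.

```python
import shlex
from typing import List


def plink_qc_command(bfile: str, out: str, geno: float = 0.05, mind: float = 0.05,
                     maf: float = 0.01, hwe: float = 1e-6) -> List[str]:
    """Build (but do not run) a PLINK 1.9 QC command; thresholds are examples only."""
    return [
        "plink",
        "--bfile", bfile,     # input .bed/.bim/.fam fileset prefix
        "--geno", str(geno),  # remove SNPs with a missing call rate above this
        "--mind", str(mind),  # remove samples with a missing call rate above this
        "--maf", str(maf),    # remove SNPs with minor allele frequency below this
        "--hwe", str(hwe),    # remove SNPs whose Hardy-Weinberg p-value is below this
        "--make-bed",         # write the cleaned data as a new binary fileset
        "--out", out,         # output prefix
    ]


cmd = plink_qc_command("lifelines_raw", "lifelines_clean")
print(shlex.join(cmd))
# On a machine with PLINK installed, subprocess.run(cmd, check=True) would execute it.
```

Returning the argument list instead of a shell string avoids quoting problems when the command is later passed to `subprocess` or written into a cluster job script.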

Last modified on 2010-10-01T23:19:13+02:00

Attachments (2)

PLINKinLifeLines_v0.2.doc (898.0 KB) - added by andredevries.
ImputationManagementTool v0.1.doc (112.0 KB) - added by andredevries. Imputation Submission and Management Tool
