
Placeholder for the genome-wide association study (GWAS) pipeline for the LifeLines project. BBMRI and NBIC may also contribute.

GwasPipeline

developers: AndreDeVries, JorisLops, MorrisSwertz
state: design

In general, genome-wide genotype data (SNPs) go through the following processing steps:

  1. Genotype calling
  2. Cleaning of the genotype data
  3. Imputation (optional)
  4. Analysis
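Step 2 (cleaning) usually applies per-SNP quality filters. The sketch below shows two common ones, call rate and minor allele frequency (MAF). It assumes genotypes are coded as alternate-allele counts (0, 1, 2) per sample, with None for a missing call. The thresholds (call rate of at least 0.95, MAF of at least 0.01) are illustrative assumptions, not values taken from this design, and sample-level filters are left out.

```python
from typing import Optional, Sequence

# Assumed encoding: one entry per sample, alternate-allele count 0/1/2, None = missing call.
Genotypes = Sequence[Optional[int]]


def call_rate(genotypes: Genotypes) -> float:
    """Fraction of samples with a non-missing genotype call."""
    if not genotypes:
        return 0.0
    called = sum(1 for g in genotypes if g is not None)
    return called / len(genotypes)


def minor_allele_frequency(genotypes: Genotypes) -> float:
    """MAF computed from called genotypes only (two alleles per sample)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def passes_qc(genotypes: Genotypes, min_call_rate: float = 0.95, min_maf: float = 0.01) -> bool:
    """Keep a SNP only if it is well called and not (nearly) monomorphic."""
    return call_rate(genotypes) >= min_call_rate and minor_allele_frequency(genotypes) >= min_maf


def clean(snps: dict[str, Genotypes], **thresholds: float) -> dict[str, Genotypes]:
    """Return only the SNPs that pass the per-SNP QC filters (thresholds are hypothetical defaults)."""
    return {snp_id: g for snp_id, g in snps.items() if passes_qc(g, **thresholds)}
```

A SNP with 3 of 10 calls missing has a call rate of 0.7 and is dropped with the default threshold, as is a SNP where every sample carries the same genotype.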

Steps 1-3 can be regarded as preprocessing, while step 4 can be repeated many times on a single output of steps 1-3.

Steps 1 and 2 can be combined in a single software package.
Step 3 is performed using imputation software, such as IMPUTE, Beagle or MaCH.
Step 4 combines the cleaned (and, if applicable, imputed) genotype data with phenotype data in an analysis.
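Combining genotype and phenotype data comes down to matching samples by ID before any test is run. The sketch below assumes both are available as dictionaries keyed by a hypothetical sample ID, pairs them for one SNP, and estimates an additive effect as the ordinary least-squares slope of phenotype on allele count. It is a simplified illustration; a real analysis would add covariates and a significance test, typically through a dedicated tool.

```python
from typing import Optional


def merge_by_sample(genotypes: dict[str, Optional[int]],
                    phenotypes: dict[str, float]) -> list[tuple[int, float]]:
    """Pair each sample's allele count with its phenotype; drop samples missing either."""
    pairs = []
    for sample_id in sorted(genotypes.keys() & phenotypes.keys()):
        g = genotypes[sample_id]
        if g is not None:
            pairs.append((g, phenotypes[sample_id]))
    return pairs


def additive_effect(pairs: list[tuple[int, float]]) -> float:
    """Least-squares slope of phenotype on allele count (additive genetic model)."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_g = sum(g for g, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((g - mean_g) ** 2 for g, _ in pairs)
    if sxx == 0:
        raise ValueError("genotype is monomorphic in the merged samples")
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in pairs)
    return sxy / sxx
```

Samples that appear in only one source, or that have a missing genotype call, are silently excluded, so the number of merged pairs is worth logging in practice.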

An automated pipeline may be desirable. Steps 1 and 2 could be standardized and thus automated in a pipeline, and step 3 could be added to it.
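One way to picture this automation is a chain of stages where imputation is optional and the preprocessed result is computed once, then reused by every analysis in step 4. In this minimal sketch each stage is a hypothetical callable that takes and returns a dataset label; in practice the stages would wrap the calling, cleaning and imputation software.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical stage signature: takes a dataset label/path, returns the label of the new dataset.
Stage = Callable[[str], str]


@dataclass
class PreprocessingPipeline:
    """Chains genotype calling, cleaning and optional imputation (steps 1-3)."""
    call: Stage
    clean: Stage
    impute: Optional[Stage] = None  # step 3 is optional
    _cache: dict[str, str] = field(default_factory=dict, init=False, repr=False)

    def run(self, raw_data: str) -> str:
        # Run steps 1-3 only the first time a given raw dataset is seen.
        if raw_data not in self._cache:
            data = self.clean(self.call(raw_data))
            if self.impute is not None:
                data = self.impute(data)
            self._cache[raw_data] = data
        return self._cache[raw_data]


def run_analyses(pipeline: PreprocessingPipeline, raw_data: str,
                 analyses: list[Callable[[str], str]]) -> list[str]:
    """Step 4: run many analyses on a single preprocessing outcome."""
    prepared = pipeline.run(raw_data)
    return [analysis(prepared) for analysis in analyses]
```

Caching in memory is only for illustration; a production pipeline would persist the preprocessed files and record which raw input produced them.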

07/09/2010: An imputation pipeline is desired. A conceptual design is presented below. The pipeline covers:

  • Setting up parameters for an imputation run
  • Running the job on a cluster
  • Administering running and finished jobs and their input and output files (track & trace)
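The track & trace part can be modelled as a job record holding its parameters, its input and output files, and a state that may only move along allowed transitions. The sketch below is an in-memory illustration under assumed state names and hypothetical parameter keys; actual submission to the cluster scheduler is left out, and a real registry would be persisted, for example in a database.

```python
from dataclasses import dataclass, field
from enum import Enum


class JobState(Enum):
    CREATED = "created"
    SUBMITTED = "submitted"
    RUNNING = "running"
    FINISHED = "finished"
    FAILED = "failed"


# Allowed transitions; anything else is rejected so the job history stays consistent.
TRANSITIONS: dict[JobState, set[JobState]] = {
    JobState.CREATED: {JobState.SUBMITTED},
    JobState.SUBMITTED: {JobState.RUNNING, JobState.FAILED},
    JobState.RUNNING: {JobState.FINISHED, JobState.FAILED},
    JobState.FINISHED: set(),
    JobState.FAILED: set(),
}


@dataclass
class ImputationJob:
    job_id: str
    parameters: dict[str, str]  # hypothetical keys, e.g. chromosome or reference panel
    input_files: list[str]
    output_files: list[str] = field(default_factory=list)
    state: JobState = JobState.CREATED
    history: list[JobState] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.history.append(self.state)

    def advance(self, new_state: JobState) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot go from {self.state.value} to {new_state.value}")
        self.state = new_state
        self.history.append(new_state)

    def finish(self, output_files: list[str]) -> None:
        # Record outputs only once the transition to FINISHED has succeeded.
        self.advance(JobState.FINISHED)
        self.output_files.extend(output_files)


class JobRegistry:
    """In-memory overview of running and finished jobs."""

    def __init__(self) -> None:
        self._jobs: dict[str, ImputationJob] = {}

    def register(self, job: ImputationJob) -> None:
        if job.job_id in self._jobs:
            raise ValueError(f"duplicate job id {job.job_id}")
        self._jobs[job.job_id] = job

    def by_state(self, state: JobState) -> list[str]:
        return sorted(j.job_id for j in self._jobs.values() if j.state is state)
```

Keeping the full state history per job makes it possible to reconstruct afterwards when a job was submitted and why it ended.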

Step 4 probably needs a separate pipeline. This would result in a kind of platform (possibly based on Molgenis) in which researchers define instructions to run an analysis.
Results are returned to the platform, where they can be inspected.
An important tool for whole-genome SNP analysis is the command-line program PLINK. Information about it can be found below.
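Because PLINK is driven from the command line, a pipeline can generate its arguments programmatically. The sketch below builds argument lists for a QC run and a basic association test using standard PLINK 1.x options (`--bfile`, `--mind`, `--geno`, `--maf`, `--hwe`, `--make-bed`, `--pheno`, `--assoc`, `--out`). It assumes a `plink` executable on the PATH; the file prefixes are hypothetical and the threshold values are illustrative assumptions, not choices made in this design.

```python
def plink_qc_command(bfile: str, out: str, mind: float = 0.1, geno: float = 0.1,
                     maf: float = 0.01, hwe: float = 1e-6) -> list[str]:
    """Arguments for a PLINK QC run that writes a cleaned binary fileset."""
    return [
        "plink",
        "--bfile", bfile,      # input .bed/.bim/.fam prefix
        "--mind", str(mind),   # drop samples with a missing-genotype rate above this
        "--geno", str(geno),   # drop SNPs with a missing-genotype rate above this
        "--maf", str(maf),     # drop SNPs with a minor allele frequency below this
        "--hwe", str(hwe),     # drop SNPs failing the Hardy-Weinberg test at this p-value
        "--make-bed",          # write the filtered data as a new binary fileset
        "--out", out,
    ]


def plink_assoc_command(bfile: str, pheno_file: str, out: str) -> list[str]:
    """Arguments for a basic single-SNP association test with an external phenotype file."""
    return ["plink", "--bfile", bfile, "--pheno", pheno_file, "--assoc", "--out", out]
```

To execute a command, pass the list to `subprocess.run(cmd, check=True)`; using a list rather than a shell string avoids quoting problems with file names.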

Last modified on 2010-10-01T23:19:13+02:00

Attachments (2)