Version 1 (modified by trac, 14 years ago)

GenotypePipeline

developers: AndreDeVries, JorisLops, MorrisSwertz
state: design

In general, genome-wide genotype data (SNPs) goes through the following processing steps:

  1. Genotype calling
  2. Cleaning of the genotype data
  3. Imputation (optional)
  4. Analysis

Steps 1-3 can be regarded as preprocessing, while step 4 can be repeated many times on the single output of steps 1-3.

Steps 1 and 2 can be combined in a single software package.
Step 3 is performed using imputation software, such as IMPUTE, Beagle or MaCH.
Step 4 combines the cleaned (and, if applicable, imputed) genotype data with phenotype data in an analysis.
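This page does not fix the criteria used in step 2. Two common SNP-level filters are call rate (fraction of samples with a non-missing genotype) and minor allele frequency (MAF). The sketch below assumes genotypes coded as 0/1/2 copies of one allele with `None` for a missing call; the function names and the thresholds (0.95 call rate, 0.01 MAF) are illustrative assumptions, and sample-level or Hardy-Weinberg filters are left out.

```python
from typing import Optional

# Assumed coding: 0/1/2 copies of the alternative allele, None = missing call.
Genotype = Optional[int]


def call_rate(snp: list[Genotype]) -> float:
    """Fraction of samples with a non-missing call for this SNP."""
    called = sum(1 for g in snp if g is not None)
    return called / len(snp)


def minor_allele_frequency(snp: list[Genotype]) -> float:
    """MAF over the called samples (each sample carries two alleles)."""
    called = [g for g in snp if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def clean_snps(
    genotypes: dict[str, list[Genotype]],
    min_call_rate: float = 0.95,  # illustrative threshold
    min_maf: float = 0.01,        # illustrative threshold
) -> dict[str, list[Genotype]]:
    """Keep only the SNPs that pass both the call-rate and the MAF filter."""
    return {
        snp_id: calls
        for snp_id, calls in genotypes.items()
        if call_rate(calls) >= min_call_rate
        and minor_allele_frequency(calls) >= min_maf
    }


# Toy data for ten samples (hypothetical SNP ids).
example = {
    "rs1": [0, 1, 2, 1, 0, 1, 2, 0, 1, 1],           # complete and common
    "rs2": [0, None, None, 1, 0, 1, 2, 0, 1, 1],     # call rate 0.8
    "rs3": [0] * 10,                                 # monomorphic, MAF 0
}
print(sorted(clean_snps(example)))  # ['rs1']
```

In practice this filtering is usually done by a dedicated tool rather than hand-written code; the sketch only makes the two criteria concrete.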

An automated pipeline may be desirable. Steps 1 and 2 could be standardized and thus also automated in a pipeline, and step 3 may be added to it.
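One way to picture this automation is a chain of steps 1 and 2 with an optional step 3, whose single output is then reused by any number of step-4 analyses. The sketch assumes each step is a function from one dataset to the next; `PreprocessingPipeline` and `run_analyses` are hypothetical names, and the toy lambdas stand in for the real calling, cleaning and imputation tools (e.g. IMPUTE, Beagle or MaCH).

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical step signature: takes the current dataset, returns the next one.
Step = Callable[[dict], dict]


@dataclass
class PreprocessingPipeline:
    """Chains step 1 (calling), step 2 (cleaning) and an optional step 3 (imputation)."""
    call: Step
    clean: Step
    impute: Optional[Step] = None
    log: list[str] = field(default_factory=list)  # records which steps ran

    def run(self, raw: dict) -> dict:
        data = self.call(raw)
        self.log.append("called")
        data = self.clean(data)
        self.log.append("cleaned")
        if self.impute is not None:  # step 3 is optional
            data = self.impute(data)
            self.log.append("imputed")
        return data


def run_analyses(
    preprocessed: dict, analyses: dict[str, Callable[[dict], object]]
) -> dict[str, object]:
    """Step 4: run several analyses on one preprocessed dataset without redoing steps 1-3."""
    return {name: analysis(preprocessed) for name, analysis in analyses.items()}


pipeline = PreprocessingPipeline(
    call=lambda raw: {**raw, "called": True},    # placeholder for genotype calling
    clean=lambda data: {**data, "cleaned": True},  # placeholder for cleaning
)
prepared = pipeline.run({"batch": "B1"})
results = run_analyses(prepared, {"n_keys": len, "is_clean": lambda d: d["cleaned"]})
print(pipeline.log)  # ['called', 'cleaned']
print(results)       # {'n_keys': 3, 'is_clean': True}
```

Keeping step 4 outside the pipeline object mirrors the split proposed here: preprocessing runs once, analyses run as often as needed.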

Step 4 probably has to be a separate pipeline. This would result in a kind of platform (based on Molgenis?) in which researchers specify the analyses they want to run.
Results come back to the platform and can be inspected.
An important ingredient of whole-genome SNP analysis is the command-line program PLINK. Information about it can be found below.
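As an illustration of how a pipeline could drive PLINK, the sketch below only builds the argument lists for a QC run (step 2) and a basic association run (step 4). It assumes a PLINK 1.x executable named `plink` on the PATH and a binary fileset (.bed/.bim/.fam) given by its prefix; the file names and thresholds are hypothetical, and `run_plink` is a hypothetical helper that is defined but not called here.

```python
import shlex
import subprocess


def plink_qc_command(
    bfile: str,
    out: str,
    geno: float = 0.05,  # illustrative thresholds, not taken from this page
    mind: float = 0.05,
    maf: float = 0.01,
    hwe: float = 1e-6,
) -> list[str]:
    """Build a PLINK command that writes a filtered binary fileset with prefix `out`."""
    return [
        "plink", "--bfile", bfile,
        "--geno", str(geno),  # exclude SNPs with a missing-call rate above this
        "--mind", str(mind),  # exclude samples with a missing-call rate above this
        "--maf", str(maf),    # exclude SNPs with a minor allele frequency below this
        "--hwe", str(hwe),    # exclude SNPs whose Hardy-Weinberg p-value is below this
        "--make-bed", "--out", out,
    ]


def plink_assoc_command(bfile: str, pheno: str, out: str) -> list[str]:
    """Build a basic single-SNP association run on a (cleaned) fileset."""
    return ["plink", "--bfile", bfile, "--pheno", pheno, "--assoc", "--out", out]


def run_plink(cmd: list[str]) -> None:
    """Run a PLINK command, raising CalledProcessError if it fails."""
    subprocess.run(cmd, check=True)


print(shlex.join(plink_qc_command("raw_data", "cleaned")))
# plink --bfile raw_data --geno 0.05 --mind 0.05 --maf 0.01 --hwe 1e-06 --make-bed --out cleaned
```

Building the commands as lists (instead of shell strings) keeps them easy to log and inspect on the platform before they are executed.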
