Version 1 (modified by Morris Swertz, 12 years ago)

--

SOP for converting LifeLines Geno Data

This SOP applies to LL3.

Data is released to researchers 'per study' (i.e. per approved research request).

Per study, a subset of the genotypes is created and made available to the researcher:

  • Only the individuals selected for the study are included (e.g. 5000 out of a total of 17000).
  • The identifiers are 're-pseudonymized' from 'marcel identifiers' to 'study identifiers' (so data cannot be matched between studies).
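
The subsetting and re-pseudonymization described above can be sketched in Python for the TPED/TFAM case. This is only an illustration, not the actual LifeLines pipeline: the function names, the `id_map` dictionary (marcel identifier to study identifier) and the example identifiers are hypothetical. It also assumes that the marcel identifier is stored in the individual ID column of the TFAM file, that the family ID can be replaced by the study identifier, and that parents outside the study are set to 0 (unknown).

```python
def subset_tfam(tfam_lines, id_map):
    """Keep only individuals present in id_map and rename them.

    tfam_lines: lines of a TFAM file (family, individual, father, mother, sex, phenotype).
    id_map: hypothetical mapping from marcel identifier to study identifier.
    Returns (kept_indices, new_tfam_lines); kept_indices are 0-based row
    positions in the original TFAM, needed to pick the matching TPED columns.
    """
    kept, out = [], []
    for idx, line in enumerate(tfam_lines):
        fid, iid, father, mother, sex, pheno = line.split()
        if iid not in id_map:
            continue  # individual not selected for this study
        new_id = id_map[iid]
        # Assumption: family ID is replaced by the study ID, and parents
        # that are not part of the study become 0 (unknown), so that no
        # original identifier leaks into the released files.
        new_father = id_map.get(father, "0")
        new_mother = id_map.get(mother, "0")
        out.append(" ".join([new_id, new_id, new_father, new_mother, sex, pheno]))
        kept.append(idx)
    return kept, out


def subset_tped_line(line, kept):
    """Keep the two allele columns of each selected individual in one TPED line."""
    fields = line.split()
    meta, alleles = fields[:4], fields[4:]  # chromosome, SNP id, cM, bp, then alleles
    picked = []
    for idx in kept:
        picked.extend(alleles[2 * idx:2 * idx + 2])
    return " ".join(meta + picked)


# Tiny made-up example: three individuals, two of them selected.
tfam = ["F1 M001 0 0 1 -9", "F2 M002 0 0 2 -9", "F3 M003 0 0 1 -9"]
id_map = {"M001": "S9", "M003": "S4"}
kept, new_tfam = subset_tfam(tfam, id_map)
new_tped_line = subset_tped_line("1 rs1 0 1000 A A A G G G", kept)
```

Because the order of individuals in the TFAM file defines the order of the allele pairs in every TPED line, the same `kept` indices have to be applied to both files.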

Expected outputs

Users expect files in PLINK format:

  • TPED/TFAM genotype files (chosen for internal use because they are easier to produce)
  • BIM/BED/FAM genotype files (with empty phenotype; monomorphic SNPs filtered out)
  • BIM/BED/FAM genotype files split per chromosome
  • MAP/PED dosage files
  • MAP/PED dosage files split per chromosome
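
As a rough illustration of the per-chromosome split, the sketch below groups the lines of a text-based PLINK file by chromosome. It relies on the chromosome code being the first column, which holds for TPED, MAP and BIM files. The function name is hypothetical, and the binary BED file cannot be split this way; in practice a tool such as PLINK would probably be used for that step.

```python
from collections import defaultdict


def split_by_chromosome(lines):
    """Group whitespace-separated lines by their first column (the chromosome code)."""
    per_chr = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        chrom = line.split(None, 1)[0]
        per_chr[chrom].append(line)
    return dict(per_chr)


# Made-up TPED lines for two chromosomes.
tped = ["1 rs1 0 100 A A", "2 rs2 0 200 G G", "1 rs3 0 300 C C"]
by_chr = split_by_chromosome(tped)
```

Each entry of `by_chr` could then be written to its own file, one per chromosome, while the TFAM file is shared unchanged by all of them.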

Available inputs

Complete genotype data is in: /target/gpfs2/lifelines_rp/releases/LL3/