= SOP for converting LifeLines Geno Data =

[[TOC()]]

This SOP applies to LL3.

Data is released to researchers 'per study' (i.e. per approved research request).
 * Per study, a subset of the genotypes is created and made available to the researcher:
   * Only the individuals selected for the study are included (e.g. 5000 out of a total of 17000).
   * The identifiers are 're-pseudonymized' from 'marcel identifiers' to 'study identifiers' (so data cannot be matched between studies).

== Expected outputs ==

The user expects files in PLINK format:
 * TPED/TFAM genotype files (chosen for internal use because they are easier to produce)
 * BIM/BED/FAM genotype files (with empty phenotype, monomorphic markers filtered out)
 * BIM/BED/FAM genotype files split per chromosome
 * MAP/PED '''dosage''' files
 * MAP/PED dosage files '''split per chromosome'''

== Available inputs ==

The complete genotype data is in: /target/gpfs2/lifelines_rp/releases/LL3/
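The per-study subsetting and re-pseudonymization described at the top of this page can be sketched roughly as follows for a TPED/TFAM pair. This is a minimal illustration and not the production procedure: the function names `subset_tfam` and `subset_tped` and the in-memory `id_map` are hypothetical, and it assumes that the family ID is replaced by the study identifier, that parents outside the study are set to `0`, and that the phenotype is written as `-9` (PLINK's default missing-phenotype code).

```python
def subset_tfam(tfam_lines, id_map):
    """Keep only individuals listed in id_map and rename them.

    tfam_lines: lines of a TFAM file (family, individual, father, mother, sex, phenotype).
    id_map: dict from original identifier to study identifier (hypothetical input).
    Returns (new_tfam_lines, kept), where kept holds the 0-based positions of the
    retained individuals in the original TFAM order.
    """
    rows = [line.split() for line in tfam_lines if line.strip()]
    new_lines, kept = [], []
    for index, fields in enumerate(rows):
        if len(fields) != 6:
            raise ValueError("TFAM line %d does not have 6 columns" % (index + 1))
        _family, individual, father, mother, sex, _phenotype = fields
        if individual not in id_map:
            continue  # individual is not part of this study
        study_id = id_map[individual]
        # Assumption: parents that are not in the study are reported as unknown ("0").
        father = id_map.get(father, "0")
        mother = id_map.get(mother, "0")
        # Assumption: family ID becomes the study ID; phenotype is left empty (-9).
        new_lines.append(" ".join([study_id, study_id, father, mother, sex, "-9"]))
        kept.append(index)
    return new_lines, kept


def subset_tped(tped_lines, kept, n_individuals):
    """Keep only the allele pairs of the individuals whose positions are in kept."""
    new_lines = []
    for number, line in enumerate(tped_lines, start=1):
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 4 + 2 * n_individuals:
            raise ValueError("TPED line %d has an unexpected number of columns" % number)
        alleles = fields[4:]
        selected = []
        for i in kept:
            selected.extend(alleles[2 * i:2 * i + 2])  # two alleles per individual
        new_lines.append(" ".join(fields[:4] + selected))
    return new_lines


if __name__ == "__main__":
    # Toy data: three individuals, two markers; only M001 and M003 are in the study.
    tfam = ["F1 M001 0 0 1 -9", "F1 M002 M001 0 2 -9", "F2 M003 0 0 1 -9"]
    tped = ["1 rs1 0 1000 A A A G G G", "1 rs2 0 2000 C T C C T T"]
    id_map = {"M001": "S17", "M003": "S42"}

    new_tfam, kept = subset_tfam(tfam, id_map)
    new_tped = subset_tped(tped, kept, n_individuals=len(tfam))
    print("\n".join(new_tfam))
    print("\n".join(new_tped))
```

Both functions work on lists of lines so that the logic can be checked on small examples; for the real LL3 files the lines would be streamed from disk instead of held in memory. A fresh `id_map` has to be generated for every study, otherwise study identifiers could be matched between studies.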