Available internship projects at the GCC

This list is just a sample of the nearly limitless possibilities. If you have a proposal of your own, feel free to come over and discuss it.

DAS-XGAP connection

Supervisor: Joeri van der Velde, Morris Swertz, Danny Arends

Duration: 3-6 months

The Distributed Annotation System (DAS) defines a communication protocol used to exchange annotations on genomic or protein sequences. It is motivated by the idea that such annotations should not be provided by single centralized databases, but should instead be spread over multiple sites. Data distribution, performed by DAS servers, is separated from visualization, which is done by DAS clients. The advantages of this system are that control over the data is retained by data providers, data is freed from the constraints of specific organisations and the normal issues of release cycles, API updates and data duplication are avoided.

We would like to connect the genomics data in our XGAP database to DAS services. This means setting up a service that provides our data in DAS format, and then coupling DAS clients to this data source. In combination with other annotations available through DAS, we hope to provide useful genetic visualisations for biologists.
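As a rough illustration of what a DAS client does, the sketch below builds a DAS 1.x style "features" request and reads feature coordinates from a DASGFF response. The server root `http://example.org/das`, the source name `xgap` and the sample document are hypothetical placeholders, not an existing service.

```python
from urllib.parse import quote
import xml.etree.ElementTree as ET


def das_features_url(das_root, source, segment, start, stop):
    """Build a 'features' request for one segment range (DAS 1.x style)."""
    base = das_root.rstrip("/")
    return f"{base}/{quote(source)}/features?segment={quote(str(segment))}:{start},{stop}"


def parse_features(dasgff_xml):
    """Return (feature id, start, end) for every FEATURE element in a DASGFF document."""
    root = ET.fromstring(dasgff_xml)
    return [
        (feature.get("id"), int(feature.findtext("START")), int(feature.findtext("END")))
        for feature in root.iter("FEATURE")
    ]


# Hypothetical, heavily trimmed response, used for illustration only.
SAMPLE_RESPONSE = """<DASGFF><GFF><SEGMENT id="1" start="1000" stop="2000">
<FEATURE id="marker_a"><START>1200</START><END>1250</END></FEATURE>
</SEGMENT></GFF></DASGFF>"""

url = das_features_url("http://example.org/das", "xgap", "1", 1000, 2000)
features = parse_features(SAMPLE_RESPONSE)
```

A real service would also have to answer the other DAS commands that clients rely on; this only shows the shape of a single request and response.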

More information: DAS

Comparative analysis of genetic screens for protein aggregation in yeast, worms and flies

Supervisor: Alex Kanterakis, Joeri van der Velde, Morris Swertz, Rainer Breitling

Duration: 6 or 9 months

Toxic protein aggregation is characteristic of many diseases, including Alzheimer’s, Parkinson’s and polyglutamine diseases. Several genetic screens have been performed in small model organisms (yeast, worms (C. elegans) and fruit flies (Drosophila)) for modifier genes that alter the toxicity or aggregation of disease-related proteins (alpha-synuclein and polyglutamine). In this project you will perform the first large-scale comparative analysis of the results of these screens in a standardized way, using bioinformatics and statistics tools. The aim is to detect common pathways and mechanisms, as well as specific differences between the various protein aggregation diseases. Results will also be integrated with data from genome-wide association studies (GWAS) in human populations of, e.g., Parkinson’s disease patients. The project will include the unified annotation of screening results using KEGG, UniGene and Ensembl; Gene Ontology (GO) enrichment analysis using DAVID, GeneTrail and GOstat (R package); and pathway analysis and data integration using custom-developed bioinformatics tools.
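To give an idea of the statistics behind GO enrichment analysis, here is a minimal sketch of a one-sided hypergeometric test for a single GO term. It is a generic textbook formulation, not the exact procedure used by DAVID, GeneTrail or GOstat, and the function and parameter names are made up for illustration.

```python
from math import comb


def enrichment_p_value(population, annotated, hits, annotated_hits):
    """Probability of seeing at least `annotated_hits` annotated genes by chance.

    population     -- number of genes tested in the screen
    annotated      -- genes in the population that carry the GO term
    hits           -- genes identified as modifiers
    annotated_hits -- modifiers that carry the GO term
    """
    upper = min(annotated, hits)
    total = comb(population, hits)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, hits - k)
        for k in range(annotated_hits, upper + 1)
    )
    return tail / total


# Toy numbers: 10 genes screened, 3 carry the term, 3 modifiers found, all 3 annotated.
p = enrichment_p_value(population=10, annotated=3, hits=3, annotated_hits=3)
```

In a real analysis this test is repeated for every GO term, so the resulting p-values need a multiple-testing correction.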


van Ham TJ, Breitling R, Swertz M, Nollen EAA (2009): Neurodegenerative diseases: Lessons from genome-wide screens in small model organisms. EMBO Molecular Medicine in press.

van Ham TJ, Thijssen KL, Breitling R, Hofstra RMW, Plasterk RHA, Nollen EAA (2008): C. elegans model identifies genetic modifiers of α-synuclein inclusion formation during aging. PLoS Genetics 4:e1000027.

3D Visualization of genetical genomic analysis

Supervisor: Danny Arends, Joeri van der Velde, Morris Swertz


O3D is a new initiative to bring 3D-accelerated graphics to the web browser. During the last Google Summer of Code, advances were made by the open source community, and a stable API is now available for initial development. We expect O3D to be a good way of visualizing biological data, especially when looking at or through multiple levels (organism, cell, protein, transcription). In QTL analysis, researchers search for associations between traits and molecular markers; however, after identifying these QTLs there is still no easy way to extract knowledge from them. We think that 3D visualization combined with real-time user interaction would greatly enhance the toolbox of the present-day life science researcher. This project is aimed at developing a pilot plug-in for the MOLGENIS database system that allows the user to navigate through a 3D space in which genetic data is visualized.

Duration: ~ 6-9 Months

Student Sketch:

  • Interested in 3D programming (and some experience)

References / Start:

Estimating optimal genetic marker placement

Supervisor: Danny Arends, R.C. Jansen


Quantitative Trait Locus (QTL) analysis tries to determine the genetic locations governing quantitative traits such as blood pressure. MQM (Multiple QTL Mapping) is an algorithm that enables researchers to estimate the importance of genetic markers in an automated way. This is useful when large datasets need unsupervised clustering or classification of data. However, usually too many markers are available based on prior observations. The MQM algorithm has already been applied successfully in several fields of biology, such as plant breeding and drug discovery. MQM has multiple advantages over other QTL mapping approaches. The main ones are: it can 'compensate' for the effect of other markers, it is quick because the computationally heavy work is programmed in C, and it is available for the R programming environment (+ other mappings). The R/qtl package brings together several QTL mapping strategies and also includes MQM. To increase the detection of QTLs, MQM uses cofactors. However, the placement of these cofactors along the genome is still a matter of debate, and several approaches can be adopted to investigate it. The goal is to investigate multiple placement strategies and their effect on the detection of QTLs.
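One simple placement strategy that could serve as a baseline is to pick the marker closest to every fixed interval along a chromosome. The sketch below illustrates that idea only; it is not how MQM or R/qtl select cofactors, and it assumes marker positions on one chromosome, sorted and given in centimorgans.

```python
def evenly_spaced_cofactors(marker_positions, spacing_cm):
    """Return indices of the markers nearest to targets placed every `spacing_cm`.

    marker_positions -- sorted marker positions (cM) on a single chromosome
    spacing_cm       -- desired distance between neighbouring cofactors
    """
    if not marker_positions or spacing_cm <= 0:
        return []
    chosen = []
    target = marker_positions[0]
    last = marker_positions[-1]
    while target <= last:
        # Index of the marker closest to the current target position.
        nearest = min(
            range(len(marker_positions)),
            key=lambda i: abs(marker_positions[i] - target),
        )
        if nearest not in chosen:
            chosen.append(nearest)
        target += spacing_cm
    return chosen


# Toy map with five markers; a cofactor is requested roughly every 10 cM.
cofactors = evenly_spaced_cofactors([0, 4, 11, 19, 30], 10)
```

Alternative strategies, such as random placement or placement guided by a preliminary scan, could be compared against this baseline on the same data.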

Duration: ~ 6 Months

Student Sketch:

  • Interested in 2D maps, statistics
  • Semi-familiar with R (or C++), linear regression

References / Start:

  • QTL mapping

Smarter software (re)generation and database migration tools

Supervisor: Joeri van der Velde, Joris Lops, Morris Swertz


MOLGENIS makes use of data models in XML format. Any change made to these data model files requires the entire codebase of the application to be regenerated. This can be made smarter by finding the differences between the old and the new model, and only regenerating the code that is affected by the change.
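The comparison step could start out as something like the sketch below, which reports which entities were added, removed or modified between two model files. The element and attribute names (`entity`, `field`, `name`, `type`) and the two sample models are assumptions made for illustration and may not match the actual MOLGENIS schema in every detail.

```python
import xml.etree.ElementTree as ET


def load_entities(model_xml):
    """Map each entity name to a {field name: field type} dictionary."""
    root = ET.fromstring(model_xml)
    return {
        entity.get("name"): {
            field.get("name"): field.get("type", "string")
            for field in entity.iter("field")
        }
        for entity in root.iter("entity")
    }


def changed_entities(old_xml, new_xml):
    """List the entities whose generated code would need to be refreshed."""
    old, new = load_entities(old_xml), load_entities(new_xml)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "modified": sorted(n for n in new.keys() & old.keys() if new[n] != old[n]),
    }


# Hypothetical models: Marker gains a field, Sample is new, Trait is untouched.
OLD_MODEL = """<molgenis name="demo">
<entity name="Marker"><field name="name" type="string"/></entity>
<entity name="Trait"><field name="name" type="string"/></entity>
</molgenis>"""
NEW_MODEL = """<molgenis name="demo">
<entity name="Marker"><field name="name" type="string"/><field name="cm" type="decimal"/></entity>
<entity name="Trait"><field name="name" type="string"/></entity>
<entity name="Sample"><field name="name" type="string"/></entity>
</molgenis>"""

diff = changed_entities(OLD_MODEL, NEW_MODEL)
```

Only the entities listed in the result, plus any code that depends on them, would then need to be regenerated.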

Additionally, changes in a MOLGENIS data model require a new instance of the database (typically MySQL). It would be great to have a tool that migrates the SQL structure so that existing data can be kept. This tool would be smart enough to inform users of changes that involve remapping or adding values etc. to make the migration work. (...TODO)
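As a starting point for such a tool, the sketch below turns the difference between two column definitions of one table into MySQL `ALTER TABLE` statements, plus warnings for the changes that need user attention. The function name, the dictionary-based input and the example table are hypothetical; a real tool would also have to handle keys, constraints and renamed columns.

```python
def migration_statements(table, old_fields, new_fields):
    """Return (sql_statements, warnings) for migrating one table.

    old_fields / new_fields map column names to SQL column types.
    """
    statements, warnings = [], []
    for name in sorted(new_fields.keys() - old_fields.keys()):
        statements.append(f"ALTER TABLE `{table}` ADD COLUMN `{name}` {new_fields[name]};")
        warnings.append(f"{table}.{name}: new column, existing rows need a value")
    for name in sorted(old_fields.keys() - new_fields.keys()):
        statements.append(f"ALTER TABLE `{table}` DROP COLUMN `{name}`;")
        warnings.append(f"{table}.{name}: dropping this column discards its data")
    for name in sorted(old_fields.keys() & new_fields.keys()):
        if old_fields[name] != new_fields[name]:
            statements.append(
                f"ALTER TABLE `{table}` MODIFY COLUMN `{name}` {new_fields[name]};"
            )
            warnings.append(
                f"{table}.{name}: type changes from {old_fields[name]} "
                f"to {new_fields[name]}, values may need remapping"
            )
    return statements, warnings


# Toy example: the Marker table gains a position column.
sql, notes = migration_statements(
    "Marker",
    {"name": "VARCHAR(255)"},
    {"name": "VARCHAR(255)", "cm": "DOUBLE"},
)
```

Keeping the warnings separate from the SQL lets the tool ask the user for decisions before anything is executed.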

Duration: ~ 3-6 Months

Graphical MOLGENIS modelling

Supervisor: Joeri van der Velde, Danny Arends, Morris Swertz


The database structures are modelled in MOLGENIS XML using text (XML) editors. This could instead be done graphically, making it much easier to keep an overview of larger models. Additionally, it would make it possible to model applications at runtime: press test and save to configure your application. (...TODO)

Duration: ~ 3 Months

Last modified on 2011-02-01T13:04:53+01:00