Changes between Initial Version and Version 1 of MolgenisProgress2010_1


Timestamp:
2010-09-29T16:23:08+02:00 (14 years ago)
Author:
Morris Swertz
Comment:

--

  • MolgenisProgress2010_1

    v1 v1  
     1= MOLGENIS progress update Jan - Jun 2010 =
     2
     3== highlights ==
     4
     5* A dedicated MOLGENIS programmer, Robert, funded by [http://www.nbic.nl NBIC] since Feb 2010
     6
     7* MOLGENIS for eXtensible Genotype And Phenotype (XGAP) published ([http://www.ncbi.nlm.nih.gov/pubmed/20214801 Swertz et al, Genome Biology])
     8
     9* MOLGENIS used for noricdb.org ([http://www.ncbi.nlm.nih.gov/pubmed/20664631 Leu et al, Eur J Hum Genetics]) published
     10
     11* MOLGENIS used for multiple GEN2PHEN data model pilots (EBI, FIMM, U Leic, shared programmers), paper in draft
     12
     13* MOLGENIS used for a locus specific database (UMCG), paper in draft
     14
     15* MOLGENIS under development for [HGVBaseG2P] data management (U Leicester, dedicated programmer)
     16
     17* MOLGENIS oral presentations at [BOSC], [HVP], [ISMB], [NBIC] conferences
     18
     19* MOLGENIS uptake: animaldb, eu-panacea/xgap, lifelines/xgap, eu-sysgenet/xgap([http://www.ncbi.nlm.nih.gov/pubmed/20627861 Zouberakis, Database(Oxford)], [http://www.ncbi.nlm.nih.gov/pubmed/20205870 Gruenberger et al, BMC research notes])
     20
     21* Extensive documentation and support infrastructure now online
     22
     23== Progress ==
     24
     25Find the complete list of progress at http://www.molgenis.org/wiki/ChangeLog + filter
     26
     27* Batch upload by name:
     28Enabled users to batch upload 'by name'. This way users don't have to worry about the internal id numbers when using cross references.
     29For example: In the import you can have a column 'Sample_name' and that will automatically resolve the link between your data and this named sample. Status: released.
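The name-based resolution described above can be pictured with a small sketch. This is not MOLGENIS code: the helper `resolve_by_name`, the row layout and the `sample_id` column are assumptions made up for illustration; only the 'Sample_name' column comes from the text.

```python
def resolve_by_name(rows, name_to_id, name_column="Sample_name", id_column="sample_id"):
    """Replace a name column with the internal id it refers to.

    Hypothetical helper: `rows` are dicts read from an import file and
    `name_to_id` maps names already in the database to internal ids.
    """
    resolved = []
    for row in rows:
        name = row[name_column]
        if name not in name_to_id:
            raise KeyError(f"unknown {name_column}: {name!r}")
        # Copy every column except the name, then add the resolved id.
        new_row = {k: v for k, v in row.items() if k != name_column}
        new_row[id_column] = name_to_id[name]
        resolved.append(new_row)
    return resolved


samples = {"s1": 101, "s2": 102}            # name -> internal id (made-up data)
rows = [{"Sample_name": "s2", "value": "0.5"}]
result = resolve_by_name(rows, samples)
```

The user only ever types the sample name; the internal id is looked up during import.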
     30
     31* Batch upload wizard:
     32The user can now choose between 'ignore duplicates' and 'update existing'. In essence this means they can upload dirtier data and let MOLGENIS take care of the cleaning.
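A minimal sketch of how the two options could behave, assuming records are identified by a single key column. The function `batch_import`, the mode names and the in-memory `existing` dict are illustrative assumptions, not the MOLGENIS implementation.

```python
def batch_import(existing, rows, key="name", mode="error"):
    """Import rows into `existing` (a dict of key -> record), in place.

    mode="ignore_duplicates": keep the stored record, skip the new row.
    mode="update_existing":   overwrite stored fields with the new row.
    mode="error":             reject the upload on the first duplicate.
    """
    if mode not in ("error", "ignore_duplicates", "update_existing"):
        raise ValueError(f"unknown mode: {mode!r}")
    counts = {"added": 0, "updated": 0, "skipped": 0}
    for row in rows:
        k = row[key]
        if k not in existing:
            existing[k] = dict(row)
            counts["added"] += 1
        elif mode == "ignore_duplicates":
            counts["skipped"] += 1
        elif mode == "update_existing":
            existing[k].update(row)
            counts["updated"] += 1
        else:
            raise ValueError(f"duplicate {key}: {k!r}")
    return counts


db = {"a": {"name": "a", "x": 1}}
upload = [{"name": "a", "x": 2}, {"name": "b", "x": 3}]
ignored = batch_import(db, upload, mode="ignore_duplicates")
updated = batch_import(db, upload, mode="update_existing")
```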
     33
     34* Compact view:
     35The user can now specify <form compact_view="field1,field2"/>. This hides all other fields from the view when navigating the data, which is particularly useful for entities with many properties. A 'details' button lets the user see all the other fields. Status: released.
     36
     37* REST/JSON services:
     38The MOLGENIS REST service API has been further improved and hardened in real-life use. It has been made to work well together with jQuery, which opens MOLGENIS up as a suitable back-end for scripting programmers. The interface can now return both XML and JSON messages. A WADL description file is autogenerated.
     39
     40* Documentation generator improvement
     41The automatic UML generator (pictures plus text) has been improved. To ease understanding by non-computer scientists, inherited fields are now also shown in the subclass. The diagrams have also been enhanced in colors and layout. Status: released.
     42
     43* Enabled multiple MOLGENIS instances within one project
     44To ease reuse you can now have multiple molgenis generators in one folder. E.g. GenerateAnimaldDB and GenerateLifeLinesDB. This makes it very easy for similar projects to work together while still each producing a unique system for their clients. Status: released.
     45
     46* Simplified Screen and Command framework.
     47Enabled adding or replacing commands in generated screens.
     48Use <form name="x" entity="y" commands="command.CommandClass1,command.CommandClass2"/> in the meta model. Status: released.
     49
     50* Enable multi-column lookup lists
     51When working in larger systems the organisation of data is often nested. For example, Samples are named within Investigations. To keep things clear people want to make sample names unique, but only within an investigation. For the user, this means they must see both Investigation_name and Sample_name to uniquely identify samples. This kind of composite label, xref_labels="field1,field2", is now possible. Status: released.
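The idea of a composite label can be sketched as a lookup keyed on a tuple of fields. The function `build_lookup` and the sample data are assumptions for illustration; only the field names Investigation_name and Sample_name come from the text.

```python
def build_lookup(samples, label_fields=("Investigation_name", "Sample_name")):
    """Map each composite label (a tuple of field values) to a record id."""
    lookup = {}
    for sample in samples:
        label = tuple(sample[field] for field in label_fields)
        if label in lookup:
            raise ValueError(f"label is not unique: {label!r}")
        lookup[label] = sample["id"]
    return lookup


samples = [
    {"id": 1, "Investigation_name": "inv1", "Sample_name": "s1"},
    {"id": 2, "Investigation_name": "inv2", "Sample_name": "s1"},  # same name, other investigation
]
lookup = build_lookup(samples)
sample_id = lookup[("inv2", "s1")]
```

With a single-column label both records would collide on "s1"; the two-column label keeps them apart.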
     52
     53* Improved model validation
     54MOLGENIS now does extensive checking of the model. This has almost eradicated generator errors because the modeler is now kept from making erroneous models, for example, by validating cross references in the model. Status: released.
     55
     56* Improved the decorator framework
     57Decorators enable MOLGENIS designers to change the behavior of the database on add, update, remove and find. Additional logic can now be added before or after these actions. Moreover, this now also works with inheritance. So if, for example, somebody designs a 'Versionable' interface that keeps track of record versions, then all subclasses of this Versionable would also have this feature. Status: released.
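A rough sketch of pre/post logic that is inherited by subclasses. The class names `Database` and `Sample`, the hook registry and the `audit` list are illustrative assumptions and do not reflect the actual MOLGENIS decorator API; only 'Versionable' is taken from the text.

```python
class Versionable:
    """Marker base class for entities that keep track of a version."""
    version = 0


class Sample(Versionable):
    def __init__(self, name):
        self.name = name


def bump_version(entity):
    # Pre-add logic: every stored record gets the next version number.
    entity.version += 1


# Hooks are registered on a base class and found through the class hierarchy.
PRE_ADD_HOOKS = {Versionable: [bump_version]}


class Database:
    def __init__(self):
        self.records = []
        self.audit = []

    def add(self, entity):
        # Walk the method resolution order so hooks of superclasses apply too.
        for cls in type(entity).__mro__:
            for hook in PRE_ADD_HOOKS.get(cls, []):
                hook(entity)
        self.records.append(entity)
        # Post-add logic: here simply an audit trail entry.
        self.audit.append(("added", type(entity).__name__))
        return entity


db = Database()
sample = db.add(Sample("s1"))
```

`Sample` registers no hook of its own, yet it is versioned because it extends `Versionable`.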
     58
     59* Created Excel and zip based file imports
     60Instead of using a directory of CSV files users can now upload an Excel file. Each of the sheets that has a name matching an entity in your MOLGENIS model will be tested for import. The columns matching entity fields will be reported. Based on this report the user can choose to import. Alternatively, users can upload a zip file with csv/tab files. Status: released.
     61
     62* Added automated testing suite
     63Each MOLGENIS now autogenerates an extensive testing suite that is subsequently run using a permutation of values based on the current data model. Both CSV import/export and database add/update/find/remove are extensively tested. This has greatly improved the quality of each MOLGENIS.
     64After each import these tests are now automatically run on the http://www.molgenis.org/apps/hudson server. Status: released.
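As an illustration of testing with generated value combinations, the sketch below feeds every combination of a few field values through a CSV export/import round trip. The field names and values are made up, and a Cartesian product is used here as one possible reading of 'permutation of values'; the generated MOLGENIS test suite itself is not shown.

```python
import csv
import io
import itertools

# Made-up candidate values per field, including awkward ones.
FIELD_VALUES = {
    "name": ["a", "b,with comma"],
    "count": ["0", "-1"],
    "note": ["", "x"],
}


def generate_rows(field_values):
    """Yield one row for every combination of candidate values."""
    names = list(field_values)
    for combo in itertools.product(*(field_values[n] for n in names)):
        yield dict(zip(names, combo))


def csv_round_trip(rows, fieldnames):
    """Write rows to CSV in memory and read them back."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    buffer.seek(0)
    return [dict(row) for row in csv.DictReader(buffer)]


rows = list(generate_rows(FIELD_VALUES))
restored = csv_round_trip(rows, list(FIELD_VALUES))
```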
     65
     66* Running: authorization and authentication
     67MOLGENIS users can now include a MolgenisAuth plugin that allows users to register and log in using name or openid. Users can be organized in groups, and groups can have read/write access control on the level of forms and entities. Finally, a plugin extension point has been added to enable more sophisticated access control rules, for example for row level security. As planned, we will add standardized implementations of this extension point, for example for row level security, in the next 6 months. Status: under development; in beta testing with known partners. We invite anybody interested to contact us to become a beta tester.
     68
     69* Running: MOLGENIS compute integration
     70MOLGENIS users can now add jobs to a job manager to be submitted to a PBS compatible cluster. A typical use case is to run R scripts. These R scripts then use the MOLGENIS R/API to read raw data and write back results. A simple meta model has been added to design input/output parameters so that the scripts can be parameterized via the MOLGENIS user interface. This work has been piloted in the XGAP system. A parser was also made to enable tool model exchange with Galaxy servers; this is, however, not yet fully functional and will be continued as planned in the coming 6 months. Status: under development.
     71
     72* Running: Index and ontology enhanced search
     73Together with the NBIC/biobank programmers we invested in search (driven by biobank use cases). We have piloted a Lucene indexing method to enable 'google' like searches on whole MOLGENIS instances. Next we devoted effort to the development of OntoCAT (ontocat.org), the open source toolbox that enables simple and uniform access to diverse ontological sources. Currently we are in the process of incorporating this tool to enable semantic query expansion, using ontological relationships to rewrite the user's query such that more relevant information is found. This project will be further developed in the next 6 months so it can be publicly released. Status: under development.
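Semantic query expansion can be sketched as follows: a search term is widened with all narrower terms reachable through the ontology. The toy ontology and the function `expand_query` are assumptions for illustration; this does not use the OntoCAT or Lucene APIs.

```python
def expand_query(term, narrower):
    """Return the term plus all transitively narrower terms, sorted.

    `narrower` maps a term to its direct child terms (toy ontology).
    """
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(narrower.get(current, []))
    return sorted(seen)


# Made-up fragment of a disease hierarchy.
ontology = {
    "cancer": ["carcinoma", "sarcoma"],
    "carcinoma": ["adenocarcinoma"],
}
terms = expand_query("cancer", ontology)
```

A search for "cancer" would then also match records annotated only with one of the narrower terms.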
     74
     75* Running: large data matrix storage
     76Large GWAS and QTL studies result in data of incredible sizes, for example 165k individuals * 1M snp markers. We have found that this cannot work on MySQL when storing each data element in the database separately. To overcome this problem without losing the power to integrate with MOLGENIS we have been working on a software module 'MatrixInterface' that allows alternative backend implementations for such large data. A big advantage is that the data is still connected to the rest of MOLGENIS, which enables constraint checking, and that user interface efforts to navigate this data are shared. Next to pilots on Oracle this includes a binary and a text based format, which have been released. The next step is to also support other back-ends like map/ped, bam, trityper, hdf5, hadoop, DAS/ensembl, biomart and so on. Status: under development.
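The idea of one interface with interchangeable storage back-ends can be sketched like this. The class names, the `get` method and the row-major float64 file layout are assumptions for illustration; the actual 'MatrixInterface' API and the released binary format are not described here.

```python
import os
import struct
import tempfile
from abc import ABC, abstractmethod


class MatrixBackend(ABC):
    """Common interface: callers never see how the values are stored."""

    @abstractmethod
    def get(self, row, col):
        """Return the value at (row, col)."""


class MemoryMatrix(MatrixBackend):
    def __init__(self, values):
        self.values = values

    def get(self, row, col):
        return self.values[row][col]


class BinaryFileMatrix(MatrixBackend):
    """Row-major little-endian float64 values in one flat file (assumed layout)."""

    def __init__(self, path, n_cols):
        self.path = path
        self.n_cols = n_cols

    @classmethod
    def write(cls, path, values):
        n_cols = len(values[0])
        with open(path, "wb") as handle:
            for row in values:
                handle.write(struct.pack(f"<{n_cols}d", *row))
        return cls(path, n_cols)

    def get(self, row, col):
        # Seek straight to the element instead of loading the whole matrix.
        offset = (row * self.n_cols + col) * 8
        with open(self.path, "rb") as handle:
            handle.seek(offset)
            return struct.unpack("<d", handle.read(8))[0]


values = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
fd, path = tempfile.mkstemp(suffix=".bin")
os.close(fd)
backends = [MemoryMatrix(values), BinaryFileMatrix.write(path, values)]
cells = [backend.get(1, 2) for backend in backends]
os.remove(path)
```

Code that navigates the matrix only depends on `MatrixBackend`, so further back-ends can be added without changing it.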
     77
     78* Many bugfixes
     79Nesting of submenus, dealing with null fields, dealing with null query rules, date-related issues, automatic defaults for mrefs, many small issues corrected following automatic code quality checks using PMD, extensive work on documentation. Status: released.
     80
     81NB Compared to the roadmap made with NBIC we are a little ahead of schedule (we already started with semantics) thanks to support from GEN2PHEN, EBI and NBIC/biobanking.
     82
     83== Bottlenecks ==
     84
     85* MOLGENIS 1st international workshop or mini-conference
     86Diverse groups have asked for a MOLGENIS hackathon, workshop or course.
     87Would NBIC or others be willing to sponsor and co-organize such an event?
     88
     89* MOLGENIS coordination NBIC
     90We are slightly disappointed that MOLGENIS dissemination is not pushed within NBIC platforms. This is surprising given the international uptake.
     91Would it be an idea to add MOLGENIS to the course rotation analogous to other tools like Galaxy? Or to make it part of BRS project requests which would also make more use of our local strengths. Also the scale of MOLGENIS sponsoring as compared to support for other initiatives is rather modest.
     92
     93== Scientific output ==
     94
     95* '''Papers'''
     96  * XGAP: a uniform and extensible data model and software platform for genotype and phenotype experiments. Swertz et al - Genome Biol. 2010.11(3):R27
     97  * Towards the integration of mouse databases - definition and implementation of solutions to two use-cases in mouse functional genomics. Gruenberger M et al. BMC Res Notes. 2010 Jan 22.3(1):16.
     98* '''Presentations'''
     99  * XGAP - eXtensible software platform for high throughput Genotypes And Phenotypes. Invited oral presentation at EU-SYSGENET cost meeting, Braunschweig, April 8, 2010 (part of Sysgenet publication)
     100  * Towards flexible data infrastructures for genotype and phenotypes: models, generators, formats & tools. Selected for oral presentation at 3rd Human Variome Project meeting, Paris, May 13, 2010
     101  * Chair of the BioAssist study capturing workshop, June 10, Utrecht, 2010.
     102  * User friendly cluster computing for QTL analysis on XGAP. Danny Arends et al. Poster presentation at NBIC Conference – 2010, Lunteren, March 29
     103  * Towards a MOLGENIS based Platform for Proteomics. Poster at NPC-2010, Utrecht, February 16, and NBIC Conference – 2010, Lunteren, March 29
     104  * Towards a MOLGENIS based data analysis framework for proteomics. Oral presentation at NBIC Conference – 2010, Lunteren, March 29
     105* '''Future publications'''
     106  * MOLGENIS: rapid prototyping of biosoftware at the push of a button. Morris Swertz et al. Accepted for Technology Track and poster presentation at ISMB2010
     107  * MOLGENIS: rapid prototyping of biosoftware at the push of a button. Morris Swertz et al. Accepted for oral presentation at BOSC2010
     108  * Towards a federated microarray gene expression repository using MOLGENIS and MAGE-TAB. Alexandros Kanterakis et al. Accepted for oral presentation at BOSC2010.
     109  * Towards a federated microarray gene expression repository using MOLGENIS and MAGE-TAB. Alexandros Kanterakis et al. Accepted for poster presentation at ISMB2010
     110  * Towards a MOLGENIS based computational framework, H. Byelas, M. Swertz, The 19th Euromicro International Conference on Parallel, Distributed and Network-Based Computing, Ayia Napa, Cyprus, from 9th to 11th of February, 2011 (submitted paper)
     111  * SYSGENET paper
     112  * GMOD invited presentation
     113
     114== Collaborations ==
     115
     116* '''National'''
     117  * We continue intense collaborations with NBIC (biobanking, brs, molgenis) and NPC (proteomics), just as in the previous period
     118  * We now collaborate with the LifeLines project, a biobank following 165k individuals for 30 years. MOLGENIS is now piloted for the researchers' data access platform
     119  * We are participating in the BBMRI-NL project, the local biobank infrastructure initiative. MOLGENIS will be an indispensable tool for data management.
     120* '''International'''
     121  * We continued the collaboration with the European Bioinformatics Institute, Hinxton
     122