= MolgenisProcessing =

[[TOC()]]

This is a placeholder for the MOLGENIS processing framework that is now under development. Some notes below.

== Use Cases ==

This is a short note on the use cases we want to support in the MOLGENIS processing extension:
 * Share my pipeline
 * Add a new module
 * List my data items
 * Incorporate Galaxy or GenePattern modules in my pipeline
 * How did I produce this result file?
 * Auto-generate a Sweave document that serves as executable documentation?
 * Export R data annotation packages?

== PBS best practices ==

This is a short note on how we are currently using PBS.

Overview:
 * We use Freemarker to define job templates (a sketch is given at the end of this section)
 * We generate one .sh script for each job
 * We generate one submit.sh for the whole workflow
 * The whole workflow behaves like 'make': it can recover from failure where it left off
 * The whole workflow shares one working directory, with conventions that ease inter-step variable passing

Main ingredients:
 * '''The workflow works on a data blackboard'''
   * The whole workflow uses the same working directory (= blackboard architecture pattern)
   * We use standard file names to reduce inter-step parameter passing (= convention over configuration)
   * Naming convention: {{{<unit>_<step>.<extension>}}}
   * For example, in NGS lane (unit) alignment (step): {{{<lane>_<alignment>.bam}}}
 * '''Make-style submit.sh''' (see the first sketch below)
   * Each line puts one command in the qsub queue
   * We solve dependency ordering using the {{{-W depend=afterok:job1:job2}}} option
   * Proper exit values ensure that dependent jobs are canceled when a step fails
 * '''Recoverable step job.sh''' (see the second sketch below)
   * We generate a .sh file for each job, including standard logging
   * Each script checks whether its output already exists (in that case the step can be skipped)
   * Each script checks that it has produced its output (otherwise it returns an error)
   * N.B. check file existence with an error exit status: {{{test -e FILE || exit 1}}}
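To make the 'make' style concrete, here is a minimal sketch of what a generated submit.sh could look like. Only the {{{qsub -W depend=afterok}}} mechanism comes from the notes above; the script names ({{{lane1_align.sh}}} etc.) are hypothetical examples that follow the naming convention.

{{{
#!/bin/bash
# submit.sh: one qsub line per job; qsub prints the job id,
# which we capture to express dependencies between the steps

job1=$(qsub lane1_align.sh)
job2=$(qsub lane2_align.sh)

# merging may only start after both alignments finished successfully
job3=$(qsub -W depend=afterok:$job1:$job2 sample1_merge.sh)

# the final QC step depends on the merge
qsub -W depend=afterok:$job3 sample1_qc.sh
}}}

Because afterok only fires when the listed jobs exit with status 0, a failing step keeps its dependents from running, which is why the job scripts must exit non-zero on failure.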
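And a minimal sketch of a recoverable job script along these lines. The PBS resource line, the log file {{{workflow.log}}} and the file names are made up for illustration; the skip-if-output-exists and check-output-or-fail pattern is the one described above.

{{{
#!/bin/bash
#PBS -N lane1_align
#PBS -l nodes=1:ppn=4
#PBS -j oe

# work in the shared working directory (the blackboard)
cd $PBS_O_WORKDIR

# make-like recovery: skip this step if its output already exists
if [ -e lane1_align.bam ]; then
  echo "$(date) lane1_align.bam exists, skipping" >> workflow.log
  exit 0
fi

echo "$(date) started alignment of lane1" >> workflow.log

# ... the actual alignment command would go here ...

# return an error if no output was produced, so dependent jobs are canceled
test -e lane1_align.bam || exit 1

echo "$(date) finished alignment of lane1" >> workflow.log
}}}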
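Finally, a rough sketch of how such a job script could be expressed as a Freemarker template. The template variables ({{{unit}}}, {{{step}}}, {{{extension}}}, {{{command}}}) are hypothetical and not the actual MOLGENIS template parameters; the sketch only illustrates the idea of generating one recoverable job.sh per job.

{{{
#!/bin/bash
#PBS -N ${unit}_${step}
#PBS -j oe

cd $PBS_O_WORKDIR

# output name follows the <unit>_<step> convention on the shared blackboard
OUT=${unit}_${step}.${extension}

# make-like recovery: skip if the output is already there
if [ -e "$OUT" ]; then
  exit 0
fi

# the actual command for this step, filled in by the generator
${command}

# return an error if the output was not produced
test -e "$OUT" || exit 1
}}}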