= MolgenisProcessing =

[[TOC()]]

This is a placeholder for the MOLGENIS processing framework that is now under development. Some notes below.

== Use Cases ==

This is a short note on the use cases we want to support in the MOLGENIS processing extension:
 * Share my pipeline
 * Add a new module
 * List my data items
 * Incorporate Galaxy or GenePattern modules in my pipeline
 * How did I produce this result file?
 * Auto-generate a Sweave document that serves as executable documentation?
 * Export R data annotation packages?

== PBS best practices ==

This is a short note on how we are currently using PBS.

Overview:
 * We use Freemarker to define job templates (a sketch is given at the end of this section)
 * We generate one .sh script for each job
 * We generate one submit.sh for the whole workflow
 * The whole workflow behaves like 'make': it can recover from failure where it left off
 * The whole workflow shares one working directory, with conventions that ease inter-step variable passing

Main ingredients:
 * '''The workflow works on a data blackboard'''
   * The whole workflow uses the same working directory (= blackboard architecture pattern)
   * We use standard file names to reduce inter-step parameter passing (= convention over configuration)
   * Naming convention: {{{<unit>_<step>.<extension>}}}
   * For example, in NGS lane (unit) alignment (step): {{{<lane>_<alignment>.bam}}}
 * '''Make-style submit.sh''' (see the first sketch below)
   * Each line puts one command in the qsub queue
   * We solve dependency ordering using the {{{-W depend=afterok:job1:job2}}} option
   * Proper exit values ensure that dependent jobs are canceled when a step fails
 * '''Recoverable step job.sh''' (see the second sketch below)
   * We generate a .sh file for each job, including standard logging
   * Each script checks whether its output already exists (in that case the step can be skipped)
   * Each script checks that it has produced its output (otherwise it returns an error)
   * N.B. check file existence with an error exit status: {{{test -e FILE || exit 1}}}
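To make the 'make' style concrete, here is a minimal sketch of what a generated submit.sh could look like. Only the {{{qsub -W depend=afterok}}} mechanism comes from the notes above; the script names ({{{lane1_align.sh}}} etc.) are hypothetical examples that follow the naming convention.

{{{
#!/bin/bash
# submit.sh: one qsub line per job; qsub prints the job id,
# which we capture to express dependencies between the steps

job1=$(qsub lane1_align.sh)
job2=$(qsub lane2_align.sh)

# merging may only start after both alignments finished successfully
job3=$(qsub -W depend=afterok:$job1:$job2 sample1_merge.sh)

# the final QC step depends on the merge
qsub -W depend=afterok:$job3 sample1_qc.sh
}}}

Because afterok only fires when the listed jobs exit with status 0, a failing step keeps its dependents from running, which is why the job scripts must exit non-zero on failure.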
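And a minimal sketch of a recoverable job script along these lines. The PBS resource line, the log file {{{workflow.log}}} and the file names are made up for illustration; the skip-if-output-exists and check-output-or-fail pattern is the one described above.

{{{
#!/bin/bash
#PBS -N lane1_align
#PBS -l nodes=1:ppn=4
#PBS -j oe

# work in the shared working directory (the blackboard)
cd $PBS_O_WORKDIR

# make-like recovery: skip this step if its output already exists
if [ -e lane1_align.bam ]; then
  echo "$(date) lane1_align.bam exists, skipping" >> workflow.log
  exit 0
fi

echo "$(date) started alignment of lane1" >> workflow.log

# ... the actual alignment command would go here ...

# return an error if no output was produced, so dependent jobs are canceled
test -e lane1_align.bam || exit 1

echo "$(date) finished alignment of lane1" >> workflow.log
}}}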
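Finally, a rough sketch of how such a job script could be expressed as a Freemarker template. The template variables ({{{unit}}}, {{{step}}}, {{{extension}}}, {{{command}}}) are hypothetical and not the actual MOLGENIS template parameters; the sketch only illustrates the idea of generating one recoverable job.sh per job.

{{{
#!/bin/bash
#PBS -N ${unit}_${step}
#PBS -j oe

cd $PBS_O_WORKDIR

# output name follows the <unit>_<step> convention on the shared blackboard
OUT=${unit}_${step}.${extension}

# make-like recovery: skip if the output is already there
if [ -e "$OUT" ]; then
  exit 0
fi

# the actual command for this step, filled in by the generator
${command}

# return an error if the output was not produced
test -e "$OUT" || exit 1
}}}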