MolgenisProcessing
This page is a placeholder for the MOLGENIS processing framework that is now under way. Some notes below.
Use Cases
A short note on the use cases we want to support in the MOLGENIS processing extension:
- Share my pipeline
- Add new module
- List my data items
- Incorporate Galaxy or GenePattern modules in my pipeline
- How did I produce this result file?
- Auto-generate an R Sweave document that serves as executable documentation?
- Export R data annotation packages?
PBS best practices
A short note on how we currently use PBS.
Overview:
- We use Freemarker to define job templates (see the sketch below)
- We generate one <job>.sh script per job
- We generate one submit.sh for the whole workflow
- The whole workflow behaves like 'make': it can recover from a failure where it left off
- The workflow shares one working directory, with conventions to ease inter-step variable passing
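
For illustration, a minimal Freemarker template for one such job script might look like the sketch below. The template variables (jobName, cores, workDir, command) are made up for this example and are not the actual MOLGENIS parameter names:

    #!/bin/bash
    #PBS -N ${jobName}
    #PBS -l nodes=1:ppn=${cores}
    #PBS -o ${workDir}/${jobName}.out
    #PBS -e ${workDir}/${jobName}.err
    # every step runs in the shared working directory (blackboard, see below)
    cd ${workDir}
    ${command}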
Main ingredients:
- The workflow works on a data blackboard
- The whole workflow uses the same working directory (= blackboard architecture pattern)
- We use standard file names to reduce inter-step parameter passing (= convention over configuration)
- Naming convention: <unit of analysis>_<name of step>.<ext>
- For example, in NGS lane (unit) alignment (step):
<flowcell_lane>_<pairedalign>.bam
- Make-style submit.sh (see the first sketch after this list)
- Each line puts one command in the qsub queue
- We solve dependency ordering using the qsub option -W depend=afterok:job1:job2
- Use of proper return values ensures dependent jobs are cancelled on failure
- Recoverable steps job<step>.sh (see the second sketch after this list)
- We generate a .sh file for each job including standard logging
- Each script checks if the output is already there (otherwise it can be skipped)
- Each script checks if it has produced its output (otherwise return error)
- N.B. check file existence using
if ! test -e "$FILE"; then exit 1; fi
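
A minimal sketch of a generated submit.sh, assuming two lane alignments followed by a merge step (the script names are illustrative):

    #!/bin/bash
    # qsub prints the id of each submitted job; we capture it to
    # declare dependencies between steps
    job1=$(qsub lane1_align.sh)
    job2=$(qsub lane2_align.sh)
    # merge starts only after both predecessors exit with status 0;
    # if either fails, PBS cancels the dependent job
    qsub -W depend=afterok:$job1:$job2 merge.sh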
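
And a sketch of a generated job<step>.sh with the recoverability checks above; the aligner invocation is a placeholder, not the real command:

    #!/bin/bash
    # file names follow the <unit of analysis>_<name of step>.<ext> convention
    OUT=flowcell_lane_pairedalign.bam
    LOG=flowcell_lane_pairedalign.log
    # skip the step if a previous run already produced the output ('make' behaviour)
    if test -e "$OUT"; then
        echo "skipping: $OUT already exists" >> "$LOG"
        exit 0
    fi
    my_aligner flowcell_lane_1.fq flowcell_lane_2.fq > "$OUT"   # placeholder command
    # verify the output was produced; a non-zero exit makes PBS cancel dependants
    if ! test -e "$OUT"; then
        echo "error: $OUT was not produced" >> "$LOG"
        exit 1
    fi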