Overview:
 * We use Freemarker to define templates of jobs
 * We generate one <job>.sh for each job
 * We generate one submit.sh for the whole workflow
 * The whole workflow behaves like 'make': after a failure it can resume where it left off
 * All steps share one working directory, with naming conventions that ease passing variables between steps

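The overview generates one <job>.sh per job from Freemarker templates. To show roughly what such a generated script looks like, here is a minimal shell sketch. It rests on an assumption: a shell here-document stands in for the Freemarker template, so this is not the actual generator. The function `generate_job`, the job name and the command are hypothetical.

```shell
#!/bin/sh
# Minimal sketch: write one <job>.sh per job.
# ASSUMPTION: the here-document below stands in for a Freemarker template;
# generate_job and its arguments are hypothetical names.

# generate_job WORKDIR JOB COMMAND
# Writes WORKDIR/JOB.sh: a standard logging header followed by COMMAND,
# executed inside the shared working directory (the blackboard).
generate_job() {
  workdir=$1; job=$2; cmd=$3
  # Unescaped variables are filled in now (generation time);
  # \$(date) is kept literal so it is evaluated when the job runs.
  cat > "$workdir/$job.sh" <<EOF
#!/bin/sh
# Generated job script for '$job' (do not edit by hand)
set -e
cd "$workdir"
echo "## \$(date) ## $job started"
$cmd
echo "## \$(date) ## $job finished"
EOF
  chmod +x "$workdir/$job.sh"
}
```

For example, `generate_job "$PWD" pairedalign 'touch L1_pairedalign.bam'` would write `pairedalign.sh` into the current directory. A real template would add the recoverable-step checks described further down.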
Main ingredients:
 * '''The workflow works on a data blackboard'''
   * The whole workflow uses the same working directory (= blackboard architecture pattern)
   * We use standard file names to reduce parameter passing between steps (= convention over configuration)
   * Naming convention: <unit of analysis>_<name of step>.<ext>
     * For example, for NGS lane (unit) alignment (step): {{{<flowcell_lane>_pairedalign.bam}}}
 * '''Make style submit.sh'''
   * Each line submits one job to the PBS queue with qsub
   * We enforce dependency ordering with the qsub option {{{-W depend=afterok:job1:job2}}}
   * Because each job exits with a non-zero status when it fails, PBS cancels the jobs that depend on it
 * '''Recoverable steps job<step>.sh'''
   * We generate a .sh file for each job, including standard logging
   * Before running, each script checks whether its output already exists (if so, the step is skipped)
   * After running, each script checks whether it has produced its output (if not, it exits with an error)
   * N.B. check file existence with {{{test -e}}} (not {{{test -h}}}, which only tests for a symbolic link), e.g. {{{if ! test -e FILE; then exit 1; fi}}}
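The naming convention under the data blackboard item, <unit of analysis>_<name of step>.<ext>, lets a step work out its input path from the unit name alone. Here is a minimal sketch of that idea. The helper `step_file` and the unit value `flowcellA_L1` are hypothetical.

```shell
#!/bin/sh
# Sketch of the convention <unit of analysis>_<name of step>.<ext>.
# step_file is a hypothetical helper, not part of the original workflow.

# step_file UNIT STEP EXT -> prints UNIT_STEP.EXT
step_file() {
  printf '%s_%s.%s\n' "$1" "$2" "$3"
}
```

A downstream step that only receives the lane could locate the alignment with `bam=$(step_file "$lane" pairedalign bam)`, so no file path has to be passed between jobs.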
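The make-style submit.sh item can be sketched as follows. Each call submits one job with qsub and passes the job IDs of its prerequisites through `-W depend=afterok:...`. It assumes that qsub prints the new job ID on stdout, which is how PBS/Torque behaves. The functions `submit_job` and `run_workflow` and the job scripts (`prepare.sh`, `align.sh`, `qc.sh`, `report.sh`) are hypothetical.

```shell
#!/bin/sh
# Sketch of a make-style submit.sh: one qsub per job, with dependencies
# expressed as -W depend=afterok:<id1>:<id2>.
# ASSUMPTION: qsub prints the new job ID on stdout (PBS/Torque behaviour).
# submit_job, run_workflow and the *.sh job names are hypothetical.

# submit_job SCRIPT [DEPENDENCY_JOB_ID ...] -> prints the new job ID
submit_job() {
  script=$1
  shift
  if [ "$#" -eq 0 ]; then
    qsub "$script"
  else
    # Join the dependency IDs with ':' to get afterok:id1:id2
    deps=$(printf '%s:' "$@")
    deps=${deps%:}
    qsub -W "depend=afterok:$deps" "$script"
  fi
}

# Example workflow: align and qc both wait for prepare; report waits for both.
# If any job exits non-zero, PBS does not run the jobs that depend on it.
run_workflow() {
  prepare=$(submit_job prepare.sh) || return 1
  align=$(submit_job align.sh "$prepare") || return 1
  qc=$(submit_job qc.sh "$prepare") || return 1
  submit_job report.sh "$align" "$qc"
}
```

In a generated submit.sh the calls would most likely be written out line by line, one per job, rather than wrapped in a function.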
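The recoverable-step items describe a make-like pattern: skip when the output is already there, run otherwise, then verify the output and exit non-zero if it is missing. A minimal sketch of that pattern is below. The function name `run_step` and its calling convention are hypothetical.

```shell
#!/bin/sh
# Sketch of a recoverable step inside a generated <job>.sh:
#   1. skip the step if its output already exists (make-like recovery)
#   2. otherwise run the command
#   3. check that the output was produced; if not, return non-zero so the
#      job exits with an error and PBS cancels dependent (afterok) jobs.
# run_step is a hypothetical helper name.

# run_step OUTPUT_FILE COMMAND [ARGS...]
run_step() {
  output=$1
  shift
  if test -e "$output"; then
    echo "## $(date) ## skip: $output already exists"
    return 0
  fi
  echo "## $(date) ## start: $*"
  "$@" || { echo "## $(date) ## command failed: $*" >&2; return 1; }
  if ! test -e "$output"; then
    echo "## $(date) ## error: $output was not produced" >&2
    return 1
  fi
  echo "## $(date) ## done: $output"
}
```

A job script would call it as `run_step L1_pairedalign.bam some_aligner ... || exit 1`, where `some_aligner` is a placeholder. If a command can crash halfway through, it helps to write to a temporary name and `mv` it into place on success. Otherwise a rerun could mistake a partial file for finished output.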