15 | | * We use Freemarker to define templates |
16 | | * We use shell scripts to execute jobs |
17 | | * PBS supports dependencies |
18 | | * Each step should check for completion and have proper return values (so PBS knows and can cancel dependent jobs) |
| 15 | Overview: |
| 16 | * We use Freemarker to define templates of jobs |
| 17 | * We generate one <job>.sh for each job |
| 18 | * We generate one submit.sh for the whole workflow |
| 19 | * The whole workflow behaves like 'make': after a failure it can resume where it left off |
| 20 | * The workflow shares one working directory with conventions to ease inter-step variable passing |
| 21 | |
| 22 | Main ingredients: |
| 23 | * '''The workflow works on a data blackboard''' |
| 24 | * The whole workflow uses the same working directory (= blackboard architecture pattern) |
| 25 | * We use standard file names to reduce inter-step parameter passing (= convention over configuration) |
| 26 | * Naming convention: <unit of analysis>_<name of step>.<ext> |
| 27 | * For example in NGS lane (unit) alignment (step): {{{<flowcell_lane>_<pairedalign>.bam}}} |
| 28 | * '''Make style submit.sh''' |
| 29 | * Each line puts one command in the qsub queue |
| 30 | * We solve dependency ordering using the {{{-W depend=afterok:job1:job2}}} option |
| 31 | * Proper return values (exit codes) ensure that dependent jobs are canceled when a job fails |
| 32 | * '''Recoverable steps job<step>.sh''' |
| 33 | * We generate a .sh file for each job including standard logging |
| 34 | * Each script checks if its output is already there (if so, the step can be skipped) |
| 35 | * Each script checks if it has produced its output (otherwise it returns an error) |
| 36 | * N.B. check file existence using {{{if ! test -e FILE; then exit 1; fi}}} |
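
The ingredients above can be sketched in bash roughly as follows. This is a minimal illustration, not the generated scripts themselves: the function names {{{run_step}}} and {{{submit_after}}}, the {{{QSUB}}} override variable, and the script names in the usage comments are all hypothetical. The only scheduler feature assumed is that {{{qsub}}} prints the new job id and accepts {{{-W depend=afterok:...}}}.

```shell
#!/bin/bash
# Sketch only: run_step, submit_after and QSUB are hypothetical names.

# run_step OUTPUT COMMAND [ARGS...]
# Recoverable step: skip when OUTPUT already exists, otherwise run the
# command and verify that OUTPUT was produced. A non-zero return value
# lets PBS cancel jobs that depend on this one.
run_step() {
    local output=$1
    shift
    if test -e "$output"; then
        echo "skip: $output already exists"
        return 0
    fi
    "$@" || { echo "error: command failed: $*" >&2; return 1; }
    if ! test -e "$output"; then
        echo "error: $output was not produced" >&2
        return 1
    fi
    echo "done: $output"
}

# submit_after SCRIPT [JOBID...]
# Make-style submission: queue SCRIPT so that it starts only after the
# given job ids (if any) have finished successfully. Prints whatever
# qsub prints, which is normally the id of the new job.
submit_after() {
    local script=$1
    shift
    local qsub=${QSUB:-qsub}   # assumption: overridable for dry runs
    if [ $# -eq 0 ]; then
        "$qsub" "$script"
    else
        local deps
        deps=$(IFS=:; echo "$*")   # join job ids with ':'
        "$qsub" -W "depend=afterok:$deps" "$script"
    fi
}

# Possible use inside a job script (file name follows <unit>_<step>.<ext>):
#   run_step FC1_L1_pairedalign.bam ./align.sh FC1_L1 || exit 1
# Possible use inside submit.sh:
#   align=$(submit_after FC1_L1_pairedalign.sh)
#   submit_after FC1_L1_sort.sh "$align"
```

Because the skip test looks only at whether the output file exists, a step that dies halfway and leaves a partial file would be skipped on the next run; writing to a temporary name and renaming on success is one way around that.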