Changes between Initial Version and Version 1 of ComputeRoadmap


Timestamp:
2013-01-09T08:09:33+01:00 (12 years ago)
Author:
Morris Swertz
Comment:

--

  • ComputeRoadmap

    v1 v1  
     1[[TOC()]]
     2= Compute Roadmap =
     3
     4This page describes plans to make Compute even better.
     5
     6NB: the features below have not yet been put on a release schedule!
     7
     8== Finish creation of tests for all important configurations ==
     9* local, pbs, grid
     10* impute, align
     11
     12== Make it easier to insert or remove a step ==
     13* requires separation between 'per protocol input/output parameters' and 'workflow links'
     14* should not be extra work; could do automatic mapping?
     15* should be solved in the workflow.csv (instead of control flow, do data flow, i.e. create list of output-input edges)
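The data flow idea above can be sketched as follows. This is only an illustration, not Compute's actual workflow.csv format: the column names (`from_step`, `output`, `to_step`, `input`) and the step names are assumptions. Each row is one output-input edge, and the step order is derived from the edges instead of being written down as control flow.

```python
import csv
import io
from graphlib import TopologicalSorter

# Hypothetical data-flow worksheet: one output -> input edge per row.
EDGES_CSV = """from_step,output,to_step,input
align,bam,sort,bam
sort,sorted_bam,impute,bam
align,log,report,align_log
impute,vcf,report,vcf
"""


def step_order(edges_csv):
    """Return the steps in an order that respects every output -> input edge."""
    predecessors = {}
    for row in csv.DictReader(io.StringIO(edges_csv)):
        predecessors.setdefault(row["from_step"], set())
        predecessors.setdefault(row["to_step"], set()).add(row["from_step"])
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    return list(TopologicalSorter(predecessors).static_order())


order = step_order(EDGES_CSV)
```

With edges as the only source of ordering, inserting or removing a step means adding or deleting rows; no other step has to be renumbered.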
     16
     17== Auto generate the list of parameters that you need ==
     18* could automatically be filled from templates or, if we have it, the data flow?
     19
     20== Better error reporting on input ==
     21* doesn't list which variable is missing
     22* templates are missing
     23* syntax checking of CSV files
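A minimal sketch of what such input checks could look like. It assumes that templates refer to parameters as `${name}` and that parameter names are on the first row of the CSV file; both are assumptions for illustration, and the function names are hypothetical.

```python
import csv
import io
import re

# Assumed placeholder syntax: ${name}
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def missing_parameters(template_text, parameters_csv):
    """Return placeholder names used in the template but absent from the CSV header."""
    header = next(csv.reader(io.StringIO(parameters_csv)), [])
    known = {name.strip() for name in header}
    used = set(PLACEHOLDER.findall(template_text))
    return sorted(used - known)


def check_csv_shape(parameters_csv):
    """Return one message per row whose column count differs from the header."""
    rows = list(csv.reader(io.StringIO(parameters_csv)))
    if not rows:
        return ["file is empty"]
    width = len(rows[0])
    return [
        f"line {number}: expected {width} columns, found {len(row)}"
        for number, row in enumerate(rows[1:], start=2)
        if len(row) != width
    ]
```

Reporting the names and line numbers, rather than a generic failure, is the point: the user sees which variable is missing and where the CSV is malformed.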
     24
     25== Monitoring of progress ==
     26* how far along is the analysis
     27* whether steps finish successfully or with errors, reported while the run is in progress (could be done with the #end macro)
     28* add a job at the end that creates report of results (for commandline)
     29
     30== Monitoring of success and resource usage ==
     31* have harmonized method to report 'success' or 'error', incl message + runtime
     32* also include statistics such as max, min, etc.
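One possible shape for a harmonized report, as a sketch only: every step is run through the same wrapper, which returns status, message and runtime in one record. The function name and the field names are hypothetical, and taking the last line of stderr as the message is an assumption.

```python
import subprocess
import time


def run_and_report(step, command):
    """Run a command and return a uniform status record for it."""
    start = time.monotonic()
    result = subprocess.run(command, capture_output=True, text=True)
    runtime = time.monotonic() - start
    stderr_lines = result.stderr.strip().splitlines()
    return {
        "step": step,
        "status": "success" if result.returncode == 0 else "error",
        # Assumption: the last stderr line is the most useful message.
        "message": stderr_lines[-1] if stderr_lines else "",
        "runtime_seconds": runtime,
    }
```

Because every record has the same fields, summary values such as the minimum and maximum runtime can be computed over all records of a run.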
     33
     34== Make it transparent/unimportant for the user which backend is actually used ==
     35* system decides where it (can) run: cluster, pbs, grid, local
     36* needs flexible file manager that can 'stage' data for any backend
     37* would like to be able to restart on another backend
     38
     39== Transparent server to stage data ==
     40* to easily move pipeline to other storage.
     41
     42== Restart from specific step ==
     43* remove files so far
     44* reduce the problem by writing to a *tmp file and only 'mv' it if the step is successful
     45* however, this doesn't work if we want to restart
     46* could use folders per step, so you could delete the folders from step onwards
     47* can be solved by good practice
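The two ideas above (write to a tmp file and rename only on success, and one folder per step) can be combined roughly as follows. This is a sketch under assumptions: the function names and the folder layout are hypothetical, not existing Compute behaviour.

```python
import os
import shutil
from pathlib import Path


def run_step(step_dir, filename, produce):
    """Write output under a temporary name; rename it only if produce() succeeds."""
    step_dir = Path(step_dir)
    step_dir.mkdir(parents=True, exist_ok=True)
    final = step_dir / filename
    tmp = step_dir / (filename + ".tmp")
    try:
        with open(tmp, "w") as handle:
            produce(handle)
        os.replace(tmp, final)  # atomic when tmp and final are on the same filesystem
    except Exception:
        tmp.unlink(missing_ok=True)  # never leave a partial result behind
        raise
    return final


def reset_from(run_dir, steps, first_step):
    """Delete the folders of first_step and of every later step."""
    start = steps.index(first_step)
    for name in steps[start:]:
        shutil.rmtree(Path(run_dir) / name, ignore_errors=True)
```

To restart from a given step one would call `reset_from` with the ordered step list and then rerun from that step; the folders of earlier steps stay untouched.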
     48
     49== Store pilot job id in the task in the database ==
     50* I need to know which pilot jobs have died, and which tasks were associated with them
     51* Then I can re-release those tasks so they can be done by another pilot
     52
     53== Like to have a 'heartbeat' for jobs ==
     54* so I can be sure a (pilot) job is still alive
     55* could use a 'background' process that pings back to the database
     56* could also be used for pbs jobs
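A sketch of how the pilot job id and the heartbeat could work together. The table layout, the status values 'running' and 'ready', and the function names are all assumptions made for this example; an in-memory SQLite database stands in for the real database.

```python
import sqlite3


def make_db():
    """Create a toy schema: pilots with a last-seen time, tasks with a pilot id."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE pilot (id TEXT PRIMARY KEY, last_seen REAL)")
    db.execute("CREATE TABLE task (id TEXT PRIMARY KEY, status TEXT, pilot_id TEXT)")
    return db


def heartbeat(db, pilot_id, now):
    """Record that the pilot was alive at time `now` (in seconds)."""
    db.execute(
        "INSERT OR REPLACE INTO pilot (id, last_seen) VALUES (?, ?)",
        (pilot_id, now),
    )


def release_tasks_of_dead_pilots(db, now, timeout):
    """Set running tasks of silent pilots back to 'ready' and return their ids."""
    rows = db.execute(
        "SELECT task.id FROM task JOIN pilot ON task.pilot_id = pilot.id "
        "WHERE task.status = 'running' AND pilot.last_seen < ? "
        "ORDER BY task.id",
        (now - timeout,),
    ).fetchall()
    released = [row[0] for row in rows]
    db.executemany(
        "UPDATE task SET status = 'ready', pilot_id = NULL WHERE id = ?",
        [(task_id,) for task_id in released],
    )
    return released
```

In a real pilot the `heartbeat` call would come from a background process at a fixed interval. The same mechanism would work for pbs jobs, since it only needs a job id and a way to reach the database.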
     57
     58== Add putFile -force to manual ==
     59
     60== Enable easy merging of workflows, merging of parameters ==
     61* Easily combine protocols from multiple workflows
     62* wants less parameter files
     63* meanwhile allow multiple worksheets
     64
     65== Get rid of parameters.csv and instead create worksheet ==
     66* so parameter names on first row
     67* hasOne using naming scheme A_B, means B has one A
     68* conclusion: use multiple headers.
     69* allow -parameters and -tparameter
     70
     71== Cleanup backend specific protocols ==
     72* e.g. 'touch' commands
     73
     74== Visualization framework for analysis runs ==
     75
     76== Re-architect the components ==
     77* one interface, multiple implementations
     78* unit tests
     79
     80== Make submit.sh use a 'lock' file so the first job only ends when everything is submitted ==
     81* Problem: the first job fails quickly while many dependent jobs are not yet submitted, so they get orphaned
     82* with the lock, dependent jobs can always be submitted without this issue (alex feature)
     83* add this to the #end macro so jobs can never finish until the complete workflow is submitted
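The lock file mechanism could work along these lines. The sketch is written in Python for illustration only; in practice this logic would live in submit.sh and in the #end macro. The function names and the polling approach are assumptions.

```python
import time
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def submission_lock(lock_path):
    """Submitter side: keep the lock file in place while jobs are being submitted."""
    lock = Path(lock_path)
    lock.touch()
    try:
        yield lock
    finally:
        # Remove the lock even if submission fails, so jobs do not wait forever.
        lock.unlink(missing_ok=True)


def wait_until_submitted(lock_path, poll_seconds=1.0, timeout=None):
    """Job side: block while the lock file exists; return False on timeout."""
    deadline = None if timeout is None else time.monotonic() + timeout
    while Path(lock_path).exists():
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
    return True
```

The submitter would wrap its submission loop in `with submission_lock(...)`, and each job would call `wait_until_submitted(...)` as its last action, so no job can end, successfully or not, before all dependent jobs exist in the queue.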
     84
     85== Separate protocols from code ==
     86* yes, separate github repo
     87* should make it possible to combine multiple protocol folders and multiple parameter files
     88* should indicate at which compute version it works
     89
     90== Users for runs, priorities? ==
     91* needed if we want 'priority queue' for pilot jobs
     92
     93== Publish! ==
     94
     95== Approach ==
     96Clean start = yes
     97separate development 5 from using 4 (bug fixes) = yes
     98database, commandline or both = both
     99release schedule -> roadmap
     100backwards compatibility = no
     101
     102
     103