= User stories for using R within the Compute module =

[[TOC()]]

As a bioinformatician I want to share my bioinformatics script and scale it up.

Authors: Danny, Morris, Joeri

== User story: adding an R script ==

How to demo: Within the user interface you choose [!ComputeProtocol] and then [New]. There you give the compute protocol a name, set the interpreter to 'R' and paste your R script into a text box. For example:

||Name ||!OneTraitQtlMapping ||
||Interpreter ||R ||
||Script ||{{{genotypes <- getFromDb(genotypes_name)}}}[[BR]]{{{phenotypes <- getFromDb(phenotypes_name)}}}[[BR]]{{{onetraitqtlmapping(traitnames, genotypes, phenotypes)}}}||

Next you define the parameters used in your script, which are 'traitnames', 'genotypes_name', 'phenotypes_name' and 'results_name'. These can be described using !ComputeFeature, where we define them as follows:

||'''Name''' ||'''Datatype''' ||'''Description''' ||
||traitnames ||List(String) ||This is a list of trait names as defined in your data set, for example 'height' or 'probe01' ||
||genotypes_name ||xref(Data) ||This is a lookup in the list of available Data so the user can choose the genotypes (implicitly links to the map) ||
||phenotypes_name ||xref(Data) ||This is a lookup in the list of available Data so the user can choose the phenotypes ||
||results_name ||String ||This is the name of the new data set that will be created to store the results in (or, if it already exists, results will be added to it) ||

== User story: running an R script ==

How to demo: Within the user interface you choose [Run analyses]. Here you can browse the list of available compute protocols. For example, let's choose the protocol we defined in the previous user story. You are then presented with autogenerated input boxes based on the !ComputeFeature definitions; we fill in the parameters:

||traitnames ||height,length ||
||genotypes_name ||geno ||
||phenotypes_name ||classic ||
||results_name ||My phenotype qtl profiles ||

And push [Next]. Automatically a new !ComputeApplication is generated that looks as follows.

script =
{{{
#sources useful scripts to interact with MOLGENIS
source("path/your/molgenis/instance/source.R")

#autogenerated by MOLGENIS based on your parameters
traitnames <- c("height","length")
genotypes_name <- "geno"
phenotypes_name <- "classic"
results_name <- "My phenotype qtl profiles" #spaces will be removed when storing as file...

#this also includes some useful system parameters
dbPath <- "xyz"

#below, the script as we defined it above is copied for full provenance
genotypes <- getFromDb(genotypes_name)
phenotypes <- getFromDb(phenotypes_name)
onetraitqtlmapping(traitnames, genotypes, phenotypes)
}}}

Notice how the parameters are simply assigned at the beginning of your script. This means you can copy-paste this script into your R terminal and it should work there as well as on the cluster (ideal for testing); a minimal sketch of such a local test run is shown below. If you push [Next] again, this job will be sent to the Job manager (user story TODO).
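For illustration only, the block below sketches how you might test such a generated script in a plain R session without a MOLGENIS instance. The {{{getFromDb()}}} and {{{onetraitqtlmapping()}}} definitions here are hypothetical stand-ins, not the real helpers from source.R, whose signatures may differ.

{{{
# Hypothetical local stand-ins for the MOLGENIS helpers normally loaded via source.R.
# These are assumptions for testing; the real getFromDb()/onetraitqtlmapping() may differ.
getFromDb <- function(name) {
  # pretend every data set is a small data.frame keyed by its name
  demo <- list(
    geno    = data.frame(marker1 = c(1, 0, 1), marker2 = c(0, 1, 1)),
    classic = data.frame(height = c(170, 165, 180), length = c(60, 55, 70))
  )
  demo[[name]]
}

onetraitqtlmapping <- function(traitnames, genotypes, phenotypes) {
  # dummy "mapping": just report which traits would be analysed
  lapply(traitnames, function(trait) {
    list(trait = trait, n_markers = ncol(genotypes), n_individuals = nrow(phenotypes))
  })
}

# the autogenerated parameter block, as MOLGENIS would prepend it
traitnames      <- c("height", "length")
genotypes_name  <- "geno"
phenotypes_name <- "classic"

# the protocol body runs unchanged
genotypes  <- getFromDb(genotypes_name)
phenotypes <- getFromDb(phenotypes_name)
onetraitqtlmapping(traitnames, genotypes, phenotypes)
}}}

Because the generated !ComputeApplication only prepends plain assignments, the only difference between this local test and the cluster run is swapping the stubs for the real source.R.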
== User story: using R to define large scale analyses ==

How to demo: Within the user interface you repeat the user story on 'adding an R script'. The goal now is to write a script that uses the MOLGENIS job submission API to automatically calculate qtl profiles for all our phenotypes.

We again choose [New !ComputeProtocol] and add a script that looks like the following:

{{{
#here our script will use the job submission API so you can run many R scripts
for(i in 1:number_of_jobs)
{
  #cut out a subset of the phenotypes
  selectedNames <- sliceList(i, number_of_jobs)

  #define what other 'compute protocol' MOLGENIS should call (obviously the one we defined above)
  submitJob(protocol="OneTraitQtlMapping", genotypes_name=genotypes_name, phenotypes_name=phenotypes_name, traitnames=selectedNames)

  #Note: all parameters you pass to submitJob will be automatically passed to the other protocol
  #result is that a new "computeapplication" will be created programmatically instead of via the UI, to be sent to the cluster
}
}}}

Notice how we replaced "traitnames" with "number_of_jobs" because we just want to run all phenotypes :-) (a hedged sketch of what sliceList could do is given at the bottom of this page). For completeness, the parameters:

||'''Name''' ||'''Datatype''' ||'''Description''' ||
||genotypes_name ||xref(Data) ||This is a lookup in the list of available Data so the user can choose the genotypes (implicitly links to the map) ||
||phenotypes_name ||xref(Data) ||This is a lookup in the list of available Data so the user can choose the phenotypes ||
||results_name ||String ||This is the name of the new data set that will be created to store the results in (or, if it already exists, results will be added to it) ||
||number_of_jobs ||Integer ||This determines how the phenotype data set will be split into jobs; should be 1 or larger. ||

== User story: using the job submission API to generate scripts ==

How to demo: In the above user story we used the job submission API to start pre-existing compute protocols (in this case, !OneTraitQtlMapping). However, we can also create a variant where the job is a script instead of a protocol (aka an 'anonymous protocol'):

{{{
#here our script will use the job submission API so you can run many R scripts
for(i in 1:number_of_jobs)
{
  #cut out a subset of the phenotypes
  selectedNames <- sliceList(i, number_of_jobs)

  #instead of naming an existing compute protocol, we build the script to run ourselves
  script <- paste("genotypes <- getFromDb(genotypes_name)",
                  "phenotypes <- getFromDb(phenotypes_name)",
                  "onetraitqtlmapping(traitnames, genotypes, phenotypes)",
                  sep="\n")
  submitJob(script=script, genotypes_name=genotypes_name, phenotypes_name=phenotypes_name, traitnames=selectedNames)

  #Note: all parameters you pass to submitJob will be automatically passed to the submitted script
  #result is that a new "computeapplication" will be created programmatically instead of via the UI, to be sent to the cluster
}
}}}
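In both scripts above, {{{sliceList(i, number_of_jobs)}}} is used to cut the list of phenotypes into chunks. Its implementation is not part of this page; purely as a sketch, and assuming the full list of trait names is available in a vector (e.g. obtained from the phenotype data set), it could look something like this:

{{{
# Hypothetical sketch of sliceList(); the real helper from source.R may behave differently.
# Assumes 'all_traitnames' holds every trait name, e.g. colnames(getFromDb(phenotypes_name)).
sliceList <- function(i, number_of_jobs) {
  chunks <- split(all_traitnames, cut(seq_along(all_traitnames), number_of_jobs, labels = FALSE))
  chunks[[i]]
}

# quick check in a plain R session
all_traitnames <- paste0("probe", 1:10)
number_of_jobs <- 3
lapply(1:number_of_jobs, function(i) sliceList(i, number_of_jobs))
}}}

With 10 traits and number_of_jobs = 3 this yields three roughly equal, contiguous slices, so each submitted job analyses its own subset of phenotypes.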