= SOP for central deployment of software and reference data sets =

[[TOC()]] 

== Deploying (reference) data sets ==

== Deploying software ==

* Deployment of ''system'' software (packages from the repos of the Linux distros we use) is handled by our sysadmins and is beyond the scope of this SOP.
* Deployment of bioinformatics software is handled by the bioinformaticians from the ''depad'' group.

To join the ''depad'' group, [wiki:Contact contact the helpdesk]. 
The ''depad'' group uses [https://hpcugent.github.io/easybuild/ EasyBuild], which relies on !EasyConfigs as recipes to enforce consistent, reproducible installations.
In a nutshell, an !EasyConfig deployment recipe can handle the following steps:
* (Bootstrap an installation and satisfy dependencies)
* Download the (source) code
* Verify checksums of the downloads
* Unpack the downloads
* Configure the build
* Compile the code
* Run sanity checks to verify the build was OK
* Install the (compiled) code together with its !EasyBuild log
* Generate a module file for use with a module system to configure the environment at runtime.
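
The checksum step in the list above can be sketched in plain Python. This is a minimal illustration, not !EasyBuild's own implementation: the function names {{{sha256_of}}} and {{{verify_checksum}}} are hypothetical, and the use of SHA-256 is an assumption, since this SOP does not state which checksum type is used.

{{{
#!python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as handle:
        # Read in fixed-size chunks so large source tarballs do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path, expected):
    """Return True when the file's digest matches the expected hex string."""
    return sha256_of(path) == expected.lower()
}}}

A download would only be unpacked when {{{verify_checksum}}} returns True for the checksum recorded in the recipe.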

The locations where we store source code, deployed apps, their accompanying module files, etc. are documented in the [wiki:HPC_storage#Software Storage SOP].
For many apps, !EasyConfigs are already available. These files are stored:
* On [https://github.com/hpcugent/easybuild-easyconfigs/tree/master/easybuild/easyconfigs https://github.com/hpcugent/easybuild-easyconfigs/tree/master/easybuild/easyconfigs]

If no !EasyConfig (.eb file) exists on !GitHub or on the cluster (in /apps/sources/EasyBuild/custom), we have to create one ourselves.

Below is an example of a custom !EasyConfig created to deploy the NGS_DNA pipeline on the cluster, followed by an explanation of each part.
{{{
name = 'NGS_DNA'
version = '3.1.2'
namelower = name.lower()
homepage = 'https://github.com/molgenis/molgenis-pipelines'
description = """This distribution already contains several pipelines/protocols/parameter files which you can use 'out-of-the-box' to align and impute your NGS data using MOLGENIS Compute."""

toolchain = {'name': 'dummy', 'version': 'dummy'}
easyblock = 'Tarball'

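# Dependencies: the name and version of each required module.
molname = 'Molgenis-Compute'
molversion = 'v15.04.1-Java-1.7.0_80'
# The dependency is appended to the module version via versionsuffix.
versionsuffix = '-%s-%s' % (molname, molversion)
dependencies = [(molname, molversion)]

# Where to download the release tarball and what it is called.
source_urls = ['http://github.com/molgenis/molgenis-pipelines/releases/download/%s/' % version]
sources = ['%s-%s.tar.gz' % (name, version)]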

sanity_check_paths = {
    'files': ['workflow.csv', 'parameters.csv'],
    'dirs': []
}

moduleclass = 'bio'
}}}
 * '''name''' and '''version''' are self-explanatory. [[BR]]
 * '''homepage''' and '''description''' should be filled in, and reviewed whenever you release a new version of the tool. [[BR]]
 * '''toolchain''' can be left as it is in the example. Specify a real toolchain only when other tools have to be installed first to build this one (see the online !EasyBuild manual). [[BR]]
 * '''easyblock''' is the type of installation: use 'Tarball' for a tar.gz archive and 'Binary' for an executable. All generic easyblocks are listed [https://github.com/hpcugent/easybuild-easyblocks/tree/master/easybuild/easyblocks/generic here]. [[BR]]
 * If there are any '''dependencies''' (in this case Molgenis-Compute), include them in '''versionsuffix''' and in the name of your .eb file. The name will look like this: NGS_DNA-3.1.2-Molgenis-Compute-v15.04.1-Java-1.7.0_80 [[BR]]
 * '''Variables''' save you from typing the same string several times: put %s in the string as a placeholder and list the variables to substitute between () after the % operator. [[BR]]
 * Setting '''sanity_check_paths''' is necessary: it lists the files and directories that must exist after installation, which verifies that the software was unpacked/installed correctly. [[BR]]
 * Module files for all installed software are put automatically in the /apps/modules/all folder, but with '''moduleclass''' (here 'bio') you specify an extra module path. '''N.B.''' The command {{{module avail}}} on the cluster only displays the non-all modules, so specifying a moduleclass is necessary to find your module in the output of {{{module avail}}}. [[BR]]
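
The %s substitution described in the list above can be tried in a plain Python session, because !EasyConfigs use Python syntax. The sketch below reuses the values from the NGS_DNA example; the variable {{{eb_filename}}} and the .eb extension appended to it are illustrative assumptions and not part of the recipe itself.

{{{
#!python
name = 'NGS_DNA'
version = '3.1.2'

# Dependency name and version, as in the example EasyConfig.
molname = 'Molgenis-Compute'
molversion = 'v15.04.1-Java-1.7.0_80'

# %-formatting substitutes the values between () for each %s, left to right.
versionsuffix = '-%s-%s' % (molname, molversion)
source = '%s-%s.tar.gz' % (name, version)

# Hypothetical helper variable: <name>-<version><versionsuffix>.eb
eb_filename = '%s-%s%s.eb' % (name, version, versionsuffix)

print(versionsuffix)  # -Molgenis-Compute-v15.04.1-Java-1.7.0_80
print(source)         # NGS_DNA-3.1.2.tar.gz
print(eb_filename)    # NGS_DNA-3.1.2-Molgenis-Compute-v15.04.1-Java-1.7.0_80.eb
}}}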

To install more complex tools such as R, read the online [https://hpcugent.github.io/easybuild/ documentation] or have a look at our custom .eb files on the cluster.

=== Installing a new tool ===
 * Create your !EasyConfig, YOURFILE.eb, in /apps/sources/EasyBuild/custom/
 * Load the !EasyBuild module: {{{module load EasyBuild}}}
 * Build and install the tool: {{{eb YOURFILE.eb}}}

Before you can use the newly installed software, its module needs to be synced to the storage systems and nodes. First start a shell as the umcg-envsync user:
{{{
sudo -u umcg-envsync bash
}}}
This loads a new environment. Then sync the new module by executing:
{{{
hpc-environment-sync.bash -m <modulename>/<module version>
}}}
To sync new resources, use "-r" instead of "-m". For the full list of options of the sync script, use:
{{{
hpc-environment-sync.bash -h
}}}

=== Running an already existing .eb file ===
 * Run {{{eb YOURFILE.eb}}}
 
== FAQ ==
=== error: The module file is already there ===
 * Rerunning an .eb file for software that is already installed results in this error. To overwrite the module file, run eb with the '''''-f''''' argument.

=== I try to rerun an already installed tool, but my sources are not updated ===
 * !EasyBuild always first checks whether the source code is already present in /apps/sources and only tries to download the file when it is not there. Removing the old source file from /apps/sources before rerunning solves this problem.
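
The lookup order described in the last answer can be sketched as follows. This only illustrates the behaviour described in this FAQ and is not !EasyBuild's actual code: the function {{{locate_source}}} and its {{{download}}} callable are hypothetical names.

{{{
#!python
import os


def locate_source(filename, source_dir, download):
    """Return the path to `filename`, downloading it only when it is not cached.

    `download` is a hypothetical callable that fetches the file to the given path.
    """
    path = os.path.join(source_dir, filename)
    if os.path.exists(path):
        # A cached copy wins: no download happens, even if upstream has changed.
        return path
    download(path)
    return path
}}}

As long as an old copy of the file sits in the source directory, the {{{download}}} callable is never invoked, which is why stale sources have to be removed by hand.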