Version 18 (modified by Pieter Neerincx, 9 years ago)

SOP for central deployment of software and reference data sets

Deploying (reference) data sets

Deploying software

  • Deployment of system software (packages from the repos of the Linux distros we use) is handled by our sys admins and is beyond the scope of this SOP.
  • Deployment of bioinformatics software is handled by the bioinformaticians from the depad group.

If you want to become part of the depad group, contact the helpdesk. The depad group uses EasyBuild, which uses EasyConfigs as recipes to enforce consistent, reproducible installations. In a nutshell, an EasyConfig deployment recipe can handle the following steps:

  • (Bootstrap an installation and satisfy dependencies)
  • Download the (source) code
  • Verify checksums of the downloads
  • Unpack the downloads
  • Configure the build
  • Compile the code
  • Run sanity checks to verify the build was OK
  • Install the (compiled) code together with its EasyBuild log
  • Generate a module file for use with a module system to configure the environment at runtime.
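
To make the "verify checksums" step above concrete, here is a minimal Python sketch of what such a check amounts to. It is not EasyBuild's own implementation; the function names, the choice of SHA-256 and the demo file are assumptions made for illustration only.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_hex):
    """Return True when the file's digest matches the expected value."""
    return sha256_of(path) == expected_hex.lower()


# Demo on a throwaway file that stands in for a downloaded tarball
# (hypothetical name and content, not a real release).
with tempfile.TemporaryDirectory() as tmp:
    fake_tarball = Path(tmp) / "example-1.0.tar.gz"
    fake_tarball.write_bytes(b"not a real tarball")
    good = hashlib.sha256(b"not a real tarball").hexdigest()
    checksum_ok = verify_download(fake_tarball, good)
    checksum_bad = verify_download(fake_tarball, "0" * 64)
```

A mismatch usually means a corrupted or changed download, so a recipe should stop at this point instead of unpacking the file.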

The locations where we store source code, deployed apps, their accompanying module files, etc. are documented in the Storage SOP. For many apps EasyBuild EasyConfigs are already available; these files are stored

If there is no EasyBuild file (.eb) on GitHub and no .eb file on the cluster (/apps/sources/EasyBuild/custom), we have to create one ourselves.

First, an example of a custom EasyBuild file created for deploying the NGS_DNA pipeline on the cluster. The steps in the script are explained below the code.

name = 'NGS_DNA'
version = '3.1.2'
namelower = name.lower()
homepage = 'https://github.com/molgenis/molgenis-pipelines'
description = """This distribution already contains several pipelines/protocols/parameter files which you can use 'out-of-the-box' to align and impute your NGS data using MOLGENIS Compute."""

toolchain = {'name': 'dummy', 'version': 'dummy'}
easyblock = 'Tarball'

#dependencies
molname = 'Molgenis-Compute'
molversion = 'v15.04.1-Java-1.7.0_80'
versionsuffix = '-%s-%s' % (molname,molversion)
dependencies = [(molname,molversion)]

source_urls = [('http://github.com/molgenis/molgenis-pipelines/releases/download/%s/' % (version))]
sources = [('%s-%s.tar.gz' % (name, version))]

sanity_check_paths = {
    'files': ['workflow.csv', 'parameters.csv'],
    'dirs': []
}

moduleclass = 'bio'

  • name and version are self-explanatory.
  • homepage and description are recommended, especially with future releases of the tool in mind.
  • toolchain can be left as it is in the example. Only when multiple tools need to be installed before this one should you use a toolchain (see the online EasyBuild manual).
  • easyblock is the type of the source: 'Tarball' for a tar.gz archive, 'Binary' for an executable. All the different easyblocks are described in the EasyBuild documentation.
  • If there are any dependencies (in this case Molgenis-Compute), you put them in the name of your eb file (the name will look like this: NGS_DNA-3.1.2-Molgenis-Compute-v15.04.1-Java-1.7.0_80).
  • To avoid typing the same string five times, use variables: put %s in the string and list the variable names between () after it.
  • One necessary step is to set sanity_check_paths; this checks whether the files were unpacked/installed correctly.
  • All installed module files are put automatically in the /apps/modules/all folder, but with moduleclass you can specify an extra module path. N.B. the command module avail on the cluster only displays the non-all modules, so specifying a moduleclass is necessary to find your module in the output of module avail.
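
The %s substitution and the file naming described in the list above can be checked in a plain Python session, since EasyConfig files use Python syntax. The sketch below reuses the values from the example recipe; the variable names source_file and easyconfig_file are introduced here for illustration and are not EasyBuild parameters.

```python
# Values copied from the example recipe.
name = 'NGS_DNA'
version = '3.1.2'
molname = 'Molgenis-Compute'
molversion = 'v15.04.1-Java-1.7.0_80'

# Each %s is replaced, in order, by the values between the parentheses.
versionsuffix = '-%s-%s' % (molname, molversion)

# Illustrative helper variables (hypothetical names, not EasyBuild parameters).
source_file = '%s-%s.tar.gz' % (name, version)
easyconfig_file = '%s-%s%s.eb' % (name, version, versionsuffix)

print(versionsuffix)
print(source_file)
print(easyconfig_file)
```

The last line printed is the eb file name given in the list above, with the .eb extension appended.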

For installing more advanced tools like R, please read the online documentation or have a look at our custom scripts on the cluster.

Installing a new tool

  • Create YOURFILE.eb file in /apps/sources/EasyBuild/custom/
  • module load EasyBuild
  • eb YOURFILE.eb

Before you can use the newly installed tool, its module needs to be synced to the storage and nodes:

sudo -u umcg-envsync bash 

A new environment will be loaded. Afterwards, sync the new module by executing:

hpc-environment-sync.bash -m <modulename>/<module version>

To sync new resources use "-r" instead of "-m". To see the full list of options in the sync script use:

hpc-environment-sync.bash -h
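
As a small illustration of how the <modulename>/<module version> argument can be composed from the values in an EasyConfig, here is a hedged Python sketch. The helper module_sync_command is hypothetical and only builds the command line; it does not run anything. It also assumes that the module version is the recipe's version followed by its versionsuffix, which should be verified against the actual module tree.

```python
import shlex


def module_sync_command(name, version, versionsuffix=""):
    """Build the argument list for syncing one module (hypothetical helper)."""
    # Assumption: module version = version + versionsuffix.
    module = "%s/%s%s" % (name, version, versionsuffix)
    return ["hpc-environment-sync.bash", "-m", module]


argv = module_sync_command(
    "NGS_DNA", "3.1.2", "-Molgenis-Compute-v15.04.1-Java-1.7.0_80"
)
# Quote each part so the line can be pasted into a shell safely.
command_line = " ".join(shlex.quote(part) for part in argv)
print(command_line)
```

Printing the command instead of executing it keeps the sketch safe to try outside the umcg-envsync environment.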

Running an already existing .eb file

  • eb YOURFILE.eb

FAQ

Error: the module file is already there

  • Rerunning an .eb file that is already installed results in this error. To overwrite the module file, run eb with the -f argument.

I try to rerun an already installed tool, but my sources are not updated

  • EasyBuild always checks first whether the source code is already present in /apps/sources; only when it is not there will it try to download the file. Removing the old source code solves this problem.
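
To locate a stale cached source before removing it, a search along the following lines can help. This is a minimal sketch and not part of EasyBuild: find_cached_source is a hypothetical helper, and the demo runs on a temporary directory whose layout is invented, because the exact directory structure below /apps/sources is not described here.

```python
import tempfile
from pathlib import Path


def find_cached_source(source_root, filename):
    """Return the first file named `filename` below `source_root`, or None."""
    for candidate in sorted(Path(source_root).rglob(filename)):
        if candidate.is_file():
            return candidate
    return None


# Demo on a temporary directory standing in for the source tree.
with tempfile.TemporaryDirectory() as tmp:
    # Invented layout, for illustration only.
    cached = Path(tmp) / "n" / "NGS_DNA" / "NGS_DNA-3.1.2.tar.gz"
    cached.parent.mkdir(parents=True)
    cached.write_bytes(b"stale copy")

    found_before = find_cached_source(tmp, "NGS_DNA-3.1.2.tar.gz")
    found_before.unlink()  # remove the stale copy
    found_after = find_cached_source(tmp, "NGS_DNA-3.1.2.tar.gz")
```

Once the cached file is gone, rerunning the .eb file should trigger a fresh download, as described in the answer above.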