= SOP for central deployment of software and reference data sets =

[[TOC()]]

== Deploying (reference) data sets ==

==== 0. Make sure perms are correct ====

Set the umask **before** you start, so that everything you create is group-writable:
{{{
umask 0002
}}}

==== 1. Where to put reference data ====

Reference data sets available to all (hence not group-specific data) can be deployed ''as-is'' in:
{{{
/apps/data/${provider}/${data_set}/${version}/
}}}
When reference data must be modified, for example because it must be indexed or reformatted for use with a specific version of software, you must put the derived version in a sub dir to indicate it is not the original. When it was modified for a specific version of an app you could for example create additional sub dirs like this:
{{{
/apps/data/${provider}/${data_set}/${version}/${app}/${app_version}/
}}}
Always add a {{{/apps/data/${provider}/${data_set}/${version}/README}}} with at least details on:
 * What the source location of the data was.
 * When it was downloaded.
 * If a derived flavor was created: how the data was modified (link to eLabjournal and/or code in our !GitHub repos) and for what purpose.

{{{#!comment
TODO: add example for GRCh38
/apps/data/GRC/GRCh/38/
 * data downloaded "as is"; at most unpacked
/apps/data/GRC/GRCh/38/BWA/0.7.12-goolf-1.7.20/
 * a set of relative symlinks to the reference sequences: ../../unpacked reference FASTA seqs
 * BWA indices for this reference
}}}

==== 2. Syncing deployed reference data to nodes ==== #SyncRefData

Before you can use reference data on cluster nodes it needs to be synced to various places.
 * Switch to the ''envsync'' user:
 {{{
 $> sudo -u umcg-envsync bash
 }}}
 * Now sync the reference data by specifying the path to the data set relative to /apps/data/ (or specify the complete absolute path if you like to type). The sync will work recursively.
 {{{
 $> hpc-environment-sync.bash -r ReferenceData/
 }}}
 or
 {{{
 $> hpc-environment-sync.bash -r /apps/data/ReferenceData/
 }}}
 * For a full list of options use the commandline help:
 {{{
 $> hpc-environment-sync.bash -h
 }}}

== Deploying software ==

Responsibility for deploying and maintaining software is distributed over two teams:
 * Deployment of ''system'' software (packages from the repos of the Linux distros we use) is handled by our sys admins and beyond the scope of this SOP.
 * Deployment of bioinformatics software is handled by the bioinformaticians from the ''depad'' group.[[BR]]
   If you want to join the ''depad'' group, please [wiki:Contact contact the helpdesk].

The depad group uses [https://hpcugent.github.io/easybuild/ EasyBuild], which in turn uses !EasyConfigs as recipes to enforce consistent, reproducible installations. In a nutshell an !EasyConfig deployment recipe can handle the following steps:
 1. (Bootstrap an installation and satisfy dependencies)
 1. Download the (source) code
 1. Verify checksums of the downloads
 1. Unpack the downloads
 1. Configure the build
 1. Compile the code
 1. Run sanity checks to verify the build was OK
 1. Install the (compiled) code together with its !EasyBuild log
 1. Generate a module file for use with a module system to configure the environment at runtime.

The locations where we store source code, deployed apps, their accompanying module files, etc. are documented in the [wiki:HPC_storage#Software Storage SOP].

We use the [https://www.tacc.utexas.edu/research-development/tacc-projects/lmod Lua based module system (Lmod)] to make software transparently available on all machines at runtime.

Details on how to install and configure !EasyBuild, Lmod and our environment sync script on a new cluster can be found in our [https://github.com/molgenis/depad-utils/blob/master/hpc-2.x/README.md depad-utils GitHub repo].

==== Locations of !EasyConfig files to deploy software with !EasyBuild ====

 1. !EasyBuild comes with !EasyConfigs for many of our apps of interest ''out of the box''. These files are stored:
   * On our machines in a sub sub sub directory of where !EasyBuild was installed.[[BR]]
     The easiest way to find this directory is to load !EasyBuild and search for an app with {{{-S}}} like this:
     {{{
     $> module load EasyBuild
     $> module list
     Currently Loaded Modules:
       1) EasyBuild/2.2.0
     $> eb -S GATK
     == temporary log file in case of crash /tmp/eb-JHpvCj/easybuild-tR8MKl.log
     == Searching (case-insensitive) for 'GATK' in /apps/software/EasyBuild/2.2.0/lib/python2.6/site-packages/easybuild_easyconfigs-2.2.0-py2.6.egg/easybuild/easyconfigs
     CFGS1=/apps/software/EasyBuild/2.2.0/lib/python2.6/site-packages/easybuild_easyconfigs-2.2.0-py2.6.egg/easybuild/easyconfigs/g/GATK
      * $CFGS1/GATK-1.0.5083.eb
      * $CFGS1/GATK-2.5-2-Java-1.7.0_10.eb
      * $CFGS1/GATK-2.6-5-Java-1.7.0_10.eb
      * $CFGS1/GATK-2.7-4-Java-1.7.0_10.eb
      * $CFGS1/GATK-2.7-4.eb
      * $CFGS1/GATK-2.8-1-Java-1.7.0_10.eb
      * $CFGS1/GATK-3.0-0-Java-1.7.0_10.eb
      * $CFGS1/GATK-3.3-0-Java-1.7.0_21.eb
     == temporary log file(s) /tmp/eb-JHpvCj/easybuild-tR8MKl.log* have been removed.
     == temporary directory /tmp/eb-JHpvCj has been removed.
     }}}
   * On [https://github.com/hpcugent/easybuild-easyconfigs/tree/master/easybuild/easyconfigs https://github.com/hpcugent/easybuild-easyconfigs/tree/master/easybuild/easyconfigs] for the latest and greatest from the source.
 1. Custom !EasyConfigs not yet merged via a pull request into the !GitHub repos from [http://www.ugent.be/hpc/en HPC Ugent] are stored in a fork from this repo at:
 {{{
 https://github.com/molgenis/easybuild-easyconfigs.git
 }}}

If it ain't on !GitHub, your !EasyConfig does not exist, and software centrally deployed in /apps without an !EasyConfig on !GitHub is subject to removal without warning!

We use the standard way of working with !GitHub repos, so:
 1. Create an online fork @ !GitHub
 2. Clone your own online !GitHub fork in for example your home dir on the cluster
 3. Add/modify !EasyConfigs, commit and push to your own fork. Never push to the blessed master @ https://github.com/molgenis/easybuild-easyconfigs.git!
 4. Create a pull request to get your changes into the blessed master. Creating a pull request to get your changes directly into the main repo from UGent is of course also an option, but may require more patience to have your pull request processed.
 5. We do plan to create pull requests from github.com/molgenis to the source @ github.com/hpcugent ...[[BR]]

In pseudo code:
{{{
#
# After creating your online fork @ GitHub, log in on a cluster
# and clone your GitHub repo supplemented with our custom EasyConfigs.
#
$> cd ${HOME}
$> mkdir git
$> cd git
$> git clone https://github.com/${your_github_account}/easybuild-easyconfigs.git
$> cd easybuild-easyconfigs
$> git remote add blessed https://github.com/molgenis/easybuild-easyconfigs.git
#
# Prevent accidental pushes to the blessed repo by pointing its push URL to a non-existing location.
#
$> git remote set-url --push blessed non.existing.domain
#
# Now you can deploy a tool; for example version v16.11.1 of our cluster-utils with EasyBuild
#
$> ml EasyBuild
$> eb --robot \
      --robot-paths="${HOME}/git/easybuild-easyconfigs/easybuild/easyconfigs/:" \
      --software=cluster-utils,v16.11.1
#
# Software which needs to be compiled and requires a toolchain can be deployed with:
#
$> ml EasyBuild
$> eb --robot \
      --robot-paths="${HOME}/git/easybuild-easyconfigs/easybuild/easyconfigs/:" \
      --software=${name},${version} \
      --toolchain=foss,2015b
}}}
Note that the **foss 2015b** toolchain is still our default. We will most likely skip 2016a and move to 2016b during scheduled maintenance of summer 2017.

==== Creating a new !EasyConfig ====

If there is no existing !EasyConfig, we have to create one ourselves. Below is an example of a custom !EasyConfig created for deploying our NGS_DNA pipeline; each part of the recipe is explained after the code.
{{{
name = 'NGS_DNA'
version = '3.1.2'
namelower = name.lower()

homepage = 'https://github.com/molgenis/molgenis-pipelines'
description = """This distribution already contains several pipelines/protocols/parameter files which you can use 'out-of-the-box' to align and impute your NGS data using MOLGENIS Compute."""

toolchain = {'name': 'dummy', 'version': 'dummy'}
easyblock = 'Tarball'

#dependencies
molname = 'Molgenis-Compute'
molversion = 'v15.04.1-Java-1.7.0_80'
versionsuffix = '-%s-%s' % (molname,molversion)
dependencies = [(molname,molversion)]

source_urls = [('http://github.com/molgenis/molgenis-pipelines/releases/download/%s/' % (version))]
sources = [('%s-%s.tar.gz' % (name, version))]

sanity_check_paths = {
    'files': ['workflow.csv', 'parameters.csv'],
    'dirs': []
}

moduleclass = 'bio'
}}}
 * '''name''' and '''version''' are pretty clear. [[BR]]
 * '''homepage''' and '''description''' should be filled in (!EasyBuild treats them as mandatory parameters) and kept up to date when releasing a new version of the tool. [[BR]]
 * '''toolchain''' can be left as it is in this example (the ''dummy'' toolchain). A real toolchain is only required when multiple tools need to be installed as dependencies for compiling or installing this one. See the [http://easybuild.readthedocs.org/en/latest/index.html !EasyBuild docs] for details. [[BR]]
 * '''easyblock''' determines how the software is installed, for example {{{Tarball}}} for a tar.gz archive and {{{Binary}}} for an executable. All the different generic easyblocks are described [https://github.com/hpcugent/easybuild-easyblocks/tree/master/easybuild/easyblocks/generic here]. [[BR]]
 * If there are any language/framework '''dependencies''' (in this case Molgenis-Compute), you put them in the name of your eb file using the versionsuffix. The name may for example look like this: NGS_DNA-3.1.2-Molgenis-Compute-v15.04.1-Java-1.7.0_80.eb [[BR]]
 * '''Using variables''' instead of typing the same string repeatedly is done with Python string formatting: put {{{%s}}} in the string where the value must go, followed by {{{%}}} and the name(s) of the variable(s) between parentheses. [[BR]]
 * One necessary step is to set '''sanity_check_paths'''; this is used to check whether the listed files and dirs are present after the install and hence whether the software was unpacked/installed correctly.
 * Module files for all installed apps are automatically put in the /apps/modules/all folder, but with '''moduleclass''' you can specify an extra module path. '''N.B.''' the command {{{module avail}}} on the cluster only displays the non-all modules, so specifying a moduleclass is necessary to find your module back in {{{module avail}}}. [[BR]]

For more complicated installations like for example for ''R'', please read the [https://hpcugent.github.io/easybuild/ documentation] online and have a look at existing !EasyConfigs.

==== 0. Make sure perms are correct ====

Set the umask **before** you start:
{{{
umask 0002
}}}

==== 1. Find existing !EasyConfig or create a new one ====

 * To search for existing !EasyConfigs in both the default robot search path and your fork of our custom !EasyConfigs git repo cloned into your home dir:
 {{{
 $> module load EasyBuild
 $> module list
 $> eb --robot-paths="${HOME}/git/easybuild-easyconfigs/easybuild/easyconfigs/:" -S MySearchTerm
 }}}
 * When no suitable !EasyConfig is present, create your own:
 {{{
 $> cd ${HOME}/git/easybuild-easyconfigs/easybuild/easyconfigs/
 $> touch m/MyEasyConfig-1.2.3.eb
 }}}
 Make sure to name the *.eb file exactly the same as the name of the eventually installed ''module'' and its ''version''. When a toolchain is used this must be included in the file name. E.g.:
 {{{
 $> touch m/MyEasyConfig-1.2.3-foss-2015b.eb
 }}}

==== 2. Installing the software ====

Enable the ''robot'' option for automatic dependency resolution and **prepend** our custom !EasyConfigs repo to the search path to make !EasyBuild search for deps in our custom !EasyConfigs repo first.
{{{
$> module load EasyBuild
$> module list
$> eb --robot \
      --robot-paths="${HOME}/git/easybuild-easyconfigs/easybuild/easyconfigs/:" \
      --software=MyEasyConfig,1.2.3 \
      --toolchain=foss,2015b
}}}

==== 3. Specifying the default version of an app (optional) ====

When you load an app into your environment with the {{{module load}}} command without specifying explicitly which version you want to use, Lmod will load the highest version number. This is usually fine, but if you installed a new version that is not yet well tested, you may want to explicitly configure an older version as the default. This can simply be accomplished by creating a **relative** symlink named default. Our module files are located at {{{/apps/modules/}}}. For example let's look at our NGS_DNA analysis pipeline. It is part of the ''bio'' collection of apps, so the module files are in:
{{{
$> ls -ahl /apps/modules/bio/NGS_DNA/
total 20K
drwxrwsr-x  2 prefix-someuser umcg-depad 4.0K Nov 12 15:28 .
drwxrwsr-x 64 prefix-someuser umcg-depad 4.0K Nov 18 17:43 ..
lrwxrwxrwx  1 prefix-someuser umcg-depad   71 Aug 25 10:52 3.0.2-Molgenis-Compute-v15.04.1-Java-1.7.0_80 -> /apps/modules/all/NGS_DNA/3.0.2-Molgenis-Compute-v15.04.1-Java-1.7.0_80
lrwxrwxrwx  1 prefix-someuser umcg-depad   71 Nov  4 09:04 3.1.2-Molgenis-Compute-v15.04.1-Java-1.7.0_80 -> /apps/modules/all/NGS_DNA/3.1.2-Molgenis-Compute-v15.04.1-Java-1.7.0_80
lrwxrwxrwx  1 prefix-someuser umcg-depad   71 Nov 12 15:28 3.2.1-Molgenis-Compute-v15.11.1-Java-1.8.0_45 -> /apps/modules/all/NGS_DNA/3.2.1-Molgenis-Compute-v15.11.1-Java-1.8.0_45
lrwxrwxrwx  1 prefix-someuser umcg-depad   45 Sep 24 16:22 default -> 3.1.2-Molgenis-Compute-v15.04.1-Java-1.7.0_80
}}}
Note that all module files for all apps are in {{{/apps/modules/all/}}} and we only see symlinks in {{{/apps/modules/bio/}}}. Version 3.2.1 is the latest, but 3.1.2 has been designated as default using a symlink. The symlink must be a relative one pointing to a file or another symlink in the same directory. You may have to [wiki:HPC_deploy#UpdateCacheAndSync update the Lmod caches] before the change in default version takes effect.
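
A minimal sketch of how such a relative ''default'' symlink could be created is given here. It is an illustration and not part of our tooling: it uses a temporary directory as a hypothetical stand-in for {{{/apps/modules/bio/NGS_DNA/}}} and empty files instead of real module files, so it can be tried anywhere without touching the real module tree. The variable names {{{module_dir}}} and {{{default_target}}} are made up for this example.

```shell
# Hypothetical stand-in for /apps/modules/bio/NGS_DNA/ so this sketch can run anywhere.
module_dir="$(mktemp -d)/NGS_DNA"
mkdir -p "${module_dir}"
touch "${module_dir}/3.1.2-Molgenis-Compute-v15.04.1-Java-1.7.0_80"
touch "${module_dir}/3.2.1-Molgenis-Compute-v15.11.1-Java-1.8.0_45"

# Create the symlink from inside the directory, so the target is stored as a relative path.
# -s = symbolic link, -f = replace an existing 'default', -n = do not follow an existing 'default' symlink.
(cd "${module_dir}" && ln -sfn 3.1.2-Molgenis-Compute-v15.04.1-Java-1.7.0_80 default)

# Show where 'default' points; the stored target must not start with a slash.
default_target="$(readlink "${module_dir}/default")"
echo "default -> ${default_target}"
```

Running the {{{ln}}} command from inside the directory is a design choice: it makes it hard to accidentally store an absolute path as the link target.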
We can check whether the symlink was recognised using:
{{{
$> module avail NGS_DNA

-------------------------- /apps/modules/bio --------------------------
   NGS_DNA/3.0.2-Molgenis-Compute-v15.04.1-Java-1.7.0_80
   NGS_DNA/3.1.2-Molgenis-Compute-v15.04.1-Java-1.7.0_80 (D)
   NGS_DNA/3.2.1-Molgenis-Compute-v15.11.1-Java-1.8.0_45

  Where:
   (D):  Default Module

$> module load NGS_DNA
$> module list

Currently Loaded Modules:
  1) Java/1.7.0_80
  2) Molgenis-Compute/v15.04.1-Java-1.7.0_80
  3) NGS_DNA/3.1.2-Molgenis-Compute-v15.04.1-Java-1.7.0_80
}}}

==== 4. Deprecating a previously installed older version of an app (optional) ====

You can inform users that (a certain version of) an app is deprecated and will be removed in the near future by creating a custom message that is shown when the app is added to the environment with {{{module load}}}. Add your custom message to {{{/apps/modules/modules.admin}}}. For example with this modules.admin:
{{{
#
# The Lmod admin file consists of "key: value" pairs terminated with a blank line:
#
#    moduleName/version: message
#
#
# Or
#
#    Full/PATH/to/Modulefile: message
#
#
# The message can be as many lines as you like and must be terminated with a blank line.
#
# Currently used by picard/1.102-Java-1.7.0_80
R/3.1.2-goolf-1.7.20: Deprecated incomplete installation. Will be removed in the near future.

}}}
loading this specific version of ''R'' will now result in:
{{{
$> module load R/3.1.2-goolf-1.7.20
--------------------------------------------------------------------------
There are messages associated with the following module(s):
--------------------------------------------------------------------------
R/3.1.2-goolf-1.7.20: Deprecated incomplete installation. Will be removed in the near future.
--------------------------------------------------------------------------
}}}
Note: You may have to [wiki:HPC_deploy#UpdateCacheAndSync update the Lmod caches] before a change in custom messages takes effect.

==== 5. Updating the Lmod caches and syncing installed software to nodes ==== #UpdateCacheAndSync

Before you can use the installed software it needs to be synced to various places and the Lmod caches need to be updated.
 * Switch to the ''envsync'' user:
 {{{
 $> sudo -u umcg-envsync bash
 }}}
 * Now sync the module:
 {{{
 $> hpc-environment-sync.bash -m /
 }}}
 * For a full list of options use the commandline help:
 {{{
 $> hpc-environment-sync.bash -h
 }}}

== FAQ ==

==== Q: Why can I not re-deploy a module and receive an error that the module file is already present? ====

A: By default !EasyBuild will refuse to overwrite an existing installation. Modules that have been deployed and are used for production should never be modified: deploy a new version instead. During debugging/testing it may be necessary to overwrite a module though; in that case you can force install using ''-f'' like this:
{{{
$> eb -f --robot --robot-paths="${HOME}/git/easybuild-easyconfigs/easybuild/easyconfigs/:" --software=MyEasyConfig,1.2.3 --toolchain=foss,2015b
}}}

==== Q: I've updated the source code for an app, but when I try to re-deploy with !EasyBuild nothing changes; What is wrong? ====

A: !EasyBuild will first check whether the source code was previously downloaded and cached. If so, it will not download it again. The cached sources are located in {{{/apps/sources/[a-z]/NameOfTheApp/}}}. Removing the existing download will force !EasyBuild to re-download the (updated) source code.

==== Q: !EasyBuild fails to download the source code. How can I continue the installation process? ====

A: First check if the location where !EasyBuild tries to download the source code is still up-to-date. If not, update the !EasyConfig.
If the location is correct, but !EasyBuild cannot access this location directly, for example because it is blocked by our firewall or because it requires authentication, you can try to download the source manually and put it in the cache directory for the app in {{{/apps/sources/[a-z]/NameOfTheApp/}}}. When !EasyBuild finds the cached source code it will skip the download step and continue.

==== Q: How can I resume a partially failed/succeeded deployment of an !EasyConfig with a lot of extra packages/modules? ====

A: When you have an !EasyConfig of type {{{easyblock = 'Bundle'}}} with a long list of extra modules, packages, etc. for a language like Perl, Python, R, etc. and your install failed partially, it ain't fun to start all over from scratch. You can resume the install after fixing the issue when
 * your !EasyConfig specifies
   * the extension defaultclass (= the !EasyBlock) using
   {{{
   exts_defaultclass = 'someEasyBlock'
   }}}
   * and the extension filter (= command to test whether installation of the extension succeeded) using
   {{{
   exts_filter = ("someCommand", "")
   }}}
   * and adds the path where the additional modules/packages will be installed to the language specific environment variable that is used to search for extras - {{{PERL5LIB}}} for Perl, {{{PYTHONPATH}}} for Python, {{{R_LIBS}}} for R, etc. using
   {{{
   modextrapaths = {'ENVVAR': ['path', 'another/path']}
   }}}
 * and when you use the {{{--skip}}} and {{{--rebuild}}} commandline options for the {{{eb}}} command - e.g.
 {{{
 $> eb --skip --rebuild \
       --robot --robot-paths=${HOME}/git/easybuild-easyconfigs/easybuild/easyconfigs/: \
       path/to/someEasyConfig.eb
 }}}
 * and when a module file already exists, which is normally only the case when a previous deployment finished successfully. If no module file exists yet, there is an easy workaround: just touch an empty module file:
 {{{
 $> touch ${HPC_ENV_PREFIX}/modules/all/moduleName/moduleVersion.lua
 }}}
 Once the deployment finished successfully the fake empty module file will be overwritten with a proper one.

Example of a minimal Bundle to extend Perl 5.22.0 with the module Some::Module version 1.2.3 from CPAN:
{{{
easyblock = 'Bundle'

name = 'PerlPlus'
version = '5.22.0'           # Same as the vanilla Perl module on which these add-on modules depend.
versionsuffix = '-v17.01.1'  # In format YY.MM.IncrementedReleaseNumber.

homepage = 'http://www.perl.org/'
description = """Extra modules for Larry Wall's Practical Extraction and Report Language."""

toolchain = {'name': 'foss', 'version': '2015b'}
toolchainopts = {'optarch': True, 'pic': True}

dependencies = [
    ('Perl', version, '-bare'),
]

modextrapaths = {'PERL5LIB': ['lib/perl5/', 'lib/perl5/site_perl', 'lib/perl5/site_perl/5.22.0/']}

moduleclass = 'lang'

exts_defaultclass = 'PerlModule'
exts_filter = ("perldoc -lm %(ext_name)s ", "")

exts_list = [
    ('Some::Module', '1.2.3', {
        'source_tmpl': 'Some-Module-1.2.3.tar.gz',
        'source_urls': ['https://cpan.metacpan.org/authors/id/A/AU/AUTHOR'],
    }),
]
}}}

Example of a minimal Bundle to extend Python 2.7.11 with the egg !SomeEgg version 1.2.3 from !PyPI:
{{{
easyblock = 'Bundle'

name = 'PythonPlus'
version = '%(pyver)s'        # Same as the vanilla Python module on which these add-on modules depend.
versionsuffix = '-v17.06.1'  # In format YY.MM.IncrementedReleaseNumber.

homepage = 'https://www.python.org/'
description = """The PythonPlus bundle contains add-on modules for Python."""

toolchain = {'name': 'foss', 'version': '2015b'}

dependencies = [
    ('Python', '2.7.11'),
]

exts_defaultclass = 'PythonPackage'
exts_filter = ('python -c "import %(ext_name)s"', "")

exts_list = [
    ('SomeEgg', '1.2.3', {
        'source_urls': ['https://pypi.python.org/packages/....long URL'],
        'checksums': ['da32434ebfebae2c7506e9577ac558f5'],
        'source_tmpl': '%(name)s-%(version)s.zip',  # Only required when file name deviates from default naming scheme @ PyPI.
        'modulename': 'segg',  # Name of module for Python import command. Only required when not the same as name of the extension.
    }),
]

modextrapaths = {'PYTHONPATH': ['lib/python%(pyshortver)s/site-packages']}

full_sanity_check = True
sanity_check_paths = {
    'files': [],
    'dirs': ['lib/python%(pyshortver)s/site-packages'],
}

moduleclass = 'lang'
}}}