
HPC analysis SOP for beta testers

This page lists only the differences between the new Gearshift cluster and the old Boxy and Calculon clusters.

Stability of environment and volatility of storage

During the beta test phase we invite you to hammer the new cluster and test it to its limits. This does mean things may break down. In addition to unscheduled maintenance, we may re-deploy the complete cluster and the accompanying tmp storage systems from scratch every 2 sprints = 6 weeks. We do this with automated code/configs using Ansible playbooks. Ansible playbooks are in theory idempotent, but in order to check that we have no circular dependencies and that everything is configured in the correct dependency order, we may re-deploy from scratch. When changes to the storage systems are necessary, we may have to re-initialize those too. We won't do that unless necessary, but it does mean all data may get lost when we have to re-initialize the tmp or home storage systems. This does not affect the prm storage systems: these are already in production and have tape backups, so no beta phase there.

!!! This means all data may get lost every 6 weeks !!!

Redeploys will start at 09:00 on End Of Sprint (EOS) Fridays. When we decide to redeploy from scratch, we'll send out a notification to the UMCG-HPC mailing list, and we'll send another notification when deployment has finished successfully and you can resume beta testing. Below is the schedule for the next few EOS Fridays on which we plan to redeploy. (Redeploys do not involve the prm file systems.)

Date                 EOS
Friday 2020-03-12    Sprint 152
Friday 2020-04-24    Sprint 154
Friday 2020-06-05    Sprint 156
Friday 2020-07-16    Sprint 158
Friday 2020-08-27    Sprint 160

Make sure you rsync any results/data you want to keep from the tmp file systems and home dirs of the new Gearshift cluster to any of the prm file systems, either on the new Gearshift cluster or on the old Boxy or Calculon clusters, before redeployment starts.
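
For example, to save a group's results from tmp01 to prm03 on Gearshift before an EOS Friday (a sketch: umcg-mygroup and myresults are placeholders, and it assumes prm03 group folders follow the same /groups/${groupname}/prm03/ layout as tmp01 below):

# Replace umcg-mygroup and myresults with your own group and folder names.
rsync -av /groups/umcg-mygroup/tmp01/myresults /groups/umcg-mygroup/prm03/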

Entitlement

The Gearshift cluster can currently only be used by users and groups from the UMCG entitlement; it is not yet available for users and groups from the LifeLines entitlement. If your user or group name starts with the 'umcg-' prefix, you are part of the former; if it starts with the 'll-' prefix, you are part of the latter.
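
A quick way to check which entitlement you belong to (a sketch, assuming a standard Linux shell on one of the clusters):

id -nu    # prints your user name: a 'umcg-' prefix means the UMCG entitlement
id -nG    # prints your group names: 'll-' prefixes indicate LifeLines groups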

Group folders

Gearshift uses a new

  • tmp01 storage system with group folders available at /groups/${groupname}/tmp01/.
  • prm03 storage system. This is physically a new storage system with all data from the old prm03 migrated to the new prm03.

Quota

Quota (limits) are not yet functional on Gearshift.

  • There are no limits for group folders on tmp01 nor for individual home dirs,
    but the physical limit of the entire file system is a hard one, as is the 6-weekly re-initialisation from scratch (see above) or unscheduled maintenance in case of a breakdown.
  • There are default limits for groups on prm03 and they are wrong: 275 GB for each group.
    After the migration of group data from the old to the new physical file system hosting prm03, most groups have already exceeded that limit.
    We'll try to fix that as soon as possible.
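
Since quota are not enforced on tmp01 yet, you can keep an eye on how close the entire file system is to its physical limit yourself with df (a sketch; replace umcg-mygroup with your own group name):

df -h /groups/umcg-mygroup/tmp01/    # reports total size, usage and free space of the file system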

Logins

Always use the proxy/jumphost

For the old clusters Boxy and Calculon you only had to use a proxy/jumphost when connecting from outside the UMCG/RUG network. For the new Gearshift cluster you will always need to use a proxy/jumphost, even when connecting from inside the UMCG/RUG network. Gearshift has its own new proxy/jumphost called

airlock.hpc.rug.nl

User Interface server

To submit jobs, check their status, test scripts, etc., you need to log in on a User Interface (UI) server using SSH. The UI server for the new cluster is

gearshift

You will need to update your SSH config for the machines of the new cluster (UI + jumphost). See http://docs.gcc.rug.nl/gearshift/logins/
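
As a sketch of what such a config can look like, using the 'airlock+gearshift' alias that also appears in the transfer examples below (youraccount is a placeholder, it assumes the UI hostname resolves from the jumphost, and the exact ProxyCommand and options should be taken from the documentation linked above):

# ~/.ssh/config (sketch): makes 'ssh airlock+gearshift' hop through the jumphost.
Host airlock+*
    User youraccount
    # Split the 'jumphost+target' alias: log in on <jumphost>.hpc.rug.nl and forward to <target>.
    ProxyCommand ssh -q %r@$(echo %h | sed 's/+[^+]*$//').hpc.rug.nl -W $(echo %h | sed 's/^[^+]*+//'):%p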

Signed host keys

The Gearshift cluster uses signed host keys. In order to use them you'll need to update your list of known hosts for SSH. See http://docs.gcc.rug.nl/gearshift/logins/
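
Trusting signed host keys amounts to adding the public key of the certificate authority (CA) that signed them to your known hosts file. A sketch of the format only; the host pattern and the actual CA public key must be copied from the documentation linked above:

# ~/.ssh/known_hosts: accept host keys signed by this CA for matching hosts.
@cert-authority *.hpc.rug.nl <key-type> <base64-encoded-CA-public-key>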

Key pairs

The requirements for strong keys have been upgraded. This means keys generated with the DSA algorithm are no longer accepted and the minimum key size for RSA keys has been increased to 4096 bits. The new Gearshift cluster does accept the new ED25519 elliptic curve based algorithm (preferred).

You can continue to use your existing old key pair for the old clusters, but if you need to create a new one, please create a new ED25519 key pair using the instructions from the new documentation for Gearshift: http://docs.gcc.rug.nl/gearshift/accounts/
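
Generating such a key pair looks like this (a sketch; the file name is illustrative and the full instructions, e.g. regarding passphrases, are in the documentation linked above):

ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519 -C "youraccount"
# The public key to send to the helpdesk is ~/.ssh/id_ed25519.pub;
# never share the private key ~/.ssh/id_ed25519 itself.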

Send the public key of your new key pair to the UMCG HPC helpdesk and explicitly request that they add your new public key rather than replace your old public key with the new one.

Nodes

Cluster     Nodes   RAM per node*   Cores per node*   TMP FS   PRM FS
Gearshift   10      204/216 GB      22/24             tmp01    prm03
  • For RAM per node and Cores per node:
    • First value is the max that can be allocated to Slurm jobs and hence the max you can request.
    • Second value is the max of the machine/server.

To get an overview of the nodes and see how they are doing:

module load cluster-utils
cnodes
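
Given the per-node maxima above, a single-node Slurm job can request at most 22 cores and 204 GB of RAM. A minimal sketch of a batch script header within those limits (the job name and walltime are illustrative; partition selection is not covered on this page):

#!/bin/bash
#SBATCH --job-name=example
#SBATCH --nodes=1
#SBATCH --cpus-per-task=22    # max 22 of the 24 physical cores can be allocated to jobs
#SBATCH --mem=204G            # max 204 GB of the 216 GB RAM can be allocated to jobs
#SBATCH --time=01:00:00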

Toolchain and Bioinformatics

  • We've upgraded from foss/2015b to foss/2018b and deployed many bioinformatics tools.
  • If your favorite tool is not yet listed in the output of module avail, send an email to the helpdesk.
  • We now support minimal toolchains. Some tools need a full-blown toolchain including various additional libraries, but some only need a C compiler. To minimize the use of unnecessary dependencies, such tools have been deployed with
    GCCcore/7.3.0
    
    which is only the C compiler part of foss/2018b.
  • Perl, Python and R now all consist of a bare module and a Plus module.
    • The bare module contains the minimal/default installation for that language.
    • The Plus module has a dependency on the bare module for the same version of the same language and includes hundreds of additional packages.
    • E.g. the most complete R environment can be loaded using module load RPlus/3.6.1-foss-2018b-v19.07.1, which loads over 700 additional R packages on top of R/3.6.1-foss-2018b-bare.
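
For example, to load either the minimal or the most complete R environment (module names taken from the example above):

module load R/3.6.1-foss-2018b-bare            # bare: the minimal/default R installation
module load RPlus/3.6.1-foss-2018b-v19.07.1    # Plus: R with over 700 additional packages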

Transfer data new clusters <-> old clusters

You can transfer data Calculon|Boxy <-> Gearshift using rsync, but there are some limitations:

  • You must initiate the transfer on Gearshift.
    On Gearshift you can then either pull from or push to Boxy/Calculon.
    Hence you can SSH from Gearshift to Boxy/Calculon, but not the other way around, because the OpenSSH version of Boxy/Calculon is too old to log in on Gearshift.
  • You must use the IP addresses of the internal storage VLAN network interfaces of Boxy/Calculon:
    boxy     in VLAN 985 = 172.23.34.237
    calculon in VLAN 985 = 172.23.34.247
    
    These interfaces don't have names in DNS, so this does NOT work:
    ssh -A youraccount@airlock+gearshift
    rsync -av   some_local_path/to/data   youraccount@boxy.hpc.rug.nl:/other_remote_path/to/data
    
    But this will work to push data from Gearshift to Boxy:
    ssh -A youraccount@airlock+gearshift
    rsync -av   some_local_path/to/data   youraccount@172.23.34.237:/other_remote_path/to/data
    
    Or, to pull data from Boxy to Gearshift, you can use something like this:
    ssh -A youraccount@airlock+gearshift
    rsync -av   youraccount@172.23.34.237:/other_remote_path/to/data   some_local_path/to/data
    
