# Using HyperQueue and local disk to process many small files efficiently
This tutorial is done on Puhti, which requires that

* you have a user account at CSC.
* your account belongs to a project that has access to the Puhti service.
## Overview
💬 HyperQueue is a tool for efficient sub-node task scheduling and well suited for task farming and running embarrassingly parallel jobs.
💬 In this example, we have more than 4000 files which we want to convert to another format.
Each file contains a SMILES string (an ASCII notation encoding the two-dimensional structure of a molecule) which we want to convert into a three-dimensional coordinate format.
The computational cost of each of the conversions is expected to be comparable.
Since the workflow involves many small files, we will also demonstrate the usage of the fast local disk to avoid stressing the parallel file system.
## The workflow of this exercise

1. Download the files from Allas.
2. Decompress the files to `$LOCAL_SCRATCH`.
3. Convert each `.smi` file to a three-dimensional `.sdf` molecular file format.
4. Archive and compress the output files and move them back to `/scratch`.
## Download the input files
Create and enter a suitable scratch directory on Puhti (replace `<project>` with your CSC project, e.g. `project_2001234`), then download the input files representing molecules initially obtained from the ChEMBL database. A sketch of these two steps is shown below.
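A minimal sketch of these two steps, assuming a working directory under `/scratch/<project>/$USER/hq` and a placeholder Allas object URL (replace it with the address given in the course material):

```bash
# Create and enter a working directory on the parallel file system
mkdir -p /scratch/<project>/$USER/hq
cd /scratch/<project>/$USER/hq

# Download the compressed input files from Allas
# (placeholder URL -- replace with the actual object address)
wget https://a3s.fi/<bucket>/smiles.tar.gz
```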
## Create a HyperQueue batch script
💬 We will use Open Babel to convert the SMILES strings into a three-dimensional `.sdf` coordinate format.
☝🏻 Multiple files can be converted using the batch conversion mode of Open Babel. For 4000+ files this would, however, take more than an hour. Submitting each conversion as a separate Slurm job is also a bad idea since each conversion takes just a few seconds. Running many short jobs will degrade the performance of the scheduler for all users.
💡 HyperQueue is a program that allows us to schedule sub-node tasks within a Slurm allocation. One can think of HyperQueue as a "Slurm within a Slurm", or a so-called "meta-scheduler", which allows us to leverage embarrassing parallelism without overloading Slurm by looping `srun` or `sbatch` commands.
Copy the following script into a file `hq.sh` using, e.g., `nano`.
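The sketch below shows the general shape of such a script. The module names (`hyperqueue`, `openbabel`), the input archive name `smiles.tar.gz`, the `/scratch/<project>/$USER/hq` paths and the resource requests are assumptions to be adjusted to your own setup, and the exact `hq` subcommands and flags may differ between HyperQueue versions (check `hq --help`):

```bash
#!/bin/bash
#SBATCH --partition=small
#SBATCH --account=<project>       # replace with your CSC project
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=40        # one full Puhti node
#SBATCH --gres=nvme:10            # request fast local disk, exposed as $LOCAL_SCRATCH
#SBATCH --time=00:30:00

# Load the required tools (module names assumed)
module load hyperqueue
module load openbabel

# Keep the HyperQueue server files in a per-job directory under the working directory
export HQ_SERVER_DIR=$PWD/hq-server-$SLURM_JOB_ID
mkdir -p "$HQ_SERVER_DIR"

# Start the HyperQueue server in the background and wait until it responds
hq server start &
until hq job list &> /dev/null ; do sleep 1 ; done

# Start a worker on the allocated node; it picks up the 40 reserved cores
hq worker start &

# Decompress the input files on the fast local disk
cd "$LOCAL_SCRATCH"
tar xf /scratch/<project>/$USER/hq/smiles.tar.gz

# Submit one conversion task per .smi file (each task uses a single core)
for f in *.smi; do
    hq submit -- obabel "$f" -O "${f%.smi}.sdf" --gen3d
done

# Wait until all HyperQueue jobs have finished
hq job wait all

# Archive and compress the outputs and move them back to /scratch
tar czf sdf.tar.gz *.sdf
mv sdf.tar.gz /scratch/<project>/$USER/hq/

# Shut down the workers and the server
hq worker stop all
hq server stop
```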
As explained by the comments in the script above, HyperQueue works on a worker-server-client basis, i.e. a worker is started on each compute node to execute the commands that the client has submitted to the server.
This is in principle what Slurm also does; the only difference is that you need to start the server and the workers yourself.
In this example, one full Puhti node is reserved for processing the files, meaning that 40 conversion commands will be running in parallel.
☝🏻 Ideally, the number of sub-tasks should be larger than the number of tasks that can run on the reserved resources simultaneously, to avoid too short Slurm jobs.
Submit the script with:
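```bash
sbatch hq.sh
```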
After a short while, you should notice that a file `sdf.tar.gz` containing the output files has appeared in your working directory. How long did it take to convert all files?
💡 Tip: If you want to monitor the progress of your HyperQueue jobs/tasks, just set the location of the HyperQueue server and use the `hq` commands (see the official documentation for details). Note that the HQ job/task IDs are not the same as the Slurm job ID!
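For example, assuming the per-job server directory created in the `hq.sh` sketch above, something along these lines could be run in another terminal (the subcommands shown are only a subset; see `hq --help` for the full list):

```bash
# Point the hq client at the server started by the batch job
# (the path must match HQ_SERVER_DIR in hq.sh; <slurm-jobid> is the Slurm job ID)
export HQ_SERVER_DIR=/scratch/<project>/$USER/hq/hq-server-<slurm-jobid>

# List HyperQueue jobs and inspect one of them
hq job list
hq job info <hq-job-id>
```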
☝🏻 Remember to load the `hyperqueue` module before running the commands above from the terminal, otherwise you will get an error message `-bash: hq: command not found`.
☝🏻 Also note that these commands are available only after your job has started running.
💬 To get a report of how your jobs/tasks completed and spot possible failures, you could also run one of these commands in your batch script and redirect the output to a file before shutting down the server.
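As a sketch, a line like the following could be placed in `hq.sh` right before the workers are stopped (here simply redirecting the output of `hq job list`; the report file name is arbitrary):

```bash
# Save a summary of all HyperQueue jobs before shutting down the server
hq job list > /scratch/<project>/$USER/hq/hq_job_report.txt
```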