# How to run I/O-intensive computing tasks efficiently?

## Background
☝🏻 Lustre-based project-specific directories `/scratch` and `/projappl` can store large amounts of data and are accessible to all compute nodes of Puhti. However, these directories are not well suited to managing numerous files or to intensive input/output (I/O) operations. If you need to work with a large number of small files or perform frequent reads/writes, you should consider using the NVMe-based local temporary scratch directories, either through normal or interactive batch jobs.
💡 Read more about the advantages of using the local scratch disk in Docs CSC.
## Convert the following regular batch job script into one that uses local scratch for faster I/O
💬 Below is a normal batch job script that pulls a Docker image from Docker Hub and converts it into an Apptainer image compatible with HPC environments such as the CSC supercomputers Puhti and Mahti. During the conversion, several layers are retrieved, cached, and then converted into an Apptainer `.sif` image file.
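The original script does not survive in this extract; the following is a minimal sketch of what such a script could look like. The project number (`project_2001234`), the image (`docker://pytorch/pytorch:latest`), and the cache paths are illustrative assumptions, not the exercise's actual values:

```bash
#!/bin/bash
#SBATCH --account=project_2001234    # illustrative project; use your own
#SBATCH --partition=small
#SBATCH --time=01:00:00
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G

# Point Apptainer's temporary and cache directories at the shared Lustre
# /scratch area (the slow path this exercise starts from)
export APPTAINER_TMPDIR=/scratch/project_2001234/$USER/apptainer_tmp
export APPTAINER_CACHEDIR=/scratch/project_2001234/$USER/apptainer_cache
mkdir -p "$APPTAINER_TMPDIR" "$APPTAINER_CACHEDIR"

# Pull a Docker image and convert it to an Apptainer .sif image file
apptainer pull image.sif docker://pytorch/pytorch:latest
```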
Copy the script above to a file (e.g. `batch_job.sh`) and modify it accordingly. You can then submit the script file to a compute node using the command:
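```bash
sbatch batch_job.sh
```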
💭 How long did it take to finish the job? What about when using NVMe?
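One way to check the elapsed time is Slurm's accounting tools, which CSC systems provide (the `<jobid>` placeholder is whatever ID `sbatch` printed):

```bash
seff <jobid>                          # summary of elapsed time and resource usage
sacct -j <jobid> -o JobID,Elapsed     # just the wall-clock time
```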
## Hints
If you first ran the default script above, you need to clear the cache before running the next one; one way is sketched below.
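Assuming the cache was set up via `$APPTAINER_CACHEDIR` as in the sketch above, Apptainer's own cache command removes the cached layers:

```bash
# Clears Apptainer's layer cache (honours $APPTAINER_CACHEDIR if set)
apptainer cache clean
```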
Request fast local storage using the `--gres` flag in the `#SBATCH` directives. E.g., to request 200 GB of fast disk space, use the directive sketched below.
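On Puhti the resource is named `nvme` and the size is given in gigabytes; this follows the `--gres=nvme:<size_in_GB>` form documented in Docs CSC:

```bash
#SBATCH --gres=nvme:200   # request 200 GB of node-local NVMe disk
```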
Use the environment variable `$LOCAL_SCRATCH` to access the local storage on each compute node. **Important!** After you've processed the data on the fast local disk, remember to move it back to the shared disk area (`/scratch`), otherwise the data will be lost!

## Solution
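The solution script is likewise missing from this extract; below is a sketch of how the script above could be adapted, under the same illustrative assumptions. It requests NVMe space with `--gres`, points Apptainer's temporary and cache directories at `$LOCAL_SCRATCH`, and copies the finished image back to `/scratch` before the job ends:

```bash
#!/bin/bash
#SBATCH --account=project_2001234    # illustrative project; use your own
#SBATCH --partition=small
#SBATCH --time=01:00:00
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --gres=nvme:200              # 200 GB of fast local NVMe disk

# Use the node-local NVMe disk for Apptainer's cache and temporary files
export APPTAINER_TMPDIR=$LOCAL_SCRATCH
export APPTAINER_CACHEDIR=$LOCAL_SCRATCH

cd "$LOCAL_SCRATCH"

# Pull the Docker image and convert it to an Apptainer .sif image file
apptainer pull image.sif docker://pytorch/pytorch:latest

# Local scratch is wiped when the job ends: copy the result back to /scratch
cp image.sif /scratch/project_2001234/$USER/
```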
Below is a comparison of execution time for running the same job on `$LOCAL_SCRATCH` vs. the normal `/scratch`:

|                 | `$LOCAL_SCRATCH` | `/scratch` |
| --------------- | ---------------- | ---------- |
| Wall-clock time | 22m 06s          | 50m 06s    |