Fast disk areas in CSC's computing environment
☝🏻 This tutorial requires that you have a user account at CSC that is a member of a project that has access to the Puhti service.
Upon completion of this tutorial, you will be familiar with ideal disk areas for I/O-intensive workloads, i.e. frequent read and write operations.
Perform a light-weight pre-processing of data files using fast local disk
💬 You may sometimes need to process a large number of small files, which can cause heavy input/output (I/O) load on the shared file system used in CSC's computing environment.
💬 In order to facilitate such heavy I/O operations, CSC provides fast local disk areas on the login and compute nodes.
First, log in to Puhti using SSH (or by opening a login node shell in the Puhti web interface):
Identify the fast local disk areas on the login nodes with the following command:
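A minimal sketch of this step, assuming (as CSC's documentation on temporary local disk areas describes) that the login nodes expose their fast local disk through the `$TMPDIR` environment variable. The `df` call is an optional extra that shows which file system backs the directory:

```shell
# Print the directory that $TMPDIR points to; on a Puhti login node this is
# expected to be a path on the node-local disk (assumption, see above).
# Falls back to /tmp if the variable is unset, so the sketch runs on any machine.
fast_disk="${TMPDIR:-/tmp}"
echo "Fast local disk area: ${fast_disk}"

# Optional: show the file system and the free space behind that directory.
df -h "${fast_disk}"
```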
💡 The local disk area on the login nodes is meant for light-weight pre-processing of data and I/O-intensive tasks such as software compilation. Actual computations should be submitted to the batch queue from the `/scratch` disk.
💡 The local disk area on the login nodes is meant for temporary use and is cleaned often, so make sure to move important data to `/scratch` or `/projappl` once you no longer need the fast disk.
☝🏻 Note that a local disk is specific to a particular node, i.e. you cannot access the local disk of `puhti-login11` from `puhti-login12`.
Download a tar archive containing thousands of small files and merge the files into one large file using the fast local disk
Download a tar file from the Allas object storage directly to the local disk:
Unpack the downloaded tar file:
Merge the small files into one larger file and remove the small files:
💡 `xargs` is a convenient command that reads items from its standard input (typically the output of another command) and passes them as arguments to another command.
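The unpack, merge, and clean-up steps above can be tried on any Linux machine with the following sketch. The directory and file names (`small_files`, `seq_*.fasta`) are made-up stand-ins for the downloaded archive, and `mktemp -d` stands in for the fast local disk. The null-separated `-print0`/`-0` pair keeps file names containing spaces intact:

```shell
# Work in a throwaway directory; on a Puhti login node you would use "$TMPDIR".
workdir="$(mktemp -d)"
cd "${workdir}"

# Create three tiny stand-in FASTA files and pack them (hypothetical names).
mkdir small_files
for i in 1 2 3; do
    printf '>seq_%s\nACGT\n' "$i" > "small_files/seq_${i}.fasta"
done
tar -czf small_files.tar.gz small_files
rm -r small_files

# Unpack the archive, as you would after downloading it.
tar -xzf small_files.tar.gz
cd small_files

# Merge: find lists the small files, xargs hands them to cat as arguments.
find . -type f -name 'seq_*.fasta' -print0 | sort -z | xargs -0 cat > Merged.fasta

# Clean up: the same pattern, but now xargs passes the names to rm.
find . -type f -name 'seq_*.fasta' -print0 | xargs -0 rm -f

# 3 files x 2 lines each = 6 lines; only Merged.fasta should remain.
wc -l < Merged.fasta
ls
```

The output file name deliberately does not match the `seq_*.fasta` pattern; otherwise `find` would feed `Merged.fasta` back into `cat` while it is being written.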
Move your pre-processed data to the project-specific `/scratch` area before analysis
💭 Remember: the commands `csc-projects` and `csc-workspaces` reveal information about your projects.
Create your own folder (using the environment variable `$USER`) under a project-specific directory on the `/scratch` disk (or skip this step if you already created the folder in a previous tutorial):
Move your pre-processed data from the previous step (i.e., the `Merged.fasta` file) from the fast disk to `/scratch`:
You have now successfully moved your data to the `/scratch` area and can start performing actual analysis using batch job scripts.
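The two steps above can be sketched as follows. `SCRATCH_ROOT` and `FAST_DISK` are hypothetical helper variables introduced only for this illustration: on Puhti they would be set to `/scratch/<project>` and to the directory holding `Merged.fasta` on the fast disk, while the temporary-directory defaults let the snippet run anywhere:

```shell
# Hypothetical stand-ins; on Puhti use SCRATCH_ROOT=/scratch/<project>
# and FAST_DISK=<directory on the fast disk that holds Merged.fasta>.
SCRATCH_ROOT="${SCRATCH_ROOT:-$(mktemp -d)}"
FAST_DISK="${FAST_DISK:-$(mktemp -d)}"
USER="${USER:-$(id -un)}"

# Demo only: create a small Merged.fasta if none exists yet.
[ -f "${FAST_DISK}/Merged.fasta" ] || printf '>demo\nACGT\n' > "${FAST_DISK}/Merged.fasta"

# Create your own folder under the project directory (no error if it exists).
mkdir -p "${SCRATCH_ROOT}/${USER}"

# Move the merged file from the fast disk to your scratch folder.
mv "${FAST_DISK}/Merged.fasta" "${SCRATCH_ROOT}/${USER}/"
ls -l "${SCRATCH_ROOT}/${USER}"
```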
Optional: Fast local disk areas on compute nodes
☝🏻 If you intend to perform heavy computing tasks using a large number of small files, you have to use the fast local disk areas on the compute nodes instead of the login nodes. The compute nodes are accessed either interactively or using batch jobs.
Move to the `/scratch` area of your project and use the `sinteractive` command to request an interactive session on a compute node with 1 GB fast local disk for 10 minutes:
☝🏻 Not all compute nodes have fast local disks, meaning that you may have to queue for a while before the interactive session starts. You may skip this part if you're in a hurry.
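The interactive request described above could look like the following sketch. It assumes the `--account`, `--time`, and `--tmp` options of `sinteractive` (with `--tmp` giving the local disk size in GiB) and uses `project_0000000` as a hypothetical placeholder project. The availability check only keeps the snippet harmless on machines without `sinteractive`:

```shell
# Hypothetical placeholder; set PROJECT to your own CSC project before running.
PROJECT="${PROJECT:-project_0000000}"

if command -v sinteractive >/dev/null 2>&1; then
    # Start from the project's scratch area, then ask for 10 minutes
    # on a compute node with 1 GiB of fast local disk.
    cd "/scratch/${PROJECT}" &&
        sinteractive --account "${PROJECT}" --time 00:10:00 --tmp 1
else
    echo "sinteractive not found; run this on a Puhti login node."
fi
```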
In the interactive session, use the following commands to locate the fast local storage areas on that compute node:
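A minimal sketch, assuming that Slurm sets `$LOCAL_SCRATCH` (and `$TMPDIR`) inside jobs that requested fast local disk, as CSC's documentation describes. The fallback texts are only there so the snippet also runs outside a job:

```shell
# Inside a job with fast local disk, both variables are expected to point
# to a job-specific directory on the node's NVMe disk (assumption, see above).
local_scratch="${LOCAL_SCRATCH:-not set (no fast local disk in this session)}"
tmp_dir="${TMPDIR:-not set}"
echo "LOCAL_SCRATCH: ${local_scratch}"
echo "TMPDIR:        ${tmp_dir}"
```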
💡 Note how the path to the fast local storage area contains the ID of your Slurm job, `/run/nvme/job_<id>`.
Terminate the interactive session and now try the same in a proper batch job. Create a file called `my_nvme.bash` using, for example, the `nano` text editor:
Copy the following batch script there and change `<project>` to the CSC project you actually want to use:
Submit the batch job with the command:
Monitor the progress of your batch job and print the contents of the output file when it has completed:
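The batch job steps above could be sketched as follows. The script body is an assumption rather than the tutorial's exact script: it requests 1 GB of fast local disk with `--gres=nvme:1` in the `small` partition and prints `$LOCAL_SCRATCH`. A heredoc stands in for the `nano` step so the file can be created in one go:

```shell
# Write the batch script to my_nvme.bash (replaces the manual nano step).
cat > my_nvme.bash <<'EOF'
#!/bin/bash
#SBATCH --account=<project>
#SBATCH --partition=small
#SBATCH --time=00:05:00
#SBATCH --ntasks=1
#SBATCH --gres=nvme:1

# Print the job-specific fast local disk path (set by Slurm inside the job).
echo "LOCAL_SCRATCH: ${LOCAL_SCRATCH:-not set}"
EOF

# Check the script for syntax errors without running it.
bash -n my_nvme.bash && echo "my_nvme.bash is syntactically valid"
```

After replacing `<project>` with your own project, the script would be submitted with `sbatch my_nvme.bash` and monitored with `squeue -u $USER`. Unless the script sets another name, Slurm writes the output to `slurm-<jobid>.out` in the submission directory, which can be printed with `cat` once the job has finished.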
☝🏻 Again, please note that requesting fast local disk space tends to increase your queueing time. It is a scarce resource, and should only be requested if you really need it. Please ask CSC Service Desk if you're unsure.
‼️ If you write important data to the local disk in your interactive session or batch job, remember to copy the data back to `/scratch` before the job terminates! The local disk is cleaned immediately after your job, and salvaging any forgotten files is not possible afterwards.
💭 Bonus exercise: Try to repeat the first part of this tutorial using a batch job!
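The copy-back step from the warning above can be sketched as the tail end of a job script. `RESULTS_DIR` and `output.txt` are hypothetical names used only for illustration; in a real job `$LOCAL_SCRATCH` is set by Slurm and `RESULTS_DIR` would be a folder such as `/scratch/<project>/$USER`. The temporary-directory defaults let the snippet run outside a job:

```shell
# Stand-ins so the sketch runs anywhere; inside a real job LOCAL_SCRATCH
# already exists and RESULTS_DIR would be your folder under /scratch.
LOCAL_SCRATCH="${LOCAL_SCRATCH:-$(mktemp -d)}"
RESULTS_DIR="${RESULTS_DIR:-$(mktemp -d)}"

# Do the I/O-heavy work on the fast local disk (placeholder for real work).
cd "${LOCAL_SCRATCH}"
printf 'result\n' > output.txt

# Copy the results back before the job ends; the local disk is wiped afterwards.
cp output.txt "${RESULTS_DIR}/"
ls -l "${RESULTS_DIR}"
```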
More information
💡 Docs CSC: Temporary local disk areas
💡 Docs CSC: Local storage on Puhti
💡 Docs CSC: Local storage on Mahti