---
---
# Disk areas in CSC's HPC environment {.title}
In this section, you will learn how to work in the different disk areas of CSC's HPC environment.
# Overview of disk areas

- Main disk areas and their specific uses on Puhti and Mahti
- Moving data between supercomputers
- Understanding quotas (available space and number of files) of different disk areas
- Additional fast disk areas
# Disk and storage overview
# Main disk areas in Puhti/Mahti

- Home directory (`$HOME`)
    - Other users cannot access your home directory
- ProjAppl directory (`/projappl/project_name`)
    - Shared with project members
    - Possible to limit access (`chmod g-rw`) to subfolders, see the example below
- Scratch directory (`/scratch/project_name`)
    - Shared with project members
    - Files older than 180 days will be automatically removed
- These directories reside on the Lustre parallel file system
- Default quotas and more info in the disk areas section of Docs CSC
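As a sketch of limiting access, the following shows how group read/write permissions could be removed from one subfolder under `/projappl`; the project name `project_2001234` and folder name `my_private_tools` are hypothetical placeholders.

```bash
# Hypothetical example: project_2001234 and my_private_tools are placeholders
cd /projappl/project_2001234

# Remove group read and write permissions from one subfolder only
chmod g-rw my_private_tools

# Check the resulting permissions
ls -ld my_private_tools
```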
# Moving data between and to/from supercomputers

- Puhti and Mahti have separate file systems
- Data can be moved between the supercomputers
- There are many ways to transfer data between the CSC supercomputers and your local computer, see the sketch below
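As one example, `rsync` over SSH can be used for both directions; the username, project name and paths below are hypothetical placeholders.

```bash
# Hypothetical username, project name and paths; adjust to your own project
# Copy a directory from your local computer to the scratch area on Puhti
rsync -avz ./my_dataset/ your_username@puhti.csc.fi:/scratch/project_2001234/my_dataset/

# Copy results from Puhti to Mahti (run on a Puhti login node)
rsync -avz /scratch/project_2001234/results/ your_username@mahti.csc.fi:/scratch/project_2001234/results/
```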
# Displaying current status of disk areas

- Use the `csc-workspaces` command to show available projects and quotas
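For example, on a Puhti or Mahti login node (the output depends on your own projects and quotas):

```bash
# List your projects and the quota usage of their disk areas
csc-workspaces
```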
# Disk and storage overview (revisited)
# Additional fast local disk areas

- Each of the login nodes has 2900 GiB of fast local storage in `$TMPDIR`
- The local disk is meant for temporary storage (e.g. compiling software) and is cleaned frequently, see the sketch below
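For example, a build could be staged on the login node's local disk roughly as follows; the software name `mytool-1.0` and the project paths are hypothetical placeholders.

```bash
# Hypothetical example: mytool-1.0 and project_2001234 are placeholders
cd $TMPDIR
tar xf /scratch/project_2001234/mytool-1.0.tar.gz
cd mytool-1.0

# Compile on the fast local disk
./configure --prefix=/projappl/project_2001234/mytool
make -j 4
make install   # installs into /projappl, which is not cleaned up
```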
# NVMe disks on some compute nodes on Puhti and Mahti

- Interactive, I/O and GPU nodes have fast local disks (NVMe) in `$LOCAL_SCRATCH`
- You must copy data to and from the fast disk during your batch job since the NVMe is accessible only during your job allocation, see the sketch below
- If your job reads and/or writes a lot of small files, using this can give a huge performance boost!
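A minimal batch job sketch of the copy-in, compute, copy-out pattern; the account, partition, NVMe size request and program name are assumptions, so check Docs CSC for the exact options on your system.

```bash
#!/bin/bash
#SBATCH --account=project_2001234     # hypothetical project
#SBATCH --partition=small             # assumed partition name
#SBATCH --time=01:00:00
#SBATCH --gres=nvme:100               # request local NVMe space (assumed syntax, size in GB)

# Copy input data to the fast local disk
cp -r /scratch/project_2001234/input "$LOCAL_SCRATCH"/

# Run the analysis against the local copy (my_program is a placeholder)
cd "$LOCAL_SCRATCH"
my_program --input input/ --output results/

# Copy results back to /scratch before the job allocation ends
cp -r results /scratch/project_2001234/
```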
# What are the different disk areas for?

- Allas -- for data which is not actively used
- `$HOME` -- small, only for the most important (small) files, personal access only
- `/scratch` -- main working area, shared with project members, only for data in active use
- `/projappl` -- not cleaned up, e.g. for shared binaries
- Login node `$TMPDIR` -- compiling, temporary storage, fast I/O
- Compute node NVMe `$LOCAL_SCRATCH` -- fast I/O in batch jobs
# Best practices

- None of the disk areas are automatically backed up by CSC, so make sure to perform regular backups to, e.g., Allas
- Don't run databases or Conda on Lustre (`/projappl`, `/scratch`, `$HOME`)
- Don't create a lot of files, especially within a single folder
    - If you're creating 10 000+ files, you should probably rethink your workflow
    - Consider using fast local disks when working with many small files, see the sketch below
- Lustre best practices and efficient I/O in high-throughput workflows
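As a sketch of the file-count advice, a directory with many small files can be bundled into a single archive before it is moved off Lustre; the project name and paths are hypothetical placeholders.

```bash
# Hypothetical project name and paths
cd /scratch/project_2001234

# Bundle a directory with many small files into a single archive
tar czf my_results.tar.gz my_results/

# The single archive can then be uploaded to Allas for longer-term storage
# using the Allas client tools (see Docs CSC for details)
```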