Using Allas in CSC's HPC environment
Before the actual exercise, open a view to the Allas service in your browser using the Puhti web interface.
Go to https://www.puhti.csc.fi and log in with your account.
Configure an Allas S3 connection using the Cloud storage configuration tool.
You need to first authenticate by providing your CSC password.
If you have several projects available, choose one that you want to use in this exercise.
Once you've configured a connection, select `s3allas-project_<id>` from the Files dropdown menu in the top navigation bar. Replace `<id>` with the number of the project you chose to use (e.g. 2001234).
During the exercise, you can use this web interface to get another view of the buckets and objects in Allas.
1. Log in to Puhti
Log in to Puhti (open a login node shell if using the web interface):
In Puhti, check your environment with the command:
Move to the `/scratch` directory of your project:
Create your own subdirectory named with your username:
Move to the directory:
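The three steps above can be sketched as one small shell helper. This is a minimal sketch, not the tutorial's own commands: the function name `enter_workdir` is hypothetical, and the example path `/scratch/project_2001234` is an assumption based on the `project_<id>` naming used for the Allas connection earlier.

```shell
# Hypothetical helper: create a personal subdirectory under a project's
# scratch directory and move into it.
# $1: the project's scratch directory (assumed form: /scratch/project_<id>)
enter_workdir() {
    scratch_dir="$1"
    # Fall back to `id -un` in case $USER is not set in the environment
    user_name="${USER:-$(id -un)}"
    mkdir -p "$scratch_dir/$user_name" || return 1
    cd "$scratch_dir/$user_name" || return 1
    # Print the directory we ended up in
    pwd
}
```

On Puhti this could be called as `enter_workdir /scratch/project_2001234`, replacing the number with your own project number.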
2. Download data with wget
Next, download a dataset and uncompress it.
The dataset contains some Pythium genomes with related BWA indexes.
3. Using Allas
Open a connection to Allas:
If you have several Allas projects available, select the same project as earlier.
Upload case 1: rclone
Upload the data from Puhti to Allas with `rclone`:
How long did the data upload take?
What was the transfer rate?
How long would it take to transfer 100 GiB assuming the same speed?
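The last question is a simple proportion: if the same rate holds, the transfer time scales linearly with the data size. Below is a minimal sketch of the arithmetic using hypothetical numbers (a 2 GiB upload that took 40 seconds). Substitute the size and time you actually measured; the function name `estimate_seconds` is made up for this illustration.

```shell
# Estimate how long a transfer would take at the rate observed for a
# measured transfer. All inputs are integers.
# $1: bytes transferred, $2: elapsed seconds, $3: target size in bytes
estimate_seconds() {
    echo $(( $3 * $2 / $1 ))
}

GIB=$(( 1024 * 1024 * 1024 ))

# Hypothetical measurement: 2 GiB uploaded in 40 seconds
measured_bytes=$(( 2 * GIB ))
measured_seconds=40

# Transfer rate in MiB/s (integer division, so rounded down)
echo "rate: $(( measured_bytes / measured_seconds / 1024 / 1024 )) MiB/s"

# Estimated time for 100 GiB at the same rate
echo "100 GiB: $(estimate_seconds "$measured_bytes" "$measured_seconds" $(( 100 * GIB ))) s"
```

With these example numbers the sketch prints a rate of 51 MiB/s and an estimate of 2000 s, or roughly 33 minutes.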
Study what you have uploaded to Allas with the commands:
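Listing commands along the following lines could be used here. This is a sketch under assumptions rather than the tutorial's original commands: the wrapper function `list_uploaded` is hypothetical, and `allas:` is assumed to be the name of the rclone remote configured by the Allas connection. `rclone lsd`, `rclone lsf` and `rclone ls` are standard rclone listing commands.

```shell
# Hypothetical wrapper that lists uploaded data at three levels of detail.
# $1: bucket name (e.g. $USER-genomes-rc)
list_uploaded() {
    # Buckets visible through the remote
    rclone lsd allas: || return 1
    # Top level of the given bucket
    rclone lsf "allas:$1" || return 1
    # Every object in the bucket, with sizes in bytes
    rclone ls "allas:$1" || return 1
}
```

For example, `list_uploaded "$USER-genomes-rc"` would list the bucket used in this upload case.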
Check what this looks like in the Puhti web interface. Open a browser and go to https://www.puhti.csc.fi/
In the Puhti web interface, go to the Files app and select `s3allas-project_<id>` to list the buckets of your project (replace `<id>` as needed).
Locate your own `$USER-genomes-rc` bucket and download one of the uploaded fasta files to your local computer.
You can read more about moving files at Docs CSC: Copying files using scp and Moving data with rclone
Upload case 2: a-put
Upload the pythium directory from Puhti to Allas using a-commands
Case 1: Store everything as a single object (replace `<project number>` with your CSC project number, e.g. 2001234):
Case 2: Each subdirectory (species) as a separate object (replace `<project number>` with your CSC project number, e.g. 2001234):
Case 3: Use a custom bucket name (replace `<project number>` with your project number, e.g. 2001234):
Can you see the difference between the three `a-put` commands above?
Study the `<project number>-$USER-genomes-ap` bucket with commands:
Why do the two commands above list a different number of objects?
Try the command (replace `<project number>` with your project number, e.g. 2001234):
This command is actually the same as:
Finally, try the command:
Try opening the public link that `a-flip` produced in your browser.
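For reference, the three `a-put` upload cases in this section might look like the sketch below. It is an assumption based on the case descriptions, not the tutorial's original commands. The wrapper function `upload_three_ways` is hypothetical; `-b` is the `a-put` option for choosing the target bucket.

```shell
# Hypothetical wrapper around the three upload cases.
# $1: directory to upload (e.g. pythium)
# $2: custom bucket name (e.g. <project number>-$USER-genomes-ap)
upload_three_ways() {
    dir="$1"
    bucket="$2"
    # Case 1: the whole directory is given as one argument,
    # so it is stored as a single object
    a-put "$dir" || return 1
    # Case 2: the shell expands the glob, so each subdirectory is
    # passed as its own argument and stored as its own object
    a-put "$dir"/* || return 1
    # Case 3: same as case 2, but into the bucket named with -b
    a-put -b "$bucket" "$dir"/* || return 1
}
```

The difference between cases 1 and 2 comes entirely from shell globbing: `pythium` is one argument, while `pythium/*` becomes one argument per subdirectory.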
Upload case 3: allas-backup
Run the commands:
What did these commands do to your data?
4. Exit
The data in the `pythium` directory is now stored in many ways in Allas, so we can remove the data from Puhti and log out:
5. Downloading data from Allas to Puhti
Log in to Puhti and move to your personal directory in your project's `/scratch`:
In Puhti, check your projects with the command:
Set up the Allas connection:
Then run the commands (we will use the same bucket that was created earlier):
Next, download the data in different ways:
1. Download with rclone
Copy everything:
Copy a set of objects:
Copy just one object:
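The three download variants above could be sketched as follows. This is a minimal sketch under assumptions: the function names are hypothetical, `allas:` is assumed to be the rclone remote set up by the Allas connection, and the bucket is expected to be the `$USER-genomes-rc` bucket from upload case 1. `rclone copy`, `--progress` and `--include` are standard rclone features.

```shell
# Copy everything in a bucket.
# $1: bucket, $2: local destination directory
download_all() {
    rclone copy --progress "allas:$1" "$2"
}

# Copy only the objects whose names match a filter pattern.
# $1: bucket, $2: local destination, $3: pattern (e.g. '*.fasta')
download_matching() {
    rclone copy --progress --include "$3" "allas:$1" "$2"
}

# Copy a single object.
# $1: bucket, $2: local destination, $3: object path inside the bucket
download_one() {
    rclone copy --progress "allas:$1/$3" "$2"
}
```

For example, `download_matching "$USER-genomes-rc" fasta-files '*.fasta'` would fetch only the fasta files. The pattern is quoted so that the local shell does not expand it before rclone sees it.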
2. Download with a-get
Return to your `$USER` directory under your project's `/scratch` on Puhti (the `pwd` command should print `/scratch/<project>/$USER`):
Make a new directory:
Create a directory `all` and move there:
List your default `SCRATCH` bucket (replace `<project number>` with your project number, e.g. 2001234):
Look for the file `pythium_vexans.fasta` in your Puhti `SCRATCH` bucket:
Download the full dataset with the command:
Check what you got:
Now, download just a single genome dataset:
3. Downloading data from allas-backup
Return to your main scratch directory and make a new directory:
Use the commands below to find out the ID of the most recent backup version of your pythium directory:
Use `allas-backup restore` to download the data:
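A restore step along these lines could be used. This is a sketch, not the tutorial's original command: the wrapper function `restore_version` is hypothetical, and the ID passed to it is whatever `allas-backup list` reported for the version you want.

```shell
# Hypothetical helper: restore one backup version.
# $1: ID of the backup version, as shown by `allas-backup list`
restore_version() {
    # Refuse to run without an ID instead of calling restore blindly
    if [ -z "$1" ]; then
        echo "usage: restore_version <backup id>" >&2
        return 1
    fi
    allas-backup restore "$1"
}
```

The ID is deliberately not hard-coded, because it differs for every backup.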