GitHub Repository: csc-training/csc-env-eff
Path: blob/master/part-1/allas/allas-bio-data.md

---

layout: default
title: Using Allas with bio data
parent: 7. Allas
grand_parent: Part 1
nav_order: 3
has_children: false
has_toc: false
permalink: /hands-on/allas/allas-tutorial.html

---

Using Allas in CSC's HPC environment

Before the actual exercise, open a view to the Allas service in your browser using the Puhti web interface.

1. Go to https://www.puhti.csc.fi and log in with your CSC account.
2. Configure an Allas S3 connection using the Cloud storage configuration tool.
   - First, authenticate by providing your CSC password.
   - If you have several projects available, choose the one you want to use in this exercise.
3. Once you've configured a connection, select `s3allas-project_<id>` from the Files dropdown menu in the top navigation bar. Replace `<id>` with the number of the project you chose (e.g. 2001234).
4. During the exercise, you can use this web interface to get another view of the buckets and objects in Allas.

1. Login to Puhti

Log in to Puhti with `ssh`:

ssh <username>@puhti.csc.fi    # replace <username> with your CSC username

In Puhti, check your environment with the command:
```
csc-workspaces
```

Move to the `/scratch` directory of your project:

cd /scratch/<project>  # replace <project> with your CSC project, e.g. project_2001234

Create your own subdirectory named with your username:
```
mkdir -p $USER
```
Move to the directory:
```
cd $USER
```

2. Download data with `wget`

Next, download a dataset and uncompress it.
- The dataset contains some Pythium genomes with related BWA indexes.
```
wget https://a3s.fi/course_12.11.2019/pythium.tgz
tar -xzvf pythium.tgz  
tree pythium
```
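If you want to see what the archive contains without unpacking it, `tar -t` lists its entries. A small sketch under that assumption; the helper name `count_entries` is made up for illustration and is not part of the course material:

```bash
# Hypothetical helper: count the entries in a gzip-compressed tar archive
# without extracting it. $1 = archive file
count_entries() {
    tar -tzf "$1" | wc -l | tr -d ' '
}

# Only touch the course archive if it is present in the current directory
if [ -f pythium.tgz ]; then
    tar -tzf pythium.tgz | head     # peek at the first few entries
    count_entries pythium.tgz       # total number of files and directories
fi
```

The count includes directories as well as files, so it will be somewhat larger than the number of genome files.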

3. Using Allas

Open a connection to Allas:
```
module load allas
allas-conf
```
If you have several Allas projects available, select the same project as earlier.

Upload case 1: `rclone`

Upload the data from Puhti to Allas with rclone:
```
rclone -P copyto pythium allas:$USER-genomes-rc/
```
- How long did the data upload take?
- What was the transfer rate?
- How long would it take to transfer 100 GiB assuming the same speed?
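The last question is plain arithmetic: divide the amount of data by the observed rate. A minimal sketch, assuming you read the average rate in MiB/s from the `rclone -P` progress output; the helper name `estimate_seconds` and the 50 MiB/s figure are invented for illustration, not measurements:

```bash
# Hypothetical helper: estimate a transfer time in seconds.
# $1 = data size in GiB, $2 = observed transfer rate in MiB/s
estimate_seconds() {
    awk -v gib="$1" -v rate="$2" 'BEGIN { printf "%.0f\n", gib * 1024 / rate }'
}

# Example with a made-up rate of 50 MiB/s: 100 * 1024 / 50 = 2048 seconds
estimate_seconds 100 50
```

Replace the second argument with the rate you actually observed. Real transfers rarely keep a constant speed, so treat the result as a rough estimate.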

Study what you have uploaded to Allas with the commands:

rclone lsd allas:
rclone ls allas:$USER-genomes-rc/
rclone lsl allas:$USER-genomes-rc/
rclone lsf allas:$USER-genomes-rc/

Check what this looks like in the Puhti web interface. Open a browser and go to https://www.puhti.csc.fi/
In the Puhti web interface, go to the Files app and select `s3allas-project_<id>` to list the buckets of your project (replace `<id>` as needed).
Locate your own `$USER-genomes-rc` bucket and download one of the uploaded fasta files to your local computer.

💡 You can read more about moving files at Docs CSC: Copying files using scp and Moving data with rclone

Upload case 2: `a-put`

Upload the pythium directory from Puhti to Allas using the a-commands.

Case 1: Store everything as a single object (replace <project number> with your CSC project number, e.g. 2001234):

a-put pythium      
a-list
a-list <project number>-puhti-SCRATCH
a-info <project number>-puhti-SCRATCH/$USER/pythium.tar
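Note that the commands above take the bare project number (e.g. 2001234), whereas the scratch path uses the full project name (e.g. project_2001234). A small sketch of deriving one from the other with shell parameter expansion; the helper name `project_number` is hypothetical:

```bash
# Hypothetical helper: strip the "project_" prefix from a CSC project name.
# $1 = project name, e.g. project_2001234
project_number() {
    printf '%s\n' "${1#project_}"
}

project_number project_2001234

# On Puhti, the result could be used to build the bucket name, for example:
# a-list "$(project_number project_2001234)-puhti-SCRATCH"
```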

Case 2: Each subdirectory (species) as a separate object (replace <project number> with your CSC project number, e.g. 2001234):

a-put pythium/*
a-list <project number>-puhti-SCRATCH 
a-check pythium/*
a-info <project number>-puhti-SCRATCH/$USER/pythium/pythium_vexans.tar

Case 3: Use a custom bucket name (replace <project number> with your project number, e.g. 2001234):

a-put pythium/* -b <project number>-$USER-genomes-ap
a-list <project number>-$USER-genomes-ap

Can you see the difference between the three a-put commands above?
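One way to compare the three cases is to write down where the `pythium_vexans` data ends up in each. The sketch below only prints the bucket and object names that appear in this tutorial; it does not query Allas, and the helper name `show_cases` is made up for illustration:

```bash
# Hypothetical helper: print the bucket/object names used in this tutorial
# for the three a-put cases. $1 = project number, $2 = username
show_cases() {
    printf 'case 1: %s-puhti-SCRATCH/%s/pythium.tar\n' "$1" "$2"
    printf 'case 2: %s-puhti-SCRATCH/%s/pythium/pythium_vexans.tar\n' "$1" "$2"
    printf 'case 3: %s-%s-genomes-ap/pythium_vexans.tar\n' "$1" "$2"
}

# Falls back to a placeholder username if $USER is not set
show_cases 2001234 "${USER:-username}"
```

Use `a-list` to confirm the actual names in your own project.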

Study the `<project number>-$USER-genomes-ap` bucket with the commands:

a-list <project number>-$USER-genomes-ap
rclone ls allas:<project number>-$USER-genomes-ap

Why do the two commands above list a different number of objects?
Try the command (replace <project number> with your project number, e.g. 2001234):
```
a-info <project number>-$USER-genomes-ap/pythium_vexans.tar
```

This command is actually the same as:

rclone cat allas:<project number>-$USER-genomes-ap/pythium_vexans.tar_ameta
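The command above reads a separate object whose name ends in `_ameta`, which is presumably why `rclone ls` shows more objects than `a-list`. A minimal sketch of hiding those metadata objects from a listing; the filter name `hide_ameta` and the sample sizes are invented for illustration:

```bash
# Hypothetical filter: drop objects whose names end in _ameta
# from a listing read on standard input.
hide_ameta() {
    grep -v '_ameta$'
}

# Invented sample in the "size name" layout that rclone ls prints
printf '%s\n' '  1024 pythium_vexans.tar' '    64 pythium_vexans.tar_ameta' | hide_ameta

# On Puhti, the same filter could be applied to the real listing:
# rclone ls allas:<project number>-$USER-genomes-ap | hide_ameta
```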

Finally, try the command:

a-flip pythium/pythium_vexans/pythium_vexans.fasta

Try opening the public link that `a-flip` produced with your browser.

Upload case 3: `allas-backup`

Run the commands:

allas-backup -help
allas-backup pythium
allas-backup list

What did these commands do to your data?

4. Exit

The data in the pythium directory is now stored in many ways in Allas, so we can remove the data from Puhti and log out:
```
rm -r pythium
exit
```

5. Downloading data from Allas to Puhti

ssh <username>@puhti.csc.fi   # replace <username> with your CSC username
cd /scratch/<project>/$USER   # replace <project> with your CSC project, e.g. project_2001234

In Puhti, check your projects with the command:
```
csc-workspaces
```
Set up the Allas connection:
```
module load allas
allas-conf 
```

Then run the commands (we will use the same bucket that was created earlier):

a-list
rclone lsd allas:
# replace <project number> with your project number, e.g. 2001234
a-list <project number>-$USER-genomes-ap
rclone ls allas:<project number>-$USER-genomes-ap
a-find pythium_vexans.fasta
a-find -a pythium_vexans.fasta

Next, download the data in different ways:

1. Download with `rclone`

Copy everything:

mkdir rclone_dir
cd rclone_dir/
mkdir all
rclone ls allas:<project number>-$USER-genomes-ap
rclone copyto -P allas:<project number>-$USER-genomes-ap all/
ls all

Copy a set of objects:

mkdir vexans 
rclone copyto allas:$USER-genomes-rc/pythium_vexans vexans/
ls vexans

Copy just one object:

rclone copyto allas:$USER-genomes-rc/pythium_vexans/pythium_vexans.fasta ./vexans.fasta
ls

2. Download with `a-get`

Return to your `$USER` directory under your project's `/scratch` on Puhti (the `pwd` command should print `/scratch/<project>/$USER`):
```
cd ..
pwd
```
Make a new directory:
```
mkdir a_dir
cd a_dir/
```
Create a directory all and move there:
```
mkdir all
cd all
```
List your default SCRATCH bucket (replace <project number> with your project number, e.g. 2001234):
```
a-list <project number>-puhti-SCRATCH
a-list <project number>-puhti-SCRATCH/$USER
```

Look for the file pythium_vexans.fasta in your Puhti SCRATCH bucket:

a-find pythium_vexans.fasta -b <project number>-puhti-SCRATCH    # replace <project number> with your project number, e.g. 2001234

Download the full dataset with the command:

a-get <project number>-puhti-SCRATCH/$USER/pythium.tar   # replace <project number> with your project number, e.g. 2001234

Check what you got:
```
ls -l
ls -R
```
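For a quicker overview than a recursive listing, you can count the downloaded files and pick out the FASTA files. A small sketch; the helper name `count_files` is hypothetical:

```bash
# Hypothetical helper: count regular files below a directory
# (defaults to the current directory). $1 = directory
count_files() {
    find "${1:-.}" -type f | wc -l | tr -d ' '
}

count_files .                       # number of files below the current directory
find . -type f -name '*.fasta'      # list only the FASTA files
```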

Now, download just a single genome dataset:

cd ..
a-get <project number>-puhti-SCRATCH/$USER/pythium/pythium_vexans.tar   # replace <project number> with your project number, e.g. 2001234
ls -l pythium/
ls -l pythium/pythium_vexans/

3. Downloading data from `allas-backup`

Return to your `$USER` directory under `/scratch` and make a new directory:
```
cd ..
mkdir a_backup
cd a_backup/
```
Use the commands below to find out the ID of the most recent backup version of your pythium directory:
```
allas-backup list 
allas-backup list | grep $USER
```

Use allas-backup restore to download the data:

allas-backup restore <id string>   # replace <id string> with the ID of your backup snapshot
ls -l
ls -l pythium
