CoCalc -- 02_HPCenv_SRT_English

GitHub Repository: csc-training/csc-env-eff
Path: blob/master/_slides/SRTFiles/02_HPCenv_SRT_English_mac.srt

1
00:00:22,666 --> 00:00:26,199
We assume that a laptop computer is a familiar thing to you,

2
00:00:26,199 --> 00:00:29,350
so this comparison shows how the components of a laptop

3
00:00:29,350 --> 00:00:33,083
and a supercomputer roughly relate to each other.

4
00:00:34,066 --> 00:00:38,116
So your laptop would be something similar to one node in a cluster.

5
00:00:38,766 --> 00:00:43,766
A cluster is a collection of nodes that are linked together with fast connections.

6
00:00:43,950 --> 00:00:50,100
That collection of nodes is what makes a supercomputer so super in terms of computing power.

7
00:00:50,583 --> 00:00:56,500
Your laptop has a processor and the equivalent of that in a supercomputer is called a socket.

8
00:00:56,783 --> 00:01:00,799
A socket, or processor, has CPUs, or cores.

9
00:01:01,133 --> 00:01:04,633
The terms CPU and core are used interchangeably.

10
00:01:04,633 --> 00:01:07,349
Very often it's just the core.

11
00:01:07,349 --> 00:01:11,366
Sometimes it's a CPU core and so on.

12
00:01:11,783 --> 00:01:17,200
Simply put, you can think of a supercomputer as a set of computers that are connected together.

13
00:01:17,516 --> 00:01:20,900
Each of the nodes has its own memory.

14
00:01:21,066 --> 00:01:25,816
Some nodes also have local storage, which is useful in certain workflows.

15
00:01:29,583 --> 00:01:35,966
Typically a computing cluster contains a storage system, login nodes and compute nodes.

16
00:01:36,233 --> 00:01:41,733
The supercomputer uses the login nodes for connecting with the outside world.

17
00:01:42,083 --> 00:01:48,366
They are like fat laptops that enable users, for example, to browse their files on the supercomputer.

18
00:01:48,633 --> 00:01:54,033
Running heavy computations on login nodes is against the CSC usage policy.

19
00:01:56,833 --> 00:02:02,733
Then there is a central storage system where users can have their codes and relevant files.

20
00:02:05,133 --> 00:02:09,733
The essence of the supercomputer is, of course, the compute nodes.

21
00:02:09,833 --> 00:02:13,216
You are not supposed to log in to those; they are dedicated

22
00:02:13,216 --> 00:02:16,550
for actually running the jobs and doing the heavy computing.

23
00:02:17,216 --> 00:02:21,266
Accessing the compute nodes is done through a batch job system.

24
00:02:23,116 --> 00:02:28,150
On CSC machines we use a batch job scheduler called Slurm.

25
00:02:28,633 --> 00:02:35,599
There are other schedulers, for example SGE, Torque or PBS, and they all serve the same function.

26
00:02:36,250 --> 00:02:39,233
The syntax differs a little between them.

27
00:02:39,233 --> 00:02:42,900
If you copy-paste a batch script from the internet you have to adapt it

28
00:02:42,900 --> 00:02:45,516
to the supercomputer you are using.

29
00:02:45,750 --> 00:02:50,883
Also, a different Slurm cluster may use slightly different syntax than ours.

30
00:02:55,650 --> 00:02:59,849
CSC provides these HPC resources for customers.

31
00:03:00,000 --> 00:03:04,050
Try the links in the slides to open the corresponding documentation.

32
00:03:05,050 --> 00:03:08,300
Puhti is the general purpose supercomputer.

33
00:03:08,483 --> 00:03:14,383
It has the most pre-installed applications and it can run both serial and parallel jobs.

34
00:03:14,933 --> 00:03:18,199
Mahti is a massively parallel supercomputer.

35
00:03:19,050 --> 00:03:23,066
Later in 2021 Lumi will become available with even more resources for parallel computation.

36
00:03:25,949 --> 00:03:29,983
Pouta is an infrastructure-as-a-service cloud resource.

37
00:03:30,616 --> 00:03:34,666
There you set up and operate virtual machines, so you get root access to your system.

38
00:03:36,316 --> 00:03:40,333
Rahti is a somewhat similar system, but everything you run there should be deployed as containers.

39
00:03:42,433 --> 00:03:44,833
Allas is for storing data.

40
00:03:44,900 --> 00:03:49,116
It is managed by CSC and designed for large amounts of data.

41
00:03:50,099 --> 00:03:53,533
One of the reasons for using HPC is that you may have so much data

42
00:03:53,533 --> 00:03:56,300
that you cannot keep it on your own laptop.

43
00:04:02,416 --> 00:04:06,449
This is an open-ended question and the answer depends on your needs.

44
00:04:07,183 --> 00:04:11,233
You should figure out what kind of resources your application can use.

45
00:04:12,266 --> 00:04:16,949
Some software can use only one core, which makes it a serial program.

46
00:04:17,166 --> 00:04:19,250
Serial programs should run on Puhti.

47
00:04:19,866 --> 00:04:25,733
Parallel programs can use more than one core, which is what Mahti and Lumi are designed for.

48
00:04:26,716 --> 00:04:29,399
Other factors to consider are the memory requirement

49
00:04:29,399 --> 00:04:34,050
and whether the application can benefit from GPUs or fast local disk.

50
00:04:34,833 --> 00:04:38,233
Every job has a limiting stage in the workflow.

51
00:04:38,483 --> 00:04:43,083
Identifying that may give insight into choosing a suitable supercomputer.

52
00:04:43,316 --> 00:04:48,983
We recommend that you find answers to these questions by investigating your application.

53
00:04:49,933 --> 00:04:53,966
Then you can see which resources are available in the different supercomputers.

54
00:04:54,550 --> 00:04:57,383
If the code that you want to use is already installed,

55
00:04:57,383 --> 00:05:00,566
then probably there are also instructions on usage.

56
00:05:00,566 --> 00:05:03,433
Then you can get started immediately with running the code

57
00:05:03,433 --> 00:05:06,050
rather than installing and learning how it works.

58
00:05:06,833 --> 00:05:09,516
Different machines or different partitions have

59
00:05:09,516 --> 00:05:12,883
different maximum run times or provisioning policies.

60
00:05:13,083 --> 00:05:17,100
Some let you request a single core, others only full nodes, and so on.

61
00:05:17,966 --> 00:05:21,250
We recommend always checking the documentation.

62
00:05:22,000 --> 00:05:26,666
Feel free to contact us if the documentation does not provide you with answers.

63
00:05:33,133 --> 00:05:38,050
Together Puhti and Mahti provide users with extensive HPC resources.

64
00:05:38,283 --> 00:05:42,316
This comparison gives a quick overview that helps you decide which one to use.

65
00:05:43,850 --> 00:05:46,649
Puhti is a more general-purpose supercomputer

66
00:05:46,649 --> 00:05:50,916
and it also has a lot more applications preinstalled than Mahti.

67
00:05:51,683 --> 00:05:55,699
The links in the slides take you to the lists of the installed applications.

68
00:05:56,266 --> 00:06:00,300
Of course, you can also install your own applications if you want.

69
00:06:01,466 --> 00:06:04,050
The sizes of the nodes are quite different.

70
00:06:04,333 --> 00:06:10,199
In Puhti each node has 40 cores, or CPUs, and in Mahti each has 128.

71
00:06:10,899 --> 00:06:17,399
The job size - so how many CPU cores one job can use - starts from one in Puhti.

72
00:06:17,399 --> 00:06:21,449
That means you can run both serial and parallel jobs there.

73
00:06:22,683 --> 00:06:25,600
In Mahti the resources are allocated by nodes,

74
00:06:25,600 --> 00:06:29,083
so the minimum amount that you can get is one full node.

75
00:06:29,316 --> 00:06:31,616
That means your job should be able to use

76
00:06:31,616 --> 00:06:34,600
at least one hundred and twenty eight cores in parallel.

77
00:06:34,633 --> 00:06:39,750
This is what Mahti is really meant for - to run these massively parallel jobs.

78
00:06:41,350 --> 00:06:46,500
If your job needs a lot of memory, then we have these huge memory nodes in Puhti.

79
00:06:46,699 --> 00:06:51,166
Certain kinds of applications will run a lot faster with a lot of memory.

80
00:06:52,399 --> 00:06:58,916
In Mahti, there's a lot less memory per core available, but that is enough for most applications.

81
00:06:59,216 --> 00:07:03,883
In comparison, your laptop probably has 8 or 16 gigs of RAM.

82
00:07:06,566 --> 00:07:09,899
Both machines have Nvidia GPU cards.

83
00:07:09,899 --> 00:07:13,949
These are extremely good for certain machine learning jobs.

84
00:07:14,466 --> 00:07:20,300
Later we will cover whether it makes sense to use GPUs or whether you should use CPUs instead.

85
00:07:21,783 --> 00:07:25,766
In Puhti we have 120 nodes with fast local disk.

86
00:07:26,183 --> 00:07:30,000
On Mahti only the GPU nodes have this local disk.

87
00:07:30,949 --> 00:07:34,850
To summarize: Puhti is a general purpose supercomputer.

88
00:07:34,899 --> 00:07:41,366
One node in Puhti is around 10 times faster than your laptop, and there are many of those nodes.

89
00:07:41,949 --> 00:07:44,600
Mahti is for large parallel jobs.

90
00:07:44,600 --> 00:07:49,516
But if you want to use that, then be prepared to install and optimize your code.
