Path: blob/master/_slides/SRTFiles/02_HPCenv_SRT_English.srt
1 00:00:22,666 --> 00:00:26,199 We assume that a laptop computer is a familiar thing to you, 2 00:00:26,199 --> 00:00:29,350 so this comparison shows how the components of a laptop 3 00:00:29,350 --> 00:00:33,083 and a supercomputer roughly relate to each other. 4 00:00:34,066 --> 00:00:38,116 So your laptop would be something similar to one node in a cluster. 5 00:00:38,766 --> 00:00:43,766 A cluster is a collection of nodes that are linked together with fast connections. 6 00:00:43,950 --> 00:00:48,299 That collection of nodes is what makes a supercomputer so super - 7 00:00:48,299 --> 00:00:50,100 in terms of computing power. 8 00:00:50,583 --> 00:00:52,616 Your laptop has a processor, 9 00:00:52,616 --> 00:00:56,500 and the equivalent of that in a supercomputer is called a socket. 10 00:00:56,783 --> 00:01:00,799 A socket or a processor has CPUs or cores. 11 00:01:01,133 --> 00:01:04,633 The terms CPU and core are used interchangeably. 12 00:01:04,633 --> 00:01:07,349 Very often it's just the core. 13 00:01:07,349 --> 00:01:11,366 Sometimes it's a CPU core and so on. 14 00:01:12,166 --> 00:01:15,983 Simply put, you can think of a supercomputer as a set of computers 15 00:01:15,983 --> 00:01:17,583 that are connected together. 16 00:01:17,583 --> 00:01:20,650 Each of the nodes has its own memory. 17 00:01:20,650 --> 00:01:25,400 Some nodes also have local storage, which is useful in certain workflows. 18 00:01:29,583 --> 00:01:33,333 Typically a computing cluster contains a storage system, 19 00:01:33,333 --> 00:01:35,966 login nodes and compute nodes. 20 00:01:36,233 --> 00:01:41,733 The supercomputer uses the login nodes for connecting with the outside world. 21 00:01:42,083 --> 00:01:48,366 They are like fat laptops that enable users, for example, to browse their files in the supercomputer. 22 00:01:48,633 --> 00:01:54,033 Running heavy computations on login nodes is against the CSC usage policy.
23 00:01:56,833 --> 00:01:59,516 Then there is a central storage system 24 00:01:59,516 --> 00:02:02,733 where users can have their codes and relevant files. 25 00:02:05,133 --> 00:02:09,733 The essence of the supercomputer is, of course, the compute nodes. 26 00:02:09,833 --> 00:02:13,216 You are not supposed to log in to those, but they are dedicated 27 00:02:13,216 --> 00:02:16,550 for actually running the jobs and doing the heavy computing. 28 00:02:17,216 --> 00:02:21,266 Accessing the compute nodes is done through a batch job system. 29 00:02:23,116 --> 00:02:28,150 On CSC machines we use a batch job scheduler called Slurm. 30 00:02:28,633 --> 00:02:33,133 There are other schedulers, for example SGE, Torque or PBS, 31 00:02:33,133 --> 00:02:35,599 and they all serve the same function. 32 00:02:36,250 --> 00:02:39,233 The syntax differs a little between them. 33 00:02:39,233 --> 00:02:42,900 If you copy-paste a batch script from the internet you have to adapt it 34 00:02:42,900 --> 00:02:45,516 to the supercomputer you are using. 35 00:02:45,750 --> 00:02:50,883 Also, a different Slurm cluster may use slightly different syntax than ours. 36 00:02:56,183 --> 00:03:00,383 CSC provides these HPC resources for customers. 37 00:03:00,383 --> 00:03:04,616 Try the links in the slides to open the corresponding documentation. 38 00:03:06,000 --> 00:03:09,250 Puhti is the general-purpose supercomputer. 39 00:03:09,250 --> 00:03:14,383 It has the most pre-installed applications and it can run both serial and parallel jobs. 40 00:03:15,833 --> 00:03:19,099 Mahti is a massively parallel supercomputer. 41 00:03:19,516 --> 00:03:23,483 Then Lumi will have even more resources for parallel computation. 42 00:03:25,949 --> 00:03:29,983 Pouta is an infrastructure-as-a-service cloud resource. 43 00:03:30,616 --> 00:03:33,116 There you set up virtual machines and operate them - 44 00:03:33,116 --> 00:03:35,716 so you get root access to your system.
45 00:03:36,316 --> 00:03:38,316 Rahti is a somewhat similar system, 46 00:03:38,316 --> 00:03:41,566 but everything you run there should be deployed as containers. 47 00:03:42,433 --> 00:03:44,833 Allas is for storing data. 48 00:03:44,900 --> 00:03:49,116 It is managed by CSC and designed for large amounts of data. 49 00:03:49,766 --> 00:03:53,533 One of the reasons for using HPC is that you may have so much data 50 00:03:53,533 --> 00:03:56,300 that you cannot keep it on your own laptop. 51 00:04:02,416 --> 00:04:06,449 This is an open-ended question and the answer depends on your needs. 52 00:04:07,183 --> 00:04:11,233 You should figure out what kind of resources your application can use. 53 00:04:12,266 --> 00:04:16,949 Some software can use only one core, which makes it a serial program. 54 00:04:17,166 --> 00:04:19,250 Serial programs should run in Puhti. 55 00:04:19,866 --> 00:04:22,666 Parallel programs can use more than one core, 56 00:04:22,666 --> 00:04:25,733 which is what Mahti and Lumi are designed for. 57 00:04:26,716 --> 00:04:29,399 Other factors to consider are the memory requirement 58 00:04:29,399 --> 00:04:34,050 and whether the application can benefit from GPUs or fast local disk. 59 00:04:34,833 --> 00:04:38,233 Every job has a limiting stage in the workflow. 60 00:04:38,483 --> 00:04:43,083 Identifying that may give insight into choosing a suitable supercomputer. 61 00:04:43,316 --> 00:04:46,300 We recommend that you find answers to these questions 62 00:04:46,300 --> 00:04:48,983 by investigating your application. 63 00:04:49,933 --> 00:04:53,966 Then you should check which resources are available in the different supercomputers. 64 00:04:54,550 --> 00:04:57,383 If the code that you want to use is already installed, 65 00:04:57,383 --> 00:05:00,566 then probably there are also instructions on usage.
66 00:05:00,566 --> 00:05:03,433 Then you can get started immediately with running the code 67 00:05:03,433 --> 00:05:06,050 rather than installing and learning how it works. 68 00:05:06,833 --> 00:05:09,516 Different machines or different partitions have 69 00:05:09,516 --> 00:05:12,883 different maximum run times or provisioning policies. 70 00:05:13,083 --> 00:05:17,100 Some let you apply for a single core, others only for full nodes, and so on. 71 00:05:17,966 --> 00:05:21,250 We recommend always checking the documentation. 72 00:05:21,533 --> 00:05:26,316 Feel free to contact us if the documentation does not provide you with answers. 73 00:05:33,133 --> 00:05:38,050 Together Puhti and Mahti provide users with extensive HPC resources. 74 00:05:38,283 --> 00:05:43,166 This comparison gives a quick overview that helps you decide which one to use. 75 00:05:43,850 --> 00:05:46,649 Puhti is a more general-use supercomputer 76 00:05:46,649 --> 00:05:50,916 and it also has a lot more applications preinstalled than Mahti. 77 00:05:51,683 --> 00:05:55,699 The links in the slides take you to the lists of the installed applications. 78 00:05:56,266 --> 00:06:00,300 Of course, you can also install your own applications if you want. 79 00:06:01,466 --> 00:06:04,050 The sizes of the nodes are quite different. 80 00:06:04,333 --> 00:06:10,199 In Puhti each node has 40 cores or CPUs, and in Mahti each has 128. 81 00:06:10,899 --> 00:06:17,399 The job size - so how many CPU cores one job can use - starts from one in Puhti. 82 00:06:17,399 --> 00:06:21,449 That means you can run both serial and parallel jobs there. 83 00:06:22,683 --> 00:06:25,600 In Mahti the resources are allocated by nodes, 84 00:06:25,600 --> 00:06:29,083 so the minimum amount that you can get is one full node. 85 00:06:29,316 --> 00:06:31,616 That means your job should be able to use 86 00:06:31,616 --> 00:06:34,600 at least one hundred and twenty-eight cores in parallel.
87 00:06:34,633 --> 00:06:40,116 This is what Mahti is really meant for - to run these massively parallel jobs. 88 00:06:41,350 --> 00:06:46,500 If your job needs a lot of memory, then we have these huge memory nodes in Puhti. 89 00:06:46,699 --> 00:06:51,166 Certain kinds of applications will run a lot faster with a lot of memory. 90 00:06:52,399 --> 00:06:55,983 In Mahti, there's a lot less memory per core available, 91 00:06:55,983 --> 00:06:58,916 but that is enough for most applications. 92 00:06:59,216 --> 00:07:03,883 In comparison, your laptop probably has 8 or 16 gigs of RAM. 93 00:07:06,566 --> 00:07:09,899 Both machines have Nvidia GPU cards. 94 00:07:09,899 --> 00:07:13,949 These are extremely good for certain machine learning jobs. 95 00:07:14,466 --> 00:07:18,016 Later we will cover whether it makes sense to use GPUs 96 00:07:18,016 --> 00:07:20,300 or whether you should use CPUs instead. 97 00:07:21,783 --> 00:07:25,766 In Puhti we have 120 nodes with fast local disk. 98 00:07:26,183 --> 00:07:30,000 On Mahti only the GPU nodes have this local disk. 99 00:07:31,133 --> 00:07:35,033 To summarize: Puhti is a general-purpose supercomputer. 100 00:07:35,366 --> 00:07:39,199 One node in Puhti is around 10 times faster than your laptop, 101 00:07:39,199 --> 00:07:41,516 and there are many of those nodes. 102 00:07:41,949 --> 00:07:44,600 Mahti is for large parallel jobs. 103 00:07:44,600 --> 00:07:49,516 But if you want to use that, then be prepared to install and optimize your code.
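
The Puhti and Mahti comparison above can be sketched as a small rule of thumb. This is only an illustration, not a CSC tool: the function names `nodes_needed` and `suggest_machine` are made up for this example, and the rule looks only at core counts (40 per node in Puhti, 128 per node in Mahti, with Mahti allocating full nodes). It ignores memory, GPUs, local disk and installed software, which the talk says you should also consider.

```python
import math

# Cores per node, as stated in the talk.
CORES_PER_NODE = {"Puhti": 40, "Mahti": 128}


def nodes_needed(machine: str, cores: int) -> int:
    """Return how many nodes a job using `cores` cores would span."""
    if cores < 1:
        raise ValueError("a job needs at least one core")
    return math.ceil(cores / CORES_PER_NODE[machine])


def suggest_machine(cores: int) -> str:
    """Hypothetical rule of thumb based on core count only.

    Mahti allocates full nodes, so a job should be able to use at
    least 128 cores there; anything smaller is suggested for Puhti.
    """
    if cores < 1:
        raise ValueError("a job needs at least one core")
    return "Mahti" if cores >= CORES_PER_NODE["Mahti"] else "Puhti"


if __name__ == "__main__":
    for cores in (1, 40, 128, 300):
        machine = suggest_machine(cores)
        print(cores, "cores:", machine, "using", nodes_needed(machine, cores), "node(s)")
```

A serial program (one core) is suggested for Puhti, while a job that can keep 128 or more cores busy is suggested for Mahti. Treat the output as a starting point and check the documentation for the actual partition limits.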