Path: blob/master/_slides/SRTFiles/08_Installing_SRT_English_mac.srt
696 views
1 00:00:21,783 --> 00:00:27,116 The take home message is that you can install software on CSC supercomputers. 2 00:00:27,683 --> 00:00:32,700 There might be an application that is not already installed on Puhti or Mahti. 3 00:00:33,299 --> 00:00:37,333 Or you might have implemented a program yourself that you want to use. 4 00:00:38,116 --> 00:00:43,166 This lecture gives an overview of different types of software and installation methods. 5 00:00:43,950 --> 00:00:50,700 Remember that you can always contact [email protected] for guidance. 6 00:00:51,116 --> 00:00:55,033 Epecially if the program you are trying to install is very complicated 7 00:00:55,033 --> 00:00:58,416 or requires compiling it from a source code. 8 00:01:05,633 --> 00:01:10,166 The most important thing to begin with is to read the software documentation. 9 00:01:10,866 --> 00:01:13,616 How the software is installed depends a lot on 10 00:01:13,616 --> 00:01:16,516 what type of code you are trying to install. 11 00:01:16,883 --> 00:01:21,966 It might be just a Python package, or then it is a large parallel program. 12 00:01:22,616 --> 00:01:26,400 Note that in general simple copy and paste options from the internet 13 00:01:26,400 --> 00:01:29,483 do not work in an HPC environment. 14 00:01:29,900 --> 00:01:34,166 Before you invest a lot of effort and time into trying to install something, 15 00:01:34,166 --> 00:01:38,483 it is a better option to check out the CSC application list. 16 00:01:38,833 --> 00:01:42,933 Also remember that with command module spider you can see software available 17 00:01:42,933 --> 00:01:46,333 in CSC supercomputing environment. 18 00:01:46,849 --> 00:01:49,616 It may well be that there is some other program 19 00:01:49,633 --> 00:01:53,583 - similar to the one you would like to use - already installed. 20 00:02:01,133 --> 00:02:07,533 If your application is a single binary file you can try to run it on the CSC supercomputer. 21 00:02:08,050 --> 00:02:13,199 Usually these binaries may not be suitable or efficient in HPC environments. 22 00:02:13,816 --> 00:02:17,383 If they are compiled on a laptop they are optimized for a laptop 23 00:02:17,383 --> 00:02:19,733 and not for a supercomputer. 24 00:02:20,183 --> 00:02:25,550 Especially if your program is massively parallelized - like MPI codes which run on 25 00:02:25,550 --> 00:02:29,966 multiple nodes - it must be recompiled for the best performance. 26 00:02:30,583 --> 00:02:35,616 Binary file might be the only option if the source code is not available. 27 00:02:35,866 --> 00:02:41,066 They might work fine if the program has been compiled on an identical HPC system, 28 00:02:41,066 --> 00:02:44,300 or if it is a very simple program that runs for example 29 00:02:44,300 --> 00:02:47,349 lightweight serial or threaded processes. 30 00:02:56,699 --> 00:03:00,099 Different kind of programs can be categorised based on 31 00:03:00,099 --> 00:03:02,633 the used programming languages. 32 00:03:03,116 --> 00:03:07,683 The highest level of programming languages are the interpreted languages. 33 00:03:08,300 --> 00:03:14,166 For example Python, Java, and R, are all interpreted coding languages. 34 00:03:14,800 --> 00:03:17,916 The main advantage of those is that they are easy to program 35 00:03:17,916 --> 00:03:20,383 and they do not need to be compiled. 36 00:03:21,083 --> 00:03:22,833 If performance is an issue, 37 00:03:22,833 --> 00:03:27,866 e.g. Python can be compiled to a faster C-code using cython. 38 00:03:28,449 --> 00:03:32,266 Programming and running programs based on interpreted languages 39 00:03:32,266 --> 00:03:36,933 usually require loading specific modules on the CSC supercomputers. 40 00:03:37,583 --> 00:03:43,599 E.g. Python module enables access to some of the most important Python libraries. 41 00:03:44,133 --> 00:03:49,166 It also ensures that the software behaves the same on all the compute nodes. 42 00:03:50,966 --> 00:03:54,849 If the computational problem you are dealing with is very complicated, 43 00:03:54,849 --> 00:03:58,199 then interpreted languages are usually not the best option, 44 00:03:58,199 --> 00:04:01,900 because they are not efficient for numerically heavy tasks. 45 00:04:02,550 --> 00:04:07,449 To do heavy tasks somewhat efficiently with interpreted languages make sure you 46 00:04:07,449 --> 00:04:12,033 utilize libraries to perform some of the most demanding computational tasks. 47 00:04:12,633 --> 00:04:15,866 There might be high performance computing libraries, 48 00:04:15,866 --> 00:04:21,166 or libraries that have been written using HPC programming languages. 49 00:04:29,766 --> 00:04:32,883 These programming languages do not run as such, 50 00:04:32,883 --> 00:04:36,850 but when they are compiled and run, they have great performance. 51 00:04:37,433 --> 00:04:44,100 Examples of HPC languages are e.g. C, C++, and FORTRAN. 52 00:04:44,683 --> 00:04:50,133 It requires some experience to write efficient codes with HPC languages. 53 00:04:50,733 --> 00:04:55,066 They are not so easy to program and the syntax might not be so simple. 54 00:04:55,716 --> 00:04:59,133 Most of very resource intensive scientific applications 55 00:04:59,133 --> 00:05:03,216 have been programmed using HPC languages, because they are good 56 00:05:03,216 --> 00:05:07,250 in heavy number crunching, linear algebra and such tasks. 57 00:05:07,850 --> 00:05:11,483 If someone - a person involved with method development - 58 00:05:11,483 --> 00:05:13,766 has already implemented the program, 59 00:05:13,766 --> 00:05:17,016 then researchers need only to compile the code. 60 00:05:17,416 --> 00:05:22,433 If you consider compiling complicated, contact servicedesk. 61 00:05:30,433 --> 00:05:33,100 There are some things to keep in mind when installing 62 00:05:33,100 --> 00:05:36,500 custom software on the CSC supercomputers. 63 00:05:37,149 --> 00:05:40,933 It is not the same as installing something on your own computer. 64 00:05:41,600 --> 00:05:45,750 For example you can not use sudo for getting root privileges. 65 00:05:46,466 --> 00:05:50,250 Also you can not use package managers like apt or yum. 66 00:05:50,983 --> 00:05:54,800 Then you don't have permissions to install into default locations. 67 00:05:55,566 --> 00:05:58,766 Instead custom installations should go to the PROJAPPL 68 00:05:58,766 --> 00:06:01,166 directory - where you have write access. 69 00:06:02,733 --> 00:06:05,416 When you start to compile your application, 70 00:06:05,416 --> 00:06:09,500 you should first load required compiler suites or language modules. 71 00:06:10,016 --> 00:06:15,033 When you install custom software, it is not automatically added to the PATH. 72 00:06:15,649 --> 00:06:18,300 That means you can't run the program from anywhere 73 00:06:18,300 --> 00:06:21,283 without providing the path to the binary file. 74 00:06:31,116 --> 00:06:36,166 If you install software directly to the system it is called native installation. 75 00:06:36,616 --> 00:06:41,383 That would be for example the binary file that you have compiled and run. 76 00:06:41,966 --> 00:06:47,000 It is a good way of installing if the program does not have many dependencies. 77 00:06:55,616 --> 00:06:59,383 Applications can also be available as containers. 78 00:07:00,100 --> 00:07:05,116 Containers provide a convenient way of installing software and its dependencies. 79 00:07:05,733 --> 00:07:08,766 There are many containers available in public. 80 00:07:09,399 --> 00:07:14,449 Please consider using containers even if other installation methods were available. 81 00:07:14,916 --> 00:07:19,250 See the container lecture and the documentation for more information. 82 00:07:28,333 --> 00:07:33,083 Conda is a package manager used especially for many Python packages. 83 00:07:33,716 --> 00:07:39,733 It is very popular, but unfortunately it is also problematic in HPC systems. 84 00:07:40,250 --> 00:07:43,399 A Conda environment contains a huge number of files 85 00:07:43,399 --> 00:07:46,483 which really slows down Lustre file system. 86 00:07:46,916 --> 00:07:51,300 They are also difficult to maintain because of many dependencies. 87 00:07:51,949 --> 00:07:54,766 Containers are a preferred installation method, 88 00:07:54,766 --> 00:07:57,866 because they package everything in a single file. 89 00:07:58,333 --> 00:08:04,283 Also Conda installations can be put into a container and used in HPC environment. 90 00:08:13,816 --> 00:08:18,266 Check out the batch job lecture for details on submitting batch jobs. 91 00:08:19,000 --> 00:08:21,866 Test the software with short batch jobs. 92 00:08:22,583 --> 00:08:26,383 It you submit a long and heavy job with no knowledge of how it works, 93 00:08:26,383 --> 00:08:28,699 you are wasting time and resources 94 00:08:28,699 --> 00:08:32,166 - not only yours but other users and admins as well. 95 00:08:32,783 --> 00:08:36,200 Try to reproduce some results that you know beforehand. 96 00:08:36,933 --> 00:08:40,516 That way you can deduce if the program works as intended. 97 00:08:41,266 --> 00:08:44,983 Some software might have a tutorial that you can run. 98 00:08:45,633 --> 00:08:50,299 Run your tests e.g. in the test queue or in an interactive session. 99 00:08:50,833 --> 00:08:56,366 Check also the efficiency of the code by comparing it to your previous compilations. 100 00:08:56,366 --> 00:09:01,383 In other words, be sure to benchmark your new software. 101 00:09:08,683 --> 00:09:13,716 The tutorials about installing software have some easy-to-follow examples. 102 00:09:14,216 --> 00:09:18,733 Be sure to check out the documentation in DocsCSC pages. 103 00:09:19,250 --> 00:09:23,783 There is information on how to compile applications in Puhti or Mahti, 104 00:09:23,783 --> 00:09:27,066 and how to use different HPC libraries. 105 00:09:27,533 --> 00:09:31,383 There is also guidelines for loading different compilers. 106 00:09:32,033 --> 00:09:36,750 For example whether to compile using an Intel or GNU compiler, what kind of 107 00:09:36,750 --> 00:09:43,000 optimization flags you can use, or how to link different HPC libraries to your codes. 108 00:09:43,533 --> 00:09:47,433 There is of course a lot of information available online. 109 00:09:47,933 --> 00:09:51,116 If you get some errors or problems in the compilation, 110 00:09:51,116 --> 00:09:54,483 copy paste the error messages and Google for some tips. 111 00:09:55,083 --> 00:09:59,616 Also be sure to contact the servicedesk in case of any issues.