Book a Demo!
CoCalc Logo Icon
StoreFeaturesDocsShareSupportNewsAboutPoliciesSign UpSign In
csc-training
GitHub Repository: csc-training/csc-env-eff
Path: blob/master/_slides/SRTFiles/08_Installing_SRT_English_mac.srt
696 views
1
00:00:21,783 --> 00:00:27,116
The take home message is that you can install software on CSC supercomputers. 

2
00:00:27,683 --> 00:00:32,700
There might be an application that is not already installed on Puhti or Mahti.

3
00:00:33,299 --> 00:00:37,333
Or you might have implemented a program yourself that you want to use.

4
00:00:38,116 --> 00:00:43,166
This lecture gives an overview of different types of software and installation methods.

5
00:00:43,950 --> 00:00:50,700
Remember that you can always contact [email protected] for guidance.

6
00:00:51,116 --> 00:00:55,033
Epecially if the program you are trying to install is very complicated 

7
00:00:55,033 --> 00:00:58,416
or requires compiling it from a source code. 

8
00:01:05,633 --> 00:01:10,166
The most important thing to begin with is to read the software documentation. 

9
00:01:10,866 --> 00:01:13,616
How the software is installed depends a lot on 

10
00:01:13,616 --> 00:01:16,516
what type of code you are trying to install. 

11
00:01:16,883 --> 00:01:21,966
It might be just a Python package, or then it is a large parallel program.

12
00:01:22,616 --> 00:01:26,400
Note that in general simple copy and paste options from the internet

13
00:01:26,400 --> 00:01:29,483
do not work in an HPC environment. 

14
00:01:29,900 --> 00:01:34,166
Before you invest a lot of effort and time into trying to install something, 

15
00:01:34,166 --> 00:01:38,483
it is a better option to check out the CSC application list.

16
00:01:38,833 --> 00:01:42,933
Also remember that with command module spider you can see software available

17
00:01:42,933 --> 00:01:46,333
in CSC supercomputing environment.

18
00:01:46,849 --> 00:01:49,616
It may well be that there is some other program

19
00:01:49,633 --> 00:01:53,583
 - similar to the one you would like to use - already installed.

20
00:02:01,133 --> 00:02:07,533
If your application is a single binary file you can try to run it on the CSC supercomputer. 

21
00:02:08,050 --> 00:02:13,199
Usually these binaries may not be suitable or efficient in HPC environments. 

22
00:02:13,816 --> 00:02:17,383
If they are compiled on a laptop they are optimized for a laptop

23
00:02:17,383 --> 00:02:19,733
and not for a supercomputer. 

24
00:02:20,183 --> 00:02:25,550
Especially if your program is massively parallelized - like MPI codes which run on 

25
00:02:25,550 --> 00:02:29,966
multiple nodes - it must be recompiled for the best performance. 

26
00:02:30,583 --> 00:02:35,616
Binary file might be the only option if the source code is not available.

27
00:02:35,866 --> 00:02:41,066
They might work fine if the program has been compiled on an identical HPC system,

28
00:02:41,066 --> 00:02:44,300
or if it is a very simple program that runs for example 

29
00:02:44,300 --> 00:02:47,349
lightweight serial or threaded processes.

30
00:02:56,699 --> 00:03:00,099
Different kind of programs can be categorised based on 

31
00:03:00,099 --> 00:03:02,633
the used programming languages. 

32
00:03:03,116 --> 00:03:07,683
The highest level of programming languages are the interpreted languages.

33
00:03:08,300 --> 00:03:14,166
For example Python, Java, and R, are all interpreted coding languages.

34
00:03:14,800 --> 00:03:17,916
The main advantage of those is that they are easy to program 

35
00:03:17,916 --> 00:03:20,383
and they do not need to be compiled.

36
00:03:21,083 --> 00:03:22,833
If performance is an issue, 

37
00:03:22,833 --> 00:03:27,866
e.g. Python can be compiled to a faster C-code using cython.

38
00:03:28,449 --> 00:03:32,266
Programming and running programs based on interpreted languages 

39
00:03:32,266 --> 00:03:36,933
usually require loading specific modules on the CSC supercomputers.

40
00:03:37,583 --> 00:03:43,599
E.g. Python module enables access to some of the most important Python libraries. 

41
00:03:44,133 --> 00:03:49,166
It also ensures that the software behaves the same on all the compute nodes. 

42
00:03:50,966 --> 00:03:54,849
If the computational problem you are dealing with is very complicated, 

43
00:03:54,849 --> 00:03:58,199
then interpreted languages are usually not the best option, 

44
00:03:58,199 --> 00:04:01,900
because they are not efficient for numerically heavy tasks. 

45
00:04:02,550 --> 00:04:07,449
To do heavy tasks somewhat efficiently with interpreted languages make sure you

46
00:04:07,449 --> 00:04:12,033
utilize libraries to perform some of the most demanding computational tasks.

47
00:04:12,633 --> 00:04:15,866
There might be high performance computing libraries, 

48
00:04:15,866 --> 00:04:21,166
or libraries that have been written using HPC programming languages. 

49
00:04:29,766 --> 00:04:32,883
These programming languages do not run as such, 

50
00:04:32,883 --> 00:04:36,850
but when they are compiled and run, they have great performance.

51
00:04:37,433 --> 00:04:44,100
Examples of HPC languages are e.g. C, C++, and FORTRAN. 

52
00:04:44,683 --> 00:04:50,133
It requires some experience to write efficient codes with HPC languages.

53
00:04:50,733 --> 00:04:55,066
They are not so easy to program and the syntax might not be so simple. 

54
00:04:55,716 --> 00:04:59,133
Most of very resource intensive scientific applications 

55
00:04:59,133 --> 00:05:03,216
have been programmed using HPC languages, because they are good

56
00:05:03,216 --> 00:05:07,250
in heavy number crunching, linear algebra and such tasks.

57
00:05:07,850 --> 00:05:11,483
If someone - a person involved with method development - 

58
00:05:11,483 --> 00:05:13,766
has already implemented the program, 

59
00:05:13,766 --> 00:05:17,016
then researchers need only to compile the code.

60
00:05:17,416 --> 00:05:22,433
If you consider compiling complicated, contact servicedesk.

61
00:05:30,433 --> 00:05:33,100
There are some things to keep in mind when installing

62
00:05:33,100 --> 00:05:36,500
custom software on the CSC supercomputers.

63
00:05:37,149 --> 00:05:40,933
It is not the same as installing something on your own computer. 

64
00:05:41,600 --> 00:05:45,750
For example you can not use sudo for getting root privileges.

65
00:05:46,466 --> 00:05:50,250
Also you can not use package managers like apt or yum.

66
00:05:50,983 --> 00:05:54,800
Then you don't have permissions to install into default locations.

67
00:05:55,566 --> 00:05:58,766
Instead custom installations should go to the PROJAPPL

68
00:05:58,766 --> 00:06:01,166
directory - where you have write access. 

69
00:06:02,733 --> 00:06:05,416
When you start to compile your application,

70
00:06:05,416 --> 00:06:09,500
you should first load required compiler suites or language modules. 

71
00:06:10,016 --> 00:06:15,033
When you install custom software, it is not automatically added to the PATH.

72
00:06:15,649 --> 00:06:18,300
That means you can't run the program from anywhere

73
00:06:18,300 --> 00:06:21,283
without providing the path to the binary file.

74
00:06:31,116 --> 00:06:36,166
If you install software directly to the system it is called native installation.

75
00:06:36,616 --> 00:06:41,383
That would be for example the binary file that you have compiled and run.

76
00:06:41,966 --> 00:06:47,000
It is a good way of installing if the program does not have many dependencies.

77
00:06:55,616 --> 00:06:59,383
Applications can also be available as containers.

78
00:07:00,100 --> 00:07:05,116
Containers provide a convenient way of installing software and its dependencies.

79
00:07:05,733 --> 00:07:08,766
There are many containers available in public.

80
00:07:09,399 --> 00:07:14,449
Please consider using containers even if other installation methods were available.

81
00:07:14,916 --> 00:07:19,250
See the container lecture and the documentation for more information.

82
00:07:28,333 --> 00:07:33,083
Conda is a package manager used especially for many Python packages.

83
00:07:33,716 --> 00:07:39,733
It is very popular, but unfortunately it is also problematic in HPC systems.

84
00:07:40,250 --> 00:07:43,399
A Conda environment contains a huge number of files 

85
00:07:43,399 --> 00:07:46,483
which really slows down Lustre file system.

86
00:07:46,916 --> 00:07:51,300
They are also difficult to maintain because of many dependencies.

87
00:07:51,949 --> 00:07:54,766
Containers are a preferred installation method, 

88
00:07:54,766 --> 00:07:57,866
because they package everything in a single file.

89
00:07:58,333 --> 00:08:04,283
Also Conda installations can be put into a container and used in HPC environment.

90
00:08:13,816 --> 00:08:18,266
Check out the batch job lecture for details on submitting batch jobs.

91
00:08:19,000 --> 00:08:21,866
Test the software with short batch jobs.

92
00:08:22,583 --> 00:08:26,383
It you submit a long and heavy job with no knowledge of how it works,

93
00:08:26,383 --> 00:08:28,699
you are wasting time and resources

94
00:08:28,699 --> 00:08:32,166
 - not only yours but other users and admins as well.

95
00:08:32,783 --> 00:08:36,200
Try to reproduce some results that you know beforehand.

96
00:08:36,933 --> 00:08:40,516
That way you can deduce if the program works as intended.

97
00:08:41,266 --> 00:08:44,983
Some software might have a tutorial that you can run.

98
00:08:45,633 --> 00:08:50,299
Run your tests e.g. in the test queue or in an interactive session.

99
00:08:50,833 --> 00:08:56,366
Check also the efficiency of the code by comparing it to your previous compilations. 

100
00:08:56,366 --> 00:09:01,383
In other words, be sure to benchmark your new software. 

101
00:09:08,683 --> 00:09:13,716
The tutorials about installing software have some easy-to-follow examples.

102
00:09:14,216 --> 00:09:18,733
Be sure to check out the documentation in DocsCSC pages.

103
00:09:19,250 --> 00:09:23,783
There is information on how to compile applications in Puhti or Mahti, 

104
00:09:23,783 --> 00:09:27,066
and how to use different HPC libraries. 

105
00:09:27,533 --> 00:09:31,383
There is also guidelines for loading different compilers. 

106
00:09:32,033 --> 00:09:36,750
For example whether to compile using an Intel or GNU compiler, what kind of

107
00:09:36,750 --> 00:09:43,000
optimization flags you can use, or how to link different HPC libraries to your codes. 

108
00:09:43,533 --> 00:09:47,433
There is of course a lot of information available online. 

109
00:09:47,933 --> 00:09:51,116
If you get some errors or problems in the compilation, 

110
00:09:51,116 --> 00:09:54,483
copy paste the error messages and Google for some tips. 

111
00:09:55,083 --> 00:09:59,616
Also be sure to contact the servicedesk in case of any issues.