GitHub Repository: csc-training/csc-env-eff
Path: blob/master/_slides/SRTFiles/07_Allas_SRT_English_mac.srt
1
00:00:22,250 --> 00:00:26,933
Allas is a CSC service and to use it you need to have a CSC user account 

2
00:00:26,933 --> 00:00:29,716
that is a member of a CSC project.

3
00:00:30,250 --> 00:00:35,299
And then just like for Puhti and Mahti, you apply for the Allas service. 

4
00:00:35,616 --> 00:00:41,116
Allas is available for all project members after they accept the Allas terms of use.

5
00:00:41,783 --> 00:00:44,883
Note that you can use Allas without Puhti or Mahti 

6
00:00:44,883 --> 00:00:48,233
if you only want to store files for your project.

7
00:00:55,700 --> 00:01:01,066
Allas is a general purpose storage system, where files are stored as objects.

8
00:01:01,733 --> 00:01:05,766
It is developed at CSC to provide long-term data storage space

9
00:01:05,766 --> 00:01:08,283
for computing and cloud services.

10
00:01:08,916 --> 00:01:13,950
It is a Ceph-based data storage system - in case anyone asks.

11
00:01:14,166 --> 00:01:17,616
Allas is accessible from the CSC computing environment 

12
00:01:17,616 --> 00:01:19,616
and personal computers as well.

13
00:01:20,599 --> 00:01:24,799
It is meant for storing your datasets during your project lifetime. 

14
00:01:27,116 --> 00:01:32,516
The default Allas storage quota for one CSC project is 10 terabytes. 

15
00:01:33,133 --> 00:01:36,816
If you need to have more quota and you have good reasons for that,

16
00:01:36,816 --> 00:01:38,583
you can apply for more. 

17
00:01:39,166 --> 00:01:42,950
For that you need to send an email to the CSC Service Desk,

18
00:01:42,950 --> 00:01:46,183
but don't be shy to let us know if you really need the space. 

19
00:01:46,450 --> 00:01:52,216
The biggest projects have several hundred terabytes of data stored in Allas.

20
00:01:53,049 --> 00:01:58,116
It is better to use Allas for storing large datasets rather than, for example, 

21
00:01:58,116 --> 00:02:02,633
applying for more scratch space just to keep some data available.

22
00:02:04,099 --> 00:02:07,650
Puhti and Mahti have clients for accessing Allas, 

23
00:02:07,650 --> 00:02:11,416
but Allas is not in any way bound to other CSC services.

24
00:02:12,083 --> 00:02:15,333
You can upload data and use Allas directly from your PC 

25
00:02:15,333 --> 00:02:18,300
or from your organization's own computing system. 

26
00:02:18,966 --> 00:02:24,016
Just note that there you have to install the clients you use yourself.

27
00:02:32,000 --> 00:02:36,500
So Allas is, in a sense, a kind of data hub for CSC.

28
00:02:37,050 --> 00:02:41,433
The illustration shows that you can access Allas from Puhti and Mahti.

29
00:02:41,666 --> 00:02:44,166
Other examples are a weather measurement station

30
00:02:44,166 --> 00:02:47,316
where the sensors send the data directly to Allas, 

31
00:02:47,316 --> 00:02:51,650
a university computer server, a virtual machine in cPouta,

32
00:02:51,650 --> 00:02:55,216
a personal computer, a mobile phone and a web page.

33
00:02:56,433 --> 00:03:01,449
So no other CSC service needs to be involved in moving the data.

34
00:03:01,683 --> 00:03:05,900
As long as you are connected to the internet, you can use Allas.

35
00:03:07,033 --> 00:03:11,133
The usual workflow is to keep the static dataset stored in Allas at all times

36
00:03:11,133 --> 00:03:14,449
and copy it to a computer to do some analyses.

37
00:03:15,050 --> 00:03:18,316
Then upload a new version of the dataset to Allas

38
00:03:18,316 --> 00:03:20,650
if it has been modified somehow. 

39
00:03:22,233 --> 00:03:25,616
Files in Allas can be shared publicly on the internet.

40
00:03:26,400 --> 00:03:29,366
For example, these slides are stored in Allas.

41
00:03:30,083 --> 00:03:33,250
Check the URL in the video description.

42
00:03:40,766 --> 00:03:43,599
Allas is not a file system or disk. 

43
00:03:44,216 --> 00:03:48,516
Many of the interfaces present the data as if it had some hierarchy

44
00:03:48,516 --> 00:03:51,699
 - directories, subdirectories and files - 

45
00:03:51,699 --> 00:03:55,633
but in practice it is just a pile of static data objects.

46
00:03:56,133 --> 00:04:01,333
You can add, read and delete objects, but you are not modifying any data there.

47
00:04:02,183 --> 00:04:04,283
There is no real hierarchy 

48
00:04:04,283 --> 00:04:07,633
 - it is just a place where you have some data blobs stored. 

49
00:04:08,933 --> 00:04:12,583
Allas is also not a data management environment.

50
00:04:13,250 --> 00:04:15,166
The tools for searching data, 

51
00:04:15,166 --> 00:04:18,850
version control or metadata management are minimal.

52
00:04:19,600 --> 00:04:22,600
The basic Allas command line tools can handle maybe 

53
00:04:22,600 --> 00:04:25,000
a few hundred objects in Allas.

54
00:04:25,633 --> 00:04:30,149
Workflows that automatically collect data to Allas and store it for a long time

55
00:04:30,149 --> 00:04:33,733
will end up having hundreds of thousands of small files.

56
00:04:34,416 --> 00:04:39,000
They will need some other tools to keep track of the data stored in Allas. 

57
00:04:39,416 --> 00:04:42,433
You can store huge amounts of data to Allas,

58
00:04:42,433 --> 00:04:45,883
but we recommend setting up a supporting service such as a database

59
00:04:45,883 --> 00:04:50,199
that tells you later on what data is where and how to access it from Allas. 

60
00:04:51,899 --> 00:04:55,266
Also, Allas is not a backup service.

61
00:04:56,149 --> 00:04:59,966
If you ask where to keep a copy of your important files in ProjAppl,

62
00:04:59,966 --> 00:05:02,733
we will suggest making a copy in Allas.

63
00:05:03,449 --> 00:05:06,933
Nevertheless, we want to point out that it is not a full backup,

64
00:05:06,933 --> 00:05:11,350
because you or your colleagues can still accidentally delete the data from Allas. 

65
00:05:12,033 --> 00:05:16,666
All the project members have equal rights to the project's data in Allas.

66
00:05:17,199 --> 00:05:21,033
You have to make sure that your project members know how to use the service 

67
00:05:21,033 --> 00:05:24,000
and agree on files that should not be deleted.

68
00:05:24,733 --> 00:05:28,149
Note that all it takes is just one erroneous command

69
00:05:28,149 --> 00:05:31,733
 - from you or your project members - and the data is lost.

70
00:05:32,300 --> 00:05:35,416
Always keep important backups in a safe place

71
00:05:35,416 --> 00:05:38,666
 - for example on a separate local hard drive.

72
00:05:40,199 --> 00:05:43,033
Allas is not a final resting place for your data

73
00:05:43,033 --> 00:05:47,350
 - the point is that you can have your data there while you are actively working with that data - 

74
00:05:47,350 --> 00:05:49,433
which is sometimes several years. 

75
00:05:56,433 --> 00:05:59,399
Technically, Allas is quite secure.

76
00:06:00,016 --> 00:06:03,300
Data is stored on several servers in Allas.

77
00:06:03,983 --> 00:06:07,616
Individual disk or server failures will not result in data loss

78
00:06:07,616 --> 00:06:10,966
because other disks or servers hold additional copies.

79
00:06:11,649 --> 00:06:14,933
Note that even these multiple copies of data do not help 

80
00:06:14,933 --> 00:06:17,683
if the user deletes the file or object.

81
00:06:18,416 --> 00:06:22,433
We at CSC cannot recover your deleted data from Allas.

82
00:06:24,199 --> 00:06:29,483
Storing data as static objects means that they cannot be modified while they are in Allas.

83
00:06:30,233 --> 00:06:34,516
If you want to edit a file, you have to download it to some other environment,

84
00:06:34,516 --> 00:06:37,033
for example to your local laptop. 

85
00:06:37,899 --> 00:06:42,199
Then, after you edit the document, you can overwrite the file in Allas.

86
00:06:43,716 --> 00:06:47,866
You can use some data management and metadata features included in Allas, 

87
00:06:47,866 --> 00:06:50,300
but they are somewhat limited.

88
00:06:58,283 --> 00:07:01,683
Allas storage space is granted to a CSC project

89
00:07:01,683 --> 00:07:04,216
 - sometimes referred to as an Allas project.

90
00:07:04,866 --> 00:07:09,233
If your project has only one user, then the data is accessible by you only,

91
00:07:09,233 --> 00:07:11,033
but that is rarely the case.

92
00:07:11,899 --> 00:07:16,116
Each project can have up to 1000 so-called buckets.

93
00:07:16,833 --> 00:07:20,050
A bucket is a kind of root directory in Allas.

94
00:07:20,883 --> 00:07:23,916
Some interfaces may refer to buckets as containers,

95
00:07:23,949 --> 00:07:27,083
but that is easily confused with Docker containers!

96
00:07:28,800 --> 00:07:32,683
Each bucket name must be unique across all of Allas.

97
00:07:33,233 --> 00:07:36,183
This is because the bucket names are used if you generate 

98
00:07:36,183 --> 00:07:38,266
public URLs for your buckets. 

99
00:07:38,616 --> 00:07:42,683
Therefore two projects cannot have the same bucket name in use. 

100
00:07:43,133 --> 00:07:46,583
Keep this in mind when creating buckets and include for example 

101
00:07:46,583 --> 00:07:49,166
your project number in the bucket name.

102
00:07:49,483 --> 00:07:54,199
Then you can be quite sure no one else has a bucket with the same name. 

103
00:07:54,483 --> 00:07:59,533
If you try to use a bucket name that is already in use, the system gives you an error message.

104
00:08:06,866 --> 00:08:12,516
Data is stored as so-called objects, which are like static blobs of data.

105
00:08:13,300 --> 00:08:16,933
In general, you can think of one file as one object.

106
00:08:17,699 --> 00:08:21,899
That means that in a normal bucket an object name equals a file name,

107
00:08:21,899 --> 00:08:26,433
and you can pull the file to your environment by referring to the object name.

108
00:08:27,183 --> 00:08:32,483
Larger files, for example, might be automatically stored as several objects in Allas,

109
00:08:32,483 --> 00:08:35,333
but in practice you don't have to worry about that. 

110
00:08:36,100 --> 00:08:40,500
Objects have metadata and users can add or edit their own metadata.

111
00:08:41,233 --> 00:08:44,433
Each bucket can have half a million objects. 

112
00:08:44,750 --> 00:08:49,866
It sure sounds like a lot at first, but if you have an automatic data collection service,

113
00:08:49,866 --> 00:08:52,616
you may end up having this many files.

114
00:08:53,450 --> 00:08:56,866
We ask you not to have this many objects in your buckets,

115
00:08:56,866 --> 00:08:59,516
because it will make the system very slow. 

116
00:08:59,750 --> 00:09:04,133
It is better to create more buckets and spread the files among those.

117
00:09:06,516 --> 00:09:11,266
The one level of hierarchy means that there can be only objects inside buckets, 

118
00:09:11,266 --> 00:09:13,216
not buckets inside buckets. 

119
00:09:13,533 --> 00:09:15,916
You can have object names that look as if

120
00:09:15,916 --> 00:09:18,216
there were some directory structure there.

121
00:09:18,216 --> 00:09:22,250
For example, the object name could be maindir/dataset/data,

122
00:09:22,250 --> 00:09:25,016
where maindir and dataset act like pseudofolders.

123
00:09:25,666 --> 00:09:28,649
Still, there is no real directory structure there

124
00:09:28,649 --> 00:09:32,766
 - instead it is a long object name with directory names and slashes. 

125
00:09:33,649 --> 00:09:37,983
All in all, you may think of your project's Allas space as a home folder.

126
00:09:37,983 --> 00:09:42,366
In the home folder you have up to 1000 folders that have files

127
00:09:42,366 --> 00:09:44,716
 - but not real subdirectories. 

128
00:09:52,149 --> 00:09:55,366
There are several tools to interface with Allas.

129
00:09:56,016 --> 00:10:01,450
Those tools use either the S3 or the Swift protocol for uploading and downloading data.

130
00:10:02,049 --> 00:10:07,066
Both of them have their pros and cons, and you can use either one of them. 

131
00:10:08,683 --> 00:10:14,033
For the end user the biggest difference between S3 and Swift is in the authentication. 

132
00:10:14,733 --> 00:10:19,049
When you open the connection to Allas you have to authenticate first. 

133
00:10:19,383 --> 00:10:25,116
The S3 protocol creates a permanent key for accessing your project's data in Allas.

134
00:10:25,549 --> 00:10:29,333
These keys are always project specific and permanent.

135
00:10:29,333 --> 00:10:32,250
The same key can be used from any client to access

136
00:10:32,250 --> 00:10:35,833
the project's data in Allas until you delete the key from the system. 

137
00:10:35,833 --> 00:10:39,899
It is convenient for a user, but also very insecure.

138
00:10:39,899 --> 00:10:44,566
If anybody steals your key - which is just two random character strings

139
00:10:44,566 --> 00:10:47,549
 - they can access all the data in your project. 

140
00:10:47,549 --> 00:10:52,100
It means they can also delete all the data and you won't even notice it. 

141
00:10:53,733 --> 00:10:57,516
The Swift protocol also uses a random string for authentication,

142
00:10:57,516 --> 00:10:59,283
but it is a temporary token. 

143
00:11:00,049 --> 00:11:04,750
The token is valid only for a limited time - currently eight hours.

144
00:11:05,433 --> 00:11:09,066
After eight hours, or if you close the terminal session, 

145
00:11:09,066 --> 00:11:12,383
you need to generate a new token with your CSC password.

146
00:11:13,166 --> 00:11:18,766
It is perfectly fine to initiate a new connection already before eight hours have passed.

147
00:11:19,366 --> 00:11:23,850
If somebody gets access to the Allas token used by the Swift protocol,

148
00:11:23,866 --> 00:11:26,666
they have at most eight hours to do something.

149
00:11:27,333 --> 00:11:30,383
Then if someone gets your CSC password

150
00:11:30,383 --> 00:11:33,416
 - well, that is bad for many other reasons too.

151
00:11:35,366 --> 00:11:41,049
In the Puhti and Mahti environments we prefer Swift for its safer authentication.

152
00:11:42,633 --> 00:11:47,416
The two protocols also manage metadata in slightly different ways.

153
00:11:47,983 --> 00:11:51,566
They also handle large files a little differently.

154
00:11:52,283 --> 00:11:55,283
The Swift protocol splits your data into smaller pieces

155
00:11:55,283 --> 00:11:58,466
so that you can easily read only part of it if needed.

156
00:11:59,216 --> 00:12:02,883
S3 instead stores everything in one big object.

157
00:12:03,516 --> 00:12:09,433
Because of these differences you should avoid cross-using Swift and S3 based objects.

158
00:12:10,149 --> 00:12:14,083
That means if you have uploaded the data to Allas with one protocol, 

159
00:12:14,083 --> 00:12:17,566
it is better to also read it using the same protocol.

160
00:12:23,916 --> 00:12:28,933
The Allas clients listed on this slide use either the Swift or the S3 protocol.

161
00:12:29,049 --> 00:12:32,933
Many of these tools can actually use both of the protocols. 

162
00:12:34,149 --> 00:12:38,100
In Puhti and Mahti, we are mostly using command line clients

163
00:12:38,100 --> 00:12:42,283
like rclone, swift, s3cmd and a-tools. 

164
00:12:42,916 --> 00:12:48,250
On your local Mac or Linux computer you can also use the same tools in the Terminal.

165
00:12:48,933 --> 00:12:51,083
On Windows and Mac you can use at least

166
00:12:51,083 --> 00:12:56,483
Cyberduck, FileZilla Pro, the Pouta web interface or SD-Connect.

167
00:12:57,333 --> 00:13:00,016
You can also use FUSE-based virtual mounts 

168
00:13:00,016 --> 00:13:03,366
which make one bucket in Allas appear as a directory.

169
00:13:04,133 --> 00:13:07,399
That is handy especially in virtual machines.

170
00:13:07,783 --> 00:13:10,133
It is also very error-prone,

171
00:13:10,133 --> 00:13:14,299
and we suggest you use only the read-only mode with this kind of approach.

172
00:13:23,100 --> 00:13:27,166
First you have to enable the Allas service in MyCSC.

173
00:13:27,633 --> 00:13:31,166
In the Puhti or Mahti environment, Allas is available as a module

174
00:13:31,166 --> 00:13:33,299
which includes the Allas tools.

175
00:13:33,966 --> 00:13:37,950
Load the Allas module with the command module load allas.

176
00:13:38,299 --> 00:13:43,649
Then run the command allas-conf, which by default opens a Swift-based connection.

177
00:13:44,316 --> 00:13:48,799
The configuration process asks you to specify a project. 

178
00:13:49,266 --> 00:13:52,283
The connection stays open for eight hours, and you can start

179
00:13:52,283 --> 00:13:56,149
to figure out what the command was to see your buckets and files.

180
00:13:56,850 --> 00:14:02,216
Check out the material bank for tutorials and a hands-on Allas tutorial video.

181
00:14:11,166 --> 00:14:16,200
A straightforward command line interface that can be used with Allas is rclone.

182
00:14:16,966 --> 00:14:22,316
It works fast and provides functions like move, copy, tree and cat.

183
00:14:23,000 --> 00:14:26,299
You can install it on all operating systems.

184
00:14:26,916 --> 00:14:31,366
Be mindful that rclone overwrites and removes data without asking.

185
00:14:32,016 --> 00:14:36,166
That means you have to know which copy of your file is the newer version.

186
00:14:36,783 --> 00:14:41,783
Rclone does not ask whether you want to overwrite this new file with this older one

187
00:14:41,783 --> 00:14:44,649
 - it just copies what you write in the command.

188
00:14:45,149 --> 00:14:48,049
Also, if you have very large sets of objects,

189
00:14:48,049 --> 00:14:50,383
it does not always function properly. 

190
00:14:51,183 --> 00:14:55,366
We have found that rclone has difficulty listing, for example, datasets

191
00:14:55,366 --> 00:14:58,649
which have tens of thousands of objects in one bucket.

192
00:14:59,549 --> 00:15:05,833
By default, rclone at CSC is configured to use Swift, but S3 can be used as well.

193
00:15:12,766 --> 00:15:17,083
We at CSC have created a-tools, a set of wrapper scripts around the native rclone.

194
00:15:17,799 --> 00:15:22,100
The aim is to make the scripts easier to use in Puhti and Mahti.

195
00:15:22,783 --> 00:15:26,816
For example, it uses default bucket names so you don't have to define them.

196
00:15:27,383 --> 00:15:29,683
If you do not define a bucket name,

197
00:15:29,683 --> 00:15:34,616
it checks whether your data is coming from, e.g., the Scratch directory of Puhti

198
00:15:34,616 --> 00:15:38,566
 - and it creates a Scratch bucket for your project and puts the data there.

199
00:15:39,316 --> 00:15:43,666
A-tools also ask before overwriting or removing data.

200
00:15:44,149 --> 00:15:49,783
The basic use case is that you use data from Puhti or Mahti, make a copy to Allas, 

201
00:15:49,783 --> 00:15:53,583
and later on download the data back to Puhti or Mahti. 

202
00:15:54,200 --> 00:15:59,000
You can install these tools on other Linux and Mac machines as well.

203
00:15:59,633 --> 00:16:04,683
But keep in mind that a-tools are developed with the CSC environment in mind.

204
00:16:05,216 --> 00:16:08,666
For example, a-tools collect a directory into one tar package and

205
00:16:08,666 --> 00:16:11,583
compress it before uploading to Allas.

206
00:16:12,250 --> 00:16:15,483
If you want to then push the data out to some other service, 

207
00:16:15,483 --> 00:16:19,316
it may require extra steps for uncompressing and unpacking the data 

208
00:16:19,316 --> 00:16:21,883
before you can access the files. 

209
00:16:33,149 --> 00:16:36,700
This is also a comparison of a-tools and rclone.

210
00:16:37,500 --> 00:16:43,100
As stated in the previous slide, a-tools work nicely in the CSC environment.

211
00:16:43,950 --> 00:16:47,350
They package the data nicely and allow you to store and move it

212
00:16:47,350 --> 00:16:50,066
in bigger chunks instead of file by file.

213
00:16:50,950 --> 00:16:53,666
The file size is reduced with compression,

214
00:16:53,666 --> 00:16:57,299
so you can store more data while consuming fewer billing units.

215
00:16:59,283 --> 00:17:02,983
With a-tools you don't need to think about bucket names.

216
00:17:02,983 --> 00:17:08,099
The default bucket names refer to Puhti and Mahti directory structures. 

217
00:17:08,883 --> 00:17:12,916
A-tools also prevent you from accidentally overwriting your data. 

218
00:17:14,683 --> 00:17:17,950
The downsides of a-tools are mainly related to the usage

219
00:17:17,950 --> 00:17:20,433
in environments other than CSC's.

220
00:17:21,233 --> 00:17:26,283
Trying to read an object that is created with some other tool is usually complicated.

221
00:17:26,766 --> 00:17:31,783
Objects created with a-tools have an additional ameta metadata object.

222
00:17:31,883 --> 00:17:36,083
A-tools put them in your bucket in addition to the object itself.

223
00:17:44,483 --> 00:17:48,799
Then some practical good-to-know issues concerning Allas in general. 

224
00:17:49,299 --> 00:17:53,083
First of all, there is the eight-hour connection limit with Swift.

225
00:17:53,799 --> 00:17:57,133
It usually is not an issue in normal interactive work, 

226
00:17:57,133 --> 00:18:00,150
but in batch jobs you need to take that into account.

227
00:18:00,516 --> 00:18:05,933
It might be that your batch job does not even start before the eight-hour limit has passed.

228
00:18:06,233 --> 00:18:10,266
In batch jobs you should configure Allas so that it stores your password

229
00:18:10,266 --> 00:18:12,783
in an environment variable in the session.

230
00:18:13,150 --> 00:18:16,366
Then the Allas connection can be refreshed by the batch job 

231
00:18:16,366 --> 00:18:19,950
by using the password from the environment variable.

232
00:18:20,200 --> 00:18:24,166
Check the material bank for a tutorial on how to achieve this.

233
00:18:24,883 --> 00:18:27,450
In Allas you cannot check the quota to see

234
00:18:27,450 --> 00:18:30,933
the maximum amount of data you can have in Allas. 

235
00:18:31,033 --> 00:18:34,200
If your quota has been increased from the 10 TB default,

236
00:18:34,200 --> 00:18:36,816
there is no way to check the quota in Allas. 

237
00:18:36,866 --> 00:18:39,666
You can check the emails from CSC telling you that

238
00:18:39,683 --> 00:18:42,950
you have been granted 50 terabytes of quota. 

239
00:18:43,299 --> 00:18:48,866
If you try to upload data that exceeds the quota, Allas will tell you that the object is too large.

240
00:18:49,266 --> 00:18:53,400
Then you might guess that you have hit the size limit of the Allas area. 

241
00:18:55,299 --> 00:19:00,333
Moving data inside Allas is not possible, at least with the Swift protocol.

242
00:19:00,833 --> 00:19:03,849
If you want to move a dataset from one bucket to another

243
00:19:03,849 --> 00:19:07,566
or from one project to another, in practice you have to download it to 

244
00:19:07,566 --> 00:19:12,083
for example Puhti Scratch and then push it to the new location in Allas. 

245
00:19:12,799 --> 00:19:15,666
That is of course more time-consuming than it would be

246
00:19:15,666 --> 00:19:18,333
to move the data only inside Allas.

247
00:19:20,049 --> 00:19:22,650
Freezing the data means such a read-only mode 

248
00:19:22,650 --> 00:19:25,299
that would prevent others from modifying the data.

249
00:19:26,099 --> 00:19:28,650
To achieve that in Allas you could use a so-called 

250
00:19:28,650 --> 00:19:31,316
'two project protocol' as a workaround. 

251
00:19:32,116 --> 00:19:35,183
The first Allas project is the hosting project. 

252
00:19:35,183 --> 00:19:39,099
The managers have full access rights to the data under that project.

253
00:19:39,099 --> 00:19:42,716
They can set up another project for the users of the data. 

254
00:19:42,916 --> 00:19:46,183
The users don't belong to the data hosting project,

255
00:19:46,183 --> 00:19:48,733
so they don't have full access to all the data.

256
00:19:49,049 --> 00:19:54,733
The managers can grant selected users access to, e.g., one single bucket.

257
00:19:54,833 --> 00:19:57,349
This way you have a more secure way of

258
00:19:57,349 --> 00:20:00,000
sharing your data with your project members.

259
00:20:01,750 --> 00:20:06,900
Another option is that you create a separate front server inside or outside Allas.

260
00:20:07,400 --> 00:20:11,966
With the server you can control the access to data that you have in Allas. 

261
00:20:12,450 --> 00:20:16,233
For example, you can set up a Nextcloud server in cPouta,

262
00:20:16,233 --> 00:20:20,900
and use that to share data with your external collaborators elsewhere.

263
00:20:23,333 --> 00:20:26,716
It is a good idea to learn one Allas interface like a-tools and 

264
00:20:26,716 --> 00:20:29,216
stick with that as much as possible.

265
00:20:29,866 --> 00:20:32,566
If you need to use many different interfaces, 

266
00:20:32,566 --> 00:20:36,483
keep in mind that they may work in slightly different ways.

267
00:20:37,250 --> 00:20:42,283
For example, different Cyberduck versions show the data in slightly different ways.

268
00:20:43,216 --> 00:20:47,766
You may not always see the same buckets and objects there - even though you should

269
00:20:47,766 --> 00:20:51,516
 - because the interfaces interpret the pseudofolders differently.

270
00:21:00,733 --> 00:21:03,633
Whenever you plan to store something in Allas you should think 

271
00:21:03,633 --> 00:21:08,349
whether to store individual files, or to collect them in larger chunks.

272
00:21:08,849 --> 00:21:11,683
Think about how you will use the data.

273
00:21:12,133 --> 00:21:15,799
If you use a set of scripts that consist of three files,

274
00:21:15,799 --> 00:21:18,866
you should collect those files into one tar package,

275
00:21:18,866 --> 00:21:21,666
and upload the package as one object to Allas. 

276
00:21:21,916 --> 00:21:26,933
Then you can download the files as one object, which puts less stress on the system.

277
00:21:28,266 --> 00:21:31,983
Consider also using compression for its benefits. 

278
00:21:32,599 --> 00:21:35,549
It reduces the space needed to store the data, 

279
00:21:35,549 --> 00:21:39,616
and it also reduces the time to move the data in and out of Allas. 

280
00:21:40,316 --> 00:21:45,333
That is an especially good thing if you have a slow or unstable internet connection.

281
00:21:45,566 --> 00:21:50,083
Compression of course takes time, so it may not bring time efficiency 

282
00:21:50,083 --> 00:21:54,083
when moving data between CSC supercomputers and Allas.

283
00:21:55,549 --> 00:21:59,049
You should also think of who can use the data.

284
00:21:59,333 --> 00:22:03,333
By default all project members have equal access to the data. 

285
00:22:03,583 --> 00:22:06,383
Usually that is the ideal situation.

286
00:22:06,833 --> 00:22:12,033
If not, then consider using two projects or another server to manage access.

287
00:22:14,666 --> 00:22:18,766
Note that Allas is not the final resting place of your data. 

288
00:22:19,333 --> 00:22:22,533
When you start a CSC project you should already consider 

289
00:22:22,533 --> 00:22:25,866
what will happen to the project's data when the project ends.

290
00:22:26,616 --> 00:22:29,583
Remember that the project lifetime has to be extended 

291
00:22:29,583 --> 00:22:32,133
yearly if the project still continues.

292
00:22:32,916 --> 00:22:37,099
After the project is finished you need to save everything somewhere else,

293
00:22:37,099 --> 00:22:40,483
because everything will be removed from the CSC environment

294
00:22:40,483 --> 00:22:42,466
 - which means from Allas as well.

295
00:22:44,049 --> 00:22:48,183
Over time you will accumulate large amounts of data in Allas.

296
00:22:48,466 --> 00:22:52,349
It will get difficult to keep track of all the data you have.

297
00:22:53,066 --> 00:22:56,866
You should plan and make rules on how to store the data in the Allas project

298
00:22:56,866 --> 00:22:59,483
already before starting to use Allas.

299
00:22:59,533 --> 00:23:03,983
Especially if there is a whole group pushing files and datasets to Allas, 

300
00:23:03,983 --> 00:23:06,183
it will get messy after a while.

301
00:23:14,316 --> 00:23:18,116
A few words about other data services at CSC. 

302
00:23:18,833 --> 00:23:22,500
Allas is not the only place where you can store data. 

303
00:23:23,166 --> 00:23:28,383
The Fairdata services are more focused services developed with datasets in mind.

304
00:23:29,049 --> 00:23:33,066
You can use them when you have a ready static data set that you want to

305
00:23:33,066 --> 00:23:37,583
store for a longer time and make available to other researchers as well.

306
00:23:38,250 --> 00:23:41,916
Fairdata services consist of three main services. 

307
00:23:42,549 --> 00:23:44,466
IDA is the storage service, 

308
00:23:44,466 --> 00:23:48,316
where you can do things like data freezing and backup copies. 

309
00:23:49,049 --> 00:23:52,616
Linked to IDA there is Qvain for describing the data.

310
00:23:53,150 --> 00:23:56,599
There are rich metadata features and a possibility to create 

311
00:23:56,599 --> 00:24:01,766
a persistent identifier for your data, which you can use in your publications. 

312
00:24:02,450 --> 00:24:06,416
Then there is the Etsin service, which allows external users to look for 

313
00:24:06,416 --> 00:24:09,616
the datasets stored in IDA and described with Qvain.

314
00:24:10,266 --> 00:24:14,250
The datasets do not need to be accessible to everybody. 

315
00:24:15,000 --> 00:24:20,016
People can see that the dataset they need is there in IDA, and who has access to it.

316
00:24:20,683 --> 00:24:25,716
Then they can contact the manager to ask for access to use the dataset.

317
00:24:33,049 --> 00:24:37,283
Sensitive data services at CSC are intended for working with datasets

318
00:24:37,283 --> 00:24:40,416
that contain sensitive or secret material. 

319
00:24:41,133 --> 00:24:46,433
At the core of the service is SD Desktop - Sensitive Data Virtual Desktop.

320
00:24:47,033 --> 00:24:51,016
You can use that to work with sensitive data in a secure way.

321
00:24:51,650 --> 00:24:56,466
The data is imported to the system through an interface called SD-Connect.

322
00:24:56,816 --> 00:24:59,233
The fact that you cannot access the internet

323
00:24:59,233 --> 00:25:02,799
or pull out data from SD Desktop makes it more secure.

324
00:25:03,650 --> 00:25:07,650
It is possible to have a collaboration project where you let your collaborators do

325
00:25:07,650 --> 00:25:12,250
analyses with your data, but never get the data out of your control. 

326
00:25:12,849 --> 00:25:16,349
In practice you can use Allas for storing sensitive data, 

327
00:25:16,349 --> 00:25:20,349
but you always have to encrypt the data before you upload it to Allas.

328
00:25:20,916 --> 00:25:24,066
SD-Connect does encryption automatically and in a way 

329
00:25:24,066 --> 00:25:26,983
which is compatible with SD Desktop.

330
00:25:29,033 --> 00:25:32,299
The tutorials about Allas continue from here. 

331
00:25:32,483 --> 00:25:36,500
They cover the basic use cases with easy-to-follow examples.

332
00:25:36,650 --> 00:25:41,683
Allas documentation covers the introduction to Allas and the technical details.