GitHub Repository: torvalds/linux
Path: blob/master/tools/perf/Documentation/perf-bench.txt

perf-bench(1)
=============

NAME
----
perf-bench - General framework for benchmark suites

SYNOPSIS
--------
[verse]
'perf bench' [<common options>] <subsystem> <suite> [<options>]

DESCRIPTION
-----------
This 'perf bench' command is a general framework for benchmark suites.

COMMON OPTIONS
--------------
-r::
--repeat=::
Specify number of times to repeat the run (default 10).

-f::
--format=::
Specify format style.
Currently available format styles are:

'default'::
Default style. This is mainly for human reading.
---------------------
% perf bench sched pipe # with no style specified
(executing 1000000 pipe operations between two tasks)
Total time:5.855 sec
5.855061 usecs/op
170792 ops/sec
---------------------

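The three figures in the 'default' output are related by simple arithmetic. The sketch below reproduces them from the elapsed time and the operation count; the function name `pipe_metrics` is hypothetical, and truncating ops/sec to a whole number is an assumption that happens to match the sample figures, not a statement about how perf computes them.

```python
def pipe_metrics(total_sec: float, operations: int) -> tuple[float, int]:
    """Return (microseconds per operation, whole operations per second)."""
    usecs_per_op = total_sec * 1_000_000 / operations
    # Assumption: ops/sec is truncated, which matches the sample output.
    ops_per_sec = int(operations / total_sec)
    return usecs_per_op, ops_per_sec


# Figures taken from the sample run: 1000000 operations in 5.855061 seconds.
usecs, ops = pipe_metrics(5.855061, 1_000_000)
print(f"{usecs:.6f} usecs/op")
print(f"{ops} ops/sec")
```
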
'simple'::
This simple style is friendly for automated
processing by scripts.
---------------------
% perf bench --format=simple sched pipe # specified simple
5.988
---------------------

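A script consuming the 'simple' style might look like the sketch below. The helper names (`build_command`, `parse_simple`, `run_simple`) are hypothetical. The sketch assumes the output contains one line holding a bare number, as in the 'sched pipe' sample; other suites may print something different.

```python
import subprocess


def build_command(subsystem, suite, repeat=None, suite_options=()):
    """Assemble argv in SYNOPSIS order: common options, subsystem, suite, options."""
    argv = ["perf", "bench", "--format=simple"]
    if repeat is not None:
        argv.append(f"--repeat={repeat}")
    argv += [subsystem, suite, *suite_options]
    return argv


def parse_simple(output):
    """Return the first line of `output` that parses as a number."""
    for line in output.splitlines():
        try:
            return float(line.strip())
        except ValueError:
            continue
    raise ValueError("no numeric line found in perf bench output")


def run_simple(subsystem, suite, **kwargs):
    """Run perf bench (requires perf to be installed) and parse its output."""
    argv = build_command(subsystem, suite, **kwargs)
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return parse_simple(result.stdout)


# Parsing the sample output shown above, without invoking perf.
print(parse_simple("5.988\n"))
```

Calling `run_simple("sched", "pipe")` would execute the benchmark itself; it is left out of the example so the snippet runs on machines without perf.
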
SUBSYSTEM
---------

'sched'::
Scheduler and IPC mechanisms.

'syscall'::
System call performance (throughput).

'mem'::
Memory access performance.

'numa'::
NUMA scheduling and MM benchmarks.

'futex'::
Futex stressing benchmarks.

'epoll'::
Eventpoll (epoll) stressing benchmarks.

'internals'::
Benchmark internal perf functionality.

'uprobe'::
Benchmark overhead of uprobe + BPF.

'all'::
All benchmark subsystems.

SUITES FOR 'sched'
~~~~~~~~~~~~~~~~~~
*messaging*::
Suite for evaluating performance of scheduler and IPC mechanisms.
Based on hackbench by Rusty Russell.

Options of *messaging*
^^^^^^^^^^^^^^^^^^^^^^
-p::
--pipe::
Use pipe() instead of socketpair().

-t::
--thread::
Use multiple threads instead of multiple processes.

-g::
--group=::
Specify number of groups.

-l::
--nr_loops=::
Specify number of loops.

Example of *messaging*
^^^^^^^^^^^^^^^^^^^^^^

---------------------
% perf bench sched messaging # run with default options
(20 sender and receiver processes per group)
(10 groups == 400 processes run)

Total time:0.308 sec

% perf bench sched messaging -t -g 20 # be multi-thread, with 20 groups
(20 sender and receiver threads per group)
(20 groups == 800 threads run)

Total time:0.582 sec
---------------------

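The task counts in the example output follow from the group count: each group has 20 senders and 20 receivers. A minimal sketch of that arithmetic, with the hypothetical helper `messaging_tasks` and defaults taken from the sample output above:

```python
def messaging_tasks(groups: int = 10, senders_per_group: int = 20) -> int:
    """Total tasks: every group has as many receivers as senders."""
    return groups * senders_per_group * 2


print(messaging_tasks())           # default run: 10 groups -> 400
print(messaging_tasks(groups=20))  # with -g 20 -> 800
```
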
*pipe*::
Suite for pipe() system call.
Based on pipe-test-1m.c by Ingo Molnar.

Options of *pipe*
^^^^^^^^^^^^^^^^^
-l::
--loop=::
Specify number of loops.

-G::
--cgroups=::
Names of cgroups for sender and receiver, separated by a comma.
This is useful to check cgroup context switching overhead.
Note that perf neither creates nor deletes the cgroups, so users should
make sure that the cgroups exist and are accessible before use.


Example of *pipe*
^^^^^^^^^^^^^^^^^

---------------------
% perf bench sched pipe
(executing 1000000 pipe operations between two tasks)

Total time:8.091 sec
8.091833 usecs/op
123581 ops/sec

% perf bench sched pipe -l 1000 # loop 1000
(executing 1000 pipe operations between two tasks)

Total time:0.016 sec
16.948000 usecs/op
59004 ops/sec

% perf bench sched pipe -G AAA,BBB
(executing 1000000 pipe operations between cgroups)
# Running 'sched/pipe' benchmark:
# Executed 1000000 pipe operations between two processes

Total time: 6.886 [sec]

6.886208 usecs/op
145217 ops/sec

---------------------

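To illustrate what "pipe operations between two tasks" means, the sketch below bounces one byte back and forth over a pair of pipes between two threads. It is not perf's implementation: the function name `pipe_ping_pong` is hypothetical, and the timings include Python interpreter overhead, so they are not comparable with the figures above.

```python
import os
import threading
import time


def pipe_ping_pong(loops: int) -> float:
    """Bounce one byte between two threads `loops` times; return elapsed seconds."""
    ping_r, ping_w = os.pipe()  # main thread -> responder
    pong_r, pong_w = os.pipe()  # responder -> main thread

    def responder():
        # Echo every byte received on the ping pipe back on the pong pipe.
        for _ in range(loops):
            os.write(pong_w, os.read(ping_r, 1))

    worker = threading.Thread(target=responder)
    start = time.perf_counter()
    worker.start()
    for _ in range(loops):
        os.write(ping_w, b"x")
        os.read(pong_r, 1)  # blocks until the responder has answered
    worker.join()
    elapsed = time.perf_counter() - start

    for fd in (ping_r, ping_w, pong_r, pong_w):
        os.close(fd)
    return elapsed


loops = 1000
elapsed = pipe_ping_pong(loops)
print(f"Total time: {elapsed:.3f} sec")
print(f"{elapsed * 1_000_000 / loops:.6f} usecs/op")
```
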
SUITES FOR 'syscall'
~~~~~~~~~~~~~~~~~~~~
*basic*::
Suite for evaluating performance of core system call throughput (both usecs/op and ops/sec metrics).
This uses a single thread simply doing getppid(2), which is a simple syscall where the result is not
cached by glibc.


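The shape of such a measurement can be sketched as a timed loop around a single cheap system call. The helper `getppid_throughput` is hypothetical, and because each iteration also pays for the Python interpreter, the numbers only illustrate how the two metrics relate, not real syscall cost.

```python
import os
import time


def getppid_throughput(loops: int) -> tuple[float, float]:
    """Call getppid() `loops` times; return (usecs per op, ops per sec)."""
    start = time.perf_counter()
    for _ in range(loops):
        os.getppid()
    elapsed = time.perf_counter() - start
    return elapsed * 1_000_000 / loops, loops / elapsed


usecs_per_op, ops_per_sec = getppid_throughput(100_000)
print(f"{usecs_per_op:.6f} usecs/op")
print(f"{ops_per_sec:.0f} ops/sec")
```
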
SUITES FOR 'mem'
~~~~~~~~~~~~~~~~
*memcpy*::
Suite for evaluating performance of simple memory copy in various ways.

Options of *memcpy*
^^^^^^^^^^^^^^^^^^^
-s::
--size::
Specify size of memory to copy (default: 1MB).
Available units are B, KB, MB, GB and TB (case insensitive).

-p::
--page::
Specify page-size for mapping memory buffers (default: 4KB).
Available values are 4KB, 2MB, 1GB (case insensitive).

-k::
--chunk::
Specify the chunk-size for each invocation. (default: 0, or full-extent)
Available units are B, KB, MB, GB and TB (case insensitive).

-f::
--function::
Specify function to copy (default: default).
Available functions depend on the architecture.
On x86-64, x86-64-unrolled, x86-64-movsq and x86-64-movsb are supported.

-l::
--nr_loops::
Repeat memcpy invocation this number of times.

-c::
--cycles::
Use perf's cpu-cycles event instead of gettimeofday syscall.

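The size syntax accepted by '--size' and '--chunk' can be illustrated with a small parser. This is a sketch, not perf's own parsing code: the name `parse_size` is hypothetical, units are assumed to be powers of 1024, and a unit suffix is required here even though perf may be more lenient.

```python
import re

# Assumption: each unit step is a factor of 1024.
_UNITS = {"B": 1, "KB": 1 << 10, "MB": 1 << 20, "GB": 1 << 30, "TB": 1 << 40}


def parse_size(text: str) -> int:
    """Convert strings such as '1MB' or '4kb' to a byte count."""
    match = re.fullmatch(r"\s*(\d+)\s*([KMGT]?B)\s*", text, re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2).upper()]


print(parse_size("1MB"))  # the documented default size
print(parse_size("4kb"))  # units are case insensitive
```
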
*memset*::
Suite for evaluating performance of simple memory set in various ways.

Options of *memset*
^^^^^^^^^^^^^^^^^^^
-s::
--size::
Specify size of memory to set (default: 1MB).
Available units are B, KB, MB, GB and TB (case insensitive).

-p::
--page::
Specify page-size for mapping memory buffers (default: 4KB).
Available values are 4KB, 2MB, 1GB (case insensitive).

-k::
--chunk::
Specify the chunk-size for each invocation. (default: 0, or full-extent)
Available units are B, KB, MB, GB and TB (case insensitive).

-f::
--function::
Specify function to set (default: default).
Available functions depend on the architecture.
On x86-64, x86-64-unrolled, x86-64-stosq and x86-64-stosb are supported.

-l::
--nr_loops::
Repeat memset invocation this number of times.

-c::
--cycles::
Use perf's cpu-cycles event instead of gettimeofday syscall.

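One way to picture the interaction between '--size' and '--chunk' is a buffer that is filled in pieces of at most the chunk size, with a chunk of 0 meaning a single call over the full extent. The sketch below is an illustration under that reading; the helper names are hypothetical, and the handling of a final short chunk is an assumption rather than documented behavior.

```python
def chunk_ranges(size: int, chunk: int = 0):
    """Yield (offset, length) pairs covering `size` bytes."""
    if chunk <= 0 or chunk >= size:
        yield 0, size  # chunk 0: one invocation over the full extent
        return
    for offset in range(0, size, chunk):
        # Assumption: the last piece is shortened to fit the buffer.
        yield offset, min(chunk, size - offset)


def chunked_memset(buf: bytearray, value: int, chunk: int = 0) -> int:
    """Fill `buf` with `value` chunk by chunk; return the number of invocations."""
    calls = 0
    for offset, length in chunk_ranges(len(buf), chunk):
        buf[offset:offset + length] = bytes([value]) * length
        calls += 1
    return calls


buf = bytearray(10)
print(chunked_memset(buf, 0xFF, chunk=4))  # pieces of 4, 4 and 2 bytes
print(chunked_memset(bytearray(10), 0))    # full extent in one call
```
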
*mmap*::
Suite for evaluating memory subsystem performance for mmap()'d memory.

Options of *mmap*
^^^^^^^^^^^^^^^^^
-s::
--size::
Specify size of memory to set (default: 1MB).
Available units are B, KB, MB, GB and TB (case insensitive).

-p::
--page::
Specify page-size for mapping memory buffers (default: 4KB).
Available values are 4KB, 2MB, 1GB (case insensitive).

-r::
--randomize::
Specify seed to randomize page access offset (default: 0, or not randomized).

-f::
--function::
Specify function to set (default: all).
Available functions are 'demand' and 'populate', with the first
demand faulting pages in the region and the second using an eager
mapping.

-l::
--nr_loops::
Repeat mmap() invocation this number of times.

-c::
--cycles::
Use perf's cpu-cycles event instead of gettimeofday syscall.

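The difference between the 'demand' and 'populate' functions can be sketched with an anonymous mapping: in the first case pages are faulted in as they are first touched, in the second the kernel is asked to pre-fault them when the mapping is created. The helper `touch_pages` is hypothetical and is not how perf implements the suite; it assumes a Unix system, and `MAP_POPULATE` is only used where Python's mmap module exposes it (Linux, Python 3.10 or later).

```python
import mmap


def touch_pages(size: int, populate: bool = False) -> int:
    """Map `size` anonymous bytes, write one byte per page, return pages touched."""
    flags = mmap.MAP_PRIVATE
    if populate:
        # Eager mapping where available; falls back to demand faulting otherwise.
        flags |= getattr(mmap, "MAP_POPULATE", 0)
    region = mmap.mmap(-1, size, flags=flags)  # fileno -1: anonymous memory
    try:
        touched = 0
        for offset in range(0, size, mmap.PAGESIZE):
            region[offset] = 1  # the first write to a page faults it in
            touched += 1
        return touched
    finally:
        region.close()


size = 256 * mmap.PAGESIZE
print(touch_pages(size))                 # 'demand' style
print(touch_pages(size, populate=True))  # 'populate' style
```
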
SUITES FOR 'numa'
~~~~~~~~~~~~~~~~~
*mem*::
Suite for evaluating NUMA workloads.

SUITES FOR 'futex'
~~~~~~~~~~~~~~~~~~
*hash*::
Suite for evaluating hash tables.

*wake*::
Suite for evaluating wake calls.

*wake-parallel*::
Suite for evaluating parallel wake calls.

*requeue*::
Suite for evaluating requeue calls.

*lock-pi*::
Suite for evaluating futex lock_pi calls.

SUITES FOR 'epoll'
~~~~~~~~~~~~~~~~~~
*wait*::
Suite for evaluating concurrent epoll_wait calls.

*ctl*::
Suite for evaluating multiple epoll_ctl calls.

SUITES FOR 'internals'
~~~~~~~~~~~~~~~~~~~~~~
*synthesize*::
Suite for evaluating perf's event synthesis performance.

SEE ALSO
--------
linkperf:perf[1]