perf-bench(1)
=============

NAME
----
perf-bench - General framework for benchmark suites

SYNOPSIS
--------
[verse]
'perf bench' [<common options>] <subsystem> <suite> [<options>]

DESCRIPTION
-----------
This 'perf bench' command is a general framework for benchmark suites.

COMMON OPTIONS
--------------
-r::
--repeat=::
Specify the number of times to repeat the run (default 10).

-f::
--format=::
Specify format style.
Currently available format styles are:

'default'::
Default style. This is mainly for human reading.
---------------------
% perf bench sched pipe                      # with no style specified
(executing 1000000 pipe operations between two tasks)
        Total time:5.855 sec
                5.855061 usecs/op
                170792 ops/sec
---------------------

'simple'::
This simple style is friendly for automated
processing by scripts.
---------------------
% perf bench --format=simple sched pipe      # simple style specified
5.988
---------------------

SUBSYSTEM
---------

'sched'::
	Scheduler and IPC mechanisms.

'syscall'::
	System call performance (throughput).

'mem'::
	Memory access performance.

'numa'::
	NUMA scheduling and MM benchmarks.

'futex'::
	Futex stressing benchmarks.

'epoll'::
	Eventpoll (epoll) stressing benchmarks.

'internals'::
	Benchmark internal perf functionality.

'uprobe'::
	Benchmark overhead of uprobe + BPF.

'all'::
	All benchmark subsystems.

SUITES FOR 'sched'
~~~~~~~~~~~~~~~~~~
*messaging*::
Suite for evaluating performance of scheduler and IPC mechanisms.
Based on hackbench by Rusty Russell.

Options of *messaging*
^^^^^^^^^^^^^^^^^^^^^^
-p::
--pipe::
Use pipe() instead of socketpair().

-t::
--thread::
Use multiple threads instead of multiple processes.

-g::
--group=::
Specify the number of groups.

-l::
--nr_loops=::
Specify the number of loops.

Example of *messaging*
^^^^^^^^^^^^^^^^^^^^^^

---------------------
% perf bench sched messaging                 # run with default options
(20 sender and receiver processes per group)
(10 groups == 400 processes run)

      Total time:0.308 sec

% perf bench sched messaging -t -g 20        # be multi-thread, with 20 groups
(20 sender and receiver threads per group)
(20 groups == 800 threads run)

      Total time:0.582 sec
---------------------
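The task counts reported in the example follow from the group layout:
each group has 20 senders and 20 receivers, so the total is
groups x 20 x 2. A minimal sketch of that arithmetic, with the hypothetical
helper `messaging_task_count` and defaults taken from the sample output:

```python
def messaging_task_count(groups: int = 10, senders_per_group: int = 20) -> int:
    """Number of processes (or threads with -t) started by sched/messaging."""
    # Each group has as many receivers as senders, hence the factor of 2.
    return groups * senders_per_group * 2


print(messaging_task_count())           # default run: 10 groups
print(messaging_task_count(groups=20))  # run with -g 20
```

The two calls correspond to the "400 processes" and "800 threads" lines in
the example output.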

*pipe*::
Suite for pipe() system call.
Based on pipe-test-1m.c by Ingo Molnar.

Options of *pipe*
^^^^^^^^^^^^^^^^^
-l::
--loop=::
Specify number of loops.

-G::
--cgroups=::
Names of cgroups for sender and receiver, separated by a comma.
This is useful to check cgroup context switching overhead.
Note that perf neither creates nor deletes the cgroups, so users should
make sure that the cgroups exist and are accessible before use.
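Because perf does not create the cgroups, a wrapper script may want to
check for them before running the suite. The sketch below assumes the
cgroups appear as directories directly under a mount point such as
/sys/fs/cgroup (typical for cgroup v2, but not guaranteed on every
system); the helper name `missing_cgroups` and the `root` parameter are
hypothetical.

```python
from pathlib import Path
from typing import List


def missing_cgroups(spec: str, root: str = "/sys/fs/cgroup") -> List[str]:
    """Return the names from a 'sender,receiver' spec that do not exist."""
    names = [name.strip() for name in spec.split(",") if name.strip()]
    # Assumption: each cgroup is a directory below the given mount point.
    return [name for name in names if not (Path(root) / name).is_dir()]
```

An empty result for `missing_cgroups("AAA,BBB")` suggests that
`perf bench sched pipe -G AAA,BBB` can find both groups, although access
permissions still need to be checked separately.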

Example of *pipe*
^^^^^^^^^^^^^^^^^

---------------------
% perf bench sched pipe
(executing 1000000 pipe operations between two tasks)

        Total time:8.091 sec
                8.091833 usecs/op
                123581 ops/sec

% perf bench sched pipe -l 1000              # loop 1000
(executing 1000 pipe operations between two tasks)

        Total time:0.016 sec
                16.948000 usecs/op
                59004 ops/sec

% perf bench sched pipe -G AAA,BBB
(executing 1000000 pipe operations between cgroups)
# Running 'sched/pipe' benchmark:
# Executed 1000000 pipe operations between two processes

     Total time: 6.886 [sec]

       6.886208 usecs/op
         145217 ops/sec

---------------------
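The usecs/op and ops/sec figures in the example are both derived from the
total time and the number of operations. The sketch below reproduces the
sample numbers; it assumes ops/sec is truncated to an integer, which is
consistent with the figures shown but is not stated in this document. The
helper name `pipe_metrics` is hypothetical.

```python
def pipe_metrics(total_sec: float, ops: int):
    """Return (usecs per operation, operations per second)."""
    usecs_per_op = total_sec * 1_000_000 / ops
    # Assumption: the integer ops/sec value is truncated, not rounded.
    ops_per_sec = int(ops / total_sec)
    return usecs_per_op, ops_per_sec


# Total times taken from the three sample runs above.
for total_sec, ops in [(8.091833, 1_000_000), (0.016948, 1000),
                       (6.886208, 1_000_000)]:
    usecs, rate = pipe_metrics(total_sec, ops)
    print(f"{usecs:.6f} usecs/op  {rate} ops/sec")
```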

SUITES FOR 'syscall'
~~~~~~~~~~~~~~~~~~~~
*basic*::
Suite for evaluating performance of core system call throughput
(both usecs/op and ops/sec metrics).
It uses a single thread that repeatedly calls getppid(2), a simple
syscall whose result is not cached by glibc.
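The measurement idea can be illustrated with a short loop around
getppid(). This is a rough sketch only: it is not how perf implements the
suite, the helper name `bench_getppid` is hypothetical, and interpreter
overhead dominates the timings, so the numbers are not comparable with
those reported by 'perf bench syscall basic'.

```python
import os
import time


def bench_getppid(loops: int = 1_000_000):
    """Time repeated getppid() calls; return (usecs/op, ops/sec)."""
    start = time.perf_counter()
    for _ in range(loops):
        os.getppid()
    # Guard against a zero interval on very coarse clocks.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return elapsed * 1_000_000 / loops, loops / elapsed
```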

SUITES FOR 'mem'
~~~~~~~~~~~~~~~~
*memcpy*::
Suite for evaluating performance of simple memory copy in various ways.

Options of *memcpy*
^^^^^^^^^^^^^^^^^^^
-s::
--size::
Specify size of memory to copy (default: 1MB).
Available units are B, KB, MB, GB and TB (case insensitive).

-f::
--function::
Specify function to copy (default: default).
The available functions depend on the architecture.
On x86-64, x86-64-unrolled, x86-64-movsq and x86-64-movsb are supported.

-l::
--nr_loops::
Repeat memcpy invocation this number of times.

-c::
--cycles::
Use perf's cpu-cycles event instead of gettimeofday syscall.

*memset*::
Suite for evaluating performance of simple memory set in various ways.

Options of *memset*
^^^^^^^^^^^^^^^^^^^
-s::
--size::
Specify size of memory to set (default: 1MB).
Available units are B, KB, MB, GB and TB (case insensitive).

-f::
--function::
Specify function to set (default: default).
The available functions depend on the architecture.
On x86-64, x86-64-unrolled, x86-64-stosq and x86-64-stosb are supported.

-l::
--nr_loops::
Repeat memset invocation this number of times.

-c::
--cycles::
Use perf's cpu-cycles event instead of gettimeofday syscall.
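Both suites take a size with a unit suffix. The sketch below shows one way
such a string could be turned into a byte count. It assumes binary
multiples (1KB = 1024 bytes), which this document does not state, and it
requires a unit suffix; the helper name `parse_size` is hypothetical and
this is not perf's own parser.

```python
import re

# Assumption: units are powers of 1024.
_UNITS = {"B": 1, "KB": 1 << 10, "MB": 1 << 20, "GB": 1 << 30, "TB": 1 << 40}


def parse_size(text: str) -> int:
    """Convert a string such as '1MB' or '4kb' to a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]+)\s*", text)
    if not match:
        raise ValueError(f"malformed size: {text!r}")
    number, unit = match.groups()
    try:
        # Units are case insensitive, as noted in the option description.
        return int(number) * _UNITS[unit.upper()]
    except KeyError:
        raise ValueError(f"unknown unit: {unit!r}") from None
```

Under these assumptions the default of 1MB corresponds to 1048576 bytes.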

SUITES FOR 'numa'
~~~~~~~~~~~~~~~~~
*mem*::
Suite for evaluating NUMA workloads.

SUITES FOR 'futex'
~~~~~~~~~~~~~~~~~~
*hash*::
Suite for evaluating hash tables.

*wake*::
Suite for evaluating wake calls.

*wake-parallel*::
Suite for evaluating parallel wake calls.

*requeue*::
Suite for evaluating requeue calls.

*lock-pi*::
Suite for evaluating futex lock_pi calls.

SUITES FOR 'epoll'
~~~~~~~~~~~~~~~~~~
*wait*::
Suite for evaluating concurrent epoll_wait calls.

*ctl*::
Suite for evaluating multiple epoll_ctl calls.

SUITES FOR 'internals'
~~~~~~~~~~~~~~~~~~~~~~
*synthesize*::
Suite for evaluating perf's event synthesis performance.

SEE ALSO
--------
linkperf:perf[1]
