Book a Demo!
CoCalc Logo Icon
StoreFeaturesDocsShareSupportNewsAboutPoliciesSign UpSign In
awilliam
GitHub Repository: awilliam/linux-vfio
Path: blob/master/tools/perf/Documentation/examples.txt
10821 views
1
2
------------------------------
3
****** perf by examples ******
4
------------------------------
5
6
[ From an e-mail by Ingo Molnar, http://lkml.org/lkml/2009/8/4/346 ]
7
8
9
First, discovery/enumeration of available counters can be done via
10
'perf list':
11
12
titan:~> perf list
13
[...]
14
kmem:kmalloc [Tracepoint event]
15
kmem:kmem_cache_alloc [Tracepoint event]
16
kmem:kmalloc_node [Tracepoint event]
17
kmem:kmem_cache_alloc_node [Tracepoint event]
18
kmem:kfree [Tracepoint event]
19
kmem:kmem_cache_free [Tracepoint event]
20
kmem:mm_page_free_direct [Tracepoint event]
21
kmem:mm_pagevec_free [Tracepoint event]
22
kmem:mm_page_alloc [Tracepoint event]
23
kmem:mm_page_alloc_zone_locked [Tracepoint event]
24
kmem:mm_page_pcpu_drain [Tracepoint event]
25
kmem:mm_page_alloc_extfrag [Tracepoint event]
26
27
Then any (or all) of the above event sources can be activated and
28
measured. For example the page alloc/free properties of a 'hackbench
29
run' are:
30
31
titan:~> perf stat -e kmem:mm_page_pcpu_drain -e kmem:mm_page_alloc
32
-e kmem:mm_pagevec_free -e kmem:mm_page_free_direct ./hackbench 10
33
Time: 0.575
34
35
Performance counter stats for './hackbench 10':
36
37
13857 kmem:mm_page_pcpu_drain
38
27576 kmem:mm_page_alloc
39
6025 kmem:mm_pagevec_free
40
20934 kmem:mm_page_free_direct
41
42
0.613972165 seconds time elapsed
43
44
You can observe the statistical properties as well, by using the
45
'repeat the workload N times' feature of perf stat:
46
47
titan:~> perf stat --repeat 5 -e kmem:mm_page_pcpu_drain -e
48
kmem:mm_page_alloc -e kmem:mm_pagevec_free -e
49
kmem:mm_page_free_direct ./hackbench 10
50
Time: 0.627
51
Time: 0.644
52
Time: 0.564
53
Time: 0.559
54
Time: 0.626
55
56
Performance counter stats for './hackbench 10' (5 runs):
57
58
12920 kmem:mm_page_pcpu_drain ( +- 3.359% )
59
25035 kmem:mm_page_alloc ( +- 3.783% )
60
6104 kmem:mm_pagevec_free ( +- 0.934% )
61
18376 kmem:mm_page_free_direct ( +- 4.941% )
62
63
0.643954516 seconds time elapsed ( +- 2.363% )
64
65
Furthermore, these tracepoints can be used to sample the workload as
66
well. For example the page allocations done by a 'git gc' can be
67
captured the following way:
68
69
titan:~/git> perf record -f -e kmem:mm_page_alloc -c 1 ./git gc
70
Counting objects: 1148, done.
71
Delta compression using up to 2 threads.
72
Compressing objects: 100% (450/450), done.
73
Writing objects: 100% (1148/1148), done.
74
Total 1148 (delta 690), reused 1148 (delta 690)
75
[ perf record: Captured and wrote 0.267 MB perf.data (~11679 samples) ]
76
77
To check which functions generated page allocations:
78
79
titan:~/git> perf report
80
# Samples: 10646
81
#
82
# Overhead Command Shared Object
83
# ........ ............... ..........................
84
#
85
23.57% git-repack /lib64/libc-2.5.so
86
21.81% git /lib64/libc-2.5.so
87
14.59% git ./git
88
11.79% git-repack ./git
89
7.12% git /lib64/ld-2.5.so
90
3.16% git-repack /lib64/libpthread-2.5.so
91
2.09% git-repack /bin/bash
92
1.97% rm /lib64/libc-2.5.so
93
1.39% mv /lib64/ld-2.5.so
94
1.37% mv /lib64/libc-2.5.so
95
1.12% git-repack /lib64/ld-2.5.so
96
0.95% rm /lib64/ld-2.5.so
97
0.90% git-update-serv /lib64/libc-2.5.so
98
0.73% git-update-serv /lib64/ld-2.5.so
99
0.68% perf /lib64/libpthread-2.5.so
100
0.64% git-repack /usr/lib64/libz.so.1.2.3
101
102
Or to see it on a more finegrained level:
103
104
titan:~/git> perf report --sort comm,dso,symbol
105
# Samples: 10646
106
#
107
# Overhead Command Shared Object Symbol
108
# ........ ............... .......................... ......
109
#
110
9.35% git-repack ./git [.] insert_obj_hash
111
9.12% git ./git [.] insert_obj_hash
112
7.31% git /lib64/libc-2.5.so [.] memcpy
113
6.34% git-repack /lib64/libc-2.5.so [.] _int_malloc
114
6.24% git-repack /lib64/libc-2.5.so [.] memcpy
115
5.82% git-repack /lib64/libc-2.5.so [.] __GI___fork
116
5.47% git /lib64/libc-2.5.so [.] _int_malloc
117
2.99% git /lib64/libc-2.5.so [.] memset
118
119
Furthermore, call-graph sampling can be done too, of page
120
allocations - to see precisely what kind of page allocations there
121
are:
122
123
titan:~/git> perf record -f -g -e kmem:mm_page_alloc -c 1 ./git gc
124
Counting objects: 1148, done.
125
Delta compression using up to 2 threads.
126
Compressing objects: 100% (450/450), done.
127
Writing objects: 100% (1148/1148), done.
128
Total 1148 (delta 690), reused 1148 (delta 690)
129
[ perf record: Captured and wrote 0.963 MB perf.data (~42069 samples) ]
130
131
titan:~/git> perf report -g
132
# Samples: 10686
133
#
134
# Overhead Command Shared Object
135
# ........ ............... ..........................
136
#
137
23.25% git-repack /lib64/libc-2.5.so
138
|
139
|--50.00%-- _int_free
140
|
141
|--37.50%-- __GI___fork
142
| make_child
143
|
144
|--12.50%-- ptmalloc_unlock_all2
145
| make_child
146
|
147
--6.25%-- __GI_strcpy
148
21.61% git /lib64/libc-2.5.so
149
|
150
|--30.00%-- __GI_read
151
| |
152
| --83.33%-- git_config_from_file
153
| git_config
154
| |
155
[...]
156
157
Or you can observe the whole system's page allocations for 10
158
seconds:
159
160
titan:~/git> perf stat -a -e kmem:mm_page_pcpu_drain -e
161
kmem:mm_page_alloc -e kmem:mm_pagevec_free -e
162
kmem:mm_page_free_direct sleep 10
163
164
Performance counter stats for 'sleep 10':
165
166
171585 kmem:mm_page_pcpu_drain
167
322114 kmem:mm_page_alloc
168
73623 kmem:mm_pagevec_free
169
254115 kmem:mm_page_free_direct
170
171
10.000591410 seconds time elapsed
172
173
Or observe how fluctuating the page allocations are, via statistical
174
analysis done over ten 1-second intervals:
175
176
titan:~/git> perf stat --repeat 10 -a -e kmem:mm_page_pcpu_drain -e
177
kmem:mm_page_alloc -e kmem:mm_pagevec_free -e
178
kmem:mm_page_free_direct sleep 1
179
180
Performance counter stats for 'sleep 1' (10 runs):
181
182
17254 kmem:mm_page_pcpu_drain ( +- 3.709% )
183
34394 kmem:mm_page_alloc ( +- 4.617% )
184
7509 kmem:mm_pagevec_free ( +- 4.820% )
185
25653 kmem:mm_page_free_direct ( +- 3.672% )
186
187
1.058135029 seconds time elapsed ( +- 3.089% )
188
189
Or you can annotate the recorded 'git gc' run on a per symbol basis
190
and check which instructions/source-code generated page allocations:
191
192
titan:~/git> perf annotate __GI___fork
193
------------------------------------------------
194
Percent | Source code & Disassembly of libc-2.5.so
195
------------------------------------------------
196
:
197
:
198
: Disassembly of section .plt:
199
: Disassembly of section .text:
200
:
201
: 00000031a2e95560 <__fork>:
202
[...]
203
0.00 : 31a2e95602: b8 38 00 00 00 mov $0x38,%eax
204
0.00 : 31a2e95607: 0f 05 syscall
205
83.42 : 31a2e95609: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax
206
0.00 : 31a2e9560f: 0f 87 4d 01 00 00 ja 31a2e95762 <__fork+0x202>
207
0.00 : 31a2e95615: 85 c0 test %eax,%eax
208
209
( this shows that 83.42% of __GI___fork's page allocations come from
210
the 0x38 system call it performs. )
211
212
etc. etc. - a lot more is possible. I could list a dozen of
213
other different usecases straight away - neither of which is
214
possible via /proc/vmstat.
215
216
/proc/vmstat is not in the same league really, in terms of
217
expressive power of system analysis and performance
218
analysis.
219
220
All that the above results needed were those new tracepoints
221
in include/tracing/events/kmem.h.
222
223
Ingo
224
225
226
227