CPU and latency overheads
-------------------------
There are two notions of time: wall-clock time and CPU time.
For a single-threaded program, or a program running on a single-core machine,
these notions are the same. However, for a multi-threaded/multi-process
program running on a multi-core machine, they differ significantly: each
second of wall-clock time yields number-of-cores seconds of CPU time.
Perf can measure overhead for both of these times (shown in the 'overhead'
and 'latency' columns for CPU and wall-clock time, respectively).

Optimizing CPU overhead is useful to improve 'throughput', while optimizing
latency overhead is useful to improve 'latency'. It's important to understand
which one matters in the concrete situation at hand. For example, the former
may be useful to improve the maximum throughput of a CI build server that
runs at 100% CPU utilization, while the latter may be useful to improve the
user-perceived latency of a single interactive program build.

These overheads may differ significantly in some cases. For example, consider
a program that executes function 'foo' for 9 seconds with 1 thread, and then
executes function 'bar' for 1 second with 128 threads (consuming 128 seconds
of CPU time). The CPU overhead is: 'foo' - 6.6%, 'bar' - 93.4%, while the
latency overhead is: 'foo' - 90%, 'bar' - 10%. If we tried to optimize the
program's running time by looking at the (wrong in this case) CPU overhead,
we would concentrate on function 'bar', but that can yield at most a 10%
improvement in running time.
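
The arithmetic of this example can be sketched as follows. This is purely
illustrative (perf aggregates per sample, not per phase); the phase table
below is taken from the example above:

```python
# Each phase runs a function with some number of threads for some wall time.
# CPU overhead weighs a phase by its CPU time (threads * wall seconds);
# latency overhead weighs it by its wall-clock time alone.
phases = {"foo": (1, 9), "bar": (128, 1)}  # function -> (threads, wall secs)

cpu_time = {f: threads * wall for f, (threads, wall) in phases.items()}
wall_time = {f: wall for f, (threads, wall) in phases.items()}

total_cpu = sum(cpu_time.values())    # 9 + 128 = 137 CPU-seconds
total_wall = sum(wall_time.values())  # 9 + 1 = 10 wall-clock seconds

cpu_overhead = {f: 100 * t / total_cpu for f, t in cpu_time.items()}
latency_overhead = {f: 100 * t / total_wall for f, t in wall_time.items()}

print({f: round(v, 1) for f, v in cpu_overhead.items()})
# {'foo': 6.6, 'bar': 93.4}
print({f: round(v, 1) for f, v in latency_overhead.items()})
# {'foo': 90.0, 'bar': 10.0}
```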

By default, perf shows only CPU overhead. To show latency overhead, use
'perf record --latency' and 'perf report':

-----------------------------------
Overhead  Latency  Command
  93.88%   25.79%  cc1
   1.90%   39.87%  gzip
   0.99%   10.16%  dpkg-deb
   0.57%    1.00%  as
   0.40%    0.46%  sh
-----------------------------------

To sort by latency overhead, use 'perf report --latency':

-----------------------------------
Latency  Overhead  Command
 39.87%     1.90%  gzip
 25.79%    93.88%  cc1
 10.16%     0.99%  dpkg-deb
  4.17%     0.29%  git
  2.81%     0.11%  objtool
-----------------------------------

To get insight into the difference between the overheads, you may check the
parallelization histogram with the '--sort=latency,parallelism,comm,symbol --hierarchy'
flags. It shows the fraction of (wall-clock) time the workload utilizes
different numbers of cores (the 'Parallelism' column). For example, in the
following case the workload utilizes only 1 core most of the time, but also
has some highly parallel phases, which explains the significant difference
between the CPU and wall-clock overheads:

-----------------------------------
Latency  Overhead  Parallelism / Command / Symbol
+ 56.98%    2.29%  1
+ 16.94%    1.36%  2
+  4.00%   20.13%  125
+  3.66%   18.25%  124
+  3.48%   17.66%  126
+  3.26%    0.39%  3
+  2.61%   12.93%  123
-----------------------------------
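
The relationship between the two columns of such a histogram can be sketched
with made-up numbers; the timeline below is hypothetical and is not derived
from the report above:

```python
# Hypothetical timeline of (parallelism, wall seconds) intervals. Latency
# attributes each wall-clock second equally to its parallelism level, while
# CPU overhead weighs each second by the number of busy cores, so short
# highly-parallel phases dominate CPU overhead but not latency.
from collections import defaultdict

timeline = [(1, 57), (2, 17), (3, 3), (125, 4)]

wall = defaultdict(float)
cpu = defaultdict(float)
for par, secs in timeline:
    wall[par] += secs
    cpu[par] += par * secs

total_wall = sum(wall.values())  # 81 wall-clock seconds
total_cpu = sum(cpu.values())    # 57 + 34 + 9 + 500 = 600 CPU-seconds

for par in sorted(wall):
    print(f"parallelism {par:>3}: "
          f"latency {100 * wall[par] / total_wall:5.1f}%, "
          f"CPU overhead {100 * cpu[par] / total_cpu:5.1f}%")
```

Here the 4 seconds spent at parallelism 125 are about 5% of the latency but
over 80% of the CPU overhead, the same shape as the report above.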

By expanding the corresponding lines, you may see what commands/functions
run at a given parallelism level:

-----------------------------------
Latency  Overhead  Parallelism / Command / Symbol
- 56.98%    2.29%  1
   32.80%    1.32%   gzip
    4.46%    0.18%   cc1
    2.81%    0.11%   objtool
    2.43%    0.10%   dpkg-source
    2.22%    0.09%   ld
    2.10%    0.08%   dpkg-genchanges
-----------------------------------

To see the normal function-level profile for particular parallelism levels
(the number of threads actively running on CPUs), you may use the
'--parallelism' filter. For example, to see the profile only for the
low-parallelism phases of a workload, use the '--latency --parallelism=1-2'
flags.
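
Conceptually, such a filter keeps only the samples taken while the given
number of threads was running and builds the usual profile from them. A
minimal sketch of that idea, with entirely hypothetical symbol names and
sample data:

```python
# Hypothetical samples: (symbol, parallelism at sample time). A filter like
# '--parallelism=1-2' conceptually keeps only samples taken while 1 or 2
# threads were running, then builds the usual per-symbol profile from them.
from collections import Counter

samples = [
    ("gzip_deflate", 1), ("gzip_deflate", 1), ("ld_relocate", 1),
    ("cc1_parse", 2), ("cc1_codegen", 64), ("cc1_codegen", 64),
]

low_par = [sym for sym, par in samples if 1 <= par <= 2]
profile = Counter(low_par)

for sym, n in profile.most_common():
    print(f"{100 * n / len(low_par):5.1f}%  {sym}")
# The highly parallel cc1_codegen samples are excluded from this profile.
```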