GitHub Repository: torvalds/linux
Path: blob/master/tools/perf/Documentation/intel-hybrid.txt
Intel hybrid support
--------------------
Support for Intel hybrid events within perf tools.

Some Intel platforms, such as AlderLake, are hybrid platforms that consist
of atom cpus and core cpus. Each cpu type has a dedicated event list. Some
events are available only on core cpus, some only on atom cpus, and some
on both.

The kernel exports two new cpu pmus via sysfs:
/sys/bus/event_source/devices/cpu_core
/sys/bus/event_source/devices/cpu_atom

A 'cpus' file is created under each directory. For example,

  cat /sys/bus/event_source/devices/cpu_core/cpus
  0-15

  cat /sys/bus/event_source/devices/cpu_atom/cpus
  16-23

This indicates that cpu0-cpu15 are core cpus and cpu16-cpu23 are atom cpus.
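
The contents of the 'cpus' files can be turned into a cpu-to-pmu map.
The following is a minimal sketch, not part of perf: the helper names
parse_cpu_list() and cpu_to_pmu() are hypothetical, and the input strings
are the example values shown above rather than values read from a live
system.

```python
def parse_cpu_list(text: str) -> list[int]:
    """Expand a cpu list such as '0-15' or '0,2-4' into cpu numbers."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def cpu_to_pmu(pmu_cpus: dict[str, str]) -> dict[int, str]:
    """Map each cpu number to the name of the pmu that covers it."""
    mapping: dict[int, str] = {}
    for pmu, cpu_list in pmu_cpus.items():
        for cpu in parse_cpu_list(cpu_list):
            mapping[cpu] = pmu
    return mapping


# Example values copied from the 'cat' output above.
mapping = cpu_to_pmu({"cpu_core": "0-15\n", "cpu_atom": "16-23\n"})
print(mapping[0], mapping[16])  # cpu_core cpu_atom
```

On a real system the strings would be read from the two sysfs 'cpus'
files instead of being hard-coded.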

As before, use perf-list to list the symbolic events.

  perf list

  inst_retired.any
        [Fixed Counter: Counts the number of instructions retired. Unit: cpu_atom]
  inst_retired.any
        [Number of instructions retired. Fixed Counter - architectural event. Unit: cpu_core]

'Unit: xxx' is added to the brief description to indicate which pmu
the event belongs to. The same event name can be supported on
different pmus.

Enable hybrid event with a specific pmu

To enable a core-only or atom-only event, the following syntax is supported:

  cpu_core/<event name>/
or
  cpu_atom/<event name>/

For example, count the 'cycles' event on core cpus:

  perf stat -e cpu_core/cycles/

Create two events for one hardware event automatically

When one event is requested and it is available on both atom and core,
two events are created automatically: one for atom and one for core.
Most hardware events and cache events are available on both cpu_core
and cpu_atom.

Hardware events have pre-defined configs (e.g. 0 for cycles). But on a
hybrid platform, the kernel needs to know where the event comes from
(atom or core). The original perf event type PERF_TYPE_HARDWARE can't
carry pmu information, so this type is now extended to be a PMU-aware
type. The PMU type ID is stored in attr.config[63:32].

The PMU type ID is retrieved from sysfs:
/sys/bus/event_source/devices/cpu_atom/type
/sys/bus/event_source/devices/cpu_core/type

The new attr.config layout for PERF_TYPE_HARDWARE:

PERF_TYPE_HARDWARE:                 0xEEEEEEEE000000AA
                                    AA: hardware event ID
                                    EEEEEEEE: PMU type ID
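
The layout above can be sketched as a small packing routine. This is an
illustration only: hw_config() is a hypothetical helper, and the PMU type
IDs 4 and 8 are example values. On a real system the IDs would be read
from the sysfs 'type' files.

```python
def hw_config(pmu_type: int, event_id: int) -> int:
    """Pack a PMU type ID and a hardware event ID into attr.config.

    Layout: 0xEEEEEEEE000000AA
      AA       = hardware event ID (bits 7:0)
      EEEEEEEE = PMU type ID       (bits 63:32)
    """
    if not 0 <= pmu_type <= 0xFFFFFFFF:
        raise ValueError("PMU type ID must fit in 32 bits")
    if not 0 <= event_id <= 0xFF:
        raise ValueError("hardware event ID must fit in 8 bits")
    return (pmu_type << 32) | event_id


# Event ID 0 is 'cycles'; 4 and 8 are example PMU type IDs.
print(hex(hw_config(4, 0)))  # 0x400000000
print(hex(hw_config(8, 0)))  # 0x800000000
```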

Cache events are similar. The type PERF_TYPE_HW_CACHE is extended to be
a PMU-aware type. The PMU type ID is stored in attr.config[63:32].

The new attr.config layout for PERF_TYPE_HW_CACHE:

PERF_TYPE_HW_CACHE:                 0xEEEEEEEE00DDCCBB
                                    BB: hardware cache ID
                                    CC: hardware cache op ID
                                    DD: hardware cache op result ID
                                    EEEEEEEE: PMU type ID
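
The cache layout can be packed the same way. The sketch below uses a
hypothetical helper, hw_cache_config(). The numeric cache, op and result
IDs follow the perf_hw_cache_* enums in the perf_event.h uapi header,
and the PMU type ID 8 is only an example value.

```python
# IDs from the perf_event.h uapi enums (only the ones used here).
PERF_COUNT_HW_CACHE_L1D = 0
PERF_COUNT_HW_CACHE_OP_READ = 0
PERF_COUNT_HW_CACHE_RESULT_MISS = 1


def hw_cache_config(pmu_type: int, cache_id: int, op_id: int,
                    result_id: int) -> int:
    """Pack the fields of a PMU-aware PERF_TYPE_HW_CACHE config.

    Layout: 0xEEEEEEEE00DDCCBB
      BB = cache ID, CC = op ID, DD = op result ID, EEEEEEEE = PMU type ID
    """
    for name, value in (("cache", cache_id), ("op", op_id),
                        ("result", result_id)):
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{name} ID must fit in 8 bits")
    if not 0 <= pmu_type <= 0xFFFFFFFF:
        raise ValueError("PMU type ID must fit in 32 bits")
    return (pmu_type << 32) | (result_id << 16) | (op_id << 8) | cache_id


# L1D read misses on a pmu whose type ID is 8 (example value).
config = hw_cache_config(8, PERF_COUNT_HW_CACHE_L1D,
                         PERF_COUNT_HW_CACHE_OP_READ,
                         PERF_COUNT_HW_CACHE_RESULT_MISS)
print(hex(config))  # 0x800010000
```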

When enabling a hardware event without specifying a pmu, such as
'perf stat -e cycles -a' (system-wide in this example), two events
are created automatically.

  ------------------------------------------------------------
  perf_event_attr:
    size                             120
    config                           0x400000000
    sample_type                      IDENTIFIER
    read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
    disabled                         1
    inherit                          1
    exclude_guest                    1
  ------------------------------------------------------------

and

  ------------------------------------------------------------
  perf_event_attr:
    size                             120
    config                           0x800000000
    sample_type                      IDENTIFIER
    read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
    disabled                         1
    inherit                          1
    exclude_guest                    1
  ------------------------------------------------------------

Both events have type 0, which is PERF_TYPE_HARDWARE.
0x4 in 0x400000000 indicates the cpu_core pmu.
0x8 in 0x800000000 indicates the cpu_atom pmu (the atom pmu type id is
not fixed and may differ between systems).

The kernel creates 'cycles' (0x400000000) on cpu0-cpu15 (core cpus),
and creates 'cycles' (0x800000000) on cpu16-cpu23 (atom cpus).
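
The two config values above can be split back into their fields as
follows. This is a sketch: decode_hw_config() is a hypothetical helper,
and the PMU_NAMES table hard-codes the type IDs from this example, which
would normally come from the sysfs 'type' files.

```python
def decode_hw_config(config: int) -> tuple[int, int]:
    """Split a PMU-aware PERF_TYPE_HARDWARE config into its two fields."""
    pmu_type = config >> 32   # attr.config[63:32]
    event_id = config & 0xFF  # attr.config[7:0]
    return pmu_type, event_id


# Type IDs as seen in this example; the atom ID can differ per system.
PMU_NAMES = {4: "cpu_core", 8: "cpu_atom"}

for config in (0x400000000, 0x800000000):
    pmu_type, event_id = decode_hw_config(config)
    print(f"{config:#x}: pmu={PMU_NAMES[pmu_type]} event_id={event_id}")
```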

The perf-stat result displays two events:

 Performance counter stats for 'system wide':

         6,744,979      cpu_core/cycles/
         1,965,552      cpu_atom/cycles/

The first 'cycles' is the core event, the second 'cycles' is the atom event.

Thread mode example:

perf-stat reports scaled counts for hybrid events, each with a percentage.
The percentage is the event's running time divided by its enabled time.

For example, 'triad_loop' runs on cpu16 (an atom cpu), and the scaled
value for core cycles is 233,066,666 with a percentage of 0.43%.

  perf stat -e cycles \-- taskset -c 16 ./triad_loop

As before, two events are created.

  ------------------------------------------------------------
  perf_event_attr:
    size                             120
    config                           0x400000000
    sample_type                      IDENTIFIER
    read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
    disabled                         1
    inherit                          1
    enable_on_exec                   1
    exclude_guest                    1
  ------------------------------------------------------------

and

  ------------------------------------------------------------
  perf_event_attr:
    size                             120
    config                           0x800000000
    sample_type                      IDENTIFIER
    read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
    disabled                         1
    inherit                          1
    enable_on_exec                   1
    exclude_guest                    1
  ------------------------------------------------------------

 Performance counter stats for 'taskset -c 16 ./triad_loop':

       233,066,666      cpu_core/cycles/                                              (0.43%)
       604,097,080      cpu_atom/cycles/                                              (99.57%)
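
The relationship between a raw count, the scaled count and the
percentage can be sketched as below. This is an approximation of the
usual scaling rule (raw count multiplied by enabled time over running
time), not the perf source. The helper name scale_count() and the input
numbers are hypothetical and are not taken from the run above.

```python
def scale_count(raw: int, time_enabled: int, time_running: int):
    """Return (scaled_count, running_percentage) for one event.

    time_enabled and time_running correspond to the values requested by
    read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING.
    """
    if time_enabled == 0 or time_running == 0:
        # The event never ran, so there is nothing to scale.
        return None, 0.0
    scaled = round(raw * time_enabled / time_running)
    percentage = 100.0 * time_running / time_enabled
    return scaled, percentage


# Hypothetical numbers: the event ran for a quarter of its enabled time.
scaled, pct = scale_count(raw=1_000, time_enabled=200, time_running=50)
print(f"{scaled:,} ({pct:.2f}%)")  # 4,000 (25.00%)
```

A small percentage, as for cpu_core/cycles/ above, means the count was
extrapolated from a short running window and should be read with care.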

perf-record:

If no '-e' is specified in perf record on a hybrid platform, it creates
two default 'cycles' events and adds them to the event list: one for
core and one for atom.

perf-stat:

If no '-e' is specified in perf stat on a hybrid platform, then besides
the software events, the following events are created and added to the
event list in order:

  cpu_core/cycles/,
  cpu_atom/cycles/,
  cpu_core/instructions/,
  cpu_atom/instructions/,
  cpu_core/branches/,
  cpu_atom/branches/,
  cpu_core/branch-misses/,
  cpu_atom/branch-misses/
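
The ordering of the list above (each event once per pmu, core first) can
be reproduced with a short sketch. default_hybrid_events() is a
hypothetical helper for illustration only and does not reflect how perf
builds the list internally.

```python
def default_hybrid_events(
    pmus=("cpu_core", "cpu_atom"),
    events=("cycles", "instructions", "branches", "branch-misses"),
) -> list[str]:
    """Build the default hardware event list, one entry per (event, pmu)."""
    return [f"{pmu}/{event}/" for event in events for pmu in pmus]


# Joined with commas, the result can be passed to 'perf stat -e'.
print(",".join(default_hybrid_events()))
```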

Both perf-stat and perf-record also support enabling a hybrid event
on a specific pmu.

e.g.
  perf stat -e cpu_core/cycles/
  perf stat -e cpu_atom/cycles/
  perf stat -e cpu_core/r1a/
  perf stat -e cpu_atom/L1-icache-loads/
  perf stat -e cpu_core/cycles/,cpu_atom/instructions/
  perf stat -e '{cpu_core/cycles/,cpu_core/instructions/}'

But '{cpu_core/cycles/,cpu_atom/instructions/}' returns a warning and
disables grouping, because the pmus in the group do not match
(cpu_core vs. cpu_atom).
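
The pmu-matching rule for groups can be checked before running perf.
The sketch below is a simplified illustration, not perf's parser: the
helper names group_pmus() and can_group() are hypothetical, and only
events written in the 'pmu/event/' form are recognized.

```python
import re

# Matches 'pmu/.../' and captures the pmu name before the first slash.
_PMU_EVENT = re.compile(r"([A-Za-z0-9_]+)/[^/]*/")


def group_pmus(group: str) -> set[str]:
    """Return the set of pmu names used in a '{...}' event group string."""
    return set(_PMU_EVENT.findall(group))


def can_group(group: str) -> bool:
    """A group is consistent when all of its events name the same pmu."""
    return len(group_pmus(group)) <= 1


print(can_group("{cpu_core/cycles/,cpu_core/instructions/}"))  # True
print(can_group("{cpu_core/cycles/,cpu_atom/instructions/}"))  # False
```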