Book a Demo!
CoCalc Logo Icon
StoreFeaturesDocsShareSupportNewsAboutPoliciesSign UpSign In
allendowney
GitHub Repository: allendowney/cpython
Path: blob/main/Objects/lnotab_notes.txt
12 views
1
Description of the internal format of the line number table
2
3
Conceptually, the line number table consists of a sequence of triples:
4
start-offset (inclusive), end-offset (exclusive), line-number.
5
6
Note that not all byte codes have a line number so we need handle `None` for the line-number.
7
8
However, storing the above sequence directly would be very inefficient as we would need 12 bytes per entry.
9
10
First, note that the end of one entry is the same as the start of the next, so we can overlap entries.
11
Second, we don't really need arbitrary access to the sequence, so we can store deltas.
12
13
We just need to store (end - start, line delta) pairs. The start offset of the first entry is always zero.
14
15
Third, most deltas are small, so we can use a single byte for each value, as long we allow several entries for the same line.
16
17
Consider the following table
18
Start End Line
19
0 6 1
20
6 50 2
21
50 350 7
22
350 360 No line number
23
360 376 8
24
376 380 208
25
26
Stripping the redundant ends gives:
27
28
End-Start Line-delta
29
6 +1
30
44 +1
31
300 +5
32
10 No line number
33
16 +1
34
4 +200
35
36
37
Note that the end - start value is always positive.
38
39
Finally, in order to fit into a single byte we need to convert start deltas to the range 0 <= delta <= 254,
40
and line deltas to the range -127 <= delta <= 127.
41
A line delta of -128 is used to indicate no line number.
42
Also note that a delta of zero indicates that there are no bytecodes in the given range,
43
which means we can use an invalid line number for that range.
44
45
Final form:
46
47
Start delta Line delta
48
6 +1
49
44 +1
50
254 +5
51
46 0
52
10 -128 (No line number, treated as a delta of zero)
53
16 +1
54
0 +127 (line 135, but the range is empty as no bytecodes are at line 135)
55
4 +73
56
57
Iterating over the table.
58
-------------------------
59
60
For the `co_lines` attribute we want to emit the full form, omitting the (350, 360, No line number) and empty entries.
61
62
The code is as follows:
63
64
def co_lines(code):
65
line = code.co_firstlineno
66
end = 0
67
table_iter = iter(code.internal_line_table):
68
for sdelta, ldelta in table_iter:
69
if ldelta == 0: # No change to line number, just accumulate changes to end
70
end += sdelta
71
continue
72
start = end
73
end = start + sdelta
74
if ldelta == -128: # No valid line number -- skip entry
75
continue
76
line += ldelta
77
if end == start: # Empty range, omit.
78
continue
79
yield start, end, line
80
81
82
83
84
The historical co_lnotab format
85
-------------------------------
86
87
prior to 3.10 code objects stored a field named co_lnotab.
88
This was an array of unsigned bytes disguised as a Python bytes object.
89
90
The old co_lnotab did not account for the presence of bytecodes without a line number,
91
nor was it well suited to tracing as a number of workarounds were required.
92
93
The old format can still be accessed via `code.co_lnotab`, which is lazily computed from the new format.
94
95
Below is the description of the old co_lnotab format:
96
97
98
The array is conceptually a compressed list of
99
(bytecode offset increment, line number increment)
100
pairs. The details are important and delicate, best illustrated by example:
101
102
byte code offset source code line number
103
0 1
104
6 2
105
50 7
106
350 207
107
361 208
108
109
Instead of storing these numbers literally, we compress the list by storing only
110
the difference from one row to the next. Conceptually, the stored list might
111
look like:
112
113
0, 1, 6, 1, 44, 5, 300, 200, 11, 1
114
115
The above doesn't really work, but it's a start. An unsigned byte (byte code
116
offset) can't hold negative values, or values larger than 255, a signed byte
117
(line number) can't hold values larger than 127 or less than -128, and the
118
above example contains two such values. (Note that before 3.6, line number
119
was also encoded by an unsigned byte.) So we make two tweaks:
120
121
(a) there's a deep assumption that byte code offsets increase monotonically,
122
and
123
(b) if byte code offset jumps by more than 255 from one row to the next, or if
124
source code line number jumps by more than 127 or less than -128 from one row
125
to the next, more than one pair is written to the table. In case #b,
126
there's no way to know from looking at the table later how many were written.
127
That's the delicate part. A user of co_lnotab desiring to find the source
128
line number corresponding to a bytecode address A should do something like
129
this:
130
131
lineno = addr = 0
132
for addr_incr, line_incr in co_lnotab:
133
addr += addr_incr
134
if addr > A:
135
return lineno
136
if line_incr >= 0x80:
137
line_incr -= 0x100
138
lineno += line_incr
139
140
(In C, this is implemented by PyCode_Addr2Line().) In order for this to work,
141
when the addr field increments by more than 255, the line # increment in each
142
pair generated must be 0 until the remaining addr increment is < 256. So, in
143
the example above, assemble_lnotab in compile.c should not (as was actually done
144
until 2.2) expand 300, 200 to
145
255, 255, 45, 45,
146
but to
147
255, 0, 45, 127, 0, 73.
148
149
The above is sufficient to reconstruct line numbers for tracebacks, but not for
150
line tracing. Tracing is handled by PyCode_CheckLineNumber() in codeobject.c
151
and maybe_call_line_trace() in ceval.c.
152
153
*** Tracing ***
154
155
To a first approximation, we want to call the tracing function when the line
156
number of the current instruction changes. Re-computing the current line for
157
every instruction is a little slow, though, so each time we compute the line
158
number we save the bytecode indices where it's valid:
159
160
*instr_lb <= frame->f_lasti < *instr_ub
161
162
is true so long as execution does not change lines. That is, *instr_lb holds
163
the first bytecode index of the current line, and *instr_ub holds the first
164
bytecode index of the next line. As long as the above expression is true,
165
maybe_call_line_trace() does not need to call PyCode_CheckLineNumber(). Note
166
that the same line may appear multiple times in the lnotab, either because the
167
bytecode jumped more than 255 indices between line number changes or because
168
the compiler inserted the same line twice. Even in that case, *instr_ub holds
169
the first index of the next line.
170
171
However, we don't *always* want to call the line trace function when the above
172
test fails.
173
174
Consider this code:
175
176
1: def f(a):
177
2: while a:
178
3: print(1)
179
4: break
180
5: else:
181
6: print(2)
182
183
which compiles to this:
184
185
2 0 SETUP_LOOP 26 (to 28)
186
>> 2 LOAD_FAST 0 (a)
187
4 POP_JUMP_IF_FALSE 18
188
189
3 6 LOAD_GLOBAL 0 (print)
190
8 LOAD_CONST 1 (1)
191
10 CALL_NO_KW 1
192
12 POP_TOP
193
194
4 14 BREAK_LOOP
195
16 JUMP_ABSOLUTE 2
196
>> 18 POP_BLOCK
197
198
6 20 LOAD_GLOBAL 0 (print)
199
22 LOAD_CONST 2 (2)
200
24 CALL_NO_KW 1
201
26 POP_TOP
202
>> 28 LOAD_CONST 0 (None)
203
30 RETURN_VALUE
204
205
If 'a' is false, execution will jump to the POP_BLOCK instruction at offset 18
206
and the co_lnotab will claim that execution has moved to line 4, which is wrong.
207
In this case, we could instead associate the POP_BLOCK with line 5, but that
208
would break jumps around loops without else clauses.
209
210
We fix this by only calling the line trace function for a forward jump if the
211
co_lnotab indicates we have jumped to the *start* of a line, i.e. if the current
212
instruction offset matches the offset given for the start of a line by the
213
co_lnotab. For backward jumps, however, we always call the line trace function,
214
which lets a debugger stop on every evaluation of a loop guard (which usually
215
won't be the first opcode in a line).
216
217
Why do we set f_lineno when tracing, and only just before calling the trace
218
function? Well, consider the code above when 'a' is true. If stepping through
219
this with 'n' in pdb, you would stop at line 1 with a "call" type event, then
220
line events on lines 2, 3, and 4, then a "return" type event -- but because the
221
code for the return actually falls in the range of the "line 6" opcodes, you
222
would be shown line 6 during this event. This is a change from the behaviour in
223
2.2 and before, and I've found it confusing in practice. By setting and using
224
f_lineno when tracing, one can report a line number different from that
225
suggested by f_lasti on this one occasion where it's desirable.
226
227