GitHub Repository: bytecodealliance/wasmtime
Path: blob/main/cranelift/codegen/src/machinst/buffer.rs
1
//! In-memory representation of compiled machine code, with labels and fixups to
2
//! refer to those labels. Handles constant-pool island insertion and also
3
//! veneer insertion for out-of-range jumps.
4
//!
5
//! This code exists to solve three problems:
6
//!
7
//! - Branch targets for forward branches are not known until later, when we
8
//! emit code in a single pass through the instruction structs.
9
//!
10
//! - On many architectures, address references or offsets have limited range.
11
//! For example, on AArch64, conditional branches can only target code +/- 1MB
12
//! from the branch itself.
13
//!
14
//! - The lowering of control flow from the CFG-with-edges produced by
15
//! [BlockLoweringOrder](super::BlockLoweringOrder), combined with many empty
16
//! edge blocks when the register allocator does not need to insert any
17
//! spills/reloads/moves in edge blocks, results in many suboptimal branch
18
//! patterns. The lowering also pays no attention to block order, and so
19
//! two-target conditional forms (cond-br followed by uncond-br) can often be
20
//! avoided because one of the targets is the fallthrough. There are several
21
//! cases here where we can simplify to use fewer branches.
22
//!
23
//! This "buffer" implements a single-pass code emission strategy (with a later
24
//! "fixup" pass, but only through recorded fixups, not all instructions). The
25
//! basic idea is:
26
//!
27
//! - Emit branches as they are, including two-target (cond/uncond) compound
28
//! forms, but with zero offsets and optimistically assuming the target will be
29
//! in range. Record the "fixup" for later. Targets are denoted instead by
30
//! symbolic "labels" that are then bound to certain offsets in the buffer as
31
//! we emit code. (Nominally, there is a label at the start of every basic
32
//! block.)
33
//!
34
//! - As we do this, track the offset in the buffer at which the first label
35
//! reference "goes out of range". We call this the "deadline". If we reach the
36
//! deadline and we still have not bound the label to which an unresolved branch
37
//! refers, we have a problem!
38
//!
39
//! - To solve this problem, we emit "islands" full of "veneers". An island is
40
//! simply a chunk of code inserted in the middle of the code actually produced
41
//! by the emitter (e.g., vcode iterating over instruction structs). The emitter
42
//! has some awareness of this: it either asks for an island between blocks, so
43
//! it is not accidentally executed, or else it emits a branch around the island
44
//! when all other options fail (see `Inst::EmitIsland` meta-instruction).
45
//!
46
//! - A "veneer" is an instruction (or sequence of instructions) in an "island"
47
//! that implements a longer-range reference to a label. The idea is that, for
48
//! example, a branch with a limited range can branch to a "veneer" instead,
49
//! which is simply a branch in a form that can use a longer-range reference. On
50
//! AArch64, for example, conditionals have a +/- 1 MB range, but a conditional
51
//! can branch to an unconditional branch which has a +/- 128 MB range. Hence, a
52
//! conditional branch's label reference can be fixed up with a "veneer" to
53
//! achieve a longer range.
54
//!
55
//! - To implement all of this, we require the backend to provide a `LabelUse`
56
//! type that implements a trait. This is nominally an enum that records one of
57
//! several kinds of references to an offset in code -- basically, a relocation
58
//! type -- and will usually correspond to different instruction formats. The
59
//! `LabelUse` implementation specifies the maximum range, how to patch in the
60
//! actual label location when known, and how to generate a veneer to extend the
61
//! range.
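//!
//! As a rough, simplified sketch of the kind of interface this implies (the real
//! trait is `MachInstLabelUse` in this crate; the names and signatures below are
//! illustrative only, not the actual API):
//!
//! ```ignore
//! trait LabelUseLike: Copy {
//!     /// Maximum forward/backward distances this reference kind can encode.
//!     fn max_pos_range(self) -> CodeOffset;
//!     fn max_neg_range(self) -> CodeOffset;
//!     /// Patch the already-emitted bytes so the reference points at `label_offset`.
//!     fn patch(self, buffer: &mut [u8], use_offset: CodeOffset, label_offset: CodeOffset);
//!     /// Emit a longer-range veneer and return the label-use kind to record for it.
//!     fn generate_veneer(self, buffer: &mut [u8], veneer_offset: CodeOffset) -> (CodeOffset, Self);
//! }
//! ```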
62
//!
63
//! That satisfies label references, but we still may have suboptimal branch
64
//! patterns. To clean up the branches, we do a simple "peephole"-style
65
//! optimization on the fly. To do so, the emitter (e.g., `Inst::emit()`)
66
//! informs the buffer of branches in the code and, in the case of conditionals,
67
//! the code that would have been emitted to invert this branch's condition. We
68
//! track the "latest branches": these are branches that are contiguous up to
69
//! the current offset. (If any code is emitted after a branch, that branch or
70
//! run of contiguous branches is no longer "latest".) The latest branches are
71
//! those that we can edit by simply truncating the buffer and doing something
72
//! else instead.
73
//!
74
//! To optimize branches, we implement several simple rules, and try to apply
75
//! them to the "latest branches" when possible:
76
//!
77
//! - A branch with a label target, when that label is bound to the ending
78
//! offset of the branch (the fallthrough location), can be removed altogether,
79
//! because the branch would have no effect.
80
//!
81
//! - An unconditional branch that starts at a label location, and branches to
82
//! another label, results in a "label alias": all references to the label bound
83
//! *to* this branch instruction are instead resolved to the *target* of the
84
//! branch instruction. This effectively removes empty blocks that just
85
//! unconditionally branch to the next block. We call this "branch threading".
86
//!
87
//! - A conditional followed by an unconditional, when the conditional branches
88
//! to the unconditional's fallthrough, results in (i) the truncation of the
89
//! unconditional, (ii) the inversion of the conditional's condition, and (iii)
90
//! replacement of the conditional's target (using the original target of the
91
//! unconditional). This is a fancy way of saying "we can flip a two-target
92
//! conditional branch's taken/not-taken targets if it works better with our
93
//! fallthrough". To make this work, the emitter actually gives the buffer
94
//! *both* forms of every conditional branch: the true form is emitted into the
95
//! buffer, and the "inverted" machine-code bytes are provided as part of the
96
//! branch-fixup metadata.
97
//!
98
//! - An unconditional branch B preceded by another unconditional branch P, when B's label(s) have
99
//! been redirected to target(B), can be removed entirely. This is an extension
100
//! of the branch-threading optimization, and is valid because if we know there
101
//! will be no fallthrough into this branch instruction (the prior instruction
102
//! is an unconditional jump), and if we know we have successfully redirected
103
//! all labels, then this branch instruction is unreachable. Note that this
104
//! works because the redirection happens before the label is ever resolved
105
//! (fixups happen at island emission time, at which point latest-branches are
106
//! cleared, or at the end of emission), so we are sure to catch and redirect
107
//! all possible paths to this instruction.
108
//!
109
//! # Branch-optimization Correctness
110
//!
111
//! The branch-optimization mechanism depends on a few data structures with
112
//! invariants, which are always held outside the scope of top-level public
113
//! methods:
114
//!
115
//! - The latest-branches list. Each entry describes a span of the buffer
116
//! (start/end offsets), the label target, the corresponding fixup-list entry
117
//! index, and the bytes (must be the same length) for the inverted form, if
118
//! conditional. The list of labels that are bound to the start-offset of this
119
//! branch is *complete* (if any label has a resolved offset equal to `start`
120
//! and is not an alias, it must appear in this list) and *precise* (no label
121
//! in this list can be bound to another offset). No label in this list should
122
//! be an alias. No two branch ranges can overlap, and branches are in
123
//! ascending-offset order.
124
//!
125
//! - The labels-at-tail list. This contains all MachLabels that have been bound
126
//! to (whose resolved offsets are equal to) the tail offset of the buffer.
127
//! No label in this list should be an alias.
128
//!
129
//! - The label_offsets array, containing the bound offset of a label or
130
//! UNKNOWN. No label can be bound at an offset greater than the current
131
//! buffer tail.
132
//!
133
//! - The label_aliases array, containing another label to which a label is
134
//! bound or UNKNOWN. A label's resolved offset is the resolved offset
135
//! of the label it is aliased to, if this is set.
136
//!
137
//! We argue below, at each method, how the invariants in these data structures
138
//! are maintained (grep for "Post-invariant").
139
//!
140
//! Given these invariants, we argue why each optimization preserves execution
141
//! semantics below (grep for "Preserves execution semantics").
142
//!
143
//! # Avoiding Quadratic Behavior
144
//!
145
//! There are two cases where we've had to take some care to avoid
146
//! quadratic worst-case behavior:
147
//!
148
//! - The "labels at this branch" list can grow unboundedly if the
149
//! code generator binds many labels at one location. If the count
150
//! gets too high (defined by the `LABEL_LIST_THRESHOLD` constant), we
151
//! simply abort an optimization early in a way that is always correct
152
//! but is conservative.
153
//!
154
//! - The fixup list can interact with island emission to create
155
//! "quadratic island behavior". In a little more detail, one can hit
156
//! this behavior by having some pending fixups (forward label
157
//! references) with long-range label-use kinds, and some others
158
//! with shorter-range references that nonetheless still are pending
159
//! long enough to trigger island generation. In such a case, we
160
//! process the fixup list, generate veneers to extend some forward
161
//! references' ranges, but leave the other (longer-range) ones
162
//! alone. The way this was implemented put them back on a list and
163
//! resulted in quadratic behavior.
164
//!
165
//! To avoid this, fixups are split into two lists: one "pending" list and one
166
//! final list. The pending list is kept around for handling fixups related to
167
//! branches so it can be edited/truncated. When an island is reached, which
168
//! starts processing fixups, all pending fixups are flushed into the final
169
//! list. The final list is a `BinaryHeap` which enables fixup processing to
170
//! only process those which are required during island emission, deferring
171
//! all longer-range fixups to later.
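//!
//! A toy sketch of that heap-based scheme, assuming the heap is ordered so the
//! earliest deadline is popped first (`FixupId` and `island_end_estimate` are
//! placeholders; the real processing lives in `emit_island_maybe_forced` below):
//!
//! ```ignore
//! use alloc::collections::BinaryHeap;
//! use core::cmp::Reverse;
//!
//! let mut heap: BinaryHeap<Reverse<(CodeOffset, FixupId)>> = BinaryHeap::new();
//! // ... push (deadline, fixup) pairs as label uses are recorded ...
//! while let Some(&Reverse((deadline, _))) = heap.peek() {
//!     if deadline >= island_end_estimate {
//!         break; // everything left is still in range; defer it to a later island
//!     }
//!     let Reverse((_, fixup)) = heap.pop().unwrap();
//!     // ... emit a veneer for `fixup`, or patch it directly if its label is bound ...
//! }
//! ```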
172
173
use crate::binemit::{Addend, CodeOffset, Reloc};
174
use crate::ir::function::FunctionParameters;
175
use crate::ir::{DebugTag, ExceptionTag, ExternalName, RelSourceLoc, SourceLoc, TrapCode};
176
use crate::isa::unwind::UnwindInst;
177
use crate::machinst::{
178
BlockIndex, MachInstLabelUse, TextSectionBuilder, VCodeConstant, VCodeConstants, VCodeInst,
179
};
180
use crate::trace;
181
use crate::{MachInstEmitState, ir};
182
use crate::{VCodeConstantData, timing};
183
use alloc::collections::BinaryHeap;
184
use alloc::string::String;
185
use alloc::vec::Vec;
186
use core::cmp::Ordering;
187
use core::mem;
188
use core::ops::Range;
189
use cranelift_control::ControlPlane;
190
use cranelift_entity::{PrimaryMap, SecondaryMap, entity_impl};
191
use smallvec::SmallVec;
192
193
#[cfg(feature = "enable-serde")]
194
use serde::{Deserialize, Serialize};
195
196
#[cfg(feature = "enable-serde")]
197
pub trait CompilePhase {
198
type MachSrcLocType: for<'a> Deserialize<'a> + Serialize + core::fmt::Debug + PartialEq + Clone;
199
type SourceLocType: for<'a> Deserialize<'a> + Serialize + core::fmt::Debug + PartialEq + Clone;
200
}
201
202
#[cfg(not(feature = "enable-serde"))]
203
pub trait CompilePhase {
204
type MachSrcLocType: core::fmt::Debug + PartialEq + Clone;
205
type SourceLocType: core::fmt::Debug + PartialEq + Clone;
206
}
207
208
/// Status of a compiled artifact that needs patching before being used.
209
#[derive(Clone, Debug, PartialEq)]
210
#[cfg_attr(feature = "enable-serde", derive(Serialize, Deserialize))]
211
pub struct Stencil;
212
213
/// Status of a compiled artifact ready to use.
214
#[derive(Clone, Debug, PartialEq)]
215
pub struct Final;
216
217
impl CompilePhase for Stencil {
218
type MachSrcLocType = MachSrcLoc<Stencil>;
219
type SourceLocType = RelSourceLoc;
220
}
221
222
impl CompilePhase for Final {
223
type MachSrcLocType = MachSrcLoc<Final>;
224
type SourceLocType = SourceLoc;
225
}
226
227
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
228
enum ForceVeneers {
229
Yes,
230
No,
231
}
232
233
/// A buffer of output to be produced, fixed up, and then emitted to a CodeSink
234
/// in bulk.
235
///
236
/// This struct uses `SmallVec`s to support small-ish function bodies without
237
/// any heap allocation. As such, it will be several kilobytes large. This is
238
/// likely fine as long as it is stack-allocated for function emission then
239
/// thrown away; but beware if many buffer objects are retained persistently.
240
pub struct MachBuffer<I: VCodeInst> {
241
/// The buffer contents, as raw bytes.
242
data: SmallVec<[u8; 1024]>,
243
/// The required alignment of this buffer.
244
min_alignment: u32,
245
/// Any relocations referring to this code. Note that only *external*
246
/// relocations are tracked here; references to labels within the buffer are
247
/// resolved before emission.
248
relocs: SmallVec<[MachReloc; 16]>,
249
/// Any trap records referring to this code.
250
traps: SmallVec<[MachTrap; 16]>,
251
/// Any call site records referring to this code.
252
call_sites: SmallVec<[MachCallSite; 16]>,
253
/// Any patchable call site locations.
254
patchable_call_sites: SmallVec<[MachPatchableCallSite; 16]>,
255
/// Any exception-handler records referred to at call sites.
256
exception_handlers: SmallVec<[MachExceptionHandler; 16]>,
257
/// Any source location mappings referring to this code.
258
srclocs: SmallVec<[MachSrcLoc<Stencil>; 64]>,
259
/// Any debug tags referring to this code.
260
debug_tags: Vec<MachDebugTags>,
261
/// Pool of debug tags referenced by `MachDebugTags` entries.
262
debug_tag_pool: Vec<DebugTag>,
263
/// Any user stack maps for this code.
264
///
265
/// Each entry is an `(offset, span, stack_map)` triple. Entries are sorted
266
/// by code offset, and each stack map covers `span` bytes on the stack.
267
user_stack_maps: SmallVec<[(CodeOffset, u32, ir::UserStackMap); 8]>,
268
/// Any unwind info at a given location.
269
unwind_info: SmallVec<[(CodeOffset, UnwindInst); 8]>,
270
/// The current source location in progress (after `start_srcloc()` and
271
/// before `end_srcloc()`). This is a (start_offset, src_loc) tuple.
272
cur_srcloc: Option<(CodeOffset, RelSourceLoc)>,
273
/// Known label offsets; `UNKNOWN_LABEL_OFFSET` if unknown.
274
label_offsets: SmallVec<[CodeOffset; 16]>,
275
/// Label aliases: when one label points to an unconditional jump, and that
276
/// jump points to another label, we can redirect references to the first
277
/// label immediately to the second.
278
///
279
/// Invariant: we don't have label-alias cycles. We ensure this by,
280
/// before setting label A to alias label B, resolving B's alias
281
/// target (iteratively until a non-aliased label); if B is already
282
/// aliased to A, then we cannot alias A back to B.
283
label_aliases: SmallVec<[MachLabel; 16]>,
284
/// Constants that must be emitted at some point.
285
pending_constants: SmallVec<[VCodeConstant; 16]>,
286
/// Byte size of all constants in `pending_constants`.
287
pending_constants_size: CodeOffset,
288
/// Traps that must be emitted at some point.
289
pending_traps: SmallVec<[MachLabelTrap; 16]>,
290
/// Fixups that haven't yet been flushed into `fixup_records` below and may
291
/// be related to branches that are chomped. These all get added to
292
/// `fixup_records` during island emission.
293
pending_fixup_records: SmallVec<[MachLabelFixup<I>; 16]>,
294
/// The nearest upcoming deadline for entries in `pending_fixup_records`.
295
pending_fixup_deadline: CodeOffset,
296
/// Fixups that must be performed after all code is emitted.
297
fixup_records: BinaryHeap<MachLabelFixup<I>>,
298
/// Latest branches, to facilitate in-place editing for better fallthrough
299
/// behavior and empty-block removal.
300
latest_branches: SmallVec<[MachBranch; 4]>,
301
/// All labels at the current offset (emission tail). This is lazily
302
/// cleared: it is actually accurate as long as the current offset is
303
/// `labels_at_tail_off`, but if `cur_offset()` has grown larger, it should
304
/// be considered as empty.
305
///
306
/// For correctness, this *must* be complete (i.e., the vector must contain
307
/// all labels whose offsets are resolved to the current tail), because we
308
/// rely on it to update labels when we truncate branches.
309
labels_at_tail: SmallVec<[MachLabel; 4]>,
310
/// The last offset at which `labels_at_tail` is valid. It is conceptually
311
/// always describing the tail of the buffer, but we do not clear
312
/// `labels_at_tail` eagerly when the tail grows, rather we lazily clear it
313
/// when the offset has grown past this (`labels_at_tail_off`) point.
314
/// Always <= `cur_offset()`.
315
labels_at_tail_off: CodeOffset,
316
/// Metadata about all constants that this function has access to.
317
///
318
/// This records the size/alignment of all constants (not the actual data)
319
/// along with the last available label generated for the constant. This map
320
/// is consulted when constants are referred to and the label assigned to a
321
/// constant may change over time as well.
322
constants: PrimaryMap<VCodeConstant, MachBufferConstant>,
323
/// All recorded usages of constants as pairs of the constant and where the
324
/// constant needs to be placed within `self.data`. Note that the same
325
/// constant may appear in this array multiple times if it was emitted
326
/// multiple times.
327
used_constants: SmallVec<[(VCodeConstant, CodeOffset); 4]>,
328
/// Indicates when a patchable region is currently open, to ensure that
/// patchable regions cannot be nested.
330
open_patchable: bool,
331
/// Stack frame layout metadata. If provided for a MachBuffer
332
/// containing a function body, this allows interpretation of
333
/// runtime state given a view of an active stack frame.
334
frame_layout: Option<MachBufferFrameLayout>,
335
}
336
337
impl MachBufferFinalized<Stencil> {
338
/// Get a finalized machine buffer by applying the function's base source location.
339
pub fn apply_base_srcloc(self, base_srcloc: SourceLoc) -> MachBufferFinalized<Final> {
340
MachBufferFinalized {
341
data: self.data,
342
relocs: self.relocs,
343
traps: self.traps,
344
call_sites: self.call_sites,
345
patchable_call_sites: self.patchable_call_sites,
346
exception_handlers: self.exception_handlers,
347
srclocs: self
348
.srclocs
349
.into_iter()
350
.map(|srcloc| srcloc.apply_base_srcloc(base_srcloc))
351
.collect(),
352
debug_tags: self.debug_tags,
353
debug_tag_pool: self.debug_tag_pool,
354
user_stack_maps: self.user_stack_maps,
355
unwind_info: self.unwind_info,
356
alignment: self.alignment,
357
frame_layout: self.frame_layout,
358
nop_units: self.nop_units,
359
}
360
}
361
}
362
363
/// A `MachBuffer` once emission is completed: holds generated code and records,
364
/// without fixups. This allows the type to be independent of the backend.
365
#[derive(PartialEq, Debug, Clone)]
366
#[cfg_attr(
367
feature = "enable-serde",
368
derive(serde_derive::Serialize, serde_derive::Deserialize)
369
)]
370
pub struct MachBufferFinalized<T: CompilePhase> {
371
/// The buffer contents, as raw bytes.
372
pub(crate) data: SmallVec<[u8; 1024]>,
373
/// Any relocations referring to this code. Note that only *external*
374
/// relocations are tracked here; references to labels within the buffer are
375
/// resolved before emission.
376
pub(crate) relocs: SmallVec<[FinalizedMachReloc; 16]>,
377
/// Any trap records referring to this code.
378
pub(crate) traps: SmallVec<[MachTrap; 16]>,
379
/// Any call site records referring to this code.
380
pub(crate) call_sites: SmallVec<[MachCallSite; 16]>,
381
/// Any patchable call site locations referring to this code.
382
pub(crate) patchable_call_sites: SmallVec<[MachPatchableCallSite; 16]>,
383
/// Any exception-handler records referred to at call sites.
384
pub(crate) exception_handlers: SmallVec<[FinalizedMachExceptionHandler; 16]>,
385
/// Any source location mappings referring to this code.
386
pub(crate) srclocs: SmallVec<[T::MachSrcLocType; 64]>,
387
/// Any debug tags referring to this code.
388
pub(crate) debug_tags: Vec<MachDebugTags>,
389
/// Pool of debug tags referenced by `MachDebugTags` entries.
390
pub(crate) debug_tag_pool: Vec<DebugTag>,
391
/// Any user stack maps for this code.
392
///
393
/// Each entry is an `(offset, span, stack_map)` triple. Entries are sorted
394
/// by code offset, and each stack map covers `span` bytes on the stack.
395
pub(crate) user_stack_maps: SmallVec<[(CodeOffset, u32, ir::UserStackMap); 8]>,
396
/// Stack frame layout metadata. If provided for a MachBuffer
397
/// containing a function body, this allows interpretation of
398
/// runtime state given a view of an active stack frame.
399
pub(crate) frame_layout: Option<MachBufferFrameLayout>,
400
/// Any unwind info at a given location.
401
pub unwind_info: SmallVec<[(CodeOffset, UnwindInst); 8]>,
402
/// The required alignment of this buffer.
403
pub alignment: u32,
404
/// The means by which to NOP out patchable call sites.
405
///
406
/// This allows a consumer of a `MachBufferFinalized` to disable
407
/// patchable call sites (which are enabled by default) without
408
/// specific knowledge of the target ISA.
409
///
410
/// Each entry is one form of nop, and these are required to be
411
/// sorted in ascending-size order.
412
pub nop_units: Vec<Vec<u8>>,
413
}
414
415
const UNKNOWN_LABEL_OFFSET: CodeOffset = 0xffff_ffff;
416
const UNKNOWN_LABEL: MachLabel = MachLabel(0xffff_ffff);
417
418
/// Threshold on max length of `labels_at_this_branch` list to avoid
419
/// unbounded quadratic behavior (see comment below at use-site).
420
const LABEL_LIST_THRESHOLD: usize = 100;
421
422
/// A label refers to some offset in a `MachBuffer`. It may not be resolved at
423
/// the point at which it is used by emitted code; the buffer records "fixups"
424
/// for references to the label, and will come back and patch the code
425
/// appropriately when the label's location is eventually known.
426
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord, Hash)]
427
pub struct MachLabel(u32);
428
entity_impl!(MachLabel);
429
430
impl MachLabel {
431
/// Get a label for a block. (The first N MachLabels are always reserved for
432
/// the N blocks in the vcode.)
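///
/// A small sketch of the intended pairing with
/// [`MachBuffer::reserve_labels_for_blocks`] (the block count and index are
/// illustrative):
///
/// ```ignore
/// buf.reserve_labels_for_blocks(num_blocks);
/// // A branch to block 3 then targets that block's reserved label:
/// let target = MachLabel::from_block(BlockIndex::new(3));
/// ```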
433
pub fn from_block(bindex: BlockIndex) -> MachLabel {
434
MachLabel(bindex.index() as u32)
435
}
436
437
/// Creates a string representing this label, for convenience.
438
pub fn to_string(&self) -> String {
439
format!("label{}", self.0)
440
}
441
}
442
443
impl Default for MachLabel {
444
fn default() -> Self {
445
UNKNOWN_LABEL
446
}
447
}
448
449
/// Represents the beginning of an editable region in the [`MachBuffer`], while code emission is
450
/// still occurring. An [`OpenPatchRegion`] is closed by [`MachBuffer::end_patchable`], consuming
451
/// the [`OpenPatchRegion`] token in the process.
452
pub struct OpenPatchRegion(usize);
453
454
/// A region in the [`MachBuffer`] code buffer that can be edited prior to finalization. An example
455
/// of where you might want to use this is for patching instructions that mention constants that
456
/// won't be known until later: [`MachBuffer::start_patchable`] can be used to begin the patchable
457
/// region, instructions can be emitted with placeholder constants, and the [`PatchRegion`] token
458
/// can be produced by [`MachBuffer::end_patchable`]. Once the values of those constants are known,
459
/// the [`PatchRegion::patch`] function can be used to get a mutable buffer to the instruction
460
/// bytes, and the uses of those constants can be updated directly.
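///
/// A minimal sketch of the workflow, assuming the placeholder being patched is a
/// little-endian 4-byte immediate (`value: u32` here is purely illustrative):
///
/// ```ignore
/// let open = buf.start_patchable();
/// buf.put4(0); // placeholder immediate
/// let region = buf.end_patchable(open);
/// // ... later, once the real value is known:
/// region.patch(&mut buf).copy_from_slice(&value.to_le_bytes());
/// ```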
461
pub struct PatchRegion {
462
range: Range<usize>,
463
}
464
465
impl PatchRegion {
466
/// Consume the patch region to yield a mutable slice of the [`MachBuffer`] data buffer.
467
pub fn patch<I: VCodeInst>(self, buffer: &mut MachBuffer<I>) -> &mut [u8] {
468
&mut buffer.data[self.range]
469
}
470
}
471
472
impl<I: VCodeInst> MachBuffer<I> {
473
/// Create a new, empty `MachBuffer`.
475
pub fn new() -> MachBuffer<I> {
476
MachBuffer {
477
data: SmallVec::new(),
478
min_alignment: I::function_alignment().minimum,
479
relocs: SmallVec::new(),
480
traps: SmallVec::new(),
481
call_sites: SmallVec::new(),
482
patchable_call_sites: SmallVec::new(),
483
exception_handlers: SmallVec::new(),
484
srclocs: SmallVec::new(),
485
debug_tags: vec![],
486
debug_tag_pool: vec![],
487
user_stack_maps: SmallVec::new(),
488
unwind_info: SmallVec::new(),
489
cur_srcloc: None,
490
label_offsets: SmallVec::new(),
491
label_aliases: SmallVec::new(),
492
pending_constants: SmallVec::new(),
493
pending_constants_size: 0,
494
pending_traps: SmallVec::new(),
495
pending_fixup_records: SmallVec::new(),
496
pending_fixup_deadline: u32::MAX,
497
fixup_records: Default::default(),
498
latest_branches: SmallVec::new(),
499
labels_at_tail: SmallVec::new(),
500
labels_at_tail_off: 0,
501
constants: Default::default(),
502
used_constants: Default::default(),
503
open_patchable: false,
504
frame_layout: None,
505
}
506
}
507
508
/// Current offset from start of buffer.
509
pub fn cur_offset(&self) -> CodeOffset {
510
self.data.len() as CodeOffset
511
}
512
513
/// Add a byte.
514
pub fn put1(&mut self, value: u8) {
515
self.data.push(value);
516
517
// Post-invariant: conceptual-labels_at_tail contains a complete and
518
// precise list of labels bound at `cur_offset()`. We have advanced
519
// `cur_offset()`, hence if it had been equal to `labels_at_tail_off`
520
// before, it is not anymore (and it cannot become equal, because
521
// `labels_at_tail_off` is always <= `cur_offset()`). Thus the list is
522
// conceptually empty (even though it is only lazily cleared). No labels
523
// can be bound at this new offset (by invariant on `label_offsets`).
524
// Hence the invariant holds.
525
}
526
527
/// Add 2 bytes.
528
pub fn put2(&mut self, value: u16) {
529
let bytes = value.to_le_bytes();
530
self.data.extend_from_slice(&bytes[..]);
531
532
// Post-invariant: as for `put1()`.
533
}
534
535
/// Add 4 bytes.
536
pub fn put4(&mut self, value: u32) {
537
let bytes = value.to_le_bytes();
538
self.data.extend_from_slice(&bytes[..]);
539
540
// Post-invariant: as for `put1()`.
541
}
542
543
/// Add 8 bytes.
544
pub fn put8(&mut self, value: u64) {
545
let bytes = value.to_le_bytes();
546
self.data.extend_from_slice(&bytes[..]);
547
548
// Post-invariant: as for `put1()`.
549
}
550
551
/// Add a slice of bytes.
552
pub fn put_data(&mut self, data: &[u8]) {
553
self.data.extend_from_slice(data);
554
555
// Post-invariant: as for `put1()`.
556
}
557
558
/// Reserve appended space and return a mutable slice referring to it.
559
pub fn get_appended_space(&mut self, len: usize) -> &mut [u8] {
560
let off = self.data.len();
561
let new_len = self.data.len() + len;
562
self.data.resize(new_len, 0);
563
&mut self.data[off..]
564
565
// Post-invariant: as for `put1()`.
566
}
567
568
/// Align up to the given alignment.
569
pub fn align_to(&mut self, align_to: CodeOffset) {
570
trace!("MachBuffer: align to {}", align_to);
571
assert!(
572
align_to.is_power_of_two(),
573
"{align_to} is not a power of two"
574
);
575
while self.cur_offset() & (align_to - 1) != 0 {
576
self.put1(0);
577
}
578
579
// Post-invariant: as for `put1()`.
580
}
581
582
/// Begin a region of patchable code. There is one requirement for the
583
/// code that is emitted: It must not introduce any instructions that
584
/// could be chomped (branches are an example of this). In other words,
585
/// you must not call [`MachBuffer::add_cond_branch`] or
586
/// [`MachBuffer::add_uncond_branch`] between calls to this method and
587
/// [`MachBuffer::end_patchable`].
588
pub fn start_patchable(&mut self) -> OpenPatchRegion {
589
assert!(!self.open_patchable, "Patchable regions may not be nested");
590
self.open_patchable = true;
591
OpenPatchRegion(usize::try_from(self.cur_offset()).unwrap())
592
}
593
594
/// End a region of patchable code, yielding a [`PatchRegion`] value that
595
/// can be consumed later to produce a one-off mutable slice to the
596
/// associated region of the data buffer.
597
pub fn end_patchable(&mut self, open: OpenPatchRegion) -> PatchRegion {
598
// No need to assert the state of `open_patchable` here, as we take
599
// ownership of the only `OpenPatchable` value.
600
self.open_patchable = false;
601
let end = usize::try_from(self.cur_offset()).unwrap();
602
PatchRegion { range: open.0..end }
603
}
604
605
/// Allocate a `Label` to refer to some offset. May not be bound to a fixed
606
/// offset yet.
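///
/// A hedged sketch of the forward-reference workflow (the label-use kind is a
/// placeholder for an ISA-specific `I::LabelUse`):
///
/// ```ignore
/// let label = buf.get_label(); // not bound to any offset yet
/// buf.use_label_at_offset(buf.cur_offset(), label, some_label_use_kind);
/// // ... emit the referencing instruction's bytes, then more code ...
/// buf.bind_label(label, ctrl_plane); // the pending fixup can now resolve
/// ```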
607
pub fn get_label(&mut self) -> MachLabel {
608
let l = self.label_offsets.len() as u32;
609
self.label_offsets.push(UNKNOWN_LABEL_OFFSET);
610
self.label_aliases.push(UNKNOWN_LABEL);
611
trace!("MachBuffer: new label -> {:?}", MachLabel(l));
612
MachLabel(l)
613
614
// Post-invariant: the only mutation is to add a new label; it has no
615
// bound offset yet, so it trivially satisfies all invariants.
616
}
617
618
/// Reserve the first N MachLabels for blocks.
619
pub fn reserve_labels_for_blocks(&mut self, blocks: usize) {
620
trace!("MachBuffer: first {} labels are for blocks", blocks);
621
debug_assert!(self.label_offsets.is_empty());
622
self.label_offsets.resize(blocks, UNKNOWN_LABEL_OFFSET);
623
self.label_aliases.resize(blocks, UNKNOWN_LABEL);
624
625
// Post-invariant: as for `get_label()`.
626
}
627
628
/// Registers metadata in this `MachBuffer` about the `constants` provided.
629
///
630
/// This will record the size/alignment of all constants which will prepare
631
/// them for emission later on.
632
pub fn register_constants(&mut self, constants: &VCodeConstants) {
633
for (c, val) in constants.iter() {
634
self.register_constant(&c, val);
635
}
636
}
637
638
/// Similar to [`MachBuffer::register_constants`] but registers a
639
/// single constant metadata. This function is useful in
640
/// situations where not all constants are known at the time of
641
/// emission.
642
pub fn register_constant(&mut self, constant: &VCodeConstant, data: &VCodeConstantData) {
643
let c2 = self.constants.push(MachBufferConstant {
644
upcoming_label: None,
645
align: data.alignment(),
646
size: data.as_slice().len(),
647
});
648
assert_eq!(*constant, c2);
649
}
650
651
/// Completes constant emission by iterating over `self.used_constants` and
652
/// filling in the "holes" with the constant values provided by `constants`.
653
///
654
/// Returns the alignment required for this entire buffer. Alignment starts
655
/// at the ISA's minimum function alignment and can be increased due to
656
/// constant requirements.
657
fn finish_constants(&mut self, constants: &VCodeConstants) -> u32 {
658
let mut alignment = self.min_alignment;
659
for (constant, offset) in mem::take(&mut self.used_constants) {
660
let constant = constants.get(constant);
661
let data = constant.as_slice();
662
self.data[offset as usize..][..data.len()].copy_from_slice(data);
663
alignment = constant.alignment().max(alignment);
664
}
665
alignment
666
}
667
668
/// Returns a label that can be used to refer to the `constant` provided.
669
///
670
/// This will automatically defer a new constant to be emitted for
671
/// `constant` if it has not been previously emitted. Note that this
672
/// function may return a different label for the same constant at
673
/// different points in time. The label is valid to use only from the
674
/// current location; the MachBuffer takes care to emit the same constant
675
/// multiple times if needed so the constant is always in range.
676
pub fn get_label_for_constant(&mut self, constant: VCodeConstant) -> MachLabel {
677
let MachBufferConstant {
678
align,
679
size,
680
upcoming_label,
681
} = self.constants[constant];
682
if let Some(label) = upcoming_label {
683
return label;
684
}
685
686
let label = self.get_label();
687
trace!(
688
"defer constant: eventually emit {size} bytes aligned \
689
to {align} at label {label:?}",
690
);
691
self.pending_constants.push(constant);
692
self.pending_constants_size += size as u32;
693
self.constants[constant].upcoming_label = Some(label);
694
label
695
}
696
697
/// Bind a label to the current offset. A label can only be bound once.
698
pub fn bind_label(&mut self, label: MachLabel, ctrl_plane: &mut ControlPlane) {
699
trace!(
700
"MachBuffer: bind label {:?} at offset {}",
701
label,
702
self.cur_offset()
703
);
704
debug_assert_eq!(self.label_offsets[label.0 as usize], UNKNOWN_LABEL_OFFSET);
705
debug_assert_eq!(self.label_aliases[label.0 as usize], UNKNOWN_LABEL);
706
let offset = self.cur_offset();
707
self.label_offsets[label.0 as usize] = offset;
708
self.lazily_clear_labels_at_tail();
709
self.labels_at_tail.push(label);
710
711
// Invariants hold: bound offset of label is <= cur_offset (in fact it
712
// is equal). If the `labels_at_tail` list was complete and precise
713
// before, it is still, because we have bound this label to the current
714
// offset and added it to the list (which contains all labels at the
715
// current offset).
716
717
self.optimize_branches(ctrl_plane);
718
719
// Post-invariant: by `optimize_branches()` (see argument there).
720
}
721
722
/// Lazily clear `labels_at_tail` if the tail offset has moved beyond the
723
/// offset that it applies to.
724
fn lazily_clear_labels_at_tail(&mut self) {
725
let offset = self.cur_offset();
726
if offset > self.labels_at_tail_off {
727
self.labels_at_tail_off = offset;
728
self.labels_at_tail.clear();
729
}
730
731
// Post-invariant: either labels_at_tail_off was at cur_offset, and
732
// state is untouched, or was less than cur_offset, in which case the
733
// labels_at_tail list was conceptually empty, and is now actually
734
// empty.
735
}
736
737
/// Resolve a label to an offset, if known. May return `UNKNOWN_LABEL_OFFSET`.
738
pub(crate) fn resolve_label_offset(&self, mut label: MachLabel) -> CodeOffset {
739
let mut iters = 0;
740
while self.label_aliases[label.0 as usize] != UNKNOWN_LABEL {
741
label = self.label_aliases[label.0 as usize];
742
// To protect against an infinite loop (despite our assurances to
743
// ourselves that the invariants make this impossible), assert out
744
// after 1M iterations. The number of basic blocks is limited
745
// in most contexts anyway so this should be impossible to hit with
746
// a legitimate input.
747
iters += 1;
748
assert!(iters < 1_000_000, "Unexpected cycle in label aliases");
749
}
750
self.label_offsets[label.0 as usize]
751
752
// Post-invariant: no mutations.
753
}
754
755
/// Emit a reference to the given label with the given reference type (i.e.,
756
/// branch-instruction format) at the current offset. This is like a
757
/// relocation, but handled internally.
758
///
759
/// This can be called before the branch is actually emitted; fixups will
760
/// not happen until an island is emitted or the buffer is finished.
761
pub fn use_label_at_offset(&mut self, offset: CodeOffset, label: MachLabel, kind: I::LabelUse) {
762
trace!(
763
"MachBuffer: use_label_at_offset: offset {} label {:?} kind {:?}",
764
offset, label, kind
765
);
766
767
// Add the fixup, and update the worst-case island size based on a
768
// veneer for this label use.
769
let fixup = MachLabelFixup {
770
label,
771
offset,
772
kind,
773
};
774
self.pending_fixup_deadline = self.pending_fixup_deadline.min(fixup.deadline());
775
self.pending_fixup_records.push(fixup);
776
777
// Post-invariant: no mutations to branches/labels data structures.
778
}
779
780
/// Inform the buffer of an unconditional branch at the given offset,
781
/// targeting the given label. May be used to optimize branches.
782
/// The last added label-use must correspond to this branch.
783
/// This must be called when the current offset is equal to `start`; i.e.,
784
/// before actually emitting the branch. This implies that for a branch that
785
/// uses a label and is eligible for optimizations by the MachBuffer, the
786
/// proper sequence is:
787
///
788
/// - Call `use_label_at_offset()` to emit the fixup record.
789
/// - Call `add_uncond_branch()` to make note of the branch.
790
/// - Emit the bytes for the branch's machine code.
791
///
792
/// Additional requirement: no labels may be bound between `start` and `end`
793
/// (exclusive on both ends).
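///
/// A minimal sketch of that sequence, assuming a hypothetical 4-byte
/// unconditional-branch encoding and an illustrative label-use kind (the real
/// `I::LabelUse` variants are ISA-specific):
///
/// ```ignore
/// let start = buf.cur_offset();
/// // 1. Record the fixup for the branch's label reference.
/// buf.use_label_at_offset(start, target, branch_label_use_kind);
/// // 2. Tell the buffer about the branch so it can be edited/truncated later.
/// buf.add_uncond_branch(start, start + 4, target);
/// // 3. Emit the branch bytes themselves, with a zero (placeholder) offset.
/// buf.put4(uncond_branch_opcode_with_zero_offset);
/// ```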
794
pub fn add_uncond_branch(&mut self, start: CodeOffset, end: CodeOffset, target: MachLabel) {
795
debug_assert!(
796
!self.open_patchable,
797
"Branch instruction inserted within a patchable region"
798
);
799
assert!(self.cur_offset() == start);
800
debug_assert!(end > start);
801
assert!(!self.pending_fixup_records.is_empty());
802
let fixup = self.pending_fixup_records.len() - 1;
803
self.lazily_clear_labels_at_tail();
804
self.latest_branches.push(MachBranch {
805
start,
806
end,
807
target,
808
fixup,
809
inverted: None,
810
labels_at_this_branch: self.labels_at_tail.clone(),
811
});
812
813
// Post-invariant: we asserted branch start is current tail; the list of
814
// labels at branch is cloned from list of labels at current tail.
815
}
816
817
/// Inform the buffer of a conditional branch at the given offset,
818
/// targeting the given label. May be used to optimize branches.
819
/// The last added label-use must correspond to this branch.
820
///
821
/// Additional requirement: no labels may be bound between `start` and `end`
822
/// (exclusive on both ends).
823
pub fn add_cond_branch(
824
&mut self,
825
start: CodeOffset,
826
end: CodeOffset,
827
target: MachLabel,
828
inverted: &[u8],
829
) {
830
debug_assert!(
831
!self.open_patchable,
832
"Branch instruction inserted within a patchable region"
833
);
834
assert!(self.cur_offset() == start);
835
debug_assert!(end > start);
836
assert!(!self.pending_fixup_records.is_empty());
837
debug_assert!(
838
inverted.len() == (end - start) as usize,
839
"branch length = {}, but inverted length = {}",
840
end - start,
841
inverted.len()
842
);
843
let fixup = self.pending_fixup_records.len() - 1;
844
let inverted = Some(SmallVec::from(inverted));
845
self.lazily_clear_labels_at_tail();
846
self.latest_branches.push(MachBranch {
847
start,
848
end,
849
target,
850
fixup,
851
inverted,
852
labels_at_this_branch: self.labels_at_tail.clone(),
853
});
854
855
// Post-invariant: we asserted branch start is current tail; labels at
856
// branch list is cloned from list of labels at current tail.
857
}
858
859
fn truncate_last_branch(&mut self) {
860
debug_assert!(
861
!self.open_patchable,
862
"Branch instruction truncated within a patchable region"
863
);
864
865
self.lazily_clear_labels_at_tail();
866
// Invariants hold at this point.
867
868
let b = self.latest_branches.pop().unwrap();
869
assert!(b.end == self.cur_offset());
870
871
// State:
872
// [PRE CODE]
873
// Offset b.start, b.labels_at_this_branch:
874
// [BRANCH CODE]
875
// cur_off, self.labels_at_tail -->
876
// (end of buffer)
877
self.data.truncate(b.start as usize);
878
self.pending_fixup_records.truncate(b.fixup);
879
880
// Trim srclocs and debug tags now past the end of the buffer.
881
while let Some(last_srcloc) = self.srclocs.last_mut() {
882
if last_srcloc.end <= b.start {
883
break;
884
}
885
if last_srcloc.start < b.start {
886
last_srcloc.end = b.start;
887
break;
888
}
889
self.srclocs.pop();
890
}
891
while let Some(last_debug_tag) = self.debug_tags.last() {
892
if last_debug_tag.offset <= b.start {
893
break;
894
}
895
self.debug_tags.pop();
896
}
897
898
// State:
899
// [PRE CODE]
900
// cur_off, Offset b.start, b.labels_at_this_branch:
901
// (end of buffer)
902
//
903
// self.labels_at_tail --> (past end of buffer)
904
let cur_off = self.cur_offset();
905
self.labels_at_tail_off = cur_off;
906
// State:
907
// [PRE CODE]
908
// cur_off, Offset b.start, b.labels_at_this_branch,
909
// self.labels_at_tail:
910
// (end of buffer)
911
//
912
// resolve_label_offset(l) for l in labels_at_tail:
913
// (past end of buffer)
914
915
trace!(
916
"truncate_last_branch: truncated {:?}; off now {}",
917
b, cur_off
918
);
919
920
// Fix up resolved label offsets for labels at tail.
921
for &l in &self.labels_at_tail {
922
self.label_offsets[l.0 as usize] = cur_off;
923
}
924
// Old labels_at_this_branch are now at cur_off.
925
self.labels_at_tail.extend(b.labels_at_this_branch);
926
927
// Post-invariant: this operation is defined to truncate the buffer,
928
// which moves cur_off backward, and to move labels at the end of the
929
// buffer back to the start-of-branch offset.
930
//
931
// latest_branches satisfies all invariants:
932
// - it has no branches past the end of the buffer (branches are in
933
// order, we removed the last one, and we truncated the buffer to just
934
// before the start of that branch)
935
// - no labels were moved to lower offsets than the (new) cur_off, so
936
// the labels_at_this_branch list for any other branch need not change.
937
//
938
// labels_at_tail satisfies all invariants:
939
// - all labels that were at the tail after the truncated branch are
940
// moved backward to just before the branch, which becomes the new tail;
941
// thus every element in the list should remain (ensured by `.extend()`
942
// above).
943
// - all labels that refer to the new tail, which is the start-offset of
944
// the truncated branch, must be present. The `labels_at_this_branch`
945
// list in the truncated branch's record is a complete and precise list
946
// of exactly these labels; we append these to labels_at_tail.
947
// - labels_at_tail_off is at cur_off after truncation occurs, so the
948
// list is valid (not to be lazily cleared).
949
//
950
// The stated operation was performed:
951
// - For each label at the end of the buffer prior to this method, it
952
// now resolves to the new (truncated) end of the buffer: it must have
953
// been in `labels_at_tail` (this list is precise and complete, and
954
// the tail was at the end of the truncated branch on entry), and we
955
// iterate over this list and set `label_offsets` to the new tail.
956
// None of these labels could have been an alias (by invariant), so
957
// `label_offsets` is authoritative for each.
958
// - No other labels will be past the end of the buffer, because of the
959
// requirement that no labels be bound to the middle of branch ranges
960
// (see comments to `add_{cond,uncond}_branch()`).
961
// - The buffer is truncated to just before the last branch, and the
962
// fixup record referring to that last branch is removed.
963
}
964
965
/// Performs various optimizations on branches pointing at the current label.
966
pub fn optimize_branches(&mut self, ctrl_plane: &mut ControlPlane) {
967
if ctrl_plane.get_decision() {
968
return;
969
}
970
971
self.lazily_clear_labels_at_tail();
972
// Invariants valid at this point.
973
974
trace!(
975
"enter optimize_branches:\n b = {:?}\n l = {:?}\n f = {:?}",
976
self.latest_branches, self.labels_at_tail, self.pending_fixup_records
977
);
978
979
// We continue to munch on branches at the tail of the buffer until no
980
// more rules apply. Note that the loop only continues if a branch is
981
// actually truncated (or if labels are redirected away from a branch),
982
// so this always makes progress.
983
while let Some(b) = self.latest_branches.last() {
984
let cur_off = self.cur_offset();
985
trace!("optimize_branches: last branch {:?} at off {}", b, cur_off);
986
// If there has been any code emission since the end of the last branch or
987
// label definition, then there's nothing we can edit (because we
988
// don't move code once placed, only back up and overwrite), so
989
// clear the records and finish.
990
if b.end < cur_off {
991
break;
992
}
993
994
// If the "labels at this branch" list on this branch is
995
// longer than a threshold, don't do any simplification,
996
// and let the branch remain to separate those labels from
997
// the current tail. This avoids quadratic behavior (see
998
// #3468): otherwise, if a long string of "goto next;
999
// next:" patterns are emitted, all of the labels will
1000
// coalesce into a long list of aliases for the current
1001
// buffer tail. We must track all aliases of the current
1002
// tail for correctness, but we are also allowed to skip
1003
// optimization (removal) of any branch, so we take the
1004
// escape hatch here and let it stand. In effect this
1005
// "spreads" the many thousands of labels in the
1006
// pathological case among an actual (harmless but
1007
// suboptimal) instruction once per N labels.
1008
if b.labels_at_this_branch.len() > LABEL_LIST_THRESHOLD {
1009
break;
1010
}
1011
1012
// Invariant: we are looking at a branch that ends at the tail of
1013
// the buffer.
1014
1015
// For any branch, conditional or unconditional:
1016
// - If the target is a label at the current offset, then remove
1017
// the conditional branch, and reset all labels that targeted
1018
// the current offset (end of branch) to the truncated
1019
// end-of-code.
1020
//
1021
// Preserves execution semantics: a branch to its own fallthrough
1022
// address is equivalent to a no-op; in both cases, nextPC is the
1023
// fallthrough.
1024
if self.resolve_label_offset(b.target) == cur_off {
1025
trace!("branch with target == cur off; truncating");
1026
self.truncate_last_branch();
1027
continue;
1028
}
1029
1030
// If latest is an unconditional branch:
1031
//
1032
// - If the branch's target is not its own start address, then for
1033
// each label at the start of branch, make the label an alias of the
1034
// branch target, and remove the label from the "labels at this
1035
// branch" list.
1036
//
1037
// - Preserves execution semantics: an unconditional branch's
1038
// only effect is to set PC to a new PC; this change simply
1039
// collapses one step in the step-semantics.
1040
//
1041
// - Post-invariant: the labels that were bound to the start of
1042
// this branch become aliases, so they must not be present in any
1043
// labels-at-this-branch list or the labels-at-tail list. The
1044
// labels are removed from the latest-branch record's
1045
// labels-at-this-branch list, and are never placed in the
1046
// labels-at-tail list. Furthermore, it is correct that they are
1047
// not in either list, because they are now aliases, and labels
1048
// that are aliases remain aliases forever.
1049
//
1050
// - If there is a prior unconditional branch that ends just before
1051
// this one begins, and this branch has no labels bound to its
1052
// start, then we can truncate this branch, because it is entirely
1053
// unreachable (we have redirected all labels that make it
1054
// reachable otherwise). Do so and continue around the loop.
1055
//
1056
// - Preserves execution semantics: the branch is unreachable,
1057
// because execution can only flow into an instruction from the
1058
// prior instruction's fallthrough or from a branch bound to that
1059
// instruction's start offset. Unconditional branches have no
1060
// fallthrough, so if the prior instruction is an unconditional
1061
// branch, no fallthrough entry can happen. The
1062
// labels-at-this-branch list is complete (by invariant), so if it
1063
// is empty, then the instruction is entirely unreachable. Thus,
1064
// it can be removed.
1065
//
1066
// - Post-invariant: ensured by truncate_last_branch().
1067
//
1068
// - If there is a prior conditional branch whose target label
1069
// resolves to the current offset (branches around the
1070
// unconditional branch), then remove the unconditional branch,
1071
// and make the target of the unconditional the target of the
1072
// conditional instead.
1073
//
1074
// - Preserves execution semantics: previously we had:
1075
//
1076
// L1:
1077
// cond_br L2
1078
// br L3
1079
// L2:
1080
// (end of buffer)
1081
//
1082
// by removing the last branch, we have:
1083
//
1084
// L1:
1085
// cond_br L2
1086
// L2:
1087
// (end of buffer)
1088
//
1089
// we then fix up the records for the conditional branch to
1090
// have:
1091
//
1092
// L1:
1093
// cond_br.inverted L3
1094
// L2:
1095
//
1096
// In the original code, control flow reaches L2 when the
1097
// conditional branch's predicate is true, and L3 otherwise. In
1098
// the optimized code, the same is true.
1099
//
1100
// - Post-invariant: all edits to latest_branches and
1101
// labels_at_tail are performed by `truncate_last_branch()`,
1102
// which maintains the invariants at each step.
1103
1104
if b.is_uncond() {
1105
// Set any label equal to current branch's start as an alias of
1106
// the branch's target, if the target is not the branch itself
1107
// (i.e., an infinite loop).
1108
//
1109
// We cannot perform this aliasing if the target of this branch
1110
// ultimately aliases back here; if so, we need to keep this
1111
// branch, so break out of this loop entirely (and clear the
1112
// latest-branches list below).
1113
//
1114
// Note that this check is what prevents cycles from forming in
1115
// `self.label_aliases`. To see why, consider an arbitrary start
1116
// state:
1117
//
1118
// label_aliases[L1] = L2, label_aliases[L2] = L3, ..., up to
1119
// Ln, which is not aliased.
1120
//
1121
// We would create a cycle if we assigned label_aliases[Ln]
1122
// = L1. Note that the below assignment is the only write
1123
// to label_aliases.
1124
//
1125
// By our other invariants, we have that Ln (`l` below)
1126
// resolves to the offset `b.start`, because it is in the
1127
// set `b.labels_at_this_branch`.
1128
//
1129
// If L1 were already aliased, through some arbitrarily deep
1130
// chain, to Ln, then it must also resolve to this offset
1131
// `b.start`.
1132
//
1133
// By checking the resolution of `L1` against this offset,
1134
// and aborting this branch-simplification if they are
1135
// equal, we prevent the below assignment from ever creating
1136
// a cycle.
1137
if self.resolve_label_offset(b.target) != b.start {
1138
let redirected = b.labels_at_this_branch.len();
1139
for &l in &b.labels_at_this_branch {
1140
trace!(
1141
" -> label at start of branch {:?} redirected to target {:?}",
1142
l, b.target
1143
);
1144
self.label_aliases[l.0 as usize] = b.target;
1145
// NOTE: we continue to ensure the invariant that labels
1146
// pointing to tail of buffer are in `labels_at_tail`
1147
// because we already ensured above that the last branch
1148
// cannot have a target of `cur_off`; so we never have
1149
// to put the label into `labels_at_tail` when moving it
1150
// here.
1151
}
1152
// Maintain invariant: all branches have been redirected
1153
// and are no longer pointing at the start of this branch.
1154
let mut_b = self.latest_branches.last_mut().unwrap();
1155
mut_b.labels_at_this_branch.clear();
1156
1157
if redirected > 0 {
1158
trace!(" -> after label redirects, restarting loop");
1159
continue;
1160
}
1161
} else {
1162
break;
1163
}
1164
1165
let b = self.latest_branches.last().unwrap();
1166
1167
// Examine any immediately preceding branch.
1168
if self.latest_branches.len() > 1 {
1169
let prev_b = &self.latest_branches[self.latest_branches.len() - 2];
1170
trace!(" -> more than one branch; prev_b = {:?}", prev_b);
1171
// This uncond is immediately after another uncond; we
1172
// should have already redirected labels to this uncond away
1173
// (but check to be sure); so we can truncate this uncond.
1174
if prev_b.is_uncond()
1175
&& prev_b.end == b.start
1176
&& b.labels_at_this_branch.is_empty()
1177
{
1178
trace!(" -> uncond follows another uncond; truncating");
1179
self.truncate_last_branch();
1180
continue;
1181
}
1182
1183
// This uncond is immediately after a conditional, and the
1184
// conditional's target is the end of this uncond, and we've
1185
// already redirected labels to this uncond away; so we can
1186
// truncate this uncond, flip the sense of the conditional, and
1187
// set the conditional's target (in `latest_branches` and in
1188
// `fixup_records`) to the uncond's target.
1189
if prev_b.is_cond()
1190
&& prev_b.end == b.start
1191
&& self.resolve_label_offset(prev_b.target) == cur_off
1192
{
1193
trace!(
1194
" -> uncond follows a conditional, and conditional's target resolves to current offset"
1195
);
1196
// Save the target of the uncond (this becomes the
1197
// target of the cond), and truncate the uncond.
1198
let target = b.target;
1199
let data = prev_b.inverted.clone().unwrap();
1200
self.truncate_last_branch();
1201
1202
// Mutate the code and cond branch.
1203
let off_before_edit = self.cur_offset();
1204
let prev_b = self.latest_branches.last_mut().unwrap();
1205
let not_inverted = SmallVec::from(
1206
&self.data[(prev_b.start as usize)..(prev_b.end as usize)],
1207
);
1208
1209
// Low-level edit: replaces bytes of branch with
1210
// inverted form. cur_off remains the same afterward, so
1211
// we do not need to modify label data structures.
1212
self.data.truncate(prev_b.start as usize);
1213
self.data.extend_from_slice(&data[..]);
1214
1215
// Save the original code as the inversion of the
1216
// inverted branch, in case we later edit this branch
1217
// again.
1218
prev_b.inverted = Some(not_inverted);
1219
self.pending_fixup_records[prev_b.fixup].label = target;
1220
trace!(" -> reassigning target of condbr to {:?}", target);
1221
prev_b.target = target;
1222
debug_assert_eq!(off_before_edit, self.cur_offset());
1223
continue;
1224
}
1225
}
1226
}
1227
1228
// If we couldn't do anything with the last branch, then break.
1229
break;
1230
}
1231
1232
self.purge_latest_branches();
1233
1234
trace!(
1235
"leave optimize_branches:\n b = {:?}\n l = {:?}\n f = {:?}",
1236
self.latest_branches, self.labels_at_tail, self.pending_fixup_records
1237
);
1238
}
1239
1240
fn purge_latest_branches(&mut self) {
1241
// All of our branch simplification rules work only if a branch ends at
1242
// the tail of the buffer, with no following code; and branches are in
1243
// order in latest_branches; so if the last entry ends prior to
1244
// cur_offset, then clear all entries.
1245
let cur_off = self.cur_offset();
1246
if let Some(l) = self.latest_branches.last() {
1247
if l.end < cur_off {
1248
trace!("purge_latest_branches: removing branch {:?}", l);
1249
self.latest_branches.clear();
1250
}
1251
}
1252
1253
// Post-invariant: no invariant requires any branch to appear in
1254
// `latest_branches`; it is always optional. The list-clear above thus
1255
// preserves all semantics.
1256
}
1257
1258
/// Emit a trap at some point in the future with the specified code and
1259
/// stack map.
1260
///
1261
/// This function returns a [`MachLabel`] which will be the future address
1262
/// of the trap. Jumps should refer to this label, likely by using the
1263
/// [`MachBuffer::use_label_at_offset`] method, to get a relocation
1264
/// patched in once the address of the trap is known.
1265
///
1266
/// This will batch all traps into the end of the function.
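///
/// A minimal sketch of the intended pattern (the trap code, label-use kind, and
/// branch encoding are illustrative placeholders):
///
/// ```ignore
/// let trap_label = buf.defer_trap(trap_code);
/// let off = buf.cur_offset();
/// // A conditional branch that jumps to the deferred trap on failure:
/// buf.use_label_at_offset(off, trap_label, cond_branch_label_use_kind);
/// buf.put4(cond_branch_opcode_with_zero_offset);
/// ```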
1267
pub fn defer_trap(&mut self, code: TrapCode) -> MachLabel {
1268
let label = self.get_label();
1269
self.pending_traps.push(MachLabelTrap {
1270
label,
1271
code,
1272
loc: self.cur_srcloc.map(|(_start, loc)| loc),
1273
});
1274
label
1275
}
1276
1277
/// Is an island needed within the next N bytes?
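///
/// A hedged sketch of how an emission loop might consult this, with an
/// illustrative per-instruction worst-case size (the real driver lives in the
/// VCode emitter, not in this method):
///
/// ```ignore
/// let worst_case_inst_size = 16; // illustrative bound, not a real constant
/// if buf.island_needed(worst_case_inst_size) {
///     // Emit pending veneers/constants now, while every fixup is still in range.
///     buf.emit_island(worst_case_inst_size, ctrl_plane);
/// }
/// ```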
1278
pub fn island_needed(&self, distance: CodeOffset) -> bool {
1279
let deadline = match self.fixup_records.peek() {
1280
Some(fixup) => fixup.deadline().min(self.pending_fixup_deadline),
1281
None => self.pending_fixup_deadline,
1282
};
1283
deadline < u32::MAX && self.worst_case_end_of_island(distance) > deadline
1284
}
1285
1286
/// Returns the maximal offset that islands can reach if `distance` more
1287
/// bytes are appended.
1288
///
1289
/// This is used to determine if veneers need insertions since jumps that
1290
/// can't reach past this point must get a veneer of some form.
1291
fn worst_case_end_of_island(&self, distance: CodeOffset) -> CodeOffset {
1292
// Assume that all fixups will require veneers and that the veneers are
1293
// the worst-case size for each platform. This is an over-generalization
1294
// to avoid iterating over the `fixup_records` list or maintaining
1295
// information about it as we go along.
1296
let island_worst_case_size = ((self.fixup_records.len() + self.pending_fixup_records.len())
1297
as u32)
1298
* (I::LabelUse::worst_case_veneer_size())
1299
+ self.pending_constants_size
1300
+ (self.pending_traps.len() * I::TRAP_OPCODE.len()) as u32;
1301
self.cur_offset()
1302
.saturating_add(distance)
1303
.saturating_add(island_worst_case_size)
1304
}
1305
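// Worked example (hypothetical numbers): with 3 outstanding fixups, a
// worst-case veneer size of 20 bytes, 64 bytes of pending constants, and 2
// pending traps whose opcode is 4 bytes, an island could occupy up to
// 3 * 20 + 64 + 2 * 4 = 132 bytes. `worst_case_end_of_island(distance)` is
// then `cur_offset() + distance + 132` (saturating), and a fixup whose
// deadline falls before that bound must be handled in the island emitted now.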
1306
/// Emit all pending constants and required pending veneers.
1307
///
1308
/// Should only be called if `island_needed()` returns true, i.e., if we
1309
/// actually reach a deadline. It's not necessarily a problem to do so
1310
/// otherwise but it may result in unnecessary work during emission.
1311
pub fn emit_island(&mut self, distance: CodeOffset, ctrl_plane: &mut ControlPlane) {
1312
self.emit_island_maybe_forced(ForceVeneers::No, distance, ctrl_plane);
1313
}
1314
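// A minimal emission-loop sketch (illustrative only), mirroring how the tests
// below and `MachTextSectionBuilder::append` drive island insertion: before
// emitting a chunk of `size` bytes, check whether a deadline would be crossed
// and, if so, emit an island first.
//
//     if buf.island_needed(size) {
//         buf.emit_island(size, ctrl_plane);
//     }
//     // ... now emit up to `size` bytes of code ...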
1315
/// Same as `emit_island`, but an internal API with a `force_veneers`
1316
/// argument to force all veneers to always get emitted for debugging.
1317
fn emit_island_maybe_forced(
1318
&mut self,
1319
force_veneers: ForceVeneers,
1320
distance: CodeOffset,
1321
ctrl_plane: &mut ControlPlane,
1322
) {
1323
// We're going to purge fixups, so no latest-branch editing can happen
1324
// anymore.
1325
self.latest_branches.clear();
1326
1327
// End the current location tracking since anything emitted during this
1328
// function shouldn't be attributed to whatever the current source
1329
// location is.
1330
//
1331
// Note that the current source location, if it's set right now, will be
1332
// restored at the end of this island emission.
1333
let cur_loc = self.cur_srcloc.map(|(_, loc)| loc);
1334
if cur_loc.is_some() {
1335
self.end_srcloc();
1336
}
1337
1338
let forced_threshold = self.worst_case_end_of_island(distance);
1339
1340
// First flush out all traps/constants so we have more labels in case
1341
// fixups are applied against these labels.
1342
//
1343
// Note that traps are placed first since this typically happens at the
1344
// end of the function and for disassemblers we try to keep all the code
1345
// contiguously together.
1346
for MachLabelTrap { label, code, loc } in mem::take(&mut self.pending_traps) {
1347
// If this trap has source information associated with it then
1348
// emit this information for the trap instruction going out now too.
1349
if let Some(loc) = loc {
1350
self.start_srcloc(loc);
1351
}
1352
self.align_to(I::LabelUse::ALIGN);
1353
self.bind_label(label, ctrl_plane);
1354
self.add_trap(code);
1355
self.put_data(I::TRAP_OPCODE);
1356
if loc.is_some() {
1357
self.end_srcloc();
1358
}
1359
}
1360
1361
for constant in mem::take(&mut self.pending_constants) {
1362
let MachBufferConstant { align, size, .. } = self.constants[constant];
1363
let label = self.constants[constant].upcoming_label.take().unwrap();
1364
self.align_to(align);
1365
self.bind_label(label, ctrl_plane);
1366
self.used_constants.push((constant, self.cur_offset()));
1367
self.get_appended_space(size);
1368
}
1369
1370
// Either handle all pending fixups because they're ready or move them
1371
// onto the `BinaryHeap` tracking all pending fixups if they aren't
1372
// ready.
1373
assert!(self.latest_branches.is_empty());
1374
for fixup in mem::take(&mut self.pending_fixup_records) {
1375
if self.should_apply_fixup(&fixup, forced_threshold) {
1376
self.handle_fixup(fixup, force_veneers, forced_threshold);
1377
} else {
1378
self.fixup_records.push(fixup);
1379
}
1380
}
1381
self.pending_fixup_deadline = u32::MAX;
1382
while let Some(fixup) = self.fixup_records.peek() {
1383
trace!("emit_island: fixup {:?}", fixup);
1384
1385
// If this fixup shouldn't be applied, that means its label isn't
1386
// defined yet and there'll be remaining space to apply a veneer if
1387
// necessary in the future after this island. In that situation
1388
// because `fixup_records` is sorted by deadline this loop can
1389
// exit.
1390
if !self.should_apply_fixup(fixup, forced_threshold) {
1391
break;
1392
}
1393
1394
let fixup = self.fixup_records.pop().unwrap();
1395
self.handle_fixup(fixup, force_veneers, forced_threshold);
1396
}
1397
1398
if let Some(loc) = cur_loc {
1399
self.start_srcloc(loc);
1400
}
1401
}
1402
1403
fn should_apply_fixup(&self, fixup: &MachLabelFixup<I>, forced_threshold: CodeOffset) -> bool {
1404
let label_offset = self.resolve_label_offset(fixup.label);
1405
label_offset != UNKNOWN_LABEL_OFFSET || fixup.deadline() < forced_threshold
1406
}
1407
1408
fn handle_fixup(
1409
&mut self,
1410
fixup: MachLabelFixup<I>,
1411
force_veneers: ForceVeneers,
1412
forced_threshold: CodeOffset,
1413
) {
1414
let MachLabelFixup {
1415
label,
1416
offset,
1417
kind,
1418
} = fixup;
1419
let start = offset as usize;
1420
let end = (offset + kind.patch_size()) as usize;
1421
let label_offset = self.resolve_label_offset(label);
1422
1423
if label_offset != UNKNOWN_LABEL_OFFSET {
1424
// If the offset of the label for this fixup is known then
1425
// we're going to do something here-and-now. We're either going
1426
// to patch the original offset because it's an in-bounds jump,
1427
// or we're going to generate a veneer, patch the fixup to jump
1428
// to the veneer, and then keep going.
1429
//
1430
// If the label comes after the original fixup, then we should
1431
// be guaranteed that the jump is in-bounds. Otherwise there's
1432
// a bug somewhere because this method wasn't called soon
1433
// enough. All forward-jumps are tracked and should get veneers
1434
// before their deadline comes and they're unable to jump
1435
// further.
1436
//
1437
// Otherwise if the label is before the fixup, then that's a
1438
// backwards jump. If it's past the maximum negative range
1439
// then we'll emit a veneer to jump forward to, which can
// then jump backwards.
1441
let veneer_required = if label_offset >= offset {
1442
assert!((label_offset - offset) <= kind.max_pos_range());
1443
false
1444
} else {
1445
(offset - label_offset) > kind.max_neg_range()
1446
};
1447
trace!(
1448
" -> label_offset = {}, known, required = {} (pos {} neg {})",
1449
label_offset,
1450
veneer_required,
1451
kind.max_pos_range(),
1452
kind.max_neg_range()
1453
);
1454
1455
if (force_veneers == ForceVeneers::Yes && kind.supports_veneer()) || veneer_required {
1456
self.emit_veneer(label, offset, kind);
1457
} else {
1458
let slice = &mut self.data[start..end];
1459
trace!(
1460
"patching in-range! slice = {slice:?}; offset = {offset:#x}; label_offset = {label_offset:#x}"
1461
);
1462
kind.patch(slice, offset, label_offset);
1463
}
1464
} else {
1465
// If the offset of this label is not known at this time then
1466
// that means that a veneer is required because after this
1467
// island the target can't be in range of the original target.
1468
assert!(forced_threshold - offset > kind.max_pos_range());
1469
self.emit_veneer(label, offset, kind);
1470
}
1471
}
1472
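// Worked example (hypothetical numbers): suppose a fixup at `offset` = 0x2000
// refers to a label bound at `label_offset` = 0x1000, and the `LabelUse` kind
// has a 1 MiB negative range. The jump is backwards by 0x1000 bytes, which is
// within `max_neg_range()`, so the original instruction is patched directly.
// If instead the label were 2 MiB behind, `veneer_required` would be true and
// `emit_veneer` would be used to reach it through a longer-range branch.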
1473
/// Emits a "veneer" the `kind` code at `offset` to jump to `label`.
1474
///
1475
/// This will generate extra machine code, using `kind`, to get a
1476
/// larger-jump-kind than `kind` allows. The code at `offset` is then
1477
/// patched to jump to our new code, and then the new code is enqueued for
1478
/// a fixup to get processed at some later time.
1479
fn emit_veneer(&mut self, label: MachLabel, offset: CodeOffset, kind: I::LabelUse) {
1480
// If this `kind` doesn't support a veneer then that's a bug in the
1481
// backend because we need to implement support for such a veneer.
1482
assert!(
1483
kind.supports_veneer(),
1484
"jump beyond the range of {kind:?} but a veneer isn't supported",
1485
);
1486
1487
// Allocate space for a veneer in the island.
1488
self.align_to(I::LabelUse::ALIGN);
1489
let veneer_offset = self.cur_offset();
1490
trace!("making a veneer at {}", veneer_offset);
1491
let start = offset as usize;
1492
let end = (offset + kind.patch_size()) as usize;
1493
let slice = &mut self.data[start..end];
1494
// Patch the original label use to refer to the veneer.
1495
trace!(
1496
"patching original at offset {} to veneer offset {}",
1497
offset, veneer_offset
1498
);
1499
kind.patch(slice, offset, veneer_offset);
1500
// Generate the veneer.
1501
let veneer_slice = self.get_appended_space(kind.veneer_size() as usize);
1502
let (veneer_fixup_off, veneer_label_use) =
1503
kind.generate_veneer(veneer_slice, veneer_offset);
1504
trace!(
1505
"generated veneer; fixup offset {}, label_use {:?}",
1506
veneer_fixup_off, veneer_label_use
1507
);
1508
// Register a new use of `label` with our new veneer fixup and
1509
// offset. This'll recalculate deadlines accordingly and
1510
// enqueue this fixup to get processed at some later
1511
// time.
1512
self.use_label_at_offset(veneer_fixup_off, label, veneer_label_use);
1513
}
1514
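// Concrete illustration (AArch64, as exercised by the tests below): a
// conditional branch carries a 19-bit label use (a 21-bit byte range once
// shifted left by two). When its target ends up out of that range, the branch
// is patched here to hit the veneer instead, and the veneer itself carries a
// 26-bit, unconditional-branch-style label use with much greater reach; that
// new label use is what gets re-registered via `use_label_at_offset` above.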
1515
fn finish_emission_maybe_forcing_veneers(
1516
&mut self,
1517
force_veneers: ForceVeneers,
1518
ctrl_plane: &mut ControlPlane,
1519
) {
1520
while !self.pending_constants.is_empty()
1521
|| !self.pending_traps.is_empty()
1522
|| !self.fixup_records.is_empty()
1523
|| !self.pending_fixup_records.is_empty()
1524
{
1525
// `emit_island()` will emit any pending veneers and constants, and
1526
// as a side-effect, will also take care of any fixups with resolved
1527
// labels eagerly.
1528
self.emit_island_maybe_forced(force_veneers, u32::MAX, ctrl_plane);
1529
}
1530
1531
// Ensure that all labels have been fixed up after the last island is emitted. This is a
1532
// full (release-mode) assert because an unresolved label means the emitted code is
1533
// incorrect.
1534
assert!(self.fixup_records.is_empty());
1535
assert!(self.pending_fixup_records.is_empty());
1536
}
1537
1538
/// Finish any deferred emissions and/or fixups.
1539
pub fn finish(
1540
mut self,
1541
constants: &VCodeConstants,
1542
ctrl_plane: &mut ControlPlane,
1543
) -> MachBufferFinalized<Stencil> {
1544
let _tt = timing::vcode_emit_finish();
1545
1546
self.finish_emission_maybe_forcing_veneers(ForceVeneers::No, ctrl_plane);
1547
1548
let alignment = self.finish_constants(constants);
1549
1550
// Resolve all labels to their offsets.
1551
let finalized_relocs = self
1552
.relocs
1553
.iter()
1554
.map(|reloc| FinalizedMachReloc {
1555
offset: reloc.offset,
1556
kind: reloc.kind,
1557
addend: reloc.addend,
1558
target: match &reloc.target {
1559
RelocTarget::ExternalName(name) => {
1560
FinalizedRelocTarget::ExternalName(name.clone())
1561
}
1562
RelocTarget::Label(label) => {
1563
FinalizedRelocTarget::Func(self.resolve_label_offset(*label))
1564
}
1565
},
1566
})
1567
.collect();
1568
1569
let finalized_exception_handlers = self
1570
.exception_handlers
1571
.iter()
1572
.map(|handler| handler.finalize(|label| self.resolve_label_offset(label)))
1573
.collect();
1574
1575
let mut srclocs = self.srclocs;
1576
srclocs.sort_by_key(|entry| entry.start);
1577
1578
MachBufferFinalized {
1579
data: self.data,
1580
relocs: finalized_relocs,
1581
traps: self.traps,
1582
call_sites: self.call_sites,
1583
patchable_call_sites: self.patchable_call_sites,
1584
exception_handlers: finalized_exception_handlers,
1585
srclocs,
1586
debug_tags: self.debug_tags,
1587
debug_tag_pool: self.debug_tag_pool,
1588
user_stack_maps: self.user_stack_maps,
1589
unwind_info: self.unwind_info,
1590
alignment,
1591
frame_layout: self.frame_layout,
1592
nop_units: I::gen_nop_units(),
1593
}
1594
}
1595
1596
/// Add an external relocation at the given offset.
1597
pub fn add_reloc_at_offset<T: Into<RelocTarget> + Clone>(
1598
&mut self,
1599
offset: CodeOffset,
1600
kind: Reloc,
1601
target: &T,
1602
addend: Addend,
1603
) {
1604
let target: RelocTarget = target.clone().into();
1605
// FIXME(#3277): This should use `I::LabelUse::from_reloc` to optionally
1606
// generate a label-use statement to track whether an island is possibly
1607
// needed to escape this function to actually get to the external name.
1608
// This is most likely to come up on AArch64 where calls between
1609
// functions use a 26-bit signed offset which gives +/- 64MB. This means
1610
// that if a function is 128MB in size and there's a call in the middle
1611
// it's impossible to reach the actual target. Also, while it's
1612
// technically possible to jump to the start of a function and then jump
1613
// further, island insertion below always inserts islands after
1614
// previously appended code so for Cranelift's own implementation this
1615
// is also a problem for 64MB functions on AArch64 which start with a
// call instruction; those won't be able to escape.
1617
//
1618
// Ideally what needs to happen here is that a `LabelUse` is
1619
// transparently generated (or call-sites of this function are audited
1620
// to generate a `LabelUse` instead) and tracked internally. The actual
1621
// relocation would then change over time if and when a veneer is
1622
// inserted, where the relocation here would be patched by this
1623
// `MachBuffer` to jump to the veneer. The problem, though, is that all
1624
// this still needs to end up, in the case of a singular function,
1625
// generating a final relocation pointing either to this particular
1626
// relocation or to the veneer inserted. Additionally
1627
// `MachBuffer` needs the concept of a label which will never be
1628
// resolved, so `emit_island` doesn't trip over not actually ever
1629
// knowing what some labels are. Currently the loop in
1630
// `finish_emission_maybe_forcing_veneers` would otherwise infinitely
1631
// loop.
1632
//
1633
// For now this means that, because relocs aren't tracked at all,
// AArch64 functions have a rough size limit of 64MB. For now that's
1635
// somewhat reasonable and the failure mode is a panic in `MachBuffer`
1636
// when a relocation can't otherwise be resolved later, so it shouldn't
1637
// actually result in any memory unsafety or anything like that.
1638
self.relocs.push(MachReloc {
1639
offset,
1640
kind,
1641
target,
1642
addend,
1643
});
1644
}
1645
1646
/// Add an external relocation at the current offset.
1647
pub fn add_reloc<T: Into<RelocTarget> + Clone>(
1648
&mut self,
1649
kind: Reloc,
1650
target: &T,
1651
addend: Addend,
1652
) {
1653
self.add_reloc_at_offset(self.data.len() as CodeOffset, kind, target, addend);
1654
}
1655
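// A minimal usage sketch (illustrative only), mirroring the `metadata_records`
// test below; `user_name_ref` is a placeholder for a `UserExternalNameRef`
// obtained elsewhere.
//
//     buf.add_reloc(Reloc::Abs8, &ExternalName::User(user_name_ref), 0);
//     buf.put_data(&[0u8; 8]); // placeholder bytes the relocation will fill in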
1656
/// Add a trap record at the current offset.
1657
pub fn add_trap(&mut self, code: TrapCode) {
1658
self.traps.push(MachTrap {
1659
offset: self.data.len() as CodeOffset,
1660
code,
1661
});
1662
}
1663
1664
/// Add a call-site record at the current offset.
1665
pub fn add_call_site(&mut self) {
1666
self.add_try_call_site(None, core::iter::empty());
1667
}
1668
1669
/// Add a call-site record at the current offset with exception
1670
/// handlers.
1671
pub fn add_try_call_site(
1672
&mut self,
1673
frame_offset: Option<u32>,
1674
exception_handlers: impl Iterator<Item = MachExceptionHandler>,
1675
) {
1676
let start = u32::try_from(self.exception_handlers.len()).unwrap();
1677
self.exception_handlers.extend(exception_handlers);
1678
let end = u32::try_from(self.exception_handlers.len()).unwrap();
1679
let exception_handler_range = start..end;
1680
1681
self.call_sites.push(MachCallSite {
1682
ret_addr: self.data.len() as CodeOffset,
1683
frame_offset,
1684
exception_handler_range,
1685
});
1686
}
1687
1688
/// Add a patchable call record at the current offset. The actual
/// call is expected to have been emitted; the VCodeInst trait
/// specifies how to NOP it out, and we carry that information to
/// the finalized MachBuffer.
1692
pub fn add_patchable_call_site(&mut self, len: u32) {
1693
self.patchable_call_sites.push(MachPatchableCallSite {
1694
ret_addr: self.cur_offset(),
1695
len,
1696
});
1697
}
1698
1699
/// Add an unwind record at the current offset.
1700
pub fn add_unwind(&mut self, unwind: UnwindInst) {
1701
self.unwind_info.push((self.cur_offset(), unwind));
1702
}
1703
1704
/// Set the `SourceLoc` for code from this offset until the offset at the
1705
/// next call to `end_srcloc()`.
1706
/// Returns the current [CodeOffset] and [RelSourceLoc].
1707
pub fn start_srcloc(&mut self, loc: RelSourceLoc) -> (CodeOffset, RelSourceLoc) {
1708
let cur = (self.cur_offset(), loc);
1709
self.cur_srcloc = Some(cur);
1710
cur
1711
}
1712
1713
/// Mark the end of the `SourceLoc` segment started at the last
1714
/// `start_srcloc()` call.
1715
pub fn end_srcloc(&mut self) {
1716
let (start, loc) = self
1717
.cur_srcloc
1718
.take()
1719
.expect("end_srcloc() called without start_srcloc()");
1720
let end = self.cur_offset();
1721
// Skip zero-length extends.
1722
debug_assert!(end >= start);
1723
if end > start {
1724
self.srclocs.push(MachSrcLoc { start, end, loc });
1725
}
1726
}
1727
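// A minimal usage sketch (illustrative only): source locations are recorded by
// bracketing the instructions they cover, exactly as the island code above
// does for trap opcodes; `rel_srcloc` stands in for a `RelSourceLoc`.
//
//     buf.start_srcloc(rel_srcloc);
//     // ... emit the instruction(s) attributed to this location ...
//     buf.end_srcloc();
//
// Zero-length ranges are dropped, so bracketing a span that emits nothing is
// harmless.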
1728
/// Push a user stack map onto this buffer.
1729
///
1730
/// The stack map is associated with the given `return_addr` code
1731
/// offset. This must be the PC for the instruction just *after* this stack
1732
/// map's associated instruction. For example in the sequence `call $foo;
1733
/// add r8, rax`, the `return_addr` must be the offset of the start of the
1734
/// `add` instruction.
1735
///
1736
/// Stack maps must be pushed in sorted `return_addr` order.
1737
pub fn push_user_stack_map(
1738
&mut self,
1739
emit_state: &I::State,
1740
return_addr: CodeOffset,
1741
mut stack_map: ir::UserStackMap,
1742
) {
1743
let span = emit_state.frame_layout().active_size();
1744
trace!("Adding user stack map @ {return_addr:#x} spanning {span} bytes: {stack_map:?}");
1745
1746
debug_assert!(
1747
self.user_stack_maps
1748
.last()
1749
.map_or(true, |(prev_addr, _, _)| *prev_addr < return_addr),
1750
"pushed stack maps out of order: {} is not less than {}",
1751
self.user_stack_maps.last().unwrap().0,
1752
return_addr,
1753
);
1754
1755
stack_map.finalize(emit_state.frame_layout().sp_to_sized_stack_slots());
1756
self.user_stack_maps.push((return_addr, span, stack_map));
1757
}
1758
1759
/// Push a debug tag associated with the current buffer offset.
1760
pub fn push_debug_tags(&mut self, pos: MachDebugTagPos, tags: &[DebugTag]) {
1761
trace!("debug tags at offset {}: {tags:?}", self.cur_offset());
1762
let start = u32::try_from(self.debug_tag_pool.len()).unwrap();
1763
self.debug_tag_pool.extend(tags.iter().cloned());
1764
let end = u32::try_from(self.debug_tag_pool.len()).unwrap();
1765
self.debug_tags.push(MachDebugTags {
1766
offset: self.cur_offset(),
1767
pos,
1768
range: start..end,
1769
});
1770
}
1771
1772
/// Increase the alignment of the buffer to the given alignment if bigger
1773
/// than the current alignment.
1774
pub fn set_log2_min_function_alignment(&mut self, align_to: u8) {
1775
self.min_alignment = self.min_alignment.max(
1776
1u32.checked_shl(u32::from(align_to))
1777
.expect("log2_min_function_alignment too large"),
1778
);
1779
}
1780
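// Worked example: `set_log2_min_function_alignment(4)` requests
// 1 << 4 = 16-byte alignment; a later call with a smaller value (say 2,
// i.e. 4 bytes) leaves the recorded minimum at 16, because only the maximum
// requested alignment is kept.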
1781
/// Set the frame layout metadata.
1782
pub fn set_frame_layout(&mut self, frame_layout: MachBufferFrameLayout) {
1783
debug_assert!(self.frame_layout.is_none());
1784
self.frame_layout = Some(frame_layout);
1785
}
1786
}
1787
1788
impl<I: VCodeInst> Extend<u8> for MachBuffer<I> {
1789
fn extend<T: IntoIterator<Item = u8>>(&mut self, iter: T) {
1790
for b in iter {
1791
self.put1(b);
1792
}
1793
}
1794
}
1795
1796
impl<T: CompilePhase> MachBufferFinalized<T> {
1797
/// Get a list of source location mapping tuples in sorted-by-start-offset order.
1798
pub fn get_srclocs_sorted(&self) -> &[T::MachSrcLocType] {
1799
&self.srclocs[..]
1800
}
1801
1802
/// Get all debug tags, sorted by associated offset.
1803
pub fn debug_tags(&self) -> impl Iterator<Item = MachBufferDebugTagList<'_>> {
1804
self.debug_tags.iter().map(|tags| {
1805
let start = usize::try_from(tags.range.start).unwrap();
1806
let end = usize::try_from(tags.range.end).unwrap();
1807
MachBufferDebugTagList {
1808
offset: tags.offset,
1809
pos: tags.pos,
1810
tags: &self.debug_tag_pool[start..end],
1811
}
1812
})
1813
}
1814
1815
/// Get the total required size for the code.
1816
pub fn total_size(&self) -> CodeOffset {
1817
self.data.len() as CodeOffset
1818
}
1819
1820
/// Return the code in this mach buffer as a hex string for testing purposes.
1821
pub fn stringify_code_bytes(&self) -> String {
1822
// This is pretty lame, but whatever ..
1823
use core::fmt::Write;
1824
let mut s = String::with_capacity(self.data.len() * 2);
1825
for b in &self.data {
1826
write!(&mut s, "{b:02X}").unwrap();
1827
}
1828
s
1829
}
1830
1831
/// Get the code bytes.
1832
pub fn data(&self) -> &[u8] {
1833
// N.B.: we emit every section into the .text section as far as
1834
// the `CodeSink` is concerned; we do not bother to segregate
1835
// the contents into the actual program text, the jumptable and the
1836
// rodata (constant pool). This allows us to generate code assuming
1837
// that these will not be relocated relative to each other, and avoids
1838
// having to designate each section as belonging in one of the three
1839
// fixed categories defined by `CodeSink`. If this becomes a problem
1840
// later (e.g. because of memory permissions or similar), we can
1841
// add this designation and segregate the output; take care, however,
1842
// to add the appropriate relocations in this case.
1843
1844
&self.data[..]
1845
}
1846
1847
/// Get a mutable slice of the code bytes, allowing patching
1848
/// post-passes.
1849
pub fn data_mut(&mut self) -> &mut [u8] {
1850
&mut self.data[..]
1851
}
1852
1853
/// Get the list of external relocations for this code.
1854
pub fn relocs(&self) -> &[FinalizedMachReloc] {
1855
&self.relocs[..]
1856
}
1857
1858
/// Get the list of trap records for this code.
1859
pub fn traps(&self) -> &[MachTrap] {
1860
&self.traps[..]
1861
}
1862
1863
/// Get the user stack map metadata for this code.
1864
pub fn user_stack_maps(&self) -> &[(CodeOffset, u32, ir::UserStackMap)] {
1865
&self.user_stack_maps
1866
}
1867
1868
/// Take this buffer's user stack map metadata.
1869
pub fn take_user_stack_maps(&mut self) -> SmallVec<[(CodeOffset, u32, ir::UserStackMap); 8]> {
1870
mem::take(&mut self.user_stack_maps)
1871
}
1872
1873
/// Get the list of call sites for this code, along with
1874
/// associated exception handlers.
1875
///
1876
/// Each item yielded by the returned iterator is a struct with:
1877
///
1878
/// - The call site metadata record, with a `ret_addr` field
1879
/// directly accessible and denoting the offset of the return
1880
/// address into this buffer's code.
1881
/// - The slice of pairs of exception tags and code offsets
1882
/// denoting exception-handler entry points associated with this
1883
/// call site.
1884
pub fn call_sites(&self) -> impl Iterator<Item = FinalizedMachCallSite<'_>> + '_ {
1885
self.call_sites.iter().map(|call_site| {
1886
let handler_range = call_site.exception_handler_range.clone();
1887
let handler_range = usize::try_from(handler_range.start).unwrap()
1888
..usize::try_from(handler_range.end).unwrap();
1889
FinalizedMachCallSite {
1890
ret_addr: call_site.ret_addr,
1891
frame_offset: call_site.frame_offset,
1892
exception_handlers: &self.exception_handlers[handler_range],
1893
}
1894
})
1895
}
1896
1897
/// Get the frame layout, if known.
1898
pub fn frame_layout(&self) -> Option<&MachBufferFrameLayout> {
1899
self.frame_layout.as_ref()
1900
}
1901
1902
/// Get the list of patchable call sites for this code.
1903
///
1904
/// Each location in the buffer contains the bytes for a call
1905
/// instruction to the specified target. If the call is to be
1906
/// patched out, the bytes in the region should be replaced with
1907
/// those given in the `MachBufferFinalized::nop_units` array, repeated
1908
/// as many times as necessary. (The length of the patchable
1909
/// region is guaranteed to be an integer multiple of that NOP
1910
/// unit size.)
1911
pub fn patchable_call_sites(&self) -> impl Iterator<Item = &MachPatchableCallSite> + '_ {
1912
self.patchable_call_sites.iter()
1913
}
1914
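// A minimal patch-out sketch (illustrative only). `code` is assumed to be the
// finalized machine code (e.g. via `data_mut()`), `sites` the patchable call
// sites, and `nop_unit` a single NOP encoding for the target ISA taken from
// the finalized buffer's NOP-unit table; the exact shape of that table is
// backend-specific.
//
//     for site in sites {
//         let start = (site.ret_addr - site.len) as usize;
//         let region = &mut code[start..start + site.len as usize];
//         // The region length is guaranteed to be a multiple of the unit size.
//         for chunk in region.chunks_mut(nop_unit.len()) {
//             chunk.copy_from_slice(nop_unit);
//         }
//     }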
}
1915
1916
/// An item in the exception-handler list for a callsite, with label
1917
/// references. Items are interpreted in left-to-right order and the
1918
/// first match wins.
1919
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
1920
pub enum MachExceptionHandler {
1921
/// A specific tag (in the current dynamic context) should be
1922
/// handled by the code at the given offset.
1923
Tag(ExceptionTag, MachLabel),
1924
/// All exceptions should be handled by the code at the given
1925
/// offset.
1926
Default(MachLabel),
1927
/// The dynamic context for interpreting tags is updated to the
1928
/// value stored in the given machine location (in this frame's
1929
/// context).
1930
Context(ExceptionContextLoc),
1931
}
1932
1933
impl MachExceptionHandler {
1934
fn finalize<F: Fn(MachLabel) -> CodeOffset>(self, f: F) -> FinalizedMachExceptionHandler {
1935
match self {
1936
Self::Tag(tag, label) => FinalizedMachExceptionHandler::Tag(tag, f(label)),
1937
Self::Default(label) => FinalizedMachExceptionHandler::Default(f(label)),
1938
Self::Context(loc) => FinalizedMachExceptionHandler::Context(loc),
1939
}
1940
}
1941
}
1942
1943
/// An item in the exception-handler list for a callsite, with final
1944
/// (lowered) code offsets. Items are interpreted in left-to-right
1945
/// order and the first match wins.
1946
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
1947
#[cfg_attr(
1948
feature = "enable-serde",
1949
derive(serde_derive::Serialize, serde_derive::Deserialize)
1950
)]
1951
pub enum FinalizedMachExceptionHandler {
1952
/// A specific tag (in the current dynamic context) should be
1953
/// handled by the code at the given offset.
1954
Tag(ExceptionTag, CodeOffset),
1955
/// All exceptions should be handled by the code at the given
1956
/// offset.
1957
Default(CodeOffset),
1958
/// The dynamic context for interpreting tags is updated to the
1959
/// value stored in the given machine location (in this frame's
1960
/// context).
1961
Context(ExceptionContextLoc),
1962
}
1963
1964
/// A location for a dynamic exception context value.
1965
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
1966
#[cfg_attr(
1967
feature = "enable-serde",
1968
derive(serde_derive::Serialize, serde_derive::Deserialize)
1969
)]
1970
pub enum ExceptionContextLoc {
1971
/// An offset from SP at the callsite.
1972
SPOffset(u32),
1973
/// A GPR at the callsite. The physical register number for the
1974
/// GPR register file on the target architecture is used.
1975
GPR(u8),
1976
}
1977
1978
/// Metadata about a constant.
1979
struct MachBufferConstant {
1980
/// A label which has not yet been bound which can be used for this
1981
/// constant.
1982
///
1983
/// This is lazily created when a label is requested for a constant and is
1984
/// cleared when a constant is emitted.
1985
upcoming_label: Option<MachLabel>,
1986
/// Required alignment.
1987
align: CodeOffset,
1988
/// The byte size of this constant.
1989
size: usize,
1990
}
1991
1992
/// A trap that is deferred to the next time an island is emitted for either
1993
/// traps, constants, or fixups.
1994
struct MachLabelTrap {
1995
/// This label will refer to the trap's offset.
1996
label: MachLabel,
1997
/// The code associated with this trap.
1998
code: TrapCode,
1999
/// An optional source location to assign for this trap.
2000
loc: Option<RelSourceLoc>,
2001
}
2002
2003
/// A fixup to perform on the buffer once code is emitted. Fixups always refer
2004
/// to labels and patch the code based on label offsets. Hence, they are like
2005
/// relocations, but internal to one buffer.
2006
#[derive(Debug)]
2007
struct MachLabelFixup<I: VCodeInst> {
2008
/// The label whose offset controls this fixup.
2009
label: MachLabel,
2010
/// The offset to fix up / patch to refer to this label.
2011
offset: CodeOffset,
2012
/// The kind of fixup. This is architecture-specific; each architecture may have,
2013
/// e.g., several types of branch instructions, each with differently-sized
2014
/// offset fields and different places within the instruction to place the
2015
/// bits.
2016
kind: I::LabelUse,
2017
}
2018
2019
impl<I: VCodeInst> MachLabelFixup<I> {
2020
fn deadline(&self) -> CodeOffset {
2021
self.offset.saturating_add(self.kind.max_pos_range())
2022
}
2023
}
2024
2025
impl<I: VCodeInst> PartialEq for MachLabelFixup<I> {
2026
fn eq(&self, other: &Self) -> bool {
2027
self.deadline() == other.deadline()
2028
}
2029
}
2030
2031
impl<I: VCodeInst> Eq for MachLabelFixup<I> {}
2032
2033
impl<I: VCodeInst> PartialOrd for MachLabelFixup<I> {
2034
fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
2035
Some(self.cmp(other))
2036
}
2037
}
2038
2039
impl<I: VCodeInst> Ord for MachLabelFixup<I> {
2040
fn cmp(&self, other: &Self) -> Ordering {
2041
other.deadline().cmp(&self.deadline())
2042
}
2043
}
2044
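// Note: the deliberately reversed comparison above (`other` vs. `self`) makes
// `BinaryHeap<MachLabelFixup<I>>`, which is a max-heap, behave as a min-heap
// keyed by deadline, so `fixup_records.peek()` always yields the fixup whose
// deadline expires first. For example, fixups with deadlines 0x3000 and 0x1000
// pop in the order 0x1000, then 0x3000.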
2045
/// A relocation resulting from a compilation.
2046
#[derive(Clone, Debug, PartialEq)]
2047
#[cfg_attr(
2048
feature = "enable-serde",
2049
derive(serde_derive::Serialize, serde_derive::Deserialize)
2050
)]
2051
pub struct MachRelocBase<T> {
2052
/// The offset at which the relocation applies, *relative to the
2053
/// containing section*.
2054
pub offset: CodeOffset,
2055
/// The kind of relocation.
2056
pub kind: Reloc,
2057
/// The external symbol / name to which this relocation refers.
2058
pub target: T,
2059
/// The addend to add to the symbol value.
2060
pub addend: i64,
2061
}
2062
2063
type MachReloc = MachRelocBase<RelocTarget>;
2064
2065
/// A relocation resulting from a compilation.
2066
pub type FinalizedMachReloc = MachRelocBase<FinalizedRelocTarget>;
2067
2068
/// A Relocation target
2069
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
2070
pub enum RelocTarget {
2071
/// Points to an [ExternalName] outside the current function.
2072
ExternalName(ExternalName),
2073
/// Points to a [MachLabel] inside this function.
2074
/// This is different from [MachLabelFixup] in that both the relocation and the
2075
/// label will be emitted and are only resolved at link time.
2076
///
2077
/// There is no reason to prefer this over [MachLabelFixup] unless the ABI requires it.
2078
Label(MachLabel),
2079
}
2080
2081
impl From<ExternalName> for RelocTarget {
2082
fn from(name: ExternalName) -> Self {
2083
Self::ExternalName(name)
2084
}
2085
}
2086
2087
impl From<MachLabel> for RelocTarget {
2088
fn from(label: MachLabel) -> Self {
2089
Self::Label(label)
2090
}
2091
}
2092
2093
/// A Relocation target
2094
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
2095
#[cfg_attr(
2096
feature = "enable-serde",
2097
derive(serde_derive::Serialize, serde_derive::Deserialize)
2098
)]
2099
pub enum FinalizedRelocTarget {
2100
/// Points to an [ExternalName] outside the current function.
2101
ExternalName(ExternalName),
2102
/// Points to a [CodeOffset] from the start of the current function.
2103
Func(CodeOffset),
2104
}
2105
2106
impl FinalizedRelocTarget {
2107
/// Returns a display for the current [FinalizedRelocTarget], with extra context to prettify the
2108
/// output.
2109
pub fn display<'a>(&'a self, params: Option<&'a FunctionParameters>) -> String {
2110
match self {
2111
FinalizedRelocTarget::ExternalName(name) => format!("{}", name.display(params)),
2112
FinalizedRelocTarget::Func(offset) => format!("func+{offset}"),
2113
}
2114
}
2115
}
2116
2117
/// A trap record resulting from a compilation.
2118
#[derive(Clone, Debug, PartialEq)]
2119
#[cfg_attr(
2120
feature = "enable-serde",
2121
derive(serde_derive::Serialize, serde_derive::Deserialize)
2122
)]
2123
pub struct MachTrap {
2124
/// The offset at which the trap instruction occurs, *relative to the
2125
/// containing section*.
2126
pub offset: CodeOffset,
2127
/// The trap code.
2128
pub code: TrapCode,
2129
}
2130
2131
/// A call site record resulting from a compilation.
2132
#[derive(Clone, Debug, PartialEq)]
2133
#[cfg_attr(
2134
feature = "enable-serde",
2135
derive(serde_derive::Serialize, serde_derive::Deserialize)
2136
)]
2137
pub struct MachCallSite {
2138
/// The offset of the call's return address, *relative to the
2139
/// start of the buffer*.
2140
pub ret_addr: CodeOffset,
2141
2142
/// The offset from the FP at this callsite down to the SP when
2143
/// the call occurs, if known. In other words, the size of the
2144
/// stack frame up to the saved FP slot. Useful to recover the
2145
/// start of the stack frame and to look up dynamic contexts
2146
/// stored in [`ExceptionContextLoc::SPOffset`].
2147
///
2148
/// If `None`, the compiler backend did not specify a frame
2149
/// offset. The runtime in use with the compiled code may require
2150
/// the frame offset if exception handlers are present or dynamic
2151
/// context is used, but that is not Cranelift's concern: the
2152
/// frame offset is optional at this level.
2153
pub frame_offset: Option<u32>,
2154
2155
/// Range in `exception_handlers` corresponding to the exception
2156
/// handlers for this callsite.
2157
exception_handler_range: Range<u32>,
2158
}
2159
2160
/// A call site record resulting from a compilation.
2161
#[derive(Clone, Debug, PartialEq)]
2162
pub struct FinalizedMachCallSite<'a> {
2163
/// The offset of the call's return address, *relative to the
2164
/// start of the buffer*.
2165
pub ret_addr: CodeOffset,
2166
2167
/// The offset from the FP at this callsite down to the SP when
2168
/// the call occurs, if known. In other words, the size of the
2169
/// stack frame up to the saved FP slot. Useful to recover the
2170
/// start of the stack frame and to look up dynamic contexts
2171
/// stored in [`ExceptionContextLoc::SPOffset`].
2172
///
2173
/// If `None`, the compiler backend did not specify a frame
2174
/// offset. The runtime in use with the compiled code may require
2175
/// the frame offset if exception handlers are present or dynamic
2176
/// context is used, but that is not Cranelift's concern: the
2177
/// frame offset is optional at this level.
2178
pub frame_offset: Option<u32>,
2179
2180
/// Exception handlers at this callsite, with target offsets
2181
/// *relative to the start of the buffer*.
2182
pub exception_handlers: &'a [FinalizedMachExceptionHandler],
2183
}
2184
2185
/// A patchable call site record resulting from a compilation.
2186
#[derive(Clone, Debug, PartialEq)]
2187
#[cfg_attr(
2188
feature = "enable-serde",
2189
derive(serde_derive::Serialize, serde_derive::Deserialize)
2190
)]
2191
pub struct MachPatchableCallSite {
2192
/// The offset of the call's return address (i.e., the address
2193
/// after the end of the patchable region), *relative to the start
2194
/// of the buffer*.
2195
pub ret_addr: CodeOffset,
2196
2197
/// The length of the region to be patched by NOP bytes.
2198
pub len: u32,
2199
}
2200
2201
/// A source-location mapping resulting from a compilation.
2202
#[derive(PartialEq, Debug, Clone)]
2203
#[cfg_attr(
2204
feature = "enable-serde",
2205
derive(serde_derive::Serialize, serde_derive::Deserialize)
2206
)]
2207
pub struct MachSrcLoc<T: CompilePhase> {
2208
/// The start of the region of code corresponding to a source location.
2209
/// This is relative to the start of the function, not to the start of the
2210
/// section.
2211
pub start: CodeOffset,
2212
/// The end of the region of code corresponding to a source location.
2213
/// This is relative to the start of the function, not to the start of the
2214
/// section.
2215
pub end: CodeOffset,
2216
/// The source location.
2217
pub loc: T::SourceLocType,
2218
}
2219
2220
impl MachSrcLoc<Stencil> {
2221
fn apply_base_srcloc(self, base_srcloc: SourceLoc) -> MachSrcLoc<Final> {
2222
MachSrcLoc {
2223
start: self.start,
2224
end: self.end,
2225
loc: self.loc.expand(base_srcloc),
2226
}
2227
}
2228
}
2229
2230
/// Record of branch instruction in the buffer, to facilitate editing.
2231
#[derive(Clone, Debug)]
2232
struct MachBranch {
2233
start: CodeOffset,
2234
end: CodeOffset,
2235
target: MachLabel,
2236
fixup: usize,
2237
inverted: Option<SmallVec<[u8; 8]>>,
2238
/// All labels pointing to the start of this branch. For correctness, this
2239
/// *must* be complete (i.e., must contain all labels whose resolved offsets
2240
/// are at the start of this branch): we rely on being able to redirect all
2241
/// labels that could jump to this branch before removing it, if it is
2242
/// otherwise unreachable.
2243
labels_at_this_branch: SmallVec<[MachLabel; 4]>,
2244
}
2245
2246
impl MachBranch {
2247
fn is_cond(&self) -> bool {
2248
self.inverted.is_some()
2249
}
2250
fn is_uncond(&self) -> bool {
2251
self.inverted.is_none()
2252
}
2253
}
2254
2255
/// Stack-frame layout information carried through to machine
2256
/// code. This provides sufficient information to interpret an active
2257
/// stack frame from a running function, if provided.
2258
#[derive(Clone, Debug, PartialEq)]
2259
#[cfg_attr(
2260
feature = "enable-serde",
2261
derive(serde_derive::Serialize, serde_derive::Deserialize)
2262
)]
2263
pub struct MachBufferFrameLayout {
2264
/// Offset from bottom of frame to FP (near top of frame). This
2265
/// allows reading the frame given only FP.
2266
pub frame_to_fp_offset: u32,
2267
/// Offset from bottom of frame for each StackSlot.
2268
pub stackslots: SecondaryMap<ir::StackSlot, MachBufferStackSlot>,
2269
}
2270
2271
/// Descriptor for a single stack slot in the compiled function.
2272
#[derive(Clone, Debug, PartialEq, Default)]
2273
#[cfg_attr(
2274
feature = "enable-serde",
2275
derive(serde_derive::Serialize, serde_derive::Deserialize)
2276
)]
2277
pub struct MachBufferStackSlot {
2278
/// Offset from the bottom of the stack frame.
2279
pub offset: u32,
2280
2281
/// User-provided key to describe this stack slot.
2282
pub key: Option<ir::StackSlotKey>,
2283
}
2284
2285
/// Debug tags: a sequence of references to a stack slot, or a
2286
/// user-defined value, at a particular PC.
2287
#[derive(Clone, Debug, PartialEq)]
2288
#[cfg_attr(
2289
feature = "enable-serde",
2290
derive(serde_derive::Serialize, serde_derive::Deserialize)
2291
)]
2292
pub(crate) struct MachDebugTags {
2293
/// Offset at which this tag applies.
2294
pub offset: CodeOffset,
2295
2296
/// Position on the attached instruction. This indicates whether
2297
/// the tags attach to the prior instruction (i.e., as a return
2298
/// point from a call) or the current instruction (i.e., as a PC
2299
/// seen during a trap).
2300
pub pos: MachDebugTagPos,
2301
2302
/// The range in the tag pool.
2303
pub range: Range<u32>,
2304
}
2305
2306
/// Debug tag position on an instruction.
2307
///
2308
/// We need to distinguish position on an instruction, and not just
2309
/// use offsets, because of the following case:
2310
///
2311
/// ```plain
2312
/// <tag1, tag2> call ...
2313
/// <tag3, tag4> trapping_store ...
2314
/// ```
2315
///
2316
/// If the stack is walked and interpreted with debug tags while
2317
/// within the call, the PC seen will be the return point, i.e. the
2318
/// address after the call. If the stack is walked and interpreted
2319
/// with debug tags upon a trap of the following instruction, it will
2320
/// be the PC of that instruction -- which is the same PC! Thus to
2321
/// disambiguate which tags we want, we attach a "pre/post" flag to
2322
/// every group of tags at an offset; and when we look up tags, we
2323
/// look them up for an offset and "position" at that offset.
2324
///
2325
/// Thus there are logically two positions at every offset -- so the
2326
/// above will be emitted as
2327
///
2328
/// ```plain
2329
/// 0: call ...
2330
/// 4, post: <tag1, tag2>
2331
/// 4, pre: <tag3, tag4>
2332
/// 4: trapping_store ...
2333
/// ```
2334
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
2335
#[cfg_attr(
2336
feature = "enable-serde",
2337
derive(serde_derive::Serialize, serde_derive::Deserialize)
2338
)]
2339
pub enum MachDebugTagPos {
2340
/// Tags attached after the instruction that ends at this offset.
2341
///
2342
/// This is used to attach tags to a call, because the PC we see
2343
/// when walking the stack is the *return point*.
2344
Post,
2345
/// Tags attached before the instruction that starts at this offset.
2346
///
2347
/// This is used to attach tags to every other kind of
2348
/// instruction, because the PC we see when processing a trap of
2349
/// that instruction is the PC of that instruction, not the
2350
/// following one.
2351
Pre,
2352
}
2353
2354
/// Iterator item for visiting debug tags.
2355
pub struct MachBufferDebugTagList<'a> {
2356
/// Offset at which this tag applies.
2357
pub offset: CodeOffset,
2358
2359
/// Position at this offset ("post", attaching to prior
2360
/// instruction, or "pre", attaching to next instruction).
2361
pub pos: MachDebugTagPos,
2362
2363
/// The underlying tags.
2364
pub tags: &'a [DebugTag],
2365
}
2366
2367
/// Implementation of the `TextSectionBuilder` trait backed by `MachBuffer`.
2368
///
2369
/// Note that `MachBuffer` was primarily written for intra-function references
2370
/// of jumps between basic blocks, but it's also quite usable for entire text
2371
/// sections and resolving references between functions themselves. This
2372
/// builder interprets "blocks" as labeled functions for the purposes of
2373
/// resolving labels internally in the buffer.
2374
pub struct MachTextSectionBuilder<I: VCodeInst> {
2375
buf: MachBuffer<I>,
2376
next_func: usize,
2377
force_veneers: ForceVeneers,
2378
}
2379
2380
impl<I: VCodeInst> MachTextSectionBuilder<I> {
2381
/// Creates a new text section builder which will have `num_funcs` functions
2382
/// pushed into it.
2383
pub fn new(num_funcs: usize) -> MachTextSectionBuilder<I> {
2384
let mut buf = MachBuffer::new();
2385
buf.reserve_labels_for_blocks(num_funcs);
2386
MachTextSectionBuilder {
2387
buf,
2388
next_func: 0,
2389
force_veneers: ForceVeneers::No,
2390
}
2391
}
2392
}
2393
2394
impl<I: VCodeInst> TextSectionBuilder for MachTextSectionBuilder<I> {
2395
fn append(
2396
&mut self,
2397
labeled: bool,
2398
func: &[u8],
2399
align: u32,
2400
ctrl_plane: &mut ControlPlane,
2401
) -> u64 {
2402
// Conditionally emit an island if it's necessary to resolve jumps
2403
// between functions which are too far away.
2404
let size = func.len() as u32;
2405
if self.force_veneers == ForceVeneers::Yes || self.buf.island_needed(size) {
2406
self.buf
2407
.emit_island_maybe_forced(self.force_veneers, size, ctrl_plane);
2408
}
2409
2410
self.buf.align_to(align);
2411
let pos = self.buf.cur_offset();
2412
if labeled {
2413
self.buf.bind_label(
2414
MachLabel::from_block(BlockIndex::new(self.next_func)),
2415
ctrl_plane,
2416
);
2417
self.next_func += 1;
2418
}
2419
self.buf.put_data(func);
2420
u64::from(pos)
2421
}
2422
2423
fn resolve_reloc(&mut self, offset: u64, reloc: Reloc, addend: Addend, target: usize) -> bool {
2424
crate::trace!(
2425
"Resolving relocation @ {offset:#x} + {addend:#x} to target {target} of kind {reloc:?}"
2426
);
2427
let label = MachLabel::from_block(BlockIndex::new(target));
2428
let offset = u32::try_from(offset).unwrap();
2429
match I::LabelUse::from_reloc(reloc, addend) {
2430
Some(label_use) => {
2431
self.buf.use_label_at_offset(offset, label, label_use);
2432
true
2433
}
2434
None => false,
2435
}
2436
}
2437
2438
fn force_veneers(&mut self) {
2439
self.force_veneers = ForceVeneers::Yes;
2440
}
2441
2442
fn write(&mut self, offset: u64, data: &[u8]) {
2443
self.buf.data[offset.try_into().unwrap()..][..data.len()].copy_from_slice(data);
2444
}
2445
2446
fn finish(&mut self, ctrl_plane: &mut ControlPlane) -> Vec<u8> {
2447
// Double-check all functions were pushed.
2448
assert_eq!(self.next_func, self.buf.label_offsets.len());
2449
2450
// Finish up any veneers, if necessary.
2451
self.buf
2452
.finish_emission_maybe_forcing_veneers(self.force_veneers, ctrl_plane);
2453
2454
// We don't need the data any more, so return it to the caller.
2455
mem::take(&mut self.buf.data).into_vec()
2456
}
2457
}
2458
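// A minimal usage sketch (illustrative only) for stitching two already-encoded
// function bodies into one text section; `body_a`, `body_b`, `call_reloc_off`,
// `reloc_kind`, and `addend` are placeholders for data produced elsewhere, and
// the instruction type parameter would be the backend's `Inst`.
//
//     let mut text = MachTextSectionBuilder::<Inst>::new(2);
//     let off_a = text.append(true, &body_a, 16, ctrl_plane);
//     let _off_b = text.append(true, &body_b, 16, ctrl_plane);
//     // Resolve a call in body_a that targets function 1 (body_b), if the
//     // relocation kind is one the backend can express as a label use.
//     let resolved = text.resolve_reloc(off_a + call_reloc_off, reloc_kind, addend, 1);
//     let section: Vec<u8> = text.finish(ctrl_plane);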
2459
// We use an actual instruction definition to do tests, so we depend on the `arm64` feature here.
2460
#[cfg(all(test, feature = "arm64"))]
2461
mod test {
2462
use cranelift_entity::EntityRef as _;
2463
2464
use super::*;
2465
use crate::ir::UserExternalNameRef;
2466
use crate::isa::aarch64::inst::{BranchTarget, CondBrKind, EmitInfo, Inst};
2467
use crate::isa::aarch64::inst::{OperandSize, xreg};
2468
use crate::machinst::{MachInstEmit, MachInstEmitState};
2469
use crate::settings;
2470
2471
fn label(n: u32) -> MachLabel {
2472
MachLabel::from_block(BlockIndex::new(n as usize))
2473
}
2474
fn target(n: u32) -> BranchTarget {
2475
BranchTarget::Label(label(n))
2476
}
2477
2478
#[test]
2479
fn test_elide_jump_to_next() {
2480
let info = EmitInfo::new(settings::Flags::new(settings::builder()));
2481
let mut buf = MachBuffer::new();
2482
let mut state = <Inst as MachInstEmit>::State::default();
2483
let constants = Default::default();
2484
2485
buf.reserve_labels_for_blocks(2);
2486
buf.bind_label(label(0), state.ctrl_plane_mut());
2487
let inst = Inst::Jump { dest: target(1) };
2488
inst.emit(&mut buf, &info, &mut state);
2489
buf.bind_label(label(1), state.ctrl_plane_mut());
2490
let buf = buf.finish(&constants, state.ctrl_plane_mut());
2491
assert_eq!(0, buf.total_size());
2492
}
2493
2494
#[test]
2495
fn test_elide_trivial_jump_blocks() {
2496
let info = EmitInfo::new(settings::Flags::new(settings::builder()));
2497
let mut buf = MachBuffer::new();
2498
let mut state = <Inst as MachInstEmit>::State::default();
2499
let constants = Default::default();
2500
2501
buf.reserve_labels_for_blocks(4);
2502
2503
buf.bind_label(label(0), state.ctrl_plane_mut());
2504
let inst = Inst::CondBr {
2505
kind: CondBrKind::NotZero(xreg(0), OperandSize::Size64),
2506
taken: target(1),
2507
not_taken: target(2),
2508
};
2509
inst.emit(&mut buf, &info, &mut state);
2510
2511
buf.bind_label(label(1), state.ctrl_plane_mut());
2512
let inst = Inst::Jump { dest: target(3) };
2513
inst.emit(&mut buf, &info, &mut state);
2514
2515
buf.bind_label(label(2), state.ctrl_plane_mut());
2516
let inst = Inst::Jump { dest: target(3) };
2517
inst.emit(&mut buf, &info, &mut state);
2518
2519
buf.bind_label(label(3), state.ctrl_plane_mut());
2520
2521
let buf = buf.finish(&constants, state.ctrl_plane_mut());
2522
assert_eq!(0, buf.total_size());
2523
}
2524
2525
#[test]
2526
fn test_flip_cond() {
2527
let info = EmitInfo::new(settings::Flags::new(settings::builder()));
2528
let mut buf = MachBuffer::new();
2529
let mut state = <Inst as MachInstEmit>::State::default();
2530
let constants = Default::default();
2531
2532
buf.reserve_labels_for_blocks(4);
2533
2534
buf.bind_label(label(0), state.ctrl_plane_mut());
2535
let inst = Inst::CondBr {
2536
kind: CondBrKind::Zero(xreg(0), OperandSize::Size64),
2537
taken: target(1),
2538
not_taken: target(2),
2539
};
2540
inst.emit(&mut buf, &info, &mut state);
2541
2542
buf.bind_label(label(1), state.ctrl_plane_mut());
2543
let inst = Inst::Nop4;
2544
inst.emit(&mut buf, &info, &mut state);
2545
2546
buf.bind_label(label(2), state.ctrl_plane_mut());
2547
let inst = Inst::Udf {
2548
trap_code: TrapCode::STACK_OVERFLOW,
2549
};
2550
inst.emit(&mut buf, &info, &mut state);
2551
2552
buf.bind_label(label(3), state.ctrl_plane_mut());
2553
2554
let buf = buf.finish(&constants, state.ctrl_plane_mut());
2555
2556
let mut buf2 = MachBuffer::new();
2557
let mut state = Default::default();
2558
let inst = Inst::TrapIf {
2559
kind: CondBrKind::NotZero(xreg(0), OperandSize::Size64),
2560
trap_code: TrapCode::STACK_OVERFLOW,
2561
};
2562
inst.emit(&mut buf2, &info, &mut state);
2563
let inst = Inst::Nop4;
2564
inst.emit(&mut buf2, &info, &mut state);
2565
2566
let buf2 = buf2.finish(&constants, state.ctrl_plane_mut());
2567
2568
assert_eq!(buf.data, buf2.data);
2569
}
2570
2571
#[test]
2572
fn test_island() {
2573
let info = EmitInfo::new(settings::Flags::new(settings::builder()));
2574
let mut buf = MachBuffer::new();
2575
let mut state = <Inst as MachInstEmit>::State::default();
2576
let constants = Default::default();
2577
2578
buf.reserve_labels_for_blocks(4);
2579
2580
buf.bind_label(label(0), state.ctrl_plane_mut());
2581
let inst = Inst::CondBr {
2582
kind: CondBrKind::NotZero(xreg(0), OperandSize::Size64),
2583
taken: target(2),
2584
not_taken: target(3),
2585
};
2586
inst.emit(&mut buf, &info, &mut state);
2587
2588
buf.bind_label(label(1), state.ctrl_plane_mut());
2589
while buf.cur_offset() < 2000000 {
2590
if buf.island_needed(0) {
2591
buf.emit_island(0, state.ctrl_plane_mut());
2592
}
2593
let inst = Inst::Nop4;
2594
inst.emit(&mut buf, &info, &mut state);
2595
}
2596
2597
buf.bind_label(label(2), state.ctrl_plane_mut());
2598
let inst = Inst::Nop4;
2599
inst.emit(&mut buf, &info, &mut state);
2600
2601
buf.bind_label(label(3), state.ctrl_plane_mut());
2602
let inst = Inst::Nop4;
2603
inst.emit(&mut buf, &info, &mut state);
2604
2605
let buf = buf.finish(&constants, state.ctrl_plane_mut());
2606
2607
assert_eq!(2000000 + 8, buf.total_size());
2608
2609
let mut buf2 = MachBuffer::new();
2610
let mut state = Default::default();
2611
let inst = Inst::CondBr {
2612
kind: CondBrKind::NotZero(xreg(0), OperandSize::Size64),
2613
2614
// This conditionally taken branch has a 19-bit constant, shifted
2615
// to the left by two, giving us a 21-bit range in total. Half of
2616
// this range is positive, so we should be around 1 << 20 bytes
// away from our jump target.
2618
//
2619
// There are two pending fixups by the time we reach this point,
2620
// one for this 19-bit jump and one for the unconditional 26-bit
2621
// jump below. A 19-bit veneer is 4 bytes large and the 26-bit
2622
// veneer is 20 bytes large. Pessimistically we assume we'll need two
// veneers, and each veneer is currently assumed to be the maximal
// size, which means we need 40 bytes of extra space; the actual
// island should therefore come 40 bytes before the deadline.
2627
taken: BranchTarget::ResolvedOffset((1 << 20) - 20 - 20),
2628
2629
// This branch is in-range so no veneers should be needed, it should
2630
// go directly to the target.
2631
not_taken: BranchTarget::ResolvedOffset(2000000 + 4 - 4),
2632
};
2633
inst.emit(&mut buf2, &info, &mut state);
2634
2635
let buf2 = buf2.finish(&constants, state.ctrl_plane_mut());
2636
2637
assert_eq!(&buf.data[0..8], &buf2.data[..]);
2638
}
2639
2640
#[test]
2641
fn test_island_backward() {
2642
let info = EmitInfo::new(settings::Flags::new(settings::builder()));
2643
let mut buf = MachBuffer::new();
2644
let mut state = <Inst as MachInstEmit>::State::default();
2645
let constants = Default::default();
2646
2647
buf.reserve_labels_for_blocks(4);
2648
2649
buf.bind_label(label(0), state.ctrl_plane_mut());
2650
let inst = Inst::Nop4;
2651
inst.emit(&mut buf, &info, &mut state);
2652
2653
buf.bind_label(label(1), state.ctrl_plane_mut());
2654
let inst = Inst::Nop4;
2655
inst.emit(&mut buf, &info, &mut state);
2656
2657
buf.bind_label(label(2), state.ctrl_plane_mut());
2658
while buf.cur_offset() < 2000000 {
2659
let inst = Inst::Nop4;
2660
inst.emit(&mut buf, &info, &mut state);
2661
}
2662
2663
buf.bind_label(label(3), state.ctrl_plane_mut());
2664
let inst = Inst::CondBr {
2665
kind: CondBrKind::NotZero(xreg(0), OperandSize::Size64),
2666
taken: target(0),
2667
not_taken: target(1),
2668
};
2669
inst.emit(&mut buf, &info, &mut state);
2670
2671
let buf = buf.finish(&constants, state.ctrl_plane_mut());
2672
2673
assert_eq!(2000000 + 12, buf.total_size());
2674
2675
let mut buf2 = MachBuffer::new();
2676
let mut state = Default::default();
2677
let inst = Inst::CondBr {
2678
kind: CondBrKind::NotZero(xreg(0), OperandSize::Size64),
2679
taken: BranchTarget::ResolvedOffset(8),
2680
not_taken: BranchTarget::ResolvedOffset(4 - (2000000 + 4)),
2681
};
2682
inst.emit(&mut buf2, &info, &mut state);
2683
let inst = Inst::Jump {
2684
dest: BranchTarget::ResolvedOffset(-(2000000 + 8)),
2685
};
2686
inst.emit(&mut buf2, &info, &mut state);
2687
2688
let buf2 = buf2.finish(&constants, state.ctrl_plane_mut());
2689
2690
assert_eq!(&buf.data[2000000..], &buf2.data[..]);
2691
}
2692
2693
#[test]
2694
fn test_multiple_redirect() {
2695
// label0:
2696
// cbz x0, label1
2697
// b label2
2698
// label1:
2699
// b label3
2700
// label2:
2701
// nop
2702
// nop
2703
// b label0
2704
// label3:
2705
// b label4
2706
// label4:
2707
// b label5
2708
// label5:
2709
// b label7
2710
// label6:
2711
// nop
2712
// label7:
2713
// ret
2714
//
2715
// -- should become:
2716
//
2717
// label0:
2718
// cbz x0, label7
2719
// label2:
2720
// nop
2721
// nop
2722
// b label0
2723
// label6:
2724
// nop
2725
// label7:
2726
// ret
2727
2728
let info = EmitInfo::new(settings::Flags::new(settings::builder()));
2729
let mut buf = MachBuffer::new();
2730
let mut state = <Inst as MachInstEmit>::State::default();
2731
let constants = Default::default();
2732
2733
buf.reserve_labels_for_blocks(8);
2734
2735
buf.bind_label(label(0), state.ctrl_plane_mut());
2736
let inst = Inst::CondBr {
2737
kind: CondBrKind::Zero(xreg(0), OperandSize::Size64),
2738
taken: target(1),
2739
not_taken: target(2),
2740
};
2741
inst.emit(&mut buf, &info, &mut state);
2742
2743
buf.bind_label(label(1), state.ctrl_plane_mut());
2744
let inst = Inst::Jump { dest: target(3) };
2745
inst.emit(&mut buf, &info, &mut state);
2746
2747
buf.bind_label(label(2), state.ctrl_plane_mut());
2748
let inst = Inst::Nop4;
2749
inst.emit(&mut buf, &info, &mut state);
2750
inst.emit(&mut buf, &info, &mut state);
2751
let inst = Inst::Jump { dest: target(0) };
2752
inst.emit(&mut buf, &info, &mut state);
2753
2754
buf.bind_label(label(3), state.ctrl_plane_mut());
2755
let inst = Inst::Jump { dest: target(4) };
2756
inst.emit(&mut buf, &info, &mut state);
2757
2758
buf.bind_label(label(4), state.ctrl_plane_mut());
2759
let inst = Inst::Jump { dest: target(5) };
2760
inst.emit(&mut buf, &info, &mut state);
2761
2762
buf.bind_label(label(5), state.ctrl_plane_mut());
2763
let inst = Inst::Jump { dest: target(7) };
2764
inst.emit(&mut buf, &info, &mut state);
2765
2766
buf.bind_label(label(6), state.ctrl_plane_mut());
2767
let inst = Inst::Nop4;
2768
inst.emit(&mut buf, &info, &mut state);
2769
2770
buf.bind_label(label(7), state.ctrl_plane_mut());
2771
let inst = Inst::Ret {};
2772
inst.emit(&mut buf, &info, &mut state);
2773
2774
let buf = buf.finish(&constants, state.ctrl_plane_mut());
2775
2776
let golden_data = vec![
2777
0xa0, 0x00, 0x00, 0xb4, // cbz x0, 0x14
2778
0x1f, 0x20, 0x03, 0xd5, // nop
2779
0x1f, 0x20, 0x03, 0xd5, // nop
2780
0xfd, 0xff, 0xff, 0x17, // b 0
2781
0x1f, 0x20, 0x03, 0xd5, // nop
2782
0xc0, 0x03, 0x5f, 0xd6, // ret
2783
];
2784
2785
assert_eq!(&golden_data[..], &buf.data[..]);
2786
}
2787
2788
#[test]
2789
fn test_handle_branch_cycle() {
2790
// label0:
2791
// b label1
2792
// label1:
2793
// b label2
2794
// label2:
2795
// b label3
2796
// label3:
2797
// b label4
2798
// label4:
2799
// b label1 // note: not label0 (to make it interesting).
2800
//
2801
// -- should become:
2802
//
2803
// label0, label1, ..., label4:
2804
// b label0
2805
let info = EmitInfo::new(settings::Flags::new(settings::builder()));
2806
let mut buf = MachBuffer::new();
2807
let mut state = <Inst as MachInstEmit>::State::default();
2808
let constants = Default::default();
2809
2810
buf.reserve_labels_for_blocks(5);
2811
2812
buf.bind_label(label(0), state.ctrl_plane_mut());
2813
let inst = Inst::Jump { dest: target(1) };
2814
inst.emit(&mut buf, &info, &mut state);
2815
2816
buf.bind_label(label(1), state.ctrl_plane_mut());
2817
let inst = Inst::Jump { dest: target(2) };
2818
inst.emit(&mut buf, &info, &mut state);
2819
2820
buf.bind_label(label(2), state.ctrl_plane_mut());
2821
let inst = Inst::Jump { dest: target(3) };
2822
inst.emit(&mut buf, &info, &mut state);
2823
2824
buf.bind_label(label(3), state.ctrl_plane_mut());
2825
let inst = Inst::Jump { dest: target(4) };
2826
inst.emit(&mut buf, &info, &mut state);
2827
2828
buf.bind_label(label(4), state.ctrl_plane_mut());
2829
let inst = Inst::Jump { dest: target(1) };
2830
inst.emit(&mut buf, &info, &mut state);
2831
2832
let buf = buf.finish(&constants, state.ctrl_plane_mut());
2833
2834
let golden_data = vec![
2835
0x00, 0x00, 0x00, 0x14, // b 0
2836
];
2837
2838
assert_eq!(&golden_data[..], &buf.data[..]);
2839
}
2840
2841
#[test]
2842
fn metadata_records() {
2843
let mut buf = MachBuffer::<Inst>::new();
2844
let ctrl_plane = &mut Default::default();
2845
let constants = Default::default();
2846
2847
buf.reserve_labels_for_blocks(3);
2848
2849
buf.bind_label(label(0), ctrl_plane);
2850
buf.put1(1);
2851
buf.add_trap(TrapCode::HEAP_OUT_OF_BOUNDS);
2852
buf.put1(2);
2853
buf.add_trap(TrapCode::INTEGER_OVERFLOW);
2854
buf.add_trap(TrapCode::INTEGER_DIVISION_BY_ZERO);
2855
buf.add_try_call_site(
2856
Some(0x10),
2857
[
2858
MachExceptionHandler::Tag(ExceptionTag::new(42), label(2)),
2859
MachExceptionHandler::Default(label(1)),
2860
]
2861
.into_iter(),
2862
);
2863
buf.add_reloc(
2864
Reloc::Abs4,
2865
&ExternalName::User(UserExternalNameRef::new(0)),
2866
0,
2867
);
2868
buf.put1(3);
2869
buf.add_reloc(
2870
Reloc::Abs8,
2871
&ExternalName::User(UserExternalNameRef::new(1)),
2872
1,
2873
);
2874
buf.put1(4);
2875
buf.bind_label(label(1), ctrl_plane);
2876
buf.put1(0xff);
2877
buf.bind_label(label(2), ctrl_plane);
2878
buf.put1(0xff);
2879
2880
let buf = buf.finish(&constants, ctrl_plane);
2881
2882
assert_eq!(buf.data(), &[1, 2, 3, 4, 0xff, 0xff]);
2883
assert_eq!(
2884
buf.traps()
2885
.iter()
2886
.map(|trap| (trap.offset, trap.code))
2887
.collect::<Vec<_>>(),
2888
vec![
2889
(1, TrapCode::HEAP_OUT_OF_BOUNDS),
2890
(2, TrapCode::INTEGER_OVERFLOW),
2891
(2, TrapCode::INTEGER_DIVISION_BY_ZERO)
2892
]
2893
);
2894
let call_sites: Vec<_> = buf.call_sites().collect();
2895
assert_eq!(call_sites[0].ret_addr, 2);
2896
assert_eq!(call_sites[0].frame_offset, Some(0x10));
2897
assert_eq!(
2898
call_sites[0].exception_handlers,
2899
&[
2900
FinalizedMachExceptionHandler::Tag(ExceptionTag::new(42), 5),
2901
FinalizedMachExceptionHandler::Default(4)
2902
],
2903
);
2904
assert_eq!(
2905
buf.relocs()
2906
.iter()
2907
.map(|reloc| (reloc.offset, reloc.kind))
2908
.collect::<Vec<_>>(),
2909
vec![(2, Reloc::Abs4), (3, Reloc::Abs8)]
2910
);
2911
}
2912
}
2913
2914