// SPDX-License-Identifier: 0BSD

///////////////////////////////////////////////////////////////////////////////
//
/// \file       riscv.c
/// \brief      Filter for 32-bit/64-bit little/big endian RISC-V binaries
///
/// This converts program counter relative addresses in function calls
/// (JAL, AUIPC+JALR), address calculation of functions and global
/// variables (AUIPC+ADDI), loads (AUIPC+load), and stores (AUIPC+store).
///
/// For AUIPC+inst2 pairs, the paired instruction checking is fairly relaxed.
/// The paired instruction opcode must only have its lowest two bits set,
/// meaning it will convert any paired instruction that is not a 16-bit
/// compressed instruction. This was shown to be enough to keep the number
/// of false matches low while improving code size and speed.
//
//  Authors:    Lasse Collin
//              Jia Tan
//
//  Special thanks:
//
//    - Chien Wong <[email protected]> provided a few early versions of RISC-V
//      filter variants along with test files and benchmark results.
//
//    - Igor Pavlov helped a lot in the filter design, getting it both
//      faster and smaller. The implementation here is still independently
//      written, not based on LZMA SDK.
//
///////////////////////////////////////////////////////////////////////////////

/*

RISC-V filtering
================

    RV32I and RV64I, possibly combined with extensions C, Zfh, F, D,
    and Q, are identical enough that the same filter works for both.

    The instruction encoding is always little endian, even on systems
    with big endian data access. Thus the same filter works for both
    endiannesses.

    The following instructions have program counter relative
    (pc-relative) behavior:

JAL
---

    JAL is used for function calls (including tail calls) and
    unconditional jumps within functions. Jumps within functions
    aren't useful to filter because the absolute addresses often
    appear only once or at most a few times.
    Tail calls and jumps within functions look the same to a simple
    filter so neither are filtered, that is, JAL x0 is ignored (the
    ABI name of the register x0 is "zero").

    Almost all calls store the return address to register x1 (ra)
    or x5 (t0). To reduce false matches when the filter is applied
    to non-code data, only the JAL instructions that use x1 or x5
    are converted. JAL has pc-relative range of +/-1 MiB so longer
    calls and jumps need another method (AUIPC+JALR).

C.J and C.JAL
-------------

    C.J and C.JAL have pc-relative range of +/-2 KiB.

    C.J is for tail calls and jumps within functions and isn't
    filtered for the reasons mentioned for JAL x0.

    C.JAL is an RV32C-only instruction. Its encoding overlaps with
    RV64C-only C.ADDIW which is a common instruction. So if filtering
    C.JAL was useful (it wasn't tested) then a separate filter would
    be needed for RV32 and RV64. Also, false positives would be a
    significant problem when the filter is applied to non-code data
    because C.JAL needs only five bits to match. Thus, this filter
    doesn't modify C.JAL instructions.

BEQ, BNE, BLT, BGE, BLTU, BGEU, C.BEQZ, and C.BNEZ
--------------------------------------------------

    These are conditional branches with pc-relative range
    of +/-4 KiB (+/-256 B for C.*). The absolute addresses often
    appear only once and very short distances are the most common,
    so filtering these instructions would make compression worse.

AUIPC with rd != x0
-------------------

    AUIPC is paired with a second instruction (inst2) to do
    pc-relative jumps, calls, loads, stores, and for taking
    an address of a symbol. AUIPC has a 20-bit immediate and
    the possible inst2 choices have a 12-bit immediate.

    AUIPC stores pc + 20-bit signed immediate to a register.
    The immediate encodes a multiple of 4 KiB so AUIPC itself
    has a pc-relative range of +/-2 GiB. AUIPC does *NOT* set
    the lowest 12 bits of the result to zero!
    This means that the 12-bit immediate in inst2 cannot just include
    the lowest 12 bits of the absolute address as is; the immediate
    has to compensate for the lowest 12 bits that AUIPC copies from
    the program counter. This means that a good filter has to convert
    not only AUIPC but also the paired inst2.

    A strict filter would focus on filtering the following
    AUIPC+inst2 pairs:

      - AUIPC+JALR: Function calls, including tail calls.

      - AUIPC+ADDI: Calculating the address of a function
        or a global variable.

      - AUIPC+load/store from the base instruction sets
        (RV32I, RV64I) or from the floating point extensions
        Zfh, F, D, and Q:
          * RV32I: LB, LH, LW, LBU, LHU, SB, SH, SW
          * RV64I has also: LD, LWU, SD
          * Zfh: FLH, FSH
          * F: FLW, FSW
          * D: FLD, FSD
          * Q: FLQ, FSQ

    NOTE: AUIPC+inst2 can only be a pair if AUIPC's rd specifies
    the same register as inst2's rs1.

    Instead of strictly accepting only the above instructions as inst2,
    this filter uses a much simpler condition: the lowest two bits of
    inst2 must be set, that is, inst2 must not be a 16-bit compressed
    instruction. So this will accept all 32-bit and possible future
    extended instructions as a pair to AUIPC if the bits in AUIPC's
    rd [11:7] match the bits [19:15] in inst2 (the bits that I-type and
    S-type instructions use for rs1). Testing showed that this relaxed
    condition for inst2 did not consistently or significantly affect
    compression ratio but it reduced code size and improved speed.

    Additionally, the paired instruction is always treated as an I-type
    instruction. The S-type instructions used by stores (SB, SH, SW,
    etc.) place the lowest 5 bits of the immediate in a different
    location than I-type instructions.
    AUIPC+store pairs are less common than other pairs, and testing
    showed that the extra code required to handle S-type instructions
    was not worth the compression ratio gained.

    AUIPC+inst2 don't necessarily appear sequentially next to each
    other although very often they do. Especially AUIPC+JALR are
    sequential as that may allow instruction fusion in processors
    (and perhaps help branch prediction as a fused AUIPC+JALR is
    a direct branch while JALR alone is an indirect branch).

    Clang 16 can generate code where AUIPC+inst2 is split:

      - AUIPC is outside a loop and inst2 (load/store) is inside
        the loop. This way the AUIPC instruction needs to be
        executed only once.

      - Load-modify-store may have AUIPC for the load and the same
        AUIPC-result is used for the store too. This may get combined
        with AUIPC being outside the loop.

      - AUIPC is before a conditional branch and inst2 is hundreds
        of bytes away at the branch target.

      - Inner and outer pair:

            auipc   a1,0x2f
            auipc   a2,0x3d
            ld      a2,-500(a2)
            addi    a1,a1,-233

      - Many split pairs with an untaken conditional branch between:

            auipc   s9,0x1613   # Pair 1
            auipc   s4,0x1613   # Pair 2
            auipc   s6,0x1613   # Pair 3
            auipc   s10,0x1613  # Pair 4
            beqz    a5,a3baae
            ld      a0,0(a6)
            ld      a6,246(s9)  # Pair 1
            ld      a1,250(s4)  # Pair 2
            ld      a3,254(s6)  # Pair 3
            ld      a4,258(s10) # Pair 4

    It's not possible to find all split pairs in a filter like this.
    At least in 2024, simple sequential pairs are 99 % of AUIPC uses
    so filtering only such pairs gives good results and makes the
    filter simpler. However, it's possible that future compilers will
    produce different code where sequential pairs aren't as common.

    This filter doesn't convert AUIPC instructions alone because:

    (1) The conversion would be off-by-one (or off-by-4096) half the
        time because the lowest 12 bits from inst2 (inst2_imm12)
        aren't known.
        We only know that the absolute address is
        pc + AUIPC_imm20 + [-2048, +2047] but there is no way to
        know the exact 4096-byte multiple (or 4096 * n + 2048):
        there are always two possibilities because AUIPC copies
        the 12 lowest bits from pc instead of zeroing them.

        NOTE: The sign-extension of inst2_imm12 adds a tiny bit
        of extra complexity to AUIPC math in general but it's not
        the reason for this problem. The sign-extension only changes
        the relative position of the pc-relative 4096-byte window.

    (2) Matching AUIPC instruction alone requires only seven bits.
        When the filter is applied to non-code data, that leads
        to many false positives which make compression worse.
        As long as most AUIPC+inst2 pairs appear as two consecutive
        instructions, converting only such pairs gives better results.

    In assembly, AUIPC+inst2 tend to look like this:

        # Call:
        auipc   ra, 0x12345
        jalr    ra, -42(ra)

        # Tail call:
        auipc   t1, 0x12345
        jalr    zero, -42(t1)

        # Getting the absolute address:
        auipc   a0, 0x12345
        addi    a0, a0, -42

        # rd of inst2 isn't necessarily the same as rs1 even
        # in cases where there is no reason to preserve rs1.
        auipc   a0, 0x12345
        addi    a1, a0, -42

    As of 2024, 16-bit instructions from the C extension don't
    appear as inst2. The RISC-V psABI doesn't list AUIPC+C.* as
    a linker relaxation type explicitly but it's not disallowed
    either. Usefulness is limited as most of the time the lowest
    12 bits won't fit in a C instruction. This filter doesn't
    support AUIPC+C.* combinations because this makes the filter
    simpler, there are no test files, and it hopefully will never
    be needed anyway.

    (Compare AUIPC to ARM64 where ADRP does set the lowest 12 bits
    to zero. The paired instruction has the lowest 12 bits of the
    absolute address as is in a zero-extended immediate. Thus the
    ARM64 filter doesn't need to care about the instructions that
    are paired with ADRP.
    An off-by-4096 issue can still occur if
    the code section isn't aligned with the filter's start offset.
    It's not a problem with standalone ELF files but Windows PE
    files need start_offset=3072 for best results. Also, a .tar
    stores files with 512-byte alignment so most of the time it
    won't be the best for ARM64.)

AUIPC with rd == x0
-------------------

    AUIPC instructions with rd=x0 are reserved for HINTs in the base
    instruction set. Such AUIPC instructions are never filtered.

    As of January 2024, it seems likely that AUIPC with rd=x0 will
    be used for landing pads (pseudoinstruction LPAD). LPAD is used
    to mark valid targets for indirect jumps (for JALR), for example,
    beginnings of functions. The 20-bit immediate in LPAD instruction
    is a label, not a pc-relative address. Thus it would be
    counterproductive to convert AUIPC instructions with rd=x0.

    Often the next instruction after LPAD won't have rs1=x0 and thus
    the filtering would be skipped for that reason alone. However,
    it's not good to rely on this. For example, consider a function
    that begins like this:

        int foo(int i)
        {
            if (i <= 234) {
                ...
            }

    A compiler may generate something like this:

        lpad    0x54321
        li      a5, 234
        bgt     a0, a5, .L2

    Converting the pseudoinstructions to raw instructions:

        auipc   x0, 0x54321
        addi    x15, x0, 234
        blt     x15, x10, .L2

    In this case the filter would undesirably convert the AUIPC+ADDI
    pair if the filter didn't explicitly skip AUIPC instructions
    that have rd=x0.

*/


#include "simple_private.h"


// This checks two conditions at once:
//    - AUIPC rd == inst2 rs1.
//    - inst2 opcode has the lowest two bits set.
//
// The 8 bit left shift aligns the rd of AUIPC with the rs1 of inst2.
// By XORing the registers, any non-zero value in those bits indicates the
// registers are not equal and thus not an AUIPC pair.
// Subtracting 3 from inst2 will zero out the first two opcode bits only
// when they are set. The mask tests if any of the register or opcode bits
// are set (and thus not an AUIPC pair).
//
// Alternative expression: (((((auipc) << 8) ^ (inst2)) & 0xF8003) != 3)
#define NOT_AUIPC_PAIR(auipc, inst2) \
	((((auipc) << 8) ^ ((inst2) - 3)) & 0xF8003)

// This macro checks multiple conditions:
//   (1) AUIPC rd [11:7] == x2 (special rd value).
//   (2) AUIPC bits 12 and 13 set (the lowest two opcode bits of packed inst2).
//   (3) inst2_rs1 doesn't equal x0 or x2 because the opposite
//       conversion is only done when
//       auipc_rd != x0 &&
//       auipc_rd != x2 &&
//       auipc_rd == inst2_rs1.
//
// The left-hand side takes care of (1) and (2).
//   (a) The lowest 7 bits are already known to be AUIPC so subtracting 0x17
//       makes those bits zeros.
//   (b) If AUIPC rd equals x2, subtracting 0x100 makes bits [11:7] zeros.
//       If rd doesn't equal x2, then there will be at least one non-zero bit
//       and the next step (c) is irrelevant.
//   (c) If the lowest two opcode bits of the packed inst2 are set in [13:12],
//       then subtracting 0x3000 will make those bits zeros. Otherwise there
//       will be at least one non-zero bit.
//
// The shift by 18 removes the high bits from the final '>=' comparison and
// ensures that any non-zero result will be larger than any possible result
// from the right-hand side of the comparison. The cast ensures that the
// left-hand side didn't get promoted to a larger type than uint32_t.
//
// On the right-hand side, inst2_rs1 & 0x1D will be non-zero as long as
// inst2_rs1 is not x0 or x2.
//
// The final '>=' comparison will make the expression true if:
//   - The subtraction caused any bits to be set (special AUIPC rd value not
//     used or inst2 opcode bits not set). (non-zero >= non-zero or 0)
//   - The subtraction did not cause any bits to be set but inst2_rs1 was
//     x0 or x2.
//     (0 >= 0)
#define NOT_SPECIAL_AUIPC(auipc, inst2_rs1) \
	((uint32_t)(((auipc) - 0x3117) << 18) >= ((inst2_rs1) & 0x1D))


// The encode and decode functions are split for this filter because of the
// AUIPC+inst2 filtering. This filter design allows a decoder-only
// implementation to be smaller than alternative designs.

#ifdef HAVE_ENCODER_RISCV
static size_t
riscv_encode(void *simple lzma_attribute((__unused__)),
		uint32_t now_pos,
		bool is_encoder lzma_attribute((__unused__)),
		uint8_t *buffer, size_t size)
{
	// Avoid using i + 8 <= size in the loop condition.
	//
	// NOTE: If there is a JAL in the last six bytes of the stream, it
	// won't be converted. This is intentional to keep the code simpler.
	if (size < 8)
		return 0;

	size -= 8;

	size_t i;

	// The loop is advanced by 2 bytes every iteration since the
	// instruction stream may include 16-bit instructions (C extension).
	for (i = 0; i <= size; i += 2) {
		uint32_t inst = buffer[i];

		if (inst == 0xEF) {
			// JAL
			const uint32_t b1 = buffer[i + 1];

			// Only filter rd=x1(ra) and rd=x5(t0).
			if ((b1 & 0x0D) != 0)
				continue;

			// The 20-bit immediate is in four pieces.
			// The encoder stores it in big endian form
			// since it improves compression slightly.
			const uint32_t b2 = buffer[i + 2];
			const uint32_t b3 = buffer[i + 3];
			const uint32_t pc = now_pos + (uint32_t)i;

// The following chart shows the highest three bytes of JAL, focusing on
// the 20-bit immediate field [31:12]. The first row of numbers is the
// bit position in a 32-bit little endian instruction. The second row of
// numbers shows the order of the immediate field in a J-type instruction.
// The last row is the bit number in each byte.
//
// To determine the amount to shift each bit, subtract the value in
// the last row from the value in the second last row. If the number
// is positive, shift left.
// If negative, shift right.
//
// For example, at the rightmost side of the chart, the bit 4 in b1 is
// the bit 12 of the address. Thus that bit needs to be shifted left
// by 12 - 4 = 8 bits to put it in the right place in the addr variable.
//
// NOTE: The immediate of a J-type instruction holds bits [20:1] of
// the address. The bit [0] is always 0 and not part of the immediate.
//
// |          b3             |          b2             |          b1         |
// | 31 30 29 28 27 26 25 24 | 23 22 21 20 19 18 17 16 | 15 14 13 12 x x x x |
// | 20 10  9  8  7  6  5  4 |  3  2  1 11 19 18 17 16 | 15 14 13 12 x x x x |
// |  7  6  5  4  3  2  1  0 |  7  6  5  4  3  2  1  0 |  7  6  5  4 x x x x |

			uint32_t addr = ((b1 & 0xF0) << 8)
					| ((b2 & 0x0F) << 16)
					| ((b2 & 0x10) << 7)
					| ((b2 & 0xE0) >> 4)
					| ((b3 & 0x7F) << 4)
					| ((b3 & 0x80) << 13);

			addr += pc;

			buffer[i + 1] = (uint8_t)((b1 & 0x0F)
					| ((addr >> 13) & 0xF0));

			buffer[i + 2] = (uint8_t)(addr >> 9);
			buffer[i + 3] = (uint8_t)(addr >> 1);

			// The "-2" is included because the for-loop will
			// always increment by 2. In this case, we want to
			// skip an extra 2 bytes since we used 4 bytes
			// of input.
			i += 4 - 2;

		} else if ((inst & 0x7F) == 0x17) {
			// AUIPC
			inst |= (uint32_t)buffer[i + 1] << 8;
			inst |= (uint32_t)buffer[i + 2] << 16;
			inst |= (uint32_t)buffer[i + 3] << 24;

			// Branch based on AUIPC's rd.
			// The bitmask test does the same thing as this:
			//
			//     const uint32_t auipc_rd = (inst >> 7) & 0x1F;
			//     if (auipc_rd != 0 && auipc_rd != 2) {
			if (inst & 0xE80) {
				// AUIPC's rd doesn't equal x0 or x2.

				// Check if AUIPC+inst2 are a pair.
				uint32_t inst2 = read32le(buffer + i + 4);

				if (NOT_AUIPC_PAIR(inst, inst2)) {
					// The NOT_AUIPC_PAIR macro allows
					// a false AUIPC+AUIPC pair if the
					// bits [19:15] (where rs1 would be)
					// in the second AUIPC match the rd
					// of the first AUIPC.
					//
					// We must skip enough forward so
					// that the first two bytes of the
					// second AUIPC cannot get converted.
					// Such a conversion could make the
					// current pair become a valid pair
					// which would desync the decoder.
					//
					// Skipping six bytes is enough even
					// though the above condition looks
					// at the lowest four bits of the
					// buffer[i + 6] too. This is safe
					// because this filter never changes
					// those bits if a conversion at
					// that position is done.
					i += 6 - 2;
					continue;
				}

				// Convert AUIPC+inst2 to a special format:
				//
				//   - The lowest 7 bits [6:0] retain the
				//     AUIPC opcode.
				//
				//   - The rd [11:7] is set to x2(sp). x2 is
				//     used as the stack pointer so AUIPC with
				//     rd=x2 should be very rare in real-world
				//     executables.
				//
				//   - The remaining 20 bits [31:12] (that
				//     normally hold the pc-relative immediate)
				//     are used to store the lowest 20 bits of
				//     inst2. That is, the 12-bit immediate of
				//     inst2 is not included.
				//
				//   - The location of the original inst2 is
				//     used to store the 32-bit absolute
				//     address in big endian format. Compared
				//     to the 20+12-bit split encoding, this
				//     results in a longer uninterrupted
				//     sequence of identical common bytes
				//     when the same address is referred
				//     with different instruction pairs
				//     (like AUIPC+LD vs. AUIPC+ADDI) or
				//     when the occurrences of the same
				//     pair use different registers.
				//     When referring to adjacent memory
				//     locations (like function calls that go
				//     via the ELF PLT), in big endian order
				//     only the last 1-2 bytes differ; in
				//     little endian the differing 1-2 bytes
				//     would be in the middle of the 8-byte
				//     sequence.
				//
				// When reversing the transformation, the
				// original rd of AUIPC can be restored
				// from inst2's rs1 as they are required to
				// be the same.

				// Arithmetic right shift makes sign extension
				// trivial but (1) it's implementation-defined
				// behavior (C99/C11/C23 6.5.7-p5) and so is
				// (2) casting unsigned to signed (6.3.1.3-p3).
				//
				// One can check for (1) with
				//
				//     if ((-1 >> 1) == -1) ...
				//
				// but (2) has to be checked from the
				// compiler docs. GCC promises that (1)
				// and (2) behave in the common expected
				// way and thus
				//
				//     addr += (uint32_t)(
				//                     (int32_t)inst2 >> 20);
				//
				// does the same as the code below. But since
				// the 100 % portable way is only a few bytes
				// bigger code and there is no real speed
				// difference, let's just use that, especially
				// since the decoder doesn't need this at all.
				uint32_t addr = inst & 0xFFFFF000;
				addr += (inst2 >> 20)
						- ((inst2 >> 19) & 0x1000);

				addr += now_pos + (uint32_t)i;

				// Construct the first 32 bits:
				//   [6:0]    AUIPC opcode
				//   [11:7]   Special AUIPC rd = x2
				//   [31:12]  The lowest 20 bits of inst2
				inst = 0x17 | (2 << 7) | (inst2 << 12);

				write32le(buffer + i, inst);

				// The second 32 bits store the absolute
				// address in big endian order.
				write32be(buffer + i + 4, addr);
			} else {
				// AUIPC's rd equals x0 or x2.
				//
				// x0 indicates a landing pad (LPAD).
				// It's always skipped.
				//
				// AUIPC with rd == x2 is used for the special
				// format as explained above.
				// When the input contains a byte sequence
				// that matches the special format, "fake"
				// decoding must be done to keep the filter
				// bijective (that is, safe to apply on
				// arbitrary data).
				//
				// See the "x0 or x2" section in riscv_decode()
				// for how the "real" decoding is done. The
				// "fake" decoding is a simplified version
				// of "real" decoding with the following
				// differences (these reduce code size of
				// the decoder):
				// (1) The lowest 12 bits aren't sign-extended.
				// (2) No address conversion is done.
				// (3) Big endian format isn't used (the fake
				//     address is in little endian order).

				// Check if inst matches the special format.
				const uint32_t fake_rs1 = inst >> 27;

				if (NOT_SPECIAL_AUIPC(inst, fake_rs1)) {
					i += 4 - 2;
					continue;
				}

				const uint32_t fake_addr =
						read32le(buffer + i + 4);

				// Construct the second 32 bits:
				//   [19:0]   Upper 20 bits from AUIPC
				//   [31:20]  The lowest 12 bits of fake_addr
				const uint32_t fake_inst2 = (inst >> 12)
						| (fake_addr << 20);

				// Construct new first 32 bits from:
				//   [6:0]   AUIPC opcode
				//   [11:7]  Fake AUIPC rd = fake_rs1
				//   [31:12] The highest 20 bits of fake_addr
				inst = 0x17 | (fake_rs1 << 7)
					| (fake_addr & 0xFFFFF000);

				write32le(buffer + i, inst);
				write32le(buffer + i + 4, fake_inst2);
			}

			i += 8 - 2;
		}
	}

	return i;
}


extern lzma_ret
lzma_simple_riscv_encoder_init(lzma_next_coder *next,
		const lzma_allocator *allocator,
		const lzma_filter_info *filters)
{
	return lzma_simple_coder_init(next, allocator, filters,
			&riscv_encode, 0, 8, 2, true);
}
#endif


#ifdef HAVE_DECODER_RISCV
static size_t
riscv_decode(void *simple lzma_attribute((__unused__)),
		uint32_t now_pos,
		bool is_encoder lzma_attribute((__unused__)),
		uint8_t *buffer, size_t size)
{
	if (size < 8)
		return 0;

	size -= 8;

	size_t i;
	for (i = 0; i <= size; i += 2) {
		uint32_t inst = buffer[i];

		if (inst == 0xEF) {
			// JAL
			const uint32_t b1 = buffer[i + 1];

			// Only filter rd=x1(ra) and rd=x5(t0).
			if ((b1 & 0x0D) != 0)
				continue;

			const uint32_t b2 = buffer[i + 2];
			const uint32_t b3 = buffer[i + 3];
			const uint32_t pc = now_pos + (uint32_t)i;

// |          b3             |          b2             |          b1         |
// | 31 30 29 28 27 26 25 24 | 23 22 21 20 19 18 17 16 | 15 14 13 12 x x x x |
// | 20 10  9  8  7  6  5  4 |  3  2  1 11 19 18 17 16 | 15 14 13 12 x x x x |
// |  7  6  5  4  3  2  1  0 |  7  6  5  4  3  2  1  0 |  7  6  5  4 x x x x |

			uint32_t addr = ((b1 & 0xF0) << 13)
					| (b2 << 9) | (b3 << 1);

			addr -= pc;

			buffer[i + 1] = (uint8_t)((b1 & 0x0F)
					| ((addr >> 8) & 0xF0));

			buffer[i + 2] = (uint8_t)(((addr >> 16) & 0x0F)
					| ((addr >> 7) & 0x10)
					| ((addr << 4) & 0xE0));

			buffer[i + 3] = (uint8_t)(((addr >> 4) & 0x7F)
					| ((addr >> 13) & 0x80));

			i += 4 - 2;

		} else if ((inst & 0x7F) == 0x17) {
			// AUIPC
			uint32_t inst2;

			inst |= (uint32_t)buffer[i + 1] << 8;
			inst |= (uint32_t)buffer[i + 2] << 16;
			inst |= (uint32_t)buffer[i + 3] << 24;

			if (inst & 0xE80) {
				// AUIPC's rd doesn't equal x0 or x2.

				// Check if it is a "fake" AUIPC+inst2 pair.
				inst2 = read32le(buffer + i + 4);

				if (NOT_AUIPC_PAIR(inst, inst2)) {
					i += 6 - 2;
					continue;
				}

				// Decode (or more like re-encode) the "fake"
				// pair. The "fake" format doesn't do
				// sign-extension, address conversion, or
				// use big endian.
				// (The use of little endian allows sharing
				// the write32le() calls in the decoder to
				// reduce code size when unaligned access
				// isn't supported.)
				uint32_t addr = inst & 0xFFFFF000;
				addr += inst2 >> 20;

				inst = 0x17 | (2 << 7) | (inst2 << 12);
				inst2 = addr;
			} else {
				// AUIPC's rd equals x0 or x2.

				// Check if inst matches the special format
				// used by the encoder.
				const uint32_t inst2_rs1 = inst >> 27;

				if (NOT_SPECIAL_AUIPC(inst, inst2_rs1)) {
					i += 4 - 2;
					continue;
				}

				// Decode the "real" pair.
				uint32_t addr = read32be(buffer + i + 4);

				addr -= now_pos + (uint32_t)i;

				// The second instruction:
				//   - Get its lowest 20 bits from
				//     inst [31:12].
				//   - Add the lowest 12 bits of the address
				//     as the immediate field.
				inst2 = (inst >> 12) | (addr << 20);

				// AUIPC:
				//   - rd is the same as inst2_rs1.
				//   - The sign extension of the lowest 12 bits
				//     must be taken into account.
				inst = 0x17 | (inst2_rs1 << 7)
					| ((addr + 0x800) & 0xFFFFF000);
			}

			// Both decoder branches write in little endian order.
			write32le(buffer + i, inst);
			write32le(buffer + i + 4, inst2);

			i += 8 - 2;
		}
	}

	return i;
}


extern lzma_ret
lzma_simple_riscv_decoder_init(lzma_next_coder *next,
		const lzma_allocator *allocator,
		const lzma_filter_info *filters)
{
	return lzma_simple_coder_init(next, allocator, filters,
			&riscv_decode, 0, 8, 2, false);
}
#endif
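
The AUIPC+inst2 arithmetic described in the comments of this file can be tried in isolation. What follows is a minimal sketch under stated assumptions, not part of liblzma: the helper names `is_auipc_pair`, `pair_to_abs`, and `abs_to_pair` are hypothetical, inst2 is assumed to be an I-type instruction, and the expressions are copied from `NOT_AUIPC_PAIR` and from the address math in the encoder and decoder.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical helper mirroring NOT_AUIPC_PAIR (inverted): true when
// AUIPC's rd [11:7] equals inst2's rs1 [19:15] and the lowest two
// opcode bits of inst2 are set (that is, inst2 is not a 16-bit
// compressed instruction).
bool
is_auipc_pair(uint32_t auipc, uint32_t inst2)
{
	return (((auipc << 8) ^ (inst2 - 3)) & 0xF8003) == 0;
}

// Hypothetical helper: absolute address referenced by an AUIPC+inst2
// pair located at pc, treating inst2 as I-type. Same math as the
// encoder: the sign extension avoids implementation-defined shifts.
uint32_t
pair_to_abs(uint32_t pc, uint32_t auipc, uint32_t inst2)
{
	uint32_t addr = auipc & 0xFFFFF000;

	// Sign-extend the 12-bit immediate stored in inst2 [31:20].
	addr += (inst2 >> 20) - ((inst2 >> 19) & 0x1000);

	return addr + pc;
}

// Hypothetical inverse, same math as the decoder: split an absolute
// address back into AUIPC's upper 20 bits and inst2's 12-bit immediate.
// Adding 0x800 compensates for the sign extension of the low 12 bits.
void
abs_to_pair(uint32_t pc, uint32_t abs_addr,
		uint32_t *auipc_imm, uint32_t *inst2_imm12)
{
	const uint32_t rel = abs_addr - pc;

	*inst2_imm12 = rel & 0xFFF;
	*auipc_imm = (rel + 0x800) & 0xFFFFF000;
}
```

As a usage example, `auipc ra, 0x12345` followed by `jalr ra, -42(ra)` encodes as 0x12345097 and 0xFD6080E7. With the pair placed at pc 0x1000, `pair_to_abs` yields 0x12345FD6, and `abs_to_pair` recovers the AUIPC immediate 0x12345000 and the 12-bit immediate 0xFD6 (-42).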