Path: blob/main/Modules/_decimal/libmpdec/literature/mulmod-ppro.txt
12 views
12(* Copyright (c) 2011-2020 Stefan Krah. All rights reserved. *)345========================================================================6Calculate (a * b) % p using the 80-bit x87 FPU7========================================================================89A description of the algorithm can be found in the apfloat manual by10Tommila [1].1112The proof follows an argument made by Granlund/Montgomery in [2].131415Definitions and assumptions:16----------------------------1718The 80-bit extended precision format uses 64 bits for the significand:1920(1) F = 642122The modulus is prime and less than 2**31:2324(2) 2 <= p < 2**312526The factors are less than p:2728(3) 0 <= a < p29(4) 0 <= b < p3031The product a * b is less than 2**62 and is thus exact in 64 bits:3233(5) n = a * b3435The product can be represented in terms of quotient and remainder:3637(6) n = q * p + r3839Using (3), (4) and the fact that p is prime, the remainder is always40greater than zero:4142(7) 0 <= q < p /\ 1 <= r < p434445Strategy:46---------4748Precalculate the 80-bit long double inverse of p, with a maximum49relative error of 2**(1-F):5051(8) pinv = (long double)1.0 / p5253Calculate an estimate for q = floor(n/p). The multiplication has another54maximum relative error of 2**(1-F):5556(9) qest = n * pinv5758If we can show that q < qest < q+1, then trunc(qest) = q. It is then59easy to recover the remainder r. The complete algorithm is:6061a) Set the control word to 64-bit precision and truncation mode.6263b) n = a * b # Calculate exact product.6465c) qest = n * pinv # Calculate estimate for the quotient.6667d) q = (qest+2**63)-2**63 # Truncate qest to the exact quotient.6869f) r = n - q * p # Calculate remainder.707172Proof for q < qest < q+1:73-------------------------7475Using the cumulative error, the error bounds for qest are:7677n n * (1 + 2**(1-F))**278(9) --------------------- <= qest <= ---------------------79p * (1 + 2**(1-F))**2 p808182Lemma 1:83--------84n q * p + r85(10) q < --------------------- = ---------------------86p * (1 + 2**(1-F))**2 p * (1 + 2**(1-F))**2878889Proof:90~~~~~~9192(I) q * p * (1 + 2**(1-F))**2 < q * p + r9394(II) q * p * 2**(2-F) + q * p * 2**(2-2*F) < r9596Using (1) and (7), it is sufficient to show that:9798(III) q * p * 2**(-62) + q * p * 2**(-126) < 1 <= r99100(III) can easily be verified by substituting the largest possible101values p = 2**31-1 and q = 2**31-2.102103The critical cases occur when r = 1, n = m * p + 1. These cases104can be exhaustively verified with a test program.105106107Lemma 2:108--------109110n * (1 + 2**(1-F))**2 (q * p + r) * (1 + 2**(1-F))**2111(11) --------------------- = ------------------------------- < q + 1112p p113114Proof:115~~~~~~116117(I) (q * p + r) + (q * p + r) * 2**(2-F) + (q * p + r) * 2**(2-2*F) < q * p + p118119(II) (q * p + r) * 2**(2-F) + (q * p + r) * 2**(2-2*F) < p - r120121Using (1) and (7), it is sufficient to show that:122123(III) (q * p + r) * 2**(-62) + (q * p + r) * 2**(-126) < 1 <= p - r124125(III) can easily be verified by substituting the largest possible126values p = 2**31-1, q = 2**31-2 and r = 2**31-2.127128The critical cases occur when r = (p - 1), n = m * p - 1. These cases129can be exhaustively verified with a test program.130131132[1] http://www.apfloat.org/apfloat/2.40/apfloat.pdf133[2] http://gmplib.org/~tege/divcnst-pldi94.pdf134[Section 7: "Use of floating point"]135136137138(* Coq proof for (10) and (11) *)139140Require Import ZArith.141Require Import QArith.142Require Import Qpower.143Require Import Qabs.144Require Import Psatz.145146Open Scope Q_scope.147148149Ltac qreduce T :=150rewrite <- (Qred_correct (T)); simpl (Qred (T)).151152Theorem Qlt_move_right :153forall x y z:Q, x + z < y <-> x < y - z.154Proof.155intros.156split.157intros.158psatzl Q.159intros.160psatzl Q.161Qed.162163Theorem Qlt_mult_by_z :164forall x y z:Q, 0 < z -> (x < y <-> x * z < y * z).165Proof.166intros.167split.168intros.169apply Qmult_lt_compat_r. trivial. trivial.170intros.171rewrite <- (Qdiv_mult_l x z). rewrite <- (Qdiv_mult_l y z).172apply Qmult_lt_compat_r.173apply Qlt_shift_inv_l.174trivial. psatzl Q. trivial. psatzl Q. psatzl Q.175Qed.176177Theorem Qle_mult_quad :178forall (a b c d:Q),1790 <= a -> a <= c ->1800 <= b -> b <= d ->181a * b <= c * d.182intros.183psatz Q.184Qed.185186187Theorem q_lt_qest:188forall (p q r:Q),189(0 < p) -> (p <= (2#1)^31 - 1) ->190(0 <= q) -> (q <= p - 1) ->191(1 <= r) -> (r <= p - 1) ->192q < (q * p + r) / (p * (1 + (2#1)^(-63))^2).193Proof.194intros.195rewrite Qlt_mult_by_z with (z := (p * (1 + (2#1)^(-63))^2)).196197unfold Qdiv.198rewrite <- Qmult_assoc.199rewrite (Qmult_comm (/ (p * (1 + (2 # 1) ^ (-63)) ^ 2)) (p * (1 + (2 # 1) ^ (-63)) ^ 2)).200rewrite Qmult_inv_r.201rewrite Qmult_1_r.202203assert (q * (p * (1 + (2 # 1) ^ (-63)) ^ 2) == q * p + (q * p) * ((2 # 1) ^ (-62) + (2 # 1) ^ (-126))).204qreduce ((1 + (2 # 1) ^ (-63)) ^ 2).205qreduce ((2 # 1) ^ (-62) + (2 # 1) ^ (-126)).206ring_simplify.207reflexivity.208rewrite H5.209210rewrite Qplus_comm.211rewrite Qlt_move_right.212ring_simplify (q * p + r - q * p).213qreduce ((2 # 1) ^ (-62) + (2 # 1) ^ (-126)).214215apply Qlt_le_trans with (y := 1).216rewrite Qlt_mult_by_z with (z := 85070591730234615865843651857942052864 # 18446744073709551617).217ring_simplify.218219apply Qle_lt_trans with (y := ((2 # 1) ^ 31 - (2#1)) * ((2 # 1) ^ 31 - 1)).220apply Qle_mult_quad.221assumption. psatzl Q. psatzl Q. psatzl Q. psatzl Q. psatzl Q. assumption. psatzl Q. psatzl Q.222Qed.223224Theorem qest_lt_qplus1:225forall (p q r:Q),226(0 < p) -> (p <= (2#1)^31 - 1) ->227(0 <= q) -> (q <= p - 1) ->228(1 <= r) -> (r <= p - 1) ->229((q * p + r) * (1 + (2#1)^(-63))^2) / p < q + 1.230Proof.231intros.232rewrite Qlt_mult_by_z with (z := p).233234unfold Qdiv.235rewrite <- Qmult_assoc.236rewrite (Qmult_comm (/ p) p).237rewrite Qmult_inv_r.238rewrite Qmult_1_r.239240assert ((q * p + r) * (1 + (2 # 1) ^ (-63)) ^ 2 == q * p + r + (q * p + r) * ((2 # 1) ^ (-62) + (2 # 1) ^ (-126))).241qreduce ((1 + (2 # 1) ^ (-63)) ^ 2).242qreduce ((2 # 1) ^ (-62) + (2 # 1) ^ (-126)).243ring_simplify. reflexivity.244rewrite H5.245246rewrite <- Qplus_assoc. rewrite <- Qplus_comm. rewrite Qlt_move_right.247ring_simplify ((q + 1) * p - q * p).248249rewrite <- Qplus_comm. rewrite Qlt_move_right.250251apply Qlt_le_trans with (y := 1).252qreduce ((2 # 1) ^ (-62) + (2 # 1) ^ (-126)).253254rewrite Qlt_mult_by_z with (z := 85070591730234615865843651857942052864 # 18446744073709551617).255ring_simplify.256257ring_simplify in H0.258apply Qle_lt_trans with (y := (2147483646 # 1) * (2147483647 # 1) + (2147483646 # 1)).259260apply Qplus_le_compat.261apply Qle_mult_quad.262assumption. psatzl Q. auto with qarith. assumption. psatzl Q.263auto with qarith. auto with qarith.264psatzl Q. psatzl Q. assumption.265Qed.266267268269270271