Path: blob/main/Modules/_decimal/libmpdec/literature/matrix-transform.txt
12 views
12(* Copyright (c) 2011-2020 Stefan Krah. All rights reserved. *)345The Matrix Fourier Transform:6=============================78In libmpdec, the Matrix Fourier Transform [1] is called four-step transform9after a variant that appears in [2]. The algorithm requires that the input10array can be viewed as an R*C matrix.1112All operations are done modulo p. For readability, the proofs drop all13instances of (mod p).141516Algorithm four-step (forward transform):17----------------------------------------1819a := input array20d := len(a) = R * C21p := prime22w := primitive root of unity of the prime field23r := w**((p-1)/d)24A := output array25261) Apply a length R FNT to each column.27282) Multiply each matrix element (addressed by j*C+m) by r**(j*m).29303) Apply a length C FNT to each row.31324) Transpose the matrix.333435Proof (forward transform):36--------------------------3738The algorithm can be derived starting from the regular definition of39the finite-field transform of length d:4041d-142,----43\44A[k] = | a[l] * r**(k * l)45/46`----47l = 0484950The sum can be rearranged into the sum of the sums of columns:5152C-1 R-153,---- ,----54\ \55= | | a[i * C + j] * r**(k * (i * C + j))56/ /57`---- `----58j = 0 i = 0596061Extracting a constant from the inner sum:6263C-1 R-164,---- ,----65\ \66= | r**k*j * | a[i * C + j] * r**(k * i * C)67/ /68`---- `----69j = 0 i = 0707172Without any loss of generality, let k = n * R + m,73where n < C and m < R:7475C-1 R-176,---- ,----77\ \78A[n*R+m] = | r**(R*n*j) * r**(m*j) * | a[i*C+j] * r**(R*C*n*i) * r**(C*m*i)79/ /80`---- `----81j = 0 i = 0828384Since r = w ** ((p-1) / (R*C)):8586a) r**(R*C*n*i) = w**((p-1)*n*i) = 18788b) r**(C*m*i) = w**((p-1) / R) ** (m*i) = r_R ** (m*i)8990c) r**(R*n*j) = w**((p-1) / C) ** (n*j) = r_C ** (n*j)9192r_R := root of the subfield of length R.93r_C := root of the subfield of length C.949596C-1 R-197,---- ,----98\ \99A[n*R+m] = | r_C**(n*j) * [ r**(m*j) * | a[i*C+j] * r_R**(m*i) ]100/ ^ /101`---- | `---- 1) transform the columns102j = 0 | i = 0103^ |104| `-- 2) multiply105|106`-- 3) transform the rows107108109Note that the entire RHS is a function of n and m and that the results110for each pair (n, m) are stored in Fortran order.111112Let the term in square brackets be f(m, j). Step 1) and 2) precalculate113the term for all (m, j). After that, the original matrix is now a lookup114table with the mth element in the jth column at location m * C + j.115116Let the complete RHS be g(m, n). Step 3) does an in-place transform of117length n on all rows. After that, the original matrix is now a lookup118table with the mth element in the nth column at location m * C + n.119120But each (m, n) pair should be written to location n * R + m. Therefore,121step 4) transposes the result of step 3).122123124125Algorithm four-step (inverse transform):126----------------------------------------127128A := input array129d := len(A) = R * C130p := prime131d' := d**(p-2) # inverse of d132w := primitive root of unity of the prime field133r := w**((p-1)/d) # root of the subfield134r' := w**((p-1) - (p-1)/d) # inverse of r135a := output array1361370) View the matrix as a C*R matrix.1381391) Transpose the matrix, producing an R*C matrix.1401412) Apply a length C FNT to each row.1421433) Multiply each matrix element (addressed by i*C+n) by r**(i*n).1441454) Apply a length R FNT to each column.146147148Proof (inverse transform):149--------------------------150151The algorithm can be derived starting from the regular definition of152the finite-field inverse transform of length d:153154d-1155,----156\157a[k] = d' * | A[l] * r' ** (k * l)158/159`----160l = 0161162163The sum can be rearranged into the sum of the sums of columns. Note164that at this stage we still have a C*R matrix, so C denotes the number165of rows:166167R-1 C-1168,---- ,----169\ \170= d' * | | a[j * R + i] * r' ** (k * (j * R + i))171/ /172`---- `----173i = 0 j = 0174175176Extracting a constant from the inner sum:177178R-1 C-1179,---- ,----180\ \181= d' * | r' ** (k*i) * | a[j * R + i] * r' ** (k * j * R)182/ /183`---- `----184i = 0 j = 0185186187Without any loss of generality, let k = m * C + n,188where m < R and n < C:189190R-1 C-1191,---- ,----192\ \193A[m*C+n] = d' * | r' ** (C*m*i) * r' ** (n*i) * | a[j*R+i] * r' ** (R*C*m*j) * r' ** (R*n*j)194/ /195`---- `----196i = 0 j = 0197198199Since r' = w**((p-1) - (p-1)/d) and d = R*C:200201a) r' ** (R*C*m*j) = w**((p-1)*R*C*m*j - (p-1)*m*j) = 1202203b) r' ** (C*m*i) = w**((p-1)*C - (p-1)/R) ** (m*i) = r_R' ** (m*i)204205c) r' ** (R*n*j) = r_C' ** (n*j)206207d) d' = d**(p-2) = (R*C) ** (p-2) = R**(p-2) * C**(p-2) = R' * C'208209r_R' := inverse of the root of the subfield of length R.210r_C' := inverse of the root of the subfield of length C.211R' := inverse of R212C' := inverse of C213214215R-1 C-1216,---- ,---- 2) transform the rows of a^T217\ \218A[m*C+n] = R' * | r_R' ** (m*i) * [ r' ** (n*i) * C' * | a[j*R+i] * r_C' ** (n*j) ]219/ ^ / ^220`---- | `---- |221i = 0 | j = 0 |222^ | `-- 1) Transpose input matrix223| `-- 3) multiply to address elements by224| i * C + j225`-- 3) transform the columns226227228229Note that the entire RHS is a function of m and n and that the results230for each pair (m, n) are stored in C order.231232Let the term in square brackets be f(n, i). Without step 1), the sum233would perform a length C transform on the columns of the input matrix.234This is a) inefficient and b) the results are needed in C order, so235step 1) exchanges rows and columns.236237Step 2) and 3) precalculate f(n, i) for all (n, i). After that, the238original matrix is now a lookup table with the ith element in the nth239column at location i * C + n.240241Let the complete RHS be g(m, n). Step 4) does an in-place transform of242length m on all columns. After that, the original matrix is now a lookup243table with the mth element in the nth column at location m * C + n,244which means that all A[k] = A[m * C + n] are in the correct order.245246247--248249[1] Joerg Arndt: "Matters Computational"250http://www.jjj.de/fxt/251[2] David H. Bailey: FFTs in External or Hierarchical Memory252http://crd.lbl.gov/~dhbailey/dhbpapers/253254255256257258