Jupyter notebook Report.ipynb
computational complexity of solve upper and lower triangular
These operations do some form of matrix multiplication and matrix division. On an NxN matrix, each of the N rows requires on the order of N multiplications and N additions, so each solve takes about N-squared operations of each type, or roughly 2N-squared in total. In big-oh notation this is O(N^2).
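To make the count concrete, here is a sketch for forward substitution (back substitution is the mirror image): row i needs i multiplications, i additions/subtractions, and one division, so the total is

```latex
\sum_{i=0}^{N-1} (i + i + 1) \;=\; 2\,\frac{N(N-1)}{2} + N \;=\; N^2 \;=\; O(N^2).
```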
The code I used for solve_upper_triangular is:
Description of function
In this code, 'solve_lower_triangular', I used forward substitution to solve for 'x' given the lower triangular matrix 'L' and the vector 'b'. The code uses two for loops to solve the triangular system. The first for loop introduces a variable i to track the rows of the matrix 'L' and the elements of the vector 'b'; a placeholder (rowadd) for the sum of the off-diagonal elements of the lower triangle is introduced and set to zero in this loop. The second for loop tracks the columns and ensures that the diagonal element is not included in the sum rowadd. This is done by keeping the integer j strictly below the integer i; since in a lower triangular matrix the elements to the right of the diagonal are zero, they do not need to be considered. Once the necessary elements have been added up, the code exits the second for loop and returns to the first for loop, where the vector out stores the value of b[i] - rowadd, all divided by the diagonal element of the matrix 'L'. This repeats N times, where N is the number of rows of 'L' and the number of elements of 'b'.
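The steps above can be sketched in C roughly as follows. This is a minimal sketch, not the report's exact code: the names rowadd, out, i, and j are taken from the description, and row-major storage via L[i*N+j] is assumed.

```c
#include <stddef.h>

/* Forward substitution: solve L x = b for a lower triangular, row-major
 * N x N matrix L. The result is written into out. */
void solve_lower_triangular(const double *L, const double *b,
                            double *out, size_t N)
{
    for (size_t i = 0; i < N; i++) {    /* walk the rows of L and b */
        double rowadd = 0.0;            /* sum of the off-diagonal terms */
        for (size_t j = 0; j < i; j++)  /* only j < i: entries right of
                                           the diagonal are zero */
            rowadd += L[i * N + j] * out[j];
        out[i] = (b[i] - rowadd) / L[i * N + i];  /* divide by the diagonal */
    }
}
```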
Is the function contiguous
The function 'solve_lower_triangular' has a mostly contiguous memory access pattern. The code does not use all of the elements of the array 'L', especially for the first few rows of the matrix, but it does access the memory efficiently. Indexing with L[i*N+j] ensures that within each row the elements are accessed in order of increasing address, allowing the system to serve the accesses from the cache efficiently. The access pattern is not always contiguous, however: reading the diagonal elements of 'L' to solve for out is not contiguous, just as moving from one row of 'L' to the next is not completely contiguous.
requesting upper triangular
This would be very inefficient. The elements of a column of a row-major matrix are separated by N doubles, so the processor would pull one element of the matrix into the cache at a time during execution, and main memory would have to be accessed at almost every step of the function.
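To illustrate, here is a small sketch (assuming row-major storage) that sums the strictly upper triangle two ways. Both loops compute the same value, but the row-wise walk touches consecutive doubles while the column-wise walk jumps N doubles between reads, which is the access pattern the paragraph above warns about.

```c
#include <stddef.h>

/* Sum the strictly upper triangle of a row-major N x N matrix A. */
double sum_upper_by_rows(const double *A, size_t N)
{
    double s = 0.0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = i + 1; j < N; j++)  /* contiguous within a row */
            s += A[i * N + j];
    return s;
}

double sum_upper_by_columns(const double *A, size_t N)
{
    double s = 0.0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < j; i++)      /* stride of N between reads */
            s += A[i * N + j];
    return s;
}
```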
The code I used for gauss_seidel is:
Description of function
This function takes in the matrix 'A' and the vector 'b' and solves for the value of 'x' using the Gauss-Seidel method. The matrix 'A' is broken down into its lower, upper, and diagonal parts. These matrices are sent off, along with the vector 'b' and an initial guess 'xk', to the function solve_gauss_seidel, which returns a new vector for x. This value is stored in xk and used as the guess for the next round. The process repeats until the two-norm of the difference between the guess sent out and the one returned is less than some value epsilon.
Memory requirements
The memory requirements of the function depend on the number of matrices, vectors, and integers used. The matrices 'A', 'L', 'U', and 'D' require N-squared doubles each for storage, while the vectors 'b', 'xk', 'sub', and 'out' require N doubles each. This is around 4N-squared + 4N doubles.
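A small helper (hypothetical, not part of the report's code) makes the count concrete:

```c
#include <stddef.h>

/* Rough memory footprint of gauss_seidel in doubles: the matrices A, L, U,
 * and D need N*N doubles each; the vectors b, xk, sub, and out need N each. */
size_t gauss_seidel_doubles(size_t N)
{
    return 4 * N * N + 4 * N;
}
```

For N = 1000 this is 4,004,000 doubles, or about 32 MB with 8-byte doubles.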
Contiguous parts of gauss_seidel
The contiguous parts of the function gauss_seidel are the portions of the code where I set the initial guess vector 'xk' to zero and where I set 'xk' to the value of out. There are also some contiguous functions called by gauss_seidel: the vec_sub function, the vec_norm function, and the creation of the 'L', 'U', and 'D' matrices. The function solve_gauss_seidel has some contiguous parts, but it also has some sections that are not contiguous.
Non-contiguous parts of gauss_seidel
The non-contiguous portions of the function gauss_seidel are the two instances where the code calls solve_gauss_seidel. That function pulls rows of the matrix from memory one at a time but never uses the full row in a calculation, and in some instances it uses only one element of a row before needing to pull in the next row. To improve on this I could create an array holding only the triangular portion of the matrix that the calculations use, but creating that array would itself not be a contiguous operation.
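The packing idea mentioned above can be sketched like this (a hypothetical helper, not part of the report's code): the lower triangle, diagonal included, is copied into a contiguous array of N(N+1)/2 doubles. The copy itself still reads non-contiguous pieces of the source matrix, but every later pass over the packed array is fully contiguous.

```c
#include <stddef.h>

/* Pack the lower triangle (diagonal included) of a row-major N x N matrix
 * A into a contiguous array of N*(N+1)/2 doubles. Row i lands at offset
 * i*(i+1)/2 and holds the entries A[i][0..i]. */
void pack_lower(const double *A, double *packed, size_t N)
{
    size_t k = 0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j <= i; j++)  /* only the lower triangle */
            packed[k++] = A[i * N + j];
}
```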