% GitHub Repository: ElmerCSC/elmerfem
% Path: blob/devel/fhutiter/doc/hutidoc.tex
%
% Documentation for the HUT-Iter library
%

\documentclass[11pt,a4paper,english,oneside]{report}
\usepackage[us]{datetime}
\usepackage[latin1]{inputenc}
\usepackage{float,graphicx,t1enc}

\title{HUTI - HUT Iter Library, User's Guide}

\author{Jouni Malinen\\
CSC - IT Center for Science Ltd.\\
P.O.BOX 405, FIN-02101 Espoo, Finland\\
Jouni.Malinen@csc.fi}

\date{\formatdate{01}{08}{1997}\\
Version 1.0
}

% ------------ Here begins the actual document
% Standard stuff

\begin{document}

% Miscellaneous settings

\setcounter{secnumdepth}{4}
\setcounter{tocdepth}{4}

\pagenumbering{roman}
\pagestyle{plain}

\maketitle

\tableofcontents
% \listoffigures
\listoftables

% ------------ The actual text begins here

% ------------------------------------------------------------------------
% ------------------------------------------------------------------------

\chapter{Introduction}
\label{ch:intro}
\pagenumbering{arabic}
\pagestyle{headings}

\section{General}

Many computational problems require the solution of linear systems of
equations $Ax = b$, where $A$ is the coefficient matrix, $b$ is the
{\em right-hand side\/} and $x$ is the solution.

There are several methods for solving linear systems of equations. These
can be divided into two classes: direct and iterative methods. In
general, direct methods are preferred because of their predictable
behaviour and robustness. However, the need to solve very large linear
systems in a reasonable time and with limited resources has been one of
the key reasons for the development of iterative solvers.
We now know more about the behaviour and suitability of these methods
and are able to use them in different kinds of applications.

HUTI is an effort to provide an efficient and well-structured library
containing a collection of iterative methods. The methods implemented
in the library are:

\begin{itemize}
\item Conjugate Gradient (CG) \cite{Bar93}
\item Conjugate Gradient Squared (CGS) \cite{Bar93}
\item Bi-Conjugate Gradient Stabilized (Bi-CGSTAB) \cite{Bar93}
\item Bi-Conjugate Gradient Stabilized (2) (Bi-CGSTAB(2))
\item Quasi-Minimal Residual (QMR) \cite{Bar93,Fre91,Fre94,Buc96}
\item Transpose-Free Quasi-Minimal Residual (TFQMR) \cite{Fre93b}
\item Generalized Minimum Residual (GMRES) \cite{Bar93,Saa96}
\end{itemize}

This library supports both serial and parallel execution. The
parallelisation targets a distributed memory environment and uses
message passing for communication between processes. The user has the
same interface to the library in both execution models; the model can
be selected with special library routines or via environment variables.

\section{Why HUTI?}

There are already several implementations of various iterative
methods, both as libraries and as ``plain code''
\cite{Cun95,Bal95,Saa95,Fre96}.
HUTI differs from these implementations in that it has been
specially tuned for the parallel architecture it runs on and is
not meant to be a general-purpose code.

Another reason for writing this library is the author's master's
thesis. HUTI has also been incorporated into a software package called
ELMER, which in turn is developed for
VIRKE\footnote{VIRtauslaskentaohjelmiston KEhitt\"{a}minen}, a project
funded by TEKES\footnote{Technology Development Center in Finland}.

The name HUTI comes from Helsinki University of Technology (HUT) and
Iterative solvers.

% ------------------------------------------------------------------------
% ------------------------------------------------------------------------

\chapter{Iterative Methods}
\label{ch:methods}

This chapter presents the characteristics of the different iterative
methods and their suitability for different problem areas.

\section{Overview of the Methods}
\section{Preconditioning}
\section{Stopping Criteria}
\section{Convergence}
\section{Parallelism}

% ------------------------------------------------------------------------
% ------------------------------------------------------------------------

\chapter{Using HUTI}
\label{ch:using}

\section{Naming Conventions}

All HUTI routine names and variables start with the

\begin{minipage}{1in}
\begin{center}
\bigskip
{\ttfamily huti\_} \\
or \\
{\ttfamily HUTI\_}
\bigskip
\end{center}
\end{minipage}

\noindent prefix. In the routine names the precision is denoted by an
appropriate character: {\ttfamily s} for {\em single precision},
{\ttfamily d} for {\em double precision}, {\ttfamily c} for {\em complex}
and {\ttfamily z} for {\em double complex}.

% ------------------------------------------------------------------------

\section{Driver Routines}

The key idea in HUTI is that all iterator routines have the same calling
convention regardless of the selected method.
All matrix-related operations are done externally to the iterator
library, so the solver does not need to know the exact matrix structure.
The matrix can be stored, for example, in the well-known Compressed Row
Storage (CRS) or Compressed Column Storage (CCS) formats. This eases
the optimization of memory usage in each particular case.

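For concreteness, the CRS format mentioned above can be sketched as
follows. The array names and the example matrix are ours, for
illustration only; they are not part of the HUTI interface.

```c
#include <assert.h>

/* CRS (Compressed Row Storage) of the 3x3 sparse matrix
 *       [ 4  0  1 ]
 *   A = [ 0  3  0 ]
 *       [ 2  0  5 ]
 * Only the nonzeros are stored, row by row, in `val`; `col_ind` gives
 * the column of each stored entry, and row i occupies the index range
 * row_ptr[i] .. row_ptr[i+1]-1 of val/col_ind.
 */
static const double val[]     = { 4.0, 1.0, 3.0, 2.0, 5.0 };
static const int    col_ind[] = { 0,   2,   1,   0,   2   };
static const int    row_ptr[] = { 0, 2, 3, 5 };   /* length n+1 */
```

A CCS representation is analogous, with the roles of rows and columns
interchanged.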
In a parallel setting it is the user's responsibility to define the
storage convention for the distribution of matrices and vectors.
Well-known distribution concepts include block-cyclic decomposition
and domain-based decompositions; more information can be found in
\cite{Kum94,Saa96}. See also Chapter \ref{ch:examples} for an example
of a user-supplied distribution of data.

Solver routines are called in the following way:

\noindent
\begin{tabbing}
{\ttfamily CALL HUTI\_$*$\_{\em SOLVER\_TYPE}} {\ttfamily (} \=
{\ttfamily X, RHS, IPAR, DPAR, WORK, MATVEC,} \\
\> {\ttfamily PCONDL, PCONDR, DOTPROD, NORM,} \\
\> {\ttfamily STOPC)} \\
\end{tabbing}

\noindent
\begin{tabular*}{\textwidth}{lll}
where & $*$ & is either {\ttfamily S, D, C} or {\ttfamily Z}
depending on the precision. \\
& {\ttfamily {\em SOLVER\_TYPE}} & is either {\ttfamily CG, CGS, BICGSTAB,
BICGSTAB\_2, QMR, TFQMR} or {\ttfamily GMRES} \\
& & depending on the method. \\
\end{tabular*}

Table \ref{table:solver-param} describes the parameters for the solver
routines.

\begin{table}[H]
\begin{tabular*}{\textwidth}{lll}
\hline\hline
{\bfseries Argument} & {\bfseries Type} & {\bfseries Description} \\
\hline
X & vector of & Vector $x$, the current iterate \\
& type $*$ & \\
RHS & vector of & $b$, the right-hand side \\
& type $*$ & \\
IPAR & vector of type & IPAR-structure, see section \ref{sec:ipars} \\
& integer & \\
DPAR & vector of type & DPAR-structure, see section \ref{sec:dpars} \\
& double prec. & \\
WORK & matrix of & User allocated working array, size varies \\
& type $*$ & depending on the method, see table \ref{table:ipar-input} \\
MATVEC & subroutine & User supplied external routine, \\
& & must perform the matrix-vector product \\
PCONDL & subroutine & User supplied routine for left side \\
& & preconditioning \\
PCONDR & subroutine & User supplied routine for right side \\
& & preconditioning \\
DOTPROD & function & User supplied routine to perform the dot \\
& & product \\
NORM & function & User supplied routine returning the norm \\
& & of a vector \\
STOPC & function & User supplied routine to perform stopping \\
& & criterion testing \\
\hline\hline
\end{tabular*}
\caption{Parameters for the solver routines}
\label{table:solver-param}
\end{table}

The external routine {\ttfamily MATVEC} is the only routine that must be
supplied when calling a solver; it performs the matrix-vector product.
Using zeros in place of the other external routine names forces the
library to use default routines applicable to the selected execution
model. For example, the {\em double complex} Conjugate Gradient method
could be called from a Fortran program in the following way:

\medskip
\noindent
{\ttfamily CALL HUTI\_Z\_CG (X, RHS, IPAR, DPAR, WORK, MATVEC, 0, 0, 0, 0, 0)}
\medskip

\noindent
where {\ttfamily X, RHS, IPAR, DPAR} are user supplied vectors and
{\ttfamily WORK} is the user allocated work space (an array) for the
iterator. In this case the library would use BLAS-1 calls for
{\ttfamily DOTPROD} and {\ttfamily NORM} if executed in serial mode,
and no preconditioning would be applied. The {\ttfamily IPAR} and
{\ttfamily DPAR} structures must contain user supplied information
about the dimensions of the vectors and the work array, as well as
certain control information for the iterators.

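As an illustration of the control information involved, the following C
sketch fills {\ttfamily IPAR} and {\ttfamily DPAR} for a double
precision CG run according to the tables of section \ref{sec:ipars}.
The mapping of the 1-based table elements to 0-based C indices, and the
function name, are our assumptions for illustration; real code should
use the named definitions from {\ttfamily huti\_defs.h} instead of raw
indices.

```c
#include <assert.h>

#define IPAR_LEN 50   /* IPAR element 1: length of the IPAR structure */
#define DPAR_LEN 10   /* IPAR element 2: length of the DPAR structure */

/* Fill IPAR/DPAR for a double precision CG run on an n-by-n system,
 * following the IPAR/DPAR tables: element k of the tables is index
 * k-1 here. */
static void setup_cg_params(int ipar[IPAR_LEN], double dpar[DPAR_LEN],
                            int n)
{
    ipar[0]  = IPAR_LEN;  /* element 1: length of IPAR           */
    ipar[1]  = DPAR_LEN;  /* element 2: length of DPAR           */
    ipar[2]  = n;         /* element 3: leading dimension        */
    ipar[3]  = 4;         /* element 4: CG needs 4 work vectors  */
    ipar[9]  = 5000;      /* element 10: maximum iterations      */
    ipar[11] = 0;         /* element 12: criterion ||r_n|| < eps */
    ipar[12] = 0;         /* element 13: no preconditioning      */
    ipar[13] = 0;         /* element 14: random initial x_0      */
    dpar[0]  = 1e-6;      /* element 1: tolerance eps            */
}
```

The {\ttfamily WORK} array for CG would then be an $n \times 4$ array
of the same precision as the solution vector.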
\section{External Routines}

This section describes the external routines that can be given as
arguments to the solver routine. Only the {\ttfamily MATVEC} routine is
required; the other routines are optional.

These routines are called from the solver; the types and order of the
arguments are presented for each routine below.

The matrix $A$ can be stored in any format, because it is entirely the
user's responsibility to make it available to the external routines.

The {\ttfamily IPAR} structure is passed to some of the external routines
and is used to carry certain control variables from the solver routine.
For example, the {\ttfamily IPAR} structure contains the assumed form of
the matrix in an external operation. This applies to both the
matrix-vector operation $Au = v$ and the preconditioning operations
$M_{1}^{-1}u = v$ and $M_{2}^{-1}u = v$.

\subsection{Matrix-Vector Operation}

The arguments for the external matrix-vector operation {\ttfamily MATVEC}
are given in Table \ref{table:matvec-param}. This routine should perform
the matrix-vector product. In the {\ttfamily IPAR} structure the iterator
provides information about the matrix form, i.e.\ whether it should be
transposed or not. Only non-transposed forms are used in the CG, CGS,
Bi-CGSTAB, TFQMR and GMRES methods; only QMR needs a transposed
matrix-vector product, that is $A^{T}u = v$.

The calling convention for {\ttfamily MATVEC} is:

\bigskip
\noindent
{\ttfamily SUBROUTINE MATVEC ( U, V, IPAR )}
\bigskip

\begin{table}[H]
\begin{tabular*}{\textwidth}{lll}
\hline\hline
{\bfseries Argument} & {\bfseries Type} & {\bfseries Description} \\
\hline
U & vector of & Vector $u$ in $Au = v$ \\
& type $*$ & \\
V & vector of & Vector $v$ in $Au = v$ \\
& type $*$ & \\
IPAR & vector of type & IPAR-structure, see section \ref{sec:ipars} \\
& integer & \\
\hline\hline
\end{tabular*}
\caption{Parameters for the external MATVEC subroutine}
\label{table:matvec-param}
\end{table}

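To make the division of labour concrete, here is a minimal C sketch of
a {\ttfamily MATVEC}-style routine for a matrix held in CRS format. The
file-scope matrix data and the serial, non-transposed treatment (the
{\ttfamily IPAR} transpose flag is ignored) are simplifications of
ours, not HUTI requirements.

```c
#include <assert.h>

static const double *crs_val;  /* stored nonzeros, row by row          */
static const int *crs_col;     /* column index of each nonzero         */
static const int *crs_rowptr;  /* row i spans rowptr[i]..rowptr[i+1]-1 */
static int crs_n;              /* number of rows                       */

/* A MATVEC-style routine computing v = A*u for a CRS matrix. The
 * solver never sees this storage; it only calls the routine. */
static void matvec_crs(const double *u, double *v, const int *ipar)
{
    (void)ipar;  /* transpose flag ignored in this sketch */
    for (int i = 0; i < crs_n; i++) {
        double sum = 0.0;
        for (int k = crs_rowptr[i]; k < crs_rowptr[i + 1]; k++)
            sum += crs_val[k] * u[crs_col[k]];
        v[i] = sum;
    }
}
```

With this convention the same solver call works for any storage
scheme; only {\ttfamily MATVEC} changes.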
\subsection{Preconditioning}

The routines {\ttfamily PCONDL} and {\ttfamily PCONDR} should solve
$M_{1}u = v$ and $M_{2}u = v$, respectively, if the preconditioning
matrix is split into two parts. If only one preconditioning matrix $M$ is
available, the {\ttfamily PCONDL} routine should solve $Mu = v$ and
{\ttfamily PCONDR} should not be supplied to the solver (the argument must
be zero).

The arguments for the external preconditioning operations
{\ttfamily PCONDL} and {\ttfamily PCONDR} are given in
Table \ref{table:pcond-param}. The preconditioning routines should use
the information in the {\ttfamily IPAR} structure to apply a transposed
or non-transposed solve when needed. Only the QMR method needs the
$M^{-T}u = v$ operation.

The calling convention for {\ttfamily PCONDL} is

\bigskip
\noindent
{\ttfamily SUBROUTINE PCONDL ( U, V, IPAR )}
\bigskip

\noindent
and for {\ttfamily PCONDR}

\bigskip
\noindent
{\ttfamily SUBROUTINE PCONDR ( U, V, IPAR )}
\bigskip

\begin{table}[H]
\begin{tabular*}{\textwidth}{lll}
\hline\hline
{\bfseries Argument} & {\bfseries Type} & {\bfseries Description} \\
\hline
U & vector of & Vector $u$ in $Mu = v$ \\
& type $*$ & \\
V & vector of & Vector $v$ in $Mu = v$ \\
& type $*$ & \\
IPAR & vector of type & IPAR-structure, see section \ref{sec:ipars} \\
& integer & \\
\hline\hline
\end{tabular*}
\caption{Parameters for the external PCONDL and PCONDR subroutines}
\label{table:pcond-param}
\end{table}

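As a sketch of what a {\ttfamily PCONDL}-style routine can look like,
the following C fragment applies a Jacobi (diagonal) preconditioner,
i.e.\ it solves $Mu = v$ with $M = \mathrm{diag}(A)$. The choice of
Jacobi and the file-scope diagonal data are our illustration; HUTI
accepts any user supplied preconditioner.

```c
#include <assert.h>

static const double *jac_diag;  /* diagonal of A, set up by the user */
static int jac_n;               /* vector length                     */

/* A PCONDL-style routine applying a Jacobi preconditioner: it solves
 * M u = v with M = diag(A), i.e. u_i = v_i / a_ii. The IPAR transpose
 * flag can be ignored here, since a diagonal matrix equals its
 * transpose. */
static void pcondl_jacobi(double *u, const double *v, const int *ipar)
{
    (void)ipar;
    for (int i = 0; i < jac_n; i++)
        u[i] = v[i] / jac_diag[i];
}
```

With symmetric splitting the same idea applies to {\ttfamily PCONDR}
and $M_{2}$.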
\subsection{Global Dot Product}

The external function {\ttfamily DOTPROD} performs the global dot
product of two given vectors. In the serial case this routine is by
default the corresponding BLAS-1 routine. In the parallel case this is
the place to compute the global product, for example by using the
{\ttfamily MPI\_ALLREDUCE} function to sum up the local products
computed with a BLAS-1 routine.

The calling convention for the function {\ttfamily DOTPROD} is

\bigskip
\noindent
{\ttfamily FUNCTION DOTPROD ( NDIM, X, INCX, Y, INCY )}
\bigskip

\begin{table}[H]
\begin{tabular*}{\textwidth}{lll}
\hline\hline
{\bfseries Argument} & {\bfseries Type} & {\bfseries Description} \\
\hline
NDIM & integer & Dimension of vectors X and Y \\
X & vector of & Vector $x$ in $x \cdot y$ \\
& type $*$ & \\
INCX & integer & The increment for the elements of X \\
Y & vector of & Vector $y$ in $x \cdot y$ \\
& type $*$ & \\
INCY & integer & The increment for the elements of Y \\
\hline\hline
\end{tabular*}
\caption{Parameters for the external DOTPROD function}
\label{table:dotprod-param}
\end{table}

The function {\ttfamily DOTPROD} must return a value of the same type as
the argument vectors.

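A C sketch of a {\ttfamily DOTPROD}-style function with the argument
list above might look as follows; it mirrors the serial BLAS-1 default
(negative increments are not handled in this simplification of ours).
In a parallel run, the per-process partial sums would additionally be
combined, e.g.\ with {\ttfamily MPI\_Allreduce}.

```c
#include <assert.h>

/* A DOTPROD-style function: the dot product of x and y with strides
 * INCX and INCY, as the serial BLAS-1 default would compute it. A
 * parallel version would return the globally reduced sum of these
 * per-process partial results instead. */
static double dotprod(int ndim, const double *x, int incx,
                      const double *y, int incy)
{
    double sum = 0.0;
    for (int i = 0; i < ndim; i++)
        sum += x[i * incx] * y[i * incy];
    return sum;
}
```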
\subsection{Global Vector Norm}

The external routine {\ttfamily NORM} is used to produce the global
vector norm, usually the vector 2-norm $\|x\|_{2}$. In the serial case
this routine is by default the corresponding BLAS-1 routine. In the
parallel case this is very similar to the {\ttfamily DOTPROD} function.

The calling convention for the function {\ttfamily NORM} is

\bigskip
\noindent
{\ttfamily FUNCTION NORM ( NDIM, X, INCX )}
\bigskip

\begin{table}[H]
\begin{tabular*}{\textwidth}{lll}
\hline\hline
{\bfseries Argument} & {\bfseries Type} & {\bfseries Description} \\
\hline
NDIM & integer & Dimension of vector X \\
X & vector of & Vector $x$ in $\|x\|$ \\
& type $*$ & \\
INCX & integer & The increment for the elements of X \\
\hline\hline
\end{tabular*}
\caption{Parameters for the external NORM function}
\label{table:norm-param}
\end{table}

The function {\ttfamily NORM} must return a value that is real if X is
single precision ({\em real or complex}) and double precision if X is
double precision ({\em double precision or double complex}).

\subsection{Stopping Criterion}

The stopping criterion can be selected from the built-in stopping
criteria or it can be supplied by the user. The built-in alternatives
are listed in Table \ref{table:ipar-input}.

The calling convention for the user supplied function {\ttfamily STOPC} is

\bigskip
\noindent
{\ttfamily FUNCTION STOPC ( X, B, R, IPAR, DPAR )}
\bigskip

\begin{table}[H]
\begin{tabular*}{\textwidth}{lll}
\hline\hline
{\bfseries Argument} & {\bfseries Type} & {\bfseries Description} \\
\hline
X & vector of & Current iterate $x_{n}$ \\
& type $*$ & \\
B & vector of & The original right-hand side \\
& type $*$ & \\
R & vector of & Current residual vector $r_{n}$ \\
& type $*$ & \\
IPAR & vector of type & IPAR-structure, see section \ref{sec:ipars} \\
& integer & \\
DPAR & vector of type & DPAR-structure, see section \ref{sec:dpars} \\
& double precision & \\
\hline\hline
\end{tabular*}
\caption{Parameters for the external STOPC function}
\label{table:stopc-param}
\end{table}

The function {\ttfamily STOPC} must return a value of the same type as
the {\ttfamily NORM} function for the selected precision; see the
previous section.

The returned value should describe how close the current iterate is to
convergence. It will be tested against the user supplied tolerance and
printed if requested.

% ------------------------------------------------------------------------

\section{Iteration Parameters}

\subsection{IPAR Structure}
\label{sec:ipars}

The {\ttfamily IPAR} structure is used to control the progress and
behaviour of the iterator routine and to get status information back
from it. {\ttfamily IPAR} is also passed on to some of the user
supplied routines.

The input parameters are described in Table \ref{table:ipar-input} along
with their default values; the output parameters are in Table
\ref{table:ipar-output}.

A more detailed description of the various parameters and output values
for each solver is given on the corresponding reference pages.

\begin{table}[H]

\begin{tabular*}{\textwidth}{lll}
\hline\hline
{\bfseries Element} & {\bfseries Description} & {\bfseries Default} \\
\hline\hline
& {\em General parameters} & \\
\hline
1 & Length of the IPAR structure & 50 \\
2 & Length of the DPAR structure & 10 \\
3 & Leading dimension of the matrix (and vectors) & \\
4 & Number of vectors in the {\ttfamily WORK} array: & \\
& CG: 4 & \\
& CGS: 7 & \\
& Bi-CGSTAB: 8 & \\
& Bi-CGSTAB\_2: 8 & \\
& QMR: 14 & \\
& TFQMR: 10 & \\
& GMRES: 7 + number of restart vectors & \\
5 & Number of iterations between debug output & 0 \\
6 & Assumed matrix type in external operations & \\
& 0: Matrix must {\em not} be transposed & \\
& 1: Matrix must be transposed & \\
\hline
& {\em Iteration parameters} & \\
\hline
10 & Maximum number of iterations allowed & 5000 \\
12 & Stopping criterion used: & 0 \\
& ($\epsilon$ is the tolerance given by the user, see table \ref{table:dpar}) & \\
& 0: $\|r_{n}\| < \epsilon$ & \\
& 1: $\|r_{n}\| < \epsilon \|b\|$ & \\
& 2: $\|z_{n}\| < \epsilon$ & \\
& 3: $\|z_{n}\| < \epsilon \|b\|$ & \\
& 4: $\|z_{n}\| < \epsilon \|M^{-1}b\|$ & \\
& 5: $\|x_{n} - x_{n-1}\| < \epsilon$ & \\
& 6: {\em upper bound} $< \epsilon$ (only with TFQMR) & \\
& 10: Use the user supplied routine {\ttfamily STOPC} & \\
13 & Preconditioning technique used: & 0 \\
& 0: None & \\
& 1: Right preconditioning & \\
& 2: Left preconditioning & \\
& 3: Symmetric preconditioning & \\
14 & Initial $x_{0}$, starting vector: & 0 \\
& 0: Random $x_{0}$ & \\
& 1: User supplied $x_{0}$, vector in {\ttfamily XVEC} & \\
15 & Number of restart vectors in GMRES(m) & 1 \\
\hline
& {\em Parallel environment parameters} & \\
\hline
20 & Processor identification number for specific process & \\
21 & Number of processors & 1 \\
\hline\hline
\end{tabular*}

\caption{IPAR-structure, input parameters}
\label{table:ipar-input}
\end{table}

\begin{table}[H]

\begin{tabular*}{\textwidth}{ll}
\hline\hline
{\bfseries Element} & {\bfseries Description} \\
\hline\hline
& {\em General parameters} \\
\hline
30 & Status information: \\
& 0: No change \\
& 1: Iteration converged \\
& 2: Maximum number of iterations reached \\
& 10: QMR breakdown in $\rho$ or $\psi$ \\
& 11: QMR breakdown in $\delta$ \\
& 12: QMR breakdown in $\epsilon$ \\
& 13: QMR breakdown in $\beta$ \\
& 14: QMR breakdown in $\gamma$ \\
& 20: CG breakdown in $\rho$ \\
& 25: CGS breakdown in $\rho$ \\
& 30: TFQMR breakdown in $\rho$ \\
& 35: Bi-CGSTAB breakdown in $\rho$ \\
& 36: Bi-CGSTAB breakdown in $\|s\|$ \\
& 37: Bi-CGSTAB breakdown in $\omega$ \\
31 & Number of iterations performed \\
\hline\hline
\end{tabular*}

\caption{IPAR-structure, output parameters}
\label{table:ipar-output}
\end{table}

\subsection{DPAR Structure}
\label{sec:dpars}

For parameters of type {\em double precision} there is a structure
called {\ttfamily DPAR}. Table \ref{table:dpar} describes the
elements of this structure.

\begin{table}[h]
\begin{tabular*}{\textwidth}{lll}
\hline\hline
{\bfseries Element} & {\bfseries Description} & {\bfseries Default} \\
\hline\hline
& {\em General parameters} & \\
\hline
1 & Tolerance used by the stopping criterion & $10^{-6}$ \\
\hline\hline
\end{tabular*}
\caption{DPAR-structure}
\label{table:dpar}
\end{table}

% ------------------------------------------------------------------------

\section{Header Files}
\subsection{{\ttfamily huti\_fdefs.h} and {\ttfamily huti\_defs.h}}

There are header files in preprocessor format for both the Fortran90
and C languages. These header files include definitions for all of the
variables described in
Tables \ref{table:ipar-input}, \ref{table:ipar-output} and
\ref{table:dpar}. There are also definitions for the possible flags of
certain variables and for the default values.

The user should use the named definitions by including the header file
via {\ttfamily \#include ``huti\_defs.h''} for C defines and
{\ttfamily \#include ``huti\_fdefs.h''} for Fortran90 defines. In this
way compatibility with later versions of the library is also
guaranteed.

% ------------------------------------------------------------------------
% ------------------------------------------------------------------------

\chapter{Examples}
\label{ch:examples}

% ------------ End of the main text

% ------------ Bibliography

\begin{thebibliography}{1}

\bibitem{Gol89}
Gene H. Golub and Charles F. van Loan, {\em Matrix Computations},
second edition, The Johns Hopkins University Press, 1993.

\bibitem{Gei93}
Al Geist et al., {\em PVM 3 User's Guide and Reference Manual},
Oak Ridge National Laboratory, Oak Ridge, Tennessee, May 1993.

\bibitem{Bar93}
Richard Barrett et al., {\em Templates for the Solution of Linear
Systems: Building Blocks for Iterative Methods}, SIAM, 1993.

\bibitem{Fre91}
Roland W. Freund and No\"el Nachtigal, {\em QMR: a Quasi-Minimal
Residual Method for Non-Hermitian Linear Systems}, Numer. Math.
60, 315-339, 1991.

\bibitem{Fre93a}
Roland W. Freund, {\em An Implementation of the Look-Ahead
Lanczos Algorithm for Non-Hermitian Matrices},
SIAM J. Sci. Comput., Vol. 14, No. 1, pp. 137-158, January 1993.

\bibitem{Fre93b}
Roland W. Freund, {\em A Transpose-Free Quasi-Minimal Residual
Algorithm for Non-Hermitian Linear Systems},
SIAM J. Sci. Comput., Vol. 14, No. 2, pp. 470-482, March 1993.

\bibitem{Fre94}
Roland W. Freund and No\"el Nachtigal, {\em An Implementation of the
QMR Method Based on Coupled Two-Term Recurrences},
SIAM J. Sci. Comput., Vol. 15, No. 2, pp. 313-337, March 1994.

\bibitem{Mpi94}
{\em MPI: A Message-Passing Interface Standard}, Message
Passing Interface Forum, April 1994.

\bibitem{Gro94}
William Gropp, Ewing Lusk and Anthony Skjellum, {\em Using MPI:
Portable Parallel Programming with the Message-Passing Interface},
The MIT Press, 1994.

\bibitem{Cun95}
Rudnei Dias da Cunha and Tim Hopkins, {\em PIM 2.0, The Parallel
Iterative Methods Package for Systems of Linear Equations, User's
Guide}, ftp://unix.hensa.ac.uk/pub/misc/netlib/pim/ug20.ps.gz, 1995.

\bibitem{Buc96}
H. Martin B\"{u}cker and Manfred Sauren, {\em A Parallel Version
of the Unsymmetric Lanczos Algorithm and its Application to QMR},
Forschungszentrum J\"{u}lich, March 1996.

\bibitem{Saa96}
Yousef Saad, {\em Iterative Methods for Sparse Linear Systems},
PWS Publishing Company, 1996.

\bibitem{Kor95}
Samuel Kortas and Philippe Angot, {\em A Practical and Portable
Model of Programming for Iterative Solvers on Distributed Memory
Machines}, Parallel Computing, Vol. 22, No. 4, June 1996.

\bibitem{Jon95}
Mark T. Jones and Paul E. Plassman, {\em BlockSolve95 Users Manual:
Scalable Library Software for the Parallel Solution of Sparse
Linear Systems}, Argonne National Laboratory ANL-95/48,
December 1995.

\bibitem{Bal95}
S. Balay, W. Gropp, L. C. McInnes and B. Smith, {\em PETSc 2.0
Users Manual}, Argonne National Laboratory ANL-95/11, 1995.

\bibitem{Saa95}
Yousef Saad and Andrei V. Malevsky, {\em P-SPARSLIB: A Portable
Library of Distributed Memory Sparse Iterative Solvers},
University of Minnesota, Department of Computer Science, May 1995.

\bibitem{Kum94}
Vipin Kumar, Ananth Grama, Anshul Gupta and George Karypis,
{\em Introduction to Parallel Computing: Design and Analysis
of Algorithms}, The Benjamin/Cummings Publishing Company Inc., 1994.

\bibitem{Fre96}
Roland W. Freund and No\"el Nachtigal, {\em QMRPACK: A Package
of QMR Algorithms}, ACM Transactions on Mathematical Software,
Vol. 22, No. 1, pp. 46-77, March 1996.

\end{thebibliography}

\label{page:last}
\end{document}