United States General Accounting Office
GAO
October 2002 External Version 1

Assessing the Reliability of Computer-Processed Data
Contents

Preface
Section 1: Introduction
Section 2: Understanding Data Reliability
Section 3: Deciding If a Data Reliability Assessment Is Necessary
Section 4: Performing a Data Reliability Assessment
Section 5: Viewing the Entire Assessment Process
Section 6: Taking the First Steps
Section 7: Making the Preliminary Assessment
Section 8: Conducting Additional Work 23
    Tracing to and from Source Documents 24
    Using Advanced Electronic Testing 25
    Reviewing Selected System Controls 26
    Using Data of Undetermined Reliability 27
Section 9: Making the Final Assessment 28
    Sufficiently Reliable Data 29
    Not Sufficiently Reliable Data 29
    Data of Undetermined Reliability 30
Section 10: Including Appropriate Language in the Report 31
    Sufficiently Reliable Data 31
    Not Sufficiently Reliable Data 31
    Data of Undetermined Reliability 32
Glossary of Technical Terms 33

Figures
Figure 1: Factors to Consider in Making the Decision on Using the Data 1
Figure 2: Decision Process for Determining If a Data Reliability Assessment Is Required 7
Figure 3: Data Reliability Assessment Process 13
Figure 4: The First Steps of the Assessment 14
Figure 5: The Preliminary Assessment 19
Figure 6: Choosing and Conducting Additional Work 23
Figure 7: Making the Final Assessment 28
Preface

Computer-processed data, often from external sources, increasingly underpin audit reports, including evaluations (performance audits) and financial audits. Therefore, the reliability of such data has become more and more important. Historically, computer-processed data have been treated as unique evidence. However, these data are simply one form of evidence relied on, although they may require more technical assessment than other forms of evidence. In addition, the very nature of the information system creating the data allows opportunities for errors to be introduced by many people.

This guidance is intended to demystify the assessment of computer-processed data. It supplements GAO's "Yellow Book" (Government Auditing Standards, 1994 Revision), which defines the generally accepted government auditing standards (GAGAS), and replaces the earlier GAO guidance, Assessing the Reliability of Computer-Processed Data (GAO/OP-8.1.3, Sept. 1990).

For all types of evidence, various tests (sufficiency, competence, and relevance) are used to assess whether the evidence standard is met. You probably have been using these tests for years and have become quite proficient at them. But because assessing computer-processed data requires more technical tests, it may appear that such data are subject to a higher standard of testing than other evidence. That is not the case. For example, many of the same tests of sufficiency and relevance are applied to other types of evidence. But in assessing computer-processed data, the focus is on one test in the evidence standard, competence, which includes validity and reliability. Reliability, in turn, includes the completeness and accuracy of the data.

This guidance, therefore, provides a flexible, risk-based framework for data reliability assessments that can be geared to the specific circumstances of each engagement. The framework also provides a structure for planning and reporting, facilitates bringing the right mix of skills to each engagement, and ensures timely management buy-in on assessment strategies. The framework is built on

• making use of all existing information about the data,
• performing at least a minimal level of data testing,
• doing only the amount of work necessary to determine whether the data are reliable enough for our purposes,
• maximizing professional judgment, and
• bringing the appropriate people, including management, to the table at key decision points.

The ultimate goal of the data reliability assessment is to determine whether you can use the data for your intended purposes. This guidance is designed to help you make an appropriate, defensible assessment in the most efficient manner. With any related questions, call Barbara Johnson, focal point for data reliability issues, at (202) 512-3663, or Barry Seltser, the Acting Director of GAO's Center for Design, Methods, and Analysis, at (202) 512-3234.

Nancy Kingsbury
Managing Director, Applied Research and Methods
Section 1: Introduction

This guidance explains what data reliability means and provides a framework for assessing the reliability of computer-processed data. It begins with the steps in a preliminary assessment, which, in many cases, may be all you need to do to assess reliability. This guidance also helps you decide whether you should follow up the preliminary assessment with additional work. If so, it explains the steps in a final assessment and the actions to take, depending on the results of your additional work. The ultimate goal in determining data reliability is to make the following decision: For our engagement, can we use the data to answer the research question? See figure 1 for an overview of the factors that help to inform that decision. Not all of these factors may be necessary for all engagements.

Figure 1: Factors to Consider in Making the Decision on Using the Data
Source: GAO.

In addition, this guidance discusses suggested language, appropriate under different circumstances, for reporting the results of your assessment. Finally, it provides detailed descriptions of all the stages of the assessment, as well as a glossary of technical terms used (see p. 33). An on-line version of this guidance, which will include tools that may help you in assessing reliability, is currently being developed. The overall process is illustrated in figures 2 (p. 7) and 3 (p. 13).
Section 2: Understanding Data Reliability

Data reliability refers to the accuracy and completeness of computer-processed data, given the intended purposes for use. Computer-processed data include data (1) entered into a computer system and (2) resulting from computer processing. Computer-processed data can vary in form, from electronic files to tables in published reports. The definition of computer-processed data is therefore broad. In this guidance, the term data always refers to computer-processed data.

The "Yellow Book" requires that a data reliability assessment be performed for all data used as support for engagement findings, conclusions, or recommendations.[1] This guidance will help you to design a data reliability assessment appropriate for the purposes of the engagement and then to evaluate the results of the assessment.

Data are reliable when they are (1) complete (they contain all of the data elements and records needed for the engagement)[2] and (2) accurate (they reflect the data entered at the source or, if available, in the source documents). A subcategory of accuracy is consistency. Consistency refers to the need to obtain and use data that are clear and well-defined enough to yield similar results in similar analyses. For example, if data are entered at multiple sites, inconsistent interpretation of data rules can lead to data that, taken as a whole, are unreliable. Reliability also means that for any computer processing of the data elements used, the results are reasonably complete and accurate, meet your intended purposes, and are not subject to inappropriate alteration.

Assessments of reliability should be made in the broader context of the particular characteristics of the engagement and the risk associated with the possibility of using data of insufficient reliability. Reliability does not mean that computer-processed data are error-free. Errors are considered acceptable under these circumstances: You have assessed the associated risk and found the errors are not significant enough to cause a reasonable person, aware of the errors, to doubt a finding, conclusion, or recommendation based on the data.

While this guidance focuses only on the reliability of data in terms of accuracy and completeness, other data quality considerations are just as important. In particular, you should also consider the validity of data. Validity (as used here) refers to whether the data actually represent what you think is being measured. For example, if a data field is named "annual evaluation score," is this an appropriate measure of a person's job performance? Data validity and reliability issues should be addressed early in the engagement, and appropriate technical specialists, such as data analysts, statisticians, or information technology specialists, should be consulted.

[1] U.S. General Accounting Office, Government Auditing Standards, GAO/OGC-94-4 (Washington, D.C.: June 1994), pp. 62-87.

[2] A data element is a unit of information with definable parameters (for example, a Social Security number), sometimes referred to as a data variable or data field.
Section 3: Deciding If a Data Reliability Assessment Is Necessary

To decide if a data reliability assessment is necessary, you should consider certain conditions. The engagement type and planned use of the data help to determine when you should assess data reliability. See figure 2 for an illustration of the decision process that you should use.

Figure 2: Decision Process for Determining If a Data Reliability Assessment Is Required
Source: GAO.

Conditions Requiring a Data Reliability Assessment

You should assess reliability if the data to be analyzed are intended to support the engagement findings, conclusions, or recommendations. Keep in mind that a finding may include only a description of the condition, as in a purely descriptive report. In the audit plan for the engagement, you should include a brief discussion of how you plan to assess data reliability, as well as any limitations that may exist due to shortcomings in the data.

Conditions Not Requiring a Data Reliability Assessment

You do not need to assess reliability if the data are used (1) only as background information or (2) in documents without findings, conclusions, or recommendations. Background information generally sets the stage for reporting the results of an engagement or provides information that puts the results in proper context. Such information could be the size of the program or activity you are reviewing, for example. When you gather background or other data, ensure that they are from the best available source(s). When you present the data, cite the source(s) and state that the data were not assessed.

Sometimes, however, as a best practice, you may want to do some assessment of background data. Your judgment of the data's importance and the reliability of the source, as well as other engagement factors, can help you determine the extent of such an assessment.

Finally, for financial audits and information system reviews, you should not follow this guidance in assessing data reliability. For financial audits, which include financial statement and financial-related audits, you should follow the GAO/PCIE Financial Audit Manual (FAM) and the Federal Information System Controls Audit Manual (FISCAM). In an information system review, all controls in a computer system, for the full range of application functions and products, are assessed and tested. Such a review includes (1) examining the general and application controls of a computer system,[3] (2) testing whether those controls are being complied with, and (3) testing data produced by the system.[4] To design such a review, appropriate to the research question, seek assistance from information technology specialists.

[3] General controls refers to the structure, policies, and procedures, applying to all or a large segment of an organization's information systems, that help to ensure proper operation, data integrity, and security. Application controls refers to the structure, policies, and procedures that apply to individual application systems, such as inventory or payroll.

[4] Guidance for carrying out reviews of general and application controls is provided in U.S. General Accounting Office, Federal Information System Controls Audit Manual, GAO/AIMD-12.19.6 (Washington, D.C.: Jan. 1999).
Section 4: Performing a Data Reliability Assessment

To perform a data reliability assessment, you need to decide on the timing (when to perform the assessment) and how to document it.

Timing the Assessment

A data reliability assessment should be performed as early as possible in the engagement process, preferably during the design phase. The audit plan should reflect data reliability issues and any additional steps that still need to be performed to assess the reliability of critical data. The engagement team generally should not finalize the audit plan or issue a commitment letter until it has done initial testing and reviewed existing information about the data and the system that produces the data. In addition, the team should not commit to making conclusions or recommendations based on the data unless the team expects to be satisfied with the data reliability.

Documenting the Assessment

All work performed as part of the data reliability assessment should be documented and included in the engagement workpapers. This includes all testing, information review, and interviews related to data reliability. In addition, decisions made during the assessment, including the final assessment of whether the data are sufficiently reliable for the purposes of the engagement, should be summarized and included with the workpapers. These workpapers should be (1) clear about what steps the team took and what conclusions they reached and (2) reviewed by staff with appropriate skills or, if needed, technical specialists.
Section 5: Viewing the Entire Assessment Process

The ultimate goal of the data reliability assessment is to determine whether you can use the data to answer the research question. The assessment should be performed only for those portions of the data that are relevant to the engagement. The extensiveness of the assessment is driven by

• the expected significance of the data to the final report,
• the anticipated risk level of using the data, and
• the strength or weakness of any corroborating evidence.

Therefore, the specific assessment process should take into account these factors along with what is learned during the initial stage of the assessment. The process is likely to be different for each engagement.

The overall framework of the process for data reliability assessment is shown in figure 3. The framework identifies several key stages in the assessment, as well as actions and decisions expected as you move through the process. The framework allows you to identify the appropriate mix of assessment steps to fit the particular needs of your engagement. In most cases, not all of the elements in figure 3 would be necessary to complete the assessment. Specific actions for each stage are discussed in sections 6-10.

Figure 3: Data Reliability Assessment Process
Source: GAO.
Section 6: Taking the First Steps

Reviewing Existing Information

The data reliability process begins with two relatively simple steps. These steps provide the basis for making a preliminary assessment of data reliability: (1) a review of related information and (2) initial testing (see figure 4). In some situations, you may have an extremely short time frame for the engagement; this section also provides some advice for that situation.

The time required to review related information and perform initial testing will vary, depending on the engagement and the amount of risk involved. As discussed in section 4, these steps should take place early in the engagement and include the team members, as well as appropriate technical staff.

Figure 4: The First Steps of the Assessment
Source: GAO.

The first step, a review of existing information, helps you to determine what is already known about the data and the computer processing. The related information you collect can indicate both the accuracy and completeness of the entry and processing of the data, as well as how data integrity is maintained. This information can be in the form of reports, studies, or interviews with individuals who are knowledgeable about the data and the system. Sources for related information include GAO, the agency under review, and others.

GAO

GAO may already have related information in reports. Those from fiscal year 1995 to the present are available via GAO's Internet site. This site also provides other useful information: for example, as part of the annual governmentwide consolidated financial audit, GAO's Information Technology Team is involved with reporting on the effectiveness of controls for financial information systems at 24 major federal agencies.

Agency under Review

Officials of the agency or entity under review are aware of evaluations of their computer data or systems and usually can direct you to both. However, keep in mind that information from agency officials may be biased. Consider asking appropriate technical specialists to help in evaluating this information. Agency information includes Inspector General reports, Federal Managers' Financial Integrity Act reports, Government Performance and Results Act (GPRA) plans and reports, Clinger-Cohen Act reports, and Chief Information Officer reports. (Some of this information can be found on agency homepages on the Web.)

Others

Other organizations and users of the data may be sources of relevant information. To help you identify these sources, you can use a variety of databases and other research tools, which include the Congressional Research Service Public Policy Literature Abstracts and organizations' Web sites.

Performing Initial Testing

The second step, initial testing, can be done by applying logical tests to electronic data files or hard copy reports. For electronic data, you use computer programs to test all entries of key data elements in the entire data file.[5] Keep in mind that you only test those data elements you plan to use for the engagement. You will find that testing with computer programs often takes less than a day, depending on the complexity of the file. For hard copy or summarized data, provided by the audited entity or retrieved from the Internet, you can ask for the electronic data file used to create the hard copy or summarized data. If you are unable to obtain electronic data, use the hard copy or summarized data and, to the extent possible, manually apply the tests to all instances of key data elements or, if the report or summary is voluminous, to a sample of them.

[5] Though an in-depth discussion of quality-assurance practices to be used in electronic testing and analyses is beyond the scope of this guidance, it is important to perform appropriate checks to ensure that you have obtained the correct file. All too often, analysts receive an incorrect file (an early version or an incomplete file). Appropriate steps would include counting records and comparing totals with the responsible agency or entity.

Whether you have an electronic data file or a hard copy report or summary, you apply the same types of tests to the data. These can include testing for

• missing data, either entire records or values of key data elements;
• the relationship of one data element to another;
• values outside of a designated range; and
• dates outside valid time frames or in an illogical progression.

Be sure to keep a log of your testing for inclusion in the engagement workpapers.
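The following is a minimal sketch, in Python with the pandas library, of what these initial tests might look like on an electronic data file. The file name, column names, and limits are hypothetical, introduced only to illustrate the four types of tests listed above; they are not part of this guidance.

```python
# A sketch of the initial logical tests described above, assuming a
# hypothetical CSV extract with columns case_id, amount, status,
# start_date, and end_date. All names and limits are illustrative.
import pandas as pd

df = pd.read_csv("extract.csv", parse_dates=["start_date", "end_date"])

# Confirm you received the correct file (see footnote 5): count records
# and compare the total with the figure reported by the agency.
print("records received:", len(df))

# Missing data: entire records or values of key data elements.
print(df[["case_id", "amount", "status"]].isna().sum())

# Values outside of a designated range (limits assumed for illustration).
print("out-of-range amounts:",
      ((df["amount"] < 100) | (df["amount"] > 5_000_000)).sum())

# Dates outside valid time frames or in an illogical progression.
print("start after end:", (df["start_date"] > df["end_date"]).sum())
print("before period start:",
      (df["start_date"] < pd.Timestamp("1990-01-01")).sum())

# Relationship of one data element to another: closed cases should
# have an end date (rule assumed for illustration).
print("closed cases missing an end date:",
      ((df["status"] == "closed") & df["end_date"].isna()).sum())
```

Keeping such a script with the workpapers also documents exactly which tests were run.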
Dealing with Short Time Frames

In some instances, the engagement may have a time frame that is too short for a complete preliminary assessment, for example, a request for testimony in 2 weeks. However, given that all engagements are a function of time, as well as scope and resources, limitations in one require balancing the others.

Despite a short time frame, you may have time to review existing information and carry out testing of data that are critical for answering a research question. For example, you can question knowledgeable agency staff about data reliability or review existing GAO or Inspector General reports to quickly gather information about data reliability issues. In addition, electronic testing of critical data elements for obvious errors of completeness and accuracy can generally be done in a short period of time on all but the most complicated or immense files. From that review and testing, you will be able to make a more informed determination about whether the data are sufficiently reliable to use for the purposes of the engagement. (See sections 7 and 8 for the actions to take, depending on your determination.)
Section 7: Making the Preliminary Assessment

Factors to Consider in the Assessment

The preliminary assessment is the first decision point in the assessment process. It includes the consideration of multiple factors, a determination of the sufficiency of the data reliability given what is known at this point, and a decision about whether further work is required. You will decide whether the data are sufficiently reliable for the purposes of the engagement, not sufficiently reliable, or as yet undetermined. Keep in mind that you are not attesting to the overall reliability of the data or database. You are only determining the reliability of the data as needed to support the findings, conclusions, or recommendations of the engagement. As you gather information and make your judgments, consult appropriate technical specialists for assistance.

To make the preliminary assessment of the sufficiency of the data reliability for the engagement, you should consider all factors related to aspects of the engagement, as well as assessment work performed to this point. As shown in figure 5, these factors include

• the expected significance of the data in the final report,
• corroborating evidence,
• level of risk, and
• the results of initial assessment work.

Figure 5: The Preliminary Assessment
Source: GAO.

Expected Significance of the Data in the Final Report

In making the preliminary assessment, consider the data in the context of the final report: Will the engagement team depend on the data alone to answer a research question? Will the data be summarized or will detailed information be required? Is it important to have precise data, making magnitude of errors an issue?

Corroborating Evidence

You should consider the extent to which corroborating evidence is likely to exist and will independently support your findings, conclusions, or recommendations. Corroborating evidence is independent evidence that supports information in the database. Such evidence, if available, can be found in the form of alternative databases or expert views. It is unique to each engagement, and its strength (persuasiveness) varies.

For help in deciding the strength or weakness of corroborating evidence, consider the extent to which the corroborating evidence

• is consistent with the "Yellow Book" standards of evidence (sufficiency, competence, and relevance);
• provides crucial support;
• is drawn from different types of sources (testimonial, documentary, physical, or analytical); and
• is independent of other sources.

Level of Risk

Risk is the likelihood that using data of questionable reliability could have significant negative consequences on the decisions of policymakers and others. To do a risk assessment, consider the following risk conditions:

• The data could be used to influence legislation, policy, or a program that could have significant impact.
• The data could be used for significant decisions by individuals or organizations with an interest in the subject.
• The data will be the basis for numbers that are likely to be widely quoted, for example, "In 1999, the United States owed the United Nations about $1.3 billion for the regular and peacekeeping budgets."
• The engagement is concerned with a sensitive or controversial subject.
• The engagement has external stakeholders who have taken positions on the subject.
• The overall engagement risk is medium or high.
• The engagement has unique factors that strongly increase risk.

Bear in mind that any one of the conditions may have more importance than another, depending on the engagement.

Results of Initial Assessment Work

At this point, as shown in figure 5 (p. 19), the team will already have performed the initial stage of the data reliability assessment. They should have the results from the (1) review of all available existing information about the data and the system that produced them and (2) initial testing of the critical data elements. These results should be appropriately documented and reviewed before the team enters into the decision-making phase of the preliminary assessment. Because the results will, in whole or in part, provide the evidence that the data are sufficiently reliable (and therefore competent enough) or not sufficiently reliable for the purposes of the engagement, the workpapers should include documentation of the process and results.

Outcomes to Consider in the Assessment

The results of your combined judgments of the strength of corroborating evidence and degree of risk suggest different assessments. If the corroborating evidence is strong and the risk is low, the data are more likely to be considered sufficiently reliable for your purposes. If the corroborating evidence is weak and the risk is high, the data are more likely to be considered not sufficiently reliable for your purposes. The overall assessment is a judgment call, which should be made in the context of discussion with team management and technical specialists.

The preliminary assessment categorizes the data as sufficiently reliable, not sufficiently reliable, or of undetermined reliability. Each category has implications for the next steps of the data reliability assessment.

When to Assess Data as Sufficiently Reliable for Engagement Purposes

You can assess the data as sufficiently reliable for engagement purposes when you conclude the following: Both the review of related information and the initial testing provide assurance that (1) the likelihood of significant errors or incompleteness is minimal and (2) the use of the data would not lead to an incorrect or unintentional message. You could have some problems or uncertainties about the data, but they would be minor, given the research question and intended use of the data. When the preliminary assessment indicates that the data are sufficiently reliable, use the data.

When to Assess Data as Not Sufficiently Reliable for Engagement Purposes

You can assess the data as not sufficiently reliable for engagement purposes when you conclude the following: The review of related information or initial testing indicates that (1) significant errors or incompleteness exist in some or all of the key data elements and (2) using the data would probably lead to an incorrect or unintentional message.

When the preliminary assessment indicates that the data are not sufficiently reliable, you should seek evidence from other sources, including (1) alternative computerized data, the reliability of which you should also assess, or (2) original data in the form of surveys, case studies, or expert interviews.

You should coordinate with the requester if seeking evidence from other sources does not result in a source of sufficiently reliable data. Inform the requester that such data, needed to respond to the request, are unavailable. Reach an agreement with the requester to

• redefine the research questions to eliminate the need to use the data,
• end the engagement, or
• use the data with appropriate disclaimers.

Remember that you, not the requester, are responsible for deciding what data to use. If you decide you must use data that you have determined are not sufficiently reliable for the purposes of the engagement, make the limitations of the data clear, so that incorrect or unintentional conclusions will not be drawn. Finally, given that the data you assessed have serious reliability weaknesses, you should include this finding in the report and recommend that the agency take corrective action.

When to Assess Data as of Undetermined Reliability and Consider Additional Work

You can assess the data as of undetermined reliability when you conclude one of the following:

• The review of some of the related information or initial testing raises questions about the data's reliability.
• The related information or initial testing provides too little information to judge reliability.
• Time or resource constraints limit the extent of the examination of related information or initial testing.

When the preliminary assessment indicates that the reliability of the data is undetermined, consider doing additional work to determine reliability. Section 8 provides guidance on the types of additional work to consider, as well as suggestions if no additional work is feasible.
Section 8: Conducting Additional Work

When you have determined (through the preliminary assessment) that the data are of undetermined reliability, consider conducting additional work (see figure 6). A range of additional steps to further determine data reliability includes tracing to and from source documents, using advanced electronic testing, and reviewing selected system controls. The mix depends on what weaknesses you identified in the preliminary assessment and the circumstances specific to your engagement, such as risk level and corroborating evidence, as well as other factors. Focus particularly on those aspects of the data that pose the greatest potential risk for your engagement. You should get help from appropriate technical specialists to discuss whether additional work is required and to carry out any part of the additional reliability assessment.

Figure 6: Choosing and Conducting Additional Work
Source: GAO.

Tracing to and from Source Documents

Tracing a sample of data records to source documents helps you to determine whether the computer data accurately and completely reflect these documents. In deciding what and how to trace, consider the relative risks to the engagement of overstating or understating the conclusions drawn from the data. On the one hand, if you are particularly concerned that questionable cases might not have been entered into the computer system and that, as a result, the degree of compliance may be overstated, you should consider tracing from source documents to the database. On the other hand, if you are more concerned that ineligible cases have been included in the database and that, as a result, the potential problems may be understated, you should consider tracing from the database back to source documents.

The reason to trace only a sample is that sampling saves time and cost. To be useful, however, the sample should be random and large enough to estimate the error rate within reasonable levels of precision. Tracing a random sample will provide the error rate and the magnitude of errors for the entire data file. It is this error rate that helps you to determine the data reliability. Generally, every data file will have some degree of error (see example 1 for error rate and example 2 for magnitude of errors). Consult statisticians to assist you in selecting the sampling method most suited to the engagement.
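To give a feel for the arithmetic involved, the sketch below sizes a simple random tracing sample with the usual normal-approximation formula and then turns the traced results into an estimated error rate with a 95 percent confidence interval. The inputs (an assumed 10 percent error rate and a desired precision of plus or minus 5 percentage points) are illustrative, and the formula omits refinements such as the finite population correction, so treat it as a rough planning aid, not a substitute for consulting a statistician.

```python
# Sizing a random tracing sample with the normal approximation:
# n = z^2 * p * (1 - p) / d^2. All inputs are illustrative assumptions.
import math

z = 1.96   # z-score for 95 percent confidence
p = 0.10   # assumed error rate (a conservative planning guess)
d = 0.05   # desired precision: within +/- 5 percentage points

n = math.ceil(z**2 * p * (1 - p) / d**2)
print("records to trace:", n)  # 139

# After tracing, estimate the file's error rate and its margin of error.
errors, traced = 14, 139
p_hat = errors / traced
margin = z * math.sqrt(p_hat * (1 - p_hat) / traced)
print(f"estimated error rate: {p_hat:.1%} +/- {margin:.1%}")
```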
Example 1: According to a random sample, 10 percent of the data records have incorrect dates. However, the dates may be off by an average of only 3 days. Depending on what the data are used for, 3 days may not compromise reliability.

Example 2: The value of a data element was incorrectly entered as $100,000, rather than $1,000,000. The documentation of the database shows that the acceptable range for this data element is between $100 and $5,000,000. Therefore, the electronic testing done in the initial testing phase would have confirmed that the value of $100,000 fell within that range. In this case, the error could be caught, not by electronic testing, but only by tracing the data to source documents.
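The point of example 2 can be shown in a few lines: the mis-keyed value passes the range test because it falls inside the documented limits, so only a comparison with the source document reveals the error. The values below are taken directly from the example.

```python
# Example 2 in code: a range test cannot catch an error that falls
# inside the documented range; tracing to the source document can.
LOW, HIGH = 100, 5_000_000

entered_value = 100_000     # value keyed into the system
source_value = 1_000_000    # value shown on the source document

print("passes range test:", LOW <= entered_value <= HIGH)         # True
print("matches source document:", entered_value == source_value)  # False
```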
Tracing to Source Documents

Consider tracing to source documents when (1) the source documents are available relatively easily or (2) the possible magnitude of errors is especially critical.

To trace a sample to source documents, match the entered data with the corresponding data in the source documents. But in attempting to trace entered data back to source documents, several problems can arise: Source documents may not be available because they were destroyed, were never created, or are not centrally located.

Several options exist if source documents are not available. For those documents never created, for example, when data may be based on electronic submissions, use interviews to obtain related information, any corroborating evidence obtained earlier, or a review of the adequacy of system controls.

Tracing from Source Documents

Consider tracing from source documents, instead of or in addition to tracing a sample to source documents, when you have concerns that the data are not complete. To trace a sample from source documents, match the source documents with the entered data. Such tracing may be appropriate to determine whether all data are completely entered. However, if source documents were never created or are now missing, you cannot identify the missing data.
Using Advanced Electronic Testing

Advanced electronic testing goes beyond the basic electronic testing that you did in initial testing (see section 6). It generally requires specialized computer programs to test for specific conditions in the data. Such testing can be particularly helpful in determining the accuracy and completeness of processing by the application system that produced the data. Consider using advanced electronic testing for

• following up on troubling aspects of the data, such as extremely high values associated with a certain geographic location, found in initial testing or while analyzing the data;
• testing relationships (cross-tabulation) between data elements, such as whether data elements follow a skip pattern from a questionnaire; and
• verifying that computer processing is accurate and complete, such as testing a formula used in generating specific data elements.

Depending on what will be tested, this testing can require a range of programming skills, from creating cross-tabulations on related data elements to duplicating an intricate automated process with more advanced programming techniques. Consult appropriate technical specialists, as needed.
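As an illustration, the sketch below implements two of the tests named above (a skip-pattern cross-tabulation and a recomputed derived element) against a hypothetical survey extract. The column names, the skip rule, and the formula are assumptions made for the example, not part of this guidance.

```python
# Two advanced electronic tests on a hypothetical survey extract:
# a skip-pattern cross-tabulation and a formula (derived-element) check.
import pandas as pd

df = pd.read_csv("survey_extract.csv")

# Skip pattern: respondents answering "no" to q1 should have skipped q2,
# so q2 should be blank for them (rule assumed for illustration).
print(pd.crosstab(df["q1"], df["q2"].notna(), margins=True))
print("skip-pattern violations:",
      ((df["q1"] == "no") & df["q2"].notna()).sum())

# Formula check: recompute a system-generated element and compare it
# with the stored value (formula assumed for illustration).
recomputed = df["unit_price"] * df["quantity"]
print("records where total_cost does not match the formula:",
      (recomputed != df["total_cost"]).sum())
```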
Reviewing Selected System Controls

Your review of selected system controls, the underlying structures and processes of the computer system in which the data are maintained, can provide some assurance that the data are sufficiently reliable. Examples of system controls are limits on access to the system and edit checks on data entered into the system. Controls can reduce, to an acceptable level, the risk that a significant mistake could occur and remain undetected and uncorrected. Limit the review to evaluating the specific controls that can most directly affect the reliability of the data in question. Choose areas for review on the basis of what is known about the system. Sometimes, you identify potential system control problems in the initial steps of the assessment. Other times, you learn during the preliminary assessment that source documents are not readily available; in that case, a review of selected system controls may be the best method to determine whether data were entered reliably. If needed, consult information system auditors for help in evaluating general and application controls.
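For concreteness, the sketch below shows the kind of edit check an application control might apply at data entry, rejecting a record before it reaches the file. The field names and rules are invented for illustration; real edit checks would come from the system's own documentation.

```python
# A minimal edit check of the kind an application control might apply
# at data entry. Field names and rules are illustrative assumptions.
from datetime import date

def edit_check(record: dict) -> list:
    """Return the reasons a record fails the entry edits (empty if none)."""
    problems = []
    amount = record.get("amount")
    if amount is None or not (100 <= amount <= 5_000_000):
        problems.append("amount missing or outside documented range")
    entry_date = record.get("entry_date")
    if entry_date is None or entry_date > date.today():
        problems.append("entry date missing or in the future")
    if not record.get("case_id"):
        problems.append("missing case identifier")
    return problems

# A record passing all edits yields an empty list.
print(edit_check({"case_id": "A-17", "amount": 250,
                  "entry_date": date(2002, 10, 1)}))  # []
```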
Using what you know about the system, concentrate on evaluating the controls that most directly affect the data. These controls will usually include (1) certain general controls, such as logical access and control of changes to the data, and (2) the application controls that help to ensure that the data are accurate and complete, as well as authorized.

The steps for reviewing selected system controls are to

• gain a detailed understanding of the system as it relates to the data and
• identify and assess the application and general controls that are critical to ensuring the reliability of the data required for the engagement.

Using Data of Undetermined Reliability

In some situations, it may not be feasible to perform any additional work, for example, when (1) the time frame is too short for a complete assessment, (2) original computer files have been deleted, or (3) access to needed documents is unavailable. See section 9 for how to proceed.
Section 9: Making the Final Assessment

During the final assessment, you should consider the results of all your previous work to determine whether, for your intended use, the data are sufficiently reliable, not sufficiently reliable, or still undetermined. Again, remember that you are not attesting to the reliability of the data or database. You are only determining the sufficiency of the reliability of the data for your intended use. The final assessment will help you decide what actions to take (see figure 7).

Figure 7: Making the Final Assessment
Source: GAO.

The following are some considerations to help you decide whether you can use the data:

• The corroborating evidence is strong.
• The degree of risk is low.
• The results of additional assessment (1) answered issues raised in the preliminary assessment and (2) did not raise any new questions.
• The error rate, in tracing to or from source documents, did not compromise reliability.

In making this assessment, you should consult with appropriate technical specialists.

Sufficiently Reliable Data

You can consider the data sufficiently reliable when you conclude the following: On the basis of the additional work, as well as the initial assessment work, using the data would not weaken the analysis or lead to an incorrect or unintentional message. You could have some problems or uncertainties about the data, but they would be minor, given the research question and intended use of the data. When your final assessment indicates that the data are sufficiently reliable, use the data.

Not Sufficiently Reliable Data

You can consider the data to be not sufficiently reliable when you conclude the following: On the basis of information drawn from the additional assessment, as well as the preliminary assessment, (1) using the data would most likely lead to an incorrect or unintentional message and (2) the data have significant or potentially significant limitations, given the research question and intended use of the data.

When you determine that the data are not sufficiently reliable, you should inform the requester that sufficiently reliable data, needed to respond to the request, are unavailable. Remember that you, not the requester, are responsible for deciding what data to use. Although the requester may want information based on insufficiently reliable data, you are responsible for ensuring that data are used appropriately to respond to the requester. If you decide to use the data for the report, make the limitations of the data clear, so that incorrect or unintentional conclusions will not be drawn. Appropriate team management should be consulted before you agree to use data that are not sufficiently reliable.

Finally, given that the data you assessed have serious reliability weaknesses, you should include this finding in the report and recommend that the agency take corrective action.

Data of Undetermined Reliability

You can consider the data to be of undetermined reliability when you conclude the following: On the basis of the information drawn from any additional work, as well as the preliminary assessment, (1) use of the data could lead to an incorrect or unintentional message and (2) the data have significant or potentially significant limitations, given the research question and the intended use. You can also consider the data to be of undetermined reliability if specific factors, such as short time frames, the deletion of original computer files, or the lack of access to needed documents, are present.

As in the case of not sufficiently reliable data, when you determine that the data are of undetermined reliability, you should inform the requester, if appropriate, that sufficiently reliable data, needed to respond to the request, are unavailable. Remember that you, not the requester, are responsible for deciding what data to use. Although the requester may want information based on data of undetermined reliability, you are responsible for ensuring that appropriate data are used to respond to the requester. If you decide to use the data in your report, make the limitations clear, so that incorrect or unintentional conclusions will not be drawn. Appropriate team management should be consulted before you agree to use data of undetermined reliability.
Section 10: Including Appropriate Language in the Report

In the report, you should include a statement in the methodology section about conformance to generally accepted government auditing standards (GAGAS). These standards refer to how you did your work, not how reliable the data are. Therefore, you are conforming to GAGAS as long as, in reporting, you discuss what you did to assess the data, disclose any data concerns, and reach a judgment about the reliability of the data for use in the report.

Furthermore, in the methodology section, include a discussion of your assessment of data reliability and the basis for this assessment. The language in this discussion will vary, depending on whether the data are sufficiently reliable, not sufficiently reliable, or of undetermined reliability. In addition, you may need to discuss the reliability of the data in other sections of the report. Whether you do so depends on the importance of the data to the message.

Sufficiently Reliable Data

Present your basis for assessing the data as sufficiently reliable, given the research questions and intended use of the data. This presentation includes (1) noting what kind of assessment you relied on, (2) explaining the steps in the assessment, and (3) disclosing any data limitations. Such disclosure includes

• telling why using the data would not lead to an incorrect or unintentional message,
• explaining how limitations could affect any expansion of the message, and
• pointing out that any data limitations are minor in the context of the engagement.

Not Sufficiently Reliable Data

Present your basis for assessing the data as not sufficiently reliable, given the research questions and intended use of the data. This presentation should include what kind of assessment you relied on, with an explanation of the steps in the assessment. In this explanation, (1) describe the problems with the data, as well as why using the data would probably lead to an incorrect or unintentional message, and (2) state that the data problems are significant or potentially significant. In addition, if the report contains a conclusion or recommendation supported by evidence other than these data, state that fact. Finally, if the data you assessed are not sufficiently reliable, you should include this finding in the report and recommend that the audited entity take corrective action.

Data of Undetermined Reliability

Present your basis for assessing the reliability of the data as undetermined. Include such factors as short time frames, the deletion of original computer files, and the lack of access to needed documents. Explain the reasonableness of using the data, for example: These are the only available data on the subject; the data are widely used by outside experts or policymakers; or the data are supported by credible corroborating evidence. In addition, make the limitations of the data clear, so that incorrect or unintentional conclusions will not be drawn from the data. For example, indicate how the use of these data could lead to an incorrect or unintentional message. Finally, if the report contains a conclusion or recommendation supported by evidence other than these data, state that fact.
Glossary of Technical Terms

accuracy. Freedom from error in the data.

completeness. The inclusion of all necessary parts or elements.

database. A collection of related data files (for example, questionnaire responses from several different groups of people, with each group's identity maintained).

data element. An individual piece of information that has definable parameters, sometimes referred to as a variable or field (for example, the response to any question in a questionnaire).

data file. A collection of related data records, also referred to as a data set (for example, the collected questionnaire responses from a group of people).

data record. A collection of related data elements that relate to a specific event, transaction, or occurrence (for example, questionnaire responses about one individual, such as age, sex, and marital status).

source document. Information that is the basis for entry of data into a computer.
GAO's Mission

The General Accounting Office, the investigative arm of Congress, exists to support Congress in meeting its constitutional responsibilities and to help improve the performance and accountability of the federal government for the American people. GAO examines the use of public funds; evaluates federal programs and policies; and provides analyses, recommendations, and other assistance to help Congress make informed oversight, policy, and funding decisions. GAO's commitment to good government is reflected in its core values of accountability, integrity, and reliability.

Obtaining Copies of GAO Reports and Testimony

The fastest and easiest way to obtain copies of GAO documents at no cost is through the Internet. GAO's Web site (www.gao.gov) contains abstracts and full-text files of current reports and testimony and an expanding archive of older products. The Web site features a search engine to help you locate documents using key words and phrases. You can print these documents in their entirety, including charts and other graphics.

Each day, GAO issues a list of newly released reports, testimony, and correspondence. GAO posts this list, known as "Today's Reports," on its Web site daily. The list contains links to the full-text document files. To have GAO e-mail this list to you every afternoon, go to www.gao.gov and select "Subscribe to daily E-mail alert for newly released products" under the GAO Reports heading.

Order by Mail or Phone

The first copy of each printed report is free. Additional copies are $2 each. A check or money order should be made out to the Superintendent of Documents. GAO also accepts VISA and Mastercard. Orders for 100 or more copies mailed to a single address are discounted 25 percent. Orders should be sent to:

U.S. General Accounting Office
441 G Street NW, Room LM
Washington, D.C. 20548

To order by phone:
Voice: (202) 512-6000
TDD: (202) 512-2537
Fax: (202) 512-6061

To Report Fraud, Waste, and Abuse in Federal Programs

Contact:
Web site: www.gao.gov/fraudnet/fraudnet.htm
E-mail: [email protected]
Automated answering system: (800) 424-5454 or (202) 512-7470

Public Affairs

Jeff Nelligan, managing director, [email protected], (202) 512-4800
U.S. General Accounting Office, 441 G Street NW, Room 7149
Washington, D.C. 20548