Introduction
With the exception of counting deaths from all causes, a common problem in clinical trials is the missing data caused by patients who do not complete the full study schedule and drop out of the study without further measurements. Possible reasons for patients dropping out of the study (the so-called 'withdrawals') include death, adverse reactions, unpleasant study procedures, lack of improvement, early recovery, and other factors related or unrelated to trial procedures and treatments. Missing data caused by dropouts may make the usual statistical analysis of complete or available data subject to a potential bias. This review attempts to raise awareness of the problem and to provide some general guidance to clinical trial practitioners.
Examples
Example 1
A multicenter, randomized, double-blind, three-parallel-group trial to compare placebo, candesartan cilexetil, and enalapril in patients with mild to moderate essential hypertension [1]. The study randomized 205 patients to treatment; however, only 178 patients were evaluable by protocol at the end of an 8-week treatment period. 'The remaining patients were excluded from the analysis of blood pressure (BP) data because of major protocol violations, poor compliance with medical visits, or withdrawal because of adverse events.'
Example 2
A multicenter, randomized, open-label, parallel-design study to compare the treatment effect of niacin and atorvastatin (for 12 weeks) on lipoprotein subfractions in patients with atherogenic dyslipidemia [2]. 'Of the total 108 patients randomized to treatment, 12 withdrew from the study. Of those who withdrew, nine were due to adverse events, two were lost to follow-up, and one did not return for the final visit.'
Example 3
A multicenter, randomized, double-blind, placebo-controlled trial to assess the treatment effect of pimobendan on exercise capacity in patients with chronic heart failure [3]. 'The primary pre-specified analysis of exercise time was limited to those patients who had at least the first follow-up (four-week) exercise test carried out and had shown good compliance up to the day of the test. If subsequent tests were not performed, whatever the reason, or were performed although compliance between tests had been poor, the last exercise time value obtained while compliance was good was carried forward.' Two hundred and forty of the 317 randomized patients had the exercise test done with good compliance at four, 12, and 24 weeks. Listed reasons (and numbers of patients) for missing exercise time data at 24 weeks were: 'exercise test not done due to death' (n = 30), 'exercise testing contraindicated' (n = 9), and 'exercise test not done for other reasons' (n = 10).
Example 4
A randomized, double-blind study to compare nifedipine-GITS and verapamil-SR on hemodynamics, left ventricular mass, and coronary vasodilatory response in patients with advanced hypertension [4]. Fifty-four patients were randomized after the placebo run-in phase. 'Twenty-four failed to complete the (six-month) trial, and thus were not included for analysis because of 1) withdrawal for symptomatic adverse effects, 2) lack of response, and 3) poor compliance.' 'Consequently, there were 30 subjects with sufficient data sets for inclusion in analyses.'
Example 5
A randomized, double-blind, titration study of omapatrilat with hydrochlorothiazide in comparison with hydrochlorothiazide (HCTZ) plus placebo for the treatment of hypertension [5]. After two weeks of placebo lead-in and four weeks of the HCTZ period, 274 subjects were randomized into three treatment groups. 'A total of 235 subjects completed the (eight-week double-blind period) study.'
Effect of withdrawals on the data analysis
To demonstrate with simple algebra the effects and key statistical concepts surrounding missing data, I use the data from Example 4 above. In that study, 54 patients were randomized. However, only the 30 patients who completed the trial were included in the paper's analysis. The authors excluded from the analysis the other 24 patients who withdrew early because of adverse effects, lack of response, or poor compliance. Defining effective control of BP by the criteria of either maintaining diastolic blood pressure (DBP) ≤ 95 mmHg or achieving a decrease in DBP of ≥ 15 mmHg, the authors summarized the following results: 'Eighty per cent of randomized patients completed the protocol with effective control of BP and no side effects.' Evidently, the authors counted only 24 responders out of the 30 completers, obtained 80%, and ignored the 24 patients who dropped out prior to the scheduled end of the study at six months. To distinguish the two different groups of 24 patients in this example, we denote the former 24(cr) and the latter, the dropouts, 24(d). It is easily seen that the correct summary should be 24(cr)/54 = 44.4% completed the protocol with effective control of BP and no side effects, rather than the reported 24(cr)/30 = 80%. (See Table 1.)
If the authors really intended to estimate the chance for patients to have effective control of BP with no side effects on the study therapies at Month 6 ('responders' in brief), then we need to do more work. First, the calculation should always use 54 as the denominator, because that was the number of patients randomized to the study; however, only 30 patients had a BP measurement at Month 6, and of them, 24(cr) were responders. This means that the true answer should be (24(cr) + ?)/(30 + 24(d)) = (24 + ?)/54, where the question mark represents the unknown number of responders among the 24(d) withdrawals counted in the denominator. Next, we calculate the extreme possibilities as (a) (24(cr) + 0)/54 = 44.4% and (b) (24(cr) + 24)/54 = 48/54 = 88.9%.
In (a) we assumed that none of the 24(d) withdrawals (0%) responded, while in (b) we assumed that all 24(d) withdrawals (100%) responded. Of course, we know that (b) is unrealistic, since some people withdrew because of lack of response and some because of side effects, but the paper did not provide the exact numbers. In general, we usually do not feel comfortable with either extreme, but we understand that the two together provide an idea of the uncertainty in the data caused by withdrawals.
An estimate between the extremes is (c): substitute the unknown number by 24(d) × (24(cr)/30) = 24(d) × 0.80 = 19.2, where 24(cr)/30 = 80% is the proportion of responders among those who completed the trial. That is, when no particular information is available, we may assume that the same proportion of patients (80%) among the 24(d) dropouts would also have responded, had they completed six months. Unsurprisingly, when we do the calculation, the estimate becomes (24(cr) + 19.2)/54 = 43.2/54 = 80%, the same answer as that using only the completers. In fact, simple algebra can show that this is always so. We can see that (c) is in between (a) and (b), and in this case leans toward (b). See Table 2.
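The bounding calculations above can be sketched in a few lines of Python. The helper name `dropout_bounds` is mine, not the paper's; the counts are those from Example 4:

```python
# Worst-case (a), best-case (b), and completer-rate (c) estimates of the
# responder proportion when dropouts' outcomes are unknown.

def dropout_bounds(n_randomized, n_completers, n_responders):
    n_dropouts = n_randomized - n_completers
    low = n_responders / n_randomized                     # (a) no dropout responded
    high = (n_responders + n_dropouts) / n_randomized     # (b) every dropout responded
    completer_rate = n_responders / n_completers
    # (c) dropouts respond at the completers' rate (an MCAR-style assumption)
    mcar = (n_responders + n_dropouts * completer_rate) / n_randomized
    return low, mcar, high

low, mcar, high = dropout_bounds(54, 30, 24)
print(f"(a) {low:.1%}  (c) {mcar:.1%}  (b) {high:.1%}")
# (a) 44.4%  (c) 80.0%  (b) 88.9%
```

Running the same function per treatment group, when group-level dropout counts are available, gives the group-wise sensitivity bounds discussed later.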
Notice that the paper reported that 80% of 'randomized patients completed the protocol with effective control of BP and no side effects' (as explained earlier, the figure should instead be 44.4%), while the 80% in (c) is an estimate of the chance of effective control of BP without side effects on the study therapies at Month 6, under an assumption of 'no information available for the missing data'. These two figures of '80%' should not be confused. The former 80% is a wrong summary number; the latter is an estimate of the quantity of interest under a particular assumption about the missing data. This assumption is not likely to be appropriate for all the dropouts, especially for those patients who dropped out because of ineffective therapy; more discussion is given later. We do not know whether the authors might have intended to make the latter estimate but gave a wrong summary instead.
Even more interesting and useful would be the same calculations within each treatment group, along with a comparison of the estimates. Unfortunately, the paper did not give the number of dropouts according to treatment group.
Using proportions simplifies the illustration, but the idea carries over easily to the estimation of continuous data as well, such as BP, exercise time, hemodynamic measures, and lipoprotein levels.
Lessons learned
Several points can be generalized from the simple illustration given above and closer examination of the other examples.
• It does not take very much missing data to mislead an investigator. A good principle for avoiding being misled is always to account in the analysis for every subject randomized to the study. Using the total number of randomized subjects in the denominator is a step towards accomplishing this principle, whether one is calculating an average or a proportion. This principle is known as intent to treat (ITT). However, the much harder job for ITT is to account for the dropouts in the numerator. This requires further consideration, which follows below.
• It is important to record and report the reasons for withdrawal and the number of subjects in each category of withdrawal according to treatment group. The reasons for patients dropping out can be used to help properly assess the nature of the missing data. For example, if all the dropouts were due to a lack of response or side effects, then the calculation in (a) would be appropriate. In statistical terms, such data would be called informative missing data, because useful information can be found in the reason for the dropout, and this can be used to estimate the true response. Outcome-related dropouts are informative and should not be disregarded in an analytical study without careful thought. In particular, when a patient dies, whatever the cause of death might be, as in Example 3, all of the subsequent physiological and quality-of-life data should not even be regarded as missing, but as having values equal to zero or the worst category. When a patient's clinical status has reached a terminal disease progression stage (such as New York Heart Association class IV) and they are unable to perform exercise testing, as in Example 3, the exercise time should likewise be set equal to zero seconds, and not simply regarded as missing data. For the same reason, the remaining survival time after death (of any cause) would be zero days as well, not a censored observation, when doing, say, Kaplan-Meier survival analysis for an endpoint such as cardiovascular death. Treating non-cardiovascular death as equivalent to censoring because of loss to follow-up or end of observation for an endpoint of cardiovascular death has unfortunately become a popular practice in many medical journal articles. This needs to be corrected.
• The extreme calculations in (a) and (b) enable us to assess the uncertainty of data containing missing values, especially if we do the calculation for each treatment group separately. The bias seen in medical publishing in the decisions over which articles are chosen for publication is a mirror image of the dropout problem in patient studies. In the former, positive studies have a better chance of being published, while negative studies have a higher chance of being rejected. The same is true for the latter: patients responding to treatment tend to continue in the study, while patients failing to respond tend to drop out prematurely. Using only the available data, or only the subgroup of those who complete the study, leads to a biased result. The approaches in (a) and (b) take this consideration into account, although they may also be biased by over-correction.
• The assumption underlying the approach in (c) is interesting. When no particular information is known about the missing data, we are essentially assuming that the dropouts are not much different from the completers. This is generally described statistically as missing completely at random (MCAR), meaning that the process which caused the missing data is not informative about the parameter that we are trying to estimate. A good way to think of MCAR is that the dropouts are a simple random sample of the study sample. Examples of MCAR include patients who have moved away, or a study that has closed with late-entering patients being administratively 'censored'. We have seen the convenience of MCAR in the above illustration: simply use the completers and we get the same result. However, whether this assumption is valid should be examined carefully in each individual case. In many situations, the dropouts are not the same patient population as those who stayed in the trial. MCAR certainly is less restrictive than the assumptions in (a) or (b). Still, other assumptions less restrictive than MCAR exist, and these are discussed later.
• All three estimates given by (a), (b), and (c) are biased to a certain extent. Had the authors given details of the numbers of dropouts in the categories of 'lack of response' and 'side effects', a better estimate could have been derived.
• We would certainly feel more comfortable with a study conclusion when it is not altered by different approaches. Sensitivity analysis is actually the best way to analyze data in the presence of dropouts. Medical investigators should consult with statisticians when dealing with missing data, because many possible methods are available. Some popular approaches are reviewed below.
More about methods handling missing data
Objectives
282
As in any data analysis, the first consideration is
283
the objective of the analysis. In the presence of
284
dropouts, there can be two types of questions: (i) What
285
would be the treatment effect without dropouts? and (ii)
286
What would be the treatment effect in the presence of
287
dropouts? Question (i) is concerned with an ideal
288
situation. It is also known as a 'question for
289
explanatory trials' [ 6 ] . It is often concerned with
290
the human pharmacological properties of new drugs under
291
investigation rather than practical usage. Regarding
292
question (ii), we need to further differentiate two
293
situations: patients drop out either (a) totally from the
294
study and no data are collected after withdrawal, or (b)
295
merely from the study assigned treatment with data still
296
being collected. For (b) there will be no missing data.
297
If we can design trials that will allow patients to be
298
followed until the end of the study despite the patient's
299
lack of compliance, then (ii) is a very practical
300
question, also known as the 'question for pragmatic
301
trials' [ 7 ] . Prevention studies with all-cause
302
mortality as the primary endpoint usually follow this
303
design. However, other endpoints may also be followed-up
304
(until death) in such a design. A recent example is [ 8 ]
305
, in which all participants, even those who discontinued
306
treatment (lovastatin or placebo), were contacted
307
annually for vital status, cardiovascular events, and
308
cancer history. Since no missing data would occur, the
309
design of (b) is highly recommended for all trials if at
310
all possible. In fact, the ITT principle originally aims
311
to answer question (ii) with (b) type of dropouts, where
312
no missing data would occur. However, more often than not
313
we face studies in which patients have withdrawn from the
314
study entirely and caused the missing data problem, ie,
315
type (a), as the Examples 1-5 (with the exception of
316
Example 3) above have demonstrated. Unless the patient's
317
clinical status does not permit further testing after
318
discontinuing the study treatment, type (a) dropout
319
problem is a common design flaw and should be corrected.
320
Nevertheless, the problem of no follow-up data prevails
321
in clinical trials. For clinical trials conducted for
322
drug registrations it is possible that, in light of the
323
International Conference on Harmonization (ICH)-E9
324
guideline [ 9 ] , the data analyses have to address both
325
questions (i) and (ii).
326
327
328
Imputation methods
The analyses illustrated in Table 2 were methods in the general category of imputation. In general, the basic idea of imputation is to fill in the missing data with values based on a certain model with assumptions. There are methods based on a single imputation and methods based on multiple imputation, which, instead of filling in a single value for each missing value, replace each missing value with a set of plausible values that represent the uncertainty about the right value to impute. The attraction of imputation is that once the missing data are filled in (imputed), all the statistical tools available for complete data may be applied. Each of the methods (a), (b), and (c) in Table 2 is a single simple imputation method, but together they may be viewed as a 'multiple simple imputation' method (as opposed to the 'proper multiple imputation' method discussed below). The data in Table 2 had only one time-point (Month 6) for analysis.
For longitudinal data with multiple time-points, the conventional last-observation-carried-forward (LOCF) approach is a common practice of another simple imputation. This approach was used by the authors in Examples 3 and 5. Attempting to follow the ITT principle of accounting for all randomized subjects, the LOCF method includes every randomized subject who has at least one post-therapy observation. LOCF is popular among practitioners because it is simple to put into effect and because of a misconception that it is conservative (meaning that it works against an effective treatment group). However, every imputation method implicitly or explicitly assumes a model for the missing data. LOCF assumes (unrealistically) that the missing data after a patient's withdrawal are the same as the last value observed for that patient. The consequences of this assumption are that it imputes data without giving them within-subject variability and that it alters the sample size.
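A minimal sketch of the LOCF rule just described, on hypothetical data (the helper `locf` and the visit values are illustrative, not from the article):

```python
# Carry each subject's last observed value forward over later missing
# visits. None marks a missing observation.

def locf(values):
    """Return a copy of `values` with each None replaced by the most
    recent observed value; leading missing values stay missing."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled

# One subject's DBP (mmHg) at visits 1-5; they dropped out after visit 3.
print(locf([102, 96, 94, None, None]))  # [102, 96, 94, 94, 94]
```

Note how the imputed visits 4 and 5 are exact copies of visit 3: the filled-in values carry no within-subject variability, which is precisely the criticism made above.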
Proper multiple imputation (PMI) methods are described in [10] and [11]; these use regression models to create more than one imputed data set and thus provide variability within and between imputations. The PMI method has long been a preferred approach in survey research. Its popularity in clinical trials has grown recently, since the method became automated in commercial computer software [12,13]. However, the complexity of the regression models used in PMI should be carefully thought through by clinical trial practitioners, because the method assumes that the missing data process can be fully captured by the regression model employed on the observed values. This assumption is called missing at random (MAR). MAR essentially says that the cause of the missing data may depend on observed data (such as data from previous visits) but must be independent of the missing value that would have been observed. It is a less restrictive model than MCAR, which says that the missing data cannot depend on either the observed or the missing data. The design suggested by Murray and Findlay [14], which forced dropouts upon observing uncontrolled BP, uses the MAR principle. When MAR or MCAR conditions are met, model-based analyses can be appropriately performed based on the observed data alone, without further modeling of the missing data process.
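As a rough illustration of the multiple-imputation idea (not the specific algorithms of [10-12]), the following sketch imputes hypothetical missing follow-up values from a baseline regression plus random residual noise, repeats this m times, and pools the per-dataset means; all data and the pooling shown are simplified and assumed for illustration:

```python
# Toy multiple imputation: regression prediction + random residual,
# repeated m times, then the per-dataset estimates are pooled.
import random
import statistics

random.seed(1)

# Hypothetical (baseline DBP, Month-6 DBP) pairs; None marks a dropout.
data = [(104, 92), (101, 90), (99, 88), (107, 95), (103, None), (106, None)]

observed = [(b, f) for b, f in data if f is not None]
xs = [b for b, _ in observed]
ys = [f for _, f in observed]
mx, my = statistics.mean(xs), statistics.mean(ys)

# Simple least-squares line of follow-up on baseline.
slope = sum((x - mx) * (y - my) for x, y in observed) / sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx
resid_sd = statistics.pstdev([y - (intercept + slope * x) for x, y in observed])

# m imputed data sets: prediction plus random residual noise gives
# between-imputation variability; pool the per-dataset means.
m = 20
pooled = []
for _ in range(m):
    completed = [f if f is not None
                 else intercept + slope * b + random.gauss(0, resid_sd)
                 for b, f in data]
    pooled.append(statistics.mean(completed))

print(round(statistics.mean(pooled), 1))  # pooled estimate of mean Month-6 DBP
```

The spread of the `pooled` values is what a single imputation (such as LOCF) discards; proper MI methods also combine within- and between-imputation variances when forming standard errors.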
Another imputation method, in between LOCF and PMI, is partial imputation (PI), or the improved LOCF method [15]. The idea of this method is quite simple. In LOCF, one imputes every missing visit time-point by carrying the last observation forward until the end of the study. Since LOCF requires the strong assumption of stability, the more it imputes, the more bias it introduces if the assumption of stability does not hold. The PI method does not always carry the observations to the end time-point of the study, but just far enough to balance the dropout patterns between the treatment groups. The underlying principle is that when the dropout patterns are made almost identical between the treatment groups, the relative comparison of the treatment effects will be less biased. Since PI does less imputation, it is less biased than LOCF, because the assumption of stability usually does not hold. Some simulation results under various missing data processes demonstrated the potential usefulness of PI over both the use of all available data and LOCF [15]. However, more experience is still needed to test this new method in practice.
Methods based on special missing data models
Other, more sophisticated methods based on statistical models are available [16,17,18]; a technical review can be found, for example, in [19] and [20]. No general computer programs are available to put them into effect, though, because every so-called informative missing data set requires a unique model to describe it.
Methods based on ranking observations
A large class of non-parametric methods is based on the ranks or 'scores' of the observations instead of the actual values. Commonly used non-parametric methods in clinical trials include the Wilcoxon signed-rank test, the Mann-Whitney test, and so on. Example 3 also used a ranking method after LOCF for a secondary analysis. Missing data can easily be incorporated into these methods by ranking the missing data according to the reasons for withdrawal [21] and, in longitudinal studies, the time of withdrawal [22]. For example, death would be given the worst rank, followed by 'lack of efficacy', then 'adverse reaction', 'patient refusal', and so on. Within the same category of withdrawal, early dropouts would be given worse ranks than later dropouts. Ground rules for the ranking should be set prior to unmasking the treatment codes for data analysis, to avoid being post hoc. After missing data are replaced by their ranks, the usual testing procedure can be carried out. One major drawback of these methods is that they do not provide any estimate of the treatment effect in the original measurement unit, because the data are replaced by ranks.
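The ranking rule can be sketched as follows. The data, score scale, and helper names here are hypothetical, and for simplicity withdrawals within a category are not further ordered by time; a real analysis would apply a proper rank test (e.g. Mann-Whitney) to the combined scores:

```python
# Score withdrawals below every observed exercise time (seconds, always
# positive), with the pre-specified ordering: death < lack of efficacy
# < adverse reaction < any observed value; then rank as usual.
SCORE = {"death": -3, "lack of efficacy": -2, "adverse reaction": -1}

def to_score(outcome):
    # An outcome is a positive exercise time or a recorded withdrawal reason.
    return SCORE.get(outcome, outcome)

def midranks(values):
    # Rank 1 = worst value; ties share the average of their positions.
    order = sorted(values)
    return [sum(i + 1 for i, v in enumerate(order) if v == x) / order.count(x)
            for x in values]

treatment = [to_score(o) for o in [410, 385, 460, "adverse reaction", "death"]]
control = [to_score(o) for o in [300, 275, "lack of efficacy", "death", "death"]]

all_ranks = midranks(treatment + control)
rank_sum_treatment = sum(all_ranks[:len(treatment)])
print(rank_sum_treatment)  # 34.0
```

Because the analysis proceeds on ranks, the resulting comparison says which group did better, but not by how many seconds, which is the drawback noted above.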
All these methods, parametric or non-parametric, require much closer collaboration between medical investigators and statisticians. In the parametric case, the observed outcomes cannot provide statistical tests to select the missing data models. In both cases, the validity of the various models or ranking rules requires an examination of the missing data information and strong faith in the reasons given for the patients' withdrawal. Still, the main issue is the question that these methods are addressing. They attempt to follow the ITT principle (but with missing data) to answer question (i) above, hoping that the dropouts can hypothetically be removed by, say, a truly ITT design, or by successfully using concurrent treatments for intolerable side effects without affecting the efficacy of the study medication.
Composite comparisons
Many believe that removing the patient dropout process is not plausible in clinical practice. In this case, the dropout process itself may be an outcome of interest rather than a nuisance effect. For example, the US Food and Drug Administration's draft guidance on diabetes trials specifically requested the consideration of dropouts as an endpoint [23]. The problem therefore becomes a 'composite endpoints' issue. This is the approach taken in [24,25], and it has lately been extended to modeling the joint distribution of the longitudinal and time-to-event data (i.e., time to withdrawal) [26,27]. In this setting, we would compare the treatment groups in two aspects simultaneously: (a) the chance (or duration) of complying with the prescribed protocol and (b) the outcome measure (e.g., mean change in systolic blood pressure) given the pattern of compliance. Comparison (a) is straightforward by either standard binomial or survival techniques. Comparison (b) requires the same care as has been discussed here previously, because, given the pattern of compliance, the subgroup of patients has already been self-selected. The randomization mechanism used for achieving comparability between treatment groups is broken by the post-randomization stratification of compliance. It is then important to check the key outcome-correlated baseline characteristics between the treatment groups for any incomparability among these subgroup patients. This was done in Example 4 but not in the others. Recognizing that the subgroups are no longer randomized, we should treat this portion as a semi-observational study embedded in the randomized trial. Techniques used for analyzing observational studies should be applied to this part of the comparison [28]. Generally speaking, in an observational study, bias can only be reduced, not entirely eliminated, by methods of adjustment or matching. Sensitivity analysis in this approach is to consider different baseline covariates for matching or adjustment.
Conclusion
The issue of what to do about missing data caused by dropouts in clinical trials is a research topic still under development in the statistical literature. As noted in the ICH E9 guideline [9], 'no universally applicable methods of handling missing values can be recommended.' The issue of handling missing data is intrinsically difficult, because investigating a method requires a large proportion of missing data, while a large proportion of missing data makes a clinical study less credible. The best available advice is to minimize the chance of dropouts at the design stage and during trial monitoring. A truly ITT design is strongly encouraged. This requires follow-up data to be collected even after patients discontinue the treatment, whenever the clinical status of the patient permits. If many dropouts are anticipated, then perhaps the study's duration should be shortened, or, alternatively, the medical procedure deemed most likely to cause patients' withdrawal should be altered. All data after death of any cause should be given a value of zero instead of a blank. Consideration may also be given to defining an endpoint (event), instead of a measurement value, as the primary response variable, since an event can be determined even if the patient withdraws from the study. In any analysis, one should be clear about the question or objective of the analysis with missing data, and conduct sensitivity analyses with a set of plausible, pre-specified models for the missing data.
Competing interests
None declared.
Abbreviations
BP = blood pressure; HCTZ = hydrochlorothiazide; DBP = diastolic blood pressure; ITT = intent to treat; MCAR = missing completely at random; ICH = International Conference on Harmonization; LOCF = last-observation-carried-forward; PMI = proper multiple imputation; MAR = missing at random; PI = partial imputation.