(Stroke. 2001;32:1370.)
© 2001 American Heart Association, Inc.
Original Contributions
From the Department of Neurology, Royal Hallamshire Hospital, Sheffield (N.U.W.), and Department of Clinical Neurosciences, Western General Hospital, Edinburgh (P.A.G.S., S.C.L., D.F.S., C.P.W.), UK.
Correspondence to Professor P.A.G. Sandercock, Department of Clinical Neurosciences, Western General Hospital, Crewe Rd, Edinburgh, EH4 2XU, Scotland. E-mail pags{at}skull.dcn.ed.ac.uk
| Abstract |
|---|
Methods: We analyzed data from the 15 116 patients recruited in Argentina, Australia, Italy, the Netherlands, Norway, Poland, Sweden, Switzerland, and the United Kingdom. We compared crude case fatality and the proportion of patients dead or dependent at 6 months; we used logistic regression to adjust for age, sex, atrial fibrillation, systolic blood pressure, level of consciousness, and number of neurological deficits. We used the frequency of prerandomization head CT scan and prescription of aspirin at discharge to indicate quality of care.
Results: The differences in outcome (all treatment groups combined) between the "best" and "worst" countries were very large for death (171 cases per 1000 patients) and for death or dependency (375 cases per 1000 patients). The differences were somewhat smaller after adjustment for case mix (160 and 311 cases per 1000 patients, respectively). Process of care may have accounted for some but not all of the residual variation in outcome.
Conclusions: Adjustment for case mix explained only some of the variation in outcome between countries. The residual differences in outcome were too large to be explained by variations in care and most likely reflect differences in unmeasured baseline factors. These findings demonstrate the need to achieve balance of treatment and control within each country in multinational randomized controlled stroke trials and the need for caution in the interpretation of nonrandomized comparisons of outcome after stroke between countries.
Key Words: case fatality rate; cerebrovascular disorders; disability evaluation; outcome; randomized controlled trials
| Introduction |
|---|
|
|
|---|
| Subjects and Methods |
|---|
Baseline Characteristics and Measures of the Process of Stroke Care
We extracted the following potential prognostic
variables from the baseline data set: age, sex, delay from symptom
onset to randomization, level of consciousness, presence of atrial
fibrillation, systolic blood pressure on admission, presence of
any infarction on CT scan, and presence or absence of 8 different
neurological deficits. We also extracted data on 2 potential markers of
the quality of the process of care: whether patients had had a CT scan
before randomization and whether patients who were discharged alive
from the participating hospital were reported to be prescribed
long-term aspirin.
Crude and Adjusted Outcome at 6 Months
We calculated the proportion who died from any cause
(case fatality) and the proportion who were dead or needed help from
another person for activities of daily living (death or dependency) at
6 months in each country (observed number). We used 2 previously
described logistic regression models to adjust these data for important
differences in case mix.13
The prognostic models were constructed and validated on 2 separate
subsets of the IST data set and take the following variables into
account: age, sex, systolic blood pressure, atrial
fibrillation, level of consciousness, and total number of neurological
deficits (see Appendix). We used the models to predict the probability
of an outcome event for each patient and then calculated the total
number of predicted events for each country as the sum of these
probabilities (predicted number).
We expressed the adjusted outcome as a w score, a method that measures the difference in the number of observed and predicted events per 1000 patients treated within each country.14 15 We calculated the w score using the formula $w = 1000(o - p)/t$, where $o$ is the observed number of events, $p$ the predicted number of events, and $t$ the total number of patients per country. For example, if 500 patients are treated and a total of 100 deaths are predicted but 150 deaths are observed, then the w score is $1000 \times (150 - 100)/500 = +100$, that is, 100 more deaths than predicted per 1000 patients treated. However, we were principally interested in the difference in adjusted outcome between countries and calculated these differences simply by subtracting w scores. For example, if the w score for case fatality in country A was +30 and that in country B was -60, we estimated the absolute difference in adjusted case fatality between countries to be 90 deaths per 1000 patients treated. To directly compare outcomes before and after adjustment for case mix, we also calculated crude w scores by defining $p$ in each country as $t \cdot p_o$, where $p_o$ is the proportion of patients who experienced the outcome in the entire study population (all countries combined).
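The w score arithmetic described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' SAS code; the function names are hypothetical, and the inputs reproduce the worked examples from the text.

```python
def predicted_events(probabilities):
    """Predicted number of events: the sum of per-patient predicted probabilities."""
    return sum(probabilities)


def w_score(observed, predicted, total):
    """Excess events per 1000 patients treated: w = 1000 * (o - p) / t."""
    return 1000.0 * (observed - predicted) / total


def crude_w_score(observed, total, overall_proportion):
    """Crude w score: p is taken as t * p_o, the all-countries event proportion."""
    return w_score(observed, total * overall_proportion, total)


# Worked example from the text: 500 patients, 100 predicted deaths, 150 observed.
print(w_score(150, 100, 500))  # 100.0

# Difference between two countries is a simple subtraction of w scores.
w_country_a, w_country_b = 30, -60
print(w_country_a - w_country_b)  # 90
```

A positive w score means more events than predicted; a negative score means fewer.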
Correction for Multiple Comparisons
Because the study necessarily involved multiple comparisons between countries, we calculated 99% CIs for the w scores (according to Parry et al14) to reduce the possibility that any observed differences might be due to chance.
Relationship Between Measures of the Process of Care and Adjusted Outcome After Stroke
We ranked the countries by the 2 adjusted outcomes and by the 2 measured items of the process of stroke care. Better outcome and better process of care are represented by lower numbers, and the converse is also true. We investigated the relationship between rankings with Spearman's rank correlation coefficient.
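As a minimal sketch of the rank comparison, the following Python function computes Spearman's coefficient from two sets of ranks, using the standard shortcut formula that applies when there are no tied ranks. The rankings shown are hypothetical placeholders, not the values in Table 3.

```python
def spearman_from_ranks(ranks_a, ranks_b):
    """Spearman's rho for two untied rankings: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    if len(ranks_a) != len(ranks_b):
        raise ValueError("rankings must have the same length")
    n = len(ranks_a)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1.0 - 6.0 * sum_d_squared / (n * (n ** 2 - 1))


# Hypothetical ranks for 9 countries (1 = best outcome / best process of care).
outcome_rank = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ct_scan_rank = [2, 1, 3, 5, 4, 6, 8, 7, 9]
print(spearman_from_ranks(outcome_rank, ct_scan_rank))
```

With tied ranks the shortcut formula no longer holds, and the coefficient should instead be computed as the Pearson correlation of the (averaged) ranks.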
Predictive Properties of the Prognostic Models
We determined the predictive properties of our
prognostic models by estimating their calibration and
discrimination.16
Calibration refers to the degree of bias of model predictions for
groups of patients and can be estimated by plotting a calibration
curve. To derive the calibration curves, we ordered the data set by
ascending predictions of risk and then divided it into 10 equal groups
(deciles). For each decile, we plotted the mean observed risk against
the mean predicted risk. A model is well calibrated if, within each
decile, the proportion of patients predicted to have an event and the
proportion observed to have done so is the same, ie, if the calibration
curve follows a 45° line. Discrimination refers to the ability of the
model to differentiate between individuals who do and do not experience
an event and may be estimated by calculating the area under the
receiver operating characteristic (ROC) curve [a plot of the
sensitivity against (1-specificity) of the model predictions]. Thus,
a model that predicts case fatality with an area under the ROC curve
of, for example, 80% will, in 80% of cases, correctly assign a higher
risk of death to a randomly selected patient with a fatal outcome than
to a randomly selected survivor.
Analyses were performed with the use of SAS (version 6.12).
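The two checks described above, calibration by decile and discrimination by the area under the ROC curve, can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than a reproduction of the SAS analysis: the function names are hypothetical, outcomes are coded 1 for an event and 0 otherwise, and tied predictions are counted as half a correct ordering.

```python
def calibration_points(predicted, observed, groups=10):
    """Order patients by predicted risk, split into equal groups, and return
    (mean predicted risk, mean observed risk) for each group."""
    pairs = sorted(zip(predicted, observed))
    n = len(pairs)
    points = []
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue  # fewer patients than groups
        mean_predicted = sum(p for p, _ in chunk) / len(chunk)
        mean_observed = sum(o for _, o in chunk) / len(chunk)
        points.append((mean_predicted, mean_observed))
    return points


def roc_auc(predicted, observed):
    """Probability that a randomly chosen patient with an event was assigned a
    higher predicted risk than a randomly chosen patient without one."""
    events = [p for p, o in zip(predicted, observed) if o == 1]
    non_events = [p for p, o in zip(predicted, observed) if o == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    correct = 0.0
    for e in events:
        for s in non_events:
            if e > s:
                correct += 1.0
            elif e == s:
                correct += 0.5  # assumption: ties count as half
    return correct / (len(events) * len(non_events))
```

A well-calibrated model yields points close to the 45° line (mean observed risk roughly equal to mean predicted risk in every group); the pairwise loop in `roc_auc` is quadratic in the number of patients and is meant for clarity, not for large data sets.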
| Results |
|---|
Baseline Characteristics and the Process of Care
The proportion of patients with each prognostic variable at baseline varied highly significantly between countries (all P<0.0001, χ² tests) (Table 1), confirming our expectation that there would be variation between countries in the types of stroke patients considered eligible for the trial.
Table 1 also shows substantial variation between countries in the process of stroke care, as judged by the proportion of patients who had a CT scan before randomization (range, 42% to 98%) and the proportion of patients prescribed aspirin for long-term secondary prevention at hospital discharge (range, 53% to 73%).
Variations in Outcome at 6 Months (Observed and Predicted)
Table 2 shows the variation between countries in the proportion of patients who had died by 6 months (range, 12% to 30%) and in the proportion who were dead or dependent at 6 months (range, 42% to 79%). The proportion of patients predicted dead and the proportion predicted dead or dependent can be taken to summarize the case mix of each cohort, and, not surprisingly, these also varied between countries (Table 2).
Crude and Adjusted Case Fatality at 6 Months
Figure 1 shows the w scores for case fatality for each country before and after adjustment for case mix. The figure clearly illustrates the substantial variation in crude case fatality between countries, the most extreme difference being between Sweden and Poland (171 more deaths per 1000 patients treated in Poland). After adjustment for case mix, some of the differences in case fatality between pairs of countries altered considerably. For example, adjustment for case mix reduced the differences in case fatality between Switzerland and Sweden and between the United Kingdom and the Netherlands by approximately 70%. However, the most striking finding shown in Figure 1 is that, on the whole, adjustment for important differences in case mix had little influence on the variation in case fatality between countries, which remained very substantial. For example, at 6 months, the difference between Sweden and Italy was 69 deaths per 1000 patients and between the United Kingdom and Poland was 57 deaths per 1000 patients; at its most extreme, 160 more patients were dead at 6 months per 1000 treated in Poland than in Sweden despite adjustment for case mix.
Crude and Adjusted Death or Dependency at 6 Months
Figure 2 shows the w scores for death or dependency at 6 months before and after adjustment for case mix. As with case fatality, the crude number of patients dead or dependent at 6 months varied considerably between countries, with the largest difference between Sweden and the United Kingdom (375 more patients were dead or dependent per 1000 treated in the United Kingdom). Adjustment for case mix markedly reduced some of the between-country differences; for example, the difference between Switzerland and Italy was reduced by 86% and that between Argentina and Australia by 60%. Again, however, despite adjustment for case mix, the overall variation in death or dependency between countries remained very substantial. For example, at 6 months, the difference between Sweden and Poland was 86 patients dead or dependent per 1000 treated and between Switzerland and the United Kingdom was 146 patients dead or dependent per 1000 treated; at its most extreme, 311 more patients were dead or dependent per 1000 treated in the United Kingdom than in Sweden even after adjustment for case mix.
Association of the Process of Care With Adjusted Outcome
The ranking of countries by adjusted outcome and by the proportion of patients who received each item of care is shown in Table 3. Higher adjusted case fatality was strongly correlated with a lower rate of CT scanning before randomization (r=0.78) and also with a lower rate of prescription of long-term aspirin (r=0.53). The correlation between adjusted death or dependency and the 2 process-of-care variables was weaker (r=0.40 in both cases).
Performance of the Predictive Models
Figure 3 shows that both prognostic models were well calibrated (both calibration plots follow a 45° line). Both models also showed moderately good discrimination with an area under the ROC curve of 0.79 in each case. For both outcome states, however, a substantial proportion of the predicted probabilities (33% for death, 57% for death or dependency) lay between 25% and 75%, ie, the models failed to place substantial numbers of individuals into groups either very likely or very unlikely to have an outcome event.
| Discussion |
|---|
Given the well-described international differences in the treatment of acute stroke and speculation that these differences may contribute to differences in stroke outcome between countries,3 4 6 7 8 9 10 we considered whether differences in care between countries in the IST might explain their residual differences in outcome. Our observation that lower case fatality was strongly correlated with a higher proportion of patients with a head CT scan and with a higher proportion prescribed aspirin on discharge lends some support to this possibility. Similarly, it is notable that the Swedish and Norwegian cohorts had the lowest case fatality of all. The Scandinavian countries were early advocates of the organization and specialization of stroke care and, at the time of the IST, the majority of Swedish and Norwegian patients were likely to have been treated in a stroke unit.10 This is in sharp contrast to the United Kingdom, Argentina, and Poland, the countries with the highest case fatality, where the majority of patients would have received conventional ward care.17 Such differences in care may well underlie at least some of the difference in outcome between the 2 groups of countries.18
On the whole, however, the evidence from these data that differences in care might explain some of the residual differences in outcome is weak. Stroke interventions, especially those relating to secondary prevention, may quite properly be withheld from patients who survive in a very poor functional state. Our simplistic measurements of the quality of care do not take this into account and portray all cases in which the intervention is withheld as an error. Our finding that higher case fatality is strongly correlated with apparently worse provision of care was therefore perhaps inevitable. The weaker correlation between death or dependency and our measures of the process of care also argues against a significant impact of quality of care on the differences in outcome between countries. In particular, if a difference in the proportion of patients treated on a stroke unit was an important reason for the difference in outcome, how is one to reconcile Norway's apparently excellent "performance" as measured by case fatality with its comparatively poor performance when measured by the combined outcome of death or dependency? Similarly, how does one explain the opposite findings for Argentina and Poland? Perhaps, one might argue, where there is better care, patients otherwise destined to die are more likely to survive but do so in a dependent state (eg, Norway), and where there is inferior care, patients with a poor prognosis die and therefore cannot be counted as dependent (eg, Argentina and Poland). However, if this were so, how would one explain the ranking of Sweden as the country with the best performance on both measures of outcome or the considerably worse performance of the United Kingdom when measured by death or dependency than when measured by case fatality alone?
A stronger argument that differences in stroke care are not the major cause of the residual differences in outcome is the sheer size of the absolute differences between countries. The differences in the proportion of patients dead or dependent between the United Kingdom and the other 8 countries were between 150 and 300 events per 1000 patients treated. These absolute differences in outcome are 2 to 4 times larger than the treatment effect of stroke unit care18 and twice as large as the benefit of giving thrombolysis within 3 hours of the onset of stroke.19 The differences in outcome are therefore much larger than might plausibly be explained by the differential use of even the most efficacious of known interventions. Indeed, at the time of the IST, thrombolysis was not in routine use. Thus, while it remains plausible that differences in medical treatment may account for some of the residual differences in outcome, other factors are likely to account for much more.
As with all observational studies, these other factors are chance, bias, and residual variation in case mix (confounding). Given our use of large samples and the narrow 99% CIs, chance is highly unlikely to explain much of the residual variation in outcome between countries. Similarly, given the nearly complete follow-up and the unambiguous nature of death, biased measurement cannot explain the residual variation in case fatality. However, measurement error may underlie some of the residual variation in death or dependency. Functional status is difficult to define and measure, and it is known that international comparisons are prone to bias.20 21 The fact that the range of differences between countries was greater for death or dependency than for death alone suggests that such bias might have operated here. In non-English-speaking countries, the follow-up questions were translated into the local language without back-translation to check for alterations, and subtle but important differences in interpretation may have been introduced.22 23 The method used to collect outcome data varied between countries. In some it was by postal questionnaire, in others by telephone, and in others still by a combination of the 2 methods. These differences might also have influenced response. Cultural differences in the perception of disability and dependence may also have led patients in different countries to report dependency differently despite similar degrees of impairment in function.21 22 24 25 In general, however, the impact of these second-order biases is modest. They would also be unlikely to explain the marked difference in death or dependency between patients in the United Kingdom and Australia, countries that used exactly the same outcome questionnaire and method of follow-up and that have reasonably similar cultures.
Furthermore, empirical research suggests that people in Sweden and the United Kingdom, the countries with the greatest difference in the proportions dead or dependent, value health states very similarly.26 It seems likely, therefore, that most of the residual variation in outcome between countries in the IST must be due to unmeasured variation in case mix.
The reason for the marked variation in case mix between countries in the IST is that the fundamental entry criterion for the trial was simply that the clinician had to be substantially uncertain whether or not to treat a given patient with aspirin, heparin, both, or neither. Variation between countries in "uncertainty," in the types of physicians participating in the trial, and in the types of stroke patients routinely admitted to hospital9 is therefore likely to have played a part. Although detailed, our prognostic models might have better accounted for the marked differences in case mix if they had adjusted for other recognized prognostic variables, such as prestroke functional status and living arrangements, comorbid conditions such as heart failure and diabetes mellitus, poststroke urinary incontinence, hyperglycemia, and the size of the stroke lesion on brain imaging.27 However, they certainly could not have accounted for the (probably many) important differences in case mix that are not currently understood. The limitations of our models are illustrated by the fact that for large numbers of patients the predictions of risk are in the middle, nonconfident range, ie, the models are not particularly good at separating patients into high- and low-risk groups. This in turn implies that they explain only a small part of the total variability in outcome (analogous to having a low $r^2$ statistic in linear regression). This is the case despite the fact that both models show excellent calibration and moderately good discrimination, 2 widely quoted measures of model performance. This observation highlights the important difference between a prognostic model having a good fit and providing clinically useful predictions.28
Regardless of their explanation, our findings have a number of implications. First, they emphasize the potential limitations of adjusting nonrandomized comparisons of stroke outcomes for differences in case mix, even with high-quality and complete data from large numbers of patients and especially when the differences in unadjusted outcome are very large. These observations may be of particular relevance to those attempting to draw inferences about the quality of care from nonrandomized comparisons of stroke outcome, whether between hospitals, regions, or countries, and especially if their data are retrospective, incomplete, or less completely adjusted than our data. Second, investigators need to be aware that simply because case mix adjustment has been performed with the use of models that show excellent calibration and moderately good discrimination, a considerable amount of variation in outcome may remain to be explained. Third, therefore, before conclusions about the quality of stroke care are drawn, the plausibility of ascribing any residual differences in outcome to variation in the use of currently understood stroke interventions should be carefully considered. Fourth, those wishing to measure between-country differences in functional outcome after stroke should recognize that this task is prone to various subtle forms of bias.
Finally, our findings also have implications for the design of multinational randomized controlled trials. The variations in outcome between groups of patients in different countries in the IST do not affect the validity of the overall trial results because the trial used a method of allocation that ensured balance within countries (minimization). This design enabled the trial to detect the effect of a treatment (a reduction in death or dependency of 10 cases per 1000 treated) that was 30 times smaller than the largest difference in outcome between countries. If, however, a multinational study did not ensure balanced allocation within countries, then treatment effects might be spuriously generated or obscured. For instance, in an imaginary trial of a truly ineffective treatment using the same study population as the IST, if by chance the proportion of patients randomized to drug or to placebo happened to be 2:1 in Sweden and, for the same number of patients, 1:2 in the United Kingdom, then an apparent benefit of the truly ineffective drug would be observed even though the trial would have allocated equal numbers to each intervention.
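The imaginary trial described above can be worked through numerically. The sketch below is purely illustrative: the sample sizes are invented, and the event rates (42% and 79%) are assumptions taken from the two ends of the range of death or dependency reported in the Results, not confirmed figures for Sweden and the United Kingdom. The treatment is given no effect at all, so any difference between arms arises only from unbalanced allocation.

```python
def pooled_event_rates(countries):
    """Return (drug-arm event rate, placebo-arm event rate) pooled over countries,
    assuming the treatment has no effect (same event rate in both arms)."""
    drug_events = sum(c["n_drug"] * c["event_rate"] for c in countries)
    placebo_events = sum(c["n_placebo"] * c["event_rate"] for c in countries)
    drug_total = sum(c["n_drug"] for c in countries)
    placebo_total = sum(c["n_placebo"] for c in countries)
    return drug_events / drug_total, placebo_events / placebo_total


# Hypothetical unbalanced allocation: 2:1 in one country, 1:2 in the other.
unbalanced = [
    {"name": "Sweden", "n_drug": 200, "n_placebo": 100, "event_rate": 0.42},
    {"name": "United Kingdom", "n_drug": 100, "n_placebo": 200, "event_rate": 0.79},
]
# Same patients, but allocation balanced within each country.
balanced = [
    {"name": "Sweden", "n_drug": 150, "n_placebo": 150, "event_rate": 0.42},
    {"name": "United Kingdom", "n_drug": 150, "n_placebo": 150, "event_rate": 0.79},
]

for label, design in (("unbalanced", unbalanced), ("balanced", balanced)):
    drug_rate, placebo_rate = pooled_event_rates(design)
    print(label, round(1000 * (placebo_rate - drug_rate)), "fewer events per 1000 on drug")
```

Both designs allocate 300 patients to each arm overall, yet only the design balanced within countries returns the true null result.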
In summary, this study demonstrates the potential limitations of using analyses of observational data to explain international differences in outcome after stroke. Differences in the quality of care, chance, measurement error, and cultural bias may account for some of the residual variation in outcome between countries in the IST, but most of the unexplained variation is likely to reflect the difficulty of achieving perfect case mix adjustment. To avoid these biases, multinational randomized controlled trials in stroke must ensure a balance of treatment and control within each country as well as in the overall trial. Furthermore, those wishing to draw inferences from nonrandomized comparisons of outcome after stroke should consider the issues raised in this report carefully.
| Appendix 1 |
|---|
Case Fatality at 6 Months
$$y = -7.3529 + (0.0603 \times \text{age}) - (0.1637 \times \text{sex}) + (0.5130 \times \text{AF}) + (0.9533 \times \text{level of consciousness}) + (0.3272 \times \text{number of neurological deficits}),$$
where AF is atrial fibrillation.
Dead or Dependent at 6 Months
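$$y = -1.5288 + (1.059 \times \text{age}) + (0.2988 \times \text{age}^2) + (0.3066 \times \text{sex}) + (0.1963 \times \text{AF}) + (1.0634 \times \text{level of consciousness}) - (0.1012 \times \text{SBP}) + (0.2291 \times \text{SBP}^2) + (0.4249 \times \text{number of neurological deficits}),$$
where SBP is systolic blood pressure.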
Coding of Variables
Age = (age − 70)/20; age² = (age variable)²
Sex: female = 1, male = 0
AF: present = 1, absent = 0
Level of consciousness: drowsy/unconscious = 1, fully conscious = 0
SBP = (SBP − 160)/60; SBP² = (SBP variable)²
Number of neurological deficits = observed number except 0 coded as 1 and 8 coded as 7 (covariate treated as a continuous variable because it has a linear relationship with each outcome)
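As a hedged illustration of how the dead-or-dependent model could be applied to one patient, the Python sketch below codes the variables as listed above and converts the linear predictor into a probability. It assumes that y is the log-odds of the outcome (the usual form of a logistic regression model), that age is in years, and that SBP is in mm Hg; the function and argument names are hypothetical.

```python
import math


def code_deficits(n_deficits):
    """Number of neurological deficits, with 0 coded as 1 and 8 coded as 7."""
    return min(max(n_deficits, 1), 7)


def linear_predictor_dead_or_dependent(age_years, female, af, drowsy_or_unconscious,
                                       sbp_mmhg, n_deficits):
    """Linear predictor y for death or dependency at 6 months (Appendix coefficients)."""
    age = (age_years - 70) / 20   # Age = (age - 70)/20
    sbp = (sbp_mmhg - 160) / 60   # SBP = (SBP - 160)/60
    return (-1.5288
            + 1.059 * age
            + 0.2988 * age ** 2
            + 0.3066 * int(female)
            + 0.1963 * int(af)
            + 1.0634 * int(drowsy_or_unconscious)
            - 0.1012 * sbp
            + 0.2291 * sbp ** 2
            + 0.4249 * code_deficits(n_deficits))


def probability(y):
    """Assumption: y is the log-odds, so the probability is the logistic transform."""
    return 1.0 / (1.0 + math.exp(-y))


# Hypothetical patient: 70-year-old man, no AF, fully conscious, SBP 160, 1 deficit.
y = linear_predictor_dead_or_dependent(70, False, False, False, 160, 1)
print(round(y, 4), round(probability(y), 3))
```

Summing `probability(y)` over all patients in a country gives that country's predicted number of events, as described in the Methods.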
| Acknowledgments |
|---|
Received January 15, 2001; revision received March 2, 2001; accepted March 5, 2001.
| References |
|---|
|
|
|---|