(Stroke. 1999;30:2146-2151.)
© 1999 American Heart Association, Inc.
Original Contributions
From the Department of Neurology, Regional Neurosciences Center, Newcastle General Hospital, Newcastle upon Tyne, England (P.D.), and Department of Clinical Neuroscience, University of Edinburgh (Scotland) (M.D., P.S.).
Correspondence to Dr Paul Dorman, Department of Neurology, Regional Neurosciences Center, Newcastle General Hospital, Westgate Rd, Newcastle upon Tyne, NE4 6BE, UK. E-mail P.J.Dorman{at}ncl.ac.uk
| Abstract |
|---|
Methods: A total of 2253 patients with stroke entered by United Kingdom hospitals in the International Stroke Trial were randomized to follow-up with either the EuroQol or SF-36 instruments. We randomly selected one third of patients who had responded to the EuroQol for follow-up, again using the SF-36, and two thirds of patients who had responded to the SF-36 for follow-up, again using the EuroQol. We assessed the patients' mean score for each domain of the SF-36 categorized by their response to the corresponding EuroQol domain and the correlation between the domains of the 2 instruments.
Results: The domains of the 2 instruments that assessed physical functioning, social functioning, bodily pain, and overall health-related quality of life correlated closely. The mental health domain of the SF-36 correlated only poorly with the psychological functioning domain of the EuroQol.
Conclusions: The EuroQol and SF-36 measure broadly similar domains of health. The weak relationship between the assessments of mental health may reflect a difference in content or more fundamental problems with the validity or reliability of the items in one of the instruments with respect to this domain. This study has provided the first empirical qualitative evidence by which the data on the SF-36 after stroke may be interpreted.
Key Words: health status; neuropsychological tests; quality of life; stroke outcome
| Introduction |
|---|
A clearer understanding of the relationship between these instruments might help to improve the interpretation of a change of score with either instrument. It should also be helpful in comparing studies that have used different measures and may even allow the translation of outcomes for meta-analysis. Furthermore, since both instruments aim to measure health-related quality of life, there should be a strong correlation between responses on the 2 instruments. A poor correlation might suggest poor validity of 1 or both of the measures. We therefore administered both the EuroQol and SF-36 to a group of patients after stroke to compare their responses to these instruments.
| Subjects and Methods |
|---|
For a study of reproducibility, we then randomly sampled one third of
the patients who had responded within approximately 3 weeks to the
first questionnaire for repeated testing with the same health-related
quality of life instrument (test-retest reliability).10
Concurrently, we also randomly selected one third of patients who had
responded within approximately 3 weeks to the first EuroQol (the
remaining one third of patients were involved in another study) for
follow-up again with the SF-36 and two thirds of the patients who had
responded within approximately 3 weeks to the first SF-36 for follow-up
again with the EuroQol (Figure 1). These
patients were completely separate from those included in the study of
reproducibility.10 The planned (and actual) flow of
patients is shown in Figure 1.
We mailed the second questionnaire booklet containing the appropriate instrument to all eligible patients with a personalized letter and a postage-paid reply envelope. The letter explained the purpose of the repeated questionnaire and asked the subjects to respond if possible without the help of another person and, if not, to give the questionnaire to a close relative or caregiver who was willing to respond on the patient's behalf. We sent a reminder letter and an additional identical questionnaire to any patient who had not responded within 14 days. We made no further attempts to contact nonrespondents thereafter. We marked individual questionnaire booklets with labels that included details of the patient's name, address, trial identifying number, and questionnaire allocation.
Statistical Analysis
The ability of a questionnaire to discriminate between different
levels of health is an important aspect of validity. This is determined
in part by whether a measure can define a full range of potential
health states and whether it is sensitive to change or difference over
this range. Patients who are at the lowest score on a measure will have
no scope to show any further decline of health ("floor"
effects).11 12 Similarly, if the majority of patients
score near the top of the measure, it will have little scope to show
improvements in health ("ceiling" effects).11 12 We
initially examined the distribution of scores in each domain to
identify the degree of skewness of the distribution, chiefly by
assessing, for each domain, the proportion of respondents with a
maximum or minimum score for that domain (ie, ceiling and floor
effects). We also assessed the levels of missing data for both
instruments.
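The floor, ceiling, and missing-data proportions described above can be sketched in a few lines of Python. This is an illustration only, not the SPSS procedure used in the study: the function name, the use of `None` for a missing score, the choice of respondents with a score as the denominator for floor and ceiling effects, and the example scores are all assumptions.

```python
def floor_ceiling_summary(scores, minimum=0, maximum=100):
    """Summarise missing, floor, and ceiling proportions for one domain.

    `scores` holds one value per respondent; None marks a missing score.
    Floor and ceiling proportions use respondents with a score as the
    denominator (an assumption; the paper does not state its denominator).
    """
    total = len(scores)
    observed = [s for s in scores if s is not None]
    if total == 0 or not observed:
        raise ValueError("need at least one observed score")
    return {
        "missing": (total - len(observed)) / total,
        "floor": sum(s == minimum for s in observed) / len(observed),
        "ceiling": sum(s == maximum for s in observed) / len(observed),
    }


# Hypothetical scores for one 0-100 domain (not data from the study).
example_scores = [0, 0, 25, 50, 100, None, 75, 0, 100, None]
summary = floor_ceiling_summary(example_scores)
print(summary)
```

A domain in which a large share of respondents sit at either extreme has little room to register further decline or improvement.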
"Ordering" effects are a potentially important source of bias in an unbalanced crossover study. For instance, completing the EuroQol questionnaire first might affect the patients' subsequent response to the SF-36. We therefore used a simple factorial 1-way ANOVA to investigate whether ordering effects occurred after the administration of either the EuroQol or SF-36. We restricted these analyses to the comparable domains of both instruments (physical functioning [SF-36] versus mobility [EuroQol] and self-care [EuroQol], physical role functioning [SF-36] versus mobility [EuroQol] and self-care [EuroQol], social functioning [SF-36] versus activities [EuroQol], pain [SF-36] versus pain [EuroQol], mental health [SF-36] versus psychological functioning [EuroQol], and psychological role functioning [SF-36] versus psychological functioning [EuroQol]), to avoid the problems that may arise from multiple testing.
We assessed the construct validity of both instruments further by testing the relationship between the EuroQol and SF-36 domains. If both instruments are valid, the correlation between comparable domains on the EuroQol and SF-36 (such as physical functioning on the SF-36 and mobility on the EuroQol) should be stronger than that between less comparable domains (such as physical functioning on the SF-36 and psychological functioning on the EuroQol). In contrast, the domains that examine more general aspects of health (such as overall health-related quality of life on the EuroQol) should be moderately correlated with all the other domains. We examined these relationships in 2 separate ways. First, we calculated the median score for each domain of the SF-36 for patients categorized according to their response to the corresponding EuroQol domain. These analyses were performed to facilitate the interpretation of patients' scores with the SF-36. Second, we calculated correlation coefficients between the domains of the EuroQol and each of the domains of the SF-36. All analyses were performed with the statistical software package SPSS for Windows (release 6.1).
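The 2 analyses described above can be illustrated with a short Python sketch, assuming the EuroQol levels are coded 1 (no problems) to 3 (severe problems). The function names, the coding, and the data are hypothetical, and the rank correlation is implemented by hand here rather than with the SPSS routines used in the study.

```python
from collections import defaultdict
from statistics import mean, median


def median_by_category(categories, scores):
    """Median SF-36 domain score within each EuroQol response level."""
    grouped = defaultdict(list)
    for category, score in zip(categories, scores):
        grouped[category].append(score)
    return {category: median(values) for category, values in grouped.items()}


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: EuroQol mobility level and SF-36 physical functioning.
mobility = [1, 1, 2, 2, 3, 3]
physical_functioning = [90, 80, 40, 30, 5, 0]

medians = median_by_category(mobility, physical_functioning)
rho = spearman(mobility, physical_functioning)
print(medians, round(rho, 3))
```

With this coding a higher EuroQol level means worse health, so a close relationship appears as a strongly negative coefficient; the sign depends only on the direction in which the levels are coded.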
| Results |
|---|
We performed a simple 1-way factorial ANOVA to assess the effect of the questionnaire ordering (ie, "EuroQol then SF-36" or "SF-36 then EuroQol") on the relationship between EuroQol and SF-36 scores for comparable domains. The "ordering term" was not a significant determinant of the relationship between the EuroQol and SF-36 scores in the 8 analyses performed. We therefore combined all the data for the remaining analyses.
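A test of this kind can be sketched in Python as follows. The sketch is a deliberate simplification and should not be read as the analysis actually run in SPSS: it computes the F statistic of a 1-way ANOVA comparing hypothetical SF-36 scores between the 2 ordering groups within a single EuroQol response category. The function name and all values are assumptions.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a 1-way ANOVA."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    if df_between < 1 or df_within < 1 or ss_within == 0:
        raise ValueError("F statistic is undefined for these data")
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical SF-36 scores for patients giving the same EuroQol response.
euroqol_then_sf36 = [40, 50, 60]
sf36_then_euroqol = [45, 55, 65]

f_stat, df_between, df_within = one_way_anova_f(
    [euroqol_then_sf36, sf36_then_euroqol]
)
print(f_stat, df_between, df_within)
```

A small F statistic relative to its reference distribution would be consistent with no ordering effect; the P value would come from the F distribution with the returned degrees of freedom.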
The distribution of scores for the SF-36 is described in Table 1.
There was substantial variation in the
proportion of responses with missing data between the different
domains, which ranged from 2% to 16% (social functioning and
psychological role functioning domains, respectively). The distribution
of scores was highly skewed for some domains. A large proportion of
respondents scored the minimum possible score (0 of a possible 100, ie,
the floor of the scale, and the worst possible outcome) for the domains
of physical role functioning and emotional role functioning.
Approximately one quarter of patients scored the maximum score for the
bodily pain and psychological functioning domains of the SF-36.
The distribution of patients' responses to the categorical domains of
the EuroQol is described in Table 2. The
proportion of missing data (approximately 3%) was very similar for
each of these 5 domains (Table 2). However, since each of the
domains had only 3 potential levels of response, a substantial
proportion of patients scored the maximum possible score (no problems)
for each domain. Examination of the distribution of overall estimates
of health-related quality of life with the EuroQol visual analog scale
or the EuroQol utility scores did not suggest problems with ceiling or
floor effects (Figures 2 and 3).
The relationships between patients' responses to the EuroQol and SF-36
questionnaires are presented in Tables 3 and 4.
Table 3 presents the median scores for the relevant SF-36
domains for patients categorized according to their response to the
comparable EuroQol domain. For almost all of the domains, the median
scores were ordered appropriately and were significantly different
between the groups. Indeed, physical functioning, social functioning,
and pain measured with the SF-36 were particularly closely related to
the corresponding domains on the EuroQol. However, there was no
difference in the median scores for the physical role functioning and
psychological role functioning domains between patients reporting
"some" or "severe" problems with the EuroQol. Furthermore,
there was only a weak relationship between the mental health domain
(SF-36) and psychological functioning domain (EuroQol) (Table 3).
Table 4 reports the correlations between each of the domains of
the EuroQol and those of the SF-36. The physical functioning domain on
the SF-36 correlated most closely with the mobility, self-care, and
activities domains of the EuroQol; it correlated less closely with the
pain and psychological domains of the EuroQol. Social functioning on
the SF-36 was moderately correlated with all the domains of the
EuroQol. Bodily pain was most closely correlated with the pain domain
of the EuroQol. In contrast, mental health correlated only poorly with
psychological functioning measured with the EuroQol. The vitality and
general health domains of the SF-36 correlated particularly strongly
with the overall health-related quality of life domain of the EuroQol,
but also moderately with the other domains of the EuroQol.
| Discussion |
|---|
The correlation between patients' responses to the mental health domain of the SF-36 and the psychological functioning domain of the EuroQol was poor. There are several possible explanations for this. First, it is possible that these domains, although superficially similar, are measuring different constructs. This is supported by the fact that the EuroQol item focuses on anxiety and depression, whereas the SF-36 mental health scale includes positive emotions as well (eg, feeling calm and peaceful). The psychological role functioning domain of the SF-36, which emphasizes anxiety and depression, correlated much better with the EuroQol psychological functioning domain than did the mental health domain of the SF-36 (Spearman rank correlation coefficient, 0.43 versus 0.21). However, an alternative explanation is that 1 or both of these domains have poor measurement properties in patients with stroke. There are several indications of this. First, the assessments of mental health with the SF-36 were clustered around the middle of the scale (mean score, 61; SD, 12) and therefore did not appear to take full advantage of the potential breadth of the scale. Second, the reproducibility of the mental health assessments with the SF-36 was also particularly poor (intraclass correlation coefficient=0.28).10 Finally, approximately half of these questionnaires were completed with the help of proxies, and the validity of these proxy assessments is particularly questionable for the domain of psychological functioning.13 14
It has been difficult to establish the validity of the numerical assessments of overall health-related quality of life with the EuroQol because this domain is difficult to define and is highly subjective.1 However, the general health domain of the SF-36 appears to examine a similar construct.4 15 It aims to assess an individual's general health perceptions and satisfaction and, as with the EuroQol, these general health perceptions appear to provide an approach by which different components of health such as disease, functioning, symptoms, and feelings can be integrated. The strong correlation (Pearson correlation coefficient=0.66) between patients' responses to these domains supports the view that both these domains are measuring the same underlying trait. The validity of these assessments is further supported by the moderate correlation of the assessments of overall health-related quality of life with the other domains of the SF-36.
Interpretability
Studies of interventions must show that the observed changes in
patients that are due to the intervention are important and substantial
enough to warrant further consideration in medical practice and policy
planning.16 One approach to the definition of clinical
meaningfulness is the use of anchor-based
interpretations.17 These definitions represent
instances in which the changes in quality of life measures were
compared, or anchored, to other clinical changes or results. The
descriptive nature of the categorical levels of the EuroQol
questionnaire could be considered potential anchors. In the present
study, a change of 55 points in the physical functioning scale of the
SF-36 appeared to be equivalent to the difference between "no
problems" and "some problems" in the categorical mobility domain
of the EuroQol. However, several factors limit the usefulness of this
approach. First, clinicians may be unsure about the significance of
such a change in the EuroQol and what the term "some problems"
means in practice. Second, the amount of change judged significant may
differ with the population and the type of treatment under study.
Third, most scales are not linear, ie, not an interval scale, and
therefore a change of 10 units at the top of the scale may not be the
same as a similar-sized change at the bottom of the scale. An
additional factor that limits the interpretability of Likert scaled
scores (such as the SF-36) is that the total score will never give
clinical information about the exact responses to individual items. For
example, a total score of 50 on the 10-item physical functioning scale
can be achieved in different ways.
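The point about total scores can be made concrete with a small Python sketch. It assumes that each of the 10 physical functioning items is coded 1 to 3 (from most to least limited) and that the raw sum is rescaled linearly to 0 to 100; the coding, names, and response patterns are illustrative assumptions rather than details given in this paper.

```python
from itertools import product

ITEMS = 10          # items in the physical functioning scale
LEVELS = (1, 2, 3)  # assumed item coding, from most to least limited


def transformed_score(responses):
    """Rescale the raw item sum linearly to a 0-100 score."""
    lowest = len(responses) * min(LEVELS)
    highest = len(responses) * max(LEVELS)
    return 100 * (sum(responses) - lowest) / (highest - lowest)


# Two very different hypothetical patients with the same total score.
uniform = [2] * ITEMS          # somewhat limited in every activity
mixed = [3] * 5 + [1] * 5      # unlimited in half, severely limited in half

# Count every response pattern that yields a transformed score of 50.
patterns_scoring_50 = sum(
    1 for r in product(LEVELS, repeat=ITEMS) if transformed_score(r) == 50
)
print(transformed_score(uniform), transformed_score(mixed), patterns_scoring_50)
```

Under these assumptions thousands of distinct response patterns share a score of 50, which is why the total alone cannot identify which activities a patient finds difficult.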
The difficulties in defining a clinically significant change in these health-related quality of life measures reflect in part the newness of these measures and our lack of experience with them.17 Therefore, presenting these correlations should improve researchers' familiarity with them and may help to develop an intuitive feeling about the relevance of any change.
Distribution of Scores
The large number of patients scoring the minimum score
(worst outcome) in the physical and emotional role functioning domains
of the SF-36 suggests that floor effects may be present in these
domains. The observation that the median scores for the physical role
functioning domain did not distinguish between patients reporting
"some" or "severe" problems in the mobility or self-care domains
of the EuroQol confirms this suspicion. The role functioning domains
may therefore not measure the consequences of more severe disabilities,
and this might reduce responsiveness in these domains. This confirms
the findings of other investigators in studies of the SF-36 in patients
with stroke,18 the elderly,7 and
groups of patients with other diagnoses.19 Indeed, because
of these problems the SF-36 has recently been
revised.20
Methodological Issues
The lower frequency of response and lower levels of data
completeness in patients followed up with the SF-36 compared with the
EuroQol are consistent with the result of the direct randomized
comparison of their feasibility after stroke.8 We used an
interpolation procedure to reduce the proportion of missing data
(missing items were substituted with the mean response to other
items),5 and therefore these results underestimate the
underlying level of missing data for the SF-36. Brazier and
colleagues7 have expressed concerns over the validity of
these interpolation procedures. They suggest that when patients omit
items because they do not appear relevant to them, this may indicate
that the respondent is in fact unable to perform that particular
activity or function, and therefore the average response to the other
items could be misleading if interpolation is used for missing
values.7
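The concern can be illustrated with a minimal Python sketch of mean substitution. The function name, the use of `None` for an omitted item, the 1 to 3 item coding, and the example respondent are hypothetical.

```python
from statistics import mean


def impute_with_item_mean(responses):
    """Replace omitted items (None) with the mean of the answered items."""
    answered = [r for r in responses if r is not None]
    if not answered:
        raise ValueError("no answered items to impute from")
    fill = mean(answered)
    return [fill if r is None else r for r in responses]


# Hypothetical respondent: reports no limitation (3) on three light
# activities and omits two strenuous ones.
observed = [3, 3, 3, None, None]
imputed = impute_with_item_mean(observed)

# If the omitted items in fact meant "unable to do this" (coded 1), the
# true responses would have been:
possible_truth = [3, 3, 3, 1, 1]
print(sum(imputed), sum(possible_truth))
```

In this example the imputed total is higher than the total implied by the alternative reading of the omitted items, so mean substitution would overstate the respondent's functioning.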
The crossover design used in this study seems to have been valid since there did not appear to be any significant carryover or other ordering effects. Furthermore, the study of test-retest reliability demonstrated that the patients did not change significantly in any of the domains of health-related quality of life between test and retest.10 These findings justified the combined analysis of all the data irrespective of the order of questionnaire administration.
Conclusions
In summary, despite fundamental differences in their
background, design, and format, the domains of the EuroQol and SF-36
measured broadly similar aspects of health-related quality of life. The
weak relationship between the assessment of mental health with the
SF-36 and psychological functioning with the EuroQol may reflect a
difference in content or more fundamental problems with the validity or
reliability of the items in either of these domains. Unfortunately, it
is difficult to resolve which of these explanations applies since no
reference instruments were administered concurrently. This study has
provided the first empirical qualitative evidence by which data on the
SF-36 after stroke may be interpreted.
| Acknowledgments |
|---|
Received May 20, 1999; revision received June 24, 1999; accepted June 28, 1999.