How Do Scores on the EuroQol Relate to Scores on the SF-36 After Stroke?
Background and Purpose—The EuroQol and Medical Outcome Survey 36-item short-form health survey (SF-36) questionnaires have both been validated for the assessment of health-related quality of life after stroke. However, the relationship between these instruments has not been studied after stroke. We therefore sought to compare the responses of a group of stroke patients to both instruments.
Methods—A total of 2253 patients with stroke entered by United Kingdom hospitals in the International Stroke Trial were randomized to follow-up with either the EuroQol or SF-36 instruments. We randomly selected one third of patients who had responded to the EuroQol for follow-up, again using the SF-36, and two thirds of patients who had responded to the SF-36 for follow-up, again using the EuroQol. We assessed the patients’ mean score for each domain of the SF-36 categorized by their response to the corresponding EuroQol domain and the correlation between the domains of the 2 instruments.
Results—The domains for both instruments, which assessed physical functioning, social functioning, bodily pain, and overall health-related quality of life, correlated closely. The mental health domain of the SF-36 correlated only poorly with the psychological functioning domain of the EuroQol.
Conclusions—Both the EuroQol and SF-36 measure broadly similar domains of health. The weak relationship between the assessments of mental health may reflect a difference in content or more fundamental problems with the validity or reliability of the items in one of the instruments with respect to this domain. This study has provided the first empirical qualitative evidence by which the data on the SF-36 after stroke may be interpreted.
The EuroQol and Medical Outcome Survey 36-item short-form health survey (SF-36) are both valid measures for the assessment of health-related quality of life after stroke.1 2 The EuroQol assesses outcome in 6 broad areas (mobility, self-care, activities, pain, psychological functioning, and self-reported overall health-related quality of life) and also provides a utility score for overall health-related quality of life.3 The SF-36 assesses 8 domains: physical functioning, physical role functioning, social functioning, bodily pain, mental health, psychological role functioning, vitality, and general health.4 5 At first it may appear that many of the domains of the SF-36 are similar to those of the EuroQol: for example, the mobility question in the EuroQol appears to relate closely to the physical functioning questions on the SF-36. However, the relationship between responses by an individual to the domains of each instrument has not been well defined.6 7 The degree to which any change in health-related quality of life might be reflected differently by the EuroQol and SF-36 is also unknown.
A clearer understanding of the relationship between these instruments might help to improve the interpretation of a change of score with either instrument. It should also be helpful in comparing studies that have used different measures and may even allow the translation of outcomes for meta-analysis. Furthermore, since both instruments aim to measure health-related quality of life, there should be a strong correlation between responses on the 2 instruments. A poor correlation might suggest poor validity of 1 or both of the measures. We therefore administered both the EuroQol and SF-36 to a group of patients after stroke to compare their responses to these instruments.
Subjects and Methods
Patients and Allocation to the EuroQol or SF-36
In a previous study we examined response rates to postal versions of the EuroQol and SF-36; we randomly allocated patients to receive either the EuroQol or the SF-36. We have described in detail elsewhere the methods used to identify patients and the format of the instruments.8 Briefly, the study included patients with confirmed or suspected ischemic stroke who had been enrolled between March 2, 1993, and May 31, 1995, by any of the United Kingdom hospitals participating in the International Stroke Trial.9 We included all patients who were not known to have died by the time of the survey. We incorporated the EuroQol or SF-36 into booklets that included some additional questions recording the patients’ demographic details, their functional outcome after stroke, and whether the patients completed the booklet by themselves. The only difference in the questionnaire booklets was the nature of the health-related quality of life instrument; they were otherwise identical.
For a study of reproducibility, we then randomly sampled one third of the patients who had responded within approximately 3 weeks to the first questionnaire for repeated testing with the same health-related quality of life instrument (test-retest reliability).10 Concurrently, for the present study, we randomly selected one third of the patients who had responded within approximately 3 weeks to the first EuroQol for follow-up with the SF-36 (the remaining one third of EuroQol respondents were involved in another study) and two thirds of the patients who had responded within approximately 3 weeks to the first SF-36 for follow-up with the EuroQol. These patients were completely separate from those included in the reproducibility study.10 The planned (and actual) flow of patients is shown in Figure 1.
We mailed the second questionnaire booklet containing the appropriate instrument to all eligible patients with a personalized letter and a postage-paid reply envelope. The letter explained the purpose of the repeated questionnaire and asked the subjects to respond if possible without the help of another person and, if not, to give the questionnaire to a close relative or caregiver who was willing to respond on the patient’s behalf. We sent a reminder letter and an additional identical questionnaire to any patient who had not responded within 14 days. We made no further attempts to contact nonrespondents thereafter. We marked individual questionnaire booklets with labels that included details of the patient’s name, address, trial identifying number, and questionnaire allocation.
The ability of a questionnaire to discriminate between different levels of health is an important aspect of validity. This is determined in part by whether a measure can define a full range of potential health states and whether it is sensitive to change or difference over this range. Patients who are at the lowest score on a measure will have no scope to show any further decline of health (“floor” effects).11 12 Similarly, if the majority of patients score near the top of the measure, it will have little scope to show improvements in health (“ceiling” effects).11 12 We initially examined the distribution of scores in each domain to identify the degree of skewness of the distribution, chiefly by assessing, for each domain, the proportion of respondents with a maximum or minimum score for that domain (ie, ceiling and floor effects). We also assessed the levels of missing data for both instruments.
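The floor, ceiling, and missing-data proportions described above can be sketched in a few lines. This is an illustration only, not the authors' SPSS procedure: the function name, the 0 to 100 score range, the use of None for a missing response, and the example scores are all assumptions made for the sketch.

```python
def floor_ceiling(scores, minimum=0, maximum=100):
    """Summarise one domain: proportion at floor, at ceiling, and missing.

    `scores` holds one domain score per respondent; None marks a missing
    response (an assumed convention). Floor and ceiling proportions are
    taken over the non-missing responses only.
    """
    observed = [s for s in scores if s is not None]
    if not observed:
        raise ValueError("need at least one observed score")
    return {
        "floor": sum(s == minimum for s in observed) / len(observed),
        "ceiling": sum(s == maximum for s in observed) / len(observed),
        "missing": (len(scores) - len(observed)) / len(scores),
    }


# Hypothetical scores for a single domain (not data from this study).
example = [0, 0, 50, 100, None, 75, 25, 0]
result = floor_ceiling(example)
print(result)
```

A domain in which a large share of respondents sits at the minimum or maximum has little room to register further decline or improvement, which is the concern raised in the paragraph above.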
“Ordering” effects are a potentially important source of bias in an unbalanced crossover study. For instance, completing the EuroQol questionnaire first might affect the patients’ subsequent response to the SF-36. We therefore used a simple factorial 1-way ANOVA to investigate whether ordering effects occurred after the administration of either the EuroQol or SF-36. We restricted these analyses to the comparable domains of both instruments (physical functioning [SF-36] versus mobility [EuroQol] and self-care [EuroQol], physical role functioning [SF-36] versus mobility [EuroQol] and self-care [EuroQol], social functioning [SF-36] versus activities [EuroQol], pain [SF-36] versus pain [EuroQol], mental health [SF-36] versus psychological functioning [EuroQol], and psychological role functioning [SF-36] versus psychological functioning [EuroQol]), to avoid the problems that may arise from multiple testing.
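As a rough illustration of the kind of test involved, the sketch below computes the F statistic of a 1-way ANOVA comparing scores between the two questionnaire orders. The exact model fitted in the study is not specified here, so the grouping, the function name, and the example scores are assumptions; a P value would additionally require the F distribution, which is omitted.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a 1-way ANOVA.

    `groups` is a list of lists of scores, one list per group
    (here, one per questionnaire order).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical SF-36 domain scores by order of administration.
euroqol_first = [50, 60, 70]
sf36_first = [55, 65, 75]
f_stat, df_b, df_w = one_way_anova([euroqol_first, sf36_first])
print(f_stat, df_b, df_w)
```

A small F statistic relative to its degrees of freedom would be consistent with the absence of an ordering effect.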
We assessed the construct validity of both instruments further by testing the relationship between the EuroQol and SF-36 domains. If both instruments are valid, the correlation between comparable domains on the EuroQol and SF-36 (such as physical functioning on the SF-36 and mobility on the EuroQol) should be stronger than that between less comparable domains (such as physical functioning on the SF-36 and psychological functioning on the EuroQol). In contrast, the domains that examine more general aspects of health (such as overall health-related quality of life on the EuroQol) should be moderately correlated with all the other domains. We examined these relationships in 2 separate ways. First, we grouped patients according to their response to each EuroQol domain and calculated the median score in each group for the corresponding SF-36 domain. These analyses were performed to facilitate the interpretation of patients’ scores with the SF-36. Second, we calculated correlation coefficients between the domains of the EuroQol and each of the domains of the SF-36. All analyses were performed with the statistical software package SPSS for Windows (release 6.1).
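The two analyses described above can be sketched as follows. This is a minimal illustration, not the SPSS analysis itself: the function names and the example data are hypothetical, EuroQol levels are coded 1 (no problems) to 3 (severe problems) by assumption, and a rank (Spearman) coefficient is used here because the EuroQol domains have only 3 ordered levels.

```python
from statistics import median


def median_by_category(categories, scores):
    """Median SF-36 score within each EuroQol response level."""
    groups = {}
    for level, score in zip(categories, scores):
        groups.setdefault(level, []).append(score)
    return {level: median(values) for level, values in groups.items()}


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: EuroQol mobility level and SF-36 physical functioning.
mobility = [1, 1, 1, 2, 2, 2, 3, 3]
physical_functioning = [90, 85, 70, 50, 40, 45, 10, 0]

medians = median_by_category(mobility, physical_functioning)
rho = spearman(mobility, physical_functioning)
print(medians, rho)
```

With this coding, a higher EuroQol level means worse health while a higher SF-36 score means better health, so closely related domains produce a strongly negative coefficient; the strength of the relationship is read from its absolute value.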
Results
Of the 905 respondents to the initial EuroQol questionnaire, 272 (approximately one third of respondents) were selected at random to receive a subsequent SF-36. A separate 505 patients (approximately two thirds of respondents) were selected at random from the respondents to the initial SF-36 to receive a EuroQol questionnaire (Figure 1). Four hundred fifty-eight (91%) of those allocated to the EuroQol questionnaire responded. A slightly lower proportion (85%) of the patients allocated to the SF-36 questionnaire responded (Figure 1). The mean delay between the completion of the initial EuroQol and subsequent SF-36 was 28 days (SD, 12 days). For patients allocated to the SF-36 initially, the mean delay between completion of the questionnaires was 29 days (SD, 13 days).
We performed a simple 1-way factorial ANOVA to assess the effect of the questionnaire ordering (ie, “EuroQol then SF-36” or “SF-36 then EuroQol”) on the relationship between EuroQol and SF-36 scores for comparable domains. The “ordering term” was not a significant determinant of the relationship between the EuroQol and SF-36 scores in the 8 analyses performed. We therefore combined all the data for the remaining analyses.
The distribution of scores for the SF-36 is described in Table 1. There was substantial variation in the proportion of responses with missing data between the different domains, which ranged from 2% to 16% (social functioning and psychological role functioning domains, respectively). The distribution of scores was highly skewed for some domains. A large proportion of respondents scored the minimum possible score (0 of a possible 100, ie, the floor of the scale, and the worst possible outcome) for the domains of physical role functioning and emotional role functioning. Approximately one quarter of patients scored the maximum score for the bodily pain and psychological functioning domains of the SF-36.
The distribution of patients’ responses to the categorical domains of the EuroQol is described in Table 2. The proportion of missing data (approximately 3%) was very similar for each of these 5 domains (Table 2). However, since each of the domains had only 3 potential levels of response, a substantial proportion of patients scored the maximum possible score (no problems) for each domain. Examination of the distribution of overall estimates of health-related quality of life with the EuroQol visual analog scale or the EuroQol utility scores did not suggest problems with ceiling or floor effects (Figures 2 and 3).
The relationships between patients’ responses to the EuroQol and SF-36 questionnaires are presented in Tables 3 and 4. Table 3 presents the median scores for the relevant SF-36 domains for patients categorized according to their response to the comparable EuroQol domain. For almost all of the domains, the median scores were ordered appropriately and were significantly different between the groups. Indeed, physical functioning, social functioning, and pain measured with the SF-36 were particularly closely related to the corresponding domains on the EuroQol. However, there was no difference in the median scores for the physical role functioning and psychological role functioning domains between patients reporting “some” or “severe” problems with the EuroQol. Furthermore, there was only a weak relationship between the mental health domain (SF-36) and psychological functioning domain (EuroQol) (Table 3).
Table 4 reports the correlation between each of the domains of the EuroQol and those of the SF-36. The physical functioning domain on the SF-36 correlated most closely with the mobility, self-care, and activities domains of the EuroQol; it correlated less closely with the pain and psychological domains of the EuroQol. Social functioning on the SF-36 was moderately correlated with all the domains of the EuroQol. Bodily pain was most closely correlated with the pain domain of the EuroQol. In contrast, mental health correlated only poorly with psychological functioning measured with the EuroQol. The vitality and general health domains of the SF-36 correlated particularly strongly with the overall health-related quality of life domain of the EuroQol, but also moderately with the other domains of the EuroQol.
Discussion
Relationship Between EuroQol and SF-36
The EuroQol and SF-36 questionnaires have a different background, structure, content, and length and also ask the patient to consider different time periods. Nonetheless, we observed a close relationship between the domains that assessed physical functioning, social functioning, bodily pain, and overall health-related quality of life. Our results suggest that the 2 instruments are generally sampling similar areas of health. This finding supports the notion that there are several key dimensions that constitute health-related quality of life, as well as providing further support for the construct validity of the assessments of these domains with either instrument.
The correlation between patients’ responses to the mental health domain of the SF-36 and the psychological functioning domain of the EuroQol was poor. There are several possible explanations for this. First, it is possible that these domains, although superficially similar, are measuring different constructs. This is supported by the fact that the EuroQol item focuses on anxiety and depression, whereas the SF-36 mental health scale includes positive emotions as well (eg, feeling calm and peaceful). The psychological role functioning domain of the SF-36, which emphasizes anxiety and depression, correlated much better with the EuroQol psychological functioning domain than did the mental health domain of the SF-36 (Spearman rank correlation coefficient, 0.43 versus 0.21). However, an alternative explanation is that 1 or both of these domains have poor measurement properties in patients with stroke. There are several indications of this. First, the assessments of mental health with the SF-36 were clustered around the middle of the scale (mean score, 61; SD, 12) and therefore did not appear to take full advantage of the potential breadth of the scale. Second, the reproducibility of the mental health assessments with the SF-36 was also particularly poor (intraclass correlation coefficient=0.28).10 Finally, approximately half of these questionnaires were completed with the help of proxies, and the validity of these proxy assessments is particularly questionable for the domain of psychological functioning.13 14
It has been difficult to establish the validity of the numerical assessments of overall health-related quality of life with the EuroQol because this domain is difficult to define and is highly subjective.1 However, the general health domain of the SF-36 appears to examine a similar construct.4 15 It aims to assess an individual’s general health perceptions and satisfaction and, as with the EuroQol, these general health perceptions appear to provide an approach by which different components of health such as disease, functioning, symptoms, and feelings can be integrated. The strong correlation (Pearson correlation coefficient=0.66) between patients’ responses to these domains supports the view that both these domains are measuring the same underlying trait. The validity of these assessments is further supported by the moderate correlation of the assessments of overall health-related quality of life with the other domains of the SF-36.
Studies of interventions must show that the observed changes in patients that are due to the intervention are important and substantial enough to warrant further consideration in medical practice and policy planning.16 One approach to the definition of clinical meaningfulness is the use of anchor-based interpretations.17 These definitions represent instances in which the changes in quality of life measures were compared, or anchored, to other clinical changes or results. The descriptive nature of the categorical levels of the EuroQol questionnaire could be considered potential anchors. In the present study, a change of 55 points in the physical functioning scale of the SF-36 appeared to be equivalent to the difference between “no problems” and “some problems” in the categorical mobility domain of the EuroQol. However, several factors limit the usefulness of this approach. First, clinicians may be unsure about the significance of such a change in the EuroQol and what the term “some problems” means in practice. Second, the amount of change judged significant may differ with the population and the type of treatment under study. Third, most scales are not linear, ie, not an interval scale, and therefore a change of 10 units at the top of the scale may not be the same as a similar-sized change at the bottom of the scale. An additional factor that limits the interpretability of Likert scaled scores (such as the SF-36) is that the total score will never give clinical information about the exact responses to individual items. For example, a total score of 50 on the 10-item physical functioning scale can be achieved in different ways.
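The final point above, that one scale score can arise from many different response patterns, can be checked by brute force. The sketch assumes the standard scoring of the SF-36 physical functioning scale, which is not spelled out in this article: 10 items, each answered on 3 levels (1 = limited a lot, 2 = limited a little, 3 = not limited at all), with the item sum rescaled linearly to 0 to 100. The function names are hypothetical.

```python
from itertools import product

ITEMS = 10          # number of physical functioning items (assumed)
LEVELS = (1, 2, 3)  # response options per item (assumed coding)


def pf_score(responses):
    """Rescale the sum of item responses linearly to a 0-100 score."""
    lowest = ITEMS * min(LEVELS)
    highest = ITEMS * max(LEVELS)
    return 100 * (sum(responses) - lowest) / (highest - lowest)


def patterns_with_score(target):
    """Count the distinct item-response patterns that yield `target`."""
    return sum(
        1 for pattern in product(LEVELS, repeat=ITEMS) if pf_score(pattern) == target
    )


# Two clinically very different patients with the same scale score:
limited_a_little_everywhere = [2] * ITEMS
half_unlimited_half_severe = [3] * 5 + [1] * 5
print(pf_score(limited_a_little_everywhere), pf_score(half_unlimited_half_severe))
print(patterns_with_score(50))
```

Under these assumptions both example patients score 50, and several thousand distinct response patterns share that score, so the total alone says little about which activities a patient can or cannot perform.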
The difficulties in defining a clinically significant change in these health-related quality of life measures reflect in part the newness of these measures and our lack of experience with them.17 Therefore, presenting these correlations should improve researchers’ familiarity with them and may help to develop an intuitive feeling about the relevance of any change.
Distribution of Scores
The large number of patients scoring the minimum score (worst outcome) in the physical and emotional role functioning domains of the SF-36 suggests that floor effects may be present in these domains. The observation that the median scores for the physical role functioning domain did not distinguish between patients classified as having moderate or severe problems by the mobility or self-care domains of the EuroQol confirms this suspicion. The role functioning domains may therefore not measure the consequences of more severe disabilities, and this might reduce responsiveness in these domains. This confirms the findings of other investigators in studies of the SF-36 in patients with stroke,18 the elderly,7 and groups of patients with other diagnoses.19 Indeed, because of these problems the SF-36 has recently been revised.20
The lower frequency of response and lower levels of data completeness in patients followed up with the SF-36 compared with the EuroQol are consistent with the result of the direct randomized comparison of their feasibility after stroke.8 We used an interpolation procedure to reduce the proportion of missing data (missing items were substituted with the mean response to other items),5 and therefore these results underestimate the underlying level of missing data for the SF-36. Brazier and colleagues7 have expressed concerns over the validity of these interpolation procedures. They suggest that when patients omit items because they do not appear relevant to them, this may indicate that the respondent is in fact unable to perform that particular activity or function, and therefore the average response to the other items could be misleading if interpolation is used for missing values.7
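The substitution rule described above, and the concern raised about it, can be illustrated with a short sketch. It is a simplification under stated assumptions: the function name is hypothetical, None marks an omitted item, and the requirement that at least half of a scale's items be answered is an assumed threshold rather than a detail given in this article.

```python
def impute_person_mean(items, min_fraction_answered=0.5):
    """Replace omitted items with the mean of the respondent's answered items.

    Returns the completed item list, or None when too few items were
    answered for the scale to be scored (threshold is an assumption).
    """
    answered = [x for x in items if x is not None]
    if len(answered) < min_fraction_answered * len(items):
        return None
    person_mean = sum(answered) / len(answered)
    return [person_mean if x is None else x for x in items]


# Hypothetical respondent on a 5-item scale (1 = worst, 3 = best)
# who skipped 2 items.
observed = [3, 3, None, None, 2]
completed = impute_person_mean(observed)
imputed_total = sum(completed)

# If the items were skipped because the activities were impossible,
# the true responses would have been the worst level (1).
true_total_if_unable = 3 + 3 + 1 + 1 + 2
print(imputed_total, true_total_if_unable)
```

In this example the imputed total is higher than the total the respondent would have obtained had the omitted items reflected inability, which is the direction of bias that Brazier and colleagues warn about.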
The crossover design used in this study seems to have been valid since there did not appear to be any significant carryover or other ordering effects. Furthermore, the study of test-retest reliability demonstrated that the patients did not change significantly in any of the domains of health-related quality of life between test and retest.10 These findings justified the combined analysis of all the data irrespective of the order of questionnaire administration.
In summary, despite fundamental differences in their background, design, and format, the domains of the EuroQol and SF-36 measured broadly similar aspects of health-related quality of life. The weak relationship between the assessment of mental health with the SF-36 and psychological functioning with the EuroQol may reflect a difference in content or more fundamental problems with the validity or reliability of the items in either of these domains. Unfortunately, it is difficult to resolve which of these explanations applies since no reference instruments were administered concurrently. This study has provided the first empirical qualitative evidence by which data on the SF-36 after stroke may be interpreted.
This study was supported by a grant from Glaxo-Wellcome plc. Dr Dorman was supported by a United Kingdom Medical Research Council Training Fellowship. Dr Sandercock was also supported by grants from the United Kingdom Medical Research Council. We would like to thank the patients, their families, and their caregivers for their keen participation. We are also grateful to Stephanie Lewis for statistical support.
- Received May 20, 1999.
- Revision received June 24, 1999.
- Accepted June 28, 1999.
- Copyright © 1999 by American Heart Association
References
Dorman PJ, Waddell FM, Slattery J, Dennis MS, Sandercock PAG. Is the EuroQol a valid measure of health-related quality of life after stroke? Stroke. 1997;28:1876–1882.
Anderson C, Laubscher S, Burns R. Validation of the Short Form 36 (SF-36) health survey questionnaire among stroke patients. Stroke. 1996;27:1812–1816.
Medical Outcomes Trust. SF-36 Health Survey: Scoring Manual for English-Language Adaptations. Boston, Mass: Medical Outcomes Trust; 1994.
Dorman PJ, Slattery JM, Farrell B, Dennis MS, Sandercock PAG, and the United Kingdom Collaborators in the International Stroke Trial. A randomised comparison of the EuroQol and SF-36 after stroke. BMJ. 1997;315:461.
Dorman PJ, Slattery JM, Farrell B, Dennis MS, Sandercock PAG, and the United Kingdom Collaborators in the International Stroke Trial (IST). A qualitative comparison of the reliability of health status assessments with the EuroQol and SF-36 after stroke. Stroke. 1998;29:63–68.
Streiner DL, Norman GR. Health Measurement Scales: A Practical Guide to Their Development and Use. Oxford, England: Oxford University Press; 1989.
Segal ME, Schall RR. Determining functional/health status and its relation to disability in stroke survivors. Stroke. 1994;25:2391–2397.
Dorman PJ, Waddell FM, Slattery JM, Dennis MS, Sandercock PAG. Are proxy assessments of health status after stroke with the EuroQol questionnaire feasible, accurate, and unbiased? Stroke. 1997;28:1883–1887.
Ware JE. Measures for a New Era of Health Assessment. Durham, NC: Duke University Press; 1992.
O’Mahoney PG, Rodgers H, Thomson RG, Dobson R, James OFW. Is the SF-36 suitable for assessing health status of older stroke patients? Age Ageing. 1998;27:19–22.
Jenkinson C, Stewart-Brown S, Petersen S, Paice C. Assessment of the SF-36 version 2 in the UK. J Epidemiol Community Health. 1999;53:46–50.