Are Proxy Assessments of Health Status After Stroke With the EuroQol Questionnaire Feasible, Accurate, and Unbiased?
Background and Purpose It is often difficult to determine the health-related quality of life (HRQoL) of stroke patients because physical and cognitive problems limit their ability to complete complex questionnaires. A proxy, such as a family member or caregiver, may be able to give an estimate of the patients’ health status. We therefore examined the agreement between the HRQoL as assessed by a series of patients and that assessed by their proxies.
Methods We studied the validity of the EuroQol in a series of 152 patients from our prospective registry of patients with first (or recurrent) stroke. We asked patients to ensure that a friend or relative (a proxy) who knew them well was available at the time of the interview. We asked each proxy to complete a EuroQol questionnaire independently on behalf of the patient.
Results Proxies completed forms for 130 patients (86%). Agreement between responses from the patients and those from their proxies was better for patients who were able to self-complete the EuroQol than for patients who required the EuroQol to be administered by interview. For both groups, agreement was best for the self-care domain and worst for the domain that assessed psychological outcome. For the more severely affected patients, agreement was only fair for the pain and social functioning domains and no better than chance alone for the psychological functioning domain (κ=0.05, 95% confidence interval, 0 to 0.43). Patients tended to rate their own health status as better than their proxies did (P<.05).
Conclusions We found moderate agreement between responses from patients and those from their proxies for the more directly observable domains of the EuroQol. Proxy agreement was less good for the more subjective domains. In health surveys, allowing responses by a proxy increases response rate. However, the disadvantages inherent in the use of proxy responses must be considered carefully. In general, some domains of HRQoL information obtained from a proxy may be sufficiently valid and unbiased to be useable in most types of trials and surveys.
When assessing outcome after an illness such as stroke, it may be preferable to measure health-related quality of life (HRQoL) rather than just to count the frequency of major clinical events or to simply apply a single measure of impairment or disability.1 This becomes particularly relevant when evaluating treatments that do not have a major impact on death or disability, yet do improve outcome in more subtle ways. Conventional measures of disability may not detect worthwhile improvements in psychological function, social function, and even some aspects of physical function; yet these domains may be important to patients and their caregivers.2 Furthermore, government research agencies now routinely ask investigators to perform, or to at least consider, HRQoL issues in clinical research proposals.3 The EuroQol provides a broad, generic assessment of health status, and appears to be valid in stroke patients.4 It provides a simple descriptive profile of health status in several ways. There are questions to assess status in five domains (mobility, self-care, social functioning, pain, and mood) with three possible categories of response in each domain.5 The pattern of responses classifies patients into one of 243 (3⁵) unique health states. The EuroQol also includes a visual analogue scale with which patients rate their own health between 0 and 100, providing an overall numeric estimate of their HRQoL.
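As an illustration of how the count of health states arises (not part of the original study), the short Python sketch below enumerates every combination of three response levels across the five domains. The numeric coding 1 to 3 and the function name are assumptions made for the example.

```python
from itertools import product

# Domain names as listed in the text; the 1-3 coding is an assumed convention
# (1 = no problems, 2 = some problems, 3 = severe problems).
DOMAINS = ("mobility", "self-care", "social functioning", "pain", "mood")
LEVELS = (1, 2, 3)


def all_health_states():
    """Return every possible profile as a tuple with one level per domain."""
    return list(product(LEVELS, repeat=len(DOMAINS)))


states = all_health_states()
print(len(states))  # 243
print(states[0])    # (1, 1, 1, 1, 1), i.e. no problems in any domain
```

Each tuple is one descriptive profile; the visual analogue scale is a separate, continuous rating and is not captured by this enumeration.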
It is often difficult to measure the HRQoL of many patients with stroke because physical and cognitive problems limit their ability to complete complex questionnaires.6 In a large study of over 2000 stroke patients who were sent postal versions of the EuroQol and Short Form-36 (SF-36) more than 3 months after their stroke, about half were unable to complete either type of questionnaire by themselves.7 Asking someone else, such as the caregiver, may be the only way to assess quality of life for a patient who is unable to complete the questionnaire themselves (this is often referred to as a proxy measure). Proxy measures of the SF-36 were disappointingly inaccurate.8 However, rating of a patient’s functioning on the EuroQol by a proxy could prove to be valid, since much of the information sought is concrete and observable. We therefore examined whether a proxy could assess a stroke patient’s HRQoL accurately and without bias using the EuroQol.
Patients and Assessments
We studied the validity of the EuroQol in a series of 152 patients from our prospective registry of inpatients and outpatients with first (or recurrent) stroke. We have described the methods used to identify patients in detail elsewhere.4 Briefly, we selected patients who were alive at least 3 months after their index stroke and who lived within an approximate 10-mile radius of the hospital. All patients were visited by the study nurse at home. The nurse gave the patients a EuroQol questionnaire that they were to complete by themselves if possible. If the patient was unable to complete it, the nurse administered it by interview. The nurse administered other standard assessments, including the Barthel Index and the Office of Population Censuses and Surveys (OPCS) disability score.9 10 Patients could select the person they considered to be the most appropriate proxy for them; we asked all patients to choose someone who knew them well (eg, close relative, friend, or caregiver) and who could be available at the time of the interview. We asked these proxies to complete a EuroQol questionnaire (preferably in a separate room) on behalf of the patient. If a proxy was not available at the time of the interview, we contacted them by post.
We calculated the level of agreement for the categorical data items of the EuroQol between the assessments of the patients’ health status by the patients and their proxies. We did not use a correlation coefficient (eg, Spearman rank) to assess agreement because it only measures association and would be constant under deviations of scale or bias.11 We therefore used the κ statistic, which measures the amount of agreement beyond that which could be expected by chance.11 12 We calculated the variance of the κ statistic using Altman’s method.12 Because the scale items had three levels of response, we used all three levels for the estimation of κ. We assessed agreement separately for patients who were able to complete the EuroQol questionnaire themselves and those who could not complete the EuroQol themselves and consequently had to be interviewed. Unfortunately, no absolute definitions exist for the interpretation of any given κ statistic. We planned to base our interpretation of the κ statistic on the following widely cited guidelines: <0.2 implies poor agreement, 0.21 to 0.40 implies fair agreement, 0.41 to 0.60 implies moderate agreement, 0.61 to 0.80 implies good agreement, and 0.81 to 1.00 implies very good agreement.11 12 13
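The calculation described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the 3×3 table is hypothetical, and the standard error uses a common large-sample approximation, which is assumed (not confirmed by the text) to correspond to the method cited. The lowest band is coded here as κ ≤ 0.20.

```python
from math import sqrt


def cohen_kappa(table):
    """Unweighted kappa and an approximate 95% CI from a square agreement table.

    table[i][j] = number of pairs in which the patient chose level i
    and the proxy chose level j.
    """
    n = sum(sum(row) for row in table)
    k = len(table)
    p_obs = sum(table[i][i] for i in range(k)) / n
    row_p = [sum(table[i]) / n for i in range(k)]
    col_p = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    p_exp = sum(row_p[i] * col_p[i] for i in range(k))  # agreement expected by chance
    kappa = (p_obs - p_exp) / (1 - p_exp)
    # Large-sample approximation to the standard error (an assumption here).
    se = sqrt(p_obs * (1 - p_obs) / (n * (1 - p_exp) ** 2))
    return kappa, (kappa - 1.96 * se, kappa + 1.96 * se)


def interpret(kappa):
    """Map kappa onto the verbal bands quoted in the text."""
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "very good"


# Hypothetical counts for one domain (rows: patient, columns: proxy).
example = [
    [20, 5, 0],
    [4, 15, 3],
    [1, 2, 10],
]
kappa, (lo, hi) = cohen_kappa(example)
print(round(kappa, 2), interpret(kappa))
```

Because all three response levels enter the table, any off-diagonal cell counts as full disagreement, however close the two ratings are.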
The analysis of the continuous data from the visual analogue scale on overall HRQoL was more complex. Differences between the patients and their proxies in their estimates of the patients’ overall HRQoL might be due to observer error, systematic differences (ie, bias), or random effects (ie, the play of chance). To display the raw data, we planned to plot a simple scatterplot and calculate a linear correlation coefficient. However, this plot gives little information on systematic differences, so we also performed a Bland and Altman analysis, which plots the difference between the two estimates against the mean of the two estimates.14 The EuroQol is bounded at 0 and 100, which limits the value of a Bland and Altman plot, so we also used a factorial ANOVA (SPSS for Windows, Release 6.1, SPSS Inc) to calculate the intraclass correlation coefficient, an appropriate measure of agreement for continuous data.15
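The two continuous-data measures mentioned above can be sketched in Python as follows. The sketch rests on assumptions: the text does not say which form of the intraclass correlation coefficient was used, so the two-way, absolute-agreement, single-rating form (often written ICC(2,1)) is chosen for illustration, and the scores are hypothetical. The original analysis was run in SPSS, not Python.

```python
from statistics import mean, stdev


def bland_altman(patient, proxy):
    """Mean difference (patient minus proxy) and 95% limits of agreement."""
    diffs = [a - b for a, b in zip(patient, proxy)]
    d_mean = mean(diffs)
    d_sd = stdev(diffs)
    return d_mean, (d_mean - 1.96 * d_sd, d_mean + 1.96 * d_sd)


def icc_two_way(patient, proxy):
    """ICC(2,1) from a two-way ANOVA with subjects as rows and raters as columns."""
    n, k = len(patient), 2
    rows = list(zip(patient, proxy))
    grand = mean(list(patient) + list(proxy))
    row_means = [mean(r) for r in rows]
    col_means = [mean(patient), mean(proxy)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


# Hypothetical visual analogue scores (0 to 100) for four patient-proxy pairs.
patient_vas = [70, 60, 80, 50]
proxy_vas = [65, 60, 70, 55]
d_mean, (lower, upper) = bland_altman(patient_vas, proxy_vas)
icc = icc_two_way(patient_vas, proxy_vas)
print(d_mean, round(icc, 2))
```

A positive mean difference here means the patients rated themselves higher than their proxies did; the limits of agreement describe how far an individual pair might plausibly differ.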
One hundred fifty-two patients participated in the study. Of these, 92 (61%) were able to complete the questionnaires independently and the remaining 60 patients (39%) could only be assessed by interview. The interviewer rated 6 of these 60 patients as having significant difficulties in communication by the OPCS communication subscale (all scored >5, which implies they are very difficult for strangers to understand or worse). We excluded the data on these 6 patients from the analyses because they were almost exclusively derived from the caregiver, not the patient, and so were not informative for the current analyses (which required that the patient should be able to provide information directly and equally well by interview or self-completed questionnaire). A proxy was available and completed a form for 130 patients (86%): 94 of 130 forms (72%) were completed at the time of the home visit and 36 were returned later by post (of these, 16 were completed within 1 day of the patient assessment and all but 1 were completed within 7 days of the patient assessment).
Agreement between the proxies’ and the patients’ estimate of HRQoL is shown in Table 1. Agreement was better for patients who were able to self-complete the EuroQol than for patients who required the EuroQol to be administered by interview. For both groups, agreement was best for the self-care domain and worst for the domain used to assess psychological outcome. For the more severely affected patients (assuming that the reasons for being unable to self-complete are generally stroke-related), agreement was only fair for the pain and social functioning domains and no better than chance alone for the psychological functioning domain (κ=0.05, 95% confidence interval [CI], 0 to 0.43).
Plotting the differences between the patients’ and proxies’ estimates of overall HRQoL against the mean score (Bland and Altman plot) showed an expected distribution for a score bounded at 0 and 100 (Figure). For all patients combined, the mean of the differences between the patients’ and proxies’ estimates of overall HRQoL was 2 (95% CI for a pair of differences=−38 to 42, Table 2); this indicates that the proxies’ estimates of overall HRQoL were not significantly different from the patients’. A factorial ANOVA also suggested that there was no statistically significant variance between patients’ and proxies’ numeric estimates of overall health status. For all patients combined, agreement between the patients’ and proxies’ estimates of overall health status was moderate, with an intraclass correlation coefficient of .49 (P<.0001). Agreement for the estimates of overall HRQoL was better for the subgroup of patients who were able to complete the EuroQol themselves (intraclass correlation coefficient, .53 versus .32 for patients unable to complete by themselves).
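As a rough check on the interval quoted above, the Python sketch below back-calculates the standard deviation of the paired differences. It assumes that the "95% CI for a pair of differences" is a Bland and Altman limits-of-agreement interval of the form mean ± 1.96 SD; the text does not state the multiplier, so the result is indicative only.

```python
# Values quoted in the text for all patients combined.
mean_diff = 2.0
lower, upper = -38.0, 42.0

# Under the assumed form (mean +/- 1.96 SD), the interval width is 2 * 1.96 * SD.
implied_sd = (upper - lower) / (2 * 1.96)
midpoint = (lower + upper) / 2

print(midpoint)              # 2.0, matching the reported mean difference
print(round(implied_sd, 1))  # 20.4
```

On a 0 to 100 scale, an implied standard deviation of about 20 points shows that individual patient-proxy pairs could differ substantially even though the average difference was small.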
When the categorical data were used, a higher proportion of patients reported “no problems” in each of the five domains than their proxies (Table 3). In these categorical domains of the EuroQol, the proxy estimated the level of functioning to be the same as that reported by the patient for 466 of the potential 640 outcomes. For 100 outcomes, the proxy estimated the functioning to be worse than that estimated by the patient. In contrast, there were only 74 outcomes for which the proxy estimate of the patients’ functioning was better than that reported by the patient (test for symmetry, P<.05).
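The text does not specify which test for symmetry was used. One plausible reading, shown in the Python sketch below, is a McNemar-type comparison of the two discordant totals (100 versus 74). This is an assumption, and it treats the 640 outcomes as independent even though up to five come from each patient-proxy pair.

```python
from math import erfc, sqrt


def mcnemar_symmetry(n_worse, n_better):
    """Chi-square statistic (1 df, no continuity correction) and two-sided P value."""
    chi2 = (n_worse - n_better) ** 2 / (n_worse + n_better)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Counts quoted in the text.
agree, proxy_worse, proxy_better = 466, 100, 74
total = agree + proxy_worse + proxy_better  # 640 outcomes

chi2, p_value = mcnemar_symmetry(proxy_worse, proxy_better)
print(total, round(chi2, 2), p_value < 0.05)  # 640 3.89 True
```

Under these assumptions the result is consistent with the reported P<.05, although only just; allowing for clustering within patients could change the P value.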
Many stroke survivors are unable to complete questionnaires measuring health status by themselves. The use of a proxy to assess a patient’s HRQoL should help increase the proportion of patients in trials and surveys of stroke therapy who have complete data. This should improve the quality and generalizability of the data. In the current study, we could obtain proxy assessments for 86% of the patients, which suggested that proxy assessment of HRQoL after stroke was generally feasible. Our analyses suggested that the patients and their proxies agreed reasonably well in their assessments of the patients’ HRQoL after stroke, at least for mobility and self-care. Agreement was less good for social functioning, pain, and the overall estimate of HRQoL and even worse for psychological functioning. The degree of agreement between proxy and patient varied and was better among less severely affected patients who completed the initial EuroQol themselves. However, proxy assessments would be of value if they could also be used in more severely affected patients who are unable to complete questionnaires themselves. The degree of agreement among patients who were more severely affected and had to have the EuroQol administered by interview may therefore provide a more realistic guide to the value of proxy assessments.
In this study, agreement was apparently lower among more severely affected patients. This loss of agreement could have been due to observer error by the proxy or a systematic difference due to the different mode of questionnaire administration. The latter notion is supported by a recent report that suggests that patients give a more optimistic picture of their health status when assessed by interview than by self-completed questionnaire.16 Furthermore, random errors may be important; the sample size was quite small (especially for the subgroup analysis in Table 1) so the 95% CI around each estimate of agreement is wide and does not exclude the possibility of substantially better agreement. There are a number of other possible sources for less than perfect agreement. When a patient and his/her proxy appear to disagree about the patient’s health status after a stroke, the following factors may contribute to the disagreement: the domain under study, systematic differences in perceived health (ie, bias), relationship of the proxy to the patient, random error, and the choice of statistic to measure agreement. The poor agreement for social functioning, pain, psychological functioning, and overall HRQoL probably reflected the subjective nature of these domains.
The proxy tended to report the patient’s problems as more severe than did the patient. This suggests that proxy assessments of HRQoL do indeed differ systematically from self-assessments. Were the patients more optimistic about their health status than their proxies, did the patients adjust to or fail to perceive their own deficits, or were the proxy responders being pessimistic? The patients’ view is likely to be more valid, as HRQoL instruments primarily aim to assess the patients’ subjective perception of their own health. However, we cannot be certain, since there is no accepted gold standard for the measurement of HRQoL.
We allowed the patient to decide who could act as their proxy (rather than stipulate that they must choose a spouse or a close family member). It is possible that some of the proxies were selected simply because they were available and so might not have known the patient well enough to complete the assessment accurately. If allowing the patient to choose the proxy does introduce some extra measurement error, the error might not be reduced by insisting that a family member is used as the proxy: regrettably not all blood relations are sufficiently familiar to assess their relatives’ HRQoL reliably! Furthermore, many patients do not have any family members living nearby and so a relatively imprecise estimate by a close friend may well be better than a very imprecise estimate from a distant family member and probably better than no estimate at all.
In our study, patients had to be able to complete the EuroQol either by themselves or by interview. We could not have assessed whether the proxy responses were valid for the patients who were unable to complete the EuroQol. Although we observed worse agreement for the patients who required the EuroQol to be administered by interview, we cannot necessarily infer that the agreement would have been even worse for even more severely affected patients (who have greater difficulties with communication) because the observed differences in agreement may have been due to the method of questionnaire administration.16 However, it seems likely that the use of proxies for patients who have difficulties with communication will have greater bias and measurement error because their relatives, friends, and caregivers will almost certainly have less insight into their perceived HRQoL.
The distribution of the random error is likely to be strongly influenced by the reproducibility of the EuroQol. In other words, some domains may be more prone to measurement error than others. It is possible that the more subjective domains have the worst reproducibility. A number of methodological factors may have caused us to underestimate the true level of agreement between patients and their proxies. First, since the EuroQol assesses the patients’ HRQoL on the day of completion, any delay in getting assessments from proxies who were not available at the time of the interview might have reduced the true level of agreement (as some of the patients could have changed). This effect is unlikely to be important because the majority of assessments (72%) were performed at the time of the home visit and nearly all of the remaining assessments were completed within 7 days of the home visit. An unweighted κ statistic may also underestimate the true level of agreement, because it ignores the ordering of the three levels of the EuroQol. Furthermore, the interval differences between each of the three levels of the EuroQol (“no problems,” “some problems,” and “severe problems”) are unlikely to be equal. The difference between “no problems” and “some problems” may be greater than that between “some problems” and “severe problems.” Weighting the κ statistic to get around these problems is not necessarily the solution, since any weights will inevitably be arbitrary. Finally, the dependence of the κ statistic on the prevalence of the underlying attribute being measured complicates its interpretation.11 Alternatively, it is also possible that we have overestimated the true level of agreement because we cannot be sure that some of the questionnaires returned by post were not completed with some input from the patient.
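The point about weighting can be illustrated with the Python sketch below, which applies three sets of agreement weights to the same hypothetical 3×3 table. The table and both non-identity weighting schemes are invented for the example; none comes from the study.

```python
def weighted_kappa(table, weights):
    """Weighted kappa; weights[i][j] is the credit (0 to 1) given when the
    patient chose level i and the proxy chose level j."""
    n = sum(sum(row) for row in table)
    k = len(table)
    row_p = [sum(table[i]) / n for i in range(k)]
    col_p = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    obs = sum(weights[i][j] * table[i][j] / n for i in range(k) for j in range(k))
    exp = sum(weights[i][j] * row_p[i] * col_p[j] for i in range(k) for j in range(k))
    return (obs - exp) / (1 - exp)


# Hypothetical counts (rows: patient, columns: proxy).
table = [
    [20, 5, 0],
    [4, 15, 3],
    [1, 2, 10],
]

# Identity weights reproduce the unweighted kappa.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
# Linear weights treat the three levels as equally spaced.
linear = [[1, 0.5, 0], [0.5, 1, 0.5], [0, 0.5, 1]]
# Uneven weights treat "no" versus "some" as a larger gap than "some" versus "severe".
uneven = [[1, 0.25, 0], [0.25, 1, 0.75], [0, 0.75, 1]]

k_unweighted = weighted_kappa(table, identity)
k_linear = weighted_kappa(table, linear)
k_uneven = weighted_kappa(table, uneven)
print(round(k_unweighted, 3), round(k_linear, 3), round(k_uneven, 3))
```

In this example the weighted values exceed the unweighted one because near-misses earn partial credit, and the two weighting schemes give different answers from identical data, which is the arbitrariness the text refers to.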
In a randomized trial or survey that measures HRQoL, allowing a proxy to respond on behalf of the patient has potential disadvantages: it may increase random error and so reduce the statistical power of the study to detect the treatment effect, particularly for the domain of psychological functioning,17 and it may also introduce bias. In an observational study, such bias might make the overall outcome appear worse than if the patient had responded. In a randomized trial, if the treatment were effective, this might reduce the number of patients with poor outcome who can only be assessed by proxy in the treatment group (but not in the control group) and so exaggerate the treatment effect. This type of bias would, however, not be expected to affect the direction of the treatment effect or its statistical significance. Furthermore, the above bias is not unique to the EuroQol because proxy assessments of more objective outcomes, eg, disability, are affected by a similar bias.18 In general, such “second order” biases are not very important. The use of proxy responses is likely to ensure a higher overall response rate that will substantially reduce the risk of random error and bias.
In summary, a proxy assessment appears feasible in a wide variety of patients. The proxy assessed the domains of mobility and self-care accurately and without major bias, although proxies tended to take a slightly more pessimistic view of the patients’ HRQoL. Therefore, for at least these domains, it seems reasonable to use proxy responses for the EuroQol in stroke patients who cannot complete questionnaires by themselves (especially if face-to-face interviews are not practicable). Proxy assessments of social functioning, pain, and overall HRQoL were associated with more error and must be interpreted more cautiously. Proxy assessments of psychological functioning were the least reliable, particularly in patients who required the EuroQol to be administered by interview, in whom they were no more accurate than chance alone. These findings are consistent with other evaluations of ratings by proxies.19 20 In general, allowing proxy responses where necessary is likely to be preferable to forbidding them in randomized controlled trials and many types of observational studies. However, where the focus of an observational study is an aspect of HRQoL other than physical functioning, the use of proxy responses may not be a good idea.
Paul Dorman is supported by a UK Medical Research Council Training Fellowship. Jim Slattery and Peter Sandercock are supported by grants from the UK Medical Research Council. The study was also supported by a grant from Glaxo-Wellcome plc.
- Received January 13, 1997.
- Revision received April 22, 1997.
- Accepted May 30, 1997.
- Copyright © 1997 by American Heart Association
Goodare H, Smith R. The rights of patients in research: patients must come first in research. BMJ. 1995;310:1277-1278.
Dorman P, Waddell F, Slattery J, Dennis M, Sandercock P. Is the EuroQol a valid measure of health-related quality of life after stroke? Stroke. 1997;28:1876-1882.
Dorman P, Slattery JM, Farrell B, Dennis MS, Sandercock PAG, and the United Kingdom Collaborators in the International Stroke Trial. A randomised comparison of the EuroQol and SF-36 after stroke. BMJ. 1997;315:461.
Segal ME, Schall RR. Determining functional/health status and its relation to disability in stroke survivors. Stroke. 1994;25:2391-2397.
Mahoney F, Barthel D. Functional evaluation: the Barthel Index. Md Med J. 1965;14:61-65.
Wellwood I, Dennis M, Warlow CP. A comparison of the Barthel Index and the OPCS Disability Instrument used to measure outcome after acute stroke. Age Ageing. 1995;24:54-57.
Brennan P, Silman A. Statistical methods for assessing observer variability in clinical measures. BMJ. 1992;304:1491-1494.
Altman DG. Practical Statistics for Medical Research. 1st ed. London, UK: Chapman & Hall; 1993.
Morton AP, Dobson AJ. Assessing agreement. Med J Austral. 1989;150:384-387.