(Stroke. 2004;35:607.)
© 2004 American Heart Association, Inc.
Original Contributions
From the College of Pharmacy, University of Illinois at Chicago (A.S.P.); Division of Neurology (A.M.N., A.S.), Faculty of Medicine and Dentistry (J.A.J., D.H.F.), Faculty of Pharmacy and Pharmaceutical Sciences (D.H.F.), and Department of Mathematics and Statistics (K.C.C.), University of Alberta, Edmonton, Alberta; Institute of Health Economics (J.A.J., D.H.F.), Edmonton, Alberta; and Health Utilities Incorporated (D.H.F.), Dundas, Ontario, Canada.
Reprint requests to A. Simon Pickard, PhD, College of Pharmacy, Room 164, 833 S Wood St (MC886), University of Illinois at Chicago, Chicago, IL 60612. E-mail pickard1{at}uic.edu
| Abstract |
|---|
Methods An observational longitudinal cohort of 124 patients hospitalized after ischemic stroke and their family caregivers completed 2 generic health-related quality of life (HRQL) measures, the EQ-5D and the Health Utilities Index Mark 3 (HUI3), at baseline and were followed up for 6 months. Patient and proxy agreement was assessed by use of weighted κ or the intraclass correlation coefficient (ICC).
Results At baseline, the more observable domains of HRQL demonstrated greater agreement than the more subjective components. Cross-sectional point estimates of agreement were generally acceptable (ICC >0.70) for the EQ-5D Index and HUI3 summary scores when assessed ≥1 month after baseline. Agreement between change scores was generally poor to fair (ICC <0.60), but systematic bias was not observed for the indirect preference-based summary scores between baseline and 6 months.
Conclusions Results suggest that proxy assessments obtained 6 months after stroke are more reliable than those obtained within 2 to 3 weeks after stroke. Although proxy-assessed change scores for indirect preference-based summary scores of the EQ-5D and HUI3 provided suboptimal agreement with patient assessment, limited systematic bias may support their consideration as alternatives to missing data or statistical imputation. Further research into the validity and reliability of proxy assessments is suggested.
Key Words: observer variation outcome quality of life stroke assessment
| Introduction |
|---|
The reliability of proxy raters has been examined in several independent studies of generic HRQL measures in stroke, including investigations of the Health Utilities Index Mark 2 (HUI2) and Mark 3 (HUI3),4 EQ-5D,5 Health Status Questionnaire,6 and Sickness Impact Profile.2 Consistent with the general literature on proxy respondents, these studies found that patient-proxy agreement was stronger for physically based, observable attributes than for less observable, psychosocial attributes. Recommendations varied: investigators were supportive of using proxies (Sickness Impact Profile, HUI2/3),2,4 not supportive (Health Status Questionnaire),6 or conditionally endorsed proxy respondents as reliable assessors of the more observable domains of HRQL (EQ-5D).5
We were interested in using a longitudinal design to examine the reliability of proxy assessments using both the EQ-5D and HUI3 for several reasons. Previous studies in stroke have investigated each measure independently using cross-sectional designs. A longitudinal study using several HRQL measures concomitantly would be useful to corroborate agreement on different domains of HRQL across measures and to compare agreement at different points in time after the stroke event. In addition, we were interested in whether mean proxy-assessed change scores were systematically different from patient scores, an issue important in the analysis of clinical trial data.
Four specific hypotheses were proposed. First, mean cross-sectional HRQL scores from proxy assessment were expected to be lower than those from patient self-assessment.7 Second, patient-proxy agreement was expected to be greater at 6 months than at baseline as the patient became more clinically stable.8 Third, stronger patient-proxy agreement was expected for the more observable domains of HRQL7 such as mobility, ambulation, and self-care, and poorer agreement was expected on the less observable domains (eg, emotion). Fourth, patient-proxy agreement was expected to be poorest for the visual analog scale (VAS) of the EQ-5D. The EQ-VAS score involves a direct valuation of health, whereas the other summary scores are derived from health status classifications with a scoring algorithm based on community preferences. Thus, VAS scores reflect heterogeneity in both health status and the valuation of health states, whereas the other summary scores reflect only heterogeneity in health states.
| Subjects and Methods |
|---|
In addition to patient consent, patients were required to have a caregiver (proxy) who also consented to participate. The proxy was a family caregiver such as a spouse or partner, sibling, or offspring or, if unavailable, a friend. Both patient and caregiver had to be able to comprehend English and be ≥18 years of age. Patients were excluded if they had a life expectancy of <6 months for any medical reason, a history of previous degenerative or space-occupying brain disorder, hemorrhagic or lower brainstem stroke, subarachnoid hemorrhage or transient ischemic attack, coma, global or Wernicke's aphasia, or a history of dementia before stroke. Patients and caregivers had to live within 150 km of Edmonton, Alberta, and not be cognitively impaired in the judgment of the clinical assessor. The study was conducted through 2 large teaching hospitals in Edmonton, and ethics approval was obtained from the participating institutions and the Health Research Ethics Review Board at the University of Alberta.
The patient and proxy were requested not to discuss the items with each other during completion of the questionnaires. Assessments were performed at baseline and 1, 3, and 6 months after baseline. After baseline, research assistants contacted and visited the patients and caregivers to oversee questionnaire completion. Research assistants were permitted to assist physically in the completion of the surveys if the respondent requested help.
Standard 1-week recall versions of the HUI questionnaire for self-completion were administered to the patient and proxy and scored as recommended.10 HUI3 single-attribute utility scores are defined on a scale in which no impairment in that attribute (ie, normal) is assigned a score of 1.00 and severe impairment (eg, blind on the vision attribute) is assigned a score of 0. Overall scores are on a scale in which perfect health is equal to 1.00 and dead is equal to 0; negative scores imply health states worse than dead. The HUI3 includes vision, hearing, speech, ambulation, dexterity, emotion, cognition, and pain.
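For orientation, the overall HUI3 score is computed with a multiplicative multiattribute utility function. The form below is the one commonly reported from reference 10; the attribute-level multipliers b_1 to b_8 (one per attribute, equal to 1 at the normal level) are not reproduced here and must be taken from that source. These multipliers are distinct from the single-attribute utility scores described above.

```latex
u^{*} = 1.371\,\bigl(b_{1}\,b_{2}\,b_{3}\,b_{4}\,b_{5}\,b_{6}\,b_{7}\,b_{8}\bigr) - 0.371
```

When every attribute is at its normal level (all b_j = 1), u* = 1.371 - 0.371 = 1.00. Because the product shrinks toward 0 as impairments accumulate, u* can fall below 0 (to about -0.36 for the worst HUI3 state), which is how health states worse than dead arise on this scale.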
A standard version of the EQ-5D was administered that comprises a 5-domain health self-classification system and a VAS, described as a "feeling thermometer" rated from 0 to 100 and anchored by the worst and best imaginable health states.11 The health state vector from the self-classification system was transformed into a single, index-based preference score (EQ-Index) using the scoring algorithm from York (UK).12 The EQ-5D was amended so that proxies completed it from their view of the patient's health status. Because the HUI questionnaire has standardized instructions for completion by proxy, it preceded the EQ-5D in order of administration.
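The York scoring step can be illustrated with a short Python sketch. It assumes the UK time trade-off tariff usually attributed to reference 12: a constant decrement for any departure from full health, dimension-specific decrements for levels 2 and 3, and an extra "N3" decrement whenever any dimension is at level 3. The coefficients are reproduced as they are commonly cited and should be checked against the published tariff before real use; the function name `eq5d_index` is hypothetical.

```python
# Decrements as commonly cited for the UK (York) TTO tariff; verify before use.
CONSTANT = 0.081  # applied to every state other than 11111
N3 = 0.269        # applied once if any dimension is at level 3
DECREMENTS = {
    "mobility": {2: 0.069, 3: 0.314},
    "self_care": {2: 0.104, 3: 0.214},
    "usual_activities": {2: 0.036, 3: 0.094},
    "pain_discomfort": {2: 0.123, 3: 0.386},
    "anxiety_depression": {2: 0.071, 3: 0.236},
}
DIMENSIONS = list(DECREMENTS)  # dict order gives the EQ-5D dimension order


def eq5d_index(state: str) -> float:
    """Convert a 5-digit EQ-5D state (levels 1-3, eg "21123") to an index score."""
    levels = [int(ch) for ch in state]
    if len(levels) != 5 or any(level not in (1, 2, 3) for level in levels):
        raise ValueError(f"invalid EQ-5D state: {state!r}")
    if all(level == 1 for level in levels):
        return 1.0  # full health
    score = 1.0 - CONSTANT
    if 3 in levels:
        score -= N3
    for dimension, level in zip(DIMENSIONS, levels):
        if level > 1:
            score -= DECREMENTS[dimension][level]
    return round(score, 3)


print(eq5d_index("11111"), eq5d_index("33333"))
```

With these coefficients, `eq5d_index("11111")` returns 1.0 and the worst state `eq5d_index("33333")` returns -0.594, so the index, like the HUI3, can take values below 0.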
Differences in central location (median) of responses by patient and proxy to each dimension of the EQ-5D were tested with the sign test. Agreement on each EQ-5D item was evaluated with κ, weighted by squared differences.13 Differences between patient and proxy HUI3 single-attribute scores were assessed with the Wilcoxon signed-rank test (2 tailed). Agreement for HUI3 single-attribute scores and for the overall utility scores was assessed with a 1-way random-effects model-based intraclass correlation coefficient (ICC).14 Weighted κ/ICC agreement was generally used to interpret the level of agreement because it gives partial credit to paired responses that, although not perfectly concordant, are close together. However, the ICC relies on between-subject variance, and if the group is relatively homogeneous, the statistic will understate agreement. In such an instance, percent exact agreement is an informative, complementary statistic.
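The three agreement statistics can be sketched in plain Python. This is an illustrative implementation under stated assumptions, not the SPSS/SAS procedures used in the study: weighted κ uses quadratic (squared-difference) disagreement weights, the ICC is the one-way random-effects, single-measure form of Shrout and Fleiss (ICC(1,1)) with 2 raters per subject, and all function names are hypothetical.

```python
from itertools import product
from statistics import mean


def weighted_kappa_quadratic(patient, proxy, categories):
    """Cohen's kappa with quadratic (squared-difference) disagreement weights."""
    n = len(patient)
    pos = {c: i for i, c in enumerate(categories)}
    k = len(categories)
    # Joint distribution of (patient, proxy) responses as proportions.
    joint = [[0.0] * k for _ in range(k)]
    for a, b in zip(patient, proxy):
        joint[pos[a]][pos[b]] += 1 / n
    row = [sum(joint[i]) for i in range(k)]
    col = [sum(joint[i][j] for i in range(k)) for j in range(k)]
    cells = list(product(range(k), repeat=2))
    # Observed vs chance-expected squared disagreement; weight scaling cancels.
    observed = sum((i - j) ** 2 * joint[i][j] for i, j in cells)
    expected = sum((i - j) ** 2 * row[i] * col[j] for i, j in cells)
    return 1 - observed / expected


def icc_oneway(patient, proxy):
    """One-way random-effects, single-measure ICC for k = 2 raters per subject."""
    pairs = list(zip(patient, proxy))
    n, k = len(pairs), 2
    grand = mean([x for pair in pairs for x in pair])
    subject_means = [(a + b) / 2 for a, b in pairs]
    # Between-subjects and within-subjects mean squares from a one-way ANOVA.
    bms = k * sum((m - grand) ** 2 for m in subject_means) / (n - 1)
    wms = sum((a - m) ** 2 + (b - m) ** 2
              for (a, b), m in zip(pairs, subject_means)) / (n * (k - 1))
    return (bms - wms) / (bms + (k - 1) * wms)


def exact_agreement(patient, proxy):
    """Proportion of dyads giving identical responses."""
    return sum(a == b for a, b in zip(patient, proxy)) / len(patient)


# Hypothetical EQ-5D levels for 6 dyads (not study data).
patient = [1, 1, 2, 3, 2, 1]
proxy = [1, 2, 2, 3, 3, 1]
print(weighted_kappa_quadratic(patient, proxy, [1, 2, 3]),
      icc_oneway(patient, proxy), exact_agreement(patient, proxy))
```

Quadratic weights were chosen here because, as reference 13 shows, κ weighted this way approximates an ICC, which is why the two statistics can be read on a common scale.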
The magnitude of the systematic bias between patient and proxy mean scores was quantified with the standardized response mean (SRM), calculated as the mean patient-minus-proxy difference score divided by the standard deviation of the difference scores.15 Given that the SRM is a variant of effect size (d), an absolute standardized difference of |d|=0.2 was interpreted as a small effect, |d|=0.5 as a medium effect, and |d|≥0.8 as a large effect.2 A generic threshold of discrimination of minimally important differences in HRQL in chronic diseases has been estimated at half an SD.15 A guideline for interpretation of interrater reliability generalizability coefficients is as follows: poor (≤0.40), fair (0.41 to 0.59), good (0.60 to 0.74), and excellent (≥0.75).16
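The SRM and the two interpretive scales can be written as small helper functions. This is a sketch under assumptions: the SD of the differences uses the sample (n - 1) denominator, values between 0.40 and 0.41 are treated as fair, and the function names are hypothetical.

```python
from statistics import mean, stdev


def standardized_response_mean(patient, proxy):
    """Mean of (patient - proxy) differences divided by the sample SD of those differences."""
    diffs = [a - b for a, b in zip(patient, proxy)]
    return mean(diffs) / stdev(diffs)


def effect_size_label(d):
    """Label |d| with the 0.2 / 0.5 / 0.8 cut points; below 0.2 is treated as trivial."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "trivial"


def reliability_label(coefficient):
    """Bands: poor <=0.40, fair 0.41-0.59, good 0.60-0.74, excellent >=0.75 (0.40-0.41 counted as fair)."""
    if coefficient >= 0.75:
        return "excellent"
    if coefficient >= 0.60:
        return "good"
    if coefficient > 0.40:
        return "fair"
    return "poor"


# Hypothetical summary scores for 4 dyads (not study data).
srm = standardized_response_mean([0.5, 0.6, 0.7, 0.8], [0.4, 0.6, 0.6, 0.7])
print(round(srm, 2), effect_size_label(srm), reliability_label(0.70))
```

Note that a positive SRM here means patients rated their health higher than proxies did, matching the patient-minus-proxy convention described above.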
Results presented focus on baseline and 6-month assessment (further analyses available on request). Because of the nature of the research objectives, item nonresponses were not imputed. Values of P<0.05 were considered statistically significant, and confidence intervals (CIs) were calculated for the 95% level. Statistical analyses were performed with SPSS version 10.1.3 and SAS system version 8.01.
| Results |
|---|
≤60 (dependent) and 5% scoring ≥95 (independent). At 6 months, 15% of the sample had scores ≤60, and 50% had scores ≥95.
The number of patient/proxy respondents at each time point was 124/124 (t0), 108/104 (t1), 102/101 (t3), and 98/96 (t6). Of the 26 patients lost to follow-up, 8 patients died. Fewer than 5% of respondents had ≥1 item nonresponse to the EQ-5D and HUI. Differences in the number of dyads available for analysis between the EQ-5D and HUI (Tables 2 through 4) were due to the HUI scoring algorithm, which classifies inconsistent responses for multi-item attributes as missing data.
Proxies demonstrated a central tendency to report more problems than the patient on the EQ-5D at 6 months for self-care, pain/discomfort, and anxiety/depression (P<0.05) (Table 2). Agreement based on weighted κ was good for the more observable dimensions (mobility, self-care) and poor for the less observable dimensions (pain/discomfort, anxiety/depression) at baseline. Exact agreement and ICC point estimates generally improved at the 6-month follow-up, most notably for pain/discomfort.
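The sign test behind these central-tendency comparisons can be sketched as an exact binomial test on the non-tied pairs. This sketch assumes EQ-5D levels coded 1 (no problems) to 3 (extreme problems), so a higher proxy level means the proxy reported more problems; tied pairs are dropped, as is usual for the sign test, and the function name is hypothetical.

```python
from math import comb


def sign_test(patient, proxy):
    """Two-sided exact sign test on paired ordinal responses; ties are discarded.

    Returns (pairs where proxy reports more problems,
             pairs where proxy reports fewer problems,
             two-sided P value).
    """
    more = sum(1 for a, b in zip(patient, proxy) if b > a)
    fewer = sum(1 for a, b in zip(patient, proxy) if b < a)
    n = more + fewer
    if n == 0:
        return more, fewer, 1.0  # all pairs tied: no evidence of a shift
    k = min(more, fewer)
    # Probability of a split at least this lopsided under Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return more, fewer, min(1.0, 2 * tail)


# Hypothetical 6-month pain/discomfort levels for 8 dyads (not study data).
print(sign_test([1, 1, 2, 1, 2, 1, 1, 2], [2, 1, 2, 2, 3, 2, 2, 2]))
```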
ICC-based agreement at baseline was fair to good for the more observable attributes of the HUI3 (ambulation, dexterity) and improved at 6 months (Table 3). At both baseline and 6 months, proxies underestimated the extent of problems with hearing compared with patient assessment. Patient self-assessed cognition scores were systematically higher than proxy scores (P<0.05). Poor patient-proxy agreement persisted at 6 months on the attributes of speech, hearing, and cognition. The hearing attribute had poor ICC-based agreement yet a high level of exact agreement (85%) because of extreme discrepancy in a small subgroup of patients who reported that they were unable to hear at all, while their proxies reported that the patients had no problems hearing. Dexterity had poor exact agreement but fair ICC-based agreement because, although most patient-proxy responses were not identical, they were generally within 1 response category of each other.
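The hearing and dexterity patterns can be reproduced with small hypothetical data sets, invented for illustration and not taken from the study. A one-way random-effects, single-measure ICC is defined in the block so that it runs on its own.

```python
from statistics import mean


def icc_oneway(x, y):
    """One-way random-effects, single-measure ICC for 2 raters per subject."""
    pairs = list(zip(x, y))
    n = len(pairs)
    grand = mean([v for pair in pairs for v in pair])
    means = [(a + b) / 2 for a, b in pairs]
    bms = 2 * sum((m - grand) ** 2 for m in means) / (n - 1)
    wms = sum((a - m) ** 2 + (b - m) ** 2 for (a, b), m in zip(pairs, means)) / n
    return (bms - wms) / (bms + wms)


def exact_agreement(x, y):
    """Proportion of dyads giving identical responses."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


# Hypothetical hearing scores: 17 dyads agree on "no problem" (1.00); in 3 dyads
# the patient reports being unable to hear (0.00) while the proxy reports 1.00.
hearing_patient = [1.0] * 17 + [0.0] * 3
hearing_proxy = [1.0] * 20

# Hypothetical dexterity levels: never identical, but always 1 category apart.
dexterity_patient = [1, 2, 3, 4, 5, 6, 1, 2, 3, 4]
dexterity_proxy = [2, 1, 4, 3, 6, 5, 2, 1, 4, 3]

for name, x, y in [("hearing", hearing_patient, hearing_proxy),
                   ("dexterity", dexterity_patient, dexterity_proxy)]:
    print(f"{name}: exact {exact_agreement(x, y):.2f}, ICC {icc_oneway(x, y):.2f}")
```

In the hearing data almost all dyads sit at the same ceiling value, so there is little between-subject variance and the 3 extreme disagreements push the ICC slightly below 0 despite 85% exact agreement. In the dexterity data the scores spread across categories, so small 1-category disagreements leave the ICC high even though no responses match exactly.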
For summary scores, the magnitude of difference between patient and proxy mean scores was absent to small (Table 4). A statistically significant (P<0.001) and nontrivial difference of 10 points on the EQ-VAS at baseline disappeared at later assessments. EQ-Index mean patient scores were 0.04 to 0.06 higher than proxy mean scores across time points. All summary scores displayed a trend toward greater agreement after baseline. The hypothesis that patient-proxy agreement would be lower on the EQ-VAS than on the other summary scores was generally supported by ICC point estimates and CIs. Time elapsed between the date of stroke and assessment was not significantly correlated with patient-proxy difference scores (all Pearson's r<0.15). One-way ANOVA on difference scores detected no statistically significant differences among groups defined by the proxy's relationship to the patient or by stroke subtype.
Change score agreement between baseline and 6 months was poor (EQ-VAS) to fair (EQ-Index, HUI3) (Table 4). However, little systematic bias was detected at the group level. In a comparison of patient and proxy change scores, mean differences on the EQ-Index and HUI3 were not statistically significant and fell below the threshold for a small effect (|d|<0.2).
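The distinction between individual-level agreement and group-level bias for change scores can be illustrated with hypothetical EQ-Index values (not study data). Change is taken as the 6-month score minus the baseline score; agreement uses a one-way, single-measure ICC defined in the block, and bias uses the SRM of the patient-minus-proxy change differences. Function names are hypothetical.

```python
from statistics import mean, stdev


def icc_oneway(x, y):
    """One-way random-effects, single-measure ICC for 2 raters per subject."""
    pairs = list(zip(x, y))
    n = len(pairs)
    grand = mean([v for pair in pairs for v in pair])
    means = [(a + b) / 2 for a, b in pairs]
    bms = 2 * sum((m - grand) ** 2 for m in means) / (n - 1)
    wms = sum((a - m) ** 2 + (b - m) ** 2 for (a, b), m in zip(pairs, means)) / n
    return (bms - wms) / (bms + wms)


def change_scores(baseline, month6):
    """Change from baseline (t0) to 6 months (t6); positive values mean improvement."""
    return [t6 - t0 for t0, t6 in zip(baseline, month6)]


# Hypothetical EQ-Index values for 5 patient-proxy dyads (not study data).
patient_change = change_scores([0.4, 0.3, 0.2, 0.5, 0.3], [0.5, 0.6, 0.7, 0.7, 0.7])
proxy_change = change_scores([0.3, 0.4, 0.2, 0.3, 0.3], [0.6, 0.5, 0.6, 0.7, 0.6])

# Individual-level agreement on change.
icc_change = icc_oneway(patient_change, proxy_change)
# Group-level systematic bias: SRM of the patient-minus-proxy change differences.
diffs = [a - b for a, b in zip(patient_change, proxy_change)]
srm_change = mean(diffs) / stdev(diffs)

print(f"mean change: patient {mean(patient_change):.2f}, proxy {mean(proxy_change):.2f}")
print(f"ICC of change scores: {icc_change:.2f}; SRM of differences: {srm_change:.2f}")
```

In this toy example both groups improve by 0.30 on average and the SRM is essentially 0, yet the ICC of the change scores is poor, because individual proxies over- and underestimate change in roughly equal measure.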
| Discussion |
|---|
≥1 month after stroke. Proxy assessments of the direct preference-based EQ-5D VAS were less reliable, especially at the baseline assessment. Although only fair agreement was observed between patient and proxy change scores on the EQ-Index and HUI3, the magnitude of systematic differences between assessor types was generally trivial (|d|<0.20). For instance, mean EQ-5D index-based change scores (t0 to t6) were the same for patient assessment (0.32; SD=0.38) and proxy assessment (0.32; SD=0.39). For group-level inferences, deriving change scores from individual proxy assessments may therefore be reasonable when weighed against the alternatives: missing data, statistical imputation, or mapping from clinical evaluation.
In general, results were similar to those reported for summary scores in previous studies of the HUI and EQ-5D.4,5 HUI3 single-attribute and overall utility scores displayed a similar pattern of agreement, except that we observed less agreement on the hearing attribute.4 The fair to good agreement on the EQ-5D observed in the present study at 6 months was comparable to the agreement in a previous study of the EQ-5D5 for self-completers who survived at least 3 months after stroke. On all EQ-5D dimensions, proxy assessment was more reliable in stroke than in patients with dementia.17
In considering proxy assessment of the EQ-5D and HUI3 in stroke, the EQ-5D is briefer and simpler for proxies to complete but lacks attributes relevant to stroke that are included in the HUI3, such as cognition, speech, and dexterity. Interestingly, the attribute of cognition demonstrated poor to fair agreement between patient and proxy assessment, with proxy scores systematically lower at both baseline and 6 months. This discrepancy suggests that the validity of a different perspective, such as that of the family caregiver, should be considered as an additional criterion when evaluating the usefulness of proxy respondents in neurologically compromised conditions. Disagreement is not necessarily undesirable, and multiple viewpoints can be valid and informative.18 Caregivers may recognize functional limitations that patients are unaware of or deny. Arguably, HRQL assessments based on community preferences that are used to inform health care decision making need not be restricted to the patient perspective.
Generalizability was attenuated by the exclusion of patients who lacked identifiable informal caregivers and who were thus likely to have less social support. The study sample included fewer patients with mild stroke than the distribution of stroke types described in larger, comprehensive studies of stroke.9 The tertiary care hospitals in the study receive referrals of the more serious stroke cases in the region, and all participants were hospitalized for at least 1 day. The sample was well suited for studying the research question, but study results cannot be generalized to cognitively impaired stroke patients, for whom proxy assessments are especially salient. Finally, the HUI3 scoring algorithm potentially introduced a bias that favored more agreement because respondents with illogical response sets were filtered out.
In conclusion, we found that patient-proxy agreement using the EQ-5D and HUI3 was comparable to that in previous studies even though the sample consisted of relatively fewer patients with mild stroke. Results suggested that proxies are more reliable for assessing stroke patients using community preference-based summary scores of generic HRQL measures (eg, HUI3, EQ-5D Index) and that proxy assessments of direct preference-based scores (EQ-VAS) were less reliable. Patient-proxy assessments had greater agreement if performed ≥1 month after the stroke event. Sequential proxy assessments to obtain change scores are not recommended but may be considered against alternatives such as statistical imputation or mapping from clinical evaluation.
| Acknowledgments |
|---|
Received June 12, 2003; revision received September 4, 2003; accepted October 14, 2003.
| References |
|---|
2. Sneeuw KCA, Aaronson NK, de Haan RJ, Limburg M. Assessing the quality of life after stroke: the value and limitations of proxy ratings. Stroke. 1997; 28: 1541-1549.
3. Staquet MJ, Hays RD, Fayers PM, eds. Quality of Life Assessments in Clinical Trials. 2nd ed. New York, NY: Oxford University Press; 1998: 249-280.
4. Mathias SD, Bates MM, Pasta DJ, Cisternas MG, Feeny D, Patrick DL. Use of the Health Utilities Index with stroke patients and their caregivers. Stroke. 1997; 28: 1888-1894.
5. Dorman PJ, Waddell F, Slattery JM, Dennis M, Sandercock PA, for the United Kingdom Collaborators in the International Stroke Trial. Are proxy assessments of health status after stroke with the EuroQol questionnaire feasible, accurate, and unbiased? Stroke. 1997; 28: 1883-1887.
6. Segal ME, Schall RR. Determining functional/health status and its relation to disability in stroke survivors. Stroke. 1994; 25: 2391-2397.
7. Sprangers MA, Aaronson NK. The role of health care providers and significant others in evaluating the quality of life of patients with chronic disease: a review. J Clin Epidemiol. 1992; 45: 743-760.
8. Kelly-Hayes M, Wolf PA, Kase CS, Gresham GE, Kannel WB, D'Agostino RB. Time course of functional recovery after stroke: the Framingham Study. J Neurol Rehab. 1989; 3: 65-70.
9. Bamford J, Sandercock P, Dennis M, Burn J, Warlow C. Classification and natural history of clinically identifiable subtypes of cerebral infarction. Lancet. 1991; 337: 1521-1526.
10. Feeny D, Furlong W, Torrance GW, Goldsmith CH, Zhu Z, DePauw S, Denton M, Boyle M. Multiattribute and single-attribute utility functions for the health utilities index mark 3 system. Med Care. 2002; 40: 113-128.
11. Brooks R. EuroQol: the current state of play. Health Policy. 1996; 37: 53-72.
12. Dolan P. Modeling valuations for EuroQol health states. Med Care. 1997; 35: 1095-1108.
13. Fleiss JL, Cohen J. The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educ Psychol Meas. 1973; 33: 613-619.
14. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull. 1979; 86: 420-428.
15. Norman GR, Sloan JA, Wyrwich KW. Interpretation of change in health-related quality of life. Med Care. 2003; 41: 582-592.
16. Cicchetti DV, Sparrow SA. Developing criteria for establishing interrater reliability of specific items: applications to assessment of adaptive behavior. Am J Ment Defic. 1981; 86: 127-137.
17. Coucill W, Bryan S, Bentham P, Buckley A, Laight A. EQ-5D in patients with dementia: an investigation of inter-rater agreement. Med Care. 2001; 39: 760-771.
18. Feeny DH, Furlong W, Barr RD. Multiattribute approach to the assessment of health-related quality of life: Health Utilities Index. Med Pediatr Oncol. 1998; 31 (suppl 1): 54-59.