Use of the Health Utilities Index With Stroke Patients and Their Caregivers
Background and Purpose Few studies currently assess the health-related quality of life of individuals following a stroke. One of the major challenges of assessing quality of life is the high likelihood that after a stroke a patient will not be able to complete such an assessment. One practical solution is to have a family caregiver complete the assessment on behalf of these individuals. This current pilot study examined the interrater reliability of having family caregivers complete the Health Utilities Index (HUI) on behalf of stroke patients.
Methods A total of 74 patients who experienced an ischemic stroke and 37 family caregivers completed the interviewer-administered HUI (data were available for 33 pairs). The HUI is designed to produce a single summary measure of health-related quality of life, the global multiattribute utility score, as well as descriptive information on each of its attributes. Interrater reliability was measured by evaluating the percent agreement, Cohen’s κ statistics, intraclass correlation coefficients (ICCs), Pearson’s R correlations, and paired t tests between the patient and caregiver responses.
Results In most instances interrater reliability was acceptable, with values suggesting moderate to high agreement. The mean global multiattribute utility scores for the HUI 2 were identical for patients and caregivers (0.64±0.29), with an ICC of .72. A preponderance of patients reported decrements in several attributes of the HUI.
Conclusions These data indicate a substantial decrement in functioning in stroke patients and suggest that family caregivers can complete the HUI reliably when patients are unable to do so.
Stroke is an often fatal or debilitating neurological disorder that results in approximately 150 000 deaths each year in the United States. It ranks as the leading cause of disability in adults and is the third leading cause of death in the United States.1 More than one half of all stroke survivors are left with disabilities that prevent them from returning to their prestroke level of health and productivity.2 The direct costs of treating stroke survivors and the indirect costs due to lost productivity have a significant impact on the health care system. In 1993, the total cost of stroke in the United States was estimated at $30 billion.2
Common residual effects of stroke include paralysis of the face, arms, and legs; impairment of thought processes; speech and language problems; and visual disturbances. In addition to affecting physical functioning, self-care, and the ability to live independently, stroke can cause anxiety and emotional distress in most patients. Health-related quality-of-life (QOL) assessments have particular relevance in stroke, because both the resulting decrements in motor, cognitive, sensory, and emotional functioning and any changes in these areas over time can be measured.
A major challenge in assessing health-related quality of life in this population is the high likelihood that following a stroke a patient will not be able to complete self-report assessments. One practical solution is to request that an individual who knows the patient well, such as a family caregiver, complete the assessment on his or her behalf. Including proxy data for incapable or unwilling patients results in an increased sample size and improves the representativeness of the study population.3 However, researchers disagree on the ability of proxies to accurately respond on behalf of the patient.
Prospective health-related QOL studies typically require serial administration of a questionnaire. Because a proportion of patients may be unable to answer for themselves owing to physical or cognitive deficits during the poststroke period, we examined the interrater reliability of having family caregivers complete the HUI on behalf of stroke patients. The HUI is one of three widely used multiattribute systems available to determine health state preferences. The other two are the Quality of Well-Being Scale4 and the EuroQoL.5 We compared responses between patients and their caregivers and investigated whether there were systematic differences between these two alternative respondents.
Subjects and Methods
A total of 74 individuals were recruited by study coordinators at seven medical centers in the United States. Eligible participants were those who (1) were at least 18 years of age, (2) had experienced an ischemic stroke within the last 3 months, (3) were literate in English, and (4) were competent to participate in a 15- to 30-minute face-to-face interview. Study coordinators also attempted to identify a family or “formal” caregiver for each patient who was knowledgeable about the patient’s condition. They were unable to recruit any “formal” caregivers into the study; therefore, data for family caregivers (usually a spouse, child, or other close relative) are presented. Caregivers were identified for 37 patients. They were required to be aged 18 years or older, play an active role in caring for the stroke patient (although they were not required to live with the patient), be literate in English, and be competent to participate in a 15- to 30-minute face-to-face interview. The range of activities involved in playing an “active role” in caring for the patient could vary from providing help performing activities of daily living to providing emotional support and were not directly specified.
Of the 74 patients recruited, 95% provided complete and evaluable responses, and 50% identified a family caregiver. Although we have data on 37 pairs of patients and caregivers, the current analyses are based on 33 cases for whom both complete patient and complete caregiver interviews were obtained. All interviews took place either at the medical center or at the subjects’ homes. Patients and caregivers were interviewed in separate rooms to reduce bias. All participants were offered a small honorarium for their time. All study materials were reviewed and approved by institutional review boards at each of the participating institutions.
Patients and family caregivers completed an interviewer-administered combined HUI 2/HUI 3, which addressed patient health status during the preceding week. The questionnaires were identical except for minor rephrasing to instruct the caregiver to answer on behalf of the patient. The HUI and the demographic characteristics analyzed are described briefly below.
Health Utilities Index
The HUI is a generic multiattribute system for the assessment of health status developed at McMaster University. It allows for the collection of comprehensive data on the functional status of a population. It is based on concepts of functional capacity rather than performance. In addition to providing descriptive information on each of the attributes addressed in the questionnaire, the HUI is designed to produce a single summary measure of health-related QOL. This global multiattribute utility score, currently available only for the HUI 2, is determined from an algorithm based on weights derived from a community-based Canadian sample. Therefore, data using the two components of the HUI (health status in terms of deficits [categorical data] and utility scores) can be reported.6 7
Because a utility-scoring algorithm is not currently available for the HUI 3, we were unable to calculate either single- or multiattribute utility scores for the HUI 3. However, it was possible to calculate mean, linearly scaled scores for each of the HUI 3 attributes. These scores were calculated by transforming the mean attribute scores to a 0.00 to 1.00 scale, with 1.00 representing the highest level of functioning.
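The linear scaling described above can be sketched as follows. This is a minimal illustration that assumes equally spaced levels, with level 1 mapped to 1.00 and the lowest level mapped to 0.00; the function names and the example responses are hypothetical and are not taken from the study data.

```python
def linear_scaled_score(level: int, n_levels: int) -> float:
    """Map an ordinal level (1 = best, n_levels = worst) onto a 0.00 to 1.00 scale.

    Assumption: levels are treated as equally spaced, which is what
    "linearly scaled" is taken to mean in this sketch.
    """
    if n_levels < 2:
        raise ValueError("an attribute needs at least two levels")
    if not 1 <= level <= n_levels:
        raise ValueError("level must lie between 1 and n_levels")
    return (n_levels - level) / (n_levels - 1)


def mean_scaled_score(levels, n_levels: int) -> float:
    """Average the linearly scaled scores of a group of respondents."""
    scores = [linear_scaled_score(lv, n_levels) for lv in levels]
    return sum(scores) / len(scores)


# Hypothetical responses from five respondents on a six-level attribute.
example_levels = [1, 2, 2, 4, 6]
example_mean = mean_scaled_score(example_levels, 6)
```

Because the transformation is linear, scaling each response and then averaging gives the same result as averaging the levels and then scaling the mean.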
The HUI 2, as used in this study, comprises six attributes: sensation (vision, hearing, and speech), mobility, emotion, cognition, self-care, and pain. The HUI 3 comprises vision, hearing, speech, ambulation, dexterity, emotion, cognition, and pain. For some attributes (eg, cognition), the HUI 2 and HUI 3 overlap.8 In other cases, different concepts for the same attribute (eg, emotion) are used in the HUI 2 and the HUI 3. In some cases, an attribute (eg, self-care) is included in the HUI 2 but not in the HUI 3; in others, an attribute (eg, dexterity) is included in the HUI 3 but not the HUI 2. Thus, although there is considerable overlap between the HUI 2 and the HUI 3, the two do provide complementary information of relevance to the assessment of morbidity in the context of stroke.
The HUI was chosen as the health status measure for the current study because its attributes are relevant to stroke, it is available in an interviewer-administered format (an important benefit in this elderly population), and its global multiattribute utility scoring function allows for the calculation of quality-adjusted life years, a useful measure for comparison of different treatment options.6 We chose to use a combined HUI 2/HUI 3 version that contains questions on the following nine attributes: emotion, cognition, self-care, pain, vision, hearing, speech, ambulation, and dexterity. Each attribute contains between four and six levels of ability, with level 1 corresponding to the highest functioning and level 6 to the lowest functioning. The selected level for each attribute in the HUI 2 system can be combined to yield a global multiattribute utility score ranging from −0.03 to 1.00.6 (Because the worst possible health state was judged by respondents to be worse than death, a negative global utility score can be obtained.) The face-to-face interview to obtain health status in the HUI 2/HUI 3 systems took approximately 20 minutes to complete.
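The link between a global utility score and quality-adjusted life years can be illustrated with a short sketch. It assumes the conventional definition of a quality-adjusted life year (utility multiplied by time spent in that health state, summed over states) with no discounting; the function name, the durations, and the second utility value are hypothetical.

```python
HUI2_MIN_UTILITY = -0.03  # lowest global HUI 2 score reported in the text
HUI2_MAX_UTILITY = 1.00   # highest global HUI 2 score reported in the text


def quality_adjusted_life_years(periods) -> float:
    """Sum utility x duration over (utility, years) pairs, without discounting."""
    total = 0.0
    for utility, years in periods:
        if years < 0:
            raise ValueError("duration cannot be negative")
        if not HUI2_MIN_UTILITY <= utility <= HUI2_MAX_UTILITY:
            raise ValueError("utility outside the HUI 2 scoring range")
        total += utility * years
    return total


# Hypothetical course: 2 years at a utility of 0.64, then 3 years at 0.80.
example_qalys = quality_adjusted_life_years([(0.64, 2.0), (0.80, 3.0)])
```

Under these assumptions, five calendar years count as roughly 3.68 quality-adjusted life years, which is the kind of quantity that allows treatment options to be compared on a common scale.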
The HUI 2/HUI 3 was pretested before use in the study. The purpose of the pretest was to ensure comprehension and acceptability. Individuals who had suffered an ischemic stroke within the preceding 3 months (n=13) and their family caregivers (n=11) took part in the pretest. As a result of this pretest, minor revisions were made.
Patient Sociodemographic and Clinical Information
Items include age, gender, education, total family income, present residence, current marital status, initial poststroke hospitalization length of stay, and duration of any rehabilitative hospitalizations.
Family Caregiver Information
The proxy characteristics collected include age, education, total family income, current marital status, relationship to the patient, and years married to the patient, if applicable.
We calculated the percentage of patients who were assigned the highest level for each attribute by patients and by caregivers. We calculated the percent agreement between the patient and caregiver responses as well as Cohen’s κ statistic.9 Following the guidelines in Robinson et al,10 percentage agreement above 80% was considered good agreement. Similar studies evaluating test-retest or interrater reliability have used the following criteria for κ statistics: (1) >0.80, excellent agreement; (2) 0.61 to 0.80, substantial agreement; (3) 0.41 to 0.60, moderate agreement; and (4) ≤0.40, poor agreement.3 8
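As an illustration of these two agreement measures, the sketch below computes percent agreement and the unweighted Cohen's κ for paired categorical ratings and labels κ with the criteria listed above. The function names and the ten example pairs are hypothetical; values falling between .60 and .61 or between .40 and .41, which the published criteria do not address, are assigned here to the higher band as an assumption.

```python
from collections import Counter


def percent_agreement(patient, caregiver) -> float:
    """Percentage of pairs in which both respondents chose the same level."""
    if len(patient) != len(caregiver) or not patient:
        raise ValueError("ratings must be paired and non-empty")
    matches = sum(p == c for p, c in zip(patient, caregiver))
    return 100.0 * matches / len(patient)


def cohens_kappa(patient, caregiver) -> float:
    """Unweighted kappa: observed agreement corrected for chance agreement."""
    n = len(patient)
    observed = percent_agreement(patient, caregiver) / 100.0
    p_counts, c_counts = Counter(patient), Counter(caregiver)
    # Chance agreement: product of the two marginal proportions per category.
    expected = sum(p_counts[k] * c_counts[k] for k in p_counts) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one category")
    return (observed - expected) / (1.0 - expected)


def describe_kappa(kappa: float) -> str:
    """Label kappa using the criteria quoted in the text."""
    if kappa > 0.80:
        return "excellent"
    if kappa > 0.60:
        return "substantial"
    if kappa > 0.40:
        return "moderate"
    return "poor"


# Hypothetical levels chosen by ten patient-caregiver pairs for one question.
patient_levels = [1, 1, 1, 2, 2, 2, 3, 3, 1, 2]
caregiver_levels = [1, 1, 1, 2, 2, 3, 3, 3, 2, 2]
agreement = percent_agreement(patient_levels, caregiver_levels)
kappa = cohens_kappa(patient_levels, caregiver_levels)
label = describe_kappa(kappa)
```

In this hypothetical example the pairs agree on 8 of 10 responses, yet κ is lower than the raw agreement because part of that agreement would be expected by chance alone.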
We also calculated the mean single-attribute utility scores (and standard deviations) for each of the attributes in the HUI 2 system for the patient group and the caregiver group. In a single-attribute utility function, the score for level 1 (the highest) is defined as 1.00, and the score for the lowest level of functioning for that attribute is defined as 0.00; intermediate levels have intermediate scores according to preference ratings.7 We also calculated the mean global multiattribute utility scores for the HUI 2 for each of these groups.
To ascertain the interchangeability of patient and caregiver responses, we performed matched-pair t tests on single-attribute and multiattribute utility scores. The single best measure of interchangeability is the intraclass correlation coefficient (ICC), which is a measure of concordance.11 It represents the proportion of total variability among two groups that is accounted for by the variability between pairs. The ICC is sensitive to differences in level or in scale between patients and caregivers. Cicchetti and Sparrow12 recommend the following guidelines for intraclass correlation coefficients: .75 or better indicates excellent interrater agreement; .60 to .74 indicates good agreement; .40 to .59 indicates fair to moderate agreement; and below .40 indicates poor agreement. For comparative purposes, we also reported Pearson’s R correlations comparing patient and caregiver single-attribute and multiattribute utility scores on the HUI 2. Pearson’s correlations are a familiar measure of agreement, but they may be misleading, because two groups with systematic differences may appear to be highly correlated.11 Unlike the ICC, Pearson’s correlation is not sensitive to differences in level or scale between patients and caregivers. We also calculated weighted and unweighted κ statistics. Cohen’s κ is used to measure the patient and caregiver agreement, adjusted for the amount of agreement expected by chance. The unweighted κ measures exact agreement. The weighted κ, appropriate for responses that are ordinal or continuous, weights disagreement according to how many categories apart the two responses are.
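The contrast between the ICC and Pearson's correlation can be illustrated with a minimal sketch. It assumes a one-way random-effects ICC for two raters, which is only one of several ICC formulations and is not confirmed as the form used in this study; the function names and the utility values are hypothetical.

```python
from math import sqrt


def pearson_r(x, y) -> float:
    """Pearson product-moment correlation between two paired lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def icc_one_way(x, y) -> float:
    """One-way random-effects ICC for two ratings per pair (an assumed form)."""
    n = len(x)
    grand = (sum(x) + sum(y)) / (2 * n)
    pair_means = [(a + b) / 2 for a, b in zip(x, y)]
    # Between-pair and within-pair mean squares for k = 2 ratings per pair.
    ms_between = 2 * sum((m - grand) ** 2 for m in pair_means) / (n - 1)
    ms_within = sum((a - m) ** 2 + (b - m) ** 2
                    for a, b, m in zip(x, y, pair_means)) / n
    return (ms_between - ms_within) / (ms_between + ms_within)


def describe_icc(icc: float) -> str:
    """Label an ICC using the Cicchetti and Sparrow guidelines quoted above."""
    if icc >= 0.75:
        return "excellent"
    if icc >= 0.60:
        return "good"
    if icc >= 0.40:
        return "fair to moderate"
    return "poor"


# Hypothetical utilities: every caregiver rates the patient exactly 0.2 lower.
patient_scores = [0.2, 0.4, 0.6, 0.8, 1.0]
caregiver_scores = [0.0, 0.2, 0.4, 0.6, 0.8]
r_shifted = pearson_r(patient_scores, caregiver_scores)
icc_shifted = icc_one_way(patient_scores, caregiver_scores)
```

With a constant difference between the two respondents, Pearson's correlation remains at 1.0 while this ICC falls to about .82, which shows why the ICC is the more appropriate measure of interchangeability.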
To evaluate the validity of using linearly scaled attribute scores for the HUI 3 rather than single-attribute utility scores, we calculated both linearly scaled and single-attribute utility scores for the HUI 2. We also compared these two scores for all respondents pooled together (n=66) using three measures of reliability: Pearson’s R correlations, paired t tests, and ICCs. For the HUI 3, we compared patient and caregiver linearly scaled attribute scores and assessed interrater reliability. Sample questions from the HUI, their associated descriptors from the health state classification system and single-attribute and linearly scaled utility values are included in the “Appendix.”
All statistical analyses were completed with the SAS System, Version 6.10, for Windows.13
Patient and caregiver sociodemographic data were compared for the 33 matched pairs for whom we had complete data.
Patient and Caregiver Description
The majority of patients (59%) were men, with an average age of 69.2 (range, 35 to 92) years. The majority of patients had finished high school (75%), were married or living with a partner (60%), and were living at home at the time of the interview (70%). The mean initial poststroke hospital stay was 9.7 days, and the mean length of any rehabilitative hospital stay was 10.0 days. Caregivers tended to be more educated and have a higher family income than patients, presumably because some (27%) were children or grandchildren of patients. Fifteen caregivers (58%) were married to patients for an average of 30.6 years. Other caregivers were either parents of the patients (4%) or significant others (12%). Some differences in demographic characteristics emerged when comparisons were made between patients who identified a caregiver and were included in the current analysis and those who did not identify a caregiver. Patients included in the study had significantly higher education and family income levels and were older. They were, however, comparable in the following areas: duration of the initial poststroke hospital stay, time elapsed since the stroke, current residence, marital status, and gender.
HUI 2 and HUI 3 Attributes
For the individual health status classification questions, the percentage agreement ranged from 68% to 94% (Table 1⇓). The agreements for eight of the eleven questions were above 80%, indicating good agreement. The lowest agreement was found for the distant vision question, perhaps because this ability is not discussed much between patients and caregivers. The κ statistics of interrater reliability for the individual health status classification questions ranged from .37 to .80. Most of the κ statistics were at least .40, indicating moderate to substantial agreement. The confidence intervals for these κ statistics were quite large, presumably due to the small sample size of this study. In addition, details on the percentage of patients and caregivers with deficits in each of the attributes are included as part of Table 1⇓.
The mean single-attribute utility scores for each of the HUI 2 attributes as well as the HUI 2 multiattribute utility score are presented for both patients and caregivers in Table 2⇓. Patients and proxies had similar single-attribute scores for most of the attributes, and the mean global multiattribute utility score for the HUI 2 was identical for patients and proxies. The lack of statistical significance in the paired t test for self-care indicates no systematic difference in mean response between patients and their caregivers. Pearson’s R correlations between patients and proxies were quite high (>.60) for all the HUI 2 attributes (except pain). All of these correlations were statistically significant (P<.001). The ICCs for the individual attributes in the HUI 2 were all good or excellent except for pain (.39). The ICC for the HUI 2 global multiattribute utility score was .72.
Table 3⇓ contains the linearly scaled attribute scores for patients and caregivers on the HUI 2. In comparing Tables 2⇑ and 3⇓, it is interesting to note that although the linearly scaled single-attribute scores appear to provide useful information for assessing interrater reliability, they provide quite different information from the single-attribute utility scores on the degree of morbidity for a number of attributes, including sensation, mobility, emotion, cognition, and pain. In assessing the degree of morbidity, linearly scaled scores are not a good substitute for single-attribute utility scores based on direct preference measurements. Patients and proxies had similar linearly scaled attribute scores for most of the HUI 2 attributes (Table 3⇓). Pearson’s R correlations between patients and proxies were quite high (>.60) for all the HUI 2 attributes. All of these correlations were statistically significant (P<.001). The κ statistics of interrater reliability for the linearly scaled attribute scores ranged from .33 to .75. All of the attributes except cognition had κ statistics that were at least .40, indicating moderate to substantial agreement. The ICCs for the linearly scaled HUI 2 attribute scores were all good to excellent.
When we pooled data for all patient and caregiver pairs and compared single-attribute utility scores versus linearly scaled attribute scores (Table 4⇓), we found that, in general, these two scores were similar. Pearson’s R correlations were good to excellent, although for all attributes except self-care, paired t tests reached statistical significance. For all the attributes, the linearly scaled means were the same as or lower than the mean single-attribute utilities. Table 5⇓ presents data on the HUI 3 comparing the patient and caregiver attribute scores calculated assuming linear properties (single-attribute and multiattribute utility scores are not yet available for the HUI 3). For HUI 3 cognition, patients rated themselves as more impaired than did their caregivers. For HUI 3 pain, caregivers rated the patients as significantly more impaired (as measured by paired t test; P=.044) than did the patients themselves. The ICCs for each attribute in the HUI 3 ranged from .65 for cognition to .85 for vision and dexterity.
Only one patient (3%) reported no functional limitations on the HUI 2 classification system (ie, all six attributes classified as level 1; data not shown). All 33 caregivers classified patients as having at least one deficit. There was also a high level of agreement between patients and caregivers on the number of attributes affected. For instance, 27 caregivers reported between three and six attributes affected, and 25 patients also rated between three and six attributes affected. On the HUI 3 classification system there was also high agreement between patients and caregivers. Twenty-six patients concurred with the 27 caregivers who reported between three and six attributes affected. In addition to the high concordance in the reporting between patients and caregivers, these data also suggest that stroke patients have a large decrement in functioning and well-being.
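The tally of affected attributes described above can be sketched as follows. This minimal example assumes that an attribute counts as affected whenever its level is worse than level 1, and it uses the three groupings reported for the six HUI 2 attributes; the function names and the four respondent records are hypothetical.

```python
from collections import Counter


def attributes_affected(levels: dict) -> int:
    """Count attributes rated worse than level 1 (assumed meaning of 'affected')."""
    return sum(1 for level in levels.values() if level > 1)


def bin_affected(count: int) -> str:
    """Group a count of affected HUI 2 attributes into the reported bands."""
    if count == 0:
        return "none"
    if count <= 2:
        return "one to two"
    return "three to six"


# Hypothetical HUI 2 classifications for four respondents (1 = best level).
respondents = [
    {"sensation": 1, "mobility": 1, "emotion": 1,
     "cognition": 1, "self_care": 1, "pain": 1},
    {"sensation": 2, "mobility": 1, "emotion": 1,
     "cognition": 2, "self_care": 1, "pain": 1},
    {"sensation": 2, "mobility": 3, "emotion": 2,
     "cognition": 2, "self_care": 1, "pain": 1},
    {"sensation": 3, "mobility": 4, "emotion": 2,
     "cognition": 3, "self_care": 3, "pain": 2},
]
distribution = Counter(bin_affected(attributes_affected(r)) for r in respondents)
```

Applying the same tally separately to patient and caregiver responses gives the two distributions whose similarity is described above.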
Although extensive research has been conducted on mortality and risk of recurrence in stroke patients, less research has been conducted on long-term disability and health-related QOL in these patients.14 Health status assessment should be included routinely (even in its briefest form) during the initial and poststroke period. Common barriers to its widespread application include methodological, practical, conceptual, and attitudinal issues.15 Barriers specific to a stroke population include (1) the fact that some survivors are unable to rate their own health-related QOL on standard measures because of impaired cognitive or communicative functioning, (2) the fact that patients may deny functional impairments, and (3) the difficulty of distinguishing the effects of stroke from those related to age and other morbidity and impairments. The results of this study offer a practical solution to resolving the first two of these impediments.
Several studies3 16 have found that proxies of elderly patients tend to overestimate patient disability relative to direct patient assessment, especially with regard to capacity to perform instrumental activities of daily living. Sprangers and Aaronson17 found that healthcare providers and significant others tend to underestimate physical symptoms such as pain and overestimate psychological conditions such as depression. They also found that the level of agreement tends to diminish with decreasing health status of the patient. Magaziner et al18 compared the responses of community-dwelling women aged 65 and older with their self-designated proxies. They found substantial agreement on questions relating to the presence of chronic conditions (eg, arthritis or glaucoma), physical tasks of daily living, and instrumental tasks of daily living. However, poor to moderate agreement (κ of .24 to .59) was found on questions regarding health symptoms. Robinson et al10 found that valid social functioning assessments can be made either by a stroke patient or by a familiar outside informant. Segal et al19 examined patient versus caregiver agreement for telephone administration of the Functional Independence Measure in a sample of 25 community-living stroke patients 18 months after stroke. They found excellent patient versus caregiver agreement for the total Functional Independence Measure score and the physical dimension score but lower agreement on the cognitive dimension.
We found that caregivers tended to overestimate certain abilities (eg, vision, hearing, and self-care) and to underestimate others (eg, distant vision, speech, and ambulation). We found substantial agreement between caregivers and patients on questions related to sensation, mobility, emotion, self-care, ambulation, and dexterity. However, poor to moderate agreement was found on questions related to cognition, pain, hearing, and speech.
Several studies have evaluated the reliability of using proxy respondents to determine patients’ preferences for their current state of health. In a study on the QOL of low-birth-weight infants at adolescence, Saigal et al20 found a high degree of consistency between teenager and parental responses on the HUI 2, visual analogue scale, and standard gamble assessments. Gemke and Bonsel21 evaluated the health status of 254 children admitted to intensive care by having parents, attending clinicians, and investigators complete the HUI 2 on behalf of the children. They found excellent interrater reliability across all the raters. However, Tsevat et al22 found that the health values and ratings of patients (measured with time tradeoff and visual analogue scale assessments) were generally higher than the ratings of their surrogates or their physicians.
Our results can be compared with those of Saigal et al,20 who evaluated the self-assessed health status and health-related QOL of extremely low-birth-weight infants during adolescence, using the HUI 2. The adolescents had limitations in cognition, sensation, self-care, and pain, but the mean global HUI 2 multiattribute utility score for this group (.87) was higher than the mean global HUI 2 multiattribute utility score for stroke patients (.64). In addition, Saigal et al found that 34% of the extremely low-birth-weight adolescents reported no HUI 2 attributes affected, 54% reported one to two affected and 12% reported three to six affected. We found that only 3% of stroke patients reported no HUI 2 attributes affected, 18% reported one to two affected, and 79% reported three to six affected.
Assessment of health status has been shown to be feasible in clinical settings. This information can be used to identify functional and emotional problems, assess response to treatment, and enhance communication between the physician and patient.15 Collecting data on the health status of stroke patients is crucial to facilitate treatment of the physical, psychological, and social effects of stroke. A stroke often results in long-term disability, and research is needed concerning factors that will improve the health-related QOL of stroke survivors and their families.23 Research into different stroke rehabilitation programs can also be enhanced by utilizing health-related QOL instruments that are sensitive to changes over time to facilitate the process of identifying which patients will likely benefit from which type of rehabilitative procedure.
In addition, these results suggest that the impact of stroke on the patient is substantial. As would be expected, the patients were greatly impaired in the areas of cognition, ambulation, mobility, and emotion. These results are consistent with other studies evaluating the impact of stroke on health-related QOL.14 24 25
The results of this study should be interpreted in light of several limitations. The sample size was small and did not allow for comparisons of different subgroups of family caregivers, such as spouses versus children or men versus women. In addition, the results should only be applied to the use of the HUI and not necessarily other health-related QOL measures. The HUI focuses on the physical and emotional dimensions of health status but does not include questions on the individual’s social functioning or satisfaction. Because the HUI is more oriented to physical health than other measures, it may be easier for caregivers to reliably complete the HUI on behalf of patients. Although we did find acceptable reliability on the more subjective domains of the HUI (such as emotion or pain), these results cannot be extended to all health-related QOL measures. In addition, because the scoring algorithm for the HUI 3 has not yet been developed, we were not able to calculate single-attribute or multiattribute utilities for the HUI 3.
Another limitation of our study is that we tested interrater reliability only between patients and their family caregivers. Therefore, our results cannot be extended to other proxy respondents such as formal caregivers or physicians. The benefit of using a family caregiver as a respondent is that the individual spends time every day with the patient and should be keenly aware of the patient’s physical and emotional functioning. In general, physicians seem to underestimate patients’ health-related QOL and underestimate the pain intensity experienced by their patients. However, they seem to overestimate patients’ feelings of anxiety, depression, and distress.17 In contrast to this, Gemke and Bonsel21 found no systematic differences among the responses of parents, clinicians, and investigators on the HUI 2. Because only one half of the stroke patients identified a family caregiver at the time of the outpatient visit and could therefore be included in our analyses, it is possible that the patients who participated in the current study were more impaired than those who came to their clinic visit independently (without a caregiver). If patients in our study were more impaired, it is quite likely that our results represent a conservative estimate of interrater reliability.
The moderate to high correlations between patient and caregiver responses support the use of caregivers to complete the HUI when patients are unable to do so. We found that the correlation between the patient and caregiver responses was acceptable even for more subjective health-related QOL domains such as pain or emotion. These results may have broad implications within the therapeutic area of stroke as well as in other conditions in which patients may be unable to respond to a health-related QOL questionnaire.
This study was supported by Janssen Pharmaceutica. (The authors own no stock or options in Janssen Pharmaceutica.) The authors thank Jane Donohue, PhD, for her assistance with the study and Megeen Egan for her secretarial support.
- Received February 20, 1997.
- Revision received June 24, 1997.
- Accepted July 21, 1997.
- Copyright © 1997 by American Heart Association
National Institute of Neurological Disorders and Stroke. Stroke research highlights. National Institutes of Health, 1994.
Matchar DB, Duncan PW. Cost of stroke. National Stroke Association: Stroke Clinical Updates. 1994;V:9-12.
Kaplan RM, Anderson JP, Ganiats TG. The Quality of Well-Being Scale: rationale for a single quality of life index. In: Walker SR, Rosser RM, eds. Quality of Life Assessment: Key Issues in the 1990s. Dordrecht, Netherlands: Kluwer Academic Publishers, 1993:65-94.
Feeny DH, Torrance GW, Furlong WJ. Health Utilities Index. In: Spilker B, ed. Quality of Life and Pharmacoeconomics in Clinical Trials. 2nd ed. Philadelphia, Penn: Lippincott-Raven Publishers; 1996:239-252.
Deyo RA, Diehr P, Patrick DL. Reproducibility and responsiveness of health status measures. Statistics and strategies for evaluation. Controlled Clin Trials. 1991;12(suppl 4):142S-158S.
SAS Institute. SAS Statistical Package for Personal Computers, Version 6.10. Cary, NC: SAS Institute.
Ahlsio B, Britton M, Murray V, Theorell T. Disablement and quality of life after stroke. Stroke. 1984;15:886-890.
Deyo RA, Patrick DL. Barriers to the use of health status measures in clinical investigation, patient care, and policy research. Med Care. 1989;27(suppl 3):S254-S268.
Magaziner J, Bassett SS, Hebel JR, Gruber-Baldini A. Use of proxies to measure health and functional status in epidemiologic studies of community-dwelling women aged 65 years and older. Am J Epidemiol. 1996;143:283-292.
Tsevat J, Cook EF, Green ML, Matchar DB, Dawson NV, Broste SK, Wu AW, Phillips RS, Oye RK, Goldman L. Health values of the seriously ill. Ann Intern Med. 1995;122:514-520.
Evans RL, Noonan WC, Bishop DS, Hendricks RD. Caregiver assessment of personal adjustment after stroke in a Veterans Administration Medical Center outpatient cohort. Stroke. 1989;20:483-487.
Torrance GW, Zhang Y, Feeny D, Furlong W, Barr R. Multi-attribute preference functions for a comprehensive health status classification system. Working Paper #92-18. Centre for Health Economics and Policy Analysis, McMaster University; 1992.