Development of a Stroke-Specific Quality of Life Scale
Background and Purpose—Clinical stroke trials are increasingly measuring patient-centered outcomes such as functional status and health-related quality of life (HRQOL). No stroke-specific HRQOL measure is currently available. This study presents the initial development of a valid, reliable, and responsive stroke-specific quality of life (SS-QOL) measure, for use in stroke trials.
Methods—Domains and items for the SS-QOL were developed from patient interviews. The SS-QOL, Short Form 36, Beck Depression Inventory, National Institutes of Health Stroke Scale, and Barthel Index were administered to patients 1 and 3 months after ischemic stroke. Items were eliminated with the use of standard psychometric criteria. Construct validity was assessed by comparing domain scores with similar domains of established measures. Domain responsiveness was assessed with standardized effect sizes.
Results—All 12 domains of the SS-QOL were unidimensional. In the final 49-item scale, all domains demonstrated excellent internal reliability (Cronbach’s α values for each domain ≥0.73). Most domains were moderately correlated with similar domains of established outcome measures (r² range, 0.3 to 0.5). Most domains were responsive to change (standardized effect sizes >0.4). One- and 3-month SS-QOL scores were associated with patients’ self-report of HRQOL compared with before their stroke (P<0.001).
Conclusions—The SS-QOL measures HRQOL, its primary underlying construct, in stroke patients. Preliminary results regarding the reliability, validity, and responsiveness of the SS-QOL are encouraging. Further studies in diverse stroke populations are needed.
Stroke is the leading cause of adult disability and the third leading cause of adult death in the industrialized world. Despite the enormous personal and societal impact of stroke, the best method for measuring stroke outcome is not clear. As new drugs are assessed in stroke clinical trials, it is critical to measure outcomes that are both relevant and important to stroke patients. Some commonly used stroke outcome measures, such as the Barthel Index (BI)¹ and the Short Form 36 (SF-36),² for example, have no assessment of language. Thus, patients with severe aphasia may have a normal score on these measures and therefore be classified as having “good” outcome for purposes of analysis of drug efficacy. Other domains often neglected in stroke outcome assessments are cognitive, psychological, and social function.
Because of these deficiencies, clinical trials are increasingly emphasizing patient-centered outcomes such as functional status and health-related quality of life (HRQOL). HRQOL is broadly conceptualized as the physical, psychological, and social aspects of life that may be affected by changes in health states.³ HRQOL can be measured with generic or disease-specific measures. Generic measures are designed to compare HRQOL across populations or different diseases; disease-specific measures are designed to assess HRQOL with questions and scales that are specific to a disease or condition.⁴ Ideally, patient-centered outcomes like HRQOL are more relevant to individuals, but these measures are relevant in specific disease states only insofar as the measure incorporates questions about functions typically affected by that disease.
Assessing HRQOL is difficult in stroke, in which patients have heterogeneous stroke symptoms and deficits and also commonly suffer from psychological and social sequelae of stroke. Currently, stroke trials use generic HRQOL measures such as the SF-36⁵ and the EuroQol.⁶ Generic measures, however, have several problems when applied to stroke patients, including the following: (1) content validity of the domains, that is, appropriate areas of potential dysfunction may not be assessed (the EuroQol, for example, does not include arm/hand or language assessments); (2) content validity of the items, that is, meaningful questions to quantify function in a specific area may not be asked; and (3) sensitivity to change or responsiveness, that is, generic measures may not detect clinically important changes in HRQOL.⁷ The ability of an instrument to be sensitive to within-patient change is especially important in clinical trials. Other special difficulties in assessing HRQOL in stroke patients include the necessity of relying on proxy responses for patients with language and cognitive impairment. Although the EuroQol has been validated for proxy completion in stroke,⁸ no stroke-specific measure has been similarly validated.
A responsive HRQOL measure that includes domains commonly affected by stroke would be useful both to evaluate treatment efficacy in patients with different deficits and to assess the impact of various types of stroke on HRQOL. Establishing such a reliable, valid, responsive instrument suitable for proxy completion is the ultimate goal of this line of research. This process, however, is one that takes time and large cohorts of patients. The aim of this study was to begin the process of developing a patient-derived, responsive stroke-specific quality of life (SS-QOL) measure, designed for use in stroke clinical trials. The first steps in this process are described here, including (1) the development of items and domains with the use of qualitative data from stroke survivors and (2) initial reliability, validity, and responsiveness data from the first patient sample. Future aims are to compare the SS-QOL with other HRQOL instruments, validate it in a larger and more heterogeneous sample of stroke patients, and assess proxy completion.
Subjects and Methods
To establish domain and item content validity, one investigator conducted focused interviews with 32 ischemic stroke survivors to identify common domains that affect stroke patients’ HRQOL. Subjects without significant cognitive or language impairment were identified from our stroke clinics and from a stroke survivor group on the Internet. Patients were interviewed 1 to 6 months after stroke and were asked to identify the 3 areas most affected by their stroke. A list of commonly affected domains was used so that areas not mentioned by patients could be uniformly queried. Specific examples of activities or functions affected in the domains were sought.
Items in each identified domain were generated from these responses and from review of other stroke and HRQOL instruments. We purposely included 2 to 3 times the number of items we desired in the final instrument. Three response sets were developed on a 5-point Likert scale: (1) amount of help required to do specific tasks, ranging from no help to total help, (2) amount of trouble experienced when attempting tasks, ranging from unable to do it to no trouble at all, and (3) degree of agreement with statements regarding their functioning, ranging from strongly agree to strongly disagree. The point of reference for all items was the past week. The complete set of items was then reviewed by experts in neurology, physical medicine, and rehabilitation and by stroke survivors. The items were pilot tested in patients 1 to 3 months after ischemic stroke. After administration to 3 patients, comments were reviewed and changes made; this process was repeated among 5 groups of 3 patients, at which time no substantial changes were suggested.
In addition to the individual items of the SS-QOL, questions asking patients to rate each domain and their overall HRQOL compared with before the stroke as a lot worse, a little worse, or the same were added. These patient-generated ratings of the domains and overall HRQOL were included so that they could be used as the standard against which changes in individual item and domain scores were compared.
Patients aged >18 years with acute ischemic stroke were identified from the 3 adult hospitals on the Indiana University School of Medicine campus (a veterans hospital, a county hospital, and a tertiary-referral hospital). Those who were able to return for follow-up 1 month after stroke were considered for participation in the study. Patients were excluded if they met any of the following criteria: (1) prior stroke with persistent deficit, (2) intracerebral or subarachnoid hemorrhage, (3) dysphasia at 1 month after stroke such that meaningful communication could not be established, and (4) significant comorbidities likely to concurrently affect HRQOL (eg, class III or IV heart failure, peritoneal dialysis, hemodialysis, preexisting musculoskeletal disease significantly limiting physical function, metastatic cancer, active psychiatric disease or dementia, and diagnosis of HIV infection or AIDS). Patients were paid an honorarium of $5 at each clinic visit.
Study Instruments and Follow-Up
Patients were seen at 1 and 3 months (±1 week) after stroke. Baseline data included the following: (1) demographic information, (2) location and size of stroke, (3) ischemic stroke subtype according to Trial of Org 10172 in Acute Stroke Treatment (TOAST) criteria,⁹ and (4) length of stay and discharge disposition. Initial stroke severity was determined retrospectively in previously validated fashion with the Canadian Neurologic Scale.¹⁰ At 1 and 3 months after stroke, an interviewer administered the following instruments: (1) SS-QOL, (2) SF-36,² and (3) Beck Depression Inventory (BDI).¹¹ To eliminate bias due to order of instrument administration, the SS-QOL, SF-36, and BDI were administered in random order, with each patient’s order of administration at 1 month repeated for the 3-month visit. Two interviewers conducted all assessments, and interviewer assignment was consistent for subsequent patient visits. One- and 3-month National Institutes of Health Stroke Scale (NIHSS)¹² and BI¹ scores were completed by a neurologist certified in their administration and blinded to scores on the other instruments.
Item Reduction and Reliability
Items were evaluated within each domain with 3 statistical techniques: (1) exploratory factor analysis (EFA), a technique to determine whether the items form a single underlying factor; (2) Cronbach’s α, a measure of internal consistency; and (3) change in mean item score between 1 and 3 months. With EFA, we used eigenvalues of >1.0 to identify separate domains, hypothesizing that each domain would be unidimensional, and used a cutoff of ≥0.40 for factor loadings for each item.¹³ By prespecified criteria, domains with eigenvalues <1.0 and/or <3 items with factor loadings ≥0.4 were not considered reliable. We used Cronbach’s α to identify items whose removal increased the internal reliability of that domain. We compared mean item change scores between 1 and 3 months after stroke for each item, keeping items whose mean change score discriminated between patients with self-reported improvement in that domain versus those with no improvement. With the use of these 3 criteria, each item was evaluated for removal from the SS-QOL. If 2 criteria suggested removal, the item was removed. If only 1 of the 3 criteria suggested removal, the contribution of the item to the content validity of its domain was also considered. For example, an unresponsive item that loaded on and increased the α in a domain was removed if it did not contribute significantly to the content validity of that domain. To assess internal reliability of the final domains, Cronbach’s α values were recalculated after items were removed.
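The internal-consistency criterion described above can be illustrated with a minimal Python sketch of Cronbach’s α and of the α obtained when each item is removed in turn. This is not the SPSS procedure used in the study; the function names and the toy scores are hypothetical, and sample variances are assumed.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for one domain.

    `items` is a list of item columns; each column holds one score
    per patient, in the same patient order (hypothetical layout).
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least 2 items")
    # Total domain score for each patient (sum across items).
    totals = [sum(scores) for scores in zip(*items)]
    sum_item_var = sum(variance(col) for col in items)
    return (k / (k - 1)) * (1 - sum_item_var / variance(totals))


def alpha_if_deleted(items):
    """Alpha recomputed with each item left out in turn."""
    return [cronbach_alpha(items[:i] + items[i + 1:])
            for i in range(len(items))]


# Toy data: 3 items scored 1-5 for 5 patients; the third item
# does not track the other two.
toy = [[1, 2, 3, 4, 5], [2, 2, 3, 5, 5], [5, 1, 4, 2, 3]]
baseline = cronbach_alpha(toy)
without_each = alpha_if_deleted(toy)
```

In this toy domain, dropping the third item raises α well above the baseline, which is the pattern that would flag an item for possible removal under the second criterion.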
Once items were deleted, we calculated average item scores separately for each domain. All items were scored from 1 to 5, with higher scores representing more normal function. Relationships of SS-QOL domains and the selected “gold standard” measure of each domain for construct validation analyses were specified a priori. Construct validity of individual domains was established by comparing the linear association of the domain score with the score of an established outcome measure for that domain. The domains and corresponding outcome measures can be found in the first 2 columns of Table 3. SS-QOL score, the average of all domain scores, was compared with other outcome measure scores by ANOVA in patients rating their overall HRQOL as a lot worse, a little worse, or the same as before their stroke.
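The scoring rule just described (domain score as the mean of its items, overall score as the unweighted mean of the domain scores) can be sketched as follows. The data layout, function names, and example responses are hypothetical and are given only to make the arithmetic concrete.

```python
def domain_scores(responses):
    """Mean item score per domain.

    `responses` maps a domain name to that patient's item scores,
    each on the 1-5 scale (higher = more normal function).
    """
    scores = {}
    for domain, items in responses.items():
        if not items:
            raise ValueError(f"no items for domain {domain!r}")
        if any(not 1 <= s <= 5 for s in items):
            raise ValueError(f"item score out of range in {domain!r}")
        scores[domain] = sum(items) / len(items)
    return scores


def overall_score(responses):
    """Unweighted mean of the domain scores."""
    per_domain = domain_scores(responses)
    return sum(per_domain.values()) / len(per_domain)


# Hypothetical patient with three domains only, for brevity.
patient = {
    "mobility": [4, 5, 3],
    "language": [5, 5],
    "mood": [2, 3, 4, 3],
}
per_domain = domain_scores(patient)
total = overall_score(patient)
```

Because domains are averaged before being combined, a domain with many items carries the same weight in the overall score as a domain with few items.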
Domain responsiveness to change was estimated in patients reporting dysfunction in that domain with the use of standardized effect sizes (SES). Domains were categorized as affected at 1 month after stroke if the patient reported that the domain overall was not the same as before the stroke. The SES was calculated by dividing the change between 1- and 3-month scores by the SD of the 1-month scores. By convention, SES scores of <0.2 were considered nonresponsive; 0.2 to 0.5, mildly responsive; 0.51 to 0.7, moderately responsive; and >0.7, markedly responsive to change.¹⁴ All analyses were done with SPSS statistical software.
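A minimal sketch of the SES calculation described above is given here. It assumes that "change" means the mean of the paired 3-month minus 1-month domain scores and that the SD is the sample SD of the 1-month scores; neither detail is stated explicitly in the text, and the function name and example values are hypothetical.

```python
from statistics import mean, stdev


def standardized_effect_size(month1, month3):
    """Mean paired change divided by the SD of the 1-month scores.

    `month1` and `month3` hold domain scores for the same patients,
    in the same order.
    """
    if len(month1) != len(month3):
        raise ValueError("scores must be paired by patient")
    changes = [later - earlier for earlier, later in zip(month1, month3)]
    return mean(changes) / stdev(month1)


# Hypothetical domain scores for 3 affected patients.
ses = standardized_effect_size([2.0, 3.0, 4.0], [3.0, 4.0, 5.0])
```

Here every patient improves by 1 point and the 1-month SD is 1, so the SES is 1.0.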
Results
The 32 poststroke patients interviewed for item generation identified 12 commonly affected domains: energy, family roles (defined by patients as relationships and work within the family), language, mobility, mood, personality, self-care, social roles (defined by patients as relationships with friends and activities outside the home), thinking, upper extremity function, vision, and work/productivity (defined by patients as necessary activities done within or outside the home). Domains most frequently identified as 1 of the 3 most affected were as follows: hand/arm function (56%), family roles (56%), language (56%), mobility (31%), work/productivity (28%), cognitive (19%), mood (19%), and energy (13%). Within these 12 domains, 78 items were generated (Appendix). Examples of specific items mentioned by patients and incorporated into SS-QOL items include the following: “I felt unsteady when I was walking” (mobility domain), “I had trouble with my handwriting” (upper extremity domain), and “I feel like a burden to my family” (family roles domain). No domain had fewer than 3 items.
Between August 1, 1997, and June 1, 1998, 72 patients were enrolled in the study. The mean age of the subjects was 61 years; 63% were male, 25% were black, and 18% had no health insurance. Most patients had mild stroke, with mean Canadian Neurologic Scale at admission of 9.2 (range, 2.0 to 11.5; best score possible, 11.5). NIHSS scores at 1 and 3 months were ≤1 in 43% and 63%, respectively, and 1- and 3-month BI scores were ≥95 in 81% and 89%, respectively. Mean SF-36 physical function score was 49 at 1 month and 59 at 3 months after stroke. By TOAST criteria, 51% had lacunar strokes, 61% had strokes ≤1 cm on CT or MRI, and 58% of strokes were in the deep gray or subcortical white matter. Overall HRQOL was rated as the same as prestroke in 48% and 59% at 1 and 3 months, respectively. The proportion of patients affected in each domain 1 month after stroke is shown in Table 1.
Item Reduction and Reliability
By the criteria outlined above, all domains were unidimensional. With the use of the 3 techniques of EFA, Cronbach’s α, and sensitivity to patient-reported improvement, 29 items were deleted from the SS-QOL. After item deletion, the internal reliability of the domains remained quite high, with α scores ≥0.73 in all domains (Table 2). The response set asking the amount of help needed to perform an activity was relatively unresponsive to change between 1 and 3 months. Because of this lack of responsiveness, all items with the “help” responses, except for the self-care items, which were almost exclusively in this set, were removed.
Construct Validity of Domains
One-month scores on energy, family roles, mobility, mood, personality, self-care, and work domains were significantly linearly associated with the corresponding scores of the BI, BDI, and subscales of the SF-36 (Table 3). Scores in the language and thinking domains were not associated with selected items from the NIHSS. This most likely occurred because patients with language and cognitive deficits were excluded, ie, there were no patients with a score >1 on these items. A ceiling effect of the NIHSS also likely accounts for the lack of linear association in the upper extremity domain; although 62% of patients reported upper extremity dysfunction 1 month after stroke, only 11% had an NIHSS arm score >1. A significant ceiling effect was also seen with the BI (81% with a score ≥95), and a moderate floor effect was seen with the SF-36 physical role limitations subscale (49% with score=0). The SS-QOL social roles domain was not linearly associated with the SF-36 social functioning subscale score. Mean social roles domain scores were significantly different in patients reporting their social roles as a lot worse, a little worse, and the same as before stroke (mean domain scores, 1.98, 2.87, and 3.07, respectively; P=0.006), but mean SF-36 social functioning scores were not different in these groups (48, 50, and 47, respectively; P=0.84).
We assessed domain responsiveness between 1 and 3 months after stroke in subjects affected in that domain. Most of the domains demonstrated moderate responsiveness, with SES scores >0.5 (Table 4). The mood and personality domains were noticeably less responsive across all instruments: SES scores for the BDI and SF-36 mental health subscale were <0.2.
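The strength of a linear association between an SS-QOL domain score and its comparison measure can be summarized by r². The sketch below computes r² as the squared Pearson correlation, which equals the r² of a simple linear regression; that this matches the exact SPSS analysis used here is an assumption, and the function name and data are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between paired scores x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least 2 paired observations")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r squared is undefined for constant scores")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical paired scores: a domain score and a comparison measure.
perfect = r_squared([1, 2, 3, 4], [2, 4, 6, 8])
weaker = r_squared([1, 2, 3, 4], [2, 1, 4, 3])
```

A strong ceiling effect in the comparison measure, as described above for the NIHSS and BI, compresses the variance of one variable and tends to push r² toward 0 even when patients report dysfunction.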
Overall SS-QOL Performance
The overall SS-QOL score at both 1 and 3 months increased significantly in patients reporting their poststroke HRQOL as a lot worse, a little worse, or the same as prestroke, respectively (Figure). For ease of graphic presentation, the scores of each instrument have been standardized, with the instrument’s score in each HRQOL group divided by the highest score on that instrument. Mean instrument scores at both time points are shown in Table 5. The pattern of SS-QOL scores was similar to the SF-36. The NIHSS score was different at 1 but not 3 months in the 3 overall HRQOL groups, and the BI was significantly different at 3 months but not 1 month after stroke. Neither the NIHSS nor the BI showed the same linear trend in scores demonstrated by the SS-QOL.
Discussion
The ultimate goal of our research is to develop a reliable, valid, responsive measure of stroke-specific HRQOL across the range of stroke symptoms and severity. Our present study, conducted among patients with mild to moderate stroke, suggests that the SS-QOL is a valid and reliable measure of stroke-specific HRQOL that is moderately responsive to change in most domains during the first 3 months after stroke. Because it was developed inductively with stroke survivors defining the domains, the SS-QOL has excellent content validity. Compared with other common generic HRQOL measures, the SS-QOL has a broader coverage of functions typically affected by stroke and asks questions in these areas in a way that is meaningful to stroke patients. For example, the SF-36 and the EuroQol are commonly used in stroke trials but do not assess language, hand function, cognition, or vision. Consequently, the SS-QOL should be better able than current generic HRQOL instruments to assess meaningful poststroke HRQOL changes across the continuum of stroke symptoms.
Most of the individual domains of the SS-QOL show reasonable construct validity, that is, the SS-QOL domain scores correlate with the established measures assessing the same construct. The moderate degree of linear relationship seen (r² range, 0.3 to 0.5) is appropriate, since we attempted to capture stroke-specific rather than generic changes in HRQOL. Higher correlations would suggest that the SS-QOL is capturing redundant information compared with the generic measure. As expected, SS-QOL domains that were compared with items on the NIHSS showed poor linear association. This is likely because the NIHSS measures impairments observable on neurological examination, not disability or handicap, and thus has a tendency to underestimate the effect of neurological symptoms on HRQOL. As a whole, one would expect the NIHSS score to be moderately related to overall HRQOL, but it is clear from our analysis of NIHSS scores in patients with different levels of self-reported HRQOL that an impairment scale alone is not an adequate surrogate for poststroke HRQOL.
Also as expected, the language and cognitive domains of the SS-QOL do not correlate with the corresponding NIHSS scores. To begin the development of the SS-QOL with minimal introduction of variability, we excluded patients with significant aphasia or cognitive deficits necessitating proxy responses. As a result, there were almost no patients with language or cognitive abnormalities as defined by the NIHSS. The SS-QOL, however, did detect more subtle complaints that patients had in these areas, since 37% noted their language and 37% noted their cognition impaired at 1 month compared with prestroke. Further development of the SS-QOL will include patients with language and cognitive effects of stroke and their proxies.
We also found poor correlation between the SS-QOL and the SF-36 on the social roles domain. Other researchers have reported poor performance of the SF-36 social functioning subscale in stroke patients.⁵ When self-reported social functioning compared with prestroke was used as the criterion, the SS-QOL social roles domain scores but not the SF-36 social functioning subscale scores were significantly different in patients with varying reports of dysfunction, suggesting that the SS-QOL social roles domain is measuring what stroke patients consider to be meaningful changes in poststroke social functions. In future validation studies, additional measures of social roles may be required to establish validation of the SS-QOL social roles domain.
The construct validity of the SS-QOL as a measure of HRQOL is inferred by the similar relationship between SS-QOL and SF-36 scores in the 3 HRQOL groups and by the linear association between the 2 scores. Both of these relationships suggest that the SS-QOL is measuring the intended underlying construct of HRQOL. Like other researchers,¹⁵,¹⁶ we found that even mild stroke affects many aspects of stroke recovery. This underscores the need to include measures of HRQOL in stroke trials and the necessity of developing responsive stroke-specific HRQOL instruments capable of measuring change across the spectrum of stroke severity.
With this goal of applicability for stroke trials in mind, we retained items that were responsive to patient-reported change. Thus, even in this sample of relatively mild stroke patients, most of the SS-QOL domains are responsive between 1 and 3 months after stroke, and only the vision domain had a ceiling effect (63% with maximal score). Further work is needed to assess the responsiveness of the SS-QOL in patients with more severe stroke.
It is important to emphasize the theoretical nature of the domains of the SS-QOL. The 12 domains of the SS-QOL were elicited from stroke patients. It is possible that in a large sample validation, where confirmatory factor analysis and multitrait-multimethod techniques can be used, the number of domains may be reduced. The exact number of domains is less important than the content of the items constituting the SS-QOL; it is imperative to include items that are measuring aspects of poststroke function that are important to patients.
Although we are encouraged by the preliminary reliability, content, and construct validity and responsiveness data, the SS-QOL is early in development, and many questions remain to be answered, including the issues of proxy respondents, interviewer versus self-administration, weighted versus unweighted domains, and performance in patients with more severe stroke. We are encouraged, however, that the SS-QOL appears to detect meaningful change even in patients with mild stroke, and we are revalidating the SS-QOL, including assessments of test-retest reliability, proxy responses, and mode of administration, in a cohort that includes more severely affected patients. At present, we would suggest that, if resources allow, all 78 items of the SS-QOL be included when populations with moderate to severe stroke are studied since modification of these items may be required for optimal performance of the SS-QOL in these patients. In addition, since we found that the “amount of help” response set was not responsive, likely related to our patients’ good functional outcome, we recommend that the self-care items be converted to the “amount of trouble” responses because of their greater ability to respond to clinically meaningful changes after stroke. Ongoing revalidation will assess the performance of this response set in patients with worse functional outcome.
Despite these limitations, preliminary results regarding the reliability, validity, and responsiveness of the SS-QOL are encouraging. Further validation of the SS-QOL in a larger sample that includes patients with more severe stroke is under way. Although the techniques of assessing patient-reported outcomes in clinical trials can be challenging,¹⁷ it is essential for stroke trials to evaluate interventions from the patient’s perspective. The SS-QOL is a single stroke outcome measure that aims to efficiently assess the various domains important in determining stroke-specific HRQOL across the spectrum of stroke symptoms and severity.
Appendix
SS-QOL items indicated with an asterisk are items retained after analysis. Items were administered in random order with standardized instructions. Items are presented here in individual domains.
This study was supported by a Research Career Development Award, Office of Research and Development, Health Services Research and Development, Department of Veterans Affairs (Dr Williams). This work was performed (in part) in the Regenstrief Institute for Health Care.
Reviews of this paper were directed by Guest Editor Vladimir Hachinski. Because of possible conflict of interest, Editor-in-Chief Mark Dyken was not involved in the review process.
- Received December 28, 1998.
- Revision received April 7, 1999.
- Accepted April 7, 1999.
- Copyright © 1999 by American Heart Association
References
Ware JE, Sherbourne CD. The MOS 36-item short form health survey (SF-36), I: conceptual framework and item selection. Med Care. 1992;30:472–483.
Anderson C, Laubscher S, Burns R. Validation of the Short Form 36 (SF-36) health survey questionnaire among stroke patients. Stroke. 1996;27:1812–1816.
Dorman PJ, Waddell F, Slattery J, Dennis M, Sandercock P. Is the EuroQol a valid measure of health-related quality of life after stroke? Stroke. 1997;28:1876–1882.
Dorman PJ, Waddell F, Slattery J, Dennis M, Sandercock P. Are proxy assessments of health status after stroke with the EuroQol questionnaire feasible, accurate, and unbiased? Stroke. 1997;28:1882–1887.
Gordon DL, Bendixen BH, Adams HP Jr, Clarke W, Kappelle LJ, Woolson RF. Interphysician agreement in the diagnosis of subtypes of acute ischemic stroke: implications for clinical trials: the TOAST Investigators. Neurology. 1993;43:1021–1027.
Goldstein LB, Chilukuri V. Retrospective assessment of initial stroke severity with the Canadian Neurologic Scale. Stroke. 1997;28:1181–1184.
Beck AT, Ward CH, Mendelson M, Mock J, Erbaugh J. An inventory for measuring depression. Arch Gen Psychiatry. 1961;4:561–571.
Brott T, Adams HP Jr, Olinger CP, Marler JR, Barsan WG, Biller J, Spilker J, Holleran R, Eberle R, Hertzberg V, Rorick M, Moomaw CJ, Walker M. Measurement of acute cerebral infarction: a clinical examination scale. Stroke. 1989;20:864–870.
Bollen KA. Structural Equations with Latent Variables. New York, NY: John Wiley & Sons; 1989.
Cohen J. Statistical Power Analysis for the Behavioral Sciences. New York, NY: Academic Press; 1977.
Duncan PW, Samsa GP, Weinberger M, Goldstein LB, Bonito A, Witter D, Enarson C, Matchar D. Health status of individuals with mild stroke. Stroke. 1997;28:740–745.
Sacco RL, Boden-Albala B, Chen X, Lin I-F, Kargman DE, Paik MC. Relationship of 6-month functional outcome and stroke severity: implications for acute stroke trials from the Northern Manhattan Stroke Study. Neurology. 1998;50:A327. Abstract.