The Stroke Impact Scale Version 2.0
Evaluation of Reliability, Validity, and Sensitivity to Change
Background and Purpose—To be useful for clinical research, an outcome measure must be feasible to administer and have sound psychometric attributes, including reliability, validity, and sensitivity to change. This study characterizes the psychometric properties of the Stroke Impact Scale (SIS) Version 2.0.
Methods—Version 2.0 of the SIS is a self-report measure that includes 64 items and assesses 8 domains (strength, hand function, ADL/IADL, mobility, communication, emotion, memory and thinking, and participation). Subjects with mild and moderate strokes completed the SIS at 1 month (n=91), at 3 months (n=80), and at 6 months after stroke (n=69). Twenty-five subjects had a replicate administration of the SIS 1 week after the 3-month or 6-month test. We evaluated internal consistency and test-retest reliability. The validity of the SIS domains was examined by comparing the SIS to existing stroke measures and by comparing differences in SIS scores across Rankin scale levels. The mixed model procedure was used to evaluate responsiveness of the SIS domain scores to change.
Results—Each of the 8 domains met or approached the standard of 0.9 α-coefficient for comparing the same patients across time. The intraclass correlation coefficients for test-retest reliability of SIS domains ranged from 0.70 to 0.92, except for the emotion domain (0.57). When the domains were compared with established outcome measures, the correlations were moderate to strong (0.44 to 0.84). The participation domain was most strongly associated with SF-36 social role function. SIS domain scores discriminated across 4 Rankin levels. SIS domains are responsive to change due to ongoing recovery. Responsiveness to change is affected by stroke severity and time since stroke.
Conclusions—This new, stroke-specific outcome measure is reliable, valid, and sensitive to change. We are optimistic about the utility of the measure. More studies are required to evaluate the SIS in larger and more heterogeneous populations and to evaluate the feasibility and validity of proxy responses for the most severely impaired patients.
The assessment of outcomes in individuals after stroke is important for both clinical practice and research, yet there is no consensus on the best measures of stroke outcome in either clinical practice or research.1 Existing measures have not been sensitive to change in mild strokes.2 The most commonly used outcome measures, the Rankin scale and the Barthel Index, assess only the physical aspects of stroke.3 4 No stroke-specific outcome measure has been developed that assesses other dimensions of health-related quality of life: emotion, communication, memory and thinking, and social role function. We have developed a stroke-specific outcome measure, the Stroke Impact Scale (SIS), to detect these important consequences of stroke, especially in mild to moderate strokes. The measure was developed from the perspective and input of both the patient and caregiver, and it incorporates contemporary standards of instrument development.5 Development of this outcome measure followed a comprehensive iterative process (Duncan et al, unpublished data, 1999).
The quality of any outcome measure is based on its psychometric attributes, which include reliability, validity, and sensitivity to change.8 9 10 The purpose of this article is to describe the psychometric attributes of the SIS (Version 2.0). Additionally, we assessed the relationship between the SIS scores and the patient’s global assessment of percentage of recovery.
Subjects and Methods
Stroke Impact Scale Version 2.0
The 64 items of SIS Version 2.0 comprise 8 domains (Appendix). Principal components factor analysis was performed with all 8 domains to assess the feasibility of summing scores across domains. This analysis revealed 5 factors, 1 of which encompassed 4 physical domains (including strength, hand function, mobility, and activities of daily living/instrumental activities of daily living [ADL/IADL]), with the remaining factors being emotion, communication, memory, and social participation. Therefore, if desired, the 4 physical domains can be summed to create 1 score. Emotion, communication, memory, and social participation must be scored as individual domains.
Aggregate scores in each domain were generated with an algorithm equivalent to the scoring algorithm for the SF-36.11 For a particular subject, if ≥50% of the questions in a domain had missing responses, the domain score was assigned as missing. Otherwise, scores for each domain were computed using the following equation:
$$\text{Score} = \frac{\text{Mean} - 1}{5 - 1} \times 100$$
where Score is the domain score for a particular domain and Mean is the mean of the nonmissing item scores within that domain, with each item scored in the range of 1 to 5. Using this algorithm, each domain score has a range of 0 to 100. This algorithm was selected to make the protocol for this development study consistent with the ultimate application of the instrument. For this development study, the number of observations in which missing scores were imputed to obtain domain scores was minimal, with only 3 observations (1 subject at 1 month and 2 subjects at 3 months) having domain scores computed from less-than-complete data.
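The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' software: it assumes the SF-36-style linear rescaling of the item mean from the 1 to 5 range onto 0 to 100, and the function name `sis_domain_score` and the use of `None` for an unanswered item are hypothetical choices made for the example.

```python
from typing import Optional, Sequence


def sis_domain_score(items: Sequence[Optional[int]]) -> Optional[float]:
    """Return a 0-100 domain score, or None when the domain counts as missing.

    `items` holds one response per question in the domain, each scored 1-5,
    with None marking an unanswered question (an assumption of this sketch).
    """
    if not items:
        return None
    answered = [x for x in items if x is not None]
    n_missing = len(items) - len(answered)
    # Rule stated in the text: >=50% missing responses -> domain score is missing.
    if 2 * n_missing >= len(items):
        return None
    if any(not 1 <= x <= 5 for x in answered):
        raise ValueError("item responses must lie in the range 1 to 5")
    mean = sum(answered) / len(answered)
    # Assumed SF-36-style rescaling of the 1-5 item mean onto 0-100.
    return (mean - 1) / (5 - 1) * 100
```

Under these assumptions, a domain answered entirely with 1s scores 0, one answered entirely with 5s scores 100, and a domain with half or more of its items unanswered yields no score.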
The SIS also includes a question to assess the patient’s global perception of percentage of recovery. After the SIS is administered, the respondent is asked to rate his or her percentage of recovery since the stroke on a visual analog scale of 0 to 100, with 0 meaning no recovery and 100 meaning full recovery (Appendix). (The SIS and the scoring algorithm can be accessed at www2.kumc.edu/coa.)
Subjects for the psychometric analysis of Version 2.0 of the SIS were obtained from a sample of convenience, using a subset of the participants in the Kansas City Stroke Study. The Kansas City Stroke Study was a prospective cohort study of 459 individuals designed to characterize the patterns of recovery of patients with mild, moderate, and severe stroke. As described by Lai et al,12 individuals with stroke were assessed with a battery of instruments within 14 days after stroke, and a follow-up was performed at 1, 3, and 6 months after stroke.12 Because the Kansas City Stroke Study was in progress when the SIS was completed, only the last 105 subjects were included in the testing and analysis of the SIS instrument. Of those subjects, 12 declined to participate and 2 of the remaining 93 had a major stroke. Ninety-one subjects with minor and moderate stroke were eligible to complete the SIS at 1 month. Additionally, the SIS alone was repeated in 25 randomly selected patients 1 week after their assessment at either 3 or 6 months, and the data from these replicate assessments were used to assess test-retest reliability.
Concurrent Measures for Assessment of Validity
The battery from the Kansas City Stroke Study included the Barthel Index,4 the Functional Independence Measure (FIM),13 the Fugl-Meyer (FM),14 the Folstein Mini-Mental State Examination (MMSE),15 the NIH Stroke Scale,16 the SF-36,11 the Duke Mobility Scale,17 and the Geriatric Depression Scale (GDS).18 Each of these measures has been established as a tool for assessing rehabilitation outcomes, and together they provide a battery against which to assess the concurrent validity of the SIS.
Stroke severity was determined by administration within 3 to 14 days after stroke of the Orpington Prognostic Scale,19 a weighted measure that screens for motor deficits, sensory loss, balance, and cognition. The Orpington Scale ranges for stroke severity are as follows: <3.2, minor stroke; 3.2 to 5.2, moderate stroke; and >5.2, major stroke. For the development of this instrument we included only minor and moderate strokes.
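The severity cut points just described translate directly into a small classifier. The sketch below is illustrative only; the function names are hypothetical, and it does not check that a score falls within the valid range of the Orpington Prognostic Scale, because that range is not stated here.

```python
def orpington_severity(score: float) -> str:
    """Map an Orpington Prognostic Scale score to the severity bands in the text."""
    if score < 3.2:
        return "minor"
    if score <= 5.2:
        return "moderate"
    return "major"


def eligible_for_sis_development(score: float) -> bool:
    """Only minor and moderate strokes were included in developing the instrument."""
    return orpington_severity(score) != "major"
```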
Descriptively, we examined the mean and SD of each domain score as a function of stroke severity. We determined floor (percentage of subjects who scored 0) and ceiling (percentage of subjects who scored 100) effects for each domain. Item convergence and item discrimination were assessed by computing the Pearson product moment correlation between each item and the domain total score for each of the 8 domains. The appropriateness of creating summative scales within each of the 8 domains was examined with the multitrait criteria described by Stewart et al.20 21 22
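As an illustration of the floor and ceiling definitions given above (percentage of subjects scoring 0 and percentage scoring 100), a minimal sketch follows. The function name is hypothetical and the input is assumed to be a list of already-computed 0 to 100 domain scores with missing scores excluded.

```python
from typing import Sequence, Tuple


def floor_ceiling_percent(scores: Sequence[float]) -> Tuple[float, float]:
    """Return (floor %, ceiling %) for one domain's 0-100 scores."""
    if not scores:
        raise ValueError("at least one score is required")
    n = len(scores)
    floor_pct = 100 * sum(1 for s in scores if s == 0) / n
    ceiling_pct = 100 * sum(1 for s in scores if s == 100) / n
    return floor_pct, ceiling_pct
```

For example, with scores of 0, 50, 100, and 100, one of four subjects is at the floor and two of four are at the ceiling.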
To assess reliability, we evaluated the internal consistency of the items with the Cronbach α for each domain scale23 and examined the stability of each scale by computing the intraclass correlation coefficients (ICCs) using the data from the 25 test-retest observations.
We examined the criterion validity of the SIS domain scores by comparing the results from the SIS to measures selected a priori from the Kansas City Stroke Study battery. For example, for the strength domain we selected a motor assessment for comparison (Fugl-Meyer Motor Assessment), for the ADL/IADL domain we selected the Barthel ADL, and for memory and thinking we selected the MMSE. Criterion-related validity was assessed by examining Spearman rank correlation coefficients. Discriminant validity was addressed by comparing mean scores for each domain across the groups defined by the Rankin scale. We used ANOVA to determine whether the SIS scores were different across Rankin classifications.
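The two reliability statistics named above can be sketched with the Python standard library. The Cronbach α follows its usual definition. The ICC shown is the one-way random-effects form, which is an assumption of this sketch because the specific ICC model used in the study is not stated. Function names and the row-per-subject data layout are hypothetical.

```python
from statistics import mean, variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """rows[i][j] is subject i's response to item j within one domain."""
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def icc_oneway(rows: Sequence[Sequence[float]]) -> float:
    """One-way random-effects ICC (assumed form); rows[i] holds subject i's
    replicate scores, e.g. (test, retest)."""
    n, k = len(rows), len(rows[0])
    grand = mean([x for row in rows for x in row])
    subject_means = [mean(row) for row in rows]
    ms_between = k * sum((m - grand) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(rows, subject_means) for x in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

With test-retest data, each row would hold a subject's two domain scores from the original and replicate administrations.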
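For readers who want to reproduce this kind of criterion comparison, the Spearman rank correlation can be computed as the Pearson correlation of average ranks. The sketch below uses only the standard library; the function names are hypothetical, and the inputs are assumed to be paired scores for the same subjects on an SIS domain and a comparison measure.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least 2 values")
    return pearson(average_ranks(x), average_ranks(y))
```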
To assess the sensitivity of measures to change, we used mixed model software (SAS-MIXED procedure) for repeated measures.24 Omnibus F tests were used to examine stroke severity and time main effects as well as the stroke severity*time interaction. We used t statistics within each stroke severity level to compare domain change scores for 1 to 3 months, 1 to 6 months, and 3 to 6 months. Finally, we used multiple regression analysis to determine those domain scores that most accurately predict a patient’s own global assessment of the percentage of stroke recovery.
Thirty-three individuals with minor stroke and 58 with moderate stroke participated in this study. Table 1⇓ summarizes demographic characteristics and the 1-month levels of the different Kansas City Stroke Study outcome measures for the sample. Between 1 and 3 months, and again between 3 and 6 months, 11 individuals were lost to follow-up SIS administration. Reasons for loss to follow-up at 3 months included 3 deceased, 1 cognitively impaired, 1 aphasic, and 5 dropped out. Reasons for loss to follow-up at 6 months included 2 deceased, 2 cognitively impaired, 2 moved, and 5 dropped out.
The distribution of scores on the 8 SIS domains and the patient’s global assessment of recovery are shown in Table 2⇓. Each of the scales showed acceptable levels of endorsement with the full range of the scale used by these 91 subjects. For each of the scales except emotion, minor subjects on average scored better than moderate subjects. In both minor and moderate strokes, 3-month and 6-month scores were better than 1-month scores. Within each of the severity groups, the SDs of the scales were in the range of 15 to 30, signifying reasonable dispersion of outcomes across the sample.
Assessment of floor and ceiling effects (Table 3⇓) showed the potential for floor effects in the most difficult domain (hand function) in the moderate stroke group and the possibility of a ceiling effect in the communication domain for both the mild and moderate stroke groups. The highest percentage of ceiling effects for the SIS was for the communication domain (35%) compared with the 64.6% ceiling rate for the Barthel.
Ninety-nine item/scale correlations were used to assess the degree of item convergence and item discrimination.20 21 22 All of the items except 1 in the emotion domain had item domain correlations of 0.4 or greater, a level generally considered to represent reasonable item convergence with the dimension in which it is included.21 The item that does not have a corrected item/domain correlation of at least 0.4 is “feel quite nervous.” Item discrimination is supported if an item’s convergent correlation is 2 SEs (2/√n) greater than correlations computed for the item and other domains.21 The percentage of items/scale comparisons within each domain that meet this criterion ranged from 70% to 98%. Item discrimination is excellent for the domains of strength, emotion, communication, and memory (89% to 98%) but only modest for ADL/IADL, mobility, hand function, and participation domains (70% to 83%). These domains reflect higher-level functional activity, and several items in each domain were found to be highly correlated both with their own domain and the other 3 domains.
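The convergence and discrimination criteria described above (an item/domain correlation of at least 0.4, and a convergent correlation at least 2 SEs, 2/√n, above the item's correlations with other domains) can be expressed as a short check. This is a sketch under those stated criteria; the function names are hypothetical and the correlations are assumed to have been computed beforehand.

```python
from math import sqrt
from typing import Sequence


def item_converges(r_own: float, cutoff: float = 0.4) -> bool:
    """Item convergence: correlation with the item's own domain is >= cutoff."""
    return r_own >= cutoff


def discrimination_success_rate(
    r_own: float, r_others: Sequence[float], n: int
) -> float:
    """Percentage of other-domain comparisons in which the item's own-domain
    correlation exceeds the other-domain correlation by at least 2/sqrt(n)."""
    if not r_others:
        raise ValueError("at least one other-domain correlation is required")
    threshold = 2 / sqrt(n)
    passed = sum(1 for r in r_others if r_own - r >= threshold)
    return 100 * passed / len(r_others)
```

With n = 100 subjects the margin is 0.2, so an item correlating 0.7 with its own domain passes against other-domain correlations of 0.4 and 0.3 but not against 0.55.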
The Cronbach α coefficients ranged from 0.83 to 0.90 and meet criteria for measuring change over time.25 The ICCs of the 8 domains are in the range of 0.7 to 0.92, except for emotion (0.57).
The discriminant validity of each SIS domain was examined by comparison of mean scores across groups defined by the 6-month Rankin scores. The results of the analysis (Table 4⇓) indicate that the scales for 6 of the 8 domains were significantly different (P<0.02 to P<0.0001) across the Rankin levels. The memory and thinking domain and emotion domain scores are not significantly different across the Rankin levels.
As shown in Table 5⇓, each of the domain scales showed good criterion validity. The measures of disability (mobility and ADL/IADL) showed excellent coherence with the established measures, with correlation coefficients in the range of 0.82 to 0.84. Correlations for domains that measure memory and communication were more modest, generally in the range of 0.44 to 0.58. The participation domain showed a moderate correlation with the SF-36 social function domain (0.70), but the correlations with the SF-36 emotional and physical role functions were low, at 0.28 and 0.45, respectively. The correlations between the SIS domains and patient’s global rating of recovery were good (0.53 to 0.63), with weaker correlations for memory, communication, and emotion (0.21 to 0.39).
The multiple regression analysis of global recovery as a function of SIS domain scores revealed that physical function (P=0.0001), emotion (P=0.0002), and participation (P=0.058) domains were predictors of the patient’s global assessment of recovery. Forty-five percent of the variance in the patient’s assessment of percentage of recovery was explained by these factors. Basic ADL alone, as measured by the Barthel Index (P=0.0001), explained only 33% of the variance in the patient’s assessment of recovery.
Sensitivity to Change
Table 6⇓ is a summary of the SIS domain sensitivity to change as measured by t statistics from the mixed models. All results are stratified by severity and time since stroke. Severity and time since stroke affect the sensitivity of each domain. For minor strokes the instrument is sensitive to change from 1 to 3 months and from 1 to 6 months, but not from 3 to 6 months, for the domains of hand function, mobility, ADL/IADL, combined physical, and participation. For moderate strokes the instrument is also sensitive to change from 1 to 3 months and from 1 to 6 months, and for higher-level functions (mobility, ADL/IADL, combined physical, and participation) it is additionally sensitive to change from 3 to 6 months.
The heterogeneity of stroke severity and symptoms has created many challenges to the assessment of stroke outcomes, especially in mild and moderate stroke. The most commonly used stroke outcome measures, the Barthel Index and the Rankin scale, have captured only the physical aspects of stroke disability. Stroke impacts not only physical function but also emotion, memory and thinking, communication and role function (social participation). Focus group interviews with patients and caregivers have demonstrated that these factors should be assessed as sequelae of stroke.6 Furthermore, the results of this study have demonstrated that in addition to the physical aspects of disability, emotion and participation also predict the patient’s assessment of stroke recovery. If these multiple dimensions are to be assessed in clinical trials, multiple instruments are required, and the responsiveness of these instruments in patients with varying severity has not been considered. Additionally, administration of numerous measures is burdensome to patients and researchers. Multiple domains may be captured with the SIS. Most importantly, the combined physical domain captures a wide range of skills, including hand function. No other stroke outcome measure has assessed hand function. In the development of this instrument, we have incorporated the different levels of the WHO model of disability (impairment, disability, and handicap) but have endorsed the new WHO levels of activities (for disabilities) and participation (for handicap).26 The resulting 64 items are grouped into 8 domains: strength, hand function, ADL/IADL, mobility, communication, emotion, memory, and participation. Factor analysis has revealed that strength, hand function, mobility, and ADL/IADL can be combined into a physical domain but that other domains represent distinct dimensions of recovery that should be examined individually. 
This combined domain has important implications for developing a summary score for outcomes in clinical trials. The individual physical domains may be retained for those who are interested in specific components of function, ie, hand function or mobility.
Existing stroke outcome measures have suffered from ceiling effects in mild to moderate stroke.2 Among the 459 subjects enrolled in the Kansas City Stroke Study, 89% of those with minor strokes and 52% of those with moderate strokes achieved 90 on the Barthel Index by 6 months (P.W. Duncan, PhD, and S.M. Lai, PhD, unpublished data, 1999). Therefore, the Barthel Index, which measures only basic ADL, has limited ability to discriminate outcomes in most individuals who survive stroke. The use of global measures such as the Rankin scale to define the success of interventions may disguise meaningful shifts in disability states and changes in health-related quality of life.27 In contrast to the Barthel Index and the Rankin scale, the Stroke Impact Scale (SIS) is a new measure that broadens the range of deficits and recovery assessed, and changes in scores may be treated continuously. Consequently, the SIS provides a potentially more relevant outcome because stroke has variable impact on many domains of health status. The SIS does not suffer from the magnitude of ceiling effects observed with the Barthel Index (Table 3⇑). To broaden the range of ADL function, we combined basic ADL and IADL into 1 domain. The Cronbach α for this domain remains high. The validity of combining basic and instrumental ADL into 1 scale has also been supported by other measurement researchers.28
The psychometric properties of the SIS Version 2.0 support its use to measure change over time.25 The range of the Cronbach α for all domains is 0.86 to 0.90, which meets or approaches the standard of 0.90 for comparing patients across time.25 The test-retest reliability of 7 of the 8 instrument domains also meets the requirements of a measure to assess the same patient across time.9 The ICC for the emotion domain was only 0.57. However, this ICC is substantially higher than the ICC of 0.28 reported for the SF-36 mental health domain of stroke patients.29
The validity of the domain constructs was supported by the analysis of convergent and divergent validity. The lowest item scaling success rates were for mobility and ADL/IADL items. The items within these domains are highly correlated across these domains and do not meet the criterion that item discrimination is supported if an item’s convergent correlation is 2 SEs greater than its correlations with other domains.22 The correlation between mobility and ADL/IADL may compromise their independence as primary end points in clinical trials. However, utilization of the physical domain score (combined strength, hand function, mobility, ADL/IADL) will avoid this problem.
The discriminant validity of this measure is excellent. Domain scores for minor strokes were higher than for moderate strokes, and the scores were different across 4 Rankin levels. Three- and 6-month domain scores were higher than 1-month scores. The pattern of change in scores (differences between 1 and 3 months and little change between 3 and 6 months) is consistent with the numerous studies that have reported that almost all stroke recovery is complete in 3 months.30 31 32 33 34 Yet most previous studies have assessed only recovery of basic ADLs and motor function. The results of this study using the SIS demonstrate that patients are changing in all dimensions except emotion, and patients perceive this recovery as measured by their global assessment of percentage of recovery. The SIS change scores are congruent with the previously described patterns of recovery: most recovery occurs in the first 3 months, and recovery is determined by severity.30 31 32 33 34 35 However, in respondents with moderate stroke, the SIS detected change between 3 and 6 months for ADL/IADL, mobility, physical domain, and participation. Previous studies of recovery may not have detected this ongoing recovery due to the psychometrics of the instruments used.
In assessing the effectiveness of interventions on outcomes that progress over time, clinical researchers frequently are asked to define “clinically” meaningful change. In developing such a definition, investigators must consider the precision of the outcome measure and the magnitude of change that is physiologically relevant or has value to the patient. From these 2 perspectives, changes in SIS domain scores of approximately 10 to 15 points appear to represent reasonable definitions of clinically meaningful change. In terms of precision, the variance components analyses that formed the basis of the ICCs for the test-retest reliability indicate that the within-subject SDs on replicate tests range from 6 to 15 for the different domains. Although sample sizes were insufficient to demonstrate statistically significant differences across Rankin levels, the best estimates of mean differences between adjacent categories were 10 to 15 points for most SIS domains. Although additional testing is needed on larger samples to develop refined estimates, these initial analyses support the 10- to 15-point range as reasonable.
The emotion domain had less-desirable psychometric properties than the other domains. It has the lowest reliability, 1 item did not have an acceptable item/domain correlation, and it has limited sensitivity to change compared with the SF-36 mental health domain. Several reasons might explain the poor performance of this domain. First, we asked the patient to rate their emotional domain relative to the past week, while the SF-36 is in reference to the past 4 weeks. Because emotion scores are expected to exhibit more short-term variability than physical scores, the emotion domain of the SIS is likely to be somewhat less reliable than the SF-36 MHI. Also, the questions on the SIS may be related to components of emotional states that are likely to have more random variability across time, whereas the SF-36 may be capturing emotional traits. Although these results complicate analysis within the emotion domain, patient responses during the development indicate that emotion should be assessed. Furthermore, in all of our analyses, the patient’s score on the emotion domain contributed significantly to the patient’s perception of recovery, suggesting that at a minimum, the emotion domain must be taken into account in assessment of the effect of an intervention on outcomes in the other domains.
This study has several limitations. Most importantly, the instrument was developed on a small sample of mild and moderate stroke patients who had the communication skills and cognitive function to participate in interviews. While over two thirds of all stroke survivors have mild to moderate deficits,36 the usefulness of the SIS scale in more severely involved patients needs to be evaluated. Future studies will need to be done to develop guidelines for the minimum levels of cognitive and communication skills necessary for the patient to complete the SIS. Because individuals with major stroke will need to be included in outcome studies, we need to assess characteristics of proxy-versus-patient responses. A study assessing proxy/patient responses has been initiated. Second, this instrument was interviewer administered in the patient’s home. Other modes of administration (telephone and mail questionnaire) will need to be assessed in the future. Finally, we will need to continue to develop a more stable and responsive measure of emotion.
We are optimistic about the utility of this new stroke outcome measure. However, as required in the development of any measure, an ongoing program to evaluate the measure and to explore the generalizability of the results in one sample to other stroke populations is needed.8 We have ongoing funded research programs to continue the evaluation of the Stroke Impact Scale.
The present study was supported by the Department of Veterans Affairs Rehabilitation Research and Development (E8793), Glaxo-Wellcome Inc, and the University of Kansas Claude D. Pepper Older Americans Independence Center, funded by the National Institute of Aging (P60 AG 14635-02).
- Received February 11, 1999.
- Revision received July 22, 1999.
- Accepted July 22, 1999.
- Copyright © 1999 by American Heart Association
Roberts L, Counsell C. Assessment of clinical outcomes in acute stroke trials. Stroke. 1998;29:986–991.
Wade D. Measurement in Neurological Rehabilitation. New York, NY: Oxford University Press; 1997.
Mahoney FI, Barthel DW. Functional evaluation: the Barthel Index. Md State Med J. 1965;14:61–65.
Juniper E, Guyatt G, Jaeschke R. How to develop and validate a new health-related quality of life instrument. In: Spilker B, ed. Quality of Life and Pharmacoeconomics in Clinical Trials. Philadelphia, Pa: Lippincott-Raven; 1996:49–58.
Deleted in proof.
Deleted in proof.
Hobart JC, Lamping DL, Thompson AJ. Evaluating neurological outcome measures: the bare essentials. J Neurol Neurosurg Psychiatry. 1996;60:127–130.
McDowell I, Newell C. Measuring Health: A Guide to Rating Scales and Questionnaires. New York, NY: Oxford University Press; 1996.
Ware JE. SF-36 Health Survey: Manual and Interpretation Guide. Boston, Mass: The Health Institute, New England Medical Center; 1993.
Lai SM, Duncan PW, Keighley J. Prediction of functional outcomes after stroke: comparison of the Orpington Prognostic Scale and the NIH Stroke Scale. Stroke. 1998;29:1838–1842.
Hamilton BB, Granger CV, Sherwin FS. A uniform national data system for medical rehabilitation. In: Fuhrer MJ, ed. Rehabilitation Outcome: Analysis and Measurement. Baltimore, Md: Paul Brookes; 1987:137–147.
Brott T, Adams HP Jr, Olinger CP. Measurements of acute cerebral infarction: a clinical examination scale. Stroke. 1989;20:864–870.
Hogue C, Studenski S, Duncan PW. Assessing mobility: the first steps in preventing fall. In: Funk SG, Tornquist EM, Champagne MT, Copp LA, Wiese RA, eds. Key Aspects of Recovery. New York, NY: Springer; 1990:275–281.
Yesavage JA, Brink T, Rose TL, Lum O, Huang V, Adey M, Leirer VO. Development and validation of a geriatric depression screening scale: a preliminary report. J Psychiatr Res. 1983;17:37–49.
Stewart AL, Hays RD, Ware JE Jr. Methods of constructing health measures. In: Stewart AL, Ware JE, eds. Measuring Function and Well-Being. Durham, NC: Duke University Press; 1992:67–85.
Scientific Advisory Committee. Instrument review criteria. Med Outcome Trust Bull. 1995;September:I-IV.
Littell RC, Milliken GA, Stroup WW, Wolfinger RD. SAS System for Mixed Models. Cary, NC: SAS Institute; 1996.
Howard KI, Forehand GG. A method for correcting item-total correlations for the effect of relevant item inclusion. Educ Psychol Meas. 1963;22:40–66.
WHO. ICIDH-2 International Classification of Impairments, Activities and Participation: A Manual of Dimensions of Disablement and Functioning. Beta-1: Draft for Field Trials. Geneva Switzerland: World Health Organization; 1997.
Spector WD, Fleishman JA. Combining activities of daily living with instrumental activities of daily living to measure functional disability. J Gerontol. 1998;53:46–57.
Dorman P, Slattery J, Farrell B, Dennis M, Sandercock P. Qualitative comparison of the reliability of health status assessments with the EuroQol and SF-36 questionnaires after stroke. Stroke. 1998;29:63–68.
Bonita R, Beaglehole R. Recovery of motor function after stroke. Stroke. 1988;19:1497–1500.
Duncan PW, Goldstein LB, Horner RD, Landsman PB, Samsa GP, Matchar DB. Similar motor recovery of upper and lower extremities after stroke. Stroke. 1994;25:1181–1188.
Ferrucci L, Bandinelli S, Guralnik JM, Lamponi M, Bertini C, Falchini M, Baroni A. Recovery of functional status after stroke: a postrehabilitation follow-up study. Stroke. 1993;24:200–205.
Kelly-Hayes M, Wolf PA, Kase CS, Gresham GE, Kannel WB, D’Agostino RB. Time course of functional recovery after stroke: the Framingham study. J Neurol Rehabil. 1989;3:65–70.
Wade DT, Langston-Hewer R, Wood VA, Skilbeck C. The hemiplegic arm after stroke: measurement and recovery. J Neurol Neurosurg Psychiatry. 1983;46:521–524.