ACTIVLIM-Stroke: A Crosscultural Rasch-Built Scale of Activity Limitations in Patients With Stroke
Background and Purpose—This study describes the development of a Rasch-built scale measuring activity limitations in stroke patients, named ACTIVLIM-Stroke.
Method—This new Rasch-built measure was constructed based on stroke patients' perceptions of difficulty in performing daily activities. Patients were recruited from inpatient and outpatient rehabilitation departments in Belgium and Benin. A 73-item questionnaire was completed by 204 participants. A random subsample of 83 subjects was given the questionnaire a second time. Data were analyzed using RUMM2030 software.
Results—After successive Rasch analyses, the ACTIVLIM-Stroke questionnaire, a unidimensional and linear 20-item measure of activity limitations, was constructed. All 20 items fulfilled Rasch requirements (overall and individual item fit, category discrimination, invariance, local response independence, and nonredundancy in item difficulty). This simple patient-based scale encompasses a large range of activities related to self-care, transfer, mobility, manual ability, and balance. The ACTIVLIM-Stroke questionnaire exhibited high internal validity, excellent internal consistency, and good crosscultural validity. Test–retest reliability was excellent for both the item difficulty hierarchy (intraclass correlation coefficient=0.99) and patient location (intraclass correlation coefficient=0.92). Furthermore, the questionnaire showed good external construct validity, based on correlations with the Functional Independence Measure motor and the Barthel Index, and a higher discriminating capacity than either of these widely used indices.
Conclusions—The ACTIVLIM-Stroke questionnaire has good psychometric qualities and provides accurate measures of activity limitations in patients with stroke. It is recommended for evaluating clinical and research interventions in patients with stroke because it provides higher discrimination and might be more sensitive to change.
The increasing use of patient-reported outcomes has encouraged the development of several questionnaires to evaluate individual functioning based on the International Classification of Functioning, Disability and Health framework. The International Classification of Functioning, Disability and Health, a classification of health and health-related domains, describes individual functioning in 3 domains: (1) body functions and anatomic structures; (2) activity; and (3) participation.1 Problems in each domain are, respectively, impairments, activity limitations, and participation restrictions. In the activity domain, the International Classification of Functioning, Disability and Health defines activity limitations as the difficulties a person might have in executing daily activities. Activity limitation is a behavior that is a combination of motor function, compensatory behavior of individuals, and personal (eg, age, lifestyle, motivation) and environmental (eg, architectural characteristics, ground type) factors. Therefore, limitation of activity cannot be measured directly but can be inferred from an individual's perception of the difficulty of performing activities.
Despite the wide range of instruments currently available, only a few are identified as meeting rigorous, evidence-based modern psychometric standards for a rating scale.2,3 Earlier tools were developed following traditional standards of measurement science, concentrating on key aspects such as reliability and validity. More recently, a greater emphasis has been given to more powerful diagnostic approaches, which examine a wider range of attributes such as response category functioning and differential item functioning (DIF). Among the new approaches, the Rasch measurement model is the most commonly used.4 Over the last 15 years, Rasch analysis has been widely used in health science.5–9 Some Rasch-built scales such as the ABILHAND scale (a measure of manual ability),10 the ABILOCO scale (a measure of locomotion ability),11 and the EG Motor Index (a measure of mobility)12 assess the functioning of patients with stroke. The Stroke Impact Scale, developed in 1999,13 was refined in 2003 using Rasch analysis.14 However, most stroke-specific Rasch-built scales evaluate only some aspects of activity limitations as defined by the International Classification of Functioning, Disability and Health.1 ABILHAND, ABILOCO, and the EG Motor Index assess specific aspects of activity limitations and are important in trials designed to evaluate the effect of particular interventions focused on a specific skill, for example manual ability or mobility. However, they cannot be used as a comprehensive measure of activity limitations as a whole. The Stroke Impact Scale is a broad assessment tool of physical function and not a measure specific to activity limitations, because it includes items from the body function domain (eg, “bladder and bowel control”), activity domain (eg, “move from a bed to a chair,” “bathe yourself”), and participation domain (eg, “go shopping”).
A full assessment of a stroke patient's functional ability should consider the broad range of activity limitations as a single variable, as did Vandervelde et al15 in the ACTIVLIM scale, a Rasch-built measure of activity limitations in children and adults with neuromuscular disorders. However, the use of the ACTIVLIM questionnaire for patients with stroke would require validation in that diagnosis. Moreover, given the increase in multicenter international studies, crossculturally validated outcome measures are also required to facilitate comparison of outcomes across different populations. Consequently, the current study aimed to calibrate and validate the ACTIVLIM questionnaire for patients with stroke from Europe (Belgium) and Africa (Benin).
Data were collected from patients with stroke in French-speaking communities in Europe (Belgium) and Africa (Benin). This study was approved by the Ethics Committee of the Université catholique de Louvain in Belgium and the local ethics committees of the participating caregiver centers and hospitals in Benin. Patients signed an informed consent form before being included.
Patients were recruited from rehabilitation departments, including patients with stroke currently undergoing rehabilitation and those discharged. Patients who had been discharged were identified from patient registers at the recruitment centers. The study was restricted to patients presenting no major cognitive deficit that could potentially prevent them from completing a self-report questionnaire (≥24 of 30 on the Mini-Mental State Examination).16,17
Patient Assessment and Outcome Measures
A preliminary list of 81 items generated by Vandervelde et al15 was submitted to physical therapists, occupational therapists, and medical doctors involved in stroke rehabilitation. They were asked to identify which items were not relevant for patients with stroke, resulting in deletion of 5 items. Three other items that concerned specific lifestyle aspects with no direct correspondence to 1 of the countries studied were removed from the original list. For example, getting on an escalator was deleted because escalators are not common in Benin. A set of 73 items was submitted to both European and African patients with stroke. Patients were asked to provide their perceived difficulty in performing each activity if completed without technical or human assistance. The response format was a 3-level scale labeled and scored as impossible (0), difficult (1), or easy (2). Unfamiliar activities were recorded as missing responses.
BI and the FIM
The Barthel Index (BI) and the Functional Independence Measure (FIM) are observer-rated generic measures of disability widely used in rehabilitation.20 They are accepted as functional-level assessment tools evaluating the functional status of patients throughout the rehabilitation process. The FIM comprises 18 items,18 and Linacre et al21 found that these items define 2 statistically and clinically different indicators: (1) FIM–motor, which assesses disability in motor functions (13 items); and (2) FIM–cognitive, which assesses disability in cognitive functions (5 items). The BI is a 10-item scale assessing different aspects of functional ability for self-care and daily activities.19 The FIM, although limited, is commonly used by clinicians and researchers, as indicated by recent review studies.22 A more recent meta-analysis (2011) shows that the FIM–motor scale and the BI are still used as the main outcome in some randomized controlled trials.23
Rasch Analysis and Item Selection
The Rasch analysis tests whether data from a scale satisfy the rules for constructing interval scale measurement.24 Based on a mathematical model, it estimates person ability and item difficulty by examining a matrix of these items on a common scale comparing individual response patterns with the response pattern of the entire sample.4 In other words, it is a probabilistic model that converts ordinal scores into interval measures and, in the process, examines other key attributes such as unidimensionality, invariance, sample targeting of a scale, the appropriateness of response format, hierarchy of item difficulty, and the local independence of items within a scale. Consequently, Rasch analysis enables evaluation of the internal construct validity of a scale, and this is judged through overall fit statistics including item fit, person fit, and total χ2 probability, which evaluates the extent to which the scale fits the Rasch model. The reliability of the scale was examined using the Person Separation Index, which indicates the extent to which the questionnaire distinguishes distinct ability levels. Rasch analysis and its applications and advantages are described in detail elsewhere.8,25–30
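For readers who prefer the model in symbols, the dichotomous form of the Rasch model can be written as below. This is the standard textbook statement, not a formula quoted from this study; the symbols θ_n (ability of person n) and δ_i (difficulty of item i) are notation introduced here for illustration. The polytomous model applied to the 3-level responses extends this form with one threshold for each pair of adjacent categories.

```latex
P(X_{ni}=1 \mid \theta_n,\delta_i)
  = \frac{\exp(\theta_n-\delta_i)}{1+\exp(\theta_n-\delta_i)},
\qquad
\ln\frac{P(X_{ni}=1)}{P(X_{ni}=0)} = \theta_n-\delta_i
```

The log-odds of success depend only on the difference between person ability and item difficulty, which is why both are expressed on the same logit scale.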
Patients' responses to the ACTIVLIM-Stroke questionnaire and selected personal factors were analyzed using RUMM2030 software for Rasch analysis,31 under an unrestricted partial credit model. During successive analyses, the following criteria were used for item selection: (1) missing responses: items presenting a missing response rate of ≥20% were removed before analyzing the entire item set; (2) category discrimination: for each item, the 3-level response format was applied, defining 2 thresholds of increasing order. Threshold 1 (t1) was between the categories “impossible” and “difficult” and was expected to be followed by threshold 2 (t2) between the categories “difficult” and “easy.” Subjects with higher ability should score higher than subjects with lower ability, indicating correct category discrimination. When the categories were not discriminated as expected, reversed thresholds were observed and these items were deleted; (3) item fit to the model: individual item fit was examined through fit residuals and χ2 statistics. Residuals indicated the deviations of items from the expected model score. Only items with residuals within the range ±2.5 were considered as fitting model expectations, and others were removed. Because a significant χ2 probability (below the Bonferroni adjusted value) indicates misfit, items with significant χ2 probabilities were also removed; (4) DIF: 4 personal factors were used dichotomously to check the invariance of the item difficulty hierarchy: age (≤55 years old, >55 years old), sex (male, female), affected side (left, right), and country (Belgium, Benin). The age cutoff of 55 years was the median age of our sample. Items with DIF were deleted; (5) local dependency: within a scale, local dependency of items affects the test score because it inflates the scale in a particular direction.32 When items are highly correlated, a patient's response to 1 item will influence the response to another. We examined residual correlations between items, considering correlations of ≤0.2 as acceptable.
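Criteria (1) to (3) above can be illustrated with a minimal Python sketch. It is not the RUMM2030 implementation: the function names and the example threshold values are hypothetical, and only the cutoffs (missing rate of 20%, fit residuals within ±2.5, increasing thresholds) come from the text. The category probabilities follow the usual partial credit formulation.

```python
import math

def category_probabilities(theta, thresholds):
    """Partial credit model: probability of each score 0..m on one item.

    theta: person ability (logits); thresholds: threshold locations (logits).
    """
    # Score k accumulates (theta - tau_j) over the first k thresholds;
    # score 0 corresponds to an empty sum.
    numerators = [0.0]
    for tau in thresholds:
        numerators.append(numerators[-1] + (theta - tau))
    weights = [math.exp(v) for v in numerators]
    total = sum(weights)
    return [w / total for w in weights]

def thresholds_ordered(thresholds):
    """True when each threshold is higher than the previous one (t1 < t2)."""
    return all(a < b for a, b in zip(thresholds, thresholds[1:]))

def passes_screening(missing_rate, fit_residual, thresholds):
    """Apply the cutoffs stated in the text to one item (illustrative only)."""
    return (
        missing_rate < 0.20             # items with >=20% missing were removed
        and thresholds_ordered(thresholds)  # reversed thresholds were removed
        and abs(fit_residual) <= 2.5    # residuals outside +/-2.5 were removed
    )

# Hypothetical item with thresholds at -1 and +1 logits, rated by a person at 0 logits.
probs = category_probabilities(0.0, [-1.0, 1.0])
print([round(p, 3) for p in probs])  # impossible, difficult, easy
print(passes_screening(0.05, 1.3, [-1.0, 1.0]))
```

For a person located midway between the two thresholds, "difficult" is the most probable response, which is the behavior expected when categories are correctly discriminated.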
Scale Validity and Reliability
The external construct validity of the questionnaire was assessed by examining its association with other valid measures. This was based on the concept of convergent validity, which expects good correlation between scales measuring the same aspect. Correlations among FIM–motor, BI, and the ACTIVLIM-Stroke questionnaire were tested. The crosscultural validity of the ACTIVLIM-Stroke questionnaire was examined based on DIF analysis and the invariance of the items' hierarchy across countries.33,34 Furthermore, we examined the test–retest reliability of the questionnaire in a subsample of 83 subjects who underwent a second evaluation within 1 to 4 weeks.
Data from 204 patients with stroke were obtained, all of whom had a stroke occurring at least 1 month previously (108 recruited in Benin and 96 in Belgium). Their mean age±SD was 57.1±13.4 years, their time since stroke was 21.5±24.4 months; their average body mass index was 25.7±4.7 kg/m2. The sex distribution was 130 (63.7%) males and 74 (36.3%) females with 113 (55.4%) left paretic side and 91 (44.6%) right paretic side affected; 162 (79.4%) had ischemic stroke and 26 (12.7%) hemorrhagic stroke. The type of stroke was unknown in 16 (7.8%) patients. Their handedness before stroke was 185 (90.7%) right and 19 (9.3%) left and the rehabilitation process was ongoing in 169 (82.8%) patients.
Item Selection and Metric Quality of the ACTIVLIM-Stroke Questionnaire
From the original 73 items, 3 items presented a missing response rate ≥20% (eg, “putting on a seatbelt”) and were removed. The remaining data were fitted to the Rasch model: 8 items had reversed thresholds (eg, “using a touchtone phone”), 19 items did not fit the model, 7 items showed DIF (1 for sex, 2 for age, and 4 for country), and 16 presented local dependency with other items. Consequently, the 20 remaining items were selected for the ACTIVLIM-Stroke questionnaire.
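The item counts reported here and in the Methods can be tallied as a quick consistency check. The snippet below only restates numbers given in the text; the variable names are illustrative.

```python
# Preliminary list and exclusions before data collection (from the Methods).
preliminary_items = 81
submitted_items = preliminary_items - 5 - 3  # not relevant, country-specific

# Items removed during successive Rasch analyses (from the Results).
removed = {
    "missing response rate >= 20%": 3,
    "reversed thresholds": 8,
    "misfit to the model": 19,
    "differential item functioning": 7,
    "local dependency": 16,
}
final_items = submitted_items - sum(removed.values())

print(submitted_items, final_items)  # 73 20
```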
Analysis of the final 20-item scale demonstrated excellent overall fit statistics for both items (mean±SD, −0.44±0.86) and persons (mean±SD, −0.27±0.83), where fit statistics are standardized to a mean of 0 and a SD of 1. The nonsignificant item-trait interaction (χ2=49.63, 40 degrees of freedom; P=0.14) indicated that the hierarchical ranking of the items did not vary across the trait, satisfying the required property of invariance of the scale for patients at different levels of activity limitation. An independent t test confirmed the unidimensionality of the scale, where comparisons of estimates derived from different sets of items showed only 3.92% of tests outside the range of −1.96 to 1.96.35 The CI for the binomial test of proportions was 0.9% to 6.9%. The reliability (Person Separation Index) of the final scale was 0.88.
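To give the reliability coefficient a more concrete reading, a Person Separation Index can be converted into a separation ratio and an approximate number of statistically distinct ability strata. The sketch below uses formulas that are common in the Rasch measurement literature, G = sqrt(R / (1 − R)) and strata = (4G + 1) / 3; these values were not reported by the authors and are shown only as an illustration.

```python
import math

def separation_ratio(reliability):
    """Separation ratio G = sqrt(R / (1 - R)) for a reliability R in [0, 1)."""
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must be in [0, 1)")
    return math.sqrt(reliability / (1.0 - reliability))

def distinct_strata(reliability):
    """Approximate number of distinguishable ability levels, (4G + 1) / 3."""
    return (4.0 * separation_ratio(reliability) + 1.0) / 3.0

psi = 0.88  # Person Separation Index reported for the final 20-item scale
g = separation_ratio(psi)
strata = distinct_strata(psi)
print(f"G = {g:.2f}, strata = {strata:.2f}")
```

Under these formulas, a reliability of 0.88 corresponds to a separation ratio of about 2.7 and just under 4 distinguishable levels of activity limitation.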
Table 1 presents the calibration of the final 20-item scale. Items are ordered according to difficulty level from easiest (opening a door: −1.95 logits) to most difficult (carrying a heavy bag: 2.33 logits). The table also reports item estimates, the associated SE of estimation, and the fit statistics including the standard fit residual, χ2, and χ2 probability. The individual item fit residual (range, −1.80 to 1.30) and the χ2 probability (range, 0.04–0.97) indicated that all 20 items showed good item fit and contributed to the definition of a unidimensional scale of activity limitations. Figure 1 depicts the structure and targeting of the ACTIVLIM-Stroke questionnaire. The top panel shows the distribution of patients' ACTIVLIM-Stroke measures. Approximately 95% of subjects, with ACTIVLIM-Stroke measures ranging from −2.4 to 4.8 logits, reported activity limitations with a fine perception of item difficulties, answering impossible, difficult, or easy. No participant found all items impossible and only 10 answered they could perform all activities easily. The scale is therefore well targeted with no considerable floor or ceiling effect. The middle panel shows the expected response to a given item as a function of patient functional ability. Zero is by convention set as the average item difficulty. A patient with an ability of 0 logits would be expected to perform without difficulty the 5 easiest activities, to perform with some difficulty the 7 average activities, and to fail to perform the 8 most difficult activities. The range of difficulties of the 20 items of the ACTIVLIM-Stroke questionnaire fits the distribution of the functional abilities of patients with stroke. ACTIVLIM-Stroke measures were then obtained by converting total raw scores into linear measures. The bottom panel illustrates the relationship between ordinal total score and linear measures. This relationship was approximately linear between total raw scores of 5 and 34, in which a unitary increment in total ordinal score was equal to nearly 0.2 logit. Outside this central range, however, the same unitary increment in total ordinal score increased up to 0.9 logit, indicating that a unitary progression in total score accounted for an increasing amount of functional ability measure.
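The nonlinear relationship between raw scores and logits can be reproduced qualitatively with a small sketch. Under a Rasch model, the ability estimate for a complete response set is the ability at which the expected total score equals the observed raw score. The calibration below is hypothetical: only the easiest and hardest item locations (−1.95 and 2.33 logits) come from the text, whereas the even spacing of the other items and the thresholds at ±1 logit around each location are assumptions, so the printed values will not match Table 2. The software used in the study may also apply a different estimator.

```python
import math

def expected_item_score(theta, thresholds):
    """Expected score on one partial-credit item for ability theta (logits)."""
    numerators = [0.0]
    for tau in thresholds:
        numerators.append(numerators[-1] + (theta - tau))
    peak = max(numerators)  # subtract the maximum to keep exp() stable
    weights = [math.exp(v - peak) for v in numerators]
    return sum(k * w for k, w in enumerate(weights)) / sum(weights)

def expected_total_score(theta, items):
    """Expected raw score over all items (the test characteristic curve)."""
    return sum(expected_item_score(theta, t) for t in items)

def measure_for_raw_score(raw, items, lo=-15.0, hi=15.0, tol=1e-8):
    """Find the ability whose expected total equals raw, by bisection."""
    max_score = sum(len(t) for t in items)
    if not 0 < raw < max_score:
        raise ValueError("extreme scores have no finite estimate in this sketch")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_total_score(mid, items) < raw:
            lo = mid  # expected score too low: ability must be higher
        else:
            hi = mid
    return (lo + hi) / 2.0

# Hypothetical calibration: 20 items evenly spaced between the reported extremes,
# each with two thresholds 1 logit below and above the item location (assumption).
N_ITEMS = 20
locations = [-1.95 + i * (2.33 + 1.95) / (N_ITEMS - 1) for i in range(N_ITEMS)]
items = [(d - 1.0, d + 1.0) for d in locations]

central_step = measure_for_raw_score(21, items) - measure_for_raw_score(20, items)
extreme_step = measure_for_raw_score(39, items) - measure_for_raw_score(38, items)
print(f"logits per raw point: centre {central_step:.2f}, extreme {extreme_step:.2f}")
```

One raw-score point is worth more logits near the ends of the score range than in the centre, which is the pattern described for the bottom panel of Figure 1.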
Invariance of the Items Hierarchy
Figure 2 presents differential item functioning plots as a comparison of the difficulty hierarchy of the items as rated by different subgroups according to affected side, age, sex, and country. Most items were within the 95% CI of the identity line, indicating that different subgroups perceived the item hierarchy similarly. A few items lay outside the 95% CI, but these did not show significant DIF. Additional statistics, based on the intraclass correlation coefficient, confirmed the invariance of the ACTIVLIM-Stroke scale. Indeed, the intraclass correlation coefficient between females and males, younger (≤55 years old) and older (>55 years old), left and right affected side, and European and African samples was ≥0.95 (P<0.001). The ACTIVLIM-Stroke questionnaire demonstrated good invariance of item hierarchy because item difficulty was consistently estimated across groups.
Validity and Reliability of the ACTIVLIM-Stroke Questionnaire
The ACTIVLIM-Stroke measures were highly and significantly correlated with the BI (r=0.83, P<0.001, n=119) and the FIM–motor (r=0.82, P<0.001, n=140). These correlations demonstrated a good convergent validity of the ACTIVLIM-Stroke questionnaire (Figure 3) and highlighted the low discriminating capacity and high ceiling effect of the FIM–motor and BI. Indeed, 38% (n=77) of participants had FIM–motor scores >80 of 91, indicating a high level of independence, whereas their ACTIVLIM-Stroke measures ranged from 0.34 to 5.03 logits (Figure 3). Similarly, 27% (n=56) of participants had BI scores >85 of 100, whereas their activity measures ranged from 0.77 to 5.03 logits (Figure 3). These results confirmed that the widely used FIM and BI do not appear to be sufficiently sensitive to discriminate higher levels of independence.
The test–retest of the ACTIVLIM-Stroke questionnaire in 83 patients showed excellent reliability. Patients' measures (intraclass correlation coefficient=0.92, P<0.001) and item hierarchy (intraclass correlation coefficient=0.99, P<0.001) had very good reproducibility (Figure 4). Item hierarchy and most patient measures were within the 95% CI of the identity line, indicating that: (1) the item difficulty hierarchy was invariant across time; and (2) the patients' estimation of their activity limitations was consistent over time.
The new ACTIVLIM-Stroke questionnaire is a 20-item scale in which all items were free of local dependency; showed good individual item fit; indicated no DIF for sex, age, affected side, or country; and exhibited no redundancy in item difficulty. Each item has 3 response possibilities (impossible, difficult, easy). The possible linear measures span a wide range, from −4.65 to 5.03 logits. This large measurement range covers a broad spectrum of difficulty in performing daily activities in poststroke patients. For clinical practice, a conversion table is particularly useful. Table 2 presents the conversion of the ACTIVLIM-Stroke raw score (0 to 40) to a Rasch measure (logit) and to a centile metric score. This allows clinicians to quickly transform a patient's raw score into an interval measure, as long as the patient answers all items.
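As an illustration of how a centile metric can be derived from a logit measure, the sketch below linearly rescales the reported measurement range (−4.65 to 5.03 logits) to 0 to 100. The linear rescaling is an assumption about how the centile column of Table 2 is built, not a statement of the authors' procedure, and the function name is hypothetical. The raw-score-to-logit step still requires Table 2 or the online analysis.

```python
MIN_LOGIT = -4.65  # lowest possible measure reported in the text
MAX_LOGIT = 5.03   # highest possible measure reported in the text

def logit_to_centile(measure):
    """Linearly rescale a logit measure to a 0-100 metric (assumed convention)."""
    if not MIN_LOGIT <= measure <= MAX_LOGIT:
        raise ValueError("measure outside the reported range")
    return 100.0 * (measure - MIN_LOGIT) / (MAX_LOGIT - MIN_LOGIT)

# A measure of 0 logits (the average item difficulty) on the 0-100 metric.
print(round(logit_to_centile(0.0), 1))
```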
Our study presents a new scale, the ACTIVLIM-Stroke questionnaire, measuring activity limitations in patients with stroke. We describe the calibration and the validation of the scale in a large population of patients with stroke. This new Rasch-built measure was constructed based on both European and African stroke patients' perception of difficulty in performing daily activities. It provides a practical tool for accurate assessment of activity limitations in clinical settings and in community-based rehabilitation programs. This patient-reported 20-item questionnaire encompasses a large range of activities related to self-care, transfer, mobility, manual ability, and balance, yet remains strictly unidimensional. Thus, the ACTIVLIM-Stroke gives a unidimensional simple summed score of activity limitation at the ordinal level with a transformation to interval scaling where required. It has high internal validity, excellent internal consistency, and good invariance by group. Furthermore, it shows good external construct validity and excellent test–retest reliability.
The Rasch model is based on the principle of unidimensionality, which is a prerequisite to the summation of any set of items.36,37 Consequently, items fitting the Rasch model such as the ACTIVLIM-Stroke questionnaire items contribute to the construct definition of the scale. As expected from a clinical perspective, activities that require higher energy expenditure38 such as “walking more than 1 km” or “carrying a heavy load” as well as activities requiring higher bimanual dexterity,10 for example “tying one's laces,” were rated as more difficult.
Analyses of the relationships between patient ACTIVLIM-Stroke scores and other widely used scales also supported the construct validity of the scale (Figure 3). The higher the FIM–motor and BI scores, the higher the ACTIVLIM-Stroke measures. This confirmed that the ACTIVLIM-Stroke measures are consistent with patient activity level. Previous studies have pointed out limitations of scales based on classical test theory.5,7,39–41 We also found that the FIM–motor and the BI were less sensitive than the ACTIVLIM-Stroke questionnaire in chronic stroke patients with a high ability level, confirming the appropriateness of the selected items of the new Rasch-built scale. A possible reason for the high discrimination capacity of the ACTIVLIM-Stroke questionnaire is that it evaluates whether a patient can complete a task without any assistance and, if so, the perceived difficulty, whereas the FIM–motor and BI evaluate a patient on performing a given task, irrespective of the task difficulty perception. Thus, the ACTIVLIM-Stroke questionnaire might be recommended based on its higher discrimination and might be more sensitive to change in patients with chronic stroke.
The reliability (Person Separation Index) of the ACTIVLIM-Stroke was 0.88, indicating that an appropriate degree of precision will be observed when clinicians or researchers use the questionnaire to evaluate either a group of patients or a single subject.8 The ACTIVLIM-Stroke also showed excellent test–retest reliability. The intraclass correlation coefficients were 0.99 for item hierarchy and 0.92 for patient measure. These coefficients are sufficiently high to conclude that the questionnaire is reproducible over time and thus provides reliable measures. Additionally, the crosscultural validity of the ACTIVLIM-Stroke questionnaire was tested by checking its invariance across European and African patients with stroke. The absence of DIF supports the crosscultural validity of the new scale.
In addition to (1) being a Rasch-built scale with excellent clinimetric properties; and (2) exhibiting a higher ability to discriminate different levels of activity limitations in patients with stroke, the ACTIVLIM-Stroke questionnaire has potential added value compared with existing scales. Indeed, the 20 items of the ACTIVLIM-Stroke questionnaire describe common activities for poststroke patients from Benin and Belgium, providing an equal basis for patients' evaluation in both countries. The crosscultural validity of the scale supports its appropriateness and, therefore, its usefulness in multicenter studies.
The 20 items of the ACTIVLIM-Stroke questionnaire are clinically relevant because they represent situations that are regularly experienced by patients with stroke in their everyday life. Activities related to self-care, transfer, mobility, manual ability, and balance are essential human needs and represent a challenge for patients with stroke and their caregivers during rehabilitation.42–44 For wide and easy use of Rasch-built instruments, some have proposed nomograms that allow the translation of raw sum scores to logits or centile metric score.24,45 We provide a conversion table of ACTIVLIM-Stroke ordinal summed scores to interval measures. Consequently, even if clinicians are not familiar with Rasch analysis, they will be able to transform a patient's total raw score into a linear measure. However, the conversion table is appropriate only if the patients answer all items. If responses are missing, online analysis is available (www.rehab-scales.org) to directly convert ACTIVLIM-Stroke raw scores into linear Rasch measures of activity limitations, taking missing values into account.
Finally, the ACTIVLIM-Stroke questionnaire meets the key scale assessment criteria provided by the 2 most widely used guideline documents for psychometric standards for rating scales established by the Scientific Advisory Committee of the Medical Outcomes Trust46 and the US Food and Drug Administration.47 Except for responsiveness, which will be evaluated in further studies, this questionnaire satisfies several metric properties such as a conceptual and measurement model, unidimensionality, targeting, reliability, validity, interpretability, respondent and administration burden, alternative forms, and cultural adaptation.
The ACTIVLIM-Stroke questionnaire assesses activity limitations in patients with stroke and is focused on the International Classification of Functioning, Disability and Health activity domain. Rasch analysis enabled robust calibration and validation of this new scale through rigorous selection of 20 items. The questionnaire presents good psychometric qualities and provides more precise and accurate measures of activity limitations in populations with a high activity level than existing traditional questionnaires. The ACTIVLIM-Stroke questionnaire was constructed based on both Belgian and Beninese stroke patients' perception of difficulty in performing daily activities. A conversion table from ordinal scores to interval scores reinforces the practicality and usefulness of the ACTIVLIM-Stroke questionnaire in clinical settings as well as in research studies.
We thank the patients who participated in the study. We are grateful to the Ministère de l'Enseignement Supérieur et de la Recherche Scientifique de la Communauté française de Belgique, for granting data collection in Benin.
- Received September 14, 2011.
- Revision received November 15, 2011.
- Accepted November 29, 2011.
- © 2012 American Heart Association, Inc.
World Health Organization. International Classification of Functioning, Disability and Health: ICF. Geneva, WHO; 2001.
- Baker K, Cano SJ, Playford ED
- Rasch G
- Merkies IS, Hughes RA
- Penta M, Tesio L, Arnould C, Zancan A, Thonnard JL
- Houlden H, Edwards M, McNeil J, Greenwood R
- Ferrarello F, Baccini M, Rinaldi LA, Cavallini MC, Mossello E, Masotti G, et al
- Wright BD, Stone MH
- Andrich D
- Bond TG, Fox CM
- Penta M, Arnould C, Decruynaere C
- Andrich D, Lyne A, Sheridan B, Luo G
- Embretson SE, Reise SP
- Tennant A, Penta M, Tesio L, Grimby G, Thonnard JL, Slade A, et al
- Streiner D, Norman G
- Wright BD, Masters GN
- Murray J, Ashworth R, Forster A, Young J
- McKevitt C, Fudge N, Redfern J, Sheldenkar A, Crichton S, Rudd AR, et al
- van Nes SI, Vanhoutte EK, van Doorn PA, Hermans M, Bakkers M, Kuitwaard K, et al
US Food and Drug Administration. Patient reported outcome measures: use in medical product development to support labeling claims. 2009. Available at: www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM193282.pdf. Accessed May 24, 2011.