Analysis and Comparison of the Psychometric Properties of Three Balance Measures for Stroke Patients
Background and Purpose— This study compared the psychometric properties of 3 clinical balance measures, the Berg Balance Scale (BBS), the Balance subscale of the Fugl-Meyer test (FM-B), and the Postural Assessment Scale for Stroke Patients (PASS), in stroke patients with a broad range of neurological and functional impairment from the acute stage up to 180 days after onset.
Methods— One hundred twenty-three stroke patients were followed up prospectively with the 3 balance measures 14, 30, 90, and 180 days after stroke onset (DAS). Reliability (interrater reliability and internal consistency) and validity (concurrent validity, convergent validity, and predictive validity) of each measure were examined. A comparison of the responsiveness of each of the 3 measures was made on the basis of the entire group of patients and 3 separate groups classified by degree of neurological severity.
Results— The FM-B and BBS showed a significant floor or ceiling effect at some DAS points, whereas the PASS did not show these effects. The BBS, FM-B, and PASS all had good reliability and validity for patients at different recovery stages after stroke. The results of effect size demonstrated fair to good responsiveness of all 3 measures within the first 90 DAS but, as expected, only a low level of responsiveness at 90 to 180 DAS. The PASS was more responsive to changes in severe stroke patients at the earliest period after stroke onset, 14 to 30 DAS.
Conclusions— All 3 measures tested showed very acceptable levels of reliability, validity, and responsiveness for both clinicians and researchers. The PASS showed slightly better psychometric characteristics than the other 2 measures.
Balance training is an important component of stroke rehabilitation.1,2 Several studies have found that changes in balance ability correlate significantly with changes in function.3–7 Measuring balance can assist the clinician in diagnosis, selection of the most appropriate therapy, and outcome measurement.3,8
A variety of laboratory approaches to assess balance have been proposed,9–16 but the functional scales of balance measures are most commonly applied to stroke patients in clinical settings.14,17 To date, >15 different functional scales measuring balance have been developed and used in stroke research.11,13,14,17–20 However, only a few are specifically designed for stroke patients.17 The balance subscale of the Fugl-Meyer test (FM-B)19 and the Berg Balance Scale (BBS)18 are the most commonly used. Recently, Benaim et al17 adapted items from the FM-B and developed a new scale, the Postural Assessment Scale for Stroke Patients (PASS), for measuring balance function in stroke patients.
To be clinically useful, a scale must be scientifically sound in terms of 3 basic psychometric properties: reliability, validity, and responsiveness.8,21,22 Although many researchers have examined the reliability and validity of each of the 3 balance measures described above,10,17–20,23–29 some limitations were noted. First, these studies did not compare the properties of different balance measures on the same cohort of patients. The measures in the previous studies were also rarely administered at specific times after onset. It is thus difficult for clinicians and researchers to compare the balance measures because of the limited results of previous studies. Second, most of the subjects in previous studies were tested only up to 3 months after stroke onset. Therefore, information is lacking on whether these balance measures are appropriate for assessing patients after that stage. Furthermore, no previously reported studies have evaluated whether these balance measures have similar psychometric properties for patients with different degrees of neurological severity. As a consequence, researchers and clinicians have found that they are faced with a greater range of choices but limited information on which to base their selection.
No reported studies have concurrently compared the psychometric properties of the 3 balance measures, the BBS, FM-B, and PASS. The purpose of this prospective study was to compare the reliability, validity, and responsiveness of these 3 balance measures concurrently in a cohort of patients 14, 30, 90, and 180 days after stroke onset (DAS).
Subjects were recruited from the registry of the Quality of Life After Stroke Study in Taiwan between December 1, 1999, and May 31, 2000. The Quality of Life After Stroke Study is an ongoing prospective cohort study of patients with stroke admitted to National Taiwan University Hospital. Individuals enrolled in the Quality of Life After Stroke Study were evaluated on 14 DAS and reassessed at other specific DAS times for up to 3 years after stroke to characterize recovery of neurological impairments, functional abilities, and health-related quality of life. Patients were included in the study if they met the following criteria: (1) diagnosis (International Classification of Diseases, ninth revision, clinical modification codes) of cerebral hemorrhage (431), cerebral infarction (434), or other (430, 432, 433, 436, 437); (2) first onset of cerebrovascular accident without other major diseases; (3) stroke onset within 14 days before hospital admission; (4) ability to follow commands; and (5) ability to give informed consent personally or by proxy. The clinical diagnosis of stroke was confirmed by neuroimaging examination (CT/MRI). Subjects were excluded if they suffered from another stroke or other major diseases during the follow-up period or lived >40 miles from the hospital.
The 3 balance measures and related measures were administered to patients 14, 30, 90, and 180 DAS. The protocol of this study was divided into 2 parts. The first part was an interrater reliability study. Only subjects at 14 DAS were used in this part of the study. The 3 balance measures were administered by 2 occupational therapists (A and B) individually on the same patients on 14 DAS. The therapists administered the assessments in a random order within a 24-hour period to minimize the effects of a possible spontaneous recovery. The therapists who administered these tests were blinded to each other’s results.
The second part of the protocol was a validity and responsiveness study. Assessment of validity requires the use of standard instruments with which the scale is to be compared.8 In this study, the Barthel Index (BI)30 was used as the external criterion (for the examination of convergent validity) and was administered 14, 30, 90, and 180 DAS. The walking subscale of the Motor Assessment Scale (MAS)31 was also used as an external criterion (for the examination of predictive validity) to evaluate the performance of ambulation on 180 DAS. The degrees of responsiveness of the 3 balance measures were calculated on the basis of the changes occurring between 14 to 30, 30 to 90, 90 to 180, 14 to 90, and 14 to 180 DAS.
Because the testing protocol took ≈30 to 40 minutes to administer, patients were allowed to rest during testing if necessary. All of the above assessments were made by an occupational therapist who was blinded to the purpose of this study. Data on demographic characteristics and comorbidity were collected from medical records.
The FM-B is 1 of 6 subscales of the FM, which was designed to evaluate impairment after stroke.19 The FM-B contains 7 three-point items, 3 for sitting and 4 for standing. The total score ranges from 0 to 14. Results of previous studies investigating the reliability and validity of the FM-B have been controversial.19,20,23,28,29 Some studies found the sitting balance items, especially the 2 parachute reaction items, to be unreliable and invalid (item to total, r=−0.03).28 Therefore, Hsueh et al27 revised the scoring criteria of these 2 items but retained the original testing procedures. In this modified version, patients received a score of 0 if they lost balance easily, 1 if they partially lost balance, and 2 if they maintained sitting balance well when firmly pushed on the affected or nonaffected side. The validity of the modified FM-B was quite acceptable (r=0.84).27 The modified FM-B was used in this study.
The BBS24 evaluates a person’s performance on 14 items (1 sitting and 13 standing items) related to balance function that are frequently encountered in everyday life. The scoring method is based on a 5-point ordinal scale of 0 to 4, with the total score ranging from 0 to 56.24 The BBS was originally developed for screening the elderly at risk of falling,24 but the psychometric properties of the BBS used in stroke patients have also been examined by various researchers with supportive results.5,18,26,27 The BBS was reported to be able to detect changes of the same magnitude as the BI during the initial 12 weeks after stroke.26
The PASS17 was developed to be applicable to all stroke patients, even those with very poor postural performance. The PASS contains 12 four-point items that grade performance for situations of varying difficulty in maintaining or changing a given lying, sitting, or standing posture. Its total score ranges from 0 to 36. The psychometric properties of the PASS have been reported to be satisfactory in stroke patients during the first 3 months after stroke.17 The interrater reliability and intrarater reliability of the PASS have been shown to be very high. There are moderate correlations between scores of the PASS and the Functional Independence Measure, motor score of lower extremities, and postural stability.18 The PASS had fair predictive validity for later functional performance. However, no studies of the responsiveness of the PASS have been reported.
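For readers who score these instruments electronically, the item counts and score ranges given above can be captured in a few lines of Python. This is only an illustrative sketch: the class name `BalanceScale`, its fields, and the validation rules are hypothetical and are not part of any of the published instruments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BalanceScale:
    """Hypothetical container for a scale's item count and item score range."""
    name: str
    n_items: int
    max_item_score: int  # each item is scored 0..max_item_score

    @property
    def max_total(self) -> int:
        # Highest possible total score for the scale.
        return self.n_items * self.max_item_score

    def total(self, item_scores):
        """Return the total score after checking item count and range."""
        if len(item_scores) != self.n_items:
            raise ValueError(f"{self.name} expects {self.n_items} items")
        if any(not 0 <= s <= self.max_item_score for s in item_scores):
            raise ValueError(f"{self.name} items must be 0..{self.max_item_score}")
        return sum(item_scores)


# Item counts and scoring levels as described in the text.
FM_B = BalanceScale("FM-B", n_items=7, max_item_score=2)   # total 0 to 14
BBS = BalanceScale("BBS", n_items=14, max_item_score=4)    # total 0 to 56
PASS = BalanceScale("PASS", n_items=12, max_item_score=3)  # total 0 to 36
```

The maximum totals derived here (14, 56, and 36) match the score ranges stated for the 3 measures.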
Instruments Measuring Other Related Functions
The FM19 mainly measures motor impairment after stroke. The FM includes items of upper and lower extremity motor function. Each item is graded on a 3-point scale. The possible score ranges from 0 to 100 points. In general, the FM is reliable and valid.23,29,32 The score of the FM was used as an index of neurological severity of the patients in this study.
The BI is a measure of the severity of disability.30 The BI evaluates 10 basic activities of daily life (ADL), and the total score ranges from 0 to 100. It has been shown to be a reliable and valid measure of ADL.33–35
The score range and distribution of each of the 3 measures were examined. The floor and ceiling effects, the percentages of the sample scoring the minimum and maximum possible scores, respectively, reflect the extent that scores cluster at the bottom and top of the scale range. Floor and ceiling effects >20% are considered to be significant.36 The existence of the floor and ceiling effects is indicative of the limited ability of a measurement to discriminate between subjects.
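The floor and ceiling computation described above can be sketched as follows. This is a minimal illustration, not the analysis code used in the study; the function name and the example scores are hypothetical, and only the 20% threshold comes from the text.

```python
def floor_ceiling_effects(total_scores, min_score, max_score, threshold=20.0):
    """Percentage of patients at the minimum and maximum possible scores.

    An effect is flagged when the percentage exceeds `threshold` (20% in the text).
    """
    n = len(total_scores)
    if n == 0:
        raise ValueError("no scores supplied")
    floor_pct = 100.0 * sum(1 for s in total_scores if s == min_score) / n
    ceiling_pct = 100.0 * sum(1 for s in total_scores if s == max_score) / n
    return {
        "floor_pct": floor_pct,
        "ceiling_pct": ceiling_pct,
        "floor_significant": floor_pct > threshold,
        "ceiling_significant": ceiling_pct > threshold,
    }


# Hypothetical BBS totals (possible range 0 to 56) for 10 patients.
example_scores = [0, 0, 0, 5, 10, 20, 30, 40, 50, 56]
example_result = floor_ceiling_effects(example_scores, min_score=0, max_score=56)
```

In this made-up sample, 3 of 10 patients score the minimum, so the floor effect would be flagged, whereas the single patient at the maximum would not trigger a ceiling flag.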
The interrater agreement on individual items of the 3 balance measures was analyzed with the weighted κ statistic. The weighted κ score measures the agreement among raters adjusted for the amount of agreement expected by chance and the magnitude of disagreements.37 A κ value >0.75 indicates excellent agreement, 0.4 to 0.75 indicates fair to good agreement, and <0.4 indicates poor agreement.38
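As a rough illustration of the weighted κ statistic for 2 raters, the sketch below computes it from paired item ratings. The text does not state which weighting scheme was used, so quadratic disagreement weights are assumed here (linear weights are offered as an option); the function names and example ratings are hypothetical.

```python
def weighted_kappa(ratings_a, ratings_b, n_categories, weights="quadratic"):
    """Cohen's weighted kappa for two raters; categories are coded 0..n_categories-1."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n, k = len(ratings_a), n_categories
    # Observed proportion of patients in each (rater A, rater B) cell.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[a][b] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        d = abs(i - j) / (k - 1)
        return d * d if weights == "quadratic" else d  # assumed weighting schemes

    obs = sum(disagreement(i, j) * observed[i][j] for i in range(k) for j in range(k))
    exp = sum(disagreement(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if exp == 0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1.0 - obs / exp


def interpret_kappa(kappa):
    """Interpretation bands quoted in the text."""
    if kappa > 0.75:
        return "excellent"
    if kappa >= 0.4:
        return "fair to good"
    return "poor"


# Hypothetical ratings of one 3-point item (0, 1, 2) by therapists A and B.
rater_a = [0, 1, 2, 2]
rater_b = [0, 1, 2, 1]
example_kappa = weighted_kappa(rater_a, rater_b, n_categories=3)
```

With these 4 made-up patients, the raters disagree once by a single category, which gives a quadratic-weighted κ of 0.8.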
Total Score Reliability
The interrater reliability of the total score of the 3 balance measures was analyzed with the intraclass correlation coefficient (ICC) statistic. The fixed effect of ICC Model 339 was used to compute the ICC value for the degree of agreement between repeated measurements by the 2 raters on the same patient. An ICC value of >0.80 indicates high reliability.40
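A minimal sketch of an ICC Model 3 computation is given below. It assumes the single-rating form usually written ICC(3,1), computed from a two-way analysis of variance as (BMS − EMS) / (BMS + (k − 1)·EMS), where BMS is the between-patients mean square, EMS is the residual mean square, and k is the number of raters. The text does not specify which Model 3 variant was used, so this choice, the function name, and the example data are assumptions.

```python
def icc_3_1(ratings):
    """ICC(3,1) from a patients x raters table (list of equal-length rows)."""
    n = len(ratings)
    k = len(ratings[0])
    if n < 2 or k < 2 or any(len(r) != k for r in ratings):
        raise ValueError("need at least 2 patients and 2 raters, with complete rows")
    grand = sum(sum(r) for r in ratings) / (n * k)
    row_means = [sum(r) / k for r in ratings]
    col_means = [sum(r[j] for r in ratings) / n for j in range(k)]
    # Two-way ANOVA sums of squares.
    ss_total = sum((x - grand) ** 2 for r in ratings for x in r)
    ss_patients = k * sum((m - grand) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_patients - ss_raters
    bms = ss_patients / (n - 1)           # between-patients mean square
    ems = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems)


# Hypothetical total scores from raters A and B for 4 patients.
example_icc = icc_3_1([[1, 2], [2, 1], [3, 4], [4, 3]])
```

Because Model 3 treats the raters as fixed, a constant offset between the 2 raters does not lower this coefficient; only inconsistency in how patients are ordered and spaced does.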
The internal consistency of each balance measure was expressed using Cronbach’s α coefficients. An α coefficient >0.70 is considered adequate for group comparison.41
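Cronbach’s α can be computed from a patients-by-items score table as k/(k − 1) × (1 − Σ item variances / variance of total scores), where k is the number of items. The sketch below is illustrative only; the function name and the example data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha; `item_scores` has one row per patient, one column per item."""
    k = len(item_scores[0])
    if k < 2 or len(item_scores) < 2:
        raise ValueError("need at least 2 items and 2 patients")
    columns = list(zip(*item_scores))  # one tuple of scores per item
    sum_item_var = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum_item_var / total_var)


# Hypothetical scores of 4 patients on a 2-item scale.
example_alpha = cronbach_alpha([[0, 1], [1, 0], [2, 2], [1, 2]])
```

For this made-up table the result is 16/27 (about 0.59), which would fall below the 0.70 level quoted above as adequate for group comparison.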
Concurrent validity is usually established by demonstrating a high correlation between the scale and a gold standard. The interrelationships between the 3 balance measures on 14, 30, 90, and 180 DAS were examined by use of Spearman’s ρ correlation coefficient.
Convergent validity was determined by examining the relationships between the 3 balance measures and instruments measuring similar constructs. The relationships between the total score of the 3 balance measures and those of the BI at each DAS point were examined with Spearman’s ρ correlation coefficient.
The predictive validity of the 3 balance measures was assessed by comparing the results of the 3 balance measures at 14, 30, and 90 DAS with that of the MAS at 180 DAS by use of Spearman’s ρ correlation coefficient.
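The concurrent, convergent, and predictive validity analyses all rest on Spearman’s ρ, which is the Pearson correlation of the ranked scores. A minimal sketch follows, assuming tied scores receive the average of their ranks (the usual convention, although the text does not state how ties were handled); the function names and example data are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical balance totals at 14 DAS and walking scores at 180 DAS.
balance_14 = [1, 2, 3, 4, 5]
walking_180 = [2, 4, 5, 9, 20]
example_rho = spearman_rho(balance_14, walking_180)
```

Because the made-up walking scores increase strictly with the balance scores, ρ equals 1 here regardless of the spacing between values.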
Because there is no consensus regarding how best to assess the responsiveness of measurement instruments, 2 approaches were used in this study. First, effect size (ES) was calculated by dividing the mean change scores by the standard deviation of the change score in the same subjects. According to Cohen’s criteria,42 an ES >0.8 is large, 0.5 to 0.8 is moderate, and 0.2 to 0.5 is small. In addition, Wilcoxon matched-pairs signed-rank tests were performed to determine the statistical significance of the change scores. Furthermore, to determine whether the responsiveness of the measures varied depending on the initial stroke-induced deficits, patients were stratified into 1 of the following 3 groups on the basis of their FM scores: 0 to 35, severe; 36 to 79, moderate; and ≥80, mild.43
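The effect size and the severity stratification described above can be sketched as follows. The ES is computed exactly as defined in the text (mean change divided by the standard deviation of the change scores). The function names, the "negligible" label for values below 0.2, the assignment of the boundary value 0.5 to the moderate band, and the example scores are assumptions made for illustration.

```python
from statistics import mean, stdev


def effect_size(baseline, follow_up):
    """Mean change divided by the SD of the change scores (same patients)."""
    if len(baseline) != len(follow_up) or len(baseline) < 2:
        raise ValueError("need paired scores for at least 2 patients")
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


def cohen_label(es):
    """Cohen's bands as quoted in the text."""
    es = abs(es)
    if es > 0.8:
        return "large"
    if es >= 0.5:  # assumption: 0.5 itself is counted as moderate
        return "moderate"
    if es >= 0.2:
        return "small"
    return "negligible"  # assumption: the text gives no label below 0.2


def severity_group(fm_score):
    """Stratification by total FM score: 0-35 severe, 36-79 moderate, >=80 mild."""
    if fm_score <= 35:
        return "severe"
    if fm_score <= 79:
        return "moderate"
    return "mild"


# Hypothetical PASS totals for 4 patients at 14 and 30 DAS.
pass_14 = [10, 12, 8, 20]
pass_30 = [14, 15, 13, 22]
example_es = effect_size(pass_14, pass_30)
```

Applying `effect_size` separately within each `severity_group` would give the kind of stratified comparison described in the text; the Wilcoxon matched-pairs signed-rank test is not reproduced in this sketch.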
A total of 128 patients with a wide spectrum of balance deficits, ranging from asymptomatic patients to the bedridden, were originally recruited for the study. However, 5 patients declined to participate in the study; 13 patients were lost to follow-up at 30 DAS; and another 17 patients were lost to follow-up at 90 DAS. A total of 80 patients completed follow-up at 180 DAS. Table 1 presents detailed characteristics of the cohort of patients in the study.
Table 2 shows the distributions of the 3 balance measures at 4 DAS points. For the entire sample, scores at the 4 DAS points spanned virtually the entire range; however, the FM-B and BBS showed notable floor effects at 14 DAS, and the BBS showed significant ceiling effects at 90 and 180 DAS.
One hundred twelve patients participated in the reliability investigation. This study group consisted of 52 women and 60 men with a mean age of 69 years (SD, 11.3). Eleven patients were excluded from the analysis because they were not rated by both raters within 24 hours.
The medians (ranges) of weighted κ statistics for each item of the PASS, FM-B, and BBS were 0.88 (0.61 to 0.96), 0.79 (0.71 to 0.95), and 0.92 (0.59 to 0.94), respectively, indicating good individual item agreement. The ICCs (95% CI) for the total scores of the PASS, FM-B, and BBS were 0.97 (0.95 to 0.98), 0.92 (0.88 to 0.95), and 0.95 (0.93 to 0.97), respectively, indicating excellent total score agreement. The Cronbach’s α of the PASS, FM-B, and BBS ranged from 0.94 to 0.96, 0.85 to 0.91, and 0.92 to 0.98, respectively, on all 4 DAS, indicating high internal consistency.
Table 3 shows the intercorrelations between the 3 balance measures. The pairwise correlations of the 3 balance measures were high at each stage of stroke, indicating high concurrent validity.
The scores of the 3 balance measures were highly correlated with those of the BI scores at the 4 selected time points after stroke (Spearman’s ρ ≥0.86, P<0.0001), indicating good convergent validity (Table 4).
The scores of the 3 balance measures at the earlier 3 DAS points were highly correlated with the MAS scores on evaluations on 180 DAS (Spearman’s ρ ≥0.8, P<0.0001), indicating good predictive validity (Table 4).
The ES showed that the 3 balance measures were moderately to highly responsive in detecting changes before 90 DAS (14 to 30 DAS, ES ≥0.8; 30 to 90 DAS, ES ≥0.63) and that the levels of responsiveness of these measures were low, as expected, at 90 to 180 DAS (0.31≤ES≤0.4) (Table 5). Table 5 also shows that the changes in the 3 scales at each stage were all significant (P≤0.006). Table 6 shows the responsiveness of the 3 balance measures at different stages for subjects with different levels of stroke severity (ES ≥0.21). All of these results indicate that the 3 balance measures are generally sensitive to change over time after a stroke. In particular, the BBS was found to be less responsive than the FM-B and PASS for severe stroke patients at 14 to 30 DAS (Table 6).
In this study, the comprehensive psychometric properties of 3 balance measures (FM-B, BBS, and PASS) for stroke patients were systematically compared for the first time. This study recruited stroke subjects with a wide range of neurological severity from a major academically based teaching hospital in Taiwan. In addition, this study followed subjects at 4 specific time points after stroke for an extended period (up to 180 DAS) to evaluate how appropriate these measures are for use at different recovery stages after stroke. Furthermore, the responsiveness data were analyzed according to different degrees of neurological severity of the subjects.
Distribution of balance measures is rarely reported in clinical studies. However, it is important to show the score distribution of the study sample to understand whether the scale is measuring a restricted range of stroke patients. In this study, analysis of the score distribution of the 3 balance measures at 14 DAS revealed some limitations of these measures. The BBS and FM-B showed significant floor effects at 14 DAS, whereas the PASS did not. Almost one third of the subjects had the lowest scores on these 2 measures. The reason for this result might be that the least demanding test in the BBS and FM-B is to sit independently; however, some stroke patients may not regain their ability to sit independently in the very early stage,7,44 thus leading to significant floor effects. These results indicated that the PASS, including 4 items on bed mobility, was more appropriate than the BBS and FM-B in assessing patients in the early recovery stage. We also found that the BBS had significant ceiling effects on 90 and 180 DAS. These results indicate that the BBS might not discriminate the patients’ balance function after 90 DAS.
In this study, the reliabilities of these 3 balance measures were examined in terms of interrater reliability and internal consistency. In agreement with many previous studies,10,18,23 our results indicated that both the item and total score interrater reliabilities and the internal consistency of these 3 measures were equally high. In particular, the individual item and total score reliabilities of the FM-B after adjustment for the scoring criteria of the sitting items were much higher than those found in a previous study.28 Thus, the reliability of FM-B with the modified sitting balance items was well supported.
The very high internal consistency of the 3 balance measures indicated that the items of each instrument measured the same concept: balance.11,24 However, the extremely high internal consistency of the 3 measures might indicate the possibility of item redundancy, which needs further examination.
Results of concurrent, convergent, and predictive validity of all 3 measures were generally in accordance with the findings of previous studies.5,10,17,19,26–28 For example, previous studies found that BBS scores were correlated to motor performance,10 ADL function,10,27 and walking ability.27 The findings of this study further confirmed the validity of the 3 measures.
The responsiveness of an instrument is of key importance in outcome studies. If the instrument is unable to detect change in balance function, an intervention that improves balance may indicate no significant differences between treated and untreated patients. Unfortunately, this property is often overlooked, and information about the responsiveness of the 3 measures is scarce.
The results of the ES indicate that these 3 measures had fair to good levels of responsiveness before 90 DAS and in the overall stages (14 to 90 and 14 to 180 DAS) of recovery. In addition, at later stages (90 to 180 DAS) of recovery, the 3 measures had, as expected, only low levels of responsiveness. This might have been due to a plateau in the improvement of balance function after 90 DAS. The motor and ADL functions have also been reported to reach a plateau after 90 DAS.43 The other possible reason might be that these 3 balance measures lack items sensitive enough to detect patients’ improvement after 90 DAS.
Investigation of how disease severity affects the responsiveness of the 3 balance measures revealed that the BBS was less responsive than the PASS and FM-B in severe stroke patients at the initial stages (14 to 30 DAS). The reason for this finding might be that the BBS was not originally designed for stroke patients,24 and only 1 item of the scale assesses balance ability in the sitting position. Because sitting balance is 1 of the first postures to be restored after a stroke, it seems that the BBS is lacking items to detect change in patients who are unable to stand independently.
Level of Scaling and Number of Items
The FM-B, PASS, and BBS have 3, 4, and 5 scoring levels, respectively. The number of items in these 3 scales also varies. However, as found in this study, the reliability and responsiveness of these 3 measures are generally similar if the subjects are considered as a whole. These results indicate that increasing the number of items or the grading levels does not improve the responsiveness or decrease the reliability of these 3 balance measures. Interestingly, a recent study found that the BI (10 items, mainly a 3-point scale) and the motor Functional Independence Measure (13 items, a 7-point scale) showed similar responsiveness in patients with stroke and multiple sclerosis.45 The similar responsiveness of the 3 balance measures has important implications for both clinicians and researchers. The FM-B and the PASS are quicker and simpler to rate than the BBS. From this point of view, the BBS is the least suitable instrument for use in both clinical and research settings.
One of the limitations of this study was that the intrarater reliability of the 3 measures was not examined. Some studies have reported excellent intrarater reliability results for the BBS18 and PASS.17 In addition, we found a high interrater reliability of the measures. Therefore, the intrarater reliability of the 3 measures might not be an important issue.
In summary, the BBS, FM-B, and PASS are clinical balance measures with good reliability, good validity, and accepted responsiveness at different poststroke stages of recovery. The PASS showed slightly better psychometric characteristics than the other 2 measures and thus appears to be better suited for use by clinicians and researchers.
This study was supported by research grants from the National Taiwan University Hospital (89A015) and National Science Council (NSC-89-2314-B002-534 and NSC-89-2314-B002-468).
- Received July 11, 2001.
- Accepted August 30, 2001.
- Bobath B. Adult Hemiplegia: Evaluation and Treatment. 3rd ed. London, UK: William Heinemann Medical Books; 1990.
- Ryerson SD. Hemiplegia. In: Umphred DA, ed. Neurological Rehabilitation. 3rd ed. St Louis, Mo: CV Mosby Co; 1995: 681–721.
- Sandin KJ, Smith BS. The measure of balance in sitting in stroke rehabilitation prognosis. Stroke. 1990; 21: 82–86.
- Wade DT. Measurement in Neurological Rehabilitation. Oxford, UK: Oxford University Press; 1992.
- Horak FB. Clinical measurement of postural control in adults. Phys Ther. 1987; 67: 1881–1885.
- Horak FB, Esselman P, Anderson ME, Lynch MK. The effects of movement velocity, mass displaced, and task certainty on associated postural adjustments made by normal and hemiplegic individuals. J Neurol Neurosurg Psychiatry. 1984; 47: 1020–1028.
- Leonard E. Balance tests and balance responses: performance changes following a CVA: a review of the literature. Physiother Can. 1990; 42: 68–72.
- Benaim C, Pérennou DA, Villy J, Rousseaux M, Pelissier JY. Validation of a standardized assessment of postural control in stroke patients: the Postural Assessment Scale for Stroke Patients (PASS). Stroke. 1999; 30: 1862–1868.
- Hobart JC, Lamping DL, Thompson AJ. Evaluating neurological outcome measures: the bare essentials. J Neurol Neurosurg Psychiatry. 1996; 60: 127–130.
- Streiner DL, Norman GR. Health Measurement Scales. 2nd ed. Oxford, UK: Oxford University Press; 1995.
- Duncan PW, Propst M, Nelson SG. Reliability of the Fugl-Meyer assessment of sensorimotor recovery following cerebrovascular accident. Phys Ther. 1983; 63: 1606–1610.
- Berg K, Wood-Dauphinee S, Williams JI, Maki B. Measuring balance in the elderly: validation of an instrument. Can J Public Health. 1992; 83 (suppl 2): S7–S11.
- Wood-Dauphinee S, Berg K, Bravo G, Williams JI. The balance scale: responsiveness to clinically meaningful changes. Can J Rehabil. 1997; 10: 35–50.
- Hsueh IP, Mao HF, Huang HL, Hsieh CL. Comparisons of responsiveness and predictive validity of two balance measures in stroke inpatients receiving rehabilitation [in Chinese]. Formos J Med. 2001; 5: 261–268.
- Sanford J, Moreland J, Swanson LR, Stratford PW, Gowland C. Reliability of the Fugl-Meyer assessment for testing motor performance in patients following stroke. Phys Ther. 1993; 73: 447–454.
- Mahoney FI, Barthel DW. Functional evaluation: the Barthel Index. Md Med J. 1965; 14: 61–65.
- Carr JH, Shepherd RB, Nordholm L, Lunne D. Investigation of a new motor assessment scale for stroke patients. Phys Ther. 1985; 65: 175–180.
- Richards SH, Peters TJ, Coast J, Gunnell DJ, Darlow MA, Pounsford J. Inter-rater reliability of the Barthel ADL index: how does a researcher compare to a nurse? Clin Rehabil. 2000; 14: 72–78.
- McCluggage WG, Bharucha H, Caughley LM, Date A, Hamilton PW, Thornton CM, Walsh MY. Interobserver variation in the reporting of cervical colposcopic biopsy specimens: comparison of grading systems. J Clin Pathol. 1996; 49: 833–835.
- Richman J, Makrides L, Prince B. Research methodology and applied statistics, part 3: measurement procedures in research. Physiother Can. 1980; 32: 253–257.
- Ware JE Jr. SF-36 Health Survey: Manual and Interpretation Guide. Boston, Mass: Health Institute, New England Medical Center; 1993.
- Cohen J. Statistical Power Analysis for the Behavioral Sciences. Hillsdale, NJ: Lawrence Erlbaum Assoc; 1983.
- van der Putten JJ, Hobart JC, Freeman JA, Thompson AJ. Measuring change in disability after inpatient rehabilitation: comparison of the responsiveness of the Barthel Index and the Functional Independence Measure. J Neurol Neurosurg Psychiatry. 1999; 66: 480–484.