Use of a 3-Item Short-Form Version of the Barthel Index for Use in Stroke
Systematic Review and External Validation
Background and Purpose—There may be a potential to reduce the number of items assessed in the Barthel Index (BI), and shortened versions of the BI have been described. We sought to collate all existing short-form BI (SF-BI) and perform a comparative validation using clinical trial data.
Methods—We performed a systematic review across multidisciplinary electronic databases to find all published SF-BI. Our validation used the VISTA (Virtual International Stroke Trials Archive) resource. We describe concurrent validity (agreement of each SF-BI with BI), convergent and divergent validity (agreement of each SF-BI with other outcome measures available in the data set), predictive validity (association of prognostic factors with SF-BI outcomes), and content validity (item correlation and exploratory factor analyses).
Results—From 3546 titles, we found 8 articles describing 6 differing SF-BI. Using acute trial data (n=8852), internal reliability suggested redundancy in BI (Cronbach α, 0.96). Each SF-BI demonstrated a strong correlation with BI, modified Rankin Scale, and National Institutes of Health Stroke Scale (all ρ≥0.83; P<0.001). Using rehabilitation trial data (n=332), SF-BI demonstrated modest correlation with the quality of life measures Stroke Impact Scale and 5 domain EuroQOL (ρ≥0.50; P<0.001). Prespecified prognostic factors were associated with SF-BI outcomes (all P<0.001). Our factor analysis described a 3-factor structure, and item reduction suggested an optimal 3-item SF-BI comprising bladder control, transfer, and mobility items, in keeping with 1 of the 3-item SF-BI previously described in the literature.
Conclusions—There is redundancy in the original BI; we have demonstrated internal and external validity of a 3-item SF-BI that should be simple to use.
The Barthel Index (BI) is a 10-item measure of basic activities of daily living (ADL).1 The BI is the second most commonly used functional assessment scale in stroke trials and the most commonly used ADL assessment in adult rehabilitation.2,3 BI quantifies ADL on an ordinal, hierarchical scale that ranges from 0 to 20 or 0 to 100, depending on the scoring used.4 BI is recommended as an outcome measure by various professional societies and guidelines.3 BI has proven prognostic utility;5 it is used in clinical practice to inform rehabilitation and care planning and in research both to describe outcomes and as a case-mix adjuster. The BI has proven a useful scale, but there is scope for improvement; for example, floor and ceiling effects of BI scoring are well described.6 For any assessment, there is a trade-off between the time and effort required for testing and the validity of the data acquired.7 Although administration time for BI assessment is modest, there is still an opportunity cost, particularly in busy clinical settings.8
The time taken to complete a scale matters to the assessor (longer time spent in assessment leaves less time for other clinical activity) and to the patient (test burden is a particular issue in the context of acute stroke). These issues will be more apparent in patients with physical, cognitive, or communication difficulties, yet this is exactly the population that requires robust assessment of function. In the National Health Service (NHS) England and Wales National Stroke Audit, the completion rate of BI measures was ≈60%, with lack of time cited as the reason for poor completion.9 The problem is not unique to the BI: in large registries, completion of the modified Rankin Scale (mRS) was around 75%, with lower completion in those with more severe impairments.10 In a rehabilitation study, completion of the Stroke Impact Scale was limited, with potential to bias results.11
In this situation, the ideal would be a shortened form of the BI that offered prompt assessment without sacrificing clinical properties. The high internal reliability of the BI suggests that certain component items of BI are redundant, and there is potential to condense the scale.6
We sought to describe and compare the properties of published short-form versions of the BI (SF-BI) using a 2-stage approach: first, systematically searching the literature for SF-BI, and then validating and comparing the various forms using an independent data set.
Our primary question for the systematic review was: which items are included in short-form versions of the BI for use in patients with stroke? As the purpose of the search was to find SF-BI, we did not perform quantitative summary analyses or quality assessment of primary articles.
We devised a focused search strategy using validated search terms across multidisciplinary electronic databases. After initial scoping searches, we opted to use a concept-based approach with search strings based on concepts of BI/ADL assessment and short forms/psychometric properties of scales. Search strings were based on MeSH and other controlled vocabulary (Material I in the online-only Data Supplement).
We searched across 3 electronic databases (Medline [Ovid], Embase [Ovid], and Health and Psychosocial Instruments [Ovid]) all from inception to December 2015. We used citation searching (backwards searching) and assessed all articles that had cited the index article (forwards searching).
We included any article that described a shortened (<10 items) version of the BI. We limited inclusion to studies of patients with stroke or brain injury but applied no restrictions on language, date, or study design.
Titles and abstracts generated from the electronic database searches were screened for relevance. Irrelevant titles and abstracts were excluded, and full-text articles were inspected to determine eligibility. As a test of external validity, we preselected 2 studies12,13 relevant to the study question from a previous review of BI properties,6 and we assessed whether the search included these studies.
We extracted details of studies meeting inclusion criteria to a prespecified pro forma. We described the items included in the short form, the derivation sample, the method used for item reduction, and any validation. We included data in the primary publications and supplementary materials but did not contact study authors for additional detail.
We followed, where appropriate, Preferred Reporting Items for Systematic Review and Meta-Analysis (PRISMA) best practice guidance for design, conduct, and reporting of this systematic review.14 All aspects of title selection, assessment, and data extraction were performed by 2 independent researchers trained in systematic review (T.J.Q., M.T.-R.). The review protocol was registered with Research Registry (http://www.researchregistry.com, researchregistry1213).
Validation of SF-BI
Validity is the extent to which a rating scale measures what it purports to measure.15 We used multiple complementary approaches to validate each of the SF-BI identified by literature searching, all prespecified and described in our protocol. On the basis of peer-review advice, we added a further assessment of divergent validity. For these analyses, we used the VISTA (Virtual International Stroke Trials Archive) resource. All analyses used SAS version 9.4 software (SAS Institute, Cary, NC).
VISTA is a not-for-profit repository for stroke trial data, containing anonymized, study-quality, individual patient-level data on thousands of participants. These data have been used to investigate novel hypotheses, including analyses of stroke assessment scale properties.16,17 We selected all patient-level data that contained BI along with any other functional outcome measure. Within VISTA, we had access to data sets from acute stroke trials and rehabilitation studies. We ensured that studies included in our VISTA data sets had not been used to develop any of the published SF-BI found on literature searching. A priori, we decided to treat data from acute stroke trials and from rehabilitation studies separately because we thought that they might have differing outcome measures and differing participant case-mix.
We described clinical and demographic features of the acute and rehabilitation data sets. Where data were collected at >1 time point, we used the time point that gave the largest data set. We assessed internal consistency of the standard (10-item) BI using Cronbach α.
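For reference, Cronbach α for k items is α = k/(k − 1) × (1 − Σσ²ᵢ/σ²total), where σ²ᵢ is the variance of item i and σ²total is the variance of the summed score. The analyses themselves used SAS; the Python sketch below is only an illustration, assuming a complete-case score matrix with one row per patient and one column per BI item. The function name `cronbach_alpha` and the demonstration scores are hypothetical.

```python
import numpy as np


def cronbach_alpha(items) -> float:
    """Cronbach's alpha for an (n_patients, k_items) score matrix.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    """
    items = np.asarray(items, dtype=float)
    if items.ndim != 2 or items.shape[1] < 2:
        raise ValueError("expected a 2-D array with at least 2 item columns")
    # Complete-case analysis (an assumption): drop patients with any missing item
    items = items[~np.isnan(items).any(axis=1)]
    k = items.shape[1]
    sum_item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - sum_item_var / total_var)


# Hypothetical scores for 5 patients on 3 BI items (illustrative, not VISTA data)
demo = np.array([[10, 15, 15], [5, 10, 10], [0, 5, 0], [10, 10, 15], [5, 5, 5]])
print(f"alpha = {cronbach_alpha(demo):.3f}")  # prints: alpha = 0.933
```

Values close to 1 indicate that items largely move together, which is the sense in which a high α was read as evidence of item redundancy.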
Concurrent, Convergent, and Divergent Validity
We described concurrent validity by assessing the agreement (Spearman rank correlation) of each SF-BI with the standard BI and with the various other short forms. We assessed convergent validity by describing agreement with other functional outcome assessments. After initial scoping of available data, our chosen comparator outcomes for acute data were the mRS and the National Institutes of Health Stroke Scale (NIHSS). For rehabilitation data, we used the health-related quality of life tools 5 domain EuroQOL (including its visual analogue scale) and the Stroke Impact Scale. Five domain EuroQOL data were transformed into a single index using the Europe visual analogue scale data set. For divergent validity, we described association with an aphasia scale, the Sheffield Screening Test for Acquired Language Disorders. We hypothesized that agreement with SF-BI would be greatest for the other activity/impairment-level scales (mRS and NIHSS), lower for the quality of life scales (5 domain EuroQOL, visual analogue scale, and Stroke Impact Scale), and lowest for the aphasia scale.
We used univariate ordinal regressions to assess the association of each SF-BI with clinical and demographic features known to influence stroke outcome, where data were available (age, baseline stroke severity, physiological variables, comorbidity, previous stroke, and use of thrombolytic therapy). In the first analysis, we described the cross-sectional association of a point change in each SF-BI with these clinical and demographic factors. In the second analysis, we described the odds of a point change in SF-BI associated with a unit change in NIHSS or mRS at 90-day follow-up.
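Spearman ρ is the Pearson correlation of the two variables' ranks, with tied values given the mean of the ranks they span; this matters for BI data, where many patients share the same score. The NumPy sketch below implements that definition directly (`scipy.stats.spearmanr` would give the same value); the function names and the paired scores are hypothetical. Because higher mRS and NIHSS scores indicate worse outcome, agreement with these scales appears as a negative ρ, so the hypothesized ordering of agreement is best compared on |ρ|.

```python
import numpy as np


def average_ranks(values) -> np.ndarray:
    """Rank data 1..n, giving tied values the mean of the ranks they span."""
    x = np.asarray(values, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        # Extend j over the run of values tied with sorted_x[i]
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        # Sorted positions i..j hold ranks i+1..j+1; assign their mean
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman_rho(x, y) -> float:
    """Spearman rho as the Pearson correlation of tie-averaged ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


# Hypothetical paired scores (illustrative only): a 3-item SF-BI (0-40) and mRS (0-6)
sf_bi = [40, 35, 20, 10, 0, 30, 25]
mrs = [0, 1, 3, 4, 5, 2, 2]
# Expected to be strongly negative, since higher mRS means more disability
print(f"rho(SF-BI, mRS) = {spearman_rho(sf_bi, mrs):.2f}")
```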
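A univariate ordinal (proportional odds) regression of an SF-BI outcome on a single predictor can be sketched as follows. The analyses themselves were run in SAS; this Python version is only an illustration under stated assumptions: the outcome is coded as consecutive integer categories 0 to K−1 (for example, a 0 to 40 three-item SF-BI divided by 5), the predictor is one numeric variable such as 90-day NIHSS, and the function name `fit_proportional_odds` and the simulated data are hypothetical. Under this parameterization, exp(β) is the odds ratio for being in a higher outcome category per unit increase in the predictor.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit


def fit_proportional_odds(x, y, n_categories):
    """Univariate cumulative-logit (proportional odds) model.

    P(Y <= j | x) = expit(theta_j - beta * x), j = 0..K-2, with Y coded 0..K-1.
    exp(beta) is the odds ratio for a higher Y category per unit increase in x.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=int)
    k = n_categories

    def unpack(params):
        beta = params[0]
        # First threshold is free; later ones add exp(.) increments so they stay ordered
        theta = np.cumsum(np.concatenate(([params[1]], np.exp(params[2:]))))
        return beta, theta

    def neg_log_lik(params):
        beta, theta = unpack(params)
        cuts = np.concatenate(([-np.inf], theta, [np.inf]))
        eta = beta * x
        # P(observed category) = difference of adjacent cumulative probabilities
        prob = expit(cuts[y + 1] - eta) - expit(cuts[y] - eta)
        return -np.sum(np.log(np.clip(prob, 1e-12, None)))

    start = np.concatenate(([0.0, -1.0], np.zeros(k - 2)))
    result = minimize(neg_log_lik, start, method="BFGS")
    beta, theta = unpack(result.x)
    return beta, theta, result


# Simulated data (not VISTA): latent outcome = 0.8 * x + logistic noise, cut at 3 thresholds
rng = np.random.default_rng(0)
x_demo = rng.normal(size=3000)
y_demo = np.digitize(0.8 * x_demo + rng.logistic(size=3000), [-1.0, 0.5, 2.0])
beta_hat, theta_hat, _ = fit_proportional_odds(x_demo, y_demo, n_categories=4)
print(f"beta = {beta_hat:.2f}, OR per unit x = {np.exp(beta_hat):.2f}, thresholds = {np.round(theta_hat, 2)}")
```

Keeping the thresholds ordered through exponentiated increments lets an unconstrained optimizer be used; a dedicated statistics package would also report standard errors and confidence intervals, which this sketch omits.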
As a final test, we explored the most discriminating BI items in the acute VISTA data set, using exploratory factor analysis to suggest a minimum number of items for a short form and further analyses to determine the optimal items for this short form. Because of its larger sample size, factor analysis was restricted to the acute data set. We first described the correlation, using Spearman ρ, of each BI item with the total score. Correlations between individual items were explored to investigate item redundancy, and exploratory factor analysis was performed to investigate the underlying structure of the BI. As a final test of content validity, we derived a short form from the VISTA data. First, we used exploratory factor analysis to outline the minimum number of factors needed for SF-BI. We then used a stepwise selection process, sequentially removing the poorest performing individual item (based on the correlation with total BI and Cronbach α) and comparing properties, to find the 3 items within VISTA that had optimal properties. We compared the resulting VISTA-derived short form (herein referred to as SF-BI VISTA) with the short forms identified from the systematic review.
From 3546 titles, we found 8 articles12,13,18–23 describing 6 differing short forms of the BI (PRISMA diagram, Figure I in the online-only Data Supplement). Some of the articles were validations of previously described scores,19,20 although it was not always clear from the text whether the SF-BI presented was a derivation or a validation. Both of our prespecified articles12,13 were retrieved by the original search, supporting the validity of the search strategy.
The short forms differed in the number of included items, the nature of the items included, and the methodology used for item reduction. The short forms included a variety of BI items, and all BI items were included in at least one of the short forms. Ability to perform transfers was a feature of most of the SF-BI, whereas dressing was included in only 1 SF-BI (Figure). The short forms described by Bohannon and Landes19 and Ellul et al13 required additional computation to assign a total score, and we applied these formulae to derive the score before any of the validation analyses.
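The factor-count and item-reduction steps can be sketched as a backward elimination. The exact factor-retention rule and the way item-total correlation and α were combined are not specified here, so this sketch makes two labelled assumptions: the number of factors is suggested by the Kaiser rule (eigenvalues of the Spearman item correlation matrix greater than 1), and at each step the item with the lowest Spearman correlation with the full-scale total is dropped, with Cronbach α of the remaining items recorded for comparison. Function names and the simulated data are hypothetical.

```python
import numpy as np
from scipy.stats import rankdata


def spearman_matrix(data) -> np.ndarray:
    """Spearman correlation matrix: Pearson correlation of column-wise average ranks."""
    data = np.asarray(data, dtype=float)
    ranks = np.column_stack([rankdata(col) for col in data.T])
    return np.corrcoef(ranks, rowvar=False)


def kaiser_factor_count(items) -> int:
    """Number of eigenvalues > 1 of the item correlation matrix (Kaiser rule, an assumption)."""
    eigenvalues = np.linalg.eigvalsh(spearman_matrix(items))
    return int(np.sum(eigenvalues > 1.0))


def cronbach_alpha(items) -> float:
    k = items.shape[1]
    return k / (k - 1) * (1 - items.var(axis=0, ddof=1).sum() / items.sum(axis=1).var(ddof=1))


def backward_item_reduction(items, names, target=3):
    """Repeatedly drop the item least correlated (Spearman) with the full-scale total."""
    if target < 2:
        raise ValueError("target must be at least 2 so that alpha is defined")
    items = np.asarray(items, dtype=float)
    full_total = items.sum(axis=1)  # total of all original items, held fixed
    keep = list(range(items.shape[1]))
    history = []  # (dropped item, alpha of the remaining items)
    while len(keep) > target:
        rhos = [spearman_matrix(np.column_stack([items[:, j], full_total]))[0, 1] for j in keep]
        worst = keep[int(np.argmin(rhos))]
        keep.remove(worst)
        history.append((names[worst], cronbach_alpha(items[:, keep])))
    return [names[j] for j in keep], history


# Simulated data (not VISTA): 3 items driven by one latent trait plus 2 pure-noise items
rng = np.random.default_rng(1)
latent = rng.normal(size=500)
data = np.hstack([latent[:, None] + 0.3 * rng.normal(size=(500, 3)), rng.normal(size=(500, 2))])
kept, steps = backward_item_reduction(data, ["a", "b", "c", "d", "e"], target=3)
print("factors (Kaiser):", kaiser_factor_count(data), "kept:", kept, "steps:", steps)
```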
The authors used various approaches to the derivation of the SF-BI, and methods of derivation and validation were not consistently described. One of the articles described a short form with no reference to derivation or validation.23 Only the 5-item SF-BI described by Hobart and Thompson12 and 3-item SF-BI described by Ellul et al13 had robust derivation, multimodal validation, and further validation of the unmodified scale in an external data set (Table 1).
The VISTA database had 8852 acute strokes with a recorded measurement of BI at 90 days (919 intracerebral hemorrhage, 7933 ischemic), 8493 of whom also had a complete BI measurement at day 30. The rehabilitation data set had 332 participants with a recording of BI at baseline. For these rehabilitation studies, baseline assessments were predominantly at 4 weeks post ictus. The included patients were broadly representative of trial populations: mean age was 68.1 years (SD, 12.4), with n=3943 (44.5%) female, in the acute data set and 65.7 years (SD, 11.0), with n=107 (32%) female, in the rehabilitation data set. Both populations had prevalent comorbidity, for example, ischemic heart disease and diabetes mellitus (Tables I and II in the online-only Data Supplement).
There was a spread of BI scores across both data sets: for acute data, median BI at day 90 was 80 (interquartile range, 60), and for rehabilitation data, median baseline BI was 75 (interquartile range, 35). Internal consistency of the complete BI was high in the acute data set, with α 0.95 (BI at days 30 and 90). For the rehabilitation data set, α was 0.85 (baseline BI).
We assessed concurrent and convergent validity of each SF-BI in our data sets. Agreement of SF-BI with the full BI was excellent in both data sets (Table 3). Each SF-BI showed significant (P<0.0001) correlations with all our chosen outcome measures. For acute data, correlations with mRS and NIHSS were strong. In the rehabilitation data set, baseline SF-BI showed weaker correlations with quality of life measures, albeit roughly equivalent to those seen for the full BI. Correlations were strongest for the Stroke Impact Scale, a measure that includes assessment of ADL, and weakest for the visual analogue scale (VAS). Correlations with the aphasia measure (divergent validity) were weak (Table 3).
Our assessment of predictive validity was limited to the acute data set because of the small number of common follow-up assessments in the rehabilitation data set. On univariate ordinal analyses, several factors known to predict outcome were significantly associated (P<0.0001) with SF-BI (Table IV in the online-only Data Supplement). Each SF-BI was significantly predictive of mRS at day 90 and NIHSS at day 90 in univariate ordinal regressions (Table V in the online-only Data Supplement). This association persisted after adjustment for the relevant clinical attributes suggested by univariate analysis (age, sex, and stroke type; Table 4).
As a test of content validity, we derived a correlation matrix; between-item analyses suggested redundancy, with correlations of >0.7 for most individual items (Material III in the online-only Data Supplement). Exploratory factor analysis identified 2 independent factors within BI, with a potential third cross-loading factor (Table VI in the online-only Data Supplement). On this basis, we derived a 3-item SF-BI and, for comparison, a 5-item SF-BI. The optimal 3-item scale comprised bladder control, transfer, and mobility, that is, the items used by Ellul et al.13 The optimal 5-item scale comprised dressing, toileting, transfers, mobility, and stairs.
As a post hoc exploratory analysis, we compared a simple sum of the 3 items, as used in SF-BI VISTA, with the scores generated using the formulae suggested by Ellul et al.13 In the context of our validation analyses, there was no evidence that, compared with the Ellul formula, the simple sum score used in SF-BI VISTA correlated less well with other BI-derived scales (Table 3) or was less strongly associated with mRS or NIHSS (Table 4).
Using a systematic review and secondary analyses of existing data, we have described the validity of various published SF-BI. Our review of the literature found various SF-BI with differing numbers of items, differing components, and differing scoring. The derivation and validation of these scales were inconsistently described. However, our independent validation using a large data set confirmed the potential item redundancy within BI (high internal reliability) and supported the use of a short form comprising 3 items (the form originally described by Bohannon et al18,19 and Ellul et al13).
On the basis of our analyses, we would recommend a 3-item SF-BI that assesses bladder control, mobility, and transfers. We feel this offers parsimony while still capturing key aspects of ADL. Comparing the existing 3-, 4-, and 5-item SF-BI, there was no obvious increase in our measures of validity with an increasing number of items. We note that the ability to perform transfers appeared in almost all the short forms and suspect that any short form should include this item. The component items of the 3-item SF-BI should be relatively simple to score, and our post hoc analysis suggests that the item scores can be summed to give a total score without the need for additional calculations.
We chose to validate existing SF-BI rather than focus on creating our own de novo SF-BI. A priori, we suspected that various SF-BI would be available, and we recognize the difficulty of establishing a novel assessment in routine use.24 We designed analyses that assessed concurrent, convergent, predictive, and content validity. The ideal comparator for convergent validity would have been another ADL assessment. Such data were not available within VISTA; this concurs with our previous finding that BI is the most prevalent ADL assessment in trials and that other measures are infrequently used.2 We assessed agreement with similar outcome assessment scales (mRS and NIHSS) and with assessments that measure differing constructs (5 domain EuroQOL, Sheffield Test). Assessing BI against mRS and NIHSS is in keeping with previous work on the properties of stroke outcome measures.25 The weaker agreement with the aphasia and quality of life scales supports the short forms as tools describing ADL rather than generic measures of stroke recovery.
We feel confident in the properties of the 3-item SF-BI that we recommend because it performed well in our validation analyses and because previous derivation and validation studies have been described. We also note that for some of our convergent validity analyses, the short forms performed better than the full BI. This may suggest that, as a prognostic tool or case-mix adjuster, baseline short forms of the BI may be preferable to the full assessment.
In creating a shorter version of an existing scale, there is a compromise between ease of use and richness of data captured. Standard BI is already a reasonably short assessment scale; indeed, various groups have suggested that BI lacks granularity and have proposed adding items to the scoring or the scale.26,27 We do not envisage the SF-BIs being used for individual clinical assessment; rather, we think the short scales will have use in large-scale audit, epidemiology, and clinical research. Time required for testing is a major factor in determining acceptability of a scale to therapists.28 The SF-BIs described in the literature had a minimum of 3 items, but assessments could be made shorter still. Our factor analysis suggests 2 main factors within BI, in keeping with previous descriptions.29 There is also a literature describing the use of single-question assessments for certain disease states.30
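The simple-sum scoring of the recommended 3-item SF-BI can be sketched as follows. This assumes the standard 0 to 100 BI item weights (bladder 0/5/10; transfers and mobility 0/5/10/15), giving a 0 to 40 total; on the 0 to 20 version the same items score 0 to 2, 0 to 3, and 0 to 3. The function names and dictionary keys are hypothetical, and the Ellul et al weighting formula is deliberately not reproduced because its coefficients are not given in this text.

```python
# Allowed item scores, assuming the 0-100 Barthel Index weights.
# On the 0-20 version the same items score 0-2, 0-3, and 0-3 (total 0-8).
ALLOWED_SCORES_100 = {
    "bladder": (0, 5, 10),
    "transfer": (0, 5, 10, 15),
    "mobility": (0, 5, 10, 15),
}


def sf_bi_3item(bladder: int, transfer: int, mobility: int) -> int:
    """Simple-sum 3-item short-form Barthel score (range 0-40 on 0-100 item weights)."""
    scores = {"bladder": bladder, "transfer": transfer, "mobility": mobility}
    for item, value in scores.items():
        if value not in ALLOWED_SCORES_100[item]:
            raise ValueError(f"{item}={value} is not a valid 0-100 BI item score")
    return bladder + transfer + mobility


def sf_bi_from_full_bi(record: dict) -> int:
    """Derive the short form from a full BI record keyed by (hypothetical) item names."""
    return sf_bi_3item(record["bladder"], record["transfer"], record["mobility"])


print(sf_bi_3item(bladder=10, transfer=15, mobility=10))  # prints 35
```

Validating each item against its permitted values is a cheap guard against the transcription errors that a shorter form is hoped to reduce.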
Having suggested a promising 3-item SF-BI, the next step would be to use this short-form assessment and describe whether it offers any benefit over traditional BI in terms of feasibility, acceptability, and completion rates. We speculate that a 3-item SF-BI will lessen assessment time, lessen test burden for patients, and lead to fewer data transcription errors, but all of this remains to be proven. We are encouraged that large audits, registries, and clinical trials are already incorporating SF-BI into their test batteries, and we would encourage any groups using the short form to share their experiences with the stroke community.
There is emerging best practice guidance on derivation and validation of short-form assessments.31 The articles included in our review predate this guidance, and so the variation in conduct and reporting across the studies is understandable. There is no consensus tool for quality assessment of such studies. We felt, as a minimum, articles should describe their derivation cohort and method, use at least 2 differing validation techniques, and have further validation in an independent data set. Few of the SF-BI described in the literature fulfilled all these criteria, and our VISTA-based analysis assists by providing robust, multimodal validation in a large, external data set.
The strengths of our approach include a robust literature search with internal and external validity checks and access to a large, study-quality data set. The size of the VISTA resource allowed us to examine properties of BI with greater precision than previously described. We recognize that our literature review searched a relatively limited range of databases, with no meta-analyses or quality assessment. The purpose was to discover SF-BI, and our internal checks suggest we achieved this. A limitation of our study concerns the generalizability of the VISTA population. VISTA data are from randomized controlled trials, and participants may not be representative of unselected stroke admissions. This is less of a concern because we propose that the SF-BI be used for audit and research purposes rather than individual patient clinical assessment. Our focus was stroke, as VISTA is a stroke-specific resource and BI is often used in stroke trials. We suspect that our SF-BI could be used in nonstroke populations. However, we found few published articles describing SF-BI in nonstroke settings. Where data were available, properties seemed favorable,32 but further validation work would be needed before we could recommend SF-BI for other conditions. We recognize that validating a short form does not address some of the inherent limitations of the BI as a measure of ADL,33 but the shortened scale should, at least, address the issue of efficiency of assessment.
Our data support use of a shortened Barthel for assessment of stroke populations. On the basis of multimodal validation analyses, we recommend a 3-item scale that sums ability to transfer, ability to mobilize, and bladder control. We hope that this short form may prove useful in future large-scale trials, registries, and audit.
Sources of Funding
This work was supported by Greater Glasgow and Clyde Endowments; the funder had no role in analysis. Dr Quinn is supported by a Stroke Association/Chief Scientist Office Senior Clinical Lectureship.
* A list of all VISTA Collaborators is given in SIX in the online-only Data Supplement.
Guest Editor for this article was Eric E. Smith, MD, MPH.
Presented in part at the International Stroke Conference, Nashville, TN, February 11–12, 2015.
The online-only Data Supplement is available with this article at http://stroke.ahajournals.org/lookup/suppl/doi:10.1161/STROKEAHA.116.014789/-/DC1.
- Received July 20, 2016.
- Revision received December 12, 2016.
- Accepted December 19, 2016.
- © 2017 American Heart Association, Inc.
- Quinn TJ, Dawson J, Walters MR, Lees KR.
- Quinn TJ, McArthur K, Ellis G, Stott DJ.
- Quinn TJ, Langhorne P, Stott DJ.
- Irwin P, Rutledge Z, Lowe DA.
- Kwok CS, Potter JF, Dalton G, George A, Metcalf AK, Ngeh J, et al.
- Hobart JC, Thompson AJ.
- Ellul J, Watkins C, Barer D.
- Moher D, Liberati A, Tetzlaff J, Altman DG.
- MacIsaac R, Ali M, Peters M, English C, Rodgers H, Jenkinson C, et al.
- Ali M, Fulton R, Quinn T, Brady M.
- Bohannon RW.
- Bohannon RW, Landes M.
- Hsueh IP, Lin JH, Jeng JS, Hsieh CL.
- McAvoy E.
- Hendry K, Hill E, Quinn TJ, Evans J, Stott DJ.
- Tagharrobi Z, Sharifi K, Sooky Z, Tagharrobi L.