Reliability of the Modified Rankin Scale
To the Editor:
We read with interest the review of the modified Rankin Scale (mRS) by Banks and Marotta.1 Although not developed as a trial end point, mRS is now the preferred disability outcome measure in stroke trials, and discussion of the scale and its clinimetric properties is timely and valuable. Several points raised are worthy of further consideration.
Accurate quantification of test-retest reliability for clinical scales is challenging. Testing intraobserver variability over a short time period will be biased by observer recall of the previous grading, whereas delaying the second assessment allows for potentially significant patient improvement or disease progression. We should be cautious in extrapolating the results of studies that have reported acceptable intraobserver reliability of mRS2,3 because both studies used a maximum delay of 2 weeks between gradings. Assuming a consistent approach is used, video recording of the mRS interview for later review should be less prone to such bias. We are currently analyzing data from a project that used this design.4
Interobserver variability of mRS has been extensively studied and the authors provide a succinct review; however, again we must exercise caution in interpreting the positive results reported. The majority of published studies testing interobserver variability have limited their observers to a few individuals, or even one, with similar background training and from a single center.3 Country of origin5 and background training of assessor6 both influence mRS grading. Thus, robust testing of mRS reliability across multiple raters requires a range of international centers, mimicking a large-scale clinical trial. Studies using this approach show disappointing agreement in standard mRS grading (κ=0.25).3
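For readers less familiar with the κ statistic quoted above, the following is a minimal sketch of unweighted Cohen's κ for two raters. It is offered only as an illustration: the function name and the example grades are hypothetical, and the multirater studies cited use multirater and weighted forms of κ that this two-rater sketch does not reproduce.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters grading the same patients.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)

    # Proportion of patients given the same grade by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Agreement expected by chance, from each rater's marginal grade frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[grade] * freq_b[grade] for grade in freq_a) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single grade")
    return (observed - expected) / (1 - expected)


# Hypothetical mRS grades (0-5) from two assessors for eight patients.
assessor_1 = [0, 1, 2, 2, 3, 3, 4, 5]
assessor_2 = [0, 2, 2, 3, 3, 4, 4, 5]
kappa = cohen_kappa(assessor_1, assessor_2)
```

Because κ discounts agreement expected by chance, it can be low even when raw percentage agreement appears moderate.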
Reliability of mRS is of more than clinimetric interest. Misclassification of mRS will increase type II error rate and decrease statistical power. The significant effect of misclassification of clinical end points has already been described: in one large-scale trial, minor variation in classification of cause of death reduced trial power by 40%.7 Issues of statistical power are of both ethical and economic importance in stroke medicine, where previous trials have been underpowered to detect modest but meaningful treatment effects.
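The link between misclassification and power can be illustrated with a simple calculation. The sketch below assumes, purely for illustration, a dichotomized mRS end point (good versus poor outcome), nondifferential misclassification described by a sensitivity and specificity, and a normal-approximation power formula for comparing two proportions. The function names, event rates, sample size, and error rates are all hypothetical and are not taken from any trial cited here.

```python
from math import sqrt
from statistics import NormalDist


def observed_rate(true_rate, sensitivity, specificity):
    """Good-outcome rate actually recorded when grading is imperfect."""
    return true_rate * sensitivity + (1 - true_rate) * (1 - specificity)


def power_two_proportions(p1, p2, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_arm)
    se_alt = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_arm)
    # Normal approximation; the negligible opposite-tail term is ignored.
    return z.cdf((abs(p1 - p2) - z_crit * se_null) / se_alt)


# Hypothetical trial: 30% vs 37% true good-outcome rates, 700 patients per arm.
control_true, treated_true, n = 0.30, 0.37, 700
power_true = power_two_proportions(control_true, treated_true, n)

# The same trial with assumed 85% sensitivity and specificity of outcome grading.
control_obs = observed_rate(control_true, 0.85, 0.85)
treated_obs = observed_rate(treated_true, 0.85, 0.85)
power_obs = power_two_proportions(control_obs, treated_obs, n)
```

Under these assumptions the recorded treatment difference shrinks by a factor of (sensitivity + specificity − 1), so power falls unless the sample size is increased to compensate.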
Methods to improve the reliability of mRS have been developed, and we anticipate their increasing use in stroke trial design. Banks and Marotta discuss the improvements that can be achieved with a structured mRS interview,3 but we would draw attention to other methods currently being developed. Training and certification are now prerequisites for National Institutes of Health Stroke Scale (NIHSS) use in clinical trials, with resultant improved reliability.8 A video-based training system for mRS has been developed and used successfully in 3 large-scale trials (SAINT I; CHANT; SAINT II)5; further validation is underway. Shinohara demonstrated a further application of video technology, using mRS interview videos to validate a native language questionnaire.9 Such an approach could be extended to a multicenter trial setting with central “off-line” assessment of mRS. Pilot work to further explore the use of such an approach is ongoing.4
1. Banks JL, Marotta AC. Outcomes validity and reliability of the Modified Rankin Scale: implications for stroke clinical trials. Stroke. 2007; 38: 1091–1096.
2. Wolfe CDA, Taub NA, Woodrow EJ, Burney PGJ. Assessment of scales of disability and handicap for stroke patients. Stroke. 1991; 22: 1242–1244.
3. Wilson JTL, Hareendran A, Hendry A, Potter J, Bone I, Muir KW. Reliability of the modified Rankin Scale across multiple raters: benefits of a structured interview. Stroke. 2005; 36: 777–781.
4. Quinn TJ, Dawson J, Walters MR, Lees KR. Initial experience with video based modified Rankin assessment. Presented at European Stroke Conference; 2007.
5. Quinn TJ, Lees KR, Hardemark HG, Dawson J, Walters MR. Initial experience of a digital training resource for modified Rankin scale assessment in clinical trials. Stroke. 2007; 38: 2257–2261.
6. Quinn TJ, Hardemark HG, Lees KR. Influence of clinical speciality on modified Rankin scale assessment. Presented at British Geriatrics Society Spring Meeting; 2007.
7. Jaffar S, Leach A, Smith PG, Cutts F, Greenwood B. Effects of misclassification of causes of death on the power of a trial to assess the efficacy of a pneumococcal conjugate vaccine in the Gambia. Int J Epidemiol. 2003; 32: 430–436.
8. Lyden P, Brott T, Tilley B, Welch KMA, Mascha EJ, Levine S, Haley EC, Grotta J, Marler J. Improved reliability of NIH stroke scale using video training. Stroke. 1994; 25: 2220–2226.