Training and Consistency in Stroke Assessments
See related articles, pages 2507–2511 and 2557–2559.
Stroke is a global disease. It needs global tools for description and outcome assessment, common definitions for risk factors, common definitions for complications such as symptomatic hemorrhage, and common investigation protocols.
Research into treatments for stroke depends on enrollment of large numbers of patients, possible only through international cooperation. Wide variation in initial stroke severity requires us to describe the population that we enroll. The National Institutes of Health Stroke Scale is now the most widely used scale for measuring stroke severity in clinical trials1 and lies second only to the modified Rankin Scale for choice as a primary end point.2 This trend toward homogeneity is important; consistency in language is required both within trials and when interpreting their results. For example, use of common scales allowed pooling of data from the National Institute of Neurological Disorders and Stroke, Alteplase Thrombolysis for Acute Noninterventional Therapy in Ischemic Stroke, and European Cooperative Acute Stroke Study trials to examine the influence of onset to treatment time with alteplase.3 Use of common scales also assists selection of patients for routine care through translation of trial results into practice. It lets us understand trends in clinical practice, facilitating comparisons over time and across regions.4 With clinical trial data from tens of thousands of patients archived by groups such as the Virtual International Stroke Trials Archive,5 it is now possible to examine trends in natural history, to plan selection criteria for future trials, and perhaps to cross-validate trial results using data that were collected in a consistent manner using common tools.
The benefits that accrue from use of common scales depend on consistency of application across raters and over time. There will be a temptation to modify the scale to improve reliability, by adding or omitting items, and by adjusting the weighting given to components. This should be resisted. An imperfect scale applied consistently will be more useful than modifications intended to improve validity or reliability. The articles by Lyden and colleagues in this journal6,7 provide reassurance in this respect. By demonstrating that even among general (nonexpert) users, the National Institutes of Health Stroke Scale can achieve good interrater reliability, Lyden et al have given further support to the choice of the National Institutes of Health Stroke Scale as the pre-eminent stroke severity rating. Pezzella and colleagues7 have found that with translation into Italian, the scale can achieve reliability similar to that of the English version. The lower reliability of the scale in their hands among nurses contrasts with the more extensive analysis of the English version by Lyden.6 He and colleagues find that nonneurologists are just as reliable in their application of the scoring rules. Training can be robustly applied across multiple venues and specialties. This pattern appears to be consistent with other scales such as the modified Rankin Scale.8 Of course, language forms only a component of the National Institutes of Health Stroke Scale, both with regard to scoring and instructions; mostly the scoring is based on observation of physical performance. Validity of the National Institutes of Health Stroke Scale in other languages thus does not guarantee that interview-based assessments such as the modified Rankin Scale will achieve similar consistency after translation.
It would be intriguing to re-examine trials to assess the extent to which the expertise or reliability of the individual rater contributes to the variation in severity ratings, to outcome assessments, and perhaps even to the trial conclusion. This must be a topic for continued effort because description of patients’ baseline characteristics and outcomes plays such a crucial role in determining whom we should treat and how effective our interventions are.
We should congratulate Dr Lyden and colleagues for taking an imperfect tool and guiding its use in a way that has made it an invaluable part of every modern stroke trial and a mandatory skill for professionals in at least one country.
The opinions in this editorial are not necessarily those of the editors or of the American Heart Association.
Quinn TJ, Dawson J, Walters MR, Lees KR. Functional outcome measures in contemporary stroke trials. Int J Stroke. In press.
Quinn TJ, Lees KR, Hardemark HG, Dawson J, Walters MR. Initial experience of a digital training resource for modified Rankin Scale assessment in clinical trials. Stroke. 2007; 38: 2257–2261.
Ali M, Atula S, Bath PMW, Grotta JC, Hacke W, Lyden PD, Marler JR, Sacco RL, Lees KR; for the VISTA Investigators. Stroke outcome in clinical trial patients deriving from different countries. Stroke. 2009; 40: 35–40.
Ali M, Bath PMW, Curram J, Davis SM, Diener HC, Donnan GA, Fisher M, Gregson BA, Grotta J, Hacke W, Hennerici MG, Hommel M, Kaste M, Marler JR, Sacco RL, Teal P, Wahlgren NG, Warach S, Weir CJ, Lees KR. The Virtual International Stroke Trials Archive. Stroke. 2007; 38: 1905–1910.
Lyden P, Ramani R, Liu L, Emr M, Warren M, Marler J. NIHSS certification is reliable across multiple venues. Stroke. 2009; 40: 2507–2511.
Pezzella FR, Picconi O, De Luca A, Lyden PD, Fiorelli M. Development of the Italian version of the NIH Stroke Scale: It-NIHSS. Stroke. 2009; 40: 2557–2559.
Quinn TJ, Dawson J, Lees KR, Walters MR. Variability in modified Rankin scoring across a large cohort of international observers. Stroke. 2009; 40: in press.