(Stroke. 2003;34:1907.)
© 2003 American Heart Association, Inc.
Original Contributions
From the Department of Neurology and Alzheimer Center (E.C.W. van S., P.S., M.C.V, H.C.W.), Department of Clinical Epidemiology and Biostatistics (D.L.K.), and Department of Radiology and Image Analysis Center (G.K., F.B.), VU Medical Center, Amsterdam, the Netherlands; Department of Radiology, LUMC, Leiden, the Netherlands (M.A. van B.); Department of Epidemiology and Biostatistics, Erasmus Medical Center, Rotterdam, the Netherlands (E.J. van D., N.D.P.); Department of Radiology, University Hospital, Maastricht, the Netherlands (P.A.M.H.); Department of Radiology, National University Hospital, Reykjavik, Iceland (O.K.); Department of Neurology, University Medical Center, Utrecht, the Netherlands (F.-E. de L.); Department of Neurology, Karl Franzens University, Graz, Austria (R.S.); and Department of Neurology, St Lucas/Andreas Ziekenhuis, Amsterdam, the Netherlands (H.C.W.).
Correspondence to E.C.W. van Straaten, Department of Neurology and Alzheimer Center, VU Medical Center, De Boelelaan 1117, PO Box 7057, 1007 MB Amsterdam, Netherlands. E-mail i.vanstraaten{at}vumc.nl
Abstract
Methods Six experienced and 4 inexperienced observers rated a set of 40 MRI studies of patients with clinically suspected VaD twice using the NINDS-AIREN set of radiological criteria. After the first reading session, operational definitions were conceived, which were subsequently used in the second reading session. Interobserver reproducibility was measured by Cohen's κ.
Results Overall agreement at the first reading session was poor (κ=0.29) and improved slightly after application of the additional definitions (κ=0.38). Raters in the experienced group improved their agreement from almost moderate (κ=0.39) to good (κ=0.62). The inexperienced group started out with poor agreement (κ=0.17) and did not improve (κ=0.18). The experienced group improved in both the large- and small-vessel categories, whereas the inexperienced group improved generally in the extensive white matter hyperintensities categories.
Conclusions Considerable interobserver variability exists for the assessment of the radiological part of the NINDS-AIREN criteria. Use of operational definitions improves agreement but only for already experienced observers.
Key Words: dementia; magnetic resonance imaging; observer variation; vascular disorders
Introduction
55 years of age to 4.2% in a cohort of subjects
71 years of age.4,5 Differences in diagnostic criteria may partly explain this variability. In 1993, the International Workshop of the National Institute of Neurological Disorders and Stroke (NINDS) and the Association Internationale pour la Recherche et l'Enseignement en Neurosciences (AIREN) reported diagnostic criteria for the diagnosis of VaD for research studies.6 Criteria were formulated for the different parts of the diagnostic process (history and physical, radiological, and pathological examination) to classify patients as having possible, probable, and definite VaD. The NINDS-AIREN criteria state that the diagnosis of probable VaD cannot be made without some form of radiological assessment. Consequently, a list of lesions associated with VaD was included in the NINDS-AIREN criteria.
Recently, considerable interest in clinical trials on the efficacy of cholinesterase inhibitors and other drugs for VaD has emerged, and the NINDS-AIREN criteria with their radiological definitions are being used on a large scale in these trials. However, clear operational definitions on how to use and interpret the radiological criteria are lacking. Only a few interobserver studies of the NINDS-AIREN criteria have been published. In 2 of these studies, both clinical and radiological diagnoses were studied together.7,8 The agreement between raters was moderate to good (κ=0.42 in the first study mentioned, 0.46<κ<0.72 in the second study). It was suggested that a cause of the disappointing results could have been the difference in interpretation of the radiological criteria by the different raters.8
In this study, we examined the interobserver agreement of the radiological part of the NINDS-AIREN criteria and the effect of subsequently formulated operational definitions on the level of agreement in patients with clinical signs of VaD. Second, we investigated whether experienced and inexperienced raters would benefit equally from such definitions.
Methods
50% rejection rate in the above-mentioned trial and to have a balanced distribution in the study sample, MRI studies were selected so that we expected half of the scans to be rated as having sufficient abnormalities to fulfill the NINDS-AIREN criteria. It should be noted that the percentage of cases fulfilling such criteria is not reflective of the general population of patients clinically suspected of having VaD. In addition to the 40 scans, we selected 10 scans to be scored during the first assessment and to be used for consensus reading and formulation of definitions. All MRI studies consisted of axial T2, axial fluid-attenuated inversion recovery, and axial and coronal T1 series using 5-mm slices and 1×1-mm pixel size.
Study Design
Ten raters with different levels of experience evaluated the 40 selected MRI studies in 2 consecutive reading sessions. The decision to use the same data set twice (rather than having 2 independent data sets) was based on the expectation that this would preclude variability introduced by unbalanced matching in the distribution of cases over the various subcategories of the NINDS-AIREN criteria. On the other hand, we expected no bias from a learning effect when the same samples were rated twice because the second rating was done with a set of operational criteria developed from the additional training set of 10 scans; if anything, this design would tend to maintain rather than to reduce interobserver variability and therefore is slightly conservative. The team of raters consisted of 10 physicians (2 radiologists, 4 neurologists, 3 research fellows, 1 neurology resident). Six had extensive experience in the evaluation of vascular lesions on MRI scans in clinical settings or in population-based studies on aging and dementia. The other 4 had experience in assessing MRI scans of the brain, but they had never assessed vascular lesions systematically on a large scale. The raters were blinded to all clinical and personal information. During the first reading session, all raters individually assessed the scans in random order with only the aid of the table of radiological findings of the NINDS-AIREN criteria for VaD as stated in the original article.6 All images were presented to the readers on identical personal computers using a digital viewing program, allowing window and level adjustment. The readers were able to browse through the scans as often as they wanted; no time limits were set.
Scoring consisted of 2 stages. First, lesions had to be identified and classified topographically on a scoring form (Table 1), divided into a section on large-vessel disease (strategic infarcts in certain anterior, middle, or posterior cerebral artery territories) and a section on small-vessel disease (lacunes, white matter hyperintensities, bilateral thalamic lesions). Second, the topographical information had to be combined with severity criteria to decide whether the scan met the radiological criteria for VaD (final diagnosis). Subsequently, a joint consensus reading of the additional 10 scans was held, and operational definitions for scoring vascular lesions according to the NINDS-AIREN criteria were discussed. After consensus on a set of definitions was reached, a second reading of the 40 scans was performed the next day, again in random order, according to the newly formulated operational definitions (Table 2).
Statistical Analysis
We determined agreement between raters for the 2 reading sessions separately by Cohen's κ for >2 raters.10,11 The weighted κ was not used because most scorings were dichotomous and the different categories were not ordered. The calculations were performed with AGREE software (ProGAMMA), which also calculated standard error values. We determined κ for presence of radiological evidence for probable VaD, presence of large-vessel disease, and presence of small-vessel disease. To test whether agreement between the first and second readings differed statistically, we determined z values for the difference in κ and used the corresponding probability value for testing. All scores were calculated for 3 groups: the whole group of raters (n=10), the group of experienced raters (n=6), and the group of inexperienced raters (n=4). A κ between 0 and 0.2 refers to poor agreement; 0.2 to 0.4, fair agreement; 0.41 to 0.60, moderate agreement; 0.61 to 0.80, good agreement; and 0.81 to 1.00, very good agreement.12
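As an illustration of the multirater statistic described above, the following minimal Python sketch computes a κ for more than 2 raters using Fleiss' formulation, one common generalization, and maps the result onto the agreement bands quoted in the text. It is not the AGREE implementation, and whether AGREE computes exactly this quantity is an assumption. The function names, the toy ratings, and the treatment of values that fall exactly on a band boundary are hypothetical choices made for illustration only.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Multirater kappa (Fleiss' formulation).

    ratings: one list per scan, holding one category label per rater.
    Every scan must be rated by the same number of raters.
    """
    n_scans = len(ratings)
    n_raters = len(ratings[0])
    categories = {label for row in ratings for label in row}
    counts = [Counter(row) for row in ratings]

    # Observed agreement: share of agreeing rater pairs, averaged over scans.
    p_obs = sum(
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ) / n_scans

    # Chance agreement from the pooled category proportions.
    total = n_scans * n_raters
    p_exp = sum((sum(c[cat] for c in counts) / total) ** 2 for cat in categories)

    if p_exp == 1.0:
        return float("nan")  # all ratings identical: kappa is undefined
    return (p_obs - p_exp) / (1.0 - p_exp)


def agreement_band(kappa):
    """Verbal label for kappa; boundary handling is an assumption."""
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "very good"


# Hypothetical toy data: 4 scans, 3 raters; 1 = criteria met, 0 = not met.
toy = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 0, 1]]
kappa = fleiss_kappa(toy)
print(round(kappa, 2), agreement_band(kappa))
```

Because the scorings are dichotomous and unordered, the sketch uses no weighting, in line with the decision not to use the weighted κ.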
Results
is given for the various sections of the scoring separately. At the first reading session, agreement in the group of inexperienced raters was generally less than the agreement in the group of experienced raters. This is also true for the assessment of the final diagnosis (Table 5). At the first reading session, mean κ for the final diagnosis for all raters signifies fair agreement.
After the first scoring, operational definitions were formulated in consensus (Table 2). During this consensus meeting, we identified the problems that had arisen with the interpretation of the criteria. The meaning, exact location, and borders of a paramedian thalamic infarction were uncertain in our opinion. We had trouble interpreting the term "multiple basal ganglia and frontal white matter lacunes." Questions that arose included the following: Are lacunes needed in both areas to meet the criteria? How many lacunes is "multiple" exactly? How big should an extensive periventricular white matter lesion be, and is a lesion considered only when directly abutting the ventricles? Should strokes in any area be considered in the bilateral large-vessel hemispheric strokes category, or only those strokes that are scored previously in the topography section? How can we approximate one fourth of the total white matter? We tried to address these questions in the operational criteria, leaving the original set of criteria fundamentally intact. Definitions were laid out for the different radiological types of vascular pathology, different regions of relevant strokes were defined, and for small-vessel disease, numeric definitions were adopted. With respect to the leukoencephalopathy, we agreed on quantification with the use of the age-related white matter changes (ARWMC) rating scale.13 In the severity section, we discussed dominance of hemispheres, and for practical reasons, the left hemisphere was considered dominant. In addition to describing the different parts of the diagnostic criteria, rules on how to combine these parts were added because we noticed differences in opinion during consensus reading. We agreed that a scan would meet the final diagnosis of VaD if both severity and topography criteria were met, with the exception of the bilateral thalamic lesions and multiple lacunes subcategories, which have no related severity criterion.
Table 4 shows that agreement generally increases, especially in the small-vessel category. To calculate significance of change in κ, z values were calculated. For appreciation of large-vessel disease, z values indicated that none of these differences are statistically significant. For small-vessel disease, only the improvement in scoring among the inexperienced raters was statistically significant (P=0.04).
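The comparison of κ between the 2 reading sessions can be sketched as a z test on the difference of the 2 estimates. The article does not give the formula, so the version below is an assumption: it divides the difference by the pooled standard error and treats the 2 estimates as independent, which is a simplification when the same scans and raters are used twice. The function name and all numeric inputs are hypothetical and are not values from this study.

```python
import math


def kappa_difference_test(kappa_1, se_1, kappa_2, se_2):
    """z value and two-sided P for the difference between two kappas.

    Assumes approximately normal, independent estimates (a simplification).
    """
    z = (kappa_2 - kappa_1) / math.sqrt(se_1 ** 2 + se_2 ** 2)
    # Two-sided tail probability of the standard normal distribution.
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided


# Hypothetical inputs chosen only to show the call; not data from the study.
z, p = kappa_difference_test(kappa_1=0.30, se_1=0.07, kappa_2=0.50, se_2=0.07)
print(f"z={z:.2f}, P={p:.3f}")
```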
The mean κ for the final diagnosis for all raters at the second reading was slightly greater than at the first reading session (Table 5). For the experienced group, agreement rose to κ=0.62, but in the inexperienced group, it remained low. Only in the experienced group of raters did agreement improve significantly.
Discussion
After the application of operational definitions, agreement on the final diagnosis of VaD improved. However, stratified analysis showed that this improvement in agreement was confined to the group of experienced raters with a κ of 0.62, indicating good agreement. This was due to improvements in both the large- and small-vessel categories. In the group of inexperienced raters, agreement worsened in the large-vessel category but improved in the small-vessel category. The latter was due mainly to an increase in κ of 0.35 to good agreement in the extensive white matter lesions subcategory.
The design of this study has some limitations. We did not have a gold standard. The operational definitions were not validated against pathology or clinical findings but had the sole purpose of being practical, usable, and able to improve standardization. In addition, the raters did not have clinical information that could have contributed to the final diagnosis. In large clinical trials in which the MRI scans are rated centrally, this information is also not available, but agreement can be expected to improve in a clinical setting because previous studies show higher κ when this information is accessible to the readers. Another limitation of the study might have been the use of κ. In some cases, expected agreement was high because of the very low prevalence of some lesions, especially some stroke types (eg, anterior cerebral artery, paramedian thalamic infarctions). This results in low κ even when agreement is high. Finally, the operational definitions formulated are, of course, arbitrary and may be subject to further amendments. However, the raters who formulated the criteria were the same raters who were going to apply them in the second reading session. It can therefore be expected that they were optimal for use in this interobserver study.
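The effect of low prevalence on κ noted above can be made concrete with a small worked example. The sketch below computes Cohen's κ for 2 raters from a 2×2 table and compares a rare lesion with a balanced one at the same observed agreement. The counts are hypothetical and chosen only to illustrate the mechanism; they are not data from this study, and the function name is an assumption.

```python
def cohen_kappa_2x2(both_yes, a_only, b_only, both_no):
    """Observed agreement, chance agreement, and Cohen's kappa for 2 raters."""
    n = both_yes + a_only + b_only + both_no
    p_obs = (both_yes + both_no) / n
    a_yes = (both_yes + a_only) / n  # rater A's rate of scoring "present"
    b_yes = (both_yes + b_only) / n  # rater B's rate of scoring "present"
    p_exp = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    return p_obs, p_exp, (p_obs - p_exp) / (1 - p_exp)


# Hypothetical rare lesion: 40 scans, each rater scores it present once,
# never on the same scan. Raw agreement is 38/40.
rare = cohen_kappa_2x2(both_yes=0, a_only=1, b_only=1, both_no=38)

# Hypothetical balanced lesion with the same raw agreement of 38/40.
balanced = cohen_kappa_2x2(both_yes=19, a_only=1, b_only=1, both_no=19)

for name, (p_obs, p_exp, kappa) in (("rare", rare), ("balanced", balanced)):
    print(f"{name}: observed={p_obs:.3f} expected={p_exp:.3f} kappa={kappa:.3f}")
```

In both tables the raters agree on 38 of 40 scans, but for the rare lesion the agreement expected by chance is already about as high as the observed agreement, so κ falls to roughly zero.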
In conclusion, we found that the radiological criteria for the NINDS-AIREN criteria for VaD are very complex. This makes these criteria less suitable for inexperienced raters and not appropriate for routine diagnosis on the basis of a standard radiological report only. The radiological criteria for the NINDS-AIREN criteria for VaD have suboptimal reproducibility. Use of operational criteria improves agreement to acceptable levels, but only in experienced readers. Because operational definitions essentially do not change the original criteria, a critical reappraisal of the NINDS-AIREN radiological criteria seems to be needed to further improve the quality of the criteria and interobserver agreement. We hope that our results set the stage for such an endeavor.
Received February 4, 2003; revision received March 18, 2003; accepted April 16, 2003.