Impact of White Matter Hyperintensities Scoring Method on Correlations With Clinical Data
The LADIS Study
Background and Purpose— White matter hyperintensities (WMH) are associated with decline in cognition, gait, mood, and urinary continence. Associations may depend on the method used for measuring WMH. We investigated the ability of different WMH scoring methods to detect differences in WMH load between groups with and without symptoms.
Methods— We used data of 618 independently living elderly with WMH collected in the Leukoaraiosis And DISability (LADIS) study. Subjects with and without symptoms of depression, gait disturbances, urinary incontinence, and memory decline were compared with respect to WMH load measured qualitatively using 3 widely used visual rating scales (Fazekas, Scheltens, and Age-Related White Matter Changes scales) and quantitatively with a semiautomated volumetric technique and an automatic lesion count. Statistical significance between groups was assessed with the χ2 and Mann-Whitney tests. In addition, the punctate and confluent lesion type with comparable WMH volume were compared with respect to the clinical data using Student t test and χ2 test. Direct comparison of visual ratings with volumetry was done using curve fitting.
Results— Visual and volumetric assessment detected differences in WMH between groups with respect to gait disturbances and age. WMH volume measurement was more sensitive than visual scores with respect to memory symptoms. Neither the number of lesions nor lesion type correlated with any of the clinical data. For all rating scales, a clear but nonlinear relationship was established with WMH volume.
Conclusions— Visual rating scales display ceiling effects and poor discrimination of absolute lesion volumes. Consequently, they may be less sensitive in differentiating clinical groups.
White matter hyperintensities (WMH) on MRI are associated with cognitive dysfunction, gait abnormalities, falls, and depression and contribute to disability in the elderly population.1–5 Lesion load on MRI may serve as a surrogate marker of disease burden and may ultimately guide treatment. For the measurement of WMH extent, different methods can be used, ranging from visual rating to fully computerized techniques. Visual rating of WMH is easy, and several scales are available with good reproducibility.6 The visual scales often do not detail size and location, and most are not linear. Scores from different rating scales are not directly comparable.7 Most volumetric studies use supervised semiautomated methods that may provide more information on location and size, as well as continuous data, but are time consuming.8,9 Both methods have been used to correlate WMH with clinical data and have rendered varying results.10,11 Subjective memory symptoms, although difficult to define, are associated with higher risk of dementia and WMH and may be used for the early detection of subjects at risk.12
Number of lesions and lesion pattern (punctate versus confluent) may also be correlated to clinical data. WMH burden might be caused by a large number of punctate lesions or few confluent lesions, possibly leading to different clinical signs.
In this study, we aimed to establish cross-sectionally the sensitivity of several visual WMH scales, volumetric WMH measurement, as well as WMH lesion count and pattern, to symptoms of cognitive decline, gait abnormalities, urinary incontinence, and depression. The relationship between visual and volumetric methods was characterized by establishing the mathematical function that best fitted the data. The study group consisted of elderly independently living individuals recruited on the basis of WMH and stratified by lesion severity into 3 groups.
Materials and Methods
Data were drawn from the multinational multicenter longitudinal Leukoaraiosis And DISability (LADIS) study among 639 elderly, described previously.13 Inclusion criteria were 65 to 85 years of age and no or mild disability in everyday life (as established with the Instrumental Activities of Daily Living scale).14 Subjects were required to have at least some degree of WMH, demonstrated on MRI. Participants presented for evaluation in various settings: stroke unit, memory clinic, neurological or geriatric wards/clinic, population studies on aging, controls in other studies.
The study was approved by the local ethics committees, and all subjects gave informed consent. At baseline, subjects underwent a standardized evaluation (including global functioning, cognitive, motor, and psychiatric assessment), and, together with their informants, filled in questionnaires on medical history.
The data used in this study are age, gender, presence of depression requiring therapy, symptoms of urinary incontinence, gait disturbances, and memory problems, as expressed by the participants or their informants.
All subjects underwent magnetic resonance scanning following a standard protocol, during which 0.5-T or 1.5-T scanners were used and series included axial T2-weighted images (echo time [TE] 100 to 120 ms; repetition time [TR] 4000 to 6000 ms; voxel size 1×1×5 to 7.5 mm³; 19 to 24 slices), axial fluid-attenuated inversion recovery (FLAIR) images (TE 100 to 140 ms; TR 6000 to 10 000 ms; inversion time 2000 to 2400 ms; voxel size 1×1×5 to 7.5 mm³; 19 to 24 slices), and a coronal or sagittal 3D T1 sequence (TE 4 to 7 ms; TR 10 to 25 ms; flip angle 15 to 30°; voxel size 1×1×1 to 1.5 mm³). All scans were checked and stored at the Image Analysis Center of the VU Medical Center, Amsterdam, the Netherlands. Postprocessing and data analysis for this study were performed in Amsterdam and Copenhagen. Of the 639 scans, 21 could not be used because of insufficient quality for the volumetric assessment.
On the FLAIR images, we applied the visual rating scales of Fazekas (range 0 to 3), Scheltens (range 0 to 84), and the Age-Related White Matter Changes (ARWMC) scale (range 0 to 30).15–17 All ratings were performed by an experienced rater (E.vS.) blind to the clinical data.
Volumetric analysis of WMH was performed by a single rater on the same axial FLAIR images, including the infratentorial region, using a Sparc 5 workstation (SUN). Lesions were marked and borders were set using local thresholding (home-developed software Show_Images, version 3.6.1) on each slice. No distinction was made between subcortical and periventricular hyperintensities. Areas of hyperintensity on T2-weighted images around infarctions and lacunes were disregarded.
An automated assessment of the number of lesions was performed by defining each lesion that was generated with the volumetric method, as a cluster of 3D connected voxels, and counting the number of such clusters (26-connectivity).
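The counting step described above can be sketched as follows. This is a minimal illustration, not the software used in the study: it assumes the volumetric output is available as a binary 3D mask stored as nested lists indexed `mask[z][y][x]`, and the function name `count_lesions` and the toy mask are hypothetical.

```python
from collections import deque
from itertools import product

# All 26 neighbour offsets in 3D: every combination of -1/0/+1
# along z, y, x except the voxel itself.
NEIGHBOURS_26 = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def count_lesions(mask):
    """Count 26-connected clusters of truthy voxels in mask[z][y][x]."""
    nz, ny, nx = len(mask), len(mask[0]), len(mask[0][0])
    seen = set()
    n_lesions = 0
    for start in product(range(nz), range(ny), range(nx)):
        z, y, x = start
        if not mask[z][y][x] or start in seen:
            continue
        # A lesion voxel not yet visited starts a new cluster.
        n_lesions += 1
        seen.add(start)
        queue = deque([start])
        while queue:  # breadth-first flood fill of the cluster
            cz, cy, cx = queue.popleft()
            for dz, dy, dx in NEIGHBOURS_26:
                nb = (cz + dz, cy + dy, cx + dx)
                inside = 0 <= nb[0] < nz and 0 <= nb[1] < ny and 0 <= nb[2] < nx
                if inside and mask[nb[0]][nb[1]][nb[2]] and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
    return n_lesions


# Hypothetical toy mask (2 slices of 3x4 voxels). The voxels at (z=0, y=0, x=0)
# and (z=1, y=1, x=1) touch only at a corner, so they form one lesion under
# 26-connectivity; the voxel at (z=0, y=0, x=3) is a separate lesion.
toy_mask = [
    [[1, 0, 0, 1],
     [0, 0, 0, 0],
     [0, 0, 0, 0]],
    [[0, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 0]],
]
print(count_lesions(toy_mask))
```

With 26-connectivity, voxels that share only an edge or a corner belong to the same cluster, so this choice yields fewer, larger lesions than 6- or 18-connectivity would on the same mask.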
To test reproducibility of the different methods, 18 scans, with a mean volume (SD) of 26.3 (19.0) mL, were assessed twice with an interval of ≥2 months.
Sensitivity of WMH measurements to detect clinical group differences was tested with χ2 for trend (Fazekas scale) and Mann-Whitney test (Scheltens scale, ARWMC scale, volumetric measurement, and lesion count). Nonparametric testing was used for the WMH volumes because of the nonnormal distribution. Differences in lesion volume and number between Fazekas groups were tested using ANOVA with Bonferroni correction.
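As an illustration of the Mann-Whitney comparison named above, the following minimal sketch computes the U statistic and a two-sided P value from the normal approximation with tie correction (no continuity correction). It is not the statistical software used in the study; the function names and the volumes in the example are hypothetical, and with groups this small an exact test would normally be preferred.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (U, two-sided P) using the tie-corrected normal approximation."""
    n1, n2 = len(group_a), len(group_b)
    pooled = list(group_a) + list(group_b)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    n = n1 + n2
    # Tie correction: sum of (t^3 - t) over groups of tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:  # all values identical: no evidence of a difference
        return u, 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(variance)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u, p_two_sided


# Hypothetical WMH volumes (mL) for subjects with and without a symptom.
with_symptom = [30.1, 25.4, 41.0, 18.2, 36.7]
without_symptom = [12.3, 9.8, 20.5, 15.1, 7.6]
u_stat, p_value = mann_whitney_u(with_symptom, without_symptom)
print(u_stat, p_value)
```

Because the test works on ranks, it makes no assumption about the shape of the volume distribution, which is why it suits the skewed WMH volumes described here.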
To test the hypothesis that lesion type (punctate, defined as Fazekas score 1, versus confluent, defined as Fazekas score 3, with comparable lesion volumes) was associated with clinical characteristics, we selected all subjects with WMH volume between 15 and 30 mL. In this volume range, both Fazekas scores 1 and 3 were represented. These groups were compared with respect to the clinical data, using the Student t test and χ2 test.
Visual rating scales were correlated with the volumetric method using Spearman rank correlation method. To test the hypothesis of a nonlinear relationship between the visual methods and the volumetric method, we fitted a linear and a quadratic function to the plot using a linear regression with a correction factor based on the local variance.18
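The comparison of a linear with a quadratic model can be sketched as below. This is a simplified illustration under stated assumptions: it treats the visual score as the predictor and WMH volume as the response, uses ordinary least squares unless weights are supplied, and does not reproduce the local-variance correction factor of the original analysis. The function names and data are hypothetical.

```python
def fit_polynomial(x, y, degree, weights=None):
    """Weighted least-squares polynomial fit; coefficients, lowest order first."""
    w = weights or [1.0] * len(x)
    m = degree + 1
    # Normal equations (X^T W X) beta = X^T W y for a polynomial design matrix.
    a = [[sum(wi * xi ** (r + c) for xi, wi in zip(x, w)) for c in range(m)]
         for r in range(m)]
    b = [sum(wi * yi * xi ** r for xi, yi, wi in zip(x, y, w)) for r in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    beta = [0.0] * m
    for r in range(m - 1, -1, -1):
        tail = sum(a[r][c] * beta[c] for c in range(r + 1, m))
        beta[r] = (b[r] - tail) / a[r][r]
    return beta


def r_squared(x, y, beta):
    """Unweighted coefficient of determination for a fitted polynomial."""
    mean_y = sum(y) / len(y)
    predicted = [sum(bk * xi ** k for k, bk in enumerate(beta)) for xi in x]
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical data in which volume grows with the square of the visual score.
scores = [0, 1, 2, 3, 4, 5]
volumes = [1, 2, 5, 10, 17, 26]
beta_linear = fit_polynomial(scores, volumes, 1)
beta_quadratic = fit_polynomial(scores, volumes, 2)
r2_linear = r_squared(scores, volumes, beta_linear)
r2_quadratic = r_squared(scores, volumes, beta_quadratic)
print(r2_linear, r2_quadratic)
```

A quadratic model always fits at least as well as the nested linear one, so a higher R² alone does not establish nonlinearity; a formal comparison of the two models is still needed.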
Results
WMH Load and Clinical Data
Table 1 shows mean WMH volumes and scores for different subject groups. Mean WMH volumes, but not visual ratings, were significantly greater in men than in women. Both visual and volumetric assessments showed group differences in WMH load between the older and younger subjects. No significant differences in WMH load could be found between the groups with and without a history of depression or symptoms of urinary incontinence. The mean WMH load of subjects with symptoms of gait disturbance was only significantly larger when measured volumetrically or with the ARWMC and Scheltens scales. With the volumetric assessment only, we were able to establish a significant difference in lesion load between subjects with and without memory symptoms.
Mean Lesion Volume and Lesion Count
Fazekas score 1 corresponded to a mean lesion volume of 0.20 mL, score 2 to 0.45 mL per lesion and score 3 to 1.26 mL per lesion (Table 2). The differences were statistically significant between all groups. Table 2 also shows mean total WMH for each Fazekas category, which was also statistically significant between the groups. Subjects in the Fazekas score 2 category tended to have most lesions. Number of lesions did not discriminate significantly between groups with and without symptoms (Table 1).
We found no significant differences in clinical features between the groups with punctate (Fazekas 1) and confluent (Fazekas 3) lesions (Table 3).
Correlation Between Visual Rating Scales and Volumetric Measurement
Scatter plots for the Fazekas and ARWMC scales with WMH volume are shown in Figures 1 and 2. The scatter and shape of the plot for the Scheltens score were similar to those of the plot for the ARWMC score (data not shown). Increasing volume correlated with higher visual scores (Spearman ρ 0.86), and scatter increased with higher WMH visual scores. The relationship with WMH volume was better described by a quadratic than a linear model, indicated by higher R² (Table 4). When corrected for difference in variance, the difference between the linear and quadratic model was statistically significant (P<0.01).
Intrarater agreement for the scales was good with a κ for Fazekas score of 0.84. Intraclass correlation coefficients were 0.93 (ARWMC scale), 0.92 (Scheltens scale), and 0.99 (volume measurement). The mean difference between the 2 measurements was not statistically significant when tested against 0 using a 1-sample t test.
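For readers unfamiliar with the κ statistic, the sketch below computes unweighted Cohen's κ for two rating sessions by the same rater. Whether the reported κ was weighted or unweighted is not stated here, so the unweighted form is an assumption, and the scores in the example are hypothetical.

```python
from collections import Counter


def cohens_kappa(first, second):
    """Unweighted Cohen's kappa for two equally long lists of category labels."""
    n = len(first)
    # Observed proportion of identical ratings.
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Agreement expected by chance from the marginal category frequencies.
    count_first, count_second = Counter(first), Counter(second)
    expected = sum(count_first[k] * count_second[k] for k in count_first) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical Fazekas scores for 10 scans rated twice by the same rater.
session_1 = [1, 1, 2, 2, 3, 3, 1, 2, 3, 3]
session_2 = [1, 1, 2, 3, 3, 3, 1, 2, 3, 2]
kappa = cohens_kappa(session_1, session_2)
print(round(kappa, 3))
```

κ discounts the agreement that would occur by chance, which is why it is lower than the raw proportion of identical scores (8 of 10 in this toy example).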
Discussion
The results indicate that volumetry may be more sensitive than visual rating for detecting small group differences. This is in line with previous research correlating WMH measurement methods with cognitive performance.10 Subjective and objective memory symptoms are not interchangeable for the assessment of cognition, but both seem to be related to WMH. The best method for the measurement of WMH with respect to objective cognitive measures is still to be established.
The ARWMC and Scheltens rating scales have a greater range than the Fazekas scale and were found to differentiate better between groups. This finding corresponds with a review on the relationship between WMH and cognition.19 The Fazekas scale seems most appropriate for defining different WMH groups. No group differences were detected for symptoms of urinary incontinence and depression. One of the reasons could be that only WMH in certain areas correlate with these symptoms. Frontal WMH has been associated with mood disorders, cognitive functions, and gait problems. On the other hand, it was shown that WMH in different regions are highly correlated and that their influence on clinical signs may therefore not be limited to certain areas of the brain.20
To our knowledge, this is the first report on lesion count as a measure of WMH severity. We found that it was not sensitive in detecting associations with clinical signs, possibly because lesion count does not take into account lesion size. In progressing disease, lesions can merge, leading to a smaller total number of lesions. Few lesions could therefore indicate either mild or severe disease. This lack of correlation between number of lesions and clinical findings was also found in individuals with multiple sclerosis (MS). This caused studies in MS to focus on total T2 lesion volume and T1 gadolinium-enhancing lesions instead of number of lesions.21,22
We compared subjects with punctate and confluent lesion patterns who had comparable WMH volumes. We found no differences in symptoms between the groups. Although this subanalysis limited the number of subjects studied, it illustrates the arbitrary nature of the qualitative scoring system.
We confirmed the good correlations between all 3 visual rating scales and the WMH volume, but the current study shows that the variability in WMH volume is large in the patient groups with high visual scores.23 Subject groups with higher visual scores contain subjects with different degrees of WMH burden, leading to decreased correlation with clinical data. When progression of WMH is measured, this ceiling effect can be even greater. A previous study on the detection of WMH progression with conventional visual rating scales showed lack of sensitivity compared with WMH volume measurement.24 This effect is especially of interest because WMH progression seems to occur fastest in patients with a high lesion load.25
The WMH volume in this study was significantly higher than reported in previous studies.26,27 The LADIS study was designed to enroll a large population of subjects with WMH, and participants were stratified into 3 categories of WMH severity. This approach is different from population-based studies and resulted in a relatively large group of subjects with a high WMH lesion load. Results are therefore not directly applicable to the healthy elderly population. The advantage of this design is the possibility of studying a broad spectrum of lesion loads.
WMH burden can be presented as volume or as a proportion of the total white matter or intracranial volume, depending on the focus of the study. This makes comparison between studies complicated. We did not correct for intracranial volume or white matter volume because we wanted to compare the raw volumes with visual scales that are also uncorrected. Wen and Sachdev investigated uncorrected WMH volume and found no differences in WMH volumes between men and women, whereas in our study, men had a larger mean WMH volume than women.27 The subjects in their study were younger than our study participants (60 to 64 years versus 65 to 85 years), which could partly explain this difference.
We did not control for risk factors for WMH such as hypertension. In addition, objective measures for cognition, gait, depression, and urinary incontinence were also not included here. This was done because the focus of this study was not to establish a causal relationship of WMH with clinical data but a comparison between scoring methods in their association with symptoms, which is clearly of clinical relevance for clinicians dealing with these patients.
Participating Centers and Personnel
Helsinki, Finland (Memory Research Unit, Department of Clinical Neurosciences, Helsinki University): Timo Erkinjuntti, MD, PhD, Tarja Pohjasvaara, MD, PhD, Pia Pihanen, MD, Raija Ylikoski, PhD, Hanna Jokinen, LPsych, Meija-Marjut Somerkoski, MPsych; Graz, Austria (Department of Neurology and MRI Institute, Karl-Franzens University): Franz Fazekas, MD, Reinhold Schmidt, MD, Stefan Ropele, PhD, Brigitte Rous, MD, Katja Petrovic, Ulrike Garmehi; Lisboa, Portugal (Serviço de Neurologia, Centro de Estudos Egas Moniz, Hospital de Santa Maria): José M. Ferro, MD, PhD, Ana Verdelho, MD, Sofia Madureira, PsyD; Amsterdam, The Netherlands (Department of Neurology, VU Medical Center): Philip Scheltens, MD, PhD, Ilse van Straaten, MD, Wiesje van de Flier, PhD, Frederik Barkhof, MD, PhD; Goteborg, Sweden (Institute of Clinical Neuroscience, Goteborg University): Anders Wallin, MD, PhD, Michael Jonsson, MD, Karin Lind, MD, Arto Nordlund, PsyD, Sindre Rolstad, PsyD, Kerstin Gustavsson, RN; Huddinge, Sweden (Karolinska Institute, Department of Clinical Neuroscience and Family Medicine, Huddinge University Hospital): Lars-Olof Wahlund, MD, PhD, Militta Crisby, MD, PhD, Anna Pettersson, physiotherapist, Kaarina Amberla, PsyD; Paris, France (Department of Neurology, Hopital Lariboisiere): Hugues Chabriat, MD, PhD, Ludovic Benoit, MD, Karen Hernandez, Solene Pointeau, Annie Kurtz, Daniel Reizine, MD; Mannheim, Germany (Department of Neurology, University of Heidelberg, Klinikum Mannheim): Michael Hennerici, MD, Christian Blahak, MD, Hansjorg Baezner, MD, Martin Wiarda, PsyD, Susanne Seip, RN; Copenhagen, Denmark (Memory Disorders Research Unit, Department of Neurology, Rigshospitalet and Danish Research Center for Magnetic Resonance, Hvidovre Hospital, Copenhagen University Hospital): Gunhild Waldemar, MD, DMSc, Egill Rostrup, MD, MSc, Charlotte Ryberg, MSc, Tim Dyrby; Newcastle-on-Tyne, UK (Institute for Ageing and Health, University of Newcastle): John O’Brien, DM, Sanjeet Pakrasi, MRCPsych, Thais Minnet, PhD, Michael Firbank, PhD, Jenny Dean, PhD, Pascale Harrison, BSc, Philip English, DCR.
The coordinating center is in Florence, Italy (Department of Neurological and Psychiatric Sciences, University of Florence): Domenico Inzitari, MD (Study Coordinator); Leonardo Pantoni, MD, PhD, Anna Maria Basile, MD, Michela Simoni, MD, Giovanni Pracucci, MD, Monica Martini, MD, Luciano Bartolini, PhD, Emilia Salvadori, PhD, Marco Moretti, MD, Mario Mascalchi, MD, PhD.
The LADIS Steering Committee is formed by Domenico Inzitari, MD (study coordinator), Timo Erkinjuntti, MD, PhD, Philip Scheltens, MD, PhD, Marieke Visser, MD, PhD, and Kjell Asplund, MD, PhD.
The LADIS study was supported by the European Union (grant QLRT-2000-00446). The authors thank Ronald van Schijndel for his support with the volumetric measurements and Dirk Knol for his help with the curve fitting.
A list of collaborators of the LADIS study is presented in the Appendix.
- Received August 19, 2005.
- Revision received October 20, 2005.
- Accepted November 15, 2005.
Breteler MM, van Swieten JC, Bots ML, Grobbee DE, Claus JJ, van den Hout JH, van Harskamp F, Tanghe HL, de Jong PT, van Gijn J. Cerebral white matter lesions, vascular risk factors, and cognitive function in a population-based study: the Rotterdam Study. Neurology. 1994; 44: 1246–1252.
Longstreth WT Jr, Manolio TA, Arnold A, Burke GL, Bryan N, Jungreis CA, Enright PL, O’Leary D, Fried L. Clinical correlates of white matter findings on cranial magnetic resonance imaging of 3301 elderly people. The Cardiovascular Health Study. Stroke. 1996; 27: 1274–1282.
Schmidt R, Fazekas F, Offenbacher H, Dusek T, Zach E, Reinhart B, Grieshofer P, Freidl W, Eber B, Schumacher M. Neuropsychologic correlates of MRI white matter hyperintensities: a study of 150 normal volunteers. Neurology. 1993; 43: 2490–2494.
Scheltens P, Erkinjunti T, Leys D, Wahlund LO, Inzitari D, del Ser T, Pasquier F, Barkhof F, Mantyla R, Bowler J, Wallin A, Ghika J, Fazekas F, Pantoni L. White matter changes on CT and MRI: an overview of visual rating scales. European Task Force on Age-Related White Matter Changes. Eur Neurol. 1998; 39: 80–89.
Pantoni L, Simoni M, Pracucci G, Schmidt R, Barkhof F, Inzitari D. Visual rating scales for age-related white matter changes (leukoaraiosis): can the heterogeneity be reduced? Stroke. 2002; 33: 2827–2833.
Pantoni L, Basile AM, Pracucci G, Asplund K, Bogousslavsky J, Chabriat H, Erkinjuntti T, Fazekas F, Ferro JM, Hennerici M, O’Brien J, Scheltens P, Visser MC, Wahlund LO, Waldemar G, Wallin A, Inzitari D. Impact of age-related cerebral white matter changes on the transition to disability—the LADIS Study: rationale, design and methodology. Neuroepidemiology. 2005; 24: 51–62.
Wahlund LO, Barkhof F, Fazekas F, Bronge L, Augustin M, Sjogren M, Wallin A, Ader H, Leys D, Pantoni L, Pasquier F, Erkinjuntti T, Scheltens P. A new rating scale for age-related white matter changes applicable to MRI and CT. Stroke. 2001; 32: 1318–1322.
Neter J, Kutner MH, Nachtsheim CJ, Wasserman W. Applied Linear Statistical Models. Burr Ridge, Ill: Irwin; 1996.
Tullberg M, Fletcher E, DeCarli C, Mungas D, Reed BR, Harvey DJ, Weiner MW, Chui HC, Jagust WJ. White matter lesions impair frontal lobe function regardless of their location. Neurology. 2004; 63: 246–253.
Kapeller P, Barber R, Vermeulen RJ, Ader H, Scheltens P, Freidl W, Almkvist O, Moretti M, del Ser T, Vaghfeldt P, Enzinger C, Barkhof F, Inzitari D, Erkinjunti T, Schmidt R, Fazekas F. Visual rating of age-related white matter changes on magnetic resonance imaging: scale comparison, interrater agreement, and correlations with quantitative measurements. Stroke. 2003; 34: 441–445.
Prins ND, van Straaten EC, van Dijk EJ, Simoni M, van Schijndel RA, Vrooman HA, Koudstaal PJ, Scheltens P, Breteler MM, Barkhof F. Measuring progression of cerebral white matter lesions on MRI: visual rating and volumetrics. Neurology. 2004; 62: 1533–1539.
Atwood LD, Wolf PA, Heard-Costa NL, Massaro JM, Beiser A, D’Agostino RB, DeCarli C. Genetic variation in white matter hyperintensity volume in the Framingham Study. Stroke. 2004; 35: 1609–1613.