Rating Method for Dilated Virchow–Robin Spaces on Magnetic Resonance Imaging
Background and Purpose—Dilated Virchow–Robin spaces are an emerging neuroimaging biomarker, but their assessment on MRI needs standardization.
Methods—We developed a rating method for dilated Virchow–Robin spaces in 4 brain regions (centrum semiovale, basal ganglia, hippocampus, and mesencephalon) and tested its reliability in a total of 125 MRI scans from 2 population-based studies. Six investigators with varying levels of experience performed the ratings. Intraclass correlation coefficients were calculated to determine intra- and interrater reliability.
Results—Intrarater reliability was excellent for all 4 regions (intraclass correlation coefficient, >0.8). Interrater reliability was excellent for the centrum semiovale and hippocampus (intraclass correlation coefficient, >0.8) and good for the basal ganglia and mesencephalon (intraclass correlation coefficient, 0.6–0.8). This did not differ between the cohorts or experience levels.
Conclusions—We describe a reliable rating method that can facilitate pathogenic and prognostic research on dilated Virchow–Robin spaces using MRI.
The study of imaging biomarkers plays an essential role in understanding brain aging and pathology, such as cognitive impairment, dementia, and cerebrovascular disease.1 Structural imaging studies have already shown the importance of white matter lesions, infarcts, and more recently cerebral microbleeds.1 An emerging potential marker is Virchow–Robin spaces (VRSs), spaces filled with interstitial fluid that surround the blood vessels in the brain.2 VRSs can increase in size, and such dilated VRSs (dVRSs) can subsequently be found on brain imaging,3 particularly in the mesencephalon, hippocampus, basal ganglia (BG), and centrum semiovale (CSO).4,5 Determinants of dVRS severity include age,6 blood pressure,6 and inflammation.7 The associated brain etiology is diverse, covering age-related cerebral small vessel disease,6,8,9 Alzheimer disease,4,9,10 and Cerebral Autosomal Dominant Arteriopathy with Subcortical Infarcts and Leukoencephalopathy (CADASIL).11
Despite increasing literature on dVRSs, a major limitation of current research is the lack of a robust and generalizable rating method on MRI. Current methods are restricted to studies that only use a single MRI protocol and focus on 1 or 2 brain regions.3,4,6,8,9,11 A method that can be applied to MRI protocols from different centers and scanners and evaluates the whole brain would strongly facilitate pathogenic and prognostic research on dVRSs. Here, we propose a novel rating method for dVRSs, which we apply in 2 population-based studies, encompassing 3 different scanning protocols.
We aimed to develop a rating protocol meeting 3 preconditions. First, the method should be standardized and generalizable across various MRI protocols. Second, intra- and interrater agreement should be high, irrespective of rater experience. Third, the method should be easily applicable for other researchers without requiring complex image processing.
The Austrian Stroke Prevention Study (ASPS) is a prospective community-based study investigating the effects of vascular risk factors on brain structure and function in residents of Graz, Austria (≥45 years of age). Between 1999 and 2003, a diagnostic work-up, including MRI, was performed. Scans were obtained on a 1.5T Philips scanner. The MRI protocol included axial T1-weighted, T2-weighted, proton-density–weighted, and fluid-attenuated inversion recovery sequences. The study protocol was described previously.12
The Rotterdam Scan Study (RSS) investigates causes and determinants of chronic neurological diseases in the elderly (≥45 years of age). Participants are residents of Ommoord, a suburb of Rotterdam, the Netherlands. Brain MRI was incorporated into the core study protocol from 2005 onward using a 1.5T GE MR unit. The protocol has been extensively described and includes axial T1-weighted, T2-weighted, and fluid-attenuated inversion recovery sequences.13 Earlier, in 1995, a smaller MRI study was performed using a 1.5T Siemens system with a protocol that included T1- and T2-weighted sequences.13
We developed and applied our rating method on scans from the ASPS and 2005 RSS because these were acquired with the most up-to-date protocols available. The primary rating sequence was T2-weighted (ASPS: slice thickness, 4.5 mm; RSS: slice thickness, 1.6 mm), which shows VRSs as hyperintensities (Figure I in the online-only Data Supplement). VRSs were identified by their linear, ovoid, or round shape depending on the slice direction and considered dilated when their diameter was ≥1 mm.14 Also, because dVRSs >3 mm in shortest diameter may have a distinct pathogenesis,3 these large lesions were rated separately and not evaluated in the reliability analyses. For differential diagnosis with lacunar infarcts, symmetry of the lesions, sharp demarcation, and absence of a hyperintense rim on the fluid-attenuated inversion recovery sequence supported rating them as dVRSs.14 White matter lesions (WMLs) are mostly confluent and were differentiated from dVRSs by signal intensity not equivalent to cerebrospinal fluid on T2.
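The size criteria above can be written as a small decision rule. The following Python sketch is illustrative only and is not part of the published protocol; the function name `classify_vrs`, the category labels, and the use of a single shortest-diameter measurement as input are assumptions made for the example.

```python
def classify_vrs(shortest_diameter_mm: float) -> str:
    """Classify one Virchow-Robin space by its diameter in millimetres.

    Hypothetical helper: thresholds follow the text (>=1 mm is dilated,
    >3 mm is rated separately); labels are chosen for illustration.
    """
    if shortest_diameter_mm < 0:
        raise ValueError("diameter must be non-negative")
    if shortest_diameter_mm < 1.0:
        return "not dilated"   # below the 1 mm minimum criterion
    if shortest_diameter_mm > 3.0:
        return "large dVRS"    # rated separately, left out of reliability analyses
    return "dVRS"              # counted in the regional rating
```

The morphological criteria used to separate dVRSs from lacunar infarcts and WMLs (symmetry, demarcation, rim, and signal intensity) require visual judgment and are deliberately not encoded here.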
dVRSs were scored in 4 brain regions: the CSO, BG, hippocampus, and mesencephalon. This choice was based on the pronounced presence of dVRSs in these regions, which was reported earlier and is known from our own experience.4,5 Raters determined dVRS count for each region, with a maximum of 20 per region. Because CSO and BG are visible on multiple slices, the rating was done on a single, predefined slice to decrease inter- and intrarater variability. For CSO, this was the slice 1 cm above the lateral ventricles. For BG, this was the slice showing the anterior commissure or, when not visible, the first slice superior to it. In the hippocampus and mesencephalon, all unique dVRSs were counted (Figure). A blank rating form is provided in File I in the online-only Data Supplement.
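For readers who wish to store ratings electronically, the per-region counts and the maximum of 20 can be captured in a simple record. This is a minimal sketch, not the rating form from the Data Supplement; the names `DvrsRating`, `make_rating`, `REGIONS`, and `MAX_COUNT` are hypothetical.

```python
from dataclasses import dataclass

MAX_COUNT = 20  # per-region maximum stated in the rating method
REGIONS = ("cso", "bg", "hippocampus", "mesencephalon")


@dataclass(frozen=True)
class DvrsRating:
    cso: int            # single slice 1 cm above the lateral ventricles
    bg: int             # slice showing the anterior commissure (or first slice above it)
    hippocampus: int    # all unique dVRSs
    mesencephalon: int  # all unique dVRSs


def make_rating(**raw_counts: int) -> DvrsRating:
    """Build a rating from raw counts, capping each region at MAX_COUNT."""
    if set(raw_counts) != set(REGIONS):
        raise ValueError(f"expected exactly these regions: {REGIONS}")
    capped = {}
    for region in REGIONS:
        count = raw_counts[region]
        if count < 0:
            raise ValueError(f"negative count for {region}")
        capped[region] = min(count, MAX_COUNT)
    return DvrsRating(**capped)
```

For example, `make_rating(cso=27, bg=5, hippocampus=3, mesencephalon=0)` stores the CSO count as 20 and leaves the other regions unchanged.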
To assess intrarater reliability, 1 rater (H.H.H.A.) scored 85 scans twice, with the 2 sessions separated by >1 month and blinded to his initial rating. Interrater reliability was assessed on 100 randomly selected scans and 5 additional scans in case of motion artifacts on the initial 100 (40 ASPS, 65 RSS). Every scan was rated independently by 3 to 6 investigators with varying degrees of experience (1–2 years: H.H.H.A., M.C., B.F.J.V., and D.B.; >10 years: C.E. and R.S.) who were blinded to all clinical data. The order of scans was randomized and different for each rater. Afterward, we also assessed the reliability on 20 scans from the 1995 RSS MRI protocol rated by 3 investigators (H.H.H.A., B.F.J.V., and D.B.).
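A randomized reading order that differs between raters could be generated as follows. The text does not describe how the randomization was carried out, so this is only one possible approach; the function name `reading_orders`, the rater labels, and the fixed seed are assumptions.

```python
import math
import random


def reading_orders(scan_ids, raters, seed=0):
    """Return a distinct random scan order for each rater (hypothetical helper)."""
    scans = list(scan_ids)
    raters = list(raters)
    if math.factorial(len(scans)) < len(raters):
        raise ValueError("too few scans to give every rater a different order")
    rng = random.Random(seed)  # fixed seed only so the example is reproducible
    orders = {}
    for rater in raters:
        order = scans[:]
        rng.shuffle(order)
        # Reshuffle in the unlikely event that two raters draw the same order.
        while order in orders.values():
            rng.shuffle(order)
        orders[rater] = order
    return orders
```

Calling `reading_orders(range(105), ["rater_1", "rater_2", "rater_3"])` returns one permutation of the 105 scan indices per rater, with no two raters sharing the same sequence.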
Intrarater and interrater reliability were determined using intraclass correlation coefficients (ICCs) for all raters combined. Secondary analyses were performed after stratifying by MRI protocol (ASPS versus RSS), experience level (1–2 years versus >10 years), or coexisting brain pathology (WMLs, brain atrophy, and lacunar infarcts). WMLs and brain volume were measured with automated software within each cohort and dichotomized at the median value to provide equally sized groups. For lacunar infarcts, we restricted the analysis to participants without lacunar infarcts.
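To make the reliability measure concrete, the sketch below computes one common ICC variant from a complete table of counts (one row per scan, one column per rater). The text does not state which ICC form was used, so the choice of ICC(2,1) (two-way random effects, absolute agreement, single rater, in the Shrout and Fleiss notation) is an assumption, as are the function names `icc_2_1` and `agreement_label`. The labels reuse the bands reported in this article (>0.8 and 0.6–0.8).

```python
def icc_2_1(ratings):
    """ICC(2,1) from a complete n-scans x k-raters table of counts.

    Assumed variant: two-way random effects, absolute agreement, single rater.
    """
    if not ratings:
        raise ValueError("ratings table is empty")
    n = len(ratings)     # number of scans
    k = len(ratings[0])  # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete n x k table with n >= 2 and k >= 2")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: scans (rows), raters (columns), residual.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def agreement_label(icc):
    """Map an ICC to the agreement bands used in this article."""
    if icc > 0.8:
        return "nearly perfect"
    if icc >= 0.6:
        return "good"
    return "below good"  # bands under 0.6 are not defined in the text
```

This formula requires every scan to be rated by every rater. In a design where each scan is rated by a varying number of raters, a different estimator (for example, one based on a mixed model) would be needed.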
Study population characteristics are shown in Table 1 (mean age, 65.8 [SD, 5.8] years; 54 [51%] women). The distribution of the average dVRS count showed most dVRSs in the CSO (9.63 [SD, 6.79]), followed by the BG (5.30 [3.41]), hippocampus (3.35 [3.14]), and mesencephalon (1.78 [1.75]; Figure II in the online-only Data Supplement).
Intrarater reliability for the 85 scans showed nearly perfect agreement (ICC >0.8) for all regions (Table 2). The interrater ICC values for the 105 scans indicated good agreement between raters (ICC, 0.6–0.8) for the BG and mesencephalon and nearly perfect agreement (ICC >0.8) for the CSO and hippocampus (Table 2). Calculating the ICCs for RSS and ASPS scans separately gave similar values (Table 2). Furthermore, interrater reliability was independent of rater experience, WML burden, and brain volume (Table 2). Excluding participants with lacunar infarcts (n=8) also did not alter the results (Table 2). In the 20 additional scans from the 1995 RSS protocol, ICC values were >0.8 for each region (data not shown).
We propose a newly developed rating method for dVRSs in 4 brain regions, which shows good to nearly perfect interrater and intrarater agreement, independent of rater experience and concomitant brain pathology. We applied this method to a total of 125 MRI scans acquired from 3 different scanners and protocols across 2 cohorts and found comparable reliabilities.
The proposed rating has several strengths that can facilitate future dVRS research. We developed the protocol on a large data set of images from different MRI scanners, with multiple raters of differing experience level, and performed secondary analyses for factors potentially affecting observer agreement. Also, we included the 4 brain regions with most prevalent dVRSs, while rater instructions remained simple and time investment was minimal (≈3 minutes per scan). Moreover, regular transverse slices were used for scoring, thereby eliminating the need for complex planar reformatting of scans.
Whereas previous studies have only used upper limits in size for defining dVRSs,6,9,11 we also implemented a minimum diameter criterion to consider VRSs dilated. This is because the increasing resolution of new MRI scanners will enable detection of many VRSs <1 mm, which could inflate the dVRS rating and reduce comparability between studies if not excluded. Morphological criteria were used for differentiation among dVRSs, lacunar infarcts, and WMLs.14 Although reliability of our method was not affected by concomitant brain pathology visible on MRI, the distinction between dVRSs and lacunar infarcts in particular remains controversial.14
As an alternative to counting dVRSs, we considered assigning a severity score to each region after comparison with a consensus-based template. Although preliminary analyses revealed good intrarater agreement on 30 scans (average of regions, 0.70), interrater agreement was weak to moderate (0.48). We, therefore, did not pursue this approach further. Existing rating protocols were not evaluated because there is currently no gold standard for quantifying dVRS burden. A future direction would be to compare the reliability across different rating protocols.
In conclusion, this study presents a generalizable rating method for dVRSs in the mesencephalon, hippocampus, BG, and CSO, which has been tested in a multicenter setting. The protocol allows for better comparability across dVRS studies and is easy for investigators to implement.
The online-only Data Supplement is available with this article at http://stroke.ahajournals.org/lookup/suppl/doi:10.1161/STROKEAHA.111.000620/-/DC1.
- Received January 15, 2013.
- Revision received March 4, 2013.
- Accepted March 14, 2013.
- © 2013 American Heart Association, Inc.
- Gorelick PB, Scuteri A, Black SE, Decarli C, Greenberg SM, Iadecola C, et al
- Chen W, Song X, Zhang Y
- Zhu YC, Dufouil C, Mazoyer B, Soumaré A, Ricolfi F, Tzourio C, et al
- Zhu YC, Tzourio C, Soumaré A, Mazoyer B, Dufouil C, Chabriat H
- Maclullich AM, Wardlaw JM, Ferguson KJ, Starr JM, Seckl JR, Deary IJ
- Patankar TF, Mitra D, Varma A, Snowden J, Neary D, Jackson A