Reproducibility of Measurements of Cerebral Infarct Volume on CT Scans
Background and Purpose—Infarct volume is increasingly used as an outcome measure in clinical trials of therapies for acute ischemic stroke. We tested which of 5 different methods to measure infarct size or volume on CT scans has the highest reproducibility.
Methods—Infarct volume and total intracranial volume were measured with Leica Q500 MCP image analysis software, or with a caliper, on 38 CT scans of patients who participated in the Tirilazad Efficacy Stroke Study II (TESS II). The scans were performed 8 days (±2 days) after the onset of symptoms. The 5 methods tested were based on (1) semiautomated pixel thresholding, (2) manual tracing of the perimeter, (3) a stereological counting grid, (4) measurement of the 3 largest diameters, and (5) the single largest diameter. The measurements were performed independently by 2 observers; the first observer performed all measurements twice.
Results—The single largest diameter did not correlate well with infarct volume. Of the other methods, manual tracing of the perimeter of the infarct had the lowest intraobserver and interobserver variability: coefficients of variation were 8.6% and 14.1%, respectively. For total intracranial volume, manual tracing also provided the highest reproducibility: intraobserver and interobserver coefficients of variation were 3.3% and 4.9%, respectively.
Conclusions—Manual tracing of the perimeter is the most reproducible method for measuring the volumes of the infarct and the total intracranial space in multicenter trials of therapies for acute ischemic stroke.
Infarct volume as measured by CT or MRI is increasingly used as a surrogate or auxiliary outcome measure in clinical trials of therapies for acute ischemic stroke.1 2 3 4 5 Several methods to measure infarct volume on CT scans have been developed, of which manual tracing of the infarct perimeter6 is the most well established. Other methods are based on pixel thresholding,7 8 a stereological counting grid,9 or measurement of the 3 largest diameters.10 The single largest visible diameter has also been used as a measure for infarct size.11 If a method for measuring infarct volume is applied in (multicenter) trials, it should be characterized by good reproducibility, even when performed on hard copies of scans of different quality, calibration, and density scale parameters. Although fair interrater reliabilities for some of the techniques have been reported before,9 10 the present study is the first to compare the reproducibility of the 5 different methods.
The aim of the present study was to determine which of the aforementioned methods to measure infarct volume has the highest reproducibility. To express infarct volume as a percentage of the total intracranial volume (ICV), we also compared the intraobserver and interobserver variability of this measure obtained with each of the different methods.
Subjects and Methods
Measurements were performed by 2 of the authors (H.B. van der W. and S.P.C.) on 45 CT scans of patients who participated in the European-Australasian multicenter, international, randomized, vehicle-controlled trial of the efficacy of tirilazad mesylate in patients with acute ischemic stroke, the Tirilazad Efficacy Stroke Study II (TESS II) (Upjohn protocol M-2700-0088). In TESS II, patients aged 40 to 85 years with a clinical syndrome of acute infarction in the territory of one middle cerebral artery received tirilazad mesylate or vehicle intravenously for 3 days. The study protocol had been reviewed and approved by the institutional review boards of the participating hospitals. All patients had been informed about the background and procedures of the trial and had given their explicit consent.
In each patient, noncontrast CT examination of the brain was performed twice: first before treatment, or at the latest within 12 hours after stroke onset, and again on day 8 (±2 days). In this study, only CT scans made on day 8 that showed an infarct in the territory of the middle cerebral artery were used. The scans were randomly selected from 3 subgroups on the basis of the following classification of the infarct: (1) cortical and subcortical; (2) subcortical only (diameter >1.5 cm); and (3) lacunar (diameter ≤1.5 cm).
Apart from the time frame and a ban on the use of contrast, the TESS II protocol provided no guidelines for the performance of CT scans. The scans in the present study came from 24 different centers in Europe, Australia, and New Zealand, and at least 13 types of machines were used (machine type was not given on every scan). Slice thickness of the different scans ranged from 3 to 10 mm, with the exception of one scan with a slice thickness of 16 mm. The number of slices per scan ranged from 7 to 28.
For volume measurements, the CT scan films were placed on a transillumination base under a camera. Each image was captured and digitized. For each scan, the images were calibrated by means of the internal calibration marks. If necessary, contrast was enhanced. Three routines were developed with Leica Q500 MCP image analysis software: (1) The pixel thresholding technique7 8 (Figure 1A) was based on thresholds in signal intensity for both ICV and infarct. First, each intracranial pixel with an attenuation lower than bone was assigned to the ICV. Thereafter, each pixel with a signal intensity identical to or below that of the infarct was color coded. For each slice, this threshold in signal intensity was interactively determined by the investigator and depended on the density scale settings of the CT scan and the attenuation of the infarct. After color-coded compartments other than the lesion, such as cerebrospinal fluid, had been rejected by the investigator, the computer calculated the area of the infarct in the particular slice. Infarct was distinguished from cerebrospinal fluid on the basis of anatomy and signal intensity. (2) In the manual tracing technique6 (Figure 1B), the perimeter of the area of abnormal low attenuation was traced on each CT slice showing the infarct. The same procedure was performed for the total intracranial space. (3) The stereological counting grid,9 based on the Cavalieri principle12 (Figure 1C), is a single-lattice square grid with an interpoint distance of 1.5 cm superimposed over the brain image. In an earlier study of infarct volume measurement, this distance was chosen on the basis of pilot studies and published nomograms.9 For technical reasons, the interpoint distance was 1.4 cm in 4% of the cases and 1.6 cm in 2%. Each grid point within the intracranial space was assigned to either normal tissue or infarct. If a grid intersection fell on a boundary between 2 compartments, the point was assigned to the compartment in the lower left square. The points assigned to normal tissue or infarct were multiplied by the area of a single square. For each of the methods, all measured areas were multiplied by the slice distance to obtain the total volumes. (4) In the fourth method, the largest diameter (A) of the infarct and its largest perpendicular diameter (B) were measured with a caliper (Figure 1D). The third, vertical diameter (C) was determined by summing the thicknesses of the slices in which the lesion was visible. Infarct volume was calculated according to the formula 0.5×A×B×C.10 (5) Finally, the single largest visible diameter was measured.11
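The volume arithmetic described above can be illustrated with a short Python sketch. It is not a reproduction of the Leica Q500 MCP routines used in the study: the function names, the use of centimeters (so that cm³ equals mL), and the assumption of a constant slice distance within a scan are chosen for illustration only. Only the arithmetic itself (area times slice distance, grid points times the area of one square, and 0.5×A×B×C) is taken from the text.

```python
def volume_from_areas(areas_cm2, slice_distance_cm):
    """Methods 1 and 2: sum the per-slice areas and multiply by the slice distance.

    Assumes one constant slice distance per scan (an illustrative simplification).
    Returns a volume in cm^3, which equals mL.
    """
    return sum(areas_cm2) * slice_distance_cm


def grid_volume(points_per_slice, interpoint_cm, slice_distance_cm):
    """Method 3: each counted grid point stands for one square of the grid."""
    square_area_cm2 = interpoint_cm ** 2
    return sum(points_per_slice) * square_area_cm2 * slice_distance_cm


def three_diameter_volume(a_cm, b_cm, lesion_slice_thicknesses_cm):
    """Method 4: 0.5 x A x B x C.

    C is the sum of the thicknesses of the slices in which the lesion is visible.
    """
    c_cm = sum(lesion_slice_thicknesses_cm)
    return 0.5 * a_cm * b_cm * c_cm
```

For instance, 6 infarct grid points counted at an interpoint distance of 1.5 cm on slices 1 cm apart correspond to 6 × 2.25 × 1 = 13.5 mL.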
The observers, who had 7 and 4 years of experience in the field of vascular neurology, respectively, performed the measurements independently, but in case of doubt about the presence of an infarct, a decision could be made by mutual agreement. To avoid a learning effect during the study, the measurements were performed after a training session. To test the intraobserver variability, all scans were reassessed by one author (H.B. van der W.) after all first measurements had been completed. The time between the first and second assessments of a scan with use of the same method was at least 4 weeks.
Intraobserver variability was determined for each method by comparing data of one author (H.B. van der W.) at 2 different measuring sessions; interobserver variability was determined by comparing the values of the first measurements of 2 authors (H.B. van der W. and S.P.C.). Variability of measurements was analyzed according to the method of Bland and Altman,13 including scatterplots showing the difference between 2 measurements against their mean, and by calculating the coefficient of variation for each method. This measure is equal to the intraobserver or interobserver error (SD of the mean difference) times 100, divided by √2 times the pooled mean values. Differences between the mean volumes of both the infarct and the total intracranial space as measured by the first investigator with the 4 different volume measurement techniques were compared by means of the paired Student’s t test.
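As an illustration of these statistics, the following Python sketch computes the mean difference, the limits of agreement, and the coefficient of variation for paired measurements. Several details are assumptions rather than statements from the text: the "SD of the mean difference" is read as the sample standard deviation of the paired differences, the limits of agreement use the conventional mean difference ± 1.96 SD, and the pooled mean is taken over all measurements from both sessions or observers. The function names are hypothetical.

```python
import math
import statistics


def bland_altman(first, second):
    """Return (mean difference, SD of differences, (lower, upper) limits of agreement).

    Assumes the conventional limits: mean difference +/- 1.96 x SD of the differences.
    """
    diffs = [a - b for a, b in zip(first, second)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1), an assumption
    limits = (mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff)
    return mean_diff, sd_diff, limits


def coefficient_of_variation(first, second):
    """CV (%) = 100 x SD of differences / (sqrt(2) x pooled mean of all values)."""
    _, sd_diff, _ = bland_altman(first, second)
    pooled_mean = statistics.mean(list(first) + list(second))
    return 100.0 * sd_diff / (math.sqrt(2) * pooled_mean)
```

Passing the two sessions of one observer gives the intraobserver value; passing the first measurements of the two observers gives the interobserver value.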
Results
Seven of the 45 selected scans were excluded: 3 were incomplete, 1 did not show an infarct, 1 was contrast enhanced, 1 had no calibration marks, and 1 had several different calibration marks. The intraobserver and interobserver coefficients of variation of the infarct volumes or sizes are presented in Table 1. Scatterplots showing the difference between 2 measurements against their mean for methods 1 through 4 are presented in Figure 2. Except for method 3 (stereological counting grid), the interobserver variability was larger than the intraobserver variability. For methods 1 through 4 there were no significant differences between the mean values of the infarct volumes. Both the intraobserver and interobserver variability were smallest for the manual tracing method.
The reproducibility was highest for the 1-dimensional method 5, but Figure 3 shows that this method is not very accurate because there is no good correlation between the largest diameter and the volume of the infarct as measured with method 2. Figure 4 suggests a fair correlation between infarct volumes as measured with semiautomated method 2 (perimeter tracing) and those measured according to the nonautomated method 4 (3 diameters); the coefficient of variation of these 2 methods was 23.1%.
Manual tracing was the only method that had a satisfactory reproducibility for the measurement of the volume of lacunar infarcts (Table 2). When the stereological counting grid was used, a lacunar infarct was missed in 36% of the cases because none of the grid intersections fell on the infarct.
Values for total intracranial volume are shown in Table 3. The average ICV measured with the counting grid (method 3) was 84 mL (95% CI, 28 to 139 mL) smaller than that measured with thresholding (method 1) and was 87 mL (95% CI, 34 to 140 mL) smaller than with the tracing method (method 2). For all 3 methods, the interobserver variability was larger than the intraobserver variability, but the reproducibility was much better than that for infarct volume. The intraobserver variabilities for the pixel thresholding and manual tracing methods were comparable, but the latter method had the smallest interobserver variability.
The time needed for the complete measurement of a single CT scan was highly dependent on the total number of slices and on the number of slices showing the infarct, but it generally varied between approximately 10 and 30 minutes for each of the 3 semiautomated techniques. Measurements of infarct volume were on average more rapidly performed with method 4 (3 diameters; approximately 3 minutes) than with each of the semiautomated techniques (2 to 10 minutes, depending on the number of slices showing the infarct).
Discussion
In this study we tried to find the most accurate and reproducible method for measuring cerebral infarct volume on CT scans obtained in large multicenter clinical trials. In these trials, CT scans are likely to vary considerably in quality, density scales, calibration, and slice thickness. The scans in our study came from 24 different centers in Europe, Australia, and New Zealand that participated in the TESS II trial, and all these characteristics were indeed highly variable. Given these circumstances, the manual tracing method had the lowest intraobserver and interobserver variability for both infarct volume and intracranial volume.
The variability of measurements was analyzed according to the method of Bland and Altman13 and by calculating the coefficient of variation for each method. The low variability of the manual tracing method was demonstrated by the small limits of agreement calculated according to Bland and Altman and by the low coefficient of variation. The mean difference in the scatterplot was nearly zero, showing that there were no systematic discrepancies between the first and second measurements. The limits of agreement in the intraobserver study for the manual tracing method indicate that differences up to 16 mL are possible for the larger infarcts (Figure 2b). Such differences may be acceptable in clinical trials that use cerebral infarct volume as an outcome.
For all 3 semiautomated techniques, the reproducibility of ICV was much better than that of infarct volume. The most probable explanation is that the border of the intracranial space is better defined than that of most infarcts. The identification of the boundary of an infarct is therefore more strongly dependent on the interpretation of the investigator. Others have also found that interobserver agreement improves when compartments are better demarcated.9
Only manual tracing of the infarct proved to be reliable for measuring lacunar infarct volumes. Because of the small size of these infarcts, a small deviation in absolute measures readily corresponds to a substantial relative error. Irrespective of the question of whether measuring the volume of lacunar infarcts in acute stroke trials is meaningful, measurement on CT scans with one of the other tested methods is not worthwhile.
The variability in attenuation within the infarct may have contributed to the unfavorable results of the method based on pixel thresholding. This technique required the threshold to be set at such a level that the entire infarct was included. In this way, a considerable amount of normal tissue was primarily designated as infarct. Although erroneously classified areas had to be subsequently rejected by the investigator, normal tissue with a signal intensity similar to or lower than that of the infarct may have been included in the infarct to a varying extent.
For the stereological counting grid, the largest interpoint distance that produced the minimum error has been reported to be 1.5 cm.9 However, with this grid size we observed intraobserver and interobserver variabilities that were much higher than with the other methods. In addition, in 36% of the cases a lacunar infarct was missed because none of the grid intersections fell on the infarct. Both problems would probably be reduced if a smaller grid size was used, but this would make the method much more time-consuming.
Our stereological method differed slightly from the method that has been described before.9 In the previous study, the number of grid intersection points that fell on the infarct was divided by the total number of intersection points and then multiplied by the ICV that was obtained with manual tracing. In this study we directly multiplied each point with the area of each square and the slice thickness. Although this difference may lead to a minor difference in absolute infarct volumes, it will not affect the variability as expressed by a percentage for both variants of the method.
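The two variants of the stereological calculation can be contrasted in a brief Python sketch. The function names and units are hypothetical and chosen for illustration. In both variants the infarct volume is the infarct point count multiplied by a scale factor, so the relative difference between two counts is the same in either variant as long as that factor is identical for both measurements of a scan; this is a simplifying assumption, because in the earlier variant the total point count and the traced ICV are themselves measured quantities.

```python
def direct_grid_volume(infarct_points, interpoint_cm, slice_distance_cm):
    """Variant used in this study: points x area of one square x slice distance (mL)."""
    return infarct_points * interpoint_cm ** 2 * slice_distance_cm


def ratio_grid_volume(infarct_points, total_points, traced_icv_ml):
    """Earlier variant: fraction of grid points on the infarct x manually traced ICV."""
    return infarct_points / total_points * traced_icv_ml
```

With a fixed scale factor, the ratio of two volumes obtained from counts of, say, 10 and 8 points is 10/8 under either variant, although the absolute volumes differ.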
Obviously, it was not possible to compare the obtained volumes for intracranial space and infarct with the gold standard of measurement after autopsy. However, despite the differences in reproducibility, the mean infarct volumes as obtained with the 4 multidimensional techniques were very similar, suggesting accuracy. For each of the semiautomated methods, the mean values for intracranial volumes were within the normal range of 1100 to 1400 mL,14 but for unknown reasons the values obtained with the counting grid were substantially smaller than those obtained with the other methods.
Among the methods tested, measurement of the largest diameter of the infarct was the most rapid and reproducible. The low intraobserver and interobserver variability may well be explained by the fact that only a single, 1-dimensional measurement had to be performed. However, the correlation between the largest diameter of the infarcts and their volume was poor (Figure 3). This disadvantage is overcome by the inclusion of both largest perpendicular diameters and the slice thickness in a formula to obtain infarct volume.10 15 This method has earlier been described to measure volumes of intracerebral hemorrhage.16 Despite its slightly lower reproducibility, the fair correlation between infarct volumes as measured according to this technique and those obtained with manual tracing makes this simple method acceptable for investigators who do not have access to an image analysis system.
Although not formally tested, the time spent on the complete measurement of one CT scan was more or less the same for the 3 semiautomated techniques. This period of time was highly dependent on the total number of slices and on the number of slices showing the infarct, but it generally varied between 10 and 30 minutes. In comparison with the large amount of time spent on inclusion, treatment, follow-up, and data collection of each patient included in an acute stroke trial, the time involved in obtaining this additional outcome measure appears acceptable.
In recent years, the use of MRI for measurement of infarct volume has expanded enormously.1 17 18 19 20 21 For this purpose, the advantages of MRI over CT include a better spatial resolution, earlier detection of ischemic tissue, and the possibility of performing different kinds of imaging in one sample.22 Some of these techniques put great demands on hardware, software, and personnel. In larger studies, these high requirements cannot always be met. Although MRI is the technique of choice for infarct measurements in smaller studies performed in academic and other specialized centers, CT is more suitable in larger multicenter trials because of its higher availability.
In conclusion, manual tracing of the perimeter is the most reproducible method for measuring the volume of a brain infarct and the total intracranial space in large multicenter trials of therapies for acute ischemic stroke. The method is rapid, reliable, and probably accurate, despite technical differences between the CT scans.
This study was supported by a grant from the Janivo Foundation to Dr van der Worp. Purchase of the Leica Q500 MCP image analysis system was made possible by a grant from Pharmacia & Upjohn. We thank Peter Sodaar for designing the routines for the volume measurements and for his assistance during the phase of data collection. We thank the TESS II investigators, Pharmacia & Upjohn, and Professor J.M. Orgogozo for giving us the opportunity to make use of CT scans of the TESS II trial.
- Received August 9, 2000.
- Revision received October 10, 2000.
- Accepted November 1, 2000.
- Copyright © 2001 by American Heart Association
Warach S, Benfield A, Schlaug G, Siewert B, Edelman RR. Reduction of lesion volume in human stroke by citicoline detected by diffusion-weighted magnetic resonance imaging: a pilot study. Ann Neurol. 1996;40:527–528. Abstract.
The NINDS rt-PA Stroke Study Group. Effect of rt-PA on ischemic stroke lesion size by computed tomography: preliminary results from the NINDS rt-PA Stroke Trial. Stroke. 1998;29:287.
Saver JL, Johnston KC, Homer D, Wityk R, Koroshetz W, Truskowski LL, Haley EC, for the RANTTAS Investigators. Infarct volume as a surrogate or auxiliary outcome measure in ischemic stroke clinical trials. Stroke. 1999;30:293–298.
Clark WM, Albers GW, Madden KP, Hamilton S, for the Thrombolytic Therapy in Acute Ischemic Stroke Study Investigators. The rtPA (alteplase) 0- to 6-hour acute stroke trial, part A (A0276g): results of a double-blind, placebo-controlled, multicenter study. Stroke. 2000;31:811–816.
Brott T, Marler JR, Olinger CP, Adams HP Jr, Tomsick T, Barsan WG, Biller J, Eberle R, Hertzberg V, Walker M. Measurements of acute cerebral infarction: lesion size by computed tomography. Stroke. 1989;20:871–875.
Woo D, Broderick JP, Kothari RU, Lu M, Brott T, Lyden PD, Marler JR, Grotta JC, for the NINDS t-PA Stroke Study Group. Does the National Institutes of Health Stroke Scale favor left hemisphere strokes? Stroke. 1999;30:2355–2359.
Lyden PD, Zweifler R, Mahdavi Z, Lonzo L. A rapid, reliable, and valid method for measuring infarct and brain compartment volumes from computed tomographic scans. Stroke. 1994;25:2421–2428.
Pantano P, Caramia F, Bozzao L, Dieler C, von Kummer R. Delayed increase in infarct volume after cerebral ischemia: correlations with thrombolytic treatment and clinical outcome. Stroke. 1999;30:502–507.
Gundersen HJG, Bendtsen TF, Korbo L, Marcussen N, Møller A, Nielsen K, Nyengaard JR, Pakkenberg B, Sørensen FB, Vesterby A, West MJ. Some new, simple and efficient stereological methods and their use in pathological research and diagnosis. APMIS. 1988;90:379–394.
Hubbard BM, Anderson JM. Sex differences in age-related brain atrophy. Lancet. 1983;1:1447–1448.
Castillo J, Dávalos A, Marrugat J, Noya M. Timing for fever-related brain damage in acute ischemic stroke. Stroke. 1998;29:2455–2460.
Kothari RU, Brott T, Broderick JP, Barsan WG, Sauerbeck LR, Zuccarello M, Khoury J. The ABCs of measuring intracerebral hemorrhage volumes. Stroke. 1996;27:1304–1305.
Saunders DE, Clifton AG, Brown MM. Measurement of infarct size using MRI predicts prognosis in middle cerebral artery infarction. Stroke. 1995;26:2272–2276.
Tong DC, Yenari MA, Albers GW, O’Brien M, Marks MP, Moseley ME. Correlation of perfusion- and diffusion-weighted MRI with NIHSS score in acute (<6.5 hour) ischemic stroke. Neurology. 1998;50:864–870.