Variability in Measurement of Extracranial Internal Carotid Artery Stenosis as Displayed by Both Digital Subtraction and Magnetic Resonance Angiography
An Assessment of Three Caliper Techniques and Visual Impression of Stenosis
Background and Purpose: The degree of stenosis in the extracranial internal carotid artery helps predict the risk of an individual suffering subsequent cerebrovascular ischemic events. Different techniques have evolved to measure stenosis from angiograms, leading to some confusion and a call for the adoption of a single technique. To help choose the most reliable technique, this study assessed observer variability in reporting carotid stenosis for four different techniques, from both digital subtraction angiograms (DSAs) and MR angiograms (MRAs). Three of the techniques used caliper measurements; the fourth was the visual impression of stenosis.
Methods: From a total of 137 angiograms, caliper measurements were possible on 105 DSAs and 74 MRAs. Measurements from these angiograms were made by two independent observers on two separate occasions to assess interobserver and intraobserver variation in reporting.
Results: For DSA, the variability in reporting and the number of clinically significant differences arising from it were similar for each of the four techniques. Although the typical measurement errors for each technique were on the order of ±5%, each technique produced some sizable individual differences for the same angiogram, with correspondingly wide 95% limits of agreement. Observer variability for reporting MRA was generally slightly greater than for DSA. Compared with the caliper techniques, the visual impression of stenosis performed well, particularly for MRA.
Conclusions: Although observer variability in reporting can be considerable, no important differences were found among the techniques widely used for measuring carotid stenosis.
The measurement of internal carotid artery stenosis is one of the most important assessments made when symptomatic patients are considered for prophylactic internal carotid endarterectomy. Several different methods of measuring the degree of stenosis from carotid angiograms have evolved, each with different definitions of what constitutes carotid stenosis. It is thus possible for the same angiogram to be reported as showing differing degrees of stenosis according to the particular method of measurement used. This is a potential source of confusion and frustration for both clinicians and researchers involved in this area. A uniform approach to measuring carotid stenosis is required, but which method should be adopted?
Any method of measuring carotid artery stenosis, in this context, should first be shown to reliably predict the risk of ipsilateral ischemic stroke. The degree of ipsilateral carotid stenosis, as measured by the techniques used in the ECST¹ and NASCET,² has been clearly shown to predict stroke risk. More recently, the predictive powers of these methods and of the common carotid technique have been shown to be almost identical.³ Given that there are no real differences in the predictive power of these caliper techniques, the reproducibility of measurements by each technique becomes the significant issue. The aim of this study was to compare the reproducibility of these caliper techniques by assessing observer variability in reporting both intra-arterial DSAs and MRAs. The visual impression of stenosis, or “eyeballing” the degree of stenosis, remains a widely used technique by which angiograms are assessed in routine clinical practice. We also assessed the reproducibility of this technique.
Materials and Methods
Selective, intra-arterial DSAs and MRAs of the carotid bifurcations from 70 patients (137 vessels) were obtained as part of a study comparing DSA and MRA. Details of the patients and techniques involved are as previously reported.⁴ The angiograms were reported by two of the authors (P.R.D.H. and G.R.Y.) independently and blinded to previous results.
For each angiographic examination, the reporting investigator determined the image and viewing angle that best demonstrated the stenosis present in the internal carotid artery and recorded four measurements directly from the angiogram (Fig 1). These measurements were as follows: the diameter of the minimum residual lumen of the internal carotid artery at the point of maximum stenosis (a), the estimated diameter of the original internal carotid artery lumen at the point of maximum stenosis (not the “bulb,” as is often assumed) (b), the diameter of the distal internal carotid artery lumen at the point where the vessel walls first appeared parallel (c), and the diameter of the disease-free common carotid artery proximal to the carotid bifurcation (d). Measurements were made with mechanical vernier-scale calipers reading to 0.02 mm. For each angiogram, the visual impression of the degree of stenosis (“eyeball” measurement) was also recorded. Vessels with complete occlusion of the internal carotid artery were excluded from the analysis. Likewise, vessels in which the stenosis was so severe that an apparent gap was present in the contrast column by DSA or in the flow signal by MRA were excluded, since in this situation the minimum residual lumen could not be measured.
Three months after the initial measurements, the angiograms were assessed a second time by the same investigators using the same protocol. No information from the first assessment concerning the specific image, viewing angle, or result obtained was available to the investigator at this time.
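The degree of stenosis for each angiogram was calculated by each of three different methods, with the use of the caliper measurements obtained, as shown:
$$\text{ECST stenosis}\ (\%) = \left(1 - \frac{a}{b}\right) \times 100$$
$$\text{NASCET stenosis}\ (\%) = \left(1 - \frac{a}{c}\right) \times 100$$
$$\text{common carotid stenosis}\ (\%) = \left(1 - \frac{a}{d}\right) \times 100$$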
Results were rounded to the nearest whole number. The agreement between measurements made by each of the above methods, as well as by the eyeball method, was assessed, both between the two observers and also for each observer reporting the angiograms on two occasions (ie, the interobserver and intraobserver agreement).
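The three caliper formulas can be sketched in Python as follows. This is a minimal illustration, not the software used in the study: the `CaliperReading` class and the example diameters are hypothetical, the common carotid method is assumed to use the unscaled common carotid diameter d as its denominator, and results are rounded half-up to the nearest whole percent.

```python
import math
from dataclasses import dataclass


@dataclass
class CaliperReading:
    """Hypothetical caliper diameters (mm) for one vessel, labelled as in Fig 1."""
    a: float  # minimum residual lumen at the point of maximum stenosis
    b: float  # estimated original ICA lumen at the point of maximum stenosis
    c: float  # distal ICA lumen where the walls first appear parallel
    d: float  # disease-free common carotid artery proximal to the bifurcation


def percent_stenosis(residual: float, reference: float) -> int:
    """(1 - residual / reference) * 100, rounded half-up to a whole percent."""
    if reference <= 0:
        raise ValueError("reference diameter must be positive")
    return math.floor((1.0 - residual / reference) * 100.0 + 0.5)


def stenosis_by_method(r: CaliperReading) -> dict[str, int]:
    """Percent stenosis by the three caliper methods (CC denominator assumed to be d)."""
    return {
        "ECST": percent_stenosis(r.a, r.b),
        "NASCET": percent_stenosis(r.a, r.c),
        "CC": percent_stenosis(r.a, r.d),
    }


print(stenosis_by_method(CaliperReading(a=1.5, b=6.0, c=4.5, d=7.0)))
```

With these invented diameters the same vessel reads 75% by ECST, 67% by NASCET, and 79% by the common carotid method; a residual lumen wider than the distal internal carotid artery (a > c) would give a negative NASCET value.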
The measurement of agreement in this situation is complex. The results obtained by each of the methods are not directly comparable. Since each caliper method uses a different denominator in the calculation of stenosis, each will produce considerably different results for the same angiogram. An approximately linear relationship has been demonstrated between the results obtained by these different caliper techniques, with the relationship between the common carotid and NASCET techniques reported by Rothwell et al⁵ as follows:
$$\text{common carotid stenosis}\ (\%) = 0.6 \times \text{NASCET stenosis}\ (\%) + 40$$
From this equation it can be seen that, in addition to the two methods generating different values for a specific angiogram, the measurement scales are also quite different. Thus, the range of 0 to 100 for the common carotid method corresponds to a range of −67 to 100 for the NASCET method, and a difference of 1% by the common carotid method is equal to a difference of approximately 1.7% by NASCET. Comparing the differences recorded by each of the methods therefore requires that the results first be converted to an equivalent scale.
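The rescaling can be illustrated with a short Python sketch. It assumes the relationship is written as common carotid (%) = 0.6 × NASCET (%) + 40; these coefficients are an assumption, chosen because they reproduce the −67 to 100 range and the ratio of about 1.7 given in the text, and the published values should be checked against Rothwell et al.⁵ The function names are hypothetical.

```python
# Assumed linear relation between the two caliper scales: CC = 0.6 * NASCET + 40.
SLOPE = 0.6
INTERCEPT = 40.0


def cc_to_nascet(cc_percent: float) -> float:
    """Convert a common carotid (CC) percentage to the NASCET scale."""
    return (cc_percent - INTERCEPT) / SLOPE


def cc_difference_on_nascet_scale(cc_difference: float) -> float:
    """Express a difference between two CC readings in NASCET percentage points."""
    return cc_difference / SLOPE


for cc in (0, 50, 100):
    print(f"CC {cc:3d}% -> NASCET {cc_to_nascet(cc):6.1f}%")
print(f"1% on the CC scale = {cc_difference_on_nascet_scale(1):.2f}% on the NASCET scale")
```

Because the intercept cancels when two readings are subtracted, only the slope matters for rescaling differences, which is why a disagreement of a given size carries a different weight on each scale.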
For this comparison, the mean of the four results available for each angiogram by each of the different methods (two results for each of the two investigators) was calculated. Using regression statistics, we calculated the mathematical relationship between each of the methods for DSA and separately for MRA using these mean values. The results obtained by the common carotid, NASCET, and eyeball methods were then interpolated to the “equivalent” value by the ECST method with the use of the appropriate regression equation. The analyses below were performed on these interpolated results.
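A minimal sketch of this interpolation step follows, written in plain Python so that it needs no external libraries. It fits the ECST result as a polynomial of another method's per-vessel means by ordinary least squares (degree 1 for a straight line, degree 2 for a parabola). The regression software used in the study is not stated; the function names and the six-vessel data set are hypothetical, with the ECST values generated from an arbitrary curve purely for illustration.

```python
from statistics import fmean


def polyfit(xs, ys, degree):
    """Ordinary least-squares polynomial fit; coefficients, lowest power first."""
    n = degree + 1
    # Normal equations (A^T A) p = A^T y with A[i][j] = xs[i] ** j.
    ata = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(ata[r][col]))
        ata[col], ata[piv] = ata[piv], ata[col]
        aty[col], aty[piv] = aty[piv], aty[col]
        for r in range(col + 1, n):
            f = ata[r][col] / ata[col][col]
            for k in range(col, n):
                ata[r][k] -= f * ata[col][k]
            aty[r] -= f * aty[col]
    # Back substitution.
    p = [0.0] * n
    for r in reversed(range(n)):
        p[r] = (aty[r] - sum(ata[r][k] * p[k] for k in range(r + 1, n))) / ata[r][r]
    return p


def ecst_equivalent_mapper(method_readings, ecst_readings, degree):
    """Fit ECST on another method using the mean of each vessel's four readings."""
    x = [fmean(row) for row in method_readings]
    y = [fmean(row) for row in ecst_readings]
    coeffs = polyfit(x, y, degree)
    return lambda v: sum(c * v ** k for k, c in enumerate(coeffs))


def true_curve(n):
    """Arbitrary illustrative curve used only to generate example ECST values."""
    return 40 + 0.6 * n - 0.001 * n ** 2


# Hypothetical NASCET readings: (observer 1 x2, observer 2 x2) for six vessels.
nascet = [[-20, -18, -22, -20], [0, 2, -2, 0], [20, 21, 19, 20],
          [40, 42, 38, 40], [60, 61, 59, 60], [80, 80, 80, 80]]
ecst = [[true_curve(fmean(row))] * 4 for row in nascet]

to_ecst = ecst_equivalent_mapper(nascet, ecst, degree=2)
print(f"NASCET 30% -> ECST-equivalent {to_ecst(30):.1f}%")
```

Solving the normal equations directly is adequate for a linear or quadratic fit on a few dozen points; for higher degrees a QR-based solver would be numerically safer.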
Agreement plots according to Bland and Altman⁶ were constructed, and the mean of the differences and 95% limits of agreement were calculated for each comparison. Any systematic difference, such as one observer consistently reporting tighter stenosis than the other (bias), would make the mean of the differences significantly different from zero, with the points scattered predominantly above or below the zero difference line. The wider the scatter of the points in the direction of the y axis, the worse the agreement. The 95% limits of agreement represent the range within which 95% of the differences between two readings of the same angiogram would be expected to lie; the closer the agreement, the narrower the limits.
The absolute differences between observations on the same vessel were also obtained and the median value calculated. The median of the absolute differences is a measure of the typical magnitude, although not the “direction,” of the differences between observations.
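The agreement statistics described in the two preceding paragraphs can be sketched as follows. The paired readings are hypothetical, the limits use the conventional mean ± 1.96 SD of the differences, and no formal test of whether the bias differs from zero is included.

```python
from statistics import fmean, median, stdev


def agreement_summary(first, second):
    """Bland-Altman style summary for two sets of readings of the same vessels."""
    if len(first) != len(second):
        raise ValueError("readings must be paired")
    diffs = [x - y for x, y in zip(first, second)]
    bias = fmean(diffs)   # mean of the differences
    sd = stdev(diffs)     # sample SD of the differences
    return {
        "mean_difference": bias,
        "limits_of_agreement": (bias - 1.96 * sd, bias + 1.96 * sd),
        "median_abs_difference": median([abs(d) for d in diffs]),
    }


# Hypothetical ECST-equivalent readings (%) of five vessels by two observers.
observer_1 = [72, 55, 81, 40, 90]
observer_2 = [70, 58, 76, 41, 85]
print(agreement_summary(observer_1, observer_2))
```

The mean difference captures bias, the limits capture the spread of individual disagreements, and the median absolute difference gives the typical size of a disagreement irrespective of its sign.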
The results were also assigned to one of two clinically important categories on the basis of available information: “surgical” and “nonsurgical” stenoses. Surgical stenosis was defined as a stenosis of 70% or greater by the ECST method; nonsurgical stenosis was defined as a stenosis of less than 70% by the ECST method. Agreement concerning the classification of vessels was then assessed with the use of the κ statistic.⁷
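A minimal sketch of the classification and κ calculation follows. It assumes unweighted Cohen's κ for two sets of readings, since the exact formulation of reference 7 is not restated here, and the ECST-equivalent values are hypothetical.

```python
def classify(ecst_percent, cutoff=70):
    """'surgical' if stenosis >= cutoff on the ECST scale, otherwise 'nonsurgical'."""
    return "surgical" if ecst_percent >= cutoff else "nonsurgical"


def cohen_kappa(labels_1, labels_2):
    """Unweighted Cohen's kappa for two equal-length lists of categorical labels."""
    n = len(labels_1)
    if n == 0 or n != len(labels_2):
        raise ValueError("need two non-empty lists of equal length")
    observed = sum(x == y for x, y in zip(labels_1, labels_2)) / n
    categories = set(labels_1) | set(labels_2)
    # Chance agreement from the marginal proportions of each category.
    expected = sum(labels_1.count(c) * labels_2.count(c) for c in categories) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical ECST-equivalent readings (%) of eight vessels.
reader_a = [85, 72, 68, 40, 90, 55, 71, 30]
reader_b = [80, 66, 69, 35, 92, 60, 75, 28]
kappa = cohen_kappa([classify(v) for v in reader_a], [classify(v) for v in reader_b])
print(f"kappa = {kappa:.2f}")
```

Here the readers agree on 7 of 8 vessels, but half of that agreement would be expected by chance from the marginal totals, so κ is 0.75 rather than 0.875.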
Results
Full measurements were available for 105 of the DSAs. On at least one occasion, 15 angiograms were reported as showing a gap in the contrast column, 14 as occluded, and 3 as unmeasurable.
Full measurements were available for 74 vessels examined by MRA. On at least one occasion, 46 angiograms were reported as showing a signal gap, 14 as occluded, and 3 as unmeasurable.
In 73 vessels, full measurements were available for both the DSAs and the MRAs, and therefore these vessels were analyzed separately to enable comparison between DSA and MRA as well as among the different techniques of measurement.
The mean of the differences, the median of the absolute differences, and the 95% limits of agreement for DSA results are shown in Table 2.
The 95% limits of agreement and the median of the absolute differences for the 73 vessels in which full caliper results were available by both DSA and MRA are shown in Table 3.
The κ values for the classification of results as surgical (≥70% by ECST) and nonsurgical (<70% by ECST) stenosis for each of the four methods are shown in Table 4 for DSA. In the interobserver comparisons, the mean of the differences for the caliper results was significantly different from zero; ie, one observer reported consistently tighter degrees of stenosis than the other. To separate the effect of this bias on the κ values obtained for these comparisons, a further analysis was performed after correction of the bias. This was achieved by subtracting the mean of the differences between the two observers from the results of the observer reporting tighter stenoses. The resultant κ values more closely represent the performance of each of the methods of measurement under investigation, although the contribution of observer bias to overall agreement is clearly of vital importance.
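The bias-correction step can be sketched as follows, using hypothetical ECST-equivalent readings in which one observer reads tighter. The function names are illustrative; the sketch counts disagreements about the 70% cutoff before and after correction, and κ could then be recomputed on the corrected readings.

```python
from statistics import fmean


def remove_observer_bias(obs_a, obs_b):
    """Subtract the mean inter-observer difference from the observer reading tighter."""
    bias = fmean([a - b for a, b in zip(obs_a, obs_b)])
    if bias >= 0:  # observer A reports tighter stenoses on average
        return [a - bias for a in obs_a], list(obs_b), bias
    # bias < 0: observer B reads tighter, so subtract |bias| from B's readings
    return list(obs_a), [b + bias for b in obs_b], bias


def disagreements(obs_a, obs_b, cutoff=70):
    """Number of vessels classified on opposite sides of the surgical cutoff."""
    return sum((a >= cutoff) != (b >= cutoff) for a, b in zip(obs_a, obs_b))


# Hypothetical ECST-equivalent readings (%) in which observer A reads tighter.
obs_a = [74, 71, 60, 88, 45, 73]
obs_b = [68, 66, 55, 84, 41, 69]
corrected_a, corrected_b, bias = remove_observer_bias(obs_a, obs_b)
print(f"bias = {bias:.2f}; disagreements before = {disagreements(obs_a, obs_b)}, "
      f"after = {disagreements(corrected_a, corrected_b)}")
```

The correction removes the systematic offset only; it cannot show which observer was closer to the true stenosis.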
There were too few vessels in the surgical category for a κ analysis to be performed for MRA, because a considerable proportion of tightly stenosed vessels showed a “signal gap” on MRA and were therefore excluded from caliper measurement. No signal gap on MRA appeared in vessels with less than 70% stenosis by the ECST method.
Discussion
Comparison of the different methods of measuring internal carotid artery stenosis by DSA has recently been investigated by others. This is the first study to also assess these methods by MRA. Rothwell et al³ assessed the ECST, NASCET, and common carotid techniques using conventional angiograms from 1001 vessels for interobserver and 50 vessels for intraobserver variability comparisons. After interpolating the results to the ECST equivalents by use of linear relationships, they found that while the relative performance of each of the methods varied according to the degree of stenosis, the common carotid technique was superior to the NASCET and ECST techniques in the clinically important range of 50% to 90% stenosis, as measured by ECST. In response, Eliasziw et al,⁸ on behalf of the NASCET group, assessed the variability of measurements of the common carotid artery from 30 angiograms and found that the diameter varied between angiograms, with the most variation at the bifurcation and the least approximately 20 to 30 mm proximal to the bifurcation. The variability of the distal internal carotid artery was not reported, and no direct comparison of the variability of reporting for the different methods was made. Bladin et al,⁹ using conventional angiography, compared the ECST and NASCET techniques with a third method, which they termed the “carotid stenosis index.” This technique involved obtaining an estimate of the diameter of the carotid bulb, derived from measurement of the proximal common carotid artery, and using this as the denominator in the equation for the calculation of stenosis. They reported technical difficulty in performing measurements for ECST in 5% of vessels, for NASCET in 11%, and for the carotid stenosis index in 1%, although the method they used to measure the ECST stenosis was incorrect. They also found repeatability to be greatest for the carotid stenosis index and least for the NASCET technique.
Vanninen et al¹⁰ compared the ECST and NASCET techniques in 80 conventional angiograms and, using κ statistics, found greater intraobserver variability with the NASCET method. For the latter two studies, it is not clear whether the results obtained by each method were converted to a similar scale before the analysis.
As in the report of Eliasziw et al,⁸ in this study we found a linear relationship between the eyeball and ECST techniques and a parabolic relationship, described by a quadratic equation, between ECST and each of the NASCET and common carotid techniques. After the results by the other techniques were converted to the equivalent ECST result with these relationships, each method covered a very similar range of percent stenosis, as can be seen from the agreement plots (Figs 2 and 3). This allows disagreement to be compared directly at each level of stenosis. In contrast to Bladin et al,⁹ we found no difference between the methods in the technical feasibility of obtaining measurements from the angiograms.
Before differences between the techniques are discussed, it should be noted that measurements made by each of the techniques investigated are subject to considerable variability. The plots show that at all levels of stenosis, sizable individual differences can occur both between different observers and for the same observer on two occasions. This is also seen in the 95% limits of agreement, with the narrowest limits obtained for DSA covering a range of 23%. Thus, even at best, there will on some occasions be sizable individual differences between observers reporting the same angiograms. This inevitably leads to clinically important disagreements, as confirmed by the κ analyses in this study. The mean of the differences for each of the techniques was close to 0% for the intraobserver variation, so that the observers were reporting the same level of stenosis for the group of angiograms as a whole: any tendency to report some vessels as more tightly stenosed on the second occasion was balanced by an equal tendency to report other vessels as less tightly stenosed. In this situation, the absolute differences between readings are a guide to the magnitude of the measurement error. The median absolute differences, between 4% and 5% for DSA and between 3% and 6% for MRA, indicate that a “typical” measurement error is on the order of ±4% to 5% for each technique. For interobserver variability, a further consideration is the presence of systematic differences between observers, or bias. Bias would make the mean of the differences significantly different from 0% and would increase the median absolute differences between observers. To a certain extent, the effects of bias can be countered if the mean of the differences between the two observers is known and is subtracted from the readings made by the second observer.
This will improve agreement, although clearly it is not possible to know which observer was initially reporting closer to the “true” stenosis.
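As a rough consistency check, not part of the original analysis, the quoted 23% width of the narrowest limits of agreement can be related to the reported median absolute differences. The sketch assumes the paired differences are normally distributed with a mean close to zero, as described above for the intraobserver comparisons.

```python
from statistics import NormalDist

# Width of the narrowest 95% limits of agreement reported for DSA (percentage points).
loa_width = 23.0

# Limits are mean +/- z * SD with z ~ 1.96, so the width is 2 * z * SD.
z = NormalDist().inv_cdf(0.975)
sd_diff = loa_width / (2 * z)

# Assumption: differences ~ Normal(0, sd_diff). The median of |difference| is then
# the 75th percentile of that distribution (about 0.674 * sd_diff).
median_abs_diff = NormalDist(0.0, sd_diff).inv_cdf(0.75)

print(f"implied SD of differences: {sd_diff:.1f}")
print(f"implied median absolute difference: {median_abs_diff:.1f}")
```

Under these assumptions the implied SD is about 5.9 percentage points and the implied median absolute difference about 4%, in keeping with the 4% to 5% medians reported for DSA.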
Concerning the comparison between the different techniques of measurement, the interobserver variation in reporting DSAs was similar for each of the techniques investigated, as were the intraobserver variations. In the interobserver comparisons, however, there was significant bias between the two observers for the caliper techniques, with the mean of the differences significantly different from zero. This is clearly seen in the plots of agreement, with the points lying predominantly above the zero difference line for all of the caliper techniques. As mentioned above, provided one is aware of its presence, bias can be corrected by subtracting its magnitude from the appropriate observations. Systematic differences in measuring from the same angiograms can occur because of difficulties in precisely locating the boundary of the vessel lumen on conventional angiograms. This occurs for a number of reasons, including a penumbra effect caused by the finite size of the focal spot of the x-ray source¹¹ and the fact that the x-ray attenuation profile is weakest, and therefore hardest to detect, at the edges of the contrast-filled lumen.¹² It should be possible to overcome this problem by ensuring a uniform technique between observers when calipers are used, particularly when measuring from magnified images. There was no significant systematic difference between observers when eyeball measurements were compared. This is surprising, since differences between observers would intuitively be expected to be a major disadvantage of this technique. However, observers from the same center who review angiograms together regularly are likely to develop similar reporting habits. These results should therefore be treated with some caution, since comparisons between observers from different centers would be likely to show greater variability.
Observer variability was consistently greater for MRA than for DSA. It must be remembered, however, that because this study was comparing caliper techniques, tightly stenosed vessels in which a signal gap was present on MRA were excluded from the analysis. Because all vessels with a signal gap in this study had more than 70% stenosis according to the corresponding DSA, the presence of such a signal gap can be considered to reliably indicate a surgical stenosis by current criteria.
Because the confidence intervals were wide, there were no statistically significant differences among the caliper techniques in intraobserver or interobserver variation by MRA. Although the differences were not statistically significant, the eyeball technique gave consistently narrower limits of agreement than the caliper techniques.
The classification of vessels according to the 70% cutoff was an attempt to measure the clinical consequences of the variability in reporting. In this series, which covered a wide range of stenoses, the number of “clinically important disagreements” ranged from 4 to 13 of 105 vessels (Table 4). The more vessels that lie at or near the 70% cutoff, the greater the number of disagreements that are likely to occur. For this reason, these results cannot be compared with those obtained in other studies involving different patient populations. For the intraobserver comparisons, there were no significant differences between the κ values obtained for the different methods as measured by observer 1. For observer 2, the κ value by the NASCET analysis was significantly lower than that by the ECST method. In the interobserver analysis, there were more disagreements for the caliper methods. This was in part due to the bias previously noted, with one observer systematically reporting tighter degrees of stenosis than the other. After the effects of this bias were removed, there were no significant differences among the techniques, although there were considerably more disagreements when the NASCET technique was used.
Overall, in our assessment of DSAs we do not believe that there are sufficient differences between these techniques to recommend one technique over any other on the basis of repeatability. With more vessels in the analysis, one might expect statistically significant differences to be found. However, perhaps a more important message is that quite sizable differences in measurement are regularly encountered no matter which technique is used. An interesting finding is that two techniques that would intuitively be considered to be subject to greater variability, namely the ECST and eyeballing methods, performed very well in this study. Indeed, in the case of MRA, the performance of the eyeball technique was, if anything, better than that of the caliper techniques, although caliper measurements would intuitively appear to be the more “scientific” approach. One of the difficulties in assessing the eyeball results is the problem of digit preference. It is clear from the plots that when eyeballing angiograms, observers report results to the nearest 5%. By constraining results to “categories,” agreement may be affected. For example, if an observer only reported films as 0%, 50%, or 100% stenosed, extremely high repeatability would be expected. The results obtained would not, however, be very close to the “true stenosis” for the majority of observations, making the method of measurement useless. With cutoffs at 5% intervals, this effect is smaller and less likely to be of clinical significance. Measurement by eyeballing is, of course, heavily dependent on whose eyes are looking at the angiogram. The ability to accurately assess angiograms in this subjective manner is likely to depend on the experience of the observer involved. Conversely, one would expect precisely defined caliper measurements to be less dependent on the individual observer and therefore more readily generalizable.
For these reasons, we would be wary of recommending adoption of the eyeballing technique on the evidence of this study alone. Eyeballing does, however, integrate the information from several different viewing angles into a single result, thereby eliminating some variability due to disagreement over which viewing angle to assess. In the case of MRA, the various artifacts and lower resolution in comparison to DSA can make interpretation of MRAs difficult, and qualitative impressions can be important. We have previously demonstrated good agreement between DSA, MRA, and ultrasound using eyeball measurements to report the DSAs and MRAs.⁴ It may be that when MRAs are reported with the use of caliper techniques, the distal internal carotid and common carotid arteries are less reliable regions from which to measure, particularly when axially acquired images centered on the carotid bifurcation are used, since these regions are located at the upper and lower extremes of the imaging volume.
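The digit-preference argument can be illustrated with hypothetical numbers. In the sketch below, two unrounded impressions of six vessels are snapped to steps of 1%, 5%, or 50%; the "true" stenoses, the readings, and the function names are invented for illustration only.

```python
import math
from statistics import fmean


def snap(value, step):
    """Report a reading to the nearest multiple of `step`, rounding halves up."""
    return step * math.floor(value / step + 0.5)


def repeatability_and_error(truth, read_1, read_2, step):
    """Mean |read_1 - read_2| and mean |read_1 - truth| after snapping to `step`."""
    r1 = [snap(v, step) for v in read_1]
    r2 = [snap(v, step) for v in read_2]
    repeat = fmean([abs(x - y) for x, y in zip(r1, r2)])
    error = fmean([abs(x - t) for x, t in zip(r1, truth)])
    return repeat, error


# Hypothetical "true" stenoses and two unrounded impressions of each (%).
truth = [62, 68, 38, 47, 88, 12]
read_1 = [60, 70, 41, 45, 86, 10]
read_2 = [64, 66, 36, 49, 90, 14]

for step in (1, 5, 50):
    repeat, error = repeatability_and_error(truth, read_1, read_2, step)
    print(f"step {step:2d}%: mean repeat difference {repeat:.2f}, "
          f"mean error vs truth {error:.2f}")
```

With these numbers, snapping to 5% changes the mean difference between readings only modestly (from about 4.2 to 5.0 points), whereas snapping to 0%, 50%, or 100% makes the two readings agree perfectly while the mean error against the assumed true values rises from about 2.2 to 11.5 points.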
In summary, no consistent differences were observed among the four different techniques for measuring internal carotid artery stenosis in terms of observer variability in reporting. The reproducibility of measurements made by the visual impression of stenosis, or eyeballing, was similar to that obtained by the caliper techniques and, in the case of MRA, tended to be superior. Although the typical measurement error for each technique was approximately ±5%, all of the techniques studied gave rise to sizable disagreements for some vessels on repeated measurement. This has implications for both routine clinical practice and research, including method comparison studies such as comparisons between DSA and MRA, in which the differences due to variation in reporting will constitute a sizable proportion of the disagreement between the two methods.
Selected Abbreviations and Acronyms
DSA(s) = digital subtraction angiography, digital subtraction angiogram(s)
ECST = European Carotid Surgery Trial
MRA(s) = MR angiography, MR angiogram(s)
NASCET = North American Symptomatic Carotid Endarterectomy Trial
This study was supported by the Stroke Association, the Dunhill Medical Trust, the Mersey Regional Health Authority, and the University of Liverpool.
- Received August 31, 1995.
- Revision received November 6, 1995.
- Accepted November 7, 1995.
- Copyright © 1996 by American Heart Association
Rothwell PM, Gibson RJ, Slattery J, Warlow CP, for the European Carotid Surgery Trialists’ Collaborative Group. Prognostic value and reproducibility of measurements of carotid stenosis: a comparison of three methods on 1001 angiograms. Stroke. 1994;25:2440-2444.
Young GR, Humphrey PRD, Shaw MDM, Nixon TE, Smith ETS. Comparison of magnetic resonance angiography, duplex ultrasound, and digital subtraction angiography in assessment of extracranial internal carotid artery stenosis. J Neurol Neurosurg Psychiatry. 1994;57:1466-1478.
Rothwell PM, Gibson RJ, Slattery J, Sellar RJ, Warlow CP, for the European Carotid Surgery Trialists’ Collaborative Group. Equivalence of measurements of carotid stenosis: a comparison of three methods on 1001 angiograms. Stroke. 1994;25:2435-2439.
Eliasziw M, Smith RF, Singh N, Holdsworth DW, Fox AJ, Barnett HJM, for the North American Symptomatic Carotid Endarterectomy Trial (NASCET) Group. Further comments on the measurement of carotid stenosis from angiograms. Stroke. 1994;25:2445-2449.
Bladin CF, Alexandrov AV, Murphy J, Maggisano R, Norris JW. Carotid stenosis index: a new method of measuring internal carotid artery stenosis. Stroke. 1995;26:230-234.
Vanninen R, Manninen H, Koivisto K, Tulla H, Partanen K, Puranen M. Carotid stenosis by digital subtraction angiography: reproducibility of the European Carotid Surgery Trial and the North American Symptomatic Carotid Endarterectomy Trial measurement methods and visual interpretation. AJNR Am J Neuroradiol. 1994;15:1635-1641.
Milne ENC, Friedman PJ. The role and performance of minute focal spots in roentgenology and special reference to magnification. Crit Rev Radiol Sci. 1971;2:269-277.