Carotid Plaque Characterization by Duplex Scanning
Observer Error May Undermine Current Clinical Trials
Background and Purpose—Clinical studies currently in progress are using subjective methods to characterize plaque morphology from ultrasound images. However, there are few studies on the intraobserver and interobserver variability of these classifications. This study was designed to assess these variables.
Methods—Grading of plaque morphology from ultrasound images, stored both digitally and to hard copy, was performed by 2 classification schemes. Interobserver agreement was determined by 4 observers. Within-observer agreement was performed at intervals for up to 6 months. Accuracy of the 2 methods was determined by comparison with histology.
Results—Within- and between-observer agreement was moderate to good for full-color digital image analyses, with pooled κ values of κp=0.49±0.10 and κp=0.62±0.07 for the 2-category method and κp=0.53±0.06 and κp=0.52±0.05 for the 4-category method, respectively. Hard copy data analyses gave lower κ values. The more experienced observers produced higher within-observer agreements and higher correlation with histology.
Conclusions—Reproducible grading of ultrasound images is not consistently achievable among experienced observers, and within-observer agreement may vary with time. The current subjective ultrasound characterization of carotid plaque morphology used in clinical trials may be associated with unacceptable levels of reproducibility in some centers. Variability between observers may be reduced by using the simpler 2-category grading of plaque morphology to interrogate full-color digitally stored images. Observer agreement should be audited regularly.
The accepted criterion for surgical selection of patients with symptomatic carotid disease is a stenosis of >70% reduction in lumen diameter of the internal carotid artery (ICA).1 2 Current imaging of patients with ICA disease has focused on the accurate measurement of the severity of ICA stenosis. Increasingly, this has been achieved solely by noninvasive duplex ultrasound with color flow mapping.3 This modality also provides images of the artery wall and atherosclerotic plaque morphology, but the latter has not been used for selection of patients for surgery, although it is becoming increasingly recognized as important in the development of symptoms.4 5 6 7 8 For example, several authors have reported that plaques in the carotid and coronary arteries associated with large lipid pools or soft extracellular lipid are more prone to rupture and produce emboli and therefore cause symptoms.5 6 7 8
Ultrasound has been used to evaluate plaque morphology, and, more recently, multicenter trials investigating patients with cerebrovascular disease have included plaque characterization in their protocols.9 10 The Asymptomatic Carotid Surgery Trial used a subjective method for ultrasound plaque characterization based on the approach described by Gray-Weale et al.11 The Asymptomatic Carotid Stenosis and Risk of Stroke study10 used the Gray-Weale method with additional categories for calcified and ulcerated plaques, while the European Carotid Plaque Study Group commented on both plaque echogenicity and surface characteristics.12 In all these studies, grading of plaque morphology was largely based on the ultrasound gray scale appearance, which was assessed subjectively by visual inspection of ultrasound images.
Thus, a range of criteria and classification schemes for assessing plaque morphology are currently in use, but there is no consensus on protocol or methodology between trials.11 12 13 14 15 Subjective grading has been compared with histology; however, a wide range of agreements has been reported.11 13 14 15 16 17 18 Furthermore, there are few studies on reproducibility, which is very important when amalgamating data in multicenter trials.19 In this study we determined the accuracy and intraobserver and interobserver agreement of plaque characterization performed by subjective, visual inspection of ultrasound images as currently used in clinical trials. We also investigated the influence of image storage media on image interpretation and reproducibility issues.
Subjects and Methods
Sixty patients undergoing carotid endarterectomy were entered into this study (mean age, 65±8 years; 21 female, 39 male); 51 were operated on for symptomatic and 9 for asymptomatic severe stenosis of the ICA. The mean percent diameter stenoses for these 2 groups were 79% (range, 60% to 99%) for symptomatic patients and 81% (range, 70% to 90%) for asymptomatic patients. All patients underwent duplex scanning of the ICA on the day before carotid surgery. Duplex examination was performed with a 4- to 7-MHz linear array transducer (HDI3000, Advanced Technology Laboratories). A standardized imaging format was adopted that included the following: linear gray scale map, dynamic range of 60 dB, medium persistence level, high line density, and medium frame rate. Optimum image focusing was achieved by positioning image focal zones at the anterior and posterior walls of the artery. B-mode ultrasound images of the ICA were obtained in longitudinal section to demonstrate the maximum extent of the atherosclerotic plaque. Color and power Doppler images were also recorded to assist in delineation of the plaque border. This gave a total of 60 cases, for which the images were stored either as hard copy (n=27) or on hard disk (n=33) for subsequent analysis.
Ultrasound images were subjectively graded by 2 methods (Figure). The first method was according to the relative contribution of echogenic (high-intensity) and echolucent (low-intensity) material using the classification by Gray-Weale et al,11 as follows: type I, predominantly echolucent plaque with a thin echogenic cap; type II, substantially echolucent lesions with small areas of echogenicity; type III, predominantly echogenic lesions with small areas of echolucency; and type IV, uniformly echogenic lesions (equivalent to homogeneous).
The second subjective method classified the plaques as either homogeneous or heterogeneous. Homogeneous plaques had a relatively uniform texture and, compared with the adjacent adventitia, contained uniform echoes that were medium to high level. Heterogeneous plaque had mixed high-, medium-, and low-level echoes and contained at least one well-defined focal echolucent area. For this study, plaque echogenicity was compared with blood and adjacent adventitia. Low echoes were defined as those closely approaching that of blood, and medium to high echoes were those similar to or greater than adventitia or adjacent soft tissue. Plaques that were obscured because of acoustic shadowing were not included in this study.
Four observers examined the hard copy and digital images independently and were blinded to patient identification. The observers were ultrasound technologists with scanning experience ranging from 4 to 15 years and were listed according to their experience. All observers underwent an extensive training session before analysis of the images for this study to standardize the concepts of echogenicity and echolucency. Images were reexamined after a 1-month interval to determine intraobserver variability of the classification methods. Two observers reanalyzed images at 1, 5, and 6 months to determine short-term and long-term variability. All images were assessed off-line either on the same digital analysis system (8 bit, 256 gray scale levels) or on hard copy (Sony, Mavigraph color printer). The ambient lighting, magnification, and gain level of the image screen were maintained constant throughout the analysis.
The excised carotid specimens were stored in formalin, and histological examination was performed within 1 week of surgery (range, 1 to 7 days). On histological assessment, plaques in the ICA were graded as homogeneous or heterogeneous and divided into 4 histological categories equivalent to the Gray-Weale classification. Statistical analysis used the κ statistic, which measures agreement in multicategorical data, together with simple statistics for calculation of percent agreement and accuracy. κ values of <0.20 indicated poor agreement, 0.21 to 0.40 fair agreement, 0.41 to 0.60 moderate agreement, 0.61 to 0.80 good agreement, and 0.81 to 1.0 very good agreement. Pooled κ (κp) was calculated as described by Fleiss20 and represented the overall interobserver and intraobserver κ for each modality and classification scheme.
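As a minimal illustration of the κ statistic and the agreement bands described above, the following Python sketch computes Cohen's κ for two sets of grades and maps the result to a band. The function names, the example grades, and the treatment of band boundaries (values up to and including each upper limit fall in that band) are assumptions made for illustration; they are not taken from the study's analysis.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equal-length lists of categorical grades."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed proportion of images given the same grade both times.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from the marginal grade frequencies.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


def agreement_band(kappa):
    """Map kappa to the verbal bands used in the text (boundaries assumed)."""
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "very good"


# Hypothetical 2-category grades for 10 plaques, read twice.
first_reading = ["hom", "hom", "het", "het", "het", "hom", "het", "het", "hom", "het"]
second_reading = ["hom", "het", "het", "het", "hom", "hom", "het", "het", "hom", "het"]

kappa = cohen_kappa(first_reading, second_reading)
print(round(kappa, 2), agreement_band(kappa))
```

In this made-up example the two readings agree on 8 of 10 plaques (80%), yet κ is about 0.58 because roughly half of that agreement would be expected by chance alone.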
Results
The within-observer agreement for grading carotid plaques from hard copy images using only the B-mode ultrasound information was 61±17% (κ range, 0.08 to 0.66) for the 2-category classification, and the κp value indicated moderate agreement overall (Table 1). The addition of color flow information improved the overall agreement to good (κp=0.65±0.08). The 4-category classification showed poor (κp=0.15±0.06) or fair (κp=0.34±0.06) agreement when B-mode or combined B-mode and color images were analyzed, respectively. The range of agreement values for the 4 observers was wide, and agreement tended to improve with the more experienced observers.
The intraobserver agreement for grading plaques from digital images showed increased κp values, indicating moderate agreement for B-mode alone and with color for both the 2-category (κp=0.45±0.08 and 0.49±0.10) and 4-category (κp=0.48±0.06 and 0.53±0.06) classifications (Table 1).
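The pooled values above combine several individual κ estimates into one. The text does not spell out the formula, so the sketch below assumes the inverse-variance weighted average commonly attributed to Fleiss, in which each κ is weighted by the reciprocal of its squared standard error. The function name and the input values are hypothetical and are not data from this study.

```python
import math


def pooled_kappa(kappas, standard_errors):
    """Inverse-variance weighted mean of kappa estimates (assumed form).

    Returns the pooled kappa and its standard error.
    """
    if len(kappas) != len(standard_errors) or not kappas:
        raise ValueError("need one standard error per kappa")
    # Precise estimates (small standard error) receive larger weights.
    weights = [1.0 / (se ** 2) for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * k for w, k in zip(weights, kappas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical kappa values and standard errors for two observers.
kp, kp_se = pooled_kappa([0.40, 0.60], [0.10, 0.20])
print(f"pooled kappa = {kp:.2f} +/- {kp_se:.2f}")
```

With these inputs the more precise estimate (0.40) dominates, so the pooled value sits closer to it than a simple mean would.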
Plaque morphology grading was repeated by observers 1 and 2 over a range of time periods (1 month, 5 months, and 6 months) for full-color images and demonstrated the following values for κ, standard error, and agreement for the 2-category classification: observer 1: 0.89±0.11 (96%), 0.65±0.18 (88%), 0.50±0.21 (85%); observer 2: 0.49±0.18 (77%), 0.42±0.16 (69%), 0.29±0.15 (62%).
Although the agreements were higher for observer 1, both observers showed a range of agreements over the time period studied.
Interobserver agreement between the 4 observers for the hard copy data is summarized in Table 2. For B-mode images alone, κp values indicated moderate agreement (κp=0.52±0.07) for the 2-category approach, and, surprisingly, agreements were reduced to fair after the addition of color information (κp=0.28±0.08). Values for the 4-category system showed fair agreements with B-mode alone or with color (κp=0.28±0.05 and 0.24±0.05).
Interobserver values for the digital images showed good to moderate agreement for the combined B-mode and color images according to both the 2-category (κp=0.62±0.07) and 4-category (κp=0.52±0.05) classification schemes (Table 2). This was converse to the agreements demonstrated for the hard copy data, in which color increased variability between observers. The highest overall agreement was observed with the 2-category method for grading color flow mapping images displayed in digital format.
The correlation between histology and grading of plaque morphology from combined B-mode color flow images was poor to fair according to κ for the 4 observers (Table 3). The overall accuracy of the Gray-Weale approach compared with histology was 30±10%, whereas the 2-category classification showed improved accuracy of 72±12%.
Discussion
Measurement of ICA stenosis by duplex ultrasound has come under close scrutiny, and although it involves numerical measurement of velocity, sources of variability and protocol standardization remain contentious issues.21 Grading of plaque morphology by ultrasound is based on subjective evaluation and therefore is susceptible to variability in observer interpretation. Despite this, reports in the literature suggest that subjective grading of ultrasound plaques can be achieved with good reproducibility and high correlation with histology.19
The present study was conducted with standardized protocols for image acquisition and standardized hard copy and digital image displays; however, a reproducibility study did not fully support previous findings. Intraobserver agreement for the 2-category classification achieved only moderate agreement for digital B-mode images. The addition of color flow information improved delineation of the plaque border and increased agreement for hard copy data but not for digital images. The 4-category classification showed moderate agreement for digital images but poor to fair agreement for hard copy data. The reduced agreement for the latter may be due to the reduced contrast present in hard copy images. This would have the greatest effect on the 4-category classification, which required more subtle interpretation of the relative contribution of echogenic and echolucent material.
Intraobserver agreement was not consistent when studied over a 6-month period, indicating that a single evaluation of agreement is not necessarily representative of the individual’s reproducibility in grading images. Although all observers in this study were experienced in ultrasound imaging, only 1 observer achieved consistently good agreement levels after a training period of 3 months. A recent study demonstrated similar variability in within-observer agreement for grading plaque morphology but higher values for between-observer agreement.22
Interobserver variability showed a wide range of agreement values, with good agreement being achieved for color flow digital images graded according to the 2-category classification. The somewhat puzzling finding of lower intraobserver agreements compared with interobserver agreements may be explained by the variation in intraobserver agreement with time. Correlation with histology was disappointing, with almost no correlation in some instances; interestingly, higher values of accuracy were noted for the more experienced observers.
These findings suggest that the previously reported high values of agreement within and between observers are not easily reproducible. This may be due to one or more of several factors: the subjective nature of the classification process; the number of images that are routinely studied; the experience of the observers; the plethora of classification schemes; the variability in image storage and display media used; variations in image quality; the statistics used; and the distribution of plaque types within each study. The influence of such factors is illustrated by the improvements in agreement achieved by using the simpler 2-category classification and also by interrogating full-color digital data compared with hard copy images.
Comparison of results between studies is limited by the different statistical approaches used. For example, simple percentage values may suggest a good test, while more detailed κ analysis will reveal inadequacies. In a study on intravascular ultrasound images of coronary arteries, a high overall agreement of 95% was reported; however, κ values indicated good ability for identifying hard (κ=0.67) and soft (κ=0.61) lesions but an inability to detect lipid.23
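The gap between percent agreement and κ can be shown with a small worked example. The sketch below uses a hypothetical 2×2 table of counts in which one grade is rare; the counts and the function name are illustrative assumptions and are not data from any of the cited studies.

```python
def agreement_and_kappa(table):
    """Percent agreement and Cohen's kappa from a square table of counts.

    table[i][j] is the number of images given grade i on one reading
    and grade j on the other.
    """
    size = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: proportion of counts on the diagonal.
    observed = sum(table[i][i] for i in range(size)) / n
    # Chance agreement from the row and column totals.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(size)) for j in range(size)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical counts: 100 images, one grade assigned only 5 times per reading,
# and the two readings never agree on which images belong to the rare grade.
counts = [
    [90, 5],
    [5, 0],
]
observed, kappa = agreement_and_kappa(counts)
print(f"percent agreement = {observed:.0%}, kappa = {kappa:.2f}")
```

Here the readings agree on 90% of images, but chance alone would produce 90.5% agreement given how rarely the second grade is used, so κ is slightly negative. This is the kind of inadequacy that a simple percentage can hide.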
The implications of this study are that results of plaque characterization in clinical studies and multicenter trials could be seriously flawed. Improvements to clinical trials could be achieved by standardizing image parameters, using test objects with a range of echogenicity and echolucency to eliminate individual scanner variations, interrogating full-color digitally stored images, and using the simpler 2-category classification scheme to grade plaque morphology. Observers should not participate in multicenter trials using subjective analysis of ultrasound images until they have achieved high levels of intraobserver agreement that are verified throughout the trial period.
Acknowledgments
We are grateful to the Stroke Association and Impra for funding this research.
- Received August 5, 1998.
- Revision received September 28, 1998.
- Accepted October 12, 1998.
- Copyright © 1999 by American Heart Association
References
Fuster V, Stein B, Ambrose JA, Badimon L, Badimon JJ, Chesebro JH. Atherosclerotic plaque rupture and thrombosis: evolving concepts. Circulation. 1990;82(suppl II):II-47–II-49.
Falk E. Why do plaques rupture? Circulation. 1992;86(suppl III):III-30–III-42.
Carr S, Farb A, Pearce WH, Virmani R, Yao JST. Atherosclerotic plaque rupture in symptomatic carotid artery stenosis. J Vasc Surg. 1996;23:755–766.
Sitzer M, Muller W, Siebler M, Hort W, Kniemeyer HW, Janke L, Steinmetz H. Plaque ulceration and lumen thrombus are the main sources of cerebral microemboli in high-grade internal carotid artery stenosis. Stroke. 1995;26:1231–1233.
Halliday AW, Thomas D, Mansfield A. The Asymptomatic Carotid Surgery Trial (ACST): rationale and design: Steering Committee. Eur J Vasc Surg. 1994;6:703–710.
Nicolaides AN. Asymptomatic Carotid Stenosis and Risk of Stroke: identification of a high risk group (ACSRS): a natural history study. Int Angiol. 1995;14:21–23.
Gray-Weale AC, Graham JC, Burnett JR, Byrne K, Lusby RJ. Carotid artery atheroma: comparison of preoperative B-mode ultrasound appearance with carotid endarterectomy specimen pathology. J Cardiovasc Surg. 1988;2:676–681.
European Carotid Plaque Study Group. Carotid artery plaque composition: relationship to clinical presentation and ultrasound B-mode imaging. J Vasc Endovasc Surg. 1995;10:23–30.
Hatsukami TS, Thackray BD, Primozich JF, Ferguson MS, Burns DH, Beach KW, Detmer PR, Alpers C, Gordon D, Strandness DE. Echolucent regions in the carotid artery: preliminary analysis comparing three-dimensional histologic reconstructions to sonographic findings. Ultrasound Med Biol. 1994;20:743–749.
Bock RW, Lusby RJ. Carotid plaque morphology and interpretation of the echolucent lesion. In: Labs KH, Jager KA, FitzGerald DE, Woodcock JP, Neuerberg-Heusler D, eds. Diagnostic Vascular Ultrasound. London, UK: Edward Arnold; 1992;21:254–263.
Fleiss JL. Statistical Methods for Rates and Proportions. 2nd ed. New York, NY: John Wiley & Sons Inc; 1981.
Eliasziw M, Rankin RN, Fox AJ, Haynes RB, Barnett HJM, for the North American Symptomatic Carotid Endarterectomy Trial (NASCET) Group. Accuracy and prognostic consequences of ultrasonography in identifying severe carotid artery stenosis. Stroke. 1995;26:1747–1752.
Joakimsen O, Bonaa KH, Stensland-Bugge E. Reproducibility of ultrasound assessment of carotid plaque occurrence, thickness, and morphology: the Tromso Study. Stroke. 1998;28:2201–2207.