How Good Is Intercenter Agreement in the Identification of Embolic Signals in Carotid Artery Disease?
Background and Purpose There has been concern regarding the reproducibility of the detection of embolic signals, particularly in patients with carotid artery stenosis in whom the signals are of low intensity. No published studies have examined intercenter agreement in reporting specific embolic signals or the factors responsible for any lack of agreement. We examined reproducibility between two centers in which widely differing proportions of embolic signals have previously been reported in patients with carotid artery stenosis.
Methods Recordings from the middle cerebral artery of eight patients with ipsilateral carotid artery stenosis in whom embolic signals had been detected during a previous study were independently examined by three experienced observers in one center and by one experienced observer in another center. We calculated agreement within and between centers by estimating the probability that one observer would identify a specific embolic signal if other observers had identified it (a probability of 1 indicates complete agreement). The influence of different characteristics of the embolic signal on the probability of its detection as an embolic signal was determined.
Results A high level of agreement in the identification of specific embolic signals was found. This was similar between all observers (.90), between the three observers in one center (.89), and between observers in the two different centers (.94). The probability of detection was independently related to the relative intensity of the embolic signal (P<.0001). It was less strongly, although still significantly, independently related to the position of the embolic signal in the cardiac cycle (P=.02), with signals in systole being more reliably detected. There was no independent relationship between the probability of detection and either the duration of the embolic signal or the velocity at the maximum intensity increase. The use of threshold intensity as a criterion for embolic signal detection increased interobserver agreement but reduced the sensitivity in detecting signals.
Conclusions The high level of interobserver agreement suggests that the technique is sufficiently reproducible for clinical use.
The detection of asymptomatic embolic signals with the use of Doppler ultrasound has a number of potential applications in the management of patients with or at risk of cerebrovascular disease. It may help to select at-risk groups for appropriate pharmacological or surgical therapy, allow monitoring of the effect of pharmacological interventions, aid in perioperative monitoring during surgical and neuroradiological procedures, and allow new insights into the pathogenesis of cerebral embolization. A major concern has been that preliminary studies have reported greatly differing proportions of patients with embolic signals and different frequencies of embolic signals per hour in signal-positive patients. These differences are particularly marked in recordings from patients with carotid stenosis compared with patients with prosthetic cardiac valves. Embolic signals in the former group are of lower intensity and shorter duration.1 Initial studies in patients with symptomatic carotid stenosis have reported embolic signals in 20% to 95% of patients when recordings are from the ipsilateral middle cerebral artery.2 3 4 5 6 7 Before this technique can be used clinically, it is vital that the reasons for these differences be elucidated.
A number of factors may account for these differences. These include differences in patient populations, different criteria used to identify embolic signals, and different equipment characteristics. In patients with carotid stenosis, factors that correlate with the detection of embolic signals include the degree of stenosis2 8 and time since last symptoms.6 8 9 In addition, different treatment regimens were used in different studies; for example, in some studies heparin was used, whereas in others aspirin was used as primary treatment. Equipment characteristics may also be important. The relative intensity increase associated with an embolic signal produced by an embolus of known material will depend on gain and sample volume settings and is optimal with a low gain and small sample volume.10 Other important factors include transducer frequency and vessel diameter. Perhaps more importantly, embolic signals may not appear on the spectral display as a result of an inadequate fast Fourier transform (FFT) time frame overlap.11 This is a problem if computer processing power is insufficient and may be a problem with all but the most recent generation of computer-based transcranial Doppler machines.
Poor interobserver reproducibility may also play a major part. Small interobserver and intraobserver reproducibility studies performed within single centers have shown good reproducibility.4 11 However, a recent preliminary exercise between observers from a number of centers suggested disappointing results.12 The experience of these observers varied markedly, which may have contributed to the poor performance. In patients with prosthetic cardiac valves and embolic signals, a reasonable correlation between the number of embolic signals detected by different observers has been reported.13 However, the embolic signals in these patients are much easier to detect because they are more intense and of longer duration. Furthermore, this particular study counted the total number of embolic signals rather than identifying individual signals. This makes it impossible to determine whether the same observers detected the same embolic signals. There has been no detailed study published that analyzed agreement between different centers in the detection of specific embolic signals.
For these reasons, we have evaluated reproducibility in the detection of embolic signals both between and within centers and correlated it with various characteristics of the embolic signal to determine which particular type of embolic signals leads to poor interobserver reproducibility. We chose only recordings from patients with carotid artery disease because this is the area in which greatest variability has been reported.
Subjects and Methods
All recordings were made from the middle cerebral artery ipsilateral to a symptomatic carotid stenosis and were collected from eight patients in whom embolic signals had been detected by recordings in center 1 from a previous study of 38 patients.8 Permission from local hospital ethics committees had been granted for this project. Recordings were made on an EME TC2000 transcranial Doppler machine with a 2-MHz transducer, a sample volume of 10 mm, and a depth of 45 to 52 mm. The Doppler audio signals, which had been recorded onto digital audiotape, were analyzed by one observer from center 1 and three observers from center 2. The audio signal was recorded before any processing, including FFT. Analysis by center 1 used an EME TC2000 system with an FFT time window overlap of approximately 57%. Analysis by center 2 used an Embotec Doppler analysis system with an FFT overlap of 75%. Each observer analyzed the tapes blinded to the results of the other observers and recorded the exact time of any embolic event. All observers were experienced and applied standard criteria that had been used for recent studies of embolic signal detection in their respective laboratories,6 8 including a short-duration, high-intensity increase, predominantly unidirectional in the direction of flow with a characteristic sound. However, no strict threshold intensity or duration criteria were used to identify embolic signals. Artifact was identified as a bidirectional intensity increase, usually at low frequencies.
Each embolic signal was characterized by a number of measurements obtained from the FFT spectra. Relative intensity increase in decibels was calculated as 10 log (maximum amplitude increase of embolic signal/amplitude of background signal), in which the background signal was measured from the average of three FFTs at the same point in the preceding or following cardiac cycle, at the same velocity. Duration was estimated from the number of time frames for which the relative intensity increase exceeded 4 dB; since each time frame lasted 5 milliseconds, the temporal resolution was 5 milliseconds. Velocity was defined as the velocity at the peak intensity increase due to the embolic signal. We calculated position in the cardiac cycle by determining the relative position of the peak intensity increase in the cardiac cycle; the beginning of the upstroke of systole is indicated by 0 and the end of diastole is indicated by 1.0.
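The three derived measurements described above can be sketched in a few lines. This is a minimal illustration, not the analysis software used in either center: the function and parameter names are hypothetical, the inputs are assumed to be whatever quantity the text calls "amplitude", and the cardiac cycle position assumes the onsets of two successive systolic upstrokes are already known.

```python
import math

FRAME_MS = 5        # each FFT time frame lasted 5 ms (from the text)
DURATION_DB = 4.0   # frames above this relative increase count toward duration


def relative_intensity_db(signal_amplitude, background_amplitudes):
    """10 * log10(signal / background), background averaged over several FFTs."""
    background = sum(background_amplitudes) / len(background_amplitudes)
    return 10.0 * math.log10(signal_amplitude / background)


def duration_ms(frame_increases_db):
    """Number of frames whose relative increase exceeds 4 dB, times 5 ms."""
    return FRAME_MS * sum(1 for db in frame_increases_db if db > DURATION_DB)


def cardiac_position(peak_time, systole_onset, next_systole_onset):
    """0 = beginning of the systolic upstroke, 1.0 = end of diastole."""
    return (peak_time - systole_onset) / (next_systole_onset - systole_onset)
```

Because duration is counted in whole frames, it can only take values that are multiples of 5 milliseconds, which is the temporal resolution stated above.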
Observers recorded only signals they believed to be embolic signals, so the data contain no observations in which all observers recorded no abnormality. Cohen's κ would therefore not provide meaningful results, because the number of observations in which both observers would not detect an abnormality is unknown. Cohen's κ could be applied if the tape were divided into a number of short time periods of equal length and if for each of these the detection or nondetection of an embolic signal was recorded. However, the κ would then depend on the duration of the time periods: the shorter the time period, the greater the number of periods for which two observers would agree there is no embolic signal. The results could therefore be manipulated by altering the duration of the time interval. We instead used an alternative method that is independent of the number of observations in which both observers would not detect an abnormality. This is an extension of the proportion of specific agreement we previously used to examine interrater performance between two observers11 14 to take into account the fact that there are more than two observers. We estimated the probability that a second observer will record an abnormality if the first observer records such an abnormality.15
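The argument above can be made concrete with a short sketch. The paper does not spell out its estimator, so the multi-observer formula below is an assumption: one common extension of the proportion of specific agreement, in which a signal marked by k of n observers contributes k(k − 1) agreeing ordered pairs out of k(n − 1) possible pairs. The function names and the example counts are hypothetical.

```python
def specific_agreement(detection_counts, n_observers):
    """Probability that a second observer also marked a signal the first marked.

    detection_counts holds, for each signal marked by at least one observer,
    how many of the n_observers marked it. Signals marked by nobody never
    appear, so no count of joint negatives is needed.
    """
    if any(k < 1 or k > n_observers for k in detection_counts):
        raise ValueError("each count must be between 1 and n_observers")
    agreeing_pairs = sum(k * (k - 1) for k in detection_counts)
    possible_pairs = sum(k * (n_observers - 1) for k in detection_counts)
    return agreeing_pairs / possible_pairs


def cohen_kappa(both, only_a, only_b, neither):
    """Cohen's kappa for two observers from a 2x2 table of time periods."""
    n = both + only_a + only_b + neither
    observed = (both + neither) / n
    expected = ((both + only_a) * (both + only_b)
                + (neither + only_b) * (neither + only_a)) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical tape: 8 signals marked by both observers, 1 by each alone.
# Only the number of "empty" time periods changes with the period length.
kappa_long_periods = cohen_kappa(8, 1, 1, neither=10)
kappa_short_periods = cohen_kappa(8, 1, 1, neither=1000)
agreement = specific_agreement([2] * 8 + [1] * 2, n_observers=2)
```

With two observers the formula reduces to 2a/(2a + b + c), where a is the number of signals marked by both and b and c are the numbers marked by one observer only. In this example κ rises as the periods are shortened, while the specific agreement is unchanged because it never uses the count of empty periods.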
The influence of different characteristics of the embolic signal on the probability of detection of an embolic signal was then calculated. We accomplished this by classifying observations according to whether they were detected by 1, 2, 3, or 4 observers. The independent variables examined were maximum relative intensity increase of embolic signal, duration of high-intensity embolic signal (>4 dB), velocity of peak intensity increase, and position in the cardiac cycle. Ordered logistic regression was then performed to determine which variables predicted detection.
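For readers unfamiliar with ordered logistic regression, the following sketch shows the form of a proportional-odds model for an outcome with four ordered categories (detected by 1, 2, 3, or 4 observers). It illustrates the model only and does not fit it; the coefficient and cutpoint values are hypothetical and are not estimates from this study.

```python
import math


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def category_probabilities(x, coefficients, cutpoints):
    """Proportional-odds model: P(Y <= j) = sigmoid(cutpoint_j - beta . x).

    Returns one probability per ordered category, here the probability that
    a signal is detected by exactly 1, 2, 3, or 4 observers. cutpoints must
    be increasing and has one entry fewer than the number of categories.
    """
    eta = sum(b * v for b, v in zip(coefficients, x))
    cumulative = [_sigmoid(c - eta) for c in cutpoints] + [1.0]
    return [cumulative[0]] + [
        cumulative[j] - cumulative[j - 1] for j in range(1, len(cumulative))
    ]


# Hypothetical values: a single predictor (relative intensity increase in dB)
# with a positive coefficient, so stronger signals shift toward 4 observers.
BETA = [0.4]
CUTPOINTS = [-1.0, 0.5, 2.0]
weak_signal = category_probabilities([3.0], BETA, CUTPOINTS)
strong_signal = category_probabilities([9.0], BETA, CUTPOINTS)
```

A positive coefficient raises the probability of the highest category as the predictor increases; a negative coefficient, as would be expected for position in the cardiac cycle given the inverse relationship reported, does the opposite.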
Our results indicated good agreement in the detection of embolic signals between all observers, as shown in the Table.
When we considered all four observers as one sample of observers, there was a high probability that if one observer detected an embolic signal, the second observer would also detect the same signal as representing an embolic signal (.90). A similar degree of agreement was found both between observers within an individual center (center 2) and between observers in the two different centers. In the analysis of results from center 2 only, the probability that if one observer detected a signal another observer would also detect a signal was .89. When we compared analysis by center 1 and analysis by center 2, the probability that if center 1 detected an embolic signal, observers in center 2 would also detect an embolic signal was .94.
Factors Determining Probability of Detection as an Embolic Signal
When each parameter was entered into the regression independently, relative intensity increase, duration, and position in the cardiac cycle all predicted the probability of observers identifying an embolic signal. Values of R² and probability were as follows: relative intensity increase, R²=.22, P<.0001; duration, R²=.014, P<.0001; position in the cardiac cycle, R²=.02, P<.02; velocity, R²=0, P=.9. However, the different parameters were highly correlated, particularly intensity and duration (r=.66, P<.0001). When multiple backward ordered logistic regression was performed, only relative intensity of embolic signal and its position in the cardiac cycle independently determined the probability of the observers detecting an embolic signal. The relative intensity increase was the most important factor, with interobserver reproducibility worse for low-intensity signals (P<.0001). Position in the cardiac cycle was inversely related to probability of detection (P=.02); ie, embolic signals occurring later in the cardiac cycle were less likely to be detected.
In view of the dependence of probability of detection on intensity, an analysis was performed in which a threshold intensity was used as a criterion for the identification of an embolic signal. The probability of more observers detecting an embolic signal progressively improved with increasing threshold intensity: >4 dB, .936; >5 dB, .962; >6 dB, .976; and >7 dB, .995. Correspondingly, sensitivity fell as the threshold intensity rose. Assuming that an embolic signal detected by any observer represented an embolic signal, sensitivity was as follows: no threshold cutoff, 100%; >4 dB, 92%; >5 dB, 79%; >6 dB, 68%; and >7 dB, 49%.
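The trade-off between agreement and sensitivity can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the threshold is assumed to be applied by discarding signals whose relative intensity increase does not exceed it, agreement is computed with the same assumed multi-observer extension of specific agreement, and the signal list is hypothetical data invented for the example.

```python
def specific_agreement(detection_counts, n_observers):
    """Agreeing ordered observer pairs divided by possible pairs (assumed form)."""
    agreeing = sum(k * (k - 1) for k in detection_counts)
    possible = sum(k * (n_observers - 1) for k in detection_counts)
    return agreeing / possible if possible else None


def threshold_analysis(signals, n_observers, thresholds_db):
    """Return (threshold, agreement, sensitivity) for each intensity threshold.

    signals: (relative_intensity_db, number_of_observers_detecting) pairs.
    Sensitivity treats every signal marked by any observer as a true signal,
    as assumed in the text. A threshold of None means no cutoff.
    """
    rows = []
    for threshold in thresholds_db:
        kept = [k for intensity, k in signals
                if threshold is None or intensity > threshold]
        sensitivity = len(kept) / len(signals)
        rows.append((threshold, specific_agreement(kept, n_observers), sensitivity))
    return rows


# Hypothetical signals: weaker ones are marked by fewer of the four observers.
HYPOTHETICAL_SIGNALS = [(3.0, 1), (4.5, 2), (5.5, 3), (6.5, 4), (8.0, 4), (9.0, 4)]
rows = threshold_analysis(HYPOTHETICAL_SIGNALS, 4, [None, 4, 5, 6, 7])
```

On data of this shape, raising the threshold removes the weak, inconsistently detected signals first, so agreement rises while sensitivity falls, which is the pattern reported above.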
Our results show good agreement in the analysis of embolic signals both between observers within one center and between two centers. It is interesting that the reproducibility between centers was as good as that within center 2 in our present evaluation. These results are striking, given the wide variation found when we examined recordings from patients with symptomatic carotid artery disease in our two laboratories; center 1 found embolic signals in only 24% of subjects, whereas center 2 found them in 82% of subjects. This suggests that, at least in our population, factors other than observer variability account for this difference. All observers were experienced and had previously performed intracenter reproducibility studies with good results.4 11 The fact that there was good agreement even though observers did not use rigidly defined criteria indicates that experienced observers use similar criteria in practice.
In this study we tried to reduce other technical and equipment factors that might account for any difference. Although separate Doppler analysis systems were used in the two centers, both used a reasonable degree of FFT time frame overlap. It has previously been shown that 57% overlap results in no embolic signals being missed because they arrive between two time windows, and the minimum used in this study was 57%.
Nevertheless, despite the good correlation found in our study, the different observers disagreed on a number of embolic signals. It is important that such disagreements be reduced if large multicenter studies that evaluate the technique are established. We found that by far the most important factor in determining the probability that observers would detect an embolic signal was the relative intensity increase of the embolic signal. This suggests that the use of a higher threshold as a criterion for detecting an embolic signal may lead to greater reproducibility. We tested this hypothesis and found that the probability of more observers detecting an embolic signal progressively improved as we increased the threshold intensity from 4 to 7 dB. However, this will reduce sensitivity, and only half of all embolic signals were detected with a threshold intensity of 7 dB. It is likely that different threshold intensities will be required when recordings are made under different conditions. For example, in patients in whom embolic signals are frequent, such as those undergoing cardiopulmonary bypass or with metallic prosthetic cardiac valves, a slightly lower sensitivity is acceptable, and the use of a higher threshold of perhaps 5 dB will increase specificity. In contrast, under conditions in which embolic signals are less frequent, such as asymptomatic carotid stenosis, a threshold of 3 or 4 dB may be necessary to allow adequate sensitivity. When our data are compared with those obtained with the use of other systems, it is important to note that the relative intensity increase will be dependent on the method of measurement, specifically whether the peak intensity or area under the curve is measured and whether the background intensity is measured as the mean over all velocities (or frequencies) or only at the velocity of the embolic signal peak intensity increase, as in our study.
The decibel measurements for the embolic signals in this study are markedly higher with the use of certain commercial transcranial Doppler software (eg, EME Pioneer and DWL MultidopX), and therefore the corresponding threshold intensities used would be higher. Individual researchers need to perform a similar validation analysis to determine an appropriate threshold intensity for their system.
It is commonly observed that embolic signals with an intensity increase that is maximal at a low velocity are more difficult to distinguish from artifact, which usually produces a bidirectional intensity increase that is also maximal at low velocities. However, we found no correlation between the probability of detecting an embolic signal and the velocity of the peak intensity increase. In contrast, we did find an inverse correlation with position in the cardiac cycle. Therefore, embolic signals that occurred late in diastole were more likely to be detected variably, although compared with relative intensity increase this was a weak effect. This may reflect the common observation that there is often more Doppler “speckle” at this point in the cardiac cycle.
In conclusion, these results suggest that with current criteria and modern equipment, good reproducibility can be obtained between observers in different centers for patients with carotid stenosis. This is encouraging and suggests that the technique is sufficiently reproducible to allow multicenter studies to examine the prognostic implications of asymptomatic embolic signals and the effect of treatments on their frequency. However, individual centers need to perform similar studies to validate their interpretation of the criteria for embolic signal detection.
Note: The authors are willing to supply copies of the original digital audiotape and a copy of results showing the level of agreement for each embolic signal if other centers wish to perform a similar validation study.
This study was supported in part by a British Heart Foundation project grant (Dr Markus) and a Deutsche Forschungsgemeinschaft grant (project No. SI370/1-4) (M.S.). We are grateful to technicians Frahm and Hache for reviewing the tapes.
- Received January 8, 1996.
- Revision received March 11, 1996.
- Accepted March 11, 1996.
- Copyright © 1996 by American Heart Association
Grosset DG, Georgiadis D, Kelman AW, Lees KR. Quantification of ultrasound emboli signals in patients with cardiac and carotid disease. Stroke. 1993;24:1922-1924.
Babikian VL, Hyde C, Pochay V, Winter MR. Clinical correlates of high-intensity transient signals detected on transcranial Doppler sonography in patients with cerebrovascular disease. Stroke. 1994;25:1570-1573.
Ries S, Schminke U, Daffertshofer M, Hennerici M. Emboli detection by TCD in patients with cerebral ischaemia. Cerebrovasc Dis. 1994;4(suppl 3):22. Abstract.
Siebler M, Kleinschmidt A, Sitzer M, Steinmetz H, Freund HJ. Cerebral microembolism in symptomatic and asymptomatic high-grade internal carotid artery stenosis. Neurology. 1994;44:615-618.
Valton L, Larrue V, Arrue P, Kurczewski A, Geraud G, Bes A. Asymptomatic cerebral embolic signals in patients with carotid stenosis: correlation with the appearance of plaque ulceration on angiography. Stroke. 1995;26:813-815.
Markus HS, Thomson N, Brown MM. Asymptomatic cerebral embolic signals in symptomatic and asymptomatic carotid artery disease. Brain. 1995;118:1005-1011.
Van Zuilen EV, Mauser HW, van Gijn J, Ackerstaff RGA. The relationship between cerebral microemboli and symptomatic cerebral ischaemia: a study of transcranial Doppler monitoring. Cerebrovasc Dis. 1994;4(suppl 3):20. Abstract.
Droste DW, Markus HS, Brown MM. The effect of alterations in ultrasound power, gain and sample volume on the appearance of emboli studied in a transcranial Doppler model. Cerebrovasc Dis. 1994;4:152-156.
Microembolism Research Group Consensus Meeting. 8th International Symposium on Cerebral Haemodynamics; September 27, 1994; Munster, Germany.
Georgiadis D, Kaps M, Siebler M, Hill M, Konig M, Berg J, Kahl M, Zunker P, Diehl B, Ringlestein EB. Variability of Doppler microembolic signal counts in patients with prosthetic cardiac valves. Stroke. 1995;26:439-443.
Fleiss JL. Statistical Methods for Rates and Proportions. 2nd ed. New York, NY: John Wiley & Sons, Inc; 1981:212-214.
Bland JM, Altman DG. Statistical Approaches to Medical Management. Oxford, UK: Oxford University Press. In press.