Intercenter Agreement in Reading Doppler Embolic Signals
A Multicenter International Study
Background and Purpose: Different frequencies of asymptomatic Doppler embolic signals have been reported in different studies. There has been concern that differing criteria for identifying these signals may account for some of this variation. A previous reproducibility study between two centers found good agreement, but no study among a large number of centers has been performed. We performed an international reproducibility study among nine centers, each of which had recently published studies of embolic signal detection in peer-reviewed journals.
Methods: Each center performed blinded analysis of a taped audio Doppler signal composed of transcranial Doppler middle cerebral artery recordings from 6 patients with symptomatic carotid artery stenosis. The exact time of any embolic signal was recorded. Six centers also measured the intensity increase of any embolic signals detected. Interobserver agreement was determined by a method based on the proportion of specific agreement.
Results: Eight centers reported between 39 and 62 signals, but one center reported 142 embolic signals. The probability of agreement between observers was .678, which rose to .791 when the data from the highest reporting center were excluded. Introducing a decibel threshold resulted in a significant increase in the probability of agreement; a decibel threshold of >7 dB resulted in a probability of agreement of .902. Intensity measurements made by different centers were usually highly correlated, but this was not always the case, and 3 of the 15 correlations were not significant. The absolute values of the intensities measured varied between centers by as much as 40%.
Conclusions: Although most centers report similar numbers of embolic signals, some use less specific criteria and report more events. The use of a decibel threshold improves reproducibility. However, intensity thresholds developed by one center cannot be transferred to another center without validation; differing methods of measurement are being used, and these result in different intensity values for the same embolic signals, even when the same equipment is used.
Asymptomatic embolic signals detected using Doppler ultrasound are frequent in patients with carotid artery disease and may represent markers of disease activity, allowing the selection of a subgroup of patients at high risk of stroke. These signals correlate with known markers of increased stroke risk: they are more frequent in patients with symptomatic than with asymptomatic stenosis,1 2 and in those with plaque ulceration detected histologically or on angiography.3 4 A small study has suggested that these signals may allow the identification of patients with increased stroke risk.5 However, a major concern has been the different numbers of embolic signals reported by different groups in patients with carotid artery stenosis. Although differences in patient populations, such as differing degrees of stenosis, differing times from symptoms, and different treatment regimens, may account for some of this variation, there has been concern that different centers may use different criteria in identifying embolic signals. A recent study reported excellent agreement between observers in two centers.6 However, agreement among a larger number of unselected centers may not be as good. Therefore, we performed an international interobserver agreement study among nine centers, each experienced in embolic signal detection. All centers had recently published studies using embolic signal detection in peer-reviewed journals.
Subjects and Methods
Each center performed blinded analysis of a single 2-hour digital audiotape. This was made up of 20-minute recordings from the middle cerebral artery ipsilateral to a symptomatic carotid stenosis in each of 6 patients. All recordings had been made on an EME transcranial Doppler machine (TC2000) with a 2-MHz transducer, a sample volume of 10 mm, and an insonation depth of 45 to 52 mm. All nine centers used EME transcranial Doppler machinery (Pioneer or TC2000); one of them (center 1) also used an additional signal processor to improve fast Fourier transform (FFT) overlap. Each center played the Doppler audio signal from the digital audiotape back into its transcranial Doppler machine at the highest sweep speed, to achieve the greatest degree of FFT time-window overlap. Observers were instructed to use criteria based on those previously published in a consensus statement,7 using the spectral display and the audio signal simultaneously but without any specified intensity (decibel) threshold, and to record those signals that they would have included in their published data. Observers listed the exact time of each embolic signal.
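Because observers listed only the time of each signal, comparing centers requires deciding which reported times refer to the same event. The text does not state how this was done. The sketch below assumes a simple greedy grouping of sorted time stamps within a hypothetical 0.5-second window, with each center counted at most once per event; the function name, the tolerance, and the example times are illustrative only.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical matching window in seconds; the study does not report one.
TOLERANCE_S = 0.5


def group_events(reports: Dict[str, List[float]],
                 tolerance: float = TOLERANCE_S) -> List[Tuple[float, Set[str]]]:
    """Group reported times (seconds from the start of the tape) from all centers
    into candidate events, each returned as (time of first report, centers)."""
    # Flatten to (time, center) pairs and process them in chronological order.
    stamps = sorted((t, center) for center, times in reports.items() for t in times)
    events: List[Tuple[float, Set[str]]] = []
    for t, center in stamps:
        # Join the open event if this time lies within the window of the event's
        # first report and this center has not already been counted for it.
        if events and t - events[-1][0] <= tolerance and center not in events[-1][1]:
            events[-1][1].add(center)
        else:
            # Otherwise this report starts a new candidate event.
            events.append((t, {center}))
    return events


# Invented example times for three hypothetical centers.
reports = {"A": [10.0, 65.2], "B": [10.3], "C": [10.1, 65.4, 300.0]}
for start, centers in group_events(reports):
    print(f"{start:7.1f} s  reported by {len(centers)} center(s): {sorted(centers)}")
```

The number of centers attached to each event is the quantity the agreement analysis works from; a real implementation would need a tolerance justified by the playback timing accuracy of each center.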
No intensity threshold was specified, but centers were asked to record the relative intensity increase of any embolic signals. Three centers did not measure relative intensity increase. The remaining six centers used two general methods of measurement. In the first method, used by centers 2, 5, and 9, the intensity of the embolic signal was read from the color-coded intensity scale on the screen and expressed relative to a reference region of the spectrum that did not contain an embolic signal; the reference regions used by each center were not necessarily identical. In the second method, used by centers 3, 6, and 8, the intensity of each embolic signal was calculated using the automatic embolus software supplied with the machine. This algorithm determines the power of the embolic signal over the whole spectral line; the background power is likewise calculated over the whole spectral line, using a running average of background intensity over the preceding spectral lines.
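To make the two approaches concrete, the sketch below applies simplified versions of both to the same synthetic spectral display, stored as a list of spectral lines holding linear power per frequency bin. The data layout, the choice of reference line for the first method, and the four-line running-average window for the second are assumptions for illustration; this is not the algorithm of the EME software.

```python
import math
from typing import List

# Spectral lines (time) x frequency bins, values in linear power units.
Spectrogram = List[List[float]]


def db(ratio: float) -> float:
    """Convert a power ratio to decibels."""
    return 10.0 * math.log10(ratio)


def peak_bin_increase(spec: Spectrogram, line: int, ref_line: int) -> float:
    """First-method sketch: power at the signal's strongest frequency bin,
    relative to the same bin in a reference line chosen by the observer
    (for example, the same point of the preceding cardiac cycle)."""
    peak_bin = max(range(len(spec[line])), key=lambda k: spec[line][k])
    return db(spec[line][peak_bin] / spec[ref_line][peak_bin])


def whole_line_increase(spec: Spectrogram, line: int, window: int = 4) -> float:
    """Second-method sketch: power summed over the whole spectral line, relative
    to a running average of whole-line power over the preceding `window` lines."""
    if line == 0:
        raise ValueError("need at least one preceding spectral line")
    line_power = sum(spec[line])
    previous = spec[max(0, line - window):line]
    background = sum(sum(p) for p in previous) / len(previous)
    return db(line_power / background)


# Flat background of 1.0 in every bin; the last line has one bright bin.
spec = [[1.0, 1.0, 1.0, 1.0] for _ in range(5)] + [[1.0, 1.0, 20.0, 1.0]]
print(round(peak_bin_increase(spec, 5, 0), 1), round(whole_line_increase(spec, 5), 1))
```

For this single bright bin on a flat background, the peak-bin version gives about 13.0 dB and the whole-line version about 7.6 dB, because summing over the line dilutes the signal with background power from the other bins.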
Observers recorded only the times of the signals they detected, so the number of moments at which all observers recorded no abnormality could not be determined. We therefore used a method of analysis that does not depend on the number of occasions on which observers agree that no abnormality is present. This is an extension of the proportion of specific agreement,7 which takes into account that there are more than two observers.7 We estimated the probability that a second observer would record an abnormality if the first observer (or observers) had recorded one, an approach previously used to examine interrater agreement in embolic signal detection.7 This gives the probability that a specified observer will identify an embolic signal, compared with the performance of one or more of the other observers. We determined the effect of introducing a signal intensity (decibel) threshold on agreement between observers by repeating the probability analysis using only signals above specific decibel thresholds. For this analysis, the signal intensities (in decibels) determined by the coordinating center that had prepared the data were used, with the first of the two measurement methods described above: the peak intensity increase of the embolic signal was determined from the color-coded spectral display and compared with that of the background spectrum at the same frequency and at the same point of the preceding or subsequent cardiac cycle. During analysis, if the coordinating center had not identified an embolic signal at a time point at which another center had detected one, that point on the tape was reexamined. If an embolic signal could then be identified, its intensity was recorded; if no obvious embolic signal could be identified, the maximum intensity increase in the spectral display at that time point was measured.
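The estimator described in words above can be written as a pooled ratio over events. If n_k of the m observers reported event k, the probability that a randomly chosen second observer also reported an event that a first observer reported is Σ n_k(n_k − 1) / Σ n_k(m − 1); with two observers this reduces to the familiar proportion of specific (positive) agreement, 2a/(2a + b + c). The text does not give the formula explicitly, so the sketch below assumes this standard form. It also shows the threshold analysis: the same statistic recomputed on only those events whose intensity exceeds a chosen decibel threshold. Function names are hypothetical.

```python
from typing import Optional, Sequence


def specific_agreement(counts: Sequence[int], n_observers: int) -> float:
    """Probability that a randomly chosen second observer also reported an event,
    given that a randomly chosen first observer reported it.

    counts[k] is how many of the n_observers reported event k (at least 1)."""
    # Ordered pairs of distinct observers who both reported the same event.
    agreeing = sum(n * (n - 1) for n in counts)
    # Ordered pairs (reporting observer, any other observer) for the same event.
    possible = sum(n * (n_observers - 1) for n in counts)
    if possible == 0:
        raise ValueError("no reported events to analyze")
    return agreeing / possible


def agreement_above(counts: Sequence[int], intensities_db: Sequence[float],
                    n_observers: int, threshold_db: Optional[float] = None) -> float:
    """Repeat the agreement analysis using only events above a decibel threshold."""
    if threshold_db is None:
        return specific_agreement(counts, n_observers)
    kept = [n for n, level in zip(counts, intensities_db) if level > threshold_db]
    return specific_agreement(kept, n_observers)
```

Because events seen by only one observer contribute to the denominator but not the numerator, excluding low-intensity events (which, in these data, were the ones reported by few centers) raises the statistic.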
Results
Centers reported between 39 and 62 embolic signals, except for one center, which reported 142 embolic signals (Table 1). The probability of agreement between observers was .678. When the data from this center, which reported many more embolic signals than the other eight, were omitted, the probability of agreement rose to .791.
There was a highly significant relationship between the number of observers identifying any specific embolic signal and the intensity of that signal (Spearman’s ρ=.837, P<.0001) (see Figure). Consistent with this, introducing a decibel threshold resulted in a significant increase in the probability of agreement (see Table 2). Using a decibel threshold of >7 dB resulted in a probability of agreement of .902, similar to that reported in previous studies.6
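The association reported above is a rank correlation between, for each signal, the number of observers who identified it and its intensity. Because observer counts take only a few integer values, many ties occur, and ties are conventionally given the average of the ranks they span. The sketch below computes Spearman's ρ as the Pearson correlation of these tie-averaged ranks using only the Python standard library; it does not compute the P value, for which a statistics package (for example SciPy's `scipy.stats.spearmanr`) would normally be used.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```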
Six centers were able to perform intensity measurements on embolic signals. Correlations between measurements made by different centers are shown in Table 3. Correlation between most pairs of centers was good, but this was not always the case, and 3 of the 15 correlations were not significant. In addition, the absolute values of the intensities measured varied between centers, with some centers recording significantly higher intensities. We also analyzed the measured intensities of the 23 embolic signals detected by all centers (Table 4): there was a 40% difference in mean measured intensity between the highest and lowest centers.
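With six measuring centers there are C(6,2) = 15 center pairs, which is where the 15 correlations come from. The sketch below, assuming each center's decibel readings are listed for the same signals in the same order, computes a Pearson correlation for every pair and expresses the spread of center means as a percentage of the lowest mean. The text does not say whether Pearson or rank correlation was used for Table 3, nor which denominator underlies the 40% figure; both choices here are assumptions, and the readings in the example are invented.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def compare_centers(db_by_center: Dict[str, List[float]]
                    ) -> Tuple[Dict[Tuple[str, str], float], float]:
    """db_by_center maps each center to its dB readings of the SAME signals, in
    the same order. Returns a correlation for every pair of centers and the
    spread of center means as a percentage of the lowest mean."""
    pairs = {(a, b): pearson(db_by_center[a], db_by_center[b])
             for a, b in combinations(sorted(db_by_center), 2)}
    means = [mean(v) for v in db_by_center.values()]
    spread_pct = 100.0 * (max(means) - min(means)) / min(means)
    return pairs, spread_pct


# Invented readings: every center ranks the signals identically but scales them
# differently, so all correlations are perfect while absolute values disagree.
base = [5.0, 8.0, 12.0, 20.0]
factors = [1.0, 1.05, 1.1, 1.15, 1.2, 1.25]
readings = {f"center{i}": [f * v for v in base] for i, f in enumerate(factors, start=1)}
pairs, spread = compare_centers(readings)
print(len(pairs), round(min(pairs.values()), 3), round(spread, 1))
```

The example yields 15 pairs, all perfectly correlated, while the center means still differ by 25%, which is why a high correlation alone does not make thresholds transferable between centers.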
Discussion
Previous studies have demonstrated that good within-center and between-center agreement can be obtained in the reporting of embolic signals.2 7 8 The present study demonstrates that although there is good agreement between many centers, some centers report markedly different numbers of embolic signals, with one center reporting more than three times the number reported by the lowest reporting center (142 vs 39). From review of the exact times at which embolic signals were identified by some but not by other centers, it was clear that high-intensity artifacts were discriminated from embolic signals consistently between centers. The major disagreement arose in the interpretation of small increases in signal intensity; different centers appeared to use different thresholds at which they reported an intensity increase as an embolic signal. Consistent with this, there was a highly significant association between the intensity of an embolic signal and the number of observers identifying it. Relatively small increases in signal intensity, which many centers believed to represent random Doppler speckle, were reported as embolic signals, particularly by one center (center 4). Despite these differences, intercenter agreement improved markedly when a decibel threshold was used: satisfactory probabilities of agreement of between .87 and .9 were achieved using decibel thresholds of 6 and 7 dB. Notably, previous analyses of the intensity of episodes of random Doppler speckle in normal volunteers, using a similar method of intensity measurement, have demonstrated increases of 6 or occasionally 7 dB.9 This suggests that, with this method of intensity measurement, a threshold of 7 dB will exclude most episodes of “normal” Doppler speckle.
Our results showed marked differences between centers in the measured intensity of the same embolic signals. Although values correlated well for many pairs of centers, there was little or no correlation for a minority of pairs. Furthermore, even where correlation was good, the absolute values of intensity measurements were quite different. This has important implications if an intensity threshold is to be used: a threshold validated in one center cannot be used by another center, even one using the same equipment, unless cross-validation is performed. These differences are not surprising, because intensity measurements are crucially dependent on the method of measurement.9 Intensity is calculated from the logarithm of the ratio of the power increase associated with the embolic signal to that of the Doppler spectrum in the absence of any embolic signal. The power increase associated with the embolic signal may be measured in a number of ways, including the peak increase at one velocity; the area under the power increase, measured across both frequencies and time; and the power increase along the whole spectral line, which includes both the power of the embolic signal and that of the background Doppler spectrum at other velocities. Similarly, the background power can be measured at the same velocity or at all velocities, at the same point of the cardiac cycle or averaged across the whole cardiac cycle, and only within the Doppler spectrum or along the whole spectral line. For example, the background intensity is higher in diastole than in systole; therefore, the point in the cardiac cycle at which the background is measured will alter the result. Similarly, if the whole spectral line is used, artifactual noise in technically poor recordings will make the background power appear higher, resulting in a lower measured intensity increase for the embolic signal.
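The dependence on the background estimate can be made explicit with the conventional decibel definition of a power ratio. Writing P_E for the power measured during the embolic signal, P_B for the chosen background estimate, and N for any additional noise power that enters the background estimate (all three symbols are introduced here for illustration):

```latex
I = 10\log_{10}\!\left(\frac{P_E}{P_B}\right)\ \text{dB},
\qquad
I_{\text{noisy}} = 10\log_{10}\!\left(\frac{P_E}{P_B + N}\right)
  = I - 10\log_{10}\!\left(1 + \frac{N}{P_B}\right) < I \quad (N > 0).
```

For example, a 7-dB threshold corresponds to a power ratio of 10^0.7 ≈ 5.0, and noise equal to the true background power (N = P_B) lowers the measured increase by 10 log10 2 ≈ 3 dB, enough to move a signal from one side of such a threshold to the other.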
In a recent analysis, a 7-dB threshold developed using one standard method of measurement resulted in detection of 95% of embolic signals, whereas another method applied to the same data on the same equipment resulted in only ≈50% detection.9 In clinical practice, differences may be even greater, because the intensity increase of an embolic signal also depends on recording parameters such as sample volume.
Our results have a number of important implications. First, differences in the reported frequencies of embolic signals in different studies may at least partly reflect different criteria for their identification. Second, the previously suggested consensus criteria for the identification of embolic signals7 are not sufficiently precise to result in reproducible identification of embolic signals in different centers. Third, the use of a decibel threshold results in a marked improvement in interobserver agreement, and a threshold set at the upper limit of random episodes of Doppler speckle in normal subjects gives probabilities of agreement on the order of .9. This seems a reasonable approach to improving reproducibility. However, our results demonstrate that different centers measure intensity in very different ways and that methods must be standardized if a single decibel threshold is to be used in consensus criteria. Until this occurs, each center needs to select an appropriate decibel threshold, which is likely to be specific to its equipment, recording settings, and method of measurement. Validated automated detection systems may help in this respect. Fourth, until such systems are available, multicenter studies examining the predictive value of embolic signals should include blinded analysis of data by a single central reading center.
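One way a center might derive its own threshold, following the suggestion above, is sketched below. The text does not define "upper limit" precisely; the sketch assumes it means the largest speckle-related increase measured in normal subjects with the center's own equipment, settings, and measurement method, rounded up to a whole decibel. A high percentile would be an equally plausible choice. Function names and example values are hypothetical.

```python
import math
from typing import List, Sequence


def local_threshold_db(speckle_db: Sequence[float]) -> float:
    """Hypothetical calibration: the largest intensity increase seen during random
    Doppler speckle in normal subjects, measured with this center's own equipment,
    settings and method, rounded up to a whole decibel."""
    if not speckle_db:
        raise ValueError("need at least one speckle measurement")
    return float(math.ceil(max(speckle_db)))


def accept_signals(candidate_db: Sequence[float], threshold_db: float) -> List[float]:
    """Keep only candidate signals whose intensity increase exceeds the threshold."""
    return [x for x in candidate_db if x > threshold_db]


# Invented speckle and candidate-signal intensities (dB) for one center.
threshold = local_threshold_db([3.2, 5.9, 6.4])
print(threshold, accept_signals([4.0, 7.0, 7.5, 12.3], threshold))
```

Because the threshold is derived from measurements made with the same method that will later be applied to patients, it sidesteps the transfer problem described above, at the cost of requiring each center to collect its own normal-subject data.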
This study is supported in part by a project grant from the British Heart Foundation (Dr Markus).
- Received March 13, 1997.
- Revision received April 24, 1997.
- Accepted April 28, 1997.
- Copyright © 1997 by American Heart Association
References
1. Siebler M, Kleinschmidt A, Sitzer M, Steinmetz H, Freund HJ. Cerebral microembolism in symptomatic and asymptomatic high-grade internal carotid artery stenosis. Neurology. 1994;44:615-618.
2. Markus HS, Thomson N, Brown MM. Asymptomatic cerebral embolic signals in symptomatic and asymptomatic carotid artery disease. Brain. 1995;118:1005-1011.
3. Sitzer M, Muller W, Siebler M, Hort W, Kniemeyer HW, Jancke L, Steinmetz H. Plaque ulceration and lumen thrombus are the main sources of cerebral microemboli in high-grade internal carotid artery stenosis. Stroke. 1995;26:1231-1233.
4. Valton L, Larrue V, Arrue P, Geraud G, Bes A. Asymptomatic cerebral embolic signals in patients with carotid stenosis: correlation with the appearance of plaque ulceration on angiography. Stroke. 1995;26:813-815.
5. Siebler M, Nachtmann A, Sitzer M, Rose G, Kleinschmidt A, Rademacher J, Steinmetz H. Cerebral microembolism and the risk of ischemia in asymptomatic high-grade internal carotid artery stenosis. Stroke. 1995;26:2184-2186.
6. Markus HS, Bland JM, Rose G, Sitzer M, Siebler M. How good is intercenter agreement in the identification of embolic signals in carotid artery disease? Stroke. 1996;27:1249-1252.
7. Consensus Committee of the Ninth International Cerebral Hemodynamic Symposium. Basic identification criteria of Doppler microembolic signals. Stroke. 1995;26:1123.
8. Siebler M, Sitzer M, Rose G, Bendfeldt D, Steinmetz H. Silent cerebral embolism caused by neurologically symptomatic high-grade carotid stenosis: event rates before and after carotid endarterectomy. Brain. 1993;116:1005-1015.
9. Markus HS, Molloy J. The use of a decibel threshold in detecting Doppler embolic signals. Stroke. 1997;28:692-697.