Automatic Embolus Detection Compared With Human Experts
A Doppler Ultrasound Study
Background and Purpose: Transcranial Doppler ultrasound (TCD) reliably detects the occurrence of microembolic signals (MES). Unfortunately, TCD monitoring is a time-consuming and mentally strenuous procedure. The purpose of this study was to assess whether automatic embolus detection software devices acting as a “stand-alone system” are able to identify MES in patients with solid cerebral microemboli.
Methods: Ten records of TCD monitoring of the middle cerebral artery in patients with symptomatic high-grade carotid artery stenosis were analyzed for the moments at which MES occurred by four observers and three automatic detection software devices (RB11 on TC2000, Pioneer Version 2.10, and Embotec). The results of the three software systems were assessed on the basic assumption that MES were present if at least three of the four observers agreed.
Results: The average number of 1-second periods in which MES were detected by the four observers per tape ranged from 5 to 39. The overall κ values (and SEs) for chance-corrected interobserver agreement between the four observers ranged from .94 (.02) to .99 (.01). The agreement between the software devices and the observers was lower, with κ values (and SEs) ranging from .18 (.17) to .93 (.07). The RB11 and Embotec systems achieved a κ value higher than .4 in all tapes. The Pioneer system failed to reach a κ value of .4 in three tapes. The RB11 showed a sensitivity of 70% for detecting MES, the Embotec 62%, and the Pioneer 44%.
Conclusions: In patients with symptomatic high-grade carotid artery stenosis, a high degree of agreement in the detection of moments of MES can be achieved between observers. The three automatic detection software devices reached less agreement. Supervision of TCD monitoring and assessment of MES by an experienced observer is still necessary.
The search for pathophysiological factors of cerebral ischemia has recently led to the introduction of microembolus detection by TCD.1 This technique reliably detects the occurrence of MES that clearly differ from the normal blood flow spectral waveform. The similarity between these Doppler signals in patients and in animals in which embolic material had been introduced suggests that MES represent embolic particles.2 3 4 Furthermore, there is a strong correlation between the number of MES in the MCA and symptomatic atherosclerosis of the ipsilateral carotid artery.5 6 7 8 9
Unfortunately, TCD monitoring is a time-consuming and mentally strenuous procedure. The large number of patients who are candidates for this examination has led to the development of systems aimed at detecting MES automatically with rejection of artifacts or other changes of the Doppler signal. The purpose of this study was to assess whether automatic embolus detection software systems acting as “stand-alone systems” can adequately identify MES in patients with cerebral microemboli associated with carotid artery stenosis. The results of these automated analyses were compared with the separate assessment of MES by human experts, which is still regarded as the most accurate method of microembolus detection.
Materials and Methods
Ten different records of TCD monitoring of the MCA in as many patients with an ipsilateral symptomatic high-grade carotid artery stenosis were analyzed. TCD monitoring was performed by means of a TC2-64 (EME) with a 2-MHz monitoring transducer. The monitoring periods were recorded on the audio channels of a stereo VHS recorder and lasted 30 minutes, with the exception of tapes 3 and 9 with 20 minutes of recording. The tapes contained different numbers of MES as well as various types of artifacts such as probe movements, speaking, and snoring.
Four observers (E.V. Van Z., W.H.M., C.J., R.G.A.A.) independently performed analysis of the tapes for the occurrence of MES using a TC2000 (EME). This TCD system was equipped with a 386 processor and used a 128-point color-coded FFT. Scale settings were as close as possible to the settings of the original Doppler recordings. The sweep was kept constant and corresponded with a time window overlap of 46% or 66%, depending on the velocity scale. Specially designed software (EME) was used for determining duration, frequency, and power characteristics of the Doppler signals. According to established criteria,10 MES were identified on the basis of their typical musical sound, a duration of less than 300 milliseconds, an amplitude exceeding the background signal by at least 3 dB, and a unidirectional appearance within the Doppler velocity spectrum. The actual moments at which MES occurred were identified, instead of the absolute number of signals on each tape.
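The measurable parts of these identification criteria can be written as a simple check. The following is a minimal sketch and not the EME software: the class and field names are hypothetical, power values are assumed to be expressed in decibels, and the criterion of a "typical musical sound," which depends on the listener, is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class CandidateSignal:
    """Hypothetical summary of one high-intensity Doppler signal."""
    duration_ms: float          # duration of the signal in milliseconds
    peak_power_db: float        # peak power of the signal (dB, assumed unit)
    background_power_db: float  # power of the background flow signal (dB)
    unidirectional: bool        # True if seen on one side of the zero line only


def meets_measurable_criteria(sig, max_duration_ms=300.0, min_excess_db=3.0):
    """Apply the duration, amplitude, and direction criteria from the text."""
    excess_db = sig.peak_power_db - sig.background_power_db
    return (
        sig.duration_ms < max_duration_ms   # shorter than 300 ms
        and excess_db >= min_excess_db      # at least 3 dB above background
        and sig.unidirectional              # unidirectional in the spectrum
    )
```

A signal that passes this check would still need confirmation by its sound before an observer counts it as an MES.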
The 10 tapes were also analyzed by three automatic detection software systems: RB11 software on the TC2000 (EME), automatic event detection software (version 2.10) on the Pioneer (EME), and the neural network of Embotec (STAC). All software systems were used with the standard settings of the latest available versions, especially with regard to embolus detection. The gain of the input signal was adjusted at the beginning of the analysis and, if necessary, readjusted during the analysis to achieve the best possible envelope of the Doppler spectrum. Since all MES were within the dynamic range of the TC2-64 Doppler system, overload of the preamplifier could be excluded as a cause of technical problems of automated embolus detection.11 The software analysis was performed only once, because this resembles clinical practice.
Description of the Automatic Detection Software Systems
The RB11 software on the TC2000 provides a computerized embolus detection algorithm, which calibrates itself at the start of the examination. The trigger level and artifact rejection level are set according to the power of the Doppler signal. The algorithm relies on the characteristic bell-shaped increase in the relative power occurring with an embolus. If the power ratio between a high-intensity signal and the background signal exceeds the trigger level and if the difference between the maximum power above and below the zero line remains below the artifact rejection level, the signal will be counted as an embolus. If the second condition is not fulfilled, the signal is classified as an artifact.
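The two-step decision described in this paragraph can be sketched as follows. This is an illustration that follows the wording of the text literally and is not the RB11 implementation: the function and parameter names are hypothetical, all powers are assumed to be in decibels (so that a power ratio becomes a difference), and the difference across the zero line is assumed to be taken as an absolute value.

```python
def classify_rb11_style(signal_db, background_db,
                        max_above_zero_db, max_below_zero_db,
                        trigger_db, rejection_db):
    """Return 'none', 'embolus', or 'artifact' for one candidate signal.

    Assumption: powers are in dB, so the power ratio between signal and
    background is computed as a difference.
    """
    # Step 1: the signal must exceed the background by more than the trigger level.
    if signal_db - background_db <= trigger_db:
        return "none"
    # Step 2: the difference between the maximum power above and below the
    # zero line must remain below the artifact rejection level.
    if abs(max_above_zero_db - max_below_zero_db) < rejection_db:
        return "embolus"
    # Trigger level exceeded but second condition not fulfilled.
    return "artifact"
```

In the device both levels are set during self-calibration from the power of the Doppler signal; here they are simply passed in as arguments.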
Pioneer Version 2.10 (EME)
This device includes software that counts “events,” defined as a sudden increase in the power of the ultrasound signal expressed in decibels, by means of two algorithms. If a signal exceeds the minimal threshold level, it is further evaluated by the artifact rejection algorithm. This algorithm compares the signal intensity above and below the zero line with the mean background signal. Signals that exceed the artifact rejection level are omitted; the other signals are counted as events.
The Embotec system uses a three-layer neural network, consisting of 512 input neurons, 10 hidden-layer neurons, and 3 output neurons, that is trained on a database of FFT Doppler spectra. These Doppler spectra are analyzed for the presence of MES by human observers. The 3 output neurons correspond to the network decisions “microembolus,” “artifact,” or “normal.” The output value of “microembolus” represents a network estimation of the similarity in FFT between an input signal and the MES in the database.
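The layer sizes given above (512 input, 10 hidden, 3 output neurons) can be illustrated with a forward pass through a network of that shape. This is a sketch only: the sigmoid activation, the bias terms, the random untrained weights, and all function names are assumptions, since the text does not describe the internals or the training procedure of the Embotec network.

```python
import math
import random

N_INPUT, N_HIDDEN, N_OUTPUT = 512, 10, 3
LABELS = ("microembolus", "artifact", "normal")


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_layer(n_in, n_out, rng):
    """One weight row per neuron; the last element of each row is the bias."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)]
            for _ in range(n_out)]


def forward_layer(layer, inputs):
    """Weighted sum plus bias for every neuron, passed through a sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(row[:-1], inputs)) + row[-1])
            for row in layer]


def classify(spectrum, hidden_layer, output_layer):
    """Map one 512-point spectrum to a label and the three output values."""
    if len(spectrum) != N_INPUT:
        raise ValueError("expected a spectrum with 512 values")
    hidden = forward_layer(hidden_layer, spectrum)
    outputs = forward_layer(output_layer, hidden)
    best = max(range(N_OUTPUT), key=lambda i: outputs[i])
    return LABELS[best], dict(zip(LABELS, outputs))


# Untrained example network with a fixed seed (illustrative only).
rng = random.Random(0)
hidden_layer = make_layer(N_INPUT, N_HIDDEN, rng)
output_layer = make_layer(N_HIDDEN, N_OUTPUT, rng)
label, scores = classify([0.5] * N_INPUT, hidden_layer, output_layer)
```

With trained weights, the value stored under "microembolus" in `scores` would play the role of the similarity estimate described in the text.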
Statistical Analysis
To localize the moments at which MES occurred, the running time of the tapes was subdivided into periods of 1 second. The agreement of MES between the four observers was assessed with Cohen's κ values. Cohen's κ provides a correction for chance agreement, which makes it superior to other indices of interobserver agreement. κ values may vary between 1 (complete agreement) and −1 (complete disagreement); zero represents agreement similar to that expected by chance. A common guideline is that a κ value of .75 or more indicates excellent agreement, whereas κ values from .4 to .75 indicate fair to good agreement and values of .4 or less poor agreement.12 κ values only slightly depend on the number of correct negative counts, which is mostly high in these studies.
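For two raters who each mark every 1-second period as containing an MES or not, Cohen's κ is (p_o − p_e)/(1 − p_e), where p_o is the observed proportion of agreement and p_e the proportion expected by chance. The sketch below implements this two-rater form; the function name and the example data are hypothetical, and how the four observers were combined into the overall values is not specified here.

```python
def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters giving a yes/no mark per 1-second period."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same, non-empty periods")
    n = len(rater_a)
    both = sum(1 for a, b in zip(rater_a, rater_b) if a and b)
    neither = sum(1 for a, b in zip(rater_a, rater_b) if not a and not b)
    p_observed = (both + neither) / n
    p_a = sum(1 for a in rater_a if a) / n   # proportion marked by rater A
    p_b = sum(1 for b in rater_b if b) / n   # proportion marked by rater B
    p_expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Ten 1-second periods: rater A marks two moments, rater B marks one of them.
a = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
b = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
kappa = cohen_kappa(a, b)
```

In this example the raters agree on 9 of 10 periods, yet κ is only about .62, because most of that agreement comes from periods without MES and is largely expected by chance.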
Agreement between the observers was considered the “gold standard” for the automated procedures. Comparison with the three software systems was performed on the basic assumption that a moment of MES was present if at least three of the four observers agreed. All other moments of MES detected by the software systems were considered false-positive. For each system the sensitivity (true-positive rate) was calculated for each of the 10 tapes and overall. Because the great majority of 1-second periods in this study were correct negatives, the specificity would be unrealistically high and the false-positive rate very low. Instead, the false-positive moments of MES were expressed as a proportion of the sum of true-positive and false-positive moments of MES detected by the software systems. Furthermore, κ values were calculated to assess the agreement between any of the three software systems and the human gold standard. For each of the human and artificial observers, a χ2 test was used to evaluate the hypothesis that the κ values in the separate tapes are equal.12
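The gold standard and the two performance measures defined above can be sketched as set operations on the 1-second periods marked by each observer and by a software system. The function names and the example marks are hypothetical; only the rule of agreement between at least three of four observers and the two definitions are taken from the text.

```python
def consensus_moments(observer_marks, min_agree=3):
    """Seconds marked as MES by at least `min_agree` observers."""
    counts = {}
    for marks in observer_marks:
        for second in marks:
            counts[second] = counts.get(second, 0) + 1
    return {second for second, c in counts.items() if c >= min_agree}


def evaluate_software(software_marks, gold):
    """Sensitivity and false-positive proportion against the gold standard."""
    tp = len(software_marks & gold)   # detected and confirmed by consensus
    fp = len(software_marks - gold)   # detected but not confirmed
    fn = len(gold - software_marks)   # confirmed but missed
    sensitivity = tp / (tp + fn) if gold else None
    # False positives as a proportion of all moments the software detected.
    fp_proportion = fp / (tp + fp) if software_marks else None
    return {"tp": tp, "fp": fp, "fn": fn,
            "sensitivity": sensitivity, "fp_proportion": fp_proportion}


# Hypothetical marks (second indices on one tape) from four observers.
observers = [{12, 40, 95}, {12, 40}, {12, 40, 95, 300}, {12, 95}]
gold = consensus_moments(observers)            # {12, 40, 95}
result = evaluate_software({12, 95, 500}, gold)
```

Here the software finds two of the three consensus moments (sensitivity 2/3), and one of its three detections is false-positive (proportion 1/3).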
Results
The four observers found a mean number of moments of MES (and SD) ranging from 5 (.8) on tape 6 to 39 (.5) on tape 4. The κ values and their SEs for the agreement between the observers in each of the 10 tapes are given in Table 1. The overall κ values and SEs ranged from .94 (.02) to .99 (.01). The χ2 test showed no differences in κ values between the separate tapes in each of the four observers. Table 2 lists the results of the comparison between the human gold standard (agreement on moments of MES between at least three observers) and the three automatic embolus detection software systems. The agreement between the software devices and the observers' collective was lower than the agreement between the observers, with κ values (and SEs) ranging from .18 (.17) to .93 (.07). The RB11 and Embotec achieved a κ value above .4 in all 10 tapes, whereas the Pioneer failed to reach a κ value of .4 in three tapes. For each of the three software systems, the χ2 test showed a significant difference in κ values between the separate tapes (RB11 P<.001, Pioneer P<.001, Embotec P<.05).
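A χ2 comparison of κ values across tapes can be computed from the per-tape κ values and their SEs. The sketch below uses the inverse-variance form commonly given for comparing independent κ values; it is assumed, not confirmed by the text, that this is the calculation applied here, and the function name and example values are hypothetical.

```python
def kappa_homogeneity(kappas, standard_errors):
    """Pooled kappa and chi-square statistic for equality of several kappas.

    Returns (pooled_kappa, chi_square, degrees_of_freedom).
    """
    if len(kappas) != len(standard_errors) or len(kappas) < 2:
        raise ValueError("need at least two kappas with matching SEs")
    # Each kappa is weighted by the inverse of its squared standard error.
    weights = [1.0 / (se * se) for se in standard_errors]
    pooled = sum(w * k for w, k in zip(weights, kappas)) / sum(weights)
    chi_square = sum(w * (k - pooled) ** 2 for w, k in zip(weights, kappas))
    return pooled, chi_square, len(kappas) - 1


# Hypothetical example with two tapes.
pooled, chi_square, df = kappa_homogeneity([0.9, 0.5], [0.1, 0.1])
```

In this example the statistic is 8.0 with 1 degree of freedom, which exceeds the 5% critical value of 3.84, so the hypothesis of equal κ values would be rejected.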
The RB11 achieved an overall sensitivity of 70% for detecting moments with MES for all tapes, the Embotec 62%, and the Pioneer 44%. The Figure illustrates the number of false-positive moments of MES calculated as a percentage of the sum of true-positive and false-positive moments of MES detected by the three software systems in each tape. The RB11 detected more false-positive moments of MES than true-positive moments of MES in tapes 2 and 8, as did the Pioneer and Embotec in tape 5.
Discussion
The present study shows that for the detection of moments with MES in symptomatic high-grade carotid artery stenosis, a high degree of interobserver agreement can be achieved between experienced observers. This confirms the reproducibility of a technique that might become standard in the evaluation of cerebrovascular disease. Recently, good agreement has been reported between human and computerized analysis of MES.11 13 14 In contrast, we found less agreement between the automatic embolus detection software devices and the human experts. One reason for this disparity might be that in our study only patients with high-grade carotid artery stenosis were included. Most of the earlier studies have been done in patients with prosthetic heart valves and under in vitro conditions, with relatively large experimental emboli; in these circumstances MES have a higher intensity and a longer duration than in patients with carotid artery disease.15 The temporal resolution of the automatic detection devices might not be high enough to detect all short-duration low-intensity MES derived from microemboli in carotid artery disease,16 whereas human observers have the advantage of hearing the typical sound characteristics of a microembolus in the audio Doppler signal. In the present study the percentage of time window overlap was not always sufficient to visualize all MES on the spectral display. Therefore, it might be expected that human observers reach better agreement.
Another reason that our results differ from those in previous studies13 14 on interobserver agreement might be that we identified the actual moments of MES on the tapes rather than the absolute number of MES. This method was chosen to ensure that the measurement of agreement is based on identification of the same signals, and we argue that this approach is the most appropriate in assessing the performance of an automatic embolus detection device. For example, the RB11 indicated a total number of 159 MES that consisted of 101 true-positive counts and 58 false-positive counts; according to the observers' collective, the actual number of MES was 145 (Table 2). Had only the absolute numbers been compared, one might have concluded that all 145 actual MES as well as an additional 14 false-positive ones were detected, whereas in fact 44 of the 145 moments were missed. Obviously, this would result in markedly different conclusions about the accuracy of this embolus detection device.
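The arithmetic behind this example can be laid out explicitly. The three input numbers are those quoted above for the RB11; the variable names are illustrative.

```python
true_positive = 101          # RB11 detections confirmed by the observers
false_positive = 58          # RB11 detections not confirmed
gold_standard_moments = 145  # moments of MES according to the observers

# Comparison of absolute counts only.
software_total = true_positive + false_positive               # 159
apparent_excess = software_total - gold_standard_moments      # 14

# Comparison of the actual moments.
missed = gold_standard_moments - true_positive                # 44
sensitivity = true_positive / gold_standard_moments           # about 0.70
fp_proportion = false_positive / software_total               # about 0.36
```

A count-based comparison sees only a surplus of 14 signals, whereas the moment-based comparison shows 44 missed moments and 58 false-positive ones.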
The automatic detection software systems failed to achieve high sensitivity. However, it is not really important to produce artificial systems that detect MES with 100% sensitivity as long as the detection rate of MES is reproducible. The RB11 detected more than half of the moments with MES in all tapes, and the Embotec failed in only one tape, thus providing useful information about the presence or absence of these signals as well as a reasonable estimate of the absolute number in most of the tapes. In contrast, the Pioneer in its default settings missed more than 50% of the moments of MES in half of the tapes.
The three software devices use different methods for embolus detection. The Embotec neural network is capable of classifying patterns after learning typical examples from a database.14 Therefore, the system is biased toward identifying MES that are identical to those in the training database. The RB11 has been developed under in vitro conditions.17 MES are predominantly recognized by their typical bell-shaped increase in relative power. The larger proportion of false-positive MES detected by this device compared with the other two systems suggests difficulties in distinguishing the spectral characteristics of emboli from those of artifacts. The Pioneer uses a less sophisticated detection algorithm but showed a favorably low proportion of false-positive MES. All software systems provide the possibility of changing the settings of the detection algorithm parameters; we used only the standard settings.
Although in this study the three software devices for automatic embolus detection reached less agreement than the four human experts, the results are promising, and it is likely that ongoing research and development will further improve the performance of the systems. At minimum, they may be important to help reduce the vast amount of data. In future designs of TCD equipment and embolus detection software, a high temporal resolution is an important feature. The Wigner signal analysis might be a useful alternative.18 The present study shows that supervision of TCD monitoring and assessment of MES by experienced observers is still necessary.
Selected Abbreviations and Acronyms
FFT = fast Fourier transform
MCA = middle cerebral artery
TCD = transcranial Doppler ultrasound
- Received March 1, 1996.
- Revision received June 13, 1996.
- Accepted June 18, 1996.
- Copyright © 1996 by American Heart Association
References
Spencer MP. Detection of cerebral arterial emboli. In: Newell DW, Aaslid R, eds. Transcranial Doppler. New York, NY: Raven Press; 1992:215-230.
Russell D, Madden KP, Clark WM, Sandset PM, Zivin JA. Detection of arterial emboli using Doppler ultrasound in rabbits. Stroke. 1991;22:253-258.
Kessler C, Kelly AB, Suggs WD, Weissman JD, Epstein CM, Hanson SR, Harker LA. Induction of transient neurological dysfunction in baboons by platelet microemboli. Stroke. 1992;23:697-702.
Siebler M, Sitzer M, Steinmetz H. Detection of intracranial emboli in patients with symptomatic extracranial carotid artery disease. Stroke. 1992;23:1652-1654.
Grosset DG, Georgiadis D, Abdullah I, Bone I, Lees KR. Doppler emboli signals vary according to stroke subtype. Stroke. 1994;25:382-384.
Siebler M, Kleinschmidt A, Sitzer M, Steinmetz H, Freund HJ. Cerebral microembolism in symptomatic and asymptomatic high-grade internal carotid artery stenosis. Neurology. 1994;44:615-618.
Ries S, Schminke U, Daffertshofer M, Schindlmayr C, Hennerici M. High intensity transient signals and carotid artery disease. Cerebrovasc Dis. 1995;5:124-127.
Van Zuilen EV, Mauser HW, Algra A, Van Gijn J, Ackerstaff RGA. TCD detection of microemboli in symptomatic and asymptomatic high-grade carotid artery stenosis. J Neuroimaging. 1995;suppl 2:S63. Abstract.
Consensus Committee of the Ninth International Cerebral Hemodynamic Symposium. Basic identification criteria of Doppler microembolic signals. Stroke. 1995;26:1123.
Markus H, Loh A, Brown MM. Computerized detection of cerebral emboli and discrimination from artifact using Doppler ultrasound. Stroke. 1993;24:1667-1672.
Fleiss JL. Statistical methods for rates and proportions. In: Statistical Methods for Rates and Proportions. 2nd ed. New York, NY: J Wiley & Sons; 1981:212-225.
Georgiadis D, Kaps M, Siebler M, Hill M, König M, Berg J, Kahl M, Zunker P, Diehl B, Ringelstein EB. Variability of Doppler microembolic signal counts in patients with prosthetic cardiac valves. Stroke. 1995;26:439-443.
Georgiadis D, Mackay TG, Kelman AW, Grosset DG, Wheatley DJ, Lees KR. Differentiation between gaseous and formed embolic materials in vivo. Stroke. 1994;25:1559-1563.
Markus H. Importance of time-window overlap in the detection and analysis of embolic signals. Stroke. 1995;26:2044-2047.
Brucher R, Russell D. Automatic embolus detection with artifact suppression. J Neuroimaging. 1993;3:77. Abstract.
Smith JL, Evans DH, Fan L, Gaunt ME, London MJN, Bell PRF, Naylor AR. Interpretation of embolic phenomena during carotid endarterectomy. Stroke. 1995;26:2281-2284.