(Stroke. 2001;32:1091.)
© 2001 American Heart Association, Inc.
Original Contributions
From the Durham Veterans Affairs Medical Center (L.B.G., M.R.J., J.H., S.B.A., R.D.H.); Department of Medicine (Neurology [L.B.G., V.C.] and General Internal Medicine [D.B.M.]), Stroke Policy Program, Center for Clinical Health Policy Research (L.B.G., D.B.M.), Center for Cerebrovascular Disease (L.B.G., D.B.M., V.C., R.D.H.), Duke University, Durham, NC; and Department of Biostatistics, University of North Carolina, Chapel Hill (L.J.E.).
Correspondence to Larry B. Goldstein, MD, Duke Center for Cerebrovascular Disease, Stroke Policy Program, Center for Clinical Health Policy Research, Box 3651, Duke University Medical Center, Durham, NC 27710. E-mail golds004{at}mc.duke.edu
| Abstract |
|---|
Methods: In preparation for a study of outcomes and management practices for patients with ischemic stroke within Department of Veterans Affairs hospitals, 2 neurologists and 2 internists first retrospectively classified a series of 14 randomly selected stroke patients on the basis of the TOAST definitions to provide a baseline assessment of interrater agreement. A 2-phase process was then used to improve the reliability of subtype assignment. In the first phase, a computerized algorithm was developed to assign the TOAST diagnostic category. The reliability of the computerized algorithm was tested with a series of synthetic cases designed to provide data fitting each of the 11 definitions. In the second phase, critical disagreements in the data abstraction process were identified and remaining variability was reduced by the development of standardized procedures for retrieving relevant information from the medical record.
Results: The 4 physicians agreed in subtype diagnosis for only 2 of the 14 baseline cases (14%) using all 11 TOAST definitions and for 4 of the 14 cases (29%) when the classifications were collapsed into the 5 major etiologic/pathophysiological groupings (κ=0.42; 95% CI, 0.32 to 0.53). There was 100% agreement between classifications generated by the computerized algorithm and the intended diagnostic groups for the 11 synthetic cases. The algorithm was then applied to the original 14 cases, and the diagnostic categorization was compared with each of the 4 physicians' baseline assignments. For the 5 collapsed subtypes, the algorithm-based and physician-assigned diagnoses disagreed for 29% to 50% of the cases, reflecting variation in the abstracted data and/or its interpretation. The use of an operations manual designed to guide data abstraction improved the reliability of subtype assignment (κ=0.54; 95% CI, 0.26 to 0.82). Critical disagreements in the abstracted data were identified, and the manual was revised accordingly. Reliability with the use of the 5 collapsed groupings then improved for both interrater (κ=0.68; 95% CI, 0.44 to 0.91) and intrarater (κ=0.74; 95% CI, 0.61 to 0.87) agreement. Examining each remaining disagreement revealed that half were due to ambiguities in the medical record and half were related to otherwise unexplained errors in data abstraction.
Conclusions: Ischemic stroke subtype based on published TOAST classification criteria can be reliably assigned with the use of a computerized algorithm with data obtained through standardized medical record abstraction procedures. Some variability in stroke subtype classification will remain because of inconsistencies in the medical record and errors in data abstraction. This residual variability can be addressed by having 2 raters classify each case and then identifying and resolving the reason(s) for the disagreement.
Key Words: diagnosis stroke classification
| Introduction |
|---|
κ=0.54).2 Since the description of the TOAST scheme, it has been used to classify patients according to ischemic stroke subtype in several studies. However, the TOAST investigators cautioned that disagreements in subtype assignment remain despite the use of these explicit criteria and that trials should include measures to ensure the most uniform diagnosis possible.2 In the final report of the TOAST study, all of the stroke subtype diagnoses were assigned by a central-blinded evaluator to minimize interrater variability.3
In preparation for a study of outcomes and management practices for patients with ischemic stroke within Department of Veterans Affairs (VA) hospitals, we used the published TOAST criteria1 2 to retrospectively categorize a series of cases. Because only fair to moderate levels of interrater agreement were achieved, we engaged in a systematic effort to improve the reliability of assigning subtype diagnoses using the TOAST definitions.
| Methods |
|---|
Two experienced neurologists and 2 internists first independently reviewed the medical records of each of 14 randomly selected patients and assigned stroke subtype on the basis of the TOAST definitions (Table 1). Each rater was provided with reference materials listing the published criteria used by the TOAST investigators in assigning patients to a given diagnostic category.1 Because only fair to moderate levels of interrater agreement were found (see Results), a 2-phase approach aimed at improving reliability was adopted.
In the first phase, a standardized form was designed to record the abstracted data necessary to assign a TOAST subtype classification (Figure). A computerized SAS algorithm (SAS Institute Inc) was then developed and refined to classify cases according to the 11 described TOAST definitions. The reliability of the computerized algorithm was tested with synthetic cases that provided data fitting each of the 11 definitions (Table 2).
The computerized algorithm was then applied to the original 14 cases with the use of data from 1 set of abstractions. The resulting diagnostic categorizations were compared with each physician's baseline assignment. With variability due to differences in the subjective interpretation of the data removed by use of the algorithm, remaining variability had to be due to differences in the physicians' application of the TOAST criteria or differences in the abstracted data (ie, critical differences in data entered into the computerized algorithm could lead to differences in subtype diagnosis).
The data abstraction process was refined in the second phase. First, an operations manual designed to guide the abstraction process was developed. Using the operations manual, 3 experienced abstractors (a nurse with extensive stroke-related experience, a stroke neurologist, and a medical student with extensive training) independently recorded data on the standardized forms for a series of 17 patients. Systematically comparing the data abstracted by each rater identified areas of disagreement critical to the assignment of stroke subtype, and the abstraction manual was then revised through a series of iterations. Using the final revised operations manual (see the Appendix, which may be found online at http://stroke.ahajournals.org), 2 raters then independently abstracted a final set of 20 cases with the extracted data entered into the computerized algorithm. Intrarater reliability was assessed by having 1 observer abstract a set of 61 cases on 2 separate occasions 6 months apart.
The degrees of intrarater and interrater reliability were measured by simple percentage of agreement and with the unweighted κ statistic.4 The values of the κ statistic may be interpreted in a manner similar to the interpretation of correlation coefficients (κ=0 to 0.20, slight; κ=0.21 to 0.40, fair; κ=0.41 to 0.60, moderate; κ=0.61 to 0.80, substantial; and κ=0.81 to 1.00, almost perfect agreement).5 Probabilities reflect the chances that the calculated κ values were statistically different from zero.
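As an illustration of the two measures just described, the following minimal Python sketch computes simple percentage agreement and an unweighted κ for 2 raters, and maps a κ value onto the interpretation bands listed above. It is not the authors' SAS code; the function names, the category codes in the usage note, and the handling of κ values below zero (a range the text does not label) are assumptions made for the example.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Fraction of cases on which the two raters assigned the same category."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must classify the same non-empty case list")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def unweighted_kappa(rater_a, rater_b):
    """Unweighted kappa for 2 raters: (observed - expected) / (1 - expected)."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    # Chance agreement: product of the two raters' marginal proportions per category.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1.0 - expected)


def agreement_label(kappa):
    """Map kappa onto the bands quoted in the text (0 to 1.00)."""
    if kappa < 0:
        return "below chance"  # assumption: this range is not labeled in the text
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"
```

For example, with made-up category codes, `unweighted_kappa(["LAA", "CE", "SVD", "CE", "UND", "LAA"], ["LAA", "CE", "SVD", "UND", "UND", "CE"])` gives an observed agreement of 4/6 and an expected agreement of 0.25, so κ is 5/9, which falls in the moderate band.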
| Results |
|---|
κ=0.29; 95% CI, 0.21 to 0.37, P<0.0001). The 4 raters were concordant in diagnostic assignment for 4 of the 14 cases (29%) when the classifications were collapsed into the 5 major etiologic/pathophysiological groupings (overall κ statistic for the 4 evaluators was 0.42; 95% CI, 0.32 to 0.53; P<0.0001). The 2 neurologists' classifications were concordant for 6 (43%) of the 14 cases using the full 11 TOAST categories and for 8 cases (57%) using the collapsed 5 categories. One of the internists arrived at the same diagnoses for 6 of the 8 patients (75%) for whom the 2 neurologists agreed. The second internist concurred with only 4 of these 8 classifications (50%).
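The text reports an overall κ for 4 evaluators but does not say which multi-rater formulation was used. The sketch below assumes Fleiss' κ, one common choice when every case is classified by the same number of raters, and also computes the fraction of cases with unanimous agreement. The function names and the example data in the usage note are hypothetical and are not taken from the study.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa. `ratings` is a list of cases; each case is a list holding
    one category label per rater, with the same number of raters for every case."""
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(case) != n_raters for case in ratings):
        raise ValueError("every case needs the same number (>= 2) of ratings")

    category_totals = Counter()
    per_case_agreement = []
    for case in ratings:
        counts = Counter(case)
        category_totals.update(counts)
        # Proportion of rater pairs that agree on this case.
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        per_case_agreement.append(agreeing_pairs / (n_raters * (n_raters - 1)))

    observed = sum(per_case_agreement) / n_cases
    total = n_cases * n_raters
    # Chance agreement from the pooled category proportions.
    expected = sum((t / total) ** 2 for t in category_totals.values())
    if expected == 1.0:
        return 1.0  # only one category was ever used
    return (observed - expected) / (1.0 - expected)


def unanimous_fraction(ratings):
    """Fraction of cases on which all raters chose the same category."""
    return sum(len(set(case)) == 1 for case in ratings) / len(ratings)
```

With 4 raters and 4 made-up cases, `[["A"] * 4, ["A", "A", "B", "B"], ["B"] * 4, ["A", "B", "C", "C"]]`, half the cases are unanimous, while Fleiss' κ is 29/77 (about 0.38), showing how the chance-corrected statistic can sit well below the raw concordance.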
Development and Reliability of Computerized Diagnostic Algorithm
Because of the relatively poor reliability found in this initial assessment, a standardized abstraction form was devised (Figure), and a computerized algorithm was developed to categorize patients according to the published TOAST criteria.1 This was accomplished through a series of iterations in which both the abstraction form and computer programming were tested and refined (data not shown). A group of 11 synthetic cases was then created to fit each of the TOAST categories (Table 2). There was 100% agreement between classifications determined by the computerized algorithm and the intended diagnostic groups for the synthetic cases.
The computerized algorithm was then applied to the 14 cases used in the baseline assessment with data from one of the neurologists' abstractions. The algorithm-based diagnosis agreed with the 2 neurologists for 8 (57%) and 10 (71%) of the 14 cases and with the internists for 7 (50%) and 8 (57%) of the cases, respectively. Because the computerized algorithm yields consistent diagnostic categorizations in accord with the TOAST definitions, these discrepancies could only have been related to differences in the data as abstracted by the different raters, or differences in their interpretations of these data.
Reliability of Data Abstraction
A manual to guide data abstraction was then developed by comparing disagreements among 3 experienced reviewers (data not shown). To test the revised abstraction methodology and to systematically explore remaining sources of variability, an additional set of 17 cases was independently reviewed by 2 raters with the extracted data entered into the computerized algorithm. Using the 5 collapsed categories, the 2 raters agreed in subtype diagnosis for 11 of the 17 cases (65%; κ=0.54; 95% CI, 0.26 to 0.82; P<0.05).
Examination of the raters' data abstraction forms revealed that disagreements in subtype diagnoses were primarily due to differences in the interpretations of CT and MRI scans, cardiovascular evaluations, and carotid ultrasound results. Many of these disagreements occurred because raters relied on different reports in the medical record. For example, one rater used official radiology reports, whereas the other used interpretations of the studies as reflected in physicians' notes. As a result, the operations manual was revised to specify a hierarchy of test reports to be used for abstraction of the results of the radiological tests (Appendix).
Final Interrater and Intrarater Reliability
The abstraction process was repeated for an additional set of 20 cases with the use of the final operations manual (Appendix) with patients categorized into the 5 collapsed major etiologic/pathophysiological groupings (Table 4). Reliability further improved, with the 2 raters agreeing in subtype diagnosis for 75% of the cases (κ=0.68; 95% CI, 0.44 to 0.91; P<0.05). All of the differences in diagnostic assignment were due to differences in abstracted data. Examining each disagreement revealed that half were due to ambiguities in the medical record (eg, in one case medical notes indicated results of an MRI without other evidence in the record that the test was actually performed; in another case a carotid duplex evaluation indicated mild to moderate stenosis), and half were due to otherwise unexplained errors in data abstraction.
Intrarater reliability was assessed by having one observer abstract a set of 61 cases on 2 separate occasions 6 months apart, with the data entered into the computerized algorithm for diagnostic categorization. Diagnoses agreed for 50 cases (82%; κ=0.74; 95% CI, 0.61 to 0.87). Again, discrepancies were largely due to differences in classification of CT and MRI scans, cardiovascular evaluations, and carotid ultrasound results.
| Discussion |
|---|
We found that the reliability of the TOAST classification was only fair to moderate when the published definitions were retrospectively applied to a randomly selected series of cases. Unified central assessment of stroke subtype was used in the TOAST trial itself to minimize interrater variability.3 The presence of this variability confirms that the published TOAST criteria should be used with caution unless the investigators can demonstrate acceptable levels of agreement within the context of an individual study.2
We used a 2-phase process aimed at improving the reliability of the TOAST classification scheme. Creation of a computerized algorithm (available at http://hsrd.durham.med.va.gov/) eliminated variability due to differences in the interpretation of stroke-related characteristics for a given patient. Remaining discrepancies were related to differences in the abstracted data, prompting the development of a standardized manual and procedures for extracting relevant information from the medical record. This improved reliability in the classification of stroke subtype to the substantial to almost-perfect level for both intraobserver and interobserver agreement. Residual differences in diagnostic categorization were related to simple errors in abstraction or ambiguities in the medical record, occurring in 25% of cases.
In practice, having each medical record abstracted by 2 raters with the data entered into the computerized algorithm could identify this remaining variability. Abstraction forms for cases in which there is a difference in subtype diagnosis could then be reviewed (focused on CT and MRI scan, cardiovascular, and carotid ultrasound results) and the reason(s) for the discrepancies identified and resolved. Our data show that variability in stroke subtype diagnosis can be reduced to a minimum through the use of this rigorous methodology. The generalizability of these results will need to be confirmed in other settings.
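The double-abstraction check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each abstraction form is represented as a dictionary of field values, `classify` stands in for the computerized algorithm (its actual rules are not reproduced here), and the field and function names are hypothetical.

```python
def find_discordant_cases(forms_a, forms_b, classify):
    """Run each rater's abstraction form through the same classifier and
    report the cases whose resulting subtypes differ, together with the
    form fields on which the two abstractions disagree."""
    report = {}
    for case_id in sorted(forms_a.keys() & forms_b.keys()):
        form_a, form_b = forms_a[case_id], forms_b[case_id]
        subtype_a, subtype_b = classify(form_a), classify(form_b)
        if subtype_a != subtype_b:
            differing = sorted(
                field
                for field in form_a.keys() | form_b.keys()
                if form_a.get(field) != form_b.get(field)
            )
            report[case_id] = {
                "subtypes": (subtype_a, subtype_b),
                "differing_fields": differing,
            }
    return report


def placeholder_classify(form):
    """Placeholder rule for demonstration only; NOT the TOAST criteria."""
    return "group_A" if form.get("finding_x") else "group_B"
```

Called with two raters' forms, the function returns only the cases that need review and lists the fields to compare first, so a reviewer can go straight to the imaging, cardiovascular, or carotid ultrasound entries that differ. Cases in which the forms differ but the assigned subtype is the same are deliberately not flagged.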
| Acknowledgments |
|---|
Received August 21, 2000; revision received November 1, 2000; accepted February 6, 2001.
St James's University Hospital, Beckett Street, Leeds, UK, j.m.bamford@leeds.ac.uk
| Introduction |
|---|
In the introduction to one of the earliest attempts to synthesize the various strands of classification, MillikanR1 wrote, "Our ultimate objectives are to obtain greater clarity of thinking in regard to cerebrovascular diseases, to compose a generally acceptable classification, to establish reliable criteria for diagnosis, and to promote further research in this field."
One suspects that, outside the centers of stroke research, such aspirations were considerably in advance of their time, and that for the majority of stroke patients worldwide, meaningful (if fairly basic) subclassification became a reality only with the advent of CT and ultrasound scanning. Most of the early research that used mechanistic classifications was observational epidemiology, most notably that from the Mayo ClinicR2 and later from the Stroke Data Bank collaboratorsR3 and the Lausanne group.R4 When the original classification was reviewed some 17 years later, at a time when the growth in stroke research in general, and clinical trials in particular, was beginning to expand dramatically, MillikanR5 wrote: "It continues to be evident that in such a complex set of clinical-pathophysiological phenomena some standard reference language or set of definitions should be used or the literature of investigation will be uninterpretable."
The point about the need for a common language of communication continues to be of paramount importance in an era when the uses of a classification have broadened from observational epidemiology to clinical trials and, more recently, to the purchasing of healthcare. Perhaps most importantly, there are the individual clinicians caring for stroke patients who use the classifications to put the results from the research centers into the context of their daily practice.
Clearly, any scheme of classification that is used needs to be as reliable as possible, and the article by Goldstein et al describes their experience using a computer algorithm to improve the reliability of the widely used TOAST (Trial of Org 10172 in Acute Stroke Treatment) classification,R6 a scheme that had its origins in the Stroke Data Bank classification of the 1980s and whose originators recognized that "Interobserver agreement is essential to the reliability of clinical data from cooperative studies and provides the foundation for applying research results to clinical practice."R7
However, as Goldstein et al stress, their objective was only to try to standardize retrospective data collection. That is quite a different proposition from using the classification prospectively either in clinical research or to manage individual patients. Here, it is important to remember that diagnostic reliability (ie, interrater or intrarater agreement) should not be equated with diagnostic accuracy, something that requires a gold standard against which it can be judged, and which is lacking in vivo for many stroke mechanisms. Indeed, Johnson et alR8 noted that in the absence of such a gold standard, "the merit of a classification system depends on its clarity, utility and reproducibility."
It seems likely that the relatively modest interrater and intrarater agreement of the TOAST classification, when used prospectively in clinical practice,R9 R10 is in part a consequence of that rather nebulous, but extremely important, entity of clinical acumen, a complex interaction of pattern recognition and experience-influenced, repeated testing of a hypothesis against available evidence. Of course, such behavior does not sit easily alongside an administrative "bean-counting mentality," in which it is more important to have everything in a category, regardless of the accuracy of the categorization!
So have the various classifications of stroke mechanism served us well over the last 40 years? It seems to me that they have been rather blunt tools. Even at the population level, we know relatively little about the natural history of the groupings. Furthermore, they have failed to identify subgroups of patients who would benefit from acute interventions (the original raison d'être of the TOAST classification), and where secondary prevention treatments have been more successful, they have been targeted at much more specific groups, eg, patients with atrial fibrillation or carotid stenosis. One suspects that the multiple failures of acute intervention trials will prompt a thorough review of this whole area, and although it has been shown that advances such as multimodal MR can improve the reliability of the TOAST classification,R11 perhaps we should also consider other schemes that may have fewer links with the established clinicopathological paradigm. On the other hand, I think the current classifications do contribute to individual patient management, and harking back to Millikan's original aspirations, I am sure that many of us will continue to use the basic skeleton of the classification to bring greater "clarity of thinking" to our clinical practice. Indeed, Gross et alR7 observed that clinicians were able to use the classification to synthesize a number of basic clinical and investigative findings with relatively poor interrater and intrarater reliability to form a much more reliable overall diagnosis. However, I do not envisage sitting in the outpatient clinic using the algorithm of Goldstein et al on my Palm or Psion for diagnostic purposes!