(Stroke. 1996;27:238-242.)
© 1996 American Heart Association, Inc.
From the Division of Preventive Medicine, Department of Medicine, Brigham and Women's Hospital (K.B., J.E.B.), the Department of Neurology, Boston University School of Medicine (C.S.K.), and the Department of Ambulatory Care and Prevention, Harvard Medical School (J.E.B.), Boston, Mass.
Correspondence to Carlos S. Kase, MD, Department of Neurology, Boston University School of Medicine, 80 E Concord St, B-605, Boston, MA 02118.
| Abstract |
|---|
Methods Stroke subtype, stroke severity, and certainty of diagnosis were first classified from medical records from the years 1982 through 1988. The 216 stroke events reported in this period were independently reclassified in 1994 and compared with the initial classification using kappa statistics.
Results Overall agreement in major stroke types (hemorrhagic, ischemic, undetermined stroke) as well as in hemorrhagic stroke subtypes was excellent (κ=0.81 and κ=0.95, respectively). A wide range of values for the ischemic stroke subtypes (κ=0.13 to κ=0.96) was obtained. Agreement was substantial in assessment of stroke severity (κ=0.71), and it was fair (κ=0.33) for certainty of diagnosis.
Conclusions Interobserver agreement is high for major stroke types as well as for categories of hemorrhagic stroke on the basis of review of medical records and results of imaging data. The classification of ischemic stroke subtypes, however, is subject to substantial interobserver disagreement. Periodic reclassification of random samples of end points might be considered in long-term prospective studies to assess potential misclassification of events by different observers.
Key Words: classification, diagnosis, interobserver variation
| Introduction |
|---|
To address the issues of adequacy of medical record information as well as consistency of diagnosis over time, we independently reclassified and assessed interobserver agreement in the classification of the 216 strokes in the Physicians' Health Study reported before the early termination of the aspirin component of the trial in January 1988.
| Subjects and Methods |
|---|
While the beta carotene component of the study is still ongoing, the aspirin component was terminated early on January 15, 1988, because of the emergence of a statistically extreme 44% reduction in the risk of a first myocardial infarction among individuals assigned to aspirin.5 By that time, participants had been followed for an average of 60.2 months. Morbidity follow-up was 99.7% complete, and mortality follow-up was 100%.
A diagnosis of stroke was confirmed after the review of medical records and all other available information by the End Points Committee (consisting of two internists, one cardiologist, and one neurologist, all blinded to the treatment assignment). A definite stroke was defined as a focal neurological deficit that lasted longer than 24 hours and was attributable to a vascular event. Strokes were classified into six subtypes on the basis of presence of risk factors, the mode of onset, clinical findings, test results, and the nature and location of the occluded vessel: embolic infarct, atherothrombotic infarct, embolic or thrombotic infarct (undifferentiable), SAH, ICH, and stroke of undetermined type. The latter category applied to instances in which the clinical data, although consistent with stroke, did not allow a distinction between ischemic and hemorrhagic subtype. The confidence (certainty) about the diagnosis of stroke was classified as possible, probable, or certain. Severity of stroke at hospital discharge or at time of stabilization for patients who were not hospitalized was determined using the following six-grade scale: 1, no residual impairment; 2, minor nonfunctionally impairing deficit; 3, mild functional deficit with some restriction of lifestyle; 4, moderate deficit significantly interfering with activities of daily life; 5, dependent state requiring chronic care; and 6, fatal. All newly reported strokes were first classified according to this system by one neurologist (Harris H. Funkenstein, MD) between 1982 and 1988. In 1994, two other neurologists (K.B., C.S.K.) independently reclassified these strokes on the basis of the same medical records but blinded to the first classification. They discussed all cases to reach consensus on the reclassification of each stroke. The first and second classifications were then compared in terms of stroke subtype diagnosis, stroke severity, and certainty of diagnosis. 
The three neurologists who conducted the initial and second reviews of the cases based the diagnosis of stroke subtype on their best clinical judgment after reviewing all available clinical and laboratory information rather than on the application of predefined precise diagnostic criteria for each stroke subtype.
Statistical Analysis
The kappa statistic (κ)6 is the most commonly used measure of agreement for categorical data. It is chance corrected, ie, it compares the amount of observed agreement with that expected, taking into account the prevalence of the item measured. The unweighted κ for a dichotomized item (or symptom) and two observers has been extended to instances with multiple observers and to observations with more than two categories on an ordinal scale.7 8 In the latter situation, either standardized (ie, quadratic) or self-defined weights should be used. The κ based on quadratic weights is asymptotically equal to the intraclass correlation. However, the intraclass correlation is only to be used with interval data, while κ is based on nominal (two categories only) or ordinal data.9 We calculated quadratic-weighted κ for stroke severity and certainty of diagnosis, since these variables were measurable on an ordinal scale. In addition, unweighted κ values for a dichotomized outcome were calculated for each stroke category. To obtain a summary measure of agreement within major stroke types (ie, ischemic or hemorrhagic), an overall κ was calculated as the unweighted average10 of the corresponding single categories. If the agreement is that expected by chance, κ=0. Generally, a κ of 0.80 and higher can be considered as excellent agreement, between 0.40 and 0.80 as moderate to substantial, between 0.20 and 0.40 as fair, and less than or equal to 0.20 as slight or poor.11
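As an illustration of the statistic described above, the following minimal Python sketch computes an unweighted and a quadratic-weighted κ from a square table of counts for two raters. It uses the standard textbook formulation, κ = (observed agreement − expected agreement) / (1 − expected agreement), and is not the software used in this study. The function name `cohen_kappa` and the example counts are hypothetical.

```python
def cohen_kappa(table, weights="none"):
    """Chance-corrected agreement for a square table of counts.

    table[i][j] is the number of cases placed in category i by the
    first rater and in category j by the second rater.
    weights="quadratic" gives partial credit for near misses on an
    ordinal scale; weights="none" counts exact agreement only.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        if weights == "quadratic":
            return 1.0 - (i - j) ** 2 / (k - 1) ** 2
        return 1.0 if i == j else 0.0

    cells = [(i, j) for i in range(k) for j in range(k)]
    # Observed (weighted) agreement.
    p_observed = sum(weight(i, j) * table[i][j] for i, j in cells) / n
    # Agreement expected by chance from the marginal totals.
    p_expected = sum(
        weight(i, j) * row_totals[i] * col_totals[j] for i, j in cells
    ) / n ** 2
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical counts for illustration only, not data from this study.
example = [[20, 5], [10, 15]]
print(round(cohen_kappa(example), 2))  # 0.4
```

For a table with only two categories the quadratic weights reduce to the unweighted case, so both settings return the same value.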
| Results |
|---|
Table 1 presents a summary of agreement on major stroke types between the first and the second classifications. An overall agreement of 93.1% with a κ of 0.81 was obtained. None of the strokes initially classified as ischemic were reclassified as hemorrhagic. However, 12 of the initial ischemic strokes were reclassified as undetermined in contrast to only 2 such diagnoses in the first classification, indicating a more conservative approach in the later interpretation of diagnostic results. Only 1 of 35 strokes initially classified as hemorrhagic was reclassified as undetermined. This patient had a clinical presentation suggestive of SAH but without proof of the diagnosis by CT or lumbar puncture. The first classification of the event as hemorrhagic was based solely on the symptoms at onset; the second classification labeled the stroke as undetermined because of the lack of confirmatory laboratory data for intracranial hemorrhage.
Table 2 gives a summary of agreement for stroke subtypes. Every category of ischemic stroke from the first classification showed a spread over the ischemic stroke categories of the reclassification, indicating low agreement. The undetermined stroke category revealed good agreement, although the number of events classified was quite small. Agreement on hemorrhagic stroke categories was very high, reaching perfect agreement on ICH and reclassifying only 2 of 10 SAHs as ICHs.
Table 3 presents the κ for each stroke category. On the level of major stroke types, agreement was excellent, with κ=0.98 for hemorrhagic and κ=0.82 for ischemic strokes. Regarding stroke subtypes, agreement was also excellent for the two categories of hemorrhagic stroke but was substantially lower for the three categories of ischemic stroke. Undetermined strokes revealed a moderate agreement (κ=0.45) between the two classifications. The overall κ values for the two major stroke types represent an average of the agreement within the corresponding single categories. They reveal excellent agreement on the hemorrhagic stroke categories (κ=0.95) but only fair agreement on ischemic stroke subtype diagnosis (κ=0.34).
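A per-category κ of the kind reported in Table 3 can be obtained by collapsing the full cross-classification into a 2×2 table ("this category" versus "any other category") and computing the unweighted κ on that table. The sketch below shows one way to do this in Python. The helper names `dichotomize` and `kappa_2x2` and the counts are hypothetical, and the code is an illustration of the general technique rather than the authors' procedure.

```python
def dichotomize(table, category):
    """Collapse a k x k agreement table to 'category' versus 'all other'."""
    k = len(table)
    n = sum(sum(row) for row in table)
    both = table[category][category]
    first_only = sum(table[category]) - both
    second_only = sum(table[i][category] for i in range(k)) - both
    neither = n - both - first_only - second_only
    return [[both, first_only], [second_only, neither]]


def kappa_2x2(t):
    """Unweighted kappa for a 2 x 2 table of counts."""
    n = t[0][0] + t[0][1] + t[1][0] + t[1][1]
    p_observed = (t[0][0] + t[1][1]) / n
    # Chance agreement from the row and column totals.
    p_expected = (
        (t[0][0] + t[0][1]) * (t[0][0] + t[1][0])
        + (t[1][0] + t[1][1]) * (t[0][1] + t[1][1])
    ) / n ** 2
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical 3-category counts for illustration only.
counts = [[10, 2, 0], [3, 8, 1], [0, 1, 5]]
for category in range(len(counts)):
    print(category, round(kappa_2x2(dichotomize(counts, category)), 2))
```

Because each category is compared against all others combined, a category with few cases can show a low κ even when raw percent agreement is high.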
Table 4 shows the overall interobserver agreement on stroke severity. Since 10 cases were coded as having "missing information" in the first classification, only 206 cases were reevaluated. In addition, the category representing death (grade 6) was excluded from the calculation of the κ because perfect agreement was implicit and its inclusion would have artificially increased the overall κ value for stroke severity. Since severity is listed on a scale from 1 to 5 as defined earlier, only quadratic weights were used to calculate the weighted κ for this table. Agreement occurred in 94.3% of all cases compared with 80.4% expected by chance alone, yielding a moderate to substantial agreement (κ=0.71). A closer look at each category of severity reveals that the agreement was substantial for cases with no residual symptoms and was only slight for those with minor functional deficits, but it improved again in more severe cases. The following κ values were obtained for each category of severity: 0.47 (no residual, grade 1), 0.19 (nonfunctional deficit, grade 2), 0.17 (mild functional deficit, grade 3), 0.26 (moderate deficit, grade 4), 0.46 (severe deficit, chronic care, grade 5), and 1.00 (fatal, grade 6).
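The severity result can be checked by hand from the two agreement figures quoted above, using the standard relation κ = (observed − expected) / (1 − expected). The short Python sketch below does this arithmetic and maps the result onto the interpretation scale given in the Statistical Analysis section. The function names are hypothetical, and the handling of values falling exactly on 0.40 is an assumption, since the text does not say which band that boundary belongs to.

```python
def kappa_from_agreement(p_observed, p_expected):
    """Kappa from observed and chance-expected agreement proportions."""
    return (p_observed - p_expected) / (1.0 - p_expected)


def interpret(kappa):
    """Label a kappa value using the cutoffs quoted in the text.

    Assumption: a value of exactly 0.40 is placed in the higher band.
    """
    if kappa >= 0.80:
        return "excellent"
    if kappa >= 0.40:
        return "moderate to substantial"
    if kappa > 0.20:
        return "fair"
    return "slight or poor"


# Figures reported for stroke severity: 94.3% observed, 80.4% expected.
severity = kappa_from_agreement(0.943, 0.804)
print(round(severity, 2), interpret(severity))  # 0.71 moderate to substantial
```

The same calculation shows why a high raw agreement can still give a modest κ: with 80.4% agreement expected by chance, only 19.6 percentage points remain to be gained over chance.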
For confidence in stroke diagnosis (an item generated after the evaluation of all data concerning the event), total agreement was found in 78.2% of cases. Since the categories of possible, probable, and definite stroke also represent an ordering, only quadratic weights were used to calculate the weighted κ. A κ value of 0.33 was obtained, demonstrating only a fair interobserver agreement for degree of certainty in the stroke diagnosis.
| Discussion |
|---|
With regard to stroke subtype, interobserver agreement is generally lower for ischemic than hemorrhagic categories, regardless of the level of diagnostic workup. Previous evaluations of interobserver agreement on stroke classification from clinical impression and from medical records have found wide ranges for the κ.1 2 3 12 Gross et al1 found that the addition of complete workup information to the findings from physical examination and patient history alone increased the agreement from κ=0.15 to κ=0.38 when using a nine-category scale. Reducing the stroke subtypes to four (ischemic, SAH, ICH, and other) by collapsing all ischemic categories and providing the workup information to the same physician who initially examined the patient improved the κ value further to 0.69. The interobserver agreement among the physicians who only had access to complete written information on patients was not significantly lower than the agreement among those who examined the patient in person (κ=0.54 versus κ=0.61). However, the total number of patients in this study was quite small (n=17). Gordon et al3 tested the agreement of the ischemic stroke classification used in the TOAST (Trial of ORG 10172 in Acute Stroke Treatment) study13 by sending out 18 written case reports to 24 neurologists. They were asked to classify the cases on the basis of the patients' histories, description of the physical examination, and test results. The overall interobserver agreement was a significant 54% increase over chance (κ=0.54). Values for each subtype varied between κ=0.75 for cardioembolic stroke and κ=0.51 for ischemic stroke due to small artery occlusion. However, the number of patients was small (n=18), and the sample contained detailed clinical data and diagnostic workup, intentionally including instances of particular diagnostic uncertainty, to test the degree of variability in stroke subtype diagnosis among investigators selected to participate in a controlled clinical trial. In contrast, our study evaluated agreement on 216 stroke cases, both hemorrhagic and ischemic, with a heterogeneous level of diagnostic workup, reflecting general clinical practice.
The use of CT has greatly facilitated the diagnosis of hemorrhagic stroke and the differentiation of SAH and ICH. Thus, CT is for the most part responsible for the high levels of interrater agreement in these stroke subtypes (in our study, κ=0.96 for ICH and κ=0.82 for SAH) and in the diagnosis of "ischemic" stroke as a category defined by the absence of blood on initial CT. However, the situation is quite different in the classification of subtypes of ischemic stroke after exclusion of a hemorrhagic event by CT. Infarct subtype categorization is usually based on a combination of data, including the affected vascular territory and/or infarct mechanism (ie, [cardio]embolic versus atherothrombotic).12 13 14 15 16 17 18 19 20 Because this differentiation is derived from data on a variety of test results and clinical findings, the κ is generally lower, reflecting differences of opinion on the diagnostic value of clinical findings and laboratory results among physicians. Furthermore, disagreement among observers is enhanced by the lack of predefined strict criteria for the diagnosis of ischemic stroke subtypes, as shown in our study. However, the high interobserver reliability for major stroke types observed in this study serves the purpose of the Physicians' Health Study, which was primarily concerned with the incidence of ischemic or hemorrhagic events during use of an agent (aspirin) capable of altering the incidence of both types of cerebrovascular event. For trials of stroke therapy or for observation of potential differential effects of risk factors or interventions on specific subtypes of brain infarction, detailed diagnostic criteria need to be preestablished to ensure improved interobserver reliability.
Interrater agreement on stroke severity has not been previously evaluated in a retrospective manner. On the basis of the prospectively generated results reported by Shinar et al2 and Lindley et al,12 interrater agreement for neurological signs differs widely (eg, weak arm, κ=0.77; sensory loss, arm, κ=0.15).12 Thus, it is not surprising that we found better agreement on severity in cases with major residual deficits or other severe symptoms and in cases with no residual symptoms at all. Similar results were obtained by van Swieten et al,21 who found excellent interobserver reliability at both ends of a six-item modified Rankin disability scale,22 with fair agreement in the intermediate degrees of handicap after stroke. This study and ours also show that a six-step scale is adequate to classify poststroke handicap, resulting in an overall acceptable interrater agreement.
Agreement on the certainty of stroke diagnosis was only fair (κ=0.33) in our study. However, all cases initially diagnosed as stroke were diagnosed as stroke in the second review. The low level of agreement in the categories of certainty of diagnosis only reflects the assignment of different degrees of strength to the available evidence by the raters rather than doubt about the actual diagnosis of a stroke event. This subjective difference on the certainty of stroke diagnosis was also observed in the study of Gross et al.1 Although there were variations between "low" and "high" subjective confidence on initial clinical impressions among observers, their agreement on the final stroke diagnosis (based on all available data) was not different (κ=0.34 and 0.39, respectively).
The interpretation of our results has to take certain factors into account. The time elapsed between the first and the second classification varied between 6 and 12 years. Although the technical quality of diagnostic tests (especially neuroimaging and Doppler ultrasonography) substantially improved during this period, the reclassification was generally done using the same data as in the first classification. In a few instances, additional medical records from stroke recurrence with more data (such as MRI) related to the first event or the availability of diagnostic test results sent with long delay broadened the evidence for a specific subtype reclassification, but this was a rare occurrence and could not have accounted for a substantial difference in agreement.
In summary, these data demonstrate high interobserver agreement for major stroke types as well as for categories of hemorrhagic stroke with a classification system based on review of medical records. The classification of ischemic stroke subtypes carries some uncertainty because of the complexities of diagnosis based on interpretation of combined clinical and laboratory data and because of the lack of preestablished criteria for their diagnosis. Thus, if ischemic stroke subtype is a major end point in a clinical trial, it is evident that clear diagnostic criteria need to be established before initiation of the trial. These data also demonstrate that neurologist raters analyzing the same medical records years apart produced reliable results on the diagnosis of major stroke types as tested by interobserver agreement. Thus, misclassification of strokes in such circumstances is not likely to be a plausible explanation for the observed results. To ensure quality control in long-term prospective studies, periodic reclassification of a sample of randomly selected stroke end points by persons other than the study neurologists should be considered to identify possible personal diagnostic biases or misapplication of diagnostic criteria.
| Acknowledgments |
|---|
This article is dedicated to the memory of Harris H. Funkenstein, MD, who served as neurologist on the End Points Committee of the Physicians' Health Study from its inception until his untimely and tragic death.
Received October 4, 1995; accepted October 25, 1995.