Development of Performance Measures for Acute Ischemic Stroke
Background and Purpose— The purpose of the present study was to develop and rate performance measures for hospital-based acute ischemic stroke.
Methods— A national multidisciplinary panel of 16 individuals (2 stroke specialists, 2 general neurologists, 2 internists, 2 neuroscience nurses, 2 stroke advocacy organization representatives, 1 stroke rehabilitationist, 1 family practitioner, 1 emergency room physician, 1 neuroradiologist, 1 managed care organization director, and 1 hospital association representative) from 10 medical societies or lay organizations assisted in the development of 44 potential stroke performance measures. We developed evidence summaries for each of the performance measures and graded the level of evidence associated with each measure. The panel received a summary of the literature pertaining to each measure and rated the measures by use of a modified Delphi approach for 6 dimensions of quality, including validity of evidence, feasibility, impact on outcomes, room for improvement, plausibility, and an overall rating (little reason to do, could do, should do, and must do).
Results— Highly rated and agreed on performance measures for the overall rating include warfarin in atrial fibrillation, antithrombotics on hospital discharge, carotid imaging in appropriate patients, and use of stroke units. Additional measures notable for high agreement were heparins for deep-vein thrombosis prophylaxis and use of a stroke protocol. Panelists rated time-related thrombolytic measures such as head CT within 25 minutes highly on the room for improvement dimension but low on the overall dimension. Neurologists tended to rate measures lower than did nonneurologists (P<0.01) for all 9 measures pertaining to thrombolytic management.
Conclusions— Highly rated and agreed on performance measures exist in all domains of hospital-based stroke care.
Quality of health care varies throughout the United States and frequently does not meet professional standards.1 Patients at risk for or who have had a stroke often do not receive medical care consistent with current evidence-based standards.2 For example, only 57% of patients with hypertension and 25% to 64% of patients with hyperlipidemia receive proper treatment.3 In addition, only 18% of hospitals in 1 state had an organized stroke unit.4 On a national scale, the percentage of hospitalized patients with a stroke or transient ischemic attack discharged on an antithrombotic agent ranged from 72% to 90%, whereas the percentage of appropriate patients in atrial fibrillation discharged on warfarin ranged from 31% to 64%.5 Despite this growing body of data, we still have relatively little information as to the extent, variation, and cause of the stroke evidence–practice gap.
See Editorial Comment, page 2073
One reason that we know so little about what is happening to stroke patients during the course of receiving medical care is that we lack a standard set of clinical measures for providers, health systems, and payers to use to monitor the quality of care. However, new initiatives are underway to develop quality measurement systems for the structure and process of health care delivery in both the inpatient and outpatient setting, several of which include care of patients with cerebrovascular disease.5–9 The usefulness of these systems will depend largely on the validity and reliability of the measures chosen. Prior stroke quality improvement projects have chosen measures but have paid little attention to the measurement properties of the end points used.10–12 Poorly chosen measures in clinical trials have led to disastrous consequences, including patient harm, indecisive evidence, and wasted efforts.13 Although this area has not been studied in detail, we have no reason to expect that choosing measures poorly in quality improvement projects will not result in similar consequences.
Given the effect that stroke has in both human and financial terms and the evidence that clinical practice often falls short of the ideal, the objective of this project is to develop a menu of valid and reliable performance measures for hospital-based, acute ischemic stroke care. We present here the methods of developing performance measures and the results of the expert panel’s ratings for each measure along 6 dimensions.
Assembly of a National Panel
To assist in development and ranking of stroke performance measures, we convened a national panel composed of physicians, nurses, administrators, and advocacy representatives. We wanted to create a multiperspective panel of various stakeholders in the care of hospitalized stroke patients. We solicited nominations from 12 professional societies and organizations, including the American Academy of Family Physicians, the American Academy of Neurology, the American Association of Neuroscience Nurses, the American College of Emergency Physicians, the American College of Healthcare Executives, the American College of Physicians—American Society of Internal Medicine, the American Heart Association—American Stroke Association, the American Society for Neuroradiologists, the National Stroke Association, the Society of General Internal Medicine, and the VHA, Inc. From a list of 46 nominations, 16 individuals were selected on the basis of their field of expertise, geographic location, and time availability. The national panel included 2 stroke specialists, 2 general neurologists, 1 stroke rehabilitationist, 1 neuroradiologist, 1 emergency room physician, 2 internists, 1 family practitioner, 2 neuroscience nurses, 1 managed care organization director (who was also a family practitioner), 1 hospital association representative, and 2 stroke advocacy organization representatives (which included 1 patient representative). Names of the 16 panel members and their affiliated organizations are listed in Appendix 1.
Developing a List of Performance Measures
The research team generated an initial comprehensive list of potential measures and performed an on-site hospital “walk through” of a typical stroke patient’s hospital experience. This list of performance measures was augmented by reviewing available clinical practice guidelines (see below) and performance measures used in published quality improvement projects involving the care of stroke patients. We also consulted the following sources: Health Plan Employer Data and Information set (HEDIS 3.0),14 Computerized Needs-Oriented Quality Measurement Evaluation System (CONQUEST) 1.1, and the National Library of Healthcare Indicators performance measures.15
The panel reviewed and focused the development of the performance measures during a half-day, face-to-face meeting in June 1999. From these deliberations, the research team developed 44 performance measures in 8 domains of acute stroke care. The draft performance measures underwent an external review by 3 content experts for wording clarity and specificity. A draft listing of the 44 performance measures is located in Table 1. We focused primarily on process of care measures rather than on developing outcome measures.5,16
Preparing Evidence Summaries for the Performance Measures
We developed evidence summaries for each of the draft performance measures. We used explicit search strategies to locate clinical practice guidelines, systematic reviews, randomized controlled trials, and cost-effectiveness analyses (Table 2). The research team reviewed titles, abstracts, and articles from the literature identified. An article was accepted if the reviewer thought it provided evidence for a potential association between the process in question and better patient outcomes. We gave priority to literature that systematically identified and graded the quality of the evidence and that was issued by authoritative bodies (eg, guidelines and literature from specialty societies and government agencies).
We identified a total of 84 clinical practice guidelines, 68 systematic reviews, 11 narrative reviews, 52 randomized controlled trials, 6 cost-effectiveness analyses, and 18 other related studies. From this literature base, we prepared a detailed review of the evidence, linking each of the 44 draft performance measures to patient outcomes. In addition, we graded the level of evidence by use of criteria issued by the Fifth American College of Chest Physicians Consensus Conference on Antithrombotic Therapy.17 A copy of the search strategies, literature identified, and evidence summaries is available from the corresponding author (R.G.H.) on request.
Dimensions of Quality Used to Rate Performance Measures
The panel rated each measure along 6 dimensions of quality: validity of evidence, feasibility of measurement, impact on outcomes, room for improvement, plausibility, and overall. These 6 dimensions of quality and the method of rating the performance measures within each dimension were adapted from 3 sources. The validity of evidence and feasibility of measurement dimensions were adapted from the Rand/UCLA appropriateness methodology as previously published.18 Both of these dimensions were rated along a 9-point scale from a score of 1 (definitely not valid or feasible) to 9 (definitely valid or feasible). A measure was considered definitely valid if sufficient scientific evidence existed to support a link between performance of that measure and overall positive outcomes to patients. A measure was considered definitely feasible if information needed to assess adherence was thought to be available in the medical record or from patient or proxy surveys or interviews and likely to be accurate. The dimensions of room for improvement, plausibility, and impact on outcomes were adapted from the Harvard Q-SPAN-CD study, the goal of which was to identify a set of cardiovascular-related performance measures.19 Each of these quality dimensions was rated along a 5-point scale. The room for improvement dimension asked panelists to rate each measure from a score of 1 (no room for improvement) to 5 (substantial room for improvement). The plausibility dimension asked panelists how plausible it was to expect that a quality improvement activity within a typical stroke care setting in the United States could improve adherence to the measure, from a score of 1 (not at all plausible) to 5 (extremely plausible). The impact on outcomes dimension asked panelists to rate the relative effect on outcomes achieved if the measure is followed, from 1 (no impact) to 5 (very large impact).
The final dimension was an overall rating of the utility of the measure in stroke quality-of-care assessment. This dimension was adapted from clinical audit criteria for monitoring patients with diabetes.20 The 4 categories for rating the overall dimension were little reason to do, could do, should do, and must do.
Rating the Performance Measures
To rate the potential performance measures, we used a modified Delphi approach to achievement of consensus by combining evidence with expert judgment.21 Two rounds of ratings were done, and the first round was done by mail. Each panelist received a copy of the evidence summaries and a measure-rating booklet; panelists had 3 weeks to rate each measure on the 6 dimensions. A copy of the measure-rating booklet is available from the corresponding author (R.G.H.) on request.
The second rating occurred during a face-to-face meeting November 9, 1999. Before this meeting, panelists received a summary of the results obtained from the first round of ratings. For each measure and each quality dimension, the panelists received an anonymous distribution of the ratings of the other panelists in addition to a notation of their own ratings. Therefore, 16 unique summary tables were prepared (1 for each panel member).
A structured approach was followed during the face-to-face meeting, during which the panelists rerated each of the performance measures. First, the measure was read aloud to the panel. Second, the panel discussed the wording of the measure and any wording changes necessary to improve its clarity and specificity. Third, the meeting comoderators (R.G.H. and B.G.V.) initiated a discussion about the measure, summarizing to the group the distribution of responses across the quality dimensions. The comoderators attempted to limit the role of dominant members and encourage the participation of all panel members. Fourth, once a discussion of a measure was complete, the process was repeated for each measure in a particular domain of care, and the panel rerated the performance measures. This process was repeated until all performance measures were rerated. The wording of 7 performance measures changed on the basis of discussion during the meeting.
For each of the 44 performance measures in each of the 6 quality dimensions, we present the frequency of panel responses. For each domain, we highlight those measures that ≥75% of the panel members (≥12) rated higher than a cutoff score set by the research team and consider these measures highly rated. For the overall dimension, we considered measures highly rated if ≥12 of the 16 panel members rated the measure should or must do. For the validity and feasibility dimensions, we considered measures highly rated if ≥12 of the 16 panel members rated the measure 7, 8, or 9. For the impact on outcomes, room for improvement, and plausibility dimensions, we considered measures highly rated if ≥12 of the 16 panel members rated the measure 4 or 5. In addition, we present the mean and median score for each measure and rank order the measures by mean scores. To quantify the amount of variability for the rating of each performance measure, we present the SD. The smaller the SD, the more homogeneous the observations. Because the observations are, in fact, panel-member ratings, the SD can be interpreted as a measure of agreement. Using Spearman rank correlation coefficients, we assessed the association among the individual median ratings for each quality dimension. We also present the ratings on the overall quality dimension on the basis of whether the panelist was a neurologist (n=5) or a nonneurologist (n=11). We compared the responses of neurologists with those of nonneurologists on the overall rating using the Wilcoxon rank test.
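The rating rule and summary statistics described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used in the study: the function names, the coding of the overall categories as 1 (little reason to do) through 4 (must do), the choice of the sample SD, and the example ratings are all assumptions made for illustration.

```python
from statistics import mean, median, stdev

# Lowest score that counts as a "high" rating on each dimension (from the text):
# 7 on the 9-point scales, 4 on the 5-point scales, and "should do" on the
# overall scale, coded here as 1-4 (an assumed coding).
HIGH_CUTOFF = {
    "validity": 7,
    "feasibility": 7,
    "impact": 4,
    "room_for_improvement": 4,
    "plausibility": 4,
    "overall": 3,
}
MIN_PANELISTS = 12  # >=75% of the 16 panel members


def is_highly_rated(ratings, dimension):
    """Return True if at least 12 panelists gave a rating at or above the cutoff."""
    cutoff = HIGH_CUTOFF[dimension]
    return sum(r >= cutoff for r in ratings) >= MIN_PANELISTS


def summarize(ratings):
    """Mean, median, and sample SD of one measure's ratings on one dimension."""
    return {"mean": mean(ratings), "median": median(ratings), "sd": stdev(ratings)}


# Hypothetical overall ratings from 16 panelists for two made-up measures
# (not study data).
measure_a = [4] * 13 + [3] * 3                      # tight agreement
measure_b = [4] * 6 + [3] * 3 + [2] * 4 + [1] * 3   # wide spread

for name, ratings in [("measure_a", measure_a), ("measure_b", measure_b)]:
    print(name, is_highly_rated(ratings, "overall"), summarize(ratings))
```

In this made-up example, measure_a meets the ≥12 of 16 rule and measure_b does not, and the larger SD of measure_b reflects weaker agreement among the hypothetical panelists.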
Of the 16 panelists, ≥12 rated 21 of the 44 measures either should or must do (Table 3). The 2 most highly rated performance measures with the most agreement were warfarin in atrial fibrillation and antithrombotics on hospital discharge. The carotid imaging and stroke unit performance measures also were rated highly, with tight agreement. The recombinant tissue plasminogen activator (rtPA) treatment considered and rtPA treatment on protocol performance measures were more highly rated than the rtPA within 3 hours measure, but the rtPA within 3 hours measure lacked agreement with a spread of responses and an SD that is 37th widest (1.1) of the 44 performance measures. Two performance measures were notable for the amount of rating agreement: DVT/PE: heparins and stroke protocol. The stroke protocol measure was not highly rated, with a mean score that ranked 27th of 44, but it had an agreement rank based on its SD of 4 (0.5). All panel members ranked the acute aspirin measure as either must or should do except 1, who ranked the measure as little reason to do. The 2 acute diagnostic tests that ranked most highly were the serum glucose and ECG performance measures. Notably, 19 performance measures ranked higher than the avoid sublingual nifedipine measure in the overall ratings. The initial imaging performance measures ranked very low on the list, as did most of the time-to-treatment thrombolytic performance measures.
As shown in Table 4, ≥12 panelists highly rated (7, 8, or 9) 8 performance measures. The warfarin in atrial fibrillation and antithrombotics on hospital discharge had the highest validity ratings with the most agreement. The stroke unit and acute aspirin measures also were rated highly. Panel members often rated the validity of the scientific evidence higher than the level of evidence provided by the research team (Table 3). For example, of the 8 highly rated measures, 3 had A-level evidence, 2 had B-level evidence, and 3 had C-level evidence (see Table 2 for further information on evidence level17).
Feasibility was rated more highly than validity, and most of the measures (38 of 44) were rated 7, 8, or 9 (Table 5). Many of the early diagnostic-test performance measures were rated highly, including serum glucose, complete blood count (CBC), platelets, prothrombin time/partial thromboplastin time (PT/PTT), and serum electrolytes, as was warfarin in atrial fibrillation. Performance measures that ranked lower on the feasibility list included the time-to-thrombolytic measures (eg, stroke expert within 15 minutes, head CT read within 45 minutes, and to bed within 3 hours).
Impact on Outcomes Dimension
Only 3 measures were rated highly on the impact on outcomes dimension: warfarin in atrial fibrillation, antithrombotics on hospital discharge, and stroke unit (Table 6). Panelists felt that the 2 swallow assessment performance measures had a larger impact on patient outcomes than the mobilization and rtPA measures.
Room for Improvement Dimension
Eighteen measures were highly rated; the highest was stroke unit (Table 7). Many of the time-to–thrombolytic treatment performance measures, such as head CT done within 25 minutes, rtPA within 3 hours, stroke expert within 15 minutes, and head CT read within 45 minutes, also were rated highly. The education/support measure also was thought to need much or substantial improvement. In addition, many of the performance measures that ranked highly on feasibility were at the lower end of the rankings in the room for improvement dimension (eg, glucose, CBC, platelets, and PT/PTT).
Nine of the measures were rated highly on the plausibility dimension, including 3 that would require substantial changes in the infrastructure of stroke care delivery: stroke unit, rtPA considered, and head CT done within 25 minutes (Table 8).
Relationships Among Dimensions
Pairwise correlations of medians for validity, impact on outcomes, room for improvement, plausibility, and the overall rating ranged from 0.43 to 0.82 (all P<0.004). Feasibility ratings were not correlated with any other dimension except for a negative correlation with room for improvement (−0.5, P=0.0001).
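A Spearman rank correlation between the median ratings on 2 dimensions can be computed as the Pearson correlation of tie-averaged ranks. The sketch below assumes that approach; the helper names and the 6 pairs of median ratings are hypothetical and are not taken from the study tables.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the tie-averaged ranks.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical median ratings for six made-up measures (not study data).
feasibility = [9, 9, 8, 7, 6, 5]
room_for_improvement = [2, 3, 3, 4, 4, 5]
print(round(spearman(feasibility, room_for_improvement), 2))
```

Median ratings on short ordinal scales contain many ties, which is why the ranks are averaged within tied runs rather than assigned arbitrarily.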
Neurologists Versus Nonneurologists
Neurologists tended to rate the overall quality dimension lower than the nonneurologists (P<0.01, Table 9). Two performance measures that the neurologists rated higher than the nonneurologists were avoid blood pressure (BP) therapy and DVT/PE: heparins. The neurologists rated all of the measures relating to thrombolytic management lower on the overall dimension compared with the nonneurologists. Neurologists also were more inclined to give lower ratings to the swallow assessment performance measures and the smoking cessation measure compared with the nonneurologists.
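A comparison of this kind for a single measure can be sketched as an exact two-sample rank-sum test. The sketch assumes that the "Wilcoxon rank test" named in the methods is the two-sample rank-sum form, and it does not reproduce how ratings were pooled across measures, which the text does not specify. With groups of 5 and 11 panelists there are only 4368 possible group assignments, so full enumeration is feasible and handles tied ratings without a normal approximation. The function names and ratings are hypothetical.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def exact_rank_sum_test(group_a, group_b):
    """Two-sided exact rank-sum test by enumerating all group assignments.

    Returns (W, p), where W is the rank sum of group_a and p is the share of
    assignments whose rank sum is at least as far from its null expectation.
    """
    pooled = list(group_a) + list(group_b)
    ranks = average_ranks(pooled)
    n_a = len(group_a)
    observed = sum(ranks[:n_a])
    expected = n_a * (len(pooled) + 1) / 2  # mean rank sum under the null
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n_a):
        w = sum(ranks[i] for i in idx)
        total += 1
        # Small tolerance guards against floating-point noise in the comparison.
        if abs(w - expected) >= abs(observed - expected) - 1e-9:
            extreme += 1
    return observed, extreme / total


# Hypothetical overall ratings (1-4) for one made-up measure (not study data).
neurologists = [2, 2, 3, 2, 3]                        # n=5
nonneurologists = [4, 3, 4, 4, 3, 4, 4, 3, 4, 4, 3]   # n=11
w, p = exact_rank_sum_test(neurologists, nonneurologists)
print(f"W={w}, two-sided exact p={p:.4f}")
```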
The rationale for developing stroke performance measures is to promote performance excellence and to improve quality of care. Performance excellence in health care requires that performance results are measured, trended, and compared with prior performance or best in industry and that best practices are deployed and aligned in a practice or on an organization-wide basis. The goal of a structured approach to performance improvement built around cycles of learning is to deliver ever-improving value to patients while at the same time maximizing the overall practice performance and capabilities. A stroke care self-assessment is important because one can gain self-knowledge and identify opportunities for improvement. In addition, a standard set of core stroke measures can facilitate a common language that promotes benchmarking and sharing of best practices. Ultimately, the best set of stroke performance measures may be a flexible menu of performance measures, adaptable to the needs of each provider, stroke team, or organization. An increasing number of quality-improvement initiatives exist both in the public and private sectors that could benefit from a standard menu of stroke performance measures.
The present study reveals 21 performance measures rated highly by our definition (ie, ≥75% of panelists rated should or must do) for hospitalized patients with acute ischemic stroke. The 2 most highly rated and agreed-on performance measures for the overall quality dimension in the present study are 2 of the 3 performance measures being used by the Health Care Financing Administration (HCFA) to create a monitoring system that supports quality improvement for fee-for-service Medicare beneficiaries (warfarin in atrial fibrillation and antithrombotics on hospital discharge).5 The third measure (avoid sublingual nifedipine) being used by HCFA in their quality-monitoring program was in the highly rated group, but 19 measures ranked higher. In addition, the measure that ranked third on the overall dimension in our analysis (carotid imaging) is similar to 1 of 3 stroke performance measures used in a study of Medicare claims data to measure underuse.8 Finally, in a recently published conference proceeding on measurement and improvement of quality of stroke care, several structure and process measures were recommended.22 Several of their recommendations received high ratings on the overall dimension in the present study (antithrombotics on hospital discharge and warfarin in atrial fibrillation), whereas others did not (initial imaging: 24 hours and treat fever).
No standard method exists for developing, rating, and analyzing the results of performance measure ratings, and several methods have been used.19,23–25 We used an explicit modified-Delphi consensus method to combine evidence with expert judgment, a method being used more frequently in health care research.21 No attempt was made to “force” consensus, and what we attempted to achieve was the best possible consensus on a given date, with a given amount of evidence, with a particular composition of panel members. We assembled a multidisciplinary panel with broad US representation and stroke care perspectives. The modified-Delphi approach has limitations and theoretically can represent a collective error in judgment or inference and can be influenced by panel composition. For example, prior research has shown that multispecialty panel ratings are more divergent than single specialty panels.26 However, we believe that a rigorously applied method and prudent exercise of inference and common sense in the face of incomplete evidence is an integral and necessary component for developing valid performance measures, particularly in highly prevalent conditions.
In addition, we present all of the data including the mean, median, and SD, recognizing the potential limitation of nonnormality. Both the absolute score and the distribution of responses are important when evaluating the rating results of the performance measures. For example, for the overall rating, the stroke protocol measure had a relatively low absolute score (mean 2.7, median 3.0), but tight agreement existed, as evidenced by the distribution of responses (SD 0.5). In contrast, the rtPA within 3 hours measure had a much higher absolute score (mean 3.2, median 4.0) but much less panel agreement (SD 1.1). Disagreed-on performance measures may be more difficult to implement and monitor despite ardent support by a majority.
Seven of the 18 highly rated measures on the room for improvement dimension were for the thrombolytic performance measures. However, despite this perceived need for room for improvement, many of the thrombolytic performance measures received low scores on the overall ratings, particularly the time-to-treatment measures. This may be explained in part by the relative disagreement demonstrated for most of the thrombolytic measures. Also, neurologists rated all 9 performance measures pertaining to thrombolytic management lower than did the nonneurologists on the overall quality dimension, a surprising finding given that other panel ratings have found that those who use a particular technology tend to rate higher than those who do not.26 This may reflect the particular bias of the neurologists on the panel in that only 2 were stroke specialists.
The present study also addresses an area of difficulty in the field of quality measurement and improvement. We found a negative correlation between the room for improvement and feasibility ratings, which indicates that the performance measures thought to be most in need of improvement were also thought to be the most difficult to collect. This negative correlation again emphasizes the limits of the currently available data sources, primarily administrative data and the medical record, for quality-of-care assessment, and the need to develop new methods of information retrieval and tracking.
Of 44 potential performance measures in the present study, only 3 were graded as having A-level evidence, whereas most of the performance measures (34 of 44) had C-level evidence. The present study highlights the lack of research evidence for most of what we do for hospitalized patients with ischemic stroke and that we cannot expect clear verdicts from randomized controlled trials with sufficient statistical precision to address most of our processes of care. These occurrences also probably explain the lack of overall panel agreement for rating the quality dimensions. However, despite the lack of research evidence to support the performance measures, many panelists rated the validity of the evidence as relatively strong. In addition, most of the panelists thought that quality-of-care assessment should or must be performed on 21 of the measures, despite a paucity of evidence to support a link between the measure and patient outcomes for most of them.
What are the implications of this research for performance improvement activities? For local efforts, individual hospitals or stroke programs may want to review the list of highly rated indicators and focus on measuring, trending, and improving adherence to ≥1 of the performance measures. Selection of the measures should be guided by local perceived need, availability of resources, comprehensiveness of services, and relative ranking of the measures on the dimensions of quality. Keep in mind that most of these performance measures have not been tested yet in terms of reliability. The validity of a stroke performance measure is necessary but not sufficient for quality-of-care purposes, and a validated measure can be made invalid if not reliably collected.27 Therefore, a local program interested in measuring and tracking patient performance on prophylaxis for deep-vein thrombosis and pulmonary embolism (DVT/PE), for example, must be confident that data can be accurately collected. The issue is no different for national quality-improvement initiatives, for which states, providers, hospitals, or health systems may be held accountable in terms of having their adherence rates made public. However, under such circumstances, the reliability of the performance measures must be established before the data are measured and reported on. We know from a recent study that 3 performance measures (warfarin in atrial fibrillation, antithrombotics on hospital discharge, and avoid sublingual nifedipine) had excellent interrater reliabilities with trained abstractors.5 More research is needed to assess the reliability of collected data for other performance measures, and the results from the present study can guide the selection of what measures to test by focusing on those measures that were rated highly.
Other areas also are in need of future research. Continued development and testing is needed of ischemic stroke measures for care before and after hospitalization and for patients with transient ischemic attacks, intracerebral hemorrhages, and subarachnoid hemorrhages. More research is needed on exploiting existing data sources and unearthing relevant data about quality, but this is not enough. We need to harness the potential of information technology for quality improvement by developing new systems of data collection that will provide us with additional opportunities to track, trend, and compare stroke performance measures. Finally, the lack of scientific data to support what we do for patients is startling, and major new investments in studies to establish links between process and outcomes of care would provide better information for decision making.
Given the burden of stroke in terms of suffering and cost, the evidence that practice often falls short of the ideal, and the desire to go beyond conformance to standard to focus on excellence, we have developed potential performance measures for hospitalized patients with acute ischemic stroke ultimately to be used in national and local initiatives to improve quality of stroke care. By such efforts we will begin to narrow the evidence-practice gap and truly give stroke patients precisely what they need and want, precisely when they need and want it.
Panel Members and Affiliations
Harold P. Adams, MD, American Academy of Neurology; David C. Anderson, MD, American Academy of Neurology; Carol A. Barch, MN, CRNP, CNRN, American Association of Neuroscience Nurses; Andrew S. Jagoda, MD, FACEP, American College of Emergency Physicians; Daniel L. Kent, MD, American College of Physicians; Edgar J. Kenton, MD, American Academy of Neurology; Walter N. Kernan, MD, Society of General Internal Medicine; Richard E. Latchaw, MD, American Society for Neuroradiologists; Laura Lennihan, MD, American Academy of Neurology; L. Gordon Moore, MD, Strong Health Managed Care Organization, Rochester, New York; Meighan St. John Girgus, American Stroke Association (American Heart Association); Jeffery Saver, MD, American Academy of Neurology; Yvonne Schooley, RN, MSN, CNRN, American Association of Neuroscience Nurses; Kenneth Smithson, MD, VHA Inc; Jacquelyn Mayer-Townsend, National Stroke Association; Paul E. Van Gorp, MD, American Academy of Family Physicians.
The present work was funded by the New York State Department of Health and the American Academy of Neurology. We thank the National Expert Stroke Panel, whose commitment to quality patient care will provide the energies needed to improve the performance of stroke care delivery. We also thank Neil Wenger, MD, MPH, Robert Brook, MD, ScD, and Justine Zentner, RN, NP, for suggestions during the conceptual phase of the present project. We thank Linda Williams, MD, David Matchar, MD, and Karen Johnston, MD, for their review of the first draft of the performance measures and Hongwei Zhao, PhD, and Antai Wang for statistical advice. We thank Sehyun Kim, PhD, for programming assistance and Carolynn O’Connell for coordination of the present project and preparation of the manuscript.
- Received January 4, 2001.
- Accepted June 6, 2001.
Holloway RG, Benesch C, Rush SR. Stroke prevention: narrowing the evidence-practice gap. Neurology. 2000; 54: 1899–1906.
Kalra L, Perez I, Melbourn A. Stroke risk management: changes in mainstream practice. Stroke. 1998; 29: 53–57.
Goldstein LB, Hey LA, Laney R. North Carolina Stroke Prevention and Treatment Facilities Survey. Stroke. 2000; 31: 66–70.
Gordon DL, Cobb AB, McIlwain JS, Keller C, Roach CA, Miller D, Sanchez N, Guy B, Meydrech EF. Cooperative stroke management project by a peer-review organization. J Stroke Cerebrovasc Dis. 1996; 6: 45–53.
Goldman RS, Hartz AJ, Lanska DJ, Guse CE. Results of a computerized screening of stroke patients for unjustified hospital stay. Stroke. 1996; 27: 639–644.
Committee on Performance Measurement of the National Center for Quality Assurance. Health Plan Employer Data and Information Set: HEDIS 3.0. Washington, DC: National Committee for Quality Assurance (NCQA). 1996. Draft.
National Library of Healthcare Indicators: Health Plan and Network Edition. Oakbrook Terrace, Ill: Joint Commission on Accreditation of Healthcare Organizations; 1997.
Naylor CD. Assessing processes and outcomes of medical care. Ann R Coll Surg Can. 1997; 30: 157–161.
Brook RH. The Rand/UCLA Appropriateness Method. In: McCormick KA, Moore SR, Siegel RA, eds. Clinical Practice Guideline Development: Methodologic Perspectives. Rockville, Md: US Department of Health and Human Services; 1994: 59–65. AHCPR Publication No. 95-0009.
Normand SL, McNeil B, Peterson LE, Palmer RH. Eliciting expert opinion using the Delphi technique: identifying performance indicators for cardiovascular disease. Int J Qual Health Care. 1998; 10: 247–260.
Baker R, Fraser RC. Fortnightly review: development of review criteria: linking guidelines and assessment of quality. BMJ. 1995; 311: 370–373.
Jones J, Hunter D. Consensus methods for medical and health services research. BMJ. 1995; 311: 376–380.
Measuring and improving quality of care: a report from the American Heart Association/American College of Cardiology First Scientific Forum on Assessment of Healthcare Quality in Cardiovascular Disease and Stroke. Stroke. 2000;31:1002–1012.
Naylor CD, McGlynn EA, Leape LL, Pinfold SP, Bernstein SJ, Hilborne LH, Park RE, Kahan JP, Brook RH. Coronary angiography and revascularization: defining procedural indications through formal group processes: the Canadian Revascularization Panel, the Canadian Coronary Angiography Panel. Can J Cardiol. 1994; 10: 41–48.
Continuous Quality Improvement in Stroke Care
Video meliora, proboque: Deteriora sequor.
[I see the better way and approve it; I follow the worse course.]
ISO 9000 is an accepted method of quality assurance throughout industry. It is global and it works because it changes behavior. This trend has yet to fully encompass the practice of clinical medicine. Only a single clinical standard, on the protection of the airway during laser surgery to the upper airway, is listed on the ISO website.1 Yet medicine is ripe for this philosophy of quality improvement. Clinical standards in the form of clinical pathways and practice guidelines are abundant, but there is little evidence that they change physician behavior or improve patient outcomes.
A tautology: to improve, one must have knowledge of what needs improving and a method of measuring it. Such knowledge may appear intuitive, but disagreement on apparently simple targets may be so great as to disarm the process. In stroke medicine, it is well known that stroke units save lives, reduce long-term disability, improve quality of life, and probably reduce total stroke expenditures. This has been known for a decade and consistently confirmed in systematic reviews.2,3 However, few stroke units exist in North America.
This kind of evidence-to-practice gap is the basis for the article by Holloway et al, in which they explicitly develop standards for clinical acute stroke care. Performance measures were developed and then evaluated in a modified Delphi conference format. The panelists included representatives from major organizations with a stake in stroke care, including nurses, emergency physicians, family physicians, neurologists, internists, administrators, and the public represented by US national stroke organizations. This was a representative panel rather than an expert panel, but the panel included well-known US stroke experts. This difference is important in assessing the authors’ conclusions.
Each of the 44 performance measures was rated on 6 domains. To highlight the results, only 3 of the 44 performance measures were found to have grade A1 evidence4 to support them: stroke unit care, warfarin for atrial fibrillation, and antithrombotic therapy at discharge. Not surprisingly, these 3 were the only measures for which there was substantial agreement on the impact on outcomes domain. Five additional measures showed substantial agreement for overall utility: acute aspirin, carotid imaging, tPA treatment consideration, tPA treatment protocol, and early mobilization at 24 hours.
The study provides a menu of targets for improvement. The inclusive nature of the panel suggests that these targets would appeal to a wide audience. While experts may disagree on the relative order of importance of targets, and external validation of these results by other groups with different compositions (eg, single-specialty) would raise content validity, few would dispute the main findings of the panel. The application of their findings will depend on the local structure of stroke care delivery.
This process underscored the lack of definitive (grade A) evidence of efficacy for the majority of stroke-care interventions (evidence-to-practice gap) and the lack of evidence to demonstrate effective care where efficacy has been shown (practice-to-evidence gap). Clinical stroke and health service researchers have both a blank slate and an unobscured list of proofs to provide. Administrators and all health professionals who care for stroke patients have been given targets. Holloway et al have issued a challenge.
Department of Clinical Neurosciences
University of Calgary
Calgary, Alberta, Canada
Division of Neurology
University Health Network
University of Toronto
Toronto, Ontario, Canada
Available at: http://www.iso.ch. Accessed June 25, 2001.
Collaborative systematic review of the randomised trials of organised inpatient (stroke unit) care after stroke: Stroke Unit Trialists’ Collaboration. BMJ. 1997; 314: 1151–1159.
Organised inpatient (stroke unit) care for stroke: Stroke Unit Trialists’ Collaboration. Cochrane Database Syst Rev. 2000;2:CD000197.
Guyatt GH, Cook DJ, Sackett DL, Eckman M, Pauker S. Grades of recommendation for antithrombotic agents. Chest. 1998; 114: 441S-444S.