Two Simple Questions to Assess Outcome After Stroke
A European Study
Background and Purpose—The “2 simple questions” were designed as an efficient way of measuring outcome after stroke. We assessed the sensitivity and specificity of this tool, adapted for use in 8 European centers, and used it to compare outcomes across centers.
Methods—Data were taken from the Biomed II prospective study of stroke care and outcomes. Three-month poststroke data from 8 European centers were analyzed. Sensitivity and specificity were assessed by comparing responses to the 2 simple questions with Barthel Index and modified Rankin scale scores. Adjusting for case mix, logistic regression was used to compare patients in each center with “good” outcome (not dependent and fully recovered) at 3 months.
Results—Data for 793 patients were analyzed. For the total sample, the dependency question had a sensitivity of 88% and a specificity of 77%; the recovery question had a sensitivity of 78% and a specificity of 90%. Dependency data from Riga had much lower sensitivity. There was variation in good outcome between centers (P=0.0015). Compared with the reference center (Kaunas), patients in Dijon, Florence, and Menorca were more likely to have good outcome, after adjusting for case mix.
Conclusions—Dependency and recovery questions showed generally high sensitivity and specificity. There were significant differences across centers in outcome, but reasons for these are unclear. Such differences raise particular questions about how patients interpreted and answered the simple questions and the extent to which expectations of recovery and perceived needs for assistance vary cross-culturally.
Assessment of patient outcomes is essential for quality assurance of care, for investigations of the efficacy and effectiveness of new methods of treating or caring for patients, and for investigation of the efficiency of resource use. Mortality is the most straightforward outcome to measure, but this provides limited information about the efficacy of an intervention or the quality of a service. Outcomes such as disability or handicap provide more information for evaluators, but even commonly used measures may be interpreted in different ways.1
There is a drive toward comparing outcomes cross-nationally. This can be related to several factors, including the World Health Organization’s goal of “Health for All,”2 the globalization of markets targeted by multinational pharmaceutical companies, and, in the European context, moves toward harmonization within the region. This drive has led to a need for indicators which are equally meaningful across countries so that questions may be expected to elicit the same types of responses.
Many measures are lengthy, making them unsuitable for large-scale studies. Two simple questions were developed by Lindley et al3 to meet the need for a simple, inexpensive, and quick way of assessing functional outcome in large numbers of stroke patients recruited to an open, randomized, controlled trial. It was suggested that the tool would also be useful to monitor patient outcome in routine practice and clinical audit.4 5 The questions were developed to assess dependency and recovery.
The 2 simple questions were developed in the United Kingdom and were used in the pilot and subsequent International Stroke Trial (IST).6 The IST authors accepted the validity and reliability of the 2 simple questions, although in fact studies testing the measures were conducted only in the United Kingdom, whereas the IST involved centers in many non-English speaking countries. Because IST data are not presented in disaggregated form, we do not have information about any possible intercountry differences in responses to the 2 simple questions. In this study we aimed to investigate the sensitivity and specificity of the 2 simple questions used in a pan-European prospective study of stroke care, resource use, and outcomes. If possible, we then aimed to compare outcomes across centers using this tool.
Subjects and Methods
Data were collected in the European Union BIOMED II stroke study, which was undertaken to investigate the relationships between resource use, costs, and outcome of different packages of care for stroke. The specific objectives have been outlined previously.7 8 Approval was obtained from local ethics committees. The project involved 13 centers located in 12 western and eastern European states (England, Denmark, France, Germany, Hungary, Italy, Latvia, Lithuania, Portugal, Spain, Poland, and Russia). In each center, a hospital-based register was established to collect patient data prospectively for 1 calendar year. Data collection related to first-ever stroke admissions using the World Health Organization definition.9 Stroke-specific questionnaires were developed from previous register questionnaires formulated by participants and were in agreement with those used by the MONICA Stroke Study.10 The clinical data items were routinely collected in the centers and have been collected for population studies in Europe previously.11 Case-mix variables and other outcome measures were agreed following discussions by study participants at 2 workshops before the start of data collection. The study coordinating team visited each center to oversee data collection, and a manual of definitions was distributed to all centers. Issues related to data collection, interpretation, and quality were discussed at the site visits and at 6 monthly meetings of the group during the 3 years of the project. Data were collected by dedicated researchers in each center at time of admission to hospital and subsequently at 3 and 12 months by face-to-face interview, telephone interview, or mailed questionnaire, according to local preference.
Sociodemographic data collected included age, gender, and place of residence (at home alone, at home with carer, or institution such as nursing home). Clinical data collected included measures of stroke severity (urinary incontinence, level of consciousness, and any limb weakness at time of maximum impairment).11 At the 3-month follow-up, the patients were reassessed. In addition to clinical data collected at onset, resource use data were collected (including use of clinical services, use of social services, and assistance from informal carers), as were outcomes including death, disability measured by the Barthel Index (BI)12 and the modified Rankin scale (RS),13 and responses to the 2 simple questions.
The 2 simple questions assess (1) patient dependency through the question, “In the last 2 weeks did you require help from another person for everyday activities?” and (2) recovery through the question, “Do you feel that you have made a complete recovery from your stroke?” The questions were included in the Biomed II questionnaire as an adjunct to the agreed functional outcome measures. Following discussion at an early planning meeting of participants to introduce the questions and clarify the concepts, the questions were then translated into the relevant language by each local study coordinator.
For the present study, data from 8 centers were analyzed (ie, all centers at which the 2 simple questions were asked at 3-month follow-up). Five centers were excluded from this analysis because they did not collect either the 2-simple-questions data or 3-month follow-up data.
Testing the Validity of the 2 Simple Questions
To investigate how successfully the 2 simple questions captured dependency and recovery, sensitivity and specificity were calculated. Sensitivity is defined as how accurately a question identifies positive cases. Specificity is defined as how well a question identifies negative cases.14 The standard way of calculating sensitivity and specificity of a new measure is to compare responses to the new measure with responses to a gold standard or usual measure. Following Lindley et al,3 we defined dependence in 2 ways: BI score of <20 and modified RS score of 3, 4, or 5. Recovery was defined as modified RS score 0 (a BI score of 20 indicates independence in functional abilities).15 The modified RS is scored as follows: 0, no symptoms; 1 or 2, functionally independent; 3, moderate handicap; and 4 or 5, moderate to severe handicap.13 We further hypothesized that recovery would be defined by responders as a return to prestroke ability. Therefore, we also defined recovery as a 3-month modified RS score the same as or better than the prestroke modified RS, and 3-month BI same as or better than prestroke BI.
To test the validity of the 2 simple questions, we compared responses to these questions with BI and modified RS scores. One center, London, did not collect the modified RS score and is therefore not included in this part of the analysis. For both the dependency and the recovery questions, sensitivity and specificity were calculated for each center individually as well as for the total sample.
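As an illustration of this comparison, the following Python sketch computes sensitivity and specificity for the dependency question against the two reference definitions given above (BI <20, or modified RS of 3, 4, or 5). The function names, field names, and patient records are hypothetical and are not taken from the study data or from the software the authors used.

```python
def sensitivity_specificity(answers, reference):
    """Return (sensitivity, specificity) for paired boolean sequences.

    answers   -- True where the simple question was answered positively
    reference -- True where the reference measure classifies the case as positive
    """
    pairs = list(zip(answers, reference))
    tp = sum(1 for a, r in pairs if a and r)
    fn = sum(1 for a, r in pairs if not a and r)
    tn = sum(1 for a, r in pairs if not a and not r)
    fp = sum(1 for a, r in pairs if a and not r)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


def dependent_by_bi(bi_score):
    """Reference definition 1: Barthel Index below 20."""
    return bi_score < 20


def dependent_by_rs(rs_score):
    """Reference definition 2: modified Rankin score of 3, 4, or 5."""
    return rs_score in (3, 4, 5)


# Hypothetical records: answer to the dependency question and 3-month BI.
patients = [
    {"needs_help": True, "bi": 12},
    {"needs_help": True, "bi": 20},
    {"needs_help": False, "bi": 20},
    {"needs_help": False, "bi": 18},
    {"needs_help": True, "bi": 15},
    {"needs_help": False, "bi": 20},
]

answers = [p["needs_help"] for p in patients]
reference = [dependent_by_bi(p["bi"]) for p in patients]
sens, spec = sensitivity_specificity(answers, reference)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

The same function can be applied to the recovery question by passing the recovery answers and a reference of modified RS equal to 0.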
The algorithm developed by Lindley et al3 was used to identify 3 outcome groups. Bad outcome was defined as a positive response to the dependency question; indifferent outcome was defined as a negative response to both dependency and recovery questions; and good outcome was defined as a positive response to the recovery question.
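A minimal Python sketch of this 3-group classification follows. The text does not state how a patient who answers "yes" to both questions is handled; the sketch assumes that dependency takes precedence, which matches the description of good outcome elsewhere in this report as "not dependent and fully recovered." The function name and example responses are hypothetical.

```python
from collections import Counter


def classify_outcome(needs_help, fully_recovered):
    """Map the two yes/no answers to bad, indifferent, or good outcome.

    Assumption: a positive dependency answer yields "bad" regardless of
    the recovery answer.
    """
    if needs_help:
        return "bad"
    if fully_recovered:
        return "good"
    return "indifferent"


# Hypothetical (dependency answer, recovery answer) pairs.
responses = [
    (True, False),
    (False, False),
    (False, True),
    (True, True),
    (False, True),
]

counts = Counter(classify_outcome(h, r) for h, r in responses)
print(dict(counts))
```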
McNemar’s test14 was used to compare the proportions classified with bad outcome defined as (1) BI <20 and (2) a positive response to the simple question, “In the last 2 weeks did you require help from another person for everyday activities?” This is a test of proportions for paired data and as such does not require adjustment for confounding variables. The analysis was repeated to compare good outcome using the 2 measures, ie, modified RS score of 0 and good outcome on the 2 simple questions (a negative response to the dependency question and a positive response to the recovery question).
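For readers unfamiliar with the paired comparison, the sketch below shows one common form of McNemar's test together with a large-sample confidence interval for the difference between two paired proportions. It is illustrative only: the report does not state whether a continuity correction was applied or which interval method was used, so both choices here are assumptions, and the counts are invented.

```python
import math


def mcnemar(b, c, continuity=False):
    """McNemar chi-square statistic (1 df) and P value.

    b -- patients classified as bad outcome by the first measure only
    c -- patients classified as bad outcome by the second measure only
    Concordant pairs do not enter the test statistic.
    """
    discordant = b + c
    if discordant == 0:
        return 0.0, 1.0
    diff = abs(b - c) - (1 if continuity else 0)
    diff = max(diff, 0)
    statistic = diff ** 2 / discordant
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


def paired_difference_ci(b, c, n, z=1.96):
    """Difference in paired proportions with a Wald-type interval.

    n is the total number of pairs, including concordant pairs.
    """
    diff = (b - c) / n
    se = math.sqrt(b + c - (b - c) ** 2 / n) / n
    return diff, diff - z * se, diff + z * se


# Hypothetical counts: 100 patients, 15 "bad" by BI only, 5 by question only.
statistic, p_value = mcnemar(15, 5)
diff, low, high = paired_difference_ci(15, 5, 100)
print(f"chi2={statistic:.2f} P={p_value:.4f}")
print(f"difference={diff:.3f} (95% CI {low:.3f} to {high:.3f})")
```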
To compare outcome across centers, logistic regression was used. The relationship between good outcome and center was investigated, with adjustment for age group, sex, consciousness level, any limb weakness, and incontinence.
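The idea of a case-mix-adjusted odds ratio can be sketched with a small logistic regression fitted by Newton-Raphson in plain Python. This is a hedged illustration and not the authors' analysis: the data are invented, only one case-mix variable (incontinence) is included for brevity where the study adjusted for five, and the helper names are hypothetical.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_logistic(rows, outcomes, iterations=25):
    """Fit a logistic model by Newton-Raphson; an intercept is prepended."""
    xs = [[1.0] + [float(v) for v in r] for r in rows]
    p = len(xs[0])
    beta = [0.0] * p
    for _ in range(iterations):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for x, y in zip(xs, outcomes):
            eta = sum(b * v for b, v in zip(beta, x))
            mu = 1.0 / (1.0 + math.exp(-eta))
            w = mu * (1.0 - mu)
            for i in range(p):
                grad[i] += (y - mu) * x[i]
                for j in range(p):
                    hess[i][j] += w * x[i] * x[j]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


def expand(center, incontinent, n_total, n_good):
    """Turn a cell count into one row per patient (hypothetical data)."""
    rows = [[center, incontinent] for _ in range(n_total)]
    outcomes = [1] * n_good + [0] * (n_total - n_good)
    return rows, outcomes


# Invented cells: (center, incontinent, patients, good outcomes).
# Center 0 plays the role of the reference center.
cells = [(0, 0, 20, 8), (0, 1, 20, 2), (1, 0, 30, 18), (1, 1, 10, 2)]
rows, outcomes = [], []
for center, incontinent, n_total, n_good in cells:
    r, o = expand(center, incontinent, n_total, n_good)
    rows += r
    outcomes += o

unadjusted = fit_logistic([[r[0]] for r in rows], outcomes)
adjusted = fit_logistic(rows, outcomes)
unadjusted_or = math.exp(unadjusted[1])
adjusted_or = math.exp(adjusted[1])
print(f"unadjusted OR={unadjusted_or:.2f} adjusted OR={adjusted_or:.2f}")
```

In this invented example the comparison center has fewer incontinent patients, so part of its apparent advantage disappears once incontinence enters the model; that is the kind of confounding that case-mix adjustment is intended to remove.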
Results
A total of 1349 patients were registered. Complete data were available for 1300. At 3 months, 793 patients were alive and followed up. Table 1 reports the demographic and clinical characteristics of the sample at onset of stroke and at 3-month follow-up. There was considerable variation between centers in both case mix and outcome.
The sensitivity and specificity of the dependency and recovery questions, using a BI score of <20 (dependency) and a modified RS score of 0 (recovery), are reported in Table 2. For the total sample, the dependency question had a sensitivity of 88% and a specificity of 77%. In individual centers, apart from Riga, sensitivity ranged from 83% in Dijon to 100% in Menorca; specificity ranged from 67% in Menorca to 94% in Warsaw. Riga was exceptional, with both sensitivity and specificity being low. For the total sample, the recovery question had a sensitivity of 78% and a specificity of 90%. In individual centers, sensitivity ranged from 63% in Riga to 100% in Florence and Menorca; specificity ranged from 80% in Menorca to 100% in Warsaw. There were significant differences in sensitivity and specificity between the centers for the dependency question and also in sensitivity for the recovery question.
Using the modified RS to examine dependency (modified RS score of 3, 4, or 5), the overall sensitivity was 83% and overall specificity 90%. When recovery was defined as 3-month modified RS score equal to or better than prestroke modified RS score, the overall sensitivity was 41% and specificity 84%. Similarly, using the BI, the overall sensitivity was 29% and specificity 95%.
Table 3 reports the difference between the 2 methods of measuring bad and good outcomes. Overall, the BI classified a larger proportion of respondents as having bad outcome than did the dependency question, although the upper limit of the confidence interval was only 9.2%. There was no significant difference between the 2 methods for most centers. For Kaunas and Menorca, however, the BI classified a larger proportion of respondents as having a bad outcome than the dependency question. Overall, the modified RS classified a larger proportion of respondents as having good outcome than the 2 simple questions, although the upper limit of the confidence interval was only 9.4%. In Almada, Riga, and Warsaw, there was no significant difference between the 2 methods for identifying good outcome. For the remaining centers, the modified RS classified a larger proportion of respondents as having good outcome than the 2 simple questions.
Table 4 reports the distributions of the BI for those who reported dependency using the 2 simple questions and also for those who did not report dependency. With the exception of respondents in Riga, high proportions of patients who were not dependent had a BI score of 20. In all centers, only negligible numbers who were not dependent had BI score ≤9. In Riga, a large proportion (43%) of respondents who reported dependency had BI=20. In all centers, the proportion of respondents who reported that they had not made a complete recovery but were scored RS=0 was small (3%). However, there were high proportions of respondents who reported having made a complete recovery while scoring RS=1 or 2, especially in Dijon (71%) and Kaunas (79%).
Table 5 reports the unadjusted proportions classified into each outcome category (bad, indifferent, good) by using the 2 simple questions. After adjusting for case mix, the centers with the least likelihood of good outcome were in Eastern Europe. The odds ratios, compared with Kaunas, were highest in Menorca and Florence. London and Dijon also had higher odds of a good outcome compared with Kaunas.
Discussion
Simple outcome measures that can be used across centers and countries are required for large-scale studies and trials, but their strengths and weaknesses need to be understood when deciding which should be adopted. The 2 simple questions were used as outcome measures in the IST, one of the largest trials of stroke care ever undertaken. This study investigated the use of the 2 simple questions as part of a larger study of stroke outcomes across selected European centers. We compared responses to the 2 simple questions with widely used measures of disability and handicap. The overall sensitivity of the dependency question was high (88%), although the sensitivity found in the data from Riga was considerably lower than in all other centers. Nevertheless, the sensitivity of the dependency question in all centers including Riga was higher than that reported by Lindley et al,3 who found a sensitivity of 61% in their original study conducted in Scotland. The sensitivity of the recovery question found in the present study (78%) was similar to that reported by Lindley et al3 (83%).
Overall, there was a significant difference between the proportion classified as having a bad outcome using the definition of BI<20 and the proportion classified as bad using the simple question. Considering the 95% confidence interval, the definition of bad outcome of BI<20 will classify at most 9.2% more respondents as having bad outcome than the 2-simple-questions method. However, the wider confidence interval in the estimate might represent more of a problem in choosing between the 2 methods when the sample size is small, as in the case of Menorca in this study, or for other unknown reasons, as in the case of Kaunas (upper limit of the 95% CI, 19.2%).
There was a similar pattern for the proportions classified as having a good outcome. Overall, the upper limit of the 95% CI for the difference was 9.4%, although the upper limits of the 95% CI for the differences in Dijon and Florence were high: 21.9% and 23.1%, respectively. This means that the modified RS score of 0 may classify considerably more respondents as having good outcome than the 2-simple-questions method. Nevertheless, the patterns of outcome identified in this way are broadly similar to previous cross-national studies of stroke10 11 in which poorer outcomes have been reported in Eastern European countries, and in the United Kingdom, compared with other Western European countries.
There are a number of potential limitations to the study, which must be taken into consideration when interpreting the outcomes presented here. First, variations in data collection methods adopted across centers for 3-month follow-up data (face-to-face interviews, telephone interviews, and postal survey) might constitute a methodological limitation. On the one hand, data collection of BI scores by telephone interview has been shown to have high validity compared with face-to-face assessment.16 On the other hand, reporting of subjective health status using the Short Form (SF)-36 is affected by the data collection method. A study comparing postal and telephone administration of the SF-36 found that health ratings were poorer and chronic illness more frequently reported in postal responses.17 Thus, we acknowledge the possibility that different data collection methods might produce different results, with more subjective assessments (general health status rather than ability to perform specific tasks) perhaps more liable to collection method influence. Allowing centers to use their own preferred method of data collection in this study was not ideal but was necessary to encourage the participation of all centers, including some centers in eastern Europe with limited resources. It is also a pragmatic approach to data collection that will be required should such outcome measures be used in routine practice in the future. A further limitation, as acknowledged above, relates to the small sample size of 1 center in particular, Menorca, which resulted in wide confidence intervals.
Other factors should be considered when differences in outcome across centers are interpreted. Because variations in case mix have an impact on outcomes and their interpretation, statistical correction for confounding variables is essential.18 In this study, outcome as measured by the 2 simple questions was adjusted for case mix by using variables appropriate to the clinical condition being investigated. Nevertheless, some case-mix variation might remain unmeasured, which could account for different outcomes across centers.19 Social class data were not collected in the study, because this information is difficult to standardize across countries, especially in the elderly. However, social class might be associated with expectations of recovery20 and uptake of support services.21
Another caveat relates to the quality of the questionnaire translations from English into the local languages. In the field of cross-national quality-of-life measurement, it has been proposed that, at the very least, correct instrument translation requires forward-backward translation, as well as a test of psychometric criteria on appropriate subjects.22 Such development work was beyond the resources of this project, as well as those of the IST. The 2 simple questions were therefore translated by local study coordinators following discussions at initial project meetings attended by all participants. We were unable to monitor the quality of the translation used in each center, although it was assumed that the questions were conceptually and semantically unproblematic. This assumption may well be unfounded: the meaning of “complete recovery” in particular may be open to different interpretations. This should also be taken into consideration when interpreting differences between the 2-simple-questions scores and modified RS scores, as well as differences in 2-simple-questions outcome across centers.
The importance of our study, however, lies not so much in the outcome data themselves but in the questions raised about the kind of outcome the 2 simple questions capture. They were devised as functional outcomes. Unlike the modified RS, which was developed for assessment by an observer, the 2-simple-questions method asks the patient (or proxy) for information and therefore requires the patient to interpret the question and to decide what information to divulge. One question might be assumed to ask only for factual information: “Did you require help from another person for everyday activities?” However, even this raised problems in the original study, with some subjects being unclear about the question’s meaning or intention.3 Researchers participating in this study agreed to a definition of everyday activities that encompassed basic functional tasks (ie, feeding, dressing, personal hygiene). Nevertheless, the question is open to differing interpretations, because the concept of “everyday” activities may well vary from one respondent to another, as might expectations about activities for which it is legitimate to receive assistance from another person. Factors that might influence views of what constitutes an everyday activity and views of legitimate assistance include age, prestroke activities, family situation, and local culture. A striking example is provided by Ali and Mulley23 in their study of the use of the BI in rural Pakistan, in which they concluded that the measure was not appropriate, given local customs and lifestyle.
The second question asks subjects to comment on their own feeling of recovery and therefore invites an entirely subjective assessment. Consequently, the issue of how respondents interpret the question posed is likewise of crucial importance. To interpret better the responses elicited across different age groups (younger stroke patients compared with older stroke patients) and across cultures (a stroke patient in post-Soviet Latvia compared with an Italian living in Florence or a first-generation African-Caribbean living in inner-city London), we need to know more about the meanings attached to “recovery.” These will surely be influenced by different expectations of recovery, disability, and well-being in different age groups, cultures, and perhaps also social class/wealth. The investigators of an American qualitative study of 102 stroke patients have reported that none of the patients interviewed, “even those who had visibly and tangibly regained lost function,” considered themselves to have recovered.24 25 However, it is difficult to generalize about stroke patients’ concepts of recovery, because few studies have investigated their expectations, much less how these may vary in different contexts.
In conclusion, this study has tested the feasibility of using simple questions rather than longer outcome instruments in a cross-national investigation. Both simple questions showed high sensitivity and specificity, demonstrating their theoretical validity. There were significant differences across centers in outcome, but reasons for these differences remain unclear. They may reflect inadequate translation of the questions, residual case-mix variation, or different cultural expectations of recovery and needs for assistance. The 2 simple questions were proposed as a resource-efficient method of collecting data on physical outcome after stroke. The need to collect data efficiently in large-scale studies can lead to a trade-off between sophistication and feasibility. Because the 2 simple questions have been used in a well-known international study that recruited over 19 000 subjects, their feasibility has been demonstrated. However, our study suggests that this method of outcome assessment should be adopted with caution. Although the 2 simple questions were designed to assess functional outcome, respondents may interpret them with reference to a number of criteria. Some of these, such as culturally specific views of what constitutes recovery, have not yet been investigated. However, their importance in interpreting responses to questions such as those considered here should not be discounted.
Prof A. Czlonkowska and Dr D. Ryglewicz (Institute of Psychiatry and Neurology, Warsaw, Poland); Dr J. Aleixo Dias (Division of Epidemiology and Biostatistics, Direccao General Da Saude, Lisbon, Portugal); Prof M.O. Carrageta, Dr I. Remidios, and Dr J. Namora (Hospital Garcia de Orta, Almada, Portugal); Prof G. Enina and Dr I. Purina (Latvian Neuroangiological Centre, Riga, Latvia); Prof M. Giroud and Dr M. Menassa (Service de Neurologie, Hôpital General, Dijon, France); Prof D. Inzitari, Dr P. Vanni, Dr A. Di Carlo, Dr M. Lamassa, and Dr S. Rossi (Dipartimento di Scienze Neurologiche & Psichiatriche, Ospedale Careggi, Florence, Italy); Dr S. Spolveri and Dr M.C. Baruffi (Torregalli Hospital, Florence, Italy); Dr D. Rastenyte (Institute of Cardiology, Medical Academy, Kaunas, Lithuania); Dr M. Torrent (Health Care Research Unit, Mao, Menorca, Spain); Dr A.G. Rudd (Department of Care of the Elderly, Guy’s and St Thomas’ Hospital, London, England); Dr R. Beech (Centre for Health Planning and Management, Keele University, England).
The study was funded by the European Union Biomed II programme. We are grateful to all patients and their family members who participated in the study.
- Received July 25, 2000.
- Revision received November 17, 2000.
- Copyright © 2001 by American Heart Association
Sulter G, Steen C, De Keyser J. Use of the Barthel Index and modified Rankin scale in acute stroke trials. Stroke. 1999;30:1538–1541.
World Health Organization. Formulating Strategies for Health for All by the Year 2000: Guiding Principles and Essential Issues. Document of the Executive Board of the World Health Organization. Geneva, Switzerland: World Health Organization; 1979.
Dennis M, Wellwood I, O’Rourke S, MacHale S, Warlow C. How reliable are simple questions in assessing outcome after stroke? Cerebrovasc Dis. 1997;7:19–21.
Dennis M, Wellwood I, Warlow C. Are simple questions a valid measure of outcome after stroke? Cerebrovasc Dis. 1997;7:22–27.
International Stroke Trial Pilot Study Collaborative Group. Study design of the International Stroke Trial (IST), baseline data and outcome in 984 randomised patients in the pilot study. J Neurol Neurosurg Psychiatry. 1996;60:371–376.
Beech R, Radcliffe M, Tilling K, Wolfe C. Hospital services for stroke care: a European perspective. Stroke. 1996;27:1958–1964.
Thorvaldsen P, Asplund K, Kuulasmaa K, Rajakangas AM, Schroll M. Stroke incidence, case fatality, and mortality in the WHO MONICA project. Stroke. 1995;26:361–367.
Wolfe CDA, Tilling K, Beech R, Rudd AG. Variations in case fatality and dependency from stroke in Western and Central Europe. Stroke. 1999;30:350–356.
UK-TIA Study Group. The UK-TIA aspirin trial: interim results. BMJ. 1988;296:316–320.
Armitage P, Berry G. Statistical Methods in Medical Research. 3rd ed. Oxford, UK: Blackwell Science; 1994.
Collin C, Wade D. Assessing motor impairment after stroke: a pilot reliability study. J Neurol Neurosurg Psychiatry. 1980;10:125–132.
Davenport RJ, Dennis MD, Warlow CP. Effect of correcting outcome data for case mix: an example from stroke medicine. BMJ. 1996;312:1503–1505.
McKevitt C, Beech R, Pound P, Rudd AG, Wolfe CDA. Putting stroke outcomes into context: assessment of variations in the processes of care. Eur J Public Health. 2000;10:120–126.
Atkin K, Rollings J. Community Care in Multi-Racial Britain: A Critical View of the Literature. London, UK: HMSO; 1993.
Becker G. Continuity after stroke: implications of life-course disruption in old age. Gerontologist. 1993;2:148–158.