Are Cognitive Screening Tools Sensitive and Specific Enough for Use After Stroke?
A Systematic Literature Review
It is estimated that up to three quarters of acute and subacute stroke survivors exhibit cognitive impairment, with many experiencing ongoing problems.1,2 Cognitive impairment can significantly compromise functional recovery, quality of life, and social engagement after stroke.2–4 Encouragingly, early detection and rehabilitation can improve functional recovery of stroke-related impairments.5 Unfortunately, however, a significant amount of cognitive dysfunction goes undetected by health professionals in acute and subacute settings.6
Comprehensive neuropsychological assessment using reliable and valid tools to measure multiple cognitive domains is considered the gold standard method of detecting and characterizing cognitive dysfunction after stroke. However, neuropsychological assessments are often considered too expensive and lengthy to be routinely administered to patients with stroke. In an attempt to improve detection of cognitive impairments, while managing expense, many national stroke clinical management guidelines now recommend the use of screening measures to detect cognitive impairment.7–9 If cognitive difficulties are detected during this screening process, comprehensive assessment and intervention is then recommended. The Mini-Mental State Examination (MMSE) and Montreal Cognitive Assessment (MoCA) are 2 screening tools that are regularly used in clinical practice. Although these tests are commonly used to detect cognitive impairment in dementia settings, neither was specifically designed for use after stroke. The profile of cognitive impairment after stroke is heterogeneous, and focal impairments such as dysphasia, dyspraxia, unilateral inattention, and agnosia are often observed. Therefore, we cannot assume that the reliability and validity of cognitive screening tools established in other clinical populations will generalize to stroke.
It is acknowledged that numerous reliability and validity indices are important to consider when evaluating neuropsychological measures. However, when considering the use of cognitive screening measures, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) are particularly important to ensure patients with cognitive impairment are not missed, and patients without cognitive impairment do not undergo comprehensive neuropsychological evaluation unnecessarily. Several studies have investigated the sensitivity and specificity of cognitive screening tools within stroke populations. However, a range of different methodologies have been used, and results seem to vary considerably across studies. Thus, the aims of this review were (1) to systematically review the sensitivity, specificity, PPV, and NPV of a range of cognitive screening tools used in stroke and (2) to critically evaluate methodologies used within these studies. It is intended that findings from this review will inform clinicians regarding suitability of these screening tools for clinical use and direct best practice for future research in this field.
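For readers less familiar with these indices, all 4 values can be derived from a 2×2 table crossing the screening result against the gold standard classification. A minimal sketch, using invented counts purely for illustration:

```python
# Deriving the four screening indices from a 2x2 confusion table of
# screen result vs. gold standard diagnosis (counts below are hypothetical).
def screening_indices(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV, and NPV as fractions."""
    sensitivity = tp / (tp + fn)   # impaired patients correctly flagged
    specificity = tn / (tn + fp)   # intact patients correctly passed
    ppv = tp / (tp + fp)           # flagged patients who are truly impaired
    npv = tn / (tn + fn)           # passed patients who are truly intact
    return sensitivity, specificity, ppv, npv

# Example: 40 true positives, 10 false positives, 8 false negatives, 42 true negatives
sens, spec, ppv, npv = screening_indices(40, 10, 8, 42)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} ppv={ppv:.2f} npv={npv:.2f}")
```

High sensitivity and NPV protect against missed impairment; high specificity and PPV protect against unnecessary comprehensive assessment.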
This systematic literature review was conducted and reported in line with the current Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement. Articles were identified through MEDLINE, PsycINFO, Scopus, PubMed, and CINAHL databases. Keywords included stroke, cerebrovasc*, cognit*, screen*, sensitivity, and specificity. Common screening measure names were also used. See Figure I in the online-only Data Supplement for an example of key words and search strategy. The search was limited to studies of adult humans published in English. The electronic search was conducted on December 27, 2013. Reference lists of articles included in this review and other relevant publications were also used to identify any studies overlooked in the electronic search.
Articles were included in this review if they met 3 key criteria: (1) male or female participants aged ≥18 years; (2) confirmed ischemic or hemorrhagic stroke; and (3) analysis of the sensitivity and specificity of a cognitive screening measure compared with a gold standard neuropsychological assessment. If >1 clinical population was included in a study (eg, transient ischemic attack and stroke), stroke-specific data must have been available. Cognitive screening measures were included if they were designed to screen for cognitive impairment or had been used for that purpose and typically took <30 minutes to administer. Gold standard neuropsychological assessments were included if they used multiple domain-specific neuropsychological assessments with established reliability and validity.10 Some studies identified during the literature search investigated screening tools that aim to detect just 1 cognitive domain, such as dysphasia or dyspraxia. However, understandably, these studies typically only included 1 cognitive domain within their gold standard assessment and thus did not meet our eligibility criteria.
In line with PRISMA guidelines, 2 authors (R.J.S. and M.H.O.) separately reviewed results from the electronic search and identified potentially relevant titles and abstracts. If the abstract suggested the article met the inclusion criteria, the full-text article was obtained and evaluated. Full-text articles were then compared across authors, and contrasting/ambiguous studies were discussed to determine whether they met criteria for inclusion. Articles that met the inclusion criteria were included for subsequent data extraction.
The following data were extracted from each article: author, year, title, participant data (sample size, age, sex, education), stroke data (mechanism, location, hemisphere, severity), recruitment procedures (inclusion/exclusion criteria, participant attrition), cognitive screening measure used, domains and tests included in gold standard cognitive assessment, time poststroke of screening and gold standard cognitive assessments, and sensitivity, specificity, PPV, and NPV results. In some studies, multiple sets of sensitivity, specificity, PPV, and NPV data were presented at different screening measure cut points. To limit the amount of data presented, the cut point that resulted in the most favorable sensitivity and specificity results was selected. This was based on commonly used criteria of sensitivity >80% and specificity >60%.11,12
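The cut-point selection rule described above can be sketched briefly. The combined-sum tie-breaker below is an assumption (one simple way to operationalize "most favorable"); the reviewed studies may have weighed sensitivity and specificity differently, and the cut-point data are invented for illustration:

```python
# Sketch of the cut-point selection rule: among candidate cut points, keep
# those meeting sensitivity >80% and specificity >60%, then pick the most
# favorable (here, the highest combined sum -- an illustrative heuristic).
def select_cut_point(results):
    """results: dict mapping cut point -> (sensitivity, specificity) in percent."""
    acceptable = {c: (se, sp) for c, (se, sp) in results.items()
                  if se > 80 and sp > 60}
    if not acceptable:
        return None  # no cut point meets the commonly used criteria
    return max(acceptable, key=lambda c: sum(acceptable[c]))

# Illustrative (invented) data at three cut points of a 30-point screen
data = {24: (72, 85), 25: (84, 68), 26: (91, 62)}
print(select_cut_point(data))
```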
Electronic and additional searching returned 13 201 records; duplicates were removed. Sixty-six records remained following title and abstract screening. A further 50 records were excluded during full-text review for the following reasons: review articles (n=2), combined transient ischemic attack/stroke data (n=3), full neuropsychological battery used instead of screening tool (n=4), screening measure did not meet inclusion criteria (n=3), not written in English (n=2), no sensitivity and specificity data (n=6), nonstroke samples used (n=11), and gold standard neuropsychological battery did not meet inclusion criteria (n=19). Sixteen articles were found to meet our inclusion criteria and were retained for analysis. See Figure II in the online-only Data Supplement for a summary of the above.
Summary of study descriptions and evaluations is presented in Tables I and II in the online-only Data Supplement. All studies adequately reported sample size; however, few justified the sample size used or reported whether assumptions for statistical analyses were met. Based on key references within this field,13,14 it seems that many studies did not use sufficient sample sizes, particularly those that used samples of ≤50 people.15–18 This may have contributed to the large confidence intervals of sensitivity and specificity results noted across studies.
With regard to demographic variables, all studies provided adequate information regarding age and sex. Four studies failed to include education information.11,15,16,19 Most study samples appeared representative of stroke populations. However, most studies excluded nonnative language speakers. With regard to stroke variables, most studies reported key information such as stroke mechanism (12 of 16), location (12 of 16), and severity (11 of 16). Stroke mechanism and lesion location variables across studies appeared generally representative of stroke populations. However, most studies excluded people with severe stroke. Only 2 studies provided adequate statistical analysis of whether the final study sample used for analysis was representative of the wider patient group within their clinical setting.20,21 There was significant heterogeneity across studies regarding time since stroke, with mean/median times ranging from 6 days17 to 29 months18 across studies. Two studies failed to report this information,15,22 and others provided only limited information.
Sensitivity and Specificity Methodology
All studies calculated sensitivity and specificity using receiver operating characteristic (ROC) curve analysis. Only 9 of 16 studies calculated PPV or NPV data. Gold standard assessments used to classify the cognitive status of participants differed across studies. Cognitive domains such as language, visuospatial perception, attention, memory, and executive functions were generally well represented. However, cognitive functions such as calculic function, praxis, and mental speed were less well represented, included in ≤4 of the 16 studies. Most studies used age- and education-based normative data to interpret gold standard test performances, using a criterion ranging from fifth to tenth percentile as an indicator of impaired performance. However, studies differed regarding whether impairment on single or multiple cognitive domains was required to classify participants as impaired on gold standard assessments. Furthermore, some studies did not use psychometric criteria at all, instead relying on clinician opinion taken from neuropsychological reports.11,15,20 To reliably and validly assess sensitivity and specificity, it was expected that screening and gold standard assessments would be conducted within a short period of each other. Unfortunately, 5 studies did not sufficiently report this information,16,18,20,23,24 and 3 reported the mean time interval between assessments as >10 days.11,15,25 Only 6 studies stratified sensitivity or specificity results according to demographic or stroke variables.20,22,23,25–27
Sensitivity and Specificity Results
As seen in the Table, the MMSE and MoCA were the most commonly studied screening measures. With regard to the MMSE, 11 studies investigated the sensitivity and specificity of the measure, with just 3 reporting sensitivity and specificity at or above commonly regarded acceptable levels (sensitivity >80% and specificity >60%).23,25,27 All 3 studies obtained these levels using cut points of either 26/30 or 27/30. Bour et al25 achieved acceptable sensitivity and specificity levels only when the gold standard assessment impairment criterion was increased to ≥2 cognitive domains. Of the 3 studies that reported adequate sensitivity and specificity, PPVs were generally >80%; however, NPVs were less impressive, ranging from 65% to 73% across studies.
Five studies investigated the sensitivity and specificity of the MoCA. Three of these reported adequate sensitivity and specificity21,23,27; 1 further study reported adequate sensitivity and specificity at 1 year poststroke, but not 2 to 4 weeks poststroke,19 and the final study reported adequate sensitivity and specificity for only 2 of the 13 gold standard tests included (naming and verbal learning).18 Acceptable sensitivity and specificity were found at different MoCA cut points across the studies, ranging from 21 to 26. Of the 4 studies that reported adequate sensitivity and specificity, only 2 studies also reported PPVs and NPVs >80%.21,27
Four studies directly compared the MMSE and MoCA. Using area under the receiver operating characteristic curve (AUC) scores, 2 studies reported no significant differences between the MMSE and MoCA,23,27 whereas another reported higher MoCA AUC scores compared with the MMSE at 1 year poststroke only.19 The final study reported superior sensitivity of the MoCA compared with the MMSE.18 The general trend noted across these 4 studies was of somewhat better sensitivity and slightly poorer specificity of the MoCA compared with the MMSE.
The sensitivity and specificity of other screening measures have been investigated in only a single study that met our inclusion criteria. Both the Repeatable Battery for the Assessment of Neuropsychological Status (RBANS) and Cognistat demonstrated typically accepted sensitivity and specificity levels; however, NPV was only 53% for the RBANS, and no PPV or NPV data were presented for the Cognistat.16,28 The Barrow Neurological Institute (BNI) screen for higher cerebral functions returned acceptable sensitivity, specificity, PPV, and NPV levels in people aged >55 years but not <55 years.26 The Middlesex Elderly Assessment of Mental State (MEAMS), Addenbrooke's Cognitive Examination-Revised (ACE-R), Screening Instrument for Neuropsychological Impairments in Stroke (SINS), and Clock Drawing Test all failed to achieve adequate levels of sensitivity and specificity.12,15,16
In addition to providing total scores, some cognitive screening measures also provided domain-specific subscores. Four studies included in this review examined the sensitivity and specificity of these subscores to detect domain-specific cognitive impairment assessed using gold standard measures.12,15,16,28 This information is presented in Table III in the online-only Data Supplement. Three of the 5 subscores from the RBANS achieved acceptable sensitivity and specificity levels (immediate memory, language, visuospatial) with the other 2 subtests only just under acceptable thresholds (delayed memory, attention).28 Results from the Cognistat, MEAMS, ACE-R, and SINS were less impressive. The Cognistat and MEAMS both achieved acceptable results for their naming subscores; however, all other subscores did not reach acceptable levels.15,16 No subscores from ACE-R or SINS reached acceptable levels.12,16 Although memory, language, and visuospatial domains were regularly assessed, attention, processing speed, praxis, and executive function were seldom examined.
Few studies specifically investigated whether methodological factors or patient variables impacted on sensitivity and specificity results. Qualitatively there was no consistent evidence to suggest time poststroke significantly affected sensitivity and specificity results. Although Wong et al19 reported acceptable MoCA sensitivity and specificity at 1 year but not 2 to 4 weeks after stroke, other studies reported favorable MoCA, Cognistat, RBANS, and BNI sensitivity and specificity ranging from 1 week to 1 year poststroke. With regard to time interval between screening and gold standard assessments, almost all studies that reported favorable sensitivity and specificity of the MoCA, Cognistat, RBANS, and BNI used mean assessment time intervals within ≈1 week. Studies varied regarding their criteria for impairment on gold standard assessment. Those who used a multiple cognitive domain criterion were more likely to report adequate sensitivity and specificity compared with those who used a single-domain criterion. A minority of studies stratified sensitivity and specificity results according to demographic and stroke variables. One study reported better sensitivity and specificity results in older stroke participants.26 Lesion hemisphere effects were equivocal. One study reported no effect,25 whereas others reported better results in right hemisphere20,27 or left hemisphere groups.22 No studies specifically investigated the impact of premorbid cognitive function, stroke severity or mechanism, or cultural factors on screening measure performance. Few studies directly compared screening measure performance across different gold standard cognitive domains. The MoCA was shown to be relatively more sensitive to naming and verbal learning difficulties compared with other cognitive domains in 1 study,18 whereas performance was higher for language and visuospatial impairments in another.19
Sixteen articles that investigated the sensitivity and specificity of cognitive screening tools in stroke met our inclusion criteria. Eleven of these studies investigated the MMSE, and most reported inadequate sensitivity and specificity. MoCA results were somewhat better, with 3 of 5 studies reporting consistent acceptable sensitivity and specificity results. It is not clear why the MoCA performed better than the MMSE. However, possible reasons include the fact that the MoCA contains items assessing executive functions, which are often affected by stroke, and that the total MoCA score can be adjusted, albeit crudely, for education level. Interestingly, 2 relatively more recently developed measures, the RBANS and Cognistat, demonstrated traditionally acceptable levels of sensitivity and specificity. There is also some preliminary support for the use of the BNI within older stroke populations. Furthermore, analysis of RBANS subscores highlighted promising sensitivity and specificity results to detect a range of focal cognitive difficulties, including memory, language, and visuospatial difficulties. However, it is noted that the RBANS, Cognistat, and BNI were each investigated in only 1 study that met our eligibility criteria, and further research confirming these initial findings is warranted.
The above findings provide some preliminary support for the use of the MoCA, BNI, Cognistat, and RBANS as screening measures for stroke. However, these findings should be considered in the context of some key methodological issues. First, of the 7 studies that reported adequate sensitivity and specificity of MoCA, BNI, Cognistat, and RBANS, 3 either failed to report PPVs and NPVs16 or reported NPVs <80% (indicating ≥20% false-negative rates).23,28 Second, adequate sensitivity and specificity of the MoCA were found at different cut points, making recommendations for clinical practice difficult. Third, most studies did not include calculation, praxis, and speed of information processing within gold standard assessments. Thus, capacity for screening measures to detect these cognitive difficulties remains unknown, which is problematic considering that impairments of calculic, praxis, and mental speed functions are not uncommon after stroke and can significantly impact functional recovery.1 Fourth, most studies that reported adequate sensitivity and specificity used a criterion of multiple cognitive impairments (≥2 domains) within their gold standard assessments. Studies have shown higher screening measure sensitivity for multiple-domain versus single-domain cognitive impairments.25,29 Thus sensitivity results from studies that used multiple-domain impairment as a gold standard criterion may have been lower if a single-domain criterion was used (although equally specificity results may have been higher). Finally, few studies stratified sensitivity and specificity results according to demographic and stroke variables. This can be problematic for several reasons. For example, many screening measures do not account for age, education, or premorbid intelligence. Thus, it is possible that sensitivity of these screening measures for young and highly intelligent people may be limited, and specificity may be limited in older people and those with lower premorbid intelligence. 
Furthermore, people with severe stroke and those from culturally and linguistically diverse backgrounds were often excluded from studies altogether. Additional research is required within these groups before use of these screening measures is warranted. See Figure for recommendations for future research.
With regard to more general methodological issues, significant heterogeneity and poor reporting regarding time interval between screening and gold standard assessments and time of assessment since stroke were noted across studies. Unless long-term predictive validity is a specific research aim, we recommend screening and gold standard assessments be conducted as close together in time as possible. Cognitive function can change significantly during the course of stroke recovery, and results from early screening cannot be assumed to be an accurate picture of longer-term cognitive function. Thus, we recommend further research investigating the potential impact of time since stroke on the sensitivity and specificity of screening measures. On another note, although it is important to report PPV and NPV data, we acknowledge these values vary according to prevalence of impairment in the population. Thus, direct comparison of these values across studies is not valid. Importantly, however, PPVs and NPVs can be calculated based on sensitivity, specificity, and prevalence data.30 Thus, clinicians and researchers alike may choose to use data presented in this review to estimate PPV and NPV across a range of stroke populations where prevalence data are known. Finally, few studies have investigated which specific cognitive domains are more or less likely to be detected by these screening measures. Further research is warranted.
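The prevalence adjustment mentioned above follows from the standard relationship between predictive values, sensitivity, specificity, and prevalence (an application of Bayes' theorem). A brief sketch, with illustrative numbers not drawn from any reviewed study:

```python
# Re-estimating PPV and NPV for a local prevalence from published
# sensitivity and specificity (all values as fractions, not percentages).
def ppv_npv(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) implied by the given sensitivity, specificity,
    and prevalence of cognitive impairment in the target population."""
    ppv = (sensitivity * prevalence) / (
        sensitivity * prevalence + (1 - specificity) * (1 - prevalence))
    npv = (specificity * (1 - prevalence)) / (
        specificity * (1 - prevalence) + (1 - sensitivity) * prevalence)
    return ppv, npv

# Same hypothetical test at two prevalence levels: as prevalence rises,
# PPV improves while NPV deteriorates.
for prev in (0.30, 0.70):
    ppv, npv = ppv_npv(0.85, 0.65, prev)
    print(f"prevalence={prev:.2f}: PPV={ppv:.2f}, NPV={npv:.2f}")
```

This is why published PPV and NPV values cannot be compared directly across studies with different impairment rates, whereas sensitivity and specificity can.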
Many researchers have previously suggested that 80% sensitivity and 60% specificity of cognitive screening measures are adequate for clinical practice. However, the significant negative impact of cognitive impairment in stroke survivors has been consistently demonstrated.2–4 As such, 20% nondetection of patients with cognitive impairment seems unacceptable for clinical practice. Further research is required to more comprehensively examine existing screening measures that show initial promise (MoCA, Cognistat, RBANS, BNI), addressing previous methodological weaknesses noted above. Further development of more appropriate stroke-specific screening measures may be warranted if future research does not generate positive results. Furthermore, it is important to evaluate how current recommended guidelines (cognitive screening followed by comprehensive assessment) are being implemented in clinical practice. There is evidence to suggest good adherence to cognitive screening protocols, but limited provision of further comprehensive assessment when indicated by screening results.31 Further research exploring potential modifications to screening processes is also warranted. For example, benefits of including patient, close other, or clinician reports of cognitive difficulties, in conjunction with screening measures, to improve detection of cognitive difficulties could be explored. Addition of items not included in current screening measures, but often affected by stroke, such as calculation, praxis, and mental speed, should also be considered. It would be particularly helpful for these cognitive measures to be incorporated as standard measures within stroke trials to ensure ongoing comprehensive investigation of their utility across research and clinical settings. Although beyond the scope of this review, it is also important to consider cognitive screening in other cerebrovascular disorders.
For example, some screening protocols have been specifically developed for small-vessel disease and have demonstrated encouraging results.32 This may be because of the relatively more homogeneous neuropathology and associated cognitive profile seen in this population, compared with the relatively more heterogeneous cognitive profile across the stroke population, which seems to present as a challenge for some existing screening measures to accommodate.
In conclusion, a limited number of studies have adequately investigated the sensitivity and specificity of cognitive screening measures after stroke. Although most studies do not support the MMSE for clinical use, the MoCA, Cognistat, RBANS, and BNI show some initial promise. However, further research addressing key methodological considerations and further discussion regarding what is considered acceptable sensitivity and specificity for clinical practice is required before use of these screening measures can be fully supported.
The online-only Data Supplement is available with this article at http://stroke.ahajournals.org/lookup/suppl/doi:10.1161/STROKEAHA.114.004232/-/DC1.
- Received February 18, 2014.
- Revision received May 27, 2014.
- Accepted July 1, 2014.
- © 2014 American Heart Association, Inc.
- Hommel M, Miguel ST, Naegele B, Gonnet N, Jaillard A
- Jaillard A, Naegele B, Trabucco-Miguel S, LeBas JF, Hommel M
- 7. Intercollegiate Stroke Working Party. National Clinical Guidelines for Stroke. 4th ed. London, UK: Royal College of Physicians; 2012.
- 8. National Stroke Foundation. Clinical Guidelines for Stroke Management 2010. Melbourne, Australia: National Stroke Foundation; 2010.
- Lindsay MP, Gubitz G, Bayley M, Hill MD, Davies-Schinkel C, Singh S, Phillips S
- Lezak MD, Howieson DB, Bigler ED, Tranel D
- Blake H, McKinney M, Treece K, Lee E, Lincoln NB
- Carley S, Dosman S, Jones SR, Harrison M
- Nøkleby K, Boland E, Bergersen H, Schanke AK, Farner L, Wagle J, et al
- Nys GM, van Zandvoort MJ, de Kort PL, Jansen BP, Kappelle LJ, de Haan EH
- Godefroy O, Fickl A, Roussel M, Auribault C, Bugnicourt JM, Lamy C, et al
- O'Sullivan M, Morris RG, Markus HS