Validation of a New Language Screening Tool for Patients With Acute Stroke
The Language Screening Test (LAST)
Background and Purpose: Standard aphasia scales such as the Boston Diagnostic Aphasia Evaluation are inappropriate for use in acute stroke. Likewise, global stroke scales do not reliably detect aphasia, and existing brief aphasia screening scales suitable for patients with stroke have several limitations. The objective of this study was to generate and validate a bedside language screening tool, the Language Screening Test, suitable for use in the emergency setting.
Methods: The Language Screening Test comprises 5 subtests and a total of 15 items. To avoid retest bias, we created 2 parallel versions of the scale. We report the equivalence of the 2 versions, their internal and external validity, and their interrater reliability. We validated the scale by administering it to 300 consecutive patients within 24 hours after admission to our stroke unit and to 104 stabilized patients with and without aphasia using the Boston Diagnostic Aphasia Evaluation as a reference.
Results: The 2 versions of the Language Screening Test were equivalent with an intraclass correlation coefficient of 0.96. Internal validity was good: no item showed a floor or ceiling effect, there was no redundancy between items, and internal consistency was good (Cronbach α 0.88). External validation against the Boston Diagnostic Aphasia Evaluation showed a sensitivity of 0.98 and a specificity of 1. Interrater agreement was near perfect (intraclass correlation coefficient, 0.998). The median time to complete the Language Screening Test was approximately 2 minutes. Importantly, the Language Screening Test does not need to be administered by a speech and language therapist.
Conclusions: This comprehensively validated language rating scale is simple and rapid, making it a useful tool for bedside evaluation of patients with acute stroke in routine clinical practice.
Poststroke aphasia is a major source of disability, potentially leading to impaired communication, reduced social activity, depression, and a lower probability of resuming work.1–4 Despite some controversy, early detection of aphasia after stroke may improve rehabilitation by taking advantage of the synergy between intensive speech therapy and early neural reorganization.5–7 Tools capable of detecting aphasia and evaluating its severity during the acute phase of stroke might help to improve early rehabilitation and to predict outcome.8 Standard aphasia rating scales such as the Western Aphasia Battery, the Boston Diagnostic Aphasia Evaluation (BDAE), and the Boston Naming Test are not appropriate for use during the acute phase of stroke.7,9–11 In particular, these scales take too long to complete and must be administered by speech and language therapists.9–11 Global stroke rating scales such as the National Institutes of Health Stroke Scale and the Scandinavian Stroke Scale include language items and have been developed for use in acute settings,12–17 but they do not reliably detect aphasia.8 Several attempts have been made to develop and validate brief aphasia screening scales suitable for patients with acute stroke,5,18–25 but all have inherent structural limitations, including7: (1) inclusion of written language subtests, the results of which are influenced by hemiplegia and illiteracy5,19–23,25; (2) use of complex visual material inappropriate for patients with stroke with neurovisual deficits19,20; (3) inclusion of subtests the results of which are markedly influenced by attention/executive dysfunction19,20; (4) excessively lengthy administration22; (5) difficulties with administration or scoring5,18,23,25; and (6) IQ dependency.21 Some of these scales also have poor sensitivity for the detection of language disorders and a paucity of information on their validity and reliability.5,20
We therefore developed a brief language screening scale, named the Language Screening Test (LAST), for the assessment of patients with acute stroke. LAST incorporates the following features: (1) no written material; (2) no complex visual material; (3) no evaluation of verbal executive function; and (4) suitability for bedside administration by persons who are not speech and language therapists. We report the validity, reliability, sensitivity, and specificity of LAST.
Patients and Methods
LAST was developed as a formalized quantitative scale for screening language functions, including comprehension and expression. The initial, qualitative design phase focused on item generation and construction. We chose to exclude verbal fluency subtests, the results of which are strongly influenced by changes in attention/executive function, and also written language subtests that are unsuitable for hemiplegic and illiterate patients. We generated several preliminary versions of the scale, which were evaluated internally and then refined because of the following weaknesses: (1) too lengthy to administer (too many items); (2) in the naming task, inadequate use of real daily objects such as watches and pens, instead of images, which are less ambiguous (for example, the pen of 1 examiner is different from the pen of another one); and (3) in the picture recognition task, inadequate use of color pictures, which may provide semantic clues. We selected the items by consensus and eliminated any ambiguities by administering the scale to 50 healthy volunteers (data not shown).
The final version of LAST consists of 5 subtests and a total of 15 items (Figure 1). The patient has 5 seconds to answer each question, and the answer is scored as either 1 (perfect answer) or 0 (imperfect answer, including arthric errors, and failure to answer). The maximum score is therefore 15. There are 2 subscores, namely an expression index (naming, repetition, and automatic speech; maximum score 8 points) and a receptive index (picture recognition and verbal instructions; maximum score 7 points).
The test is administered on a simple sheet held in portrait orientation. The front side corresponds to the expression index with 5 pictures to be named facing the patient and the instructions facing the examiner. The other side corresponds to the receptive index with 8 pictures (4 to be indicated with a finger and 4 trap pictures) facing the patient and the instructions facing the examiner (see Supplemental Data; http://stroke.ahajournals.org).
Each subtest is composed as follows (Figure 1):
“Naming” subtest: naming of 5 black-and-white pictures specially drawn for the test. The pictures were selected for their everyday familiarity (subjective verbal frequency) and for the image evoking value of the noun.26 Standard synonyms and abbreviations are accepted (alligator for crocodile, TV for television, etc). The maximum score is 5 points.
“Repetition” subtest: repetition of 1 concrete 4-syllable noun and 1 8-word sentence containing 11 syllables and 3 consonantal groups. One self-correction is accepted. The maximum score is 2 points, 1 for the isolated word and 1 for the sentence.
“Automatic speech” subtest: the patient counts from 1 to 10. No mistakes or omissions are accepted. The score is 1 or 0.
“Picture recognition” subtest: recognition of 4 black-and-white pictures drawn specially for the test and selected for their high image-evoking value and sorted by their subjective verbal frequency. This subtest includes 2 phonologic traps (close and distant), 1 semantic and 1 visual. The maximum score is 4.
“Verbal instructions” subtest: execution of 3 verbal orders—simple, semicomplex, and complex—involving the use of part of the body or simple objects in the room. The patient is asked to precisely execute the verbal order. The maximum score is 3.
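The scoring rules described above can be illustrated with a minimal sketch. It assumes each item has already been rated 1 (perfect answer) or 0 by the examiner; the function name `score_last` and the dictionary layout are hypothetical and serve only as illustration, not as part of the published instrument.

```python
# Number of binary (0/1) items per subtest, as described in the text.
SUBTEST_SIZES = {
    "naming": 5,
    "repetition": 2,
    "automatic_speech": 1,
    "picture_recognition": 4,
    "verbal_instructions": 3,
}
# Grouping of subtests into the 2 subscores described in the text.
EXPRESSION_SUBTESTS = ("naming", "repetition", "automatic_speech")
RECEPTIVE_SUBTESTS = ("picture_recognition", "verbal_instructions")


def score_last(responses):
    """Return the expression index, receptive index, and total score.

    `responses` maps each subtest name to a list of item ratings,
    each rating being 1 (perfect answer) or 0 (anything else).
    """
    for name, size in SUBTEST_SIZES.items():
        items = responses[name]
        if len(items) != size:
            raise ValueError(f"{name}: expected {size} items, got {len(items)}")
        if any(item not in (0, 1) for item in items):
            raise ValueError(f"{name}: items must be rated 0 or 1")

    expression = sum(sum(responses[name]) for name in EXPRESSION_SUBTESTS)
    receptive = sum(sum(responses[name]) for name in RECEPTIVE_SUBTESTS)
    return {
        "expression_index": expression,  # maximum 8
        "receptive_index": receptive,    # maximum 7
        "total": expression + receptive,  # maximum 15
    }


# Hypothetical patient who fails one naming item and the complex verbal order.
example = score_last({
    "naming": [1, 1, 0, 1, 1],
    "repetition": [1, 1],
    "automatic_speech": [1],
    "picture_recognition": [1, 1, 1, 1],
    "verbal_instructions": [1, 1, 0],
})
print(example)
```

In this hypothetical case the expression index is 7 of 8, the receptive index is 6 of 7, and the total is 13 of 15.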
Having developed a version of the scale that we considered suitable for validation (LAST-a), we then generated a second, parallel version (LAST-b). Each item on the 2 scales was different (except for the automatic speech item, see subsequently) but strictly matched to obtain 2 equivalent versions of the scale. For example, the pictures (naming subtest and recognition subtest) used in the 2 versions were each matched for their visual and verbal frequency, and the words and sentences used for the repetition subtest were matched for their consonantal content. Several series can be used to assess automatic speech, but counting is the most universally acquired (days of the week and the alphabet, for example, are more influenced by sociocultural status). Counting to 10 was thus used in both versions (see Supplemental Data).
Patients and Instruments
To validate the scale, we included both “acute” and “chronic” patients. We first prospectively enrolled consecutive “acute” patients, that is, patients admitted with suspected acute stroke to our stroke unit during a 7-month period. They were tested within 24 hours after their admission. During the same period, we enrolled stabilized patients (hospitalized or ambulatory) seen in our neurology department, but not in the stroke unit, who were able to complete the entire BDAE comprehensive language evaluation. These “chronic” patients were considered aphasic or nonaphasic on the basis of their BDAE results. The BDAE is a standard scale widely used for comprehensive evaluation of aphasia. Its 28 subtests evaluate oral comprehension, oral agility, repetition, naming, oral reading, reading comprehension, and writing and take between 1.5 and 2 hours to administer.10 Both “acute” and “chronic” patients were excluded if they had any of the following characteristics: (1) history of dementia or of severe psychiatric disorders; (2) deafness or blindness; (3) nonnative French speaker; and (4) altered consciousness. The study was approved by the ethics committee of Pitié-Salpêtrière Hospital, Paris. Demographic data were collected for all the patients and the National Institutes of Health Stroke Scale score was recorded for the “acute” patients. A schematic representation of the study design is shown in Figure 2.
Validation of LAST
LAST was validated on the basis of (1) the equivalence of the 2 versions of the instrument; (2) the internal validity of the 2 versions of the instrument (item analysis, reliability, and factor structure); (3) external validity (comparison with the BDAE scale); and (4) interrater reliability.
To test the equivalence of the 2 versions of LAST, the “chronic” aphasic patients were asked to complete LAST-a followed by LAST-b with a 1-minute rest period.
To assess the internal validity of the scale, consecutive “acute” patients completed either LAST-a or LAST-b within 24 hours after admission. The 2 versions were used in alternation for each new patient (only 1 version per patient).
To assess external validity, aphasic and nonaphasic “chronic” patients were asked to complete the BDAE language evaluation followed by either LAST-a or LAST-b on the same day and administered by 2 different and blinded examiners.
Interrater reliability was assessed in the “acute” patients. Four types of examiner pairs were used, each consisting of a speech and language therapist with another speech and language therapist, a student, a nurse, or a neurologist. All the examiners received a 5-minute explanation on how to administer the test. Blinded assessment was ensured as follows. Two examiners were present at the bedside. The first examiner administered LAST to the patient (result used for internal validity), reading the subtests aloud one by one, while the second examiner, who could hear the first examiner but could not see the results he or she recorded, simultaneously scored the same version.
The median time for scale completion was calculated for 50 new consecutive “acute” patients.
The concordance of the 2 versions of LAST was assessed by calculating the intraclass correlation coefficient (ICC) from the 2 total scores (equivalent to a quadratic weighted κ).
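As an illustration of this concordance measure, the following minimal sketch computes a single-measure, two-way ICC for absolute agreement from paired total scores. The text does not state which ICC form was reported; the agreement form is assumed here because it is the one usually described as equivalent to a quadratic weighted κ. The function name `icc_agreement` and the example scores are hypothetical.

```python
def icc_agreement(ratings):
    """Two-way ICC for absolute agreement, single measure.

    `ratings` is a list of rows, one per patient; each row holds the
    scores given by each version (or rater) for that patient.
    """
    n = len(ratings)       # number of patients
    k = len(ratings[0])    # number of versions or raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical paired total scores (first version, second version).
paired_scores = [[15, 15], [9, 10], [12, 12], [4, 5], [14, 13]]
print(round(icc_agreement(paired_scores), 3))
```

Because the agreement form penalizes a systematic shift between the 2 columns, a second version that was consistently 1 point easier would lower the coefficient even if patients were ranked identically.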
Internal validity was assessed in 3 steps. First, we closely inspected the score distribution for each item to detect a floor or ceiling effect, and the Pearson correlation matrix was used to detect item redundancy. Second, the number of underlying dimensions was determined by parallel analysis, which consists of representing a traditional screeplot with simulations.27,28 Third, we calculated Cronbach α coefficient, a measure of reliability based on internal correlation of the items on the scale.
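The third step can be sketched as follows. This is a minimal illustration of the standard Cronbach α formula, α = k/(k−1) × (1 − Σ item variances / variance of total score), not a reproduction of the analysis performed with the “psy” library; the function name `cronbach_alpha` and the small data set are hypothetical.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach alpha for a list of rows (one row of item scores per patient)."""
    k = len(item_scores[0])  # number of items on the scale
    if k < 2:
        raise ValueError("at least 2 items are required")

    # Sample variance of each item across patients.
    item_variances = [
        variance([row[j] for row in item_scores]) for j in range(k)
    ]
    # Sample variance of the total score across patients.
    total_variance = variance([sum(row) for row in item_scores])
    if total_variance == 0:
        raise ValueError("total score has zero variance")

    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings of 6 patients on a 4-item scale (1 = perfect answer).
toy_items = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
print(round(cronbach_alpha(toy_items), 2))
```

With the real instrument, each row would hold the 15 binary item scores of one patient.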
To evaluate external validity, sensitivity and specificity were assessed with respect to the BDAE. We represented the correlations between the LAST and BDAE subtests on a sphere (Figure 3)29 and with the receiver operating characteristic curve (Figure 4).
The ICC was used to assess interrater reliability.
R 2.11.1 software and the “psy” library were used for all analyses.30
Results
Three hundred forty consecutive unselected “acute” patients were admitted to our stroke unit for suspected acute stroke during a 7-month period. Thirty-six patients were excluded (nonnative French speakers [n=24], history of dementia or severe psychiatric disorders [n=6], deafness or blindness [n=3], altered consciousness [n=3]) and 4 could not be evaluated for logistical reasons. The remaining 300 “acute” patients were included in the internal validity and interrater reliability assessments (159 men and 141 women; mean age 62.6 years [±14.9]; mean National Institutes of Health Stroke Scale score 3.5 [±5.1]; Figure 2).
The sample of 104 “chronic” patients consisted of 55 men and 49 women with a mean age of 61.6 years (±17.9). Based on the BDAE results, 54 of these “chronic” patients (30 men and 24 women, mean age 66.4 years [±11.0]) had aphasia and were used to study the equivalence of the 2 versions of LAST. To assess external validity, we used the results for the 50 “chronic” patients without aphasia (25 men and 25 women; mean age 56.4 years [±16.2]) and 52 “chronic” patients with aphasia (27 men and 25 women; mean age 67.4 years [±14.8]; mean LAST score 9.9 [±3.9]). Two patients refused the BDAE (Figure 2).
Time Taken to Administer LAST
The median time required to complete LAST was 124 seconds (interquartile range, 80).
Equivalence of LAST-a and LAST-b
The comparison of LAST-a and LAST-b in the sample of 54 “chronic” aphasic patients showed that the 2 versions were strictly equivalent with an ICC of 0.96. Exclusion of the automatic speech item, which is identical in the 2 versions, did not significantly modify the ICC (0.954). None of the patients diagnosed as “aphasic” in 1 of the versions was “nonaphasic” in the other version. The same level of agreement was observed for each item of the scale.
Because LAST-a and LAST-b were equivalent, data obtained with the 2 versions were pooled for analysis. Similar results were obtained with LAST-a, LAST-b, and the 2 versions combined. Item-by-item analysis of the whole sample of 300 “acute” patients showed no floor or ceiling effect. There was no redundancy between items as shown by Pearson correlation coefficients <0.8. Parallel analysis revealed a 1-dimensional structure. The internal consistency of the 15 items was good with a Cronbach α of 0.88.
Taking the BDAE as the gold standard, LAST had a sensitivity of 0.98 for aphasia and a specificity of 1 with a cutoff of <15 in the sample of 102 “chronic” patients. Thus, only 1 patient identified as “aphasic” with the BDAE obtained a score of 15 out of 15 in LAST, whereas all patients with a LAST score of <15 were diagnosed as “aphasic” with the BDAE. A spherical representation of the correlation matrix of the LAST and BDAE subtests is shown in Figure 3 (the closer the points, the stronger the correlation). The receiver operating characteristic curve in Figure 4 illustrates the agreement between the 2 tests through the tradeoff between sensitivity and specificity: the closer the curve is to the upper left-hand corner, the higher the overall accuracy of the test.31
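The sensitivity and specificity reported above follow directly from the counts given in the text (52 aphasic patients, of whom 1 scored 15 of 15, and 50 nonaphasic patients, none of whom scored <15). The sketch below reproduces that arithmetic; the function name and the individual scores assigned to aphasic patients are hypothetical placeholders, because only the counts on each side of the cutoff are reported.

```python
def sensitivity_specificity(last_scores, bdae_aphasic, cutoff=15):
    """Compare a LAST cutoff (screen positive if score < cutoff) with the BDAE.

    `last_scores` holds total LAST scores; `bdae_aphasic` holds the matching
    reference classification (True if aphasic according to the BDAE).
    """
    tp = fn = tn = fp = 0
    for score, aphasic in zip(last_scores, bdae_aphasic):
        screen_positive = score < cutoff
        if aphasic and screen_positive:
            tp += 1
        elif aphasic:
            fn += 1
        elif screen_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Counts from the text; the value 10 is a placeholder for "any score < 15".
scores = [10] * 51 + [15] * 1 + [15] * 50
reference = [True] * 52 + [False] * 50

sens, spec = sensitivity_specificity(scores, reference)
print(round(sens, 2), round(spec, 2))
```

The single aphasic patient who scored 15 of 15 is the only false negative, which gives a sensitivity of 51/52.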
Interrater reliability for the 300 “acute” patients was near perfect (ICC, 0.998). The results obtained by the examiner pairs consisting of 2 speech and language therapists (26%) were not different from those of the pairs combining a speech and language therapist with a nurse (32%), student (34%), or a neurologist (8%). The ICC was near perfect regardless of the nature of the second examiner.
Discussion
We have developed and validated a brief language screening scale (LAST) for patients with acute stroke. LAST standardizes and formalizes quantitative clinical language examination in the emergency setting. The scale has good internal validity, correlates well with the gold standard BDAE scale, shows very high interrater reliability, and is quick to complete. We developed 2 versions of the scale to avoid retest bias and found that the 2 versions were equivalent. Importantly, LAST does not need to be administered by a speech and language therapist. With a cutoff score of <15 from a maximal score of 15, LAST showed excellent sensitivity and specificity for language disorders, thus identifying patients warranting personalized language evaluation with a speech and language therapist. Although the benefit of language therapy during the acute phase of stroke is controversial, this screening tool may help to begin early language rehabilitation, which may optimize long-term rehabilitation.5–7 One original strength of LAST is the possibility of using the 2 versions successively to test the same patient, thereby avoiding the retest effect.
LAST detected a language deficit (score <15, the cutoff based on external validation) in 55% of the 300 patients admitted urgently to our stroke unit during the study period, whereas aphasia is reported in only 17% to 38% of patients in other acute stroke series.32 Explanations for this difference may include (1) a higher sensitivity of LAST for aphasia in this setting; (2) early testing in our study (within 24 hours after admission), thus identifying patients who would go on to recover rapidly33; and (3) identification of false-positive (nonaphasic) patients such as (a) patients with dysarthria (8% to 30% of patients in large stroke series have isolated dysarthria)34,35; and (b) patients with visual field impairment, eye movement disorders, or initiative/executive dysfunctions (for example, the maximal response time of 5 seconds could penalize patients with initiative disorders). Lastly, although we excluded patients with a history of dementia or severe psychiatric disorders, deafness or blindness, altered consciousness, or a non-French native language, such patients could undermine the reliability of LAST results in a real-life setting. Although we included consecutive patients, we acknowledge that they were rather young and had fairly mild strokes compared with published stroke series. Possible reasons for these particular characteristics are: (1) the oldest patients with stroke are preferentially admitted to geriatric acute care units; and (2) patients with more severe stroke are occasionally admitted to nonspecialized intensive care units. This may have resulted in a slight recruitment bias.
Concerning the potential limitations of our validation procedure, we had no alternative to testing external validity in “chronic” patients, because (1) there is no universally recognized gold standard scale for evaluating language disorders in the emergency setting; and (2) gold standard aphasia rating scales such as BDAE take too long to administer in acute stroke. In contrast, internal validity, interrater reliability, and the time required for scale completion were determined in “acute” patients. Finally, LAST was primarily designed to evaluate language impairment, but it is now well recognized that the impact of such impairments on daily life activities extends beyond the impairments themselves,36 and tools have recently been developed to specifically address this issue.37,38 It would be interesting to test LAST against such quality-of-life scales.
The impact of very early intervention (within days after stroke) on language recovery is difficult to screen, and LAST may prove useful for this purpose. However, to further establish its use, future studies are warranted comparing LAST and language items of the National Institutes of Health Stroke Scale against BDAE or another gold standard. A recent Cochrane review showed a benefit of speech and language therapy in patients with stroke but failed to establish the best way of delivering such therapy or the best time to initiate speech and language therapy.39 This review was based on 30 randomized trials of various interventions designed to improve language in patients with stroke, but none of the studies focused on very early interventions, starting within 15 days after stroke. As a result, the use of the usual prevalent tools such as BDAE was warranted. By contrast, a recent randomized controlled trial of very early intervention (Day 2) used a short adjusted home-made version of the Norsk Grunntest for Afasi. This scale was not validated, included written items, and took 15 minutes to complete, which limited its use to selected patients.40 The paucity of the literature on very early interventions and the use of nonvalidated scales underlines the need for new validated tools such as the LAST scale.
In conclusion, we propose a new validated language screening tool for patients with acute stroke, which can be administered at the bedside in approximately 2 minutes. This French-language scale should be easy to adapt to English and other languages. It may represent a useful complement to global stroke rating scales such as the National Institutes of Health Stroke Scale for initial evaluation of patients with stroke.
We thank Professor M.-G. Bousser for helpful discussions and critical reading of the article, Dr Alexis Elbaz for helpful discussions, and Tristan Laville for the drawings.
The online-only Data Supplement is available at http://stroke.ahajournals.org/cgi/content/full/STROKEAHA.110.609503/DC1.
- Received November 23, 2010.
- Revision received February 11, 2011.
- Accepted February 14, 2011.
- © 2011 American Heart Association, Inc.
- Wade DT, Hewer RL, David RM, Enderby PM
- Kertesz A
- Goodglass H, Kaplan E
- Cote R, Hachinski VC, Shurvell BL, Norris JW, Wolfson C
- Brott T, Adams HP Jr., Olinger CP, Marler JR, Barsan WG, Biller J, et al
- Gotoh F, Terayama Y, Amano T
Scandinavian Stroke Study Group. Multicenter trial of hemodilution in ischemic stroke—background and study protocol. Stroke. 1985;16:885–890.
- Adams RJ, Meador KJ, Sethi KD, Grotta JC, Thomson DS
- Grant I, Adams KM
- Reitan RM, Wolfson D
- Reinvang I, Engvik H
- Falissard B
http://cran.r-project.org. R (computer program). Version 1.9.1. 2002.
- Zweig MH, Campbell G
- Engelter ST, Gostynski M, Papa S, Frei M, Born C, Ajdacic-Gross V, et al
- Melo TP, Bogousslavsky J, van Melle G, Regli F
- Urban PP, Wicht S, Vukurevic G, Fitzek C, Fitzek S, Stoeter P, et al
- Hilari K, Byng S, Lamping DL, Smith SC
- Post MW, Boosman H, van Zandvoort MM, Passier PE, Rinkel GJ, Visser-Meily JM