Stroke. 2005;36:635-638
Published online before print January 27, 2005,
doi: 10.1161/01.STR.0000155688.18207.33
© 2005 American Heart Association, Inc.
Performance of the PHQ-9 as a Screening Tool for Depression After Stroke
Linda S. Williams, MD;
Edward J. Brizendine, MS;
Laurie Plue, MA;
Tamilyn Bakas, DNS, RN;
Wanzhu Tu, PhD;
Hugh Hendrie, MD;
Kurt Kroenke, MD
From the Roudebush VAMC HSR&D (L.S.W.); the Department of Neurology (L.S.W., L.P.), Indiana University School of Medicine; Regenstrief Institute (L.S.W., W.T., H.H., K.K.); the Department of Medicine (E.J.B., W.T., K.K.), Indiana University School of Medicine; and the Indiana University School of Nursing (T.B.), Indianapolis, Ind.
Correspondence to Dr Linda S. Williams, Roudebush VAMC HSR&D 11-H, 1481 W. 10th Street, Indianapolis, IN 46202. E-mail Lwilliams{at}hsrd.va.iupui.edu
Abstract
Background and Purpose The purpose of this study was
to examine the performance of the Patient Health Questionnaire
(PHQ)-9, a 9-item depression scale, as a screening and diagnostic
instrument for assessing depression in stroke survivors.
Methods As part of a randomized treatment trial for poststroke depression (PSD), subjects with and without PSD completed the PHQ-9, a 9-item summed scale, with scores ranging from 0 (no depressive symptoms) to 27 (all symptoms occurring daily). Subjects endorsing 2 or more symptoms of depression were administered the criterion standard Structured Clinical Interview for Depression (SCID). Receiver operating characteristic analysis was used to examine the sensitivity and specificity of the PHQ-9.
Results Of 316 subjects enrolled, 145 met SCID criteria for major depression or other depressive disorder, and 171 were not depressed. PHQ-9 scores discriminated well between subjects with any versus no depressive disorder, with an area under the curve (AUC) of 0.96, as well as between subjects with and without major depression (AUC=0.96). The AUC was similar regardless of patient age, gender, or ethnicity. A PHQ-9 score ≥10 had 91% sensitivity and 89% specificity for major depression, and 78% sensitivity and 96% specificity for any depression diagnosis.
Conclusions The PHQ-9 performs well as a brief screener for PSD with operating characteristics similar or superior to other depression measures and similar to its characteristics in a primary care population. Moreover, PHQ-9 scores discriminate equally well between those with and without PSD regardless of age, gender, or ethnicity.
Key Words: depression • stroke
Introduction
Poststroke depression (PSD) affects approximately one-third
of ischemic stroke survivors, is often undiagnosed and inadequately
treated, and is associated with increased morbidity and mortality
after stroke.1–4 Depression screening after stroke is
thus important but can be complicated by cognitive and physical
symptoms of stroke that may introduce additional variability
in assessment of depressive symptoms and depression diagnosis.
Although several established depression screening instruments
have been validated in stroke cohorts,5–10 these scales
can be burdensome for patients to complete, require a trained
interviewer to administer, and often are designed only for screening
and not as a diagnostic depression tool. The Patient Health
Questionnaire 9-item depression scale (PHQ-9) is a 9-item self-administered
depression screening and diagnostic tool increasingly used in
primary care and other medical populations.11,12 Although it
has excellent measurement properties in other settings, it has
not been previously validated in patients with PSD. The purpose
of this study was to examine the performance of the PHQ-9 as
a screening and diagnostic instrument for assessing depression
in ischemic stroke survivors.
Subjects and Methods
Subjects were patients enrolled in the National Institute for
Neurologic Disorders and Stroke-funded AIM (Activate, Initiate
treatment, Monitor) PSD study. The AIM study consists of a randomized
clinical trial of case-management intervention versus usual
care in depressed subjects, nested within a longitudinal cohort
study that includes nondepressed subjects. Nondepressed subjects
were matched 1:1 by site of enrollment to depressed subjects.
Eligible patients at 4 Indianapolis hospitals were screened
for PSD between 1 and 2 months after ischemic stroke. Patients
with more than moderate aphasia (National Institutes of Health
Stroke Scale language item score >1) or cognitive impairment
(modified 6-item Mini-Mental Status score <3) were excluded.13,14
We used previously validated methodology to estimate stroke
severity at the time of stroke admission from the admission
physical examination note.15 The local human subjects review
board approved this study.
All subjects were screened for depression with the PHQ-9, a 9-item scale that assesses the 9 Diagnostic and Statistical Manual of Mental Disorders (DSM-IV) depression symptom criteria for frequency of occurrence during the previous 2 weeks. The first 2 PHQ-9 items assess the 2 cardinal DSM-IV symptoms of depression: depressed mood and anhedonia. The PHQ-9 can be used as a screening tool, with summed score ranging from 0 (no depressive symptoms) to 27 (all symptoms occurring daily).12 In this use, a PHQ-9 score ≥10 has been found to have 88% sensitivity and 88% specificity for a diagnosis of major depression. The PHQ-9 can also be used as a diagnostic assessment, with major depression diagnosed if 5 or more of the 9 symptoms have been present more than half the days of the past 2 weeks and 1 of these symptoms is either depressed mood or anhedonia.11
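The two uses of the PHQ-9 described above can be sketched in Python. This is an illustrative sketch, not the study's scoring code. It assumes each item is coded 0 to 3 (which a 9-item scale with a 0 to 27 range implies), that an item score of 2 or more means the symptom was present more than half the days, and that the first two list positions hold the mood and anhedonia items. The function names are hypothetical, and the published instrument may apply item-specific rules that are not described in this text.

```python
from typing import Sequence


def phq9_total(items: Sequence[int]) -> int:
    """Sum nine item scores (assumed coding: 0-3 each) into a 0-27 total."""
    if len(items) != 9 or any(s not in (0, 1, 2, 3) for s in items):
        raise ValueError("expected nine item scores, each 0, 1, 2, or 3")
    return sum(items)


def phq9_screen_positive(items: Sequence[int], cutoff: int = 10) -> bool:
    """Screening use: positive when the summed score reaches the cutoff."""
    return phq9_total(items) >= cutoff


def phq9_major_depression(items: Sequence[int]) -> bool:
    """Diagnostic use as worded in the text: 5 or more symptoms present
    more than half the days (assumed: item score >= 2), with at least one
    of them being depressed mood or anhedonia (assumed: items 1 and 2)."""
    phq9_total(items)  # reuse the validation above
    present = [s >= 2 for s in items]
    return sum(present) >= 5 and (present[0] or present[1])


if __name__ == "__main__":
    answers = [2, 2, 2, 2, 2, 0, 0, 0, 0]  # made-up responses
    print(phq9_total(answers))
    print(phq9_screen_positive(answers))
    print(phq9_major_depression(answers))
```

Keeping the summed score and the symptom-count rule as separate functions mirrors the distinction the text draws between screening and diagnostic use.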
Subjects endorsing any occurrence in the past 2 weeks of at least 2 symptoms on the PHQ-9 or any subject endorsing the depressed mood or anhedonia item were administered the Structured Clinical Interview for Depression (SCID), considered a criterion standard for DSM-IV depressive disorder diagnoses in clinical research.16 Subjects who did not endorse the mood, anhedonia, or thoughts of death item and who had at most endorsed only 1 other PHQ-9 item were considered nondepressed. SCID diagnoses of major depression and other depression were determined according to standard scoring algorithms in which all depressed subjects must endorse depressed mood and/or anhedonia, and with major depression defined as a total of 5 or more depressive symptoms endorsed and other depression as a total of 3 or 4 depressive symptoms endorsed in the past 2 weeks.16 Only baseline PHQ-9 and PHQ-2 scores were used in these analyses.
The PHQ-2 is an abbreviated version of the PHQ-9. It is defined as the sum of the anhedonia and mood items of the PHQ-9 and can be used as a very brief depression screening tool in primary care.17,18 In primary care patients, a PHQ-2 score ≥3 has 83% sensitivity and 92% specificity for identifying patients with major depression.18 Because very brief screening tools may be preferred in some clinical settings, we also examined the sensitivity and specificity of the PHQ-2 score for identifying patients with major and any depression, again using the SCID criterion standard for depressive disorder diagnosis.
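As a small illustration of the PHQ-2 definition, the sketch below derives the PHQ-2 score from a set of PHQ-9 responses. It assumes the same 0 to 3 item coding as the PHQ-9 and that the mood and anhedonia items occupy the first two positions of the response list. The function names are hypothetical.

```python
from typing import Sequence


def phq2_score(phq9_items: Sequence[int]) -> int:
    """Sum of the first two PHQ-9 items (assumed: mood and anhedonia), 0-6."""
    first_two = list(phq9_items[:2])
    if len(first_two) != 2 or any(s not in (0, 1, 2, 3) for s in first_two):
        raise ValueError("expected at least two item scores, each 0-3")
    return sum(first_two)


def phq2_screen_positive(phq9_items: Sequence[int], cutoff: int = 3) -> bool:
    """Positive screen when the two-item sum reaches the cutoff."""
    return phq2_score(phq9_items) >= cutoff


if __name__ == "__main__":
    answers = [1, 2, 0, 0, 0, 0, 0, 0, 0]  # made-up responses
    print(phq2_score(answers), phq2_screen_positive(answers))
```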
We used receiver operating characteristic analyses to assess the discriminatory power of the PHQ-9 as a screening and diagnostic tool for any depression (other depression or major depression) and major depression. We also examined the effect of age (60 years or older versus younger than 60 years), gender, and ethnicity (white versus nonwhite) on the PHQ-9 operating characteristics. We assessed the sensitivities and specificities of a PHQ-9 screening score ≥10 and a PHQ-2 score ≥3 for identifying subjects with major depression and any depression. Demographic characteristics were compared between depressed and nondepressed subjects using the Pearson χ² test. SAS version 8.2 (SAS Institute, Cary, NC) was used for all analyses.
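The quantities named in this analysis plan can be illustrated with a short, dependency-free Python sketch. It is not the authors' SAS code. It computes sensitivity and specificity at a fixed cutoff, and the area under the receiver operating characteristic curve as the probability that a randomly chosen depressed subject scores higher than a randomly chosen nondepressed subject (ties counted as one half), which is one standard way to compute the AUC. The scores and diagnoses below are made up for illustration only.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    scores: Sequence[int], depressed: Sequence[bool], cutoff: int
) -> Tuple[float, float]:
    """Sensitivity and specificity of 'score >= cutoff' against the
    criterion-standard diagnosis."""
    tp = sum(1 for s, d in zip(scores, depressed) if d and s >= cutoff)
    fn = sum(1 for s, d in zip(scores, depressed) if d and s < cutoff)
    tn = sum(1 for s, d in zip(scores, depressed) if not d and s < cutoff)
    fp = sum(1 for s, d in zip(scores, depressed) if not d and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores: Sequence[int], depressed: Sequence[bool]) -> float:
    """AUC as the fraction of (depressed, nondepressed) pairs in which the
    depressed subject has the higher score; ties count as half a pair."""
    pos = [s for s, d in zip(scores, depressed) if d]
    neg = [s for s, d in zip(scores, depressed) if not d]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical data, NOT from the study.
    example_scores = [8, 11, 12, 15, 2, 4, 9, 10]
    example_dx = [True, True, True, True, False, False, False, False]
    print(sensitivity_specificity(example_scores, example_dx, cutoff=10))
    print(auc(example_scores, example_dx))
```

Subgroup comparisons such as those by age, gender, or ethnicity would apply the same functions to each subset of subjects.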
Results
Of 316 patients enrolled, 145 were depressed and 171 were nondepressed.
There was a greater proportion of younger patients (younger
than 60 years old) among the depressed subjects than the nondepressed
subjects (50% versus 35% younger than 60 years old; P=0.009)
but the groups were similar in terms of gender (57% versus 46%
female; P=0.07) and ethnicity (60% versus 58% white; P=0.71).
National Institutes of Health Stroke Scale score at the time
of stroke was also similar between depressed and nondepressed
subjects (mean score 3.3 versus 3.1; P=0.475). All symptoms
were endorsed significantly more frequently by depressed than
by nondepressed patients (Table 1). Fatigue was the symptom
most frequently endorsed by both the depressed (73%) and the
nondepressed (32%) subjects. The ninth item of the PHQ-9 (asking
whether the patient had been bothered by "thoughts that you
would be better off dead or of hurting yourself in some way")
was endorsed by 10% of depressed subjects.
The PHQ-9 had excellent discriminatory power (Figure) for subjects with any depression, with an area under the curve of 0.96, as well as those with major depression only (area under the curve=0.96). The area under the curve was similar between older (0.97) and younger (0.94) subjects, men (0.97) and women (0.94), and white (0.96) and nonwhite (0.96) subjects (Table 2). PHQ-9 scores ≥10 had 91% sensitivity and 89% specificity for major depression and 78% sensitivity and 96% specificity for any depression diagnosis (Table 3). A PHQ-2 score ≥3 had 83% sensitivity and 84% specificity for major depression and 78% sensitivity and 95% specificity for any depression diagnosis.
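To show how such operating characteristics translate into the chance that a positive or negative screen is correct, the sketch below applies Bayes' rule to the reported PHQ-9 figures for any depression (78% sensitivity, 96% specificity) at the proportion of depressed subjects in this sample (145 of 316). Predictive values are not reported in the text, so the results are only an approximate illustration: the inputs are rounded percentages, and the sample proportion reflects the study's matched design rather than a population prevalence. The function name is hypothetical.

```python
from typing import Tuple


def predictive_values(
    sensitivity: float, specificity: float, prevalence: float
) -> Tuple[float, float]:
    """Positive and negative predictive values from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


if __name__ == "__main__":
    # Reported values for any depression; prevalence taken from the sample.
    ppv, npv = predictive_values(0.78, 0.96, 145 / 316)
    print(f"PPV ~ {ppv:.2f}, NPV ~ {npv:.2f}")
```

Because predictive values depend on prevalence, passing a different `prevalence` (for example, a clinic's own estimate) would give different results from the same sensitivity and specificity.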
Discussion
The PHQ-9 performed with similarly high diagnostic accuracy
for both major depression and any depression in patients with
PSD, and performed as well in stroke survivors as it has
in the general medical outpatient population in which it was
developed. Performance of the PHQ-9 did not differ by age, ethnicity,
or gender. The PHQ-2 also performed quite well as a depression-screening
tool with nearly identical performance to the PHQ-9 in identifying
subjects with any depression. However, for diagnosis and more
complete clinical evaluation of depression symptoms, those scoring
≥3 on the PHQ-2 should be administered the additional 7 items
to complete the PHQ-9.
Like previous studies,19,20 we found much higher endorsement of all depression symptoms in patients with depression compared with those without. Even for the more somatic symptoms of depression (agitation, sleep disturbance, fatigue, and appetite disturbance), depressed patients were more than twice as likely to endorse these symptoms compared with those without depression. This finding speaks to the need to actively screen for depression in the poststroke period rather than attributing physical symptoms to the stroke itself. Importantly, although 10% of our sample endorsed the "suicidality/thoughts of death" item, a proportion identical to that reported in an earlier study of suicidal thoughts at 3 and 15 months after stroke,21 our standardized suicidality assessment demonstrated that endorsement of this item almost exclusively represented passive thoughts (ie, increased mortality or likelihood of death, or thoughts of whether life was worth living) rather than active suicidal ideation.
Other depression screening tools have been validated in stroke populations5–10 and at least 1 scale has been developed for depression screening in patients with aphasia.22 Although most of these scales have performed reasonably well in stroke cohorts, they are longer than the PHQ-9, and many can only be used for depression screening, not for specific depressive disorder diagnoses. A key difference between the PHQ-9 and other depression scales is that it is based on the 9 DSM-IV symptoms of depression, which facilitates its use as a depression diagnostic tool as well as a screening instrument. Further, although other scales may perform differently in different ethnic groups or among patients of different age or gender,23,24 this critical aspect of scale measurement is infrequently assessed. We found that the PHQ-9 performed equally well regardless of age, gender, or ethnicity, suggesting that it can be confidently used even in a heterogeneous group of patients. Finally, the PHQ-9 has also been demonstrated to be sensitive to change in assessing depression outcomes over time and thus is valuable for monitoring response to depression therapy,25,26 a key aspect of scale performance that we plan to evaluate in this study cohort.
A potential limitation of our study is the inherent difference between clinical trial cohorts and the entire population with a given condition. This study included patients (both depressed and nondepressed) who were participants in a clinical trial and so are likely to have different physical, psychological, and behavioral characteristics than a population-based stroke cohort. Further, these were patients with no more than moderate language or cognitive effects of stroke, most of whom were outpatients at the time of evaluation. Although it is encouraging that we observed no differences in PHQ-9 performance by age, gender, or ethnicity, it would be beneficial to evaluate the PHQ-9 in other stroke samples to ensure its measurement characteristics are stable across the full range of stroke populations and severity levels. Additionally, in some subjects, the same rater administered both the PHQ-9 and the SCID, which may contribute to the observed agreement between the scales. We addressed this possibility by conducting rigorous training and evaluation of all study personnel to minimize interviewer bias, but we acknowledge that, for the purposes of comparison, these 2 scales would ideally have always been scored by independent interviewers.
Because PSD is common and is associated with increased morbidity and mortality after stroke,27–29 systematic screening for depression symptoms should be considered in all stroke survivors in the first months after stroke. Symptoms of PSD may be misattributed by the patient or the health care provider to expected physical effects of stroke, or they may be missed in the context of a busy follow-up visit in which review of diagnostic studies and attention to secondary stroke prevention often take precedence. As with all screening tools, a few false-positive and false-negative assignments are expected; if the clinician feels the patient might still be depressed despite not achieving the threshold score on the PHQ-9, additional questions should be asked to clarify the diagnosis. However, the use of a brief but accurate depression screening tool like the PHQ-9 that patients can self-complete before the provider visit could help increase the identification and treatment of PSD and thus improve patient outcomes after stroke.
Acknowledgments
The authors thank the AIM research team: Connie Dagon, Carrie
Dixon, Monta Gazvoda, Carol Kempf, Gloria Nicholas, and Jennifer
Stuart. This work was supported by NINDS grant RO1 NS39571-01
(L.S.W., Principal Investigator) and by an Advanced Research
Career Development Award, Department of Veterans Affairs (L.S.W.).
Received July 23, 2004;
revision received October 26, 2004;
accepted November 22, 2004.
References
- Robinson RG, Starr LB, Kubos KL, Price TR. A 2-year longitudinal study of post-stroke mood disorders: findings during the initial evaluation. Stroke. 1983; 14: 736–741.
- Herrman N, Black SE, Lawrence J, Szekely C, Szalai JP. The Sunnybrook Stroke Study: a prospective study of depressive symptoms and functional outcomes. Stroke. 1998; 29: 618–624.
- Kotila M, Numminen H, Waltimo O, Kaste M. Depression after stroke: results of the FINNSTROKE study. Stroke. 1998; 29: 368–372.
- Burvill PW, Johnson GA, Jamrozik KD, Anderson CS, Stewart-Wynne EG, Chakera TM. Prevalence of depression after stroke: the Perth Community Stroke Study. Br J Psychiatry. 1995; 166: 320–327.
- Williams LS, Ghose S, Swindle RW. The effect of depression and other mental health diagnoses on mortality after ischemic stroke. Am J Psychiatry. 2004; 161: 1090–1095.
- Shinar K, Gross CR, Price TR, Banko M, Bolduc PL, Robinson RG. Screening for depression in stroke patients: the reliability and validity of the center for epidemiologic studies depression scale. Stroke. 1986; 17: 241–245.
- Johnson G, Burvill PW, Anderson CS, Jamrozik K, Stewart-Wynne EG, Chakera TM. Screening instruments for depression and anxiety following stroke: experience in the Perth Community Stroke Study. Acta Psychiatr Scand. 1995; 91: 252–257.
- Gainotti G, Azzoni A, Razzano C, Lanzillotta M, Marra C, Gasparini F. The Post-Stroke Depression Rating Scale: a test specifically devised to investigate affective disorders of stroke patients. J Clin Exp Neuropsychol. 1997; 19: 340–356.
- O'Rourke S, MacHale S, Signorini D, Dennis M. Detecting psychiatric morbidity after stroke: comparison of the GHQ and the HAD Scale. Stroke. 1998; 29: 980–985.
- Aben I, Verhey F, Lousberg R, Lodder J, Honig A. Validity of the beck depression inventory, hospital anxiety and depression scale, SCL-90, and hamilton depression rating scale as screening instruments for depression in stroke patients. Psychosomatics. 2002; 43: 386–393.
- Lincoln NB, Nicholl CR, Flannaghan T, Leonard M, Van der Gucht E. The validity of questionnaire measures for assessing depression after stroke. Clin Rehabil. 2003; 17: 840–846.
- Spitzer RL, Kroenke K, Williams JBW. Patient Health Questionnaire Study Group. Validity and utility of a self-report version of PRIME-MD: the PHQ Primary Care Study. JAMA. 1999; 282: 1737–1744.
- Kroenke K, Spitzer RL, Williams JBW. The PHQ-9. Validity of a brief depression severity measure. J Gen Intern Med. 2001; 16: 606–613.
- Brott T, Adams HP Jr., Olinger CP, Marler JR, Barsan WG, Biller J, Spilker J, Holleran R, Eberle R, Herztberg V, Rorick M, Moonaw CJ, Walker M. Measurement of acute cerebral infarction: a clinical examination scale. Stroke. 1989; 20: 864–870.
- Callahan CM, Unverzagt FW, Hui SL, Perkins AJ, Hendrie HC. Six-item screener to identify cognitive impairment among potential subjects for clinical research. Med Care. 2002 Sep; 40: 771–781.
- Spitzer RL, Williams JB, Gibbon M. Instruction Manual for the Structured Clinical Interview for DSM-III-R. New York: Biometrics Research Department, New York State Psychiatric Institute; 1986.
- Williams LS, Yilmaz E, Lopez-Yunez AM. Retrospective assessment of initial stroke severity with the NIH Stroke Scale. Stroke. 2000; 31: 858–862.
- Spitzer RL, Williams JBW, Gibbon M, First MB. The structured clinical interview for DSM-III-R (SCID). Arch Gen Psychiatry. 1992; 49: 624–629.
- Whooley MA, Avins AL, Miranda J, Browner WS. Case-finding instruments for depression: two questions are as good as many. J Gen Intern Med. 1997; 12: 439–445.
- Kroenke K, Spitzer RL, Williams JB. The Patient Health Questionnaire-2: validity of a two-item depression screener. Med Care. 2003; 41: 1284–1292.
- Paradiso S, Ohkubo T, Robinson RG. Vegetative and psychological symptoms associated with depressed mood over the first two years after stroke. Int J Psychiatry Med. 1997; 27: 137–157.
- Federoff JP, Starkstein SE, Parikh RM, Price TR, Robinson RG. Are depressive symptoms nonspecific in patients with acute stroke? Am J Psychiatry. 1991; 148: 1172–1176.
- Pohjasvaara T, Vataja R, Leppavuori A, Kaste M, Erkinjuntti T. Suicidal ideas in stroke patients 3 and 15 months after stroke. Cerebrovasc Dis. 2001; 12: 21–26.
- Sutcliffe LM, Lincoln NB. The assessment of depression in aphasic stroke patients: the development of the Stroke Aphasic Depression Questionnaire. Clin Rehabil. 1998; 12: 506–513.
- Callahan CM, Wolinsky FD. The effect of gender and race on the measurement properties of the CES-D in older adults. Med Care. 1994; 32: 341–356.
- Cole SR, Kawachi I, Maller SJ, Berkman LF. Test of item-response bias in the CES-D scale. Experience from the New Haven EPESE study. J Clin Epidemiol. 2000; 53: 285–289.
- Löwe B, Kroenke K, Herzog W, Gräfe K. Measuring depression outcome with a short self-report instrument: sensitivity to change of the Patient Health Questionnaire (PHQ-9). J Affective Disorders. 2004; 78: 131–140.
- Löwe B, Unutzer J, Callahan CM, Perkins AJ, Kroenke K. Monitoring depression treatment outcomes with the PHQ-9. Med Care. In press.
- Lai S-M, Duncan PW, Keighley J, Johnson D. Depressive symptoms and independence in BADL and IADL. J Rehab Research Devel. 2002; 39: 589–596.
- Ramasubbu R, Robinson RG, Flint AJ, Kosier T, Price TR. Functional impairment associated with acute poststroke depression: the Stroke Data Bank Study. J Neuropsych Clin Neurosci. 1998; 10: 26–33.