Interpreting Effect Size to Estimate Responsiveness of Outcome Measures
To the Editor:
I congratulate Hsieh et al on their well-conducted study comparing the responsiveness of several outcome measures commonly used in stroke rehabilitation research.1 They report large responsiveness indices for the Fugl-Meyer (FM) and the Action Research Arm Test, with FM being the more responsive of the two.
Hsieh et al used the standardized response mean (SRM), which is among the most valid measures for estimating responsiveness. The authors, however, graded the SRM values with Cohen’s thresholds (>0.8 large; 0.5 to 0.8 moderate; and <0.5 small), which is debatable. Cohen’s thresholds were described for the effect size (ESp) calculated by dividing the change in scores by the pooled SD (population standard deviation).2 The SRM is a slightly different effect size: it is calculated by dividing the change in scores by the SD of the change in scores. Consequently, applying Cohen’s thresholds to SRM values rather than to ESp values could lead to over- or underestimation of effects.3
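The distinction can be written out as follows. This is only a sketch: the symbols (mean scores \bar{x}_1 and \bar{x}_2 at baseline and follow-up, their standard deviations SD_1 and SD_2, and the correlation r between the two measurements) are notation assumed here for illustration and do not come from the cited studies. The second line is the standard variance identity for a difference of two correlated scores, which shows why the two denominators generally differ.

```latex
\[
\mathrm{ES_p} = \frac{\bar{x}_{2} - \bar{x}_{1}}{SD_{\text{pooled}}},
\qquad
\mathrm{SRM} = \frac{\bar{x}_{2} - \bar{x}_{1}}{SD_{\text{change}}}
\]
\[
SD_{\text{change}}^{2} = SD_{1}^{2} + SD_{2}^{2} - 2\,r\,SD_{1}\,SD_{2}
\]
```

Because SD of change depends on r, the SRM can be larger or smaller than ESp for the same mean change.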
Middel et al have shown in 2 studies that up to half of the estimates obtained by applying Cohen’s thresholds to SRM values either over- or underestimated an intervention-related effect.4,5 A simple method to avoid such estimation errors has been suggested, in which the correlation between repeated measurements (baseline and follow-up) is used to determine the effect size, to which Cohen’s thresholds are then applied.3 For example, an SRM of 0.85, which could be interpreted as a large effect, becomes a moderate effect according to Cohen’s thresholds when the correlation between repeated measurements is 0.82.3
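The arithmetic behind this example can be sketched in a few lines of Python. The sketch assumes that the baseline and follow-up SDs are equal, in which case the SD of change equals the pooled SD multiplied by sqrt(2(1 - r)), so that ESp = SRM x sqrt(2(1 - r)). The function names are hypothetical and chosen for illustration only; this is one plausible reading of the suggested conversion, not a reproduction of the published procedure.

```python
import math


def srm_to_pooled_es(srm: float, r: float) -> float:
    """Convert an SRM to a pooled-SD effect size (hypothetical helper).

    Assumes equal SDs at baseline and follow-up, so that
    SD_change = SD_pooled * sqrt(2 * (1 - r)).
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be a correlation between -1 and 1")
    return srm * math.sqrt(2.0 * (1.0 - r))


def cohen_category(es: float) -> str:
    """Grade an effect size with the thresholds quoted in this letter."""
    size = abs(es)
    if size > 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    return "small"


# Values from the example in the text: SRM = 0.85, r = 0.82.
es = srm_to_pooled_es(0.85, 0.82)
print(round(es, 2), cohen_category(es))        # 0.51 moderate
print(cohen_category(0.85))                    # large (SRM graded directly)
```

With r = 0.82 the multiplier is sqrt(0.36) = 0.6, so the SRM of 0.85 corresponds to an effect size of about 0.51, which falls in the moderate band.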
It should also be noted that the large responsiveness observed by Hsieh et al is not consistent with previous responsiveness studies of FM and the Action Research Arm Test in stroke subjects, which have shown only a moderate effect size.6,7 This moderate responsiveness is in fact believed to be one of the limitations of the FM scale.
It is quite possible that Hsieh et al will find that, even after calculating ESp values, the effect size categories according to Cohen’s thresholds remain the same, that is, large responsiveness indices for FM and the Action Research Arm Test. If this is the case, the responsiveness of these scales will have to be researched further, particularly that of the FM scale, because it is used extensively in stroke rehabilitation studies.
Hsieh YW, Wu CY, Lin KC, Chang YF, Chen CL, Liu JS. Responsiveness and validity of three outcome measures of motor function after stroke rehabilitation. Stroke. 2009; 40: 1386–1391.
Cohen J. Statistical Power Analysis for the Behavioural Sciences, rev ed. New York: Academic Press; 1977.
Middel B, Kuipers-Upmeijer H, Bouma J, Staal M, Oenema D, Postma T, Terpstra S, Stewart R. Effect of intrathecal baclofen delivered by an implanted programmable pump on health related quality of life in patients with severe spasticity. J Neurol Neurosurg Psychiatry. 1997; 63: 204–209.
Middel B, Bouma J, Crijns HJGM, De Jongste MJL, Van Sonderen FLP, Niemeijer MG, Crijns H, van den Heuvel W. The psychometric properties of the Minnesota Living with Heart Failure Questionnaire (MLHF-Q). Clin Rehabil. 2001; 15: 380–391.
Lin JH, Hsu MJ, Sheu CF, Wu TS, Lin RT, Chen CH, Hsieh CL. Psychometric comparisons of 4 measures for assessing upper-extremity function in people with stroke. Phys Ther. 2009; 89: 840–850.
Hsueh IP, Hsieh CL. Responsiveness of two upper extremity function instruments for stroke inpatients receiving rehabilitation. Clin Rehabil. 2002; 16: 617–624.