Responsiveness and Validity of 3 Outcome Measures of Motor Function After Stroke Rehabilitation
To the Editor:
With interest we read the recently published letter by Dr Sivan, “Interpreting Effect Size to Estimate Responsiveness of Outcome Measures,”1 written in response to a paper by Hsieh et al2 that provided indices of the magnitude of treatment-related intraindividual change assessed with the Fugl-Meyer Assessment (FMA), Action Research Arm Test (ARAT), and Wolf Motor Function Test performance time (WMFT-TIME) and functional ability scores (WMFT-FAS). As an effect size index, Hsieh et al used the so-called standardized response mean (SRM), in which the mean change in scores over time is divided by the SD of these change scores (Formula A: SRM = mean change / SD of change scores).
As Sivan argued in his letter,1 interpreting the magnitude of intraindividual change estimated with an SRM may lead to overestimation or underestimation of treatment-related effects when the widely used thresholds of Cohen3 are applied. These thresholds for classifying the magnitude of mean differences were developed for an effect size index that standardizes the mean difference by the pooled SD (Formula B: ES = mean change / SDpooled). Dunlap et al convincingly argued that only the pooled SD should be used to compute the effect size (ES) in correlated designs, and concluded that if the standardizing SD is instead adjusted for the correlation between the paired measures, the resulting estimate will overestimate the actual ES.4 It is essential for clinical investigators to understand the difference between the SRM and the ES when classifying treatment-related change in terms of Cohen’s thresholds (ES <0.20 indicating a “trivial” change; 0.20 ≤ ES <0.50, a “small” change; 0.50 ≤ ES <0.80, a “moderate” change; and ES ≥0.80, a “large” change).3
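The distinction between the two standardizers can be sketched numerically. The following Python fragment implements Formula A (SRM) and Formula B (ES with the pooled SD) as described above; the pre/post scores are hypothetical, highly correlated illustrative values, not the published FMA/ARAT/WMFT data:

```python
import math
import statistics

def srm(baseline, follow_up):
    """Formula A: mean change divided by the SD of the change scores."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return statistics.mean(changes) / statistics.stdev(changes)

def es_pooled(baseline, follow_up):
    """Formula B: mean change divided by the pooled SD of the two time points."""
    sd_pool = math.sqrt((statistics.variance(baseline)
                         + statistics.variance(follow_up)) / 2.0)
    return (statistics.mean(follow_up) - statistics.mean(baseline)) / sd_pool

def cohen_label(es):
    """Classify a magnitude of change by Cohen's thresholds."""
    if es < 0.20:
        return "trivial"
    if es < 0.50:
        return "small"
    if es < 0.80:
        return "moderate"
    return "large"

# Hypothetical, strongly correlated pre/post scores (illustrative only):
pre = [10, 12, 14, 16, 18, 20, 22, 24]
post = [11, 13, 16, 17, 20, 21, 24, 25]
print(srm(pre, post), es_pooled(pre, post), cohen_label(es_pooled(pre, post)))
```

Because the change scores here vary far less than the scores themselves, the SRM is several times larger than the pooled-SD ES for the same data, which is precisely the misclassification risk described above.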
In their response to Sivan’s critique, Hsieh et al refer to our earlier work concerning the risk of misclassifying an SRM when using Cohen’s thresholds.5,6 However, their calculation of adjusted ES estimates7 is based on a false assumption. Consequently, adjusting the ES for the size of the correlation between baseline and follow-up as computed by Hsieh et al7 in the Table leads to an ES estimate more than twice the magnitude of the ES computed using the SDpooled when the correlation between the baseline and follow-up scores is at least 0.8.4
Adjustment of an SRM to an ES comprises 2 components. First, Cohen introduced a √2 correction necessary for appropriate use of his tables for sample size calculation. This correction when consulting Cohen’s power tables is required because the tables assume 2(N−1) degrees of freedom (2 independent samples), whereas in, for example, a pre-/posttest evaluation, only N−1 degrees of freedom are actually available3 (pp 46 to 48). Thus, following Cohen’s theory, “multiplying SRM by √2 (approximately 1.41) compensates for the sample size tables’ assumption of double the error variance”3 (p 46).
Second, because the t test prescribed for “own control” study designs (baseline to follow-up) is based on correlated means3 (p 48), we also have to compensate for the correlation (r) between the paired observations. Therefore, according to Cohen, the relative size of the standardizing unit of the SRM to that of the ESpooled is not √2 but √(2(1−r)). Thus, the difference between means for paired (dependent) samples needs to be standardized by a value “which is √(2(1−r)) as large as would be the case were they independent”3 (p 49).
As was shown in an earlier publication, (d′/√2)/√(1−r) is equivalent to the SRM; conversely, SRM × √2 × √(1−r) is equivalent to d′, and both indices will vary with the size of r. Hsieh et al have adjusted their ES but may have overlooked that their adjustment of the SRM to an ES is incomplete. As shown in the Table, correct transformation of their SRM estimates into ES values suitable for Cohen’s classification resulted in a moderate effect for the FMA and small effects for the ARAT, WMFT-TIME, and WMFT-FAS, respectively, in sharp contrast to the conclusion of Hsieh et al. Extremely high correlations between baseline and follow-up (except for WMFT-TIME) are mainly responsible for the overestimation of effect size by the method of Hsieh et al. Minor deviations from the published indices are due to calculations without the individual data.
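The conversion described above can be sketched in a few lines of Python. The numbers below are illustrative, not the published FMA/ARAT/WMFT estimates; the point is only that the adjusted d′ shrinks toward a smaller Cohen category as r grows, and that the two transformations are exact inverses:

```python
import math

def srm_to_d(srm_value, r):
    """Adjust an SRM to Cohen's d': d' = SRM * sqrt(2 * (1 - r))."""
    return srm_value * math.sqrt(2.0 * (1.0 - r))

def d_to_srm(d_value, r):
    """Inverse transformation: SRM = (d' / sqrt(2)) / sqrt(1 - r)."""
    return (d_value / math.sqrt(2.0)) / math.sqrt(1.0 - r)

# Illustrative values: a raw SRM of 1.5 with a high baseline/follow-up
# correlation of r = 0.9 (hypothetical, not the published data).
srm_value, r = 1.5, 0.9
d_prime = srm_to_d(srm_value, r)
print(round(d_prime, 3))             # → 0.671, a "moderate" rather than "large" effect
print(round(d_to_srm(d_prime, r), 3))  # → 1.5, round-trips back to the SRM
```

With r = 0.9, a seemingly “large” SRM of 1.5 corresponds to a d′ of about 0.67, which illustrates why high test-retest correlations inflate SRM-based classifications.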
1. Sivan M. Interpreting effect size to estimate responsiveness of outcome measures. Stroke. 2009;40:e709–e711.
2. Hsieh YW, Wu CY, Lin KC, Chang YF, Chen CL, Liu JS. Responsiveness and validity of three outcome measures of motor function after stroke rehabilitation. Stroke. 2009;40:1386–1391.
3. Cohen J. The t test for means. In: Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Hillsdale, NJ: Lawrence Erlbaum Associates; 1988:19–74.
4. Dunlap WP, Cortina JM, Vaslow JB, Burke MJ. Meta-analysis of experiments with matched groups or repeated measures designs. Psychological Methods. 1996;1:170–177.
5. Middel B, Van Sonderen FLP. Statistical significant change versus relevant or important change in (quasi) experimental design: some conceptual and methodological problems in estimating magnitude of intervention-related change in health services research. International Journal of Integrated Care. 2002;2:1–21.
6. Middel B, Van Sonderen FLP. Erratum: Statistical significant change versus relevant or important change in (quasi) experimental design: some conceptual and methodological problems in estimating magnitude of intervention-related change in health services research. International Journal of Integrated Care. 2008;8:1–2.
7. Hsieh YW, Wu CY, Lin KC. Response to letter by Sivan. Stroke. In press.