Effect Size Measures and Their Relationships in Stroke Studies
- effect size measure
- Mann-Whitney measure
- numbers needed to treat
- proportional odds ratio
- randomized controlled trial
- risk difference
Many articles in Stroke have addressed good statistical practice for the adequate planning and high-powered analysis of stroke trials. They have discussed which tests are adequate and powerful, proposed effect size measures, and proposed definitions of the number needed to treat (NNT) based on an ordinal scale (see online-only Data Supplement for citations).
The clinical problem is straightforward. We read results of trials that fall into 3 categories: unequivocally neutral or even negative; overwhelmingly positive; or encouraging but open to various interpretations according to the approach taken to statistical analysis and presentation of findings. This dependence on methodology for the third group undermines our confidence in effective treatments and can prompt unjustified repetition of trials of ineffective treatments. A robust, powerful, and universal statistical approach is required.
The statistical problem is more complex. Dozens of statistical tests are available, and each may be uniquely powerful in certain circumstances. However, trials intended as confirmatory for regulatory approval or for use in clinical guidelines demand that the analysis plan be prespecified, so the test of choice should minimize assumptions yet maximize power for the anticipated difference between treatment groups. Furthermore, it is not sufficient to indicate that 1 treatment is significantly different from another: clinical research guidelines require that the magnitude of the treatment effect be declared using a so-called effect size measure accompanied by its measure of precision, the confidence interval.
Thus, there is a need for a robust test, preferably one applicable to all data types (binary, ordinal, or continuous), together with a test-related effect size measure whose confidence interval matches the test-related P value. Fortunately, this requirement for an adequate analysis of study data restricts the plethora of available tests to a small number of useful candidates.
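To make the notion of an effect size measure concrete, the following minimal Python sketch computes two of the measures listed in the key words on a toy ordinal outcome. It rests on assumptions that this section does not state: the Mann-Whitney measure is taken in its commonly used form P(C < T) + 0.5 P(C = T), the NNT is taken as the reciprocal of the risk difference, higher scores are assumed to be better, and the function names and data are purely illustrative. Confidence intervals, which the text requires for a complete report, are omitted for brevity.

```python
from itertools import product


def mann_whitney_measure(control, treated):
    """Estimate P(C < T) + 0.5 * P(C = T) over all control/treated pairs.

    Assumes higher scores are better; 0.5 means no difference between groups.
    """
    wins = ties = 0
    for c, t in product(control, treated):
        if c < t:
            wins += 1
        elif c == t:
            ties += 1
    return (wins + 0.5 * ties) / (len(control) * len(treated))


def risk_difference_and_nnt(control, treated, cutoff):
    """Dichotomize at 'score >= cutoff' and return (risk difference, NNT).

    NNT is taken as 1 / risk difference (an assumption of this sketch);
    it is infinite when the two proportions are equal.
    """
    p_control = sum(x >= cutoff for x in control) / len(control)
    p_treated = sum(x >= cutoff for x in treated) / len(treated)
    rd = p_treated - p_control
    nnt = 1 / rd if rd != 0 else float("inf")
    return rd, nnt


# Hypothetical ordinal scores (not trial data), higher = better outcome.
control_scores = [1, 2, 2, 3]
treated_scores = [2, 3, 3, 4]

mw = mann_whitney_measure(control_scores, treated_scores)
rd, nnt = risk_difference_and_nnt(control_scores, treated_scores, cutoff=3)
print(f"Mann-Whitney measure: {mw:.4f}")
print(f"Risk difference: {rd:.2f}, NNT: {nnt:.1f}")
```

The Mann-Whitney measure uses the full ordinal scale, whereas the risk difference and NNT depend on the chosen cutoff; the same data can therefore yield different impressions of benefit depending on the measure reported.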
Here, we describe and explain the relationships between 2 tests, of which 1 …