(Stroke. 2002;33:1176.)
© 2002 American Heart Association, Inc.
Editorials
From the Department of Clinical Epidemiology and Biostatistics, Academic Medical Center, University of Amsterdam, the Netherlands.
Correspondence to Rob J. de Haan, PhD, Department of Clinical Epidemiology and Biostatistics, University of Amsterdam, Academic Medical Center, Room J2-203, Meibergdreef 9, PO Box 22660, 1105 AZ Amsterdam, the Netherlands. E-mail rob.dehaan{at}amc.uva.nl
The SF-36 is the most widely used generic instrument for measuring quality of life (QOL). The instrument has been translated into numerous languages, and the validity of its 8 subscales has been confirmed in general populations and in a wide variety of patient groups in more than 2000 articles. In an article published in this issue of Stroke, Hobart et al [1] report the psychometric properties of the SF-36 in a sample of ischemic stroke patients. The authors conclude that (1) some subscales, especially the General Health (GH) and Social Functioning (SF) scales, have limited reliability and validity; (2) half of the subscales suffer from floor and/or ceiling effects; and (3) the 2 summary scores inadequately reflect the patients' physical and mental health. Given the overwhelming weight of evidence that the SF-36 subscales are psychometrically sound measures of QOL in a range of patient populations, the question arises how convincing the arguments of Hobart and his colleagues are.
See article on page 1349
The authors argue that the GH and SF scales yield low reliability coefficients and have limited convergent and discriminant validity. However, these conclusions can be challenged. The reliability of only 1 scale (GH) fell marginally short of the authors' predefined criterion (Cronbach's alpha=0.68). Although coefficients above 0.80 are often recommended, values above 0.70 are generally regarded as acceptable for scales used to assess outcome at the group level. Moreover, it should be noted that the alpha coefficient depends not only on the correlations among the items but also on the number of items in the scale. For example, the relatively low coefficient (0.70) of the SF scale may also be explained by the fact that this scale comprises only 2 items. Since Cronbach's alpha increases as items are added to a scale (and decreases as items are removed), one may wonder whether it is sensible to specify criteria for acceptable levels of the alpha coefficient without specifying the number of items in the scale [2]. The authors' criticism of the convergent and discriminant validity of both subscales is also not convincing. In general, the item-total subscale correlations of the GH and SF items are above 0.40, indicating that the items in each scale measure a common underlying trait. Moreover, in both scales, all item-own scale correlations are higher than the item-other scale correlations (although by less than 2 SE), indicating that the subscale items are best placed in the scales in which they already appear.
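The dependence of alpha on scale length can be made concrete. Raw alpha is k/(k-1) x (1 - sum of item variances / variance of the total score); for standardized items it reduces to k*r / (1 + (k-1)*r), where r is the mean inter-item correlation. The Python sketch below computes both. The mean correlation used is a hypothetical value chosen so that a 2-item scale reaches exactly 0.70; it is not a figure reported by Hobart et al, and the standardized form is only an approximation of the raw alpha the authors report.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Raw Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    item_columns = list(zip(*rows))                       # one tuple of scores per item
    sum_item_var = sum(variance(col) for col in item_columns)
    total_var = variance([sum(row) for row in rows])      # variance of the summed scale score
    return k / (k - 1) * (1 - sum_item_var / total_var)

def standardized_alpha(k, mean_r):
    """Standardized alpha for k items with mean inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)

# Hypothetical: the mean inter-item correlation that gives alpha = 0.70 for a 2-item scale.
mean_r = 0.70 / 1.30
for k in (2, 4, 6, 10):
    # Same item homogeneity, more items -> higher alpha.
    print(f"{k:2d} items: alpha = {standardized_alpha(k, mean_r):.2f}")
```

Under these assumptions, holding item homogeneity fixed while lengthening the scale raises alpha to about 0.82 with 4 items and about 0.92 with 10 items, which is why a single alpha cut-off penalizes a 2-item scale such as SF.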
In this study, a number of SF-36 subscales exhibit floor and/or ceiling effects. However, as the authors noted, the psychometric properties of an instrument are sample dependent. The narrow range of many health scales partly results from the traditional methods used to develop them, such as interitem correlation and factor analysis. Unfortunately, these item-analysis techniques are highly dependent on the average functional level of the patients in the samples used for the psychometric evaluation of a scale. Consequently, such scales may exhibit considerable floor and ceiling effects when they are used in groups of patients whose average level of functional health is lower or higher (eg, the mild to moderate stroke patients in this study).
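Floor and ceiling effects are usually quantified as the percentage of respondents at the lowest or highest possible score. The sketch below applies the same hypothetical 0-100 scale to two hypothetical samples to show how one unchanged instrument can look ceiling-bound in one sample and floor-bound in another. The 15% cut-off is an assumed convention for flagging an effect, not necessarily the criterion Hobart et al used.

```python
def floor_ceiling(scores, lowest=0, highest=100):
    """Percentage of respondents scoring at the scale minimum (floor) and maximum (ceiling)."""
    n = len(scores)
    floor_pct = 100 * sum(s == lowest for s in scores) / n
    ceiling_pct = 100 * sum(s == highest for s in scores) / n
    return floor_pct, ceiling_pct

# Hypothetical 0-100 scores on the same scale in two differently impaired samples.
mildly_impaired = [100, 100, 100, 100, 75, 100, 50, 100, 75, 100]
severely_impaired = [0, 0, 25, 0, 50, 0, 25, 0, 75, 0]

CUTOFF = 15.0  # assumed threshold (% of respondents at an extreme) for flagging an effect
for label, sample in (("mild", mildly_impaired), ("severe", severely_impaired)):
    floor_pct, ceiling_pct = floor_ceiling(sample)
    print(label, floor_pct, ceiling_pct, floor_pct > CUTOFF or ceiling_pct > CUTOFF)
```

Here 70% of the mildly impaired sample sits at the ceiling and 60% of the severely impaired sample at the floor, although the scale itself is identical.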
To simplify statistical analysis and to ease interpretation of the SF-36, the developers of the instrument recently made available scoring algorithms for aggregating subscale scores into 2 distinct summary scores: the Physical Component Summary and the Mental Component Summary. An important finding of Hobart et al is that the study results do not support the computation of summary scores in stroke. These findings are in line with recent studies [3-7] that also demonstrate shortcomings of the summary scores in accurately reflecting patients' physical and mental health on the basis of subscale scores. Taft et al showed that discrepancies between the subscale profile and the component scores of the SF-36 are attributable to the way in which these summary scores are calculated [7]. The main problem with the scoring algorithm derives from the use of negatively weighted subscale factor score coefficients, which leads to component scores that summarize the profile scores inaccurately and, sometimes, to clinically counterintuitive study results.
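The effect of negative weights can be illustrated with a toy version of the component-score calculation. In the published algorithm, subscale scores are standardized against general-population norms, multiplied by factor score coefficients, summed, and converted to a T-score with mean 50 and SD 10. The sketch below follows that structure but uses hypothetical weights that only imitate the assumed sign pattern for the physical summary (positive for the physical subscales, negative for Role-Emotional and Mental Health); they are not the published coefficients. PF, RP, BP, GH, VT, SF, RE and MH are the conventional SF-36 subscale labels.

```python
# Hypothetical weights imitating only the sign pattern of the physical component
# summary; NOT the published SF-36 factor score coefficients.
PCS_WEIGHTS = {"PF": 0.40, "RP": 0.35, "BP": 0.30, "GH": 0.25,
               "VT": 0.03, "SF": -0.01, "RE": -0.20, "MH": -0.20}

def component_score(z_scores, weights):
    """Weight each subscale z-score, sum, and convert to a T-score (mean 50, SD 10)."""
    aggregate = sum(weights[name] * z for name, z in z_scores.items())
    return 50 + 10 * aggregate

# Two hypothetical patients with identical physical subscale z-scores.
physical = {"PF": -1.0, "RP": -1.0, "BP": -1.0, "GH": -1.0}
patient_a = {**physical, "VT": 0.0, "SF": 0.0, "RE": 0.0, "MH": 0.0}      # average mental health
patient_b = {**physical, "VT": -2.0, "SF": -2.0, "RE": -2.0, "MH": -2.0}  # poor mental health

pcs_a = component_score(patient_a, PCS_WEIGHTS)
pcs_b = component_score(patient_b, PCS_WEIGHTS)
print(f"PCS A = {pcs_a:.1f}, PCS B = {pcs_b:.1f}")
```

Patient B, whose physical subscales are identical but whose mental health subscales are 2 SD worse, receives the higher physical summary score (44.6 vs 37.0): with these weights, poorer mental health makes physical health look better, the kind of counterintuitive result described above.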
To summarize, when the results of the relatively small study of Hobart et al are considered together with the findings of previous research, there is at present insufficient evidence to question the reliability and validity of the SF-36 subscales in stroke. The floor and/or ceiling effects of the SF-36 can largely be explained by the specific characteristics of the mild to moderate stroke patients studied. However, their finding that the assumptions for generating 2 summary scores could not be supported is well taken. Until the current scoring method is statistically revised, it is advisable not to use the component scores in stroke research.
In the light of the study results presented by Hobart et al, some additional remarks should be made about the measurement of functional outcome with traditional multi-item instruments such as the SF-36. Typically, these instruments calculate each patient's total scale score as a (weighted) sum of the responses to the items. However, this approach has some serious problems. Firstly, all items on a scale have to be presented to patients to obtain a summated score. This inefficiency has led researchers to shorten health instruments, resulting in more practical but less precise scales. Secondly, because summated scores depend on the number of items included in the instrument, and on precisely which items they are, it is impossible to compare scores obtained on different instruments, even if they measure the same health concept: 10 points on the Barthel Index are not the same as 10 points on the physical dimension of the Sickness Impact Profile.
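The item dependence of summated scores can be shown with the linear 0-100 transformation used for SF-36 subscales: (raw score - lowest possible score) / (possible score range) x 100. In this sketch, one hypothetical patient answers a hypothetical 10-item physical scale, and two equally hypothetical 5-item short forms are cut from it.

```python
def transformed_score(responses, item_min, item_max):
    """Summated score rescaled to 0-100: (raw - lowest possible) / possible range * 100."""
    raw = sum(responses)
    lowest = item_min * len(responses)
    highest = item_max * len(responses)
    return 100 * (raw - lowest) / (highest - lowest)

# One hypothetical patient's answers (1 = unable ... 3 = no difficulty) to a
# hypothetical 10-item scale, ordered from easiest to hardest item.
full_scale = [3, 3, 3, 3, 3, 2, 2, 1, 1, 1]
short_form_a = full_scale[0::2]  # hypothetical short form keeping items 1, 3, 5, 7, 9
short_form_b = full_scale[1::2]  # hypothetical short form keeping items 2, 4, 6, 8, 10

for name, answers in (("full", full_scale), ("short A", short_form_a), ("short B", short_form_b)):
    print(name, transformed_score(answers, item_min=1, item_max=3))
```

The same answers give 60 on the full scale, 70 on short form A and 50 on short form B, although the patient's functional health has not changed; only the selection of items has.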
Thirdly, the clinical interpretation of summated scores is not as straightforward as it may seem. For example, in the study of Hobart et al, stroke patients had a mean score of 47.6 on the Physical Functioning subscale. The clinical meaning of this SF-36 score would be unclear to most neurologists. This problem is compounded by the ordinal nature of summated scores: a given difference in scores at one point on the scale does not necessarily represent the same amount of functional change as an identical difference at another point on the scale.
Growing dissatisfaction with this classical approach has led to an alternative method: item response theory (IRT) [8]. This statistical paradigm uses a logistic regression-type analysis to model patients' responses to the individual items. With this technique, patients and items can be placed on the same hierarchical, continuous scale. IRT offers several advantages in clinical measurement. Firstly, not all items in an instrument have to be presented to every patient to assess functional health. More difficult items (eg, vigorous activities) can be presented to less disabled patients and easier ones (eg, bathing or dressing) to more severely impaired patients. This approach, known as adaptive testing, makes data collection more efficient and considerably reduces floor and ceiling effects. Secondly, even if different subsets of items are presented to subgroups of patients, their measured functional levels remain fully comparable, because the difficulty of each item (its position on the linear scale) has been estimated beforehand. Thirdly, functional measurements are straightforward to interpret clinically, because a patient's level of functional health can be compared directly with the hierarchically ordered items on the linear scale.
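The editorial does not commit to a particular IRT model. As an illustration only, the sketch below assumes the simplest one, the dichotomous Rasch (one-parameter logistic) model, in which the probability that patient i can perform item j is P = exp(theta_i - b_j) / (1 + exp(theta_i - b_j)), where theta_i is the patient's functional level and b_j the item's difficulty on the same logit scale. The item names and difficulties are hypothetical pre-calibrated values. Theta is estimated by maximum likelihood with Newton-Raphson, and the next adaptive item is the unused one with maximal Fisher information, which under this model is the item whose difficulty is closest to the current estimate.

```python
import math

# Hypothetical item difficulties on a logit scale (higher = harder), assumed to have
# been calibrated beforehand; not estimates from any real stroke sample.
DIFFICULTY = {
    "dressing": -2.0,
    "bathing": -1.5,
    "walking one block": -0.5,
    "climbing one flight of stairs": 0.0,
    "walking more than a mile": 1.0,
    "vigorous activities": 2.0,
}

def p_success(theta, b):
    """Rasch model: probability that a patient at level theta manages an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_theta(responses, difficulty=DIFFICULTY, max_iter=50, tol=1e-8):
    """Maximum-likelihood estimate of theta from ANY subset of items (Newton-Raphson).

    responses maps item name -> 1 (can do) or 0 (cannot do).
    """
    items = list(responses)
    x = [responses[item] for item in items]
    if all(x) or not any(x):
        raise ValueError("ML estimate diverges for all-1 or all-0 response patterns")
    theta = 0.0
    for _ in range(max_iter):
        p = [p_success(theta, difficulty[item]) for item in items]
        gradient = sum(xi - pi for xi, pi in zip(x, p))  # d logL / d theta
        information = sum(pi * (1 - pi) for pi in p)     # Fisher information at theta
        step = gradient / information
        theta += step
        if abs(step) < tol:
            break
    return theta

def next_item(theta, administered, difficulty=DIFFICULTY):
    """Adaptive testing: choose the unused item with maximal information p(1 - p) at theta."""
    remaining = [item for item in difficulty if item not in administered]
    return max(remaining, key=lambda item: p_success(theta, difficulty[item])
               * (1 - p_success(theta, difficulty[item])))

# Two hypothetical patients answer different item subsets, yet land on the same scale.
less_disabled = {"walking one block": 1, "walking more than a mile": 0}
more_disabled = {"dressing": 1, "bathing": 0, "walking one block": 0}
print(round(estimate_theta(less_disabled), 2), round(estimate_theta(more_disabled), 2))
print(next_item(0.0, administered=set()))
```

The two patients answered different items, yet both estimates (0.25 for the first, roughly -2.1 for the second) sit on the same logit scale as the items. Interpretation follows directly: under this model a patient is more likely than not to manage any item whose difficulty lies below his or her theta. Maximum-likelihood estimates do not exist for all-success or all-failure patterns, which is why the sketch rejects them; practical adaptive tests often use Bayesian estimators instead.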
In spite of recent interest in IRT in clinical outcome measurement, IRT-based measures have not yet been developed in stroke research. Perhaps IRT is not suitable for developing scales that measure a subjective and multidimensional construct such as QOL. However, for more direct and tangible manifestations of disease, such as physical disability, IRT is probably a useful supplement to the traditional approach.
Footnotes
The opinions expressed in this editorial are not necessarily those of the editors or of the American Stroke Association.
References
1. Hobart JC, Williams LS, Moran K, Thompson AJ. Quality of life measurement after stroke: uses and abuses of the SF-36. Stroke. 2002;33:1349-1356.
2. Fayers PM, Machin D. Quality of Life: Assessment, Analysis and Interpretation. London, England: John Wiley & Sons; 2000:85-87.
3. Simon GE, Revicki DA, Grothaus L, Von Korff M. SF-36 summary scores: are physical and mental health truly distinct? Med Care. 1998;36:567-572.
4. Hurst NP, Ruta DA, Kind P. Comparison of the MOS short-form-12 (SF-12) health status questionnaire with the SF-36 in patients with rheumatoid arthritis. Br J Rheumatol. 1998;37:862-869.
5. Nortvedt MW, Riise T, Myhr KM, Nyland HI. Performance of the SF-36, SF-12, and RAND-36 summary scales in a multiple sclerosis population. Med Care. 2000;38:1022-1028.
6. Wilson D, Parsons J, Tucker G. The SF-36 summary scales: problems and solutions. Soz Praventiv Med. 2000;45:239-246.
7. Taft C, Karlsson J, Sullivan M. Do SF-36 summary component scores accurately summarize subscale scores? Qual Life Res. 2001;10:395-404.
8. Van der Linden WJ, Hambleton RK. Handbook of Modern Item Response Theory. New York, NY: Springer; 1997.