It is a pleasure for me to give the Willis Lecture. I am especially honored because at the 8th Conference in February 1983 I also gave the Keynote Address, before it became known as the Willis Lecture.1
I am going to discuss a few principles concerning the risk factors for stroke; some recent observations from the Rochester, Minnesota, population on modeling of stroke risk factors; and some new observations about modeling attributable risk for stroke risk factors in a multivariable analysis. I shall also discuss how we might consider the risk factors for stroke in regard to the pathologic substrates for stroke.
The Rochester Epidemiology Project medical record linkage system2 provided the means to identify virtually all new cases of ischemic stroke in the Rochester population for a population-based, nested case-control study of risk factors for stroke. The controls for the study were selected from an enumeration of the population through the medical records of the Rochester Epidemiology Project. There were 1444 incident cases of stroke in the population in the 25 years of the study from 1960 through 1984, with controls from the population matched one-to-one by age, sex, and duration of the medical record. About 80% of the cases were seen and evaluated by a neurologist.
This study of risk factors for ischemic stroke provides a unique and powerful set of data because it includes such a large number of incident cases and population-based controls. The size of the data set allowed assessment of interactions that have not been assessed adequately. More details of this study were published in December 1996.3
Risk Factors for Stroke
The effect of a risk factor on the probability of stroke is determined by three considerations: (1) the relative risk of the factor, (2) the prevalence of the factor in the population, and (3) the effect of intervention. The relative risk may have confounders or may be modified by other risk variables or by interaction with other variables, and these effects can be assessed by multivariable analyses. Prevalence may be affected by geography and certainly by age and sex. The effect of intervention may be affected by time and by complications of intervention.
The risk factors I shall consider here are ones that were determined from the population-based studies in Rochester that used a multiple logistic regression model and what is called the bootstrap method, which is a multiple resampling procedure4 to validate (1) the model, (2) the coefficient estimates, and (3) their standard errors. For variable selection and validation, only variables selected by the stepwise process in 70% or more of 1000 bootstrap samples were retained. In the logistic models, the inclusion of the variables delineating five quinquennial periods and interactions with the risk factor variables permitted an assessment of whether the odds ratios (ORs) indicating relative risk were stable over time. Such analyses were performed for each risk factor separately and within the final multivariable model. An antihypertensive treatment variable was assessed with the final model for the 25 years of the study.
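The 70% retention rule can be illustrated with a minimal sketch. It rests on loud assumptions: the data are synthetic, the matched design is ignored, and a crude odds-ratio z test stands in for the stepwise logistic selection used in the study. The function names (`crude_log_or_z`, `bootstrap_retained`) and the factor labels are hypothetical.

```python
import math
import random

def crude_log_or_z(rows, factor):
    """z statistic for the crude log odds ratio of one binary factor.

    rows: list of (is_case, exposures), where exposures maps factor -> 0/1.
    0.5 is added to every cell so an empty cell cannot break the logarithm.
    """
    a = b = c = d = 0.5  # exposed cases, unexposed cases, exposed controls, unexposed controls
    for is_case, exposures in rows:
        exposed = exposures[factor]
        if is_case and exposed:
            a += 1
        elif is_case:
            b += 1
        elif exposed:
            c += 1
        else:
            d += 1
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or / se

def bootstrap_retained(rows, factors, n_boot=1000, threshold=0.70, seed=0):
    """Keep factors that the stand-in rule selects in >= threshold of resamples."""
    rng = random.Random(seed)
    counts = {f: 0 for f in factors}
    for _ in range(n_boot):
        sample = rng.choices(rows, k=len(rows))  # resample with replacement
        for f in factors:
            if abs(crude_log_or_z(sample, f)) > 1.96:  # stand-in for stepwise selection
                counts[f] += 1
    freq = {f: counts[f] / n_boot for f in factors}
    retained = [f for f in factors if freq[f] >= threshold]
    return retained, freq

# Synthetic data: "strong" is far more common in cases; "noise" is balanced.
rows = []
for i in range(200):  # cases
    rows.append((True, {"strong": int(i < 120), "noise": i % 2}))
for i in range(200):  # controls
    rows.append((False, {"strong": int(i < 40), "noise": i % 2}))

retained, freq = bootstrap_retained(rows, ["strong", "noise"], n_boot=200)
print(retained, freq)
```

The design point is that a variable has to be selected repeatedly across resamples, not just once in the original data, before it is kept.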
The risk factors or comorbid conditions that were considered in the univariable analysis are shown in Table 1, with the 95% confidence intervals and probability values. Except for transient ischemic attack (TIA) with an OR of 5.6, most of these ORs are about 2.
The potential risk factors of (1) cardiac surgical procedure, (2) left ventricular aneurysm, (3) dilated cardiomyopathy, and (4) sick sinus syndrome were not included in the analysis because each was present in fewer than 20 cases. Neither mitral valve prolapse nor aortic valve disease was significant in the univariable analysis, so they were not included in the multivariable analysis. Congestive heart failure, left ventricular hypertrophy, and regional wall motion abnormality of the heart were not significant in the multivariable modeling and did not come into the final model.
The significant main effects in the final model were prior TIA, hypertension, cigarette smoking, ischemic heart disease, mitral valve disease, atrial fibrillation, and diabetes mellitus, which were considered along with age and sex. All two-way interactions were also examined in the analysis, and this modeling process identified the following significant interactions: age and TIA, sex and TIA, age and hypertension, age and cigarette smoking, and hypertension and atrial fibrillation. I’ll now review these interactions.
(1) TIA Interaction With Sex and Age (Fig 1). The OR for stroke was quite high in younger patients (that is, at age 50 or 60 years, particularly in women), but at age 70 years (which is a common age for TIA) the OR was about 5 for men and closer to 10 for women. The OR was still high at age 90 years: about 3 in men and 7 to 8 in women. The ORs for TIA in men and women are much higher than previously noted. However, these ORs represent a lifetime risk after the onset of TIA. For the cases of stroke in this study, there was an average of 37 years of observation in the medical record before the occurrence of stroke, whether or not TIA had occurred. There was a similar observation period for the population controls.
(2) Hypertension Interaction With Age (Fig 2). Hypertension is the most important risk factor for stroke. We noted an interaction with hypertension and age3: at younger ages (50 years) the OR is about 5, and it decreases with age until about age 90 years, at which point there is no increase in risk. This might seem to be at odds with the results of the SHEP clinical trial, which showed benefit from treating systolic hypertension in older persons.5 However, the average age of patients in that trial was about 71 years. Our population-based data show that the relative risk of stroke in hypertensive persons is about 2 at age 70 years, so there is ample room to show improvement in a clinical trial in patients at that age.
(3) Cigarette Smoking Interaction With Age (Fig 3). Cigarette smoking shows an interaction with age similar to that for hypertension.3 At younger ages, the relative risk is higher (age 50 years, it is about 4) and decreases with increasing age, so that at age 90 years there is no increased risk of stroke from cigarette smoking.
The prevalence of cigarette smoking has greatly decreased in the United States. We do not know the full extent of the prevalence of cigarette smoking in Rochester 30 years ago, but it may have been at least twice as much as it is currently.
(4) Complex Relationship Between Atrial Fibrillation, Hypertension, and Age (Table 2).3 There is little difference between the ORs for no fibrillation (OR=1) and intermittent fibrillation (OR=0.8) when hypertension is absent, whereas the OR for persistent atrial fibrillation in that circumstance is 7. When hypertension is present, however, there is little difference between the ORs for intermittent atrial fibrillation and persistent atrial fibrillation (OR=7 to 8 at age 50 years); the ORs decrease with increasing age, as occurs with hypertension alone.
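An age interaction of the kind described for hypertension can be sketched numerically. The sketch assumes that the interaction is linear in age on the log-odds scale, which is how a single age-by-factor product term behaves in a logistic model. It uses only the approximate ORs quoted for hypertension (about 5 at age 50 years, about 1 at age 90 years) as anchors. The function name and its parameters are hypothetical, and the fitted coefficients of the study are not reproduced here.

```python
import math

def odds_ratio_at_age(age, age_lo=50.0, or_lo=5.0, age_hi=90.0, or_hi=1.0):
    """Interpolate the OR linearly on the log scale between two anchor ages."""
    slope = (math.log(or_hi) - math.log(or_lo)) / (age_hi - age_lo)
    return math.exp(math.log(or_lo) + slope * (age - age_lo))

for age in (50, 60, 70, 80, 90):
    print(age, round(odds_ratio_at_age(age), 2))
```

Under these assumptions the interpolated OR at age 70 years is about 2.2, which agrees with the relative risk of about 2 quoted for hypertensive persons at that age.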
Mitral valve disease, ischemic heart disease, and diabetes mellitus each have a relative risk of close to 2, and these are the same at all ages: there is no age interaction. Therefore, the effect on the population depends on the prevalence in addition to this low relative risk. The prevalence of mitral valve disease in this country is rather low now (1% to 2%), but the prevalence of ischemic heart disease is as high as 20% in older men, and the prevalence of diabetes is about 20% in older men.
We assessed the stability of the ORs over the five quinquennial periods by including variables for each period and interactions of the period with each risk factor. For each of the ORs that was significant in the final model, there was no significant difference in the value over the five periods. If there had been a significant effect of an added treatment or a significant difference in severity of a risk factor over time, these should have been detected by a difference in the ORs over time.
There also was no significant effect of antihypertensive treatment before stroke. Even though both the treated and the untreated hypertensive patients had a significantly higher incidence of ischemic stroke than nonhypertensive patients, there was no significant difference between the treated and untreated hypertensive patients. It is probable that the most severe cases of hypertension were more likely to be treated, but treatment in the population across the board had little effect in regard to prevention of stroke. This emphasizes that the result of a treatment in the population (that is, effectiveness) may be quite different from the result shown in a clinical trial (that is, efficacy of treatment).
The proportion of ischemic stroke in the population that can be attributed to a particular risk factor is called the attributable risk (AR). When applied to the population, some have called it population AR (PAR), which is then stated in terms of percent. AR is a practical consideration of the combination of the relative risk of stroke for the risk factor and the prevalence of the risk factor in the population. This concept was first introduced by Levin6 more than 40 years ago in regard to lung cancer and cigarette smoking. Levin pointed out the public health utility of the concept but warned of the need to establish causality, and we have to keep that in mind in today’s discussion.
A simple but artificial example of the concept follows. If there were 1000 persons with ischemic stroke in a population sample but 600 did not have the risk factor and 400 did, there are then apparently 400 cases related to the risk factor. The AR is 0.40 and the PAR is 40%. A commonly used formula for this is the following:

$$\mathrm{AR} = \frac{P(R-1)}{1 + P(R-1)}$$

where P is the prevalence of the factor in the population and R is the relative risk. Multiplication by 100 gives PAR. This equation shows the influence of both relative risk and prevalence of the risk factor on the value of AR.
Graphs showing the relationships among relative risk, prevalence in the population, and AR (Fig 4)7 indicate that if the prevalence in the population is low, for example 2%, the AR will be low even if the relative risk is as high as 6 or 7. By the same token, if the prevalence is high (40% to 50%), a relative risk as low as 2 would give an AR of more than 30%.
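The dependence of AR on prevalence and relative risk can be checked with a few lines of arithmetic. This sketch assumes the commonly used formula is Levin's, AR = P(R − 1) / [1 + P(R − 1)], with P the prevalence in the population and R the relative risk. The function name `levin_ar` is hypothetical.

```python
def levin_ar(prevalence, relative_risk):
    """Attributable risk from population prevalence P and relative risk R."""
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)

# Low prevalence with high relative risk, then high prevalence with low relative risk.
for p, r in [(0.02, 7.0), (0.40, 2.0), (0.50, 2.0)]:
    print(f"P={p:.2f} R={r:.0f} AR={levin_ar(p, r):.3f}")
```

Under this formula a prevalence of 2% with a relative risk of 7 gives an AR of about 11%. A relative risk of 2 gives about 29% at a prevalence of 40% and about 33% at 50%, so the 30% mark is crossed toward the upper end of that prevalence range.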
An equation7 can be derived from the one previously shown:

$$\mathrm{AR} = \left[1 - P(\bar{F} \mid D)\right]\frac{R-1}{R}$$

in which $P(\bar{F} \mid D)$ is the proportion of cases free of the risk factor. The earlier equation requires information on the prevalence of the risk factors in the whole population. For the latter equation, it is necessary to determine only the prevalence of the risk factors in the cases when the cases are all the cases in the population.
The graphs showing the relationship among the relative risk, prevalence in the cases, and AR (Fig 5)7 indicate that if the prevalence in the cases is 75% and relative risk is 2, the PAR is nearly 40%. If the prevalence is 5%, the PAR is less than 5%.
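One way to see how the population-based and case-based formulas relate is the short derivation below. It is a sketch under two assumptions: that the population-based formula is Levin's, AR = P(R − 1) / [1 + P(R − 1)], and that P(F | D) = 1 − P(F̄ | D) denotes the prevalence of the factor among the cases (notation introduced here for illustration). The first line expresses the prevalence among cases in terms of P and R. The second shows that multiplying it by (R − 1)/R recovers Levin's expression.

```latex
\begin{align*}
P(F \mid D) &= \frac{P R}{P R + (1 - P)} = \frac{P R}{1 + P(R - 1)} \\
P(F \mid D)\,\frac{R - 1}{R} &= \frac{P(R - 1)}{1 + P(R - 1)} = \mathrm{AR} \\
\text{so}\quad \mathrm{AR} &= \bigl[1 - P(\bar{F} \mid D)\bigr]\,\frac{R - 1}{R}
\end{align*}
```

With a prevalence in the cases of 75% and a relative risk of 2, this gives 0.75 × 1/2 = 0.375, or nearly 40%. With a prevalence in the cases of 5% and the same relative risk, it gives 0.025, which is below 5%.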
These comments refer to the simple circumstance of relating a single risk factor to a disease, which is not the reality of assessing AR in regard to ischemic stroke. The successful use of models to estimate independent relative risk indicates that modeling of AR would also be appropriate and useful if that were possible.
Multivariable modeling has not been possible thus far because of the lack of a means to estimate the standard errors and therefore the 95% confidence intervals. My mathematical and statistical colleagues, Dr Michael Kahn and Dr W. Michael O’Fallon, have recently developed the theory and computer programs to obtain estimates of AR using computer-intensive methods to estimate standard error and confidence intervals (W.M. O’Fallon, J.D. Sicks, C.P. Chu, M.T. Kahn, J.P. Whisnant, unpublished data, 1997). I shall not try to explain the procedure, but I will talk about how multivariable modeling affects the estimates of AR. Because the assessment of relative risk that I described earlier was from a population-based case-control study, the relative risk estimates were in the form of ORs from multivariable logistic models.
The 11 risk factors that were significant in the univariable analysis and their respective prevalences in the cases, ORs, and AR estimates are shown in Fig 6. The ORs are adjusted for age, sex, and date of stroke because of the matched nature of the study design. With these adjustments it is still an analysis of one variable, without regard for confounding or modification by other variables.
Under each risk factor abbreviation, the prevalence of the factor among the cases is noted. For example, 17% of the ischemic stroke patients had a TIA before the stroke. The open circles, corresponding to the scale on the right, indicate the ORs. TIA had an OR of 5.6, the highest observed. The closed circles above each abbreviation correspond to the AR scale on the left. The AR associated with TIA is 14%, with a 95% confidence interval from 11% to 17%. In contrast, hypertension was present in 74% of the cases. The OR was only 2.0 for all ages, and the AR was 37%, with a 95% confidence interval from 28% to 47%. Similar information is noted for the other nine factors (ischemic heart disease [AR=13%], current cigarette smoking [AR=12%], atrial fibrillation [AR=9%], left ventricular hypertrophy [AR=8%], congestive heart failure [AR=7%], diabetes mellitus [AR=7%], mitral valve disease [AR=4%], regional wall motion abnormality of the heart [AR=2%], and aortic valve disease [AR=2%]).
The sum of all of these is 113%. If we assumed that the relationship of each risk factor with ischemic stroke was causal, it would indicate that removing all of these risks from the population would prevent more than all of ischemic stroke, which obviously is not possible. This has been the problem with AR estimates that we and others have noted previously. It is necessary and important to have multivariable modeling of AR.
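The univariable figures quoted for TIA and hypertension can be reproduced from the prevalence in the cases and the OR. This sketch assumes the case-based form AR = P(F | D)(R − 1)/R and uses the OR in place of the relative risk. The function name `ar_from_cases` and the dictionary of quoted values are introduced here for illustration only.

```python
def ar_from_cases(prevalence_in_cases, odds_ratio):
    """Attributable risk from prevalence among cases and OR (used as relative risk)."""
    return prevalence_in_cases * (odds_ratio - 1.0) / odds_ratio

tia = ar_from_cases(0.17, 5.6)            # 17% of cases had a prior TIA, OR 5.6
hypertension = ar_from_cases(0.74, 2.0)   # 74% of cases were hypertensive, OR 2.0
print(f"TIA AR = {tia:.0%}, hypertension AR = {hypertension:.0%}")

# Rounded univariable AR values (percent) as quoted in the text for the 11 factors.
quoted_ar_percent = {
    "TIA": 14, "hypertension": 37, "ischemic heart disease": 13,
    "current cigarette smoking": 12, "atrial fibrillation": 9,
    "left ventricular hypertrophy": 8, "congestive heart failure": 7,
    "diabetes mellitus": 7, "mitral valve disease": 4,
    "regional wall motion abnormality": 2, "aortic valve disease": 2,
}
total = sum(quoted_ar_percent.values())
print(f"Sum of rounded univariable ARs = {total}%")
```

The two computed values round to 14% and 37%, matching the quoted estimates. The rounded values as listed add up to 115%, close to the 113% quoted; the small gap may reflect rounding of the individual estimates. Either way the total exceeds 100%.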
I have noted earlier in this presentation that we identified seven risk factors that were independently significant in the assessment of relative risk through estimations of ORs. There were also four significant interactions.
Table 3 shows the estimates of AR for each risk factor, with 95% confidence intervals adjusted for all other variables, including the interactions. ORs are not shown because the interactions preclude simple OR estimates. Table 3 shows the AR for each variable, with each other significant variable treated as a covariable.
The AR estimates are obviously lower than those in Fig 6. With hypertension as an example, the “univariable” estimate of AR was 37%, which was reduced to 26% when adjustment was made for all of the modifying variables and their interactions as well. It is not enough simply to adjust for the main effects without considering these interactions. However, the AR for TIA is about 14%, with or without the adjustment.
Fig 7 shows the impact of considering only the four risk factors with the highest independent ARs, ie, hypertension, TIA, smoking, and ischemic heart disease. Thus, hypertension plus any one or two of the other independent risk factors collectively is associated with an AR of close to 40%. For any two of these four without hypertension, the AR is about 25%. The presence of all four of these risk factors collectively is associated with an AR that is still less than 50%, and all seven of the independent risks would be associated with an AR of 57% (about half of the 113% from the univariable estimates that I noted earlier). When variables are considered collectively, the AR is less than the sum of each because of overlapping prevalence of the variables.
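Why a collective AR falls short of the sum of the individual ARs can be seen in a toy calculation. The sketch below is not the multivariable method used for these estimates. It assumes two hypothetical factors, A and B, that occur independently and whose relative risks multiply, with made-up prevalences and relative risks. Some persons carry both factors, and their strokes would be counted twice if the individual ARs were simply added.

```python
from itertools import product

def population_risk(prevalences, relative_risks, removed=()):
    """Mean risk relative to baseline, averaged over all exposure strata.

    Factors are assumed independent with multiplicative relative risks.
    Factors listed in `removed` are treated as eliminated from the population.
    """
    names = list(prevalences)
    total = 0.0
    for exposure in product((0, 1), repeat=len(names)):
        weight = 1.0  # share of the population in this stratum
        risk = 1.0    # risk in this stratum relative to the unexposed
        for name, exposed in zip(names, exposure):
            p = prevalences[name]
            weight *= p if exposed else 1.0 - p
            if exposed and name not in removed:
                risk *= relative_risks[name]
        total += weight * risk
    return total

def attributable_risk(prevalences, relative_risks, removed):
    """Fraction of disease avoided if the `removed` factors were eliminated."""
    full = population_risk(prevalences, relative_risks)
    reduced = population_risk(prevalences, relative_risks, removed)
    return 1.0 - reduced / full

# Hypothetical values chosen only for illustration.
prevalences = {"A": 0.40, "B": 0.20}
relative_risks = {"A": 2.0, "B": 3.0}

ar_a = attributable_risk(prevalences, relative_risks, ("A",))
ar_b = attributable_risk(prevalences, relative_risks, ("B",))
ar_both = attributable_risk(prevalences, relative_risks, ("A", "B"))
print(f"A: {ar_a:.3f}  B: {ar_b:.3f}  sum: {ar_a + ar_b:.3f}  together: {ar_both:.3f}")
```

In this toy setting the collective AR equals 1 − (1 − AR_A)(1 − AR_B), which is always smaller than AR_A + AR_B when both are positive.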
The pathologic substrates of ischemic stroke include large-artery disease, either extracranial or intracranial disease, arteriolar disease, and cardiac disease as a source of embolus. The remaining mechanisms are either unknown or related to some mechanism of low frequency, such as dissection, inflammation, or a coagulation problem. The proportions noted in Table 4 are my own informed estimates, but there are no data to make precise judgments about these proportions.8
If these are the limits of mechanisms, we should consider how each independently significant risk factor affects one or more of these pathologic states in regard to AR. Epidemiologic studies assess the risk factors (or comorbid conditions) from the perspective of an association with ischemic stroke. This need not mean a causal relationship. However, there is sufficient evidence to indicate that hypertension, cigarette smoking, and diabetes mellitus have a direct effect on the occurrence of moderate to severe atherosclerotic stenosis of extracranial and intracranial arteries9 and good evidence of an effect on intracranial arteriolar disease. Ischemic heart disease could be considered a marker for atherosclerotic disease, but a portion of ischemic heart disease (ie, recent myocardial infarction) may be an embolic source. We might think of TIA as mostly a marker for large artery disease, but we know that some quite typical TIAs occur because of a cardiac embolic source.
If we consider the collection of variables that affect arterial disease (hypertension, smoking, diabetes, ischemic heart disease, and TIA), the adjusted AR is 52% (ie, about half of the ischemic strokes are related to extracranial or intracranial large-artery disease or arteriolar disease).
By similar logic, the independent association of atrial fibrillation and mitral valve disease with ischemic stroke is more likely related to a source for cerebral emboli from the heart. The AR for that combination is 10%. If all TIA and ischemic heart disease were included in this stroke mechanism, the AR would be 33%, obviously an overestimate. Therefore, a cardiac embolic source would account for more than 10% and less than 33% of all ischemic stroke. This is consistent with 25% that we have generally considered as the proportion of ischemic stroke due to cardiac emboli simply from the associations of these conditions with ischemic stroke,8 but we probably have not considered all the cardiac conditions that could be sources for emboli.
We have information missing on prevalence or independent relative risk (or both) that precludes fitting other putative risk factors (Table 5) into this picture. The 57% of ischemic strokes for which we have accounted by modeling of ARs certainly leaves room for other risks, but we have to consider whether the prevalence of a putative risk factor is high enough to have an impact, whether the relative risk is independent of other factors, and whether the AR is modified by other factors.
The interactions of age with TIA, hypertension, and smoking emphasize that management strategies for prevention have a greater potential for benefit in younger persons with these risks because of the higher relative risk for stroke at younger ages for these variables. Preventive strategies in older patients have to reach many more people to have a comparable effect because of the lower relative risk with increasing age.
A stroke prevention program aimed at risk factors that can be modified must be designed with an understanding of the independent relative risk, the prevalence, and the modified independent AR and adopt only proven measures to modify or control the specific risk factor.
I appreciate the statistical consultation by Dr W. Michael O’Fallon. Funded in part by grants from the Public Health Service, National Institutes of Health NS-06663 and AR-30582.
Presented as the Willis Lecture at the 22nd International Joint Conference on Stroke and Cerebral Circulation. Anaheim, CA, February 6-8, 1997.
© 1997 American Heart Association, Inc.
Whisnant JP. The decline of stroke. Stroke. 1984;15:160-168.
Whisnant JP. A population-based model of risk factors for ischemic stroke: Rochester, Minnesota. Neurology. 1996;47:1420-1428.
Urban Hjorth JS. Computer Intensive Statistical Methods: Validation, Model Selection and Bootstrap. London, UK: Chapman and Hall; 1994.
Levin ML. The occurrence of lung cancer in man. Acta Unio Internationalis Contra Cancrum. 1953;9:531-539.
O’Fallon WM, Sicks JD. Attributable risk. In: Whisnant JP, ed. Stroke: Populations, Cohorts and Clinical Trials. Oxford, UK: Butterworth-Heinemann; 1993.
Whisnant JP. Extracranial and intracranial arterial disease. Hypertens Res. 1994;17(suppl 1):S43-S46.