Identifying Low-Quality Preclinical Studies
See related article, pages 2824–2829.
The failure of the NXY-059 development program for acute stroke therapy has been a major setback in the field of neuroprotection. Large pharmaceutical companies are unlikely to finance further trials in this area of research in the foreseeable future. There have simply been too many disappointments.1 Reviewers at the National Institutes of Health have similar concerns and are currently resistant to funding much of this type of work. This is particularly regrettable because much has been learned about how to design and conduct such investigations in the past 15 to 20 years. I think it is likely that if current methodology had been used for several of the drugs that were abandoned along the way, some would have been found to be safe and effective for treating acute stroke victims.
The article by Macleod et al2 in this issue is an attempt to identify specific factors that led to the failure of NXY-059 to prove useful. It is their contention that low-quality preclinical investigations were, in large measure, to blame, and that they have a method for detecting inferior work. Specifically, they used their Collaborative Approach to Meta Analysis and Review of Animal Data from Experimental Stroke (CAMARADES) checklist3,4 to identify flaws in the individual preclinical studies. They further concluded that their findings strongly suggest bias on the part of the preclinical investigators, and that this bias was a major factor in the misleading evidence those studies provided. Ignoring for a moment that there are numerous suppositions in this logic, I think the use of an unproven “gold standard” for assessing the predictive usefulness of preclinical investigations is a serious error. Although the preclinical studies may have their inadequacies, it is my contention that the CAMARADES criteria are also seriously flawed.
There are 10 items in the CAMARADES list. Half of them are quite reasonable. These are: publication in a peer-reviewed journal, randomization to treatment, 2 items concerning proper blinding, and avoidance of anesthetics with neuroprotective properties. However, the rest of the list is questionable to me. They want explicit statements in each publication of the following: proper temperature control, use of animals with hypertension or diabetes, sample size calculation, compliance with regulatory requirements, and statements concerning possible conflicts of interest.
Most good preclinical investigators publish an initial article that contains details of their procedures and subsequently refer to that earlier publication for a fuller explanation of their methods. Restating all of this in every article is therefore redundant. Temperature control was a problem in some laboratories, but once the potential error was pointed out,5 few if any investigators continued to make it. Is it still necessary to state this in each publication? Preclinical investigation techniques, just like clinical investigation techniques, have evolved for the better over time. What about all of the other possible errors that are not being made? Using animals with comorbidities has not been shown to be predictive of successful clinical trials; in fact, the only successful clinical investigation, that of tissue plasminogen activator, did not rely on such animal models. Sample size calculations are essential for human trials, which are quite expensive, in order to minimize costs. That is generally not true of preclinical investigations, and making such calculations mandatory is unsubstantiated. Furthermore, the problem has been that too many preclinical trials found efficacy that was not corroborated in clinical trials; the preclinical studies, therefore, were not underpowered. Compliance with regulatory requirements is essential in human studies for ethical reasons. For animal studies in most countries, such requirements are also in place, for humane reasons, but what does this have to do with the validity of the results? Finally, possible conflicts of interest are important, but virtually all journals require authors to disclose such potential biases to the editors, even though some journals do not publish the disclosures.
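The sample-size point can be made concrete. The standard normal-approximation calculation is itself simple, but it requires advance estimates of effect size and variance that preclinical investigators often lack; the effect size and standard deviation below are hypothetical, chosen purely for illustration. A minimal sketch in Python:

```python
from statistics import NormalDist
from math import ceil

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-group
    comparison of means with a two-sided test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # approximately 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)            # approximately 0.84 for 80% power
    return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

# Hypothetical illustration: detecting a mean difference of 20 units of
# infarct volume when the between-animal SD is 25 units.
print(n_per_group(delta=20, sigma=25))  # → 25 animals per group
```

Demanding more power (say, 90%) or targeting a smaller difference quickly inflates the required group sizes, which is part of why such calculations are routine for expensive human trials but were not historically mandated for small animal experiments.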
Therefore, it is not surprising that none of these items correlated with “low-quality studies,” with the exception of hypertension. Was that a play of chance? The fact that the CAMARADES list as a whole correlated with low quality is almost certainly due to the reasonable items and not the irrelevant ones. Therefore, this list needs to be revised. More important, good basic science depends on creative and critical thinking. It is true that some preclinical studies are poorly done, but identifying them is not easy. The CAMARADES list is not the “gold standard” the authors hoped it would be. There is more work to be done. There have been too few successful development programs to allow us to go back and reliably determine which preclinical studies were prophetic; that is the best way of validating their quality. Bureaucratic requirements do not protect us from low-quality work, and prescriptions without solid evidence are not informative.
I am not sure why the NXY-059 development program failed. Numerous publications have now addressed this topic and have come to a variety of conclusions. My perspective is that we are now very close to knowing how to prove that some neuroprotective agents are beneficial, and for that purpose any one particular failure is not vitally important. There is more than enough blame to go around, for both preclinical and clinical investigators, and I will accept my fair share of it. The fact that few, if any, neuroprotective drugs have been widely accepted as effective for any neurological disease is evidence that this type of research is difficult. However, the discovery of therapies is an iterative process that, for acute stroke, is being prematurely terminated. It will be tragic if those of us who work in this area are not given further opportunities to continue; the calamity will fall on our patients, who will not be able to derive benefit from the eventual successes that such work could yield.
The opinions in this editorial are not necessarily those of the editors or of the American Heart Association.
2. Macleod MR, van der Worp HB, Sena ES, Howells DW, Dirnagl U, Donnan GA. Evidence for the efficacy of NXY-059 in experimental focal cerebral ischaemia is confounded by study quality. Stroke. 2008; 39: 2824–2829.
3. Crossley NA, Sena E, Goehler J, Horn J, van der Worp B, Bath PM, Macleod M, Dirnagl U. Empirical evidence of bias in the design of experimental stroke studies: a metaepidemiologic approach. Stroke. 2008; 39: 929–934.
4. Macleod MR, O'Collins T, Howells DW, Donnan GA. Pooling of animal experimental data reveals influence of study design and publication bias. Stroke. 2004; 35: 1203–1208.
5. Buchan A, Pulsinelli WA. Hypothermia but not the N-methyl-D-aspartate antagonist, MK-801, attenuates neuronal damage in gerbils subjected to transient global ischemia. J Neurosci. 1990; 10: 311.