The argument in The Fault Lies in our Stars features in Chapter 6 of the forthcoming Shipwreck of the Singular. It was sent out for comment to the following, who were chosen mostly by Mark Wilson. The responses received are below:
Corrado Barbui, Lisa Bero, Alan Cassels, Angus Deaton, Jean-François Dreyfus, Andrew Leigh (author of Randomistas), Silvio Garattini (& Mario Negri Institute), Juan Gervas, Peter Gøtzsche, Bruno Harle, Jeremy Howick, Irving Kirsch (N), Joan-Ramon Laporte, Joel Lexchin (N), John McMillan, Leemon McHenry, Barbara Mintzes, Florian Naudet, Abel Novoa, Anne Springer, Jacob Stegenga, Sean Valles.
Some of the following declined because of time issues. Future responses will be added.
Virginia Barbour, Paul Glasziou, Ralph Edwards and Rebecca Chandler of UMC, Prescrire in Paris, Carl Heneghan, Tom Jefferson and Nicholas DeVito linked to the CEBM Oxford, University of Toronto Dept of Statistics, McMaster University Dept of EBM, John Ioannidis, Aaron Kesselheim, Harlan Krumholz, Vinay Prasad.
Corrado Barbui
Editor Epidemiology and Psychiatric Sciences
David
Nice reading, very provocative indeed. I like the way of thinking differently; my only concern is that you include a number of different considerations that would each require a paper to be analysed in detail. A second issue is that you do not seem to suggest an alternative way.
To me, the main contribution of your paper is here:
Clinical practice is essentially a judicial rather than an algorithmic exercise. The view offered here is that our best evidence as to what happens or is likely to happen on treatment lies in the ability to examine and cross-examine the person or persons (interrogate the data) given that treatment. Every day of the week doctors and patients continue or stop treatments based on judgements as to whether the treatment is working or not. These judgements have to be mostly correct or else medicine would not work.
But what holds true at the individual level must be true at the population level also. The evaluation of a treatment must be judicial rather than algorithmic.
I totally agree with you, but nobody in the world thinks that decisions should correspond to the results of the last trial, while you are assuming that. You did not cover the step between the production of evidence and its use in practice, i.e. guideline production and implementation. If you check GRADE, for example, it becomes clear that recommendations are judicial, as they take into consideration “the evidence” as only one component. Factors such as values, preferences, feasibility issues, equity, etc. are other components. And, at the end of the day, guidelines are by definition based on judgements and represent an average recommendation for an average patient who does not exist, so good doctors are not those who follow guidelines but, rather, those who take evidence-based guidelines into consideration in a shared decision-making process. So, guidelines are judicial, and decisions are judicial.
If, within this process, you are suggesting that RCTs should not be the study design used to build the “evidence base” (which is only one component of the decision-making process), then I would react by asking you to give me a better alternative. Until a better alternative is around, it is good to criticize trials, but we cannot go ahead without them, unless we want to give credit to the many professionals who think they know what is good for their patients and don’t need any type of study; they would be happy to read your paper, as it will give support to their opinion that research is garbage and they know the truth for their patients. I wouldn’t agree with this position!
Please, keep my thoughts as those of an average reader, and thank you for raising these important challenges on clinical trials.
Best wishes! Corrado
Lisa Bero
Cochrane Centre, U of Colorado
Hi David,
Thanks for contacting me. I have some comments in the attached… I honestly did not have a lot of time to devote to it, but I hope you find it useful.
Lisa
Alan Cassels
Author Selling Sickness, Therapeutics Initiative Vancouver
David,
I liked the paper a lot. It reminded me of all the interviews I did with the founders of the Cochrane Collaboration about the early days, when they placed all their trust in RCTs, and how controversial that was even back then. With 20 years of experience of the snobbery of the RCT, the smart people now realize how they were being ridden like donkeys and that the industry-funded RCT was just another marketing gimmick that companies needed to invest in, bend to their purposes and then manipulate in order to maximize sales goals. If the tainted RCT is the ‘gold standard’ in medicine, then surely the “platinum standard” has to be the meta-analysis of tainted RCTs.
I can’t help but think the whole enterprise is corrupted from start to finish and that the medical academicians, with their lofty rhetoric, their credentials, their RCTs and their impact factors, are little more than high-priced circus barkers.
I remember a presentation by Richard Smith at the Cochrane Colloquium in Melbourne in 2005, where he said something like “Is Academic medicine for sale? No, the current owners are very happy with it”. He nailed it.
The sentiment was captured in this article:
https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.0020138
Too many people seem to treat all RCTs – even industry-funded ones – as if they were dispassionate searches for truth, and even after we spend all this time, money and energy pointing out the many flaws, tricks and manipulations in trials, we still treat them as something to be examined, dissected and taken seriously. Then we summarize them, quote confidence intervals, NNTs, p-values, and all that stuff even though we know it’s mostly marketing.
Between you and me, we waste an enormous amount of energy basically being gaslit by the drug industry. Drives me bananas.
Angus Deaton
Professor of Economics, Princeton. Nobel Prize for Economics
Thanks, Dr Healy
I agree with a lot of this, especially the importance of knowing what you are doing, and the hopelessness of trying to replace that by RCTs, or any other mechanical process. Not sure we are on the same page about inference or the role of standard errors.
Here is a much less technical piece that I wrote, but again, mostly about economics. (I do think there are important differences between medicine and social sciences here, but I do not fully understand how to think about that.)
http://www.princeton.edu/~deaton/downloads/Deaton%20Randomization%20revisited%20v7%202020.pdf
Surprised that you write that life expectancy is falling. Not in Canada, I thought.
Jean-François Dreyfus
Psychiatrist, Head of CNS Development at Rhône-Poulenc, CEO of a CRO
First of all, let me apologize, in advance, for some lack of nuance in the wording of my remarks. As I now live in the western French countryside, my command of the English language has clearly decreased.
This said, I would like to compliment Dr. Healy for his provocative article although I do not totally endorse his contentions.
Epistemologically, my teaching of clinical trials’ methodology to pharmacy, medicine or engineering students always started with Karl Popper and John Stuart Mill. Popper because I believe that knowledge progresses by showing that a consequence of the current leading theory is false, so that the theory has to be modified/improved – if you have seen only white swans and are convinced there is no other color for swans, the first time you see a black swan you have to revise/improve your theory. Mill because in science causality is a crucial issue. This is rather easy with inanimate objects: you take two similar ingots and heat one of them. Everything else being equal, if the heated bar melts you may conclude that heat, being the only difference in the experimental environment, is a cause of metal melting. My first quiz was how to apply this model to humans, i.e., how to decide on the causal role of a factor in differences between individuals; even if you are dealing with monozygotic twins, they cannot be strictly identical, as their previous life experiences, current health status, aging process, life events encountered during the experiment and personal relevance of outcome criteria cannot be considered identical. Except in emergency situations, as we have seen recently with the COVID-19 pandemic, one individual who provides data in good faith and whose data are analyzed by well-meaning specialists cannot be the sole basis for a conclusion that applies to the whole population. I fully agree that we should not dismiss individual data as second-rate, but it is my contention that they should be used as the basis for new hypotheses to be tested.
Daniel Schwartz, who taught me statistics, used to state that a true scientist obtaining confirmation or rejection of a hypothesis had to accept the results with equanimity as, whatever the case, they added to one’s knowledge. According to my doxa, using relevant groups and randomizing the group to which a participant belongs is the only way to equalize all the known and unknown factors that could confound the results. Of course, statistics were necessary to determine if group differences could occur by chance alone. No need to resort to “classic” statistics; bootstrapping and data permutations could be used to do such tests and, if considered necessary, to obtain confidence intervals. I shall therefore forego dealing with calculation considerations.
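For readers unfamiliar with the permutation approach Dreyfus mentions, here is a minimal sketch in Python – the outcome scores are invented, not data from any trial. Shuffling the group labels repeatedly builds the chance distribution directly, with no distributional assumptions:

```python
# Permutation test for a difference in group means - an alternative to a
# classic t-test. All outcome scores are invented for illustration.
import random

drug = [4.1, 3.8, 5.0, 4.4, 3.9, 4.7]
placebo = [3.2, 3.6, 2.9, 3.8, 3.1, 3.4]
n = len(drug)

def mean_diff(a, b):
    return sum(a) / len(a) - sum(b) / len(b)

observed = mean_diff(drug, placebo)

pooled = drug + placebo
n_perm, extreme = 10_000, 0
for _ in range(n_perm):
    random.shuffle(pooled)            # re-randomize the group labels
    if abs(mean_diff(pooled[:n], pooled[n:])) >= abs(observed):
        extreme += 1

# p-value: how often label-shuffling alone produces a difference at least
# as large as the one actually observed
print(f"observed difference: {observed:.2f}, p = {extreme / n_perm:.4f}")
```

The same resampling logic extends to bootstrap confidence intervals, which is presumably what Dreyfus has in mind when he says classic calculations can be foregone.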
As to the issue of the primary endpoint, most certainly to reach a clear-cut conclusion on whether two groups are different, one has to indicate in advance on which criterion such a judgement will be made. Of course, there are methods (to cite one: Carlo Emilio Bonferroni’s and his followers) that allow testing multiple endpoints. Actually, the requirement of a primary endpoint is a consequence of the eagerness of industry (and authors) to report positive results in order to beat the infamous publication bias. With multiple endpoints and no predominance given to any, communicators were always able to claim the study results were significant. One has to require that a primary criterion be specified in advance to filter out truly positive trials from those in which positivity was a post hoc reconstruction. That you need RCTs to avoid personal bias in obtaining, analyzing or interpreting group comparisons is a postulate I never brought into question. Thus, from an epistemological point of view, I must admit I am not completely in agreement with Dr. Healy.
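A minimal sketch of the Bonferroni-type adjustment Dreyfus cites: with several endpoints and no predominance given to any, each can be tested at a stricter threshold so that the overall chance of a false-positive claim stays controlled. The p-values below are invented for illustration:

```python
# Bonferroni adjustment for multiple endpoints: with m endpoints, each is
# tested at alpha/m so the chance of any false positive across all of them
# stays below alpha. Endpoint names and p-values are made up.
alpha = 0.05
p_values = {"endpoint A": 0.021, "endpoint B": 0.004, "endpoint C": 0.048}

threshold = alpha / len(p_values)     # 0.05 / 3 = 0.0167
for endpoint, p in p_values.items():
    verdict = "significant" if p < threshold else "not significant"
    print(f"{endpoint}: p = {p:.3f} -> {verdict} at threshold {threshold:.4f}")
```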
But what about the rest of his demonstration? Not only do I agree with it but I contend that Healy could have gone a bit farther in his criticisms. When I was Head of the CNS development unit of Rhône-Poulenc, at that time the largest French pharmaceutical company, I had problems with Phase IIa. For those not familiar with this terminology, to be granted marketing permission a pharmaceutical product has to go through Phase I (in general, studies on healthy volunteers), Phase II (the initial studies in patients) and Phase III (notably, large-scale studies in patients who resemble those in whom the medication is to be used). To be more precise, Phase II is generally subdivided into Phase IIa, in which therapeutic effects in humans are ascertained, and Phase IIb, in which therapeutic hypotheses, especially on dosing, developed during Phase IIa are confirmed using placebo-controlled RCTs on clean, homogeneous patient populations with clear-cut conditions and using appropriate validated measurements. In Phase IIa one must somehow jump from results obtained in healthy volunteers to the fully formalized studies that establish whether the product has actual therapeutic properties. How is the trick done? Time and again, I tried to obtain some hints from colleagues in other companies but it seemed to be a trade secret.
Finally, I developed my own solution: a network of trusted physicians who received, in confidence, a preliminary formulation of the new compounds and an in-depth investigator’s brochure. They were told about what animal studies had led us to expect and, within certain limits, were free to experiment and accumulate data and experience; they would briefly exchange, say at biweekly intervals, information about their findings but were left free to continue in a direction they felt promising, even if it was considered a dead end. A few months later, investigators, data analysts and pharmaceutical executives gathered in a secluded place and results were dissected until some sort of consensus emerged on indications, dosing, precautions, adverse events to be expected, etc. It was my responsibility thereafter to draft the most suitable development plan which was submitted to the Phase IIa investigators for comments. However, I had the final say. Thus, RCTs were indeed considered a must but they were based on clinical judgement and methodologists’/analysts’ creativity. In the 1980s a company could still decide to forego financial profits and develop a drug that would enhance its scientific/humanitarian image, for instance an original treatment for a rare condition (e.g., amyotrophic lateral sclerosis) considered beyond current medical reach.
David Healy shouts what many people who went from academia to industry (and vice versa) have been murmuring: that the pharmaceutical industry is the major culprit in what we are currently seeing. Due to my own biography, I am quite picky about where one should start and/or stop the analysis of an issue and, in this case, the author stops the causal regress before reaching what I consider to be its actual start. Of course there is not a single responsible factor, but if one needs to be singled out, I believe it should be the role of financial capitalism. If this is not done, the public will not be in a position to understand how it happens that well-meaning individuals led psychiatry to its present unfortunate state.
Of course, some will insist that this is not a psychiatric issue but a societal one and that such a quandary has no place in a scientific discussion. However, I hope to substantiate that the need for pharmaceutical companies to be highly profitable, so as not to see their shareholders leave to invest in higher-yielding fields elsewhere, led them to decrease as much as possible the hazards of drug development. It probably started with diagnosis and diagnostic tools.
Industry convinced academia that diagnosis had to be more objective, based on harder criteria, and that fuzzy categories prevented psychiatry from being considered a fully acceptable medical domain. And academics, I was one of them, jumped on the bandwagon, forgetting that this approach somehow dehumanized the physician/patient relationship. Next, scales were developed to assess every disorder. Recently, I went to the site of the Canadian Paediatric Society and found six preliminary screening tools and 35 (I may have lost count) scales to measure specific conditions; of course, these scales provided, among other benefits, more precise estimates of drug effects but, once more, precision was gained at the expense of comprehensiveness; an atypical symptom, or at least a symptom not resembling those considered core symptoms, had a greater chance of being missed. Guidelines could have mitigated these effects but in most cases they hampered creativity by stating what was to be done and how it was to be done. Even junior collaborators could then design RCTs. Such guidelines were also considered by industry as a protection, because only major companies had enough resources to abide by them.
By multiplying guidelines in order to be more precise and supposedly helpful, well-meaning persons ensured, more or less, that smaller and more creative companies with limited means could not compete with larger ones. For instance, if you separate general anxiety from panic attacks, and I will not dispute here that there might have been good reasons to do so, two sets of studies would be required and the development costs would surge, outgrowing smaller companies’ budgets. This epidemic also encompassed pharmacovigilance. For instance, the MedDRA system was established in the 1990s and, as a user, I certainly will not “throw out the baby with the tub water,” as we say in France, since MedDRA has its merits, but its costs may not make it the most efficient system for ensuring complete post-marketing reporting to health authorities of rare but very significant adverse events. Medical journals bear a heavy responsibility for publication bias, and their semi-incestuous relationship with the pharmaceutical industry makes their request for evidence of a lack of conflict of interest among authors somewhat intriguing.
Finally, Healy rightly asks how one could get out of this quicksand system. Given the financial power of pharmaceutical corporations, I do not believe that a group of independent, trustworthy individuals would stay independent for very long. On the other hand, one can concur with the recommendation to make public the data on which registration and marketing permission were gained. Of course, there will be objections: why should a company be compelled to show trade secrets to other companies? Most certainly, there could be steps taken to avoid harming those who pioneer a new approach by giving them, for instance, a period of exclusivity, forbidding their investigators to work/consult for competitors, or making sure that data were disseminated separately to teams that would each replicate part of the analysis, with only a global synthesis being available to the public. In addition, among the solutions not mentioned by the author, at least two may be worth discussing: 1) increasing the role of ethics committees and 2) increasing the resources of health agencies.
Ethics committees should be given proper means, rights, authority and enough time to actually examine a protocol in its context and make sure it is not biased. They should also include many more patient representatives, a random sample of lay persons who would be progressively trained, and individuals concerned with economics and doctor/patient relationships, even if this sounds like an oxymoron. As to health agencies, they should have the resources to recruit, as employees or advisors, methodologists of good stature who would be in a position to refuse a protocol if they felt it was doomed to miss its purported objectives. One would also benefit from generalizing the US system of publicly held sessions of advisory committees, which, viewed by outsiders, seem fairly efficient.
To conclude, it is my contention that RCTs remain one of the pillars of knowledge, if only because they prevent us from jumping to conclusions that are not warranted by data and building a coherent system out of false premises. Not only can a coherent system be built on false premises, but it has been proven that a more coherent system is not in essence closer to truth than a less coherent one (Bovens and Hartmann 2003). It is, therefore, all-important to base our actions on solid foundations. However, the inappropriate construction of trials and their irrelevant exploitation to promote new drugs have led to an inordinate belief in a pharmaceutical company’s ability to rightly answer questions that should have been left to other means of decision-making. Should we say that the rise of lay persons’ skepticism (for instance, about the efficacy of vaccines) is an unanticipated consequence of this situation?
Reference:
Bovens L, Hartmann S. Solving the riddle of coherence. Mind, 2003;112(448):601–33.
This lengthy comment led to a response from DH.
Silvio Garattini
Creator of the Mario Negri Institute
Dear David,
I have read your article with interest and my conclusion is that I largely agree with your considerations, but at the same time I do not believe that the problem is the use of RCTs, which are good tools when they are properly planned and conducted.
Here are some points of reflection.
The problem in medicine is cultural, because medicine is centered on cure rather than prevention. Prevention is against the interests of the market in medicines and even of doctors.
The prevalence of the market tends to emphasize benefits, and in fact RCTs are designed to prove benefits in terms of sample size and endpoints. To study toxic effects is considered “second-class science”. Benefits are searched for; toxicity is awaited. Placebos are used to maximize benefits, while the use of comparators would reduce the apparent benefits.
Frequently, the effect of a drug is to reduce a symptom, but the quality of life of the single patient is seldom measured to establish whether other side effects outweigh the benefit.
It is certainly difficult to translate the results of RCTs to the single patient in clinical practice. The problem is that doctors, when they prescribe a drug to lower blood pressure, do not know that they have to treat more than 100 patients to avoid one stroke. They could do much better recommending a good lifestyle!
For all the drugs utilized for chronic diseases, there are no RCTs designed to reduce the NNT (number needed to treat) to reasonable levels by selecting suitable patients.
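Garattini’s arithmetic is worth making explicit. A minimal sketch, with invented risk figures chosen only to reproduce an NNT above 100:

```python
# NNT arithmetic for Garattini's blood-pressure example. The risk figures
# are invented for illustration: say treatment lowers 5-year stroke risk
# from 2.0% to 1.1%.
risk_untreated = 0.020
risk_treated = 0.011

arr = risk_untreated - risk_treated        # absolute risk reduction
nnt = 1 / arr                              # number needed to treat
print(f"ARR = {arr:.3f}, NNT = {nnt:.0f}") # NNT = 111 patients per stroke avoided
```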
In conclusion, RCTs would be more useful if we could minimize commercial interests, which are supported by the corruption of many academics.
Best wishes and congratulations for the paper! Silvio
Juan Gervas
General Practitioner, Spain
The essay We Must Abandon EBM was sent as a response.
Peter Gøtzsche
Co-founder of the Cochrane Collaboration & Founder of the Institute for Scientific Freedom
Peter’s Comments in Text.
Sander Greenland
Statistician, Epidemiologist, UCLA
Dear David,
This looks interesting so thanks for sending. I should hope you pursue it.
I am however swamped at the moment (thanks in no small measure to the pandemic) so I don’t know when I can do it justice with comments.
I did notice that Hill 1966 was cited in the text as “Hill 1965”.
I further noticed an absence of recent citations on the issue so I attach a few;
the Deaton-Cartwright paper was followed by comments and related articles but I don’t have those on hand – they can be seen here:
https://www.sciencedirect.com/journal/social-science-and-medicine/vol/210/suppl/C
I also attach an old one of mine that Deaton and Cartwright cite.
Whether those are helpful I must leave for you to judge.
Please send along revisions as they arise. If it ends up being published it may be a useful citation in my future work.
All the Best,
Sander
Bruno Harle
Child Psychiatry, Lyon, France
Bruno’s Comments in Text
Jeremy Howick
Philosophy, Oxford; Author of The Philosophy of Evidence-Based Medicine
Dear David,
Many thanks for sharing this interesting document, and congratulations on your new post at McMaster, the birthplace of EBM.
I enjoyed reading the paper and didn’t see anything big that stood out, with one possible exception. I think you mentioned, but did not emphasize the fact that the agenda can be controlled by commercial interests. I’ve attached my paper on power and rationality which explains this.
A few comments, all of which could be easily addressed if you agree with them.
- There is a great controversy about the first RCT in medicine, but the view emerging is that it was well before 1947 (see https://www.jameslindlibrary.org). The 1947 one is the most famous, but it might be worthwhile tipping your hat to what came before.
- It is great that you note that Hill’s use of randomisation was for controlling bias not stats (“Hill’s randomization was a method for fair allocation, not a means of controlling for the unknowns linked to doctors not knowing what they were doing (Healy 2020). ”) This corrects a mistake many philosophers make.
- You said “RCT evidence should never trump an evident safety effect that appears after treatment.” They don’t! To wit, the Oxford vaccine trial has been paused twice because of safety issues already.
- You said, “Drug Trials are done on healthy volunteers, and ordinarily do not have a primary endpoint.” This is sometimes but not always true. In the few trials I’ve had funded, the trial participants were chosen because they had the relevant conditions. Ditto for pragmatic trials.
- You stated: “Every medicine that gets on the market, by definition, beats placebo (often inconsistently). As a result, it has become unethical to use placebos in clinical practice, when for those for whom it works a placebo may be preferable to therapeutic poisoning.” This does not follow. There are other reasons why placebos may be desirable (or at least not unethical) in clinical practice (see my attached book chapter).
- You stated: “As of 1951, FDA made most new medicines prescription-only on the basis that they are unavoidably risky.” Did you mean ‘avoidably’?
- I found the paragraph starting with “The ability of RCTs to focus on one effect…” difficult to follow.
- On page 8 your comments about the failings of expertise did not convince me. The fact that many treatments were introduced in the 1950s is only partially relevant and I would omit it.
- You state: “An endorsement of clinical judgement does not suit health service managers or the pharmaceutical industry, for whom the supposed generalizability of RCT knowledge and confidence intervals that can be offered for such knowledge are legally appealing.” This is not true: Big Pharma adapts what they like and it was cheaper for them when they could invite experts to Nice for a conference at the beach in exchange for signing a consensus conference. See my power and rationality paper attached, and you admit as much further down when you say “After 1962, RCTs became the standard through which industry would make gold,”
- You state: “There are no drugs licensed to treat adverse effects.” This is not true in many areas, including cancer and diabetes.
- Do you have a reference for “So does data indicating antidepressants are now the second most commonly used drugs by young women in the face of 30 out of 30 trials negative on the primary outcome, which advocates of RCTs, with no links to industry, claim to be able meta-analyze and extract positive effects from data taken from ghostwritten publications, without access to trial data”?
Have a great day, Jeremy
Irving Kirsch
Author of The Emperor’s New Drugs
Hi again,
I’ve read through about half of the paper. I couldn’t help making comments on it (attached). As you can see, I do have some major issues with it as currently written.
Irving’s Comments here
Best,
Irving
Joan-Ramon Laporte,
Founder of the Catalan Pharmacovigilance Centre, Former Lead for WHO Essential Drugs and Supporter of Barcelona FC
Hi David. Sorry for being late.
Regarding the general argument:
At best, RCTs measure the effect of an intervention on a particular variable. Regarding the effect, priority is given to CIs rather than to the magnitude of the effect, and this should perhaps be stressed. They evaluate an intervention, not patients. They tell us whether the intervention has a probability of producing a given (and usually clinically narrow) effect. But they are not designed to tell us which patients will probably respond with a benefit, which patients will not respond, and which patients will respond adversely.
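Laporte’s point about CIs versus magnitude can be made concrete. A minimal sketch with invented numbers: in a large enough trial, a clinically trivial difference earns a tight 95% CI that excludes zero and so gets reported as “significant”:

```python
# CIs versus magnitude: a 0.3-point difference on some rating scale
# (SD 10 in each arm, numbers invented) becomes 'significant' once the
# trial is big enough, however trivial the effect.
import math

diff, sd, n_per_arm = 0.3, 10.0, 20_000
se = sd * math.sqrt(2 / n_per_arm)            # SE of the mean difference
lo, hi = diff - 1.96 * se, diff + 1.96 * se   # approximate 95% CI
print(f"95% CI: ({lo:.2f}, {hi:.2f})")        # (0.10, 0.50): excludes zero, yet tiny
```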
Clinical trials are a regulatory procedure, rather than science and knowledge generation. Regulators look at the intervention, not at how the drug will be used and misused in practice once marketed.
There is a responsibility of the state to protect the health of patients and citizens, and the economic health of the health care system. Do you have an alternative to RCTs for determining if a drug or other therapeutic, preventive, or diagnostic intervention can be reimbursed with public funds? The same when health care is privatized.
However, RCTs (and meta-analyses) have been useful to show the inefficacy of a lot of interventions (e.g., internal mammary artery ligation), and also their adverse effects (e.g., HRT for lifelong youth after menopause, colonoscopy for prevention of colon cancer, antiarrhythmics after MI, and so many others). Their value should not be dismissed because they are used for marketing purposes. If well designed and done, and not fraudulent, they help us to discard unsubstantiated claims of efficacy or safety. As you say, suicidality on SSRIs was already seen in RCTs (and then masked under other diagnostic categories, such as emotional instability).
I see RCTs as the best potential study design to evaluate causality. In the eighties, N of 1 trials were proposed as a way to understand the effects of interventions on individual patients with chronic conditions. But N of 1 trials have nothing to do with marketing and promoting medicines misuse, and they are very rarely used, even with new (and expensive) biotechnological products.
There is a problem of language too. Those in power have abused language, imposing terms such as efficacy (superiority over placebo on a given variable, no matter how irrelevant), evidence (which is not self-evident, particularly in Spanish and other languages), and safety to refer to toxicity and harms, etc. The regulatory and commercial use of RCTs has to do with all this.
Regarding specific issues of the paper:
The methodological critique (pages 2 to 4) could be reorganized. You may have your reasons to quote confidence intervals first, but conceptually there are other issues that come before, and that may be even more important than confidence intervals. I don’t know if you agree with the general strategy for the critical review of RCTs in terms of internal validity (i.e., how well was the trial done – main question and primary and secondary variables, interventions and dose in the control group, treatment of outliers and missing data, sample size calculations and statistical tests, publication of the results concerning the primary variable and other details of the initial protocol, relative vs absolute risk reduction, whether the individual patient data are available, etc.) and in terms of external validity (e.g., reference and source population, inclusion and exclusion criteria, clinical follow-up vs routine follow-up, detailed description of adverse effects, etc.). Maybe the paper would benefit from this sort of review.
I attach a reviewed version of the manuscript with additional comments and some grammatical corrections.
Andrew Leigh
Member of Australian Shadow Cabinet, Author of Randomistas
Responded but little time to engage
Joel Lexchin
Professor School of Health Policy & Management, York Uni.
David Healy writes very well and makes many good points but there are also weaknesses in the arguments that he makes. Here I will just outline what I think the weaknesses are.
- He goes from anecdotes to generalizations without any substantiation, e.g., Modell and Lasagna saw RCTs as the answer to a regulatory problem so therefore that’s the correct interpretation of why the FDA went in that direction.
- He makes many statements of fact without providing any evidence to back them up, e.g., life expectancy is going down in recent years. (It is in the US but what about other countries?) There are many other statements that should be referenced so that we can rely on more than just Healy’s assertions.
- He rightly points out that RCTs are bad for recognizing side effects of drugs but says comparatively little about the use of RCTs in identifying benefits.
- His solution for how we should decide whether to allow a new drug on the market seems to be to let experienced clinicians evaluate the drug. But he doesn’t answer important questions: who chooses the experienced clinicians, what are their qualifications, how many patients should each of them look at, how many clinicians for each new treatment? Moreover, if patients’ responses can be heterogeneous can’t clinicians’ evaluations also be heterogeneous? How do we reconcile their differences?
- Healy says that focusing on a primary endpoint may make us lose sight of the overall effect of a drug, and that’s certainly a valid point. However, the seatbelt equivalent of this statement would be: if we test seatbelts to see whether they save lives and find that they do, but don’t look for their other effects, then we shouldn’t use them, because we weren’t looking for the negative effects they may have, e.g., causing serious chest and abdominal injuries.
- Healy says that it now takes decades to recognize side effects from drugs but using Health Canada’s safety warning letters, an admittedly flawed database, these are sent out, on average, about 3 years after a drug is marketed although it can be substantially shorter or longer.
- Healy says that drugs should be assessed by “collective evaluation” without defining the term. How does it work in practice, especially when the treatment effect is relatively small? Small treatment effects may be largely irrelevant if a drug reduces the time someone is symptomatic with the common cold from 7 days to 6.7 days, but a 3.5 percentage point reduction in mortality from a myocardial infarction, from 8% to 4.5%, is much more meaningful.
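Lexchin’s contrast can be put into the standard arithmetic of absolute risk reduction (ARR), relative risk reduction (RRR) and number needed to treat (NNT), using only the figures he gives:

```python
# Lexchin's two examples in numbers: a cold shortened from 7 to 6.7 days
# of symptoms, versus MI mortality reduced from 8% to 4.5%.
cold_saving = 7.0 - 6.7                 # 0.3 of a day saved: trivial
mi_arr = 0.080 - 0.045                  # absolute risk reduction: 3.5 points
mi_rrr = mi_arr / 0.080                 # relative risk reduction: ~44%
mi_nnt = 1 / mi_arr                     # ~29 patients treated per death avoided
print(f"cold: {cold_saving:.1f} days saved")
print(f"MI: ARR = {mi_arr:.3f}, RRR = {mi_rrr:.0%}, NNT = {mi_nnt:.0f}")
```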
- Although Healy downplays, but does not dismiss, the role of RCTs, he also appears to rely on RCTs for some of his points, e.g., the RCT of imipramine for melancholia.
- Some claims require concrete examples and more information. For example, what drug can cause a rise in heart rate in some people and a drop in others and what proportion of people have a rise versus what proportion have a drop?
None of these comments should be seen as saying that Healy’s points are or are not valid; rather, they are asking for a fuller presentation in order to be able to form a better judgment about what he is saying.
Joel’s further Comments in Text
John McMillan
Editor Journal of Medical Ethics
My apologies for taking a while to get back to you about this; I had a heavy teaching load this semester, and that, along with JME duties, meant a few things have sat in my inbox for a while.
I suspect what you say is true and that we have come to see the value of the RCT in a myopic way. I found the discussion of how RCTs came about interesting, and it made me think of Hempel’s discussion of Semmelweis and how explanations could be scrutinized by testing hypothetical explanations under new conditions where they should obtain. It’s quite a different mode of reasoning from that encouraged by EBM and placing RCTs at the top of the hierarchy of evidence.
If you’re thinking about a journal to place this in, it might be worth considering one that is more philosophy of medicine than medical ethics. The Journal of Medicine and Philosophy has been impressive since Mark Cherry took over as editor, I wonder about trying there.
All best wishes, John
Barbara Mintzes
Barbara has worked as a health policy analyst in T.I. in Vancouver, HAI in Holland and now Sydney. An interview with her will be on Samizdat.org soon
Hi David,
Thanks – interested to read the next draft. I would see the solution not as doing away with RCTs but instead taking the design, conduct and analysis completely out of the manufacturers’ hands, and also explicitly designing them to look at both benefit and harm. We all know drugs cause more than one effect in the body.
There’s an inherent contradiction between the safety concerns about untrammelled exposures to untested substances underlying the requirement for pre-market trials and the regulators’ continued acceptance of trial designs with only benefits identified as primary outcome measures. And to rely on companies both for pre-market and post-market studies is completely crazy.
Barbara
Florian Naudet
Psychiatrist, Professor in Therapeutics University of Rennes with links to Stanford Centre for EBM
Florian’s Comments in Text
Anne Springer
Professor of Nursing, University of Saskatchewan
Anne’s Comments in Text
Jacob Stegenga
Philosopher Cambridge, Author Medical Nihilism
Dear David,
I’m very sorry for how long it’s taken for me to get you feedback. It’s been an unusual few weeks for me. I found your paper convincing — the general message is surely right and one that some philosophers of science and medicine have been urging for a while. My comments below are mostly fine-grained questions or possible points for expansion, and don’t cause any general difficulties for your paper.
Thanks for sharing it with me, it’s interesting for me to see how you are thinking about these questions.
Best,
Jacob
“Hill’s randomization was a method for fair allocation, not a means of controlling for the unknowns linked to doctors not knowing what they were doing”
I am not sure exactly what you mean here. As you know, there is an active debate among statisticians and philosophers of science about this very point. Some people claim that randomisation in fact does control for the distribution of both known and unknown confounding factors. Other people deny this. The debate is technical and subtle. Anyway, you proceed to argue that trials don’t always give the same results, and conclude that there is a lot we don’t know. I’m not sure how the latter point about not knowing a lot is supposed to relate to the issue about randomisation. Wouldn’t a defender of RCTs say: exactly, given that there is so much we don’t know about what factors are causally related to measured outcomes in trials, we must randomise subject allocation, precisely to try to even out the distribution of unknown causally relevant factors?
“In medical RCTs, a focus on a primary endpoint is key to ensuring that only chance or measurement error will get in the way of the correct result. Ipso facto, this means RCTs are not a good way to evaluate a medicine.”
The argument here strikes me as pretty quick. What is the argument? Is the idea that medicines have many effects, and so focusing on a primary endpoint amounts to gathering evidence on only a small range of a medicine’s effects? Or? The comments in the following paragraphs suggest this argument. It is surely compelling. Of course, trials can measure many endpoints, but in general this argument is sound, especially with respect to unintended harms of an intervention.
On this point, you use the term ’safety effect’ – the use of the term ’safety’ in the context of discussing harms of medicines has struck me as an Orwellian abuse of language (I make this point in Medical Nihilism).
“In any trial where both condition and treatment cause superficially similar problems, as when antidepressants and depression cause suicidality or bisphosphonates and osteoporosis both lead to fractures, a dependence on RCT data rather than clinical judgement risks misleading.”
Yes, this point has been pretty well established in the philosophy of science community. The very general way that a philosopher might make the point is that probabilistic relations (as given by RCTs) underdetermine causal relations (which is of course what we want to know when making inferences about real effectiveness).
“Every medicine that gets on the market, by definition, beats placebo (often inconsistently). As a result, it has become unethical to use placebos in clinical practice, when for those for whom it works a placebo may be preferable to therapeutic poisoning.”
Important point and I of course agree. Although it might be worth noting that the medicine ‘beating’ placebo might be merely ‘putative beating’, because as you know allocation concealment can be broken more often in drug group than placebo group, and thus the placebo effect is merely stronger in the drug group than in the placebo group (this is Gotzsche’s claim about the putative positive effect seen in antidepressant trials).
“Finally, the suicidality, sexual dysfunction, agitation, and insomnia antidepressants cause in clinical trials are commonly folded into a primary endpoint, the Hamilton Depression Rating Scale (HDRS), which includes questions on suicidality, sexuality, sleep and agitation. These changes render confidence intervals around scores on these items meaningless, compromise the use of the scale more generally, and risk hiding a benefit.”
Also in Medical Nihilism I raise various complaints against HDRS, along these lines.
“Through to 1991, clinical knowledge of the range of effects drugs can cause derived primarily from clinical experience, embodied in case reports and published in clinical journals.”
In my response to your commentary last year, I made the argument for why we ought to be suspicious, at least usually, not always, of such reasoning—at least when it comes to inferences about drug benefits … I find your argument here more persuasive in the context of inferences about harms, which is what you are discussing here. And anyway, later you rightly claim that your point is to urge ‘collective evaluation’ rather than individual evaluation — this sounds compelling.
Johanna says
I can’t say I’ve studied ALL these responses — that’ll take some time! But a couple of points in Dr. Barbui’s response (the first one) struck me as classic Defensive Medicine — defending the medical-industrial complex, that is. And being unable to imagine a different path:
First, he believes that “nobody in the world” thinks decision-making for real-world patients should be steered by the results of the latest trials? I’d like him to meet some doctors over here. They have been willing to turn on a dime and transfer millions of people from “normal” to “high-risk” on the strength of some rather dodgy clinical trials, especially when it comes to blood pressure and blood sugar standards. I’m just one person, but I already know three people who’ve been rushed to the hospital with sudden hypoglycemia as a result of shiny new meds for Type II diabetes, ushered in with strict new blood-sugar limits.
Some of their doctors may be fools — but others may have no choice. Their marching orders are given to them by their hospitals, their private practices or the insurance company.
I’m also struck that he can see no alternative to the current dependence on flawed clinical trials, except a return to the Bad Old Days when every doctor relied on their own habits and customs, and their resulting skewed view of what “experience” had taught them. It’s not as though you want to ban RCTs. But even if you did, he acts as though that would mean banning Research Itself! Of course it doesn’t.
There are lots of other forms of research – including research that looks to find the basis for the responses that we see in the natural world. Drug companies benefit from that research but they don’t do a lot of it. They concentrate on research that results in Product Development. Maybe one reason RCTs have become so over-valued is that they’re so well suited to Product Development.
The current Covid-19 crisis shows how well that is serving us (NOT). There’s so much we don’t know yet about vulnerabilities to the virus, how it spreads, who is most infectious, why it’s so benign in some people and deadly in others … Yet billions have been wasted on RCTs of every biologic drug in the book that some drug company wants to squeeze another billion out of, some of them with fairly extreme side effects. Not to mention the billions we’ll now spend on Remdesivir (in the US at least), which required huge trials to demonstrate an itsy-bitsy advantage.
Imagine how far along we might be if we’d put the billions elsewhere. Including the scientific study of adverse effects, which may answer all kinds of good questions we just hadn’t thought to ask yet.
susanne says
Fawlty Towers or Faulty Towers – which need rebuilding?
The comments are such an interesting and tough read, am still reading in chunks. I imagine most are more detailed than is always the case, which proves there is a serious interest. Something we/everybody couldn’t access any other way. When it clicked, I realised this is a totally new way of dealing with peer review. It is more democratic, more open, it is free. There is a huge array of feedback, as opposed to having to satisfy a few reviewers chosen by journals or other editors. The reviewers/commentators were invited from a huge pool, giving more useful feedback. In most(?) publications which invite comments, even these are ‘edited’. Could this kind of publishing be extended to become a journal? I imagine calling it a Samizdat Journal would frighten the horses; some contributors and commentators would be put off, or start contributing anonymously. All the reviewers/commentators here were happy to put their names and organisations. There could still be contributions made to what could become the old fuddy-duddy journals, or they may even be encouraged to copy a new, more relevant style, more accessible to everyone as well as experts and practitioners. In baby steps maybe. Normally interested outsiders have to rely on finding a publication, then going through masses of references; wonder how many of them get read even by those working on the subject?
Hope we get more like The Fault Lies in Our Stars from DH, RxISK and others who might be influenced to have a go.
annie says
And then something unexpected happened. The Randomized Controlled Trials became the gold standard for everything – called Evidence Based Medicine. Randomized Clinical Trials are hardly the only form of valid evidence in medicine. That was a reform idea that kept people from shooting from the hip, but was also capable of throwing the baby out with the bathwater.
It’s an abnormal situation on purpose, suitable for the yes-no questions in approval, but not the for-whom information of clinical experience.
The commercial strangle-hold…
http://1boringoldman.com/index.php/2017/01/22/the-commercial-strangle-hold/
Structured RCTs may well be the best method for our regulatory agencies to use in evaluating new drugs. They cost a mint to do, and about the only people who can fund them are the companies who can capitalize on success – the drug companies. But medicine doesn’t need to, and shouldn’t, buy into the notion that they’re the only way to evaluate the effectiveness of medicinal products. As modern medicine has become increasingly organized and documented, there are huge caches of data available. And it’s not just patient data or clinic data.
What about the pharmacy data that’s already being used by PHARMA to track physicians’ prescribing patterns? And where are the departments of pharmacology and the schools of pharmacy in following medication efficacy and safety? Or the HMOs? Or the Health Plans? The VAH? What about the waiting room questionnaires? I’d much rather they ask about the medications the patient is on than be used to screen for depression. It’s really the ongoing data after a drug is in use that clinicians need anyway – more important than the RCT that gets things started.
So while it’s important to continue the push for data transparency and clinical trial reporting reform, it’s also time to explore other ways of gathering and evaluating the mass of information that might free us from the commercial strangle-hold we live with now – and potentially give us an even better picture of what our medications are doing over time.
There’s a way out of this conundrum. The task is to find it…