Editorial Note: Following the publication of Study 329 Restored, Martin Keller and his co-authors gave an initial response in which they said Nobody Pinned Anything on Us and also that they would respond in more detail later. It took four months for this response to appear. Our Response will follow in the next post and after that we will update our background correspondence with BMJ.
This and earlier correspondence, all the data from Study 329, BMJ reviews of the study, and a timeline of the history of SSRIs and the controversies linked to these drugs are all available on Study329.org.
There are two further Study 329 articles, making Study 329 the most intensively studied clinical trial ever. The Study 329 website will make all this material available for anyone who wants to see how clinical trials operate. This study is not an anomaly; it is standard industry practice.
The picture above was dubbed Three Amigos by its creator. It features Charlie Nemeroff, Marty Keller and Alan Schatzberg. A picture that is at least as appropriate is below linked to the conflict of interest declaration.
BMJ 2015; 351 doi: http://dx.doi.org/10.1136/bmj.h4320 (Published 16 September 2015) Cite this as: BMJ 2015;351:h4320
18 January 2016
Martin B Keller, Boris Birmaher, M.D., Gabrielle A. Carlson, MD, Gregory N. Clarke, Ph.D., Graham J. Emslie, M.D., Harold Koplewicz, M.D., Stan Kutcher, M.D., Neal Ryan, M.D., William H. Sack, M.D., Michael Strober, Ph.D.
attn: Martin B Keller, MD, 700 Butler Drive, Blumer 120, Providence, RI 02906, USA
The BMJ article entitled “Restoring Study 329: efficacy and harms of paroxetine and imipramine in treatment of major depression in adolescence” reanalyzed data from the original Paroxetine 329 study, a double-blind placebo controlled comparison of paroxetine to imipramine. Paroxetine 329 was designed between 1991 and 1992. Subject enrollment began in 1994, and was completed in 1997. Academic psychiatrists designed the study, with very little change by GSK, which funded the study in an academic / industry partnership. The goal of the study was to advance the treatment of depression in youth; it was not primarily a drug registration trial.
Overarching issues with “Restoring Study 329” include:
Antidepressants considered as a group are superior to placebo for the treatment of anxiety disorders and for depression in adolescents, with similar overall response rates in anxiety and depression. 
The two primary outcome measures in Paroxetine 329 did not reach statistical significance. The abstract of the published paper noted (1): “The two primary outcome measures were endpoint response (Hamilton Rating Scale for Depression [Ham-D] score ≤ 8 or ≥50% reduction in baseline HAM-D) and change from baseline HAM-D score.” In Table 2, the p value for the first primary endpoint (Ham-D score ≤ 8 or ≥50% reduction in baseline HAM-D) was reported at p < 0.11 for paroxetine versus placebo. In the same table, the p value for change in HAM-D total score is reported at p < 0.13. While both outcomes were in the direction of a better response for paroxetine over placebo, neither reached our critical alpha level of 0.05. This is clear in the abstract and text of the publication.
In the interval from when we planned the study to when we approached the data analysis phase, but prior to the blind being broken, the academic authors, not the sponsor, added several additional measures of depression as secondary outcomes. We did so because the field of pediatric-age depression had reached a consensus that the Hamilton Depression Rating Scale (our primary outcome measure) had significant limitations in assessing mood disturbance in younger patients. Taking this into consideration, and in advance of breaking the blind, we added secondary outcome measures agreed upon by all authors of the paper. We found statistically significant indications of efficacy in these measures. These secondary outcomes were clearly reported as separate from the negative primary outcomes.
Thus, the authors of “BMJ-Restoring Study 329” were incorrect in stating that “Both before and after breaking the blind, however, the sponsors made changes to the secondary outcomes as previously detailed. We could not find any document that provided any scientific rationale for these post hoc changes and the outcomes are therefore not reported in this paper.” Rather, secondary outcomes were decided by the authors prior to the blind being broken. Secondary outcome measures are frequently, and appropriately, included in study reports even when the primary measures do not reach statistical significance. The authors of “Restoring Study 329” state “there were no discrepancies between any of our analyses and those contained in the CSR [clinical study report]”. The disagreement on treatment outcomes rests on this arbitrary and non-blind dismissal of our secondary outcome measures.
In the abstract we stated “Conclusions: Paroxetine is generally well tolerated and effective for major depression in adolescents.” In this sample and with the state of knowledge at the time, this conclusion was justified and appropriate.
Our goal was to learn as much as possible about the use of this compound in youth, so that we could understand what role it (and by extension other SSRIs) could play in the treatment of adolescents with MDD. For us the question was: given (1) the data distribution and statistical results for efficacy and side effects that we saw in the study, and (2) that there was well replicated research evidence of efficacy and relative safety of paroxetine in adults, what conclusions should be drawn from our data? The clinical outcomes were substantially in the right direction, with a number of them reaching the 0.05 level of statistical significance. The clinical results comparing paroxetine to placebo and imipramine to placebo paralleled those reported in adults. Side effects were similar to what was known in adults. Thus we reached the conclusions reported.
The “Restoring Study 329” reanalysis uses the FDA MedDRA approach to side effect data, which was not available when our study was done. That one can do better reanalyzing adverse event data using refinements in approach that have accrued in the 15 years since a study’s publication is unsurprising and not a valid critique of the Paroxetine 329 study as performed and presented.
We emphatically disagree with the “Restoring Study 329” position that statistics are not useful in understanding adverse side effects and that each individual reader should decide for herself when a difference in rates of adverse side effects is meaningful. Statistics offer several approaches to the question of when there is a meaningful difference in the side effect rates between different treatments.
Specific methodology problems in the reanalysis of the “harm” data are as follows: 1) the authors chose a non-random subsample of 85 subjects who were withdrawn from the study plus 8 subjects whom the authors labeled “suicidal” based on their inspection of the data; 2) a different instrument was utilized to re-score the harm effects and only one of the authors was trained in the scoring of the instrument; 3) some side effects were arbitrarily interpreted (e.g., upper respiratory symptoms were labeled as “dystonia” and emotional lability labeled as “suicidality”); 4) in the original paper, side effects were analyzed only during the acute phase, but in the reanalysis, the authors analyzed them during the acute phase, as well as the tapering and follow up phases of the study; 5) in the original study patients were interviewed face-to-face whereas the reanalysis was based only on the interpretation of the data; and 6) importantly, the two authors were not blind to patients’ randomization status.
Our field’s understanding of how to approach analysis of suicidal ideation, suicide attempts, and completed suicide has advanced enormously since publication of Study 329. Two definitive reanalyses of suicidality with antidepressants in adolescents are: 1) the 2003 FDA reanalysis of all RCT data of SSRI studies in youth for all indications. In the FDA analysis the average risk ratio for SSRI versus placebo treated subjects was 1.96 (CI: 1.28-2.98). Considered separately, Study 329 did not reach statistical significance for increased suicidality (CI: 0.42-33.21); and 2) the methodologically superior reanalysis by Bridge and colleagues, which also found that in Study 329 there was no significant risk difference between paroxetine and placebo.
Paroxetine treatment in youth does not appear to differ significantly from other SSRIs in the risk of suicidal ideation or attempts, and whether SSRIs increase or decrease completed suicide remains an open question. [8-12]
We strongly support efforts to make anonymized raw data from scientific studies available for reanalysis. The validity of “Restoring 329”, however, is doubtful because of author bias and substantial problems with RIAT methodology. To describe Paroxetine 329 as “misreported” is pejorative and wrong based on both state-of-the-art research methods 24 years ago, and retrospectively from the standpoint of current best practices.
Martin B. Keller, M.D. Boris Birmaher, M.D. Gabrielle A. Carlson, MD Gregory N. Clarke, Ph.D. Graham J. Emslie, M.D. Harold Koplewicz, M.D. Stan Kutcher, M.D. Neal Ryan, M.D. William H. Sack, M.D. Michael Strober, Ph.D.
This picture comes from Carl Eliot, based on Johanna Ryan’s noticing a curious phrase in the conflict of interest statement from Gabrielle Carlson below.
Competing interests: Please see attachments to this response – the attachments are available on Study329.org.