Tuesday, July 31, 2012

Clouding the Debate on Clinical Trials: Pediatric Edition

I would like to propose a rule for clinical trial benchmarks. This rule may appear so blindingly obvious that I run the risk of seeming simple-minded and naïve for even bringing it up.

The rule is this: if you’re going to introduce a benchmark for clinical trial design or conduct, explain its value.

Are we not putting enough resources into pediatric research, or have we over-incentivized risky experimentation on a vulnerable population?  This is a critically important question in desperate need of more data and thoughtful analysis.
That’s it.  Just a paragraph explaining the rationale of why you’ve chosen to measure what you’re measuring.  Extra credit if you compare it to other benchmarks you could have used, or consider the limitations of your new metric.

I would feel bad for bringing this up, were it not for two recent articles in major publications that completely fail to live up to this standard. I’ll cover one today and one tomorrow.

The first is a recent article in Pediatrics, Pediatric Versus Adult Drug Trials for Conditions With High Pediatric Disease Burden, which has received a fair bit of attention in the industry -- mostly due to Reuters uncritically recycling the authors’ press release

It’s worth noting that the claim made in the release title, "Drug safety and efficacy in children is rarely addressed in drug trials for major diseases", is not at all supported by any data in the study itself. However, I suppose I can live with misleading PR.  What is frustrating is the inadequacy of the measures the authors use in the actual study, and the complete lack of discussion about them.

To benchmark where pediatric drug research should be, they use the proportion of total "burden of disease" borne by children.   Using WHO estimates, they look at the ratio of burden (measured, essentially, in years of total disability) between children and adults.  This burden is further divided into high-income countries and low/middle-income countries.

This has some surface plausibility, but presents a host of issues.  Simply looking at the relative prevalence of a condition does not really give us any insights into what we need to study about treatment.  For example: number 2 on the list for middle/low income diseases is diarrheal illness, where WHO lists the burden of disease as 90% pediatric.  There is no question that diarrheal diseases take a terrible toll on children in developing countries.  We absolutely need to focus resources on improving prevention and treatment: what we do not particularly need is more clinical trials.  As the very first bullet on the WHO fact sheet points out, diarrheal diseases are preventable and treatable.  Prevention is mostly about improving the quality of water and food supplies – this is vitally important stuff, but it has nothing to do with pharmaceutical R&D.

In the US, the NIH’s National Institute for Child Health and Human Development (NICHD) has a rigorous process for identifying and prioritizing needs for pediatric drug development, as mandated by the BPCA.  It is worth noting that only 2 of the top 5 diseases in the Pediatrics article make the cut among the 41 highest-priority areas in the NICHD’s list for 2011.

(I don’t even think the numbers as calculated by the authors are even convincing on their own terms:  3 of the 5 "high burden" diseases in wealthy countries – bipolar, depression, and schizophrenia – are extremely rare in very young children, and only make this list because of their increasing incidence in adolescence.  If our objective is to focus on how these drugs may work differently in developing children, then why wouldn’t we put greater emphasis on the youngest cohorts?)

Of course, just because a new benchmark is at odds with other benchmarks doesn’t necessarily mean that it’s wrong.  But it does mean that the benchmark requires some rigorous vetting before its used.  The authors make no attempt at explaining why we should use their metric, except to say it’s "apt". The only support provided is a pair of footnotes – one of those, ironically, is to this article from 1999 that contains a direct warning against their approach:
Our data demonstrate how policy makers could be misled by using a single measure of the burden of disease, because the ranking of diseases according to their burden varies with the different measures used.
If we’re going to make any progress in solving the problems in drug development – and I think we have a number of problems that need solving – we have got to start raising our standards for our own metrics.

Are we not putting enough resources into pediatric research, or have we over-incentivized risky experimentation on a vulnerable population? This is a critically important question in desperate need of more data and thoughtful analysis. Unfortunately, this study adds more noise than insight to the debate.

Tomorrow In a couple weeks, I’ll cover the allegations about too many trials being too small. [Update: "tomorrow" took a little longer than expected. Follow up post is here.]

[Note: the Pediatrics article also uses another metric, "Percentage of Trials that Are Pediatric", that is used as a proxy for amount of research effort being done.  For space reasons, I’m not going to go into that one, but it’s every bit as unhelpful as the pediatric burden metric.]

ResearchBlogging.org Bourgeois FT, Murthy S, Pinto C, Olson KL, Ioannidis JP, & Mandl KD (2012). Pediatric Versus Adult Drug Trials for Conditions With High Pediatric Disease Burden. Pediatrics PMID: 22826574

Tuesday, July 24, 2012

How Not to Report Clinical Trial Data: a Clear Example

I know it’s not even August yet, but I think we can close the nominations for "Worst Trial Metric of the Year".  The hands-down winner is Pharmalot, for the thoughtless publication of this article reviewing "Deaths During Clinical Trials" per year in India.  We’ll call it the Pharmalot Death Count, or PDC, and its easy to explain – it's just the total number of patients who died while enrolled in any clinical trial, regardless of cause, and reported as though it were an actual meaningful number.

(To make this even more execrable, Pharmalot actually calls this "Deaths attributed to clinical trials" in his opening sentence, although the actual data has exactly nothing to do with the attribution of the death.)

In fairness, Pharmalot is really only sharing the honors with a group of sensationalistic journalists in India who have jumped on these numbers.  But it has a much wider readership within the research community, and could have at least attempted to critically assess the data before repeating it (along with criticism from "experts").

The number of things wrong with this metric is a bit overwhelming.  I’m not even sure where to start.  Some of the obvious issues here:

1. No separation of trial-related versus non-trial-related.  Some effort is made to explain that there may be difficulty in determining whether a particular death was related to the study drug or not.  However, that obscures the fact that the PDC lumps together all deaths, whether they took an experimental medication or not. That means the PDC includes:
  • Patients in control arms receiving standard of care and/or placebo, who died during the course of their trial.
  • Patients whose deaths were entirely unrelated to their illness (eg, automobile accident victims)
2. No base rates.  When a raw death total is presented, a number of obvious questions should come to mind:  how many patients were in the trials?  How many deaths were there in patients with similar diseases who were not in trials?  The PDC doesn’t care about that kind of context

3. No sensitivity to trial design.  Many late-stage cancer clinical trials use Overall Survival (OS) as their primary endpoint – patients are literally in the trial until they die.  This isn’t considered unethical; it’s considered the gold standard of evidence in oncology.  If we ran shorter, less thorough trials, we could greatly reduce the PDC – would that be good for anyone?

Case Study: Zelboraf
FDA: "Highly effective, more personalized therapy"
PDC: "199 deaths attributed to Zelboraf trial!"
There is a fair body of evidence that participants in clinical trials fare about the same as (or possibly a bit better than) similar patients receiving standard of care therapy.  However, much of that evidence was accumulated in western countries: it is a fair question to ask if patients in India and other countries receive a similar benefit.  The PDC, however, adds nothing to our ability to answer that question.

So, for publicizing a metric that has zero utility, and using it to cast aspersions on the ethics of researchers, we congratulate Pharmalot and the PDC.

Thursday, July 19, 2012

Measuring Quality: Probably Not Easy

I am a bit delayed getting my latest post up.  I am writing up some thoughts on this recentstudy put out by ARCO, which suggests that the level of quality in clinical trials does not vary significantly across global regions.

The study has gotten some attention through ARCO’s press release (an interesting range of reactions: the PharmaTimes headline declares “Developingcountries up to scratch on trial data quality”, while Pharmalot’s headline, “WhatProblem With Emerging Markets Trial Data?”, betrays perhaps a touch more skepticism). 

And it’s a very worthwhile topic: much of the difficultly, unfortunately, revolves around agreeing on what we consider adequate metrics for data quality.  The study only really looks at one metric (query rates), but does an admirably job of trying to view that metric in a number of different ways.  (I wrote about another metric – protocol deviations – in a previous post on the relation of quality to site enrollment performance.)

I have run into some issues parsing the study results, however, and have a question in to the lead author.  I’ll withhold further comment until I head back and have had a chance to digest a bit more.

Sunday, July 15, 2012

Site Enrollment Performance: A Better View

Pretty much everyone involved in patient recruitment for clinical trials seems to agree that "metrics" are, in some general sense, really really important. The state of the industry, however, is a bit dismal, with very little evidence of effort to communicate data clearly and effectively. Today I’ll focus on the Site Enrollment histogram, a tried-but-not-very-true standby in every trial.

Consider this graphic, showing enrolled patients at each site. It came through on a weekly "Site Newsletter" for a trial I was working on:

I chose this histogram not because it’s particularly bad, but because it’s supremely typical. Don’t get me wrong ... it’s really bad, but the important thing here is that it looks pretty much exactly like every site enrollment histogram in every study I’ve ever worked on.

This is a wasted opportunity. Whether we look at per-site enrollment with internal teams to develop enrollment support plans, or share this data with our sites to inform and motivate them, a good chart is one of the best tools we have. To illustrate this, let’s look at a few examples of better ways to look at the data.

If you really must do a static site histogram, make it as clear and meaningful as possible. 

This chart improves on the standard histogram in a few important ways:

Stateful histo - click to enlarge

  1.  It looks better. This is not a minor point when part of our work is to engage sites and makes them feel like they are part of something important. Actually, this graph is made clearer and more appealing mostly by the removal of useless attributes (extraneous whitespace, background colors, and unhelpful labels).
  2. It adds patient disposition information. Many graphs – like the one at the beginning of this post – are vague about who is being counted. Does "enrolled" include patients currently being screened, or just those randomized? Interpretations will vary from reader to reader. Instead, this chart makes patient status an explicit variable, without adding to the complexity of the presentation. It also provides a bit of information about recent performance, by showing patients who have been consented but not yet fully screened.
  3. It ranks sites by their total contribution to the study, not by the letters in the investigator’s name. And that is one of the main reasons we like to share this information with our sites in the first place.
Find Opportunities for Alternate Visualizations
There are many other ways in which essentially the same data can be re-sliced or restructured to underscore particular trends or messages. Here are two that I look at frequently, and often find worth sharing.

Then versus Now

Tornado chart - click to enlarge

This tornado chart is an excellent way of showing site-level enrollment trajectory, with each sites prior (left) and subsequent (right) contributions separated out. This example spotlights activity over the past month, but for slower trials a larger timescale may be more appropriate. Also, how the data is sorted can be critical in the communication: this could have been ranked by total enrollment, but instead sorts first on most-recent screening, clearly showing who’s picked up, who’s dropped off, and who’s remained constant (both good and bad).

This is especially useful when looking at a major event (e.g., pre/post protocol amendment), or where enrollment is expected to have natural fluctuations (e.g., in seasonal conditions).

Net Patient Contribution

In many trials, site activation occurs in a more or less "rolling" fashion, with many sites not starting until later in the enrollment period. This makes simple enrollment histograms downright misleading, as they fail to differentiate sites by the length of time they’ve actually been able to enroll. Reporting enrollment rates (patients per site per month) is one straightforward way of compensating for this, but it has the unfortunate effect of showing extreme (and, most importantly, non-predictive), variance for sites that have not been enrolling for very long.

As a result, I prefer to measure each site in terms of its net contribution to enrollment, compared to what it was expected to do over the time it was open:
Net pt contribution - click to enlarge

To clarify this, consider an example: A study expects sites to screen 1 patient per month. Both Site A and Site B have failed to screen a single patient so far, but Site A has been active for 6 months, whereas Site B has only been active 1 month.

On an enrollment histogram, both sites would show up as tied at 0. However, Site A’s 0 is a lot more problematic – and predictive of future performance – than Site B’s 0. If I compare them to benchmark, then I show how many total screenings each site is below the study’s expectation: Site A is at -6, and Site B is only -1, a much clearer representation of current performance.

This graphic has the added advantage of showing how the study as a whole is doing. Comparing the total volume of positive to negative bars gives the viewer an immediate visceral sense of whether the study is above or below expectations.

The above are just 3 examples – there is a lot more that can be done with this data. What is most important is that we first stop and think about what we’re trying to communicate, and then design clear, informative, and attractive graphics to help us do that.

Friday, July 13, 2012

Friday Flailing: Medical Gasses and the Law of the Excluded Middle

Aristotle never actually said "principium tertii
, mostly because he didn't speak Latin.
under the mass of attention and conversations surrounding the ACA last week, the FDA Safety and Innovation Act (FDASIA) contained a number of provisions that will have lasting effects in the pharmaceutical industry.  What little notice it did get tended to focus on PDUFA reauthorization and the establishment of fees for new generic drug applications (GDUFA).

(Tangent: other parts of the act are well worth looking into: NORD is happy about new incentives for rare disease research, the SWHR is happy about expanded reporting on sex and race in clinical trials, and antibiotic drug makers are happy about extra periods of market exclusivity.  A very good summary is available on the FDA Law Blog.)

So no one’s paid any attention to the Medical Gasses Safety Act, which formally defines medical gasses and even gives them their own Advisory Committee and user fees (I guess those will be MGUFAs?)

The Act’s opening definition is a bit of an eyebrow-raiser:
(2) The term ‘medical gas’ means a drug that is--
‘(A) manufactured or stored in a liquefied, non-liquefied, or cryogenic state; and
‘(B) is administered as a gas.
I’m clearly missing something here, because as far as I can tell, everything is either liquefied or non-liquefied.   This doesn’t seem to lend a lot of clarity to the definition.  And then, what to make of the third option?  How can there be a third option?  It’s been years since my college logic class, but I still remember the Law of the Excluded Middle – everything is either P or not-P. 

I was going to send an inquiry through to Congressman Leonard Lance (R-NJ), the bill’s original author, but his website regrets to inform me that he is “unable to reply to any email from constituents outside of the district.”

So I will remain trapped in Logical Limbo.  Enjoy your weekend.

Tuesday, July 10, 2012

Why Study Anything When You Already Know Everything?

If you’re a human being, in possession of one working, standard-issue human brain (and, for the remainder of this post, I’m going to assume you are), it is inevitable that you will fall victim to a wide variety of cognitive biases and mistakes.  Many of these biases result in our feeling much more certain about our knowledge of the world than we have any rational grounds for: from the Availability Heuristic, to the Dunning-Kruger Effect, to Confirmation Bias, there is an increasingly-well-documented system of ways in which we (and yes, that even includes you) become overconfident in our own judgment.

Over the years, scientists have developed a number of tools to help us overcome these biases in order to better understand the world.  In the biological sciences, one of our best tools is the randomized controlled trial (RCT).  In fact, randomization helps minimize biases so well that randomized trials have been suggested as a means of developing better governmental policy.

However, RCTs in general require an investment of time and money, and they need to be somewhat narrowly tailored.  As a result, they frequently become the target of people impatient with the process – especially those who perhaps feel themselves exempt from some of the above biases.

A shining example of this impatience-fortified-by-hubris can be
4 out of 5 Hammer Doctors agree:
the world is 98% nail.
found in a recent “Speaking of Medicine” blog post by Dr Trish Greenhalgh, with the mildly chilling title Less Research is Needed.  In it, the author finds a long list of things she feels to be so obvious that additional studies into them would be frivolous.  Among the things the author knows, beyond a doubt, is that patient education does not work, and electronic medical records are inefficient and unhelpful. 

I admit to being slightly in awe of Dr Greenhalgh’s omniscience in these matters. 

In addition to her “we already know the answer to this” argument, she also mixes in a completely different argument, which is more along the lines of “we’ll never know the answer to this”.  Of course, the upshot of that is identical: why bother conducting studies?  For this argument, she cites the example of coronary artery disease: since a large genomic study found only a small association with CAD heritability, Dr Greenhalgh tells us that any studies of different predictive methods is bound to fail and thus not worth the effort (she specifically mentions “genetic, epigenetic, transcriptomic, proteomic, metabolic and intermediate outcome variables” as things she apparently already knows will not add anything to our understanding of CAD). 

As studies grow more global, and as we adapt to massive increases in computer storage and processing ability, I believe we will see an increase in this type of backlash.  And while physicians can generally be relied on to be at the forefront of the demand for more, not less, evidence, it is quite possible that a vocal minority of physicians will adopt this kind of strongly anti-research stance.  Dr Greenhalgh suggests that she is on the side of “thinking” when she opposes studies, but it is difficult to see this as anything more than an attempt to shut down critical inquiry in favor of deference to experts who are presumed to be fully-informed and bias-free. 

It is worthwhile for those of us engaged in trying to understand the world to be aware of these kinds of threats, and to take them seriously.  Dr Greenhalgh writes glowingly of a 10-year moratorium on research – presumably, we will all simply rely on her expertise to answer our important clinical questions.

Friday, July 6, 2012

A placebo control is not a placebo effect

Following up on yesterday's post regarding a study of placebo-related information, it seems worthwhile to pause and expand on the difference between placebo controls and placebo effects.

The very first sentence of the study paper reflects a common, and rather muddled, belief about placebo-controlled trials:
Placebo groups are used in trials to control for placebo effects, i.e. those changes in a person's health status that result from the meaning and hope the person attributes to a procedure or event in a health care setting.
The best I can say about the above sentence is that in some (not all) trials, this accounts for some (not all) of the rationale for including a placebo group in the study design. 

There is no evidence that “meaning and hope” have any impact on HbA1C levels in patients with diabetes. The placebo effect only goes so far, and certainly doesn’t have much sway over most lab tests.  And yet we still conduct placebo-controlled trials in diabetes, and rightly so. 

To clarify, it may be helpful to break this into two parts:
  1. Most trials need a “No Treatment” arm. 
  2. Most “No Treatment” arms should be double-blind, which requires use of a placebo.
Let’s take these in order.

We need a “No Treatment” arm:
  • Where the natural progression of the disease is variable (e.g., many psychological disorders, such as depression, have ups and downs that are unrelated to treatment).  This is important if we want to measure the proportion of responders – for example, what percentage of diabetes patients got their HbA1C levels below 6.5% on a particular regimen.  We know that some patients will hit that target even without additional intervention, but we won’t know how many unless we include a control group.
  • Where the disease is self-limiting.  Given time, many conditions – the flu, allergies, etc. – tend to go away on their own.  Therefore, even an ineffective medication will look like it’s doing something if we simply test it on its own.  We need a control group to measure whether the investigational medication is actually speeding up the time to cure.
  • When we are testing the combination of an investigational medication with one or more existing therapies. We have a general sense of how well metformin will work in T2D patients, but the effect will vary from trial to trial.  So if I want to see how well my experimental therapy works when added to metformin, I’ll need a metformin-plus-placebo control arm to be able to measure the additional benefit, if any.

All of the above are especially important when the trial is selecting a group of patients with greater disease severity than average.  The process of “enriching” a trial by excluding patients with mild disease has the benefit of requiring many fewer enrolled patients to demonstrate a clinical effect.  However, it also will have a stronger tendency to exhibit “regression to the mean” for a number of patients, who will exhibit a greater than average improvement during the course of the trial.  A control group accurately measures this regression and helps us measure the true effect size.

So, why include a placebo?  Why not just have a control group of patients receiving no additional treatment?  There are compelling reasons:
  • To minimize bias in investigator assessments.  We most often think about placebo arms in relation to patient expectations, but often they are even more valuable in improving the accuracy of physician assessments.  Like all humans, physician investigators interpret evidence in light of their beliefs, and there is substantial evidence that unblinded assessments exaggerate treatment effects – we need the placebo to help maintain investigator blinding.
  • To improve patient compliance in the control arm.  If a patient is clearly not receiving an active treatment, it is often very difficult to keep him or her interested and engaged with the trial, especially if the trial requires frequent clinic visits and non-standard procedures (such as blood draws).  Retention in no-treatment trials can be much lower than in placebo-controlled trials, and if it drops low enough, the validity of any results can be thrown into question.
  • To accurately gauge adverse events.  Any problem(s) encountered are much more likely to be taken seriously – by both the patient and the investigator – if there is genuine uncertainty about whether the patient is on active treatment.  This leads to much more accurate and reliable reporting of adverse events.
In other words, even if the placebo effect didn’t exist, it would still be necessary and proper to conduct placebo-controlled trials.  The failure to separate “placebo control” from “placebo effect” yields some very muddled thinking (which was the ultimate point of my post yesterday).

Thursday, July 5, 2012

The Placebo Effect (No Placebo Necessary)

4 out of 5 non-doctors recommend starting
with "regular strength", and titrating up from there...
(Photo from inventedbyamother.com)
The modern clinical trial’s Informed Consent Form (ICF) is a daunting document.  It is packed with a mind-numbing litany of procedures, potential risks, possible adverse events, and substantial additional information – in general, if someone, somewhere, might find a fact relevant, then it gets into the form.  A run-of-the-mill ICF in a phase 2 or 3 pharma trial can easily run over 10 pages of densely worded text.  You might argue (and in fact, a number of people have, persuasively) that this sort of information overload reduces, rather than enhances, patient understanding of clinical trials.

So it is a bit of a surprise to read a paper arguing that patient information needs to be expanded because it does not contain enough information.  And it is yet even more surprising to read about what’s allegedly missing: more information about the potential effects of placebo.

Actually, “surprising” doesn’t really begin to cover it.  Reading through the paper is a borderline surreal experience.  The authors’ conclusions from “quantitative analysis”* of 45 Patient Information Leaflets for UK trials include such findings as
  • The investigational medication is mentioned more often than the placebo
  • The written purpose of the trial “rarely referred to the placebo”
  • “The possibility of continuing on the placebo treatment after the trial was never raised explicitly”
(You may need to give that last one a minute to sink in.)

Rather than seeing these as rather obvious conclusions, the authors recast them as ethical problems to be overcome.  From the article:
Information leaflets provide participants with a permanent written record about a clinical trial and its procedures and thus make an important contribution to the process of informing participants about placebos.
And from the PR materials furnished along with publication:
We believe the health changes associated with placebos should be better represented in the literature given to patients before they take part in a clinical trial.
There are two points that I think are important here – points that are sometimes missed, and very often badly blurred, even within the research community:

1.    The placebo effect is not caused by placebos.  There is nothing special about a “placebo” treatment that induces a unique effect.  The placebo effect can be induced by a lot of things, including active medications.  When we start talking about placebos as causal agents, we are engaging in fuzzy reasoning – placebo effects will not only be seen in the placebo arm, but will be evenly distributed among all trial participants.

2.    Changes in the placebo arm cannot be assumed to be caused by the placebo effect.  There are many reasons why we may observe health changes within a placebo group, and most of them have nothing to do with the “psychological and neurological mechanisms” of the placebo effect.  Giving trial participant information about the placebo effect may in fact be providing them with an entirely inaccurate description of what is going on.

ResearchBlogging.org Bishop FL, Adams AEM, Kaptchuk TJ, Lewith GT (2012). Informed Consent and Placebo Effects: A Content Analysis of Information Leaflets to Identify What Clinical Trial Participants Are Told about Placebos. PLoS ONE DOI: 10.1371/journal.pone.0039661  

(* Not related to the point at hand, but I would applaud efforts to establish some lower boundaries to what we are permitted to call "quantitative analysis".  Putting counts from 45 brochures into an Excel spreadsheet should fall well below any reasonable threshold.)