Thursday, May 30, 2013

Clinical Trial Enrollment, ASCO 2013 Edition

Even by the already-painfully-embarrassingly-low standards of clinical trial enrollment in general, patient enrollment in cancer clinical trials is slow. Horribly slow. In many cancer trials, randomizing one patient every three or four months isn't bad at all – in fact, it's par for the course. The most
commonly-cited number is that only 3% of cancer patients participate in a trial – and although exact details of how that number is measured are remarkably difficult to pin down, it certainly can't be too far from reality.

Ultimately, the cost of slow enrollment is borne almost entirely by patients; their payment takes the form of fewer new therapies and less evidence to support their treatment decisions.

So when a couple dozen thousand of the world's top oncologists fly into Chicago to meet, you'd figure that improving accrual would be high on everyone’s agenda. You can't run your trial without patients, after all.

But every year, the annual ASCO meeting underdelivers in new ideas for getting more patients into trials. I suppose this is a consequence of ASCO's members-only focus: getting the oncologists themselves to address patient accrual is a bit like asking NASCAR drivers to tackle the problems of aerodynamics, engine design, and fuel chemistry.

Nonetheless, every year, a few brave souls do try. Here is a quick rundown of accrual-related abstracts at this year’s meeting, conveniently sorted into 3 logical categories:

1. As Lord Kelvin may or may not have said, “If you cannot measure it, you cannot improve it.”


Probably the most sensible of this year's crop, because rather than trying to make something out of nothing, the authors measure exactly how pervasive the nothing is. Specifically, they attempt to obtain fairly basic patient accrual data for the last three years' worth of clinical trials in kidney cancer. Out of 108 trials identified, they managed to get – via search and direct inquiries with the trial sponsors – basic accrual data for only 43 (40%).

That certainly qualifies as “terrible”, though the authors content themselves with “poor”.

Interestingly, exactly zero of the 32 industry-sponsored trials responded to the authors' initial survey. This fits with my impression that pharma companies continue to think of accrual data as proprietary, though what sort of business advantage it gives them is unclear. Any one company will have only run a small fraction of these studies, greatly limiting their ability to draw anything resembling a valid conclusion.


CALGB investigators look at 110 trials over the past 10 years to see if they can identify any predictive markers of successful enrollment. Unfortunately, the trials themselves are pretty heterogeneous (accrual periods ranged from 6 months to 8.8 years), so finding a consistent marker for successful trials would seem unlikely.

And, in fact, none of the usual suspects (e.g., startup time, disease prevalence) appears to have been significant. The exception was provision of medication by the study, which was positively associated with successful enrollment.

The major limitation with this study, apart from the variability of the trials measured, is its definition of “successful”, which is based simply on whether a trial eventually reached its total planned enrollment. By that definition, a slow-enrolling trial that drags on for years before finally reaching its goal counts as successful, whereas the same trial stopped early counts as unsuccessful. Sometimes that may be the right call, but it's easy to imagine situations where allowing a slow trial to drag on is a painful waste of resources – especially if results are delayed enough to bring their relevance into question.

Even worse, though, is that a trial’s enrollment goal is itself a prediction. The trial steering committee determines how many sites, and what resources, will be needed to hit the number needed for analysis. So in the end, this study is attempting to identify predictors of successful predictions, and there is no reason to believe that the initial enrollment predictions were made with any consistent methodology.

2. If you don't know, maybe ask somebody?



With these two abstracts we celebrate and continue the time-honored tradition of alchemy, whereby we transmute base opinion into golden data. The magic number appears to be 100: if you've got 3 digits' worth of doctors telling you how they feel, that must be worth something.

In the first abstract, a working group is formed to identify and vote on the major barriers to accrual in oncology trials. Then – and this is where the magic happens – that same group is asked to identify and vote on possible ways to overcome those barriers.

In the second, a diverse assortment of community oncologists were given an online survey to provide feedback on the design of a phase 3 trial in light of recent new data. The abstract doesn't specify who was initially sent the survey, so we cannot tell what the response rate was, or compare the responders to the broader population of oncologists (I'll take a wild guess and go with “massive response bias”).

Market research is sometimes useful. But what cancer clinical trials do not need right now are more surveys and working groups. The “strategies” listed in the first abstract are part of the same cluster of ideas that have been on the table for years now, with no appreciable increase in trial accrual.

3. The obligatory “What the What?” abstract



The force with which my head hit my desk after reading this abstract made me concerned that it had left permanent scarring.

If this had been re-titled “Poor Measurement of Accrual Factors Leads to Inaccurate Accrual Reporting”, would it still have been accepted for this year’s meeting? That's certainly a more accurate title.

Let’s review: a trial intends to enroll both white and minority patients. Whites enroll much faster, leading to a period where only minority patients are recruited. Then, according to the authors, “an almost 4-fold increase in minority accrual raises question of accrual disparity.” So, sites will only recruit minority patients when they have no choice?

But wait: the number of sites wasn't the same during the two periods, and start-up times were staggered. Adjusting for actual site time, the average minority accrual rate was 0.60 patients/site/month in the first part and 0.56 in the second. So the apparent 4-fold increase was entirely an artifact of bad math.
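To make the arithmetic concrete, here is a minimal sketch of the adjustment (in Python, with made-up numbers chosen to mimic the situation described): divide patients enrolled by the site-months actually available in each period, rather than comparing raw totals across periods with different numbers of sites and staggered start-up dates.

```python
# Illustrative sketch only: invented numbers showing how raw accrual totals can
# look like a "4-fold increase" while the site-time-adjusted rate is flat.

def accrual_rate(patients_per_site, months_open_per_site):
    """Patients per site per month, adjusted for how long each site was open."""
    return sum(patients_per_site) / sum(months_open_per_site)

# Period 1: a few sites, open only briefly (staggered start-up)
p1_patients = [2, 1, 3]       # minority patients enrolled at each site
p1_months   = [4, 2, 4]       # months each site was actually open

# Period 2: more sites, open much longer
p2_patients = [6, 5, 7, 6]
p2_months   = [10, 9, 11, 10]

raw_ratio = sum(p2_patients) / sum(p1_patients)
print(f"Raw totals: {sum(p1_patients)} vs {sum(p2_patients)} patients "
      f"(a {raw_ratio:.0f}-fold 'increase')")
print(f"Adjusted, period 1: {accrual_rate(p1_patients, p1_months):.2f} pts/site/month")
print(f"Adjusted, period 2: {accrual_rate(p2_patients, p2_months):.2f} pts/site/month")
```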

This would be horribly embarrassing were it not for the fact that bad math seems to be endemic in clinical trial enrollment. Failing to adjust for start-up time and the number of active sites is so routine that skipping the adjustment is still, apparently, grounds for a presentation.

The bottom line


What we need now is to rigorously (and prospectively) compare and measure accrual interventions. We have lots of candidate ideas, and there is no need for more retrospective studies, working groups, or opinion polls to speculate on which ones will work best.  Where possible, accrual interventions should themselves be randomized to minimize confounding variables which prevent accurate assessment. Data needs to be uniformly and completely collected. In other words, the standards that we already use for clinical trials need to be applied to the enrollment measures we use to engage patients to participate in those trials.

This is not an optional consideration. It is an ethical obligation we have to cancer patients: we need to assure that we are doing all we can to maximize the rate at which we generate new evidence and test new therapies.

[Image credit: Logarithmic turtle accrual rates courtesy of Flikr user joleson.]

Wednesday, May 15, 2013

Placebos: Banned in Helsinki?


One of the unintended consequences of my (admittedly somewhat impulsive) decision to name this blog the way I did is that I get a fair bit of traffic from Google: people searching for placebo-related information.

Some recent searches have been about the proposed new revisions to the Declaration of Helsinki, and how the new draft version will prohibit or restrict the use of placebo controls in clinical trials. This was a bit puzzling, given that the publicly-released draft revisions [PDF] didn't appear to substantially change the DoH's placebo section.

Much of the confusion appears to be caused by a couple sources. First, the popular Pharmalot blog (whose approach to critical analysis I've noted before as being ... well ... occasionally unenthusiastic) covered it thus:
The draft, which was released earlier this week, is designed to update a version that was adopted in 2008 and many of the changes focus on the use of placebos. For instance, placebos are only permitted when no proven intervention exists; patients will not be subject to any risk or there must be ‘compelling and sound methodological reasons’ for using a placebo or less effective treatment.
This isn't a good summary of the changes, since the “for instance” items are for the most part slight re-wordings from the 2008 version, which itself didn't change much from the version adopted in 2000.

To see what I mean, take a look at the change-tracked version of the placebo section:
The benefits, risks, burdens and effectiveness of a new intervention must be tested against those of the best current proven intervention(s), except in the following circumstances: 
The use of placebo, or no treatment intervention is acceptable in studies where no current proven intervention exists; or 
Where for compelling and scientifically sound methodological reasons the use of any intervention less effective than the best proven one, placebo or no treatment is necessary to determine the efficacy or safety of an intervention 
and the patients who receive any intervention less effective than the best proven one, placebo or no treatment will not be subject to any additional risks of serious or irreversible harm as a result of not receiving the best proven intervention 
Extreme care must be taken to avoid abuse of this option.
Really, there is only one significant change to this section: the strengthening of the existing reference to “best proven intervention” in the first sentence. It was already there, but has now been added to sentences 3 and 4. This is a reference to the use of active (non-placebo) comparators that are not the “best proven” intervention.

So, ironically, the biggest change to the placebo section is not about placebos at all.

This is a bit unfortunate, because to me it subtracts from the overall clarity of the section, since it's no longer exclusively about placebo despite still being titled “Use of Placebo”. The DoH has been consistently criticized during previous rounds of revision for becoming progressively less organized and coherently structured, and it certainly reads like a rambling list of semi-related thoughts – a classic “document by committee”. This lack of structure and clarity certainly hurts the DoH's effectiveness in shaping the world's approach to ethical clinical research.

Even worse, the revisions continue to leave unresolved the very real divisions that exist in ethical beliefs about placebo use in trials. The really dramatic revision to the placebo section happened over a decade ago, with the 2000 revision. Those changes, which introduced much of the strict wording in the current version, were extremely controversial, and resulted in the issuance of an extraordinary “Note of Clarification” that effectively softened the new and inflexible language. The 2008 version absorbed the wording from the Note of Clarification, and the resulting document is now vague enough that it is interpreted quite differently in different countries. (For more on the revision history and controversy, see this comprehensive review.)

The 2013 revision could have been an opportunity to try again to build a consensus around placebo use. At the very least, it could have acknowledged and clarified the division of beliefs on the topic. Instead, it sticks to its ambiguous phrasing which will continue to support multiple conflicting interpretations. This does not serve the ends of assuring the ethical conduct of clinical trials.

Ezekiel Emanuel has been a long-time critic of the DoH's lack of clarity and structure. Earlier this month, he published a compact but forceful review of the ways in which the Declaration has become weakened by its long series of revisions:
Over the years problems with, and objections to, the document have accumulated. I propose that there are nine distinct problems with the current version of the Declaration of Helsinki: it has an incoherent structure; it confuses medical care and research; it addresses the wrong audience; it makes extraneous ethical provisions; it includes contradictions; it contains unnecessary repetitions; it uses multiple and poor phrasings; it includes excessive details; and it makes unjustified, unethical recommendations.
Importantly, Emanuel also includes a proposed revision and restructuring of the DoH. In his version, much of the current wording around placebo use is retained, but it is absorbed into the larger concept of “Scientific Validity”, which adds important context to how a comparator arm should be chosen in general.

Here is Emanuel’s suggested revision:
Scientific Validity:  Research in biomedical and other sciences involving human participants must conform to generally accepted scientific principles, be based on a thorough knowledge of the scientific literature, other relevant sources of information, and suitable laboratory, and as necessary, animal experimentation.  Research must be conducted in a manner that will produce reliable and valid data.  To produce meaningful and valid data new interventions should be tested against the best current proven intervention. Sometimes it will be appropriate to test new interventions against placebo, or no treatment, when there is no current proven intervention or, where for compelling and scientifically sound methodological reasons the use of placebo is necessary to determine the efficacy and/or safety of an intervention and the patients who receive placebo, or no treatment, will not be subject to excessive risk or serious irreversible harm.  This option should not be abused.
Here, the scientific rationale for the use of placebo is placed in the greater context of selecting a control arm, which is itself subservient to the ethical imperative to only conduct studies that are scientifically valid. One can quibble with the wording (I still have issues with the use of “best proven” interventions, which I think is much too undefined here, as it is in the DoH, and glosses over some significant problems), but structurally this is a lot stronger, and provides firmer grounding for ethical decision making.

Emanuel, E. (2013). Reconsidering the Declaration of Helsinki. The Lancet, 381(9877), 1532-1533. DOI: 10.1016/S0140-6736(13)60970-8






[Image: Extra-strength chill pill, modified by the author, based on an original image by Flikr user mirjoran.]

Wednesday, April 17, 2013

But WHY is There an App for That?


FDA should get out of the data entry business.

There’s an app for that!

We've all heard that more than enough times. It started as a line in an ad and has exploded into one of the top meme-mantras of our time: if your organization doesn't have an app, it would seem, you'd better get busy developing one.

Submitting your coffee shop review? Yes!
Submitting a serious med device problem? Less so!
So the fact that the FDA is promising to release a mobile app for physicians to report adverse events with devices is hardly shocking. But it is disappointing.

The current process for physicians and consumers to voluntarily submit adverse event information about drugs or medical devices is a bit cumbersome. The FDA's form 3500 requests quite a lot of contextual data: patient demographics, specifics of the problem, any lab tests or diagnostics that were run, and the eventual outcome. That makes sense, because it helps the agency better understand the nature of the issue, and more data should provide a better ability to spot trends over time.

The drawback, of course, is that this makes data entry slower and more involved, which probably reduces the total number of adverse events reported – and, by most estimates, the number of reports is already far lower than the number of actual events.

And that’s the problem: converting a data-entry-intensive paper or online activity into a data-entry-intensive mobile app activity just modernizes the hassle. In fact, it probably makes it worse, as entering large amounts of free-form text is not, shall we say, a strong point of mobile apps.

The solution here is for FDA to get itself out of the data entry business. Adverse event information – and the critical contextual data to go with it – already exist in a variety of data streams. Rather than asking physicians and patients to re-enter this data, FDA should be working on interfaces for them to transfer the data that’s already there. That means developing a robust set of Application Programming Interfaces (APIs) that can be used by the teams who are developing medical data apps – everything from hospital EMR systems, to physician reference apps, to patient medication and symptom tracking apps. Those applications are likely to have far more data inside them than FDA currently receives, so enabling more seamless transmission of that data should be a top priority.

(A simple analogy might be helpful here: when an application on your computer or phone crashes, the operating system generally bundles any diagnostic information together, then asks if you want to submit the error data to the manufacturer. FDA should be working with external developers on this type of “1-click” system rather than providing user-unfriendly forms to fill out.)
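To make the “1-click” idea concrete, here is a purely hypothetical sketch of what an app-side submission might look like if FDA exposed a reporting API. The endpoint, field names, and payload structure are all invented for illustration; they do not correspond to any actual FDA interface.

```python
# Hypothetical sketch: pushing adverse event context an EMR already holds to a
# (fictional) reporting API, instead of asking a human to re-key it into a form.
import requests

def submit_adverse_event(emr_record: dict, api_url: str, api_key: str) -> str:
    """Bundle context already captured in an EMR and post it to a reporting API."""
    payload = {
        "report_type": "device_adverse_event",
        "patient": {                        # demographics the EMR already holds
            "age": emr_record["age"],
            "sex": emr_record["sex"],
        },
        "device": emr_record["device"],     # make/model pulled from the device record
        "event_description": emr_record["event_note"],
        "labs": emr_record.get("recent_labs", []),
        "outcome": emr_record.get("outcome", "unknown"),
    }
    resp = requests.post(api_url, json=payload,
                         headers={"Authorization": f"Bearer {api_key}"},
                         timeout=10)
    resp.raise_for_status()
    return resp.json().get("report_id", "")
```

The point of the sketch is simply that the reporting burden shifts from a person re-typing data into a form to a system passing along data it already has.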

A couple other programs would seem to support this approach:

  • The congressionally-mandated Sentinel Initiative, which requires FDA to set up programs to tap into active data streams, such as insurance claims databases, to detect potential safety signals
  • A 2012 White House directive for all Federal agencies to pursue the development of APIs as part of a broader "digital government" program

(Thanks to RF's Alec Gaffney for pointing out the White House directive.)

Perhaps FDA is already working on APIs for seamless adverse event reporting, but I could not find any evidence of their plans in this area. And even if they are, building a mobile app is still a waste of time and resources.

Sometimes being tech savvy means not jumping on the current tech trend: this is clearly one of those times. Let’s not have an app for that.

(Smartphone image via flikr user DigiEnable.)

Wednesday, February 27, 2013

It's Not Them, It's You

Are competing trials slowing yours down? Probably not.

If they don't like your trial, EVERYTHING ELSE IN THE WORLD is competition for their attention.
Rahlyn Gossen has a provocative new blog post up on her website entitled "The Patient Recruitment Secret". In it, she makes a strong case for considering site commitment to a trial – in the form of their investment of time, effort, and interest – to be the single largest driver of patient enrollment.

The reasoning behind this idea is clear and quite persuasive:
Every clinical trial that is not yours is a competing clinical trial. 
Clinical research sites have finite resources. And with research sites being asked to take on more and more duties, those resources are only getting more strained. Here’s what this reality means for patient enrollment. 
If research site staff are working on other clinical trials, they are not working on your clinical trial. Nor are they working on patient recruitment for your clinical trial. To excel at patient enrollment, you need to maximize the time and energy that sites spend recruiting patients for your clinical trial.
Much of this fits together very nicely with a point I raised in a post a few months ago, showing that improvements in site enrollment performance may often be made at the expense of other trials.

However, I would add a qualifier to these discussions: the number of active "competing" trials at a site is not a reliable predictor of enrollment performance. In other words, selecting sites who are not working on a lot of other trials will in no way improve enrollment in your trial.

This is an important point because, as Gossen points out, asking the number of other studies is a standard habit of sponsors and CROs on site feasibility questionnaires. In fact, many sponsors can get very hung up on competing trials – to the point of excluding potentially good sites that they feel are working on too many other things.

This came to a head recently when we were brought in to consult on a study experiencing significant enrollment difficulty. The sponsor was very concerned about competing trials at the sites – there was a belief that such competition was a big contributor to sluggish enrollment.

As part of our analysis, we collected updated information on competitive trials. Given the staggered nature of the trial's startup, we then calculated time-adjusted Net Patient Contributions for each site (for more information on that, see my write-up here).
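For readers curious about the mechanics, here is a rough, simplified sketch of that kind of time adjustment. This is not the exact Net Patient Contribution calculation from the write-up linked above, just an illustration with invented numbers: each site's enrollment is normalized by the months it has actually been open, and the competing-trial count is carried alongside for comparison.

```python
# Rough illustration (not the exact Net Patient Contribution formula): adjust
# each site's enrollment for the time it was actually open, then look at it
# next to the number of competing trials at that site.
from statistics import mean

sites = [
    # (site_id, patients_enrolled, months_since_activation, competing_trials)
    ("A", 12, 10, 8),
    ("B",  3, 10, 2),
    ("C",  9,  6, 7),
    ("D",  2,  3, 1),
]

def monthly_rate(patients, months):
    return patients / months if months else 0.0

rates = {sid: monthly_rate(p, m) for sid, p, m, _ in sites}
trial_avg = mean(rates.values())

for sid, p, m, competing in sites:
    net = rates[sid] - trial_avg   # above/below the trial's average pace
    print(f"Site {sid}: {rates[sid]:.2f} pts/month "
          f"({net:+.2f} vs. average), {competing} competing trials")
```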

We then cross-referenced competing trials against enrollment performance. The results were very surprising: the number of other trials bore no relationship to how well the sites were enrolling. Here's the data:

Each site's enrollment performance plotted against the number of other trials it is running. Each site is a point; good enrollers (higher up) and poor enrollers (lower down) are virtually identical in terms of how many concurrent trials they were running. Competing trials do not appear to substantially affect rates of enrollment.

Since running into this result, I've looked at the relationship between the number of competing trials in CRO feasibility questionnaires and final site enrollment for many of the trials we've worked on. In each case, the "competing" trials did not serve as even a weak predictor of eventual site performance.

I agree with Gossen's fundamental point that a site's interest and enthusiasm for your trial will help increase enrollment at that site. However, we need to do a better job of thinking about the best ways of measuring that interest to understand the magnitude of the effect that it truly has. And, even more importantly, we have to avoid reliance on substandard proxy measurements such as "number of competing trials", because those will steer us wrong in site selection. In fact, almost everything we tend to collect on feasibility questionnaires appears to be non-predictive and potentially misleading; but that's a post for another day.

[Image credit: research distractions courtesy of Flikr user ronocdh.]

Friday, February 8, 2013

The FDA’s Magic Meeting


Can you shed three years of pipeline flab with this one simple trick?

"There’s no trick to it ... it’s just a simple trick!" -Brad Goodman

Getting a drug to market is hard. It is hard in every way a thing can be hard: it takes a long time, it's expensive, it involves a process that is opaque and frustrating, and failure is a much more likely outcome than success. Boston pioneers pointing their wagons west in 1820 had far better prospects for seeing the Pacific Ocean than a new drug, freshly launched into human trials, will ever have for earning a single dollar in sales.

Exact numbers are hard to come by, but the semi-official industry estimates are: about 6-8 years, a couple billion dollars, and more than 80% chance of ultimate failure.

Is there a secret handshake? Should we bring doughnuts?
(We should probably bring doughnuts.)
Finding ways to reduce any of those numbers is one of the premier obsessions of the pharma R&D world. We explore new technologies and standards, consider moving our trials to sites in other countries, consider skipping the sites altogether and going straight to the patient, and hire patient recruitment firms* to speed up trial enrollment. We even invent words to describe our latest and awesomest attempts at making development faster, better, and cheaper.

But perhaps all we needed was another meeting.

A recent blog post from Anne Pariser, an Associate Director at FDA's Center for Drug Evaluation and Research, suggests that attending a pre-IND meeting can shave a whopping 3 years off your clinical development timeline:
For instance, for all new drugs approved between 2010 and 2012, the average clinical development time was more than 3 years faster when a pre-IND meeting was held than it was for drugs approved without a pre-IND meeting. 
For orphan drugs used to treat rare diseases, the development time for products with a pre-IND meeting was 6 years shorter on average or about half of what it was for those orphan drugs that did not have such a meeting.
That's it? A meeting? Cancel the massive CTMS integration – all we need are a couple tickets to DC?

Pariser's post appears to be an extension of an FDA presentation made at a joint NORD/DIA meeting last October. As far as I can tell, that presentation's not public, but it was covered by the Pink Sheet's Derrick Gingery on November 1.  That presentation covered just 2010 and 2011, and actually showed a 5 year benefit for drugs with pre-IND meetings (Pariser references 2010-2012).

Consider that one VC-funded vendor** was recently spotted aggressively hyping the fact that its software reduced one trial’s timeline by 6 weeks. And here the FDA is telling us that a single sit-down saves an additional 150 weeks.

In addition, a second meeting – the End of Phase II meeting – saves another year, according to the NORD presentation.  Pariser does not include EOP2 data in her blog post.

So, time to charter a bus, load up the clinical and regulatory teams, and hit the road to Silver Spring?

Well, maybe. It probably couldn't hurt, and I'm sure it would be a great bonding experience, but there are some reasons to not take the numbers at face value.
  • We’re dealing with really small numbers here. The NORD presentation covers 54 drugs, and Pariser's appears to add 39 to that total. The fact that the time-savings data shifted so dramatically – from 5 years to 3 – tips us off to the fact that we probably have a lot of variance in the data. We also have no idea how many pre-IND meetings there were, so we don't know the relative sizes of the comparison groups.
  • It's a survivor-only data set. It doesn't include drugs that were terminated or rejected. FDA would never approve a clinical trial that only looked at patients who responded, then retroactively determined differences between them.  That approach is clearly susceptible to survivorship bias.
  • It reports means. This is especially a problem given the small numbers being studied. It's entirely plausible that just one or two drugs that took a really long time are badly skewing the results. Medians with quartile ranges would have been a lot more enlightening here (the quick sketch below shows how easily a couple of long-running programs can drag a mean around).
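Here is a quick illustration, with entirely made-up development times, of how a couple of unusually long programs can stretch a mean while barely moving the median:

```python
# Illustration with invented development times (years): a few very long
# programs pull the mean far more than the median.
from statistics import mean, median

with_meeting    = [5, 6, 6, 7, 7, 8]       # hypothetical: pre-IND meeting held
without_meeting = [6, 6, 7, 7, 15, 19]     # hypothetical: two outliers skew the mean

for label, times in [("with pre-IND", with_meeting), ("without", without_meeting)]:
    print(f"{label:>12}: mean {mean(times):.1f} yrs, median {median(times):.1f} yrs")
```

In this toy example the means differ by 3.5 years while the medians differ by only half a year, exactly the kind of gap that medians with quartile ranges would expose.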
All of the above make me question how big an impact this one meeting can really have. I'm sure it's a good thing, but it can't be quite this amazing, can it?

However, it would be great to see more of these metrics, produced in more detail, by the FDA. The agency does a pretty good job of reporting on its own performance – the PDUFA performance reports are a worthwhile read – but it doesn't publish much in the way of sponsor metrics. Given the constant clamor for new pathways and concessions from the FDA, it would be truly enlightening to see how well the industry is actually taking advantage of the tools it currently has.

As Gingery wrote in his article, "Data showing that the existing FDA processes, if used, can reduce development time is interesting given the strong effort by industry to create new methods to streamline the approval process." Gingery also notes that two new official sponsor-FDA meeting points have been added in the recently-passed FDASIA, so it would seem extremely worthwhile to have some ongoing, rigorous measurement of the usage of, and benefit from, these meetings.

Of course, even if these meetings are strongly associated with faster pipeline times, don’t be so sure that simply adding the meeting will cut your development time so dramatically. Goodhart's Law tells us that performance metrics, when turned into targets, have a tendency to fail: in this case, whatever it was about the drug, or the drug company leadership, that prevented the meeting from happening in the first place may still prove to be the real factor in the delay.

I suppose the ultimate lesson here might be: If your drug doesn't have a pre-IND meeting because your executive management has the hubris to believe it doesn't need FDA input, then you probably need new executives more than you need a meeting.

[Image: Meeting pictured may not contain actual magic. Photo from FDA's Flikr stream.]

*  Disclosure: the author works for one of those.
** Under the theory that there is no such thing as bad publicity, no link will be provided.



Wednesday, February 6, 2013

Our New Glass House: GSK's Commitment to AllTrials

No stones, please.

Yesterday, Alec Gaffney was kind enough to ask my opinion on GSK's signing on to the AllTrials initiative, which calls for full publication of clinical trial data. Some of my comments made it into his thorough and excellent article on the topic. Today, it seems worthwhile to expand on those comments.

1. It was going to happen: if not now, then soon

As mentioned in the article, I – and I suspect a fair number of other people in the industry -- already thought that full CSR publication was inevitable.  In the last half of 2012, the EMA began moving very decisively in the direction of clinical trial results publication, but that's just the culmination of a long series of steps towards greater transparency in the drug development process. Starting with the establishment of the ClinicalTrials.gov registry in 1997, we have witnessed a near-continuous increase in requirements for public registration and reporting around clinical trials.

It's important to see the AllTrials campaign in this context. If AllTrials didn't exist, something very much like it would have come along. We had been moving in this direction already (the Declaration of Helsinki called for full publication 4 years before AllTrials even existed), and the time was ripe. In fact, the only thing that I personally found surprising about AllTrials is that it started in the UK, since over the past 15 years most of the advances in trial transparency had come from the US.

2. It's a good thing, but it's not earth-shattering

Practically speaking, releasing the full CSR probably won't have a substantial impact on everyday clinical practice by doctors. The real meat of the CSR that doctors care about has already been mandated on ClinicalTrials.gov – full results posting was required by FDAAA in 2008.

There seems to be pretty clear evidence that many (perhaps most) practicing physicians do not read the complete articles on clinical trials already, but rather gravitate to abstracts and summary tables. It is highly doubtful, therefore, that a high percentage of physicians will actually read through a series of multi-hundred-page documents to try to glean fresh nuances about the drugs they prescribe.

Presumably, we'll see synopsizing services arise to provide executive summaries of the CSR data, and these may turn out to be popular and well-used. However, again, most of the really important and interesting bits are going to be on ClinicalTrials.gov in convenient table form (well, sort-of convenient – I admit I sometimes have a fair bit of difficulty sifting through the data that’s already posted there).

3. The real question: Where will we go with patient-level data?

In terms of actual positive impact on clinical research, GSK's prior announcement last October – making full patient-level data available to researchers – was a much bigger deal. That opens up the data to all sorts of potential re-analyses, including more thorough looks at patient subpopulations.

Tellingly, no one else in pharma has followed suit yet. I expect we’ll see a few more major AllTrials signatories in fairly short order (and I certainly intend to vigorously encourage all of my clients to be among the first wave of signatories!), but I don’t know that we’ll see anyone offer up the complete data sets.  To me, that will be the trend to watch over the next 2-3 years.

[Image: Transparent abode courtesy of flikr user seier+seier.]

Tuesday, February 5, 2013

The World's Worst Coin Trick?


Ben Goldacre – whose Bad Pharma went on sale today – is fond of using a coin-toss-cheating analogy to describe the problem of "hidden" trials in pharmaceutical clinical research. He uses it in this TED talk:
If it's a coin-toss conspiracy, it's the worst one in the history of conspiracies.
If I flipped a coin a hundred times, but then withheld the results from you from half of those tosses, I could make it look as if I had a coin that always came up heads. But that wouldn't mean that I had a two-headed coin; that would mean that I was a chancer, and you were an idiot for letting me get away with it. But this is exactly what we blindly tolerate in the whole of evidence-based medicine. 
and in this recent op-ed column in the New York Times:
If I toss a coin, but hide the result every time it comes up tails, it looks as if I always throw heads. You wouldn't tolerate that if we were choosing who should go first in a game of pocket billiards, but in medicine, it’s accepted as the norm. 
I can understand why he likes using this metaphor. It's a striking and concrete illustration of his claim that pharmaceutical companies are suppressing data from clinical trials in an effort to make ineffective drugs appear effective. It also dovetails elegantly, from a rhetorical standpoint, with his frequently-repeated claim that "half of all trials go unpublished" (the reader is left to make the connection, but presumably it's all the tail-flip trials, with negative results, that aren't published).
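For what it's worth, the mechanics of the metaphor are easy to simulate. Here is a toy sketch (not anything Goldacre published, just an illustration) of how withholding the tails makes a fair coin look like a sure thing:

```python
# Toy version of the coin metaphor: flip a fair coin 100 times, then quietly
# withhold the tails. The *reported* record looks like a coin that always lands heads.
import random

random.seed(1)  # fixed seed so the example is reproducible
tosses = [random.choice("HT") for _ in range(100)]
reported = [t for t in tosses if t == "H"]   # the tails go in the file drawer

print(f"Actual heads rate:   {tosses.count('H')}/{len(tosses)}")
print(f"Reported heads rate: {reported.count('H')}/{len(reported)}")  # looks perfect
```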

Like many great metaphors, however, this coin-scam metaphor has the distinct weakness of being completely disconnected from reality.

If we can cheat and hide bad results, why do we have so many public failures? Pharmaceutical headlines in the past year were mostly dominated by a series of high-profile clinical trial failures. Even drugs that showed great promise in phase 2 failed in phase 3 and were discontinued. Fewer than 20% of drugs that enter human testing ever make it to market ... and by some accounts it may be less than 10%. Pfizer had a great run of approvals to end 2012, with 4 new drugs approved by the FDA (including Xalkori, the exciting targeted therapy for lung cancer). And yet during that same period, the company discontinued 8 compounds.

Now, this wasn't always the case. Mandatory public registration of all pharma trials didn't begin in the US until 2005, and mandatory public results reporting came later than that. Before then, companies certainly had more leeway to keep results to themselves, with one important exception: the FDA still had the data. If you ran 4 phase 3 trials on a drug, and only 2 of them were positive, you might be able to only publish those 2, but when it came time to bring the drug to market, the regulators who reviewed your NDA report would be looking at the totality of evidence – all 4 trials. And in all likelihood you were going to be rejected.

That was definitely not an ideal situation, but even then it wasn't half as dire as Goldacre's Coin Toss would lead you to believe. The cases of ineffective drugs reaching the US market are extremely rare: if anything, FDA has historically been criticized for being too risk-averse and preventing drugs with only modest efficacy from being approved.

Things are even better now. There are no hidden trials, the degree of rigor (in terms of randomization, blinding, and analysis) has ratcheted up consistently over the last two decades, lots more safety data gets collected along the way, and phase 4 trials are actually being executed and reported in a timely manner. In fact, it is safe to say that medical research has never been as thorough and rigorous as it is today.

That doesn't mean we can’t get better. We can. But the main reason we can is that we got on the path to getting better 20 years ago, and continue to make improvements.

Buying into Goldacre's analogy requires you to completely ignore a massive flood of public evidence to the contrary. That may work for the average TED audience, but it shouldn't be acceptable at the level of rational public discussion.

Of course, Goldacre knows that negative trials are publicized all the time. His point is about publication bias. However, when he makes his point so broadly as to mislead those who are not directly involved in the R&D process, he has clearly stepped out of the realm of thoughtful and valid criticism.

I got my pre-ordered copy of Bad Pharma this morning, and look forward to reading it. I will post some additional thoughts on the book as I get through it. In the meantime, those looking for more can find a good skeptical review of some of Goldacre's data on the Dianthus Medical blog here and here.

[Image: Bad Pharma's Bad Coin courtesy of flikr user timparkinson.]

Friday, January 25, 2013

Less than Jaw-Dropping: Half of Sites Are Below Average


Last week, the Tufts Center for the Study of Drug Development unleashed the latest in their occasional series of dire pronouncements about the state of pharmaceutical clinical trials.

One particular factoid from the CSDD "study" caught my attention:
Shocking performance stat: 57% of these racers won't medal!
* 11% of sites in a given trial typically fail to enroll a single patient, 37% under-enroll, 39% meet their enrollment targets, and 13% exceed their targets.
Many industry reporters uncritically recycled those numbers. Pharmalot noted:
Now, the bad news – 48 percent of the trial sites miss enrollment targets and study timelines often slip, causing extensions that are nearly double the original duration in order to meeting enrollment levels for all therapeutic areas.
(Fierce Biotech and Pharma Times also picked up the same themes and quotes from the Tufts PR.)

There are two serious problems with the data as reported.

One: no one – neither CSDD nor the journalists who loyally recycle its press releases – seems to remember this CSDD release from less than two years ago. It made the even-direr claim that
According to Tufts CSDD, two-thirds of investigative sites fail to meet the patient enrollment requirements for a given clinical trial.
If you believe both Tufts numbers, then it would appear that the share of under-performing sites has dropped by almost 20 percentage points in under two years – from 67% in April 2011 to 48% in January 2013. For an industry as hidebound and slow-moving as drug development, this ought to be hailed as a startling and amazing improvement!

Maybe at the end of the day, 48% isn't a great number, but surely this would appear to indicate we're on the right track, right? Why would no one mention this?

Which leads me to problem two: I suspect that no one is connecting the 2 data points because no one is sure what it is we're even supposed to be measuring here.

In a clinical trial, a site's "enrollment target" is not an objectively-defined number. Different sponsors will have different ways of setting targets – in fact, the method for setting targets may vary from team to team within a single pharma company.

The simplest way to set a target is to divide the total number of expected patients by the number of sites. If you have 50 sites and want to enroll 500 patients, then voilà ... everyone's got a "target" of 10 patients! But then as soon as some sites start exceeding their target, others will, by definition, fall short. That’s not necessarily a sign of underperformance – in fact, if a trial finishes enrollment dramatically ahead of schedule, there will almost certainly be a large number of "under target" sites.
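A tiny simulation (with invented enrollment numbers) shows how this plays out: even a trial that hits its overall goal can have half its sites "missing target" simply because other sites overshot.

```python
# Invented numbers: a trial that exactly hits its overall goal can still have
# half of its sites "miss target" when the target is just goal / number_of_sites.
goal, n_sites = 500, 50
target_per_site = goal / n_sites             # 10 patients per site

# Hypothetical final tallies: 500 patients total, enrolled unevenly across sites
enrollment = [18] * 10 + [12] * 15 + [7] * 20 + [0] * 5
assert sum(enrollment) == goal and len(enrollment) == n_sites

below_target = sum(1 for e in enrollment if e < target_per_site)
print(f"Overall goal met, yet {below_target}/{n_sites} sites "
      f"({below_target / n_sites:.0%}) fell short of their 'target'.")
```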

Some sponsors and CROs get tricky about setting individual targets for each site. How do they set those? The short answer is: pretty arbitrarily. Targets are only partially based upon data from previous, similar (but not identical) trials, but are also shifted up or down by the (real or perceived) commercial urgency of the trial. They can also be influenced by a variety of subjective beliefs about the study protocol and an individual study manager's guesses about how the sites will perform.

If a trial ends with 0% of sites meeting their targets, the next trial in that indication will have a lower, more achievable target. The same will happen in the other direction: too-easy targets will be ratcheted up. The benchmark will jump around quite a bit over time.

As a result, "Percentage of trial sites meeting enrollment target" is, to put it bluntly, completely worthless as an aggregate performance metric. Not only will it change greatly based upon which set  of sponsors and studies you happen to look at, but even data from the same sponsors will wobble heavily over time.

Why does this matter?

There is a consensus that clinical development is much too slow -- we need to be striving to shorten clinical trial timelines and get drugs to market sooner. If we are going to make any headway in this effort, we need to accurately assess the forces that help or hinder the pace of development, and we absolutely must rigorously benchmark and test our work. The adoption of, and attention paid to, unhelpful metrics will only confuse and delay our efforts to improve the quality and speed of drug development.

[Photo of "underperforming" swimmers courtesy Boston Public Library on flikr.]

Tuesday, January 15, 2013

Holding Your Breath Also Might Work

Here's a fitting postscript to yesterday's article about wishful-thinking-based enrollment strategies: we received a note from a research site this morning. The site had opted out of my company's comprehensive recruitment campaign, telling the sponsor they preferred to recruit patients their own way.

Here's the latest update from the coordinator:
I've found one person and have called a couple of times, but no return calls.  I will be sending this potential patient a letter this week.  I'm keeping my fingers crossed in finding someone soon!
They don't want to participate in a broad internet/broadcast/advocacy group program, but it's OK -- they have their fingers crossed!

Monday, January 14, 2013

Magical Thinking in Clinical Trial Enrollment


The many flavors of wish-based patient recruitment.

[Hopefully-obvious disclosure: I work in the field of clinical trial enrollment.]

When I'm discussing and recommending patient recruitment strategies with prospective clients, there is only one serious competitor I'm working against. I do not tailor my presentations in reaction to what other Patient Recruitment Organizations are saying, because they're not usually the thing that causes me the most problems. In almost all cases, when we lose out on a new study opportunity, we have lost to one opponent:

Need patients? Just add water!
Magical thinking.

Magical thinking comes in many forms, but in clinical trial enrollment it traditionally has two dominant flavors:

  • We won’t have any problems with enrollment because we have made it a priority within our organization.
    (This translates to: "we want it to happen, therefore it has to happen, therefore it will happen", but it doesn't sound quite as convincing that way, does it?)
  • We have selected sites that already have access to a large number of the patients we need.
    (I hear this pretty much 100% of the time. Even from people who understand that every trial is different and that past site performance is simply not a great predictor of future performance.)

A new form of magical thinking burst onto the scene a few years ago: the belief that the Internet will enable us to target and engage exactly the right patients. Specifically, some teams (aided by the, shall we say, less-than-completely-totally-true claims of "expert" vendors) began to believe that the web’s great capacity to narrowly target specific people – through Google search advertising, online patient communities, and general social media activities – would prove more than enough to deliver large numbers of trial participants. And deliver them fast and cheap to boot. Sadly, evidence has already started to emerge about the Internet’s failure to be a panacea for slow enrollment. As I and others have pointed out, online recruitment can certainly be cost effective, but cannot be relied on to generate a sizable response. As a sole source, it tends to underdeliver even for small trials.

I think we are now seeing the emergence of the newest flavor of magical thinking: Big Data. Take this quote from recent coverage of the JP Morgan Healthcare Conference:
For instance, Phase II, that ever-vexing rubber-road matchmaker for promising compounds that just might be worthless. Identifying the right patients for the right drug can make or break a Phase II trial, [John] Reynders said, and Big Data can come in handy as investigators distill mountains of imaging results, disease progression readings and genotypic traits to find their target participants. 
The prospect of widespread genetic mapping coupled with the power of Big Data could fundamentally change how biotech does R&D, [Alexis] Borisy said. "Imagine having 1 million cancer patients profiled with data sets available and accessible," he said. "Think how that very large data set might work--imagine its impact on what development looks like. You just look at the database and immediately enroll a trial of ideal patients."
Did you follow the logic of that last sentence? You immediately enroll ideal patients ... and all you had to do was look at a database! Problem solved!

Before you go rushing off to get your company some Big Data, please consider the fact that the overwhelming majority of Phase 2 trials do not have a neat, predefined set of genotypic traits they’re looking to enroll. In fact, narrowly-tailored phase 2 trials (such as recent registration trials of Xalkori and Zelboraf) actually enroll very quickly already, without the need for big databases. The reality for most drugs is exactly the opposite: they enter phase 2 actively looking for signals that will help identify subgroups that benefit from the treatment.

Also, it’s worth pointing out that having a million data points in a database does not mean that you have a million qualified, interested, and nearby patients just waiting to be enrolled in your trial. As recent work in medical record queries bears out, the yield from these databases promises to be low, and there are enormous logistic, regulatory, and personal challenges in identifying, engaging, and consenting the actual human beings represented by the data.

More, even fresher flavors of magical thinking are sure to emerge over time. Our urge to hope that our problems will just be washed away in a wave of cool new technology is just too powerful to resist.

However, when the trial is important, and the costs of delay are high, clinical teams need to set the wishful thinking aside and ask for a thoughtful plan based on hard evidence. Fortunately, that requires no magic bean purchase.

Magic Beans picture courtesy of Flikr user sleepyneko