Why positive trials aren’t really positive, and negative ones may not be negative

Chris Poynter on 18-02-2014

Recently, I was pleasantly surprised by a discussion about statistics during one of our journal clubs. I know, I know, it sounds crazy, but bear with me, because it was illuminating.

There was a lively discussion about yet another negative ICU trial and about what both positive and negative trials really mean in ICU research. It was essentially a brief stats clinic. Although I rarely get excited by the thought of statistical analysis, I frequently get frustrated by statistical misunderstanding. I should add that I am no statistician and am very happy to have further illumination cast my way, but this is my take on some common problems in interpreting ICU research.

To start with, it is important to recognise the gamesmanship that takes place in research. Most ICU trials (in fact, pretty much all) are underpowered to detect clinically important differences. Recruiting the numbers needed to test a realistic hypothesis is extremely difficult and expensive, so to reach adequate power without ludicrously large samples, the target treatment effect is generally exaggerated. Frequently the hypothesised mortality improvement is larger than any single intervention has ever delivered in the history of intensive care. Hence, there are many negative ICU trials, and uncertainty remains around most clinical questions.

For example, the TTM trial aimed for an absolute mortality reduction of 11%. Although this was based on numbers from earlier studies, it is an extremely ambitious target for a 3-degree difference in temperature (33°C versus 36°C). As I pointed out in my earlier blog post https://www.crit-iq.com/index.php/blog/single/Targeted-temperature-management-intensive-care-therapeutic-hypothermia-cardiac-arrest, a more likely, and still clinically relevant, 2% treatment effect would need approximately 20,000 patients to adequately power a study. Such a trial would take too long and be extremely resource-intensive and expensive. Hence, we find ourselves powering for these unreasonable effects and inevitably ending up with negative trials. Don't get me wrong: I chose the TTM study because it is a shining example of one of the better recent trials, and yet it is still underpowered.
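To see why the target effect size drives recruitment so hard, here is a minimal sketch of the standard normal-approximation sample size formula for comparing two proportions (two-sided test). The function name `n_per_group` and the 50% control-arm mortality are assumptions chosen for illustration; they are not figures taken from the TTM protocol.

```python
import math
from statistics import NormalDist


def n_per_group(p_control: float, abs_risk_reduction: float,
                alpha: float = 0.05, power: float = 0.9) -> int:
    """Hypothetical helper: patients needed per arm to detect a given
    absolute risk reduction (two-sided test, normal approximation)."""
    p_treat = p_control - abs_risk_reduction
    p_bar = (p_control + p_treat) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 1.28 for 90% power
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p_control * (1 - p_control)
                                      + p_treat * (1 - p_treat))) ** 2
    # Required n scales with 1 / ARR^2, so a 5x smaller effect needs ~25x the patients.
    return math.ceil(numerator / abs_risk_reduction ** 2)


if __name__ == "__main__":
    # Assumed 50% control-arm mortality (illustrative, not a trial figure).
    for arr, pw in [(0.11, 0.9), (0.02, 0.9), (0.02, 0.8)]:
        n = n_per_group(0.50, arr, power=pw)
        print(f"ARR {arr:.0%}, power {pw:.0%}: {n} per arm, {2 * n} total")
```

Under these assumptions an 11% target needs only a few hundred patients per arm, whereas a 2% effect needs roughly 9,800 per arm at 80% power (just under 20,000 in total) and about 13,000 per arm at 90% power: the same order of magnitude as the 20,000 quoted above. The exact numbers shift with the assumed baseline mortality.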

Our journal club discussion then turned to the positive trials in the ICU literature. Because trials are powered for unreasonably large treatment effects, any positive trial should be viewed with skepticism: the pre-trial probability of a genuinely positive result is very low. For example, suppose we (optimistically) estimate a 1% pre-trial probability that the treatment works, with 90% power and a significance threshold of p = 0.05. Of 1,000 such trials, 10 test a treatment that truly works, and 90% of those (9) come out positive: these are the true positives. Of the 990 that test an ineffective treatment, 5% (about 50) come out positive by chance: the false positives. That means fewer than one in six positive trials (9 of roughly 59) reflects a real effect. We agreed that the pre-study probability was probably even lower than that for many trials.
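The arithmetic above is the positive predictive value of a "positive" trial, which follows directly from Bayes' theorem. A minimal sketch, with a hypothetical function name; like the paragraph, it counts every chance finding at the 0.05 threshold as a false positive without splitting by direction of effect.

```python
def positive_predictive_value(prior: float, power: float, alpha: float) -> float:
    """Hypothetical helper: probability that a significant trial reflects a real effect."""
    true_positives = prior * power          # treatment works and the trial detects it
    false_positives = (1 - prior) * alpha   # treatment does nothing, significant by chance
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    trials = 1000
    prior, power, alpha = 0.01, 0.90, 0.05
    print(f"true positives:  {trials * prior * power:.1f}")
    print(f"false positives: {trials * (1 - prior) * alpha:.1f}")
    print(f"PPV: {positive_predictive_value(prior, power, alpha):.3f}")
```

With the paragraph's numbers this gives 9 true positives, 49.5 false positives and a PPV of about 0.154, i.e. just under one in six. Lowering `prior` makes the PPV fall further, which is the point about pre-study probability.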

It was therefore argued that positive trials are probably not truly positive, and negative trials are not necessarily negative. That raised two questions: what is the point, and how can we possibly interpret the literature? There is some truth to this conundrum, but it does not tell the whole story.

Even though positive trials are unlikely to be as positive as they appear (e.g. the 16% ARR shown in the original therapeutic hypothermia trials), a positive result is still statistically significant and can be used as legitimate evidence of a treatment effect. Although the effect size is likely to be an overestimate, there is nevertheless likely to be an effect.
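One way to see why a significant result tends to overstate the effect is a quick simulation of what is sometimes called the winner's curse (or type M error): run many underpowered trials of a treatment with a small true effect and look only at the ones that reach significance. The sketch below assumes, purely for illustration, 50% control mortality, a true 2% ARR and 475 patients per arm; the function name is hypothetical.

```python
import math
import random
from statistics import NormalDist


def simulate_significant_arrs(p_control: float, true_arr: float, n_per_arm: int,
                              n_trials: int, alpha: float = 0.05,
                              seed: int = 1) -> list[float]:
    """Hypothetical helper: simulate two-arm trials and return the observed ARR
    of those that are significant in favour of treatment (two-sided z-test)."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    p_treat = p_control - true_arr
    significant = []
    for _ in range(n_trials):
        # Deaths in each arm, drawn patient by patient.
        deaths_c = sum(rng.random() < p_control for _ in range(n_per_arm))
        deaths_t = sum(rng.random() < p_treat for _ in range(n_per_arm))
        obs_arr = (deaths_c - deaths_t) / n_per_arm
        pooled = (deaths_c + deaths_t) / (2 * n_per_arm)
        se = math.sqrt(2 * pooled * (1 - pooled) / n_per_arm)  # pooled standard error
        if se > 0 and obs_arr / se > z_crit:  # significant, favouring treatment
            significant.append(obs_arr)
    return significant


if __name__ == "__main__":
    hits = simulate_significant_arrs(0.50, 0.02, n_per_arm=475, n_trials=2000)
    print(f"{len(hits)} of 2000 trials significant")
    print(f"mean observed ARR among them: {sum(hits) / len(hits):.1%} (true ARR 2.0%)")
```

At this sample size a trial can only reach significance if its observed ARR clears roughly 6 percentage points, so every significant trial overstates the true 2% effect by a factor of three or more, even though the effect it points to is real.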

What does this all mean? Evidence-based medicine has many pitfalls when it comes to translating research into practice. I am very hesitant to change my practice based on a single study, and very skeptical about positive studies, particularly those that stand in isolation. There is hope on the way, though. As research networks grow, the capacity for larger multi-centre trials is improving, and new study designs are being developed to improve the accuracy and efficiency of research.

In the meantime, be skeptical about all research, and remember that any study is just a single piece of an extremely large and complicated jigsaw puzzle.

3 Comments

Mark from Australia wrote 02-21-2014 04:28:30 pm
I couldn't agree more! Type 1 error (a chance finding of a difference) is almost never discussed in positive studies (nor in meta-analyses) and, coupled with publication bias, leads to clinical over-confidence in some fairly ho-hum therapies. In my main field (anesthesiology) there is rarely an acute pain trial with more than 100 participants, and they are almost never repeated, making me a bit chary of "evidence-based guidelines".

Mahesh from Australia wrote 02-23-2014 08:22:43 pm
The comment regarding collaboration and large multicentre trials is very important. Cluster randomisation across a large number of ICUs is going to be the way forward for ICU trials (as demonstrated by Huang et al, NEJM 2013). Otherwise, as you've nicely explained, our evidence base will continue to be underpowered.

Christopher from New Zealand wrote 03-03-2014 08:04:01 am
Thanks for your comments. I'm glad you mentioned cluster randomization, Mahesh. That certainly seems the way forward for answering the big questions comparing two commonly used therapies head to head (i.e. saline and Plasma-Lyte in the soon-to-be-commenced SPLIT study), but it does require equipoise, such that entire units are comfortable with either therapy. This may be a barrier for many questions.