14 December 2015

My (current) position on the PACE trial

I have written this post principally for people who have started following me (formally on Twitter, or in some other way) because of my somewhat peripheral involvement in the PACE trial discussions.

First off, while I try to be reasonably politically correct, I don't always get all the details right.  I've tried to be respectful to all involved here.  In particular, someone told me that "CFS/ME" is not always an appropriate label to use.  I hope anyone who thinks that will allow me a pass on that, from my position of ignorance.

I've learned a lot about CFS/ME over the past few weeks.  Some of what I've been told --- but above all, what I've observed --- about how some of the science has been conducted, has disturbed me.  The people whose opinions I tend to trust on most issues, who usually put science ahead of their personal political position, seem to be pretty much unanimous that the PACE trial data need to be released so that disinterested parties can examine them.

But I want to make it clear that I have no specific interest in CFS/ME.  I don't personally know anyone who suffers from it, and it's not something I've really ever thought about much.  I don't especially want to become an advocate for patients, except to the extent that, having had my own health problems in the last couple of years, I wish every sick person a speedy recovery and access to the finest medical treatment they can get.  So I'm not sure I can even call myself an "ally"; allies have to take a non-trivial position, and I don't think my position here is much more than trivial.  If the PACE trial data emerge tomorrow, I will not personally be reanalysing them.  I don't know enough about this kind of study to do so.

What I do care about is the integrity of science.  You can see this, I hope, if you Google some of the stuff I've been doing in psychology.  Science, imperfect though it is, is about the only rational game in town when it comes to solving the problems facing society, and when scientists put their own interests above those of the wider community, it usually doesn't turn out well.

So, on to the PACE trial... I want to say that I can understand a lot of defensiveness on the part of the PACE researchers.  They have heard stories of others being harassed and even receiving death threats.  Maybe some of them have experienced this themselves.  For the purposes of this post (please bear with me!), I'm going to assume --- because I have no evidence to the contrary, and people generally don't make these accusations lightly --- that the stories of CFS/ME researchers being harassed in the past are true; arguably, for the purposes of this discussion, it doesn't make any difference whether they are true or not.  (Of course, in another context, such claims are very important, but let me try to put that aside for now.)  Apart from anything else, given the size of the CFS/ME community, it would be unreasonable not to expect that some fairly unpleasant people have also developed the condition.  We all know people like that, whatever our and their health status.  CFS/ME strikes people from all walks of life, including some saints and some sinners.

Now, with that said, I am unconvinced --- actually, "bewildered" would be a better word --- by the argument that releasing the data would somehow expose the researchers to (further) harassment.  Indeed, it seems to me that withholding the data plays directly into the hands of those who claim that the PACE researchers have "something to hide", and they are presumably the most likely to escalate their anger into harassment.  I actually don't believe that the researchers have anything to hide, in the sense of feeling guilty because they did something bad in their analyses.  I've seen enough cases like this in my working life to know that incompetence --- generally in the form of a misplaced sense of loyalty to a group rather than to the wider truth and public interest --- is always to be preferred as an alternative explanation to malice, first because malice is harder to prove, and second because it just almost always turns out to be the case that incompetence was behind a screw-up.

About the only reason I can sort of imagine for the argument that releasing the data might lead to harassment of the researchers is if the alternative were for the question to somehow go away.  That's perhaps a reasonable argument with some political issues; for example, there is (I think) a legitimate debate to be had over whether it's helpful to reproduce, say, cartoons that might cause people to get over-excited, when they could just be left to one side.  But that's simply not going to happen here.  People with a chronic, debilitating condition, and no cure in sight, are not going to suddenly forget tomorrow that they have that condition.  So far, none of the replies to people who have asked for the data, and been told it will lead to harassment, have explained the mechanism by which that is supposed to happen.

The researchers' argument also seems to conflate the presence in the CFS/ME activist community of some unpleasant people --- which, again, for the sake of this discussion, I will assume is probably true --- with the idea that "anyone from the CFS/ME activist community who asks about PACE is probably trying to harass us".  This is not good logic.  It's what leads airline passengers to demand that Muslim passengers be thrown off their plane.  It's called the base rate fallacy, and avoiding it is supposed to be what scientists --- particularly, for goodness' sake, scientists involved in epidemiology --- are good at.

A further problem with the arguments that a request for the data --- whether it comes from patients with scientific training, or scientists such as Jim Coyne --- is designed to be "vexatious" or to "lack serious purpose" or that its intent is "polemical" (all terms used by King's in their reply to Coyne), is that such arguments are utterly unfalsifiable.  Given the public profile of this matter, essentially anyone who asks for the data is going to have their credentials examined, and unless they meet the unspecified high standards of the researchers, they won't get to see the data.  (Yes, Jim Coyne --- who, full disclosure, is my PhD supervisor --- can be a bit shouty at times.  But this is not kindergarten.  Scientists don't get to withhold data from other scientists just because they don't play nice.  Ask any scientist if science is about robust disagreement and you will get a "Yes", but if that idealism isn't maintained when actual robust disagreement takes place, then we might as well conduct the whole process through everything-is-fine press releases.)

Actually, in their reply to Coyne, King's College did seem to give a hint as to who might be allowed to see the data, in their statement "We would expect any replication of data to be carried out by a trained Health Economist", with a nice piece of innuendo carried over from the preceding sentence that this health economist had better have a lot of free time, because the original analysis took a year to complete.  This suggests that unless you declare your qualifications as an unemployed health economist, you aren't going to be judged worthy to see the data (and if you come up with conclusions after a week, it might well be suggested that you didn't look hard enough).  But the idea that it will take a year, or indeed need specialised training in health economics, to determine whether the Fisher's exact tests from the contingency tables were calculated correctly, or whether the results really show that people got better over the course of the study, is absurd.  Apart from anything else, science is about communicating your results in a coherent manner to the rest of the scientific community.  If you submit an article and then claim that its principal conclusions cannot be verified except by a few dozen highly trained specialists with a year's effort, that's an admission right there that your article has failed.  Of course there will be questions of interpretation, over things like what "getting better" means, but nobody should have to accept the researchers' claims that their interpretation is the right one.  There needs to be a debate, so that a consensus, if one is possible, can emerge.  (Who knows?  Maybe the evidence for CBT is overwhelming.  There are plenty of neutral scientists who can reach a fair conclusion about that, but right now, they are being deprived of the opportunity to do so.)

A further point about the failure to share data is that the researchers agreed, when they published in PLoS ONE, to make their data available to anyone who asked for it.  This is a condition of publishing in that journal.  You can't have the cake of "we're transparent, we published in an open access journal" and then eat that cake too with "but you can't see the data".  PLoS ONE must insist that the authors release the data as they agreed to do as a condition of publication, or else retract the article because their conditions of publication have been breached.  See Klaas van Dijk's formal request in this regard.

These data are undoubtedly going to come out at some point anyway.  The UK's Information Commissioner will see to that, even if PLoS ONE doesn't persuade the authors to release the data.  As the risk management specialist Peter Sandman points out, openness and transparency at the earliest possible stage translate into reduced pain and costs further down the line.

I want to end with a small apology.  I wrote a post yesterday on an unrelated topic (OK, it was also critical of some poor science, but the relation with the subject of this post was peripheral).  Two people submitted comments on that post which drew a link with the PACE trial.  After some thought, I decided not to publish those comments, as I wanted to keep discussion on that other post on-topic.  I apologise to the authors of those comments that Blogger.com's moderation system did not let me explain the reasons why they were not published.  I would happily publish those same comments on this post; indeed, I will publish pretty much any reasonable comments on this post.

13 December 2015

Digging further into the Bos and Cuddy study

*** Post updated 2015-12-19 20:00 UTC
*** See end of post for a solution that matches the reported percentages and chi-squares.
A few days ago, I blogged about Professor Amy Cuddy's op-ed piece in the New York Times, in which she cited a non-published, non-peer reviewed study about "iPosture" by Bos and Cuddy of how people allegedly deferred more to authority when they used smaller (versus larger) computing devices, because using smaller devices caused them to hunch (sorry, "iHunch") more, and then something something assertiveness something something testosterone and cortisol something.  (The authors apparently didn't do anything as radical as to actually measure, or even observe, how much people hunched, if at all; they took it for granted that "smaller device = bigger iHunch", so that the only possible explanation for the behaviours they observed was the one they hypothesized.  As I noted in that other post, things are so much easier if you bypass peer review.)

Just for fun, I thought I'd try and reconstruct the contingency tables for "people staying on until the experimenter came and asked them to leave the room" from the Bos and Cuddy article, mainly because I wanted to make my own estimate of the effect size.  Bos and Cuddy reported this as "[eta] = .374", but I wanted to experiment with other ways of measuring it.

In their Figure 1, which I have taken the liberty of reproducing below (I believe that this is fair use, according to Harvard's Open Access Policy, which is to be found here), Bos and Cuddy reported (using the dark grey bars) the percentage of participants who left the room to go and collect their pay, before the experimenter returned.  Those figures are 50%, 71%, 88%, and 94%.  The authors didn't specify how many participants were in each condition, but they had 75 people and 4 conditions (phone, tablet, laptop, desktop), and they stated that they randomised each participant to one condition.  So you would expect to find three groups of 19 participants and one of 18.

However, it all gets a bit complicated here.  It's not possible to obtain all four of the percentages that were reported (50%, 71%, 88%, and 94%), rounded conventionally, from a whole number of participants out of 18 or 19.  Specifically, you can take 9 out of 18 and get 50%, or you can take 17 out of 18 and get 94% (0.9444, rounded down), but you can't get 71% or 88%, with either 18 or 19 as the cell size.  So that suggests that the groups must have been of uneven size.  I enumerated all the possible combinations of four cell sizes from 13 to 25 which added up to 75 and also allowed for the percentages of participants who left the room, correctly rounded, to be one of the integers we're looking for.  Here are those possible combinations, with the total numbers of participants first and the percentage and number of leavers in parentheses:

14 (50%=7), 21 (71%=15), 24 (88%=21), 16 (94%=15)
18 (50%=9), 24 (71%=17), 16 (88%=14), 17 (94%=16)
20 (50%=10), 21 (71%=15), 16 (88%=14), 18 (94%=17)
20 (50%=10), 14 (71%=10), 24 (88%=21), 17 (94%=16)
22 (50%=11), 21 (71%=15), 16 (88%=14), 16 (94%=15)

Well, I guess that's also "randomised" in a sense.  But if your sample sizes are uneven like this, and you don't report it, you're not helping people to understand your experiment.
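For anyone who wants to repeat the enumeration above, here is a minimal Python sketch. It is not necessarily how the original search was done; the function names are made up for illustration, and it assumes conventional half-up rounding to whole percentages, cell sizes from 13 to 25, and the percentages in the fixed order 50%, 71%, 88%, 94%. Under those assumptions it may list more assignments than appear in this post.

```python
from itertools import product

TOTAL = 75                   # participants reported by Bos and Cuddy
TARGETS = (50, 71, 88, 94)   # reported percentages of leavers, per condition
SIZES = range(13, 26)        # candidate cell sizes, 13 to 25 inclusive


def round_half_up_pct(k, n):
    """Percentage k/n rounded half-up to an integer, using integer arithmetic only."""
    return (200 * k + n) // (2 * n)


def cells_for(target):
    """All (cell size, leavers) pairs whose rounded percentage equals target."""
    return [(n, k) for n in SIZES for k in range(n + 1)
            if round_half_up_pct(k, n) == target]


def enumerate_assignments():
    """All ways to pick one cell per reported percentage with sizes summing to TOTAL."""
    options = [cells_for(t) for t in TARGETS]
    return [combo for combo in product(*options)
            if sum(n for n, _ in combo) == TOTAL]


if __name__ == "__main__":
    for combo in enumerate_assignments():
        print(", ".join(f"{n} ({t}%={k})" for (n, k), t in zip(combo, TARGETS)))
```

Integer arithmetic is used for the rounding so that borderline cases such as 14 out of 16 (exactly 87.5%) are not at the mercy of floating-point error or of Python's round-half-to-even behaviour.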

But maybe they still round their numbers by hand at Harvard for some reason, and sometimes they make mistakes.  So let's see if we can get to within one point of those percentages (49% or 51% instead of 50%, 70% or 72% instead of 71%, etc).  And it turns out that we can, just, as shown in the figure below, in which yellow cells are accurately-reported percentages, and orange cells are "off by one".  We can take 72% for N=18 instead of 71%, and 89% for N=19 instead of 88%.  But then, we only have a sample size of 73.  So we could allow another error, replacing 94% for N=18 with 95% for N=19, and get up to a sample of 74.  Still not right.  So, even allowing for three of their four percentages to be misreported, the per-cell sample sizes must have been unequal.

However, if I was going to succeed in my original aim of reconstructing plausible contingency tables, there would be too many combinations to enumerate if I included these "off-by-one" percentages.  So I went back to the five possible combinations of numbers that didn't involve a reporting error in the percentages, and computed the chi-square values for the contingency tables implied by those numbers, using the online calculator here.  They came out between 10.26 and 12.37, with p values from .016 to .006; this range brackets the numbers reported by Bos and Cuddy (chi-square 11.03, p = .012), but none of them matches those values exactly; the closest is the last set (22, 21, 16, 16) with a chi-square of 11.22 and a p of .011.

So, I'm going to tentatively presume that in fact the sample sizes were all equal (give or take one for not having a number of participants divisible by four), and it's in fact the percentages on the dark grey bars in Bos and Cuddy's Figure 1 that are wrong.  For example, if I build this contingency table:

             Phone  Tablet  Laptop  Desktop
Left             9      14      16       18
Stayed           9       5       3        1
% Leavers      50%     74%     84%      95%

then the sample size adds up to 75, the per-condition sample sizes are equal, and the chi-square is 11.086 and the p value is .0113.  That was the closest I could get to the values of 11.03 and .012 in the article, although of course I could have missed something.  These numbers are close enough, I guess, although I'm not sure if I'd want to get on an aircraft built with this degree of attention to detail; we still have inaccuracies in three of the four percentages as well as the approximate chi-square statistic and p value.
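The chi-square and p values quoted above came from an online calculator, but they can also be checked with a few lines of Python. The sketch below uses only the standard library; the function names are hypothetical, and the p value relies on the closed-form survival function of the chi-square distribution with 3 degrees of freedom (which is what a 2 x 4 table gives). Cramér's V is included as one possible alternative effect size; that choice is mine for illustration and is not something either post names.

```python
import math


def chi_square_2xk(leavers, sizes):
    """Pearson chi-square for a 2 x k table of leavers/stayers per condition."""
    stayers = [n - k for k, n in zip(leavers, sizes)]
    total = sum(sizes)
    chi2 = 0.0
    for row in (leavers, stayers):
        row_total = sum(row)
        for observed, col_total in zip(row, sizes):
            expected = row_total * col_total / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def p_value_df3(chi2):
    """Upper-tail p value for chi-square with exactly 3 degrees of freedom."""
    return (math.erfc(math.sqrt(chi2 / 2))
            + math.sqrt(2 * chi2 / math.pi) * math.exp(-chi2 / 2))


def cramers_v(chi2, total, n_rows=2, n_cols=4):
    """Cramér's V, an effect size for contingency tables (illustrative choice)."""
    return math.sqrt(chi2 / (total * min(n_rows - 1, n_cols - 1)))


if __name__ == "__main__":
    # The equal-sized table built above: 9, 14, 16, 18 leavers out of 18, 19, 19, 19.
    leavers, sizes = [9, 14, 16, 18], [18, 19, 19, 19]
    chi2 = chi_square_2xk(leavers, sizes)
    # The post reports chi-square 11.086 and p = .0113 for this table.
    print(f"chi-square = {chi2:.3f}, p = {p_value_df3(chi2):.4f}, "
          f"V = {cramers_v(chi2, sum(sizes)):.3f}")
```

The same functions can be pointed at any of the uneven assignments listed earlier by passing the leavers and cell sizes as two lists in matching order.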

Normally in circumstances like this, I'd think about leaving a comment on the article on PubPeer.  But it seems that, in bypassing the normal academic publishing process, Professor Cuddy has found a brilliant way of avoiding, not just regular peer review, but post-publication peer review as well.  In fact, unless the New York Times directs its readers to my blog (or another critical review) for some reason, Bos and Cuddy's study is impregnable by virtue of not existing in the literature.

PS:  This tweet, about the NY Times article, makes an excellent point:
Presumably we should all adopt the wide, expansive pose of the broadsheet newspaper reader. Come to think of it, in much of the English-speaking world at least, broadsheets are typically associated with higher status than tabloids.  Psychologists! I've got a study for you...

PPS: The implications of the light grey bars, showing the mean time taken to leave the room by those who didn't stay for the full 10 minutes, are left as an exercise for the reader.  In the absence of standard deviations (unless someone wants to reconstruct possible values for those from the ANOVA), perhaps we can't say very much, but it's interesting to try and construct numbers that match those means.

*** Update 2015-12-19 20:00 UTC: An alert reader has pointed out that there is another possible assignment of subjects to the conditions:
16 (50%=8), 24 (71%=17), 17 (88%=15), 18 (94%=17)
This gives the Chi-square of 11.03 and p of .012 reported in the article.
So I guess my only remaining complaint (apart from the fact that the article is being used to sell a book without having undergone peer review) is that the uneven cell sizes per condition were not reported.  This is actually a surprisingly common problem, even in the published literature.

A cute story to be told, and self-help books to be sold - so who needs fuddy-duddy peer review?

Daniel Kahneman's warning of a looming train wreck in social psychology took another step closer towards realisation today with the publication of this opinion piece in the New York Times.

In the article, entitled "Your iPhone Is Ruining Your Posture — and Your Mood", Professor Amy Cuddy of Harvard Business School reports on "preliminary research" (available here) that she performed with her colleague, Maarten Bos.  Basically, they gave some students some Apple gadgets to play with, ranging in size from an iPhone up to a full-size desktop computer.  The experimenter gave the participants some filler tasks, and then left, telling them that s/he would be back in five minutes to debrief and pay them, but that they could also come and get him/her at the desk outside.  S/he then didn't come back after five minutes as announced, but instead waited ten minutes.  The main outcome variable was whether the participants came to get their money, and if they did how long they waited before doing so, as a function of the size of the device that they had.  This was portrayed as a measure of their assertiveness, or lack thereof.

It turned out that, the smaller the device, the longer they waited, thus showing reduced assertiveness.  The authors' conclusion was that this was caused by the fact that, to use a smaller device, participants had to slouch over more.  The authors even have a cute name for this: the "iHunch".  And — drumroll please, here's the social priming bit — the fact that the participants with smaller devices were hunched over more made them more submissive to authority, which made them more reluctant to go and tell the researcher that they were ready to get paid their $10 participation fee and go home.

It's hard to know where to begin with this.  There are other plausible explanations, starting with the fact that a lot of people don't have an iPhone and might well enjoy playing with one compared to their Android phone, whereas a desktop computer is still just a desktop computer, even if it is a Mac.  And the effect size was pretty large: the partial eta-squared of the headline result is .177, which should be compared to Cohen's (1988) description of a partial eta-squared of .14 as a "large" effect.  Oh, and there were 75 participants in four conditions, making a princely 19 per cell.  In other words, all the usual suspect things about priming studies.

But what I find really annoying here is that we've gone straight from "preliminary research" to the New York Times without any of those awkward little academic niceties such as "peer review".  The article, in "working paper" form (1,000 words) is here; check out the date (May 2013) and ask yourself why this is suddenly front-page news when, after 30 months, the authors don't seem to have had time to write a proper article and send it to a journal, although one of them did have time to write 845 words for an editorial in the New York Times.  But perhaps those 845 words didn't all have to be written from scratch, because — oh my, surprise surprise — Professor Cuddy is "the author of the forthcoming book 'Presence: Bringing Your Boldest Self to Your Biggest Challenges.'"  Anyone care to take a guess as to whether this research will appear in that book, and whether its status as an unreviewed working paper will be prominently flagged up?

If this is the future, writing up your study pro forma and getting it into what is arguably the world's leading newspaper, complete with a cute message that will appeal to anyone who thinks that everybody else uses their smartphone too much, then maybe we should just bring on the train wreck now.

*** Update 2015-12-17 09:50 UTC: I added a follow-up post here. ***

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.