13 December 2015

Digging further into the Bos and Cuddy study

*** Post updated 2015-12-19 20:00 UTC
*** See end of post for a solution that matches the reported percentages and chi-squares.
A few days ago, I blogged about Professor Amy Cuddy's op-ed piece in the New York Times, in which she cited an unpublished, non-peer-reviewed study about "iPosture" by Bos and Cuddy on how people allegedly deferred more to authority when they used smaller (versus larger) computing devices, because using smaller devices caused them to hunch (sorry, "iHunch") more, and then something something assertiveness something something testosterone and cortisol something.  (The authors apparently didn't do anything as radical as to actually measure, or even observe, how much people hunched, if at all; they took it for granted that "smaller device = bigger iHunch", so that the only possible explanation for the behaviours they observed was the one they hypothesized.  As I noted in that other post, things are so much easier if you bypass peer review.)

Just for fun, I thought I'd try and reconstruct the contingency tables for "people staying on until the experimenter came and asked them to leave the room" from the Bos and Cuddy article, mainly because I wanted to make my own estimate of the effect size.  Bos and Cuddy reported this as "[eta] = .374", but I wanted to experiment with other ways of measuring it.

In their Figure 1, which I have taken the liberty of reproducing below (I believe that this is fair use, according to Harvard's Open Access Policy, which is to be found here), Bos and Cuddy reported (using the dark grey bars) the percentage of participants who left the room to go and collect their pay, before the experimenter returned.  Those figures are 50%, 71%, 88%, and 94%.  The authors didn't specify how many participants were in each condition, but they had 75 people and 4 conditions (phone, tablet, laptop, desktop), and they stated that they randomised each participant to one condition.  So you would expect to find three groups of 19 participants and one of 18.

However, it all gets a bit complicated here.  It's not possible to obtain all four of the percentages that were reported (50%, 71%, 88%, and 94%), rounded conventionally, from a whole number of participants out of 18 or 19.  Specifically, you can take 9 out of 18 and get 50%, or you can take 17 out of 18 and get 94% (0.9444, rounded down), but you can't get 71% or 88%, with either 18 or 19 as the cell size.  So that suggests that the groups must have been of uneven size.  I enumerated all the possible combinations of four cell sizes from 13 to 25 which added up to 75 and also allowed for the percentages of participants who left the room, correctly rounded, to be one of the integers we're looking for.  Here are those possible combinations, with the total numbers of participants first and the percentage and number of leavers in parentheses:

14 (50%=7), 21 (71%=15), 24 (88%=21), 16 (94%=15)
18 (50%=9), 24 (71%=17), 16 (88%=14), 17 (94%=16)
20 (50%=10), 21 (71%=15), 16 (88%=14), 18 (94%=17)
20 (50%=10), 14 (71%=10), 24 (88%=21), 17 (94%=16)
22 (50%=11), 21 (71%=15), 16 (88%=14), 16 (94%=15)
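
For anyone who wants to repeat this search, here is a minimal brute-force sketch. It assumes "rounded conventionally" means rounding halves up, and the function names are made up for illustration. An exhaustive search like this may well turn up candidates that a hand-built list misses.

```python
from itertools import product

TARGETS = (50, 71, 88, 94)   # reported percentages of leavers, in the order given
TOTAL = 75                   # reported total sample size
SIZES = range(13, 26)        # cell sizes considered: 13 to 25 inclusive


def rounded_pct(k, n):
    """Percentage k/n rounded half up, using integer arithmetic only."""
    return (200 * k + n) // (2 * n)


def leavers_for(n, target):
    """Return a leaver count k with rounded_pct(k, n) == target, or None."""
    for k in range(n + 1):
        if rounded_pct(k, n) == target:
            return k
    return None


def candidate_designs():
    """All (cell size, leavers) assignments that sum to TOTAL and hit every target."""
    per_target = [
        [(n, leavers_for(n, t)) for n in SIZES if leavers_for(n, t) is not None]
        for t in TARGETS
    ]
    return [combo for combo in product(*per_target)
            if sum(n for n, _ in combo) == TOTAL]


if __name__ == "__main__":
    for combo in candidate_designs():
        print(", ".join(f"{n} ({t}%={k})" for (n, k), t in zip(combo, TARGETS)))
```

Each printed line uses the same "size (percentage=leavers)" layout as the list above.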

Well, I guess that's also "randomised" in a sense.  But if your sample sizes are uneven like this, and you don't report it, you're not helping people to understand your experiment.

But maybe they still round their numbers by hand at Harvard for some reason, and sometimes they make mistakes.  So let's see if we can get to within one point of those percentages (49% or 51% instead of 50%, 70% or 72% instead of 71%, etc).  And it turns out that we can, just, as shown in the figure below, in which yellow cells are accurately-reported percentages, and orange cells are "off by one".  We can take 72% for N=18 instead of 71%, and 89% for N=19 instead of 88%.  But then, we only have a sample size of 73.  So we could allow another error, replacing 94% for N=18 with 95% for N=19, and get up to a sample of 74.  Still not right.  So, even allowing for three of their four percentages to be misreported, the per-cell sample sizes must have been unequal.
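
The off-by-one check can be sketched in a few lines. This is only an illustration, again assuming half-up rounding and using invented function names: for each reported percentage it finds which of the two "equal" cell sizes (18 or 19) can produce a value within one point, then adds up the largest usable sizes.

```python
TARGETS = (50, 71, 88, 94)   # reported percentages of leavers
SIZES = (18, 19)             # the cell sizes you'd expect with 75 people in 4 groups


def rounded_pct(k, n):
    """Percentage k/n rounded half up, using integer arithmetic only."""
    return (200 * k + n) // (2 * n)


def reachable_sizes(target, tolerance=1):
    """Cell sizes for which some leaver count rounds to within `tolerance` of target."""
    return [
        n for n in SIZES
        if any(abs(rounded_pct(k, n) - target) <= tolerance for k in range(n + 1))
    ]


def largest_total(tolerance=1):
    """Biggest overall sample obtainable if every cell uses its largest usable size."""
    return sum(max(reachable_sizes(t, tolerance), default=0) for t in TARGETS)


if __name__ == "__main__":
    for t in TARGETS:
        print(t, reachable_sizes(t))
    print("largest total:", largest_total())
```

With a tolerance of one point, the largest total this finds is 74, one short of the 75 participants reported.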

However, if I was going to succeed in my original aim of reconstructing plausible contingency tables, there would be too many combinations to enumerate if I included these "off-by-one" percentages.  So I went back to the five possible combinations of numbers that didn't involve a reporting error in the percentages, and computed the chi-square values for the contingency tables implied by those numbers, using the online calculator here.  They came out between 10.26 and 12.37, with p values from .016 to .006; this range brackets the numbers reported by Bos and Cuddy (chi-square 11.03, p = .012), but none of them matches those values exactly; the closest is the last set (22, 21, 16, 16) with a chi-square of 11.22 and a p of .011.
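
If you'd rather not rely on an online calculator, the same numbers can be checked with a short script. This is a sketch, not the calculator's method: it assumes a plain Pearson chi-square without continuity correction, and it uses a closed-form tail probability that only holds for 3 degrees of freedom (four conditions, two outcomes). The table layout and function names are my own.

```python
from math import erfc, exp, pi, sqrt

# (cell size, leavers) per condition for the five candidate designs.
# For the last design, 50% of 22 participants means 11 leavers.
CANDIDATES = [
    [(14, 7), (21, 15), (24, 21), (16, 15)],
    [(18, 9), (24, 17), (16, 14), (17, 16)],
    [(20, 10), (21, 15), (16, 14), (18, 17)],
    [(20, 10), (14, 10), (24, 21), (17, 16)],
    [(22, 11), (21, 15), (16, 14), (16, 15)],
]


def chi_square(cells):
    """Pearson chi-square (no continuity correction) for a 2 x k leavers/stayers table."""
    total = sum(n for n, _ in cells)
    total_left = sum(k for _, k in cells)
    stat = 0.0
    for n, k in cells:
        for observed, row_total in ((k, total_left), (n - k, total - total_left)):
            expected = n * row_total / total
            stat += (observed - expected) ** 2 / expected
    return stat


def p_value_df3(stat):
    """Upper-tail p value of a chi-square statistic with exactly 3 degrees of freedom."""
    return erfc(sqrt(stat / 2)) + sqrt(2 * stat / pi) * exp(-stat / 2)


if __name__ == "__main__":
    for cells in CANDIDATES:
        stat = chi_square(cells)
        sizes = ", ".join(str(n) for n, _ in cells)
        print(f"({sizes}): chi-square = {stat:.2f}, p = {p_value_df3(stat):.3f}")
```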

So, I'm going to tentatively presume that in fact the sample sizes were all equal (give or take one for not having a number of participants divisible by four), and it's in fact the percentages on the dark grey bars in Bos and Cuddy's Figure 1 that are wrong.  For example, if I build this contingency table:

Leavers     9    14    16    18
Stayers     9     5     3     1
% Leavers  50%   74%   84%   95%

then the sample size adds up to 75, the per-condition sample sizes are equal, and the chi-square is 11.086 and the p value is .0113.  That was the closest I could get to the values of 11.03 and .012 in the article, although of course I could have missed something.  These numbers are close enough, I guess, although I'm not sure if I'd want to get on an aircraft built with this degree of attention to detail; we still have inaccuracies in three of the four percentages as well as the approximate chi-square statistic and p value.

Normally in circumstances like this, I'd think about leaving a comment on the article on PubPeer.  But it seems that, in bypassing the normal academic publishing process, Professor Cuddy has found a brilliant way of avoiding, not just regular peer review, but post-publication peer review as well.  In fact, unless the New York Times directs its readers to my blog (or another critical review) for some reason, Bos and Cuddy's study is impregnable by virtue of not existing in the literature.

PS:  This tweet, about the NY Times article, makes an excellent point:
Presumably we should all adopt the wide, expansive pose of the broadsheet newspaper reader. Come to think of it, in much of the English-speaking world at least, broadsheets are typically associated with higher status than tabloids.  Psychologists! I've got a study for you...

PPS: The implications of the light grey bars, showing the mean time taken to leave the room by those who didn't stay for the full 10 minutes, are left as an exercise for the reader.  In the absence of standard deviations (unless someone wants to reconstruct possible values for those from the ANOVA), perhaps we can't say very much, but it's interesting to try and construct numbers that match those means.

*** Update 2015-12-19 20:00 UTC: An alert reader has pointed out that there is another possible assignment of subjects to the conditions:
16 (50%=8), 24 (71%=17), 17 (88%=15), 18 (94%=17)
This gives the Chi-square of 11.03 and p of .012 reported in the article.
So I guess my only remaining complaint (apart from the fact that the article is being used to sell a book without having undergone peer review) is that the uneven cell sizes per condition were not reported.  This is actually a surprisingly common problem, even in the published literature.
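
To compare the equal-cells table from earlier in the post with the reader's solution, and to try one alternative effect size, something like the following sketch would do. Cramér's V is my own choice of measure here, not one used in the article, and there is no reason to expect it to equal the reported eta. The names are illustrative.

```python
from math import sqrt

# (cell size, leavers) per condition
EQUAL_CELLS = [(18, 9), (19, 14), (19, 16), (19, 18)]       # my near-equal table
READER_SOLUTION = [(16, 8), (24, 17), (17, 15), (18, 17)]   # from the update


def chi_square(cells):
    """Pearson chi-square (no continuity correction) for a 2 x k leavers/stayers table."""
    total = sum(n for n, _ in cells)
    total_left = sum(k for _, k in cells)
    stat = 0.0
    for n, k in cells:
        for observed, row_total in ((k, total_left), (n - k, total - total_left)):
            expected = n * row_total / total
            stat += (observed - expected) ** 2 / expected
    return stat


def cramers_v(cells):
    """Cramér's V; with only two rows this reduces to sqrt(chi-square / N)."""
    total = sum(n for n, _ in cells)
    return sqrt(chi_square(cells) / total)


if __name__ == "__main__":
    for name, cells in (("equal cells", EQUAL_CELLS), ("reader", READER_SOLUTION)):
        print(f"{name}: chi-square = {chi_square(cells):.3f}, V = {cramers_v(cells):.3f}")
```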
