27 April 2017

An open letter to Dr. Todd Shackelford

To the editor of Evolutionary Psychological Science:

Dear Dr. Shackelford,

On April 24, 2017, in your capacity as editor of Evolutionary Psychological Science, you issued an Editorial Note that referenced the article "Eating Heavily: Men Eat More in the Company of Women," by Kevin M. Kniffin, Ozge Sigirci, and Brian Wansink (Evolutionary Psychological Science, 2016, Vol. 2, No. 1, pp. 38–46).

The key point of the note is that the "authors report that the units of measurement for pizza and salad consumption were self-reported in response to a basic prompt 'how many pieces of pizza did you eat?' and, for salad, a 13-point continuous rating scale."

For comparison, here is the description of the data collection method from the article (p. 41):
Consistent with other behavioral studies of eating in naturalistic environments (e.g., Wansink et al. 2012), the number of slices of pizza that diners consumed was unobtrusively observed by research assistants and appropriate subtractions for uneaten pizza were calculated after waitstaff cleaned the tables outside of the view of the customers. In the case of salad, customers used a uniformly small bowl to self-serve themselves and, again, research assistants were able to observe how many bowls were filled and, upon cleaning by the waitstaff, make appropriate subtractions for any uneaten or half-eaten bowls at a location outside of the view of the customers.
It is clear that this description was, to say the least, not an accurate representation of the research record.  Nobody observed the number of slices of pizza.  Nobody counted partial uneaten slices when the plates were bussed.  Nobody made any surreptitious observations of salad either.  All consumption was self-reported.  It is difficult to imagine how this 100-plus word description could have accidentally slipped into an article.

Even if we ignore what appears to have been a deliberately misleading description of the method, there is a further very substantial problem now that the true method is known.  That is, the entire study would seem to depend on the amounts of food consumed having been accurately and objectively measured. Hence, the use of self-report measures of food consumption (which are subject to obvious biases, including questions around desirability), when the entire focus of the article is on how much food people actually (and perhaps unconsciously, due to the influence of evolutionarily-determined forces) consumed in various social situations, would seem to cast severe doubt on the validity of the study.  The methods described in the Editorial Note and the article itself are thus contradictory, as they describe substantially different methodologies. The difference between real-time unobtrusive observations by others, versus post hoc self-reports, is both practically and theoretically significant in this case. 

Hence, we are surprised that you apparently considered that issuing an "Editorial Note" was the appropriate response to the disclosure by the authors that they had given an incorrect description of their methods in the article.  Anyone who downloads the article today will be unaware that the study simply did not take place as described, or that the results are probably confounded by the inevitable limitations of self-reporting.

Your note also fails to address a number of other discrepancies between the article and the dataset.  These include: (1) The data collection period, which the article reports as two weeks, but which the cover page for the dataset states was seven weeks; (2) The number of participants excluded for dining alone, which is reported as eight in the article but which appears to be six in the dataset; (3) The overall number of participants, which the article reports as 105, a number that is incompatible with the denominator degrees of freedom reported on five F tests on pp. 41–42 (109, 109, 109, 115, and 112).

In view of these problems, we believe that the only reasonable course of action in this case is to retract the article, and to invite the authors, if they wish, to submit a new manuscript with an accurate description of the methods used, including a discussion of the consequences of their use of self-report measures for the validity of their study.

Please note that we have chosen to publish this e-mail as an open letter here.   If you do not wish your reply to be published there, please let us know, and we will, of course, respect your wishes.


Nicholas J. L. Brown
Jordan Anaya
Tim van der Zee
James A. J. Heathers
Chris Chambers

12 April 2017

The final (maybe?) two articles from the Food and Brand Lab

It's been just over a week since Cornell University, and the Food and Brand Lab in particular, finally started to accept in public that there was something majorly wrong with the research output of that lab.  I don't propose to go into that in much detail here; it's already been covered by Retraction Watch and by Andrew Gelman on his blog.  As my quote in the Retraction Watch piece says, I'm glad that the many hours of hard, detailed, insanely boring work that my colleagues and I have put into this are starting to result in corrections to the scientific record.

The statement by Dr. Wansink contained a link to a list of articles for which he states that he has "reached out to the six journals involved to alert the editors to the situation".  When I clicked on that list, I was surprised to see two articles that neither my colleagues nor I had looked at yet.  I don't know whether Dr. Wansink decided to report these articles to the journals by himself, or perhaps someone else did some sleuthing and contacted him.  In any case, I thought that for completeness (and, of course, to oblige Tim van der Zee to update his uberpost yet again) I would have a look at what might be causing a problem with these two articles.

Wansink, B. (1994). Antecedents and mediators of eating bouts. Family and Consumer Sciences Research Journal, 23, 166–182. http://dx.doi.org/10.1177/1077727X94232005

Wansink, B. (1994). Bet you can’t eat just one: What stimulates eating bouts. Journal of Food Products Marketing, 1(4), 3–24. http://dx.doi.org/10.1300/J038v01n04_02

First up, there is a considerable overlap in the text of these two articles.  I estimate that 35–40% of the text from "Antecedents" has been recycled verbatim into "Bet", as shown in this image of the two articles side by side (I apologise for the small size of the page images from "Bet"):

The two articles present what appears to be the same study, from two different viewpoints (especially in the concluding sections, which as you can see above do not have any overlapping text) and with a somewhat different set of results reported. In "Antecedents", the theme is about education: broadly speaking, getting people to understand why they embark on phases of eating the same food, and the implications for dietary education.  In "Bet", by contrast, the emphasis is placed on food marketers; the aim is to get them to understand how they can encourage people to consume more of their product.  I suppose that, like the arms export policy of a country that sells arms to both sides in the same conflict, this could be viewed as hypocrisy or blissful neutrality.

The Method and Results sections show some curious discrepancies.  I assume the two articles must be describing the same study since the basic (212) and final (178) sample sizes are the same, and where the same item responses are reported in both articles, the numbers are generally identical, with one exception that I will mention below.  Yet some details differ for no obvious reason.  Thus, in "Antecedents", participants typically took 35 minutes to fill out a 19-page booklet, whereas in "Bet" they took 25 minutes to fill out an 11-page booklet.  In "Antecedents", the reported split between the kinds of food that participants discussed eating was 41% sweet, 29% salty, 16% dairy, and 14% "other".  In "Bet" the split was 52% sweet, 36% salty, and 12% "other".  The Cronbach's alpha reported for coder agreement was .87 in "Antecedents" but .94 in "Bet".

There are further inconsistencies in the main tables of results (Table 2 in "Antecedents", Table 1 in "Bet").  The principal measured variable changes from consumption intensity (i.e., the amount of the "eating bout" food that was consumed) to consumption frequency (the number of occasions on which the food was consumed), although the numbers remain the same.  The ratings given in response to the item "I enjoyed the food" are 0.8 lower in both conditions in "Bet" compared to "Antecedents".  On p. 14 of "Bet", the author reuses some text from "Antecedents" to describe the mean correlation between nutritiousness and consumption frequency, but inexplicably manages to copy the two correlations incorrectly from Table 2 and then calculate their mean incorrectly.

Finally, the F statistics and associated p values on p. 175 of "Antecedents" and pp. 12–13 of "Bet" have incorrectly reported degrees of freedom (177 should be 176) and in several cases, the p value is not, as claimed in the article, below .05.
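
For readers who want to try this kind of check themselves, here is a minimal Python sketch of the reasoning.  It rests on an assumption that is not stated in the articles: that each test compares two conditions, so that the numerator degrees of freedom are 1 and the denominator degrees of freedom are the final sample size minus 2.  The function names are my own, and the F values in the usage lines are purely illustrative; they are not taken from either article.

```python
import math


def denominator_df(n_participants, n_groups):
    """Denominator df for a one-way between-subjects F test: N minus groups."""
    return n_participants - n_groups


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def p_value_from_f(f_stat, df2, steps=2000):
    """p value for F(1, df2), using the identity F(1, df2) = t(df2) squared.

    The central area P(|T| <= sqrt(F)) is found by Simpson's rule
    (steps must be even); the p value is what is left in the two tails.
    """
    t = math.sqrt(f_stat)
    h = t / steps
    total = t_pdf(0.0, df2) + t_pdf(t, df2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df2)
    central_area = 2 * total * h / 3
    return 1 - central_area


# Final sample of 178, assumed two conditions
df2 = denominator_df(178, 2)
print(df2)

# Illustrative F values only, one on each side of the .05 threshold
for f_stat in (3.5, 4.0):
    print(f_stat, round(p_value_from_f(f_stat, df2), 3))
```

With 176 denominator degrees of freedom, the F statistic has to reach roughly 3.9 before the p value falls below .05, so any reported F noticeably smaller than that cannot support a claim of p < .05 under these assumptions.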

Is this interesting?  Well, less than six months ago it would have been major news.  But today, so much has changed that I don't expect many people to want to read a story saying "Cornell professor blatantly recycled sole-authored empirical article", just as you can't get many people to click on "President of the United States says something really weird".  Even so, I think this is important.  It shows, as did James Heathers' post from a couple of weeks ago, that the same problems we've been finding in the output of the Cornell Food and Brand Lab go back more than 20 years, past the period when that lab was headquartered at UIUC (1997–2005), through its brief period at Penn (1995–1997), to Dr. Wansink's time at Dartmouth.  When Tim gets round to updating his summary of our findings, we will be up to 44 articles and book chapters with problems, over 23 years.  That's a fairly large problem for science, I think.

You can find annotated versions of the articles discussed in this post here.