06 August 2019

Some instances of apparent duplicate publication by Dr. Mark D. Griffiths

According to his Twitter bio, Dr. Mark D. Griffiths is a Chartered Psychologist and Distinguished Professor of Behavioural Addiction at Nottingham Trent University. He is also a remarkably prolific researcher, with a Google Scholar h-index of 125. In this recent tweet he reports having just published the 879th paper of his career, which according to his faculty page works out to about 30 publications per year since he obtained his PhD. In fact, that number may be a low estimate, as I counted 1,166 lines in the "Journal Articles" section of his publications list; half of these were published since 2010, which would represent more than one article published per week, every week of the year, for the last ten years.

Now, even 30 publications a year represents a lot of writing. I'm sure that many of us would like to be able to produce even a third as many pieces as that. And helpfully, Dr. Griffiths has written a piece at Psychology Today in which he gives us "some general tips on how to make your writing more productive". But there is one tip that he didn't include, which is that copying and pasting extensively from one's previous manuscripts is a lot faster than typing new material.

Exhibit 1a

Here is a marked-up snapshot of a book chapter by Dr. Griffiths:
Griffiths, M. (2005). Internet abuse and addiction in the workplace. In M. Khosrow-Pour (Ed.), Encyclopedia of information science and technology (pp. 1623–1626). Hershey, PA: IGI Global.

The highlighted text appears to have been copied, verbatim and without attribution, from an earlier book chapter:
Griffiths, M. (2004). Internet abuse and addiction in the workplace: Issues and concerns for employers. In M. Anandarajan & C. A. Simmers (Eds.), Personal web usage in the workplace: A guide to effective human resources management (pp. 230–245). Hershey, PA: IGI Global.

Exhibit 1b

This is a snapshot of a journal article:
Griffiths, M. (2010). Internet abuse and internet addiction in the workplace. Journal of Workplace Learning, 22, 463–472. http://dx.doi.org/10.1108/13665621011071127

Over half of this article (the highlighted part in the above image), published six years later, consists of text that appears to have been copied, verbatim and without attribution, from the same 2004 book chapter that was the source for the 2005 chapter depicted in Exhibit 1a.

One might, perhaps, have expected an article on the important and fast-moving topic of workplace Internet abuse to consist entirely of new material after such a long time. Apparently, however, Dr. Griffiths considered his work from 2004 to still be highly relevant (and his assignment of copyright to the publisher of the 2004 and 2005 books to be of minor importance). Indeed, the 2010 article does not contain the terms "social media", "Twitter", "Facebook", "YouTube", or even "MySpace", although, like the earlier chapter, it does mention, perhaps somewhat anachronistically, the existence of web sites that host "internet versions of widely available pornographic magazines".

Exhibit 2a

Next up is another journal article:
Griffiths, M. (2009). Internet help and therapy for addictive behavior. Journal of CyberTherapy and Rehabilitation, 2, 43–52. (No DOI.)
The highlighted portions of this article appear to have been copied, verbatim and without attribution, from the following sources:
Yellow: Griffiths, M. (2005). Online therapy for addictive behaviors. CyberPsychology & Behavior, 8, 555–561. http://dx.doi.org/10.1089/cpb.2005.8.555
Green: Wood, R. T. A., & Griffiths, M. D. (2007). Online guidance, advice, and support for problem gamblers and concerned relatives and friends: An evaluation of the GamAid pilot service. British Journal of Guidance & Counselling, 35, 373–389. http://dx.doi.org/10.1080/03069880701593540
Blue: Griffiths, M. D., & Cooper, G. (2003). Online therapy: Implications for problem gamblers and clinicians. British Journal of Guidance & Counselling, 31, 113–135. http://dx.doi.org/10.1080/0306988031000086206

Exhibit 2b

Closely related to the previous exhibit is this book chapter:
Griffiths, M. (2010). Online advice, guidance and counseling for problem gamblers. In M. M. Cruz-Cunha, A. J. Tavares, & R. Simoes (Eds.), Handbook of research on developments in e-health and telemedicine: Technological and social perspectives (pp. 1116–1133). Hershey, PA: IGI Global.

The highlighted portions of this chapter appear to have been copied, verbatim and without attribution, from the following sources:
Yellow: Griffiths, M. D., & Cooper, G. (2003). Online therapy: Implications for problem gamblers and clinicians. British Journal of Guidance & Counselling, 31, 113–135. http://dx.doi.org/10.1080/0306988031000086206
Green: Griffiths, M. (2005). Online therapy for addictive behaviors. CyberPsychology & Behavior, 8, 555–561. http://dx.doi.org/10.1089/cpb.2005.8.555
Blue: Wood, R. T. A., & Griffiths, M. D. (2007). Online guidance, advice, and support for problem gamblers and concerned relatives and friends: An evaluation of the GamAid pilot service. British Journal of Guidance & Counselling, 35, 373–389. http://dx.doi.org/10.1080/03069880701593540
Apart from a change of coding colour, those are the same three sources that went into the article in Exhibit 2a. That is, these three source articles were apparently recycled into an article and a book chapter.

Exhibit 3

This one is a bit more complicated: an article of which about 80% consists of pieces that have been copied, verbatim and without attribution, from no fewer than seven other articles and book chapters.
Griffiths, M. D. (2015). Adolescent gambling and gambling-type games on social networking sites: Issues, concerns, and recommendations. Aloma, 33(2), 31–37. (No DOI.)

The source documents are:
Mauve: Anthony, K., & Griffiths, M. D. (2014). Online social gaming - why should we be worried? Therapeutic Innovations in Light of Technology, 5(1), 24–31. (No DOI.)
Pink: Carran, M., & Griffiths, M. (2015). Gambling and social gambling: An exploratory study of young people’s perceptions and behaviour. Aloma, 33(1), 101–113. (No DOI.)
Orange: Griffiths, M. D. (2014). Child and adolescent social gaming: What are the issues of concern? Education and Health, 32, 19–22. (No DOI.)
Indigo: Griffiths, M. D. (2014). Adolescent gambling via social networking sites: A brief overview. Education and Health, 31, 84–87. (No DOI.)
Light blue: Griffiths, M. D. (2013). Social gambling via Facebook: Further observations and concerns. Gaming Law Review & Economics, 17, 104–106. http://dx.doi.org/10.1089/glre.2013.1726
Yellow: Griffiths, M. (2011). Adolescent gambling. In B. B. Brown & M. Prinstein (Eds.), Encyclopedia of adolescence (Vol. 3, pp. 11–20). New York, NY: Academic Press.
Green: Griffiths, M. D. (2013). Social networking addiction: Emerging themes and issues. Journal of Addiction Research & Therapy, 4, e118. http://dx.doi.org/10.4172/2155-6105.1000e118
Note that I may have used more source documents than strictly necessary here, because some sections of the text are further duplicated across the various source articles and book chapters. However, in the absence (to my knowledge) of any definition of best practice when looking for this type of duplication, I hope that readers will forgive any superfluous complexity.


In his Psychology Today piece, Dr. Griffiths describes a number of "false beliefs that many of us have about writing", including this: "Myth 2 - Good writing must be original: Little, if any, of what we write is truly original". I don't think I can improve on that.


All of the annotated documents that went into making the images in this post are available here. I hope that this counts as fair use, but I will remove any document at once if anyone feels that their copyright has been infringed (by me, anyway).

30 July 2019

Applying some error detection techniques to controversial past research: Rushton (1992)

A few days ago, James Heathers and I were cc'd in on a Twitter thread.

Neither of us had ever heard of J. Philippe Rushton before. At one point James tweeted this extract (which I later worked out was from this critical article), which included citations of authors pointing out "various technical errors in Rushton's procedures and theory".
I thought it might be interesting to look at an example of these "technical errors", so I picked one of those citations more or less at random (Cain & Vanderwolf, 1990, since it seemed like it would be easy to find on Google with only the authors' names), and downloaded both that article and Rushton's response to it. The latter was interesting because, although not an empirical article, it cited a number of other articles by Rushton. So I chose the one with the most empirical-looking title, which was marked as being "In preparation", but ended up as this:
Rushton, J. P. (1992). Life-history comparisons between orientals and whites at a Canadian university. Personality and Individual Differences, 13, 439–442. http://dx.doi.org/10.1016/0191-8869(92)90072-W
I found a PDF copy of this article at a "memorial site" dedicated to Rushton's work.

Now I don't know much about this area of research ("race differences"), or the kinds of questions that Rushton was asking in his survey, but it seems to me that there are a few strange things about this article. There were 73 "Oriental" and 211 "Non-Oriental" undergraduate participants (the latter apparently also being non-Black, non-Native American, etc., judging from the title of the article), who took first a two-hour and then a three-hour battery of tests in return for course credit. Some of these were regular psychological questionnaires, but then it all got a bit... biological (pp. 439–440):
In the first session, lasting 2 hr, Ss completed a full-length intelligence test, the Multidimensional Aptitude Battery (Jackson, 1984). In the second session, lasting 3 hr, Ss completed the Eysenck Personality Questionnaire (Eysenck & Eysenck, 1975); the Sexual Opinion Survey (Fisher, Byrne, White & Kelley, 1988), the Self-Report Delinquency Scale (Rushton & Chrisjohn, 1981), and the Seriousness of Illness Rating Scale (Wyler, Masuda & Holmes, 1968), as well as self-report items assessing aspects of health, speed of maturation, sexual behaviour, and other life-history variables, many of which were similar to those used by Bogaert and Rushton (1989). Sex-combined composites were formed from many of these items: Family Health included health ratings of various family members; Family Longevity included longevity ratings for various family members; Speed of Physical Maturation included age of puberty, age of pubic hair growth, age of menarche (for females), and age of first shaving (for males); Speed of Sexual Maturation included age of first masturbation, age of first petting, and age of first sexual intercourse; Reproductive Effort-Structures included size of genitalia, menstrual cycle length (for females), and amount of ejaculate (for males); Reproductive Effort-Behavioural included maximum number of orgasms in one 24 hr period, average number of orgasms per week, and maximum number of sexual partners in one month; and Family Altruism included parental marital status and self-ratings of altruism to family. Each S also rank ordered Blacks, Orientals, and Whites on several dimensions.
Whoa. Back up a bit there... (emphasis added)
Reproductive Effort-Structures included size of genitalia, menstrual cycle length (for females), and amount of ejaculate (for males)
The second and third of those variables are specified as being sex-specific, but the first, "size of genitalia", is not, suggesting that it was reported by men and women. Now, while most men have probably placed a ruler along their erect penis at some point, and might be prepared to report the result with varying degrees of desirability bias, I'm wondering how one measures "size of genitalia" in human females, not just in general, but also in the specific context of a bunch of people sitting in a room completing questionnaires. Similarly, I very much doubt if many of the men who had just put down their penis-measuring device had also then proceeded to ejaculate into a calibrated test tube and commit the resulting number of millilitres to memory in the same way as the result that they obtained from the ruler; yet, it would again appear to be challenging to accurately record this number (which, I suspect, is probably quite variable within subjects) in a lecture hall or other large space at a university where this type of study might take place.

I also have some doubts about some of the reported numbers. For example (p. 441):
At the item level ... the reported percentage frequency of reaching orgasm in each act of intercourse was 77% for Oriental males, 88% for White males, 40% for Oriental females, and 57% for White females.
Again, I'm not a sex researcher, but my N=1, first-hand experience of having been a healthy male undergraduate (full disclosure: this was mostly during the Carter administration) is that a typical frequency of reaching orgasm during intercourse is quite a lot higher than 88%. I checked with a sex researcher (who asked for their identity not to be used), and they told me that these appear to be exceptionally low rates for sexually functional young men in Canada, unless the question had been asked in an ambiguous way, such as "Did you finish?". (They also confirmed that measures of the dimensions of women's genitalia are nearly non-existent.)

Rushton also stated (p. 441) that "small positive correlations were found between head size and general intelligence in both the Oriental (r = 0.14) and White samples (r = 0.21)"; indeed, he added in the Discussion section (p. 441) that "It is worth drawing attention to our replication of the head size-IQ relationship within an Oriental sample". However, with a sample size of 73, the 95% confidence interval around an r value of .14 is (-.09, .37); since that interval includes zero, many researchers might not regard this as indicative of any sort of replication.
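
For what it's worth, that interval can be reproduced with the standard Fisher z transformation (a quick sketch of my own, not code from the article; the exact second decimal depends on the method used):

```python
import math

def r_ci95(r, n):
    """Approximate 95% CI for a correlation r, via the Fisher z transformation."""
    z = math.atanh(r)                # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)      # standard error of z
    lo_z, hi_z = z - 1.96 * se, z + 1.96 * se
    return math.tanh(lo_z), math.tanh(hi_z)  # back-transform to the r scale

lo, hi = r_ci95(0.14, 73)
print(round(lo, 2), round(hi, 2))  # approximately -0.09 and 0.36
```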

There are some other numerical questions to be asked about this study. Look at Table 2, which shows the mean ratings given by the students of different "races" (Rushton's term) to three "races" (Black, White, and Oriental) on various attributes. That is, on, say, "Anxiety", each student ranked Blacks, Whites, and Orientals from 1 to 3, in some order, and the means of those integers were reported.

Did I just say "means of integers"? Maybe we can use GRIM here! With only two decimal places, we can't do anything with the 211 "Non-Oriental" participants, but we can check the means of the 73 "Orientals". And when we do(*), we find that seven of the 30 means are GRIM-inconsistent; that is, they are not the result of correctly rounding an integer total score that has been divided by 73. Those means are highlighted in this copy of Table 2.

It's important to note here that seven out of 30 (23%) inconsistent means is not necessarily a lot when N=73, because if you just generate two-digit decimal values randomly with this sample size, 73% of them will be GRIM-consistent (i.e., 27% will be inconsistent) just by chance. A minute with an online binomial calculator shows that the chance of getting seven or more inconsistent means from 30 random values is about 74% (even eight or more would occur about 58.5% of the time); in other words, this result is entirely compatible with chance.
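
Both the GRIM test and the chance calculation can be sketched in a few lines (my own quick reimplementation, using only the standard library, not the Excel/R files mentioned in the footnote):

```python
from math import comb

def grim_consistent(mean, n, decimals=2):
    """Can `mean` (reported to `decimals` places) arise as an integer total / n?"""
    target = round(mean, decimals)
    # Try every integer total near mean * n and see if any rounds back to the mean
    for total in range(int(mean * n) - 1, int(mean * n) + 2):
        if round(total / n, decimals) == target:
            return True
    return False

# With n = 73 and two decimals, 73 of the 100 grid points per unit interval are
# reachable, so a random two-decimal value is GRIM-inconsistent about 27% of the time.
p_inconsistent = 1 - 73 / 100

# Chance of seeing at least 7 inconsistent means among 30 random ones
p_at_least_7 = sum(comb(30, k) * p_inconsistent**k * (1 - p_inconsistent)**(30 - k)
                   for k in range(7, 31))
print(round(p_at_least_7, 3))  # about 0.74
```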

A further issue is that the totals of the three mean rankings in each row for each participant "race" do not always add up to 6.0. For example, the reported rounded Oriental rankings of Intelligence sum to 5.92, and even if these numbers had been rounded down from a mean that was 0.005 larger than the reported values in the table (i.e., 2.865, 1.985, and 1.085), the rounded row total would have been only 5.94. A similar problem affects the majority of the rows for Oriental rankings.

Of course, it is possible that either of these issues (i.e., the GRIM inconsistencies and the existence of total rankings below 6.00) could have been caused by missing values, although (a) Rushton did not report anything about completion rates and (b) some of the rankings could perhaps have been imputed very accurately (e.g., if 1 and 2 were filled in, the remaining value would be 3). It is, however, more difficult to explain how the total mean rankings by White participants for "Anxiety" and "Rule-following" came to be 6.03 and 6.02, respectively. Even if we assume that the component numbers in each case had been rounded up from a mean that was 0.005 smaller than the reported values in the table (e.g., for "Anxiety", these would be 2.145, 1.995, and 1.875), the rounded values for the row totals would be 6.02 for "Anxiety" and 6.01 for "Rule-following".
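
This rounding argument can be made mechanical: if each reported mean is accurate to two decimal places, the true (unrounded) row total is confined to a narrow interval, and we can check whether a total of 6.0 is possible. A minimal sketch, using the reported values quoted above:

```python
def true_total_bounds(reported_means, decimals=2):
    """Interval containing the true total, given means rounded to `decimals` places."""
    half = 0.5 * 10 ** (-decimals)   # each true mean lies within +/- 0.005 of the report
    s = sum(reported_means)
    return s - len(reported_means) * half, s + len(reported_means) * half

# Oriental rankings of Intelligence: reported means sum to 5.92
lo, hi = true_total_bounds([2.86, 1.98, 1.08])
print(lo, hi)  # roughly 5.905 to 5.935; the true total cannot reach 6.0

# White rankings of Anxiety: reported means sum to 6.03
lo, hi = true_total_bounds([2.15, 2.00, 1.88])
print(lo, hi)  # roughly 6.015 to 6.045; the true total cannot be as low as 6.0
```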

Another thing, pointed out by James Heathers, is that Rushton claimed (p. 441) that "No differences were found on the Speed of Physical Maturation or the Speed of Sexual Maturation composites"; and indeed there are no significance stars next to these two variables in Table 1. But these groups are in fact substantially different; the reported means and SDs imply t statistics of 5.1 (p < .001, 3 significance stars) and 2.8 (p < .01, 2 significance stars), respectively.
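
The kind of check James performed can be done from the reported summary statistics alone, using the standard pooled two-sample t formula. The means and SDs in the example call below are hypothetical placeholders (Table 1 itself is not reproduced here); only the two group sizes come from the article:

```python
import math

def t_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Pooled two-sample t statistic from group means, SDs, and sample sizes."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se

# Hypothetical means and SDs, with the article's group sizes (73 and 211)
t = t_from_summary(12.0, 4.0, 73, 10.0, 4.0, 211)
print(round(t, 2))  # 3.68
```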

Finally, let's take a look at the final-digit distribution of Rushton's numbers. I took the last digits of all of the means and SDs in Table 1, and all of the means in Table 2, and obtained this histogram:

Does this tell us anything? Well, we might expect the last digits of random variables to be uniformly distributed, although there could also be a small "Benford's Law" effect, particularly with the means and SDs that only have two significant figures, causing the distribution to be a little more right-skewed (i.e., with more smaller digits). We certainly don't have any reason to expect a substantially larger number of 4s, 5s, and 8s. The chi-square test here has a value of 15.907 on 9 degrees of freedom, for a p value of .069 (which might be a little lower if our expected distribution was a little heavier on the smaller numbers). Not the sort of thing you can take to the ORI on its own and demand action for, perhaps, but those peaks do look a little worrying.
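
For anyone who wants to check that p value, the chi-square survival function for an odd number of degrees of freedom can be computed with nothing but the standard library (a sketch of my own; scipy.stats.chi2.sf gives the same result more generally):

```python
import math

def chi2_sf(stat, df):
    """Survival function (p value) of the chi-square distribution, odd df only.

    Uses the recurrence Gamma(s+1, x) = s*Gamma(s, x) + x**s * exp(-x),
    starting from Gamma(1/2, x) = sqrt(pi) * erfc(sqrt(x)).
    """
    assert df % 2 == 1, "this sketch only handles odd degrees of freedom"
    x = stat / 2.0
    s = 0.5
    g = math.sqrt(math.pi) * math.erfc(math.sqrt(x))  # upper incomplete gamma at s = 1/2
    while s < df / 2.0:
        g = s * g + x ** s * math.exp(-x)
        s += 1.0
    return g / math.gamma(df / 2.0)

print(round(chi2_sf(15.907, 9), 3))  # 0.069, matching the value reported above
```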

The bottom line here is that if Dr. Rushton were alive today, I would probably be writing to him to ask for a close look at his data set.

(*) The Excel sheet and (ultra-minimal) R code for this post can be found here.

[Update 2019-07-31 08:23 UTC: Added discussion of the missing significance stars in Table 1.]

10 July 2019

An open letter to Dr. Jerker Rönnberg

**** Begin update 2019-07-10 15:15 UTC ****
Dr. Rönnberg has written to me to say that he has been made aware of this post (thanks to whoever alerted him), and he has now read my e-mail.
**** End update 2019-07-10 15:15 UTC ****

At the bottom of this post is the text of an e-mail that I have now sent three times (on May 9, June 11, and June 25 of this year) to Dr. Jerker Rönnberg, who --- according to the website of the Scandinavian Journal of Psychology --- is the editor-in-chief of that journal. I have received no reply to any of these three attempts to contact Dr. Rönnberg. Nor did I receive any sort of non-delivery notification or out-of-office reply. Hence, I am making my request public here. I hope that this will not be seen as presumptuous, unprofessional, or unreasonable.

I sent the mail to two different e-mail addresses that I found listed for Dr. Rönnberg, namely sjoped@ibv.liu.se (on the journal website) and jerker.ronnberg@liu.se (on the Linköping University website). Of course, it is possible that those two addresses lead to the same mailbox.

A possibility that cannot be entirely discounted is that each of my e-mails was treated as spam, and either deleted silently by Linköping University's system on arrival, or re-routed to Dr. Rönnberg's "junk" folder. I find this rather unlikely because, even after acknowledging my bias in this respect, I do not see anything in the text of the e-mail that would trigger a typical spam filter. Additionally, when spam is deleted on arrival it is customary for the system to respond with "550 spam detected"; I would also hope that after 20 or more years of using e-mail as a daily communication tool, most people would check at least the subject lines of the messages in their "junk" folder every so often before emptying that folder. Another possibility is that Dr. Rönnberg is away on sabbatical and has omitted to put in place an out-of-office reply. Whatever the explanation, however, the situation appears to be that the editor of the Scandinavian Journal of Psychology is, de facto, unreachable by e-mail.

My frustration here is with the complete absence of any form of acknowledgement that my e-mail has even been read. If, as I presume may be the case, my e-mails were indeed delivered to Dr. Rönnberg's inbox, I would have imagined that it would not have been a particularly onerous task to reply with a message such as "I will look into this." Indeed, even a reply such as "I will not look into this, please stop wasting my time" would have been less frustrating than the current situation. It is going to be difficult for people who want to correct the scientific literature to do so if editors, who are surely the first point of contact in the publishing system, are not available to communicate with them.

I will leave it up to readers of this blog to judge whether the request that I made to Dr. Rönnberg in my e-mails is sufficiently relevant to be worthy of at least a minimal reply, and also whether it is reasonable for me to "escalate" it here in the form of an open letter. In the meantime, if any members of the Editorial Board of the Scandinavian Journal of Psychology, or any other colleagues of Dr. Rönnberg, know of a way to bring this message to his attention, I would be most grateful.

From: Nick Brown <nicholasjlbrown@gmail.com>
Date: Thu, 9 May 2019 at 23:32
Subject: Concerns with an article in Scandinavian Journal of Psychology
To: <jerker.ronnberg@liu.se>
Cc: James Heathers <jamesheathers@gmail.com>

Dear Dr. Rönnberg,

I am writing to you to express some serious concerns about the article "Women’s hairstyle and men’s behavior: A field experiment" by Dr. Nicolas GuĂ©guen, published in Scandinavian Journal of Psychology in November 2015 (doi: 10.1111/sjop.12253). My colleague James Heathers (in CC) and I have described our concerns about this article, as well as a number of other problems in Dr. GuĂ©guen's body of published work, in a document that I have attached to this e-mail, which we made public via a blog post (https://steamtraen.blogspot.com/2017/12/a-review-of-research-of-dr-nicolas.html) in December 2017.

More recently, we have been made aware of evidence suggesting that the research described in the article was in fact entirely designed and carried out by three undergraduate students. You will find a description of this issue in our most recent blog post (https://steamtraen.blogspot.com/2019/05/an-update-on-our-examination-of.html). For your convenience, I have also attached the report that these students wrote as part of their assignment, with their names redacted. (The original version with their names visible is available, but it is spread across several files; please let me know if you need it, and I will stitch those together.)

I have two principal concerns here. First, there would seem to be a severe ethical problem when a full professor writes an empirical article describing research that was apparently designed and carried out entirely by his students, without offering them authorship or indeed giving them any form of acknowledgement in the article. Second, we believe that the results are extremely implausible (e.g., an effect size corresponding to a Cohen's d of 2.44, and a data set that contains some unlikely patterns of regularity), which in turn leads us to believe that the students may have fabricated their work, as is apparently not uncommon in Dr. GuĂ©guen's class (cf. the comments from the former student who contacted us).

The decision about what, if anything, should be done about this situation is of course entirely in your hands. Please do not hesitate to ask if you require any further information.

Kind regards,
Nick Brown
PhD candidate, University of Groningen

09 May 2019

An update on our examination of the research of Dr. Nicolas Guéguen

(Joint post by Nick Brown and James Heathers)

It's now well over a year since we published our previous blog post about the work of Dr. Nicolas GuĂ©guen. Things have moved on since then, so here is an update.

*** Note: We have received a reply from the Scientific Integrity Officer at the University of Rennes-2, Alexandre Serres. See the update of 2019-05-22 at the bottom of this post ***

We have seen two documents from the Scientific Integrity Officer at the University of Rennes-2, which appears to have been the institution charged with investigating the apparent problems in Dr. Guéguen's work. The first of these dates from June 2018 and is entitled (our translation from French), "Preliminary Investigation Report Regarding the Allegations of Fraud against Nicolas Guéguen".

It is unfortunate that we have been told that we are not entitled to disseminate this document further, as it is considerably more trenchant in its criticism of Dr. GuĂ©guen's work than its successor, described in the next paragraph of this blog post. We would also like to stress that the title of this document is extremely inexact. We have not made, and do not make, any specific allegations of fraud, nor are any implied. The initial document that we released is entitled “A commentary on some articles by Dr. Nicolas GuĂ©guen” and details a long series of inconsistencies in research methods, procedures, and data. The words “fraud” and “misconduct” do not appear in this document, nor in any of our communications with the people who helped with the investigation. We restrict ourselves to pointing out that results are “implausible” (p. 2) or that scenarios are “unlikely [to] be enacted in practice” (p. 31).

The origin of inconsistencies (be it typographical errors, inappropriate statistical methods, analytical mistakes, inappropriate data handling, misconduct, or something else) is also irrelevant to the outcome of any assessment of research. Any research object with a strong and obvious series of inconsistencies may be deemed too inaccurate to trust, irrespective of their source. In other words, the description of inconsistency makes no presumption about the source of that inconsistency.

The second document, entitled "Memorandum of Understanding Regarding the Allegations of Lack of Scientific Integrity Concerning Nicolas GuĂ©guen", is dated October 2018, and became effective on 10 December 2018. It describes the outcome of a meeting held on 10 September 2018 between (1) Dr. GuĂ©guen, (2) the above-mentioned Scientific Integrity Officer, (3) a representative from the University of Rennes-2 legal department, and (4) an external expert who was, according to the report, "contacted by [Brown and Heathers] at the start of their inquiry". (We are not quite certain who this last person is, although the list of candidates is quite short.)

The Memorandum of Understanding is, frankly, not very hard-hitting. Dr. GuĂ©guen admits to some errors in his general approach to research, notably using the results of undergraduate fieldwork projects as the basis of his articles, and he agrees that within three months of the date of effect of the report, he will retract two articles: "High heels increase women's attractiveness" in Archives of Sexual Behavior (J1) and "Color and women hitchhikers’ attractiveness: Gentlemen drivers prefer red" in Color Research and Application (J2). Recall that our original report into problems with Dr. GuĂ©guen's research listed severe deficiencies in 10 articles; the other eight are barely mentioned.

On the question of Dr. Guéguen's use of undergraduate fieldwork: We were contacted in November 2018 by a former student from Dr. Guéguen's class, who gave us some interesting information. Here are a few highlights of what this person told us (our translation from French):
I was a student on an undergraduate course in <a social science field>. ... The university where Dr. GuĂ©guen teaches has no psychology department. ... As part of an introductory class entitled "Methodology of the social sciences", we had to carry out a field study. ... This class was poorly integrated with the rest of the course, which had nothing to do with psychology. As a result, most of the students were not very interested in this class. Plus, we were fresh out of high school, and most of us knew nothing about statistics. Because we worked without any supervision, yet the class was graded, many students simply invented their data. I can state formally that I personally fabricated an entire experiment, and I know that many others did so too. ... At no point did Dr. GuĂ©guen suggest to us that our results might be published.
Our correspondent also sent us an example of a report of one of these undergraduate field studies. This report had been distributed to the class by Dr. GuĂ©guen himself as an example of good work by past students, and has obvious similarities to his 2015 article "Women’s hairstyle and men’s behavior: A field experiment". It was written by a student workgroup from such an undergraduate class, who claimed to have conducted similar tests on passers-by; the most impressive of the three sets of results (on page 7 of the report) was what appeared in the published article. The published version also contains some embellishments to the experimental procedure; for example, the article states that the confederate walked "in the same direction as the participant about three meters away" (p. 638), a detail that is not present in the original report by the students. A close reading of the report, combined with our correspondent's comments about the extent of admitted fabrication of data by the students, leads us to question whether the field experiments were carried out as described (for example, it is claimed that the three students tested 270 participants between them in a single afternoon, which is extraordinarily fast progress for this type of fieldwork).

(As we mentioned in our December 2017 blog post, at one point in our investigation Dr. GuĂ©guen sent us, via the French Psychological Society, a collection of 25 reports of field work carried out by his students. None of these corresponded to any of the articles that we critiqued. Presumably he could have sent us the report that appears to have become the article "Women’s hairstyle and men’s behavior: A field experiment", but apparently he chose not to do so. Note also that the Memorandum of Understanding does not list this article as one that Dr. GuĂ©guen is required to retract.)

We have made a number of documents available at https://osf.io/98nzj/, as follows:
  • "20190509 Annotated GuĂ©guen report and response.pdf" will probably be of most relevance to non French-speaking readers. It contains the most relevant paragraphs of the Memorandum of Understanding, in French and (our translation) English, accompanied by our responses in English, which then became the basis of our formal response.
  • "Protocole d'accord_NG_2018-11-29.pdf" is the original "Memorandum of Understanding" document, in French.
  • "20181211 RĂ©ponse Brown-Heathers au protocole d'accord.pdf" is our formal response, in French, to the "Summary" document.
  • "20190425 NB-JH analysis of Gueguen articles.pdf" is the latest version of our original report into the problems we found in 10 articles by Dr. GuĂ©guen.
  • "Hairstyle report.pdf" is the student report of the fieldwork (in French) with a strong similarity to the article "Women’s hairstyle and men’s behavior: A field experiment", redacted to remove the names of the authors.
Alert readers will have noted that almost five months have elapsed since we wrote our response to the "Memorandum of Understanding" document. We have not commented publicly since then, because we were planning to publish this blog post in response to the first retraction of one of Dr. Guéguen's articles, which could either have been one that he was required to retract by the agreement, or one from another journal. (We are aware that at least two other journals, J3 and J4, are actively investigating multiple articles by Dr. Guéguen that they published.)

However, our patience has now run out. The two articles that Dr. Guéguen was required to retract are still untouched on the respective journals' websites, and our e-mails to the editors of those journals asking if they have received a request to retract the articles have gone unanswered (i.e., we haven't even been told to mind our own business) after several weeks and a reminder. No other journal has yet taken any action in the form of a retraction, correction, or expression of concern.

All of this leaves us dissatisfied. The Memorandum of Understanding notes on page 5 that Dr. GuĂ©guen has 336 articles on ResearchGate published between 1999 and 2017. We have read approximately 40 of these articles, and we have concerns about the plausibility of the methods and results in a very large proportion of those. Were this affair to be considered closed after the retraction of just two articles—not including one that seems to have been published without attribution from the work of the author’s own students—it seems to us that this would leave a substantial amount of serious inconsistencies unresolved.

Accordingly, we feel it would be prudent for the relevant editors of journals in psychology, marketing, consumer behaviour, and related disciplines to take action. In light of what we now know about the methods deployed to collect the student project data, we do not think it would be excessive for every article by Dr. Guéguen to be critically re-examined by one or more external reviewers.

[ Update 2019-05-09 15:03 UTC: An updated version of our comments on the Memorandum of Understanding was uploaded to fix some minor errors, and the filename listed here was changed to reflect that. ]

[ Update 2019-05-09 18:51 UTC: Fixed a couple of typos. Thanks to Jordan Anaya. ]

[ Update 2019-05-10 16:33 UTC: Fixed a couple of typos and stylistic errors. ]

[ Update 2019-05-22 15:51 UTC:
We have received a reply to our post from Alexandre Serres, who is the Scientific Integrity Officer at the University of Rennes-2. This took the form of a 3-page document (in both French and English versions) that did not fit into the comments box of a Blogger.com post, so we have made these two versions available at our OSF page. The filenames are "RĂ©ponse_billet de Brown et Heathers_2019-05-20.pdf" (in French) and "RĂ©ponse_billet de Brown et Heathers_2019-05-20 EN" (in English).

We have also added a document that was created by the university before the inquiry took place (filename "Procédure_traitement des allégations de fraude_Univ Rennes2_2018-01-31.pdf"), which established the ground rules and procedural framework for the inquiry into Dr. Guéguen's research.

We thank Alexandre Serres for these clarifications, and would only add that, while we are disappointed in the outcome of the process in terms of the very limited impact that it seems to have had on the problems that we identified in the public literature, we do not have any specific criticisms of the way in which the procedure was carried out.

01 May 2019

The results of my crowdsourced reanalysis project

Just over a year ago, in this post, I asked for volunteers to help me reanalyze an article that I had read entirely by chance, and which seemed to have a few statistical issues. About ten people offered to help, and three of them (Jan van Rongen, Jakob van de Velde, and Matt Williams) stayed the course. Today we have released our preprint on PsyArXiv detailing what we found.

The article in question is "Is Obesity Associated with Major Depression? Results from the Third National Health and Nutrition Examination Survey" (2003) by Onyike, Crum, Lee, Lyketsos, and Eaton. This has 951 citations according to Google Scholar, making it quite an important paper in the literature on obesity and mental health. As I mentioned in my earlier blog post, I contacted the lead author, Dr. Chiadi Onyike, when I first had questions about the paper, but our correspondence petered out before anything substantial was discussed.

It turns out that most of the original problems that I thought I had found were due to my misunderstanding the method; I had overlooked that the authors had a weighted survey design. However, even within this design, we found a number of issues with the reported results. The power calculations seem to be post hoc and may not have been carried out appropriately; this makes us wonder whether the main conclusion of the article (i.e., that severe obesity is strongly associated with major depressive disorder) is well supported. There are a couple of simple transcription errors in the tables, which as a minimum seem to merit a correction. There are also inconsistencies in the sample sizes.

I should make it clear that there is absolutely no suggestion of any sort of misconduct here. Standards of reproducibility have advanced considerably since Onyike et al.'s article was published, as has our understanding of statistical power; and the remaining errors are of the type that anyone who has tried to assemble results from computer output into a manuscript will recognise.

I think that all four of us found the exercise interesting; I know I did. Everyone downloaded the publicly available dataset separately and performed their analyses independently, until we pooled the results starting in October of last year. We all did our analyses in R; I had hoped for more diversity (especially if someone had used Stata, which is what the original authors used), but this did have the advantage that I was able to combine everybody's contributions into a single script file. You can find the summary of our analyses in an OSF repository (the URL for which is in the preprint).

We intend to submit the preprint for publication, initially to the American Journal of Epidemiology (where the original article first appeared). I'll post here if there are any interesting developments.

If you have something to say about the preprint, or any questions or remarks that you might have about this way of doing reanalyses, please feel free to comment!

19 February 2019

Just another week in real-world science: Butler, Pentoney, and Bong (2017).

This is a joint post by Nick Brown and Stuart Ritchie. All royalty cheques arising from the post will be split between us, as will all the legal bills.

Today's topic is this article:
Butler, H. A., Pentoney, C., & Bong, M. P. (2017). Predicting real-world outcomes: Critical thinking ability is a better predictor of life decisions than intelligence. Thinking Skills and Creativity, 25, 38–46. http://dx.doi.org/10.1016/j.tsc.2017.06.005

We are not aware of any official publicly available copies of this article, but readers with institutional access to Elsevier journals should have no trouble in finding it, and otherwise we believe there may exist other ways to get hold of a copy using the DOI.

Butler et al.'s article received some favourable coverage when it appeared, including in Forbes, Psychology Today, the BPS Digest, and an article by the lead author in Scientific American that was picked up by the blog of the noted skeptic (especially of homeopathy) Edzard Ernst. Its premise is that the ability to think critically (measured by an instrument called the Halpern Critical Thinking Assessment, HCTA) is a better predictor than IQ (measured with a set of tests called the Intelligence Structure Battery, or INSBAT) of making life decisions that lead to negative outcomes, measured by the Real-World Outcomes (RWO) Inventory, which was described by its creator in a previous article (Butler, 2012).

In theory, we’d expect both critical thinking and IQ to act favourably to reduce negative experiences. The correlations between both predictors and the outcome in this study would thus be expected to be negative, and indeed they were. For critical thinking the correlation was −.330 and for IQ it was −.264. But is this a "significant" difference?

To test this, Butler et al. conducted a hierarchical regression, entering IQ (INSBAT) and then critical thinking (HCTA) as predictors. They concluded that, since the difference in R² when the second predictor (HCTA) was added was statistically significant, this indicated that the difference between the correlations of each predictor with the outcome (the correlation for HCTA being the larger) was also significant. But this is a mistake. On its own, the fact that the addition of a second predictor variable to a model causes a substantial increase in R² might tell us that both variables add incrementally to the prediction of the outcome, but it tells us nothing about the relative strength of the correlations between the two predictors and the outcome. This is because the change in R² is also dependent on the correlation between the two predictors (here, .380). The usual way to compare the strength of two correlations, taking into account the third variable, is to use Steiger’s z, as shown by the following R code:

> library(cocor)
> cocor.dep.groups.overlap(-.264, -.330, .380, 244, "steiger1980", alt="t")
<some lines of output omitted for brevity>
 z = 0.9789, p-value = 0.3276

So the Steiger’s z test tells us that there’s no statistically significant difference between the sizes of these two (dependent) correlations in this sample, p = .328.

We noted a second problem, namely that the reported bivariate correlations are not compatible with the results of the regression reported in Table 2. In a multiple regression model, the standardized regression coefficients are determined (only) by the pattern of correlations between the variables, and in the case of the two-predictor regression, these coefficients can be determined by a simple formula. Using that formula, we calculated that the coefficients for INSBAT and HCTA in model 2 should be −.162 and −.268, respectively, whereas Butler et al.’s Table 2 reports them as −.158 and −.323. When we wrote to Dr. Butler in July 2017 to point out these issues, she was unable to provide us with the data set, but she did send us an SPSS output file in which neither the correlations nor the regression coefficients exactly matched the values reported in the article.
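For readers who want to check this for themselves: in a two-predictor regression, each standardized coefficient follows directly from the three bivariate correlations. Here is a minimal sketch (my own illustration, written in Python rather than R for variety, using the correlations reported by Butler et al.):

```python
# Standardized coefficients for a two-predictor regression, computed from
# the bivariate correlations alone (textbook formula):
#   beta_1 = (r_y1 - r_y2 * r_12) / (1 - r_12**2)
# The values below are those reported by Butler et al. (2017).
r_y1 = -0.264   # correlation of INSBAT (IQ) with the outcome (RWO)
r_y2 = -0.330   # correlation of HCTA (critical thinking) with the outcome
r_12 = 0.380    # correlation between the two predictors

beta_insbat = (r_y1 - r_y2 * r_12) / (1 - r_12 ** 2)
beta_hcta = (r_y2 - r_y1 * r_12) / (1 - r_12 ** 2)
print(round(beta_insbat, 3))  # -0.162, vs. -.158 in the article's Table 2
print(round(beta_hcta, 3))    # -0.268, vs. -.323 in the article's Table 2
```

The discrepancy for HCTA (−.268 expected versus −.323 reported) is far too large to be a rounding issue.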

There was a very minor third problem: The coefficient of .264 in the first cell of Table 2 is missing its minus sign. (Dr. Butler also noticed that there was an issue with the significance stars in this table.)

We wrote to the two joint editors-in-chief of Thinking Skills and Creativity in November 2017. They immediately indicated that they would handle the points that we had raised with the "journal management team" (i.e., Elsevier). We found this rather surprising, as we had only raised scientific issues that we imagined would be entirely an editorial matter. Over the following year we occasionally sent out messages asking if any progress had been made. In November 2018, we were told by the Elsevier representative that following a review of the Butler et al. article by two independent reviewers who are "senior statistical experts in this field", the journal had decided to issue a correction for... the missing minus sign in Table 2. And nothing else.

We were, to say the least, somewhat disappointed by this. We wrote to ask for a copy of the report by these senior statistical experts, but received no reply (and, after more than three months, we guess we aren't going to get one). Perhaps the experts disagree with us about the relevance of Steiger's z, but the inconsistencies between the correlations and the regression coefficients are a matter of simple mathematics and the evidence of numerical discrepancies between the authors' own SPSS output and the published article is indisputable.

So apparently Butler et al.'s result will stand, and another minor urban legend with no empirical support will be added to the folklore of "forget IQ, you just have to work hard (and I can show you how for only $499)" coaches. Of course, both of us are in favour of critical thinking. We just wish that people involved in publishing research about it were as well.

We had been planning to wait for the correction to be issued before we wrote this post, but as far as we can tell it still hasn't appeared (well over a year since we originally contacted the editors, and 19 months since we first contacted the authors). Some recent events make us believe that now would be an appropriate moment to bring this matter to public attention. Most important among these are the two new papers from Ben Goldacre and his team, showing what (a) editors and (b) researchers did when problems were pointed out in medical trial study protocols (spoiler: very often, not much). Then the inimitable James Heathers tweeted this thread expressing some of the frustrations that he (sometimes abetted by Nick) has had when trying to get editors to fix problems. And last week we also saw the case of a publisher taking a ridiculous amount of time to retract an article that had been published in one of their journals after it had been stolen, accompanied by an editorial note of the "move along, nothing to see here" variety.

There seems to be a real problem with academic editors, especially those at the journals of certain publishers, being reluctant, unwilling, or unable to take action on even the simplest problems without the approval of the publisher, whose evaluation of the situation may be based as much on the need to save face as to correct the scientific record.

A final anecdote: One of us (Nick) has been told of a case where the editor would like to retract at least two fraudulent articles but is waiting for the publisher (not Elsevier, in that case) to determine whether the damage to their reputation caused by retracting would be greater than that caused by not retracting. Is this really the kind of consideration to which we want the scientific literature held hostage?


Butler, H. A. (2012). Halpern critical thinking assessment predicts real-world outcomes of critical thinking. Applied Cognitive Psychology, 26, 721–729. http://dx.doi.org/10.1002/acp.2851

17 December 2018

Have scientists found an explanation for the onset of ME/CFS?

In this post I'm going to discuss this article (the link leads to a page from which you can download the PDF; I'm not sure if this will last), which appeared today (17 December 2018):

Russell, A., Hepgul, N., Nikkheslat, N., Borsini, A., Zajkowska, Z., Moll, N., . . . Pariante, C. M. (2018). Persistent fatigue induced by interferon-alpha: A novel, inflammation-based, proxy model of chronic fatigue syndrome. Psychoneuroendocrinology. Advance online publication. http://dx.doi.org/10.1016/j.psyneuen.2018.11.032

(Notes for nerds: (1) The article date will become 2019 when it appears in print; (2) There are 20 named authors, so I'm glad that APA referencing style only requires me to list the first six and the last one. I will be calling it "the article" or "the study" or "Russell et al." henceforth.)

Before I start, a small disclosure. In 2015, a colleague and I had a manuscript desk-rejected by Psychoneuroendocrinology for what we considered inadequate reasons. This led to a complaint to the Committee on Publication Ethics and a change in the journal's editorial policies, but unfortunately did not result in our article being sent out for review; it was subsequently published elsewhere. My interest in the Russell et al. article arose for entirely unrelated reasons, and I only discovered the identity of the journal after deciding to look at it. So, to the extent that one's reasoning can ever be free of motivation, I don't believe that my criticisms of the article that follow here are related to the journal in which it appeared. But it seems like a good idea to mention this, in case the editor-in-chief of the journal is reading this post and recognises my name.

Media coverage

This article is getting a fair amount of coverage in the UK media today, for example at the BBC, the Mail Online, the Independent, and the Guardian (plus some others that are behind a paywall). The simplified story that these outlets are telling is that "chronic fatigue syndrome is real and is caused by [the] immune system" (Mail Online) and that the study "challenges stigma that chronic fatigue is 'all in the mind'" (Independent). Those are hopeful-sounding messages for ME/CFS patients, but I'm not sure that such conclusions are justified.

I was made aware of this article by a journalist friend, who had received an invitation to attend a press briefing for the article at the Science Media Centre in London on Friday 14 December. By a complete coincidence I was in London that morning and decided to go along. I was allowed in without a press pass after identifying myself as a researcher, but when I tried to get clarification of a point that had been made during the presentation I was told that only reporters (i.e., not researchers or other members of the public) were allowed to ask questions. This was a little annoying at the time, but on reflection it seems fair enough since time is limited and the event was organised for journalists, not for curious researchers with a little time on their hands. There were about 10 journalists present, from most of the major UK outlets.

You can get a summary of the study from the media pieces linked above (the Guardian's coverage by Nicola Davis is particularly good). If you haven't seen the media articles, go and read them now, and then come back to this post. There was also a press release. I suggest that you also read the Russell et al. article itself, although it does get pretty technical.

What did the study claim to show?

Here's my summary of the study: The participants were 55 people with hepatitis C who were about to undergo interferon-alpha (IFN-α) treatment. The treatment lasted 24, 36, or 48 weeks. At five time points (at the start of treatment, then after 4 weeks, 8 weeks, 12 weeks, and at the end of treatment, whenever that might have been), patients were asked about their levels of fatigue and also had their cytokine levels (a measure of activity in the immune system) tested. These tests were then repeated six months after the end of treatment. Patients were also assessed for depression, stressful life events, and childhood trauma.

Interferon-alpha occurs naturally in the body as part of the immune system, but it can also be injected to fight diseases in doses that are much greater than what your body can produce. It's sometimes used as an adjunct to chemotherapy for cancer. IFN-α treatment often has substantial fatigue as a side effect, although this fatigue typically resolves itself gradually after treatment ends. But six months after they finished their treatment, 18 of the 55 patients in this study had higher levels of fatigue than when they started treatment. These patients are referred to as the PF ("persistent fatigue") group, compared to the 37 whose fatigue more or less went away, who are the RF ("resolved fatigue") group.

The authors' logic appears to run like this:
1. Some people still have a lot of fatigue six months after the end of a 24/36/48-week long course of treatment with IFN-α for hepatitis C.
2. Maybe we can identify what it is about those people (one-third of the total) that makes them slower to recover from their fatigue than the others.
3. ME/CFS patients are people who have fatigue long after the illness that typically preceded the onset of their condition. (It seems to be widely accepted by all sides in the ME/CFS debate that a great many cases occur following an infection of some kind.)
4. Perhaps what is causing the onset of fatigue after their infectious episode in ME/CFS patients is the same thing causing the onset of fatigue after IFN-α treatment in the hepatitis C patients.

Russell et al.'s claim is that patients who went on to have persistent fatigue (versus resolved fatigue) at a point six months after the end of their treatment, had also had greater fatigue and cytokine levels when they were four weeks into their treatment (i.e., between 46 and 70 weeks before their persistent fatigue was measured, depending on how long the treatment lasted). On this account, something that happened at an early stage of the procedure determined how well or badly people would recover from the fatigue induced by the treatment, once the treatment was over.

Just to be clear, here are some things that Russell et al. are not claiming. I mention these partly to show the limited scope of their article (which is not necessarily a negative point; all scientific studies have a perimeter), but also to make things clearer in case a quick read of the media coverage has led to confusion in anyone's mind.
- Russell et al. are not claiming to have identified the cause of ME/CFS.
- Russell et al. are not claiming to have identified anything that might cure ME/CFS.
- Russell et al. are not claiming to have demonstrated any relation between hepatitis C and ME/CFS.
- Russell et al. are not claiming to have demonstrated any relation between interferon-alpha --- whether this is injected during medical treatment or naturally produced in the body by the immune system --- and ME/CFS. They do not suggest that any particular level of IFN-α predicts, causes, cures, or is any other way associated with ME/CFS.
- Russell et al. are not claiming to have demonstrated any relation between a person's current cytokine levels and their levels of persistent fatigue subsequent to interferon-alpha treatment for hepatitis C. (As they note on p. 7 near the bottom of the left-hand column, "we ... find that cytokines levels do not distinguish [persistent fatigue] from [resolved fatigue] patients at the 6-month followup".)
- Russell et al. are not claiming to have demonstrated any relation between a person's current cytokine levels and their ME/CFS status. (As Table 2 shows, cytokine levels are comparable between ME/CFS patients and the healthy general population.)

Some apparent issues

Here are some of the issues I see with this article in terms of its ability to tell us anything about ME/CFS.

1. This was not a study of ME/CFS patients

It cannot be emphasised enough that none of the patients in this study had a diagnosis of ME/CFS, either at the start or the end of the study, and this greatly limits the generalisability of the results. (To be fair, the authors go into some aspects of this issue in their Discussion section on the left-hand side of p. 8, but the limitations of any scientific article rarely make it into the media coverage.) We don't know how long ago these hepatitis C patients were treated with interferon-alpha and subsequently tested at follow-up, or if any of them still had fatigue another six or 12 months later, or if they ever went on to receive a diagnosis of ME/CFS. One of the criteria for such a diagnosis is that unresolved fatigue lasts longer than six months (so it would have been really useful to have had a further follow-up). But in any case, the fatigue that Russell et al. studied was, by definition, not of sufficient duration to count as "chronic fatigue syndrome" (and, of course, there are several other criteria that need to be met for a diagnosis of ME/CFS; chronic fatigue by itself is a lot more common than full-on ME/CFS). I feel that it is therefore rather questionable to refer to "the presence of the CFS phenotype... for ... IFN-α-induced persistent fatigue" (last sentence on p. 5). Maybe this is just an oversight, but even the description of persistent fatigue as "the CFS-like phenotype", used at several other points in the article, is also potentially somewhat loaded.

Furthermore, the patients in this study were people whom we would have expected to be fatigued, at least throughout their treatment. IFN-α treatment knocks you about quite a bit. Additionally, fatigue is also a common symptom of hepatitis C infection, which makes me wonder whether some of the patients with "persistent fatigue" maybe just had a slightly higher degree of fatigue from their underlying condition rather than the IFN-α treatment --- the definition of persistent fatigue was any score on the Chalder Fatigue Scale that was higher than baseline, presumably even by one point (and, theoretically, even if the score was 0 at baseline and 1 six months after treatment ended). So Russell et al. are comparing people who are recovering faster or slower from fatigue that is entirely expected both from the condition that they have and the treatment that they underwent, with ME/CFS patients in whom the onset of fatigue is arguably the thing that needs to be explained.

There are many possible causes of fatigue, and I don't think that the authors have given us any good reason to believe that the fatigue that was reported by their hepatitis C patients six months after finishing an exhausting medical procedure that itself lasted for half a year or more was caused by the same mechanism (whatever that might be) as the multi-year ongoing fatigue in ME/CFS patients, especially since, for all we know, some or all of the 18 cases of persistent fatigue might have been only marginal (i.e., a small amount worse than baseline) or resolved themselves within not too many months.

2. Is post-treatment fatigue really unrelated to cytokine levels?

It can be seen from Table 2 of the article that the people with "persistent fatigue" (i.e., the hepatitis C patients who were still fatigued six months after finishing treatment) still had elevated cytokine levels at that point, compared to samples of both healthy people and ME/CFS patients. Indeed, these cytokine levels were similarly high in patients whose fatigue had not persisted. The authors ascribe these higher levels of cytokines to the IFN-α treatment; their argument then becomes that, since both the "resolved fatigue" and "persistent fatigue" groups had similar cytokine levels, albeit much higher than in healthy people, that can't be what was causing the difference in fatigue in this case. But I'm not sure they have done enough to exclude the possibility of those high cytokine levels interacting with something else in the PF group. (I must apologise to my psychologist friends here for invoking the idea of a hidden moderator.) Their argument appears to be based on the assumption that ME/CFS-type fatigue and post-IFN-α-treatment fatigue have a common cause, which remains unexplained; however, in the absence of any evidence of what that mechanism might be, this assumption seems to be based mainly on speculation.

3. Statistical limitations

The claim that the difference in fatigue at six-month follow-up was related to a difference in cytokine levels four weeks into the treatment does not appear to be statistically robust. The headline claim --- that fatigue was greater after four weeks in patients who went on to have persistent fatigue --- has a p value of .046, and throughout the article, many of the other focal p values are either just below .05, or even slightly higher, with values in the latter category being described as, for example, "a statistical trend towards higher fatigue", p. 4.  But in the presence of true effects, we would expect a preponderance of much smaller p values.
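To illustrate that last point (this is my own simulation, not a reanalysis of the article's data): when a true effect exists and a study has reasonable power, repeated experiments mostly produce p values far below .05, not a cluster hovering just under the threshold. Here is a sketch with assumed values (two groups of 100, a true standardized difference of 0.5, known unit variance):

```python
import math
import random

random.seed(1)  # for reproducibility

def one_study_p(n=100, d=0.5):
    """Simulate one two-group study with a true effect of d standard
    deviations, and return its two-sided p value (z test, known variance)."""
    g1 = [random.gauss(d, 1) for _ in range(n)]
    g2 = [random.gauss(0, 1) for _ in range(n)]
    z = (sum(g1) / n - sum(g2) / n) / math.sqrt(2 / n)
    # Two-sided p from the standard normal CDF, via the error function
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

ps = [one_study_p() for _ in range(1000)]
print("share with p < .01:       ", sum(p < .01 for p in ps) / len(ps))
print("share with .01 <= p < .05:", sum(.01 <= p < .05 for p in ps) / len(ps))
# Under these assumptions, the large majority of runs give p < .01;
# only about one run in ten lands in the .01-.05 band where Russell
# et al.'s focal results sit.
```

The specific numbers depend on the assumed effect size and sample size, but the qualitative pattern does not: a pile-up of p values just under (and just over) .05 is what one expects from noise plus flexibility, not from a robust effect.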

Russell et al. also sometimes seem to take a creative approach to what counts as a meaningful result. For example, at the end of section 3.1, the authors consider a p value of .09 from a test to represent "trend-statistical significance" (p. 4) and at the start of section 3.2 they invoke another p value of .094 as showing that "IL-6 values in [persistent fatigue] subjects ... remained higher at [the end of treatment]" (p. 5), but in the sentence immediately preceding the latter example, they treat a p value of .12 as indicating that there was "no significant interaction" (p. 5).

These borderline p values should also be considered in the light of the many other analyses that the authors could have performed. For example, they apparently had all the necessary data to perform the comparisons after eight weeks of treatment, after 12 weeks of treatment, and at the end of treatment, as well as the four-week results that they mainly reported. None of the eight-week or 12-week results appear in the article, and the two from the end of treatment are extremely unconvincingly argued (see previous paragraph). It is possible that the authors simply did not perform any tests on these results, but I am inclined to believe that they did run these tests and found that the results did not support their hypotheses.

There is also a question of whether we should be using .05 as our criterion for statistical significance with these results. (I won't get into the separate discussion of whether we should be using statistical significance as a way of determining scientific truth at all; that ship has sailed, and until it voluntarily returns to port, we are where we are.) Towards the bottom of the left-hand column of p. 8, we read:
Finally, due to the sample size there was no correction for multiple comparisons; however, we aimed to limit the number of statistical comparisons by pre-selecting the cytokines to measure at the different stages of the study.
It's nice that the authors pre-selected their predictors, but that is not sufficient. If (as seems reasonable to assume) they also tested the differences between the groups at eight or 12 weeks into the treatment, and found that the results were not significantly different, they should have adjusted their threshold for statistical significance accordingly. The fact that they did not have a very large sample size is not a valid reason not to do this, so I am slightly perplexed by the term "due to" in the sentence quoted above. (The sample size was, indeed, very small. Not only were there only 55 people in total; there were only 18 people in the condition of principal interest, displaying the "CFS-like phenotype". Under these conditions, any effect would have to be very large to be detected reliably.)
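To make the arithmetic concrete (my own illustration of the standard Bonferroni adjustment; the figure of four comparisons is an assumption, since we do not know exactly how many tests were run):

```python
# Bonferroni correction: with m planned comparisons, each individual test
# is held to a threshold of alpha / m rather than alpha.
alpha = 0.05
m = 4  # assumed: comparisons at 4, 8, and 12 weeks and at end of treatment
threshold = alpha / m
print(threshold)  # 0.0125
# The headline four-week fatigue result (p = .046) would not survive
# even this modest adjustment:
print(0.046 < threshold)  # False
```

Bonferroni is conservative, and other corrections exist, but with focal p values of .046 and above, no reasonable adjustment for four or more comparisons would leave the key results significant.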


I don't find Russell et al.'s study to be very convincing. My guess is that different cytokine levels do not predict fatigue in either hepatitis C/IFN-α patients or ME/CFS patients, and that the purported relation between cytokine levels at four weeks into the IFN-α treatment and subsequent fatigue may well just be noise. In terms of explaining how ME/CFS begins, let alone how we might prevent or cure it, this study may not get us any closer to the truth.