Nick Brown's blog: 2020

06 December 2020

Treating COVID-19 patients in the ICU: A doctor's point of view

A few days ago I posted this tweet:

Germany's COVID-19 cases peaked about 18 days ago, but their deaths are still going up very fast. That seems like a longer lag than we see elsewhere. pic.twitter.com/E6aUztI2No
— Nick Brown (@sTeamTraen) December 2, 2020

In reply to that, I was contacted privately by “Dominique” (not their real name), a Twitter user whose profile says that they are an intensive care doctor in a country that has been quite badly affected by COVID-19. I have no reason to doubt that claim—indeed, it’s the entire basis of this post—but I haven’t attempted to formally verify it. I hope that regular readers of this blog will trust my judgment on this point; “false flag” theorists are invited to look away now.

Dominique’s initial contact was in response to my tweet, to discuss possible reasons why Germany’s numbers do not seem to be following the same pattern as other countries. But the conversation developed into a much wider-ranging discussion of the treatment of COVID-19 patients in the ICU. I found this fascinating, and thought it might be of interest (and perhaps even of professional use) to other people. So, with Dominique’s permission, I am converting this exchange of messages into a sort of interview, paraphrasing their responses a little and arranging them for context under “questions” that are intended to function as headings (they may not correspond to actual prompts that I gave during the exchange).

Dominique has read this post and approved it as a description of their thoughts. Everything here is written from Dominique’s point of view, unless it starts with “Nick:”.

Nick: Looking at my tweet, why do you think that Germany’s lag between cases and deaths seems to be so large?

One possible cause for the lag is that they wait longer before withdrawing support. Waiting is somewhat futile in some cases, but we tend to wait longer if demand for ICU beds is not especially intense. That might not explain why the lag is so long, but it is a trend that we’ve noticed. But there might be other causes we don’t know about.

It could also be that German patients are more aggressively treated in general (the German system has plenty of resources), which could make it more likely that the elderly last longer, because they’re more likely to receive invasive organ support. On the other hand, maybe Germany has middle-aged patients attached to machines for a long time. We also wait longer in the younger patients, although it is futile in some cases.

Nick: I’ve also been wondering why Germany’s death rate is (or has been) so much lower than almost all its neighbours. This Wikipedia page suggests that Germany has 3-4 times more ICU beds than many comparable countries. Could that be a factor?

Germany did well from the beginning. I have the feeling that it has consistently shown the most robust response in Europe. They allocated resources according to the size of the problem. For example, they realized (which is not obvious to most physicians who don’t work in ICU) that proning patients [[i.e., laying them on their fronts in an induced coma]] clearly signals a likely stay of more than three weeks, which potentially leads to shortages of ICU beds. Apart from COVID cases, most people typically leave ICU much more quickly, one way or another.

No other country reacted like this as quickly. Also, existing public health and hospital plans were typically designed for a bad flu epidemic, which requires different treatment patterns. The virus was quickly understood from an epidemiological point of view, but to design containment measures they needed to also understand the clinical aspects of the disease, which are what determine exactly what stresses it places on the healthcare system. It seems that Germany has understood this.

Nick: What are the conditions under which you stop treatment in the ICU?

There comes a point after which there will be no tissue function improvement, but we have to wait until the inflammation is over to determine that. It’s a bit like when Notre-Dame cathedral caught fire; they had to wait for it to stop burning to be able to evaluate whether what was left was enough to rebuild on. If what’s left in a patient is not enough to support life outside of the mechanical ventilation setup, one of us will mention this in the morning round and then we make a group assessment with several doctors involved; this might be “let’s wait a few more days”. Such decisions are not evaluated by a single person: there’s a stable group of physicians that receive the relevant information and coordinate such decisions for ICU patients.

The process of actually switching off support after a provisional assessment to do so has been made can take a few days more, maybe even a week or two if we are waiting for other processes that might be ongoing (such as a bacterial sepsis or gut ischemia) to have an outcome, which gives us time to prepare the family for what might happen. In fact many deaths are “semi-programmed”, a result of withdrawal of treatment due to ethical reasons (for example, via Do Not Resuscitate notices); the ICU doctors don’t necessarily walk solemnly over to the machine and switch it off at a particular time. But today, for example, I have a withdrawal of invasive support in a (non-COVID) patient for whom the decision was made 36 hours ago, so I will be present for that one. They will be discharged from the ICU and put on palliative care.

End of life in ICU hasn’t been a particularly pressing issue in our region. Maybe in XXX (a region of the country with three times as many cases) they could give you some examples of ethical dilemmas with COVID. Fortunately we have been able to manage them like any other ARDS [[see below]].

Nick: People from Sweden have told me that (also before COVID) almost nobody in that country gets admitted to ICU, as it’s considered unethical. I suppose it’s a coincidence that it’s also cheaper for the system.

Maybe they offered only basic or intermediate care to the elderly. And the healthier among the elderly will generally survive for three weeks even if the final outcome is death.

It’s sensible to be parsimonious with ICU care. We should probably reinforce intermediate-level care instead, which is a lot less resource-intensive and probably equally effective for many COVID cases. As it is, we are often operating at the limits of ethicality. There are too many decisions to take in a day to get everything right.

Something worth noting is that patients of all ages who die in the ICU share similarities in their disease progression. The older ones are somewhat more numerous, but otherwise their treatment, and the withdrawal of that treatment if it happens, is actually very similar to that of younger people. In fact we have very few patients over 75. So the idea that elderly people can somehow be blamed for taking up ICU capacity is rather unfair. An older person who gets infected is more likely to need the ICU than a younger one, but once they are there, they take up a similar amount of resources.

Another unusual thing with COVID ICU patients is that if they are going to improve, this often doesn’t happen until the third week. Before that their signs tend to oscillate randomly up and down. So we sometimes don’t know how the patient will do until the third week from the onset of severe symptoms, when acute inflammation declines noticeably.

Nick: Can you say something about the effects that all this has on your own mental wellbeing?

Part of the stress of COVID for healthcare workers, apart from the heavy workload, is the mental toll of not seeing what’s ahead in the mist any better than our patients can. It makes you feel like you are a passenger in the car, rather than the driver at the bedside. But once we recognised this pattern and came to accept it, it was easier mentally.

We were also lucky in that our region of the country had a relatively low level of the disease in the first wave. But because of the lockdown at that time, a lot of surgery was cancelled. So we spent part of that time doing planning and logistics, setting up decontamination routes, and customising our PPE. The sense of control that we got from that was good for dealing with our stress.

In the bad times we worked double shifts apart from the 24-hour ones, so there was one more of us on duty. It means we all worked more hours, but it took less of a mental toll, and it only lasted for a few weeks. Today we are over the worst of the second wave. We maintained a sense of control, and remained united like a small army, which really helped. It was also really great when people sent in gifts, especially PPE. I can’t imagine how hard it must have been in XXX.

Nick: What makes a case unrecoverable?

In this sense, COVID cases are easy to call. We sometimes have ethical cases in other pathologies for which we need to work with the support or advice of the medical ethics committee, but for COVID it’s quite clear-cut. After a month or so, there’s not going to be a major change in inflammation or further improvement to any tissue damage. So we test to see if the patient can live with that. The ones that can’t live quickly gasp and start to drown when you try to remove mechanical ventilation support. This relative uniformity in disease course, due to organ damage, can be seen from the fact that mortality waves lag weeks after the admission waves. Admission waves signal the onset of organ damage with loss of function.

There’s no way to escape permanent lung damage. Lungs will never heal in a way that recovers their previous function, and we don’t have the equivalent of a dialysis machine for them. People who manage to get out of ICU despite severe lung damage have a couple of likely outcomes, namely secondary right ventricular (heart) failure and tertiary liver dysfunction. It’s also unethical in any country to keep someone alive if that requires them to be sedated and connected to a machine. That’s worse than death. So the decision is straightforward.

Lung transplants are generally not an option. Only the best cases will be offered a transplant. Almost certainly nobody over 70, and probably not someone younger if they have other health issues. When a patient’s lungs (or heart, or kidneys...) fail, their relatives always ask “Maybe they can get a transplant?”, and we always give the same response: not unless they can survive without critical support. If they can’t make it to that stage, there’s no point. If a patient is not yet stable without invasive support then trying to transplant an organ is probably just a very nasty and expensive way to kill them. The patient would almost certainly experience multi-organ failure due to the extreme stress of transplant surgery, and we would have used a donor organ that could have saved another life. This is so obviously unethical that there are no dissenting opinions that we’re aware of. But I think transplant ethics might be worthy of some review. Transplant teams have been very stringent on some of these criteria, but probably for a good reason.

We only had an ethical issue in one patient when we tried to get her into transplant. We couldn’t, because she was too old, but we thought she deserved it. She made it out of ICU out of pure strength and effort. Maybe she could have had an opportunity, but now she will just have a couple of difficult years before her heart fails. Her kidneys and liver will follow. We hope that her case will be reviewed soon.

It’s a little like doing triage after a bad traffic accident. People with very extreme injuries are left to die if there are only limited resources available at the roadside and people with lesser injuries need those. That’s just how it is: Very severe patients are lower in the priority order sometimes.

But of course, our problems over a critical patient transplant seem trivial when the negligence of governments has caused thousands to die...

Nick: Can you say a bit more about ARDS (Acute Respiratory Distress Syndrome)?

ARDS is a symptom of many diseases, including COVID, but blood and plasma transfusions can cause ARDS as well, for example. Even giving people oxygen can cause a mild form of ARDS. The ARDS that we see in COVID and other SARS-type diseases has some specific features, but the lung management of all ARDS is more or less the same.

Of course we also have to treat the underlying condition that caused ARDS. For example, when we give corticoids to COVID patients, that’s aimed at treating the underlying infection. We don’t hope to improve ARDS directly with corticoids; several studies have reported that steroids are not consistently effective in ARDS. Corticoid treatment might prevent ARDS from getting worse, but we don’t know if it really improves the ARDS that has already been “programmed” by the course of COVID up to that point.

COVID ARDS is like a jail sentence, in that it requires the patient to go through the three-week process described earlier. ARDS is much rarer in flu, and when it occurs it has much wider variation in the time spent in the ICU.

24 July 2020

How bad is self-plagiarism? A case study

One of the recurring topics in this blog over the last couple of years has been self-plagiarism, also known as duplicate publication or text recycling. I've shown that a number of senior scholars appear to have used this method to boost their number of publications without having to go to the effort of producing new research, or rewriting existing knowledge substantially for a new audience.

However, there has been some discussion online suggesting that quite a few people do not consider that self-plagiarism is a problem at all. For example, Dr Adriano Aguzzi of the University of Zürich sees no harm in it:

Today's rant is about academic "self-plagiarism". As the Editor-in-Chief of a medical journal, I thought long and hard about this issue. I have concluded that self-plagiarism is a non-existing crime. There is no moral imperative why authors should not re-use their own words.
— Adriano Aguzzi (@AdrianoAguzzi) July 20, 2020

That said, Dr Aguzzi does attach a couple of conditions to his support for authors recycling their own text:

4. My only constraint is that (1) republication be labelled as such, and (2) that copyright legislation be respected. But forcing people to paraphrase the exactly same concepts with different words, that may be a useful exercise for secondary school - but not for scientists.
— Adriano Aguzzi (@AdrianoAguzzi) July 20, 2020

[[ Update 2020-07-25 10:28 UTC: Dr Aguzzi seems to have deleted the above tweets, along with the rest of the thread in which they appeared. I had taken a screenshot of the first one, and @deadinsideg1 kindly hunted down the second. ]]

I opened Dr Aguzzi's Google Scholar page and looked for the most-cited article for which he was the lead author, which was this:

Aguzzi, A., & Polymenidou, M. (2004). Mammalian prion biology: One century of evolving concepts. Cell, 116(2), 313–327. https://doi.org/10.1016/S0092-8674(03)01031-6

A little searching online revealed that this was the starting point for a succession of examples of self-plagiarism, or "republication", or whatever Dr Aguzzi prefers to call it. In fact, as we will see, Dr Aguzzi takes a very liberal approach to recycling (or perhaps, in the modern parlance, upcycling) his previous publications.

Let's start with the 2004 Aguzzi & Polymenidou article:

The text highlighted in yellow here(*) appears to have been copied, verbatim and without attribution, to this 2006 article by the same author:

Aguzzi, A. (2006). Prion diseases of humans and farm animals: Epidemiology, genetics, and pathogenesis. Journal of Neurochemistry, 97(6), 1726–1739. https://doi.org/10.1111/j.1471-4159.2006.03909.x

In the second of Dr Aguzzi's tweets that I quoted earlier, he made a point that republication should be labelled, and copyright respected. Strangely, however, not only does the 2006 article not mention that parts of the text were previously published; the later article does not even include the 2004 article in its References section. (For completeness, in case the publication pipeline had gone slightly awry, I checked the References section of the 2004 article, but it didn't mention the 2006 article as being "in preparation" or anything like that.) Furthermore, the 2004 article is copyrighted by Cell Press, a division of Elsevier, while the 2006 article is copyrighted by the International Society for Neurochemistry (and the journal is published by Wiley). So it seems that neither of Dr Aguzzi's constraints is met here.

Some parts of the text in the above image are highlighted in green. That brings us to another article, this time with Dr Aguzzi as second author:

Weissmann, C., & Aguzzi, A. (2005). Approaches to therapy of prion diseases. Annual Review of Medicine, 56, 321–344. https://doi.org/10.1146/annurev.med.56.062404.172936

Again, the text highlighted in yellow here appears to have been copied, verbatim and without attribution, from the 2004 article mentioned above. The text highlighted in green appears to have been copied, verbatim and without attribution, from the 2006 article. The 2005 article does not include the 2006 or the 2004 article in its References section, or vice versa. Copyright for the 2005 article is owned by Annual Reviews.

Another new highlight colour (pink) appears here, especially in the second half. The more astute reader may be able to work out where we are going here. Adding the yellow, green, and pink text together, we arrive at close to 95% of the content of this book chapter:

Aguzzi, A. (2007). Prions. In J. H. Growdon and M. N. Rossor (Eds.), The Dementias 2 (pp. 250–275). Butterworth-Heinemann.

If you look hard at the third page in the top row, you can perhaps make out half a page of white text. With the exception of a couple of sentences elsewhere in the chapter, that half-page represents the entire original content of this chapter. Again, this chapter does not cite any of the three articles from which the text has apparently been copied (all together now, 1, 2, 3) verbatim and without attribution. Still, on the bright side, the book's publisher is, like Cell Press, also a division of Elsevier, so presumably there is no risk of any legal issues with the text in yellow that was copied from the 2004 article.

So, back to the question in the title. Is self-plagiarism a bad thing? Dr Aguzzi clearly doesn't think so, and one can only admire the consistency of his position relative to his actions (although he might want to consider addressing the issues around labelling and copyright). I happen to think that this sort of thing is extremely bad for science, but it appears to be sufficiently common that maybe we are going to have to just live with it, and accept that some people think that churning out the same material over and over is perfectly acceptable.

[[ Update 2020-07-23 23:55 UTC: Thanks to Brendan O'Connor for this link to the Association for Psychological Science's policy on self-plagiarism. Spoiler: They are not too keen on it. ]]

(*) I have made the full-sized images and the annotated PDF files available for download here. I hope that this counts as fair use for the purposes of this blog post. If the owners of the copyright want to object to this, I hope that they will realise the irony that would be involved if they decided to enforce their rights now.
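For readers who want to run this kind of comparison on their own texts, the following is a minimal sketch; it is not the procedure used to produce the highlighted images in this post. It estimates what share of the words in one text sit inside runs of n consecutive words that also occur in one or more earlier texts. The function name `verbatim_fraction`, the default run length of 8 words, and the crude tokenisation are all assumptions made for illustration.

```python
import re


def _words(text):
    # Crude tokenisation (an assumption): lower-case runs of word characters.
    return re.findall(r"\w+", text.lower())


def verbatim_fraction(target, sources, n=8):
    """Return the fraction of words in `target` that fall inside at least
    one run of `n` consecutive words that also appears in any of `sources`."""
    # Collect every n-word run that occurs in any of the earlier texts.
    source_runs = set()
    for source in sources:
        sw = _words(source)
        for i in range(len(sw) - n + 1):
            source_runs.add(tuple(sw[i:i + n]))

    tw = _words(target)
    if not tw:
        return 0.0

    # Mark each word of the target that is part of a matching run.
    covered = [False] * len(tw)
    for i in range(len(tw) - n + 1):
        if tuple(tw[i:i + n]) in source_runs:
            for j in range(i, i + n):
                covered[j] = True

    return sum(covered) / len(tw)
```

With real articles one would first need to extract plain text from the PDFs. Short runs can match by chance (stock phrases, reference-list entries), so a larger n gives a more conservative estimate of how much text was reused.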

16 July 2020

An expression of concern about Expressions of Concern

In academic publishing, what is the purpose of a journal issuing an "Expression of Concern" (EoC)?

When I first came across the concept, I was told that an EoC was a sort of preliminary step on the way to retraction. The journal acknowledges that it has received information that suggests that an article may not be reliable. This information seems, on the face of it, to be quite convincing. The journal is still investigating exactly what happened, but in the meantime, here is an early warning that people who are thinking of citing this article might want to think twice. We could see it as the equivalent of locking up someone who is accused of a serious crime: They have not yet been found guilty, their detention is only preventive (and often under better conditions than those who have been convicted), but the prima facie case is such that on balance, we probably don't want to have that person walking around unchecked.

An example of this came in the Brian Wansink case. After retracting, republishing, and re-retracting one of Wansink's articles, JAMA placed EoCs on six other articles with Wansink as an author that had been published in its family of journals. A few months later, with no satisfactory response having been received to explain the problems in those articles, all six were retracted.

However, it appears that many journals or editors are using the term "Expression of Concern" to mean something else. This article has had an EoC on it for six years now. The editors of Psychology of Music just issued this EoC, but according to Samuel Mehr they have no plans to escalate to a retraction. The author of that last paper has also had five EoCs in place at another journal for over a year.

This type of EoC basically comes down to the following statement from the editors: "We have good reason to believe that this article is garbage, and you should not trust it. But we're not going to do anything about it that might hurt our impact factor, or embarrass us by getting us into Retraction Watch." It's like a restaurant menu with a small sticker saying "Pssst: The fish is terrible, please don't order it". (Plus, the sticker is permanent. It's inside the laminated cover of the menu.)

It has been suggested, more than once (albeit with some pushback), that we need different words for different types of retraction (say, "obvious fraud" versus "honest error"). It seems that we also need two different words to describe these two different usages of "Expression of Concern". One journal editor posted what he called an "Editorial Note" on a Wansink article; while this was frustrating for those of us who wanted that article to be retracted, at least it was very clear from that "Editorial Note" that the editor was not remotely interested in doing anything else about the problem. Perhaps that's the way "forward", although it doesn't feel like progress. Correcting the scientific record continues to feel like pulling teeth.

29 June 2020

The Guéguen saga update, summer 2020 edition

Regular readers of this blog may recall seeing a number of posts about the remarkable research of Dr Nicolas Guéguen. In 2017 I wrote here and here (and James Heathers wrote here and here) about several articles with Dr Guéguen as sole author that seemed to have a number of problems, which we summarized here. In May 2019 James and I posted an update in which we reported that the university had investigated and required Dr Guéguen to retract two articles, but that he had not yet done so by the deadline that he accepted. I wrote to the editors of the journals concerned, one of whom did not even acknowledge receipt of my e-mails until I wrote an open letter to him.

A year has gone by, and there have been a few developments. Modest developments, to be sure, but in the error detection business you take whatever you can get...

Radio silence

Dr Guéguen appears to have almost entirely stopped publishing research articles. He has deleted his Google Scholar profile, but by searching with his name I was able to identify only a few articles that have appeared since 2017, and some of those appear to have been submitted some time before that. One example is this one, which was published online in 2013 but only assigned a definitive journal page number in 2017. Unless there is a plausible alternative explanation, that seems to be rather dishonest behaviour by the journal, which appears to be using a known trick of garnering citations before attributing a final publication date in order to boost its impact factor. There is one article from 2019 with Dr Guéguen listed as last author, which seems to follow a similar design to much of the rest of his research output, but the journal in that case does not seem to report received/accepted dates on its articles, so this one could have been in the pipeline for some time. Apart from that, though, Dr Guéguen's previously prolific research output, with up to 20 single-authored publications in some years as well as numerous collaborations, seems to have suddenly ceased round about the time that we started raising questions about it. Presumably this is just a coincidence.

Antoine strikes again

Samuel Mehr has been in correspondence with the editors of Psychology of Music about this article:

Guéguen, N., Meineri, S., & Fischer-Lokou, J. (2014). Men’s music ability and attractiveness to women in a real-life courtship context. Psychology of Music, 42, 545–549. https://doi.org/10.1177/0305735613482025 (PDF available here)

Readers who have read one or two of Dr Guéguen's sole-authored articles may wonder exactly what the other two authors contributed here, as this study is just like the others: A guy called Antoine is trying to pick up young women. In this study he was either carrying a guitar case, a sports bag, or nothing. He got the woman's phone number more often when carrying the guitar. The usual problems are apparent, notably the perfect response rate and the number of women who would have to have all decided to walk down this particular street on their own on a single Saturday afternoon.

When I started drafting this post a couple of days ago, Samuel told me that his last correspondence with the journal had been in January of this year, when (from what I have seen) they appeared to suggest that some sort of action might be forthcoming quite quickly, after an investigation by the publisher's ethical committee. I initially wrote here "But since then, nothing has been forthcoming". However, today I was copied on an e-mail exchange in which the editors of the journal revealed that they are preparing an expression of concern for the article. (I plan to write a separate blog post about the whole question of expressions of concern.)

The awesome power of procrastination (1)

Last time, we reported that, as part of the investigation into his research conducted by the scientific integrity officials at his university, Dr Guéguen had agreed to retract two articles; however, the deadline to do this had passed, and neither article had been retracted. A few months after that blog post, in October 2019, one of these articles was retracted by the journal "at the request of the Université de Bretagne-Sud" (UBS), Dr Guéguen having apparently not honoured his commitment to do this himself. However, as of this writing, the second of these articles still has not been retracted. I have been in contact with the UBS to ask why they apparently did not ask the journal for this second article to be retracted, and the scientific integrity officer there has told me that he will pass on my message to the Presidents of the two universities involved in the scientific investigation (UBS and Rennes-2). Perhaps something will come of this.

I fought the law, and the law went "meh"

One apparently positive development of the scientific investigation (see previous paragraph) is that the Presidents of UBS and Rennes-2 decided to launch a disciplinary investigation into Dr Guéguen. This took the better part of a year to convene, apparently because they had difficulty finding people to serve on the panel. In the end it was outsourced to the University of Angers.

James Heathers and I gave evidence to this inquiry in October 2019. In my evidence I emphasised that Dr Guéguen's principal defence --- namely, that he had naïvely trusted his students to do good fieldwork, despite them having zero training --- did not hold up, because in many cases his articles would have required input from faculty members and the expenditure of budget money (whereas no funding ever seems to be reported). For example, studies where saliva samples are taken require analyses to be performed in a laboratory; even if this is on the university campus these assays will certainly need to be paid for, and even if there is some remarkable system whereby this is done for free in the name of research, there will exist traces of the request for the analyses to be performed.

Since then we have heard nothing official. But I have been told by two people with inside knowledge that a report exists, and that it states that Dr Guéguen did nothing to violate scientific integrity.

The awesome power of procrastination (2)

Also last time, we noted that we had strong evidence that Dr Guéguen's article entitled "Women’s hairstyle and men’s behavior: A field experiment", published in the Scandinavian Journal of Psychology, was, as a minimum, stolen from the work of three undergraduates, with the added twist that these undergraduates might well have themselves fabricated the study.

Following my open letter to Dr Jerker Rönnberg, the editor-in-chief of the journal, he agreed that he would look into the matter. I have written to him a couple of times since then, but there was no reply or acknowledgement of any kind until, in response to an e-mail that I sent on 29 May 2020, I received an out-of-office (from the e-mail address that is listed for contact on the journal's web site) stating that Dr Rönnberg now has the status of professor emeritus and "will only answering questions/e-mails occasionally". This didn't seem like a very satisfactory state of affairs, so I wrote to the editorial assistant of the journal, Dr Stefan Gustafson. He told me that the editors had discussed the matter with the publisher and then contacted the original reviewers, of whom one didn't respond and the other said they thought the paper was fine. No decision has yet been taken about "Women’s hairstyle and men’s behavior", and I got the impression from Dr Gustafson's e-mail that this is unlikely to happen before a new editor-in-chief is in place. <judge_judy_taps_wristwatch.gif>

The past is a foreign country; they do science differently there

In September 2019, the editors of the International Review of Social Psychology (IRSP) received a report that they had commissioned from Hans IJzerman into the six articles by Dr Guéguen that were published in their journal. This report recommended that two of the articles be retracted immediately, two others be given an expression of concern, and two should be corrected. One of the people asked by Hans to verify the accuracy of the report wrote that "the evidence from the blog posts and statistical investigators supports the conclusions ... that research misconduct likely took place".

Instead of issuing any retractions, however, the editors of IRSP issued five expressions of concern and accepted one correction. As part of their reasoning for why no article should be retracted, they stated that, although "[t]he report concludes misconduct", "the standards for conducting and evaluating research have evolved since [these articles were published]". I will leave it up to the reader to judge whether what took place (or, perhaps more relevantly, did not take place) in these cases was reasonable by the standards of social psychology in the period 2002–2011. Perhaps Diederik Stapel will be getting his PhD back soon; after all, this was his most prolific period too.

That open goal looks so nice, it would be a shame to kick a ball at it

To my knowledge, the only other formal action by a journal in the past year, apart from the response by IRSP (see previous point), is the expression of concern that was issued on 16 March 2020 by the editors of Letters on Evolutionary Behavioral Science regarding this article, which we examined in our original report (in which we showed that the claimed pattern of behaviour by participants was highly unlikely in all conditions of the study):

Guéguen, N. (2012). Risk taking and women’s menstrual cycle: Near ovulation, women avoid a doubtful man. Letters on Evolutionary Behavioral Science, 3, 1–3. https://doi.org/10.5178/lebs.2012.17

This expression of concern ends with the following paragraph, which will sound rather familiar to anyone who has been following this, or indeed almost any other recent story of journals' responses to terrible articles:

Although the investigation committee concluded that there is no decisive evidence of scientific misconduct, they still share Brown and Heathers’s (2017) concerns. Moreover, the above errors in statistics severely discredit the scientific value of Guéguen (2012). In sum, we admit that we do not have decisive evidence to retract the publication of Guéguen (2012). However, we would like to advise readers of LEBS to exercise great caution in interpreting the reported results in Guéguen (2012).

The article is still sitting there as part of the scientific record in the journal, its web page does not mention the expression of concern, and the PDF file has not been modified.

Conclusion

At the risk of letting my attempt at a mask of professionalism slip for a moment: FFS. This whole process is like pulling teeth. There has to be a better way to handle cases of obviously shoddy science than this. Four and a half years after James and I started looking at a huge number of studies that cannot possibly have taken place as described, we have a total of one retraction and seven (soon to be eight) expressions of concern (and it is unclear whether any of those represent a prelude to retraction). If the French academic establishment and the international publishing system can't be bothered to clean up a case as obviously terrible as this in a field with almost no conflicts of interest, what chance is there of anything being done about, say, ethical violations in COVID-19 research?

24 May 2020

The Silence of the RIOs

Just over a month ago, I published these two blog posts. After the first, Daniël Lakens tweeted this:

I hope you forwarded a link to your blog to the university ethics person at these 12 respective institutions? At least a benefit of having so many authors is that 1 uni might take some sort of action?
— Daniël Lakens (@lakens) April 21, 2020

I thought that was a good idea, so I set out to find who the "university ethics person" might be for the 15 co-authors of the article in question. (I wrote directly and separately to the two PhD supervisors of the lead author, as it is he who appears to be prima facie responsible for most of its deficiencies; I also wrote to Nature Scientific Reports outlining my concerns about the article. In both cases I received a serious reply indicating that they were concerned about the situation.)

It turns out that finding the address of the person to whom complaints about research integrity at a university or other institution should be sent is not always easy. There were only one or two cases where I was able to do this by following links from the institution's web site, as regular readers of xkcd might have been able to guess. In a few cases I used Google with the site: option to find a person. But about half the time, I couldn't identify anyone. In those cases I looked for the e-mail address of someone who might be the dean or head of department of the author concerned. Hilariously, in one case, the author was the head of department and I ended up writing to the president of the university.

Anyway, by 24 April 2020 I had what looked like a plausible address at all of the different institutions to which the co-authors were affiliated (which turned out to be nine in total, not 12), so I sent this e-mail.

From: Nicholas Brown <nicholas.brown@lnu.se>
Sent: 24 April 2020 16:04
To: [9 people]
Subject: Possible scientific misconduct in an article published in Nature Scientific Reports

First, allow me to apologise if I have addressed this e-mail to any of you in error, and also if my use of the phrase "Research Integrity Officer" in the above salutation is not an accurate summary of your job title. I had some difficulty in establishing, from your institution's web site, who was the correct person to write to for questions of research integrity in many cases, including [list]. In those cases I attempted to identify somebody who appears to have a senior function in the relevant department. In the case of [institution], I only found a general contact address --- I am trying to reach someone who might have responsibility for the ethical conduct of "XXX" in the XXX Department.

I am writing to bring your attention to these [sic; I started drafting the e-mail before I wrote the second post, and not everything about it evolved correctly after that] blog posts, which I published on April 21, 2020: https://steamtraen.blogspot.com/2020/04/some-issues-in-recent-gaming-research.html.

At least one author of the scientific article that is the principal subject of that blog post (Etindele Sosso et al., 2020; https://doi.org/10.1038/s41598-020-58462-0, published on 2020-02-06 in Nature Scientific Reports) lists your institution as their affiliation.

While my phrasing in that public blog post (and a follow-up, which is now linked from the first post) was necessarily conservative, I think it is clear to anyone with even a minimum of relevant scientific training who reads it that there is strong prima facie evidence that the results of the Etindele Sosso et al. (2020) article have been falsified, and perhaps even fabricated entirely. Yet, 15 other scholars, including at least one at your institution (in the absence of errors of interpretation on my part) signed up to be co-authors of this article.

There would seem to be two possibilities in the case of each author.

1. They knew, or should have known, that the reported results were essentially impossible. (Even the Abstract contains claims about the percentage of variance explained by the main independent variable that are utterly implausible on their face.)

2. They did not read the manuscript at all before it was submitted to a Nature group journal, despite the fact that their name is listed as a co-author and included in the "Author contributions" section as having, at least, "contributed to the writing".

It seems to me that either of these constitutes a form of academic misconduct. If these researchers knew that the results were impossible, they are culpable in the publication of falsified results. If they are not --- that is, their defence is that they did not read and understand the implications of the results, even in the Abstract --- then they have made inappropriate claims of authorship (in a journal whose own web site states that it is the 11th most highly cited in the world). Either of these would surely be likely to bring your institution into disrepute.

For your information, I intend to make this e-mail public 30 days from today, accompanied by a one-sentence summary (without, as far as possible, revealing any details that might be damaging to the interests of anyone involved) of your respective institutions' responses until that point. I would hope that, despite the difficult circumstances under which we are all working at the moment, it ought to be able to at least give a commitment to thoroughly investigate a matter of this importance within a month. I mention this because in previous cases where I have made reports of this kind, the modal response from institutional research integrity officers has been no response at all.

Of course, whatever subsequent action you might decide to take in this matter is entirely up to you.

Kind regards,
Nicholas J L Brown, PhD
Linnaeus University

The last-but-one paragraph of that e-mail mentions that, 30 days from the date of the e-mail, I intended to make it public, along with a brief summary of the responses from each institution. The e-mail is above. Here is how each institution responded:

Nottingham Trent University, Nottingham, UK: Stated that they would investigate, and gave me an approximate date by which they anticipated that their investigation would be complete.
Central Queensland University, Rockhampton, Australia: Stated that they would investigate, but with no estimate of how long this would take.
Autonomous University of Nuevo Leon, Monterrey, N.L., Mexico: No reply.
Jamia Millia Islamia, New Delhi, India: No reply.
University of L’Aquila, L’Aquila, Italy: No reply.
Army Share Fund Hospital, Athens, Greece: No reply.
Université de Montréal, Montréal, Québec, Canada: No reply.
University of Limerick, Limerick, Ireland: No reply.
Lero Irish Software Research Centre, Limerick, Ireland: No reply.

By "No reply" here, I mean that I received nothing. No "Undeliverable" message. No out-of-office message. No quick reply saying "Sorry, COVID-19 happened, we're busy". Not "We'll look into it". Not "We won't look into it". Not even "Get lost, there is clearly no case to answer here". Nothing, nada, nichts, rien, zip, in reply to what I (and, apparently, the research integrity people at the two institutions that did reply) think is a polite, professional e-mail, with a subject line that I hope suggests that a couple of minutes of the recipient's time might be a worthwhile investment, in 7 out of 9 cases.

I find this disappointing. I wish I could say that I found it remotely surprising. Maybe I should just be grateful that Daniël's estimate of one institution taking any sort of action was exceeded by 100%.

16 May 2020

The perils of improvising with linear regression: Stedman et al. (in press)

This article has been getting a lot of coverage of various kinds in the last few days, including the regional, UK national, and international news media:

Stedman, M., Davies, M., Lunt, M., Verma, A., Anderson, S. G., & Heald, A. H. (in press). A phased approach to unlocking during the COVID-19 pandemic – Lessons from trend analysis. International Journal of Clinical Practice. https://doi.org/10.1111/ijcp.13528

There doesn't seem to be a typeset version from the journal yet, but you can read the final draft version online here and also download it as a PDF file.

The basic premise of the article is that, according to the authors' model, COVID-19 infections are far more widespread in the community in the United Kingdom(*) than anyone seems to think. Their reasoning works in three stages. First, they built a linear model of the spread of the disease, one of whose predictors was the currently reported number of total cases (i.e., the official number of people who have tested positive for COVID-19). Second, they extrapolated that model to a situation in which the entire population was infected, assuming that the spread continues to be entirely linear. Third, they used the slope of their line to estimate what the official reported number of cases would be at that point. They concluded that their model shows that the true number of cases in the population is 150 times larger than the number of positive tests that have been carried out, so that on the day when their data were collected (24 April 2020) 26.8% of the population were already infected.

The above figures are from the Results section of the paper, on p. 11 of the final draft PDF. However, the Abstract contains different numbers, which seem to be based on data from 19 April 2020. The Abstract asserts that the true number of cases in the population may be 237 (versus 150) times the reported number, and that the percentage of the population who had been infected might have been 29% (versus 26.8% several days later). Aside from the question of why the Abstract includes different principal results from the Results section, it would appear to be something of a problem for the authors' assumptions of (ongoing) linearity in the relation between the spread of the disease and the number of reported cases if the slope of their model changed by a factor of one-third over five days.
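To put a number on that change, here is a minimal arithmetic sketch using only the two multipliers quoted above (237 from the Abstract, 150 from the Results section). The variable names are mine, chosen for illustration.

```python
# Multipliers (true cases / reported cases) as quoted from the article.
abstract_multiplier = 237   # Abstract, apparently based on data from 19 April 2020
results_multiplier = 150    # Results section, data from 24 April 2020

# Relative drop in the multiplier between the two dates.
relative_drop = (abstract_multiplier - results_multiplier) / abstract_multiplier
print(f"Relative drop over five days: {relative_drop:.1%}")
```

The drop comes out at a little over one-third, which is the size of the change that a model assuming ongoing linearity would have to absorb in under a week.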

But it seems to me that, apart from the rather tenuous assumptions of linearity and the validity of extrapolating a considerable way beyond the range of the data (which, to be fair, they mention in their Limitations paragraph), there is an even more fundamental problem with how the authors have used linear regression here. Their regression model contained at least nine covariates(**), and we are told that "The stepwise(***) regression of the local UTLA factors to R_ADIR showed that only one factor total reported cases/1,000 population [sic] was significantly linked" (p. 11). I take this to mean that, if the authors had reported the regression output in a table, this predictor would be the only one whose coefficient was, in absolute value, at least twice its standard error. (The article is remarkably short on numerical detail, with neither a table of regression output nor a table of descriptives and correlations of the variables. Indeed, there are many points in the article where a couple of minutes spent on attention to detail would have greatly improved its quality, even in the context of the understandable desire to communicate results rapidly during a pandemic.)

Having established that only one predictor in this ten-predictor regression was statistically significant (in the sense of a 95-year-old throwaway remark by R. A. Fisher), the authors then proceeded to do something remarkable. Remember, they had built this model:

Y = B₀ + B₁X₁ + B₂X₂ + B₃X₃ + ... + B₁₀X₁₀ + Error

(with Error apparently representing 78% of the variance, cf. line 3 of p. 11). But they then dropped ten of those terms (nine regression coefficients multiplied by the values of the predictors, plus the error term) to come up with this model (p. 11):

R_ADIR = 1.06 - 0.16 x Current Total Cases/1,000

What seems to have happened here is that the authors in effect decided to set the regression coefficients B₂ through B₁₀ to zero, apparently because their respective values were less than twice their standard errors and so "didn't count" somehow. However, they retained the intercept (B₀) and the coefficient associated with their main variable of interest (B₁) from the 10-variable regression, as if the presence of the nine covariates had had no effect on the calculation of these values. But of course, those covariates had had an effect on both the estimation of the intercept and the coefficient of the first variable. That was precisely what they were included in the regression for. If the authors had wanted to make a model with just one predictor (the number of current total cases), they could have done so quite simply with a one-variable regression. You can't just run a multiple regression and keep the coefficients (with or without the intercept) that you think are important while throwing away the others.
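The point can be illustrated with a small simulation. This is only a sketch on invented data, not a reanalysis of anything in Stedman et al.: the variables `x1` and `x2`, the coefficients used to generate `y`, and the helper `ols` are all hypothetical, and the covariate is deliberately made strong so that the effect is easy to see. The same logic applies, to a smaller degree, when the covariates are weaker.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Invented data: x1 is the predictor of interest, x2 a covariate correlated with it.
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.6, size=n)
y = 1.0 + 0.5 * x1 + 0.7 * x2 + rng.normal(size=n)

def ols(y, *predictors):
    """Ordinary least squares; returns [intercept, b1, b2, ...]."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs

def sse(pred):
    """Sum of squared errors of a set of predictions of y."""
    return float(np.sum((y - pred) ** 2))

full = ols(y, x1, x2)    # multiple regression: y ~ x1 + x2
single = ols(y, x1)      # one-variable regression: y ~ x1

print("multiple regression: intercept=%.2f, b1=%.2f, b2=%.2f" % tuple(full))
print("one-variable model:  intercept=%.2f, b1=%.2f" % tuple(single))

# "Truncated" model: keep intercept and b1 from the multiple regression, drop b2.
print("SSE, truncated model:   %.1f" % sse(full[0] + full[1] * x1))
print("SSE, one-variable refit: %.1f" % sse(single[0] + single[1] * x1))
```

In this toy setup the coefficient for `x1` differs markedly between the two fits, and the truncated model can never fit the data better than the one-variable refit, because the refit is by construction the least-squares line for that single predictor.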

This seems to me to be a rather severe misinterpretation of what a regression model is and how it works. There are many other things that could be questioned about this study(****), and indeed several people are already doing precisely that, but this seems to me to be a very fundamental problem with the article, and something that the reviewers really ought to have picked up. The first two authors appear to be management consultants whose qualifications to conduct this sort of analysis are unclear, but the third author's faculty page suggests that he knows his stuff when it comes to statistics, so I'm not sure how this was allowed to happen.

Stedman et al. end with this sentence: "The manuscript is an honest, accurate, and transparent account of the study being reported. No important aspects of the study have been omitted." This is admirable, and I take it to mean that the authors did not in fact run a one-predictor regression to estimate the effect of their main IV of interest on their DV before they decided to run a stepwise regression with nine covariates. However, I suggest that it might be useful if they were to run that one-predictor regression now, and report the results along with those of the multiple regression (cf. Simmons, Nelson, & Simonsohn, 2011, p. 1362, Table 2, point 6). When they do that, they might also consider incorporating the latest testing data and see if the slope of their regression has changed, because since 24 April the number of cases in the UK has more than doubled (240,161 at the moment I am writing this), suggesting that between 54% and 84% of the population has by now become infected, depending on whether we take the numbers from p. 11 of the article or those from the Abstract.

[[ Update 2020-05-18 00:15 UTC: There is a preprint of this paper, available here. It contains the same basic model, which is estimated using data from 11 days earlier than the accepted manuscript. In the preprint, the regression equation (p. 7) is:

R_ADIR = 1.20 - 0.26 x Current Total Cases/1,000

In other words, between the submission date of the preprint and the preparation date of the final manuscript, the slope of the regression line --- which the model assumes would be constant until everyone was infected --- changed from -0.26 to -0.16. And yet the authors did not apparently think that this was sufficient reason to question the idea that the progress of the disease would continue to match their linear model, despite direct evidence that it had failed to do so over the previous 11 days. This is quite astonishing. ]]
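For readers who want to see how much difference that change in slope makes, here is a rough sketch. It rests on an assumption that is not spelled out in the text above: that the multiplier is obtained by extending the fitted line to the point where R_ADIR reaches zero, and treating the reported cases per 1,000 at that point as corresponding to the whole population being infected. The function name `implied_multiplier` is hypothetical.

```python
def implied_multiplier(intercept, slope):
    """Return (reported cases per 1,000 where the line hits zero,
    implied ratio of total to reported cases), under the assumption
    that everyone is infected when the fitted line reaches zero."""
    cases_per_1000_at_zero = intercept / -slope
    return cases_per_1000_at_zero, 1000 / cases_per_1000_at_zero

# Coefficients as quoted from the accepted manuscript and the preprint.
final = implied_multiplier(1.06, -0.16)
preprint = implied_multiplier(1.20, -0.26)

print("accepted manuscript: %.2f per 1,000, multiplier %.0f" % final)
print("preprint:            %.2f per 1,000, multiplier %.0f" % preprint)
```

Under that assumption the accepted manuscript's coefficients give a multiplier of about 151, close to the figure of 150 quoted from the Results section, while the preprint's coefficients give a multiplier of about 217. If the assumption is right, the headline number is very sensitive to a slope that moved substantially in 11 days.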

(*) Whether the authors claim that their model applies to the UK or just to England is not entirely clear, as both terms appear to be used more or less interchangeably. They use a figure of 60 million as the population of England, although the Office for National Statistics reports figures of 56.3 million for England and 66.8 million for the UK in mid 2019.

(**) I wrote here that there were nine, but one of them is ethnicity, which would typically have been coded as a series of categories, each of which would have functioned as a separate predictor in the regression. But maybe they used some kind of shorthand such as "Percentage who did/didn't identify in the 'White British' category", so I'll continue to assume that there were nine covariates and hence 10 predictors in total.

(***) Stepwise regression is generally not considered a good idea these days. See for example here. Thanks to Stuart Ritchie for this tip and for reading through a draft of this post.

(****) Looking at Figure 4, it occurs to me that a few data points at the top left might well be lifting the left-hand side of the regression line up to some extent, but that's all moot until we know more about the single-variable regression. Also, there is no confidence interval or any other measure of the uncertainty --- even assuming that the model is perfectly linear --- of the estimated reported case rate when the infection rate drops to zero.

23 April 2020

The Mystery of the Missing Authors

What do the following researchers have in common?

Muller G. Hito, Department of Psychology and Sports Science, Justus-Liebig University, Germany
Okito Nakamura, Global Research Department, Ritsumeikan University, Japan
Mitsu Nakamura, The Graduated [sic] University of Advanced Studies, Japan
John Okutemo, Usman University, Sokoto, Nigeria
Eryn Rekgai, Department of Psychology and Sports Science, Justus-Liebig University, Germany
Mbraga Theophile, Kinshasa University, Republic of Congo
Bern S. Schmidt, Department of Fundamental Neuroscience, University of Lausanne, Switzerland

Despite their varied national origins, it seems that the answer is "quite a bit":

1. They seem to collaborate with each other, in various combinations, on short articles with limited empirical content, typically published with less than a week from submission to acceptance. (Some examples: 1 2 3 4 5) The majority of these articles date from 2017, although there are some from 2018 and 2019 as well.

2. Apart from each other, these people have published with almost nobody else, except that:

(a) Four of them have published with Faustin Armel Etindele Sosso (whom I will refer to from now on as FAES), the lead author of the article that I discussed in this post. (Examples: 6 7 8) In one case, FAES is the corresponding author although he is not listed as an actual author of the article. I don't think I have ever seen that before in scholarly publishing.

(b) Two of them have published with an author named Sana Raouafi --- see the specific paragraph on this person towards the end of this post.

3. Whether FAES is a co-author or not, these researchers have a remarkable taste for citing his published work, which typically accounts for between 50% and 100% of the References section of any of their articles.

4. When one of these researchers, rather than FAES, is the corresponding author of an article, they always use a Yahoo or Gmail address. So far I have identified "s.bern@yahoo.com", "mullerhito@yahoo.com", "mitsunaka216@gmail.com", and "okitonaka216@gmail.com". None of these researchers seems to use an institutional e-mail address for correspondence. Of course, this is not entirely illegitimate (for example, if one anticipates moving to another institution in the near future), but it seems quite unusual for none of them to have used their faculty address.

[[ Update 2020-04-27 17:22 UTC: I have identified that "Erin Regai", who I think is the same person as "Eryn Rekgai" but with slightly different spelling, has the e-mail address "eregai216@gmail.com". That makes three people with Gmail addresses ending in 216. It would be interesting to discover whether anybody involved in these authors' publication projects has a birthday on 21/6 (21 June) or 2/16 (February 16). ]]

5. None of these people seems to have an entry in the online staff directory of their respective institutions. (The links under their names at the start of this post all go to their respective ResearchGate profiles, or if they don't have one, RG's collection of their "scientific contributions".) Of course, one can never prove a negative, and some people just prefer a quiet life. So as part of this blog post I am issuing a public appeal: If you know (or, even better, if you are) any of these people, please get in touch with me.

I don't have time to go into all of these individuals in detail, but here are some highlights of what I found in a couple of cases. (For the two authors named Nakamura, I am awaiting a response to inquiries that I sent to their respective institutions; I hope that readers will forgive me for publishing this post before waiting for a reply to those inquiries, given the current working situation at many universities around the world.)

[[ Update 2020-04-24 21:24 UTC: Ms. Mariko Kajii of the Office of Global Planning and Partnerships at The Ritsumeikan Trust has confirmed to me that nobody named "Okito Nakamura" is known to that institution. ]]

[[ Update 2020-04-24 23:37 UTC: Mitsu Nakamura's ResearchGate page claims that Okito Nakamura is a member of Mitsu's lab at "The graduated [sic] University of Advanced studies". It seems strange that someone would be affiliated with one university (even if that university denied any knowledge of them, cf. my previous update) while working in a lab at another. Meanwhile, Mitsu Nakamura's name does not appear in Japan's national database of researchers. ]]

Muller G. Hito

For this researcher --- who does not seem to be quite sure how their own name is structured(*), as they sometimes appear at the top of an article as "Hito G. Muller" --- we have quite extensive contact information, for example in this article (which cites 18 references, 12 of them authored by FAES).

I looked up that phone number and found that it does indeed belong to someone in the Department of Psychology and Sports Science at Justus-Liebig University, namely Prof. Dr. Hermann Müller. For a moment I thought that maybe Prof. Dr. Müller likes to call himself "Hito", and maybe he got his first and last names mixed up when correcting the proofs of his article. But as my colleague Malte Elson points out, no German person named "Müller" would ever allow their name to be spelled "Muller" without the umlaut. (In situations where the umlaut is not available, for example in an e-mail address, it is compensated for by adding an e to the vowel, e.g., in this case, "Mueller".)

In any case, Malte contacted Prof. Dr. Müller, who assured him that he is not "Hito G. Muller" or "Muller G. Hito". Nor has Dr. Müller ever heard of anyone with that name, or anyone with a name like "Eryn Rekgai", in the department where he works.

Bern S. Schmidt

Bern Schmidt is another author who likes to permute the components of their name. They have published articles as "Bern S. Schmidt", "Bern Schmidt S.", "Bern, SS", and perhaps other combinations. Their bio on their author page on the web site of Insight Medical Publishing, which publishes a number of the journals that contain the articles that are linked to throughout this post, says:

Dr Bern S. Schmidt is a neuroscientist and clinical tenure track [sic] of the CHUV, working in the area of fundamental neuroscience and psychobiological factors influencing appearance of central nervous disorders and neurodegenerative disorders such as Alzheimer and Dementia. He holds a medical degree at the University of Victoria, follow by a residency program at The Shiga University of Medicine and a postdoctoral internship at the Waseda University.

I assume that "CHUV" here refers to "Centre Hospitalier Universitaire Vaudois", the teaching hospital of the University of Lausanne where Dr. Schmidt claims to be affiliated in the Department of Fundamental Neuroscience. But a search of the university's web site did not find any researcher with this name. I asked somebody who has access to a directory of all past and present staff members of the University of Lausanne if they could find anyone with a name that corresponds even partially to this name, and they reported that they found nothing. Meanwhile, The University of Victoria has no medical degree programme, and their neuroscience programme has no trace of anyone with this name.

[[ Update 2020-04-27 17:24 UTC: A representative of the University of Lausanne has confirmed to me that they can find no trace of anybody named "Bern Schmidt" at their institution. ]]

(A minor detail, but one that underscores how big a rabbit hole this story is: Dr. Schmidt seems to have an unusual telephone number. This article lists it as "516-851-8564", which looks more like a North American number than a Swiss one. Indeed, it is identical to the number given in this apparently unrelated article in the same journal for the corresponding author Hong Li of the Department of Neuroscience at Yale University School of Medicine. Dr. Hong Li's doubtless vital --- after all, she is at Yale --- contribution to neuroscience research was accepted within 6 days of being submitted, presumably having been pronounced flawless by the prominent scholars who performed the rigorous peer review process for the prestigious Journal of Translational Neurosciences. It is, however, slightly disappointing that typesetting standards at this paragon of scientific publishing do not extend to removing one author's phone number when typesetting the next one to be published on the same day. If anyone knows where Dr. Bern Schmidt is, perhaps they could mention this to them, so that this important detail can be corrected. We wouldn't want Dr. Hong Li's valuable Yale neuroscientist time to be wasted answering calls intended for Dr. Schmidt.)

These authors' recent dataset

The only activity that I have been able to identify from any of these authors in the last few months is the publication of this dataset, which was uploaded to Mendeley on March 22, 2020. As well as FAES, the authors are listed as HG Muller, E. Regai [sic], and O. Nakamura. From the "Related links" on that page, it appears that this dataset is a subset (total N=750) of the 10,566 cases that make up the sample described in the Etindele Sosso et al. article in Nature Scientific Reports that was the subject of my previous blog post.

However, a few things about these data are not entirely consistent with that article. For example, while the per-country means for the variables "Age", "Mean Hours of Gaming/week", and "Mean months of gaming/gamer" correspond to the published numbers in Table 2 of the article in five out of six cases (for "Mean months of gaming/gamer" in the sample from Gabon the mean is 15.77, whereas in the article the integer-rounded value reported was 15), all of the standard deviations in the dataset are considerably higher than those that were published, by factors ranging from 1.3 to 5.1.

Furthermore, there are some patterns in the distribution of scores in the four outcome variables (ISI, EDS, HADS-A subscale, and HADS-D subscale) that are difficult to explain as being the results of natural processes. For all four of these measures in the N=301 sample from Tunisia, and three of them (excluding the EDS) in the N=449 sample from Gabon, between 77% and 92% of the individual participants' total scores on each of these subscales are even numbers. For the EDS in the sample from Gabon, 78% of the scores are odd numbers. In the Gabon sample, it is also noticeable that the ISI score for every participant is exactly 2 higher than their HADS-A score and exactly 3 higher than their EDS score; the HADS-A score is also 2 higher than the HADS-D for 404 out of 449 participants.
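Checks of this kind are easy to script. The following is a minimal sketch of the two checks described above (the share of even scores, and the distribution of differences between pairs of measures). The rows are toy values invented purely for illustration and are not taken from the Mendeley dataset; the column names and the helper functions `share_even` and `difference_counts` are likewise my own.

```python
from collections import Counter

# Toy rows, invented for illustration only; NOT the published data.
rows = [
    {"ISI": 14, "EDS": 11, "HADS_A": 12, "HADS_D": 10},
    {"ISI": 10, "EDS": 7,  "HADS_A": 8,  "HADS_D": 6},
    {"ISI": 18, "EDS": 15, "HADS_A": 16, "HADS_D": 13},
    {"ISI": 9,  "EDS": 6,  "HADS_A": 7,  "HADS_D": 5},
]

def share_even(rows, column):
    """Proportion of participants whose score in `column` is an even number."""
    return sum(r[column] % 2 == 0 for r in rows) / len(rows)

def difference_counts(rows, a, b):
    """Count how often each value of (score a minus score b) occurs."""
    return Counter(r[a] - r[b] for r in rows)

for column in ("ISI", "EDS", "HADS_A", "HADS_D"):
    print(column, "share of even scores:", share_even(rows, column))

print("ISI - HADS_A:", dict(difference_counts(rows, "ISI", "HADS_A")))
print("ISI - EDS:", dict(difference_counts(rows, "ISI", "EDS")))
print("HADS_A - HADS_D:", dict(difference_counts(rows, "HADS_A", "HADS_D")))
```

With scores from independently completed questionnaires one would expect roughly equal numbers of odd and even totals, and a spread of differences between measures; a single difference value shared by every participant is the kind of pattern that the pivot tables in the supporting file are meant to show.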

It is not clear to me why Hito G. Muller, Eryn Re[k]gai, and Okito Nakamura might be involved with the publication of this dataset, when their names were not listed as authors of the published article. But perhaps they have very high ethical standards and did not feel that their contribution to the curation of the data, whatever that might have been, merited a claim of authorship in the 11th most highly cited scientific journal in the world.

The other author who does seem to exist

There is one co-author on a few of the articles mentioned above who does actually appear to exist. This is Sana Raouafi, who reports an affiliation with the Department of Biomedical Engineering at the Polytechnique Montréal. The records office of that institution informed me that she was awarded her PhD on January 27, 2020. I have no other contact information for her, nor do I know whether she genuinely took part in the authorship of these strange articles, or what her relationship with FAES (or, if they exist, any of the other co-authors) might be.

Supporting file

There is one supporting file for this post here:
- Muller-dataset-with-pivots.xls: An Excel file containing my analyses of the Muller et al. dataset, mentioned above in the section "These authors' recent dataset". The basic worksheets from the published dataset have been enhanced with two sheets of pivot tables, illustrating the issues with the outcome measures that I described.

Acknowledgements

Thanks to Elisabeth Bik, Malte Elson, Danny Garside, Steve Lindsay, Stuart Ritchie, and Yannick Rochat for their help in attempting to track down these elusive researchers. Perhaps others will have more luck than us.

(*) I am aware that different customs exist in different countries regarding the order in which "given" and "family" names are written. For example, in several East Asian countries, but also in Hungary, it is common to write the family name first. Interestingly, there is often some ambiguity about this among speakers of French. But as far as I know, German speakers, like English speakers, always put their given name first and their family name last, unless there is a requirement to invert this order for alphabetisation purposes. And of course, in some parts of the world, the whole idea of "family names" is much more complicated than in Western countries. It's a fascinating subject that, alas, I do not have time to explore here.

21 April 2020

Some issues in a recent gaming research article: Etindele Sosso et al. (2020)

Research into the possibly problematic aspects of gaming is a hot topic. But most studies in this area have focused on gamers in Europe and North America. So a recent article in Nature Scientific Reports, featuring data from over 10,000 African gamers, would seem to be an important landmark for this field. However, even though I am an outsider to gaming research, it seems to my inexpert eye that this article may have a few wrinkles that need ironing out.

Let’s start with the article reference. It has 16 authors, and the new edition of the APA Publication Manual says that we now have to list up to 20 authors’ names in a reference, so let’s take a deep breath:

Etindele Sosso, F. A., Kuss, D. J., Vandelanotte, C., Jasso-Medrano, J. L., Husain, M. E., Curcio, G., Papadopoulos, D., Aseem, A., Bhati, P., Lopez-Rosales, F., Ramon Becerra, J., D’Aurizio, G., Mansouri, H., Khoury, T., Campbell, M., & Toth, A. J. (2020). Insomnia, sleepiness, anxiety and depression among different types of gamers in African countries. Nature Scientific Reports, 10, 1937. https://doi.org/10.1038/s41598-020-58462-0

(The good news is that it is an open access article, so you can just follow the DOI link and download the PDF file.)

Etindele Sosso et al. (2020) investigated the association between gaming and the four health outcomes mentioned in the title. According to the abstract, the results showed that “problematic and addicted gamers show poorer health outcomes compared with non-problematic gamers”, which sounds very reasonable to me as an outsider to the field. A survey that took about 20 minutes to complete was e-mailed to 53,634 participants, with a 23.64% response rate. After eliminating duplicates and incomplete forms, a total of 10,566 gamers were used in the analyses. The “type of gamer” of each participant was classified as “non-problematic”, “engaged”, “problematic”, or “addicted”, depending on their scores on a measure of gaming addiction, and the relations between this variable, other demographic information, and four health outcomes were examined.
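As a quick sanity check on the sample sizes just quoted, the arithmetic can be sketched in a few lines of Python. The three input numbers come from the article as summarised above; the variable names are purely illustrative, and the assumption that the response rate applies to all 53,634 e-mailed participants is mine.

```python
# Figures quoted from the article; names are illustrative only.
invited = 53_634        # participants who were e-mailed the survey
response_rate = 0.2364  # reported response rate (23.64%)
analysed = 10_566       # gamers retained for the analyses

# Implied number of returned forms, assuming the rate applies to everyone invited.
responses = round(invited * response_rate)

# Forms dropped as duplicates or incomplete, under the same assumption.
excluded = responses - analysed

print(f"Implied responses: {responses}")
print(f"Implied exclusions: {excluded} ({excluded / responses:.1%} of responses)")
```

On these assumptions, roughly one returned form in six would have been discarded before analysis.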

The 16 authors of the Etindele Sosso et al. (2020) article report affiliations at 12 different institutions in 8 different countries. According to the “Author contributions” section, the first three authors “contributed equally to this work” (I presume that this means that they did the majority of it); 12 others (all except Papadopoulos, it seems) “contributed to the writing”; the first three authors plus Papadopoulos “contributed to the analyses”; and five (the first three authors, plus Campbell and Toth) “write [sic] the final form of the manuscript”. So this is a very impressive international collaboration, with the majority of the work apparently being split between Canada, the UK, and Australia, and it ought to represent a substantial advance in our understanding of how gaming affects mental and physical health in Africa.

Funding

Given the impressive set of authors and the large scale of this international project (data collection alone took 19 or 20 months, from November 2015 to June 2017), it is somewhat surprising that Etindele Sosso et al.’s (2020) article reports no source of funding. Perhaps everyone involved contributed their time and other resources for free, but there is not even a statement that no external funding was involved. (I am quite surprised that this last element is apparently not mandatory for articles in the Nature family of journals.) The administrative arrangements for the study, involving for example contacting the admissions offices of universities in nine countries and arranging for their e-mail lists to be made available, with appropriate guarantees that each university’s and country’s standards of research ethics would be respected, must have been considerable. The participants completed an online questionnaire, which might well have involved some monetary cost, whether directly paid to a survey hosting company or using up some part of a university’s agreed quota with such a company. Just publishing an Open Access article in Nature Scientific Reports costs, according to the journal’s web site, $1,870 plus applicable taxes.

Ethical approval

One possible explanation for the absence of funding information—although this would still constitute rather sloppy reporting, since as noted in the previous paragraph funding typically doesn’t just pay for data collection—might be if the data had already been collected as part of another study. No explicit statement to this effect is made in the Etindele Sosso et al. (2020) article, but at the start of the Methods section, we find “This is a secondary analysis of data collected during the project MHPE approved by the Faculty of Arts and Science of the University of Montreal (CERAS-2015-16-194-D)”. So I set out to look for any information about the primary analysis of these data.

I searched online to see if “project MHPE” might perhaps be a large data collection initiative from the University of Montreal, but found nothing. However, in the lead author’s Master’s thesis, submitted in March 2018 (full text PDF file available here—note that, apart from the Abstract, the entire document is written in French, but fortunately I am fluent in that language), we find that “MHPE” stands for “Mental Health profile [sic] of Etindele” (p. 5), and that the research in that thesis was covered by a certificate from the ethical board of the university that carries exactly the same reference number. I will therefore tentatively conclude that this is the “project MHPE” referred to in the Etindele Sosso et al. (2020) article.

However, the Master’s thesis describes how data were collected, between November 2015 and December 2016, from a sample (prospective size, 12,000–13,000; final size 1,344) of members of the University of Montreal community. The two studies (that is, the one reported in the Master’s thesis and the one reported by Etindele Sosso et al., 2020) each used five measures, of which only two, the Insomnia Severity Index (ISI) and the Hospital Anxiety and Depression Scale (HADS), were common to both. The questionnaires administered to the participants in the Montreal study included measures of cognitive decline and suicide risk, and it appears from p. 27, line 14 of the Master’s thesis that participants were also interviewed (although no details are provided of the interview procedure). All in all, the ethical issues involved in this study would seem to be rather different to those involved in asking people by e-mail about their gaming habits. Yet it seems that the ethics board gave its approval, on a single certificate, for the collection of two sets of data from two distinct groups of people in two very different studies: (a) a sample of around 12,000 people from the lead author’s local university community, using repeated questionnaires across a four-month period as well as interviews; and (b) a sample of 50,000 people spread across the continent of Africa, using e-mail solicitation and an online questionnaire. This would seem to be somewhat unusual.

Meanwhile, we are still no nearer to finding out who funded the collection of data in Africa and the time taken by the other authors to make their (presumably extensive, in the case of the second and third authors) personal contributions to the project. On p. 3 of his Master’s thesis, the author thanks (translation by me) “The Department of Biological Sciences and the Centre for Research in Neuropsychology and Cognition of the University of Montreal, which provided logistical and financial support to the success of this work”, but it is not clear that “this work” can be extrapolated beyond the collection of data in Montreal to include the African project. Nor do we have any more idea about why Etindele Sosso et al. (2020) described their use of the African data as a "secondary analysis", when it seems, as far as I have been able to establish, that there has been no previously published (primary) analysis of this data set.

Results

Further questions arise when we look at the principal numerical results of Etindele Sosso et al.’s (2020) article. On p. 4, the authors report that “4 multiple linear regression analyses were performed (with normal gaming as reference category) to compare the odds for having these conditions [i.e., insomnia, sleepiness, anxiety, and depression] (which are dependent variables) for different levels of gaming.” I’m not sure why the authors would perform linear, as opposed to logistic, regressions to compare the odds of someone in a given category having a specific condition relative to someone in a reference category, but that’s by no means the biggest problem here.

Etindele Sosso et al.’s (2020) Table 3 lists, for each of the four health outcome variables, the regression coefficients and associated test statistics for each of the predictors in their study. Before we come to these numbers for individual variables, however, it is worth looking at the R-squared numbers for each model, which range from .76 for depression to .89 for insomnia. Although these are actually labelled as “ΔR²”, I assume that they represent the total variance explained by the whole model, rather than a change in R-squared when “type of gamer” is added to the model that contains only the covariates. (That said, however, the sentence “Gaming significantly contributed to 86.9% of the variance in insomnia, 82.7% of the variance in daytime sleepiness and 82.3% of the variance in anxiety [p < 0.001]” in the Abstract does not make anything much clearer.) But whether these numbers represent the variance explained by the whole model or just by the “type of gamer” variable, they constitute remarkable results by any standard. I wonder if anything in the prior sleep literature has ever explained 89% of the variance in a measure of insomnia, apart perhaps from another measure of insomnia.

Now let’s look at the details of Table 3. In principle there are seven variables (“Type of Gamers [sic]” being the main one of interest, plus the demographic covariates Age, Sex, Education, Income, Marital status, and Employment status), but because all of these are categorical, each of the levels except the reference category will have been a separate predictor in the regression, giving a total of 17 predictors. Thus, across the four models, there are 68 lines in total reporting regression coefficients and other associated statistics. The labels of the columns seem to be what one would expect from reports of multiple regression analyses: B (unstandardized regression coefficient), SE (standard error, presumably of B), β (standardized regression coefficient), t (the ratio between B and SE), Sig (the p value associated with t), and the upper and lower bounds of the 95% confidence interval (again, presumably of B).

The problem is that none of the actual numbers in the table seem to obey the relations that one would expect. In fact I cannot find a way in which any of them make any sense at all. Here are the problems that I identified:

- When I compute the ratio B/SE, and compare it to column t (which should give the same ratio), the two don’t even get close to being equal in any of the 68 lines. Dividing the B/SE ratio by column t gives results that vary from 0.0218 (Model 2, Age, 30–36) to 44.1269 (Model 1, Type of Gamers, Engaged), with the closest to 1.0 being 0.7936 (Model 4, Age, 30–36) and 1.3334 (Model 3, Type of Gamers, Engaged).

- Perhaps SE refers to the standard error of the standardized regression coefficient (β), even though the column SE appears to the left of the column β? Let’s divide β by SE and see how the t ratio compares. Here, we get results that vary from 0.0022 (Model 2, Age, 30–36) to 11.7973 (Model 1, Type of Gamers, Engaged). The closest we get to 1.0 is with values of 0.7474 (Model 3, Marital Status, Engaged) and 1.0604 (Model 3, Marital Status, Married). So here again, none of the β/SE calculations comes close to matching column t.

- The p values do not match the corresponding t statistics. In most cases this can be seen by simple inspection. For example, on the first line of Table 3, it should be clear that a t statistic of 9.748 would have a very small p value indeed (in fact, about 1E−22) rather than .523. In many cases, even the conventional statistical significance status (either side of p = .05) of the t value doesn’t match the p value. To get an idea of this, I made the simplifying assumption (which is not actually true for the categories “Age: 36–42”, “Education: Doctorate”, and “Marital status: Married”, but individual inspection of these shows that my assumption doesn’t change much) that all degrees of freedom were at least 100, so that any t value with a magnitude greater than 1.96 would be statistically significant at the .05 level. I then looked to see if t and p were the same side of the significance threshold; they were not in 29 out of 68 cases.

- The regression coefficients are not always contained within their corresponding confidence intervals. This is the case for 29 out of 68 of the B (unstandardized) values. I don’t think that the confidence intervals are meant to refer to the standardized coefficients (β), but just for completeness, 63 out of 68 of these fall outside the reported 95% CI.

- Whether the regression coefficient falls inside the 95% CI does not correspond with whether the p value is below .05. For both the unstandardized coefficients (B) and the standardized coefficients (β)—which, again, the CI probably doesn’t correspond to, but it’s quick and cheap to look at the possibility anyway—this test fails in 41 out of 68 cases.
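For readers who want to try this kind of check on a regression table themselves, here is a minimal Python sketch of the first few relations listed above: t should equal B/SE, the p value should sit on the same side of .05 as the t statistic, and B should lie inside its own confidence interval. It is not the spreadsheet used for this post. The function names, the 5% rounding tolerance, and the use of a normal approximation for the p value (reasonable only when the degrees of freedom are large) are all assumptions. Of the example numbers, only t = 9.748 and p = .523 come from the first line of Table 3; every other value is hypothetical.

```python
import math

def two_sided_p_normal(t):
    """Two-sided p value for t, using a normal approximation (large df assumed)."""
    return math.erfc(abs(t) / math.sqrt(2))

def check_row(b, se, t, p, ci_low, ci_high, tol=0.05):
    """Run three internal-consistency checks on one line of a regression table."""
    return {
        # t should be the ratio of B to its standard error (allowing for rounding)
        "t_equals_b_over_se": math.isclose(b / se, t, rel_tol=tol),
        # |t| > 1.96 and p < .05 should agree when df is large
        "t_and_p_same_side": (abs(t) > 1.96) == (p < 0.05),
        # the point estimate should lie inside its own 95% CI
        "b_inside_ci": ci_low <= b <= ci_high,
    }

# A hypothetical, internally consistent line (p reported as .000).
consistent = dict(b=0.50, se=0.10, t=5.0, p=0.000, ci_low=0.304, ci_high=0.696)

# t and p as reported on the first line of Table 3; B, SE and CI are hypothetical.
inconsistent = dict(b=0.50, se=0.10, t=9.748, p=0.523, ci_low=0.60, ci_high=0.90)

print(check_row(**consistent))
print(check_row(**inconsistent))
print(f"Approximate p for t = 9.748: {two_sided_p_normal(9.748):.1e}")
```

The same function can be pointed at β instead of B to test the alternative reading of the SE column. The cutoff of 1.96 mirrors the simplifying assumption made above that all degrees of freedom are at least 100.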

There are some further concerns with Table 3:

- In the third line (Model 1, “Type of Gamers”, “Problematic”) the value for β is 1.8. Now it is actually possible to have a standardized regression coefficient with a magnitude above 1.0, but its existence usually means that you have big multicollinearity problems, and it’s typically very hard to interpret such a coefficient. It’s the kind of thing that at least one of the four authors who reported in the "Author contributions" section of the article that they "contributed to the analyses" would normally be expected to pick up on and discuss, but no such discussion is to be found.

- From Table 1, we can see that there were zero participants in the “Age” category 42–48, and zero participants in the “Education” category “Postdoctorate”. Yet, in Table 3, for all four models, these categories have non-zero regression coefficients and other statistics. It is not clear to me how one can obtain a regression coefficient or standard error from a categorical variable that corresponds to zero cases (and, hence, when coded has a mean and standard deviation of 0).

- There is a surprisingly high number of repetitions of exactly the same value, typically to 3 decimal places, within the same variable, category, and absolute value of the statistic from one model to another. For example, the reported value in the column t for the variable “Age” and category “24–30” is 29.741 in both Models 1 and 3. For the variable “Employment status” and category “Employed”, the upper bound of the 95% confidence interval is the same (2.978) in all four models. This seems quite unlikely to be the result of chance, given the relatively large sample sizes that are involved for most of the categories (cf. Brown & Heathers, 2019), so it is not clear how these duplicates could have arisen.
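To illustrate the point about standardized coefficients above 1.0, here is a small Python sketch using the textbook formula for a regression with two predictors, where each β can be computed from the three pairwise correlations. The correlation values are invented for illustration and have nothing to do with the article's data. The only thing that changes between the two calls is how strongly the predictors correlate with each other.

```python
def standardized_betas(r_y1, r_y2, r_12):
    """Standardized coefficients for y regressed on two predictors.

    r_y1, r_y2: correlations of each predictor with the outcome.
    r_12: correlation between the two predictors.
    """
    denom = 1 - r_12 ** 2
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    return beta1, beta2

# Hypothetical correlations: nearly independent predictors.
low = standardized_betas(r_y1=0.8, r_y2=0.5, r_12=0.1)

# Same correlations with the outcome, but highly collinear predictors.
high = standardized_betas(r_y1=0.8, r_y2=0.5, r_12=0.9)

print("r_12 = 0.1: beta1 = %.3f, beta2 = %.3f" % low)
print("r_12 = 0.9: beta1 = %.3f, beta2 = %.3f" % high)
```

With weakly correlated predictors both coefficients stay below 1 in magnitude. With a predictor correlation of 0.9, the first coefficient exceeds 1 and the second turns negative, which is the pattern that makes such coefficients hard to interpret.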

Table 3 from Etindele et al. (2020), with duplicated values (considering the same variable and category across models) highlighted with a different colour for each set of duplicates. Two pairs are included where the sign changed but the digits remained identical; however, p values that were reported as 0.000 are ignored. To find a duplicate, first identify a cell that is outlined in a particular colour, then look up or down the table for one or more other cells with the same outline colour in the analogous position for one or more other models.
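The duplicate search described in the caption can be sketched in Python as follows. This is not the procedure actually used to produce the highlighted table; it is one plausible way to automate it, and the data layout and function name are assumptions. The two values 29.741 and 2.978 are the ones quoted above, while the remaining numbers are hypothetical placeholders.

```python
from collections import defaultdict

def find_duplicates(rows):
    """Find statistics repeated across models for the same variable and category.

    rows: iterable of (model, variable, category, statistic, value) tuples.
    Returns {(variable, category, statistic, "abs value"): [models]} for every
    value that appears in more than one model.
    """
    groups = defaultdict(list)
    for model, variable, category, stat, value in rows:
        if stat == "p" and value == 0.0:
            continue  # p values reported as 0.000 are ignored, as in the caption
        # Compare absolute values to 3 decimals, so sign flips still match.
        key = (variable, category, stat, f"{abs(value):.3f}")
        groups[key].append(model)
    return {k: sorted(m) for k, m in groups.items() if len(m) > 1}

rows = [
    # Values quoted in the text above.
    (1, "Age", "24-30", "t", 29.741),
    (3, "Age", "24-30", "t", 29.741),
    (1, "Employment status", "Employed", "CI upper", 2.978),
    (2, "Employment status", "Employed", "CI upper", 2.978),
    (3, "Employment status", "Employed", "CI upper", 2.978),
    (4, "Employment status", "Employed", "CI upper", 2.978),
    # Hypothetical values, included only so that some lines do not match.
    (2, "Age", "24-30", "t", 12.305),
    (4, "Age", "24-30", "t", 7.118),
    (1, "Age", "24-30", "p", 0.000),
    (2, "Age", "24-30", "p", 0.000),
]

for key, models in find_duplicates(rows).items():
    print(key, "appears in models", models)
```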

The preprint

It is interesting to compare Etindele Sosso et al.’s (2020) article with a preprint entitled “Insomnia and problematic gaming: A study in 9 low- and middle-income countries” by Faustin Armel Etindele Sosso and Daria J. Kuss (who also appears to be the second author of the published article), which is available here. That preprint reports a longitudinal study, with data collected at multiple time points from a sample of people (initial size 120,460) from nine African countries; there were presumably four time points, including baseline, although only “after one months, six months, and 12 months” (p. 8) is mentioned. This must therefore be an entirely different study from the one reported in the published article, which did not use a longitudinal design and had a prospective sample size of 53,634. Yet, by an astonishing coincidence, the final sample retained for analysis in the preprint consisted of 10,566 participants, which is exactly the same number as in the published article. The numbers of men (9,366) and women (1,200) were also identical in the two samples. However, the mean and standard deviation of their ages were different (M=22.33 years, SD=2.0 in the preprint; M=24.0, SD=2.3 in the published article). The number of participants in each of the nine countries (Table 2 of both the preprint and the published article) is also substantially different for each country between the two papers, and, with two exceptions (the ISI and the well-known Hospital Anxiety and Depression Scale, HADS), different measures of symptoms and gaming were used in each case.

Another remarkable coincidence between the preprint and Etindele Sosso et al.’s (2020) published article, given that we are dealing with two distinct samples, occurs in the description of the results obtained from the sample of African gamers on the Insomnia Severity Index. On p. 3 of the published article, in the paragraph describing the respondents’ scores on the ISI, we find: “The internal consistency of the ISI was excellent (Cronbach’s α = 0.92), and each individual item showed adequate discriminative capacity (r = 0.65–0.84). The area under the receiver operator characteristic curve was 0.87 and suggested that a cut-off score of 14 was optimal (82.4% sensitivity, 82.1% specificity, and 82.2% agreement) for detecting clinical insomnia”. These two sentences are identical, in every word and number, to the equivalent sentences on p. 5 of the preprint.

Naturally enough, because the preprint and Etindele Sosso et al.’s (2020) published article describe entirely different studies with different designs, and different sample sizes in each country, there is little in common between the Results sections of the two papers. The results in the preprint are based on repeated-measures analyses and include some interesting full-colour figures (the depiction of correlations in Figure 1, on p. 10, is particularly visually attractive), whereas the results of the published article consist mostly of a fairly straightforward summary, in sentences, of the results from the tables, which describe the outputs of linear regressions.

Figure 1 from the preprint by Etindele Sosso and Kuss (2018, p. 10). This appears to use an innovative technique to illustrate the correlation between two variables.

However, approximately 80% of the sentences in the Introduction of the published article, and 50% of the sentences in the Discussion section, appear (with only a few cosmetic changes) in the preprint. This is interesting, not only because it would be quite unusual for a preprint of one study to be repurposed to describe an entirely different one, but also because it suggests that the addition of 14 authors between the publication of the preprint and the Etindele Sosso et al. (2020) article resulted in the addition of only about 1,000 words to these two parts of the manuscript.

The Introduction section of the Etindele and Kuss (2018) preprint (left) and the Etindele et al. (2020) published article (right). Sentences highlighted in yellow are common to both papers.

The Discussion section of the Etindele and Kuss (2018) preprint (left) and the Etindele et al. (2020) article (right). Sentences highlighted in yellow are common to both papers.

Another (apparently unrelated) preprint contains the same insomnia results

It is also perhaps worth noting that the summary of the participants’ results on the ISI measure—which, as we saw above, was identical in every word and number between the preprint and Etindele Sosso et al. (2020)’s published article—also appears, again identical in every word and number, on pp. 5–6 of a 2019 preprint by the lead author, entitled “Insomnia, excessive daytime sleepiness, anxiety, depression and socioeconomic status among customer service employees in Canada”, which is available here [PDF]. This second preprint describes a study of yet another different sample, namely 1,200 Canadian customer service workers. If this is not just another remarkable coincidence, it would suggest that the author may have discovered some fundamental invariant property of humans with regard to insomnia. If so, one would hope that both preprints could be peer reviewed most expeditiously, to bring this important discovery to the wider attention of the scientific community.

Other reporting issues from the same laboratory

The lead author of the Etindele Sosso et al. (2020) article has published even more studies with substantial numbers of participants. Here are two such articles, which have 41 and 35 citations, respectively, according to Google Scholar:

Etindele Sosso, F. A., & Raouafi, S. (2016). Brain disorders: Correlation between cognitive impairment and complex combination. Mental Health in Family Medicine, 12, 215–222. https://doi.org/10.25149/1756-8358.1202010

Etindele Sosso, F. A. (2017a). Neurocognitive game between risk factors, sleep and suicidal behaviour. Sleep Science, 10(1), 41–46. https://doi.org/10.5935/1984-0063.20170007

In the 2016 article, 1,344 respondents were assessed for cognitive deficiencies; 71.7% of the participants were aged 18–24, 76.2% were women, and 62% were undergraduates. (These figures all match those that were reported in the lead author’s Master’s thesis, so we might tentatively assume that this study used the same sample.) In the 2017 article, 1,545 respondents were asked about suicidal tendencies, with 78% being aged 18–24, 64.3% women, and 71% undergraduates. Although these are clearly entirely different samples in every respect, the tables of results of the two studies are remarkably similar. Every variable label is identical across all three tables, which might not be problematic in itself if similar predictors were used for all of the different outcome variables. More concerning, however, is the fact that of the 120 cells in Tables 1 and 2 that contain statistics (mean/SD combinations, p values other than .000, regression coefficients, standard errors, and confidence intervals), 58—that is, almost half—are identical in every digit. Furthermore, the entirety of Table 3—which shows the results of the logistic regressions, ostensibly predicting completely different outcomes in completely different samples—is identical across the two articles (52 out of 52 numbers). One of the odds ratios in Table 3 has the value 1133096220.169 (again, in both articles). There does not appear to be an obvious explanation for how this duplication could have arisen as the result of a natural process.

Left: The tables of results from Etindele Sosso and Raouafi (2016). Right: The tables of results from Etindele Sosso (2017a). Cells highlighted in yellow are identical (same variable name, identical numbers) in both articles.

The mouse studies

Further evidence that this laboratory may have, at the very least, a suboptimal approach to quality control when it comes to the preparation of manuscripts comes from the following pair of articles, in which the lead author of Etindele Sosso et al. (2020) reported the results of some psychophysiological experiments conducted on mice:

Etindele Sosso, F. A. (2017b). Visual dot interaction with short-term memory. Neurodegenerative Disease Management, 7(3), 182–190. https://doi.org/10.2217/nmt-2017-0012

Etindele Sosso, F. A., Hito, M. G., & Bern, S. S. (2017). Basic activity of neurons in the dark during somnolence induced by anesthesia. Journal of Neurology and Neuroscience, 8(4), 203–207. https://doi.org/10.21767/2171-6625.1000203 [1]

In each of these two articles (which have 28 and 24 Google Scholar citations, respectively), the neuronal activity of mice when exposed to visual stimuli under various conditions was examined. Figure 5 of the first article shows the difference between the firing rates of the neurons of a sample of an unknown number of mice (which could be as low as 1; I was unable to determine the sample size with any great level of certainty by reading the text) in response to visual stimuli that were shown in different orientations. In contrast, Figure 3 of the second article represents the firing rates of two different types of brain cell (interneurons and pyramidal cells) before and after a stimulus was applied. That is, these two figures represent completely different variables in completely different experimental conditions. And yet, give or take the use of dots of different shapes and colours, they appear to be exactly identical. Again, it is not clear how this could have happened by chance.

Top: Figure 5 from Etindele Sosso (2017b). Bottom: Figure 3 from Etindele Sosso et al. (2017). The dot positions and axis labels appear to be identical. Thanks are due to Elisabeth Bik for providing a second pair of eyes.

Conclusion

I find it slightly surprising that 16 authors—all of whom, we must assume because of their formal statements to this effect in the “Author contributions” section, made substantial contributions to the Etindele et al. (2020) article in order to comply with the demanding authorship guidelines of Nature Research journals (specified here)—apparently failed to notice that this work contained quite so many inconsistencies. It would also be interesting to know what the reviewers and action editor had to say about the manuscript prior to its publication. The time between submission and acceptance was 85 days (including the end of year holiday period), which does not suggest that a particularly extensive revision process took place. In any case, it seems that some sort of corrective action may be required for this article, in view of the importance of the subject matter for public policy.

Supporting files

I have made the following supporting files available here:

- Etindele-et-al-Table3-numbers.xls: An Excel file containing the numbers from Table 3 of Etindele et al.’s (2020) article, with some calculations that illustrate the deficiencies in the relations between the statistics that I mentioned earlier. The basic numbers were extracted by performing a copy/paste from the article’s PDF file and using text editor macro commands to clean up the structure.

- “(Annotated) Etindele Sosso, Raouafi - 2016 - Brain Disorders - Correlation between Cognitive Impairment and Complex Combination.pdf” and “(Annotated) Etindele Sosso - 2017 - Neurocognitive Game between Risk Factors, Sleep and Suicidal Behaviour.pdf”: Annotated versions of the 2016 and 2017 articles mentioned earlier, with identical results in the tables highlighted.

- “(Annotated) Etindele Sosso, Kuss - 2018 (preprint) - Insomnia and problematic gaming - A study in 9 low- and middle-income countries.pdf” and “(Annotated) Etindele Sosso et al. - 2020 - Insomnia, sleepiness, anxiety and depression among different types of gamers in African countries.pdf”: Annotated versions of the 2018 preprint and the published Etindele et al. (2020) article, with overlapping text highlighted.

- Etindele-2016-vs-2017.png, Etindele-et-al-Table3-duplicates.png, Etindele-mouse-neurons.png, Etindele Sosso-Kuss-Preprint-Figure1.png, Preprint-article-discussion-side-by-side.png, Preprint-article-intro-side-by-side.png: Full-sized versions of the images from this blog post.

Reference

Brown, N. J. L., & Heathers, J. A. J. (2019). Rounded Input Variables, Exact Test Statistics (RIVETS): A technique for detecting hand-calculated results in published research. PsyArXiv Preprints. https://doi.org/10.31234/osf.io/ctu9z

[[ Update 2020-04-21 13:14 UTC: Via Twitter, I have learned that I am not the first person to have publicly questioned the Etindele et al. (2020) article. See Platinum Paragon's blog post from 2020-04-17 here. ]]

[[ Update 2020-04-22 13:43 UTC: Elisabeth Bik has identified two more articles by the same lead author that share an image (same chart, different meaning). See this Twitter thread. ]]

[[ Update 2020-04-23 22:48 UTC: See my related blog post here, including discussion of a partial data set that appears to correspond to the Etindele et al. (2020) article. ]]

[[ Update 2020-06-04 11:50 UTC: I blogged about the reaction (or otherwise) of university research integrity departments to my complaint about the authors of the Etindele Sosso et al. article here. ]]

[[ Update 2020-06-04 11:55 UTC: The Etindele Sosso et al. article has been retracted. The retraction notice can be found here. ]]

[1] This article was accepted 12 days after submission, which is presumably entirely unrelated to the fact that the lead author is listed here as the journal’s specialist editor for Neuropsychology and Cognition.