24 May 2020

The Silence of the RIOs

Just over a month ago, I published these two blog posts. After the first, Daniël Lakens tweeted this:
I thought that was a good idea, so I set out to find who the "university ethics person" might be for the 15 co-authors of the article in question. (I wrote directly and separately to the two PhD supervisors of the lead author, as it is he who appears to be prima facie responsible for most of its deficiencies; I also wrote to Nature Scientific Reports outlining my concerns about the article. In both cases I received a serious reply indicating that they were concerned about the situation.)

It turns out that finding the address of the person to whom complaints about research integrity at a university or other institution should be directed is not always easy. There were only one or two cases where I was able to do this by following links from the institution's web site, as regular readers of xkcd might have been able to guess. In a few cases I used Google with the site: option to find a person. But about half the time, I couldn't identify anyone. In those cases I looked for the e-mail address of someone who might be the dean or head of department of the author concerned. Hilariously, in one case, the author was the head of department, and I ended up writing to the president of the university.

Anyway, by 24 April 2020 I had what looked like a plausible address at all 12 institutions, so I sent this e-mail.
From: Nicholas Brown <nicholas.brown@lnu.se>
Sent: 24 April 2020 16:04
To: [9 people]
Subject: Possible scientific misconduct in an article published in Nature Scientific Reports
First, allow me to apologise if I have addressed this e-mail to any of you in error, and also if my use of the phrase "Research Integrity Officer" in the above salutation is not an accurate summary of your job title. I had some difficulty in establishing, from your institution's web site, who was the correct person to write to for questions of research integrity in many cases, including [list]. In those cases I attempted to identify somebody who appears to have a senior function in the relevant department. In the case of [institution], I only found a general contact address --- I am trying to reach someone who might have responsibility for the ethical conduct of "XXX" in the XXX Department.
I am writing to bring your attention to these [sic; I started drafting the e-mail before I wrote the second post, and not everything about it evolved correctly after that] blog posts, which I published on April 21, 2020: https://steamtraen.blogspot.com/2020/04/some-issues-in-recent-gaming-research.html.
At least one author of the scientific article that is the principal subject of that blog post (Etindele Sosso et al., 2020; https://doi.org/10.1038/s41598-020-58462-0, published on 2020-02-06 in Nature Scientific Reports) lists your institution as their affiliation. 
While my phrasing in that public blog post (and a follow-up, which is now linked from the first post) was necessarily conservative, I think it is clear to anyone with even a minimum of relevant scientific training who reads it that there is strong prima facie evidence that the results of the Etindele Sosso et al. (2020) article have been falsified, and perhaps even fabricated entirely. Yet, 15 other scholars, including at least one at your institution (in the absence of errors of interpretation on my part) signed up to be co-authors of this article.
There would seem to be two possibilities in the case of each author.
1. They knew, or should have known, that the reported results were essentially impossible. (Even the Abstract contains claims about the percentage of variance explained by the main independent variable that are utterly implausible on their face.)
2. They did not read the manuscript at all before it was submitted to a Nature group journal, despite the fact that their name is listed as a co-author and included in the "Author contributions" section as having, at least, "contributed to the writing".
It seems to me that either of these constitutes a form of academic misconduct. If these researchers knew that the results were impossible, they are culpable in the publication of falsified results. If they are not --- that is, their defence is that they did not read and understand the implications of the results, even in the Abstract --- then they have made inappropriate claims of authorship (in a journal whose own web site states that it is the 11th most highly cited in the world). Either of these would surely be likely to bring your institution into disrepute.
For your information, I intend to make this e-mail public 30 days from today, accompanied by a one-sentence summary (without, as far as possible, revealing any details that might be damaging to the interests of anyone involved) of your respective institutions' responses until that point. I would hope that, despite the difficult circumstances under which we are all working at the moment, it ought to be possible to at least give a commitment to thoroughly investigate a matter of this importance within a month. I mention this because in previous cases where I have made reports of this kind, the modal response from institutional research integrity officers has been no response at all.
Of course, whatever subsequent action you might decide to take in this matter is entirely up to you.
Kind regards,
Nicholas J L Brown, PhD
Linnaeus University
The last-but-one paragraph of that e-mail mentions that, 30 days from the date of the e-mail, I intended to make it public, along with a brief summary of the responses from each institution. The e-mail is above. Here is how each institution responded:

Nottingham Trent University, Nottingham, UK: Stated that they would investigate, and gave me an approximate date by which they anticipated that their investigation would be complete.
Central Queensland University, Rockhampton, Australia: Stated that they would investigate, but with no estimate of how long this would take.
Autonomous University of Nuevo Leon, Monterrey, N.L., Mexico: No reply.
Jamia Millia Islamia, New Delhi, India: No reply.
University of L’Aquila, L’Aquila, Italy: No reply.
Army Share Fund Hospital, Athens, Greece: No reply.
Université de Montréal, Montréal, Québec, Canada: No reply.
University of Limerick, Limerick, Ireland: No reply.
Lero Irish Software Research Centre, Limerick, Ireland: No reply.

By "No reply" here, I mean that I received nothing. No "Undeliverable" message. No out-of-office message. No quick reply saying "Sorry, COVID-19 happened, we're busy". Not "We'll look into it". Not "We won't look into it". Not even "Get lost, there is clearly no case to answer here". In 7 out of 9 cases: nothing, nada, nichts, rien, zip, in reply to what I (and, apparently, the research integrity people at the two institutions that did reply) think was a polite, professional e-mail, with a subject line that I hope suggests that a couple of minutes of the recipient's time might be a worthwhile investment.

I find this disappointing. I wish I could say that I found it remotely surprising. Maybe I should just be grateful that Daniël's estimate of one institution taking any sort of action was exceeded by 100%.


16 May 2020

The perils of improvising with linear regression: Stedman et al. (in press)

This article has been getting a lot of coverage of various kinds in the last few days, including the regional, UK national, and international news media:

Stedman, M., Davies, M., Lunt, M., Verma, A., Anderson, S. G., & Heald, A. H. (in press). A phased approach to unlocking during the COVID-19 pandemic – Lessons from trend analysis. International Journal of Clinical Practice. https://doi.org/10.1111/ijcp.13528

There doesn't seem to be a typeset version from the journal yet, but you can read the final draft version online here and also download it as a PDF file.

The basic premise of the article is that, according to the authors' model, COVID-19 infections are far more widespread in the community in the United Kingdom(*) than anyone seems to think. Their reasoning works in three stages. First, they built a linear model of the spread of the disease, one of whose predictors was the currently reported number of total cases (i.e., the official number of people who have tested positive for COVID-19). Second, they extrapolated that model to a situation in which the entire population was infected, assuming that the spread continues to be entirely linear. Third, they used the slope of their line to estimate what the official reported number of cases would be at that point. They concluded that their model shows that the true number of cases in the population is 150 times larger than the number of positive tests that have been carried out, so that on the day when their data were collected (24 April 2020) 26.8% of the population were already infected.

The above figures are from the Results section of the paper, on p. 11 of the final draft PDF. However, the Abstract contains different numbers, which seem to be based on data from 19 April 2020. The Abstract asserts that the true number of cases in the population may be 237 (versus 150) times the reported number, and that the percentage of the population who had been infected might have been 29% (versus 26.8% several days later). Aside from the question of why the Abstract includes different principal results from the Results section, it would appear to be something of a problem for the authors' assumptions of (ongoing) linearity in the relation between the spread of the disease and the number of reported cases if the slope of their model changed by about a third over five days.
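For what it's worth, the "150 times" multiplier can be reconstructed from the regression equation that the paper reports on p. 11. The following is my reconstruction of the authors' extrapolation logic, not code from the paper: if the rate of increase (RADIR) falls linearly with the reported case rate and reaches zero exactly when the entire population is infected, then the reported rate at that point is intercept/|slope| per 1,000, and the implied ratio of true to reported cases follows directly.

```python
# A sketch of how the paper's "150 times" multiplier seems to arise
# (my reconstruction of the authors' extrapolation, not their code).
# Reported model (p. 11): RADIR = 1.06 - 0.16 * reported_cases_per_1000
intercept = 1.06
slope = 0.16

# If RADIR reaches zero exactly when the whole population (1,000 per
# 1,000) is infected, the *reported* rate at that point is:
reported_at_saturation = intercept / slope   # ≈ 6.6 per 1,000 population

# ...so the implied ratio of true cases to reported cases is:
multiplier = 1000 / reported_at_saturation
print(round(multiplier, 1))   # ≈ 150.9, i.e. the "150 times larger" figure
```

Note that the same arithmetic with a different slope gives a different multiplier, which is exactly why a slope that changes over time undermines the whole extrapolation.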

But it seems to me that, apart from the rather tenuous assumptions of linearity and the validity of extrapolating a considerable way beyond the range of the data (which, to be fair, they mention in their Limitations paragraph), there is an even more fundamental problem with how the authors have used linear regression here. Their regression model contained at least nine covariates(**), and we are told that "The stepwise(***) regression of the local UTLA factors to RADIR showed that only one factor total reported cases/1,000 population [sic] was significantly linked" (p. 11). I take this to mean that, if the authors had reported the regression output in a table, this predictor would be the only one whose coefficient's absolute value was at least twice its standard error. (The article is remarkably short on numerical detail, with neither a table of regression output nor a table of descriptives and correlations of the variables. Indeed, there are many points in the article where a couple of minutes spent on attention to detail would have greatly improved its quality, even in the context of the understandable desire to communicate results rapidly during a pandemic.)

Having established that only one predictor in this ten-predictor regression was statistically significant (in the sense of a 95-year-old throwaway remark by R. A. Fisher), the authors then proceeded to do something remarkable. Remember, they had built this model:

Y = B0 + B1X1 + B2X2 + B3X3 + ... + B10X10 + Error

(with Error apparently representing 78% of the variance, cf. line 3 of p. 11). But they then dropped ten of those terms (nine regression coefficients multiplied by the values of the predictors, plus the error term) to come up with this model (p. 11):

RADIR = 1.06 - 0.16 x Current Total Cases/1,000

What seems to have happened here is that the authors in effect decided to set the regression coefficients B2 through B10 to zero, apparently because their respective values were less than twice their standard errors and so "didn't count" somehow. However, they retained the intercept (B0) and the coefficient associated with their main variable of interest (B1) from the 10-variable regression, as if the presence of the nine covariates had had no effect on the calculation of these values. But of course, those covariates had had an effect on both the estimation of the intercept and the coefficient of the first variable. That was precisely what they were included in the regression for. If the authors had wanted to make a model with just one predictor (the number of current total cases), they could have done so quite simply by running a one-variable regression. You can't just run a multiple regression and keep the coefficients (with or without the intercept) that you think are important while throwing away the others.
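To see why the dropped covariates matter, here is a minimal synthetic illustration (invented data, nothing to do with the paper's): when a covariate is correlated with the predictor of interest, the coefficient of that predictor in the multiple regression is quite different from the slope of a one-predictor regression, so the former cannot simply be reused as the latter.

```python
import numpy as np

# Synthetic example: y depends on x1 and on a covariate x2 that is
# itself correlated with x1.
rng = np.random.default_rng(0)
n = 1000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.6, size=n)   # covariate correlated with x1
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

# Multiple regression of y on [1, x1, x2]
X_full = np.column_stack([np.ones(n), x1, x2])
b_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)

# One-predictor regression of y on [1, x1]
X_one = np.column_stack([np.ones(n), x1])
b_one, *_ = np.linalg.lstsq(X_one, y, rcond=None)

print(b_full[:2])   # intercept and B1 with x2 in the model: close to [1.0, 2.0]
print(b_one)        # intercept and slope of x1 alone: slope close to 2 + 3*0.8 = 4.4
```

Keeping B0 and B1 from the full model while deleting the x2 term gives predictions that agree with neither regression; the one-predictor model has to be estimated in its own right.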

This seems to me to be a rather severe misinterpretation of what a regression model is and how it works. There are many other things that could be questioned about this study(****), and indeed several people are already doing precisely that, but this seems to me to be a very fundamental problem with the article, and something that the reviewers really ought to have picked up. The first two authors appear to be management consultants whose qualifications to conduct this sort of analysis are unclear, but the third author's faculty page suggests that he knows his stuff when it comes to statistics, so I'm not sure how this was allowed to happen.

Stedman et al. end with this sentence: "The manuscript is an honest, accurate, and transparent account of the study being reported. No important aspects of the study have been omitted." This is admirable, and I take it to mean that the authors did not in fact run a one-predictor regression to estimate the effect of their main IV of interest on their DV before they decided to run a stepwise regression with nine covariates. However, I suggest that it might be useful if they were to run that one-predictor regression now, and report the results along with those of the multiple regression (cf. Simmons, Nelson, & Simonsohn, 2011, p. 1362, Table 2, point 6). When they do that, they might also consider incorporating the latest testing data and seeing if the slope of their regression has changed, because since 24 April the number of cases in the UK has more than doubled (240,161 at the moment I am writing this), suggesting that between 54% and 84% of the population has by now become infected, depending on whether we take the numbers from p. 11 of the article or those from the Abstract.
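The back-of-envelope arithmetic behind those last percentages can be sketched as follows. This is my own calculation, and the UK population figure I use (the ONS mid-2019 estimate) is an assumption, so the results only approximately match the percentages above; a different population base would shift them slightly.

```python
# Rough arithmetic behind the updated infection estimates (my own
# back-of-envelope calculation; the population figure is an assumption,
# so the results only approximately match the percentages in the text).
reported_cases = 240_161      # UK total at the time of writing
population = 66_800_000      # ONS mid-2019 UK estimate

for multiplier in (150, 237):   # p. 11 figure vs. Abstract figure
    infected_fraction = reported_cases * multiplier / population
    print(multiplier, round(100 * infected_fraction, 1))
```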

[[ Update 2020-05-18 00:15 UTC: There is a preprint of this paper, available here. It contains the same basic model, which is estimated using data from 11 days earlier than the accepted manuscript. In the preprint, the regression equation (p. 7) is:

RADIR = 1.20 - 0.26 x Current Total Cases/1,000

In other words, between the submission date of the preprint and the preparation date of the final manuscript, the slope of the regression line --- which the model assumes would be constant until everyone was infected --- changed from -0.26 to -0.16. And yet the authors did not apparently think that this was sufficient reason to question the idea that the progress of the disease would continue to match their linear model, despite direct evidence that it had failed to do so over the previous 11 days. This is quite astonishing. ]]

(*) Whether the authors claim that their model applies to the UK or just to England is not entirely clear, as both terms appear to be used more or less interchangeably. They use a figure of 60 million as the population of England, although the Office for National Statistics reports figures of 56.3 million for England and 66.8 million for the UK in mid 2019.

(**) I wrote here that there were nine, but one of them is ethnicity, which would typically have been coded as a series of categories, each of which would have functioned as a separate predictor in the regression. But maybe they used some kind of shorthand such as "Percentage who did/didn't identify in the 'White British' category", so I'll continue to assume that there were nine covariates and hence 10 predictors in total.

(***) Stepwise regression is generally not considered a good idea these days. See for example here. Thanks to Stuart Ritchie for this tip and for reading through a draft of this post.

(****) Looking at Figure 4, it occurs to me that a few data points at the top left might well be lifting the left-hand side of the regression line up to some extent, but that's all moot until we know more about the single-variable regression. Also, there is no confidence interval or any other measure of the uncertainty --- even assuming that the model is perfectly linear --- of the estimated reported case rate when the infection rate drops to zero.


23 April 2020

The Mystery of the Missing Authors

What do the following researchers have in common?

Muller G. Hito, Department of Psychology and Sports Science, Justus-Liebig University, Germany
Okito Nakamura, Global Research Department, Ritsumeikan University, Japan
Mitsu Nakamura, The Graduated [sic] University of Advanced Studies, Japan
John Okutemo, Usman University, Sokoto, Nigeria
Eryn Rekgai, Department of Psychology and Sports Science, Justus-Liebig University, Germany
Mbraga Theophile, Kinshasa University, Republic of Congo
Bern S. Schmidt, Department of Fundamental Neuroscience, University of Lausanne, Switzerland

Despite their varied national origins, it seems that the answer is "quite a bit":

1. They seem to collaborate with each other, in various combinations, on short articles with limited empirical content, typically accepted less than a week after submission. (Some examples: 1 2 3 4 5) The majority of these articles date from 2017, although there are some from 2018 and 2019 as well.

2. Apart from each other, these people have published with almost nobody else, except that:

(a) Four of them have published with Faustin Armel Etindele Sosso (whom I will refer to from now on as FAES), the lead author of the article that I discussed in this post. (Examples: 6 7 8) In one case, FAES is the corresponding author although he is not listed as an actual author of the article. I don't think I have ever seen that before in scholarly publishing.

(b) Two of them have published with an author named Sana Raouafi --- see the specific paragraph on this person towards the end of this post.

3. Whether FAES is a co-author or not, these researchers have a remarkable taste for citing his published work, which typically accounts for between 50% and 100% of the References section of any of their articles.

4. When one of these researchers, rather than FAES, is the corresponding author of an article, they always use a Yahoo or Gmail address. So far I have identified "s.bern@yahoo.com", "mullerhito@yahoo.com", "mitsunaka216@gmail.com", and "okitonaka216@gmail.com". None of these researchers seems to use an institutional e-mail address for correspondence. Of course, this is not entirely illegitimate (for example, if one anticipates moving to another institution in the near future), but it seems quite unusual for none of them to have used their faculty address.

[[ Update 2020-04-27 17:22 UTC: I have identified that "Erin Regai", who I think is the same person as "Eryn Rekgai" but with slightly different spelling, has the e-mail address "eregai216@gmail.com". That makes three people with Gmail addresses ending in 216. It would be interesting to discover whether anybody involved in these authors' publication projects has a birthday on 21/6 (21 June) or 2/16 (February 16). ]]

5. None of these people seems to have an entry in the online staff directory of their respective institutions. (The links under their names at the start of this post all go to their respective ResearchGate profiles, or if they don't have one, RG's collection of their "scientific contributions".) Of course, one can never prove a negative, and some people just prefer a quiet life. So as part of this blog post I am issuing a public appeal: If you know (or, even better, if you are) any of these people, please get in touch with me.

I don't have time to go into all of these individuals in detail, but here are some highlights of what I found in a couple of cases. (For the two authors named Nakamura, I am awaiting a response to inquiries that I sent to their respective institutions; I hope that readers will forgive me for publishing this post before waiting for a reply to those inquiries, given the current working situation at many universities around the world.)

[[ Update 2020-04-24 21:24 UTC: Ms. Mariko Kajii of the Office of Global Planning and Partnerships at The Ritsumeikan Trust has confirmed to me that nobody named "Okito Nakamura" is known to that institution. ]]

[[ Update 2020-04-24 23:37 UTC: Mitsu Nakamura's ResearchGate page claims that Okito Nakamura is a member of Mitsu's lab at "The graduated [sic] University of Advanced studies". It seems strange that someone would be affiliated with one university (even if that university denied any knowledge of them, cf. my previous update) while working in a lab at another. Meanwhile, Mitsu Nakamura's name does not appear in Japan's national database of researchers. ]]

Muller G. Hito

For this researcher --- who does not seem to be quite sure how their own name is structured(*), as they sometimes appear at the top of an article as "Hito G. Muller" --- we have quite extensive contact information, for example in this article (which cites 18 references, 12 of them authored by FAES).

I looked up that phone number and found that it does indeed belong to someone in the Department of Psychology and Sports Science at Justus-Liebig University, namely Prof. Dr. Hermann Müller. For a moment I thought that maybe Prof. Dr. Müller likes to call himself "Hito", and maybe he got his first and last names mixed up when correcting the proofs of his article. But as my colleague Malte Elson points out, no German person named "Müller" would ever allow their name to be spelled "Muller" without the umlaut. (In situations where the umlaut is not available, for example in an e-mail address, it is compensated for by adding an e to the vowel, e.g., in this case, "Mueller".)

In any case, Malte contacted Prof. Dr. Müller, who assured him that he is not "Hito G. Muller" or "Muller G. Hito". Nor has Dr. Müller ever heard of anyone with that name, or anyone with a name like "Eryn Rekgai", in the department where he works.

Bern S. Schmidt

Bern Schmidt is another author who likes to permute the components of their name. They have published articles as "Bern S. Schmidt", "Bern Schmidt S.", "Bern, SS", and perhaps other combinations. Their bio on their author page on the web site of Insight Medical Publishing, which publishes a number of the journals that contain the articles that are linked to throughout this post, says:
Dr Bern S. Schmidt is a neuroscientist and clinical tenure track [sic] of the CHUV, working in the area of fundamental neuroscience and psychobiological factors influencing appearance of central nervous disorders and neurodegenerative disorders such as Alzheimer and Dementia. He holds a medical degree at the University of Victoria, follow by a residency program at The Shiga University of Medicine and a postdoctoral internship at the Waseda University.
I assume that "CHUV" here refers to "Centre Hospitalier Universitaire Vaudois", the teaching hospital of the University of Lausanne where Dr. Schmidt claims to be affiliated in the Department of Fundamental Neuroscience. But a search of the university's web site did not find any researcher with this name. I asked somebody who has access to a directory of all past and present staff members of the University of Lausanne if they could find anyone with a name that corresponds even partially to this name, and they reported that they found nothing. Meanwhile, The University of Victoria has no medical degree programme, and their neuroscience programme has no trace of anyone with this name.

[[ Update 2020-04-27 17:24 UTC: A representative of the University of Lausanne has confirmed to me that they can find no trace of anybody named "Bern Schmidt" at their institution. ]]

(A minor detail, but one that underscores how big a rabbit hole this story is: Dr. Schmidt seems to have an unusual telephone number. This article lists it as "516-851-8564", which looks more like a North American number than a Swiss one. Indeed, it is identical to the number given in this apparently unrelated article in the same journal for the corresponding author Hong Li of the Department of Neuroscience at Yale University School of Medicine. Dr. Hong Li's doubtless vital --- after all, she is at Yale --- contribution to neuroscience research was accepted within 6 days of being submitted, presumably having been pronounced flawless by the prominent scholars who performed the rigorous peer review process for the prestigious Journal of Translational Neurosciences. It is, however, slightly disappointing that typesetting standards at this paragon of scientific publishing do not extend to removing one author's phone number when typesetting the next one to be published on the same day. If anyone knows where Dr. Bern Schmidt is, perhaps they could mention this to them, so that this important detail can be corrected. We wouldn't want Dr. Hong Li's valuable Yale neuroscientist time to be wasted answering calls intended for Dr. Schmidt.)

These authors' recent dataset

The only activity that I have been able to identify from any of these authors in the last few months is the publication of this dataset, which was uploaded to Mendeley on March 22, 2020. As well as FAES, the authors are listed as HG Muller, E. Regai [sic], and O. Nakamura. From the "Related links" on that page, it appears that this dataset is a subset (total N=750) of the 10,566 cases that make up the sample described in the Etindele Sosso et al. article in Nature Scientific Reports that was the subject of my previous blog post.

However, a few things about these data are not entirely consistent with that article. For example, while the per-country means for the variables "Age", "Mean Hours of Gaming/week", and "Mean months of gaming/gamer" correspond to the published numbers in Table 2 of the article in five out of six cases (for "Mean months of gaming/gamer" in the sample from Gabon the mean is 15.77, whereas in the article the integer-rounded value reported was 15), all of the standard deviations in the dataset are considerably higher than those that were published, by factors ranging from 1.3 to 5.1.

Furthermore, there are some patterns in the distribution of scores in the four outcome variables (ISI, EDS, HADS-A subscale, and HADS-D subscale) that are difficult to explain as being the results of natural processes. For all four of these measures in the N=301 sample from Tunisia, and three of them (excluding the EDS) in the N=449 sample from Gabon, between 77% and 92% of the individual participants' total scores on each of these subscales are even numbers. For the EDS in the sample from Gabon, 78% of the scores are odd numbers. In the Gabon sample, it is also noticeable that the ISI score for every participant is exactly 2 higher than their HADS-A score and exactly 3 higher than their EDS score; the HADS-A score is also 2 higher than the HADS-D for 404 out of 449 participants.
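The parity check described above is trivial to run for yourself. Here is a minimal sketch using hypothetical scores (not the Mendeley data): genuine sum scores built from multiple items should split roughly evenly between odd and even totals, rather than being 77-92% even.

```python
# Minimal sketch of the parity check (hypothetical scores, not the
# Mendeley dataset): what fraction of total scores are even numbers?
def share_even(scores):
    return sum(1 for s in scores if s % 2 == 0) / len(scores)

plausible = [7, 12, 9, 14, 10, 11, 8, 13, 15, 10]   # mixed parity, as expected
suspect   = [8, 12, 10, 14, 10, 16, 8, 12, 9, 10]   # 90% even, as in the dataset

print(share_even(plausible))   # 0.5
print(share_even(suspect))     # 0.9
```

Applied to each subscale within each country subsample, this is all that is needed to reproduce the percentages quoted above from the published dataset.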

It is not clear to me why Hito G. Muller, Eryn Re[k]gai, and Okito Nakamura might be involved with the publication of this dataset, when their names were not listed as authors of the published article. But perhaps they have very high ethical standards and did not feel that their contribution to the curation of the data, whatever that might have been, merited a claim of authorship in the 11th most highly cited scientific journal in the world.

The other author who does seem to exist

There is one co-author on a few of the articles mentioned above who does actually appear to exist. This is Sana Raouafi, who reports an affiliation with the Department of Biomedical Engineering at the Polytechnique Montréal. The records office of that institution informed me that she was awarded her PhD on January 27, 2020. I have no other contact information for her, nor do I know whether she genuinely took part in the authorship of these strange articles, or what her relationship with FAES (or, if they exist, any of the other co-authors) might be.

Supporting file

There is one supporting file for this post here:
- Muller-dataset-with-pivots.xls: An Excel file containing my analyses of the Muller et al. dataset, mentioned above in the section "These authors' recent dataset". The basic worksheets from the published dataset have been enhanced with two sheets of pivot tables, illustrating the issues with the outcome measures that I described.

Acknowledgements

Thanks to Elisabeth Bik, Malte Elson, Danny Garside, Steve Lindsay, Stuart Ritchie, and Yannick Rochat for their help in attempting to track down these elusive researchers. Perhaps others will have more luck than us.


(*) I am aware that different customs exist in different countries regarding the order in which "given" and "family" names are written. For example, in several East Asian countries, but also in Hungary, it is common to write the family name first. Interestingly, there is often some ambiguity about this among speakers of French. But as far as I know, German speakers, like English speakers, always put their given name first and their family name last, unless there is a requirement to invert this order for alphabetisation purposes. And of course, in some parts of the world, the whole idea of "family names" is much more complicated than in Western countries. It's a fascinating subject that, alas, I do not have time to explore here.



21 April 2020

Some issues in a recent gaming research article: Etindele Sosso et al. (2020)


Research into the possibly problematic aspects of gaming is a hot topic. But most studies in this area have focused on gamers in Europe and North America. So a recent article in Nature Scientific Reports, featuring data from over 10,000 African gamers, would seem to be an important landmark for this field. However, even though I am an outsider to gaming research, it seems to my inexpert eye that this article may have a few wrinkles that need ironing out.

Let’s start with the article reference. It has 16 authors, and the new edition of the APA Publication Manual says that we now have to list up to 20 authors’ names in a reference, so let’s take a deep breath:

Etindele Sosso, F. A., Kuss, D. J., Vandelanotte, C., Jasso-Medrano, J. L., Husain, M. E., Curcio, G., Papadopoulos, D., Aseem, A., Bhati, P., Lopez-Rosales, F., Ramon Becerra, J., D’Aurizio, G., Mansouri, H., Khoury, T., Campbell, M., & Toth, A. J. (2020). Insomnia, sleepiness, anxiety and depression among different types of gamers in African countries. Nature Scientific Reports, 10, 1937. https://doi.org/10.1038/s41598-020-58462-0
(The good news is that it is an open access article, so you can just follow the DOI link and download the PDF file.)

Etindele Sosso et al. (2020) investigated the association between gaming and the four health outcomes mentioned in the title. According to the abstract, the results showed that “problematic and addicted gamers show poorer health outcomes compared with non-problematic gamers”, which sounds very reasonable to me as an outsider to the field. A survey that took about 20 minutes to complete was e-mailed to 53,634 participants, with a 23.64% response rate. After eliminating duplicates and incomplete forms, a total of 10,566 gamers were used in the analyses. The “type of gamer” of each participant was classified as “non-problematic”, “engaged”, “problematic”, or “addicted”, depending on their scores on a measure of gaming addiction, and the relations between this variable, other demographic information, and four health outcomes were examined.

The 16 authors of the Etindele Sosso et al. (2020) article report affiliations at 12 different institutions in 8 different countries. According to the “Author contributions” section, the first three authors “contributed equally to this work” (I presume that this means that they did the majority of it); 12 others (all except Papadopoulos, it seems) “contributed to the writing”; the first three authors plus Papadopoulos “contributed to the analyses”; and five (the first three authors, plus Campbell and Toth) “write [sic] the final form of the manuscript”. So this is a very impressive international collaboration, with the majority of the work apparently being split between Canada, the UK, and Australia, and it ought to represent a substantial advance in our understanding of how gaming affects mental and physical health in Africa.

Funding
Given the impressive set of authors and the large scale of this international project (data collection alone took 19 or 20 months, from November 2015 to June 2017), it is somewhat surprising that Etindele Sosso et al.’s (2020) article reports no source of funding. Perhaps everyone involved contributed their time and other resources for free, but there is not even a statement that no external funding was involved. (I am quite surprised that this last element is apparently not mandatory for articles in the Nature family of journals.) The administrative arrangements for the study, involving for example contacting the admissions offices of universities in nine countries and arranging for their e-mail lists to be made available, with appropriate guarantees that each university’s and country’s standards of research ethics would be respected, must have been considerable. The participants completed an online questionnaire, which might well have involved some monetary cost, whether directly paid to a survey hosting company or using up some part of a university’s agreed quota with such a company. Just publishing an Open Access article in Nature Scientific Reports costs, according to the journal’s web site, $1,870 plus applicable taxes.

Ethical approval
One possible explanation for the absence of funding information—although this would still constitute rather sloppy reporting, since as noted in the previous paragraph funding typically doesn’t just pay for data collection—might be if the data had already been collected as part of another study. No explicit statement to this effect is made in the Etindele Sosso et al. (2020) article, but at the start of the Methods section, we find “This is a secondary analysis of data collected during the project MHPE approved by the Faculty of Arts and Science of the University of Montreal (CERAS-2015-16-194-D)”. So I set out to look for any information about the primary analysis of these data.

I searched online to see if “project MHPE” might perhaps be a large data collection initiative from the University of Montreal, but found nothing. However, in the lead author’s Master’s thesis, submitted in March 2018 (full text PDF file available here—note that, apart from the Abstract, the entire document is written in French, but fortunately I am fluent in that language), we find that “MHPE” stands for “Mental Health profile [sic] of Etindele” (p. 5), and that the research in that thesis was covered by a certificate from the ethical board of the university that carries exactly the same reference number. I will therefore tentatively conclude that this is the “project MHPE” referred to in the Etindele Sosso et al. (2020) article.

However, the Master’s thesis describes how data were collected from a sample (prospective size, 12,000–13,000; final size 1,344) of members of the University of Montreal community, collected between November 2015 and December 2016. The two studies—i.e., the one reported in the Master’s thesis and the one reported by Etindele Sosso et al. (2020)—each used five measures, of which only two—the Insomnia Severity Index (ISI) and the Hospital Anxiety and Depression Scale (HADS)—were common to both. The questionnaires administered to the participants in the Montreal study included measures of cognitive decline and suicide risk, and it appears from p. 27, line 14 of the Master’s thesis that participants were also interviewed (although no details are provided of the interview procedure). All in all, the ethical issues involved in this study would seem to be rather different to those involved in asking people by e-mail about their gaming habits. Yet it seems that the ethics board gave its approval, on a single certificate, for the collection of two sets of data from two distinct groups of people in two very different studies: (a) a sample of around 12,000 people from the lead author’s local university community, using repeated questionnaires across a four-month period as well as interviews; and (b) a sample of 50,000 people spread across the continent of Africa, using e-mail solicitation and an online questionnaire. This would seem to be somewhat unusual.

Meanwhile, we are still no nearer to finding out who funded the collection of data in Africa and the time taken by the other authors to make their (presumably extensive, in the case of the second and third authors) personal contributions to the project. On p. 3 of his Master’s thesis, the author thanks (translation by me) “The Department of Biological Sciences and the Centre for Research in Neuropsychology and Cognition of the University of Montreal, which provided logistical and financial support to the success of this work”, but it is not clear that “this work” can be stretched beyond the collection of data in Montreal to cover the African project. Nor do we have any more idea about why Etindele Sosso et al. (2020) described their use of the African data as a “secondary analysis”, when it seems, as far as I have been able to establish, that there has been no previously published (primary) analysis of this data set.

Results
Further questions arise when we look at the principal numerical results of Etindele Sosso et al.’s (2020) article. On p. 4, the authors report that “4 multiple linear regression analyses were performed (with normal gaming as reference category) to compare the odds for having these conditions [i.e., insomnia, sleepiness, anxiety, and depression] (which are dependent variables) for different levels of gaming.” I’m not sure why the authors would perform linear, as opposed to logistic, regressions to compare the odds of someone in a given category having a specific condition relative to someone in a reference category, but that’s by no means the biggest problem here.

Table 3 lists, for each of the four health outcome variables, the regression coefficients and associated test statistics for each of the predictors in Etindele Sosso et al.’s (2020) model. Before we come to these numbers for individual variables, however, it is worth looking at the R-squared numbers for each model, which range from .76 for depression to .89 for insomnia. Although these are actually labelled as “ΔR²”, I assume that they represent the total variance explained by the whole model, rather than a change in R-squared when “type of gamer” is added to the model that contains only the covariates. (That said, however, the sentence “Gaming significantly contributed to 86.9% of the variance in insomnia, 82.7% of the variance in daytime sleepiness and 82.3% of the variance in anxiety [p < 0.001]” in the Abstract does not make anything much clearer.) But whether these numbers represent the variance explained by the whole model or just by the “type of gamer” variable, these are remarkable results by any standard. I wonder if anything in the prior sleep literature has ever explained 89% of the variance in a measure of insomnia, apart perhaps from another measure of insomnia.

Now let’s look at the details of Table 3. In principle there are seven variables (“Type of Gamers [sic]” being the main one of interest, plus the demographic covariates Age, Sex, Education, Income, Marital status, and Employment status), but because all of these are categorical, each of the levels except the reference category will have been a separate predictor in the regression, giving a total of 17 predictors. Thus, across the four models, there are 68 lines in total reporting regression coefficients and other associated statistics. The labels of the columns seem to be what one would expect from reports of multiple regression analyses: B (unstandardized regression coefficient), SE (standard error, presumably of B), β (standardized regression coefficient), t (the ratio between B and SE), Sig (the p value associated with t), and the upper and lower bounds of the 95% confidence interval (again, presumably of B).

The problem is that none of the actual numbers in the table seem to obey the relations that one would expect. In fact I cannot find a way in which any of them make any sense at all. Here are the problems that I identified:
-        When I compute the ratio B/SE, and compare it to column t (which should give the same ratio), the two don’t even get close to being equal in any of the 68 lines. Dividing the B/SE ratio by column t gives results that vary from 0.0218 (Model 2, Age, 30–36) to 44.1269 (Model 1, Type of Gamers, Engaged), with the closest to 1.0 being 0.7936 (Model 4, Age, 30–36) and 1.3334 (Model 3, Type of Gamers, Engaged).
-        Perhaps SE refers to the standard error of the standardized regression coefficient (β), even though the column SE appears to the left of the column β? Let’s divide β by SE and see how the t ratio compares. Here, we get results that vary from 0.0022 (Model 2, Age, 30–36) to 11.7973 (Model 1, Type of Gamers, Engaged). The closest we get to 1.0 is with values of 0.7474 (Model 3, Marital Status, Engaged) and 1.0604 (Model 3, Marital Status, Married). So here again, none of the β/SE calculations comes close to matching column t.
-        The p values do not match the corresponding t statistics. In most cases this can be seen by simple inspection. For example, on the first line of Table 3, it should be clear that a t statistic of 9.748 would have a very small p value indeed (in fact, about 1E−22) rather than .523. In many cases, even the conventional statistical significance status (either side of p = .05) of the t value doesn’t match the p value. To get an idea of this, I made the simplifying assumption (which is not actually true for the categories “Age: 36–42”, “Education: Doctorate”, and “Marital status: Married”, but individual inspection of these shows that my assumption doesn’t change much) that all degrees of freedom were at least 100, so that any t value with a magnitude greater than 1.96 would be statistically significant at the .05 level. I then looked to see if t and p were the same side of the significance threshold; they were not in 29 out of 68 cases.
-        The regression coefficients are not always contained within their corresponding confidence intervals. This is the case for 29 out of 68 of the B (unstandardized) values. I don’t think that the confidence intervals are meant to refer to the standardized coefficients (β), but just for completeness, 63 out of 68 of these fall outside the reported 95% CI.
-        Whether the regression coefficient falls inside the 95% CI does not correspond with whether the p value is below .05.  For both the unstandardized coefficients (B) and the standardized coefficients (β)—which, again, the CI probably doesn’t correspond to, but it’s quick and cheap to look at the possibility anyway—this test fails in 41 out of 68 cases.
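The arithmetic behind the first few checks in the list above is easily automated. The sketch below is my own illustrative Python (my actual checking was done in a spreadsheet; see the supporting files at the end of this post), and it uses the same simplifying assumption described above that all degrees of freedom are at least 100, so that the normal distribution can stand in for the t distribution. The B, SE, and confidence-interval values in the demonstration call are placeholders of my own invention; only the t = 9.748 / p = .523 pair comes from the first line of Table 3.

```python
from statistics import NormalDist

def check_row(B, SE, t, p, ci_lo, ci_hi, tol=0.05):
    """Flag internal inconsistencies in one reported line of regression output.

    Assumes df >= 100, so the standard normal approximates the t distribution
    (the same simplification used in the blog post).
    """
    problems = []
    # 1. The t statistic should be the ratio B/SE.
    if SE != 0 and abs(B / SE - t) > tol * max(abs(t), 1):
        problems.append(f"B/SE = {B/SE:.3f} but t = {t:.3f}")
    # 2. The reported p value should at least fall on the same side of .05
    #    as the p value implied by t (two-tailed).
    p_from_t = 2 * (1 - NormalDist().cdf(abs(t)))
    if (p_from_t < 0.05) != (p < 0.05):
        problems.append(f"t implies p ~ {p_from_t:.3g}, but p = {p:.3f}")
    # 3. The coefficient should lie inside its own 95% CI.
    if not (ci_lo <= B <= ci_hi):
        problems.append(f"B = {B} outside CI [{ci_lo}, {ci_hi}]")
    return problems

# Hypothetical B, SE, and CI, combined with the one t/p pair quoted above:
print(check_row(B=1.0, SE=0.5, t=9.748, p=0.523, ci_lo=0.0, ci_hi=2.0))
```

Run over all 68 lines of a table like this one, a checker of this kind takes a few seconds to reveal whether the reported statistics could have come from the same analysis.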

There are some further concerns with Table 3:
-        In the third line (Model 1, “Type of Gamers”, “Problematic”) the value for β is 1.8. Now it is actually possible to have a standardized regression coefficient with a magnitude above 1.0, but its existence usually means that you have big multicollinearity problems, and it’s typically very hard to interpret such a coefficient. It’s the kind of thing that at least one of the four authors who reported performing the analyses would normally be expected to pick up on and discuss, but no such discussion is to be found.
-        From Table 1, we can see that there were zero participants in the “Age” category 42–48, and zero participants in the “Education” category “Postdoctorate”. Yet, in Table 3, for all four models, these categories have non-zero regression coefficients and other statistics. It is not clear to me how one can obtain a regression coefficient or standard error from a categorical variable that corresponds to zero cases (and, hence, when coded has a mean and standard deviation of 0).
-        There is a surprisingly high number of repetitions of exactly the same value, typically to 3 decimal places, within the same variable, category, and absolute value of the statistic from one model to another. For example, the reported value in the column t for the variable “Age” and category “24–30” is 29.741 in both Models 1 and 3. For the variable “Employment status” and category “Employed”, the upper bound of the 95% confidence interval is the same (2.978) in all four models. This seems quite unlikely to be the result of chance, given the relatively large sample sizes that are involved for most of the categories (cf. Brown & Heathers, 2019), so it is not clear how these duplicates could have arisen.


Table 3 from Etindele et al. (2020), with duplicated values (considering the same variable and category across models) highlighted with a different colour for each set of duplicates. Two pairs are included where the sign changed but the digits remained identical; however, p values that were reported as 0.000 are ignored. To find a duplicate, first identify a cell that is outlined in a particular colour, then look up or down the table for one or more other cells with the same outline colour in the analogous position for one or more other models.

The preprint
It is interesting to compare Etindele Sosso et al.’s (2020) article with a preprint entitled “Insomnia and problematic gaming: A study in 9 low- and middle-income countries” by Faustin Armel Etindele Sosso and Daria J. Kuss (who also appears to be the second author of the published article), which is available here. That preprint reports a longitudinal study, with data collected at multiple time points—presumably four, including baseline, although only “after one months, six months, and 12 months” (p. 8) is mentioned—from a sample of people (initial size 120,460) from nine African countries. This must therefore be an entirely different study from the one reported in the published article, which did not use a longitudinal design and had a prospective sample size of 53,634. Yet, by an astonishing coincidence, the final sample retained for analysis in the preprint consisted of 10,566 participants, which is exactly the same as the published article. The number of men (9,366) and women (1,200) was also identical in the two samples. However, the mean and standard deviation of their ages was different (M=22.33 years, SD= 2.0 in the preprint; M=24.0, SD=2.3 in the published article). The number of participants in each of the nine countries (Table 2 of both the preprint and the published article) is also substantially different for each country between the two papers, and with two exceptions—the ISI and the well-known Hospital Anxiety and Depression Scale (HADS)—different measures of symptoms and gaming were used in each case.

Another remarkable coincidence between the preprint and Etindele Sosso et al.’s (2020) published article, given that we are of course dealing with two distinct samples, occurs in the description of the results obtained from the sample of African gamers on the Insomnia Severity Index. On p. 3 of the published article, in the paragraph describing the respondents’ scores on the ISI, we find: “The internal consistency of the ISI was excellent (Cronbach’s α = 0.92), and each individual item showed adequate discriminative capacity (r = 0.65–0.84). The area under the receiver operator characteristic curve was 0.87 and suggested that a cut-off score of 14 was optimal (82.4% sensitivity, 82.1% specificity, and 82.2% agreement) for detecting clinical insomnia”. These two sentences are identical, in every word and number, to the equivalent sentences on p. 5 of the preprint.

Naturally enough, because the preprint and Etindele Sosso et al.’s (2020) published article describe entirely different studies with different designs, and different sample sizes in each country, there is little in common between the Results sections of the two papers. The results in the preprint are based on repeated-measures analyses and include some interesting full-colour figures (the depiction of correlations in Figure 1, on p. 10, is particularly visually attractive), whereas the results of the published article consist mostly of a fairly straightforward summary, in sentences, of the results from the tables, which describe the outputs of linear regressions.


Figure 1 from the preprint by Etindele Sosso and Kuss (2018, p. 10). This appears to use an innovative technique to illustrate the correlation between two variables.

However, approximately 80% of the sentences in the introduction of the published article, and 50% of the sentences in the Discussion section, appear (with only a few cosmetic changes) in the preprint. This is interesting, not only because it would be quite unusual for a preprint of one study to be repurposed to describe an entirely different one, but also because it suggests that the addition of 14 authors resulted in the addition of only about 1,000 words to these two parts of the manuscript once the decision to recycle the text had been made.
The Introduction section of the Etindele and Kuss (2018) preprint (left) and the Etindele et al. (2020) published article (right). Sentences highlighted in yellow are common to both papers.



The Discussion section of the Etindele and Kuss (2018) preprint (left) and the Etindele et al. (2020) article (right). Sentences highlighted in yellow are common to both papers.

Another (apparently unrelated) preprint contains the same results
It is also perhaps worth noting that the summary of the participants’ results on the ISI measure—which, as we saw above, was identical in every word and number between the preprint and Etindele Sosso et al.’s (2020) published article—also appears, again identical in every word and number, on pp. 5–6 of a 2019 preprint by the lead author, entitled “Insomnia, excessive daytime sleepiness, anxiety, depression and socioeconomic status among customer service employees in Canada”, which is available here [PDF]. This second preprint describes a study of yet another different sample, namely 1,200 Canadian customer service workers. If this is not just another remarkable coincidence, it would suggest that the author may have discovered some fundamental invariant property of humans with regard to insomnia. If so, one would hope that both preprints could be peer reviewed most expeditiously, to bring this important discovery to the wider attention of the scientific community.

Other reporting issues from the same laboratory
The lead author of the Etindele Sosso et al. (2020) article has published even more studies with substantial numbers of participants. Here are two such articles, which have 41 and 35 citations, respectively, according to Google Scholar:

Etindele Sosso, F. A., & Raouafi, S. (2016). Brain disorders: Correlation between cognitive impairment and complex combination. Mental Health in Family Medicine, 12, 215–222. https://doi.org/10.25149/1756-8358.1202010
Etindele Sosso, F. A. (2017a). Neurocognitive game between risk factors, sleep and suicidal behaviour. Sleep Science, 10(1), 41–46. https://doi.org/10.5935/1984-0063.20170007

In the 2016 article, 1,344 respondents were assessed for cognitive deficiencies; 71.7% of the participants were aged 18–24, 76.2% were women, and 62% were undergraduates. (These figures all match those that were reported in the lead author’s Master’s thesis, so we might tentatively assume that this study used the same sample.) In the 2017 article, 1,545 respondents were asked about suicidal tendencies, with 78% being aged 18–24, 64.3% women, and 71% undergraduates. Although these are clearly entirely different samples in every respect, the tables of results of the two studies are remarkably similar. Every variable label is identical across all three tables, which might not be problematic in itself if similar predictors were used for all of the different outcome variables. More concerning, however, is the fact that of the 120 cells in Tables 1 and 2 that contain statistics (mean/SD combinations, p values other than .000, regression coefficients, standard errors, and confidence intervals), 58—that is, almost half—are identical in every digit. Furthermore, the entirety of Table 3—which shows the results of the logistic regressions, ostensibly predicting completely different outcomes in completely different samples—is identical across the two articles (52 out of 52 numbers). One of the odds ratios in Table 3 has the value 1133096220.169 (again, in both articles). There does not appear to be an obvious explanation for how this duplication could have arisen as the result of a natural process.

Left: The tables of results from Etindele Sosso and Raouafi (2016). Right: The tables of results from Etindele Sosso (2017a). Cells highlighted in yellow are identical (same variable name, identical numbers) in both articles.

The mouse studies
Further evidence that this laboratory may have, at the very least, a suboptimal approach to quality control when it comes to the preparation of manuscripts comes from the following pair of articles, in which the lead author of Etindele Sosso et al. (2020) reported the results of some psychophysiological experiments conducted on mice:

Etindele Sosso, F. A. (2017b). Visual dot interaction with short-term memory. Neurodegenerative Disease Management, 7(3), 182–190. https://doi.org/10.2217/nmt-2017-0012
Etindele Sosso, F. A., Hito, M. G., & Bern, S. S. (2017). Basic activity of neurons in the dark during somnolence induced by anesthesia. Journal of Neurology and Neuroscience, 8(4), 203–207. https://doi.org/10.21767/2171-6625.1000203 [1]

In each of these two articles (which have 28 and 24 Google Scholar citations, respectively), the neuronal activity of mice when exposed to visual stimuli under various conditions was examined. Figure 5 of the first article shows the difference between the firing rates of the neurons of a sample of an unknown number of mice (which could be as low as 1; I was unable to determine the sample size with any great level of certainty by reading the text) in response to visual stimuli that were shown in different orientations. In contrast, Figure 3 of the second article represents the firing rates of two different types of brain cell (interneurons and pyramidal cells) before and after a stimulus was applied. That is, these two figures represent completely different variables in completely different experimental conditions. And yet, give or take the use of dots of different shapes and colours, they appear to be identical. Again, it is not clear how this could have happened by chance.

Top: Figure 5 from Etindele Sosso (2017b). Bottom: Figure 3 from Etindele Sosso et al. (2017). The dot positions and axis labels appear to be identical. Thanks are due to Elisabeth Bik for providing a second pair of eyes.

Conclusion
I find it slightly surprising that 16 authors—all of whom, we must assume because of their formal statements to this effect in the “Author contributions” section, made substantial contributions to the Etindele et al. (2020) article in order to comply with the demanding authorship guidelines of Nature Research journals (specified here)—apparently failed to notice that this work contained quite so many inconsistencies. It would also be interesting to know what the reviewers and action editor had to say about the manuscript prior to its publication. The time between submission and acceptance was 85 days (including the end of year holiday period), which would not appear to suggest that a particularly extensive revision process took place. In any case, it seems that some sort of corrective action may be required for this article, in view of the importance of the subject matter for public policy.

Supporting files
I have made the following supporting files available here:
-          Etindele-et-al-Table3-numbers.xls: An Excel file containing the numbers from Table 3 of Etindele et al.’s (2020) article, with some calculations that illustrate the deficiencies in the relations between the statistics that I mentioned earlier. The basic numbers were extracted by performing a copy/paste from the article’s PDF file and using text editor macro commands to clean up the structure.
-          “(Annotated) Etindele Sosso, Raouafi - 2016 - Brain Disorders - Correlation between Cognitive Impairment and Complex Combination.pdf” and “(Annotated) Etindele Sosso - 2017 - Neurocognitive Game between Risk Factors, Sleep and Suicidal Behaviour.pdf”: Annotated versions of the 2016 and 2017 articles mentioned earlier, with identical results in the tables highlighted.
-          “(Annotated) Etindele Sosso, Kuss - 2018 (preprint) - Insomnia and problematic gaming - A study in 9 low- and middle-income countries.pdf” and “(Annotated) Etindele Sosso et al. - 2020 - Insomnia, sleepiness, anxiety and depression among different types of gamers in African countries.pdf”: Annotated versions of the 2018 preprint and the published Etindele et al. (2020) article, with overlapping text highlighted.
-          Etindele-2016-vs-2017.png, Etindele-et-al-Table3-duplicates.png, Etindele-mouse-neurons.png, Etindele Sosso-Kuss-Preprint-Figure1.png, Preprint-article-discussion-side-by-side.png, Preprint-article-intro-side-by-side.png: Full-sized versions of the images from this blog post.

Reference
Brown, N. J. L., & Heathers, J. A. J. (2019). Rounded Input Variables, Exact Test Statistics (RIVETS): A technique for detecting hand-calculated results in published research. PsyArXiv Preprints. https://doi.org/10.31234/osf.io/ctu9z

[[ Update 2020-04-21 13:14 UTC: Via Twitter, I have learned that I am not the first person to have publicly questioned the Etindele et al. (2020) article. See Platinum Paragon's blog post from 2020-04-17 here. ]]

[[ Update 2020-04-22 13:43 UTC: Elisabeth Bik has identified two more articles by the same lead author that share an image (same chart, different meaning). See this Twitter thread. ]]

[[ Update 2020-04-23 22:48 UTC: See my related blog post here, including discussion of a partial data set that appears to correspond to the Etindele et al. (2020) article. ]]




[1] This article was accepted 12 days after submission, which is presumably entirely unrelated to the fact that the lead author is listed here as the journal’s specialist editor for Neuropsychology and Cognition.


19 April 2020

In psychology everything mediates everything


In the past couple of years I have reviewed half a dozen manuscripts with abstracts that go something like this:

<Construct X> is known to be associated with higher levels of well-being and healthy psychological functioning, as indexed by <Construct Y>. However, to date, no study has investigated the role of <Construct M> in this association. The present study bridges this gap by testing a mediation path model in a sample of undergraduates (N = 100). As predicted, M fully mediated the positive association between X and Y.  These results suggest that X predicts higher levels of M, which subsequently predicts higher levels of Y. These results provide new insight that may advance a coherent theoretical framework on the pathways by which M enhances psychological well-being.

There is typically a description of how the 100 participants completed measures of constructs X, M, and Y, with a table of correlations that might look like this:

    X       Y
Y .24*
M .52***  .32**

* p < .05; ** p < .01; *** p < .001.

Then we get to the mediation analysis. More often than not this is done using the PROCESS macro in SPSS, but it can also be done “by hand” using ordinary least-squares regressions. Here are the steps required (cf. Baron & Kenny, 1986):

  1. Show that X is a significant predictor of Y. You don’t actually need to do the regression for this, as the standardized regression coefficient and its associated p value will be identical to the correlation coefficient between X and Y, but sometimes the manuscript will show the SPSS output to prove that the authors conducted this regression anyway. (In the last manuscript that I reviewed, the authors managed to obtain a standardized regression coefficient that was different to the zero-order correlation, which did not enhance my confidence in the rest of their analyses.) Here the p value will be .016.
  2. Show that X is a significant predictor of M. Again, no regression is required for this, as it’s just the correlation coefficient. The p value in this example is about 3E−8.
  3. Regress Y on both X and M. If you get a significant regression coefficient for M then you have at least a “partial” mediation effect. If, in addition, the regression coefficient for X is non-significant then you have “full” mediation. Here, this produces the following standardized coefficients:
    • M: β = 0.268, p = .018
    • X: β = 0.101, p = .368
Ta-da! In this example, we have full mediation: The p value for the mediator, M, is significant and the p value for X isn’t. We conclude that Construct M fully mediates the relation between Construct X and Construct Y. We can write it up and celebrate our contribution to understanding the mechanisms that lead to well-being. Surely the end of mental distress is only one more grant away.
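The three steps above can be carried out directly from the correlation table, with no raw data at all. My own demonstration code (linked below) is in R; the following is a Python sketch of the same calculation, using the correlations from the example table, r(X,Y) = .24, r(X,M) = .52, r(M,Y) = .32, and N = 100. It reproduces the standardized coefficients quoted above.

```python
from math import sqrt

def mediation_betas(r_xy, r_xm, r_my, n):
    """Standardized coefficients and t ratios for regressing Y on X and M,
    computed from the correlation table alone -- no raw data needed."""
    denom = 1 - r_xm ** 2
    beta_m = (r_my - r_xy * r_xm) / denom
    beta_x = (r_xy - r_my * r_xm) / denom
    r2 = beta_m * r_my + beta_x * r_xy        # model R-squared
    # With two predictors, both standardized SEs share the same formula:
    se = sqrt((1 - r2) / (denom * (n - 3)))   # df = n - k - 1 = n - 3
    return beta_m, beta_x, beta_m / se, beta_x / se

beta_m, beta_x, t_m, t_x = mediation_betas(r_xy=0.24, r_xm=0.52, r_my=0.32, n=100)
print(f"M: beta = {beta_m:.3f}, t = {t_m:.2f}")   # significant: |t| > 1.96
print(f"X: beta = {beta_x:.3f}, t = {t_x:.2f}")   # not significant
```

With 97 degrees of freedom, the t ratio for M corresponds to a p value just under .02, and that for X to a p value of about .37, matching the coefficients above.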

The problem is this: Absolutely any other variable that you might put in place of M, and which is correlated in the same way with X and Y, will also show exactly the same mediation effect. And there is no shortage of things you can measure—in psychology, at least—that are correlated at around .5 and .3 with two other variables, correlated at around .2, that you might have measured. Let’s say that X is some aspect of socioeconomic status and Y is subjective well-being. You can easily come up with any number of ideas for M: gratitude, optimism, self-esteem, all of the Big Five personality traits (if you reverse-score neuroticism as emotional stability), etc., without even needing to resort to Lykken and Meehl’s “crud factor” (“in psychology and sociology everything correlates with everything”; Meehl, 1990, p. 204).

I wrote some R code to demonstrate the example that I gave above, which you can find here. You will see that I performed the calculations in two ways. The first was to generate (with a bit of trial and error) some random data with the correct correlations. (This produces a bit of rounding error, so the p value for the beta for M in the regression is reported at .019, not .018.) The second—my preferred method, since you can generally use this starting with the table of descriptives that appears in an article—is to start with the correlations and perform the regression calculations from there. (A surprising number of people do not seem to know that you can generally determine the standardized coefficients of multiple regression models just from the correlation table. The standard errors—and, hence, the p values—can then be derived from the sample size. If you have the standard deviations as well, you can get the unstandardized coefficients. Add the means and you can calculate the intercepts too. You don't need the raw data for any of this.)
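To illustrate the last part of that parenthetical, here is a small Python sketch of the conversion from standardized to unstandardized coefficients. The descriptive statistics in the example call are invented for the illustration, not taken from any article.

```python
def unstandardize(betas, sds, means, sd_y, mean_y):
    """Convert standardized regression coefficients to unstandardized ones,
    using only the descriptives (means and SDs) that most articles report."""
    # b_j = beta_j * (SD of Y) / (SD of predictor j)
    bs = [beta * sd_y / sd_j for beta, sd_j in zip(betas, sds)]
    # Intercept = mean of Y minus the sum of b_j times the predictor means.
    intercept = mean_y - sum(b * m for b, m in zip(bs, means))
    return bs, intercept

# Hypothetical descriptives for predictors X and M:
bs, b0 = unstandardize(betas=[0.101, 0.268], sds=[1.5, 4.0],
                       means=[3.2, 12.0], sd_y=6.0, mean_y=20.0)
print(f"b_X = {bs[0]:.3f}, b_M = {bs[1]:.3f}, intercept = {b0:.4f}")
```

This is why, when checking a published article, the table of descriptives plus the correlation matrix is usually all you need to reconstruct the entire regression output.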

If your initial choice for variable X is more strongly correlated with Y than M is, then you can usually just swap X and M around, because there is very often nothing to say that whatever X is measuring occurs “before” whatever M is measuring, or vice versa—especially if you just hauled your undergraduates in and gave them measures of X, M, and Y to complete. The reason why you want your mediator, M, to be more strongly correlated than X with the outcome (Y) is a little-known phenomenon of two-variable regression that I like to think of as a sort of “Matthew effect”. Feel free to skip the next paragraph in which I explain this in tedious detail.

When the two predictors are moderately strongly correlated with each other (.52, in our case), then although their zero-order correlations with the outcome variable might be quite close together (.32 and .24 here), their standardized regression coefficients will diverge by quite a bit more than their respective correlation coefficients. Here, M’s correlation of .32 led to a beta of 0.268, which is a 16% reduction, but X’s correlation of .24 was reduced by 58% to a beta of 0.101. If the correlation between M and X had been a little higher (e.g., .60 instead of .52), the beta for M would actually have been larger (0.275) while that for X would have been even smaller (0.075). At some point along the M–X correlation continuum (around .75), the beta for M would be exactly equal to the correlation coefficient of M with Y, and the beta for X would be zero. Continuing even further, we would hit “negative suppression” territory, with M’s standardized regression coefficient being greater than the original correlation coefficient of .32, and X’s being negative. Many people seem to have a rather naïve view of multiple regression in which when you add a new predictor, the betas for all of the predictors are reduced in some roughly equal proportion, but the reality is nothing like that. You can explore what happens with just two predictors (with more, things get even wilder) using a Shiny app that I wrote here.
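The numbers in the previous paragraph can be reproduced by holding r(X,Y) = .24 and r(M,Y) = .32 fixed and varying only the correlation between the two predictors. (Again, this is a Python sketch of the calculation; my Shiny app does the same thing interactively in R.)

```python
def betas(r_xy, r_my, r_xm):
    """Standardized coefficients for Y regressed on X and M together."""
    d = 1 - r_xm ** 2
    return ((r_my - r_xy * r_xm) / d,   # beta for the "mediator" M
            (r_xy - r_my * r_xm) / d)   # beta for X

# beta_M grows while beta_X shrinks, hits zero near r(M,X) = .75,
# and then turns negative (negative suppression):
for r_xm in (0.52, 0.60, 0.75, 0.85):
    bm, bx = betas(0.24, 0.32, r_xm)
    print(f"r(M,X) = {r_xm:.2f}: beta_M = {bm:.3f}, beta_X = {bx:.3f}")
```

The loop makes the “Matthew effect” visible: a modest head start for M in the zero-order correlations turns into a one-sided rout once the predictors are highly correlated.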

So it’s possible to build an almost infinite number of mediation studies, all of which will appear to tell us something about the mediation of the relation between two psychological variables by a third, although almost all of them are just illustrating a known phenomenon of multiple regression. Again, everything is determined by the three correlations between the variables, plus the sample size if you care about statistical significance. (Alert readers will have noticed that whether mediation is “full” or “partial” will depend to a large extent on the sample size; with enough participants, even a small residual direct effect of X on Y will have a p value below .05. But of course, those same alert readers will also know that these days statistical significance doesn’t mean anything, right?)
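The claim that “everything is determined by the three correlations” can be made concrete: in standardized form, every quantity in the classic Baron and Kenny decomposition—path a, path b, the total effect c, the direct effect c′, and the indirect effect a·b—falls straight out of the three correlations, with a·b equal to c − c′ by algebraic identity. (A sketch under my own naming; the function below is illustrative, not a real library’s API.)

```python
def mediation_from_correlations(r_my, r_xy, r_mx):
    """Recover the full standardized mediation decomposition for the
    model X -> M -> Y from the three correlations alone."""
    denom = 1 - r_mx ** 2
    b = (r_my - r_xy * r_mx) / denom        # path b: M -> Y, controlling for X
    c_prime = (r_xy - r_my * r_mx) / denom  # direct effect of X on Y
    a = r_mx                                # path a: X -> M (standardized)
    c = r_xy                                # total effect of X on Y
    return {"a": a, "b": b, "c": c, "c_prime": c_prime, "indirect": a * b}

result = mediation_from_correlations(0.32, 0.24, 0.52)
# a*b = 0.52 * 0.268 and c - c' = 0.24 - 0.101 both come to about 0.139
```

No raw data, no theory, no temporal ordering—just three correlations in, a complete “mediation analysis” out.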

Now, am I saying that all of the mediation articles that I get to review are based on the atheoretical “throw some numbers at the wall and see what sticks” approach that I may have implied here? Well, no... but I’m not saying that none of them are, either. I have heard first-hand from grad students, in several cases, what happens when they have a bunch of variables and no obvious result: their supervisor suggests that they write them up as a mediation analysis.

I don’t think that preregistration will necessarily help all that much here, because it is quite predictable from previous knowledge that X, M, and Y will have the pattern of correlations needed to produce an apparent mediation effect. I’m going to suggest that the only solution is to refrain from doing this kind of mediation analysis altogether in the absence of (a) a much better theoretical justification than we currently see, and (b) some kind of constraint on the order in which changes in X, M, and Y occur. Without a demonstration that the causal arrows are running from X to M and M to Y (MacKinnon & Pirlott, 2014), and not vice versa, we have no way of knowing whether we are dealing with mediation or confounding, especially since in many cases the constructs X, M, and Y may themselves be caused by multiple other factors, and so on ad infinitum (cf. Arah, 2008). In the absence of experimental manipulation, causality is hard to demonstrate, especially in psychology.


References
Arah, O. A. (2008). The role of causal reasoning in understanding Simpson's paradox, Lord's paradox, and the suppression effect: covariate selection in the analysis of observational studies. Emerging Themes in Epidemiology, 5, 5. https://doi.org/10.1186/1742-7622-5-5
Baron, R. M., & Kenny, D. A. (1986). The moderator–mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51(6), 1173–1182. https://doi.org/10.1037/0022-3514.51.6.1173
MacKinnon, D. P., & Pirlott, A. G. (2014). Statistical approaches for enhancing causal interpretation of the M to Y relation in mediation analysis. Personality and Social Psychology Review, 19(1), 30–43. https://doi.org/10.1177/1088868314542878
Meehl, P. E. (1990). Why summaries of research on psychological theories are often uninterpretable. Psychological Reports, 66(1), 195–244. https://doi.org/10.2466/pr0.1990.66.1.195

(Thanks to Julia Rohrer for her helpful comments on an earlier draft of this post. If the whole thing is garbage, it's probably because I didn't incorporate all of her thoughts.)