23 April 2020

The Mystery of the Missing Authors

What do the following researchers have in common?

Muller G. Hito, Department of Psychology and Sports Science, Justus-Liebig University, Germany
Okito Nakamura, Global Research Department, Ritsumeikan University, Japan
Mitsu Nakamura, The Graduated [sic] University of Advanced Studies, Japan
John Okutemo, Usman University, Sokoto, Nigeria
Eryn Rekgai, Department of Psychology and Sports Science, Justus-Liebig University, Germany
Mbraga Theophile, Kinshasa University, Republic of Congo
Bern S. Schmidt, Department of Fundamental Neuroscience, University of Lausanne, Switzerland

Despite their varied national origins, it seems that the answer is "quite a bit":

1. They seem to collaborate with each other, in various combinations, on short articles with limited empirical content, typically published with less than a week from submission to acceptance. (Some examples: 1 2 3 4 5) The majority of these articles date from 2017, although there are some from 2018 and 2019 as well.

2. Apart from each other, these people have published with almost nobody else, except that:

(a) Four of them have published with Faustin Armel Etindele Sosso (whom I will refer to from now on as FAES), the lead author of the article that I discussed in this post. (Examples: 6 7 8) In one case, FAES is the corresponding author although he is not listed as an actual author of the article. I don't think I have ever seen that before in scholarly publishing.

(b) Two of them have published with an author named Sana Raouafi --- see the specific paragraph on this person towards the end of this post.

3. Whether FAES is a co-author or not, these researchers have a remarkable taste for citing his published work, which typically accounts for between 50% and 100% of the References section of any of their articles.

4. When one of these researchers, rather than FAES, is the corresponding author of an article, they always use a Yahoo or Gmail address. So far I have identified "s.bern@yahoo.com", "mullerhito@yahoo.com", "mitsunaka216@gmail.com", and "okitonaka216@gmail.com". None of these researchers seems to use an institutional e-mail address for correspondence. Of course, this is not entirely illegitimate (for example, if one anticipates moving to another institution in the near future), but it seems quite unusual for none of them to have used their faculty address.

[[ Update 2020-04-27 17:22 UTC: I have identified that "Erin Regai", who I think is the same person as "Eryn Rekgai" but with slightly different spelling, has the e-mail address "eregai216@gmail.com". That makes three people with Gmail addresses ending in 216. It would be interesting to discover whether anybody involved in these authors' publication projects has a birthday on 21/6 (21 June) or 2/16 (February 16). ]]

5. None of these people seems to have an entry in the online staff directory of their respective institutions. (The links under their names at the start of this post all go to their respective ResearchGate profiles, or if they don't have one, RG's collection of their "scientific contributions".) Of course, one can never prove a negative, and some people just prefer a quiet life. So as part of this blog post I am issuing a public appeal: If you know (or, even better, if you are) any of these people, please get in touch with me.

I don't have time to go into all of these individuals in detail, but here are some highlights of what I found in a couple of cases. (For the two authors named Nakamura, I am awaiting a response to inquiries that I sent to their respective institutions; I hope that readers will forgive me for publishing this post before waiting for a reply to those inquiries, given the current working situation at many universities around the world.)

[[ Update 2020-04-24 21:24 UTC: Ms. Mariko Kajii of the Office of Global Planning and Partnerships at The Ritsumeikan Trust has confirmed to me that nobody named "Okito Nakamura" is known to that institution. ]]

[[ Update 2020-04-24 23:37 UTC: Mitsu Nakamura's ResearchGate page claims that Okito Nakamura is a member of Mitsu's lab at "The graduated [sic] University of Advanced studies". It seems strange that someone would be affiliated with one university (even if that university denied any knowledge of them, cf. my previous update) while working in a lab at another. Meanwhile, Mitsu Nakamura's name does not appear in Japan's national database of researchers. ]]

Muller G. Hito

For this researcher --- who does not seem to be quite sure how their own name is structured(*), as they sometimes appear at the top of an article as "Hito G. Muller" --- we have quite extensive contact information, for example in this article (which cites 18 references, 12 of them authored by FAES).

I looked up that phone number and found that it does indeed belong to someone in the Department of Psychology and Sports Science at Justus-Liebig University, namely Prof. Dr. Hermann Müller. For a moment I thought that maybe Prof. Dr. Müller likes to call himself "Hito", and maybe he got his first and last names mixed up when correcting the proofs of his article. But as my colleague Malte Elson points out, no German person named "Müller" would ever allow their name to be spelled "Muller" without the umlaut. (In situations where the umlaut is not available, for example in an e-mail address, it is compensated for by adding an e to the vowel, e.g., in this case, "Mueller".)

In any case, Malte contacted Prof. Dr. Müller, who assured him that he is not "Hito G. Muller" or "Muller G. Hito". Nor has Dr. Müller ever heard of anyone with that name, or anyone with a name like "Eryn Rekgai", in the department where he works.

Bern S. Schmidt

Bern Schmidt is another author who likes to permute the components of their name. They have published articles as "Bern S. Schmidt", "Bern Schmidt S.", "Bern, SS", and perhaps other combinations. Their bio on their author page on the web site of Insight Medical Publishing, which publishes a number of the journals that contain the articles that are linked to throughout this post, says:
Dr Bern S. Schmidt is a neuroscientist and clinical tenure track [sic] of the CHUV, working in the area of fundamental neuroscience and psychobiological factors influencing appearance of central nervous disorders and neurodegenerative disorders such as Alzheimer and Dementia. He holds a medical degree at the University of Victoria, follow by a residency program at The Shiga University of Medicine and a postdoctoral internship at the Waseda University.
I assume that "CHUV" here refers to "Centre Hospitalier Universitaire Vaudois", the teaching hospital of the University of Lausanne where Dr. Schmidt claims to be affiliated in the Department of Fundamental Neuroscience. But a search of the university's web site did not find any researcher with this name. I asked somebody who has access to a directory of all past and present staff members of the University of Lausanne if they could find anyone with a name that corresponds even partially to this name, and they reported that they found nothing. Meanwhile, The University of Victoria has no medical degree programme, and their neuroscience programme has no trace of anyone with this name.

[[ Update 2020-04-27 17:24 UTC: A representative of the University of Lausanne has confirmed to me that they can find no trace of anybody named "Bern Schmidt" at their institution. ]]

(A minor detail, but one that underscores how big a rabbit hole this story is: Dr. Schmidt seems to have an unusual telephone number. This article lists it as "516-851-8564", which looks more like a North American number than a Swiss one. Indeed, it is identical to the number given in this apparently unrelated article in the same journal for the corresponding author Hong Li of the Department of Neuroscience at Yale University School of Medicine. Dr. Hong Li's doubtless vital --- after all, she is at Yale --- contribution to neuroscience research was accepted within 6 days of being submitted, presumably having been pronounced flawless by the prominent scholars who performed the rigorous peer review process for the prestigious Journal of Translational Neurosciences. It is, however, slightly disappointing that typesetting standards at this paragon of scientific publishing do not extend to removing one author's phone number when typesetting the next article to be published on the same day. If anyone knows where Dr. Bern Schmidt is, perhaps they could mention this to them, so that this important detail can be corrected. We wouldn't want Dr. Hong Li's valuable Yale neuroscientist time to be wasted answering calls intended for Dr. Schmidt.)

These authors' recent dataset

The only activity that I have been able to identify from any of these authors in the last few months is the publication of this dataset, which was uploaded to Mendeley on March 22, 2020. As well as FAES, the authors are listed as HG Muller, E. Regai [sic], and O. Nakamura. From the "Related links" on that page, it appears that this dataset is a subset (total N=750) of the 10,566 cases that make up the sample described in the Etindele Sosso et al. article in Nature Scientific Reports that was the subject of my previous blog post.

However, a few things about these data are not entirely consistent with that article. For example, while the per-country means for the variables "Age", "Mean Hours of Gaming/week", and "Mean months of gaming/gamer" correspond to the published numbers in Table 2 of the article in five out of six cases (for "Mean months of gaming/gamer" in the sample from Gabon the mean is 15.77, whereas in the article the integer-rounded value reported was 15), all of the standard deviations in the dataset are considerably higher than those that were published, by factors ranging from 1.3 to 5.1.

Furthermore, there are some patterns in the distribution of scores in the four outcome variables (ISI, EDS, HADS-A subscale, and HADS-D subscale) that are difficult to explain as being the results of natural processes. For all four of these measures in the N=301 sample from Tunisia, and three of them (excluding the EDS) in the N=449 sample from Gabon, between 77% and 92% of the individual participants' total scores on each of these measures are even numbers. For the EDS in the sample from Gabon, 78% of the scores are odd numbers. In the Gabon sample, it is also noticeable that the ISI score for every participant is exactly 2 higher than their HADS-A score and exactly 3 higher than their EDS score; the HADS-A score is also 2 higher than the HADS-D for 404 out of 449 participants.
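For readers who want to look for this kind of pattern themselves, here is a minimal Python sketch of the two checks described above: the share of even scores in a column, and the most common difference between two columns. The function names and the toy rows are my own inventions for illustration; the rows are made up and are NOT taken from the Mendeley dataset.

```python
from collections import Counter


def share_even(scores):
    """Return the fraction of integer scores that are even."""
    return sum(1 for s in scores if s % 2 == 0) / len(scores)


def modal_difference(a, b):
    """Return the most common value of a[i] - b[i] and the share of rows showing it."""
    diffs = Counter(x - y for x, y in zip(a, b))
    value, count = diffs.most_common(1)[0]
    return value, count / len(a)


# Made-up toy rows for illustration only (hypothetical, not the real data).
isi = [12, 14, 10, 16, 8, 13]
hads_a = [10, 12, 8, 14, 6, 11]

even_share = share_even(isi)
offset, offset_share = modal_difference(isi, hads_a)

print(f"Share of even ISI scores: {even_share:.2f}")
print(f"ISI minus HADS-A equals {offset} in {offset_share:.0%} of rows")
```

In naturally occurring questionnaire totals one would expect roughly half of the scores to be even, and the difference between two distinct scales to vary from person to person, so values far from those expectations are what would prompt a closer look.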

It is not clear to me why Hito G. Muller, Eryn Re[k]gai, and Okito Nakamura might be involved with the publication of this dataset, when their names were not listed as authors of the published article. But perhaps they have very high ethical standards and did not feel that their contribution to the curation of the data, whatever that might have been, merited a claim of authorship in the 11th most highly cited scientific journal in the world.

The other author who does seem to exist

There is one co-author on a few of the articles mentioned above who does actually appear to exist. This is Sana Raouafi, who reports an affiliation with the Department of Biomedical Engineering at the Polytechnique Montréal. The records office of that institution informed me that she was awarded her PhD on January 27, 2020. I have no other contact information for her, nor do I know whether she genuinely took part in the authorship of these strange articles, or what her relationship with FAES (or, if they exist, any of the other co-authors) might be.

Supporting file

There is one supporting file for this post here:
- Muller-dataset-with-pivots.xls: An Excel file containing my analyses of the Muller et al. dataset, mentioned above in the section "These authors' recent dataset". The basic worksheets from the published dataset have been enhanced with two sheets of pivot tables, illustrating the issues with the outcome measures that I described.


Thanks to Elisabeth Bik, Malte Elson, Danny Garside, Steve Lindsay, Stuart Ritchie, and Yannick Rochat for their help in attempting to track down these elusive researchers. Perhaps others will have more luck than us.

(*) I am aware that different customs exist in different countries regarding the order in which "given" and "family" names are written. For example, in several East Asian countries, but also in Hungary, it is common to write the family name first. Interestingly, there is often some ambiguity about this among speakers of French. But as far as I know, German speakers, like English speakers, always put their given name first and their family name last, unless there is a requirement to invert this order for alphabetisation purposes. And of course, in some parts of the world, the whole idea of "family names" is much more complicated than in Western countries. It's a fascinating subject that, alas, I do not have time to explore here.

21 April 2020

Some issues in a recent gaming research article: Etindele Sosso et al. (2020)

Research into the possibly problematic aspects of gaming is a hot topic. But most studies in this area have focused on gamers in Europe and North America. So a recent article in Nature Scientific Reports, featuring data from over 10,000 African gamers, would seem to be an important landmark for this field. However, even though I am an outsider to gaming research, it seems to my inexpert eye that this article may have a few wrinkles that need ironing out.

Let’s start with the article reference. It has 16 authors, and the new edition of the APA Publication Manual says that we now have to list up to 20 authors’ names in a reference, so let’s take a deep breath:

Etindele Sosso, F. A., Kuss, D. J., Vandelanotte, C., Jasso-Medrano, J. L., Husain, M. E., Curcio, G., Papadopoulos, D., Aseem, A., Bhati, P., Lopez-Rosales, F., Ramon Becerra, J., D’Aurizio, G., Mansouri, H., Khoury, T., Campbell, M., & Toth, A. J. (2020). Insomnia, sleepiness, anxiety and depression among different types of gamers in African countries. Nature Scientific Reports, 10, 1937. https://doi.org/10.1038/s41598-020-58462-0
(The good news is that it is an open access article, so you can just follow the DOI link and download the PDF file.)

Etindele Sosso et al. (2020) investigated the association between gaming and the four health outcomes mentioned in the title. According to the abstract, the results showed that “problematic and addicted gamers show poorer health outcomes compared with non-problematic gamers”, which sounds very reasonable to me as an outsider to the field. A survey that took about 20 minutes to complete was e-mailed to 53,634 participants, with a 23.64% response rate. After eliminating duplicates and incomplete forms, a total of 10,566 gamers were used in the analyses. The “type of gamer” of each participant was classified as “non-problematic”, “engaged”, “problematic”, or “addicted”, depending on their scores on a measure of gaming addiction, and the relations between this variable, other demographic information, and four health outcomes were examined.
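As a rough check on the sample flow just described, the implied number of returned questionnaires can be worked out from the figures quoted above. This is only arithmetic on those figures; the variable names are mine, and the article itself may have counted responses slightly differently.

```python
# Figures quoted from the article's description of its sample.
invited = 53_634        # e-mailed participants
response_rate = 0.2364  # reported 23.64% response rate
retained = 10_566       # gamers used in the analyses

# Implied number of returned questionnaires, and how many were then
# removed as duplicates or incomplete forms.
responses = round(invited * response_rate)
dropped = responses - retained

print(f"Implied responses: {responses}")
print(f"Implied duplicates/incomplete forms removed: {dropped}")
```

On these figures, about 12,679 questionnaires came back and about 2,113 of them were discarded before analysis.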

The 16 authors of the Etindele Sosso et al. (2020) article report affiliations at 12 different institutions in 8 different countries. According to the “Author contributions” section, the first three authors “contributed equally to this work” (I presume that this means that they did the majority of it); 12 others (all except Papadopoulos, it seems) “contributed to the writing”; the first three authors plus Papadopoulos “contributed to the analyses”; and five (the first three authors, plus Campbell and Toth) “write [sic] the final form of the manuscript”. So this is a very impressive international collaboration, with the majority of the work apparently being split between Canada, the UK, and Australia, and it ought to represent a substantial advance in our understanding of how gaming affects mental and physical health in Africa.

Given the impressive set of authors and the large scale of this international project (data collection alone took 19 or 20 months, from November 2015 to June 2017), it is somewhat surprising that Etindele Sosso et al.’s (2020) article reports no source of funding. Perhaps everyone involved contributed their time and other resources for free, but there is not even a statement that no external funding was involved. (I am quite surprised that this last element is apparently not mandatory for articles in the Nature family of journals.) The administrative arrangements for the study, involving for example contacting the admissions offices of universities in nine countries and arranging for their e-mail lists to be made available, with appropriate guarantees that each university’s and country’s standards of research ethics would be respected, must have been considerable. The participants completed an online questionnaire, which might well have involved some monetary cost, whether directly paid to a survey hosting company or using up some part of a university’s agreed quota with such a company. Just publishing an Open Access article in Nature Scientific Reports costs, according to the journal’s web site, $1,870 plus applicable taxes.

Ethical approval
One possible explanation for the absence of funding information—although this would still constitute rather sloppy reporting, since as noted in the previous paragraph funding typically doesn’t just pay for data collection—might be if the data had already been collected as part of another study. No explicit statement to this effect is made in the Etindele Sosso et al. (2020) article, but at the start of the Methods section, we find “This is a secondary analysis of data collected during the project MHPE approved by the Faculty of Arts and Science of the University of Montreal (CERAS-2015-16-194-D)”. So I set out to look for any information about the primary analysis of these data.

I searched online to see if “project MHPE” might perhaps be a large data collection initiative from the University of Montreal, but found nothing. However, in the lead author’s Master’s thesis, submitted in March 2018 (full text PDF file available here—note that, apart from the Abstract, the entire document is written in French, but fortunately I am fluent in that language), we find that “MHPE” stands for “Mental Health profile [sic] of Etindele” (p. 5), and that the research in that thesis was covered by a certificate from the ethical board of the university that carries exactly the same reference number. I will therefore tentatively conclude that this is the “project MHPE” referred to in the Etindele Sosso et al. (2020) article.

However, the Master’s thesis describes how data were collected from a sample (prospective size, 12,000–13,000; final size 1,344) of members of the University of Montreal community, collected between November 2015 and December 2016. The two studies, i.e., the one reported in the Master’s thesis and the one reported by Etindele Sosso et al. (2020), each used five measures, of which only two, the Insomnia Severity Index (ISI) and the Hospital Anxiety and Depression Scale (HADS), were common to both. The questionnaires administered to the participants in the Montreal study included measures of cognitive decline and suicide risk, and it appears from p. 27, line 14 of the Master’s thesis that participants were also interviewed (although no details are provided of the interview procedure). All in all, the ethical issues involved in this study would seem to be rather different to those involved in asking people by e-mail about their gaming habits. Yet it seems that the ethics board gave its approval, on a single certificate, for the collection of two sets of data from two distinct groups of people in two very different studies: (a) a sample of around 12,000 people from the lead author’s local university community, using repeated questionnaires across a four-month period as well as interviews; and (b) a sample of 50,000 people spread across the continent of Africa, using e-mail solicitation and an online questionnaire. This would seem to be somewhat unusual.

Meanwhile, we are still no nearer to finding out who funded the collection of data in Africa and the time taken by the other authors to make their (presumably extensive, in the case of the second and third authors) personal contributions to the project. On p. 3 of his Master’s thesis, the author thanks (translation by me) “The Department of Biological Sciences and the Centre for Research in Neuropsychology and Cognition of the University of Montreal, which provided logistical and financial support to the success of this work”, but it is not clear that “this work” can be extrapolated beyond the collection of data in Montreal to include the African project. Nor do we have any more idea about why Etindele Sosso et al. (2020) described their use of the African data as a "secondary analysis", when it seems, as far as I have been able to establish, that there has been no previously published (primary) analysis of this data set.

Further questions arise when we look at the principal numerical results of Etindele Sosso et al.’s (2020) article. On p. 4, the authors report that “4 multiple linear regression analyses were performed (with normal gaming as reference category) to compare the odds for having these conditions [i.e., insomnia, sleepiness, anxiety, and depression] (which are dependent variables) for different levels of gaming.” I’m not sure why the authors would perform linear, as opposed to logistic, regressions to compare the odds of someone in a given category having a specific condition relative to someone in a reference category, but that’s by no means the biggest problem here.

Etindele Sosso et al.’s (2020) Table 3 lists, for each of the four health outcome variables, the regression coefficients and associated test statistics for each of the predictors in their study. Before we come to these numbers for individual variables, however, it is worth looking at the R-squared numbers for each model, which range from .76 for depression to .89 for insomnia. Although these are actually labelled as “ΔR2”, I assume that they represent the total variance explained by the whole model, rather than a change in R-squared when “type of gamer” is added to the model that contains only the covariates. (That said, however, the sentence “Gaming significantly contributed to 86.9% of the variance in insomnia, 82.7% of the variance in daytime sleepiness and 82.3% of the variance in anxiety [p < 0.001]” in the Abstract does not make anything much clearer.) But whether these numbers represent the variance explained by the whole model or just by the “type of gamer” variable, they constitute remarkable results by any standard. I wonder if anything in the prior sleep literature has ever explained 89% of the variance in a measure of insomnia, apart perhaps from another measure of insomnia.

Now let’s look at the details of Table 3. In principle there are seven variables (“Type of Gamers [sic]” being the main one of interest, plus the demographic covariates Age, Sex, Education, Income, Marital status, and Employment status), but because all of these are categorical, each of the levels except the reference category will have been a separate predictor in the regression, giving a total of 17 predictors. Thus, across the four models, there are 68 lines in total reporting regression coefficients and other associated statistics. The labels of the columns seem to be what one would expect from reports of multiple regression analyses: B (unstandardized regression coefficient), SE (standard error, presumably of B), β (standardized regression coefficient), t (the ratio between B and SE), Sig (the p value associated with t), and the upper and lower bounds of the 95% confidence interval (again, presumably of B).

The problem is that none of the actual numbers in the table seem to obey the relations that one would expect. In fact I cannot find a way in which any of them make any sense at all. Here are the problems that I identified:
- When I compute the ratio B/SE, and compare it to column t (which should give the same ratio), the two don’t even get close to being equal in any of the 68 lines. Dividing the B/SE ratio by column t gives results that vary from 0.0218 (Model 2, Age, 30–36) to 44.1269 (Model 1, Type of Gamers, Engaged), with the closest to 1.0 being 0.7936 (Model 4, Age, 30–36) and 1.3334 (Model 3, Type of Gamers, Engaged).
- Perhaps SE refers to the standard error of the standardized regression coefficient (β), even though the column SE appears to the left of the column β? Let’s divide β by SE and see how the t ratio compares. Here, we get results that vary from 0.0022 (Model 2, Age, 30–36) to 11.7973 (Model 1, Type of Gamers, Engaged). The closest we get to 1.0 is with values of 0.7474 (Model 3, Marital Status, Engaged) and 1.0604 (Model 3, Marital Status, Married). So here again, none of the β/SE calculations comes close to matching column t.
- The p values do not match the corresponding t statistics. In most cases this can be seen by simple inspection. For example, on the first line of Table 3, it should be clear that a t statistic of 9.748 would have a very small p value indeed (in fact, about 1E−22) rather than .523. In many cases, even the conventional statistical significance status (either side of p = .05) of the t value doesn’t match the p value. To get an idea of this, I made the simplifying assumption (which is not actually true for the categories “Age: 36–42”, “Education: Doctorate”, and “Marital status: Married”, but individual inspection of these shows that my assumption doesn’t change much) that all degrees of freedom were at least 100, so that any t value with a magnitude greater than 1.96 would be statistically significant at the .05 level. I then looked to see if t and p were the same side of the significance threshold; they were not in 29 out of 68 cases.
- The regression coefficients are not always contained within their corresponding confidence intervals. This is the case for 29 out of 68 of the B (unstandardized) values. I don’t think that the confidence intervals are meant to refer to the standardized coefficients (β), but just for completeness, 63 out of 68 of these fall outside the reported 95% CI.
- Whether the regression coefficient falls inside the 95% CI does not correspond with whether the p value is below .05. For both the unstandardized coefficients (B) and the standardized coefficients (β), which, again, the CI probably doesn’t correspond to (but it’s quick and cheap to look at the possibility anyway), this test fails in 41 out of 68 cases.
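The checks in the list above are easy to automate. Here is a minimal Python sketch, under stated assumptions: it uses a normal approximation for the p value (in line with my simplifying assumption of large degrees of freedom), and its last check encodes the usual relation that a 95% CI excluding zero goes with p < .05, which is one reading of the final bullet rather than a guaranteed reproduction of it. The function names are my own, and in the example row only t = 9.748 and p = .523 come from Table 3; the values of B, SE, and the CI bounds are hypothetical placeholders.

```python
import math


def two_sided_p(t):
    """Two-sided p value for t, using a normal approximation (large df)."""
    return math.erfc(abs(t) / math.sqrt(2))


def check_row(b, se, t, p, ci_low, ci_high, alpha=0.05):
    """Return internal-consistency checks for one row of a regression table."""
    p_from_t = two_sided_p(t)
    ci_excludes_zero = not (ci_low <= 0 <= ci_high)
    return {
        # B/SE should reproduce t, so this ratio should be close to 1.
        "b_over_se_divided_by_t": (b / se) / t,
        # The p value implied by the reported t statistic.
        "p_from_t": p_from_t,
        # Are t and the reported p on the same side of the threshold?
        "t_and_p_agree": (p_from_t < alpha) == (p < alpha),
        # The coefficient should sit inside its own confidence interval.
        "b_inside_ci": ci_low <= b <= ci_high,
        # A 95% CI that excludes zero should go with p < .05.
        "ci_and_p_agree": ci_excludes_zero == (p < alpha),
    }


# t and p are the values quoted above for the first line of Table 3;
# b, se, ci_low and ci_high are HYPOTHETICAL placeholders, not table values.
result = check_row(b=1.2, se=0.4, t=9.748, p=0.523, ci_low=-0.5, ci_high=0.9)
for name, value in result.items():
    print(f"{name}: {value}")
```

Running a function like this over all 68 rows of a transcribed table would give counts comparable to the ones reported in the list, although the exact numbers will depend on how each check is defined.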

There are some further concerns with Table 3:
- In the third line (Model 1, “Type of Gamers”, “Problematic”) the value for β is 1.8. Now it is actually possible to have a standardized regression coefficient with a magnitude above 1.0, but its existence usually means that you have big multicollinearity problems, and it’s typically very hard to interpret such a coefficient. It’s the kind of thing that at least one of the four authors who reported in the "Author contributions" section of the article that they "contributed to the analyses" would normally be expected to pick up on and discuss, but no such discussion is to be found.
- From Table 1, we can see that there were zero participants in the “Age” category 42–48, and zero participants in the “Education” category “Postdoctorate”. Yet, in Table 3, for all four models, these categories have non-zero regression coefficients and other statistics. It is not clear to me how one can obtain a regression coefficient or standard error from a categorical variable that corresponds to zero cases (and, hence, when coded has a mean and standard deviation of 0).
- There is a surprisingly high number of repetitions of exactly the same value, typically to 3 decimal places, within the same variable, category, and absolute value of the statistic from one model to another. For example, the reported value in the column t for the variable “Age” and category “24–30” is 29.741 in both Models 1 and 3. For the variable “Employment status” and category “Employed”, the upper bound of the 95% confidence interval is the same (2.978) in all four models. This seems quite unlikely to be the result of chance, given the relatively large sample sizes that are involved for most of the categories (cf. Brown & Heathers, 2019), so it is not clear how these duplicates could have arisen.
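To illustrate the point about categories with zero participants: a dummy variable for an empty category is a column of zeros, and a column of zeros makes the least-squares normal equations singular, so no unique coefficient can be estimated for it. The following Python sketch shows this with a toy design consisting of an intercept and a single dummy. The helper names and the eight made-up rows are mine and have nothing to do with the article's data.

```python
def gram_matrix(columns):
    """Return X'X for a design matrix supplied as a list of columns."""
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in columns]
            for ci in columns]


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


n = 8                                    # hypothetical number of participants
intercept = [1] * n
empty_dummy = [0] * n                    # a category that nobody falls into
used_dummy = [1, 0, 1, 0, 0, 1, 0, 0]    # a category with three members

det_empty = det2(gram_matrix([intercept, empty_dummy]))
det_used = det2(gram_matrix([intercept, used_dummy]))

print(f"det(X'X) with the empty category: {det_empty}")
print(f"det(X'X) with a populated category: {det_used}")
```

A determinant of zero means that X'X cannot be inverted. Faced with such a column, statistical software would usually drop it or report a missing coefficient, rather than produce a non-zero estimate with a standard error.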

Table 3 from Etindele Sosso et al. (2020), with duplicated values (considering the same variable and category across models) highlighted with a different colour for each set of duplicates. Two pairs are included where the sign changed but the digits remained identical; however, p values that were reported as 0.000 are ignored. To find a duplicate, first identify a cell that is outlined in a particular colour, then look up or down the table for one or more other cells with the same outline colour in the analogous position for one or more other models.

The preprint
It is interesting to compare Etindele Sosso et al.’s (2020) article with a preprint entitled “Insomnia and problematic gaming: A study in 9 low- and middle-income countries” by Faustin Armel Etindele Sosso and Daria J. Kuss (who also appears to be the second author of the published article), which is available here. That preprint reports a longitudinal study, with data collected at multiple time points—presumably four, including baseline, although only “after one months, six months, and 12 months” (p. 8) is mentioned—from a sample of people (initial size 120,460) from nine African countries. This must therefore be an entirely different study from the one reported in the published article, which did not use a longitudinal design and had a prospective sample size of 53,634. Yet, by an astonishing coincidence, the final sample retained for analysis in the preprint consisted of 10,566 participants, which is exactly the same as the published article. The number of men (9,366) and women (1,200) was also identical in the two samples. However, the mean and standard deviation of their ages was different (M=22.33 years, SD=2.0 in the preprint; M=24.0, SD=2.3 in the published article). The number of participants in each of the nine countries (Table 2 of both the preprint and the published article) is also substantially different for each country between the two papers, and with two exceptions—the ISI and the well-known Hospital Anxiety and Depression Scale (HADS)—different measures of symptoms and gaming were used in each case.

Another remarkable coincidence between the preprint and Etindele Sosso et al.’s (2020) published article, given that we are dealing with two distinct samples, occurs in the description of the results obtained from the sample of African gamers on the Insomnia Severity Index. On p. 3 of the published article, in the paragraph describing the respondents’ scores on the ISI, we find: “The internal consistency of the ISI was excellent (Cronbach’s α = 0.92), and each individual item showed adequate discriminative capacity (r = 0.65–0.84). The area under the receiver operator characteristic curve was 0.87 and suggested that a cut-off score of 14 was optimal (82.4% sensitivity, 82.1% specificity, and 82.2% agreement) for detecting clinical insomnia”. These two sentences are identical, in every word and number, to the equivalent sentences on p. 5 of the preprint.

Naturally enough, because the preprint and Etindele Sosso et al.’s (2020) published article describe entirely different studies with different designs, and different sample sizes in each country, there is little in common between the Results sections of the two papers. The results in the preprint are based on repeated-measures analyses and include some interesting full-colour figures (the depiction of correlations in Figure 1, on p. 10, is particularly visually attractive), whereas the results of the published article consist mostly of a fairly straightforward summary, in sentences, of the results from the tables, which describe the outputs of linear regressions.

Figure 1 from the preprint by Etindele Sosso and Kuss (2018, p. 10). This appears to use an innovative technique to illustrate the correlation between two variables.

However, approximately 80% of the sentences in the Introduction of the published article, and 50% of the sentences in the Discussion section, appear (with only a few cosmetic changes) in the preprint. This is interesting, not only because it would be quite unusual for a preprint of one study to be repurposed to describe an entirely different one, but also because it suggests that the addition of 14 authors between the publication of the preprint and the Etindele Sosso et al. (2020) article resulted in the addition of only about 1,000 words to these two parts of the manuscript.
The Introduction section of the Etindele and Kuss (2018) preprint (left) and the Etindele et al. (2020) published article (right). Sentences highlighted in yellow are common to both papers.

The Discussion section of the Etindele and Kuss (2018) preprint (left) and the Etindele et al. (2020) article (right). Sentences highlighted in yellow are common to both papers.

Another (apparently unrelated) preprint contains the same insomnia results
It is also perhaps worth noting that the summary of the participants’ results on the ISI measure—which, as we saw above, was identical in every word and number between the preprint and Etindele Sosso et al. (2020)’s published article—also appears, again identical in every word and number, on pp. 5–6 of a 2019 preprint by the lead author, entitled “Insomnia, excessive daytime sleepiness, anxiety, depression and socioeconomic status among customer service employees in Canada”, which is available here [PDF]. This second preprint describes a study of yet another different sample, namely 1,200 Canadian customer service workers. If this is not just another remarkable coincidence, it would suggest that the author may have discovered some fundamental invariant property of humans with regard to insomnia. If so, one would hope that both preprints could be peer reviewed most expeditiously, to bring this important discovery to the wider attention of the scientific community.

Other reporting issues from the same laboratory
The lead author of the Etindele Sosso et al. (2020) article has published even more studies with substantial numbers of participants. Here are two such articles, which have 41 and 35 citations, respectively, according to Google Scholar:

Etindele Sosso, F. A., & Raouafi, S. (2016). Brain disorders: Correlation between cognitive impairment and complex combination. Mental Health in Family Medicine, 12, 215–222. https://doi.org/10.25149/1756-8358.1202010
Etindele Sosso, F. A. (2017a). Neurocognitive game between risk factors, sleep and suicidal behaviour. Sleep Science, 10(1), 41–46. https://doi.org/10.5935/1984-0063.20170007

In the 2016 article, 1,344 respondents were assessed for cognitive deficiencies; 71.7% of the participants were aged 18–24, 76.2% were women, and 62% were undergraduates. (These figures all match those that were reported in the lead author’s Master’s thesis, so we might tentatively assume that this study used the same sample.) In the 2017 article, 1,545 respondents were asked about suicidal tendencies, with 78% being aged 18–24, 64.3% women, and 71% undergraduates. Although these are clearly entirely different samples in every respect, the tables of results of the two studies are remarkably similar. Every variable label is identical across all three tables, which might not be problematic in itself if similar predictors were used for all of the different outcome variables. More concerning, however, is the fact that of the 120 cells in Tables 1 and 2 that contain statistics (mean/SD combinations, p values other than .000, regression coefficients, standard errors, and confidence intervals), 58—that is, almost half—are identical in every digit. Furthermore, the entirety of Table 3—which shows the results of the logistic regressions, ostensibly predicting completely different outcomes in completely different samples—is identical across the two articles (52 out of 52 numbers). One of the odds ratios in Table 3 has the value 1133096220.169 (again, in both articles). There does not appear to be an obvious explanation for how this duplication could have arisen as the result of a natural process.

Left: The tables of results from Etindele Sosso and Raouafi (2016). Right: The tables of results from Etindele Sosso (2017a). Cells highlighted in yellow are identical (same variable name, identical numbers) in both articles.

The mouse studies
Further evidence that this laboratory may have, at the very least, a suboptimal approach to quality control when it comes to the preparation of manuscripts comes from the following pair of articles, in which the lead author of Etindele Sosso et al. (2020) reported the results of some psychophysiological experiments conducted on mice:

Etindele Sosso, F. A. (2017b). Visual dot interaction with short-term memory. Neurodegenerative Disease Management, 7(3), 182–190. https://doi.org/10.2217/nmt-2017-0012
Etindele Sosso, F. A., Hito, M. G., & Bern, S. S. (2017). Basic activity of neurons in the dark during somnolence induced by anesthesia. Journal of Neurology and Neuroscience, 8(4), 203–207. https://doi.org/10.21767/2171-6625.1000203 [1]

In each of these two articles (which have 28 and 24 Google Scholar citations, respectively), the neuronal activity of mice when exposed to visual stimuli under various conditions was examined. Figure 5 of the first article shows the difference between the firing rates of the neurons of a sample of an unknown number of mice (which could be as low as 1; I was unable to determine the sample size with any great level of certainty by reading the text) in response to visual stimuli that were shown in different orientations. In contrast, Figure 3 of the second article represents the firing rates of two different types of brain cell (interneurons and pyramidal cells) before and after a stimulus was applied. That is, these two figures represent completely different variables in completely different experimental conditions. And yet, give or take the use of dots of different shapes and colours, they appear to be exactly identical. Again, it is not clear how this could have happened by chance.

Top: Figure 5 from Etindele Sosso (2017b). Bottom: Figure 3 from Etindele Sosso et al. (2017). The dot positions and axis labels appear to be identical. Thanks are due to Elisabeth Bik for providing a second pair of eyes.

I find it slightly surprising that 16 authors—all of whom, we must assume because of their formal statements to this effect in the “Author contributions” section, made substantial contributions to the Etindele et al. (2020) article in order to comply with the demanding authorship guidelines of Nature Research journals (specified here)—apparently failed to notice that this work contained quite so many inconsistencies. It would also be interesting to know what the reviewers and action editor had to say about the manuscript prior to its publication. The time between submission and acceptance was 85 days (including the end of year holiday period), which does not suggest that a particularly extensive revision process took place. In any case, it seems that some sort of corrective action may be required for this article, in view of the importance of the subject matter for public policy.

Supporting files
I have made the following supporting files available here:
- Etindele-et-al-Table3-numbers.xls: An Excel file containing the numbers from Table 3 of Etindele et al.’s (2020) article, with some calculations that illustrate the deficiencies in the relations between the statistics that I mentioned earlier. The basic numbers were extracted by performing a copy/paste from the article’s PDF file and using text editor macro commands to clean up the structure.
- “(Annotated) Etindele Sosso, Raouafi - 2016 - Brain Disorders - Correlation between Cognitive Impairment and Complex Combination.pdf” and “(Annotated) Etindele Sosso - 2017 - Neurocognitive Game between Risk Factors, Sleep and Suicidal Behaviour.pdf”: Annotated versions of the 2016 and 2017 articles mentioned earlier, with identical results in the tables highlighted.
- “(Annotated) Etindele Sosso, Kuss - 2018 (preprint) - Insomnia and problematic gaming - A study in 9 low- and middle-income countries.pdf” and “(Annotated) Etindele Sosso et al. - 2020 - Insomnia, sleepiness, anxiety and depression among different types of gamers in African countries.pdf”: Annotated versions of the 2018 preprint and the published Etindele et al. (2020) article, with overlapping text highlighted.
- Etindele-2016-vs-2017.png, Etindele-et-al-Table3-duplicates.png, Etindele-mouse-neurons.png, Etindele Sosso-Kuss-Preprint-Figure1.png, Preprint-article-discussion-side-by-side.png, Preprint-article-intro-side-by-side.png: Full-sized versions of the images from this blog post.

Brown, N. J. L., & Heathers, J. A. J. (2019). Rounded Input Variables, Exact Test Statistics (RIVETS): A technique for detecting hand-calculated results in published research. PsyArXiv Preprints. https://doi.org/10.31234/osf.io/ctu9z

[[ Update 2020-04-21 13:14 UTC: Via Twitter, I have learned that I am not the first person to have publicly questioned the Etindele et al. (2020) article. See Platinum Paragon's blog post from 2020-04-17 here. ]]

[[ Update 2020-04-22 13:43 UTC: Elisabeth Bik has identified two more articles by the same lead author that share an image (same chart, different meaning). See this Twitter thread. ]]

[[ Update 2020-04-23 22:48 UTC: See my related blog post here, including discussion of a partial data set that appears to correspond to the Etindele et al. (2020) article. ]]

[[ Update 2020-06-04 11:50 UTC: I blogged about the reaction (or otherwise) of university research integrity departments to my complaint about the authors of the Etindele Sosso et al. article here. ]]

[[ Update 2020-06-04 11:55 UTC: The Etindele Sosso et al. article has been retracted. The retraction notice can be found here. ]]

[1] This article was accepted 12 days after submission, which is presumably entirely unrelated to the fact that the lead author is listed here as the journal’s specialist editor for Neuropsychology and Cognition.

19 April 2020

In psychology everything mediates everything

In the past couple of years I have reviewed half a dozen manuscripts with abstracts that go something like this:

<Construct X> is known to be associated with higher levels of well-being and healthy psychological functioning, as indexed by <Construct Y>. However, to date, no study has investigated the role of <Construct M> in this association. The present study bridges this gap by testing a mediation path model in a sample of undergraduates (N = 100). As predicted, M fully mediated the positive association between X and Y.  These results suggest that X predicts higher levels of M, which subsequently predicts higher levels of Y. These results provide new insight that may advance a coherent theoretical framework on the pathways by which M enhances psychological well-being.

There is typically a description of how the 100 participants completed measures of constructs X, M, and Y, with a table of correlations that might look like this:

     X       Y
Y   .24*
M   .52***  .32**

* p < .05; ** p < .01; *** p < .001.

Then we get to the mediation analysis. More often than not this is done using the PROCESS macro in SPSS, but it can also be done “by hand” using a few ordinary least-squares regressions. Here are the steps required (cf. Baron & Kenny, 1986):

  1. Show that X is a significant predictor of Y. You probably don’t actually need to do the regression for this, as the standardized regression coefficient and its associated p value will be identical to the correlation coefficient between X and Y, but sometimes the manuscript will show the SPSS output to prove that the authors conducted this regression anyway. (In the last manuscript that I reviewed, the authors performed the single-predictor regression and managed to obtain a standardized regression coefficient that was different to the zero-order correlation, which did not enhance my confidence in the rest of their analyses.) Here the p value will be .016.
  2. Show that X is a significant predictor of M. Again, no regression is required for this, as it’s just the correlation coefficient. The p value in this example is about 3E−8.
  3. Regress Y on both X and M. If you get a significant regression coefficient for M then you have at least a “partial” mediation effect. If, in addition, the regression coefficient for X is non-significant then you have “full” mediation. Here, this produces the following standardized coefficients:
    • M: β = 0.268, p = .018
    • X: β = 0.101, p = .368
Ta-da! In this example, we have complete mediation: The p value for the mediator, M, is significant and the p value for X isn’t. We conclude that Construct M fully mediates the relation between Construct X and Construct Y. We write it up and celebrate our fine contribution to understanding the mechanisms that lead to well-being. Surely the end of mental distress is only one more grant away.
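
Steps 1 and 2 can be checked without running any regression at all. Here is a minimal Python sketch, assuming N = 100 and the correlations from the table above. The helper names are hypothetical, and the tail of the t distribution is integrated numerically so that only the standard library is needed. It is an illustration, not the code that was used for this post.

```python
import math

def t_two_sided_p(t, df, steps=20000):
    """Two-sided p value for Student's t, via midpoint integration of the upper tail."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    t = abs(t)
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) / steps        # midpoint in (0, 1)
        u = t + s / (1 - s)          # maps (0, 1) onto (t, infinity)
        log_pdf = log_c - (df + 1) / 2 * math.log1p(u * u / df)
        total += math.exp(log_pdf) / (1 - s) ** 2   # Jacobian of the substitution
    return 2 * total / steps

def correlation_p(r, n):
    """t statistic and two-sided p value for a Pearson correlation r with sample size n."""
    df = n - 2
    t = r * math.sqrt(df / (1 - r * r))
    return t, t_two_sided_p(t, df)

n = 100
for label, r in [("Step 1, X with Y", 0.24), ("Step 2, X with M", 0.52)]:
    t, p = correlation_p(r, n)
    print(f"{label}: r = {r:.2f}, t({n - 2}) = {t:.2f}, p = {p:.2g}")
```

The two p values should come out close to the .016 and 3E−8 quoted in the steps above.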

The problem is this: Absolutely any other variable that you might put in place of M, and which is correlated in the same way with X and Y, will also show exactly the same mediation effect. And there is no shortage of things you can measure—in psychology, at least—that are correlated at around .5 and .3 with two other variables, themselves intercorrelated at around .2, that you might have measured. Let’s say that X is some aspect of socioeconomic status and Y is subjective well-being. You can easily come up with any number of ideas for M: gratitude, optimism, self-esteem, all of the Big Five personality traits (if you reverse-score neuroticism as emotional stability), etc., without even needing to resort to Lykken and Meehl’s “crud factor” (“in psychology and sociology everything correlates with everything”; Meehl, 1990, p. 204). Does it make sense for multiple third variables all to apparently fully mediate the relation between a predictor and an outcome variable?

I wrote some R code, which you can find here, to demonstrate the example that I gave above. You will see that I performed the calculations in two ways. The first was to generate (with a bit of trial and error) some random data with the correct correlations. (This produces a bit of rounding error, so the p value for the beta for M in the regression is reported at .019, not .018.) The second—my preferred method, since you can generally use this starting with the table of descriptives that appears in an article—is to start with the correlations and perform the regression calculations from there. (A surprising number of people do not seem to know that you can generally determine the standardized coefficients of multiple regression models just from the correlation table. The standard errors—and, hence, the p values—can then be derived from the sample size. If you have the standard deviations as well, you can get the unstandardized coefficients. Add in the means and you can calculate the intercepts too. Again, this can all be done from the descriptive statistics, which is probably why the complete table of descriptives and correlations used to be standard in every paper. You don't need the raw data for any of this.)
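
For readers who would rather not open the R file, the second method can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a translation of that R code: the function names are hypothetical, the standard error uses N − 3 residual degrees of freedom (two predictors plus an intercept), and the t tail is again integrated numerically to avoid any dependency outside the standard library.

```python
import math

def t_two_sided_p(t, df, steps=20000):
    """Two-sided p value for Student's t, via midpoint integration of the upper tail."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    t = abs(t)
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) / steps
        u = t + s / (1 - s)          # maps (0, 1) onto (t, infinity)
        log_pdf = log_c - (df + 1) / 2 * math.log1p(u * u / df)
        total += math.exp(log_pdf) / (1 - s) ** 2
    return 2 * total / steps

def regress_y_on_x_and_m(r_xy, r_my, r_xm, n):
    """Standardized betas and p values for Y ~ X + M from three correlations and N."""
    denom = 1 - r_xm ** 2
    beta_x = (r_xy - r_my * r_xm) / denom
    beta_m = (r_my - r_xy * r_xm) / denom
    r_squared = beta_x * r_xy + beta_m * r_my
    df = n - 3                                        # residual degrees of freedom
    se = math.sqrt((1 - r_squared) / (df * denom))    # identical for both betas
    return {name: (beta, t_two_sided_p(beta / se, df))
            for name, beta in (("M", beta_m), ("X", beta_x))}

result = regress_y_on_x_and_m(r_xy=0.24, r_my=0.32, r_xm=0.52, n=100)
for name, (beta, p) in result.items():
    print(f"{name}: beta = {beta:.3f}, p = {p:.3f}")
```

The betas should match the 0.268 and 0.101 given earlier. The p values should land within a few thousandths of those quoted, with M below .05 and X well above it.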

If your initial choice for variable X is more strongly correlated with Y than M is, then you can very often just swap X and M around, because there is typically nothing to say that whatever X is measuring occurs “before” whatever M is measuring, or vice versa, especially if you just hauled a bunch of undergraduates in and gave them measures of their current levels of X, M, and Y to complete. The reason why you want your mediator, M, to be more strongly correlated than X with the outcome (Y) is a little-known phenomenon of two-variable regression that I like to think of as a sort of “Matthew effect”. Feel free to skip the next paragraph in which I explain this in tedious detail.

When the two predictors are moderately strongly correlated with each other (.52, in our case), then although their zero-order correlations with the outcome variable might be quite close together (.32 and .24 here), their standardized regression coefficients will diverge by quite a bit more than their respective correlation coefficients. Here, M’s correlation of .32 led to a beta of 0.268, which is a 16% reduction, but X’s correlation of .24 was reduced by 58% to a beta of 0.101. If the correlation between M and X had been a little higher (e.g., .60 instead of .52), the beta for M would actually have been larger (0.275) and the beta for X would have been even smaller (0.075). At some point along the M–X correlation continuum (at .24/.32 = .75 with these rounded correlations), the beta for M would be exactly equal to the correlation coefficient of M with Y (as if X wasn't in the regression model at all), and the beta for X would be zero. Continuing even further, we would hit “negative suppression” territory, with M’s standardized regression coefficient being greater than the original correlation coefficient of .32, and X’s standardized regression coefficient being negative. Many people seem to have a rather naïve view of multiple regression in which the addition of a new predictor results in the betas for all of the predictors being reduced in some roughly equal proportion, but the reality is often nothing like that. You can explore what happens with just two predictors (with more, things get even wilder) here using my Shiny app.
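
The divergence described in the previous paragraph is easy to reproduce. The following Python sketch holds the two correlations with Y fixed at .24 and .32 and varies only the correlation between X and M. The function name and the extra values of 0 and .85 are illustrative choices, not taken from the Shiny app.

```python
def standardized_betas(r_xy, r_my, r_xm):
    """Standardized coefficients (beta_X, beta_M) for Y regressed on X and M."""
    denom = 1 - r_xm ** 2
    beta_x = (r_xy - r_my * r_xm) / denom
    beta_m = (r_my - r_xy * r_xm) / denom
    return beta_x, beta_m

r_xy, r_my = 0.24, 0.32   # zero-order correlations of X and M with Y
for r_xm in (0.0, 0.52, 0.60, 0.75, 0.85):
    beta_x, beta_m = standardized_betas(r_xy, r_my, r_xm)
    print(f"r_XM = {r_xm:.2f}: beta_M = {beta_m:+.3f}, beta_X = {beta_x:+.3f}")
```

With uncorrelated predictors the betas simply equal the correlations. As r_XM rises the beta for X shrinks towards zero, and past .75 it turns negative while the beta for M exceeds .32.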

So it’s possible to build an almost infinite number of mediation studies, all of which will appear to tell us something about the mediation of the relation between two psychological variables by a third, although almost all of them are just illustrating a known phenomenon of multiple regression. Again, everything is determined by the three correlations between the variables, plus the sample size if you care about statistical significance. (Alert readers will have noticed that whether mediation is “full” or “partial” will depend to a large extent on the sample size; with enough participants, even the small residual effect of X on Y will have a p value below .05, so that “full” mediation becomes “partial”. But of course, alert readers will also know that these days statistical significance doesn’t mean very much on its own, right?)
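
To see how much the “full” versus “partial” verdict depends on sample size, here is a small Python sketch that keeps the three correlations fixed and changes only N. It assumes a large-sample normal approximation for the p value (via `statistics.NormalDist`), which is slightly optimistic at N = 100 but good enough to show the trend. The function name is hypothetical.

```python
import math
from statistics import NormalDist

R_XY, R_MY, R_XM = 0.24, 0.32, 0.52   # correlations from the running example

def p_for_x(n):
    """Approximate two-sided p value for the beta of X in Y ~ X + M at sample size n."""
    denom = 1 - R_XM ** 2
    beta_x = (R_XY - R_MY * R_XM) / denom
    beta_m = (R_MY - R_XY * R_XM) / denom
    r_squared = beta_x * R_XY + beta_m * R_MY
    se = math.sqrt((1 - r_squared) / ((n - 3) * denom))
    z = beta_x / se
    return 2 * (1 - NormalDist().cdf(abs(z)))   # normal approximation, assumed adequate

for n in (100, 200, 400, 800, 1600):
    p = p_for_x(n)
    # Verdict looks only at X's coefficient; M is assumed to be significant throughout.
    verdict = "partial" if p < 0.05 else "full"
    print(f"N = {n:4d}: p for X = {p:.3f} -> '{verdict}' mediation")

# Smallest N at which the residual effect of X crosses the .05 line
threshold = next(n for n in range(10, 10_000) if p_for_x(n) < 0.05)
print("X becomes significant at roughly N =", threshold)
```

Under these assumptions the beta for X stays at about 0.10 throughout; only its standard error changes, and the switch from “full” to “partial” happens somewhere in the mid-400s.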

Now, am I saying that all of the mediation articles that I get to review are based on an atheoretical “throw some numbers at the wall and see what sticks” approach, which might be an implication of what I have argued here? Well, no... but I’m also not saying that that never happens. I have heard first-hand from grad students, in several cases, what happens when they have a bunch of variables and no obvious result: Their supervisor suggests that they write them up as a mediation analysis.

I don’t think that preregistration will necessarily help all that much here, because it is quite predictable from previous knowledge that X, M, and Y will have the pattern of correlations needed to produce an apparent mediation effect. I’m going to suggest that the only solution is to refrain from doing this kind of mediation analysis altogether in the absence of (a) much better theoretical justification than we currently see, and (b) some kind of constraint on the temporal order in which changes in X, M, and Y occur. Without a demonstration that the causal arrows are running from X to M and M to Y (MacKinnon & Pirlott, 2014), and not vice versa, we have no way of knowing whether we are dealing with mediation or confounding, especially since in many cases the constructs X, M, and Y may themselves be caused by multiple other factors, and so ad infinitum (cf. Arah, 2008). In the absence of experimental manipulation, causality is hard to demonstrate, especially in psychology. 

Arah, O. A. (2008). The role of causal reasoning in understanding Simpson's paradox, Lord's paradox, and the suppression effect: Covariate selection in the analysis of observational studies. Emerging Themes in Epidemiology, 5, 5. https://doi.org/10.1186/1742-7622-5-5
Baron, R. M., & Kenny, D. A. (1986). The moderator–mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51(6), 1173–1182. https://doi.org/10.1037/0022-3514.51.6.1173
MacKinnon, D. P., & Pirlott, A. G. (2014). Statistical approaches for enhancing causal interpretation of the M to Y relation in mediation analysis. Personality and Social Psychology Review, 19(1), 30–43. https://doi.org/10.1177/1088868314542878
Meehl, P. E. (1990). Why summaries of research on psychological theories are often uninterpretable. Psychological Reports, 66(1), 195–244. https://doi.org/10.2466/pr0.1990.66.1.195

(Thanks to Julia Rohrer for her helpful comments on an earlier draft of this post. If the whole thing is garbage, it's probably because I didn't incorporate more of her thoughts.)