19 February 2019

Just another week in real-world science: Butler, Pentoney, and Bong (2017).

This is a joint post by Nick Brown and Stuart Ritchie. All royalty cheques arising from the post will be split between us, as will all the legal bills.

Today's topic is this article:
Butler, H. A., Pentoney, C., & Bong, M. P. (2017). Predicting real-world outcomes: Critical thinking ability is a better predictor of life decisions than intelligence. Thinking Skills and Creativity, 25, 38–46. http://dx.doi.org/10.1016/j.tsc.2017.06.005

We are not aware of any official publicly available copies of this article, but readers with institutional access to Elsevier journals should have no trouble in finding it, and otherwise we believe there may exist other ways to get hold of a copy using the DOI.

Butler et al.'s article received some favourable coverage when it appeared, including in Forbes, Psychology Today, the BPS Digest, and an article by the lead author in Scientific American that was picked up by the blog of the noted skeptic (especially of homeopathy) Edzard Ernst. Its premise is that the ability to think critically (measured by an instrument called the Halpern Critical Thinking Assessment, HCTA) is a better predictor than IQ (measured with a set of tests called the Intelligence Structure Battery, or INSBAT) of making life decisions that lead to negative outcomes, measured by the Real-World Outcomes (RWO) Inventory, which was described by its creator in a previous article (Butler, 2012).

In theory, we’d expect both critical thinking and IQ to act favourably to reduce negative experiences. The correlations between both predictors and the outcome in this study would thus be expected to be negative, and indeed they were. For critical thinking the correlation was −.330 and for IQ it was −.264. But is this a "significant" difference?

To test this, Butler et al. conducted a hierarchical regression, entering IQ (INSBAT) and then critical thinking (HCTA) as predictors. They concluded that, since the difference in R² when the second predictor (HCTA) was added was statistically significant, this indicated that the difference between the correlations of each predictors with the outcome (the correlation for HCTA being the larger) was also significant. But this is a mistake. On its own, the fact that the addition of a second predictor variable to a model causes a substantial increase in R² might tell us that both variables add incrementally to the prediction of the outcome, but it tells us nothing about the relative strength of the correlations between the two predictors and the outcome. This is because the change in R² is also dependent on the correlation between the two predictors (here, .380). The usual way to compare the strength of two correlations, taking into account the third variable, is to use Steiger’s z, as shown by the following R code:

> library(cocor)
> cocor.dep.groups.overlap(-.264, -.330, .380, 244, "steiger1980", alt="t")
<some lines of output omitted for brevity>
 z = 0.9789, p-value = 0.3276

So the Steiger’s z test tells us that there’s no statistically significant difference between the sizes of these two (dependent) correlations in this sample, p = .328.

We noted a second problem, namely that the reported bivariate correlations are not compatible with the results of the regression reported in Table 2. In a multiple regression model, the standardized regression coefficients are determined (only) by the pattern of correlations between the variables, and in the case of the two-predictor regression, these coefficients can be determined by a simple formula. Using that formula, we calculated that the coefficients for INSBAT and HCTA in model 2 should be −.162 and −.268, respectively, whereas Butler et al.’s Table 2 reports them as −.158 and −.323. When we wrote to Dr. Butler in July 2017 to point out these issues, she was unable to provide us with the data set, but she did send us an SPSS output file in which neither the correlations nor the regression coefficients exactly matched the values reported in the article.

There was a very minor third problem: The coefficient of .264 in the first cell of Table 2 is missing its minus sign. (Dr. Butler also noticed that there was an issue with the significance stars in this table.)

We wrote to the two joint editors-in-chief of Thinking Skills and Creativity in November 2017. They immediately indicated that they would handle the points that we had raised with the "journal management team" (i.e., Elsevier). We found this rather surprising, as we had only raised scientific issues that we imagined would be entirely an editorial matter. Over the following year we occasionally sent out messages asking if any progress had been made. In November 2018, we were told by the Elsevier representative that following a review of the Butler et al. article by two independent reviewers who are "senior statistical experts in this field", the journal had decided to issue a correction for... the missing minus sign in Table 2. And nothing else.

We were, to say the least, somewhat disappointed by this. We wrote to ask for a copy of the report by these senior statistical experts, but received no reply (and, after more than three months, we guess we aren't going to get one). Perhaps the experts disagree with us about the relevance of Steiger's z, but the inconsistencies between the correlations and the regression coefficients are a matter of simple mathematics and the evidence of numerical discrepancies between the authors' own SPSS output and the published article is indisputable.

So apparently Butler et al.'s result will stand, and another minor urban legend with no empirical support will be added to the folklore of "forget IQ, you just have to work hard (and I can show you how for only $499)" coaches. Of course, both of us are in favour of critical thinking. We just wish that people involved in publishing research about it were as well.

We had been planning to wait for the correction to be issued before we wrote this post, but as far as we can tell it still hasn't appeared (well over a year since we originally contacted the editors, and 19 months since we first contacted the authors). Some recent events make us believe that now would be an appropriate moment to bring this matter to public attention. Most important among these are the two new papers from Ben Goldacre and his team, showing what (a) editors and (b) researchers did when problems were pointed out in medical trial study protocols (spoiler: very often, not much). Then the inimitable James Heathers tweeted this thread expressing some of the frustrations that he (sometimes abetted by Nick) has had when trying to get editors to fix problems. And last week we also saw the case of a publisher taking a ridiculous amount of time to retract an article that was published in one of their journals published after it had been stolen, accompanied by an editorial note of the "move along, nothing to see here" variety.

There seems to be a real problem with academic editors, especially those at the journals of certain publishers, being reluctant, unwilling, or unable to take action on even the simplest problems without the approval of the publisher, whose evaluation of the situation may be based as much on the need to save face as to correct the scientific record.

A final anecdote: One of us (Nick) has been told of a case where the editor would like to retract at least two fraudulent articles but is waiting for the publisher (not Elsevier, in that case) to determine whether the damage to their reputation caused by retracting would be greater than that caused by not retracting. Is this really the kind of consideration to which we want the scientific literature held hostage?


Butler, H. A. (2012). Halpern critical thinking assessment predicts real-world outcomes of critical thinking. Applied Cognitive Psychology, 26, 721–729. http://dx.doi.org/10.1002/acp.2851