18 October 2018

Just another week in real-world science: de Venter et al. (2017)

(Note: This post, as with all my blog posts, represents only my own opinions, and not those of any organizations with which I am affiliated, or anyone who works for those organizations.)

Someone sent me a link to this article and asked what I thought of it.

De Venter, M., Illegems, J., Van Royen, R., Moorkens, G., Sabbe, B. G. C., & Van Den Eede, F. (2017). Differential effects of childhood trauma subtypes on fatigue and physical functioning in chronic fatigue syndrome. Comprehensive Psychiatry, 78, 76–82. http://dx.doi.org/10.1016/j.comppsych.2017.07.006

The article describes an investigation into possible relations between various negative childhood events (as measured by the Traumatic Experiences Checklist [TEC]) and impaired functioning (fatigue, as measured by the Checklist Individual Strength [CIS] scale, and general health and well-being, as measured by the well-known SF-36 scale). The authors' conclusions, from the abstract, were fairly unequivocal: "... sexual harassment emerged as the most important predictor of fatigue and poor physical functioning in the CFS patients assessed. These findings have to be taken into account [emphasis added] in further clinical research and in the assessment and treatment of individuals coping with chronic fatigue syndrome." In other words, as of the publication of this article, the authors believe that taking past sexual harassment into account should be an integral part of the assessment and treatment of people with the condition widely known as Chronic Fatigue Syndrome. (I will use that term for simplicity, although I appreciate that some people prefer alternatives.)

The main results are in Table 3, which I have reproduced here (I hope that Elsevier's legal department will agree that this counts as fair use):

Table 3 from de Venter et al., 2017. Red lines added by me. Bold highlighting of the p values below .05 by the authors.

The article is quite short, with the Results section focusing on the two standardized (*) partial regression coefficients (henceforth, "betas") that I've highlighted in red here, which have associated p values below .05, and the Discussion section focusing on the implications of these.

There are a couple of problems here.

First, there are five regression coefficients for each dependent variable, of which just one per DV has a p value below .05 (**). It's not clear to me why, in theoretical terms, childhood experiences of sexual harassment (but not emotional neglect, emotional abuse, bodily threat, or sexual abuse) should be a good predictor of fatigue (CIS) or general physical functioning (SF-36). The authors define sexual harassment as "being submitted to sexual acts without physical contact" and sexual abuse as "having undergone sexual acts involving physical contact". I'm not remotely qualified in these matters, but it seems to me that with these definitions, "sexual abuse" would probably be expected to lead to more problems in later functioning than "sexual harassment". Indeed, I find it difficult to imagine how one could be subjected to "abuse with contact" while not also being subjected to "abuse without contact", more or less by the nature of "sexual acts". (I apologise if this whole topic makes you feel uneasy. It certainly makes me feel that way.)

It seems unlikely that the specific hypothesis that "sexual harassment (but not emotional neglect, emotional abuse, bodily threat, or sexual abuse) will be a significant predictor of fatigue and [impaired] general functioning among CFS patients" was made a priori. And indeed, the authors tell us in the last sentence of the introduction that there were no specific a priori hypotheses: "Thus, in the present study, we examine the differential impact of subtypes of self-reported early childhood trauma on fatigue and physical functioning levels in a well-described population of CFS patients" (p. 77). In other words, they set out to collect some data, run some regressions, and see what emerged. Now, this can be perfectly fine, but it's exploratory research. The conclusions you can draw are limited, and the interpretation of p values is unclear (de Groot, 1956). Any results you find need to be converted into specific hypotheses and given a severe test (à la Popper) with new data.

The second problem is a bit more subtle, but it illustrates the danger of running complex multiple regressions, and especially of reporting only the regression coefficients of the final model. For example, there is no measure of the total variance explained by the model (R^2), or of the increase in R^2 from the model with just the covariates to the model where the variable of interest is added. (Note that only 29 of the 155 participants reported any experience of childhood sexual harassment at all. You might wonder how much of the total variance in a sample can be explained by a variable for which more than 80% of the participants had the same score, namely zero.) All we have is statistically significant betas, which don't tell us a lot, especially given the following problem.
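(As a brief aside, here is a minimal sketch in Python of the kind of R^2 comparison I would have liked to see. It assumes NumPy is available. The correlation matrix is entirely invented for illustration and has nothing to do with de Venter et al.'s data, and `r_squared` is just a name I made up for the helper. It relies on the standard result that, for standardized variables, R^2 can be computed from the correlations alone.)

```python
import numpy as np

def r_squared(R, predictors, outcome):
    """R^2 of an OLS regression of `outcome` on `predictors`,
    computed from a correlation matrix R (indices are positions in R)."""
    R = np.asarray(R, dtype=float)
    Rxx = R[np.ix_(predictors, predictors)]   # correlations among the predictors
    rxy = R[predictors, outcome]              # predictor-outcome correlations
    return float(rxy @ np.linalg.solve(Rxx, rxy))

# Hypothetical correlations. Variable order: x1 (variable of interest),
# x2 (covariate), y (outcome).
R = [[1.0, 0.8, 0.5],
     [0.8, 1.0, 0.1],
     [0.5, 0.1, 1.0]]

r2_covariates = r_squared(R, [1], 2)      # covariate only
r2_full = r_squared(R, [0, 1], 2)         # covariate plus variable of interest
delta_r2 = r2_full - r2_covariates        # increase in R^2

print(round(r2_covariates, 3), round(r2_full, 3), round(delta_r2, 3))
```

Reporting these three numbers alongside the betas would let the reader judge how much the variable of interest adds over and above the covariates.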

Take a look at the betas for sexual harassment. You will see that they are both greater (in magnitude) than 1.0. That is, the beta for sexual harassment in each of the regressions in de Venter et al.'s article must be considerably larger in magnitude than the zero-order correlation between sexual harassment and the outcome variable (CIS or SF-36), whose magnitude is of course bounded between 0 and 1. For SF-36, for example, even if the original correlation was -.90, the corresponding beta is twice as large. If you have trouble thinking about what a beta coefficient greater than 1.0 might mean, I recommend this excellent blog post by David Disabato.

(A quick poll of some colleagues revealed that quite a few researchers are not aware that a beta coefficient above 1.0 is even "a thing". Such values tend to arise only when there is substantial correlation between the predictors. For two predictors there are analytically complete solutions to predict the exact circumstances under which this will happen --- e.g., Deegan, 1978 --- but beyond that you need a numerical solution for all but the most trivial cases. The rules of matrix algebra, which govern how multiple regression works, are deterministic, but their effects are often difficult to predict from a simple inspection of the correlation table.)
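(To make the two-predictor case concrete, here is a minimal sketch in Python. It uses the standard textbook formula for the standardized coefficients when there are exactly two predictors. The three correlations are invented for illustration, chosen so that they form a valid correlation matrix, and are not taken from the article. The function name is my own.)

```python
def beta_two_predictors(r_y1, r_y2, r_12):
    """Standardized coefficients for y regressed on x1 and x2.

    r_y1, r_y2: zero-order correlations of each predictor with y.
    r_12: correlation between the two predictors.
    """
    denom = 1.0 - r_12 ** 2
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    return beta1, beta2

# Hypothetical values: x1 correlates .50 with y, x2 only .10,
# but the two predictors correlate .80 with each other.
beta1, beta2 = beta_two_predictors(r_y1=0.50, r_y2=0.10, r_12=0.80)
print(round(beta1, 3), round(beta2, 3))  # 1.167 -0.833
```

In this made-up example, a zero-order correlation of .50 turns into a beta of about 1.17, purely because the two predictors are strongly correlated with each other.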

This doubling of the zero-order coefficient to the beta is a very large difference that is almost certainly explained entirely by substantial correlations between at least some, and possibly all, of the five predictors. If the authors wish to claim otherwise, they have some serious theoretical explaining to do. In particular, they need to show why the true relation between sexual harassment and SF-36 functioning is in fact twice as strong as the (presumably already substantial) zero-order correlation would suggest, and how the addition of the other covariates somehow reveals this otherwise hidden part of the relation. If they cannot do this, then the default explanation --- namely, that this is a statistical artefact as a result of highly correlated predictors --- is by far the most parsimonious.

My best guess is that the zero-order correlation between sexual harassment and the outcome variables is not statistically significant, which brings to mind one of the key points from Simmons, Nelson, and Simonsohn (2011): "If an analysis includes a covariate, authors must report the statistical results of the analysis without the covariate" (p. 1362). I also suspect that we would find that sexual harassment and sexual abuse are rather strongly correlated, as discussed earlier.

I wanted to try and reproduce the authors' analyses, to understand which predictors were causing the inflation in the beta for sexual harassment. The best way to do this would be if the entire data set had been made public somewhere, but this is research on people with a controversial condition, so it's not necessarily a problem that the data are not just sitting on OSF waiting for anyone to download them. All I needed were the seven variables that went into the regressions in Table 3, so there is no question of requesting any personally-identifiable information. In fact, all I really needed was the correlations between these variables, because I could work out the regression coefficients from them.

(Another aside: Many people don't know that with just the table of correlations, you can reproduce the standardized coefficients of any OLS regression analysis. With the SDs as well you can get the unstandardized coefficients, and with the means you can also derive the intercepts. For more information and R code, see this post, knowing that the correlation matrix is, in effect, the covariance matrix for standardized variables.)
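(A minimal sketch of that calculation in Python with NumPy, for readers who prefer it to R. The function name, the convention that the outcome is the last variable, and all of the numbers are my own assumptions for illustration. None of them come from de Venter et al.'s data.)

```python
import numpy as np

def regression_from_summary(R, sds, means):
    """OLS regression of the LAST variable on all the others,
    using only the correlation matrix, SDs, and means."""
    R = np.asarray(R, dtype=float)
    sds = np.asarray(sds, dtype=float)
    means = np.asarray(means, dtype=float)

    Rxx = R[:-1, :-1]                  # correlations among the predictors
    rxy = R[:-1, -1]                   # correlations of predictors with outcome

    beta = np.linalg.solve(Rxx, rxy)   # standardized coefficients
    b = beta * sds[-1] / sds[:-1]      # unstandardized coefficients
    intercept = means[-1] - b @ means[:-1]
    return beta, b, intercept

# Hypothetical summary statistics. Variable order: x1, x2, y.
R = [[1.0, 0.8, 0.5],
     [0.8, 1.0, 0.1],
     [0.5, 0.1, 1.0]]
sds = [2.0, 3.0, 20.0]
means = [1.0, 2.0, 50.0]

beta, b, intercept = regression_from_summary(R, sds, means)
print("standardized:  ", beta)
print("unstandardized:", b)
print("intercept:     ", intercept)
```

With real data, the results should match what you would get from fitting the regression to the raw scores, which is why a table of descriptives is enough to check a published model of this kind.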

So, that was my starting point when I set out to contact the authors. All I needed was seven columns of (entirely anonymous) numbers --- or even just the table of correlations, which arguably ought to have been in the article anyway. But my efforts to obtain the data didn't go very well, as you can see from the e-mail exchange below. (***)

First, I wrote to the corresponding author (Dr Maud de Venter) from my personal Gmail account:

Nick Brown <**********@gmail.com>  3 Oct, 16:50

Dear Dr. de Venter,

I have read with interest your article "Differential effects of childhood trauma subtypes on fatigue and physical functioning in chronic fatigue syndrome", published in Comprehensive Psychiatry.

My attention was caught, in particular, by the very high beta coefficients for the two statistically significant results.  Standardized regression coefficients above 1.0 tend to indicate that suppression effects or severe confounding are occurring, which can result in betas that are far larger in magnitude than the corresponding zero-order correlations. Such effects tend to require a good deal of theoretical explanation if they are not to be considered as likely statistical artefacts.

I wonder if you could supply me with a copy of the data set so that I could examine this question in more detail? I would only need the seven variables mentioned in Table 3, with no demographic information of any kind, so I would hope that there would be no major concerns about confidentiality. I can read most formats, including SPSS .SAV files and CSV. Alternatively, a simple correlation matrix of these seven variables, together with their means and SDs, would also allow me to reproduce the results.

Kind regards,
Nicholas Brown

A couple of days later I received a reply, not from Dr. de Venter, but from Dr. Filip Van Den Eede, who is the last author on the article, asking me to clarify the purpose of my request. I thought I had been fairly clear, but it didn't seem unreasonable to send my request from my university address with my supervisor in copy, even though this work is entirely independent of my studies. So I did that:

Brown, NJL ...  05 October 2018 19:32

Dear Dr. Van Den Eede,

Thank you for your reply.

As I explained in my initial mail to Dr. de Venter, I would like to understand where the very high beta coefficients --- which would appear to be the key factors driving the headline results of the article --- are coming from. Betas of this magnitude (above 1.0) are unusual in the absence of confounding or suppression effects, the presence of which could have consequences for the interpretation of the results.

If I had access either to the raw data, or even just the full table of descriptives (mean, SD, and Pearson correlations), then I believe that I would be better able to identify the source of these high coefficients. I am aware that these data may be sensitive in terms of patient confidentiality, but it seems unlikely that any participant could be identified on the basis of just the seven variables in question.

I would be happy to answer any other questions that you might have, if that would make my purpose clearer.

As you requested, I am putting my supervisors in copy of this mail.

Kind regards,
Nicholas Brown

Within less than 20 minutes, back came a reply from Dr. Van Den Eede indicating that his research team does not share data outside of specific collaborations or for a "clear scientific purpose", an example of which might be a meta-analysis. This sounded to me like a refusal to share the data with me, but that wasn't entirely clear, so I sent a further reply:

Brown, NJL ...  07 October 2018 18:56

Dear Dr. Van Den Eede,

Thank you for your prompt reply.

I would argue that establishing whether or not a published result might be based on a statistical artifact does in fact constitute a "clear scientific purpose", but perhaps we will have to differ on this.

For the avoidance of doubt, might I ask you to formally confirm that you are refusing to share with me both (a) the raw data for these seven variables and (b) their descriptive statistics (mean, SD, and table of intercorrelations)?

Kind regards,
Nick Brown

That was 11 days ago, and I haven't received a reply since then, despite Dr. Van Den Eede's commendable speed in responding up to that point. I guess I'm not going to get one.

In case you're wondering about the journal's data sharing policy, here it is. It does not impose what I would call especially draconian requirements on authors, so I don't think writing to the editor is going to help much here.

Where does this leave us? Well, it seems to me that this research team is declining to share some inherently anonymous data, or even just the table of correlations for those data, with a bona fide researcher (at least, I think I am!), who has offered reasonable preliminary evidence that there might be a problem with one of their published articles.

I'm not sure that this is an optimal way to conduct robust research into serious problems in people's lives.

[[ Begin update 2018-11-03 18:00 UTC ]]
I wrote to the editor of the journal that published the de Venter et al. article, setting out my concerns. He replied with commendable speed, enclosing a report from a "statistical expert" whom he had consulted. Sadly, when I asked for permission to quote that report here, the editor asked me not to do so. So I shall have to paraphrase it, and hope that no inaccuracies creep in.

Basically, the statistical expert agreed with my points, stated that it would indeed be useful to know if the regression coefficients were standardized or unstandardized, but didn't think that much could be done unless the authors wanted to write an erratum. This expert also didn't think there was a problem with the lack of Bonferroni correction, because the reader could fill that in for themselves.
[[ End update 2018-11-03 18:00 UTC ]]

[[ Begin update 2018-11-24 17:20 UTC ]]
Since this blog post was first written, I have received two further e-mails from Dr. Van Den Eede on this subject. In the latest of these, he indicated that he does not approve of the sharing of the text of his e-mails on my blog. Accordingly, I have removed the verbatim transcripts of the e-mails that I received from him before this blog post, and replaced them with what I believe to be fair summaries of their content.

The more recent e-mails do not add much in terms of progress towards my goal of obtaining the seven variables, or the full table of descriptives. However, Dr. Van Den Eede did tell me that the regression coefficients published in the De Venter et al. article were unstandardized, suggesting (given the units of the scales involved) that the effect size was very small.
[[ End update 2018-11-24 17:20 UTC ]]

[[ Begin update 2018-11-27 12:15 UTC ]]
I added a note to the start of this post to make it clear that it represents my personal opinions only.

[[ End update 2018-11-27 12:15 UTC ]]


de Groot, A. D. (1956). De betekenis van “significantie” bij verschillende typen onderzoek [The meaning of “significance” for different types of research]. Nederlands Tijdschrift voor de Psychologie, 11, 398–409. English translation by E. J. Wagenmakers et al. (2014), Acta Psychologica, 148, 188–194. http://dx.doi.org/10.1016/j.actpsy.2014.02.001

Deegan, J. D., Jr. (1978). On the occurrence of standardized regression coefficients greater than one. Educational and Psychological Measurement, 38, 873–888. http://dx.doi.org/10.1177/001316447803800404

Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22, 1359–1366. http://dx.doi.org/10.1177/0956797611417632

(*) A colleague suggested to me that these numbers might in fact be unstandardized coefficients. My assumption that they are standardized partial regression coefficients is based on the following:
1. The Abstract and the Results section of the article refer to them with the Greek letter β, which is the symbol normally used to denote standardized partial regression coefficients.
2. The scale ranges of the IVs (0–12 in four cases, 0–21 in the fifth) and DVs (maximum values over 100) are such that I would expect considerably larger unstandardized values for a statistically significant effect, if these IVs were explaining a non-trivial amount of variance in the DVs.
3. I mentioned "standardized regression coefficients" in my first e-mail to the authors, and "beta coefficients" in the second. Had these numbers in fact been referring to unstandardized coefficients, I would have hoped that the last author would have pointed this out, thus saving everybody's time, rather than entering into a discussion about their data sharing policies.

I suppose that it is just possible that these are unstandardized coefficients (in which case a correction to the article would seem to be required on that basis alone), but of course, if the authors would agree to share their data with me, I could ascertain that for myself.

(**) I hope that readers will forgive me if, for the purposes of the present discussion, I assume that identifying whether a p value is above or below .05 has some utility when one is attempting to learn something true about the universe.

(***) I'm not sure what the ethical rules are about publishing e-mail conversations, but I don't feel that I'm breaking anyone's trust here. Perhaps I should have asked Dr. Van Den Eede if he objected to me citing our correspondence, but since he has stopped replying to me about the substantive issue I'm not sure that it would be especially productive to ask about a procedural matter.