15 July 2021

Some problems in the dataset of a large study of Ivermectin for the treatment of Covid-19

This post appears at the same time as this piece at grftr.news by Jack Lawrence. Jack contacted me to ask if I could help him look at a number of issues with a prominent study of Ivermectin for the treatment of Covid-19. My speciality is forensic numerical data analysis (finding errors in numbers), so I concentrated on that and suggested some other names to Jack to help him look at things like the study design, methods, and reporting.

Hence, this post is almost entirely about the problems with the data from this study; Jack's piece covers other topics, such as the plagiarised text, the clinical trial "pre-registration" that was made after the first version of the results of the study was published, and many other problems. Gideon Meyerowitz-Katz has a piece over at Medium that discusses the implications of this study for the whole Ivermectin-for-Covid literature, and Melissa Davey is covering the story in the Guardian today.

Here is the article reference. In fact, it's not in a peer-reviewed journal. It's just a preprint—in spite of which it has already acquired 30 citations in just over six months, according to Google Scholar, and—as reported by Jack—has also become a major component of the weight of evidence for the efficacy of Ivermectin in several meta-analyses (see Gideon's Medium piece, linked above, for more on this).

Elgazzar, A., Eltaweel, A., Youssef, S. A., Hany, B., Hafez, M., & Moussa, H. (2020). Efficacy and safety of Ivermectin for treatment and prophylaxis of COVID-19 pandemic. Research Square. https://doi.org/10.21203/rs.3.rs-100956/v3

The preprint is currently in its third revision(*). You can download it, plus the two previous revisions if you want to compare those, from the preprint hosting service Research Square. (I use draftable.com to compare PDFs.)

The authors have, well, "sort of" made their data available. To quote from the preprint (p. 6): "The study data master sheet are [sic] available on reasonable request from the corresponding auther [sic] from the following link. https://filetransfer.io/data-package/qGiU0mw6#link". It is tempting to imagine that one might be able to download the data file directly from that link; however, when you attempt to do that, the site says that you have to create a premium account ($9 per month), and after you have done that and downloaded the file, it turns out to be password-protected. This suggests that the authors did not want anyone to be able to read it without their approval, which is not quite in the spirit of open science. (It is, however, not incompatible with Research Square's rather feeble data sharing policy.)

Fortunately, Jack Lawrence did a lot of work here. Not only did he pay for a premium account at filetransfer.io, but he also guessed the password of the file, which turned out to be 1234. I have never met Jack Lawrence in person, though, so as part of my due diligence for this blog post, I also paid $9 plus VAT for a one-month subscription to filetransfer.io, and downloaded the file for myself. To save you, dear reader, from having to go through that process, I have made an unlocked copy of the file available here. It is perhaps interesting to note that, judging by the filename, the authors were apparently still editing the "study data master sheet" on 12 December 2020, when they had already posted essentially all of their results in two earlier versions of their preprint by November 16.

Formatting problems

The data file is in Microsoft Excel (.XLSX) format, although the authors reported performing their analyses in SPSS 21. In the Excel metadata (File/Properties/Statistics), the creation date of the file is "16 September 2006 02:00:00", which suggests that the authors started with an older file and cleared all the cells before entering their data. Clearing the cells in this way does not remove cell formatting, which might explain one or two of the stranger cell formats that one sees when opening the file in Excel (e.g., in cell K5 the number 6.3 is formatted as a day and appears as "06-Jan", while B222 and F225:Z225 are in a different font); however, the formatting problems go a lot further than that.

Numbers containing non-numeric characters

Several cells that represent numbers in the Excel file appear to have been entered by someone more used to a manual typewriter than a computer. Specifically, cells K17, L318, L354, L366, L380, M38, M101, M396:M402, S272, S278, S280, S396, and S398 contain one or more occurrences of the lowercase letter "o" instead of the digit "0". As a result, these cells are text strings, rather than numbers, and any numerical calculations based on them will fail.

Because these cells contain strings, their values are left-aligned (the default for strings in Excel), whereas the numbers in the same column are right-aligned. In many cases it seems that the creator of the data file has attempted to remedy this visual infelicity by left-padding the non-numeric string with spaces. For example, although the value "1.o" [sic] in cell L318 has no padding, the same value in cells L354, L366, and L380 has been padded on the left with 33, 34, and 32 space characters, respectively.

Relatedly, the percentages in cells M89, M94, M128, S232, S243, S245, S250, S261, S262, S274, and S279 contain a comma as a decimal separator, instead of a dot, and so again are treated as text strings rather than numbers. These cells with commas are padded on the left with between 12 and 16 leading space characters in column S, although there is no padding in column M.
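These typographic quirks are easy to flag programmatically. Here is a minimal Python sketch of how such cells can be detected; the function and its messages are my own invention (not the code I actually used), and the example values are modelled on the cells described above.

```python
def diagnose_cell(value):
    """Classify a spreadsheet cell value that ought to be numeric.

    Returns a list of problems found: space padding, letter-'o'-for-zero
    substitution, and comma decimal separators. A genuine number (int or
    float) returns an empty list.
    """
    problems = []
    if not isinstance(value, str):
        return problems  # a real number: nothing to flag
    if value != value.strip():
        pad = len(value) - len(value.lstrip())
        problems.append(f"padded with {pad} leading space(s)")
    if "o" in value.lower():
        problems.append("letter 'o' used in place of digit '0'")
    if "," in value:
        problems.append("comma used as decimal separator")
    return problems

# Values modelled on cells L318 and L354 of the data file
print(diagnose_cell("1.o"))             # flags the 'o'
print(diagnose_cell(" " * 33 + "1.o"))  # flags the padding and the 'o'
```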

Confusion around date formats

Columns W and X of the Excel file contain dates with the captions "symptoms date&+ ve PCR" and "recovery date & -ve PCR". It seems that these dates are performing multiple duties, since there is no obvious reason why a patient's date of first showing symptoms of Covid-19 should be identical to the date on which they first tested positive, or why the date of their (first?) negative PCR test should correspond with their doctor's (?) certification that they have recovered. These dates seem to have also been used to calculate the length of time during which people were hospitalised (column Y), although again, one would generally expect hospital admission to be somewhat decoupled from symptoms and PCR test dates. I find it very surprising that there are not more dates recorded for each patient, to account for the various milestones that are of importance in the progression and treatment of Covid-19.

However, it is not only the meaning of these dates that is confusing. Their format is, too. The only dates that are actually formatted as Excel dates (i.e., with an underlying number representing the count of days since December 30, 1899) are those where both the day and month are less than 13. My working hypothesis is that the creator of the file either typed in the dates by hand, or pasted them from a text file, in dd/mm/yyyy format, but that Excel was in "US date mode" at the time. Thus, the only dates that were converted to the underlying numeric date format were those that were interpretable as mm/dd/yyyy (i.e., those with a dd/mm/yyyy "day" less than 13; I assume that there were no errors with a dd/mm/yyyy "month" greater than 31).
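To make this hypothesis concrete, here is a small Python sketch of the proposed mechanism (the function is my own, and Excel's actual parsing is more involved than this): a dd/mm/yyyy string is converted only if it also happens to parse as a valid US-style mm/dd/yyyy date, in which case the day and month are silently swapped.

```python
from datetime import datetime

def us_excel_interpretation(ddmmyyyy: str):
    """Mimic the hypothesised behaviour: a dd/mm/yyyy string typed into
    Excel in US date mode becomes a real (numeric) date only if it is
    also a valid mm/dd/yyyy date, i.e. only if the 'day' field is 12 or
    less, in which case day and month are swapped. Anything else is left
    behind as a text string."""
    try:
        # Excel in US mode would read the first field as the month
        return datetime.strptime(ddmmyyyy, "%m/%d/%Y").date()
    except ValueError:
        return ddmmyyyy  # stays a string, needing manual space-padding to align

print(us_excel_interpretation("05/08/2020"))  # becomes a date, with day and month swapped
print(us_excel_interpretation("25/08/2020"))  # there is no month 25: stays a string
```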

As with the numbers that contain non-numeric characters (see previous section), it seems that padding with spaces has been added manually in an attempt to align the "string" dates with the "correct" (numeric) dates. In column W, there are 176 numeric dates, 115 "string" dates with no padding, and 109 "string" dates with left padding of between 19 and 27 spaces; three of these also have padding on the right, of 3, 6, and 33 spaces. In column X, after removing the text "died in ICU" from 21 dates and "died bin ICU" [sic] from another, there are 94 numeric dates, 305 "string" dates with left padding of between 1 and 28 spaces (three of which also have padding on the right, of 1 and 8 spaces), and one "string" date with left padding of a backquote character (`) and 22 spaces (cell X259). In 6 cases (cells X8, X15, X23, X62, X75, and X402), the discharge date in column X also includes the number of days spent in hospital, which should be in column Y; it appears that whoever was inputting the data may have thought that using spaces to move the cursor over to the right of the cell boundary between the two columns was equivalent to using the Tab key to move to the next cell.

Several of the "string" dates are incorrectly formatted internally, e.g., "1l6l2020" (cell W110, with lowercase "L" as the separator), "06/82020" (cell W208), "31/7//2020" (cell X223), and "6/8/20/20" (cell X230). The date in cell X155 ("31/06/2020") is ostensibly formatted correctly, but implies that the patient was discharged on the non-existent date of 31 June 2020.

In summary, it is impossible for the dates in the file to have been used to calculate any sort of elapsed time in SPSS. Indeed, it seems that this calculation was done by hand, with the results being reported in column Y (with the addition of the text " days"), and with different "fencepost" rules typically being applied for each group. For example, in groups I and III the number of days in column Y is usually one more than the difference between the dates in columns W and X (i.e., both the start and end date are counted), whereas for groups II and (especially) IV the number of days in column Y is typically equal to the difference between the dates in columns W and X. (See also "Table 6", below, for a brief discussion of the apparent confusion between columns W and X on the one hand, and column Y on the other.)
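The two "fencepost" rules can be made explicit in a few lines of Python (the dates here are illustrative only):

```python
from datetime import date

def days_exclusive(start: date, end: date) -> int:
    """Simple difference between the two dates (the rule apparently
    applied to groups II and IV)."""
    return (end - start).days

def days_inclusive(start: date, end: date) -> int:
    """Count both the start and the end date (the rule apparently
    applied to groups I and III)."""
    return (end - start).days + 1

start, end = date(2020, 6, 8), date(2020, 6, 15)
print(days_exclusive(start, end))  # 7
print(days_inclusive(start, end))  # 8
```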

Repeated sequences

At several points in the Excel file, there are instances where the values of an ostensibly random variable are identical in two or more sequences of 10 or more participants, suggesting that ranges of cells or even entire rows of data have been copied and pasted.
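For the curious, here is a brute-force Python sketch of how such duplicated runs can be located within a single column; the function name and approach are mine (this is an illustration, not the code I used), and the threshold of 10 comes from the description above.

```python
def find_duplicated_runs(column, min_len=10):
    """Find pairs of identical runs of values within one column.

    Returns (start_a, start_b, length) triples, 0-based, where
    column[start_a : start_a + length] == column[start_b : start_b + length]
    and length >= min_len. Only maximal runs (those that cannot be
    extended to the left) are reported. Brute force, but more than fast
    enough for a 600-row spreadsheet column.
    """
    n = len(column)
    runs = []
    for a in range(n):
        for b in range(a + 1, n):
            # Skip non-maximal starts: this pair sits inside a longer run
            if a > 0 and column[a - 1] == column[b - 1]:
                continue
            length = 0
            while b + length < n and column[a + length] == column[b + length]:
                length += 1
            if length >= min_len:
                runs.append((a, b, length))
    return runs
```

Applied to, say, the values of column I, this would report the cloned blocks directly as (row offset, row offset, run length) triples.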

Approximately 19 cloned patients in group II

In cells B150:B168 and B184:B202, the patient's initials are either identical at each corresponding point (e.g., cells B150/B184) or, in almost all the remaining cases, differ in only one letter.

Cells C150:C168 are identical to cells C184:C202.
Cells D150:D168 are identical—with one exception out of 19 cells—to cells D184:D202.
Cells I150:I167 are identical to cells I184:I201.
Cells S150:S165 are identical—with one exception out of 16 cells—to cells S184:S199.
Cells U150:U168 are identical to cells U184:U202.
Cells V150:V168 are identical to cells V184:V202.
Cells W150:W168 are identical—with three exceptions out of 19 cells—to cells W184:W202.
Cells AA150:AA168 are identical to cells AA184:AA202.

Approximately 60 cloned patients in group IV

In cells B303:B320, B321:B338, and B339:B356, the patient's initials are either identical at each corresponding point (e.g., cells B303/B321/B339) or, in almost all the remaining cases, differ in only one letter.

Cells I303:I320 are identical to cells I321:I338 and I339:I356, including the typo "coguh" for "cough".
Cells I358:I371 are identical to cells I372:I385, including the typo "coguh" for "cough".
Cells I340:I349 are identical—with one exception out of 10 cells—to cells I386:I395.

Cells J303:J320 are identical to cells J321:J338 and J339:J356.
Cells J358:J371 are identical to cells J372:J385.
Cells J340:J349 are identical to cells J386:J395.

Cells K303:K320 are identical to cells K321:K338 and K339:K356.
Cells K358:K371 are identical to cells K372:K385.
Cells K340:K349 are identical to cells K386:K395.

Cells L303:L320 are identical—with two exceptions out of 18 cells—to cells L321:L338 and L339:L356.
Cells L358:L371 are identical—with one exception out of 14 cells—to cells L372:L385.
Cells L340:L349 are identical—with two exceptions out of 10 cells—to cells L386:L395.

Cells M303:M320 are identical to cells M321:M338 and M339:M356.
Cells M358:M371 are identical to cells M372:M385.
Cells M340:M349 are identical to cells M386:M395.

Cells S303:S320 are identical to cells S321:S338 and S339:S356.
Cells S358:S371 are identical to cells S372:S385.
Cells S340:S349 are identical to cells S386:S395.

Cells U303:U320 are identical to cells U321:U338 and U339:U356.
Cells U358:U371 are identical to cells U372:U385.
Cells U340:U349 are identical to cells U386:U395.

Cells W303:W320 are identical to cells W321:W338 and W339:W356.
Cells W358:W371 are identical to cells W372:W385.
Cells W340:W349 are identical to cells W386:W395.

Cells Y303:Y320 are identical (apart from spacing differences) to cells Y321:Y338 and—with one exception out of 18 cells—to cells Y339:Y356.
Cells Y358:Y371 are identical—with three exceptions out of 14 cells—to cells Y372:Y385.
Cells Y340:Y349 are identical to cells Y386:Y395.

Cells Z303:Z320 are identical to cells Z321:Z338 and Z339:Z356.
Cells Z358:Z371 are identical—with three exceptions out of 14 cells—to cells Z372:Z385.
Cells Z340:Z349 are identical to cells Z386:Z395.



Duplicated cells in groups II (top) and IV (bottom). In each column, groups of 10 or more cells with the same background colour and surrounded by a solid black border are identical. These images are screenshots from my annotated version of the Excel data file (see "Resources", below).

These patterns are not consistent with groups II and IV each containing the results of 100 different, real patients. The odds against any one of these duplications occurring by chance, let alone all of them, are astronomical. These patterns are, however, highly consistent with the idea that the Excel file has been fabricated with extensive use of copy/paste operations, followed perhaps by occasional attempts to obscure this "cloning" process by changing some numbers manually. Indeed, the slight imperfections in some of the copies would seem to exclude the possibility that these patterns are the result of an unfortunate slip of the mouse.


It seems indisputable that the patients in group II (mild/moderate disease, control condition) whose records are found at lines 184 through 202 of the Excel file—a total of 19 people—are crude "clones" of the data of other patients (who, themselves, may or may not have actually existed). Similarly, it is hard to think of any explanation for the duplications in lines 321 through 356 and 372 through 395, other than that the records of around 32 patients in group IV (severe disease, control condition) have been "cloned", some of them multiple times. The question then naturally arises of which other records in the file may not reflect the reality of the patients in the study.


Apparent failures of randomisation

The patients in groups I and II (mild/moderate disease, treatment and control) ought to have been similar to each other; likewise the patients in groups III and IV (severe disease, treatment and control). Indeed, the authors state (p. 3) that "A block randomization method was used to randomize the study participants into two groups that result in equal sample size. This method was used to ensure a balance in sample size across groups over the time and keep the number of participants in each group similar at all times". (Aside: I would be grateful if someone could explain to me what the second sentence there implies for the execution of the study.)

However, the randomisation does not appear to have been a complete success. For example:
  • In group I, the number of patients with anosmia as an additional symptom was 25. In group II, this number was 4.
  • In group I, the number of patients with loss of taste as an additional symptom was 25. In group II, this number was 0.
  • In group III, the number of patients with vomiting as an additional symptom was 1. In group IV, this number was 12.
  • In group III, the number of patients with bronchial asthma as a comorbidity was 14. In group IV, this number was 0.
  • In group III, the number of patients with cholecystitis, chronic kidney disease, hepatitis B, hepatitis C, and open heart surgery as comorbidities was 0 in all five cases. In group IV, these numbers were 6, 5, 5, 6, and 6, respectively.

Descriptive statistics that do not match the preprint

The first three paragraphs of the Results section of Elgazzar et al.'s preprint contain descriptions of the characteristics of their sample. Here, I reproduce the text of each of those paragraphs. Where the numbers that I calculated from the data set differ from those reported, I have included my calculated values in red and inside brackets. It should be apparent that while a few of the numbers calculated from the Excel sheet match those in the preprint, the great majority do not.

First paragraph:
The mean age in Group I was 56.7 [47.5] ±18.4 [15.1]; included 72 % males and 28 % females. The mean age in Group II was 53.8 [43.2] ±21.3 [16.1]; included 67 [66] % males and 33 [34] % females. The mean age in Group III was 58.2 [55.0] ±20.9 [14.0]; included 68 [74] % males and 32 [26] % females. The mean age in Group IV was 59.6 [54.2] ±18.2 [13.7]; included 74 [73] % males and 26 [27]% females. The mean age in Group V was 57.6 [48.8] ±18.4 [9.2]; included 75 % males and 25% females. The mean age in Group VI was 56.8 [54.4]±18.2 [8.8]; included 72 % males and 28% females. There was no statistical significance variation between groups regarding mean age or sex distribution (p-value >0.05).

The sex of one of the participants in group V was coded in the Excel sheet as "A" (cell C449), rather than "M" or "F". The preprint made no mention of any patients identifying as anything other than Male or Female. In order for the numbers of patients of each sex in group V to match the numbers reported in the preprint, I counted "A" as "M".

Second paragraph:
Co morbid conditions distributed between different studied groups showed that DM was present in 15 [4]% of Group I patients, 14 [16]% of Group II patients, 18% of Group III patients, 21 [26]% of Group IV patients 15% of group V and 19 % of group VI. HTN presented in 11 [6]% of Group I patients, 12 [13] % of Group II patients, 14% of Group III patients, 18 [32]% of Group IV patients ,15 [14] % of group V patients and 14 [13]%of group VI patients . 2 [1]% of Group I patients had IHD versus 6 [7]% in Group II, 5% in group III; 12 [5]% in group IV;1% in group V and 3 [4] % in group VI respectively with statistically significant prevalence of ischemic heart disease as severity increase (p-value < 0.03).. Bronchial asthma presented in 5 [3]% of Group I patients, 6 % of Group II patients, and 14% of Group III patients, in 12 [0]% of Group IV patients; 5% of group V and 4% of group VI patients.

I assume that the authors' calculation of "prevalence of ischemic heart disease as severity increase" involved grouping the patients into three pairs of groups by severity (groups I and II, groups III and IV, and groups V and VI). Here are the results of that operation using, first, their IHD prevalence numbers, and second, my calculated numbers. The authors reported a p value of "< 0.03".

> chisq.test(c(8, 15, 4))
X-squared = 6.8889, df = 2, p-value = 0.03192

> chisq.test(c(8, 10, 5))
X-squared = 1.6522, df = 2, p-value = 0.4378
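For readers without R to hand, these two results are easy to verify: with three categories the goodness-of-fit test has 2 degrees of freedom, for which the chi-squared upper-tail probability has the closed form exp(−X²/2), where X² is the test statistic. A quick Python sketch:

```python
import math

def chisq_uniform(counts):
    """Chi-squared goodness-of-fit test against a uniform distribution,
    as R's chisq.test() performs for a bare vector of counts.
    Returns (statistic, p); with three categories (df = 2) the
    upper-tail p value is exactly exp(-x2 / 2)."""
    assert len(counts) == 3, "the closed-form p value below assumes df = 2"
    expected = sum(counts) / len(counts)
    x2 = sum((obs - expected) ** 2 / expected for obs in counts)
    return x2, math.exp(-x2 / 2)

print(chisq_uniform([8, 15, 4]))  # approx (6.889, 0.0319): the authors' reported numbers
print(chisq_uniform([8, 10, 5]))  # approx (1.652, 0.4378): my numbers from the data file
```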

Third paragraph:
Clinically there was a highly statistically significant difference between groups of diseased patients regarding fatigue, dyspnea, and respiratory failure (p-value <0.001), as most of group III & IV, showed fatigue and dyspnea (86 [86, 85]% and 88 [85, 84]%, respectively), compared to (36 [28]%, 38 [47]% ; 54 [34]% and 52 [49]%, respectively), in group I & II. Respiratory failure had been detected in 38% and 40% in group III& IV respectively while no patients in group I& II developed respiratory failure. No skin manifestation had been detected in any group.

The authors' reporting of fatigue and dyspnea is unclear here, as for groups III and IV they report only two percentages. I have assumed that their claim was that these were identical for each of the two groups (i.e., fatigue, 86 for both groups; dyspnea, 88 for both groups), whereas I found three out of four numbers to be different. I was unable to calculate the percentage of respiratory failure as this was not apparently reported in the data file, although "sore throat" was. Nor could I find anything in the data file corresponding to "skin manifestation". Regarding the "highly statistically significant difference between groups of diseased patients", the p values are less than 0.001 (indeed, less than 1E-9) with the authors' numbers or mine.


Table results that do not match the preprint

I wrote some R code that attempts to reproduce the authors' tables, as far as possible. This is available here so that readers can judge (a) whether the small number of decisions that I needed to make in order to adapt the Excel sheet for analysis are reasonable, and (b) whether I have programmed the subsequent calculations correctly.

Here are the results that I obtained. (Full-resolution images are available at the same link as the code.) Readers are invited to compare these results with the tables in Elgazzar et al.'s preprint. I think it is fair to say that there is a substantial degree of divergence.

Table 1



The D-dimer results from Table 1 in the preprint could not be reproduced because no data for that measure exist in the Excel file.

Tables 2 and 3



Only three of the elements from the preprint's Tables 2 and 3 correspond to values in the Excel file after one week of treatment. Longitudinal data for HGB, TLC, and Lymphocyte % are all missing, and the Excel file contains no data for D-dimer at any time point.

The time difference between the first and last RT-PCR tests can be calculated in two ways: either using the authors' provided field (column Y) or by subtracting the date of the first PCR test (column W) from the date of the final PCR test (column X) and then (as was apparently done by the authors, at least for group I) adding one.

Table 4


The first half of Table 4 ("Prognosis") cannot be reproduced, as this variable only exists in the Excel file for groups V and VI. As with Tables 2 and 3, there are two ways to calculate the time of stay in the hospital. The per-group ranges associated with the version labelled "RecordedStay", corresponding to column Y in the Excel sheet, are not too far from the ones reported in the preprint, with 6 out of 8 numbers (minimum or maximum) being identical, which would seem to suggest that the reproduction is on the right track.

Also noteworthy here are the extremely small standard deviations of the stays in group I (both as recorded in column Y, and as calculated from columns W and X) and group III (as calculated from columns W and X; in this last case I find myself wondering why there is such a difference between the SDs of the recorded and calculated stays).

Relatedly, in the preprint, the standard deviations for the hospital stay are remarkably different between the groups. The large SD for group IV (8, with a mean of 18 and a range of 9–25) implies that about 40% of the patients stayed 9 days and 60% stayed for 25, with almost no room for any other lengths of stay, as shown by SPRITE (Heathers et al., 2018, https://peerj.com/preprints/26968/; https://shiny.ieis.tue.nl/sprite/).

SPRITE analysis of the possible distribution of the recovery times claimed by Elgazzar et al. for patients in group IV.
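To see why the mass must pile up at the extremes, consider one concrete, entirely hypothetical split: 44 patients staying 9 days and 56 staying 25 days reproduces the reported integers almost exactly. This illustrates the constraint; it is not a claim about what the actual (alleged) data were.

```python
import statistics

# Hypothetical split: 44 patients stay 9 days, 56 patients stay 25 days
stays = [9] * 44 + [25] * 56

mean = statistics.mean(stays)   # 17.96, which rounds to the reported 18
sd = statistics.stdev(stays)    # sample SD (n - 1 denominator): about 7.98, rounds to 8

print(round(mean), round(sd))
```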

[[Begin update 2021-07-16 12:46 UTC]]
Alert reader Anatoly Lubarsky pointed out on Twitter that there are combinations of stay lengths that do not involve quite as many values at the limits, 9 and 25, as the chart above. He is correct. By specifying one decimal place when generating the above chart I had, in effect, told SPRITE to look for SD values in the range 7.95–8.05, whereas the authors reported only integers and so their SD could have been anywhere from 7.50001 through 8.49999. It's a bit ironic that I missed that since I previously wrote this post on pretty much exactly this topic.

rSPRITE doesn't currently work with zero decimal places, but Anatoly also provided an example that he had constructed to show what seems to me to be the most favourable (i.e., the least extreme) result from the point of view of the authors. Here is the resulting chart from that example. I do not think that this greatly alters the idea that this pattern of days spent in hospital is unlikely to be a reflection of real-world data.

Chart showing 100 values with minimum=9, maximum=25, mean=17.86 (which rounds to 18 if no decimal places are included), and SD=7.505 (rounds to 8), cf. Elgazzar et al.'s Table 4, bottom row, group IV.
See the file "SD-simulation.xls" in "Resources", below.

[[End update 2021-07-16 12:35 UTC]]


It is unclear why the authors claim to have performed a chi-squared test (χ2=87.6, p<0.001) on the value of "Recovery time &Hospital stay", as it is clearly not a categorical variable. It is tempting to imagine that this result was copied and pasted from the first half of Table 4 (with the test statistic being altered by subtracting 1 from each digit) by someone who did not understand what they were doing and did not realise that a chi-squared test is meaningless here.

Table 5
The data that would be needed to reproduce Table 5 do not seem to be available in the Excel file.

Table 6
As with Table 4, the first part of this table cannot be reproduced, as the "Prognosis" variable is not available in the Excel file.

I have not reproduced the second part of Table 6 as it appears to be redundant or to use unavailable data. The RT-PCR results correspond to the last lines of Tables 2 and 3. Interestingly, the "Hospital Stay" variable appears to be different from "RT-PCR" here, although as I hope to have demonstrated earlier, the variable marked "Hospital Stay" in column Y of the data file has a very close relationship with the difference between columns W ("symptoms date&+ ve PCR") and X ("recovery date & -ve PCR"). It seems that the authors are unsure whether the difference in days between the first positive and last negative PCR test (columns W and X) corresponds to the hospital stay or not, with or without an adjustment of one day for the "fencepost" issue mentioned earlier.

Other issues


The age distribution

The distribution of patient ages is very strange. There are 34 patients aged 48 and 31 aged 58, but only 3 aged 50 and 4 aged 53. Furthermore, of the 600 patients, 410 have an age that is an even number of years while only 190 have an age that is an odd number of years.

It is difficult to see how any of this could have arisen by chance. (The R expression 1 - pbinom(399, 600, 0.5) returns 1.11E-16, which is the limit of double-precision arithmetic; for 400 or more even ages the same calculation underflows to zero, so R cannot even represent the probability of the observed 410 even ages this way.)
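The underflow can be sidestepped entirely with exact integer arithmetic. Here is a cross-check in Python (rather than the R used elsewhere in this post), computing the exact probability of 410 or more even ages out of 600 if each age were independently even with probability one half:

```python
from fractions import Fraction
from math import comb

# Exact tail probability P(X >= 410) for X ~ Binomial(600, 1/2):
# sum the binomial coefficients as exact integers, then divide by 2^600
tail = sum(comb(600, k) for k in range(410, 601))
p = Fraction(tail, 2 ** 600)

print(f"{float(p):.3e}")  # far smaller than the 1.11E-16 double-precision limit
```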


Trailing digits of numerical variables

Kyle Sheldrick discovered that of the 400 values of the variable "serum ferritin before treatment", only three end in the digit 3. I looked at the other numerical results in the preprint and found that in almost all cases the distribution of the trailing digits is extremely unusual, and in contrast with what one would expect from Benford's Law, which—although it is perhaps best known for its predictions about the first digits of data corresponding to natural phenomena anchored at zero—shows that for the digits of a random variable apart from the first, the expected distribution is approximately uniform. (In November 2020 I used the predictions of the distributions of trailing digits from Benford's Law to demonstrate that the official Covid-19 statistics from the Turkish Ministry of Health were probably fabricated.)

In cases where the dominant trailing digit is zero we might allow for the possibility that different people collected data to different degrees of precision, thus leading to numbers being rounded and, consequently, a trailing digit of zero more often than might be expected by chance. But this cannot explain why, for example, 82% of the numbers for HGB end in the digits 2–5, or why 17.5% of the numbers for TLC end in 8 whereas none end in 2. The large chi-square statistics and their associated homeopathic p values in the tests of the trailing digits from Elgazzar et al.'s data file suggest that none of these patterns are the result of a natural process. They are, however, highly compatible with the idea that the numbers in the Excel table have either been copied and pasted in bulk, or invented out of whole cloth by someone who was trying (and failing) to simulate random numbers—an activity that humans are not very good at.

Counts of the trailing digits (0–9) of various numeric variables in Elgazzar et al.'s data file, and the chi-square statistics for the test against the null hypothesis that their distribution is uniform.
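For completeness, here is a Python sketch of how such a trailing-digit test can be run. The function is my own illustration, not the code behind the table above; values are passed as the strings originally entered, to preserve exactly what was typed.

```python
from collections import Counter

def trailing_digit_chisq(values):
    """Count the final digit of each value (each given as the string
    originally entered in the spreadsheet) and return the digit counts
    plus the chi-squared statistic against a uniform distribution over
    the digits 0-9 (df = 9)."""
    digits = [next(ch for ch in reversed(v) if ch.isdigit()) for v in values]
    counts = Counter(digits)
    expected = len(values) / 10
    x2 = sum((counts.get(str(d), 0) - expected) ** 2 / expected
             for d in range(10))
    return counts, x2

# A toy example: every value ends in 8, the most extreme case possible
counts, x2 = trailing_digit_chisq(["12.8", "3.8", "448", "0.8"])
print(x2)  # with n = 4 values all on one digit, the statistic is 9n = 36
```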


Study entry and exit dates

The preprint states (p. 3) that "The study was carried out from 8th June to 15th September 2020". This seems to conflict with the study's registration on ClinicalTrials.gov, which states that the "Actual Study Completion Date"—defined as "The date [of] the last participant's last visit"—was 30 October 2020. We cannot, perhaps, infer much from the fact that the last recorded entry (positive PCR) and exit (negative PCR) dates in the Excel file are 18 August 2020 and 21 August 2020, respectively, as we do not have date information for the outpatients (groups V and VI). However, we can see that there are 120 patients (71 in group II, 3 in group III, and 47 in group IV) with an entry date prior to 8 June 2020, with the earliest being 12 May 2020. Similarly, there are 49 patients (31 in group II, 1 in group III, and 17 in group IV) with an exit date prior to 8 June 2020, with the earliest being 23 May 2020.
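Checks of this kind are straightforward once the dates have been cleaned up. A trivial Python illustration, with made-up dates rather than the real patient records:

```python
from datetime import date

STUDY_START = date(2020, 6, 8)  # "8th June ... 2020", as stated in the preprint

# Illustrative entry dates only; the real check runs over columns W and X
entry_dates = [date(2020, 5, 12), date(2020, 6, 8),
               date(2020, 7, 1), date(2020, 5, 30)]

early = [d for d in entry_dates if d < STUDY_START]
print(len(early))  # 2 of these illustrative dates precede the stated start
```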

SPSS

Another strange feature of this story is that, although the authors claim to have performed their analyses using SPSS (p. 4 of the preprint), they did not share the SPSS data file (in .SAV or .CSV format), although this would have been a much better way to allow readers to reproduce their analyses. Instead they shared what they called the "study data master sheet". As I have shown here, these data (a) contain numerous signs of manipulation and (b) once cleaned up and analysed with the same statistical tests that the authors used, mostly—but, perhaps significantly, not entirely—fail to produce the results reported by the authors in their Results section text and tables.

There is another curious sentence in the preprint that makes me wonder whether the authors actually used SPSS at all, or indeed have ever done so. On p. 5 they wrote "After the calculation of each of the test statistics, the corresponding distribution tables were counseled to get the 'P' (probability value)". Assuming that "counseled" here is a typo for "consulted", it appears that the authors' claim is that they read the test statistics from the SPSS output and then looked up the corresponding p values in a table, such as the one on this page. I wonder why anyone would do this, given that the SPSS output for all of the tests that the authors reported having run contains the p value right next to the test statistic. Looking up test statistics in a table to get the p value has been out of fashion since we stopped computing t statistics using pencil and paper, circa 1995 ("Ah, now I know why my desk calculator has a square root key").

Conclusion

In view of the problems described in the preceding sections, most notably the repeated sequences of identical numbers corresponding to apparently "cloned" patients, it is difficult to avoid the conclusion that the Excel file provided by the authors does not faithfully represent the results of the study, and indeed has probably been extensively manipulated by hand.

In some cases where forensic researchers have discovered discrepancies between images or datasets and the results reported in a paper, the authors have attempted to claim that they "accidentally" provided a "training" version that had been made to calibrate their software. It does not seem possible that such a defence could be used in this case, however, since the Excel sheet provided by Elgazzar et al. cannot possibly have been used for this purpose, in view of the extensive amount of manual cleaning that would be required to make it useable for any purpose.

I urge the authors to make their SPSS data file publicly available without delay, in order that we can see the exact numbers on which their analyses were based—because, as demonstrated above, those numbers cannot be those in the Excel file. If the authors cannot provide their SPSS data file then I believe that either they or Research Square should consider retracting their preprint as a matter of urgency.

Resources

I have made the following files available here:
  • The unlocked original data file, named "Copy of covid_19 final master sheet12-12-2020 (1).xlsx". That is exactly the name that it would have if you were to download it (albeit still locked with a password at that point).
  • The password-protected version of the original data file, to the name of which I have prepended (Locked), so the full name is now "(Locked) Copy of covid_19 final master sheet12-12-2020 (1).xlsx". The password for read-only access is 1234. If someone has the right tools to extract the second-level password that is needed to modify the contents of the file, I'd be curious to know what it is. [[ Update 2021-07-17 15:58 UTC: Graham Sutherland has discovered that there are many passwords that unlock read-write access, the simplest of which appears to be 0001. ]]
  • An annotated version of that file, named "(Nick) Elgazzar data.xls". I have highlighted the anomalous individual cells in yellow and the runs of duplicated cells in a variety of other colours.
  • Slightly higher resolution versions of the images from this post, named "Table1.png" (etc), "SPRITE-Table4.png", and "Trailing-digits.png".
  • My analysis code, named "Elgazzar.R".

Acknowledgements

Thanks to Jack Lawrence (@TimPoolClips) for bringing the paper to my attention and cracking the password of the data file; Gideon Meyerowitz-Katz (@GidMK) for pointing out the Table 4 standard deviation problem, the issue with the patient dates relative to the reported study dates, and the claim that the p values were looked up in a table; and Kyle Sheldrick (@K_Sheldrick) for making the initial discovery of the lack of trailing 3s in the serum ferritin numbers.


(*) Things move fast in Covid world. Less than 24 hours before this blog post was due to be published, Research Square posted "V4" of the preprint, which is simply a placeholder that says "Research Square has withdrawn this preprint due to ethical concerns". To ensure that the preprint does not disappear, I have posted the PDFs of the first three versions in the same location as the other supporting files for this post (see "Resources", above).


2021-07-14: Research Square withdraws the preprint.

06 May 2021

My minor involvement in the investigation of some strange articles from marine ecology

Today's topic is this report in Science by Martin Enserink about possible scientific misconduct in a series of studies that investigate the relation between increasing CO2 levels (causing a decrease in the pH of the world's oceans) and the behaviour of fish. Martin's report gives most of the background that you will need to follow this post. While he was preparing it, he asked me to look at the dataset for a couple of articles from the research group whose work he was investigating. In particular, I found a lot of interesting things in this article:

Dixson, D. L., Abrego, D., & Hay, M. A. (2014). Chemically mediated behavior of recruiting corals and fishes: A tipping point that may limit reef recovery. Science, 345(6192), 892–897. https://doi.org/10.1126/science.1255057
(You can find the PDF of the article on a ResearchGate page here; I'm not sure if this direct link to the file will work.)

Most of my analyses of the article and its associated dataset are written up in a report that you can find here [PDF]. In this short post I just want to mention one other point that isn't in that report, which is the whole question of why the dataset is in the form of an Excel file (which you can find here [XLSX]) in the first place.

As I note in the report, just for the observations of the behaviours of 15 different species of fish (the first 15 of the 19 worksheets in the Excel file) the researchers must have made 864,000 separate notes. That is, the real "Raw Data" (this phrase appears in the title of the Excel file, but those are not the "raw data") consist of 864,000 entries corresponding to the position, in one of two possible channels, of 20 examples of 15 species of fish captured from 6 locations being recorded in 10 samples of water over 2 sets of trials of 2 minutes each with 12 observations per minute (20 x 15 x 6 x 10 x 2 x 2 x 12 = 864,000). That's almost a million ones and zeroes, each labelled with a species, fish number, capture location, water type, trial number, and sequence number of the observation.
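The multiplication above can be checked mechanically (the variable names here are mine, not the authors'):

```python
# Recomputing the raw observation count from the factors listed in the text.
fish_per_species = 20
species = 15
capture_locations = 6
water_samples = 10
trial_sets = 2
minutes_per_trial = 2
obs_per_minute = 12

total = (fish_per_species * species * capture_locations * water_samples
         * trial_sets * minutes_per_trial * obs_per_minute)
print(total)  # 864000
```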

Somewhere there must exist, or at the very least have existed, a CSV file (or, perhaps, a file in some other proprietary format, such as SPSS or SAS or Stata; but as far as I know, all of those packages can export to CSV format) containing those raw numbers. Even if the 864,000 observations were initially made on a very very very large stack of paper, at some point they would have been entered into a computer in a format from which the analyses reported in the article could have been run. Importantly, the analyses could almost certainly not have been run directly from this Excel file, because of the inconsistencies that it contains. Indeed, when I wrote some code (available here) to try to extract some summary statistics from the dataset, I had to explicitly work around the errors in the data, such as the cases where there are 21 rather than 20 fish in a set of tests, or where data elements are in different positions from one sheet to the next. Had the original analyses been based on these Excel sheets, the authors would surely have noticed that these misalignments were causing strange results or even crashes, and fixed the dataset. (And if by some chance they didn't notice these problems, there would be some inconsistencies in the published results, whereas—as the group of investigators led by Timothy Clark has pointed out—the results in this paper, and indeed across multiple studies from this laboratory, are remarkably uniform.)
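As a hedged sketch of what "explicitly working around the errors" means in practice: my actual code (linked above) is in R, but the idea is simply to validate each block of rows against the expected structure before using it. The data layout and check below are invented for illustration; only the expected count of 20 fish per set comes from the text above.

```python
# Defensive check of one set of trial rows: the Excel file sometimes has 21
# rather than the expected 20 fish, which any analysis script would trip over.
def validate_fish_set(rows, expected=20):
    """Return a list of problems found in one set of trial observations."""
    problems = []
    if len(rows) != expected:
        problems.append(f"expected {expected} fish, found {len(rows)}")
    problems += [f"row {i}: non-numeric value {r!r}"
                 for i, r in enumerate(rows)
                 if not isinstance(r, (int, float))]
    return problems

# One fish too many, as in the shared Excel file (21 rows of 0/1 choices):
print(validate_fish_set([1, 0, 1] * 7))
```

Any script that silently assumed 20 rows per set would either crash or produce visibly wrong summaries on such a file, which is the point of the argument above.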

As Martin's article mentions, several other datasets (all Excel files) from the same laboratory seem to have similar problems. There seems to be a consistent pattern of the researchers deciding that in order to share their data, rather than just making their CSV file available with a few notes to explain the purpose and/or labels of each variable, they needed to laboriously re-enter their data into an Excel sheet, with lots of needless formatting whose only effects are to (a) increase the chance of errors and (b) make it harder for anyone to replicate their analyses in software. Meanwhile, the original raw files from which the statistics and charts in the published articles were made remain mysteriously absent. It is difficult to understand why anybody would work this way, when simply sharing the actual raw data would represent less effort and be much more reliable.

[2021-05-10 20:58 UTC: Updated link to my analysis report with a new version. ]


27 January 2021

Why I blog about apparent problems in science

In this post I want to discuss why I blog directly about what I see as errors or other problems in scientific articles. I had the idea to write this some time ago, and indeed some of the sentences below have been sitting in my drafts folder for quite a while, but the discussions on Twitter about my most recent post have prodded me to finally write this up. (However, I don't go into that post or those Twitter discussions further here.)

I have seen criticism of the "blog first" approach because it "drops stuff on the authors out of the blue" or "doesn't give them a right to defend themselves". People have suggested that it would be better to approach the authors first and discuss the problems. That seems obvious, and it was how I used to approach things too, but over time I have changed my mind, for a couple of reasons.

First, I believe that, in principle, science should be conducted with radical transparency. Subject only to the need to protect participants, all review should take place in public, with all code and data fully open. Currently only a few journals (e.g., Meta-Psychology) offer this, but open review and commenting, at least, is part of the deal with most preprint servers. The whole reason for posting a preprint is to allow direct feedback on it, which anyone can take part in. In contrast, in order to comment on published articles in journals, with a few exceptions such as PLOS One (which allows informal comments to be posted on articles as well as formal comments that go through peer review), the choices are blogging/tweeting, PubPeer, or trying to fit a letter to the editor into some bizarre word count limit. (Some journals refuse to entertain any discussion of their articles unless it arrives via the manuscript submission system.)

So if I publish an unannounced blog describing what I see as issues in an article or a body of work, what I'm doing, in effect, is bringing the rules of Preprint World™ to bear. That might seem unfair, given that the authors have already "run the gauntlet" of peer review. (In some cases they may even have actually published a preprint first, but (spoiler alert) unless your preprint is about a politically hot topic, it isn't going to get much feedback, because we're all too busy with our own stuff.) But peer review is utterly broken, especially in its present mostly-secret form, which allows bad stuff to get published (through various forms of cronyism, as well as the limitations of editors and reviewers) and keeps critical voices out. A few journals now publish reviews alongside accepted articles, which is a first step along the road, but this strikes me as hugely insufficient because we don't get to see what happened to the manuscripts that were rejected.

(I should also acknowledge here that I am in a very unusual and privileged position. I am retired and don't have to keep anyone happy; nor, unlike if I were an emeritus professor, do I have a large list of buddies going back to my time in grad school to whom I feel some kind of obligation of loyalty.) 

The second reason is much more personal. If I write to an author and say "Umm, I think I've found these problems in your article", it feels to me as if I'm entering into a process of negotiation with them. I worry that maybe it feels to them like there is something they can say or do that will persuade me not to share what I've found. Maybe they feel blackmailed. Maybe they will try and negotiate: say, to address three problems if I will "let them off" the fourth ("We didn't feel we had a choice to collect the data any other way, the postdoc had left and the grant money had run out").

I really hate that feeling.

Some people seem to have no problem with that kind of implicit, low-level conflict, but it really doesn't sit well with me. Perhaps this is irrational, but it's how my personal sense of embarrassment works, and I don't think that's about to change. (I'm not usually a fan of psychoanalytic approaches, but for what it's worth, I'm pretty sure that my relationship with my late mother is indeed involved here.)

Of course, this approach has disadvantages. Sometimes it can lead to pointing out "problems" that aren't actual problems, because I didn't understand something. You also risk sounding like a crank, something James Heathers and I wrote about here, with some advice on how to try not to be one. You can mitigate this, for example by checking your analyses with multiple other people, but in practice even your closest colleagues don't always have time to go through the boring detailed stuff (and some of it is really, really boring). On occasion it gets you a nastygram from the authors, who in one case complained to my dean that I was violating their human rights [sic] by citing an e-mail of theirs verbatim. When I replaced the verbatim text with a paraphrased version, they then complained that I had misrepresented what they had written. (Another small benefit of not having corresponded with the authors is that you avoid the question of how to cite them.)

I think this is a consequence of the unique nature of science as a human activity. In an ideal world, science would not be conducted by social beings at all. Mr Spock doesn't mind if you call out apparent errors in his regression tables (although presumably he doesn't make many errors). But we don't live in that world. We all have reputations to consider, and we all like to be evaluated positively. (Aside: I thoroughly recommend Judged: The Value of Being Misunderstood by Ziyad Marar, although I wish the author hadn't used quite so much recently-discredited social psychology to support his arguments.) So any kind of scientific criticism is likely to be orthogonal to our usual ways of rubbing along in polite society.

In fact, to me it feels even more rude/disloyal/distasteful to blog about an issue if I have been discussing it cordially (up to a point) with the authors. If you're having a "Dear Jane", "Hi Nick", "Best regards" kind of exchange of e-mails, and at some point you realise that something is badly wrong, what do you do? In some legal systems a lawyer can request the judge to allow them to treat a witness as "hostile" based on their responses during a trial, but lawyers are trained (at least, I hope it doesn't come naturally) to disconnect their embarrassment, they get to walk away at the end of the case, and all of this is taking place in front of others who can see why they are asking.

This problem is even more awkward when you realise that in some cases you may be helping the author to construct an alibi ("Gosh, do 80% of our reaction times really end in 7? Ah yes, I remember now, we made some fake data to test the code, ha ha, yes, we must have used that by mistake, but hey, I've looked and just found the real data here, I'll write up the correction this afternoon, thanks for your keen observational skills"). Indeed, just this week we have heard this story from Joe Hilgard of how, whenever he pointed out a specific problem in the implausibly prolific output of one particular researcher, the next (equally implausible) article from the same person didn't have that particular problem in it. If this happened in a court of law there would be someone in overall charge who could take into account that the story keeps changing, but in science it seems you can get a lot of do-overs. In one case with which I was involved, when it was pointed out to the authors—two of whom were PIs with multiple R01 grants between them—that a coding error in their dataset meant that all of their regressions were uninterpretable (this was in the supplement of our critique, as it was only about the fourth worst thing about the original article), they merely uploaded a corrected version of the data without issuing a correction or indeed telling anyone at all. This meant that anyone who tried to reproduce the problem that I had discovered would now be unable to, but it also meant that re-running their code substantially changed all of their results. I wrote to the journal and the federal office of research integrity, but nothing happened.

Now, I'm aware that a risk of this approach is that it could end up turning me into some kind of solipsistic critic of science, the kind of person who has "Independent researcher" in their bio(*) and has been writing about the same one or two issues, and little else, for the last 10 years. The haughty drive-by posts that dominate sites devoted to "skepticism" in those fields of science that attract a lot of, er, enthusiastic amateur investigators also often seem to be based on an attitude of "I don't care what the authors have to say, here's why they're wrong" (although the one time I came close to having my own work featured on such a site, the potential author first sent me what appeared to be a rather crude form of blackmail note; I ignored it and as far as I know he never wrote up whatever perceived hypocrisy on my part that he was threatening to expose to the world). And indeed it is not hard to see parallels between a hardcore insistence on "science should be about objective truth" and the more juvenile kinds of libertarianism. Science should indeed be dispassionate, but it's still possible to be a dick about it. I don't want to be one of those perpetually disagreeable relatives that most of us have who are proud of proclaiming that "I say what I think, and I'm entitled to my opinion".

So there are limits to how one can go about this process in a reasonable way. I try to keep my language restrained; as James Heathers is fond of repeating, we can usually only talk in terms of error, because determining intent is hard and ultimately requires knowledge of someone's mental states, a method for which has so far—perhaps ironically—eluded psychologists. (The justice system sometimes has to do it, but even then it can result in strange effects; Ziyad Marar's book, mentioned above, has a nice section on this.) Before I go public with something I generally get several other people to take a look at it and see what they think, and if anyone has strong doubts I will often leave a post at the draft stage out of caution.

However, a complex study or body of work can produce a lot of what might look like smoke without the need for any kind of major fire, so this is always going to be an imperfect process. I'm happy to correct my posts in as transparent a way as possible when they are shown to have been based on faulty assumptions or other errors, although I acknowledge that such a correction is always inferior to not getting things wrong in the first place. But I think it's important for the issues themselves to be discussed in public; I hope that it keeps everyone honest, me first of all.



(*) Shout-out to a Twitter pal whose bio used to contain the words "Independent researcher (yes, I know...)"

21 January 2021

Some apparent problems in a high-profile study of ultra-processed vs unprocessed diets

Update 2021-01-27 13:11 UTC: Added the word "apparent" to the post title. That should have been there before.

Update 2021-01-26 14:28 UTC: With the permission of Kevin Hall I have reproduced his responses in line below, in italics and bracketed by [KH... / ...KH]. I believe that these responses adequately address all of the points that I made in the original post.

Update 2021-01-22 23:21 UTC: I have received an extensive response from Kevin Hall, the lead author of the study under discussion here. This addresses the great majority of the points that I raised in this post. I will attempt to incorporate a version of those responses in a forthcoming update, but I wanted to get this acknowledgement of that response out there as soon as possible.

Update 2021-01-21 13:52 UTC: Clarified my use of the terms "processed" and "ultra-processed". In the first version I wrote "I will mostly follow the authors' use of the terms "processed" and "unprocessed" to distinguish between the two". This was sloppy on my part; the authors use "ultra-processed" consistently throughout the article. They mostly use "PROC" and "UNPROC" (or variations on those terms) in the data files, presumably for the easy visual contrast between the two, and that was what I wanted to convey. I also changed "processed" to "ultra-processed" in the title of the post.


(Preamble: This post appears simultaneously with this post by Ethan and Sarah Ludwin-Peery, who have some questions about patterns in the data associated with the article that is discussed. They got in touch with me via Ivan Oransky at Retraction Watch, who is also writing about this today. I recommend that you read their analysis first, not least because it provides a much more comprehensive introduction to the study. Here I discuss a variety of other apparent problems with the same article, which I found while Ethan and Sarah got on with sorting out the mystery of the daily weight changes. Although the contents of this post do not need a great deal of statistical training to follow, they are—as is often the case with post-publication data-based forensics—not exactly a thrill-a-minute ride, but I hope that I have made the implications of each set of points reasonably clear.)

This post looks at an article that first appeared in May 2019 describing a randomised controlled nutrition study. The authors claimed that people who were allowed to eat as much as they wished of a diet based on either "ultra-processed" or "unprocessed" food(*) consumed around 500 kcal/day more on the ultra-processed diet, and gained an average of 0.9 kg (2 lbs) in weight in two weeks, compared to people on the unprocessed diet, who lost an average of 0.9 kg in the same period. The same 20 participants ate both diets, in a randomised order. Importantly, the amount of macronutrients (protein, fat, and carbohydrates) provided in the meals was closely matched across diets, as was the number of calories offered (logically, since calories are a linear function of the macronutrients). That is, the principal claim of the study is that the mere fact that the food was ultra-processed, versus unprocessed, caused people to consume 500 kcal/day more and thus gain, rather than lose, weight in a controlled in-patient setting.

Perhaps not surprisingly, this study has attracted a lot of attention. It has already been cited more than 360 times according to Google Scholar. The National Institutes of Health (NIH), which funded and conducted the study, put out an extensive news release about it, and the story was covered by both Science and Nature, as well as the BBC, the Guardian, the Washington Post, and many other major media outlets.

Here is the full APA-style reference of the article. For the first time since the appearance of the 7th edition of the APA Publication Manual (which says that we now have to list up to 20 authors’ names in a reference) I'm actually going to need an ellipsis to omit some of the 25 authors:

Hall, K. D., Ayuketah, A., Brychta, R., Cai, H., Cassimatis, T., Chen, K. Y., Chung, S. T., Costa, E., Courville, A., Darcey, V., Fletcher, L. A., Forde, C. G., Gharib, A. M., Guo, J., Howard, R., Joseph, P. V., McGehee, S., Ouwerkerk, R., Raisinger, K., ... Zhou, M. (2019). Ultra-processed diets cause excess calorie intake and weight gain: An inpatient randomized controlled trial of ad libitum food intake. Cell Metabolism, 30(1), 67–77. https://doi.org/10.1016/j.cmet.2019.05.008

The article is published on an Open Access basis; you can find the full text here (PDF, 2 MB) or a fuller version, including the Supplemental Information, here (PDF, 23 MB). A small erratum, correcting a number of minor issues, was published on August 6, 2019; all of the issues mentioned in that erratum are already corrected in the PDF files, so you don't need to keep that to hand while reading the article.

[KH... Importantly, another erratum was published in October 2020 and is available here. The correction relates one of the questions raised below and we realize that the updated data and code were not yet deposited on the OSF website. We will do so. ...KH]

This study has already been the subject of a comment on PubPeer by Edward Archer, who is, I think it is fair to say, a prolific critic of the way that much nutritional research is carried out. I am not a nutrition scientist, so this blog post will mostly concentrate on the data and statistics of the study. I do have one or two small methodological questions too, but these are based only on my 60 years of experience of consuming food and 40 or so of preparing it, rather than any understanding of how nutrition studies are run.

The study

The authors recruited 20 volunteers, 10 male and 10 female, and kept them under observation for 28 days in an in-patient environment at the NIH Clinical Center in Bethesda, Maryland. The data show that between one and four people were in the facility at any point between the first admission on April 17, 2018 and the last recorded data collection on November 19, 2018.

Participants spent 14 days on each of two diets, one described as "ultra-processed" and the other as "unprocessed". The diets were presented on a 7-day rotation, so each participant ate the same meal twice, 7 days apart. Although the purpose of the study was to examine the effect of an "ultra-processed" diet, and that term tends to be used in nutrition science with a specific meaning that is different from "processed" (it's complicated), I will use the terms "processed" and "unprocessed" to distinguish between the two, which I hope will avoid any confusion that might be caused by the fact that "ultra-processed" and "unprocessed" both start with the same letter. The participants were randomised to receive the processed diet first (N=10, 6 male, 4 female) or the unprocessed diet first (N=10, 4 male, 6 female); after 14 days on one diet they immediately switched to the other, as shown here.

Timeline of participants in the study. Reproduced from Figure 1 of Hall et al.'s article.


This study seems to have been a substantial undertaking. The participants spent 28 days in a highly controlled environment. The study was invasive, with subcutaneous sensors to monitor glucose levels as well as multiple finger stick blood testing operations daily. I like to imagine that the participants were handsomely compensated for taking a month out of their lives in the name of science; certainly the budget for the study must have run well into six figures.

The study code and data

The authors have made their data and SAS analysis code available in an OSF repository here. There are two datasets, named ADLDataSAScode and ADLDataSAScode1, each in its own ZIP file. The only difference between these seems to be that ADLDataSAScode1, which was uploaded on August 20, 2019 (three months after the article was first published online, which was on May 16, 2019), contains one extra data file, and the code has been extended with a few lines to produce a table from that file (more on this later). All of the analyses in this post refer to the ADLDataSAScode1 dataset.

Screenshot of the timestamps of the OSF repository for the study. A full-size version of this image is available as part of the supporting files for this post (see "My code and data", below).

The SAS code is not, as one might have hoped, a run-once script that generates all of the tables and figures from the article. Indeed, as supplied, the main script file (ADLDocumentation1.sas) produces two runtime errors at line 61 because the variables created within the SAS data file DLW at lines 42 and 43 are lost when this file is overwritten twice at lines 45 and 46. It seems that the code is best regarded as a collection of "building blocks" of code that can be run individually, possibly with minor modifications to use different subsets of the data. However, for completeness, I patched up the code so that it would run without error messages, and also to include both the original and adjusted analyses of the figures from Table 3D (see "The adjusted weight data", below), and ran it in SAS University Edition. I have made the resulting code ("Nick-ADLDocumentation1.sas") and output ("(Annotated) Results_Nick-ADLDocumentation1.pdf") files available online (see "My code and data", below).

The exact length of the study

An issue that stands out immediately when one looks at any of the data files containing daily records is that there seems to be a fencepost error. Participants spent 14 days on each of two diets, with no break in between; their weight at the start of day 1 was the baseline for the first phase (processed or unprocessed diet, assigned at random), and their weight at the start of day 15 was the baseline for the second phase, when they received the other diet. It would seem, therefore, that they should have been weighed 29 times—once at the very start of the study, and then 28 more times after eating a day's worth of meals each time—but there are only 28 daily weight records for each participant. That is, we apparently do not know the effect on their weight of the last (14th) day of the second diet, because the last measurement of their weight on that second diet was apparently the one made on the morning of the 14th day (their 28th in the study), before they proceeded to eat their food and undergo whatever other measurements were performed on that day. This seems to make little sense, from the standpoint of either study design or ethics. Why feed your participants the controlled diet on the last day if you are not going to collect weight data from them relating to that day?
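The fencepost arithmetic in the paragraph above is simple enough to set out in a couple of lines (the 28 recorded weigh-ins per participant is taken from the data files; everything else follows from the design as described):

```python
# Two 14-day diet phases, with a fasted weigh-in on the morning of day 1
# before any study food, implies one more weigh-in than there are diet days.
days_per_diet = 14
n_diets = 2
expected_weighins = days_per_diet * n_diets + 1  # mornings of days 1..29
recorded_weighins = 28  # rows per participant in the daily data files
print(expected_weighins - recorded_weighins)  # 1 measurement unaccounted for
```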

In fact this problem seems to be exacerbated because, as the data files deltabc and deltabw show, the difference in weight retained for each participant on both diets was the difference between their weights at the start of the first and 14th day on that diet. That is, even for the first diet that each person followed, their final weight was the weight at the start of the 14th day in the study, not that at the start of the 15th day; and the effect of the meals that they consumed on the 14th day of the study is also essentially disregarded.


Participant delta weights according to the data files deltabw (top; an Excel filter has been applied to show only values near to the start and end of each diet period) and deltabc (bottom). It can be seen that the retained weight change for each participant and each diet is the difference between their weight at the start of the 14th day on that diet and the start of the first day on that diet, apparently representing a span of 13 rather than 14 days. The same pattern holds for every participant.

[KH... Participants were admitted the afternoon before the study began. An overnight fasted body weight measurement was collected the next morning (day 1) which served as the fiducial point for the weight change calculations during the next 14 days on the first diet. On the morning of day 15, subjects were weighed which served as the fiducial point for weight change calculations on the alternate diet that was provided after an oral glucose tolerance test (OGTT). Fasted body weight measurements were then collected each morning including day 29 when the final OGTT was performed after which the subject was discharged. Thus, there were 29 fasted body weight measurements for each subject corresponding to the fiducial markers on days 1 and 15 prior to delivery of each diet and 14 days thereafter. However, the reported body weight changes in the manuscript correspond to days 1-14 of the first diet period and days 15-28 of the second diet period as shown in Figure 3A of the manuscript as described as the weight changes on each respective day on the diet. It would have been possible to report body weight changes corresponding to days 1-15 of the first diet period and days 15-29 of the second diet period, but we thought this would have been confusing to readers. ...KH]

Which days did participants spend in the respiratory chamber?

Participants spent one day per week in a respiratory chamber to enable their energy expenditure to be studied in detail. The article states that "On the chamber days, subjects were presented with identical meals within each diet period, and those meals were not offered on non-chamber days" (p. 72), which makes sense from an experimental control point of view, in that all participants would have consumed the same food on that day. The article's Supplemental Information [PDF, 21MB] further states (on pp. 15, 16, 17, 37, 38, and 39) that the chamber day was day 5 of each weekly meal rotation, corresponding to days 5 and 12 of each participant's time on each diet.

However, the great majority of the records in the data file chamber appear to contradict this. I looked for precise matches between the recorded energy intake on the chamber days and the records for each participant in the dailyintake file, and found exactly one match for each participant and chamber day. Support for the idea that these matches are not coincidental is provided by the fact that the calendar dates of each record of the matched pairs (one in chamber and one in dailyintake) are identical. The matched records imply that of the 80 chamber days (20 participants x 2 diets x 2 chamber days per diet), only 7 took place on day 5 of the weekly meal rotation (whereas 2 were on day 1, 24 on day 3, 3 on day 4, 31 on day 6, and 13 on day 7). Furthermore, of the 40 pairs of chamber days within the same diet, 15 were on different meal rotation days within the pair (e.g., for participant ADL002 on the unprocessed diet, the chamber days were 3 and 8, corresponding to the third and first days of the meal rotation, respectively), meaning that the participant would have eaten different meals on their two chamber days for a given diet in 37.5% of cases. It is difficult to reconcile these records with the claims in the article and supplemental information.
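The matching procedure just described can be sketched as follows. The records and field names below are invented for illustration (the study day of 3 echoes the ADL002 example above); the logic is simply: match each chamber record to the unique dailyintake record with exactly the same energy intake, then cross-check the calendar dates.

```python
# Invented records standing in for rows of the chamber and dailyintake files.
chamber = [{"id": "ADL002", "kcal": 2731.48, "date": "2018-05-03"}]
dailyintake = [
    {"id": "ADL002", "study_day": 3, "kcal": 2731.48, "date": "2018-05-03"},
    {"id": "ADL002", "study_day": 5, "kcal": 2410.02, "date": "2018-05-05"},
]

for c in chamber:
    matches = [d for d in dailyintake
               if d["id"] == c["id"] and d["kcal"] == c["kcal"]]
    assert len(matches) == 1                # exactly one match, as observed
    assert matches[0]["date"] == c["date"]  # calendar dates agree
    print(c["id"], "spent study day", matches[0]["study_day"], "in the chamber")
```

With energy intakes recorded to two decimal places, a coincidental exact match is very unlikely, which is why the agreement of the calendar dates makes the identification convincing.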

[KH... The article and supplement do not claim that “participants did indeed all spend days 5 and 12 of each diet in the chamber”. Rather, the main manuscript describes that participants spent one day each week in the respiratory chambers but does not specify the days of the week. The Supplementary Materials provide information about the rotating 7-day menu of meals provided on each diet and the chamber days were listed as occurring on day 5 of each week. This was not intended to indicate that the chamber days only occurred on day 5 but rather that the meals provided during the chamber days were prespecified and did not vary between subjects on the same diet no matter what day the chamber days occurred. The clinical protocol (available on the OSF website) indicates in Appendix A that the proposed schedule (page 34) had chamber days planned for days 3 and 10 on each diet. However, the protocol also notes on pages 13-14 that “Every effort will be made to adhere to the proposed timelines, but some flexibility is required for scheduling of other studies, unanticipated equipment maintenance, etc. Scheduling variations will not be reported.” Thus, while chamber days varied to accommodate such scheduling challenges, the meals provided on the chamber days were constant within each diet. ...KH]

Counting the calories

The data file dailyintake contains information about the amount of calories and individual nutrients consumed by the participants on each day. The total number of calories consumed is reported to two decimal places, but the individual readings of calories for protein, fat, and carbohydrates that sum to that total are reported to six decimal places, which on visual inspection do not appear to contain any regular patterns (which might correspond to, say, recurring decimals).

Extract from dailyintake file, showing six digits of precision for macronutrient calorie counts. Some columns have been reduced to zero width to enable the image to fit on this web page.

It is not clear how such numbers could have been generated, however, as the process for calculating the amount of calories consumed presumably ought to have been a fairly simple multiplicative one, based on estimates of the numbers of grams of protein, fat, and carbohydrates in the uneaten portions of each food that was offered, after deducting an estimate of the amount of water. (Edward Archer's comment on PubPeer mentions this issue, and suggests that using a bomb calorimeter might have been a better way to measure energy intake, although this doesn't seem to address the split into macronutrient types.) The authors report that the diets were designed and analyzed using ProNutra software, made by Viocare of Princeton, NJ. I wrote to Viocare to ask how this software calculates calories from macronutrients—for example, whether it uses the Atwater values of 4.0 kcal/g for protein and carbohydrates and 9.0 kcal/g for fat, and whether it typically generates long mantissas in its output. Its founder and president, Rick Weiss, sent me this reply:

ProNutra’s standard nutritional database is from USDA which we load into ProNutra with the resolution as USDA provides. Typically a research group using ProNutra would round off to the decimal place that they need. So I agree, seeing a value to the 6th decimal doesn’t make sense. The analysis of calories from macronutrients does use Atwater values.

[KH... More specifically, ProNutra uses specific Atwater factors which can deviate from the general values of 4.0 kcal/g for protein and carbohydrates and 9.0 kcal/g for fat. Therefore, the assumption immediately below is invalid. ...KH]

But if the calories per gram are always integers, the presence of six decimal places of precision in the macronutrient information of every meal would seem to imply that the authors calculated the amount of food that was (a) served and (b) remained uneaten to the nearest microgram, which would require rather a lot of effort.

[KH... The six decimal points for the macronutrient kcals in the data files are easily explained. The data for the total energy consumed and the percentage from each macronutrient were provided to 2 decimal places. For example, 15.68% of energy consumed as protein and a total energy intake of 2003.47 kcal. Therefore, the kcal provided from protein was calculated to six decimal places in the data file as follows: 2003.47*0.1568 = 314.144096 kcal from protein. ...KH]
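That arithmetic is easy to verify; a one-line check, using the figures from the example in the reply above:

```python
# An energy total given to 2 decimal places, multiplied by a percentage
# given to 2 decimal places, naturally yields a 6-decimal product.
total_ei = 2003.47          # kcal, reported to 2 decimal places
protein_fraction = 0.1568   # 15.68%, reported to 2 decimal places

protein_kcal = total_ei * protein_fraction
print(f"{protein_kcal:.6f} kcal from protein")  # 314.144096 kcal from protein
```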

I also wonder what was done in the case of processed snacks, where one would expect the authors to have simply used the nutrition information provided by the manufacturers.

[KH... The assumption that we used manufacturer provided nutrition information is not correct. As indicated in the manuscript, nutrient information was obtained from the USDA standard reference databases or if an item was not found in that database, we pulled from the Food and Nutrition Database for Dietary Studies, (also through the USDA). ...KH]

For example, on four days of the processed diet, three participants (ADL006 on days 3 and 4, ADL007 on day 8, and ADL015 on day 9) are recorded in the data file intakebymeal as having consumed 403.14 kcal in snacks, with 42.007956, 202.218222, and 158.933010 kcal coming from protein, fat, and carbohydrates respectively (these amounts are precisely identical on all four days). The chances that three people left exactly the same amount of snack food unfinished on a total of four occasions would seem to be negligible, so this duplication presumably corresponds to these participants having completely finished the contents of the same combination of snack packages on each day. But the nutrition information for each of these packaged snacks reports the amount of macronutrients with a precision of 1 g, so the calories from each of these macronutrients ought also to be an integer (a multiple of 4 or 9), unless the authors perhaps contacted the manufacturers and obtained analyses down to the microgram level.

Three different participants, four different days, identical snack consumption.

[KH... Indeed, ADL006 consumed the same snack items on days 3 and 4 as did ADL007 on day 8 and ADL015 day 9. From the mass consumed (grams), the subjects did finish the entire package of the snacks (28g, 39 g and 113 g for peanuts, cheese & peanut butter crackers, and applesauce, respectively). As explained above, we did not use manufacturer provided nutrition information, but rather nutrition information from the USDA database. Specific Atwater factors were used for the applesauce and the peanuts, whereas general Atwater factors were used for the cheese & peanut butter crackers. As also explained above, the six decimal points in the reported macronutrient kcals resulted from multiplying the macronutrient percentages by the total energy consumed. ...KH]

A further problem here is that these records show that the three participants in question consumed more calories from fat than from carbohydrates in their snacking on these four days, but substantially fewer calories from protein than from carbohydrates. The only processed snack in the image on p. 24 of the Supplemental Information that has more calories from fat than from carbohydrates is the 28 g package of Planters salted peanuts (see my file snacks.xls), but this also has more calories from protein than from carbohydrates. I have not been able to identify any combination of packaged snacks that would get even close to the proportions of calories from protein, fat, and carbohydrates that are reported for these four participants, especially given the presumed constraint of counting only entire packages.

[KH...  The combination of foods that result in these proportions of calories from protein, fat, and carbohydrates was indicated above: 28g, 39 g and 113 g for peanuts, cheese & peanut butter crackers, and applesauce, respectively. 


As an approximate calculation using general Atwater factors, we have:

  • Peanuts 28 g providing 163.8 kcal, 6.63 g protein, 13.9 g fat, 6.02 g carbohydrates
  • Cheese & Peanut butter crackers 39 g providing 191.88 kcal, 4.21 g protein, 9.55 g fat, 23.01 g carbohydrates
  • Applesauce 113 g providing 47.46 kcal, 0.19 g protein, 0.11 g fat, 12.74 g carbohydrates

When summed, these snacks provide 403.1 kcal, 11.03 g protein (44.12 kcal using general Atwater factor), 23.56 g fat (212.04 kcal using general Atwater factor), 41.77 gm carbohydrates (167.08 kcal using general Atwater factor). Thus, most of the total calories come from fat, followed by carbs, and then protein. ...KH]

[Nick: Aaaarggggghhh. When preparing the spreadsheet that I used to try and determine a possible combination of snacks, I somehow entered 2 kcal/g instead of 9 kcal/g for fat. <homer_Doh!.gif > When I correct this, even using the manufacturers' approximate nutrition information, the combination leading to 403 pops right out at me. Apologies for my incompetence on this point. ]
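[Nick, continued: As a sanity check on my corrected spreadsheet, the sum can be reproduced in a few lines. The grams per package are the USDA figures quoted in the reply above, with general Atwater factors throughout (4 kcal/g for protein and carbohydrate, 9 kcal/g for fat):

```python
# Grams of (protein, fat, carbohydrate) per package, from the reply above.
snacks = {
    "peanuts (28 g)":     (6.63, 13.90, 6.02),
    "crackers (39 g)":    (4.21, 9.55, 23.01),
    "applesauce (113 g)": (0.19, 0.11, 12.74),
}
protein_g = sum(p for p, f, c in snacks.values())  # 11.03 g
fat_g = sum(f for p, f, c in snacks.values())      # 23.56 g
carb_g = sum(c for p, f, c in snacks.values())     # 41.77 g

# Convert to kcal with general Atwater factors.
kcal = (round(protein_g * 4, 2), round(fat_g * 9, 2), round(carb_g * 4, 2))
print(kcal)  # (44.12, 212.04, 167.08): fat first, then carbohydrate, then protein
```

With the correct 9 kcal/g for fat, the combination leading to roughly 403 kcal, with fat dominating, indeed pops right out.]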

Nutrition information for Planters salted peanuts snack package (source), showing total grams of protein, fat, and carbohydrate. The corresponding calorie amounts would be protein, 7 x 4 = 28 kcal; fat, 14 x 9 = 126 kcal; carbohydrates, 5 x 4 = 20 kcal.

The participants

Participants are identified in the data by sequentially numbered labels from ADL001 through ADL021. That represents a span of 21 unique values, but there are no records with the label ADL011. Whether this is due to an error in assigning a label or a participant dropping out is not clear; however, there is no mention in the article of anyone dropping out of the study.

[KH... ADL011 declined to participate in the study after their successful screening visit when they were assigned their subject number. No participants dropped out or were withdrawn from the study after admission. ...KH]

Participant ADL006 (male) had a baseline BMI of 18.050 kg/m², which is below the minimum specified in the inclusion criteria on pp. e1–e2 of the article (18.5 kg/m²). That is, on the authors' own terms it seems that he ought to have been excluded from the study.

[KH... This participant met inclusion criteria at their screening visit, but their starting BMI was lower once admitted for the study. ...KH]

Participant ADL020 (female) had a baseline BMI of 26.853. During her 14 days on the unprocessed diet she consumed an average of just 836 kcal/day and lost a total of 4.3 kg (9.4 lbs) in weight, accounting on her own for nearly a quarter (23.7%) of the total weight loss of the sample on the unprocessed diet. On day 12 of the same diet she obtained 22% of her calories (128 kcal out of 578 kcal total) from carbohydrates, which was the lowest daily percentage of any participant on any day on either diet in the entire study, whereas on the next day, day 13, she obtained 62% of her energy intake (602 kcal out of 962 kcal total) from carbohydrates, which was the highest daily percentage of any participant on any day on either diet in the entire study. This combination of extraordinary weight loss, very low levels of energy intake, and highly variable eating patterns makes me wonder how much we can generalise from this participant to a broader understanding of the effects of different types of diet on the wider population. It seems to me that some kind of Hawthorne-type effect may have been present here.

[KH... The limitations of our study regarding generalizability were discussed in the manuscript. It is well-known in human nutrition research that individual subjects have large day-to-day diet variability and that there is large individual variability in weight loss. ...KH]

Errors in the data for individual participants

ADL002

The data file intakebymeal contains one record for every meal consumed by participants during the study (breakfast, lunch, dinner, and one record for all of the snacks that they took) containing an assortment of nutritional information about that meal, including the type of diet that the participant was following on that day (and, hence, at each meal). For participant ADL002, however, something strange seems to have happened. The three meals (but not the snacks) that he consumed on days when he was on the processed diet are marked with the "unprocessed" diet flag, and vice versa, for all 14 days of each diet.

Extract from data file intakebymeal showing that participant ADL002 apparently consumed unprocessed meals and processed snacks on the same day. Some columns have been reduced to zero width to enable the image to fit on this web page.


It is not at all clear how this could have happened, because one would expect the data to have been recorded directly at the end of the day in question (either in a spreadsheet or directly into the ProNutra software) such that the type of diet would either have been completed automatically by the system, or obvious based on the records from the preceding day. Certainly one would expect the snacks for any given day to have the same diet code as the three meals. (I believe that the three meals have the wrong diet code and the snacks have the right one, rather than the reverse, based on the fact that the dailybw and dailyintake files both show ADL002 being on the processed diet for the first 14 days of the study and the unprocessed diet for the last 14 days, whereas intakebymeal shows "unprocessed" as the diet for the breakfast, lunch, and dinner records for the first 14 days, and "processed" for the last 14 days.)
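A minimal sketch of this consistency check, with invented records and hypothetical column names (the real SAS file has its own), is as follows: within any participant-day, all four meal records should carry the same diet flag.

```python
import pandas as pd

# One invented participant-day whose snack record disagrees with the meals,
# mimicking the ADL002 pattern described above.
intakebymeal = pd.DataFrame({
    "SUBJECT":   ["ADL002"] * 4,
    "STUDY_DAY": [1, 1, 1, 1],
    "MEAL":      ["breakfast", "lunch", "dinner", "snacks"],
    "DIET":      ["unprocessed", "unprocessed", "unprocessed", "processed"],
})

# Any participant-day with more than one distinct diet code is suspect.
counts = intakebymeal.groupby(["SUBJECT", "STUDY_DAY"])["DIET"].nunique()
inconsistent = counts[counts > 1]
print(inconsistent)  # ADL002, day 1 is flagged
```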

[KH... This error was previously discovered and an erratum was published in October of 2020 that corrected this error and is available here. We realize that we have yet to update the files in the OSF website to correct this previously identified error and apologize for the delay. ...KH]

ADL010

Participant ADL010 has a baseline (day 1, unprocessed diet) weight of 91.97 kg in the data file deltabw but 93.17 kg in the data file baseline. This affects, at least, the results shown in Table S1. If 91.97 kg is the correct weight then the Total mean for weight is correct but the Male mean (79.2 reported, 79.0 actual) and Male SE (6.6 reported, 6.5 actual) are not. If 93.17 kg is correct then the Male mean and SE are correct, but the Total mean (78.2 reported, 78.3 actual) isn't. I have not evaluated the effect of this discrepancy on the headline results of the study, but given that the total weight loss of all 20 participants on the unprocessed diet was 18.07 kg, a difference of 1.40 kg would seem to be potentially quite important.

ADL010's weight on day 2 (versus day 1) is recorded as 93.17 kg in deltabw, so one possibility is that for this participant only, the copying process that generated the baseline table somehow picked up the day 2 value rather than the day 1 value. Interestingly, according to that same file, this participant's weight fell back again to exactly 91.97 kg on day 3, which seems like quite a strong yo-yo effect.


Weight of participant ADL010 in the data files baseline (top) and deltabw (bottom).

[KH... The baseline information in Table S1 contains body composition measurements obtained by DXA. All of the subjects except ADL001 and ADL010 had their first DXA measurement on day 1, but ADL001 and ADL010 were measured on day 2. For ADL001, their body weight measurements were the same on days 1 and 2, but ADL010 had different weights on these days. Therefore, the body weight measurement on day 2 for ADL010 was included in the baseline information to correctly correspond to the day of the DXA measurement. ...KH]

As with several other issues raised in this blog post, it is not clear how this discrepancy could have arisen with any kind of systematic processing of the study data from raw observations. If values were copied manually across the various data files, one wonders how many other transcription errors might be lurking.

Other oddities in the data

As mentioned above, the data file intakebymeal contains a record for each meal (plus snacks), with information such as macronutrient and total calories, free water consumption, the total mass of the food consumed, etc. Meanwhile, the data file dailyintake has a record for each day's consumption for each participant, broken down similarly. One would therefore expect the values in the four records in intakebymeal to sum to the values in the corresponding record in dailyintake. Curiously, however, this is not the case. Indeed, while the energy intake (EI) field in dailyintake matches the sum of the per-meal EI values in intakebymeal to within 0.05 kcal in every case (once the diet code error for participant ADL002, discussed above, has been corrected), the calories for protein, fat, and carbohydrates from the four meal records each day frequently sum to a total that is some way from the equivalent values in the daily record.


Per-meal (top, with sum for all four meals under "Total") and per-day intake for participant ADL001 on the first day of the processed diet. Note that while the total energy intake ("EI") from the meals is identical to within 0.01 kcal, the total for each of the macronutrients (protein, fat, and carbohydrates) is different by between 9 and 26 kcal. Some columns have been reduced to zero width to enable the image to fit on this web page.
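The cross-file check itself can be sketched as follows, again with invented numbers and hypothetical column names: the per-meal energy intake values, summed within each participant-day, should reproduce the corresponding dailyintake record.

```python
import pandas as pd

# Four invented meal records for one participant-day...
intakebymeal = pd.DataFrame({
    "SUBJECT":   ["ADL001"] * 4,
    "STUDY_DAY": [1] * 4,
    "EI":        [520.11, 710.25, 655.40, 310.02],  # kcal per meal
})
# ...and the matching daily record, whose EI equals the sum of the four.
dailyintake = pd.DataFrame({
    "SUBJECT": ["ADL001"], "STUDY_DAY": [1], "EI": [2195.78],
})

meal_totals = intakebymeal.groupby(["SUBJECT", "STUDY_DAY"], as_index=False)["EI"].sum()
check = meal_totals.merge(dailyintake, on=["SUBJECT", "STUDY_DAY"],
                          suffixes=("_meals", "_daily"))
max_diff = (check["EI_meals"] - check["EI_daily"]).abs().max()
assert max_diff < 0.05  # EI agrees; the macronutrient columns often do not
```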

[KH... We have been able to reproduce this problem using the data from several subjects and it appears to be an issue with the ProNutra software. We have contacted the manufacturer to identify the reason for the problem but have yet to receive a reply. However, we agree with the blogger that the magnitude of the discrepancy is very small (tens of calories) and we note that it does not affect the primary study outcome of total energy intake. This issue may be related to the next problem below. ...KH]

A related problem is that, within intakebymeal, the three macronutrient calorie observations for a meal frequently do not sum to the overall energy intake from that same meal. A spectacular example of this is the dinner of participant ADL005 on day 3 of the unprocessed diet, for which the macronutrient calories sum to 1235.54 kcal while the total energy content is shown as 1720.21 kcal—a discrepancy of 484.67 kcal.

Per-meal total and per-macronutrient calories for participant ADL005 on day 3 of the unprocessed diet. Some columns have been reduced to zero width to enable the image to fit on this web page.


A total of 639 of the 2,240 participant x day x meal records in intakebymeal suffer from this problem, whereas none of the records in dailyintake do. Put simply, a large number of the per-meal macronutrient values in the intakebymeal data file appear to be incorrect. Interestingly, all of these discrepancies are on the positive side—that is, when the reported overall energy intake differs substantially from the total of the energy intake from the macronutrients, the former is always larger—suggesting that whatever process is responsible for these discrepancies might not be entirely random.
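The within-record check is simple enough to sketch, using the ADL005 dinner as a worked example. The per-macronutrient split below is invented; only the total (1720.21 kcal) and the macronutrient sum (1235.54 kcal) come from the file.

```python
def flag_macro_discrepancies(records, tol=0.05):
    """Return (meal_id, discrepancy) for records whose three macronutrient
    calorie fields differ from the reported total EI by more than tol kcal."""
    out = []
    for meal_id, ei, protein, fat, carb in records:
        diff = ei - (protein + fat + carb)
        if abs(diff) > tol:
            out.append((meal_id, round(diff, 2)))
    return out

# ADL005, day 3 of the unprocessed diet, dinner: the macronutrient split
# here is hypothetical, but its sum (1235.54 kcal) is as recorded.
records = [("ADL005-d3-dinner", 1720.21, 323.90, 418.56, 493.08)]
print(flag_macro_discrepancies(records))  # [('ADL005-d3-dinner', 484.67)]
```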

[KH...  We noticed this problem with the meal data (and not the daily data) when preparing our correction published in Cell Metabolism in October of 2020.  We identified that this was an error in the ProNutra software that listed the fraction of calories coming from all three macronutrients as 0% while correctly providing a value for the total calories for the following food items:                 
Garlic, raw
Lemon juice, fresh squeezed
NutriSource Fiber
OLD FOODS- Oil, olive (Nina)
Oil, olive
Oil, olive (Nina)
Oranges, raw
Pepper, black (Monarch)
Salsa (del Pasado)
Tomatoes, raw

We contacted the manufacturer of ProNutra at the time, but we have yet to receive a satisfactory explanation for this error.  Nevertheless, we corrected these data in the erratum published in Cell Metabolism in October of 2020. We realize that we have yet to update the files in the OSF website to correct this previously identified error and apologize for the delay. ...KH]

The adjusted weight data

I mentioned earlier that the OSF repository for the project contains two ZIP files. The second of these, uploaded after the article was published, includes an extra data file named deltabcadj14, and the SAS code has been extended with a few lines that analyse this file. This code seems to be quite important as it claims to generate the results for figure 3D of the article, which presents what are arguably the headline findings of the study: a mean weight gain of 0.9 kg per participant on the processed diet and a mean weight loss of 0.9 kg on the unprocessed diet. The code file contains this comment:

Update1: Body composition changes presented in Figure 3D are adjusted for 14 days because the body compositions were not measured exactly 14 days apart. In the previous version of SAS code and data, such adjustment was not provided. Here we have updated the SAS code at the section "data for figure 3D" and added a dataset "DeltaBCadj14"; 

It is not clear what adjustments were performed to make this new data file. The extra code provided merely re-runs the comparisons of before/after weight, fat mass, and fat-free mass for the two types of diet, using the adjusted data. When the new code is run, it produces results for the mean weight loss and gain that are around 20% different from the originals; had these numbers been available when the article was published, the authors would presumably have reported a mean gain of 0.8 kg (to one decimal place) on the processed diet and a mean loss of 1.1 kg on the unprocessed diet.


Comparison of pre- and post-study weights (first two lines of each panel, for the processed and unprocessed diets, respectively) and fat/non-fat mass, using the original (top) and adjusted (bottom) data. The output from the original data file contains a descriptive label for each line, which I have removed here to allow the figures in the tables to appear in the same size font for both images.


Interestingly, the sample size for fat mass and fat-free mass on the unprocessed diet is higher with the adjusted data than the original data. The data file deltabc is missing these values for participant ADL002, whereas deltabcadj14 is not. Thus, whatever the adjustment process was, it seems to have extrapolated or interpolated in some way whatever data relating to fat mass might have been missing for this participant, such that he could now be included. (I assume that fat-free mass is calculated as weight minus fat mass, so that only one missing value needs to have been inferred in this way.)

I wonder if this adjustment might be an attempt to compensate for the issue that I raised earlier under the heading "The exact length of the study". But if that is the case, it is not clear why it would be necessary to adjust the values for both diets for each participant. After all, the start of day 15 of the study—the day on which the participants changed to the other diet—ought to correspond to exactly 14 days after they were weighed on day 1. (See also my section "The exact length of the study", above.)

The article states (p. e2) that participants were weighed at 6am every day. If it turns out that they were weighed substantially later on day 1 (or earlier on the last day), the question then arises of whether they skipped one or more meals on that day, although there are records for every scheduled meal in intakebymeal. On the other hand, if they were weighed only an hour or so late, the adjustment hardly seems necessary, especially since the Welch Allyn Scale-Tronix 5702 weighing scale that was used for the study has a precision of only 0.1 kg (a fact that I confirmed by e-mail correspondence with the manufacturer; see also Ethan and Sarah's post, which explores the consequences of this constraint in more detail). The adjusted values are reported to 10 or more decimal places, which—assuming that the adjustment was indeed a function of the difference between the actual elapsed time from the first to last measurement, and exactly 14 days—suggests that the time at which participants' weight and fat mass was measured must have been recorded to a very high degree of precision indeed.

[KH... The question about the precision of the body weight measurements is addressed in our response to the blog post by Ethan and Sarah Ludwin-Peery. These apparent high-precision body weight measurements and the statistical anomalies noted by Ethan and Sarah Ludwin-Peery are explained by subtracting pre-weighed pajamas worn during the body weight measurements as described in the manuscript Method Details section. ...KH]

Two questions arise from this operation:
  • First, it would be interesting to know what the adjustment process was. It seems to have been quite powerful, because some of the differences between the original and adjusted values are substantial. For example, for participant ADL014, the loss in weight on the unprocessed diet has been adjusted from 0.10 kg to 0.95 kg, and for ADL005 the equivalent loss has gone from 0.26 kg to 1.79 kg; participant ADL019's gain of 0.30 kg on the unprocessed diet has been adjusted to a loss of 0.24 kg, while participant ADL021's loss of 0.30 kg on the processed diet has been adjusted to a gain of 0.16 kg. These changes appear to affect principally the fat-free mass rather than the fat mass, which in numerous cases (8 out of 20 on the processed diet, 2 out of 19 on the unprocessed diet) is identical to two decimal places after adjustment. For example, participant ADL010's original weight gain of 3.60 kg on the processed diet becomes 2.69 kg in the adjusted file, but his fat mass did not change at all.
  • Second, if the authors believe that these adjusted figures provide a better estimate of the effects of the diets, one might wonder why they have not submitted a correction, updating the claims about weight loss that featured in the abstract of their article, rather than allowing this important new information to languish in an OSF repository. Otherwise it is not clear what the point of performing these "adjusted" analyses was.
[KH... The results in the published manuscript correspond to the unadjusted data and code that was originally deposited on the OSF website. The adjustments in the second file on the OSF website were performed to address the fact that the DXA body composition measurements were not performed at exactly the same time points for all subjects. Furthermore, subject ADL002 was missing one DXA measurement during the unprocessed diet period. The adjusted data attempt to estimate the mean changes in body composition that would have occurred had the DXA measurements been aligned on day 14. To do this, we calculated the slope of the best fit regression line to the fat mass measurements over each diet period to estimate the fat mass change on day 14. The DXA measurement at the end of the first diet period was also used as the fiducial measurement for the start of the second diet period and subject ADL002 contributed only 2 fat mass measurements during the unprocessed diet period. The corresponding body weight measurements on those days were used to calculate the fat-free mass estimates by subtracting the estimated fat masses on those aligned days. This explains the minor differences between mean results reported in the original file deposited in OSF (which correspond to the results published in the manuscript) and the first updated file. The mean results are not materially different between these analyses, and the adjusted data merely address the potential criticism that the DXA measurements were not all conducted on the same days in all subjects. The reported data in the manuscript are not in error. ...KH]
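For illustration, the adjustment as described in Kevin Hall's reply might look something like this (the measurements below are invented, and the actual SAS implementation may differ in detail):

```python
import numpy as np

# Fit a least-squares line to one participant's fat mass measurements
# within a diet period and evaluate the change at exactly 14 days.
dxa_days = np.array([0.0, 7.0, 15.0])     # actual DXA days (not exactly 14 apart)
fat_mass = np.array([25.0, 24.6, 24.1])   # kg, invented

slope, intercept = np.polyfit(dxa_days, fat_mass, 1)
fat_change_14d = slope * 14.0             # estimated fat mass change over 14 days
print(round(fat_change_14d, 3))           # ~ -0.841 kg here

# The fat-free mass change on the aligned day would then follow by
# subtracting the estimated fat mass from the body weight on that day.
```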

Conclusion

Hall et al.'s article seems to have had a substantial impact on the field of nutrition research. However, both Ethan & Sarah's post and this one raise a number of concerning questions about the reliability of this study. There seem to be problems with the design, the data collection process, and the analyses. I only looked at about half of the 23 data files, so there may be other problems lurking. I hope that the authors and the editors of Cell Metabolism will take another look at this study and perhaps consider issuing a correction of some kind.

[KH... A correction was published in Cell Metabolism in October of 2020 and is available here.
This correction regards an error described by the blogger that we previously independently discovered. Many of the other questions raised above are the result of misinterpretations of the data and the study. We hope that we have now clarified these issues. One remaining question appears to involve the ProNutra software used to calculate the individual macronutrient amounts, but the discrepancies are very small and do not affect the primary study outcome. ...KH]

My code and data

I have made my R analysis code, which reproduces most of the results reported above, available here. Some of my results can probably best be checked by examining the data files in a spreadsheet, so my code also includes a loop (which you need to enable, following what I hope are clear instructions) that will export the original SAS data files to CSV format. Also included at the same location is a spreadsheet file named snacks.xls which summarises the nutrition information for the snacks that were served on the processed diet, plus the OSF screenshot and the SAS code and results files mentioned earlier.
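If you prefer Python to R for the export step, a minimal sketch using pandas (which reads SAS .sas7bdat files directly) would be:

```python
import glob
import os
import pandas as pd

def export_sas_to_csv(folder="."):
    """Convert every .sas7bdat file in `folder` to a CSV alongside it.
    Returns the list of CSV files written."""
    converted = []
    for path in glob.glob(os.path.join(folder, "*.sas7bdat")):
        csv_path = path[:-len(".sas7bdat")] + ".csv"
        pd.read_sas(path).to_csv(csv_path, index=False)
        converted.append(csv_path)
    return converted
```

Point it at the folder containing the unzipped OSF data files; it is a no-op (returning an empty list) if no SAS files are present.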

Acknowledgements

Thanks to Andrew Althouse and James Heathers for help with the analyses, and to Ethan and Sarah Ludwin-Peery for sharing their discoveries about the Hall et al. article and some very interesting discussions about what it all might mean.

Note on copyright

I believe that the reproduction of two images in this post (Figure 1 of Hall et al.'s article and the Planters nutrition information label) constitutes fair use.

Footnotes

(*) I have put these terms in quote marks to emphasise that they have a specific technical meaning. I don't know if that is a good idea, though; perhaps it looks like I am putting Dr Evil-style air quotes around them. That isn't my intention.