14 September 2021

Some problems in a study of Nitazoxanide as a treatment for Covid-19

Nitazoxanide (NTZ) is an antiparasitic medication that has been proposed as a possible treatment for Covid-19. (The currently most widely-touted repurposed drug for Covid-19, Ivermectin, is also an antiparasitic.) The effects of NTZ were recently examined in this paper:

Blum, V. F., Cimerman, S., Hunter, J. R., Tierno, P., Lacerda, A., Soeiro, A., Cardoso, F., Bellei, N. C., Maricato, J., Mantovani, N., Vassao, M., Dias, D., Galinskas, J.,
Janini, L. M. R., Santos-Oliveira, J. R., Da-Cruz, A. M., & Diaz, R. S. (2021). Nitazoxanide superiority to placebo to treat moderate COVID-19: A pilot prove [sic] of concept randomized double-blind clinical trial. EClinicalMedicine, 37, 100981. https://doi.org/10.1016/j.eclinm.2021.100981

The authors reported (p. 9) that "The study protocol and all data files from the study results and
programs to process the files are deposited in a publicly available repository on the GitLab system for this clinical trial", with a link. To me, this implies that the data and code should have been available on the date when the article was published, i.e., June 27, 2021. However, it appears that no files were actually uploaded to that repository until August 29, 2021, which perhaps not uncoincidentally was shortly after my colleague Kyle Sheldrick had asked the authors to post their data.

The GitLab repository contains the data divided into two sets, one with the participant characteristics ("Demographic_NTZ_Trial_Data") and one with the observations from each day of the trial ("Clinical_Laboratory_NTZ_Trial_Data"). Each of these is ostensibly provided in two formats, as both an Excel sheet (with the extension ".xlsx") and as the binary dump of an R data structure (".rds"). For "Demographic_NTZ_Trial_Data" the Excel and R versions appear to correctly contain the same data. Unfortunately, the two R binary files are byte-for-byte identical to each other, suggesting that the wrong file has been uploaded for “Clinical_Laboratory_NTZ_Trial_Data"; hence, I relied on the Excel version of both files.

Also perhaps of note is that the metadata (“File/Properties/Summary” and “File/Properties/Statistics”) of the Excel files reveal that they were created by “Apache POI” on August 29, 2021 at 19:13:34 UTC (“Demographic_NTZ_Trial_Data”) and 20:27:18 UTC (“Clinical_Laboratory_NTZ_Trial_Data”). The file creator name “Apache POI” suggests that these files were created by the R function write.xlsx(), which leaves this “signature” from the Java library that it uses to create Excel files. This in turn suggests that there ought to be a more fundamental version of the data files somewhere, perhaps in CSV format. It would be nice if the authors could share this “rawer” form of the data with us, rather than the output of an R program after some unknown amount of preprocessing.

The GitLab repository contains only the above-mentioned data files, plus a "data dictionary" containing a one-sentence description of each variable, and a brief "Readme"file. The authors have therefore apparently still not fulfilled their promise to provide the study protocols or their analysis code.

All of the following analyses are based on the two Excel files, in which several of the variables are labeled and/or coded in Portuguese; I believe that I have appropriately interpreted these labels, so that for example the variable “sexo” with the value “Masculino” corresponds to a male patient (reported by the authors in English as "Masculine").

The table of results

The basis of the majority of the authors' claims regarding the superiority of NTZ is the main table of results (Table 1, p. 5). Unfortunately, this table seems to contain a large number of errors and inconsistencies with the supplied data files. Some of these are more serious than others; I will describe them here from the top to the bottom of the table, rather than attempting to order them by some subjective measure of severity.

  1. Table 1 reports that there were 7 men ("Gender: Masculine") and 18 women ("Gender: Feminine") in the NTZ group, versus 8 men and 17 women in the placebo group. However, in the data these numbers show the opposite result, not by group but by gender: In the NTZ group, 18 patient records have the value "Masculino" and 7 have "Feminino" for the variable named "sexo", whereas in the placebo group, 17 records have "Masculino" and 8 contain "Feminino". That is, the data say that two-thirds of the patients were men and the table of results says that two-thirds were women. The article text does not discuss the sex or gender of the patients as a group, but it does not seem unreasonable to me that one of the 17 authors of the article might have been expected to notice that they reported the proportions of men and women entirely incorrectly.
  2. The variables "Race" and "Age group" list counts of the patients in each of these categories, and express these as percentages. However, these percentages are not of the number of people in each experimental condition, but of the total number of patients. For example, the first value for Race is White: 21 (42%), but 21 is 84% of the 25 people in the NTZ condition.
  3. The calculation of "RT-PCR Difference Day 1 - 21" does not seem to be correct. In the NTZ group, there are no records with measurements of PCR viral load at day 21, so there is no way to calculate the mean or SD of the difference between the two timepoints. In the placebo group there are four such records; I calculate the mean difference between the day-1 and day-21 viral load for these cases to be 0.79 with an SD of 3.45.
  4. The calculation of mean and SD values for "Removed from Supplemental O2" does not seem to be possible, as there is no variable corresponding to this in the data file. (The authors provided a data dictionary in their GitLab repository, which does not apparently mention any variable corresponding to supplemental oxygen.) Perhaps this variable is based on the clinical condition score changing from 3 to a lower value on a subsequent day, but this only occurs in three cases in the data file (patient 101008, day 7, change from “3” (supplemental O₂) to “2” (hospitalised); patient 103035, day 7, change from “3” to “2”; patient 104001, day 14, change from “3” to “2”), and all three of these participants are in the placebo group. It does not need to be possible to calculate a meaningful mean or standard deviation for these values even for the placebo group, let alone for the NTZ group where there are zero cases.)
  5. The p values for "Viral Load at Day 1" and "O2 Saturation at Day 1" in the table are both reported as 0.984. Using R's wilcox.test() function, I calculated these values as 0.168 and 0.381 respectively. The authors also mention (p. 4) that they used the Kruskal-Wallis test in some cases; with R's kruskal.test() function, I obtained p values of 0.167 and 0.380 respectively.
  6. Many of the numbers for the patient clinical condition scores in the lines "Patients Hospitalized" (6/8), "Oxygen supplementation" (8/8), and "Death" (3/6) are different between the data file and the article. Interestingly, all eight numbers for "Invasive mechanical ventilation" seem to be correct. See my annotated version of the table for more information.
  7. It is not clear how the p values for the patient clinical conditions were calculated. This could have been a simple two-count chi-square test (e.g., in R, chisq.test(c(2, 1)) gives a p value of 0.564, which appears a couple of times in the table), or it could perhaps have been a 2 x 2 contingency table, with the denominator being the number of patients in the other conditions. In any case, these p values do not appear to match either the calculated or reported frequencies.
  8. The mean and SD values for Lymphocytes at day 1 in the placebo condition and at day 7 in the NTZ condition have been exchanged. That is, according to the data file, the mean lymphocyte value on day 1 in the placebo condition was 1162.36, and the mean value on day 7 in the NTZ condition was 944.88 (a decrease on the day 1 value, rather than the increase that was reported in Table 1).
  9. For the measures "D-Dimer", "US- C Reactive Protein", "TNF-α", "IL-6", and "IL-8", there are several problems. First, the table reports values at day 10, but there are no records of any kind with this day in the data files. Second, for "TNF-α", "IL-6", and "IL-8", the table reports mean values on day 1, but none of the patient records for day 1 have any values other than "#N/A" for any of these three variables. Third, none of the remaining mean or SD values come close to the values that I calculated from the data file.
  10. A column in Table 1 shows the ratio between the means or counts in the two groups and describes this as the "Rate ratio". However, there are some rather strange numbers in this column. In many cases the decimal point (or comma) is missing, so that for example the ratio of 7 "Mixed" (race) patients in the placebo group to 3 in the NTZ group is reported as 2333. In other cases, even with a decimal separator assumed, the numbers make little sense. For example, the "ratios" between the numbers of patients on intensive mechanical ventilation at days 4 (3 placebo, 1 NTZ) and 7 (4 placebo, 1 NTZ) are reported as 3333 and 4444. One wonders what kind of analysis code generated these numbers.
  11. The "Rate ratio" values under "Day 21 Difference" for the measures "D-Dimer" and "US- C Reactive Protein" have been exchanged in the table, relative to what is in the data file.

The regression lines

Another set of problems is apparent in Blum et al.'s Figure 2, which the authors claim (p. 6) “shows that the viral load for the NTZ arm of the study dropped slightly faster than the Placebo arm over the 21 days of the study (slope of 1.19 for NTZ against 1.08 for the placebo)”.

First, the variable corresponding to the PCR tests in the data files is the PCR Ct (cycle threshold, cf. the data file “ntz_data_dict_2808.html” in the authors’ GitLab repository), and a higher cycle threshold corresponds to a lower viral load. That is, both of these slopes ostensibly show the viral load increasing over the course of the study, with (on the authors’ account) the NTZ group getting worse to a greater degree than the placebo group. This apparent failure to understand the meaning of their own data might be considered quite concerning.

Second, it can be readily seen that the great majority of the visible data points here are above the two lines, with a considerable number being both above and far off to the right. A moment's thought suggests a solution, which is that very large numbers of data points with a viral load value of zero have been drawn on top of each other. However, in the data file, the lowest PCR value is 17. What seems to have happened is that, as well as 115 actual PCR values, the authors have coded 136 NA values from days 4 through 21—corresponding either to missing tests or patient dropout—as a value of zero. While some of these NA values might correspond to negative PCR tests, others must be missing data as they occur later than the date at which the patient’s condition code changed to “5” (death). When I omitted these NA values, I obtained the following plot:

On its own terms (but see my next point), this plot shows—rather more plausibly than the published Figure 2—that the PCR cycle threshold increased (and, hence, the viral load decreased) throughout the study, with a bigger decrease in the NTZ group.

The third problem with Figure 2 is that attempting to interpret the slope of a linear regression line (one of the assumptions of which is that each data point is independent of the others) in this way for time series data with multiple repeated measures on the same patients is completely invalid. No conclusions can be drawn on the basis of this model. The authors ought to have used some kind of generalized linear model, accounting for correlation between observations on the same patients and also the censoring effects of patient death or discharge.

Repeated-measures "ANOVA"

Figure 3 of Blum et al.'s article shows the results of analyses that they describe as "Two-way-ANOVA". It is not clear exactly what analyses were performed here. On p. 4 the authors say "Statistical analyses included one-way analyses of variances using the Kruskal-Wallis non-parametric ranks test. Additional statistical questions were addressed with the Wilcoxon Rank Sum Test for numerical variables and Chi-Squared tests for categorical variables." Since the Wilcoxon rank-sum test is sufficient to conduct all of the analyses in Table 2, one might imagine that "Two-way-ANOVA" could include the Kruskal-Wallis test, although that is not appropriate for repeated measures.

In any case, some of the numbers reported in Figure 3 are rather strange. For example, in panels B and D, a significance star is assigned to p values less than 0.5, which does not seem to be a very high bar to meet. Furthermore, the caption for Figure 3 includes the sentence "The MFI of CD38+CD4+ T cells (3A), HLA-DR.+CD4+ T cells (3B), ... are expressed as the median ± standard deviation" (emphasis added), which is a rather unusual combination.

I was unable to reproduce Figure 3 for two reasons. First, there are no data in the files for day 10 or with any label that corresponds to the Y axis of panels E and F. Second, the variables that do exist in the data file with names that resemble the labels of the Y axis of panels A through D (i.e., "cd38_cd4", "cd38_cd8", "hla_dr_cd4", and "hla_dr_cd8") have mean values in the range 0.5–2.0, whereas the Y-axes in Figure 3 are two or more orders of magnitude larger; it is not clear to me if I haven't understood what these variables mean, or if they are simply missing from the data files.

Analyses of individual patients

The authors provided detailed information in the text of their article about the patients who died in the study. However, in some cases it is difficult to identify these individuals in the data, as there are a considerable number of discrepancies between the numbers reported in the text and those in the data file..

  1. Male deaths in the NTZ arm

“Two patients died in the NTZ arm of the study. One of these, patient P3 was a 67-year-old male with systemic arterial hypertension (SAH) and a BMI (body mass index) of 25. … Another patient in the NTZ group, P6, was a 63-year-old male with type 2 diabetes mellitus and dyslipidemia, with a BMI of 31.”

The data contain two records of patients dying in the NTZ arm. One (numerical ID 101004) was aged 77 with a BMI of 24.9, while the other (numerical ID 106016) was aged 63 with a BMI of 30.74. We might tentatively assume that these individuals correspond to the two mentioned in the text. However, 101004, with “systemic arterial hypertension” had baseline systolic blood pressure of 120, whereas 106016, with no mention of hypertension, had a systolic BP of 150. The reported C-reactive protein for the latter also does not match the data. Furthermore, for both of these patients (and indeed for all of the patients who died), IL-6 values are reported, although the IL-6 numbers are NAs in all cases (except for one reading for patient 103004; see next paragraph).

  1. Female deaths in the placebo arm

“P8 was an 88-year-old female with SAH, BMI of 24 … and IL-6 of 4872 MFI”. There is one 88-year-old female patient in the dataset (103004); her BMI is recorded as 28.8. For this patient, unlike all the others who died, there is one IL-6 reading, of 225.5 on day 21. That is the date on which she is reported as having died in the text of the article, although the data file “Demographic_NTZ_Trial_Data” states that she left the trial early (due to death) on day 7.

“Patient P14 was a 65-year-old female with previously not welldefined cardiomyopathy, BMI of 25. … and IL-6 of 4658 MFI. She died on D4...” There are no 65-year-old females in the data, whatever their outcome. By elimination (see next paragraph) it appears that P14 was in fact the patient with numerical ID 106010, aged 76 who according to the data file “Demographic_NTZ_Trial_Data” died on day 6. Again, no IL-6 data were reported for this patient.

“The last patient in the placebo group to die was P43, a 73-year-old female with obesity (BMI of 32). Baseline laboratory evaluation revealed a total lymphocyte count of 930 (total leucocytes13,480), D-dimer of 0.82 mcg/mL, US-RCP of 187.59mcg/L, and IL-6 of 417 MFI. She died on D12...” This appears to be patient 106038, who according to the data file was aged 72 (rather than 73). Again, no IL-6 data were reported for this patient.

  1. Male deaths in the placebo arm

Three deaths among male participants were reported in the placebo arm. Apart from small discrepancies in their ages in two cases (P24, 106012, reported age 55, data file age 55; P27, 106014, reported age 71, data file age 70; P17, 106028, reported age 78, data file age 79), the only obvious discrepancy for these three cases is the reporting of IL-6 data in the text when these values are not present in the data.

Conclusion

Blum et al.'s article contains a large number of both major and minor statistical and reporting errors, some of which are apparent even without the data set. Whether or not these issues are too great to make the article salvageable is a matter for the authors and the journal’s editorial team.

I have made my analysis code, together with full-size versions of the images, available at https://osf.io/uz875/

Thanks to Kyle Sheldrick and Gideon Meyerowitz-Katz for bringing the paper to my attention, and  being the first to notice the problem with the NA values being interpreted as zeroes in Figure 2.


3 comments:

  1. "This apparent failure to understand the meaning of their own data might be considered quite concerning."

    It really totally absolutely might.
    ;-)

    So glad there are people like you, doing this work. A little bit of a shocking read, but very fascinating.

    ReplyDelete
  2. "This apparent failure to understand the meaning of their own data might be considered quite concerning."

    It really totally absolutely might.
    ;-)

    So glad there are people like you, doing this work. A little bit of a shocking read, but very fascinating.

    ReplyDelete
  3. Amazing analysis! Including the zero viral load values in the regression clearly indicates pure incompetence, or bias. There is another study on NTZ done in Brazil that is also very suspicious for the data analysis, however this one passed peer review and was posted on the European Respiratory Journal. I wonder if you could analyze that one as well.
    https://pubpeer.com/publications/3589ECD8BB6AFD69D93D9E40DC39C2#

    ReplyDelete