02 July 2023

Strange numbers in the dataset of Zhang, Gino, & Norton (2016)

In this post I'm going to be discussing this article:

Zhang, T., Gino, F., & Norton, M. I. (2016). The surprising effectiveness of hostile mediators. Management Science, 63(6), 1972–1992. https://doi.org/10.1287/mnsc.2016.2431
You can download the article from here and the dataset from here.

[[ Begin update 2023-07-03 16:12 UTC ]]
Following feedback from several sources, I now see how it is in fact possible that these data could have been the result of using a slider to report the amount of money being requested. I still think that this would be a terrible way to design a study (see my previous update, below), as it causes a loss of precision for no obvious good reason compared to having participants type a maximum of 6 digits, and indeed the input method is not reported in the article. However, if a slider was used, then with multiple platforms the observed variety of data could have arisen.
In the interests of transparency I will leave this post up, but with the caveat that readers should apply caution in interpreting it until we learn the truth from the various inquiries and resolution exercises that are ongoing in the Gino case.
[[ End update 2023-07-03 16:12 UTC ]]

The focus of my interest here is Study 5. Participants (MTurk workers) were asked to imagine that they were a carpenter who had been given a contract to furnish a number of houses, but decided to use better materials than had been specified and so had overspent the budget by $300,000. The contractor did not want to reimburse them for this. The participants were presented with interactions that represented a mediation process (in the social sense of the word "mediation", not the statistical one) between the carpenter and the contractor. The mediator's interactions were portrayed as "Nice", "Bilateral hostile" (nasty to both parties), or "Unilateral hostile" (nasty to the carpenter only). After this exercise, the participants were asked to say how much of the $300,000 they would ask from the contractor. This was the dependent variable to show how effective the different forms of mediation were.

The authors reported (p. 1986):

We conducted a between-subjects ANOVA using participants’ demands from their counterpart as the dependent variable. This analysis revealed a significant effect for mediator’s level and directedness of hostility, F(2, 134) = 6.86, p < 0.001, partial eta² = 0.09. Post hoc tests using LSD corrections indicated that participants in the bilateral hostile mediator condition demanded less from their counterpart (M = $149,457, SD = 65,642) compared with participants in the unilateral hostile mediator condition (M = $208,807, SD = 74,379, p < 0.001) and the nice mediator condition (M = $183,567, SD = 85,616, p = 0.04). The difference between the latter two conditions was not significant (p = 0.11).

Now, imagine that you are a participant in this study. You are being paid $1.50 to pretend to be someone who feels that they are owed $300,000. How much are you going to ask for? I'm guessing you might ask for all $300,000; or perhaps you are prepared to compromise and ask for $200,000; or you might split the difference and ask for $150,000; or you might be in a hurry and think that 0 is the quickest number to type.

Let's look at what the participants actually entered. In this table, each cell is one participant; I have arranged them in columns, with each column being a condition, and sorted the values in ascending order.


This makes absolutely no sense. Not only did 95 out of 139 participants choose a number that wasn't a multiple of $1,000, but also, they chose remarkably similar non-round numbers. Twelve participants chose to ask for exactly $150,323 (and four others asked for $10,323, $170,323, or $250,323). Sixteen participants asked for exactly $150,324. Ten asked for $149,676, which interestingly is equal to $300,000 minus the aforementioned $150,324. There are several other six-digit, non-round numbers that occur multiple times in the data. Remember, every number in this table represents the response of an independent MTurk worker, taking the survey in different locations across the United States.
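To make the tally concrete, here is a minimal Python sketch. The response list below is a hypothetical stand-in built only from the repeated values quoted above plus a few round numbers; it is not the full dataset of 139 responses, which you would load from the downloadable data file.

```python
from collections import Counter

# Hypothetical stand-in for the demand column: the repeated non-round values
# discussed in the post, plus a few round responses for contrast.
responses = [150323] * 12 + [150324] * 16 + [149676] * 10 + [300000, 200000, 150000, 0]

# Responses that are not a multiple of $1,000
non_round = [r for r in responses if r % 1000 != 0]
print(f"{len(non_round)} of {len(responses)} responses are not multiples of 1,000")

# Exact duplicates among the non-round values
dupes = {value: count for value, count in Counter(non_round).items() if count > 1}
print(dupes)

# The curious mirror-image pair noted in the post
print(149676 + 150324)  # 300000
```

With real, independently typed numbers you would expect the duplicate tally to be nearly empty; here it is dominated by a handful of six-digit, non-round values.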

To coin a phrase, it is not clear how these numbers could have arisen as a result of a natural process. If the authors can explain it, that would be great.

[[ Begin update 2023-07-02 12:38 UTC ]]

Several people have asked, on Twitter and in the comments here, whether these numbers could be explained by a slider having been used, instead of a numerical input field. I don't think so, for several reasons:
  1. It makes no sense to use a slider to ask people to indicate a dollar amount. It's a number. The authors report the mean amount to the nearest dollar. They are, ostensibly at least, interested in capturing the precise dollar amount.
  2. Had the authors used a slider, they would presumably have done so for a very specific reason, which one would imagine they would have reported.
  3. Some of the values reported are 150,323, 150,324, 150,647, 150,972, 151,592, and 151,620. The differences between consecutive values are 1, 323, 325, 620, and 28. In another sequence, we see 180,130, 180,388, and 180,778, separated by 258 and 390; and in another, 200,216, 200,431, 200,864, and 201,290, separated by 215, 433, and 426. Even if we assume that the difference of 1 is a rounding error, for a slider to have enough granularity to indicate all of those numbers while also covering the range from 0 to 300,000, it would have to be many thousands of pixels wide. Real-world on-screen sliders typically run from 0 to 100 or 0 to 1,000, and at 400 or 500 pixels wide, each pixel represents perhaps 0.20% or 0.25% of the available range.
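Point 3 can be made concrete with a back-of-the-envelope calculation. The 500-pixel width below is an assumption for illustration, not a measured property of any Qualtrics slider:

```python
# Granularity of a hypothetical on-screen slider, assuming a one-pixel
# drag is the smallest possible adjustment.
slider_range = 300_000      # 0 to $300,000
slider_width_px = 500       # assumed on-screen width in pixels

step = slider_range / slider_width_px   # dollars per pixel
print(f"Smallest adjustment: ${step:.0f} per pixel")

# To distinguish values $1 apart (e.g., 150,323 vs. 150,324),
# the slider would need one pixel per dollar across the whole range:
width_needed = slider_range / 1
print(f"Pixels needed for $1 resolution: {width_needed:,.0f}")
```

At $600 per pixel under these assumptions, differences of 1 or 28 between observed responses are far below what such a slider could register.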
Of course, all of this could be checked if someone had access to the original Qualtrics account. Perhaps Harvard will investigate this paper too...

[[ End update 2023-07-02 12:38 UTC ]]



Thanks to James Heathers for useful discussions, and to Economist 1268 at econjobrumors.com for suggesting that this article might be worth looking at.


  1. This is crazy. How was this data even generated? So obviously fake.

  2. Looks very strange, indeed. I first wondered whether frequent numbers in the table like 150323 were a product of some sort of a scale that evenly divided 300k into a number of steps. But there are numbers like 150324 and 150647, and they make this impossible. I wouldn't be surprised if this is another case of fraud, really.

  3. Such un-round numbers may indeed happen if the researcher used a slider to ask for the choice.

    1. I don't think so. See the update that I have added to the post.

  4. Could this have arisen from using a slider instead of having to type in the number? I haven't read the paper so apologies if this should be obvious

    1. I don't think so. See the update that I have added to the post.

  5. It’s a slider. You can test this (it’s clear you didn’t actually test the clearest explanation before your post). Ask the same question on MTurk. If the slider is set with 0 and 300,000 as end points, then the design of their slider makes it REALLY hard to get zero-ending values unless you’re at the start or end of the scale. If you simulate the data in Qualtrics you’ll get almost no zero-ending values. Even when people gave you a hypothesis, you didn’t even try to test it. C’mon, I’m all for better science, but you have to be better too.

    1. Well, this person https://www.econjobrumors.com/topic/francesca-gino/page/15?replies=310#post-9056423 reports that the midpoint of the Qualtrics slider is easy to hit, and the minimum incremental value is $698.

      And again, using a slider makes no theoretical sense, and is neither justified nor reported by the authors.

      But I tell you what. Republish your comment with your real name and affiliation or other similar information — or send me an e-mail from a non-anonymous account saying that you accept these conditions — and then if it turns out that these data are genuine and came from a slider, I will give $100 to a charity of your choice, and vice versa. Deal? Otherwise we can just wait for the inquiry. I know what I expect to see when Harvard downloads the original Qualtrics data.

    2. I think using a slider scale here is just a UX choice. Most researchers using survey tools tend to go with sliders because it’s a better way of capturing data than open-ended responses, which participants on MTurk tend to skip. Nothing to me is odd here.

      I am all for auditing Gino’s work but let’s not go on a witch-hunt and attack her students or co-authors.

    3. My prior for this being fraud is quite high. That being said: I don't think it's useful to explore how sliders work in 2023 when this study was done something like 10 years ago. Qualtrics has changed a lot in that time.

    4. The added analysis of "slider theory" is weak/wrong:

      "It makes no sense to use a slider to ask people to indicate a dollar amount. It's a number. The authors report the mean amount to the nearest dollar. They are, ostensibly at least, interested in capturing the precise dollar amount."

      Many researchers use sliders on Qualtrics by default, or as a way to prevent "out of range" responses that you would get from an open text box. A slider is one way to allow precise responses while easily communicating the range of allowed responses. Not saying it is the best (or even a good way) to measure numerical responses, but many researchers would use a slider for this type of question.

      It's also worth noting that when researchers set up a slider in Qualtrics, they also have to specify a "default" position for the slider. This is usually either at the 0 or 50% mark on the slider. BUT if there is a default value indicated and the question is forced response, then many participants will respond by slightly adjusting the slider away from the default response. Under these conditions it's hard to click on the slider and still retain the exact default value. IF the default was 50% AND the question was forced response, this would lead to the cluster of values slightly above and slightly below the 50% mark.

      I would still not call myself a "slider truther", even given the above. There is too much uncertainty about how Qualtrics generates steps on the slider. There's still more than enough reason here to verify the raw data and/or original OSF file on Qualtrics.

  6. Thank you for updating this post to accurately reflect that a sliding scale can EASILY explain these data. It only took numerous people confronting you for you to realize this fact. Please stop using your pernicious platform to spread misinformation and ruin academic careers. Personally, I think you should be sued for slander by Zhang and Norton, because there's no way these authors (or any others) can effectively salvage their reputation once data fraud accusations, even if unfounded, are raised in such a public way.

  7. First, I'll just say that I'm disappointed with this blog post. It's really, really poor form for you to sort data, say "ooo this looks strange", and do no further analysis. How about some really, really simple things: the presence of "odd" numbers doesn't seem to differ across conditions, so how does the presence of "odd numbers" help the researchers at all?

    Further, your arguments for the slider are weak. Using a slider is *not* a theoretical choice; I've never heard of a theory on sliders, have you? Also, it's easy to create a slider via JavaScript in Qualtrics, so there's even more specificity that can be created beyond the Qualtrics defaults that are being mentioned here: https://hbs.qualtrics.com/jfe/form/SV_5yUjIPPr9sWEzNY

    My genuine attempts to reach the mid-point generated the following:

  8. Why would you want a slider for these numbers? An open-ended question does the trick with just a few standard options enabled, such as forced response, requiring numeric input, and setting a minimum and maximum. At the very least, a 0 to 300,000 slider is a poor design choice.