23 June 2015

Mechanical Turk: Amazon's new charges are not the biggest problem

Twitter was buzzing, or something, this morning, with the news that Amazon is going to change the commission rates that it charges researchers who use Mechanical Turk (henceforth: MTurk) participants to take surveys, quizzes, personality tests, etc.

(This blog post contains some MTurk jargon.  My previous post got far too long because I spent too much time summarising what someone else had written, so I won't explain the basics again here; if you don't know anything about MTurk concepts, read this first.)

The changes to Amazon's rates, effective July 21, 2015, are listed here, but since that page will probably change after July, I took a screenshot:

Here's what this means.  Currently, if you hire 100 people to fill in your survey and want to give them $1 each, you pay Amazon $110 for "regular" workers and $130 for "Masters".  Under the new pricing scheme, this will be $140 and $145, respectively.  That's an increase of 27.3% and 11.5%, respectively.  (I'm assuming, first, that the wording about "10 or more assignments" means "10 or more instances of the HIT being executed, not necessarily by the same worker", and second, that any psychological survey will need more than 10 assignments.)
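As a sanity check on those figures, here is a quick back-of-the-envelope sketch.  The commission percentages (10% old, 40% new for HITs with 10 or more assignments, plus a Masters surcharge) are read off the screenshot above, not from any official Amazon documentation, so treat them as the post's working assumptions:

```python
# Paying 100 workers $1.00 each, under the old and new fee schedules.
payment = 100 * 1.00

old_regular = payment * 1.10   # 10% commission            -> $110
old_masters = payment * 1.30   # 10% + 20% Masters fee     -> $130
new_regular = payment * 1.40   # 40% commission            -> $140
new_masters = payment * 1.45   # 40% + 5% Masters fee      -> $145

def pct_increase(old, new):
    """Percentage increase from old to new."""
    return 100 * (new - old) / old

print(pct_increase(old_regular, new_regular))  # ~27.3% more out of pocket
print(pct_increase(old_masters, new_masters))  # ~11.5% more for Masters
print(pct_increase(10, 40))                    # the commission *rate* rises 300%
```

The last line is the distinction that matters in the next paragraph: the commission rate rises 300%, but the researcher's total bill rises only 27.3% (or 11.5% for Masters).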

Twitter users were quite upset about this.  Someone portrayed this as a "400% increase", which is either a typo, or a miscalculation (Amazon's commission for "regular" workers is going from 10% to 40%, which even expressed as "$10 to $40 on a $100 survey" is actually a 300% increase), or a misunderstanding (the actual increase in cost for the customer is noted in the previous paragraph).  People are talking of using this incident as a reason to start a new, improved platform, possibly creating an international participant pool.

Frankly, I think there is a lot of heat and not much light being generated here.

First, researchers are going to have to face up to the fact that by using MTurk, they are typically exploiting sub-minimum wage labour.  (There are, of course, honourable exceptions, who try to ensure that online survey takers are fairly remunerated.)  The lowest wage rate I've personally seen in the literature was a study that paid over 100 workers the princely sum of $0.25 each for a task that took 20 minutes to complete.  Either those people are desperately poor, or they are children looking for pocket money, or they are people who just really, really like being involved in research, to an extent that might make some people wonder about selection bias.

I have asked researchers in the past how they felt about this exploitation, and the standard answer has been, "Well, nobody's forcing them to do it".  The irony of social psychologists --- who tend not to like it when someone points out that they overwhelmingly self-identify as liberal and this is not necessarily neutral for science --- invoking essentially the same arguments as exploitative corporations for not paying people adequately for their time, is wondrous to behold.  (It's not unique to academia, though.  I used to work at an international organisation, dedicated to human rights and the rule of law, where some managers who made six-figure tax-free salaries were constantly looking for ways to get interns to do the job of assistants, or have technical specialists agree to work for several months for nothing until funding "maybe" came through for their next contract.)

Second, I have doubts about the validity of the responses from MTurk workers.  Some studies have shown that they can perform as well as college students, although maybe it's best to take on the "Master"-level workers, whose price is only going up 11.5%; and I'm not sure that college students ought to be regarded as the best benchmark [PDF] here.

But there are technical problems, such as issues with non-independence of data [PDF] --- if you put three related surveys out there, there's a good chance that many of the same people may be answering them --- and the population of MTurk workers is a rather strange and unrepresentative bunch of people; the median participant in your survey has already completed 300 academic tasks, including 20 in the past week.  One worker completed 830,000 MTurk HITs in 9 years; if you don't want to work out how many minutes per HIT that represents assuming she worked for 16 hours a day, 365 days a year, here's the answer.

Workers are overwhelmingly likely to come from one of just two countries, the USA and India, presumably because those are the countries where you can get paid in real cash money; MTurk workers in other countries just get credit towards an Amazon gift card (which, when I tried to use it, could only be redeemed on the US site, amazon.com, thus incurring shipping and tax charges when buying goods in Europe).  Maybe this is better than having all your participants come from just one country, but since you don't know what the mix of countries is (unless you specify that the HIT will only be shown in one country), you can't even make claims about the degree of generalisability of your results.
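If you do want to work out the minutes-per-HIT figure yourself, the arithmetic under the stated assumptions (16 hours a day, 365 days a year, for 9 years) is a one-liner:

```python
hits = 830_000
# 9 years of 365 days, 16 working hours per day, 60 minutes per hour
minutes_worked = 9 * 365 * 16 * 60   # 3,153,600 minutes
print(minutes_worked / hits)         # ~3.8 minutes per HIT
```

Under those (generous) assumptions, that worker averaged under four minutes per HIT, around the clock, for nearly a decade.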

Third, this increase really does not represent all that much money.  If you're only paying $33 to run 120 participants at $0.25, you can probably afford to pay $42.  That $9 increase is less than you'll spend on doughnuts at the office mini-party when your paper gets accepted (but it won't go very far towards building, running, and paying the electricity bill for your alternative, post-Amazon solution).  And let's face it, if these commission rates had been in place from the start, you'd have paid them; the actual increase is irrelevant, just like it doesn't matter when you pay $20 for shipping on a $2 item from eBay if the alternative is to spend $50.

All those people tweeting "Goodbye Amazon" aren't really going to switch to another platform.  At bottom, they're just upset because they discovered that a corporation with a monopoly will exploit it, as if they really, really thought that things were going to be different this time (despite everyone knowing that Amazon abuses its warehouse workers and has a history of aggressive tax avoidance).  Indeed, the tone of the protests is remarkable for its lack of direct criticism of Amazon, because that would require an admission that researchers have been complicit with its policies, to an extent that I would argue goes far beyond the average book buyer.  (Disclosure: I'm a hypocrite who orders books or other goods from Amazon about four times a year. I have some good and more bad justifications for that, but basically, I'm not very political, the points made above notwithstanding.)

Bottom line: MTurk is something that researchers can, and possibly (this is not a blog about morals) "should", be able to do without.  Its very existence as a publicly available service seems to be mostly a matter of chance; Amazon doesn't spend much effort on developing it, and it could easily disappear tomorrow.  It introduces new and arguably unquantifiable distortions into research in fields that already have enough problems with validity.  If this increase in prices led to people abandoning it, that might be a good thing.  But my guess is that they won't.

Acknowledgement: Thanks to @thosjleeper for the links to studies of MTurk worker performance.


  1. When you talk about workers working for sub-minimum-wage rates, you don't accurately take into account all the reasons one might want to do such a thing. You state they are probably "desperately poor". I'm here to tell you that that is most likely NOT the case, and I speak from experience.

    I work full time as a mid-level manager at a national retail company. I make $45,000 per year. It's not a lot, but I'm not desperately poor, by any means. My wife brings in as much as I do, so together we make close to $100,000 per year.

    I don't work on MTurk because I'm desperately poor, or because I just can't get enough of academic research. I work on MTurk because of the convenience that it provides: I can work from home at any hour of any day, for as short or as long a time as I want. No commute. No customer interaction. Can do it in my pajamas. There is a trade-off here most people don't recognize. People are willing to work for less than they are "worth" because of the ease of use and convenience factor.

    Another thing most people don't realize is that academic surveys are a small part of most Turkers' income. At least if you're a smart Turker. Batches are where it's at. Plenty of startups and other companies post batches that pay anywhere from $10-20 per hour if you are skilled at completing them. You don't have to work for sub-minimum wage unless you want to. And the key word is "want". Nobody is being exploited here. Take it from someone who knows. Personally.

  2. Sorry to double post, but there is one other point I'd like to make. Just because a study gives 20 minutes as its expected completion time, or gives that long before the HIT expires, doesn't mean that is how long it will take. Seasoned Turkers regularly complete surveys in as little as 25% or less of the completion time estimated by the requester.

    If you browse HWTF on Reddit when someone posts a HIT for a survey, they always post how long it took them to complete, and it's always MUCH less than what the requester estimated it would take.

  3. In the case of the study that took 20 minutes, the write-up explicitly states that 20 minutes was the amount of time that it actually took, not the advertised time.

    But I'm sure that "seasoned Turkers" can do way better than the planned time, and in stating that, you kind of make my point for me. MTurk is kind of the ultimate example of instant-gut-feeling self-report. Sure, there's a time limit to complete a questionnaire in a lab setting, but there's not usually any benefit to going quicker because the next part of the experiment is coming up in 20 minutes, so answering all 50 questions in 10 minutes doesn't gain you anything. Contrast that with the online situation where you have a rational motive to fill in the form as quickly as possible. This is already going to be a source of bias in much psychological research, even if it's mostly irrelevant to non-academic surveys where you have to identify pictures of dogs versus cats or whatever.

  4. Oh yes, I don't deny there is bias and skewed results. In fact, I'm sure of it. These surveys are completed as fast as humanly possible with almost no thought to the content. Most of them are done while doing something like watching TV. There are attention checks and similar manipulations inside the surveys, but seasoned Turkers don't have a problem with these.

    I'd venture to say that the results from almost all of these surveys are practically useless. Not to mention that they are almost all the same. I answer the same questions many times a day from different requesters doing psychological surveys. I have my answers on muscle memory at this point.

  5. A lot of the concerns about MTurk need to be contextualized within the larger issue of using nonprobability samples or other sources of data.

    The quality of MTurk data in terms of attentiveness and test-retest reliability is quite good. In fact, consistency across self-reported demographics is equal to or higher than that observed on the General Social Survey or other sources of self-report data. This suggests that if you want to know where your workers are coming from, you can probably just ask them.

    Effects are typically consistent across MTurk and other samples. For example, see here. Not to say they are inevitably consistent, but the available evidence says this is often the case.

    I would be careful about how you frame the technical problems related to non-independence of data. All available evidence suggests that this attenuates true effects and can be offset through increased sample size. Repeated participation is a huge issue in all online panels. For a discussion see here. Also, the paper you link to notes there are many technical solutions within the platform to limit duplicate responses across studies.

    I don't think anyone seriously makes the claim that MTurk workers are a representative sample, but I think you can make a good argument that they are about as representative as many of the samples already used within the field (e.g., student samples, snowball convenience samples, random digit dialing of the 55% of the population that still has a landline, etc.). Truly representative samples are simply not attainable for anyone without a substantial grant.

    I can't speak as to why everyone else is irritated by this, but I feel that Amazon has been less than transparent about its motives. It claims this is to increase revenue for product development, but that is clearly false, because it is targeting the price increases at what it claims is a negligible part of its business. I would not be surprised if the hike was targeted at the segment of the requester population that is responsible for putting upward pressure on worker wages (through IRB requirements and an effort by individual researchers to be somewhat less mercenary). Academics have fixed budgets, and so the net effect of this will be to depress the payment rates they offer to workers.

  6. Amazon doesn't care about MTurk from a revenue and profit perspective. That's why they've made basically no updates in years. The price increase was probably to justify the few resources they provide to it, or to get nuisance usage down.

  7. This is probably the most rational discussion on this topic I've seen. Like Anthony, I am also a (casual) Amazon Mechanical Turk worker (~5 years, Masters). When Amazon confirmed the rate changes, I did a little happy dance. The requesters who will pay the highest commissions are the worst abusers of workers and the Mechanical Turk platform.

    TurkOpticon (https://turkopticon.ucsd.edu/reports) is the only collective place workers have to rate and discuss requesters. At any given moment, TurkOpticon reviews paint a picture of the hideous treatment of workers by researchers: broken HITs (with no compensation), unpaid screeners, missing completion codes, phishing, woefully underpaid, deliberately misleading completion times, horrific setup, disingenuous rejections, failure to communicate with workers, etc.

    I can't imagine why any worker would take to the internet to defend their abusers.

  8. The Anon who states "The requesters who will pay the highest commissions are the worst abusers of workers and the Mechanical Turk platform" is in dreamland.

    Look at the bottom of the list of 55 requesters who have signed this document supporting ethical treatment of study participants. The only reason this person sees garbage is that they are a "casual turker" and do not know how to turk properly.
    Good, fair-paying work is taken quickly. A casual turker will never see it, and will only see the work that is left sitting, because good non-casual turkers will not touch it.

    Go away Anon, you are hurting the cause with your ignorance.

  9. Shooting the messenger doesn't change what workers report in their TurkOpticon reviews, nor does it do anything to move a conversation forward.

    Amazon raised its commission structure. Researchers believe it's an attempt to force them out of the marketplace. The 55 requesters who have committed to conduct ethical research (http://wiki.wearedynamo.org/index.php/Guidelines_for_Academic_Requesters) with Amazon Mechanical Turk are a drop in the bucket.

    Again, TurkOpticon tells the story. Attacking me doesn't change that.
