Is MTurk having a data quality crisis?
April 19, 2019

In August 2018, there was an event that I like to call “MTurk botgate”. Basically, researchers on Twitter started to panic after some reported a sudden uptick in low-quality data. Researchers were seeing the same kinds of nonsense responses to open-ended survey questions from large numbers of participants, and were also seeing repeated geo-locations appear in their datasets (more on this later).

News outlets also jumped on the bandwagon with their own coverage of the story.

So why were researchers finding low-quality data? I’d like to lay out a few of the possibilities and explain why I think the dominant narrative of “bots” is an unlikely candidate.

Botpocalypse!

As you can probably guess, the story that really caught people’s attention was that MTurk was overrun with bots! This came from the assertions of some researchers who believed that bad actors had started using scripts that could autonomously or semi-autonomously complete tasks for them. And maybe those bad actors had acquired lots and lots of accounts through which they could enact their nefarious bot behavior.

Yes, there are some scripts that MTurk “power users” are known to use to enhance their experience and increase their efficiency while working on the platform. A lot of these can be found at greasyfork.org – a site where users share custom browser scripts that change the behavior of certain websites. For example, there’s one script called “MTurk Captcha Alert” that supposedly alerts users when a captcha appears in a task, and another called “mTurk survey highlight words” that highlights words used in attention-check questions like “ignore”, “reading”, and “attention”. But there’s nothing there on the order of automating entire MTurk tasks (let alone automating lots of different kinds of tasks).

There’s also clearly an appetite among some users for scripts that would automate some aspects of the drudgery: one reddit user asks, “Is there any script to auto check all the radio buttons?” and another asks, similarly, “How do I find a script that automatically fills radio buttons?”. Of course, these users didn’t get the answers they were looking for, and the threads were unpopular with the community. But scripts like these do in fact exist (I won’t link them here so as not to make them easier to find), and I wouldn’t be the least bit surprised if some users have figured out how to make them work to a relatively successful degree.

And yes, the buying and selling of MTurk accounts does take place, though I don’t know how successful those transactions are and for how long the accounts remain active post-transaction.

Yet I’m skeptical of the idea that either A) automation scripts had suddenly become widespread among users, or B) some users had suddenly acquired lots and lots of accounts to use as bots. There just isn’t enough evidence to support this. Even the evidence of repeated geo-coordinates was largely a misunderstanding of the fact that Qualtrics geo-locations are only accurate to the city level. In other words, users who were thought to be coming from the same location could simply have been located in the same, densely populated city.

Blame the user, not the tool

Another possibility is that researchers were new to the platform and not following established best practices for screening out “low-quality” workers.

Typically, researchers will restrict access to their MTurk tasks by only allowing workers who have completed at least 100 tasks (HITs), who have an approval rate of at least 95% over those tasks (although lately I’ve seen more researchers using >= 98%), and who are located in the US.

This practice dates back to a 2012 blog post by one of the admins of the now-defunct TurkerNation forum, Tips for Academic Requesters on Mturk, and it was later validated by a peer-reviewed research article, Reputation as a sufficient condition for data quality on Amazon Mechanical Turk.

In speaking with some of the researchers who were reporting data quality issues, I learned that at least a couple of them had apparently not followed these best practices. In one case, for instance, a user had set an approval-rate criterion but had not also set a minimum number of HITs completed. This is important because any worker with fewer than 100 completed HITs is automatically assigned a 100% approval rate, as stated in the MTurk documentation (“Note that a Worker’s approval rate is statistically meaningless for small numbers of assignments, since a single rejection can reduce the approval rate by many percentage points. So to ensure that a new Worker’s approval rate is unaffected by these statistically meaningless changes, if a Worker has submitted less than 100 assignments, the Worker’s approval rate in the system is 100%.”).
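For requesters who post HITs through the API rather than the web interface, here is a minimal sketch of what setting all three criteria looks like, using Python and boto3. The qualification type IDs are MTurk’s system-defined ones (worth double-checking against the current documentation), and everything else (the title, reward, survey URL, and so on) is a made-up placeholder, not something from this study:

```python
import boto3

# Production MTurk endpoint; while testing, pass
# endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com" instead.
mturk = boto3.client("mturk", region_name="us-east-1")

# Placeholder ExternalQuestion pointing at a hypothetical survey URL.
survey_question_xml = """
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://example.com/my-survey</ExternalURL>
  <FrameHeight>600</FrameHeight>
</ExternalQuestion>
"""

qualification_requirements = [
    {   # Approval rate >= 95% ...
        "QualificationTypeId": "000000000000000000L0",  # PercentAssignmentsApproved
        "Comparator": "GreaterThanOrEqualTo",
        "IntegerValues": [95],
        "ActionsGuarded": "Accept",
    },
    {   # ... AND at least 100 approved HITs, because workers with fewer than
        # 100 submissions are reported as 100% approved by default.
        "QualificationTypeId": "00000000000000000040",  # NumberHITsApproved
        "Comparator": "GreaterThanOrEqualTo",
        "IntegerValues": [100],
        "ActionsGuarded": "Accept",
    },
    {   # ... AND located in the US.
        "QualificationTypeId": "00000000000000000071",  # Locale
        "Comparator": "EqualTo",
        "LocaleValues": [{"Country": "US"}],
        "ActionsGuarded": "Accept",
    },
]

response = mturk.create_hit(
    Title="Short decision-making survey",
    Description="A 10-minute academic survey.",
    Reward="1.50",
    MaxAssignments=100,
    AssignmentDurationInSeconds=30 * 60,
    LifetimeInSeconds=7 * 24 * 60 * 60,
    Question=survey_question_xml,
    QualificationRequirements=qualification_requirements,
)
print(response["HIT"]["HITId"])
```

The point is simply that the approval-rate requirement only does its job when it is paired with the minimum-HITs requirement.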

Foreign users

One weakness of the “blame the user” explanation is that researchers did report using a US location restriction, and furthermore, the repeated geo-locations were showing US locations (one location that showed up for a lot of users was Buffalo, NY).

This eventually gave rise to the idea, which now seems to me the most plausible, that it was foreign users who were pretending to be from the US by using VPN (Virtual Private Network) services, a.k.a. “proxies”, to route their traffic through US server locations. This is what TurkPrime found in their analysis, referring to these users as “server farmers”, and the analysis is pretty compelling.

When I saw this, I jumped at the opportunity to create a tool that would allow researchers to screen these users out by identifying where their traffic was coming from. I eventually collaborated with some folks to create a suite of tools for this purpose and to write up a document that describes, in detail, how to actually do the screening in Qualtrics.
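The Qualtrics-specific setup is in that document, but the core of the screening logic is simple: capture each participant’s IP address, look it up against an IP intelligence service, and flag anyone whose traffic resolves outside the US or to a known VPN/hosting provider. Here’s a rough sketch of that check in Python; the lookup URL and response field names are placeholders for whichever service you use, not a real API:

```python
import requests

# Placeholder endpoint: substitute your IP intelligence provider's real URL,
# authentication mechanism, and response field names.
LOOKUP_URL = "https://ip-intel.example.com/check/{ip}"

def looks_suspicious(ip_address: str) -> bool:
    """Flag IPs that resolve outside the US or to a VPN/hosting range."""
    resp = requests.get(LOOKUP_URL.format(ip=ip_address), timeout=5)
    resp.raise_for_status()
    info = resp.json()
    outside_us = info.get("country_code") != "US"
    proxy_like = bool(info.get("is_vpn")) or bool(info.get("is_hosting"))
    return outside_us or proxy_like

# Example: screen a batch of IPs exported from a survey platform.
if __name__ == "__main__":
    for ip in ["203.0.113.7", "198.51.100.22"]:  # documentation-range example IPs
        print(ip, "suspicious" if looks_suspicious(ip) else "ok")
```

In practice, you would run a check like this at the start of the survey (or on the exported IP column afterwards) and route flagged participants out before they reach the real questions.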

I’ve started using this protocol as standard practice. Since the panic, I’ve collected data for a few larger-scale studies, and by all accounts the quality of that data has been very good.

So to answer the question I posed at the beginning: Is MTurk having a data quality crisis? I think the answer is pretty clearly no – at least, not if you’re following some of the established best practices.
