Updated Google promotes its reCAPTCHA service as a security mechanism for websites, but researchers at the University of California, Irvine, claim that the service harvests information while consuming billions of dollars' worth of human effort.
The term CAPTCHA stands for “Completely Automated Public Turing test to tell Computers and Humans Apart” and, as Google explains, refers to a challenge-response authentication method that presents humans with puzzles or questions that computers cannot solve.
Such tests have been used for almost 20 years to combat fraud and other automated online misconduct. CAPTCHA puzzles, which include text, images, audio, or behavioral challenges such as clicking a checkbox, are widely used online.
Google acquired the reCAPTCHA service in 2009, two years after it debuted.
The search giant has revamped its service since then, introducing reCAPTCHA v2 in 2014 and reCAPTCHA v3 in 2018, shortly after v1 was shut down. While v3 is the latest version, v2 is still used by around 3 million websites.
In an era where AI models can answer CAPTCHA questions nearly as well as humans can, the usefulness of reCAPTCHA challenges seems greatly diminished.
Show me the money
Academics at the University of California, Irvine, argue that CAPTCHAs should be abolished.
In a paper [PDF] titled “Dazed & Confused: A Large-Scale Real-World User Study of reCAPTCHAv2,” authors Andrew Searles, Renascence Tarafder Prapty, and Gene Tsudik argue that the service should be phased out because it is disliked by users, costly in terms of time and datacenter resources, and vulnerable to bots, defeating its intended purpose.
“I believe that the true purpose of reCAPTCHA is to harvest user information and effort from websites,” Andrew Searles, a recent PhD graduate and lead author of the paper, argued in an email to The Register.
“If you think reCAPTCHA makes your website secure, you're mistaken. Moreover, this false sense of security comes at a huge cost to people's time and privacy.”
The paper, published in November 2023, notes that even in 2016, researchers were able to defeat reCAPTCHA v2's image challenge with a 70 percent success rate. reCAPTCHA v2's checkbox challenge is even weaker, and the researchers claim it can be defeated 100 percent of the time.
reCAPTCHA v3 fared little better: in 2019, researchers devised a reinforcement learning attack that beat reCAPTCHA v3's behavior-based challenges 97 percent of the time.
“Version 3 is better than version 2 because it's purely behavior-based,” said Gene Tsudik, a professor of computer science at the University of California, Irvine. “But like version 2, it's not a true CAPTCHA: it's not public, and it's not a Turing test. It's a behavior-analysis-based method that assigns a score to a user's behavior, and therefore violates privacy because we (the public) don't know how it works. It's essentially a 'black box.'”
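For readers unfamiliar with how a website consumes that score: Google's documentation describes a server-side verification response for v3 that includes `success`, `score` (from 0.0 to 1.0), and `action` fields. The following is a minimal sketch of how a site might act on such a response. The function name, the sample payloads, and the choice of 0.5 as the cutoff are assumptions made for illustration; none of this comes from the paper, and how the score itself is computed remains opaque.

```python
def is_probably_human(verify_response: dict, expected_action: str,
                      threshold: float = 0.5) -> bool:
    """Accept or reject a request based on a v3-style verification result.

    `verify_response` is assumed to be the already-parsed JSON body returned
    by the verification endpoint; no network call is made here.
    """
    # A failed verification (for example, an invalid or expired token) is rejected.
    if not verify_response.get("success", False):
        return False
    # The action should match the one the page declared when requesting the token.
    if verify_response.get("action") != expected_action:
        return False
    # Scores run from 0.0 (likely a bot) to 1.0 (likely a human).
    return float(verify_response.get("score", 0.0)) >= threshold


# Hypothetical payloads, for illustration only.
sample_high = {"success": True, "score": 0.9, "action": "login"}
sample_low = {"success": True, "score": 0.1, "action": "login"}

print(is_probably_human(sample_high, "login"))  # True
print(is_probably_human(sample_low, "login"))   # False
```

The site owner only ever sees the number and picks a cutoff, which is the "black box" property the researchers object to.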
“These systems were already broken before they were deployed on a global scale,” Searles argues. “The image selection problem was solved computationally in 2009 (but was solved by Google in 2014). The reCAPTCHA third-party cookie for behavioral detection introduced a 'clickjacking' vulnerability that was easy to automatically circumvent.”
You are a commodity
The authors' findings are based on a user study conducted over a 13-month period from 2022 to 2023. Some 9,141 reCAPTCHAv2 sessions were captured from unwitting participants and analyzed together with a survey completed by 108 people.
Respondents gave reCAPTCHA v2's checkbox puzzle a score of 78.51 out of 100 on the system usability scale, while the image puzzle received just 58.90. “Results showed that 40 percent of participants found the image version intrusive (or very intrusive), and less than 10 percent found the checkbox version intrusive,” the paper explains.
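For context on what those numbers mean, the System Usability Scale is conventionally scored from ten questionnaire items, each answered on a 1 to 5 scale. The sketch below follows the standard scoring rule (odd-numbered items contribute the response minus 1, even-numbered items contribute 5 minus the response, and the sum is multiplied by 2.5). The sample responses are invented for illustration and are not data from the study, which may have aggregated its scores differently.

```python
def sus_score(responses: list[int]) -> float:
    """Compute a System Usability Scale score (0-100) for one respondent.

    `responses` holds the ten item answers in questionnaire order, each 1-5.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 responses")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("each response must be between 1 and 5")

    total = 0
    for index, answer in enumerate(responses):
        if index % 2 == 0:
            # Items 1, 3, 5, 7, 9 are positively worded.
            total += answer - 1
        else:
            # Items 2, 4, 6, 8, 10 are negatively worded.
            total += 5 - answer
    return total * 2.5


# Hypothetical respondent, for illustration only.
print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))  # 75.0
```

A study-level figure such as 78.51 would then be an average of per-respondent scores like this one.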
But taken in aggregate, reCAPTCHA interactions impose a significant cost, and some of that value is captured by Google.
“In terms of costs, we estimate that over 13 years of implementation, reCAPTCHA cost 819 million human hours, equivalent to at least $6.1 billion in wages,” the authors wrote in the paper.
“reCAPTCHA traffic consumes 134 petabytes of bandwidth, which equates to roughly 7.5 million kWh of energy and 7.5 million pounds of CO2. Additionally, Google may have made $888 billion in profits from cookies [created by reCAPTCHA sessions] and $8.75-32.3 billion from each sale of the entire labeled dataset.”
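The ratios implied by the quoted figures can be checked with simple arithmetic. The sketch below only divides the numbers cited above; it does not reproduce the authors' methodology, and it assumes decimal units (1 petabyte = 1,000,000 gigabytes), which is a choice made here and not stated in the article.

```python
# Figures as quoted from the paper.
HUMAN_HOURS = 819e6          # hours spent on reCAPTCHA over 13 years
WAGES_USD = 6.1e9            # "at least" this much in wages
BANDWIDTH_PB = 134           # petabytes of traffic
ENERGY_KWH = 7.5e6           # kilowatt-hours
CO2_POUNDS = 7.5e6           # pounds of CO2

# Assumption: decimal units, 1 PB = 1,000,000 GB.
GB_PER_PB = 1_000_000

implied_hourly_wage = WAGES_USD / HUMAN_HOURS
kwh_per_gb = ENERGY_KWH / (BANDWIDTH_PB * GB_PER_PB)
co2_pounds_per_kwh = CO2_POUNDS / ENERGY_KWH

print(f"Implied wage: ${implied_hourly_wage:.2f} per hour")
print(f"Implied energy intensity: {kwh_per_gb:.3f} kWh per GB")
print(f"Implied emissions: {co2_pounds_per_kwh:.1f} lb CO2 per kWh")
```

The implied wage works out to a little under $7.50 an hour, which is consistent with the authors describing the wage total as a lower bound.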
Asked whether the costs that Google is passing on to reCAPTCHA users in the form of time and effort are unfair or exploitative, Searles pointed to the original white paper on CAPTCHA by Luis von Ahn, Manuel Blum and John Langford, which he said includes a section on “stealing cycles from humans.”
“This basically [summarizes how] CAPTCHAs create an exploitative economy of function where malicious bots can conscript humans to do tasks for them,” Searles explains. “It's absurd to ask someone to solve a security task without getting security.”
That cost should be borne by Google, not website users, Searles argued: “Any service that claims to detect bots should actually detect them, especially if it's a paid service.”
As the paper points out, the image labeling challenge has been around since 2004, and attacks capable of beating it 100 percent of the time existed by 2010. Despite this, Google introduced reCAPTCHA v2 in 2014, featuring an image recognition challenge of a kind that had been shown to be insecure four years earlier.
The authors argue that this doesn't make sense from a security perspective, but it does make sense if the goal is to obtain image label data resulting from users identifying CAPTCHA images, which Google sells as a cloud service.
“In conclusion, the true purpose of reCAPTCHA v2 is a free image labeling effort and tracking cookie farm for advertising and data profits disguised as a security service,” the paper declares.
“I don't think there's any room in computer security for hard AI problems,” Searles argues. “This is an experiment that will increase some computing power, but you can't achieve real, measurable security with these techniques.”
Google did not respond to a request for comment.
Additional Updates
In a statement provided to The Register after this article was filed, a Google spokesperson said: “reCAPTCHA user data is not used for any purpose other than improving the reCAPTCHA service, as made clear in our terms of service. Additionally, the majority of our user base has migrated to reCAPTCHA v3, which improves fraud detection through invisible scoring. Even if your site is still using the previous generation product, all reCAPTCHA v2 visual challenge images are pre-labeled and user input has no impact on image labeling.”
