The scraped data of over 2.6 million users of Duolingo, the world’s largest language learning platform, has been leaked on a dark-web hacking forum.

The compromised data, which includes full names, usernames, email addresses and internal service-related details, was initially put up for sale on the now-shutdown hacking forum in January for $1,500.

When the data was first advertised, Duolingo told The Record that it had been sourced from public profiles. However, the email addresses found in the leaked data are not publicly available and can be used to launch targeted phishing attacks on Duolingo users.

“We’re aware of this report. These records were obtained by data scraping public profile information,” a spokesperson told Infosecurity in an email. 

“We have no indication that our systems were compromised. We take data privacy and security seriously and are continuing to investigate this matter to determine if any further action is needed to protect our learners.”

API Data Scraping

As first reported by VX-Underground on Twitter, the 2.6 million user dataset was released on Friday on a new version of the hacking forum for 8 site credits, worth as little as $2.13.

This data was scraped using an exposed application programming interface (API) that has been shared openly since at least March 2023, with researchers tweeting and publicly documenting how to use the API.

The API allows anyone to submit a username and retrieve a JSON output containing the user's public profile information. But it is also possible to feed an email address into the API and confirm whether it is associated with a valid Duolingo account.
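
The exposure described here comes down to two things: a lookup that returns more fields than a public profile needs, and an email lookup that confirms whether an address is registered. The sketch below shows one possible mitigation, assuming an allowlist of public fields and an email lookup that answers identically for unknown and unauthorised addresses. It is not Duolingo's actual API. The `USERS` store, the field names and both function names are hypothetical and chosen only for illustration.

```python
from typing import Optional

# Hypothetical field names, chosen for illustration only.
PUBLIC_FIELDS = {"username", "name", "picture", "bio", "country"}

# Hypothetical in-memory user store standing in for a real database.
USERS = {
    "lingo_fan": {
        "username": "lingo_fan",
        "name": "Ada Example",
        "email": "ada@example.com",
        "country": "US",
    },
}


def public_profile(username: str) -> Optional[dict]:
    """Return only allowlisted fields for a username, or None if unknown."""
    record = USERS.get(username)
    if record is None:
        return None
    # Private fields such as "email" never leave the server.
    return {k: v for k, v in record.items() if k in PUBLIC_FIELDS}


def lookup_by_email(email: str, authenticated_as: Optional[str] = None) -> Optional[dict]:
    """Resolve an email address only for its authenticated owner.

    Every other caller gets None, whether or not the address exists,
    so the response cannot be used to confirm that an account is registered.
    """
    for record in USERS.values():
        if record["email"] == email and authenticated_as == record["username"]:
            return public_profile(record["username"])
    return None
```

With this shape, an anonymous caller who submits a list of email addresses learns nothing about which ones belong to real accounts, and a username lookup never returns the email field.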

This API is still openly available to anyone on the web, even after its abuse was first reported to Duolingo in January.

Companies often downplay data scraping incidents because most of the scraped data is already publicly available, even if it is not necessarily easy to compile.

However, when public data is taken along with private data, such as phone numbers and email addresses, the exposure becomes more concerning and may violate data protection laws.

In 2021, for instance, Facebook suffered a major data leak after an “add friend” API vulnerability was used to link phone numbers to more than 533 million accounts.

Meta was later slapped with a $275 million GDPR fine by the Irish Data Protection Commission (DPC) after an investigation into the breach found the social media giant had failed to protect its users' data.

Twitter was also hit by a similar breach earlier this year, with hackers exploiting a bug in Twitter’s API to trick it into revealing personal account details of approximately 200 million users. The DPC is still investigating the incident. 

A global affair

While the Duolingo data breach exposed the account data of users around the world, some countries were hit harder than others.

According to a report by Surfshark, the US is the country most affected by the breach, with 967k of the 2.6 million unique email addresses exposed belonging to US users.

Map of users affected by Duolingo data breach. Source: Surfshark

South Sudan comes in second, with 175k accounts leaked. Spain follows in third place with 123k exposed accounts, followed by France with 105k and the United Kingdom with 98k.

On average, each email address was leaked alongside five data points, such as profile picture (2.7M), username (2.7M), full name (2.2M), country (0.7M) and bio (6k). Some user accounts had all of their details leaked.

However, the biggest concern is the exposure of email addresses, as these could be used to launch targeted phishing attacks on the affected individuals. 

“People affected might receive personalized phishing emails, such as offering affordable courses related to the language they have been studying on Duolingo,” Surfshark noted. 

“This could be done using leaked names and origin countries, resulting in highly customized emails, possibly even in their own native languages.”

Phishing remains the most common type of cyber attack for the third year in a row, with a total of 300,497 victims reported in 2022.