Your Duolingo Data Has Been Leaked on the Dark Web

The scraped data of over 2.6 million users of Duolingo, the world’s largest language learning platform, has been leaked on a dark-web hacking forum.

The compromised data, which includes full names, usernames, email addresses and internal service-related details, was initially put up for sale on the now-shutdown hacking forum in January for $1,500.

When the data was first advertised, Duolingo told The Record that it was sourced from public profiles. However, the email addresses found in the leak are not publicly available and can be used to launch targeted phishing attacks on Duolingo users.

“We’re aware of this report. These records were obtained by data scraping public profile information,” a spokesperson told Infosecurity in an email.

“We have no indication that our systems were compromised. We take data privacy and security seriously and are continuing to investigate this matter to determine if any further action is needed to protect our learners.”

API Data Scraping

As first reported by VX-Underground on Twitter, the 2.6 million user dataset was released on Friday on a new version of the hacking forum for eight site credits, worth as little as $2.13.

A Threat Actor identified a bug in the Duolingo API. Sending a valid email to the API returns generic account information on the user (name, email, languages studied).

They used an email list to assemble over 2.6m unique entries.

This will be used for doxxing.

— vx-underground (@vxunderground) August 21, 2023

This data was scraped using an exposed application programming interface (API) that has been shared openly since at least March 2023, with researchers tweeting and publicly documenting how to use the API.

The API allows anyone to submit a username and retrieve a JSON output containing the user's public profile information. But it is also possible to feed an email address into the API and confirm whether it is associated with a valid Duolingo account.

This API is still openly available to anyone on the web, even after its abuse was first reported to Duolingo in January.
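To show why an email lookup of this kind is risky, here is a minimal Python sketch. It does not use Duolingo's real API: the in-memory `USERS` store, the response shape and every function name are assumptions made for illustration. The first lookup mimics the reported behaviour, where a valid email returns profile data, and the second shows one common mitigation, which is to give the same empty answer to anyone who is not the account owner.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Profile:
    username: str
    name: str
    email: str


# Hypothetical user store for illustration only; not Duolingo's data model.
USERS: Dict[str, Profile] = {
    "ana@example.com": Profile("ana_learns", "Ana Example", "ana@example.com"),
}


def lookup_by_email_leaky(email: str) -> dict:
    """Mimic the reported behaviour: a registered email returns profile data."""
    profile = USERS.get(email.lower())
    if profile is None:
        return {"users": []}
    return {"users": [{"username": profile.username,
                       "name": profile.name,
                       "email": profile.email}]}


def lookup_by_email_hardened(email: str,
                             authenticated_as: Optional[str] = None) -> dict:
    """Return data only to the account owner; everyone else gets the same empty reply."""
    key = email.lower()
    is_owner = authenticated_as is not None and authenticated_as.lower() == key
    if is_owner and key in USERS:
        profile = USERS[key]
        return {"users": [{"username": profile.username, "name": profile.name}]}
    return {"users": []}


def enumerate_emails(candidates: List[str],
                     lookup: Callable[[str], dict]) -> List[str]:
    """What a scraper does: keep every candidate email the lookup confirms."""
    return [email for email in candidates if lookup(email)["users"]]


candidates = ["ana@example.com", "nobody@example.com"]
confirmed_leaky = enumerate_emails(candidates, lookup_by_email_leaky)
confirmed_hardened = enumerate_emails(candidates, lookup_by_email_hardened)
```

With the leaky lookup, a list of candidate emails is reduced to the ones that belong to real accounts, which is how an email list can be turned into a dataset of confirmed users. With the hardened lookup, an unauthenticated caller learns nothing either way. In practice this would normally be combined with rate limiting.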

Companies often downplay scraping incidents because most of the data taken is already publicly available, even if it is not necessarily easy to compile.

However, when public data is stolen along with private data, such as phone numbers and email addresses, the exposure becomes more concerning and can potentially violate data protection laws.

In 2021, for instance, Facebook fell victim to a major data breach after an “add friend” API vulnerability was used to link phone numbers to over 533 million accounts.

Meta was later slapped with a $275 million GDPR fine by the Irish Data Protection Commission (DPC) after an investigation into the breach found the social media giant had failed to protect its users' data.

Twitter was also hit by a similar breach earlier this year, with hackers exploiting a bug in Twitter’s API to trick it into revealing personal account details of approximately 200 million users. The DPC is still investigating the incident.

A global affair

While the Duolingo data breach exposed the account data of users around the world, some countries were hit harder than others.

According to a report by Surfshark, the US is the country most affected by the breach, with 967k of the 2.6 million unique email addresses exposed belonging to US users.

*Map of users affected by the Duolingo data breach. Source: Surfshark*

South Sudan comes in second, with 175k accounts leaked. Spain follows in third place with 123k exposed accounts, followed by France with 105k and the United Kingdom with 98k.
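To put these counts in proportion, the short Python sketch below converts them into shares of the 2.6 million exposed email addresses. The figures are the rounded ones quoted above, so the resulting percentages are approximate, and the variable names are illustrative.

```python
# Rounded figures as quoted in the article; results are approximate.
TOTAL_EXPOSED = 2_600_000

exposed_by_country = {
    "United States": 967_000,
    "South Sudan": 175_000,
    "Spain": 123_000,
    "France": 105_000,
    "United Kingdom": 98_000,
}


def share(count: int, total: int = TOTAL_EXPOSED) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


shares = {country: share(count) for country, count in exposed_by_country.items()}
top_five_share = share(sum(exposed_by_country.values()))

for country, pct in shares.items():
    print(f"{country}: {pct}%")
print(f"Top five combined: {top_five_share}%")
```

On these numbers, US users account for roughly 37% of the exposed addresses, and the five countries together make up a little over half of the dataset.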

On average, each email account was leaked with five data points, such as profile picture (2.7M), username (2.7M), full name (2.2M), country (0.7M) or bio (6k). Some user accounts had all of their details leaked.

However, the biggest concern is the exposure of email addresses, as these could be used to launch targeted phishing attacks on the affected individuals.

“People affected might receive personalized phishing emails, such as offering affordable courses related to the language they have been studying on Duolingo,” Surfshark noted.

“This could be done using leaked names and origin countries, resulting in highly customized emails, possibly even in their own native languages.”

Phishing was the most common type of cyber attack for the third year in a row in 2022, with a total of 300,497 victims.


Ellis Stewart
