‘AI’ could dox your anonymous posts
In summary:
- PCWorld reports that large language models can effectively deanonymize anonymous online posts by analyzing patterns in text and linking them to real identities across platforms.
- Researchers successfully connected Reddit users to Netflix accounts and Hacker News posts to LinkedIn profiles, revealing personal details like age and employment information.
- The best defense against this privacy threat is avoiding sharing personal data online, as even short anonymous quizzes can lead to user identification.
Large language models aren’t good at lots of stuff, like counting fingers or suggesting pizza recipes. But one thing that “AI” is quite good at is analyzing massive amounts of data and finding possible connections that aren’t immediately obvious. That makes it perfect for unmasking anonymous internet posts, according to a new research paper.
Researchers at ETH Zurich and the MATS research fellowship associated with Berkeley ran a program [PDF], collecting data from sources with generally anonymous usernames, like Reddit. By collecting users’ posts across related but distinct movie subreddits, then feeding the LLM data from a Netflix data leak, they could pinpoint specific users associated with those accounts and thus tie them to their real names.
With just one movie recommendation shared on Reddit, 3.1 percent of anonymous users could be nailed down to a specific named Netflix account with 90 percent accuracy. With five to nine movie recommendations shared, that figure jumped to 23.2 percent. With more than 10 shared, it jumped to an astonishing 48.1 percent, with 17 percent of the total being identified with near-total confidence.
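To see why the hit rate climbs as more recommendations are shared, here is a toy sketch in Python. It is not the researchers' pipeline, which relies on an LLM. It swaps in plain set overlap (Jaccard similarity) as a stand-in, and every profile name, title list, and threshold below is made up for illustration.

```python
# Toy illustration only: all profile names and titles are invented.
# The paper's pipeline uses an LLM; this uses plain set overlap instead.

def jaccard(a: set, b: set) -> float:
    """Share of titles two sets have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(anon_titles, named_profiles, min_margin=0.2):
    """Return the named profile closest to the anonymous title set,
    or None when the top candidate does not clearly beat the runner-up.
    min_margin is an arbitrary, hypothetical confidence threshold."""
    scored = sorted(
        ((jaccard(anon_titles, titles), name)
         for name, titles in named_profiles.items()),
        reverse=True,
    )
    if not scored:
        return None
    top_score, top_name = scored[0]
    runner_up = scored[1][0] if len(scored) > 1 else 0.0
    if top_score - runner_up < min_margin:
        return None  # too ambiguous to single anyone out
    return top_name


# Hypothetical "named" viewing histories.
named_profiles = {
    "profile_a": {"Alien", "Heat", "Ran", "Solaris", "Stalker"},
    "profile_b": {"Alien", "Heat", "Jaws", "Rocky", "Speed"},
    "profile_c": {"Amelie", "Jaws", "Rocky", "Up", "Big"},
}

# Titles an anonymous account has recommended.
one_title = {"Alien"}
four_titles = {"Alien", "Ran", "Solaris", "Stalker"}

print(best_match(one_title, named_profiles))    # ambiguous between two profiles
print(best_match(four_titles, named_profiles))  # one profile clearly stands out
```

A single popular title fits several histories equally well, so the sketch declines to pick anyone. A handful of titles, especially less common ones, narrows the field to one candidate, which is the same basic effect the percentages above describe.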
Another experiment connected anonymous accounts on Hacker News (a tech forum, not an actual malicious site) with publicly confirmed identities on LinkedIn. Users offering up generalized information in short posts over time could expose their real identities, along with data like age, home city, and job, with a high degree of certainty. It wouldn’t work for every account, and it’s nothing that a private investigator (or even a dedicated layman) couldn’t do… but the automation and scale are staggering.
An especially damning example came from a 10-minute anonymous quiz given by an Anthropic researcher on the team. Seven percent of 125 users could be individually identified based on their text answers to the questionnaire, with extrapolated data like their job (“I work in biology, on research”), education history, specific tools, and even the type of English they used in their answer (like the UK spelling for “analysing”).
The results of the research don’t confirm that anyone on any site could be tracked down based on their anonymous activity. The more personal information you give up, even if it seems general, the more vulnerable you are—and that’s nothing new. Users have been “doxxing” each other since the early days of the web and before, and so have law enforcement investigators and other snoops.
But automating the process—building systems that can trawl the web and find confident associations between anonymous and non-anonymous posts—could pose new dangers for those who want to keep their online activity private. The age of social media has largely supplanted the old “screen name” days, but anonymous communities on places like Reddit are still important, especially for those who are part of vulnerable or targeted groups. As the paper says, “deanonymization is one of many ways LLMs empower both criminals and state actors.”
As Ars Technica reports, the researchers offered up suggestions to mitigate the risk. Platforms like Reddit can put stricter limits on LLM access to APIs for personal data, and “AI” vendors can monitor activity to try to detect those who are using their models to attempt a mass deanonymization campaign.
But the easiest and most reliable way to prevent your personal data from being associated with an anonymous account is, naturally, to make sure that data is never posted online in the first place.
Author: Michael Crider, Staff Writer, PCWorld
