Chat at your own risk! Data brokers are selling deeply personal bot transcripts

Summary

Data brokers are buying and selling verbatim chatbot conversations captured by browser extensions and aggregated into searchable databases. Research supplied to The Register by Lee S Dryburgh shows that extensions which override the browser's fetch() and XMLHttpRequest() functions can intercept chats with services such as ChatGPT, Gemini, Claude and others, then store prompts and responses in vector databases where they are searchable via API.

The datasets are purportedly pseudonymised (SHA-256-hashed IDs) and obtained with consent, but many transcripts include clear personally identifiable information: names, dates of birth, medical record numbers, diagnoses, immigration details and other highly sensitive material. Dryburgh’s sampling returned hundreds of unique prompts across more than 20 sensitive categories, and he found examples suggesting healthcare workers paste real patient data into chatbots — data that now appears in commercial stores.

Key Points

  • Malicious or privacy-invasive browser extensions can intercept every chatbot prompt and response by overriding browser network calls.
  • Captured conversations are stored in vector databases and exposed to paying customers through semantic search APIs.
  • Providers claim data is anonymised and obtained legally, but many transcripts contain explicit PII that enables re-identification.
  • Dryburgh’s tests returned roughly 490 unique prompts from more than 435 panelists across 20 sensitive categories, including mental health, sexual health, immigration and legal issues.
  • Healthcare workers have been observed pasting real patient information into chatbots, creating fresh compliance and legal risks.
  • Shared chatbot accounts, free VPNs and cut-price subscription services can amplify exposure when groups reuse logins or rely on extensions that harvest clickstream data.
  • Previous investigations (Koi Security, Dec 2025) and earlier reporting (Sept 2025) documented similar extension-driven data collection at scale.

Content summary

Dryburgh’s report describes how browser extensions marketed as VPNs, ad blockers or utility tools can quietly intercept AI chat traffic by monkey-patching fetch() and XMLHttpRequest() in the browser. The intercepted chat logs — prompts and model replies — are aggregated into panels and stored verbatim in vector DBs with hashed panelist IDs.
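The monkey-patching technique described above can be sketched in a few lines. This is illustrative code only, not taken from any named extension; the chat URL and response body are hypothetical stand-ins.

```javascript
// Minimal sketch of the fetch() monkey-patching pattern the report
// describes. Hypothetical, illustrative code only.

// Stand-in for the real network, so the sketch runs anywhere (Node 18+):
globalThis.fetch = async () =>
  new Response(JSON.stringify({ reply: "model output" }));

const captured = [];                  // what an extension would exfiltrate
const realFetch = globalThis.fetch;   // keep a reference to the original

// Replace fetch with a wrapper that copies every request/response pair.
globalThis.fetch = async function patchedFetch(...args) {
  const response = await realFetch.apply(this, args);
  // clone() lets the interceptor read the body while the page
  // still receives an unconsumed response.
  const body = await response.clone().text();
  captured.push({ url: String(args[0]), body });
  return response;
};

// Any chat request now silently leaks a copy into `captured`.
async function demo() {
  await fetch("https://chat.example.com/api/conversation");
  return captured;
}
```

Because the wrapper returns the original response untouched, the page behaves normally and the user sees nothing; the same trick works for XMLHttpRequest by wrapping its prototype methods.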

Despite claims of anonymisation and lawful collection, the stored text often includes explicit identifiers and sensitive details. Using a VC-backed generative engine optimisation platform, Dryburgh executed 205 queries and retrieved nearly 500 unique prompts that spanned topics from suicide and self-harm to HIV results, abortion clinic searches, immigration status and children’s conversations. The dataset made available to customers therefore included material that could cause real-world harm if abused or linked back to individuals.

The report also flags corporate risk: employees pasting internal documents into chatbots, and remote workers sharing chatbot subscriptions via third-party services — a setup that may route sensitive corporate or client data through the same cheaply monetised channels.

Context and relevance

This is a major privacy and security story at the intersection of browser ecosystem risk, AI adoption and the data-broker market. It underlines how rapidly the threat landscape changes when new interfaces (chatbots) intersect with existing weak privacy practices (loose extension permissions, shared accounts, free VPNs). The finding is relevant to CISOs, privacy officers, developers and everyday users who rely on browser-based AI tooling.

Regulators and organisations should note heightened legal and compliance exposure: patient data, immigration discussions and legal advice saved in commercial databases may violate data-protection rules and create dangerous outcomes for vulnerable people. The incident also emphasises known re-identification risks — hashed IDs and claims of anonymisation are not a fail-safe.

Author style

Punchy: This coverage pulls no punches — it names a systemic problem that’s already producing searchable stores of intimate human data. Read closely if you manage client data, run employee training, or make decisions about approved browser tooling.

Why should I read this?

Short version — because it affects you whether you’re an IT pro, medic, lawyer or just someone who chats to an AI. Don’t be the person who pastes private info into a bot and wakes up on a watchlist. The story gives you the quick facts about how transcripts are harvested, what kinds of sensitive material are exposed, and why the usual ‘it’s anonymised’ excuse doesn’t cut it. If you care about privacy or compliance, this is worth a five-minute read.

Source

Source: https://www.theregister.com/2026/03/03/chatbot_data_harvesting_personal_info/