GitHub hits CTRL-Z, decides it will train its AI with user data after all

Summary

GitHub has changed its Copilot interaction-data policy: from 24 April the company will use customer interaction data — including inputs, outputs, code snippets and surrounding context — to train its AI models unless users opt out. The change applies to Copilot Free, Pro and Pro+ users; Copilot Business and Enterprise customers, plus students and teachers, are exempt under existing contracts.

Users can opt out by disabling “Allow GitHub to use my data for AI model training” in their account settings (github.com/settings/copilot/features). GitHub says adding interaction data improves suggestion quality and security, citing internal gains after training on interactions from Microsoft employees. The move has provoked community pushback and raises questions about what “private” means for a repository when interaction data can be collected while a user is actively using Copilot in it.

Key Points

  • From 24 April, GitHub will collect and use Copilot interaction data to train its AI unless users opt out.
  • Collected data includes model inputs and outputs, accepted or modified suggestions, code snippets shown, code context around the cursor, comments/documentation, file names/repo structure, Copilot feature interactions (chats), and feedback (thumbs up/down).
  • Policy applies to Copilot Free, Pro and Pro+; Copilot Business, Enterprise, students and teachers are exempt.
  • The opt-out follows US-style “established industry practices” rather than European opt-in norms; users must disable the setting at github.com/settings/copilot/features.
  • GitHub’s FAQs state private repo content can be collected for model training while a user is actively using Copilot in that repo, narrowing expectations of “private” repositories.
  • Community reaction has been largely negative, with limited visible endorsement from GitHub staff beyond internal comments; GitHub justifies the change by citing measurable improvements from interaction-data training.
  • The decision mirrors practices at other vendors (Anthropic, JetBrains, Microsoft) and highlights a wider industry pattern of training models on user-sourced data with opt-out controls.

Context and Relevance

This shift matters if you are a developer, team lead, legal or security professional using GitHub Copilot. The policy affects code provenance, IP exposure and privacy expectations for private repositories because interaction data — including snippets and repo context — can be harvested while using the tool. Organisations that rely on data-protection standards in the EU or have strict IP rules should reassess Copilot use and configuration, and consider Business/Enterprise plans or disabling the training setting.

More broadly, the update is part of an industry trend: vendors increasingly train models on user interaction data and offer opt-out rather than opt-in. That influences how organisations negotiate contracts, apply governance to AI-assisted development, and manage supply-chain or poisoning risks from model training data.

Why should I read this?

Look: if you write code on GitHub, this directly changes whether your work might be consumed by the very AI that helps you write it. It takes a minute to check and flip the setting if you don’t want your snippets and context used. If you manage teams or care about IP/privacy, you’ll want to know who’s included, who’s exempt, and how to react.

Author style

Punchy: this is a significant change with practical consequences. Read the detail if you host proprietary code, work under EU privacy rules, or negotiate vendor contracts. If you’re a casual user, at least flip the opt-out switch if it bothers you; if you’re responsible for compliance or IP, dig into the exemptions and consider plan-level protections.

Source

Source: https://go.theregister.com/feed/www.theregister.com/2026/03/26/github_ai_training_policy_changes/