LinkedIn Hit with Class Action Lawsuit Over AI Data Practices
LinkedIn, a professional networking platform owned by Microsoft, is facing a class-action lawsuit over allegations that it shared Premium users’ private messages with third parties to train artificial intelligence (AI) models without obtaining explicit consent.
Background
In mid-2024, LinkedIn introduced a new privacy setting that allowed users to opt in to, or out of, data sharing specifically for generative AI training purposes. In practice, this meant LinkedIn had begun leveraging user data, ranging from messages to other interactions on the platform, to refine AI systems.
By September 2024, LinkedIn had officially updated its privacy policy to make this data usage clear, but the setting remained enabled by default, leaving it to users to opt out. Many users, either unaware of the changes or unclear about how to navigate the settings, therefore became participants in these data-sharing practices without ever explicitly consenting.
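To see why the default matters, consider the minimal sketch below. It is purely hypothetical: the setting name and structure are invented for illustration and do not reflect LinkedIn's actual implementation. Under an opt-out model, the flag starts out as true, so every account that never visits the settings page is counted as a participant.

```python
from dataclasses import dataclass

@dataclass
class PrivacySettings:
    # Hypothetical flag; under an opt-out model the default is True,
    # so silence counts as participation.
    share_data_for_generative_ai: bool = True

def users_included_in_training(settings_by_user: dict[str, PrivacySettings]) -> list[str]:
    """Return the users whose data would flow into AI training."""
    return [user for user, s in settings_by_user.items() if s.share_data_for_generative_ai]

# A user who never touched the setting is swept in; only the one who
# actively opted out is excluded.
accounts = {
    "user_who_never_opened_settings": PrivacySettings(),
    "user_who_opted_out": PrivacySettings(share_data_for_generative_ai=False),
}
print(users_included_in_training(accounts))
# ['user_who_never_opened_settings']
```

An opt-in model simply flips that default to false, so inclusion requires an affirmative action from the user rather than their silence.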
This class-action lawsuit claims that LinkedIn’s handling of these privacy updates breached several federal and state privacy laws. Chief among the allegations is the claim that LinkedIn violated the federal Stored Communications Act (18 U.S.C. ch. 121, §§ 2701–2713) by failing to adequately safeguard private message data and by using it for purposes outside the originally agreed-upon terms.
Implications for AI Development
This lawsuit highlights a broader and increasingly contentious debate about how tech companies use consumer data to fuel their AI ambitions. LinkedIn is far from the only company to employ vast amounts of user-generated content (UGC) for machine learning. Large Language Models (LLMs) and other generative AI systems rely on patterns within massive datasets to “learn” (but see below) how to generate human-like responses, complete sentences, create coherent paragraphs, or even suggest professional networking opportunities. While this process can lead to exciting technological breakthroughs, it also raises significant ethical and legal concerns.
Other tech giants have grappled with similar scrutiny. For example, Meta (formerly Facebook) faced regulatory challenges in the UK after its data usage practices came under review by the Information Commissioner’s Office (ICO). In 2024, Meta temporarily paused the use of certain user data in its AI training programs to address compliance concerns. When it later resumed AI data usage, Meta implemented stricter consent models, giving users more control and transparency, to satisfy regulatory demands.
Meanwhile, Microsoft, LinkedIn’s parent company, proactively issued a statement in late 2024 clarifying that its other enterprise products, such as Microsoft 365 applications (Word, Excel, PowerPoint), do not use customer data to train AI models. This move was likely an attempt to stave off concerns and reinforce trust with its business clients.
The Importance of Consent and Transparency
The LinkedIn case underscores an ongoing tension between user privacy and the tech industry’s drive to advance AI. While LinkedIn has provided tools for users to review and delete their personal data from certain AI features, critics argue that these measures don’t go far enough. The opt-out nature of the policy is particularly problematic. By defaulting users into data sharing, the onus is placed on individuals to seek out and change these settings — something many people may never realize they need to do. This lack of explicit consent not only opens the door to lawsuits like the current one but also damages trust between platforms and their user bases.
For users, the implications are broad. Messages that were once thought to be private are now part of a much larger AI ecosystem, training models that may power everything from advanced customer service chatbots to productivity tools. For companies, the lawsuit raises questions about how to balance the need for vast datasets with the ethical imperative to ensure that users have control over their personal information. If courts ultimately find that LinkedIn’s practices were improper, it could set a precedent for how companies must approach transparency, consent, and data usage in the future.
A contrary argument can be made, however, that the privacy of such messages is not practically compromised: they are anonymously tossed into the blender of AI training data, a corpus that is, for all practical purposes, discarded once training is complete. Moreover, the individual ingredients that trained the model are never directly retrieved when it runs: an LLM’s output is, in general, genuinely novel content.
Technical (and Legal) Nuance
At issue is whether LLMs infringe on copyrighted works, and the answer may depend on whether one focuses on the training process or on the actual output generated by an end user, such as when they prompt ChatGPT with a question and receive a response.
One perspective argues that LLMs effectively reproduce copyrighted content simply by ingesting copyrighted material during training. This view is supported by instances where AI models have been coaxed into generating copyrighted material verbatim across various media formats, including text, video, images, and even voice.
Conversely, another viewpoint contends that LLMs operate without directly accessing the source data; rather, they generate outputs based on probabilistic patterns. For example, when asked about cat behavior, an LLM is far more likely to output that “cats like to sit on the floor” than “cats like to sit on hamburgers.” Whereas Google returns search results from a literal index of stored content, an LLM produces original content.
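To make the probabilistic-pattern argument concrete, the toy sketch below (a deliberately simplified illustration, not how production LLMs are built) counts which words follow a given phrase in a tiny invented corpus and turns those counts into probabilities. The point is that the model retains only statistics about the text, not the text itself; real LLMs learn analogous patterns with neural networks over billions of parameters rather than literal counts.

```python
from collections import Counter

# A tiny hypothetical "training corpus" (purely illustrative).
corpus = [
    "cats like to sit on the floor",
    "cats like to sit on the sofa",
    "cats like to sit on the floor near the window",
    "dogs like to sit on the floor",
]

# Count which word follows the context "sit on the" across the corpus.
context = ("sit", "on", "the")
next_word_counts = Counter()
for sentence in corpus:
    words = sentence.split()
    for i in range(len(words) - len(context)):
        if tuple(words[i:i + len(context)]) == context:
            next_word_counts[words[i + len(context)]] += 1

# Convert counts to probabilities: only these statistics are kept,
# not the sentences themselves.
total = sum(next_word_counts.values())
for word, count in next_word_counts.most_common():
    print(f"P({word!r} | 'sit on the') = {count / total:.2f}")

# An unseen continuation such as "hamburgers" simply gets zero probability.
print(f"P('hamburgers' | 'sit on the') = {next_word_counts['hamburgers'] / total:.2f}")
```

On this view, generation is closer to sampling from learned statistics than to retrieving stored documents, which is the crux of the argument that follows.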
Under copyright law, infringement generally occurs when copyrighted material is replicated without sufficient transformation. Thus, the critical question is whether merely scraping copyrighted data for LLM training purposes constitutes infringement. One could argue that no infringement occurs during training, since the model itself does not create content at that stage. Instead, when an LLM generates output upon user prompting, the result is typically novel and non-derivative, unless the model is deliberately instructed to replicate copyrighted material without transformative changes. In this latter instance, the burden should fall on the end user, not the provider of the AI service, much as liability for speeding falls on the driver, not the manufacturer of the vehicle.
The Regulatory Landscape
Adding to the complexity of this situation is the evolving global regulatory environment surrounding AI and data privacy. In the European Union, for instance, LinkedIn’s approach would likely have been untenable from the outset. The EU’s stringent General Data Protection Regulation (GDPR) and its emerging EU AI Act place a heavy emphasis on user consent and transparency. Unlike in the U.S., where opt-out models are more commonly accepted, the EU favors opt-in systems that require explicit user permission before data can be used for purposes like AI training.
This regulatory contrast underscores a significant challenge: the evolution and growth of AI amid disparate legal landscapes. It is akin to Airbus being barred from U.S. airspace because of differences in fly-by-wire flight control regulations, or to various countries maintaining conflicting rules on passenger security and baggage handling. Just as ICAO and IATA facilitate international air travel by harmonizing standards, AI regulation needs similar global frameworks to ensure compatibility across borders.
Looking Ahead
This lawsuit is still in its early stages, and it remains to be seen whether LinkedIn will settle, face penalties, or change its data policies as a result. Whatever the outcome, the case is a stark reminder that the race to build smarter, more capable AI systems must be tempered by respect for user privacy and adherence to clear legal standards. Companies that rely on user data to train AI should take this opportunity to reexamine their consent models, their transparency efforts, and the legal frameworks within which they operate.
For the millions of LinkedIn Premium users included in this lawsuit, the stakes are personal. They’ll be watching closely to see if their privacy concerns are addressed, if their data is safeguarded, and if the balance of power shifts even slightly in favor of user rights. For the tech industry at large, this case could be a turning point — one that further molds the future of how AI and privacy intersect in our increasingly digital world.
If you’d like to discuss, please contact us here.