All Your Post Are Belong To Us

Sep 16, 2024

A few days ago Meta announced that it will go ahead with using the data of UK users from Facebook and Instagram to train its AI¹. Only a few months ago, back in June, Meta had paused those plans for both the UK and the European Union². Now the UK is back on the menu, but not the EU.

UK yes, EU no

The weird thing here: the GDPR (the EU’s data protection regulation) is the law in both jurisdictions (in the UK in the form of the “UK GDPR”, which sits alongside the Data Protection Act 2018) and has remained so despite Brexit (minus some superficial changes that removed references to the EU). So why the different treatment by regulators?

The ICO (the UK’s data privacy regulator) has stated³ that it will be “monitoring the situation” but also that it “has not provided regulatory approval for the processing and it is for Meta to ensure and demonstrate ongoing compliance.” Nevertheless, it seems resigned to letting Meta go ahead.

This is a bit of an odd statement, but so far the ICO appears to be satisfied with Meta a) giving users an “easy” way to opt out (whether it is as easy as promised remains to be seen), b) only using data from adult users and c) informing users to some extent.

Whether the Threads post⁴ by Nick Clegg, Meta’s president of global affairs, announcing the change can count as “transparently informing users” is at least debatable. While announcing the policy change, he failed to point users to any mechanism for opting out of the data harvesting, or even to mention it as an option. That does not bode well for “transparency”.

The ICO will be .. monitoring the situation..

This opt-out for UK users exists, but so far not as a straightforward setting; instead it takes the form of some kind of “request form”⁵. While this form can be reached via a direct link (thanks to some users who dug around on Instagram’s help pages), I have yet to find a way for users to easily navigate to it.

The ICO states that [Meta has made] “it simpler for users to object to the processing and providing them with a longer window to do so.” This is confusing, as the GDPR should allow you to opt out at any time, not just within some “window” of time.

While I do not know if this represents its final form, I find it extremely disturbing for a data controller to ask users to justify their reason for opting out, and I cannot see how this mechanism is even remotely compliant with the requirements of the GDPR.

The Instagram “opt-out form” for UK users

It should be obvious why companies are heavily opposed to opt-ins:

users would rightly ask what’s in it for them (nothing, really)
the vast number of inactive accounts that have produced training material over the last decade would never opt in; with an opt-out they are low-hanging fruit, because their owners will never opt out or complain

I wonder if Meta will release any numbers on how many users have opted out (probably not), but I also wonder if anyone will be able to gather data on how many users are actually aware that their posts are being used for generative AI training.

The big social media data rush

Social networks are of course a goldmine for training large language models. Not only do they contain billions of (mostly) human-generated texts to mine, they also offer a valuation system (likes, shares, views) that comes with them for free. And with any significant revenue from generative AI still nowhere to be seen, cost matters more than anything.

Reddit struck a deal with OpenAI earlier this year⁶, earning it some much-needed revenue. Twitter/X and its sister company xAI (privately owned by Musk) are obviously harvesting historic data from Twitter to train Musk’s “no guardrails” chatbot “Grok”, available only to Twitter premium users. This happened without giving users the possibility to opt out, or even informing them, which led the Irish regulator (on behalf of the EU) to demand⁷ not only that Twitter stop using the tweets of EU users to train its models, but also that it delete whatever data it has already gathered for this purpose. Twitter settled by agreeing to do so and has announced it will not go forward with the plan.

The regulator wonders if Google should have done a data protection impact assessment .. I wonder that, too.

Meanwhile, Ireland’s Data Protection Commission (DPC) has opened an investigation into whether Google has complied with the EU’s data protection laws, or more specifically, the regulator wonders if Google should have done a data protection impact assessment (DPIA)⁸. I wonder that, too, as I would have expected this to be a standard procedure.

GDPR-exit via (lack of) enforcement?

But back to Meta’s specific problems in Europe: The EU appears to insist (at minimum) that the users are provided with an opt-in, or in GDPR terms: to give informed consent to the new usage of their data.

The use of user content is indeed a significant change to the Terms & Conditions and Data Privacy Policy users have agreed to. The use of user content for generative AI models comes with its own risk of private data leaking to third parties, a view the EU appears to share. A convoluted “opt-out” form in no way satisfies these requirements.

So why does the UK not share the concerns of the EU? If I had to guess, I would assume that the UK government has indicated that it will not stand in the way of Meta’s plans. UK governments (both the old Tory and the new Labour one) have been extremely reluctant when it comes to regulating the use of data for generative AI training, or the application of AI in general for that matter.

Both have repeatedly stressed AI as an opportunity for “growth” and “innovation”, and the Labour government has given no indication that it wants to follow the EU’s example of the AI Act.

If this is the beginning of an “americanization” of data privacy regulation in the UK, it would be a bad sign of things to come.

Playing jurisdictions against each other

Meanwhile, Meta, OpenAI and Co continue to attack the EU as “standing in the way of innovation”, with Nick Clegg stating that enforcement of the regulation will prevent Meta from carrying out “our plans to train our AI models to understand the EU’s rich cultural, social and historical contributions”.

As a former UK Deputy Prime Minister, Nick Clegg will be very well aware that he is playing to anti-EU sentiment here, something the UK government will be all too happy to join in with, considering that it is still desperately trying to find some selling points for Brexit.

It would be devastating, though, if data privacy were reduced to a way of scoring cheap points against the EU, and even more so if the UK became a pawn in Big Tech’s fight against data privacy regulation.

¹ Building AI Technology for the UK in a Responsible and Transparent Way (Meta)
² Meta reignites plans to train AI using UK users’ public Facebook and Instagram posts (TechCrunch)
³ ICO statement in response to Meta’s announcement on user data to train AI (ICO)
⁴ Nick Clegg on Threads (13 Sept 2024)
⁵ Instagram opt out form (Instagram)
⁶ OpenAI strikes Reddit deal to train its AI on your posts (OpenAI)
⁷ Unresolved questions over X’s deal with Ireland to halt AI data processing (Euractiv)
⁸ Google’s GenAI facing privacy risk assessment scrutiny in Europe (TechCrunch)


Written by Wolfgang Hauptfleisch
