On the Need to Establish Personal Data Ownership in the Age of AI Agents

On Tuesday, September 24th, 2024, I had the privilege of delivering the opening talk at the Agentic AI Summit 2024, hosted by Steven Echtman and the entire Aiify.io team. In the post below, I’m excited to share some key insights from my presentation.

Paulius Jurcys
9 min read · Sep 27, 2024

💎 The main takeaways?

1 / We are on the cusp of a new reality where each of us will be equipped with multiple AI agents that enhance our digital identities and assist with various tasks.

2 / In this emerging “Agentic Universe,” it’s crucial that individuals retain control and ownership of the data that fuels the “lives” of our personal AI agents and digital twins.

3 / Personal data ownership needs to be established in our legal system, and “Private-by-Default” should be the guiding principle for building AI-powered digital agents and twins.

Two Recent Controversies: LinkedIn and Scarlett Johansson

As AI-driven technologies become increasingly prevalent, the issue of personal data ownership has taken on new urgency. Two recent cases shed light on the growing tensions in this domain:

  • First, LinkedIn’s decision to scrape user data without seeking prior consent raises fundamental questions about how personal information is treated.
  • Second, Scarlett Johansson’s dispute with OpenAI over the unauthorized use of her voice highlights similar concerns.

These examples reveal a critical reality: whether it’s our words or our voices, our data is being used in ways we may not have agreed to, often without our explicit approval. In a world where personal data is currency, these practices call for a serious reevaluation of existing norms.

LinkedIn’s “You’re Scraped-by-Default” Approach

Last week, LinkedIn found itself at the center of a growing debate over data privacy, following revelations that it had been using member data to train its AI models without explicit consent. The incident raised eyebrows among its more than one billion members, igniting fresh concerns about ownership of personal information.

In September 2024, LinkedIn quietly revised its terms of service, disclosing that user data — including profile details, posts, and engagement patterns — was being leveraged to improve AI models. Notably, this practice had already been in place well before the updated policy was rolled out, catching many users off guard.

LinkedIn’s use of data for AI training is expansive. It includes users’ profile information, their posts and shared content, and engagement data, such as how often users log in or interact, along with language preferences. According to the company, this data serves to refine LinkedIn’s services, particularly by enhancing features like content generation and post recommendations.

LinkedIn’s data scraping initiative was not implemented uniformly across regions. In the U.S. and U.K., users were included by default, whereas in the EU, EEA, and Switzerland, individuals were automatically excluded from such practices — likely due to the stringent privacy protections offered by regulations like the GDPR. The contrast is striking and points to the influence of legal frameworks in shaping corporate behavior.
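To make the asymmetry concrete, here is a minimal sketch of the region-dependent default described above, contrasted with the private-by-default rule this article argues for. The function names and region codes are my own illustration, not LinkedIn’s actual implementation:

```python
# Illustrative only: region codes and function names are hypothetical,
# not LinkedIn's actual implementation.

PROTECTED_REGIONS = {"EU", "EEA", "CH"}  # GDPR-style privacy regimes

def scraped_by_default(region: str) -> bool:
    """The observed policy: training consent depends on where you live."""
    # U.S. and U.K. users were included unless they found the opt-out;
    # EU/EEA/Swiss users were excluded automatically.
    return region not in PROTECTED_REGIONS

def private_by_default(region: str) -> bool:
    """The alternative: the same answer everywhere, absent an explicit opt-in."""
    return False  # no training on personal data without the user's say-so

if __name__ == "__main__":
    for region in ("US", "UK", "EU", "CH"):
        print(region, scraped_by_default(region), private_by_default(region))
```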

Unsurprisingly, the revelations triggered a wave of user dissatisfaction. Many felt that their trust had been violated, as their personal and professional data had been used without their explicit consent. This backlash was predictable, but it reveals something more significant about the broader landscape of data governance.

In my view, this incident highlights a fundamental flaw in the enterprise-centric data ecosystem that landmark regulations like the GDPR and CCPA have entrenched. While these laws grant individuals certain rights, such as the ability to opt out of tracking, those rights are often more theoretical than real.

In practice, these rights are “super-qualified”: hedged with so many conditions that they offer little more than the illusion of control. The result is that the current framework largely reinforces the dominance of major data-driven companies rather than empowering individuals (for a deeper analysis, see this article).

This incident underscores a deeper tension between the drive for AI-powered innovation and the need to respect privacy rights. Had LinkedIn adhered fully to the spirit of data privacy laws, it would have provided users with clear, prior notice and given them the choice to opt in. Instead, it chose to scrape data by default, leaving users to hunt for an obscure opt-out button.

This approach exemplifies a growing trend: companies leveraging loopholes to prioritize their own objectives over meaningful user consent, further tilting the balance in favor of enterprise interests.

Scarlett Johansson vs. OpenAI

The dispute between Scarlett Johansson and OpenAI revolves around the release of GPT-4o, which featured a voice, “Sky,” that Johansson argued bore an uncanny resemblance to her own. After the model’s release, Johansson publicly accused OpenAI and its CEO, Sam Altman, of deliberately mimicking her voice without consent, describing her reaction as one of shock and disappointment.

OpenAI responded by clarifying that the voice was not intended to imitate Johansson and noted that the voice actor had been hired well before any outreach to her. Nevertheless, in a gesture of respect, OpenAI chose to pause the use of the voice in its products.

This incident attracted widespread media attention and raised serious ethical questions about the use of voice cloning technologies in generative AI. It also highlights a broader issue: as AI systems become more capable of replicating human voices and likenesses, the lines between creative inspiration and unauthorized imitation blur, making it increasingly difficult to determine where the rights of individuals end and the possibilities of AI begin.

These cases illustrate the broader challenge in today’s data landscape: who owns your data, and what rights do you have over it when companies use AI to extract and process it?

While these issues may seem remote, they are becoming an everyday reality for many individuals, especially as AI becomes more integrated into both consumer and enterprise environments. One potential solution is to build tools that give individuals control over their own data and AI interactions.

Entering the World with Personal AI Twins & Agents

In my presentation, I introduced “Paul AI,” my personal knowledge twin. “Paul AI” operates “on top of” my previous publications, blog posts, law review articles, book chapters, conference talks, and notes, which together form the knowledge base for my digital twin.

Talk to Paul AI: https://hey.speak-to.ai/paul

I have full ownership and control over “Paul AI.” I decide what information Paul AI has, how it responds to user questions, and how I want to share it — whether to keep it private, share it with a limited group (e.g., my students), or make it public.
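As a rough sketch of what that control could look like in code, consider the following. The Visibility enum and the group model are hypothetical illustrations of the idea, not the actual mechanism behind Paul AI:

```python
from dataclasses import dataclass, field
from enum import Enum

class Visibility(Enum):
    PRIVATE = "private"  # only the owner can query the twin
    GROUP = "group"      # a named audience, e.g. my students
    PUBLIC = "public"    # anyone with the link

@dataclass
class TwinAccessPolicy:
    owner: str
    visibility: Visibility = Visibility.PRIVATE  # private-by-default
    allowed_group: set[str] = field(default_factory=set)

    def may_query(self, user: str) -> bool:
        """Only the owner's chosen scope opens the twin to others."""
        if user == self.owner or self.visibility is Visibility.PUBLIC:
            return True
        return self.visibility is Visibility.GROUP and user in self.allowed_group

# Example: open the twin to a class of students, and nobody else.
policy = TwinAccessPolicy(owner="paul", visibility=Visibility.GROUP,
                          allowed_group={"student_a", "student_b"})
assert policy.may_query("student_a") and not policy.may_query("stranger")
```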

Hybrid Identity: Paul AI is not just a tool; it’s an integral part of my digital identity. It represents me in digital environments and acts on my behalf when I’m unavailable. In that sense, it is an extension of my digital self.

What is the nature of this new digital entity? For me, it is an extension of who I am. Not only does it know my content, but it also augments my capabilities: it speaks any language. If you ask a question in German, it will answer in German. If you ask in a language I don’t speak, such as Korean, it will answer in Korean.
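Under the hood, a knowledge twin of this kind is typically built as retrieval-augmented generation over a personal corpus. The sketch below is a generic illustration of that pattern, not a description of how Paul AI is actually implemented; the embed and generate functions are toy stand-ins for whatever multilingual embedding model and LLM the system uses, which is also where the any-language behavior comes from:

```python
# Generic retrieval-augmented generation over a personal corpus.
# `embed` and `generate` are toy placeholders for real models.

def embed(text: str) -> list[float]:
    """Placeholder: a real system would call a multilingual embedding model."""
    vec = [0.0] * 26  # toy bag-of-characters vector so the sketch runs
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - 97] += 1.0
    return vec

def generate(prompt: str) -> str:
    """Placeholder: a real system would call an LLM here."""
    return f"[LLM reply grounded in]\n{prompt[:200]}"

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) * sum(y * y for y in b)) ** 0.5
    return dot / norm if norm else 0.0

def answer(question: str, corpus: list[str], k: int = 3) -> str:
    """Ground the reply in the owner's own writings."""
    q = embed(question)
    ranked = sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    context = "\n\n".join(ranked[:k])
    # The underlying model answers in the language of the question,
    # so the twin "speaks" languages the owner does not.
    return generate(f"Answer using only this context:\n{context}\n\nQ: {question}")
```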

Human-Centric Data Model: A Guiding Approach to the Agentic Universe

This kind of user-centric approach to data ownership is essential as we move further into the age of AI agents. Current regulations like the GDPR and CCPA (California Consumer Privacy Act) do offer some protections, such as the right to consent and control over how companies use personal data.

However, these regulations were designed in an enterprise-centric environment. In other words, the assumption is still that companies control the data they collect and process. Users can opt out, but the default remains that the company owns and manages the data.

In this new age, we need a shift toward a human-centric data model, where the starting point is that individuals’ data should be private-by-default, and individuals should be the actual and legal owners of their own data.

The direct consequence is that companies must ask for explicit permission before accessing and using data stored in my personal data cloud. Just as Scarlett Johansson should have been asked whether her voice could be used, LinkedIn should have sought permission from users before scraping their posts.
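Here is a minimal sketch of what such an explicit-permission gate might look like. The PersonalDataCloud class and its method names are hypothetical, meant only to show consent being checked before any access:

```python
class PermissionDenied(Exception):
    pass

class PersonalDataCloud:
    """Hypothetical private-by-default store: every read needs an explicit grant."""

    def __init__(self, owner: str):
        self.owner = owner
        self._data: dict[str, str] = {}
        self._grants: set[tuple[str, str]] = set()  # (company, key) pairs

    def put(self, key: str, value: str) -> None:
        self._data[key] = value  # stored privately; no one else can read it yet

    def grant(self, company: str, key: str) -> None:
        """Only the owner's explicit action creates access."""
        self._grants.add((company, key))

    def read(self, company: str, key: str) -> str:
        if (company, key) not in self._grants:
            raise PermissionDenied(f"{company} has no grant for '{key}'")
        return self._data[key]

cloud = PersonalDataCloud(owner="paul")
cloud.put("posts", "my LinkedIn posts")
try:
    cloud.read("SomeAICo", "posts")   # reading without a grant fails
except PermissionDenied as e:
    print(e)
cloud.grant("SomeAICo", "posts")      # the owner opts in explicitly
print(cloud.read("SomeAICo", "posts"))
```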

This principle — individuals’ data being private by default — is the missing piece in today’s data ecosystem. It is crucial not only for protecting personal information but also for rebalancing the power dynamic between users and tech giants.

My Personal Data is Mine: Defining the Legal Contours of Ownership

For data to be recognized as personal property, it must meet three criteria:

1. It must be clearly defined,

2. It must be controllable by the individual, and

3. It must have clear economic value to the individual.

Historically, personal data has been difficult to conceptualize within these terms. Where does your data begin, and where does it end? How can you control something as intangible as information or metadata? These questions have kept the concept of personal data ownership vague at best.

However, AI agents like Paul AI offer a path forward. By storing my data on a private server and maintaining full control over how my AI interacts with others, I’ve established a system where my data is clearly defined, easily controlled, and unquestionably valuable to me.

This represents a critical shift in thinking: your data is no longer an abstract entity floating in the cloud. It becomes a tangible, valuable asset that you own and control within your own personal data cloud.

Paths Forward: A New Social Contract with Technology

As we think about the future of AI, data ownership, and personal AI agents, we must consider the new social relationships being formed. These relationships are not just between people but between AI entities as well.

If I program Paul AI to interact with another expert’s AI, who owns the output of that interaction? These are questions we are only beginning to grapple with, but they highlight the need for a new legal and regulatory framework that recognizes personal data as a fundamental part of our identity and digital self.

The future of personal data ownership in the age of AI agents depends on us creating systems where individuals — not corporations — have control. By building AI tools that prioritize personal data ownership and privacy by default, we can ensure that our digital identities remain just that — ours.

In this new world, AI agents like Paul AI won’t just be tools or assistants; they’ll be extensions of ourselves, deeply integrated into our professional and personal lives. And just like we own our physical property, we must own our digital identities and the data that defines them.

🙏Thank you for reading!

🌎 Are you curious to explore more about how AI is transforming education? Wondering what the future of learning and work will look like as AI reshapes these spaces? Then sign up for this newsletter to stay updated on the latest trends and insights!

🔎 If you’d like to chat directly with me about my personal views and experiences in AI and EdTech, feel free to DM me — or even have a conversation with Paul AI, my digital knowledge twin.

💎 And don’t wait — now’s the perfect time to create your own digital twin and experience the future of learning firsthand!

📚 Pre-order our forthcoming book “The Creativity Machine”: https://lnkd.in/gJ9USi4h


Written by Paulius Jurcys

IP | Data | Privacy | Ethics | Harvard CopyrightX. I share views on innovation, creativity & how technology is making this world a more fun place to live in.
