Why You Should Think About What You Tell ChatGPT : Data Privacy In A World With AI

One of my professors had a weird email style. Weird in the sense that I could never quite tell the tone of his messages. It was difficult for me to know whether he was playfully sarcastic or just…annoyed.

Being the overthinker I am, I would always try to get a second opinion on his emails so I could give a better response myself. It wasn’t long before I started enlisting ChatGPT to help with this analysis.

I would input the email chain between me and my professor, and chat would output its theories as to what my professor was really trying to say.

While chat was great at deciphering my professor’s messages, I started to question what I was really doing. Some of these conversations with my professor were quite personal. Despite this, I felt no issue sending these messages to chat for analysis. I usually think of chat as a personal, digital journal where I can store useful information to analyze later on. But I tend to forget that this journal is not entirely personal.

ChatGPT is more like a journal you share with a company holding millions of other journals. This company then uses the content of these journals to help millions of other people write theirs. Not quite the private leatherbound book I am used to.

I find myself forgetting this fact a lot as I frequently share personal information with chat. I do this without really thinking about the privacy implications of sharing this data with OpenAI.

I had always heard people talk about the importance of data privacy, especially around technologies such as social media. But I had never really felt this importance personally. I just didn’t really see the issue with companies knowing what type of posts I like on Instagram.

But I had seen people mention the importance of data privacy enough to know I had to look and see if I was missing something. And of course, I was.

Why is Data Privacy Important?

There are many arguments for the importance of data privacy, such as your ability to consent and your right to your own data. However, two arguments in particular helped the importance of data privacy sink in for me.

First off, data privacy helps protect a person’s agency. Think about why companies want access to your data. Why is it important for them to know what posts you like on Instagram, or what products you buy off Amazon? These things seemed trivial to me. But this data actually allows companies to build a model of your personality, preferences and thought processes. By doing so, they can better understand who you are as a person, which allows them to better influence your actions, i.e. get you to buy a product. When you don’t have control over a company’s goals, it can be a little nerve-wracking giving them this influence.

For example, the goal of many social media sites is to keep you on their platforms for as long as possible. To achieve this goal, they have teams of psychologists who help create algorithms that use your data to determine what content will keep you addicted to the app. This is why it can sometimes feel impossible to log off even when you want to.

The level of control companies can gain over a person using just their viewing history showed me how much agency people risk losing from a lack of data privacy. As someone who likes to have control over their life, I find this a little daunting.

Another reason data privacy is important is because there are plenty of people out there who steal data and use it for nefarious acts such as scamming or harassment. The more personal and sensitive this data is, the more leverage these people have over their victims.

When you give your data to a company, you are putting your trust in that company’s ability to protect your data from bad actors. People should be more aware of who they share their data with as data leaks are quite common, even with large scale companies.

For example, in 2023, 23andMe’s databases got hacked and nearly 7 million people had their data stolen. This isn’t simple data such as one’s product preferences on Amazon, either. This was people’s genetic and medical data, which carries highly sensitive information about a person. With access to this information, hackers can do all sorts of nefarious things, such as using private ancestry information for blackmail, or scamming someone using their medical information.

The point is, your data is a powerful resource that can be used against you, so having more knowledge and control over who protects it is important.

How AI Exacerbates Data Privacy Concerns

The creation of the internet and social media already created some friction between companies and data privacy. But the rise of AI brings in a whole new set of issues that make data privacy all the more important.

The very nature of how AI and LLMs work right now increases both the variety and depth of data companies can collect about a person. Millions of people are having conversations with LLMs every day about a broad range of topics. These aren’t surface-level conversations either. A lot of these are deep, private conversations about a person’s life and struggles. This is quite clear given that therapy and companionship are among the top self-reported uses of ChatGPT. AI companies are able to learn more about a person than any company has been able to before.

This exacerbates both of the problems that come from a lack of data privacy mentioned above. The more complex and detailed the data one can get about a person, the better one can learn about them in order to influence their actions. The more sensitive and personal the data, the more effectively it can be used against them. LLMs help companies extract the most complex and sensitive data from a person, which makes these issues more potent.

The problem is only going to get worse. AI tech leaders are not hiding the fact that they want their models to eventually learn everything about you. Earlier this year, at an AI event hosted by the VC firm Sequoia, Sam Altman discussed his hope that one day ChatGPT would be able to document and remember everything that happens in a person’s life. In describing this future version of ChatGPT, he said:

“This model can reason across your whole context and do it efficiently. And every conversation you’ve ever had in your life, every book you’ve ever read, every email you’ve ever read, everything you’ve ever looked at is in there, plus connected to all your data from other sources. And your life just keeps appending to the context.” — Sam Altman

We can already see this becoming a reality with multimodal methods of data collection: LLMs track our conversations and thoughts, smart watches track our biometrics and health, and smart cameras and glasses track our environment and emotions.

It’s clear that the risks associated with poor data privacy are growing, which prompts the need for new strategies to put the power of data back into the hands of the people who produce it.

How to Take Back Control of Our Data

The current problem is that many people today just don’t know where their data is going and how it is being used. One way to get around this is to foster “data portability” among individuals. This is the idea that an individual has the right to own their data, store and move it however they please, and choose who gets access to it. This concept is appealing because it still allows people to benefit from their data, but in a way that is consensual and in line with their actual wants.

However, data portability is hard to implement in practice. It can be difficult to figure out how to retrieve, store and understand your own data from all these different companies. It can also be time consuming to research all the companies and products that could utilize your data and pick the best ones.

This is where the idea of data intermediaries comes in. Data intermediaries are third-party entities that would stand between you and the companies in order to make the process of using and protecting your own data more seamless. The job of these intermediaries would be to understand the data ecosystem well enough to negotiate with companies on your behalf about where and how your data would be used, with the hope of achieving the best outcome for both parties.

The implementation of these strategies would help people feel a greater sense of control over their data and give them back their agency and safety.

In the end, I stopped using ChatGPT as a way to understand my professor’s emails. I hope the conversations I did share with chat could help other college students with confusing professors. But after gaining a bit more clarity about how much power a company can have over me by utilizing my data, I am trying to be a little more cautious about what I put into my digital journal.
