In the Era of AI Advancement, Safeguard Your Digital Privacy

We’re undoubtedly in the AI era, witnessing the launch of chatbots and single-purpose AI hardware at a rapid pace. In the years ahead, AI is poised to permeate every aspect of our lives. AI companies are fervently collecting data, both public and personal, to train and enhance their models. In the process, there’s a trade-off: we’re surrendering our personal information and potentially jeopardizing our privacy. With that in mind, I delved into the privacy policies of popular AI chatbots and services and have outlined recommendations for users to safeguard their privacy.

Google Gemini (Formerly Bard)

Starting with Google’s Gemini: the service automatically stores all user activity data without seeking explicit consent. According to Google, all interactions and activities on Gemini are retained for up to 18 months. Additionally, human reviewers analyze and annotate Gemini chats to help improve Google’s AI model, as described on the Gemini Apps Privacy Hub page.

Furthermore, Google advises users not to share any confidential or personal information they don’t want reviewers or Google to access. This message appears on the Gemini homepage, alerting users to exercise caution. In addition to conversations, Gemini Apps activity includes storing users’ location details, IP address, device type, and home/work address associated with their Google account.

Policy on Retaining Data

Google assures users that their data is anonymized by disconnecting their Google account from conversations to safeguard their privacy. Moreover, Google provides the option to disable Gemini Apps activity and offers the ability to delete all Gemini-related data. However, the process can become somewhat convoluted at this point.

Once your conversations have been evaluated or annotated by human reviewers, they aren’t deleted even if you delete all your past Gemini data; Google retains this reviewed data for three years.

Additionally, even when your Gemini Apps Activity is turned off, Google retains your conversations for 72 hours (three days) to “provide the service and process any feedback.”

Regarding uploaded images, Google indicates that it stores the textual information extracted from an image, not the image itself. Google also notes that, “at this time,” it doesn’t use the actual images or their pixels to improve its machine-learning technologies.

Considering the possibility that Google may utilize uploaded images to refine its model in the future, users should exercise caution and refrain from uploading personal photos on Gemini.

If you’ve enabled the Google Workspace extension in Gemini, your personal data accessed from apps like Gmail, Google Drive, and Docs bypasses human reviewers. Google assures users that this personal data isn’t used to train its AI model. However, the data is retained for the “time period needed to provide and maintain Gemini Apps services.”

On the other hand, if you utilize other extensions such as Google Flights, Google Hotels, Google Maps, and YouTube, conversations associated with these services are subject to human review. It’s essential to keep this in mind while interacting with these extensions.

OpenAI ChatGPT

OpenAI’s ChatGPT stands as one of the most popular AI chatbots among users. Similar to Gemini, ChatGPT automatically saves all your conversations by default. However, unlike Gemini, it only warns users not to share sensitive information once, when a new user first signs up.

In contrast to Gemini, ChatGPT does not feature a static banner on its homepage to notify users that their data may be used for reviewing conversations or training the model.

Regarding the personal data collected by ChatGPT, it includes conversations, images, files, and DALL-E content, all used for model training and performance enhancement. Additionally, OpenAI collects IP addresses, usage data, device information, geolocation data, and more. This applies both to ChatGPT users on the free version and to those who subscribe to the paid ChatGPT Plus service.

OpenAI specifies that content from business plans such as ChatGPT Team, ChatGPT Enterprise, and the API Platform is not utilized to train and improve its models.

OpenAI provides the option to disable chat history and training in ChatGPT through Settings -> Data controls. However, this setting does not synchronize across different browsers and devices using ChatGPT with the same account. Consequently, users need to manually disable history and training on every device where they use ChatGPT.

Upon disabling chat history, new chats will no longer appear in the sidebar and won’t be utilized for model training. However, OpenAI retains chats for 30 days to monitor for abuse, during which they are not used for model training.

Regarding the involvement of human reviewers in viewing conversations, OpenAI states:

“A select group of authorized OpenAI personnel, along with trusted service providers bound by confidentiality and security obligations, may access user content solely for specific purposes: (1) investigating abuse or a security incident; (2) providing support to users reaching out with account-related questions; (3) handling legal matters; or (4) improving model performance (unless opted out). Access to content is strictly controlled and limited to authorized personnel on a need-to-know basis. Additionally, all access to user content is monitored and logged, and authorized personnel must undergo security and privacy training before accessing any user content.”

Similar to Google, OpenAI utilizes human reviewers to view conversations and enhance their models by default. However, OpenAI does not disclose this information on ChatGPT’s homepage, raising concerns about transparency.

Users have the option to opt out and request OpenAI to cease training on their content while retaining the Chat history feature. However, OpenAI does not provide access to this privacy portal within the Settings page. Instead, it is buried deep within OpenAI’s documentation, making it challenging for regular users to locate easily. In terms of transparency, Google appears to do a better job than OpenAI.

Microsoft Copilot

Among all the services, I found Microsoft Copilot’s privacy policy to be the most convoluted. It lacks transparency about what personal data is collected and how Microsoft handles that data.

Although the Microsoft Copilot FAQ page mentions an option to disable personalization (chat history), no such setting is available on the Copilot page itself. The only option provided is to clear all Copilot activity history from the Microsoft account page.

One positive aspect of Copilot is that it refrains from personalizing interactions if it deems the prompt sensitive. Additionally, it does not save conversations if the information appears to be private.

For Copilot Pro users, Microsoft utilizes data from Office apps to deliver new AI experiences. To disable this, users can turn off Connected Experiences in any Office app by navigating to Account -> Manage Settings under Account Privacy.

AI Photo and Video Apps: Remini, Runway, and Lensa

Remini is among the most widely used AI photo enhancers, boasting millions of users. However, its privacy policy raises concerns, and users should exercise caution before uploading personal photos to such apps.

The company’s data retention policy stipulates that processed personal data is retained for 2 to 10 years, which is a lengthy duration. While images, videos, and audio recordings are removed from the server after 15 days, processed facial data, being sensitive in nature, is retained for years. Moreover, all user data can be transferred to third-party vendors or corporations in the event of a merger or acquisition.

Similarly, Runway, a popular AI tool for images and videos, retains data for up to three years. Lensa, another popular AI photo editor, does not delete user data until the user requests account deletion. Users must email the company to initiate the account deletion process.

Numerous AI tools and services store personal data, particularly processed data from images and videos, for extended periods. To avoid such services, consider using AI image tools that can be run locally. Apps like SuperImage and Upscayl allow users to enhance photos locally, minimizing the risk of data exposure.

Data Sharing with Third Parties

Regarding data sharing with third parties, Google does not specify whether human reviewers who process conversations are part of Google’s in-house team or third-party vendors. Typically, the industry norm is to outsource this type of work to third-party vendors.

OpenAI, on the other hand, declares, “We collaborate with a select group of trusted service providers who assist us in delivering our services. We share only the essential content required to fulfill this purpose. Our service providers are required to uphold stringent confidentiality and security standards, ensuring the safeguarding of user data and the achievement of the intended goal.”

OpenAI explicitly mentions that both its in-house reviewers and trusted third-party service providers view and process content, albeit de-identified. Additionally, the company does not sell data to third parties, and conversations are not used for marketing purposes.

Similarly, Google asserts that conversations are not used to display ads. However, if this policy changes in the future, Google pledges to communicate the change clearly to users.

Potential Risks of Personal Data in Training Datasets

There are numerous risks associated with personal data finding its way into training datasets. Firstly, it violates the privacy of individuals who may not have explicitly consented to have models trained on their personal information. This can be particularly invasive if the service provider fails to transparently communicate its privacy policy to the user.

Additionally, a common risk is the potential for a data breach of confidential information. For instance, last year, Samsung prohibited its employees from using ChatGPT due to concerns that the chatbot was leaking sensitive company data. Despite anonymization, various prompting techniques can compel the AI model to disclose sensitive information.

Furthermore, data poisoning poses a significant risk. Researchers warn that attackers could inject malicious data into conversations, skewing model outputs and introducing harmful biases that compromise the security of AI models. Andrej Karpathy, a founding team member of OpenAI, has provided an extensive explanation of data poisoning.

Is There a Mechanism for Opting Out?

While major service providers like Google and OpenAI offer users a way to opt out of model training, this often comes with the consequence of disabling chat history. It seems that companies are penalizing users for prioritizing privacy over functionality.

However, companies could easily provide the option to retain chat history without including it in the training dataset. This would allow users to access important past conversations without compromising their privacy.

OpenAI does allow users to opt out of model training, but this feature is not prominently advertised and is not available on ChatGPT’s settings page. Instead, users must navigate to OpenAI’s privacy portal and request to stop training on their content while preserving their chat history.

Unfortunately, Google does not offer such an option, which is disappointing. Privacy should not come at the expense of losing useful functionality.

What are the Alternatives?

When it comes to alternatives and methods to minimize your data footprint, the first step is to disable chat history. On platforms like ChatGPT, users can retain chat history while opting out of model training via the privacy portal page.

Furthermore, for those who prioritize privacy, running LLMs (large language models) locally on your computer is a viable option. Many open-source models are available that can run on Windows, macOS, and Linux, even on mid-range computers. There are comprehensive guides available on how to set up and run an LLM locally on your computer.
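As a rough illustration of what local inference looks like, here is a minimal sketch using the open-source llama-cpp-python library. The model path, prompt, and generation settings below are illustrative assumptions; you would substitute any GGUF-format open model you have already downloaded to your machine.

```python
# Minimal sketch: run an open-source LLM entirely on your own machine with
# llama-cpp-python, so prompts and responses never leave your computer.
# Assumes the library is installed (pip install llama-cpp-python) and a GGUF
# model file has been downloaded locally; the path below is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/example-7b-instruct.gguf",  # placeholder local model file
    n_ctx=2048,     # context window size
    verbose=False,
)

prompt = "Summarize the privacy trade-offs of cloud-based AI chatbots."
output = llm(prompt, max_tokens=200, temperature=0.2)

# The response is generated locally; nothing is sent to a remote server.
print(output["choices"][0]["text"])
```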

Additionally, users can utilize Google’s tiny Gemma model, which can also be run locally on a computer. For those interested in ingesting their own private documents, PrivateGPT is an option that operates on the user’s computer.
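For Gemma specifically, here is a minimal sketch of loading the small instruction-tuned variant locally with the Hugging Face transformers library. It assumes you have accepted the model’s license on Hugging Face, have the weights downloaded or cached on your machine, and have enough RAM or VRAM to hold them; the model ID and prompt are illustrative.

```python
# Minimal sketch: run Google's small Gemma model locally with Hugging Face
# transformers. Assumes the Gemma license has been accepted on Hugging Face
# and the weights are already downloaded/cached on this machine.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b-it"  # small instruction-tuned Gemma variant

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Explain in two sentences why running a model locally helps privacy."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=80)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```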

Overall, in today’s AI landscape, where companies are eager to gather data from every source and even generate synthetic data, it’s crucial for individuals to safeguard their personal data. I strongly recommend that users refrain from providing or uploading personal data to AI services. Additionally, AI companies should not strip away valuable functionality from users who opt for privacy; privacy and functionality can coexist harmoniously.
