Part of the Deep Dive: AI Webinar Series

Data privacy in AI is something everyone needs to plan for. As AI technology continues to advance, it is becoming increasingly important to protect the personal information that is used to train and power these systems, and to ensure that companies are using personal information properly. First, understand that AI systems can inadvertently leak their training data in the results they produce. This talk will give an overview of how and why this happens. Second, ensure that you have proper rights to use the data fed into your AI. This is not always a simple task, and the stakes are high. This talk will go into detail about circumstances where the initial rights were not proper, and the sometimes-catastrophic results of that. Third, consider alternatives to using real personal information to train models. One particularly appealing approach is to use the personal data to create statistically similar synthetic data, and to use that synthetic data to train your AI systems. These considerations help protect personal information, and other sensitive information, from being leaked through the use of AI. This will help to ensure that AI technology can be used safely and responsibly, and that the benefits of AI can be enjoyed with fewer risks.

Webinar summary

In this webinar, hosted by the Open Source Initiative as part of the “Deep Dive: Defining Open Source AI” series, Michael Meehan emphasizes the importance of synthetic data as a means to protect personal information and sensitive data while training AI models. Using the example of a company called Everalbum, Meehan suggests that Everalbum could have avoided privacy issues if it had used synthetic data in its model training process. Synthetic data can sidestep many privacy concerns because, although it is generated to be statistically similar to real data, it does not contain real, identifiable records. While acknowledging that synthetic data may sacrifice some accuracy compared to real data, Meehan underscores the crucial balance between privacy and accuracy. He stresses that these considerations extend beyond personal information and can apply to trade secrets or other sensitive data. Ultimately, the use of synthetic data can promote the safe and responsible use of AI technology while minimizing risks associated with data privacy.
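
As a rough illustration of what "statistically similar synthetic data" can mean, the sketch below fits a simple multivariate Gaussian to a small table of numeric records and samples new records from it. This is an assumption made for illustration, not the method described in the webinar: the function names (`fit_gaussian`, `sample_synthetic`) and the age/income columns are hypothetical, the "real" data is itself simulated, and a plain Gaussian fit carries no formal privacy guarantee (it is not differentially private, for example).

```python
import numpy as np


def fit_gaussian(real: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Estimate the column means and covariance matrix of the real records."""
    mean = real.mean(axis=0)
    cov = np.cov(real, rowvar=False)
    return mean, cov


def sample_synthetic(mean: np.ndarray, cov: np.ndarray,
                     n_rows: int, seed: int = 0) -> np.ndarray:
    """Draw new records from the fitted Gaussian; no real row is copied."""
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(mean, cov, size=n_rows)


# Hypothetical stand-in for real personal data: correlated age and income.
rng = np.random.default_rng(42)
age = rng.normal(40, 10, size=1000)
income = 1000 * age + rng.normal(0, 5000, size=1000)
real = np.column_stack([age, income])

# Fit the summary statistics, then generate a synthetic table of the same size.
mean, cov = fit_gaussian(real)
synthetic = sample_synthetic(mean, cov, n_rows=1000)

# Compare the age/income correlation in the real and synthetic tables.
real_corr = np.corrcoef(real, rowvar=False)[0, 1]
synth_corr = np.corrcoef(synthetic, rowvar=False)[0, 1]
print(f"real correlation:      {real_corr:.3f}")
print(f"synthetic correlation: {synth_corr:.3f}")
```

A model trained on `synthetic` sees the same broad relationships as one trained on `real`, but only to the extent that the fitted distribution captures them. Anything the Gaussian misses (skew, outliers, non-linear structure) is lost, which is one concrete form of the privacy and accuracy trade-off mentioned above.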

