Commons-based data governance

Part of the Deep Dive: AI Webinar Series

Issues related to data governance (its openness, provenance, transparency) have traditionally been outside the scope of open source frameworks. Yet the development of machine learning models shows that data governance should be in scope for any approach that aims to govern open source AI in a holistic way. In this session, I would like to discuss issues such as:
– the need for openly licensed, commons-based data sources
– the feasibility of a requirement to openly share any data used in the training of open source models
– transparency and provenance requirements that could be part of an open source AI framework

Webinar summary

In this webinar, hosted by the Open Source Initiative as part of the “Deep Dive: Defining Open Source AI” series, Alex Tarkowski and Zuzanna Warso discuss the importance of commons-based data governance for open source AI and for democratizing AI development. They emphasize the need to rethink how AI datasets are managed and shared, given the diversity of data sources and the challenge of balancing openness with privacy and fairness. The speakers propose four key principles for commons-based dataset governance: sharing data as openly as possible, respecting the decisions of data subjects and creators, ensuring sustainability and fair rewards for contributors, and protecting the commons from pollution and biases. They argue that these principles can serve as a roadmap for developing and maintaining open datasets in a fair and sustainable manner, ultimately benefiting the open source AI community.