The Open Source AI Definition – draft v. 0.0.8

version 0.0.8

Note: This document has three parts: a preamble, stating the intentions of this document; the Definition of Open Source AI itself; and a checklist to evaluate legal documents.

This document follows the definition of AI system adopted by the Organisation for Economic Co-operation and Development (OECD):

An AI system is a machine-based system that, for explicit or implicit objectives, infers, from the input it receives, how to generate outputs such as predictions, content, recommendations, or decisions that can influence physical or virtual environments. Different AI systems vary in their levels of autonomy and adaptiveness after deployment.

More information about definitions of AI systems is available on OSI's blog.

Preamble

Why we need Open Source Artificial Intelligence (AI)

Open Source has demonstrated that massive benefits accrue to everyone when you remove the barriers to learning, using, sharing and improving software systems. These benefits are the result of using licenses that adhere to the Open Source Definition. The benefits can be summarized as autonomy, transparency, frictionless reuse, and collaborative improvement.

Everyone needs these benefits in AI. We need essential freedoms to enable users to build and deploy AI systems that are reliable and transparent.

What is Open Source AI

An Open Source AI is an AI system made available under terms that grant the freedoms to:

Use the system for any purpose and without having to ask for permission.
Study how the system works and inspect its components.
Modify the system for any purpose, including to change its output.
Share the system for others to use with or without modifications, for any purpose.

A precondition to exercising these freedoms is access to the preferred form for making modifications to the system.

Preferred form to make modifications to machine-learning systems

The preferred form of making modifications for a machine-learning Open Source AI must include:

Data information: Sufficiently detailed information about the data used to train the system, so that a skilled person can recreate a substantially equivalent system using the same or similar data.
- For example, if used, this would include the training methodologies and techniques, the training data sets used, information about the provenance of those data sets, their scope and characteristics, how the data was obtained and selected, the labeling procedures and data cleaning methodologies.
Code: The source code used to train and run the system.
- For example, if used, this would include code used for pre-processing data, code used for training, validation and testing, supporting libraries like tokenizers and hyperparameter search code, inference code, and model architecture.
Model: The model parameters.
- For example, this might include checkpoints from key intermediate stages of training as well as the final optimizer state.

Checklist to evaluate machine learning systems

This checklist is based on the paper "The Model Openness Framework: Promoting Completeness and Openness for Reproducibility, Transparency and Usability in AI," published Mar 21, 2024.

Table of default required components

Required components	Legal frameworks
Data information
– Training methodologies and techniques	Available under OSD-compliant license
– Training data scope and characteristics	Available under OSD-compliant license
– Training data provenance (including how data was obtained and selected)	Available under OSD-compliant license
– Training data labeling procedures, if used	Available under OSD-compliant license
– Training data cleaning methodology	Available under OSD-compliant license
Code
– Data pre-processing	Available under OSI-approved license
– Training, validation and testing	Available under OSI-approved license
– Inference	Available under OSI-approved license
– Supporting libraries and tools	Available under OSI-approved license
Model
– Model architecture	Available under OSI-approved license
– Model parameters	Available under OSD-conformant terms
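As an illustration only, the required-components table can be written down as a small data structure and checked mechanically. The sketch below is not part of the Definition: the names REQUIRED_COMPONENTS, CONDITIONAL and missing_components, and the shape of the release mapping, are hypothetical. It checks only whether each required component is present with some stated terms. Whether those terms actually meet the "Legal frameworks" column remains a human judgment that the sketch does not attempt.

```python
# Hypothetical sketch: structure and names are illustrative assumptions,
# not defined anywhere in the Open Source AI Definition.
REQUIRED_COMPONENTS = {
    "Data information": [
        "Training methodologies and techniques",
        "Training data scope and characteristics",
        "Training data provenance",
        "Training data labeling procedures",
        "Training data cleaning methodology",
    ],
    "Code": [
        "Data pre-processing",
        "Training, validation and testing",
        "Inference",
        "Supporting libraries and tools",
    ],
    "Model": [
        "Model architecture",
        "Model parameters",
    ],
}

# The table marks labeling procedures as required only "if used".
CONDITIONAL = {"Training data labeling procedures"}


def missing_components(release, not_applicable=frozenset()):
    """Return (category, component) pairs that `release` does not cover.

    `release` maps a component name to the terms it is published under
    (any non-empty string). Conditional components listed in
    `not_applicable` are skipped. Only presence is checked, not whether
    the terms satisfy the legal framework named in the table.
    """
    missing = []
    for category, components in REQUIRED_COMPONENTS.items():
        for component in components:
            if component in CONDITIONAL and component in not_applicable:
                continue
            if not release.get(component):
                missing.append((category, component))
    return missing
```

Calling missing_components with a mapping that omits, say, "Inference" would return that entry paired with its category, which gives a reviewer a starting list of gaps before the legal review of each component's terms.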

The following components are not required as the preferred form of making modifications, but their inclusion in releases is appreciated.

Optional components	Legal frameworks
Data information: all data sets, including:	Available under OSD-compliant license
– Training data sets	Available under OSD-compliant license
– Testing data sets	Available under OSD-compliant license
– Validation data sets	Available under OSD-compliant license
– Benchmarking data sets	Available under OSD-compliant license
– Data card	Available under OSD-compliant license
– Evaluation data	Available under OSD-compliant license
– Evaluation results	Available under OSD-compliant license
– Other data documentation	Available under OSD-compliant license
Code
– Code used to perform inference for benchmark tests	Available under OSI-approved license
– Evaluation code	Available under OSI-approved license
Model: all model elements, including:
– Model card	Available under OSD-compliant license
– Sample model outputs	Available under OSD-compliant license
– Model metadata	Available under OSD-compliant license
Other: any other documentation or tools produced or used, including:
– Research papers	Available under OSD-compliant license
– Technical report	Available under OSD-compliant license
