The Open Source AI Definition 1.0

We have released the first stable version of the Definition.

Clarifying Misunderstandings About the Open Source AI Definition

The Open Source AI Definition (OSAID) represents an important first step in how we define what Open Source means in an AI context. As AI systems differ fundamentally from traditional software, the OSAID seeks to establish the first set of clear, practical guidelines for their development, use and modification. Unfortunately, misconceptions about the definition persist, often stemming from a lack of understanding about the nature of AI. This post aims to clarify key points and provide a forward-looking perspective on the importance of the OSAID.

What Is an AI System?

One common misunderstanding revolves around the definition of an AI system. According to the OSAID, an AI system aligns with the definition provided by the Organisation for Economic Co-operation and Development (OECD):

An AI system is a machine-based system that, for explicit or implicit objectives, infers, from the input it receives, how to generate outputs such as predictions, content, recommendations, or decisions that can influence physical or virtual environments. Different AI systems vary in their levels of autonomy and adaptiveness after deployment.

In simple terms, an AI system is “the thing” that processes input to produce output, whether that’s a prediction, recommendation, or another result. Anchoring discussions in such common definitions is essential because AI systems differ radically from traditional software.

AI Systems Are Not “Programmed”

Another frequent error is equating training data with source code, suggesting that “training data is how the model gets programmed.” This reflects a misunderstanding: unlike traditional software, AI systems are not programmed in the conventional sense. Instead, they acquire capabilities autonomously during the training process—a phenomenon that distinguishes them from software like the Linux kernel.

For example, the Linux kernel is:

  • Programmed by humans.
  • Composed of source code that developers can read, study, and modify.
  • Reproducible, meaning its binary form can be reliably rebuilt from its source code.

In contrast, modern AI systems such as large language models develop their behavior in ways that are often unpredictable and inexplicable. Training processes are challenging to replicate reliably, even by the system’s creators. These differences necessitated the establishment of a unique definition beyond the Open Source Definition for software.
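The contrast above can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not a description of how the Linux kernel is built or how any real model is trained: `build` and `train` are hypothetical stand-ins, and the seed stands for the random choices (initial weights, order of examples) that a training run makes.

```python
import hashlib
import random


def build(source: str) -> str:
    """Toy stand-in for a reproducible build: the output depends only on the source."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def train(data, seed: int, steps: int = 5) -> float:
    """Toy stand-in for training: fit y = w * x by stochastic gradient descent.

    Both the starting weight and the order in which examples are drawn
    depend on the seed, so the learned weight does too.
    """
    rng = random.Random(seed)
    w = rng.uniform(-1.0, 1.0)
    for _ in range(steps):
        x, y = rng.choice(data)
        w -= 0.01 * 2 * (w * x - y) * x  # gradient step on the squared error
    return w


source = "int main(void) { return 0; }"
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]

# Rebuilding from the same source gives the same artifact.
same_build = build(source) == build(source)

# Two training runs on the same data, differing only in their random
# choices, end with different parameters.
same_training = train(data, seed=1) == train(data, seed=2)

print("builds match:", same_build)
print("training runs match:", same_training)
```

In this toy, fixing the seed makes training repeatable. The point made above is that in real, large-scale training such sources of variation are far harder to pin down.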

Addressing Bugs in AI Systems

A core question the OSAID addresses is: How do you fix a buggy AI system? For traditional software, the Open Source Definition provides a clear answer:

The program must include source code, and must allow distribution in source code as well as compiled form. The source code must be the preferred form in which a programmer would modify the program.

However, modifying an AI system requires more than just source code. After extensive consultation with AI developers, researchers and practitioners, the community, through the OSAID co-design process, concluded that the preferred form for modifying an AI system includes:

  1. The software used to create the dataset (i.e., to transform raw data into tokens).
  2. The software used to train the system.
  3. The results of the training (i.e., the parameters).
  4. All legally shareable data used in the training process.

These components collectively enable the study, use, modification and sharing of AI systems in a manner consistent with Open Source principles.
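As a rough illustration, the four components listed above can be written down as a checklist. The Python sketch below is only that: the `Release` class, its field names and the `missing_components` helper are hypothetical names invented for this example, and the check is not the OSAID's own validation procedure.

```python
from dataclasses import dataclass


@dataclass
class Release:
    """Hypothetical record of what an AI system release ships."""
    data_processing_code: bool      # software used to create the dataset
    training_code: bool             # software used to train the system
    parameters: bool                # results of the training
    shareable_training_data: bool   # all legally shareable training data


# Human-readable labels, in the same order as the list above.
LABELS = {
    "data_processing_code": "software used to create the dataset",
    "training_code": "software used to train the system",
    "parameters": "results of the training (the parameters)",
    "shareable_training_data": "legally shareable training data",
}


def missing_components(release: Release) -> list:
    """Return the labels of the listed components that the release lacks."""
    return [label for name, label in LABELS.items() if not getattr(release, name)]


# Example: a release that publishes only the trained parameters.
weights_only = Release(
    data_processing_code=False,
    training_code=False,
    parameters=True,
    shareable_training_data=False,
)

for label in missing_components(weights_only):
    print("missing:", label)
```

A release that ships only parameters would be reported here as missing the other three items, which is the situation the list above is meant to rule out.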

Why Data Matters

A significant challenge in AI is the role of data. Unlike traditional software, where source code is the primary artifact, AI systems depend heavily on data—not just any data, but data processed and curated into training datasets. The OSAID recognizes the legal and ethical complexities of data sharing and uses precise legal terms to outline expectations. While not all raw data can be freely distributed, the Definition ensures that the essential elements for modifying an AI system are accessible.

Moving Forward

The Open Source AI Definition reflects a thoughtful and inclusive process, endorsed by leading AI developers, researchers and practitioners, that culminated in version 1.0 of the OSAID as a first step. It acknowledges that AI is fundamentally different from software and requires a tailored approach. Misunderstandings about the OSAID often arise from attempts to apply software engineering paradigms to AI, leading to confusion. By embracing the unique characteristics of AI systems, the OSAID offers a robust framework for fostering transparency, innovation and collaboration in AI development.

As we navigate the evolving landscape of AI, it’s crucial to engage with these definitions thoughtfully and constructively. By doing so, we can ensure that AI systems remain open, accessible and aligned with the principles of the broader Open Source movement.
