Part of the Deep Dive: AI Webinar Series
Openness in AI is necessarily a multidimensional and therefore graded notion. We present work on tracking openness, transparency and accountability in current instruction-tuned large language models. Our aim is to provide evidence-based judgements of openness for over ten specific features, from source code to training data to model weights and from licensing to scientific documentation and API access. The features are grouped in three broad areas (availability, documentation, and access methods). The openness judgements can be used individually by potential users to make informed decisions for or against deployment of a particular architecture or model. They can also be used cumulatively to derive overall openness scores (tracked at https://opening-up-chatgpt.github.io). This approach allows us to efficiently point out questionable uses of the term “open source” (for instance, Meta’s Llama2 emerges as the least open of all ‘open’ models) and to incentivise developers to consider openness and transparency throughout the model development and deployment cycle (for instance, the BLOOMZ model stands out as a paragon of openness). While our focus is on LLM+RLHF architectures, the overall approach of decomposing openness into its most relevant constituent features is of general relevance to the question of how to define “open” in the context of AI and machine learning. As scientists working in the spirit of open research, the framework and source code underlying our openness judgements and live tracker is itself open source.
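The cumulative scoring described above can be sketched in a few lines. This is an illustrative assumption, not the project's exact scoring rule: the three-level scale (open / partial / closed) mirrors the tracker's judgements, but the numeric weights and the example feature values below are hypothetical, chosen only to be consistent with the ranking the text reports (BLOOMZ highly open, Llama2 least open).

```python
# Hypothetical sketch of cumulative openness scoring.
# Weights (open=1.0, partial=0.5, closed=0.0) are an illustrative assumption,
# not the project's actual scoring scheme.
LEVELS = {"open": 1.0, "partial": 0.5, "closed": 0.0}

def openness_score(judgements: dict[str, str]) -> float:
    """Mean of per-feature openness levels, yielding a score in [0, 1]."""
    return sum(LEVELS[level] for level in judgements.values()) / len(judgements)

# Example feature values are made up for illustration; the real judgements
# live at https://opening-up-chatgpt.github.io
bloomz = {
    "source code": "open", "training data": "open", "model weights": "open",
    "licensing": "open", "scientific documentation": "open", "API access": "partial",
}
llama2 = {
    "source code": "partial", "training data": "closed", "model weights": "partial",
    "licensing": "partial", "scientific documentation": "partial", "API access": "partial",
}

print(openness_score(bloomz) > openness_score(llama2))  # → True
```

Decomposing the score this way means each per-feature judgement stays individually inspectable, while the aggregate still supports cross-model ranking.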
In this webinar hosted by the Open Source Initiative as part of the “Deep Dive: Defining Open Source AI” series, Andreas Liesenfeld and Mark Dingemanse of the Center for Language Studies at Radboud University in the Netherlands discuss the “Opening Up ChatGPT” project, which aims to evaluate the openness of large language models and text generators. The project addresses the need for open source AI technology in European academia and introduces a methodology to assess the openness of these systems. The evaluation covers multiple dimensions of openness, including the availability of source code, documentation, user access, pre-training datasets, model weights, and licenses. Through a comparative analysis of two systems, the BLOOMZ model and Meta’s Llama2, the presentation illustrates how this evidence-based approach can differentiate genuinely open systems from those with limited openness. The project also highlights emerging challenges in the field, such as the use of synthetic data and the need for transparency in multi-step training pipelines.