Open Source AI: What About Data Transparency?
AI uses both code and data, and this combination continues to be a challenge for open source, said experts at the United Nations OSPOs for Good Conference.
The cries to “ban open source” first surfaced last autumn—partly a response to Meta and others’ “opening” large language models (LLMs). Lobbyists bandied the phrase around political rallies and across policy circles. Yet many critics could not explain what open source means in any context and were unfamiliar with the Open Source Definition (OSD). Not knowing or understanding the technical details did not appear to be a barrier to sharing a negative opinion.
After two years of work, OSI has a draft definition, Stefano Maffulli, executive director of OSI, told The New Stack. The team is going through a “validation phase,” he said, making sure the definition includes everything that falls into the open source category, or is likely to.
The OSI is planning a series of workshops to help it refine its working definition of open source AI, with support from Amazon, Cisco, Google and the Sloan Foundation.
The Open Source Initiative is leading an open, community-driven process to detail how the open source definition applies in the context of an AI world. But even without that process, we already know that Meta’s custom license, by restricting usage and the ability to create derivative works, violates multiple tenets of both the current definition of open source and any final work of the OSI community that’s specific to AI.
Delayed Open Source Publication (DOSP) involves initially releasing software under a proprietary license, followed by a planned transition to an open source license.
