ClearlyDefined at the ORT Community Days

Once again, Bosch’s campus in Berlin hosted ORT Community Days, the annual event organized by the OSS Review Toolkit (ORT) community. ORT is an Open Source suite of tools to automate software compliance checks.

During this two-day event, members from startups like Double Open and nexB, as well as large corporations like Mercedes-Benz, Volkswagen, CARIAD, Porsche, Here Technologies, EPAM, Deloitte, Sony, Zeiss, Fraunhofer, and Roche, came together to discuss best practices around software supply chain compliance.

The ClearlyDefined community had an important presence at the event, represented by E. Lynette Rayle and Lukas Spieß from GitHub and Qing Tomlinson from SAP. I had the pleasure of representing the Open Source Initiative as the community manager for ClearlyDefined. The mission of ClearlyDefined is to crowdsource a global database of licensing metadata for every software component ever published. We see the ORT community as an important partner in achieving this mission.

Relevant talks

There were several interesting talks at ORT Community Days. These are the ones I found most relevant to ClearlyDefined:

Philippe Ombredanne presented ScanCode, a project of great importance to ClearlyDefined, as we use this tool to detect licenses, copyrights, and dependencies. Philippe gave an overview of the project and its challenges. For ClearlyDefined, we would like to see improvements in both accuracy and performance.

Sebastian Schuberth presented the Double Open Server (DOS) companion for ORT. DOS is a server application that scans the source code of open source components, stores the scan results for use in license compliance pipelines, and provides a graphical interface for manually curating the license findings. I believe there’s an opportunity to integrate DOS with ClearlyDefined by providing access to our APIs to fetch licensing metadata and allowing the sharing of curations.

Marcel Kurzmann and Martin Nonnenmacher presented Eclipse Apoapsis, another server for ORT that makes use of ORT’s integration APIs for dependency analysis, license scanning, vulnerability databases, rule engine, and report generation. Again, I feel we could integrate Eclipse Apoapsis with ClearlyDefined the same way as with DOS.

Till Jaeger gave an excellent talk about curation of ORT output from the perspective of FOSS license compliance. He highlighted the Cyber Resilience Act (CRA), which introduces legal provisions for SBOMs and will likely increase the need for tools like ORT. Till shared the many challenges in the curation process, particularly the compatibility issues from dual licensing, and went on to showcase the OSADL compatibility matrix.

Presenting ClearlyDefined

I had the privilege of presenting ClearlyDefined together with E. Lynette Rayle from GitHub, and we got some really good feedback and questions from the audience.

With the move towards SBOMs everywhere for compliance and security reasons, organizations will face great challenges in generating these at scale for each stage of the supply chain, for every build or release. Additionally, multiple organizations will have to curate the same missing or wrongly identified licensing metadata over and over again.

ClearlyDefined is well suited to solve these problems by serving a cached copy of licensing metadata for each component through a simple API. Organizations will also be able to contribute back with any missing or wrongly identified licensing metadata, helping to create a database that is accurate for the benefit of all.

GitHub is well aware of these challenges and is interested in helping its users in this regard. They recently added 17.5 million package licenses sourced from ClearlyDefined to their database, expanding the license coverage for packages that appear in dependency graph, dependency insights, dependency review, and a repository’s software bill of materials (SBOM).

To make use of ClearlyDefined’s data, a user can simply call its API service. For example, to fetch licensing metadata for the lodash library on npm at version 4.17.21, one would call:

curl -X GET "https://api.clearlydefined.io/definitions/npm/npmjs/-/lodash/4.17.21" -H "accept: */*"
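The same request can be sketched in Python. This is only an illustration: the helper names `definition_url` and `fetch_definition` are made up for this post, and the path layout (type, provider, namespace, name, revision, with "-" standing in for an empty namespace) is inferred from the lodash URL above rather than taken from a specification.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

API_BASE = "https://api.clearlydefined.io"


def definition_url(type_, provider, namespace, name, revision):
    """Build a definitions URL from component coordinates.

    Assumption: an empty namespace is written as "-", as in the
    lodash example above.
    """
    parts = [type_, provider, namespace or "-", name, revision]
    return API_BASE + "/definitions/" + "/".join(quote(p, safe="") for p in parts)


def fetch_definition(url):
    """Send the same GET request as the curl command and parse the JSON reply."""
    request = Request(url, headers={"accept": "*/*"})
    with urlopen(request, timeout=30) as response:
        return json.load(response)


url = definition_url("npm", "npmjs", None, "lodash", "4.17.21")
print(url)
```

Calling `fetch_definition(url)` would then perform the network request. It is left out of the snippet so that the example runs offline.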

This API call would be processed by the service for ClearlyDefined, as illustrated in the diagram below. If there’s a match in the definition store, then that definition would be sent back to the user. Otherwise, this request would trigger the crawler for ClearlyDefined (part of the harvesting process), which would download the lodash library from npm, scan the library, and write the results to the raw results store. The service for ClearlyDefined would then read the raw results, summarize them, and create a definition to be written to the definition store. Finally, the definition would be served to the user.
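The lookup-or-harvest flow described above can be sketched as follows. This is a toy model, not ClearlyDefined's code: the two stores are plain dictionaries, `harvest` and `summarize` are hypothetical stand-ins for the crawler and the summarizer, the scan result is a hard-coded placeholder, and the whole flow is collapsed into one synchronous function for readability.

```python
# In-memory stand-ins for the two stores mentioned in the text.
definition_store = {}   # coordinates -> definition
raw_results_store = {}  # coordinates -> raw scan output


def harvest(coordinates):
    """Stand-in for the crawler: pretend to download and scan the
    component, then write the raw results to the raw results store."""
    raw_results_store[coordinates] = {
        "scanned": coordinates,
        "licenses": ["LicenseRef-placeholder"],  # placeholder, not real scan data
    }


def summarize(raw):
    """Stand-in for the summarizer: reduce raw results to a definition."""
    return {"coordinates": raw["scanned"], "declared": raw["licenses"][0]}


def get_definition(coordinates):
    """Serve a stored definition, or harvest, summarize, and store one."""
    if coordinates in definition_store:
        return definition_store[coordinates]
    harvest(coordinates)
    definition = summarize(raw_results_store[coordinates])
    definition_store[coordinates] = definition
    return definition


first = get_definition("npm/npmjs/-/lodash/4.17.21")   # triggers a harvest
second = get_definition("npm/npmjs/-/lodash/4.17.21")  # served from the store
```

The second call never reaches the crawler, which is the point of the design: a component is scanned once and every later request is served from the definition store.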

Curations are submitted through another API call, using PATCH requests. For example, the following excerpt from a PATCH body updates a declared license to Apache-2.0:

"contributionInfo": {
    "summary": "[Test] Update declared license",
    "details": "The declared license should be Apache as per the LICENSE file.",
    "resolution": "Updated declared license to Apache-2.0.",
    "type": "incorrect",
    "removeDefinitions": false
},
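The excerpt above shows only the contributionInfo part of the request body. As a rough sketch of how a complete body might be assembled, the Python below pairs it with a `patches` list. Please treat the `patches` layout (coordinates plus per-revision changes) as my assumption, to be checked against the ClearlyDefined API documentation. The function name `build_curation`, the component `example-package`, and the version `1.0.0` are hypothetical and used only for illustration.

```python
import json


def build_curation(coordinates, revision, declared_license):
    """Assemble a curation body that sets the declared license.

    The contributionInfo values are copied from the excerpt above.
    The "patches" structure is an assumption, not confirmed here.
    """
    return {
        "contributionInfo": {
            "summary": "[Test] Update declared license",
            "details": "The declared license should be Apache as per the LICENSE file.",
            "resolution": "Updated declared license to Apache-2.0.",
            "type": "incorrect",
            "removeDefinitions": False,
        },
        "patches": [
            {
                "coordinates": coordinates,
                "revisions": {
                    revision: {"licensed": {"declared": declared_license}},
                },
            }
        ],
    }


# Hypothetical component, used only to show the shape of the body.
body = build_curation(
    {"type": "npm", "provider": "npmjs", "name": "example-package"},
    "1.0.0",
    "Apache-2.0",
)
print(json.dumps(body, indent=2))
```

Building the body as a dictionary and serializing it with `json.dumps` avoids hand-written JSON mistakes such as a stray trailing comma.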

This curation is handled by the service for ClearlyDefined, as illustrated in the diagram below. The curation would trigger the creation of a PR in ClearlyDefined’s curated-data repository, which would be reviewed and signed off by two curators. The PR would then be merged and written to the curated-data store.

GitHub has deployed its own local Harvester for ClearlyDefined, as illustrated in the diagram below. GitHub’s OSPO Policy Service posts requests to GitHub’s Harvester for ClearlyDefined, which downloads any components and dependencies from various package managers, scans these components, and writes the results directly to ClearlyDefined’s raw results store. GitHub’s OSPO Policy Service fetches definitions from the service for ClearlyDefined as well as licenses and attributions from GitHub’s Package License Gateway. GitHub maintains a local cache store which is synced with any updates from ClearlyDefined’s changes-notifications blob storage.

ClearlyDefined’s development has seen increased participation from various organizations this past year, including GitHub, SAP, Microsoft, Bloomberg, and Codethink.

Currently, maintainers of ClearlyDefined are focused on ongoing maintenance. Key goals for ClearlyDefined in 2024 include:

  • Publishing periodic releases and switching to semantic versioning
  • Bringing dependencies up to date (in particular using the latest ScanCode)
  • Addressing the NOASSERTION/OTHER issue
  • Advancing usability and the curation process through the UI 
  • Enhancing the documentation and process for creating a local harvest

Our slides are available here.

Relevant breakout sessions

ORT Community Days provided several breakout sessions to allow participants to discuss pain points and solutions.

A special discussion around curations was led by Sebastian Schuberth and E. Lynette Rayle. The ORT Package Curation Data can be broken down into two categories: metadata interpretations and legal curations. The group discussed their thoughts about the curation process and its challenges, including handling false positives and the sharing of curations.

Nowadays, no conference would be complete without at least one talk or discussion about Artificial Intelligence. A group gathered to discuss the potential use of AI to improve user experience as well as for OSS compliance. The majority of attendees believed that ORT’s documentation could be improved through the use of AI and that an assistant would be helpful for answering the most common questions. As for the use of AI for OSS compliance, there’s a lot of potential here, and one idea would be to use ClearlyDefined’s curation dataset to fine-tune an LLM.

Conclusion

The second edition of ORT Community Days represented a unique opportunity for the ClearlyDefined community to better engage with the ORT community. We were able to meet the maintainers and members of ORT and learn from them about the current and future challenges. We were also able to explore how our communities can further collaborate. 

On behalf of the ClearlyDefined community, I would like to thank the organizers of this wonderful event: Marcel Kurzmann, Nikola Babadzhanov, Surya Santhi, and Thomas Steenbergen. I would also like to thank E. Lynette Rayle, Lukas Spieß, and Qing Tomlinson from the ClearlyDefined community, who accepted my invitation to participate in this conference.

If you are interested in Open Source supply chain compliance and security, I invite you to learn a bit more about the ClearlyDefined and ORT communities. You might also be interested in my report from FOSS Backstage.