Part of the Deep Dive: Data Governance Webinar Series
Today’s AI models depend on an invisible supply chain: the data they’re trained and fine-tuned on. But who’s responsible for documenting, disclosing, and governing that lineage? Without standardized, automated ways to capture and communicate data provenance, responsible AI and regulatory compliance remain wishful thinking, and the risk of harm from using this data rises.
This talk charts the journey from corporate innovation to open standard, specifically how the Data & Trust Alliance developed the first cross-industry data provenance specification, and how it’s now being shepherded through the OASIS Open standards process to ensure open governance, interoperability, and adoption across sectors. We’ll walk through:
● Why provenance is essential for trustworthy AI
● How governance, signaling, and stewardship intersect in real-world enterprise settings
● The role of open standards in translating principles into practice
● Lessons learned from cross-industry collaboration
● What’s next: implementation guides, tools, and enterprise adoption
This is a talk for practitioners, policy thinkers, and engineers who want more than frameworks and are looking for tools, standards, and field-tested insights that can scale with the complexity of AI systems.
Video transcript
Presenter: Lisa Bobbit, Principal Engineer at Cisco Systems
Role: Co-chair, OASIS Data Provenance Standards Technical Committee
Why Data Provenance Matters
Data is the supply chain for AI models. Without understanding our data—who’s responsible for it, what’s happened to it, and how it’s being used—we cannot ensure AI is deployed responsibly. We need standardization, automation, and transparency.
As Rob Thomas noted: “AI is all about data. In fact, data may be the only sustainable source of competitive advantage.” Data must be treated as an independent, valued asset; to understand its true value and protect it properly, you need to know what you have and how it’s managed, governed, and used.
The Eight Core Metadata Areas
The Data & Trust Alliance identified eight essential metadata categories:
- Source – Where the data originated
- Lineage – What has happened to the data over time
- Legal Rights – Authorization to use the data and under what circumstances
- Privacy & Protection – Required security levels based on risk
- Recency – When the data was generated
- Data Type – What kind of data it is and how it was generated
- Intended Use – What the data should be used for
- Restrictions – What cannot be done with the data
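The eight categories above can be sketched as a simple record type. This is an illustrative sketch only; the field names are hypothetical placeholders, not the normative names defined by the OASIS specification.

```python
from dataclasses import dataclass, field

# Illustrative provenance record covering the eight metadata areas.
# Field names are hypothetical, not the spec's normative identifiers.
@dataclass
class ProvenanceRecord:
    source: str                 # where the data originated
    lineage: list[str]          # what has happened to the data over time
    legal_rights: str           # authorization to use the data, and under what terms
    privacy_protection: str     # required security level based on risk
    recency: str                # when the data was generated (e.g. an ISO 8601 date)
    data_type: str              # what kind of data it is and how it was generated
    intended_use: list[str] = field(default_factory=list)   # permitted purposes
    restrictions: list[str] = field(default_factory=list)   # prohibited uses
```

In practice a record like this would be attached to (or travel alongside) each dataset so downstream consumers can inspect provenance programmatically rather than by email thread.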
Key Standard Requirements
Baseline Information:
- Standard version tracking
- Dataset title and name
- Unique identifier
- Metadata location
- Issuer and source information
- Creation and issue dates
- Data format and method
- Confidentiality and privacy levels
- Processing and storage locations
- Licensing and usage rules
- Intellectual property designations
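As a rough illustration of how the baseline fields above might be checked for completeness, here is a minimal sketch; the keys are hypothetical placeholders chosen to mirror the list, not the specification's normative field names.

```python
# Hypothetical baseline field set mirroring the list above; not the
# normative OASIS field names.
REQUIRED_FIELDS = {
    "standard_version", "dataset_title", "unique_id", "metadata_location",
    "issuer", "source", "creation_date", "issue_date", "data_format",
    "confidentiality_level", "privacy_level", "processing_location",
    "storage_location", "license", "ip_designation",
}

def missing_fields(record: dict) -> set[str]:
    """Return the required baseline fields absent from a metadata record."""
    return REQUIRED_FIELDS - record.keys()
```

A check like this is the kind of thing the standard's emphasis on automation is meant to enable: metadata completeness becomes a gate in a pipeline rather than a manual review step.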
Three Pillars of Usability
- Interoperability – Readable, open, and usable across any platform, jurisdiction, or market
- Tool Agnostic – No need to reinvest in new tooling or processes
- Automation – Enable automated information capture and transparent decision-making
Practical Applications
Know Your Data
- Track source information and lineage
- Monitor where data is collected and stored
- Document curation and updates
Share Responsibly
- Track who has access and where data is copied
- Automate cataloging
- Enable intent-based access management
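One way to sketch intent-based access management is to compare a requested use against the dataset's declared intended uses and restrictions. This helper is an assumption for illustration, not part of the standard itself.

```python
def is_use_permitted(requested_use: str,
                     intended_uses: list[str],
                     restrictions: list[str]) -> bool:
    """Allow a requested use only if it is declared as an intended use
    and is not explicitly listed as a restriction."""
    return requested_use in intended_uses and requested_use not in restrictions
```

For example, a dataset labeled for analytics but restricted from model training would permit an "analytics" request and deny a "model-training" one, with the decision driven entirely by the provenance metadata.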
Acquire Wisely
- Understand vendor data quality
- Identify potential biases
- Avoid purchasing duplicate data
Maximize Value
- Identify new opportunities
- Make risk-based decisions
- Enable transparency for certifications and assessments
Implementation Strategy
- Start small – Use agile approaches with high-risk, high-value datasets
- Standardize – Ensure consistent metadata fields across the organization
- Build visibility – Create dashboards for teams to access metadata
- Engage stakeholders – Include data scientists, legal, compliance, and procurement
- Extend externally – Work with vendors, partners, and data brokers
Benefits
- Lower costs in procurement and integration
- Mitigate risks from unknown data sources
- Support explainable and ethical AI
- Enable faster compliance
- Provide traceable proof of data handling
- Build foundation for innovation
Get Involved:
Visit the OASIS Data Provenance Standards Technical Committee and join the community on LinkedIn and GitHub.
