Part of the Deep Dive: Data Governance Webinar Series
As AI systems become more widespread, so does the need for easy-to-understand, consistent, and actionable documentation that explains the training data, how the AI model was created, and how it should and shouldn't be used. In this talk, we introduce the CLeAR Framework, a set of guiding principles stating that documentation for datasets, models, and AI systems should be Comparable, Legible, Actionable, and Robust. Developed by a group of practitioners and scholars with deep expertise in AI documentation across industry and research, CLeAR helps practitioners navigate the tradeoffs and complexities of documenting today's AI systems.
In this talk, we'll walk through the core components of the framework, illustrate its value through real-world case studies, and demonstrate how it supports responsible development, reflection, and oversight. Attendees will leave with practical strategies, examples, and a vocabulary to strengthen documentation practices across the AI lifecycle, including making the AI systems we use daily more explainable.
The full CLeAR Framework paper can be accessed here:
The citation for the work is below:
Chmielinski, K., Newman, S., Kranzinger, C. N., Hind, M., Vaughan, J. W., Mitchell, M., Stoyanovich, J., McMillan-Major, A., McReynolds, E., Esfahany, K., Gray, M. L., Chang, A., & Hudson, M. (2024, May 21). The CLeAR Documentation Framework for AI Transparency: Recommendations for Practitioners & Context for Policymakers. Shorenstein Center.
Video transcript
Presenter: Kasia Chmielinski, Executive Director, Data Nutrition Project
The Problem: Mystery Sandwiches in Technology
We don’t eat mystery sandwiches without knowing the ingredients, yet we routinely use AI systems without understanding their training data. This lack of transparency leads to problematic outcomes:
- Amazon's hiring tool discriminated against women due to biased historical resume data
- Suicide risk prediction models fail on underrepresented communities
- Many other cases of AI systems exhibiting bias from incomplete or biased training data
Unlike food, which has nutrition labels and allergy warnings, AI training data has no disclosure requirements or regulations.
The Solution: The CLeAR Framework
CLeAR is a documentation framework for AI models, systems, and data, developed by the Data Nutrition Project and collaborators in 2024.
Four Principles
1. Comparable
- Similar components across documentation
- Enables comparison between datasets, models, or systems
- Helps users make informed choices
2. Legible
- Clear and accessible to intended audience
- Uses appropriate language for that audience
- Supports decision-making
3. Actionable
- Provides practical value to intended audience
- Right level of granularity and detail
- Enables concrete actions like auditing or evaluation
4. Robust
- Sustained and updated over time
- Requires ongoing resources and processes
- Stays current with changes to data/models/systems
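To make the four principles above concrete, here is a minimal sketch of what a documentation record shaped by them might look like. This is purely illustrative: the field names (`intended_uses`, `known_limitations`, `maintainer_contact`, and so on) are assumptions for this example, not a schema prescribed by the CLeAR Framework.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical documentation record; each group of fields maps onto one
# CLeAR principle. Field names are illustrative, not part of the framework.
@dataclass
class DatasetDoc:
    # Comparable: shared fields let records be compared side by side
    name: str
    license: str
    source: str
    # Legible: a plain-language summary aimed at the intended audience
    summary: str
    # Actionable: concrete guidance on appropriate and inappropriate use
    intended_uses: list[str] = field(default_factory=list)
    known_limitations: list[str] = field(default_factory=list)
    # Robust: update metadata and a contact so the record can stay current
    last_updated: date = field(default_factory=date.today)
    maintainer_contact: str = ""

doc = DatasetDoc(
    name="example-resumes-v2",
    license="CC-BY-4.0",
    source="internal hiring archive (2015-2020)",
    summary="Anonymized resumes used to train a screening model.",
    intended_uses=["bias auditing", "benchmarking screening models"],
    known_limitations=["underrepresents non-technical roles"],
    maintainer_contact="data-team@example.org",
)
```

Structuring documentation as typed fields rather than free text is one way to keep records comparable across datasets while leaving room for audience-appropriate prose in the summary.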
Trade-offs
These principles interconnect and sometimes conflict:
- Maximizing legibility (e.g., lengthy plain-language explanations) can exhaust the resources needed to keep documentation current, sacrificing robustness
- Comparability across different datasets may reduce specificity
- Perfect documentation that’s hard to update isn’t robust
Implementation Guidance
- Build on existing work – Don’t start from scratch
- Be realistic – Consider ongoing resource requirements
- Start early, revise often – Document alongside development, not at the end
- Consider your audience – Technical vs. policy vs. research audiences need different approaches
- Document regardless of scale – The process itself improves outcomes
Case Studies
1. HuggingFace Data Card
- Comparable: ✓ Standard format across the platform (but voluntary)
- Legible: Limited information, technical audience
- Actionable: Basic licensing and citation info, but lacks usage guidance
- Robust: Attached to dataset, but no update dates or contact info
2. Google Gemini Model Card
- Comparable: Standardized across Google models, but PDF format limits comparison
- Legible: ✓ Focused on ethics/safety community
- Actionable: Limited for practitioners; useful for safety testing planners
- Robust: Includes dates and contact info, but PDF requires regeneration for updates
3. Data Nutrition Label
- Comparable: Standard visual structure, though domain-specific info may be lost
- Legible: Accessible to general practitioners and researchers
- Actionable: ✓ Clear usage guidance and limitations
- Robust: Includes dates and contacts, but not attached to the dataset
Key Insight
Documentation done alongside development improves the final product. When teams know they must document their work, they make better choices throughout the process—a virtuous cycle.
Learn more: The Data Nutrition Project builds open-source tools for creating data nutrition labels and provides education and policy guidance for AI governance.
