AI Governance Requires Complete Lineage Across Code, Data, and Models

Technology
February 25, 2026
Team Foundational

Based on the Dataversity DGIQ Dialogues panel: "Leading Data Voices from the Front Lines of Governance," sponsored by Foundational. Panelists: Alon Nafta, CEO, Foundational; Cindy Vogel, Director of Healthcare Analytics & Integration, Right Triangle Consulting; Rania Waseef, Founder, Innova Center of Excellence. Moderated by Mark Horseman, Data Evangelist, Dataversity.

The Problem No One Wants to Admit

Boards are asking about model oversight. Security teams are reviewing AI risk. Data teams are drafting AI governance frameworks. But a critical gap remains: most organizations still lack complete lineage across code, data, and configuration. AI initiatives are not creating that gap; they are exposing it.

During a February 2026 Dataversity panel discussion, Alon Nafta, Foundational's CEO, put it plainly: Fortune 500 companies are still managing data dependencies in Excel [4:08–5:38]. That is not a gap that AI creates. It is a gap AI makes impossible to ignore.

The Maturity Gap AI Is Revealing

One of the most consistent findings across enterprise AI initiatives right now is the shock of the maturity gap. Organizations believed they had governance in place. AI adoption revealed otherwise.

Cindy Vogel, Director of Healthcare Analytics and Integration at Right Triangle Consulting, described it this way: organizations are not surprised that AI needs governance; they are surprised at where they actually are on the maturity model [5:48–7:07]. They were not prepared before AI entered the picture. AI simply made the gaps visible.

Rania Waseef, founder of Innova Center of Excellence, added that a core problem is that many organizations cannot define their own data domains or identify their critical data elements [7:18–8:25], which is the starting point of any governance program, not an advanced AI readiness question. The organizations getting it right are building AI governance on top of data governance foundations, not instead of them.

"The surprise isn't that AI needs governance. It's that organizations are only now realizing where they actually are in their maturity model — and how much foundational work is still ahead."

Common governance gaps being revealed by AI initiatives:

  • Incomplete or absent documentation of transformation logic
  • Data ownership undefined or tribal (residing with one long-tenured individual)
  • Lineage tracked in Excel or equivalent
  • Cross-platform blind spots between warehouses, BI tools, and pipelines
  • No systematic process for data deprecation or currency management

What Is AI Governance?

AI governance is the framework of controls, policies, oversight, and technical mechanisms that ensure AI systems are trustworthy, compliant, and aligned with business objectives. Effective AI governance requires:

  • Clear data lineage from source through model input
  • Model input traceability and configuration visibility
  • Change controls embedded in engineering workflows
  • Metadata completeness across all systems
  • Human oversight mechanisms at defined decision points
  • Defined model oversight, ethical guardrails, and security controls

Critical distinction: AI governance cannot operate independently of data governance. It builds directly on it. Organizations that attempt to stand up AI governance frameworks without data governance foundations already in place consistently encounter the same outcome: garbage in, garbage out [16:50–17:06].   

Context Graphs: Why Your Data Governance Team Is Now Central to AI

One of the fastest-emerging topics in data governance right now is the context graph [9:41–11:29]. The concept is not new, but practical implementations are accelerating as AI model capabilities mature.

The terms in circulation (knowledge graph, data flow graph, lineage graph) are largely synonymous in practice. What they all require is the same set of fundamentals that data governance has owned for years: documentation, dependency mapping, ownership definitions, tags, labels, and classifications.

LLMs and AI agents require structured context to reason accurately. Without that context, AI outputs are unreliable. As one panelist described: a colleague setting up LLMs for analytics discovered that every asset needed comprehensive metadata tagging, because without natural language context, the model has no way to understand what the data means [11:35–12:06].   

"Context graphs go back to the basics of things data governance has always owned: documentation, dependencies, ownership, tags, labels. Now AI is requiring us to actually have them."

What Is a Context Graph?

A context graph is a structured representation of relationships between data assets, built from lineage, metadata, ownership definitions, tags, and dependency mapping. It is the structured context that AI agents use to reason accurately. Incomplete lineage = incomplete context graph = unreliable AI outputs.

AI Lineage Is Not Separate from Data Lineage

AI pipelines are still pipelines. They involve SQL transformations, Spark jobs, Python feature engineering, model configuration files, orchestration logic, version control changes, access control updates, and deployment workflows. Governing them requires the same lineage fundamentals extended across more systems and more change surfaces.

As Alon noted during the panel: the technologies are evolving, but the underlying structure is the same [54:14–56:21]. Unstructured data processed by AI still flows through data pipelines; those pipelines may run in Python rather than SQL, on Databricks rather than Oracle, but the governance requirement is identical. The industry is renaming things ("AI lineage" instead of "data lineage") but the discipline is the same.

AI lineage must trace:

  • Source data and transformation logic
  • Feature engineering steps
  • Model inputs and configuration changes
  • Model version updates
  • Downstream consumers and dependencies

If lineage stops at the warehouse, AI governance stops there too.
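The five trace requirements above can be sketched as a single end-to-end lineage record (the pipeline names and versions here are hypothetical) with a completeness check that refuses a chain that stops short, for example at the warehouse:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LineageStep:
    kind: str      # "source", "transform", "feature", "model", "consumer"
    name: str
    version: str

def trace(steps: list[LineageStep]) -> list[str]:
    """Render an end-to-end AI lineage path; raise if any stage of the
    chain, from source data through downstream consumers, is missing."""
    kinds = {s.kind for s in steps}
    required = ["source", "transform", "feature", "model", "consumer"]
    missing = [k for k in required if k not in kinds]
    if missing:
        raise ValueError(f"lineage incomplete, missing: {missing}")
    return [f"{s.kind}:{s.name}@{s.version}" for s in steps]

pipeline = [
    LineageStep("source", "claims_db.encounters", "2026-02-20"),
    LineageStep("transform", "dedupe_encounters.sql", "git:a1b2c3"),
    LineageStep("feature", "patient_risk_features.py", "git:d4e5f6"),
    LineageStep("model", "readmission_model", "v3.1"),
    LineageStep("consumer", "care_team_dashboard", "v12"),
]
print(trace(pipeline))
```

Dropping the final `consumer` step makes `trace` raise: lineage that ends at the model is, by this check, no lineage at all.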

Cross-Platform Lineage Is Now Mandatory

Enterprise data and AI environments span cloud warehouses, transformation frameworks, BI tools, feature stores, machine learning pipelines, source control systems, CI/CD workflows, and AI services and agents. Governance that covers only one of these systems is not governance.

Cross-platform lineage provides end-to-end traceability, unified metadata, dependency awareness across systems, context for AI reasoning, and the transparency regulators are beginning to require. In healthcare, this is already arriving: HIPAA is moving toward requiring documentation of the source of every dataset used in AI decision-making [19:08–20:01].

The FDA is treating AI agents in clinical settings as medical devices [14:25–15:58]. Regulated industries are simply ahead of the curve on a requirement that will generalize.

Regulatory signal for CDOs

HIPAA is moving toward requiring documented source lineage for every dataset used in AI-assisted decisions. The FDA is classifying AI agents in clinical settings as medical devices. If you are in a regulated industry, cross-platform lineage is no longer optional, and if you are not, these regulations are a preview of where requirements are heading.

The Governance Gap No One Talks About: Data Currency and Deprecation

One of the most important insights from the panel came from Cindy on the topic of data currency [21:07–22:38]. The question is not just whether data is governed. It is whether it is still valid.

Her example: colon cancer screening age recommendations have changed from 50 to 45 to 40 within a decade. If an AI agent is making treatment recommendations based on clinical guidelines, is it working from current data or deprecated protocols? Governance must define not just what data enters AI systems, but when that data should be retired.

This is a CDO-level decision, not a data engineering problem. It requires governance frameworks that include deprecation criteria, data currency policies, and systematic review cycles — particularly for AI systems where outputs directly influence decisions.

"We talk a lot about what data goes in. We don't talk nearly enough about when it should come out. Are the agents making recommendations based on medical practices from 20 years ago?"

The Dimension Most AI Governance Frameworks Skip: Change Management

The panel's most energized discussion was on change management, a dimension almost entirely absent from most AI governance frameworks [23:17–31:43].

Rania framed it clearly: data governance is fundamentally a behavior change program [23:46–26:02]. You are asking people to treat data as an organizational asset, define ownership, establish quality metrics, and change how they work. Change management is not a soft-skill layer on top of a technical program; it is the program. Without people understanding the personal and operational value of governance, adoption fails.

Cindy gave a concrete illustration of what ungoverned change looks like in practice: a dashboard is agreed upon by a team, then one person changes a calculation without realizing it affects downstream teams and shared KPIs [26:34–28:10]. Without change management processes, accountability is unclear, impact is invisible, and the organization develops a culture of workarounds: people making hidden changes because they don't believe the official process will address their needs.

"Engineering has had change management figured out for 20 to 30 years — SDLC, CI/CD, peer review. The more of those concepts we bring into data governance, the better we get."

The Engineering Parallel: What Data Governance Can Borrow

Alon made the case that engineering organizations have solved change management at scale [28:33–31:43]. Thousands of simultaneous code changes across multiple business domains, managed without crashing the system, because of SDLC, CI/CD pipelines, and peer review processes developed over decades.

The data governance equivalent means:

  • Version-controlled dashboards and data products
  • Peer review for changes to metrics, calculations, and KPIs
  • CI/CD for data pipelines — catching governance violations before deployment, not after
  • Automated lineage tracking embedded in engineering workflows
  • Change notifications to all downstream consumers when upstream assets are modified
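The last bullet, change notifications to downstream consumers, reduces to a graph traversal over lineage edges. A minimal sketch (asset names are hypothetical) that a CI/CD gate could run before merging a change:

```python
def downstream_consumers(changed: str,
                         edges: dict[str, list[str]]) -> set[str]:
    """Transitive closure over lineage edges: everyone who must be
    notified before a change to `changed` is merged."""
    seen: set[str] = set()
    stack = [changed]
    while stack:
        node = stack.pop()
        for dep in edges.get(node, []):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen

# asset -> direct downstream consumers (hypothetical lineage)
edges = {
    "stg_orders": ["dim_orders"],
    "dim_orders": ["kpi_revenue", "churn_features"],
    "churn_features": ["churn_model"],
}
print(sorted(downstream_consumers("stg_orders", edges)))
# ['churn_features', 'churn_model', 'dim_orders', 'kpi_revenue']
```

Note that the impact of a staging-table change reaches a model two hops away; this is precisely the visibility the dashboard anecdote above was missing.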

A New Governance Challenge: Analysts Writing AI-Generated Code in Production

The federated governance discussion surfaced a genuinely new problem that most governance frameworks are not built to handle [31:53–34:35].   

As AI tools enable analysts, finance professionals, and non-engineers to generate and deploy code, the traditional assumption that production code is written and reviewed by trained data engineers no longer holds. Business users can now generate Python or SQL that modifies KPIs in accounting models and push it toward production, bypassing the review processes that data engineers have always relied on.

This is a CDO-level governance question. The guardrails that worked when only engineers wrote code are insufficient when the population writing code has expanded dramatically. Governance must now include:

  • Clear policies on who can push what type of code into which environments
  • Automated review gates in CI/CD pipelines that catch governance violations regardless of who wrote the code
  • Risk-tiered review processes — higher scrutiny for changes that affect shared KPIs or regulated data
  • Visibility into AI-generated code changes the same way lineage tracks human-authored transformations
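The risk-tiered review bullet can be expressed as a routing rule that an automated CI gate applies regardless of who (or what) wrote the code. The tag names and roles below are illustrative, not a prescribed taxonomy:

```python
def review_tier(changed_tags: set[str], author_role: str) -> str:
    """Route a code change to a review tier based on what it touches
    and who authored it. Tags and roles are illustrative."""
    if changed_tags & {"regulated", "shared_kpi"}:
        return "governance-review"       # highest scrutiny
    if author_role != "data_engineer":   # analyst- or AI-generated code
        return "engineer-review"
    return "peer-review"

print(review_tier({"shared_kpi"}, "analyst"))        # governance-review
print(review_tier({"internal"}, "analyst"))          # engineer-review
print(review_tier({"internal"}, "data_engineer"))    # peer-review
```

The point of the sketch: the gate keys on the blast radius of the change, not on trust in the author, which is what makes it hold up as the population writing code expands.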

"It's all getting federated whether we're doing a data mesh or not. You need governance mechanisms appropriate for your organization's maturity to operate in that model."

Proactive Governance: What It Actually Means

AI governance that audits outputs after deployment is already too late. By the time a governance failure surfaces in production — a model trained on deprecated data, a metric changed without notification, an access rule silently updated — the downstream impact has already occurred.

Proactive governance is governance embedded where change happens:

  • Analyzing code changes before deployment, not after incidents
  • Ensuring metadata stays aligned with code as transformations evolve
  • Validating model inputs against defined data contracts before training runs
  • Tracking configuration drift across the AI stack
  • Maintaining end-to-end lineage automatically, not as a documentation exercise
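The third bullet, validating model inputs against data contracts before training runs, is the most directly automatable. A minimal pre-training gate (field names and the contract shape are hypothetical) might look like:

```python
def validate_contract(batch: list[dict],
                      contract: dict[str, type]) -> list[str]:
    """Pre-training gate: list every contract violation (missing field
    or wrong type) in a batch before the model ever sees it."""
    violations = []
    for i, row in enumerate(batch):
        for field, expected in contract.items():
            if field not in row:
                violations.append(f"row {i}: missing {field}")
            elif not isinstance(row[field], expected):
                violations.append(
                    f"row {i}: {field} is {type(row[field]).__name__}, "
                    f"expected {expected.__name__}")
    return violations

contract = {"patient_id": str, "age": int}
batch = [
    {"patient_id": "p1", "age": 52},
    {"patient_id": "p2", "age": "forty"},  # wrong type
    {"patient_id": "p3"},                  # missing field
]
print(validate_contract(batch, contract))
```

An empty result means the batch may proceed to training; a non-empty one is a governance violation caught before deployment rather than after an incident.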

The distinction that matters for your governance program

Reactive governance documents and detects. It tells you what went wrong after it went wrong. Proactive governance prevents: it catches violations before they reach production. With AI systems where errors compound across downstream consumers and decisions, the cost of reactive governance is no longer acceptable.

If You Are Starting a Data or AI Governance Program

The panel closed with direct advice for governance leaders building programs. Three perspectives, all complementary:

Collaborate with the person you hear from the most

Cindy's advice: the person raising the loudest objections usually has the clearest view of a real pain point. Solve it for them. They become your first adopter, your first advocate, and the person who brings the rest of the organization along. Start with a real problem, not a governance theory [41:02–42:11].

Connect governance to business processes first

Rania's framework: identify the critical business processes that drive revenue, support customers, or mitigate risk. Then trace what data executes those processes. Governance tied to business value gets adopted. Governance framed as compliance gets avoided [42:30–44:05].

Go for early wins in discovery and data quality

Alon's recommendation: lineage and discovery are underrated early wins. When an organization can suddenly see what data it has, where it comes from, and who uses it, people can operate differently. They find things, deprecate what's dead, and build confidence in what remains. Pair that with measurable data quality improvements and you establish trust, the foundation everything else depends on [44:11–46:53].

Key Takeaways

  • AI governance depends on complete lineage: across data, code, model configuration, and AI pipelines.
  • The maturity gap is real: organizations that believe they are governance-ready are discovering foundational gaps when AI initiatives launch.
  • Context graphs require cross-platform metadata: incomplete lineage means incomplete context and unreliable AI outputs.
  • Change management is not optional: data governance is a behavior change program, and SDLC/CI/CD concepts from engineering provide a proven model.
  • Analyst-generated code is a new governance frontier: guardrails built for engineering teams are insufficient when AI tools expand who can write production code.
  • Data currency and deprecation must be governed: not just what data enters AI systems, but when it should be retired.
  • Proactive governance reduces risk before production: reactive monitoring catches problems too late.

Watch the Full Discussion

The full Dataversity DGIQ Dialogues panel explores these topics in depth, including federated governance models, unstructured data, and the human-in-the-loop requirements emerging across industries.
