Blog
Announcements
Announcing Native Data Quality Monitoring for Snowflake

Announcing Native Data Quality Monitoring for Snowflake

Announcements
June 5, 2024
Team Foundational
Subscribe to our Newsletter
Get the latest from our team delivered to your inbox
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.
Ready to get started?
Try It Free

Our goal at Foundational is to make it easy, predictable, and straightforward to develop code for data – which is any piece of code that might affect data throughout the data stack.

We are extremely excited to share that Foundational now supports native data quality automation for Snowflake, leveraging Snowflake’s preview release of Data Metric Functions (DMF) to automate testing and data quality validation. The new capability is also allowing Snowflake users to automate data contract enforcement for Snowflake data, using Foundational’s code analysis capabilities.

Earlier this year, Snowflake released Data Quality Monitoring–Introducing Data Metric Functions, or DMFs in short. DMFs allow users to leverage native data validation capabilities against tables and columns in Snowflake. Users can either define a set of custom rules, or use a set of default ones (Called “System DMFs”) against the desired tables or columns, and Snowflake would enforce these rules in an ongoing manner.

Automating Data Quality Monitoring in Snowflake

The preview release of DMF introduces exciting new data quality capabilities to Snowflake users, including:

  1. Natively-integrated Data Quality checks in Snowflake: Configuring and attaching DMFs to tables can be done programmatically through SQL, and the results are accessible through standard Snowflake tables. These features make it simple to integrate and use Snowflake data quality components at scale.
  2. Rules can trigger on changes only: While third-party tools have to rely on pre-determined schedules, Snowflake DMFs can be set to trigger “on changes”, allowing users to perform the validation only when an actual change is made to the table (E.g. a new record is added). This allows for a much faster, cost-efficient and scalable data validation to happen within Snowflake, in real-time.

DMF Integration in Foundational

Snowflake users can therefore set DMFs against all or most important tables in their warehouse. However, creating hundreds or thousands of rules is extremely time consuming. Furthermore, these data validation rules need to be constantly updated with every change that is introduced to the data warehouse, for example when a change is introduced to data transformation pipelines such as dbt. 

This is where the new Foundational integration comes in–Since Foundational is tracking all code changes impacting data, and produces the understanding of data pipelines through data lineage and column-level dependencies, DMFs can be created automatically according to the latest version of the data pipeline, and always remain updated against the latest code commit.

How does this work?

Foundational is analyzing all the source code that’s defining and creating data pipelines. For example, the source code can have the following section to define a table through an SQL query:

CREATE TABLE public.my_table AS (
    SELECT 
        CASE
            WHEN source_referral in ('Email', 'Website) THEN 'Inbound'
            ELSE 'Outbound'
        END as source_type,
    REPLACE(account_name, '(Free Trial)', '') as cleaned_account_name,
...
)

From this SQL query snippet we can therefore infer the following rules:

  • public.my_table.source_type can have one of these values: Outbound, Inbound
  • Public.my_table.cleaned_account_name should never contain the string (Free Trial)

Then, using this information, we can create the appropriate DMF in Snowflake to validate these rules:

CREATE DATA METRIC FUNCTION IF NOT EXISTS 
governance.dmfs.invalid_source_type (ARG_T table(ARG_C1 STRING)) RETURNS NUMBER AS 
 'SELECT COUNT_IF(FALSE = ( ARG_C1 IN (''Outbound'', ''Inbound'') )) FROM ARG_T';

And:

CREATE DATA METRIC FUNCTION IF NOT EXISTS
governance.dmfs.invalid_cleaned_account_name_count (ARG_T table(ARG_C1 STRING)) RETURNS NUMBER AS 
 'SELECT COUNT_IF(ARG_C1 LIKE ''%(Free Trial)%'') FROM ARG_T';

Finally, we can attach these DMFs to the appropriate table, and trigger them on every row change, to make sure they are validated at real time:

ALTER TABLE public.my_table ADD DATA METRIC FUNCTION governance.dmfs.invalid_cleaned_account_name_count ON (cleaned_account_name);
ALTER TABLE public.my_table SET DATA_METRIC_SCHEDULE = 'TRIGGER_ON_CHANGES';

To help our users and followers try out DMF, we are also excited to share that we’ve released a free online tool, called the DMF Generator. Users can leverage the free tool to try out different DMF-compatible outputs created by the Foundational engine using SQL statements as inputs.

Our approach at Foundational

At Foundational, we are pragmatists. We assume that organizations would always benefit from leveraging new capabilities within their existing tools, whether it’s GitHub or Snowflake. In the case of DMF, what’s exciting is that every Snowflake user can now benefit from data quality monitoring. Our goal at Foundational is to remove the initial friction, and to ensure there is minimal ongoing development efforts required to maintain and scale these mechanisms. As the data pipeline code changes, Foundational will identify these changes and suggest the required modifications to the relevant DMFs to make sure data validation is always up-to-date with the latest version that’s defined by the code.

If you would like to learn more about how Foundational can help you automatically generate Snowflake Data Metric Functions (DMFs), and leverage the new native support for data quality and data quality monitoring - please reach out.

code snippet <goes here>
<style>.horizontal-trigger {height: calc(100% - 100vh);}</style>
<script src="https://cdnjs.cloudflare.com/ajax/libs/gsap/3.8.0/gsap.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/gsap/3.8.0/ScrollTrigger.min.js"></script>
<script>
// © Code by T.RICKS, https://www.timothyricks.com/
// Copyright 2021, T.RICKS, All rights reserved.
// You have the license to use this code in your projects but not to redistribute it to others
gsap.registerPlugin(ScrollTrigger);
let horizontalItem = $(".horizontal-item");
let horizontalSection = $(".horizontal-section");
let moveDistance;
function calculateScroll() {
 // Desktop
 let itemsInView = 3;
 let scrollSpeed = 1.2;  if (window.matchMedia("(max-width: 479px)").matches) {
   // Mobile Portrait
   itemsInView = 1;
   scrollSpeed = 1.2;
 } else if (window.matchMedia("(max-width: 767px)").matches) {
   // Mobile Landscape
   itemsInView = 1;
   scrollSpeed = 1.2;
 } else if (window.matchMedia("(max-width: 991px)").matches) {
   // Tablet
   itemsInView = 2;
   scrollSpeed = 1.2;
 }
 let moveAmount = horizontalItem.length - itemsInView;
 let minHeight =
   scrollSpeed * horizontalItem.outerWidth() * horizontalItem.length;
 if (moveAmount <= 0) {
   moveAmount = 0;
   minHeight = 0;
   // horizontalSection.css('height', '100vh');
 } else {
   horizontalSection.css("height", "200vh");
 }
 moveDistance = horizontalItem.outerWidth() * moveAmount;
 horizontalSection.css("min-height", minHeight + "px");
}
calculateScroll();
window.onresize = function () {
 calculateScroll();
};let tl = gsap.timeline({
 scrollTrigger: {
   trigger: ".horizontal-trigger",
   // trigger element - viewport
   start: "top top",
   end: "bottom top",
   invalidateOnRefresh: true,
   scrub: 1
 }
});
tl.to(".horizontal-section .list", {
 x: () => -moveDistance,
 duration: 1
});
</script>
Share this post
Subscribe to our Newsletter
Get the latest from our team delivered to your inbox
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.
Ready to get started?
Try It Free