Blog
Technology
Navigating the Nuances of dbt: Common Pitfalls, Limitations, and Solutions

Navigating the Nuances of dbt: Common Pitfalls, Limitations, and Solutions

Technology
January 31, 2024
Barak Fargoun
Subscribe to our Newsletter
Get the latest from our team delivered to your inbox
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.
Ready to get started?
Try It Free

In the dynamic world of data engineering, dbt has become the go-to development framework for many teams, enabling efficient transformation of data within data warehouses. However, dbt comes with its own set of challenges and nuances that, if not properly understood, can lead to unexpected issues and potentially incidents in your data pipelines.

In this blog post we focus on some of these common pitfalls, and in particular on the limitations of dbt in detecting SQL errors, and how to effectively navigate these challenges.

Pitfall #1: Syntax Errors in dbt

A common misunderstanding among dbt users is the belief that the tool's ‘compile’ phase can catch SQL syntax errors. The reality is that dbt's compilation process is designed to process dbt-specific syntax, such as Jinja macros and ref or source references, but does not necessarily validate the underlying SQL syntax. This means, for example, that errors such as a simple typo – writing [.code]select * form[.code] instead of [.code]select * from[.code] – can successfully “compile” and slip through unnoticed until runtime. While dbt Cloud offers features like Linting to catch these syntax errors, users of the more popular dbt Core often find themselves without an out-of-the-box solution. Tools like SQLFluff can help, but they require additional setup and configuration, which can be non-trivial and time-consuming, especially for large projects.

Pitfall #2: Schema Validation

A significant limitation within dbt is its lack of schema validation during the compilation process. dbt does not verify the existence of columns referred to in your SQL queries against the actual database schemas. This means changes in your source data, such as renaming or removing columns, can go undetected until runtime.

For example, if you change a column name in your source table from [.code]customer_id[.code] to [.code]client_id[.code], but fail to update this in your dbt models, dbt will still compile successfully. The error will only become apparent when the model would run and would then fail to find the [.code]client_id[.code] column. But at that point, the error is already present in the warehouse, and fixing it now involves both time and money - in both engineering and cloud costs.Here’s an example of a simple change you can test in jaffle_shop to confirm this limitation. The following code is a modified version of customers.sql, where the original is:

with customers as (
    select * from {{ ref('stg_customers') }}
),

If we were to change the statement to use real columns that exist, changing it the following would work ok:

with customers as (
    select customer_id, first_name, last_name from {{ ref('stg_customers') }}
),

However, now changing this to:

with customers as (
    select client_id, first_name, last_name from {{ ref('stg_customers') }}
),

In the case where client_id does not actually exist, this change will still compile successfully. In this scenario, dbt's compile process won't alert you to the missing client_id column. In a real-life scenario when there’s a large dbt project with hundreds, or thousands of SQL statements, such changes are common and it’s important to understand what’s actually being checked vs. not, so you could implement checks and validation accordingly.

Pitfall #3: Type Checking

Another area where dbt falls short is in type checking during the compilation of SQL queries. dbt does not validate whether the data types used in your calculations or operations are compatible. This oversight can lead to runtime errors if, for example, you attempt to perform operations between incompatible data types, such as concatenating a string with an integer. dbt will compile these models without any issues, but they will fail upon execution.Type checks are extremely important and greatly affect development efficiency. Perhaps the most famous example of this is TypeScript, an extension of JavaScript that supports type inference. TypeScript has already surpassed JavaScript in popularity.

Here’s an example you can try out to demonstrate this issue in dbt, now looking at [.code]stg_customers.sql[.code]:

with source as (
    select * from {{ ref('raw_customers') }}
),

Now, let’s change the code to introduce an additional type:

with source as (
    select *, first_name + 2 as new_id from {{ ref('raw_customers') }}
),

In this example, attempting to concatenate the string [.code]first_name[.code] with an integer should raise an error, but this will only be evident at runtime, not during dbt's compile phase. Interestingly, in this case, the [.code]raw_customers[.code] data is coming from dbt seeds, so dbt “knows” the type of that column, but even so - dbt will succeed in compiling this model.

What More Could Be Done?

It is important to become familiarized with the common limitations in dbt. While the community and dbt developers might consider integrating more robust SQL validation within dbt Core or improving schema change detection, teams today must rely on additional tools or vigilant code reviews to catch these errors. Teams can take proactive steps to ensure the reliability and accuracy of their data models. Whether through integrating additional frameworks like SQLFluff, adopting rigorous code review processes, or introducing Foundational as a more automated, plug-and-play way of doing these things at scale. There are multiple ways to mitigate these pitfalls and harness the full potential of dbt in your data projects.

Foundational’s Approach to Enhancing dbt Projects

At Foundational, we understand the importance of ensuring high data quality and avoiding data incidents. To address these gaps in dbt, we analyze every code change and pull request, detecting a range of coding issues, from simple SQL syntax errors to complex semantic logic bugs. Our goal is to streamline the dbt development process, helping teams deploy changes faster and with greater confidence – Chat with us to learn more.

code snippet <goes here>
<style>.horizontal-trigger {height: calc(100% - 100vh);}</style>
<script src="https://cdnjs.cloudflare.com/ajax/libs/gsap/3.8.0/gsap.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/gsap/3.8.0/ScrollTrigger.min.js"></script>
<script>
// © Code by T.RICKS, https://www.timothyricks.com/
// Copyright 2021, T.RICKS, All rights reserved.
// You have the license to use this code in your projects but not to redistribute it to others
gsap.registerPlugin(ScrollTrigger);
let horizontalItem = $(".horizontal-item");
let horizontalSection = $(".horizontal-section");
let moveDistance;
function calculateScroll() {
 // Desktop
 let itemsInView = 3;
 let scrollSpeed = 1.2;  if (window.matchMedia("(max-width: 479px)").matches) {
   // Mobile Portrait
   itemsInView = 1;
   scrollSpeed = 1.2;
 } else if (window.matchMedia("(max-width: 767px)").matches) {
   // Mobile Landscape
   itemsInView = 1;
   scrollSpeed = 1.2;
 } else if (window.matchMedia("(max-width: 991px)").matches) {
   // Tablet
   itemsInView = 2;
   scrollSpeed = 1.2;
 }
 let moveAmount = horizontalItem.length - itemsInView;
 let minHeight =
   scrollSpeed * horizontalItem.outerWidth() * horizontalItem.length;
 if (moveAmount <= 0) {
   moveAmount = 0;
   minHeight = 0;
   // horizontalSection.css('height', '100vh');
 } else {
   horizontalSection.css("height", "200vh");
 }
 moveDistance = horizontalItem.outerWidth() * moveAmount;
 horizontalSection.css("min-height", minHeight + "px");
}
calculateScroll();
window.onresize = function () {
 calculateScroll();
};let tl = gsap.timeline({
 scrollTrigger: {
   trigger: ".horizontal-trigger",
   // trigger element - viewport
   start: "top top",
   end: "bottom top",
   invalidateOnRefresh: true,
   scrub: 1
 }
});
tl.to(".horizontal-section .list", {
 x: () => -moveDistance,
 duration: 1
});
</script>
Share this post
Subscribe to our Newsletter
Get the latest from our team delivered to your inbox
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.
Ready to get started?
Try It Free