DBT (Data Build Tool) is a powerful framework for data transformation, but like any powerful tool, small missteps can lead to big issues — especially at scale. In this post, I’ll walk you through three common pitfalls I’ve seen in production DBT projects that can cause performance degradation, operational issues, or incorrect lineage tracking — and how to avoid them.
. . .
🚨 Mistake 1: Misusing DELETE+INSERT When unique_key Is Not Truly Unique
What happens: When using the delete+insert incremental strategy in an incremental model, DBT generates a DELETE query that joins the source and target tables on the unique_key. If that key isn't truly unique, you end up with a many-to-many join, and potentially a massive performance hit.
Problematic scenario:
Let's say you're building a gold_model that refreshes data based on tx_date:
{{ config(
    materialized='incremental',
    incremental_strategy='delete+insert',
    unique_key='tx_date'
) }}

SELECT * FROM {{ ref('silver_model') }}
Under the hood, the delete step compiles into something like:
DELETE FROM gold_model d
USING gold_model__dbt_tmp tmp
WHERE d.tx_date = tmp.tx_date;
If your tx_date exists on 100,000+ records in both tables, this join explodes, causing long runtimes and high resource usage.
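Before trusting a column as a unique_key, it's worth verifying it really is unique. A quick sanity check, sketched in plain SQL against the resolved silver_model table:

SELECT tx_date, COUNT(*) AS row_count
FROM silver_model
GROUP BY tx_date
HAVING COUNT(*) > 1
ORDER BY row_count DESC;

If this returns any rows, tx_date is not a safe unique_key. DBT's built-in unique column test in a schema.yml file catches the same problem automatically in CI.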
A better way: Use append with a pre_hook
Instead of relying on DBT's default strategy, handle the deletion yourself:
{{ config(
    materialized='incremental',
    incremental_strategy='append',
    pre_hook="""
        -- {{ this }} resolves to this model's fully qualified table name
        DELETE FROM {{ this }}
        WHERE tx_date IN (SELECT DISTINCT tx_date FROM {{ ref('silver_model') }})
    """
) }}

SELECT * FROM {{ ref('silver_model') }}
This avoids a complex join and deletes only the necessary records in a much simpler and faster query.
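For comparison, the work the warehouse does under this approach boils down to two simple statements (simplified, with table names resolved):

DELETE FROM gold_model
WHERE tx_date IN (SELECT DISTINCT tx_date FROM silver_model);

INSERT INTO gold_model
SELECT * FROM silver_model;

There is no join between the two tables at all: the DELETE is a straightforward semi-join on the distinct dates.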
. . .
🚨 Mistake 2: Using materialized='table' Without Handling Data Types Properly
What happens: When you use materialized='table', DBT issues a CREATE OR REPLACE query on every run, dropping and recreating the table. If the data types change between runs (e.g., numeric to string), you risk schema drift and broken joins.
Example:
First run:
{{ config(materialized='table') }}
SELECT 1 AS id
Second run:
{{ config(materialized='table') }}
SELECT 'a' AS id
Now, the id column flips from INTEGER to STRING. That inconsistency can break downstream dependencies or cause unexpected casting behavior, like dropping leading zeros in stringified numbers.
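The leading-zero issue in particular is easy to reproduce. A minimal illustration:

SELECT CAST('007' AS INTEGER) AS id;                      -- returns 7
SELECT CAST(CAST('007' AS INTEGER) AS VARCHAR(10)) AS id; -- returns '7', not '007'

Once the zeros are gone, joins against the original string-typed IDs silently stop matching.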
Fix it with explicit casting:
It's always preferable to cast your columns explicitly, especially when dealing with source systems that aren't strict about data types and with models materialized as tables.
{{ config(materialized='table') }}
SELECT CAST(1 AS VARCHAR(10)) AS id
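For a wider table, the same rule applies to every column in the final SELECT. A sketch of the pattern (the column names here are hypothetical):

{{ config(materialized='table') }}

SELECT
    CAST(id AS VARCHAR(10)) AS id,          -- keep IDs as strings to preserve leading zeros
    CAST(amount AS DECIMAL(18, 2)) AS amount,
    CAST(tx_date AS DATE) AS tx_date
FROM {{ ref('silver_model') }}

If you're on dbt 1.5+, model contracts (contract: enforced: true plus a declared data_type per column in the model's YAML) take this further and fail the build when types drift.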
. . .
🚨 Mistake 3: Conditional ref() Logic Can Break Lineage in the Manifest
What happens: When you write models that use ref() conditionally based on variables, DBT may not register all dependencies in manifest.json. This leads to incomplete lineage in the DBT docs or in your CI/CD workflows.
Example with an issue:
{{ config(materialized='table') }}
{% if var('country') == 'IND' %}
SELECT * FROM {{ ref('logic_for_ind') }}
{% elif var('country') == 'USA' %}
SELECT * FROM {{ ref('logic_for_usa') }}
{% endif %}
When running:
dbt compile --vars '{"country": "USA"}' --select gold_model
The manifest.json records only the parent whose branch was actually rendered; logic_for_ind is missed entirely because that code path was never executed with these vars. This can cause lineage issues, especially when the manifest is used to drive orchestration, such as with Airflow via the Astronomer Cosmos package.
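You can see the gap directly by inspecting the compiled manifest. A quick check, assuming a project named my_project and the jq CLI:

dbt compile --vars '{"country": "USA"}' --select gold_model
jq '.nodes["model.my_project.gold_model"].depends_on.nodes' target/manifest.json

Only logic_for_usa shows up in depends_on.nodes; the IND branch was never rendered, so its ref() was never recorded.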

Suggested fix: Force the dependencies with -- depends_on comments at the top of the model:
-- depends_on: {{ ref('logic_for_ind') }}
-- depends_on: {{ ref('logic_for_usa') }}
{{ config(materialized='table') }}
{% if var('country') == 'IND' %}
SELECT * FROM {{ ref('logic_for_ind') }}
{% elif var('country') == 'USA' %}
SELECT * FROM {{ ref('logic_for_usa') }}
{% endif %}
This ensures the compiler is aware of all possible dependencies, even if they aren’t used in a particular run.
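After adding the comments, re-running the compile-and-jq check above lists both logic_for_ind and logic_for_usa in depends_on.nodes, whichever country you pass.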

DBT Reference: https://docs.getdbt.com/reference/dbt-jinja-functions/ref#forcing-dependencies
. . .
🔚 Wrapping Up
Small decisions in DBT can have big consequences — especially when you’re working at scale. Here’s a quick recap:
🔁 Mistake: Using DELETE+INSERT with a non-unique unique_key
💥 Impact: Triggers a many-to-many join during deletion, causing serious performance issues, especially with large datasets
✅ Solution: Switch to the append strategy and use a pre_hook that deletes via a simple IN clause for better performance and scalability
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
📦 Mistake: Using materialized='table' without consistent data types
💥 Impact: Causes schema drift due to CREATE OR REPLACE behavior, which can break joins or drop leading zeros in IDs
✅ Solution: Always explicitly cast your columns to the correct data types to ensure consistency across runs
— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
🔀 Mistake: Conditional ref() usage without declaring dependencies
💥 Impact: Leads to incomplete lineage in manifest.json, breaking documentation and dependency tracking
✅ Solution: Use -- depends_on comments to declare all potential references, even if they aren't used in a specific run