Different types of materializations in dbt

by Nathan Purvis

Background

When using dbt - both the Cloud and Core offerings - our models are materialized when we execute a dbt run or dbt build command. By default, dbt creates these database objects as views unless we tell it otherwise. In this blog we'll cover the five types of materialization available to us (table, view, incremental, materialized view and ephemeral), how we tell dbt to create them, and some advantages and disadvantages of each.

Table

A database object we're likely all incredibly familiar with. If we configure a model (or group of models) to materialize as a table then, following a dbt run or build, dbt will execute a 'CREATE TABLE ... AS' statement and the result of the model's SELECT statement will be stored in the target schema as a table. Tables are fast and efficient to query, and are a great store of data for downstream processes and applications such as BI tools like Tableau and Power BI. However, there are drawbacks to consider, such as tables taking a long time to build, especially if complex transformations are involved or you're working with large amounts of data.
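To make this concrete, here's a rough sketch of a model configured as a table; the model, column and ref names are made up for illustration. The model itself is just a SELECT statement, and dbt wraps it in the DDL for us:

    -- models/marts/customer_orders.sql (hypothetical model)
    -- With materialized='table', dbt compiles this SELECT and wraps it in
    -- something along the lines of:
    --   create table <target_schema>.customer_orders as ( ... )
    {{ config(materialized='table') }}

    select
        customer_id,
        count(*)         as order_count,
        sum(order_total) as lifetime_value
    from {{ ref('stg_orders') }}
    group by customer_id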

View

Perhaps a little harder for new users to understand conceptually than tables, but views are still very common database objects used throughout organisations. A view doesn't store data itself; instead, it stores the underlying query. You can think of a view as a packaged SQL query that you can call whenever you want to see the result at that time, and they are often used to blend data from multiple underlying tables. This type of materialization is the default within dbt, and a 'CREATE VIEW AS' statement will be run upon execution. Among the pros for views are that no data is actually stored, saving space in the database, and that when querying a view you'll always be looking at the latest records. However, we need to be wary when using this type of materialization; we see a drop in performance when using views for heavy transformations, or when stacking multiple views on top of each other. They work particularly well for staging data and performing light-touch transformations like dropping fields and recasting data types.

Incremental

The clue is in the name with this type of object - incremental models allow you to update tables incrementally. That is to say, we don't need to keep rebuilding the table from scratch; instead we only amend or insert records that have changed since the last run or refresh. A prime use case for this type of materialization is event data, e.g. website events, or tables that already contain high volumes of data that would take a lot of processing time to drop and rebuild repeatedly. One slight downside to incremental models is that they are a more advanced feature within dbt and therefore require additional configuration, e.g. setting a unique key and ensuring we're filtering records correctly.
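To make that extra configuration concrete, below is a minimal sketch of an incremental model, assuming a hypothetical website-events staging model with an event_id key and an event_timestamp column. The unique_key tells dbt which existing records to update, and the is_incremental() filter restricts each subsequent run to new rows:

    -- models/marts/fct_website_events.sql (hypothetical model)
    {{
        config(
            materialized='incremental',
            unique_key='event_id'
        )
    }}

    select
        event_id,
        user_id,
        event_type,
        event_timestamp
    from {{ ref('stg_website_events') }}

    {% if is_incremental() %}
    -- on incremental runs, only process rows newer than what's already in the table
    where event_timestamp > (select max(event_timestamp) from {{ this }})
    {% endif %}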

Ephemeral

Ephemeral models behave differently to the other four types of materialization, in that they aren't built as a database object when we run our workflow. When ephemeral models are referenced in a downstream model, the code they contain is simply run within a common table expression (CTE). As a result, this can help to reduce clutter within your target database. However, the flip side is that, because a physical object isn't created, we can't 'SELECT' directly from an ephemeral model, and workflows can become difficult to debug if we rely on too many of them. Much like views, we are better off using ephemeral nodes for light-touch transformations that only a limited number of downstream models depend on.
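As a sketch (with hypothetical model names), an ephemeral node only needs its materialization set; nothing is created in the database, and any downstream model that refs it gets this query injected as a CTE in its compiled SQL:

    -- models/staging/stg_customers_deduped.sql (hypothetical model)
    {{ config(materialized='ephemeral') }}

    -- No object is built for this model; a downstream model that refs it
    -- will have this query inlined as a CTE in its compiled SQL.
    select distinct
        customer_id,
        customer_name
    from {{ ref('raw_customers') }}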

Materialized view

You may be wondering what the difference is between a standard and a materialized view. In an effort to bust unnecessary jargon, we'll keep things simple: a materialized view is a cached snapshot of the query results at the time it is built. Let's say we 'SELECT *' from a standard view at a given point in time: we can see the results of this query and look through them, but they're not stored anywhere for future use. On the other hand, if we 'SELECT *' from a materialized view, the results are returned for us to look through whilst also being physically stored, much like a table. Because of this, materialized views can be a useful option if we are using the results elsewhere, for example in a downstream BI tool. With a standard view, upon loading our reports we would need to re-query the view, which in turn has to run the underlying transformations again, whereas we just need to 'SELECT' from a materialized view, as the results are already stored as-is until the next run or refresh. There are, however, some drawbacks with materialized views, not least that they aren't available as a feature in all data platforms.
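Where your data platform and dbt adapter do support them, switching is just another materialization value in the config. This is a sketch with hypothetical names; check your adapter's documentation for support and refresh behaviour before relying on it:

    -- models/marts/agg_daily_sales.sql (hypothetical model)
    -- materialized='materialized_view' is only available on adapters/platforms
    -- that support materialized views.
    {{ config(materialized='materialized_view') }}

    select
        order_date,
        sum(order_total) as total_sales
    from {{ ref('fct_orders') }}
    group by order_date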

How do we switch between materializations?

Now that we have a short overview of each type of materialization, how do we implement them? Well, it's actually incredibly easy to do so. Again, as standard, dbt will materialize all nodes as views, so you don't need to touch anything if that's what you want! If, however, you do want to change this, you can edit the 'dbt_project.yml' file and add the '+materialized' configuration. Here's an example of me doing so at a directory level (all of my marts, i.e. gold-layer data, are stored as tables, whereas my silver-layer staging results are views):
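A sketch of what that directory-level configuration looks like; the project name my_dbt_project is an assumption for illustration:

    # dbt_project.yml (project name is hypothetical)
    name: my_dbt_project

    models:
      my_dbt_project:
        marts:
          # gold-layer marts are built as tables
          +materialized: table
        staging:
          # silver-layer staging models are built as views
          +materialized: view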

We can also change the materialization type within a configuration block at the beginning of a model. This is particularly useful when we want to set arguments for incremental models, or we need to materialize a node in a different way to its directory settings, as this config block will override those. Below is a demonstration of my final ‘returned_sales’ mart being stored as a table due to the configuration block overriding the overall directory settings in the ‘dbt_project.yml’ file:
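Here's a sketch of what that config block could look like; the SELECT and column names are made up, and the point is simply that an in-model config takes precedence over whatever the directory-level setting in 'dbt_project.yml' says:

    -- models/marts/returned_sales.sql (path assumed)
    -- The config block below overrides the directory-level materialization
    -- for this one model, so it is built as a table.
    {{ config(materialized='table') }}

    select
        order_id,
        customer_id,
        return_reason,
        returned_at
    from {{ ref('stg_sales') }}
    where is_returned = true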

I hope you found this blog useful and that you now feel more comfortable with the different types of materialization within dbt Cloud and Core! As usual, the best way to explore the benefits of each is to dive in, test and do your own benchmarking based on your requirements. There are more great resources below if you'd like some extra information on the topic!

https://docs.getdbt.com/docs/build/materializations

https://learn.getdbt.com/courses/advanced-materializations
