DBT stands for “Data Build Tool.” It is an open-source tool that enables analytics engineers to transform data in their warehouse easily and at scale. It is a declarative ELT framework in which transformations are expressed as SELECT statements with corresponding configurations in YAML files.
Microsoft Fabric has a dbt adapter, ‘dbt-fabric’, and how to use it is the focus of this blog.
Fabric (Synapse) Data Warehouse is a lake-centric (OneLake) SaaS experience. It supports cross-database queries that can be written using tools like the Visual Query Editor, the SQL Query Editor, and SQL Server Management Studio (SSMS). Because it is very intuitive, users of any experience level can quickly grasp the basics of working with a Fabric warehouse.
Unlike the SQL analytics endpoint of a Fabric Lakehouse, the Fabric Warehouse supports both read and write operations. Thus, the warehouse supports more T-SQL functionality, such as COPY INTO, CREATE TABLE AS SELECT (CTAS), INSERT..SELECT, and SELECT INTO.
Creating a DBT project and using it with a Fabric data warehouse is a simple process. To accomplish this, we first need to install the following tools on our system.
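The original tool list is not reproduced here, but a typical setup for the dbt-fabric adapter looks like the following (assuming Python and the Microsoft ODBC Driver 18 for SQL Server are already installed):

```shell
# Install the Fabric adapter; dbt-fabric pulls in dbt-core as a dependency
pip install dbt-fabric

# Verify that dbt and the adapter are installed
dbt --version
```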
After installing these tools, we can start creating a DBT project using the dbt init command.
Let’s create a project named dbt_fabric_demo. We assume that we already have a data warehouse in Fabric.
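Scaffolding the project is a single command; dbt init creates the folder structure and prompts for the adapter and connection details:

```shell
# Scaffold a new dbt project named dbt_fabric_demo
dbt init dbt_fabric_demo
```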
We need to configure the profiles.yml file accordingly. The following is the profiles.yml file that we are using in our case.
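As a sketch (the server and database values below are placeholders, not the actual ones from this setup), a profiles.yml for the dbt-fabric adapter with Azure CLI authentication typically looks like:

```yaml
dbt_fabric_demo:
  target: dev
  outputs:
    dev:
      type: fabric
      driver: "ODBC Driver 18 for SQL Server"   # requires the Microsoft ODBC driver
      server: "your-endpoint.datawarehouse.fabric.microsoft.com"  # placeholder endpoint
      port: 1433
      database: "your_warehouse"                # placeholder warehouse name
      schema: dbo
      authentication: CLI                       # Azure CLI authentication
      threads: 4
```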
We will test the connection using dbt debug. It may prompt for authentication (here, we are using Azure CLI authentication). If authentication and the connection succeed, dbt debug reports that all checks passed.
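Assuming Azure CLI authentication, the check boils down to two commands:

```shell
# Log in with the Azure CLI so dbt can reuse the credential
az login

# Validate the profile, adapter, and warehouse connection
dbt debug
```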
Now that we know how to connect DBT with Fabric, let’s look at the architecture.
The DBT architecture can be laid out as follows, implementing the Medallion architecture recommended by Microsoft.
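One common way to express the Medallion layers in a DBT project is one model folder per layer; a hypothetical dbt_project.yml excerpt (folder and schema names are illustrative):

```yaml
# Hypothetical dbt_project.yml excerpt: one model folder per Medallion layer
models:
  dbt_fabric_demo:
    bronze:                  # raw, ingested data
      +materialized: view
      +schema: bronze
    silver:                  # cleaned and conformed data
      +materialized: table
      +schema: silver
    gold:                    # business-level aggregates
      +materialized: table
      +schema: gold
```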
Macros are a way to create reusable functions across DBT projects. Sometimes macros are created specifically for the warehouse or database the project targets. Below, we show an example of a macro for statistics that can be created and updated after every load.
In Fabric, STATISTICS are created on tables to improve query performance. In the macro below, statistics are created or updated depending on whether the load is a full load or an incremental refresh. The macro can be used as a post-hook operation after every table load; however, if tables are loaded at a high frequency, running statistics maintenance after every load is usually discouraged.
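The original macro is not reproduced here; a sketch of such a macro might look like the following (the macro name, statistics name, and column argument are illustrative, not the author’s actual code):

```sql
-- macros/manage_statistics.sql (illustrative sketch)
{% macro manage_statistics(relation, column_name) %}
    {% if is_incremental() %}
        -- incremental refresh: refresh the existing statistics object
        UPDATE STATISTICS {{ relation }};
    {% else %}
        -- full load: (re)create single-column statistics
        CREATE STATISTICS stats_{{ column_name }}
            ON {{ relation }} ({{ column_name }});
    {% endif %}
{% endmacro %}
```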
The above macro can be used as a post_hook operation so that the statistics are updated after each data load.
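For example, a model could invoke a statistics macro (here a hypothetical manage_statistics macro on a hypothetical stg_customer staging model) via the post_hook config:

```sql
-- models/silver/dim_customer.sql (illustrative)
{{ config(
    materialized = 'table',
    post_hook = "{{ manage_statistics(this, 'customer_id') }}"
) }}

SELECT customer_id,
       customer_name
FROM {{ ref('stg_customer') }}
```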
Together, DBT and Fabric can streamline the ELT process. The benefits include lineage, dependency management, native test support, reusable macros, project-specific documentation, and more. Combining DBT with Fabric is a powerful way to run transformations at scale.
© 2024 Data Crafters | All rights reserved