
Transformations at Scale: Using Microsoft Fabric with DBT

Tanvir Mahtab

Associate Data Analyst, Data Crafters

What is DBT?

DBT stands for “Data Build Tool.” It is an open-source tool that enables analytics engineers to transform data in their warehouse easily and in a scalable manner. It is a declarative ELT framework that allows transformations to be expressed as SELECT statements with corresponding configurations in YAML files.
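
For instance, a dbt model is simply a SQL file containing a SELECT statement. The model below is a minimal, hypothetical sketch (the model and column names are placeholders):

    -- models/customers.sql: dbt materializes the result of this SELECT
    -- as a table or view in the warehouse
    select
        customer_id,
        first_name,
        last_name
    from {{ ref('stg_customers') }}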

Microsoft Fabric has a dbt adapter, ‘dbt-fabric’, and how to use it is the focus of this blog.

Data Warehouse and T-SQL Functionality

Fabric (Synapse) Data Warehouse is a lake-centric (OneLake) SaaS experience. It supports cross-database queries, which can be written using tools like the Visual Query Editor, the SQL Query Editor, and SQL Server Management Studio (SSMS). Because the experience is so intuitive, users of any skill level can quickly grasp the basics of working with a Fabric warehouse.
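
For example, cross-database queries use three-part naming (database.schema.table), so a warehouse query can read directly from another warehouse or lakehouse in the same workspace. The item and table names below are hypothetical:

    -- Hypothetical cross-database join between a warehouse table and a
    -- lakehouse table in the same Fabric workspace
    SELECT o.order_id, c.customer_name
    FROM wh_sales.dbo.orders AS o
    JOIN lh_bronze.dbo.customers AS c
        ON o.customer_id = c.customer_id;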

Unlike the SQL analytics endpoint of a Fabric Lakehouse, a Fabric Warehouse supports both read and write operations. As a result, additional T-SQL functionality such as COPY INTO, CREATE TABLE AS SELECT (CTAS), INSERT...SELECT, and SELECT INTO is available in a warehouse.
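
As a quick illustration, CTAS creates and populates a new table from a query in a single statement (the table names here are hypothetical):

    -- Hypothetical CTAS: build an aggregate table from an existing one
    CREATE TABLE dbo.daily_sales
    AS
    SELECT order_date, SUM(amount) AS total_amount
    FROM dbo.orders
    GROUP BY order_date;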

Connecting DBT to a Fabric Data Warehouse

Creating a DBT project and using it with a Fabric data warehouse is a simple process. To accomplish this, we first need to install the following tools on our system:

  • Python version 3.7 (or higher)
  • The Microsoft ODBC Driver for SQL Server
  • The dbt-fabric adapter, which can be installed from the PyPI repository with the pip install dbt-fabric command

After installing these tools, we can create a DBT project using the dbt init command.

Let’s create a project named dbt_fabric_demo; the commands are shown below. We assume that we already have a data warehouse in Fabric.
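
Assuming the prerequisites above are installed, the commands look roughly like this:

    pip install dbt-fabric      # installs dbt-core along with the Fabric adapter
    dbt init dbt_fabric_demo    # scaffolds a new project named dbt_fabric_demo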

We need to configure the profiles.yml file accordingly. Below is the profiles.yml file we are using in our case.
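
A minimal profiles.yml for the dbt-fabric adapter with Azure CLI authentication looks roughly like the sketch below; the host and database values are placeholders for your own warehouse:

    dbt_fabric_demo:
      target: dev
      outputs:
        dev:
          type: fabric
          driver: "ODBC Driver 18 for SQL Server"
          host: xxxxxxxx.datawarehouse.fabric.microsoft.com  # placeholder SQL connection string
          database: demo_warehouse                           # placeholder warehouse name
          schema: dbo
          authentication: CLI                                # Azure CLI authentication
          threads: 4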

We are using “Azure CLI” authentication here, as shown in the profiles.yml. To use the Azure CLI, we need to install it from the Microsoft site. There are some alternatives to Azure CLI authentication, such as:
  1. Microsoft Entra ID authentication (this is the default)
  2. Service Principal authentication
  3. Environment-based authentication
  4. VS Code authentication
  5. Automatic authentication
The host option holds the connection string of the target warehouse. The connection string can be copied from the Fabric workspace by clicking the ellipsis button and selecting ‘Copy SQL connection string’.

We can test the connection by running dbt debug. It may ask for authentication (here, we are using Azure CLI authentication). If authentication and the connection establishment succeed, dbt debug reports that all of its checks passed.
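
With Azure CLI authentication, signing in first avoids the interactive prompt:

    az login      # authenticate with the Azure CLI
    dbt debug     # validate profiles.yml and test the warehouse connection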

Now that we know how to connect DBT to Fabric, let’s look at the architecture.

A DBT architecture on Fabric can follow the Medallion architecture recommended by Microsoft:

  1. The bronze layer, implemented as a Lakehouse, can be used for ingesting data from source systems. A Lakehouse is recommended because it can hold both structured and unstructured data. Files that are not already in Delta format can be converted to Delta tables.
  2. The converted Delta tables are then available for further transformation into the silver and gold layers. A Warehouse can be used for those layers so that T-SQL can be leveraged for writing data, as sketched after this list.
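
A hypothetical sketch of this layering in dbt: the bronze Lakehouse tables are declared as sources, and silver models in the Warehouse read from them through cross-database queries (all names below are placeholders):

    # models/sources.yml
    version: 2
    sources:
      - name: bronze
        database: lh_bronze   # placeholder bronze Lakehouse name
        schema: dbo
        tables:
          - name: orders

    -- models/silver/stg_orders.sql
    select
        order_id,
        customer_id,
        cast(order_date as date) as order_date
    from {{ source('bronze', 'orders') }}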

DBT Macros with Fabric

Macros are a way to create reusable functions across DBT projects. Sometimes macros are specific to the warehouse or database the project targets. Below, we show an example of a macro for statistics that can be created and updated after every load.

STATISTICS are created on tables in Fabric to improve query performance. In the macro below, statistics are created on a full load and updated on an incremental refresh. The macro can be used as a post-hook operation after every table load; however, if tables are loaded at a high frequency, refreshing statistics after every load is usually discouraged.
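
A sketch of such a macro is shown below. This is an assumption-laden example rather than a drop-in implementation; the macro name and the statistic naming convention are hypothetical:

    {% macro manage_statistics(column_name) %}
        {# Hypothetical macro: create a statistic on the full load,
           refresh it on incremental runs #}
        {% set stat_name = 'stat_' ~ this.identifier ~ '_' ~ column_name %}
        {% if is_incremental() %}
            update statistics {{ this }} ({{ stat_name }});
        {% else %}
            create statistics {{ stat_name }} on {{ this }} ({{ column_name }});
        {% endif %}
    {% endmacro %}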

The above macro can be used as a post_hook operation so that the statistics are updated after each data load.
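
For example, an incremental model could attach it like this (the column name is a placeholder):

    {{ config(
        materialized='incremental',
        post_hook="{{ manage_statistics('customer_id') }}"
    ) }}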

Together, DBT and Fabric can help streamline the ELT process and make it more efficient. The benefits include lineage, dependency management, native test support, reusable macros, project-specific documentation, and more. In fact, combining DBT with Fabric can be the best way to do transformations at scale.
