Microsoft Fabric offers two storage-compute platforms for batch data processing, Lakehouse and Warehouse, which store and serve data for use in other components. Both integrate seamlessly with the rest of Fabric and act as central hubs for data storage. They are aimed at different use cases within an enterprise data architecture, and their distinct interaction characteristics are the subject of this article.
The Fabric Lakehouse platform enables storing and analyzing both structured and unstructured data, and it integrates tightly with the other data components of Fabric, simplifying the workflow. Lakehouse tables use the Delta format, which in turn uses Parquet files. For every Lakehouse, a default read-only SQL analytics endpoint and a default semantic model are created for consumption within Power BI.
The Lakehouse provides two ways to interact with the data: through Spark notebooks and Spark job definitions, which can read, transform, and write tables and files, and through the read-only SQL analytics endpoint, which exposes the Delta tables for T-SQL queries.
Lakehouse is predominantly geared towards users familiar with Spark for processing data.
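As a minimal sketch of that Spark-centric interaction, the following Spark SQL cell, run from a Fabric notebook attached to a Lakehouse (the sales_orders table name is hypothetical), creates a Delta table, loads a row, and queries it:

```sql
-- Spark SQL in a Fabric notebook attached to the Lakehouse (hypothetical table name)
CREATE TABLE IF NOT EXISTS sales_orders (
    order_id   INT,
    order_date DATE,
    amount     DECIMAL(10, 2)
) USING DELTA;

INSERT INTO sales_orders VALUES (1, DATE'2024-01-15', 250.00);

-- Aggregate directly over the Delta table from Spark
SELECT order_date, SUM(amount) AS total_amount
FROM sales_orders
GROUP BY order_date;
```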
The Fabric Warehouse platform enables the storage, processing, and analysis of data primarily through T-SQL. It supports creating tables and inserting and deleting data using T-SQL, making it suitable for a dbt-based workflow with Fabric. Like Lakehouse, Warehouse stores tables in the Delta format; however, Warehouse conceals the folder and file details of a table, offering a user-friendly environment akin to a dedicated Synapse SQL pool. A default semantic model is created on top of each warehouse, similar to Lakehouse.
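As a minimal sketch of this T-SQL-first interaction (the dbo.customer table below is hypothetical), tables can be created and modified directly from the Warehouse SQL editor or any T-SQL client:

```sql
-- T-SQL against a Fabric Warehouse (hypothetical schema and table)
CREATE TABLE dbo.customer
(
    customer_id   INT,
    customer_name VARCHAR(100),
    created_date  DATE
);

INSERT INTO dbo.customer (customer_id, customer_name, created_date)
VALUES (1, 'Contoso Ltd', '2024-01-15');

-- Remove rows matching a filter
DELETE FROM dbo.customer
WHERE customer_id = 1;
```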
The Warehouse UI is similar to the SQL analytics endpoint of the Lakehouse, with minor differences such as options for generating DROP TABLE and CREATE TABLE AS SELECT (CTAS) code.
Warehouse provides easier interaction with the data using T-SQL, but it currently has a few limitations that may be resolved in the future (the current restrictions are listed in Microsoft's documentation). For example, TRUNCATE TABLE is not supported, but all rows of a table can be removed using DELETE FROM without a filter.
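For instance, reusing the hypothetical dbo.customer table from above, emptying it would look like this:

```sql
-- TRUNCATE TABLE dbo.customer;  -- not supported at the time of writing
DELETE FROM dbo.customer;        -- removes all rows instead
```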
Warehouse is suitable for SQL Analysts who are familiar with processing data predominantly using SQL.
The primary distinction between Warehouse and Lakehouse lies in how users interact with them. Although both store data in the Delta format, they differ in the capabilities of the engines used to work with that data.
Internally, Lakehouse and Warehouse share a common processing architecture: the execution plan for a T-SQL query is passed down to backend compute pools, which operate on the data in OneLake.
A key benefit of Fabric is its ability to facilitate cross-queries between Lakehouse and Warehouse as though they were part of the same system: tables from both can be combined and queried collectively using T-SQL. This enables data to be processed according to developer or user requirements, offering interoperability and removing the need to duplicate data across components.
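As a sketch of such a cross-query (the warehouse, lakehouse, and table names are hypothetical), items in the same workspace can be referenced with three-part names from a single T-SQL statement:

```sql
-- Join a Warehouse fact table with a Lakehouse dimension table in one T-SQL query
SELECT c.customer_name,
       SUM(s.amount) AS total_amount
FROM   SalesWarehouse.dbo.fact_sales   AS s
JOIN   SalesLakehouse.dbo.dim_customer AS c
       ON s.customer_id = c.customer_id
GROUP BY c.customer_name;
```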
Both Warehouse and Lakehouse are first-class objects in data pipelines and Gen 2 Dataflows, allowing both reading and writing. One surprising difference: while Lakehouse does not support white space or special characters in column names (read our related blog on this), Warehouse does support them, although using spaces is not recommended, especially in scenarios involving interaction with Spark.
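As a small illustration (the table is hypothetical), such column names can be declared in a Warehouse using bracketed identifiers:

```sql
-- Bracketed identifiers allow spaces in Warehouse column names (hypothetical table)
CREATE TABLE dbo.order_summary
(
    [Order Id]     INT,
    [Order Date]   DATE,
    [Total Amount] DECIMAL(10, 2)
);
```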
Lakehouse tables are directly supported in Spark, making them first-class objects there; Warehouse tables, however, need to be added as shortcuts in a Lakehouse before they can be used with Spark.
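Once the shortcut exists, the Warehouse table can be queried from Spark like any other Lakehouse table; a minimal sketch, assuming a hypothetical shortcut named fact_sales_shortcut:

```sql
-- Spark SQL in a Fabric notebook: reading a Warehouse table exposed via a Lakehouse shortcut
SELECT COUNT(*) AS row_count
FROM fact_sales_shortcut;
```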
|  | Warehouse | Lakehouse |
| --- | --- | --- |
| Suitable for | Building an enterprise data warehouse | Handling big data workloads for Data Engineering and Data Science |
| Type of data | Structured | Structured and unstructured |
| Read through | T-SQL endpoint, Gen 2 Dataflows, Power BI (Direct Lake) | T-SQL endpoint, Gen 2 Dataflows, Power BI (Direct Lake) |
| Write through | T-SQL, data pipelines, Gen 2 Dataflows | Spark (SQL, Scala, Python, and R), data pipelines, Gen 2 Dataflows, Eventstreams |
| Security | Object-level, row-level, column-level, dynamic data masking | RLS and table-level (only when using the SQL analytics endpoint) |
| Zero-copy cloning | Supported through T-SQL (see the sketch below) | Supported through Spark |
| Statistics and cache mechanism | Available | Available |
| Query Insights and DMVs for monitoring | Available | Available |
| Shortcuts | Not directly supported; available through Lakehouse | Supported |
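As a sketch of zero-copy cloning on the Warehouse side (the table names are hypothetical), a clone can be created with T-SQL; it references the same underlying files rather than copying the data:

```sql
-- Zero-copy clone of a Warehouse table (hypothetical names)
CREATE TABLE dbo.fact_sales_clone AS CLONE OF dbo.fact_sales;
```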
The comparison between Warehouse and Lakehouse highlights their similarities and differences, particularly in their interaction features. Organizations can choose either component based on user requirements and can still provide data interoperability through cross-querying, catering to the needs of different categories of users without having to duplicate data.
Follow us for interesting content on Microsoft Fabric.