Navigating Data Storage in Microsoft Fabric: Lakehouse and Warehouse

Microsoft Fabric features two storage-compute platforms for batch data processing, Lakehouse and Warehouse, which store and query data for use in other components. Both integrate seamlessly with the rest of Fabric and serve as central hubs for data storage. They target different use cases within an enterprise data architecture, however, and each has distinct interactive characteristics, which are the subject of this article.

Lakehouse

The Fabric Lakehouse platform enables storing and analyzing structured and unstructured data, and its tight integration with the other data components of Fabric simplifies the workflow. Lakehouse tables use the Delta format, which in turn stores data in parquet files. For every Lakehouse, a default read-only SQL endpoint and a default semantic model are created for consumption within Power BI.
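As a minimal sketch, a table created with Spark SQL from a Fabric notebook lands in the Lakehouse as a Delta table backed by parquet files (the table and column names below are illustrative):

```sql
-- Spark SQL in a Fabric notebook; the Lakehouse stores this as a Delta table
CREATE TABLE sales_orders (
    order_id   INT,
    order_date DATE,
    amount     DECIMAL(10, 2)
);

INSERT INTO sales_orders VALUES (1, '2024-01-15', 199.99);
```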

The Lakehouse provides two ways to interact with the data:

  1. Lakehouse view – Provides the ability to interact with tables at file-level granularity and to retrieve properties such as the ABFSS path. It also offers maintenance operations such as OPTIMIZE (improves the performance of Delta tables) and VACUUM (removes historical snapshots of data).
  2. Default SQL Analytics endpoint view – Provides the option to query the data using T-SQL and to build views on top of tables. This is a read-only endpoint and as such does not allow creating new tables or inserting or deleting data. It includes a Visual Query Builder, similar to Power BI Datamart.
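The maintenance operations mentioned above can also be run from a notebook with Spark SQL (the table name is illustrative):

```sql
-- Compacts small parquet files into larger ones for faster reads
OPTIMIZE sales_orders;

-- Removes snapshot files older than the retention period (7 days by default)
VACUUM sales_orders;
```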

Lakehouse is predominantly geared towards users familiar with Spark for processing data.

Warehouse

The Fabric Warehouse platform enables the storage, processing, and analysis of data primarily through T-SQL. It supports creating tables and inserting and deleting data using T-SQL, making it suitable for a dbt-based workflow with Fabric. Like Lakehouse, Warehouse stores tables in Delta format; however, Warehouse conceals the folder and file details of a table, offering a user-friendly environment akin to Dedicated Synapse SQL. A default semantic model is created on top of each Warehouse, similar to Lakehouse.
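As a sketch, the full create/insert/delete cycle is available through plain T-SQL in a Warehouse (object names are illustrative):

```sql
CREATE TABLE dbo.customers (
    customer_id   INT NOT NULL,
    customer_name VARCHAR(100)
);

INSERT INTO dbo.customers VALUES (1, 'Contoso');

DELETE FROM dbo.customers WHERE customer_id = 1;
```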

The Warehouse UI is similar to the SQL Analytics endpoint of Lakehouse, with minor differences such as options to generate code for dropping a table and for Create Table As Select (CTAS).
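A CTAS statement of the kind the UI can generate looks like this (table and column names are illustrative):

```sql
-- Creates and populates a new table from a query in a single statement
CREATE TABLE dbo.customer_totals
AS
SELECT customer_id, SUM(amount) AS total_amount
FROM dbo.orders
GROUP BY customer_id;
```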

Warehouse provides easier interaction with the data using T-SQL, but there are currently a few limitations that might be resolved in the future (see Microsoft's documentation on T-SQL limitations in Fabric Warehouse). For example, TRUNCATE TABLE is not supported, but all rows can be removed using DELETE FROM without a filter.
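For instance, clearing a table without TRUNCATE (table name is illustrative):

```sql
-- TRUNCATE TABLE dbo.staging_orders;  -- not currently supported in Warehouse
DELETE FROM dbo.staging_orders;        -- unfiltered DELETE removes all rows instead
```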

Warehouse is suitable for SQL Analysts who are familiar with processing data predominantly using SQL.

Comparison

The primary distinction between Warehouse and Lakehouse lies in their interaction methods. Although both systems store data in the Delta format, they differ in the capabilities of the engines used to interact with that data.

Internally, both Lakehouse and Warehouse share a common processing architecture. The execution plan for a T-SQL query is passed down to backend compute pools, which operate on the data in OneLake.

A key benefit of Fabric is its ability to facilitate cross-queries between Lakehouse and Warehouse as though they were part of the same system. Lakehouse and Warehouse components can be combined with each other, allowing them to be queried collectively using T-SQL. This functionality enables data to be processed based on developer or user requirements, offering interoperability and removing the necessity to duplicate data across different components.
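For example, a Warehouse table and a Lakehouse table in the same workspace can be joined with three-part naming from either SQL endpoint (the item and table names below are illustrative):

```sql
-- Cross-query joining a Warehouse table with a Lakehouse table
SELECT w.customer_id, w.customer_name, l.order_total
FROM   MyWarehouse.dbo.customers    AS w
JOIN   MyLakehouse.dbo.order_totals AS l
       ON w.customer_id = l.customer_id;
```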

Both Warehouse and Lakehouse are first-class objects in data pipelines and Gen 2 Dataflows, allowing both reading and writing. One surprising fact: while Lakehouse does not support white spaces and special characters in column names, Warehouse does support them (although using spaces is not recommended, especially in scenarios involving interaction with Spark).
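In a Warehouse, such column names must be quoted with square brackets in T-SQL (names are illustrative):

```sql
-- Column names with spaces and special characters require bracket quoting
CREATE TABLE dbo.events ([Event Date] DATE, [Success %] DECIMAL(5, 2));

SELECT [Event Date], [Success %] FROM dbo.events;
```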

Lakehouse tables are directly supported in Spark, making them first-class objects there; Warehouse tables, however, must be added as shortcuts in a Lakehouse before they can be used with Spark.
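Once a Warehouse table has been added as a shortcut, Spark reads it like any other Lakehouse table (the shortcut name below is illustrative):

```sql
-- Spark SQL in a notebook attached to the Lakehouse;
-- 'customers_shortcut' is a shortcut pointing at a Warehouse table
SELECT COUNT(*) FROM customers_shortcut;
```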

| Feature | Warehouse | Lakehouse |
| --- | --- | --- |
| Suitable for | Building an enterprise data warehouse | Handling big data workloads for Data Engineering and Data Science |
| Type of data | Structured | Structured and unstructured |
| Read through | T-SQL endpoint; Spark (SQL, Scala, Python and R) using Lakehouse shortcuts; Data pipelines; Gen 2 Dataflows; Power BI (DirectLake) | T-SQL endpoint; Spark (SQL, Scala, Python and R); Data pipelines; Gen 2 Dataflows; Power BI (DirectLake) |
| Write through | T-SQL; Data pipelines; Gen 2 Dataflows | Spark (SQL, Scala, Python and R); Data pipelines; Gen 2 Dataflows; EventStreams |
| Security | Object level, row level, column level, dynamic data masking | RLS and table level (only when using the SQL Analytics endpoint) |
| Zero-copy cloning | Supported through T-SQL | Supported through Spark |
| Statistics and cache mechanism | Available | Available |
| Query Insights and DMVs for monitoring | Available | Available |
| Shortcuts data | Not directly supported; available through Lakehouse | Supported |

The comparison between Warehouse and Lakehouse highlights their similarities and differences, particularly in their interaction features. Organizations can choose either component based on user requirements and can provide data interoperability through cross-querying, catering to the needs of different categories of users without having to duplicate data.

Follow us for interesting content on Microsoft Fabric.
