Get Started
Get Started

How to Configure Staging in Microsoft Fabric Dataflows Gen2 for Better Performance

March 29, 2026

Why Staging Configuration Drives Cost and Performance

A Dataflow Gen1 processing 140 million rows can take 2.5 to 3.5 hours for what should be a straightforward filter. Complex transformations may time out entirely. Capacity usage can spike past 1000% overage, throttling other workloads across the tenant for hours.

Run the same 140 million rows through a properly configured Dataflow Gen2 with a deliberate staging strategy and the result can drop to 2.5 to 3.5 minutes. Capacity usage stays far more predictable and easier to manage.

The difference is not the engine alone. It comes down to how staging is configured at each Medallion layer. Many teams apply one default across all queries, either staging enabled everywhere or staging disabled everywhere. In practice, both choices create avoidable cost or performance issues. Bronze, Silver, and Gold serve different purposes and behave differently, so each layer benefits from its own staging approach.

This article walks through the staging configuration patterns for Bronze, Silver, and Gold, including M code examples, configuration tables, and the scenarios where exceptions make sense. By the end, you will have a clear decision checklist you can keep handy during pipeline design and review.

Core Principle: Staging Is a Per Layer Decision

Each Medallion layer behaves differently, so staging should be configured differently by layer. Bronze prioritizes speed, and staging usually adds overhead you do not need. Silver prioritizes transformation, and staging can offload heavy, non foldable work to the Warehouse engine. Gold prioritizes aggregations and business logic, and staging is required for Warehouse destinations while also supporting complex calculations. When this mapping is configured correctly, dataflows run faster and capacity usage becomes more predictable.

How Staging Works in Dataflow Gen2

When you create your first Dataflow Gen2 in a workspace, Fabric automatically provisions two hidden artifacts: 

  1. A Staging Lakehouse (stores data being ingested)
  2. A Staging Warehouse (acts as a SQL compute engine to transform and write results faster). 

When staging is enabled, data lands in the Lakehouse first, then the Warehouse’s SQL engine handles transformations that can be expressed as SQL filters, joins, aggregations before writing to your destination. When staging is disabled, the Mashup Engine does everything in-memory.

image 17

Figure 3: Staging Architecture

Dataflow Gen2 Staging Architecture Explained

When staging is ENABLED:

Data flows through the Staging Lakehouse first, then the Staging Warehouse applies transformations using its SQL compute engine and finally writes to the destination. This leverages powerful SQL-based processing for complex operations.

image 18

Figure 3.1.1: Enable Staging

When staging is DISABLED:

The Mashup Engine handles everything in-memory, extracting data, applying transformations, and writing directly to the destination. This can be faster for simple operations but struggles with large volumes.

image 21

Figure 3.1.2: Disable Staging

When Transformations Fold to the Staging Warehouse 

If your staged data undergoes transformations that can be expressed as SQL (filters, joins, aggregations), the Warehouse compute processes them natively, much faster than in-memory processing. Think of staging as a “pit stop” for your data. Before your data reaches its final destination, it pauses at a staging area where transformations can be performed more efficiently. This is particularly powerful when:

  1. Your data source doesn’t support Query Folding
  2. Transformations are too complex to be pushed down to the source system
  3. You’re performing merges or joins across multiple data sources
  4. Downstream queries need to reference the same transformed dataset
Power Query StepFolded SQL Equivalent
Get table “Orders”SELECT *
Filter: Year = 2024WHERE Year = 2024
Select columns: A, B, CSELECT A, B, C
Sort by DateORDER BY Date

Result: a single SQL query sent to the source with minimal data transfer.

ScenarioQuery FoldingStaging Recommendation
All transformations fold to source Full folding Disable staging (adds overhead)
Some fold, some don’t⚠️ Partial folding Split query & enable staging on non-folding part
No transformations fold No folding Enable staging (leverage Warehouse compute)
Merging from multiple sources Cannot fold Enable staging (both sources staged first)

How to Read Folding Indicators in Dataflow Gen2

Dataflow Gen2 provides visual indicators in the Power Query editor:

  • Green indicator (solid line): Step folds to source/staging compute.
  • Gray indicator (dashed line): Step does NOT fold, processed in-memory by the Mashup engine.
  • Folding boundary: The point where folding stops. When you see it turn gray, that’s your signal to consider splitting the query.

When you see the folding indicator turn gray, that’s your signal to consider splitting the query and enabling staging on the portion before the break.

When to Enable or Disable Staging in Microsoft Fabric

Understanding when to enable or disable staging is crucial for optimal performance. This isn’t a one-size-fits-all decision. It depends on your data volume, transformation complexity, and destination type.

When Staging Improves Performance (Enable Staging )

Large Data Volumes

When processing millions of rows, staging allows the Warehouse SQL compute to handle the heavy lifting instead of the memory-constrained Mashup engine.

Real-world example: Processing 29 million records from 50 CSV files, with staging and Partitioned Compute, this can complete in 6-7 minutes instead of 1+ hours.

Complex Transformations That Don’t Fold

Operations like:

  • Custom M functions
  • Complex conditional logic
  • Text parsing with multiple steps
  • Pivot/Unpivot operations
  • Advanced date calculations

These don’t fold to source systems but CAN fold to the Staging Warehouse SQL compute.

Merges and Joins Across Multiple Sources

When combining data from different sources (e.g., D365 F&O transactions with SharePoint reference data):

image 20

Figure 4.1.3: Merges and Joins Across Multiple Sources

Queries Referenced by Multiple Downstream Queries

If Query A is referenced by Query B, Query C, and Query D the staging Query A means it’s computed once and reused, not recomputed three times.

High Scale Compute Requirements

Staging enables access to the High Scale Compute infrastructure, necessary for:

  • Temporal consistency across downstream dataflows
  • Data reconciliation processes
  • Reduced load on source systems

When Staging Adds Overhead (Disable Staging )

Small Data Volumes

For tables with less than 100,000 rows and simple transformations, staging adds unnecessary overhead:

  • Write to staging lakehouse (time cost)
  • Read from staging lakehouse (time cost)
  • Warehouse compute spin-up (time cost)

The direct in-memory path is faster for small data.

Simple Extract-Load Operations

If you’re simply copying data from source to destination with minimal transformation:

  • Column selection
  • Basic data type changes
  • Simple filters that fold to source

Disable staging and let Fast Copy handle it.

Queries Not Referenced by Other Queries

If a query is standalone (not referenced by other queries) and loads directly to destination, evaluate if staging provides benefit for your specific transformation pattern.

When Experiencing Capacity Overage Issues

Staging consumes additional Capacity Units (CU). If you’re hitting capacity limits:

  • Each second of Warehouse compute uses 6 CU seconds
  • Staging Lakehouse operations consume additional CU
  • For F64 SKUs with heavy workloads, staging can push usage over 1000%

Consider disabling staging on less critical dataflows to manage capacity.

Bronze Layer Staging Strategy: Disable Staging and Use Fast Copy

Layer Objective: 

Land raw data from source systems as quickly as possible while preserving original structure and fidelity. The Bronze layer is about speed and preservation.

Why Staging Hurts at Bronze: 

Bronze queries typically involve minimal transformation: column selection, basic filters, data type inference. These operations are lightweight enough that the Mashup engine handles them efficiently. More importantly, when your source is a SQL database or D365 F&O, these operations almost always support query folding, the transformation steps translate into a native SQL query executed at the source. The source does the heavy lifting; only the filtered result travels across the wire.

Enabling staging on a Bronze query adds an unnecessary intermediate write to the Staging Lakehouse followed by a read, before data reaches your actual destination. For a 50-million-row table with basic column selection, this overhead can add 40–60% to refresh time with zero benefit.

Bronze Layer Architecture:

image 19

Figure 5.3: Bronze Layer Architecture

Best Practices for Bronze Layer:

For Bronze ingestion of large volumes, use Fast Copy instead of staging. Fast Copy leverages the same high-throughput CopyActivity backend as Data Pipelines, bypassing the Mashup engine entirely. It provides up to 9x faster performance by optimizing data movement paths and parallelizing writes. Fast Copy supports Lakehouse destinations with CSV and Parquet file formats.

To enable Fast Copy: go to Options > Scale > Allow use of enhanced connector for copy, then enable staging on the query (Fast Copy requires staging as its delivery mechanism). For guaranteed Fast Copy behavior, right-click the query and select Require Fast Copy, this will fail the dataflow explicitly if the query contains unsupported transformations.

  1. Enable Fast Copy: Use this feature for bulk ingestion. It provides up to 9x faster performance by optimizing data movement paths and parallelizing writes.
  2. Leverage Query Folding to Source:  Let the source system handle filtering:
/ Good: Filter folds to D365/SQL source
Table.SelectRows(Source, each [TRANSDATE] >= #date(2024,1,1))


// Good: Column selection folds
Table.SelectColumns(Source,{"OrderID", "CustomerID", "Amount", "Date"})

A complete Bronze extraction query:

let
    Source = Sql.Database("your-server.database.windows.net", "SalesDB"),
    dbo_Orders = Source{[Schema="dbo", Item="Orders"]}[Data],


    // Both steps fold to T-SQL at the source
    FilteredRows = Table.SelectRows(
        dbo_Orders,
        each [ModifiedDate] >= #date(2024, 1, 1)
    ),
    SelectedColumns = Table.SelectColumns(
        FilteredRows,
        {"OrderID", "CustomerID", "OrderDate", "TotalAmount", "ModifiedDate"}
    )
in
    SelectedColumns

Configuration:

SettingValueWhy
Enable Staging❌ NoOverhead exceeds benefit for simple extract-load patterns
Fast Copy✅ YesLeverages CopyActivity engine for up to 9x faster ingestion
Query FoldingMaximizePush filters and column selection to source via native SQL
Modern Evaluator✅ YesProvides incremental gains even on simple queries
Partitioned Compute⚠️ If applicableOnly for 50+ partitioned file scenarios
DestinationBronze Lakehouse → TablesDelta tables preserve raw data fidelity

Bronze Exceptions: When to Enable Staging

There is specific Bronze scenarios where staging helps:

ScenarioEnable Staging?Reason
Combining 50+ partitioned files✅ YesPartitioned Compute requires staging to parallelize
Source doesn’t support any folding✅ YesLeverage Warehouse compute instead of Mashup engine
Need temporal consistency for downstream✅ YesEnsures point-in-time snapshot
Simple extract from foldable source❌ NoDirect path is faster

Silver Layer Staging Strategy: Enable Staging for Cleansing and Validation

Layer Objective:

Transform raw Bronze data into cleansed, validated, deduplicated, enterprise-ready datasets. The Silver layer is where the heavy lifting occurs, data cleansing, validation, standardization, deduplication, and merging data from multiple Bronze sources. These operations typically do not fold to source systems, making staging highly beneficial.

Silver Layer Architecture:

image 22

Figure 6.2: Silver Layer Architecture

Why Staging Helps at Silver:

Silver transformations like Table.NestedJoin, Table.Group, custom M functions, complex conditional logic, text parsing, and pivot/unpivot operations typically cannot fold to the source. Without staging, the Mashup engine must pull all data into memory, perform the operation locally, and write to destination. For tables over 100,000 rows, this becomes a bottleneck, or a timeout.

With staging enabled, data lands in the Staging Lakehouse first, then the Staging Warehouse’s SQL engine takes over. Joins and aggregations execute using distributed SQL compute. Transformations that took 45 minutes in the Mashup engine can finish in under 5 minutes through the Warehouse.

Best Practices for Silver Layer:

  1. Enable Staging on Bronze Source Queries: When reading from Bronze Lakehouse, stage the data first so transformations can leverage Warehouse compute.
  2. Use “Extract Previous” for Non-Folding Steps: When you identify a step that breaks folding: 
  • Right-click on the step
  • Select “Extract Previous”
  • Name the new query (e.g., Stg_Transactions_Raw)
  • Enable staging on the extracted query
  • Continue transformations in the main query
  1. Stage Both Sides of Merge Operations: Before joining tables, ensure both source queries are staged for efficient merge execution.
  2. Apply Modern Evaluator: For Silver transformations on large datasets, enable Modern Evaluator for ~1.6x performance improvement on non-foldable queries.

M Code: Staged Merge Across Two Bronze Sources

When combining data from different sources (e.g., D365 F&O transactions with customer reference data), stage both sides before merging.

Step 1: Staging query for orders (Enable Staging = ON):

// Query: Stg_Bronze_Orders (Enable Staging = ON)
let
    Source = Lakehouse.Contents(null){[workspaceId = "your-ws-id"]}[Data],
    Orders = Source{[Id = "Bronze_Orders", ItemKind = "Table"]}[Data],
    FilteredOrders = Table.SelectRows(
        Orders,
        each [OrderDate] >= #date(2024, 1, 1)
    ),
    SelectedCols = Table.SelectColumns(
        FilteredOrders,
        {"OrderID", "CustomerID", "OrderDate", "TotalAmount",
         "ProductCategory", "Region", "ModifiedDate"}
    )
in
    SelectedCols

Step 2: Staging query for customers (Enable Staging = ON):

/ Query: Stg_Bronze_Customers (Enable Staging = ON)
let
    Source = Lakehouse.Contents(null){[workspaceId = "your-ws-id"]}[Data],
    Customers = Source{[Id = "Bronze_Customers", ItemKind = "Table"]}[Data],
    SelectedCols = Table.SelectColumns(
        Customers,
        {"CustomerID", "CustomerName", "Region", "Segment"}
    )
in
    SelectedCols

Step 3: Silver transformation query (references staged data):

// Query: Silver_OrdersEnriched (destination = Silver Lakehouse)
let
    Orders = Stg_Bronze_Orders,
    Customers = Stg_Bronze_Customers,


    // This merge runs on Staging Warehouse SQL engine
    Merged = Table.NestedJoin(
        Orders, {"CustomerID"},
        Customers, {"CustomerID"},
        "CustDetails", JoinKind.LeftOuter
    ),
    Expanded = Table.ExpandTableColumn(
        Merged, "CustDetails",
        {"CustomerName", "Segment"}
    ),


    // Standardize nulls
    CleanedRegion = Table.ReplaceValue(
        Expanded, null, "Unknown",
        Replacer.ReplaceValue, {"Region"}
    ),


    // Deduplicate: keep latest by ModifiedDate
    Sorted = Table.Sort(CleanedRegion, {{"ModifiedDate", Order.Descending}}),
    Deduped = Table.Distinct(Sorted, {"OrderID"})
in
    Deduped


Modern Evaluator in Dataflow Gen2: Silver Layer Optimization

The Modern Query Evaluation Engine (preview) is particularly effective at the Silver layer. Running on .NET Core 8, it delivers approximately 1.6x faster transformation speed for non-foldable queries. In independent benchmarks, enabling the Modern Evaluator reduced CU consumption by over 50% compared to Gen1 dataflows processing 29 million rows.

Enable it via Options > Scale > Allow use of the modern query evaluation engine. It currently supports connectors including ADLS Gen2, Fabric Lakehouse, Fabric Warehouse, OData, SharePoint, and Web. Unsupported connectors fall back to the standard engine automatically.

Configuration:

SettingValueWhy
Enable Staging✅ YesNon-foldable transforms benefit from Warehouse SQL compute
Fast Copy❌ NoNot applicable. Silver involves heavy transformations
Extract PreviousUse itSplit foldable from non-foldable steps at the folding boundary
Modern Evaluator✅ Yes~1.6x faster for non-foldable queries, 50%+ CU savings
Partitioned Compute⚠️ If applicableRare at Silver unless processing partitioned file sets
DestinationSilver Lakehouse → TablesDelta tables for enriched, deduplicated data

Gold Layer Staging Strategy: Enable Staging for Aggregations and KPIs

Layer Objective:

Create aggregated, enriched, business-ready datasets optimized for reporting, analytics, and consumption by Power BI semantic models. Gold layer transformations involve complex aggregations, business calculations, KPI computations, and dimensional modeling. These operations rarely fold to original sources, benefit significantly from staged intermediate results, and often load to a Warehouse destination where staging is mandatory.

Gold Layer Architecture:

image 23

Figure 7.2: Gold Layer Architecture

Why Staging Is Required for Gold Warehouse Destinations

If your Gold destination is a Fabric Warehouse (the recommended choice for structured analytical data), staging cannot be disabled which is architecturally required. The Dataflow Gen2 engine uses the Staging Warehouse as the intermediary write path to Warehouse destinations.

Even when loading to a Lakehouse, Gold queries benefit enormously from staging because aggregations (Table.Group), multi-table joins, and calculated columns are compute-intensive operations that the Staging Warehouse’s SQL engine handles far more efficiently than in-memory processing.

Partitioned Compute: Gold at Scale

For Gold queries that process large partitioned file sets (monthly CSVs, yearly Parquet archives), enable Partitioned Compute. This feature parallelizes processing across file partitions. Microsoft’s benchmark showed:

  • Without Partitioned Compute: 1 hour 44 minutes
  • With Partitioned Compute: 6 minutes 51 seconds (15x faster)
  • With Partitioned Compute + Modern Evaluator: 22x faster processing

Enable it via Options > Scale > Allow use of partitioned compute. Currently supports ADLS Gen2, Fabric Lakehouse, Folder, and Azure Blob Storage connectors. You must also enable Allow combining data from multiple sources in the Privacy section, and staging must be enabled on the query.

M Code: Gold KPI Aggregation

This Gold query aggregates Silver order data into a monthly revenue KPI table, grouped by region and product category:

// Query: Gold_MonthlyRevenue (Enable Staging = ON)
// Destination: Gold Warehouse
let
    Source = Stg_Silver_OrdersEnriched,
    // Add Month-Year key for time-based aggregation
    AddedMonthKey = Table.AddColumn(
        Source, "MonthKey",
        each Date.Year([OrderDate]) * 100 + Date.Month([OrderDate]),
        Int64.Type
    ),
    // Aggregate revenue by Region, Category, Month
    Grouped = Table.Group(
        AddedMonthKey,
        {"Region", "ProductCategory", "MonthKey"},
        {
            {"TotalRevenue", each List.Sum([TotalAmount]), type number},
            {"OrderCount", each Table.RowCount(_), Int64.Type},
            {"AvgOrderValue", each List.Average([TotalAmount]), type number}
        }
    ),


    // Sort for downstream consumption
    Sorted = Table.Sort(Grouped, {{"MonthKey", Order.Ascending}})
in
    Sorted

Best Practices for Gold Layer:

  1. Staging is Mandatory for Warehouse Destinations: You cannot disable it, so design your queries accordingly.
  2. Stage All Silver Source Queries: Before aggregating, ensure Silver data is staged for efficient processing.
  3. Use Partitioned Compute for Large Aggregations: When processing partitioned datasets, Partitioned Compute provides 15-22x faster performance.
  4. Consider Separate Dataflows for Complex Models: For dimensional models with many fact and dimension tables, split into focused dataflows to stay within the 50-query limit.
  5. Stage Intermediate Aggregations: If building complex KPIs requiring multiple calculation steps, stage intermediate results.

Configuration:

SettingValueWhy
Enable Staging✅ Yes (Mandatory)Required for Warehouse destinations; aggregations need SQL compute
Fast Copy❌ NoNot applicable, Gold involves heavy aggregation logic
Partitioned Compute✅ Yes (if files)22x faster for partitioned file processing
Modern Evaluator✅ YesFaster execution for complex aggregation and business logic
DestinationGold WarehouseOptimized for Power BI DirectQuery and structured analytics

Decision Flowchart: Choosing Staging Settings by Layer and Scenario

QuestionIf YesIf NoFeature to Enable
Is this raw ingestion (Bronze)?Staging OFFFast Copy + Query Folding
Are transforms non-foldable?Staging ONStaging OFFExtract Previous + Modern Evaluator
Is destination a Warehouse?Staging ON (mandatory)Required architecturally
Data volume under 100K rows?Staging OFFStaging ONMashup engine handles small data
Processing 50+ partitioned files?Staging ONPartitioned Compute
All transforms fold to source?Staging OFFStaging ONLet the source do the work
CU budget under pressure?Reduce stagingSchedule off-peak
More than 50 queries needed?Split dataflowsPipeline orchestration
SettingBronzeSilverGold
Enable Staging❌ OFF✅ ON✅ ON (mandatory)
Fast Copy✅ ON❌ OFF❌ OFF
Query FoldingMaximizePartial (Extract Previous)N/A (reads from staged)
Modern Evaluator✅ Yes✅ Yes✅ Yes
Partitioned ComputeOnly for 50+ filesRare✅ If file-based
Typical DestinationLakehouseLakehouseWarehouse
CU ImpactLowMediumHigh (offset by speed)

Staging configuration is not a global setting you toggle once and forget. It is a per-query, per-layer decision that directly determines whether your dataflow runs in minutes or hours, and whether your capacity budget survives the month. Start with the layer defaults in the checklist above, verify with the folding indicators, and measure with the Capacity Metrics App.

Conclusion

Query staging in Microsoft Fabric Dataflows Gen2 is one of the highest impact levers for controlling performance and capacity. When you align staging choices with the Medallion Architecture, long running loads can drop from hours to minutes without creating unpredictable capacity spikes.

The key is to treat staging as a per layer decision. For the Bronze layer, keep staging disabled in most cases and rely on Fast Copy and query folding for fast raw ingestion, often delivering up to 9x improvement. For the Silver layer, enable staging on source and transformation queries so the Warehouse engine can handle cleansing, validation, and merge operations, with Modern Evaluator delivering an additional 1.6x improvement. For the Gold layer, staging is required for Warehouse destinations and remains the best option for heavy aggregations and business logic. For large datasets, Partitioned Compute can deliver up to 22x faster performance.

To make this pattern work reliably in production, validate folding behavior, respect the 50 query limit per dataflow, and keep related artifacts in the same workspace for simpler management and execution. With these practices in place, data engineers, Power BI developers, and solution architects can build enterprise grade pipelines that scale efficiently, control capacity consumption, and deliver timely insights to business stakeholders.

Nasif Azam

Data Analyst

Nasif Azam

Nasif Azam is a Data Analyst at Data Crafters with expertise in Power BI, Power Query, DAX, SQL, and Microsoft Fabric. A 4x Microsoft Certified Super User, he focuses on building interactive dashboards, optimizing data models, and managing data pipelines to turn complex data into clear, actionable insights.

In this article

Like what you see? Share with a friend.

Related Events

Related Services

Ikramul Islam

Khaled Chowdhury

Datacrafters | DatabricksDatacrafters | Microsoft FebricDatacrafters | AzureDatacrafters | power BI Services

Rubayat Yasmin

Microsoft-Certified-Power-BI-Data-Analyst-AssociateMicrosoft-Certified-Fabric-Analytics-Engineer-AssociateMicrosoft-Certified-Azure-Data-Engineer-AssociateMicrosoft-Certified-Azure-Solutions-Architect-Expert

Rami Elsharif, MBA

Microsoft-Certified-Power-BI-Data-Analyst-AssociateMicrosoft-Certified-Fabric-Analytics-Engineer-Associate

Govindarajan D

Microsoft-Certified-Power-BI-Data-Analyst-AssociateMicrosoft-Certified-Azure-Data-Engineer-AssociateMicrosoft-Certified-Azure-Administrator-AssociateMicrosoft-Certified-Azure-Solutions-Architect-ExpertDatabricks-Certified-Data-Engineer-ProfessionalLinux-EssentialsMicrosoft-Certified-Fabric-Analytics-Engineer-AssociateMicrosoft-Certified-Azure-Enterprise-Data-Analyst-AssociateDatabricks-Certified-Data-Engineer-AssociateMicrosoft-Certified-Trainer-MCTAzure-Databricks-Platform-Architect