We are the Best Consulting web site as part of the annual WebAward Competition!

(832) 981-4635
info@datacrafters.io
img
Language

English

The curious case of Column names in Microsoft Fabric Lakehouse

In self-service BI tools, having column names that are user-friendly is crucial for creating an environment that is easy for business users to navigate. The use of whitespace is significant in these instances, as it helps maintain the simplicity and user-friendliness of reports and dashboards.

Power BI semantic models and dataflows support column names with spaces but few of the components in the recently released Fabric come with nuances and quirks that must be handled for using business friendly names in downstream reports and dashboards.

In this article, let’s explore how we can retain the user friendliness of models while still leveraging the best practices of Lakehouse.

Lakehouse

Microsoft Fabric’s Lakehouse is an object that tries to merge the benefits of data lakes and data warehouses enabling a unified platform for different types of workloads. It uses Delta as storage format which in turn uses parquet files for storing data. Parquet files can store names with whitespace, but it is usually discouraged to do so and few of the components in Fabric throw an error when column names contain whitespace and other characters like brackets, semi-colon, comma.

Dataflows

Dataflow Gen 2 in Fabric is a new generation of dataflows that brings lot of new features such as integration with pipelines, allowing for better orchestration capabilities. Organizations would be interested to migrate from Power Platform Dataflows to Dataflow Gen2 because it enables data, transformed by Citizen developers, to be accessible in a Lakehouse environment. This availability is beneficial for applications such as data science models and AI use cases. Since both Dataflows use M-query, there is less effort for migration while there are many benefits to be gained.

Handling special characters

When using the Power Platform dataflows, it is common to have column names that contain white spaces and other special characters. Dataflow Gen2 when used with Lakehouse table as destination, provides an option for converting the unsupported characters to underscore (_). If a column has unsupported characters, it is not mapped to the Lakehouse data unless fixed.

The Curious case of Column names in Microsoft Fabric Lakehouse

 

The Curious case of Column names in Microsoft Fabric Lakehouse

 

One of the disadvantages of the default fix is, it replaces all unsupported characters with just underscore, providing option to change neither the replacement character nor specific characters with specific replacement. For example, in the above example, Amount (USD) is just converted to Amount__USD_. This makes it harder during an automated conversion back to business-friendly names.

Custom Power Query function

We have designed a simple M-Query function that helps replace unsupported characters with custom supported characters. The below FormatColumns function takes the name of Source Table (Query Name) and Replacement character as arguments.

Although the below function replaces all characters, it can be modified to replace specific characters with specific replacements. For example, spaces can be replaced with ‘_’ while round brackets can be replaced with ‘|’. This will require changing the characters it looks to replace within the List.Accumulate function.

// Function FormatColumns: formats column names in a source table
let
    FormatColumns = (SourceTable as table, ReplaceWithCharacter as text) as table =>
    let
        // Custom function that replaces text with replacement text
        ReplaceCharWithCustom = (InputText as text, CharacterToReplace as text) as text => 
            Text.Replace(InputText, CharacterToReplace, ReplaceWithCharacter),

        // Custom function that checks unsupported characters for Delta format and replaces them
        ReplaceForDelta = (InputText as text) as text => 
            List.Accumulate({" ", "[", ",", ";", "{", "}", "(", ")", "=", "]"}, InputText, (state, CharacterToReplace) => 
                ReplaceCharWithCustom(state, CharacterToReplace)),

        // Get the original column names and iterate them with ReplaceForDelta function
        OriginalColumnNames = Table.ColumnNames(SourceTable),
        RenamedColumnNames = List.Transform(OriginalColumnNames, each ReplaceForDelta(_)),
        RenamedTable = Table.RenameColumns(SourceTable, List.Zip({OriginalColumnNames, RenamedColumnNames}))

    in
        RenamedTable
in
    FormatColumns

The above code can be added as a function with the name ‘Format Columns’ in the Dataflow Gen2, and it can be used in a query as shown in the GIF image given below.

The Curious case of Column names in Microsoft Fabric Lakehouse

When consuming the data in Power BI from SQL Endpoint on top of a Lakehouse, the above M-Query function can be reused with modifications to revert to original column names with special characters.

 

Whitespace and special characters serve a crucial function in ensuring that object names remain user-friendly in a business context. The function described above facilitates the migration of Dataflows without compromising functionality or the user-friendliness of semantic models.

Follow us for more interesting articles on Microsoft Fabric!

Post A Comment

Stay ahead in a rapidly world. Subscribe to Prysm Insights,our monthly look at the critical issues facing global business.

[mc4wp_form]