Introduction
Version control is a vital aspect of managing any dynamic system, especially when it comes to data workflows. As organizations increasingly rely on Power BI for business intelligence, ensuring that your dataflows are backed up regularly and versioned correctly becomes critical.
However, a notable limitation in the current Fabric setup is that Dataflow Gen2 is not tracked via Git, necessitating alternative strategies for backup and version control. This article explores a PowerShell script that copies Dataflow Gen2 definitions from the Power BI Service daily, helping you maintain historical versions and safeguard your data assets despite this limitation.
Why Version Control is Essential
Power BI has revolutionized how businesses handle data visualization and analytics. However, managing dataflows within Power BI can be challenging, particularly when it comes to maintaining different versions over time. Dataflow Gen2, with its enhanced capabilities, brings new opportunities but also necessitates a more robust versioning strategy. This is where automation through scripting becomes indispensable.
Use Case
Imagine you are an analyst managing multiple Dataflow/Dataflow Gen2 instances in Power BI Service. You need to ensure that every version of these dataflows is archived daily so you can track changes and revert to previous versions if necessary. Manually copying these dataflows every day would be tedious and error-prone. Instead, by using a PowerShell script, you can automate this task, ensuring that every version is securely stored without any manual intervention.
Prerequisites
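To follow along, you will typically need the Power BI Management PowerShell module (MicrosoftPowerBIMgmt), an account with access to the target workspace, and permission to run scripts on the machine that performs the backup. A minimal setup sketch, assuming the module is not yet installed:
# Install the Power BI Management module (run once; -Scope CurrentUser avoids needing admin rights)
Install-Module -Name MicrosoftPowerBIMgmt -Scope CurrentUser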
Script Overview and Key Code Explanations
Let’s dissect the key portions of the script to understand its functionality and application.
1. Connecting to the Power BI Service
Connect-PowerBIServiceAccount
Explanation: This line establishes a connection with the Power BI Service. It is a prerequisite for executing any operations that involve interacting with Power BI resources.
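For a scheduled daily backup you may prefer a non-interactive sign-in. Below is a minimal sketch, assuming you have an Azure AD app registration (service principal) with access to the workspace; the tenant ID, app ID, and secret are placeholders you would replace:
# Hypothetical unattended sign-in with a service principal (replace the placeholder values)
$TenantId = "<tenant-id>"
$AppId = "<app-id>"
$Secret = ConvertTo-SecureString "<app-secret>" -AsPlainText -Force
$Credential = New-Object System.Management.Automation.PSCredential($AppId, $Secret)
Connect-PowerBIServiceAccount -ServicePrincipal -Credential $Credential -Tenant $TenantId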
2. Retrieving and Filtering Workspaces
$Workspaces = Get-PowerBIWorkspace
$WorkspaceFilterNames = 'Data'
foreach($Workspace in $Workspaces) {
if($Workspace.Name -eq $WorkspaceFilterNames) {
...
}
}
Explanation: Here, the script retrieves all available Power BI workspaces and filters them by name. The filter matches only the workspace named "Data", restricting the operations to the relevant workspace and avoiding unnecessary API calls.
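If you need to back up several workspaces, a small variation (a sketch using hypothetical workspace names) filters against a list instead of a single name:
# Back up every workspace whose name appears in the list (the names here are examples)
$WorkspaceFilterNames = @('Data', 'Finance', 'Sales')
foreach($Workspace in $Workspaces) {
if($WorkspaceFilterNames -contains $Workspace.Name) {
...
}
}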
3. Managing Backup Directories
if (Test-Path -Path "$Folder\$FolderName") {
$timestamp = Get-Date -Format "yyyy-MM-dd_HH-mm-ss"
$NewFolderName = "$FolderName-$timestamp"
$FolderName = New-Item -Path $Folder -Name $NewFolderName -ItemType "directory"
} else {
$FolderName = New-Item -Path $Folder -Name $FolderName -ItemType "directory"
}
Explanation: This segment checks whether a backup folder already exists for the workspace. If it does, the script appends a timestamp to create a unique folder for the current backup session, preventing previous backups from being overwritten and maintaining a clear version history.
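Because a new timestamped folder is created on each run, the backup directory will grow over time. If you want to keep only recent versions, a cleanup sketch such as the following (assuming a 30-day retention window) can be appended to the script:
# Optional: remove backup folders older than the retention window
$RetentionDays = 30
Get-ChildItem -Path $Folder -Directory | Where-Object { $_.CreationTime -lt (Get-Date).AddDays(-$RetentionDays) } | Remove-Item -Recurse -Force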
4. Exporting Dataflows
foreach($Dataflow in $Dataflows) {
$ExportFileName = Join-Path $FolderName -ChildPath ($Dataflow.Name + ".json")
Export-PowerBIDataflow -WorkspaceId $Workspace.Id -Id $Dataflow.Id -Scope Individual -OutFile $ExportFileName
}
Explanation: Each dataflow within the filtered workspace is exported as a JSON file. This process involves constructing the filename with the dataflow’s name and saving it to the previously prepared backup directory.
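Dataflow names can contain characters that are not valid in Windows file names (for example / or :). A slightly more defensive variant, shown here as a sketch, sanitizes the name before building the path:
# Replace characters that are invalid in file names before constructing the export path
$SafeName = ($Dataflow.Name -replace '[\\/:*?"<>|]', '_')
$ExportFileName = Join-Path $FolderName -ChildPath ($SafeName + ".json")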
5. Modifying and Saving JSON
$JSONFile = Get-Content -Path $ExportFileName -Raw | ConvertFrom-Json
$JSONFile.'pbi:mashup'.allowNativeQueries = $false
$JSONToWrite = $JSONFile | ConvertTo-Json -Depth 100
Set-Content -Path $ExportFileName -Value $JSONToWrite
Explanation: After exporting the dataflow, the script modifies the JSON content. It sets allowNativeQueries to false, enhancing security by preventing native queries from running. The JSON is then reconverted to a string and saved back to the file, ensuring that the exported dataflow reflects the desired configuration.
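If a particular export does not contain the pbi:mashup section or the allowNativeQueries property, the assignment above would raise an error. A more defensive version, sketched below, flips the flag only when the property is actually present:
# Only modify allowNativeQueries if the exported JSON actually contains it
$JSONFile = Get-Content -Path $ExportFileName -Raw | ConvertFrom-Json
if ($JSONFile.'pbi:mashup' -and ($JSONFile.'pbi:mashup'.PSObject.Properties.Name -contains 'allowNativeQueries')) {
$JSONFile.'pbi:mashup'.allowNativeQueries = $false
}
$JSONFile | ConvertTo-Json -Depth 100 | Set-Content -Path $ExportFileName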
Full Script
# Connect to Power BI Service
Connect-PowerBIServiceAccount
# Retrieve all workspaces
$Workspaces = Get-PowerBIWorkspace
# Specify the workspace to filter by name
$WorkspaceFilterNames = 'Data'
# Loop through the Workspaces filtering by the necessary workspace names
foreach($Workspace in $Workspaces) {
if($Workspace.Name -eq $WorkspaceFilterNames) {
# Define the directory where the dataflows will be backed up
$Folder = 'C:\Users\CurrentUser\Data Crafters Build\DFBackup'
$FolderName = $Workspace.Name
# Check if the folder already exists
if (Test-Path -Path "$Folder\$FolderName") {
# If the folder exists, append the current date and time to the folder name
$timestamp = Get-Date -Format "yyyy-MM-dd_HH-mm-ss"
$NewFolderName = "$FolderName-$timestamp"
$FolderName = New-Item -Path $Folder -Name $NewFolderName -ItemType "directory"
} else {
# If the folder does not exist, create it normally
$FolderName = New-Item -Path $Folder -Name $FolderName -ItemType "directory"
}
# Get dataflows for the current workspace
$Dataflows = Get-PowerBIDataflow -WorkspaceId $Workspace.Id
# Loop through each dataflow
foreach($Dataflow in $Dataflows) {
Write-Host "Exporting Dataflow: " $Dataflow.Name "from Workspace: " $Workspace.Name
# Throttle requests to avoid hitting API limits
Start-Sleep -Seconds 4
# Construct the export filename and path
$ExportFileName = Join-Path $FolderName -ChildPath ($Dataflow.Name + ".json")
# Export the dataflow to a JSON file
Export-PowerBIDataflow -WorkspaceId $Workspace.Id -Id $Dataflow.Id -Scope Individual -OutFile $ExportFileName
# Load the exported JSON, modify it and save it
$JSONFile = Get-Content -Path $ExportFileName -Raw | ConvertFrom-Json
$JSONFile.'pbi:mashup'.allowNativeQueries = $false
$JSONToWrite = $JSONFile | ConvertTo-Json -Depth 100
Set-Content -Path $ExportFileName -Value $JSONToWrite
# Prepare the multipart/form-data body for REST API submission
$UID = [System.Guid]::NewGuid().ToString("N")
$LF = "`r`n";
$Body = (
"--$UID",
"Content-Disposition: form-data; name=`"model.json`"; filename=`"model.json`"",
"Content-Type: application/json$LF",
$JSONToWrite,
"--$UID--$LF"
) -join $LF
# Note: $AccessToken and $InvokeURL are not defined above; set them before enabling the call,
# for example $AccessToken = Get-PowerBIAccessToken -AsString, with $InvokeURL pointing at the
# Power BI REST API endpoint you want to submit the model.json to.
$Header = @{
Authorization = "$AccessToken"
'Content-Type' = "multipart/form-data; boundary=$UID"
}
# Uncomment the line below to execute the REST API call
#Invoke-RestMethod -Uri $InvokeURL -Method POST -Headers $Header -Body $Body
}
}
}
Once you run the script in the PowerShell ISE, you will see a confirmation message for each exported dataflow.
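To run the backup every day without opening the ISE, you can register a Windows scheduled task. A minimal sketch, assuming the script is saved to a hypothetical path C:\Scripts\Backup-Dataflows.ps1 and should run at 02:00:
# Register a daily scheduled task for the backup script (path and time are examples)
$Action = New-ScheduledTaskAction -Execute "powershell.exe" -Argument "-NoProfile -ExecutionPolicy Bypass -File C:\Scripts\Backup-Dataflows.ps1"
$Trigger = New-ScheduledTaskTrigger -Daily -At 2am
Register-ScheduledTask -TaskName "DataflowGen2DailyBackup" -Action $Action -Trigger $Trigger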
Restoring the JSON File for Dataflow Gen2
Many of you may believe that the exported JSON is not suitable for Gen2 dataflows; however, there is an effective workaround. Export the dataflow in Gen1 format, extract the Power Query template from that export, and then import the template into the Gen2 dataflow. Once this is done, simply enable native queries and configure the data destination to complete the setup. This approach preserves compatibility and functionality within the Gen2 environment.
Conclusion
Automating the backup and versioning of Dataflow Gen2 instances in Power BI Service is not just a good practice—it is a necessity for maintaining the integrity and reliability of your data. The PowerShell script discussed in this article streamlines this process, allowing you to keep daily versions without the hassle of manual intervention. By ensuring that every version is securely backed up, you can confidently manage your Power BI environment and respond swiftly to any data issues that may arise.