Data orchestration is an automated way to manage and organize data from different sources, making it ready for analysis. It’s also known as data pipeline orchestration or data workflow orchestration.
This process covers many parts of data management, such as ensuring data quality, adhering to data governance rules, and automating repetitive tasks. It also involves moving and transforming data through operations like ETL (Extract, Transform, Load). Data orchestration helps organizations use their data effectively by giving them a single, consolidated view of it for analysis, reporting, and decision-making. Depending on their needs, organizations might use several orchestration tools together.
Benefits of Data Orchestration
- Integration of Data: Combines data from multiple sources into a consistent, analysis-ready form, typically as part of the Extract, Transform, Load (ETL) process (see the sketch after this list).
- Automation and Efficiency: Uses automated tools and technologies to manage and coordinate data across different systems and applications, reducing the need for manual intervention and increasing efficiency.
- Job Scheduling and Workflow Management: Schedules tasks, manages workflows, and handles dependencies among various jobs.
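To make the ETL idea concrete, here is a minimal, self-contained sketch in Python. The file, column, and table names are hypothetical, and in Fabric this work would typically be handled by a Dataflow rather than hand-written code:

```python
# Minimal ETL sketch. Assumes a CSV named sales_raw.csv with columns
# order_id and amount; all names here are hypothetical.
import csv
import sqlite3

# Extract: read raw records from the source file.
with open("sales_raw.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# Transform: fix types and derive a new column.
for row in rows:
    row["amount"] = float(row["amount"])
    row["amount_with_tax"] = round(row["amount"] * 1.08, 2)

# Load: write the cleaned records to the destination table.
conn = sqlite3.connect("warehouse.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS sales "
    "(order_id TEXT, amount REAL, amount_with_tax REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (:order_id, :amount, :amount_with_tax)", rows
)
conn.commit()
conn.close()
```

An orchestration tool's job is to run steps like these in the right order, on a schedule, and with dependencies handled, which is exactly what the pipeline built below does.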
Demonstration: Creating a Pipeline Using Dataflow and Stored Procedure Activities in Microsoft Fabric
In this section, we will walk through the steps to create a data pipeline using Dataflow and Stored Procedure activities in Microsoft Fabric.
Step 1: Begin by navigating to the appropriate workspace and accessing the Microsoft Fabric data engineering experience.
Step 2: Click on ‘+New’ and select ‘Data Pipeline’ to create a new pipeline.
Step 3: Provide a name for the pipeline and click the ‘Create’ button.
Step 4: You will be presented with an interface where you can select ‘Pipeline activity’.
Step 5: After choosing ‘Pipeline activity’, a list of activities will appear. Select ‘Dataflow’ from the list.
Step 6: Within the ‘General’ tab, rename the Dataflow activity to something meaningful.
Step 7: In the ‘Settings’ tab, choose the appropriate workspace and the previously created Dataflow Gen2.
Step 8: Next, click on the ‘Add activity’ icon and select ‘Stored Procedure’ from the list of activities.
Step 9: Assign a name to the Stored Procedure activity under the ‘General’ tab.
Step 10: In the ‘Settings’ tab, select a suitable connection and the stored procedure created in the previous post of this series.
Step 11: Click the ‘Import’ button to bring in the stored procedure parameters, then input a number in the value field.
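For context, the Stored Procedure activity simply executes a parameterized procedure against the warehouse. The sketch below shows the equivalent call from Python via pyodbc; the procedure name, parameter, and connection details are hypothetical stand-ins for whatever you created earlier:

```python
# Hypothetical equivalent of the Stored Procedure activity: execute a
# parameterized procedure. Names and connection details are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<your-warehouse-sql-endpoint>;"
    "DATABASE=<your-warehouse>;"
    "Authentication=ActiveDirectoryInteractive;"
)
cursor = conn.cursor()
# The bound value plays the same role as the number typed into the
# value field in Step 11.
cursor.execute("EXEC dbo.InsertSampleRows @RowCount = ?", 10)
conn.commit()
conn.close()
```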
Step 12: Click the ‘Run’ button to execute the pipeline.
Step 13: After clicking ‘Run’, you will be prompted to save the pipeline first. Select ‘Save and run’ to save and execute the pipeline.
Step 14: Once the pipeline is running, you can monitor its status.
Step 15: After the pipeline completes its run, it will display a ‘Succeeded’ status for the pipeline and for each of the two activities.
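If you prefer to trigger and monitor runs outside the UI, Fabric also exposes a REST API for on-demand pipeline runs. The sketch below reflects the publicly documented Job Scheduler endpoints as we understand them; the workspace and item IDs are placeholders, and you should verify the exact paths against the current Fabric REST reference:

```python
# Sketch: trigger an on-demand pipeline run and poll its status via the
# Fabric REST API. IDs are placeholders; verify endpoints against the docs.
import time

import requests
from azure.identity import DefaultAzureCredential

token = DefaultAzureCredential().get_token(
    "https://api.fabric.microsoft.com/.default"
).token
headers = {"Authorization": f"Bearer {token}"}

workspace_id = "<workspace-id>"     # placeholder
pipeline_id = "<pipeline-item-id>"  # placeholder

# Trigger an on-demand run of the pipeline item.
run = requests.post(
    f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}"
    f"/items/{pipeline_id}/jobs/instances?jobType=Pipeline",
    headers=headers,
)
run.raise_for_status()

# The job instance URL comes back in the Location header; poll it.
status_url = run.headers["Location"]
while True:
    status = requests.get(status_url, headers=headers).json()["status"]
    print(status)  # e.g. NotStarted, InProgress, Completed, Failed
    if status in ("Completed", "Failed", "Cancelled"):
        break
    time.sleep(15)
```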
Step 16: You can also set a schedule for the data pipeline. To do this, select the ‘Schedule’ button from the home menu bar.
Step 17: Turn on the ‘Schedule run’ option and set the schedule at your preferred time and time zone. After entering the details, click ‘Apply’ to save the schedule.
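The schedule can, in principle, be created programmatically as well. The endpoint path and payload shape below are our best reading of the Fabric Create Item Schedule API and should be treated as assumptions to check against the documentation; IDs and times are placeholders:

```python
# Sketch: create a daily schedule for the pipeline via the Fabric REST API.
# The path and field names are assumptions to verify; IDs are placeholders.
import requests

headers = {"Authorization": "Bearer <access-token>"}  # token acquired as above
workspace_id = "<workspace-id>"
pipeline_id = "<pipeline-item-id>"

schedule = {
    "enabled": True,
    "configuration": {
        "type": "Daily",
        "times": ["09:00"],                      # preferred run time(s)
        "startDateTime": "2024-06-01T00:00:00",  # placeholder window
        "endDateTime": "2024-12-31T23:59:00",
        "localTimeZoneId": "UTC",
    },
}
resp = requests.post(
    f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}"
    f"/items/{pipeline_id}/jobs/Pipeline/schedules",
    headers=headers,
    json=schedule,
)
resp.raise_for_status()
print("Schedule created:", resp.json())
```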
Conclusion
In this article, we explored data orchestration and its benefits, including data integration, automation, and workflow management. We demonstrated how to implement orchestration in Microsoft Fabric by creating a data pipeline with Dataflow and Stored Procedure activities. We also covered running the pipeline and setting up a refresh schedule.
Data orchestration is a powerful approach for efficiently moving and transforming data, enabling organizations to leverage their data effectively for analysis, reporting, and decision-making. For more step-by-step guides on Data Engineering in Microsoft Fabric, check out the rest of the series here [insert link].