Data Engineering in Microsoft Fabric: Part 1 – Ingestion – Data Pipelines

Microsoft Fabric offers a robust platform for modern data management, enabling streamlined processing and advanced analytics. Data Pipelines, one of its tools, automate extract, transform, and load (ETL) processes, which are essential for ingesting transactional data into analytical data stores. Data Pipelines are similar to Azure Data Factory and Synapse pipelines: you build a pipeline from multiple activities and connect them together to orchestrate a workflow, and pipelines can be run interactively or scheduled to run automatically.

Today we will discuss a data pipeline’s core activities and functions, and then create a sample data pipeline using the Copy Data activity.

Data Pipelines can be used in a variety of scenarios:

  • Big Data Processing: Data pipelines enable efficient processing of large data volumes in distributed environments like Hadoop and Spark, supporting analytics and machine learning tasks. 
  • Cloud Data Migration: Orchestrating data transfer and transformation tasks, pipelines streamline data migration to cloud platforms like AWS, Azure, and Google Cloud. 
  • Real-Time Data Streaming: Pipelines process streaming data from sources like IoT devices and social media in real time, allowing immediate insights and actions. 
  • Data Warehousing: Automating ETL processes, pipelines load data into warehouses for easy querying and analysis by business users. 
  • Machine Learning and AI: Pipelines prepare and preprocess data for training ML models, facilitating scalable model training and deployment. 

Core concepts of Data Pipelines: 

Before you start building pipelines in Microsoft Fabric, it’s important to grasp some basic concepts. In data pipelines, you’ll come across various elements such as activities, parameters, and pipeline runs.

Activities in Microsoft Fabric’s data pipelines serve two primary purposes: data movement/transformation and control flow. Popular activities include Copy Data, Dataflow, Stored Procedure, ForEach, If Condition, and Lookup; a sketch of how these compose appears after the list below.

  • Copy Data Activity facilitates data copying between cloud-based data stores. 
  • Dataflow Activity enables running Dataflow Gen2 in Microsoft Fabric’s Data Factory. 
  • Stored Procedure Activity executes pre-defined procedures in pipelines, streamlining database integration. 
  • ForEach Activity establishes repeating control flows, iterating over collections and executing specified activities. 
  • If Condition Activity branches based on the evaluation of a condition, executing different activities accordingly.
  • Lookup Activity retrieves records or values from external sources for reference by subsequent activities. 
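
To make the orchestration model concrete, below is a minimal sketch of how these activities compose into a pipeline definition. Fabric pipeline definitions use a JSON structure closely modeled on Azure Data Factory’s; the property names here follow the ADF convention and are illustrative rather than an exact Fabric schema, and all activity names are hypothetical.

```python
# Minimal sketch of a pipeline definition, expressed as a Python dict.
# Property names follow Azure Data Factory's JSON convention, which Fabric
# pipelines closely mirror; treat this as illustrative, not an exact schema.
pipeline_definition = {
    "name": "IngestTransactionalData",  # hypothetical pipeline name
    "properties": {
        "activities": [
            {
                # Lookup retrieves the list of tables to ingest.
                "name": "GetTableList",
                "type": "Lookup",
                "typeProperties": {},  # source query/dataset settings go here
            },
            {
                # ForEach iterates over the Lookup output, running a Copy
                # activity per table; it only starts after Lookup succeeds.
                "name": "ForEachTable",
                "type": "ForEach",
                "dependsOn": [
                    {"activity": "GetTableList",
                     "dependencyConditions": ["Succeeded"]}
                ],
                "typeProperties": {
                    "items": {
                        "value": "@activity('GetTableList').output.value",
                        "type": "Expression",
                    },
                    "activities": [
                        {"name": "CopyTable", "type": "Copy",
                         "typeProperties": {}}  # source/sink settings go here
                    ],
                },
            },
        ]
    },
}
```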

Parameters: Parameters in pipelines allow for customization by providing specific values for each pipeline run. This flexibility enhances the reusability of pipelines, facilitating dynamic data ingestion and transformation processes.
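
As a hedged sketch, the fragment below shows how a parameter might be declared in a pipeline definition and referenced with the @pipeline().parameters expression syntax that Fabric shares with Azure Data Factory; the parameter name, table name, and surrounding property names are hypothetical.

```python
# Hypothetical fragment: declaring a parameter and referencing it in a sink.
# The expression syntax (@pipeline().parameters.*) is shared with Azure Data
# Factory; the surrounding property names are illustrative.
pipeline_fragment = {
    "parameters": {
        "tableName": {"type": "String", "defaultValue": "nyc_taxi_green"}
    },
    "activities": [
        {
            "name": "CopyToLakehouse",
            "type": "Copy",
            "typeProperties": {
                "sink": {
                    # The destination table name is resolved per run from
                    # the parameter value supplied at execution time.
                    "tableName": {
                        "value": "@pipeline().parameters.tableName",
                        "type": "Expression",
                    }
                }
            },
        }
    ],
}
```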

Pipeline Runs: Pipeline runs are initiated each time a pipeline is executed. Runs can be started on-demand or scheduled at regular intervals. Utilize the unique run ID to review run details, ensure successful completion, and examine specific execution settings.
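
Runs can also be triggered programmatically. The sketch below uses Fabric’s REST job-scheduler endpoint for running an item job on demand; the token, IDs, and the exact request/response shapes (particularly the executionData body and the status values) are assumptions to verify against the current API documentation.

```python
import time

import requests

# Placeholders: supply a real Microsoft Entra access token and the IDs of
# your workspace and pipeline item.
TOKEN = "<access-token>"
WORKSPACE_ID = "<workspace-id>"
PIPELINE_ID = "<pipeline-item-id>"
headers = {"Authorization": f"Bearer {TOKEN}"}

# Trigger an on-demand pipeline run via the job-scheduler API
# (jobType=Pipeline). The executionData/parameters body is an assumption.
resp = requests.post(
    f"https://api.fabric.microsoft.com/v1/workspaces/{WORKSPACE_ID}"
    f"/items/{PIPELINE_ID}/jobs/instances?jobType=Pipeline",
    headers=headers,
    json={"executionData": {"parameters": {"tableName": "nyc_taxi_green"}}},
)
resp.raise_for_status()

# The Location header points at the new job instance; poll it for status.
status_url = resp.headers["Location"]
while True:
    job = requests.get(status_url, headers=headers).json()
    status = job.get("status")
    print("run status:", status)
    if status in ("Completed", "Failed", "Cancelled"):
        break
    time.sleep(10)
```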

Create a sample pipeline using the Copy Data activity:

To begin using Fabric items, start by activating a Fabric trial in your account. Next, create a Fabric capacity workspace using the trial license. Once done, proceed to create a Data Pipeline from the Data Factory, Data Engineering, or Data Warehouse experience. Access the “Data pipeline” option from the homepage and select the relevant workspace to create your data pipeline.

Once the data pipeline is established, you’ll encounter an interface to configure the ‘Copy Data’ activity. By choosing the ‘Copy Data’ activity, you’ll be guided to select a data source from the available options. You can opt for any of the provided data sources. For this example, let’s proceed with creating the Copy Data activity using the sample dataset titled NYC Taxi-Green.

After selecting the source data, click ‘Next’ to proceed to the ‘Connect to data source’ step. Here, you’ll be able to preview the dataset. 

After reviewing the dataset, click ‘Next’ to proceed to the ‘Choose data destination’ page. Here, you’ll need to select the destination where your data will be stored. For this example, we will select Lakehouse as the data destination. 

Choose Lakehouse and proceed to select the specific Lakehouse where you want to store the data. 

After selecting the Lakehouse, proceed to the next step, where you’ll be presented with an interface to configure how the dataset will be stored. Choose ‘Tables’ for the Root folder and ‘Load to new table’ as the Load settings. Rename the table as desired. Additionally, you have the option to map the columns if you wish to modify the column names or data types; a sketch of what such a mapping looks like follows.
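
For reference, column mapping in the Copy activity surfaces in Azure Data Factory’s JSON as a TabularTranslator, and Fabric’s Copy activity uses a closely related schema; the sketch below is illustrative, with source columns drawn from the NYC Taxi-Green dataset and hypothetical sink names.

```python
# Illustrative column mapping for a Copy activity (ADF-style
# "TabularTranslator"); Fabric uses a closely related schema. The sink
# names and type change are hypothetical.
column_mapping = {
    "type": "TabularTranslator",
    "mappings": [
        # Rename a source column at the destination.
        {"source": {"name": "lpepPickupDatetime"},
         "sink": {"name": "pickup_time"}},
        # Rename and change the data type of another column.
        {"source": {"name": "fareAmount", "type": "Double"},
         "sink": {"name": "fare", "type": "Decimal"}},
    ],
}
```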

Click ‘Next’ to proceed to the ‘Review + save’ page, where you can review the configurations made to copy the data from source to destination. After verifying the names of the source and destination data, select ‘Save + Run’ to save and execute the Copy Data activity.

After selecting ‘Save + Run’, the activity will begin copying the data from the source to the Lakehouse table. You can monitor the status of the run from the output pane.

After completion, you can view all the settings under the ‘Activities’ tab by selecting the ‘Copy Data’ activity. 

To conclude, Microsoft Fabric’s Data Pipelines simplify the orchestration of ETL processes and make it straightforward to extract data from a variety of source systems. Core activities such as Copy Data streamline extraction tasks, while the intuitive interface enables efficient configuration of data transfer and storage. This empowers organizations to harness data effectively for actionable insights and informed decision-making.

Stay tuned for a series of upcoming blogs covering various experiences within Fabric.
