Building the Data Pipeline


Course Details

This course can only be taken as part of the Certificate in Big Data Technologies.


About this Course


This course focuses on the methods used to acquire, store and process data for downstream analysis. You'll analyze and compare available technologies to make informed decisions as data engineers. You'll also explore the modern cloud data platform, building systems that carry data from ingestion through storage and processing to, ultimately, serving.

What You’ll Learn

  • How a data lake, using a storage layer such as Delta Lake (part of the Spark ecosystem), can enhance the usability of your organization’s data
  • Batch and streaming processing using Spark, Flink and other processing tools 
  • How to use Kafka to enable low-latency and real-time processing 
  • The unified log model and the ways the log abstraction recurs in building robust, fault-tolerant distributed data systems
  • Data acquisition, data governance and modeling techniques 
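The unified log model mentioned above can be illustrated with a minimal sketch: an append-only sequence of records that each consumer reads independently by tracking its own offset. This is plain Python with illustrative names, not the API of Kafka or any real system; production logs add partitioning, replication and durable storage on top of this idea.

```python
# Minimal sketch of the unified log model (illustrative names only).
# The log is append-only; consumers track their own read offsets, so a
# slow or restarted consumer can resume exactly where it left off.

class UnifiedLog:
    def __init__(self):
        self._records = []  # append-only storage

    def append(self, record):
        """Append a record; return its offset (position in the log)."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset):
        """Return every record at or after the given offset."""
        return self._records[offset:]


class Consumer:
    """Reads from the log at its own pace via a private offset."""

    def __init__(self, log):
        self._log = log
        self._offset = 0

    def poll(self):
        records = self._log.read(self._offset)
        self._offset += len(records)
        return records


log = UnifiedLog()
log.append({"event": "page_view", "user": 1})
log.append({"event": "click", "user": 2})

consumer = Consumer(log)
print(consumer.poll())  # sees both existing records
log.append({"event": "purchase", "user": 1})
print(consumer.poll())  # sees only the record appended since last poll
```

Because offsets live with the consumer rather than the log, any number of independent readers (analytics jobs, stream processors, backups) can replay the same history without coordinating with each other, which is the fault-tolerance property the course builds on.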

Get Hands-On Experience

  • Organize and store data in a data lake and handle updates and changes to your data
  • Use Spark to connect to different data sources and process batch and streaming data 
  • Design, build and integrate a complete end-to-end data pipeline to support a realistic business case 

Program Overview

This course is part of the Certificate in Big Data Technologies.
