Building the Data Pipeline


Course Details

This course can only be taken as part of the Certificate in Big Data Technologies.


About this Course


This course focuses on the methods used to acquire, store and process data for downstream analysis. You'll analyze and compare available technologies to make informed decisions as data engineers. You'll also explore the modern cloud data platform, building systems that carry data from ingestion through storage and processing to, ultimately, serving.

What You’ll Learn

  • How a data lake, using a storage layer such as Delta Lake (part of the Spark ecosystem), can enhance the usability of your organization’s data
  • Batch and streaming processing using Spark, Flink and other processing tools 
  • How to use Kafka to enable low-latency and real-time processing 
  • The unified log model and the ways the log abstraction recurs in building robust, fault-tolerant distributed data systems
  • Data acquisition, data governance and modeling techniques 
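The unified log model mentioned above can be illustrated with a minimal sketch: an append-only sequence of records that each consumer reads independently by tracking its own offset. This is plain Python with illustrative names, not the API of Kafka or any real system; production logs add partitioning, replication and durable storage on top of this idea.

```python
# Minimal sketch of the unified log model (illustrative names only).
# The log is append-only; consumers track their own read offsets, so a
# slow or restarted consumer can resume exactly where it left off.

class UnifiedLog:
    def __init__(self):
        self._records = []  # append-only storage

    def append(self, record):
        """Append a record; return its offset (position in the log)."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset):
        """Return every record at or after the given offset."""
        return self._records[offset:]


class Consumer:
    """Reads from the log at its own pace via a private offset."""

    def __init__(self, log):
        self._log = log
        self._offset = 0

    def poll(self):
        records = self._log.read(self._offset)
        self._offset += len(records)
        return records


log = UnifiedLog()
log.append({"event": "page_view", "user": 1})
log.append({"event": "click", "user": 2})

consumer = Consumer(log)
print(consumer.poll())  # sees both existing records
log.append({"event": "purchase", "user": 1})
print(consumer.poll())  # sees only the record appended since last poll
```

Because offsets live with the consumer rather than the log, any number of independent readers (analytics jobs, stream processors, backups) can replay the same history without coordinating with each other, which is the fault-tolerance property the course builds on.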

Get Hands-On Experience

  • Organize and store data in a data lake and handle updates and changes to your data
  • Use Spark to connect to different data sources and process batch and streaming data 
  • Design, build and integrate a complete end-to-end data pipeline to support a realistic business case 

Program Overview

This course is part of the Certificate in Big Data Technologies.
