Data Engineering Capstone


Course Details

This course can only be taken as part of the Certificate in Big Data Technologies.

About this Course

In this capstone course, you'll apply current data engineering approaches by completing a hands-on data engineering project. You'll analyze and explore solutions to complex problems commonly found in real-life applications of Apache Spark's data processing ecosystem: problems that require comprehensive, specialized knowledge and for which basic techniques would be suboptimal.


  • How to design and implement a data lake for a multichannel retail organization in Azure Data Lake and Azure Databricks using a multi-hop, medallion architecture
  • Ways to efficiently ingest, transform and land big data workloads using Apache Spark
  • How to build a feature data set for a machine learning model
  • How to diagnose and fix common performance pitfalls in Spark jobs
  • How to design, orchestrate and curate data sets based on business requirements


  • Explore and transform semi-structured data sets at real scale in Azure Databricks using Apache Spark
  • Write Airflow DAGs to orchestrate common data pipeline operations
  • Use open-source Delta Lake to manage your data storage and perform common DDL operations

Program Overview

This course is part of the Certificate in Big Data Technologies.
