Introduction to Data Engineering


Course Details

This course can only be taken as part of the Certificate in Big Data Technologies.

About this Course

In this course, you'll get an introduction to the fundamental building blocks of big data engineering. You'll learn the foundational concepts of distributed computing, distributed data processing, data management and data pipelines. You'll also survey a variety of available data stack technologies and learn how to run a data processing workflow through a commonly used platform.

What You'll Learn

  • The fundamentals of modern big data stacks, their uses, advantages and limitations
  • How functional programming ideas help with building and using systems to store and process big data 
  • The foundations of the Hadoop ecosystem and its emerging successors 
  • The ins and outs of big data processing via multiple paradigms, both storage-bound and in-memory (Spark, Spark SQL, Delta Lake, Hive, SQL) 
  • The origins, uses and limitations of NoSQL stores (HBase, Redis, Elasticsearch, Cassandra, graph-processing systems, etc.)
  • How to apply contemporary distributed computing frameworks to the storage, processing and analysis of large data sets
  • How to use the MapReduce model and the Spark framework on big data problems
  • How to apply principles of functional programming to data storage and analysis
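
For a small taste of how the MapReduce model and functional programming fit together, here is a minimal sketch in plain Python. It is an illustration only, not course material: the function names (map_phase, reduce_phase, word_count) are hypothetical, and everything runs on a single machine using the standard library rather than on Hadoop or Spark.

```python
from collections import Counter
from functools import reduce


def map_phase(line):
    """Map step: turn one line of text into partial word counts."""
    return Counter(line.lower().split())


def reduce_phase(left, right):
    """Reduce step: merge two partial counts into one combined count."""
    return left + right


def word_count(lines):
    """Count words by mapping each line independently, then folding the results."""
    partial_counts = map(map_phase, lines)
    # Counter() is the empty starting value, so an empty input yields an empty count.
    return reduce(reduce_phase, partial_counts, Counter())


if __name__ == "__main__":
    sample = ["big data", "Big pipelines"]
    print(word_count(sample))
```

Because each line is mapped without reference to any other line, and merging two counts gives the same result in any order, a distributed framework is free to spread both steps across many machines. That independence is the functional programming idea behind the model.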

Program Overview

This course is part of the Certificate in Big Data Technologies.
