Introduction to Data Science


Course Details

This course can only be taken as part of the Certificate in Data Science.

About this Course

This course introduces the data management, storage and manipulation tools common in data science and has you apply them to real-world scenarios. You’ll get an overview of relational database management systems and various NoSQL database technologies, and learn how to choose the appropriate tool for a given task.

Topics include:

  • Introduction to data (data types, data movement, terminology, etc.)
  • Storage and concurrency preliminaries
  • Data structures: entity-attribute-value (EAV), graph, tabular
  • Relational database management systems (RDBMS)
  • Hadoop Distributed File System
  • NoSQL: MapReduce vs. parallel RDBMS
  • Exploratory data analysis and clustering