MAC 3075 Computation in Data Science

This course introduces students to the core computational technologies involved in the analysis of large data sets, known popularly as "Data Science" or "Big Data." We focus on the data-handling and computational aspects of this work, sometimes referred to as "Data Engineering," and touch on statistics from the standpoint of ensuring data cleanliness and testing the computations. We begin with case studies in Big Data analysis involving social networks, sensor networks, and application logs. We then introduce various tools used in the data pipeline, including MapReduce, NoSQL databases, and custom code for data manipulation. Finally, we introduce the use of R for obtaining descriptive statistics that can indicate a need for data cleaning or reveal errors introduced in the data pipeline.

LA

Credits

3

Prerequisite

Completion of MAC 2010 Computer Programming II and MAC 2015 Database Design & Construction