MAC 3075 Computation in Data Science
This course introduces students to the core computational technologies involved in the analysis of large data sets, known popularly as "Data Science" or "Big Data." We focus on the data-handling and computational aspects of this work, sometimes referred to as "Data Engineering," and touch on statistics from the standpoint of ensuring data cleanliness and testing the computations. We begin with case studies in Big Data analysis involving social networks, sensor networks, and application logs. We then introduce various tools used in the data pipeline, including MapReduce, NoSQL databases, and custom code for data manipulation. Finally, we introduce the use of R for obtaining descriptive statistics that can reveal the need for data cleaning or expose errors introduced in the data pipeline.
LA
Prerequisite: Completion of MAC 2010 Computer Programming II and MAC 2015 Database Design & Construction