COURSE DESCRIPTION AND APPLICATION INFORMATION

Course Name: Data Science and Analytics
Code: CE 515
Semester: Spring
T+A+L (hour/week): 3+0+0
Type (C / O): Elective
Local Credit: 3
ECTS: 7.5
Academic Unit: Computer Engineering
Mode of Delivery: Face to face
Prerequisites: none
Language of Instruction: English
Level of Course Unit: Graduate
Course Coordinator: Taner ARSAN
Course Lecturer(s): Taner ARSAN
Course Objectives: The Data Science and Analytics course educates students to a foundation level on big data and the state of the practice of analytics. The course provides an introduction to big data and to a Data Analytics Lifecycle for addressing business challenges that leverage big data. It provides grounding in basic and advanced analytic methods and an introduction to big data analytics technology and tools, including MapReduce and Hadoop. Upon completing the course, students will have the knowledge and practical experience to participate effectively and immediately in big data and other analytics projects.
Course Contents:
  1. Introduction to Big Data and Analytics: This module focuses on the definition and an overview of big data, the state of the practice in analytics, the role of the data scientist, and data analytics in industry verticals.
  2. Data Analytics Lifecycle: This module focuses on explaining the various phases of a typical analytics lifecycle: discovery, data preparation, model planning, model building, communicating results and findings, and operationalizing. It also details the critical activities that occur in each phase of the lifecycle.
  3. Review of Basic Data Analytic Methods Using R / Python: This module introduces R or Python programming, initial exploration and analysis of data using R, and basic visualization using R, and includes examples to familiarize students with the concepts taught.
  4. Advanced Analytics – Theory and Methods: The core analytical methods covered are unsupervised categorization (K-means clustering, association rules); regression (linear and logistic); supervised classification (naïve Bayesian classifier, decision trees); time series analysis; and text analysis.
  5. Advanced Analytics – Technology and Tools: This module focuses on analytic tools for unstructured data, including MapReduce and the Hadoop ecosystem. It introduces the idea behind MapReduce processing and then describes how Hadoop implements it. The roles of the Hadoop Distributed File System (HDFS) and Yet Another Resource Negotiator (YARN) are also covered. The module further covers data management, namely the processing and development of frameworks for unstructured data in the terabyte range, and presents extensions to Hadoop that leverage its capabilities: Hive and Pig (Hadoop query languages), HBase (a BigTable workalike built on Hadoop), and Mahout (machine learning algorithms and Hadoop MapReduce).
  6. Operationalizing an Analytics Project and Data Visualization Techniques: This module focuses on operationalizing a data analytics lifecycle project, and on identifying the core deliverables and creating them for key stakeholders and others. It also details how to emphasize key points using visualization methods, and covers a survey of data visualization tools, creating different visualizations for sponsors and analysts, developing visuals to support key points, and cleaning up a chart or visualization.
Learning Outcomes of the Course Unit (LO):
  • 1- Upon completing the course, students will have the knowledge and practical experience to immediately participate effectively in big data and other analytics projects.
Planned Learning Activities and Teaching Methods: Formal


WEEKLY SUBJECTS AND RELATED PREPARATIONS

Week | Subjects | Related Preparation
1 | Introduction and Course Agenda; Module 1: Introduction to Big Data and Analytics: Big Data Overview, State of the Practice in Analytics |
2 | The Data Scientist; Big Data Analytics in Industry Verticals | This module focuses on explaining the various phases of a typical analytics lifecycle: discovery, data preparation, model planning, model building, communicating results and findings, and operationalizing. It also details the critical activities that occur in each phase of the lifecycle.
3 | Module 2: Data Analytics Lifecycle: Discovery, Data Prep, Model Planning, Model Building, Communicate Results, Operationalize |
4 | Module 3: Review of Basic Data Analytic Methods Using R / Python: Python fundamentals or using the R Graphical User Interface; Overview: Getting Data into (and out of) R, Data Types Used in R, Basic R Operations, Basic Statistics, Generic Functions |
5 | Module 4: Advanced Analytics – Theory and Methods: Overview (K-means clustering, association rules, linear regression, logistic regression, naïve Bayesian classifiers, decision trees, time series analysis, text analytics); K-means Clustering (Lesson 1); Association Rules (Lesson 2) |
6 | Linear Regression (Lesson 3); Logistic Regression (Lesson 4) |
7 | Naïve Bayesian Classifiers (Lesson 5); Decision Trees (Lesson 6) |
8 | Time Series Analysis (Lesson 7); Text Analytics (Lesson 8) |
9 | Midterm Exam |
10 | Module 5: Advanced Analytics – Technology and Tools; Lesson 1: Analytics for Unstructured Data: MapReduce and Hadoop; Lesson 2: The Hadoop Ecosystem |
11 | Lesson 3: In-database Analytics: SQL Essentials; Lesson 4: Advanced SQL and MADlib |
12 | Module 6: The Endgame, or Putting It All Together; Lesson 1: Operationalizing an Analytics Project; Lesson 2: Creating the Final Deliverables |
13 | Module 6: The Endgame, or Putting It All Together; Lesson 3: Data Visualization: survey of data visualization tools, creating different visualizations for sponsors and analysts, developing visuals to support your key points, how to clean up a chart or visualization |
14 | Preparing Project Paper and Presentations |


REQUIRED AND RECOMMENDED READING

The instructor will provide the course materials weekly. These materials are copyrighted and cannot be shared. All documents and presentations are provided by Dell/EMC.


OTHER COURSE RESOURCES

Introduction to Machine Learning with Python: A Guide for Data Scientists, Andreas C. Müller and Sarah Guido (for beginners).


ASSESSMENT METHODS AND CRITERIA

Semester Requirements | Number | Percentage of Grade (%)
Project | 1 | 30
Presentation / Jury | 1 | 40
Midterms | 1 | 30
Total | 3 | 100


WORKLOAD

Events | Count | Duration (Hours) | Total Workload (Hours)
Course Hours | 14 | 3 | 42
Practice / Exercise | 2 | 3 | 6
Project | 1 | 49.5 | 49.5
Preparation for Presentation / Jury | 1 | 30 | 30
Presentation | 1 | 30 | 30
Midterms | 1 | 30 | 30
Total Workload (Hours): 187.5


THE RELATIONSHIP BETWEEN COURSE LEARNING OUTCOMES (LO) AND PROGRAM QUALIFICATIONS (PQ)

# PQ1 PQ2 PQ3 PQ4 PQ5 PQ6 PQ7 PQ8 PQ9 PQ10
LO1