| Academic Unit: |
Computer Engineering |
| Mode of Delivery: |
Face to face |
| Prerequisites: |
none |
| Language of Instruction: |
English |
| Level of Course Unit: |
Graduate |
| Course Coordinator: |
Taner ARSAN |
| Course Lecturer(s): |
Taner ARSAN |
| Course Objectives: |
The Data Science and Analytics course educates students to a foundation level in big data and the state of the practice of analytics. The course provides an introduction to big data and a Data Analytics Lifecycle to address business challenges that leverage big data. It provides grounding in basic and advanced analytic methods and an introduction to big data analytics technology and tools, including MapReduce and Hadoop. Upon completing the course, students will have the knowledge and practical experience to immediately participate effectively in big data and other analytics projects. |
| Course Contents: |
1. Introduction to Big Data and Analytics:
This module focuses on the definition and an overview of big data, the state of the practice of analytics, the Data Scientist role, and data analytics in industry verticals.
2. Data Analytics Lifecycle:
This module focuses on explaining the phases of a typical analytics lifecycle: discovery, data preparation, model planning, model building, communicating results and findings, and operationalizing. It also details the critical activities that occur in each phase of the lifecycle.
3. Review of Basic Data Analytic Methods Using R / Python:
This module focuses on an introduction to R or Python programming, initial exploration and analysis of data using R, and basic visualization using R. It includes examples to familiarize students with the concepts taught.
4. Advanced Analytics – Theory and Methods:
This module covers the core analytical methods: Categorization (unsupervised): 1. K-means Clustering, 2. Association Rules; Regression: 3. Linear, 4. Logistic; Classification (supervised): 5. Naïve Bayesian Classifier, 6. Decision Trees; 7. Time Series Analysis; 8. Text Analysis.
5. Advanced Analytics – Technology and Tools:
This module focuses on analytic tools for unstructured data, including MapReduce and the Hadoop ecosystem. It introduces the idea behind MapReduce processing and then describes how Hadoop implements it. The roles of the Hadoop Distributed File System (HDFS) and Yet Another Resource Negotiator (YARN) are also covered. The module covers data management: the processing and development of frameworks to work on unstructured data in the terabyte range, and extensions to Hadoop that leverage its capabilities. It also covers Hive and Pig (Hadoop query languages), HBase (a BigTable workalike using Hadoop), and Mahout (machine learning algorithms with Hadoop MapReduce).
6. Operationalizing an Analytics Project and Data Visualization Techniques:
This module focuses on operationalizing a data analytics lifecycle project, identifying the core deliverables, and creating them for key stakeholders and others. It also details how to emphasize key points using visualization methods, and covers a survey of data visualization tools, creating different visualizations for sponsors and analysts, developing visuals to support key points, and cleaning up a chart or visualization. |
| Learning Outcomes of the Course Unit (LO): |
1. Upon completing the course, students will have the knowledge and practical experience to immediately participate effectively in big data and other analytics projects.
|
| Planned Learning Activities and Teaching Methods: |
Formal |
| Week | Subjects | Related Preparation |
| 1 |
Introduction and Course Agenda Module 1: Introduction to Big Data and Analytics |
Big Data Overview; State of the Practice in Analytics |
| 2 |
The Data Scientist; Big Data Analytics in Industry Verticals |
This module focuses on explaining the phases of a typical analytics lifecycle: discovery, data preparation, model planning, model building, communicating results and findings, and operationalizing. It also details the critical activities that occur in each phase of the lifecycle. |
| 3 |
Module 2: Data Analytics Lifecycle |
Discovery, Data Prep, Model Planning, Model Building, Communicate Results, Operationalize |
| 4 |
Module 3: Review of Basic Data Analytic Methods Using R / Python. |
Python fundamentals or using the R Graphical User Interface; Overview: Getting Data into (and out of) R, Data Types Used in R, Basic R Operations, Basic Statistics, Generic Functions |
| 5 |
Module 4: Advanced Analytics – Theory and Methods; Overview; K-means Clustering (Lesson 1); Association Rules (Lesson 2) |
• K-means clustering • Association Rules • Linear Regression • Logistic Regression • Naïve Bayesian Classifiers • Decision Trees • Time Series Analysis • Text Analytics |
| 6 |
Linear Regression (Lesson 3); Logistic Regression (Lesson 4) |
|
| 7 |
Naïve Bayesian Classifiers (Lesson 5); Decision Trees (Lesson 6) |
|
| 8 |
Time Series Analysis (Lesson 7); Text Analytics (Lesson 8) |
|
| 9 |
Midterm Exam |
|
| 10 |
Module 5: Advanced Analytics - Technology and Tools; Lesson 1: Analytics for Unstructured Data - MapReduce and Hadoop; Lesson 2: The Hadoop Ecosystem |
|
| 11 |
Lesson 3: In-database Analytics, SQL Essentials; Lesson 4: Advanced SQL and MADlib |
|
| 12 |
Module 6 – The Endgame, or Putting it All Together; Lesson 1: Operationalizing an Analytics Project; Lesson 2: Creating the Final Deliverables |
|
| 13 |
Module 6 – The Endgame, or Putting it All Together; Lesson 3: Data Visualization |
Survey of data visualization tools • Creating different visualizations for sponsors and analysts • Developing visuals to support your key points • How to clean up a chart or visualization |
| 14 |
Preparing Project Paper and Presentations |
|
At Kadir Has University, a semester is 14 weeks; weeks 15 and 16 are reserved for final exams.
THE RELATIONSHIP BETWEEN COURSE LEARNING OUTCOMES (LO) AND PROGRAM QUALIFICATIONS (PQ)
| # |
PQ1 |
PQ2 |
PQ3 |
PQ4 |
PQ5 |
PQ6 |
PQ7 |
PQ8 |
PQ9 |
PQ10 |
| LO1 |
|
|
|
|
|
|
|
|
|
|
Contribution: 1 Low, 2 Average, 3 High