K6312 INFORMATION MINING & ANALYSIS Spring 2020

Course website for K6312



NTU, WKW / Spring 2020

Course Description

With the growth of the Internet, a massive amount of data is now available, and it has become an important resource for mining useful knowledge. Businesses and governments increasingly need to interpret and act upon such large volumes of data.

This course is an introduction to data (or information) mining and analysis, and covers how to analyse structured data. Students will learn various machine learning (or statistical learning) techniques and tools through both lectures and hands-on lab exercises. Students will learn the following topics in the course:

Course Objectives:

At the end of this course, students should be able to:

Prerequisites:

Basic computer programming skills are required before taking this course. Some basic knowledge of mathematics and statistics would also be helpful.

Method of Assessment:

  • Coursework (individual and group assignments): 40% (40 marks)
  • Class participation (class interactions and attendance): 10% (10 marks)
  • Final examination: 50% (50 marks); 3-hour closed-book exam
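To illustrate how the three components combine, here is a minimal Python sketch. It assumes the overall grade is simply the sum of the component marks (the maximums already add up to 100), and the student marks in it are hypothetical, not real data. The 10/30 split of coursework follows the Assessment section.

```python
# Maximum marks per component, from the Method of Assessment.
# Coursework = individual homework/labs (10) + group project (30).
MAX_MARKS = {
    "coursework": 40,
    "participation": 10,
    "final_exam": 50,
}


def overall_score(marks: dict[str, float]) -> float:
    """Sum the component marks after checking each is within its maximum."""
    total = 0.0
    for component, maximum in MAX_MARKS.items():
        score = marks[component]
        if not 0 <= score <= maximum:
            raise ValueError(f"{component} must be between 0 and {maximum}, got {score}")
        total += score
    return total


# Hypothetical student: 32/40 coursework, 9/10 participation, 41/50 exam.
print(overall_score({"coursework": 32, "participation": 9, "final_exam": 41}))  # 82.0
```

Because every component is already expressed in marks out of its own weight, no further scaling is needed; the check only guards against a mark exceeding its maximum.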

Reference Books

The following books are helpful but not required. They are easy to find on the Internet.

If you are not proficient in Python, you may find some tutorials helpful.

Announcement

  • 2020-03-19: Zoom URL: https://zoom.us/j/150749230?pwd=clVuUVgrQUp4dVg4QW5OVzRLTXFEQT09
  • 2020-02-27: Zoom URL: https://zoom.us/j/142537741?pwd=dGIxeEU2Mm94US9oUzllWVdhakJWdz09
  • 2020-02-20: Zoom URL: https://zoom.us/j/169297572?pwd=OCtEM01lTVptY1RQOE5KUWk1eUhOdz09
  • 2020-02-20: E-learning via Zoom starts this week. The link will be announced before class.
  • 2020-02-06: Submit group information.
  • 2020-01-16: Welcome to K6312.
  • 2020-01-08: This site is now public.

Assessment

Individual Homework and Lab Assignments (10 marks)

Due date: to be announced

These are individual homework and lab assignments. For the lab assignments, you will be asked to submit simple lab reports during or after certain lab sessions. Note that homework and lab assignment reports handed in after the due date/time will not be marked. These assignments will account for 10% of the overall grade.

Group Project (30 marks)

Project title selection due: see the class schedule
Project due date: see the class schedule

This is a group project (the size of each group will be announced later). All team members will receive the same grade. This project will account for 30% of the overall grade. Please submit your project title via NTUlearn->assignment->K6312 Project Proposal by the project title selection due date. Include the names of team members in the message.

In this data mining project, you will collect your own sample dataset or use an existing one, and then use data mining techniques and tools to build an interesting model that mines and analyzes knowledge/information from the dataset. The project scope is entirely up to you, but I suggest building a useful and interesting model (or application). You will then write a project report explaining your methodology and presenting the results.

You may conduct investigative analysis of your dataset using one or more of the following data mining approaches:

  1. Report Submission:
    • Must include a bibliography listing all cited references (including URLs, if any).
    • Length: 8-10 pages.
    • Formatting: 10-point Times font is mandatory. Follow the ICML style. The Word template can be found here. The LaTeX template is provided on Overleaf.
    • Only a soft copy is required; submit it through Turnitin.
    • Create a GitHub project page containing the dataset and code used in the project, and include the GitHub repository link in your report.
  2. Group Presentation:
    • Each team will give a 15/20-minute presentation on its project work, followed by a 5-minute question-and-answer session for clarification by students and the lecturer. The presentation schedule will be announced later.
    • Every team member is required to present.

Note that reports and required files/documents handed in after the due date will be marked down by 10% per day.
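The late penalty can be read in more than one way. The sketch below assumes the deduction is 10% of the mark the work would otherwise receive for each whole day late, applied linearly (not compounded) and never going below zero; if the penalty is instead meant as 10% of the maximum marks, the formula would need to change. The function name and the example marks are hypothetical.

```python
def late_adjusted_mark(raw_mark: float, days_late: int, penalty_per_day: float = 0.10) -> float:
    """Apply a linear late penalty (hypothetical helper).

    Assumption: each whole day late removes 10% of the raw mark,
    deductions are not compounded, and the result never drops below zero.
    """
    if days_late < 0:
        raise ValueError("days_late cannot be negative")
    factor = max(0.0, 1.0 - penalty_per_day * days_late)
    return raw_mark * factor


# Hypothetical: a project report worth 25/30 submitted 2 days late
# keeps 80% of its mark under this assumption.
print(late_adjusted_mark(25, 2))
```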

IDEAS INSPIRATION

You might look at recent deep learning publications from top-tier machine learning conferences and labs, as well as the other resources below.

How to get Data?

Final Exam (50%)

A review session will be held before the final exam.

Schedule

Class Venue: Tan Tong Meng (TTM) PC Lab CS02-35a WKWSCI Bldg

| Date | Topic | Lecture | Assignment Due |
| --- | --- | --- | --- |
| Thursday 01/16 | Introduction to Information Mining & Analysis | Slides, Notebook | N.A. |
| Thursday 01/23 | Basics in Python Programming & Data Preprocessing | Slides, Notebook | N.A. |
| Thursday 01/30 | Linear Regression | Slides, Notebook, Data1, Data2 | N.A. |
| Thursday 02/06 | Logistic Regression | Slides & Notebook | N.A. |
| Thursday 02/13 | Decision Tree | Slides | In-class Assignment |
| Thursday 02/20 | Ensemble Learning | Slides & Notebook | Project Title Submission |
| Thursday 02/27 | Neural Networks | Slides & Notebook | Additional In-class Assignment |
| Thursday 03/12 | Introduction to Deep Learning | Slides & Notebook | N.A. |
| Thursday 03/19 | Support Vector Machine | Slides & Notebook | N.A. |
| Thursday 03/26 | Interpretability of Machine Learning | Slides | N.A. |
| Thursday 04/02 | No Class | TBU | N.A. |
| Thursday 04/09 | Unsupervised Learning (Clustering) & Key Points Review | Slides & Notebook | In-class Assignment |
| Thursday 04/16 | Group Presentation | TBU | N.A. |
| Thursday 04/23 | Group Presentation | TBU | N.A. |
| Thursday 04/30 | Final Online Test (7:00pm - 8:30pm) | TBU | Accounts for 30% of the total score |
| Thursday 05/07 (p.m.) | N.A. | N.A. | Project Paper Submission |

Group Presentations in Weeks 12 & 13

| Group No. | Title | Group Members | Time |
| --- | --- | --- | --- |
| 9 | Hotel Booking Demand Analysis | Zhao Ziyu, Li Qianyuan, Zhang Yimeng, Wang Zhenjiang | 04/16 6:30pm - 6:50pm |
| 8 | Taxi Trip Time Prediction | Pan Pan, Sun Mingyi, Li Minglu, Chen Zijia | 04/16 6:55pm - 7:15pm |
| 6 | Between PM2.5 And Wind Speed Based on Linear Regression Model | Liu Yi, Yan Zhiruo, Wu Jiani, Xiang Mengjing | 04/16 7:20pm - 7:40pm |
| 4 | What makes a surefire movie? | Yang Haiting, Tian Shuo, Chen Zhuojun, He Rui | 04/16 7:45pm - 8:05pm |
| 3 | What Leads to Happiness in Asia | Xiao Jieying, Long Haifei, Lin Jing, Li Yingtong | 04/16 8:10pm - 8:30pm |
| 5 | Information Mining & Analysis on Cervical Cancer | Huang Baiqiuzi, Yao Mengjie, Guo Jindou, Liu Mengyao | 04/23 6:30pm - 6:50pm |
| 2 | Prediction of Heart Disease | Zhang Liyuan, Ma Xiaoru, Luo Kexin, Luo Lan, Koh Swee Guan | 04/23 6:55pm - 7:15pm |
| 7 | Predicting product sales with machine learning and deep learning algorithms | Priyadharshini, Swedha, Ng Yongrong Shaun | 04/23 7:20pm - 7:40pm |
| 1 | Comparison of Several Regression Models in Predicting 2019-nCov Epidemic Trends | Liu Yuqi, Wang Xiao, Zhou Qi, Song Yajun, Zhou yifei | 04/23 7:45pm - 8:05pm |
| 10 | Using multiple variables as regressors to predict box-office revenue | Chen Yidian, Cai Jingyi, Zhong Churong, Yu Zeyuan | 04/23 8:10pm - 8:30pm |