Course website for K6312
With the popularity of the Internet, a massive amount of data is now available, and it has become an important resource for mining useful knowledge. Businesses and governments increasingly need to interpret and act upon large volumes of data.
This course is an introduction to data (or information) mining and analysis and covers how to analyse structured data. Students will learn various machine learning (or statistical learning) techniques and tools through both lectures and hands-on lab exercises. Students will learn the following topics in the course:
At the end of this course, students should be able to:
Basic computer programming skills are required before you take this course. Some basic knowledge of mathematics and statistics would also be helpful.
- Coursework (individual and group assignments): 40% (40 marks)
- Class participation (class interactions and attendance): 10% (10 marks)
- Final examination: 50% (50 marks); 3-hour closed-book exam
The following books are helpful but not required. You can easily find these books on the Internet.
If you are not proficient in Python, you may find some tutorials helpful.
- 2020-03-19: Zoom URL: https://zoom.us/j/150749230?pwd=clVuUVgrQUp4dVg4QW5OVzRLTXFEQT09
- 2020-02-27: Zoom URL: https://zoom.us/j/142537741?pwd=dGIxeEU2Mm94US9oUzllWVdhakJWdz09
- 2020-02-20: Zoom URL: https://zoom.us/j/169297572?pwd=OCtEM01lTVptY1RQOE5KUWk1eUhOdz09
- 2020-02-20: We will start E-learning from this week via Zoom. The link will be announced before the class.
- 2020-02-06: Submit group information.
- 2020-01-16: Welcome to K6312.
- 2020-01-08: This site has been made public.
Due date: to be announced
These are individual homework and lab assignments. For the lab assignments, you will be asked to submit simple lab reports during or after certain lab sessions. Note that homework and lab assignment reports handed in after the due date/time will not be marked. These assignments will account for 10% of the overall grade.
Project title selection due: see the class schedule
Project due date: see the class schedule
This is a group project (the size of each group will be announced later). All team members will receive the same grade. This project will account for 30% of the overall grade. Please submit your project title via NTUlearn->assignment->K6312 Project Proposal by the project title selection due date. Include the names of team members in the message.
In this data mining project, you collect your own sample dataset or use an existing one and, using data mining techniques and tools, build a model that mines and analyzes knowledge or information from the dataset. The project scope is entirely up to you, but I suggest that you build a useful and interesting model (or application). Then write a project report explaining your methodology and presenting the results.
You may conduct investigative analysis of your dataset using one or more of the following data mining approaches:
Note that reports and required files/documents handed in after the due date will be marked down by 10% per day.
You might look at recent deep learning publications from top-tier machine learning conferences and labs, as well as other resources below.
We will prepare a review session before the final exam.
Class Venue: Tan Tong Meng (TTM) PC Lab CS02-35a WKWSCI Bldg
Date | Topic | Lecture | Assignment Due |
---|---|---|---|
Thursday 01/16 | Introduction to Information Mining & Analysis | Slides, Notebook | N.A. |
Thursday 01/23 | Basics in Python Programming & Data Preprocessing | Slides, Notebook | N.A. |
Thursday 01/30 | Linear Regression | Slides, Notebook, Data1 Data2 | N.A. |
Thursday 02/06 | Logistic Regression | Slides & Notebook | N.A. |
Thursday 02/13 | Decision Tree | Slides | In-class Assignment |
Thursday 02/20 | Ensemble Learning | Slides & Notebook | Project Title Submission |
Thursday 02/27 | Neural Networks | Slides & Notebook | Additional In-class Assignment |
Thursday 03/12 | Introduction to Deep Learning | Slides & Notebook | N.A. |
Thursday 03/19 | Support Vector Machine | Slides & Notebook | N.A. |
Thursday 03/26 | Interpretability of Machine Learning | Slides | N.A. |
Thursday 04/02 | No Class | TBU | N.A. |
Thursday 04/09 | Unsupervised Learning (Clustering) & Key Points Review | Slides & Notebook | In-class Assignment |
Thursday 04/16 | Group Presentation | TBU | N.A. |
Thursday 04/23 | Group Presentation | TBU | N.A. |
Thursday 04/30 | Final Online Test (7:00pm - 8:30pm) | TBU | Accounts for 30% of total score |
Thursday 05/07 (p.m.) | N.A. | N.A. | Project Paper Submission |
Group No | Title | Group Members | Time |
---|---|---|---|
9 | Hotel Booking Demand Analysis | Zhao Ziyu, Li Qianyuan, Zhang Yimeng, Wang Zhenjiang | 04/16 6:30pm - 6:50pm |
8 | Taxi Trip Time Prediction | Pan Pan, Sun Mingyi, Li Minglu, Chen Zijia | 04/16 6:55pm - 7:15pm |
6 | Between PM2.5 And Wind Speed Based on Linear Regression Model | Liu Yi, Yan Zhiruo, Wu Jiani, Xiang Mengjing | 04/16 7:20pm - 7:40pm |
4 | What makes a surefire movie? | Yang Haiting, Tian Shuo, Chen Zhuojun, He Rui | 04/16 7:45pm - 8:05pm |
3 | What Leads to Happiness in Asia | Xiao Jieying, Long Haifei, Lin Jing, Li Yingtong | 04/16 8:10pm - 8:30pm |
5 | Information Mining & Analysis on Cervical Cancer | Huang Baiqiuzi, Yao Mengjie, Guo Jindou, Liu Mengyao | 04/23 6:30pm - 6:50pm |
2 | Prediction of Heart Disease | Zhang Liyuan, Ma Xiaoru, Luo Kexin, Luo Lan, Koh Swee Guan | 04/23 6:55pm - 7:15pm |
7 | Predicting product sales with machine learning and deep learning algorithms | Priyadharshini, Swedha, Ng Yongrong Shaun | 04/23 7:20pm - 7:40pm |
1 | Comparison of Several Regression Models in Predicting 2019-nCov Epidemic Trends | Liu Yuqi, Wang Xiao, Zhou Qi, Song Yajun, Zhou Yifei | 04/23 7:45pm - 8:05pm |
10 | Using multiple variables as regressors to predict box-office revenue | Chen Yidian, Cai Jingyi, Zhong Churong, Yu Zeyuan | 04/23 8:10pm - 8:30pm |