Machine Learning and Data Mining
Winter Term 2015 / 2016
Welcome to the website of the Machine Learning and Data Mining course.
This page provides the most important information and material for the course. Here we publish slides for lectures and tutorials, exercises, and home assignments. Sync this GitHub repository to stay up to date and keep access to all course materials.
Lectures
Lectures take place every Wednesday at 10 a.m. in room H 010. The following agenda is preliminary and may change during the course.
- 28.10. Introduction & data mining process: Slides (.ppt) - Slides (.pdf)
- 04.11. Clustering I: task, distance measures, k-means, k-medoids
- 11.11. Clustering II: EM algorithm, density-based clustering
- 18.11. Clustering III: hierarchical clustering, other methods
- 25.11. Classification I: task & evaluation, Naive Bayes
- 02.12. NO COURSE
- 09.12. Classification II: nearest neighbor, decision trees: Slides
- 11.12. Classification III: Ensemble learning, random forests (time slot of tutorial): Slides
- 16.12. Classification IV: Support vector machines: Slides (.pptx) - Slides (.pdf)
- 13.01. Association rule mining: Slides (.pptx) - Slides (.pdf)
- 20.01. Subgroup discovery: Slides (.pptx) - Slides (.pdf)
- 27.01. Matrix factorization (PCA-SVD-Text Mining): Slides (.ppt) - Slides (.pdf) - additional Slides (LSA)
- 03.02. Sequential data: Slides (.odp) - Slides (.pdf)
- 10.02. Bayesian learning: Slides (.odp) - Slides (.pdf)
- 24.02. Final exam, Room D 239, 10.15 am
Tutorials
Tutorials take place roughly every two weeks on Fridays at 12:00 in room M 001. The exact dates are:
- 28.10. Basic statistics and data preparation
- 20.11. Statistics, K-Means, ...: Tutorial slides (.odp) - Tutorial slides (.pdf)
- 27.11. ROC curve: Tutorial slides (.pptx) - Tutorial slides (.pdf)
- 18.12. Tutorial slides (.odp) - Tutorial slides (.pdf) - Reddit test - Reddit train
- 15.01. Exercise Task
- 29.01.
Assignments
Home assignments are mandatory and must be solved in groups of 2-3 participants. You must complete at least 5 of the 6 home assignments to be admitted to the final exam.
- 1. Home Assignment: Solution
  Submit by 10th of November at the latest
- 2. Home Assignment: Solution
  Submit by 25th of November at the latest
- 3. Home Assignment: Task - Solution
  Submit by 16th of December at the latest
- 4. Home Assignment: Task - Reddit test - Reddit train
  Submit by 14th of January at the latest
- 5. Home Assignment: Task
  Submit by 27th of January at the latest
- 6. Home Assignment: Task - Solution
  Submit by 15th of February at the latest
Exam
Recommended Literature
- T. Mitchell: "Machine Learning", 1997
- J. Han, M. Kamber, J. Pei: "Data Mining: Concepts and Techniques", 2011
- I. Witten, E. Frank, M. Hall: "Data Mining: Practical Machine Learning Tools and Techniques", 2011
- C. Bishop: "Pattern Recognition and Machine Learning", 2006
- M. Ester, J. Sander: "Knowledge Discovery in Databases: Techniken und Anwendungen", 2013 (in German)
More literature can be found in the lecture slides.
Team
The course is taught by the following people:
- Markus Strohmaier markus.strohmaier@gesis.org
- Florian Lemmerich florian.lemmerich@gesis.org
- Philipp Singer philipp.singer@gesis.org
- Christoph Kling christoph.kling@gesis.org
For inquiries, please consult the newsgroup (.infko.mldm).