Welcome to the page of the Machine Learning and Data Mining course for the winter term 2017/2018!
Lecture and Tutorial - Machine Learning and Data Mining (6 ECTS; for Master and Bachelor students in Web Science, Computer Science, Computational Visualistics and Business Informatics)
Inter-student communication: Please use the corresponding newsgroup infko-mldm.
|Lecturer||Prof. Dr. Steffen Staab|
|Lecturer||Dr. Mahdi Bohlouli, Raphael Menges|
Wed. 14:15-15:45, Room A213
Lectures are held on Wednesdays beginning October 18 and start at 8:30AM unless stated otherwise below. Tutorials are held on Wednesdays and Thursdays beginning October 25.
This course requires mathematics as taught to CS majors. A compact overview of what is needed is available in Chapters 2, 3, and 4 (linear algebra, probability and information theory, numerical computation) of the Deep Learning book.
Motivation and Introduction
|25.10.||Programming with Python||machinelearning-1-intro-to-python.zip|
|01.11.||No Lecture, Public Holiday|
|15.11. 8:00AM sharp to 9:30AM||K-Nearest Neighbors, Bayesian Classification|
|22.11. 8:30AM to 10:00AM||First lecture of the day: Decision Trees||ML-3-decision-trees.pptx|
|22.11. 6:00PM to 7:30PM in E011||Second lecture of the day: Random Forests||ML-4-random-forests.pdf|
|6.12.||Support Vector Machines||ML5-SVMs.pdf|
|13.12. 8:00AM sharp to 9:25AM||Neural Networks / Motivation + Linear transformations||Tentative PDF|
|20.12.||Neural Networks / Gradient descent||Extended tentative PDF (corrected Jan 5, 2018)|
|10.1. 8:30AM||Neural Networks / Learning different target functions||ML7-NN3.pdf|
|17.1. 8:30AM||Neural Networks / Backpropagation + Regularization||ML8-NN4-Backprop.pdf|
|17.1. 4:15PM, Room D239||Prof. Dr. Marcin Grzegorzek "Medical Data Science -- Extracting Health-related Knowledge from Big Data"||Abstract and CV|
|24.1. 8:30AM||Clustering, Clustering evaluation, K-Means, Expectation Maximization||ML10-Clustering.pptx|
|31.1. 8:30AM||Latent Semantics, Probabilistic Latent Semantics, Topic Models, Neural Network Autoencoder|
|31.1. 4:15PM, room to be announced||Dr. Thomas Gottron talks about machine learning at the credit rating agency Schufa|
|7.2.||Reinforcement Learning, Questions & Answers|
|25.10. & 26.10.||Tutorial and assignment structure, groups, and SVN introduction||salary.csv|
|08.11. & 09.11.||Solutions of "Machine Learning Fundamentals", review of next assignment||Blackboard|
|15.11. & 16.11.||Solutions of "Simple Classification", review of next assignment||Blackboard|
|22.11. & 23.11.||Solutions of "Decision Tree", review of next assignment||Blackboard|
|29.11. & 30.11.||Solutions of "Naive Bayes Classifier", review of next assignment||Blackboard|
|06.12. & 07.12.||Solutions of "Manual Multinomial Naive Bayes Classifier", review of next assignment||Blackboard|
|13.12. & 14.12.||Solutions of "KNN and NB", review of next assignment||Blackboard|
|10.01. & 11.01.||Cancelled; see the material for the solutions of Assignment 07.||solution07.pdf|
|17.01. & 18.01.||Solutions of "Support Vector Machines", review of next assignment|
|24.01. & 25.01.|
|31.01. & 01.02.|
|07.02. & 08.02.|
Please form groups of three to work on the assignments by 26th of October! Assignments are graded before the next tutorial. To be admitted to the exam, you must reach at least 60% of the total points summed over all assignments. For example, if there are 10 assignments worth 10 points each, you need at least 60 points in total to participate in the exam.
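The admission rule above is a single sum over all assignments, not a minimum per assignment. The following is a minimal Python sketch of that arithmetic, assuming integer point values; the function names `min_points_needed` and `admitted_to_exam` are hypothetical and not part of any course material.

```python
def min_points_needed(points_possible, required_percent=60):
    """Smallest integer total that reaches required_percent of all available points."""
    total_possible = sum(points_possible)
    # Ceiling division in integers: ceil(required_percent * total / 100)
    return (required_percent * total_possible + 99) // 100


def admitted_to_exam(points_earned, points_possible, required_percent=60):
    """True if the points summed over ALL assignments reach the required share."""
    total_earned = sum(points_earned)
    total_possible = sum(points_possible)
    # Compare in integers to avoid floating-point rounding exactly at the threshold
    return total_earned * 100 >= required_percent * total_possible


# The example from the text: 10 assignments worth 10 points each
possible = [10] * 10
print(min_points_needed(possible))                     # 60
print(admitted_to_exam([6] * 10, possible))            # True: exactly 60 points
print(admitted_to_exam([10] * 6 + [0] * 4, possible))  # True: only the sum counts
print(admitted_to_exam([6] * 9 + [5], possible))       # False: 59 points
```

The comparison is done in integers so that a total of exactly 60% is never rejected because of floating-point rounding.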
|Release Date||Assignment||Submission Deadline at 9:00AM||Sheets|
|23.10.||Pen and Paper: Machine Learning Fundamentals||06.11.||assignment01.pdf|
|06.11.||Programming: Simple Classification||13.11.||assignment02.pdf|
|14.11.||Programming: Decision Tree||20.11.||assignment03.pdf|
|20.11.||Programming: Naive Bayes Classifier||27.11.||assignment04.pdf|
|27.11.||Programming: Manual Multinomial Naive Bayes Classifier||04.12.||assignment05.pdf|
|04.12.||Pen and Paper: KNN and NB. Update: Typo in 2.2: "Cold" should be "Cool".|
|11.12.||Pen and Paper: Decision Tree||18.12.||assignment07.pdf|
|08.01.||Pen and Paper: Support Vector Machines||15.01.||assignment08.pdf|
|15.01.||Programming: Deep Learning. Update: Set the random state of the dataset split to zero!|
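The update above asks for a reproducible train/test split. If the assignment uses scikit-learn (an assumption; the sheet is not reproduced here), this corresponds to passing `random_state=0` to `sklearn.model_selection.train_test_split`. The dependency-free sketch below uses a hypothetical `split_dataset` helper to illustrate the idea: a generator seeded with the same value always produces the same split.

```python
import random


def split_dataset(samples, test_fraction=0.2, random_state=0):
    """Hypothetical helper: shuffle a copy of `samples` with a seeded RNG, then split it.

    Returns (train, test). The same random_state yields the same split.
    """
    rng = random.Random(random_state)  # private generator; the global random state is untouched
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


data = list(range(100))
train_a, test_a = split_dataset(data, random_state=0)
train_b, test_b = split_dataset(data, random_state=0)
print(train_a == train_b and test_a == test_b)  # True: identical split for the same seed
print(len(train_a), len(test_a))                # 80 20
```

Using a private `random.Random` instance keeps the split reproducible even if other code draws from the module-level generator in between.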
- The 1st exam will be on Wednesday, 21st February 2018, at 8:15AM in room D028. Registration for this exam opens on 15th January 2018. Only students admitted on the basis of their assignment results can register. The exam is listed under number 432028 in Klips.
- The 2nd exam will be on Wednesday, 11th April 2018, at 2:15PM (14:15) in a room to be announced.
Core Literature & Systems
- Charu C. Aggarwal. Data Mining: The Textbook. Springer, 2015.
- For learning about the mathematical basics of this course and about neural networks:
Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org/
- For learning about the principles behind K-Means: Yordan Raykov, Alexis Boukouvalas, Fahd Baig, Max Little. What to Do When k-Means Clustering Fails: A Simple yet Principled Alternative Algorithm. In: PLOS One, DOI: 10.1371/journal.pone.0162259, September 26, 2016.
Further Literature, Systems & Stuff
- NIPS - Conf. on Neural Information Processing Systems
- ICML - Int. Conf. on Machine Learning
- IEEE ICDM - Int. Conf. on Data Mining (different from "ICDM" without "IEEE"!)
- ACM KDD - Conf. on Knowledge Discovery and Data Mining
Abstract: On the one hand, demographic change and the shortage of medical staff (especially in rural areas) critically challenge healthcare systems in industrialised countries. On the other hand, the digitalisation of our society progresses at tremendous speed, so that more and more health-related data are available in digital form. For instance, people wear smart glasses and/or smart watches, provide digital data with standardised medical devices (e.g., blood pressure and blood sugar meters following the standard ISO/IEEE 11073), and/or deliver personal behavioural data through their smartphones. Pattern recognition algorithms that automatically analyse and interpret this huge amount of heterogeneous data for prevention (early risk detection), diagnosis, assistance in therapy/aftercare/rehabilitation, and nursing will gain very high scientific, societal, and economic priority in the near future. In this talk, apart from a general overview of and introduction to the topic, Marcin Grzegorzek will present his scientific vision addressing the research direction motivated above. It includes the development of original pattern recognition algorithms for holistic health assessment. In his research, Marcin mainly considers prevention/early risk detection as well as therapy assistance in the context of neurodegenerative diseases. After a general introduction to his scientific vision, Marcin will briefly present two of the related projects he currently leads: (1) Cognitive Village: Adaptively Learning, Technical Support System for Elderly (funded by the German Federal Ministry of Education and Research); (2) My-AHA: My Active and Healthy Ageing (EC Horizon 2020). Apart from the development of adaptive machine learning software, these projects also consider hardware, user acceptance, and ELSI (Ethical, Legal and Social Implications).
Marcin will close his talk with a summary and some insights into possible future scientific directions in the area of medical data science.
Marcin Grzegorzek is Head of the Research Group for Pattern Recognition at the University of Siegen, Professor at the University of Economics in Katowice, and Chairman of the Board at Data Understanding Lab Ltd. He studied Computer Science at the Silesian University of Technology, did his PhD at the University of Erlangen-Nuremberg, worked as a postdoctoral researcher at Queen Mary University of London and at the University of Koblenz-Landau, and did his habilitation at the AGH University of Science and Technology in Kraków. He has published around 100 papers in pattern recognition, image processing, machine learning, and multimedia analysis. He currently runs six externally funded research projects. For instance, Marcin coordinates the project Cognitive Village, which aims to develop a user-friendly support system for the elderly that applies machine learning algorithms for sensor-based health assessment.