
Machine Learning and Data Mining

Welcome to the page of the Machine Learning and Data Mining course for the winter term 2017/2018!


Lecture and Tutorial - Machine Learning and Data Mining (6 ECTS; for Master and Bachelor students in Web Science, Computer Science, Computational Visualistics and Business Informatics)

Inter-student communication: Please use the corresponding newsgroup infko-mldm here.

Course number: 0432028

Lecturer: Prof. Dr. Steffen Staab
Time slots

We. 08.30-10.00,
Room G410     till 01.11.2017
Room M001 from 08.11.2017

Lecturers: Dr. Mahdi Bohlouli, Raphael Menges
Time slots

We. 14.15-15.45, Room A213
Th. 16.15-17.45, Room E113

Lectures are held on Wednesdays beginning October 18 and start at 8:30AM unless stated otherwise below. Tutorials are held on Wednesdays and Thursdays beginning October 25.


This course requires mathematics as taught for CS majors. A compact view of what is needed is available in the DeepLearningBook in Chapters 2, 3, and 4.

Course Material

Date Lecture Topics Slides

18.10. Motivation and Introduction

25.10. Programming with Python
01.11. No Lecture, Public Holiday  
08.11. Classification, K-Nearest-Neighbors machinelearning-2-classification.pptx
15.11. 8:00AM sharp-9.30AM K-Nearest-Neighbors, Bayesian Classification
22.11. 8.30AM-10.00AM First lecture of that day: Decision Trees ML-3-decision-trees.pptx
22.11. 6:00PM to 7:30PM in E011 Second lecture of that day: Random Forest ML-4-random-forests.pdf
29.11. No Lecture  
6.12. Support Vector Machines ML5-SVMs.pdf
13.12. 8:00AM sharp to 9:25AM Neural Networks / Motivation + Linear transformations Tentative PDF
20.12. Neural Networks / Gradient descent Extended tentative PDF (corrected Jan 5, 2018)
10.1. 8.30am Neural Networks / Learning different target functions ML7-NN3.pdf
17.1. 8.30am Neural Networks / Backpropagation + Regularization ML8-NN4-Backprop.pdf
17.1. 4.15pm, Room D239 Prof. Dr. Marcin Grzegorzek "Medical Data Science -- Extracting Health-related Knowledge from Big Data" Abstract and CV
24.1. 8.30am Clustering, Clustering evaluation ML10-Clustering.pptx
31.1. 8.30am K-Means, Expectation Maximization, Hierarchical Clustering ML10-Clustering-revised.pptx
31.1. 4:15PM, D239 Dr. Thomas Gottron talks about machine learning at credit rating agency Schufa  
7.2. Reinforcement learning, Latent Semantics, Probabilistic Latent Semantics, Topic Models, Neural Network Autoencoder, Questions & Answers  


Date Tutorial Topics Material
25.10. & 26.10. Tutorial and assignment structure, groups and SVN introduction salary.csv
08.11. & 09.11. Solutions of "Machine Learning Fundamentals", review of next assignment Blackboard
15.11. & 16.11. Solutions of "Simple Classification", review of next assignment Blackboard
22.11. & 23.11. Solutions of "Decision Tree", review of next assignment Blackboard
29.11. & 30.11. Solutions of "Naive Bayes Classifier", review of next assignment Blackboard
06.12. & 07.12. Solutions of "Manual Multinomial Naive Bayes Classifier", review of next assignment Blackboard
13.12. & 14.12. Solutions of "KNN and NB", review of next assignment Blackboard
10.01. & 11.01. Cancelled, see material for solutions of assignment 07. solution07.pdf
17.01. & 18.01. Solutions of "Support Vector Machines", review of next assignment Blackboard
24.01. & 25.01. Solutions of "Deep Learning", review of next assignment Blackboard
31.01. & 01.02.    
07.02. & 08.02.    


Please form groups of three people to work on the assignments here by 26th of October! Assignments are graded before the next tutorial, and you must reach 60% of the total points over all assignments to be allowed to participate in the exam. E.g., if there are 10 assignments worth 10 points each, you need at least 60 points summed over all assignments to participate in the exam.
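The admission rule can be checked with a few lines of Python. This is only a minimal sketch: the function name `is_admitted`, its parameters, and the scores are hypothetical illustrations and not part of the official grading procedure.

```python
def is_admitted(earned, maximum, threshold_percent=60):
    """Return True if the summed earned points reach the threshold
    share of the summed maximum points over all assignments."""
    if len(earned) != len(maximum):
        raise ValueError("earned and maximum must have the same length")
    # Integer comparison avoids floating-point rounding at the exact boundary.
    return sum(earned) * 100 >= threshold_percent * sum(maximum)


# Hypothetical scores for ten assignments worth 10 points each.
max_points = [10] * 10
scores = [8, 7, 10, 0, 6, 5, 9, 4, 6, 5]  # sums to 60 of 100 points
print(is_admitted(scores, max_points))
```

Only the total counts in this sketch, so a single weak or missed assignment can be offset by stronger results elsewhere.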

Release Date Assignment Submission Deadline at 9:00AM Sheets
23.10. Pen and Paper: Machine Learning Fundamentals 06.11. assignment01.pdf
06.11. Programming: Simple Classification 13.11. assignment02.pdf
14.11. Programming: Decision Tree 20.11. assignment03.pdf
20.11. Programming: Naive Bayes Classifier 27.11. assignment04.pdf
27.11. Programming: Manual Multinomial Naive Bayes Classifier 04.12. assignment05.pdf
04.12. Pen and Paper: KNN and NB (Update: Typo in 2.2 - "Cold" should be "Cool") 11.12. assignment06_0.pdf
11.12. Pen and Paper: Decision Tree 18.12. assignment07.pdf
08.01. Pen and Paper: Support Vector Machines 15.01. assignment08.pdf
15.01. Programming: Deep Learning (Update: Set random state of dataset split to zero!) 22.01. assignment09_0.pdf
22.01. Pen and Paper: Deep Learning 29.01. assignment10.pdf
29.01. Pen and Paper: Clustering 05.02. assignment11.pdf


  • 1st exam will be on Wednesday, 21st February 2018, at 8:15AM in room D028. Registration for this exam opens on 15th January 2018. Students who have qualified through their assignment points can register for the exam. The exam can be found under the number 432028 in Klips.
  • 2nd exam will be on Wednesday, 11th April 2018, at 2:15PM in a room to be announced.

General remarks about exams:

  1. No calculators allowed!
  2. Be on time.

Core Literature & Systems

  • Charu C. Aggarwal. Data Mining: The Textbook. Springer, 2015.
  • For learning about mathematical basics of this course and about neural networks:
    Ian Goodfellow, Yoshua Bengio and Aaron Courville. Deep Learning. MIT Press, 2016.
  • For learning about the principles behind K-Means: Yordan Raykov, Alexis Boukouvalas, Fahd Baig, Max Little. What to Do When k-Means Clustering Fails: A Simple yet Principled Alternative Algorithm. In: PLOS One, DOI: 10.1371/journal.pone.0162259, September 26, 2016.

Further Literature, Systems & Stuff

Main Conferences

  • NIPS - Neural Information Processing Systems
  • ICML - Int. Conf. on Machine Learning
  • IEEE ICDM - Int. Conf. on Data Mining (different from "ICDM" without "IEEE"!)
  • ACM KDD - Knowledge Discovery and Data Mining

Talk by Prof. Dr. Marcin Grzegorzek

Abstract: On the one hand, demographic change and the shortage of medical staff (especially in rural areas) critically challenge healthcare systems in industrialised countries. On the other hand, the digitalisation of our society progresses at tremendous speed, so that more and more health-related data are available in digital form. For instance, people wear intelligent glasses and/or smart watches, provide digital data with standardised medical devices (e.g., blood pressure and blood sugar meters following the standard ISO/IEEE 11073) and/or deliver personal behavioural data via their smartphones. Pattern recognition algorithms that automatically analyse and interpret that huge amount of heterogeneous data towards prevention (early risk detection), diagnosis, assistance in therapy/aftercare/rehabilitation as well as nursing will gain extremely high scientific, societal and economic priority in the near future. In this talk, apart from a general overview and introduction to the topic, Marcin Grzegorzek will present his scientific vision addressing the research direction motivated above. It includes the development of original pattern recognition algorithms for holistic health assessment. In his research, Marcin considers mainly the steps of prevention/early risk detection as well as therapy assistance in the context of neurodegenerative diseases. After a general introduction of his scientific vision, Marcin will briefly present two of the related projects he currently leads: (1) Cognitive Village: Adaptively Learning, Technical Support System for Elderly (funded by the German Federal Ministry of Education and Research); (2) My-AHA: My Active and Healthy Ageing (EC Horizon 2020). Apart from the development of adaptive machine learning software, aspects of hardware, user acceptance as well as ELSI (Ethical, Legal and Social Implications) are also considered in these projects.
Marcin will close his talk with a summary and some insights into possible future scientific directions in the area of medical data science.


Marcin Grzegorzek is Head of the Research Group for Pattern Recognition at the University of Siegen, Professor at the University of Economics in Katowice and Chairman of the Board at Data Understanding Lab Ltd. He studied Computer Science at the Silesian University of Technology, did his PhD at the University of Erlangen-Nuremberg, worked as a postdoctoral researcher at Queen Mary University of London as well as at the University of Koblenz-Landau, and did his habilitation at the AGH University of Science and Technology in Kraków. He has published around 100 papers in pattern recognition, image processing, machine learning, and multimedia analysis. He currently runs six externally funded research projects. For instance, Marcin coordinates the project Cognitive Village, which aims to develop a user-friendly support system for the elderly that applies machine learning algorithms for sensor-based health assessment.

