# Data Science

Data Science (cf. the Wikipedia definition of data science) describes an approach to problems that draws on a set of capabilities not rooted in any single classical community. Instead, these capabilities cross disciplines such as physics, biology, the social sciences, and economics. It relies on elaborate computer science paradigms and requires a background in statistics. It serves the new and the classical economy as well as the medical field.

Data scientists: IT's new rock stars

## (Preliminary) Lecturing Schedule

Lecture: Every Wednesday, 18-20, B017

Tutorial: Every Thursday, 18-20, B017

| Date | Topic | Lecturer |
|------|-------|----------|
| 20.04. | DS Introduction | Claudia |
| 21.04. | Exercise (Python Tutorial) | Claudia |
| 27.04. | Statistics Foundations (Probability Theory, Descriptive Statistics, and Inferential Statistics) | Claudia |
| 28.04. | Exercise (Tutorial Probability Distributions & Assignment 1) | Claudia |
| 04.05. | Statistics (Inferential Statistics, Confidence Intervals); Assignment 2 | Claudia |
| 11.05. | Bayesian Statistics and/or simple nonparametric methods (slides, notebook); Assignment 3 | Philipp |
| 12.05. | Exercise (slides, notebook) | Christoph |
| 25.05. | Hypothesis Testing; Assignment 4 | Claudia |
| 01.06. | Hypothesis Tests & Power Computation; Assignment 5 | Claudia |
| 02.06. | Exercise | Christoph |
| 08.06. | Nonparametric Statistics; Assignment 6 | Claudia |
| 09.06. | Exercise (slides, notebook) | Christoph |
| 15.06. | Exploring Relationships (Correlations and Linear Regression); Assignment 7 | Claudia |
| 16.06. | Exercise (slides, notebook) | Christoph |
| 22.06. | Regression Analysis; Assignment 8 | Claudia |
| 23.06. | Exercise (slides, notebook, bonus: MLE linear regression) | Christoph |
| 29.06. | Causality; Assignment 9 | Claudia |
| 30.06. | Exercise | Christoph |
| 06.07. | Graphical Models and Topic Models; Assignment 10 | Christoph |
| 07.07. | Exercise | Christoph |
| 13.07. | Statistics on Networks; Assignment 11 | Fariba |
| 14.07. | Exercise (slides, notebook) | Christoph |
| 20.07. | Distributed Computing Frameworks | Christoph |
| 21.07. | Exercise (slides, notebook) / Questions | Christoph |
| 03.08. | Exam at 15:00, room E113 | Claudia |
| 19.10. | Second exam, 16:00, room B017 | Claudia |

## Exercises

The exercises are done in groups of two students. To be admitted to the exam, each group must submit solutions for all but one exercise. For this purpose, each group will get its own SVN repository.

Programming will be in IPython with IPython notebooks :)

## Literature

1. Vasant Dhar. Data Science and Prediction. Communications of the ACM, Vol. 56, No. 12, December 2013, pp. 64-73.
2. Anand Rajaraman, Jeffrey Ullman, Jure Leskovec. Mining of Massive Datasets. Cambridge University Press (free download).