# Data Science

#### Summer Term 2016

**Data Science** (cf. the Wikipedia definition of data science) describes an approach to problems that draws on a set of capabilities not located in any single classic community. Instead, these capabilities cross-breed between disciplines such as physics, biology, the social sciences, and economics. Data science uses elaborate computer science paradigms and requires a background in statistics. It feeds the new and the classical economy as well as the medical field.

Data scientists: IT's new rock stars

**(Preliminary) Lecturing Schedule**

**Lecture: Every Wednesday, 18-20, B017**

**Tutorial: Every Thursday, 18-20, B017**

Date | Topic | Lecturer |
---|---|---|
20.04. | DS Introduction | Claudia |
21.04. | Exercise (Python Tutorial) | Claudia |
27.04. | Statistics Foundations (Prob. theory, Descriptive Statistics and Inferential statistics) | Claudia |
28.04. | Exercise (Tutorial Probability Distributions & Assignment 1) | Claudia |
04.05. | Statistics (Inferential statistics, Confidence Intervals); Assignment 2 | Claudia |
11.05. | Bayesian Statistics and/or simple nonparametric methods (slides, notebook); Assignment 3 | Philipp |
12.05. | Exercise (slides, notebook) | Christoph |
25.05. | Hypothesis Testing; Assignment 4 | Claudia |
01.06. | Hypothesis Tests & Power Computation; Assignment 5 | Claudia |
02.06. | Exercise | Christoph |
08.06. | Nonparametric Statistics; Assignment 6 | Claudia |
09.06. | Exercise (slides, notebook) | Christoph |
15.06. | Exploring Relationships (Correlations and Linear Regression); Assignment 7 | Claudia |
16.06. | Exercise (slides, notebook) | Christoph |
22.06. | Regression Analysis; Assignment 8 | Claudia |
23.06. | Exercise (slides, notebook, bonus: MLE linear regression) | Christoph |
29.06. | Causality; Assignment 9 | Claudia |
30.06. | Exercise | Christoph |
06.07. | Graphical Models and Topic Models; Assignment 10 | Christoph |
07.07. | Exercise | Christoph |
13.07. | Statistics on Networks; Assignment 11 | Fariba |
14.07. | Exercise (slides, notebook) | Christoph |
20.07. | Distributed Computing Frameworks | Christoph |
21.07. | Exercise (slides, notebook) / Questions | Christoph |
03.08. | Exam at 15:00, room E113 | Claudia |
19.10. | Second exam, 16:00, room B017 | Claudia |

**Exercises**

The exercises will be done in groups of two students. To take part in the exam, each group must submit solutions for all but one exercise. For this purpose, each group will get its own SVN repository.

Programming will be done in Python, using IPython notebooks :)

**Literature**

- Vasant Dhar. Data Science and Prediction. In: Communications of the ACM, December 2013, Vol. 56, No. 12, pp. 64-73
- Anand Rajaraman, Jeffrey Ullman, Jure Leskovec, Mining of Massive Datasets, Cambridge University Press (free download)
- Jeffrey Stanton, Introduction to Data Science (free download)
- John Hopcroft. Foundations of Data Science.
- http://www.wolframscience.com/thebook.html
- Peter Norvig, Alon Halevy, Fernando Pereira. The unreasonable effectiveness of data. In: *IEEE Intelligent Systems*, March/April 2009.
- topicmodels.info