Statistical Methods for Data Science

Instructor: Prof. Panayiotis Bozanis
Teaching Hours and Credit Allocation: 30+10 hours, 6 credits
Course Assessment: Exam & Coursework

Aims

The course examines the mathematical and statistical foundations of Data Science and presents the statistical methods most commonly used in the field. Students will gain the conceptual understanding of statistical methods needed to analyze and interpret massive data sets and to draw meaningful conclusions from them. The course will provide students with a solid theoretical background and a collection of techniques that can be applied to a wide range of real-world problems.

Learning Outcomes

On completing the course, students will be able to:

  • Understand the basic concepts of probability theory and statistics as they are applied in data science.
  • Apply mathematical tools, models and methods to data analysis tasks such as data fitting, regression, sampling and hypothesis testing.
  • Understand the fundamentals of statistical inference and how it is implemented in practice.
  • Use modern software suites for data analysis, processing and visualization and develop new software tools.

Content

  • Introduction to probability theory.
  • Random variables (univariate, multivariate).
  • Random sampling.
  • Hypothesis testing.
  • Linear regression.
  • The Bayesian approach to statistics.
  • Software tools for statistical and data analysis.

Reading

  • Anderson D.R., Sweeney D.J., Williams T.A., Camm J.D., Cochran J.J., Fry M.J., Ohlmann J.W. (2020). Statistics for Business & Economics, Cengage, 14th edition.
  • Heumann C., Schomaker M., Shalabh (2016). Introduction to Statistics and Data Analysis: With Exercises, Solutions and Applications in R, Springer.
  • McClave J.T., Benson P.G., Sincich T. (2018). Statistics for Business & Economics, Pearson, 13th edition.
  • Stinerock R. (2018). Statistics with R: A Beginner's Guide, Sage Publishing.