Multivariate Gaussian distribution. Graphical models, with hints on high-dimensional settings (lasso-type methods). Principal components and factor analysis. Linear and quadratic discriminant analysis. Supervised learning via CART, boosting, random forest, super learner, BART.
Class notes and slides are available on the Moodle platform.
Friedman, J., Hastie, T., & Tibshirani, R. (2008). The elements of statistical learning. Second edition. Springer, Berlin: Springer series in statistics.
Giudici, P. (2005). Applied data mining: statistical methods for business and industry. John Wiley & Sons.
The course covers the theory and application of methods and models for multivariate and high-dimensional data. In particular, it includes topics in classical multivariate analysis, data mining and statistical learning. Labs in R complement the lectures.
Students attending the 6 CFU course must agree on the topics of their reduced syllabus, which covers 2/3 of the 9 CFU syllabus.
Type of Assessment
The exam consists of two projects:
1) a group project, presented with slides at a contest among the student groups (30% of the final score)
2) an individual project, with a written report (30% of the final score), presented at a student seminar (40% of the final score).
The slides, reports and code must be sent to the instructor before each seminar.
Course program
1. Multivariate Gaussian distribution: bivariate and multivariate distributions; marginal and conditional distributions; correlation and marginal/conditional independence; inference on the parameters of a multivariate Gaussian distribution.
2. Introduction to graphical models: graphs and conditional independence properties; undirected graphs (networks / Markov random fields), with Markov properties and factorization, Gaussian graphical models, and log-linear graphical models; directed graphs (Bayesian networks / DAGs), with Markov properties and factorization, and learning; basics of chain graphs, with Markov properties and factorization.
3. Principal components analysis: notation; definition and properties of PCA; interpretation of PCA.
4. Introduction to statistical learning: statistical learning versus machine learning; supervised and unsupervised learning; regression vs classification; accuracy measures; bias-variance trade-off; resampling and cross-validation.
5. Linear model selection and regularization: subset selection; shrinkage methods (ridge regression, lasso and elastic net).
6. Tree-based methods: basics of decision trees; regression trees; classification trees; bagging and boosting; random forests; BART.
7. Super learner for regression and classification.
8. Factor analysis: introduction to exploratory factor analysis; rotation of axes; interpretation of the factorial axes; outline of confirmatory factor analysis.
9. Discriminant analysis: introduction to discriminant analysis; maximum likelihood estimator; linear discriminant analysis (Fisher's approach); confusion matrix.
10. Cluster analysis: introduction to the problem of classification; distances and metrics; hierarchical and nonhierarchical methods; probabilistic and fuzzy methods.