Information on the Lecture: Mathematics of Deep Learning (SS 2020)
General information
- Lecturer: JProf. Dr. Philipp Harms, Room 244, Ernst-Zermelo-Straße 1, philipp.harms@stochastik.uni-freiburg.de
- Assistant: Jakob Stiefel
- Short videos and slides: available on ILIAS every Tuesday night
- Discussion and further reading: Wednesdays 14:15-14:45 in our virtual meeting room.
- Exercises: Exercise sheets are available on ILIAS. Solutions are to be handed in on ILIAS every second Wednesday and are discussed every second Friday at 10:15 in our virtual meeting room.
- Virtual meeting room: Zoom meeting 916 6576 1668 or, as a backup option, BigBlueButton meeting vHarms (passwords available on ILIAS).
Instructional Development Award
- This lecture is part of an initiative led by Philipp Harms, Frank Hutter, and Thorsten Schmidt to develop a modular and interdisciplinary teaching concept in the areas of machine learning and deep learning.
- This initiative is supported by the University of Freiburg in the form of an Instructional Development Award.
Overview
- Statistical learning theory: generalization and approximation error, bias-variance decomposition (see the worked decomposition after this list)
- Universal approximation theorems: density of shallow neural networks in various function spaces (see the numerical sketch after this list)
- Nonlinear approximation theory: dictionary learning and transfer-of-approximation results
- Hilbert's 13th problem and Kolmogorov-Arnold representation: some caveats about the choice of activation function
- Harmonic analysis: lower bounds on network approximation rates via affine systems of Banach frames
- Information theory: upper bounds on network approximation rates via binary encoders and decoders
- ReLU networks and the role of network depth: exponential as opposed to polynomial approximation rates
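The first topic above can be made concrete with the two standard error decompositions. The following is a minimal sketch in generic notation (risk, hypothesis class, empirical risk minimizer), summarizing textbook material in the spirit of Bousquet, Boucheron, and Lugosi (2003) rather than quoting the lecture slides:

```latex
% Sketch of the approximation/generalization split and the bias-variance
% decomposition, in generic notation (not taken verbatim from the slides).
\documentclass{article}
\usepackage{amsmath, amssymb}
\begin{document}
Let $\mathcal{R}(f) = \mathbb{E}\,[(f(X)-Y)^2]$ denote the risk, $f^*$ the Bayes
regression function, $\mathcal{H}$ a hypothesis class (e.g.\ a neural network
architecture), and $\hat f_n \in \mathcal{H}$ the empirical risk minimizer on
$n$ samples. Then
\begin{align*}
\mathcal{R}(\hat f_n) - \mathcal{R}(f^*)
  &= \underbrace{\mathcal{R}(\hat f_n) - \inf_{f \in \mathcal{H}} \mathcal{R}(f)}_{\text{generalization (estimation) error}}
   + \underbrace{\inf_{f \in \mathcal{H}} \mathcal{R}(f) - \mathcal{R}(f^*)}_{\text{approximation error}}.
\end{align*}
For the squared loss, the pointwise bias-variance decomposition of a random
estimator $\hat f$ reads
\begin{align*}
\mathbb{E}\,\bigl[(\hat f(x) - f^*(x))^2\bigr]
  = \underbrace{\bigl(\mathbb{E}[\hat f(x)] - f^*(x)\bigr)^2}_{\text{bias}^2}
  + \underbrace{\operatorname{Var}\bigl(\hat f(x)\bigr)}_{\text{variance}}.
\end{align*}
\end{document}
```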
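For the universal approximation topic, here is a minimal numerical sketch, not taken from the course materials: a one-hidden-layer ReLU network whose hidden weights are drawn at random and whose output layer is fitted by least squares. The width, sampling ranges, and random-feature setup are illustrative assumptions.

```python
# Sketch: shallow ReLU network approximating a smooth univariate target.
# All hyperparameters below are illustrative choices, not from the lecture.
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

# Target function on [-pi, pi]
x = np.linspace(-np.pi, np.pi, 512)
y = np.sin(x)

# Hidden layer: phi_j(x) = relu(w_j * x + b_j) with random w_j, b_j
width = 64
w = rng.normal(size=width)
b = rng.uniform(-np.pi, np.pi, size=width)
features = relu(np.outer(x, w) + b)  # shape (512, width)

# Fit only the outer (linear) weights by least squares
coef, *_ = np.linalg.lstsq(features, y, rcond=None)
approx = features @ coef

print("maximal error on the grid:", np.abs(approx - y).max())
```

Only the output layer is trained here, which keeps the example to a linear least-squares problem while still showing how a width-limited ReLU layer can approximate a smooth target.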
Slides and Videos
All lectures: slides
- Deep learning as statistical learning: slides, video
- Neural networks: slides, video
- Dictionary learning: slides, video
  - Introduction to Dictionary Learning: video
  - Approximating Hölder Functions by Splines: video
  - Approximating Univariate Splines by Multi-Layer Perceptrons: video
  - Approximating Products by Multi-Layer Perceptrons: video
  - Approximating Multivariate Splines by Multi-Layer Perceptrons: video
  - Approximating Hölder Functions by Multi-Layer Perceptrons: video
  - Wrapup: video
- Kolmogorov-Arnold representation: slides, video
- Harmonic analysis: slides, video
- Signal analysis: slides, video
- Sparse data representation: slides, video
- ReLU networks and the role of depth: slides, video
Literature
Courses on deep learning
- Frank Hutter and Joschka Boedecker (Department of Computer Science, University of Freiburg): Foundations of Deep Learning. ILIAS
- Philipp C. Petersen (University of Vienna): Neural Network Theory. pdf
Effectiveness of deep learning
- Sejnowski (2020): The unreasonable effectiveness of deep learning in artificial intelligence
- Donoho (2000): High-Dimensional Data Analysis—the Curses and Blessings of Dimensionality
Statistical learning theory
- Bousquet, Boucheron, and Lugosi (2003): Introduction to statistical learning theory
- Vapnik (1999): An overview of statistical learning theory
Universal approximation theorems
- Hornik, Stinchcombe, and White (1989): Multilayer Feedforward Networks are Universal Approximators
- Cybenko (1989): Approximation by superpositions of a sigmoidal function
- Hornik (1991): Approximation capabilities of multilayer feedforward networks
Nonlinear approximation theory
- Oswald (1990): On the degree of nonlinear spline approximation in Besov-Sobolev spaces
- DeVore (1998): Nonlinear approximation
Hilbert's 13th problem and Kolmogorov-Arnold representation
- Arnold (1958): On the representation of functions of several variables
- Hedberg (1971): The Kolmogorov Superposition Theorem. In: Shapiro, Topics in Approximation Theory
- Bar-Natan (2009): Hilbert's 13th problem, in full color
- Hecht-Nielsen (1987): Kolmogorov’s mapping neural network existence theorem
Harmonic analysis
- Christensen (2016): An introduction to frames and Riesz bases
- Dahlke, De Mari, Grohs, and Labate (2015): Harmonic and Applied Analysis
- Feichtinger and Gröchenig (1988): A unified approach to atomic decompositions via integrable group representations
- Gröchenig (2001): Foundations of Time-Frequency Analysis
- Mallat (2009): A Wavelet Tour of Signal Processing
- Kutyniok and Labate (2012): Shearlets: Multiscale Analysis for Multivariate Data
Information theory
- Bölcskei, Grohs, Kutyniok, and Petersen (2019): Optimal approximation with sparsely connected deep neural networks. In: SIAM Journal on Mathematics of Data Science 1.1, pp. 8–45
- Dahlke, De Mari, Grohs, and Labate (2015): Harmonic and Applied Analysis. Birkhäuser.
- Donoho (2001): Sparse Components of Images and Optimal Atomic Decompositions. In: Constructive Approximation 17, pp. 353–382
- Shannon (1959): Coding Theorems for a Discrete Source with a Fidelity Criterion. In: International Convention Record 7, pp. 325–350
ReLU networks and the role of depth
- Perekrestenko, Grohs, Elbrächter, and Bölcskei (2018): The universal approximation power of finite-width deep ReLU networks. arXiv:1806.01528
- E and Wang (2018): Exponential convergence of the deep neural network approximation for analytic functions. arXiv:1807.00297
- Yarotsky (2017): Error bounds for approximations with deep ReLU networks. Neural Networks 94, pp. 103–114.