Information on the Lecture: Mathematics of Deep Learning (SS 2020)
General information
- Lecturer: JProf. Dr. Philipp Harms, Room 244, Ernst-Zermelo-Straße 1, philipp.harms@stochastik.uni-freiburg.de
- Assistant: Jakob Stiefel
- Short videos and slides: available on ILIAS every Tuesday night
- Discussion and further reading: Wednesdays 14:15-14:45 in our virtual meeting room.
- Exercises: Exercise sheets are available on ILIAS. Solutions are to be handed in on ILIAS every second Wednesday and are discussed every second Friday at 10:15 in our virtual meeting room.
- Virtual meeting room: Zoom meeting 916 6576 1668 or, as a backup option, BigBlueButton meeting vHarms (passwords available on ILIAS).
Instructional Development Award
- This lecture is part of an initiative led by Philipp Harms, Frank Hutter, and Thorsten Schmidt to develop a modular and interdisciplinary teaching concept in the areas of machine learning and deep learning.
- This initiative is supported by the University of Freiburg in the form of an Instructional Development Award.
Overview
- Statistical learning theory: generalization and approximation error, bias-variance decomposition (see the worked decomposition after this list)
- Universal approximation theorems: density of shallow neural networks in various function spaces (see the numerical sketch after this list)
- Nonlinear approximation theory: dictionary learning and transfer-of-approximation results
- Hilbert's 13th problem and Kolmogorov-Arnold representation: some caveats about the choice of activation function
- Harmonic analysis: lower bounds on network approximation rates via affine systems of Banach frames
- Information theory: upper bounds on network approximation rates via binary encoders and decoders
- ReLU networks and the role of network depth: exponential as opposed to polynomial approximation rates
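The first topic above can be made concrete with the two standard error decompositions. The following is a minimal sketch in generic notation (risk, hypothesis class, empirical risk minimizer), summarizing textbook material in the spirit of Bousquet, Boucheron, and Lugosi (2003) rather than quoting the lecture slides:

```latex
% Sketch of the approximation/generalization split and the bias-variance
% decomposition, in generic notation (not taken verbatim from the slides).
\documentclass{article}
\usepackage{amsmath, amssymb}
\begin{document}
Let $\mathcal{R}(f) = \mathbb{E}\,[(f(X)-Y)^2]$ denote the risk, $f^*$ the Bayes
regression function, $\mathcal{H}$ a hypothesis class (e.g.\ a neural network
architecture), and $\hat f_n \in \mathcal{H}$ the empirical risk minimizer on
$n$ samples. Then
\begin{align*}
\mathcal{R}(\hat f_n) - \mathcal{R}(f^*)
  &= \underbrace{\mathcal{R}(\hat f_n) - \inf_{f \in \mathcal{H}} \mathcal{R}(f)}_{\text{generalization (estimation) error}}
   + \underbrace{\inf_{f \in \mathcal{H}} \mathcal{R}(f) - \mathcal{R}(f^*)}_{\text{approximation error}}.
\end{align*}
For the squared loss, the pointwise bias-variance decomposition of a random
estimator $\hat f$ reads
\begin{align*}
\mathbb{E}\,\bigl[(\hat f(x) - f^*(x))^2\bigr]
  = \underbrace{\bigl(\mathbb{E}[\hat f(x)] - f^*(x)\bigr)^2}_{\text{bias}^2}
  + \underbrace{\operatorname{Var}\bigl(\hat f(x)\bigr)}_{\text{variance}}.
\end{align*}
\end{document}
```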
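For the universal approximation topic, here is a minimal numerical sketch, not taken from the course materials: a one-hidden-layer ReLU network whose hidden weights are drawn at random and whose output layer is fitted by least squares. The width, sampling ranges, and random-feature setup are illustrative assumptions.

```python
# Sketch: shallow ReLU network approximating a smooth univariate target.
# All hyperparameters below are illustrative choices, not from the lecture.
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

# Target function on [-pi, pi]
x = np.linspace(-np.pi, np.pi, 512)
y = np.sin(x)

# Hidden layer: phi_j(x) = relu(w_j * x + b_j) with random w_j, b_j
width = 64
w = rng.normal(size=width)
b = rng.uniform(-np.pi, np.pi, size=width)
features = relu(np.outer(x, w) + b)  # shape (512, width)

# Fit only the outer (linear) weights by least squares
coef, *_ = np.linalg.lstsq(features, y, rcond=None)
approx = features @ coef

print("maximal error on the grid:", np.abs(approx - y).max())
```

Only the output layer is trained here, which keeps the example to a linear least-squares problem while still showing how a width-limited ReLU layer can approximate a smooth target.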
Slides and Videos
All lectures: slides
- Deep learning as statistical learning: slides, video
- Neural networks: slides, video
- Dictionary learning: slides, video
  - Introduction to Dictionary Learning: video
  - Approximating Hölder Functions by Splines: video
  - Approximating Univariate Splines by Multi-Layer Perceptrons: video
  - Approximating Products by Multi-Layer Perceptrons: video
  - Approximating Multivariate Splines by Multi-Layer Perceptrons: video
  - Approximating Hölder Functions by Multi-Layer Perceptrons: video
  - Wrapup: video
- Kolmogorov-Arnold representation: slides, video
- Harmonic analysis: slides, video
- Signal analysis: slides, video
- Sparse data representation: slides, video
- ReLU networks and the role of depth: slides, video
Literature
Courses on deep learning
- Frank Hutter and Joschka Boedecker (Department of Computer Science, University of Freiburg): Foundations of Deep Learning. ILIAS
- Philipp C. Petersen (University of Vienna): Neural Network Theory. pdf
Effectiveness of deep learning
- Sejnowski (2020): The unreasonable effectiveness of deep learning in artificial intelligence
- Donoho (2000): High-Dimensional Data Analysis—the Curses and Blessings of Dimensionality
Statistical learning theory
- Bousquet, Boucheron, and Lugosi (2003): Introduction to statistical learning theory
- Vapnik (1999): An overview of statistical learning theory
Universal approximation theorems
- Hornik, Stinchcombe, and White (1989): Multilayer Feedforward Networks are Universal Approximators
- Cybenko (1989): Approximation by superpositions of a sigmoidal function
- Hornik (1991): Approximation capabilities of multilayer feedforward networks
Nonlinear approximation theory
- Oswald (1990): On the degree of nonlinear spline approximation in Besov-Sobolev spaces
- DeVore (1998): Nonlinear approximation
Hilbert's 13th problem and Kolmogorov-Arnold representation
- Arnold (1958): On the representation of functions of several variables
- Hedberg (1971): The Kolmogorov Superposition Theorem. In: Shapiro, Topics in Approximation Theory
- Bar-Natan (2009): Hilbert's 13th problem, in full color
- Hecht-Nielsen (1987): Kolmogorov’s mapping neural network existence theorem
Harmonic analysis
- Christensen (2016): An introduction to frames and Riesz bases
- Dahlke, De Mari, Grohs, and Labate (2015): Harmonic and Applied Analysis
- Feichtinger and Gröchenig (1988): A unified approach to atomic decompositions via integrable group representations
- Gröchenig (2001): Foundations of Time-Frequency Analysis
- Mallat (2009): A Wavelet Tour of Signal Processing
- Kutyniok and Labate (2012): Shearlets: Multiscale Analysis for Multivariate Data
Information theory
- Bölcskei, Grohs, Kutyniok, and Petersen (2019): Optimal approximation with sparsely connected deep neural networks. In: SIAM Journal on Mathematics of Data Science 1.1, pp. 8–45
- Dahlke, De Mari, Grohs, and Labate (2015): Harmonic and Applied Analysis. Birkhäuser.
- Donoho (2001): Sparse Components of Images and Optimal Atomic Decompositions. In: Constructive Approximation 17, pp. 353–382
- Shannon (1959): Coding Theorems for a Discrete Source with a Fidelity Criterion. In: International Convention Record 7, pp. 325–350
ReLU networks and the role of depth
- Perekrestenko, Grohs, Elbrächter, and Bölcskei (2018): The universal approximation power of finite-width deep ReLU networks. arXiv:1806.01528
- E and Wang (2018): Exponential convergence of the deep neural network approximation for analytic functions. arXiv:1807.00297
- Yarotsky (2017): Error bounds for approximations with deep ReLU networks. Neural Networks 94, pp. 103–114.