
Information on the Lecture: Mathematics of Deep Learning (SS 2020)

General information

  • Lecturer: JProf. Dr. Philipp Harms, Room 244, Ernst-Zermelo-Straße 1, philipp.harms@stochastik.uni-freiburg.de
  • Assistant: Jakob Stiefel
  • Short videos and slides: available on ILIAS every Tuesday night
  • Discussion and further reading: Wednesdays 14:15-14:45 in our virtual meeting room.
  • Exercises: instruction sheets available on ILIAS. Solutions to be handed in every second Wednesday on ILIAS. Discussion of solutions every second Friday at 10:15 in our virtual meeting room.
  • Virtual meeting room: Zoom meeting 916 6576 1668 or, as a backup option, BigBlueButton meeting vHarms (passwords available on ILIAS).

Instructional Development Award

  • This lecture is part of an initiative led by Philipp Harms, Frank Hutter, and Thorsten Schmidt to develop a modular and interdisciplinary teaching concept in the areas of machine learning and deep learning.
  • The initiative is supported by the University of Freiburg in the form of an Instructional Development Award.

Overview

  • Statistical learning theory: generalization and approximation error, bias-variance decomposition
  • Universal approximation theorems: density of shallow neural networks in various function spaces
  • Nonlinear approximation theory: dictionary learning and transfer-of-approximation results
  • Hilbert's 13th problem and Kolmogorov-Arnold representation: some caveats about the choice of activation function
  • Harmonic analysis: lower bounds on network approximation rates via affine systems of Banach frames
  • Information theory: upper bounds on network approximation rates via binary encoders and decoders
  • ReLU networks and the role of network depth: exponential as opposed to polynomial approximation rates 

Slides and Videos

All lectures: slides

  1. Deep learning as statistical learning: slides, video
    1. Motivation for Deep Learning: video
    2. Introduction to Statistical Learning: video
    3. Empirical Risk Minimization and Related Algorithms: video
    4. Error Decompositions: video
    5. Error Trade-Offs: video
    6. Error Bounds: video
    7. Organizational Issues: video
    8. Wrapup: video
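As a rough numerical illustration of empirical risk minimization and the error trade-offs covered in this chapter (my own sketch, not course material; the target function, sample size, and noise level are arbitrary choices):

```python
# Empirical risk minimization over polynomial models of growing degree:
# empirical risk keeps falling with model size, while the true risk
# exhibits the approximation/estimation trade-off. Illustrative sketch.
import numpy as np

rng = np.random.default_rng(0)
target = lambda x: np.sin(2 * np.pi * x)           # "unknown" regression function
x_train = rng.uniform(0, 1, 30)
y_train = target(x_train) + 0.1 * rng.standard_normal(30)
x_test = np.linspace(0, 1, 1000)                   # dense grid approximates true risk

def risks(degree):
    """Empirical and (approximate) true risk of the least-squares ERM fit."""
    coeffs = np.polyfit(x_train, y_train, degree)
    emp = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    true = np.mean((np.polyval(coeffs, x_test) - target(x_test)) ** 2)
    return emp, true

emp_risks, true_risks = zip(*(risks(d) for d in range(12)))
# Empirical risk decreases with degree; true risk first drops, then rises.
```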
  2. Neural networks: slides, video
    1. Multilayer Perceptrons: video
    2. A Brief History of Deep Learning: video
    3. Deep Learning as Representation Learning: video
    4. Definition of Neural Networks: video
    5. Operations on Neural Networks: video
    6. Universality of Neural Networks: video
    7. Discriminatory Activation Functions: video
    8. Wrapup: video
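A hands-on taste of universality with a discriminatory activation (my own sketch; the function names and the steepness parameter are illustrative): two sigmoid neurons already realize an approximate indicator of an interval, and sums of such bumps approximate step functions, hence continuous functions uniformly on compacts.

```python
# A hand-built shallow sigmoid network (one hidden layer, two neurons)
# realizing an approximate indicator of [a, b] -- the building block of
# Cybenko-style universal approximation arguments. Illustrative sketch.
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def bump(x, a, b, k=200.0):
    """Shallow net sigma(k(x-a)) - sigma(k(x-b)), close to 1 on (a, b), 0 outside."""
    return sigmoid(k * (x - a)) - sigmoid(k * (x - b))

x = np.linspace(0, 1, 1001)
g = bump(x, 0.3, 0.7)   # nearly the indicator of [0.3, 0.7]
```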
  3. Dictionary learning: slides, video
    1. Introduction to Dictionary Learning: video
    2. Approximating Hölder Functions by Splines: video
    3. Approximating Univariate Splines by Multilayer Perceptrons: video
    4. Approximating Products by Multilayer Perceptrons: video
    5. Approximating Multivariate Splines by Multilayer Perceptrons: video
    6. Approximating Hölder Functions by Multilayer Perceptrons: video
    7. Wrapup: video
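The spline dictionary at the heart of this chapter can be probed numerically (my own sketch; the choice of test function and knot counts is arbitrary): for a C² function, linear-spline interpolation on n uniform knots has sup-norm error of order n⁻², so doubling the knots shrinks the error by roughly a factor of four.

```python
# Sup-norm error of piecewise-linear spline interpolation on uniform
# knots, showing the O(n^-2) decay for smooth functions. Sketch only.
import numpy as np

f = np.cos
x_fine = np.linspace(0, np.pi, 10_001)

def spline_error(n_knots):
    """Sup-norm distance between f and its linear-spline interpolant."""
    knots = np.linspace(0, np.pi, n_knots)
    return np.max(np.abs(np.interp(x_fine, knots, f(knots)) - f(x_fine)))

errors = {n: spline_error(n) for n in (8, 16, 32, 64)}
# Doubling the number of knots reduces the error by roughly a factor of 4.
```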
  4. Kolmogorov-Arnold representation: slides, video
    1. Hilbert's 13th Problem: video
    2. Kolmogorov-Arnold Representation: video
    3. Approximate Hashing for Specific Functions: video
    4. Approximate Hashing for Generic Functions: video
    5. Proof of the Kolmogorov-Arnold Theorem: video
    6. Approximation by Networks of Bounded Size: video
    7. Wrapup: video
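A toy instance of the theme of this chapter, namely expressing a multivariate function through addition and univariate functions (my own example via the quarter-square identity, not the actual Kolmogorov-Arnold construction): the bivariate product reduces exactly to one univariate function, squaring, applied to sums.

```python
# Kolmogorov-Arnold spirit in miniature: x*y written using only
# addition/subtraction and a single univariate function. This exact toy
# identity is NOT the KA theorem's construction, just its flavor.
def square(t):
    """The only univariate 'outer' function needed here."""
    return t * t

def product(x, y):
    """Quarter-square identity: x*y = ((x+y)^2 - (x-y)^2) / 4."""
    return (square(x + y) - square(x - y)) / 4.0
```

The same identity reappears later in the course when products are approximated by networks that can only approximate squaring.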
  5. Harmonic analysis: slides, video
    1. Banach Frames: video
    2. Group Representations: video
    3. Signal Representations: video
    4. Regular Coorbit Spaces: video
    5. Duals of Coorbit Spaces: video
    6. General Coorbit Spaces: video
    7. Discretization: video
    8. Wrapup: video
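A finite-dimensional warm-up for the frame machinery of this chapter (my own sketch): three equiangular unit vectors in R² form a tight frame with frame bound 3/2, so analysis followed by synthesis with the rescaled dual reconstructs every vector exactly.

```python
# The "Mercedes-Benz" frame: a tight frame of 3 unit vectors in R^2.
# Analysis (inner products) followed by synthesis with the canonical
# dual frame (2/3) f_k recovers the input. Illustrative sketch.
import numpy as np

angles = np.pi / 2 + 2 * np.pi * np.arange(3) / 3
frame = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # rows are f_k

x = np.array([0.4, -1.3])
coeffs = frame @ x                        # analysis: <x, f_k>
x_rec = (2.0 / 3.0) * frame.T @ coeffs    # synthesis with the dual frame

frame_operator = frame.T @ frame          # equals (3/2) * identity: tightness
```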
  6. Signal analysis: slides, video
    1. Coorbit Theory, Signal Analysis, and Deep Learning: video
    2. Heisenberg Group: video
    3. Modulation Spaces: video
    4. Affine Group: video
    5. Wavelet Spaces: video
    6. Shearlet Group: video
    7. Shearlet Coorbit Spaces: video
    8. Wrapup: video
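The affine group gives rise to wavelet systems; the simplest discrete instance is the Haar transform, which can be sketched in a few lines (my own sketch, one decomposition level only): it is orthonormal, so it reconstructs perfectly and preserves energy.

```python
# One-level discrete Haar wavelet transform: local averages (scaling
# part) and local differences (wavelet part), with perfect
# reconstruction and Parseval energy preservation. Illustrative sketch.
import numpy as np

def haar_forward(signal):
    """Split a length-2n signal into scaling and wavelet coefficients."""
    even, odd = signal[0::2], signal[1::2]
    s = (even + odd) / np.sqrt(2)
    d = (even - odd) / np.sqrt(2)
    return s, d

def haar_inverse(s, d):
    out = np.empty(2 * len(s))
    out[0::2] = (s + d) / np.sqrt(2)
    out[1::2] = (s - d) / np.sqrt(2)
    return out

signal = np.array([4.0, 2.0, -1.0, 3.0, 0.0, 0.0, 5.0, 1.0])
s, d = haar_forward(signal)
reconstructed = haar_inverse(s, d)
```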
  7. Sparse data representation: slides, video
    1. Rate-Distortion Theory: video
    2. Hypercube Embeddings and Ball Coverings: video
    3. Dictionaries as Encoders: video
    4. Frames as Dictionaries: video
    5. Networks as Encoders: video
    6. Dictionaries as Networks: video
    7. Wrapup: video
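The rate-distortion viewpoint of this chapter can be sketched with best m-term approximation (my own sketch; the power-law exponent and sizes are arbitrary): encode a vector by keeping its m largest coefficients and watch the distortion decay polynomially in m.

```python
# Best m-term approximation as a toy encoder: keep the m largest
# coefficients of a vector with power-law decay and measure the l2
# distortion, which decays polynomially in m. Illustrative sketch.
import numpy as np

k = np.arange(1, 1025)
coeffs = k ** -1.5                     # power-law decay, as for compressible signals

def mterm_error(m):
    """l2 distortion of the best m-term approximation."""
    kept = np.zeros_like(coeffs)
    idx = np.argsort(np.abs(coeffs))[::-1][:m]
    kept[idx] = coeffs[idx]
    return np.linalg.norm(coeffs - kept)

errors = [mterm_error(m) for m in (8, 32, 128, 512)]
# Counting the bits needed to store positions and quantized values of the
# kept coefficients turns this decay into a rate-distortion bound.
```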
  8. ReLU networks and the role of depth: slides, video
    1. Operations on ReLU Networks: video
    2. ReLU Representation of Saw-Tooth Functions: video
    3. Saw-Tooth Approximation of the Square Function: video
    4. ReLU Approximation of Multiplication: video
    5. ReLU Approximation of Analytic Functions: video
    6. Wrapup: video
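The saw-tooth construction behind the exponential depth rates can be checked numerically (a sketch of Yarotsky's construction; variable names are mine): composing a ReLU "hat" n times yields a saw-tooth with 2ⁿ⁻¹ teeth, and subtracting scaled saw-teeth from x approximates x² on [0, 1] with error 4⁻⁽ⁿ⁺¹⁾ using only O(n) layers.

```python
# Depth pays off: n-fold composition of a ReLU hat function gives a
# saw-tooth, and Yarotsky's telescoping sum approximates x^2 on [0, 1]
# with sup error 4^-(n+1). Numerical sketch of the construction.
import numpy as np

def relu(t):
    return np.maximum(t, 0.0)

def hat(x):
    """Tent map on [0, 1], written with ReLUs: 2*relu(x) - 4*relu(x - 1/2)."""
    return 2 * relu(x) - 4 * relu(x - 0.5)

def square_approx(x, n):
    """Depth-n ReLU approximation of x^2 on [0, 1]."""
    out, saw = x.copy(), x.copy()
    for m in range(1, n + 1):
        saw = hat(saw)                 # m-fold composition: 2^(m-1) teeth
        out = out - saw / 4.0 ** m
    return out

x = np.linspace(0, 1, 4097)
errs = {n: np.max(np.abs(square_approx(x, n) - x ** 2)) for n in (2, 4, 6)}
```

Note the contrast with the spline rates above: here the error decays exponentially in the depth n rather than polynomially in the number of parameters.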

Literature

Courses on deep learning

  • Frank Hutter and Joschka Boedecker (Department of Computer Science, University of Freiburg): Foundations of Deep Learning. ILIAS
  • Philipp C. Petersen (University of Vienna): Neural Network Theory. pdf

Effectiveness of deep learning

  • Sejnowski (2020): The unreasonable effectiveness of deep learning in artificial intelligence
  • Donoho (2000): High-Dimensional Data Analysis—the Curses and Blessings of Dimensionality

Statistical learning theory

  • Bousquet, Boucheron, and Lugosi (2003): Introduction to statistical learning theory.
  • Vapnik (1999): An overview of statistical learning theory.

Universal approximation theorems

  • Hornik, Stinchcombe, and White (1989): Multilayer Feedforward Networks are Universal Approximators
  • Cybenko (1989): Approximation by superpositions of a sigmoidal function
  • Hornik (1991): Approximation capabilities of multilayer feedforward networks

Nonlinear approximation theory

  • Oswald (1990): On the degree of nonlinear spline approximation in Besov-Sobolev spaces
  • DeVore (1998): Nonlinear approximation

Hilbert's 13th problem and Kolmogorov-Arnold representation

  • Arnold (1958): On the representation of functions of several variables
  • Torbjörn Hedberg: The Kolmogorov Superposition Theorem. In Shapiro (1971): Topics in Approximation Theory
  • Bar-Natan (2009): Hilbert's 13th problem, in full color
  • Hecht-Nielsen (1987): Kolmogorov’s mapping neural network existence theorem

Harmonic analysis

  • Christensen (2016): An introduction to frames and Riesz bases
  • Dahlke, De Mari, Grohs, Labate (2015): Harmonic and Applied Analysis
  • Feichtinger and Gröchenig (1988): A unified approach to atomic decompositions
  • Gröchenig (2001): Foundations of Time-Frequency Analysis
  • Mallat (2009): A Wavelet Tour of Signal Processing
  • Kutyniok and Labate (2012): Shearlets: Multiscale Analysis for Multivariate Data

Information theory

  • Bölcskei, Grohs, Kutyniok, Petersen (2017): Optimal approximation with sparsely connected deep neural networks. In: SIAM Journal on Mathematics of Data Science 1.1, pp. 8–45
  • Dahlke, De Mari, Grohs, Labate (2015): Harmonic and Applied Analysis. Birkhäuser.
  • Donoho (2001): Sparse Components of Images and Optimal Atomic Decompositions. In: Constructive Approximation 17, pp. 353–382
  • Shannon (1959): Coding Theorems for a Discrete Source with a Fidelity Criterion. In: International Convention Record 7, pp. 325–350

ReLU networks and the role of depth

  • Perekrestenko, Grohs, Elbrächter, Bölcskei (2018): The universal approximation power of finite-width deep ReLU Networks. arXiv:1806.01528
  • E, Wang (2018): Exponential convergence of the deep neural approximation for analytic functions. arXiv:1807.00297
  • Yarotsky (2017): Error bounds for approximations with deep ReLU networks. Neural Networks 94, pp. 103–114.