Statistical Learning Theory


  • Learning with a finite dictionary

Recall from the end of last lecture our setup: We are working with a finite dictionary
H = {h_1, . . . , h_M} of estimators, and we would like to understand the scaling of this problem
with respect to M and the sample size n. Given H, one idea is to simply try to minimize
the empirical risk based on the samples, and so we define the empirical risk minimizer, ĥ_erm, by

$$
\hat{h}_{\mathrm{erm}} \in \operatorname*{argmin}_{h \in H} \hat{R}_n(h).
$$
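
To make the definition concrete, here is a minimal numerical sketch of ERM over a finite dictionary. The dictionary of threshold classifiers, the 0-1 loss, and the synthetic data are illustrative assumptions of mine, not part of the lecture: each candidate's empirical risk is computed on the sample, and ĥ_erm is whichever candidate attains the minimum.

```python
import numpy as np

# Minimal sketch of empirical risk minimization over a finite dictionary.
# The threshold classifiers, the 0-1 loss, and the synthetic data below are
# illustrative assumptions, not taken from the lecture.

def empirical_risk(h, X, y):
    """Average 0-1 loss of classifier h on the sample (X, y)."""
    return np.mean(h(X) != y)

def erm(H, X, y):
    """Return the empirical risk minimizer over the finite dictionary H."""
    risks = [empirical_risk(h, X, y) for h in H]
    return H[int(np.argmin(risks))]

# Example: M = 21 threshold classifiers h_t(x) = 1{x >= t} on 1-d data.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=200)
y = (X >= 0.25).astype(int)            # labels from a true threshold at 0.25 ...
y ^= (rng.random(200) < 0.1)           # ... with roughly 10% of labels flipped as noise

thresholds = np.linspace(-1.0, 1.0, 21)
H = [lambda x, t=t: (x >= t).astype(int) for t in thresholds]

h_hat = erm(H, X, y)
print("empirical risk of the selected classifier:", empirical_risk(h_hat, X, y))
```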
In what follows, we will simply write ĥ instead of ĥ_erm when possible. Also recall the
definition of the oracle, h̄, which (somehow) minimizes the true risk and is defined by

$$
\bar{h} \in \operatorname*{argmin}_{h \in H} R(h).
$$
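
A useful first observation, and the standard route toward such a comparison (this is the usual textbook decomposition, not necessarily the exact argument given in the lecture), is that the excess risk of ĥ over the oracle is controlled by the worst-case gap between empirical and true risk over the dictionary:

$$
R(\hat{h}) - R(\bar{h})
= \underbrace{R(\hat{h}) - \hat{R}_n(\hat{h})}_{\text{deviation at } \hat{h}}
+ \underbrace{\hat{R}_n(\hat{h}) - \hat{R}_n(\bar{h})}_{\le\, 0 \text{ by definition of } \hat{h}}
+ \underbrace{\hat{R}_n(\bar{h}) - R(\bar{h})}_{\text{deviation at } \bar{h}}
\le 2 \max_{h \in H} \big| \hat{R}_n(h) - R(h) \big|.
$$

Combining this with a concentration inequality for each fixed h (e.g. Hoeffding's inequality for a bounded loss) and a union bound over the M elements of H is the typical way to obtain a guarantee of the kind previewed below.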

The following theorem shows that, although ĥ cannot hope to do better than h̄ in
general, the difference should not be too large as long as the sample size is not too small
compared to M.
