### STAT 241B / CS 281B HOMEWORK #3 SOLUTIONS
(1)
(a) This problem follows by setting $\varphi = 2$ in part (b).
(b) Let $E$ be the number of epochs, so the algorithm is run for stages of length $\varphi^0, \varphi^1, \dots, \varphi^{E-1}$. For a given $T$, we need (we assume here that $\varphi$ is an integer, which will turn out to be the optimal thing to do below)
$$\sum_{i=0}^{E-1} \varphi^i = \frac{\varphi^E - 1}{\varphi - 1} \ge T,$$
which implies that we should take $E = \lceil \log_\varphi(T(\varphi-1)+1) \rceil$. It suffices to take $E = \log_\varphi(T\varphi) + 1 \ge \log_\varphi(T(\varphi-1)+1) + 1$. In each epoch $i$, the forecaster suffers $\sqrt{\varphi^i \log(K)/2}$ regret; adding these up gives us
$$
\begin{aligned}
R(T) &\le \sqrt{\log(K)/2} \sum_{i=0}^{E-1} \sqrt{\varphi^i}
= \sqrt{\log(K)/2}\, \frac{\varphi^{E/2} - 1}{\sqrt{\varphi} - 1}
\le \sqrt{\log(K)/2}\, \frac{\sqrt{T\varphi^2}}{\sqrt{\varphi} - 1} \\
&\le \sqrt{\frac{T\log(K)}{2}}\, \frac{\varphi}{\sqrt{\varphi} - 1}
= \sqrt{\frac{T\log(K)}{2}}\, \frac{\varphi(\sqrt{\varphi} + 1)}{\varphi - 1},
\end{aligned}
$$
where the second inequality uses $\varphi^E \le T\varphi^2$ and the last step multiplies numerator and denominator by $\sqrt{\varphi}+1$. By setting the $\varphi$-derivative to zero, we find that this bound is minimized for $\varphi = 4$.
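As a sanity check on the optimization, the constant $\varphi(\sqrt{\varphi}+1)/(\varphi-1)$ multiplying $\sqrt{T\log(K)/2}$ can be scanned numerically; a minimal sketch (function and variable names are ours):

```python
import math

def doubling_constant(phi: float) -> float:
    """Constant multiplying sqrt(T log(K) / 2) in the epoch-scheduling regret bound."""
    return phi * (math.sqrt(phi) + 1) / (phi - 1)

# Scan a grid of bases phi > 2; the minimum should land at phi = 4,
# where the constant equals 4 * (2 + 1) / 3 = 4.
grid = [2 + 0.01 * i for i in range(1, 1000)]
best = min(grid, key=doubling_constant)
print(round(best, 2), round(doubling_constant(4.0), 2))  # both ≈ 4.0
```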
(2) Why FTL fails
(a) Note that this case is covered by part (b), but more intuition may be gleaned from working out the FTL case explicitly. Divide the time steps into epochs $E_1, E_2, \dots$ where $E_j = \{K(j-1)+1, \dots, Kj\}$. The adversary has a very simple strategy: play $\ell_t = e_{k_t}$, the indicator of the learner's chosen action $k_t$. At the start of the first epoch, the learner arbitrarily picks an action and incurs loss 1. We now have $L_1 = e_{k_1}$, so the learner will pick any action except $k_1$. Proceeding by induction, we see that for all $t \le K$, $L_{t-1} = \sum_{i=1}^{t-1} e_{k_i}$ is the indicator of the previously played actions, which forces the learner to pick $k_t$ to be something that has not been played in this epoch. This continues until the last action is picked at round $t = K$, at which point $L_K$ is the constant-1 vector. Epoch $E_2$ behaves exactly the same as $E_1$ (since subtracting a constant from the loss function does not change the FTL predictions). The learner thus incurs loss 1 in every round, so its cumulative loss after $T$ rounds is $T$, while $\min_k L_T^k \le \lfloor T/K \rfloor$, yielding regret at least $T\frac{K-1}{K}$.
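The epoch argument can be simulated directly; the sketch below (our own code, with FTL breaking ties toward the lowest index) plays the indicator adversary against FTL and recovers regret $T - \lfloor T/K \rfloor$:

```python
def ftl_vs_indicator_adversary(K: int, T: int) -> int:
    """Run FTL against the adversary that sets loss e_{k_t} on the learner's pick.

    Returns the learner's regret: learner loss minus the best expert's loss.
    """
    L = [0] * K             # cumulative losses of the K actions
    learner_loss = 0
    for _ in range(T):
        k_t = min(range(K), key=lambda k: L[k])  # FTL pick (ties -> lowest index)
        L[k_t] += 1          # adversary puts loss 1 exactly on the learner's action
        learner_loss += 1    # so the learner loses 1 every round
    return learner_loss - min(L)

# Learner loses every round; the best expert loses only about T/K times.
print(ftl_vs_indicator_adversary(K=5, T=100))  # 100 - 100//5 = 80
```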
(b) The adversary uses the same strategy as before: $\ell_t = e_{k_t}$, so the learner's cumulative loss after $t$ rounds is $t$. No matter which sequence $k_1, \dots, k_T$ the learner plays, we must have $\min_k L_T^k \le \lfloor T/K \rfloor$; the bound is attained by the sequence of $k_t$'s that spreads mass uniformly across all the actions (as in the previous part). If the $k_t$'s are not uniform, then there must be an expert with loss smaller than $\lfloor T/K \rfloor$, which only increases the regret.
(3) An especially lucky case
First, note that once an action has positive loss, there is no reason to play that action; the learner knows that a better action must exist. Let $E_t = \{k : L_{t-1}^k = 0\}$, i.e. the set of perfect experts at time $t$. The strategy that obtains $\ln K$ regret plays weights $w_t$ uniform on $E_t$. Now if the adversary decides to give loss to an expert, she may as well give him the maximal loss 1. Furthermore, if she decides to give loss to multiple experts, the loss (and hence regret) of the learner is higher if she hands out these losses in separate trials. It follows that in the worst case, the learner incurs loss $1/|E_t|$ in each round in which an expert leaves $E_t$, and the game essentially terminates when $|E_t| = 1$. Summing over these at most $K-1$ rounds, as $|E_t|$ shrinks from $K$ down to $2$,
$$\sum_{m=2}^{K} \frac{1}{m} \le \ln(K).$$
Since $\min_k \sum_{t=1}^{T} \ell_{t,k} = 0$, we have that $R_T \le \ln(K)$. (Note that here the natural log is essential.)
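The harmonic bound can be checked numerically; a small sketch (our own code) accumulates the worst-case per-round loss $1/|E_t|$ as the set of perfect experts shrinks from $K$ to $1$:

```python
import math

def worst_case_loss(K: int) -> float:
    """Total loss of the uniform-on-perfect-experts strategy in the worst case.

    Each elimination round costs the learner 1/|E_t| while |E_t| shrinks
    from K down to 2, giving the partial harmonic sum 1/2 + ... + 1/K.
    """
    return sum(1.0 / m for m in range(2, K + 1))

for K in (2, 10, 1000):
    # The partial harmonic sum H_K - 1 is bounded by ln(K).
    assert worst_case_loss(K) <= math.log(K)
    print(K, round(worst_case_loss(K), 3), round(math.log(K), 3))
```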
(4) FPL and Hedge
For some expert $k$, we have that $k = \operatorname{argmin}_j (L_j - X_j/\eta)$ if and only if $X_j \le X_k + \eta(L_j - L_k)$ for all $j \ne k$. Since the Gumbel has density $p_X(x) = \exp(-x - e^{-x})$ and CDF $P\{X \le x\} = \exp(-e^{-x})$, we can write
$$
P\Big\{k = \operatorname{argmin}_j \big(L_j - \tfrac{X_j}{\eta}\big)\Big\}
= \mathbb{E}_{X_k}\Big[\prod_{j \ne k} P\{X_j \le X_k + \eta(L_j - L_k)\}\Big]
= \int_{-\infty}^{\infty} e^{-x} e^{-e^{-x}} \prod_{j \ne k} e^{-e^{-x - \eta(L_j - L_k)}} \, dx.
$$
Now, using the substitution $u = x - \eta L_k$,
$$
P\Big\{k = \operatorname{argmin}_j \big(L_j - \tfrac{X_j}{\eta}\big)\Big\}
= \int_{-\infty}^{\infty} e^{-u - \eta L_k}\, e^{-e^{-u - \eta L_k}} \prod_{j \ne k} e^{-e^{-u - \eta L_j}} \, du
= e^{-\eta L_k} \int_{-\infty}^{\infty} e^{-u} \prod_{j=1}^{K} e^{-e^{-u - \eta L_j}} \, du
= e^{-\eta L_k}\, C,
$$
where $C$ does not depend on $k$. Hence, $P\{k = \operatorname{argmin}_j (L_j - X_j/\eta)\} \propto e^{-\eta L_k}$, which is exactly the Hedge distribution.
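The identity can be checked by Monte Carlo: sample Gumbel perturbations, count how often each expert attains the argmin, and compare with the Hedge weights $e^{-\eta L_k}/\sum_j e^{-\eta L_j}$. A minimal sketch (the losses `L`, `eta`, and sample count `N` are our own illustrative choices):

```python
import math
import random

random.seed(0)

L = [0.0, 1.0, 2.5]   # example cumulative losses, for illustration only
eta = 1.0
N = 200_000

# Empirical argmin frequencies of the Gumbel-perturbed losses L_j - X_j / eta.
counts = [0] * len(L)
for _ in range(N):
    # Inverse-CDF sample of a standard Gumbel: X = -log(-log(U)),
    # so L_j - X_j / eta = L_j + log(-log(U)) / eta.
    pert = [L[j] + math.log(-math.log(random.random())) / eta for j in range(len(L))]
    counts[min(range(len(L)), key=lambda j: pert[j])] += 1

hedge = [math.exp(-eta * l) for l in L]
Z = sum(hedge)
for k in range(len(L)):
    print(k, round(counts[k] / N, 3), round(hedge[k] / Z, 3))
```

The two columns printed for each expert should agree up to Monte Carlo error.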
(5) NML, SNML, KT
(a) In this problem, we denote the sequence $x_1, \dots, x_t$ by $x^t$. The probability of a sequence $x^T$ is
$$p_\theta(x^T) = \theta^{\sum_{t=1}^{T} x_t} (1-\theta)^{T - \sum_{t=1}^{T} x_t},$$
and it is elementary to show that this probability is maximized for $\hat\theta = \frac{\sum_{t=1}^{T} x_t}{T}$. Hence,
$$
P_{\mathrm{NML}}(x^T)
= \frac{\left(\frac{\sum x_t}{T}\right)^{\sum x_t} \left(1 - \frac{\sum x_t}{T}\right)^{T - \sum x_t}}
       {\sum_{y^T \in \{0,1\}^T} \left(\frac{\sum y_t}{T}\right)^{\sum y_t} \left(1 - \frac{\sum y_t}{T}\right)^{T - \sum y_t}}
= \frac{\left(\frac{\sum x_t}{T}\right)^{\sum x_t} \left(1 - \frac{\sum x_t}{T}\right)^{T - \sum x_t}}
       {\sum_{n=0}^{T} \binom{T}{n} \left(\frac{n}{T}\right)^n \left(1 - \frac{n}{T}\right)^{T-n}}
= \frac{\left(\sum x_t\right)^{\sum x_t} \left(T - \sum x_t\right)^{T - \sum x_t}}
       {\sum_{n=0}^{T} \binom{T}{n}\, n^n (T-n)^{T-n}},
$$
since there are $\binom{T}{n}$ sequences with $\sum y_t = n$ and they all have the same probability (the $T^{-T}$ factors cancel).
Bayes' rule gives us $P_{\mathrm{NML}}(x_t \mid x^{t-1}) = P_{\mathrm{NML}}(x^t)/P_{\mathrm{NML}}(x^{t-1})$. Let $k = \sum_{i=1}^{t} x_i$. Then, summing out the future $x_{t+1}, \dots, x_T$,
$$
P_{\mathrm{NML}}(x^t)
= \frac{1}{C_T} \sum_{(x_{t+1},\dots,x_T) \in \{0,1\}^{T-t}}
  \Big(k + \sum_{n=t+1}^{T} x_n\Big)^{k + \sum_{n=t+1}^{T} x_n}
  \Big(T - k - \sum_{n=t+1}^{T} x_n\Big)^{T - k - \sum_{n=t+1}^{T} x_n}
= \frac{1}{C_T} \sum_{n=0}^{T-t} \binom{T-t}{n} (k+n)^{k+n} (T-k-n)^{T-k-n},
$$
where $C_T = \sum_{n=0}^{T} \binom{T}{n}\, n^n (T-n)^{T-n}$. We then have, for $k = \sum_{i=1}^{t-1} x_i$,
$$
P_{\mathrm{NML}}(x_t \mid x^{t-1})
= \frac{\sum_{n=0}^{T-t} \binom{T-t}{n} (x_t + k + n)^{x_t + k + n} (T - k - n - x_t)^{T - k - n - x_t}}
       {\sum_{n=0}^{T-t+1} \binom{T-t+1}{n} (k+n)^{k+n} (T-k-n)^{T-k-n}}.
$$
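The closed form can be verified numerically: compute the NML probability of every binary sequence by brute force and check that it sums to one. A sketch with our own helper names (`w`, `p_nml`):

```python
from itertools import product
from math import comb

def w(a: int, b: int) -> float:
    """Maximized Bernoulli likelihood (a/(a+b))^a (b/(a+b))^b, with 0^0 = 1."""
    n = a + b
    return (a / n) ** a * (b / n) ** b

def p_nml(x, T: int) -> float:
    """NML probability of a binary sequence x of length T."""
    k = sum(x)
    Z = sum(comb(T, n) * w(n, T - n) for n in range(T + 1))  # partition function C_T / T^T
    return w(k, T - k) / Z

T = 6
total = sum(p_nml(x, T) for x in product((0, 1), repeat=T))
print(round(total, 10))  # NML probabilities sum to 1
```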
(b) Using the NML expression from the previous part with horizon $t$ and $k = \sum_{i=1}^{t-1} x_i$,
$$
P_{\mathrm{SNML}}(x_t \mid x^{t-1})
= \frac{(x_t + k)^{x_t + k} (t - k - x_t)^{t - k - x_t}}
       {\sum_{x_t \in \{0,1\}} (x_t + k)^{x_t + k} (t - k - x_t)^{t - k - x_t}}.
$$
(c) Recall that the NML has the equalizer property; that is, every data sequence has the same regret: the log of the partition function
$$
R_{\mathrm{NML}}(T) = \log\left( \sum_{n=0}^{T} \binom{T}{n}\, n^n (T-n)^{T-n} \right) - T \log T.
$$
To find the worst-case regret for NML, we need to calculate the above for each $T$.
To calculate the worst-case regret for SNML and KT, we need to calculate $M_P(n_0, n_1)$, the maximal cumulative log-loss of predictor $P$ over sequences with $n_0$ zeros and $n_1$ ones, for every $n_0, n_1$ such that $n_0 + n_1 \le T$. We use the induction
$$
M_P(n_0, n_1) = \max\{\, M_P(n_0 - 1, n_1) - \log P(0 \mid n_0 - 1, n_1),\; M_P(n_0, n_1 - 1) - \log P(1 \mid n_0, n_1 - 1) \,\}.
$$
For both SNML and KT, the base case is $M_P(1, 0) = M_P(0, 1) = -\log(1/2)$.
Figure 0.1. The worst-case regret for three online binary prediction algorithms (KT, SNML, NML) as a function of the time horizon $T$.
Once $M_P(n_0, n_1)$ is calculated, we calculate the loss of the best expert for $(n_0, n_1)$. This is simply
$$
L^*(n_0, n_1) = -\log\left( n_1^{n_1}\, n_0^{n_0}\, (n_0 + n_1)^{-n_0 - n_1} \right),
$$
since the maximum of $p_\theta(x^T)$ is at $\theta = n_1/(n_0 + n_1)$. To find the worst-case regret for time horizon $T$, we take
$$
R_{\mathrm{worst}}(T) = \max_{n_0 + n_1 = T} \left( M_P(n_0, n_1) - L^*(n_0, n_1) \right).
$$
See Figure 0.1 for the results. While NML is the best (in class, we proved the optimality of NML), the other two algorithms are not too far behind.
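The recursion is straightforward to implement; below is a minimal sketch (our own code) for the KT predictor, using the standard add-1/2 rule $P_{\mathrm{KT}}(1 \mid n_0, n_1) = (n_1 + 1/2)/(n_0 + n_1 + 1)$ and natural logarithms throughout:

```python
from math import log

def kt(next_bit: int, n0: int, n1: int) -> float:
    """KT (add-1/2) predictive probability of next_bit given counts (n0, n1)."""
    c = (n1 if next_bit == 1 else n0) + 0.5
    return c / (n0 + n1 + 1)

def worst_case_regret_kt(T: int) -> float:
    """Worst-case regret of the KT predictor at horizon T via the M_P recursion."""
    M = {(0, 0): 0.0}  # cumulative worst-case log-loss; M_P(0, 0) = 0
    for s in range(1, T + 1):
        for n0 in range(s + 1):
            n1 = s - n0
            cands = []
            if n0 > 0:
                cands.append(M[(n0 - 1, n1)] - log(kt(0, n0 - 1, n1)))
            if n1 > 0:
                cands.append(M[(n0, n1 - 1)] - log(kt(1, n0, n1 - 1)))
            M[(n0, n1)] = max(cands)

    def best(n0: int, n1: int) -> float:
        # Loss of the best Bernoulli expert for counts (n0, n1): L*(n0, n1).
        t = n0 + n1
        return -sum(c * log(c / t) for c in (n0, n1) if c > 0)

    return max(M[(n0, T - n0)] - best(n0, T - n0) for n0 in range(T + 1))

print(round(worst_case_regret_kt(100), 3))
```

Note that the base case comes out automatically: $M_P(1,0) = M_P(0,1) = -\log(1/2)$.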