On the Proof of Global Convergence of Gradient Descent for
Deep ReLU Networks with Linear Widths
Quynh Nguyen 1
Abstract

We give a simple proof for the global convergence ...

... training data, then the output at layer $l$ is given by

$$
F_l = \begin{cases}
X & l = 0, \\
\sigma(F_{l-1} W_l) & l \in [L-1],
\end{cases} \tag{1}
$$
...
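As an illustration only, the two cases of the layer recursion (1) that are visible in this excerpt can be sketched in plain Python. The sketch assumes that $\sigma$ is the entrywise ReLU (as the title suggests) and that the rows of $X$ are training samples, so that $F_{l-1} W_l$ is an ordinary matrix product. The helper names `relu`, `matmul`, and `hidden_feature_maps`, and the toy numbers, are hypothetical and do not come from the paper. The case $l = L$ is not shown in the excerpt and is therefore not implemented.

```python
def relu(matrix):
    """Apply ReLU entrywise to a matrix stored as a list of rows."""
    return [[max(0.0, v) for v in row] for row in matrix]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def hidden_feature_maps(X, weights):
    """Return [F_0, F_1, ..., F_k] with F_0 = X and F_l = relu(F_{l-1} W_l).

    Assumption: `weights` holds only the hidden-layer matrices
    W_1, ..., W_{L-1}; the output-layer case of (1) is not covered.
    """
    features = [X]
    for W in weights:
        features.append(relu(matmul(features[-1], W)))
    return features


# Toy example with made-up numbers: 2 samples, input dimension 2,
# hidden widths 3 and 1.
X = [[1.0, -1.0], [0.5, 2.0]]
W1 = [[1.0, 0.0, -1.0], [0.0, 1.0, 1.0]]
W2 = [[1.0], [1.0], [-1.0]]
F = hidden_feature_maps(X, [W1, W2])
```

Each entry `F[l]` has one row per training sample and one column per neuron of layer `l`, which matches the right-multiplication by $W_l$ in (1).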









