A Statistical Perspective on Distillation
Aditya Krishna Menon¹  Ankit Singh Rawat¹  Sashank J. Reddi¹  Seungyeon Kim¹  Sanjiv Kumar¹
Abstract

Knowledge distillation is a technique for improving a “student” model by replacing its one-hot training labels w ...

et al., 2019). One commonly accepted intuition from Hinton et al. (2015) is that the teacher’s soft labels provide “dark knowledge” via weights on the “wrong” labels $y' \neq y$ for an ...









