Fitting Models with Gradient Descent
In the previous sections we estimated the model's parameters by minimizing the cost function:
β = (XᵀX)⁻¹ Xᵀ y
Here X is the matrix of explanatory variables. When there are many variables (tens of thousands), this computation becomes very expensive. Furthermore, if the determinant of XᵀX is zero, that is, if the matrix is singular, its inverse cannot be computed at all. In this section we introduce another method of parameter estimation: gradient descent. The goal of fitting has not changed; we still estimate the parameters by minimizing the cost function.
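As a minimal sketch of the closed-form estimate above, the normal equation can be evaluated with NumPy. The design matrix X and targets y here are made-up toy data, not values from the text:

```python
import numpy as np

# Toy data: five observations, an intercept column plus one explanatory variable.
X = np.array([[1.0, 6.0],
              [1.0, 8.0],
              [1.0, 10.0],
              [1.0, 14.0],
              [1.0, 18.0]])
y = np.array([7.0, 9.0, 13.0, 17.5, 18.0])

# Normal equation: beta = (X^T X)^{-1} X^T y.
# np.linalg.inv raises LinAlgError when X^T X is singular,
# which is exactly the failure case described above.
beta = np.linalg.inv(X.T @ X) @ X.T @ y
print(beta)
```

Inverting XᵀX is O(d³) in the number of variables d, which is why this approach breaks down when d is very large.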
Gradient descent is sometimes described by the analogy of a blindfolded man who
is trying to find his way from somewhere on a mountainside to the lowest point of
the valley. He cannot see the topography, so he takes a step in the direction with the
steepest decline. He then takes another step, again in the direction with the steepest
decline. The sizes of his steps are proportional to the steepness of the terrain at his
current position. He takes big steps when the terrain is steep, as he is confident that he
is still near the peak and that he will not overshoot the valley's lowest point. The man
takes smaller steps as the terrain becomes less steep. If he were to continue taking large
steps, he may accidentally step over the valley's lowest point. He would then need
to change direction and step toward the lowest point of the valley again. By taking
decreasingly large steps, he can avoid stepping back and forth over the valley's lowest
point. The blindfolded man continues to walk until he cannot take a step that will
decrease his altitude; at this point, he has found the bottom of the valley.
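The walker's strategy can be sketched in code. This is an illustrative one-dimensional example, not the book's implementation: the quadratic cost, learning rate, and stopping tolerance are all assumptions chosen for clarity.

```python
# Minimal gradient descent on a one-dimensional cost function.
# cost(w) = (w - 3)^2 has its "valley floor" at w = 3.
def gradient(w):
    # Derivative of (w - 3)^2; the slope plays the role of the terrain's steepness.
    return 2.0 * (w - 3.0)

w = 10.0             # start somewhere on the mountainside
learning_rate = 0.1  # scales the step; a steeper slope means a bigger step
for _ in range(1000):
    step = learning_rate * gradient(w)
    if abs(step) < 1e-8:  # the terrain is flat: stop
        break
    w -= step             # step in the downhill direction

print(w)  # close to 3, the bottom of the valley
```

Because the step is proportional to the gradient, the steps shrink automatically as the walker approaches the minimum, which is how he avoids stepping back and forth over the lowest point.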