Troubleshooting Neural Networks


Posted by oliyiyi on 2016-7-5 11:01:42


There are many possible reasons why a neural network fails to learn. There could be a technical explanation -- we implemented backpropagation incorrectly -- or we chose a learning rate that was too high, which in turn led to us constantly overshooting local minima of the cost function.

Gradient Checking

The first thing I would always do is implement "gradient checking" to make sure that the implementation is correct. Gradient checking is very easy to implement, and it is a good first diagnostic; here, we just compare the analytical gradient to a numerically approximated one:

∂J/∂w ≈ [J(w + ε) − J(w)] / ε

(Note that ε is just a small number, around 1e-5 or so.)

Better yet, we can use the two-point (centered) approximation with ±ε:

∂J/∂w ≈ [J(w + ε) − J(w − ε)] / (2ε)
Then, we compare this numerically approximated gradient to our analytical gradient, e.g., via a relative error of the form

relative error = ‖g_analytical − g_numerical‖ / (‖g_analytical‖ + ‖g_numerical‖)
Depending on the complexity of our network architecture, we could come up with criteria like this:

  • Relative error <= 1e-7: everything is okay!
  • Relative error between 1e-7 and 1e-4: the result is borderline, and we should look into it.
  • Relative error > 1e-4: there is probably something wrong in our code.
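
To make this concrete, here is a minimal NumPy sketch of a gradient check for a single logistic unit trained with a squared-error cost; the model, data, and function names are invented purely for illustration and are not taken from the article:

    import numpy as np

    def forward(w, x):
        # logistic (sigmoid) activation of a single unit
        return 1.0 / (1.0 + np.exp(-np.dot(x, w)))

    def cost(w, x, y):
        # squared-error cost
        return 0.5 * np.sum((y - forward(w, x)) ** 2)

    def analytical_gradient(w, x, y):
        out = forward(w, x)
        # dJ/dw = -sum_i (y_i - out_i) * out_i * (1 - out_i) * x_i
        return -np.dot(x.T, (y - out) * out * (1 - out))

    def numerical_gradient(w, x, y, eps=1e-5):
        grad = np.zeros_like(w)
        for j in range(w.shape[0]):
            w_plus, w_minus = w.copy(), w.copy()
            w_plus[j] += eps
            w_minus[j] -= eps
            # two-point (centered) approximation
            grad[j] = (cost(w_plus, x, y) - cost(w_minus, x, y)) / (2.0 * eps)
        return grad

    rng = np.random.RandomState(1)
    x = rng.randn(20, 3)
    y = (rng.rand(20) > 0.5).astype(float)
    w = rng.normal(scale=0.01, size=3)

    g_ana = analytical_gradient(w, x, y)
    g_num = numerical_gradient(w, x, y)
    rel_error = (np.linalg.norm(g_ana - g_num)
                 / (np.linalg.norm(g_ana) + np.linalg.norm(g_num)))
    print(rel_error)   # should be far below 1e-7 if the gradient code is correct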

Scaling and Shuffling

Next, we want to check whether the data has been scaled appropriately. For example, if we use stochastic gradient descent and initialize our weights to small random numbers around zero, we should make sure that the features are standardized accordingly (mean = 0 and standard deviation = 1, the properties of a standard normal distribution):

x_std = (x − μ) / σ

(where μ and σ are the per-feature mean and standard deviation of the training data)
Also, let's make sure that we shuffle the training set prior to every pass over it to avoid cycles in stochastic gradient descent.
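
As a minimal sketch of both steps, assuming plain NumPy and made-up data (the array names and sizes are arbitrary):

    import numpy as np

    rng = np.random.RandomState(0)
    X = rng.uniform(0, 100, size=(50, 4))   # raw features on very different scales
    y = rng.randn(50)

    # standardize each feature: subtract the column mean, divide by the column std
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)

    n_epochs = 10
    for epoch in range(n_epochs):
        # reshuffle the sample order before every pass to avoid cycles in SGD
        perm = rng.permutation(X_std.shape[0])
        X_shuffled, y_shuffled = X_std[perm], y[perm]
        for xi, target in zip(X_shuffled, y_shuffled):
            pass  # perform one stochastic gradient descent weight update here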

Learning Rate

Eventually, we want to look at the learning rate itself. If the calculated cost increases over time, this could simply mean that we are constantly overshooting a local minimum. Besides lowering the learning rate, there are a few tricks that I often add to my implementation (sketched in the example after the list):

  • a decrease constant d for an adaptive learning rate: we shrink the learning rate η over time via η / (1 + t · d), where t is the time step
  • a momentum factor for faster initial learning based on the previous gradient
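
As an illustration, here is a toy NumPy sketch of an SGD-style update that combines such a decrease constant with a momentum term; the stand-in gradient function, starting values, and hyperparameter settings are invented for this example and not taken from the article:

    import numpy as np

    eta0, d, alpha = 0.1, 1e-3, 0.9   # illustrative base rate, decrease constant, momentum factor

    w = np.zeros(3)                   # weights of some toy model
    velocity = np.zeros_like(w)

    def gradient(w):
        # stand-in for the real gradient of the cost w.r.t. the weights;
        # this toy cost is minimized at w = [1.0, -2.0, 0.5]
        return 2 * (w - np.array([1.0, -2.0, 0.5]))

    for t in range(1000):
        eta = eta0 / (1.0 + t * d)                        # adaptive learning rate
        velocity = alpha * velocity - eta * gradient(w)   # momentum update
        w += velocity

    print(w)   # should approach [1.0, -2.0, 0.5]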


Bio: Sebastian Raschka is a 'Data Scientist' and Machine Learning enthusiast with a big passion for Python & open source. Author of 'Python Machine Learning'. Michigan State University.

