Ultimately, the purpose of choosing a particular loss function is to get
1. A smooth partial derivative with respect to the weights
2. A well-behaved convex curve, so that the global minimum can be reached. However, many other factors also come into play when searching for a global minimum (learning rate, shape of the function, etc.). A minimal example of how the choice of loss affects the gradient follows this list.
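As a minimal sketch of how the loss choice affects the gradient (assuming a single sigmoid neuron and NumPy; the toy values of x, y, w, and b are illustrative only), the snippet below compares the partial derivative with respect to a weight under mean squared error and under cross-entropy. With cross-entropy the sigmoid's derivative cancels, so the gradient stays usable even when the neuron saturates.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One neuron: a = sigmoid(w * x + b), with target y (toy values)
x, y = 1.0, 1.0
w, b = -4.0, 0.0                       # weight chosen so the neuron starts saturated
a = sigmoid(w * x + b)

# MSE: E = 0.5 * (a - y)^2, so dE/dw = (a - y) * a * (1 - a) * x
grad_mse = (a - y) * a * (1 - a) * x   # carries the sigmoid derivative a*(1-a)

# Cross-entropy: E = -(y*log(a) + (1-y)*log(1-a)), so dE/dw = (a - y) * x
grad_xent = (a - y) * x                # the sigmoid derivative cancels out

print(grad_mse, grad_xent)             # the MSE gradient is tiny when the neuron saturates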
For backpropagation to work, two basic assumptions are made about the error function:
1. The total error can be written as a sum of the individual errors of the training samples in the minibatch,
E = sum(Ex)
2. The error can be written as a function of the outputs of the network (see the sketch after this list).
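A minimal sketch of these two assumptions (assuming NumPy and a squared-error loss; the array shapes and values are illustrative): each per-sample error is a function of the network outputs alone, and the minibatch error is simply their sum.

import numpy as np

outputs = np.array([[0.2, 0.8], [0.6, 0.4], [0.9, 0.1]])  # network outputs for 3 samples
targets = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 0.0]])  # corresponding targets

# Assumption 2: each per-sample error Ex depends only on the network outputs
per_sample_error = 0.5 * np.sum((outputs - targets) ** 2, axis=1)

# Assumption 1: the total error is the sum of the per-sample errors, E = sum(Ex)
E = np.sum(per_sample_error)
print(per_sample_error, E)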
Backpropagation consists of two parts:
1. The forward pass, in which the inputs are propagated through the (initialized) network and all the intermediate values are stored
2. The backward pass, in which the stored values are used to compute the gradients and update the weights
Partial derivatives, the chain rule, and linear algebra are the main tools required to work through backpropagation.
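To make the two passes concrete, here is a minimal sketch (assuming NumPy, one hidden layer with sigmoid activations, squared-error loss, and plain gradient descent; all sizes and names are illustrative, not a reference implementation). The forward pass stores the intermediate activations; the backward pass applies the chain rule to those stored values and updates the weights.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))           # 4 samples, 3 input features (toy data)
Y = rng.random(size=(4, 2))           # 4 samples, 2 targets in [0, 1]
W1 = rng.normal(size=(3, 5)) * 0.1    # input -> hidden weights
W2 = rng.normal(size=(5, 2)) * 0.1    # hidden -> output weights
lr = 0.1

for step in range(100):
    # ---- forward pass: propagate the inputs and store the intermediate values ----
    z1 = X @ W1                        # hidden pre-activation
    a1 = sigmoid(z1)                   # hidden activation (stored for the backward pass)
    z2 = a1 @ W2                       # output pre-activation
    a2 = sigmoid(z2)                   # network output
    E = 0.5 * np.sum((a2 - Y) ** 2)    # total error, a sum of per-sample errors

    # ---- backward pass: chain rule applied to the stored values ----
    delta2 = (a2 - Y) * a2 * (1 - a2)         # dE/dz2
    grad_W2 = a1.T @ delta2                   # dE/dW2
    delta1 = (delta2 @ W2.T) * a1 * (1 - a1)  # dE/dz1, chained back through W2
    grad_W1 = X.T @ delta1                    # dE/dW1

    # gradient-descent weight update
    W2 -= lr * grad_W2
    W1 -= lr * grad_W1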
Note: Using CBOW on smaller datasets smooths the distributional information, because the model treats the entire context as a single observation.
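As a rough sketch of what "a single observation" means here (assuming NumPy; the vocabulary size, embedding dimension, and context indices are made up for illustration), CBOW averages the embeddings of all context words into one vector before predicting the centre word, so the individual context words never act as separate training examples.

import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim = 10, 4
embeddings = rng.normal(size=(vocab_size, embed_dim))  # input embedding matrix

context_ids = [1, 3, 5, 7]            # indices of the words around the centre word
# the whole context collapses into one averaged vector: a single observation
hidden = embeddings[context_ids].mean(axis=0)
print(hidden.shape)                   # (embed_dim,) regardless of the context size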
Subsampling Frequent Words
The sampling rate is the key factor in deciding whether a frequent word is kept. The survival (keep) probability of a word with relative frequency x is P(x) = (sqrt(x/0.001) + 1) * (0.001/x), where 0.001 is the constant chosen as the sampling rate (credit: http://www.mccormickml.com).
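A minimal sketch of this rule (assuming NumPy and made-up word counts): each word's relative frequency x is plugged into P(x), and an occurrence of the word is kept only if a uniform random draw falls below that probability.

import numpy as np

rng = np.random.default_rng(0)
sample_rate = 0.001

word_counts = {"the": 50000, "learning": 300, "gradient": 120}  # toy corpus counts
total = sum(word_counts.values())

def keep_probability(word):
    x = word_counts[word] / total     # relative frequency of the word in the corpus
    p = (np.sqrt(x / sample_rate) + 1) * (sample_rate / x)
    return min(1.0, p)                # rare words get P(x) >= 1, i.e. they are always kept

def keep(word):
    # survive subsampling with probability P(x); frequent words are dropped more often
    return rng.random() < keep_probability(word)

for w in word_counts:
    print(w, round(keep_probability(w), 3))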
Negative Sampling
Negative sampling is a simplified form of the noise contrastive estimation (NCE) approach: it makes simplifying assumptions about the number of noise (negative) samples and the distribution they are drawn from.
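As a minimal sketch of how the negative (noise) samples can be drawn (assuming NumPy and the unigram distribution raised to the 3/4 power, which is the convention used in the original word2vec implementation; the counts and k below are illustrative):

import numpy as np

rng = np.random.default_rng(0)

word_counts = np.array([5000, 1200, 300, 80, 20])  # toy unigram counts for 5 words
noise_dist = word_counts ** 0.75                   # unigram distribution to the 3/4 power
noise_dist = noise_dist / noise_dist.sum()         # normalize into a probability distribution

k = 5             # number of negative samples per (centre, context) pair
positive_id = 2   # id of the true context word (illustrative)

# draw k noise word ids; in practice, draws that collide with the positive word are redrawn
negatives = rng.choice(len(word_counts), size=k, p=noise_dist)
print(negatives)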










