Today we will train a nonsensical language model!
We will first collect some language data, convert it to numbers, and then feed it to a recurrent neural network, asking it to predict upcoming words. When we are done we will have a machine that can generate sentences from our made-up language ad infinitum!
Now we can use the Vocab class to collect every word in our data and assign each one an integer index:
In [ ]:
vocab = Vocab()
for line in lines:
    vocab.add_words(line.split(" "))
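The Vocab class itself is not shown above; a minimal sketch of what it might look like (the internal attribute names word2index and index2word are assumptions, only add_words is taken from the code above) is:

```python
class Vocab:
    """Minimal vocabulary: assigns each new word the next integer index."""
    def __init__(self):
        self.word2index = {}   # word -> integer index
        self.index2word = []   # integer index -> word

    def add_words(self, words):
        for word in words:
            if word not in self.word2index:
                self.word2index[word] = len(self.index2word)
                self.index2word.append(word)

    def __len__(self):
        return len(self.index2word)

vocab = Vocab()
vocab.add_words("a b b c".split(" "))
print(vocab.word2index)  # {'a': 0, 'b': 1, 'c': 2}
```

Duplicate words are ignored, so each distinct word gets exactly one index, and len(vocab) gives the vocabulary size.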
To send our sentences to our neural network in one big chunk, we transform each sentence into a row vector and stack these rows into a single matrix. Not all sentences have the same length, so we pad those that are too short with 0s in pad_into_matrix:
In [168]:
import numpy as np

def pad_into_matrix(rows, padding=0):
    if len(rows) == 0:
        # no sentences: return an empty matrix and no lengths
        return np.empty([0, 0], dtype=np.int32), []
    # materialize the lengths as a list (in Python 3, map returns a
    # one-shot iterator that max() would exhaust)
    lengths = [len(row) for row in rows]
    width = max(lengths)
    height = len(rows)
    mat = np.empty([height, width], dtype=rows[0].dtype)
    mat.fill(padding)
    for i, row in enumerate(rows):
        mat[i, 0:len(row)] = row
    return mat, lengths
# transform into big numerical matrix of sentences:
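To see the padding in action, here is a quick check with two short illustrative rows (the function is repeated so the cell runs on its own):

```python
import numpy as np

def pad_into_matrix(rows, padding=0):
    # same function as defined above, repeated so this cell is self-contained
    if len(rows) == 0:
        return np.empty([0, 0], dtype=np.int32), []
    lengths = [len(row) for row in rows]
    width = max(lengths)
    mat = np.empty([len(rows), width], dtype=rows[0].dtype)
    mat.fill(padding)
    for i, row in enumerate(rows):
        mat[i, 0:len(row)] = row
    return mat, lengths

rows = [np.array([1, 2, 3], dtype=np.int32),
        np.array([4, 5], dtype=np.int32)]
mat, lengths = pad_into_matrix(rows)
print(mat)
# [[1 2 3]
#  [4 5 0]]
print(lengths)  # [3, 2]
```

The shorter sentence is padded out to the width of the longest one, and the original lengths are returned alongside the matrix so the network can ignore the padded positions later.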