Thread starter: oliyiyi

Data Preparation for Variable Length Input Sequences



Deep learning libraries assume a vectorized representation of your data.
In the case of variable length sequence prediction problems, this requires that your data be transformed such that each sequence has the same length.
This vectorization allows code to efficiently perform the matrix operations in batch for your chosen deep learning algorithms.
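As a rough illustration of that vectorized layout (a sketch added here, not part of the original tutorial, using plain NumPy), equal-length sequences can be stacked into a single array and then reshaped to the [samples, timesteps, features] form that Keras recurrent layers expect:

import numpy as np
# equal-length sequences stack cleanly into one 2D array of shape (samples, timesteps)
batch = np.array([[1, 2, 3, 4],
                  [0, 1, 2, 3],
                  [0, 0, 0, 1]])
print(batch.shape)  # (3, 4)
# Keras recurrent layers additionally expect a 3D [samples, timesteps, features] layout
X = batch.reshape((3, 4, 1))
print(X.shape)  # (3, 4, 1)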
In this tutorial, you will discover techniques that you can use to prepare your variable length sequence data for sequence prediction problems in Python with Keras.
After completing this tutorial, you will know:
  • How to pad variable length sequences with dummy values.
  • How to pad variable length sequences to a new longer desired length.
  • How to truncate variable length sequences to a shorter desired length.
Let’s get started.


[Image] Data Preparation for Variable-Length Input Sequences for Sequence Prediction. Photo by Adam Bautz, some rights reserved.

Overview

This section is divided into 3 parts; they are:
  • Contrived Sequence Problem
  • Sequence Padding
  • Sequence Truncation
Environment

This tutorial assumes you have a Python SciPy environment installed. You can use either Python 2 or 3 with this example.
This tutorial assumes you have Keras (v2.0.4+) installed with either the TensorFlow (v1.1.0+) or Theano (v0.9+) backend.
This tutorial also assumes you have scikit-learn, Pandas, NumPy, and Matplotlib installed.
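As a quick sanity check (a sketch added here, not part of the original tutorial; it only assumes the libraries named above), the installed versions can be printed before running the examples:

# print the versions of the assumed libraries
import keras
import tensorflow
import numpy
import sklearn
print("keras:", keras.__version__)            # v2.0.4 or newer assumed
print("tensorflow:", tensorflow.__version__)  # v1.1.0 or newer assumed
print("numpy:", numpy.__version__)
print("scikit-learn:", sklearn.__version__)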
If you need help setting up your Python environment, see this post:
Contrived Sequence Problem

We can contrive a simple sequence problem for the purposes of this tutorial.
The problem is defined as sequences of integers. There are three sequences with lengths between 1 and 4 timesteps, as follows:
  1. 1, 2, 3, 4
  2. 1, 2, 3
  3. 1

These can be defined as a list of lists in Python as follows (with spacing for readability):
sequences = [
    [1, 2, 3, 4],
    [1, 2, 3],
    [1]
]

We will use these sequences as the basis for exploring sequence padding in this tutorial.
Sequence Padding

The pad_sequences() function in the Keras deep learning library can be used to pad variable length sequences.
The default padding value is 0.0, which is suitable for most applications, although this can be changed by specifying the preferred value via the “value” argument. For example:

pad_sequences(..., value=99)

The padding to be applied to the beginning or the end of the sequence, called pre- or post-sequence padding, can be specified by the “padding” argument, as described in the two sections below.
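Before moving on to the padding direction, here is a minimal runnable sketch (added here, not part of the original tutorial) of the custom padding value applied to the same contrived sequences:

from keras.preprocessing.sequence import pad_sequences
# same contrived sequences as above
sequences = [
    [1, 2, 3, 4],
    [1, 2, 3],
    [1]
]
# pad with the value 99 instead of the default 0
padded = pad_sequences(sequences, value=99)
print(padded)

This should print the sequences pre-padded with 99 instead of 0:

[[ 1  2  3  4]
 [99  1  2  3]
 [99 99 99  1]]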
Pre-Sequence Padding

Pre-sequence padding is the default (padding='pre').
The example below demonstrates pre-padding the 3 input sequences with 0 values.
from keras.preprocessing.sequence import pad_sequences
# define sequences
sequences = [
    [1, 2, 3, 4],
    [1, 2, 3],
    [1]
]
# pad sequence
padded = pad_sequences(sequences)
print(padded)

Running the example prints the 3 sequences with zero values prepended.
[[1 2 3 4]
 [0 1 2 3]
 [0 0 0 1]]


Post-Sequence Padding

Padding can also be applied to the end of the sequences, which may be more appropriate for some problem domains.
Post-sequence padding can be specified by setting the “padding” argument to “post”.
from keras.preprocessing.sequence import pad_sequences
# define sequences
sequences = [
    [1, 2, 3, 4],
    [1, 2, 3],
    [1]
]
# pad sequence
padded = pad_sequences(sequences, padding='post')
print(padded)

Running the example prints the same sequences with zero values appended.
[[1 2 3 4]
 [1 2 3 0]
 [1 0 0 0]]


Pad Sequences To Length

The pad_sequences() function can also be used to pad sequences to a preferred length that may be longer than any observed sequence.
This can be done by setting the “maxlen” argument to the desired length. Padding will then be performed on all sequences to achieve the desired length, as follows.
from keras.preprocessing.sequence import pad_sequences
# define sequences
sequences = [
    [1, 2, 3, 4],
    [1, 2, 3],
    [1]
]
# pad sequence
padded = pad_sequences(sequences, maxlen=5)
print(padded)

Running the example pads each sequence to the desired length of 5 timesteps, even though the maximum length of an observed sequence is only 4 timesteps.
[[0 1 2 3 4]
 [0 0 1 2 3]
 [0 0 0 0 1]]


Sequence Truncation

The length of sequences can also be trimmed to a desired length.
The desired length for sequences can be specified as a number of timesteps with the “maxlen” argument.
There are two ways that sequences can be truncated: by removing timesteps from the beginning or the end of sequences.
Pre-Sequence Truncation

The default truncation method is to remove timesteps from the beginning of sequences. This is called pre-sequence truncation.
The example below truncates sequences to a desired length of 2.
from keras.preprocessing.sequence import pad_sequences
# define sequences
sequences = [
    [1, 2, 3, 4],
    [1, 2, 3],
    [1]
]
# truncate sequence
truncated = pad_sequences(sequences, maxlen=2)
print(truncated)

Running the example removes the first two timesteps from the first sequence, the first timestep from the second sequence, and pads the final sequence.
[[3 4]
 [2 3]
 [0 1]]


Post-Sequence Truncation

Sequences can also be trimmed by removing timesteps from the end of the sequences.
This approach may be more desirable for some problem domains.
Post-sequence truncation can be configured by changing the “truncating” argument from the default ‘pre’ to ‘post’, as follows:
from keras.preprocessing.sequence import pad_sequences
# define sequences
sequences = [
    [1, 2, 3, 4],
    [1, 2, 3],
    [1]
]
# truncate sequence
truncated = pad_sequences(sequences, maxlen=2, truncating='post')
print(truncated)

Running the example removes the last two timesteps from the first sequence, the last timestep from the second sequence, and again pads the final sequence.
[[1 2]
 [1 2]
 [0 1]]


Summary

In this tutorial, you discovered how to prepare variable length sequence data for use with sequence prediction problems in Python.
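As a closing recap (a sketch added here, not part of the original tutorial), padding and truncation can be combined in a single call to force every sequence to a fixed length:

from keras.preprocessing.sequence import pad_sequences
# same contrived sequences as above
sequences = [
    [1, 2, 3, 4],
    [1, 2, 3],
    [1]
]
# force every sequence to exactly 3 timesteps:
# longer sequences lose timesteps from the end, shorter ones get zeros appended
fixed = pad_sequences(sequences, maxlen=3, padding='post', truncating='post')
print(fixed)

This should print:

[[1 2 3]
 [1 2 3]
 [1 0 0]]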

Reply #2 | sunshine880607, 2017-6-19 09:26:35
00000000000

Reply #3 | MouJack007, 2017-6-19 11:23:18
Thanks to the thread starter for sharing!

Reply #4 | MouJack007, 2017-6-19 11:23:56

Reply #5 | h2h2, 2017-6-20 03:17:48
Thanks for sharing.

Reply #6 | minixi, 2017-6-20 10:16:48
Thanks for sharing.

Reply #7 | minixi, 2017-6-20 10:17:20
Thanks for sharing.