Multi-Task Learning in Tensorflow: Part 1

版主

泰斗

0%

还不是VIP/贵宾

-

TA的文库  其他...

计量文库

威望
7
论坛币
272091 个
通用积分
31269.1729
学术水平
1435 点
热心指数
1554 点
信用等级
1345 点
经验
383778 点
帖子
9599
精华
66
在线时间
5466 小时
注册时间
2007-5-21
最后登录
2024-3-21

初级学术勋章 初级热心勋章 初级信用勋章 中级信用勋章 中级学术勋章 中级热心勋章 高级热心勋章 高级学术勋章 高级信用勋章 特级热心勋章 特级学术勋章 特级信用勋章

Posted by oliyiyi on 2016-7-23 11:46:03

A discussion and step-by-step tutorial on how to use Tensorflow graphs for multi-task learning.

By Jonathan Godwin, University College London.

A Jupyter notebook accompanies this blog post.

Introduction


Why Multi-Task Learning

When you think about the way people learn to do new things, they often use their experience and knowledge of the world to speed up the learning process. When I learn a new language, especially a related one, I use my knowledge of languages I already speak to make shortcuts. The process works the other way too - learning a new language can help you understand and speak your own better.

Our brains learn to do multiple different tasks at the same time - we have the same brain architecture whether we are translating English to German or English to French. If we were to use a Machine Learning algorithm to do both of these tasks, we might call that ‘multi-task’ learning.

It’s one of the most interesting and exciting areas of research for Machine Learning in coming years, radically reducing the amount of data required to learn new concepts. One of the great promises of Deep Learning is that, with the power of the models and simple ways to share parameters between tasks, we should be able to make significant progress in multi-task learning.

As I started to experiment in this area I came across a bit of a road block - while it was easy to understand the architecture changes required to implement multi-task learning, it was harder to figure out how to implement it in Tensorflow. To do anything but standard nets in Tensorflow requires a good understanding of how it works, but most of the stock examples don’t provide helpful guidance. I hope the following tutorial explains some key concepts simply, and helps those who are struggling.

What We Are Going To Do


Part 1

  • Understand Tensorflow Computation Graphs With An Example. Doing multi-task learning with Tensorflow requires understanding how computation graphs work - skip if you already know.
  • Understand How We Can Use Graphs For Multi-Task Learning. We’ll go through an example of how to adapt a simple graph to do Multi-Task Learning.

Part 2

  • Build A Graph for POS Tagging and Shallow Parsing. We’ll fill in a template that trains a net for two related linguistic tasks. Don’t worry, you don’t need to know what they are!
  • Train A Net Jointly and Separately. We’ll actually train a model in two different ways. You should be able to do this on your laptop.

Understanding Computation Graphs With A Toy Example


The Computation Graph is the thing that makes Tensorflow (and other similar packages) fast. It's an integral part of the machinery of Deep Learning, but it can be confusing.

There are some neat features of a graph that mean it’s very easy to conduct multi-task learning, but first we’ll keep things simple and explain the key concepts.

Definition: Computation Graph

The Computation Graph is a template for the computation (i.e. the algorithm) you are going to run. It doesn't perform any calculations itself, but it means that your computer can conduct backpropagation far more quickly.

If you ask Tensorflow for a result of a calculation it will only make those calculations required for the job, not the whole graph.
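
To see this partial execution in action, here is a minimal sketch of my own (not from the original post): two ops share the placeholder A, but because we only ask for double_A, the op sum_AB is never computed and its placeholder B never needs to be fed.

import tensorflow as tf

A = tf.placeholder("float", name="A")
B = tf.placeholder("float", name="B")
double_A = tf.add(A, A, name="double_A")  # depends only on A
sum_AB = tf.add(A, B, name="sum_AB")      # depends on A and B

with tf.Session() as sess:
    # Only the ops needed for double_A are run, so B never has to be fed
    # and sum_AB is never computed.
    print(sess.run(double_A, {A: 3.0}))   # 6.0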

A Toy Example - Linear Transformation: Setting Up The Graph


We’re going to look at the graph for a simple calculation - a linear transformation of our inputs, and taking the square loss:



# Import tensorflow and numpy
import tensorflow as tf
import numpy as np

# ======================
# Define the Graph
# ======================

# Create Placeholders For X And Y (for feeding in data)
X = tf.placeholder("float", [10, 10], name="X")  # Our input is 10x10
Y = tf.placeholder("float", [10, 1], name="Y")   # Our output is 10x1

# Create a Trainable Variable, "W", our weights for the linear transformation
initial_W = np.zeros((10, 1))
W = tf.Variable(initial_W, name="W", dtype="float32")

# Define Your Loss Function
Loss = tf.pow(tf.add(Y, -tf.matmul(X, W)), 2, name="Loss")

There are a few things to emphasise about this graph:

  • If we were to run this code right now, we would get no output. Remember that a Computation Graph is just a template - it doesn’t do anything. If we want an answer, we have to tell Tensorflow to run the computation using a Session.
  • We haven’t explicitly created a graph object. You might expect that we would have to create a graph object somewhere in order for Tensorflow to know that we wanted to create a graph. In fact, by using the Tensorflow operations, we are telling Tensorflow what parts of our code are in the graph.
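
As a quick illustration of that second point (a sketch I've added, not from the post), you can inspect the default graph that Tensorflow maintains behind the scenes, or build into an explicit graph object if you prefer:

import tensorflow as tf

X = tf.placeholder("float", [10, 10], name="X")

# Every op created above was silently registered on the default graph.
print([op.name for op in tf.get_default_graph().get_operations()])  # ['X'] in a fresh script

# If you do want isolation, you can create an explicit graph yourself.
with tf.Graph().as_default() as g:
    Y = tf.placeholder("float", [10, 1], name="Y")
    print(Y.graph is g)  # True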

Tip: Keep Your Graph Separate. You'll typically be doing a fair amount of data manipulation and computation outside of the graph, which makes keeping track of what is and isn't available inside of Python a bit confusing. I like to put my graph in a separate file, and often in a separate class, to keep concerns separated, but this isn't required.
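
For example, a minimal sketch of that layout (my own illustration, with a made-up class name) might wrap the toy graph above in a class and keep the session code elsewhere:

import tensorflow as tf
import numpy as np

class LinearGraph(object):
    """Builds the linear-transformation graph once, at construction time."""
    def __init__(self):
        self.X = tf.placeholder("float", [10, 10], name="X")
        self.Y = tf.placeholder("float", [10, 1], name="Y")
        self.W = tf.Variable(np.zeros((10, 1)), name="W", dtype="float32")
        self.Loss = tf.pow(tf.add(self.Y, -tf.matmul(self.X, self.W)), 2, name="Loss")

# ... and in a separate training script:
graph = LinearGraph()
with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    print(sess.run(graph.Loss, {graph.X: np.random.rand(10, 10),
                                graph.Y: np.random.rand(10, 1)}))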

A Toy Example - Linear Transformation: Getting Results


Computations on your Graph are conducted inside a Tensorflow Session. To get results from your session you need to provide it with two things: Target Results and Inputs.

  • Target Results or Operations. You tell Tensorflow what parts of the graph you want to return values for, and it will automatically figure out which calculations within the graph need to be run. You can also call operations, for example, to initialise your variables.
  • Inputs As Required ('Feed Dict'). In most calculations you will provide the input data ad hoc. In this case, you construct the graph with a placeholder for this data, and feed it in at computation time. Not all calculations or operations will require an input - for many, all the information is already contained in the graph.
# Import tensorflow and numpy
import tensorflow as tf
import numpy as np

# ======================
# Define the Graph
# ======================

# Create Placeholders For X And Y (for feeding in data)
X = tf.placeholder("float", [10, 10], name="X")  # Our input is 10x10
Y = tf.placeholder("float", [10, 1], name="Y")   # Our output is 10x1

# Create a Trainable Variable, "W", our weights for the linear transformation
initial_W = np.zeros((10, 1))
W = tf.Variable(initial_W, name="W", dtype="float32")

# Define Your Loss Function
Loss = tf.pow(tf.add(Y, -tf.matmul(X, W)), 2, name="Loss")

with tf.Session() as sess:  # set up the session
    sess.run(tf.initialize_all_variables())
    Model_Loss = sess.run(
        Loss,  # the first argument is the Tensorflow variable you want returned
        {      # the second argument is the data for the placeholders
            X: np.random.rand(10, 10),
            Y: np.random.rand(10).reshape(-1, 1)
        })
    print(Model_Loss)

How To Use Graphs for Multi-Task Learning


When we create a Neural Net that performs multiple tasks we want to have some parts of the network that are shared, and other parts of the network that are specific to each individual task. When we’re training, we want information from each task to be transferred in the shared parts of the network.

So, to start, let’s draw a diagram of a simple two-task network that has a shared layer and a specific layer for each individual task. We’re going to feed the outputs of this into our loss function with our targets. I’ve labelled where we’re going to want to create placeholders in the graph.

#  GRAPH CODE
# ============

# Import tensorflow and numpy
import tensorflow as tf
import numpy as np

# ======================
# Define the Graph
# ======================

# Define the Placeholders
X = tf.placeholder("float", [10, 10], name="X")
Y1 = tf.placeholder("float", [10, 1], name="Y1")
Y2 = tf.placeholder("float", [10, 1], name="Y2")

# Define the weights for the layers (the weights need actual 10x20 and 20x1
# matrices as initial values, not just shape lists)
shared_layer_weights = tf.Variable(np.random.rand(10, 20), name="share_W", dtype="float32")
Y1_layer_weights = tf.Variable(np.random.rand(20, 1), name="share_Y1", dtype="float32")
Y2_layer_weights = tf.Variable(np.random.rand(20, 1), name="share_Y2", dtype="float32")

# Construct the Layers with RELU Activations
shared_layer = tf.nn.relu(tf.matmul(X, shared_layer_weights))
Y1_layer = tf.nn.relu(tf.matmul(shared_layer, Y1_layer_weights))
Y2_layer = tf.nn.relu(tf.matmul(shared_layer, Y2_layer_weights))

# Calculate Loss
Y1_Loss = tf.nn.l2_loss(Y1 - Y1_layer)
Y2_Loss = tf.nn.l2_loss(Y2 - Y2_layer)

When we are training this network, we want the parameters of the Task 1 layer to stay unchanged no matter how badly we do on Task 2, while the parameters of the shared layer change with both tasks. This might seem a little difficult - normally you only have one optimiser in a graph, because you only optimise one loss function. Thankfully, the properties of the graph make it very easy to train this sort of model, in two ways.



Posted by oliyiyi on 2016-7-23 11:52:04
Alternate Training


The first solution is particularly suited to situations where you’ll have a batch of Task 1 data and then a batch of Task 2 data.

Remember that Tensorflow automatically figures out which calculations are needed for the operation you requested, and only conducts those calculations. This means that if we define an optimiser on only one of the tasks, it will only train the parameters required to compute that task - and will leave the rest alone. Since Task 1 relies only on the Task 1 and Shared Layers, the Task 2 layer will be untouched. Let’s draw another diagram with the desired optimisers at the end of each task.

#  GRAPH CODE
# ============

# Import tensorflow and numpy
import tensorflow as tf
import numpy as np

# ======================
# Define the Graph
# ======================

# Define the Placeholders
X = tf.placeholder("float", [10, 10], name="X")
Y1 = tf.placeholder("float", [10, 20], name="Y1")
Y2 = tf.placeholder("float", [10, 20], name="Y2")

# Define the weights for the layers
initial_shared_layer_weights = np.random.rand(10, 20)
initial_Y1_layer_weights = np.random.rand(20, 20)
initial_Y2_layer_weights = np.random.rand(20, 20)

shared_layer_weights = tf.Variable(initial_shared_layer_weights, name="share_W", dtype="float32")
Y1_layer_weights = tf.Variable(initial_Y1_layer_weights, name="share_Y1", dtype="float32")
Y2_layer_weights = tf.Variable(initial_Y2_layer_weights, name="share_Y2", dtype="float32")

# Construct the Layers with RELU Activations
shared_layer = tf.nn.relu(tf.matmul(X, shared_layer_weights))
Y1_layer = tf.nn.relu(tf.matmul(shared_layer, Y1_layer_weights))
Y2_layer = tf.nn.relu(tf.matmul(shared_layer, Y2_layer_weights))

# Calculate Loss
Y1_Loss = tf.nn.l2_loss(Y1 - Y1_layer)
Y2_Loss = tf.nn.l2_loss(Y2 - Y2_layer)

# optimisers - one per task, each only updates the variables its loss depends on
Y1_op = tf.train.AdamOptimizer().minimize(Y1_Loss)
Y2_op = tf.train.AdamOptimizer().minimize(Y2_Loss)

We can conduct Multi-Task learning by alternately calling each task optimiser, which means we can continually transfer some of the information from each task to the other. In a loose sense, we are discovering the ‘commonality’ between the tasks. The following code implements this for our easy example. If you are following along, paste this at the bottom of the previous code:

# Calculation (Session) Code
# ==========================

# open the session
with tf.Session() as session:
    session.run(tf.initialize_all_variables())
    for iters in range(10):
        if np.random.rand() < 0.5:
            _, Y1_loss = session.run([Y1_op, Y1_Loss],
                                     {
                                         X: np.random.rand(10, 10) * 10,
                                         Y1: np.random.rand(10, 20) * 10,
                                         Y2: np.random.rand(10, 20) * 10
                                     })
            print(Y1_loss)
        else:
            _, Y2_loss = session.run([Y2_op, Y2_Loss],
                                     {
                                         X: np.random.rand(10, 10) * 10,
                                         Y1: np.random.rand(10, 20) * 10,
                                         Y2: np.random.rand(10, 20) * 10
                                     })
            print(Y2_loss)

Tips: When is Alternate Training Good?

Alternate training is a good idea when you have two different datasets for each of the different tasks (for example, translating from English to French and English to German). By designing a network in this way, you can improve the performance of each of your individual tasks without having to find more task-specific training data.

Alternate training is the most common situation you’ll find yourself in, because there aren’t that many datasets that have two or more outputs. We’ll come on to one example, but the clearest examples are where you want to build hierarchy into your tasks. For example, in vision, you might want one of your tasks to predict the rotation of an object, the other what the object would look like if you changed the camera angle. These two tasks are obviously related - in fact the rotation probably comes before the image generation.
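
As a rough sketch of how such a hierarchy might be wired (my own illustration with made-up layer sizes and names, not code from the post), the rotation prediction can be computed first from the shared features and then fed, alongside those features, into the image branch:

import tensorflow as tf
import numpy as np

X = tf.placeholder("float", [10, 10], name="X")

shared_W = tf.Variable(np.random.rand(10, 20), name="shared_W", dtype="float32")
rotation_W = tf.Variable(np.random.rand(20, 1), name="rotation_W", dtype="float32")
image_from_shared_W = tf.Variable(np.random.rand(20, 100), name="image_from_shared_W", dtype="float32")
image_from_rotation_W = tf.Variable(np.random.rand(1, 100), name="image_from_rotation_W", dtype="float32")

shared_layer = tf.nn.relu(tf.matmul(X, shared_W))
rotation = tf.matmul(shared_layer, rotation_W)   # Task 1: predict the rotation
image = tf.nn.relu(tf.matmul(shared_layer, image_from_shared_W)
                   + tf.matmul(rotation, image_from_rotation_W))  # Task 2 sees Task 1's output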

Tips: When is Alternate Training Less Good?

Alternate training can easily become biased towards a specific task. The first way is obvious - if one of your tasks has a far larger dataset than the other, then if you train in proportion to the dataset sizes your shared layer will contain more information about the more significant task.

The second is less obvious. If you train alternately, the final task in your model will create a bias in the parameters. There isn't any obvious way to overcome this problem, but it does mean that in circumstances where you don't have to train alternately, you shouldn't.
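
One common mitigation for the first kind of bias, sketched here as my own illustration (it reuses the X, Y1, Y2, Y1_op and Y2_op names from the alternate-training code above, and the datasets are made up): choose which task to train with a fixed 50/50 coin flip rather than in proportion to dataset size, cycling through each task's own batches.

# Assumes the alternate-training graph above has already been defined.
task1_X = [np.random.rand(10, 10) * 10 for _ in range(1000)]  # large (made-up) dataset
task2_X = [np.random.rand(10, 10) * 10 for _ in range(100)]   # small (made-up) dataset

with tf.Session() as session:
    session.run(tf.initialize_all_variables())
    for step in range(200):
        if np.random.rand() < 0.5:  # 0.5, not 1000 / (1000 + 100)
            session.run(Y1_op, {X: task1_X[step % len(task1_X)],
                                Y1: np.random.rand(10, 20) * 10})
        else:
            session.run(Y2_op, {X: task2_X[step % len(task2_X)],
                                Y2: np.random.rand(10, 20) * 10})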

Training at the Same Time - Joint Training


When you have a dataset with multiple labels for each input, what you really want is to train the tasks at the same time. The question is, how do you preserve the independence of the task-specific functions? The answer is surprisingly simple - you just add up the loss functions of the individual tasks and optimise on that. Below is a diagram that shows a network that can train jointly, with the accompanying code:

#  GRAPH CODE
# ============

# Import tensorflow and numpy
import tensorflow as tf
import numpy as np

# ======================
# Define the Graph
# ======================

# Define the Placeholders
X = tf.placeholder("float", [10, 10], name="X")
Y1 = tf.placeholder("float", [10, 20], name="Y1")
Y2 = tf.placeholder("float", [10, 20], name="Y2")

# Define the weights for the layers
initial_shared_layer_weights = np.random.rand(10, 20)
initial_Y1_layer_weights = np.random.rand(20, 20)
initial_Y2_layer_weights = np.random.rand(20, 20)

shared_layer_weights = tf.Variable(initial_shared_layer_weights, name="share_W", dtype="float32")
Y1_layer_weights = tf.Variable(initial_Y1_layer_weights, name="share_Y1", dtype="float32")
Y2_layer_weights = tf.Variable(initial_Y2_layer_weights, name="share_Y2", dtype="float32")

# Construct the Layers with RELU Activations
shared_layer = tf.nn.relu(tf.matmul(X, shared_layer_weights))
Y1_layer = tf.nn.relu(tf.matmul(shared_layer, Y1_layer_weights))
Y2_layer = tf.nn.relu(tf.matmul(shared_layer, Y2_layer_weights))

# Calculate Loss
Y1_Loss = tf.nn.l2_loss(Y1 - Y1_layer)
Y2_Loss = tf.nn.l2_loss(Y2 - Y2_layer)
Joint_Loss = Y1_Loss + Y2_Loss

# optimisers (the per-task optimisers are kept for comparison but not used below)
Optimiser = tf.train.AdamOptimizer().minimize(Joint_Loss)
Y1_op = tf.train.AdamOptimizer().minimize(Y1_Loss)
Y2_op = tf.train.AdamOptimizer().minimize(Y2_Loss)

# Joint Training
# Calculation (Session) Code
# ==========================

# open the session
with tf.Session() as session:
    session.run(tf.initialize_all_variables())
    _, joint_loss = session.run([Optimiser, Joint_Loss],
                                {
                                    X: np.random.rand(10, 10) * 10,
                                    Y1: np.random.rand(10, 20) * 10,
                                    Y2: np.random.rand(10, 20) * 10
                                })
    print(joint_loss)

Conclusions and Next Steps


In this post we’ve gone through the basic principles behind multi-task learning in deep neural nets. If you’ve used Tensorflow before, and have your own project, then hopefully this has given you enough to get started.

For those of you who want a more meaty, more detailed example of how this can be used to improve performance in multiple tasks, then stay tuned for part 2 of the tutorial where we’ll delve into Natural Language Processing to build a multi-task model for shallow parsing and part of speech tagging.

Bio: Jonathan Godwin is currently studying for an MSc in Machine Learning at UCL, with a specialism in deep multi-task learning for NLP. He will be finishing in September and will be looking for jobs/research roles where he can use this skill set on interesting problems.

