Original poster: oliyiyi

Analytical and Numerical Solutions to Linear Regression Problems


This exercise covers linear regression with both an analytical method (the normal equation) and a numerical method (gradient descent). We start with linear regression in one variable and create plots that help visualize how gradient descent finds the coefficient of the predictor and the intercept. In the second part, we implement linear regression with multiple variables.

Linear regression with one variable

In this part of the exercise, we will implement linear regression with one variable to predict profits for a food truck. Suppose you are the CEO of a restaurant franchise and are considering different cities for opening a new outlet. The chain already has trucks in various cities, and you have data on profits and populations for those cities. We would like to use this data to help us select which city to expand to next.
The file ex1data1.txt contains the dataset for our linear regression problem. The first column is the population of a city (in units of 10,000) and the second column is the profit of a food truck in that city (in units of $10,000). A negative value for profit indicates a loss.


library(ggplot2)
library(data.table)
library(magrittr)
library(caret)
library(fields)
library(plot3D)

Load the data and display the first 6 observations.

ex1data1 <- fread("ex1data1.txt", col.names = c("population", "profit"))
head(ex1data1)
   population  profit
1:     6.1101 17.5920
2:     5.5277  9.1302
3:     8.5186 13.6620
4:     7.0032 11.8540
5:     5.8598  6.8233
6:     8.3829 11.8860

Plotting the Data

Before starting on any task, it is often useful to understand the data by visualizing it. For this dataset, we can use a scatter plot to visualize the data, since it has only two properties to plot (profit and population). Many other problems that we encounter in real life are multi-dimensional and can’t be plotted on a 2-d plot.

ex1data1 %>% ggplot(aes(x = population, y = profit)) +
  geom_point(color = "blue", size = 4, alpha = 0.5) +
  ylab('Profit in $10,000s') +
  xlab('Population of City in 10,000s') +
  ggtitle('Figure 1: Scatter plot of training data') +
  theme(plot.title = element_text(size = 16, colour = "red"))

Here is the plot:

[Figure 1: Scatter plot of the training data (profit vs. population)]


Gradient Descent

In this part, we will fit the linear regression parameters θ to our dataset using gradient descent.
To implement gradient descent, we need to calculate the cost, which is given by:

J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2, \quad \text{where } h_\theta(x) = \theta^T x = \theta_0 + \theta_1 x_1

and m is the number of training examples.


Computing the cost J(θ)

As we perform gradient descent to minimize the cost function J(θ), it is helpful to monitor convergence by computing the cost. In this section, we will implement a function to calculate J(θ) so we can check the convergence of our gradient descent implementation.
But before we go any further, we need to add the intercept term to X, i.e. a leading column of ones so that θ0 is estimated along with the slope.

X <- cbind(1, ex1data1$population)   # prepend a column of ones for the intercept term
y <- ex1data1$profit
head(X)
     [,1]   [,2]
[1,]    1 6.1101
[2,]    1 5.5277
[3,]    1 8.5186
[4,]    1 7.0032
[5,]    1 5.8598
[6,]    1 8.3829

The function below calculates the cost based on the equation given above.

computeCost <- function(X, y, theta) {
  # squared errors between the predictions X %*% theta and the observed y
  z <- ((X %*% theta) - y)^2
  # J(theta): sum of squared errors divided by 2m
  sum(z) / (2 * nrow(X))
}

Now we can calculate the initial cost by initializing the parameters to 0.

theta <- matrix(rep(0, ncol(X)))   # start with all parameters at zero
round(computeCost(X, y, theta), 2)
32.07

Gradient descent

Next, we will implement gradient descent by calling the computeCost function above. A good way to verify that gradient descent is working correctly is to look at the value of J(θ) and check that it is decreasing with each step. Our value of J(θ) should never increase, and should converge to a steady value by the end of the algorithm.
The gradient descent function below returns the cost at every iteration, together with the estimated parameters, for the specified number of iterations and learning rate alpha.
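For reference, the loop below implements the standard batch gradient descent update, applied to all parameters simultaneously at every iteration:

\theta_j := \theta_j - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}

where \alpha is the learning rate and m is the number of training examples.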

gradientDescent <- function(X, y, theta, alpha, iters) {
  gd   <- list()
  cost <- rep(0, iters)
  for (k in 1:iters) {
    # accumulate the gradient of the cost for each parameter
    z <- rep(0, ncol(X))
    for (i in 1:ncol(X)) {
      for (j in 1:nrow(X)) {
        z[i] <- z[i] + ((X[j, ] %*% theta) - y[j]) * X[j, i]
      }
    }
    # simultaneous update of all parameters
    theta   <- theta - (alpha / nrow(X)) * z
    cost[k] <- computeCost(X, y, theta)
  }
  gd$theta <- theta
  gd$cost  <- cost
  gd
}
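The nested loops above are easy to follow but slow in R. As a side sketch (not part of the original exercise), the same update can be written in vectorized form and should produce the same parameters far more quickly:

gradientDescentVec <- function(X, y, theta, alpha, iters) {
  m    <- nrow(X)
  cost <- rep(0, iters)
  for (k in 1:iters) {
    # t(X) %*% (X %*% theta - y) is the gradient summed over all training examples
    theta   <- theta - (alpha / m) * (t(X) %*% (X %*% theta - y))
    cost[k] <- computeCost(X, y, theta)
  }
  list(theta = theta, cost = cost)
}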

Now, let's use the gradientDescent function to find the parameters, making sure that our cost never increases. Let's use 1500 iterations and a learning rate alpha of 0.01 for now. Later, we will see the effect of these values in our application.

iterations <- 1500
alpha <- 0.01
theta <- matrix(rep(0, ncol(X)))
gradientDescent_results <- gradientDescent(X, y, theta, alpha, iterations)
theta <- gradientDescent_results$theta
theta
          [,1]
[1,] -3.630291
[2,]  1.166362

Plotting the cost as a function of the number of iterations

If our implementation is correct, the cost should be decreasing, or at the very least never increasing. Let's plot the cost as a function of the number of iterations to make sure our implementation makes sense.

data.frame(Cost = gradientDescent_results$cost, Iterations = 1:iterations) %>%
  ggplot(aes(x = Iterations, y = Cost)) + geom_line(color = "blue") +
  ggtitle("Cost as a function of the number of iterations") +
  theme(plot.title = element_text(size = 16, colour = "red"))

Here is the plot:

[Figure: Cost as a function of the number of iterations]


As the plot above shows, the cost decreases with the number of iterations and comes close to convergence within 1500 iterations.




Plot the linear fit

Now, since we know the parameters (slope and intercept), we can plot the linear fit on the scatter plot.

ex1data1 %>% ggplot(aes(x = population, y = profit)) +
  geom_point(color = "blue", size = 3, alpha = 0.5) +
  ylab('Profit in $10,000s') +
  xlab('Population of City in 10,000s') +
  ggtitle('Figure 1: Scatter plot of training data') +
  geom_abline(intercept = theta[1], slope = theta[2], col = "red", show.legend = TRUE) +
  theme(plot.title = element_text(size = 16, colour = "red")) +
  annotate("text", x = 12, y = 20,
           label = paste0("Profit = ", round(theta[1], 4), "+", round(theta[2], 4), " * Population"))

Here is the plot:

[Figure: Training data with the fitted regression line]
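With the fitted parameters we can also make predictions. The sketch below (the population values are purely illustrative) predicts the profit for cities of 35,000 and 70,000 people; remember that both variables are measured in units of 10,000:

# predict profit for a given population (both in units of 10,000s)
predict_profit <- function(population, theta) {
  c(1, population) %*% theta
}
predict_profit(3.5, theta) * 10000   # predicted profit in dollars for a city of 35,000
predict_profit(7.0, theta) * 10000   # predicted profit in dollars for a city of 70,000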


Visualizing J(θ)

To understand the cost function J(θ) better, let’s now plot the cost over a 2-dimensional grid of θ0 and θ1 values. The global minimum is the optimal point for θ0 and θ1, and each step of gradient descent moves closer to this point.

Intercept <- seq(from = -10, to = 10, length = 100)
Slope <- seq(from = -1, to = 4, length = 100)
# initialize cost values to a matrix of 0's
Cost <- matrix(0, length(Intercept), length(Slope))
for (i in 1:length(Intercept)) {
  for (j in 1:length(Slope)) {
    t <- c(Intercept[i], Slope[j])
    Cost[i, j] <- computeCost(X, y, t)
  }
}
persp3D(Intercept, Slope, Cost, theta = -45, phi = 25, expand = 0.75, lighting = TRUE,
        ticktype = "detailed", xlab = "Intercept", ylab = "Slope",
        zlab = "", axes = TRUE, main = "Surface")
image.plot(Intercept, Slope, Cost, main = "Contour, showing minimum")
contour(Intercept, Slope, Cost, add = TRUE, n = 30, labels = '')
points(theta[1], theta[2], col = 'red', pch = 4, lwd = 6)

Here are the plots:

[Figure: Surface plot of the cost J(θ) over the intercept and slope]

[Figure: Contour plot of the cost, showing the minimum]


Normal Equation

Since linear regression has a closed-form solution, we can also solve for θ analytically; this approach is known as the normal equation and is given by the formula below. With the normal equation we do not need to iterate or choose a learning rate. However, we do need to compute the inverse of a matrix, which can make it slow when the number of features is very large. Gradient descent, on the other hand, is applicable to many other machine learning techniques as well, and it is the more appropriate method even for linear regression when the number of observations is very large.

\theta = (X^T X)^{-1} X^T y

theta2 <- solve(t(X) %*% X) %*% t(X) %*% y
theta2
          [,1]
[1,] -3.895781
[2,]  1.193034
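As a side note (not part of the original exercise), it is usually preferable in R to solve the normal equations without forming the explicit inverse; an equivalent, generally more numerically stable sketch:

# solve (X'X) theta = X'y directly instead of computing the inverse of X'X
theta2_alt <- solve(crossprod(X), crossprod(X, y))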

There is a very small difference between the parameters we got from the normal equation and from gradient descent. Let's increase the number of iterations and see whether they get closer to each other. Here, the number of iterations is increased from 1500 to 15,000.

iterations <- 15000
alpha <- 0.01
theta <- matrix(rep(0, ncol(X)))
gradientDescent_results <- gradientDescent(X, y, theta, alpha, iterations)
theta <- gradientDescent_results$theta
theta
          [,1]
[1,] -3.895781
[2,]  1.193034

As you can see, the results from the normal equation and from gradient descent are now the same (to the precision shown).
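A quick numerical check of how close the two solutions actually are (a small sketch, not part of the original exercise):

# largest absolute difference between the gradient descent and normal equation estimates
max(abs(theta - theta2))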

Using caret package

By the way, we can also use packages developed by experts in the field to perform our machine learning tasks. There are many machine learning packages in R for different types of machine learning tasks. To verify that we get the same results, let's use the caret package, which is among the most commonly used machine learning packages in R.

my_lm <- train(profit ~ population, data = ex1data1, method = "lm")
my_lm$finalModel$coefficients
(Intercept) -3.89578087831187
population   1.1930336441896

Linear regression with multiple variables

In this part of the exercise, we will implement linear regression with multiple variables to predict the prices of houses. Suppose you are selling your house and you want to know what a good market price would be. One way to do this is to first collect information on recently sold houses and build a model of housing prices. The file ex1data2.txt contains a training set of housing prices in Portland, Oregon. The first column is the size of the house (in square feet), the second column is the number of bedrooms, and the third column is the price of the house.

ex1data2 <- fread("ex1data2.txt", col.names = c("size", "bedrooms", "price"))
head(ex1data2)
   size bedrooms  price
1: 2104        3 399900
2: 1600        3 329900
3: 2400        3 369000
4: 1416        2 232000
5: 3000        4 539900
6: 1985        4 299900

Feature Normalization

House sizes are about 1000 times the number of bedrooms. When features differ by orders of magnitude, first performing feature scaling can make gradient descent converge much more quickly.

ex1data2 <- as.data.frame(ex1data2)
# standardize each feature column (size and bedrooms); the target (price) is left unscaled
for (i in 1:(ncol(ex1data2) - 1)) {
  ex1data2[, i] <- (ex1data2[, i] - mean(ex1data2[, i])) / sd(ex1data2[, i])
}
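One caveat: the loop above standardizes the columns in place, so the original means and standard deviations are lost, yet they are needed to scale any new house before predicting with the gradient descent parameters. A minimal sketch (the names ex1data2_raw, mu, and sigma are my own):

# keep the original scaling parameters; re-read the raw file because the
# columns of ex1data2 have already been standardized in place above
ex1data2_raw <- fread("ex1data2.txt", col.names = c("size", "bedrooms", "price"))
mu    <- sapply(ex1data2_raw[, .(size, bedrooms)], mean)
sigma <- sapply(ex1data2_raw[, .(size, bedrooms)], sd)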

Gradient Descent

Previously, we implemented gradient descent on a univariate regression problem. The only difference now is that there is one more feature in the matrix X. The hypothesis function and the batch gradient descent update rule remain unchanged.

X <- cbind(1, ex1data2$size, ex1data2$bedrooms)
y <- ex1data2$price
head(X)
     [,1]        [,2]       [,3]
[1,]    1  0.13000987 -0.2236752
[2,]    1 -0.50418984 -0.2236752
[3,]    1  0.50247636 -0.2236752
[4,]    1 -0.73572306 -1.5377669
[5,]    1  1.25747602  1.0904165
[6,]    1 -0.01973173  1.0904165

iterations <- 6000
alpha <- 0.01
theta <- matrix(rep(0, ncol(X)))
gradientDescent_results <- gradientDescent(X, y, theta, alpha, iterations)
theta <- gradientDescent_results$theta
theta
          [,1]
[1,] 340412.660
[2,] 110631.050
[3,]  -6649.474
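Using the saved mu and sigma from the sketch above, we could then predict the price of, say, a 1650-square-foot house with 3 bedrooms (values chosen only for illustration):

# scale the new observation with the training means/standard deviations, then predict
x_new <- c(1,
           (1650 - mu["size"]) / sigma["size"],
           (3    - mu["bedrooms"]) / sigma["bedrooms"])
x_new %*% theta   # predicted price in dollars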

Normal Equation

theta2 <- solve(t(X) %*% X) %*% t(X) %*% y
theta2
           [,1]
[1,] 340412.660
[2,] 110631.050
[3,]  -6649.474

Using caret package

ex1data2 <- fread("ex1data2.txt", col.names = c("size", "bedrooms", "price"))
my_lm <- train(price ~ size + bedrooms, data = ex1data2, method = "lm",
               preProcess = c("center", "scale"))
my_lm$finalModel$coefficients
(Intercept) 340412.659574468
size        110631.050278846
bedrooms     -6649.4742708198
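As a usage sketch (the house size and bedroom count are again only illustrative), the fitted caret model can also be used directly for prediction, with the centering and scaling applied automatically:

# predict the price of a 1650 sq ft house with 3 bedrooms
predict(my_lm, newdata = data.frame(size = 1650, bedrooms = 3))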

Summary

In this post, we saw how to implement numerical and analytical solutions to linear regression problems using R. We also used caret, a popular R machine learning package, to verify our results. The data sets are from the Coursera machine learning course taught by Andrew Ng, which is offered with Matlab/Octave; here the exercises are done in R. You can get the code from this Github repository.



