[有偿编程] 求高手用R或者其他软件解决一元个非参数回归方面的问题 [推广有奖]

21楼

起个好名字 发表于 2011-6-6 16:19:31 |只看作者 |坛友微信交流群

2# 楚韵荆风
> install.packages("np")
--- Please select a CRAN mirror for use in this session ---
警告: unable to access index for repository http://www.stats.ox.ac.uk/pub/RWin/bin/windows/contrib/2.5
试开URL’http://ftp.ctex.org/mirrors/CRAN ... b/2.5/np_0.13-1.zip
Content type 'application/zip' length 877145 bytes
打开了URL
downloaded 856Kb
package 'np' successfully unpacked and MD5 sums checked
The downloaded packages are in
      C:\Documents and Settings\Administrator\Local Settings\Temp\RtmpkkEDAb\downloaded_packages
updating HTML package descriptions
> library(np)
载入需要的程辑包：boot
Nonparametric Kernel Methods for Mixed Datatypes (version 0.13-1)
Warning message:
程辑包'np'是用R版本2.5.1 来建造的
> library(boot)
> install.packages("cubature")
警告: unable to access index for repository http://www.stats.ox.ac.uk/pub/RWin/bin/windows/contrib/2.5
Warning message:
package 'cubature' is not available
> library(cubature)
错误于library(cubature) : 不存在叫'cubature'这个名字的程辑包
>
> # 把数据文件放在根目录文件的下面，或指定你的根目录到数据文件所在的文件夹
> dat=read.table("income.txt",head=TRUE)
> cuh=dat$income
> icup=dat$payment
> plot(cuh,icup,xlab="城镇居民的可支配收入",ylab="城镇居民的消费支出")
> # 使用非参数h核回归方法，通过最小二乘交叉验证方法(cv.ls)选择最优带宽，核函数为高斯核
> bw1=npregbw(icup~cuh,regtype="lc",bwmethod="cv.ls",bwscaling=FALSE,
+       ckerorder=2,ckertype="gaussian")$bw          # 最优带宽 bw=942.80058
> fit=npreg(icup~cuh,bws=bw1, gradients=TRUE,residuals = TRUE,regtype="ll")
错误于toFrame(txdat) : txdat must be a data frame, matrix, vector, or factor
> summary(fit)                #  拟合优度 R^2=0.999914
错误于summary(fit) : 找不到这个对象"fit"
> fitted(fit)                   #  拟合值
错误于fitted(fit) : 找不到这个对象"fit"
> [1]  1284.667  1432.575  1687.863  2124.774  2859.107  3483.401  3910.878
错误: syntax error, unexpected '['在"["里
> [8]  4148.837  4340.685  4647.390  4951.708  5369.195  5977.431  6522.083
错误: syntax error, unexpected '['在"["里
> [15]  7182.792  7907.363  8706.570  9994.640 11267.518 12264.079
错误: syntax error, unexpected '['在"["里
> resid(fit)                   #  残差
错误于resid(fit) : 找不到这个对象"fit"
> mse=sum(resid(fit)^2);mse    # 残差平方和 17055.17
错误于resid(fit) : 找不到这个对象"fit"
>
> # 用最小二乘线性回归
> fit2=lm(icup~cuh)
> summary(fit2)                #  拟合优度 R^2= 0.9978
Call:
lm(formula = icup ~ cuh)
Residuals:
Min    1Q  Median    3Q    Max
-234.11 -156.76 65.64  127.45  193.31
Coefficients:
         Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.586e+02  6.518e+01 7.036 1.45e-06 ***
cuh       6.982e-01  7.657e-03  91.181  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 153.8 on 18 degrees of freedom
Multiple R-Squared: 0.9978,    Adjusted R-squared: 0.9977
F-statistic:  8314 on 1 and 18 DF,  p-value: < 2.2e-16
> mse2=sum(resid(fit2)^2);mse2 # 残差平方和  425898.2
[1] 425898.2
>
> # 从残差平方和可以说明非参数回归方法比最小二乘回归方法你和结果要好
>

这个  怎么解决啊  谢谢啦

使用道具举报

22楼

zhangtao 发表于 2011-6-6 17:18:13 |只看作者 |坛友微信交流群

Regression Data: 20 training points, in 1 variable(s)
               cuh
Bandwidth(s): 942.8058
Kernel Regression Estimator: Local-Linear
Bandwidth Type: Fixed
Residual standard error: 852.7584
R-squared: 0.999914
Continuous Kernel Type: Second-Order Gaussian
No. Continuous Explanatory Vars.: 1
> fitted(fit)                   #  拟合值
[1]  1284.667  1432.575  1687.863  2124.774  2859.107  3483.401  3910.878
[8]  4148.837  4340.685  4647.390  4951.708  5369.195  5977.431  6522.083
[15]  7182.792  7907.363  8706.570  9994.640 11267.518 12264.079
> resid(fit)                   #  残差
      1          2          3          4          5          6
-5.7773738  21.2352773 -16.1333188 -13.9640206  -7.7670799  54.1694911
      7          8          9       10       11       12
  8.5923955  36.8027179  -9.0753458 -31.4800886  46.2916381 -60.1853126
      13       14       15       16       17       18
52.4485839 -11.1431333  -0.6923517  35.5168767 -10.0195975 2.8296573
      19       20
-24.6683374 0.4712892
> mse=sum(resid(fit)^2);mse    # 残差平方和 17055.17
[1] 17055.17
>
> # 用最小二乘线性回归
> fit2=lm(icup~cuh)
> summary(fit2)                #  拟合优度 R^2= 0.9978
Call:
lm(formula = icup ~ cuh)
Residuals:
Min    1Q  Median    3Q    Max
-234.11 -156.76 65.64  127.45  193.31
Coefficients:
         Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.586e+02  6.518e+01 7.036 1.45e-06 ***
cuh       6.982e-01  7.657e-03  91.181  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 153.8 on 18 degrees of freedom
Multiple R-squared: 0.9978,    Adjusted R-squared: 0.9977
F-statistic:  8314 on 1 and 18 DF,  p-value: < 2.2e-16
> mse2=sum(resid(fit2)^2);mse2 # 残差平方和  425898.2
[1] 425898.2
>
> # 从残差平方和可以说明非参数回归方法比最小二乘回归方法你和结果要好
>

已有 1 人评分	学术水平	热心指数	信用等级	收起理由
起个好名字	+ 1	+ 1	+ 1	谢谢您啦

总评分: 学术水平 + 1 热心指数 + 1 信用等级 + 1 查看全部评分

使用道具举报

23楼

zhangtao 发表于 2011-6-6 17:21:10 |只看作者 |坛友微信交流群

Gcvh <- function(x,y) {
+ n = length(y);
+ m = y;
+ cv0 = 1.0e50;
+ s = matrix(0, n, 1)
+ for (hi in 1:20)
+ {h = hi/20*(max(x)-min(x))/2
+ for (i in 1:n)
+ {k = exp( -(x - x[i])^2/(2*h*h))/sqrt(2*3.14)/h
+ m[i] = t(k) %*% y/sum(k)
+ s[i] = k[i]/sum(k); }
+ cv = mean( (y-m)^2)/( 1 - sum(s)/n)^2;
+ if (cv < cv0) { cv0 =cv
+ h0 = h } }
+ return(h0) }
>
> ks <- function(x, y, xnew=x, bandwidth=0.5)
+ {
+ n = length(y)
+ hatm = matrix(0, n);
+ for (i in 1:n)
+ { k = exp( -(x - x[i])^2/(2*bandwidth*bandwidth))/sqrt(2*3.14)/bandwidth
+ hatm[i] = t(k) %*% y/sum(k)
+ }
+ hatsigma2 = mean((y - hatm)^2)
+ m = matrix(0, length(xnew), 1)
+
+ for (i in 1:length(xnew)) {
+ k = exp( -(x - xnew[i])^2/(2*bandwidth*bandwidth))/sqrt(2*3.14)/bandwidth
+ m[i] = t(k) %*% y/sum(k)
+ }
+ return(list(m=m, hatsigma2 =hatsigma2 ))
+ }
>
> x = c(1510.2,1700.6,2026.6,2577.4,3496.2,4283.0,4838.9,5160.3,5425.1,5854.0,6280.0,6859.6,7702.8,8472.2,
+ 9421.6,10493.0,11759.5,13785.8,15780.8,17174.7)/1000
> y = c(1278.89,1453.81,1671.73,2110.81,2851.34,3537.57,3919.47,4185.64,4331.61,4615.91,4998.00,
+ 5309.01,6029.88,6510.94,7182.10,7942.88,8696.55,9997.47,11242.85,12264.55)/1000
> plot(x,y)
> h = Gcvh(x,y);
> fit = ks(x, y, bandwidth=h)
> lines(x,fit$m)
> fit$m
         [,1]
[1,]  1.424851
[2,]  1.473347
[3,]  1.609928
[4,]  1.987675
[5,]  2.889692
[6,]  3.608146
[7,]  4.004369
[8,]  4.177313
[9,]  4.316095
[10,]  4.602013
[11,]  4.906774
[12,]  5.268427
[13,]  6.027866
[14,]  6.482112
[15,]  7.165776
[16,]  7.929282
[17,]  8.692537
[18,]  9.997471
[19,] 11.244656
[20,] 12.262741
> fit$hatsigma2
[1] 0.003305470
>

已有 1 人评分	学术水平	热心指数	信用等级	收起理由
起个好名字	+ 1	+ 1	+ 1	谢谢您

总评分: 学术水平 + 1 热心指数 + 1 信用等级 + 1 查看全部评分

使用道具举报

24楼

zhangtao 发表于 2011-6-6 17:23:38 |只看作者 |坛友微信交流群

最优带宽 bw=942.80058

请问一下：这个最优带宽 bw=942.80058是怎么算出来的？是怎么来的？

本文来自: 人大经济论坛 S-Plus&R专版版，详细出处参考：https://bbs.pinggu.org/viewthread ... 3&from^^uid=11232
本文来自: 人大经济论坛 S-Plus&R专版版，详细出处参考：https://bbs.pinggu.org/viewthread ... 3&from^^uid=11232

比较一下还是2楼朋友的代码好！
dat=read.table("income.txt",head=TRUE)
> cuh=dat$income
> icup=dat$payment
> plot(cuh,icup,xlab="城镇居民的可支配收入",ylab="城镇居民的消费支出")
> # 使用非参数h核回归方法，通过最小二乘交叉验证方法(cv.ls)选择最优带宽，核函数为高斯核
> bw1=npregbw(icup~cuh,regtype="lc",bwmethod="cv.ls",bwscaling=FALSE,
+       ckerorder=2,ckertype="gaussian")$bw          # 最优带宽 bw=942.80058

Multistart 1 of 1 |
Multistart 1 of 1 |
Multistart 1 of 1 |
Multistart 1 of 1 /
Multistart 1 of 1 |
Multistart 1 of 1 |

> fit=npreg(icup~cuh,bws=bw1, gradients=TRUE,residuals = TRUE,regtype="ll")
> summary(fit)                #  拟合优度 R^2=0.999914

使用道具举报

25楼

zhangtao 发表于 2011-6-6 17:25:04 |只看作者 |坛友微信交流群

图片没有附上

请问一下：这个最优带宽 bw=942.80058是怎么算出来的？是怎么来的？

使用道具举报

26楼

ywh19860616 发表于 2011-6-6 17:43:49 |只看作者 |坛友微信交流群

zhangtao 发表于 2011-6-6 17:25
图片没有附上

请问一下：这个最优带宽 bw=942.80058是怎么算出来的？是怎么来的？

> bw1=npregbw(icup~cuh,regtype="lc",bwmethod="cv.ls",bwscaling=FALSE,
+ ckerorder=2,ckertype="gaussian")$bw # 最优带宽 bw=942.80058

bwmethod="cv.ls这个就是用cross-validation选择的

pls see help in detail

bwmethod

which method to use to select bandwidths. cv.aic specifies expected Kullback-Leibler cross-validation (Hurvich, Simonoff, and Tsai (1998)), and cv.ls specifies least-squares cross-validation. Defaults to cv.ls.

一份耕耘，一份收获。

使用道具举报