SAS 9.1.3 sp4 Demo - Page 3

Anonymous user

Post #21

Anonymous user, posted 2007-3-13 17:55:00

Post #22

maximus11111, posted 2007-3-13 18:36:00

Thanks!

Anonymous user

Post #23

Anonymous user, posted 2007-3-16 01:49:00

<TABLE cellSpacing=0 cellPadding=4 width="100%" border=0>

<TR>
<TD vAlign=top align=middle colSpan=2><FONT face=Tahoma><B>Send results inside SAS to your email</B></FONT></TD></TR>
<TR>
<TD vAlign=top align=middle colSpan=2><FONT face=Tahoma><FONT color=#666666>2006-07-02</FONT> Zhiyong Zhang</FONT></TD></TR>
<TR>
<TD vAlign=top align=middle colSpan=2><FONT face=Tahoma>Cite this page: <FONT color=blue>Zhiyong Zhang (2006). <I>Send results inside SAS to your email</I>. Retrieved March 15, 2007, from http://www.psychstat.org/us/article.php/57.htm.</FONT> </FONT></TD></TR>
<TR>
<TD vAlign=top bgColor=#eeeeee colSpan=2 height=1><FONT face=Tahoma></FONT></TD></TR>
<TR>
<TD vAlign=top colSpan=2>
<TABLE cellSpacing=1 cellPadding=4 width="100%" border=0>

<TR>
<TD vAlign=top>
<DIV class=subhead><B>http://www.psychstat.org/us/article.php?articleid=57</B></DIV></TD></TR>
<TR>
<TD vAlign=top>
<DIV class=content>
<P>The idea is to first save your output to a file and then send that file to your mailbox as an email attachment. We illustrate the process with an example.</P>
<P>1. This example first creates a data set called sendmail and then prints the contents of the data set to the output.</P>
<BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px">
<P><FONT color=#0000ff>/*Create an example*/<BR>data sendmail;<BR>input test;<BR>cards;<BR>1<BR>2<BR>3<BR>5<BR>;<BR>run;</FONT></P>
<P><FONT color=#0000ff>proc print data=sendmail;<BR>run;</FONT></P></BLOCKQUOTE>
<P>2. The output and the log are then saved to the files test.out and test.log with code such as the following:</P>
<BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px">
<P><FONT color=#0000ff>/*Save the output and the log into files*/<BR>DM OUTPUT 'FILE "c:\send\test.OUT"';<BR>DM LOG 'FILE "c:\send\test.LOG"';</FONT></P></BLOCKQUOTE>
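<P dir=ltr>The DM statement issues SAS windowing-environment commands, so step 2 works only in an interactive session. A sketch of an alternative that also works in batch jobs is PROC PRINTTO, which routes procedure output and the log to files directly. It assumes the sendmail data set from step 1 and simply reuses the file names of step 2.</P>
<BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px">
<P><FONT color=#0000ff>/*Route procedure output and the log to files; NEW replaces any existing contents*/<BR>proc printto print="c:\send\test.OUT" log="c:\send\test.LOG" new;<BR>run;</FONT></P>
<P><FONT color=#0000ff>proc print data=sendmail;<BR>run;</FONT></P>
<P><FONT color=#0000ff>/*Restore the default output and log destinations*/<BR>proc printto;<BR>run;</FONT></P></BLOCKQUOTE>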
<P dir=ltr>3. Finally, we send the output file to your email address as an attachment with code such as the following:</P>
<BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px">
<P dir=ltr><FONT color=#0000ff>/*Send the output to your email*/<BR>data _null_;<BR>call system('c:\send\mailcmd "mail.xxxx.xxxx" "xxxx@gmail.com" "xxxx@gmail.com" "Subject: SAS results" "Contents: SAS results" "" "c:\send\test.OUT"');<BR>run;</FONT> </P></BLOCKQUOTE>
<P dir=ltr>This step uses mailcmd, a DOS command-line program that can be downloaded from <a href="http://www.sms4mail.com/download/mailcmd.zip" target="_blank" ><FONT color=#000000>http://www.sms4mail.com/download/mailcmd.zip</FONT></A>. In this example, mailcmd is placed in the folder "c:\send". Replace the placeholders "mail.xxxx.xxxx" and "xxxx@gmail.com" with your own mail server and address.</P>
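<P dir=ltr>Instead of an external program, you can also use SAS's own e-mail access method, the FILENAME statement with the EMAIL device type. The sketch below assumes that your SAS session is already set up to send mail through an SMTP server (the EMAILSYS= and EMAILHOST= system options, which may need to be set when SAS starts), and it reuses the placeholder address from step 3.</P>
<BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px">
<P dir=ltr><FONT color=#0000ff>/*Alternative to step 3; assumes SMTP e-mail is configured for this session*/<BR>filename outmail email "xxxx@gmail.com"<BR>subject="SAS results"<BR>attach="c:\send\test.OUT";</FONT></P>
<P dir=ltr><FONT color=#0000ff>/*Each PUT statement writes a line of the message body; the message is sent when the step ends*/<BR>data _null_;<BR>file outmail;<BR>put "SAS results";<BR>run;</FONT></P></BLOCKQUOTE>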
<P dir=ltr>After you run steps 1 to 3, you will receive an email with the file test.out as an attachment.</P></DIV></TD></TR></TABLE></TD></TR></TABLE>

Anonymous user

Post #24

Anonymous user, posted 2007-3-17 00:31:00

<TABLE cellSpacing=1 cellPadding=4 width="100%" border=0>

<TR>
<TD vAlign=top>
<DIV class=subhead><B>A complete SAS program to iteratively run WinBUGS for Monte Carlo simulation</B></DIV></TD></TR>
<TR>
<TD vAlign=top>
<DIV class=content>
<P>Download the PDF file with notes: <a href="http://www.psychstat.org/us/upload/Using%20WinBUGS%20from%20SAS.pdf" target="_blank" ><FONT color=#000000>PDF file</FONT></A></P>
<P>TITLE 'Run WinBUGS from SAS: A confirmatory factor Example by Zhang et al. (2006)';</P>
<P>/*WinBUGS program for CFA*/<BR>FILENAME model "C:\zzy\research\SASWinBUGS\cfamodel.txt";<BR>DATA model;<BR>INPUT model $80.;<BR>/*The model lines start after CARDS; each subject i has its own factor score fscore[i]*/<BR>CARDS;<BR>model{<BR>for (i in 1:200) {<BR> for (t in 1:4){<BR>       y[i,t]~dnorm(muy[i,t],Inv_sig2[t])<BR>       muy[i,t]<-fload[t]*fscore[i]<BR>}<BR>fscore[i]~dnorm(0, 1)<BR><BR>}<BR>for (t in 1:4){<BR>fload[t]~dnorm(0, 1.0E-6)<BR>Inv_sig2[t]~dgamma(0.001, .001)<BR>Para[t]<-fload[t]<BR>Para[t+4]<-1/Inv_sig2[t]<BR>}<BR>}<BR>;<BR>RUN;<BR>DATA _NULL_;<BR>  SET model;<BR>  FILE model;<BR>  PUT model;<BR>RUN;</P>
<P>/*Starting values*/<BR>DATA _NULL_;<BR>FILE "C:\zzy\research\SASWinBUGS\cfaini.txt";<BR>PUT "list(fload=c(.5,.5,.5,.5), Inv_sig2=c(1,1,1,1))";<BR>RUN;</P>
<P>/*Scripts to run WinBUGS*/<BR>FILENAME runcfa 'c:\program files\winbugs14\runcfa.txt';<BR>DATA _NULL_;<BR>  FILE runcfa;<BR>  PUT@1 "display('log')";<BR>  PUT@1 "check('C:/zzy/research/SASWinBUGS/cfamodel.txt')" ;<BR>  PUT@1 "data('C:/zzy/research/SASWinBUGS/cfadata.txt')";<BR>  PUT@1 "compile(1)";<BR>  PUT@1 "inits(1, 'C:/zzy/research/SASWinBUGS/cfaini.txt')";<BR>  PUT@1 "gen.inits()";<BR>  PUT@1 "update(2000)";<BR>  PUT@1 "set(Para)";<BR>  PUT@1 "update(3000)";<BR>  PUT@1 "stats(*)";<BR>  PUT@1 "save('C:/zzy/research/SASWinBUGS/cfalog.txt')";<BR>  PUT@1  "quit()";<BR>RUN;</P>
<P>DATA _NULL_;<BR>FILE "C:\zzy\research\SASWinBUGS\runcfa.bat";<BR>PUT '"C:\program files\WinBUGS14\WinBUGS14.exe" /PAR runcfa.txt';<BR>PUT 'exit';<BR>RUN;</P>
<P><BR>%macro simcfa(n);<BR>TITLE2 'Generate the Data';<BR>DATA Sim_CFA;<BR>*setting the true parameter values;<BR>fload=.8; sig2=.36;<BR>* setting statistical parameters;<BR>  N = 200; seed = 20060802+&n; M=4;<BR>* need to setup arrays so we can have more variables;<BR>ARRAY y_score{4} y1-y4;<BR>ARRAY e_score{4} y1-y4; </P>
<P>* generating raw data;<BR>  DO _N_ = 1 TO N;<BR>* now the indicator variables ;<BR>    f_score=RANNOR(seed);<BR>    DO t = 1 TO M;<BR>       y_score{t} = fload*f_score +sqrt(sig2)*RANNOR(seed);<BR> END;<BR> KEEP y1-y4;<BR> OUTPUT;<BR> END;<BR>RUN;</P>
<P>/*Data: the _sexport macro is not part of Base SAS and must be defined before this program runs*/<BR>%_sexport(data=Sim_CFA, file='C:\zzy\research\SASWinBUGS\cfadata.txt',<BR>var=y1-y4);</P>
<P><BR>/*Run WinBUGS*/<BR>DATA _NULL_;<BR>X "C:\zzy\research\SASWinBUGS\runcfa.bat";<BR>RUN;<BR>QUIT;</P>
<P>/*Read the WinBUGS log and keep the summary statistics for Para*/<BR>TITLE2 "Simulation &n";<BR>DATA log;<BR>INFILE "C:\zzy\research\SASWinBUGS\cfalog.txt" TRUNCOVER ;<BR>INPUT log $80.;<BR>log=translate(log," ","09"x);<BR>IF (SUBSTR(log, 2, 4) ne 'Para') then delete;<BR>RUN;</P>
<P>PROC PRINT DATA=log;<BR>RUN;<BR>%mend;</P>
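<P>Inside %simcfa, SAS must wait for the batch file (and therefore WinBUGS) to finish before it reads cfalog.txt. That is the behavior when the XSYNC system option is in effect, which is the default; NOXWAIT additionally lets the command window close on its own instead of waiting for you to type EXIT. A sketch of setting both explicitly, which could stand in for the Run WinBUGS step of the macro (X is a global statement, so the DATA _NULL_ step around it is not needed):</P>
<P>/*Wait for WinBUGS to finish; close the command window automatically (Windows)*/<BR>OPTIONS XSYNC NOXWAIT;<BR>X "C:\zzy\research\SASWinBUGS\runcfa.bat";</P>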
<P>%macro runsimcfa;<BR> %let n=1;<BR>    %do %while(&n <= 100);<BR>       %simcfa(&n);<BR>    %let n=%eval(&n+1);<BR> %end;<BR>%mend runsimcfa;</P>
<P>%runsimcfa;</P>
<P>DM OUTPUT 'FILE "C:\zzy\research\SASWinBUGS\allresults.txt"';<BR>DM LOG 'FILE "C:\zzy\research\SASWinBUGS\allresults.log"';</P>
<P>/*Analyze the results*/<BR>DATA temp;<BR>INFILE "C:\zzy\research\SASWinBUGS\allresults.txt" TRUNCOVER ;<BR>INPUT all $90.;<BR>IF (SUBSTR(all, 7, 4) NE 'Para')  THEN DELETE;<BR>FILE "C:\zzy\research\SASWinBUGS\temp.txt";<BR>PUT all;<BR>RUN;</P>
<P>DATA temp;<BR>INFILE  "C:\zzy\research\SASWinBUGS\temp.txt";<BR>INPUT parid parname $ parest parsd MCerror p25 median p975 start sample;<BR>id=int((_N_-.1)/8)+1;<BR>parest=abs(parest);<BR>RUN;</P>
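<P>The expression int((_N_-.1)/8)+1 numbers the replications: with 8 Para rows per simulation, rows 1 to 8 get id=1, rows 9 to 16 get id=2, and so on, which is the same as CEIL(_N_/8). This assumes every replication contributes exactly 8 parsed rows. A small self-contained check of the arithmetic (the data set check_id is only for illustration):</P>
<P>DATA check_id;<BR>  DO row = 1 TO 24;<BR>    id  = INT((row - .1)/8) + 1;  /*formula used above*/<BR>    id2 = CEIL(row/8);            /*equivalent form*/<BR>    OUTPUT;<BR>  END;<BR>RUN;<BR>PROC PRINT DATA=check_id;<BR>RUN;</P>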
<P>/*Parameter Estimates*/<BR>PROC TRANSPOSE DATA=temp OUT=parest PREFIX=par;<BR> BY id ;<BR> ID parid;<BR> VAR parest;<BR>RUN;</P>
<P>PROC MEANS DATA=parest;<BR>VAR par1-par8;<BR>RUN;</P>
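<P>The PROC MEANS step above averages the estimates over replications. A common next summary in a Monte Carlo study is the bias and root mean squared error (RMSE) of each estimate relative to the generating values in %simcfa: fload=.8 for Para[1]-Para[4] and sig2=.36 for Para[5]-Para[8]. The sketch below assumes the PAREST data set created above; the names bias, mse, rmse, err1-err8, sq1-sq8, and rmse1-rmse8 are illustrative, not part of the original program.</P>
<P>/*Errors against the generating values (illustrative)*/<BR>DATA bias;<BR>  SET parest;<BR>  ARRAY est{8} par1-par8;<BR>  ARRAY err{8} err1-err8;<BR>  ARRAY sq{8} sq1-sq8;<BR>  DO k = 1 TO 8;<BR>    IF k LE 4 THEN truth = .8;  /*factor loadings*/<BR>    ELSE truth = .36;           /*residual variances*/<BR>    err{k} = est{k} - truth;<BR>    sq{k} = err{k}**2;<BR>  END;<BR>  DROP k truth;<BR>RUN;</P>
<P>/*Means over replications: mean of err = bias, mean of sq = MSE*/<BR>PROC MEANS DATA=bias MEAN;<BR>  VAR err1-err8 sq1-sq8;<BR>  OUTPUT OUT=mse MEAN=;<BR>RUN;</P>
<P>/*RMSE is the square root of the MSE*/<BR>DATA rmse;<BR>  SET mse;<BR>  ARRAY sq{8} sq1-sq8;<BR>  ARRAY r{8} rmse1-rmse8;<BR>  DO k = 1 TO 8;<BR>    r{k} = SQRT(sq{k});<BR>  END;<BR>  DROP k;<BR>RUN;<BR>PROC PRINT DATA=rmse;<BR>  VAR rmse1-rmse8;<BR>RUN;</P>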
<P>/*SDs*/<BR>PROC TRANSPOSE DATA=temp OUT=parsd PREFIX=sd;<BR> BY id ;<BR> ID parid;<BR> VAR parsd;<BR>RUN;</P>
<P>PROC MEANS DATA=parsd;<BR>VAR sd1-sd8;<BR>RUN;</P></DIV></TD></TR></TABLE>

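Post #25

xiaoliangfen, posted 2007-3-17 14:40:00

<P>Is there any material on the PLS procedure? Thanks to the original poster.</P>

Anonymous user

Post #26

Anonymous user, posted 2007-3-17 18:33:00

<DIV class=quote><B>Quoting <I>xiaoliangfen</I>, posted 2007-3-17 14:40:00:</B><BR>
<P>Is there any material on the PLS procedure? Thanks to the original poster.</P></DIV>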
<P>
<TABLE cellSpacing=0 cellPadding=0 width="100%">

<TR vAlign=center>
<TD class=runninghead>The PLS Procedure </TD></TR></TABLE>
<P>
<H1>Overview </H1>
<P>The PLS procedure fits models using any one of a number of linear predictive methods, including <EM>partial least squares</EM> (PLS). Ordinary least squares regression, as implemented in SAS/STAT procedures such as PROC GLM and PROC REG, has the single goal of minimizing sample response prediction error, seeking linear functions of the predictors that explain as much variation in each response as possible. The techniques implemented in the PLS procedure have the additional goal of accounting for variation in the predictors, under the assumption that directions in the predictor space that are well sampled should provide better prediction for <EM>new</EM> observations when the predictors are highly correlated. All of the techniques implemented in the PLS procedure work by extracting successive linear combinations of the predictors, called <EM>factors</EM> (also called <EM>components</EM>, <EM>latent vectors</EM>, or <EM>latent variables</EM>), which optimally address one or both of two goals: explaining response variation and explaining predictor variation. In particular, the method of partial least squares balances the two objectives, seeking factors that explain both response and predictor variation.
<P>Note that the name "partial least squares" also applies to a more general statistical method that is <EM>not</EM> implemented in this procedure. The partial least squares method was originally developed in the 1960s by the econometrician Herman Wold (1966) for modeling "paths" of causal relation between any number of "blocks" of variables. However, the PLS procedure fits only <EM>predictive</EM> partial least squares models, with one "block" of predictors and one "block" of responses. If you are interested in fitting more general path models, you should consider using the CALIS procedure. </P>
<H2>Basic Features </H2>
<P>The PLS procedure implements the following techniques:
<UL>
<LI>principal components regression, which extracts factors to explain as much predictor sample variation as possible.
<LI>reduced rank regression, which extracts factors to explain as much response variation as possible. This technique, also known as (maximum) redundancy analysis, differs from multivariate linear regression only when there are multiple responses.
<LI>partial least squares regression, which balances the two objectives of explaining response variation and explaining predictor variation. Two different formulations for partial least squares are available: the original predictive method of Wold (1966) and the SIMPLS method of de Jong (1993). </LI></UL>
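<P>As a sketch, each of these techniques is selected with the METHOD= option of the PROC PLS statement: METHOD=PCR, METHOD=RRR, METHOD=PLS (the default), or METHOD=SIMPLS. The statements below assume the data set Sample created in the Spectrometric Calibration example later in this section, and they fix the number of factors at three with NFAC= purely for illustration.
<P><PRE>
/* principal components regression */
proc pls data=sample method=pcr nfac=3;
   model ls ha dt = v1-v27;
run;

/* reduced rank regression */
proc pls data=sample method=rrr nfac=3;
   model ls ha dt = v1-v27;
run;

/* partial least squares, SIMPLS formulation */
proc pls data=sample method=simpls nfac=3;
   model ls ha dt = v1-v27;
run;
</PRE>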
<P>The number of factors to extract depends on the data. Basing the model on more extracted factors improves the model fit to the observed data, but extracting too many factors can cause <EM>over-fitting</EM>, that is, tailoring the model too much to the current data, to the detriment of future predictions. The PLS procedure enables you to choose the number of extracted factors by <EM>cross validation</EM>, that is, fitting the model to part of the data, minimizing the prediction error for the unfitted part, and iterating with different portions of the data in the roles of fitted and unfitted. Various methods of cross validation are available, including one-at-a-time validation and splitting the data into blocks. The PLS procedure also offers test set validation, where the model is fit to the entire primary input data set and the fit is evaluated over a distinct test data set.
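<P>A sketch of the validation choices named above, again assuming the data set Sample from the Spectrometric Calibration example: CV=ONE requests one-at-a-time validation, CV=BLOCK requests blocked validation, and CV=TESTSET(<I>data-set</I>) evaluates the fit on a separate test set. Because no separate test set accompanies these data, the last step holds out every fourth sample; the data set names train and test are illustrative only.
<P><PRE>
/* one-at-a-time (leave-one-out) cross validation */
proc pls data=sample cv=one;
   model ls ha dt = v1-v27;
run;

/* blocked cross validation */
proc pls data=sample cv=block;
   model ls ha dt = v1-v27;
run;

/* test set validation on a hypothetical 12/4 split of the 16 samples */
data train test;
   set sample;
   if mod(_n_, 4) = 0 then output test;
   else output train;
run;

proc pls data=train cv=testset(test);
   model ls ha dt = v1-v27;
run;
</PRE>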
<P>You can use the general linear modeling approach of the GLM procedure to specify a model for your design, allowing for general polynomial effects as well as classification or ANOVA effects. You can save the model fit by the PLS procedure in a data set and apply it to new data by using the SCORE procedure. </P>
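<P>For example, the MODEL statement accepts crossed effects, so a quadratic term in a predictor can be written as v1*v1. The sketch below, which again assumes the data set Sample, fits a two-factor PLS model for ls with linear and quadratic terms in the first two spectral intensities; these effects were chosen only to show the syntax, not as a recommended model.
<P><PRE>
proc pls data=sample nfac=2;
   model ls = v1 v2 v1*v1 v1*v2 v2*v2;
run;
</PRE>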
<H2>Spectrometric Calibration </H2>
<P>The example in this section illustrates basic features of the PLS procedure. The data are reported in Umetrics (1995); the original source is Lindberg, Persson, and Wold (1983). Suppose that you are researching pollution in the Baltic Sea and would like to use the spectra of sea-water samples to determine the amounts of three compounds present in them: lignin sulfonate (ls: pulp industry pollution), humic acids (ha: natural forest products), and optical whitener from detergent (dt). Spectrometric calibration is a type of problem in which partial least squares can be very effective. The predictors are the spectral emission intensities at different frequencies in the sample spectrum, and the responses are the amounts of various chemicals in the sample.
<P>For the purposes of calibrating the model, samples with known compositions are used. The calibration data consist of 16 samples of known concentrations of ls, ha, and dt, with spectra based on 27 frequencies (or, equivalently, wavelengths). The following statements create a SAS data set named Sample for these data.
<P><PRE>
data Sample;
   input obsnam $ v1-v27 ls ha dt @@;
   datalines;
EM1 2766 2610 3306 3630 3600 3438 3213 3051 2907 2844 2796
      2787 2760 2754 2670 2520 2310 2100 1917 1755 1602 1467
      1353 1260 1167 1101 1017       3.0110  0.0000 0.00
EM2 1492 1419 1369 1158  958  887  905  929  920  887  800
      710  617  535  451  368  296  241  190  157  128  106
         89 70 65 56 50       0.0000  0.4005 0.00
EM3 2450 2379 2400 2055 1689 1355 1109  908  750  673  644
      640  630  618  571  512  440  368  305  247  196  156
      120 98 80 61 50       0.0000  0.0000  90.63
EM4 2751 2883 3492 3570 3282 2937 2634 2370 2187 2070 2007
      1974 1950 1890 1824 1680 1527 1350 1206 1080  984  888
      810  732  669  630  582       1.4820  0.1580  40.00
EM5 2652 2691 3225 3285 3033 2784 2520 2340 2235 2148 2094
      2049 2007 1917 1800 1650 1464 1299 1140 1020  909  810
      726  657  594  549  507       1.1160  0.4104  30.45
EM6 3993 4722 6147 6720 6531 5970 5382 4842 4470 4200 4077
      4008 3948 3864 3663 3390 3090 2787 2481 2241 2028 1830
      1680 1533 1440 1314 1227       3.3970  0.3032  50.82
EM7 4032 4350 5430 5763 5490 4974 4452 3990 3690 3474 3357
      3300 3213 3147 3000 2772 2490 2220 1980 1779 1599 1440
      1320 1200 1119 1032  957       2.4280  0.2981  70.59
EM8 4530 5190 6910 7580 7510 6930 6150 5490 4990 4670 4490
      4370 4300 4210 4000 3770 3420 3060 2760 2490 2230 2060
      1860 1700 1590 1490 1380       4.0240  0.1153  89.39
EM9 4077 4410 5460 5857 5607 5097 4605 4170 3864 3708 3588
      3537 3480 3330 3192 2910 2610 2325 2064 1830 1638 1476
      1350 1236 1122 1044  963       2.2750  0.5040  81.75
EM10  3450 3432 3969 4020 3678 3237 2814 2487 2205 2061 2001
      1965 1947 1890 1776 1635 1452 1278 1128  981  867  753
      663  600  552  507  468       0.9588  0.1450 101.10
EM11  4989 5301 6807 7425 7155 6525 5784 5166 4695 4380 4197
      4131 4077 3972 3777 3531 3168 2835 2517 2244 2004 1809
      1620 1470 1359 1266 1167       3.1900  0.2530 120.00
EM12  5340 5790 7590 8390 8310 7670 6890 6190 5700 5380 5200
      5110 5040 4900 4700 4390 3970 3540 3170 2810 2490 2240
      2060 1870 1700 1590 1470       4.1320  0.5691 117.70
EM13  3162 3477 4365 4650 4470 4107 3717 3432 3228 3093 3009
      2964 2916 2838 2694 2490 2253 2013 1788 1599 1431 1305
      1194 1077  990  927  855       2.1600  0.4360  27.59
EM14  4380 4695 6018 6510 6342 5760 5151 4596 4200 3948 3807
      3720 3672 3567 3438 3171 2880 2571 2280 2046 1857 1680
      1548 1413 1314 1200 1119       3.0940  0.2471  61.71
EM15  4587 4200 5040 5289 4965 4449 3939 3507 3174 2970 2850
      2814 2748 2670 2529 2328 2088 1851 1641 1431 1284 1134
      1020  918  840  756  714       1.6040  0.2856 108.80
EM16  4017 4725 6090 6570 6354 5895 5346 4911 4611 4422 4314
      4287 4224 4110 3915 3600 3240 2913 2598 2325 2088 1917
      1734 1587 1452 1356 1257       3.1620  0.7012  60.00
;
</PRE>
<P>
<H3><I>Fitting a PLS Model</I></H3>
<P>To isolate a few underlying spectral factors that provide a good predictive model, you can fit a PLS model to the 16 samples using the following SAS statements: </P>
<P><PRE>
proc pls data=sample;
   model ls ha dt = v1-v27;
run;
</PRE>
<P>By default, the PLS procedure extracts at most 15 factors. The procedure lists the amount of variation accounted for by each of these factors, both individual and cumulative; this listing is shown in Figure 56.1.
<P>
<P>
<CENTER>
<TABLE borderColor=#000000 cellSpacing=0 cellPadding=10 rules=groups width="95%" bgColor=#f0f0f0 border=1 frame=box>

<TR>
<TD vAlign=center align=middle bgColor=#f0f0f0>
<DIV class=branch>
<DIV class="c ProcTitle">The PLS Procedure</DIV>
<P>
<DIV>
<DIV align=center>
<TABLE cellSpacing=1 cellPadding=7 rules=groups summary="Procedure PLS: Percent Variation Accounted for by Extracted Factors" frame=box>
<COLGROUP>
<COL></COLGROUP>
<COLGROUP>
<COL>
<COL>
<COL>
<COL></COLGROUP>
<THEAD>
<TR>
<TH class="b Header" scope=colgroup colSpan=5>Percent Variation Accounted for by Partial<BR>Least Squares Factors</TH></TR>
<TR>
<TH class="b Header" scope=col rowSpan=2>Number of<BR>Extracted<BR>Factors</TH>
<TH class="b Header" scope=colgroup colSpan=2>Model Effects</TH>
<TH class="b Header" scope=colgroup colSpan=2>Dependent Variables</TH></TR>
<TR>
<TH class="b Header" scope=col>Current</TH>
<TH class="b Header" scope=col>Total</TH>
<TH class="b Header" scope=col>Current</TH>
<TH class="b Header" scope=col>Total</TH></TR></THEAD>

<TR><TH class=RowHeader scope=row>1</TH><TD class="r Data">97.4607</TD><TD class="r Data">97.4607</TD><TD class="r Data">41.9155</TD><TD class="r Data">41.9155</TD></TR>
<TR><TH class=RowHeader scope=row>2</TH><TD class="r Data">2.1830</TD><TD class="r Data">99.6436</TD><TD class="r Data">24.2435</TD><TD class="r Data">66.1590</TD></TR>
<TR><TH class=RowHeader scope=row>3</TH><TD class="r Data">0.1781</TD><TD class="r Data">99.8217</TD><TD class="r Data">24.5339</TD><TD class="r Data">90.6929</TD></TR>
<TR><TH class=RowHeader scope=row>4</TH><TD class="r Data">0.1197</TD><TD class="r Data">99.9414</TD><TD class="r Data">3.7898</TD><TD class="r Data">94.4827</TD></TR>
<TR><TH class=RowHeader scope=row>5</TH><TD class="r Data">0.0415</TD><TD class="r Data">99.9829</TD><TD class="r Data">1.0045</TD><TD class="r Data">95.4873</TD></TR>
<TR><TH class=RowHeader scope=row>6</TH><TD class="r Data">0.0106</TD><TD class="r Data">99.9935</TD><TD class="r Data">2.2808</TD><TD class="r Data">97.7681</TD></TR>
<TR><TH class=RowHeader scope=row>7</TH><TD class="r Data">0.0017</TD><TD class="r Data">99.9952</TD><TD class="r Data">1.1693</TD><TD class="r Data">98.9374</TD></TR>
<TR><TH class=RowHeader scope=row>8</TH><TD class="r Data">0.0010</TD><TD class="r Data">99.9961</TD><TD class="r Data">0.5041</TD><TD class="r Data">99.4415</TD></TR>
<TR><TH class=RowHeader scope=row>9</TH><TD class="r Data">0.0014</TD><TD class="r Data">99.9975</TD><TD class="r Data">0.1229</TD><TD class="r Data">99.5645</TD></TR>
<TR><TH class=RowHeader scope=row>10</TH><TD class="r Data">0.0010</TD><TD class="r Data">99.9985</TD><TD class="r Data">0.1103</TD><TD class="r Data">99.6747</TD></TR>
<TR><TH class=RowHeader scope=row>11</TH><TD class="r Data">0.0003</TD><TD class="r Data">99.9988</TD><TD class="r Data">0.1523</TD><TD class="r Data">99.8270</TD></TR>
<TR><TH class=RowHeader scope=row>12</TH><TD class="r Data">0.0003</TD><TD class="r Data">99.9991</TD><TD class="r Data">0.1291</TD><TD class="r Data">99.9561</TD></TR>
<TR><TH class=RowHeader scope=row>13</TH><TD class="r Data">0.0002</TD><TD class="r Data">99.9994</TD><TD class="r Data">0.0312</TD><TD class="r Data">99.9873</TD></TR>
<TR><TH class=RowHeader scope=row>14</TH><TD class="r Data">0.0004</TD><TD class="r Data">99.9998</TD><TD class="r Data">0.0065</TD><TD class="r Data">99.9938</TD></TR>
<TR><TH class=RowHeader scope=row>15</TH><TD class="r Data">0.0002</TD><TD class="r Data">100.0000</TD><TD class="r Data">0.0062</TD><TD class="r Data">100.0000</TD></TR></TABLE></DIV></DIV><BR></DIV>
<p></TD></TR></TABLE></CENTER>
<P><BR><B>Figure 56.1:</B> PLS Variation Summary </P>
<P>
<P>Note that all of the variation in both the predictors and the responses is accounted for by 15 factors; this is because there are only 16 sample observations, and 16 centered observations span at most 15 dimensions. More importantly, almost all of the variation is accounted for with even fewer factors: one or two for the predictors and three to eight for the responses.
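<P>If you want this variation summary as a SAS data set rather than as printed output, you can capture it with the Output Delivery System. The sketch below assumes that the ODS table name for this table is PercentVariation; you can confirm the name in your release by submitting ODS TRACE ON before the procedure and reading the log. The output data set name pctvar is arbitrary.
<P><PRE>
/* capture the variation summary as a data set */
ods output PercentVariation=pctvar;
proc pls data=sample;
   model ls ha dt = v1-v27;
run;

proc print data=pctvar;
run;
</PRE>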
<P>
<H3><I>Selecting the Number of Factors by Cross Validation</I></H3>
<P>A PLS model is not complete until you choose the number of factors. One way to choose it is cross validation, in which the data set is divided into two or more groups. You fit the model to all groups except one and then check how well the model predicts the responses for the omitted group. Repeating this for each group measures the overall predictive capability of a given form of the model. The Predicted REsidual Sum of Squares (PRESS) statistic is based on the residuals generated by this process.
<P>To select the number of extracted factors by cross validation, you specify the CV= option with an argument that says which cross validation method to use. For example, a common method is split-sample validation, in which the groups consist of every <I>n</I>th observation beginning with the first, every <I>n</I>th observation beginning with the second, and so on. You can use the CV=SPLIT option to specify split-sample validation with <I>n</I>=7 by default, as in the following SAS statements:
<P><PRE>
proc pls data=sample cv=split;
   model ls ha dt = v1-v27;
run;
</PRE>
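<P>To see how the split-sample groups described above are formed, the following DATA step (an illustration only; PROC PLS does not need it) assigns each of the 16 samples to one of 7 groups. Samples 1, 8, and 15 fall in group 1, samples 2, 9, and 16 in group 2, and each remaining group has two samples. To use a different number of groups, specify CV=SPLIT(<I>n</I>). The data set name splitgrp is arbitrary.
<P><PRE>
data splitgrp;
   set sample;
   grp = mod(_n_ - 1, 7) + 1;   /* every 7th sample, starting with sample grp */
   keep obsnam grp;
run;

proc sort data=splitgrp;
   by grp;
run;

proc print data=splitgrp;
   by grp;
run;
</PRE>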
<P>The resulting output is shown in Figure 56.2 and Figure 56.3.
<P>
<P>
<CENTER>
<TABLE borderColor=#000000 cellSpacing=0 cellPadding=10 rules=groups width="95%" bgColor=#f0f0f0 border=1 frame=box>

<TR>
<TD vAlign=center align=middle bgColor=#f0f0f0>
<DIV class=branch>
<DIV class="c ProcTitle">The PLS Procedure</DIV>
<P>
<DIV>
<DIV align=center>
<TABLE cellSpacing=1 cellPadding=7 rules=groups summary="Procedure PLS: Residual Summary" frame=box>
<COLGROUP>
<COL></COLGROUP>
<COLGROUP>
<COL></COLGROUP>
<THEAD>
<TR>
<TH class="b Header" scope=colgroup colSpan=2>Split-sample Validation<BR>for the Number of<BR>Extracted Factors</TH></TR>
<TR>
<TH class="b Header" scope=col>Number of<BR>Extracted<BR>Factors</TH>
<TH class="b Header" scope=col>Root Mean PRESS</TH></TR></THEAD>

<TR><TH class=RowHeader scope=row>0</TH><TD class="r Data">1.107747</TD></TR>
<TR><TH class=RowHeader scope=row>1</TH><TD class="r Data">0.957983</TD></TR>
<TR><TH class=RowHeader scope=row>2</TH><TD class="r Data">0.931314</TD></TR>
<TR><TH class=RowHeader scope=row>3</TH><TD class="r Data">0.520222</TD></TR>
<TR><TH class=RowHeader scope=row>4</TH><TD class="r Data">0.530501</TD></TR>
<TR><TH class=RowHeader scope=row>5</TH><TD class="r Data">0.586786</TD></TR>
<TR><TH class=RowHeader scope=row>6</TH><TD class="r Data">0.475047</TD></TR>
<TR><TH class=RowHeader scope=row>7</TH><TD class="r Data">0.477595</TD></TR>
<TR><TH class=RowHeader scope=row>8</TH><TD class="r Data">0.483138</TD></TR>
<TR><TH class=RowHeader scope=row>9</TH><TD class="r Data">0.485739</TD></TR>
<TR><TH class=RowHeader scope=row>10</TH><TD class="r Data">0.48946</TD></TR>
<TR><TH class=RowHeader scope=row>11</TH><TD class="r Data">0.521445</TD></TR>
<TR><TH class=RowHeader scope=row>12</TH><TD class="r Data">0.525653</TD></TR>
<TR><TH class=RowHeader scope=row>13</TH><TD class="r Data">0.531049</TD></TR>
<TR><TH class=RowHeader scope=row>14</TH><TD class="r Data">0.531049</TD></TR>
<TR><TH class=RowHeader scope=row>15</TH><TD class="r Data">0.531049</TD></TR></TABLE></DIV></DIV><BR>
<DIV>
<DIV align=center>
<TABLE cellSpacing=1 cellPadding=7 rules=groups summary="Procedure PLS: Results" frame=box>
<COLGROUP>
<COL>
<COL></COLGROUP>

<TR><TH class=RowHeader scope=row>Minimum root mean PRESS</TH><TD class="r Data">0.4750</TD></TR>
<TR><TH class=RowHeader scope=row>Minimizing number of factors</TH><TD class="r Data">6</TD></TR></TABLE></DIV></DIV><BR></DIV>
<p></TD></TR></TABLE></CENTER>
<P><BR><B>Figure 56.2:</B> Split-Sample Validated PRESS Statistics for Number of Factors </P>
<P>
<P>
<P>
<CENTER>
<TABLE borderColor=#000000 cellSpacing=0 cellPadding=10 rules=groups width="95%" bgColor=#f0f0f0 border=1 frame=box>

<TR>
<TD vAlign=center align=middle bgColor=#f0f0f0>
<DIV class=branch>
<DIV class="c ProcTitle">The PLS Procedure</DIV>
<P>
<DIV>
<DIV align=center>
<TABLE cellSpacing=1 cellPadding=7 rules=groups summary="Procedure PLS: Percent Variation Accounted for by Extracted Factors" frame=box>
<COLGROUP>
<COL></COLGROUP>
<COLGROUP>
<COL>
<COL>
<COL>
<COL></COLGROUP>
<THEAD>
<TR>
<TH class="b Header" scope=colgroup colSpan=5>Percent Variation Accounted for by Partial<BR>Least Squares Factors</TH></TR>
<TR>
<TH class="b Header" scope=col rowSpan=2>Number of<BR>Extracted<BR>Factors</TH>
<TH class="b Header" scope=colgroup colSpan=2>Model Effects</TH>
<TH class="b Header" scope=colgroup colSpan=2>Dependent Variables</TH></TR>
<TR>
<TH class="b Header" scope=col>Current</TH>
<TH class="b Header" scope=col>Total</TH>
<TH class="b Header" scope=col>Current</TH>
<TH class="b Header" scope=col>Total</TH></TR></THEAD>

<TR><TH class=RowHeader scope=row>1</TH><TD class="r Data">97.4607</TD><TD class="r Data">97.4607</TD><TD class="r Data">41.9155</TD><TD class="r Data">41.9155</TD></TR>
<TR><TH class=RowHeader scope=row>2</TH><TD class="r Data">2.1830</TD><TD class="r Data">99.6436</TD><TD class="r Data">24.2435</TD><TD class="r Data">66.1590</TD></TR>
<TR><TH class=RowHeader scope=row>3</TH><TD class="r Data">0.1781</TD><TD class="r Data">99.8217</TD><TD class="r Data">24.5339</TD><TD class="r Data">90.6929</TD></TR>
<TR><TH class=RowHeader scope=row>4</TH><TD class="r Data">0.1197</TD><TD class="r Data">99.9414</TD><TD class="r Data">3.7898</TD><TD class="r Data">94.4827</TD></TR>
<TR><TH class=RowHeader scope=row>5</TH><TD class="r Data">0.0415</TD><TD class="r Data">99.9829</TD><TD class="r Data">1.0045</TD><TD class="r Data">95.4873</TD></TR>
<TR><TH class=RowHeader scope=row>6</TH><TD class="r Data">0.0106</TD><TD class="r Data">99.9935</TD><TD class="r Data">2.2808</TD><TD class="r Data">97.7681</TD></TR></TABLE></DIV></DIV><BR></DIV>
<p></TD></TR></TABLE></CENTER>
<P><BR><B>Figure 56.3:</B> PLS Variation Summary for Split-Sample Validated Model </P>
<P>
<P>The absolute minimum PRESS is achieved with six extracted factors. Notice, however, that this is not much smaller than the PRESS for three factors. By using the CVTEST option, you can perform a statistical model comparison suggested by van der Voet (1994) to test whether this difference is significant, as shown in the following SAS statements:
<P><PRE>
proc pls data=sample cv=split cvtest(seed=12345);
   model ls ha dt = v1-v27;
run;
</PRE>
<P>The model comparison test is based on a rerandomization of the data. By default, the seed for this randomization is based on the system clock; here it is specified explicitly (SEED=12345) so that the results can be reproduced. The resulting output is shown in Figure 56.4 and Figure 56.5.
<P>
<P>
<CENTER>
<TABLE borderColor=#000000 cellSpacing=0 cellPadding=10 rules=groups width="95%" bgColor=#f0f0f0 border=1 frame=box>

<TR>
<TD vAlign=center align=middle bgColor=#f0f0f0>
<DIV class=branch>
<DIV class="c ProcTitle">The PLS Procedure</DIV>
<P>
<DIV>
<DIV align=center>
<TABLE cellSpacing=1 cellPadding=7 rules=groups summary="Procedure PLS: Residual Summary" frame=box>
<COLGROUP>
<COL></COLGROUP>
<COLGROUP>
<COL>
<COL>
<COL></COLGROUP>
<THEAD>
<TR>
<TH class="b Header" scope=colgroup colSpan=4>Split-sample Validation for the<BR>Number of Extracted Factors</TH></TR>
<TR>
<TH class="b Header" scope=col>Number of<BR>Extracted<BR>Factors</TH>
<TH class="b Header" scope=col>Root Mean PRESS</TH>
<TH class="b Header" scope=col>T**2</TH>
<TH class="b Header" scope=col>Prob &gt; T**2</TH></TR></THEAD>

<TR><TH class=RowHeader scope=row>0</TH><TD class="r Data">1.107747</TD><TD class="r Data">9.272858</TD><TD class="r Data">0.0010</TD></TR>
<TR><TH class=RowHeader scope=row>1</TH><TD class="r Data">0.957983</TD><TD class="r Data">10.62305</TD><TD class="r Data">&lt;.0001</TD></TR>
<TR><TH class=RowHeader scope=row>2</TH><TD class="r Data">0.931314</TD><TD class="r Data">8.950878</TD><TD class="r Data">&lt;.0001</TD></TR>
<TR><TH class=RowHeader scope=row>3</TH><TD class="r Data">0.520222</TD><TD class="r Data">5.133259</TD><TD class="r Data">0.1430</TD></TR>
<TR><TH class=RowHeader scope=row>4</TH><TD class="r Data">0.530501</TD><TD class="r Data">5.168427</TD><TD class="r Data">0.1330</TD></TR>
<TR><TH class=RowHeader scope=row>5</TH><TD class="r Data">0.586786</TD><TD class="r Data">6.437266</TD><TD class="r Data">0.0150</TD></TR>
<TR><TH class=RowHeader scope=row>6</TH><TD class="r Data">0.475047</TD><TD class="r Data">0</TD><TD class="r Data">1.0000</TD></TR>
<TR><TH class=RowHeader scope=row>7</TH><TD class="r Data">0.477595</TD><TD class="r Data">2.809763</TD><TD class="r Data">0.4750</TD></TR>
<TR><TH class=RowHeader scope=row>8</TH><TD class="r Data">0.483138</TD><TD class="r Data">7.189526</TD><TD class="r Data">0.0110</TD></TR>
<TR><TH class=RowHeader scope=row>9</TH><TD class="r Data">0.485739</TD><TD class="r Data">7.931726</TD><TD class="r Data">0.0060</TD></TR>
<TR><TH class=RowHeader scope=row>10</TH><TD class="r Data">0.48946</TD><TD class="r Data">6.612597</TD><TD class="r Data">0.0140</TD></TR>
<TR><TH class=RowHeader scope=row>11</TH><TD class="r Data">0.521445</TD><TD class="r Data">6.666235</TD><TD class="r Data">0.0130</TD></TR>
<TR><TH class=RowHeader scope=row>12</TH><TD class="r Data">0.525653</TD><TD class="r Data">7.092861</TD><TD class="r Data">0.0070</TD></TR>
<TR><TH class=RowHeader scope=row>13</TH><TD class="r Data">0.531049</TD><TD class="r Data">7.538298</TD><TD class="r Data">0.0020</TD></TR>
<TR><TH class=RowHeader scope=row>14</TH><TD class="r Data">0.531049</TD><TD class="r Data">7.538298</TD><TD class="r Data">0.0020</TD></TR>
<TR><TH class=RowHeader scope=row>15</TH><TD class="r Data">0.531049</TD><TD class="r Data">7.538298</TD><TD class="r Data">0.0020</TD></TR></TABLE></DIV></DIV><BR>
<DIV>
<DIV align=center>
<TABLE cellSpacing=1 cellPadding=7 rules=groups summary="Procedure PLS: Results" frame=box>
<COLGROUP>
<COL>
<COL></COLGROUP>

<TR><TH class=RowHeader scope=row>Minimum root mean PRESS</TH><TD class="r Data">0.4750</TD></TR>
<TR><TH class=RowHeader scope=row>Minimizing number of factors</TH><TD class="r Data">6</TD></TR>
<TR><TH class=RowHeader scope=row>Smallest number of factors with p &gt; 0.1</TH><TD class="r Data">3</TD></TR></TABLE></DIV></DIV><BR></DIV>
<p></TD></TR></TABLE></CENTER>
<P><BR><B>Figure 56.4:</B> Testing Split-Sample Validation for Number of Factors </P>
<P>
<P>
<P>
<CENTER>
<TABLE borderColor=#000000 cellSpacing=0 cellPadding=10 rules=groups width="95%" bgColor=#f0f0f0 border=1 frame=box>

<TR>
<TD vAlign=center align=middle bgColor=#f0f0f0>
<DIV class=branch>
<DIV class="c ProcTitle">The PLS Procedure</DIV>
<P>
<DIV>
<DIV align=center>
<TABLE cellSpacing=1 cellPadding=7 rules=groups summary="Procedure PLS: Percent Variation Accounted for by Extracted Factors" frame=box>
<COLGROUP>
<COL></COLGROUP>
<COLGROUP>
<COL>
<COL>
<COL>
<COL></COLGROUP>
<THEAD>
<TR>
<TH class="b Header" scope=colgroup colSpan=5>Percent Variation Accounted for by Partial<BR>Least Squares Factors</TH></TR>
<TR>
<TH class="b Header" scope=col rowSpan=2>Number of<BR>Extracted<BR>Factors</TH>
<TH class="b Header" scope=colgroup colSpan=2>Model Effects</TH>
<TH class="b Header" scope=colgroup colSpan=2>Dependent Variables</TH></TR>
<TR>
<TH class="b Header" scope=col>Current</TH>
<TH class="b Header" scope=col>Total</TH>
<TH class="b Header" scope=col>Current</TH>
<TH class="b Header" scope=col>Total</TH></TR></THEAD>

<TR><TH class=RowHeader scope=row>1</TH><TD class="r Data">97.4607</TD><TD class="r Data">97.4607</TD><TD class="r Data">41.9155</TD><TD class="r Data">41.9155</TD></TR>
<TR><TH class=RowHeader scope=row>2</TH><TD class="r Data">2.1830</TD><TD class="r Data">99.6436</TD><TD class="r Data">24.2435</TD><TD class="r Data">66.1590</TD></TR>
<TR><TH class=RowHeader scope=row>3</TH><TD class="r Data">0.1781</TD><TD class="r Data">99.8217</TD><TD class="r Data">24.5339</TD><TD class="r Data">90.6929</TD></TR></TABLE></DIV></DIV><BR></DIV>
<p></TD></TR></TABLE></CENTER>
<P><BR><B>Figure 56.5:</B> PLS Variation Summary for Tested Split-Sample Validated Model </P>
<P>
<P>The <I>p</I>-value of 0.1430 for comparing the cross-validated residuals from the models with 6 and 3 factors exceeds the cutoff of 0.1, so the difference between the two models is not significant, and the model with fewer factors is preferred. The variation summary shows that the three factors account for over 99% of the predictor variation and over 90% of the response variation.
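<P>The CVTEST option also accepts suboptions. As a sketch, STAT=PRESS bases the model comparison on PRESS instead of the default T<SUP>2</SUP> statistic shown in Figure 56.4, and PVAL= changes the cutoff (0.1 by default, as the "p &gt; 0.1" row reflects) used to pick the smallest number of factors whose difference from the minimizing model is not significant. The chosen number of factors may differ from the one above, so treat these statements as an illustration of the syntax only.
<P><PRE>
proc pls data=sample cv=split cvtest(stat=press pval=0.05 seed=12345);
   model ls ha dt = v1-v27;
run;
</PRE>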
<P>
<H3><I>Predicting New Observations</I></H3>
<P>Now that you have chosen a three-factor PLS model for predicting pollutant concentrations based on sample spectra, suppose that you have two new samples. The following SAS statements create a data set containing the spectra for the new samples:
<P><PRE>
data newobs;
   input obsnam $ v1-v27 @@;
   datalines;
EM17  3933 4518 5637 6006 5721 5187 4641 4149 3789
      3579 3447 3381 3327 3234 3078 2832 2571 2274
      2040 1818 1629 1470 1350 1245 1134 1050  987
EM25  2904 2997 3255 3150 2922 2778 2700 2646 2571
      2487 2370 2250 2127 2052 1713 1419 1200  984
      795  648  525  426  351  291  240  204  162
;
</PRE>
<P>You can apply the PLS model to these samples to estimate pollutant concentration. To do so, append the new samples to the original 16, and specify that the predicted values for all 18 be output to a data set, as shown in the following statements:
<P><PRE>
data all;
   set sample newobs;
run;

proc pls data=all nfac=3;
   model ls ha dt = v1-v27;
   output out=pred p=p_ls p_ha p_dt;
run;

proc print data=pred;
   where (obsnam in ('EM17','EM25'));
   var obsnam p_ls p_ha p_dt;
run;
</PRE>
<P>The new observations are not used in calculating the PLS model, since they have no response values. Their predicted concentrations are shown in Figure 56.6.
<P>
<P>
<CENTER>
<TABLE borderColor=#000000 cellSpacing=0 cellPadding=10 rules=groups width="95%" bgColor=#f0f0f0 border=1 frame=box>

<TR>
<TD vAlign=center align=middle bgColor=#f0f0f0>
<DIV class=branch>
<DIV>
<DIV align=center>
<TABLE cellSpacing=1 cellPadding=7 rules=groups summary="Procedure Print: Data Set WORK.PRED" frame=box>
<COLGROUP>
<COL></COLGROUP>
<COLGROUP>
<COL>
<COL>
<COL>
<COL></COLGROUP>
<THEAD>
<TR>
<TH class=Header scope=col>Obs</TH>
<TH class=Header scope=col>obsnam</TH>
<TH class=Header scope=col>p_ls</TH>
<TH class=Header scope=col>p_ha</TH>
<TH class=Header scope=col>p_dt</TH></TR></THEAD>

<TR><TH class=RowHeader scope=row>17</TH><TD class=Data>EM17</TD><TD class="r Data">2.54261</TD><TD class="r Data">0.31877</TD><TD class="r Data">81.4174</TD></TR>
<TR><TH class=RowHeader scope=row>18</TH><TD class=Data>EM25</TD><TD class="r Data" noWrap>-0.24716</TD><TD class="r Data">1.37892</TD><TD class="r Data">46.3212</TD></TR></TABLE></DIV></DIV><BR></DIV>
<p></TD></TR></TABLE></CENTER>
<P><BR><B>Figure 56.6:</B> Predicted Concentrations for New Observations</P>
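<P>Besides predicted values, the OUTPUT statement can save other quantities. For instance, XSCORE=<I>prefix</I> saves the extracted factor scores, which shows where the new samples fall in the three-factor space. This sketch assumes that the score variables are named by appending the factor number to the prefix (xsc1-xsc3 here); the data set name pred2 and the prefix xsc are arbitrary choices. Check the variable names with PROC CONTENTS if your release differs.
<P><PRE>
proc pls data=all nfac=3;
   model ls ha dt = v1-v27;
   output out=pred2 p=p_ls p_ha p_dt xscore=xsc;
run;

proc print data=pred2;
   where obsnam in ('EM17','EM25');
   var obsnam p_ls p_ha p_dt xsc1-xsc3;
run;
</PRE>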