PARTIAL REGRESSION PLOT
Graphics Command
Generate a partial regression plot. Note that partial regression plots are also referred to as added variable plots, adjusted variable plots, and individual coefficient plots.
When performing a linear regression with a single independent variable, a scatter plot of the response variable against the independent variable provides a good indication of the nature of the relationship. If there is more than one independent variable, things become more complicated. Although it can still be useful to generate scatter plots of the response variable against each of the independent variables, this does not take into account the effect of the other independent variables in the model.
Partial regression plots attempt to show the effect of adding a variable to the model (given that one or more independent variables are already in the model). Partial regression plots are formed by:
1. Computing the residuals from regressing the response variable against the independent variables, omitting Xi.
2. Computing the residuals from regressing Xi against the remaining independent variables.
3. Plotting the residuals from (1) against the residuals from (2).
Velleman and Welsch (see References below) express this mathematically as:
Y.[i] versus Xi.[i]
where
Y.[i] = residuals from regressing Y (the response variable) against all the independent variables except Xi
Xi.[i] = residuals from regressing Xi against the remaining independent variables.
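The two residual sets defined above can be sketched in a few lines of Python. This is only an illustration of the definition, not Dataplot's implementation: the helper `ols`, the variable names, and all data values are hypothetical, and the fit is done with plain normal equations for brevity.

```python
# Sketch: the two residual sets plotted in a partial regression plot.
# The helper name, variable names, and data values are hypothetical.

def ols(columns, y):
    """Least squares fit of y on an intercept plus the given columns.
    Returns (coefficients, residuals); coefficients[0] is the intercept."""
    n = len(y)
    X = [[1.0] + [col[r] for col in columns] for r in range(n)]
    p = len(X[0])
    # Normal equations (X'X) b = X'y stored as an augmented matrix.
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    coef = [A[i][p] / A[i][i] for i in range(p)]
    resid = [y[r] - sum(b * x for b, x in zip(coef, X[r])) for r in range(n)]
    return coef, resid

# Hypothetical data: response y and independent variables x1, x2, x3.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0]
x3 = [1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0]
y = [3.1, 2.9, 7.2, 6.8, 11.1, 9.7, 15.2, 13.0]

# Partial regression plot for Xi = x2.
_, y_res = ols([x1, x3], y)    # Y.[i]: y regressed on all variables except x2
_, x_res = ols([x1, x3], x2)   # Xi.[i]: x2 regressed on the remaining variables

# The plot is y_res (vertical axis) against x_res (horizontal axis).
for xr, yr in zip(x_res, y_res):
    print(f"{xr:9.4f} {yr:9.4f}")
```

Because both regressions include an intercept, each residual set has mean zero, so the plotted points are centered on the origin.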
Velleman and Welsch list the following useful properties for this plot:
- The least squares linear fit to this plot has slope equal to the estimated coefficient of Xi in the full model and an intercept of zero.
- The residuals from the least squares linear fit to this plot are identical to the residuals from the least squares fit of the original model (Y against all the independent variables including Xi).
- The influences of individual data values on the estimation of a coefficient are easy to see in this plot.
- It is easy to see many kinds of failures of the model or violations of the underlying assumptions (nonlinearity, heteroscedasticity, unusual patterns).
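The first two properties in the list above can be checked numerically. The following Python sketch does so on a small made-up data set; the helper `ols`, the variable names, and the data are assumptions introduced for illustration and are not part of Dataplot.

```python
# Sketch: numerically checking the slope, intercept, and residual properties.
# The helper name, variable names, and data values are hypothetical.

def ols(columns, y):
    """Least squares fit of y on an intercept plus the given columns.
    Returns (coefficients, residuals); coefficients[0] is the intercept."""
    n = len(y)
    X = [[1.0] + [col[r] for col in columns] for r in range(n)]
    p = len(X[0])
    # Normal equations (X'X) b = X'y stored as an augmented matrix.
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    coef = [A[i][p] / A[i][i] for i in range(p)]
    resid = [y[r] - sum(b * x for b, x in zip(coef, X[r])) for r in range(n)]
    return coef, resid

x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0]
x3 = [1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0]
y = [3.1, 2.9, 7.2, 6.8, 11.1, 9.7, 15.2, 13.0]

# Full model: y against x1, x2, x3 (coefficient order: intercept, x1, x2, x3).
full_coef, full_res = ols([x1, x2, x3], y)

# Residual sets for the partial regression plot of Xi = x2.
_, y_res = ols([x1, x3], y)
_, x_res = ols([x1, x3], x2)

# Least squares line through the partial regression plot.
plot_coef, plot_res = ols([x_res], y_res)

print("plot slope:            ", plot_coef[1])
print("full-model coef of x2: ", full_coef[2])
print("plot intercept:        ", plot_coef[0])
print("max residual difference:",
      max(abs(a - b) for a, b in zip(plot_res, full_res)))
```

Up to floating-point rounding, the two slopes agree, the intercept is zero, and the residual difference is zero.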
Partial regression plots are widely discussed in the regression diagnostics literature (e.g., see the References section below), so their strengths and weaknesses are not discussed in detail here.
Partial regression plots are related to, but distinct from, partial residual plots. Partial regression plots are most commonly used to identify leverage points and influential data points that might not be leverage points. Partial residual plots are most commonly used to identify the nature of the relationship between Y and Xi (given the effect of the other independent variables in the model). Note that since the simple correlation between the two sets of residuals plotted is equal to the partial correlation between the response variable and Xi, partial regression plots will show the correct strength of the linear relationship between the response variable and Xi. This is not true for partial residual plots. On the other hand, for the partial regression plot, the x axis is not Xi. This limits its usefulness in determining the need for a transformation (which is the primary purpose of the partial residual plot).
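The statement about the partial correlation can be illustrated with a model that has two independent variables, so that only one variable (x1) is held fixed. The Python sketch below compares the correlation of the two residual sets with the standard first-order partial correlation formula. The function names and data are hypothetical.

```python
# Sketch: correlation of the plotted residuals versus the partial correlation.
# Function names, variable names, and data values are hypothetical.
from math import sqrt

def corr(a, b):
    """Simple (Pearson) correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / sqrt(saa * sbb)

def residuals(x, y):
    """Residuals from the simple least squares regression of y on x
    (intercept included)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((u - mx) * (v - my) for u, v in zip(x, y))
             / sum((u - mx) ** 2 for u in x))
    return [v - my - slope * (u - mx) for u, v in zip(x, y)]

x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0]
y = [3.1, 2.9, 7.2, 6.8, 11.1, 9.7, 15.2, 13.0]

# Model: y against x1 and x2; partial regression plot for Xi = x2.
y_res = residuals(x1, y)    # y with the effect of x1 removed
x_res = residuals(x1, x2)   # x2 with the effect of x1 removed
r_plot = corr(x_res, y_res)

# First-order partial correlation of y and x2 given x1.
r_y2, r_y1, r_21 = corr(y, x2), corr(y, x1), corr(x2, x1)
r_partial = (r_y2 - r_y1 * r_21) / sqrt((1 - r_y1 ** 2) * (1 - r_21 ** 2))

print("correlation of plotted residuals:", r_plot)
print("partial correlation (formula):   ", r_partial)
```

The two printed values agree up to rounding, which is why the scatter in a partial regression plot reflects the strength of the linear relationship after adjusting for the other variables.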
Dataplot provides two forms for the partial regression plot. You can generate either a single partial regression plot or you can generate a matrix of partial regression plots (one plot for each independent variable in the model).
For the matrix form of the command, a number of SET FACTOR PLOT options can be used to control the appearance of the plot (not all of the SET FACTOR PLOT options apply). These are discussed in the Notes section below.
Syntax 1:
PARTIAL REGRESSION PLOT <y> <x1> ... <xk> <xi>
<SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable;
<x1> ... <xk> are the independent variables;
<xi> is the independent variable for which the partial regression plot is being generated
(note that <xi> must be one of the variables listed in <x1> ... <xk>);
and where the <SUBSET/EXCEPT/FOR qualification> is optional.
This is the syntax for generating a single partial regression plot.
Syntax 2:
MATRIX PARTIAL REGRESSION PLOT <y> <x1> ... <xk>
<SUBSET/EXCEPT/FOR qualification>
where <y> is the response variable;
<x1> ... <xk> are the independent variables;
and where the <SUBSET/EXCEPT/FOR qualification> is optional.
This syntax is used to generate a matrix of partial regression plots.
Examples:
PARTIAL REGRESSION PLOT Y X1 X2 X3 X4 X2
MATRIX PARTIAL REGRESSION PLOT Y X1 X2 X3 X4
PARTIAL REGRESSION PLOT Y X1 X2 X3 X4 X2 SUBSET TAG > 2
MATRIX PARTIAL REGRESSION PLOT Y X1 X2 X3 X4 SUBSET TAG > 2
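Conceptually, the matrix form repeats the single-plot computation once for each independent variable. The Python sketch below shows that loop on made-up data, under the assumption that each panel is simply the pair of residual sets for one variable. The helper `ols`, the `panels` dictionary, and the data are hypothetical and do not describe Dataplot internals.

```python
# Sketch: one set of partial regression residuals per independent variable.
# The helper name, the panels dictionary, and the data are hypothetical.

def ols(columns, y):
    """Least squares fit of y on an intercept plus the given columns.
    Returns (coefficients, residuals); coefficients[0] is the intercept."""
    n = len(y)
    X = [[1.0] + [col[r] for col in columns] for r in range(n)]
    p = len(X[0])
    # Normal equations (X'X) b = X'y stored as an augmented matrix.
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    coef = [A[i][p] / A[i][i] for i in range(p)]
    resid = [y[r] - sum(b * x for b, x in zip(coef, X[r])) for r in range(n)]
    return coef, resid

data = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "x2": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0],
    "x3": [1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0],
}
y = [3.1, 2.9, 7.2, 6.8, 11.1, 9.7, 15.2, 13.0]
names = list(data)

full_coef, _ = ols([data[k] for k in names], y)

panels = {}   # variable name -> (x-axis residuals, y-axis residuals)
slopes = {}   # variable name -> slope of the line through that panel
for name in names:
    others = [data[k] for k in names if k != name]
    _, y_res = ols(others, y)            # response with the others removed
    _, x_res = ols(others, data[name])   # this variable with the others removed
    panels[name] = (x_res, y_res)
    # Both residual sets have mean zero, so the fitted line passes
    # through the origin and the slope reduces to this ratio.
    slopes[name] = (sum(a * b for a, b in zip(x_res, y_res))
                    / sum(a * a for a in x_res))
    print(name, "panel slope:", slopes[name])
```

Each panel slope matches the corresponding coefficient from the full fit, so the matrix gives one view per coefficient in the model.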
Notes:
The following option controls which axis tic marks, tic mark labels, and axis labels are plotted.
SET FACTOR PLOT LABELS <ON/OFF/XON/YON/BOX>
BOX is a special option that creates an extra column on the left and an extra row on the bottom. The axis label is printed in this box. BOX is typically reserved for the plot types that plot the variable names in the axis labels.
The default is ON (both x and y axis labels are printed).
The following option controls where the x axis tic marks, tic mark labels, and axis label are printed.
SET FACTOR PLOT X AXIS <BOTTOM/TOP/ALTERNATE>
The default is ALTERNATE.
The following option controls where the y axis tic marks, tic mark labels, and axis label are printed.
SET FACTOR PLOT Y AXIS <LEFT/RIGHT/ALTERNATE>
The default is ALTERNATE.
Users have different preferences in terms of whether the plot frames for neighboring plots are connected or not. This is controlled with the following option.
SET FACTOR PLOT FRAME <DEFAULT/CONNECTED/USER>
Since the plots can often have different limits for the axes, the default is USER.
When the tic marks and tic mark labels are all plotted on the same side (i.e., SET FACTOR PLOT Y AXIS is set to LEFT or RIGHT or SET FACTOR PLOT X AXIS is set to BOTTOM or TOP), then overlap between plots is possible. The TIC OFFSET command can be used to avoid this. In addition, you can stagger the tic labels with the following command:
SET FACTOR PLOT LABEL DISPLACEMENT <NORMAL/STAGGERED/VALUE>
TIC MARK LABEL DISPLACEMENT 10
SET FACTOR PLOT LABEL DISPLACEMENT STAGGERED
SET FACTOR PLOT LABEL DISPLACEMENT 25
It is often helpful on scatter plot matrices to overlay a fitted line on the plots. The following command is used to specify the type of fit.
SET FACTOR PLOT FIT <NONE/LOWESS/LINE/QUAD/SMOOTH>
For LOWESS, it is recommended that the lowess fraction be set fairly high (e.g., LOWESS FRACTION 0.6).
The fitted line is currently only generated if the factor plot type is PLOT.
The default is for no fitted line to be overlaid on the plot. If an overlaid fit is desired, the most common choice is LOWESS.
Dataplot allows you to set axis limits with the LIMITS command. For the factor plot, it is often desirable to set the axis limits for each plot. This can be done with the commands
SET FACTOR PLOT YLIMITS <LOW1> <UPP1> <LOW2> <UPP2> ...
SET FACTOR PLOT XLIMITS <LOW1> <UPP1> <LOW2> <UPP2> ...
You can use standard plot control commands to control the appearance of the factor plot. For example,
MULTIPLOT CORNER COORDINATES 5 5 95 95
MULTIPLOT SCALE FACTOR 3
TIC OFFSET UNITS SCREEN
TIC OFFSET 5 5
Default:
None
Synonyms:
None
Related Commands:
FIT                    = Perform a multi-linear fit.
PARTIAL RESIDUAL PLOT  = Generate a partial residual plot.
PARTIAL LEVERAGE PLOT  = Generate a partial leverage plot.
CCPR PLOT              = Generate a CCPR plot.
VIF                    = Compute variance inflation factors for a multi-linear fit.
CONDITION INDICES      = Compute condition indices for a design matrix.
SCATTER PLOT MATRIX    = Generate a factor plot.
FACTOR PLOT            = Generate a plot for a response against a number of different independent variables.
CONDITIONAL PLOT       = Generate a conditional (subset) plot.