Example:
use https://stats.idre.ucla.edu/stat/data/hsbdemo, clear
sem (read <- math)(science <- read math)
Endogenous variables
Observed: read science
Exogenous variables
Observed: math
Fitting target model:
Iteration 0: log likelihood = -2098.5822
Iteration 1: log likelihood = -2098.5822
Structural equation model                       Number of obs     =        200
Estimation method  = ml
Log likelihood     = -2098.5822

-------------------------------------------------------------------------------
              |                 OIM
              |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
Structural    |
  read        |
         math |    .724807   .0579824    12.50   0.000     .6111636    .8384504
        _cons |   14.07254   3.100201     4.54   0.000     7.996255    20.14882
  ------------+----------------------------------------------------------------
  science     |
         read |   .3654205   .0658305     5.55   0.000     .2363951    .4944459
         math |   .4017207   .0720457     5.58   0.000     .2605138    .5429276
        _cons |    11.6155   3.031268     3.83   0.000     5.674324    17.55668
--------------+----------------------------------------------------------------
   var(e.read)|   58.71925   5.871925                      48.26811    71.43329
var(e.science)|    50.8938    5.08938                      41.83548    61.91346
-------------------------------------------------------------------------------
LR test of model vs. saturated: chi2(0)   =      0.00, Prob > chi2 =      .
estat teffects
Direct effects
------------------------------------------------------------------------------
| OIM
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
Structural |
read |
math | .724807 .0579824 12.50 0.000 .6111636 .8384504
-----------+----------------------------------------------------------------
science |
read | .3654205 .0658305 5.55 0.000 .2363951 .4944459
math | .4017207 .0720457 5.58 0.000 .2605138 .5429276
------------------------------------------------------------------------------
Indirect effects
------------------------------------------------------------------------------
| OIM
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
Structural |
read |
math | 0 (no path)
-----------+----------------------------------------------------------------
science |
read | 0 (no path)
math | .2648593 .0522072 5.07 0.000 .1625351 .3671836
------------------------------------------------------------------------------
Total effects
------------------------------------------------------------------------------
| OIM
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
Structural |
read |
math | .724807 .0579824 12.50 0.000 .6111636 .8384504
-----------+----------------------------------------------------------------
science |
read | .3654205 .0658305 5.55 0.000 .2363951 .4944459
math | .66658 .05799 11.49 0.000 .5529217 .7802384
------------------------------------------------------------------------------
The total effect for math, .66658, is the effect we would find if there were no mediator in our model. It is significant, with a z of 11.49. The direct effect for math is .4017207, which, while still significant (z = 5.58), is much smaller than the total effect. The indirect effect of math that passes through read is .2648593 and is also statistically significant (z = 5.07).
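The indirect effect reported by estat teffects is the product of the math -> read path and the read -> science path, and the total effect is the direct effect plus the indirect effect. The following is a minimal sketch of that check, using coefficients copied from the output above. The nlcom line is an addition that does not appear in the original example; it assumes the sem results are still the active estimation results.

```stata
* Indirect effect by hand: (math -> read) * (read -> science)
display .724807 * .3654205

* Total effect by hand: direct + indirect
display .4017207 + .724807 * .3654205

* The same indirect effect with a delta-method standard error,
* computed from the stored sem coefficients
nlcom _b[read:math] * _b[science:read]
```

Up to rounding, the two display results should agree with the indirect and total effects of math on science shown in the estat teffects tables.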
It is often easier to interpret these values by computing ratios and proportions as shown below.
proportion of total effect mediated = .2648593/.66658 = .3973406
ratio of indirect to direct effect = .2648593/.4017207 = .65931205
ratio of total to direct effect = .66658/.4017207 = 1.6593121
We see above that the proportion of the total effect that is mediated is almost .40 which is a respectable amount. The ratio of the indirect effect to the direct effect is about .66 or almost 2/3 the size of the direct effect. And finally, the total effect is about 1.66 times the direct effect.
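These ratios can be reproduced in Stata with a few display commands. This is only a sketch: the scalar names b_ind, b_dir and b_tot are made up for illustration, and the values are typed in from the estat teffects output rather than pulled from stored results.

```stata
* Effects of math on science, copied from the estat teffects output
scalar b_ind = .2648593
scalar b_dir = .4017207
scalar b_tot = .66658

display "proportion of total effect mediated = " b_ind/b_tot
display "ratio of indirect to direct effect  = " b_ind/b_dir
display "ratio of total to direct effect     = " b_tot/b_dir
```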
Mediation with bootstrap standard errors and confidence intervals
If you are uncomfortable with the standard errors and confidence intervals produced directly by sem, you can obtain bootstrapped standard errors and confidence intervals in two ways: first, by adding the vce(bootstrap) option to your sem command; or second, by writing a small program that runs both sem and estat teffects and then bootstrapping that program.
Let's demonstrate the vce(bootstrap) option. Here we add the reps() suboption and request 200 replications.
sem (read <- math)(science <- read math), vce(bootstrap,reps(200))
Bootstrap replications (200)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
.................................................. 50
.................................................. 100
.................................................. 150
.................................................. 200
Structural equation model Number of obs = 200
Estimation method = ml Replications = 200
Log likelihood = -2098.5822
-------------------------------------------------------------------------------
| Observed Bootstrap Normal-based
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
--------------+----------------------------------------------------------------
Structural |
read |
math | .724807 .0581262 12.47 0.000 .6108818 .8387321
_cons | 14.07254 3.092117 4.55 0.000 8.012099 20.13297
------------+----------------------------------------------------------------
science |
read | .3654205 .0802203 4.56 0.000 .2081915 .5226495
math | .4017207 .0875101 4.59 0.000 .2302041 .5732373
_cons | 11.6155 2.707368 4.29 0.000 6.309158 16.92184
--------------+----------------------------------------------------------------
var(e.read)| 58.71925 5.93704 48.16332 71.58871
var(e.science)| 50.8938 5.496477 41.18471 62.89176
-------------------------------------------------------------------------------
Adding this option gives us bootstrapped standard errors and normal-based confidence intervals for the coefficients. You can now use estat teffects to obtain normal-based bootstrap confidence intervals for the indirect effect.
estat teffects
Direct effects
------------------------------------------------------------------------------
| Observed Bootstrap Normal-based
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
Structural |
read |
math | .724807 .0581262 12.47 0.000 .6108818 .8387321
-----------+----------------------------------------------------------------
science |
read | .3654205 .0802203 4.56 0.000 .2081915 .5226495
math | .4017207 .0875101 4.59 0.000 .2302041 .5732373
------------------------------------------------------------------------------
Indirect effects
------------------------------------------------------------------------------
| Observed Bootstrap Normal-based
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
Structural |
read |
math | 0 (no path)
-----------+----------------------------------------------------------------
science |
read | 0 (no path)
math | .2648593 .0593311 4.46 0.000 .1485726 .3811461
------------------------------------------------------------------------------
Total effects
------------------------------------------------------------------------------
| Observed Bootstrap Normal-based
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
Structural |
read |
math | .724807 .0581262 12.47 0.000 .6108818 .8387321
-----------+----------------------------------------------------------------
science |
read | .3654205 .0802203 4.56 0.000 .2081915 .5226495
math | .66658 .0592669 11.25 0.000 .5504189 .7827411
------------------------------------------------------------------------------
However, you can also write a program to perform the bootstrapping. This lets us obtain percentile-based and bias-corrected confidence intervals in addition to the normal-based ones. Here is the program, which we are calling indireff.ado.
program indireff, rclass
sem (read <- math)(science <- read math)
estat teffects
mat bi = r(indirect)
mat bd = r(direct)
mat bt = r(total)
return scalar indir = el(bi,1,3)
return scalar direct = el(bd,1,3)
return scalar total = el(bt,1,3)
end
So how do we know which elements of r(indirect), r(direct) and r(total) we need? We will use the sem command and then quietly run estat teffects followed by a matrix list to see the matrices of the coefficients.
sem (read <- math)(science <- read math)
quietly estat teffects
matrix list r(indirect)
r(indirect)[1,3]
         read:    science:    science:
            o.          o.
          math        read        math
r1           0           0   .26485934
matrix list r(direct)
r(direct)[1,3]
         read:    science:    science:
          math        read        math
r1   .72480697   .36542052   .40172068
matrix list r(total)
r(total)[1,3]
         read:    science:    science:
          math        read        math
r1   .72480697   .36542052   .66658002
We see that in each case the coefficient of interest is the third element.
Now that we know the correct matrix elements, we will run indireff for 200 bootstrap replications. You may want to run more, say 2,000 to 5,000. We will then request the percentile and bias-corrected confidence intervals.
set seed 358395
bootstrap r(indir) r(direct) r(total), reps(200): indireff
Bootstrap replications (200)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
.................................................. 50
.................................................. 100
.................................................. 150
.................................................. 200
Bootstrap results                               Number of obs     =        200
                                                Replications      =        200

      command:  indireff
        _bs_1:  r(indir)
        _bs_2:  r(direct)
        _bs_3:  r(total)

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   .2648593   .0545941     4.85   0.000     .1578569    .3718618
       _bs_2 |   .4017207   .0872965     4.60   0.000     .2306228    .5728186
       _bs_3 |     .66658   .0576837    11.56   0.000      .553522    .7796381
------------------------------------------------------------------------------
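The table above shows only the normal-based intervals. The percentile and bias-corrected intervals are requested with a postestimation command after bootstrap. A minimal sketch, assuming the bootstrap results from the run above are still the active estimation results:

```stata
* Percentile and bias-corrected confidence intervals
* for _bs_1 (indirect), _bs_2 (direct) and _bs_3 (total)
estat bootstrap, percentile bc
```

With only 200 replications these intervals can be unstable, which is one reason to use the larger number of replications suggested earlier.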
Mediation with multiple IVs
What if you had multiple independent variables? You just need to have one equation for each IV predicting the mediator variable. Here is the symbolic model.
sem (MV <- IV1)(MV <- IV2)(DV <- MV IV1 IV2)
For our example, we will use math and ses as our independent variables. We will keep the same mediator and dependent variable as before.
sem (read <- math)(read <- ses)(science <- read math ses)
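With two independent variables there are two indirect effects, one for each IV, and both pass through read. The sketch below is an illustration rather than part of the original example: it follows the model with estat teffects and then uses nlcom to compute each indirect effect as a product of path coefficients.

```stata
sem (read <- math)(read <- ses)(science <- read math ses)

* Direct, indirect and total effects for both IVs
estat teffects

* Indirect effect of math on science through read
nlcom _b[read:math] * _b[science:read]

* Indirect effect of ses on science through read
nlcom _b[read:ses] * _b[science:read]
```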
Mediation with multiple mediators
In this section we will consider the case in which there are multiple mediator variables. This time there will be one equation for each mediator variable. The symbolic form of the model looks like this.
sem (MV1 <- IV)(MV2 <- IV)(DV <- MV1 MV2 IV)
For our example we will use read and write as the mediators. We will go back to a single independent variable, math.
sem (read <- math)(write <- math)(science <- read write math)
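With two mediators, estat teffects reports the indirect effect of math on science summed over both paths. To see how much passes through each mediator, the specific indirect effects can be computed as products of coefficients. The nlcom lines below are a sketch added for illustration and are not part of the original example.

```stata
sem (read <- math)(write <- math)(science <- read write math)

* Total indirect effect of math on science (both mediators combined)
estat teffects

* Specific indirect effect through read
nlcom _b[read:math] * _b[science:read]

* Specific indirect effect through write
nlcom _b[write:math] * _b[science:write]

* Sum of the two specific indirect effects; this should match
* the indirect effect reported by estat teffects
nlcom _b[read:math]*_b[science:read] + _b[write:math]*_b[science:write]
```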