# Structural Equation Modeling
We went over the matrix representation of SEM models in theory. Now we will show how this can be done practically in R, using the RAM matrix specification mentioned earlier. We will use the political democracy dataset available in the lavaan package for this example. Conveniently, this dataset is ordered so that the first eight columns are the y variables and the last three are the x variables. Let's load it:
```r
library(lavaan)
data(PoliticalDemocracy)
```
We will create the covariance matrix of this dataset, as follows:
```r
pd.cov <- cov(PoliticalDemocracy)
```
Now we will create each of our matrices: A, S, F, and I. In a real SEM analysis, we would iteratively create new A and S matrices, searching for the values that best match the implied covariance matrix to the observed one. Think of the values below as starting values for the matrices. Here, we choose starting values that already lie quite close to a final solution, so that we do not have to iterate dozens of times to reach a good fit. Remember that we have 11 observed and three unobserved (latent) variables, for a total of 14 variables.
First, let's take a look at the A matrix of paths, which is a 14 × 14 matrix:
```r
mat.A <- matrix(
  c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,   1,   0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,   1,   0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,   1,   0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,   1,   0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,   0,   1,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,   0,   1,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,   0,   1,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,   0,   1,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,   0,   0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2,   0,   0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2,   0,   0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,   0,   0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1.5, 0,   0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.5, 0.5, 0
  ), nrow = 14, byrow = TRUE
)
```
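Reading a 14 × 14 grid of numbers is error-prone, so as a sanity check we can rebuild the same A matrix by assigning only its nonzero cells and then list every path with `which()`. This compact construction is our own shorthand, not part of the original example:

```r
# Rebuild mat.A by assigning only the nonzero cells (rows receive, columns send)
mat.A <- matrix(0, 14, 14)
mat.A[1:4, 13]   <- 1             # y1-y4 load on the first latent variable
mat.A[5:8, 14]   <- 1             # y5-y8 load on the second latent variable
mat.A[9:11, 12]  <- c(1, 2, 2)    # x1-x3 load on the third latent variable
mat.A[13, 12]    <- 1.5           # regression between latent variables
mat.A[14, 12:13] <- c(0.5, 0.5)   # regressions between latent variables
# List every nonzero path as (to, from) index pairs
which(mat.A != 0, arr.ind = TRUE)
```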
Then, let's take a look at the S matrix of residual variances and covariances, which is also a 14 × 14 matrix:
```r
mat.S <- matrix(
  c( 2, 0, 0,  0, .5, 0, 0,  0,  0,  0,  0,  0, 0,  0,
     0, 7, 0,  1,  0, 2, 0,  0,  0,  0,  0,  0, 0,  0,
     0, 0, 5,  0,  0, 0, 1,  0,  0,  0,  0,  0, 0,  0,
     0, 1, 0,  3,  0, 0, 0, .5,  0,  0,  0,  0, 0,  0,
    .5, 0, 0,  0,  2, 0, 0,  0,  0,  0,  0,  0, 0,  0,
     0, 2, 0,  0,  0, 5, 0,  1,  0,  0,  0,  0, 0,  0,
     0, 0, 1,  0,  0, 0, 3,  0,  0,  0,  0,  0, 0,  0,
     0, 0, 0, .5,  0, 1, 0,  3,  0,  0,  0,  0, 0,  0,
     0, 0, 0,  0,  0, 0, 0,  0, .1,  0,  0,  0, 0,  0,
     0, 0, 0,  0,  0, 0, 0,  0,  0, .1,  0,  0, 0,  0,
     0, 0, 0,  0,  0, 0, 0,  0,  0,  0, .5,  0, 0,  0,
     0, 0, 0,  0,  0, 0, 0,  0,  0,  0,  0, .5, 0,  0,
     0, 0, 0,  0,  0, 0, 0,  0,  0,  0,  0,  0, 4,  0,
     0, 0, 0,  0,  0, 0, 0,  0,  0,  0,  0,  0, 0, .2
  ), nrow = 14, byrow = TRUE
)
```
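Because S holds variances and covariances, it must be symmetric. A compact way to build the same matrix, diagonal first and then mirrored off-diagonal residual covariances, makes that property easy to verify. The construction below is our own shorthand for the matrix typed out above:

```r
# Diagonal: residual variances of y1-y8, x1-x3, and the three latent variables
mat.S <- diag(c(2, 7, 5, 3, 2, 5, 3, 3, .1, .1, .5, .5, 4, .2))
mat.S[1, 5] <- mat.S[5, 1] <- .5   # residual covariance between y1 and y5
mat.S[2, 4] <- mat.S[4, 2] <- 1    # y2 and y4
mat.S[2, 6] <- mat.S[6, 2] <- 2    # y2 and y6
mat.S[3, 7] <- mat.S[7, 3] <- 1    # y3 and y7
mat.S[4, 8] <- mat.S[8, 4] <- .5   # y4 and y8
mat.S[6, 8] <- mat.S[8, 6] <- 1    # y6 and y8
isSymmetric(mat.S)                 # must be TRUE for any valid S matrix
```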
Next, let's take a look at the filter matrix, which selects the observed variables and is therefore an 11 × 14 matrix:
```r
mat.F <- matrix(
  c(1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0
  ), nrow = 11, byrow = TRUE
)
```
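Since the filter matrix simply keeps the 11 observed variables and drops the 3 latent ones, the same matrix can also be built as an identity block padded with zero columns, a quick sketch:

```r
# Identity block for the 11 observed variables, zero columns for the 3 latent ones
mat.F <- cbind(diag(11), matrix(0, nrow = 11, ncol = 3))
dim(mat.F)   # 11 x 14, as required
```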
Finally, we create the 14 × 14 identity matrix:

```r
mat.I <- diag(rep(1, 14), nrow = 14)
```
We can write a function that performs all of the matrix operations of the McArdle-McDonald equation (the argument names carry a .0 suffix to distinguish them from the matrices defined above), as follows:
```r
RAM.implied.covariance <- function(A.0, S.0, F.0, I.0) {
  implied.covariance <- F.0 %*% solve(I.0 - A.0) %*% S.0 %*%
    t(solve(I.0 - A.0)) %*% t(F.0)
  return(implied.covariance)
}
```
And, finally, we can estimate an implied covariance matrix based on our starting A and S matrices, shown here rounded to two decimal places:
```
> round(RAM.implied.covariance(mat.A, mat.S, mat.F, mat.I), 2)
      [,1]  [,2]  [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
 [1,] 7.12  5.12  5.12 5.12 3.44 2.94 2.94 2.94 0.75  1.50  1.50
 [2,] 5.12 12.12  5.12 6.12 2.94 4.94 2.94 2.94 0.75  1.50  1.50
 [3,] 5.12  5.12 10.12 5.12 2.94 2.94 3.94 2.94 0.75  1.50  1.50
 [4,] 5.12  6.12  5.12 8.12 2.94 2.94 2.94 3.44 0.75  1.50  1.50
 [5,] 3.44  2.94  2.94 2.94 3.98 1.98 1.98 1.98 0.63  1.25  1.25
 [6,] 2.94  4.94  2.94 2.94 1.98 6.98 1.98 2.98 0.63  1.25  1.25
 [7,] 2.94  2.94  3.94 2.94 1.98 1.98 4.98 1.98 0.63  1.25  1.25
 [8,] 2.94  2.94  2.94 3.44 1.98 2.98 1.98 4.98 0.63  1.25  1.25
 [9,] 0.75  0.75  0.75 0.75 0.63 0.63 0.63 0.63 0.60  1.00  1.00
[10,] 1.50  1.50  1.50 1.50 1.25 1.25 1.25 1.25 1.00  2.10  2.00
[11,] 1.50  1.50  1.50 1.50 1.25 1.25 1.25 1.25 1.00  2.00  2.50
```
Compare the preceding matrix to the observed covariance matrix, as follows:
```
> round(pd.cov, 2)
     y1    y2    y3    y4   y5    y6    y7    y8   x1   x2   x3
y1 6.88  6.25  5.84  6.09 5.06  5.75  5.81  5.67 0.73 1.27 0.91
y2 6.25 15.58  5.84  9.51 5.60  9.39  7.54  7.76 0.62 1.49 1.17
y3 5.84  5.84 10.76  6.69 4.94  4.73  7.01  5.64 0.79 1.55 1.04
y4 6.09  9.51  6.69 11.22 5.70  7.44  7.49  8.01 1.15 2.24 1.84
y5 5.06  5.60  4.94  5.70 6.83  4.98  5.82  5.34 1.08 2.06 1.58
y6 5.75  9.39  4.73  7.44 4.98 11.38  6.75  8.25 0.85 1.81 1.57
y7 5.81  7.54  7.01  7.49 5.82  6.75 10.80  7.59 0.94 2.00 1.63
y8 5.67  7.76  5.64  8.01 5.34  8.25  7.59 10.53 1.10 2.23 1.69
x1 0.73  0.62  0.79  1.15 1.08  0.85  0.94  1.10 0.54 0.99 0.82
x2 1.27  1.49  1.55  2.24 2.06  1.81  2.00  2.23 0.99 2.28 1.81
x3 0.91  1.17  1.04  1.84 1.58  1.57  1.63  1.69 0.82 1.81 1.98
```
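The closeness of two covariance matrices can also be quantified rather than eyeballed. One common measure is the maximum-likelihood discrepancy function, log|Sigma| + tr(S Sigma^-1) - log|S| - p, which is zero only when the implied matrix equals the observed one. The helper below is a sketch of that formula on a tiny 2 × 2 example; `ml.discrepancy` is our own name, not a lavaan function:

```r
# Hypothetical helper: ML discrepancy between an observed covariance matrix
# obs and a model-implied covariance matrix imp (both must be positive definite)
ml.discrepancy <- function(obs, imp) {
  log(det(imp)) + sum(diag(obs %*% solve(imp))) - log(det(obs)) - nrow(obs)
}
obs <- matrix(c(2, 1, 1, 3), 2, 2)
imp <- matrix(c(2.2, 0.9, 0.9, 3.1), 2, 2)
ml.discrepancy(obs, imp)   # small positive number: close, but not equal
ml.discrepancy(obs, obs)   # exactly 0: a perfect fit
```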
As we can see, it is pretty close. Can we improve it? Yes, but doing so takes trial and error with new values in the A (and, subsequently, the S) matrix. This is exactly what the optimizers in statistical software do, and the R packages described here do just that. Note that many SEM models have an infinite number of equally good solutions, so some values usually need to be constrained; here, fixing one path from each latent variable to one of its manifest variables does the trick.
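To illustrate what such an optimizer does, here is a toy sketch, not the book's model: a two-indicator, one-factor RAM model whose free loadings are recovered with base R's `optim()` by minimizing a least-squares discrepancy against a known target covariance. Fixing the residual and factor variances plays the role of the identifying constraints just mentioned:

```r
# Toy RAM model: 2 observed variables and 1 latent factor (variable 3)
implied <- function(theta) {
  A.0 <- matrix(0, 3, 3)
  A.0[1:2, 3] <- theta                 # the two free factor loadings
  S.0 <- diag(c(.5, .5, 1))            # residual and factor variances, fixed
  F.0 <- cbind(diag(2), 0)             # filter keeping the observed variables
  I.0 <- diag(3)
  F.0 %*% solve(I.0 - A.0) %*% S.0 %*% t(solve(I.0 - A.0)) %*% t(F.0)
}
target <- implied(c(1, 1.5))           # "observed" covariance, known loadings
fit <- optim(c(0.5, 0.5),              # starting values for the two loadings
             function(theta) sum((implied(theta) - target)^2))
round(fit$par, 2)                      # recovers loadings close to 1 and 1.5
```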