Here is what I thought:
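1) Look at the scale of measurement of each variable. If the scales differ (e.g., height vs. age), standardize all the variables to mean 0 and standard deviation 1, or run the analysis on the correlation matrix, which gives the same components.
2) Check whether each variable is close to normally distributed. If it is not, consider a transformation (e.g., a log for right-skewed variables). Roughly symmetric variables give an approximately elliptical data cloud, which is the shape principal components describe best.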
3) Here is an example in SAS (using the correlation matrix, which is what PROC PRINCOMP analyzes by default):
A) The data:
data Crime;
title 'Crime Rates per 100,000 Population by State';
input State $1-15 Murder Rape Robbery Assault
Burglary Larceny Auto_Theft;
datalines;
Alabama         14.2 25.2 96.8 278.3 1135.5 1881.9 280.7
Alaska          10.8 51.6 96.8 284.0 1331.7 3369.8 753.3
Arizona         9.5 34.2 138.2 312.3 2346.1 4467.4 439.5
Arkansas        8.8 27.6 83.2 203.4 972.6 1862.1 183.4
California      11.5 49.4 287.0 358.0 2139.4 3499.8 663.5
Colorado        6.3 42.0 170.7 292.9 1935.2 3903.2 477.1
Connecticut     4.2 16.8 129.5 131.8 1346.0 2620.7 593.2
Delaware        6.0 24.9 157.0 194.2 1682.6 3678.4 467.0
Florida         10.2 39.6 187.9 449.1 1859.9 3840.5 351.4
Georgia         11.7 31.1 140.5 256.5 1351.1 2170.2 297.9
Hawaii          7.2 25.5 128.0 64.1 1911.5 3920.4 489.4
Idaho           5.5 19.4 39.6 172.5 1050.8 2599.6 237.6
Illinois        9.9 21.8 211.3 209.0 1085.0 2828.5 528.6
Indiana         7.4 26.5 123.2 153.5 1086.2 2498.7 377.4
Iowa            2.3 10.6 41.2 89.8 812.5 2685.1 219.9
Kansas          6.6 22.0 100.7 180.5 1270.4 2739.3 244.3
Kentucky        10.1 19.1 81.1 123.3 872.2 1662.1 245.4
Louisiana       15.5 30.9 142.9 335.5 1165.5 2469.9 337.7
Maine           2.4 13.5 38.7 170.0 1253.1 2350.7 246.9
Maryland        8.0 34.8 292.1 358.9 1400.0 3177.7 428.5
...
;
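B) Run the PRINCOMP procedure:
proc princomp data=Crime out=Crime_Components;
run;
C) Look at the output (shown below). Pay attention to the correlation matrix: if the correlation between two variables is close to 1 or -1, they carry nearly the same information and you can omit one of them.
Also pay attention to the eigenvalues of the correlation matrix. A component with an eigenvalue below 1 explains less variance than a single standardized variable, so a common rule of thumb is not to use it. In the output below, only the first two components have eigenvalues above 1 (4.11 and 1.24), and together they account for about 76.5% of the total variance.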
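Steps 1 and 2 could be sketched in SAS roughly as follows. This is only a sketch, assuming the Crime data set from step A already exists. The data set names Crime_Log and Crime_Std, the variable Log_Murder, and the choice of Murder as the variable to transform are hypothetical and used only for illustration.

```sas
/* Step 2: inspect the distribution of each variable.             */
/* NORMAL requests tests for normality; the HISTOGRAM statement   */
/* overlays a fitted normal curve on each histogram.              */
proc univariate data=Crime normal;
   var Murder Rape Robbery Assault Burglary Larceny Auto_Theft;
   histogram / normal;
run;

/* If a variable looks right-skewed, try a transformation.        */
/* Log of Murder is an illustration only, not a recommendation;   */
/* LOG requires strictly positive values.                         */
data Crime_Log;                 /* hypothetical data set name */
   set Crime;
   Log_Murder = log(Murder);    /* hypothetical new variable  */
run;

/* Step 1: standardize explicitly to mean 0 and std 1.            */
/* This is only needed if you analyze the covariance matrix       */
/* (COV option); the default correlation-matrix analysis already  */
/* puts all variables on a common scale.                          */
proc stdize data=Crime out=Crime_Std method=std;
   var Murder Rape Robbery Assault Burglary Larceny Auto_Theft;
run;
```

If you skip the explicit standardization and leave PROC PRINCOMP at its default, the variables are effectively standardized for you, which is why the example here does not include a separate standardization step.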
Hope this helps.
Crime Rates per 100,000 Population by State |
Observations | 50 |
Variables | 7 |
Simple Statistics |
| Murder | Rape | Robbery | Assault | Burglary | Larceny | Auto_Theft |
Mean | 7.444000000 | 25.73400000 | 124.0920000 | 211.3000000 | 1291.904000 | 2671.288000 | 377.5260000 |
StD | 3.866768941 | 10.75962995 | 88.3485672 | 100.2530492 | 432.455711 | 725.908707 | 193.3944175 |
Correlation Matrix |
| Murder | Rape | Robbery | Assault | Burglary | Larceny | Auto_Theft |
Murder | 1.0000 | 0.6012 | 0.4837 | 0.6486 | 0.3858 | 0.1019 | 0.0688 |
Rape | 0.6012 | 1.0000 | 0.5919 | 0.7403 | 0.7121 | 0.6140 | 0.3489 |
Robbery | 0.4837 | 0.5919 | 1.0000 | 0.5571 | 0.6372 | 0.4467 | 0.5907 |
Assault | 0.6486 | 0.7403 | 0.5571 | 1.0000 | 0.6229 | 0.4044 | 0.2758 |
Burglary | 0.3858 | 0.7121 | 0.6372 | 0.6229 | 1.0000 | 0.7921 | 0.5580 |
Larceny | 0.1019 | 0.6140 | 0.4467 | 0.4044 | 0.7921 | 1.0000 | 0.4442 |
Auto_Theft | 0.0688 | 0.3489 | 0.5907 | 0.2758 | 0.5580 | 0.4442 | 1.0000 |
Eigenvalues of the Correlation Matrix |
| Eigenvalue | Difference | Proportion | Cumulative |
1 | 4.11495951 | 2.87623768 | 0.5879 | 0.5879 |
2 | 1.23872183 | 0.51290521 | 0.1770 | 0.7648 |
3 | 0.72581663 | 0.40938458 | 0.1037 | 0.8685 |
4 | 0.31643205 | 0.05845759 | 0.0452 | 0.9137 |
5 | 0.25797446 | 0.03593499 | 0.0369 | 0.9506 |
6 | 0.22203947 | 0.09798342 | 0.0317 | 0.9823 |
7 | 0.12405606 | | 0.0177 | 1.0000 |
Eigenvectors |
| Prin1 | Prin2 | Prin3 | Prin4 | Prin5 | Prin6 | Prin7 |
Murder | 0.300279 | -.629174 | 0.178245 | -.232114 | 0.538123 | 0.259117 | 0.267593 |
Rape | 0.431759 | -.169435 | -.244198 | 0.062216 | 0.188471 | -.773271 | -.296485 |
Robbery | 0.396875 | 0.042247 | 0.495861 | -.557989 | -.519977 | -.114385 | -.003903 |
Assault | 0.396652 | -.343528 | -.069510 | 0.629804 | -.506651 | 0.172363 | 0.191745 |
Burglary | 0.440157 | 0.203341 | -.209895 | -.057555 | 0.101033 | 0.535987 | -.648117 |
Larceny | 0.357360 | 0.402319 | -.539231 | -.234890 | 0.030099 | 0.039406 | 0.601690 |
Auto_Theft | 0.295177 | 0.502421 | 0.568384 | 0.419238 | 0.369753 | -.057298 | 0.147046 |
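Applying the eigenvalue rule to this output, only Prin1 and Prin2 would be kept. A minimal sketch of how that might look, assuming the Crime data set from above; Crime_2PC is a hypothetical output data set name, and the scatter plot is just one possible way to look at the retained scores:

```sas
/* N=2 limits the analysis to the first two principal components. */
proc princomp data=Crime n=2 out=Crime_2PC;   /* Crime_2PC: hypothetical name */
   var Murder Rape Robbery Assault Burglary Larceny Auto_Theft;
run;

/* Plot each state on the two retained components.                */
/* Prin1 and Prin2 are the score variables PROC PRINCOMP writes   */
/* to the OUT= data set.                                          */
proc sgplot data=Crime_2PC;
   scatter x=Prin1 y=Prin2 / datalabel=State;
run;
```

Judging from the eigenvectors above, Prin1 has positive weights on all seven crimes, so it can be read as an overall crime-rate measure. Prin2 has positive weights on Auto_Theft and Larceny and negative weights on Murder and Assault, so it seems to contrast property crime with violent crime.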