[Q&A] Sample Size for Factor Analysis?

Post #11

ReneeBK, posted 2014-3-29 10:28:02

I have over 100 variables (potential questions for a survey), and so far only about 30 pilot test responses.

One thought that occurs to me is that our 100 variables actually fall into half a dozen groups. Each group of questions was designed to elicit a particular dimension of consumer satisfaction. Rather than attempting to run a factor analysis on all 100+ variables at once with so few cases, would it make more sense to:

1. run the factor analysis on one group of questions at a time;
2. reduce the group to the one or two questions with the highest loadings on the first principal component;
3. repeat the above procedure for each group of questions; and
4. finally, conduct a factor analysis on the reduced set of variables to test the hypothesis that consumer satisfaction, as reflected in this set of questions, really is multidimensional?
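A minimal sketch of the four-step procedure described above, assuming NumPy and simulated data. The group names `courtesy` and `timeliness`, the helper names, and the choice of keeping two items per group are illustrative assumptions, not part of the original post. Loadings are taken from the eigen-decomposition of each group's correlation matrix (i.e. PCA), not from any particular SPSS procedure.

```python
import numpy as np

def first_pc_loadings(data):
    """Loadings of each column on the first principal component of the correlation matrix."""
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)          # eigenvalues in ascending order
    loadings = eigvecs[:, -1] * np.sqrt(eigvals[-1])
    # An eigenvector's sign is arbitrary; flip it so the loadings are mostly positive.
    return -loadings if loadings.sum() < 0 else loadings

def reduce_groups(data, groups, keep=2):
    """Steps 1-3: keep the `keep` items with the largest absolute loading in each group."""
    selected = {}
    for name, cols in groups.items():
        loadings = first_pc_loadings(data[:, cols])
        best = np.argsort(-np.abs(loadings))[:keep]
        selected[name] = sorted(cols[i] for i in best)
    return selected

# Simulated pilot data: 30 respondents, two hypothetical groups of four items each.
rng = np.random.default_rng(0)
n = 30
f1, f2 = rng.normal(size=(2, n))
data = np.column_stack([f1] * 4 + [f2] * 4) + rng.normal(scale=0.5, size=(n, 8))
groups = {"courtesy": [0, 1, 2, 3], "timeliness": [4, 5, 6, 7]}

selected = reduce_groups(data, groups, keep=2)
# Step 4: eigenvalues (largest first) of the correlation matrix of the retained items only.
reduced_cols = sorted(c for cols in selected.values() for c in cols)
eigvals = np.linalg.eigvalsh(np.corrcoef(data[:, reduced_cols], rowvar=False))[::-1]
print(selected)
print(np.round(eigvals, 3))
```

One caution in the spirit of the thread: because the items are selected on the same 30 cases that are then re-analysed, the second-stage structure will tend to look cleaner than it would on fresh data.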

The guiding theory here is that consumer satisfaction has multiple components. Each group of questions is designed to elicit the degree of satisfaction with a particular dimension of consumer experience suggested in the literature. There is a great deal of overlap in the language of the questions, as we seek to identify the language that resonates with our consumers. Our goal is to develop a consumer satisfaction instrument for our agency that is genuinely multidimensional, allowing the agency to get a better idea of where improvements are most needed. Our current instrument is short and seems to address different issues, but the answers we get are so highly correlated that we seem to be measuring only global satisfaction, which is not a very useful result.

Post #12

ReneeBK, posted 2014-3-29 10:29:00

I would like to come back to my question: how can one reduce the complexity of a large set of variables when there are few cases?

This happens in comparative political science all the time when you have countries as cases and a large set of variables that describe them.

I now have a set of some 20 countries in Europe. If you study the EU member states at an aggregate level today, you have 27 countries; there are no more member states. I have even fewer cases because my sources cover the countries unequally (the OECD data do not survey the same countries as the EU, the European Social Survey, etc.). At the same time, I have a large set of variables describing the economic, social, and cultural structure of the same 20 countries. So how can I find a pattern in the variables if the usual condition of 10 cases per variable for a sound factor analysis is not met?

A second question: Factor analysis does not print the KMO or the AIC (anti-image correlation) output, even if I request all statistics on the PRINT subcommand. Is this due to the small number of cases? How can I force SPSS to print the KMO or the AIC output?

Post #13

ReneeBK, posted 2014-3-29 10:29:32

Some kludges.

Create meaningful subsets of the variables.

Sidestep the question about whether the obtained matrices are reasonable representations of the population matrix.
IFF you want to consider the 27 countries the total population about which you wish to make statements, then take a large dose of salt, hold your nose, and pretend that the obtained correlation matrix IS the population matrix. Write out the matrix products (means, SDs, Rs) and read them back in, faking the number of cases. Use unit weights to create summative scores of standardized item variables.
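The last suggestion (unit-weighted summative scores of standardized items) can be sketched in plain Python as follows. The item names and values are made up for illustration, and the sketch assumes every item is already keyed in the same direction.

```python
from statistics import mean, stdev

def unit_weighted_score(items):
    """Unit-weighted composite: the plain sum of z-scored items for each case.

    `items` maps an item name to a list of values, one per case (e.g. per country).
    All items are assumed to be keyed in the same direction.
    """
    standardized = []
    for values in items.values():
        m, s = mean(values), stdev(values)            # sample SD (n - 1 denominator)
        standardized.append([(v - m) / s for v in values])
    # zip(*...) walks case by case across all standardized items; every weight is 1.
    return [sum(case) for case in zip(*standardized)]

# Hypothetical, made-up values for five countries (not real data).
items = {
    "indicator_a": [2.0, 3.5, 1.0, 4.0, 2.5],
    "indicator_b": [10.0, 14.0, 9.0, 15.0, 12.0],
    "indicator_c": [0.3, 0.6, 0.2, 0.7, 0.4],
}
scores = unit_weighted_score(items)
print([round(s, 2) for s in scores])
```

Unit weights avoid estimating item weights from a tiny sample, which is the point of the suggestion: nothing is fitted beyond each item's mean and SD.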

Create a few nominal-level variables that record each country's cluster membership, based on clustering the countries on each of the subsets mentioned above. Add an additional cluster identifier for cases that do not have the variables needed to create the cluster. Each membership value in the clustering would stand for a meaningful profile.
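A rough sketch of the clustering step, assuming plain k-means on already-standardized variables (the post does not name a clustering method; TWOSTEP or hierarchical clustering would be other options). Country labels and values are hypothetical. Cases with a missing clustering variable get the extra code k + 1, playing the role of the additional cluster identifier.

```python
import math

def kmeans(points, k, iters=100):
    """Plain k-means; the first k points seed the centroids (deterministic, for illustration)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def cluster_membership(cases, k):
    """Nominal codes 1..k for clustered cases; code k + 1 for cases missing any clustering variable."""
    complete = [name for name, vals in cases.items() if None not in vals]
    labels = kmeans([cases[name] for name in complete], k)
    codes = {name: lab + 1 for name, lab in zip(complete, labels)}
    for name in cases:
        codes.setdefault(name, k + 1)   # the extra "not enough data" cluster
    return codes

# Hypothetical countries with two already-standardized variables; F lacks one value.
cases = {
    "A": [-1.0, -1.2], "B": [-0.9, -1.0], "C": [1.1, 0.9],
    "D": [1.0, 1.2], "E": [-1.1, -0.8], "F": [0.2, None],
}
codes = cluster_membership(cases, k=2)
print(codes)
```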

Relate the cluster memberships to each other with CROSSTABS, CATPCA, and TWOSTEP, treating the membership variables as nominal level.
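Relating two membership variables starts from a contingency table of cell counts. A minimal Python stand-in for the CROSSTABS cell counts only (no chi-square, no CATPCA), using hypothetical memberships derived from two different subsets of variables:

```python
from collections import Counter

def crosstab(row_codes, col_codes):
    """Cell counts for two nominal membership variables, like a CROSSTABS table."""
    shared = sorted(row_codes.keys() & col_codes.keys())   # cases coded in both
    return Counter((row_codes[c], col_codes[c]) for c in shared)

# Hypothetical memberships from two different subsets of variables.
industrial = {"A": 1, "B": 1, "C": 2, "D": 2, "E": 3}
social = {"A": 1, "B": 2, "C": 2, "D": 2, "E": 3}

table = crosstab(industrial, social)
rows = sorted({r for r, _ in table})
cols = sorted({c for _, c in table})
print("ind/soc", *cols)
for r in rows:
    print(r, *(table[(r, c)] for c in cols))   # Counter returns 0 for empty cells
```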

Create choropleth (patch) maps of the memberships. Try different coordinate systems, including weighting visual area by population.

Relate the cluster memberships to variables that were not used to create that clustering, e.g., relate industrial clusters to housing variables.
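Relating a clustering to a variable that was not used to build it can begin with simple per-cluster means. A sketch with hypothetical industrial clusters and a made-up housing variable; cases missing the variable are skipped:

```python
from collections import defaultdict
from statistics import mean

def profile_by_cluster(codes, variable):
    """Mean of a variable that was NOT used to form the clusters, for each cluster code."""
    groups = defaultdict(list)
    for case, code in codes.items():
        value = variable.get(case)
        if value is not None:             # skip cases missing this variable
            groups[code].append(value)
    return {code: mean(vals) for code, vals in sorted(groups.items())}

# Hypothetical industrial clusters and a made-up housing variable.
industrial = {"A": 1, "B": 1, "C": 2, "D": 2, "E": 3}
housing = {"A": 60.0, "B": 70.0, "C": 80.0, "D": 90.0, "E": None}
print(profile_by_cluster(industrial, housing))
```

Cluster 3 drops out of the profile because its only member has no housing value, which mirrors the extra "no data" identifier idea above.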

Post #14

ReneeBK, posted 2014-3-29 10:29:52

Another approach you might consider is Partial Least Squares. This is useful for both categorical and continuous (scale) dependent variables. This is available in SPSS Statistics v16 or 17 as an add-in via programmability that can be downloaded from Developer Central (www.spss.com/devcentral). Of course, you don't get all the inferential apparatus of traditional regression methods, but it has the advantage of finding best combinations of predictors for particular dependent variables.
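For readers without the SPSS extension, a minimal NumPy sketch of single-response PLS regression (PLS1, fitted with the NIPALS algorithm). This is a generic textbook version written for illustration, not the code of the SPSS extension mentioned above; the data are simulated, and the choice of two components is arbitrary.

```python
import numpy as np

def pls1(X, y, n_components):
    """PLS regression for one continuous response, fitted with the NIPALS algorithm (PLS1).

    Returns (coefficients, X column means, y mean); predict with (X - x_mean) @ beta + y_mean.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w = w / np.linalg.norm(w)       # weights: direction of maximal covariance with y
        t = Xk @ w                      # component scores
        tt = t @ t
        p = Xk.T @ t / tt               # X loadings
        c = (yk @ t) / tt               # y loading
        Xk = Xk - np.outer(t, p)        # deflate X and y before the next component
        yk = yk - c * t
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)
    return beta, x_mean, y_mean

# Few cases, many predictors: 20 hypothetical "countries", 30 variables, 2 components.
rng = np.random.default_rng(3)
X = rng.normal(size=(20, 30))
y = X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=20)
beta, x_mean, y_mean = pls1(X, y, n_components=2)
fitted = (X - x_mean) @ beta + y_mean
print(beta.shape, np.corrcoef(fitted, y)[0, 1])
```

PLS works with more predictors than cases because it never inverts X'X; it only needs the small n_components x n_components matrix P'W. With as many components as X has full column rank, it reduces to ordinary least squares.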

HTH,
Jon Peck

Post #15

ReneeBK, posted 2014-3-29 10:30:56

Resolution number: 20414  Created on: Aug 21 2001  Last Reviewed on: Feb 28 2009

Problem Subject:  FACTOR does not print KMO or Bartlett test for Nonpositive Definite Matrices

Problem Description:  I have run the SPSS FACTOR procedure with principal components analysis (PCA) as the extraction method. I requested the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy and the Bartlett test of sphericity, but neither of these measures was printed. The "Communalities", "Total Variance Explained" and "Component Matrix" tables were printed. Why was my request for KMO and Bartlett's sphericity test ignored?

Resolution Subject: KMO, Bartlett's sphericity, and anti-image correlation not printed for nonpositive definite matrices

Resolution Description:
It is likely the case that your correlation matrix is nonpositive definite (NPD), i.e., that some of the eigenvalues of your correlation matrix are not positive numbers. If this is the case, there will be a footnote to the correlation matrix that states "This matrix is not positive definite." Even if you did not request the correlation matrix as part of the FACTOR output, requesting the KMO or Bartlett test will cause the title "Correlation Matrix" to be printed. The footnote will be printed under this title if the correlation matrix was not requested. An NPD matrix will also result in suppression of other output from the 'Descriptives' dialog of the Factor dialog, namely the inverse of the correlation matrix, the anti-image correlation matrix, and the significance values for the correlations. If you had requested a factor extraction method other than PCA or unweighted least squares (ULS), an NPD matrix would have caused the procedure to stop without further analysis.
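A quick way to check the condition described above is to look at the eigenvalues directly. A sketch assuming NumPy; the 3 x 3 matrix is invented so that its correlations cannot all hold at once (the kind of inconsistency pairwise deletion can produce), and the tolerance `tol` is an arbitrary choice.

```python
import numpy as np

def check_positive_definite(corr, tol=1e-10):
    """Return the eigenvalues (ascending) of a correlation matrix and whether all exceed `tol`."""
    eigvals = np.linalg.eigvalsh(corr)    # symmetric matrix -> real eigenvalues
    return eigvals, bool(eigvals.min() > tol)

# X1 and X2 correlate .9, X2 and X3 correlate .9, yet X1 and X3 correlate -.9.
bad = np.array([[1.0, 0.9, -0.9],
                [0.9, 1.0, 0.9],
                [-0.9, 0.9, 1.0]])
eigvals, is_pd = check_positive_definite(bad)
print(np.round(eigvals, 3), is_pd)     # smallest eigenvalue is -0.8, so not positive definite
```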

Matrices can be NPD as a result of various other properties. A correlation matrix will be NPD if there are linear dependencies among the variables, as reflected by one or more eigenvalues of 0. For example, if variable X12 can be reproduced by a weighted sum of variables X5, X7, and X10, then there is a linear dependency among those variables and the correlation matrix that includes them will be NPD. If there are more variables in the analysis than there are cases, then the correlation matrix will have linear dependencies and be NPD. Remember that FACTOR uses listwise deletion of cases with missing data by default. If you had more cases in the file than variables in the analysis but also had many missing values, listwise deletion could leave you with more variables than retained cases. Pairwise deletion of missing data can also lead to NPD matrices. Negative eigenvalues may be present in these situations. See the following chapter for a helpful discussion and illustration of how this can happen.
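Both the linear-dependency case and the listwise-deletion case named above can be reproduced with simulated data. A sketch assuming NumPy; the weights 0.5, 1 and 2 and the 20% missing rate are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n_cases, n_vars = 30, 12

# 1. Linear dependency: make X12 a weighted sum of X5, X7 and X10 (0-based columns 11, 4, 6, 9).
X = rng.normal(size=(n_cases, n_vars))
X[:, 11] = 0.5 * X[:, 4] + X[:, 6] + 2.0 * X[:, 9]
eig_dep = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))
print("smallest eigenvalue:", eig_dep[0])     # zero up to floating-point rounding

# 2. Listwise deletion: keep a case only if it has no missing value on any variable.
Y = rng.normal(size=(n_cases, n_vars))
Y[rng.random(size=Y.shape) < 0.2] = np.nan   # about 20% of cells missing at random
complete = ~np.isnan(Y).any(axis=1)
print("cases retained:", int(complete.sum()), "of", n_cases, "for", n_vars, "variables")
```

With only a handful of complete cases left, a correlation matrix computed from them would involve more variables than cases and therefore be NPD.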

Wothke, W. (1993). Nonpositive definite matrices in structural modeling. In K. A. Bollen & J. S. Long (Eds.), Testing Structural Equation Models (Chap. 11, pp. 256-293). Newbury Park, CA: Sage.

Elements of the KMO and Bartlett test statistics cannot be calculated if the correlation matrix is NPD. See the formulae for these statistics in the current Statistical Algorithms documentation: click Help->Algorithms in SPSS, scroll down to the link for Factor Algorithms, then click the link for Optional Statistics. The formulae are also on page 20 of the Factor chapter at
http://support.spss.com/ProductsExt/SPSS/Documentation/Statistics/algorithms/14.0/factor.pdf

The Bartlett formula includes the log of the determinant of the correlation matrix. If there are linear dependencies, then the determinant of the matrix will be 0 and its log will be undefined. The KMO formula includes elements of the anti-image covariance matrix, whose calculation involves the inverse of the correlation matrix. If the correlation matrix has linear dependencies, then its inverse cannot be computed.
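The dependence of both statistics on ln|R| and on the inverse of R can be made concrete. The sketch below uses the usual textbook forms, Bartlett's chi-square = -[(n - 1) - (2p + 5)/6] ln|R| with p(p - 1)/2 degrees of freedom, and KMO = sum of r_ij^2 / (sum of r_ij^2 + sum of a_ij^2) over i != j, with a_ij the partial correlations. The SPSS Algorithms document remains the authority on what FACTOR itself computes. It assumes NumPy and refuses NPD input, much as FACTOR does.

```python
import numpy as np

def bartlett_and_kmo(corr, n, tol=1e-10):
    """Bartlett's sphericity chi-square, its df, and the overall KMO for a p x p correlation matrix.

    Raises ValueError for a non-positive-definite matrix, where ln|R| and the inverse fail.
    """
    p = corr.shape[0]
    eigvals = np.linalg.eigvalsh(corr)
    if eigvals.min() <= tol:
        raise ValueError("correlation matrix is not positive definite")
    log_det = np.log(eigvals).sum()                 # ln|R| = sum of log eigenvalues
    chi2 = -((n - 1) - (2 * p + 5) / 6) * log_det
    df = p * (p - 1) // 2
    inv = np.linalg.inv(corr)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)                 # partial correlations; anti-image correlations are their negatives
    off = ~np.eye(p, dtype=bool)                    # off-diagonal mask
    r2 = np.sum(corr[off] ** 2)
    a2 = np.sum(partial[off] ** 2)
    return chi2, df, r2 / (r2 + a2)

# Equicorrelated example: three variables, all r = .5, n = 20 cases.
R = np.full((3, 3), 0.5)
np.fill_diagonal(R, 1.0)
chi2, df, kmo = bartlett_and_kmo(R, n=20)
print(round(chi2, 3), df, round(kmo, 3))
```

A zero eigenvalue makes the log undefined and the inverse impossible, which is exactly why the guard comes first.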

Apart from the inability to print the KMO or Bartlett's test, the presence of an NPD correlation matrix may lead you to rethink the choice of variables or attempt to acquire data on a larger sample to achieve more reliable results.