I have been following this discussion with much interest, as I have a
similar problem at hand.
For years, we have been conducting a consumer satisfaction survey that
consists of one page, about 10 questions, plus a single open-ended
question. Although the questions were intended to probe consumer
satisfaction in a number of different areas, the responses are so highly
correlated that we seem to be tracking only one factor: overall
satisfaction.
So we conducted literature reviews and went back to the drawing board,
formulating more than 100 questions in 6 broad areas of consumer
satisfaction. Our intention was to pilot test these questions with
participants, examine the results, throw out the redundant questions
(discerned through factor analysis), and emerge with, say, 20 questions
known to reflect different dimensions of consumer satisfaction. However,
our sample size thus far is in the pitiful range: perhaps 35 respondents.
Needless to say, we have a long way to go. With our response rates and
consumer base, we would be lucky to get more than 100 respondents in a year.
To improve the subjects-to-variables (STV) ratio, we need to greatly
increase the sample size (which is difficult for us to do), reduce the
number of variables, or both. Our questions are short, simple
statements requesting responses on a 5-point Likert scale. Some of the
questions are worded in almost identical language, and some of these are
almost certainly redundant. Given our relatively small sample size thus
far, what is the best way to proceed to remove redundant questions while
retaining maximum diversity of responses?
From one perspective, it would appear that rank correlations might be the
preferred measure of association, but I wonder whether Likert scales are,
analytically speaking, equivalent to rank-order variables. What other
measures would be most appropriate? I hesitate to downgrade the measure of
association to categorical, because that throws out the information on
directionality and degree. Likewise, I hesitate to upgrade it to ratio
level, because the intervals are clearly arbitrary and not additive.
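
To make the rank-correlation option concrete, here is a minimal sketch in
Python of a Spearman correlation between two Likert items, with tied
responses given averaged ranks (ties are unavoidable on a 5-point scale).
The function names and the two response vectors are hypothetical, made up
for illustration; they are not our survey data, and this is only one
possible way to compute it.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation; undefined if either input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical responses of eight people to two similarly worded items.
q1 = [1, 2, 2, 3, 4, 4, 5, 5]
q2 = [1, 1, 2, 3, 3, 4, 5, 5]
rho = spearman(q1, q2)
print(round(rho, 3))
```

This uses only the ordering of the responses, so it does not assume that
the distance between "agree" and "strongly agree" equals any other
interval on the scale.
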
Intuitively, I am seeking to extract, out of these 100 questions, 4-5
groups of 2-3 questions each, such that within-group correlations are high,
but correlations with the other groups are low. The within-group redundancy
reinforces the measure of satisfaction with that particular factor, and the
low between-group correlation ensures that different aspects of
satisfaction are represented.
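
As a rough sketch of the grouping I have in mind, the Python below links
any two questions whose absolute correlation reaches a cutoff and reports
the resulting groups, then checks the average correlation between two
groups. Everything here is an assumption for illustration: the function
names, the 0.7 cutoff, and the 5x5 correlation matrix are invented, and
this simple linking rule is a stand-in, not a substitute for a proper
factor analysis.

```python
def group_items(corr, threshold=0.7):
    """Group items connected by links with |r| >= threshold."""
    unvisited = set(range(len(corr)))
    groups = []
    while unvisited:
        start = min(unvisited)
        unvisited.remove(start)
        group, stack = [start], [start]
        while stack:
            i = stack.pop()
            # Pull in every remaining item strongly correlated with item i.
            linked = sorted(j for j in unvisited
                            if abs(corr[i][j]) >= threshold)
            for j in linked:
                unvisited.remove(j)
                group.append(j)
                stack.append(j)
        groups.append(sorted(group))
    return groups


def mean_between(corr, g1, g2):
    """Mean absolute correlation over all item pairs across two groups."""
    pairs = [abs(corr[i][j]) for i in g1 for j in g2]
    return sum(pairs) / len(pairs)


# Hypothetical correlation matrix for five questions (not real data).
corr = [
    [1.00, 0.85, 0.10, 0.05, 0.20],
    [0.85, 1.00, 0.15, 0.10, 0.25],
    [0.10, 0.15, 1.00, 0.80, 0.30],
    [0.05, 0.10, 0.80, 1.00, 0.20],
    [0.20, 0.25, 0.30, 0.20, 1.00],
]
groups = group_items(corr)
print(groups)
print(mean_between(corr, groups[0], groups[1]))
```

Two cautions about this sketch: linking by a single strong correlation
can chain loosely related questions into one group, and with only about
35 respondents each correlation is itself a noisy estimate, so any cutoff
should be treated as provisional.
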
Suggestions, please?