Some clarification:
PCA is designed to maximize the variance accounted for by the first underlying factor (component), and, besides, when it extracts all possible factors (as many factors as there are variables) it accounts for the entire variance of the variables. Other methods of extraction do not maximize the first, and besides they try to explain only the COMMON variance, leaving aside a portion of variance that is regarded as unique to each observed variable. Historically, PCA was used first to identify a single factor underlying a set of related measures, all supposedly measuring the same trait (intelligence, as it happened), while Principal Axes was used based on the theory that instead of a single Intelligence factor there were several underlying "intelligences" for various "faculties of the mind" such as linguistic, graphic or mathematical ability.
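To make the "entire variance" point concrete, here is a minimal sketch in Python with NumPy. The data are made up (four hypothetical test scores sharing one latent trait), and the variable names are my own, so take it as an illustration rather than a recipe. It extracts as many components as there are variables from the correlation matrix and checks that together they account for all of the variance, with the first one taking the largest share.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: 500 cases on 4 tests that share one latent trait plus noise.
latent = rng.normal(size=(500, 1))
weights = np.array([[0.8, 0.7, 0.6, 0.5]])
X = latent @ weights + 0.6 * rng.normal(size=(500, 4))

R = np.corrcoef(X, rowvar=False)        # 4 x 4 correlation matrix

# Principal components are the eigenvectors of R; eigh returns ascending
# eigenvalues, so reverse to put the largest component first.
eigvals, eigvecs = np.linalg.eigh(R)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

# Loadings: eigenvectors scaled by the square root of their eigenvalues.
loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))

total_variance = np.trace(R)            # one unit per standardized variable
share = eigvals / total_variance        # proportion of variance per component

print("share of variance per component:", np.round(share, 3))
print("all components together:", round(float(share.sum()), 6))
```

With all four components kept, `loadings @ loadings.T` reproduces the correlation matrix including its diagonal of ones. A common-factor method would instead try to reproduce only the off-diagonal correlations plus communalities smaller than one.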
In BOTH cases, the position of the coordinate system on which the underlying factors are measured is essentially arbitrary. By rotation, the analyst may be able to put one factor closer to one set of observed variables, and far from others, and the opposite for another factor. This may happen by chance in the initial extraction, but ordinarily doesn't, so rotation is used in order to get that nice characteristic of each factor being linked to a set of interrelated variables. For instance, using the same IQ example, by rotation one may get one factor strongly associated with several linguistic tests, and only weakly related to other tests, while another factor is strongly related to mathematical tests and weakly to other tests, so one intuitively calls the first factor "linguistic" and the second "mathematical". These different factors may be independent of each other, or correlated.
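A small sketch of why the position of the axes is arbitrary, again in Python with NumPy. The loading matrix is hypothetical (two "linguistic" and two "mathematical" tests, numbers invented), and the 45-degree angle is picked by hand rather than by any rotation criterion such as varimax. Turning the axes changes which tests load on which factor, but leaves the communalities and the reproduced correlations untouched.

```python
import numpy as np

# Hypothetical unrotated loadings: 4 tests x 2 factors.
# Rows 0-1: "linguistic" tests, rows 2-3: "mathematical" tests.
L = np.array([
    [0.7, -0.4],
    [0.6, -0.5],
    [0.6,  0.4],
    [0.7,  0.5],
])

def rotation_matrix(degrees):
    """2 x 2 orthogonal matrix that turns the factor axes by the given angle."""
    a = np.deg2rad(degrees)
    return np.array([[np.cos(a), -np.sin(a)],
                     [np.sin(a),  np.cos(a)]])

T = rotation_matrix(-45)   # angle chosen by eye for this toy example
L_rot = L @ T

# Communality of each test = sum of its squared loadings.
communality_before = (L ** 2).sum(axis=1)
communality_after = (L_rot ** 2).sum(axis=1)

print("unrotated loadings:\n", np.round(L, 3))
print("rotated loadings:\n", np.round(L_rot, 3))
print("communalities unchanged:", np.allclose(communality_before, communality_after))
print("reproduced correlations unchanged:", np.allclose(L @ L.T, L_rot @ L_rot.T))
```

Before rotation every test loads substantially on the first factor. After it, the first two tests load mainly on factor 1 and the last two mainly on factor 2, which is the kind of pattern one would label "linguistic" and "mathematical".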
It is perfectly possible that being good at language implies being good also at math, and so, at least to some degree, some correlation between linguistic and math factors is only to be expected. Initial extraction ordinarily yields factors that are orthogonal or uncorrelated to each other, because each successive factor is extracted from the unexplained residuals left by the preceding ones, but rotation can position the factor axes in ways that imply they are correlated to each other. Rotation preserving the independence of factors is called orthogonal rotation. Rotation allowing them to be correlated to each other is called oblique rotation. So in Paul Swank's response there is a (probably involuntary) confusion at the end: oblique rotation yields correlated factors (though it does not FORCE them to be correlated) while orthogonal rotation methods FORCE rotated factors to be uncorrelated to each other.
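The orthogonal/oblique distinction can be sketched the same way. This assumes the usual convention that each column of the transformation matrix is a unit-length direction for one new factor axis, so that the factor correlations are the cosines of the angles between the axes. The loadings and the angles are invented for illustration, and `transform` is a helper of my own, not a function from any package.

```python
import numpy as np

# Hypothetical unrotated (orthogonal) loadings: 4 tests x 2 factors.
L = np.array([
    [0.7, -0.4],
    [0.6, -0.5],
    [0.6,  0.4],
    [0.7,  0.5],
])

def axis(degrees):
    """Unit-length direction of a new factor axis in the old factor space."""
    a = np.deg2rad(degrees)
    return np.array([np.cos(a), np.sin(a)])

def transform(L, T):
    """Pattern loadings and factor correlations for new axes given by T's columns."""
    pattern = L @ np.linalg.inv(T).T   # loadings on the new (possibly oblique) axes
    phi = T.T @ T                      # correlations among the new factors
    return pattern, phi

# Orthogonal rotation: new axes kept at 90 degrees to each other.
T_orth = np.column_stack([axis(-45), axis(45)])
P_orth, phi_orth = transform(L, T_orth)

# Oblique rotation: new axes allowed to be only 60 degrees apart.
T_obl = np.column_stack([axis(-30), axis(30)])
P_obl, phi_obl = transform(L, T_obl)

print("factor correlations, orthogonal:\n", np.round(phi_orth, 3))
print("factor correlations, oblique:\n", np.round(phi_obl, 3))
```

In the orthogonal case the factor correlation matrix is forced to be the identity. In the oblique case the off-diagonal entry is cos(60°) = 0.5 here, but nothing stops an oblique solution from landing on axes at right angles if the data call for it. In both cases `P @ phi @ P.T` reproduces the same correlations as the unrotated `L @ L.T`.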
Hector