In supervised learning, the learning algorithm is provided outcome data in advance, in the form of a pre-labeled set of instances. It is from this set that the algorithm is expected to learn what to do when it encounters future, previously unseen instances. Classification is a form of supervised learning.
As an example, take the biological taxonomic hierarchy. Organisms are grouped into successively more specific ranks: domain, kingdom, phylum, and so on. If an algorithm were to learn the defining features of the most specific of these ranks, species, based on the observation of pre-labeled member instances, it could then make a decision as to where future instances should be placed.
If, for instance, an algorithm had built a robust model and was then presented with what we would recognize to be a fox, it would be able to inspect the fox's collective descriptive attributes (number of legs, teeth type, eye position, etc.) and determine the unlabeled instance's species (if that were the goal of the model).
The trade-off here is that pre-labeling of training data (what the algorithm is fed to construct its understanding of a problem - the model) comes at a cost: the time and trouble needed to perform the labeling. The benefit is that many classification algorithms are very effective when combined with adequate amounts of properly pre-labeled data.
Support vector machines, decision trees, regression, and a whole host of other algorithms fall under supervised learning.
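The classification workflow described above can be sketched in a few lines of code. This is a minimal, illustrative example using a 1-nearest-neighbor classifier (one of the simplest supervised learners); the attribute vectors and species data are made up for the sake of the fox example, not real measurements.

```python
# Supervised classification sketch: a 1-nearest-neighbor classifier trained
# on a tiny, hypothetical set of pre-labeled animal instances.
# Each attribute vector is (number of legs, body length in cm, has fur: 0/1).

def nearest_neighbor_predict(training_data, instance):
    """Return the label of the pre-labeled instance closest to `instance`."""
    def distance(a, b):
        # Euclidean distance between two attribute vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    closest = min(training_data, key=lambda pair: distance(pair[0], instance))
    return closest[1]

# The pre-labeled training set: (attributes, species label).
# Labeling these up front is the "cost" discussed above.
training_data = [
    ((4, 70, 1), "fox"),
    ((4, 65, 1), "fox"),
    ((0, 120, 0), "snake"),
    ((0, 150, 0), "snake"),
]

# A previously unseen instance: the model decides where it belongs.
unseen = (4, 68, 1)
print(nearest_neighbor_predict(training_data, unseen))  # prints "fox"
```

The key point is that the algorithm never needs to "know" what a fox is; it only needs labeled examples whose attributes resemble those of the unseen instance.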
Unsupervised learning differs in that it is not provided with pre-labeled training data in advance. Instead, the learning algorithm is expected to search for any sensible pattern among the numerous instance attributes. I have a feeling that when the general public hears the term "data mining," this is what it generally thinks of: heaps of Big Data being searched randomly by Big Brother for meaningful patterns. While some data mining is conducted in this fashion (to say nothing of the whole host of statistical methods used to validate potential findings of relevance in the "randomness"), that's certainly not the norm. Clustering is a form of unsupervised learning.
To contrast with the above example, unsupervised learning is like having a data set of biological organisms with all of their defining attributes, but no class attribute among them (i.e. no pre-labeling of species). A clustering algorithm would then attempt to group like instances together, maximizing the similarity of instances within a group while minimizing the similarity between instances in different groups. The grand concept is that, though foxes are not labeled as foxes, they share a number of similar attribute values which would - hopefully - make them identifiable as very similar to one another, while very different from snakes.
The trade-off here is that no pre-labeling - and none of the time associated with it - is required. The problem can be that different classes may not be as easily distinguishable as one assumes (think wolves vs. dogs).
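The grouping behavior described above can be sketched with a bare-bones k-means implementation, a classic clustering algorithm. The attribute vectors reuse the hypothetical animal data from before, but note that no labels are supplied; the algorithm sees only raw attributes.

```python
# Unsupervised clustering sketch: k-means grouping unlabeled instances.
# Each point is a hypothetical attribute vector
# (number of legs, body length in cm, has fur: 0/1) with no species label.

def k_means(points, k, iterations=10):
    """Cluster `points` into `k` groups; returns a cluster index per point."""
    # Initialize centroids with the first k points (simple and deterministic).
    centroids = [list(points[i]) for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        for i, p in enumerate(points):
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            assignments[i] = dists.index(min(dists))
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments

# Two foxes and two snakes - but the algorithm is never told that.
points = [(4, 70, 1), (4, 65, 1), (0, 120, 0), (0, 150, 0)]
labels = k_means(points, k=2)
# The two fox-like points end up sharing one cluster index and the two
# snake-like points the other, purely from attribute similarity.
print(labels)
```

Which numeric index each cluster receives is arbitrary; only the grouping matters. This also illustrates the trade-off mentioned above: if two classes overlap heavily in attribute space (wolves vs. dogs), no amount of clustering will cleanly separate them.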
This is a very high-level, but factually correct, overview of supervised and unsupervised learning. As you will soon see, there are all sorts of questions - technical, theoretical, and philosophical - that accompany all types of learning techniques. Knowing how to identify and differentiate two of the major classes of learning algorithms, however, is essential at the start of your journey.