Machine Learning - WEKA
David Gilbert and José Antonio Reyes
The aim of this lab is to give you practical experience in using WEKA for the machine learning applications covered in the lecture on machine learning for micro-array classification.
Resources:
Some useful resources about WEKA are available at the website www.cs.waikato.ac.nz/ml/weka
The WEKA datafiles for this tutorial can be found here.
Exercises:
- Practice WEKA with the Play Golf classification example
- Data format: datasets for WEKA are formatted according to the arff format. For this example you will use the file weather.nominal.arff as a training file to construct a classification model. Save the file in your workspace (for example, C:\WEKA_Tutorial) and open it in a text editor to see an example of the arff format; note that the last attribute corresponds to the class.
- Run WEKA in the Windows environment:
Find the WEKA directory on your machine (C:\Program Files\Weka-3-4)
Double-click the file "weka.jar" and select the option "Simple CLI"
Now you are ready to run WEKA by typing commands in this window.
- Try the example with different classifiers, and compare the results obtained with each classifier, for example in terms of the number of examples correctly and incorrectly classified:
Decision Trees: To try a decision tree you will use the Id3 classifier. Type the following command
java weka.classifiers.trees.Id3 -t PATH/weather.nominal.arff
(note that the option -t specifies the training file, where PATH is the location of this file on your machine)
Support Vector Machines: To try the SVM classifier, type the following command
java weka.classifiers.functions.SMO -t PATH/weather.nominal.arff
Neural Networks: To try the NN classifier, type the following command
java weka.classifiers.functions.VotedPerceptron -t PATH/weather.nominal.arff
Naive Bayes: To try the NB classifier, type the following command
java weka.classifiers.bayes.NaiveBayes -t PATH/weather.nominal.arff
- Save the classification model and then use it to classify new examples: You can save the classification model generated by each of the above classifiers by using the option -d in the following way:
java weka.classifiers.TYPE.CLASSIFIER_NAME -t PATH/weather.nominal.arff -d PATH/modelname.model
This should generate a file that contains the model; it can be named, for example:
weather_Id3.model
weather_SVM.model
weather_NN.model
weather_NB.model
e.g. by
java weka.classifiers.trees.Id3 -t PATH/weather.nominal.arff -d PATH/weather_Id3.model
To use the stored model to classify new examples, use the file "test_weather.arff" (save this file in the same folder as weather.nominal.arff and the *.model files). This file contains two examples without a classification. Classify these examples using the previously generated models in the following way:
java weka.classifiers.TYPE.CLASSIFIER_NAME -T PATH/test_weather.arff -l PATH/modelname.model -p 0
Here the option -T specifies the test file (test_weather.arff), -l specifies the model file to load, and -p 0 prints the predictions for the test examples without listing any attribute values. Compare the results obtained using the four models generated.
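The train-and-save and classify commands above follow the same pattern for all four classifiers, so they can be generated rather than typed by hand. The following is a minimal Python sketch, not part of WEKA: the function names and the short labels are hypothetical choices for this illustration, while the classifier class names, options and file names are the ones used in this exercise. It only prints the command lines, which you can then paste into the Simple CLI window.

```python
# Classifier class names as used in this exercise; the short labels on the
# left are placeholders that match the suggested model file names.
CLASSIFIERS = {
    "Id3": "weka.classifiers.trees.Id3",
    "SVM": "weka.classifiers.functions.SMO",
    "NN": "weka.classifiers.functions.VotedPerceptron",
    "NB": "weka.classifiers.bayes.NaiveBayes",
}


def train_command(label, path):
    """Command that trains on weather.nominal.arff and saves the model (-d)."""
    return (f"java {CLASSIFIERS[label]} -t {path}/weather.nominal.arff "
            f"-d {path}/weather_{label}.model")


def classify_command(label, path):
    """Command that loads a saved model (-l) and classifies the test file (-T)."""
    return (f"java {CLASSIFIERS[label]} -T {path}/test_weather.arff "
            f"-l {path}/weather_{label}.model -p 0")


def all_commands(path="PATH"):
    """Return the train and classify commands for every classifier, in order."""
    lines = []
    for label in CLASSIFIERS:
        lines.append(train_command(label, path))
        lines.append(classify_command(label, path))
    return lines


# PATH is left as a literal placeholder; replace it with your own folder.
for line in all_commands():
    print(line)
```

Calling all_commands with your own folder in place of the default "PATH" gives commands that point at your workspace.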
- Classification of breast cancer examples.
Download the file Breast_Cancer.arff, which includes a set of 699 cases with 9 attributes plus the class attribute, which gives the type of cancer cell (in this dataset class 4 corresponds to malignant cells and class 2 to benign cells). This dataset is from the Wisconsin Breast Cancer Database (January 8, 1991). You can look for this and other example datasets at this link.
Classify the examples in the "Breast_Cancer.arff" dataset (benign and malignant cells) using the four classifiers mentioned in exercise 1, and compare the results.
NOTE: This dataset contains numerical data, so you cannot use the Id3 classifier (Id3 only supports nominal attributes). In this case, try decision trees with the J48 classifier using the following command
java weka.classifiers.trees.J48 -t PATH/Breast_Cancer.arff
- Classification of gene expression data.
Download the file ALLAML.arff (Golub et al 1999), gene expression data that includes 72 examples, 7129 genes (attributes) and 2 classes: "acute myeloid leukemia (AML)" and "acute lymphoblastic leukemia (ALL)". For more information you can read the gene list in the file ALLAML.gene_names.txt and the paper Golub et al 1999.
Classify the examples in this dataset (ALL or AML class) using the four classifiers mentioned in exercise 1, and compare the results.
Interpretation: Go to PubMed and search for the selected genes. Do they have any biological meaning? Can you identify the function of any unknown genes? (Try using other bioinformatics tools.)
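The dataset sizes quoted in the exercises (for example 72 examples and 7129 genes for ALLAML.arff) can be checked directly from an arff file before training, along with the fact that the last attribute is the class. The following is a minimal Python sketch under stated assumptions: it handles only simple arff files (no sparse data, no attribute names containing spaces), the function name summarize_arff is hypothetical, and the inline sample is an illustrative excerpt in the style of the weather data, not the real file.

```python
def summarize_arff(text):
    """Return (attribute names, number of data rows) for a simple arff text."""
    attributes = []
    rows = 0
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            rows += 1  # every remaining non-comment line is one example
        elif lower.startswith("@attribute"):
            # the second whitespace-separated token is the attribute name
            attributes.append(line.split()[1])
        elif lower.startswith("@data"):
            in_data = True
    return attributes, rows


# Illustrative excerpt only; the rows are made up for this sketch.
sample = """\
% illustrative excerpt in the style of weather.nominal.arff
@relation weather.symbolic
@attribute outlook {sunny, overcast, rainy}
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,FALSE,no
overcast,FALSE,yes
rainy,TRUE,no
"""

names, n_rows = summarize_arff(sample)
print("attributes (excluding class):", len(names) - 1)
print("class attribute:", names[-1])
print("examples:", n_rows)
```

To check a real file, read its contents into a string (for example with open(...).read()) and pass that string to summarize_arff in place of the sample.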









