I have a training dataset of sick horses; it contains data about surgeries and diseases. Some of the fields in each record are: the horse's temperature, age, pulse, respiratory rate, etc.
I want to build a classifier on the lived/died/euthanized column of every row. I have been asked to check the following:
- Consider the hypothesis of independence between the variables
- Check whether I have enough instances to obtain reliable probabilities
About 25% of the values in the dataset were missing, and they were imputed using MIMMI imputation.
Regarding the possibility of getting reliable probabilities, I can see that the training dataset is a little unbalanced: 179 horses lived and 121 died (died + euthanized). But I'm not really sure about that. Any help with these two questions would be very much appreciated.
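For the independence question, one possible check is a chi-square test between two nominal attributes carried out separately inside each class, since Naive Bayes assumes the attributes are independent given the class. The following is only a minimal sketch in plain Python: the function names and the toy records are made up for illustration and are not taken from the horse dataset, and the real rows would have to be loaded in their place.

```python
from collections import Counter


def chi_square_statistic(pairs):
    """Pearson chi-square statistic and degrees of freedom for (a, b) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    row = Counter(a for a, _ in pairs)
    col = Counter(b for _, b in pairs)
    stat = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n  # count expected under independence
            stat += (joint[(a, b)] - expected) ** 2 / expected
    dof = (len(row) - 1) * (len(col) - 1)
    return stat, dof


def conditional_chi_square(records, attr_x, attr_y, class_attr):
    """Run the test once per class value (conditional independence check)."""
    results = {}
    for cls in sorted({r[class_attr] for r in records}):
        pairs = [(r[attr_x], r[attr_y]) for r in records if r[class_attr] == cls]
        results[cls] = chi_square_statistic(pairs)
    return results


# Hypothetical toy records, NOT real data: independent within "lived",
# perfectly associated within "died".
toy = (
    [{"pain": p, "abdDist": d, "outc": "lived"}
     for p in ("no-pain", "depressed") for d in ("none", "severe")] * 2
    + [{"pain": "no-pain", "abdDist": "none", "outc": "died"}] * 4
    + [{"pain": "depressed", "abdDist": "severe", "outc": "died"}] * 4
)

for cls, (stat, dof) in conditional_chi_square(toy, "pain", "abdDist", "outc").items():
    print(cls, round(stat, 3), dof)
```

A large statistic relative to the chi-square distribution with `dof` degrees of freedom would suggest that the two attributes are not independent within that class. With only 44 to 179 instances per class, many expected counts may be small, so the result should be read with caution.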
=== Run information ===
Scheme: weka.classifiers.bayes.NaiveBayes
Relation: horseColic-weka.filters.unsupervised.attribute.Remove-R25-27
Instances: 300
Attributes: 24
surgery
age
id
temp
pulse
respRate
tempExtrem
periPulse
mucMemb
capRefT
pain
peri
abdDist
ngTube
ngReflux
ngRPH
feces
abd
pCellVol
totProt
abdCentApp
abdCentTotProt
outc
surgLes
Test mode: 10-fold cross-validation
=== Classifier model (full training set) ===
Naive Bayes Classifier
Class
Attribute lived died euthanized
(0.59) (0.26) (0.15)
==================================================================
surgery
yes 97.0 59.0 28.0
no 84.0 20.0 18.0
[total] 181.0 79.0 46.0
age
adult 168.0 67.0 44.0
young 13.0 12.0 2.0
[total] 181.0 79.0 46.0
id
mean 1009274.0202 1452556.3598 751596.8611
std. dev. 1431022.1677 1887025.7703 989556.6807
weight sum 179 77 44
precision 16915.735 16915.735 16915.735
temp
mean 34.8733 35.0055 33.054
std. dev. 10.2335 13.0545 14.9588
weight sum 179 77 44
precision 0.9275 0.9275 0.9275
pulse
mean 29.2039 33.2115 29.0187
std. dev. 10.8578 14.6404 16.7248
weight sum 179 77 44
precision 0.9107 0.9107 0.9107
respRate
mean 15.0771 16.9169 15.9348
std. dev. 8.9803 7.0278 8.1221
weight sum 179 77 44
precision 0.8667 0.8667 0.8667
tempExtrem
normal 82.0 16.0 12.0
warm 36.0 7.0 3.0
cool 53.0 48.0 25.0
cold 12.0 10.0 8.0
[total] 183.0 81.0 48.0
periPulse
normal 133.0 22.0 11.0
increased 5.0 8.0 7.0
reduced 43.0 47.0 25.0
absent 2.0 4.0 5.0
[total] 183.0 81.0 48.0
mucMemb
normal-pink 95.0 9.0 7.0
bright-pink 23.0 13.0 6.0
pale-pink 37.0 19.0 12.0
pale-cyanotic 16.0 17.0 12.0
bright-red 7.0 14.0 8.0
dark-cyanotic 7.0 11.0 5.0
[total] 185.0 83.0 50.0
capRefT
short 153.0 46.0 23.0
long 28.0 33.0 23.0
long2 1.0 1.0 1.0
[total] 182.0 80.0 47.0
pain
no-pain 53.0 6.0 8.0
depressed 42.0 21.0 14.0
inte-mild-pain 64.0 10.0 8.0
inte-severe-pain 12.0 18.0 12.0
cont-severe-pain 13.0 27.0 7.0
[total] 184.0 82.0 49.0
peri
hypermotile 42.0 7.0 7.0
normal 22.0 8.0 5.0
hypomotile 90.0 37.0 17.0
absent 29.0 29.0 19.0
[total] 183.0 81.0 48.0
abdDist
none 88.0 17.0 13.0
slight 53.0 18.0 8.0
moderate 28.0 30.0 14.0
severe 14.0 16.0 13.0
[total] 183.0 81.0 48.0
ngTube
none 79.0 40.0 27.0
slight 90.0 32.0 15.0
significant 13.0 8.0 5.0
[total] 182.0 80.0 47.0
ngReflux
none 149.0 50.0 30.0
much 17.0 15.0 6.0
less 16.0 15.0 11.0
[total] 182.0 80.0 47.0
ngRPH
mean 11.3797 13.0882 8.0606
std. dev. 2.3535 3.2916 5.1673
weight sum 179 77 44
precision 0.7917 0.7917 0.7917
feces
normal 77.0 14.0 10.0
increased 16.0 14.0 8.0
decreased 44.0 15.0 11.0
absent 46.0 38.0 19.0
[total] 183.0 81.0 48.0
abd
normal 48.0 13.0 4.0
other 39.0 5.0 7.0
firm-large-intestine 18.0 8.0 6.0
dist-small-intest 32.0 24.0 8.0
distended-large-intest 47.0 32.0 24.0
[total] 184.0 82.0 49.0
pCellVol
mean 31.0162 47.0465 46.0112
std. dev. 14.1207 18.5468 17.672
weight sum 179 77 44
precision 0.9518 0.9518 0.9518
totProt
mean 42.6539 41.451 43.7936
std. dev. 16.9138 18.6362 19.3247
weight sum 179 77 44
precision 0.9432 0.9432 0.9432
abdCentApp
clear 112.0 25.0 10.0
cloudy 54.0 22.0 20.0
serosanguinous 16.0 33.0 17.0
[total] 182.0 80.0 47.0
abdCentTotProt
mean 16.1341 21.1634 14.3203
std. dev. 6.8038 4.9109 8.6619
weight sum 179 77 44
precision 0.8837 0.8837 0.8837
surgLes
yes 94.0 70.0 30.0
no 87.0 9.0 16.0
[total] 181.0 79.0 46.0
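For the question about having enough instances, the nominal counts above can be scanned for sparse cells. This is a rough sketch under two assumptions that are mine, not stated in the output: first, that each displayed count is the raw count plus 1 (the [total] rows exceed the class sizes 179/77/44 by exactly the number of values of each attribute, which is consistent with that); second, that fewer than 5 raw instances in a cell is a reasonable rule-of-thumb warning level. Only three of the attribute tables are copied here.

```python
LAPLACE = 1  # assumption: displayed count = raw count + 1
CLASSES = ("lived", "died", "euthanized")

# Displayed counts copied from the model output (lived, died, euthanized).
displayed = {
    "age": {"adult": (168, 67, 44), "young": (13, 12, 2)},
    "periPulse": {
        "normal": (133, 22, 11),
        "increased": (5, 8, 7),
        "reduced": (43, 47, 25),
        "absent": (2, 4, 5),
    },
    "capRefT": {"short": (153, 46, 23), "long": (28, 33, 23), "long2": (1, 1, 1)},
}


def sparse_cells(tables, threshold=5):
    """Return (attribute, value, class, raw_count) for cells below threshold."""
    found = []
    for attr, values in tables.items():
        for value, counts in values.items():
            for cls, shown in zip(CLASSES, counts):
                raw = shown - LAPLACE
                if raw < threshold:
                    found.append((attr, value, cls, raw))
    return found


for cell in sparse_cells(displayed):
    print(cell)
```

Under these assumptions, cells such as capRefT = long2 (no raw instances in any class) or age = young for euthanized horses (one raw instance) would give probability estimates that rest almost entirely on the smoothing rather than on the data.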
Time taken to build model: 0.01 seconds
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances      216      72 %
Incorrectly Classified Instances     84      28 %
Kappa statistic                       0.5134
Mean absolute error                   0.1965
Root mean squared error               0.3803
Relative absolute error              52.8451 %
Root relative squared error          88.2672 %
Total Number of Instances           300
=== Detailed Accuracy By Class ===
               TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
               0.777    0.198    0.853      0.777   0.813      0.873     lived
               0.675    0.175    0.571      0.675   0.619      0.871     died
               0.568    0.082    0.543      0.568   0.556      0.824     euthanized
Weighted Avg.  0.72     0.175    0.735      0.72    0.725      0.865
=== Confusion Matrix ===
   a   b   c   <-- classified as
 139  28  12 |   a = lived
  16  52   9 |   b = died
   8  11  25 |   c = euthanized
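As a sanity check on the summary, the accuracy and the kappa statistic can be recomputed from the confusion matrix alone. This is a small sketch in plain Python; the function name is my own, and the formula is the standard Cohen's kappa (observed agreement corrected by the agreement expected from the row and column totals).

```python
# Rows are the actual classes, columns the predicted classes
# (order: lived, died, euthanized), copied from the confusion matrix.
confusion = [
    [139, 28, 12],
    [16, 52, 9],
    [8, 11, 25],
]


def accuracy_and_kappa(matrix):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


acc, kappa = accuracy_and_kappa(confusion)
print(round(acc, 4), round(kappa, 4))
```

Both values agree with the summary (72 % and 0.5134). The per-class recall for the smaller classes is visibly lower (52 of 77 died, 25 of 44 euthanized), which is in line with the concern about class imbalance.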

