Geoffrey I. Webb and Janice R. Boughton and Ying Yang
Faculty ofInformation Technology
Monash University, Vic. 3800, Australia
Abstract
Many on-line applications of machine learning require that
the learned classif i ers complete classif i cation within strict
real-time constraints. In consequence, eff i cient classif i ers
such as naive Bayes (NB) are often employed that can com-
plete the required classif i cation tasks even under peak compu-
tational loads. While NB provides acceptable accuracy, more
computationally intensive approaches can improve thereon.
The current paper explores techniques that utilize any ad-
ditional computational resources available at classif i cation
time to improve upon the prediction accuracy of NB. This is
achieved by augmenting NB with a sequence of super-parent
one-dependence estimators. As many of these are evaluated
as possible within the available computational resources and
the resulting set of probability estimates aggregated to pro-
duce a f i nal prediction. The algorithm is demonstrated to pro-
vide consistent improvements in accuracy as computational
resources are increased.
Introduction
In many machine learning applications, the computations of
the deployed classif i er have to be completed within strict
elapsed time limits in order to be useful. For example, to
be acceptable by users, interactive systems typically need
to complete their task within a few seconds. Examples in-
clude information retrieval (Baeza-Yates, Baeza-Yates, &
Ribeiro-Neto 1999), recommendersystems (Resnick & Var-
ian 1997), user modeling (Webb, Pazzani, & Billsus 2001)
and online fraud detection (Chan et al. 1999).
While the total available resources on acceptable elapsed
processing time per job remain constant, typically the num-
ber of simultaneous jobs varies from time to time. Hence
the resources available for each job will vary. Further, the
complexity of individual jobs may vary from job to job, for
example due to the amountofavailable information varying.
The variancein computationalresources available to com-
plete an online classif i cation task poses a complex optimiza-
tion problem for the system designer. From anecdotal ev-
idence, we suggest that the current standard response is as
follows. First, an eff i cient classif i er is developed that could
provide an acceptable accuracy (or whatever other perfor-
mance metrics are sought to be optimized). Second, the
classif i er’s resource requirements during peak time are esti-
mated. Third, the classif i er is provided necessary resources
so as to performwithin the required elapsed time constraints
during peak time. One real-world example is naive Bayes