
Efficient Approximations for the Marginal Likelihood of Bayesian Networks with Hidden Variables

Title: Efficient Approximations for the Marginal Likelihood of Bayesian Networks with Hidden Variables
Authors: David Maxwell Chickering; David Heckerman
Affiliation: Microsoft Research, Redmond
Category: Published paper
Primary discipline: Statistics
Secondary discipline: Statistics
Abstract: We discuss Bayesian methods for model averaging and model selection among Bayesian-network
models with hidden variables. In particular, we examine large-sample approximations for the marginal likelihood
of naive-Bayes models in which the root node is hidden. Such models are useful for clustering or unsupervised
learning. We consider a Laplace approximation and the less accurate but more computationally efficient approximation
known as the Bayesian Information Criterion (BIC), which is equivalent to Rissanen’s (1987) Minimum
Description Length (MDL). Also, we consider approximations that ignore some off-diagonal elements of the
observed information matrix and an approximation proposed by Cheeseman and Stutz (1995). We evaluate the
accuracy of these approximations using a Monte-Carlo gold standard. In experiments with artificial and real
examples, we find that (1) none of the approximations are accurate when used for model averaging, (2) all of
the approximations, with the exception of BIC/MDL, are accurate for model selection, (3) among the accurate
approximations, the Cheeseman–Stutz and Diagonal approximations are the most computationally efficient, (4)
all of the approximations, with the exception of BIC/MDL, can be sensitive to the prior distribution over model
parameters, and (5) the Cheeseman–Stutz approximation can be more accurate than the other approximations,
including the Laplace approximation, in situations where the parameters in the maximum a posteriori configuration
are near a boundary.
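
The BIC/MDL approximation mentioned in the abstract replaces the marginal likelihood with the maximized log-likelihood minus a dimension penalty, log p(D | S) ≈ log p(D | θ̂, S) − (d/2) log N, where d is the number of free parameters and N the number of cases. The following is a minimal Python sketch, not taken from the paper, of computing this score for naive-Bayes clustering models whose root (class) node is hidden; the EM fitting routine, the helper names (fit_naive_bayes_em, bic_score), and the binary-feature toy data are all assumptions made for illustration.

    # Minimal sketch (illustrative, not the paper's code): BIC/MDL scoring of
    # naive-Bayes models with a hidden root class, fit by EM.
    import numpy as np

    def joint_loglik(X, pi, theta):
        # log p(x_n, class = c) for every case n and hidden class c
        return X @ np.log(theta).T + (1.0 - X) @ np.log(1.0 - theta).T + np.log(pi)

    def fit_naive_bayes_em(X, n_classes, n_iter=200, seed=0):
        """EM for a naive-Bayes model whose root (class) node is hidden.
        X is an (N, F) array of binary features; returns the class prior pi,
        per-class Bernoulli parameters theta[c, f] = P(X_f = 1 | c), and the
        maximized log-likelihood."""
        rng = np.random.default_rng(seed)
        N, F = X.shape
        pi = np.full(n_classes, 1.0 / n_classes)
        theta = rng.uniform(0.25, 0.75, size=(n_classes, F))
        for _ in range(n_iter):
            # E-step: responsibilities via a numerically stable softmax
            ll = joint_loglik(X, pi, theta)
            r = np.exp(ll - ll.max(axis=1, keepdims=True))
            r /= r.sum(axis=1, keepdims=True)
            # M-step: expected counts (lightly smoothed to stay off the boundary)
            nc = r.sum(axis=0)
            pi = nc / N
            theta = (r.T @ X + 1e-9) / (nc[:, None] + 2e-9)
        ll = joint_loglik(X, pi, theta)
        m = ll.max(axis=1)
        total = (m + np.log(np.exp(ll - m[:, None]).sum(axis=1))).sum()
        return pi, theta, total

    def bic_score(loglik, n_params, n_cases):
        # BIC/MDL: penalize the maximized log-likelihood by (d/2) log N
        return loglik - 0.5 * n_params * np.log(n_cases)

    # Toy data: 500 cases from two latent clusters over five binary features.
    rng = np.random.default_rng(1)
    true_theta = np.array([[0.9, 0.8, 0.9, 0.1, 0.2],
                           [0.1, 0.2, 0.1, 0.9, 0.8]])
    z = rng.integers(0, 2, size=500)
    X = (rng.random((500, 5)) < true_theta[z]).astype(float)

    # Model selection: compare candidate numbers of hidden classes by BIC.
    for k in (1, 2, 3):
        pi, theta, ll = fit_naive_bayes_em(X, k)
        d = (k - 1) + k * X.shape[1]  # free parameters: class prior + Bernoullis
        print(f"classes={k}  loglik={ll:10.1f}  BIC={bic_score(ll, d, len(X)):10.1f}")

As the abstract notes, this score is the cheapest of the approximations compared but also the least accurate for model selection in the paper's experiments; the Laplace, Diagonal, and Cheeseman–Stutz approximations refine it at additional computational cost.
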
References:
Azevedo-Filho, A. & Shachter, R. (1994). Laplace’s method approximations for probabilistic inference in belief
networks with continuous variables. In Proceedings of Tenth Conference on Uncertainty in Artificial Intelligence
(pp. 28–36). San Mateo, CA: Morgan Kaufmann.
Bareiss, E. & Porter, B. (1987). Protos: An exemplar-based learning apprentice. In Proceedings of the Fourth
International Workshop on Machine Learning (pp. 12–23). San Mateo, CA: Morgan Kaufmann.
Becker, S. & LeCun, Y. (1989). Improving the convergence of back-propagation learning with second order
methods. In Proceedings of the 1988 Connectionist Models Summer School (pp. 29–37). San Mateo, CA:
Morgan Kaufmann.
Berger, J. (1985). Statistical decision theory and Bayesian analysis. Berlin: Springer.
Bernardo, J. & Smith, A. (1994). Bayesian theory. New York: John Wiley and Sons.
Buntine, W. (1994a). Computing second derivatives in feed-forward networks: A review. IEEE Transactions on
Neural Networks, 5, 480–488.
Buntine, W. (1994b). Operations for learning with graphical models. Journal of Artificial Intelligence Research,
2, 159–225.
Buntine, W. (1996). A guide to the literature on learning graphical models. IEEE Transactions on Knowledge
and Data Engineering, 8, 195–210.
Cheeseman, P. & Stutz, J. (1995). Bayesian classification (AutoClass): Theory and results. In Fayyad, U.,
Piatetsky-Shapiro, G., Smyth, P., & Uthurusamy, R. (Eds.), Advances in knowledge discovery and data mining
(pp. 153–180). Menlo Park, CA: AAAI Press.
Chib, S. (1995). Marginal likelihood from the Gibbs output. Journal of the American Statistical Association, 90,
1313–1321.
Chickering, D. & Heckerman, D. (1996). Efficient approximations for the marginal likelihood of incomplete data
given a Bayesian network. In Proceedings of Twelfth Conference on Uncertainty in Artificial Intelligence (pp.
158–168). San Mateo, CA: Morgan Kaufmann.
Clogg, C. (1995). Latent class models. In Arminger, G., Clogg, C., & Sobel, M. (Eds.), Handbook of statistical
modeling for the social and behavioral sciences. New York: Plenum Press.
Cooper, G. & Herskovits, E. (1992). A Bayesian method for the induction of probabilistic networks from data.
Machine Learning, 9, 309–347.
Dempster, A., Laird, N., & Rubin, D. (1977). Maximum likelihood from incomplete data via the EM algorithm.
Journal of the Royal Statistical Society B, 39, 1–38.
Draper, D. (1995). Assessment and propagation of model uncertainty (with discussion). Journal of the Royal
Statistical Society B, 57, 45–97.
Geiger, D. & Heckerman, D. (1994). Learning Gaussian networks. In Proceedings of Tenth Conference on
Uncertainty in Artificial Intelligence (pp. 235–243). San Mateo, CA: Morgan Kaufmann.
Geiger, D., Heckerman, D., & Meek, C. (1996). Asymptotic model selection for directed networks with hidden
variables. In Proceedings of Twelfth Conference on Uncertainty in Artificial Intelligence (pp. 283–290). San
Mateo, CA: Morgan Kaufmann.
Geman, S. & Geman, D. (1984). Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721–742.
Gilks, W., Richardson, S., & Spiegelhalter, D. (1996). Markov chain Monte Carlo in practice. New York:
Chapman and Hall.
Good, I. (1965). The estimation of probabilities. Cambridge, MA: MIT Press.
Gull, S. & Skilling, J. (1991). Quantified maximum entropy. MemSys5 user’s manual. Tech. rep., M.E.D.C., 33
North End, Royston, SG8 6NR, England.
Haughton, D. (1988). On the choice of a model to fit data from an exponential family. Annals of Statistics, 16,
342–355.
Heckerman, D. (1995). A tutorial on learning Bayesian networks. Tech. rep. MSR-TR-95-06, Microsoft
Research, Redmond, WA. Revised January, 1996.
Heckerman, D. & Geiger, D. (1995). Likelihoods and priors for Bayesian networks. Tech. rep. MSR-TR-95-54,
Microsoft Research, Redmond, WA.
Heckerman, D., Geiger, D., & Chickering, D. (1995). Learning Bayesian networks: The combination of knowledge
and statistical data. Machine Learning, 20, 197–243.
Hong, Z. & Yang, J. (1991). Optimal discriminant plane for a small number of samples and design method of
classifier on the plane. Pattern Recognition, 24, 317–324.
Jeffreys, H. (1939). Theory of probability. Oxford University Press.
Jensen, F., Lauritzen, S., & Olesen, K. (1990). Bayesian updating in recursive graphical models by local
computations. Computational Statistics Quarterly, 4, 269–282.
Kass, R. & Raftery, A. (1995). Bayes factors. Journal of the American Statistical Association, 90, 773–795.
Kass, R., Tierney, L., & Kadane, J. (1988). Asymptotics in Bayesian computation. In Bernardo, J., DeGroot, M.,
Lindley, D., & Smith, A. (Eds.), Bayesian statistics 3 (pp. 261–278). Oxford University Press.
Kass, R. & Wasserman, L. (1995). A reference Bayesian test for nested hypotheses and its relationship to the
Schwarz criterion. Journal of the American Statistical Association, 90, 928–934.
MacKay, D. (1992a). Bayesian interpolation. Neural Computation, 4, 415–447.
MacKay, D. (1992b). A practical Bayesian framework for backpropagation networks. Neural Computation, 4,
448–472.
MacKay, D. (1996). Choice of basis for the Laplace approximation. Tech. rep., Cavendish Laboratory,
Cambridge, UK.
Madigan, D. & York, J. (1995). Bayesian graphical models for discrete data. International Statistical Review,
63, 215–232.
Meng, X. & Rubin, D. (1991). Using EM to obtain asymptotic variance-covariance matrices: The SEM algorithm.
Journal of the American Statistical Association, 86, 899–909.
Merz, C. & Murphy, P. (1996). UCI repository of machine learning databases,
www.ics.uci.edu/~mlearn/MLRepository.html. Tech. rep., University of California, Irvine.
Michalski, R. & Chilausky, R. (1980). Learning by being told and learning from examples: An experimental
comparison of the two methods of knowledge acquisition in the context of developing an expert system for
soybean disease diagnosis. International Journal of Policy Analysis and Information Systems, 4.
Neal, R. (1991). Bayesian mixture modeling by Monte Carlo simulation. Tech. rep. CRG-TR-91-2, Department
of Computer Science, University of Toronto.
Neal, R. (1993). Probabilistic inference using Markov chain Monte Carlo methods. Tech. rep. CRG-TR-93-1,
Department of Computer Science, University of Toronto.
Raftery, A. (1994). Approximate Bayes factors and accounting for model uncertainty in generalized linear models.
Tech. rep. 255, Department of Statistics, University of Washington.
Raftery, A. (1995). Bayesian model selection in social research. In Marsden, P. (Ed.), Sociological methodology.
Cambridge, MA: Blackwell.
Keywords: Bayesian model averaging, model selection, multinomial mixtures, clustering, unsupervised learning, Laplace approximation
Published in: Machine Learning, 29, 181–212 (1997)
Year: 1997
Research field: Statistics
Upload time: 2011-1-19 17:56
Attachment: fulltext.pdf [452.78 KB]