A few classic books
1. Tom M. Mitchell - Machine Learning
2. Introduction to Machine Learning (Ethem Alpaydin)
3. Pattern Recognition and Machine Learning
4. MIT - Fundamentals of Machine Learning for Predictive Data Analytics
5. Foundations of Machine Learning
6. Learning from Data
7. 《机器学习系统设计》 (Machine Learning System Design), Python, 2014
8. 《数据挖掘:概念与技术》 (Data Mining: Concepts and Techniques), Chinese translation of the 3rd edition
9. Machine Learning and Data Science - An Introduction to Statistical Learning Methods with R
Mining of Massive Datasets (Chinese edition: 《大数据》)
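Authors: Anand Rajaraman[3] and Jeffrey David Ullman. Anand holds a PhD from Stanford. The book introduces many algorithms and also shows how they change when the data becomes large. Because of space limits, though, no single algorithm gets an in-depth treatment, so you will need other sources to go deeper. For getting to know the algorithms, however, the book is enough. Another drawback is that both the original and the translation contain many errors and the errata list is fairly long, so read carefully.
Data Mining: Practical Machine Learning Tools and Techniques (Chinese edition: 《数据挖掘:实用机器学习技术》)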
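The authors, Ian H. Witten and Eibe Frank, created Weka and are professors at the University of Waikato in New Zealand. Their Managing Gigabytes[4] is also a classic in information retrieval. The book's main strength is its coverage of how to use Weka, but its theory is too thin. It works as an introductory text, but classic introductions such as 《集体智慧编程》 (Programming Collective Intelligence) and 《智能web算法》 (Algorithms of the Intelligent Web) already exist, and there is little point in reading too many introductory books. I suggest reading only the algorithms that those two books do not cover.
《机器学习及其应用》 (Machine Learning and Its Applications)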
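Edited by Zhou Zhihua and Yang Qiang, this is a collection drawn from the Workshop on Machine Learning and Its Applications. The workshop was started by the Intelligent Information Processing Lab at Fudan University and has been held ten times so far. Leading Chinese researchers such as Li Hang, Xiang Liang, Wang Haifeng, Liu Tieyan and Yu Kai have all given talks there. The book covers many concrete, cutting-edge applications of machine learning, and you need some background to follow it. If you want to see where machine learning research is heading, it is worth browsing. Following a field's academic conferences is, after all, one way to spot research trends.
Managing Gigabytes (Chinese edition: 《深入搜索引擎》)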
A good book on information retrieval.
Modern Information Retrieval
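Ricardo Baeza-Yates et al., 1999. This seems to be the first book to cover IR comprehensively. Unfortunately, IR has advanced rapidly in recent years, so the book is somewhat dated. It is still good to leaf through as a reference. Also, Ricardo now heads Yahoo Research for Europe and Latin America.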
Keep supporting good initiatives like this one that recommend resources.
University courses and online tutorials: Stanford courses: CS246 Mining Massive Data Sets, CS246H Mining Massive Data Sets: Hadoop Labs, CS341 Project in Mining Massive Data Sets, the companion book Mining of Massive Datasets, and DataMiningTalk; CMU courses: Data Mining: Spring 2013, Statistics 36-350: Data Mining (fall 2009); Nanjing University course: Introduction to Data Mining; Coursera: Data Mining Specialization. Monographs and books: Mining of ...
Below are the websites of some leading experts in data mining. They contain a lot of valuable material that can broaden a researcher's thinking, so I am sharing them here:
1.Rakesh Agrawal
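Homepage: http://research.microsoft.com/en-us/people/rakesha/ He founded association rule research, a line of work unique to data mining, and his Apriori algorithm opened up this major field. He previously worked at IBM Research and now works on search at Microsoft Research. Besides association rules, he has also done pioneering work on Hippocratic Database, Sovereign Information Sharing, and Privacy-Preserving Data Mining.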
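To make the Apriori idea concrete, here is a minimal sketch of frequent-itemset mining. It rests on one property: every subset of a frequent itemset must itself be frequent. The sketch assumes that `min_support` is an absolute transaction count. The join step simply unions pairs of frequent (k-1)-itemsets rather than using the sorted-prefix join from the original paper. The function name `apriori` and the five shopping baskets are hypothetical illustrations.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): count} for every itemset whose count >= min_support.
    min_support is an absolute count here (an assumption of this sketch)."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # Join step: union pairs of frequent (k-1)-itemsets that yield exactly k items.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k:
                # Prune step (Apriori property): every (k-1)-subset must be frequent.
                if all(frozenset(sub) in frequent for sub in combinations(union, k - 1)):
                    candidates.add(union)
        # One pass over the data to count the surviving candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Hypothetical toy baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]
freq = apriori(baskets, min_support=3)
for itemset, count in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

In this toy data the candidate {bread, milk, diapers} passes the prune step, but it appears in only two baskets, so the search stops at pairs.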
Some learning resources for data mining (mostly websites)
1.Statistical Learning Theory from Berkeley
This course will provide an introduction to probabilistic and computational methods for the statistical modeling of complex, multivariate data. It will concentrate on graphical models, a flexible and powerful approach to capturing statistical dependencies in complex, multivariate data. In particular, the course will focus on the key theoretical and methodological issues of representation, estimation, and inference.
2.Data Mining from Stanford
This will also be helpful.
3.The Lasso Page (a little dated)
The Lasso is a shrinkage and selection method for linear regression. It minimizes the usual sum of squared errors, with a bound on the sum of the absolute values of the coefficients. It has connections to soft-thresholding of wavelet coefficients, forward stagewise regression, and boosting methods.
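The link to soft-thresholding is easiest to see in code. This sketch does not solve the constrained form described above. It minimizes the equivalent Lagrangian form, 0.5 * ||y - Xb||^2 + lam * ||b||_1, in which each bound on the sum of absolute coefficients corresponds to some penalty `lam`. It uses cyclic coordinate descent, where each coordinate update is a soft-threshold. The function names, the fixed iteration count and the toy data are assumptions chosen for illustration. The sketch is also deliberately naive: it recomputes predictions, costing O(n p^2) per sweep.

```python
def soft_threshold(z, t):
    """S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Minimize 0.5 * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.
    X is a list of rows (n samples x p features), y a list of n targets."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the partial residual (fit without feature j).
            rho = 0.0
            for i in range(n):
                pred_without_j = sum(X[i][k] * b[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
            # Soft-thresholding shrinks the coefficient and can set it exactly to zero.
            b[j] = soft_threshold(rho, lam) / col_sq[j]
    return b


# Orthogonal toy design: least squares would give b = [3.0, 0.5].
X = [[1, 0], [0, 1], [1, 0], [0, 1]]
y = [3.0, 0.5, 3.0, 0.5]
b = lasso_cd(X, y, lam=2.0)
print(b)
```

With this orthogonal design the penalty shrinks the first coefficient from 3 to 2 and drops the second to exactly zero. That shows the "shrinkage and selection" behaviour in miniature.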
4.Data Mining Tutorials
This is a really informative website with tutorials on statistical data mining, written by Andrew Moore, an employee at Google. He covers the foundations of data analysis, including decision trees, Bayesian classifiers and many other techniques we've been learning in class. It is a great website to check out if you're having trouble with any topic or simply want to learn more.
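As a small taste of the decision-tree material, here is a minimal sketch of the information-gain criterion that ID3-style trees use to choose a split. The code is not taken from the tutorials. The four-row weather dataset and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on attribute index `attr`."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    # Weighted average entropy of the child nodes after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical data: (outlook, temperature) -> play?
rows = [("sunny", "hot"), ("sunny", "cool"), ("rain", "hot"), ("rain", "cool")]
labels = ["no", "no", "yes", "yes"]
gain_outlook = information_gain(rows, labels, 0)
gain_temp = information_gain(rows, labels, 1)
print(gain_outlook, gain_temp)
```

Outlook separates the classes perfectly and recovers the full bit of entropy. Temperature tells us nothing here, so a tree would split on outlook first.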
5.Data Mining Research
This is a comprehensive blog about the latest developments in data mining research. It provides a great overview of what scholars and professionals are saying about the discipline. The person who started the blog is a working professional in the field at FinScore, a Swiss provider of software and professional services focusing on data mining and customer intelligence. A couple of very interesting and insightful posts from the blog include: “10 Very Interesting People in Data Mining,” “Data Mining: A New Weapon in the Fight Against Medicaid Fraud,” and “Worst practices in Data Mining.” Stephanie Santoso
6.Statistical Learning Article
An article on the elements of statistical learning and how data mining is used to make predictions. Azai Ighadaro
7.Kernel-Machines.Org
This page is devoted to learning methods building on kernels, such as the support vector machine. It grew out of earlier pages at the Max Planck Institute for Biological Cybernetics and at GMD FIRST, snapshots of which can be found here and here. In those days, information about kernel methods was sparse and nontrivial to find, and the kernel machines web site acted as a central repository for the field. It included a list of people working in the field, and online preprints of most publications.
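To show what "building on kernels" means, here is a minimal sketch of the kernel perceptron. It is one of the simplest kernel methods. It is not a support vector machine, because it does not maximize the margin. The algorithm only ever touches the data through kernel values K(x_i, x_j), so swapping in an RBF kernel lets it separate XOR, which no linear perceptron can do. The choices `gamma=1.0`, the epoch cap and the function names are assumptions made for this sketch.

```python
import math


def rbf(x, z, gamma=1.0):
    """Gaussian (RBF) kernel exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def train_kernel_perceptron(X, y, kernel, max_epochs=100):
    """Dual perceptron: alpha[i] counts the mistakes made on example i.
    The decision function is f(x) = sum_i alpha[i] * y[i] * K(X[i], x)."""
    n = len(X)
    alpha = [0] * n
    K = [[kernel(X[i], X[j]) for j in range(n)] for i in range(n)]  # Gram matrix
    for _ in range(max_epochs):
        mistakes = 0
        for j in range(n):
            f = sum(alpha[i] * y[i] * K[i][j] for i in range(n))
            if y[j] * f <= 0:  # misclassified (or on the boundary): update
                alpha[j] += 1
                mistakes += 1
        if mistakes == 0:
            break
    return alpha


def predict(X, y, alpha, kernel, x):
    f = sum(a * yi * kernel(xi, x) for a, yi, xi in zip(alpha, y, X))
    return 1 if f > 0 else -1


# XOR: not linearly separable in the input space.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [-1, 1, 1, -1]
alpha = train_kernel_perceptron(X, y, rbf)
preds = [predict(X, y, alpha, rbf, x) for x in X]
print(alpha, preds)
```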
8.Welcome to Boosting.org
We are pleased to announce a new website on Boosting and related ensemble learning methods, e.g. Boosting, Arcing, Bagging, the connection to mathematical programming and large margin classifiers, and model selection. The aim is to serve as a central information source by providing links to papers, upcoming events, datasets, code, etc.
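For a feel of how boosting works, here is a minimal sketch of AdaBoost, one specific boosting algorithm, using one-dimensional decision stumps as weak learners. Each round fits the stump with the lowest weighted error, gives it a vote alpha = 0.5 * ln((1 - err) / err), and increases the weights of the points it got wrong. The toy data, the threshold grid and all function names are hypothetical choices for illustration.

```python
import math


def stump_predict(x, thr, sign):
    """Decision stump: predicts +sign to the right of thr and -sign to the left."""
    return sign if x > thr else -sign


def adaboost(X, y, n_rounds):
    n = len(X)
    w = [1.0 / n] * n
    xs = sorted(set(X))
    # Candidate thresholds: one below the minimum, then midpoints between neighbours.
    thresholds = [xs[0] - 0.5] + [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    ensemble = []  # list of (alpha, thr, sign)
    for _ in range(n_rounds):
        best = None
        for thr in thresholds:
            for sign in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(xi, thr, sign) != yi)
                if best is None or err < best[0]:
                    best = (err, thr, sign)
        err, thr, sign = best
        err = min(max(err, 1e-12), 1 - 1e-12)  # guard against log(0) / division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, thr, sign))
        # Reweight: misclassified points grow, correct ones shrink, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, thr, sign))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def ensemble_predict(ensemble, x):
    score = sum(a * stump_predict(x, thr, sign) for a, thr, sign in ensemble)
    return 1 if score > 0 else -1


# Hypothetical 1-D data that no single stump can classify correctly.
X = [1, 2, 3, 4, 5, 6]
y = [1, 1, -1, -1, 1, 1]
model = adaboost(X, y, n_rounds=3)
print([ensemble_predict(model, x) for x in X])
```

After three rounds the weighted vote of three stumps fits this "+ + - - + +" pattern, which none of the individual stumps can do.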
9.Perfectly Random Sampling with Markov Chains
Random sampling has found numerous applications in physics, statistics, and computer science. Perhaps the most versatile method of generating random samples from a probability space is to run a Markov chain. This site provides a comprehensive collection of material on this area.
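One well-known technique for "perfect" sampling with Markov chains is Propp and Wilson's coupling from the past (CFTP). Instead of running a chain "long enough", it runs copies from every start state, beginning further and further in the past and reusing the same randomness. Once all copies agree at time 0, the result is an exact draw from the stationary distribution. This minimal sketch uses the monotone version on a hypothetical chain: a random walk on {0, ..., n-1} that stays put at the ends. Its stationary distribution is uniform. Because the update rule preserves order, only the lowest and highest states need to be tracked. The names `update` and `cftp` are illustrative.

```python
import random
from collections import Counter


def update(x, u, n_states):
    """One step of a random walk on {0, ..., n_states-1} that stays put at the ends.
    The rule is monotone: x <= x' implies update(x, u) <= update(x', u)."""
    return max(x - 1, 0) if u < 0.5 else min(x + 1, n_states - 1)


def cftp(n_states, rng):
    """Monotone coupling from the past: returns an exact sample from the stationary law."""
    us = []  # us[k] is the random number used at time -(k+1); reused on every restart
    T = 1
    while True:
        while len(us) < T:
            us.append(rng.random())
        lo, hi = 0, n_states - 1  # bottom and top states bound every other trajectory
        for t in range(T - 1, -1, -1):  # run from time -T up to time 0
            lo = update(lo, us[t], n_states)
            hi = update(hi, us[t], n_states)
        if lo == hi:  # all trajectories have coalesced
            return lo
        T *= 2  # start further in the past, keeping the old randomness


rng = random.Random(0)
samples = [cftp(4, rng) for _ in range(8000)]
counts = Counter(samples)
print(sorted(counts.items()))
```

Reusing `us` when `T` doubles is essential. Drawing fresh random numbers on each restart would bias the output.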
10.Independent Component Analysis
A Tutorial
11.Self Organizing Maps
An excellent short introduction
12.Reversible Markov Chains and Random Walks on Graphs
Early drafts of chapters are available as PDF files
13.A Brief Introduction to Graphical Models and Bayesian Networks
"Graphical models are a marriage between probability theory and graph theory. They provide a natural tool for dealing with two problems that occur throughout applied mathematics and engineering -- uncertainty and complexity -- and in particular they are playing an increasingly important role in the design and analysis of machine learning algorithms. Fundamental to the idea of a graphical model is the notion of modularity -- a complex system is built by combining simpler parts. Probability theory provides the glue whereby the parts are combined, ensuring that the system as a whole is consistent, and providing ways to interface models to data. The graph theoretic side of graphical models provides both an intuitively appealing interface by which humans can model highly-interacting sets of variables as well as a data structure that lends itself naturally to the design of efficient general-purpose algorithms."
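The "modularity" in the quote can be shown with a tiny Bayesian network. Each node carries a small local table. The graph says how the tables multiply into one joint distribution, and inference is just summing that joint. This minimal sketch uses the classic rain/sprinkler/wet-grass structure (Rain -> Sprinkler, and both -> WetGrass). The probability values are illustrative numbers chosen for the sketch, not data from any source. Exact inference is done by brute-force enumeration, which only scales to very small networks.

```python
from itertools import product

# Local conditional probability tables (illustrative numbers).
P_RAIN = {True: 0.2, False: 0.8}
P_SPRINKLER = {True: {True: 0.01, False: 0.99},   # P(Sprinkler | Rain=True)
               False: {True: 0.4, False: 0.6}}    # P(Sprinkler | Rain=False)
P_WET = {(True, True): 0.99, (True, False): 0.9,  # P(Wet=True | Sprinkler, Rain)
         (False, True): 0.8, (False, False): 0.0}


def joint(r, s, w):
    """Chain-rule factorization read off the graph: P(R) * P(S | R) * P(W | S, R)."""
    p_w = P_WET[(s, r)]
    return P_RAIN[r] * P_SPRINKLER[r][s] * (p_w if w else 1 - p_w)


def posterior_rain_given_wet():
    """P(Rain=True | Wet=True) by enumeration, summing out the hidden Sprinkler node."""
    num = sum(joint(True, s, True) for s in (True, False))
    den = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
    return num / den


p = posterior_rain_given_wet()
print(f"P(Rain | WetGrass) = {p:.4f}")
```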
14.Gaussian Processes for Machine Learning
The Bayesian approach to data mining.
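To show what "Bayesian" means here, this minimal sketch implements standard Gaussian-process regression. A GP places a prior over functions, and conditioning on the data gives a posterior whose mean is k_*^T (K + noise*I)^{-1} y and whose variance is k(x_*, x_*) - k_*^T (K + noise*I)^{-1} k_*. The sketch assumes an RBF kernel with unit length scale and unit signal variance, plus a tiny noise/jitter term of 1e-8. The function names and the sin-curve training data are hypothetical. It uses pure Python with a hand-written Cholesky solve in place of a linear-algebra library.

```python
import math


def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel with unit signal variance (an assumption)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def cholesky(A):
    """Lower-triangular L with A = L L^T, for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def solve_cholesky(L, b):
    """Solve (L L^T) x = b by forward then back substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_posterior(x_train, y_train, x_test, noise=1e-8):
    """Posterior mean and variance of a zero-mean GP at each test input."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(x_train)]
         for i, a in enumerate(x_train)]
    L = cholesky(K)
    alpha = solve_cholesky(L, y_train)  # (K + noise*I)^{-1} y
    means, variances = [], []
    for xs in x_test:
        k_star = [rbf(xs, a) for a in x_train]
        means.append(sum(k * a for k, a in zip(k_star, alpha)))
        v = solve_cholesky(L, k_star)  # (K + noise*I)^{-1} k_*
        variances.append(rbf(xs, xs) - sum(k * vi for k, vi in zip(k_star, v)))
    return means, variances


x_train = [-2.0, -1.0, 0.0, 1.0, 2.0]
y_train = [math.sin(x) for x in x_train]
means, variances = gp_posterior(x_train, y_train, x_train + [10.0])
print(means[-1], variances[-1])
```

Near the training points the posterior is confident and interpolates the data. Far away (x = 10) it falls back to the prior: mean about 0 and variance about 1. This uncertainty estimate is what the Bayesian treatment adds over a plain point estimate.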