Original poster: firebig

[Data Mining Books] Data Mining and Knowledge Discovery via Logic-Based Methods

It looks like this book has not been posted here yet.
Preface (excerpt, starting on p. xiv)
mining and knowledge discovery problems discussed throughout this monograph.
It pays extra attention to the reasons for formulating some of these problems
as optimization problems: one always needs to keep the size of the extracted
new rules under control (size minimization), or one tries to gain a deeper
understanding of the system of interest by issuing a small number of new queries
(query minimization).
The second and third chapters present some sophisticated branch-and-bound
algorithms for extracting a pattern (in the form of a compact Boolean function)
from collections of observations grouped into two disjoint classes. The fourth chapter
presents some fast heuristics for the same problem.
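To make the task concrete, here is a minimal Python sketch of inferring a compact Boolean function in DNF from two disjoint classes of binary observations. It is not the book's branch-and-bound algorithm or its heuristics; it is a simple greedy generalization chosen for illustration, and all names (`infer_dnf`, `term_accepts`, the toy data) are assumptions.

```python
from typing import List, Sequence, Tuple

Example = Tuple[int, ...]   # one binary observation, e.g. (1, 0, 1)
Literal = Tuple[int, int]   # (attribute index, required value 0 or 1)
Term = List[Literal]        # a conjunction of literals


def term_accepts(term: Term, x: Example) -> bool:
    """A term accepts x when every one of its literals is satisfied by x."""
    return all(x[i] == v for i, v in term)


def dnf_accepts(dnf: List[Term], x: Example) -> bool:
    """A DNF accepts x when at least one of its terms accepts x."""
    return any(term_accepts(t, x) for t in dnf)


def infer_dnf(positives: Sequence[Example],
              negatives: Sequence[Example]) -> List[Term]:
    """Greedy sketch: accept every positive example, reject every negative one."""
    if set(positives) & set(negatives):
        raise ValueError("the two classes must be disjoint")
    dnf: List[Term] = []
    uncovered = list(positives)
    while uncovered:
        seed = uncovered[0]
        # Start from the full description of the seed example ...
        term: Term = list(enumerate(seed))
        # ... and drop literals as long as no negative example gets accepted.
        for literal in list(term):
            shorter = [l for l in term if l != literal]
            if not any(term_accepts(shorter, n) for n in negatives):
                term = shorter
        dnf.append(term)
        uncovered = [p for p in uncovered if not term_accepts(term, p)]
    return dnf


# Hypothetical toy data: three attributes, two disjoint classes.
positives = [(1, 1, 0), (1, 0, 1), (1, 1, 1)]
negatives = [(0, 0, 0), (0, 1, 0), (0, 0, 1)]
dnf = infer_dnf(positives, negatives)
print(dnf)
```

The greedy pass gives no guarantee of a minimum number of terms, which is the kind of guarantee an exact search such as branch-and-bound aims at.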
The fifth chapter studies the problem of guided learning. That is, the analyst
can now decide the composition of the next observation to send to an expert or
“oracle”, who determines its class membership. The goal now is
to gain a good understanding of the system of interest by issuing only a small number of
such inquiries.
A related problem is studied in the sixth chapter. Now it is assumed that the
analyst has two sets of examples (observations) and a Boolean function that is
inferred from these examples. Furthermore, it is assumed that the analyst has a new
example that invalidates this Boolean function. Thus, the problem is how to modify
the Boolean function such that it satisfies all the requirements of the available examples
plus the new example. This is known as the incremental learning problem.
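The incremental setting can be sketched as follows. This is an illustrative repair strategy, not the method of Chapter 6: terms that wrongly accept a new negative example are dropped, and any positive example left uncovered gets a freshly generalized term. The function names and toy data are assumptions.

```python
from typing import List, Sequence, Tuple

Example = Tuple[int, ...]
Term = List[Tuple[int, int]]   # conjunction of (attribute index, required value)


def term_accepts(term: Term, x: Example) -> bool:
    return all(x[i] == v for i, v in term)


def generalize(seed: Example, negatives: Sequence[Example]) -> Term:
    """Shrink the full description of seed while all negatives stay rejected."""
    term: Term = list(enumerate(seed))
    for literal in list(term):
        shorter = [l for l in term if l != literal]
        if not any(term_accepts(shorter, n) for n in negatives):
            term = shorter
    return term


def update_dnf(dnf: List[Term], positives: Sequence[Example],
               negatives: Sequence[Example], new_example: Example,
               is_positive: bool):
    """Return a DNF consistent with the old examples plus the new one."""
    positives, negatives = list(positives), list(negatives)
    if is_positive:
        positives.append(new_example)
        kept = list(dnf)
    else:
        negatives.append(new_example)
        # Drop every term that the new negative example invalidates.
        kept = [t for t in dnf if not term_accepts(t, new_example)]
    # Re-cover any positive example that is no longer accepted.
    for p in positives:
        if not any(term_accepts(t, p) for t in kept):
            kept.append(generalize(p, negatives))
    return kept, positives, negatives


# Hypothetical toy data: the current function is simply "x0 = 1".
old_dnf = [[(0, 1)]]
positives = [(1, 1, 0), (1, 0, 1), (1, 1, 1)]
negatives = [(0, 0, 0), (0, 1, 0), (0, 0, 1)]
# A new negative example that the current function wrongly accepts.
new_dnf, positives2, negatives2 = update_dnf(
    old_dnf, positives, negatives, (1, 0, 0), is_positive=False)
print(new_dnf)
```

Only the invalidated terms are rebuilt, so the parts of the function that still agree with the data are left alone.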
Chapter 7 presents an intriguing duality relationship which exists between
Boolean functions expressed in CNF (conjunctive normal form) and DNF (disjunctive
normal form), which are inferred from examples. This dual relationship could
be used in solving large-scale inference problems, in addition to other algorithmic
advantages.
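One duality of this kind follows from De Morgan's laws, and the sketch below checks it by brute force. The excerpt does not state the chapter's theorem, so treating this as the relationship meant there is an assumption. Take a list of literal groups and read it once as a CNF (AND of ORs) and once as a DNF (OR of ANDs) over the same literals: the DNF evaluated on the bitwise complement of an example is the negation of the CNF evaluated on the example itself.

```python
from itertools import product


def literal_holds(x, literal):
    """A literal (i, v) holds when attribute i of x has value v."""
    i, v = literal
    return x[i] == v


def eval_cnf(system, x):
    """Read the groups as clauses: AND over groups, OR inside each group."""
    return all(any(literal_holds(x, l) for l in group) for group in system)


def eval_dnf(system, x):
    """Read the same groups as terms: OR over groups, AND inside each group."""
    return any(all(literal_holds(x, l) for l in group) for group in system)


def complement(x):
    """Flip every bit of an example."""
    return tuple(1 - b for b in x)


# Hypothetical system of literal groups over three attributes:
# as CNF it reads (x0 OR NOT x1) AND x2, as DNF (x0 AND NOT x1) OR x2.
system = [[(0, 1), (1, 0)], [(2, 1)]]
points = list(product((0, 1), repeat=3))

# DNF(complement(x)) is always the negation of CNF(x).
duality_holds = all(
    eval_cnf(system, x) != eval_dnf(system, complement(x)) for x in points)

# Consequence for inference: if the CNF accepts E+ and rejects E-, the DNF
# accepts the complemented E- and rejects the complemented E+.
positives = [x for x in points if eval_cnf(system, x)]
negatives = [x for x in points if not eval_cnf(system, x)]
swapped_ok = (
    all(eval_dnf(system, complement(n)) for n in negatives)
    and not any(eval_dnf(system, complement(p)) for p in positives))
print(duality_holds, swapped_ok)
```

Under this reading, a procedure that infers one normal form can be reused for the other by complementing the examples and swapping the two classes.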
The chapter that follows describes a graph theoretic approach for decomposing
large-scale data mining problems. This approach is based on the construction of a
special graph, called the rejectability graph, from two collections of data. Then certain
characteristics of this graph, such as its minimum clique cover, can lead to some
intuitive and very powerful decomposition strategies.
Part II (“Application Issues”) begins with Chapter 9. This chapter presents an
intriguing problem related to any model (and not only those based on logic methods)
inferred from grouped observations. This is the problem of the reliability of the
model, which depends on both the amount of training data (sampled observations
grouped into two disjoint classes) and the nature of these data. It is
argued that many model inference methods today may derive models that cannot
guarantee the reliability of their predictions/classifications. This chapter prepares the
basic arguments for studying a potentially very critical type of Boolean function
known as monotone Boolean functions.
The problems of inferring a monotone Boolean function from inquiries to an
expert (“oracle”), along with some key mathematical properties and some application
issues, are discussed in Chapters 10 and 11. Although this type of Boolean function
has been known in the literature for some time, it was the author of this book along
with some of his key research associates who made some intriguing contributions
to this part of the literature in recent years. Furthermore, Chapter 11 describes some
key problems in assessing the effectiveness of data mining and knowledge discovery
models (and not only for those which are based on logic). These issues are referred
to as the “three major illusions” in evaluating the accuracy of such models. There it
is shown that many models considered highly successful may in reality
be totally useless when their accuracy is studied in depth.
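A Boolean function f is monotone when x ≤ y (componentwise) implies f(x) ≤ f(y). The sketch below shows why this property saves oracle queries: one answer of 1 settles every point above the queried one, and one answer of 0 settles every point below it. The query order (middle layers first) and the example oracle are assumptions made for illustration, not the strategies studied in Chapters 10 and 11.

```python
from itertools import product


def leq(a, b):
    """Componentwise order on binary vectors: a <= b."""
    return all(u <= v for u, v in zip(a, b))


def infer_monotone(n, oracle):
    """Recover a monotone function on {0,1}^n, counting oracle queries."""
    # Assumed ordering: ask about vectors from the middle layers first.
    points = sorted(product((0, 1), repeat=n),
                    key=lambda x: abs(sum(x) - n / 2))
    known = {}
    queries = 0
    for x in points:
        if x in known:
            continue            # value already implied by monotonicity
        value = oracle(x)
        queries += 1
        for y in points:
            if y in known:
                continue
            if value == 1 and leq(x, y):
                known[y] = 1    # everything above an accepted point is accepted
            elif value == 0 and leq(y, x):
                known[y] = 0    # everything below a rejected point is rejected
    return known, queries


def example_oracle(x):
    """Hypothetical expert: the monotone function x0 OR (x1 AND x2)."""
    return int(x[0] or (x[1] and x[2]))


n = 4
known, queries = infer_monotone(n, example_oracle)
print(queries, "queries instead of", 2 ** n)
```

The oracle must really be monotone for the propagated values to be valid; with a non-monotone oracle the sketch silently returns a wrong table.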
Chapter 12 presents how some of the previous methods for inferring a Boolean
function from observations can be used (after some modifications) to extract what is
known in the literature as association rules. Traditional methods suffer from the
problem of extracting an overwhelming number of association rules, and they do so in
exponential time. The new methods discussed in this chapter are based on some fast
(polynomial-time) heuristics that can derive a compact set of association rules.
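For readers unfamiliar with association rules, the sketch below computes the two standard measures, support and confidence, for a single rule. It uses the textbook definitions and is not the chapter's heuristic; the transactions are made-up toy data. With n distinct items there are 2^n - 1 non-empty itemsets to consider, which is where the exponential cost of exhaustive approaches comes from.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated probability of the consequent given the antecedent."""
    both = support(transactions, set(antecedent) | set(consequent))
    ante = support(transactions, antecedent)
    return both / ante if ante else 0.0


# Hypothetical toy transactions, each a set of items.
transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]

rule_support = support(transactions, {"a", "b"})          # rule: a -> b
rule_confidence = confidence(transactions, {"a"}, {"b"})
print(rule_support, rule_confidence)
```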
Chapter 13 presents some new methods for analyzing and categorizing text documents.
Since the Web has made immense amounts of textual (and other) information
easily accessible to anyone with access to it, such methods are
expected to attract even more interest in the immediate future.
Chapters 14, 15, and 16 discuss some real-life case studies. Chapter 14 discusses
the analysis of some real-life EMG (electromyography) signals for predicting muscle
fatigue. The same chapter also presents a comparative study which indicates that the
proposed logic-based methods are superior to some of the traditional methods used
for this kind of analysis.
Chapter 15 presents some real-life data gathered from the analysis of cases suspected
of breast cancer. Next these data are transformed into equivalent binary data
and then some diagnostic rules (in the form of compact Boolean functions) are
extracted by using the methods discussed in earlier chapters. These rules are next
presented in the form of IF-THEN logical expressions (diagnostic rules).
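The two steps mentioned here, turning raw values into binary attributes and printing a Boolean function as an IF-THEN rule, can be sketched as below. The threshold encoding is one common binarization and is an assumption, since the excerpt does not say how the data are transformed; the attribute names, thresholds, and rule are hypothetical and carry no medical meaning.

```python
def binarize(value, thresholds):
    """Encode a numeric value as one bit per threshold: 1 if value >= t."""
    return tuple(int(value >= t) for t in thresholds)


def render_rule(dnf, names, conclusion):
    """Print a DNF (list of terms of (index, value) literals) as IF-THEN text."""
    parts = []
    for term in dnf:
        literals = [names[i] if v == 1 else "NOT " + names[i] for i, v in term]
        parts.append("(" + " AND ".join(literals) + ")")
    return "IF " + " OR ".join(parts) + " THEN " + conclusion


# Hypothetical numeric attribute with two cut points.
bits = binarize(35, [20, 40])

# Hypothetical rule over the two resulting binary attributes.
names = ["feature>=20", "feature>=40"]
rule_text = render_rule([[(0, 1), (1, 0)]], names, "class A")
print(bits, rule_text)
```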
Chapter 16 presents a combination of some of the proposed logic methods with
fuzzy logic. This is done in order to objectively capture fuzzy data that may play a
key role in many data mining and knowledge discovery applications. The proposed
new method is demonstrated in characterizing breast lesions in digital mammography
as lobular or microlobular. Such information is highly significant in analyzing
medical data for breast cancer diagnosis.
The last chapter presents some concluding remarks. Furthermore, it presents
twelve different areas that are most likely to experience high interest for future
research efforts in the field of data mining and knowledge discovery.
All the above chapters make clear that methods based on mathematical logic
already play an important role in data mining and knowledge discovery. Furthermore,
such methods are almost guaranteed to play an even more important role in the near
future as such problems increase both in complexity and in size.
Data Mining and Knowledge Discovery via Logic-Based Methods.pdf (4.47 MB, cost: 5 forum coins)

Keywords: Data Mining knowledge Discovery Discover Methods

Rated by 2 users:
xujingtang: experience +100 (reason: great post)
飞天玄舞6: forum coins +20 (reason: great post)

Total: experience +100, forum coins +20


#2
MouJack007, posted 2017-8-23 23:19:57
Thanks for sharing, OP!

#3
MouJack007, posted 2017-8-23 23:25:35

#4
军旗飞扬, posted 2017-8-24 06:31:09
Thanks for sharing, OP!

#5
pyx1548 (verified student), posted 2017-8-24 07:51:05
Thanks for sharing.

#6
ziyi1121, posted 2017-8-24 09:00:11
Thanks for sharing.

#7
lianqu, posted 2017-8-24 09:09:22

#8
caifacai, posted 2017-8-24 09:25:37
Thanks for sharing such a good resource!
