Original poster: zhangfu1986

Web Mining: Original English Edition (Free Giveaway)

DATA MINING THE WEB
Uncovering Patterns in Web Content, Structure, and Usage

Publication information:
Copyright © 2007 by John Wiley & Sons, Inc. All rights reserved.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey
Published simultaneously in Canada
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form
or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as
permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior
written permission of the Publisher, or authorization through payment of the appropriate per-copy fee
to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400, fax
978-750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be
addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030,
201-748-6011, fax 201-748-6008, or online at http://www.wiley.com/go/permission.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in
preparing this book, they make no representations or warranties with respect to the accuracy or completeness
of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness
for a particular purpose. No warranty may be created or extended by sales representatives or written sales
materials. The advice and strategies contained herein may not be suitable for your situation. You should
consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss
of profit or any other commercial damages, including but not limited to special, incidental, consequential,
or other damages.
For general information on our other products and services or for technical support, please contact our
Customer Care Department within the United States at 877-762-2974, outside the United States at
317-572-3993 or fax 317-572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may
not be available in electronic formats. For more information about Wiley products, visit our web site at
www.wiley.com.
Wiley Bicentennial Logo: Richard J. Pacifico
Library of Congress Cataloging-in-Publication Data:
Markov, Zdravko, 1956–
Data-mining the Web : uncovering patterns in Web content, structure, and usage /
by Zdravko Markov & Daniel T. Larose.
p.   cm.
Includes index.
ISBN 978-0-471-66655-4 (cloth)
1. Data mining.   2. Web databases.   I. Larose, Daniel T.   II. Title.
QA76.9.D343M38 2007
005.74 – dc22
2006025099


Table of contents:
CONTENTS
PREFACE
PART I
WEB STRUCTURE MINING
1 INFORMATION RETRIEVAL AND WEB SEARCH
Web Challenges
Web Search Engines
Topic Directories
Semantic Web
Crawling the Web
Web Basics
Web Crawlers
Indexing and Keyword Search
Document Representation
Implementation Considerations
Relevance Ranking
Advanced Text Search
Using the HTML Structure in Keyword Search
Evaluating Search Quality
Similarity Search
Cosine Similarity
Jaccard Similarity
Document Resemblance
References
Exercises
2 HYPERLINK-BASED RANKING
Introduction
Social Networks Analysis
PageRank
Authorities and Hubs
Link-Based Similarity Search
Enhanced Techniques for Page Ranking
References
Exercises
PART II
WEB CONTENT MINING
3 CLUSTERING
Introduction
Hierarchical Agglomerative Clustering
k-Means Clustering
Probability-Based Clustering
Finite Mixture Problem
Classification Problem
Clustering Problem
Collaborative Filtering (Recommender Systems)
References
Exercises
4 EVALUATING CLUSTERING
Approaches to Evaluating Clustering
Similarity-Based Criterion Functions
Probabilistic Criterion Functions
MDL-Based Model and Feature Evaluation
Minimum Description Length Principle
MDL-Based Model Evaluation
Feature Selection
Classes-to-Clusters Evaluation
Precision, Recall, and F-Measure
Entropy
References
Exercises
5 CLASSIFICATION
General Setting and Evaluation Techniques
Nearest-Neighbor Algorithm
Feature Selection
Naive Bayes Algorithm
Numerical Approaches
Relational Learning
References
Exercises
PART III
WEB USAGE MINING
6 INTRODUCTION TO WEB USAGE MINING
Definition of Web Usage Mining
Cross-Industry Standard Process for Data Mining
Clickstream Analysis
Web Server Log Files
Remote Host Field
Date/Time Field
HTTP Request Field
Status Code Field
Transfer Volume (Bytes) Field
Common Log Format
Identification Field
Authuser Field
Extended Common Log Format
Referrer Field
User Agent Field
Example of a Web Log Record
Microsoft IIS Log Format
Auxiliary Information
References
Exercises
7 PREPROCESSING FOR WEB USAGE MINING
Need for Preprocessing the Data
Data Cleaning and Filtering
Page Extension Exploration and Filtering
De-Spidering the Web Log File
User Identification
Session Identification
Path Completion
Directories and the Basket Transformation
Further Data Preprocessing Steps
References
Exercises
8 EXPLORATORY DATA ANALYSIS FOR WEB USAGE MINING
Introduction
Number of Visit Actions
Session Duration
Relationship between Visit Actions and Session Duration
Average Time per Page
Duration for Individual Pages
References
Exercises
9 MODELING FOR WEB USAGE MINING: CLUSTERING, ASSOCIATION, AND CLASSIFICATION
Introduction
Modeling Methodology
Definition of Clustering
The BIRCH Clustering Algorithm
Affinity Analysis and the A Priori Algorithm
Discretizing the Numerical Variables: Binning
Applying the A Priori Algorithm to the CCSU Web Log Data
Classification and Regression Trees
The C4.5 Algorithm
References
Exercises
INDEX

Keywords: original English edition, introduction, Presentation, relationship, Professional, mining, English, original edition, web page

I have studied so much that I no longer know what I have actually learned..
#2
zhangfu1986 posted on 2010-4-18 23:29:59
Since the first book was a duplicate post, I'm now adding two more English-language books on web mining.
First book:
Web_Data_Mining_Exploring_Hyperlinks_Contents_and_Usage_Data
Second book:
Mining_The_Web_Discovering_Knowledge_From_Hypertext_Data

Web_Data_Mining_Exploring_Hyperlinks_Contents_and_Usage_Data.pdf (3.88 MB; price: 2 forum coins)

Mining_The_Web_Discovering_Knowledge_From_Hypertext_Data.pdf (2.98 MB; price: 2 forum coins)


#3
爱萌 posted on 2010-4-20 22:09:52
Knowledge is valuable; the price of the ebook is not...
What I hate most is people who lie to me or deceive me.


#4
oneforall posted on 2010-4-21 08:02:26
http://www.pinggu.org/bbs/thread-546798-1-1.html
Didn't you search first before uploading?


#5
zhangfu1986 posted on 2010-4-24 10:05:34
4# oneforall

Sorry, my earlier search didn't turn it up....
