Original poster: ofoliao

[Data Mining Theory and Cases] 2016 Latest Edition: Foundations of Data Science (Avrim Blum...)


Sharing with anyone who needs it.
[Image: foundations of data science.png]
Contents
1 Introduction 8
2 High-Dimensional Space 11
2.1 Introduction . . . . . . . . . . . . . . . . . 11
2.2 The Law of Large Numbers . . . . . . . . 11
2.3 The Geometry of High Dimensions . . . . 14
2.4 Properties of the Unit Ball . . . . . . . . . 15
2.4.1 Volume of the Unit Ball . . . . . . 15
2.4.2 Most of the Volume is Near the Equator . . . . . . 17
2.5 Generating Points Uniformly at Random from a Ball . . . 20
2.6 Gaussians in High Dimension . . . . . . . 21
2.7 Random Projection and Johnson-Lindenstrauss Lemma . . 23
2.8 Separating Gaussians . . . 25
2.9 Fitting a Single Spherical Gaussian to Data . . . . . . . . 27
2.10 Bibliographic Notes . . . . 29
2.11 Exercises . 30
3 Best-Fit Subspaces and Singular Value Decomposition (SVD) 38
3.1 Introduction and Overview . . . . . . . . . 38
3.2 Preliminaries . . . . . . . 39
3.3 Singular Vectors . . . . . . 41
3.4 Singular Value Decomposition (SVD) . . . 44
3.5 Best Rank-k Approximations . . . . . . . 45
3.6 Left Singular Vectors . . . 47
3.7 Power Method for Computing the Singular Value Decomposition . . . . . . 49
3.7.1 A Faster Method . 50
3.8 Singular Vectors and Eigenvectors . . . . . 52
3.9 Applications of Singular Value Decomposition . . . . . . . 52
3.9.1 Centering Data . . 52
3.9.2 Principal Component Analysis . . . 53
3.9.3 Clustering a Mixture of Spherical Gaussians . . . . 54
3.9.4 Ranking Documents and Web Pages . . . . . . . . 59
3.9.5 An Application of SVD to a Discrete Optimization Problem . . . . 60
3.10 Bibliographic Notes . . . . 63
3.11 Exercises . 64
4 Random Graphs 71
4.1 The G(n, p) Model . . . . 71
4.1.1 Degree Distribution . . . . . . . . . 72
4.1.2 Existence of Triangles in G(n, d/n) . . . 77
4.2 Phase Transitions . . . . . 79
4.3 The Giant Component . . 87
4.4 Branching Processes . . . 96
4.5 Cycles and Full Connectivity . . 102
4.5.1 Emergence of Cycles . . 102
4.5.2 Full Connectivity . 104
4.5.3 Threshold for O(ln n) Diameter . . 105
4.6 Phase Transitions for Increasing Properties . . . 107
4.7 Phase Transitions for CNF-sat . 109
4.8 Nonuniform and Growth Models of Random Graphs . . . . 114
4.8.1 Nonuniform Models . . . 114
4.8.2 Giant Component in Random Graphs with Given Degree Distribution 114
4.9 Growth Models . . . . . . 116
4.9.1 Growth Model Without Preferential Attachment . . 116
4.9.2 Growth Model With Preferential Attachment . . . 122
4.10 Small World Graphs . . . 124
4.11 Bibliographic Notes . . . . 129
4.12 Exercises . 130
5 Random Walks and Markov Chains 139
5.1 Stationary Distribution . . 143
5.2 Markov Chain Monte Carlo . . 145
5.2.1 Metropolis-Hastings Algorithm . . . 146
5.2.2 Gibbs Sampling . . 147
5.3 Areas and Volumes . . . . 150
5.4 Convergence of Random Walks on Undirected Graphs . . . 151
5.4.1 Using Normalized Conductance to Prove Convergence . . 157
5.5 Electrical Networks and Random Walks . . 160
5.6 Random Walks on Undirected Graphs with Unit Edge Weights . 164
5.7 Random Walks in Euclidean Space . . . . 171
5.8 The Web as a Markov Chain . . 175
5.9 Bibliographic Notes . . . . 179
5.10 Exercises . 180
6 Machine Learning 190
6.1 Introduction . . 190
6.2 Overfitting and Uniform Convergence . . . 192
6.3 Illustrative Examples and Occam's Razor . 194
6.3.1 Learning disjunctions . . 194
6.3.2 Occam's razor . . . 195
6.3.3 Application: learning decision trees . . . 196
6.4 Regularization: penalizing complexity . . . 197
6.5 Online learning and the Perceptron algorithm . 198
6.5.1 An example: learning disjunctions . 198
6.5.2 The Halving algorithm . 199
6.5.3 The Perceptron algorithm . . . . . 199
6.5.4 Extensions: inseparable data and hinge-loss . . . . 201
6.6 Kernel functions . . . . . . 202
6.7 Online to Batch Conversion . . 204
6.8 Support-Vector Machines . 205
6.9 VC-Dimension . 206
6.9.1 Definitions and Key Theorems . . . 207
6.9.2 Examples: VC-Dimension and Growth Function . . 209
6.9.3 Proof of Main Theorems . . . . . . 211
6.9.4 VC-dimension of combinations of concepts . . . . . 214
6.9.5 Other measures of complexity . . . 214
6.10 Strong and Weak Learning - Boosting . . . 215
6.11 Stochastic Gradient Descent . . 218
6.12 Combining (Sleeping) Expert Advice . . . 220
6.13 Deep learning . 222
6.14 Further Current directions . . . 228
6.14.1 Semi-supervised learning . . . . . . 228
6.14.2 Active learning . . 231
6.14.3 Multi-task learning . . . 231
6.15 Bibliographic Notes . . . . 232
6.16 Exercises . 233
7 Algorithms for Massive Data Problems: Streaming, Sketching, and Sampling 237
7.1 Introduction . . 237
7.2 Frequency Moments of Data Streams . . . 238
7.2.1 Number of Distinct Elements in a Data Stream . . 239
7.2.2 Counting the Number of Occurrences of a Given Element . . . . . 242
7.2.3 Counting Frequent Elements . . . . 243
7.2.4 The Second Moment . . 244
7.3 Matrix Algorithms using Sampling . . . . 247
7.3.1 Matrix Multiplication Using Sampling . 249
7.3.2 Implementing Length Squared Sampling in two passes . . 252
7.3.3 Sketch of a Large Matrix . . . . . . 253
7.4 Sketches of Documents . . 256
7.5 Bibliography . . 258
7.6 Exercises . 259
8 Clustering 264
8.1 Introduction . . 264
8.1.1 Two general assumptions on the form of clusters . . 265
8.2 k-means Clustering . . . . 267
8.2.1 A maximum-likelihood motivation for k-means . . . 267
8.2.2 Structural properties of the k-means objective . . . 268
8.2.3 Lloyd's k-means clustering algorithm . . 268
8.2.4 Ward's algorithm . 270
8.2.5 k-means clustering on the line . . . 271
8.3 k-Center Clustering . . . . 271
8.4 Finding Low-Error Clusterings . 272
8.5 Approximation Stability . 272
8.6 Spectral Clustering . . . . 275
8.6.1 Stochastic Block Model . 276
8.6.2 Gaussian Mixture Model . . . . . . 278
8.6.3 Standard Deviation without a stochastic model . . 278
8.6.4 Spectral Clustering Algorithm . . . 279
8.7 High-Density Clusters . . 281
8.7.1 Single-linkage . . . 281
8.7.2 Robust linkage . . 282
8.8 Kernel Methods . . . . . . 283
8.9 Recursive Clustering based on Sparse cuts . . . 283
8.10 Dense Submatrices and Communities . . . 284
8.11 Community Finding and Graph Partitioning . . 287
8.11.1 Flow Methods . . . 287
8.12 Axioms for Clustering . . 290
8.12.1 An Impossibility Result . . . . . . 290
8.12.2 Satisfying two of three . 291
8.12.3 Relaxing the axioms . . 293
8.12.4 A Satisfiable Set of Axioms . . . . 293
8.13 Exercises . 295
9 Topic Models, Hidden Markov Process, Graphical Models, and Belief Propagation 299
9.1 Topic Models . 299
9.2 Hidden Markov Model . . 303
9.3 Graphical Models, and Belief Propagation . . . 308
9.4 Bayesian or Belief Networks . . 308
9.5 Markov Random Fields . . 309
9.6 Factor Graphs . 311
9.7 Tree Algorithms . . . . . . 311
9.8 Message Passing in general Graphs . . . . 313
9.9 Graphs with a Single Cycle . . 315
9.10 Belief Update in Networks with a Single Loop . 316
9.11 Maximum Weight Matching . . 318
9.12 Warning Propagation . . . 321
9.13 Correlation Between Variables . 322
9.14 Exercises . 327
10 Other Topics 329
10.1 Rankings . 329
10.2 Hare System for Voting . . 331
10.3 Compressed Sensing and Sparse Vectors . 332
10.3.1 Unique Reconstruction of a Sparse Vector . . . . . 333
10.3.2 The Exact Reconstruction Property . . . 336
10.3.3 Restricted Isometry Property . . . 337
10.4 Applications . . 339
10.4.1 Sparse Vector in Some Coordinate Basis . . . . . . 339
10.4.2 A Representation Cannot be Sparse in Both Time and Frequency Domains . . . . . . 340
10.4.3 Biological . . . . . 342
10.4.4 Finding Overlapping Cliques or Communities . . . 342
10.4.5 Low Rank Matrices . . . 343
10.5 Gradient . 344
10.6 Linear Programming . . . 345
10.6.1 The Ellipsoid Algorithm . . . . . . 347
10.7 Integer Optimization . . . 348
10.8 Semi-Definite Programming . . 349
10.9 Exercises . 351
11 Wavelets 354
11.1 Dilation . 354
11.2 The Haar Wavelet . . . . . 355
11.3 Wavelet Systems . . . . . 359
11.4 Solving the Dilation Equation . 359
11.5 Conditions on the Dilation Equation . . . 361
11.6 Derivation of the Wavelets from the Scaling Function . . . 363
11.7 Sufficient Conditions for the Wavelets to be Orthogonal . . 367
11.8 Expressing a Function in Terms of Wavelets . . 370
11.9 Designing a Wavelet System . . 371
12 Appendix 375
12.1 Asymptotic Notation . . . 375
12.2 Useful relations . . . . . . 376
12.3 Useful Inequalities . . . . 380
12.4 Probability . . 387
12.4.1 Sample Space, Events, Independence . . 388
12.4.2 Linearity of Expectation . . . . . . 389
12.4.3 Union Bound . . . 389
12.4.4 Indicator Variables . . . 389
12.4.5 Variance . . . . . . 390
12.4.6 Variance of the Sum of Independent Random Variables . 390
12.4.7 Median . . . . . . 391
12.4.8 The Central Limit Theorem . . . . 391
12.4.9 Probability Distributions . . . . . . 391
12.4.10 Bayes Rule and Estimators . . . . . 395
12.4.11 Tail Bounds and Chernoff Inequalities . . 397
12.5 Bounds on Tail Probability . . . 401
12.6 Applications of the tail bound . 403
12.7 Eigenvalues and Eigenvectors . 405
12.7.1 Symmetric Matrices . . 406
12.7.2 Relationship between SVD and Eigen Decomposition . . 408
12.7.3 Extremal Properties of Eigenvalues . . . 409
12.7.4 Eigenvalues of the Sum of Two Symmetric Matrices . . . 411
12.7.5 Norms . 412
12.7.6 Important Norms and Their Properties . 413
12.7.7 Linear Algebra . . 415
12.7.8 Distance between subspaces . . . . 417
12.8 Generating Functions . . 418
12.8.1 Generating Functions for Sequences Defined by Recurrence Relationships . . . . . . 419
12.8.2 The Exponential Generating Function and the Moment Generating Function . . . . . . 421
12.9 Miscellaneous . 423
12.9.1 Lagrange multipliers . . 423
12.9.2 Finite Fields . . . . 423
12.9.3 Hash Functions . . 424
12.9.4 Application of Mean Value Theorem . . 424
12.9.5 Sperner's Lemma . 426
12.9.6 Prüfer . 426
12.10 Exercises . 427
Index 433

Attachments:
foundations of data science.png (93.11 KB)
book2016June9.pdf (2.24 MB), requires 5 forum coins [Purchase]

Foundations of Data Science, Avrim Blum, John Hopcroft and Ravindran Kannan, 2016


#2 jjxm20060807, posted 2016-10-14 21:50:31
Thanks for sharing.

#3 happy_287422301 (verified employed), posted 2016-10-15 13:43:25
Thanks for sharing! Everyone is welcome to take an active part in the forum and help build it into a platform for mutual help, exchange, and sharing.

#4 西交共享, posted 2016-10-17 22:53:52
Many thanks to the OP.

#5 leon_9930754, posted 2016-10-25 22:16:46
Thanks for sharing.

#6 sacromento (verified student), posted 2016-11-15 08:10:37, from mobile
Thanks for sharing! Great stuff!

#7 franky_sas, posted 2016-11-29 13:10:11
Thanks for sharing.

#8 jw216, posted 2017-7-16 23:55:39
Thanks for sharing.
