Generalized Machine Learning - kernelml - Simple ML to Train Complex ML


OP: oliyiyi, posted on 2018-5-5 09:50:19

There have been a lot of articles recently about a new form of optimization called 'particle optimization' or 'swarm optimization', that is, particle optimization with multiple particles. Coincidentally, I recently created a 'particle optimizer' and published a pip Python package called kernelml. My goal is to eventually make the project open source. This optimizer can be used as a generalized machine learning algorithm for custom loss functions and non-linear coefficients.

Example use case:

Let's take the problem of clustering longitude and latitude coordinates. Clustering methods such as K-means use Euclidean distances to compare observations. However, the Euclidean distance between longitude and latitude data points does not map directly to Haversine distance. That means that if you normalize the coordinates between 0 and 1, the distances won't be accurately represented in the clustering model. A possible solution is to find a projection of latitude and longitude such that the Haversine distance from each data point to the centroid of the data equals the Euclidean distance between the projected point and the projected centroid.


This coordinate transformation lets you represent the Haversine distance, relative to the center, as a Euclidean distance, which can then be scaled and used in a clustering solution.
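
To make this concrete, here is a minimal numpy sketch of a loss one could hand to the optimizer. The per-axis affine projection and the function names are assumptions made for illustration; this is not the kernelml API.

    import numpy as np

    def haversine(lat1, lon1, lat2, lon2, radius=6371.0):
        # Great-circle distance in kilometers; inputs are in degrees.
        lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
        dlat, dlon = lat2 - lat1, lon2 - lon1
        a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
        return 2 * radius * np.arcsin(np.sqrt(a))

    def projection_loss(params, lat, lon):
        # params = [a_lat, b_lat, a_lon, b_lon]: a simple affine projection per axis.
        a_lat, b_lat, a_lon, b_lon = params
        lat_c, lon_c = lat.mean(), lon.mean()
        target = haversine(lat, lon, lat_c, lon_c)            # Haversine distance to the center
        proj = np.column_stack((a_lat * lat + b_lat, a_lon * lon + b_lon))
        proj_c = np.array([a_lat * lat_c + b_lat, a_lon * lon_c + b_lon])
        euclid = np.linalg.norm(proj - proj_c, axis=1)        # Euclidean distance after projection
        return np.mean((target - euclid) ** 2)                # make the two distances agree

Minimizing this loss over the four projection parameters yields coordinates whose Euclidean distance to the center approximates the Haversine distance.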

Another, simpler problem is to find the optimal values of non-linear coefficients, i.e., power transformations in a least-squares linear model. The reason for doing this is simple: integer power transformations rarely capture the best-fitting transformation. By allowing the power transformation to be any real number, the accuracy improves and the model generalizes to validation data much better.

To clarify what is meant by a power transformation: the model takes the form y = w0 + w1*x1^p1 + w2*x2^p2 + ..., where the powers p1, p2, ... are fit along with the coefficients and are allowed to be any real number rather than fixed integers.
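
For a concrete illustration, here is a small sketch with made-up data; scipy's generic Nelder-Mead optimizer stands in for kernelml's own optimizer, and the variable names are illustrative.

    import numpy as np
    from scipy.optimize import minimize

    # Hypothetical data: the true relationship uses a non-integer power.
    rng = np.random.default_rng(0)
    x = rng.uniform(1, 10, 200)
    y = 3.0 * x ** 1.7 + rng.normal(0, 1, 200)

    def loss(params):
        # params = [w0, w1, p]: intercept, coefficient, and a real-valued power on x.
        w0, w1, p = params
        pred = w0 + w1 * x ** p
        return np.mean((y - pred) ** 2)

    # Any gradient-free optimizer will do for this illustration.
    res = minimize(loss, x0=[0.0, 1.0, 1.0], method='Nelder-Mead')
    print(res.x)  # the fitted power should land near 1.7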

The algorithm:

The idea behind kernelml is simple: use the parameter update history in a machine learning model to decide how to update the next parameter set. Using a machine learning model as the backend causes a bias-variance problem; specifically, the parameter updates become more biased with each iteration. The problem can be solved by including a Monte Carlo simulation around the best recorded parameter set after each iteration.
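
To sketch the idea, here is a toy loop, not the kernelml implementation; it shows only the Monte Carlo simulation around the best recorded parameter set and omits the machine learning model fitted to the update history.

    import numpy as np

    def simple_particle_search(loss, n_params, n_iter=50, n_samples=100, sigma=0.5, seed=0):
        # Keep a history of (parameter set, loss) and re-center a Monte Carlo
        # simulation on the best parameter set found so far after each iteration.
        rng = np.random.default_rng(seed)
        best_w = rng.normal(size=n_params)
        best_loss = loss(best_w)
        history = [(best_w.copy(), best_loss)]
        for _ in range(n_iter):
            candidates = best_w + sigma * rng.normal(size=(n_samples, n_params))
            losses = np.array([loss(w) for w in candidates])
            i = losses.argmin()
            if losses[i] < best_loss:
                best_w, best_loss = candidates[i].copy(), losses[i]
            history.append((best_w.copy(), best_loss))
        return best_w, best_loss, history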

The issue of convergence:

The model saves the best parameter set and the user-defined loss after each iteration. The model also records a history of all parameter updates. The question is how to use this data to define convergence. One possible solution is:

    # best_parameter: best parameter vector recorded so far (1-D numpy array)
    # param_by_iter:  history of parameter vectors, one row per iteration (2-D numpy array)
    convergence = (best_parameter - np.mean(param_by_iter[-10:, :], axis=0)) / np.std(param_by_iter[-10:, :], axis=0)

    if np.all(np.abs(convergence) < 1):
        print('converged')
        break

The formula creates a Z-score using the last 10 parameter sets and the best parameter set. If the Z-score for every parameter is less than 1 in absolute value, then the algorithm can be said to have converged. This convergence criterion works well when there is a theoretical best parameter set. It becomes a problem when using the algorithm for clustering. See the example below.

Figure 1: Clustering with kernelml, 2-D multivariate normal distribution (blue), cluster solution (other colors)

We won't get into the quality of the cluster solution because it is clearly not representative of the data. The cluster solution minimized the difference between a multidimensional histogram and the average probability of 6 normal distributions, 3 for each axis. Here, the distributions can 'trade' data points fairly easily, which can increase convergence time. Why not just fit 3 multivariate normal distributions? There is a problem with simulating the distribution parameters because some parameters have constraints. The covariance matrix needs to be positive semi-definite, and its inverse needs to exist. The standard deviation of a normal distribution must be > 0. The solution used in this model incorporates the parameter constraints by making a custom simulation for each individual parameter. I have not yet found a good formulation for how to simulate the covariance matrix of a multivariate distribution.
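
As an illustration of what a per-parameter simulation with constraints can look like (the parameter layout below, alternating means and standard deviations for three 1-D normals on one axis, is an assumption made for this sketch), the standard deviations can be perturbed on the log scale so the simulated values stay strictly positive:

    import numpy as np

    rng = np.random.default_rng(1)

    def simulate_params(best, scale=0.1, n=100):
        # best = [mu1, sigma1, mu2, sigma2, mu3, sigma3]
        # Means get an unconstrained normal perturbation; standard deviations are
        # perturbed multiplicatively (log scale) so they remain > 0.
        best = np.asarray(best, dtype=float)
        out = np.empty((n, best.size))
        out[:, 0::2] = best[0::2] + scale * rng.normal(size=(n, best.size // 2))
        out[:, 1::2] = best[1::2] * np.exp(scale * rng.normal(size=(n, best.size // 2)))
        return out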

The code for the clustering example, other use cases, and documentation (still in progress) can be found on GitHub.




Reply 1
hyq2003 posted on 2018-5-5 10:30:31

Reply 2
minixi posted on 2018-5-5 14:02:16
Thanks for sharing

Reply 3
albertwishedu posted on 2018-5-5 20:53:51