How do I use this MATLAB SMOTE M-file? I'm a beginner, any guidance is appreciated!

Posted by: someone111 | Category: Postgraduate entrance exam



function sample=SMOTE(T,N,k,type,attribute,AttVector)
% Implement the SMOTE algorithm [1].
% This algorithm resamples the small class by taking each small
% class example and introducing synthetic examples along the line
% segments joining it to its small class nearest neighbors.
%
% Usage:
%   sample=SMOTE(T,N,k,type,attribute,AttVector)
%
% sample   : instance matrix. No.Attribute * N
%            It contains only the new synthetic samples, not the original set.
% T        : original minority class samples. No.Attribute * No.Instance
% N        : number of new samples to generate
% k        : number of nearest neighbors used by the algorithm. Default value is 5
% type     : 'nominal' or 'numeric'. The former uses VDM to deal with nominal
%            attributes when calculating distances, while the latter treats nominal
%            attributes the same as numeric ones, i.e. Euclidean distance is used.
%            Default value: 'numeric'
% attribute: attribute structure vector carrying VDM information.
%            Each entry has 3 fields
%            FIELD kind   - 'nominal' or 'numeric'
%            FIELD values - values of the nominal attribute, or [] for a numeric attribute
%            FIELD VDM    - VDM[i,j] denotes the VDM distance between the i-th value
%                           and the j-th value of the nominal attribute
% AttVector: attribute vector. 1 means the corresponding attribute
%            is nominal and 0 means it is numeric.
%
% Reference [1]:
% N.V. Chawla, K.W. Bowyer, L.O. Hall, and W.P. Kegelmeyer, “SMOTE:
% synthetic minority over-sampling technique,” Journal of Artificial Intelligence
% Research, vol.16, pp.321–357, 2002
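if(nargin<2)
    help SMOTE
    error('SMOTE requires at least the inputs T and N.')
end
% fill in the defaults one argument at a time
if(nargin<3)
    k=5;
end
if(nargin<4)
    type='numeric';
end
if(strcmp(type,'nominal') && nargin<6)
    help SMOTE
    error('type ''nominal'' requires both attribute and AttVector.')
end
if(nargin<6)
    % no attribute information given: treat every attribute as numeric
    AttVector=zeros(1,size(T,1));
end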
NT=size(T,2);
if(NT==0)
    error('T is empty. It must contain at least one minority class sample.')
elseif(NT==1)%duplicate
    sample=repmat(T,1,N);
else
    % number of nearest neighbours can not be greater than NT-1
    if(k>NT-1)
        k=NT-1;
        warning('not so many instances in T. k is set to %d',k);
    end
    % number of new examples that each example in T should generate
    NumAtt=size(T,1);
    n=floor(N/NT);
    remainder=N-NT*n;
    id=randperm(NT);
    No=ones(1,NT)*n;
    No(id(1:remainder))=No(id(1:remainder))+1;
    % generation
    sample=[];
    for i=find(No~=0)
        % k-NN
        if(strcmp(type,'numeric'))
            % dist is the Euclidean distance function of the Neural Network
            % (now Deep Learning) Toolbox
            d=dist(T(:,i)', T);
        elseif(strcmp(type,'nominal'))
            aid=find(AttVector==0);
            if(isempty(aid))%SMOTE-N
                d=dist_nominal(T(:,i),T,attribute,AttVector);
            else%SMOTE-NC
                Med=median(std(T(aid,:)'));
                d=dist_smote(T(:,i),T,AttVector,Med);
            end
        else
            error('Unknown type ''%s''. Use ''numeric'' or ''nominal''.',type)
        end
        d(i)=Inf;
        if(k<log(NT))
            min_id=[];
            for j=1:k
                [tmp,id]=min(d);
                d(id)=Inf;
                min_id=[min_id id];% sort>=O(n*logn),so we take min: O(n).total time:O(k*n)
            end
        else
            [tmp,id]=sort(d);
            min_id=id(1:k);
        end
        rn=floor(rand(1,No(i))*k)+1;
        id=min_id(rn);
        weight=rand(NumAtt,No(i));
        D=repmat(T(:,i),1,No(i));
        % for numeric attributes
        aid=find(AttVector==0);
        D(aid,:)=D(aid,:)+weight(aid,:).*(T(aid,id)-D(aid,:));
        % for nominal attributes the new instances take the most frequent
        % value in the union of the seed and corresponding k-NN
        aid=find(AttVector==1);
        % initialise here so that the repmat below also works when no
        % attribute is nominal (most_nominal is then empty)
        most_nominal=zeros(1,length(aid));
        for i_aid=1:length(aid)
            count=zeros(1,length(attribute(aid(i_aid)).values));
            for i_v=1:length(attribute(aid(i_aid)).values)
                count(i_v)=length(find([T(aid(i_aid),id) D(aid(i_aid),1)]==attribute(aid(i_aid)).values(i_v)));
            end
            [tmp,most_id]=max(count);
            most_nominal(i_aid)=attribute(aid(i_aid)).values(most_id);
        end
        D(aid,:)=repmat(most_nominal',1,No(i));
        sample=[sample D];
    end
end
%end
%-------------------------------------------------------------------------
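function z = dist_smote(w,p,AttVector,Med)
% Distance used for SMOTE-NC: squared differences on the numeric attributes
% plus Med^2 for every nominal attribute whose values differ.
[R,Nw] = size(w);
[R2,Np] = size(p);
if (R ~= R2)
    error('Attribute numbers do not match.')
end
z = zeros(Nw,Np);
for ii=1:Nw
    id=find(AttVector==0);%numeric
    % sum along dimension 1 so that a single numeric attribute still gives one value per column
    z(ii,:)= sum((repmat(w(id,ii),1,Np)-p(id,:)).^2,1);
    id=find(AttVector==1);%nominal
    z(ii,:)=z(ii,:)+sum(repmat(w(id,ii),1,Np)~=p(id,:),1)*(Med^2);
end
z = sqrt(z);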
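
As a starting point for the question in the title, here is a minimal sketch of how the function might be called on purely numeric data. It is not a definitive recipe and rests on several assumptions: the listing above is saved as SMOTE.m on the MATLAB path; the `dist` function from the Deep Learning (formerly Neural Network) Toolbox is available; and `most_nominal` is initialised before the nominal-attribute loop, since otherwise a purely numeric call stops at the `repmat(most_nominal',...)` line. The data and the names `minority`, `synthetic` and `augmented` are made up for illustration. The 'nominal' mode is left out because `dist_nominal` is not part of the listing.

```matlab
% Hypothetical data: 10 minority-class samples with 2 numeric attributes,
% stored one sample per ROW, as is common for data sets.
rng(0);                      % fix the seed so the run is repeatable
minority = rand(10,2);

% SMOTE expects No.Attribute * No.Instance, i.e. one sample per COLUMN.
T = minority';

% Ask for 20 synthetic samples built from the 5 nearest neighbours.
% With three inputs the function falls back to type 'numeric'.
N = 20;
k = 5;
synthetic = SMOTE(T,N,k);

% The output holds only the new samples, again one per column,
% so its size is expected to be No.Attribute * N (here 2-by-20).
disp(size(synthetic))

% Append the new samples to the original minority class
% (transposed back to one sample per row).
augmented = [minority; synthetic'];
```

The main point to watch is the orientation of T: passing a rows-as-samples matrix without transposing it makes the function treat each attribute as an instance. Because only the synthetic samples are returned, they have to be concatenated with the original minority samples before training.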
Forum thread URL: https://bbs.pinggu.org/thread-4125078-1-1.html
