人大经济论坛 (Renmin University Economics Forum) › Tags › Data › Related posts

Tag: Data

Related posts

Shared post: Simulate time-to-event data
stzhao 2017-5-14 10:44
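n = 100
beta1 = 2; beta2 = -1
lambdaT = .002  # baseline scale of the event times; with shape = 1 the baseline hazard is 1/lambdaT
lambdaC = .004  # scale of the censoring times; the censoring hazard is 1/lambdaC
x1 = rnorm(n,0)
x2 = rnorm(n,0)
# true event time (shape = 1 makes the Weibull an exponential distribution)
T = rweibull(n, shape=1, scale=lambdaT*exp(-beta1*x1-beta2*x2))
C = rweibull(n, shape=1, scale=lambdaC)  # censoring time
time = pmin(T,C)   # observed time is the minimum of the censoring time and the true event time
event = time==T    # TRUE (1) if the event is observed, FALSE (0) if censored
library(survival)
fit <- survfit(Surv(time,event)~1)  # Kaplan-Meier estimate of the survival curve
plot(fit)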
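The same data-generating process can be sketched outside R. The following is a minimal Python illustration using only the standard library, not the poster's code: the function name `simulate_survival`, the `seed` argument and the names `scale_t` and `scale_c` (standing in for `lambdaT` and `lambdaC`) are assumptions made for this example. It draws exponential event and censoring times and records which one was observed.

```python
import math
import random


def simulate_survival(n, beta1=2.0, beta2=-1.0, scale_t=0.002, scale_c=0.004, seed=0):
    """Return a list of (time, event, x1, x2) tuples.

    scale_t and scale_c play the role of lambdaT and lambdaC in the R post:
    they are the mean event and censoring times at x1 = x2 = 0.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        x2 = rng.gauss(0.0, 1.0)
        # Mean event time shrinks as the linear predictor grows, mirroring
        # rweibull(shape=1, scale=lambdaT*exp(-beta1*x1-beta2*x2)).
        mean_t = scale_t * math.exp(-beta1 * x1 - beta2 * x2)
        t = rng.expovariate(1.0 / mean_t)   # true event time
        c = rng.expovariate(1.0 / scale_c)  # censoring time
        time = min(t, c)                    # observed time
        event = t <= c                      # True if the event was observed
        rows.append((time, event, x1, x2))
    return rows


rows = simulate_survival(100)
observed = sum(1 for _, event, _, _ in rows if event)
print(f"{observed} of {len(rows)} event times were observed (not censored)")
```

Because `random.expovariate` takes a rate, each mean is inverted before the draw; this is the same parametrisation as a Weibull with shape 1 and the given scale.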
Category: Learning R | 16 views | 0 comments
Shared post: [Reposted from online tutor] SAS data management
huahuaaiqingtia 2015-6-16 08:05
1. create sample
2. create index
3. combine data

1. create sample

(1) systematic sample of known # of obs

    data sasuser.subset;
       do pickit=1 to 142 by 15;
          set sasuser.revenue point=pickit;
          output;
       end;
       stop;
    run;

(2) systematic sample of unknown # of obs

    data sasuser.subset;
       do pickit=1 to totobs by 10;
          set sasuser.revenue point=pickit nobs=totobs;
          output;
       end;
       stop;
    run;

(3) random sample with replacement

    data work.rsubset(drop=i sampsize);
       sampsize=10;
       do i=1 to sampsize;
          pickit=ceil(ranuni(0)*totobs);
          set sasuser.revenue point=pickit nobs=totobs;
          output;
       end;
       stop;
    run;

    proc print data=work.rsubset label;
       title 'A Random Sample with Replacement';
    run;

(4) random sample without replacement

    data work.rsubset(drop=obsleft sampsize);
       sampsize=10;
       obsleft=totobs;
       do while(sampsize>0);
          pickit+1;
          /* select with probability (still needed)/(still unread) */
          if ranuni(0)<sampsize/obsleft then do;
             set sasuser.revenue point=pickit nobs=totobs;
             output;
             sampsize=sampsize-1;
          end;
          obsleft=obsleft-1;
       end;
       stop;
    run;

    proc print data=work.rsubset label;
       title 'A Random Sample without Replacement';
    run;

2. create index

    create index in data step
    manage index with proc datasets
    manage index with proc sql

3. combine data

(1) filename statement

    filename qtr1 ('add1' 'add2' 'add3');

    data work.firstqtr;
       infile qtr1;
       input Flight $ Origin $ Dest $ Date : date9. Revcargo : comma15.2;
    run;

(2) infile statement

    data quarter (drop=monthnum midmon lastmon);
       monthnum=month(today());
       midmon=month(intnx('month',today(),-1));
       lastmon=month(intnx('month',today(),-2));
       do i=monthnum, midmon, lastmon;
          nextfile=""!!compress(put(i,2.)!!".dat",' ');
          do until(lastobs);
             infile temp filevar=nextfile end=lastobs;
             input Flight $ Origin $ Dest $ Date : date9. Revcargo : comma15.2;
             output;
          end;
       end;
       stop;
    run;

(3) proc append

    proc append base=work.acities data=work.airports force;
    run;

(4) if-then/else statement

    data mylib.employees_new;
       set mylib.employees;
       if IDnum=1001 then Birthdate='01JAN1963'd;
       else if IDnum=1002 then Birthdate='08AUG1946'd;
       else if IDnum=1003 then Birthdate='23MAR1950'd;
       else if IDnum=1004 then Birthdate='17JUN1973'd;
    run;

(5) array statement

    data mylib.employees_new;
       array birthdates{1001:1004} _temporary_
          ('01JAN1963'd '08AUG1946'd '23MAR1950'd '17JUN1973'd);
       set mylib.employees;
       Birthdate=birthdates(IDnum);
    run;

(6) format procedure

    /* numeric format, because IDnum is numeric; labels are date9. strings */
    proc format;
       value birthdate 1001='01JAN1963'
                       1002='08AUG1946'
                       1003='23MAR1950'
                       1004='17JUN1973';
    run;

    data mylib.employees_new;
       set mylib.employees;
       Birthdate=input(put(IDnum,birthdate.),date9.);
    run;

(7) match-merge

    proc sort data=sasuser.expenses out=expenses;
       by flightid date;
    run;

    proc sort data=sasuser.revenue out=revenue;
       by flightid date;
    run;

    data revexpns (drop=rev1st revbusiness revecon expenses);
       merge expenses(in=e) revenue(in=r);
       by flightid date;
       if e and r;
       Profit=sum(rev1st, revbusiness, revecon, -expenses);
    run;

    data sasuser.alldata;
       merge revexpns(in=r)
             acities(in=a rename=(code=dest) keep=city name code);
       by dest;
       if r and a;
    run;

(8) sql

    proc sql;
       create table sqljoin as
          select revenue.flightid, revenue.date format=date9.,
                 revenue.origin, revenue.dest,
                 sum(revenue.rev1st, revenue.revbusiness, revenue.revecon)
                    -expenses.expenses as Profit,
                 acities.city, acities.name
             from sasuser.expenses, sasuser.revenue, sasuser.acities
             where expenses.flightid=revenue.flightid
                   and expenses.date=revenue.date
                   and acities.code=revenue.dest
             order by revenue.dest, revenue.flightid, revenue.date;
    quit;

(9) many-to-many match

    proc sql;
       create table flightemp as
          select flightschedule.*, firstname, lastname
             from sasuser.flightschedule, sasuser.flightattendants
             where flightschedule.empid=flightattendants.empid;
    quit;

    data fightemps3(drop=empnum jobcode);
       set sasuser.flightschedule;
       do i=1 to num;
          set sasuser.flightattendants(rename=(empid=empnum)) nobs=num point=i;
          if empid=empnum then output;
       end;
    run;

(10) summary data and detail data

    proc means data=sasuser.monthsum noprint;
       var revcargo;
       output out=sasuser.summary sum=Cargosum;
    run;

    data sasuser.percent1;
       if _n_=1 then set sasuser.summary(keep=cargosum);
       set sasuser.monthsum(keep=salemon revcargo);
       PctRev=revcargo/cargosum;
    run;

    data sasuser.percent2(drop=totalrev);
       if _n_=1 then do until(lastobs);
          set sasuser.monthsum(keep=revcargo) end=lastobs;
          totalrev+revcargo;
       end;
       set sasuser.monthsum(keep=salemon revcargo);
       PctRev=revcargo/totalrev;
    run;

(11) index

    data work.profit work.errors;
       set sasuser.dnunder;
       set sasuser.sale200(keep=routeid flightid date rev1st revbusiness
                                revecon revcargo) key=flightdate;
       if _iorc_=0 then do;
          Profit=sum(rev1st, revbusiness, revecon, revcargo, -expenses);
          output work.profit;
       end;
       else do;
          _error_=0;
          output work.errors;
       end;
    run;

(12) multidimensional array

    data work.wndchill(drop=column row);
       array WC{4,2} _temporary_ (-22, -16, -28, -22, -32, -26, -35, -29);
       set sasuser.flights;
       row=round(wspeed,5)/5;
       column=(round(temp,5)/5)+3;
       WindChill=wc{row,column};
    run;

(13) stored array values

    data work.lookup1;
       array Targets{1997:1999,12} _temporary_;
       if _n_=1 then do i=1 to 3;
          set sasuser.ctargets;
          array Mnth{*} Jan--Dec;
          do j=1 to dim(mnth);
             targets{year,j}=mnth{j};
          end;
       end;
       set sasuser.monthsum(keep=salemon revcargo monthno);
       year=input(substr(salemon,4),4.);
       Ctarget=targets{year,monthno};
       format ctarget dollar15.2;
    run;

(14) transpose and merge

    proc transpose data=sasuser.ctargets out=work.ctarget2
                   name=Month prefix=Ctarget;
       by year;
    run;

    proc sort data=work.ctarget2;
       by year month;
    run;

    data work.mnthsum2;
       set sasuser.monthsum(keep=SaleMon RevCargo);
       length Month $ 8;
       Year=input(substr(SaleMon,4),4.);
       Month=substr(SaleMon,1,1)||lowcase(substr(SaleMon,2,2));
    run;

    proc sort data=work.mnthsum2;
       by year month;
    run;

    data work.merged;
       merge work.mnthsum2 work.ctarget2;
       by year month;
    run;
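The logic of step (4) under "create sample" (random sample without replacement) may be easier to follow outside the DATA step. Below is a minimal Python sketch of the same sequential selection idea; the function name `sample_without_replacement` and the `seed` argument are assumptions made for this illustration, while `sampsize` and `obsleft` reuse the variable names from the SAS code. Each record is taken with probability (records still needed) / (records still unread), so a full pass yields exactly `sampsize` distinct records whenever at least that many are available.

```python
import random


def sample_without_replacement(records, sampsize, seed=None):
    """Pick `sampsize` distinct records in one sequential pass."""
    rng = random.Random(seed)
    obsleft = len(records)  # records not yet examined
    chosen = []
    for record in records:
        if sampsize == 0:
            break
        # Same test as `if ranuni(0)<sampsize/obsleft` in the SAS step.
        if rng.random() < sampsize / obsleft:
            chosen.append(record)
            sampsize -= 1
        obsleft -= 1
    return chosen


# 142 stands in for the number of observations in sasuser.revenue.
picked = sample_without_replacement(list(range(1, 143)), 10, seed=42)
print(picked)
```

When the number still needed equals the number still unread, the selection probability is 1, so the remaining records are all taken and the sample cannot come up short.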
26 views | 3 comments
Shared post: Do you know how many types of data scientist there are?
slimdell 2015-3-9 13:23
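Source: SAS中文论坛 (SAS Chinese Forum)

Statistics has many kinds of statisticians depending on the field (biology, marketing, product, finance and so on): biostatisticians, economists, operations research specialists, actuaries, business analysts and others. Data science likewise has different categories of data scientist. To begin with, their job titles differ; mine, for example, is co-founder of a data company. (Translator's note: the company is Data Science Central, whose main business is providing online big data services to industry, covering the whole pipeline from data analysis through integration to visualization.)

We divide data scientists into the following nine categories:

(1) Those strong in statistics. Their main work is developing new statistical theory for big data. They focus on statistical modeling beyond traditional statistics, experimental design, sampling, clustering, data reduction, confidence intervals, testing, modeling, predictive modeling and related techniques.
(2) Those strong in mathematics, for example big data experts at the NSA (National Security Agency) and in the defense/military industry, astronomers and operations researchers. They focus on analyzing and optimizing business operations, such as inventory management and forecasting, pricing optimization, supply chain, quality control and yield optimization. Their main work is to collect and analyze data and extract the key value from it.
(3) Those strong in data engineering: Hadoop, database/memory/file system optimization and architecture, APIs, analytics as a service, data flow optimization, data exploration.
(4) Those strong in machine learning and computer science (algorithms, computational complexity).
(5) Those strong in business: ROI optimization, decision science. These areas are in fact part of the traditional work done by business analysts in large companies (for example data report design, choosing the mix of metrics and defining them, ROI optimization, high-level database design).
(6) Those strong in production code development and software engineering (they know many programming languages).
(7) Those strong in data visualization.
(8) Those strong in GIS, spatial data, graph data modeling and graph databases.
(9) Those strong in several of the above. After 20 years of cross-domain work at companies large and small, some data scientists are proficient in statistics, understand machine learning, business and mathematics, and are also good at visualization and data engineering. With time and accumulated experience, you can become this kind of generalist too. I mention this because many people still think that developing expertise across several domains is impossible, or hold the traditional view that these skills belong in separate silos. In fact, being this kind of generalist is the very definition of data science.

The great majority of the people described above are very familiar with big data, and many could be called big data experts.

We also have several other ways of classifying data scientists; see our earlier article "Taxonomy of data scientists". One of them is based on whether the work is innovative. Innovative work has better prospects, while work that lacks innovation is easily outsourced. Anything already published in a textbook or widely available online can be automated or outsourced; our job security lies in being irreplaceable, which depends on how much you know that others do not, or on what you can learn more easily than others. Along these lines, we can distinguish applied scientists (practitioners who use science and usually do not hold a PhD), creative scientists (researchers who create new science) and hybrids of the two. Most data scientists are applied scientists, much like geologists who predict earthquakes or the chemists and scientists who design new drug molecules for pharmaceutical companies.

Implications for IT professionals

If you are an engineer or a business analyst, your work very likely already includes some data science, and moving into a data scientist role may not be as hard as it seems. If your concern is whether this fashionable profession threatens your current job, read the reference links in the original article, which offer practical advice for broadening your career prospects.

http://www.datasciencecentral.com/profiles/blogs/six-categories-of-data-scientists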
Category: Big Data | 0 comments
Shared post: Micro & Macro Data
aliehs 2013-8-5 22:43
What is macro data?

The terms micro and macro data are often used to denote data used in social science research. The distinction between them is, however, not always obvious.

Micro data

Micro data can generally be described as individual level data. These data have often been collected from each individual through a survey or interview. In such a dataset, each row typically represents an individual person and each column an attribute such as age, gender or job-type. Some well-known surveys that collect this type of data include the European Social Survey (ESS), the General Social Survey (GSS), the World Values Survey (WVS) etc. 'Micro data' would also denote data on individuals collected from governmental administrative systems and registers.

While the main distinction is most often drawn between micro and macro data, the term 'meso data' is also sometimes used. Meso data generally refers to data on collective and cooperative actors such as commercial companies, organizations or political parties.

Macro data

'Macro data' is generally a term used to describe mainly two subtypes of data:

- aggregated data
- system-level data

Aggregated macro data provide information constructed by combining information on the lower level units, which the higher level unit is composed of (Diez-Roux 2002). Examples of aggregate data include summaries of the properties of individuals, unemployment statistics, demographics, GDP etc. Most often, aggregated macro data imply that the variables are summaries of the properties of lower level units and not measures of inherent higher level properties.

System level macro data yield information about properties of the state or the political system and cannot be disaggregated to lower level units. This type of data forms political indicators, such as institutional variables and regime indices, and is not based on summaries of the properties of lower-level units, but measures characteristics of the higher-level units themselves.

The MacroDataGuide provides links and qualitative information on a wide range of both aggregated and system-level macro data sources.

References

Diez-Roux, Ana V. 2002. "A glossary for multilevel analysis". Journal of Epidemiology and Community Health 56 (August): 588-594.

BTW a good glossary of social science data terms: http://3stages.org/glossary/glossary.html#micro
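To make the step from micro data to aggregated macro data concrete, here is a minimal Python sketch. The four survey rows, the field names (`country`, `age`, `employed`) and the function `aggregate` are hypothetical and exist only for this illustration; they do not come from any of the surveys named above. Each output row summarises the properties of the individuals in one country, which is what distinguishes aggregated data from system-level data.

```python
from collections import defaultdict

# Hypothetical micro data: one row per surveyed individual.
micro = [
    {"country": "A", "age": 30, "employed": True},
    {"country": "A", "age": 50, "employed": False},
    {"country": "B", "age": 40, "employed": True},
    {"country": "B", "age": 20, "employed": True},
]


def aggregate(rows):
    """Summarise individual-level rows into one macro-level row per country."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["country"]].append(row)

    macro = {}
    for country, members in groups.items():
        n = len(members)
        macro[country] = {
            "n": n,
            "mean_age": sum(m["age"] for m in members) / n,
            # Share of respondents in this country who are not employed.
            "unemployment_rate": sum(not m["employed"] for m in members) / n,
        }
    return macro


macro = aggregate(micro)
for country, summary in sorted(macro.items()):
    print(country, summary)
```

A system-level variable, such as a regime index, would instead be attached to the country directly and could not be rebuilt from rows like these.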
Category: Data Analysis | 13 views | 0 comments


GMT+8, 2024-4-17 07:50