Intro to data visualization
-
- *******************************************************************************
- *******************************************************************************
- ***** *****
- ***** *****
- ***** Intro to data visualization *****
- ***** Oscar Torres-Reyna *****
- ***** DSS Princeton University *****
- ***** *****
- ***** *****
- ***** *****
- *******************************************************************************
- *******************************************************************************
-
- * NOTE: commands should be typed either in the Command window (see page 5):
- * http://dss.princeton.edu/training/StataTutorial.pdf#page=5
- * or in a do-file (see page 9):
- * http://dss.princeton.edu/training/StataTutorial.pdf#page=9
- * Stata has a color-coded system (see page 13):
- * http://dss.princeton.edu/training/StataTutorial.pdf#page=13
- ******* Setting working directory
- *NOTE: If using Mac go to File->Change Working Directory, and select the folder
- *cd "H:"
- ******* Creating a log file.
- capture log close /* Close any open log; no error if none is open */
- qui log using mylog.log, replace
- use "http://www.princeton.edu/~otorres/wdipol.dta", clear
- * See the data
- browse
- ******* Getting to know your data
- describe
- summarize
- ******* Line graphs
- set scheme s1color
- line unemp unempf unempm year if country=="United States"
- summarize unemp unempf unempm
- replace unemp=. if unemp==0
- replace unempf=. if unempf==0
- replace unempm=. if unempm==0
- summarize unemp unempf unempm
- line unemp unempf unempm year if country=="United States"
- twoway line unemp unempf unempm year if country=="United States", ///
- title("Unemployment rate in the US, 1980-2012") ///
- legend(label(1 "Total") label(2 "Females") label(3 "Males")) ///
- lpattern(solid dash dot) ///
- ytitle("Percentage")
- twoway connected unemp unempf unempm year if country=="United States", ///
- title("Unemployment rate in the US, 1980-2012") ///
- legend(label(1 "Total") label(2 "Females") label(3 "Males")) ///
- msymbol(circle diamond square) ///
- ytitle("Percentage")
- twoway connected unemp year if country=="United States" | ///
- country=="United Kingdom" | ///
- country=="Australia" | ///
- country=="Qatar", ///
- by(country, title("Unemployment")) ///
- msymbol(circle_hollow)
- twoway (connected unemp year if country=="United States", msymbol(dh)) ///
- (connected unemp year if country=="United Kingdom", msymbol(th)) ///
- (connected unemp year if country=="Australia", msymbol(sh)) ///
- (connected unemp year if country=="Qatar", ///
- title("Unemployment") ///
- msymbol(oh) ///
- legend(label(1 "USA") label(2 "UK") label(3 "Australia") label(4 "Qatar")))
- twoway connected gdppc year if gdppc>40000, by(country) msymbol(diamond)
- bysort year: egen gdppc_mean=mean(gdppc)
- bysort year: egen gdppc_median=median(gdppc)
- twoway connected gdppc gdppc_mean year if country=="United States" | ///
- country=="United Kingdom" | ///
- country=="Australia" | ///
- country=="Qatar", ///
- by(country, title("GDP pc (PPP, 2005=100)")) ///
- legend(label(1 "GDP-PC") label(2 "Mean GDP-PC")) ///
- msymbol(circle_hollow)
- help twoway line
- help twoway connected
- ******* Graph markers
- palette symbolpalette
- palette linepalette
- palette color green
- /*
- ssc install showmarkers
- showmarkers , over(msymbol)
- showmarkers , over(mcolor)
- showmarkers , over(mlpattern)
- */
- ******* Bar graphs
- graph hbar (mean) gdppc /*Mean is the default*/
- graph hbar (mean) gdppc, over(country, sort(1) descending)
- graph hbar (mean) gdppc, over(country, sort(1) descending label(labsize(*0.5)))
- graph hbar (mean) gdppc (median) gdppc if gdppc>40000, ///
- over(country, sort(1) descending label(labsize(*1))) ///
- legend(label(1 "GDPpc (mean)") label(2 "GDPpc (median)"))
- help graph bar
- ******* Box plots
- * Need to recode polity2
- recode polity2 (-10/-6=1 "Autocracy") ///
- (-5/6=2 "Anocracy") ///
- (7/10=3 "Democracy") ///
- (else=.), ///
- gen(regime) label(polity_rec)
- tab regime /* Frequency */
- tab regime, nolabel /* See numeric values*/
- tab country regime /* Cross tabulations */
- tab country regime, row /* Adding percent per row */
- help tab
- graph hbox gdppc
- graph hbox gdppc if gdppc<40000
- graph box gdppc, over(regime) yline(4517.94) marker(1,mlabel(country))
- help graph box
- ******* Scatterplots
- * scatter y x
- scatter import export
- #d;
- twoway scatter import export || scatter import export if export>1000000,
- mlabel(country);
- #d cr
- twoway (scatter import export, ytitle("Imports") xtitle("Exports")) ///
- (scatter import export if export>1000000, mlabel(country) legend(off)) ///
- (lfit import export, note("Constant values, 2005, millions US$"))
- *bysort year: egen gdppc_mean=mean(gdppc)
- twoway (scatter gdppc year, jitter(13)) ///
- (connected gdppc_mean year, msymbol(diamond)) , xlabel(1980(1)2012, angle(90))
- help twoway scatter
- ******* Scatterplot matrix
- gr matrix gdppc unemp unempf unempm export import trade polity2, ///
- maxis(ylabel(none) xla(none))
- gr matrix gdppc unemp unempf unempm export import trade polity2, ///
- half maxis(ylabel(none) xla(none))
- help graph matrix
- ******* Histograms
- hist gdppc
- /* Shows density*/
- hist gdppc, frequency
- /*Shows frequency*/
- hist gdppc, kdensity
- /* Combo histogram and density plot */
- hist gdppc, kdensity normal
- /* Adding a normal curve */
- hist gdppc, kdensity normal bin(20)
- hist gdppc if country=="United States" | country=="United Kingdom", bin(10) ///
- by(country)
- twoway hist gdppc if country=="United States", bin(10) || ///
- hist gdppc if country=="United Kingdom", bin(10) ///
- fcolor(none) lcolor(black) legend(label(1 "USA") label(2 "UK"))
- help hist
- ******* Setup panel data
- * See http://dss.princeton.edu/training/Panel101.pdf
- *xtset country year
- /*Gives an error, 'country' is string*/
- encode country, gen(country1)
- /*Assign numeric value to strings*/
- xtset country1 year
- /*No error, 'country1' is coded variable*/
- xtline gdppc
- xtline gdppc if gdppc>39000, overlay
- help xtline
- ******* Combining graphs
- graph drop _all /*Drop graphics saved in memory*/
- hist gdppc if country=="United States", name(gdppc, replace)
- line unemp year if country=="United States", name(unemp, replace)
- graph combine gdppc unemp, col(1)
- help graph combine
- ******* Scatterplots with linear fit and confidence intervals
- use "http://dss.princeton.edu/training/students.dta", clear
- twoway (lfitci sat age) ///
- (scatter sat age, mlabel(lastname)), ///
- title("SAT scores by age") ytitle("Sat")
- * Changing position
- generate position=3
- replace position=6 if lastname=="DOE01"
- replace position=6 if lastname=="DOE10"
- replace position=12 if lastname=="DOE14"
- replace position=12 if lastname=="DOE29"
- #d;
- twoway (lfitci sat age)
- (scatter sat age, mlabel(lastname)mlabv(position)
- jitter(21)), title("SAT scores by age") ytitle("Sat");
- #d cr
- * Without confidence intervals
- #d;
- twoway (lfit sat age)
- (scatter sat age, mlabel(lastname)mlabv(position)
- jitter(21)),title("SAT scores by age") ytitle("Sat");
- #d cr
- help twoway lfit
- help twoway lfitci
- ******* Plotting categorical variables
- ******* Mosaic plots (a.k.a spineplots)
- * May need to install it, type:
- ssc install spineplot
- use "http://dss.princeton.edu/training/students.dta", clear
- encode gender, gen(gender1)
- /* Assign numeric values to categories in string format*/
- encode major, gen(major1)
- spineplot gender1 major1
- bysort gender1 major1: gen gendermajor = _N
- spineplot gender1 major1, text(gendermajor)
- spineplot gender1 major1, percent bar1(bcolor(yellow)) ///
- bar2(bcolor(green)) text(gendermajor)
- * See the graphs here:
- * http://www.princeton.edu/~otorres/mosaic1.pdf
- * http://www.princeton.edu/~otorres/mosaic2.pdf
- ******* Using catplot
- * Chernoff faces
- * Best with few cases; each face represents one row (case).
- use "http://www.princeton.edu/~otorres/chernoff.dta", clear
- net install gr0038, from(http://www.stata-journal.com/software/sj9-3)
- /*User-written command, need to install*/
- chernoff, hdark(gdppc) bdens(trade) nose(unemp) mcurv(polity2) ///
- order(gdppc) ilabel(country)
- * See the graph here:
- * http://www.princeton.edu/~otorres/chernoff.pdf
- * Do not forget to close the log
- log close