File name: Intro to data visualization.do
Download link: https://bbs.pinggu.org/a-1905631.html

Intro to data visualization


*******************************************************************************
*******************************************************************************
*****                                                                     *****
*****                                                                     *****
*****                     Intro to data visualization                     *****
*****                         Oscar Torres-Reyna                          *****
*****                      DSS Princeton University                       *****
*****                                                                     *****
*****                                                                     *****
*****                                                                     *****
*******************************************************************************
*******************************************************************************

* NOTE: commands should be typed either in the Command window (see page 5):
* http://dss.princeton.edu/training/StataTutorial.pdf#page=5
* or in a do-file (see page 9):
* http://dss.princeton.edu/training/StataTutorial.pdf#page=9

* Stata has a color-coded system (see page 13):
* http://dss.princeton.edu/training/StataTutorial.pdf#page=13

******* Setting working directory
* NOTE: If using a Mac, go to File->Change Working Directory and select the folder

*cd "H:"

******* Creating a log file
* Close any log that is already open; capture keeps the do-file running if none is open
capture log close
qui log using mylog.log, replace
use "http://www.princeton.edu/~otorres/wdipol.dta", clear

* See the data
browse

******* Getting to know your data
describe
summarize
******* Line graphs
set scheme s1color
line unemp unempf unempm year if country=="United States"

* Recode zeros to missing, then check the summary statistics again
summarize unemp unempf unempm
replace unemp=. if unemp==0
replace unempf=. if unempf==0
replace unempm=. if unempm==0
summarize unemp unempf unempm

line unemp unempf unempm year if country=="United States"

twoway line unemp unempf unempm year if country=="United States", ///
    title("Unemployment rate in the US, 1980-2012") ///
    legend(label(1 "Total") label(2 "Females") label(3 "Males")) ///
    lpattern(solid dash dot) ///
    ytitle("Percentage")

twoway connected unemp unempf unempm year if country=="United States", ///
    title("Unemployment rate in the US, 1980-2012") ///
    legend(label(1 "Total") label(2 "Females") label(3 "Males")) ///
    msymbol(circle diamond square) ///
    ytitle("Percentage")

twoway connected unemp year if country=="United States" | ///
    country=="United Kingdom" | ///
    country=="Australia" | ///
    country=="Qatar", ///
    by(country, title("Unemployment")) ///
    msymbol(circle_hollow)

* One plot per country, each with its own small hollow marker
twoway (connected unemp year if country=="United States", msymbol(dh)) ///
    (connected unemp year if country=="United Kingdom", msymbol(th)) ///
    (connected unemp year if country=="Australia", msymbol(sh)) ///
    (connected unemp year if country=="Qatar", ///
    title("Unemployment") ///
    msymbol(oh) ///
    legend(label(1 "USA") label(2 "UK") label(3 "Australia") label(4 "Qatar")))

twoway connected gdppc year if gdppc>40000, by(country) msymbol(diamond)

* Yearly mean and median of GDP per capita across countries
bysort year: egen gdppc_mean=mean(gdppc)
bysort year: egen gdppc_median=median(gdppc)
twoway connected gdppc gdppc_mean year if country=="United States" | ///
    country=="United Kingdom" | ///
    country=="Australia" | ///
    country=="Qatar", ///
    by(country, title("GDP pc (PPP, 2005=100)")) ///
    legend(label(1 "GDP-PC") label(2 "Mean GDP-PC")) ///
    msymbol(circle_hollow)

help twoway line
help twoway connected
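
* A minimal sketch, not part of the original file: gdppc_median is created above
* but never plotted. Assuming gdppc_mean and gdppc_median already exist, the
* median can be overlaid for a single country. The marker choices and the third
* legend label are illustrative only.
twoway connected gdppc gdppc_mean gdppc_median year if country=="United States", ///
    title("GDP pc (PPP, 2005=100)") ///
    legend(label(1 "GDP-PC") label(2 "Mean GDP-PC") label(3 "Median GDP-PC")) ///
    msymbol(circle_hollow diamond_hollow square_hollow)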

******* Graph markers

palette symbolpalette

palette linepalette

palette color green

/*
ssc install showmarkers

showmarkers , over(msymbol)

showmarkers , over(mcolor)

showmarkers , over(mlpattern)
*/

******* Bar graphs

graph hbar (mean) gdppc /* Mean is the default */

graph hbar (mean) gdppc, over(country, sort(1) descending)

graph hbar (mean) gdppc, over(country, sort(1) descending label(labsize(*0.5)))

graph hbar (mean) gdppc (median) gdppc if gdppc>40000, ///
    over(country, sort(1) descending label(labsize(*1))) ///
    legend(label(1 "GDPpc (mean)") label(2 "GDPpc (median)"))

help graph bar

******* Box plots
* Need to recode polity2

recode polity2 (-10/-6=1 "Autocracy") ///
    (-5/6=2 "Anocracy") ///
    (7/10=3 "Democracy") ///
    (else=.), ///
    gen(regime) label(polity_rec)

tab regime /* Frequency */
tab regime, nolabel /* See numeric values */
tab country regime /* Cross tabulations */
tab country regime, row /* Adding percent per row */

help tab

graph hbox gdppc

graph hbox gdppc if gdppc<40000

graph box gdppc, over(regime) yline(4517.94) marker(1,mlabel(country))

help graph box
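
* A minimal sketch, not part of the original file: the yline() value above is
* hard-coded and the file does not say how it was obtained. Assuming a reference
* line at the overall median of gdppc is wanted, it can be computed instead of
* typed in. Run these three lines together, because the local macro med only
* exists while the do-file (or the selected lines) is running.
quietly summarize gdppc, detail
local med = r(p50)
graph box gdppc, over(regime) yline(`med') marker(1, mlabel(country))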

******* Scatterplots

* scatter y x

scatter import export

#d;
twoway scatter import export || scatter import export if export>1000000,
    mlabel(country);
#d cr

twoway (scatter import export, ytitle("Imports") xtitle("Exports")) ///
    (scatter import export if export>1000000, mlabel(country) legend(off)) ///
    (lfit import export, note("Constant values, 2005, millions US$"))

* gdppc_mean was created in the line graphs section; uncomment if it does not exist
*bysort year: egen gdppc_mean=mean(gdppc)

twoway (scatter gdppc year, jitter(13)) ///
    (connected gdppc_mean year, msymbol(diamond)) , xlabel(1980(1)2012, angle(90))

help twoway scatter

******* Scatterplot matrix

gr matrix gdppc unemp unempf unempm export import trade polity2, ///
    maxis(ylabel(none) xla(none))

gr matrix gdppc unemp unempf unempm export import trade polity2, ///
    half maxis(ylabel(none) xla(none))

help graph matrix

******* Histograms

hist gdppc /* Shows density */

hist gdppc, frequency /* Shows frequency */

hist gdppc, kdensity /* Combo histogram and density plot */

hist gdppc, kdensity normal /* Adding a normal curve */

hist gdppc, kdensity normal bin(20)

hist gdppc if country=="United States" | country=="United Kingdom", bin(10) ///
    by(country)

twoway hist gdppc if country=="United States", bin(10) || ///
    hist gdppc if country=="United Kingdom", bin(10) ///
    fcolor(none) lcolor(black) legend(label(1 "USA") label(2 "UK"))

help hist

******* Setup panel data

* See http://dss.princeton.edu/training/Panel101.pdf

*xtset country year /* Gives an error, 'country' is string */

encode country, gen(country1) /* Assign numeric values to strings */

xtset country1 year /* No error, 'country1' is a coded variable */

xtline gdppc

xtline gdppc if gdppc>39000, overlay

help xtline

******* Combining graphs

graph drop _all /* Drop graphs saved in memory */

hist gdppc if country=="United States", name(gdppc, replace)

line unemp year if country=="United States", name(unemp, replace)

graph combine gdppc unemp, col(1)

help graph combine
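
* A minimal sketch, not part of the original file: saving the combined graph to
* disk. The graph name "combined" and the file name "combined.png" are
* placeholders chosen for illustration, and PNG export is assumed to be
* available on your platform.
graph combine gdppc unemp, col(1) name(combined, replace)
graph export "combined.png", name(combined) replace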

******* Scatterplots with linear fit and confidence intervals
use "http://dss.princeton.edu/training/students.dta", clear

twoway (lfitci sat age) ///
    (scatter sat age, mlabel(lastname)), ///
    title("SAT scores by age") ytitle("Sat")

* Changing the position of marker labels (clock positions: 3, 6, 12)
generate position=3
replace position=6 if lastname=="DOE01"
replace position=6 if lastname=="DOE10"
replace position=12 if lastname=="DOE14"
replace position=12 if lastname=="DOE29"

#d;
twoway (lfitci sat age)
    (scatter sat age, mlabel(lastname) mlabv(position)
    jitter(21)), title("SAT scores by age") ytitle("Sat");
#d cr

* Without confidence intervals
#d;
twoway (lfit sat age)
    (scatter sat age, mlabel(lastname) mlabv(position)
    jitter(21)), title("SAT scores by age") ytitle("Sat");
#d cr

help twoway lfit
help twoway lfitci

******* Plotting categorical variables

******* Mosaic plots (a.k.a. spineplots)

* May need to install it, type:

ssc install spineplot

use "http://dss.princeton.edu/training/students.dta", clear

encode gender, gen(gender1) /* Assign numeric values to categories in string format */

encode major, gen(major1)

spineplot gender1 major1

* Number of students in each gender-major cell, used as the text label
bysort gender1 major1: gen gendermajor = _N

spineplot gender1 major1, text(gendermajor)

spineplot gender1 major1, percent bar1(bcolor(yellow)) ///
    bar2(bcolor(green)) text(gendermajor)

* See the graphs here:
* http://www.princeton.edu/~otorres/mosaic1.pdf
* http://www.princeton.edu/~otorres/mosaic2.pdf

******* Using catplot

* Chernoff faces
* For few cases; each face is one row (case).

use "http://www.princeton.edu/~otorres/chernoff.dta", clear

* User-written command, needs to be installed first
net install gr0038, from(http://www.stata-journal.com/software/sj9-3)

chernoff, hdark(gdppc) bdens(trade) nose(unemp) mcurv(polity2) ///
    order(gdppc) ilabel(country)

* See the graph here:
* http://www.princeton.edu/~otorres/chernoff.pdf

* Do not forget to close the log
log close

/* Download the lecture notes */

copy "http://dss.princeton.edu/training/Visual101.pdf" "Visual101.pdf"