Original poster: Mrr4o9o9ki8e3

[Learning & Sharing] Case study: using SAS to perform fundamental descriptive statistics (Original EN)


Posted by Mrr4o9o9ki8e3 on 2019-8-4 08:41:02

Data: givingindex2015_ver2.sas7bdat.zip (4.41 KB, requires 5 forum coins). The attachment contains:
  • givingindex2015_ver2.sas7bdat

Background:

The World Giving Index is published every year by the Charities Aid Foundation (CAF) www.cafonline.org. The index aims to provide insight into the scale and nature of giving around the world. The World Giving Index 2015 presents giving data from 145 countries from across the globe, based on questionnaires completed by a representative sample of individuals living across each country. Index scores have been calculated from responses to the following questions:

Have you done any of the following in the past month?

Donated money to a charity?

Helped a stranger, or someone you didn’t know who needed help?

Volunteered your time to an organization?


Variable       Description
Country_name   Name of country
Country_Code   Abbreviation for country name
Region         Africa, Asia, Europe, Middle East, North & Central America, Oceania, South America
Donating       Donating money score for each country, percent of population
Helping        Helping a stranger score for each country, percent of population
Volunteering   Volunteering score for each country, percent of population
Economy        Three levels of economic development



/**************************************************************************************************************************************/



Study the distributions of donating, helping and volunteering scores.


The values in the Donating, Helping and Volunteering columns are calculated scores (percentage of population), so all three are numerical variables.
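Before running any statistics, it can help to confirm that the three score columns really are stored as numeric. The sketch below assumes the attachment has been unzipped into a folder of your own; the path is a hypothetical placeholder, and `mydata` is simply the libref that the later code in this post refers to.

```sas
/* Hypothetical folder: replace with the location of the unzipped
   givingindex2015_ver2.sas7bdat file. */
libname mydata "/home/youruser/data";

/* List variable names, types and lengths in column order to confirm
   that Donating, Helping and Volunteering are numeric. */
proc contents data=mydata.givingindex2015_ver2 varnum;
run;
```

In the PROC CONTENTS output, the Type column should show Num for the three score variables if the reading above is right.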


Location measures

First, the location measures mean, median and mode are used to observe the central tendency of the data.

Variable       N     N Miss   Mode     Mean     Median
Helping        145   0        50.000   49.228   50.000
Donating       145   0        10.000   31.510   28.000
Volunteering   145   0         9.000   21.069   20.000

Code using the MEANS procedure:

    proc means data=mydata.givingindex2015_ver2
       n nmiss mode mean median maxdec=3;
       var helping donating volunteering;
    run;

From the table of central tendency measures, two patterns stand out for the variable 'Helping'.

  • Its mode, mean and median are almost identical (50, 49.228 and 50). In a perfectly symmetric unimodal distribution such as the normal distribution, mode = mean = median, so this suggests that the 'Helping' scores of the 145 countries are roughly symmetric: most countries score close to 50, with similarly small numbers of countries at the very low and very high ends. Whether there are outliers still has to be checked with the dispersion measures.

  • It has the highest value on all three measures compared with the other two variables, 'Donating' and 'Volunteering'.


As a result of this observation, the mean can be used to represent the centre of 'Helping'.

For 'Donating' and 'Volunteering', the median is lower than the mean, i.e. the mean lies to the right of the median, which indicates that both distributions are right-skewed.

  • 'Donating' has a median of 28, which means that half of the countries scored 28 or lower and the other half scored 28 or higher. Its mean across the 145 countries is 31.51, higher than the median. The mean has been pulled upwards by some high values (possibly outliers) in the upper part of the range, so it is less representative of a typical country.

  • 'Volunteering' shows the same tendency, but the difference between the mean (21.069) and the median (20) is small, so it is still right-skewed but less so than 'Donating'. Whether there are outliers needs to be examined through the analysis of the spread of the distribution.


As a result of this observation, the median can be used to represent the centre of 'Donating' and 'Volunteering'.
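The mean-versus-median reading above can also be checked visually. The following is a minimal sketch, not part of the original analysis, that draws a histogram for each score with a fitted normal curve overlaid; it assumes the `mydata` libref is already assigned.

```sas
/* NOPRINT suppresses the statistics tables so that only the
   histograms are produced. */
proc univariate data=mydata.givingindex2015_ver2 noprint;
   var helping donating volunteering;
   /* Overlay a normal curve fitted to each variable. */
   histogram helping donating volunteering / normal;
   /* Print mean, median and skewness in the corner of each plot. */
   inset mean median skewness / position=ne;
run;
```

If the interpretation holds, the 'Helping' bars should follow the curve fairly closely, while 'Donating' should show a longer tail on the right.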



Dispersion measures

Next, to see how widely the scores are spread, the variance, standard deviation, range, quartiles and interquartile range are used to observe the dispersion of the data.

Variable       N     Variance   Std Dev   Minimum   Lower Quartile   Upper Quartile   Maximum   Quartile Range   Range
Helping        145   193.149    13.898    16.000    38.000           59.000           79.000    21.000           63.000
Donating       145   379.210    19.473     3.000    16.000           44.000           92.000    28.000           89.000
Volunteering   145   130.315    11.416     3.000    11.000           29.000           50.000    18.000           47.000

Code using the MEANS procedure:

    proc means data=mydata.givingindex2015_ver2
       n var std min q1 q3 max qrange range maxdec=3;
       var helping donating volunteering;
    run;

Variance and standard deviation: 'Donating' has a higher variance (379.21) and standard deviation (19.473) than the other two variables, which indicates that its scores fluctuate more from country to country. 'Donating' scores also cover a much wider range (89, compared with 63 for 'Helping' and 47 for 'Volunteering'), which is consistent with the larger variation.

For 'Donating', Q3 + 1.5 x IQR = 44 + 1.5 x 28 = 86. The upper whisker of a box plot cannot extend beyond this limit, and any value larger than 86 is an outlier. From the table above, the maximum is 92, so there is at least one outlier at the upper end. Q1 - 1.5 x IQR = 16 - 42 = -26, and the minimum of 3 lies above this lower limit, so there are no outliers at the lower end of the data.

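For 'Volunteering', Q3 + 1.5 x IQR = 29 + 1.5 x 18 = 56. The maximum of 50 is within this upper limit, so there are no outliers at the upper end. Q1 - 1.5 x IQR = 11 - 27 = -16, and the minimum of 3 lies above this lower limit, so there are no outliers at the lower end either.

For 'Helping', the same rule gives limits of 59 + 1.5 x 21 = 90.5 and 38 - 1.5 x 21 = 6.5. The maximum of 79 and the minimum of 16 both lie within these limits, so 'Helping' has no outliers.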
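The 1.5 x IQR limits worked out by hand above can also be computed in SAS, and the countries outside them listed. This is a sketch for 'Donating' only; the dataset name `work.fences` and the variable names `lower_fence` and `upper_fence` are made up for illustration.

```sas
/* Write Q1, Q3 and the interquartile range of Donating
   to a one-row dataset instead of printing them. */
proc means data=mydata.givingindex2015_ver2 noprint;
   var donating;
   output out=work.fences q1=q1 q3=q3 qrange=iqr;
run;

/* Add the lower and upper 1.5 x IQR limits. */
data work.fences;
   set work.fences;
   lower_fence = q1 - 1.5*iqr;
   upper_fence = q3 + 1.5*iqr;
run;

/* List every country whose Donating score lies outside the limits. */
proc sql;
   select g.country_name, g.donating
   from mydata.givingindex2015_ver2 as g, work.fences as f
   where g.donating < f.lower_fence
      or g.donating > f.upper_fence;
quit;
```

Swapping `donating` for `helping` or `volunteering` repeats the check for the other two scores; given the tables above, those two runs would be expected to return no rows.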


Skewness and Kurtosis

Next, skewness and kurtosis are reported to assess how asymmetric and how peaked the distributions are.

Variable       Skewness   Kurtosis
Helping        0.106      -0.686
Donating       0.807       0.050
Volunteering   0.528      -0.622

Code using the MEANS procedure:

    proc means data=MYDATA.GIVINGINDEX2015_VER2
       skewness kurtosis maxdec=3;
       var Helping Donating Volunteering;
    run;



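'Helping' has the smallest skewness (0.106) and 'Donating' the largest (0.807), with 'Volunteering' (0.528) in between. All three values are positive and below 1, so none of the distributions is strongly skewed: 'Helping' is close to symmetric, while 'Donating' and 'Volunteering' lean to the right. Regarding kurtosis (SAS reports it relative to a normal distribution, for which the value is 0), both 'Helping' and 'Volunteering' are around -0.6, so their distributions will be slightly flatter, with shorter tails, than that of 'Donating' (kurtosis = 0.05).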
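As an optional cross-check that goes slightly beyond the original post, PROC UNIVARIATE can report the same skewness and kurtosis values together with formal tests for normality and Q-Q plots. This is a sketch only, and it produces considerably more output than the MEANS call above.

```sas
/* The NORMAL option adds tests for normality to the default output,
   which already includes skewness and kurtosis in the Moments table. */
proc univariate data=mydata.givingindex2015_ver2 normal;
   var helping donating volunteering;
   /* Q-Q plot against a normal distribution whose mean and standard
      deviation are estimated from the data. */
   qqplot helping donating volunteering / normal(mu=est sigma=est);
run;
```

Points that bend away from the reference line at the upper end of a Q-Q plot would be consistent with the right skew described for 'Donating'.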


Summary

Finally, the descriptive statistics can be summed up as follows:

for 'Helping', a nearly symmetric distribution with no outliers; the mean can be used to represent the centre (central tendency) and the standard deviation to represent dispersion;

for 'Donating', a right-skewed distribution with outliers; the median should be used for the centre and the IQR for dispersion;

for 'Volunteering', a right-skewed distribution with no outliers; the median should be used for the centre and the IQR for dispersion.
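The recommended measures can be reported directly with two short PROC MEANS calls, one per choice of centre and dispersion. This is a minimal sketch that reuses only keywords already shown in this post.

```sas
/* Helping: near-symmetric, so report mean and standard deviation. */
proc means data=mydata.givingindex2015_ver2 mean std maxdec=3;
   var helping;
run;

/* Donating and Volunteering: right-skewed, so report median and IQR. */
proc means data=mydata.givingindex2015_ver2 median qrange maxdec=3;
   var donating volunteering;
run;
```

The values printed should agree with the corresponding columns of the tables earlier in the post.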


Alternative method - Summary Statistics

To obtain the central tendency and dispersion measures without writing code, use Tasks and Utilities > Tasks > Statistics > Summary Statistics, with 'Donating', 'Helping' and 'Volunteering' as the analysis variables and no classification variables. Tick the required measures on the OPTIONS tab.



[Screenshot: SAS1 Summary Statistics]

/**************************************************************************************************/


Basic knowledge and practice sharing for beginners. Please do not comment if not interested.




Reply #1
hyq2003, posted on 2019-8-4 08:47:31
Thanks for sharing.
