Attachment: givingindex2015_ver2.sas7bdat.zip (4.41 KB, requires 5 forum coins)
Contents: givingindex2015_ver2.sas7bdat
Background:
The World Giving Index is published every year by the Charities Aid Foundation (CAF) www.cafonline.org. The index aims to provide insight into the scale and nature of giving around the world. The World Giving Index 2015 presents giving data from 145 countries from across the globe, based on questionnaires completed by a representative sample of individuals living across each country. Index scores have been calculated from responses to the following questions:
Have you done any of the following in the past month?
Donated money to a charity?
Helped a stranger, or someone you didn’t know who needed help?
Volunteered your time to an organization?
Variable | Description |
Country_name | Name of country |
Country_Code | Abbreviation for country name |
Region | Africa, Asia, Europe, Middle East, North & Central America, Oceania, South America |
Donating | Donating money score for each country, percent of population |
Helping | Helping a stranger score for each country, percent of population |
Volunteering | Volunteering score for each country, percent of population |
Economy | Three levels of economic development |
/**************************************************************************************************************************************/
Study the distributions of donating, helping and volunteering scores.
The values in the Donating, Helping and Volunteering columns are calculated scores (percentage of the population), so all three are numerical variables.
Location measures
First, the location measures mean, median and mode are used to describe the central tendency of the data.
| Variable | N | N Miss | Mode | Mean | Median |
| Helping | 145 | 0 | 50.000 | 49.228 | 50.000 |
| Donating | 145 | 0 | 10.000 | 31.510 | 28.000 |
| Volunteering | 145 | 0 | 9.000 | 21.069 | 20.000 |
Code using the MEANS procedure:
/* Location measures for the three score variables, rounded to 3 decimals */
proc means data=mydata.givingindex2015_ver2
      n nmiss mode mean median maxdec=3;
   var helping donating volunteering;
run;
From the central tendency table, two patterns stand out for 'Helping'.
Its mode (50), mean (49.228) and median (50) are almost identical, which is what a symmetric, bell-shaped distribution such as the normal would give (mode = mean = median). This suggests that the 'Helping' scores of the 145 countries are distributed near-symmetrically: most countries score around 50, with roughly equal numbers of very low and very high scores. Whether there are outliers cannot be read from these three measures and has to be checked against the dispersion measures.
It has the highest value on all three measures compared with the other two variables, 'Donating' and 'Volunteering'.
As a result of this observation, the mean can be used to represent the centre of 'Helping'.
For 'Donating' and 'Volunteering', the median is lower than the mean, i.e. the mean lies to the right of the median, which suggests that both distributions are right-skewed.
'Donating' has a median of 28, which means that half of the countries scored 28 or lower and the other half scored 28 or higher. Its average score across the 145 countries is 31.51, above this middle value. The mean has been pulled upwards by some large values in the upper part of the range (very high scores, possibly outliers), so it is no longer a good representative of a typical country.
'Volunteering' shows the same tendency, but the difference between mean (21.069) and median (20) is small, so it is still right-skewed but less so than 'Donating'. Whether there are outliers needs to be examined through the analysis of the spread of the distribution.
As a result of this observation, the median can be used to represent the centre of 'Donating' and 'Volunteering'.
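The shapes inferred above from mean and median can also be looked at directly. The following is a minimal sketch, assuming the MYDATA libref is already assigned as in the PROC MEANS step and that ODS Graphics is available in the session; it is a visual check, not part of the original analysis.

```sas
/* Assumption: MYDATA points at the folder holding givingindex2015_ver2 */
ods graphics on;

proc univariate data=mydata.givingindex2015_ver2 noprint;
   var helping donating volunteering;
   /* One histogram per variable, each with a fitted normal curve as a reference */
   histogram helping donating volunteering / normal;
run;
```

If the reading above is right, the 'Helping' histogram should follow the normal curve fairly closely, while 'Donating' and 'Volunteering' should show a longer tail on the right.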
Dispersion measures
Then, to see how widely the scores are spread, the variance, standard deviation, range, quartiles and interquartile range are used to describe the dispersion of the data.
| Variable | N | Variance | Std Dev | Minimum | Lower Quartile | Upper Quartile | Maximum | Quartile Range | Range |
| Helping | 145 | 193.149 | 13.898 | 16.000 | 38.000 | 59.000 | 79.000 | 21.000 | 63.000 |
| Donating | 145 | 379.210 | 19.473 | 3.000 | 16.000 | 44.000 | 92.000 | 28.000 | 89.000 |
| Volunteering | 145 | 130.315 | 11.416 | 3.000 | 11.000 | 29.000 | 50.000 | 18.000 | 47.000 |
Code using the MEANS procedure:
/* Dispersion measures: variance, std dev, quartiles, IQR (qrange) and range */
proc means data=mydata.givingindex2015_ver2
      n var std min q1 q3 max qrange range maxdec=3;
   var helping donating volunteering;
run;
Variance and standard deviation: 'Donating' has a higher variance (379.21) and standard deviation (19.473) than the other two variables, which indicates that its scores fluctuate more from country to country. Its scores also cover a much wider range (89, against 63 for 'Helping' and 47 for 'Volunteering'), which goes together with the larger variation.
For 'Donating', Q3 + 1.5 x IQR = 44 + 1.5 x 28 = 86, so the upper whisker of a box plot cannot extend beyond 86 and any larger value is an outlier. The table above shows a Maximum of 92, which is therefore an outlier. Q1 - 1.5 x IQR = 16 - 42 = -26 is the lower limit; the Minimum of 3 lies above it, so there are no outliers at the lower end of the data.
For 'Volunteering', Q3 + 1.5 x IQR = 29 + 1.5 x 18 = 56 is the upper limit of the whisker. The Maximum of 50 is within this limit, so there are no outliers at the upper end. Q1 - 1.5 x IQR = 11 - 27 = -16, and the Minimum of 3 lies above it, so there are no outliers at the lower end either.
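The table only shows that the Maximum of 'Donating' exceeds the upper limit; it does not show which countries, or how many, are outliers. The sketch below is one possible way to list them with the same 1.5 x IQR rule. The data set names work.don_q and work.don_outliers and the variables lower_fence and upper_fence are made up for this illustration; Country_name and Donating are taken from the variable list above.

```sas
/* Step 1: store Q1 and Q3 of Donating in a one-row data set */
proc means data=mydata.givingindex2015_ver2 noprint;
   var donating;
   output out=work.don_q q1=q1 q3=q3;
run;

/* Step 2: attach Q1/Q3 to every row and keep only rows outside the fences */
data work.don_outliers;
   if _n_ = 1 then set work.don_q(keep=q1 q3);   /* values are retained for all rows */
   set mydata.givingindex2015_ver2;
   lower_fence = q1 - 1.5 * (q3 - q1);
   upper_fence = q3 + 1.5 * (q3 - q1);
   /* missing scores are excluded explicitly, since SAS sorts them below any number */
   if not missing(donating)
      and (donating < lower_fence or donating > upper_fence);
   keep country_name donating lower_fence upper_fence;
run;

/* Step 3: print the flagged countries */
proc print data=work.don_outliers noobs;
run;
```

Replacing donating with helping or volunteering in both steps applies the same check to the other two variables.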
Skewness and Kurtosis
Next, skewness and kurtosis are printed to assess how skewed and how peaked the distributions are.
| Variable | Skewness | Kurtosis |
| Helping | 0.106 | -0.686 |
| Donating | 0.807 | 0.050 |
| Volunteering | 0.528 | -0.622 |
Code using the MEANS procedure:
/* Shape measures: skewness and (excess) kurtosis */
proc means data=MYDATA.GIVINGINDEX2015_VER2
      skewness kurtosis maxdec=3;
   var Helping Donating Volunteering;
run;
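'Helping' has the smallest skewness (0.106) and 'Donating' the largest (0.807), with 'Volunteering' (0.528) in between. All three values are positive and below 1, so none of the distributions is severely skewed, although 'Donating' is clearly the most right-skewed, in line with its mean exceeding its median. Regarding kurtosis, the value reported by PROC MEANS is excess kurtosis, which is 0 for a normal distribution. 'Helping' and 'Volunteering' are both around -0.6, so their distributions are somewhat flatter with lighter tails than 'Donating' (kurtosis = 0.05), which is close to normal in this respect.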
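Skewness and kurtosis alone do not say whether the departure from normality is statistically significant. As a hedged follow-up, and not part of the original analysis, the NORMAL option of PROC UNIVARIATE requests formal normality tests; the sketch below limits the output to the moments and the test table.

```sas
/* Assumption: MYDATA is assigned as in the PROC MEANS steps above */
ods select Moments TestsForNormality;   /* show only these two output tables */

proc univariate data=mydata.givingindex2015_ver2 normal;
   var helping donating volunteering;
run;
```

Small p-values in the tests for normality table would indicate that the variable is unlikely to come from a normal distribution; the Moments table repeats the skewness and kurtosis values for comparison.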
Summary
Finally, the statistical description can be summed up as follows:
for 'Helping', a nearly symmetric distribution with no outliers (the limits Q1 - 1.5 x IQR = 6.5 and Q3 + 1.5 x IQR = 90.5 contain both the Minimum 16 and the Maximum 79); the mean can be used to represent the centre (central tendency) and the standard deviation to represent the dispersion;
for 'Donating', a right-skewed distribution with outliers; the median should be used for the centre and the IQR for the dispersion;
for 'Volunteering', a right-skewed distribution with no outliers; the median should be used for the centre and the IQR for the dispersion.
Alternative method - Summary Statistics
To obtain the central tendency and dispersion measures without writing code, use Tasks and Utilities/Tasks/Statistics/Summary Statistics, with 'Donating', 'Helping' and 'Volunteering' as Analysis variables and no Classification variables. Tick the required measures in the OPTIONS tab.
/**************************************************************************************************/
Basic knowledge and practice sharing for beginners. Please do not comment if not interested.



