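(This post is original work by andruw; please credit the source when reposting.)
We often need to create dummy variables or categorical variables from existing data. For example, suppose we have the sample below and need to build dummy variables by country.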
- +------------------+
- | country year |
- |------------------|
- 1. | China 2013 |
- 2. | China 2014 |
- 3. | Japan 2013 |
- 4. | Japan 2014 |
- 5. | Korea 2013 |
- |------------------|
- 6. | Korea 2014 |
- 7. | Germany 2013 |
- 8. | Germany 2014 |
- 9. | UK 2013 |
- 10. | UK 2014 |
- |------------------|
- 11. | Singapore 2013 |
- 12. | Singapore 2014 |
- +------------------+
Code to generate this data:
- clear
- input str20 country year
- China 2013
- China 2014
- Japan 2013
- Japan 2014
- Korea 2013
- Korea 2014
- Germany 2013
- Germany 2014
- UK 2013
- UK 2014
- Singapore 2013
- Singapore 2014
- end
There are three common ways to create dummy variables:
1. Use the generate command
- gen dummy_1 = (country == "China")
- gen dummy_2 = (country == "Japan")
- gen dummy_3 = (country == "Korea")
- gen dummy_4 = (country == "Germany")
- gen dummy_5 = (country == "UK")
- gen dummy_6 = (country == "Singapore")
2. Use the tabulate command
- tabulate country, gen(dummy)
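3. Use xi (it expands the i.country term into indicator variables named _Icountry_*, leaving out one base category; it is the older counterpart to factor-variable notation, not the same thing)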
- xi i.country
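To see how the three approaches relate, here is a minimal sketch rather than a definitive recipe. It rebuilds the sample data, runs tabulate, and then rewrites the generate approach as a loop over the distinct values of country. The chk_ prefix is a name made up for this sketch. Note that tabulate numbers its indicators by the sorted order of country (China, Germany, Japan, Korea, Singapore, UK), which differs from the hand-typed numbering in method 1.

```stata
* Rebuild the sample data so the sketch runs on its own
clear
input str20 country year
China 2013
China 2014
Japan 2013
Japan 2014
Korea 2013
Korea 2014
Germany 2013
Germany 2014
UK 2013
UK 2014
Singapore 2013
Singapore 2014
end

* Method 2: creates dummy1 ... dummy6, numbered by sorted country name
tabulate country, gen(dummy)

* Method 1 as a loop over the distinct values of country
* (chk_ is a hypothetical prefix used only in this sketch)
levelsof country, local(countries)
local i = 0
foreach c of local countries {
    local ++i
    gen byte chk_`i' = (country == "`c'")
    * The looped indicator should match the one tabulate created
    assert chk_`i' == dummy`i'
}

* Method 3: xi stores its indicators as _Icountry_* and, by default,
* leaves out one base category
xi i.country
describe _Icountry_*
```

The loop avoids typing one gen line per country, which matters once the variable has more than a handful of distinct values.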
Common ways to create a categorical variable include:
1. Use the egen function group()
- egen category = group(country)
2. Use encode (which also creates value labels for the numeric codes)
- encode country, gen(category_country)
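The two commands above can be compared side by side. This is a minimal sketch on the same sample data. Both number the countries 1, 2, ... in sorted order, so the codes should agree; the visible difference is that encode attaches a value label, while egen group() does not unless asked to.

```stata
* Rebuild the sample data so the sketch runs on its own
clear
input str20 country year
China 2013
China 2014
Japan 2013
Japan 2014
Korea 2013
Korea 2014
Germany 2013
Germany 2014
UK 2013
UK 2014
Singapore 2013
Singapore 2014
end

* Numeric group codes without labels
egen category = group(country)

* Numeric codes plus a value label named category_country
encode country, gen(category_country)

* Both assign codes in sorted order of country, so they should match
assert category == category_country

* Show the code-to-name mapping that encode stored
label list category_country
```

Because the encode result keeps the country names as labels, tables and listings stay readable even though the underlying variable is numeric.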
Agreed, you should pick the most effective method for the specific problem at hand.