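1. Questions on ggplot
(1) Using at least two methods, process the diamonds dataset and show how the samples are distributed across the two categorical variables cut and clarity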
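library(tidyverse)  # loads ggplot2 (with the diamonds and mpg data), dplyr, and %>%
# Method 1: count each (cut, clarity) combination and draw the counts as a heat map
diamonds %>%
  count(cut, clarity) %>%
  ggplot(mapping = aes(x = clarity, y = cut)) + geom_tile(mapping = aes(fill = n))
# Method 2: let geom_bar() do the counting, with each bar split by clarity
ggplot(data = diamonds, mapping = aes(x = cut)) + geom_bar(mapping = aes(fill = clarity))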
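As a sanity check on the counting step, the sketch below stores the cut/clarity counts and confirms they add up to the number of rows in diamonds. It then draws one more optional view, where each bar is rescaled to proportions so the clarity mix can be compared across cuts. The names cut_clarity and p are only illustrative.

```r
library(dplyr)
library(ggplot2)

# One row per observed (cut, clarity) combination, with the count in column n
cut_clarity <- diamonds %>% count(cut, clarity)

# Every diamond falls in exactly one cell, so the counts sum to the row total
sum(cut_clarity$n) == nrow(diamonds)

# Optional third view: share of each clarity grade within each cut
p <- ggplot(diamonds, aes(x = cut, fill = clarity)) +
  geom_bar(position = "fill") +
  labs(y = "proportion")
print(p)
```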
(2) Explore the relationship between cut and price
ggplot(data=diamonds,mapping = aes(x=cut,y=price))+geom_boxplot()
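The boxplot can be backed up with numbers. A minimal sketch, assuming a per-cut summary is an acceptable companion to the plot: it reports the count, median, and mean price for each cut. The name price_by_cut is made up for this example.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds data

# Numeric companion to the boxplot: one summary row per cut level
price_by_cut <- diamonds %>%
  group_by(cut) %>%
  summarise(
    n            = n(),
    median_price = median(price),
    mean_price   = mean(price)
  )
price_by_cut
```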
(3) Sort the models of the 2008 cars in mpg and draw them as a bar chart
# group_by(model) is not needed: geom_bar() counts the rows for each model itself
new <- filter(mpg, year == 2008)
ggplot(data = new) + geom_bar(mapping = aes(x = model)) + coord_flip()
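With geom_bar() the models appear in alphabetical order. If "sorted" is meant as sorted by number of cars, one possible variant (an assumption about the intent of the question) is to count first and reorder the axis by that count. model_counts and p are illustrative names.

```r
library(dplyr)
library(ggplot2)

# Count the 2008 cars per model up front
model_counts <- mpg %>%
  filter(year == 2008) %>%
  count(model)

# reorder() sorts the model axis by n; geom_col() draws precomputed counts
p <- ggplot(model_counts, aes(x = reorder(model, n), y = n)) +
  geom_col() +
  coord_flip() +
  labs(x = "model", y = "count")
print(p)
```

geom_col() is used here instead of geom_bar() because the heights are already computed in n.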
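2. Use dplyr to perform the following data transformations (the flights, weather, and airports tables come from the nycflights13 package)
a. Find the flights that departed late but did not arrive late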
filter(flights,dep_delay>0,arr_delay<=0)
# A tibble: 35,442 x 19
    year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
   <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
 1  2013     1     1      601            600        1.      844            850
 2  2013     1     1      644            636        8.      931            940
 3  2013     1     1      646            645        1.      910            916
 4  2013     1     1      646            645        1.     1023           1030
 5  2013     1     1      701            700        1.     1123           1154
 6  2013     1     1      752            750        2.     1025           1029
 7  2013     1     1      803            800        3.     1132           1144
 8  2013     1     1      826            817        9.     1145           1158
 9  2013     1     1      846            845        1.     1138           1205
10  2013     1     1      856            855        1.     1140           1203
# ... with 35,432 more rows, and 11 more variables: arr_delay <dbl>, carrier <chr>,
#   flight <int>, tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>,
#   distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
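Two details of filter() matter for this answer: conditions separated by commas are combined with AND, and rows where a condition evaluates to NA are dropped. The sketch below shows both on a tiny made-up table (toy and its values are hypothetical, not taken from flights).

```r
library(dplyr)

# Hypothetical mini-table: D has a missing arrival delay (e.g. a cancelled flight)
toy <- tibble(
  flight    = c("A", "B", "C", "D"),
  dep_delay = c(5, 5, -2, 10),
  arr_delay = c(-3, 12, -8, NA)
)

# Comma-separated conditions are ANDed; the NA comparison for D removes that row
late_out_on_time_in <- filter(toy, dep_delay > 0, arr_delay <= 0)
late_out_on_time_in

# Equivalent spelling with an explicit &
same_result <- filter(toy, dep_delay > 0 & arr_delay <= 0)
```

Only flight A remains: B arrived late, C left early, and D has no recorded arrival delay.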
b. Find the flights with the highest flying speed (distance per hour)
> flights%>%
+ mutate(hours=air_time/60,speed_per_hour=distance/hours)%>%
+ select(tailnum,hours,speed_per_hour)%>%
+ arrange(desc(speed_per_hour))
# A tibble: 336,776 x 3
   tailnum hours speed_per_hour
   <chr>   <dbl>          <dbl>
 1 N666DN   1.08           703.
 2 N17196   1.55           650.
 3 N14568  0.917           648.
 4 N12567   1.17           641.
 5 N956DL   1.75           591.
 6 N3768    2.83           564.
 7 N779JB   2.87           557.
 8 N5FFAA   2.92           556.
 9 N3773D   2.88           554.
10 N571JB   2.88           554.
# ... with 336,766 more rows
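The result above keeps all 336,776 rows, including flights with no recorded air_time, and identifies a flight only by its tail number. A possible refinement, assuming air_time is in minutes and distance in miles as documented for nycflights13, drops the missing values and keeps enough columns to identify each flight. fastest and speed_mph are illustrative names.

```r
library(dplyr)
library(nycflights13)

fastest <- flights %>%
  filter(!is.na(air_time)) %>%                       # flights that never flew have no speed
  mutate(speed_mph = distance / (air_time / 60)) %>% # miles per hour
  select(carrier, flight, tailnum, origin, dest, speed_mph) %>%
  arrange(desc(speed_mph))

head(fastest, 5)
```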
c. Add location variables to the weather table
> weather%>%
+ left_join(airports,c("origin"="faa"))
# A tibble: 26,130 x 22
   origin  year month   day  hour  temp  dewp humid wind_dir wind_speed wind_gust
   <chr>  <dbl> <dbl> <int> <int> <dbl> <dbl> <dbl>    <dbl>      <dbl>     <dbl>
 1 EWR    2013.    1.     1     0  37.0  21.9  54.0     230.       10.4      11.9
 2 EWR    2013.    1.     1     1  37.0  21.9  54.0     230.       13.8      15.9
 3 EWR    2013.    1.     1     2  37.9  21.9  52.1     230.       12.7      14.6
 4 EWR    2013.    1.     1     3  37.9  23.0  54.5     230.       13.8      15.9
 5 EWR    2013.    1.     1     4  37.9  24.1  57.0     240.       15.0      17.2
 6 EWR    2013.    1.     1     6  39.0  26.1  59.4     270.       10.4      11.9
 7 EWR    2013.    1.     1     7  39.0  27.0  61.6     250.       8.06      9.27
 8 EWR    2013.    1.     1     8  39.0  28.0  64.4     240.       11.5      13.2
 9 EWR    2013.    1.     1     9  39.9  28.0  62.2     250.       12.7      14.6
10 EWR    2013.    1.     1    10  39.0  28.0  64.4     260.       12.7      14.6
# ... with 26,120 more rows, and 11 more variables: precip <dbl>, pressure <dbl>,
#   visib <dbl>, time_hour <dttm>, name <chr>, lat <dbl>, lon <dbl>, alt <int>,
#   tz <dbl>, dst <chr>, tzone <chr>
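The join above brings in every airports column. If only the location is wanted, one option is to select those columns before joining and then check that the join neither added rows nor left any origin unmatched. This is a sketch, with airport_locations, weather_located, and unmatched as illustrative names.

```r
library(dplyr)
library(nycflights13)

# Keep only the key and the location-related columns from airports
airport_locations <- airports %>% select(faa, name, lat, lon, alt)

# by = names the key explicitly: weather$origin matches airports$faa
weather_located <- weather %>%
  left_join(airport_locations, by = c("origin" = "faa"))

# A left join on a unique key keeps the row count of the left table
nrow(weather_located) == nrow(weather)

# Origins that found no airport; an empty result means every row got a location
unmatched <- weather %>%
  anti_join(airports, by = c("origin" = "faa")) %>%
  distinct(origin)
unmatched
```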