Background: I have already sorted the data by user_id and request_time; the example below shows the data for a single user.
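What I need: when the same sku_id appears in rows that are adjacent in request_time, I want to drop the repeats and keep only one. (For example, sku a864c609d0 appears in two adjacent rows, the fifth-from-last and the fourth-from-last, and I want to keep only one of those. It also appears in the last row and in the fifth row, and those two observations are ones I want to keep.)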
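I have already tried duplicates drop user_id sku_id, but that command does not take the time order into account. For example, sku_id a864c609d0 appears four times in the example, and duplicates drop would leave only one of them. Is there another way to solve this?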
* Example generated by -dataex-. To install: ssc install dataex
clear
input str10(sku_id user_id) str19 request_time double(click_date click_time)
"e99eb7d131" "ffff831061" "2018-03-03 21:36:17" 21246 1835732177000
"43cdf174ae" "ffff831061" "2018-03-03 21:36:33" 21246 1835732193000
"9a128ffc54" "ffff831061" "2018-03-03 21:37:03" 21246 1835732223000
"246e6ef6fe" "ffff831061" "2018-03-03 21:40:05" 21246 1835732405000
"a864c609d0" "ffff831061" "2018-03-03 21:47:53" 21246 1835732873000
"b65c3ea916" "ffff831061" "2018-03-03 21:49:33" 21246 1835732973000
"d1f9cee99b" "ffff831061" "2018-03-14 14:22:56" 21257 1836656576000
"9ac31152dd" "ffff831061" "2018-03-14 14:25:02" 21257 1836656702000
"d1f9cee99b" "ffff831061" "2018-03-22 06:22:27" 21265 1837318947000
"43cdf174ae" "ffff831061" "2018-03-28 07:05:27" 21271 1837839927000
"e99eb7d131" "ffff831061" "2018-03-28 07:05:32" 21271 1837839932000
"fa43f4c1a1" "ffff831061" "2018-03-28 07:05:45" 21271 1837839945000
"a864c609d0" "ffff831061" "2018-03-28 07:06:06" 21271 1837839966000
"a864c609d0" "ffff831061" "2018-03-28 07:06:09" 21271 1837839969000
"a98e4e1eff" "ffff831061" "2018-03-28 07:06:28" 21271 1837839988000
"9a128ffc54" "ffff831061" "2018-03-28 07:06:32" 21271 1837839992000
"a864c609d0" "ffff831061" "2018-03-28 07:06:52" 21271 1837840012000
end
format %td click_date
format %tc click_time
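
For reference, one possible approach (a minimal sketch, not necessarily the best solution) is to compare each row's sku_id with the row immediately before it within the same user, and drop the row only when the two match. The sketch assumes that click_time is the numeric datetime version of request_time, as it appears to be in the example, and that no two rows for the same user share exactly the same click_time (ties would make the sort order arbitrary).

```stata
* Sketch only: assumes the example data above are already in memory
* and that click_time is the numeric (%tc) counterpart of request_time.

* Within each user, order rows by time, then drop a row whenever its
* sku_id equals the sku_id of the row immediately before it.
* For the first row of each user, sku_id[_n-1] is the empty string,
* so that row is never dropped (provided sku_id itself is not empty).
bysort user_id (click_time): drop if sku_id == sku_id[_n-1]

* Inspect the result
list sku_id request_time, sepby(user_id)
```

This keeps the first row of each run of adjacent repeats; comparing with sku_id[_n+1] instead would keep the last row of each run. In the example data, only the second of the two adjacent a864c609d0 rows (request_time 2018-03-28 07:06:09) meets the condition, so the rows in the fifth position and the last position are kept.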