aaa 111 222
aaa 222 111
bbb 111 333
aaa 111 222
ccc 222 222
bbb 111 444
I want to remove the duplicate lines, so that I end up with:
aaa 111 222
aaa 222 111
bbb 111 333
ccc 222 222
bbb 111 444
The data is very large, several tens of GB, and the script I wrote seems to run very inefficiently:
#!~/miniconda2/bin/python
outfile = open('remove_duplicate.txt', 'w')
list_1=[]
for line in open('header_and_position_1.txt'):
    tmp = line.strip()
    if tmp not in list_1:
        list_1.append(tmp)
        outfile.write(line)
outfile.close()
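
One likely cause of the slowdown is the `tmp not in list_1` check: membership tests on a list scan every element, so the total work grows roughly with the square of the number of distinct lines. A minimal sketch of an alternative, assuming the set of distinct lines (or at least their hashes) fits in memory, is to keep a `set` of MD5 digests instead. The function name `remove_duplicates` is only illustrative, and storing digests rather than full lines is an assumption made to save memory; it accepts a theoretical (extremely unlikely) chance of a hash collision dropping a non-duplicate line.

```python
import hashlib


def remove_duplicates(in_path, out_path):
    """Copy in_path to out_path, keeping only the first occurrence of each line.

    Hypothetical helper, not part of the original script.
    """
    seen = set()  # 16-byte digests of the lines already written
    with open(in_path, 'rb') as infile, open(out_path, 'wb') as outfile:
        for line in infile:
            # Same normalization as the original script: compare stripped lines.
            key = hashlib.md5(line.strip()).digest()
            if key not in seen:  # average O(1) lookup, unlike a list scan
                seen.add(key)
                outfile.write(line)


# Usage with the file names from the script above:
# remove_duplicates('header_and_position_1.txt', 'remove_duplicate.txt')
```

Like the original, this keeps the first occurrence of each line and preserves the input order. Memory still grows with the number of distinct lines, so if even the digests do not fit in RAM, a disk-based approach (for example an external sort) would be needed instead, at the cost of losing the original order unless line numbers are carried along.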

