Original poster: Lisrelchen

Biopython Tutorial and Cookbook

Jeff Chang, Brad Chapman, Iddo Friedberg, Thomas Hamelryck, Michiel de Hoon, Peter Cock, Tiago Antao, Eric Talevich, Bartek Wilczyński
Last update: 8 June 2016 (Biopython 1.67)


http://biopython.org/DIST/docs/tutorial/Tutorial.pdf



Keywords: Tutorial Cookbook python tutor Book Peter


#2
Lisrelchen posted on 2016-8-18 22:26:57
5.1.1 Reading Sequence Files
In general, Bio.SeqIO.parse() is used to read in sequence files as SeqRecord objects, and is typically used with a for loop like this:

from Bio import SeqIO

for seq_record in SeqIO.parse("ls_orchid.fasta", "fasta"):
    print(seq_record.id)          # the record identifier
    print(repr(seq_record.seq))   # the sequence object itself
    print(len(seq_record))        # the sequence length
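To make the record-by-record idea concrete, here is a minimal, stdlib-only sketch of what a FASTA reader does conceptually. It is not Biopython's implementation: the function names read_fasta and _make_record, the (id, description, sequence) tuples they produce, and the two toy records are all hypothetical. In real code, use SeqIO.parse(), which returns proper SeqRecord objects.

```python
import io
from typing import Iterator, List, Optional, TextIO, Tuple

# Hypothetical record shape for this sketch: (id, description, sequence)
Record = Tuple[str, str, str]


def _make_record(title: str, chunks: List[str]) -> Record:
    """Build one record; the ID is the first word of the header line."""
    words = title.split(None, 1)
    rec_id = words[0] if words else ""
    return rec_id, title, "".join(chunks)


def read_fasta(handle: TextIO) -> Iterator[Record]:
    """Yield FASTA records one at a time from a text handle (illustrative only)."""
    title: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.rstrip()
        if line.startswith(">"):
            # A new header line: hand back the previous record first, if any
            if title is not None:
                yield _make_record(title, chunks)
            title, chunks = line[1:], []
        elif title is not None:
            # Sequence lines may be wrapped, so collect them until the next header
            chunks.append(line)
    if title is not None:
        # The last record has no following header, so emit it at end of input
        yield _make_record(title, chunks)


# Made-up toy input standing in for a real file such as ls_orchid.fasta
toy_fasta = io.StringIO(">seq1 first toy record\nACGT\nACG\n>seq2 second toy record\nTTGA\n")

for rec_id, description, seq in read_fasta(toy_fasta):
    print(rec_id, len(seq))  # prints "seq1 7", then "seq2 4"
```

Because read_fasta is a generator, only one record is held in memory at a time, which is the same property the tutorial relies on for SeqIO.parse().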

#3
Lisrelchen posted on 2016-8-18 22:28:36
5.1.2 Iterating over the records in a sequence file
In the above examples, we have usually used a for loop to iterate over all the records one by one. You can use the for loop with all sorts of Python objects (including lists, tuples and strings) that support the iteration interface.
The object returned by Bio.SeqIO is actually an iterator which returns SeqRecord objects. You get to see each record in turn, but once and only once. The advantage is that an iterator can save you memory when dealing with large files.
Instead of using a for loop, you can also use the next() function on an iterator to step through the entries, like this:

from Bio import SeqIO

record_iterator = SeqIO.parse("ls_orchid.fasta", "fasta")

# next() returns the following record (StopIteration is raised once none are left)
first_record = next(record_iterator)
print(first_record.id)
print(first_record.description)

second_record = next(record_iterator)
print(second_record.id)
print(second_record.description)
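The "once and only once" behaviour is ordinary Python iterator semantics, so it can be demonstrated without any sequence files. In this sketch a hypothetical generator, toy_records(), stands in for the iterator that SeqIO.parse() returns; it also shows that next() with a default value avoids a StopIteration error when the records run out.

```python
from typing import Iterator


def toy_records() -> Iterator[str]:
    """Hypothetical stand-in for SeqIO.parse(): yields record IDs lazily."""
    for rec_id in ["rec1", "rec2", "rec3"]:
        yield rec_id


record_iterator = toy_records()
first = next(record_iterator)       # "rec1"
second = next(record_iterator)      # "rec2"

# A for loop carries on from wherever next() left off
remaining = [rec for rec in record_iterator]   # ["rec3"]

# The iterator is now exhausted, so a second pass sees nothing
again = list(record_iterator)                  # []

# next() with a default returns the default instead of raising StopIteration
after_end = next(record_iterator, None)        # None
```

To go through a file a second time, call SeqIO.parse() again (which re-reads the file) or keep the records in a list, as described in the next section.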

#4
Lisrelchen posted on 2016-8-18 22:30:01
5.1.3 Getting a list of the records in a sequence file
In the previous section we talked about the fact that Bio.SeqIO.parse() gives you a SeqRecord iterator, and that you get the records one by one. Very often you need to be able to access the records in any order. The Python list data type is perfect for this, and we can turn the record iterator into a list of SeqRecord objects using the built-in Python function list() like so:

from Bio import SeqIO

records = list(SeqIO.parse("ls_orchid.gbk", "genbank"))
print("Found %i records" % len(records))

print("The last record")
last_record = records[-1]  # using Python's list tricks: index -1 is the last item
print(last_record.id)
print(repr(last_record.seq))
print(len(last_record))

print("The first record")
first_record = records[0]  # remember, Python counts from zero
print(first_record.id)
print(repr(first_record.seq))
print(len(first_record))
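Calling list() reads every record into memory at once. When you only need a few records, the standard library offers lazier options: itertools.islice() takes just the first n items from an iterator, and a collections.deque with maxlen=1 keeps only the last item while streaming through the rest. The sketch below uses a hypothetical generator, toy_records(), in place of SeqIO.parse(). The assumption is that the SeqIO.parse() iterator can be passed to these functions in the same way, since both accept any iterator.

```python
from collections import deque
from itertools import islice
from typing import Iterator


def toy_records(n: int) -> Iterator[str]:
    """Hypothetical stand-in for SeqIO.parse(): yields n record IDs lazily."""
    for i in range(n):
        yield f"rec{i}"


# list(): random access in any order, but every record is held in memory
records = list(toy_records(5))
first_record, last_record = records[0], records[-1]   # "rec0", "rec4"

# islice(): take only the first two records; the rest are never generated
first_two = list(islice(toy_records(5), 2))          # ["rec0", "rec1"]

# deque(maxlen=1): stream through everything but remember only the last record
last_only = deque(toy_records(5), maxlen=1)[0]       # "rec4"
```

The list is the right choice when you need repeated or random access; the other two trade that flexibility for memory that stays constant regardless of file size.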

#5
Lisrelchen posted on 2016-8-18 22:33:51
5.2 Parsing sequences from compressed files
In the previous section, we looked at parsing sequence data from a file. Instead of using a filename, you can give Bio.SeqIO a handle (see Section 24.1), and in this section we'll use handles to parse sequences from compressed files.
As you'll have seen above, we can use Bio.SeqIO.read() or Bio.SeqIO.parse() with a filename. For instance, this quick example calculates the total length of the sequences in a multiple-record GenBank file using a generator expression:

>>> from Bio import SeqIO
>>> print(sum(len(r) for r in SeqIO.parse("ls_orchid.gbk", "gb")))
67518

Here we use a file handle instead, with the with statement closing the handle automatically:

>>> from Bio import SeqIO
>>> with open("ls_orchid.gbk") as handle:
...     print(sum(len(r) for r in SeqIO.parse(handle, "gb")))
...
67518

Or, the old-fashioned way, where you close the handle manually:

>>> from Bio import SeqIO
>>> handle = open("ls_orchid.gbk")
>>> print(sum(len(r) for r in SeqIO.parse(handle, "gb")))
67518
>>> handle.close()

Now suppose we have a gzip-compressed file instead; these are very common on Linux. We can use Python's gzip module to open the compressed file for reading, which gives us a handle object. Open it in text mode ("rt"): under Python 3, plain "r" opens gzip files in binary mode, whereas SeqIO expects text.

>>> import gzip
>>> from Bio import SeqIO
>>> handle = gzip.open("ls_orchid.gbk.gz", "rt")
>>> print(sum(len(r) for r in SeqIO.parse(handle, "gb")))
67518
>>> handle.close()

Similarly, if we had a bzip2-compressed file, we can use Python's bz2 module, again opening the file in text mode:

>>> import bz2
>>> from Bio import SeqIO
>>> handle = bz2.open("ls_orchid.gbk.bz2", "rt")
>>> print(sum(len(r) for r in SeqIO.parse(handle, "gb")))
67518
>>> handle.close()
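If the same script has to handle plain, gzip and bzip2 files, one option is to pick the decompressor from the file's first bytes: gzip files start with the bytes 1f 8b, and bzip2 files start with "BZh". The helper below, open_text(), is a hypothetical sketch of that idea and is not part of Biopython. It always returns a text-mode handle. The demo writes a small made-up text file in each format to a temporary directory and reads each one back.

```python
import bz2
import gzip
import os
import tempfile
from typing import TextIO


def open_text(path: str) -> TextIO:
    """Open a plain, gzip or bzip2 file in text mode (hypothetical helper)."""
    with open(path, "rb") as raw:
        magic = raw.read(3)
    if magic[:2] == b"\x1f\x8b":   # gzip magic number
        return gzip.open(path, "rt")
    if magic == b"BZh":             # bzip2 magic number
        return bz2.open(path, "rt")
    return open(path, "r")          # assume uncompressed text otherwise


# Made-up demo data written in all three formats to a temporary directory
text = "line one\nline two\n"
tmpdir = tempfile.mkdtemp()
paths = {
    "plain": os.path.join(tmpdir, "demo.txt"),
    "gzip": os.path.join(tmpdir, "demo.txt.gz"),
    "bzip2": os.path.join(tmpdir, "demo.txt.bz2"),
}
with open(paths["plain"], "w") as fh:
    fh.write(text)
with gzip.open(paths["gzip"], "wt") as fh:
    fh.write(text)
with bz2.open(paths["bzip2"], "wt") as fh:
    fh.write(text)

# Every handle yields str, which is what a text parser such as SeqIO expects
contents = {}
for kind, path in paths.items():
    with open_text(path) as handle:
        contents[kind] = handle.read()
```

With Biopython installed, the returned handle should work like the handles above, e.g. `with open_text("ls_orchid.gbk.gz") as handle:` followed by `SeqIO.parse(handle, "gb")`. Checking magic bytes rather than file extensions means a mislabelled file is still opened correctly.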


#7
bailihongchen posted on 2016-8-19 15:19:48
Thanks very much for sharing.



#9
bioguo posted on 2016-8-20 13:49:11
I support this!


