
15 practical tips for a bioinformatician

Posted by oliyiyi on 2016-2-18 15:19:01

(This article was first published on One Tip Per Day, and kindly contributed to R-bloggers)



The tips below are based on lessons I learnt from making mistakes during my years of research. They are purely personal opinion, and the order doesn't mean anything. If you think I should include something else, please comment below.

  • Always set a seed when you run tools with a random option, e.g. bedtools shuffle, random, etc. You (or your boss, some day) will want your work to be reproducible.
  • Set your own temporary folder (via --tmp, $TMPDIR, etc., depending on your program). By default, many tools, e.g. sort, use the system /tmp as their temporary folder, which may have a limited quota that is too small for your big NGS data.
  • Always use valid names for your variables and for the columns and rows of your data frames. Otherwise you can run into unexpected problems, e.g. a '-' in a name will be converted to '.' in R unless you specify check.names=F.
  • Always make a README file for your data folder. For why and how, read this: http://stackoverflow.com/questions/2304863/how-to-write-a-good-readme
  • Always comment your code properly, for yourself and for others, as you will very likely have to read your own ugly code again.
  • Always back up your code regularly, using GitHub, svn, Time Machine, or simply copy/paste.
  • Always clean up intermediate or unnecessary data, as you can easily shock your boss and yourself by generating so much data (most of which is perhaps useless).
  • Don't save to *.sam if you can use *.bam. Always compress your fastq (and other large plain-text files) as much as you can. This applies to other file formats too, whenever a compressed version can be used. You cannot imagine how much data (aka "digital garbage") you will soon generate.
  • Use parallel processing as much as you can, e.g. "LC_ALL=C sort --parallel=24 --buffer-size=5G" for sorting (https://www.biostars.org/p/66927/), as multi-core CPUs/GPUs are common nowadays.
  • When a project is completed, remember to clean up your project folder, including removing unnecessary code, data and intermediate files, and burn a CD for the project. You never know when you, your boss or your collaborators will need the data again.
  • Make your code as sustainable as you can. Remember the 3 major features of OOP: Inheritance, Encapsulation, Polymorphism. (URL)
  • When you learn a tip from others through a Google search, remember to record the URL for future reference and to give credit to the source. This applies to this post too, of course.
  • Keep learning, otherwise you will soon be left behind. Just like NGS techniques, computing skills are evolving quickly. Always catch up with new skills, tools and techniques.
  • When you learn tips from others, remember to share what you have learned with the community as well, as that's how the community grows healthily.
  • Last but not least, stand up and move around after sitting for 1-2 hours. This is especially important for us bioinformaticians, who usually sit in front of a computer for hours. Only good health can sustain a long career. More reading: https://www.washingtonpost.com/news/wonk/wp/2015/06/02/medical-researchers-have-figured-out-how-much-time-is-okay-to-spend-sitting-each-day/
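
Three of the tips above (seed anything random, choose your own temporary folder, keep large text files compressed) can be sketched in a few lines of Python. This is only an illustration, not part of the original post: the toy reads, the helper names (shuffle_records, write_fastq_gz, read_fastq_gz), the seed value and the SCRATCH environment variable are all assumptions made for the example, and a real pipeline would rely on each tool's own seed and temp-folder options.

```python
import gzip
import os
import random
import tempfile

SEED = 42  # arbitrary value; record it alongside your results


def shuffle_records(records, seed=SEED):
    """Return a shuffled copy; the same seed always yields the same order."""
    rng = random.Random(seed)  # private generator, global state untouched
    shuffled = list(records)
    rng.shuffle(shuffled)
    return shuffled


def write_fastq_gz(path, records):
    """Write (name, sequence, quality) tuples as gzip-compressed FASTQ."""
    with gzip.open(path, "wt") as handle:
        for name, seq, qual in records:
            handle.write(f"@{name}\n{seq}\n+\n{qual}\n")


def read_fastq_gz(path):
    """Read a gzip-compressed FASTQ file back into tuples."""
    with gzip.open(path, "rt") as handle:
        lines = [line.rstrip("\n") for line in handle]
    return [(lines[i][1:], lines[i + 1], lines[i + 3])
            for i in range(0, len(lines), 4)]


# Toy reads, invented for illustration only.
reads = [("read1", "ACGT", "IIII"),
         ("read2", "GGCA", "IIHH"),
         ("read3", "TTAG", "HHII")]

# Tip: seed anything random, so two runs give the same result.
first = shuffle_records(reads)
second = shuffle_records(reads)

# Tip: choose your own temporary folder. SCRATCH is a hypothetical variable;
# the fallback to the system default exists only so the sketch runs anywhere.
scratch_root = os.environ.get("SCRATCH", tempfile.gettempdir())
with tempfile.TemporaryDirectory(dir=scratch_root) as workdir:
    # Tip: keep large text files compressed on disk.
    out_path = os.path.join(workdir, "shuffled.fastq.gz")
    write_fastq_gz(out_path, first)
    round_trip = read_fastq_gz(out_path)

print(first == second)      # is the shuffle reproducible?
print(round_trip == first)  # does the compressed file hold the same records?
```

Using a private random.Random(seed) instead of the module-level random.seed keeps the shuffle reproducible even when other code in the same process also draws random numbers.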









