Bioinformatics in the Post-Genomic Era
author | Jeff Augen |
pages | 388 |
publisher | Addison-Wesley Longman |
rating | 7 |
reviewer | Jose Nazario |
ISBN | 0321173864 |
summary | Genome, Transcriptome, Proteome, and Information-Based Medicine |
Bioinformatics is the science of biological information, namely sequences and metadata about organisms and sequences. What interests many people about the field, both inside the sciences and outside them, is the large volume of data that gets analyzed and the results that emerge on a daily basis. Beyond the obvious appeal of medical advances and a rapidly growing life sciences business, a complex field has developed over the past ten years or so. And following the sequencing of the human genome, new challenges have arisen for everyone involved. Augen's Bioinformatics provides a good introduction to this new field of research for students in the sciences and for anyone with a decent undergraduate education in modern biology. I think the accessibility of the material is one of the book's biggest winning points.
After an introduction to the book and the subject area of bioinformatics (chapters 1 and 2), Augen begins at the level of the structure of a gene (chapter 3). Here, anyone with an undergraduate-level understanding of genetics or molecular biology can begin using the book to bridge the gap to the new areas of modern bioinformatics. Augen then describes how basic sequence analysis is performed at the DNA sequence level (chapter 4). The material covers some of the higher-level methods for sequence analysis, including hidden Markov models, neural networks, and pattern discovery, and introduces some of the common algorithms used for this analysis.
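For readers unsure what "basic sequence analysis" looks like in practice, here is a minimal sketch of one standard textbook algorithm, Needleman-Wunsch global alignment scoring. It is not taken from Augen's book; the function name and the scoring values (match, mismatch, gap) are assumptions chosen purely for illustration.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the Needleman-Wunsch global alignment score of sequences a and b.

    The scoring parameters are illustrative assumptions, not values from the book.
    """
    # prev[j] is the best score for aligning the processed prefix of a with b[:j].
    # Aligning an empty prefix of a with b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Aligning a[:i] with an empty prefix of b costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("GATTACA", "GATTACA"))
    print(global_alignment_score("ACGT", "ACT"))
```

Only two rows of the dynamic programming table are kept, which is enough for the score; recovering the alignment itself would need the full table or a traceback structure.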
Chapter 5 then covers transcription, the process of going from DNA to mRNA. Beginning with the biology behind this activity (the ribosome and the larger "transcriptome"), Bioinformatics then describes how you would perform transcriptional analysis. Here, Augen shows how you go from a wet lab to a computational lab and describes what classes of experiments you perform to gather data and then what kinds of analysis you perform on it. This chapter introduces some of the more common clustering techniques for data aggregation and understanding.
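As a rough idea of the kind of clustering used on expression data, the sketch below groups genes by the similarity of their expression profiles using average-linkage agglomerative clustering. This is one common technique, not necessarily one the book presents; the function name, the gene names, and the numbers are all made up for illustration.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def average_linkage(profiles, k):
    """Merge the two closest clusters repeatedly until only k clusters remain.

    profiles maps a (hypothetical) gene name to a list of expression values.
    """
    clusters = [[name] for name in profiles]

    def cluster_distance(c1, c2):
        # Average pairwise Euclidean distance between members of two clusters.
        pairs = [(x, y) for x in c1 for y in c2]
        return sum(dist(profiles[x], profiles[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > k:
        # Pick the pair of cluster indices (i < j) with the smallest distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


if __name__ == "__main__":
    example = {
        "geneA": [1.0, 2.0, 3.0],
        "geneB": [1.1, 2.1, 2.9],
        "geneC": [9.0, 8.0, 7.0],
        "geneD": [9.2, 7.9, 7.1],
    }
    print(average_linkage(example, 2))
```

Real expression analyses often use a correlation-based distance rather than Euclidean distance, and a library implementation rather than this cubic-time loop; the sketch only shows the shape of the idea.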
The next step in the DNA -> RNA -> protein chain is found in chapter 6, which covers the translation process. Coupled with chapter 7, which describes protein structure prediction and searching, these two chapters bridge the next gap between laboratory data and computational analysis. Protein folding and structure analysis was one of my pet areas of study as a graduate student, and Augen's text gives a decent summary of the field to date. The resources listed and techniques described are definitely on par with the common practices in the field.
Finally, Bioinformatics gets into the next major area of bioinformatics, medical databases. Augen's bridge from genetics to medical science is complete, and he discusses how medical professionals utilize databases and can begin to predict disease, for example, based on data mining. The final chapter, "New Themes in Bioinformatics," covers exactly that, but also what Augen refers to as "workflow computing," or basically going about being a bioinformatics scientist. One of my favorite emerging areas in bioinformatics, metabolic pathway elucidation, is also covered briefly.
I've shared this book with a few friends who are all either studying computer science or practicing computer scientists. I did so because Augen's material explains my background and introduces them to some of the forms of analysis I use in my own work. It does a good job of that, and gets them quite excited. Bioinformatics really bridges a number of fascinating areas of computer science, including data mining and high performance algorithms. Augen's Bioinformatics is a good introduction to the field for them, and really for anyone who has taken a couple of biology courses in college.
Where the book falls short, however, can be grouped into two main areas. The first is the failure of Augen's presentation of the algorithms. While the methods used to describe computational algorithms in Bioinformatics are common for non-computer scientists, they're completely unusable for computer scientists who are used to a specific algorithm presentation style that looks more like pseudocode than rambling text. The ambiguities this presents for a technical reader are unfortunate, especially if anyone studying bioinformatics is supposed to be computer science literate. The book itself assumes a life science literacy, so this isn't an unreasonable expectation of the reader.
The second area that consistently falls short in the book is the utility of the information given. While I am significantly happier with the quality and depth of material presented in Augen's book than in the O'Reilly bioinformatics series, where the book fails to deliver is in showing the reader how to actually use the data they gather. After all, the book shows various sequence analysis algorithms and discusses tools available to do this work, but it only devotes a few pages (out of over 370 in total) to a workflow that can be used. Also, the book sometimes fails to point the reader at very worthwhile web resources, including meta sites like the SDSC Biology Workbench, and just says "some Perl scripts" for local data analysis. As such, you'll have to go a few extra miles on your own to make use of the data sources.
I guess a third complaint of mine is that Augen has ignored or omitted significant bodies of research that fit squarely into the scope of the book. For example, Ken Dill's research into protein folding models, as well as Martin Karplus' work on the subject, receives no mention, nor does the topic of Bayesian network analysis when Augen discusses time series data analysis. These aren't new; they've been around for many years and have influenced most of the field, and their absence is noted. The book's spotty coverage in some places, like these, is noticeable.
Bioinformatics does a few things well, but overall reads too much like a biology textbook to be useful to the average computer scientist. More emphasis on the practice of bioinformatics and data analysis would have made this book stronger and complemented the substantive background material well. Finally, using an approach more similar to the computer science approach would have been a tremendous benefit, since the material really is computer science in part. That said, I think this is probably the best introduction to this exciting area of science that I have yet seen.
You can purchase Bioinformatics in the Post-Genomic Era from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.
Post Genomics Era? (Score:2, Insightful)
Re:Post Genomics Era? (Score:5, Informative)
Genomics is now part of the game. It used to be that if you sequenced a gene, you could work a PhD off of it. Now that's simply the first step. So now that genomes are a part of everyday life science, if you don't know how to run BLAST, you had better get back to school.
Re:Post Genomics Era? (Score:2)
Re:Post Genomics Era? (Score:2)
Actually, the scope of structural biology has broadened considerably over the past couple of decades, and now membrane proteins and ribosomes are within reach. A number of groups worldwide are trying to apply high-throughput methodology to structure determination, hence "structural genomics." The real problem is that even for small proteins the structure determination process is still so
Re:Post Genomics Era? (Score:5, Funny)
"Gee! Gnomes!"
Thank you, I'm here all week. (Actually, I'm not. But I am feeling cheeky today.)
Re:Post Genomics Era? (Score:2)
Re:Post Genomics Era? (Score:3, Informative)
Yup, still got my genes.
Re:Post Genomics Era? (Score:1, Insightful)
Re:Post Genomics Era? (Score:4, Informative)
Re:Post Genomics Era? (Score:2)
Modern genetics doesn't have to actually be about anything as long
Re:Post Genomics Era? (Score:3, Informative)
I'd say that there is far more sequencing going on right now than ever before, in terms of total output. GenBank provides a nice growth summary [nih.gov] (note that the human genome was officially "completed" in 2003). It's just that we now have one nearly complete genome (human) and several largely complete, or getting there.
To me, "post-genomic" sounds
Re:Post Genomics Era? (Score:2, Informative)
We have far more than one completed genome! The human genome project gets the most publicity of course, but there are hundreds of bacteria, viruses and plants which have been sequenced; see http://www.ncbi.nlm.nih.gov/Genomes/index.html [nih.gov]. Many of these genomes have also been annotated by human curators - the so-called "meta information".
Re:Post Genomics Era? (Score:2)
And there is another distinction to be made: we keep talking about the human genome, whereas we are really only dealing with a human genome (or rather chunks of a few, with a lot more coverage for some specific sites). It will get a lot more interesting when we have access to thousands of human genomes (along with patient histories) - that will
It's all proteomics nowadays, post-genomics (Score:1)
I spend my day covered in protein sequences and worried about docking configurations and charges, quite frankly, working on drug design targets to help cure malaria and other nasty beasties.
Bioinformatics book recommendations (Score:2, Informative)
Lots of molecular biologists would say the same thing (perhaps not in the way you meant it). Francis Crick apparently thought genomics was way overhyped.
Seriously though, I sometimes wonder why anyone bothers writing another bioinformatics howto book when Durbin et al [amazon.com] (apologies for amazon link) is still unrivalled. Maybe also Felsenstein [sinauer.com] for phylogeny, MacKay [cam.ac.uk] for general probabilistic modeling... anyone recommend anything for the coalescent? Microarrays? Image analys
Why would you have left the field? (Score:3, Interesting)
Re:Why would you have left the field? (Score:1, Informative)
I hate to say it, but my opinion is that very few today are going to live to see the promise realized. The last polls of westerners I saw showed an almost universal dislike of the idea of genetic engineering. Have you ever heard the head of the US bioethics council speak? He's a nutjob who thinks humans are some sort of divine creation which stands apart from the animals. Any tinkering with our genetics is, to those who
we all have reason to oppose this (Score:2)
We generally have lots of flaws.
Once made-to-order humans become common, all of us existing people become obsolete. We'll be, at best, like chimps or gorillas in the new world.
Life wouldn't be grand for the new people either, because then human version 2.1 comes out, etc.
Re:we all have reason to oppose this (Score:2)
Re:we all have reason to oppose this (Score:2)
Consider the smartest non-insane person in the world. (one in 6 billion, with an IQ well above 300) Now fix any obvious defects, such as nearsightedness or a heart valve problem. So this is pretty much a clone of the brightest person to ever live, with the easy-to-identify flaws patched out. Tweak the appearance a bit (eye color, etc.) if desired.
Now imagine that there are lots and lots of people made like this. It's no longer 1 in 6
Re:we all have reason to oppose this (Score:2)
Re:Electronics/Computer Science isn't tapped out (Score:1)
A kid interested in biology will get taken to the psychologist if he takes the neighborhood squirrel apart. Sadly, it's when you're young that it's the best time to learn by tinkering, so if you do like bio, you'll only get to tinker in your 20s when you hit university and put mice in the blender.
Like the other poster mentioned
Re:TOC (Score:1, Informative)
Preface.
1. Introduction.
Overview.
Computationally Intense Problems: A Central Theme in Modern Biology.
Building the Public Infrastructure.
The Human Genome's Several Layers of Complexity.
Toward Personalized Medicine.
Illnesses are Polygenic.
New Science, New Infrastructure.
The Proactive Future of Information-Based Medicine.
2. Introduction to Bioinformatics.
Introduction.
The Emergence o
Important point: (Score:5, Insightful)
In bioinformatics, science literacy is so much more important than computer literacy. Computer scientists rarely become good bioinformaticians. This is the primary reason almost every single piece of commercial bioinformatics software is a complete piece of shit. And why the free stuff is hacky but gets the job done. The free stuff was written by life scientists, the commercial stuff was written by computer scientists with no domain knowledge of the question they were trying to answer.
Bioinformatics is not something you 'just get into.' And it is not a natural path to go from CS to bioinformatics.
Re:Important point: (Score:2)
BTW, the term is "bioinformaticists", not "bioinformaticians"
Re:Important point: (Score:2, Informative)
Actually, it is you who is wrong. In the world of bioinformatics, "bioinformatician" is more widely used than "bioinformaticist". By the way, I work for a bioinformatics company.
Re:Important point: (Score:2)
So...does your company produce crappy software or good software?
Re:Important point: (Score:1)
Most of our customers say good. The people who disagree do so mainly because it lacks a feature they want. Which is a huge problem with bioinformatics software - most do far too many things and none of them particularly well. A terrible side effect of this is that it makes them overly complex, often to the point of being usable only by a very experienced bioinformatician or very savvy computer user. As the nature of certain types of experiment
Re:Important point: (Score:2)
Hmm... Just posted a question about this in another thread. How about a Mathematics/Physics background?
I'm thinking I'm looking at about 2 years of furthe
Re:Important point: (Score:2)
Re:Important point: (Score:2)
This all seems so much more realistic if I was 20 instead of 30 though...
Thanks for your reply!
Re:Important point: (Score:3, Interesting)
I have to agree. If
Re:Important point: (Score:2, Insightful)
disagree (Score:3, Informative)
Mike
Re:disagree (Score:2)
Re:disagree (Score:2)
Programming and professional software engineering practices should really be vocational school territory, in my opinion. Still valuable--essential--for a devel
Re:disagree (Score:4, Insightful)
Well, there are at least two answers to that. The first is general: the idea that "programmers don't need to know all that theory" is, IMNSDGHO, largely responsible for all the crappy bloatware that the computing world has to deal with; if programmers spent more time learning real CS than the latest buzzwords, software would generally be much better than it is.
The second is specific to the topic of discussion: scientific programming, including bioinformatics, is much closer to the theoretical level than is most application programming. Pretty widgets don't matter nearly as much as the fact that you're dealing with complex operations on huge data sets, and if you write your program without any awareness of What's Really Going On, then your program will run like shit.
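To make that point concrete, here is a small hypothetical sketch (the function names are mine, not from any particular tool) of two ways to count k-mers in a sequence. Both give the same answer, but the first makes one pass with a hash table, while the second rescans the whole sequence once per distinct k-mer, which gets painful on genome-sized input.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every length-k substring in a single pass over the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def count_kmers_naive(sequence, k):
    """Same counts, but rescans the full sequence for each distinct k-mer."""
    positions = range(len(sequence) - k + 1)
    distinct = {sequence[i:i + k] for i in positions}
    return {
        kmer: sum(1 for i in positions if sequence[i:i + k] == kmer)
        for kmer in distinct
    }


if __name__ == "__main__":
    seq = "ATATATGC"
    print(dict(count_kmers(seq, 2)))
    print(count_kmers_naive(seq, 2))
```

The naive version looks harmless on a test string; the difference only shows up once the number of distinct k-mers grows with the data, which is exactly the situation in sequence analysis.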
Re:disagree (Score:3, Insightful)
Re:disagree (Score:2)
Re:disagree (Score:1)
Absolutely true, and I am proof. A large chunk of my PhD is the results I got from a VERY poorly coded Perl script that I wrote after reading "Teach Yourself Perl in 24 Hours". Had some C background, so that helped, and I eventually learned enough Perl/Tk to code up a UI.
Re:Important point: (Score:3, Interesting)
I had 5 years under my belt of lab work at MIT, and was learning programming again (I took AP comp-sci in high school, and had decided to learn some programming for the hell of it with friends who were working in the industry). There was call at work for me to automate some of the analysis that I needed to do.
Doing some simple tests like a TDT (yeah, I like population genetics) by hand took a long time, and was error-prone. I used a bit of my p
Re:Important point: (Score:1)
Re:Important point: (Score:1)
Then why are you on
Hmmm. Not sure. Because the converse of my statement is surely: Bioinformatics people have no interest in programming, linux, etc. Actually, it's a little known fact that all bioinformatics is still done with pencil and paper (we can't even use calculators, the Patriot Act forbids it).
Give me your poor, your tired, your Bioinformatics (Score:1)
Shhh. Don't mention that. Next thing you know Congress will outlaw our Sliderules and Pencils.
It's hard enough using Polaroids to take pictures of the gels when we PCR
commercial bio software sucks because (Score:2)
Re:Important point: (Score:1)
Re:Important point: (Score:1)
There's a real lack of well engineered bioinformatics software. Most of what's there is quick-and-dirty one-off hackery that got entrenched as standard practice.
Like computer science, though maybe for different reasons, biology attracts personalities that don't play nice with others. That's the real problem. Because, in order to build bioinformatics software that is both well engineered and actually useful, skills from a lot of disciplines will b
random quote (Score:2, Insightful)
Huh? (Score:2)
Too broad in scope (Score:5, Informative)
Having been in the field for 5 years or so, and matriculating for my PhD next year, I know something about the subject. Unfortunately, the subject "bioinformatics [wikipedia.org]" is way too broad to ever make for a good book.
For example, applying for PhD programs, I found myself looking at program names such as: Biophysics, Bioinformatics and Integrative Genomics, Biomedical Informatics, Computational and Systems Biology, and of course Bioinformatics. And the terms meant something different to each professor I spoke to, and are still changing over time. Biomedical informatics definitely implies medical databases and EMRs (electronic medical records), while Biophysics implies more of a, well, physical approach (x-ray crystallography, cell movement and membrane forces).
But Bioinformatics and computational biology encompass them all--including other topics such as protein folding, genomics, proteomics, sequence alignment, paper-mining, evolution. Each of these touches on a vastly different aspect of biology and/or computer science and to different degrees. A good book (and plenty long enough for a textbook, I assure you) could be written on any single sub-subject. A book titled bioinformatics isn't going to be worth your while.
My 2 cents and rant. Thanks for bearing with me
Re:Too broad in scope (Score:3, Informative)
Quite a diverse collection, really.
Re:Too broad in scope (Score:3, Insightful)
Basically, someone like myself might not be too knowledgeable
Re:Too broad in scope (Score:1)
That's how we do those association studies. True, we don't have the immediate goal of improving medical care, but we manage huge sets of data. Reference: I've got a data warehouse that has over 550M rows of data in it. That's essenti
Re:Too broad in scope (Score:1)
Down here in Melbourne, Australia we tend to refer to them as bioinformaticians for unknown reasons
Should I go into Bioinformatics? (Score:4, Interesting)
Perhaps it's the DB admin that's getting to me, but I've enjoyed being able to work with enormous data sets and putting puzzle pieces together.
It's a big leap. I'm 30. I only have first year chemistry under my belt (no university level biology) and having kids, a mortgage and my own health and sanity to take into account, it seems an enormous career change.
I've started to look into the field by checking out a couple dozen books on the subject from my university library. (I've since whittled the pile down to just a few books!) I'm plodding along and what I've read to date is really intriguing, even if I'm taking a bizarre Math approach to understanding genetics.
I'm concerned that I have a naive approach to the field: looking at genomics, proteomics and bioinformatics as the biggest and coolest LEGO puzzle ever devised. Yet most books (especially the "Programming for Bioinformatics" types) seem to focus solely on data storage and not actually *using* the data.
Has anyone else here moved from Computing or Mathematics into Bioinformatics? Was the experience what you expected?
Re:Should I go into Bioinformatics? (Score:1)
I just switched myself, with a DA/DBA post-grad certificate, into Bioinformatics, but I had four years of Latin, had worked in Health Care for four years, and had University-level Biology and Chemistry. The one thing you'll really need is stronger Biology.
You could take some audit courses in Biochemistry and Biology, of course. That might help.
All the acronyms will drive you crazy, but the field is so specialized that if you study hard you migh
Re:Should I go into Bioinformatics? (Score:1)
Re:Should I go into Bioinformatics? (Score:1)
Do you have any suggestions of companies or organizations that write comp bio software? I'd love to find a need and start my own business writing software but I'm not sure how to break into the field.
Re:Should I go into Bioinformatics? (Score:1)
Re:Should I go into Bioinformatics? (Score:2)
Re:Should I go into Bioinformatics? (Score:3, Informative)
What you need to pick up really depends on what kind of work you want to do in the field. There are absolutely people with little understanding of biology all over. They typically do things like optimize and translate code or tweak algorithms for biologists. To move up to more interesting problems, though, you'll have to teach yourself quite a bit of biology and chemistry.
My advice is to start with the basics. Pick up a college-level Intro to Biology textbook and learn the relevant stuff: Biologica
Re:Should I go into Bioinformatics? (Score:2, Interesting)
I'm a 21-year-old CS student who just applied for a double major in Molecular and Cell Biology (MCB), getting into computational biology, and I will say that knowledge of molecular biology and genetics is a must. The people here at Berkeley know their introns and promoters and amino acid interactions, along with (what seems to be) a foundation in statistics and probability. They're juggling enormous data sets to figure out, "What's the probability that alanine is in this protein family?" And sometimes I fe
Re:Should I go into Bioinformatics? (Score:4, Insightful)
My experience is that formal training in biology and chemistry cannot hurt, but they're not mandatory.
I have degrees in Comp Sci & Math (like a double major in US), but nothing beyond an introduction to biology and chemistry. I have a good understanding of what I know in biology and chemistry, but I'm just a novice in these areas.
I hold a PhD in CS, with a thesis on bioinformatics. I am fairly active in the area, so my experience might be relevant.
Over the years I found that the only necessary skills are good communication and some mathematical intuition. Programming skills are useful, but marginally so. One good idea easily compensates for ten top programmers. I am a good programmer, with years of practice and a few projects of at least 50,000 lines (some published under GPL). So don't think I'm bashing coders because I'm not good at it myself.
However, I always found that the most successful projects followed from good communication between the modellers and the biologists. As long as they were able to tell each other what they wanted and where things weren't going well, all went beautifully.
The quality of the code was a side issue, discussed only when we didn't have anything else to say.
There were some pitfalls I encountered over time, too.
Modellers thinking they understood everything, and that they could do everything on their own. Usually they produced beautiful theories, without much practical application or success.
Biologists thinking the modellers were trying to devise programmes that would replace them. They generally sneered at our projects and went back to staring at some experimental results, hoping they could sift through thousands of rows in Excel. It rarely worked.
Overly complex programme design because some programmer decided it was useful to use the latest buzzword technology. Usually this failed because it actually wasn't necessary to make the project so complex.
As for the available literature, there are some books that deal with the problems and solutions in the field. One such example would be "Bioinformatics" written by Baldi & Brunak. Another would be "Molecular modelling" by Alan Hinchliffe.
I found these geared more towards presenting the problems at hand, and some of the existing algorithms.
So, all in all: one can work in bioinformatics without much training in the life sciences. Some general knowledge is necessary, although mostly to allow communication with the experts in biology or chemistry.
From a social perspective, a somewhat modest attitude (not humble, just know your limitations!) is also important, because it facilitates communication. A positive attitude towards group work is also necessary, since I really cannot see anyone being able to do such research alone.
Re:Should I go into Bioinformatics? (Score:2)
First, there has been a huge, huge explosion of data, and the bio community was really not prepared, and so simply getting an understanding of what a real DB is, and how to set them up and so forth, took a while.
Let me tell you a true story: This guy tells me, I used to work on parvovirus (it's not important what parvoviruses do) and I am looking in GenBank, and this PV seq is in th
Wait, I just read about this .. (Score:2)
http://www.chaosmatrix.org/library/humor/pshift.t
Looks like this guy has a newer version, I don't see a "bioinformatics" option.
I liked thisbetter - Bioinformatics: practical gde (Score:2, Interesting)
Had much better sections in the third edition, which I got fresh out of the UW Library when it came in, on PSI-BLAST and BioPerl and suchlike.
The only downside to a textbook in our field is that half the database practical sections become out of date within a year or two.
University Of Manchester Bioinformatics (Score:2)
http://www.bioinf.man.ac.uk/education/MSc.shtml#c
Change Terms Please! (Score:1)
I never understood why people think it's special. We used to call these run-time studies, search algorithms, etc. "Computer Science", or maybe just "Informatics".
It seems that biologists decided to learn Perl, and discovered (on their own, maybe!) that you could use it to search these sequence files they generate. Suddenly, they decided they needed to create this entire new field, totally ignoring all of the CS research before them.
It shows in the so
Re:Change Terms Please! (Score:1)
I'm approaching this from the other side, I'm a biologist not a coder.
What I'm working on right now is alignments of RNA secondary structures. Since this is a relatively new idea there is no really polished software to do this yet.
Some of the stuff I experienced in the last few days:
RSMatch:
http://http//aria.njit.edu/rnacenter/RSmatch/ [http] Chokes on lower case letters in the sequence files. Most amusingly it does that when it encounters one, meaning it will happily do seed alignments for 45 minutes then f
Re:Change Terms Please! (Score:1)
I'm constantly writing wrappers around things to make them sane, and re-implementing stuff in a hopefully more useable way. Now if only they'd let me BSD licence the results.
Now if I only had time to work on a consed [washington.edu] replacement, like I've wanted to do for quite a while. That is the most unholy piece of software I've ever seen... "../chromat_dir" and "../phd_dir" are HARD CODED in the source!
I need to think of a way to convince the higher-ups to let us hir
Re:Change Terms Please! (Score:1)
Re:Change Terms Please! (Score:1)
Re:Change Terms Please! (Score:1)
You mean "BLAST"?
Of course it doesn't have consistent returns - it's a search of the known entries - people are entering new data every second.
Biology and Biochemistry hold still for no man. Or woman.
Nah, it's the CS folk who coined the damn name (Score:2)
Re:Nah, it's the CS folk who coined the damn name (Score:1)
ANOTHER thing wrong with this industry I can blame on dotcom!
Woosh! The sound you hear is my sanity slipping away...