Beginning Perl for Bioinformatics

List Price: $39.95 Your Price: $26.37

Reviews

Rating: 4 stars

Summary: Good intro for biologists; poor intro for computer scientists
Review: "Bioinformatics" is the new sexy term for what used to be called simply "computational biology". Simply put, it involves pretty much any application of computational techniques to biological problems. The reason for the new nomenclature and the greatly increased interest in the topic is, like much in modern biology, a more-or-less direct consequence of the many genome sequencing projects of the last decade.

The consensus in the field seems to be that it's more productive (and certainly easier) to teach biologists how to program, rather than try to get programmers up to speed on the intricacies of molecular biology. For similar reasons, Perl is a popular language to learn: it's easy to get off the ground and be productive with it, without requiring a heavy computer science background. (This, of course, has downsides as well...)

Never one to miss out on a trend, I'm going to be teaching a course on Bioperl and advanced Perl programming, starting next fall, which means I'm doing a lot of reading in this topic area, trying to develop lectures and find good background reading material. One of the first books I grabbed was _Beginning Perl for Bioinformatics_, which has been sitting on my "to read" shelf since O'Reilly sent me a review copy in December of 2001. It's a typical O'Reilly "animal" book (the cover bears three tadpoles), which does a decent job of introducing the basic features of the Perl language, and it should enable a dedicated student to get to the point where she can produce small useful programs. However, I'm not completely happy about the book's organization, and I think the occasional "if you're not a biologist, here's some background" interjections could have been cut without hurting anything.

The initial chapters in the book cover "meta" information, such as theoretical limits to computation, installing (or finding) the Perl interpreter on your computer, picking a text editor, and locating on-line documentation. Some general programming theory stuff is covered as well -- the code-run-debug cycle, top-down versus bottom-up design, the use of pseudocode. There's also some biology background, but it's very introductory level stuff -- DNA has four bases, proteins are made of 20 amino acids, and so on.

In chapter four, the book begins to get into actual Perl, with some coverage of string manipulation. Examples deal with simulating the transcription of DNA into RNA. Chapters five and six continue to flesh out the language, covering loops, basic file I/O, and subroutines. Chapter seven introduces the rand() function, in the context of simulating mutations in DNA. Subsequent chapters introduce the hash data type (using an RNA->protein translation simulation), regular expressions (as a way to store the recognition patterns of restriction endonucleases), and parsing database flat files and BLAST program output.

I'm clearly out of the target audience of the book, as I already have a strong working knowledge of Perl. Perhaps that's why I found the order that concepts were presented in to be a bit strange -- for example, hashes, which are a fundamental data type, aren't introduced until halfway through the book, and regular expressions (one of the key features of Perl) first appear even later. As I said above, I also found the biological background sections to be more distracting than anything, but I've also got a strong biology background, so perhaps I'm off base here too. That said, I think a person with a CS background would be better served with a copy of _Learning Perl_ and an introductory molecular biology text than with this particular book.

One of the things I did enjoy about the book was the frequent coding examples, all of which presented realistic computational biology sorts of problems and then demonstrated how to solve them. I'm sure that when I get around to writing lectures, I'll be leafing through this book looking for problems I can use in class.

Overall, recommended for biologists without programming experience who would like to get started using Perl for simple programming. Not recommended for people with computer science backgrounds looking to get into bioinformatics.

Rating: 4 stars
Summary: Decent intro to the subject
Review: As the banner above the title of James Tisdall's Beginning Perl for Bioinformatics indicates, this book is 'an introduction to Perl for biologists.' What the banner doesn't mention is that it's also an introduction to biology and bioinformatics for Perl programmers, and it's also an introduction to both Perl *and* biology for people that have never really been exposed to either field. The author has clearly thought a lot about making one book to please these different audiences, and he has pulled it off nicely, in a way that manages to explain basic topics to people learning about each field for the first time while not coming off as condescending or slow-paced to those that might already have some exposure to it.

Superficially, this book isn't all that different from a lot of introductory Perl books: the Perl material starts out with an overview of the language, followed by a crash course on installing Perl, writing programs, and running them. From there, it goes on to introduce all the various language constructs, from variables to statements to subroutines, that any programmer is going to have to get comfortable with. Pretty run of the mill so far. Tisdall starts with two interesting assumptions, though: [1] that the reader may have never written a computer program before, and so needs to learn how to engineer a robust application that will do its job efficiently and well, and [2] that the reader wants to know how to write programs that can solve a series of biological problems, specifically in genetics and proteomics.

As such, there is at least as much material about the problems that a biologist faces and the places she can go to get the data she needs as there is about the issues that a Perl programmer needs to be aware of. The author introduces the reader to the basics of DNA chemistry, the cellular processes that convert DNA to RNA and then proteins, and a little bit about how and why this is important to the biologist and what sorts of information would help a biologist's research. The main sources of public genetic data are noted, and the often confusing -- and huge -- datafiles that can be obtained from these sources are examined in detail.

With the code he presents for solving these problems, Tisdall makes a point of not falling into the indecipherable-Perl trap: this is a useful language, well-suited to the text-analysis problems that make up so much of bioinformatics, and he doesn't want to encourage the kind of dense, obscure, idiomatic coding style that has given Perl an undeservedly bad reputation. Some of Perl's more esoteric constructs are useful, and they show up when they're needed, but they're left out when they would only serve to confuse the reader. This is a good decision.

Rather, the focus is on teaching readers how to solve biological problems with a carefully developed library of code that happens to leverage some of Perl's most useful properties. The result is pretty much a biologist's edition of Christiansen & Torkington's Perl Cookbook or Dave Cross' Data Munging With Perl. The author presents a series of issues that a working bioinformaticist might have to deal with daily -- parsing BLAST, GenBank, and PDB files, finding relevant motifs in the parsed data, and preparing reports about all of it. If a bioinformaticist's job is to be able to report on interesting patterns from these various sources, then following the programming techniques that Tisdall explains in clear, easy-to-follow prose would be an excellent way to go about doing it.

And when I say "programming techniques," note that I'm not specifically mentioning Perl. The code in this book is clear and organized, and all programs are carefully decomposed into logical subroutines that are then packaged up into a library file that each later sample program draws from. Each new program typically contains a main section of a dozen lines of code or fewer, followed by no more than two or three new subroutines, along with calls to routines written earlier and stored in the BeginPerlBioinfo.pm module that is built up as the book progresses. Each sample is typically preceded by a description of what it's trying to accomplish and followed by a detailed description of how it was done, as well as suggestions of other approaches that might or might not have worked.

This modular approach is fantastic -- too many Perl books seem to focus so heavily on the mechanics of getting short scripts to work that they lose sight of how to build up a suite of useful methods and, from those methods, to develop ever-more-sophisticated applications. It isn't quite object-oriented programming, but that's clearly where Tisdall is headed with these samples, and given a few more chapters he probably would have started formally wrapping some of this code into OO packages.

If I have a complaint with the book, in fact, it's that Tisdall doesn't go any further: everything is good, but it ends too soon. Seemingly important topics such as OO programming, XML, graphics (charts & GUIs), CGI, and DBI are mentioned only in passing, under "further topics" in the last chapter. I also have a feeling that some of the biology was shortchanged, and the book barely touches upon the statistical analysis that is probably a critical part of the advanced bioinformaticist's toolbox. I can understand wanting to keep the length of a beginner's book relatively short, and this was probably the right decision, but it would have been nice to see some of the earlier sample problems revisited in these new contexts by, for example, formally building an OO library, showing a sample program that provides a web interface to some of the methods already written, or presenting code that outputs results as XML or exchanges them with a database.

But these are minor quibbles, and if the reader is comfortable with the material up to this point, she shouldn't have a hard time figuring out how to go a step further and do these things alone. It's a solid book, and one that should be able to get people learning Perl, genetics, or both up to speed and working on real world problems quickly.

Rating: 5 stars
Summary: Very timely introduction to PERL
Review: Finally someone has written a beginning book on PERL for biologists, and has also done an excellent job of doing so. This book assumes no prior programming experience, and therefore suits the biologist who needs to concentrate on using computers to solve biological problems, and not have to become a computer scientist in the process. PERL can be a very cryptic language, but it is also extremely concise, and PERL programmers frequently and rightfully boast about their "one-liners" that accomplish complicated tasks with only one line of code.

Since it is addressed to readers with no programming experience, the author introduces some elementary programming concepts in the first three chapters. These include which text editor to use, how to install PERL, how to run PERL programs, and other relevant elementary topics.

The author then gets down to writing a program to store a DNA sequence in chapter 4. It is very basic (it merely reads in a string and prints it out), but it serves to start readers on their way to developing more useful programs. Later a program for the transcription of DNA to RNA is given, which nicely illustrates the binding, substitution, and transliteration operators. Block diagrams are used here, and throughout the book, to illustrate basic PERL operators. The author shows in detail how to read protein sequence data from a file and how to use it in a PERL program. The reader is also introduced to the most ubiquitous data structure in all of computing: the array. Already the reader gets a taste of the power of PERL to manipulate arrays, using operations such as 'unshift', 'push', 'splice', etc.

The next chapter introduces conditional statements in PERL, as a warm-up for the discussion on finding motifs in sequences. The reader can see why PERL is the language of choice in bioinformatics, with its ability to find substrings or patterns in strings. Things do become more cryptic in the discussion of regular expressions, but the reader can get through it with some effort. Interesting programs are given for determining the frequency of nucleotides.

Since the programs have become more complicated by this point, a discussion of subroutines follows in the next chapter. For the same reason, the reader is also introduced to debugging in PERL in this chapter. The greater the complexity of the program, the harder it becomes to avoid making mistakes, and the more difficult it is to find them. The very important concepts of pass by value versus pass by reference are discussed briefly in this chapter.

Random number generators, so important in any consideration of mutations, are discussed in chapter 7. It is shown, via some straightforward programs, how to select a random location in DNA and replace the nucleotide there with a different one. In addition, the author shows how to use random numbers to generate DNA sequences and mutate them in order to study the effect of mutations over time.

The next chapter is the most interesting in the book, for it shows how PERL can be used to simulate how the genetic code directs the translation of DNA into protein, with the hash data structure used extensively for this purpose. The author shows how to read DNA from files in FASTA format and discusses reading frames in detail. He gives a useful subroutine to translate reading frames.

The author returns to regular expressions in chapter 9, wherein they are used as 'wildcards' to search for a particular string in a collection of strings. In addition, the range operator is used to find restriction sites. Regular expressions are also used in the next chapter to manipulate GenBank 'flat files'. The author does, however, give URLs for more sophisticated bioinformatics software. This is followed in chapter 11 by a discussion of the use of PERL to work with files in the Protein Data Bank. Recursion, one of the most powerful techniques in programming, is introduced here.

Chapter 12 covers the Basic Local Alignment Search Tool (BLAST), wherein readers get a taste of the field of computational biology. This extremely popular software package is used to find similarity between a given sequence and a library of known sequences. The author does discuss some of the rudiments of string matching and homology, and encourages the reader to consult the BLAST documentation for further details. In addition, the author briefly discusses the Bioperl project in this chapter, and shows the reader how to run some elementary computations using it.

This book is definitely a timely one, and it will serve the needs of biologists who need to obtain some programming expertise in PERL. There are helpful exercises at the end of each chapter that serve to solidify the understanding of the concepts introduced in the chapter. After a thorough study of it, readers will be well-equipped to use PERL in bioinformatics. Readers with more mathematical background will, after finishing it, be able to enter the exciting field of computational biology, a field that is exploding and one that will require imaginative programming skill in the future.

Rating: 5 stars
Summary: Good Start
Review: I am a biologist who has written 'Hello World' in innumerable programming languages and progressed no further. No matter how many different books I buy and put on my shelf I still haven't learned to program. After removing this book from the Perl section of my shelf and spending two days with it I wrote a script that turns an amino acid sequence into a degenerate oligo (DNA) and prints its reverse complement. Certainly not bioinformatics but I impressed myself. This was because simple things that any programmer would know, but I do not, are explained in detail. Thank you. Thank you. Thank you.

Because of this book I am confident that I will now be able to do many of the simple sequence manipulations my current projects require. However and but. A cursory glance at the innards of 'Programming Perl' is enough to show just how much more there is to Perl.

The title is not misleading but it could have been 'Beginning Perl for Beginning Bioinformatics'. Ain't no Hidden Markov models here.

Five stars this time, but next time there better be MORE in the same style.

Rating: 5 stars
Summary: popular for a reason
Review: I checked this book out from the school library, and had to put a recall notice on it to get it; then someone else put a recall notice on it to get it back from me. It's popular for a reason: it's an excellent primer, and I've decided to just buy a copy for myself.

Rating: 5 stars
Summary: No need to have any previous programming knowledge
Review: I had zero programming experience when I started reading this book. It allowed me, step by step, to get familiar with the language and start writing programs related to the field I am interested in.
It is fun and very helpful. You don't feel the frustration of being lost in the middle of unreadable code. The comments and explanations to the programs are great. It allows you to start learning the simple things first and then, as you get familiar with the language, go into more detail.
You can choose, as the author suggests, to go to the Perl documentation from time to time and read about the operators or functions introduced in the different programs; but what is great about the book is that you are given examples and exercises to use them. This is really the way to learn.

Rating: 3 stars
Summary: OK tutorial. Poor reference.
Review: I have used this book in a beginning Perl programming course for biology majors. While it is good if you sift through it from start to the end, I often found it impossible to find things when I needed to go back to remind myself of something. The index does not help, and there is no concise language reference anywhere.

Also, I do not like the fact that it uses "quick and dirty" Perl (no "use strict" pragma). While skipping it might be less confusing at the very beginning, students very soon start to waste too much precious class time trying to locate bugs (e.g. mistyped variable names) that "use strict" would have caught at compile time.

Rating: 5 stars
Summary: Good source
Review: I liked this book because I had very little background in programming (aside from a semester of C++ a long time ago) and it wasn't too overwhelming. The exercises were great and the programming was explained fairly well.

I did a lot of bioinformatics work (line plots, BLAST searches, etc.). The book was great for teaching programming that would be useful for these applications, rather than a lot of other miscellaneous programming that I would never really need.
