Rating: Summary: Still a useful source of information Review: This book, originally published in 1983, was reissued in 1999, no doubt because of the importance of genetic sequencing in recent years. What is neat about the book is that it shows how algorithms from one field can be applied to problems in another, possibly quite disparate, field; one example is the overlap between computational linguistics and the sequence algorithms of computational biology. Chapter 1 gives a general overview of sequence comparison, with applications to molecular biology, human speech, computer science, coding theory, gas chromatography, and bird songs. The author discusses how deletion-insertion, compression-expansion, and substitution are employed in sequence comparison. Different metrics are introduced, such as the Levenshtein distance. Dynamic programming, which pretty much dominates the book, is also introduced here.

Part 1 of the book discusses sequence comparison in molecular biology. The use of dynamic programming is emphasized, and its importance continues to this day. The advantages of the dynamic programming method are outlined, and it is shown how to find the substring of a longer sequence that best agrees with a shorter sequence. In addition, given an RNA molecule with a known nucleotide sequence, methods based on dynamic programming are discussed for predicting how different parts of the molecule will bond to each other. Mathematicians considering research in the field, or about to enter it, will profit from the section on the biological background. The treatment of RNA secondary structures is excellent.

In part 2, the emphasis is on speech processing and what is called "time-warping", a technique for comparing functions by altering the time axis. An interesting application is the comparison of bird songs: an algorithm is given for adjusting the time scales of two songs so as to bring them into optimal alignment.
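For readers who have not seen time warping before, the basic recurrence can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the algorithm as the book presents it: the function name `dtw_distance`, the absolute-difference local cost, and the idea of representing a song as a list of numbers are all hypothetical choices.

```python
def dtw_distance(a, b):
    """Dynamic time warping cost between two numeric sequences.

    Illustrative sketch: the local cost is |x - y|, and either sequence
    may be stretched by repeating an element at no extra charge beyond
    the local cost of each matched pair.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of warping a[:i] onto b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only (b[j-1] is reused)
                cost[i][j - 1],      # advance in b only (a[i-1] is reused)
                cost[i - 1][j - 1],  # advance in both sequences
            )
    return cost[n][m]
```

With this sketch, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is zero, because holding the middle value for one extra time step is absorbed by the warping rather than counted as a difference.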
The differences between compression and expansion, on the one hand, and deletion and insertion, on the other, are also discussed in this part.

In part 3, a modified Smith-Waterman algorithm is employed to find similar portions of two sequences. This is called local alignment in computational biology, and the book shows in detail how to define the recurrences for the alignment and how to keep track of the pointers for backtracking. This part also generalizes the operations of substitution and the Levenshtein distance. The strategy of allowing transpositions in sequence comparison is discussed as well. Such a strategy entails a generalized concept of trace, in which trace lines can intersect each other, so that the traces become entangled into knots or plaids. The usual dynamic programming techniques must then be extended to deal with these complications. One particular algorithm for this, called CELLAR, is discussed; it involves the construction of a directed graph whose paths correspond to admissible sequences of cuts, which are generalizations of traces. The computational complexity of this algorithm is discussed. An O(n^2/log n) algorithm is also given for computing string-edit distances.

The last part of the book deals with comparisons between random sequences. Combinatorial arguments are used to derive upper bounds on the expected length of the longest common subsequence of two random sequences. Other miscellaneous results on the common subsequences of two random sequences are given.
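To make the local alignment idea from part 3 concrete, here is a small Python sketch. It is a plain textbook Smith-Waterman with a linear gap penalty, not the modified version in the book, and the name `local_align` and the scoring values (match 2, mismatch -1, gap -1) are assumptions chosen only for illustration. Rather than storing pointers, the traceback re-derives each move from the score table, which is a simplification.

```python
def local_align(s, t, match=2, mismatch=-1, gap=-1):
    """Return (best_score, aligned_s, aligned_t) for the best local alignment.

    Illustrative sketch with hypothetical scoring parameters.
    """
    n, m = len(s), len(t)
    # score[i][j] = best score of a local alignment ending at s[i-1], t[j-1]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            score[i][j] = max(
                0,                        # start a fresh alignment here
                score[i - 1][j - 1] + sub,
                score[i - 1][j] + gap,    # gap in t
                score[i][j - 1] + gap,    # gap in s
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_s, out_t = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if s[i - 1] == t[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_s.append(s[i - 1]); out_t.append(t[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_s.append(s[i - 1]); out_t.append("-")
            i -= 1
        else:
            out_s.append("-"); out_t.append(t[j - 1])
            j -= 1
    return best, "".join(reversed(out_s)), "".join(reversed(out_t))
```

Under these scoring assumptions, `local_align("xxACGTyy", "zzACGTww")` picks out the shared `ACGT` region and ignores the unrelated flanks, which is exactly what distinguishes local from global alignment.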