Rating: Summary: A good but shallow book Review: Among my commercial data mining friends, this book is considered to be the bible. It is worth having just to assess the mindset of day-to-day data miners. The book discusses many data mining issues in more depth than most earlier works on the subject. However, it still lacks the depth and counsel of, say, books on applied multiple regression (cf. Draper and Smith) that give guidance on when a particular method may give false results or how bogus results can be detected a posteriori.
Rating: Summary: data mining through the eyes of statisticians Review: Data mining is a field developed by computer scientists, but many of its crucial elements are embedded in important and subtle statistical concepts. Statisticians can play an important role in the development of this field, but, as was the case with artificial intelligence, expert systems and neural networks, the statistical research community has been slow to respond. Hastie, Tibshirani and Friedman are changing this. Friedman has been a major player in pattern recognition for high-dimensional data, tree-based classification, regularized discriminant analysis and multivariate adaptive regression splines. He has also done some exciting new research on boosting methods. Hastie and Tibshirani developed generalized additive models, a very general class of regression models. Tibshirani invented the lasso and is a leader among researchers on the bootstrap. Hastie invented principal curves and surfaces. These tools, together with the authors' expertise, make them natural contributors to advances in data mining. They see data mining from the statistical perspective, as part of a more general process of statistical learning from data. The book is well written and illustrated with many pretty color graphs and figures. Color adds a dimension in pattern recognition, and the authors exploit it in this book. It is really the first book of its kind to treat data mining from a statistical perspective so comprehensively and in such an up-to-date way.
Under the category of supervised learning, the important statistical tools covered in this book include regression, discriminant analysis, kernel methods, model assessment and selection, bootstrapping, maximum likelihood and Bayesian inference, additive models, classification and regression trees, multivariate adaptive regression splines, boosting, regularization methods, nearest-neighbor classification, k-means clustering algorithms and neural networks. These methods are illustrated using real problems. Similarly, under the category of unsupervised learning, clustering and association rules are covered, along with the latest developments in principal components and principal curves, multidimensional scaling, factor analysis and projection pursuit. This book is innovative and fresh. It is an important contribution that will become a classic. The level is between intermediate and advanced, making it well suited to an advanced special-topics course for graduate students in statistics. The only comparable text is the one by Mannila, Hand and Smyth, which I hope to review in the near future.
Rating: Summary: Useful book on data mining Review: I use data mining tools in my financial engineering and financial modeling work, and I have found this book to be very useful. This book provides two crucial types of information. First, it provides enough theory to let a potential user understand the essential insights that motivate specific techniques and evaluate the situations in which those techniques are appropriate. Second, the book gives the exact algorithms for implementing the various techniques. While no book I have seen covers every data mining methodology available, this one has the strongest coverage I have seen of additive models, non-linear regression, and CART/MART (regression/classification trees). It also has very strong coverage in many other areas. I highly recommend it.
Rating: Summary: Covers many topics briefly Review: I was already familiar with many of the topics covered in this book, but I had to do a double take when reading about familiar concepts. Unfortunately, the authors' unique perspective is not presented in a way that is beneficial to the reader. I would strongly suggest another book as a reference or introduction to this material.
Rating: Summary: Do not buy a book without flipping through it. Review: That's what happened to me with this book. Conclusion: DO NOT BUY ANY BOOK, EVER, WITHOUT FLIPPING THROUGH IT. OTHERS' REVIEWS ARE NOT RELIABLE. This is the third time I have bought a book based on others' reviews, and I regretted it.
Rating: Summary: The Elements of Statistical Learning Review: The book by Hastie, Tibshirani and Friedman is a welcome addition to the quickly growing area of machine learning and data mining. This is a well-written book, laid out nicely, with excellent examples, by three well-established researchers in the field. It will be helpful to those who are interested in learning about this field, as well as to experts who want to know more. My only complaint is that although the authors do make an honest attempt to clearly highlight methods that are based on their own research, this distinction often becomes cloudy, and the reader is left with the impression that the methods advocated are the best and represent the standard in the industry. In fact, many of their ideas are only heuristic, and it is more than conceivable that these will eventually be superseded by better methods. A good book, which gets you up to speed on the literature, but it will only be relevant for a few years.
Rating: Summary: The Elements of Statistical Learning Review: The book is written by some of the biggest names currently in the field and is thus written at a certain level; this isn't a fault of the book or the authors, but rather a reflection of the specific audience it was written for. However, I did find it odd that they would occasionally explain basic, well-known notation, but later assume the reader is familiar with what I would regard as advanced notation, or leave out quite a few steps in their mathematics, assuming the reader understands what they did. This book covers a wide range of techniques, from the more traditional to the current, and for each topic it presents an overview of the technique and provides adequate references for further exploration.
The reader should have a good underlying understanding of linear algebra, statistics and probability theory, and should also be familiar with the techniques presented here. This book was used in a graduate engineering data mining class, and most of us struggled greatly with it. It probably would have been more appropriate as a supplement to another text, or if this had not been our first exposure to the topics presented. Using this book to learn neural networks, support vector machines and the like when you've never seen them before makes for a very bewildering experience, but once you find a few journal articles, the techniques are actually fairly easy to understand.
The book does not explain how to implement any of the techniques in software; this topic is left to other books, such as Modern Applied Statistics with S by Venables and Ripley. Only in their discussion of the Apriori algorithm for association rules did I see them mention a software package. It would have been nice if they had given some insight into how they created the great graphics that punctuate the book, perhaps as additional material on the website.
A book that is more down to earth for engineers, albeit different in scope, would be Duda and Hart's Pattern Classification; I believe its authors are electrical engineers, and it is written more from an engineering standpoint. In addition, the Duda and Hart book gives a lot of applications-based problems and has an associated MATLAB handbook that walks readers through building many types of learners, whereas this book's end-of-chapter exercises are almost exclusively proofs and theoretical exercises. This is not a fault of the book, just a difference, and which is preferable depends on what the reader wants to get out of it.
Ultimately, even though it did prove to be a rather confusing book, I have learned a lot from it and will continue working through it to learn even more, as it tends to become more lucid the more I read it.
Rating: Summary: Pedagogical Disaster Review: The Hastie book was used at our major university to teach data mining and statistical learning. The students in this graduate-level course included people with Master's and PhD degrees, as well as post-docs. Most of them work in the field of bioinformatics, so they have a pretty good grasp of complex topics, computer science and mathematical algorithms. The overall rating from the course was a D-, which is one of the worst ratings for a book used on campus (out of hundreds). The text was hard to follow, confusing in many sections, and tough to teach from. It does cover a lot of ground, which is a benefit. But apparently covering such breadth clearly enough to do it justice was a challenge that 20 really smart people couldn't overcome. Maybe individuals with a strong background in one or more of the areas covered by the book can do well with it, but from a teaching/learning perspective, there is at least one group of folks out here who would have done better with some other alternative.
Rating: Summary: Counter to review from Sep 8 Review: The review from September 8 expresses an opinion that is the exact opposite of mine, and it is worded so strongly that I have to object. I taught a course using the book to bioinformaticians, most of them with a computer science background, and found the book exceptionally well prepared and suitable for a graduate course. The book serves the dual purpose of an introduction and a reference. An especially nice feature is how the authors explain the relationships and differences between methods. By doing so, they provide context that I have not seen in any other book on this subject. The book is a very nice combination of basic theory and performance evaluation on data from a wide variety of domains, and it is quite up-to-date. It has a well-developed companion website, and the graphical material can be obtained electronically from the publisher. The book is an outstanding contribution to the field.