Rating: ![4 stars](http://www.reviewfocus.com/images/stars-4-0.gif) Summary: nice coverage of advanced topics with emphasis on modeling Review: Frank Harrell is a professor who does a lot of consulting in medical research. This book covers a wide variety of topics in regression analysis, including many advanced techniques such as data reduction, smoothing, variable selection, transformations, shrinkage methods, tree-based methods and resampling. But note the title, "Regression Modeling Strategies". Unlike most advanced texts on regression, this book emphasizes modeling strategies. So the focus is on things like variable selection and other techniques to avoid overfitting, and on diagnostics that look for violations of assumptions, such as homogeneity of variance or normality and independence of residuals, or for stability problems like collinearity. The book covers an extensive collection of modern techniques for exploratory data analysis. Inferential methods are also considered, and he deals appropriately with issues that are important (particularly for medical research), such as imputation of missing values. Many examples are considered and illustrated in S-PLUS. Harrell also provides many rules of thumb based on his own experience building models. A lot of the techniques are illustrated using data from the Titanic, where it is interesting to see which factors affected the probability of survival. My only disappointment was that there is perhaps too much emphasis on this one particular data set. A standard regression text would be expected to include linear and nonlinear regression. Harrell goes much deeper, covering nonparametric regression, logistic regression and survival models (e.g. the Cox proportional hazards model).
Rating: ![4 stars](http://www.reviewfocus.com/images/stars-4-0.gif) Summary: Good advanced topics book Review: This book covers a lot of advanced regression techniques and is intended for an audience that has been through an introductory regression analysis class. It sets itself apart by emphasizing modeling strategies rather than teaching just theory, or applied regression in the textbook fashion where all the examples work out perfectly and one doesn't have to worry about dirty data. Instead, it talks extensively about issues that occur in the real world, such as handling missing data, which is a big problem with real data but is rarely mentioned in most regression texts. He also discusses many of the traps students fall into, because what they learn in an introductory regression class is not always the best way of going about things, and he explains better alternatives.
Throughout the book, S-plus and R are used liberally, which is very nice. Numerous extensive case studies thoroughly analyze data sets using many of the techniques he describes, and they give full S-plus/R code so you can recreate them on your own.
Unfortunately, I really didn't like the data sets he chose to analyze. Many of them were medical; another used Titanic survivor data, and another was about the 2000 election. While the analyses were very well done, I found the datasets themselves rather uninteresting. This of course leads to a problem: being an engineer, I'd rather have datasets I can relate to, while a social scientist would of course want datasets they could relate to, so I realize the author has a hard time making everyone happy. It would be nice to have additional case studies available on the book website, perhaps worked by other people from a variety of disciplines.
Rating: ![4 stars](http://www.reviewfocus.com/images/stars-4-0.gif) Summary: You need to be an expert in statistics to understand this. Review: This is clearly an advanced text that mathematicians and PhD students in statistics would find valuable. It is not for an engineer or novice statistician in industry (like myself) who has to come up with an accurate regression model from quantitative and qualitative data in a short period of time. My rating is four stars: buy this book only if you have the advanced statistical training to understand it; otherwise buy a simpler book if you want a basic understanding of the subject.
Rating: ![5 stars](http://www.reviewfocus.com/images/stars-5-0.gif) Summary: Outstanding graduate text Review: This text does a five-star job of what the title advertises. The book could be used for a one-year graduate course in applied linear models. The writing is excellent, and the topics are very up to date. This is for graduate students with a good foundation in mathematical statistics and applied statistics. Very good integration with modern statistical packages.
Rating: ![5 stars](http://www.reviewfocus.com/images/stars-5-0.gif) Summary: Published Review by Margaret May Review: Though it can be treated as an advanced book in statistics, empirical researchers can find tremendous value in this book just by following the steps and visualizing their data. It is very useful for fitting and validating prognostic models, and it emphasizes use of the bootstrap. Just flipping through this book, you will feel silly pre-specifying a multivariable regression model (using PROC REG, PROC LOGISTIC or PROC PHREG) without checking for interactions and nonlinear terms, or using a simple model-fitting approach such as stepwise selection, because data in the real world are by no means "linear" and free of interaction. All the functions are written in S-Plus (or R), and I cannot resist the temptation of those beautiful and highly informative graphs. As a result, I have converted to S-Plus after being a SAS user for years, following in the steps of Dr. Harrell (a SAS user from 1969 to 1991). The review below was published in the International Journal of Epidemiology 2002;31:699-700 (I am quoting it; I am not its author).
Most statistical textbooks present techniques and give simple examples of their use. This book is different. It assumes you already have the basic tools of linear and logistic regression, and parametric and semi-parametric survival analysis, in your well-stocked statistical tool box, acquired in graduate school. The question this book addresses is how to use those regression tools properly. The book succeeds in being both philosophical and intensely practical in nature. It is about the art of data analysis and modelling strategies. It takes you through the whole process, starting with imputation of missing data, leading you through dealing with non-linear relationships, estimating transformations, variable selection and model building, and finally validating the model using powerful bootstrap techniques.
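To make "validation using bootstrap techniques" concrete, here is a minimal sketch of bootstrap optimism correction in Python. It is an illustration under stated assumptions, not code from the book's S-PLUS library: the model is assumed to be a single-predictor least-squares line, R² is assumed as the performance measure, and the function names (`fit_line`, `r_squared`, `optimism_corrected_r2`) and the simulated data are hypothetical. Each resample refits the model; the drop in R² from the resample it was fitted on to the original data estimates how optimistic the apparent fit is.

```python
import random
from statistics import fmean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(model, xs, ys):
    """R^2 = 1 - SSE/SST of a fitted line evaluated on the data (xs, ys)."""
    a, b = model
    my = fmean(ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - my) ** 2 for y in ys)
    return 1 - sse / sst


def optimism_corrected_r2(xs, ys, n_boot=200, seed=1):
    """Return (apparent R^2, estimated optimism, optimism-corrected R^2)."""
    rng = random.Random(seed)
    n = len(xs)
    apparent = r_squared(fit_line(xs, ys), xs, ys)
    optimisms = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2 or len(set(by)) < 2:
            continue  # skip degenerate resamples where the line or R^2 is undefined
        model = fit_line(bx, by)
        # Optimism for this resample: fit on the resample minus fit on the original data.
        optimisms.append(r_squared(model, bx, by) - r_squared(model, xs, ys))
    optimism = fmean(optimisms)
    return apparent, optimism, apparent - optimism


# Hypothetical simulated data: a noisy linear relationship.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(30)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 2) for x in xs]
apparent, optimism, corrected = optimism_corrected_r2(xs, ys)
print(f"apparent R^2 = {apparent:.3f}, optimism = {optimism:.3f}, corrected = {corrected:.3f}")
```

With only one prespecified predictor the optimism is usually small; the same loop becomes far more revealing when the data-driven steps (selection, transformation) are repeated inside each resample, which is the point the review goes on to make.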
Harrell has a unifying approach to regression modelling strategies in that he emphasises how the methods he presents may be used across many different types of regression model in a variety of subject areas, although his examples are biomedical. One of the main points of the book is that a dishonesty is widespread: we treat inferences from P-values, confidence intervals and test statistics as if the data had not been used to build the model. We need to recognise that it is usually not possible to pre-specify a multivariable regression model, for example whether a survival model should be a Weibull or a lognormal model, which transformations of variables are appropriate, whether to include non-linear and interaction terms, and so on. However, statistics are often computed as if the data had not been used to make decisions about the form of the model and how predictors are represented in it. This means that models overfit the data on which they are estimated and poorly predict responses for future observations. Great emphasis is placed on addressing this fundamental problem of the modelling process. In particular, the author strongly recommends using bootstrap methods in many steps of the modelling strategy, including variable selection, derivation of distribution-free confidence intervals and estimation of optimism in model fit. For example, stepwise variable selection has been much criticised, but Harrell uses this procedure with bootstrapping and shows that variation between bootstrap samples of the same dataset leads to the selection of different sets of variables, and that a better strategy is to use the set of variables that occurs most frequently in the bootstrap samples. This gives a more reliable and useful set of prognostic factors, yielding a model that predicts responses for new data with greater precision and accuracy.
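The idea of tallying which variables get selected across bootstrap resamples can be sketched as follows. This Python is a hedged illustration, not Harrell's procedure: to keep it short and dependency-free, a deliberately simplified rule (keep a predictor whose absolute Pearson correlation with the outcome exceeds a cutoff) stands in for stepwise selection, and every name (`pearson`, `select`, `bootstrap_selection`), the cutoff of 0.3 and the simulated data are assumptions made for illustration.

```python
import math
import random
from collections import Counter
from statistics import fmean


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select(rows, ys, cutoff):
    """Hypothetical stand-in for stepwise selection: keep column j if |r(x_j, y)| > cutoff."""
    p = len(rows[0])
    return frozenset(
        j for j in range(p) if abs(pearson([r[j] for r in rows], ys)) > cutoff
    )


def bootstrap_selection(rows, ys, cutoff=0.3, n_boot=200, seed=1):
    """Rerun the selection rule on bootstrap resamples and tally the results."""
    rng = random.Random(seed)
    n = len(rows)
    set_counts = Counter()  # how often each whole set of predictors was chosen
    var_counts = Counter()  # how often each individual predictor was chosen
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        chosen = select([rows[i] for i in idx], [ys[i] for i in idx], cutoff)
        set_counts[chosen] += 1
        var_counts.update(chosen)
    return set_counts, var_counts


# Hypothetical data: 4 candidate predictors; y depends strongly on x0 and weakly on x1.
rng = random.Random(0)
rows = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(50)]
ys = [1.0 * r[0] + 0.3 * r[1] + rng.gauss(0, 1) for r in rows]
set_counts, var_counts = bootstrap_selection(rows, ys)
best_set, freq = set_counts.most_common(1)[0]
print("times each predictor was selected:", dict(var_counts))
print("most frequent set:", sorted(best_set), "chosen in", freq, "of 200 resamples")
```

A weak predictor such as x1 will typically drift in and out of the selected set from resample to resample, which is the instability the review describes; `most_common(1)` then picks the set the review recommends using.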
There are detailed case studies of real examples, which are analysed using S-Plus with the code given explicitly. The book's web site gives access to the datasets and to an S-Plus library of 200 functions for model fitting and testing, estimation, validation, prediction, graphics and typesetting. The book is particularly strong on graphical presentation of regression models, and argues that a picture will often persuade a non-statistician of the need for a particular transformation of a predictor, rather than opting for a simple linear term that does not fit the data as well. In particular, cubic splines and non-parametric smoothers are recommended early on as a way of relaxing linearity assumptions and are used throughout the case studies. This is an excellent book for its target audience: postgraduates who know the technical details of regression models, but not necessarily when and how to use them. It is also a worthwhile addition to the reference shelf of data analysts and statistical methodologists, who will appreciate the many recipes for successful modelling strategies and the tips on validation when the data have been used to inform the modelling process.
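As an illustration of relaxing a linear term with a cubic spline, the sketch below computes the basis for a restricted (natural) cubic spline, a cubic spline constrained to be linear below the first knot and above the last. It uses the standard truncated-power construction for knots t_1 < ... < t_k: the columns are x itself plus, for j = 1, ..., k-2, (x - t_j)_+^3 - (x - t_{k-1})_+^3 (t_k - t_j)/(t_k - t_{k-1}) + (x - t_k)_+^3 (t_{k-1} - t_j)/(t_k - t_{k-1}). This is a hedged Python sketch, not the book's S-PLUS code; the name `rcs_basis` and the knot values are hypothetical, and implementations commonly also rescale the nonlinear columns (for example by (t_k - t_1)^2), which is omitted here.

```python
def rcs_basis(x, knots):
    """Unscaled restricted cubic spline basis at x.

    Returns k-1 values for k knots: x itself, then k-2 nonlinear terms that are
    zero below the first knot and whose cubic and quadratic parts cancel above
    the last knot, so the fitted curve is linear in both tails.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("a restricted cubic spline needs at least 3 knots")

    def pos3(u):
        # Truncated cubic (u)_+^3.
        return max(u, 0.0) ** 3

    d = t[-1] - t[-2]  # t_k - t_{k-1}
    cols = [x]
    for j in range(k - 2):
        cols.append(
            pos3(x - t[j])
            - pos3(x - t[-2]) * (t[-1] - t[j]) / d
            + pos3(x - t[-1]) * (t[-2] - t[j]) / d
        )
    return cols


knots = [0, 2, 4, 6]  # hypothetical knot locations
for x in (-1, 3, 10):
    print(x, rcs_basis(x, knots))
```

Each row of these columns would then enter an ordinary regression design matrix in place of the single linear term, so the nonlinearity costs k - 2 extra parameters while the tails stay well behaved.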