Programming Spiders, Bots, and Aggregators in Java

List Price: $59.99
Your Price: $40.79


Rating: 2 stars
Summary: Misleading Title
Review: As another reviewer commented, this book should be called "Using the com.heaton.bot Package API Reference." All you learn is how to use this package of Java classes, not how to actually create spiders, bots or aggregators from the ground up. I feel the title is misleading for such an expensive book. The only way I will learn what I want is to read the author's source code, which btw is very ugly, however functional.

Rating: 5 stars
Summary: Create an Object-Oriented Bot Package Step by Step
Review: I use this book as a supplement to a class that I teach, as it gives the students the necessary skills to programmatically spider, and generally access, information on the Net.

As some of the other reviewers point out, this book does center around the creation of a "bot package". However, I see this as one of the book's greatest strengths. The author explains step by step how to take basic concepts, continually build upon them, progressing onward to more complex spiders and bots. Specifically:

1. Create an advanced HTTP object that overcomes many of the shortcomings of the one built into Java (namely cookie support, referrer support, HTTP authentication, and more).
2. Add forms/page processing on top of the HTTP object. You are shown step by step how to process the data you collect from step 1.
3. Create a bot that wields the page/form processing created in step 2.
4. Create a spider, that, using steps 1-3, can access pages across an entire site.
5. Expand the spider to support thread pooling and a JDBC database.
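To make steps 2 through 4 concrete, here is a minimal sketch of the spider idea in modern Java. It is not the book's code and does not use the com.heaton.bot API: the class `MiniSpider`, its methods, and the `example.test` URLs are all hypothetical names chosen for illustration. It assumes absolute links only, matches `href` attributes with a naive regex rather than a real HTML parser, and takes a `fetcher` function in place of an HTTP object so that it runs without network access.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of com.heaton.bot.
public class MiniSpider {
    // Naive matcher for double-quoted href attributes on anchor tags.
    // A real spider would use a proper HTML parser instead.
    private static final Pattern HREF = Pattern.compile(
        "<a\\s+[^>]*href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Page processing: collect the href values found in a page, in order.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    // Spider: breadth-first walk over every URL that starts with sitePrefix.
    // The returned set holds each in-site URL that was attempted, in visit order.
    // Relative links are not resolved in this sketch.
    static Set<String> crawl(String start, String sitePrefix,
                             Function<String, String> fetcher) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty()) {
            String url = queue.poll();
            // Skip off-site URLs and URLs already attempted.
            if (!url.startsWith(sitePrefix) || !visited.add(url)) {
                continue;
            }
            String html = fetcher.apply(url); // stand-in for the HTTP fetch
            if (html != null) {
                queue.addAll(extractLinks(html));
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // An in-memory "site" so the example needs no network.
        Map<String, String> site = Map.of(
            "http://example.test/",
            "<a href=\"http://example.test/a\">A</a> <a href=\"http://other.test/x\">X</a>",
            "http://example.test/a",
            "<a href=\"http://example.test/\">home</a> <A HREF=\"http://example.test/b\">B</A>",
            "http://example.test/b",
            "no links here");
        for (String url : crawl("http://example.test/", "http://example.test/", site::get)) {
            System.out.println(url);
        }
    }
}
```

Swapping the `fetcher` for a real HTTP client would turn this into a working single-threaded spider. Step 5's thread pool and database would replace the in-memory queue and visited set.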

Rather than providing a bunch of disjoint code samples, like many books do, the author guides you step by step through the above path, revealing the techniques at every step. Readers who do not care about the intricate nature of bot programming (sadly, some of my students) can skip to the API documentation and get right onto creating their own bots. You can also download updated versions of the "bot package" from the author's site. I actually did this before buying the book.

The downside to the book is the example programs' use of GUIs. I would rather every example had been straight console; the GUI only gets in the way for a book targeting bot programming. Also, the author very annoyingly puts an underscore in front of every class-instance variable, which gives some of the code something of a C++ look, I suppose.

If you are already programming bots and spiders of your own, I don't think you will get much more from this book than you are likely already doing.

But for someone who wants to get started in this exciting area, there is nothing else like it, and I highly recommend it.

Rating: 5 stars
Summary: A great example of how to present highly technical material
Review: I've read MANY technical books over my 23-year career in IT and would say that this book is one of my top 3 favorites. Although I'm a novice Java programmer, I never felt lost within this material. The author (who teaches at a community college) is obviously skilled in the proper presentation of difficult material, a talent which is lacking amongst most technical authors. I suspect his experience teaching Java and other classes at the college goes a long way in increasing his ability to communicate well.

Each chapter initially exposes you to the goals within, then covers underlying technologies, when necessary. After showing the Java code which represents the current project, he drills into the meat of the code with the "under the hood" sections. At first, some of the code was confusing to me, but by following the detailed explanation, I was able to clear everything up and feel confident that I knew what was going on. At the end of each chapter, the summary gives a nice review which helped me to "lock in" what I had learned and to put the chapter material into perspective.

The code samples were well written and they all compiled and ran fine (compiled code is included). The spider libraries on the CD are designed intelligently and easy to build on.

I'm very happy that I chose to read this book and found it to be very enjoyable, considering its technical nature. I definitely feel that I'm equipped to do creative bot coding in Java. Also, my knowledge of Java was generally deepened.

Rating: 2 stars
Summary: not for serious programmers
Review: The code presented in this book is painful to look at. For one thing, the author is not familiar with basic Java coding conventions and continues to use C conventions instead.

In addition to not knowing proper coding conventions, this guy has no clue about writing Java UIs - the code listed in this book actually has Visual Cafe tags all over the place!

As far as info regarding spiders/bots/aggregators - there is decent high level overview info in this book, but nothing for a real programmer. You will not learn how to build these things on your own, and the book relies on the helper libraries included on the cd-rom to accomplish anything. If you are hoping to build anything useful after purchasing this book, understand that you will only succeed if you include the com.heaton.* libraries included on the cd.

Rating: 2 stars
Summary: Not much information for such a long book
Review: The essence of this book could probably have been compressed into a few chapters. I read the whole thing in about a day, skimming over many sections (e.g. the structure of HTML, including discussion of anchor tags) that I, like most programmers, already know well. I think I would have preferred a focussed tutorial on Heaton's Bot package instead of a detailed but boring treatment of every technology (however elementary) used in the process of constructing spiders and bots.

Aside from this, Heaton is not a great writer. Attempting to be particularly organized and structured, he comes off as excessively stiff; I stopped counting the number of times he wrote "I will now show how to..."

I purchased this book expecting the process of constructing a spider or bot to draw on a range of specialized skills, but it appears to be quite simple: basic knowledge of Java network programming (i.e. sockets), HTTP, HTML and XML parsing would appear to suffice. I'm sure there is all sorts of complex stuff Heaton does not talk about, but I wish he had!

At the moment I'm wondering whether this book deserves a space on my finite bookshelf.

Rating: 2 stars
Summary: very limited usefulness
Review: This book is primarily a user's guide for the libraries provided on the cd-rom. If you are looking for the information necessary to write spiders, bots and aggregators by hand, then this is not a very good book to use.

The source code is included for the libraries on the cd-rom, but the author's Java skills are very weak, which makes the code difficult to interpret.

Rating: 5 stars
Summary: Incredible Book!
Review: This book is simply one of the best computer books I have ever read! The author doesn't just cover programming like a lot of programming books do, he also explains how technology related to the subject matter works. Some of the related subjects he covers are socket programming, the HTTP protocol, form submission, cookies, multi-threading, JDBC, and XML. The coverage of these extra topics is not very in-depth, but for the average Java programmer, he provides enough information to fill in some gaps and tie it all together, and that's where the strength of this book lies. Another thing the author does well is build on each chapter, pacing it quickly enough to not bore but not so fast as to overwhelm the reader. And finally, the author provides sections in most chapters that explain in detail how the classes used in that chapter function.

The book is very well organized and laid out in a logical manner. The book's topics start at simple Java socket programming, progress through HTTP and HTTPS socket programming, then into HTML parsing and form posting, to dealing with other file/data formats, using cookies, then into spiders and multi-threaded spiders, then bots, then aggregators. He finishes up by discussing the ethics of using these constructs to access sites and the possible unintended consequences of doing so, and a brief discussion of how these constructs may fit into the web services arena.

A major plus in this book is the companion cd, which in addition to containing the source code for all the examples in the book has the classes already compiled, as well as a copy of the Jakarta/Tomcat server, since some of the examples use JSPs. The companion cd was supposed to contain a JDK as well, but it was left off in error. The software provided on the cd is reusable, and I will be integrating it into a project in the near future.

All in all, this is an outstanding book, and worth more than the money paid for it. The book does assume a few things: you have to know at least the basics of Java and have some familiarity with the Internet, and some familiarity with JSPs would be nice to have. This book is a must-read for advanced beginner and intermediate developers, but expert Java developers should think twice as they are likely already familiar with most of the material.

Rating: 4 stars
Summary: Good work - quite comprehensive!
Review: This is quite a complete book for anyone wishing to write spiders in any language (provided they know Java well enough to understand the examples). It covers the protocols, parsing, and techniques for writing them, along with good practices.

Pros:
1> Gives overview of HTTP, SMTP, DNS, HTTPS, Cookies and how they are programmed in Java.
2> Shows parsing of HTML, CSV, QIF, XML and provides guidelines to parse documents in other formats.
3> Many examples - simple to complex.
4> Great source code in Java accompanies it. You can just change it slightly and run your own spider.

Cons:
1> Few theoretical concepts.
2> Restricted to spiders on the WWW. Could have explored other areas where they are used.

Rating: 5 stars
Summary: Great Book!
Review: This just came out in March 2002. I'm very impressed. It's a complete intro to bots in Java. The CD has a bot.jar package that will really come in handy. It solves the problem of cookies, user authentication, and the referrer variable missing in javax.net.*. Maybe someday Sun will fix those problems. The book covers useful stuff like posting forms, frames problems, cookies, SSL, authentication, parsing XML, QIF, and CSV formats, etc. It also has several complete examples. This book uses J2SE 1.3. My only complaint is that it does not have examples of, or talk much about, artificial intelligence or handling JavaScript, although it does mention Rhino, the JavaScript engine written in Java.

Neil
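
For readers wondering what the cookie, authentication, and referrer handling mentioned above involves at the HTTP level, here is a minimal sketch in modern Java (Java 8 or later, not the J2SE 1.3 the book targets). It is not the bot.jar code: the class `BotHeaders` and its methods are hypothetical names, and the cookie handling is deliberately simplified (it keeps only the `name=value` part of each `Set-Cookie` value and ignores path, domain, and expiry rules).

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical illustration only; not part of bot.jar.
public class BotHeaders {
    // HTTP Basic authentication: "Basic " + base64("user:password").
    static String basicAuth(String user, String password) {
        String raw = user + ":" + password;
        return "Basic " + Base64.getEncoder()
            .encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    // Build a Cookie request header value from earlier Set-Cookie values.
    // Simplification: keep only the text before the first ';' of each value.
    static String cookieHeader(List<String> setCookieValues) {
        StringJoiner joiner = new StringJoiner("; ");
        for (String value : setCookieValues) {
            int semi = value.indexOf(';');
            joiner.add((semi >= 0 ? value.substring(0, semi) : value).trim());
        }
        return joiner.toString();
    }

    // Collect the three headers a bot would typically add to a request.
    static Map<String, String> requestHeaders(String referer,
                                              List<String> setCookieValues,
                                              String user, String password) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Referer", referer); // the header name is spelled this way in HTTP
        headers.put("Cookie", cookieHeader(setCookieValues));
        headers.put("Authorization", basicAuth(user, password));
        return headers;
    }

    public static void main(String[] args) {
        Map<String, String> headers = requestHeaders(
            "http://example.test/login",
            List.of("session=abc123; Path=/; HttpOnly", "theme=dark"),
            "Aladdin", "open sesame");
        headers.forEach((name, value) -> System.out.println(name + ": " + value));
    }
}
```

Each entry could then be applied to a connection with `URLConnection.setRequestProperty(name, value)` before the request is sent.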



© 2004, ReviewFocus or its affiliates