Rating: Summary: An introduction to the rise of the smart Web. Review: Consisting of 19 survey papers by various authors, this book attempts to overview research into what the editors have called "Web Intelligence". All of the topics included in the book are interesting, and very important in both academia and industry as the World Wide Web continues to evolve into a more powerful research tool and e-commerce engine. Due to space constraints, only the first eight articles will be reviewed here.

After an introductory article on what will be emphasized in the book, the next article deals with how to interpret strong regularities in Web data in terms of user decision-making patterns, and then describes an agent-based approach to characterizing user behavior. This article stands out from the others in that it endeavors to be quantitative. For example, heavy-tailed probability distributions are used to model regularities in Web data, and the authors construct an artificial Web space populated by information-foraging agents. The authors then compare their model with real-world data, obtaining fairly good agreement. (I sketch what such a heavy-tailed fit looks like further below, after the summary of chapter 6.)

In the third article, the authors overview the work on DAML-S, a Web service ontology built on the DARPA Agent Markup Language and one of the attempts to create a "semantic Web". The goal of the semantic Web, in their view, is to construct reusable, high-level, generic procedures that can be customized for individual use and, most importantly, to reason about the content returned by Web queries. The authors describe the three conceptual areas of DAML-S and the three kinds of processes that make it up. They also discuss the advantages of using agent-oriented software engineering in Web services, and they emphasize strongly that the semantic Web should not be merely a knowledge repository but should exhibit behavioral intelligence.

The authors of the fourth article discuss the design and use of social agents in Web applications. Using Scheme, they have developed a language they call Q for describing interaction scenarios between agents and users. I cannot speak to the efficacy of Q in building avatars and other agents since I have never used it, but the authors assert that it can execute hundreds of scenarios simultaneously and allows for autonomous agents.

Web-based education was one of the first uses of the Web, and in chapter 5 the authors show how it can be improved through agent technology. Their emphasis is on guidebots, animated agents or avatars that interact with learners via a combination of speech and gestures. They also describe the Advanced Distance Education (ADE) architecture for Web-based instruction and discuss a medical application. Most interesting is their use of Bayesian networks in the construction of guidebots.

The acquisition of business intelligence is discussed in chapter 6. The very difficult notion of "interestingness", whose definition plagues much research in artificial intelligence, is addressed in the context of relevant business information on the WWW. The authors discuss a system, coded in C++ and based on vector space representations and association rule mining, that gathers information on companies so that comparisons can be made between them. Five methods are used to compare a user site to a competitor site, and the time complexity of each is discussed; a toy version of this kind of comparison appears just below.
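To make chapter 6's comparison idea concrete, here is a rough sketch of my own. It is not the authors' C++ system or any of their five methods; the tokenization, the weighting, and the choice of cosine similarity are my assumptions for illustration. Each site is reduced to a term-frequency vector, and two sites are compared by the cosine of the angle between those vectors.

import math
from collections import Counter

def term_vector(text: str) -> Counter:
    # Very crude bag-of-words vector for a site's visible text.
    return Counter(text.lower().split())

def cosine_similarity(a: Counter, b: Counter) -> float:
    # Cosine of the angle between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical page text standing in for crawled company sites.
user_site = "industrial pumps valves catalog pricing support"
competitor = "pumps valves industrial catalog downloads careers"

print(f"site similarity: {cosine_similarity(term_vector(user_site), term_vector(competitor)):.2f}")

A real system would of course crawl the sites, weight terms (e.g. by inverse document frequency), and combine this with the association-rule mining the chapter describes; the sketch only shows the vector space step.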
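And here is the promised sketch for the second article's heavy-tailed regularities. It is purely illustrative and not the authors' model: the Zipf form and the exponent are my assumptions. Synthetic "pages per session" counts are drawn from a heavy-tailed law, and the tail of the empirical distribution is checked on a log-log scale.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "pages visited per session" data: heavy-tailed counts.
# A Zipf/Pareto-like law is one common way such Web regularities are modeled.
alpha = 2.0                      # assumed tail exponent, for illustration only
sessions = rng.zipf(alpha, size=10_000)

# Empirical complementary CDF: P(pages >= k)
ks = np.arange(1, 50)
ccdf = np.array([(sessions >= k).mean() for k in ks])

# For a pmf proportional to k**(-alpha), the CCDF tail behaves like k**(1 - alpha),
# so the fitted log-log slope should be close to 1 - alpha (here about -1).
slope, _ = np.polyfit(np.log(ks), np.log(ccdf), 1)
print(f"log-log CCDF slope: {slope:.2f} (expected about {1 - alpha:.1f})")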
Chapter 7 overviews a technique for mining (negative) association patterns in Web usage data, called "indirect association". In this technique, one finds pairs of pages that are negatively correlated with each other but that are accessed together via a common set of pages called the "mediator". Indirect association is supposed to give information on the interests of Web users who share common traversal paths, in order, for example, to target users for marketing. Crucial in the definition of indirect association is a measure of dependence between itemsets, and the authors discuss a few such measures. Sequential indirect associations are defined, and the authors discuss three types: convergence, which represents the different ways of entering a frequent sequence; divergence, which illustrates how the interest of Web users begins to diverge from the frequent sequence; and transitivity, which illustrates how users who enter the frequent sequence through a particular page rarely go on to another. The pseudocode for the "INDIRECT" algorithm is given, and the authors describe two methods to reduce the number of discovered patterns by combining indirect associations. The authors then describe how they validated their algorithm by testing it on Web server logs from a university site and an online Web store. They conclude from these tests that indirect associations are helpful in identifying different groups of Web users who share a similar traversal path. (A toy illustration of indirect association appears at the end of this review.)

The next chapter deals with some of the issues involved in extracting information from the Web, with emphasis on automatic extraction methods that use wrapper induction. A wrapper is a procedure that understands information taken from a source and translates it into a form that can then be used to extract particular "features". The trick is to design a wrapper that is intelligent enough to work for many different sources with different presentation formats. The authors classify wrappers into manual, heuristic wrapper induction, and knowledge-based wrapper induction. After arguing that manual and heuristic wrapper induction are unsuitable for efficient and intelligent information extraction, they concentrate on knowledge-based wrapper induction, wherein wrappers are built automatically. Their implementation, called XTROS and written in Java, generates a wrapper by first converting HTML sources into logical lines, then determining the meaning of each logical line, and finally finding the most frequent pattern. The wrapper is formatted in XML, and the information is then extracted by the XTROS interpreter, which parses the XML wrapper to build extraction rules and then applies these rules to the search results. (I give a toy analogue of such a wrapper at the very end of this review.) The authors describe their performance evaluation of XTROS using precision and recall measures. They remark that XTROS is limited in that it only works for labeled documents, and they point to the need for a wrapper learning agent for multidomain environments.
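Here is the toy illustration of chapter 7's indirect association promised above. The session data and the thresholds are invented for illustration, and plain support stands in for the dependence measures the chapter actually discusses: two pages that rarely occur in the same session, but each co-occur often with a common "mediator" page, are flagged.

from itertools import combinations

# Toy Web sessions: each set is the pages one user visited.
sessions = [
    {"home", "laptops", "compare"},
    {"home", "laptops", "checkout"},
    {"home", "tablets", "compare"},
    {"home", "tablets", "checkout"},
    {"home", "laptops"},
    {"home", "tablets"},
]

def support(items: set) -> float:
    # Fraction of sessions containing all of the given pages.
    return sum(items <= s for s in sessions) / len(sessions)

# Illustrative thresholds (assumed, not the chapter's values).
T_LOW, T_HIGH = 0.1, 0.3

pages = sorted(set().union(*sessions))
for a, b in combinations(pages, 2):
    if support({a, b}) < T_LOW:              # a and b rarely visited together...
        for m in pages:
            if m in (a, b):
                continue
            # ...yet each is often visited with the mediator page m.
            if support({a, m}) >= T_HIGH and support({b, m}) >= T_HIGH:
                print(f"indirect association: {a} ~ {b} via mediator {m}")

On this toy data the script reports, for instance, that "laptops" and "tablets" are indirectly associated via "home": two groups of users share the same entry page but then diverge, which is exactly the kind of pattern the chapter aims to surface.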
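Finally, a drastically simplified analogue of what an induced wrapper for labeled documents does. XTROS itself is written in Java and induces its XML wrappers automatically; the page, the labels, and the rules below are hand-written assumptions of mine, meant only to show the flavor of the pipeline: strip the HTML into logical lines, then apply label-based extraction rules to each line.

import re

# A hypothetical search-result page; real wrapper induction would learn the
# patterns from many such pages rather than rely on hand-written rules.
html = """
<b>Title:</b> Intelligent Agents on the Web<br>
<b>Author:</b> J. Smith<br>
<b>Price:</b> $59.95<br>
"""

# "Logical lines": strip tags and keep one record field per line.
logical_lines = [re.sub(r"<[^>]+>", "", line).strip()
                 for line in html.splitlines() if line.strip()]

# Hand-written extraction rules keyed by label; an induced wrapper would supply these.
rules = {"Title": r"Title:\s*(.+)", "Author": r"Author:\s*(.+)", "Price": r"Price:\s*(.+)"}

record = {}
for line in logical_lines:
    for field, pattern in rules.items():
        m = re.match(pattern, line)
        if m:
            record[field] = m.group(1).strip()

print(record)   # {'Title': 'Intelligent Agents on the Web', 'Author': 'J. Smith', 'Price': '$59.95'}

Precision and recall, as used in the chapter's evaluation, would then be computed by comparing such extracted records against hand-labeled ground truth.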