Rating:
Summary: Very comprehensive
Review:

This book describes a methodology and gives advice for developing speech applications, with a focus on telephony applications. It is based on the authors' experience developing such applications at Nuance.

The book is organized into four parts:
1. Introduction
2. Requirement gathering
3. Detailed design
4. Development and tuning

Each part starts with a description of the general principles guiding the development of a speech application and ends with an "applied" example showing how these principles are used in a real application.

The introduction provides an overview of speech technology and of the methodology (requirements, detailed design, development/tuning) used to develop a speech application. This methodology serves as a guide for the rest of the book.

The requirement gathering part covers meeting with the company that wants to deploy the speech application and getting information from them. The same kind of information is required as for other software projects: business case, target customers, environment integration, scope of the system, etc. Two interesting additions to the usual process are:
1. Specifying the persona. How should the system be perceived (serious, funny, etc.)? This will impact the prompts, the selection of the voice actor, and the design of the dialog flow.
2. Specifying the type of interaction: system directed or user directed. The former relies on grammars. The latter relies on statistical language models (SLMs) and robust parsing. This has a huge influence on design and realization.

The detailed design phase is concerned with designing the dialog flow, the prompts, and the grammars. The authors emphasize developing systems that (1) sound good and (2) are efficient. Sounding good means developing prompts that abide by the rules of spoken language (as opposed to written language) and paying attention to prosody. The sections on prompt design and prosody are very informative. Efficiency is ensured by making the dialog flow smoothly. Techniques include thinking in terms of user scenarios, providing shortcuts to common tasks, and educating users about efficient ways of using the system. Efficiency is also improved by helping users recover from errors quickly. Techniques here include quick confirmation strategies, help prompts, and access to the main menu or an operator.

The development and tuning part focuses mainly on tuning grammars and working with the voice actor. The grammar is tuned to ensure appropriate coverage while maintaining good recognition accuracy. Tuning must be based on real data, since it is difficult to predict how people will use the system. Working with the voice actor is an important part of system development, and the authors give advice on how to run successful recording sessions.

The book strikes a nice balance between general principles and advice that can be applied directly. Compared to Kotelly's book, it covers the topics in more depth. Compared to Balentine's book, it provides a broader view of the development process as well as more detailed explanations of the principles behind the recommendations.

On the minus side, the book is based solely on the experience of the authors. Although this experience is extensive, parts of the book seem somewhat biased (e.g., SLM vs. grammar-based speech recognition, heavy focus on personas). It is not always clear when the numbers given in the book are based on real experience and when they were invented by the authors for the mock application. Some of the advice may also be difficult to apply directly in practice, since it depends on using vendor tools.

In my opinion, this book should be required reading for developers of telephony applications and providers of platforms for speech application development.