The Unicode Standard, Version 4.0

	List Price: $74.99 Your Price: $64.04

Reviews

Rating: 5 stars

Summary: All the Languages of Man
Review: Anyone dealing with XML or Java soon runs into Unicode, because it is the standard for representing characters in electronic form in those languages. Java, for instance, was designed from its inception to use Unicode. Earlier computer languages like C and C++ can have routines added to handle it, while C#, like Java, uses Unicode strings natively.

But chances are, when you deal with Unicode, you only deal with a subset. Often only a small subset at that, unless you are using Chinese/Japanese. Typically you work with ASCII and the codes for your spoken language if that is not a Western European language. Very few of us deal with much more than this.

Which illustrates the appeal of the book. The Big Picture. ALL of Unicode. The breadth is stunning. It shows the written form of every major spoken language and many minor ones. Has the pictograms for Chinese [of course]. But also the symbols for Khmer, Canadian Aboriginal, Tamil, Syriac, et cetera, et cetera. Thumbing through this, you may encounter languages that you did not even know existed. It is one thing to say that we live in a multilingual world. But it is another to actually see it expressed comprehensively at the most basic level.

There are two audiences for this book. The first is any computer person who has to deal with issues of internationalisation.

But another audience is every Department of Languages or Cultural Anthropology in a university. If this describes your background, then you should know that you do not need facility in computing to appreciate the significance of this book. You can use it as a standard reference, akin to the Oxford English Dictionary vis-a-vis the English language. Look, ignore the computer stuff in the text. Yes, you can do this. The book groups related languages into common chapters. The explanatory text is lucid and the graphics for the languages let you easily cross-compare. Of course, at a higher level of meaning like sentences, you will need specialised texts in those languages. But to understand a language, you need to start at its letters or pictograms.

Think of this book as an index into all the languages of man.

Rating: 4 stars
Summary: New version of one of the most-used standards
Review: One reason for the wide acceptance of the Unicode standard is that the Unicode consortium has made it so freely available. There's no point in my discussing in detail what is in this volume when you can peruse PDF files of the entire work on the Unicode website (minus only chapter division graphics).

Browse through the book just like you would in a bookstore or library. Print out parts of it or all of it for free if you want. Well, it is free if you don't count the cost of paper (about 1500 sheets, or half that for duplex printing), cost of a binder (or maybe two binders) and the time you would have to spend punching the holes.

If you are mainly or only interested in particular sections of the standard then printing only those sections may be a reasonable thing to do.

On the other hand the price is *very* reasonable for an 8½" × 11" hardbound book with 1,462 pages. If it's the sort of book you know you want for browsing and for reference then it is likely you will want it in this nicely bound copy.

Like the previously published versions of the Unicode standard, this book is a beautiful book that is useful to those who don't need or want to get into the technical details of character properties and rules for bi-directional display and other necessary rules for displaying the characters. But for the actual use of many characters you will have to consult other lists outside the Unicode book or files, e.g. dictionaries and grammars of various languages or explanations of symbols used in various fields of mathematics.

Language and writing systems are messy and inconsistent and handling them systematically and coherently cannot be made easy. Accordingly the rules and explanations in this standard are by necessity often long and involved and couched in technical language. It can't be avoided that, for example, one must sometimes distinguish carefully between _characters_, _glyphs_, _graphemes_, _grapheme clusters_, _ligatures_ and _digraphs_ and whether one character is a _canonical equivalent_ of another character or sequence of characters or a _compatibility equivalent_ of another character or sequence of characters or just similar to another character or sequence of characters.
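To make the canonical/compatibility distinction concrete, here is a minimal sketch using the `unicodedata` module from Python's standard library. The helper names `canonically_equivalent` and `compatibility_equivalent` are my own, not terms from the standard, and Python ships a later version of the Unicode data than 4.0, though the characters used here behave the same way.

```python
import unicodedata

# Two spellings of "é": one precomposed character, and "e" followed by
# a combining acute accent. They look identical but are different
# code point sequences.
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" + COMBINING ACUTE ACCENT

ligature = "\ufb01"      # LATIN SMALL LIGATURE FI


def canonically_equivalent(a: str, b: str) -> bool:
    """True if a and b have the same canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def compatibility_equivalent(a: str, b: str) -> bool:
    """True if a and b have the same compatibility decomposition (NFKD)."""
    return unicodedata.normalize("NFKD", a) == unicodedata.normalize("NFKD", b)


# The raw strings differ, yet they are canonical equivalents.
print(precomposed == decomposed)                         # False
print(canonically_equivalent(precomposed, decomposed))   # True

# The "fi" ligature is only a compatibility equivalent of "f" + "i".
print(canonically_equivalent(ligature, "fi"))            # False
print(compatibility_equivalent(ligature, "fi"))          # True
```

In practice this is why a plain string comparison can report two visually identical pieces of text as different unless both are normalized first.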

The Unicode character set is still a work in progress. Version 4.0 may not even approach the half-way mark in encoding every character that has been used in normal text records by human beings for which a meaning is known. No-one has ever tried to produce a list of characters on this scale before. No-one yet knows how many distinct characters there are.

But 4.0 covers 96,382 characters from *almost* every script currently used for modern languages and from some ancient scripts as well including Ugaritic cuneiform, Cretan Linear B and the ancient Cypriot syllabary. (Sumerian/Akkadian cuneiform is being worked on and Egyptian hieroglyphics will eventually follow.)

Included is a plethora of technical symbol characters including mathematical characters, chess pieces, die faces, characters needed for modern western music notation, characters needed for Byzantine music notation, ornamental dingbats and so much more. All of it is now at the fingertips of every computer user -- that is if fonts that contain the characters are installed.
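As a small illustration, the sketch below looks up the official names of a few of these symbols with Python's standard `unicodedata` module. The helper `describe` is my own naming, and Python's bundled character database is newer than 4.0, but these three characters were already encoded by then. Whether the symbols actually render on your screen still depends on your fonts.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# A chess piece, a die face, and a western music notation symbol.
samples = ["\u2654", "\u2680", "\U0001D11E"]

for ch in samples:
    print(describe(ch))
# U+2654 WHITE CHESS KING
# U+2680 DIE FACE-1
# U+1D11E MUSICAL SYMBOL G CLEF
```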

Finding fonts that display some of these characters is still a problem. :-(

But it would be a worse problem if these characters weren't assigned to a common character set. The past practice of numerous special fonts for various symbols and scripts which disagreed with one another on how the characters were encoded produced a horrible mess.

Large as it is, with 40% more pages than version 3.0, the book doesn't contain the whole standard. Increasingly, as the standard has expanded, tabular material has been dropped from the printed volumes and replaced with references to data files available on the website or on the CD that comes with the book.

The end of section 3.2 specifies six files found as Annexes on the website and on the CD which "are essential parts of version 4.0" including an explanation of the bidirectional algorithm which appeared in the printed text for earlier releases. And there are many mentions in the printed standard of other files available on the CD or website. A binder containing printouts of this material is necessary if you want a truly complete hardcopy of the entire 4.0 standard.

Unfortunately the 4.0 HTML files are carelessly laid down on the CD with external links pointing to files on the Unicode website and not to the corresponding files on the CD. Graphics are sometimes missing, though the only file where I think this matters is StandardizedVariants.html, which has a number of variant character images. (The data in this short file should have been in the book.)

If you work online you probably won't notice anything wrong but you also are likely not to notice that after clicking on a link you are viewing a file from the Unicode website instead of a file on the CD. That may matter in the future if you need to reference a 4.0 file and don't observe that the file you are actually looking at is from the website and is a "latest version" file that has been updated beyond 4.0. If you are working offline you can avoid this, but it is annoying to have to manually search for the file by name because the link fails.

Also, although the Readme.txt file on the CD mentions "mapping tables" and files with "the extension .UNI", these useful conversion tables which were included on the CDs with previous releases are missing on the 4.0 CD. But they are available on the website.

This is a minor caveat. I suspect most people will use the website in any case rather than the CD.

Rating: 5 stars
Summary: Essential reference for modern programming
Review: The Unicode character set is among the most widely used and least known of the international software standards. Java programmers have used it every day for a decade or so, but barely one in ten appears to know anything about it.

The content of ISO standard 10646 (successor to 7-bit ISO 646) goes way beyond just a character set. It contains information critical to the correctness of any program that steps outside the English-language world, i.e. every program on the Internet, and many others sooner or later. This is the basis for correct handling of numerals (there's a lot more than 0 to 9), letters, and text. It's also the explanation for some program behaviors that might otherwise baffle a programmer, or at least a programmer with the wit to be baffled.
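For a taste of what "more than 0 to 9" means, here is a minimal sketch using Python's standard `unicodedata` module; the variable names are mine, chosen for illustration. It shows that digits from other scripts carry numeric values, and that some numeric characters (such as Roman numerals) are numbers without being decimal digits.

```python
import unicodedata

arabic_indic = "\u0663\u0664"   # ARABIC-INDIC DIGIT THREE, DIGIT FOUR
devanagari_three = "\u0969"     # DEVANAGARI DIGIT THREE
roman_twelve = "\u216b"         # ROMAN NUMERAL TWELVE

# Decimal digits from any script are in category "Nd", and Python's
# int() accepts them directly.
print(unicodedata.category(devanagari_three))   # Nd
print(unicodedata.digit(devanagari_three))      # 3
print(int(arabic_indic))                        # 34

# A Roman numeral is a "letter number" (Nl): it has a numeric value
# but is not a decimal digit.
print(unicodedata.category(roman_twelve))       # Nl
print(unicodedata.numeric(roman_twelve))        # 12.0
print(roman_twelve.isdigit())                   # False
print(roman_twelve.isnumeric())                 # True
```

A check like `ch in "0123456789"` would silently miss every one of these, which is the kind of baffling behavior described above.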

More than just crucial, the content of this standard is plain fun. Its snippets of information from every major world language give wonderful insight into how people express themselves. It drives home the delightful diversity of human language and experience. It's also a near-bottomless source of stump-your-friends trivia.

I admit, I'll never use every fact in this incredible assembly. I use a lot of the information, though, and I use it as the point of entry into every discussion of internationalization and localization of software.

Rating: 5 stars
Summary: An indispensable resource
Review: This book is one that every programmer should have access to. Packed with information concerning the latest standards, with explanations, this is the reference that I use whenever I need data regarding Unicode mappings. I recommend it to all of my students and have asked all libraries where I have influence to add it to their collection.
There is also a CD included with the book. It contains a database of the current and all past versions of the Unicode mappings, a series of Unicode technical reports and an installable version of the Unibook Character Browser, a small utility for viewing character charts and properties. Invaluable if you prefer electronic versions of the data.
