Unicode

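Java uses the Unicode character set. Unicode was originally designed as a two-byte (16-bit) character code, which is why a Java char is 16 bits wide. It has since grown beyond 65,536 characters, and Java represents the additional ones as pairs of char values. Unicode has characters for almost all human alphabets and writing systems around the world, including English, Arabic, Chinese and more.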

Unfortunately not all operating systems, editors and fonts handle Unicode equally well. For the most part Java will properly convert input in other character encodings to Unicode. The first 128 characters in the Unicode character set are identical to the common ASCII character set. The second 128 characters are identical to the upper 128 characters of the ISO Latin-1 extended ASCII character set. It's the characters beyond these first 256 that present problems.
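
The overlap with ASCII and ISO Latin-1 described above can be checked with a short sketch. The class and method names here (Latin1Check, matchesAscii, matchesLatin1) are invented for illustration. The idea is to encode each character to bytes and compare the byte value with the character's Unicode number.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, written only to illustrate the paragraph above.
class Latin1Check {

    // True if each of the first 128 chars encodes to one ASCII byte
    // whose value equals the char's Unicode number.
    static boolean matchesAscii() {
        for (int cp = 0; cp < 128; cp++) {
            byte[] encoded = String.valueOf((char) cp).getBytes(StandardCharsets.US_ASCII);
            if (encoded.length != 1 || encoded[0] != cp) {
                return false;
            }
        }
        return true;
    }

    // True if every char in [from, to) encodes to one ISO Latin-1 byte
    // whose unsigned value equals the char's Unicode number.
    static boolean matchesLatin1(int from, int to) {
        for (int cp = from; cp < to; cp++) {
            byte[] encoded = String.valueOf((char) cp).getBytes(StandardCharsets.ISO_8859_1);
            if (encoded.length != 1 || (encoded[0] & 0xFF) != cp) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("First 128 match ASCII: " + matchesAscii());
        System.out.println("Second 128 match Latin-1: " + matchesLatin1(128, 256));
        // Capital delta lies outside Latin-1, so the encoder substitutes another byte.
        System.out.println("Delta matches Latin-1: " + matchesLatin1(0x0394, 0x0395));
    }
}
```

Characters that Latin-1 cannot represent are replaced by a substitute byte during encoding, which is why the comparison fails for them.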

You can refer to a particular Unicode character by using the escape sequence \u followed by a four-digit hexadecimal number. For example:

\u00A9    ©    The copyright symbol
\u0022    "    The double quote
\u00BD    ½    The fraction 1/2
\u0394    Δ    The capital Greek letter delta
\u00F8    ø    A little o with a slash through it
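
As a minimal sketch, the escapes listed above could be used in a program like the following. The class name UnicodeEscapeDemo and the describe method are made up for this example. One caveat: the compiler translates Unicode escapes before it reads string literals, so \u0022 written inside a string literal would end that literal early. The sketch uses the ordinary \" escape for the double quote instead.

```java
// Hypothetical demo class; the names are not from the original tutorial.
class UnicodeEscapeDemo {

    // Each constant is written with a Unicode escape instead of the literal character.
    static final char COPYRIGHT = '\u00A9';
    static final char ONE_HALF = '\u00BD';
    static final char DELTA = '\u0394';
    static final char O_SLASH = '\u00F8';

    // The double quote is written with the ordinary backslash escape.
    static final char DOUBLE_QUOTE = '\"';

    // Formats a char's Unicode number as four uppercase hex digits, e.g. U+00A9.
    static String describe(char c) {
        return String.format("U+%04X", (int) c);
    }

    public static void main(String[] args) {
        char[] samples = {COPYRIGHT, DOUBLE_QUOTE, ONE_HALF, DELTA, O_SLASH};
        for (char c : samples) {
            // Whether the character displays correctly depends on the console's encoding.
            System.out.println(describe(c) + " " + c);
        }
    }
}
```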

You can even use Unicode letters beyond basic ASCII to name your variables. However, if your text editor doesn't handle more than basic ASCII very well, you can use Unicode escape sequences instead, like this

String Mj\u00F8lner = "Hammer of Thor";

but frankly this is way more trouble than it's worth.
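
For the curious, here is a rough sketch of the declaration above inside a complete program, together with a check of which characters Java accepts in identifiers. The class name IdentifierDemo and the method looksLikeIdentifier are invented for this example. The check looks only at individual char values: it ignores reserved words and characters that need two char values.

```java
// Hypothetical demo class; the names are not from the original tutorial.
class IdentifierDemo {

    // True if the characters of the name are legal in a Java identifier:
    // a valid start character followed by valid part characters.
    static boolean looksLikeIdentifier(String name) {
        if (name.isEmpty() || !Character.isJavaIdentifierStart(name.charAt(0))) {
            return false;
        }
        for (int i = 1; i < name.length(); i++) {
            if (!Character.isJavaIdentifierPart(name.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The escape is translated to the letter before the compiler reads the identifier.
        String Mj\u00F8lner = "Hammer of Thor";
        System.out.println(Mj\u00F8lner);

        System.out.println(looksLikeIdentifier("Mj\u00F8lner"));
        System.out.println(looksLikeIdentifier("2fast"));
    }
}
```

Letters such as the o with a slash pass the check, while symbols such as the copyright sign do not, because only letters, digits, and a few connecting characters are allowed in identifiers.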



Copyright 1997-9, 2002 Elliotte Rusty Harold
elharo@metalab.unc.edu
Last Modified January 25, 2002