Computers internally work on numbers. This means that characters need to be coded as numbers. A typical arrangement is to use numbers from 0 to 255, because this range fits into a basic unit of data storage and transfer, called an (8-bit) byte or octet.

A character code defines how these numbers correspond to characters. Most character codes share the same assignment for the numbers 0 to 127, which cover the characters that appear in English as well as in many other languages: the letters a-z plus their uppercase equivalents, the digits 0-9 and a few punctuation marks. Many of the code numbers in this so-called ASCII set of characters are used for various technical purposes, such as control codes for line breaks and tabs. For French texts, for example, you need additional characters such as accented letters. These can be provided by using code numbers in the range 128-255 in addition to the ASCII range, and this gives room for letters used in most other Western European languages as well.
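The mapping between characters and code numbers can be tried out directly. The following is a minimal sketch in Python (a language chosen here for illustration only; the helper name `code_numbers` is hypothetical). It uses the built-in functions `ord` and `chr`, and Python's `latin-1` codec, to show an ASCII letter and an accented letter as numbers.

```python
# Map characters to code numbers and back using Python built-ins.
def code_numbers(text):
    """Return the code number of each character in text."""
    return [ord(ch) for ch in text]

# Letters and digits fall in the ASCII range 0-127.
ascii_codes = code_numbers("Az9")

# "\u00e9" is the accented letter e-acute; in Latin 1 it is stored
# as a single byte whose value lies in the range 128-255.
accented_bytes = list("\u00e9".encode("latin-1"))

# chr() goes the other way: from a code number to a character.
roundtrip = chr(accented_bytes[0])

print(ascii_codes, accented_bytes, roundtrip)
```

Plain English text only ever produces numbers below 128, while the accented letter needs one of the upper 128 values.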

Thus you can use a single character code, called Latin 1, even for a text containing a mixture of English, French, Spanish and German, because these languages all use the Latin characters with relatively few additions. However, you quickly run out of numbers if you try to cover too many languages within 256 characters. For this reason, different character codes were developed. For example, Latin 1 is for Western European languages, Latin 2 is for several languages spoken in Central and Eastern Europe, and additional character codes exist for Greek, Cyrillic, Arabic, etc. Character codes that use only the code numbers 0-255 are called 8-bit codes, since such code numbers can be represented using 8 bits. Things change when you need to combine languages in one document and the languages are fundamentally different in their use of characters.
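To see why mixing 8-bit codes is a problem, consider how one and the same byte is read under different character codes. This is only an illustrative sketch; the codec names are those of Python's standard library (`latin-1` for Latin 1, `iso8859_2` for Latin 2, `iso8859_7` for Greek, `iso8859_5` for Cyrillic), and the byte value 241 is an arbitrary choice.

```python
# One byte value, interpreted under four different 8-bit character codes.
raw = bytes([0xF1])  # the single code number 241

decoded = {
    "latin-1": raw.decode("latin-1"),      # Western European
    "iso8859_2": raw.decode("iso8859_2"),  # Central and Eastern European
    "iso8859_7": raw.decode("iso8859_7"),  # Greek
    "iso8859_5": raw.decode("iso8859_5"),  # Cyrillic
}

for codec, ch in decoded.items():
    print(codec, repr(ch))

# Below 128 the codes agree, because they all share the ASCII assignments.
shared = {bytes([0x41]).decode(codec) for codec in decoded}
```

The stored number is identical in every case; only the character code chosen for reading it decides which letter appears. A document therefore has to declare its character code, and a single 8-bit code cannot hold, say, Greek and Cyrillic letters at the same time.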

As you may know, the number of characters needed for Chinese and Japanese is very large. They simply would not fit into a set of only 256 characters. Therefore, different strategies are used. For example, two bytes (octets) instead of one might be used for some characters. On the other hand, the character codes developed for the needs of East Asian languages do not contain all the characters used in the world.
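As a rough illustration, the sketch below uses Shift JIS, one Japanese character code available in Python's standard library as the `shift_jis` codec (the text above does not name a specific code, so this choice is an assumption). The helper `byte_length` is hypothetical. It shows a Latin letter taking one byte, a kanji taking two, and an Arabic letter that the code cannot represent at all.

```python
def byte_length(ch, codec):
    """Number of bytes one character occupies in the given character code."""
    return len(ch.encode(codec))

latin_len = byte_length("A", "shift_jis")       # plain Latin letter
kanji_len = byte_length("\u65e5", "shift_jis")  # the kanji for "sun, day"

# An Arabic letter (ain) is outside the repertoire of this Japanese code,
# so encoding it fails.
try:
    "\u0639".encode("shift_jis")
    arabic_supported = True
except UnicodeEncodeError:
    arabic_supported = False

print(latin_len, kanji_len, arabic_supported)
```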

The solution to such problems, and to many other problems in a world of growing information exchange, is the introduction of a character code that gives every character of every language a unique number. This number does not depend on the language used in the text, the font used to display the character, the software, the operating system or the device. It is universal and kept unchanged. The range of possible numbers is set sufficiently high to cover the current and future needs of all languages.

The solution is called Unicode, and it gives anyone the opportunity to say, “I want this character displayed and the number is …”, and be understood by all systems that support Unicode. This does not always guarantee success in displaying the character, because a suitable font may be missing, but such technical problems are manageable. Much widely used software, including Windows, Mac OS X, and Linux, has supported Unicode for years. However, to use Unicode, all the relevant components must be “Unicode enabled”.
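In a Unicode-enabled environment, the unique number of a character can be looked up regardless of its language. The sketch below assumes Python 3, whose strings are Unicode, and uses the standard `unicodedata` module; the helper `describe` is hypothetical. It also stores a mixed-language text as UTF-8, one common way of writing Unicode numbers as bytes, which the text above does not discuss.

```python
import unicodedata

def describe(ch):
    """Return the Unicode number in U+XXXX notation and the character's name."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch)

# A Latin letter, an accented letter, a Greek letter and a kanji.
samples = ["A", "\u00e9", "\u03a9", "\u65e5"]
for ch in samples:
    print(describe(ch))

# One text can mix all of them; UTF-8 stores it and reads it back unchanged.
mixed = "".join(samples)
stored = mixed.encode("utf-8")
restored = stored.decode("utf-8")
```

Each character keeps the same number no matter which language, font or system is involved, which is exactly the property the 8-bit codes lacked.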

Resources for Further Study:

http://www.unicode.org/

http://en.wikipedia.org/wiki/Unicode

-Abhishek Singh