Unicode is a standard for encoding the characters of national languages (Russian, Turkish, Chinese, ...) in the binary form used by computers.
Originally, each character was stored in one byte, which means that a total of 256 different characters could be coded. The first 128 characters are standardized as the so-called ASCII table, which defines all lower case letters (a-z) and capital letters (A-Z) without diacritics (i.e. without nationally dependent characters), the digits (0-9), and special characters such as the comma, semicolon, colon and many others.
The remaining 128 characters can be used for coding the special national language characters, for example á, č, ü, etc. used in Central European languages and ф, и, б, ъ, etc. used in Russian. Unfortunately, there are far too many of these special national language characters to code them all in the remaining 128 positions. That is why separate code pages were created for different national language groups, for example the Win-1250 code page used for the Central European languages, the Win-1251 code page containing all Cyrillic alphabet characters, etc. This made it possible to save texts in national languages, but it was not possible to combine characters from different code pages in one text file.
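The code page limitation can be illustrated with a minimal Python sketch. It assumes Python's built-in `cp1250` and `cp1251` codecs as stand-ins for Win-1250 and Win-1251; the helper name `fits_code_page` is hypothetical and used only for this illustration.

```python
# One byte value, two code pages: the meaning depends on which table is used.
raw = bytes([0xE8])

as_central_european = raw.decode("cp1250")  # Win-1250 (Central European)
as_cyrillic = raw.decode("cp1251")          # Win-1251 (Cyrillic)

print(as_central_european, as_cyrillic)


def fits_code_page(text: str, code_page: str) -> bool:
    """Return True if every character of text exists in the given code page."""
    try:
        text.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


# A text mixing one Central European and one Cyrillic character
# cannot be saved completely in either code page.
mixed = "č и"
print(fits_code_page(mixed, "cp1250"), fits_code_page(mixed, "cp1251"))
```

The same byte is read as a different letter depending on the code page, and the mixed text fits neither of them, which is the problem described above.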
The coding of special language characters was finally solved by the introduction of the Unicode character coding (for more details please visit the website of the Unicode Consortium at http://www.unicode.org). This coding system is able to code the special characters of all the world's languages correctly. The solution is based on the fact that a character is no longer stored in one byte (only 256 possible values) but in 2 bytes (i.e. 65536 possible values). This coding system is called UTF-16. (Characters that do not fit into these 65536 values are stored in UTF-16 as a pair of 2-byte units, i.e. in 4 bytes.)
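As a small illustration of the 2-byte storage, the following Python sketch encodes a few characters with the built-in `utf-16-le` codec (little-endian, without a byte order mark). The helper name `utf16_bytes` is an assumption made for this example only.

```python
def utf16_bytes(char: str) -> bytes:
    """Encode one character as UTF-16 (little-endian, no byte order mark)."""
    return char.encode("utf-16-le")


# An ASCII letter, a Central European letter and a Cyrillic letter
# each occupy exactly 2 bytes.
for char in ("A", "č", "и"):
    encoded = utf16_bytes(char)
    print(char, len(encoded), encoded.hex())
```

Characters from different national alphabets get distinct 2-byte values, so they can be combined in one text.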
The main advantage of UTF-16 is the very simple handling of all possible characters; the disadvantages are the doubled size and incompatibility with the ASCII table. The incompatibility is a considerable problem when text files are saved. Therefore an alternative Unicode coding system was created that stores each character in a variable number of bytes. The ASCII table characters are stored in 1 byte, while non-ASCII characters are stored in 2 to 4 bytes (the 1st byte indicates how many further bytes follow). This coding system is called UTF-8. It is mainly used for text files (XML, HTML). While working with such texts, the characters are converted into UTF-16 in the computer's operating memory, which makes processing faster.
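The variable length of UTF-8 can be sketched in Python as follows. The function name `utf8_sequence_length` is hypothetical. It derives the total length of a character from its first byte, following the standard UTF-8 bit patterns, and does not perform full validation.

```python
def utf8_sequence_length(first_byte: int) -> int:
    """Return the total number of bytes of a UTF-8 character, given its first byte."""
    if first_byte < 0x80:           # 0xxxxxxx: ASCII, no further bytes
        return 1
    if 0xC0 <= first_byte < 0xE0:   # 110xxxxx: one further byte
        return 2
    if 0xE0 <= first_byte < 0xF0:   # 1110xxxx: two further bytes
        return 3
    if 0xF0 <= first_byte < 0xF8:   # 11110xxx: three further bytes
        return 4
    raise ValueError("not a valid first byte of a UTF-8 character")


for char in ("A", "č", "и", "€"):
    encoded = char.encode("utf-8")
    print(char, encoded.hex(), utf8_sequence_length(encoded[0]))

# Pure ASCII text is byte-for-byte identical in UTF-8 and ASCII,
# while UTF-16 needs twice as many bytes for it.
text = "Hello"
print(text.encode("utf-8") == text.encode("ascii"))
print(len(text.encode("utf-8")), len(text.encode("utf-16-le")))

# Converting file content (UTF-8) to the in-memory form (UTF-16).
in_memory = "č".encode("utf-8").decode("utf-8").encode("utf-16-le")
```

The comparison at the end shows both points made above: UTF-8 stays compatible with ASCII, and it is more compact than UTF-16 for texts that consist mostly of ASCII characters.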
The UTF-16 and UTF-8 character coding has been used by the PROMOTIC system since version 7.1.0, so it is possible to create multilingual applications without switching character code pages or using special language versions of the Windows OS. The UTF-16 coding is used while the application is running (texts in panels, in scripts, etc.), while the UTF-8 coding is used for text files (e.g. XML text files for Macro expression $.text).