Unicode character encoding description

Unicode is a standard for encoding the characters of national languages (Russian, Turkish, Chinese, ...) in binary form for computers.

Originally, each character was stored in a single byte, so at most 256 different characters could be encoded. The first 128 are standardized as the ASCII table, which defines the lowercase letters (a-z), the capital letters (A-Z) without diacritics (i.e. without nationally dependent characters), the digits (0-9), and special characters such as the comma, semicolon, colon and many others.
The remaining 128 values can be used to encode special national language characters, for example á, č, ü, etc. used in Central European languages, or ф, и, б, ъ, etc. used in Russian. Unfortunately, there are far too many such characters to fit into the remaining 128 values. That is why separate code pages were created for different groups of national languages: for example the Win-1250 code page for Central European languages, the Win-1251 code page containing the Cyrillic alphabet, etc. This made it possible to store text in a national language, but not to combine characters from different code pages in one text file.
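The code page limitation can be illustrated with a short Python sketch. It is not part of PROMOTIC; the helper names decode_with and can_encode are made up for this illustration, and "cp1250" / "cp1251" are the Python codec names for the Win-1250 and Win-1251 code pages.

```python
def decode_with(code_page: str, raw: bytes) -> str:
    """Interpret raw bytes as text using the given code page."""
    return raw.decode(code_page)


def can_encode(text: str, code_page: str) -> bool:
    """Return True if every character of text exists in the code page."""
    try:
        text.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


# One and the same byte value stands for a different character,
# depending on which code page is used to read it.
raw = bytes([0xE8])
print(decode_with("cp1250", raw))  # Central European code page
print(decode_with("cp1251", raw))  # Cyrillic code page

# A text that mixes Central European and Cyrillic characters
# fits into neither code page, but it does fit into UTF-8.
mixed = "čф"
for name in ("cp1250", "cp1251", "utf-8"):
    print(name, can_encode(mixed, name))
```

The byte 0xE8 is read as č under Win-1250 and as и under Win-1251, which is why a file saved in one code page turns into wrong characters when opened in another.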

The encoding of special language characters was finally solved by the introduction of Unicode (for more details, visit the website of the Unicode Consortium at http://www.unicode.org). This system can encode the special characters of all the world's languages correctly. The solution is based on the fact that a character is no longer stored in one byte (only 256 possible values) but in 2 bytes (i.e. 65536 possible values). This encoding is called UTF-16. Characters that do not fit into these 65536 values are stored in UTF-16 as a pair of 2-byte units (4 bytes in total).
The main advantage of UTF-16 is the very simple handling of all characters; the disadvantages are the doubled size and incompatibility with the ASCII table. The incompatibility is a considerable problem when text files are saved. Therefore an alternative Unicode encoding was created that stores characters with variable length. ASCII characters are stored in 1 byte, while non-ASCII characters are stored in 2 to 4 bytes (the first byte indicates how many more bytes follow). This encoding is called UTF-8. It is mainly used for text files (XML, HTML). When working with such texts, the characters are converted into UTF-16 in the computer's operating memory, which makes processing faster.
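The size difference between the two encodings can be checked with a minimal Python sketch. The function names encoded_sizes and utf8_length_from_lead_byte are hypothetical helpers written for this illustration only. The codec "utf-16-le" is used instead of "utf-16" so that no byte order mark is added to the result.

```python
def encoded_sizes(ch: str) -> tuple[int, int]:
    """Return (UTF-8 size, UTF-16 size) of one character, in bytes."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


def utf8_length_from_lead_byte(lead: int) -> int:
    """Read the total length of a UTF-8 sequence from its first byte."""
    if lead < 0x80:            # 0xxxxxxx: plain ASCII, single byte
        return 1
    if lead >> 5 == 0b110:     # 110xxxxx: one more byte follows
        return 2
    if lead >> 4 == 0b1110:    # 1110xxxx: two more bytes follow
        return 3
    if lead >> 3 == 0b11110:   # 11110xxx: three more bytes follow
        return 4
    raise ValueError("not a valid UTF-8 lead byte")


for ch in ["A", "č", "ф", "中", "😀"]:
    utf8_size, utf16_size = encoded_sizes(ch)
    lead = ch.encode("utf-8")[0]
    print(ch, utf8_size, utf16_size, utf8_length_from_lead_byte(lead))

# ASCII text is byte-for-byte identical in UTF-8, but not in UTF-16.
print("A".encode("utf-8"), "A".encode("utf-16-le"))
```

In this sketch the ASCII letter takes 1 byte in UTF-8 and 2 bytes in UTF-16, the Chinese character takes 3 and 2 bytes, and the emoji, which lies outside the 65536 values of a single 2-byte unit, takes 4 bytes in both encodings.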

The PROMOTIC system has used the UTF-16 and UTF-8 encodings since version 7.1.0, so multilingual applications can be created without switching character code pages or using special language versions of the Windows OS. UTF-16 is used while the application is running (texts in panels, in scripts, etc.), while UTF-8 is used for text files (e.g. XML text files for the Macro expression $.text).

PROMOTIC 9.0.31 SCADA system documentation MICROSYS, spol. s r.o.
