Appendix
This section describes the performance issues that can arise in international mode, the Unicode support available in desktop applications, and the outlook for UTF-8.
Performance Issues
When an account operates in international mode, certain functions inevitably run more slowly, as listed in the following table.
| Function | Description |
|---|---|
| LEN | Scans variables to count characters rather than simply returning the number of bytes |
| LOCATE | Uses the locale for the sort order |
| SORT/COMPARE | Uses the locale for the sort/compare order |
| MATCHES/MATCHFIELD | Determines whether characters are numeric, alphabetic, and so on through the locale |
| ICONV/OCONV | Performs date, time, and currency conversions for the locale |
| ALPHA, ISPRINT | Tests character properties based on the locale |
| INPUT/PRINT | Converts between the terminal code page and UTF-8 |
Normally, the LEN function returns the current byte length of the array, which is always kept up to date as the array grows or shrinks. In international mode, the LEN function must return the number of characters rather than the number of bytes in the array. As a result, the array must be traversed to count the characters, which reduces performance.
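The difference between the two lengths can be illustrated with a minimal Python sketch. Python is used here only for illustration, and `utf8_char_count` is a hypothetical helper, not the actual LEN implementation. It shows why counting characters requires a full scan: every byte has to be inspected to decide whether it starts a new character or continues the previous one.

```python
def utf8_char_count(data: bytes) -> int:
    """Count characters in UTF-8 data by scanning every byte.

    Continuation bytes have the bit pattern 10xxxxxx; every other
    byte starts a new character. (Hypothetical helper for illustration.)
    """
    return sum(1 for b in data if (b & 0xC0) != 0x80)


encoded = "naïve café".encode("utf-8")

byte_length = len(encoded)               # already known, no scan needed
char_length = utf8_char_count(encoded)   # requires a pass over the data

print(byte_length, char_length)
```

The byte length is larger than the character length here because `ï` and `é` each occupy two bytes in UTF-8.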
LOCATE usually compares strings directly, irrespective of the locale. In international mode, the locale is used during comparison. The same holds true for MATCHES, MATCHFIELD, SORT, COMPARE and property tests, since variables must first be converted to Unicode.
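As a rough illustration of why this costs more, the Python sketch below contrasts a direct byte comparison with an ordering that first decodes each value to Unicode. The `collation_key` function is a hypothetical, heavily simplified stand-in for locale-aware collation (it only strips accents and folds case); real locale rules are considerably more involved.

```python
import unicodedata


def collation_key(raw: bytes) -> str:
    """Simplified stand-in for a locale-aware sort key (an assumption,
    not a real collation algorithm): decode, strip accents, fold case."""
    text = raw.decode("utf-8")                       # conversion to Unicode
    decomposed = unicodedata.normalize("NFD", text)  # split base + accent
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return base.casefold()


words = [b"zebra", "éclair".encode("utf-8"), b"apple"]

# Direct comparison: raw bytes, no decoding, locale ignored.
byte_sorted = sorted(words)

# Locale-style comparison: every value is decoded and transformed first.
locale_like_sorted = sorted(words, key=collation_key)

print([w.decode("utf-8") for w in byte_sorted])
print([w.decode("utf-8") for w in locale_like_sorted])
```

In the byte ordering, `éclair` sorts after `zebra` because its first byte (0xC3) is greater than `z`; the decoded ordering places it between `apple` and `zebra`, at the price of extra work per comparison.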
If international mode is enabled, terminal I/O requires conversion between code pages, which is a relatively slow operation. Whenever possible, use terminal emulators and other devices that can send and receive UTF-8 directly. This eliminates code page conversion and the CPU overhead that comes with it.
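The conversion step can be sketched in Python as follows. The function names and the choice of ISO-8859-1 as the terminal code page are assumptions made for illustration; they do not describe the product's actual I/O layer.

```python
def terminal_to_utf8(raw: bytes, code_page: str) -> bytes:
    """Convert input received from a terminal into UTF-8 (hypothetical helper)."""
    return raw.decode(code_page).encode("utf-8")


def utf8_to_terminal(data: bytes, code_page: str) -> bytes:
    """Convert UTF-8 output into the terminal's code page (hypothetical helper).

    Characters the code page cannot represent are replaced with '?'.
    """
    return data.decode("utf-8").encode(code_page, errors="replace")


typed = b"caf\xe9"                                   # "café" sent by an ISO-8859-1 terminal
internal = terminal_to_utf8(typed, "iso-8859-1")     # conversion on INPUT
echoed = utf8_to_terminal(internal, "iso-8859-1")    # conversion on PRINT

print(internal, echoed)
```

A terminal that already sends and receives UTF-8 needs neither call, which is where the saving comes from.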
Because all strings are converted to UTF-8 encoding before compilation, and all data read or written is presumed to be UTF-8 encoded, other functions should incur no overhead. The exception is functions that work on a character basis, such as substring extraction.
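A minimal Python sketch of character-based substring extraction on UTF-8 data is shown below. `utf8_substring` is a hypothetical helper, not the product's implementation; it illustrates that character positions must be mapped to byte offsets by scanning, whereas a plain byte slice needs no scan but can cut a character in half.

```python
def utf8_substring(data: bytes, start: int, length: int) -> bytes:
    """Return `length` characters from zero-based character index `start`.

    Hypothetical helper: builds a list of byte offsets at which
    characters begin (any byte that is not a 10xxxxxx continuation byte).
    """
    offsets = [i for i, b in enumerate(data) if (b & 0xC0) != 0x80]
    offsets.append(len(data))              # sentinel: end of data
    last = len(offsets) - 1
    begin = offsets[min(start, last)]
    end = offsets[min(start + length, last)]
    return data[begin:end]


data = "héllo wörld".encode("utf-8")

by_bytes = data[0:2]                       # cuts "é" in half
by_chars = utf8_substring(data, 0, 2)      # whole characters

print(by_bytes, by_chars.decode("utf-8"))
```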
If an account is not configured for international mode, the overhead is a simple bit test in a few functions.
Desktop Applications
Desktop applications vary in their Unicode support, which can limit their internationalization support, as listed below.
- Windows 98/ME provide limited interfaces.
- VB 6 supports Unicode only on Windows NT/2000.
- 32-bit Microsoft Office supports Unicode.
- 32-bit COM supports only Unicode.
- OLE DB, ODBC, ADO, and RDO are all COM components.
- Java handles everything as Unicode.
UTF-8 and Future
The industry is converging on UTF-8 and Unicode for all internationalization processes. Microsoft Windows NT is built on a Unicode base. All of the following support Unicode.
- AIX, Sun, HP/UX
- All new web standards, such as HTML, XML and so on
- Latest versions of Netscape Navigator and Internet Explorer
UNIX supports Unicode in directory names through UTF-8. Most UNIX distributors and developers foresee Unicode eventually replacing older legacy encodings, primarily in its UTF-8 form, because of the difficulties faced with the alternatives.
As part of future developments, UTF-8 may be used exclusively in the following.
- Text files (source code, HTML files, email messages, etc.)
- File names
- Standard input and standard output, pipes
- Environment variables
- Cut and paste selection buffers
- Telnet, modem, and serial port connections to terminal emulators
- Any other places where byte sequences used to be interpreted as ASCII
For example, a terminal emulator such as xterm, or the Linux console driver, transforms every keystroke into the corresponding UTF-8 sequence and sends it to the stdin of the foreground process.
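The transformation from a character to its UTF-8 sequence can be sketched in Python as follows. `encode_code_point` is a hypothetical helper written for illustration (it does not reject surrogate code points, for brevity); it is not taken from xterm or the Linux console driver.

```python
def encode_code_point(cp: int) -> bytes:
    """Encode one Unicode code point as a UTF-8 byte sequence (illustrative)."""
    if cp < 0x80:          # 1 byte: ASCII is passed through unchanged
        return bytes([cp])
    if cp < 0x800:         # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:       # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# Each "keystroke" becomes the byte sequence that would be sent to stdin.
for key in ["A", "é", "€"]:
    print(key, encode_code_point(ord(key)).hex(" "))
```

An ASCII key such as `A` produces the same single byte it always did, while `é` and `€` become two- and three-byte sequences.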
If it is certain that an application will only ever use ASCII characters, internationalization may not be required, and with UTF-8 all ASCII characters stay the same in any case. However, if offering the application to additional markets is a possibility, internationalization must be considered as part of the development process.
It is best to consider the impact of internationalization in the early stages of software development to avoid significant application rework in the later stages. Internationalization is not only a matter of translation; it is also a standard for development and quality.
Internationalization can reduce the performance of some important functions in the finished software product. However, for a global marketplace it is an important business priority. Careful consideration and understanding of the internationalization process will yield gains in the development lifecycle and improve product quality.