jBASE Configuration and Properties

JBASE_I18N

Setting this variable causes the application to run in international mode.

NOTE: A LOGTO command can modify the value of this environment variable. JBASE_I18N is then set or unset according to whether the target account is configured for international mode (true or false).

JBASE_CODEPAGE

You can only set the JBASE_CODEPAGE environment variable to a valid code page available with the ICU package. The jcodepages command displays the list of currently available code pages. Input and output conversion takes place only if the account is configured for international mode or the JBASE_I18N variable is set.

It is recommended to use UTF-8 for input and output, which effectively eliminates code page conversion and reduces system resource requirements. Several commercially available telnet clients can communicate using UTF-8. In that case the telnet client performs the conversion from its configured code page to UTF-8. It is therefore important to configure the client correctly, so that its input and output code page matches the required keyboard mapping.

Code page conversion applies only when the JBASE_I18N environment variable is set. If this variable is not set, no code page conversion occurs, and the contents of all variables are handled as bytes rather than characters. Because international mode is configured per account, the state of international mode can change when a LOGTO command is executed.
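The rule that conversion depends on JBASE_I18N can be modelled in a few lines of Python. This is a minimal sketch, not jBASE's implementation: `terminal_to_internal` is a hypothetical helper, Python codec names stand in for ICU code page names, and falling back to UTF-8 when JBASE_CODEPAGE is unset is an assumption.

```python
import os


def terminal_to_internal(raw: bytes) -> bytes:
    """Hypothetical model of input conversion from the terminal code page.

    Conversion to UTF-8 happens only when JBASE_I18N is set.
    """
    if not os.environ.get("JBASE_I18N"):
        # Not in international mode: bytes pass through unchanged.
        return raw
    # Assumption: an unset JBASE_CODEPAGE is treated as UTF-8.
    codepage = os.environ.get("JBASE_CODEPAGE", "utf-8")
    # Decode from the terminal code page, then re-encode as UTF-8.
    return raw.decode(codepage).encode("utf-8")


os.environ["JBASE_I18N"] = "1"
os.environ["JBASE_CODEPAGE"] = "iso-8859-1"
print(terminal_to_internal(b"\xfc"))  # Latin-1 u umlaut becomes b'\xc3\xbc'
```

With JBASE_CODEPAGE set to utf-8, the decode/encode pair returns valid input unchanged. This is why a UTF-8 client lets a real implementation skip the conversion step entirely.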

JBASE_LOCALE

You can only set the JBASE_LOCALE environment variable to a valid locale available with the ICU package. The jlocales command displays the list of currently available locales. You can use the configured locale only if the account is configured for international mode or the JBASE_I18N variable is set.

If the JBASE_I18N environment variable is not set, the locale is based on the underlying OS locale configuration, and the locale configured for the user ID has no effect. Because international mode is configured per account, the state of international mode can change when a LOGTO command is executed. If an account is not configured for international mode, LOGTO unsets the JBASE_I18N environment variable.

JBASE_TIMEZONE

You can only set the JBASE_TIMEZONE environment variable to a valid time zone available with the ICU package. The jtimezones command displays the list of currently available time zones. You can use the configured time zone only if the account is configured for international mode or the JBASE_I18N variable is set.

If the JBASE_I18N environment variable is not set, the time zone is based on the underlying OS time zone configuration, and the time zone configured for the user ID has no effect. Because international mode is configured per account, the state of international mode can change when a LOGTO command is executed. If an account is not configured for international mode, LOGTO unsets the JBASE_I18N environment variable.

For example, the following environment variable settings configure a French user: international mode is enabled, the locale is French as used in France (fr_FR), and the code page is Latin-1 (ISO-8859-1).

JBASE_I18N=1
JBASE_CODEPAGE=iso-8859-1
JBASE_LOCALE=fr_FR

Characters vs Bytes

LEN, SUBSTRINGS, X[n,m], INDEX

In international mode, length calculation and substring extraction work in characters rather than bytes, and the resulting positions are character positions rather than byte offsets.

BYTELEN

The BYTELEN function returns the actual number of bytes in a variable rather than the number of characters.

EXAMPLE:

The following source code example contains UTF-8 encoded characters representing the German u umlaut, ü (0xC3 0xBC), and sharp s, ß (0xC3 0x9F).

X = "Füßball";* String as UTF-8 sequence  "F.C3.BC.C3.9Fball"
CRT X
CRT "Character Length of X is ":LEN(X)
CRT "Byte Length of X is ":BYTELEN(X)
CRT "Substring[1,3] of X is ": X[1,3]

If executed in international mode with the Input/Output Code Page configured to ISO-8859-1 (Latin1), this code will produce the following output.

Füßball
Character Length of X is 7
Byte Length of X is 9
Substring[1,3] of X is Füß

NOTE: The length returned by the LEN function is the number of characters in variable X, whereas the length returned by the BYTELEN function is always the number of bytes in variable X.
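The same character-versus-byte distinction can be reproduced in Python. Its `str` type counts characters and its `bytes` type counts bytes. This sketch only mirrors the results above. `substr` is a hypothetical helper that mimics jBC's 1-based `X[start,length]` extraction.

```python
def substr(s, start: int, length: int):
    """Mimic jBC X[start,length]: 1-based start position, then a length."""
    return s[start - 1:start - 1 + length]


x = "F\u00fc\u00dfball"        # "Füßball"
encoded = x.encode("utf-8")    # b'F\xc3\xbc\xc3\x9fball'

print(len(x))                  # 7 characters, like LEN in international mode
print(len(encoded))            # 9 bytes, like BYTELEN
print(substr(x, 1, 3))         # Füß: three characters
print(substr(encoded, 1, 3))   # b'F\xc3\xbc': three bytes, which is only "Fü"
```

Slicing by bytes, as happens outside international mode, returns three bytes that decode to only two characters. A byte slice that ended in the middle of a multi-byte sequence would not decode at all.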

Internationalisation Properties

This section provides the character, collation and conversion properties required for internationalization.

Character Properties

The following are the character properties involved.

UPCASE, DOWNCASE, ALPHA, MATCHES, MATCHFIELD

In international mode, functions use the configured locale to convert and/or test character properties.

The following source code example contains a UTF-8 encoded byte sequence representing the German ‘u’ umlaut (0xC3 0xBC).

X = "ü"    ;* this string held in source as UTF-8 "C3.BC"
CRT X: " becomes ": UPCASE(X)
IF ALPHA(X) THEN CRT X: " is alphabetic "
IF X MATCHES "1A" THEN CRT X: " is alphabetic "

If executed in international mode with the Input/Output Code Page configured to ISO-8859-1 and the locale configured to German (de_DE), this code produces the following output.

ü becomes Ü
ü is alphabetic
ü is alphabetic

The following table describes the functions used in the example above.

Function	Description
UPCASE	Converts the lower case u umlaut to the upper case equivalent, that is, the UTF-8 byte sequence 0xC3 0xBC becomes 0xC3 0x9C.
ALPHA	Tests the lower case u umlaut as an alphabetic character according to the configured locale (de_DE).
MATCHES	Tests the lower case u umlaut against the pattern 1A (a single alphabetic character) according to the configured locale (de_DE).
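The case mapping and alphabetic tests in the table can be checked with Python's Unicode-aware string methods. This is an illustration only. Python applies the Unicode default case mappings rather than a configured ICU locale, so locale-specific rules such as the Turkish dotted and dotless i are not modelled. `matches_n_alpha` is a hypothetical stand-in for the `nA` pattern with n of at least 1.

```python
def matches_n_alpha(s: str, n: int) -> bool:
    """Rough stand-in for MATCHES "nA" with n >= 1: exactly n letters."""
    return len(s) == n and s.isalpha()


x = "\u00fc"                           # ü, UTF-8 bytes C3 BC
upper = x.upper()                      # Ü
print(x.encode("utf-8").hex(" "))      # c3 bc
print(upper.encode("utf-8").hex(" "))  # c3 9c, as in the UPCASE row
print(x.isalpha())                     # True, like ALPHA(X)
print(matches_n_alpha(x, 1))           # True, like X MATCHES "1A"
```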

Collation Properties

The following are the collation properties involved.

SORT, LOCATE, COMPARE, LE, LT, GE, GT

In international mode, statements use the configured locale to determine sort order.

A sort of the following UTF-8 encoded byte sequences using the SORT function will generate a different sort order depending on the configured locale.

Word	Stored as UTF-8 sequence
cote	‘cote’
coté	‘cot.C3.A9’
côte	‘c.C3.B4te’
côté	‘c.C3.B4t.C3.A9’

Sort order with locale configured for ‘en_US’: cote, coté, côte, côté
Sort order with locale configured for ‘fr_FR’ (reverse accented collation): cote, côte, coté, côté

NOTE: The word côte sorts BEFORE the word coté for the configured locale fr_FR.

X = "côte"                ;* Source contains UTF-8 sequence "c.C3.B4te"
Y = "coté"                ;* Source contains UTF-8 sequence "cot.C3.A9"

The following table lists the statement and the output it generates in international mode when executed with the locale configured for French (fr_FR).

Statement	Output
IF X LT Y THEN CRT X:" is lower in collation sequence than ":Y	côte is lower in collation sequence than coté
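The two orders come from multi-level collation. Base letters are compared first, and accents are compared only if the base letters are equal. A French-style tailoring compares the accents starting from the end of the word. The following Python sketch is a deliberately simplified two-level model of that idea, not ICU's collation algorithm. `collation_key` is hypothetical and uses Unicode decomposition in place of real collation weights.

```python
import unicodedata


def collation_key(word: str, french: bool = False):
    """Toy two-level collation key (not ICU's algorithm).

    Primary level: the base letters, lower-cased.
    Secondary level: the accents attached to each base letter. A letter
    without an accent ("") sorts before any accented form.
    With french=True the secondary level is compared from the end of the
    word, modelling reverse accent ordering.
    """
    primary = []
    secondary = []
    # NFD splits "é" into "e" followed by a combining acute accent.
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch) and secondary:
            secondary[-1] += ch  # attach the accent to the previous letter
        else:
            primary.append(ch.lower())
            secondary.append("")
    if french:
        secondary.reverse()
    return (primary, secondary)


words = ["c\u00f4t\u00e9", "cote", "cot\u00e9", "c\u00f4te"]  # côté, cote, coté, côte
print(sorted(words, key=collation_key))  # cote, coté, côte, côté
print(sorted(words, key=lambda w: collation_key(w, french=True)))  # cote, côte, coté, côté
```

Real ICU collation adds further levels (such as case) and locale-specific tailorings, so this model only explains the example above. The exact rules applied for fr_FR come from the ICU collation data in use.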

Conversion Properties

The following are the conversion properties involved.

ICONV, OCONV, FMT

Conversions are implemented by a set of jBASE library functions, which in turn invoke functions in the IBM Public License package, International Components for Unicode (ICU). This package provides cross-platform open source libraries compliant with Unicode Standard 3.0 and currently supports over 170 locales independently of the system locales. Several input and output conversions depend on the configured locale.

For example, the following source code generates different date formats depending on the configured locale when executing in international mode.

CRT OCONV(0,"D2/")
CRT OCONV(0,"D")

If executed in international mode with a configured German locale (de_DE), this code produces the following output.

31/12/67
31 DEZ 1967

However, some conversions can be used to force an expected format regardless of locale. For example, the DE date format will always produce a European date format. The DG format is a new Global date format for YYYYMMDD.

CRT OCONV(0,"D2/E")       ;* displays 31/12/67
CRT OCONV(0,"DG")         ;* displays 19671231
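These outputs follow from the jBASE internal date format, in which day 0 is 31 December 1967, as the DG result 19671231 shows. Below is a minimal Python sketch of the two fixed-format cases. The helper names are hypothetical, and only the D2/E and DG layouts shown above are modelled, not the locale-dependent D and D2/ forms.

```python
from datetime import date, timedelta

INTERNAL_DATE_ZERO = date(1967, 12, 31)  # internal date 0


def internal_to_date(day: int) -> date:
    """Convert an internal date (days since 31 Dec 1967) to a date."""
    return INTERNAL_DATE_ZERO + timedelta(days=day)


def oconv_dg(day: int) -> str:
    """YYYYMMDD, the layout the DG conversion produces."""
    return internal_to_date(day).strftime("%Y%m%d")


def oconv_d2_e(day: int) -> str:
    """Day first, two-digit year, '/' separator, like D2/E."""
    return internal_to_date(day).strftime("%d/%m/%y")


print(oconv_d2_e(0))  # 31/12/67
print(oconv_dg(0))    # 19671231
print(oconv_dg(732))  # 19700101
```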

Internationalisation functions

This section describes the additional timestamp, character, byte-count and conversion functions provided for internationalization.

Timestamp Functions

The following are the timestamp functions involved.

TIMESTAMP, TIMEDIFF, CHANGETIMESTAMP, MAKETIMESTAMP, LOCALDATE, LOCALTIME

Additional functions are provided to assist with date and time internationalisation. They enable applications to obtain, convert and process a timestamp. These functions are available regardless of the current state of international mode.

The following table shows the functions and their corresponding descriptions.

Function	Description
TIMESTAMP	Returns a Coordinated Universal Time (UTC) timestamp as decimal seconds
TIMEDIFF	Returns the interval between two timestamps
CHANGETIMESTAMP	Generates a new timestamp by adjusting the supplied timestamp by a dynamic array, which specifies the adjustment values
MAKETIMESTAMP	Generates a timestamp using a specified time zone
LOCALTIME	Generates an internal time value using a supplied timestamp and time zone
LOCALDATE	Generates an internal date value using a supplied timestamp and time zone
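The relationship between a UTC timestamp and a local internal date and time can be sketched in Python. This section does not state several things, so the following are assumptions: the timestamp counts seconds from the Unix epoch (1970-01-01 00:00 UTC), and fixed UTC offsets stand in for named ICU time zones. Internal dates count days from 31 December 1967, and internal times count seconds past midnight. The helpers are hypothetical analogues of MAKETIMESTAMP, LOCALDATE and LOCALTIME, not their actual signatures.

```python
from datetime import date, datetime, time, timedelta, timezone

INTERNAL_DATE_ZERO = date(1967, 12, 31)  # internal date 0


def make_timestamp(internal_date: int, internal_time: int, tz: timezone) -> float:
    """Local internal date/time in zone tz -> UTC seconds (assumed Unix epoch)."""
    day = INTERNAL_DATE_ZERO + timedelta(days=internal_date)
    local = datetime.combine(day, time(), tzinfo=tz) + timedelta(seconds=internal_time)
    return local.timestamp()


def local_date(ts: float, tz: timezone) -> int:
    """UTC timestamp -> internal date (days since 31 Dec 1967) in zone tz."""
    return (datetime.fromtimestamp(ts, tz).date() - INTERNAL_DATE_ZERO).days


def local_time(ts: float, tz: timezone) -> int:
    """UTC timestamp -> internal time (whole seconds past midnight) in zone tz."""
    local = datetime.fromtimestamp(ts, tz)
    return local.hour * 3600 + local.minute * 60 + local.second


plus_one = timezone(timedelta(hours=1))     # fixed UTC+01:00
minus_five = timezone(timedelta(hours=-5))  # fixed UTC-05:00

ts = make_timestamp(732, 3600, plus_one)    # 1 Jan 1970, 01:00 at UTC+01:00
print(ts)                                   # 0.0, i.e. midnight UTC
print(local_date(ts, minus_five), local_time(ts, minus_five))  # 731 68400
```

The same instant yields a different local date and time in each zone. In this example, 31 December 1969 at 19:00 is the result at UTC-05:00.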

Additional Functions

The following are the additional functions involved.

BYTELEN, LATIN1, LENDP, UTF8

These additional functions help programs that need to know the actual byte length of a variable. They also provide conversion functions for handling binary values. The conversion functions should only be required when dealing with binary data, for example when transferring data to or from tape devices.

The following table shows the functions and their corresponding descriptions.

Function	Description
BYTELEN	Returns the number of actual bytes used for the string variable. You can use this function irrespective of the international mode status.
LATIN1	Converts a string variable from ISO-8859-1 to a UTF-8 encoded byte sequence. You can use this function irrespective of the international mode status.
LENDP	Returns the number of character display positions required to display the string variable, that is, the display width of its characters. For example, a null character has a display width of zero, while some Japanese Kanji characters require more than one display position. This function behaves differently when not used in international mode.
UTF8	Converts a string variable from UTF-8 encoded byte sequence to the ISO-8859-1 (binary) equivalent. You can use this function irrespective of the international mode status.
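Python equivalents of these operations help make their behaviour concrete. The sketch below is an approximation under stated assumptions. The helper names are hypothetical, and Python codecs stand in for the jBASE Latin-1/UTF-8 conversions. `display_width` uses the Unicode East Asian Width property as a rough proxy for the display-width rules LENDP applies.

```python
import unicodedata


def byte_len(s: str) -> int:
    """Number of bytes in the UTF-8 encoding of s (compare BYTELEN)."""
    return len(s.encode("utf-8"))


def latin1_to_utf8(data: bytes) -> bytes:
    """ISO-8859-1 bytes -> UTF-8 bytes; every byte value maps to a character."""
    return data.decode("latin-1").encode("utf-8")


def utf8_to_latin1(data: bytes) -> bytes:
    """UTF-8 bytes -> ISO-8859-1 bytes.

    Raises UnicodeEncodeError for characters above U+00FF.
    """
    return data.decode("utf-8").encode("latin-1")


def display_width(s: str) -> int:
    """Approximate number of terminal display positions needed for s."""
    width = 0
    for ch in s:
        if ch == "\0" or unicodedata.combining(ch):
            continue        # zero display width
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2      # wide or fullwidth, e.g. most Kanji
        else:
            width += 1
    return width


print(byte_len("F\u00fc\u00dfball"))    # 9
print(display_width("\u65e5\u672c"))    # 4: two wide Kanji
```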

jQL Dictionary Conversions and Correlatives

For dates and times, simple date format functions that use the configured locale support the standard D and MTS conversions. Number formatting through MR, ML and MD uses the locale for the thousands separator, decimal point and currency notation.
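For MR, ML and MD output, the locale mainly determines which separators appear. The following minimal Python sketch hard-codes separators for two locales rather than using ICU locale data: comma grouping and a point decimal for en_US, point grouping and a comma decimal for de_DE. `format_scaled` is hypothetical and does not model the MR/ML/MD option syntax or currency notation.

```python
# Assumed (thousands, decimal point) separators; ICU would supply these.
SEPARATORS = {
    "en_US": (",", "."),
    "de_DE": (".", ","),
}


def format_scaled(value: int, decimals: int, locale_name: str) -> str:
    """Format an integer that holds `decimals` implied decimal places."""
    thousands, point = SEPARATORS[locale_name]
    whole, frac = divmod(abs(value), 10 ** decimals)
    # The "," format spec groups digits in threes; swap in the locale's separator.
    text = f"{whole:,}".replace(",", thousands)
    if decimals:
        text += point + str(frac).zfill(decimals)
    return ("-" if value < 0 else "") + text


print(format_scaled(123456, 2, "en_US"))  # 1,234.56
print(format_scaled(123456, 2, "de_DE"))  # 1.234,56
```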

TimeStamp "W{Dx}{Tx}"

In addition, a suite of conversions is provided for timestamp functionality. These conversions can be used in A-, F- and I-type items and display a generated timestamp as a date and/or time in short, medium, long or full format. They also support non-Gregorian locales. The components of the conversion code have the following meanings:

Component	Meaning
W	A new conversion code, chosen so as not to clash with existing conversions
D	Date
T	Time
x	Format option: S = Short, M = Medium, L = Long, F = Full

Code	Format	Example
"WDS" or "WTS"	SHORT is completely numeric	12/13/52 or 3:30pm
"WDM"	MEDIUM is longer	Jan 12, 1952
"WDL" or "WTL"	LONG is longer	January 12, 1952 or 3:30:32pm
"WDF" or "WTF"	FULL is completely specified

jQL Locale-Based Collation

As part of jBASE internationalization, jQL uses collation tables specific to the user’s locale when international mode is enabled. The keys are first passed to a lookup algorithm that converts each key into a collation key tailored specifically to the user’s language. Using the collation key, the sort processor produces output in the order expected in the user’s locale.

When international mode is not enabled, the keys are sorted by the binary value of the individual characters, as in prior releases.

jQL Right Justified Sort

The primary purpose of a right-justified attribute definition is to produce the correct sort sequence and display properties for numeric and alphanumeric values. For fields containing entirely non-numeric data, right justification affects only the display, not the sort order.

As part of jBASE internationalization, jQL uses a new algorithm for right-justified fields to provide optimal sorting of mixed numeric and alphanumeric fields. The field width specified in the attribute definition no longer affects the behaviour of the sort.

International mode	Action
True	Passes non-numeric parts through the collation algorithm to produce collation key parts
False	Sorts the non-numeric parts left to right
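A common way to sort mixed numeric and alphanumeric keys is to split each key into alternating text and digit runs and compare the digit runs by numeric value. The Python sketch below illustrates that idea. It is not jBASE's algorithm. `text_key` is a hypothetical hook where a locale collation key could be applied to the non-numeric parts, as the True row describes.

```python
import re


def mixed_sort_key(key: str, text_key=lambda part: part):
    """Split key into text and digit runs; digit runs compare by value.

    re.split with a capturing group always alternates text, digits, text,
    so text parts and numeric parts line up position by position.
    """
    parts = re.split(r"(\d+)", key)
    return tuple(
        int(part) if i % 2 else text_key(part)
        for i, part in enumerate(parts)
    )


keys = ["A100", "B1", "A9", "10", "A10", "9"]
print(sorted(keys))                      # plain byte order: 10, 9, A10, A100, A9, B1
print(sorted(keys, key=mixed_sort_key))  # 9, 10, A9, A10, A100, B1
```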

Unicode Value	Represents
0x00E0	LATIN SMALL LETTER A WITH GRAVE
0x0155	LATIN SMALL LETTER R WITH ACUTE

Compiler

You need to convert all source files containing characters in the range 0x80 through 0xFF so that these characters are represented in UTF-8 before compilation.

Conversion Utility

The jutf8 utility helps with file conversion. The first step is to restore the data in the normal way, using a restore process working in binary mode. After the files have been restored, run the conversion utility against the imported data files to convert the data. The syntax of the conversion utility is as follows:

jutf8 {-options} {filename {,...} }

The following table lists the utility options and their descriptions.

Option	Description
-c	Specifies the code page for conversion. The default value is latin1.
-d	Processes directories
-f	Force mode: skips the prompt for confirmation
-m MapFilePath	Uses the specified map file for conversion
-s	Skips the sample test for files already converted
-u	Enables reverse conversion, that is, converts from UTF-8 to the code page
-v	Verbose mode

By default, the conversion utility attempts to confirm that the data has not already been converted to UTF-8. Directories are skipped by default unless the -d option is explicitly specified.
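A simple heuristic can approximate the already-converted check. If the data contains non-ASCII bytes and still decodes as strict UTF-8, it has probably been converted already. The sketch below is a hypothetical illustration, not how jutf8 performs its check. It assumes that the single-byte system delimiters 0xF8 to 0xFF should be ignored, because those byte values never occur inside valid UTF-8.

```python
# Assumption: bytes 0xF8-0xFF are system delimiters (as in the map file
# example) and are ignored; they never occur inside valid UTF-8.
DELIMITERS = bytes(range(0xF8, 0x100))


def probably_utf8(sample: bytes) -> bool:
    """Heuristic: True if the sample looks already converted to UTF-8."""
    body = sample.translate(None, DELIMITERS)
    if body.isascii():
        return False  # ASCII is identical in latin1 and UTF-8: nothing to convert
    try:
        body.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(probably_utf8(b"\xe9t\xe9"))                    # latin1 "été": False
print(probably_utf8("\u00e9t\u00e9".encode("utf-8")))  # UTF-8 "été": True
```

A sample taken from the middle of a file may begin or end in the middle of a multi-byte sequence. A fuller check would allow for that.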

NOTE: Converting file content that contains binary data, such as compiled programs, may render the compiled objects unusable. It is recommended that program objects be cleared from the folder before running the utility on source files.

Conversion Map

Use the -m MapFilePath option to specify a file that describes how certain characters, for example system delimiters, are mapped from and to the required hex values.

The map file describes how characters in the original file are mapped from their current hex value to the required hex value before UTF-8 conversion. The following example maps any characters in the range 0x01-0x08 to what would normally be system delimiters before conversion to UTF-8. For example, character 0x04 is mapped to 0xFC and then converted to the two-byte UTF-8 encoded sequence 0xC3 0xBC, which does not clash with the system delimiter. This sequence represents the Unicode value 0x00FC.

MyMapFile
#From              To
0x01                0xFF
0x02                0xFE
0x03                0xFD
0x04                0xFC
0x05                0xFB
0x06                0xFA
0x07                0xF9
0x08                0xF8

NOTE: If the map file is specified together with the -u option, the mapping is applied in reverse, from the To value back to the From value.
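A map file describes two steps: bytes are remapped first, and the result is then converted from Latin-1 to UTF-8. The following Python sketch models that behaviour. It is a hypothetical model, not the jutf8 source. The function names are illustrative, and the default latin1 code page is assumed. The reverse direction, as with the -u option, assumes that the original data did not already contain any of the target byte values and that the converted data contains no embedded delimiter bytes.

```python
def parse_map(text: str) -> dict:
    """Parse 'From To' hex pairs; lines starting with '#' are comments."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        src, dst = line.split()[:2]
        mapping[int(src, 16)] = int(dst, 16)  # int() accepts the 0x prefix
    return mapping


def convert_to_utf8(data: bytes, mapping: dict) -> bytes:
    """Remap bytes, then convert from latin1 (assumed code page) to UTF-8."""
    remapped = bytes(mapping.get(b, b) for b in data)
    return remapped.decode("latin-1").encode("utf-8")


def convert_from_utf8(data: bytes, mapping: dict) -> bytes:
    """Reverse direction, as with -u: UTF-8 to latin1, then undo the map."""
    reverse = {dst: src for src, dst in mapping.items()}
    latin1 = data.decode("utf-8").encode("latin-1")
    return bytes(reverse.get(b, b) for b in latin1)


my_map = parse_map("""#From  To
0x01  0xFF
0x04  0xFC
0x08  0xF8
""")
print(convert_to_utf8(b"A\x04", my_map).hex(" "))  # 41 c3 bc
```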

Data Import and Export

The jBASE directory and SEQ drivers have been modified to support an additional IOCTL command, which converts data from a specified code page to UTF-8 when reading from the native operating system file. The command can also be used when writing to the native file, so that data is converted from UTF-8 to the configured code page. This IOCTL was developed specifically for importing and exporting data to and from external applications and is not recommended for on-the-fly conversion within an application. You can also use this IOCTL with the READSEQ and WRITESEQ statements.

The following example uses the IOCTL to convert data in a UNIX directory file from the Japanese code page shift-jis to UTF-8 while reading the record from the native file. The record is then written to a jBASE hash file without conversion. This IOCTL command also returns the previously configured code page for the file descriptor.

NOTE: Hash files do not support this additional IOCTL command.

* Convert a directory record from code page shift-jis to UTF-8 and place it in a hash file
INCLUDE JBC.h
OPEN 'MYDIRECTORY.' TO FILE ELSE STOP
OPEN 'MYHASHFILE' TO HASHFILE ELSE STOP
* Set up the code page for the IOCTL command
CodePage = "shift-jis"
IF IOCTL(FILE, JIOCTL_COMMAND_SETCODEPAGE, CodePage) ELSE
    CRT "Code page problem" ; STOP
END
* On return, CodePage holds the previously configured code page, if any
IF CodePage NE "" THEN CRT "Previously configured Code Page : ":CodePage
* Read the record, converting it from shift-jis to UTF-8
READ Record FROM FILE, "MyCodePage" THEN
    CRT "No Chars ":LEN(Record), "No Bytes ":BYTELEN(Record)
    * Hash files perform no code page conversion, so the UTF-8 data is stored as is
    WRITE Record ON HASHFILE, "MyUTF8"
END
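Python's codecs can reproduce the same import conversion outside jBASE, which is one way to check what a converted record should contain. This is a hypothetical sketch. `import_record` and `export_record` are illustrative names, and Python's `shift_jis` codec stands in for the ICU converter that the IOCTL uses.

```python
def import_record(raw: bytes, codepage: str = "shift_jis") -> bytes:
    """Native-file bytes in `codepage` -> UTF-8 bytes (read direction)."""
    return raw.decode(codepage).encode("utf-8")


def export_record(utf8: bytes, codepage: str = "shift_jis") -> bytes:
    """UTF-8 bytes -> native-file bytes in `codepage` (write direction)."""
    return utf8.decode("utf-8").encode(codepage)


raw = "\u65e5\u672c".encode("shift_jis")  # "日本" as stored in the native file
record = import_record(raw)
print(len(raw), len(record), len(record.decode("utf-8")))  # 4 6 2
```

The two characters occupy 4 bytes in shift_jis and 6 bytes in UTF-8. This is the same character-versus-byte difference that the LEN and BYTELEN calls in the jBC example report.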
