Monday, January 8, 2024

GAMS listing file: missing Unicode support

Newer versions of GAMS allow UTF-8 encoded strings as labels. That is very welcome, as these labels may come from data sources that just use Unicode characters. However, proper Unicode support is missing when printing to the listing file. At first, I thought, "OK, just a few misaligned tables. No big deal." Here is a constructed example showing this may be a bit more problematic.


* The chars 𝛽 in the labels below are Unicode characters (U+1D6FD).
* The name is "Mathematical Italic Small Beta".
* In UTF-8 encoding they occupy 4 bytes:
* 0xF0 0x9D 0x9B 0xBD


set j /col1*col2/;

parameter p(*,*);
p('bbbb',j) = 1;
p('𝛽𝛽𝛽𝛽','col2') = 1;

display p;  

The output is:

This table is very misleading. The second row should have the 1.000 in column col2. If you can't trust the output of the display statement, things become a bit dicey. Admittedly, this example was carefully constructed to illustrate the point. Obviously, this needs to be fixed. 
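A plausible explanation, and it is only an assumption, is that the listing writer pads labels by their byte count rather than by the number of characters shown on screen. The Python sketch below is not GAMS code, and `pad_by_bytes` and `pad_by_code_points` are made-up helper names. It just shows how the two ways of counting diverge for the labels in the example.

```python
# Sketch only: byte-based padding is an ASSUMPTION about the cause of the
# misalignment, not a description of how GAMS is implemented.

BETA = "\U0001D6FD"  # MATHEMATICAL ITALIC SMALL BETA, 4 bytes in UTF-8


def pad_by_bytes(label: str, width: int) -> str:
    """Pad with spaces until the UTF-8 encoding occupies `width` bytes."""
    n_bytes = len(label.encode("utf-8"))
    return label + " " * max(width - n_bytes, 0)


def pad_by_code_points(label: str, width: int) -> str:
    """Pad with spaces until the label occupies `width` code points."""
    return label + " " * max(width - len(label), 0)


labels = ["bbbb", BETA * 4]

print("padded by bytes:")
for label in labels:
    print(pad_by_bytes(label, 20) + "|")

print("padded by code points:")
for label in labels:
    print(pad_by_code_points(label, 20) + "|")
```

Under the byte-counting assumption, the row with the 𝛽 label ends up 12 positions short: each 𝛽 takes 4 bytes but only one position on screen. Counting code points is itself a simplification, as combining marks and double-width characters need a proper display-width calculation.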

Another issue I have not touched upon here is casing (lower vs upper case), a problem that arises as GAMS labels are case-insensitive. In my opinion (and experience), Unicode support needs to be done very rigorously and cannot easily be done in some half-baked way. 
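To illustrate why casing is tricky, here is a small Python sketch comparing an ASCII-only case-insensitive match with one based on full Unicode case folding. The helper names `ascii_lower`, `same_label_ascii` and `same_label_unicode` are hypothetical, and nothing here claims to mirror what GAMS does internally.

```python
# Sketch only: hypothetical helpers, not GAMS behavior.

def ascii_lower(s: str) -> str:
    """Lower-case A-Z only; every other character is left untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def same_label_ascii(a: str, b: str) -> bool:
    """Case-insensitive comparison that only knows about ASCII letters."""
    return ascii_lower(a) == ascii_lower(b)


def same_label_unicode(a: str, b: str) -> bool:
    """Case-insensitive comparison using Unicode case folding."""
    return a.casefold() == b.casefold()


pairs = [
    ("COL1", "col1"),                                   # plain ASCII
    ("\u0391\u0392\u0393", "\u03b1\u03b2\u03b3"),       # Greek: ΑΒΓ vs αβγ
    ("STRASSE", "stra\u00dfe"),                         # German: STRASSE vs straße
]

for a, b in pairs:
    print(f"{a!r} vs {b!r}: ascii={same_label_ascii(a, b)}, "
          f"unicode={same_label_unicode(a, b)}")
```

The ASCII-only comparison treats the Greek and German pairs as different labels, while case folding treats them as equal. Which answer is the right one for labels is a design decision, but it has to be made consistently everywhere labels are compared.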

A notable exception to this rule is SQLite, which is built without full Unicode support by default. Of course, SQLite runs on devices like smartphones, so it needs to watch resource usage. When built without full Unicode support, SQLite will not do Unicode case-insensitive comparison or case changes correctly. This behavior can be changed by building with full Unicode support. 
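The SQLite behavior can be checked from Python's standard `sqlite3` module. This is a sketch that assumes the SQLite library linked into your Python reports its compile options through `PRAGMA compile_options`; results for the non-ASCII cases depend on whether that library was built with the ICU extension, so no fixed output is shown. The helper `scalar` is just a local convenience function.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Compile-time options of the SQLite library in use (names come back
# without the SQLITE_ prefix, e.g. "ENABLE_ICU").
options = {row[0] for row in con.execute("PRAGMA compile_options")}
has_icu = "ENABLE_ICU" in options


def scalar(sql: str, *args):
    """Run a query that returns a single value and return that value."""
    return con.execute(sql, args).fetchone()[0]


ascii_lower = scalar("SELECT lower(?)", "ABC")
umlaut_lower = scalar("SELECT lower(?)", "\u00c4")             # Ä
ascii_like = scalar("SELECT ? LIKE ?", "a", "A")
umlaut_like = scalar("SELECT ? LIKE ?", "\u00e4", "\u00c4")    # ä LIKE Ä

print("built with ICU :", has_icu)
print("lower('ABC')   :", ascii_lower)
print("lower('Ä')     :", umlaut_lower)
print("'a' LIKE 'A'   :", ascii_like)
print("'ä' LIKE 'Ä'   :", umlaut_like)
```

On a build without ICU, the ASCII cases behave as expected, while `lower('Ä')` returns the string unchanged and `'ä' LIKE 'Ä'` is false.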
