Sunday, June 28, 2015

GAMS limits on labels

In GAMS strings are used for indexing. E.g.

image

There is a very tight limit on the length of these labels: 63 characters!! This sounds like a strange limit, and it surely is. It is also way too small to handle many datasets from sources like databases and spreadsheets. As these labels are really data, and come form different data sources we do not always have control over their format. That means sometimes we cannot read (otherwise correct) data. In those cases we have to spend time and effort to devise workarounds. In some cases we can truncate strings (e.g. using LEFT(column_name,63) in an SQL query). Note that this would still make it more difficult to put the solution back into the database. And in some cases we see that truncation will not yield unique names. This happens when some names are only different after 63 characters. In that case we have a real problem. This is not just a theoretical possibility: I received just last week a spreadsheet that showed this issue. In another application based on a MySQL database, I am forced to use ugly short codes instead of long descriptive names for the same reason.

So we can conclude: the 63 character limit is just inadequate. A tool like GAMS is supposed to help making a modeler more productive, and this limit is really not doing that. What should the limit be? I have heard that the GAMS people are pondering to make this limit 255 characters. This is much better, but I would argue that this is still the wrong approach. With software we typically have a trade-off between development cost and effort and productivity gains for the end-user. COTS (Commercial Off-the-shelf) software often tips the scale in favor of the user: there are just many users. I.e. making things easy for the programmer should be a secondary concern. So I would argue to spend a little more effort by the developer to implement a really long label functionality (e.g. up to 2^31-1).

If we look at the table below, we see what kind of limits are imposed by other software (often used as data source for GAMS models):

Software Limits Unicode
Excel 32,767 Yes
SQL Server char(n): 8,000
varchar(n): 8,000
varchar(max): 2^31-1
text: 2^31-1
nchar, nvarchar, ntext
Oracle char(n): 2,000
varchar(n), varchar2(n): 4,000
nchar,
nvarchar
DB2 char(n): 254
varchar(n): 32,672
CLOB: 2^31-1
Yes
MySQL char(n): 255
varchar(n): 65,535
tinytext: 255
text; 65,535
mediumtext: 2^16-1
longtext: 2^32-1
Yes
MS Access text: 255
memo: 65,536
Yes
SQLite text: 2^31-1 Yes

Unicode or UTF-8 support would be required to read tables like:

image

Most modern software can handle this kind of data. The table above is from Excel. A modern editor like Notepad++ or Atom can also handle this:

image  image

There is an argument to be made to allow Unicode labels. Not as simple as pure ascii though; here is a unicode table:

images/unichart-printed.jpg

(http://farmdev.com/talks/unicode/)