Sunday, June 28, 2015

GAMS limits on labels

In GAMS strings are used for indexing. E.g.

image

There is a very tight limit on the length of these labels: 63 characters!! This sounds like a strange limit, and it surely is. It is also way too small to handle many datasets from sources like databases and spreadsheets. As these labels are really data, and come form different data sources we do not always have control over their format. That means sometimes we cannot read (otherwise correct) data. In those cases we have to spend time and effort to devise workarounds. In some cases we can truncate strings (e.g. using LEFT(column_name,63) in an SQL query). Note that this would still make it more difficult to put the solution back into the database. And in some cases we see that truncation will not yield unique names. This happens when some names are only different after 63 characters. In that case we have a real problem. This is not just a theoretical possibility: I received just last week a spreadsheet that showed this issue. In another application based on a MySQL database, I am forced to use ugly short codes instead of long descriptive names for the same reason.

So we can conclude: the 63 character limit is just inadequate. A tool like GAMS is supposed to help making a modeler more productive, and this limit is really not doing that. What should the limit be? I have heard that the GAMS people are pondering to make this limit 255 characters. This is much better, but I would argue that this is still the wrong approach. With software we typically have a trade-off between development cost and effort and productivity gains for the end-user. COTS (Commercial Off-the-shelf) software often tips the scale in favor of the user: there are just many users. I.e. making things easy for the programmer should be a secondary concern. So I would argue to spend a little more effort by the developer to implement a really long label functionality (e.g. up to 2^31-1).

If we look at the table below, we see what kind of limits are imposed by other software (often used as data source for GAMS models):

Software Limits Unicode
Excel 32,767 Yes
SQL Server char(n): 8,000
varchar(n): 8,000
varchar(max): 2^31-1
text: 2^31-1
nchar, nvarchar, ntext
Oracle char(n): 2,000
varchar(n), varchar2(n): 4,000
nchar,
nvarchar
DB2 char(n): 254
varchar(n): 32,672
CLOB: 2^31-1
Yes
MySQL char(n): 255
varchar(n): 65,535
tinytext: 255
text; 65,535
mediumtext: 2^16-1
longtext: 2^32-1
Yes
MS Access text: 255
memo: 65,536
Yes
SQLite text: 2^31-1 Yes

Unicode or UTF-8 support would be required to read tables like:

image

Most modern software can handle this kind of data. The table above is from Excel. A modern editor like Notepad++ or Atom can also handle this:

image  image

There is an argument to be made to allow Unicode labels. Not as simple as pure ascii though; here is a unicode table:

images/unichart-printed.jpg

(http://farmdev.com/talks/unicode/)

Friday, June 26, 2015

Microsoft has ownership of R?

From an (ex-) intellectual property lawyer at Adobe (http://www.infoworld.com/article/2940508/big-data/hey-microsoft-a-rewrite-of-the-r-language-is-a-silly-idea.html): 

“Microsoft, unlike Tibco, can claim ownership of the underlying R code. By buying Revolution Analytics, Microsoft gained that ownership”

I was worried. But luckily it looks like this was misinformation. The new version of the article says:

“Correction: An earlier version of this article incorrectly assumed that Microsoft acquired the rights to R Project from Revolution Analytics. As Revolution Analytics chief community officer David Smith has since stressed, "R is owned by the R Foundation," not Microsoft.”

MPS pictures

neos_799838

http://blogs.sas.com/content/operations/2015/06/25/finding-the-beauty-in-optimization-models-visualizing-mps-data-sets/

Somehow I find these pictures not very helpful in my modeling work. One reason is that problems that don’t display this structure may be very well decomposable. The ordering of rows and columns is very important to detect this structure visually. E.g. I often use a time index t as last index in my variables and equations. As the last index runs the fastest, that will not give these nice pictures. Also modeling systems will likely export all variables x() before y() – i.e. ordered by variable (equation) name. In general you will need to reorder rows and columns to make these pictures meaningful.

Wednesday, June 24, 2015

Nonlinear system test function

The following paper:

image

has an interesting test function for a large system of nonlinear equations:

image

This function originates from:

image

Here it has a fixed starting point:

image

Solving a triangular system

This system of nonlinear equations actually forms a triangular system. This means a good preprocessor can solve this just by solving small 1x1 problems. To be precise: first solve for x1, then for x2 etc. Note that these tiny 1x1 problems are possibly nonlinear. In this case they are for sure nonlinear. We write the model as:

image

I.e. 10k equations and 10k variables. Note that x(i-1) for the first i is zero in GAMS.

CONOPT

Using GAMS/CONOPT we see

CONOPT 3         24.4.5 r52258 Released May 26, 2015 WEI x86 64bit/MS Windows

    C O N O P T 3   version 3.16D
    Copyright (C)   ARKI Consulting and Development A/S
                    Bagsvaerdvej 246 A
                    DK-2880 Bagsvaerd, Denmark

   Iter Phase Ninf   Infeasibility   RGmax    NSB   Step InItr MX OK
      0   0        7.3122859754E+06 (Input point)
                                Pre-triangular equations:    10000
                                Post-triangular equations:       0
      1   0        0.0000000000E+00 (After pre-processing)
      2   0        0.0000000000E+00 (After scaling)

** Feasible solution to a square system.

I.e. no “real” work was needed for this problem: the presolver was able to solve the complete problem. CONOPT is really the smart here compared to some other solvers. Below some results.

PATH, KNITRO

Note that with the PATH solver we were not able to solve this problem:

** EXIT - other error.

I was not able to see if the preprocessor did anything here.

With KNITRO we see:

KNITRO presolve eliminated 0 variables
and 0 constraints.

It probably does only a linear presolve. This results in a subsequent failure:

EXIT: Terminate at infeasible point because the relative change in solution
      estimate < xtol.

Wednesday, June 17, 2015

Tight big-M values

It is well known large big-M values can cause big headaches for MIP solvers. There are many examples where something like M=9999999999 is used, leading to very wrong results. In some cases we can use a MIP model to calculate tight values for big-M constants. In this case I have a model where the MIP model to calculate the tightest bound is actually using the big-M itself:

image

This is of course not a useful approach. Luckily there are a few alternatives:

  • Keep (1) and most of (2), and we end up with a case where it is possible to find a bound by just looking at the data and so some simple calculations (no MIP is needed). From the problem (a design problem) we know this is a really good bound.
  • More sophisticated we can relax only equations (3), and solve a MIP to find a slightly tighter bound. We need to do this for several yk's. But the models are small and solve fast (we can even solve them in parallel). That is the approach I am going to take.

In GAMS we can write something like:

eq3(k)$(not relax(k)).. y(k) =L= <expr>;

to turn on or off an equation based on set relax. This allows us to re-use this equation unaltered in both this preprocessing step, in the real model and in some reporting step.

Thursday, June 4, 2015

Adding symmetry-breaking constraints

On some models symmetry poses a real problem for solvers. Here is an example of a sports related scheduling problem. We achieve a speedup of 35x by adding constraints that break a number of symmetries in the model. The results here are with Gurobi.
image

Modern solvers have also built-in facilities to detect and do something about symmetry. E.g. Gurobi has the option Symmetry which can be set to 2 (aggressive). I never had much luck with this. Indeed the results are:

image

Cplex shows very similar behavior:

image

Note that Cplex provides a large number of settings for the symmetry option:

image

That is probably just an educational tool to teach the user a lesson about combinatorial explosion: finding the right settings for your model.

Often solver suppliers say this is a solved problem:

image

This is an example that shows the fixes are at least insufficient. The algorithms do not handle this problem always in a satisfactory way.