Tuesday, July 31, 2012

Statistical Fraud Detection

http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2114571

It is not easy to fabricate data that looks truly random. In one case the author describes, the means of a variable differed clearly between conditions, but surprisingly the standard deviations were almost identical. That is a sign of possibly invented data. The author uses simulation to estimate how likely such nearly identical SDs are.
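
A minimal sketch of such a simulation in Python. This is not the author's code: the function names, the three reported SDs and the group size are hypothetical, and the sketch assumes normally distributed data with one common true SD. It estimates how often independent samples would yield SDs at least as similar as the reported ones.

```python
import math
import random
import statistics


def spread_of_sds(n_groups, n_per_group, sigma, rng):
    """Draw n_groups normal samples and return the SD of their sample SDs."""
    sds = []
    for _ in range(n_groups):
        sample = [rng.gauss(0.0, sigma) for _ in range(n_per_group)]
        sds.append(statistics.stdev(sample))
    return statistics.stdev(sds)


def prob_sds_this_similar(reported_sds, n_per_group, n_sims=2000, seed=12345):
    """Estimate P(simulated SDs are at least as similar as the reported SDs)."""
    rng = random.Random(seed)
    # Assumption: all groups share one true SD, estimated by pooling the reported SDs.
    pooled = math.sqrt(statistics.fmean([s * s for s in reported_sds]))
    observed_spread = statistics.stdev(reported_sds)
    hits = sum(
        spread_of_sds(len(reported_sds), n_per_group, pooled, rng) <= observed_spread
        for _ in range(n_sims)
    )
    return hits / n_sims


# Hypothetical reported SDs for three conditions with 15 subjects each.
p = prob_sds_this_similar([10.0, 10.1, 9.9], n_per_group=15)
print(f"share of simulations with SDs this similar: {p:.4f}")
```

A very small share means that, under these assumptions, SDs this close together would rarely come out of genuinely independent samples.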

This study has led to some retractions of papers and to people leaving their academic jobs. See: http://www.nature.com/news/uncertainty-shrouds-psychologist-s-resignation-1.10968.

MIP Gap increasing

As the discussion in http://lists.gnu.org/archive/html/help-glpk/2012-07/msg00104.html indicates, a MIP relative gap can increase. See also http://yetanothermathprogrammingconsultant.blogspot.com/2011/06/mip-gap-not-increasing.html. There are a few definitions of the relative gap around. GAMS likes to use:

image

Many solvers (e.g. Cplex) use:

image

This looks like a big problem, but when we say “stop at 5% gap” both definitions are quite close. Gurobi seems to use (https://groups.google.com/forum/?fromgroups#!topic/gurobi/RKbC1qiFZ-A):

image

which is almost the same.

Do we really need an alternative definition as suggested in http://lists.gnu.org/archive/html/help-glpk/2012-07/msg00104.html? I would not think so. The reason is that for almost all models the current definitions work just fine: the gap will only decrease. Only when the two bounds lie on opposite sides of zero are we somewhat in trouble. However, the number of models where

image

is really small.
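
To illustrate how the gap can grow when the bounds straddle zero, here is a small Python sketch. It assumes the gap is computed as |best bound - incumbent| / (1e-10 + |incumbent|), the form documented for Cplex; the function name and the numbers are made up for illustration.

```python
def relative_gap(best_bound, incumbent, eps=1e-10):
    """Relative MIP gap, assuming the Cplex-style definition."""
    return abs(best_bound - incumbent) / (eps + abs(incumbent))


# Minimization problem whose bounds lie on opposite sides of zero:
# the incumbent improves from 1.0 to 0.5 while the best bound stays at -10.
gap_before = relative_gap(best_bound=-10.0, incumbent=1.0)
gap_after = relative_gap(best_bound=-10.0, incumbent=0.5)

# Same improvement pattern, but both bounds positive: the gap shrinks as expected.
same_sign_before = relative_gap(best_bound=100.0, incumbent=110.0)
same_sign_after = relative_gap(best_bound=100.0, incumbent=105.0)

print("opposite signs:", gap_before, "->", gap_after)
print("same sign:     ", same_sign_before, "->", same_sign_after)
```

In the first case a better incumbent makes the reported relative gap larger, because the denominator shrinks faster than the numerator.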

Monday, July 30, 2012

AMPL-GAMS conversion

The discussion in http://www.or-exchange.com/questions/6021/translate-ampl-code-to-gams is a little bit painful to watch.

First of all, there are a number of (small) syntax issues. It is always difficult to write error-free GAMS (or C, Fortran, ...) code without compiling it. I remember a college exam where we were shown some Pascal code and had to identify all syntax errors. This was not an easy task; it is clearly something a compiler is better at than a human being. In general, when presenting code, it is good practice to run it through the compiler first and check that it at least passes without syntax errors.

My second point is more substantial. I believe a line-by-line translation will often lead to poor models and unsatisfactory results. It is better to take a step back and re-implement the actual purpose and intent of the original model than to stick to the precise way it was implemented. I have been involved in converting dozens of non-trivial models, and I would say that is the main lesson from those exercises.

Finally the AMPL model is not really well-posed. An equation like:

param h := 11;
var t >= -3.1416, <= 3.1416;
t_r : cos(h*t) = 1;

should raise some questions. I believe the meaning is:


image


This is better implemented as an MINLP (or, in this particular case, it can simply be enumerated).
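
For this instance the enumeration is short. The Python sketch below assumes the intended meaning is that h*t is an integer multiple of 2π, which is what cos(h*t) = 1 implies; the function name is hypothetical.

```python
import math


def enumerate_solutions(h, lower, upper):
    """All t in [lower, upper] with cos(h*t) = 1, i.e. t = 2*pi*k/h for integer k."""
    k_min = math.ceil(h * lower / (2.0 * math.pi))
    k_max = math.floor(h * upper / (2.0 * math.pi))
    return [2.0 * math.pi * k / h for k in range(k_min, k_max + 1)]


# Data from the AMPL fragment: h = 11 and -3.1416 <= t <= 3.1416.
solutions = enumerate_solutions(11, -3.1416, 3.1416)
for t in solutions:
    print(f"t = {t: .6f}   cos(h*t) = {math.cos(11 * t):.12f}")
```

The feasible set is a handful of isolated points, which is why a local NLP solver started from an arbitrary point has no good way to choose among them.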



If the proposed NLP formulation were any good, we would no longer need MIP solvers! This is in the same league as using x*(1-x)=0 to model a 0-1 restriction. Actually, it is even worse in some respects.



Probably before worrying about a GAMS translation it would be good to go back to the drawing board and improve the model.

Thursday, July 26, 2012

Very large GAMS restart file

From an actual model:

C:\projects\GFACTCOM-WebVersion\Web\scripts>dir ..\..\sims.g00
Volume in drive C has no label.
Volume Serial Number is 9EE2-A434

Directory of C:\projects\GFACTCOM-WebVersion

03/17/2012  07:43 PM     2,894,799,102 sims.g00
               1 File(s)  2,894,799,102 bytes
               0 Dir(s)  239,426,981,888 bytes free

C:\projects\GFACTCOM-WebVersion\Web\scripts>

Restarting from this file brings the smallish 8 GB server to its knees (we need about 9 GB of memory, but GAMS is very unfriendly to virtual memory).

Wednesday, July 25, 2012

Running under a 32-bit or 64-bit .NET environment

In pure .NET code one probably does not need to know whether the process is 32-bit or 64-bit, but when linking to legacy native DLLs it can matter. The following is an easy test: a pointer is 8 bytes in a 64-bit process and 4 bytes in a 32-bit process:

C:\projects\generaldynamics\vbnet\ConsoleApplication6\ConsoleApplication6\bin>type ..\Program.cs
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ConsoleApplication6
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.Out.WriteLine("{0}", IntPtr.Size);
        }
    }
}

C:\projects\generaldynamics\vbnet\ConsoleApplication6\ConsoleApplication6\bin>x64\Release\ConsoleApplication6.exe
8

C:\projects\generaldynamics\vbnet\ConsoleApplication6\ConsoleApplication6\bin>Debug\ConsoleApplication6.exe
4

Watson

Yesterday I attended a presentation about IBM’s Watson, the system that won Jeopardy! It was interesting to hear how IBM is going about commercializing this technology, e.g. in the healthcare industry. Another observation is that IBM is able to staff the product team with 100-200 people. That is far larger than many OR product and project teams combined.

www.ibm.com/watson

http://www.heatonresearch.com/content/free-and-open-software-behind-ibm%E2%80%99s-jeopardy-champion-watson

Friday, July 13, 2012

Finding number of decimals in data

I was asked for some GAMS code that produces the number of decimals provided in a table column. One should realize that a number like 0.1 is not exactly representable in binary, so we need to apply a certain tolerance. Here is what I came up with:

sets
   i /i1*i4/
   j /j1*j4/
;

table data(i,j)
      j1   j2    j3        j4
i1     1  0.1  0.01  100.0001
i2     2    2  0.01      0.01
i3     3    3  0.01      0.01
i4     4    4  0.01      0.01
;
*
* find number of decimals
*
parameter numdec(j) 'number of decimals';
numdec(j)=0;

loop((i,j),
  while(abs(round(data(i,j),numdec(j))-data(i,j))>1.0e-15,
    numdec(j) = numdec(j) + 1;
  )
);

option data:5;
option numdec:0;
display data, numdec;

This produces:

----     27 PARAMETER data 

            j1          j2          j3          j4

i1     1.00000     0.10000     0.01000   100.00010
i2     2.00000     2.00000     0.01000     0.01000
i3     3.00000     3.00000     0.01000     0.01000
i4     4.00000     4.00000     0.01000     0.01000


----     27 PARAMETER numdec  number of decimals

j2 1,    j3 2,    j4 4

The first column contains only integers, so the number of decimals is zero for that column. GAMS does not display zero entries, which is why j1 is missing from the numdec output.
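
For comparison, the same idea can be sketched in Python. This is only an illustration of the rounding loop used in the GAMS code, not a translation anyone asked for: the function names, the looser tolerance of 1e-12 and the cap of 15 decimals are my own assumptions.

```python
def num_decimals(x, tol=1e-12, max_decimals=15):
    """Smallest n such that rounding x to n decimals reproduces x within tol."""
    n = 0
    while abs(round(x, n) - x) > tol and n < max_decimals:
        n += 1
    return n


def num_decimals_per_column(rows):
    """Maximum number of decimals found in each column of a list of rows."""
    return [max(num_decimals(v) for v in column) for column in zip(*rows)]


# Same data as in the GAMS table (rows i1..i4, columns j1..j4).
data = [
    [1, 0.1, 0.01, 100.0001],
    [2, 2, 0.01, 0.01],
    [3, 3, 0.01, 0.01],
    [4, 4, 0.01, 0.01],
]
print(num_decimals_per_column(data))
```

The cap on the number of decimals guards against an endless loop for values that never round back to themselves within the tolerance.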