Wednesday, May 20, 2015
R has data frames, which form arguably the most important data structure in R.
Python has a DataFrame as part of pandas, the toolkit for doing data analysis.
Matlab has a new Table data type. Their statistical toolbox still has a DataSet.
.Net has a DataTable class which I use often.
I am missing similar "standard" facilities in C++, Java (ResultSet is not really the same) and in modeling systems like AMPL and GAMS.
Friday, May 15, 2015
My data is usually small enough that I can move it from the RDBMS to the place where I do the “real work”.
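As a minimal sketch of that workflow in Python: pull a whole table out of the database into memory and do the work there. The in-memory SQLite database, the `demand` table, and the helper `fetch_table` are all hypothetical stand-ins for illustration, not part of any particular setup.

```python
import sqlite3

def fetch_table(conn, query):
    """Run a query and return all rows as a list of dicts (column name -> value)."""
    cur = conn.execute(query)
    columns = [d[0] for d in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]

# Hypothetical in-memory database standing in for the real RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demand (city TEXT, units INTEGER)")
conn.executemany("INSERT INTO demand VALUES (?, ?)", [("A", 10), ("B", 25)])

# Move the data out of the database, then work on it locally.
rows = fetch_table(conn, "SELECT city, units FROM demand ORDER BY city")
total = sum(r["units"] for r in rows)
```

Once the rows are local, the database is no longer involved in any of the downstream processing.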
Tuesday, May 5, 2015
In http://www.optimization-online.org/DB_HTML/2015/04/4891.html we see some interesting numbers on generation times of large models in different modeling environments. AMPL and Julia/JuMP do very well on these benchmarks.
As https://plus.google.com/+VictorZverovich/posts/RtHfqgwh5QX indicates: AMPL is faster than Gurobi/C++. This does not come as a surprise: AMPL has been around for quite a while and is known to be quite fast. GAMS is slow compared to AMPL, especially on the lqcp and acpower instances. Note that in general solution times are much larger than the time the modeling system needs to generate the problem. Only for very easy LPs can generation be relatively expensive.
Why is GAMS so slow on these instances? In https://github.com/mlubin/JuMPSupplement we can find enough information to reproduce things (it is very praiseworthy that this was made available).
We do this in 67 seconds: a little bit better than reported. There is not much I can do to speed this up: the offending equation pde has leads and lags, which GAMS does not always handle extremely fast. Usually my client models are large and sparse and do not have this structure.
I actually think the equation pde is not 100% correctly formulated. With these discretizations it is important to pay extra attention to the border. This is especially the case as GAMS implicitly sets all terms with leads and lags that exceed the boundary to zero. Always inspect the equation listing in GAMS when there are leads and lags involved. However, these issues are probably not important for these measurements.
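To illustrate why the border deserves attention, here is a small Python sketch that mimics the "out-of-range terms become zero" behavior on a generic central second difference. This is not the pde equation from the benchmark; the helper names `shifted` and `second_difference` and the constant test profile are assumptions made for illustration only.

```python
def shifted(values, i, offset):
    """Return values[i + offset], or 0.0 when the index falls outside the set."""
    j = i + offset
    return values[j] if 0 <= j < len(values) else 0.0

def second_difference(values, i):
    """y[i-1] - 2*y[i] + y[i+1], with out-of-range terms implicitly zero."""
    return shifted(values, i, -1) - 2.0 * values[i] + shifted(values, i, +1)

# A constant profile: the second difference should vanish everywhere.
y = [1.0, 1.0, 1.0, 1.0]

interior = second_difference(y, 1)  # all three terms exist
border = second_difference(y, 0)    # the y[i-1] term silently drops out
```

In the interior the stencil gives zero as expected, but at the first point the missing neighbor is treated as zero and the result is no longer zero. Unless that is really the intended boundary condition, the border rows need to be excluded or handled explicitly.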
Here we are actually slower than reported. The pure generation time is not that bad: 11 seconds. The main problem we see here is that the GAMS/Gurobi link is inefficient in handling the setup of the quadratic terms for these large models. This turned out to be a problem with Gurobi: adding individual quadratic constraints is expensive. A fix is on its way. Update: fixed in Gurobi 6.0.4.
This is pretty bad for a somewhat small and somewhat dense model. We can easily make the model larger and hopefully sparser and “less” non-linear. After adding some extra options to reduce the size of the listing file we see:
We see again that GAMS generates the model quickly, but that there are some inefficiencies inside the solver link. I am not sure what is happening in the 3 lost minutes, but it could be related to setting up the second derivatives. Further improvements can be obtained by telling GAMS to treat fixed variables as parameters.
Some general points:
- Generation time is often a minor part of the total turnaround time
- In most cases the solver needs much more time to solve the problem than the modeling system needs to generate the model
- In many models we spend more time in data manipulation than in pure model generation. Many practical models have most of the code devoted to data manipulation (data import, data conversion, checks, aggregation, consolidation etc.)
- I believe some models will behave better with upcoming versions of GAMS and Gurobi. These very large examples are excellent for finding weak spots in the modeling software and will lead to improvements.
- I am really impressed with the JuMP performance.
- The models are very large. Lots of meaningful models are much smaller than these (but may be more complex). Systems like Pyomo should not be dismissed just on the basis of these numbers.
The paper is an interesting read. It touches upon many concepts that are not widely known.