Saturday, March 25, 2023

Simultaneous equation models and data errors

My experience is that using (optimization) models is a great way to provide quality assurance on the data. Data collection can be very difficult and expensive. It can also be quite easy to have errors cropping up somewhere along the way. When using an optimization model, we will get hammered by such errors.

Thursday, March 16, 2023

Algorithm vs. model

From [1]:

We are given a plane defined by Ax+By+Cz-D = 0 where D is significantly larger than A,B,C and GCD(A,B,C) = 1. How would I find all points (x, y, z), where x,y,z are integers and >= 0, that lie on the plane in an efficient manner?

So the poster asks for an algorithm to find \(\color{darkred}x,\color{darkred}y,\color{darkred}z \in \{0,1,2,\dots\}\) such that \[\color{darkblue}A \cdot \color{darkred}x + \color{darkblue}B \cdot \color{darkred}y + \color{darkblue}C \cdot \color{darkred}z = \color{darkblue}D\] Besides the assumptions stated in the question, I'll further assume \(\color{darkblue}A,\color{darkblue}B,\color{darkblue}C,\color{darkblue}D\gt 0\).

Tuesday, March 14, 2023

Choosing between NLP solvers: interior point or active set.

One way to categorize (local) nonlinear programming (NLP) solvers is active set methods and interior point solvers. Some representative large-scale sparse solvers are:

  • Active set: CONOPT, SNOPT. These are using SQP algorithms.
  • Interior point: IPOPT, Knitro. Note: Knitro also contains an active set algorithm.
These are two very different types of algorithms. They work very well on different types of problems. If you do a lot of nonlinear optimization, it actually makes sense to have both types of solvers available. With modeling tools like AMPL and GAMS, it is very easy to switch between solvers. Testing performance between two solvers is then also a snap.

Here I want to discuss some factors that can help in deciding whether to use an active set solver or an interior point solver. 

Monday, March 6, 2023

Some approaches for moving data between MS Access and GAMS

Moving data between different environments is always more difficult than we hope. Here I list some approaches and actually try them out on a small dataset. We hit some bugs along the way and also a few conceptual stumbling blocks (even for this stylized example). We had some issues with Access as well as GAMS Connect. 

This question came up in an actual project. My idea was: "Let me show you how this can be done". I am afraid, I got carried away a bit. But it demonstrates that we should not underestimate these, at first sight, menial tasks. When the data set becomes larger, the problems compound. We can't eyeball the data, and statistically, it is more likely we encounter some problems.

Friday, February 24, 2023

Another fast MIP model: covering

In [1], the following problem is stated:

  • There is a collection of \(n=1,000\) test questions.
  • Each question covers a number of skills.
  • Given is a requirement for a number of questions for each required skill (e.g., 4 questions about skill 1, 3 questions about skill 2, etc.).
  • Create a test with the minimum number of questions that fulfills the requirements.
Of course, we see some remarks about NP-Hardness. Complexity theory is about bounds, about worst-case and asymptotic behavior. Even if we can't prove that a particular problem with a particular dataset given to a particular solver must be fast, it does not mean the opposite: that it must be slow. It is not at all unusual that we can solve things very quickly, even though the complexity bounds are terrible. Of course, this depends on the model, on the data set, and on the solver. Here, we see an open-source MIP solver can solve this particular problem in less than 0.01 seconds. Many well-educated people misunderstand complexity theory – to their detriment. They miss out on great ways to solve actual problems! Obviously, this is a problem with how complexity theory is taught. 

Thursday, February 16, 2023

Assigning jobs to machines without overlap

 Here we consider the following problem from [1]:

  • We have jobs with a given start time and completion time
  • Jobs can be repeated on given days (e.g. job 1 needs to run on Monday, Wednesday, and Friday)
  • We want to assign jobs to machines in such a way that there is no overlap
  • The objective is to minimize the number of machines needed to execute all jobs
The planning horizon is a week, and the problems has 50 jobs.

Wednesday, February 15, 2023

Supplier selection: an easy MIP

This is a fairly standard supplier selection model. It was posted in [1]. It is interesting as it is a good fit for a MIP model: it is simple to model, it solves fast, and it is not obvious how to solve without an optimization model.

The problem is:
  • We want to order items in different quantities from suppliers.
  • Suppliers have an available inventory for these items. This can be zero.
  • We can split the ordering over different suppliers.
  • The cost structure is as follows:
    • Shipping cost is a fixed cost per supplier.
    • Item cost is a variable per-unit cost. 

Monday, February 13, 2023

Populating SQLite databases

 GAMS has three easy ways to populate a SQLite database:

  1. Using the tool gdx2sqlite. This tool populates a SQLite database with data from a GDX file. This means we first have to export GAMS data to a GDX file. As there is quite some file I/O going on here (writing GDX file, reading GDX file, writing database), I would expect this to be slower than the next method.
  2. The new GAMS-connect facility. This does not use intermediate files, and directly copies records from in-memory data. This should be the fastest.
  3. Old fashioned CSV files. We first export data as a GDX file, and then use gdxdump to convert the data to a CSV file. Then sqlite can import the CSV file, and populate the database. There is much file I/O here, so this should be slow.

Saturday, January 28, 2023

Tiny non-convex quadratic model brings solvers to their knees

Here is a very small geometric problem:

Given \(n\) points in 2d space, find the smallest triangle that contains all these points.

Find the smallest triangle containing all points.

This looks like a simple problem. Somewhat to my surprise, my attempt here does not bear that out.

Tuesday, January 24, 2023

Export GAMS GDX file to different Python formats (CSV,Feather,Pickle)

A GAMS GDX file is a binary file with a number of GAMS symbols with their data. When collaborating with colleagues that work with Python, it may be useful to convert a GAMS GDX with relevant data to data files that Python can consume. 

The idea is here that a GAMS user can run the GAMS script, without knowing Python or having Python installed. (GAMS comes with its own somewhat barebones Python, which is used inside the GAMS script). On the other hand, the Python user will not need to have GAMS installed to read the generated data files. This is opposed to Python packages and tools that can interact with GDX files directly. You can see a few in the PyPI directory [1]. Those will require both a GAMS system and a Python environment.