Yet Another Math Programming Consultant: Joins in GAMS, SQL or R

Tuesday, March 21, 2017

Joins in GAMS, SQL or R

To produce the charts in (1) I needed to do a multiple join on some data. From the GAMS model in (2) we extract the following data:

sets
j 'tasks' /job1*job10/
t 'stages' /t1*t10/
;
parameters
proctime(j,t) 'processing times for each stage'
machine(j,t) 'machine numbers for each stage (zero based)'
;
variable x(j,t) 'start time of subtask';

What we want to arrive at in R is a single data frame with slightly renamed columns (and a new column called end indicating when a sub-job finishes).

I used “approach 1” (pure SQL, described next) in (1), but there are other ways to attack the problem. I’ll describe a few.

Approach 1: pure SQL

Writing this into a SQLite database is very simple:

execute_unload "ft10";
execute "gdx2sqlite -i ft10.gdx -o ft10.db";

The database will contain a copy of all the data in the model, including the symbols we are interested in.

When loading the data we need to do three things:

1. Join proctime, machine and x on (j,t)
2. GAMS only stores non-zero elements, which we need to “repair”
3. Add the end column.

The second issue can be illustrated by looking at the table machine:

The records that have a zero value are missing. We can reintroduce them as follows. First we generate the Cartesian product J × T using a simple SQL subquery:
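
As a minimal sketch of this step, the snippet below builds a tiny in-memory stand-in for ft10.db (2 jobs and 2 stages instead of 10 × 10, with made-up contents) and forms the Cartesian product. Python's sqlite3 module is used here only as a convenient way to run the SQL; the query itself is an illustration and not necessarily the exact one used for the charts.

```python
import sqlite3

# In-memory stand-in for ft10.db, shrunk to 2 jobs x 2 stages (made-up data).
con = sqlite3.connect(":memory:")
con.executescript("""
    create table j (j text);
    create table t (t text);
    insert into j values ('job1'), ('job2');
    insert into t values ('t1'), ('t2');
""")

# Cross join: every job is paired with every stage.
query = """
    select j, t
    from j cross join t
    order by j, t
"""
pairs = con.execute(query).fetchall()
for row in pairs:
    print(row)
```

Against the real database the same query should return 10 × 10 = 100 rows.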

Note that table j has a single column j and table t has a single column t. As we have 10 jobs and 10 stages, this Cartesian product yields a table of 100 rows. Adding the machine number to this table is easy with a left join where we join on (j,t):
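
A sketch of such a left join, again on a made-up 2 × 2 stand-in database run through Python's sqlite3 module. The column name value for the stored machine numbers is an assumption about how the parameter ends up in the database; adjust it if your table looks different.

```python
import sqlite3

# Made-up 2 x 2 stand-in for ft10.db; the column name 'value' is an assumption.
con = sqlite3.connect(":memory:")
con.executescript("""
    create table j (j text);
    create table t (t text);
    create table machine (j text, t text, value real);
    insert into j values ('job1'), ('job2');
    insert into t values ('t1'), ('t2');
    -- machine 0 is not stored: only the non-zero machine numbers are present
    insert into machine values ('job1', 't2', 1.0), ('job2', 't1', 1.0);
""")

# Left join the sparse machine table onto the full (j,t) grid.
query = """
    select jt.j, jt.t, m.value as machine
    from (select j, t from j cross join t) as jt
    left join machine as m on m.j = jt.j and m.t = jt.t
    order by jt.j, jt.t
"""
joined = con.execute(query).fetchall()
for row in joined:
    print(row)  # rows without a stored machine number carry None (SQL NULL)
```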

The left join added the machine column but the missing values are represented by NA in R (or NULL in SQL). This is easily fixed:
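
One way to do this repair is SQL's coalesce function, which returns its first non-NULL argument. The sketch below uses the same made-up 2 × 2 stand-in data and the same assumed column name value; other constructs (ifnull, or a case expression) would work just as well.

```python
import sqlite3

# Made-up 2 x 2 stand-in for ft10.db; the column name 'value' is an assumption.
con = sqlite3.connect(":memory:")
con.executescript("""
    create table j (j text);
    create table t (t text);
    create table machine (j text, t text, value real);
    insert into j values ('job1'), ('job2');
    insert into t values ('t1'), ('t2');
    insert into machine values ('job1', 't2', 1.0), ('job2', 't1', 1.0);
""")

# coalesce(m.value, 0) replaces the NULLs produced by the left join with 0.
query = """
    select jt.j, jt.t, coalesce(m.value, 0) as machine
    from (select j, t from j cross join t) as jt
    left join machine as m on m.j = jt.j and m.t = jt.t
    order by jt.j, jt.t
"""
repaired = con.execute(query).fetchall()
for row in repaired:
    print(row)
```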

We now add the other columns and also calculate the end column:
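
Putting the pieces together could look like the sketch below. It rests on a few assumptions that are not taken from the original query: the parameters are stored with a column value, the variable x with a column level, and the output columns are named job, stage, machine, start, duration and end. The data is a made-up 2 × 2 schedule. Note that end is an SQL keyword, so it is quoted.

```python
import sqlite3

# Made-up 2 x 2 stand-in for ft10.db. Column names 'value' (parameters)
# and 'level' (variable) are assumptions about the database layout.
con = sqlite3.connect(":memory:")
con.executescript("""
    create table j (j text);
    create table t (t text);
    create table proctime (j text, t text, value real);
    create table machine  (j text, t text, value real);
    create table x        (j text, t text, level real);
    insert into j values ('job1'), ('job2');
    insert into t values ('t1'), ('t2');
    insert into proctime values
        ('job1', 't1', 3), ('job1', 't2', 2), ('job2', 't1', 4), ('job2', 't2', 1);
    -- zero machine numbers and zero start times are not stored
    insert into machine values ('job1', 't2', 1), ('job2', 't1', 1);
    insert into x values ('job1', 't2', 4), ('job2', 't2', 4);
""")

# Full (j,t) grid, three left joins, zeros repaired, end = start + duration.
query = """
    select jt.j as job, jt.t as stage,
           coalesce(m.value, 0) as machine,
           coalesce(x.level, 0) as "start",
           coalesce(p.value, 0) as duration,
           coalesce(x.level, 0) + coalesce(p.value, 0) as "end"
    from (select j, t from j cross join t) as jt
    left join machine  as m on m.j = jt.j and m.t = jt.t
    left join x             on x.j = jt.j and x.t = jt.t
    left join proctime as p on p.j = jt.j and p.t = jt.t
    order by jt.j, jt.t
"""
schedule = con.execute(query).fetchall()
for row in schedule:
    print(row)
```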

Approach 2: GAMS + SQL

It is easy to do the join in GAMS. In addition we can keep the zeroes by adding EPS, which forces GAMS to store a record even when the value itself is zero:

alias(*,job,stage,attribute);
parameter report(job,stage,attribute);
report(j,t,'machine') = EPS + machine(j,t);
report(j,t,'start') = EPS + x.l(j,t);
report(j,t,'duration') = EPS + proctime(j,t);
report(j,t,'end') = EPS + x.l(j,t) + proctime(j,t);

execute_unload "ft10";
execute "gdx2sqlite -i ft10.gdx -o ft10.db";
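
The report parameter is in "long" format: one record per (job, stage, attribute). A possible way to turn it into one row per (job, stage) is conditional aggregation, sketched below with Python's sqlite3 module on made-up data. The table layout (columns job, stage, attribute, value) and the idea that the EPS entries arrive in the database as zeros are assumptions here, so check the actual table before relying on this.

```python
import sqlite3

# Made-up stand-in for the exported report table; the column names and the
# zeros in place of EPS are assumptions about what the export produces.
con = sqlite3.connect(":memory:")
con.executescript("""
    create table report (job text, stage text, attribute text, value real);
    insert into report values
        ('job1', 't1', 'machine', 0), ('job1', 't1', 'start', 0),
        ('job1', 't1', 'duration', 3), ('job1', 't1', 'end', 3),
        ('job1', 't2', 'machine', 1), ('job1', 't2', 'start', 4),
        ('job1', 't2', 'duration', 2), ('job1', 't2', 'end', 6);
""")

# Pivot: pick the value of each attribute within a (job, stage) group.
query = """
    select job, stage,
           max(case when attribute = 'machine'  then value end) as machine,
           max(case when attribute = 'start'    then value end) as "start",
           max(case when attribute = 'duration' then value end) as duration,
           max(case when attribute = 'end'      then value end) as "end"
    from report
    group by job, stage
    order by job, stage
"""
wide = con.execute(query).fetchall()
for row in wide:
    print(row)
```

Because the zeros are already present in the table, no Cartesian product or left join is needed in this variant.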