Tuesday, March 21, 2017

Joins in GAMS, SQL or R

To produce the charts in (1) I needed to do a multiple join on some data. From the GAMS model in (2) we extract the the following data:

sets
  j
'tasks'  /job1*job10/
  t
'stages' /t1*t10/
;
parameters
   proctime(j,t)
'processing times for each stage'
   machine(j,t)
'machine numbers for each stage (zero based)'
;
variable x(j,t) 'start time of subtask';

What we want to arrive in R is the following:

image

i.e. a single data frame with slightly renamed columns (and a new column called end indicating when a sub-job finishes).

I used “approach 1” (pure SQL, described next) in (1), but there are other ways to attack the problem. I’ll describe a few.

Approach 1: pure SQL

Writing this into a SQLite database is very simple:

execute_unload "ft10";
execute "gdx2sqlite -i ft10.gdx -o ft10.db"
;

The database will contain a copy of all the data in the model, including the symbols we are interested in.

When loading the data we need to do two things:

  1. Join proctime, machine and x on (j,t)
  2. GAMS only stores non-zero elements, which we need to “repair”
  3. Add the end column.

The second issue can be illustrated by looking at the table machine:

image

We miss the records that have a zero value. We can reintroduce them as follows. First we can generate the Carthesian product of J × T using a simple SQL subquery:

image

Note that table j has a single column j and table t has a single column t. As we have 10 jobs and 10 stages, this Carthesian product yields a table of 100 rows. Adding the machine number to this table is easy with a left join where we join on (j,t):

image

The left join added the machine column but the missing values are represented by NA in R (or NULL in SQL). This is easily fixed:

image

We now add the other columns and also calculate the end column:

image

Approach 2: GAMS + SQL

It is easy in GAMS to do the join. In addition we can use a trick to keep the zeroes:

alias(*,job,stage,attribute);
parameter
report(job,stage,attribute);
report(j,t,
'machine') = EPS
+ machine(j,t);
report(j,t,
'start') = EPS
+ x.l(j,t);
report(j,t,
'duration') = EPS
+ proctime(j,t);
report(j,t,
'end') = EPS
+ x.l(j,t) + proctime(j,t);

execute_unload "ft10"
;
execute "gdx2sqlite -i ft10.gdx -o ft10.db"
;

The EPS is a placeholder for the zero values The GAMS parameter report looks like:

image

The GDX2SQLITE tool will map the EPS values into zero before storing them in the database, so on the R side we see:

image

We need to pivot on the attribute, value columns. Unfortunately this is not completely trivial in SQL. A standard way to do this is:

image

Approach 3: GAMS + R

We keep the GAMS part the same:

alias(*,job,stage,attribute);
parameter
report(job,stage,attribute);
report(j,t,
'machine') = EPS
+ machine(j,t);
report(j,t,
'start') = EPS
+ x.l(j,t);
report(j,t,
'duration') = EPS
+ proctime(j,t);
report(j,t,
'end') = EPS
+ x.l(j,t) + proctime(j,t);

execute_unload "ft10"
;
execute "gdx2sqlite -i ft10.gdx -o ft10.db"
;

On the R side we can read the parameter report as is and pivot using spread from the tidyr package:

image

Arguably this is the cleanest solution.

References
  1. Gantt Chart with R/Plotly, http://yetanothermathprogrammingconsultant.blogspot.com/2017/03/gantt-chart-with-rplotly.html
  2. Playing with job shop problem ft10 (1), http://yetanothermathprogrammingconsultant.blogspot.com/2014/04/playing-with-ft10-job-shop-1.html
  3. Pivoting a table: GAMS, SQLite, R and Python, http://yetanothermathprogrammingconsultant.blogspot.com/2016/01/pivoting-table-gams-sqlite-r-and-python.html