*
* nodes * set i 'nodes' /node1*node5000/; alias(i,j); * * random arcs (1%) * set A(i,j) 'arcs'; A(i,j) = uniform(0,1)<0.01; scalar numArcs 'number of arcs'; numArcs = card(A); display numArcs; |
Basically, this approach generates \(n^2\) random numbers \( r_{i,j} \sim U(0,1)\) (uniform distribution) and picks the arcs related to a value \(r_{i,j}\lt 0.01\). This is a randomized process, so we don't see exactly 250,000 arcs, but rather:
---- 16 PARAMETER numArcs = 249170.000 number of arcs
This operation is not exactly cheap. It takes 11.2 seconds on my laptop.
There are a few other approaches we can try.
Random locations
Instead of generating \(n^2\) random numbers, we can generate \(k=250,000\) pairs of \(i,j\) values (integers between 1 and \(n\)). A straigthforward implementation would be:
Unfortunately, this is extremely slow. I stopped it after 2,000 seconds: it was still not finished. GAMS is horribly slow when doing loops like this.
scalars k,n,nn,ni,nj;
n = card(i); nn = n*n/100; for (k = 1 to nn, ni = uniformint(1,n); nj = uniformint(1,n); A(i,j)$(ord(i)=ni and ord(j)=nj) = yes; ); |
Unfortunately, this is extremely slow. I stopped it after 2,000 seconds: it was still not finished. GAMS is horribly slow when doing loops like this.
Crazy indexing
A variant of the above approach is to try to trick GAMS into a vectorized version of this. Well, this is not so easy. Here is a version that uses leads/lags when indexing:
sets
A(*,*) 'arcs' k /node1*node5000,k5001*k250000/ ij /i,j/ ; parameter r(k,ij) 'random offset from k'; r(k,ij) = uniformint(1,card(i)) - ord(k); A(k+r(k,'i'),k+r(k,'j')) = yes; |
This requires some explanation.
- The set A is no longer domain-checked. We use funky indexing here, so we need loosen things up.
- The set k has 250,000 elements. The first 5000 are identical to \(i,j\).
- The parameter r contains \(2\times 250,000\) random integers between 1 and \(n\). We store them as offsets from the index \(k\). This sounds crazy but the next statement explains why this is done.
- The assignment to A is using leads. Think of it as a variant of A(k,k) = yes. This would populate a diagonal. Similarly, A(k+1,k) = yes shifts the diagonal by one. Using A(k+r(k,'i'),k+r(k,'j')) we index exactly the correct location.
- We may generate duplicates, so the number will be slightly below 250k arcs.
This beast actually works, but the performance is not great. The fragment takes about 13 seconds, so no gain compared to our original approach.
Python loop
Let's try some Python. We can build up sets using Python within a GAMS model:
set A(i,j) 'arcs';
$onEmbeddedCode Python:
from random import
seed,randint
seed(999) n = len(list(gams.get("i"))) nn = n*n//100 a = {} for k in range(nn): ni = randint(1,n) nj = randint(1,n) elem = ("node{}".format(ni),"node{}".format(nj)) a[elem] = 1 gams.set("A",list(a))
$offEmbeddedCode A
|
This is just a k-loop. We use here a Python dictionary to store elements as this handles possible duplicates correctly. This approach also takes about 11 seconds, so no gain compared to our first approach.
R to the rescue
So let's see if we can use R to speed things up:
We write from inside GAMS an R script that gets executed by rscript.exe. The steps are:
$set n 5000
$set inc data.inc * * R script * $onecho > script.R library(data.table) n <- %n% nn <- n^2/100 df <- data.frame( ni=sample(n,nn,replace=TRUE), nj=sample(n,nn,replace=TRUE)) df <- df[order(df$ni,df$nj),] v <- unique(paste0("node",df$ni,".node",df$nj)) fwrite(list(v),"%inc%",col.names=F,quote=F) $offecho $call '"c:\program files\R\R-3.5.0\bin\Rscript.exe" script.R' * * nodes and arcs * set i 'nodes' /node1*node%n%/; alias(i,j); set A(i,j) 'arcs' / $offlisting $include %inc% $onlisting /; |
We write from inside GAMS an R script that gets executed by rscript.exe. The steps are:
- sample the arcs,
- sort in the order that GAMS likes,
- create a string representation of the tuple, e.g. "node1.node2",
- remove duplicates (this means we end up with a number of arcs that is smaller than 250k),
- write to an include file (note: the function fwrite from the data.table package seems a bit faster than the function writeLines from the base package)
The include file is then processed by GAMS. To speed things up we remove echoing to the listing file.
This method is the winner: it takes just 3.5 seconds. Sometimes using a plain old text file as a communication channel is not so bad.
The code:
GDX file is slower
The code:
$set n 5000
$set gdx data.gdx * * R script * $onecho > script.R library(gdxrrw) library(dplyr) n <- 5000 nn <- n^2/100 df <- data.frame( ni=paste0("node",sample(n,nn,replace=TRUE)), nj=paste0("node",sample(n,nn,replace=TRUE)), stringsAsFactors = F) df <- distinct(df,ni,nj) nr <- nrow(df) wgdx("%gdx%",list(name="A", type="set", dim=2, form="sparse", ts="arcs", val=cbind(1:nr,1:nr), uels=c(list(df$ni),list(df$nj)))) $offecho $call '"c:\program files\R\R-3.5.0\bin\Rscript.exe" script.R' * * nodes and arcs * set i 'nodes' /node1*node%n%/; alias(i,j); set A(i,j) 'arcs'; $gdxin %gdx% $loaddc A |
is slower than using a text file. The reason is that wgdx is not as efficient.