Yet Another Math Programming Consultant: A strange series

Tuesday, November 22, 2022

A strange series

In [1] a question is posed about a somewhat strange series. Let $i \in \{ 0,\dots,n\}$. Then we require:

$\color{darkred}x_i \in \{0,1,2,\dots\}$,
$\color{darkred}x_i$ is equal to the number of times the value $i$ occurs in the series. In mathematical terms, one could say: \[\color{darkred}x_i = |\{j:\color{darkred}x_j=i\}|\] (here $|S|$ is the cardinality of the set $S$).

An example for $n=4$ is:

index 0 1 2 3 4
value 2 1 2 0 0

Note that we can tighten the bounds on $\color{darkred}x_i$ a bit. We know that $\color{darkred}x_i \in \{0,1,2,\dots,n\}$ (note that $\color{darkred}x_i = n+1$ can't happen).

The question is: how can we, for a given $n$, generate such a series? This problem can be conveniently expressed as a constraint programming (CP) problem. The constraint in Docplex is simply:

for i in I:
mdl.add(x[i] == sum(x[j] == i for j in I))

But what about a MIP formulation? The CP formulation is not a linear constraint so we cannot use that directly. One general modeling rule is:

In a MIP model, when we need to do some counting, use binary variables

We therefore introduce binary variables $\color{darkred}y_{i,j}\in \{0,1\}$, defined by \[\color{darkred}y_{i,j}=1 \iff\color{darkred}x_i=j\] We can do this by: \[\begin{align}&\color{darkred}x_i = \sum_j j\cdot\color{darkred}y_{i,j} \\ & \sum_j \color{darkred}y_{i,j} = 1\end{align}\] The counting of occurrences can now be stated as: \[\color{darkred}x_i = \sum_j \color{darkred}y_{j,i}\] These three linear constraints now form our MIP model. Note that the variables $\color{darkred}x_i$ can be substituted out (they can be recovered afterward from the optimal levels of $\color{darkred}y_{i,j}$). So, the remaining linear model is:

Linear MIP constraints
\[\begin{align}& \sum_j j\cdot\color{darkred}y_{i,j} = \sum_j \color{darkred}y_{j,i} && \forall i \\ & \sum_j \color{darkred}y_{i,j} = 1 && \forall i \\ & \color{darkred}y_{i,j}\in \{0,1\} \end{align}\]

Just add a dummy objective and you are ready to go.

The solution looks like:

----     30 SET i  size of problem

i0,    i1,    i2,    i3,    i4


----     30 VARIABLE y.L  

            i0          i1          i2

i0                               1.000
i1                   1.000
i2                               1.000
i3       1.000
i4       1.000


----     30 PARAMETER x  

i0 2.000,    i1 1.000,    i2 2.000

Notes:

Not all $n$ yield a feasible solution. E.g. $n=0,1,2,5$ are infeasible.
But a feasible solution for any $n\ge 6$ is:

index 0 1 2 n-3
value n-3 2 1 1

The zero values are not printed here. We see that the solution is very sparse. Of course, this general solution makes the problem slightly less interesting.
The solutions seem unique. A no-good cut that forbids a previously found solution $\color{darkblue}y^*_{i,j}$ is \[\sum_{i,j} \color{darkblue}y^*_{i,j}\cdot \color{darkred}y_{i,j} \le \color{darkblue}n\] When running with this additional constraint, the model becomes infeasible, indicating there is no other solution.

Conclusion

It is not uncommon that a CP (Constraint Programming) formulation can use integer variables, whereas the corresponding MIP formulation needs binary variables. This small model is a good example of this.

Update: I looked at the AMPL and Minizinc code provided in the comments a bit. Of course, that yielded some more thoughts. My notes are in [2].

References

Simple Integer Optimization Problem: docplex CP model works but equivalent PuLP+CBC model is infeasible?, https://or.stackexchange.com/questions/9328/simple-integer-optimization-problem-docplex-cp-model-works-but-equivalent-pulp
High-level MIP modeling, https://yetanothermathprogrammingconsultant.blogspot.com/2023/01/high-level-mip-modeling.html

Appendix: GAMS model

set
   i 'size of problem' /i0*i4/
;
alias (i,j);

parameter v(i) 'values';
v(i) = ord(i)-1;

binary variable y(i,j);
variable z 'dummy objective';

equations
   e1 'substituted out x'
   e2 'one value for each x(i)'
   obj 'dummy objective'
   ;

e1(i).. sum(j,v(j)*y(i,j)) =e= sum(j,y(j,i));
e2(i).. sum(j, y(i,j)) =e= 1;
obj.. z =e= 0;

model m /all/;
solve m minimizing z using mip;

abort$(m.modelstat<>%ModelStat.Optimal% and m.modelstat<>%ModelStat.Integer Solution%) "No feasible solution";

*
* recover x
*
parameter x(i);
x(i) = sum(j,v(j)*y.l(i,j));

display i,y.l,x;

$stop

*
* optional:
* check if solution is unique
*
equation nogood;
nogood.. sum((i,j),y.l(i,j)*y(i,j)) =l= card(i)-1;

model m2/all/;
solve m2 minimizing z using mip;

abort$(m2.modelstat<>%ModelStat.Infeasible% and m2.modelstat<>%ModelStat.Integer Infeasible%) "Multiple solutions exist";

2 comments:

MikaelNovember 22, 2022 at 6:55 AM
The problem can be formulated in MiniZinc as follows

```
int: n = 100;
set of int: N = 0..n;

array[N] of var N: x;
array[N] of N: index = array1d(N, [i | i in N]);

% Constraint variant 1: Individual sums for each value
constraint forall (i in N) (
x[i] = sum(j in N) (x[j] = i)
);

%Constraint variant 2: Global cardinality constraint
include "globals.mzn";
constraint global_cardinality(x, index, x);

solve satisfy;

output [show(x)];
```

Running in the latest MiniZinc IDE with Gecode 6.3.0 with all solutions search takes 1.8s and 74011 failures for an array of length 100 using constraint variant 1 only that uses the same formulation as in the Docplex model.

The second constraint variant instead uses the global cardinality constraint (https://www.minizinc.org/doc-2.6.4/en/lib-globals-counting.html#mzn-ref-globals-counting-global-cardinality) to specify the counting structure. For the same data as above, it takes just 0.118s and 197 failures to find all 1 solutions.
ReplyDelete
Replies

Add comment