As we are talking about binary variables, \(z = x \cdot y\) can be interpreted as \(z = x \textbf{ and } y\). We can resort to two standard formulations:
Formulation 1  Formulation 2 

\[\begin{align} & \color{darkred} z \le \color{darkred}x\\ & \color{darkred}z \le \color{darkred}y\\ & \color{darkred}z \ge \color{darkred}x+\color{darkred}y1\\ & \color{darkred}x,\color{darkred}y \in \{0,1\} \\ & \color{darkred}z \in [0,1] \end{align}\]  \[\begin{align} & \color{darkred} z \le \frac{\color{darkred}x+\color{darkred}y}{2}\\ & \color{darkred} z \ge \color{darkred}x+\color{darkred}y1\\ & \color{darkred}x,\color{darkred}y,\color{darkred}z \in \{0,1\} \\ \end{align}\] 
I always use the first form. But I quite often see the second formulation being used (e.g. [1]).
The second form can be interpreted as an aggregation of the first one by adding up \(z\le x\) and \(z \le y\).
There is a good reason to use the first form: it is tighter. For instance, the fractional value \((x,y) = (0, 1/2)\) yields \(z=0\) in the first formulation, but would allow \(z=0.25\) in the second formulation. The first formulation is so tight, we can relax \(z\in\{0,1\}\) to \(z\in[0,1]\).
My attempt to show graphically the difference:


Obvious formulation 1 seems to be the winner. But when I try this out on a model [2] I see:
Size  Model 1 Seconds/nodes  Model 2 Seconds/nodes 

\(n=20,k=10\)  2.9 / 3117  1.8 / 3018 
\(n=25,k=5\)  12 / 6245  11 / 7043 
\(n=30,k=4\)  15 / 5263  13 / 4471 
Well, this does not seem to prove my point. My preferred formulation is actually slightly underperforming.
Let's look at a different solver to see if that proves my point.
Size  Model 1 Seconds/nodes  Model 2 Seconds/nodes 

\(n=20,k=10\)  46 / 3082  1623 / 279364 
\(n=25,k=5\)  105 / 6860  1823 / 87396 
\(n=30,k=4\)  144 / 12008  213 / 11336 
These results look quite different. First, obviously CBC is slower than Cplex. No surprise there. But also we see formulation 1 is much better than formulation 2. This is more like I expected beforehand. I suspect the cuts produced by Cplex eliminated the advantage of formulation 1. We see that more often: some modeling tricks are becoming less important as solvers are getting smarter.
Notes:
 The meaning of \(n\) and \(k\) for the size of the problem is explained in [2].
 The variables \(z\) were declared as binary both in models 1 and 2.
Thanks for summing this very common trick up concisely. Looks like you have a typo in the second formula line, though. It has to be "x,y,z \in {0,1}".
ReplyDeleteFixed. Thanks.
Delete