In a huge NLP model (> 1e5 equations) the following constraint caused problems:
REL_P(S,R).. P_L(S,R) =E= P(S,R)**E('elas') * SUM[SS,Lam(SS)*P(SS,R)**(1-E('elas'))];
This can be reformulated into something more manageable, by actually adding constraints and variables:
QDEF(R).. Q(R) =E= SUM[SS,Lam(SS)*P(SS,R)**(1-E('elas'))];
REL_P(S,R).. P_L(S,R) =E= P(S,R)**E('elas') * Q(R);
Although we add extra rows and columns, the sparsity of the model and the “nonlinearity” (in terms of nonlinear nonzeros and in terms of nonlinear code size) will improve a lot. In this case the first formulation could not be solved by hitting memory limits, and the second version solved fine.
Obviously repeating expressions also happen in LP’s (and MIP’s). Also there they can cause problems (extra memory and poor performance), as they are in many cases not automatically removed by presolvers. When using sparse solvers, the number of nonzero elements is a better measure of problem size than the number of rows or columns, so adding a few rows/columns but saving a lot of nonzero elements is often a good trade-off. In the above case we add card(R) rows and columns but save up to [card(S)-1]*card(S) nonzeroes. In the above model we save nonlinear nonzeroes, which makes the trade-off even more worthwhile.
It would be great if a modeling system would recognize the fact that the same subexpressions are used over and over again, and give appropriate feedback to the user. I suspect this is not an easy task however. In compilers we sometimes see “common subexpression elimination”; this is a related issue.