I am a full-time consultant and provide services related to the design, implementation and deployment of mathematical programming, optimization and data-science applications. I also teach courses and workshops. Usually I cannot blog about projects I am doing, but there are many technical notes I'd like to share, not least so that I have an easy way to search for and find them again myself. You can reach me at email@example.com.
Tuesday, January 20, 2009
> I'm doing median regression with lp_solve. My LP-model is
>
> min sum(i, s1(i)+s2(i))
> s.t.
> sum(j, a(j)*x(i,j)) + s1(i) - s2(i) = b(i)
> s1(i),s2(i) >= 0
>
> where
> i=1..n observations,
> j=1..m describing dimensions,
> a(j) parameters to be identified,
> x(i,j) describing values for observation i,
> b(i) observed values,
> s1(i), s2(i) deviations of the described value from the observed value.
>
> For realistic instances I have > 100.000 observations. Many instances
> have rows i with equal x(i,j) but different b(i) (describing variables are
> equal but observations different). I wonder if there is a modeling trick
> to aggregate these rows into 1 or 2 constraints.
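Here is why this LP is median (least absolute deviations) regression. For fixed parameters a(j), the cheapest slacks are s1(i) = max(0, r(i)) and s2(i) = max(0, -r(i)), where r(i) = b(i) - sum(j, a(j)*x(i,j)) is the residual. The objective therefore collapses to sum(i, |r(i)|). The small Python sketch below checks this numerically. The helper names `optimal_slacks` and `lad_objective` and the toy data are made up for illustration. With only an intercept column, the minimizing a is the median of b, not the mean.

```python
def optimal_slacks(a, x_row, b_i):
    """For fixed a, return the cheapest (s1, s2) >= 0 satisfying
    sum(j, a[j]*x_row[j]) + s1 - s2 = b_i."""
    fitted = sum(aj * xj for aj, xj in zip(a, x_row))
    r = b_i - fitted                  # residual of this observation
    return max(0.0, r), max(0.0, -r)  # s1 - s2 = r, s1 + s2 = |r|

def lad_objective(a, x, b):
    """LP objective sum(i, s1(i)+s2(i)) at the optimal slacks,
    i.e. the sum of absolute residuals for parameters a."""
    total = 0.0
    for x_row, b_i in zip(x, b):
        s1, s2 = optimal_slacks(a, x_row, b_i)
        total += s1 + s2
    return total

# Toy data (hypothetical): a single intercept column, so a has length 1.
x = [[1.0], [1.0], [1.0], [1.0], [1.0]]
b = [1.0, 2.0, 3.0, 10.0, 4.0]

print(lad_objective([3.0], x, b))  # 11.0  (a = median of b)
print(lad_objective([4.0], x, b))  # 12.0  (a = mean of b)
```

A real model would of course let lp_solve pick a(j). The point of the sketch is only that, for any given a(j), the s1/s2 pair behaves like an absolute value.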
I don't know of such a trick, but I have some suggestions to try out:
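If m (the number of describing dimensions j) is large, you may want to solve a larger but sparser problem: more rows and columns, but fewer nonzero elements. This avoids repeating the same expression sum(j, a(j)*x(i,j)) for identical rows by introducing an extra variable for each distinct row k of x:
y(k) = sum(j, a(j)*x(k,j))
y(k) + s1(i) - s2(i) = b(i)
for the combinations (k,i) where observation i has the same x-values as distinct row k.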
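The bookkeeping behind this reformulation can be sketched in Python as a rough illustration, not as an lp_solve interface. The sketch gives each distinct row of x an index k and records, for every observation i, which k it maps to (the (k,i) combinations). It then compares constraint-matrix nonzero counts. The function names and toy data are hypothetical, and the count assumes every x(i,j) is nonzero.

```python
def group_rows(x):
    """Give each distinct x-row an index k; return the distinct rows
    and, for every observation i, the k it belongs to."""
    k_of_row = {}   # tuple(x-row) -> k
    distinct = []   # distinct[k] is the row used in y(k) = sum(j, a(j)*x(k,j))
    k_of_obs = []   # k_of_obs[i] = k, i.e. the combination (k, i)
    for row in x:
        key = tuple(row)
        if key not in k_of_row:
            k_of_row[key] = len(distinct)
            distinct.append(list(row))
        k_of_obs.append(k_of_row[key])
    return distinct, k_of_obs

def nonzeros(n, m, n_distinct):
    """Constraint nonzeros, assuming dense x (every x(i,j) nonzero)."""
    # Original: each of n rows has m a(j) terms plus s1(i) and s2(i).
    original = n * (m + 2)
    # Reformulated: n_distinct rows defining y(k) (m a(j) terms plus y(k)),
    # and n linking rows y(k) + s1(i) - s2(i) = b(i) with 3 terms each.
    reformulated = n_distinct * (m + 1) + 3 * n
    return original, reformulated

x = [[1, 2, 3], [1, 2, 3], [4, 5, 6], [1, 2, 3], [4, 5, 6]]
distinct, k_of_obs = group_rows(x)
pairs = [(k, i) for i, k in enumerate(k_of_obs)]
print(distinct)  # [[1, 2, 3], [4, 5, 6]]
print(pairs)     # [(0, 0), (0, 1), (1, 2), (0, 3), (1, 4)]
print(nonzeros(n=100000, m=50, n_distinct=1000))  # (5200000, 351000)
```

The reformulated model adds n_distinct variables and n_distinct rows. It pays off when many observations share the same x-values and m is large. When nearly every row is distinct, the y(k) rows just add overhead.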
You could reduce the number of variables again by a different trick: