Wednesday, May 22, 2013

Data Models and Optimization Models

Just had another example of something I see rather often. In this case we are dealing with a fairly small data model. Unfortunately the db guru has mixed up flow and stock variables. When developing a scheduling model the difference between a time period and a point in time is rather important, and I need to deal with this explicitly in the model. The concept of time often seems to be misunderstood.
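To make the distinction concrete, here is a minimal sketch of the convention I typically have to write down explicitly (the symbols are just illustrative, not from the actual model): inventory is a stock, measured at a point in time, while production and demand are flows, measured over a period. With stocks measured at the end of each period, the balance equation reads

  inv(t) = inv(t-1) + prod(t) - demand(t)

where inv(t) is the stock at the end of period t, and prod(t) and demand(t) are flows during period t. If the database actually stores inventory at the beginning of a period, the same equation is silently off by one period.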

In this case, I tried to raise the issue but the db people became defensive and angry, so I dropped it quickly. In such a situation it is best to make some reasonable assumptions (e.g. about whether something is measured at the beginning or the end of a time period) and be prepared to change the model when the first results are presented.

More generally: there is much more hand waving possible in db design than in the implementation of optimization models. We often have to be much more precise, even (maybe especially) if the db design is beautifully documented with all kinds of fancy diagrams (these sometimes give a false impression of precision).

Another observation that further strengthens this argument is that in almost all cases data coming from a database contains errors. As optimization models typically see the whole thing ("simultaneous equations") we often see errors in the data that were never observed by the db people. Referential integrity only goes so far....
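As an illustration (the names and numbers below are made up, not from the actual project), a simple whole-dataset check will flag rows that are each acceptable in isolation but mutually inconsistent, which a foreign-key constraint will never catch:

  from collections import defaultdict

  # Minimal sketch with hypothetical data: every row is valid on its own and
  # passes referential integrity, but summed over a period the shipments
  # exceed the plant's capacity -- the kind of error a model sees at once.
  capacity = {"plant_A": 100}            # units per period (assumed)
  shipments = [                          # (plant, period, quantity)
      ("plant_A", 1, 60),
      ("plant_A", 1, 70),                # 60 + 70 = 130 > 100
  ]

  outflow = defaultdict(float)
  for plant, period, qty in shipments:
      outflow[(plant, period)] += qty

  for (plant, period), total in outflow.items():
      if total > capacity[plant]:
          print(f"inconsistent data: {plant}, period {period}: "
                f"total shipped {total} exceeds capacity {capacity[plant]}")

A constraint in the optimization model performs essentially this check across all records simultaneously, which is why such errors surface there first.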

1 comment:

  1. I once dealt with a well-designed, well-administered and very reliable mainframe database that was chock full of totally incorrect data entered by apparently unmotivated employees. Stamp's Law (http://en.wikipedia.org/wiki/Josiah_Stamp,_1st_Baron_Stamp#Quotes) extends beyond statistics.
