Monday, November 16, 2009

Modeling after collecting data

I am involved in a policy evaluation project where a lot of effort has been spent on collecting data. Now that we are actually starting to run optimization models using this data, the solutions give rise to other questions. That happens a lot: the solutions of the first versions of a model do not really answer questions. More often they give rise to new questions, and show that some of the original questions are not well-posed or even relevant. As a result some of the data suddenly seems less important, and at the same time some new data would be really useful for solving models that help answer the new questions. The lesson is that it is often a good idea not to delay building the model until you have all the data. Instead, try to build early versions of the model as soon as possible, in conjunction with the data collection effort. This can mean we need to “invent” data. That is a useful exercise in itself: it requires forming a detailed understanding of the problem, which really helps down the road. Building models can help focus the mind on what is important, what we don’t know yet, and which questions are relevant to ask. It helps to pose these issues more sharply than is otherwise possible. It is very difficult to think about all these things in the abstract: the model helps to identify problems succinctly.