1. I have a maximum likelihood problem. When I maximize the
likelihood function, I get one set of results; when I take the log of
the likelihood function and maximize that, I get different results.
Since the log is a monotone transformation, the two problems should
have the same maximizers in exact arithmetic, so I suppose the
discrepancy is due to the other problem I seem to have, which is too
many local maxima. Is this the typical case? Or is there something
fundamentally wrong in the way I am posing the problem?
2. Econometricians and statisticians like to use log-likelihood
functions instead of likelihood functions. The official argument is
that the log-likelihood is "easier to manipulate/optimize". Yes, it is
easier to differentiate, for humans. But it transforms a bounded
function into an unbounded one. If the optimization is going to be
done by a computer, does that really help, or does it make things
harder for the algorithm? I want to use whichever technique is more
reliable. I hope this is not one of those "case-by-case" things.
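To see why the two formulations can disagree in floating point even though the log is monotone, here is a small sketch with a hypothetical normal sample (the sample size and parameters are illustrative): the raw likelihood is a product of thousands of values below one and underflows to exactly zero, while the log-likelihood is a sum of moderate terms.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=2000)  # hypothetical data

def normal_pdf(x, mu, sigma):
    # Density of N(mu, sigma^2), evaluated pointwise
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Raw likelihood: a product of 2000 values, each below 0.4,
# underflows to exactly 0.0 in double precision
lik = np.prod(normal_pdf(x, 2.0, 1.0))

# Log-likelihood: a sum of moderate terms, perfectly representable
loglik = np.sum(np.log(normal_pdf(x, 2.0, 1.0)))

print(lik)     # 0.0 — the optimizer sees a flat zero surface here
print(loglik)  # a finite negative number
```

Once the likelihood underflows to zero over a whole region of the parameter space, an optimizer working on the raw likelihood has no gradient information there at all, so it can stop at a different point than an optimizer working on the log-likelihood.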
- Taking the log can make the likelihood function much better behaved numerically: a product of many small density values becomes a sum of moderate log terms, which prevents the extreme values that can otherwise be formed in sub-expressions.
- Many good NLP solvers allow bounds on the variables, which let you steer the solver away from regions where the function cannot be evaluated.
- Use an alternative method to find a good starting point, e.g. a method-of-moments estimate.
- Some of these problems can be genuinely difficult. Mixture models are a good example: their likelihood surfaces typically have many local maxima.
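The middle two points can be combined in one small sketch: a method-of-moments starting point plus simple positivity bounds for a gamma MLE. The distribution and data here are hypothetical, and `scipy.optimize.minimize` with `L-BFGS-B` stands in for "a good NLP solver that allows bounds":

```python
import numpy as np
from math import lgamma
from scipy.optimize import minimize

rng = np.random.default_rng(1)
# Hypothetical data: Gamma(shape=3, scale=2) samples
x = rng.gamma(shape=3.0, scale=2.0, size=5000)

def neg_loglik(params, x):
    # Negative log-likelihood of an i.i.d. Gamma(k, theta) sample
    k, theta = params
    return -np.sum((k - 1) * np.log(x) - x / theta
                   - k * np.log(theta) - lgamma(k))

# Method-of-moments starting point: k = mean^2/var, theta = var/mean
m, v = x.mean(), x.var()
x0 = np.array([m * m / v, v / m])

# Bounds keep the solver away from k <= 0 or theta <= 0,
# where the log-likelihood cannot even be evaluated
res = minimize(neg_loglik, x0, args=(x,),
               method="L-BFGS-B",
               bounds=[(1e-6, None), (1e-6, None)])

k_hat, theta_hat = res.x
print(k_hat, theta_hat)  # close to the true (3, 2) for this sample size
```

Started from the moment estimate, the solver begins in the right basin and only has to refine the answer; started from an arbitrary point, it may wander toward a boundary or a poor local maximum.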