Typically income data is presented in income classes:
| Income Class | Number of Observations |
To find the mean income from such data we can fit a distribution, for example a Pareto distribution. One way of doing this is with a maximum likelihood estimation procedure. From http://www.math.uconn.edu/~valdez/math3632s10/M3632Week10-S2010.pdf we have:
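For reference, the grouped-data likelihood in its usual textbook form can be sketched as follows. The notation is an assumption made here and is not copied from the linked notes: class boundaries $c_0 < c_1 < \dots < c_k$, a count $n_j$ for the class $(c_{j-1}, c_j]$, and $F$ the CDF of the fitted distribution.

```latex
L = \prod_{j=1}^{k} \bigl[ F(c_j) - F(c_{j-1}) \bigr]^{n_j}
\qquad\Longrightarrow\qquad
\ln L = \sum_{j=1}^{k} n_j \ln \bigl[ F(c_j) - F(c_{j-1}) \bigr]
```

Only the class boundaries and the counts enter the likelihood; the individual incomes are not needed.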
This is easy to code in GAMS, using a Pareto distribution:
Here we use that F(0) = 0 and F(INF) = 1 for the CDF of the Pareto distribution, so the first class and the open-ended last class need no special treatment.
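As a rough illustration outside GAMS, the same estimation can be sketched in plain Python. This assumes the two-parameter Pareto (Lomax) form F(x) = 1 - (theta/(x+theta))^alpha, which is consistent with F(0) = 0. The class boundaries, counts, starting point, and the crude pattern search are all hypothetical stand-ins, not the model or solver used in this post.

```python
import math


def pareto_cdf(x, alpha, theta):
    """CDF of the two-parameter (Lomax) Pareto; F(0) = 0 and F(inf) = 1."""
    if math.isinf(x):
        return 1.0
    return 1.0 - (theta / (x + theta)) ** alpha


def grouped_loglik(alpha, theta, bounds, counts):
    """Sum of n_j * ln(F(c_j) - F(c_{j-1})) over the income classes."""
    total = 0.0
    for lo, hi, n in zip(bounds[:-1], bounds[1:], counts):
        p = pareto_cdf(hi, alpha, theta) - pareto_cdf(lo, alpha, theta)
        if p <= 0.0:
            return -math.inf  # class probability vanished numerically
        total += n * math.log(p)
    return total


def fit_pareto(bounds, counts, start=(2.0, 50_000.0), iters=500):
    """Crude pattern search on (ln alpha, ln theta).

    A stand-in for a proper NLP solver; searching in logs keeps both
    parameters positive.
    """
    u = [math.log(start[0]), math.log(start[1])]

    def f(v):
        return grouped_loglik(math.exp(v[0]), math.exp(v[1]), bounds, counts)

    best, step = f(u), 1.0
    for _ in range(iters):
        improved = False
        for i in (0, 1):
            for d in (step, -step):
                trial = list(u)
                trial[i] += d
                val = f(trial)
                if val > best:
                    u, best, improved = trial, val, True
        if not improved:
            step /= 2.0  # no better neighbour: refine the search grid
            if step < 1e-8:
                break
    return math.exp(u[0]), math.exp(u[1]), best


if __name__ == "__main__":
    # Hypothetical income classes (boundaries) and counts, for illustration only.
    bounds = [0.0, 25_000.0, 50_000.0, 100_000.0, 200_000.0, math.inf]
    counts = [300, 350, 250, 80, 20]
    alpha, theta, ll = fit_pareto(bounds, counts)
    # The Lomax mean theta / (alpha - 1) exists only for alpha > 1.
    mean = theta / (alpha - 1.0) if alpha > 1.0 else math.inf
    print(f"alpha={alpha:.4f} theta={theta:.1f} loglik={ll:.3f} mean={mean:.1f}")
```

The pattern search is chosen only to keep the sketch dependency-free; it can be slow when alpha and theta are strongly correlated, so a real solver is the better choice for actual data.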
Note that this approach can also be used to estimate the number of millionaires (i.e. income > 1e6), even though this number is hidden in the last income class.
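A minimal sketch of that calculation: once the parameters are estimated, the expected number of observations above a threshold is n * (1 - F(threshold)). The Lomax form of the Pareto CDF and the parameter values below are assumptions for illustration, not estimates from this post.

```python
import math


def pareto_sf(x, alpha, theta):
    """Survival function 1 - F(x) of the two-parameter (Lomax) Pareto."""
    if math.isinf(x):
        return 0.0
    return (theta / (x + theta)) ** alpha


def expected_count_above(threshold, alpha, theta, n_total):
    """Expected number of observations with income above `threshold`."""
    return n_total * pareto_sf(threshold, alpha, theta)


if __name__ == "__main__":
    # Hypothetical fitted parameters and sample size.
    alpha, theta, n_total = 3.0, 100_000.0, 1000
    print(expected_count_above(1e6, alpha, theta, n_total))
```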
Looking at http://www.jstor.org/stable/1914015 I was a little bit confused by
however, I think we can arrive at the earlier likelihood function. The factor n! can be dropped because it is constant. Taking logs we see:
The last sum can be dropped as it is also constant. (Of course this can also be deduced directly from the product in EQ2).
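Spelled out, and assuming the expression in question is the multinomial likelihood with class probabilities $p_j = F(c_j) - F(c_{j-1})$ and $n = \sum_j n_j$ (this notation is assumed here), the steps are:

```latex
L = n! \prod_{j=1}^{k} \frac{p_j^{\,n_j}}{n_j!}
\qquad\Longrightarrow\qquad
\ln \frac{L}{n!} = \sum_{j=1}^{k} n_j \ln p_j \;-\; \sum_{j=1}^{k} \ln n_j!
```

Neither $n!$ nor the $n_j!$ depend on the distribution parameters, so maximizing $L$ is equivalent to maximizing $\sum_j n_j \ln p_j$, which is the log of the earlier likelihood.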