Wednesday, May 12, 2010

Max likelihood estimation with income classes

Typically income data is presented in income classes:

 Income Class Number of Observations 0-\$20000 1036 \$20000-\$40000 495 \$40000-\$60000 201 \$60000-\$80000 102 \$80000+ 38

In order to find the mean income based on such data we can fit e.g. a Pareto Distribution. One way of doing this is with a max likelihood estimation procedure. From http://www.math.uconn.edu/~valdez/math3632s10/M3632Week10-S2010.pdf we have:

(EQ.1)

This is easy to code in GAMS, using a Pareto distribution:

 set j 'bins' /b1*b5/; table bin(j,*)         lo     up   observations b1       0  20000      1036 b2   20000  40000       495 b3   40000  60000       201 b4   60000  80000       102 b5   80000    INF        38 ; parameters n(j),lo(j),up(j); lo(j) = bin(j,'lo'); up(j) = bin(j,'up'); n(j) = bin(j,'observations'); * Pareto distribution * See: Evans, Hastings, Peacock, "Statistical Distributions", 3rd ed, Wiley 2000 * find mean income using MLE variables a,c,L; equation loglik; a.lo=1; c.lo=1; a.l = 100; c.l = 1; loglik.. L =e= sum(j, n(j) * log[ (1-(a/up(j))**c)\$(ord(j)1) - 0\$(ord(j)=1)] ); option optcr=0; model mle /all/; parameter mean; solve mle maximizing L using nlp; mean = c.l*a.l/(c.l-1); display mean;

We use here that F(0) = 0 and F(INF) = 1 for the CDF of the Pareto distribution.

Note this approach can be also used to estimate the number of millionaires (ie income > 1e6) even though this number is hidden in the last income class.

Looking at http://www.jstor.org/stable/1914015 I was a little bit confused by

(EQ.2)

however I think we can arrive at the earlier likelihood function. The factor n! can be dropped as it is constant. Taking logs we see:

(EQ.3)

The last sum can be dropped as it is also constant. (Of course this can also be deducted directly from the product in EQ2).