Monday, March 1, 2010

Statistical Learning Algorithm - Maximum Likelihood Estimation

Maximum Likelihood Estimation (MLE) is a statistical method for estimating the parameters of a statistical model fitted to existing data. The idea behind MLE is simple: given a set of observations, we choose the parameters that maximize the likelihood that the model produces these data.

1. General Ideas
There are two cases to consider: discrete and continuous populations. We only discuss the case where X is a continuous variable; the discrete case is very similar.
Assume X is a continuous variable whose probability density function (PDF) is $f(x; \theta_1, \ldots, \theta_k)$, where $\theta_1, \ldots, \theta_k$ are the unknown parameters, and X1, X2, ..., Xn are samples from the population. X1, X2, ..., Xn are independent and identically distributed (i.i.d.). Thus their joint probability density is $\prod_{i=1}^{n} f(x_i; \theta_1, \ldots, \theta_k)$. When $\theta_1, \ldots, \theta_k$ are fixed values, this is exactly the joint probability density at X1 = x1, X2 = x2, ..., Xn = xn; but when x1, x2, ..., xn are given, it becomes a function of $\theta_1, \ldots, \theta_k$. We call this new function the likelihood function, $L(\theta_1, \ldots, \theta_k) = \prod_{i=1}^{n} f(x_i; \theta_1, \ldots, \theta_k)$. The significance of $L$ is that it represents the likelihood of the given observations: since these are the data we have actually observed, we choose the parameters under which they are most probable, i.e. the parameters that maximize $L(\theta_1, \ldots, \theta_k)$.
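To make the idea of "a function of the parameters" concrete, here is a minimal sketch in Python. It assumes, purely for illustration, an exponential population with density rate * exp(-rate * x); the model, the function names, and the observations are all made up and do not come from any particular data set. The observations stay fixed while the parameter varies.

```python
import math

def exponential_pdf(x, rate):
    """Density f(x; rate) = rate * exp(-rate * x) for x >= 0, else 0."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

def likelihood(samples, rate):
    """Joint density of i.i.d. samples, viewed as a function of rate."""
    result = 1.0
    for x in samples:
        result *= exponential_pdf(x, rate)
    return result

# Hypothetical observations, made up for illustration only.
observations = [0.5, 1.2, 0.3, 2.0, 1.0]

# Same data, different parameter values: the likelihood changes with rate.
for rate in (0.5, 1.0, 2.0):
    print(f"rate={rate}: L={likelihood(observations, rate):.6f}")
```

Among the three candidate values tried here, the one with the largest likelihood is the one that best explains the fixed observations.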

2. How to Compute MLE
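Suppose we now have a group of observations X1, X2, ..., Xn, and we need to estimate the parameters that maximize the likelihood function mentioned above. This is where the name "Maximum Likelihood Estimation" comes from. Write $L(\theta)$ for the likelihood function of the parameters $\theta = (\theta_1, \ldots, \theta_k)$. The log() is a monotonically increasing function, so we have:

$$\hat{\theta} = \arg\max_{\theta} L(\theta) = \arg\max_{\theta} \log L(\theta)$$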

Maximizing L is therefore the same as maximizing log(L). In most cases the latter is much easier, because the logarithm turns the product of densities into a sum. We compute the partial derivatives of log(L) with respect to each of the parameters $\theta_1, \ldots, \theta_k$ and set them to zero. Then we get the following:

$$\frac{\partial \log L(\theta_1, \ldots, \theta_k)}{\partial \theta_j} = 0, \qquad j = 1, 2, \ldots, k$$

This is called the system of likelihood equations. A solution of this system, if it can be verified to maximize L, gives exactly the parameters we are looking for. Sometimes the system has multiple solutions, so further steps are required to determine which solution gives the parameters we want.
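As a small worked example of this procedure, assume (for illustration only) an exponential model with a single parameter, the rate. Then log(L) = n * log(rate) - rate * (x1 + ... + xn), and the single likelihood equation n / rate - (x1 + ... + xn) = 0 has the solution rate = n / (x1 + ... + xn). The second derivative, -n / rate^2, is negative, so this solution is a maximum. The sketch below computes that solution and checks it against a coarse grid; the function names and the data are hypothetical.

```python
import math

def log_likelihood(samples, rate):
    """log L(rate) = n * log(rate) - rate * sum(x) for an exponential model."""
    return len(samples) * math.log(rate) - rate * sum(samples)

def score(samples, rate):
    """Derivative of log L with respect to rate: n / rate - sum(x)."""
    return len(samples) / rate - sum(samples)

def mle_rate(samples):
    """Solution of the likelihood equation score(rate) = 0."""
    return len(samples) / sum(samples)

# Hypothetical observations, made up for illustration only.
observations = [0.5, 1.2, 0.3, 2.0, 1.0]
rate_hat = mle_rate(observations)

# Sanity check: no rate on a coarse grid should beat the closed-form solution.
grid = [0.1 * k for k in range(1, 51)]
best_on_grid = max(grid, key=lambda r: log_likelihood(observations, r))

print("closed-form estimate:", rate_hat)
print("score at estimate:", score(observations, rate_hat))
print("best grid value:", best_on_grid)
```

For models where the likelihood equations have no closed-form solution, the same log-likelihood function could instead be handed to a numerical optimizer, and the verification step becomes correspondingly more important.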
