Likelihood Ratio Test

The likelihood ratio test (LRT) is a statistical test of the goodness-of-fit between two models. A relatively more complex model is compared to a simpler model to see if it fits a particular dataset significantly better. If so, the additional parameters of the more complex model are often used in subsequent analyses. The LRT is only valid when used to compare hierarchically nested models. That is, the more complex model must differ from the simpler model only by the addition of one or more parameters. Adding parameters will never lower the maximized likelihood score and will almost always raise it. However, there comes a point when adding parameters is no longer justified by a significant improvement in the fit of a model to a particular dataset. The LRT provides one objective criterion for selecting among possible models.

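The LRT begins with a comparison of the likelihood scores of the two models, where lnL1 is the maximized log-likelihood of the more complex model and lnL2 is that of the simpler model (when a program reports -lnL instead, as in the examples on this page, subtract the complex model's score from the simpler model's score):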

LR = 2*(lnL1-lnL2)

This LRT statistic approximately follows a chi-square distribution. To determine whether the difference in likelihood scores between the two models is statistically significant, we must next consider the degrees of freedom. In the LRT, the degrees of freedom equal the number of additional parameters in the more complex model. Using this information, we can determine the critical value of the test statistic from standard statistical tables. The simpler model is rejected if LR exceeds the critical value.
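The calculation described above can be sketched in a few lines of Python. This is only a minimal illustration, not the procedure of any particular phylogenetics package: the function names (chi2_sf, likelihood_ratio_test) and the log-likelihood values in the usage lines are hypothetical, and the chi-square tail probability is computed with a standard series expansion of the incomplete gamma function so that nothing beyond the standard library is needed.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Series expansion of the regularized lower incomplete gamma function P(a, z):
    # P(a, z) = z^a * exp(-z) / Gamma(a) * sum_n z^n / (a * (a+1) * ... * (a+n))
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def likelihood_ratio_test(lnL_complex, lnL_simple, extra_params):
    """Return (LR statistic, approximate p-value) for two nested models.

    lnL_complex and lnL_simple are maximized log-likelihoods (lnL, not -lnL);
    extra_params is the number of parameters the complex model adds.
    """
    lr = 2.0 * (lnL_complex - lnL_simple)
    return lr, chi2_sf(lr, extra_params)

# Hypothetical scores, chosen only to illustrate the call.
lr, p = likelihood_ratio_test(-1000.0, -1003.0, extra_params=2)
print(lr, p)
```

A p-value below the chosen significance level (for example 0.05) corresponds to LR exceeding the tabulated critical value, so the two ways of reading the test lead to the same decision.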

The LRT is explained in more detail by Felsenstein (1981), Huelsenbeck and Crandall (1997), Huelsenbeck and Rannala (1997), and Swofford et al. (1996). While the focus of this page is using the LRT to compare two competing models, under some circumstances one can compare two competing trees estimated using the same likelihood model. Comparing trees involves many additional considerations (e.g., see Kishino and Hasegawa 1989, Shimodaira and Hasegawa 1999, and Swofford et al. 1996).


Example 1 - Comparing Likelihood Models:
Consider the HKY85 and GTR models. The GTR model differs from HKY85 by the addition of four rate parameters (see DNA substitution models). These models are therefore hierarchically nested, which is an essential requirement of the LRT. Imagine calculating the likelihood scores of the two models on a simple neighbor-joining tree:

HKY85   -lnL = 1787.08
GTR     -lnL = 1784.82


LR = 2 (1787.08 - 1784.82) = 4.52

degrees of freedom = 4 (GTR adds 4 additional parameters to HKY85)

critical value (P = 0.05) = 9.49

In this case, LR is smaller than the critical value, so GTR does not fit the data significantly better than HKY85, and we infer that the four additional rate parameters are not needed to describe these data (given our power to detect such differences). Following this simple example, one could test a whole series of nested substitution models to determine the best model for a given dataset.
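As a rough check of Example 1, the critical value can be computed rather than looked up in a table. The sketch below is an illustration under stated assumptions: the helper names (chi2_sf_even, critical_value) are hypothetical, and the closed-form tail probability it uses holds only for an even number of degrees of freedom, which happens to cover the 4 degrees of freedom in this example.

```python
import math

def chi2_sf_even(x, df):
    """P(X > x) for a chi-square variable with an even, positive df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for even, positive df")
    half = x / 2.0
    # Poisson-sum identity: exp(-x/2) * sum over j < df/2 of (x/2)^j / j!
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))

def critical_value(alpha, df, lo=0.0, hi=200.0):
    """Bisection search for the x whose upper-tail probability equals alpha."""
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf_even(mid, df) > alpha:
            lo = mid   # tail still too heavy: the critical value lies further right
        else:
            hi = mid
    return (lo + hi) / 2.0

# -lnL scores from Example 1 (simpler model minus more complex model).
neg_lnL_hky85 = 1787.08
neg_lnL_gtr = 1784.82
lr = 2.0 * (neg_lnL_hky85 - neg_lnL_gtr)
crit = critical_value(0.05, 4)
p_value = chi2_sf_even(lr, 4)
print(round(lr, 2), round(crit, 2), round(p_value, 2))  # 4.52 9.49 0.34
```

With the rounded scores listed in the example, the statistic comes out at about 4.52, well below the critical value of 9.49, which agrees with the conclusion that GTR is not a significantly better fit.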

Example 2 - Testing the Molecular Clock:
Let's say you are interested in whether a DNA segment evolves at a homogeneous rate along all branches in a phylogeny. That is, you want to test whether the assumption of a molecular clock is a valid one. Since a molecular clock only allows a single rate, it is the simpler, hierarchically nested, null model. In testing a molecular clock, the degrees of freedom work out to be s-2, where s is the number of taxa in the phylogeny (Felsenstein 1981). Here, after determining the best likelihood model (similar to Example 1 above), we calculate the likelihood scores for a 5-taxon phylogeny with and without a molecular clock:

HKY85 + clock   -lnL = 7573.81
HKY85           -lnL = 7568.56


LR = 2 (7573.81 - 7568.56) = 10.50

degrees of freedom = s-2 = 5-2 = 3

critical value (P = 0.05) = 7.81

Because the LR statistic exceeds the critical value, the null hypothesis, that the rate of evolution is homogeneous among all branches in the phylogeny, is rejected. Rates of substitution vary significantly among branches, and a molecular clock is inappropriate for these data.
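The s-2 degrees of freedom can be understood by counting free branch-length parameters: an unrooted, fully bifurcating tree of s taxa has 2s-3 branch lengths, while a clock-constrained rooted tree is described by the s-1 heights of its internal nodes, leaving a difference of s-2. The sketch below works through Example 2 on that basis. It assumes a bifurcating tree with all tips sampled at the same time, the function names are hypothetical, and the tail-probability formula is the closed form specific to 3 degrees of freedom.

```python
import math

def clock_test_df(n_taxa):
    """Degrees of freedom for the molecular clock LRT on a bifurcating tree."""
    unconstrained = 2 * n_taxa - 3   # branch lengths of an unrooted bifurcating tree
    clock = n_taxa - 1               # internal node heights of a rooted clock tree
    return unconstrained - clock     # simplifies to n_taxa - 2

def chi2_sf_df3(x):
    """P(X > x) for a chi-square variable with exactly 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)

# -lnL scores from Example 2 (clock model minus unconstrained model).
df = clock_test_df(5)
lr = 2.0 * (7573.81 - 7568.56)
p_value = chi2_sf_df3(lr)
print(df, round(lr, 2), p_value < 0.05)  # 3 10.5 True
```

The p-value works out to roughly 0.015, which is consistent with rejecting the clock at the 0.05 level.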

A web site to determine the critical values for the chi-square distribution is available here.