Workshop on Molecular Evolution  
Centers for Disease Control and Prevention

DNA Substitution Models

Maximum likelihood (ML) methods for developing phylogenetic hypotheses require an explicit model of evolution. The frequently used General Time Reversible (GTR) family of nested models encompasses 64 models with different combinations of parameters for DNA site substitution. The models are listed here from the least complex to the most parameter-rich.

Jukes-Cantor (JC, nst=1): Equal base frequencies, all substitutions equally likely (PAUP* rate classification: aaaaaa, PAML: aaaaaa) (Jukes and Cantor 1969)

Felsenstein 1981 (F81, nst=1): Variable base frequencies, all substitutions equally likely (PAUP*: aaaaaa, PAML: aaaaaa) (Felsenstein 1981)

Kimura 2-parameter (K80, nst=2): Equal base frequencies, variable transition and transversion frequencies (PAUP*: abaaba, PAML: abbbba) (Kimura 1980)

Hasegawa-Kishino-Yano (HKY, nst=2): Variable base frequencies, variable transition and transversion frequencies (PAUP*: abaaba, PAML: abbbba) (Hasegawa et al. 1985)

Tamura-Nei (TrN): Variable base frequencies, equal transversion frequencies, variable transition frequencies (PAUP*: abaaea, PAML: abbbbf) (Tamura and Nei 1993)

Kimura 3-parameter (K3P): Equal base frequencies as originally described (variable in the K81uf variant), equal transition frequencies, variable transversion frequencies (PAUP*: abccba, PAML: abccba) (Kimura 1981)

Transition Model (TIM): Variable base frequencies, variable transitions, transversions in two rate classes (AC = GT and AT = CG) (PAUP*: abccea, PAML: abccbe)

Transversion Model (TVM): Variable base frequencies, variable transversions, transitions equal (PAUP*: abcdbe, PAML: abcdea)

Symmetrical Model (SYM): Equal base frequencies, symmetrical substitution matrix (A to T = T to A) (PAUP*: abcdef, PAML: abcdef) (Zharkikh 1994)

General Time Reversible (GTR, nst=6): Variable base frequencies, symmetrical substitution matrix (PAUP*: abcdef, PAML: abcdef) (e.g., Lanave et al. 1984, Tavaré 1986, Rodriguez et al. 1990)
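
The rate classifications in the list above can be turned into a rate matrix mechanically. The following is a minimal Python sketch, not the code used by PAUP* or PAML: the function names, the example rate values, and the example base frequencies are all illustrative assumptions. It builds an unnormalized time-reversible matrix with entries Q[i][j] = r_ij * pi_j from a PAUP*-style six-letter code, and counts the free substitution parameters the code implies (one relative rate is fixed, and variable base frequencies add three).

```python
from itertools import combinations

BASES = "ACGT"
# PAUP*-style order of the six exchangeabilities: AC, AG, AT, CG, CT, GT
PAIRS = list(combinations(BASES, 2))


def rate_matrix(rclass, rates, freqs):
    """Build an unnormalized 4x4 matrix Q with Q[i][j] = r_ij * pi_j.

    rclass: six-letter code such as "abaaba" (hypothetical input format)
    rates:  dict mapping each letter used in rclass to a relative rate
    freqs:  base frequencies in the order A, C, G, T
    """
    q = [[0.0] * 4 for _ in range(4)]
    for letter, (x, y) in zip(rclass, PAIRS):
        i, j = BASES.index(x), BASES.index(y)
        q[i][j] = rates[letter] * freqs[j]
        q[j][i] = rates[letter] * freqs[i]
    for i in range(4):
        # Diagonal is still 0.0 here, so this is minus the off-diagonal sum.
        q[i][i] = -sum(q[i])
    return q


def free_parameters(rclass, equal_freqs):
    """Count free parameters: distinct rate classes minus one, plus 3 if
    base frequencies are estimated (they sum to 1)."""
    return (len(set(rclass)) - 1) + (0 if equal_freqs else 3)


# Example with made-up values: an HKY-like model, transitions 4x transversions.
hky = rate_matrix("abaaba", {"a": 1.0, "b": 4.0}, [0.1, 0.2, 0.3, 0.4])
print(free_parameters("aaaaaa", True))    # JC
print(free_parameters("abaaba", False))   # HKY
print(free_parameters("abcdef", False))   # GTR
```

Because each off-diagonal entry is a symmetric rate multiplied by the frequency of the target base, pi_i * Q[i][j] equals pi_j * Q[j][i] for every pair, which is the time-reversibility property shared by the whole family.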

In addition to models describing the rates of change from one nucleotide to another, there are models to describe rate variation among sites in a sequence. The following are the two most commonly used models.

Gamma Distribution (G): Gamma distributed rate variation among sites

Proportion of Invariable Sites (I): Extent of static, unchanging sites in a dataset
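
As an illustration of how these two components are typically combined, here is a hedged Python sketch. It approximates equal-probability gamma rate categories by Monte Carlo sampling, which is a simplification chosen to stay within the standard library and is not the quantile-based calculation phylogenetics programs use. The function names, the default number of categories, and the `lik_at_rate` callable are hypothetical.

```python
import random


def discrete_gamma_rates(alpha, categories=4, draws=100_000, seed=1):
    """Approximate mean rates of equal-probability gamma categories.

    Samples a gamma distribution with shape alpha and scale 1/alpha
    (mean 1), sorts the draws, splits them into equal-sized bins, and
    returns the bin means rescaled so that they average exactly 1.
    """
    rng = random.Random(seed)
    sample = sorted(rng.gammavariate(alpha, 1.0 / alpha) for _ in range(draws))
    size = draws // categories
    rates = [sum(sample[k * size:(k + 1) * size]) / size
             for k in range(categories)]
    mean = sum(rates) / categories
    return [r / mean for r in rates]


def site_likelihood(lik_at_rate, rates, p_invariable, lik_if_invariable):
    """Mix an invariable-site class with equally weighted rate categories.

    lik_at_rate:       hypothetical callable giving the site likelihood
                       when the site evolves at relative rate r
    lik_if_invariable: likelihood of the site if it cannot change
                       (0.0 for a site that shows any variation)
    """
    variable = sum(lik_at_rate(r) for r in rates) / len(rates)
    return p_invariable * lik_if_invariable + (1.0 - p_invariable) * variable


rates = discrete_gamma_rates(alpha=0.5)
print(rates)  # small alpha: most sites slow, a few fast
```

A small shape parameter concentrates most categories near zero with one fast category, while a large shape parameter pulls all categories toward 1, which corresponds to little rate variation among sites.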


Substitution types are themselves grouped hierarchically: a single general substitution class, transitions versus transversions, purine-to-purine versus pyrimidine-to-pyrimidine transitions, and AC/GT versus AT/CG transversions. The groupings are symbolized as rate classifications according to the PAUP* and PAML matrices below. Substitution types that are constrained to be equal in rate take the letter of the leftmost member of their group.

PAUP* Substitution Rate Matrix     PAML Substitution Rate Matrix
    A  C  G  T                         T  C  A  G
A   -  a  b  c                     T   -  a  b  c
C      -  d  e                     C      -  d  e
G         -  f=1                   A         -  f=1
T            -                     G            -
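
Because the two programs order the bases differently (A C G T versus T C A G), the same model can have different six-letter codes in each. The following Python sketch shows one way to translate a PAUP*-ordered code into the PAML ordering. It is an illustration under the assumption that letters are assigned by leftmost position, as described above; `paup_to_paml` is a hypothetical helper, not a function of either program.

```python
# Pair order read row by row from the upper triangle of each matrix.
PAUP_ORDER = ["AC", "AG", "AT", "CG", "CT", "GT"]
PAML_ORDER = ["TC", "TA", "TG", "CA", "CG", "AG"]


def paup_to_paml(code):
    """Re-express a PAUP*-ordered rate classification in PAML order."""
    # Map each unordered base pair to its rate class in the input code.
    cls = {frozenset(pair): letter for pair, letter in zip(PAUP_ORDER, code)}
    first_seen = {}
    out = []
    for pos, pair in enumerate(PAML_ORDER):
        c = cls[frozenset(pair)]
        # A class takes the letter of the leftmost position where it occurs.
        first_seen.setdefault(c, "abcdef"[pos])
        out.append(first_seen[c])
    return "".join(out)


for name, code in [("K80", "abaaba"), ("TrN", "abaaea"), ("K3P", "abccba")]:
    print(name, code, "->", paup_to_paml(code))
```

The letters are only labels: two codes describe the same model whenever they group the same substitution types together, so a converted string may use different letters from a published one while encoding the same constraints.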

For each model listed above with variable base frequencies, Modeltest 3.7 also evaluates the special case with equal base frequencies.


Maintained by Adam Bazinet
Direct questions and comments to Michael Cummings