A selected list of papers on codon models
The papers that started it all
Goldman, N., and Z. Yang. 1994. A codon-based model of nucleotide
substitution for protein-coding DNA sequences. Mol. Biol. Evol. 11:725-736.
Muse, S. V. and B. S. Gaut. 1994. A likelihood approach for comparing
synonymous and nonsynonymous nucleotide substitution rates, with application to
the chloroplast genome. Mol. Biol. Evol. 11:715-724.
Models (lots of them)
Models for variable selection pressure among sites:
Bao L, Gu H, Dunn KA, Bielawski JP. 2008. Likelihood Based Clustering (LiBaC)
for Codon Models, a method for grouping sites according to similarities in the
underlying process of evolution. Mol Biol Evol. Jun 26. [Epub ahead of print]
Bao L, Gu H, Dunn KA, Bielawski JP. 2007. Methods for selecting fixed-effect
models for heterogeneous codon evolution, with comments on their application to
gene and genome data. BMC Evol Biol. 7 Suppl 1:S5.
Mayrose I, Doron-Faigenboim A, Bacharach E, Pupko T. 2007. Towards realistic
codon models: among site variability and dependency of synonymous and
non-synonymous rates. Bioinformatics. 23(13):i319-27.
Huelsenbeck JP, Jain S, Frost SW, Pond SL. 2006. A
Dirichlet process model for detecting positive selection in protein-coding DNA
sequences. Proc Natl Acad Sci U S A. 103(16):6263-6268.
Wilson DJ, McVean G. 2006. Estimating diversifying
selection and functional constraint in the presence of recombination. Genetics.
172(3):1411-1425.
Kosakovsky Pond SL, Muse SV. 2005. Site-to-site variation of
synonymous substitution rates. Mol Biol Evol. 22(12):2375-85.
Massingham T, Goldman N. 2005. Detecting amino acid sites under positive
selection and purifying selection. Genetics.
169(3):1753-1762.
Huelsenbeck JP, Dyer KA. 2004. Bayesian estimation of positively selected
sites. J Mol Evol. 58:661-672.
Yang, Z., and W. J. Swanson. 2002. Codon-substitution models to detect
adaptive evolution that account for heterogeneous selective pressures among site
classes. Mol. Biol. Evol. 19:49-57.
Yang, Z., R. Nielsen, N. Goldman, and A.-M. K. Pedersen. 2000.
Codon-substitution models for heterogeneous selection pressure at amino acid
sites. Genetics 155:431-449.
Nielsen, R. and Z. Yang. 1998. Likelihood models for detecting positively
selected amino acid sites and applications to the HIV-1 envelope gene. Genetics
148:929-936.
Models for variation in selection pressure among lineages:
Kosakovsky Pond SL, Frost SD. 2005. A genetic algorithm approach to detecting
lineage-specific variation in selection pressure. Mol Biol Evol.
22(3):478-485.
Seo TK, Kishino H, Thorne JL. 2004. Estimating
absolute rates of synonymous and nonsynonymous nucleotide substitution in order
to characterize natural selection and date species divergences. Mol Biol Evol.
21(7):1201-1213.
Bielawski, J. P. and Z. Yang. 2003. Maximum likelihood methods for detecting
adaptive evolution after gene duplication. Journal of Structural and Functional
Genomics, 3:201-212.
Yang, Z. 1998. Likelihood ratio tests for detecting positive selection and
application to primate lysozyme evolution. Mol. Biol. Evol. 15:568-573.
Models for variable selection pressure among sites & lineages:
Zhang J, Nielsen R, Yang Z. 2005. Evaluation of an
improved branch-site likelihood method for detecting positive selection at the
molecular level. Mol Biol Evol. 22(12):2472-2479.
Bielawski, J. P. and Z. Yang. 2004. A maximum likelihood method for detecting
functional divergence at individual codon sites, with application to gene family
evolution. Journal of Molecular Evolution, 59:121-132.
Guindon S, Rodrigo AG, Dyer KA, Huelsenbeck JP. 2004. Modeling the
site-specific variation of selection patterns along lineages. Proc Natl Acad Sci
U S A. 101:12957-12962.
Forsberg R, Christiansen FB. 2003. A codon-based model of host-specific
selection in parasites, with an application to the influenza A virus. Mol Biol
Evol. 20(8):1252-1259.
Yang Z, Nielsen R. 2002. Codon-substitution models for detecting molecular
adaptation at individual sites along specific lineages. Mol Biol Evol.
19:908-917.
Yet more models:
Kosakovsky Pond SL, Poon AF, Leigh Brown AJ, Frost SD. 2008. A Maximum
Likelihood Method for Detecting Directional Evolution in Protein Sequences and
its Application to Influenza A Virus. Mol Biol Evol. 2008 May 29. [Epub ahead
of print]
Yang Z, Nielsen R. 2008. Mutation-selection models
of codon substitution and their use to estimate selective strengths on codon
usage. Mol Biol Evol. 25(3):568-579.
Doron-Faigenboim A, Pupko T. 2007. A combined
empirical and mechanistic codon model. Mol Biol Evol. 24(2):388-397.
Kosiol C, Holmes I, Goldman N. 2007. An empirical
codon model for protein sequence evolution. Mol Biol Evol. 24(7):1464-1479.
Seoighe C, Ketwaroo F, Pillay V, Scheffler K, Wood N, Duffet R, Zvelebil M,
Martinson N, McIntyre J, Morris L, Hide W. 2007. A
model of directional selection applied to the evolution of drug resistance in
HIV-1. Mol Biol Evol. 24(4):1025-1031.
Wong WS, Sainudiin R, Nielsen R. 2006.
Identification of physicochemical selective pressure on protein encoding
nucleotide sequences. BMC Bioinformatics. 7:148.
Sainudiin R, Wong WS, Yogeeswaran K, Nasrallah JB, Yang Z, Nielsen R.
2005. Detecting site-specific physicochemical
selective pressures: applications to the Class I HLA of the human major
histocompatibility complex and the SRK of the plant sporophytic
self-incompatibility system. J Mol Evol. 60(3):315-26.
Schneider A, Cannarozzi GM, Gonnet GH. 2005.
Empirical codon substitution matrix. BMC Bioinformatics. 6:134.
Yang, Z., R. Nielsen, and M. Hasegawa. 1998. Models of amino acid
substitution and applications to mitochondrial protein evolution. Molecular
Biology and Evolution 15:1600-1611.
The problem of rate estimation (dS and dN) and comparison of rates
Yang Z, Nielsen R. 2008. Mutation-selection models
of codon substitution and their use to estimate selective strengths on codon
usage. Mol Biol Evol. 25(3):568-579.
Aris-Brosou S, Bielawski JP. 2006. Large-scale analyses of synonymous
substitution rates can be sensitive to assumptions about the process of
mutation. Gene. 378:58-64.
Chapter 2 in Yang, Z. 2006. Computational Molecular Evolution. Oxford
University Press, Oxford, England.
Bierne N, Eyre-Walker A. 2003. The problem of counting sites in the
estimation of the synonymous and nonsynonymous substitution rates: implications
for the correlation between the synonymous substitution rate and codon usage
bias. Genetics. 165:1587-1597.
Bielawski, J. P., K. A. Dunn, and Z. Yang. 2000. Rates of nucleotide
substitution and mammalian nuclear gene evolution: approximate and
maximum-likelihood methods lead to different conclusions. Genetics.
156:1299-1308.
Yang, Z., and R. Nielsen. 2000. Estimating synonymous and nonsynonymous
substitution rates under realistic evolutionary models. Mol. Biol. Evol.
17:32-43.
Muse, S. V. 1996. Estimating synonymous and non-synonymous substitution
rates. Mol. Biol. Evol. 13:105-114.
Statistical tests and the identification of positively selected sites
Bao L, Gu H, Dunn KA, Bielawski JP. 2008. Likelihood Based Clustering (LiBaC)
for Codon Models, a method for grouping sites according to similarities in the
underlying process of evolution. Mol Biol Evol. Jun 26. [Epub ahead of print]
Anisimova M, Yang Z. 2007. Multiple hypothesis testing to detect lineages under
positive selection that affects only a few sites. Mol Biol Evol. 24(5):1219-1228.
Bao L, Gu H, Dunn KA, Bielawski JP. 2007. Methods for selecting fixed-effect
models for heterogeneous codon evolution, with comments on their application to
gene and genome data. BMC Evol Biol. 7 Suppl 1:S5.
Aris-Brosou S. 2006. Identifying sites under positive selection with uncertain
parameter estimates. Genome. 49(7):767-776.
Scheffler K, Martin DP, Seoighe C. 2006. Robust
inference of positive selection from recombining coding sequences.
Bioinformatics. 22(20):2493-2499.
Yang Z. 2006. On the varied pattern of evolution of
2 fungal genomes: a critique of Hughes and Friedman. Mol Biol Evol.
23(12):2279-2282.
Scheffler K, Seoighe C. 2005. A Bayesian model
comparison approach to inferring positive selection. Mol Biol Evol.
22(12):2531-2540.
Kosakovsky Pond SL and Frost SDW. 2005. Not so different after all: a
comparison of methods for detecting amino acid sites under selection. Mol. Biol.
Evol. 22:1208-1222.
Yang Z, Wong WS, Nielsen R. 2005. Bayes empirical
Bayes inference of amino acid sites under positive selection. Mol Biol Evol.
22(4):1107-1118.
Suzuki Y. 2004. New methods for detecting positive
selection at single amino acid sites. J Mol Evol.
59(1):11-19.
Wong WS, Yang Z, Goldman N, Nielsen R. 2004. Accuracy and power of
statistical methods for detecting adaptive evolution in protein coding sequences
and for identifying positively selected sites. Genetics. 168:1041-1051.
Anisimova M, Nielsen R, Yang Z. 2003. Effect of recombination on the accuracy
of the likelihood method for detecting positive selection at amino acid sites.
Genetics. 164:1229-1236.
Shriner D, Nickle DC, Jensen MA, Mullins JI. 2003.
Potential impact of recombination on sitewise approaches for detecting positive
natural selection. Genet Res. 81(2):115-121.
[abstract only]
Anisimova, M., J. P. Bielawski, and Z. Yang. 2002. Accuracy and power of
Bayes prediction of amino acid sites under positive selection. Molecular Biology
and Evolution, 19:950-958.
Anisimova, M., J. P. Bielawski, and Z. Yang. 2001. Accuracy and power of
likelihood ratio test to detect adaptive molecular evolution. Molecular Biology
and Evolution. 18(8):1585-1592.
Codon models in phylogeny reconstruction
Seo TK, Kishino H. 2008. Synonymous substitutions
substantially improve evolutionary inference from highly diverged proteins. Syst
Biol. 57(3):367-377.
Inagaki Y, Roger AJ. 2006. Phylogenetic estimation
under codon models can be biased by codon usage heterogeneity. Mol Phylogenet
Evol. 40(2):428-434.
Shapiro B, Rambaut A, Drummond AJ. 2006. Choosing
appropriate substitution models for the phylogenetic analysis of protein-coding
sequences. Mol Biol Evol. 23(1):7-9.
Ren, F., H. Tanaka, and Z. Yang. 2005. An empirical examination of the
utility of codon-substitution models in phylogeny reconstruction. Syst. Biol.
54: 808-818.
Reviews and Commentaries
Kosakovsky Pond SL, Poon AF, Zárate S, Smith DM, Little SJ, Pillai SK, Ellis
RJ, Wong JK, Leigh Brown AJ, Richman DD, Frost SD. 2008.
Estimating selection pressures on HIV-1 using phylogenetic likelihood
models. Stat Med. 2008 Apr 1. [Epub ahead of print]
Anisimova M, Liberles DA. 2007. The quest for
natural selection in the age of comparative genomics. Heredity.
99(6):567-579.
Nielsen R, Hellmann I, Hubisz M, Bustamante C,
Clark AG. 2007. Recent and ongoing selection in the
human genome. Nat Rev Genet. 8(11):857-868.
Bielawski, J. P., and Z. Yang. 2005. Maximum likelihood methods for detecting
adaptive protein evolution, in R. Nielsen (ed.), Statistical Methods in
Molecular Evolution, Springer-Verlag, New York.
Yang, Z. 2005. The power of phylogenetic comparison in revealing protein
function. PNAS 102:3179-3180.
Yang, Z. 2002. Inference of selection from multiple species alignments.
Current Opinion in Genetics and Development 12: 688-694.
Yang, Z. and J. P. Bielawski. 2000. Statistical tests of adaptive molecular
evolution. Trends in Ecology and Evolution, 15:496-502.
Other papers cited in the lecture or in the lab
Anisimova M, Bielawski J, Dunn K, Yang Z. 2007. Phylogenomic analysis of natural
selection pressure in Streptococcus genomes. BMC Evol Biol. 7:154.
Aguileta G, Bielawski JP, Yang Z. 2004. Gene conversion and functional
divergence in the beta-globin gene family. J. Mol. Evol. 59:177-189.
Bielawski JP, Dunn KA, Sabehi G, Beja O. 2004. Darwinian adaptation of
proteorhodopsin to different light intensities in the marine environment. Proc
Natl Acad Sci U S A. 101:14824-14829.
Nielsen R, Yang Z. 2003. Estimating the distribution of selection
coefficients from phylogenetic data with applications to mitochondrial and viral
DNA. Mol Biol Evol. 20:1231-1239.
Yang, W., J. P. Bielawski, and Z. Yang. 2003. Widespread adaptive evolution
in the human immunodeficiency virus type-1 genome. J. Mol. Evol. 57:212-221.
Schadt E, Lange K. 2002. Codon and rate variation
models in molecular phylogeny. Mol. Biol. Evol. 19(9):1534-49.
Schadt EE, Sinsheimer JS, Lange K. 2002. Applications of codon and rate
variation models in molecular phylogeny. Mol. Biol. Evol. 19(9):1550-1562.
Bielawski, J. P. and Z. Yang. 2001. The role of selection in the evolution of
the DAZ gene family. Mol. Biol. Evol. 18:523-529.
Dunn, K. A., J. P. Bielawski, and Z. Yang. 2001. Rates and patterns of
synonymous substitutions in Drosophila: implications for translational
selection. Genetics. 157:295-305.