Tutorial: Comparison of gene flow models using Bayes Factors
Peter Beerli, July 2010
Table of contents
- Expected learning outcome
- Comparison of migration models in a nutshell
- Introduction: Description of the data
- Exercise 1: Improving a MIGRATE-N analysis
- Exercise 2: Reducing the number of population in the dataset
- Exercise 3: Defining a particular migration model
- Exercise 4: Comparison of models using marginal likelihoods and Bayes factors
- Final thoughts
Expected learning outcome
The objective of this activity is to help you understand the estimation of population genetic parameters and the evaluation of different population models using the program MIGRATE-N. You will learn how to diagnose whether an analysis is successful. You will also learn how to describe and format migration models. You will learn how to compare different population models and rank-order them according to their marginal likelihood and calculate Bayes factors.
Comparison of migration models in a nutshell
Most are familiar with the concept of likelihood ratio tests, or Akaike’s Information criterion for model comparison. This tutorial describes how to compare models using Bayes Factors. These allow comparing nested and un-nested models, without assuming Normality, or large samples. Bayes factors are ratios of marginal likelihoods. In contrast to maximum likelihood, the marginal likelihood is the integral of the likelihood function over the complete parameter range. MIGRATE-N can calculate such marginal likelihoods for a particular migration model (Beerli and Palczewski 2010). This tutorial steps through all necessary program runs to calculate Bayes factors for comparing different gene flow models. We need to do following:
- Decide on the models that are interesting for a comparison. The method does not work well for a fishing expedition where one would try to evaluate all models except for a small population model. It will be possible to enumerate all models for three populations but more will be very daunting.
- Run each model through MIGRATE-N. Use the same prior settings for each of them because the prior distribution has some influence on the Bayes factors. Use the heating menu to allow for at least four heated chains, use the # menu suggestion for best results, so that the temperatures are spaced so that the inverse of the temperature are regularly spaced on the interval 0 to 1. For example the the 4 different chains have temperatures 1.0, 1.5, 3.0, 100,000.0, this results in the spacing 1.0, 0.666, 0.333, and 0.0.
- Compare the marginal likelihood of the different runs and calculate the Bayes factor and calculate the probability for each model.
Introduction: Description of the data
The following pages detail all steps using a small example. We use a simulated dataset that was generated using parameters that force a direction of migration from the population Aadorf (A) to the population Bern (B). The Bern population is 10x larger than the Aadorf population and no individual from Bern ever goes to Aadorf, but Bern receives about 1 migrant per generation from Aadorf. The dataset name is twoswisstowns. We will evaluate 4 models: (1) a full model with two population sizes and two migration rates (from A to B and from B to A); (2) a model with two population sizes and one migration rate to Bern; (3) a model with two population sizes and one migration rate to Aadorf; (4) a model where Aadorf and Bern are part of the same panmictic population. we know the truth therefore we have some prejudice about the ranking of the models, model 2 should be best, model 1, because it allows the same migration direction as model 2 should be ranked second. Whether model 3 is better than model 4 is unknown a priori and may depend on the strength of the data. First we need to figure out how to run the dataset efficiently in MIGRATE. For that we pick the most complicated model 1 and experiment with run conditions until we are satisfied that the run converges and delivers posterior distributions that look acceptable.
A reminder: if you want to keep results (they are written by default into a file called outfile and outfile.pdf), save them using cp outfile yourfavoredfilename or change the name of the output file in the menu.
Exercise 1: Improving a MIGRATE-N analysis
- Make sure that there is no file called parmfile in the directory you want to run our experiment.
- Move or copy the data file twoswisstowns into the directory you want to run our experiment.
- Start the program MIGRATE-N (I will call it from now on simply MIGRATE). In the Input/Output formats menu change the Datafile name to twoswisstowns, return to the main menu.
- In the Search strategy menu change the strategy from the default (likelihood) to Bayesian inference. Change the Number of recorded steps in chain to 1000. Do not worry about priors or other runtime options for the moment. Return to the main menu.
- Save the changes by using the menu item Write a parmfile.
- Now run the program (pressing Y will start the run if you are in the main menu). For this dataset the runtime will be very short on a modern computer, if this takes more than 1 minute something is not set up correctly. On my computer this takes 5.2 seconds.
- The program writes considerable information during the run to the screen, that gives some information about the run. Most interesting are the acceptance ratio for the genealogy and the autocorrelations of the parameter and the genealogy. The default acceptance method for parameters is Slice sampling; the acceptance ratio of Slice sampling is always 1.0. If the autocorrelation is high and the effective sample size is low (<500) then a longer run may be needed. If the priors boundaries are too tight, then you will see that the values reported are either very close or exactly at the upper prior boundary, in these cases you need to extend the prior range. However for this dataset we will have no such problems.
- Look at the outfile.pdf, you may need a PDF viewer like acroread or Preview.app (on Macintosh computers use open outfile.pdf. In the outputfile figures labeled Bayesian analysis: Posterior distribution, you see histograms similar to the ones in Figure 1. We expect single peaks where the shading of the histogram shows one dark block in the center (50% credibility set), two light gray bars indicating the extent of the 95% credibility set, and two lighter gray bars indicating the 99% credibility set. On the last pages of the output file you will find the effective sample sizes (ESS), autocorrelation and acceptance ratio tables. These are important tools to judge whether your analysis was successful. If the autocorrelation rates are high you certainly will need to run the program for much longer time, If the ESS are low (<500) then most likely not enough independent samples where used and the analysis will be still biased by the start conditions. The LAMARC activity contains a good discussion on how to recognize problems in the analysis.
Figure 1. Example output from a very short MIGRATE run, showing problems with convergence for all parameters. Θ2 may have still some problems with a too narrow prior distribution.
- In your investigation of Figure 1 you recognize that the histogram does not look very smooth because our run was too short, now restart MIGRATE and set in the strategy menu the setting for change the number of recorded steps in chain from 1000 to 10000. This will lengthen the run by a factor of 10. Don't forget to write the parmfile to save the settings. Run and compare the results (Figure 2) with the histogram from before. You will recognize that the longer run has somewhat smoother histograms, and the double peaks vanish (hopefully). With your own data you may want to do another round of refinements, but eventually, by comparing the medians and means of the parameters in the table and the shape of the histograms you should see a good agreement on similar values, if the modes of the different runs are not within the 50% credibility intervals you certainly need to run longer.
Figure 1. Example output from a short MIGRATE run, showing potential convergence for all parameters except for Θ2 that may have some problems with a too narrow prior distribution.
- Let us assume that our runs are all satisfactory. We turn now to the best estimation of the marginal likelihood to compare models. Because we want to use the thermodynamic integration method, we need to turn on heating. Start MIGRATE-N, use the strategy menu and turn on heating, use static heating, which is the default setting, type Y and press enter. MIGRATE-N will tell what to do next, you will need to enter 4 chains sampling at every tenth (10) interval using the temperature scheme that is suggested with the character #. Save the parmfile, and run. This will take about 4x longer than before. It should give a better posterior distribution histogram. On my computer this takes about 5 minutes.
- MIGRATE-N calculates not only the posterior distributions of parameters, but also allows comparing different models, for this reason it calculates the marginal likelihood of the population model that was used. For example our first two runs used the default model in MIGRATE-N, this model allows different migration rates among each population and also estimates the individual population sizes. You can find the marginal likelihood towards the end of the outfile.pdf, in the Table Log-Probability of the data given the model. With heating, MIGRATE-N reports three quantities in three columns. Each column in the table is a different approximation of the marginal likelihood. The first two column calculate the marginal likelihood using thermodynamic integration (Beerli and Palczewski, 2010; Lartillot and Philippe, 2006), and the last column uses a harmonic mean estimator (Kass and Raftery, 1995). The first column uses a simple trapezoidal rule to do the thermodynamic integration, whereas the second column uses a modified trapezoidal rule that takes into account the general shapes of the marginal likelihood curve; it uses a Bezier curve to improve the integral (Beerli and Palczewski, 2010). Wit only few numbers of heated chains to calculate the marginal likelihood, the Bezier approximation works best. The harmonic mean has been proven to be often incorrect (Paul Lewis' paper; Beerli and Palczewski, 2010; we will see this with this exercise).
Come to the front and write down the log marginal likelihood into the spreadsheet. You will need the numbers from the row labeled All, in the table there are three columns, report the values for the Bezier approximation and the harmonic mean method. (This was our first model, we will compare the different models at the end of this exercise: my log marginal likelihood values for the Bezier approximated score and the Harmonic mean are -4862.85 and -4791.29, respectively.
- Copy the parmfile to parmfile.4param, copy outfile to outfile.4param, and copy outfile.pdf to outfile.4param.pdf
Exercise 2: Reducing the number of populations in the dataset
- We start now to work on those other models. We pick the easiest first: model 4. Start MIGRATE-N and choose the menu Parameter settings. Choose the entry about sampling localities. We want to use the data as if we would have sampled a single population, therefore we need to claim that the two locations Aadorf and Bern belong to the same panmictic population, this is done by telling MIGRATE-N that the dataset has two locations (2), and that they are in the same population by using 1 1. With multiple populations more complicated combinations are possible. Run MIGRATE-N, check the histogram, if it looks OK, come to the front and write down the log marginal likelihoods (again the row labeled All, Bezier and Harmonic score) into the spreadsheet under model 4. My run took 183 seconds and delivered these log marginal likelihoods -4887.25 and -4803.22.
- Copy the parmfile to parmfile.1param, copy outfile to outfile.1param, and copy outfile.pdf to outfile.1param.pdf
Exercise 3: Defining a particular migration model
- Copy the parmfile.4param to parmfile. We now want to work on the remaining two models. Start MIGRATE, choose the parameter menu. Choose the entry labeled Model is set to. MIGRATE will now show a dizzying list of options, do not panic, we will only use few of them. MIGRATE will ask you how many populations are used: enter 2. For a 2-population model we can have 4 parameters: two population sizes and two migration rates. Before you enter values, please read this whole paragraph. A * or x means that that particular parameter will be estimated, a zero means that that particular parameter will not be estimated (is not used). Our goal is to set one of the migration parameters to zero. We start with model 2 (Figure 3, Table 2).
MIGRATE needs to know how to treat all connections between the populations, and we must also give instructions on how the program will treat the population sizes. Because we want to estimate both population sizes and one migration rate, we will use the * and a zero for the unused migration rate. The connection matrix is square so we can label it like shown in the first table below. MIGRATE now asks that you input each row, this can be done by either specifying * 0 (see second table) and then return and then entering the next line * * return (second row in second table), or you can enter the whole matrix as * 0 * *. Exit the parameter menu, write the parmfile, run MIGRATE, check the histogram, report the log marginal likelihoods. My run took 156 seconds, and delivered these log marginal likelihoods: -4860.58, -4795.88.
Cautionary note: if you use this tutorial for your own work, please recognize that a standard run in migrate can only use migration models that allow it to draw a complete genealogy, so for example a model * 0 0 *, that has no migration among the population, does not work out of the box. As a rule of thumb each population must be connected to at least one other population.
- Copy the parmfile to parmfile.3aparam, copy outfile to outfile.3aparam, and copy outfile.pdf to outfile.3aparam.pdf
- Run model 3 using the same procedure as for model 2. The string for the migration connection matrix is * * 0 *. Write parmfile, run, report. My run took 151 seconds and the log marginal likelihoods were -4863.08, and -4794.53.
- Copy the parmfile to parmfile.3bparam, copy outfile to outfile.3bparam, and copy outfile.pdf to outfile.3bparam.pdf
Exercise 4: Comparison of models using marginal likelihoods and Bayes factors
Once about more than half of the class has reached this point we will talk about the marginal likelihoods found.
- How does one calculate Bayes factors? Bayes factors are often calculated in very different ways. In Table 3, I summarized all log marginal likelihoods, ln(mL), and I report the natural log Bayes factors where
Using the guidelines of Kass and Raftery (1995), values smaller than -2 suggest preference for model 2, values larger than 2 suggest preference for model 1. We can use the log marginal likelihoods or the BF to order the models (see column Choice in the Table 3). We can also calculate the model probability. The above outline uses only log marginal likelihoods, for these calculations we need marginal likelihoods. It is calculated by dividing each marginal likelihood by the sum of the marginal likelihoods of all used models:
The calculation of likelihoods from the reported log likelihoods is easy with computer programs that have variable precision (for example Maple or Mathematica). Calculations on a desk calculator often fail, for example the likelihood of model 1 is a remarkable small number because the likelihood is exp(-4862.85)= 1.233 x 10-2112, my emulated HP sci 15C calculator delivers 0.0000. But you can calculate the above quantities using this recipe: (1) find the largest log likelihood (-4860.58), (2) subtract that number form each log likelihood in the list (result: -2.27, 0.0, -2.5, -26.67), (3) exponentiate each element in the new list (result: 0.1033, 1.0, 0.0821, 2.6144 x 10-12) , (4) sum all elements in the list up, this is the denominator (1.1854) . (5) now divide each element in the list by that sum and the numbers will look like the one in table 3.
The harmonic mean fails to recover the true model for my example run and orders the models differently. Often the variance of the harmonic mean estimator (when running the same model multiple times) is large. It is therefore not reliable, but this may depend on the dataset. As consequence, I suggest to use the thermodynamic integration method if the extra cost in runtime is acceptable.
Looking at the model probabilities we can see that the “true” model has considerably higher support than the full model or the model that suggests a wrong direction of gene flow.
MIGRATE-N has many options and therefore must be used thoughtfully. It will perform poorly with "out of the box" settings. Plan to do some trial runs to explore the behavior of the program. Use a subset of your data for such experiments, first trials with larger number of populations lead often to frustration. Datasets that have more than 10 populations are very difficult to analyze, even on computer clusters; start out with a 2- or 3-population dataset. Although it is possible to run large number of populations, but such analyses needs many independent loci. If you encounter any difficulties, please email to Peter Beerli or the migrate-support google group. It is helpful if you include the data file that is causing the problem, especially for crashes and other bugs.