20 Jul 2015, admin

GARLI Web Service - Documentation

Information

Here we provide a web service for GARLI version 2.1, a maximum-likelihood phylogenetic analysis program developed and maintained by Derrick Zwickl. GARLI analyses submitted through the web service run on a grid system at the University of Maryland known as The Lattice Project. If you would like to participate by running these GARLI analyses on your own computing resources, please connect to our BOINC project.

The GARLI web service allows users to perform up to 1,000 best tree search replicates for a single analysis using an adaptive search that begins with the submission of 10 replicates. For bootstrap analyses, the user may request up to 2,000 replicates for a single analysis. Additional functionality becomes available if you register for an account on molecularevolution.org and submit jobs while logged in. For example, you will be able to upload and reuse input files, bypass the captcha when you submit jobs, clone jobs, and access other advanced job management features.

Contact

If you have questions or feedback, or if you have found a bug in the web site, we would like to hear about it. Please feel free to contact us at the e-mail addresses found at the bottom of the page.


Features

Analysis type

Either an adaptive best tree search (maximum likelihood search for the best tree) or a bootstrap analysis can be specified as the analysis type.

↳ Adaptive best tree search analysis

In the case of an adaptive best tree search, the system will automatically determine the necessary number of search replicates to perform by calculating the number of replicates needed to recover the best topology with 0.95 probability. It begins by submitting 10 replicates, and will continue to submit replicates as necessary up to a maximum of 1,000 replicates. More information about the adaptive search can be found in our GARLI web service article in Systematic Biology.
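As a rough illustration of the stopping rule, suppose a single search replicate finds the best topology with probability p. The chance that at least one of n independent replicates finds it is 1 - (1 - p)^n, so the smallest n that reaches 0.95 can be computed directly. The sketch below assumes p is already known; the function name and the clamping to the 10 to 1,000 range are illustrative, and the estimator actually used by the service (described in the Systematic Biology article) may differ in its details.

```python
import math


def replicates_needed(p_best, confidence=0.95, minimum=10, maximum=1000):
    """Smallest n with 1 - (1 - p_best)**n >= confidence.

    p_best is a hypothetical estimate of the probability that one search
    replicate recovers the best topology. The result is clamped to the
    10..1,000 range quoted in the documentation.
    """
    if not 0.0 < p_best <= 1.0:
        raise ValueError("p_best must be in (0, 1]")
    if p_best == 1.0:
        return minimum  # every replicate finds the best tree
    n = math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_best))
    return max(minimum, min(maximum, n))
```

Under these assumptions, a replicate that finds the best topology 10% of the time calls for 29 replicates, while a success rate of 1% calls for 299.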

↳ Bootstrap analysis

The maximum number of replicates allowed for a bootstrap analysis is 2,000.

NOTE: as of 16 September 2016, we have set stopgen=500000 (the GARLI setting that caps the number of generations to run) to prevent analyses from running for too long. This could limit the thoroughness of the search in some instances, but we expect it to affect few analyses.

Specifying partitioned analyses

To specify a partitioned analysis, first select a valid sequence data file in NEXUS format, then click the checkbox labeled "Perform partitioned analysis". You can then choose from the following input modes:

↳ Guided mode

Guided mode allows the user to specify the details of the partitioned analysis with graphical form elements, rather than by manually composing a NEXUS sets block and GARLI model blocks. Guided mode is available once the user has selected a valid NEXUS data file. The user then creates one or more character sets (charsets), each consisting of a name, a start position, and an end position; charsets may also be specified by codon position by clicking the checkbox labeled "Partition by codon position". All valid charsets will then be made available to be added to data subsets. Each data subset must contain at least one charset. The service currently allows the definition of up to ten data subsets in guided mode. For each data subset, a particular substitution model (and particular model parameters) may be specified. When the partitioned analysis is submitted, the service will automatically transform the charset and subset data into a NEXUS sets block and include it in the data file, and will likewise produce the appropriate model blocks and add them to the GARLI configuration file.
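To make the transformation concrete, here is a minimal sketch of how charsets and data subsets could be turned into a NEXUS sets block. It is not the service's own code: the function name, the `subsetN` labels, and the partition name `guided` are assumptions chosen for illustration, and the sketch covers only position-range charsets, not codon-position charsets or the GARLI model blocks.

```python
MAX_SUBSETS = 10  # guided-mode limit stated in the documentation


def build_sets_block(charsets, subsets, partition_name="guided"):
    """Return a NEXUS sets block as a string.

    charsets: dict mapping charset name -> (start, end), 1-based inclusive.
    subsets:  list of lists of charset names, one inner list per data subset.
    """
    if not 1 <= len(subsets) <= MAX_SUBSETS:
        raise ValueError(f"expected 1 to {MAX_SUBSETS} data subsets")

    lines = ["begin sets;"]
    for name, (start, end) in charsets.items():
        if start < 1 or end < start:
            raise ValueError(f"invalid range for charset {name!r}")
        lines.append(f"  charset {name} = {start}-{end};")

    parts = []
    for i, members in enumerate(subsets, start=1):
        if not members:
            raise ValueError("each data subset needs at least one charset")
        unknown = [m for m in members if m not in charsets]
        if unknown:
            raise ValueError(f"undefined charsets: {unknown}")
        parts.append(f"subset{i}: {' '.join(members)}")

    lines.append(f"  charpartition {partition_name} = {', '.join(parts)};")
    lines.append("end;")
    return "\n".join(lines)
```

For example, `build_sets_block({"geneA": (1, 500), "geneB": (501, 900)}, [["geneA"], ["geneB"]])` yields a block with two `charset` statements and one `charpartition` statement that assigns each charset to its own subset.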

↳ Expert mode

For users who prefer to specify their own NEXUS sets block and GARLI model blocks, we provide an expert mode that allows the user to input them directly.

Support for a variety of amino acid models

The service allows the user to specify a number of different amino acid models, including cpREV, Dayhoff, DCMut, Fixed, Jones, LG, mtArt, mtMam, mtREV, Poisson, and WAG. Some of these are not natively available in GARLI 2.1, but can be specified using the GARLI web service.

Analyzing sets of data files

You can analyze multiple data files simultaneously by providing a zip file (.zip extension) or tar file (.tar or .tar.gz extension) as the sequence data file. The zip or tar file should contain a number of sequence data files (up to a maximum of 100). When using a sequence data file in zip or tar format, the service will submit a separate analysis for each data file in the compressed file. The model parameters and other GARLI options specified for the job submission will be applied uniformly to each analysis.
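If you prepare batch archives with a script, something like the following sketch can help. It uses Python's standard `zipfile` module to bundle data files and enforces the 100-file limit before you upload. The function name and the flat layout inside the archive (no subdirectories) are assumptions; the documentation does not say whether nested folders are accepted.

```python
import zipfile
from pathlib import Path

MAX_FILES = 100  # per-archive limit stated in the documentation


def bundle_data_files(paths, archive_path):
    """Write the given sequence data files into one .zip archive."""
    paths = [Path(p) for p in paths]
    if not paths:
        raise ValueError("no data files given")
    if len(paths) > MAX_FILES:
        raise ValueError(f"at most {MAX_FILES} data files per archive")
    names = [p.name for p in paths]
    if len(set(names)) != len(names):
        raise ValueError("file names must be unique within the archive")

    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for p in paths:
            # Store each file at the top level of the archive (assumed layout).
            zf.write(p, arcname=p.name)
    return Path(archive_path)
```

A call such as `bundle_data_files(sorted(Path("alignments").glob("*.nex")), "batch.zip")` would then produce a single file to select as the sequence data file; the directory name here is only an example.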

Upon submission, you will get a confirmation e-mail that you have submitted a batch job, and you can proceed to monitor the status of each individual analysis on the Job Status page. You will not get a job completion e-mail after each individual analysis has finished, but you will get an e-mail notification once all the jobs in the batch submission have finished. If you would like to monitor the progress of individual jobs, you can do so by navigating to the Job Details page for the batch job. If there were fewer than seven analyses in the batch, you will be able to see post-processing results separately for each analysis on the Job Details page; if there were more than seven analyses in the batch, the post-processing results for each analysis will be bundled into a single zip file that you can download.

Additionally, there are a few constraints that are currently imposed for batch submissions:

  • Partitioned analyses are not allowed.
  • A maximum of 100 sequence data files are allowed in your zip or tar file.
  • A maximum of 100 replicates may be specified for bootstrap analyses.
  • A maximum of 10,000 replicates in progress (job status is "Running") is allowed per user. Further job submissions will be disallowed until the number of the user's replicates in progress falls below 10,000.
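The constraints listed above can be checked locally before submitting. The following is a minimal sketch, not the service's own validation: the function name, its parameters, and the `"bootstrap"` label are hypothetical, and only the numeric limits come from the list above.

```python
def check_batch_submission(n_files, partitioned, analysis_type,
                           bootstrap_replicates=0, replicates_running=0):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if partitioned:
        problems.append("partitioned analyses are not allowed in batches")
    if n_files > 100:
        problems.append("more than 100 sequence data files in the archive")
    if analysis_type == "bootstrap" and bootstrap_replicates > 100:
        problems.append("more than 100 bootstrap replicates requested")
    # New submissions are blocked until the running count falls below 10,000.
    if replicates_running >= 10_000:
        problems.append("10,000 or more replicates already in progress")
    return problems
```

Returning every problem at once, rather than stopping at the first, lets a script report all of the fixes a submission needs in one pass.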

Cloning jobs

You can use the specification of a previously submitted analysis as a starting point for a new job submission. To do this, expand the "Clone Job" section at the top of the Create Job page, select the previously submitted job that you wish to use as a starting point, and click "Clone Job". The form fields will be filled in automatically with the values from the previous job. You can then edit the job specification if you wish before submitting the new job.

Managing your file space

When you register for an account on molecularevolution.org, you are allocated a certain amount of space for your input files. Specifically, you are allowed a total of 1 GB for all of your input files, and a single input file cannot be larger than 100 MB. You can manage your file space (view, download or delete your input files) by navigating to your personal file repository.

Monitoring and tracking jobs

Upon job submission, you will get an e-mail either confirming that the job was submitted successfully or reporting that the job failed. If the job failed, the e-mail should include a description of the error. To track your jobs, you can navigate to the Job Status and/or Job Details pages. When an analysis has completed, the results of post-processing will be available on the Job Details page, and in most cases, a web-based visualization of the best tree or bootstrap consensus tree will be shown.

↳ Job states

There are several different states you may find your jobs in:

  • Queued: your analysis has been successfully submitted and is waiting in queue to start.
  • Running: your analysis is in the process of running.
  • Complete: your analysis is complete and the results of post-processing are available.
  • Failed: your analysis has failed; you should receive an e-mail explaining the reason.
  • Canceled: your analysis has been canceled.

↳ Removing jobs

If you have made a mistake in your job specification or the job failed for some reason, you can remove the job from the Job Status page. If you remove a batch job, all of the individual analyses that make up that batch will be removed as well. If you would like to remove individual jobs from a batch, you can navigate to the Job Details page for the batch job and select the jobs you would like to remove.


Citation

The phylogenetic analyses performed by the GARLI web service hosted at molecularevolution.org (Bazinet, Zwickl, and Cummings 2014) used GARLI 2.1 (Genetic Algorithm for Rapid Likelihood Inference; Zwickl 2006).

Bazinet, A. L., Zwickl, D. J., and Cummings, M. P. 2014. A Gateway for Phylogenetic Analysis Powered by Grid Computing Featuring GARLI 2.0. Systematic Biology 63(5):812-818, doi:10.1093/sysbio/syu031.

Zwickl, D. J. 2006. Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under the maximum likelihood criterion. Ph.D. dissertation, The University of Texas at Austin.


More information & references

The phylogenetic analyses performed by the GARLI web service hosted at molecularevolution.org (Bazinet, Zwickl, and Cummings 2014) use GARLI 2.1 (Genetic Algorithm for Rapid Likelihood Inference; Zwickl 2006) and grid computing (Cummings and Huskamp 2005) through The Lattice Project (Bazinet and Cummings 2008), which includes clusters and desktops in one encompassing system (Myers et al. 2008). The GARLI web service is a front end to a grid service built using a special programming library and associated tools (Bazinet et al. 2007). Following the general computational model of a previous phylogenetics study (Cummings et al. 2003), which used an earlier grid computing system (Myers and Cummings 2003), required files are distributed among hundreds of computers where the computational analyses are conducted asynchronously in parallel.

Post-processing of the phylogenetic inference results is performed using DendroPy (Sukumaran and Holder 2010) and the R system for statistical computing (R Core Team 2013). The estimation of the number of replicates required to recover the "best" tree topology follows Regier et al. (2009) and uses the symmetric difference metric (Robinson and Foulds 1981). The calculation of confidence intervals for the bootstrap probabilities observed in the majority rule consensus tree follows Hedges (1992). Tree visualizations use the BioJS Tree Viewer (Gómez et al. 2013).
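For readers unfamiliar with the symmetric difference metric, the sketch below computes it for two unrooted trees over the same taxa by counting the bipartitions found in one tree but not the other. It is a dependency-free illustration only: the service uses DendroPy for post-processing, whereas here trees are written as nested Python tuples of taxon names, and all function names are made up for the example.

```python
def leaf_set(tree):
    """All taxon names under a node (a string is a leaf, a tuple is a clade)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Non-trivial bipartitions of the tree, treated as unrooted.

    Each bipartition is stored as the side that does NOT contain a fixed
    reference taxon, so the same split always has the same representation.
    """
    all_taxa = leaf_set(tree)
    reference = min(all_taxa)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = all_taxa - below if reference in below else below
        # Skip trivial splits (a single leaf, or everything versus nothing).
        if 1 < len(side) < len(all_taxa) - 1:
            found.add(side)
        return below

    visit(tree)
    return found


def symmetric_difference(tree_a, tree_b):
    """Robinson-Foulds distance: splits present in exactly one of the trees."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must have the same taxa")
    return len(splits(tree_a) ^ splits(tree_b))
```

Two trees with identical topology have distance 0 regardless of how their clades are ordered, which is what makes the metric suitable for deciding whether two search replicates recovered the same topology.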

Development of the GARLI web service has been supported by NSF grants DBI-0755048: Grid, Public and GPU Computing for the Tree of Life, and DBI-1356562: Parallel Computing for Phylogenetics: Grid, Public and GPU Computing.

References

Bazinet, A. L., Zwickl, D. J., and Cummings, M. P. 2014. A Gateway for Phylogenetic Analysis Powered by Grid Computing Featuring GARLI 2.0. Systematic Biology 63(5):812-818, doi:10.1093/sysbio/syu031.

Bazinet, A. L., and Cummings, M. P. 2008. The Lattice Project: a grid research and production environment combining multiple grid computing models. Pages 2-13. In Weber, M. H. W. (Ed.) Distributed & Grid Computing - Science Made Transparent for Everyone. Principles, Applications and Supporting Communities. Rechenkraft.net, Marburg.

Bazinet, A. L., Myers, D. S., Fuetsch, J., and Cummings, M. P. 2007. Grid services base library: a high-level, procedural application program interface for writing Globus-based Grid services. Future Generation Computer Systems 23:517.

Cummings, M. P., and Huskamp, J. C. 2005. Grid computing. Educause Review 40:116-117.

Cummings, M. P., Handley, S. A., Myers, D. S., Reed, D. L., Rokas, A., and Winka, K. 2003. Comparing bootstrap and posterior probability values in the four-taxon case. Systematic Biology 52:477-487.

Gómez, J., García, L. J., Salazar, G. A., Villaveces, J., Gore, S., García, A., Martín, M. J., Launay, G., Alcántara, R., Del-Toro, N., Dumousseau, M., Orchard, S., Velankar, S., Hermjakob, H., Zong, C., Ping, P., Corpas, M., and Jiménez, R. C. 2013. BioJS: an open source JavaScript framework for biological data visualization. Bioinformatics 29(8):1103-1104.

Hedges, S. B. 1992. The number of replications needed for accurate estimation of the bootstrap P value in phylogenetic studies. Mol Biol Evol 9:366-369.

Myers, D. S., Bazinet, A. L., and Cummings, M. P. 2008. Expanding the reach of Grid computing: combining Globus- and BOINC-based systems. Pages 71-85. In Talbi, E.-G. and A. Zomaya (Eds.) Grids for Bioinformatics and Computational Biology, Wiley Book Series on Parallel and Distributed Computing. John Wiley & Sons, New York.

Myers, D. S., and Cummings, M. P. 2003. Necessity is the mother of invention: a simple grid computing system using commodity tools. Journal of Parallel and Distributed Computing 63:578-589.

Regier, J. C., Zwick, A., Cummings, M. P., Kawahara, A. Y., Cho, S., Weller, S., Roe, A., Baixeras, J., Brown, J. W., Parr, C., Davis, D. R., Epstein, M., Hallwachs, W., Hausmann, A., Janzen, D. H., Kitching, I. J., Solis, M. A., Yen, S-H., Bazinet, A. L., and Mitter, C. 2009. Toward reconstructing the evolution of advanced moths and butterflies (Lepidoptera: Ditrysia): an initial molecular study. BMC Evolutionary Biology 9:280.

Robinson, D. F., and Foulds, L. R. 1981. Comparison of phylogenetic trees. Mathematical Biosciences 53:131-147.

R Core Team. 2013. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.

Sukumaran, J., and Holder, M. T. 2010. DendroPy: A Python library for phylogenetic computing. Bioinformatics 26:1569-1571.

Zwickl, D. J. 2006. Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under the maximum likelihood criterion. Ph.D. dissertation, The University of Texas at Austin.