Sequences can be assigned to taxa using a number of different statistical frameworks. The Statistical Assignment Package (SAP) uses a Bayesian approach to determine the probability that an environmental sequence belongs to a specific taxonomic group. SAP assumes that the species to which the sequence belongs is represented in the reference database. If this is not the case, the method still calculates the probability that the sequence belongs to any of the taxonomic groups that are represented in the database. It is important to emphasize that this method, like any comparable method, can only assign sequences to taxonomic groups that are actually represented in the database. SAP makes no attempt to model the structure and sampling of the database in order to evaluate the probability that the sequence truly belongs to some other taxon not represented there.
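As a rough illustration of what such a probability could look like when it is estimated from sampled trees, the expression below gives a simple sample-frequency estimate. The notation is an assumption introduced here and is not taken from the SAP documentation: X is the environmental sequence, T a taxonomic group, D the alignment of X with its database homologues, and tau_1, ..., tau_N the sampled phylogenies.

```latex
\hat{P}(X \in T \mid D) \;=\; \frac{1}{N} \sum_{n=1}^{N}
  \mathbf{1}\!\left[\, X \text{ groups with members of } T \text{ only, in tree } \tau_n \,\right]
```

Because every sampled tree contains only sequences drawn from the database, an estimate of this kind is conditional on the taxa present there, which is why a taxon missing from the database can never receive any probability.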
The program has 5 stages:
1. A set of relevant homologues is compiled using NetBlast searches against GenBank.
2. The set of homologues is aligned using ClustalW.
3. Markov Chain Monte Carlo (MCMC), or neighbour-joining with bootstrapping, is used to sample a large number of phylogenetic trees of the sample sequence and the set of homologues.
4. Assignment statistics are calculated from the sampled phylogenies.
5. HTML output is generated with summary information for the entire analysis as well as detailed information for each assignment.
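Stage 4 can be pictured with a minimal sketch. This is not SAP's actual code: the function name, the data layout, and the rule used (a tree supports a taxon when every reference sequence in the sample sequence's sister clade belongs to that taxon) are simplifying assumptions made for illustration. Each sampled tree is reduced to the set of reference sequences forming the sister clade of the sample sequence.

```python
from collections import Counter


def assignment_probabilities(sister_clades, taxonomy, rank):
    """Estimate assignment probabilities at one taxonomic rank.

    sister_clades: one set of reference IDs per sampled tree, holding the
        references in the sister clade of the sample sequence (hypothetical
        simplification of a full tree).
    taxonomy: dict mapping reference ID -> {rank: taxon name}.
    rank: the rank to summarise, e.g. "genus".
    """
    if not sister_clades:
        raise ValueError("at least one sampled tree is required")

    counts = Counter()
    for clade in sister_clades:
        taxa = {taxonomy[ref][rank] for ref in clade}
        # A tree supports a taxon only if the sister clade is pure for it.
        if len(taxa) == 1:
            counts[taxa.pop()] += 1

    total = len(sister_clades)
    return {taxon: n / total for taxon, n in counts.items()}


# Toy data: four sampled trees and four reference sequences (all invented).
taxonomy = {
    "ref1": {"genus": "Canis", "family": "Canidae"},
    "ref2": {"genus": "Canis", "family": "Canidae"},
    "ref3": {"genus": "Vulpes", "family": "Canidae"},
    "ref4": {"genus": "Felis", "family": "Felidae"},
}
sister_clades = [{"ref1"}, {"ref1", "ref2"}, {"ref1", "ref3"}, {"ref3"}]

genus_probs = assignment_probabilities(sister_clades, taxonomy, "genus")
family_probs = assignment_probabilities(sister_clades, taxonomy, "family")
```

In this toy data the third tree mixes two genera, so it supports no genus but still supports the family. Probabilities at a lower rank therefore need not sum to one, while higher ranks tend to receive more support.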
Munch K, Boomsma W, Willerslev E, Nielsen R. Fast Phylogenetic DNA Barcoding. Philosophical Transactions of the Royal Society B, 363(1512): 3997-4002, Dec. 27, 2008.
Munch K, Boomsma W, Huelsenbeck J P, Willerslev E, Nielsen R. Statistical Assignment of DNA Sequences using Bayesian Phylogenetics. Systematic Biology, 57(5): 750-757, Oct. 2008.