Аннотация:The problem statement --- Important desired properties of an algorithm to construct a species tree (supertree) by reconciling input trees are its low complexity and applicability to large biological data. During “reconciliation” the algorithm minimizes the total cost of mappings of individual trees into the desired supertree over all input trees and inferred evolutionary events. Even if only duplication and loss events are allowed, this problem is proved to be NP-hard. In practice, supertree building algorithms, including various heuristic solutions, suffer from exponential complexity.
The solution --- We propose a reformulation of the supertree building problem: the supertree is sought for such that it does not contain clades incompatible with those existing in the input trees. This requirement is biologically natural and allows a solving algorithm of cubic complexity. Our model incorporates duplications, losses, gains and horizontal transfers. It was extensively tested with simulated and biological trees and was shown to possess a square complexity in average even under the assumption of HGTs. Under no HGTs, the algorithm is mathematically correct and possesses the longest running time of n^3 x m^3, where n is a the number of input trees, and m is the total number of species. The corresponding program super3GL and its testing are freely available at http://lab6.iitp. ru/en/super3gl. The program is capable of processing large data (hundreds of species and thousands of trees simultaneously), including polytomic and unrooted trees. It is effectively implemented for both multiprocessor platforms with MPI-1.2, and single-CPU machines.