Bioinformatics Chang JM, Floden EW, Herrero J, Gascuel O, Di Tommaso P, Notredame C
Motivation Most evolutionary analyses are based on pre-estimated multiple sequence alignment. Wong et al. established the existence of an uncertainty induced by multiple sequence alignment when reconstructing phylogenies. They were able to show that in many cases different aligners produce different phylogenies, with no simple objective criterion sufficient to distinguish among these alternatives.
Results We demonstrate that incorporating MSA induced uncertainty into bootstrap sampling can significantly increase correlation between clade correctness and its corresponding bootstrap value. Our procedure involves concatenating several alternative multiple sequence alignments of the same sequences, produced using different commonly used aligners. We then draw bootstrap replicates while favoring columns of the more unique aligner among the concatenated aligners. We named this concatenation and bootstrapping method, Weighted Partial Super Bootstrap (wpSBOOT). We show on three simulated datasets of 16, 32 and 64 tips that our method improves the predictive power of bootstrap values. We also used as a benchmark an empirical collection of 853 one to one orthologous genes from seven yeast species and found wpSBOOT to significantly improve discrimination capacity between topologically correct and incorrect trees. Bootstrap values of wpSBOOT are comparable to similar readouts estimated using a single method. However, for reduced trees by 50 and 95% bootstrap thresholds, wpSBOOT comes out the lowest Type I error (less FP).
Availability and implementation The automated generation of replicates has been implemented in the T-Coffee package, which is available as open source freeware available from www.tcoffee.org. More information at https://doi.org/10.1093/bioinformatics/btz082
Comments