A generalized Robinson-Foulds distance for labeled trees.
Journal article

A generalized Robinson-Foulds distance for labeled trees.

  • Briand S Computer Science Department, Université de Montréal, Montreal, Canada.
  • Dessimoz C Department of Computational Biology, University of Lausanne, Lausanne, Switzerland. c.dessimoz@ucl.ac.uk.
  • El-Mabrouk N Computer Science Department, Université de Montréal, Montreal, Canada. mabrouk@iro.umontreal.ca.
  • Lafond M Computer Science Department, Université de Sherbrooke, Sherbrooke, Canada.
  • Lobinska G Department of Genetics Evolution and Environment, University College London, London, UK.
Show more…
  • 2020-11-19
Published in:
  • BMC genomics. - 2020
English BACKGROUND
The Robinson-Foulds (RF) distance is a well-established measure between phylogenetic trees. Despite a lack of biological justification, it has the advantages of being a proper metric and being computable in linear time. For phylogenetic applications involving genes, however, a crucial aspect of the trees ignored by the RF metric is the type of the branching event (e.g. speciation, duplication, transfer, etc).


RESULTS
We extend RF to trees with labeled internal nodes by including a node flip operation, alongside edge contractions and extensions. We explore properties of this extended RF distance in the case of a binary labeling. In particular, we show that contrary to the unlabeled case, an optimal edit path may require contracting "good" edges, i.e. edges shared between the two trees.


CONCLUSIONS
We provide a 2-approximation algorithm which is shown to perform well empirically. Looking ahead, computing distances between labeled trees opens up a variety of new algorithmic directions.Implementation and simulations available at https://github.com/DessimozLab/pylabeledrf .
Language
  • English
Open access status
gold
Identifiers
Persistent URL
https://sonar.ch/global/documents/13305
Statistics

Document views: 26 File downloads: