Aggregating Inconsistent Information in Ranking, Clustering and Phylogenetic Trees
This event is free and open to the publicAdd to Google Calendar
(Click “Event Website” above to access the zoom link.)
Abstract: We revisit standard data analysis tasks, where the goal is to construct a ranking, clustering or phylogenetic tree on $n$ elements by aggregating local information on overlapping subsets of the elements. Such information includes ordering constraints for ranking (e.g., pairwise comparisons), “must-link/cannot-link” constraints for clustering, whereas for hierarchical clustering, the analogous constraints are “desired/forbidden” triplets or quartets, which specify ordering relations to be included/avoided in the final output. Inevitably, such constraints may be inconsistent with each other, so the goal becomes to maximize the extent of agreement with the collected information.
In this paper, we study approximation algorithms and their limitations for these basic tasks in the worst-case and beyond. On the positive side, we achieve better approximations for all considered problems and often overcome established worst-case hardness, under a simple query model for obtaining noisy constraints from a ground-truth solution. A common underlying theme in our algorithms is the use of Max Cut on graphs with negative edges. On the negative side, we address an open question regarding ordering problems on trees (triplets/quartets consistency), raised in previous works in computational biology, by presenting near optimal hardness of approximation. As a consequence, we get optimal hardness for the forbidden triplets problem and to the best of our knowledge, this is the first tight hardness of approximation result for a constraint satisfaction problem on trees.
Based on joint work with Mohammad Mahdian and Sara Ahmadian.