Eight papers by CSE researchers at STOC 2025
Researchers affiliated with Computer Science and Engineering (CSE) at the University of Michigan are presenting eight papers at the 2025 ACM Symposium on Theory of Computing (STOC). Taking place June 23-27 in Prague, Czech Republic, STOC is one of the leading conferences for theoretical computer science, bringing together the world’s top experts in this field to discuss the latest findings.
Topics covered by CSE researchers at STOC include error detection in Reed-Solomon codes, hardness results for constrained optimization problems, dynamic graph algorithms, and more. The papers being presented are as follows, with the names of authors affiliated with CSE in bold:
Explicit Folded Reed-Solomon and Multiplicity Codes Achieve Relaxed Generalized Singleton Bounds *Best Student Paper Award*
Yeyuan Chen, Zihan Zhang
Abstract: In this paper, we prove that explicit folded Reed–Solomon (RS) codes and univariate multiplicity codes achieve relaxed generalized Singleton bounds for list size L ≥ 1. Specifically, we show the following: (1) Any folded RS code of block length n and rate R over the alphabet F_q^s with distinct evaluation points is ((L/(L+1))(1 − sR/(s−L+1)), L) (average-radius) list-decodable for list size L ∈ [s]. (2) Any univariate multiplicity code of block length n and rate R over the alphabet F_p^s (where p is a prime) with distinct evaluation points is ((L/(L+1))(1 − sR/(s−L+1)), L) (average-radius) list-decodable for list size L ∈ [s].
Choosing s = Θ(1/ε²) and L = O(1/ε), our results imply that both explicit folded RS codes and explicit univariate multiplicity codes achieve list-decoding capacity 1 − R − ε with optimal list size O(1/ε). This exponentially improves the previous state of the art, (1/ε)^{O(1/ε)}, established by Kopparty, Ron-Zewi, Saraf, and Wootters (FOCS 2018; SICOMP 2023) and Tamo (IEEE TIT 2024). In particular, our results on folded Reed–Solomon codes fully resolve a long-standing open problem originally proposed by Guruswami and Rudra (STOC 2006; IEEE TIT 2008). Furthermore, our results imply the first explicit constructions of (1 − R − ε, O(1/ε)) (average-radius) list-decodable codes of rate R with polynomial-sized alphabets in the literature.
Our methodology can also analyze the list-recoverability of folded RS codes. We provide a tighter upper bound on the radius: folded RS codes cannot be (((L+1−ℓ)/(L+1))(1 − mR/(m−1)) + o(1), ℓ, L) list-recoverable, where m = ⌈log_ℓ(L+1)⌉ > 1. We conjecture this bound is almost tight when L+1 = ℓ^a for any a ∈ ℕ, a ≥ 2. As evidence, we show folded RS codes over the alphabet F_q^s are (1/2 − sR/(s−2), 2, 3) list-recoverable, which proves the tightness of this bound in the smallest non-trivial special case. As a corollary, our bound shows that (folded) RS codes cannot be (1 − R − ε, ℓ, ℓ^{o_R(1/ε)}) list-recoverable, which refutes the possibility that these codes could achieve list-recovery capacity (1 − R − ε, ℓ, O(ℓ/ε)). This implies an intrinsic separation between the list-decodability and list-recoverability of (folded) RS codes.
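To make the headline bound concrete, here is a small numerical sketch (my own illustration, not from the paper) evaluating the relaxed generalized Singleton radius (L/(L+1))(1 − sR/(s−L+1)) with the abstract's parameter choices s = Θ(1/ε²) and L = O(1/ε), showing the radius approaching list-decoding capacity 1 − R:

```python
# Numerically evaluate the relaxed generalized Singleton bound:
# decoding radius (L/(L+1)) * (1 - s*R/(s - L + 1)).
# Parameter choices follow the abstract: s = Theta(1/eps^2), L = O(1/eps).

def list_decoding_radius(R, s, L):
    """Radius from the relaxed generalized Singleton bound."""
    return L / (L + 1) * (1 - s * R / (s - L + 1))

R = 0.5                  # code rate
eps = 0.01
s = round(1 / eps**2)    # folding parameter, s = 10000
L = round(1 / eps)       # list size, L = 100

radius = list_decoding_radius(R, s, L)
# The radius lands within roughly eps of capacity 1 - R = 0.5.
print(radius)
assert (1 - R) - radius < 2 * eps
```

With these (assumed) concrete values the radius is about 0.49, i.e., within 2ε of the capacity 1 − R = 0.5.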
Asymptotically Optimal Hardness for k-Set Packing and k-Matroid Intersection
Euiwoong Lee, Ola Svensson, Theophile Thiery
Abstract: For any ε > 0, we prove that k-Dimensional Matching is hard to approximate within a factor of k/(12 + ε) for large k unless NP ⊆ BPP. Listed among Karp’s 21 NP-complete problems, k-Dimensional Matching is a benchmark problem in computational complexity that arises as a special case of many constrained optimization problems over independence systems, including k-Set Packing, k-Matroid Intersection, and Matroid k-Parity.
For all the aforementioned problems, the best-known lower bound was an Ω(k/log k) hardness result by Hazan, Safra, and Schwartz. In contrast, state-of-the-art algorithms achieve an approximation factor of O(k).
Our result narrows this gap to a constant and thus provides a rationale for the observed algorithmic difficulties. The crux of our result is a novel approximation-preserving gadget from R-degree-bounded k-CSPs over alphabet size R to kR-Dimensional Matching. Along the way, we prove that R-degree-bounded k-CSPs over alphabet size R are hard to approximate within a factor of Ω_k(R) using known randomised sparsification methods for CSPs.
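As a toy illustration (not from the paper) of why O(k)-approximations are easy to achieve: the simple greedy algorithm is a k-approximation for k-Set Packing, since each set it picks has at most k elements and can therefore block at most k sets of any optimal packing. The instance below is my own example:

```python
# Toy illustration: greedy is a k-approximation for k-Set Packing,
# matching the O(k) upper bounds the abstract mentions.

def greedy_set_packing(sets):
    """Pick pairwise-disjoint sets greedily, in the given order."""
    used, packing = set(), []
    for s in sets:
        if used.isdisjoint(s):
            packing.append(s)
            used |= s
    return packing

# A 3-Set Packing instance (all sets have size k = 3).
sets = [
    {1, 2, 3},                          # greedy takes this set first...
    {1, 4, 5}, {2, 6, 7}, {3, 8, 9},    # ...blocking these three disjoint sets
]
packing = greedy_set_packing(sets)
print(len(packing))   # 1, while the optimum is 3: a factor-k gap for k = 3
```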

Deterministic Dynamic Maximal Matching in Sublinear Update Time
Aaron Bernstein, Sayan Bhattacharya, Peter Kiss, Thatchaphol Saranurak
Abstract: We give a fully dynamic deterministic algorithm for maintaining a maximal matching of an n-vertex graph in Õ(n^{8/9}) amortized update time. This breaks the long-standing Ω(n)-update-time barrier on dense graphs, achievable by trivially scanning all incident vertices of the updated edge, and affirmatively answers a major open question repeatedly asked in the literature [Baswana, Gupta and Sen FOCS 2011; Bhattacharya, Chakrabarty, Henzinger and Nanongkai SODA 2018; Solomon, Dagstuhl].
We also present a faster randomized algorithm against an adaptive adversary with Õ(n^{3/4}) amortized update time.
Our approach employs the edge degree constrained subgraph (EDCS), a central object for optimizing approximation ratios, in a completely novel way: we instead use it to maintain a matching that covers all high-degree vertices in sublinear update time, so that only low-degree vertices remain to be handled, which is rather straightforward. To optimize this approach, we employ tools never before used in the dynamic matching literature, including sublinear-time algorithms for matching high-degree vertices, random walks on directed expanders, and the monotone Even-Shiloach tree for dynamic shortest paths.
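For context, the trivial Θ(n)-update-time baseline that the paper breaks can be sketched as follows (my own sketch, not the paper's algorithm): after each update, restore maximality by scanning the affected vertices' neighborhoods, which costs O(deg) = O(n) per update on dense graphs:

```python
# Sketch of the trivial dynamic maximal matching baseline: the matched
# endpoints of a deleted matching edge rescan their neighborhoods.

class TrivialDynamicMatching:
    def __init__(self):
        self.adj = {}    # adjacency lists
        self.mate = {}   # mate[v] = matched partner of v, if any

    def _try_match(self, v):
        # Scan v's neighbors for a free partner: O(deg(v)) = O(n) time.
        if v in self.mate:
            return
        for u in self.adj.get(v, ()):
            if u not in self.mate:
                self.mate[u], self.mate[v] = v, u
                return

    def insert(self, u, v):
        self.adj.setdefault(u, set()).add(v)
        self.adj.setdefault(v, set()).add(u)
        if u not in self.mate and v not in self.mate:
            self.mate[u], self.mate[v] = v, u

    def delete(self, u, v):
        self.adj[u].discard(v)
        self.adj[v].discard(u)
        if self.mate.get(u) == v:   # a matching edge was deleted:
            del self.mate[u], self.mate[v]
            self._try_match(u)      # rematch both endpoints via O(n) scans
            self._try_match(v)

g = TrivialDynamicMatching()
g.insert(1, 2); g.insert(2, 3); g.insert(3, 4)
g.delete(1, 2)
# The matching stays maximal: every remaining edge has a matched endpoint.
assert all(u in g.mate or v in g.mate for u in g.adj for v in g.adj[u])
```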
A (2+ε)-Approximation Algorithm for Metric k-Median
Vincent Cohen-Addad, Fabrizio Grandoni, Euiwoong Lee, Chris Schwiegelshohn, Ola Svensson
Abstract: In the classical NP-hard (metric) k-median problem, we are given a set of n clients and centers with metric distances between them, along with an integer parameter k ≥ 1. The objective is to select a subset of k open centers that minimizes the total distance from each client to its closest open center. In their seminal work, Jain, Mahdian, Markakis, Saberi, and Vazirani presented the Greedy algorithm for facility location, which implies a 2-approximation algorithm for k-median that opens k centers in expectation. Since then, substantial research has aimed at narrowing the gap between their algorithm and the best achievable approximation by an algorithm guaranteed to open exactly k centers, as required in the k-median problem. During the last decade, all improvements have been achieved by leveraging their algorithm (or a small improvement thereof), followed by a second step called bi-point rounding, which inherently adds an additional factor to the approximation guarantee.
Our main result closes this gap: for any ε > 0, we present a (2+ε)-approximation algorithm for the k-median problem, improving the previous best-known approximation factor of 2.613. Our approach builds on a combination of two key algorithms. First, we present a non-trivial modification of the Greedy algorithm that operates with only O(log n/ε²) adaptive phases. Through a novel walk-between-solutions approach, this enables us to construct a (2+ε)-approximation algorithm for k-median that consistently opens at most k + O(log n/ε²) centers: via known results, this already implies a (2+ε)-approximation algorithm that runs in quasi-polynomial time. Second, we develop a novel (2+ε)-approximation algorithm tailored for stable instances, where removing any center from an optimal solution increases the cost by at least an Ω(ε³/log n) fraction. Achieving this involves several ideas, including a sampling approach inspired by the k-means++ algorithm and a reduction to submodular optimization subject to a partition matroid. This allows us to convert the previous result into a polynomial time algorithm that opens exactly k centers while maintaining the (2+ε)-approximation guarantee.
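The k-means++-style sampling primitive the abstract cites as inspiration can be sketched in a few lines (my own illustration, not the paper's algorithm): each new center is sampled with probability proportional to a point's current distance to the nearest open center (k-means++ itself uses squared distances; for k-median, plain distances are the natural analog):

```python
# Distance-proportional sampling of k centers, k-means++-style.
import random

def d_sampling(points, k, dist):
    centers = [random.choice(points)]
    while len(centers) < k:
        # Weight of p = distance to its closest already-open center;
        # points already chosen get weight 0 and are never re-picked.
        weights = [min(dist(p, c) for c in centers) for p in points]
        centers.append(random.choices(points, weights=weights)[0])
    return centers

# A toy metric instance: 5 points under the L1 (Manhattan) distance.
pts = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0)]
dist = lambda a, b: abs(a[0] - b[0]) + abs(a[1] - b[1])
print(d_sampling(pts, 3, dist))   # 3 distinct points, far-apart ones likely
```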

Solving the Correlation Cluster LP in Nearly Linear Time
Nairen Cao, Vincent Cohen-Addad, Euiwoong Lee, Shi Li, David Rasmussen Lolck, Alantha Newman, Mikkel Thorup, Lukas Vogl, Shuyi Yan, Hanwen Zhang
Abstract: Correlation Clustering is a fundamental and widely-studied problem in unsupervised learning and data mining. The input is a graph and the goal is to construct a clustering minimizing the number of inter-cluster edges plus the number of missing intra-cluster edges.
Cao, Cohen-Addad, Lee, Li, Newman, and Vogl [STOC 2024] introduced the cluster LP for Correlation Clustering, which they argued captures the problem much more succinctly than previous linear programming formulations. However, the cluster LP has exponential size, with a variable for every possible set of vertices in the input graph. Nevertheless, they showed how to find a feasible solution for the cluster LP in time O(n^{poly(1/ε)}) with objective value at most (1+ε) times the value of an optimal solution for the respective Correlation Clustering instance. Furthermore, they showed how to round a solution to the cluster LP, yielding a (1.437+ε)-approximation algorithm for the Correlation Clustering problem.
The main technical result of this paper is a new approach to find a feasible solution for the cluster LP with objective value at most (1+ε) of the optimum in time O(2^{poly(1/ε)} · n), where n is the number of vertices in the graph. We also show how to implement the rounding within the same time bounds, thus achieving a fast (1.437+ε)-approximation algorithm for the Correlation Clustering problem. This bridges the gap between state-of-the-art methods for approximating Correlation Clustering and the recent focus on fast algorithms.
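The objective from the abstract is simple enough to compute directly; the small helper below (my own illustration, unrelated to the LP itself) makes the cost function concrete:

```python
# Correlation Clustering cost: the number of edges between clusters plus
# the number of missing edges inside clusters.
from itertools import combinations

def cc_cost(vertices, edges, cluster_of):
    edges = {frozenset(e) for e in edges}
    cost = 0
    for u, v in combinations(vertices, 2):
        same = cluster_of[u] == cluster_of[v]
        present = frozenset((u, v)) in edges
        if same and not present:     # missing intra-cluster edge
            cost += 1
        elif not same and present:   # inter-cluster edge
            cost += 1
    return cost

V = [1, 2, 3, 4]
E = [(1, 2), (2, 3), (3, 4)]
# Clusters {1,2,3} and {4}: pays for missing edge (1,3) and cut edge (3,4).
print(cc_cost(V, E, {1: 0, 2: 0, 3: 0, 4: 1}))   # 2
```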
Covering Approximate Shortest Paths with DAGs
Sepehr Assadi, Gary Hoppenworth, Nicole Wein
Abstract: We define and study analogs of probabilistic tree embedding and tree cover for directed graphs. We define the notion of a DAG cover of a general directed graph G: a small collection D_1, …, D_g of DAGs so that for all pairs of vertices s, t, some DAG D_i provides low distortion for (s, t); i.e., dist_{D_i}(s, t) ≤ α · dist_G(s, t), where α is the distortion. As a trivial upper bound, there is a DAG cover with n DAGs and α = 1, obtained by taking the shortest-path tree rooted at each vertex. When each DAG is restricted to be a subgraph of G, there is a simple matching lower bound (via a directed cycle) showing that n DAGs are necessary, even to preserve reachability. Thus, we allow the DAGs to include a limited number of additional edges not present in the original graph.
When n² additional edges are allowed, there is a simple upper bound of two DAGs and α = 1. Our first result is an almost-matching lower bound: even with n^{2−o(1)} additional edges, at least n^{1−o(1)} DAGs are needed, even to preserve reachability. However, the story is different when the number of additional edges is Õ(m), a natural setting where the sparsity of the DAG collection asymptotically matches that of the original graph. Our main upper bound is a near-linear time algorithm that constructs a DAG cover with Õ(m) additional edges, polylogarithmic distortion, and only O(log n) DAGs. This parallels known results for undirected graphs: the well-known FRT probabilistic tree embedding implies a tree cover where both the number of trees and the distortion are logarithmic. Our algorithm also extends to a certain probabilistic embedding guarantee. Lastly, we complement our upper bound with a lower bound showing that achieving a DAG cover with no distortion and Õ(m) additional edges requires a polynomial number of DAGs.
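The trivial cover from the abstract is easy to realize in code (my own toy check, not from the paper): one BFS shortest-path tree per source gives n DAGs with distortion α = 1 and no extra edges. The directed cycle below is also the lower-bound instance showing this many subgraph-DAGs can be necessary:

```python
# One shortest-path tree per source = a DAG cover with n DAGs and alpha = 1.
from collections import deque

def bfs_dist(adj, s):
    """Distances of the BFS shortest-path tree rooted at s (a DAG)."""
    dist = {s: 0}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

# Directed 4-cycle: 0 -> 1 -> 2 -> 3 -> 0.
adj = {0: [1], 1: [2], 2: [3], 3: [0]}
covers = {s: bfs_dist(adj, s) for s in adj}
# Every pair (s, t) gets its exact distance from the tree rooted at s.
assert covers[0][3] == 3 and covers[3][0] == 1
```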

Deterministic Vertex Connectivity via Common-Neighborhood Clustering and Pseudorandomness
Yonggang Jiang, Chaitanya Nalam, Thatchaphol Saranurak, Sorrachai Yingchareonthawornchai
Abstract: We give a deterministic algorithm for computing a global minimum vertex cut in a vertex-weighted graph with n vertices and m edges in Ô(mn) time. (We use Õ(·) and Ô(·) to hide log(n) and n^{o(1)} factors, respectively.) This breaks the long-standing Ω(n⁴)-time barrier in dense graphs, achievable by trivially computing all-pairs maximum flows. Up to subpolynomial factors, we match the fastest randomized Õ(mn)-time algorithm of [Henzinger, Rao and Gabow FOCS’96], and affirmatively answer the question of [Gabow FOCS’00] whether deterministic O(mn)-time algorithms exist, even for unweighted graphs. Our algorithm works in directed graphs, too.
In unweighted undirected graphs, we present a faster deterministic Ô(mκ)-time algorithm, where κ ≤ n is the size of the global minimum vertex cut. For moderate values of κ, this strictly improves upon all previous deterministic algorithms for unweighted graphs, which run in time Ô(m(n+κ²)) [Even ’75], Ô(m(n+κ√n)) [Gabow FOCS’00], and Ô(m·2^{O(κ²)}) [Saranurak and Yingchareonthawornchai FOCS’22]. Recently, a linear-time algorithm for small κ was given by [Korhonen ’24].
Our approach applies common-neighborhood clustering, recently introduced by [Blikstad, Jiang, Mukhopadhyay and Yingchareonthawornchai STOC’25], in novel ways, e.g., on top of weighted graphs and on top of vertex-expander decomposition. We also exploit pseudorandom objects often used in the computational complexity community, including crossing families based on dispersers [Ta-Shma, Umans and Zuckerman STOC’01; Wigderson and Zuckerman Combinatorica’99] and selectors based on linear lossless condensers [Cheraghchi ’11; Guruswami, Umans and Vadhan JACM’09]. To our knowledge, this is the first application of selectors in graph algorithms.
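For readers new to the object being computed: a global minimum vertex cut is a smallest vertex set whose removal disconnects the graph. The exponential-time brute force below (my own illustration, nothing like the paper's algorithm) pins down the definition on a tiny example:

```python
# Brute-force global minimum vertex cut on a small graph.
from itertools import combinations

def is_connected(vertices, adj):
    """Is the subgraph induced on `vertices` connected?"""
    if not vertices:
        return True
    seen, stack = set(), [next(iter(vertices))]
    while stack:
        u = stack.pop()
        if u in seen:
            continue
        seen.add(u)
        stack.extend(v for v in adj[u] if v in vertices)
    return seen == set(vertices)

def min_vertex_cut(vertices, adj):
    for size in range(len(vertices) - 1):
        for cut in combinations(vertices, size):
            rest = set(vertices) - set(cut)
            # Removal must leave at least 2 vertices, disconnected.
            if len(rest) > 1 and not is_connected(rest, adj):
                return set(cut)
    return None   # complete graph: no vertex cut exists

# Two triangles sharing the single vertex 2: {2} is the minimum cut.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3, 4}, 3: {2, 4}, 4: {2, 3}}
print(min_vertex_cut(set(adj), adj))   # {2}
```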
Harmonic Decomposition in Data Sketches
Dingyu Wang
Abstract: In the turnstile streaming model, a dynamic vector x = (x_1, …, x_n) ∈ ℤⁿ is updated by a stream of entry-wise increments/decrements. Let f: ℤ → ℝ⁺ be a symmetric function with f(0) = 0. The f-moment of x is defined to be f(x) := ∑_{v∈[n]} f(x_v). We revisit the problem of constructing a universal sketch that can estimate many different f-moments. Previous constructions of universal sketches rely on sampling with respect to the L0-mass (uniform samples) or the L2-mass (L2-heavy-hitters), whose universality comes from being able to evaluate the function f over the samples. To obtain such samples, hash collisions are deliberately detected and avoided (with high probability); e.g., singleton detectors are used in L0-sampling and the CountSketch in L2-sampling. These auxiliary data structures introduce significant space overhead. Beyond this issue, sampling-based methods are known to perform poorly when estimating certain “nearly periodic functions”, for which Ω(poly(n)) samples are needed.

In this paper, we propose a new universal sketching scheme that is almost “dual” to the sampling-based methods. Instead of evaluating f on samples, we decompose f into a linear combination of homomorphisms f_1, f_2, … from (ℤ, +) to (ℂ, ×), whose f_k-moments can be estimated regardless of hash collisions, precisely because each f_k is a homomorphism. We then synthesize the estimates of the f_k-moments into an estimate of the f-moment. Universality now comes from the fact that we can weight the f_k-moments arbitrarily, where the correct weighting depends on the harmonic structure of the function f.

In contrast to the sampling-based methods, the new SymmetricPoissonTower sketch takes this harmonic approach. It embraces hash collisions instead of avoiding them, which saves multiple log n factors in space, e.g., when estimating all Lp-moments (f(z) = |z|^p, p ∈ [0, 2]). For many nearly periodic functions, the new sketch is exponentially more efficient than sampling-based methods.
We conjecture that the SymmetricPoissonTower is the universal sketch that can estimate every tractable function f.
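The decomposition-into-homomorphisms idea can be checked numerically on a toy example (my own illustration, not the paper's construction): the characters f_k(z) = e^{ikz} are homomorphisms from (ℤ, +) to (ℂ, ×), and the nearly periodic f(z) = 1 − cos(z) decomposes into a fixed weighting of three of them, so its f-moment equals the same weighting of the character moments:

```python
# Harmonic decomposition check: f(z) = 1 - cos(z) = 1 - (e^{iz} + e^{-iz})/2,
# so its f-moment is a fixed weighting of three character moments.
import cmath
import math

weights = {0: 1.0, 1: -0.5, -1: -0.5}   # f = sum_k weights[k] * e^{ikz}
x = [3, -1, 4, 1, -5]                    # nonzero entries of a frequency vector

# Direct f-moment: sum over entries of f(x_v).
direct = sum(1 - math.cos(v) for v in x)

# Harmonic route: compute each character moment, then weight them.
char_moments = {k: sum(cmath.exp(1j * k * v) for v in x) for k in weights}
harmonic = sum(w * char_moments[k] for k, w in weights.items()).real
assert abs(direct - harmonic) < 1e-9

# The homomorphism property is what makes hash collisions harmless: updates
# landing in the same counter add up, and e^{ik(a+b)} = e^{ika} * e^{ikb}.
a, b = 3, 4
assert cmath.isclose(cmath.exp(1j * (a + b)),
                     cmath.exp(1j * a) * cmath.exp(1j * b))
```

In the actual sketch, the weighting of the character moments is derived from the harmonic structure of the target f; the fixed three-term weighting above is just the simplest hand-picked case.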