# Six papers by CSE researchers at STOC 2024

Six papers by researchers in CSE have been accepted for presentation at the 2024 ACM Symposium on Theory of Computing (STOC), taking place in Vancouver, Canada, from June 24-28. Held annually since 1969, STOC is the premier international conference in theoretical computer science, bringing together leading minds in the field to share their latest findings and ideas.

CSE authors contributed to six out of 198 total papers accepted by the conference, with their work spanning a range of topics related to computing theory, from approximation algorithms for vertex cuts and dense linear systems to multi-commodity flow emulators and correlation clustering.

The following papers are being presented at the conference, with the names of authors affiliated with CSE in bold:

**“Approximating Small Sparse Cuts”**

**Aditya Anand**, **Euiwoong Lee**, Jason Li, **Thatchaphol Saranurak**

Abstract: We study polynomial-time approximation algorithms for edge and vertex Sparsest Cut and Small Set Expansion in terms of *k*, the number of edges or vertices cut in the optimal solution. Our main results are *O*(polylog *k*)-approximation algorithms for various versions in this setting. Our techniques involve an extension of the notion of sample sets (Feige and Mahdian, STOC ’06), originally developed for small balanced cuts, to sparse cuts in general. We then show how to combine this notion of sample sets with two algorithms, one based on an existing framework of LP rounding and another new algorithm based on the cut-matching game, to get such approximation algorithms. Our cut-matching game algorithm can be viewed as a local version of the cut-matching game of Khandekar, Khot, Orecchia and Vishnoi, and certifies an expansion of every vertex set of size *s* in *O*(log *s*) rounds. These techniques may be of independent interest. As corollaries of our results, we also obtain an *O*(log *opt*) approximation for min-max graph partitioning, where *opt* is the min-max value of the optimal cut, and improve the bound on the size of multicut mimicking networks computable in polynomial time.
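To make the sparsest-cut objective concrete, here is a small brute-force sketch (illustrative only, not the paper's algorithm): it enumerates all vertex subsets of a toy graph and reports the cut minimizing crossing edges divided by the size of the smaller side.

```python
from itertools import combinations

def edge_sparsity(n, edges, S):
    """Sparsity of cut (S, V \\ S): crossing edges / size of the smaller side."""
    S = set(S)
    crossing = sum(1 for u, v in edges if (u in S) != (v in S))
    return crossing / min(len(S), n - len(S))

def brute_force_sparsest_cut(n, edges):
    """Exhaustively find the sparsest cut of a small graph (exponential time)."""
    best = (float("inf"), None)
    for size in range(1, n // 2 + 1):
        for S in combinations(range(n), size):
            best = min(best, (edge_sparsity(n, edges, S), S))
    return best

# Two triangles joined by a single bridge edge: the bridge is the sparsest cut,
# cutting 1 edge with 3 vertices on the smaller side (sparsity 1/3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
val, S = brute_force_sparsest_cut(6, edges)
```

In the paper's setting, *k* for this instance would be 1, the single edge cut by the optimal solution.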

**“Solving Dense Linear Systems Faster than via Preconditioning”**

**Michał Dereziński**, **Jiaming Yang**

Abstract: We give a stochastic optimization algorithm that solves a dense *n* × *n* real-valued linear system *Ax* = *b*, returning *x* such that ||*Ax* − *b*|| ≤ є||*b*|| in time Õ((*n*^{2} + *nk*^{ω−1}) log 1/є), where *k* is the number of singular values of *A* larger than *O*(1) times its smallest positive singular value, ω < 2.372 is the matrix multiplication exponent, and Õ hides a polylogarithmic factor in *n*. When *k* = *O*(*n*^{1−θ}) (namely, *A* has a flat-tailed spectrum, e.g., due to noisy data or regularization), this improves on both the cost of solving the system directly and the cost of preconditioning an iterative method such as conjugate gradient. In particular, our algorithm has an Õ(*n*^{2}) runtime when *k* = *O*(*n*^{0.729}). We further adapt this result to sparse positive semidefinite matrices and least squares regression. Our main algorithm can be viewed as a randomized block coordinate descent method, where the key challenge is simultaneously ensuring good convergence and fast per-iteration time. In our analysis, we use the theory of majorization for elementary symmetric polynomials to establish a sharp convergence guarantee when coordinate blocks are sampled using a determinantal point process. We then use a Markov chain coupling argument to show that similar convergence can be attained with a cheaper sampling scheme, and accelerate the block coordinate descent update via matrix sketching.
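The basic primitive behind the method, a block iteration that projects the current iterate onto the solution set of a randomly chosen block of equations, can be sketched as follows. This toy version uses uniform block sampling and no sketched acceleration, unlike the paper's determinantal point process sampling, so it only illustrates the update rule, not the claimed runtime.

```python
import numpy as np

def block_kaczmarz(A, b, block_size=10, iters=500, seed=0):
    """Randomized block iteration: at each step, pick a random block of rows S
    and project the iterate onto {x : A_S x = b_S} via the pseudoinverse."""
    rng = np.random.default_rng(seed)
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        S = rng.choice(A.shape[0], size=block_size, replace=False)
        r = b[S] - A[S] @ x                  # residual on the sampled block
        x += np.linalg.pinv(A[S]) @ r        # exact projection onto the block
    return x

rng = np.random.default_rng(1)
n = 50
A = rng.standard_normal((n, n)) + n * np.eye(n)  # diagonally dominant, well conditioned
x_true = rng.standard_normal(n)
b = A @ x_true
x = block_kaczmarz(A, b)
```

Each iteration touches only `block_size` rows, which is the source of the fast per-iteration time the abstract refers to; the convergence rate is what depends on how the blocks are sampled.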

**“Low-Step Multi-Commodity Flow Emulators”**

Bernhard Haeupler, D Ellis Hershkowitz, Jason Li, Antti Röyskö, **Thatchaphol Saranurak**

Abstract: We introduce the concept of *low-step multi-commodity flow emulators* for any undirected, capacitated graph. At a high level, these emulators contain approximate multi-commodity flows whose paths contain a small number of edges, shattering the infamous *flow decomposition barrier* for multi-commodity flow.

We prove the existence of low-step multi-commodity flow emulators and develop efficient algorithms to compute them. We then apply them to solve constant-approximate *k*-commodity flow in *O*((*m*+*k*)^{1+є}) time. To bypass the *O*(*mk*) flow decomposition barrier, we represent our output multi-commodity flow *implicitly*; prior to our work, even the existence of implicit constant-approximate multi-commodity flows of size *o*(*mk*) was unknown.

Our results generalize to the *minimum* cost setting, where each edge has an associated cost and the multi-commodity flow must satisfy a cost budget. Our algorithms are also parallel.
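The flow decomposition barrier mentioned above comes from representing each commodity's flow explicitly as paths: a flow on *m* edges may need up to *m* paths, each with up to *n* − 1 edges, so *k* commodities cost Ω(*mk*) to write down. A small sketch of classic greedy path decomposition (not the paper's implicit representation) makes this explicit cost visible:

```python
def decompose_flow(flow, s, t):
    """Greedy path decomposition of an acyclic s-t flow given as {(u, v): value}.
    Repeatedly traces a path of positive flow and subtracts its bottleneck;
    the classic bound is at most m paths, each of length up to n - 1."""
    flow = dict(flow)
    paths = []
    while True:
        path, u = [s], s
        while u != t:
            # follow any edge still carrying positive flow (assumes acyclic flow)
            nxt = next((v for (a, v), f in flow.items() if a == u and f > 0), None)
            if nxt is None:
                return paths  # no flow left leaving u; decomposition is done
            path.append(nxt)
            u = nxt
        bottleneck = min(flow[(path[i], path[i + 1])] for i in range(len(path) - 1))
        for i in range(len(path) - 1):
            flow[(path[i], path[i + 1])] -= bottleneck
        paths.append((path, bottleneck))

# A flow of value 3 from 0 to 3 splits into two paths: 0-1-3 (value 2), 0-2-3 (value 1).
flow = {(0, 1): 2, (0, 2): 1, (1, 3): 2, (2, 3): 1}
paths = decompose_flow(flow, 0, 3)
```

The emulators in the paper avoid paying this per-path cost by guaranteeing short flow paths and keeping the output flow implicit.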

**“Optimal Embedding Dimension for Sparse Subspace Embeddings”**

Shabarish Chenakkod, **Michał Dereziński**, Xiaoyu Dong, Mark Rudelson

Abstract: A random *m* × *n* matrix *S* is an oblivious subspace embedding (OSE) with parameters є > 0, δ ∈ (0,1/3) and *d* ≤ *m* ≤ *n*, if for any *d*-dimensional subspace *W* ⊆ *R*^{*n*}, *P*( ∀_{*x* ∈ *W*} (1+є)^{−1}||*x*|| ≤ ||*Sx*|| ≤ (1+є)||*x*|| ) ≥ 1−δ. It is known that the embedding dimension of an OSE must satisfy *m* ≥ *d*, and for any θ > 0, a Gaussian embedding matrix with *m* ≥ (1+θ)*d* is an OSE with є = *O*_{θ}(1). However, such an optimal embedding dimension is not known for other embeddings. Of particular interest are sparse OSEs, having *s* ≪ *m* non-zeros per column (Clarkson and Woodruff, STOC 2013), with applications to problems such as least squares regression and low-rank approximation.

We show that, given any θ > 0, an *m*× *n* random matrix *S* with *m*≥ (1+θ)*d* consisting of randomly sparsified ±1/√*s* entries and having *s*= *O*(log^{4}(*d*)) non-zeros per column, is an oblivious subspace embedding with є = *O*_{θ}(1). Our result addresses the main open question posed by Nelson and Nguyen (FOCS 2013), who conjectured that sparse OSEs can achieve *m*=*O*(*d*) embedding dimension, and it improves on *m*=*O*(*d*log(*d*)) shown by Cohen (SODA 2016). We use this to construct the first oblivious subspace embedding with *O*(*d*) embedding dimension that can be applied faster than current matrix multiplication time, and to obtain an optimal single-pass algorithm for least squares regression.

We further extend our results to Leverage Score Sparsification (LESS), which is a recently introduced non-oblivious embedding technique. We use LESS to construct the first subspace embedding with low distortion є=*o*(1) and optimal embedding dimension *m*=*O*(*d*/є^{2}) that can be applied in current matrix multiplication time, addressing a question posed by Cherapanamjeri, Silwal, Woodruff and Zhou (SODA 2023).
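The object being analyzed is easy to build: a tall sparse matrix whose columns each carry a few random ±1/√*s* entries. The sketch below constructs one with small, arbitrary parameters (not the paper's optimal settings) and checks the norm distortion on one vector from a random *d*-dimensional subspace.

```python
import numpy as np

def sparse_embedding(m, n, s, rng):
    """Sparse embedding matrix: each column has s non-zeros, each ±1/sqrt(s),
    placed in s distinct random rows."""
    S = np.zeros((m, n))
    for j in range(n):
        rows = rng.choice(m, size=s, replace=False)
        S[rows, j] = rng.choice([-1.0, 1.0], size=s) / np.sqrt(s)
    return S

rng = np.random.default_rng(0)
n, d, m, s = 2000, 10, 200, 8            # illustrative sizes, not the paper's bounds
S = sparse_embedding(m, n, s, rng)
W = np.linalg.qr(rng.standard_normal((n, d)))[0]  # orthonormal basis of a d-dim subspace
x = W @ rng.standard_normal(d)            # a vector in the subspace
distortion = np.linalg.norm(S @ x) / np.linalg.norm(x)
```

Applying *S* costs only *s* operations per coordinate of the input, which is why sparse OSEs are attractive compared to dense Gaussian embeddings; the paper's contribution is showing that *m* = (1+θ)*d* rows already suffice.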

**“Understanding the Cluster Linear Program for Correlation Clustering”**

Nairen Cao, Vincent Cohen-Addad, **Euiwoong Lee**, Shi Li, Alantha Newman, Lukas Vogl

Abstract: In the classic Correlation Clustering problem introduced by Bansal, Blum, and Chawla (FOCS 2002), the input is a complete graph where edges are labeled either + or −, and the goal is to find a partition of the vertices that minimizes the number of + edges across parts plus the number of − edges within parts. In recent years, Chawla, Makarychev, Schramm and Yaroslavtsev (STOC 2015) gave a 2.06-approximation by providing a near-optimal rounding of the standard LP, and Cohen-Addad, Lee, Li, and Newman (FOCS 2022, 2023) finally bypassed the integrality gap of 2 for this LP, giving a 1.73-approximation for the problem.

While introducing new ideas for Correlation Clustering, their algorithm is more complicated than *typical* approximation algorithms in two aspects: (1) it is based on two different relaxations with separate rounding algorithms connected by a round-or-cut procedure; (2) each of the rounding algorithms has to separately handle seemingly inevitable *correlated rounding errors*, coming from *correlated rounding* of Sherali-Adams and other strong LP relaxations.

In order to create a simple and unified framework for Correlation Clustering similar to those for *typical* approximate optimization tasks, we propose the *cluster LP* as a strong linear program that might tightly capture the approximability of Correlation Clustering. It unifies all the previous relaxations for the problem. It is exponential-sized, but we show that it can be (1+є)-approximately solved in polynomial time for any є > 0, providing a framework for designing rounding algorithms without worrying about correlated rounding errors; these errors are handled uniformly in solving the relaxation.

We demonstrate the power of the cluster LP by presenting a simple rounding algorithm and providing two analyses, one analytically proving a 1.49-approximation and the other solving a factor-revealing SDP to show a 1.437-approximation. Both proofs introduce principled methods for analyzing the performance of the algorithm, resulting in a significantly improved approximation guarantee. Finally, we prove an integrality gap of 4/3 for the cluster LP, showing that our 1.437 upper bound cannot be drastically improved. Our gap instance directly inspires an improved NP-hardness of approximation with a ratio of 24/23 ≈ 1.042; no explicit hardness ratio was known before.
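The objective being approximated, the number of "disagreements", is simple to state in code. This short sketch (an illustration of the problem definition, not of any algorithm from the paper) evaluates a clustering of a small signed complete graph:

```python
def disagreements(labels, clustering):
    """Correlation Clustering cost: + edges across clusters
    plus - edges within clusters."""
    cost = 0
    for (u, v), sign in labels.items():
        same = clustering[u] == clustering[v]
        if sign == '+' and not same:
            cost += 1
        elif sign == '-' and same:
            cost += 1
    return cost

# A complete graph on 4 vertices: {0, 1, 2} like each other, 3 mostly disagrees.
labels = {(0, 1): '+', (0, 2): '+', (1, 2): '+',
          (0, 3): '-', (1, 3): '-', (2, 3): '+'}
# Clustering {0, 1, 2} together and {3} alone: only the + edge (2, 3) disagrees.
cost = disagreements(labels, {0: 0, 1: 0, 2: 0, 3: 1})
```

An approximation algorithm must output a clustering whose cost is within the stated factor (e.g., 1.437) of the minimum over all partitions.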

**“Connectivity Labeling and Routing with Multiple Vertex Failures”**

Merav Parter, Asaf Petruschka, **Seth Pettie**

Abstract: We present succinct labeling schemes for answering connectivity queries in graphs subject to a specified number of vertex failures. An *f*-vertex/edge fault tolerant (*f*-V/EFT) connectivity labeling is a scheme that produces succinct labels for the vertices (and possibly the edges) of an *n*-vertex graph *G*, such that given only the labels of two vertices *s*, *t* and of at most *f* faulty vertices/edges *F*, one can infer whether *s* and *t* are connected in *G* − *F*. The primary complexity measure is the maximum label length (in bits). The *f*-EFT setting is relatively well understood: [Dory and Parter, PODC 2021] gave a randomized scheme with succinct labels of *O*(log^{3} *n*) bits, which was subsequently derandomized by [Izumi et al., PODC 2023] with Õ(*f*^{2})-bit labels. As both noted, handling vertex faults is more challenging. The known bounds for the *f*-VFT setting are far away: [Parter and Petruschka, DISC 2022] gave Õ(*n*^{1−1/2^{Θ(*f*)}})-bit labels, which is linear in *n* already for *f* = Ω(log log *n*). In this work we present an efficient *f*-VFT connectivity labeling scheme using *poly*(*f*, log *n*) bits. Specifically, we present a randomized scheme with *O*(*f*^{3} log^{5} *n*)-bit labels and a derandomized version with *O*(*f*^{7} log^{13} *n*)-bit labels, compared to an Ω(*f*)-bit lower bound on the required label length. Our schemes are based on a new low-degree graph decomposition that improves on [Duan and Pettie, SODA 2017] and facilitates its distributed representation into labels. This is accompanied by specialized linear graph sketches that extend the techniques of Dory and Parter to the vertex fault setting, which are derandomized by adapting the approach of Izumi et al. and combining it with the hit-miss hash families of [Karthik and Parter, SODA 2021].

Finally, we show that our labels naturally yield routing schemes avoiding a given set of at most *f* vertex failures with table and header sizes of only *poly*(*f*, log *n*) bits. This improves significantly over the linear size bounds implied by the EFT routing scheme of Dory and Parter.
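For intuition, the query these labels answer can be computed centrally with a plain BFS on the surviving graph, as in the sketch below. The point of the labeling scheme is to answer the same query from only *poly*(*f*, log *n*) bits per vertex, with no access to the full graph; this sketch illustrates only the query semantics.

```python
from collections import deque

def connected_avoiding(adj, s, t, faults):
    """Is s connected to t in G - F?  Plain BFS that skips faulty vertices."""
    faults = set(faults)
    if s in faults or t in faults:
        return False
    seen, queue = {s}, deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        for v in adj[u]:
            if v not in seen and v not in faults:
                seen.add(v)
                queue.append(v)
    return False

# Path 0-1-2-3 plus a detour 1-4-3: deleting vertex 2 leaves a route via 4,
# but deleting both 2 and 4 disconnects 0 from 3.
adj = {0: [1], 1: [0, 2, 4], 2: [1, 3], 3: [2, 4], 4: [1, 3]}
one_fault = connected_avoiding(adj, 0, 3, {2})
two_faults = connected_avoiding(adj, 0, 3, {2, 4})
```

A routing scheme must do more still: forward actual packets around the faulty set using only the small per-vertex tables and headers.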