SODA 2023

March 1, 2023

tags: accepted papers, Anindya De, long lists of links, SODA 2023, trace reconstruction problem

Traces of strings, plus ways of tracing accepted papers

Anindya De was at Northwestern University and is now at the University of Pennsylvania.

He was advised by two of the top advisors there ever were: Luca Trevisan and Umesh Vazirani.

Traces

I recently ran across a great paper by Anindya titled Approximate Trace Reconstruction from a Single Trace. It is co-authored with Xi Chen (Columbia University), Chin Ho Lee (Harvard University), and Rocco Servedio and Sandip Sinha (Columbia University). Notice that we did not put an Oxford comma between Servedio and Sinha as they are both from Columbia. The paper appeared at SODA 2023 this January.

Here are pointers to the almost 200 papers that were in the conference. I put this together before discovering the site conference-publishing.com, which, as mentioned in my STOC 2023 post, generates paper lists with links for a host of conferences. So I did all the following links myself. Do scroll past the list to the bottom to read a little more about traces, which Ken and I put together.

The Trace Result

The trace problem begins by sending a binary string ${x}$ of length ${n}$ through a deletion channel with parameter ${\delta \in [0,1]}$. Each bit ${x_i}$ entering the channel survives with probability ${1 - \delta}$ to be part of the output string ${y}$. That is, ${x_i}$ is deleted with probability ${\delta}$. The deletions are independent. For an unknown string ${x}$, the problem is:
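To make the channel concrete, here is a minimal Python sketch of one run of it. The function name `deletion_channel` and the optional `rng` argument are our own choices for illustration, not anything taken from the paper.

```python
import random

def deletion_channel(x, delta, rng=None):
    """Return one trace of the bit string x.

    Each character of x is deleted independently with probability delta,
    so it survives with probability 1 - delta. The survivors keep their order.
    """
    rng = rng or random.Random()
    # rng.random() is uniform on [0, 1), so it is < delta with probability delta.
    return "".join(bit for bit in x if rng.random() >= delta)

if __name__ == "__main__":
    x = "1011001110"
    # Three independent traces of the same x (k = 3 in the notation of the post).
    for _ in range(3):
        print(deletion_channel(x, 0.3))
```

Every trace is a subsequence of ${x}$, and nothing in the output marks where the deletions happened. That is what makes reconstruction hard.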

Given ${k}$ strings ${y_1,\dots,y_k}$ produced by ${k}$ runs of the channel on ${x}$, reconstruct ${x}$ if possible. Else, calculate a binary string ${x'}$ of length ${n}$ that minimizes a distance metric ${d(x,x')}$. The measure of choice is the length ${\ell(x,x')}$ of the longest common subsequence (not necessarily contiguous) of ${x}$ and ${x'}$: maximizing it corresponds to minimizing their edit distance when only insertions and deletions are counted.
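The quantity ${\ell(x,x')}$ can be computed by the textbook dynamic program for longest common subsequence. The sketch below is ours, with hypothetical function names. It also shows the standard identity linking LCS length to the distance that counts only insertions and deletions.

```python
def lcs_length(a, b):
    """Length of the longest common (not necessarily contiguous) subsequence."""
    prev = [0] * (len(b) + 1)  # LCS lengths of the previous prefix of a vs. prefixes of b
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def indel_distance(a, b):
    """Edit distance using insertions and deletions only (no substitutions)."""
    return len(a) + len(b) - 2 * lcs_length(a, b)
```

For two strings of the same length ${n}$ this distance is ${2n - 2\ell}$, so a larger LCS always means a smaller distance.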

As indicated by its title “Approximate Trace Reconstruction from a Single Trace,” the paper tackles the extreme case ${k=1}$. Of course one cannot reconstruct ${x}$ (unless no deletions occur so ${y = x}$) so the game is to find ${x'}$ that are most likely to have produced the lone observed ${y}$. The scoring function takes the expectation of ${\ell(x',x)}$ over both the generation of ${y}$ from the true ${x}$ and the run of the algorithm guessing ${x'}$ from ${y}$. There are two main questions:

How well does the algorithm perform—relative to theoretically optimal choices given ${y}$ —when ${x}$ itself is generated uniformly at random?
How well does the algorithm perform when ${x}$ is generated adversarially? Note that ${y}$ is still probabilistic, and the performance of both the theoretical optimal algorithm and their algorithm is evaluated based on the distribution of ${y}$ for the fixed (unseen) ${x}$.
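The scoring rule behind both questions can be estimated by simulation. The sketch below is only an illustration under our own assumptions. In particular, the guesser `stretch_trace` is a hypothetical baseline we made up, not the algorithm from the paper. Because it is deterministic, the expectation runs only over the channel's randomness.

```python
import random

def deletion_channel(x, delta, rng):
    """One trace of x: each bit is deleted independently with probability delta."""
    return "".join(bit for bit in x if rng.random() >= delta)

def lcs_length(a, b):
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if ca == cb else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def stretch_trace(y, n):
    """Hypothetical guesser: stretch y to length n by repeating its bits evenly."""
    if not y:
        return "0" * n  # nothing was observed, so fall back to a fixed guess
    return "".join(y[i * len(y) // n] for i in range(n))

def estimate_expected_lcs(x, delta, guesser, trials=200, seed=0):
    """Monte Carlo estimate of the expected LCS length between the guess and x.

    x is held fixed; the average is taken over independent traces y of x.
    """
    rng = random.Random(seed)
    n = len(x)
    total = 0
    for _ in range(trials):
        y = deletion_channel(x, delta, rng)
        total += lcs_length(guesser(y, n), x)
    return total / trials

if __name__ == "__main__":
    rng = random.Random(1)
    x = "".join(rng.choice("01") for _ in range(64))  # an "average-case" x
    print(estimate_expected_lcs(x, 0.5, stretch_trace))
```

Drawing a fresh random ${x}$ for each trial would approximate the average-case question. Keeping one hard ${x}$ fixed, as the function above does, matches the adversarial one.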

These questions are posed for small, medium, and large values of ${\delta}$. When the deletion probability is close to ${1}$, the strings ${y}$ are most often tiny. One would think they offer no help in coming close to ${x}$. However, they do help efficient algorithms come close to the optimal policy for a worst-case chosen ${x}$. The paradoxical results of their paper, in their own words (but reversing their order), are:

In the average-case setting, having access to a single trace is provably not very useful: no algorithm, computationally efficient or otherwise, can achieve significantly higher accuracy given one trace that is ${o(n)}$ bits long than it could with no traces.
Having access to a single trace is already quite useful for worst-case trace reconstruction: an efficient algorithm can perform much more accurate reconstruction, given one trace that is even only a few bits long, than it could given no traces at all.

The deep point is that when ${x}$ as well as ${y}$ is random, seeing ${y}$ gives little advantage to either the optimal strategy (which does not know ${x}$) or their algorithm. By contrast, when ${x}$ is fixed, the knowledge of ${y}$ is more valuable to the optimal strategy and separates it from the case of not seeing ${y}$ at all. However, the profit given by even a short ${y}$ is one that can be captured by a complexity-limited deterministic algorithm that sees only ${y}$. That’s our attempt at an intuitive takeaway; as always we invite readers to consult the paper in detail.

Open Problems

Comparing my list of pointers to the papers from SODA, which was a bit of trouble to create by hand, to the STOC’23 output from the conference-publishing site, leads to a curious question:

Do we scan lists of papers more by looking for subject words in their titles or looking for authors we know?

Well, I have not found SODA’23 on that website, where authors too would be given; for me, copying the authors would more than double the manual work.
