
Problems Better Than Solutions

March 21, 2023


Defining the problem often means more than solving it

Avrim Blum is the CAO of TTIC; he earned his degrees at MIT and then spent nearly 25 years at CMU:

  • CAO is Chief Academic Officer;

  • TTIC is Toyota Technological Institute at Chicago;

  • MIT is Massachusetts Institute of Technology;

  • CMU is Carnegie Mellon University.

See his own site for details on TTIC. Quoting his TTIC site, his main research interests are in:

Theoretical Computer Science and Machine Learning, including Machine Learning Theory, Approximation Algorithms, Algorithmic Game Theory, Privacy, and Algorithmic Fairness. Some current specific interests include multi-agent and distributed learning, learning systems that know when they don’t know, and relations among fairness, accuracy, and incentives.

At the very bottom of his webpage is a note about RPP. No, this is not a new complexity class. Let’s talk about what it is.

Reasonable Person

Avrim endorses the CMU SCS Reasonable Person Principle (RPP), which basically asks to be reasonable and assume everyone else is doing likewise.

  • Everyone will be reasonable.

  • Everyone expects everyone else to be reasonable.

  • No one is special.

  • Do not be offended if someone suggests you are not being reasonable.

This seems reasonable.

Reasonable Problem

Some of the key results in Computer Science are new algorithms—that is, new methods for solving old problems. For instance, faster matrix multiplication—of course due to Volker Strassen. But some of the coolest results are not new solutions. Rather, they are entirely new problems.

Here is an example of a reasonable behavior that maybe people in my walk of life have done for centuries, but that can cause problems in today’s world of available data and rapid updates. Suppose 98 students take a midterm exam but two students have to take it later because they traveled with the school band to March Madness. The professor posts the class average of 75.2 so the class knows the exam was reasonable. Then the other two make up the exam and the professor reposts the updated average, 74.3.

Innocent enough? Actually not. Everyone can deduce that the two late takers both failed the exam. This kind of issue arises in myriad walks of life where summaries of aggregate information are made available for purposes of research, transparency, and social obligation to have an informed populace. But updates to the aggregate can in some important cases leak information about individual data.
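The deduction takes nothing more than arithmetic on the two posted averages. A short script makes it concrete, using the numbers from the example above (the passing grade of 60 is a hypothetical assumption for illustration):

```python
# Deduce the late takers' combined score from the two posted averages.
n_first, avg_first = 98, 75.2    # first posting: the 98 on-time students
n_all, avg_all = 100, 74.3       # second posting: all 100 students

total_first = n_first * avg_first   # sum of the 98 on-time scores
total_all = n_all * avg_all         # sum of all 100 scores

late_sum = total_all - total_first  # combined score of the two late takers
late_avg = late_sum / 2             # their average

print(f"Late takers: {late_sum:.1f} total, average {late_avg:.1f}")
# Their average is 30.2. With a passing grade of 60, the only way either
# passed is if the other scored essentially zero -- so both failed.
```

Nothing about the individual on-time scores is needed; the aggregate update alone leaks the late takers' result.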

Defining this problem and criteria for combatting it was one of the most important results by Avrim and his coauthors. It is like the RPP—the definition is more critical than the implementation of the principle.

Recognition

We have previously blogged about differential privacy, including last year about the following recognition, from which we quote:

Avrim Blum, Toyota Technological Institute at Chicago; Irit Dinur, Weizmann Institute; Cynthia Dwork, Harvard University; Frank McSherry, Materialize Inc.; Kobbi Nissim, Georgetown University; and Adam Davison Smith, Boston University received the 2021 ACM Paris Kanellakis Theory and Practice Award for their fundamental contributions to the development of differential privacy.



Here is a survey on this notion and a discussion of implementations of this idea. Quoting further:

The theory of differential privacy gave a conceptually new, simple, and mathematically rigorous definition of privacy that allows for the development of mechanisms which provably protect individual information while at the same time enabling statistical analyses of databases. […It] has important advantages over previously used methods: it requires no assumptions about the knowledge of the attacker and its computational capabilities; and it allows for formal analysis of privacy under composition. It is relatively easy to explain and use and has broad applicability. Differential privacy has been employed by a number of large companies and start-ups and notably the 2020 US Census, to make available data for analyses, including machine learning, with applications from marketing to social science research.
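The workhorse mechanism behind many of these deployments is the Laplace mechanism: add noise calibrated to the query's sensitivity. Here is a minimal sketch for a counting query; the dataset and the choice of ε are made up for illustration:

```python
import math
import random

def laplace_noise(scale: float) -> float:
    # Inverse-CDF sampling from the Laplace(0, scale) distribution.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(data, predicate, epsilon: float) -> float:
    # A counting query has sensitivity 1: adding or removing one person
    # changes the count by at most 1, so Laplace noise with scale
    # 1/epsilon yields epsilon-differential privacy for this query.
    true_count = sum(1 for x in data if predicate(x))
    return true_count + laplace_noise(1.0 / epsilon)

# Hypothetical data: exam scores; query = how many failed (score < 60).
random.seed(0)
scores = [55, 72, 88, 40, 91, 63, 59, 77]
noisy = private_count(scores, lambda s: s < 60, epsilon=0.5)
print(f"Noisy failing count: {noisy:.2f}")  # true count is 3, plus noise
```

The noisy answer is useless for pinning down any one student, yet averaging many such answers (or using a larger ε) recovers accurate statistics—exactly the trade-off the definition formalizes.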

Another Definition

The aspect of “enabling statistical analysis of databases” is exemplified by this STOC 2008 paper by Blum with Katrina Ligett and Aaron Roth while at CMU. Their introduction states:

We also introduce a new concept, distributional privacy, which makes explicit the notion that when run on a database drawn from a distribution, privacy-preserving mechanisms should reveal only information about the underlying distribution, and nothing else.

This kind of definition already had a 25-year track record from the theory of zero-knowledge protocols. More precisely stated in this case:

Given a distribution {D} over database points, a database privatization mechanism satisfies distributional privacy if with high probability, drawing an entirely new database from {D} does not change the probability of any outcome of the privatization mechanism by more than some small amount.
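The two definitions differ in which pairs of databases the mechanism must treat almost identically: differential privacy compares databases differing in a single record, while distributional privacy compares independent draws from the same distribution. A toy sketch of the two comparisons, with an invented domain and a uniform distribution standing in for {D}:

```python
import random

DOMAIN = ["a", "b", "c", "d"]   # made-up data domain; D is uniform over it

def fresh_draw(n: int, rng: random.Random) -> list:
    # Distributional privacy: compare the mechanism's behavior on two
    # *independent* databases drawn from the same distribution D.
    return [rng.choice(DOMAIN) for _ in range(n)]

def neighbor(db: list, rng: random.Random) -> list:
    # Differential privacy: compare behavior on databases that differ
    # in a single record.
    out = list(db)
    out[rng.randrange(len(out))] = rng.choice(DOMAIN)
    return out

rng = random.Random(42)
db1 = fresh_draw(6, rng)
db2 = fresh_draw(6, rng)   # may differ from db1 in every position
nb = neighbor(db1, rng)    # differs from db1 in at most one position
print(db1, nb, db2)
```

Since a fresh draw can change every record at once, requiring indistinguishability over fresh draws is a more demanding condition than over single-record neighbors—which is the hierarchy the paper establishes.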

Having an inkling for the right definition is the best guidance toward a payoff, which in this case improved over an already successful idea:

We show that distributional privacy is a strictly stronger guarantee than differential privacy by showing that any mechanism that satisfies distributional privacy also satisfies differential privacy, but that there are some functions that can be answered accurately while satisfying differential privacy, and yet reveal information about the particular database (although not about any particular database element) that is not “distributional.”

Open Problems

I must say that I was involved decades ago in other attempts to define notions of security and privacy. See this joint paper with David Dobkin and Anita Jones, titled “Secure Databases: Protection Against User Influence.” We missed the better boat, but we did try back in 1979 to create definitions of security.

One Comment
  1. March 22, 2023 8:46 am

    A few random reflections —

    People often comment on the analogy between identity, privacy, security tasks and the job of biological immune systems — distinguishing gen from antigen (and gen-erating anti-bodies), which raises the associated logical problem of defining natural kinds.

    The statistical problem brings to mind the relation between fairness in sampling and fairness in ethics, for instance John Rawls and the Veil of Ignorance.
