Explanations and Explorations
Comparing proofs for the Jaccard metric
|
| BetterExplained source |
Kalid Azad is the founder of the website Better Explained. It is devoted to explaining mathematical concepts. He also has written two books.
Today we discuss how some proofs provide a concise explanation whereas others promote exploration of related concepts.
Azad’s site has a rich page titled, “Math Proofs vs. Explanations (aka Nutrition vs. Taste).” It argues that the best explanations start with an analogy to a relation that readers already understand. Even if the connection is not sharp, it can be refined once the reader’s attention is solid. This is opposed to a formal proof in which every step is sharp and correct but intuition is wanting.
To this we add the role proofs can play in exploration. If you have one proof of a theorem that you understand, there is value in seeking other proofs that use other ideas. Usually we think of ideas as coming first—as thoughts we refine into a proof. The advantage of starting with a proof is already having certitude and sharpness—you know a recipe that works and now can try varying and augmenting it.
Jaccard Distance as Example
Azad’s page gives examples of proofs for the Pythagorean Theorem and for . It then quotes from William Thurston’s essay “On Proofs and Progress in Mathematics,” which we once mentioned. We will use the example of Jaccard distance
from our previous post. We start with this definition:
now using for the symmetric difference
. So the triangle inequality becomes, for any finite sets
:
We think the proof we gave in the last post is simple and direct and intuitive but maybe not explorative. It first connects the solid understanding that without the denominators this would be the well-known triangle inequality for Hamming distance. To reprise, it considers fixed and varies
to arrive at that simpler fact in three steps:
-
If
contains
elements not in
then removing them subtracts
from both right-hand numerators and both right-hand denominators. Since those fractions are each
(else 1 would be immediately true), the right-hand side goes down.
-
Then we have
and can replace the denominators by
without increasing the right-hand side.
-
Now we have a common denominator and a statement equivalent to the known truth about Hamming distance. Since undoing the first two steps to restore the original
can only increase the right-hand side, (1) is proved in all cases.
This reasoning readily extends to nonnegative measures on
besides simple counting, provided the removal of elements from
makes the same additive-or-proportional change to
as it does to
, and likewise for the other fraction.
Three Snapshot Proofs
The first short proof should join the pantheon of half-page journal papers. Under fair use, here it is in one screenshot:
Perhaps this is too short. We think this proof would have been more satisfying if a few more lines of calculation had been added. Let us divide the region into its inner part
and outer part
and do likewise for
. Then it seems the intent was:
The end uses the symmetric-difference definition of , so perhaps fully expanding this paper’s intent would have been longer. One can also begin with that definition to get a shorter calculation, but it skips over the
step. Indeed, it does not mention
at all, so it was not intended. The proof by Artur Grygorian and Ionut Iacob in a short paper in last October’s College J. Math. strikes us as a similar-style proof.
The second proof comes from a MathOverflow thread. It assigns a variable to each region of the Venn diagram, forms the fractions, and cross-multiplies to obtain “the following monstrosity”:
The fact that no coefficient is negative completes the proof. This is clear from a computer algebra system, but what about why no negative term appears?
We have realized since the last post that the second of two proofs given in the 2016 paper by Sven Kosub, which we linked in that post, is really equivalent to ours. This is easier to see if one just presumes in the following:
Here sub-modularity is a standard property for which Kosub cites the equivalent condition that whenever and
,
This suffices for step 1 of our earlier proof, first taking then
; the rest of that proof needs only that
is monotone (and implicitly
).
A Magical Proof
Now we look at proofs that add ideas. The first one still strikes us as clean and magical. We are computer scientists so it is natural to think of finite sets as binary-valued vectors of length . They have a
in position
precisely when
is in the set. Of course
is the size of the “universe.”
Now let be such a non-zero vector. The key is to use a probabilistic proof. We will show that we can relate the Jaccard distance to the outcome of a simple random experiment. The experiment once selected leads to a simple proof—it only requires the union bound. Recall this is the fact that
for any two events and
.
The cool idea is to look at the permutations of the vector . For a permutation
let us define
to be
Let provided
is the first value that is equal to
. Of course since
is non-empty it follows that this is well defined.
Note is a random variable that depends on the choice of the permutation
. The key is to see that the probability that
when we average over all permutations uniformly is equal to
This follows by noting that there are ways to select the same
and there are
total ways to select an
. Complementing gives us that the probability of
equals
.
Now hark back to our sets . The event
is subsumed by the disjunction of events
regardless of what is. By the simple union bound, the probability of the first event is at most the sum of the probabilities of the latter two events. We have thus proved
The last step is the same as in the proof that Hamming distance is a metric. What does the randomized view gain us? It gains a nice interpretation of as the probability that
and
hash to different values under the min-hash function
for random
. Min-hashing is used all the time—see this book chapter by Jure Leskovec, Anand Rajaraman, and Jeffrey Ullman, with this proof in section 3.3.3.
A Gradient Idea
Atri Rudra suggested to us the “game” of adjusting one element at a time to walk it toward an extreme value. The sets
and
can be adjusted too. We start by assuming the triangle inequality (1) is false and make moves that can only keep it that way, until we reach a case where it is obviously true.
Step 1 of our proof already plays this game by removing from any elements not in
. So we really start the game with
and we want to walk it to
. Simply replacing the denominators
and
in (1) by
was good in step 2 of the proof but is not a legal move in this game.
What we can do legally is add elements from to
: those leave the denominators unchanged but lower the numerators
and
. The interesting case is when we want to add to
an element from
or from
. The former add decreases the numerator
and increases the denominator
while leaving
unchanged, but it increases the numerator
. Let us abstract the right-hand side of (1) to
. Then the former add converts it to
If both moves increase the right-hand side, then we must have
And from
But cross-multiplying gives the contradiction . So one or both moves must always be possible. This grows
to include either all of
or all of
. The rest of the argument to gobble up all of
we leave to you, dear readers.
Compared to the above proofs, this is tedious. But it captures some tensions among the sizes of ,
, and
that may inform intuitions about Jaccard similarity under changes in the sets.
Open Problems
Which proof do you like best for explanation and which for creative impulse?
This is our post. We intended this discussion as number 800 but were surprised to find the simple proof by reduction to triangle for Hamming distance (steps numbered 1-2-3 above). Are we really the first to write it down, with acknowledgment also to Kosub?
[some typo fixes]






Typo fixes: In Gilbert expansion, extraneous union symbol on 1st line. Missing bars in 2nd line denominator. In “Magical Proof”, para. 2, sentence 2, missing verb.
Well noted—thanks very much. Fixed.
P versus NP is considered as one of the most important open problems in computer science. This consists in knowing the answer of the following question: Is P equal to NP? A precise statement of the P versus NP problem was introduced independently by Stephen Cook and Leonid Levin. Since that date, all efforts to find a proof for this problem have failed. Given a positive integer x and a collection S of positive integers, MAXIMUM is the problem of deciding whether x is the maximum of S. We prove this problem is complete for P. Another problem is SUCCINCT-MAXIMUM. SUCCINCT-MAXIMUM contains the instances of MAXIMUM that can be represented by an exponentially more succinct way. We show this succinct version of MAXIMUM must not be in P. Under the assumption of P = NP, we prove the membership of SUCCINCT-MAXIMUM in P is also hold. In this way, we demonstrate the complexity class P is not equal to NP by the reduction ad absurdum rule. Another major complexity classes are LOGSPACE and NLOGSPACE. Whether LOGSPACE = NLOGSPACE is another fundamental question that it is as important as it is unresolved. We show the problem MAXIMUM can be solved in logarithmic space. Consequently, we demonstrate the complexity class LOGSPACE is equal to P and thus, LOGSPACE is equal to NLOGSPACE.
https://www.academia.edu/38039040/LOGSPACE_vs_NLOGSPACE_and_P_vs_NP
You can participate in the debate
https://www.reddit.com/r/compsci/comments/a9l018/is_this_complexity_theory_proof_correct
Thanks