Hilbert’s Lost Problem
Mathematics consists in proving the most obvious thing in the least obvious way—Pólya.
David Hilbert famously presented 23 important open mathematical problems at the International Congress of Mathematicians in Paris in 1900. He intended to include a 24th on the simplicity of proofs.
Today we discuss this problem, which was “lost” until the historian Rüdiger Thiele discovered it in Hilbert’s papers in 2000 and brought it to light in a paper for the American Mathematical Monthly.
I (Ken) have long wondered why Hilbert targeted the number 23 for his list. That some of his problems have broad, vague statements tells me he could have altered their number. To be sure, he discussed only 10 in his actual 1900 lecture; the rest appeared in his long article that year. Until Dick drafted this post and I found Thiele’s article, I had not known about a No. 24—a nice factorial number.
As discussed by Inês Hipolito and Reinhard Kahle, in a special issue of the UK Royal Society Philosophical Transactions A devoted to Hilbert’s 24th, there were only faint traces of it until Thiele’s discovery. They decline to guess Hilbert’s motives for leaving it out, but I venture this: Hilbert could not agree with himself on the terms by which to state it, even broadly. How to quantify simplicity was already the subject of Dick’s draft, into which we blend.
Pólya on Proofs
One simple question is whether a human notion of simpler proof would accord with, say, that of a Martian. George Pólya was one of a vanguard of Hungarian émigré scientists who came to be called The Martians after another of them, the physicist Leo Szilard, gave this answer to Enrico Fermi’s question of why aliens had not been detected when life in the universe should be plentiful:
“They are among us, but they call themselves Hungarians.”
Kidding aside, this could set up a Sapir-Whorf type hypothesis that humans from disparate linguistic groups and mathematical cultures could have different values for proofs. Even if so, the joke would be on non-Hungarians, because Paul Erdős, whose idea of “Proofs from The Book” set the internationally recognized standard of proof elegance, was also a “Martian.”
Pólya’s quote above hits the nail on the head. For an example, consider the theorem that:
Theorem 1. The square root of $2$ is irrational.
The well-known proof begins by supposing not: then $\sqrt{2} = p/q$ for positive integers $p, q$ with no common factor (if they had one, we could cancel it first). Squaring gives $p^2 = 2q^2$, so $p^2$ is even, hence $p$ is even, say $p = 2r$. Then $p^2 = 4r^2$ is divisible by $4$, so $2q^2$ is divisible by $4$, so $q^2$ is even, hence $q$ is even. Now $p$ and $q$ share the factor $2$, contradicting our assumption.
One might claim that this famous proof is not obvious—do you agree?
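As an aside that foreshadows the software discussion below (a sketch of ours, not part of the original argument): the same theorem also has a fully machine-checked proof. Assuming Lean 4 with the Mathlib library, whose lemma for this statement is named `irrational_sqrt_two`, the formal statement and proof can be as short as:

```lean
-- Machine-checked counterpart to Theorem 1.
-- Assumes Lean 4 with Mathlib; `irrational_sqrt_two` is Mathlib's library lemma.
import Mathlib

theorem sqrt_two_is_irrational : Irrational (Real.sqrt 2) :=
  irrational_sqrt_two
```

Whether this one-liner, or the large library development it leans on, counts as “simpler” than the argument above is exactly the kind of question Hilbert’s 24th problem raises.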
A Measure
Now consider the general task of proving statements of this form:

Theorem 2. For all $x$, $f(x) > 0$.

Here $f$ is some explicitly defined function. Think of this as an encoding of an interesting property: for example, for each $n$ there is a prime $p > n$. Or for each planar graph $G$ there is some four coloring of $G$. Or $\ldots$

To make this more concrete, let us base proofs on the Peano axioms for arithmetic, and suppose that $f$ is well-defined in this system. Note that proving $f$ to be total computable is not the same as proving that all its values are positive, as Theorem 2 asserts. In case Theorem 2 is provable, we have two key parameters:

- $P$, the length of a shortest proof of Theorem 2; and
- $D$, the length of the description of the statement being proven, including the definition of $f$.

Our proof simplicity measure is then

$$M = \frac{P}{D}.$$
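To see how such an encoding can work in a concrete case, here is one possible way (a sketch of our own, not something from Hilbert’s note) to cast the Four-Color problem in the template of Theorem 2:

$$
f(x) =
\begin{cases}
\#\{\text{proper four colorings of } G_x\} & \text{if } x \text{ encodes a planar graph } G_x,\\
1 & \text{otherwise.}
\end{cases}
$$

This $f$ is plainly total computable, by brute force over all $4^{|V(G_x)|}$ candidate colorings, yet proving that every value is positive is the Four-Color theorem itself. For this instance the description $D$ is tiny, while every known proof makes $P$ enormous.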
Some Instances
Here are some keys to understanding the value of $M$ intuitively:

- For known provable problems we might have that $M$ is not too large.
- For classic problems like the Four-Color problem we might have that $M$ is huge. Think of the proof that arises in this case.
- For many chess positions where one side can win, say checkmate in 30 moves, the proof of that is likewise huge. Crafting chess problems where the proof is both crisp and hard to find is a centuries-old art.
- For open problems, $M$ is currently infinite.
- But for practical software we get something bounded. The trouble here is that $D$ is potentially huge and could be the size of the program itself.
For the last case, consider a program that calculates your tax payment based on the US tax code. The tax code has wording issues like this:
An apparently insignificant discrepancy between the wording in the Statutes at Large and the wording in some editions of the U.S. Code with respect to a provision of the Internal Revenue Code (specifically, the words “any papers” in the first sentence of 6104(a)(1)(A)) is described by the U.S. Court of Appeals for the District of Columbia in the case of Tax Analysts v. Internal Revenue Serv., 214 F.3d 179 (D.C. Cir. 2000), in footnote 1. According to the Court, some versions of the U.S. Code show the text as “any paper” while the Statutes at Large version shows the text as “any papers.”
Thus the proof could be the same size as the program itself, and $M$ could be close to $1$.

Case: $M \gg 1$. This is the usual proof of a classic problem like the four-color problem. Huge proof.

Case: $M \approx 1$. This is like a simple problem, or like a proof of a complex program with tons of bits in its description.
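As a toy illustration of the two cases (our own sketch; the sizes are invented purely to show the regimes, not measured from any real proof or program):

```python
# Toy sketch of the simplicity measure M = P / D.
# All numbers below are invented, only to illustrate the two cases above.

def simplicity(proof_len: int, description_len: int) -> float:
    """M = P / D: length of a shortest known proof divided by the
    length of the statement's description (including the definition of f)."""
    return proof_len / description_len

# Four-Color-style case: short statement, enormous proof  ->  M >> 1.
print(simplicity(proof_len=10**9, description_len=10**3))      # 1000000.0

# Tax-program-style case: huge description, proof of comparable size  ->  M ~ 1.
print(simplicity(proof_len=2 * 10**7, description_len=10**7))  # 2.0
```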
Proofs and Software
My famous—infamous?—paper with Alan Perlis and Rich de Millo raised the related issue in the first sentence of our abstract:
It is argued that formal verifications of programs, no matter how obtained, will not play the same key role in the development of computer science and software engineering as proofs do in mathematics.
This argument, made over four decades ago, has been attacked by critics as wrong. Take a look at this for comments about the recent debate on the paper moderated by Harry Lewis.
We claim that we were correct, but for a different reason from the one we gave in the original paper: we got it right, but for the wrong reason. The real reason is alluded to in this statement by Tony Hoare, whose programme of proofs for programs partly spurred our paper:
Ten years ago, researchers into formal methods (and I was the most mistaken among them) predicted that the programming world would embrace with gratitude every assistance promised by formalisation to solve the problems of reliability that arise when programs get large and more safety-critical. Programs have now got very large and very critical—well beyond the scale which can be comfortably tackled by formal methods. … It has turned out that the world just does not suffer significantly from the kind of problem that our research was originally intended to solve.
See also his 1996 position paper, “How Did Software Get So Reliable Without Proof?”
Open Problems
Does this measure shed some light on the problems of program verification? Is it useful? Would it have satisfied Hilbert? Hilbert’s handwritten note discovered by Thiele showed him thinking in terms of proofs via polynomial invariants and equations—hence maybe not at a brute level of counting symbols.




Since NP=NP-complete=P=P-complete, as I have proven, there must be some automated (= P-complete-solution) “proof simplicity criteria”-discovery/detection method, too; that’s for sure.
And it might be that Esst (= expressions-systems satisfaction theories) give the simplest proofs; I don’t know, though.
There is an important connection that should be added to this post, namely the subject of proof complexity, initiated by Cook and Reckhow in the mid-1970s.
Cook and Reckhow’s definition encompassed the complexity of proofs using an arbitrary method for writing down proofs, which they called a proof system, and they allowed proofs of membership in arbitrary languages. They particularly focused attention on proofs of propositional tautologies and on whether there is an efficient way of writing them, but their notion also applies to first-order proof systems for other objects, such as proofs of first-order statements in Peano Arithmetic like the ones that you discuss here.
Their general question of simplicity/efficiency was the measure of the size P of the proof as a function of the description size D of the statement being proven. Rather than just the measure M=P/D, they asked whether P is polynomial in D, and whether that is impossible for the language of propositional logic tautologies. (For propositional proofs, the question of whether that is true in every proof system is equivalent to whether NP!=coNP.)
Though Cook and Reckhow focused on propositional tautologies, there is a very nice connection and some direct correspondences to proofs of first-order Pi_2 sentences, like several of the ones in your post, in restricted versions of Peano Arithmetic, which are broadly called Bounded Arithmetic, that Sam Buss defined and elucidated in his PhD dissertation.
If you meant by “NP!” mere “NP,” yes, P = NP = coNP, as I have repeatedly shared with you on this site: pls search this site for Mr. Corvalan’s proof supported by my comment and for my own proofs.
The problem about shorter proofs must be relevant to Algorithmic Information Theory. Given a hypothesis H and a conclusion C, a proof is similar to a logical system or a set of logical sentences S such that S(H) -> C. The problem is all about finding the shortest S. As a problem this is not decidable; see Kolmogorov complexity.
A more relaxed and approachable version of the problem is whether, for a general problem H with proof S, there is some special instance h of the problem with a shorter proof S’. For your example, a general proof about irrational numbers of the form sqrt(n) must be more complex than the proof about sqrt(2).
I’m afraid I’ve refuted by my DMd (= diagonal-methods disproofs) the both DM-line SPs(= self-referential paradoxes)-validities and halting-problems undecidability; hence, no Kolmogorov complexity notion works, either. For the above-mentioned DMd, pls see the Profile-link at @koitiluv1842 on Twitter.
M=P/D is an “objective” measure. But consider a more subjective notion of “simplicity” in terms of ease of understanding a proof. No matter how (objectively) “simple” it is, the above proof of sqrt(2)’s irrationality has at least 3 places where a student might squirm a bit. The first is the assumption that p,q lack common factors; note that the proof as given actually inserts a parenthetical remark explicitly justifying that assumption. The other two “lemmas” that might cause a student to wonder are “Even(p^2) -> Even(p)” and (perhaps) “Even(p) -> 4|p^2”. It seems to me (wearing my teacher’s hat) that those need as much “extra” justification as p,q lacking common factors. This suggests that a subjectively simple proof is one that requires no additional justifications for such lemmas. Of course, taken to the limit, this means that the subjectively simplest proof is one that spells out every detail, thus becoming longer and longer and… less (objectively) simple.