
Some Statistical Gamma Fun

October 19, 2021


Nothing takes place in the world whose meaning is not that of some maximum or minimum. — Leonhard Euler

Cantor’s Paradise page

Kasper Müller is a mathematics and data science writer for Medium, where he contributes primarily to the blogs Cantor’s Paradise and Towards Data Science. He wrote a nice article last April titled, “The Beautiful Gamma Function and the Genius Who Discovered It.”

Today we discuss the relevance of the gamma function to statistics and use statistics to suggest a new kind of estimate for it.

The “Genius” whom Müller refers to is Leonhard Euler. Euler proved that for all integers {n \geq 0},

\displaystyle  n! = \int_0^1 (-\ln s)^n ds = \int_0^\infty t^n e^{-t} dt,
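As a sanity check on this identity, here is a quick sketch in Python (standard library only; the helper name `gamma_integral` and the cutoff at {t = 60} are my choices) that approximates the second integral with the trapezoid rule and compares it against {n!}:

```python
import math

def gamma_integral(n, upper=60.0, steps=200_000):
    """Trapezoid-rule approximation of the integral of
    t**n * exp(-t) over [0, upper]; the tail beyond `upper`
    is negligible for small n."""
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        t = i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * (t ** n) * math.exp(-t)
    return total * h

for n in range(6):
    print(n, round(gamma_integral(n), 6), math.factorial(n))
```

The printed pairs agree to several decimal places, as the discretization error of the trapezoid rule at this step size is far below the integer gaps between factorials.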

where the latter equation uses the substitution {s = e^{-t}}. The right-hand side produces a value for any complex number {z = x + iy} in place of {n} provided {x > -1}. This leads to the formal definition

\displaystyle  \Gamma(z) = \int_0^\infty t^{z-1} e^{-t} dt,

whose analytic extension is defined everywhere except for {z = 0, -1, -2, -3,\dots}. Because {\Gamma(z)} has no zeroes, its reciprocal is an entire function. One neat value is {\Gamma(\frac{1}{2}) = \sqrt{\pi}}. We will be mainly concerned with ratios of two values of {\Gamma}.

What is Gamma For?

For all {z} except the non-positive integers, {\Gamma} obeys the formula

\displaystyle  \frac{\Gamma(z+1)}{\Gamma(z)} = z.

Of course, this follows from {\Gamma(z) = (z-1)!} for positive integers {z}. Also

\displaystyle  \frac{\Gamma(z+2)}{\Gamma(z)} = \frac{\Gamma(z+2)}{\Gamma(z+1)}\cdot\frac{\Gamma(z+1)}{\Gamma(z)} = (z+1)z.

In general, for all {a > 0},

\displaystyle  \frac{\Gamma(z+a)}{\Gamma(z)} \sim z^a \ \ \ \ \ (1)

but there is a discrepancy. This discrepancy, and the lack of any simple explicit formula for {\Gamma(z)}, have always made the {\Gamma} function seem opaque to me. Two notable values are {\Gamma(\frac{1}{2}) = \sqrt{\pi}} and {\Gamma(\frac{3}{2}) = \frac{\sqrt{\pi}}{2}}.
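The discrepancy in (1) is visible numerically. A small Python sketch (using `math.lgamma` to avoid overflow at large {z}; the function name is mine) shows the relative gap for {a = \frac{1}{2}} shrinking roughly like {\frac{1}{8z}}:

```python
import math

def ratio_error(z, a=0.5):
    """Relative gap between Gamma(z+a)/Gamma(z) and z**a,
    computed via lgamma so that large z does not overflow."""
    log_ratio = math.lgamma(z + a) - math.lgamma(z)
    return abs(math.exp(log_ratio - a * math.log(z)) - 1.0)

for z in (1.0, 10.0, 100.0, 1000.0):
    print(z, ratio_error(z))
```

Each tenfold increase in {z} cuts the relative error by about a factor of ten.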

The {\Gamma} function is not even the only continuous interpolation of the factorial function. It is the unique one satisfying {\Gamma(z+1) = z\Gamma(z)} whose logarithm is a convex function. This is the first of many reasons given in Müller’s article for {\Gamma} to be salient and beautiful, culminating in its relation to the Riemann zeta function given by

\displaystyle  \frac{\Gamma(\frac{s}{2})\zeta(s)}{\pi^{s/2}} = \frac{\Gamma(\frac{1-s}{2})\zeta(1-s)}{\pi^{(1-s)/2}}.

Yet the log-convex uniqueness was proved only 99 years ago, and none of these tell me at a flash what the {\Gamma} function is.

What is the simplest label for its corner of the sky? The leading example is the formula for the volume of a sphere of radius {r} in {n} dimensions:

\displaystyle  V_n = \frac{\pi^{n/2}}{\Gamma(\frac{n}{2} + 1)}r^n.

But I wonder whether a different application is more fundamental. Since we are dealing with {a = \frac{1}{2}} already here, let us define the function

\displaystyle  \Gamma_{1/2}(z) = \frac{\Gamma(z+\frac{1}{2})}{\Gamma(z)}.

Noting {\Gamma_{1/2}(z) \sim z^{1/2}} via (1), this is a tweak of the square-root function. Here are some values of it:

\displaystyle  \begin{array}{rcl}  \Gamma_{1/2}(1) &=& \frac{\Gamma(1.5)}{\Gamma(1)} = \frac{\sqrt{\pi}/2}{1} = \frac{\sqrt{\pi}}{2}\\ \Gamma_{1/2}(2) &=& \frac{\Gamma(2.5)}{\Gamma(2)} = \frac{3\sqrt{\pi}/4}{1} = \frac{3\sqrt{\pi}}{4}\\ \Gamma_{1/2}(3) &=& \frac{\Gamma(3.5)}{\Gamma(3)} = \frac{15\sqrt{\pi}/8}{2} = \frac{15\sqrt{\pi}}{16}\\ \Gamma_{1/2}(4) &=& \frac{\Gamma(4.5)}{\Gamma(4)} = \frac{105\sqrt{\pi}/16}{6} = \frac{35\sqrt{\pi}}{32}\\ \end{array}
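These values are easy to confirm with the standard library's `math.gamma` (the helper name `gamma_half` is mine):

```python
import math

def gamma_half(z):
    """Gamma_{1/2}(z) = Gamma(z + 1/2) / Gamma(z)."""
    return math.gamma(z + 0.5) / math.gamma(z)

sqrt_pi = math.sqrt(math.pi)
# Closed-form values from the table above
expected = {1: sqrt_pi / 2, 2: 3 * sqrt_pi / 4,
            3: 15 * sqrt_pi / 16, 4: 35 * sqrt_pi / 32}
for z, val in expected.items():
    print(z, gamma_half(z), val)
```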

Here is the significance:

For integer {n \geq 1}, the expected Euclidean norm of a vector of {n} independent samples from the standard Gaussian distribution is

\displaystyle  \sqrt{2}\cdot\Gamma_{1/2}(\frac{n}{2}). \ \ \ \ \ (2)

That’s it: Gamma gives the norm of Gaussians. The norm is of order {\sqrt{n}} but not exactly. The {\Gamma} function gives it exactly.
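Claim (2) is easy to test by simulation. Here is a sketch in Python, standard library only; the trial count and seed are arbitrary choices of mine:

```python
import math
import random

def exact_norm(n):
    """sqrt(2) * Gamma_{1/2}(n/2), the exact expected norm from (2)."""
    return math.sqrt(2) * math.gamma((n + 1) / 2) / math.gamma(n / 2)

def mean_norm(n, trials=100_000, seed=1):
    """Monte Carlo estimate of E[||z||] for a vector of n
    independent standard Gaussian samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)))
    return total / trials

for n in (1, 2, 3, 10):
    print(n, exact_norm(n), mean_norm(n))
```

For {n = 1} the exact value is {\sqrt{2/\pi} \approx 0.798}, the mean absolute value of a single Gaussian; the Monte Carlo column matches each exact value to about two decimal places.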

An Inferior But Curious Estimate

The distribution of the norm of {n} independent Gaussians is called the chi distribution. Its square is the better-known chi-squared distribution. This idea is used in the statistical chi-squared test, but what follows is simpler.

We let {X^2} stand for the square norm divided by {n}, so that {X} stands for the Euclidean norm divided by {\sqrt{n}}. From (2) we have

\displaystyle  E[X] = \sqrt{\frac{2}{n}}\Gamma_{1/2}(\frac{n}{2}).

We will estimate {E[X]} a different way and use that to estimate {\Gamma_{1/2}}. First we note that since the vector entries {z_i} are independent and normally distributed, we have the exact values


\displaystyle  \begin{array}{rcl}  E[X^2] &=& \frac{1}{n}\sum_{i=1}^n E[z_i^2] = 1\\ Var[X^2] &=& \frac{1}{n^2} \sum_{i=1}^n Var[z_i^2] = \frac{1}{n^2}\sum_{i=1}^n (E[z_i^4] - E[z_i^2]^2) = \frac{1}{n}(3 - 1) = \frac{2}{n}. \end{array}
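These two moments can themselves be checked by simulation; a quick Python sketch (the sample size and seed are arbitrary choices of mine):

```python
import random

def x2_moments(n, trials=100_000, seed=7):
    """Monte Carlo estimates of E[X^2] and Var[X^2], where X^2 is
    the squared norm of n standard Gaussians divided by n."""
    rng = random.Random(seed)
    vals = [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)) / n
            for _ in range(trials)]
    mean = sum(vals) / trials
    var = sum((v - mean) ** 2 for v in vals) / trials
    return mean, var

m, v = x2_moments(10)
print(m, v)  # should be close to 1 and 2/10
```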


Since we have {E[X^2]}, computing either {E[X]} or {Var[X]} suffices to get the other, by the relation {Var[X] = E[X^2] - E[X]^2}. Our also having {Var[X^2]} enables estimating {Var[X]} via the delta method, in a particular form I noticed here. The derivation requires no special properties of {X}:

\displaystyle  Var[X^2] \approx 4E[X]^2 Var[X] - Var[X]^2. \ \ \ \ \ (3)


For our particular {X} with {Var[X] = 1 - E[X]^2}, this yields a quadratic equation in {y = E[X]^2}:

\displaystyle  \frac{2}{n} \doteq 4y(1 - y) - (1-y)^2, \quad\text{so}\quad 5y^2 - 6y + 1 + \frac{2}{n} = 0 \quad\text{so}\quad y = \frac{6 + \sqrt{16 - \frac{40}{n}}}{10}.

This yields

\displaystyle  \frac{2}{n}\Gamma_{1/2}^2(\frac{n}{2}) \doteq \frac{3}{5} + \sqrt{\frac{4}{25} - \frac{2}{5n}}.

Changing variables to {z = \frac{n}{2}} and rearranging, we get the estimate

\displaystyle  \Gamma_{1/2}(z) \doteq \sqrt{\frac{3z}{5} + \sqrt{\frac{4z^2}{25} - \frac{z}{5}}}.

It has been traditional to estimate what we would call {\Gamma_{1/2}(x+\frac{1}{2})} instead, so substituting {z = x + \frac{1}{2}} into our estimate we finally get:

\displaystyle  \frac{\Gamma(x+1)}{\Gamma(x+\frac{1}{2})} \sim \sqrt{0.6x + 0.3 + 0.2\sqrt{4x^2 - x - 1.5}} ~. \ \ \ \ \ (4)

As an estimate, this is barely competitive with the simple {\sqrt{x + 0.25}} and far inferior to

\displaystyle  (x^2 + 0.5x + 0.125)^{1/4},

which is the first of several estimates of the form {p_k(x)^{1/(2k)}} given by Cristinel Mortici in 2010. But it is curious that we got a formula with nested radicals and non-dyadic coefficients from a simple statistical estimate. It makes us wonder whether formulas with nested radicals can be tuned for greater accuracy, and whether this might feed back into statistical estimation.
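To see how the three estimates stack up, here is a short Python comparison of their absolute errors against the true ratio (computed via `math.lgamma` for stability); the nested-radical function implements the statistical estimate with {z = x + \frac{1}{2}} substituted in:

```python
import math

def true_ratio(x):
    """Gamma(x+1) / Gamma(x+1/2), via lgamma for numerical stability."""
    return math.exp(math.lgamma(x + 1.0) - math.lgamma(x + 0.5))

def nested_radical(x):
    """The nested-radical statistical estimate of the ratio."""
    return math.sqrt(0.6 * x + 0.3 + 0.2 * math.sqrt(4 * x * x - x - 1.5))

def simple(x):
    """The simple classical estimate sqrt(x + 1/4)."""
    return math.sqrt(x + 0.25)

def mortici(x):
    """First of Mortici's p_k(x)**(1/(2k)) family of estimates."""
    return (x * x + 0.5 * x + 0.125) ** 0.25

for x in (1.0, 2.0, 5.0, 10.0):
    t = true_ratio(x)
    print(x, abs(nested_radical(x) - t),
          abs(simple(x) - t), abs(mortici(x) - t))
```

At {x = 10} the errors come out roughly {2 \times 10^{-3}}, {5 \times 10^{-4}}, and {3 \times 10^{-5}} respectively, matching the ranking described above.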

Open Problems

Can vectors of Gaussian variables be leveraged to say further interesting things about the gamma function and its applications? What are your favorite properties of the gamma function?

[fixed missing “ds” in intro, typo n–>sqrt(n) at end of sentence with “divided by”]
