Socially Reproduced Experiments
We must avoid becoming one
Photo cropped from USA Today source
José Altuve hit a game-winning home run in the bottom of the ninth against the Yankees on Sunday. He thereby reproduced the conditions and the outcome of baseball’s most dramatic cheating accusation of 2019.
Today, at baseball’s All-Star break, we review this and other social experiments that have quite a bit more data.
Altuve won the 2019 American League Championship Series with a walk-off homer in the bottom of the ninth against the Yankees’ closer, Aroldis Chapman. As he approached home plate, he was seen telling the teammates waiting to mob him not to rip off his jersey in celebration. He then scooted into the corridor behind the dugout before re-emerging for the on-field celebration. This fed accusations that he had been wired with a buzzer telling him what kind of pitch Chapman would throw. That would have been in line with the Astros’ sign-stealing by other means during their 2017 championship season and in 2018, which Major League Baseball proved and punished.
Almost the same scenario was reproduced Sunday. Houston was down 7-5 against the Yankees with two out and two on base in the ninth, and Altuve was up against the Yankees’ closer (Chad Green, who recently supplanted Chapman). Altuve socked a homer to the same part of the ballpark to complete a shocking six-run comeback. Immediately upon touching home, he had his shirt ripped off to reveal nothing but the top half of his birthday suit. This was the most direct way possible to show that he could have hit the earlier homer without illegal information.
Examples and Non-Examples
I have dealt with chess-cheating cases in which electronic buzzing was specifically alleged, including the two most prominent cases of 2013. Rather than take this post further in that direction, however, I will use it to pose this question:
What is considered a “social proof” of an assertion—especially when there are elements of scientific control and reproduction?
A simple example is a police lineup. It tries to control for whether a witness has previously seen the accused by placing the accused among usually four or five other people of similar appearance. Picking the right person is considered to prove the previous encounter. Statistically, however, a witness guessing at random among five or six people would still pick the accused with probability 1/5 or 1/6. That is a p-value of only 0.20 or 0.167, which is not significant at even the weakest level of “statistical proof.” Allowing null lineups does not change the statistics much.
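Here is a minimal Python sketch of the arithmetic behind those numbers. It assumes that a witness who never saw the accused guesses uniformly among the people in the lineup. It also takes the conventional 0.05 cutoff as the “weakest level” of significance. The helper names are ours and are for illustration only. The second function adds an assumption of independence and shows what a reproduction by a second witness would buy.

```python
from fractions import Fraction

SIGNIFICANCE = Fraction(1, 20)  # conventional 0.05 cutoff (our assumption)

def lineup_p_value(lineup_size: int) -> Fraction:
    """Chance that a witness who never saw the accused still picks them,
    assuming a uniformly random guess among lineup_size people."""
    return Fraction(1, lineup_size)

def reproduced_p_value(lineup_size: int, witnesses: int) -> Fraction:
    """Chance that several witnesses, each guessing independently,
    all pick the accused (hypothetical: assumes no shared bias)."""
    return lineup_p_value(lineup_size) ** witnesses

for size in (5, 6):
    p = lineup_p_value(size)
    print(f"{size} people: p = {float(p):.3f}, significant: {p < SIGNIFICANCE}")

# Two independent witnesses facing five-person lineups.
print(f"two witnesses: p = {float(reproduced_p_value(5, 2)):.3f}")
```

A single pick never clears the 0.05 bar. Under these assumptions, it takes a second, independent witness, giving p = 1/25 = 0.04, before the chance of a lucky guess drops below it.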
Baseball gives a non-example that surprises me. One of the bad performances that cost Chapman his closer role was losing an 8-4 lead against the Los Angeles Angels on June 30. As a fantasy-baseball player, I’ve regularly observed poor pitching by the closers on my “fantasy team” when the lead is too large to earn credit for a coveted save. Does the data reproduce a phenomenon of closers bearing down less when far ahead, with no save to gain? A study after the 2013 season found none, and it cleverly represented performances by the same z-scores I use in chess. “Meltdowns” like Chapman’s are offset by cases where closers pitched better. The z-scores in the study all fall between -1.25 and +1.50 anyway, which is within the range of ordinary random variation.
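To see why z-scores in that range count as random, one can convert them to p-values. A z-score is the observed deviation from the expected value, divided by the standard deviation. The sketch below assumes that each z-score is standard normal under the null hypothesis of “no effect,” and it uses a two-sided test. Both are our assumptions about how to read the study, not details taken from it.

```python
import math

def two_sided_p(z: float) -> float:
    """P(|Z| >= |z|) for a standard normal Z, via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2))

# The extremes reported in the study.
for z in (-1.25, 1.50):
    print(f"z = {z:+.2f}: two-sided p = {two_sided_p(z):.3f}")
```

Both come out well above 0.05, at roughly 0.21 and 0.13. A result would need |z| of about 1.96 or more to reach that level.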
This study used a reasonably large data set, one that is well-defined and admits controlling factors such as normalizing for game circumstances and the quality of the opposing hitters. At least it is more than the two instances of Altuve. In between would be an attempt to determine whether certain national soccer teams are consistently worse at penalty-kick tiebreaks. England’s and Italy’s teams brought their long, tortured histories together in the tiebreak of Sunday’s European Championship final. The Italians missed two of five kicks, a score that often spells doom, but the English missed three.
Larger Scale
Dick and I are really interested in “experiments” that have spilled into society, with minimal controls but large data. One sphere of this is cybersecurity.
It seems to us that only in the past decade have security experts begun formalizing their research as experimental science with repeatability and reproducibility as explicit criteria. The NSA devoted a special 2012 issue of their Next Wave series to what they titled “Developing a blueprint for a science of cybersecurity.” Among the contents are:
- An introductory essay by Carl Landwehr titled, “Cybersecurity: From engineering to science.”
- A linchpin paper by Fred Schneider titled, “Blueprint for a science of cybersecurity.”
- A paper by Roy Maxion titled, “Making experiments dependable,” which came from a 2011 Springer LNCS Festschrift.
Maxion’s main example is keystroke biometrics. These are inferences made from typing style on a computer keyboard, a mouse, or a similar handheld input device. They can be used to verify identity or to screen for malfeasant activity. Online chess platforms collect data of this nature (okay, we could not resist adding a chess example).
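As a rough illustration of how typing style can verify identity, here is a minimal Python sketch of one simple kind of anomaly detector. It builds a template from a user’s timing features while they type a fixed phrase. It then flags a new sample whose scaled Manhattan distance from the template exceeds a threshold. This is not taken from Maxion’s paper. The feature vectors, the threshold, and all function names are hypothetical.

```python
from statistics import mean

def enroll(samples: list[list[float]]) -> tuple[list[float], list[float]]:
    """Build a template from several typing samples of the same phrase.
    Each sample is a vector of timing features in seconds (hypothetical:
    e.g. key hold times and between-key latencies)."""
    columns = list(zip(*samples))
    means = [mean(col) for col in columns]
    # Mean absolute deviation per feature, floored to avoid division by zero.
    spreads = [max(mean(abs(x - m) for x in col), 1e-6)
               for col, m in zip(columns, means)]
    return means, spreads

def scaled_manhattan(template: tuple[list[float], list[float]],
                     sample: list[float]) -> float:
    """Sum of per-feature deviations, each scaled by that feature's spread."""
    means, spreads = template
    return sum(abs(x - m) / s for x, m, s in zip(sample, means, spreads))

def verify(template: tuple[list[float], list[float]],
           sample: list[float], threshold: float) -> bool:
    """Accept the sample as the enrolled user if it is close enough."""
    return scaled_manhattan(template, sample) <= threshold

# Toy enrollment data (made up): three timing vectors, in seconds.
template = enroll([[0.10, 0.20, 0.15], [0.12, 0.22, 0.14], [0.11, 0.21, 0.16]])
print(verify(template, [0.11, 0.21, 0.15], threshold=3.0))  # True
print(verify(template, [0.20, 0.35, 0.30], threshold=3.0))  # False
```

Even this toy has many knobs: the phrase typed, which timings count as features, and the threshold. Each of them must be pinned down and reported for an experiment built on it to be repeatable.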
Another area is experiments designed to simulate attacks and test defenses against them. Schneider’s paper begins by contrasting predictive modeling of attacks with reactive handling of them. About the latter, he draws an analogy with health care:
“Some health problems are best handled in a reactive manner. We know what to do when somebody breaks a finger, and each year we create a new influenza vaccine in anticipation of the flu season to come. But only after making significant investments in basic medical sciences are we starting to understand the mechanisms by which cancers grow, and a cure seems to require that kind of deep understanding.”
He goes on to outline the kind of scientific foundation that could hopefully underlie a ‘cure’ for intrusion and malware and the like.
What we have seen especially in the past few months, however, in both health and security, are uncontrolled experiments with society as the domain. Large-scale ransomware attacks are becoming as frequent as hurricanes and heat waves. And of course, there is the pandemic. These share with Altuve’s homers the property of being one-off instances, but they generate large data on the receiving end.
Summer Pandemic Update
The following chart updates our June 20 post on the state of the pandemic and its projection for the summer—for Florida and the United Kingdom in particular:
The vertical line shows roughly where the charts stood on June 20. The past few days are the first in which we can point to a significant rise in Florida. Missouri had a similar rise last week, and it is showing up in some other states too. The charts are taken from the Worldometer coronavirus pages.
The UK rise looks ghastly. A subtext of our previous post was the worry that allowing large, dense soccer crowds at London’s Wembley Stadium for the semifinals and final (and anything similar in baseball) would stoke the rise in our respective countries even more. However, the rate of hospitalizations in the UK has remained largely flat. This Fortune article last week is one of several attesting that the new cases are mostly in children or in vaccinated people with enough immunity to contain a “breakthrough” infection. The UK is going ahead with large-scale re-openings later this month. Among those 18 and older, the portion who have had one dose is approaching 90%, and the portion fully vaccinated has passed 70%. Relative to the whole population, the fully vaccinated share is about 52%.
The US appears to be becoming an experiment in how the local vaccination rate affects the numbers. The rates of those fully vaccinated by state are currently eerily similar to Joe Biden’s vote percentage in each state. One aspect of scientific reproducibility is the size of the simplest classifier of the results. For a presidential vote to have simpler explanatory power than any factors of biology or other life circumstances would make a strange experiment indeed.
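The “simplest classifier” idea can be made concrete with a one-variable linear fit. The question is how much of the state-by-state variation in vaccination rates a single predictor such as vote share explains. Below is a minimal Python sketch that computes Pearson’s r, r², and the least-squares line. We include no real figures. The inputs are placeholders for per-state vote shares and vaccination rates, and the function names are ours.

```python
from math import sqrt

def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def least_squares_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Slope and intercept of the ordinary least-squares fit y = slope*x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def one_predictor_summary(vote_share: list[float], vax_rate: list[float]) -> dict:
    """Summarize how well one predictor (hypothetical per-state vote share)
    explains another series (hypothetical per-state vaccination rate)."""
    r = pearson_r(vote_share, vax_rate)
    slope, intercept = least_squares_line(vote_share, vax_rate)
    # r**2 is the fraction of variance in vax_rate explained by the single predictor.
    return {"r": r, "r_squared": r * r, "slope": slope, "intercept": intercept}
```

Fed with actual per-state figures, an r² close to 1 would mean that one number accounts for nearly all of the variation. That is the sense in which a very small classifier would be a striking result.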
Open Problems
Dick and I tried to come up with other examples—from computer security in particular—to sustain what has been occupying our thoughts about standards of proof for policy. We would welcome some examples from you, our readers.
And of course, we have been concerned about the present course of the pandemic amid re-openings since the referenced post last month. In the meantime, if it is your taste, please enjoy the All-Star Game, which Altuve is, ironically, skipping.



