Sleeping Beauty and Hazmat Train Derailments
A tale of two probability problems with easy solutions and how engagement-driven media makes them out to be something esoteric.
It’s engagement versus truth, and on social media the winner is not always the truth.
Engagement thrives on discussion and argument. Probability puzzles that people try to solve with words instead of models are a never-ending source of argument, and the same goes for the likelihood of random events when they actually happen, with a recent Ohio train derailment as a perfect example.
In both cases, doing the math settles the question; but because it removes sources of argument and requires work, it lowers engagement. Hence the detrimental effect that optimizing for social media engagement has on the quality of argumentation.
Sleeping Beauty: no consensus needed as there's an actual procedure to solve it.
Veritasium, a popular science channel on YouTube, recently had a video about the Sleeping Beauty puzzle that illustrates the tension between generating engagement by fomenting discussion on one hand and educating the audience on the other.
Disappointingly, he starts his video by saying there's “no consensus.” There are math problems for which there's no consensus answer, meaning there's no known answer. This is not one of them; this is a simple probabilistic puzzle.
The Sleeping Beauty puzzle, from Wikipedia:
Sleeping Beauty volunteers to undergo the following experiment and is told all of the following details: On Sunday she will be put to sleep. Once or twice, during the experiment, Sleeping Beauty will be awakened, interviewed, and put back to sleep with an amnesia-inducing drug that makes her forget that awakening. A fair coin will be tossed to determine which experimental procedure to undertake:
If the coin comes up heads, Sleeping Beauty will be awakened and interviewed on Monday only.
If the coin comes up tails, she will be awakened and interviewed on Monday and Tuesday.
In either case, she will be awakened on Wednesday without interview and the experiment ends.
Any time Sleeping Beauty is awakened and interviewed she will not be able to tell which day it is or whether she has been awakened before. During the interview Sleeping Beauty is asked: "What is your credence now for the proposition that the coin landed heads?"
This is a probability puzzle, which means that there are two approaches to solve it correctly: the frequentist approach, which happens to be quite opaque in this particular case; and the much superior Bayesian approach.[1]
There are two main reasons to use an established procedure like Bayesian modeling:
1. Instead of relying on our own creativity and understanding of the field, we're relying on the combined brainpower of several geniuses (Laplace, Gauss, Daniel and Jacob Bernoulli, Thomas Bayes, and the list goes on) whose ideas have been checked and rechecked by a literal crowd of smart people.
2. The formalism of the procedure makes all assumptions and all inference steps clear. That means that all judgment calls made by the modeler have to be explicit and formal. It also means that disagreement over results has to be traceable to at least one of these two possibilities:
2.a. Different assumptions about unclear elements of the problem (not in the Sleeping Beauty example, where all elements are explicit and clear, though people who are unfamiliar with formal modeling may not think so, at least at first).
2.b. Errors or unsupported steps in the inference.
Any attempt to argue against a result that doesn't start with tracing the source of the disagreement to 2.a or 2.b signals lack of understanding of the purpose of using an established formal procedure and typically presages an attempt to make word arguments for numbers problems (always a waste of time and a bad sign).
To solve a math problem, do the math. To contest a math result, check the math.
Now, let us apply the Bayesian reasoning procedure to the problem:
First, identify unobserved states of the world and associated prior probabilities.
There are two unobservable state variables: the result of the coin toss {H,T} and the day of the week {Mo,Tu}. Hence, there are four states of the world, with the following probabilities, computed from the fairness of the coin and the randomness of the day.
The double parentheses appear because the outer pair belongs to the function Pr(state) and the inner pair to the tuple (coin toss result, weekday) representing the state.
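Spelled out, with the fair coin and the uniformly random day each contributing a factor of 1/2, the priors are:

```latex
\begin{align*}
\Pr((H,\text{Mo})) &= 1/4 \\
\Pr((H,\text{Tu})) &= 1/4 \\
\Pr((T,\text{Mo})) &= 1/4 \\
\Pr((T,\text{Tu})) &= 1/4
\end{align*}
```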
Second, identify the observable events (here only E = {Sleeping Beauty is awakened and asked a question}, because that's the only observation the respondent can make) and the conditional probability of the event given each state.
The probability that the event E happens given each state is:
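Under heads, Sleeping Beauty sleeps through Tuesday, so (H,Tu) is the only state in which no interview happens:

```latex
\begin{align*}
\Pr(E \mid (H,\text{Mo})) &= 1 \\
\Pr(E \mid (H,\text{Tu})) &= 0 \\
\Pr(E \mid (T,\text{Mo})) &= 1 \\
\Pr(E \mid (T,\text{Tu})) &= 1
\end{align*}
```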
Note that since the probability of the event isn't the same for all states, the event is informative with respect to the states.[2] Since the domain of that informativeness cannot be reduced further, this also tells us that our state space is parsimonious.
Finally, compute the posterior probabilities of the states conditional on the observed event, using Bayes’s rule.
We will need Pr(E), the prior probability of the event, a common source of mistakes. We get it by integrating out the state:
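With a fair coin and equally likely days, the sum over the four states works out as:

```latex
\Pr(E) \;=\; \sum_{s} \Pr(E \mid s)\,\Pr(s)
\;=\; 1 \cdot \tfrac{1}{4} + 0 \cdot \tfrac{1}{4} + 1 \cdot \tfrac{1}{4} + 1 \cdot \tfrac{1}{4}
\;=\; \tfrac{3}{4}
```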
This computation is a common source of error for people who are otherwise smart and well-educated, because they think Pr(E) must be 1 as “the event happened,” which is not what it means: Pr(E) is the prior probability of the event, not the observation.
We can now compute the probability of each of the states given E, using Bayes’s rule:
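Bayes's rule, Pr(state | E) = Pr(E | state) Pr(state) / Pr(E), gives, state by state:

```latex
\begin{align*}
\Pr((H,\text{Mo}) \mid E) &= \frac{1 \cdot 1/4}{3/4} = 1/3 \\
\Pr((H,\text{Tu}) \mid E) &= \frac{0 \cdot 1/4}{3/4} = 0 \\
\Pr((T,\text{Mo}) \mid E) &= \frac{1 \cdot 1/4}{3/4} = 1/3 \\
\Pr((T,\text{Tu}) \mid E) &= \frac{1 \cdot 1/4}{3/4} = 1/3
\end{align*}
```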
The probability of heads given E is the total probability of the first two lines, that is 1/3, and similarly the probability of tails given E is the total probability of the last two lines, that is 2/3.[3]
That's it, no special knowledge necessary beyond the Bayesian procedure.
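The whole procedure fits in a few lines of Python. This is a sketch (the variable names are mine), using exact fractions to avoid floating-point noise:

```python
from fractions import Fraction

# States: (coin, day). Priors from a fair coin and a uniformly random day.
prior = {("H", "Mo"): Fraction(1, 4), ("H", "Tu"): Fraction(1, 4),
         ("T", "Mo"): Fraction(1, 4), ("T", "Tu"): Fraction(1, 4)}

# Pr(E | state): Sleeping Beauty is interviewed in every state except (H, Tu).
likelihood = {("H", "Mo"): 1, ("H", "Tu"): 0, ("T", "Mo"): 1, ("T", "Tu"): 1}

# Pr(E): integrate out the state.
pr_e = sum(likelihood[s] * prior[s] for s in prior)  # 3/4

# Bayes's rule: posterior probability of each state given E.
posterior = {s: likelihood[s] * prior[s] / pr_e for s in prior}

# Heads is the union of the two heads states.
pr_heads = posterior[("H", "Mo")] + posterior[("H", "Tu")]
print(pr_heads)  # prints 1/3
```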
Any dissension about that 1/3, as per point 2.b. above, has to identify the element in the procedure that is either unsupported or wrong. That’s the value of using math.
Word arguments need not apply.
Train derailments in which a hazmat car is damaged: scarily, much more common than people think.
It’s not as bad as we think; it’s much worse than we think.[4]
By now anyone who’s likely to read this knows that a freight train carrying vinyl chloride derailed in East Palestine, OH, leading to some environmental impacts that became a major source of engagement for people who know nothing about science or statistics, but are quick to jump onto any bandwagon, even a derailed one.
Several knowledgeable people, mostly drowned out by the nonsense, pointed out that the spill itself was not good but also not catastrophic, and that burning vinyl chloride creates combustion products that are less dangerous than the chemical itself.
A few, like one perennial twitter Tesla critic and shift-key under-user, actually looked at the numbers, and despite all the hoopla on social media (and the ostrich-like behavior of the mainstream media), these derailments aren’t uncommon. Boriquagato wrote his own post, well worth reading, here.
Using that figure of 137 derailments/year in which a hazmat car is damaged, we can ask a few questions. For example, how likely is it that there are at least two derailments with hazmat car damage next week? Or, how likely is it that we see at least one week with at least five derailments with hazmat car damage in a given year?
Probability theory has the answers.
We need to make a couple of assumptions: we’ll use the 2022 number (137) and 52 weeks in a year as our “derailments per week” baseline; we’ll also assume that these derailments are independent random events.
These two assumptions mean that the number of derailments per week, which is a number with associated uncertainty — or, as we call it in fancy wordage, a “random variable” — follows a Poisson distribution with parameter (average) 137/52.[5] That distribution is on the left in the following picture:
From that distribution we can see, for example, that there’s about 1/4 probability of observing exactly 2 derailments with hazmat car damage in any given week. And also that it’s very unlikely that any given week has 20 such derailments (the probability of N = 20 is 0.00000000000765633504417277, or 1 in 130,610,794,098).
We can also note that there’s a probability Pr(N=0) = 7.2% that a specific week has no derailments with hazmat car damage. In other words, a given week in which there’s no such derailment is an occasional exception, while the people compulsively tweeting about the East Palestine, OH derailment seem to think it’s the norm.
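These point probabilities are easy to verify. A sketch in Python, using only the standard library and the Poisson formula Pr(N = n) = exp(-λ) λ^n / n! with λ = 137/52 (the function name is mine):

```python
from math import exp, factorial

lam = 137 / 52  # average weekly derailments in which a hazmat car is damaged

def poisson_pmf(n: int, lam: float) -> float:
    """Pr(N = n) for a Poisson random variable with mean lam."""
    return exp(-lam) * lam**n / factorial(n)

print(round(poisson_pmf(2, lam), 3))   # ~0.249: the "about 1/4" for exactly 2
print(round(poisson_pmf(0, lam), 3))   # ~0.072: a week with no such derailment
print(poisson_pmf(20, lam))            # ~7.66e-12: 20 in one week, essentially never
```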
The other two charts in the figure are for “at least N” derailments: in a specific week (center) or in at least one week in a year (right). First, we can see that for a specific week, there’s almost a 50/50 chance of at least 3 derailments, and if we allow for any of the 52 weeks of the year, there's a 60/40 chance of at least 7 derailments in one of those weeks.[6]
Let that sink in: there’s a 60/40 chance that, in a given year at least one week has at least 7 derailments in which hazmat cars are damaged, i.e. an average of one a day for that week.
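The two tail probabilities can be checked the same way (a sketch; the function name is mine, and the yearly figure treats the 52 weeks as independent, per the assumptions above):

```python
from math import exp, factorial

lam = 137 / 52  # average weekly derailments in which a hazmat car is damaged

def poisson_sf(n: int, lam: float) -> float:
    """Pr(N >= n): one minus the Poisson CDF evaluated at n - 1."""
    return 1 - sum(exp(-lam) * lam**k / factorial(k) for k in range(n))

# Almost 50/50 for at least 3 derailments in a specific week:
p_week = poisson_sf(3, lam)                    # ~0.49

# At least one week with at least 7 derailments, across 52 independent weeks:
p_year = 1 - (1 - poisson_sf(7, lam)) ** 52    # ~0.62, the "60/40" in the text

print(round(p_week, 2), round(p_year, 2))
```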
Now is a good time to go read up on these hazmat chemicals moving around in trains, here.

Engagement versus education
What's the common thread between these two examples?
Putting engagement above education.
I like Veritasium videos. I do: I've been subscribed to his channels for more than a decade; I recommend them all the time to others. And engagement is important for education, as an apathetic audience won't learn. But not all forms of engagement are equally appropriate.
The “there's no consensus” opener in that particular video does get people to argue, and generates engagement; but it anti-teaches them (it teaches the wrong lesson), as it presents a probability problem as something esoteric with no clear answer.[7]
As for the derailment in Ohio, the “everything supports my conspiracy theory” crowd was out in force before there was any information. This has happened with pretty much every such disaster in recent months, from food production facility fires to chicken processing plant accidents.
As anyone who has ever worked in a production facility (or studied operations management, or thought about it for a moment) will tell you, when we run a facility near its maximum capacity (for example, because there's a shortage of its product), things are more likely to fail than usual: most equipment and personnel work better with some slack, and maintenance and repairs are sometimes postponed in order to keep capacity temporarily high.
But that would decrease engagement.
A final story, regarding engagement and education:
Recently, a YouTuber whose math-related videos were popular was invited to teach a course as a guest instructor at a prestigious university; the invitation was based in large part on the popularity of his videos. The results were mixed, according to sources at the school, essentially because a course has to have a unifying theme and an organizing framework; and that's very different from picking and choosing fun but mostly unrelated topics for videos, usually curiosities; those are engagement-rich but instruction-poor.[8]
1. No prizes for guessing which camp in statistics, the frequentist or the Bayesian, I'm in. I have solved the problem using the frequentist approach as well: it yields the same result (obviously), but it's really hard to explain unless the audience has a lot of practice in frequentist modeling.
2. There's more about events and informativeness in the "Bayesian Interlude" chapter of my Intergalactic mega-bestseller DATA to INFORMATION to DECISION, click for a free sample; that interlude isn't in the sample, though.
3. Technically Pr(T|E) = Pr( { (T,Mo) or (T,Tu) } | E), which then becomes a sum of the individual probabilities because the states are mutually exclusive.
4. The East Palestine, OH derailment isn’t as bad as some think: the combustion products of vinyl chloride are mostly water and carbon dioxide, and the derailment itself isn’t that uncommon. But the general situation is much worse than most people think, with derailments a common event, even derailments where a hazmat car is damaged or, worse, hazardous materials are released into the environment.
5. If you have random independent events generated by a fixed process and the average time between events (averaged over infinite time; in probability theory we’re always in God-mode) is T, the time between events follows an Exponential distribution with average T, and the number of events that happen during an interval of length I follows a Poisson distribution with average I/T. There are a lot of assumptions behind all this, but that’s the gist of it.
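That connection is easy to check by simulation (a sketch, with my own variable names): draw Exponential inter-arrival times at the derailment rate and count how many events land in each one-week interval; the per-week counts average out to I/T.

```python
import random

random.seed(0)       # reproducible sketch
rate = 137 / 52      # events per week, so mean time between events is 52/137 weeks
weeks = 50_000       # simulate many weeks

# Draw Exponential inter-arrival times and bin the event times by week.
counts = [0] * weeks
t = random.expovariate(rate)
while t < weeks:
    counts[int(t)] += 1
    t += random.expovariate(rate)

mean = sum(counts) / weeks
print(round(mean, 2))  # close to 137/52 ≈ 2.63, as the Poisson model predicts
```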
6. For the trust-but-verify crowd, the Poisson cumulative distribution function (the F(x)) is available in the programming language R (I like the RStudio environment, myself) as
F(x) = ppois(x, 137/52)
and in the Apple spreadsheet Numbers as
F(x) = POISSON(x, 137/52,TRUE)
7. Veritasium may actually believe that there’s no consensus answer, for two reasons: some well-educated people have significant lacunae in their probability theory basics, and I’m talking string theorists getting these examples wrong (probably because their instructors were as bad as mine and the basics rarely get used except in these puzzles); and there’s a lot of “words” debate, including in the Wikipedia page, and lots of people in “let’s do rationality without learning the math parts” fields who will debate anything by playing word games. But, repeating the secondary point of this post:
To solve a math problem, do the math; to contest a math result, check the math. Anything else is a waste of time.
8. The same applies to writing this kind of blog post versus writing a textbook. Much maligned as their profession is, university instructors actually need some specific skills that are generally not visible from the audience's point of view.
Part of the reason fun math videos are fun and boring math lectures are boring is that the topics in the videos are chosen because they’re fun but the lectures have actual learning objectives so they can’t be only about the fun things. Of course, there are sometimes other reasons why boring math lectures are boring.