# Fivethirtyeight Riddler: Can The Riddler Bros. Beat Joe DiMaggio’s Hitting Streak?

This weekend I took on fivethirtyeight’s weekly Riddler question again. The original problem text can be found on fivethirtyeight’s Riddler page.

The problem statement:

Five brothers join the Riddler Baseball Independent Society, or RBIs. Each of them enjoys a lengthy career of 20 seasons, with 160 games per season and four plate appearances per game. (To make this simple, assume each plate appearance results in a hit or an out, so there are no sac flies or walks to complicate this math.)

Given that their batting averages are .200, .250, .300, .350 and .400, what are each brother’s chances of beating DiMaggio’s 56-game hitting streak at some point in his career? (Streaks can span across seasons.)

By the way, their cousin has a .500 average, but he will get tossed from the league after his 10th season when he tests positive for performance enhancers. What are his chances of beating the streak?

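There are two steps to this problem. First, find the probability of getting at least one hit in a game, which is trivial given the BA. A hitless game means all four plate appearances are outs, so:

$$P(\text{hit in game}) = 1 - (1 - \text{BA})^4$$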
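As a quick check on this first step, here is a minimal Python sketch. It assumes plate appearances are independent, so the per-game hit probability is 1 - (1 - BA)^4. The function name `game_hit_probability` is a hypothetical helper, not taken from the original simulation.

```python
# Per-game hit probability, assuming independent plate appearances:
# a hitless game means every plate appearance is an out.
PA_PER_GAME = 4


def game_hit_probability(batting_average: float, pa_per_game: int = PA_PER_GAME) -> float:
    """Probability of at least one hit in a game."""
    return 1.0 - (1.0 - batting_average) ** pa_per_game


if __name__ == "__main__":
    for ba in (0.200, 0.250, 0.300, 0.350, 0.400, 0.500):
        print(f"BA {ba:.3f}: P(hit in game) = {game_hit_probability(ba):.4f}")
```

For the .400 brother this gives 1 - 0.6^4 = 0.8704, so he hits safely in about 87% of his games; the .500 cousin hits safely in 93.75% of his.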

The next step asks, “What is the probability of getting a streak of length X in a fixed number of attempts?” As it turns out, finding a closed-form solution to this is not trivial; see the discussions on askamathematician and math.stackexchange.
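Even without a closed form, the probability can be computed exactly with a short recursion that tracks the distribution of the current streak length from game to game. The sketch below is a swapped-in technique (a dynamic program over a Markov chain), not the simulation used in this post. It assumes games are independent, each with the per-game hit probability from the first step; `prob_streak_at_least` is a hypothetical name. Beating DiMaggio means a streak of at least 57 games, over 3,200 games for the brothers and 1,600 for the cousin.

```python
def prob_streak_at_least(k: int, n: int, p: float) -> float:
    """Exact P(some run of >= k consecutive successes in n independent trials).

    state[j] holds the probability that the current run length is j and
    no run of length k has occurred yet; `done` holds the remaining mass.
    """
    if k <= 0:
        return 1.0
    state = [0.0] * k
    state[0] = 1.0
    done = 0.0
    for _ in range(n):
        new_state = [0.0] * k
        # A hitless game resets the streak to zero.
        new_state[0] = (1.0 - p) * sum(state)
        # A game with a hit extends the streak by one.
        for j in range(k - 1):
            new_state[j + 1] = p * state[j]
        # A hit at streak length k-1 completes a streak of length k.
        done += p * state[k - 1]
        state = new_state
    return done
```

For example, `prob_streak_at_least(57, 3200, 1 - 0.6 ** 4)` gives the .400 brother's chance under these assumptions. The cost is O(n·k), which is trivial for these sizes, and it makes a useful cross-check on a Monte Carlo estimate.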

But this is why we have computers. I wrote a simulation of the careers of players with the given batting averages and career lengths, and counted how often each player beat DiMaggio’s hitting streak. The results are plotted below.

A player’s likelihood of beating DiMaggio’s record is estimated as the fraction of simulated careers that beat the record. The plot shows only the [5%, 95%] range to keep outliers from skewing it. The results:

• The players with .200, .250, and .300 averages have effectively no shot at beating DiMaggio’s record.
• The player with a .350 BA has a 0.8% chance of beating it, so not very likely at all. On average, his longest career streak is only 35 games.
• The player with a .400 BA has a non-negligible chance of beating the record, about 14%, with an average longest career streak of 47 games.
• Even with 10 fewer seasons, the PED player will very likely beat DiMaggio’s streak, with about a 93.5% likelihood. His mean longest streak of 79.9 games beats the record by nearly 24 games on average.
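The simulation described above can be sketched in a few dozen lines of Python. This is not the original code: the function names (`longest_streak`, `simulate_career`, `estimate`), the default of 1,000 careers, the seed, and the choice to draw every plate appearance with Python’s `random` module are all assumptions. Streaks span seasons because each career is treated as one continuous sequence of games.

```python
import random

GAMES_PER_SEASON = 160
PA_PER_GAME = 4
RECORD = 56  # DiMaggio's streak; beating it means 57 or more games


def longest_streak(hit_games):
    """Length of the longest run of consecutive True values."""
    best = current = 0
    for hit in hit_games:
        current = current + 1 if hit else 0
        best = max(best, current)
    return best


def simulate_career(ba, seasons, rng):
    """Simulate one career plate appearance by plate appearance.

    Games form one continuous sequence, so streaks carry across seasons.
    Returns (longest hitting streak in games, career batting average).
    """
    n_games = seasons * GAMES_PER_SEASON
    total_hits = 0
    hit_games = []
    for _ in range(n_games):
        # Each plate appearance is an independent hit with probability `ba`.
        hits = sum(rng.random() < ba for _ in range(PA_PER_GAME))
        total_hits += hits
        hit_games.append(hits > 0)
    return longest_streak(hit_games), total_hits / (n_games * PA_PER_GAME)


def estimate(ba, seasons=20, n_careers=1000, seed=0):
    """Return (fraction of careers beating the record, mean longest streak)."""
    rng = random.Random(seed)
    streaks = [simulate_career(ba, seasons, rng)[0] for _ in range(n_careers)]
    beat = sum(s > RECORD for s in streaks) / n_careers
    return beat, sum(streaks) / n_careers
```

Usage would look like `estimate(0.400)` for a brother or `estimate(0.500, seasons=10)` for the cousin, with results subject to Monte Carlo noise. Pure Python is slow at this scale; vectorizing the draws with NumPy would be the natural next step.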

To validate the simulation, I plotted each player’s final simulated BA to confirm that it did, in fact, line up with the BA in the problem statement. It did, with a coefficient of variation (std/mean) between 0.01 and 0.02.
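As an analytic cross-check on that validation (not something computed in the original post), a player’s career hits over N plate appearances follow a binomial distribution, so the expected coefficient of variation of the simulated BA is sqrt((1 - BA) / (BA * N)). The sketch assumes N = seasons × 160 × 4; `expected_cv` is a hypothetical helper name.

```python
import math


def expected_cv(ba, seasons, games_per_season=160, pa_per_game=4):
    """Binomial coefficient of variation (std/mean) of a career batting average.

    With N plate appearances, std(BA) = sqrt(ba * (1 - ba) / N),
    so CV = std / ba = sqrt((1 - ba) / (ba * N)).
    """
    n_pa = seasons * games_per_season * pa_per_game
    return math.sqrt((1.0 - ba) / (ba * n_pa))


players = [(0.200, 20), (0.250, 20), (0.300, 20), (0.350, 20), (0.400, 20), (0.500, 10)]
for ba, seasons in players:
    print(f"BA {ba:.3f} over {seasons} seasons: expected CV = {expected_cv(ba, seasons):.4f}")
```

Every case falls between about 0.011 (the .400 brother) and 0.018 (the .200 brother), consistent with the 0.01 to 0.02 range seen in the simulation.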
