<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://tylerjamesburch.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://tylerjamesburch.com/" rel="alternate" type="text/html" /><updated>2026-04-20T12:28:09+00:00</updated><id>https://tylerjamesburch.com/feed.xml</id><title type="html">Tyler James Burch</title><subtitle>Lead Data Analyst for the Boston Red Sox. Baseball analytics, data science, and particle physics research.</subtitle><author><name>Tyler James Burch</name><email>burcht11@gmail.com</email></author><entry><title type="html">Weather Effects on Boston Marathon Times</title><link href="https://tylerjamesburch.com/blog/statistics/weather-effects-boston-marathon" rel="alternate" type="text/html" title="Weather Effects on Boston Marathon Times" /><published>2026-04-20T00:00:00+00:00</published><updated>2026-04-20T00:00:00+00:00</updated><id>https://tylerjamesburch.com/blog/statistics/weather-effects-boston-marathon</id><content type="html" xml:base="https://tylerjamesburch.com/blog/statistics/weather-effects-boston-marathon"><![CDATA[<script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"></script>

<h2 id="the-boston-marathon---so-hot-right-now">The Boston Marathon - So Hot Right Now</h2>

<p>Every year I make the short trek one block from Fenway to Kenmore Square to watch exhausted runners grinding through the final mile, occasionally with a cowbell in hand. While I love watching the marathon, I also love running myself, though my longest race to date is just a half marathon. And, unlike most runners, I actually have a small preference for warm-weather running; 70-75°F is my sweet spot. Research, however, does not back me here - most published literature indicates that <a href="https://doi.org/10.1097/00005768-199709000-00018">around 50°F is optimal for running</a>.</p>

<p>This year marks the fiftieth anniversary of the 1976 Boston Marathon, which peaked at 91.9°F, too hot even for me. It earned the name “Run for the Hoses.” Jack Fultz won that race in 2:20:19, <strong>nine minutes slower</strong> than the course record at the time. Bill Rodgers, who’d run 2:09:55 the year before, dropped out, as did Tom Fleming, who had placed third in 1975.</p>

<h2 id="thermal-fluctuations">Thermal Fluctuations</h2>

<p>Below is the average race time of the top 3 Boston Marathon runners over time.</p>

<figure class="">
  <img src="/blogimages/boston-marathon-1976/top3_timeline.png" alt="Top-3 timeline with LOWESS" /><figcaption>
      Top-3 mean finishing time by year, with LOWESS trend. 1976 visible as the outlier near center.

    </figcaption></figure>

<p>This confirms that runners have been getting better over time, though non-linearly. Between 1950 and the turn of the century, gains were substantial; after that, the rate of improvement flattened. There’s also considerable year-over-year variation, which makes sense: conditions vary widely. In fact, the 1976 race sticks out like a sore thumb in the middle of the plot.</p>

<p>This made me curious: how much of that variation can be attributed to weather conditions on race day? That’s the question I dive into in this post.</p>

<h2 id="prior-work">Prior work</h2>

<p>This is hardly a novel problem; plenty of work has been done on it. Three studies to highlight:</p>

<ul>
  <li><strong><a href="https://doi.org/10.1249/mss.0b013e31802d3aba">Ely, Cheuvront, Roberts &amp; Montain (2007)</a></strong> — pools seven major marathons (including Boston) and divides race-day conditions into WBGT quartiles. For elite men (top-3 finishers), slowdowns go 1.7% → 2.5% → 3.3% → 4.5% as you move from cool to hot. This is the closest thing the literature has to a canonical “how much does heat cost” table.</li>
  <li><strong><a href="https://doi.org/10.1038/s41612-024-00637-x">Wang et al. (2024)</a></strong> — the world’s top-96 individual marathon athletes (top 16 per continent) followed across events. Models a linear performance degradation above 15°C with slope ~0.39 min/°C for men, 0.71 min/°C for women. This is the first model I attempt to replicate.</li>
  <li><strong><a href="https://doi.org/10.1097/00005768-199709000-00018">Galloway &amp; Maughan (1997)</a></strong> — not a marathon study, actually. Eight cyclists rode to exhaustion at four temperatures in a lab. Peak endurance came at 10.5°C (~51°F); by 30°C performance had collapsed. This study models the performance degradation as a quadratic function, which I replicate in the second model. See also <a href="https://doi.org/10.2165/00007256-200737040-00032">Maughan, Watson &amp; Shirreffs (2007)</a> for the thermoregulation review that carries the same physiology over to distance running.</li>
</ul>

<h2 id="my-research">My research</h2>

<p>These studies lay a solid groundwork, but I wanted to dig a bit deeper. Specifically, a few questions:</p>

<ul>
  <li>The literature suggests both linear and quadratic models, which can <a href="/blog/statistics/polynomial-regression-bambi">vary wildly as you get to extreme values</a>. I was curious what this looked like, and what a less constrained fit would look like.</li>
  <li>Both have pretty strongly constrained functional forms. I was interested in a more data-driven approach, namely fitting the weather effect via a spline based model.</li>
</ul>

<p>So I fit three Bayesian models, each on log-finishing-time. To control for improving athletic performance, each uses a random walk, and each controls for precipitation. The models differ in how race-day temperature enters:</p>

<ol>
  <li><strong>Wang-replication model</strong>: a linear hinge above 59°F, replicating Wang 2024’s functional form.</li>
  <li><strong>Physiology model</strong>: a quadratic hinge above the same knot, motivated by the Galloway study.</li>
  <li><strong>Spline model</strong>: replaces the hinge with a thin-plate spline, effectively letting the data pick the shape.</li>
</ol>

<p>Each of the three models below is fit in <code class="language-plaintext highlighter-rouge">brms</code> with <code class="language-plaintext highlighter-rouge">cmdstanr</code>. Response is <code class="language-plaintext highlighter-rouge">log_seconds</code> (reader-facing numbers are back-transformed). All three include a state-space level (described below) to absorb improving performance over time, plus precipitation controls. Details on priors, convergence, and specifications live in the collapsible <a href="#technical-details">technical details</a> at the bottom.</p>

<h3 id="controlling-for-athletic-performance-improvement">Controlling for athletic-performance improvement</h3>

<p>Runners have gotten faster over the last century. Training methods modernized, footwear evolved, the global elite pool expanded. To measure weather effects, we need to account for that improvement: what would performance have looked like absent weather?</p>

<p>I chose a simple approach: a <strong>random walk</strong>. Each year’s typical finishing time is the previous year’s, plus a small stochastic step. The size of that step is estimated as part of the model fit. Notably, this imposes no specific shape, just one wandering step per year.</p>
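<p>In symbols, with \(\Delta t_y\) the years elapsed since the previous observed race:</p>

\[\ell_y = \ell_{y-1} + \varepsilon_y, \qquad \varepsilon_y \sim \mathcal{N}\left(0,\ \sigma_\ell^2 \, \Delta t_y\right)\]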

<p>This is a good fit for the following reasons:</p>

<ul>
  <li><strong>Uncertainty grows when you extrapolate.</strong> Predicting ten years past the data accumulates ten years of step variance; predicting one year ahead accumulates one. That matches reality — we know less about the far future than the well-measured middle.</li>
  <li><strong>Year gaps are handled naturally.</strong> 1918 (WWI), 2020 (COVID), and 2021 (October race, omitted from the analysis) are missing from the data. The random walk weights its steps by the time elapsed, so the missing years cost us proportional information without breaking anything.</li>
</ul>
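<p>To build intuition, here is a minimal simulation (not the fitted model, and with made-up numbers) of a regularized random walk whose innovations scale with the square root of elapsed time, so a multi-year gap carries proportionally more variance:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Minimal sketch: a regularized random walk on the log-time scale.
# The sqrt(dt) scaling means an n-year gap accumulates n years of step variance.
set.seed(1)
years   &lt;- c(1970:1975, 1979:1984)  # hypothetical series with a 1976-1978 gap
dt      &lt;- c(1, diff(years))        # years elapsed since the previous race
sigma_l &lt;- 0.01                     # step SD (estimated in the real model)
level   &lt;- cumsum(rnorm(length(years), mean = 0, sd = sigma_l * sqrt(dt)))
plot(years, level, type = "b", xlab = "year", ylab = "latent level (log s)")
</code></pre></div></div>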

<p>Here’s what the fitted year level looks like:</p>

<figure class="">
  <img src="/blogimages/boston-marathon-1976/latent_slope_trace.png" alt="Year-level fitted finishing time — random-walk posterior" /><figcaption>
      Random-walk posterior for the year level. Red line = model’s best guess of the typical finishing time absent weather; bands = 80% and 95% credible intervals; dots = observed top-3 mean.

    </figcaption></figure>

<p>The red line is the model’s best guess of the typical finishing time for each year absent any weather effect; the bands are 80% and 95% credible intervals. The dots are the actual observed top-3 mean for each year.</p>

<p>If you haven’t worked with random walks often, you might ask: “the line just follows the dots — isn’t this just memorizing the data?” The random walk is <em>regularized</em>: each year’s step is constrained in size by a prior the data has to push against, so the line can track the broad pattern but can’t absorb every wiggle. The years where dots sit visibly above the red band (1976 again, plus 2012 and 2017) are years where <em>something other than the long-run trend</em> is at work, and that residual is what the weather model picks up. If the line went through every dot, there’d be nothing left for the weather term to explain.</p>

<p>In summary, this term captures the bulk of the year-over-year pattern not explained by the other model terms (like weather), and is temporally regularized so it can’t change too much from one year to the next.</p>

<p>One note: this plot highlights the 1953 drop from ~156 minutes to ~139 minutes. That marathon was a perfect mix of a few factors: a <a href="https://graphics.boston.com/marathon/history/1953.shtml?">25-knot (29 mph) tailwind</a>, 43°F weather, a fast field, and <a href="https://en.wikipedia.org/wiki/List_of_winners_of_the_Boston_Marathon">a course that was found to be over 1,000 yards short</a>.</p>

<p>The actual <code class="language-plaintext highlighter-rouge">brms</code> implementation does this via a custom Stan block; the code is in the <a href="#technical-details">technical details</a> at the bottom.</p>

<h3 id="the-wang-replication-model--linear-hinge">The Wang-replication model — linear hinge</h3>

<p>First I reproduce the Wang model. The “hinge” is a kink at a fixed knot temperature: below the knot, there is no temperature effect — finishing time is flat. Above the knot, finishing time grows linearly with each additional degree. The biological story is that thermoregulation is essentially free until core-temperature rise becomes a problem, after which performance degrades steadily.</p>

<p>Explicitly:</p>

\[\log(\text{seconds}) \sim \ell_{\text{year}} + \beta \max(T - k, 0) + \gamma \cdot \text{precip}\]

<p>Knot \(k\) at 59°F (15°C). Linear above the knot, exactly flat below by construction; \(\ell_{\text{year}}\) is the random-walk year level.</p>

<p>In <code class="language-plaintext highlighter-rouge">brms</code> syntax:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">bf</span><span class="p">(</span><span class="w">
  </span><span class="n">log_seconds</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">I</span><span class="p">(</span><span class="n">pmax</span><span class="p">(</span><span class="n">tmax_f</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">KNOT_F</span><span class="p">,</span><span class="w"> </span><span class="m">0</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
                </span><span class="n">precip_day</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">precip_missing</span><span class="p">,</span><span class="w">
  </span><span class="n">sigma</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">s</span><span class="p">(</span><span class="n">year</span><span class="p">,</span><span class="w"> </span><span class="n">k</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">10</span><span class="p">,</span><span class="w"> </span><span class="n">bs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"tp"</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>

<figure class="">
  <img src="/blogimages/boston-marathon-1976/marginal_temp_curve_C.png" alt="Wang-replication model marginal curve" /><figcaption>
      Wang-replication model: marginal temperature curve with 95% credible ribbon. Slowdown plotted relative to a 50°F reference.

    </figcaption></figure>

<p>A quick note on how to read this and the next two plots: the y-axis is the predicted <em>slowdown relative to a 50°F reference</em>. Each curve is plotted with the model’s prediction at 50°F subtracted off, so all three pass through zero at 50°F — that’s a reporting anchor, not a model constraint. (The hinge models do have a structural flat region, but it lives below their 59°F knot, not specifically at 50°F.) 50°F is roughly the long-run Patriots’ Day average, so it’s a natural “benign weather” baseline to compare hot or cold years against.</p>
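<p>A minimal sketch of that anchoring, assuming a fitted object named <code class="language-plaintext highlighter-rouge">fit_wang</code> (a hypothetical name) and glossing over the custom state-space term, which complicates prediction in practice:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Sketch: marginal temperature curve relative to the 50°F reference.
library(brms)
grid &lt;- data.frame(tmax_f = 40:95, precip_day = 0, precip_missing = 0)
mu   &lt;- posterior_epred(fit_wang, newdata = grid)  # draws x grid, log-seconds
ref  &lt;- mu[, grid$tmax_f == 50]                    # each draw's value at 50°F
slowdown_min &lt;- (exp(mu) - exp(ref)) / 60          # slowdown in minutes
curve_ci &lt;- apply(slowdown_min, 2, quantile, probs = c(0.025, 0.5, 0.975))
</code></pre></div></div>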

<p>The ribbon is the posterior 95% credible interval. The slope above 15°C comes out to 0.30 min/°C (95% credible interval [0.14, 0.46]), somewhat shallower than Wang’s pooled 0.39.<sup id="fnref:ci-note" role="doc-noteref"><a href="#fn:ci-note" class="footnote" rel="footnote">1</a></sup> This converts to about <strong>10 seconds per °F above 59</strong>. The intervals overlap comfortably and the gap is consistent with the structural differences in the analyses - Boston is a single course, while Wang’s pooled estimate considers variation across many marathons.</p>
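<p>The unit conversion to that headline number is worth making explicit:</p>

\[0.30\ \tfrac{\text{min}}{^\circ \text{C}} \times 60\ \tfrac{\text{s}}{\text{min}} = 18\ \tfrac{\text{s}}{^\circ \text{C}}, \qquad 18\ \tfrac{\text{s}}{^\circ \text{C}} \div 1.8\ \tfrac{^\circ \text{F}}{^\circ \text{C}} = 10\ \tfrac{\text{s}}{^\circ \text{F}}\]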

<p>The curve is exactly flat below 59°F because the model says it has to be: there’s no below-knot coefficient. By design, this model only makes claims above the hinge. The other two models allow for varying behavior for cold weather too.</p>

<h3 id="the-physiology-model--quadratic-hinge">The Physiology model — quadratic hinge</h3>

<p>Next is the physiology-based model, which assumes the temperature response is quadratic. As in the Wang replication, the knot is pinned at 59°F, but this model allows variation both above and below it:</p>

\[\log(\text{seconds}) \sim \ell_{\text{year}} + \beta_1 \max(k - T, 0) + \beta_2 \max(T - k, 0)^2 + \beta_3 \cdot \text{precip}\]

<p>This approach is motivated by <a href="https://doi.org/10.1097/00005768-199709000-00018">Galloway &amp; Maughan (1997)</a>, a cycling study which found an inverted-U relationship with performance peaking at ~10.5°C. We shift the knot to 15°C to align with Wang’s runner-based study, keep the quadratic form above the knot, and add a linear term below it so cold days can matter too.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">bf</span><span class="p">(</span><span class="w">
  </span><span class="n">log_seconds</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">I</span><span class="p">(</span><span class="n">pmax</span><span class="p">(</span><span class="n">KNOT_F</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">tmax_f</span><span class="p">,</span><span class="w"> </span><span class="m">0</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
                </span><span class="n">I</span><span class="p">(</span><span class="n">pmax</span><span class="p">(</span><span class="n">tmax_f</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">KNOT_F</span><span class="p">,</span><span class="w"> </span><span class="m">0</span><span class="p">)</span><span class="o">^</span><span class="m">2</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
                </span><span class="n">precip_day</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">precip_missing</span><span class="p">,</span><span class="w">
  </span><span class="n">sigma</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">s</span><span class="p">(</span><span class="n">year</span><span class="p">,</span><span class="w"> </span><span class="n">k</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">10</span><span class="p">,</span><span class="w"> </span><span class="n">bs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"tp"</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>

<figure class="">
  <img src="/blogimages/boston-marathon-1976/marginal_temp_curve_B.png" alt="Physiology model marginal curve" /><figcaption>
      Physiology (quadratic hinge) model: marginal temperature curve with 95% credible ribbon. Note the aggressive bend above the 59°F knot.

    </figcaption></figure>

<p>Despite being allowed to vary below the knot, the curve remains approximately flat there. Above the knot we see strong quadratic behavior; however, how tightly the curve bends to reach the lone 92°F day is a bit alarming, and smells overfit to me.</p>

<h3 id="the-spline-model">The Spline model</h3>

<p>Last, I fit a model of my own crafting, with less structure than the literature imposes: the data drives the curve rather than a strict functional form.</p>

\[\log(\text{seconds}) \sim \ell_{\text{year}} + s(T_{\text{anomaly}}) + \beta \cdot \text{precip}\]

<p>Here, we just have the performance and precipitation controls, and a thin-plate spline on temperature. A thin-plate spline is a flexible curve fit through the data with a built-in penalty for getting too wiggly. One small change from the previous two models: the spline takes temperature as an <em>anomaly</em> (race-day max minus the day-of-year climatological average) rather than the raw °F. The hinge models needed a raw-scale variable to anchor the 59°F knot; the spline doesn’t have a fixed knot, so the climate-normalized version is the natural input (Boston’s Patriots’-Day climatology only varies by a few degrees across April, so the practical difference is small).</p>
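<p>A minimal sketch of the anomaly construction, assuming a hypothetical <code class="language-plaintext highlighter-rouge">normals</code> table built from the NOAA 1991–2020 daily climate normals:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Sketch: race-day max temperature minus the day-of-year climatological normal.
library(dplyr)
races &lt;- races |&gt;
  mutate(doy = as.integer(format(race_date, "%j"))) |&gt;
  left_join(normals, by = "doy") |&gt;                 # adds tmax_normal_f
  mutate(tmax_anomaly_f = tmax_f - tmax_normal_f)
</code></pre></div></div>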

<p>This model also adds a second equation letting the residual variance depend smoothly on year: the within-year top-3 SD shrinks across the century as the elite field densifies and pacing professionalizes.</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">bf</span><span class="p">(</span><span class="w">
  </span><span class="n">log_seconds</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">s</span><span class="p">(</span><span class="n">tmax_anomaly_f</span><span class="p">,</span><span class="w"> </span><span class="n">k</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">6</span><span class="p">,</span><span class="w"> </span><span class="n">bs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"tp"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
                </span><span class="n">precip_day</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">precip_missing</span><span class="p">,</span><span class="w">
  </span><span class="n">sigma</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">s</span><span class="p">(</span><span class="n">year</span><span class="p">,</span><span class="w"> </span><span class="n">k</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">10</span><span class="p">,</span><span class="w"> </span><span class="n">bs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"tp"</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>

<figure class="">
  <img src="/blogimages/boston-marathon-1976/marginal_temp_curve_A.png" alt="Spline model marginal curve" /><figcaption>
      Spline model: marginal temperature curve with 95% credible ribbon. Flexible thin-plate spline, no fixed knot.

    </figcaption></figure>

<p>The Spline model agrees with the other two below ~80°F where the data is dense. Above 85°F the curve bends less aggressively than the Physiology model’s quadratic — with one data point above 90°F, the thin-plate prior pulls toward a simpler shape.</p>

<h2 id="so-whats-the-answer">So What’s the Answer?</h2>

<p>So we’ve got three different formulations to answer our question. However, we want one final answer, not three. Often, modelers will just select a “best” model. If we go by that heuristic, the Spline model wins through <a href="https://mc-stan.org/loo/">Leave One Out (LOO) cross-validation</a>, a measure of how well the model predicts held-out observations. The “LOO Δ vs best (in SE)” shown below is the gap to the best model expressed in standard errors of that gap. Spline wins individual leave-one-out: Physiology is modestly worse (~2 SE), Wang-replication is clearly worse (~6 SE).</p>

<table>
  <thead>
    <tr>
      <th>Model</th>
      <th>LOO Δ vs best (in SE)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Spline</td>
      <td>0 (best)</td>
    </tr>
    <tr>
      <td>Physiology (quadratic hinge)</td>
      <td>−2.1</td>
    </tr>
    <tr>
      <td>Wang-replication (linear hinge)</td>
      <td>−5.9</td>
    </tr>
  </tbody>
</table>

<p>However, I wanted to be a bit careful here. Notably, we’re in the territory of outliers and extrapolation, dangerous waters for statistical models. In this regime, I err on the side of conservatism where possible. There is plenty of literature showing that a many-models approach often beats a single, highly predictive model. That’s what I chose to do here, known as <strong>model stacking</strong>: build a weighted mixture where each model’s weight is chosen to maximize the predictive performance of the combined model, following <a href="https://doi.org/10.1214/17-BA1091">Yao, Vehtari, Simpson &amp; Gelman (2018)</a>.</p>
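<p>A sketch of both steps with the <code class="language-plaintext highlighter-rouge">loo</code> package, assuming three fitted <code class="language-plaintext highlighter-rouge">brms</code> models under hypothetical names:</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Sketch: LOO comparison, then stacking weights (Yao et al. 2018).
library(brms)
library(loo)
loos &lt;- list(
  spline = loo(fit_spline),
  phys   = loo(fit_phys),
  wang   = loo(fit_wang)
)
loo_compare(loos)                             # ELPD differences vs the best model
loo_model_weights(loos, method = "stacking")  # mixture weights for the blend
</code></pre></div></div>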

<table>
  <thead>
    <tr>
      <th>Model</th>
      <th>Stacking weight (95% bootstrap CI, B=200)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Spline</td>
      <td>0.18 [0.14, 0.27]</td>
    </tr>
    <tr>
      <td>Physiology (quadratic hinge)</td>
      <td>0.55 [0.50, 0.65]</td>
    </tr>
    <tr>
      <td>Wang-replication (linear hinge)</td>
      <td>0.27 [0.17, 0.31]</td>
    </tr>
  </tbody>
</table>

<p>Stacking can favor a model that isn’t the best individual predictor, and we see that here — the largest contributor to the blend is the quadratic Physiology model, not the Spline. Stacking asks “if I’m allowed to combine the three models into one weighted prediction, what weights minimize the <em>combined</em> prediction error?” A model can be a slightly worse solo predictor and still earn high mixture weight if it makes its mistakes in <em>different places</em> than the others. Uncorrelated errors cancel out when blended. The Physiology model is more aggressive at the tails, predicting a higher heat penalty than the other two. The Spline and linear hinge models make similar predictions to each other, so they contribute redundant information. The Physiology model’s distinct behavior is what earns it a higher weight in the blend.</p>

<p>That said, the data-rich region appropriately dominates the fit, but we then assume that fit extrapolates cleanly to the sparse region. The mixture’s weights reflect predictive accuracy in mild conditions, and carrying that accuracy into the heat tail is an assumption, not a guarantee.
For the rest of this post I report the <strong>stacked mixture</strong> as the headline. The Spline is the natural single-model starting point (it’s the best individual predictor), but for an extrapolation question with a single anchoring data point at 92°F, leaning on a blend that incorporates the mechanism-committed Physiology and Wang-replication models is the more defensible read.</p>
<h3 id="effect-of-weather-on-final-race-times">Effect of Weather on Final Race Times</h3>

<figure class="">
  <img src="/blogimages/boston-marathon-1976/year_by_year_effect.png" alt="Stacked year-by-year weather effect" /><figcaption>
      Stacked mixture’s weather-only contribution to each year’s top-3 mean finish time. 1976 and 2017 stand clearly above the pack; 2018’s effect is driven by cold rather than heat.

    </figcaption></figure>

<p>For every race-year in the panel, the stacked model estimates the weather-only contribution to that year’s top-3 average finish time: the difference between what the model predicts happened and what it would have predicted on a 50°F day. Note that this contribution is bounded below at zero: each model’s marginal curve has its minimum at the 50°F reference, so any departure from the reference is non-negative by construction.</p>
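<p>A sketch of that computation for a single model; the stacked version repeats it per model and blends the draws by the stacking weights (<code class="language-plaintext highlighter-rouge">fit</code> is a hypothetical stand-in):</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Sketch: weather-only contribution = prediction at observed weather
# minus prediction at the 50°F, dry-day reference, for every year.
cf &lt;- transform(races, tmax_f = 50, precip_day = 0, precip_missing = 0)
d  &lt;- exp(posterior_epred(fit, newdata = races)) -
      exp(posterior_epred(fit, newdata = cf))   # draws x years, in seconds
weather_effect_min &lt;- colMeans(d) / 60          # posterior-mean minutes per year
</code></pre></div></div>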

<p>Most years cluster within a minute of zero. Two stand clearly above: 1976 and 2017. 2018 is particularly interesting: there, the model predicts a slowdown from cold weather rather than heat.</p>

<h3 id="the-1976-counterfactual">The 1976 counterfactual</h3>

<p>In an alternate universe where it was a 50°F Patriots’ Day in 1976, the stacked mixture estimates the 1976 top-3 mean would have been <strong>faster by 6:27</strong> (95% credible interval 2:47–10:52).</p>

<p>For context, by model: the Physiology model alone says 7:21 (the largest of the three), the Wang-replication model says 5:44, and the Spline model says 5:20.</p>

<figure class="">
  <img src="/blogimages/boston-marathon-1976/counterfactual_1976_kde.png" alt="1976 heat-cost posterior densities" /><figcaption>
      Posterior densities for the 1976 heat cost, by model. Stacked mixture headline: 6:27 faster on a 50°F day (95% CI 2:47–10:52).

    </figcaption></figure>

<p>The figure shows each model’s full posterior density side-by-side. The model identifies a <em>cohort-level</em> heat effect — every finisher in the field gets shifted by the same amount, so the counterfactual time is the same for each place by construction:</p>

<table>
  <thead>
    <tr>
      <th>Place</th>
      <th>Runner</th>
      <th>Actual time</th>
      <th>Counterfactual time</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>1</td>
      <td>Jack Fultz</td>
      <td>2:20:19</td>
      <td>2:13:52</td>
    </tr>
    <tr>
      <td>2</td>
      <td>Mario Cuevas</td>
      <td>2:21:13</td>
      <td>2:14:46</td>
    </tr>
    <tr>
      <td>3</td>
      <td>Jose DeJesus</td>
      <td>2:22:10</td>
      <td>2:15:43</td>
    </tr>
  </tbody>
</table>

<p>Observed top-3 mean: <strong>2:21:14</strong>; the model’s 50°F counterfactual top-3 mean is <strong>2:14:47</strong>.</p>

<h3 id="2026-prediction">2026 prediction</h3>

<p>I still have 40 minutes until I’m technically allowed to burn the midnight oil, but I did do my last data pull as I wrapped up this piece at 11:20 pm in Boston, the night before the race. The NWS forecast for Hopkinton is 48°F, near-perfect conditions. Under that weather, the stacked mixture predicts basically no temperature-related penalty and a top-3 mean of <strong>2:05:10</strong> with a 95% posterior predictive interval of 1:57:54–2:12:29. Obviously this is an incredibly wide interval. It would be remarkable to beat the world record of 2:00:35 on Boston’s famously difficult course, yet 11% of the posterior probability mass puts the top-3 average below that mark. That is to be expected, though, given we formulated this as a weather model, not a performance-forecasting model.</p>

<figure class="">
  <img src="/blogimages/boston-marathon-1976/predict_2026_kde.png" alt="2026 KDE overlay" /><figcaption>
      Posterior predictive densities for 2026 top-3 mean, by model. All three bunch tightly given the near-perfect 48°F forecast.

    </figcaption></figure>

<p>Of course, these curves all look similar, which is a result of the forecast being near-perfect.</p>

<p><span style="font-size:1.5em; font-weight:bold; display:block; margin:2em 0;">
Good luck to all the runners out there!
</span></p>

<h2 id="caveats">Caveats</h2>

<p class="notice--warning"><strong>Temperature is a proxy.</strong> The best marathon-performance studies use WBGT (wet-bulb globe temperature), which incorporates humidity, radiation, and wind. Blue Hill doesn’t have continuous dewpoint data until 2006 or wind data until 1998 at Logan. I kept maximum temperature + a precipitation indicator for simplicity.</p>

<p class="notice--warning"><strong>DNF selection.</strong> The 1976 top-3 is the 3 best <em>finishers</em>, not the 3 best in the field. Rodgers and Fleming dropped out — athletes who ran 2:09 and 2:12 on 50°F in 1975 and 1977. This selection bias affects the study, but I kept the top-3 framing following the precedent set by Ely 2007.</p>

<h2 id="technical-details">Technical details</h2>

<details>
  <summary><strong>Full model specifications, priors, and convergence</strong></summary>

  <p>All models fit in <code class="language-plaintext highlighter-rouge">brms</code> with the <code class="language-plaintext highlighter-rouge">cmdstanr</code> backend, 4 chains × 2000 post-warmup draws (2000 warmup), <code class="language-plaintext highlighter-rouge">adapt_delta = 0.995</code>. The state-space level uses a non-centered parameterization (sample standardized innovations <code class="language-plaintext highlighter-rouge">tilde_l ~ N(0,1)</code>, build <code class="language-plaintext highlighter-rouge">ll[y]</code> deterministically) to avoid Neal’s funnel.</p>

  <h3 id="state-space-level-shared-across-all-three-models">State-space level (shared across all three models)</h3>

  <div class="language-stan highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">parameters</span> <span class="p">{</span>
  <span class="kt">real</span> <span class="nv">ll_init</span><span class="p">;</span>
  <span class="kt">vector</span><span class="p">[</span><span class="nv">Y</span><span class="p">]</span> <span class="nv">tilde_l</span><span class="p">;</span>
  <span class="kt">real</span><span class="o">&lt;</span><span class="na">lower</span><span class="o">=</span><span class="mi">0</span><span class="o">&gt;</span> <span class="nv">sigma_l</span><span class="p">;</span>
<span class="p">}</span>
<span class="nn">transformed parameters</span> <span class="p">{</span>
  <span class="kt">vector</span><span class="p">[</span><span class="nv">Y</span><span class="p">]</span> <span class="nv">ll</span><span class="p">;</span>
  <span class="nv">ll</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="nv">ll_init</span><span class="p">;</span>
  <span class="k">for</span> <span class="p">(</span><span class="nv">y</span> <span class="kr">in</span> <span class="mi">2</span><span class="o">:</span><span class="nv">Y</span><span class="p">)</span> <span class="p">{</span>
    <span class="nv">ll</span><span class="p">[</span><span class="nv">y</span><span class="p">]</span> <span class="o">=</span> <span class="nv">ll</span><span class="p">[</span><span class="nv">y</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o">+</span> <span class="nv">sigma_l</span> <span class="o">*</span> <span class="nb">sqrt</span><span class="p">(</span><span class="nv">dt</span><span class="p">[</span><span class="nv">y</span><span class="p">])</span> <span class="o">*</span> <span class="nv">tilde_l</span><span class="p">[</span><span class="nv">y</span><span class="p">];</span>
  <span class="p">}</span>
<span class="p">}</span>
<span class="nn">model</span> <span class="p">{</span>
  <span class="nv">ll_init</span>  <span class="o">~</span> <span class="nb">normal</span><span class="p">(</span><span class="mi">9</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">);</span>
  <span class="nv">tilde_l</span>  <span class="o">~</span> <span class="nb">std_normal</span><span class="p">();</span>
  <span class="nv">sigma_l</span>  <span class="o">~</span> <span class="nb">normal</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mf">0.05</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div>  </div>

  <p>Per-row mean predictor adds <code class="language-plaintext highlighter-rouge">ll[year_idx[n]]</code>. The <code class="language-plaintext highlighter-rouge">√dt</code> scaling on the innovation honors race-cancellation gaps (1918, 2020, 2021). Note this is a simplification: the full Harvey 1989 covariance for a level-only random walk is the same, but if we’d been able to fit a local linear trend, the gap variance would be underestimated by ~1 year × σ_β per gap. Documented for completeness.</p>

  <h3 id="spline-model--flexible-spline-mean--year-only-sigma-sub-model">Spline model — flexible spline (mean) + year-only sigma sub-model</h3>

  <div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">bf</span><span class="p">(</span><span class="w">
  </span><span class="n">log_seconds</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">s</span><span class="p">(</span><span class="n">tmax_anomaly_f</span><span class="p">,</span><span class="w"> </span><span class="n">k</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">6</span><span class="p">,</span><span class="w"> </span><span class="n">bs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"tp"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
                </span><span class="n">precip_day</span><span class="w"> </span><span class="o">+</span><span class="w">
                </span><span class="n">precip_missing</span><span class="p">,</span><span class="w">
  </span><span class="n">sigma</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">s</span><span class="p">(</span><span class="n">year</span><span class="p">,</span><span class="w"> </span><span class="n">k</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">10</span><span class="p">,</span><span class="w"> </span><span class="n">bs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"tp"</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span></code></pre></div>  </div>

  <p>The <code class="language-plaintext highlighter-rouge">~ 0</code> suppresses brms’s automatic intercept, since the state-space level <code class="language-plaintext highlighter-rouge">ll[1]</code> carries the baseline. The sigma sub-model is <code class="language-plaintext highlighter-rouge">s(year)</code> only — I tested adding <code class="language-plaintext highlighter-rouge">s(tmax_anomaly)</code> to the variance equation, but at three observations per year there’s no power for within-year heat-variance signal (posterior dominated by prior). The s(year) term retains a real 4× signal across the century (within-year SD shrinks as the elite field densifies).</p>

  <h3 id="physiology-model--quadratic-hinge">Physiology model — quadratic hinge</h3>

  <div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">bf</span><span class="p">(</span><span class="w">
  </span><span class="n">log_seconds</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">I</span><span class="p">(</span><span class="n">pmax</span><span class="p">(</span><span class="n">knot_f</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">tmax_f</span><span class="p">,</span><span class="w"> </span><span class="m">0</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
                </span><span class="n">I</span><span class="p">(</span><span class="n">pmax</span><span class="p">(</span><span class="n">tmax_f</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">knot_f</span><span class="p">,</span><span class="w"> </span><span class="m">0</span><span class="p">)</span><span class="o">^</span><span class="m">2</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
                </span><span class="n">precip_day</span><span class="w"> </span><span class="o">+</span><span class="w">
                </span><span class="n">precip_missing</span><span class="p">,</span><span class="w">
  </span><span class="n">sigma</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">s</span><span class="p">(</span><span class="n">year</span><span class="p">,</span><span class="w"> </span><span class="n">k</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">10</span><span class="p">,</span><span class="w"> </span><span class="n">bs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"tp"</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span></code></pre></div>  </div>

  <h3 id="wang-replication-model--linear-hinge">Wang-replication model — linear hinge</h3>

  <div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">bf</span><span class="p">(</span><span class="w">
  </span><span class="n">log_seconds</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">I</span><span class="p">(</span><span class="n">pmax</span><span class="p">(</span><span class="n">tmax_f</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">knot_f</span><span class="p">,</span><span class="w"> </span><span class="m">0</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
                </span><span class="n">precip_day</span><span class="w"> </span><span class="o">+</span><span class="w">
                </span><span class="n">precip_missing</span><span class="p">,</span><span class="w">
  </span><span class="n">sigma</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">s</span><span class="p">(</span><span class="n">year</span><span class="p">,</span><span class="w"> </span><span class="n">k</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">10</span><span class="p">,</span><span class="w"> </span><span class="n">bs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"tp"</span><span class="p">)</span><span class="w">
</span><span class="p">)</span><span class="w">
</span></code></pre></div>  </div>

  <p>Knot fixed at 59°F across the Physiology and Wang-replication models.</p>

  <h3 id="priors">Priors</h3>

  <table>
    <thead>
      <tr>
        <th>Parameter</th>
        <th>Prior</th>
        <th>Justification</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>Intercept</td>
        <td><code class="language-plaintext highlighter-rouge">normal(9, 0.3)</code></td>
        <td>log(7920 s) ≈ 8.98; ±1σ covers roughly the 1:40–2:55 elite range</td>
      </tr>
      <tr>
        <td><code class="language-plaintext highlighter-rouge">s(year)</code> SD</td>
        <td><code class="language-plaintext highlighter-rouge">normal(0, 0.30)</code></td>
        <td>Total 1927→2025 year effect ≈ 0.4 on log scale</td>
      </tr>
      <tr>
        <td><code class="language-plaintext highlighter-rouge">s(tmax_anomaly)</code> SD (Spline model)</td>
        <td><code class="language-plaintext highlighter-rouge">normal(0, 0.18)</code></td>
        <td>Prior predictive 97.5% tail at 90°F vs 50°F is ~30 min slowdown</td>
      </tr>
      <tr>
        <td>Hinge coefficients (Physiology + Wang-replication)</td>
        <td><code class="language-plaintext highlighter-rouge">normal(0, 0.05)</code></td>
        <td>Above-knot effects on log scale; ±2σ covers ±10% per °F departure</td>
      </tr>
      <tr>
        <td>precip indicator</td>
        <td><code class="language-plaintext highlighter-rouge">normal(0, 0.05)</code></td>
        <td>Wet vs dry ±2σ covers ~±10% effect on finishing time</td>
      </tr>
    </tbody>
  </table>

  <h3 id="convergence">Convergence</h3>

  <table>
    <thead>
      <tr>
        <th>Model</th>
        <th>max Rhat</th>
        <th>min bulk-ESS</th>
        <th>min tail-ESS</th>
        <th>divergent transitions</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>Spline</td>
        <td>1.004</td>
        <td>1407</td>
        <td>2380</td>
        <td>0</td>
      </tr>
      <tr>
        <td>Physiology</td>
        <td>1.002</td>
        <td>1225</td>
        <td>2163</td>
        <td>0</td>
      </tr>
      <tr>
        <td>Wang-replication</td>
        <td>1.001</td>
        <td>1476</td>
        <td>3112</td>
        <td>0</td>
      </tr>
    </tbody>
  </table>

</details>

<details>
  <summary><strong>Prior predictive and posterior predictive checks</strong></summary>

  <p><img src="/blogimages/boston-marathon-1976/fig5_prior_predictive.png" alt="Prior predictive" /></p>

  <p>The prior predictive generates plausible top-3 time distributions across the TMAX range.</p>

  <p><img src="/blogimages/boston-marathon-1976/fig6_ppc_combined.png" alt="PPC combined" /></p>

  <p>Posterior predictive checks for all three models. Per-model PPCs:</p>

  <p><img src="/blogimages/boston-marathon-1976/fig6_ppc_A.png" alt="PPC — Spline model" />
<img src="/blogimages/boston-marathon-1976/fig6_ppc_B.png" alt="PPC — Physiology model" />
<img src="/blogimages/boston-marathon-1976/fig6_ppc_C.png" alt="PPC — Wang-replication model" /></p>

</details>

<details>
  <summary><strong>LOO comparison and stacking weight bootstrap</strong></summary>

  <p>The ELPD differences below are enormous in absolute terms; values this large are the expected signature of the Pareto-k &gt; 0.7 degeneracy described after the table, and only the ELPD-diff / SE-diff ratio is meaningful.</p>

  <table>
    <thead>
      <tr>
        <th>Model</th>
        <th>ELPD-diff</th>
        <th>SE-diff</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>Spline</td>
        <td>0.0</td>
        <td>0.0</td>
      </tr>
      <tr>
        <td>Physiology</td>
        <td>-2168949.5</td>
        <td>1014288.0</td>
      </tr>
      <tr>
        <td>Wang-replication</td>
        <td>-8524375.6</td>
        <td>1454564.9</td>
      </tr>
    </tbody>
  </table>

  <p>Absolute ELPD-LOO and p-LOO are omitted because state-space LOO is degenerate here — every observation has Pareto-k &gt; 0.7, the latent year level is informed by all years, and leaving out a single observation fundamentally changes the inference. The ELPD-diff / SE-diff ratios above are still useful as a <em>relative</em> ranking signal, but the absolute predictive log scores are not interpretable on their usual scale.</p>

  <p>Stacking weights with bootstrapped 95% CIs (200 bootstrap samples, resampling races with replacement):</p>

  <table>
    <thead>
      <tr>
        <th>Model</th>
        <th>Stacking weight (95% bootstrap CI, B=200)</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>Spline</td>
        <td>0.18 [0.14, 0.27]</td>
      </tr>
      <tr>
        <td>Physiology (quadratic hinge)</td>
        <td>0.55 [0.50, 0.65]</td>
      </tr>
      <tr>
        <td>Wang-replication (linear hinge)</td>
        <td>0.27 [0.17, 0.31]</td>
      </tr>
    </tbody>
  </table>

  <p>The Physiology model’s CI doesn’t overlap the other two, identifying it as the preferred mixture component even on this small panel — corroborating the mechanism-anchored choice independently.</p>
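  <p>For reference, a sketch of how such a bootstrap could run, reusing the pointwise LOO log scores (object names are hypothetical):</p>

  <div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Sketch: resample years with replacement, recompute stacking weights, repeat.
set.seed(42)
lpd &lt;- sapply(loos, function(l) l$pointwise[, "elpd_loo"])  # n_years x 3
w_boot &lt;- replicate(200, {
  idx &lt;- sample(nrow(lpd), replace = TRUE)
  as.numeric(loo::stacking_weights(lpd[idx, ]))
})
apply(w_boot, 1, quantile, probs = c(0.025, 0.975))  # per-model 95% CI
</code></pre></div></div>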

</details>

<details>
  <summary><strong>Knot sensitivity</strong></summary>

  <p>The Physiology and Wang-replication models both fix the hinge knot at 59°F (15°C) a priori from Galloway/Wang; the sensitivity sweep below is reported for transparency, not for slope selection. Sweep at 55/57/59/61/63°F (Physiology-model heat cost shown):</p>

  <table>
    <thead>
      <tr>
        <th>Knot (°F)</th>
        <th>1976 top-3 mean heat cost (min)</th>
        <th>95% CI</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>55</td>
        <td>7.2</td>
        <td>[3.4, 11.2]</td>
      </tr>
      <tr>
        <td>57</td>
        <td>7.2</td>
        <td>[3.3, 11.3]</td>
      </tr>
      <tr>
        <td>59</td>
        <td>7.4</td>
        <td>[3.2, 11.5]</td>
      </tr>
      <tr>
        <td>61</td>
        <td>7.2</td>
        <td>[2.9, 11.6]</td>
      </tr>
      <tr>
        <td>63</td>
        <td>7.4</td>
        <td>[2.9, 11.8]</td>
      </tr>
    </tbody>
  </table>

  <p>The Physiology-model headline is robust across knots — within 0.2 min of the canonical 59°F value, well inside the within-knot CI width. Knot location is not a meaningful researcher degree of freedom for the headline.</p>

  <p>For the Wang comparison, the Wang-replication model’s above-knot slope per knot:</p>

  <table>
    <thead>
      <tr>
        <th>Knot (°F)</th>
        <th>Above-knot slope (min/°C)</th>
        <th>95% CI</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>55</td>
        <td>0.24</td>
        <td>[0.10, 0.39]</td>
      </tr>
      <tr>
        <td>57</td>
        <td>0.27</td>
        <td>[0.12, 0.42]</td>
      </tr>
      <tr>
        <td>59</td>
        <td>0.30</td>
        <td>[0.14, 0.47]</td>
      </tr>
      <tr>
        <td>61</td>
        <td>0.33</td>
        <td>[0.15, 0.51]</td>
      </tr>
      <tr>
        <td>63</td>
        <td>0.36</td>
        <td>[0.16, 0.56]</td>
      </tr>
    </tbody>
  </table>

  <p>The Wang-replication model’s slope grows monotonically with knot (more leverage from a sharper threshold). At the canonical 59°F we get 0.30 min/°C; the slope at knot=63°F approaches Wang’s pooled 0.39. The CIs overlap Wang’s value at every knot we tested.</p>

</details>

<details>
  <summary><strong>Across-model marginal curve (envelope)</strong></summary>

  <p><img src="/blogimages/boston-marathon-1976/marginal_temp_curve.png" alt="Marginal curve envelope" /></p>

  <p><img src="/blogimages/boston-marathon-1976/marginal_temp_curve_decomposed.png" alt="Marginal curve decomposed" /></p>

  <p>The envelope is the pointwise union of the three models’ 95% intervals across TMAX. The decomposed view shows the three curves separately for comparison.</p>

</details>

<h2 id="acknowledgments-and-sources">Acknowledgments and sources</h2>

<ul>
  <li>NOAA <a href="https://www.ncei.noaa.gov/cdo-web/datasets">GHCN-Daily</a> for Blue Hill weather records and <a href="https://www.ncei.noaa.gov/products/us-climate-normals">1991–2020 Climate Normals</a> for the climatology baseline.</li>
  <li><a href="https://github.com/adrian3/Boston-Marathon-Data-Project">adrian3/Boston-Marathon-Data-Project</a> for pre-2019 race results.</li>
  <li>The <a href="https://www.baa.org/">Boston Athletic Association</a> for historical results 2019–2025 and race-date records.</li>
  <li><a href="https://paul-buerkner.github.io/brms/">Paul Bürkner’s <code class="language-plaintext highlighter-rouge">brms</code></a> and the Stan developers.</li>
  <li><a href="https://www.bostonglobe.com/about/staff-list/staff/matt-porter/">Matt Porter</a> for getting my wheels spinning on the problem.</li>
</ul>
<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:ci-note" role="doc-endnote">
      <p>All “95% CI” values in this post are 95% Bayesian credible intervals (the central 95% of the posterior), not frequentist confidence intervals. I’ll abbreviate as “95% CI” in tables for compactness. <a href="#fnref:ci-note" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>Tyler James Burch</name><email>burcht11@gmail.com</email></author><category term="Statistics" /><category term="bayesian" /><category term="brms" /><category term="sports" /><category term="running" /><category term="weather" /><summary type="html"><![CDATA[How much did the 1976 'Run for the Hoses' actually slow the field?]]></summary></entry><entry><title type="html">Forecasting March Madness 2026 - Latent Skills Models</title><link href="https://tylerjamesburch.com/blog/statistics/march-madness-2026" rel="alternate" type="text/html" title="Forecasting March Madness 2026 - Latent Skills Models" /><published>2026-03-16T00:00:00+00:00</published><updated>2026-03-16T00:00:00+00:00</updated><id>https://tylerjamesburch.com/blog/statistics/march-madness-2026</id><content type="html" xml:base="https://tylerjamesburch.com/blog/statistics/march-madness-2026"><![CDATA[<script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"></script>

<p><em>For live updates throughout the tournament, see <a href="/march-madness-2026/">the dashboard</a>.</em></p>

<h2 id="background">Background</h2>

<p>Every March, 68 college basketball teams from 31 conferences get thrown into a single-elimination bracket, and \(2^{63}\) (roughly 9.2 quintillion) possible outcomes follow. I find bracketology to be a really fun statistical problem: individual games are often fairly predictable, but with that many possible outcomes, low-probability upsets are virtually guaranteed to sneak through <em>somewhere</em>. Everyone’s bracket breaks. That’s why <a href="https://en.wikipedia.org/wiki/Paul_the_Octopus">even an octopus</a> can put together competitive predictions for single-elimination tournaments.</p>

<p>For this year, I fit a model to forecast the NCAA March Madness tournaments. The approach builds on the same philosophy as my <a href="https://tylerjamesburch.com/blog/statistics/hockey-bayes">NHL team strength model</a>, a hierarchical Bayesian model for head-to-head matchups where we have historical scores. That model decomposes team quality into attack and defense parameters, and the same decomposition is a natural fit for basketball. By separating offense and defense, we can see how Houston’s defense-first approach is fundamentally different from Alabama’s strong offensive profile. Simulating the bracket knowing <em>how</em> a team wins, not just <em>that</em> it wins, provides a richer, more digestible understanding of the predictions.</p>

<h2 id="the-model">The Model</h2>

<h3 id="step-1-bradley-terry-winloss">Step 1: Bradley-Terry (Win/Loss)</h3>

<p>The classic <a href="https://en.wikipedia.org/wiki/Bradley%E2%80%93Terry_model">Bradley-Terry model</a> assigns each team \(i\) a latent strength \(\theta_i\) and models the probability that team \(i\) beats team \(j\) as a function of the difference in their strengths. In its typical form, the only outcome is binary: who won. This is a workhorse model for pairwise comparisons (<a href="https://en.wikipedia.org/wiki/Elo_rating_system">Elo ratings</a> are a special case) and it works fine for generating bracket probabilities.</p>
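<p>In its standard form, the win probability is a logistic function of the strength gap:</p>

\[P(i \text{ beats } j) = \frac{e^{\theta_i}}{e^{\theta_i} + e^{\theta_j}} = \text{logit}^{-1}(\theta_i - \theta_j)\]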

<p>But basketball gives us more than just wins and losses. Beating a team by 30 tells you something very different than winning by 1, but the basic Bradley-Terry model sees those as equivalent.</p>

<h3 id="step-2-score-differential">Step 2: Score Differential</h3>

<p>A natural extension is to model the score <em>margin</em> directly:</p>

\[\text{margin}_{ij} \sim \text{Normal}(\theta_i - \theta_j + \alpha \cdot \text{home}_i, \sigma)\]

<p>Now \(\theta_i\) captures how many points team \(i\) is above or below average, and the likelihood uses the full margin rather than collapsing it to a binary outcome. Home court advantage \(\alpha\) enters linearly. I stood this up initially, and it achieved a Brier score of 0.189 across 10 seasons of held-out tournament predictions.</p>

<p>But this model still assigns each team a single “strength” number. Michigan and Houston might have the same \(\theta\), but they get there in completely different ways. The single strength parameter can’t see that.</p>

<h3 id="step-3-offense-defense-decomposition">Step 3: Offense-Defense Decomposition</h3>

<p>In this approach, each game produces <em>two</em> observations, one score per team. By observing each team’s score separately, you can naturally split team strength into offense and defense:</p>

\[\text{score}_i \sim \text{Normal}(\mu + \text{off}_i - \text{def}_j + \alpha \cdot \text{home}_i, \sigma)\]

\[\text{score}_j \sim \text{Normal}(\mu + \text{off}_j - \text{def}_i - \alpha \cdot \text{home}_i, \sigma)\]

<p>Each team gets two parameters: an offensive strength \(\text{off}_i\) (how many points they generate above average) and a defensive strength \(\text{def}_i\) (how many points they <em>prevent</em> above average). The global intercept \(\mu\) anchors the average score per team per game, estimated at about 70 points. Home court advantage \(\alpha\) applies in the usual way.</p>

<p>Note that if you subtract the second equation from the first, you recover the margin model. Any information the margin model could learn, this model can as well. What it gains is the ability to distinguish Houston’s defense-first 77-55 wins from Michigan’s offense-first 101-83 wins, even when both are comfortable victories.</p>
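<p>Concretely, subtracting the second score equation from the first gives</p>

\[\text{score}_i - \text{score}_j = \underbrace{(\text{off}_i + \text{def}_i)}_{\theta_i} - \underbrace{(\text{off}_j + \text{def}_j)}_{\theta_j} + 2\alpha \cdot \text{home}_i\]

<p>so the margin model’s single strength is recovered as \(\theta = \text{off} + \text{def}\), with a doubled home-court coefficient.</p>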

<p>This is the same framework <a href="https://discovery.ucl.ac.uk/id/eprint/16040/1/16040.pdf">Baio &amp; Blangiardo</a> used for modeling the Premier League, adapted for college basketball. The key insight from that work carries over: by modeling scores directly instead of margins, you can identify teams that win by outscoring opponents versus teams that win by shutting them down.</p>

<h3 id="hierarchical-structure-and-the-lkj-correlation">Hierarchical Structure and the LKJ Correlation</h3>

<p>Not all teams are created equal in terms of schedule strength. A mid-major team that went 28-3 against weak competition: how good are they really? I handle this with a hierarchical prior. Each team’s offense and defense are drawn from conference-level distributions:</p>

\[\begin{bmatrix} \text{off}_i \\ \text{def}_i \end{bmatrix} = \begin{bmatrix} \mu^{\text{off}}_{c[i]} \\ \mu^{\text{def}}_{c[i]} \end{bmatrix} + L \cdot z_i\]

<p>where \(L\) is the Cholesky factor of a \(2 \times 2\) covariance matrix with an LKJ(\(\eta=2\)) prior on the correlation, and \(z_i \sim \text{Normal}(0, 1)\). Conference-level means for offense (\(\mu^{\text{off}}_c\)) and defense (\(\mu^{\text{def}}_c\)) are estimated separately. The SEC might produce strong defenses while the Big Ten generates potent offenses, or vice versa.</p>

<p>The LKJ prior lets the model learn whether offense and defense trade off or co-occur within a conference. The non-centered parameterization via \(z_i\) is the standard trick for sampling efficiency in hierarchical models.</p>

<p>The ultimate result is that we get conference-level effects: a dominant team in a weak conference gets pulled down slightly, and an underperforming team in a strong conference gets a boost.</p>

<h3 id="on-the-likelihood">On the likelihood</h3>

<p>One decision to make in this process is the likelihood for number of points scored in a game. Here we balance pragmatism and expressiveness.</p>

<p>In basketball, points are discrete units so a natural question is why not use a Poisson or negative binomial likelihood? A couple of reasons:</p>

<ul>
  <li>Basketball scores are sums of 1/2/3-point possessions, not a genuine counting process. A team’s score of 78 isn’t “78 events occurred” the way 3 goals in a hockey game is.</li>
  <li>At a mean of ~70 points, the Gaussian puts essentially zero mass below zero, and by the central limit theorem it’s a good approximation to a sum of many small discrete contributions anyway.</li>
  <li>The continuous approximation (Gaussian) is simpler and easier to fit, which was helpful given I had one night to put this together.</li>
</ul>

<p>With individual scores instead of margins, outliers become a concern. Blowout games could pull the model around. To handle this empirically, I fit both a Gaussian and a Student-t likelihood (which is more robust to outliers) and compared them via LOO cross-validation:</p>

<p><img src="/blogimages/march-madness-2026/loo_comparison.png" alt="LOO model comparison" /></p>

<p>The ELPD difference was negligible, slightly favoring the Gaussian. The Student-t model estimated \(\nu \approx 53\) degrees of freedom, which is practically indistinguishable from a Gaussian, so I kept the Gaussian for simplicity.</p>
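<p>A minimal sketch of that comparison, assuming both models were fit with <code class="language-plaintext highlighter-rouge">idata_kwargs={"log_likelihood": True}</code> and that the two inference data objects are named <code class="language-plaintext highlighter-rouge">idata_gaussian</code> and <code class="language-plaintext highlighter-rouge">idata_student_t</code> (illustrative names, not from the original code):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import arviz as az

# PSIS-LOO expected log predictive density for each candidate likelihood;
# az.compare ranks the models and reports the ELPD difference and its SE
comparison = az.compare({"gaussian": idata_gaussian, "student_t": idata_student_t})
print(comparison)
</code></pre></div></div>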

<h3 id="full-model-specification">Full Model Specification</h3>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">with</span> <span class="n">pm</span><span class="p">.</span><span class="n">Model</span><span class="p">(</span><span class="n">coords</span><span class="o">=</span><span class="n">coords</span><span class="p">)</span> <span class="k">as</span> <span class="n">model</span><span class="p">:</span>
    <span class="n">mu_intercept</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Normal</span><span class="p">(</span><span class="s">"mu_intercept"</span><span class="p">,</span> <span class="n">mu</span><span class="o">=</span><span class="mi">70</span><span class="p">,</span> <span class="n">sigma</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span>

    <span class="n">sigma_off_conf</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">HalfNormal</span><span class="p">(</span><span class="s">"sigma_off_conf"</span><span class="p">,</span> <span class="n">sigma</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span>
    <span class="n">sigma_def_conf</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">HalfNormal</span><span class="p">(</span><span class="s">"sigma_def_conf"</span><span class="p">,</span> <span class="n">sigma</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span>
    <span class="n">mu_off_conf</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Normal</span><span class="p">(</span><span class="s">"mu_off_conf"</span><span class="p">,</span> <span class="n">mu</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">sigma</span><span class="o">=</span><span class="n">sigma_off_conf</span><span class="p">,</span> <span class="n">dims</span><span class="o">=</span><span class="s">"conference"</span><span class="p">)</span>
    <span class="n">mu_def_conf</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Normal</span><span class="p">(</span><span class="s">"mu_def_conf"</span><span class="p">,</span> <span class="n">mu</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">sigma</span><span class="o">=</span><span class="n">sigma_def_conf</span><span class="p">,</span> <span class="n">dims</span><span class="o">=</span><span class="s">"conference"</span><span class="p">)</span>

    <span class="n">sd_dist</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">HalfNormal</span><span class="p">.</span><span class="n">dist</span><span class="p">(</span><span class="n">sigma</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span>
    <span class="n">chol</span><span class="p">,</span> <span class="n">corr</span><span class="p">,</span> <span class="n">stds</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">LKJCholeskyCov</span><span class="p">(</span><span class="s">"lkj"</span><span class="p">,</span> <span class="n">n</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">eta</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">sd_dist</span><span class="o">=</span><span class="n">sd_dist</span><span class="p">,</span>
                                          <span class="n">compute_corr</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
    <span class="n">z</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Normal</span><span class="p">(</span><span class="s">"z"</span><span class="p">,</span> <span class="n">mu</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">sigma</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="n">n_teams</span><span class="p">,</span> <span class="mi">2</span><span class="p">))</span>

    <span class="n">team_effects</span> <span class="o">=</span> <span class="n">pt</span><span class="p">.</span><span class="n">stack</span><span class="p">([</span><span class="n">mu_off_conf</span><span class="p">[</span><span class="n">conf_of_team</span><span class="p">],</span>
                             <span class="n">mu_def_conf</span><span class="p">[</span><span class="n">conf_of_team</span><span class="p">]],</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span> <span class="o">+</span> <span class="n">pt</span><span class="p">.</span><span class="n">dot</span><span class="p">(</span><span class="n">z</span><span class="p">,</span> <span class="n">chol</span><span class="p">.</span><span class="n">T</span><span class="p">)</span>
    <span class="n">off</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Deterministic</span><span class="p">(</span><span class="s">"off"</span><span class="p">,</span> <span class="n">team_effects</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">],</span> <span class="n">dims</span><span class="o">=</span><span class="s">"team"</span><span class="p">)</span>
    <span class="n">deff</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Deterministic</span><span class="p">(</span><span class="s">"def"</span><span class="p">,</span> <span class="n">team_effects</span><span class="p">[:,</span> <span class="mi">1</span><span class="p">],</span> <span class="n">dims</span><span class="o">=</span><span class="s">"team"</span><span class="p">)</span>

    <span class="n">alpha</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Normal</span><span class="p">(</span><span class="s">"alpha"</span><span class="p">,</span> <span class="n">mu</span><span class="o">=</span><span class="mf">3.5</span><span class="p">,</span> <span class="n">sigma</span><span class="o">=</span><span class="mf">2.0</span><span class="p">)</span>
    <span class="n">sigma</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">HalfNormal</span><span class="p">(</span><span class="s">"sigma"</span><span class="p">,</span> <span class="n">sigma</span><span class="o">=</span><span class="mi">15</span><span class="p">)</span>

    <span class="n">mu_score_i</span> <span class="o">=</span> <span class="n">mu_intercept</span> <span class="o">+</span> <span class="n">off</span><span class="p">[</span><span class="n">team_i</span><span class="p">]</span> <span class="o">-</span> <span class="n">deff</span><span class="p">[</span><span class="n">team_j</span><span class="p">]</span> <span class="o">+</span> <span class="n">alpha</span> <span class="o">*</span> <span class="n">home</span>
    <span class="n">mu_score_j</span> <span class="o">=</span> <span class="n">mu_intercept</span> <span class="o">+</span> <span class="n">off</span><span class="p">[</span><span class="n">team_j</span><span class="p">]</span> <span class="o">-</span> <span class="n">deff</span><span class="p">[</span><span class="n">team_i</span><span class="p">]</span> <span class="o">-</span> <span class="n">alpha</span> <span class="o">*</span> <span class="n">home</span>

    <span class="n">pm</span><span class="p">.</span><span class="n">Normal</span><span class="p">(</span><span class="s">"score_i"</span><span class="p">,</span> <span class="n">mu</span><span class="o">=</span><span class="n">mu_score_i</span><span class="p">,</span> <span class="n">sigma</span><span class="o">=</span><span class="n">sigma</span><span class="p">,</span> <span class="n">observed</span><span class="o">=</span><span class="n">score_i</span><span class="p">,</span> <span class="n">dims</span><span class="o">=</span><span class="s">"game"</span><span class="p">)</span>
    <span class="n">pm</span><span class="p">.</span><span class="n">Normal</span><span class="p">(</span><span class="s">"score_j"</span><span class="p">,</span> <span class="n">mu</span><span class="o">=</span><span class="n">mu_score_j</span><span class="p">,</span> <span class="n">sigma</span><span class="o">=</span><span class="n">sigma</span><span class="p">,</span> <span class="n">observed</span><span class="o">=</span><span class="n">score_j</span><span class="p">,</span> <span class="n">dims</span><span class="o">=</span><span class="s">"game"</span><span class="p">)</span>
</code></pre></div></div>

<p>Sampled with <a href="https://github.com/pymc-devs/nutpie">nutpie</a>: 4 chains, 2,000 draws each after 2,000 tuning steps. Zero divergences, \(\hat{R} \leq 1.01\) for all parameters.</p>
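<p>For reference, one way to invoke nutpie from PyMC (a sketch; the seed shown is illustrative):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import arviz as az

with model:
    idata = pm.sample(
        draws=2000,
        tune=2000,
        chains=4,
        nuts_sampler="nutpie",  # Rust-based NUTS implementation
        random_seed=2026,
    )

# Convergence checks: divergence count and split R-hat
print(int(idata.sample_stats["diverging"].sum()))
print(az.summary(idata, var_names=["alpha", "sigma"]))
</code></pre></div></div>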

<h2 id="data">Data</h2>

<p>The model is fit on the <strong>2025-26 regular season</strong>: all 5,647 Division I games across 365 teams and 31 conferences. Data comes from the <a href="https://www.kaggle.com/competitions/march-machine-learning-mania-2026">Kaggle March ML Mania 2026</a> competition dataset. Each game provides: which teams played, both scores, and whether it was a home game, away game, or neutral site.</p>

<h3 id="posterior-predictive-check">Posterior Predictive Check</h3>

<p><img src="/blogimages/march-madness-2026/ppc_scores.png" alt="Posterior predictive scores" /></p>

<p>The posterior predictive distribution matches the observed score distribution well.</p>
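<p>The check itself is only a couple of lines in PyMC and ArviZ (a sketch, reusing the <code class="language-plaintext highlighter-rouge">model</code> and <code class="language-plaintext highlighter-rouge">idata</code> from above):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import arviz as az

with model:
    # Draw replicated scores from the posterior predictive distribution
    pm.sample_posterior_predictive(idata, extend_inferencedata=True)

# Overlay replicated score densities on the observed score distribution
az.plot_ppc(idata, var_names=["score_i", "score_j"], num_pp_samples=200)
</code></pre></div></div>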

<h2 id="results">Results</h2>

<h3 id="overall-team-strength">Overall Team Strength</h3>

<p>The overall strength of a team is the sum of its offensive and defensive contributions: \(\text{off}_i + \text{def}_i\).</p>
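<p>As a sketch, this ranking falls straight out of the posterior (note the model registers the defensive effect under the name <code class="language-plaintext highlighter-rouge">"def"</code>):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Posterior mean of overall strength, off + def, per team
strength = (idata.posterior["off"] + idata.posterior["def"]).mean(dim=("chain", "draw"))
top30 = strength.to_series().sort_values(ascending=False).head(30)
print(top30)
</code></pre></div></div>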

<p><img src="/blogimages/march-madness-2026/team_strengths_top30.png" alt="Top 30 team strengths" /></p>

<p>Michigan and Duke sit at the top, followed by Arizona and Florida. These are the four number one seeds. Some may call this chalky; I call it a great sanity check.</p>

<h3 id="the-offense-defense-decomposition">The Offense-Defense Decomposition</h3>

<p><img src="/blogimages/march-madness-2026/off_def_scatter.png" alt="Offense vs defense scatter" /></p>

<p>Every dot is a team; grey dots are teams that didn’t make the tournament. The x-axis is offensive strength (higher = generates more points above average), the y-axis is defensive strength (higher = allows fewer points than average). Teams in the upper-right are good at both; teams in the lower-left are bad at both.</p>

<p>A few teams to highlight:</p>

<ul>
  <li><strong>Houston</strong> (overall 23.6): off = 7.0, def = 16.6. This is a defense-first team by a wide margin. Elite defensive rating, merely decent offense.</li>
  <li><strong>Alabama</strong> (overall 21.0): off = 20.7, def = 0.3. The highest offensive rating of any tournament team, but paired with a defense essentially at the league average.</li>
  <li><strong>Duke</strong> (overall 28.0): off = 12.0, def = 16.0. Also defense-first, but with a more balanced profile than Houston.</li>
</ul>

<p><img src="/blogimages/march-madness-2026/off_def_rankings.png" alt="Top 20 by offense and defense" /></p>

<p>The side-by-side rankings make the decomposition concrete. The lists of top-20 offenses and top-20 defenses are quite different. A team can rank in the top 5 offensively while barely cracking the top 20 defensively, or vice versa.</p>

<p>Physically, \(\text{off}_i\) is how many points above average team \(i\) produces per game, and \(\text{def}_i\) is how many points below average they hold opponents to. These are per-game quantities, not per-possession, so they bake in pace alongside efficiency.</p>

<h3 id="conference-effects-offense-vs-defense">Conference Effects: Offense vs Defense</h3>

<p><img src="/blogimages/march-madness-2026/conference_off_def.png" alt="Conference offensive vs defensive effects" /></p>

<p>Conferences generally don’t lean dramatically in one direction or the other - good conferences are good, bad conferences are bad. The correlation between conference-level offensive and defensive effects is 0.83, which is very strong.</p>

<h2 id="shooters-shoot---simpsons-paradox-in-the-wild">Shooters Shoot - Simpson’s Paradox in the Wild</h2>

<p>One thing worth noting in the plot above: the marginal correlation between offense and defense across all 365 teams is <strong>+0.38</strong>. That’s positive: teams that are good at offense tend to also be good at defense. Intuitively, this makes sense: programs with good coaching, recruiting, and resources tend to be good at everything.</p>

<p>But the LKJ model parameter, the <em>within-conference</em> correlation that the model learned, is <strong>-0.20</strong>. That’s negative. Within a conference, offense and defense <em>trade off</em>. This is <a href="https://en.wikipedia.org/wiki/Simpson%27s_paradox">Simpson’s paradox</a>. The relationship reverses when you condition on the confounding variable (conference membership). Here’s why:</p>

<p><strong>Between conferences</strong>, the correlation is <strong>+0.83</strong>. Strong conferences (SEC, Big 12, Big Ten) produce teams that are above average on <em>both</em> offense and defense. Weak conferences produce teams below average on both. This between-group correlation is strong and positive, and when you pool all teams together ignoring conference, it dominates the marginal relationship.</p>

<p><strong>Within a conference</strong>, though, there’s also a structural component: games are zero-sum at the score level. When Team A runs up the score on Team B, that same game hurts Team B’s defensive numbers. One team’s strong offensive showing is simultaneously a poor defensive showing for the opponent, and conference opponents play each other repeatedly, effectively inducing a negative correlation. The modest negative correlation of about <strong>-0.20</strong> within conferences largely shakes out from this zero-sum structure.</p>

<p>The LKJ model parameter of <strong>-0.20</strong> captures precisely this within-conference structure. That’s what it’s designed to do: after the conference means (\(\mu^{\text{off}}_c\), \(\mu^{\text{def}}_c\)) absorb the between-conference variation, the LKJ correlation models the <em>residual</em> relationship between offense and defense at the team level.</p>
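<p>You can see the paradox directly in the posterior means without any model machinery. A sketch, assuming a dataframe <code class="language-plaintext highlighter-rouge">teams</code> with columns <code class="language-plaintext highlighter-rouge">off</code>, <code class="language-plaintext highlighter-rouge">def</code>, and <code class="language-plaintext highlighter-rouge">conference</code> (hypothetical names):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Marginal correlation: pool all 365 teams together
print(teams["off"].corr(teams["def"]))  # roughly +0.38

# Between-conference correlation: correlate the conference means
conf_means = teams.groupby("conference")[["off", "def"]].mean()
print(conf_means["off"].corr(conf_means["def"]))  # roughly +0.83

# Within-conference correlation: demean each team by its conference first
resid = teams[["off", "def"]] - teams.groupby("conference")[["off", "def"]].transform("mean")
print(resid["off"].corr(resid["def"]))  # roughly -0.20, the sign flips
</code></pre></div></div>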

<h3 id="home-court-advantage">Home Court Advantage</h3>

<p><img src="/blogimages/march-madness-2026/home_court.png" alt="Home court advantage posterior" /></p>

<p>The model estimates \(\alpha\) at <strong>1.44 ± 0.11 points</strong>. At first glance this looks low, but remember that \(\alpha\) appears in <em>both</em> score equations with opposite signs: the home team’s expected score goes up by \(\alpha\), and the away team’s goes down by \(\alpha\). The net effect on the margin is \(2\alpha \approx 2.9\) points, in line with the published literature. <a href="https://kenpom.com/">KenPom’s estimate</a> is ~3.5 raw points; ours comes out slightly lower because the model accounts for team quality differences simultaneously.</p>

<p>This parameter matters for training but zeroes out for tournament predictions, since all men’s tournament games are on neutral courts. I’ll come back to why this distinction matters for the women’s tournament later.</p>

<h2 id="simulating-the-tournament">Simulating the Tournament</h2>

<p>Win probabilities come from the Gaussian CDF applied to the strength difference. For two teams \(i\) and \(j\) on a neutral court:</p>

\[P(i \text{ beats } j) = \Phi\left(\frac{(\text{off}_i - \text{def}_j) - (\text{off}_j - \text{def}_i)}{\sigma\sqrt{2}}\right)\]

<p>The \(\sqrt{2}\) comes from the fact that we’re comparing the <em>difference</em> of two independent score random variables, each with variance \(\sigma^2\).</p>

<p>For each of 10,000 simulations, I draw a complete set of team strengths from the posterior and play through the 68-team bracket. If only I could spam all of them to my bracket pool.</p>
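<p>A sketch of the core step, propagating posterior uncertainty by evaluating the win probability separately for each posterior draw (the matchup shown is illustrative):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np
from scipy.stats import norm

def win_prob(off_i, def_i, off_j, def_j, sigma):
    """P(team i beats team j) on a neutral court, per posterior draw."""
    mu_margin = (off_i - def_j) - (off_j - def_i)
    return norm.cdf(mu_margin / (sigma * np.sqrt(2)))

# Flatten chains and draws into a single sample dimension
post = idata.posterior.stack(sample=("chain", "draw"))
p = win_prob(
    post["off"].sel(team="Michigan").values,
    post["def"].sel(team="Michigan").values,
    post["off"].sel(team="Duke").values,
    post["def"].sel(team="Duke").values,
    post["sigma"].values,
)
print(p.mean())  # posterior-averaged win probability
</code></pre></div></div>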

<h3 id="championship-odds">Championship Odds</h3>

<p><img src="/blogimages/march-madness-2026/championship_odds.png" alt="Championship probabilities" /></p>

<p>Again, the four number one seeds rise to the top, together claiming nearly 50% of the championship probability. That still leaves a roughly 50% chance that none of them wins the tournament.</p>

<h3 id="advancement-probabilities">Advancement Probabilities</h3>

<p><img src="/blogimages/march-madness-2026/advancement_heatmap.png" alt="Tournament advancement probabilities" /></p>

<p>The advancement heatmap tells the full story. Notice how the probability drops off dramatically at each round. Even #1 seeds only have ~50% probability of making the Final Four. As the tournament progresses, you’re playing better competition, and you have more and more hurdles to clear. If you have a 10% chance of losing in the first round and a 20% chance in the second, that compounds to a 28% chance of going out in the first two rounds, worse than one-in-four.</p>
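<p>Spelled out, survival probabilities multiply:</p>

\[1 - (1 - 0.10)(1 - 0.20) = 1 - 0.9 \times 0.8 = 0.28\]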

<p>Below is the breakdown by region:</p>

<p><img src="/blogimages/march-madness-2026/bracket_forecast.png" alt="Regional bracket forecasts" /></p>

<h3 id="upset-watch">Upset Watch</h3>

<p><img src="/blogimages/march-madness-2026/upset_probabilities.png" alt="Upset probabilities" /></p>

<p>Worth noting that the model is fairly chalky here. It will generally agree with the committee’s seeding, since it’s estimating overall team strength from the same regular season data. However, it does see Texas A&amp;M as the one 10 seed favored to beat a 7 in the first round (though at just over 50%, it’s effectively a coin flip).</p>

<h3 id="tail-outcomes">Tail Outcomes</h3>

<p>The most fun part of running 10,000 brackets: the tail outcomes. These are things that <em>could</em> happen; the model assigns them nonzero probability, even if they’re unlikely.</p>

<p><strong>Deepest Cinderella runs across 10,000 simulations:</strong></p>
<ul>
  <li><strong>Northern Iowa (12)</strong> won the championship in 2 simulations</li>
  <li><strong>NC State (11)</strong> and <strong>SMU (11)</strong> each won the championship</li>
  <li><strong>Cal Baptist (13)</strong> reached the Championship game</li>
  <li><strong>Tennessee St (15)</strong> reached the Championship game</li>
  <li><strong>Siena (16)</strong> reached the Elite Eight</li>
</ul>

<h2 id="historical-validation">Historical Validation</h2>

<p>I validated by fitting the offense-defense model on each regular season from 2015–2025 (skipping 2020’s COVID cancellation) and predicting that year’s tournament.</p>

<p><img src="/blogimages/march-madness-2026/calibration.png" alt="Calibration plot" /></p>

<p><strong>Results across 669 tournament games:</strong></p>
<ul>
  <li><strong>Brier score: 0.189</strong></li>
  <li><strong>Accuracy: 70.4%</strong></li>
  <li><strong>Log loss: 0.555</strong></li>
</ul>
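<p>For reference, all three metrics are one-liners given the predicted probabilities and outcomes (a sketch with hypothetical arrays <code class="language-plaintext highlighter-rouge">p</code> and <code class="language-plaintext highlighter-rouge">y</code>, where <code class="language-plaintext highlighter-rouge">y</code> is 1 when the first-listed team won):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np

# p: predicted P(first team wins) per game; y: actual outcome (1 or 0).
# Both hypothetical arrays of length 669.
brier = np.mean((p - y) ** 2)
accuracy = np.mean((p &gt; 0.5) == y)
log_loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
</code></pre></div></div>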

<p>It is worth calling out that this Brier score is essentially what a single-strength-parameter model achieves. There’s a no-free-lunch dynamic here: for picking winners, the decomposition collapses back to the expected margin, so predictive accuracy is constrained by the same information. That said, the decomposition paints a much clearer picture of team structure, making the model interpretable and useful in a way a single-parameter model can’t match.</p>

<p>The calibration plot shows the model is <em>reasonably</em> well-calibrated. There is a slight hint of the characteristic “s-curve” shape that comes from overfitting, which could be improved upon in the future.</p>

<h2 id="the-womens-tournament">The Women’s Tournament</h2>

<p>I fit an independent model on the women’s regular season using the identical offense-defense specification. The results tell a very different story from the men’s tournament.</p>

<h3 id="uconn-and-the-chalkiness-gap">UConn and the Chalkiness Gap</h3>

<p>The men’s tournament has two co-favorites separated by 1.3 percentage points. The women’s tournament has <em>UConn</em> and everyone else:</p>

<p><img src="/blogimages/march-madness-2026/championship_comparison.png" alt="Championship odds comparison" /></p>

<table>
  <thead>
    <tr>
      <th>Team</th>
      <th>Seed</th>
      <th>P(Champion)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Connecticut</td>
      <td>1</td>
      <td>33.8%</td>
    </tr>
    <tr>
      <td>South Carolina</td>
      <td>1</td>
      <td>18.1%</td>
    </tr>
    <tr>
      <td>UCLA</td>
      <td>1</td>
      <td>15.6%</td>
    </tr>
    <tr>
      <td>Texas</td>
      <td>1</td>
      <td>13.8%</td>
    </tr>
    <tr>
      <td>LSU</td>
      <td>2</td>
      <td>13.1%</td>
    </tr>
  </tbody>
</table>

<p>UConn’s 34-0 record translates to a posterior strength distribution that clearly leads the pack; they are elite on both offense and defense.</p>

<p><img src="/blogimages/march-madness-2026/team_strengths_top30_womens.png" alt="Women's team strengths" /></p>

<p>The structural difference between the men’s and women’s fields is stark. The men’s top 30 is a smooth gradient where each team’s HDI overlaps with the teams around it. The women’s field has a clear break: five teams in one tier, Texas somewhere in between, and then the rest of the tournament field packed into a narrow band below.</p>

<p>Talent in women’s college basketball is less equitably distributed. The top programs separate themselves from the field by a wider margin than their men’s counterparts, and the model sees it directly in the regular season results. One symptom: the correlation between offense and defense is much stronger in the women’s field.</p>

<p><img src="/blogimages/march-madness-2026/off_def_scatter_womens.png" alt="Women's offense vs defense" /></p>

<p>We can quantify the concentration with a <a href="https://en.wikipedia.org/wiki/Gini_coefficient">Gini coefficient</a> over championship probabilities (0 means every team has equal odds, 1 means a single team wins every simulation). The women’s field is substantially more top-heavy: a Gini of <strong>0.93</strong> compared to <strong>0.79</strong> for the men’s. Put differently, only 29 women’s teams won the title in any of 10,000 simulations, compared to 46 on the men’s side.</p>
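<p>The Gini computation itself is a few lines over each field’s vector of championship probabilities (a sketch; <code class="language-plaintext highlighter-rouge">champ_probs</code> is a hypothetical array summing to 1):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np

def gini(probs):
    """Gini coefficient of a probability vector (0 = uniform, 1 = winner-take-all)."""
    x = np.sort(np.asarray(probs))  # ascending order
    n = len(x)
    # Standard formulation in terms of the ordered cumulative shares
    return (2 * np.arange(1, n + 1) - n - 1).dot(x) / (n * x.sum())

print(gini(champ_probs))
</code></pre></div></div>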

<h3 id="womens-advancement-probabilities">Women’s Advancement Probabilities</h3>

<p><img src="/blogimages/march-madness-2026/advancement_heatmap_womens.png" alt="Women's advancement probabilities" /></p>

<p>The advancement heatmap makes the concentration visible. The women’s top seeds are a sea of deep red through the Sweet 16 and beyond, while probability drops off far more sharply for everyone else.</p>

<h3 id="home-court-advantage-this-time-it-actually-matters">Home Court Advantage: This Time it Actually Matters</h3>

<p>The women’s model estimates \(\alpha\) at <strong>1.37 ± 0.10</strong>, almost identical to the men’s 1.44. The posteriors overlap almost entirely, suggesting that home court is roughly the same phenomenon across genders (a margin effect of about 2.7-2.9 points).</p>

<p><img src="/blogimages/march-madness-2026/home_court.png" alt="Home court advantage comparison" /></p>

<p>It is worth noting that this affects predictions differently than in the men’s tournament. In the men’s tournament, every game is neutral-site, so home court zeros out entirely. In the women’s tournament, <strong>the top 16 seeds host rounds 1 and 2 on their home courts</strong>. That ~2.8-point margin advantage compounds on top of the strength differential that already favors higher seeds, making the early rounds even more lopsided.</p>

<p><img src="/blogimages/march-madness-2026/bracket_forecast_womens.png" alt="Women's bracket forecast" /></p>

<h2 id="the-brackets">The Brackets</h2>

<p>Based on the most likely outcome of each matchup:</p>

<h3 id="mens">Men’s</h3>

<p><strong>Final Four:</strong> Duke, Florida, Michigan, Arizona</p>

<p><strong>Championship Game:</strong> Michigan vs. Duke</p>

<p><strong>Champion:</strong> Michigan (16.2%, meaning we’re 83.8% sure this is wrong)</p>

<h3 id="womens">Women’s</h3>

<p><strong>Final Four:</strong> UConn, South Carolina, UCLA, Texas</p>

<p><strong>Championship Game:</strong> UConn vs. South Carolina</p>

<p><strong>Champion:</strong> UConn (33.8%, meaning we’re 66.2% sure this is wrong)</p>

<p>The contrast captures the structural difference between the two fields. The men’s champion is nearly a coin-flip among three or four teams. The women’s champion is the clearest favorite the model produces, but 33.8% still means there’s a 66.2% chance that UConn loses. Single-elimination tournaments are designed to produce uncertainty, and even a dominant team can only be so dominant across six consecutive games.</p>

<h2 id="following-along">Following Along</h2>

<p>I’ve put together an <a href="/march-madness-2026/">interactive dashboard</a> that will update daily as the tournament progresses, so you can watch the model’s predictions shift as games are played and teams are eliminated. Note that the dashboard re-runs the statistical model every morning, so its predictions may differ slightly from those here, since I do not fix the random seed.</p>

<p>I also submitted these predictions to the <a href="https://www.kaggle.com/competitions/march-machine-learning-mania-2026">Kaggle March ML Mania 2026</a> competition to see how they stack up against other approaches. By no means do I expect this to win; it’s a pedagogy-first exercise, and more feature-dense approaches will likely perform better. That said, it will be a fun performance comparison.</p>

<h2 id="code">Code</h2>

<p>All code for this analysis is available on <a href="https://github.com/tjburch/march-madness">GitHub</a>.</p>]]></content><author><name>Tyler James Burch</name><email>burcht11@gmail.com</email></author><category term="Statistics" /><category term="bayesian" /><category term="pymc" /><category term="sports" /><category term="march-madness" /><summary type="html"><![CDATA[Building a Bayesian offense-defense model for the 2026 NCAA Tournament, finding a Simpson's paradox hiding in the correlation, and what running the same model on both tournaments reveals about the structure of the game]]></summary></entry><entry><title type="html">March Madness 2026 — Interactive Forecast Dashboard</title><link href="https://tylerjamesburch.com/march-madness-2026/" rel="alternate" type="text/html" title="March Madness 2026 — Interactive Forecast Dashboard" /><published>2026-03-16T00:00:00+00:00</published><updated>2026-03-16T00:00:00+00:00</updated><id>https://tylerjamesburch.com/march-madness-dashboard</id><content type="html" xml:base="https://tylerjamesburch.com/march-madness-2026/"><![CDATA[<div id="mm-dashboard">

  <div class="mm-header">
    <p class="mm-subtitle">
      Bracket forecasts from a latent offense+defense skill hierarchical model fit with PyMC.
      <br />
      <a href="/blog/statistics/march-madness-2026">Read the methodology</a>
    </p>
    <p class="mm-kaggle-line" id="kaggle-card" style="display:none;">
      <a href="https://www.kaggle.com/competitions/march-machine-learning-mania-2026/leaderboard" target="_blank" rel="noopener">Kaggle Leaderboard</a>
      <span class="mm-kaggle-sep">—</span>
      <span class="mm-kaggle-stats" id="kaggle-stats"></span>
      <br />
      <span class="mm-kaggle-note">Public leaderboard; will differ from per-game Brier</span>
    </p>
  </div>

  <div class="mm-controls">
    <div class="mm-gender-toggle" id="gender-toggle">
      <button class="mm-toggle-btn active" data-gender="M">Men's</button>
      <button class="mm-toggle-btn" data-gender="W">Women's</button>
    </div>
    <div class="mm-date-selector" id="date-selector">
      <label for="snapshot-date">Snapshot:</label>
      <select id="snapshot-date"></select>
    </div>
    <span class="mm-last-updated" id="last-updated"></span>
  </div>

  <div class="mm-loading" id="loading-indicator">
    <div class="mm-spinner"></div>
    Loading forecast data...
  </div>

  <div class="mm-content" id="dashboard-content" style="display:none;">

    <section class="mm-section" id="section-championship">
      <h2>Championship Odds</h2>
      <div id="championship-chart"></div>
    </section>

    <section class="mm-section" id="section-heatmap">
      <h2>Advancement Probabilities</h2>
      <div id="advancement-heatmap"></div>
    </section>

    <section class="mm-section" id="section-brackets">
      <h2>Region Brackets</h2>
      <div class="mm-region-tabs" id="region-tabs">
        <button class="mm-region-tab active" data-region="W">West</button>
        <button class="mm-region-tab" data-region="X">East</button>
        <button class="mm-region-tab" data-region="Y">South</button>
        <button class="mm-region-tab" data-region="Z">Midwest</button>
      </div>
      <div id="region-bracket-content"></div>
    </section>

    <section class="mm-section" id="section-team-detail" style="display:none;">
      <h2 id="team-detail-title">Team Deep Dive</h2>
      <div class="mm-team-header" id="team-detail-header"></div>
      <div class="mm-team-charts">
        <div id="team-advancement-chart"></div>
        <div id="team-timeline-chart"></div>
      </div>
    </section>

    <section class="mm-section" id="section-predictions">
      <h2>Predictions vs. Reality</h2>
      <div id="predictions-content"></div>
    </section>

  </div>
</div>

<script src="https://cdn.plot.ly/plotly-2.35.0.min.js"></script>

<script src="/assets/js/march-madness-dashboard.js"></script>

<div style="text-align: center; margin-top: 2em; padding-top: 1em; border-top: 1px solid #e0e0e0;">
<script type="text/javascript" src="https://cdnjs.buymeacoffee.com/1.0.0/button.prod.min.js" data-name="bmc-button" data-slug="tylerjamesburch" data-color="#f0a400" data-emoji="" data-font="Comic" data-text="Buy me a coffee" data-outline-color="#000000" data-font-color="#000000" data-coffee-color="#FFDD00"></script>
</div>]]></content><author><name>Tyler James Burch</name><email>burcht11@gmail.com</email></author><category term="Statistics" /><category term="bayesian" /><category term="interactive" /><category term="march-madness" /><summary type="html"><![CDATA[Live Bayesian bracket predictions for the 2026 NCAA Tournament, updated daily.]]></summary></entry><entry><title type="html">2024 Rewind: Orthogonal Polynomial Regression in Bambi</title><link href="https://tylerjamesburch.com/blog/statistics/orthogonal-polynomial-regression-bambi" rel="alternate" type="text/html" title="2024 Rewind: Orthogonal Polynomial Regression in Bambi" /><published>2026-02-23T00:00:00+00:00</published><updated>2026-02-23T00:00:00+00:00</updated><id>https://tylerjamesburch.com/blog/statistics/orthogonal-polynomial-regression-bambi</id><content type="html" xml:base="https://tylerjamesburch.com/blog/statistics/orthogonal-polynomial-regression-bambi"><![CDATA[<script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"></script>

<p>This is the second of two notebooks I wrote and contributed to <a href="https://bambinos.github.io/bambi/">Bambi’s</a> example documentation back in 2024. The first post, covering polynomial regression basics, is <a href="/blog/statistics/polynomial-regression-bambi">here</a>. This one goes deeper into what happens when you use the <code class="language-plaintext highlighter-rouge">poly</code> keyword in a Bambi formula: specifically, the orthogonalization that happens under the hood.</p>

<p>The original notebook lives in the <a href="https://bambinos.github.io/bambi/notebooks/orthogonal_polynomial_reg.html">Bambi docs</a>. What follows is the content, lightly adapted for this blog.</p>

<hr />

<h1 id="orthogonal-polynomial-regression">Orthogonal Polynomial Regression</h1>

<p>While the content here can stand alone, it is a companion to the <a href="/blog/statistics/polynomial-regression-bambi">polynomial regression post</a>, which contains additional useful examples.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">warnings</span>

<span class="kn">import</span> <span class="nn">arviz</span> <span class="k">as</span> <span class="n">az</span>
<span class="kn">import</span> <span class="nn">bambi</span> <span class="k">as</span> <span class="n">bmb</span>
<span class="kn">import</span> <span class="nn">formulae</span>
<span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="n">pd</span>
<span class="kn">import</span> <span class="nn">scipy</span>
<span class="kn">import</span> <span class="nn">seaborn</span> <span class="k">as</span> <span class="n">sns</span>
<span class="kn">from</span> <span class="nn">typing</span> <span class="kn">import</span> <span class="n">Optional</span>

<span class="n">SEED</span> <span class="o">=</span> <span class="mi">1234</span>
<span class="n">az</span><span class="p">.</span><span class="n">style</span><span class="p">.</span><span class="n">use</span><span class="p">(</span><span class="s">"arviz-darkgrid"</span><span class="p">)</span>
<span class="n">warnings</span><span class="p">.</span><span class="n">filterwarnings</span><span class="p">(</span><span class="s">"ignore"</span><span class="p">)</span>
</code></pre></div></div>

<h2 id="revisiting-polynomial-regression">Revisiting Polynomial Regression</h2>

<p>To start, we’ll recreate the projectile motion data defined in the <a href="/blog/statistics/polynomial-regression-bambi">polynomial regression post</a> with \(x_0 = 1.5\) \(m\) and \(v_0 = 7\) \(m\)/\(s\). This will follow:</p>

\[x_f = \frac{1}{2} g t^2 + v_0 t + x_0\]

<p>Where \(g\) will be the acceleration of gravity on Earth, \(-9.81\) \(m\)/\(s^2\). First we’ll generate the data.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">g</span> <span class="o">=</span> <span class="o">-</span><span class="mf">9.81</span>
<span class="n">v0</span> <span class="o">=</span> <span class="mi">7</span>
<span class="n">x0</span> <span class="o">=</span> <span class="mf">1.5</span>
<span class="n">t</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">linspace</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">100</span><span class="p">)</span>
<span class="n">x_projectile</span> <span class="o">=</span> <span class="p">(</span><span class="mi">1</span><span class="o">/</span><span class="mi">2</span><span class="p">)</span> <span class="o">*</span> <span class="n">g</span> <span class="o">*</span> <span class="n">t</span><span class="o">**</span><span class="mi">2</span> <span class="o">+</span> <span class="n">v0</span> <span class="o">*</span> <span class="n">t</span> <span class="o">+</span> <span class="n">x0</span>
<span class="n">noise</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">normal</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mf">0.2</span><span class="p">,</span> <span class="n">x_projectile</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span>
<span class="n">x_obs_projectile</span> <span class="o">=</span> <span class="n">x_projectile</span> <span class="o">+</span> <span class="n">noise</span>
<span class="n">df_projectile</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">({</span><span class="s">"t"</span><span class="p">:</span> <span class="n">t</span><span class="p">,</span> <span class="s">"x"</span><span class="p">:</span> <span class="n">x_obs_projectile</span><span class="p">,</span> <span class="s">"x_true"</span><span class="p">:</span> <span class="n">x_projectile</span><span class="p">})</span>
<span class="n">df_projectile</span> <span class="o">=</span> <span class="n">df_projectile</span><span class="p">[</span><span class="n">df_projectile</span><span class="p">[</span><span class="s">"x"</span><span class="p">]</span> <span class="o">&gt;=</span> <span class="mi">0</span><span class="p">]</span>

<span class="n">plt</span><span class="p">.</span><span class="n">scatter</span><span class="p">(</span><span class="n">df_projectile</span><span class="p">.</span><span class="n">t</span><span class="p">,</span> <span class="n">df_projectile</span><span class="p">.</span><span class="n">x</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">'Observed Displacement'</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"C0"</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">df_projectile</span><span class="p">.</span><span class="n">t</span><span class="p">,</span> <span class="n">df_projectile</span><span class="p">.</span><span class="n">x_true</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">'True Function'</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"C1"</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">xlabel</span><span class="p">(</span><span class="s">'Time (s)'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">ylabel</span><span class="p">(</span><span class="s">'Displacement (m)'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">ylim</span><span class="p">(</span><span class="n">bottom</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">legend</span><span class="p">()</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>

<p><img src="/blogimages/orthogonal-polynomial-regression/projectile-motion-data.png" alt="Projectile motion data" /></p>

<p>Putting this into Bambi, we set \(\beta_2 = \frac{g}{2}\), \(\beta_1 = v_0\), and \(\beta_0 = x_0\), then perform the following regression:</p>

\[x_f = \beta_2 t^2 + \beta_1 t + \beta_0\]

<p>We expect to recover \(\beta_2 = -4.905\), \(\beta_1 = 7\), \(\beta_0 = 1.5\) from our fit. We start with the approach from the other notebook where we explicitly tell formulae to calculate coefficients on \(t^2\) and \(t\).</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">model_projectile_all_terms</span> <span class="o">=</span> <span class="n">bmb</span><span class="p">.</span><span class="n">Model</span><span class="p">(</span><span class="s">"x ~ I(t**2) + t + 1"</span><span class="p">,</span> <span class="n">df_projectile</span><span class="p">)</span>
<span class="n">fit_projectile_all_terms</span> <span class="o">=</span> <span class="n">model_projectile_all_terms</span><span class="p">.</span><span class="n">fit</span><span class="p">(</span>
    <span class="n">idata_kwargs</span><span class="o">=</span><span class="p">{</span><span class="s">"log_likelihood"</span><span class="p">:</span> <span class="bp">True</span><span class="p">},</span> <span class="n">random_seed</span><span class="o">=</span><span class="n">SEED</span>
<span class="p">)</span>
<span class="n">az</span><span class="p">.</span><span class="n">summary</span><span class="p">(</span><span class="n">fit_projectile_all_terms</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>            mean     sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
sigma      0.173  0.014   0.148    0.199      0.000    0.000    2978.0    2514.0    1.0
Intercept  1.455  0.057   1.346    1.558      0.001    0.001    2553.0    2663.0    1.0
I(t ** 2) -4.999  0.097  -5.181   -4.817      0.002    0.002    2067.0    1996.0    1.0
t          7.186  0.161   6.872    7.483      0.003    0.003    2120.0    1967.0    1.0
</code></pre></div></div>

<p>The parameters are recovered as anticipated.</p>

<p>If you want to include <em>all</em> terms of a variable up to a given degree, you can also use the keyword <code class="language-plaintext highlighter-rouge">poly</code>. So if we want the linear and quadratic effects, as in this case, we would designate <code class="language-plaintext highlighter-rouge">poly(t, 2)</code>.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">model_projectile_poly</span> <span class="o">=</span> <span class="n">bmb</span><span class="p">.</span><span class="n">Model</span><span class="p">(</span><span class="s">"x ~ poly(t, 2) + 1"</span><span class="p">,</span> <span class="n">df_projectile</span><span class="p">)</span>
<span class="n">fit_projectile_poly</span> <span class="o">=</span> <span class="n">model_projectile_poly</span><span class="p">.</span><span class="n">fit</span><span class="p">(</span>
    <span class="n">idata_kwargs</span><span class="o">=</span><span class="p">{</span><span class="s">"log_likelihood"</span><span class="p">:</span> <span class="bp">True</span><span class="p">},</span> <span class="n">random_seed</span><span class="o">=</span><span class="n">SEED</span>
<span class="p">)</span>
<span class="n">az</span><span class="p">.</span><span class="n">summary</span><span class="p">(</span><span class="n">fit_projectile_poly</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>                mean     sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
sigma          0.173  0.014   0.147    0.201      0.000    0.000    4781.0    3156.0    1.0
Intercept      2.883  0.019   2.849    2.919      0.000    0.000    6952.0    3211.0    1.0
poly(t, 2)[0] -3.792  0.173  -4.109   -3.458      0.002    0.003    6298.0    2977.0    1.0
poly(t, 2)[1] -8.987  0.171  -9.307   -8.669      0.002    0.003    6499.0    3087.0    1.0
</code></pre></div></div>

<p>Now there are fitted coefficients for \(t\) and \(t^2\), but wait, those aren’t the parameters we used! What’s going on here?</p>

<h2 id="the-poly-keyword">The <code class="language-plaintext highlighter-rouge">poly</code> Keyword</h2>

<p>To fully understand what’s going on under the hood, we must wade into some linear algebra. When the <code class="language-plaintext highlighter-rouge">poly</code> keyword is used, instead of directly using the values of \(x, x^2, x^3, \dots, x^n\), it converts them into <em>orthogonal polynomials</em>. When including effects from multiple polynomial terms, there will generally be correlation between them. Including all of them in a model can be a problem from a fitting perspective due to multicollinearity. By orthogonalizing, the correlation is removed by design.</p>
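<p>A quick illustration of the multicollinearity, using the time variable from the projectile data above (the raw powers are strongly correlated, and driving this correlation to zero is precisely what <code class="language-plaintext highlighter-rouge">poly</code> is for):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>t_raw = df_projectile["t"].to_numpy()
# Correlation between t and t^2 on [0, 2] is close to 1
print(np.corrcoef(t_raw, t_raw**2)[0, 1])
</code></pre></div></div>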

<p>As it turns out, it’s difficult to find any information on <em>how</em> the orthogonalization is performed. <a href="https://github.com/bambinos/formulae/blob/b00f53da4b092ea13eeeabe92866736e97d56db0/formulae/transforms.py#L400-L426">Here is the implementation for <code class="language-plaintext highlighter-rouge">poly</code> in formulae</a>, but to fully understand it, I went into the <a href="https://svn.r-project.org/R/trunk/src/library/stats/R/contr.poly.R">source code for the R Stats library</a>, where <code class="language-plaintext highlighter-rouge">poly</code> is defined as a function for use on any vector.</p>

<p>Here’s a step-by-step summary, along with a toy example for \(x^4\).</p>

<ul>
  <li>The data is first centered around the mean for stability</li>
</ul>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">X</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">])</span>

<span class="n">mean</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">mean</span><span class="p">(</span><span class="n">X</span><span class="p">)</span>
<span class="n">X_centered</span> <span class="o">=</span> <span class="n">X</span> <span class="o">-</span> <span class="n">mean</span>
<span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"Array: </span><span class="si">{</span><span class="n">X</span><span class="si">}</span><span class="s">, mean: </span><span class="si">{</span><span class="n">mean</span><span class="si">}</span><span class="s">.</span><span class="se">\n</span><span class="s">Centered: </span><span class="si">{</span><span class="n">X_centered</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Array: [1 2 3 4 5], mean: 3.0.
Centered: [-2. -1.  0.  1.  2.]
</code></pre></div></div>

<ul>
  <li>A <em>Vandermonde matrix</em> is created. This just takes the input data and generates a matrix where columns represent increasing polynomial degrees. In this example, the first column is \(x^0\), a constant term. The second is \(x^1\), or the centered data. The third column is \(x^2\), the fourth is \(x^3\), the last is \(x^4\).</li>
</ul>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">degree</span> <span class="o">=</span> <span class="mi">4</span>
<span class="n">simple_vander</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">vander</span><span class="p">(</span><span class="n">X_centered</span><span class="p">,</span> <span class="n">N</span><span class="o">=</span><span class="n">degree</span><span class="o">+</span><span class="mi">1</span><span class="p">,</span> <span class="n">increasing</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">simple_vander</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>array([[ 1., -2.,  4., -8., 16.],
       [ 1., -1.,  1., -1.,  1.],
       [ 1.,  0.,  0.,  0.,  0.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  2.,  4.,  8., 16.]])
</code></pre></div></div>

<ul>
  <li>QR decomposition is performed. There are <a href="https://en.wikipedia.org/wiki/QR_decomposition">several methods for doing this in practice</a>, the most common being the <a href="https://en.wikipedia.org/wiki/Gram%E2%80%93Schmidt_process">Gram-Schmidt process</a>. Here I just take advantage of the <a href="https://numpy.org/doc/stable/reference/generated/numpy.linalg.qr.html">Numpy implementation</a>. We decompose the matrix above into two components: an orthogonal matrix \(Q\) and an upper triangular matrix \(R\).</li>
</ul>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">q</span><span class="p">,</span> <span class="n">r</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">linalg</span><span class="p">.</span><span class="n">qr</span><span class="p">(</span><span class="n">simple_vander</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Orthogonal matrix Q:</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">q</span><span class="p">.</span><span class="nb">round</span><span class="p">(</span><span class="mi">4</span><span class="p">))</span>
<span class="k">print</span><span class="p">(</span><span class="s">"</span><span class="se">\n</span><span class="s">Upper triangular matrix R:</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">r</span><span class="p">.</span><span class="nb">round</span><span class="p">(</span><span class="mi">4</span><span class="p">))</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Orthogonal matrix Q:
 [[-0.4472 -0.6325  0.5345 -0.3162 -0.1195]
 [-0.4472 -0.3162 -0.2673  0.6325  0.4781]
 [-0.4472 -0.     -0.5345  0.     -0.7171]
 [-0.4472  0.3162 -0.2673 -0.6325  0.4781]
 [-0.4472  0.6325  0.5345  0.3162 -0.1195]]

Upper triangular matrix R:
 [[ -2.2361  -0.      -4.4721  -0.     -15.2053]
 [  0.       3.1623   0.      10.7517   0.    ]
 [  0.       0.       3.7417   0.      16.5702]
 [  0.       0.       0.       3.7947   0.    ]
 [  0.       0.       0.       0.      -2.8685]]
</code></pre></div></div>

<ul>
  <li>Last, take the dot product of \(Q\) with the diagonal elements of \(R\); this scales \(Q\) to the magnitude of the polynomial degrees in \(R\). The result serves as our transformation matrix, which maps input data into the space defined by the orthogonal polynomials.</li>
</ul>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">diagonal</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">diag</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">diag</span><span class="p">(</span><span class="n">r</span><span class="p">))</span>  <span class="c1"># First call gets elements, second creates diag matrix
</span><span class="n">transformation_matrix</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">dot</span><span class="p">(</span><span class="n">q</span><span class="p">,</span> <span class="n">diagonal</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">transformation_matrix</span><span class="p">.</span><span class="nb">round</span><span class="p">(</span><span class="mi">4</span><span class="p">))</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[[ 1.     -2.      2.     -1.2     0.3429]
 [ 1.     -1.     -1.      2.4    -1.3714]
 [ 1.     -0.     -2.      0.      2.0571]
 [ 1.      1.     -1.     -2.4    -1.3714]
 [ 1.      2.      2.      1.2     0.3429]]
</code></pre></div></div>

<ul>
  <li>From the transformation matrix, we get squared norms (<code class="language-plaintext highlighter-rouge">norm2</code>), which give us the scale of each polynomial. We also get the value by which we need to shift each polynomial to match the centered data (<code class="language-plaintext highlighter-rouge">alpha</code>).</li>
</ul>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">norm2</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nb">sum</span><span class="p">(</span><span class="n">transformation_matrix</span><span class="o">**</span><span class="mi">2</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>

<span class="n">weighted_sums</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nb">sum</span><span class="p">(</span>
    <span class="p">(</span><span class="n">transformation_matrix</span><span class="o">**</span><span class="mi">2</span><span class="p">)</span> <span class="o">*</span> <span class="n">np</span><span class="p">.</span><span class="n">reshape</span><span class="p">(</span><span class="n">X_centered</span><span class="p">,</span> <span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">)),</span>
    <span class="n">axis</span><span class="o">=</span><span class="mi">0</span>
<span class="p">)</span>
<span class="n">normalized_sums</span> <span class="o">=</span> <span class="n">weighted_sums</span> <span class="o">/</span> <span class="n">norm2</span>
<span class="n">adjusted_sums</span> <span class="o">=</span> <span class="n">normalized_sums</span> <span class="o">+</span> <span class="n">mean</span>
<span class="n">alpha</span> <span class="o">=</span> <span class="n">adjusted_sums</span><span class="p">[:</span><span class="n">degree</span><span class="p">]</span>

<span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"Norm2: </span><span class="si">{</span><span class="n">norm2</span><span class="si">}</span><span class="se">\n</span><span class="s">alpha: </span><span class="si">{</span><span class="n">alpha</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Norm2: [ 5.         10.         14.         14.4         8.22857143]
alpha: [3. 3. 3. 3.]
</code></pre></div></div>

<ul>
  <li>Finally, we iteratively apply this to all desired polynomial degrees, shifting the data and scaling by the squared norms appropriately to maintain orthogonality with the prior term.</li>
</ul>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">transformed_X</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">full</span><span class="p">((</span><span class="nb">len</span><span class="p">(</span><span class="n">X</span><span class="p">),</span> <span class="n">degree</span><span class="o">+</span><span class="mi">1</span><span class="p">),</span> <span class="n">np</span><span class="p">.</span><span class="n">nan</span><span class="p">)</span>
<span class="n">transformed_X</span><span class="p">[:,</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="mi">1</span>
<span class="n">transformed_X</span><span class="p">[:,</span> <span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="n">X</span> <span class="o">-</span> <span class="n">alpha</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">degree</span><span class="p">):</span>
    <span class="n">transformed_X</span><span class="p">[:,</span> <span class="n">i</span> <span class="o">+</span> <span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span>
        <span class="p">(</span><span class="n">X</span> <span class="o">-</span> <span class="n">alpha</span><span class="p">[</span><span class="n">i</span><span class="p">])</span> <span class="o">*</span> <span class="n">transformed_X</span><span class="p">[:,</span> <span class="n">i</span><span class="p">]</span> <span class="o">-</span>
        <span class="p">(</span><span class="n">norm2</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">/</span> <span class="n">norm2</span><span class="p">[</span><span class="n">i</span> <span class="o">-</span> <span class="mi">1</span><span class="p">])</span> <span class="o">*</span> <span class="n">transformed_X</span><span class="p">[:,</span> <span class="n">i</span> <span class="o">-</span> <span class="mi">1</span><span class="p">]</span>
    <span class="p">)</span>

<span class="n">transformed_X</span> <span class="o">/=</span> <span class="n">np</span><span class="p">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">norm2</span><span class="p">)</span>
<span class="n">transformed_X</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>array([[ 4.47213595e-01, -6.32455532e-01,  5.34522484e-01,
        -3.16227766e-01,  1.19522861e-01],
       [ 4.47213595e-01, -3.16227766e-01, -2.67261242e-01,
         6.32455532e-01, -4.78091444e-01],
       [ 4.47213595e-01,  0.00000000e+00, -5.34522484e-01,
         2.34055565e-16,  7.17137166e-01],
       [ 4.47213595e-01,  3.16227766e-01, -2.67261242e-01,
        -6.32455532e-01, -4.78091444e-01],
       [ 4.47213595e-01,  6.32455532e-01,  5.34522484e-01,
         3.16227766e-01,  1.19522861e-01]])
</code></pre></div></div>

<p>This is now a matrix of orthogonalized polynomials of X. The first column is just a constant, the second corresponds to the input \(x\), the next to \(x^2\), and so on. In most implementations the constant term is dropped, giving us the following final matrix.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">transformed_X</span><span class="p">[:,</span><span class="mi">1</span><span class="p">:]</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>array([[-6.32455532e-01,  5.34522484e-01, -3.16227766e-01,
         1.19522861e-01],
       [-3.16227766e-01, -2.67261242e-01,  6.32455532e-01,
        -4.78091444e-01],
       [ 0.00000000e+00, -5.34522484e-01,  2.34055565e-16,
         7.17137166e-01],
       [ 3.16227766e-01, -2.67261242e-01, -6.32455532e-01,
        -4.78091444e-01],
       [ 6.32455532e-01,  5.34522484e-01,  3.16227766e-01,
         1.19522861e-01]])
</code></pre></div></div>

<p>The approach shown in this derivation has been reproduced below as a Scikit-Learn-style class, where the <code class="language-plaintext highlighter-rouge">fit</code> method calculates the coefficients and the <code class="language-plaintext highlighter-rouge">transform</code> method returns the orthogonalized data. It is also available <a href="https://gist.github.com/tjburch/062547b3600f81db73b40feb044bab2a#file-orthogonalpolynomialtransformer-py">at this gist</a>, including the typical <code class="language-plaintext highlighter-rouge">BaseEstimator</code>, <code class="language-plaintext highlighter-rouge">TransformerMixin</code> inheritances.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">OrthogonalPolynomialTransformer</span><span class="p">:</span>
    <span class="s">"""Transforms input data using orthogonal polynomials."""</span>

    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">degree</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">1</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">degree</span> <span class="o">=</span> <span class="n">degree</span> <span class="o">+</span> <span class="mi">1</span>  <span class="c1"># Account for constant term
</span>        <span class="bp">self</span><span class="p">.</span><span class="n">norm2</span> <span class="o">=</span> <span class="bp">None</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">alpha</span> <span class="o">=</span> <span class="bp">None</span>

    <span class="k">def</span> <span class="nf">fit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>
        <span class="s">"""Calculate transformation matrix, extract norm2 and alpha."""</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">norm2</span> <span class="o">=</span> <span class="bp">None</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">alpha</span> <span class="o">=</span> <span class="bp">None</span>

        <span class="n">X</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">asarray</span><span class="p">(</span><span class="n">X</span><span class="p">).</span><span class="n">flatten</span><span class="p">()</span>
        <span class="k">if</span> <span class="bp">self</span><span class="p">.</span><span class="n">degree</span> <span class="o">&gt;=</span> <span class="nb">len</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">unique</span><span class="p">(</span><span class="n">X</span><span class="p">)):</span>
            <span class="k">raise</span> <span class="nb">ValueError</span><span class="p">(</span>
                <span class="s">"'degree' must be less than the number of unique data points."</span>
            <span class="p">)</span>

        <span class="n">mean</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">mean</span><span class="p">(</span><span class="n">X</span><span class="p">)</span>
        <span class="n">X_centered</span> <span class="o">=</span> <span class="n">X</span> <span class="o">-</span> <span class="n">mean</span>

        <span class="n">vandermonde</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">vander</span><span class="p">(</span><span class="n">X_centered</span><span class="p">,</span> <span class="n">N</span><span class="o">=</span><span class="bp">self</span><span class="p">.</span><span class="n">degree</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="n">increasing</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
        <span class="n">Q</span><span class="p">,</span> <span class="n">R</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">linalg</span><span class="p">.</span><span class="n">qr</span><span class="p">(</span><span class="n">vandermonde</span><span class="p">)</span>

        <span class="n">diagonal</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">diag</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">diag</span><span class="p">(</span><span class="n">R</span><span class="p">))</span>
        <span class="n">transformation_matrix</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">dot</span><span class="p">(</span><span class="n">Q</span><span class="p">,</span> <span class="n">diagonal</span><span class="p">)</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">norm2</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nb">sum</span><span class="p">(</span><span class="n">transformation_matrix</span><span class="o">**</span><span class="mi">2</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>

        <span class="n">weighted_sums</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nb">sum</span><span class="p">(</span>
            <span class="p">(</span><span class="n">transformation_matrix</span><span class="o">**</span><span class="mi">2</span><span class="p">)</span> <span class="o">*</span> <span class="n">np</span><span class="p">.</span><span class="n">reshape</span><span class="p">(</span><span class="n">X_centered</span><span class="p">,</span> <span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">)),</span>
            <span class="n">axis</span><span class="o">=</span><span class="mi">0</span>
        <span class="p">)</span>
        <span class="n">normalized_sums</span> <span class="o">=</span> <span class="n">weighted_sums</span> <span class="o">/</span> <span class="bp">self</span><span class="p">.</span><span class="n">norm2</span>
        <span class="n">adjusted_sums</span> <span class="o">=</span> <span class="n">normalized_sums</span> <span class="o">+</span> <span class="n">mean</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">alpha</span> <span class="o">=</span> <span class="n">adjusted_sums</span><span class="p">[:</span><span class="bp">self</span><span class="p">.</span><span class="n">degree</span><span class="p">]</span>
        <span class="k">return</span> <span class="bp">self</span>

    <span class="k">def</span> <span class="nf">transform</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">):</span>
        <span class="s">"""Iteratively apply up to 'degree'."""</span>
        <span class="n">X</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">asarray</span><span class="p">(</span><span class="n">X</span><span class="p">).</span><span class="n">flatten</span><span class="p">()</span>
        <span class="n">transformed_X</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">empty</span><span class="p">((</span><span class="nb">len</span><span class="p">(</span><span class="n">X</span><span class="p">),</span> <span class="bp">self</span><span class="p">.</span><span class="n">degree</span> <span class="o">+</span> <span class="mi">1</span><span class="p">))</span>

        <span class="n">transformed_X</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="mi">1</span>
        <span class="k">if</span> <span class="bp">self</span><span class="p">.</span><span class="n">degree</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">:</span>
            <span class="n">transformed_X</span><span class="p">[:,</span> <span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="n">X</span> <span class="o">-</span> <span class="bp">self</span><span class="p">.</span><span class="n">alpha</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>

        <span class="k">if</span> <span class="bp">self</span><span class="p">.</span><span class="n">degree</span> <span class="o">&gt;</span> <span class="mi">1</span><span class="p">:</span>
            <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="bp">self</span><span class="p">.</span><span class="n">degree</span><span class="p">):</span>
                <span class="n">transformed_X</span><span class="p">[:,</span> <span class="n">i</span> <span class="o">+</span> <span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span>
                    <span class="p">(</span><span class="n">X</span> <span class="o">-</span> <span class="bp">self</span><span class="p">.</span><span class="n">alpha</span><span class="p">[</span><span class="n">i</span><span class="p">])</span> <span class="o">*</span> <span class="n">transformed_X</span><span class="p">[:,</span> <span class="n">i</span><span class="p">]</span> <span class="o">-</span>
                    <span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">norm2</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">/</span> <span class="bp">self</span><span class="p">.</span><span class="n">norm2</span><span class="p">[</span><span class="n">i</span> <span class="o">-</span> <span class="mi">1</span><span class="p">])</span> <span class="o">*</span> <span class="n">transformed_X</span><span class="p">[:,</span> <span class="n">i</span> <span class="o">-</span> <span class="mi">1</span><span class="p">]</span>
                <span class="p">)</span>

        <span class="n">transformed_X</span> <span class="o">/=</span> <span class="n">np</span><span class="p">.</span><span class="n">sqrt</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">norm2</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">transformed_X</span><span class="p">[:,</span> <span class="mi">1</span><span class="p">:</span><span class="bp">self</span><span class="p">.</span><span class="n">degree</span><span class="p">]</span>

    <span class="k">def</span> <span class="nf">fit_transform</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">fit</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>
        <span class="k">return</span> <span class="bp">self</span><span class="p">.</span><span class="n">transform</span><span class="p">(</span><span class="n">X</span><span class="p">)</span>
</code></pre></div></div>

<p>An example call is shown below. It’s worth noting that in this implementation the constant term is not returned: the first column corresponds to \(x\), the second to \(x^2\), and the third to \(x^3\).</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">X</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">])</span>
<span class="n">poly3</span> <span class="o">=</span> <span class="n">OrthogonalPolynomialTransformer</span><span class="p">(</span><span class="n">degree</span><span class="o">=</span><span class="mi">3</span><span class="p">).</span><span class="n">fit</span><span class="p">(</span><span class="n">X</span><span class="p">)</span>
<span class="n">poly3</span><span class="p">.</span><span class="n">transform</span><span class="p">(</span><span class="n">X</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>array([[-6.32455532e-01,  5.34522484e-01, -3.16227766e-01],
       [-3.16227766e-01, -2.67261242e-01,  6.32455532e-01],
       [ 0.00000000e+00, -5.34522484e-01,  2.34055565e-16],
       [ 3.16227766e-01, -2.67261242e-01, -6.32455532e-01],
       [ 6.32455532e-01,  5.34522484e-01,  3.16227766e-01]])
</code></pre></div></div>

<p>This matches what you get when calling the equivalent <code class="language-plaintext highlighter-rouge">poly</code> function in R (called below with degree 4, so R returns one additional column):</p>

<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;</span><span class="w"> </span><span class="n">poly</span><span class="p">(</span><span class="n">X</span><span class="p">,</span><span class="w"> </span><span class="m">4</span><span class="p">)</span><span class="w">
                 </span><span class="m">1</span><span class="w">          </span><span class="m">2</span><span class="w">             </span><span class="m">3</span><span class="w">          </span><span class="m">4</span><span class="w">
</span><span class="p">[</span><span class="m">1</span><span class="p">,]</span><span class="w"> </span><span class="m">-6.324555e-01</span><span class="w">  </span><span class="m">0.5345225</span><span class="w"> </span><span class="m">-3.162278e-01</span><span class="w">  </span><span class="m">0.1195229</span><span class="w">
</span><span class="p">[</span><span class="m">2</span><span class="p">,]</span><span class="w"> </span><span class="m">-3.162278e-01</span><span class="w"> </span><span class="m">-0.2672612</span><span class="w">  </span><span class="m">6.324555e-01</span><span class="w"> </span><span class="m">-0.4780914</span><span class="w">
</span><span class="p">[</span><span class="m">3</span><span class="p">,]</span><span class="w"> </span><span class="m">-3.288380e-17</span><span class="w"> </span><span class="m">-0.5345225</span><span class="w">  </span><span class="m">9.637305e-17</span><span class="w">  </span><span class="m">0.7171372</span><span class="w">
</span><span class="p">[</span><span class="m">4</span><span class="p">,]</span><span class="w">  </span><span class="m">3.162278e-01</span><span class="w"> </span><span class="m">-0.2672612</span><span class="w"> </span><span class="m">-6.324555e-01</span><span class="w"> </span><span class="m">-0.4780914</span><span class="w">
</span><span class="p">[</span><span class="m">5</span><span class="p">,]</span><span class="w">  </span><span class="m">6.324555e-01</span><span class="w">  </span><span class="m">0.5345225</span><span class="w">  </span><span class="m">3.162278e-01</span><span class="w">  </span><span class="m">0.1195229</span><span class="w">
</span></code></pre></div></div>

<p>or, most relevantly, from formulae:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">formulae_poly</span> <span class="o">=</span> <span class="n">formulae</span><span class="p">.</span><span class="n">transforms</span><span class="p">.</span><span class="n">Polynomial</span><span class="p">()</span>
<span class="n">formulae_poly</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="mi">4</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>array([[-0.63245553,  0.53452248, -0.31622777,  0.11952286],
       [-0.31622777, -0.26726124,  0.63245553, -0.47809144],
       [ 0.        , -0.53452248, -0.        ,  0.71713717],
       [ 0.31622777, -0.26726124, -0.63245553, -0.47809144],
       [ 0.63245553,  0.53452248,  0.31622777,  0.11952286]])
</code></pre></div></div>

<p>As an example, we apply the transformer to \(x\) over the domain 0 to 10:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">x</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">linspace</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span class="mi">100</span><span class="p">)</span>
<span class="n">x2</span> <span class="o">=</span> <span class="n">x</span><span class="o">**</span><span class="mi">2</span>

<span class="n">transformer</span> <span class="o">=</span> <span class="n">OrthogonalPolynomialTransformer</span><span class="p">(</span><span class="n">degree</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
<span class="n">x_orthogonalized</span> <span class="o">=</span> <span class="n">transformer</span><span class="p">.</span><span class="n">fit_transform</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="n">x_orth</span> <span class="o">=</span> <span class="n">x_orthogonalized</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">]</span>
<span class="n">x2_orth</span> <span class="o">=</span> <span class="n">x_orthogonalized</span><span class="p">[:,</span> <span class="mi">1</span><span class="p">]</span>

<span class="n">data</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">vstack</span><span class="p">([</span><span class="n">x</span><span class="p">,</span> <span class="n">x2</span><span class="p">,</span> <span class="n">x_orth</span><span class="p">,</span> <span class="n">x2_orth</span><span class="p">]).</span><span class="n">T</span>
<span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">columns</span><span class="o">=</span><span class="p">[</span><span class="s">'x'</span><span class="p">,</span> <span class="s">'$x^2$'</span><span class="p">,</span> <span class="s">'$x$ Orth'</span><span class="p">,</span> <span class="s">'$x^2$ Orth'</span><span class="p">])</span>
<span class="n">correlation_matrix</span> <span class="o">=</span> <span class="n">df</span><span class="p">.</span><span class="n">corr</span><span class="p">()</span>
<span class="n">sns</span><span class="p">.</span><span class="n">heatmap</span><span class="p">(</span><span class="n">correlation_matrix</span><span class="p">,</span> <span class="n">annot</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">cmap</span><span class="o">=</span><span class="s">'Reds'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">xticks</span><span class="p">(</span><span class="n">rotation</span><span class="o">=</span><span class="mi">45</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>

<p><img src="/blogimages/orthogonal-polynomial-regression/correlation-heatmap.png" alt="Correlation heatmap" /></p>

<p>We now see that the orthogonalized versions of \(x\) and \(x^2\) are no longer correlated with each other. We can also verify the orthogonality directly, with the quick check below.</p>
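<p>A minimal sanity check, reusing the <code class="language-plaintext highlighter-rouge">x_orth</code> and <code class="language-plaintext highlighter-rouge">x2_orth</code> columns from above: orthogonal, zero-mean columns should have both a dot product and a correlation of essentially zero.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Dot product of the orthogonalized columns is zero up to floating-point error
print(np.dot(x_orth, x2_orth))

# Raw x and x^2 are strongly correlated; the orthogonalized versions are not
print(np.corrcoef(x, x2)[0, 1])            # close to 1
print(np.corrcoef(x_orth, x2_orth)[0, 1])  # ~0
</code></pre></div></div>

<p>Next, we construct a response variable and plot against it.</p>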

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">y</span> <span class="o">=</span> <span class="mi">3</span> <span class="o">*</span> <span class="n">x2</span> <span class="o">+</span> <span class="n">x</span>

<span class="n">fig</span><span class="p">,</span> <span class="n">axs</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">subplots</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">8</span><span class="p">,</span> <span class="mi">6</span><span class="p">),</span> <span class="n">sharey</span><span class="o">=</span><span class="s">'row'</span><span class="p">)</span>

<span class="n">plots</span> <span class="o">=</span> <span class="p">[</span>
    <span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="s">'x'</span><span class="p">,</span> <span class="bp">False</span><span class="p">),</span>
    <span class="p">(</span><span class="n">x2</span><span class="p">,</span> <span class="s">'$x^2$'</span><span class="p">,</span> <span class="bp">False</span><span class="p">),</span>
    <span class="p">(</span><span class="n">x_orth</span><span class="p">,</span> <span class="s">'Orthogonalized $x$'</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span>
    <span class="p">(</span><span class="n">x2_orth</span><span class="p">,</span> <span class="s">'Orthogonalized $x^2$'</span><span class="p">,</span> <span class="bp">False</span><span class="p">)</span>
<span class="p">]</span>

<span class="k">for</span> <span class="n">ax</span><span class="p">,</span> <span class="n">plot_data</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">axs</span><span class="p">.</span><span class="n">flat</span><span class="p">,</span> <span class="n">plots</span><span class="p">):</span>
    <span class="n">x_val</span><span class="p">,</span> <span class="n">xlabel</span> <span class="o">=</span> <span class="n">plot_data</span><span class="p">[:</span><span class="mi">2</span><span class="p">]</span>
    <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">plot_data</span><span class="p">)</span> <span class="o">==</span> <span class="mi">3</span> <span class="ow">and</span> <span class="n">plot_data</span><span class="p">[</span><span class="mi">2</span><span class="p">]:</span>
        <span class="n">sns</span><span class="p">.</span><span class="n">regplot</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">x_val</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="n">y</span><span class="p">,</span> <span class="n">ax</span><span class="o">=</span><span class="n">ax</span><span class="p">,</span> <span class="n">line_kws</span><span class="o">=</span><span class="p">{</span><span class="s">"color"</span><span class="p">:</span> <span class="s">"C1"</span><span class="p">})</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="n">sns</span><span class="p">.</span><span class="n">scatterplot</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">x_val</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="n">y</span><span class="p">,</span> <span class="n">ax</span><span class="o">=</span><span class="n">ax</span><span class="p">)</span>
    <span class="n">ax</span><span class="p">.</span><span class="nb">set</span><span class="p">(</span><span class="n">xlabel</span><span class="o">=</span><span class="n">xlabel</span><span class="p">,</span> <span class="n">ylabel</span><span class="o">=</span><span class="s">'y'</span><span class="p">)</span>

    <span class="k">if</span> <span class="n">plot_data</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">==</span> <span class="s">'Orthogonalized $x^2$'</span><span class="p">:</span>
        <span class="n">ax</span><span class="p">.</span><span class="n">axvline</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'k'</span><span class="p">,</span> <span class="n">linestyle</span><span class="o">=</span><span class="s">'--'</span><span class="p">)</span>

<span class="n">plt</span><span class="p">.</span><span class="n">tight_layout</span><span class="p">()</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>

<p><img src="/blogimages/orthogonal-polynomial-regression/orthogonalized-scatter.png" alt="Orthogonalized scatter plots" /></p>

<p>The top half shows the response variable against \(x\) and \(x^2\); this should look familiar.</p>

<p>The bottom half shows the new orthogonalized polynomial terms. First, you’ll notice the domain is centered at 0 and more compressed than the original scale; both the centering and the rescaling happen as part of the orthogonalization. Otherwise, the \(x\) term is unchanged. Recall from the construction that the first-order term is only shifted and scaled, and each subsequent term is built to be orthogonal to the lower-degree terms.</p>

<p>I’ve shown a linear fit on top of the first-order term. What you’ll notice is that the orthogonalized \(x^2\) corresponds to the residuals of this line. At the lowest values of \(y\), the fit is poor, and this is where the orthogonalized \(x^2\) is highest. As the first-order term crosses the linear fit, the orthogonalized \(x^2\) crosses zero, then goes negative as the data dip under the linear fit. It crosses zero one more time, and the fit is once again poor at the highest values shown. Since the orthogonalized \(x^2\) is proportional to the residuals of the first-order fit, plotting it against those residuals should show a linear trend.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">slope</span><span class="p">,</span> <span class="n">intercept</span><span class="p">,</span> <span class="n">r_value</span><span class="p">,</span> <span class="n">p_value</span><span class="p">,</span> <span class="n">std_err</span> <span class="o">=</span> <span class="n">scipy</span><span class="p">.</span><span class="n">stats</span><span class="p">.</span><span class="n">linregress</span><span class="p">(</span><span class="n">x_orth</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>

<span class="n">y_pred</span> <span class="o">=</span> <span class="n">intercept</span> <span class="o">+</span> <span class="n">slope</span> <span class="o">*</span> <span class="n">x_orth</span>
<span class="n">residuals</span> <span class="o">=</span> <span class="n">y</span> <span class="o">-</span> <span class="n">y_pred</span>

<span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">5</span><span class="p">))</span>
<span class="n">plt</span><span class="p">.</span><span class="n">subplot</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">scatter</span><span class="p">(</span><span class="n">x_orth</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">'Original data'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">x_orth</span><span class="p">,</span> <span class="n">y_pred</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'C1'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">'Fitted line'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">xlabel</span><span class="p">(</span><span class="s">'$x$ Orth'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">ylabel</span><span class="p">(</span><span class="s">'y'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">title</span><span class="p">(</span><span class="s">'$x$ Orth vs y with Linear Fit'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">legend</span><span class="p">()</span>

<span class="n">plt</span><span class="p">.</span><span class="n">subplot</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">scatter</span><span class="p">(</span><span class="n">x2_orth</span><span class="p">,</span> <span class="n">residuals</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">xlabel</span><span class="p">(</span><span class="s">'$x^2$ Orth'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">ylabel</span><span class="p">(</span><span class="s">'Residuals'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">title</span><span class="p">(</span><span class="s">'$x^2$ Orth vs Residuals'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">axhline</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'black'</span><span class="p">,</span> <span class="n">linestyle</span><span class="o">=</span><span class="s">'--'</span><span class="p">)</span>

<span class="n">plt</span><span class="p">.</span><span class="n">tight_layout</span><span class="p">()</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>

<p><img src="/blogimages/orthogonal-polynomial-regression/residuals-linear-trend.png" alt="Residuals linear trend" /></p>

<p>And, in fact, the linear trend holds when plotting the orthogonalized \(x^2\) against the residuals. We can quantify this with the quick regression below.</p>
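<p>A quick check, reusing the <code class="language-plaintext highlighter-rouge">x2_orth</code> and <code class="language-plaintext highlighter-rouge">residuals</code> arrays computed above. Since \(y\) here is exactly quadratic, the residuals lie along the orthogonalized \(x^2\) direction up to floating-point error, so the correlation should be essentially 1 in magnitude.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Regress the residuals of the linear fit on the orthogonalized x^2 column
result = scipy.stats.linregress(x2_orth, residuals)
print(result.rvalue)  # essentially +/-1 for this exactly-quadratic y
</code></pre></div></div>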

<p>We can take this a degree higher and look at a cubic term.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">x3</span> <span class="o">=</span> <span class="n">x</span><span class="o">**</span><span class="mi">3</span>
<span class="n">x2</span> <span class="o">=</span> <span class="n">x</span><span class="o">**</span><span class="mi">2</span>
<span class="n">y_cubic</span> <span class="o">=</span> <span class="mf">2.5</span> <span class="o">*</span> <span class="n">x3</span> <span class="o">-</span> <span class="mi">15</span> <span class="o">*</span> <span class="n">x2</span> <span class="o">+</span> <span class="mi">55</span> <span class="o">*</span> <span class="n">x</span>

<span class="n">transformer</span> <span class="o">=</span> <span class="n">OrthogonalPolynomialTransformer</span><span class="p">(</span><span class="n">degree</span><span class="o">=</span><span class="mi">3</span><span class="p">)</span>
<span class="n">x_orthogonalized</span> <span class="o">=</span> <span class="n">transformer</span><span class="p">.</span><span class="n">fit_transform</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="n">x_orth</span> <span class="o">=</span> <span class="n">x_orthogonalized</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">]</span>
<span class="n">x2_orth</span> <span class="o">=</span> <span class="n">x_orthogonalized</span><span class="p">[:,</span> <span class="mi">1</span><span class="p">]</span>
<span class="n">x3_orth</span> <span class="o">=</span> <span class="n">x_orthogonalized</span><span class="p">[:,</span> <span class="mi">2</span><span class="p">]</span>

<span class="n">fig</span><span class="p">,</span> <span class="n">axs</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">subplots</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">12</span><span class="p">,</span> <span class="mi">8</span><span class="p">),</span> <span class="n">sharey</span><span class="o">=</span><span class="s">'row'</span><span class="p">)</span>

<span class="n">plots</span> <span class="o">=</span> <span class="p">[</span>
    <span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="s">'x'</span><span class="p">,</span> <span class="s">'x vs y'</span><span class="p">,</span> <span class="bp">False</span><span class="p">),</span>
    <span class="p">(</span><span class="n">x2</span><span class="p">,</span> <span class="s">'$x^2$'</span><span class="p">,</span> <span class="s">'$x^2$ vs y'</span><span class="p">,</span> <span class="bp">False</span><span class="p">),</span>
    <span class="p">(</span><span class="n">x3</span><span class="p">,</span> <span class="s">'$x^3$'</span><span class="p">,</span> <span class="s">'$x^3$ vs y'</span><span class="p">,</span> <span class="bp">False</span><span class="p">),</span>
    <span class="p">(</span><span class="n">x_orth</span><span class="p">,</span> <span class="s">'$x$ Orth'</span><span class="p">,</span> <span class="s">'$x$ Orth vs y'</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span>
    <span class="p">(</span><span class="n">x2_orth</span><span class="p">,</span> <span class="s">'$x^2$ Orth'</span><span class="p">,</span> <span class="s">'$x^2$ Orth vs y'</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span>
    <span class="p">(</span><span class="n">x3_orth</span><span class="p">,</span> <span class="s">'$x^3$ Orth'</span><span class="p">,</span> <span class="s">'$x^3$ Orth vs y'</span><span class="p">,</span> <span class="bp">False</span><span class="p">)</span>
<span class="p">]</span>

<span class="k">for</span> <span class="n">ax</span><span class="p">,</span> <span class="n">plot_data</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">axs</span><span class="p">.</span><span class="n">flat</span><span class="p">,</span> <span class="n">plots</span><span class="p">):</span>
    <span class="n">x_val</span><span class="p">,</span> <span class="n">xlabel</span><span class="p">,</span> <span class="n">title</span> <span class="o">=</span> <span class="n">plot_data</span><span class="p">[:</span><span class="mi">3</span><span class="p">]</span>
    <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">plot_data</span><span class="p">)</span> <span class="o">==</span> <span class="mi">4</span> <span class="ow">and</span> <span class="n">plot_data</span><span class="p">[</span><span class="mi">3</span><span class="p">]:</span>
        <span class="n">sns</span><span class="p">.</span><span class="n">regplot</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">x_val</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="n">y_cubic</span><span class="p">,</span> <span class="n">ax</span><span class="o">=</span><span class="n">ax</span><span class="p">,</span> <span class="n">line_kws</span><span class="o">=</span><span class="p">{</span><span class="s">"color"</span><span class="p">:</span> <span class="s">"C1"</span><span class="p">})</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="n">sns</span><span class="p">.</span><span class="n">scatterplot</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">x_val</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="n">y_cubic</span><span class="p">,</span> <span class="n">ax</span><span class="o">=</span><span class="n">ax</span><span class="p">)</span>
    <span class="n">ax</span><span class="p">.</span><span class="nb">set</span><span class="p">(</span><span class="n">xlabel</span><span class="o">=</span><span class="n">xlabel</span><span class="p">,</span> <span class="n">ylabel</span><span class="o">=</span><span class="s">'y'</span><span class="p">,</span> <span class="n">title</span><span class="o">=</span><span class="n">title</span><span class="p">)</span>

    <span class="k">if</span> <span class="n">title</span> <span class="ow">in</span> <span class="p">(</span><span class="s">'$x^2$ Orth vs y'</span><span class="p">,</span> <span class="s">'$x^3$ Orth vs y'</span><span class="p">):</span>
        <span class="n">ax</span><span class="p">.</span><span class="n">axvline</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'k'</span><span class="p">,</span> <span class="n">linestyle</span><span class="o">=</span><span class="s">'--'</span><span class="p">)</span>

<span class="n">plt</span><span class="p">.</span><span class="n">tight_layout</span><span class="p">()</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>

<p><img src="/blogimages/orthogonal-polynomial-regression/cubic-scatter.png" alt="Cubic scatter plots" /></p>

<p>At the cubic level, it’s a bit more difficult to see the trends; however, the procedure is still the same. We can model each subsequent term against the residuals of the prior one, and since this data was constructed from a cubic function, the plot of the orthogonalized \(x^3\) term against the residuals of the \(x^2\) fit is linear.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">slope</span><span class="p">,</span> <span class="n">intercept</span><span class="p">,</span> <span class="n">r_value</span><span class="p">,</span> <span class="n">p_value</span><span class="p">,</span> <span class="n">std_err</span> <span class="o">=</span> <span class="n">scipy</span><span class="p">.</span><span class="n">stats</span><span class="p">.</span><span class="n">linregress</span><span class="p">(</span><span class="n">x_orth</span><span class="p">,</span> <span class="n">y_cubic</span><span class="p">)</span>
<span class="n">y_pred</span> <span class="o">=</span> <span class="n">intercept</span> <span class="o">+</span> <span class="n">slope</span> <span class="o">*</span> <span class="n">x_orth</span>
<span class="n">residuals</span> <span class="o">=</span> <span class="n">y_cubic</span> <span class="o">-</span> <span class="n">y_pred</span>

<span class="n">slope_res</span><span class="p">,</span> <span class="n">intercept_res</span><span class="p">,</span> <span class="n">r_value_res</span><span class="p">,</span> <span class="n">p_value_res</span><span class="p">,</span> <span class="n">std_err_res</span> <span class="o">=</span> <span class="n">scipy</span><span class="p">.</span><span class="n">stats</span><span class="p">.</span><span class="n">linregress</span><span class="p">(</span>
    <span class="n">x2_orth</span><span class="p">,</span> <span class="n">residuals</span>
<span class="p">)</span>
<span class="n">residuals_pred</span> <span class="o">=</span> <span class="n">intercept_res</span> <span class="o">+</span> <span class="n">slope_res</span> <span class="o">*</span> <span class="n">x2_orth</span>
<span class="n">second_order_residuals</span> <span class="o">=</span> <span class="n">residuals</span> <span class="o">-</span> <span class="n">residuals_pred</span>

<span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">15</span><span class="p">,</span> <span class="mi">5</span><span class="p">))</span>

<span class="n">plt</span><span class="p">.</span><span class="n">subplot</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
<span class="n">sns</span><span class="p">.</span><span class="n">scatterplot</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">x_orth</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="n">y_cubic</span><span class="p">,</span> <span class="n">hue</span><span class="o">=</span><span class="n">np</span><span class="p">.</span><span class="n">arange</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">x_orth</span><span class="p">)),</span>
                <span class="n">palette</span><span class="o">=</span><span class="s">"viridis"</span><span class="p">,</span> <span class="n">legend</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">x_orth</span><span class="p">,</span> <span class="n">y_pred</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'black'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">'Linear Model'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">xlabel</span><span class="p">(</span><span class="s">'$x$ Orth'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">ylabel</span><span class="p">(</span><span class="s">'y'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">title</span><span class="p">(</span><span class="s">'$x$ Orth vs y with Linear Fit'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">legend</span><span class="p">()</span>

<span class="n">plt</span><span class="p">.</span><span class="n">subplot</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span>
<span class="n">sns</span><span class="p">.</span><span class="n">scatterplot</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">x2_orth</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="n">residuals</span><span class="p">,</span> <span class="n">hue</span><span class="o">=</span><span class="n">np</span><span class="p">.</span><span class="n">arange</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">x2_orth</span><span class="p">)),</span>
                <span class="n">palette</span><span class="o">=</span><span class="s">"viridis"</span><span class="p">,</span> <span class="n">legend</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">x2_orth</span><span class="p">,</span> <span class="n">residuals_pred</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'black'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">xlabel</span><span class="p">(</span><span class="s">'$x^2$ Orth'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">ylabel</span><span class="p">(</span><span class="s">'Residuals'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">title</span><span class="p">(</span><span class="s">'$x^2$ Orth vs Residuals'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">axhline</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'grey'</span><span class="p">,</span> <span class="n">linestyle</span><span class="o">=</span><span class="s">'--'</span><span class="p">,</span> <span class="n">zorder</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span>

<span class="n">plt</span><span class="p">.</span><span class="n">subplot</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>
<span class="n">sns</span><span class="p">.</span><span class="n">scatterplot</span><span class="p">(</span><span class="n">x</span><span class="o">=</span><span class="n">x3_orth</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="n">second_order_residuals</span><span class="p">,</span> <span class="n">hue</span><span class="o">=</span><span class="n">np</span><span class="p">.</span><span class="n">arange</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">x3_orth</span><span class="p">)),</span>
                <span class="n">palette</span><span class="o">=</span><span class="s">"viridis"</span><span class="p">,</span> <span class="n">legend</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">xlabel</span><span class="p">(</span><span class="s">'$x^3$ Orth'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">ylabel</span><span class="p">(</span><span class="s">'Second Order Residuals'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">title</span><span class="p">(</span><span class="s">'$x^3$ Orth vs Second Order Residuals'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">axhline</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'grey'</span><span class="p">,</span> <span class="n">linestyle</span><span class="o">=</span><span class="s">'--'</span><span class="p">,</span> <span class="n">zorder</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">annotate</span><span class="p">(</span><span class="s">'Point hue denotes index'</span><span class="p">,</span>
             <span class="n">xy</span><span class="o">=</span><span class="p">(</span><span class="mf">0.99</span><span class="p">,</span> <span class="mf">0.05</span><span class="p">),</span> <span class="n">ha</span><span class="o">=</span><span class="s">'right'</span><span class="p">,</span> <span class="n">xycoords</span><span class="o">=</span><span class="s">'axes fraction'</span><span class="p">,</span>
             <span class="n">fontsize</span><span class="o">=</span><span class="mi">14</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'black'</span><span class="p">)</span>

<span class="n">plt</span><span class="p">.</span><span class="n">tight_layout</span><span class="p">()</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>

<p><img src="/blogimages/orthogonal-polynomial-regression/cubic-residuals.png" alt="Cubic residuals" /></p>

<p>The main takeaway of this deep dive is the following: <strong>The <code class="language-plaintext highlighter-rouge">poly</code> keyword when used in a formula creates orthogonal polynomials. This is well-suited for fitting statistical models, since it eliminates the risk of multicollinearity between terms.</strong></p>

<p>This wasn’t used in the other notebook since we were trying to recover the parameters associated with each term. However, if you’re building a statistical model, especially one in which prediction is the focus, orthogonal polynomials may be the appropriate choice. The condition-number comparison below makes the multicollinearity point concrete.</p>
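<p>A minimal sketch, reusing the <code class="language-plaintext highlighter-rouge">x = np.linspace(0, 10, 100)</code> array from earlier; the exact condition numbers are illustrative, but an orthonormal design matrix will always sit near 1.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Raw design matrix: columns x, x^2, x^3 (dropping the constant column)
raw_design = np.vander(x, N=4, increasing=True)[:, 1:]

# Orthogonalized design matrix of the same degree
orth_design = OrthogonalPolynomialTransformer(degree=3).fit_transform(x)

print(np.linalg.cond(raw_design))   # large: the columns are nearly collinear
print(np.linalg.cond(orth_design))  # ~1: the columns are orthonormal
</code></pre></div></div>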

<p>As one final note, the formulae version of <code class="language-plaintext highlighter-rouge">poly</code> does include a <code class="language-plaintext highlighter-rouge">raw</code> argument, which allows you to get the non-orthogonalized versions of each polynomial term. You can call that in Bambi like <code class="language-plaintext highlighter-rouge">bmb.Model("y ~ poly(x, 4, raw=True)", df)</code>.</p>
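<p>Mirroring the earlier formulae call, and assuming <code class="language-plaintext highlighter-rouge">raw</code> is accepted as a keyword in the direct call just as it is in the formula string, a hedged sketch:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Assumed to mirror poly(x, 4, raw=True) in a formula string:
# returns the plain powers of X rather than the orthogonalized columns
formulae_poly = formulae.transforms.Polynomial()
formulae_poly(X, 4, raw=True)
</code></pre></div></div>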

<h2 id="orthogonal-polynomials-in-practice">Orthogonal Polynomials in Practice</h2>

<p>In order to see the <code class="language-plaintext highlighter-rouge">poly</code> keyword in action, we’ll take a look at the cars dataset. This dataset, bundled with Seaborn, includes information on cars manufactured between 1970 and 1982. First we’ll load it in and take a look at the included variables.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">df_mpg</span> <span class="o">=</span> <span class="n">sns</span><span class="p">.</span><span class="n">load_dataset</span><span class="p">(</span><span class="s">"mpg"</span><span class="p">)</span>
<span class="n">df_mpg</span><span class="p">.</span><span class="n">head</span><span class="p">()</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    mpg  cylinders  displacement  horsepower  weight  acceleration  model_year origin                       name
0  18.0          8         307.0       130.0    3504          12.0          70    usa  chevrolet chevelle malibu
1  15.0          8         350.0       165.0    3693          11.5          70    usa          buick skylark 320
2  18.0          8         318.0       150.0    3436          11.0          70    usa         plymouth satellite
3  16.0          8         304.0       150.0    3433          12.0          70    usa              amc rebel sst
4  17.0          8         302.0       140.0    3449          10.5          70    usa                ford torino
</code></pre></div></div>

<p>In this example, we’ll take a look at how a car’s fuel efficiency (<code class="language-plaintext highlighter-rouge">mpg</code>) relates to its <code class="language-plaintext highlighter-rouge">horsepower</code> (hp).</p>

<p>To start, we’ll just plot the joint distribution, as well as the distribution of the response variable.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">df_mpg</span> <span class="o">=</span> <span class="n">df_mpg</span><span class="p">.</span><span class="n">dropna</span><span class="p">(</span><span class="n">subset</span><span class="o">=</span><span class="p">[</span><span class="s">"horsepower"</span><span class="p">,</span> <span class="s">"mpg"</span><span class="p">])</span>

<span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">14</span><span class="p">,</span> <span class="mi">6</span><span class="p">))</span>
<span class="n">plt</span><span class="p">.</span><span class="n">subplot</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
<span class="n">sns</span><span class="p">.</span><span class="n">regplot</span><span class="p">(</span><span class="n">data</span><span class="o">=</span><span class="n">df_mpg</span><span class="p">,</span> <span class="n">x</span><span class="o">=</span><span class="s">"horsepower"</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="s">"mpg"</span><span class="p">,</span> <span class="n">line_kws</span><span class="o">=</span><span class="p">{</span><span class="s">"color"</span><span class="p">:</span> <span class="s">"firebrick"</span><span class="p">})</span>

<span class="n">plt</span><span class="p">.</span><span class="n">subplot</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span>
<span class="n">sns</span><span class="p">.</span><span class="n">histplot</span><span class="p">(</span><span class="n">df_mpg</span><span class="p">[</span><span class="s">"mpg"</span><span class="p">],</span> <span class="n">edgecolor</span><span class="o">=</span><span class="s">"black"</span><span class="p">,</span> <span class="n">kde</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">xlabel</span><span class="p">(</span><span class="s">'MPG'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">ylabel</span><span class="p">(</span><span class="s">'Count'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">title</span><span class="p">(</span><span class="s">'Histogram of MPG'</span><span class="p">)</span>

<span class="n">plt</span><span class="p">.</span><span class="n">tight_layout</span><span class="p">()</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>

<p><img src="/blogimages/orthogonal-polynomial-regression/mpg-hp-joint.png" alt="MPG vs horsepower joint distribution" /></p>

<p>Immediately, we see that the linear fit doesn’t model this data particularly well; the relationship exhibits some nonlinearity. We’ll use a polynomial regression to see if we can improve the fit and capture that curvature. As a benchmark, we first fit a linear model.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">mpg_hp_linear_mod</span> <span class="o">=</span> <span class="n">bmb</span><span class="p">.</span><span class="n">Model</span><span class="p">(</span><span class="s">"mpg ~ horsepower"</span><span class="p">,</span> <span class="n">df_mpg</span><span class="p">)</span>
<span class="n">mpg_hp_linear_fit</span> <span class="o">=</span> <span class="n">mpg_hp_linear_mod</span><span class="p">.</span><span class="n">fit</span><span class="p">(</span>
    <span class="n">idata_kwargs</span><span class="o">=</span><span class="p">{</span><span class="s">"log_likelihood"</span><span class="p">:</span> <span class="bp">True</span><span class="p">},</span> <span class="n">random_seed</span><span class="o">=</span><span class="n">SEED</span>
<span class="p">)</span>
<span class="n">mpg_hp_linear_mod</span><span class="p">.</span><span class="n">predict</span><span class="p">(</span><span class="n">mpg_hp_linear_fit</span><span class="p">,</span> <span class="n">kind</span><span class="o">=</span><span class="s">"response"</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">fig</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">()</span>
<span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="p">[.</span><span class="mi">68</span><span class="p">,</span> <span class="p">.</span><span class="mi">95</span><span class="p">]:</span>
    <span class="n">bmb</span><span class="p">.</span><span class="n">interpret</span><span class="p">.</span><span class="n">plot_predictions</span><span class="p">(</span>
        <span class="n">mpg_hp_linear_mod</span><span class="p">,</span>
        <span class="n">mpg_hp_linear_fit</span><span class="p">,</span>
        <span class="s">"horsepower"</span><span class="p">,</span>
        <span class="n">pps</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
        <span class="n">legend</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
        <span class="n">prob</span><span class="o">=</span><span class="n">p</span><span class="p">,</span>
        <span class="n">ax</span><span class="o">=</span><span class="n">plt</span><span class="p">.</span><span class="n">gca</span><span class="p">()</span>
    <span class="p">)</span>
<span class="n">sns</span><span class="p">.</span><span class="n">scatterplot</span><span class="p">(</span><span class="n">data</span><span class="o">=</span><span class="n">df_mpg</span><span class="p">,</span> <span class="n">x</span><span class="o">=</span><span class="s">"horsepower"</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="s">"mpg"</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'blue'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">'True Data'</span><span class="p">);</span>
</code></pre></div></div>

<p><img src="/blogimages/orthogonal-polynomial-regression/linear-fit-predictions.png" alt="Linear fit predictions" /></p>

<p>Looking at this plot with the 68% and 95% CIs shown, the fit looks <em>okay</em>. Most notably, at about 160 hp, the data diverge from the fit pretty drastically. The fit at low hp values isn’t particularly good either; quite a bit of the data falls outside our 95% CI. This becomes even clearer when we look at the residuals from the mean of the model.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">predicted_mpg</span> <span class="o">=</span> <span class="n">mpg_hp_linear_fit</span><span class="p">.</span><span class="n">posterior</span><span class="p">[</span><span class="s">"mu"</span><span class="p">].</span><span class="n">mean</span><span class="p">((</span><span class="s">"chain"</span><span class="p">,</span> <span class="s">"draw"</span><span class="p">))</span>
<span class="n">residuals</span> <span class="o">=</span> <span class="n">df_mpg</span><span class="p">[</span><span class="s">"mpg"</span><span class="p">]</span> <span class="o">-</span> <span class="n">predicted_mpg</span>
<span class="n">sns</span><span class="p">.</span><span class="n">scatterplot</span><span class="p">(</span><span class="n">data</span><span class="o">=</span><span class="n">df_mpg</span><span class="p">,</span> <span class="n">x</span><span class="o">=</span><span class="s">"horsepower"</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="n">residuals</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">axhline</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'black'</span><span class="p">,</span> <span class="n">lw</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">ylabel</span><span class="p">(</span><span class="s">"Residuals"</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">title</span><span class="p">(</span><span class="s">'Residuals for linear model'</span><span class="p">)</span>
</code></pre></div></div>

<p><img src="/blogimages/orthogonal-polynomial-regression/linear-residuals.png" alt="Linear model residuals" /></p>

<p>This is definitely not the flat band we would ideally like to see.</p>
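
<p>To put a number on that curvature (a quick check, not in the original notebook; the choice of five bins is arbitrary), we can bin the residuals by horsepower and look at the bin means. A well-specified model should have bin means near zero across the whole range.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># bin residuals by horsepower; systematically nonzero bin means indicate
# curvature that the linear term cannot capture
resid_series = pd.Series(np.asarray(residuals), index=df_mpg.index)
hp_bins = pd.cut(df_mpg["horsepower"], bins=5)
print(resid_series.groupby(hp_bins, observed=True).mean())
</code></pre></div></div>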

<p>Next, we fit a polynomial regression that includes a quadratic term.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">mpg_hp_sq_mod</span> <span class="o">=</span> <span class="n">bmb</span><span class="p">.</span><span class="n">Model</span><span class="p">(</span><span class="s">"mpg ~ poly(horsepower, 2)"</span><span class="p">,</span> <span class="n">df_mpg</span><span class="p">)</span>
<span class="n">mpg_hp_sq_fit</span> <span class="o">=</span> <span class="n">mpg_hp_sq_mod</span><span class="p">.</span><span class="n">fit</span><span class="p">(</span>
    <span class="n">idata_kwargs</span><span class="o">=</span><span class="p">{</span><span class="s">"log_likelihood"</span><span class="p">:</span> <span class="bp">True</span><span class="p">},</span> <span class="n">random_seed</span><span class="o">=</span><span class="n">SEED</span>
<span class="p">)</span>
<span class="n">mpg_hp_sq_mod</span><span class="p">.</span><span class="n">predict</span><span class="p">(</span><span class="n">mpg_hp_sq_fit</span><span class="p">,</span> <span class="n">kind</span><span class="o">=</span><span class="s">"response"</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">fig</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">()</span>
<span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="p">[.</span><span class="mi">68</span><span class="p">,</span> <span class="p">.</span><span class="mi">95</span><span class="p">]:</span>
    <span class="n">bmb</span><span class="p">.</span><span class="n">interpret</span><span class="p">.</span><span class="n">plot_predictions</span><span class="p">(</span>
        <span class="n">mpg_hp_sq_mod</span><span class="p">,</span>
        <span class="n">mpg_hp_sq_fit</span><span class="p">,</span>
        <span class="s">"horsepower"</span><span class="p">,</span>
        <span class="n">pps</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
        <span class="n">legend</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
        <span class="n">prob</span><span class="o">=</span><span class="n">p</span><span class="p">,</span>
        <span class="n">ax</span><span class="o">=</span><span class="n">plt</span><span class="p">.</span><span class="n">gca</span><span class="p">()</span>
    <span class="p">)</span>
<span class="n">sns</span><span class="p">.</span><span class="n">scatterplot</span><span class="p">(</span><span class="n">data</span><span class="o">=</span><span class="n">df_mpg</span><span class="p">,</span> <span class="n">x</span><span class="o">=</span><span class="s">"horsepower"</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="s">"mpg"</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'blue'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">'True Data'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">title</span><span class="p">(</span><span class="s">"Quadratic Fit"</span><span class="p">)</span>
</code></pre></div></div>

<p><img src="/blogimages/orthogonal-polynomial-regression/quadratic-fit-predictions.png" alt="Quadratic fit predictions" /></p>

<p>Visually, this looks better. In particular, at high values the model follows the pattern in the data much more closely, since including the polynomial term allows for curvature. Generating the same residual plot gives the following:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">predicted_mpg</span> <span class="o">=</span> <span class="n">mpg_hp_sq_fit</span><span class="p">.</span><span class="n">posterior</span><span class="p">[</span><span class="s">"mu"</span><span class="p">].</span><span class="n">mean</span><span class="p">((</span><span class="s">"chain"</span><span class="p">,</span> <span class="s">"draw"</span><span class="p">))</span>
<span class="n">residuals</span> <span class="o">=</span> <span class="n">df_mpg</span><span class="p">[</span><span class="s">"mpg"</span><span class="p">]</span> <span class="o">-</span> <span class="n">predicted_mpg</span>
<span class="n">sns</span><span class="p">.</span><span class="n">scatterplot</span><span class="p">(</span><span class="n">data</span><span class="o">=</span><span class="n">df_mpg</span><span class="p">,</span> <span class="n">x</span><span class="o">=</span><span class="s">"horsepower"</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="n">residuals</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">axhline</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'black'</span><span class="p">,</span> <span class="n">lw</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">ylabel</span><span class="p">(</span><span class="s">"Residuals"</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">title</span><span class="p">(</span><span class="s">'Residuals for quadratic model'</span><span class="p">)</span>
</code></pre></div></div>

<p><img src="/blogimages/orthogonal-polynomial-regression/quadratic-residuals.png" alt="Quadratic model residuals" /></p>

<p>This is far closer to flat than before.</p>

<p>For a more rigorous comparison, we can look at the difference in expected log pointwise predictive density (ELPD) between the models, estimated via leave-one-out cross-validation (LOO-CV).</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">az</span><span class="p">.</span><span class="n">compare</span><span class="p">({</span><span class="s">"Linear"</span><span class="p">:</span> <span class="n">mpg_hp_linear_fit</span><span class="p">,</span> <span class="s">"Quadratic"</span><span class="p">:</span> <span class="n">mpg_hp_sq_fit</span><span class="p">})</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>           rank     elpd_loo     p_loo  elpd_diff    weight         se       dse  warning scale
Quadratic     0 -1137.318163  4.249899    0.00000  0.915146  18.118029   0.00000    False   log
Linear        1 -1181.836583  3.362876   44.51842  0.084854  15.124216  10.37422    False   log
</code></pre></div></div>

<p>The quadratic model performs better by LOO-CV: its ELPD is higher by roughly 44.5 units, several times the standard error of the difference.</p>
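
<p>As a rough check on that conclusion (a sketch, leaning on the normal approximation behind the reported standard error of the ELPD difference), we can compare the difference to its standard error:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cmp_pair = az.compare({"Linear": mpg_hp_linear_fit, "Quadratic": mpg_hp_sq_fit})
# the ELPD difference (~44.5) is roughly four times its standard error (~10.4),
# so the improvement is unlikely to be estimation noise
print(cmp_pair.loc["Linear", "elpd_diff"] / cmp_pair.loc["Linear", "dse"])
</code></pre></div></div>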

<h3 id="cautionary-tales">Cautionary Tales</h3>

<p>Last, we’re going to investigate a couple of pitfalls with polynomial regression.</p>

<h4 id="fitting-too-many-polynomial-degrees">Fitting too many polynomial degrees</h4>

<p>Typically, when fitting a statistical model, you want to come to your data with a hypothesis and motivate your polynomial degree based on domain knowledge and expertise with the data. Instead of being principled, we’re going to throw caution to the wind, iteratively fit models of degree 1 through 9, and then see which performs best.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">poly_fits</span><span class="p">,</span> <span class="n">poly_models</span> <span class="o">=</span> <span class="p">{},</span> <span class="p">{}</span>
<span class="k">for</span> <span class="n">degree</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">10</span><span class="p">):</span>
    <span class="n">model</span> <span class="o">=</span> <span class="n">bmb</span><span class="p">.</span><span class="n">Model</span><span class="p">(</span><span class="sa">f</span><span class="s">"mpg ~ poly(horsepower, </span><span class="si">{</span><span class="n">degree</span><span class="si">}</span><span class="s">)"</span><span class="p">,</span> <span class="n">df_mpg</span><span class="p">)</span>
    <span class="n">fit</span> <span class="o">=</span> <span class="n">model</span><span class="p">.</span><span class="n">fit</span><span class="p">(</span>
        <span class="n">idata_kwargs</span><span class="o">=</span><span class="p">{</span><span class="s">"log_likelihood"</span><span class="p">:</span> <span class="bp">True</span><span class="p">},</span> <span class="n">random_seed</span><span class="o">=</span><span class="n">SEED</span><span class="p">,</span> <span class="n">progressbar</span><span class="o">=</span><span class="bp">False</span>
    <span class="p">)</span>
    <span class="n">poly_models</span><span class="p">[</span><span class="sa">f</span><span class="s">"Poly</span><span class="si">{</span><span class="n">degree</span><span class="si">}</span><span class="s">"</span><span class="p">]</span> <span class="o">=</span> <span class="n">model</span>
    <span class="n">poly_fits</span><span class="p">[</span><span class="sa">f</span><span class="s">"Poly</span><span class="si">{</span><span class="n">degree</span><span class="si">}</span><span class="s">"</span><span class="p">]</span> <span class="o">=</span> <span class="n">fit</span>

<span class="nb">cmp</span> <span class="o">=</span> <span class="n">az</span><span class="p">.</span><span class="n">compare</span><span class="p">(</span><span class="n">poly_fits</span><span class="p">)</span>
<span class="nb">cmp</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>       rank     elpd_loo      p_loo  elpd_diff    weight         se       dse  warning scale
Poly7     0 -1133.197215   9.458617   0.000000  0.649372  18.880097   0.000000    False   log
Poly6     1 -1134.193525   8.909358   0.996310  0.000000  18.614197   1.827800     True   log
Poly8     2 -1134.296370  10.653330   1.099154  0.000000  18.876017   0.609013    False   log
Poly5     3 -1134.866504   7.696208   1.669289  0.000000  18.554120   3.508807    False   log
Poly9     4 -1135.238197  12.004611   2.040982  0.000000  18.951758   1.579663    False   log
Poly2     5 -1137.318129   4.249865   4.120914  0.000000  18.118010   6.509055    False   log
Poly3     6 -1137.990983   5.376918   4.793768  0.285760  18.402214   7.003322    False   log
Poly4     7 -1138.858924   7.061455   5.661709  0.000000  18.308157   6.200712    False   log
Poly1     8 -1181.870350   3.400270  48.673135  0.064868  15.124141  11.007970    False   log
</code></pre></div></div>

<p>A 7th-degree polynomial seems to do better than the quadratic one we fit before. But notice that most of the ELPD values are very similar. Let’s make a plot so we can more easily grasp how different the models really are according to ELPD. We’ll use <code class="language-plaintext highlighter-rouge">az.plot_compare</code> and add a blue band marking models whose ELPD differs from the first-ranked model by less than 4; models that close are essentially indistinguishable by ELPD.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">ax</span> <span class="o">=</span> <span class="n">az</span><span class="p">.</span><span class="n">plot_compare</span><span class="p">(</span><span class="nb">cmp</span><span class="p">,</span> <span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">12</span><span class="p">,</span> <span class="mi">4</span><span class="p">),</span> <span class="n">plot_ic_diff</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span> <span class="n">legend</span><span class="o">=</span><span class="bp">False</span><span class="p">);</span>
<span class="n">best_loo</span> <span class="o">=</span> <span class="nb">cmp</span><span class="p">[</span><span class="s">"elpd_loo"</span><span class="p">].</span><span class="n">iloc</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="n">ax</span><span class="p">.</span><span class="n">axvspan</span><span class="p">(</span><span class="n">best_loo</span><span class="o">-</span><span class="mi">4</span><span class="p">,</span> <span class="n">best_loo</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"C0"</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.2</span><span class="p">);</span>
</code></pre></div></div>

<p><img src="/blogimages/orthogonal-polynomial-regression/elpd-comparison.png" alt="ELPD comparison" /></p>

<p>We can see that <code class="language-plaintext highlighter-rouge">Poly6</code>, <code class="language-plaintext highlighter-rouge">Poly8</code>, <code class="language-plaintext highlighter-rouge">Poly5</code> and <code class="language-plaintext highlighter-rouge">Poly9</code> are all within 4 units of the best model. Moreover, all models except <code class="language-plaintext highlighter-rouge">Poly1</code> have overlapping standard errors.</p>

<p>Overall, this is telling us that there is no clear gain in predictive performance once we move beyond a quadratic model. If we want to pick a single model, we need another criterion to decide. If we have no reason to prefer a more complex model, choosing the simpler one (<code class="language-plaintext highlighter-rouge">Poly2</code> in this example) is a good heuristic.</p>
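
<p>One way to encode that heuristic programmatically is sketched below. Note that the 4-unit band is itself a rule of thumb, and <code class="language-plaintext highlighter-rouge">Poly2</code> sits just outside it at about 4.1 in the table above, so we loosen the cutoff slightly; using <code class="language-plaintext highlighter-rouge">p_loo</code> as the complexity measure is also our choice here, not a universal rule.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># keep models within ~4.5 ELPD units of the best, then pick the one with the
# smallest effective number of parameters (p_loo) as the simplest
close_enough = cmp[cmp["elpd_diff"].le(4.5)]
simplest = close_enough.sort_values("p_loo").index[0]
print(simplest)  # Poly2, given the comparison table above
</code></pre></div></div>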

<p>Before deciding, let’s make a couple more plots. First, let’s see what the degree-7 residuals look like!</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">best_model</span> <span class="o">=</span> <span class="n">poly_models</span><span class="p">[</span><span class="s">"Poly7"</span><span class="p">]</span>
<span class="n">best_fit</span> <span class="o">=</span> <span class="n">poly_fits</span><span class="p">[</span><span class="s">"Poly7"</span><span class="p">]</span>
<span class="n">best_model</span><span class="p">.</span><span class="n">predict</span><span class="p">(</span><span class="n">best_fit</span><span class="p">,</span> <span class="n">kind</span><span class="o">=</span><span class="s">"response"</span><span class="p">)</span>

<span class="n">predicted_mpg</span> <span class="o">=</span> <span class="n">best_fit</span><span class="p">.</span><span class="n">posterior</span><span class="p">[</span><span class="s">"mu"</span><span class="p">].</span><span class="n">mean</span><span class="p">((</span><span class="s">"chain"</span><span class="p">,</span> <span class="s">"draw"</span><span class="p">))</span>
<span class="n">residuals</span> <span class="o">=</span> <span class="n">df_mpg</span><span class="p">[</span><span class="s">"mpg"</span><span class="p">]</span> <span class="o">-</span> <span class="n">predicted_mpg</span>
<span class="n">sns</span><span class="p">.</span><span class="n">scatterplot</span><span class="p">(</span><span class="n">data</span><span class="o">=</span><span class="n">df_mpg</span><span class="p">,</span> <span class="n">x</span><span class="o">=</span><span class="s">"horsepower"</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="n">residuals</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">axhline</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'black'</span><span class="p">,</span> <span class="n">lw</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">ylabel</span><span class="p">(</span><span class="s">"Residuals"</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">title</span><span class="p">(</span><span class="s">'Residuals for degree 7 model'</span><span class="p">);</span>
</code></pre></div></div>

<p><img src="/blogimages/orthogonal-polynomial-regression/poly7-residuals.png" alt="Poly7 residuals" /></p>

<p>Hey, that looks pretty good; the residuals appear nice and flat. But before we go full steam ahead with this model, let’s take a look at the posterior predictive distribution.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">fig</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">()</span>
<span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="p">[.</span><span class="mi">68</span><span class="p">,</span> <span class="p">.</span><span class="mi">95</span><span class="p">]:</span>
    <span class="n">bmb</span><span class="p">.</span><span class="n">interpret</span><span class="p">.</span><span class="n">plot_predictions</span><span class="p">(</span>
        <span class="n">best_model</span><span class="p">,</span>
        <span class="n">best_fit</span><span class="p">,</span>
        <span class="s">"horsepower"</span><span class="p">,</span>
        <span class="n">pps</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
        <span class="n">legend</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
        <span class="n">prob</span><span class="o">=</span><span class="n">p</span><span class="p">,</span>
        <span class="n">ax</span><span class="o">=</span><span class="n">plt</span><span class="p">.</span><span class="n">gca</span><span class="p">()</span>
    <span class="p">)</span>
<span class="n">sns</span><span class="p">.</span><span class="n">scatterplot</span><span class="p">(</span><span class="n">data</span><span class="o">=</span><span class="n">df_mpg</span><span class="p">,</span> <span class="n">x</span><span class="o">=</span><span class="s">"horsepower"</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="s">"mpg"</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'blue'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">'True Data'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">title</span><span class="p">(</span><span class="s">"Best Fit Model: 7th Degree Polynomial"</span><span class="p">);</span>
</code></pre></div></div>

<p><img src="/blogimages/orthogonal-polynomial-regression/poly7-predictions.png" alt="Poly7 predictions" /></p>

<p>Uh-oh. While this model gave the best ELPD and a nice residual plot, it’s obviously overfit, as expected given that we already showed its difference from the quadratic model is small. Given our knowledge of how cars operate, we expect fuel efficiency to decrease at higher horsepower, and the 7th-degree polynomial is clearly not consistent with that. First, at low values it increases before starting the decreasing trend. Second, it turns back upward at the high end of the data, latching strongly onto a couple of points that are likely driven by noise.</p>

<p>This behavior evokes the classic quote,</p>

<blockquote>
  <p>“With four parameters I can fit an elephant, and with five I can make him wiggle his trunk.” - John von Neumann</p>
</blockquote>

<p>The takeaway here is that <strong>as you fit higher polynomial degrees, you increase the risk of overfitting</strong>.</p>

<h4 id="extrapolation-of-polynomial-models">Extrapolation of polynomial models</h4>

<p>With any model, we should be careful when extrapolating and ensure our assumptions hold, but this applies especially to polynomial regression: higher-order terms grow without bound, so predictions can blow up quickly outside the fitting domain.</p>
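
<p>To build some intuition for how fast this happens, here’s an illustrative calculation (the numbers are round approximations of this dataset’s horsepower range, not exact values):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># observed horsepower tops out around 230; extrapolating to 300 multiplies the
# raw degree-7 basis term by about 6.4, amplifying any coefficient noise
x_edge, x_out = 230.0, 300.0
print((x_out / x_edge) ** 7)
</code></pre></div></div>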

<p>For example, with the quadratic fit we saw the drop in mpg flatten out at higher horsepower; looking closely at the posterior predictive of the quadratic model, you can even see the fit begin to rise again at the edge of the data. If we extend the model beyond those bounds, the curvature of a second-degree polynomial fully reverses the effect, implying that higher horsepower leads to <em>better</em> mpg.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">extrapolate_x_hp</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">linspace</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">500</span><span class="p">,</span> <span class="mi">250</span><span class="p">)</span>
<span class="n">mpg_hp_sq_mod</span><span class="p">.</span><span class="n">predict</span><span class="p">(</span><span class="n">mpg_hp_sq_fit</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">({</span><span class="s">"horsepower"</span><span class="p">:</span> <span class="n">extrapolate_x_hp</span><span class="p">}))</span>

<span class="n">sns</span><span class="p">.</span><span class="n">scatterplot</span><span class="p">(</span><span class="n">data</span><span class="o">=</span><span class="n">df_mpg</span><span class="p">,</span> <span class="n">x</span><span class="o">=</span><span class="s">"horsepower"</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="s">"mpg"</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'blue'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">'True Data'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span>
    <span class="n">extrapolate_x_hp</span><span class="p">,</span>
    <span class="n">mpg_hp_sq_fit</span><span class="p">.</span><span class="n">posterior</span><span class="p">[</span><span class="s">"mu"</span><span class="p">].</span><span class="n">mean</span><span class="p">((</span><span class="s">"chain"</span><span class="p">,</span> <span class="s">"draw"</span><span class="p">)),</span>
    <span class="n">color</span><span class="o">=</span><span class="s">"red"</span><span class="p">,</span>
    <span class="n">label</span><span class="o">=</span><span class="s">"Extrapolated Fit"</span><span class="p">,</span>
<span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">xlim</span><span class="p">(</span><span class="n">left</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">right</span><span class="o">=</span><span class="n">extrapolate_x_hp</span><span class="p">.</span><span class="nb">max</span><span class="p">())</span>
<span class="n">plt</span><span class="p">.</span><span class="n">legend</span><span class="p">(</span><span class="n">frameon</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
</code></pre></div></div>

<p><img src="/blogimages/orthogonal-polynomial-regression/quadratic-extrapolation.png" alt="Quadratic extrapolation" /></p>

<p>This is simply untrue based on what we know about cars and what we’ve seen in the data, so you would <em>not</em> want to use the model outside of the intended domain. If extrapolation is the goal, you would want a more appropriate specification; something like an exponential or inverse relationship may be appropriate, so that the fit approaches zero at high horsepower while never predicting values below zero.</p>
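
<p>A minimal sketch of one such alternative (not part of the original notebook, and assuming the Gaussian family accepts a log link via Bambi’s <code class="language-plaintext highlighter-rouge">link</code> argument): with a log link, the mean is strictly positive, and a negative slope makes it decay toward zero at high horsepower.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># exponential-decay-style specification: mu = exp(b0 + b1 * horsepower),
# which can never dip below zero no matter how far we extrapolate
mpg_hp_exp_mod = bmb.Model("mpg ~ horsepower", df_mpg, family="gaussian", link="log")
mpg_hp_exp_fit = mpg_hp_exp_mod.fit(random_seed=SEED)
</code></pre></div></div>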

<p>Extrapolation issues are not unique to polynomial regression; for example, linear regression also produces forbidden values when we extrapolate too far.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">mpg_hp_linear_mod</span><span class="p">.</span><span class="n">predict</span><span class="p">(</span>
    <span class="n">mpg_hp_linear_fit</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">({</span><span class="s">"horsepower"</span><span class="p">:</span> <span class="n">extrapolate_x_hp</span><span class="p">})</span>
<span class="p">)</span>
<span class="n">sns</span><span class="p">.</span><span class="n">scatterplot</span><span class="p">(</span><span class="n">data</span><span class="o">=</span><span class="n">df_mpg</span><span class="p">,</span> <span class="n">x</span><span class="o">=</span><span class="s">"horsepower"</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="s">"mpg"</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'blue'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">'True Data'</span><span class="p">)</span>

<span class="n">plt</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span>
    <span class="n">extrapolate_x_hp</span><span class="p">,</span>
    <span class="n">mpg_hp_linear_fit</span><span class="p">.</span><span class="n">posterior</span><span class="p">[</span><span class="s">"mu"</span><span class="p">].</span><span class="n">mean</span><span class="p">((</span><span class="s">"chain"</span><span class="p">,</span> <span class="s">"draw"</span><span class="p">)),</span>
    <span class="n">color</span><span class="o">=</span><span class="s">"red"</span><span class="p">,</span>
    <span class="n">label</span><span class="o">=</span><span class="s">"Predicted"</span>
<span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">fill_between</span><span class="p">(</span>
    <span class="n">extrapolate_x_hp</span><span class="p">,</span> <span class="n">plt</span><span class="p">.</span><span class="n">ylim</span><span class="p">()[</span><span class="mi">0</span><span class="p">],</span> <span class="mi">0</span><span class="p">,</span>
    <span class="n">color</span><span class="o">=</span><span class="s">'grey'</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.5</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">"MPG Forbidden region"</span>
<span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">xlim</span><span class="p">(</span><span class="n">left</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">right</span><span class="o">=</span><span class="n">extrapolate_x_hp</span><span class="p">.</span><span class="nb">max</span><span class="p">())</span>
<span class="n">plt</span><span class="p">.</span><span class="n">ylim</span><span class="p">(</span><span class="n">bottom</span><span class="o">=</span><span class="n">mpg_hp_linear_fit</span><span class="p">.</span><span class="n">posterior</span><span class="p">[</span><span class="s">"mu"</span><span class="p">].</span><span class="n">mean</span><span class="p">((</span><span class="s">"chain"</span><span class="p">,</span> <span class="s">"draw"</span><span class="p">)).</span><span class="nb">min</span><span class="p">())</span>
<span class="n">plt</span><span class="p">.</span><span class="n">legend</span><span class="p">(</span><span class="n">frameon</span><span class="o">=</span><span class="bp">False</span><span class="p">);</span>
</code></pre></div></div>

<p><img src="/blogimages/orthogonal-polynomial-regression/linear-extrapolation.png" alt="Linear extrapolation" /></p>

<p>However, it is highlighted in this notebook because, due to the nature of polynomial regression, fits can be especially unstable outside the fitting domain. Just for fun, to wrap up this notebook, we’ll take a look at what the 7th-order “best model” does outside of where we fit it.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">extrapolate_x_hp</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">linspace</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">300</span><span class="p">,</span> <span class="mi">250</span><span class="p">)</span>
<span class="n">best_model</span><span class="p">.</span><span class="n">predict</span><span class="p">(</span><span class="n">best_fit</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">({</span><span class="s">"horsepower"</span><span class="p">:</span> <span class="n">extrapolate_x_hp</span><span class="p">}))</span>

<span class="n">sns</span><span class="p">.</span><span class="n">scatterplot</span><span class="p">(</span><span class="n">data</span><span class="o">=</span><span class="n">df_mpg</span><span class="p">,</span> <span class="n">x</span><span class="o">=</span><span class="s">"horsepower"</span><span class="p">,</span> <span class="n">y</span><span class="o">=</span><span class="s">"mpg"</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'blue'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">'True Data'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span>
    <span class="n">extrapolate_x_hp</span><span class="p">,</span>
    <span class="n">best_fit</span><span class="p">.</span><span class="n">posterior</span><span class="p">[</span><span class="s">"mu"</span><span class="p">].</span><span class="n">mean</span><span class="p">((</span><span class="s">"chain"</span><span class="p">,</span> <span class="s">"draw"</span><span class="p">)),</span>
    <span class="n">color</span><span class="o">=</span><span class="s">"red"</span><span class="p">,</span>
    <span class="n">label</span><span class="o">=</span><span class="s">"Extrapolated Fit"</span><span class="p">,</span>
<span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">fill_between</span><span class="p">(</span>
    <span class="n">extrapolate_x_hp</span><span class="p">,</span> <span class="n">plt</span><span class="p">.</span><span class="n">ylim</span><span class="p">()[</span><span class="mi">0</span><span class="p">],</span> <span class="mi">0</span><span class="p">,</span>
    <span class="n">color</span><span class="o">=</span><span class="s">'grey'</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.5</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">"MPG Forbidden region"</span>
<span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">xlim</span><span class="p">(</span><span class="n">left</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">right</span><span class="o">=</span><span class="n">extrapolate_x_hp</span><span class="p">.</span><span class="nb">max</span><span class="p">())</span>
<span class="n">plt</span><span class="p">.</span><span class="n">ylim</span><span class="p">(</span><span class="n">bottom</span><span class="o">=</span><span class="n">best_fit</span><span class="p">.</span><span class="n">posterior</span><span class="p">[</span><span class="s">"mu"</span><span class="p">].</span><span class="n">mean</span><span class="p">((</span><span class="s">"chain"</span><span class="p">,</span> <span class="s">"draw"</span><span class="p">)).</span><span class="nb">min</span><span class="p">())</span>
<span class="n">plt</span><span class="p">.</span><span class="n">legend</span><span class="p">(</span><span class="n">frameon</span><span class="o">=</span><span class="bp">False</span><span class="p">);</span>
</code></pre></div></div>

<p><img src="/blogimages/orthogonal-polynomial-regression/poly7-extrapolation.png" alt="Poly7 extrapolation" /></p>

<p>Yikes.</p>]]></content><author><name>Tyler James Burch</name><email>burcht11@gmail.com</email></author><category term="Statistics" /><category term="python" /><category term="bayesian" /><category term="bambi" /><category term="statistics" /><category term="regression" /><summary type="html"><![CDATA[A deep dive into what orthogonal polynomials actually do under the hood, contributed to Bambi's examples]]></summary></entry><entry><title type="html">2024 Rewind: Polynomial Regression in Bambi</title><link href="https://tylerjamesburch.com/blog/statistics/polynomial-regression-bambi" rel="alternate" type="text/html" title="2024 Rewind: Polynomial Regression in Bambi" /><published>2026-02-16T00:00:00+00:00</published><updated>2026-02-16T00:00:00+00:00</updated><id>https://tylerjamesburch.com/blog/statistics/polynomial-regression-bambi</id><content type="html" xml:base="https://tylerjamesburch.com/blog/statistics/polynomial-regression-bambi"><![CDATA[<script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"></script>

<p>Back in 2024, I wrote a couple of example notebooks that got merged into the <a href="https://bambinos.github.io/bambi/">Bambi</a> documentation. For those unfamiliar, Bambi is a library for fitting Bayesian regression models using a formula-based interface on top of PyMC (the closest thing in Python to <code class="language-plaintext highlighter-rouge">brms</code>, in my opinion). I realized I never migrated the content here, so I thought it was time to do so.</p>

<p>This post covers polynomial regression. The original notebook lives in the <a href="https://bambinos.github.io/bambi/notebooks/polynomial_regression.html">Bambi docs</a>.</p>

<p>What follows is the content from the notebook, lightly adapted for this blog format.</p>

<hr />

<h1 id="polynomial-regression">Polynomial Regression</h1>

<p>Unlike many other examples shown in Bambi, there aren’t specific polynomial methods or families implemented – most of the interesting behavior for polynomial regression occurs within the formula definition. Regardless, there are some nuances that are useful to be aware of.</p>

<p>This example uses the kinematic equations from classical mechanics as a backdrop. Specifically, an object in motion experiencing constant acceleration can be described by the following:</p>

\[x_f = \frac{1}{2} a t^2 + v_0 t + x_0\]

<p>where \(x_0\) and \(x_f\) are the initial and final locations, \(v_0\) is the initial velocity, and \(a\) is acceleration.</p>

<h2 id="a-falling-ball">A falling ball</h2>

<p>First, we’ll consider a simple falling ball, released from 50 meters. In this situation, \(v_0 = 0\) \(m\)/\(s\), \(x_0 = 50\) \(m\) and \(a = g\), the acceleration due to gravity, \(-9.81\) \(m\)/\(s^2\). So dropping out the \(v_0 t\) component, the equation takes the form:</p>

\[x_f = \frac{1}{2} g t^2 + x_0\]
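
<p>As a quick sanity check on the setup: after \(t = 2\) \(s\), the ball should sit near \(\frac{1}{2}(-9.81)(2)^2 + 50 \approx 30.4\) \(m\), which is where the simulated data below should end up at the right edge.</p>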

<p>We’ll start by simulating data for the first 2 seconds of motion. We will also assume Gaussian measurement error with \(\sigma = 0.3\).</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">warnings</span>

<span class="kn">import</span> <span class="nn">arviz</span> <span class="k">as</span> <span class="n">az</span>
<span class="kn">import</span> <span class="nn">bambi</span> <span class="k">as</span> <span class="n">bmb</span>
<span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="n">pd</span>

<span class="n">SEED</span> <span class="o">=</span> <span class="mi">1234</span>
<span class="n">az</span><span class="p">.</span><span class="n">style</span><span class="p">.</span><span class="n">use</span><span class="p">(</span><span class="s">"arviz-darkgrid"</span><span class="p">)</span>
<span class="n">warnings</span><span class="p">.</span><span class="n">filterwarnings</span><span class="p">(</span><span class="s">"ignore"</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">g</span> <span class="o">=</span> <span class="o">-</span><span class="mf">9.81</span>  <span class="c1"># acceleration due to gravity (m/s^2)
</span><span class="n">t</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">linspace</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">100</span><span class="p">)</span>  <span class="c1"># time in seconds
</span><span class="n">inital_height</span> <span class="o">=</span> <span class="mi">50</span>
<span class="n">x_falling</span> <span class="o">=</span> <span class="mf">0.5</span> <span class="o">*</span> <span class="n">g</span> <span class="o">*</span> <span class="n">t</span><span class="o">**</span><span class="mi">2</span> <span class="o">+</span> <span class="n">inital_height</span>

<span class="n">rng</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">default_rng</span><span class="p">(</span><span class="n">SEED</span><span class="p">)</span>
<span class="n">noise</span> <span class="o">=</span> <span class="n">rng</span><span class="p">.</span><span class="n">normal</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">,</span> <span class="n">x_falling</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span>
<span class="n">x_obs_falling</span> <span class="o">=</span> <span class="n">x_falling</span> <span class="o">+</span> <span class="n">noise</span>
<span class="n">df_falling</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">({</span><span class="s">"t"</span><span class="p">:</span> <span class="n">t</span><span class="p">,</span> <span class="s">"x"</span><span class="p">:</span> <span class="n">x_obs_falling</span><span class="p">})</span>

<span class="n">fig</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">6</span><span class="p">))</span>
<span class="n">ax</span><span class="p">.</span><span class="n">scatter</span><span class="p">(</span><span class="n">t</span><span class="p">,</span> <span class="n">x_obs_falling</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">"Observed Displacement"</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"C0"</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">t</span><span class="p">,</span> <span class="n">x_falling</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">"True Function"</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"C1"</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="nb">set</span><span class="p">(</span><span class="n">xlabel</span><span class="o">=</span><span class="s">"Time (s)"</span><span class="p">,</span> <span class="n">ylabel</span><span class="o">=</span><span class="s">"Displacement (m)"</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">legend</span><span class="p">();</span>
</code></pre></div></div>

<p><img src="/blogimages/polynomial-regression/falling-ball-data.png" alt="Falling ball data" /></p>

<p>Casting the equation \(x_f = \frac{1}{2} g t^2 + x_0\) into a regression context, fitting:</p>

\[x_f = \beta_0 + \beta_1 t^2\]

<p>We let time (\(t\)) be the independent variable and final location (\(x_f\)) be the response/dependent variable. This makes our coefficients map directly onto \(g\) and \(x_0\): the intercept \(\beta_0\) corresponds exactly to \(x_0\), the initial height, and \(\beta_1 = \frac{1}{2} g\), so \(g = 2\beta_1\). Because the regressor is \(t^2\) rather than \(t\), we’re doing <em>polynomial regression</em>. We can put this into Bambi via the following, optionally including the <code class="language-plaintext highlighter-rouge">+ 1</code> to emphasize that we choose to include the intercept.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">model_falling</span> <span class="o">=</span> <span class="n">bmb</span><span class="p">.</span><span class="n">Model</span><span class="p">(</span><span class="s">"x ~ I(t**2) + 1"</span><span class="p">,</span> <span class="n">df_falling</span><span class="p">)</span>
<span class="n">results_falling</span> <span class="o">=</span> <span class="n">model_falling</span><span class="p">.</span><span class="n">fit</span><span class="p">(</span><span class="n">idata_kwargs</span><span class="o">=</span><span class="p">{</span><span class="s">"log_likelihood"</span><span class="p">:</span> <span class="bp">True</span><span class="p">},</span> <span class="n">random_seed</span><span class="o">=</span><span class="n">SEED</span><span class="p">)</span>
</code></pre></div></div>

<p>The term <code class="language-plaintext highlighter-rouge">I(t**2)</code> tells the formula parser to evaluate the expression inside <code class="language-plaintext highlighter-rouge">I()</code> as ordinary Python rather than as formula syntax. To include <em>just the \(t^2\) term</em>, you can express it in any of the following ways:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">I(t**2)</code></li>
  <li><code class="language-plaintext highlighter-rouge">{t**2}</code></li>
  <li>Square the data directly, and pass it as a new column</li>
</ul>

<p>To verify, we’ll fit the other two versions as well.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">model_falling_variation1</span> <span class="o">=</span> <span class="n">bmb</span><span class="p">.</span><span class="n">Model</span><span class="p">(</span>
    <span class="s">"x ~ {t**2} + 1"</span><span class="p">,</span>  <span class="c1"># Using {t**2} syntax
</span>    <span class="n">df_falling</span>
<span class="p">)</span>
<span class="n">results_variation1</span> <span class="o">=</span> <span class="n">model_falling_variation1</span><span class="p">.</span><span class="n">fit</span><span class="p">(</span><span class="n">random_seed</span><span class="o">=</span><span class="n">SEED</span><span class="p">)</span>

<span class="n">model_falling_variation2</span> <span class="o">=</span> <span class="n">bmb</span><span class="p">.</span><span class="n">Model</span><span class="p">(</span>
    <span class="s">"x ~ tsquared + 1"</span><span class="p">,</span>  <span class="c1"># Using data with the t variable squared
</span>    <span class="n">df_falling</span><span class="p">.</span><span class="n">assign</span><span class="p">(</span><span class="n">tsquared</span><span class="o">=</span><span class="n">t</span><span class="o">**</span><span class="mi">2</span><span class="p">)</span>
<span class="p">)</span>
<span class="n">results_variation2</span> <span class="o">=</span> <span class="n">model_falling_variation2</span><span class="p">.</span><span class="n">fit</span><span class="p">(</span><span class="n">random_seed</span><span class="o">=</span><span class="n">SEED</span><span class="p">)</span>

<span class="k">print</span><span class="p">(</span><span class="s">"I{t**2} coefficient: "</span><span class="p">,</span> <span class="nb">round</span><span class="p">(</span><span class="n">results_falling</span><span class="p">.</span><span class="n">posterior</span><span class="p">[</span><span class="s">"I(t ** 2)"</span><span class="p">].</span><span class="n">values</span><span class="p">.</span><span class="n">mean</span><span class="p">(),</span> <span class="mi">4</span><span class="p">))</span>
<span class="k">print</span><span class="p">(</span><span class="s">"{t**2} coefficient: "</span><span class="p">,</span> <span class="nb">round</span><span class="p">(</span><span class="n">results_variation1</span><span class="p">.</span><span class="n">posterior</span><span class="p">[</span><span class="s">"I(t ** 2)"</span><span class="p">].</span><span class="n">values</span><span class="p">.</span><span class="n">mean</span><span class="p">(),</span> <span class="mi">4</span><span class="p">))</span>
<span class="k">print</span><span class="p">(</span><span class="s">"tsquared coefficient: "</span><span class="p">,</span> <span class="nb">round</span><span class="p">(</span><span class="n">results_variation2</span><span class="p">.</span><span class="n">posterior</span><span class="p">[</span><span class="s">"tsquared"</span><span class="p">].</span><span class="n">values</span><span class="p">.</span><span class="n">mean</span><span class="p">(),</span> <span class="mi">4</span><span class="p">))</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>I{t**2} coefficient:  -4.8476
{t**2} coefficient:  -4.8476
tsquared coefficient:  -4.8476
</code></pre></div></div>

<p>Each of these provides identical results, giving roughly \(-4.85\) for \(\beta_1 = g/2\). Doubling it recovers \(g \approx -9.7\) \(m\)/\(s^2\), close to the \(-9.81\) \(m\)/\(s^2\) acceleration that generated the data. Looking at our model summary,</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">az</span><span class="p">.</span><span class="n">summary</span><span class="p">(</span><span class="n">results_falling</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>             mean     sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
sigma       0.336  0.025   0.289    0.381      0.000    0.000    5977.0    2861.0    1.0
Intercept  49.961  0.051  49.870   50.058      0.001    0.001    5997.0    3145.0    1.0
I(t ** 2)  -4.848  0.028  -4.899   -4.799      0.000    0.000    5704.0    2844.0    1.0
</code></pre></div></div>

<p>We see that \(g/2\) (true value \(-4.905\)), the original height of \(x_0 = 50\) \(m\), and the injected noise (\(\sigma \approx 0.34\) versus the true \(0.3\)) are all recovered to good precision.</p>
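
<p>As a quick visual check (not in the original notebook), we can plot each marginal posterior against the true generating value:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># reference lines at the true values: x0 = 50, g/2 = -4.905, sigma = 0.3
az.plot_posterior(
    results_falling,
    var_names=["Intercept", "I(t ** 2)", "sigma"],
    ref_val=[50, g / 2, 0.3],
)
</code></pre></div></div>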

<p>We can then use the model to answer some questions. For example, when will the ball land? That corresponds to \(x_f = 0\):</p>

\[0 = \frac{1}{2} g t^2 + x_0\]

\[t = \sqrt{-2 x_0 / g}\]

<p>(Since \(g\) is negative, the quantity under the square root is positive; the code below folds the sign flip into <code class="language-plaintext highlighter-rouge">calculated_g</code>.)</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">calculated_x0</span> <span class="o">=</span> <span class="n">results_falling</span><span class="p">.</span><span class="n">posterior</span><span class="p">[</span><span class="s">"Intercept"</span><span class="p">].</span><span class="n">values</span><span class="p">.</span><span class="n">mean</span><span class="p">()</span>
<span class="n">calculated_g</span> <span class="o">=</span> <span class="o">-</span><span class="mi">2</span> <span class="o">*</span> <span class="n">results_falling</span><span class="p">.</span><span class="n">posterior</span><span class="p">[</span><span class="s">"I(t ** 2)"</span><span class="p">].</span><span class="n">values</span><span class="p">.</span><span class="n">mean</span><span class="p">()</span>
<span class="n">calculated_land</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">sqrt</span><span class="p">(</span><span class="mi">2</span> <span class="o">*</span> <span class="n">calculated_x0</span> <span class="o">/</span> <span class="n">calculated_g</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"The ball will land at </span><span class="si">{</span><span class="nb">round</span><span class="p">(</span><span class="n">calculated_land</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span><span class="si">}</span><span class="s"> seconds"</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>The ball will land at 3.21 seconds
</code></pre></div></div>

<p>Or, to propagate the parameter uncertainty, we can use the full posterior,</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">calculated_x0_posterior</span> <span class="o">=</span> <span class="n">results_falling</span><span class="p">.</span><span class="n">posterior</span><span class="p">[</span><span class="s">"Intercept"</span><span class="p">].</span><span class="n">values</span>
<span class="n">calculated_g_posterior</span> <span class="o">=</span> <span class="o">-</span><span class="mi">2</span> <span class="o">*</span> <span class="n">results_falling</span><span class="p">.</span><span class="n">posterior</span><span class="p">[</span><span class="s">"I(t ** 2)"</span><span class="p">].</span><span class="n">values</span>
<span class="n">calculated_land_posterior</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">sqrt</span><span class="p">(</span><span class="mi">2</span> <span class="o">*</span> <span class="n">calculated_x0_posterior</span> <span class="o">/</span> <span class="n">calculated_g_posterior</span><span class="p">)</span>
<span class="n">lower_est</span> <span class="o">=</span> <span class="nb">round</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">quantile</span><span class="p">(</span><span class="n">calculated_land_posterior</span><span class="p">,</span> <span class="mf">0.025</span><span class="p">),</span> <span class="mi">2</span><span class="p">)</span>
<span class="n">upper_est</span> <span class="o">=</span> <span class="nb">round</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">quantile</span><span class="p">(</span><span class="n">calculated_land_posterior</span><span class="p">,</span> <span class="mf">0.975</span><span class="p">),</span> <span class="mi">2</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"The ball landing will be measured between </span><span class="si">{</span><span class="n">lower_est</span><span class="si">}</span><span class="s"> and </span><span class="si">{</span><span class="n">upper_est</span><span class="si">}</span><span class="s"> seconds"</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>The ball landing will be measured between 3.2 and 3.23 seconds
</code></pre></div></div>
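<p>As an aside, <code class="language-plaintext highlighter-rouge">np.quantile</code> gives an equal-tailed interval; if we preferred the highest-density interval that ArviZ reports elsewhere in this post, a one-liner on the same posterior array would do:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># 95% highest-density interval of the landing-time posterior (shape: chains x draws)
az.hdi(calculated_land_posterior, hdi_prob=0.95)
</code></pre></div></div>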

<h2 id="projectile-motion">Projectile Motion</h2>

<p>Next, instead of a ball strictly falling, imagine one thrown straight upward. In this case, we add the initial velocity term back into the equation.</p>

\[x_f = \frac{1}{2} g t^2 + v_0 t + x_0\]

<p>We’ll simulate a ball tossed upward at 7 m/s, starting 1.5 meters above ground level, keeping only the observations at or above ground level.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">v0</span> <span class="o">=</span> <span class="mi">7</span>
<span class="n">x0</span> <span class="o">=</span> <span class="mf">1.5</span>
<span class="n">x_projectile</span> <span class="o">=</span> <span class="p">(</span><span class="mi">1</span><span class="o">/</span><span class="mi">2</span><span class="p">)</span> <span class="o">*</span> <span class="n">g</span> <span class="o">*</span> <span class="n">t</span><span class="o">**</span><span class="mi">2</span> <span class="o">+</span> <span class="n">v0</span> <span class="o">*</span> <span class="n">t</span> <span class="o">+</span> <span class="n">x0</span>
<span class="n">noise</span> <span class="o">=</span> <span class="n">rng</span><span class="p">.</span><span class="n">normal</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mf">0.2</span><span class="p">,</span> <span class="n">x_projectile</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span>
<span class="n">x_obs_projectile</span> <span class="o">=</span> <span class="n">x_projectile</span> <span class="o">+</span> <span class="n">noise</span>
<span class="n">df_projectile</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">({</span><span class="s">"t"</span><span class="p">:</span> <span class="n">t</span><span class="p">,</span> <span class="s">"tsq"</span><span class="p">:</span> <span class="n">t</span><span class="o">**</span><span class="mi">2</span><span class="p">,</span> <span class="s">"x"</span><span class="p">:</span> <span class="n">x_obs_projectile</span><span class="p">,</span> <span class="s">"x_true"</span><span class="p">:</span> <span class="n">x_projectile</span><span class="p">})</span>
<span class="n">df_projectile</span> <span class="o">=</span> <span class="n">df_projectile</span><span class="p">[</span><span class="n">df_projectile</span><span class="p">[</span><span class="s">"x"</span><span class="p">]</span> <span class="o">&gt;=</span> <span class="mi">0</span><span class="p">]</span>

<span class="n">fig</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">6</span><span class="p">))</span>
<span class="n">ax</span><span class="p">.</span><span class="n">scatter</span><span class="p">(</span><span class="n">df_projectile</span><span class="p">.</span><span class="n">t</span><span class="p">,</span> <span class="n">df_projectile</span><span class="p">.</span><span class="n">x</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">"Observed Displacement"</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"C0"</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">df_projectile</span><span class="p">.</span><span class="n">t</span><span class="p">,</span> <span class="n">df_projectile</span><span class="p">.</span><span class="n">x_true</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">'True Function'</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"C1"</span><span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="nb">set</span><span class="p">(</span><span class="n">xlabel</span><span class="o">=</span><span class="s">"Time (s)"</span><span class="p">,</span> <span class="n">ylabel</span><span class="o">=</span><span class="s">"Displacement (m)"</span><span class="p">,</span> <span class="n">ylim</span><span class="o">=</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="bp">None</span><span class="p">))</span>
<span class="n">ax</span><span class="p">.</span><span class="n">legend</span><span class="p">();</span>
</code></pre></div></div>

<p><img src="/blogimages/polynomial-regression/projectile-motion-data.png" alt="Projectile motion data" /></p>

<p>Modeling this using Bambi, we must include the linear term on time to capture the initial velocity. We’ll do the following regression,</p>

\[x_f = \beta_0 + \beta_1 t + \beta_2 t^2\]

<p>which then maps the solved coefficients to the following: \(\beta_0 = x_0\), \(\beta_1 = v_0\), and \(\beta_2 = \frac{g}{2}\).</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">model_projectile_all_terms</span> <span class="o">=</span> <span class="n">bmb</span><span class="p">.</span><span class="n">Model</span><span class="p">(</span><span class="s">"x ~ I(t**2) + t + 1"</span><span class="p">,</span> <span class="n">df_projectile</span><span class="p">)</span>
<span class="n">fit_projectile_all_terms</span> <span class="o">=</span> <span class="n">model_projectile_all_terms</span><span class="p">.</span><span class="n">fit</span><span class="p">(</span>
    <span class="n">idata_kwargs</span><span class="o">=</span><span class="p">{</span><span class="s">"log_likelihood"</span><span class="p">:</span> <span class="bp">True</span><span class="p">},</span> <span class="n">target_accept</span><span class="o">=</span><span class="mf">0.9</span><span class="p">,</span> <span class="n">random_seed</span><span class="o">=</span><span class="n">SEED</span>
<span class="p">)</span>
</code></pre></div></div>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">az</span><span class="p">.</span><span class="n">summary</span><span class="p">(</span><span class="n">fit_projectile_all_terms</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>            mean     sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
sigma      0.202  0.017   0.171    0.234      0.000    0.000    2723.0    2328.0    1.0
Intercept  1.561  0.066   1.441    1.687      0.001    0.001    2058.0    2550.0    1.0
I(t ** 2) -4.867  0.114  -5.079   -4.649      0.003    0.002    1667.0    1966.0    1.0
t          6.909  0.189   6.553    7.262      0.005    0.003    1694.0    2039.0    1.0
</code></pre></div></div>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">hdi</span> <span class="o">=</span> <span class="n">az</span><span class="p">.</span><span class="n">hdi</span><span class="p">(</span><span class="n">fit_projectile_all_terms</span><span class="p">.</span><span class="n">posterior</span><span class="p">,</span> <span class="n">hdi_prob</span><span class="o">=</span><span class="mf">0.95</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"Initial height: </span><span class="si">{</span><span class="n">hdi</span><span class="p">[</span><span class="s">'Intercept'</span><span class="p">].</span><span class="n">sel</span><span class="p">(</span><span class="n">hdi</span><span class="o">=</span><span class="s">'lower'</span><span class="p">)</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="s"> to "</span>
      <span class="sa">f</span><span class="s">"</span><span class="si">{</span><span class="n">hdi</span><span class="p">[</span><span class="s">'Intercept'</span><span class="p">].</span><span class="n">sel</span><span class="p">(</span><span class="n">hdi</span><span class="o">=</span><span class="s">'higher'</span><span class="p">)</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="s"> meters (True: </span><span class="si">{</span><span class="n">x0</span><span class="si">}</span><span class="s"> m)"</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"Initial velocity: </span><span class="si">{</span><span class="n">hdi</span><span class="p">[</span><span class="s">'t'</span><span class="p">].</span><span class="n">sel</span><span class="p">(</span><span class="n">hdi</span><span class="o">=</span><span class="s">'lower'</span><span class="p">)</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="s"> to "</span>
      <span class="sa">f</span><span class="s">"</span><span class="si">{</span><span class="n">hdi</span><span class="p">[</span><span class="s">'t'</span><span class="p">].</span><span class="n">sel</span><span class="p">(</span><span class="n">hdi</span><span class="o">=</span><span class="s">'higher'</span><span class="p">)</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="s"> meters per second (True: </span><span class="si">{</span><span class="n">v0</span><span class="si">}</span><span class="s"> m/s)"</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"Acceleration: </span><span class="si">{</span><span class="mi">2</span><span class="o">*</span><span class="n">hdi</span><span class="p">[</span><span class="s">'I(t ** 2)'</span><span class="p">].</span><span class="n">sel</span><span class="p">(</span><span class="n">hdi</span><span class="o">=</span><span class="s">'lower'</span><span class="p">)</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="s"> to "</span>
      <span class="sa">f</span><span class="s">"</span><span class="si">{</span><span class="mi">2</span><span class="o">*</span><span class="n">hdi</span><span class="p">[</span><span class="s">'I(t ** 2)'</span><span class="p">].</span><span class="n">sel</span><span class="p">(</span><span class="n">hdi</span><span class="o">=</span><span class="s">'higher'</span><span class="p">)</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="s"> meters per second squared (True: </span><span class="si">{</span><span class="n">g</span><span class="si">}</span><span class="s"> m/s^2)"</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Initial height: 1.43 to 1.69 meters (True: 1.5 m)
Initial velocity: 6.54 to 7.28 meters per second (True: 7 m/s)
Acceleration: -10.16 to -9.27 meters per second squared (True: -9.81 m/s^2)
</code></pre></div></div>

<p>Once again, we are able to recover all of our input parameters.</p>

<p>Instead of writing out each term directly, you can include all polynomial terms up to a given degree with the <code class="language-plaintext highlighter-rouge">poly</code> keyword. We don’t use it in this notebook for two reasons. First, by default <code class="language-plaintext highlighter-rouge">poly</code> orthogonalizes the terms, which makes it ill-suited to this example: our coefficients have physical meaning, and we want to interpret them directly. Orthogonalization can be disabled with the <code class="language-plaintext highlighter-rouge">raw</code> argument, but, second, later examples apply different effects to the \(t\) term versus the \(t^2\) term, which is not easy to express with <code class="language-plaintext highlighter-rouge">poly</code>. Still, just to show the results match when using <code class="language-plaintext highlighter-rouge">raw=True</code>, we’ll fit the same model as above.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">model_poly_raw</span> <span class="o">=</span> <span class="n">bmb</span><span class="p">.</span><span class="n">Model</span><span class="p">(</span><span class="s">"x ~ poly(t, 2, raw=True)"</span><span class="p">,</span> <span class="n">df_projectile</span><span class="p">)</span>
<span class="n">fit_poly_raw</span> <span class="o">=</span> <span class="n">model_poly_raw</span><span class="p">.</span><span class="n">fit</span><span class="p">(</span><span class="n">idata_kwargs</span><span class="o">=</span><span class="p">{</span><span class="s">"log_likelihood"</span><span class="p">:</span> <span class="bp">True</span><span class="p">},</span> <span class="n">random_seed</span><span class="o">=</span><span class="n">SEED</span><span class="p">)</span>
<span class="n">az</span><span class="p">.</span><span class="n">summary</span><span class="p">(</span><span class="n">fit_poly_raw</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>                          mean     sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
sigma                    0.201  0.017   0.172    0.234      0.000    0.000    3066.0    2205.0    1.0
Intercept                1.561  0.067   1.437    1.682      0.001    0.001    2535.0    2154.0    1.0
poly(t, 2, raw=True)[0]  6.911  0.196   6.556    7.280      0.004    0.004    2092.0    2075.0    1.0
poly(t, 2, raw=True)[1] -4.870  0.118  -5.095   -4.653      0.003    0.002    2059.0    2166.0    1.0
</code></pre></div></div>

<p>We see the same results, where <code class="language-plaintext highlighter-rouge">poly(t, 2, raw=True)[0]</code> corresponds to the coefficient on \(t\) (\(v_0\) in our example), and <code class="language-plaintext highlighter-rouge">poly(t, 2, raw=True)[1]</code> is the coefficient on \(t^2\) (\(\frac{g}{2}\)).</p>
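<p>To see concretely why the default (orthogonalized) <code class="language-plaintext highlighter-rouge">poly</code> is a poor fit for this example, here is a rough numpy illustration using a QR-based orthogonalization (a sketch of the idea, not necessarily the exact construction Bambi uses): the raw \(t\) and \(t^2\) columns are strongly correlated, while the orthogonalized columns are not, at the cost that their coefficients no longer map cleanly onto \(v_0\) and \(g/2\).</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np

t_grid = np.linspace(0, 3, 50)
X_raw = np.column_stack([t_grid, t_grid**2])   # raw polynomial design matrix
Q, _ = np.linalg.qr(X_raw - X_raw.mean(0))     # center, then orthogonalize via QR
print(np.corrcoef(X_raw, rowvar=False)[0, 1])  # near 1: raw columns are collinear
print(np.corrcoef(Q, rowvar=False)[0, 1])      # near 0: orthogonalized columns
</code></pre></div></div>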

<h2 id="measuring-gravity-on-a-new-planet">Measuring gravity on a new planet</h2>

<p>In the next example, you’ve been recruited to join the space program as a research scientist, looking to directly measure the gravity on a new planet, PlanetX. You don’t know anything about this planet or its safety, so you have time for one, and only one, throw of a ball. However, you’ve perfected your throwing mechanics, and can achieve the same initial velocity wherever you are. To baseline, you make a toss on planet Earth, warm up your spacecraft and stop at Mars to make a toss, then travel far away, and make a toss on PlanetX.</p>

<p>First we simulate data for this experiment.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">simulate_throw</span><span class="p">(</span><span class="n">v0</span><span class="p">,</span> <span class="n">g</span><span class="p">,</span> <span class="n">noise_std</span><span class="p">,</span> <span class="n">time_step</span><span class="o">=</span><span class="mf">0.25</span><span class="p">,</span> <span class="n">max_time</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">seed</span><span class="o">=</span><span class="mi">1234</span><span class="p">):</span>
    <span class="n">rng</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">default_rng</span><span class="p">(</span><span class="n">seed</span><span class="p">)</span>
    <span class="n">times</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">arange</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">max_time</span><span class="p">,</span> <span class="n">time_step</span><span class="p">)</span>
    <span class="n">heights</span> <span class="o">=</span> <span class="n">v0</span> <span class="o">*</span> <span class="n">times</span> <span class="o">-</span> <span class="mf">0.5</span> <span class="o">*</span> <span class="n">g</span> <span class="o">*</span> <span class="n">times</span><span class="o">**</span><span class="mi">2</span>
    <span class="n">heights_with_noise</span> <span class="o">=</span> <span class="n">heights</span> <span class="o">+</span> <span class="n">rng</span><span class="p">.</span><span class="n">normal</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">noise_std</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">times</span><span class="p">))</span>
    <span class="n">valid_indices</span> <span class="o">=</span> <span class="n">heights_with_noise</span> <span class="o">&gt;=</span> <span class="mi">0</span>
    <span class="k">return</span> <span class="n">times</span><span class="p">[</span><span class="n">valid_indices</span><span class="p">],</span> <span class="n">heights_with_noise</span><span class="p">[</span><span class="n">valid_indices</span><span class="p">],</span> <span class="n">heights</span><span class="p">[</span><span class="n">valid_indices</span><span class="p">]</span>

<span class="c1"># Define the parameters
</span><span class="n">v0</span> <span class="o">=</span> <span class="mi">20</span>  <span class="c1"># Initial velocity (m/s)
</span><span class="n">g_planets</span> <span class="o">=</span> <span class="p">{</span><span class="s">"Earth"</span><span class="p">:</span> <span class="mf">9.81</span><span class="p">,</span> <span class="s">"Mars"</span><span class="p">:</span> <span class="mf">3.72</span><span class="p">,</span> <span class="s">"PlanetX"</span><span class="p">:</span> <span class="mf">6.0</span><span class="p">}</span>
<span class="n">noise_std</span> <span class="o">=</span> <span class="mf">1.5</span>

<span class="c1"># Generate data
</span><span class="n">records</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">planet</span><span class="p">,</span> <span class="n">g</span> <span class="ow">in</span> <span class="n">g_planets</span><span class="p">.</span><span class="n">items</span><span class="p">():</span>
    <span class="n">times</span><span class="p">,</span> <span class="n">heights</span><span class="p">,</span> <span class="n">heights_true</span> <span class="o">=</span> <span class="n">simulate_throw</span><span class="p">(</span><span class="n">v0</span><span class="p">,</span> <span class="n">g</span><span class="p">,</span> <span class="n">noise_std</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">time</span><span class="p">,</span> <span class="n">height</span><span class="p">,</span> <span class="n">height_true</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">times</span><span class="p">,</span> <span class="n">heights</span><span class="p">,</span> <span class="n">heights_true</span><span class="p">):</span>
        <span class="n">records</span><span class="p">.</span><span class="n">append</span><span class="p">([</span><span class="n">planet</span><span class="p">,</span> <span class="n">time</span><span class="p">,</span> <span class="n">height</span><span class="p">,</span> <span class="n">height_true</span><span class="p">])</span>

<span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">DataFrame</span><span class="p">(</span><span class="n">records</span><span class="p">,</span> <span class="n">columns</span><span class="o">=</span><span class="p">[</span><span class="s">"Planet"</span><span class="p">,</span> <span class="s">"Time"</span><span class="p">,</span> <span class="s">"Height"</span><span class="p">,</span> <span class="s">"Height_true"</span><span class="p">])</span>
<span class="n">df</span><span class="p">[</span><span class="s">"Planet"</span><span class="p">]</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="s">"Planet"</span><span class="p">].</span><span class="n">astype</span><span class="p">(</span><span class="s">"category"</span><span class="p">)</span>
</code></pre></div></div>

<p>And drawing those trajectories,</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">fig</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">6</span><span class="p">))</span>

<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">planet</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">df</span><span class="p">[</span><span class="s">"Planet"</span><span class="p">].</span><span class="n">cat</span><span class="p">.</span><span class="n">categories</span><span class="p">):</span>
    <span class="n">subset</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="n">df</span><span class="p">[</span><span class="s">"Planet"</span><span class="p">]</span> <span class="o">==</span> <span class="n">planet</span><span class="p">]</span>
    <span class="n">ax</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">subset</span><span class="p">[</span><span class="s">"Time"</span><span class="p">],</span> <span class="n">subset</span><span class="p">[</span><span class="s">"Height_true"</span><span class="p">],</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.7</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="sa">f</span><span class="s">"C</span><span class="si">{</span><span class="n">i</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
    <span class="n">ax</span><span class="p">.</span><span class="n">scatter</span><span class="p">(</span><span class="n">subset</span><span class="p">[</span><span class="s">"Time"</span><span class="p">],</span> <span class="n">subset</span><span class="p">[</span><span class="s">"Height"</span><span class="p">],</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.7</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="n">planet</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="sa">f</span><span class="s">"C</span><span class="si">{</span><span class="n">i</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>

<span class="n">ax</span><span class="p">.</span><span class="nb">set</span><span class="p">(</span>
    <span class="n">xlabel</span><span class="o">=</span><span class="s">"Time (seconds)"</span><span class="p">,</span> <span class="n">ylabel</span><span class="o">=</span><span class="s">"Height (meters)"</span><span class="p">,</span>
    <span class="n">title</span><span class="o">=</span><span class="s">"Trajectory Comparison"</span><span class="p">,</span> <span class="n">ylim</span><span class="o">=</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="bp">None</span><span class="p">)</span>
<span class="p">)</span>
<span class="n">ax</span><span class="p">.</span><span class="n">legend</span><span class="p">(</span><span class="n">title</span><span class="o">=</span><span class="s">"Planet"</span><span class="p">);</span>
</code></pre></div></div>

<p><img src="/blogimages/polynomial-regression/planet-trajectories.png" alt="Planet trajectories" /></p>

<p>We now aim to model this data. We again use the following equation (calling displacement \(h\) for height):</p>

\[h = \frac{1}{2} g_{p} t^2 + v_{0} t\]

<p>where \(g_p\) now has a subscript to indicate the planet that we’re throwing from.</p>

<p>In Bambi, we’ll do the following:</p>

<p><code class="language-plaintext highlighter-rouge">Height ~ I(Time**2):Planet + Time + 0</code></p>

<p>which corresponds one-to-one with the above formula. The intercept is eliminated since the throw starts from \(h=0\).</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">planet_model</span> <span class="o">=</span> <span class="n">bmb</span><span class="p">.</span><span class="n">Model</span><span class="p">(</span><span class="s">"Height ~ I(Time**2):Planet + Time + 0"</span><span class="p">,</span> <span class="n">df</span><span class="p">)</span>
<span class="n">planet_model</span><span class="p">.</span><span class="n">build</span><span class="p">()</span>
<span class="n">planet_fit</span> <span class="o">=</span> <span class="n">planet_model</span><span class="p">.</span><span class="n">fit</span><span class="p">(</span><span class="n">chains</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span> <span class="n">idata_kwargs</span><span class="o">=</span><span class="p">{</span><span class="s">"log_likelihood"</span><span class="p">:</span> <span class="bp">True</span><span class="p">},</span> <span class="n">random_seed</span><span class="o">=</span><span class="n">SEED</span><span class="p">)</span>
</code></pre></div></div>

<p>The model has been fit. Let’s look at how well we recovered the simulated parameters.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">az</span><span class="p">.</span><span class="n">summary</span><span class="p">(</span><span class="n">planet_fit</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>                                mean     sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
sigma                          1.759  0.147   1.498    2.044      0.003    0.003    2054.0    1938.0    1.0
I(Time ** 2):Planet[Earth]    -4.998  0.075  -5.145   -4.865      0.002    0.001    1833.0    2431.0    1.0
I(Time ** 2):Planet[Mars]     -1.884  0.022  -1.925   -1.844      0.001    0.000    1428.0    1763.0    1.0
I(Time ** 2):Planet[PlanetX]  -3.017  0.036  -3.087   -2.953      0.001    0.001    1519.0    1729.0    1.0
Time                          20.128  0.166  19.827   20.449      0.004    0.003    1393.0    1714.0    1.0
</code></pre></div></div>

<p>Converting the coefficients back to physical values of \(g\),</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">hdi</span> <span class="o">=</span> <span class="n">az</span><span class="p">.</span><span class="n">hdi</span><span class="p">(</span><span class="n">planet_fit</span><span class="p">.</span><span class="n">posterior</span><span class="p">,</span> <span class="n">hdi_prob</span><span class="o">=</span><span class="mf">0.95</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"g for Earth: </span><span class="si">{</span><span class="mi">2</span><span class="o">*</span><span class="n">hdi</span><span class="p">[</span><span class="s">'I(Time ** 2)</span><span class="si">:</span><span class="n">Planet</span><span class="s">'].sel(</span><span class="si">{</span><span class="s">'I(Time ** 2)</span><span class="si">:</span><span class="n">Planet_dim</span><span class="s">'</span><span class="si">:</span><span class="s">'Earth'</span><span class="p">,</span> <span class="s">'hdi'</span><span class="si">:</span><span class="s">'lower'</span><span class="si">}</span><span class="p">)</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="s"> "</span>
      <span class="sa">f</span><span class="s">"to </span><span class="si">{</span><span class="mi">2</span><span class="o">*</span><span class="n">hdi</span><span class="p">[</span><span class="s">'I(Time ** 2)</span><span class="si">:</span><span class="n">Planet</span><span class="s">'].sel(</span><span class="si">{</span><span class="s">'I(Time ** 2)</span><span class="si">:</span><span class="n">Planet_dim</span><span class="s">'</span><span class="si">:</span><span class="s">'Earth'</span><span class="p">,</span> <span class="s">'hdi'</span><span class="si">:</span><span class="s">'higher'</span><span class="si">}</span><span class="p">)</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="s"> "</span>
      <span class="sa">f</span><span class="s">"meters (True: -9.81 m)"</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"g for Mars: </span><span class="si">{</span><span class="mi">2</span><span class="o">*</span><span class="n">hdi</span><span class="p">[</span><span class="s">'I(Time ** 2)</span><span class="si">:</span><span class="n">Planet</span><span class="s">'].sel(</span><span class="si">{</span><span class="s">'I(Time ** 2)</span><span class="si">:</span><span class="n">Planet_dim</span><span class="s">'</span><span class="si">:</span><span class="s">'Mars'</span><span class="p">,</span> <span class="s">'hdi'</span><span class="si">:</span><span class="s">'lower'</span><span class="si">}</span><span class="p">)</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="s"> "</span>
      <span class="sa">f</span><span class="s">"to </span><span class="si">{</span><span class="mi">2</span><span class="o">*</span><span class="n">hdi</span><span class="p">[</span><span class="s">'I(Time ** 2)</span><span class="si">:</span><span class="n">Planet</span><span class="s">'].sel(</span><span class="si">{</span><span class="s">'I(Time ** 2)</span><span class="si">:</span><span class="n">Planet_dim</span><span class="s">'</span><span class="si">:</span><span class="s">'Mars'</span><span class="p">,</span> <span class="s">'hdi'</span><span class="si">:</span><span class="s">'higher'</span><span class="si">}</span><span class="p">)</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="s"> "</span>
      <span class="sa">f</span><span class="s">"meters (True: -3.72 m)"</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"g for PlanetX: </span><span class="si">{</span><span class="mi">2</span><span class="o">*</span><span class="n">hdi</span><span class="p">[</span><span class="s">'I(Time ** 2)</span><span class="si">:</span><span class="n">Planet</span><span class="s">'].sel(</span><span class="si">{</span><span class="s">'I(Time ** 2)</span><span class="si">:</span><span class="n">Planet_dim</span><span class="s">'</span><span class="si">:</span><span class="s">'PlanetX'</span><span class="p">,</span> <span class="s">'hdi'</span><span class="si">:</span><span class="s">'lower'</span><span class="si">}</span><span class="p">)</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="s"> "</span>
      <span class="sa">f</span><span class="s">"to </span><span class="si">{</span><span class="mi">2</span><span class="o">*</span><span class="n">hdi</span><span class="p">[</span><span class="s">'I(Time ** 2)</span><span class="si">:</span><span class="n">Planet</span><span class="s">'].sel(</span><span class="si">{</span><span class="s">'I(Time ** 2)</span><span class="si">:</span><span class="n">Planet_dim</span><span class="s">'</span><span class="si">:</span><span class="s">'PlanetX'</span><span class="p">,</span> <span class="s">'hdi'</span><span class="si">:</span><span class="s">'higher'</span><span class="si">}</span><span class="p">)</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="s"> "</span>
      <span class="sa">f</span><span class="s">"meters (True: -6.0 m)"</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"Initial velocity: </span><span class="si">{</span><span class="n">hdi</span><span class="p">[</span><span class="s">'Time'</span><span class="p">].</span><span class="n">sel</span><span class="p">(</span><span class="n">hdi</span><span class="o">=</span><span class="s">'lower'</span><span class="p">)</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="s"> to </span><span class="si">{</span><span class="n">hdi</span><span class="p">[</span><span class="s">'Time'</span><span class="p">].</span><span class="n">sel</span><span class="p">(</span><span class="n">hdi</span><span class="o">=</span><span class="s">'higher'</span><span class="p">)</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="s"> "</span>
      <span class="sa">f</span><span class="s">"meters per second (True: 20 m/s)"</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>g for Earth: -10.29 to -9.71 meters (True: -9.81 m)
g for Mars: -3.85 to -3.68 meters (True: -3.72 m)
g for PlanetX: -6.18 to -5.90 meters (True: -6.0 m)
Initial velocity: 19.80 to 20.45 meters per second (True: 20 m/s)
</code></pre></div></div>

<p>We can see that we’re pretty close to recovering most of the parameters, but the fit isn’t great. Plotting the posteriors for \(g\) against the true values,</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">earth_posterior</span> <span class="o">=</span> <span class="o">-</span><span class="mi">2</span> <span class="o">*</span> <span class="n">planet_fit</span><span class="p">.</span><span class="n">posterior</span><span class="p">[</span><span class="s">"I(Time ** 2):Planet"</span><span class="p">].</span><span class="n">sel</span><span class="p">(</span>
    <span class="p">{</span><span class="s">"I(Time ** 2):Planet_dim"</span><span class="p">:</span> <span class="s">"Earth"</span><span class="p">})</span>
<span class="n">planetx_posterior</span> <span class="o">=</span> <span class="o">-</span><span class="mi">2</span> <span class="o">*</span> <span class="n">planet_fit</span><span class="p">.</span><span class="n">posterior</span><span class="p">[</span><span class="s">"I(Time ** 2):Planet"</span><span class="p">].</span><span class="n">sel</span><span class="p">(</span>
    <span class="p">{</span><span class="s">"I(Time ** 2):Planet_dim"</span><span class="p">:</span> <span class="s">"PlanetX"</span><span class="p">})</span>
<span class="n">mars_posterior</span> <span class="o">=</span> <span class="o">-</span><span class="mi">2</span> <span class="o">*</span> <span class="n">planet_fit</span><span class="p">.</span><span class="n">posterior</span><span class="p">[</span><span class="s">"I(Time ** 2):Planet"</span><span class="p">].</span><span class="n">sel</span><span class="p">(</span>
    <span class="p">{</span><span class="s">"I(Time ** 2):Planet_dim"</span><span class="p">:</span> <span class="s">"Mars"</span><span class="p">})</span>

<span class="n">fig</span><span class="p">,</span> <span class="n">axs</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">subplots</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">12</span><span class="p">,</span> <span class="mi">6</span><span class="p">))</span>
<span class="n">az</span><span class="p">.</span><span class="n">plot_posterior</span><span class="p">(</span><span class="n">earth_posterior</span><span class="p">,</span> <span class="n">ref_val</span><span class="o">=</span><span class="mf">9.81</span><span class="p">,</span> <span class="n">ax</span><span class="o">=</span><span class="n">axs</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
<span class="n">axs</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">set_title</span><span class="p">(</span><span class="s">"Posterior $g$ on Earth"</span><span class="p">)</span>
<span class="n">az</span><span class="p">.</span><span class="n">plot_posterior</span><span class="p">(</span><span class="n">mars_posterior</span><span class="p">,</span> <span class="n">ref_val</span><span class="o">=</span><span class="mf">3.72</span><span class="p">,</span> <span class="n">ax</span><span class="o">=</span><span class="n">axs</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>
<span class="n">axs</span><span class="p">[</span><span class="mi">1</span><span class="p">].</span><span class="n">set_title</span><span class="p">(</span><span class="s">"Posterior $g$ on Mars"</span><span class="p">)</span>
<span class="n">az</span><span class="p">.</span><span class="n">plot_posterior</span><span class="p">(</span><span class="n">planetx_posterior</span><span class="p">,</span> <span class="n">ref_val</span><span class="o">=</span><span class="mf">6.0</span><span class="p">,</span> <span class="n">ax</span><span class="o">=</span><span class="n">axs</span><span class="p">[</span><span class="mi">2</span><span class="p">])</span>
<span class="n">axs</span><span class="p">[</span><span class="mi">2</span><span class="p">].</span><span class="n">set_title</span><span class="p">(</span><span class="s">"Posterior $g$ on PlanetX"</span><span class="p">);</span>
</code></pre></div></div>

<p><img src="/blogimages/polynomial-regression/gravity-posteriors-no-prior.png" alt="Gravity posteriors without prior" /></p>

<p>The fit seems to work, more or less, but certainly could be improved.</p>

<h3 id="adding-a-prior">Adding a prior</h3>

<p>But, we can do better! We have a <a href="https://en.wikipedia.org/wiki/Gravity_of_Earth">very good idea of the acceleration due to gravity on Earth</a> and <a href="https://en.wikipedia.org/wiki/Gravity_of_Mars">Mars</a>, so why not use that information? From an experimental standpoint, we can treat those throws as calibration measurements, giving us information on the resolution of our detector and our throwing apparatus. With informative priors constraining the Earth and Mars gravity parameters, the model can more precisely estimate the unknown PlanetX gravity, since less uncertainty propagates from the calibration planets.</p>

<p>For Earth, \(g\) ranges from about 9.78 \(m\)/\(s^2\) at the Equator to 9.83 \(m\)/\(s^2\) at the Poles. So we can add a very strong prior,</p>

\[g_{\text{Earth}} \sim \text{Normal}(-9.81, 0.025)\]

<p>For Mars, we know the mean value is about 3.72 \(m\)/\(s^2\). There’s less information on local variation readily available by a cursory search, <em>however</em> we know that the radius of Mars is about half that of Earth, so \(\sigma = \frac{0.025}{2} = 0.0125\) might make sense, but to be conservative we’ll round that up to \(\sigma = 0.02\).</p>

\[g_{\text{Mars}} \sim \text{Normal}(-3.72, 0.02)\]

<p>For PlanetX, we must use a very loose prior. We know the ball took longer to fall than on Earth, but not as long as on Mars, so we can split the difference, then set a very wide \(\sigma\) value.</p>

\[g_{\text{PlanetX}} \sim \text{Normal}(\frac{-9.81 - 3.72}{2}, 3) = \text{Normal}(-6.77, 3)\]

<p>Since these correspond to \(g/2\), we’ll divide all values by 2 when putting them into Bambi. Additionally, we know the balls landed eventually, so \(g\) <em>must be</em> negative. We’ll truncate the upper limit of the distribution at 0.</p>

<p>Now, for defining this in Bambi, the term of interest is <code class="language-plaintext highlighter-rouge">I(Time ** 2):Planet</code>. Often you set one prior that applies to all groups; however, to set each group individually, you can pass a list to the <code class="language-plaintext highlighter-rouge">bmb.Prior</code> definition. <a href="https://github.com/bambinos/bambi/issues/778">The broadcasting rules from PyMC apply here</a>, so it could equivalently take a numpy array. Note that the priors are matched to groups alphabetically by group name.</p>
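<p>Because the list is matched positionally to the alphabetically-sorted category levels, it’s worth a quick sanity check that the order is what we expect:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Expect: Index(['Earth', 'Mars', 'PlanetX'], dtype='object')
print(df["Planet"].cat.categories)
</code></pre></div></div>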

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">priors</span> <span class="o">=</span> <span class="p">{</span>
    <span class="s">"I(Time ** 2):Planet"</span><span class="p">:</span> <span class="n">bmb</span><span class="p">.</span><span class="n">Prior</span><span class="p">(</span>
        <span class="s">"TruncatedNormal"</span><span class="p">,</span>
        <span class="n">mu</span><span class="o">=</span><span class="p">[</span>
            <span class="o">-</span><span class="mf">9.81</span><span class="o">/</span><span class="mi">2</span><span class="p">,</span>  <span class="c1"># Earth
</span>            <span class="o">-</span><span class="mf">3.72</span><span class="o">/</span><span class="mi">2</span><span class="p">,</span>  <span class="c1"># Mars
</span>            <span class="o">-</span><span class="mf">6.77</span><span class="o">/</span><span class="mi">2</span>   <span class="c1"># PlanetX
</span>        <span class="p">],</span>
        <span class="n">sigma</span><span class="o">=</span><span class="p">[</span>
            <span class="mf">0.025</span><span class="o">/</span><span class="mi">2</span><span class="p">,</span>  <span class="c1"># Earth
</span>            <span class="mf">0.02</span><span class="o">/</span><span class="mi">2</span><span class="p">,</span>   <span class="c1"># Mars
</span>            <span class="mi">3</span><span class="o">/</span><span class="mi">2</span>       <span class="c1"># PlanetX
</span>        <span class="p">],</span>
        <span class="n">upper</span><span class="o">=</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">]</span>
    <span class="p">)}</span>

<span class="n">planet_model_with_prior</span> <span class="o">=</span> <span class="n">bmb</span><span class="p">.</span><span class="n">Model</span><span class="p">(</span>
    <span class="s">'Height ~ I(Time**2):Planet + Time + 0'</span><span class="p">,</span>
    <span class="n">df</span><span class="p">,</span>
    <span class="n">priors</span><span class="o">=</span><span class="n">priors</span>
<span class="p">)</span>

<span class="n">planet_model_with_prior</span><span class="p">.</span><span class="n">build</span><span class="p">()</span>
<span class="n">idata</span> <span class="o">=</span> <span class="n">planet_model_with_prior</span><span class="p">.</span><span class="n">prior_predictive</span><span class="p">()</span>
<span class="n">az</span><span class="p">.</span><span class="n">summary</span><span class="p">(</span><span class="n">idata</span><span class="p">.</span><span class="n">prior</span><span class="p">,</span> <span class="n">kind</span><span class="o">=</span><span class="s">"stats"</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>                                 mean       sd   hdi_3%  hdi_97%
sigma                          14.466   13.809    0.025   36.595
I(Time ** 2):Planet[Earth]     -4.905    0.012   -4.928   -4.883
I(Time ** 2):Planet[Mars]      -1.860    0.010   -1.880   -1.841
I(Time ** 2):Planet[PlanetX]   -3.622    1.509   -6.360   -0.915
Time                            0.520   14.788  -26.565   27.992
</code></pre></div></div>

<p>Here we’ve sampled the prior predictive and can see that our priors are correctly assigned to the associated planets.</p>

<p>Next we fit the model.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">planet_fit_with_prior</span> <span class="o">=</span> <span class="n">planet_model_with_prior</span><span class="p">.</span><span class="n">fit</span><span class="p">(</span>
    <span class="n">chains</span><span class="o">=</span><span class="mi">4</span><span class="p">,</span> <span class="n">idata_kwargs</span><span class="o">=</span><span class="p">{</span><span class="s">"log_likelihood"</span><span class="p">:</span> <span class="bp">True</span><span class="p">},</span> <span class="n">random_seed</span><span class="o">=</span><span class="n">SEED</span>
<span class="p">)</span>
<span class="n">planet_model_with_prior</span><span class="p">.</span><span class="n">predict</span><span class="p">(</span><span class="n">planet_fit_with_prior</span><span class="p">,</span> <span class="n">kind</span><span class="o">=</span><span class="s">"pps"</span><span class="p">);</span>
</code></pre></div></div>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">az</span><span class="p">.</span><span class="n">summary</span><span class="p">(</span><span class="n">planet_fit_with_prior</span><span class="p">)[</span><span class="mi">0</span><span class="p">:</span><span class="mi">5</span><span class="p">]</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>                                mean     sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
sigma                          1.759  0.142   1.495    2.024      0.002    0.002    3333.0    2373.0    1.0
I(Time ** 2):Planet[Earth]    -4.907  0.012  -4.929   -4.884      0.000    0.000    4360.0    2943.0    1.0
I(Time ** 2):Planet[Mars]     -1.862  0.009  -1.879   -1.847      0.000    0.000    2054.0    2614.0    1.0
I(Time ** 2):Planet[PlanetX]  -2.985  0.023  -3.025   -2.940      0.000    0.000    2282.0    2772.0    1.0
Time                          19.960  0.075  19.827   20.103      0.002    0.001    2025.0    2249.0    1.0
</code></pre></div></div>

<p>We see some improvements here! Off the cuff, these look better: the \(v_0\) coefficient on <code class="language-plaintext highlighter-rouge">Time</code> now covers the true value of 20 m/s.</p>

<p>Now taking a look at the effects before and after adding the prior on the gravities,</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">earth_posterior_2</span> <span class="o">=</span> <span class="o">-</span><span class="mi">2</span> <span class="o">*</span> <span class="n">planet_fit_with_prior</span><span class="p">.</span><span class="n">posterior</span><span class="p">[</span><span class="s">"I(Time ** 2):Planet"</span><span class="p">].</span><span class="n">sel</span><span class="p">(</span>
    <span class="p">{</span><span class="s">"I(Time ** 2):Planet_dim"</span><span class="p">:</span> <span class="s">"Earth"</span><span class="p">})</span>
<span class="n">mars_posterior_2</span> <span class="o">=</span> <span class="o">-</span><span class="mi">2</span> <span class="o">*</span> <span class="n">planet_fit_with_prior</span><span class="p">.</span><span class="n">posterior</span><span class="p">[</span><span class="s">"I(Time ** 2):Planet"</span><span class="p">].</span><span class="n">sel</span><span class="p">(</span>
    <span class="p">{</span><span class="s">"I(Time ** 2):Planet_dim"</span><span class="p">:</span> <span class="s">"Mars"</span><span class="p">})</span>
<span class="n">planetx_posterior_2</span> <span class="o">=</span> <span class="o">-</span><span class="mi">2</span> <span class="o">*</span> <span class="n">planet_fit_with_prior</span><span class="p">.</span><span class="n">posterior</span><span class="p">[</span><span class="s">"I(Time ** 2):Planet"</span><span class="p">].</span><span class="n">sel</span><span class="p">(</span>
    <span class="p">{</span><span class="s">"I(Time ** 2):Planet_dim"</span><span class="p">:</span> <span class="s">"PlanetX"</span><span class="p">})</span>

<span class="n">fig</span><span class="p">,</span> <span class="n">axs</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">subplots</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">12</span><span class="p">,</span> <span class="mi">6</span><span class="p">),</span> <span class="n">sharex</span><span class="o">=</span><span class="s">'col'</span><span class="p">)</span>
<span class="n">az</span><span class="p">.</span><span class="n">plot_posterior</span><span class="p">(</span><span class="n">earth_posterior</span><span class="p">,</span> <span class="n">ref_val</span><span class="o">=</span><span class="mf">9.81</span><span class="p">,</span> <span class="n">ax</span><span class="o">=</span><span class="n">axs</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">])</span>
<span class="n">axs</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">].</span><span class="n">set_title</span><span class="p">(</span><span class="s">"Earth $g$ - No Prior"</span><span class="p">)</span>
<span class="n">az</span><span class="p">.</span><span class="n">plot_posterior</span><span class="p">(</span><span class="n">mars_posterior</span><span class="p">,</span> <span class="n">ref_val</span><span class="o">=</span><span class="mf">3.72</span><span class="p">,</span> <span class="n">ax</span><span class="o">=</span><span class="n">axs</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span><span class="mi">1</span><span class="p">])</span>
<span class="n">axs</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span><span class="mi">1</span><span class="p">].</span><span class="n">set_title</span><span class="p">(</span><span class="s">"Mars $g$ - No Prior"</span><span class="p">)</span>
<span class="n">az</span><span class="p">.</span><span class="n">plot_posterior</span><span class="p">(</span><span class="n">planetx_posterior</span><span class="p">,</span> <span class="n">ref_val</span><span class="o">=</span><span class="mf">6.0</span><span class="p">,</span> <span class="n">ax</span><span class="o">=</span><span class="n">axs</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span><span class="mi">2</span><span class="p">])</span>
<span class="n">axs</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span><span class="mi">2</span><span class="p">].</span><span class="n">set_title</span><span class="p">(</span><span class="s">"PlanetX $g$ - No Prior"</span><span class="p">)</span>

<span class="n">az</span><span class="p">.</span><span class="n">plot_posterior</span><span class="p">(</span><span class="n">earth_posterior_2</span><span class="p">,</span> <span class="n">ref_val</span><span class="o">=</span><span class="mf">9.81</span><span class="p">,</span> <span class="n">ax</span><span class="o">=</span><span class="n">axs</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">0</span><span class="p">])</span>
<span class="n">axs</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">0</span><span class="p">].</span><span class="n">set_title</span><span class="p">(</span><span class="s">"Earth $g$ - Priors Used"</span><span class="p">)</span>
<span class="n">az</span><span class="p">.</span><span class="n">plot_posterior</span><span class="p">(</span><span class="n">mars_posterior_2</span><span class="p">,</span> <span class="n">ref_val</span><span class="o">=</span><span class="mf">3.72</span><span class="p">,</span> <span class="n">ax</span><span class="o">=</span><span class="n">axs</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">])</span>
<span class="n">axs</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">].</span><span class="n">set_title</span><span class="p">(</span><span class="s">"Mars $g$ - Priors Used"</span><span class="p">)</span>
<span class="n">az</span><span class="p">.</span><span class="n">plot_posterior</span><span class="p">(</span><span class="n">planetx_posterior_2</span><span class="p">,</span> <span class="n">ref_val</span><span class="o">=</span><span class="mf">6.0</span><span class="p">,</span> <span class="n">ax</span><span class="o">=</span><span class="n">axs</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">])</span>
<span class="n">axs</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">].</span><span class="n">set_title</span><span class="p">(</span><span class="s">"PlanetX $g$ - Priors Used"</span><span class="p">);</span>
</code></pre></div></div>

<p><img src="/blogimages/polynomial-regression/gravity-posteriors-comparison.png" alt="Gravity posteriors comparison" /></p>

<p>Adding the prior gives smaller uncertainties for Earth and Mars by design; however, the estimate for PlanetX has also improved by injecting our knowledge into the model.</p>]]></content><author><name>Tyler James Burch</name><email>burcht11@gmail.com</email></author><category term="Statistics" /><category term="python" /><category term="bayesian" /><category term="bambi" /><category term="statistics" /><category term="regression" /><summary type="html"><![CDATA[Overview of polynomial regression using Bambi, through projectile motion and fictitious planets]]></summary></entry><entry><title type="html">linear-term - a TUI for Linear</title><link href="https://tylerjamesburch.com/blog/misc/linear-term" rel="alternate" type="text/html" title="linear-term - a TUI for Linear" /><published>2026-01-26T00:00:00+00:00</published><updated>2026-01-26T00:00:00+00:00</updated><id>https://tylerjamesburch.com/blog/misc/linear-term</id><content type="html" xml:base="https://tylerjamesburch.com/blog/misc/linear-term"><![CDATA[<h1 id="background">Background</h1>

<p>A little over a decade ago, toward the start of my Ph.D., I started programming for real work for the first time. I was doing data analysis in C++ using <a href="https://en.wikipedia.org/wiki/ROOT">ROOT</a> (and yes, data analysis in C++ is as awful as it sounds). At the time, my advisor was the first person to introduce me to a terminal and to emacs. I’m pretty sure when I wasn’t looking, he added <code class="language-plaintext highlighter-rouge">alias emacs='emacs -nw'</code> just so I wouldn’t even have to use the emacs GUI.</p>

<p>In 2018, I switched to VSCode. The integrated terminal made it feel like a one-stop-shop - everything in one program, no context switching, plus a rich editing experience. But I’ve always missed parts of a terminal-only workflow, and lately I’ve been drifting back to it.</p>

<p>This has made me a sucker for terminal-based tooling. One friction point I noticed recently: there isn’t a good terminal user interface (TUI) for <a href="https://linear.app">Linear</a> project management, which forces me to context-switch into the native app to log progress, comment on issues, etc. So this past weekend I hacked together my own.</p>

<h1 id="linear-term">linear-term</h1>

<p><code class="language-plaintext highlighter-rouge">linear-term</code> is the TUI I put together (repo <a href="https://github.com/tjburch/linear-term">here</a>). It was built using <code class="language-plaintext highlighter-rouge">textual</code> in python. The original design was intended to look similar to the native app, but within the terminal.</p>

<p><img src="/blogimages/linear-term/main-view.png" alt="Main view" /></p>

<p>It’s a 3-panel layout: the center shows your issues, the right panel shows issue details once selected, and the left panel has filtering options. You can cycle through the panels via <code class="language-plaintext highlighter-rouge">TAB</code>, or jump to one directly with <code class="language-plaintext highlighter-rouge">F1</code>, <code class="language-plaintext highlighter-rouge">F2</code>, <code class="language-plaintext highlighter-rouge">F3</code>.</p>
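
<p>For a sense of how little code <code class="language-plaintext highlighter-rouge">textual</code> needs for this kind of layout, here’s a minimal three-panel sketch. This is a toy under my own naming, not the actual <code class="language-plaintext highlighter-rouge">linear-term</code> source:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from textual.app import App, ComposeResult
from textual.containers import Horizontal
from textual.widgets import Footer, Header, Static

class MiniBoard(App):
    """Toy three-panel layout in the spirit of linear-term."""

    BINDINGS = [("q", "quit", "Quit")]

    def compose(self) -&gt; ComposeResult:
        yield Header()
        with Horizontal():
            yield Static("filters")   # left panel
            yield Static("issues")    # center panel
            yield Static("details")   # right panel
        yield Footer()

if __name__ == "__main__":
    MiniBoard().run()
</code></pre></div></div>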

<p>I also created a kanban board view, accessible via <code class="language-plaintext highlighter-rouge">b</code>, where you can look at issues by their status.</p>

<p><img src="/blogimages/linear-term/kanban-view.png" alt="Kanban board view" /></p>

<p>I also added some CLI tools. There are existing Linear CLIs out there, but I wanted this to be enough of a one-stop-shop that you didn’t have to install a bunch of other tools.</p>

<p>For example:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>linear-term list <span class="nt">--mine</span>
TJB-1 <span class="o">[</span>Backlog] <span class="nt">---</span> Get familiar with Linear @Tyler Burch
TJB-4 <span class="o">[</span>Done] <span class="nt">---</span> Import your data @Tyler Burch
TJB-3 <span class="o">[</span>In Progress] <span class="nt">---</span> Connect your tools @Tyler Burch
TJB-2 <span class="o">[</span>Todo] <span class="nt">---</span> Set up your teams @Tyler Burch
</code></pre></div></div>

<p>I mainly put this together for my own use, but please feel free to use it if you’re interested. Happy to hear feedback, and to take contributions too.</p>]]></content><author><name>Tyler James Burch</name><email>burcht11@gmail.com</email></author><category term="Misc" /><category term="python" /><category term="tools" /><category term="cli" /><category term="terminal" /><summary type="html"><![CDATA[A terminal user interface for Linear project management]]></summary></entry><entry><title type="html">2025 Year in Review</title><link href="https://tylerjamesburch.com/blog/personal/year-in-review" rel="alternate" type="text/html" title="2025 Year in Review" /><published>2025-12-31T00:00:00+00:00</published><updated>2025-12-31T00:00:00+00:00</updated><id>https://tylerjamesburch.com/blog/personal/year-in-review</id><content type="html" xml:base="https://tylerjamesburch.com/blog/personal/year-in-review"><![CDATA[<p>In case anyone is keeping score, it’s been a while since I’ve posted anything to this blog (assuming, like most people, you consider 2 calendar years to be a long time). I’m going to chalk it up to being <em>really</em> locked in.</p>

<p>Several times this year, I’ve had an idea for a post that never came to fruition. I figured a good way to wrap up the year would be to collect some of those loose threads into a single lightning-round collection of thoughts.</p>

<p>So, without further ado, here’s what I did in 2025:</p>

<hr />

<h3 id="i-wrote-a-lot-of-r">I wrote a lot of R</h3>

<p>A little over a year ago, I switched groups within our analytics team, and now have a strong requirement to use R for building projects. As a result, 2025 was the first year where I wrote more lines of R than any other language.</p>

<p>A few reflections after a year of working in R full-time:</p>

<ol>
  <li>
    <p>I used to advise people who wanted to break into baseball analytics that it didn’t matter what language they learned, just pick R or Python and learn as much as you can. If a job requires you to switch, you can learn the other on the job, but prioritize knowledge depth while learning. This year has reinforced that opinion; good design patterns are ubiquitous, and learning on the fly has not been arduous. Living in a world with LLMs helps too (more on that later).</p>
  </li>
  <li>
    <p>That being said, I’m now of the opinion that it’s near impossible to be a proper statistician and not engage with <em>some</em> R. There are several non-negotiable statistics libraries that plainly don’t have reasonable non-R alternatives (cough <code class="language-plaintext highlighter-rouge">mgcv</code> cough). Certainly, there are ways around it if you go the Python route, but often it’s fitting a square peg in a round hole.</p>
  </li>
  <li>
    <p>I still prefer working in Python, especially for production code. R has so many idiosyncrasies that still frustrate me. The pain point that continues to plague me more than any is silent failures and tragically uninformative error messages. (Seriously, what does <code class="language-plaintext highlighter-rouge">object of type 'closure' is not subsettable</code> even mean?)</p>
  </li>
</ol>

<hr />

<h3 id="i-made-some-small-contributions-to-tidymodels">I made some small contributions to tidymodels</h3>

<p>Speaking of R, after some initial skepticism, I’ve grown pretty fond of the <code class="language-plaintext highlighter-rouge">tidymodels</code> framework.</p>

<div style="display: flex; justify-content: center;">
<blockquote class="bluesky-embed" data-bluesky-uri="at://did:plc:ct7boh6ncbhxjuyrebejsov2/app.bsky.feed.post/3ljr5567d2224" data-bluesky-cid="bafyreibpl3csknoxqosnv3zphztxqbgsqkwy2l62kefh7qey5uvktafowi">
<p lang="en">I'm here to walk back this take after 3 months. The upfront pain provides a lot of really good guardrails against doing really stupid statistical malpractice, and also makes downstream stuff (e.g. model tuning) trivially easy.</p>
&mdash; <a href="https://bsky.app/profile/did:plc:ct7boh6ncbhxjuyrebejsov2?ref_src=embed">Tyler Burch (@tylerjamesburch.com)</a> <a href="https://bsky.app/profile/did:plc:ct7boh6ncbhxjuyrebejsov2/post/3ljr5567d2224?ref_src=embed">March 7, 2025</a>
</blockquote>
</div>
<script async="" src="https://embed.bsky.app/static/embed.js" charset="utf-8"></script>

<p>Following my personal policy of helping out the libraries that help me, I made a couple of small PRs to <code class="language-plaintext highlighter-rouge">tidymodels</code>, probably the most interesting being <a href="https://github.com/tidymodels/tune/pull/1007">adding fold weighting to the <code class="language-plaintext highlighter-rouge">tune</code> package</a>.</p>

<p>For context, <code class="language-plaintext highlighter-rouge">tune</code> handles hyperparameter tuning within <code class="language-plaintext highlighter-rouge">tidymodels</code>. For each candidate hyperparameter set, you evaluate performance across resampling folds and take the set with the best average performance. Prior to this change, each fold contributed equally to that average, regardless of how much data it represented. That assumption is fine in a classic K-Fold setup, but if you switch to something like expanding window CV where folds have variable sizes, it seemed to miss the mark.</p>

<p>With this change, folds can now be weighted (naturally by training set size in the expanding window example) so later, more informative folds carry more influence when selecting hyperparameters. It’s a very small tweak, but it fixed a real issue I ran into on a project.</p>
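
<p>For intuition, here’s a minimal sketch of the weighting idea in Python rather than R. The fold values and sizes are made up, and this is not the <code class="language-plaintext highlighter-rouge">tune</code> implementation:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np

# Hypothetical per-fold RMSE from an expanding-window CV,
# alongside each fold's training-set size
fold_rmse = np.array([0.92, 0.88, 0.85, 0.81])
train_sizes = np.array([100, 200, 300, 400])

unweighted = fold_rmse.mean()                          # every fold counts equally
weighted = np.average(fold_rmse, weights=train_sizes)  # bigger folds count more
print(round(unweighted, 3), round(weighted, 3))        # 0.865 0.847
</code></pre></div></div>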

<hr />

<h3 id="ive-thought-a-lot-about-forecasting-problems">I’ve thought a lot about forecasting problems</h3>

<p>Related to the above, more than before, I’ve found myself approaching problems through a forecasting lens. The largest project I worked on this year wasn’t explicitly a forecasting task, but required conditioning on time due to distributional shift.</p>

<p>One of the clearest implications was for cross-validation. I have increasingly used rolling or expanding window setups whenever time could even plausibly matter, even if the problem isn’t explicitly framed as forecasting.</p>

<p>Lately, I’ve found myself defaulting to the assumption that time is relevant unless proven otherwise. Instead of asking whether I should use time-aware cross-validation, I enter with the posture that I need to be convinced it’s safe not to. Stationarity assumptions are helpful when true, but frequently break down in real-world problems, and ignoring that can lead to misleading performance estimates. Perhaps it’s a bit overcautious, but a little extra care here has made me more comfortable with the results I’m delivering.</p>
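
<p>As a concrete example of that posture, here’s a minimal expanding-window setup using scikit-learn (toy data, and Python rather than R):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Toy time-ordered data
X = np.arange(20).reshape(-1, 1)
y = np.arange(20)

# Each split trains on an expanding window and tests on the next
# block, so validation never peeks at the future
for train_idx, test_idx in TimeSeriesSplit(n_splits=4).split(X):
    print(f"train 0-{train_idx[-1]}, test {test_idx[0]}-{test_idx[-1]}")
</code></pre></div></div>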

<hr />

<h3 id="i-used-a-lot-of-tokens">I used a lot of tokens</h3>

<p>Agentic AI was impossible to ignore in 2025. I’ve long been bought in on codebase-aware LLMs. I find copy/pasting from ChatGPT both painfully slow and prone to stripping useful information. Because of this, I was a pretty early adopter of Cursor; I started using it in August of 2023. The tab complete was enough for me to buy in, and when agent-based edits came along, it became an important part of my workflow.</p>

<p>These days I’m using some combination of Cursor, Codex, and Claude Code for scaffolding boilerplate, generating prototypes (especially front-ends), quickly testing hypotheses, and making publication-quality plots faster than I could myself. The domain of tasks where it’s faster to prompt than to tweak, dissect, debug, etc. has grown way faster than I expected.</p>

<p>I don’t have any novel insights in this domain that haven’t been said elsewhere. The advice that I think about most day-to-day in my workflow is:</p>

<ul>
  <li><strong>Be meticulous about context</strong>. Keep context window usage under 50% if possible. The smaller the haystack, the easier the needle is to find.</li>
  <li><strong>Spend time on prompting well</strong>. An extra 5 minutes on a good prompt can save an hour of debugging.</li>
  <li><strong>Be judicious about correctness</strong>. One “wrong” line in the context window can yield hundreds of bad lines of code, or subtle unexpected bugs. Clear, correct prompts are key.</li>
<li><strong>Let the tool actually run code and see output.</strong> Agents are better about this in other languages, but I find that, particularly in R, they have to be pushed to do it. Add logging statements so the tool can iterate with itself and find bugs.</li>
</ul>

<p>One thing I have noticed is that most of the public conversation around LLM tooling is focused on traditional software engineering, where specs are far clearer than in data analysis workflows. The projects I work on are fuzzier, and the road from question to answer (or from idea to predictions) is not typically well-defined from the outset. I haven’t seen nearly as much written about what works well in this environment, and am still figuring it out day by day for myself.</p>

<hr />

<h3 id="i-loosened-my-grip-on-statistical-dogma">I loosened my grip on statistical dogma</h3>

<p>My first experience reading an honest stats text was McElreath’s <em>Statistical Rethinking</em> back in 2020 (10/10 recommend). Much of my early experience doing statistics was through a decidedly Bayesian lens, and I used to feel pretty strongly that this was the best way to do things when possible.</p>

<p>In 2025, I let go of a lot of those biases. At the end of the day, I’m a practitioner; I need answers to problems. I’m not debating or writing about the philosophy of statistics, and in many of the problems I work on trying to wedge Bayesian inference into the solution can be a hindrance more than a value add.</p>

<p>If a frequentist framework can get me to effectively the same answer more quickly, I’ve become much more comfortable using it. In a lot of applied settings, the difference between a Bayesian posterior and a well-validated frequentist estimate is functionally negligible, while the difference in iteration speed isn’t. The Bayesian paradigm still shapes how I think about uncertainty, but I don’t reach for that machinery unless it provides something that will actually be used and I’m willing to wait for MCMC chains to finish sampling before I can give an answer.</p>

<p>One of my repeated lines throughout this year has been “just use the right tool for the job,” which could even be a hidden thread behind this post. For me, the “right” tool is the one that answers the real question under real constraints and produces results that stakeholders can understand and act on. In practice, that’s often the approach that generates the most business value (or in my case, wins the most games) as quickly and correctly as possible.</p>

<hr />

<h3 id="i-changed-a-lot-of-diapers">I changed a lot of diapers</h3>

<p>Above all else, I welcomed a second child into the world in May. At the time of writing this, I’m parenting a teething 7-month-old who gives the best smiles, as well as a curious and fiercely independent daughter who turned 3 yesterday.</p>

<p>I have a tendency to value myself entirely based on my work and the things I produce. Fatherhood is a constant forcing function to get out of that mindset and to enjoy life outside of sheer production. While it has exhausting moments, it has made me appreciate the day-to-day moments so much more.</p>

<p>Right now I’m most appreciating the sense of wonder from my toddler. I dread air travel, airports are awful, logistics are a nightmare, I could go on for hours. But hearing my daughter say “I’m so excited” while getting on a plane, and watching her stare with awe out the window during takeoff, has made me stop for a few moments and appreciate how cool life really can be. There are countless places where getting a chance to look through her lens has made me far more appreciative of the little things in life.</p>

<div style="display: flex; justify-content: center;">
<img src="/blogimages/2025_postseason.png" style="width: 60%;" />
</div>

<p>See you soon, hopefully before another 2 years go by, but no promises.</p>]]></content><author><name>Tyler James Burch</name><email>burcht11@gmail.com</email></author><category term="Personal" /><category term="R" /><category term="statistics" /><category term="forecasting" /><category term="LLM" /><category term="personal" /><summary type="html"><![CDATA[A lightning-round collection of loose threads from 2025]]></summary></entry><entry><title type="html">2023 Reading List</title><link href="https://tylerjamesburch.com/blog/personal/reading-list" rel="alternate" type="text/html" title="2023 Reading List" /><published>2023-12-29T00:00:00+00:00</published><updated>2023-12-29T00:00:00+00:00</updated><id>https://tylerjamesburch.com/blog/personal/reading-list</id><content type="html" xml:base="https://tylerjamesburch.com/blog/personal/reading-list"><![CDATA[<h2 id="books-read-in-2023">Books Read in 2023:</h2>

<ul>
  <li><em>How Not to Be Wrong: The Power of Mathematical Thinking</em> - Jordan Ellenberg</li>
  <li><em>Fluent in 3 Months: How Anyone at Any Age Can Learn to Speak Any Language from Anywhere in the World</em> - Benny Lewis</li>
  <li><em>Zak George’s Dog Training Revolution</em> - Zak George</li>
  <li><em>Winning Fixes Everything: How Baseball’s Brightest Minds Created Sports’ Biggest Mess</em> - Evan Drellich</li>
  <li><em>Heaven and Hell: A History of the Afterlife</em> - Bart D. Ehrman</li>
  <li><em>My Life as a Quant: Reflections on Physics and Finance</em> - Emanuel Derman</li>
  <li><em>The Checklist Manifesto: How to Get Things Right</em> - Atul Gawande</li>
  <li><em>The Big Short: Inside the Doomsday Machine</em> - Michael Lewis</li>
</ul>]]></content><author><name>Tyler James Burch</name><email>burcht11@gmail.com</email></author><category term="Personal" /><category term="books" /><category term="reading" /><summary type="html"><![CDATA[Books I read in 2023]]></summary></entry><entry><title type="html">2023 NHL Playoff Predictions</title><link href="https://tylerjamesburch.com/blog/statistics/nhl-predictions" rel="alternate" type="text/html" title="2023 NHL Playoff Predictions" /><published>2023-04-30T00:00:00+00:00</published><updated>2023-04-30T00:00:00+00:00</updated><id>https://tylerjamesburch.com/blog/statistics/nhl-predictions</id><content type="html" xml:base="https://tylerjamesburch.com/blog/statistics/nhl-predictions"><![CDATA[<script src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" type="text/javascript"></script>

<h2 id="background">Background</h2>

<p>Earlier this NHL season, I posted <a href="http://tylerjamesburch.com/blog/statistics/hockey-bayes">a Bayesian hierarchical model for NHL scoring</a>, aiming to understand the skill of the Bruins based on their first 21 games (in which they went 18-3). That model has since been expanded to better reflect how NHL games actually work (specifically the overtime structure), fit on all of the 2022-23 season’s data to estimate the goal creation and suppression parameters for each team, and then used to project the remainder of the playoffs, which can be found <a href="http://nhl-projections.tylerjamesburch.com">here</a>.</p>

<h2 id="methodology">Methodology</h2>

<h3 id="original-model">Original Model</h3>

<p>The base of the model remains unchanged from the <a href="https://discovery.ucl.ac.uk/id/eprint/16040/1/16040.pdf">Baio and Blangiardo</a> model. For regulation scoring, I fit the following model for goal scoring, \(y = (y_{g0}, y_{g1})\), as a Poisson process:</p>

\[y_{gj} | \theta_{gj} \sim \text{Poisson}(\theta_{gj})\]

<p>for game \(g\), with \(j \in \{0, 1\}\) indexing home ice (\(j = 0\) for the home team, \(j = 1\) for the visitor). In their paper, the rate parameter \(\theta_{gj}\) is given by the following:</p>

\[\log \theta_{g0} = \alpha + h + a_{hg} - d_{vg}\]

\[\log \theta_{g1} = \alpha + a_{vg} - d_{hg}\]

<p>Where \(h\) is the home ice advantage, \(a\) is the “attack strength,” \(d\) is the “defense strength,” and the subscripts \(h\) and \(v\) denote “home” and “visitor” respectively. Last, \(\alpha\) is a flat intercept. In words: the home scoring rate scales with the attack skill of the home team, minus the defense of the away team, plus home ice advantage. The away scoring rate is analogous, with no advantage.</p>
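
<p>To make that structure concrete, here’s a small numeric sketch; the parameter values are made up for illustration, not fitted estimates:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np

# Illustrative values only
alpha, h = 1.0, 0.1      # intercept, home-ice advantage
a_h, d_h = 0.15, 0.05    # home team attack / defense strength
a_v, d_v = 0.05, 0.10    # visiting team attack / defense strength

theta_home = np.exp(alpha + h + a_h - d_v)  # ~3.16 expected home goals
theta_away = np.exp(alpha + a_v - d_h)      # ~2.72 expected away goals
print(theta_home, theta_away)
</code></pre></div></div>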

<h3 id="overtime">Overtime</h3>

<p>However, some crucial updates have been made, specifically accounting for the overtime mechanics in hockey. For games that reach overtime, the \(\theta\) parameters are scaled down to the relevant time frame:</p>

\[\theta_{h,o} = \theta_h \times \frac{1}{K},\]

\[\theta_{a,o} = \theta_a \times \frac{1}{K}.\]

<p>\(K\) represents the scaling factor for overtime expectations. For the regular season, \(K = 12\), since overtime is 5 minutes, one-twelfth of a 60-minute game. In playoff games, \(K\) is set to 3, as each 20-minute overtime period is one-third of a regulation game. This assumption implies that the goal creation and suppression parameters remain the same during overtime, a necessary compromise given the relatively small dataset of OT games.</p>

<p>I also introduce a custom likelihood function, which compares the observed home and away overtime goals to the expected overtime goal rates (<code class="language-plaintext highlighter-rouge">ot_home_theta</code> and <code class="language-plaintext highlighter-rouge">ot_away_theta</code>). This function is applied only to games that went into overtime. It allows only 3 possible outcomes:</p>

<ol>
  <li>No goals scored by either team: (0, 0)</li>
  <li>Home team scores 1 goal and away team scores 0 goals: (1, 0)</li>
  <li>Home team scores 0 goals and away team scores 1 goal: (0, 1)</li>
</ol>

<p>For each allowed outcome \((h_g, a_g)\), we calculate the log-likelihood of the observed home and away overtime goals as follows:</p>

\[\text{loglikelihood}_{h_g, a_g} = \log P(y_{h,ot} | \theta_{h,ot}, h_g) + \log P(y_{a,ot} | \theta_{a,ot}, a_g)\]

<p>Here, \(P(y_{h,ot})\) and \(P(y_{a,ot})\) represent the probabilities of observing the home and away overtime goals, given their respective expected goal rates and the allowed outcome. And recall, these are Poisson distributed.</p>

\[y_{h,ot} \mid \theta_{h,ot}, h_g \sim \text{Poisson}(\theta_{h,ot} \cdot h_g)\]

\[y_{a,ot} \mid \theta_{a,ot}, a_g \sim \text{Poisson}(\theta_{a,ot} \cdot a_g)\]

<p>Then, for the custom likelihood function, the log-sum-exp of the log-likelihoods is taken:</p>

\[\text{OT goals likelihood} = \log \left(\sum_{(h_g, a_g) \in \text{outcomes}} \exp\left({\text{loglikelihood}_{h_g, a_g}}\right)\right)\]

<p>Here, \(\exp(\text{loglikelihood}_{h_g, a_g})\) simply represents the likelihood of observing the home and away overtime goals, given their respective expected goal rates and the allowed outcome.</p>

<h3 id="shootouts">Shootouts</h3>

<p>For regular season games, if the score is still tied after overtime, a shootout model is then introduced, modeling the probability of the home team winning the shootout with a familiar logistic regression. I introduce team-specific coefficients for shootout success and failure, denoted by \(so_o\) (success) and \(so_d\) (failure), as well as an intercept term \(so_i\) and a home advantage term \(so_{adv}\). Then, we calculate the probability of the home team winning the shootout using the logistic function:</p>

\[\text{logit}(so_{P_\text{home}}) = so_i + (so_{o,h} - so_{o,a}) + (so_{d,h} - so_{d,a}) + so_{adv} \cdot h_i\]

<p>Finally, we model the <code class="language-plaintext highlighter-rouge">shootout_winner</code> variable as a Bernoulli random variable with probability \(so_{P_\text{home}}\):</p>

\[\text{shootout winner} \sim \text{Bernoulli}(so_{P_\text{home}})\]

<p>This shootout model is conditioned only on games that went to a shootout.</p>

<p>Frankly, I believe all this is far more transparent with code. So without further ado,</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
<span class="k">def</span> <span class="nf">overtime_goals_likelihood</span><span class="p">(</span><span class="n">observed_ot_h_goals</span><span class="p">,</span> <span class="n">observed_ot_a_goals</span><span class="p">,</span> <span class="n">ot_h_theta</span><span class="p">,</span> <span class="n">ot_a_theta</span><span class="p">):</span>
    <span class="n">allowed_outcomes</span> <span class="o">=</span> <span class="p">[(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">),</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">),</span> <span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">)]</span>
    <span class="n">likelihoods</span> <span class="o">=</span> <span class="p">[]</span>

    <span class="k">for</span> <span class="n">h_goals</span><span class="p">,</span> <span class="n">a_goals</span> <span class="ow">in</span> <span class="n">allowed_outcomes</span><span class="p">:</span>
        <span class="n">h_likelihood</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">logp</span><span class="p">(</span><span class="n">pm</span><span class="p">.</span><span class="n">Poisson</span><span class="p">.</span><span class="n">dist</span><span class="p">(</span><span class="n">mu</span><span class="o">=</span><span class="n">ot_h_theta</span><span class="p">),</span> <span class="n">observed_ot_h_goals</span> <span class="o">*</span> <span class="n">h_goals</span><span class="p">)</span>
        <span class="n">a_likelihood</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">logp</span><span class="p">(</span><span class="n">pm</span><span class="p">.</span><span class="n">Poisson</span><span class="p">.</span><span class="n">dist</span><span class="p">(</span><span class="n">mu</span><span class="o">=</span><span class="n">ot_a_theta</span><span class="p">),</span> <span class="n">observed_ot_a_goals</span> <span class="o">*</span> <span class="n">a_goals</span><span class="p">)</span>
        <span class="n">likelihoods</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">h_likelihood</span> <span class="o">+</span> <span class="n">a_likelihood</span><span class="p">)</span>

    <span class="k">return</span> <span class="n">pm</span><span class="p">.</span><span class="n">math</span><span class="p">.</span><span class="n">logsumexp</span><span class="p">(</span><span class="n">pm</span><span class="p">.</span><span class="n">math</span><span class="p">.</span><span class="n">stack</span><span class="p">(</span><span class="n">likelihoods</span><span class="p">),</span> <span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>

    
<span class="n">home_idx</span><span class="p">,</span> <span class="n">teams</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">factorize</span><span class="p">(</span><span class="n">data</span><span class="p">[</span><span class="s">"home_team"</span><span class="p">],</span> <span class="n">sort</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">away_idx</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">factorize</span><span class="p">(</span><span class="n">data</span><span class="p">[</span><span class="s">"away_team"</span><span class="p">],</span> <span class="n">sort</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>

<span class="n">coords</span> <span class="o">=</span> <span class="p">{</span>
    <span class="s">"team"</span><span class="p">:</span> <span class="n">teams</span><span class="p">,</span>
    <span class="s">"match"</span><span class="p">:</span> <span class="n">np</span><span class="p">.</span><span class="n">arange</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">data</span><span class="p">)),</span>
<span class="p">}</span>

<span class="k">with</span> <span class="n">pm</span><span class="p">.</span><span class="n">Model</span><span class="p">(</span><span class="n">coords</span><span class="o">=</span><span class="n">coords</span><span class="p">)</span> <span class="k">as</span> <span class="n">model</span><span class="p">:</span>
    <span class="c1"># Global model parameters
</span>    <span class="n">intercept</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Normal</span><span class="p">(</span><span class="s">"intercept"</span><span class="p">,</span> <span class="n">mu</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">sigma</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
    <span class="n">home</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Normal</span><span class="p">(</span><span class="s">"home"</span><span class="p">,</span> <span class="n">mu</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">sigma</span><span class="o">=</span><span class="mf">0.2</span><span class="p">)</span>

    <span class="c1"># Hyperpriors for attacks and defs
</span>    <span class="n">sd_att</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">HalfCauchy</span><span class="p">(</span><span class="s">"sd_att"</span><span class="p">,</span> <span class="mf">0.2</span><span class="p">)</span>
    <span class="n">sd_def</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">HalfCauchy</span><span class="p">(</span><span class="s">"sd_def"</span><span class="p">,</span> <span class="mf">0.2</span><span class="p">)</span>

    <span class="c1"># Team-specific model parameters
</span>    <span class="n">atts_star</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Normal</span><span class="p">(</span><span class="s">"atts_star"</span><span class="p">,</span> <span class="n">mu</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">sigma</span><span class="o">=</span><span class="n">sd_att</span><span class="p">,</span> <span class="n">dims</span><span class="o">=</span><span class="s">"team"</span><span class="p">)</span>
    <span class="n">defs_star</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Normal</span><span class="p">(</span><span class="s">"defs_star"</span><span class="p">,</span> <span class="n">mu</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">sigma</span><span class="o">=</span><span class="n">sd_def</span><span class="p">,</span> <span class="n">dims</span><span class="o">=</span><span class="s">"team"</span><span class="p">)</span>

    <span class="c1"># Demeaned team-specific parameters
</span>    <span class="n">atts</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Deterministic</span><span class="p">(</span><span class="s">"atts"</span><span class="p">,</span> <span class="n">atts_star</span> <span class="o">-</span> <span class="n">at</span><span class="p">.</span><span class="n">mean</span><span class="p">(</span><span class="n">atts_star</span><span class="p">),</span> <span class="n">dims</span><span class="o">=</span><span class="s">"team"</span><span class="p">)</span>
    <span class="n">defs</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Deterministic</span><span class="p">(</span><span class="s">"defs"</span><span class="p">,</span> <span class="n">defs_star</span> <span class="o">-</span> <span class="n">at</span><span class="p">.</span><span class="n">mean</span><span class="p">(</span><span class="n">defs_star</span><span class="p">),</span> <span class="n">dims</span><span class="o">=</span><span class="s">"team"</span><span class="p">)</span>

    <span class="c1"># Expected goals for home and away teams during regulation
</span>    <span class="n">home_theta</span> <span class="o">=</span> <span class="n">at</span><span class="p">.</span><span class="n">exp</span><span class="p">(</span><span class="n">intercept</span> <span class="o">+</span> <span class="n">home</span> <span class="o">+</span> <span class="n">atts</span><span class="p">[</span><span class="n">home_idx</span><span class="p">]</span> <span class="o">-</span> <span class="n">defs</span><span class="p">[</span><span class="n">away_idx</span><span class="p">])</span>
    <span class="n">away_theta</span> <span class="o">=</span> <span class="n">at</span><span class="p">.</span><span class="n">exp</span><span class="p">(</span><span class="n">intercept</span> <span class="o">+</span> <span class="n">atts</span><span class="p">[</span><span class="n">away_idx</span><span class="p">]</span> <span class="o">-</span> <span class="n">defs</span><span class="p">[</span><span class="n">home_idx</span><span class="p">])</span>

    <span class="c1"># Likelihood (Poisson distribution) for regulation goals
</span>    <span class="n">home_points</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Poisson</span><span class="p">(</span><span class="s">"home_points"</span><span class="p">,</span> <span class="n">mu</span><span class="o">=</span><span class="n">home_theta</span><span class="p">,</span> <span class="n">observed</span><span class="o">=</span><span class="n">data</span><span class="p">[</span><span class="s">'home_goals'</span><span class="p">],</span> <span class="n">dims</span><span class="o">=</span><span class="s">"match"</span><span class="p">)</span>
    <span class="n">away_points</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Poisson</span><span class="p">(</span><span class="s">"away_points"</span><span class="p">,</span> <span class="n">mu</span><span class="o">=</span><span class="n">away_theta</span><span class="p">,</span> <span class="n">observed</span><span class="o">=</span><span class="n">data</span><span class="p">[</span><span class="s">'away_goals'</span><span class="p">],</span> <span class="n">dims</span><span class="o">=</span><span class="s">"match"</span><span class="p">)</span>

    <span class="c1"># Overtime and shootout deterministics
</span>    <span class="n">overtime</span> <span class="o">=</span> <span class="n">data</span><span class="p">[</span><span class="s">'home_goals'</span><span class="p">]</span> <span class="o">==</span> <span class="n">data</span><span class="p">[</span><span class="s">'away_goals'</span><span class="p">]</span>
    <span class="n">shootout</span> <span class="o">=</span> <span class="p">(</span><span class="n">data</span><span class="p">[</span><span class="s">'home_goals_ot'</span><span class="p">]</span> <span class="o">==</span> <span class="n">data</span><span class="p">[</span><span class="s">'away_goals_ot'</span><span class="p">])</span> <span class="o">&amp;</span> <span class="n">overtime</span>

    <span class="c1"># Expected goals for home and away teams during overtime (scaled down by 1/12)
</span>    <span class="n">ot_home_theta</span> <span class="o">=</span> <span class="n">home_theta</span> <span class="o">*</span> <span class="p">(</span><span class="mi">1</span> <span class="o">/</span> <span class="mi">12</span><span class="p">)</span>
    <span class="n">ot_away_theta</span> <span class="o">=</span> <span class="n">away_theta</span> <span class="o">*</span> <span class="p">(</span><span class="mi">1</span> <span class="o">/</span> <span class="mi">12</span><span class="p">)</span>

    <span class="c1"># Likelihood (custom likelihood function) for overtime goals
</span>    <span class="k">if</span> <span class="n">overtime</span><span class="p">.</span><span class="nb">sum</span><span class="p">()</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">:</span>
        <span class="n">pm</span><span class="p">.</span><span class="n">Potential</span><span class="p">(</span><span class="s">"ot_goals_constraint"</span><span class="p">,</span>
                    <span class="n">overtime_goals_likelihood</span><span class="p">(</span><span class="n">data</span><span class="p">.</span><span class="n">home_goals_ot</span><span class="p">,</span> <span class="n">data</span><span class="p">.</span><span class="n">away_goals_ot</span><span class="p">,</span> <span class="n">ot_home_theta</span><span class="p">,</span> <span class="n">ot_away_theta</span><span class="p">))</span>

    <span class="c1"># Shootout model (conditioned on games that went to shootout)
</span>    <span class="n">so_coeff_o</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Normal</span><span class="p">(</span><span class="s">"so_coeff_o"</span><span class="p">,</span> <span class="n">mu</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">sigma</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">dims</span><span class="o">=</span><span class="s">"team"</span><span class="p">)</span>  <span class="c1"># Offensive shootout coefficient
</span>    <span class="n">so_coeff_d</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Normal</span><span class="p">(</span><span class="s">"so_coeff_d"</span><span class="p">,</span> <span class="n">mu</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">sigma</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">dims</span><span class="o">=</span><span class="s">"team"</span><span class="p">)</span>  <span class="c1"># Defensive shootout coefficient
</span>    <span class="n">so_coeff_h</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Normal</span><span class="p">(</span><span class="s">"so_coeff_h"</span><span class="p">,</span> <span class="n">mu</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">sigma</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>  <span class="c1"># Home advantage coefficient
</span>    <span class="n">so_intercept</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Normal</span><span class="p">(</span><span class="s">"so_intercept"</span><span class="p">,</span> <span class="n">mu</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">sigma</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>  <span class="c1"># Intercept term
</span>
    <span class="n">so_logit</span> <span class="o">=</span> <span class="p">(</span><span class="n">so_intercept</span> <span class="o">+</span>
                <span class="n">so_coeff_o</span><span class="p">[</span><span class="n">home_idx</span><span class="p">[</span><span class="n">shootout</span><span class="p">]]</span> <span class="o">-</span> <span class="n">so_coeff_o</span><span class="p">[</span><span class="n">away_idx</span><span class="p">[</span><span class="n">shootout</span><span class="p">]]</span> <span class="o">+</span>
                <span class="n">so_coeff_d</span><span class="p">[</span><span class="n">home_idx</span><span class="p">[</span><span class="n">shootout</span><span class="p">]]</span> <span class="o">-</span> <span class="n">so_coeff_d</span><span class="p">[</span><span class="n">away_idx</span><span class="p">[</span><span class="n">shootout</span><span class="p">]]</span> <span class="o">+</span>
                <span class="n">so_coeff_h</span> <span class="o">*</span> <span class="n">home</span><span class="p">)</span>

    <span class="k">if</span> <span class="n">shootout</span><span class="p">.</span><span class="nb">sum</span><span class="p">()</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">:</span>
        <span class="n">so_prob</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">math</span><span class="p">.</span><span class="n">invlogit</span><span class="p">(</span><span class="n">so_logit</span><span class="p">)</span>
        <span class="n">shootout_winner</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">Bernoulli</span><span class="p">(</span><span class="s">"shootout_winner"</span><span class="p">,</span> <span class="n">p</span><span class="o">=</span><span class="n">so_prob</span><span class="p">,</span> <span class="n">observed</span><span class="o">=</span><span class="n">data</span><span class="p">[</span><span class="s">'shootout_winner'</span><span class="p">][</span><span class="n">shootout</span><span class="p">])</span>

    <span class="n">trace</span> <span class="o">=</span> <span class="n">pm</span><span class="p">.</span><span class="n">sample</span><span class="p">(</span><span class="mi">4000</span><span class="p">,</span> <span class="n">tune</span><span class="o">=</span><span class="mi">3000</span><span class="p">)</span>
<span class="k">return</span> <span class="n">model</span><span class="p">,</span> <span class="n">trace</span>
</code></pre></div></div>

<h2 id="playoff-predictions">Playoff Predictions</h2>

<p>To predict playoff games, we employ a simulation-based approach using the model’s posterior estimates. For each game, posterior samples of the attack and defense strengths for both the home and away teams are extracted, along with the other model parameters, and the scoring \(\theta\) values are calculated. Additionally, using a scaling factor of \(K=3\), possible overtime scoring is calculated. Then, for each set of sampled parameters, we calculate the probability of the home team winning in either regulation or overtime. That mean probability is compared to a uniform random draw on \([0, 1]\) to simulate whether the home team wins.</p>

<p>This formulation is run for all potential matchups in the cup, and once a team hits 4 wins in a series, they advance. 500 simulations of the entire tournament are run daily, and the probability reported is the fraction of simulations in which a team wins a given round.</p>
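
<p>For a flavor of the series logic, here’s a stripped-down sketch. It assumes a constant per-game home win probability, rather than the posterior-driven, home/away-alternating version in the repo:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import numpy as np

rng = np.random.default_rng(42)

def simulate_series(p_win, n_sims=500):
    """Fraction of simulated best-of-7 series won by a team with a
    constant per-game win probability; first to 4 wins advances."""
    advanced = 0
    for _ in range(n_sims):
        wins = losses = 0
        while wins &lt; 4 and losses &lt; 4:
            if rng.random() &lt; p_win:
                wins += 1
            else:
                losses += 1
        advanced += wins == 4
    return advanced / n_sims

print(simulate_series(0.55))  # a 55% per-game edge gives roughly a 61% series win rate
</code></pre></div></div>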

<p>All code can be found <a href="https://github.com/tjburch/nhl-predictions">here</a>.</p>]]></content><author><name>Tyler James Burch</name><email>burcht11@gmail.com</email></author><category term="Statistics" /><category term="hockey" /><category term="sports" /><summary type="html"><![CDATA[Who will win this year's cup?]]></summary></entry><entry><title type="html">2022 Reading List</title><link href="https://tylerjamesburch.com/blog/personal/reading-list" rel="alternate" type="text/html" title="2022 Reading List" /><published>2022-12-31T00:00:00+00:00</published><updated>2022-12-31T00:00:00+00:00</updated><id>https://tylerjamesburch.com/blog/personal/reading-list</id><content type="html" xml:base="https://tylerjamesburch.com/blog/personal/reading-list"><![CDATA[<h2 id="books-read-in-2022">Books Read in 2022:</h2>

<ul>
  <li><em>Deep Work</em> - Cal Newport</li>
  <li><em>Freakonomics Revised and Expanded</em> - Steven D. Levitt and Stephen J. Dubner</li>
  <li><em>Weapons of Math Destruction</em> - Cathy O’Neil</li>
  <li><em>The Man Who Solved the Market: How Jim Simons Launched the Quant Revolution</em> - Gregory Zuckerman</li>
  <li><em>Shape: The Hidden Geometry of Information, Biology, Strategy, Democracy, and Everything Else</em> - Jordan Ellenberg</li>
  <li><em>The Hitchhiker’s Guide to the Galaxy</em> - Douglas Adams</li>
  <li><em>The Arm: Inside the Billion-Dollar Mystery of the Most Valuable Thing in Sports</em> - Jeff Passan</li>
  <li><em>Peak: Secrets from the New Science of Expertise</em> -  Robert Pool, Anders Ericsson</li>
  <li><em>The Simplest Baby Book in the World: The Illustrated, Grab-and-Do Guide for a Healthy, Happy Baby</em> - Stephen Gross</li>
</ul>]]></content><author><name>Tyler James Burch</name><email>burcht11@gmail.com</email></author><category term="Personal" /><category term="books" /><category term="reading" /><summary type="html"><![CDATA[Books I read in 2022]]></summary></entry></feed>