## Tuesday, June 18, 2013

### Fun with probability

NESN's Twitter question of the day in the first game of the Sox/Rays doubleheader was introduced by Jenny Dell, who said something to the effect that it's generally accepted that sweeping a doubleheader is rare, and asked Twitter: "Why is it so hard to sweep a doubleheader?" Baseball writers have made this claim before: in a 2009 ESPN column, Joe Morgan said, "...in the history of baseball 80% of all doubleheaders have been split."

I'll tell you right now that Joe Morgan is way off, and NESN is starting from a false assumption as well. I replied to their Twitter question; we'll see if it makes the broadcast. I'm sure it won't.

But my question is, what is the probability that a doubleheader is swept? What is the probability that YOUR TEAM sweeps a doubleheader? And does anyone else know how those probabilities compare to the historical rates of doubleheader sweepage?

Well, assuming randomness, the probability of YOUR TEAM sweeping is 1 in 4. The probability of either team sweeping is 1 in 2. I don't know nuthin' about the historicals.
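A quick way to check those two numbers is to list the four possible outcomes of a doubleheader and count. This is only a sketch: it assumes each game is a fair coin flip and that the two games are independent, which is what "assuming randomness" is taken to mean here. The variable names are made up for illustration.

```python
from itertools import product

# Each game is treated as a fair coin flip: "W" = your team wins, "L" = it loses.
# Two independent games give four equally likely outcomes: WW, WL, LW, LL.
outcomes = list(product("WL", repeat=2))

# Your team sweeps only on WW.
your_team_sweeps = sum(1 for o in outcomes if o == ("W", "W")) / len(outcomes)

# Either team sweeps when both games go the same way: WW or LL.
either_team_sweeps = sum(1 for o in outcomes if o[0] == o[1]) / len(outcomes)

print(your_team_sweeps)    # 0.25
print(either_team_sweeps)  # 0.5
```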

Caleb said...

WOW the Red Sox swept. That's nearly unheard of. Except that 56% of doubleheaders between 2001 and 2008 were swept (176 games), and 52% of doubleheaders between 1901 and 1998 were swept.

Luke Murphy said...

It makes sense that the numbers would be slightly higher than 50%. More often than not, the team that wins game 1 is the better team, and will be more likely to win game 2.

Would not that be counterbalanced by a decrease in poorer team sweeps?

I'd look to home field advantage to explain it. A significant increase in home team sweeps, slight increase in splits, and many fewer away team sweeps.

Just guessin'.

Wait, that didn't make sense. A significant increase in home team sweeps, splits are fewer, away team sweeps slightly fewer.

Oh, never mind.

Luke Murphy said...

Actually, the increased number of favored team sweeps will always outweigh the decreased number of underdog sweeps. For example, if team A is better than team B, and has a 60% chance of winning each game, then the probability of a sweep is:

0.6*0.6 + 0.4*0.4 = 0.52

It's more extreme if team A is a more extreme favorite:

0.9*0.9 + 0.1*0.1 = 0.82

It's also valid even if the difference between the two teams is very small:

0.501*0.501 + 0.499*0.499 = 0.500002

The reason is clear when you look at the formula for the probability of a non-sweep:

Probability of non-sweep = (probability team A wins game 1)*(probability team B wins game 2) + (probability team B wins game 1)*(probability team A wins game 2)

and then assume that the probabilities are the same for games 1 and 2:

Probability of non-sweep = 2*(probability team A wins game 1)*(probability team B wins game 2)

The maximum product of 2 numbers that sum to X is achieved when the 2 numbers are equal (e.g. if X is 10, 5*5 = 25, 4*6 = 24, 3*7 = 21, etc.). Here the 2 numbers are the two teams' win probabilities, which sum to 1, so if you assume equal probabilities for both games, the probability of a non-sweep peaks at 50%, and it reaches that peak only when both teams are perfectly evenly matched. That is probably never the case, but even if you assume it's the case for almost every doubleheader that has ever occurred, all it would take is the existence of 1 doubleheader with a clear favorite for both games to push the overall rate of sweepage slightly above 50%.
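The arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch under the same assumptions as the comment: team A wins each game with the same probability p, and the two games are independent. The function name `sweep_probability` is just an illustrative choice.

```python
def sweep_probability(p):
    """Chance that either team sweeps when team A wins each game with probability p."""
    return p * p + (1 - p) * (1 - p)

# The examples worked out by hand above.
for p in (0.5, 0.501, 0.6, 0.9):
    print(p, round(sweep_probability(p), 6))

# Scan p from 0.00 to 1.00 in steps of 0.01 and find where a sweep is least likely.
grid = [i / 100 for i in range(101)]
lowest = min(grid, key=sweep_probability)
print(lowest, sweep_probability(lowest))
```

On that grid the sweep probability bottoms out at p = 0.5, where it is exactly 50%, and it rises as either team becomes a bigger favorite.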

Luke Murphy said...

Actually, as I'm thinking about different possibilities and punching different combinations into my calculator, I'm realizing that even broader generalizations can be made.

Situation 1:

If the same team is favored to win both games, the probability of a sweep will always be greater than 50%.

Situation 2:

If the team that is the favorite to win game 1 is for some reason the underdog for game 2, then the probability of a sweep will always be LESS than 50%. Example:

0.6*0.499 + 0.4*0.501 = 0.4998

Situation 3:

If the teams are perfectly evenly matched for EITHER of the 2 games, then the probability of a sweep is exactly 50%. Example:

0.9*0.5 + 0.1*0.5 = 0.5

So, looking back at the history of doubleheaders, as long as the weighted number of situation 1s is greater than the weighted number of situation 2s (situation 3s are irrelevant), the "sweep rate" will always be greater than 50%.
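The three situations fall out of one identity. If team A wins game 1 with probability p1 and game 2 with probability p2 (independent games assumed), then the sweep probability minus 0.5 works out to 2*(p1 - 0.5)*(p2 - 0.5): positive when the same team is favored twice, negative when the favorite flips, and zero when either game is a toss-up. The sketch below checks that against the examples above; the function names are hypothetical.

```python
def sweep_probability_two_games(p1, p2):
    """Chance of a sweep by either team; p1 and p2 are team A's win chances in games 1 and 2."""
    return p1 * p2 + (1 - p1) * (1 - p2)

def edge_over_half(p1, p2):
    """How far the sweep probability sits above (or below) 50%, via the identity."""
    return 2 * (p1 - 0.5) * (p2 - 0.5)

cases = {
    "situation 1 (same favorite twice)": (0.6, 0.6),
    "situation 2 (favorite flips)": (0.6, 0.499),
    "situation 3 (one game is a toss-up)": (0.9, 0.5),
}

for label, (p1, p2) in cases.items():
    direct = sweep_probability_two_games(p1, p2)
    via_identity = 0.5 + edge_over_half(p1, p2)
    print(label, round(direct, 6), round(via_identity, 6))
```

Both columns should agree for every case, up to floating-point rounding.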

Interesting. And it is usually likely that one team is better than the other, at least marginally, and it is always the case that one team has homefield advantage. Ergo, more sweeps.

An interesting test to determine if there is any "doubleheader effect" would be to compare sweeps percentage in doubleheaders with sweeps percentage of any two games played by two teams on successive days.

Has there ever been a bigger lunkhead than Joe Morgan?

Luke Murphy said...

That'd be an interesting test. You could also look at the effect of homefield advantage by looking at 2 game series in which the games are played in different locations. Games 2 and 3 or games 5 and 6 in a 7 game playoff series, for example. I would bet that the sweep probability falls to under 50% in those games.
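Here is a rough sketch of why that bet seems reasonable. It assumes two evenly matched teams, independent games, and a home team that wins 54% of the time. The 54% figure is an illustrative assumption only, not a number from this thread, and `HOME_WIN_PROB` is a made-up name.

```python
# ASSUMPTION: evenly matched teams, home team wins 54% of the time (illustrative only).
HOME_WIN_PROB = 0.54

def sweep_probability_two_games(p1, p2):
    """Chance of a sweep by either team; p1 and p2 are team A's win chances in games 1 and 2."""
    return p1 * p2 + (1 - p1) * (1 - p2)

# Doubleheader: team A is at home for both games.
same_venue = sweep_probability_two_games(HOME_WIN_PROB, HOME_WIN_PROB)

# Two games in different parks: team A is at home for one and away for the other.
split_venue = sweep_probability_two_games(HOME_WIN_PROB, 1 - HOME_WIN_PROB)

print(round(same_venue, 4))   # 0.5032
print(round(split_venue, 4))  # 0.4968
```

Under those assumptions the sweep rate lands a bit above 50% when both games are in one park and a bit below 50% when the venue changes. If one team is also simply better, that would pull the split-venue number back up.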

I tried to think of some reasons why the favored teams would switch during a doubleheader, but I can't come up with anything that would plausibly affect one team more than the other.

Luke Murphy said...

The only baseball people worse than Joe Morgan are the White Sox announcers.

Laura said...

He gone.

Luke Murphy said...

WABEK! WABEK! WWWWWAAAABEK!

UUUUUCAAAANPUDIDONDABOOOOOOOOOOAAAAARRRRRD!