NCAA pointspreads and winners

Long ago I stumbled upon an excellent paper by Hal Stern: "On the Probability of Winning a Football Game" (1991, The American Statistician, vol. 45, no. 3, pp. 179-183). Stern took NFL pointspread data from three years in the early 1980s and then calculated the probability of winning conferred by the pointspread alone. For example, Stern's data shows that 3-point favorites win about 59% of the time. At the time of this writing, Stern's paper exists in the Google cache from a Stanford class which used it as a handout.
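As a rough illustration of how a pointspread turns into a win probability, here is a minimal sketch of the normal model usually associated with Stern's paper: the favorite's margin of victory is treated as normally distributed around the pointspread. The standard deviation of 13.86 points is the figure I recall from the paper and should be treated as an assumption, as should the function name, which is made up for this example.

```python
from statistics import NormalDist

# Assumption: standard deviation (in points) of the favorite's margin of
# victory around the pointspread, as recalled from Stern's NFL analysis.
STERN_SIGMA = 13.86


def favorite_win_probability(spread: float, sigma: float = STERN_SIGMA) -> float:
    """Probability that a team favored by `spread` points wins outright.

    Under the normal model, margin ~ Normal(spread, sigma), so
    P(margin > 0) equals the standard normal CDF evaluated at spread / sigma.
    """
    return NormalDist(mu=0.0, sigma=sigma).cdf(spread)


if __name__ == "__main__":
    for spread in (3, 7):
        pct = 100.0 * favorite_win_probability(spread)
        print(f"{spread}-point favorite wins about {pct:.0f}% of the time")
```

With that assumed standard deviation, a 3-point favorite comes out at roughly 59% and a 7-point favorite at roughly 69%, in line with the figures quoted from the paper.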

I made use of the paper several times, for example discussing the odds of Ohio State getting to the BCS title game in 2007. Right after the loss to Illinois, the probability of Ohio State getting to the BCS title game was around 1%. So many teams above Ohio State lost in the subsequent weeks that by the time I wrote that article the probability was over 50%. The probability of finishing #1 in the BCS (instead of just "top two") was only 2%, but that happened as well. It was a crazy season.

I recognized at the time that I was really abusing the numbers in Stern's paper. His numbers came from NFL pointspread data. NFL teams are subject to more scrutiny, and I believe that they perform on a more consistent level than college teams. I suspected that the probability of a 7-point NFL favorite losing (30%) might be different from the probability of a 7-point NCAA favorite losing.

Over the last month or two, with the generous help of Phil Steele, I've been assembling a database of all pointspreads and outcomes for BCS teams from 1999 through 2010 inclusive. The result is 9,626 games entered. (Note that each BCS-vs-BCS game is entered twice: for example, "Kansas -2 at Kansas State, result 24-10" would also show up in the data as "Kansas State +2 hosting Kansas, result 10-24." A game like Akron at Ohio State would only be listed once.) I divided the data into buckets by pointspread and computed the percentage of time that the favorite won in each bucket. For a 7-point favorite, that exact bucket yielded 69% wins by the favorite. Also, the best-fit curve seems likely to yield a similar value, so the NCAA numbers seem to be similar to the NFL ones.
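The bucketing step can be sketched in a few lines. This is not the script I actually used; it is a minimal Python illustration under assumed conventions: each entry is a hypothetical (spread, team score, opponent score) tuple from the listed team's point of view, a negative spread means the listed team is favored, and at a spread of 0 the listed team is arbitrarily treated as the "favorite."

```python
from collections import defaultdict


def favorite_win_pct_by_spread(entries):
    """Group entries by pointspread size and count how often the favorite won.

    entries: iterable of (spread, team_score, opp_score), where spread is the
    line for the listed team (negative = listed team favored). These field
    conventions are assumptions made for this sketch.
    Returns {bucket: (favorite_wins, games, win_pct)} sorted by bucket.
    """
    wins = defaultdict(int)
    games = defaultdict(int)
    for spread, team_score, opp_score in entries:
        if spread <= 0:
            # Listed team is the favorite (or a pick'em, by assumption).
            favorite_margin = team_score - opp_score
        else:
            # Listed team is the underdog, so the opponent is the favorite.
            favorite_margin = opp_score - team_score
        bucket = abs(spread)
        games[bucket] += 1
        if favorite_margin > 0:
            wins[bucket] += 1
    return {b: (wins[b], games[b], 100.0 * wins[b] / games[b]) for b in sorted(games)}


if __name__ == "__main__":
    # The Kansas at Kansas State example, entered once from each side.
    sample = [(-2, 24, 10), (2, 10, 24)]
    print(favorite_win_pct_by_spread(sample))
```

Because both entries for a BCS-vs-BCS game name the same favorite, a double-entered game adds two games and either two wins or zero wins to its bucket.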

The result, accumulated by a simple perl script and then charted in Excel, is shown below. The Y-axis is the percentage of teams favored by exactly that amount that won their game; the X-axis is the size of the pointspread. There's one pointspread bucket per half-point (teams with a "0" line, teams favored by 0.5 points, teams favored by 1 point, teams favored by 1.5 points, etc.). The chart shows three plotted data sets against those axes: the full data set (red), home favorites only (green) and road favorites only (purple).

Note that some of the bumps at the higher pointspreads are exaggerated by the small number of samples behind those data points. For 1-point favorites through 7-point favorites, every bucket but one has at least 200 games (the 5-point bucket has 173). But consider the purple data point around -30 which falls on the 60% line. For that bucket -- "away games where the visiting team is a 30.5-point favorite" -- there are only three games, which make up five entries (two BCS-vs-BCS games each counted twice, and one other). Rutgers in 1999 beat Syracuse despite being a 30.5-point underdog; that one double-counted game leaves the favorite with only 3 wins in 5 entries and by itself accounts for the deviation.
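To put a rough number on how little a small bucket tells us, one option is a confidence interval on each bucket's win percentage. The sketch below uses the Wilson score interval, which is my choice for illustration and not something the chart is based on. It also treats every entry as independent, which the double-counted games are not, so the real uncertainty is somewhat larger than this suggests.

```python
from math import sqrt


def wilson_interval(wins: int, games: int, z: float = 1.96):
    """Approximate 95% Wilson score interval for a win proportion.

    Returns (low, high) as fractions between 0 and 1.
    """
    if games <= 0:
        raise ValueError("games must be positive")
    p = wins / games
    z2 = z * z
    denom = 1.0 + z2 / games
    center = (p + z2 / (2.0 * games)) / denom
    half_width = z * sqrt(p * (1.0 - p) / games + z2 / (4.0 * games * games)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    # 30.5-point road favorites (3 of 5) versus all 7-point favorites (299 of 433).
    for label, wins, games in (("30.5 away", 3, 5), ("7 all", 299, 433)):
        low, high = wilson_interval(wins, games)
        print(f"{label}: {100 * low:.0f}% to {100 * high:.0f}%")
```

The 3-of-5 bucket is compatible with almost any true win rate, while the 7-point bucket is pinned down to within a few points either way.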

That's also true for the overly high purple data point on the extreme left of the graph: there was only one 0.5-point road favorite in my data set. That team happened to win the game, to yield 100% winners for that bucket. With college overtime, it's not possible for a game to end in a tie score. As a result, spreads of +0.5, 0, and -0.5 are all effectively the same.

The raw data that goes into the chart is:

            All favorites      Home favorites only   Away favorites only
Spread   Wins  Games   Win %   Wins  Games   Win %   Wins  Games   Win %
     0     25     46   54.35      8     17   47.06     11     18   61.11
   0.5      2      4   50.00      0      2    0.00      2      2  100.00
     1    125    243   51.44     52     99   52.53     48     99   48.48
   1.5     95    211   45.02     44     89   49.44     42    100   42.00
     2    132    206   64.08     51     93   54.84     64     83   77.11
   2.5    179    396   45.20     85    192   44.27     75    160   46.88
     3    328    587   55.88    143    263   54.37    154    264   58.33
   3.5    293    483   60.66    124    208   59.62    129    208   62.02
     4    156    241   64.73     76    121   62.81     58     88   65.91
   4.5    145    241   60.17     78    125   62.40     51     88   57.95
     5    115    173   66.47     61     95   64.21     39     60   65.00
   5.5    124    206   60.19     71    104   68.27     41     82   50.00
     6    178    276   64.49     93    142   65.49     70    106   66.04
   6.5    266    372   71.51    158    215   73.49     87    118   73.73
     7    299    433   69.05    151    211   71.56    128    182   70.33
   7.5    243    328   74.09    143    183   78.14     81    110   73.64
     8    126    186   67.74     67    104   64.42     41     61   67.21
   8.5    129    179   72.07     75    110   68.18     42     57   73.68
     9    104    142   73.24     58     75   77.33     30     46   65.22
   9.5    115    162   70.99     73     96   76.04     36     57   63.16
    10    190    248   76.61    105    141   74.47     67     83   80.72
  10.5    116    155   74.84     69     94   73.40     36     48   75.00
    11    102    141   72.34     64     92   69.57     36     43   83.72
  11.5     98    124   79.03     52     71   73.24     43     48   89.58
    12    105    124   84.68     61     72   84.72     34     40   85.00
  12.5    105    123   85.37     78     93   83.87     24     26   92.31
    13    111    147   75.51     68     92   73.91     38     49   77.55
  13.5    206    253   81.42    143    174   82.18     60     76   78.95
    14    182    220   82.73    121    146   82.88     47     56   83.93
  14.5    126    148   85.14     83     95   87.37     35     43   81.40
    15     95    106   89.62     67     74   90.54     25     29   86.21
  15.5     82     96   85.42     56     65   86.15     26     29   89.66
    16     77     97   79.38     49     60   81.67     28     37   75.68
  16.5    109    129   84.50     71     88   80.68     35     38   92.11
    17    118    130   90.77     86     94   91.49     30     34   88.24
  17.5    111    125   88.80     80     90   88.89     29     33   87.88
    18     72     77   93.51     52     57   91.23     20     20  100.00
  18.5     71     76   93.42     51     56   91.07     18     18  100.00
    19     72     81   88.89     45     52   86.54     25     27   92.59
  19.5     58     69   84.06     46     51   90.20     12     18   66.67
    20    100    106   94.34     72     78   92.31     27     27  100.00
  20.5     74     80   92.50     52     56   92.86     19     21   90.48
    21    139    141   98.58    101    103   98.06     36     36  100.00
  21.5     74     78   94.87     56     60   93.33     18     18  100.00
    22     71     78   91.03     49     54   90.74     18     20   90.00
  22.5     44     47   93.62     31     33   93.94     13     13  100.00
    23     61     65   93.85     39     39  100.00     22     26   84.62
  23.5     61     64   95.31     49     50   98.00     12     14   85.71
    24     69     73   94.52     53     57   92.98     16     16  100.00
  24.5     72     78   92.31     55     58   94.83     14     17   82.35
    25     45     50   90.00     30     31   96.77     15     19   78.95
  25.5     38     39   97.44     35     36   97.22      2      2  100.00
    26     36     38   94.74     29     31   93.55      6      6  100.00
  26.5     40     40  100.00     31     31  100.00      7      7  100.00
    27     65     70   92.86     51     54   94.44     14     16   87.50
  27.5     49     49  100.00     40     40  100.00      6      6  100.00
    28     62     62  100.00     46     46  100.00     11     11  100.00
  28.5     31     32   96.88     28     29   96.55      3      3  100.00
    29     34     37   91.89     28     31   90.32      4      4  100.00
  29.5     34     35   97.14     30     31   96.77      3      3  100.00
    30     51     51  100.00     34     34  100.00     17     17  100.00
  30.5     33     35   94.29     28     28  100.00      3      5   60.00
    31     30     31   96.77     22     23   95.65      7      7  100.00
  31.5     22     22  100.00     20     20  100.00      2      2  100.00
    32     34     34  100.00     28     28  100.00      6      6  100.00
  32.5     18     18  100.00     16     16  100.00      2      2  100.00
    33     33     33  100.00     26     26  100.00      7      7  100.00
  33.5     17     17  100.00     17     17  100.00      -      -       -
    34     25     25  100.00     24     24  100.00      1      1  100.00
  34.5     21     21  100.00     21     21  100.00      -      -       -
    35     31     32   96.88     31     32   96.88      -      -       -
  35.5     25     25  100.00     23     23  100.00      2      2  100.00
    36     20     20  100.00     19     19  100.00      1      1  100.00
  36.5     15     15  100.00     13     13  100.00      2      2  100.00
    37     13     15   86.67     10     12   83.33      3      3  100.00
  37.5     11     11  100.00      8      8  100.00      3      3  100.00
    38     17     17  100.00     15     15  100.00      2      2  100.00
  38.5      4      4  100.00      4      4  100.00      -      -       -
    39     10     10  100.00      6      6  100.00      4      4  100.00
  39.5      7      7  100.00      7      7  100.00      -      -       -
    40      8      8  100.00      6      6  100.00      2      2  100.00
  40.5     12     12  100.00     11     11  100.00      1      1  100.00
    41     21     23   91.30     18     20   90.00      3      3  100.00
  41.5      9      9  100.00      8      8  100.00      1      1  100.00
    42     12     12  100.00     12     12  100.00      -      -       -
  42.5      6      6  100.00      4      4  100.00      2      2  100.00
    43      4      4  100.00      4      4  100.00      -      -       -
  43.5      6      6  100.00      6      6  100.00      -      -       -
    44      6      6  100.00      6      6  100.00      -      -       -
  44.5      5      5  100.00      4      4  100.00      -      -       -
    45     10     10  100.00     10     10  100.00      -      -       -
  45.5      4      4  100.00      4      4  100.00      -      -       -
    46      3      3  100.00      3      3  100.00      -      -       -
  46.5      3      3  100.00      3      3  100.00      -      -       -
    47      4      4  100.00      4      4  100.00      -      -       -
  47.5      1      1  100.00      1      1  100.00      -      -       -
    48      4      4  100.00      2      2  100.00      2      2  100.00
  48.5      1      1  100.00      1      1  100.00      -      -       -
    49      4      4  100.00      4      4  100.00      -      -       -
  49.5      2      2  100.00      2      2  100.00      -      -       -
    50      2      2  100.00      2      2  100.00      -      -       -
    51      2      2  100.00      2      2  100.00      -      -       -
  51.5      1      1  100.00      1      1  100.00      -      -       -
    52      1      1  100.00      1      1  100.00      -      -       -
    53      2      2  100.00      2      2  100.00      -      -       -
    56      1      1  100.00      1      1  100.00      -      -       -
  59.5      1      1  100.00      1      1  100.00      -      -       -

For those interested in the raw data (all the individual game results), I hope to make it available soon. For now, you can click here to download an Excel spreadsheet containing the chart above and the collated data that goes into it. I will also work on further analysis of the data, including a best-fit curve and a chart for converting a pointspread into a win percentage.
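One possible shape for that best-fit curve, offered only as a sketch and not as the analysis I intend to publish: assume the same normal model Stern used for the NFL, where the favorite's win probability is the standard normal CDF of spread divided by a standard deviation, and pick the standard deviation that makes the observed counts most likely. The rows below are a hand-picked handful copied from the all-favorites columns of the table above; the grid search, its bounds, and the function names are all assumptions made for this example.

```python
from math import log
from statistics import NormalDist

# (spread, favorite wins, games) copied from the all-favorites columns.
SAMPLE_ROWS = [
    (3, 328, 587),
    (7, 299, 433),
    (10, 190, 248),
    (14, 182, 220),
    (17, 118, 130),
    (21, 139, 141),
]

_STD_NORMAL = NormalDist()


def log_likelihood(sigma, rows):
    """Binomial log-likelihood of the rows if P(favorite wins) = Phi(spread / sigma)."""
    total = 0.0
    for spread, wins, games in rows:
        p = _STD_NORMAL.cdf(spread / sigma)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep log() away from 0
        total += wins * log(p) + (games - wins) * log(1.0 - p)
    return total


def fit_sigma(rows, low=5.0, high=30.0, step=0.05):
    """Grid-search the sigma between low and high that maximizes the likelihood."""
    steps = int(round((high - low) / step))
    candidates = [low + i * step for i in range(steps + 1)]
    return max(candidates, key=lambda sigma: log_likelihood(sigma, rows))


if __name__ == "__main__":
    sigma = fit_sigma(SAMPLE_ROWS)
    print(f"fitted sigma: {sigma:.2f} points")
    for spread in (3, 7, 14):
        pct = 100.0 * _STD_NORMAL.cdf(spread / sigma)
        print(f"{spread}-point favorite: {pct:.1f}%")
```

A real fit would use every bucket, and would need to deal with the double-counted games before anyone read much into the result.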