2010 NCAA Tournament Simulations

Be sure to also check out the "Fan Anxiety Matrix"

The brackets are set, and for college basketball stat nerds, that means simulations. A number have already popped up: 5,000 simulations using Sagarin's predictor rating, and an online tool for simulating one random bracket using Pomeroy's statistics.

As I did last year, I took both approaches one step further: using Pomeroy's offensive and defensive efficiency ratings and the log5 prediction method, I simulated the 2010 NCAA tournament one million times. My script tabulated the results below. (You can also access them in Google Spreadsheet form by clicking here.)
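For readers who have not seen log5 before, here is a minimal sketch of how a single-game win probability could be computed from efficiency ratings. It is not my actual script. The function names, the sample ratings, and the exponent of 11.5 used to turn efficiencies into a winning percentage are all assumptions made for illustration; only the log5 formula itself is the standard one.

```python
def pythagorean_expectation(adj_off, adj_def, exponent=11.5):
    """Expected winning fraction against an average team.

    adj_off / adj_def are points scored / allowed per 100 possessions.
    The exponent is an assumption for this sketch, not taken from the post.
    """
    return adj_off ** exponent / (adj_off ** exponent + adj_def ** exponent)


def log5(p_a, p_b):
    """Standard log5: probability that team A beats team B,
    given each team's winning fraction against an average opponent."""
    return (p_a - p_a * p_b) / (p_a + p_b - 2 * p_a * p_b)


# Hypothetical ratings, invented for illustration only.
team_a = pythagorean_expectation(adj_off=118.0, adj_def=90.0)
team_b = pythagorean_expectation(adj_off=110.0, adj_def=95.0)

print(f"Team A winning fraction: {team_a:.3f}")
print(f"Team B winning fraction: {team_b:.3f}")
print(f"P(A beats B) by log5:    {log5(team_a, team_b):.3f}")
```

Two sanity checks worth noting: two equally rated teams come out at exactly 50%, and a team facing a perfectly average opponent (winning fraction 0.5) simply gets its own winning fraction back.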

Each column is a round of the tournament; each value is the percentage of the one million simulations in which a team reached that round. On the right-hand side is the average number of wins each team recorded across the simulations.
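To make the table's columns concrete, here is a rough sketch of how the tabulation could work on a tiny four-team bracket. Again, this is an illustration and not my actual script: the team names, the winning fractions, and the helper functions are hypothetical, and the sketch assumes the field size is a power of two with teams listed in bracket order.

```python
import random


def log5(p_a, p_b):
    """Probability that team A beats team B (standard log5)."""
    return (p_a - p_a * p_b) / (p_a + p_b - 2 * p_a * p_b)


def simulate_bracket(teams, strength, rng):
    """Play one single-elimination bracket; return {team: games won}."""
    wins = {t: 0 for t in teams}
    alive = list(teams)
    while len(alive) > 1:
        survivors = []
        # Adjacent teams in the list meet each other in this round.
        for a, b in zip(alive[0::2], alive[1::2]):
            winner = a if rng.random() < log5(strength[a], strength[b]) else b
            wins[winner] += 1
            survivors.append(winner)
        alive = survivors
    return wins


def tabulate(teams, strength, n_sims, seed=0):
    """Return (pct, avg): pct[t][r] is the percentage of simulations in
    which team t won at least r + 1 games; avg[t] is its mean win count."""
    rng = random.Random(seed)
    n_rounds = len(teams).bit_length() - 1  # assumes a power-of-two field
    advanced = {t: [0] * n_rounds for t in teams}
    total_wins = {t: 0 for t in teams}
    for _ in range(n_sims):
        for t, w in simulate_bracket(teams, strength, rng).items():
            total_wins[t] += w
            for r in range(w):
                advanced[t][r] += 1
    pct = {t: [100.0 * c / n_sims for c in advanced[t]] for t in teams}
    avg = {t: total_wins[t] / n_sims for t in teams}
    return pct, avg


# Hypothetical four-team field; the winning fractions are invented.
teams = ["Alpha", "Bravo", "Charlie", "Delta"]
strength = {"Alpha": 0.95, "Bravo": 0.70, "Charlie": 0.85, "Delta": 0.80}

pct, avg = tabulate(teams, strength, n_sims=10_000)
for t in teams:
    rounds = "  ".join(f"{p:5.1f}%" for p in pct[t])
    print(f"{t:8s} {rounds}   avg wins: {avg[t]:.2f}")
```

One useful property falls out of this layout: a team's average number of wins is just the sum of its round-by-round percentages divided by 100, so the right-hand column can be checked against the others.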

Last year, hardly any first-round games were likely to be upsets based on the simulation. This year is much different, thanks to several major discrepancies between Pomeroy's rankings and the seedings made by the committee. Brigham Young, in particular, is seventh overall at kenpom, yet is stuck with a 7-seed in this year's tournament. The best games to watch this Thursday and Friday will be the 6-11 games, as all four are slated to be near coin-flips.

The intuition here is to do a face-slap and say "Duh, the teams at the top of the Pomeroy rankings have the best chance in simulations using the Pomeroy rankings!" That dismissal would miss several key features of the simulation, and one interesting thing to do is to see how the simulations correspond to our gut instincts about the basketball matchups in each game. For example, Kentucky and Syracuse have rough roads to the national title because of very high (about 25%) chances of losing in the second round. Florida State is the culprit for Syracuse; if you combine that information with the possibility that Arinze Onuaku will not play this weekend, an FSU-Syracuse game on Sunday suddenly gets very interesting.

Kentucky could run into Texas in the second round, and the Longhorns are ranked much higher by Pomeroy than by the RPI or by humans (both the selection committee and the polls). And while Texas has been a bit of an enigma to the national media this year, it is clear that they possess the talent to be efficient on both ends of the floor. Kentucky's road is further blocked by Wisconsin, a team actually ranked higher than the Wildcats. That Sweet Sixteen matchup would be really bruising on Kentucky's boards, as the nation's #2 offensive rebounder (Cousins) takes on the Badgers' nation-best defensive rebounding squad.

The Pomeroy rankings are generally recognized as a successful post-season analysis of the teams, tournament games included. All six NCAA champions since Pomeroy's website launched were ranked in the top two post-season, and in the top 15 for both offense and defense. However, the accuracy of the pre-tournament stats is a bit shakier; last year the "best simulation" out of one million got 53 games right, and the simulations averaged 37 correct games. Is it a lack of complete data, a flaw in the system's ability to prognosticate, or just the general stochasticity of the NCAA tournament?

This will be a very interesting year for comparing the predictive ability of the RPI rating system (used by the committee, and which does not include margin of victory) with that of the Pomeroy rating system (which goes to the opposite extreme, including margin of victory with no cap). If the adjusted efficiencies are the more accurate predictor this tournament, then we are likely to have a mad, mad, mad, mad March.

I wanted to get the results out quickly, but I will have some further analysis in this post and others throughout the week. Thanks as always to Ken Pomeroy for his absolutely terrific website; without his stats, none of the fun simulators would exist!

