Tuesday, March 17, 2009

Upset Special!

Hello again, and welcome back to Immaculate Inning as we continue our week-long dive into the NCAA tournament, simulation style. In case you missed the earlier posts, I've simulated the tournament one million times and pulled from the data the most likely championship games and Final Fours. The all-mighty spreadsheet is linked (here).

This time I'm going to look much earlier in the tournament, as we fast approach the most exciting weekend of the sports year. Everybody loves a Cinderella, and everyone wants to brag about how they picked the upsets that filled the perfect brackets at work on Monday. This is going to be different from upset analysis you may have seen elsewhere, such as AccuScore, which simulates individual games 10,000 times. I've simulated the result of each game in the tournament once, then repeated that whole process one million times. That many simulations gives me statistical power that not even the flashy WhatifSports can match.
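For anyone curious what "simulate each game once, then repeat" looks like in practice, here is a minimal sketch on a single four-team pod rather than the full bracket. It is not my actual simulation code. The names (WIN_PROB, prob_beats, play, simulate_pod, sweet_sixteen_rates) are made up for illustration, and only the two first-round probabilities come from this post (USC at 65.5% and a #15 seed at about 8%); the second-round probabilities are hypothetical placeholders.

```python
import random

# Head-to-head win probabilities for one four-team pod, keyed (team_a, team_b)
# and giving the chance that team_a beats team_b.
# Only the two first-round numbers come from the post; the rest are
# hypothetical values chosen for illustration.
WIN_PROB = {
    ("#2", "#15"): 0.92,    # from the post: a #15 seed won about 8% of the time
    ("#10", "#7"): 0.655,   # from the post: USC over Boston College
    ("#2", "#10"): 0.65,    # hypothetical
    ("#2", "#7"): 0.70,     # hypothetical
    ("#10", "#15"): 0.85,   # hypothetical
    ("#7", "#15"): 0.80,    # hypothetical
}


def prob_beats(a, b):
    """Return the probability that team a beats team b."""
    if (a, b) in WIN_PROB:
        return WIN_PROB[(a, b)]
    return 1.0 - WIN_PROB[(b, a)]


def play(a, b, rng):
    """Simulate one game: pick a winner at random, weighted by win probability."""
    return a if rng.random() < prob_beats(a, b) else b


def simulate_pod(rng):
    """Play both first-round games once, then the second-round game once."""
    top = play("#2", "#15", rng)
    bottom = play("#7", "#10", rng)
    return play(top, bottom, rng)


def sweet_sixteen_rates(trials, seed=0):
    """Repeat the pod many times and return each team's Sweet Sixteen share."""
    rng = random.Random(seed)
    counts = {"#2": 0, "#7": 0, "#10": 0, "#15": 0}
    for _ in range(trials):
        counts[simulate_pod(rng)] += 1
    return {team: n / trials for team, n in counts.items()}


if __name__ == "__main__":
    for team, rate in sweet_sixteen_rates(100_000).items():
        print(f"{team}: {rate:.1%}")
```

The point of the design is that every trial plays out a whole path through the pod, so a team's Sweet Sixteen share automatically accounts for every opponent it might meet, not just the most likely one.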

First, let's look at the upsets that are matters of probability; the efficiency ratings say, point blank, that the lower seed should be favored to win.

Upset Special #1: #10 Southern California (65.5%) over #7 Boston College (34.5%). The Trojans have the best chance of winning their first-round game of any double-digit seed, and they might not even have been in the tournament if they hadn't captured the Pac-10 tournament title. Both teams are strong on the offensive glass and weak on the defensive glass, and neither team takes very many threes. This game could be a bruiser in the paint. One trouble spot for USC's upset potential is poor free-throw shooting; in a close game, Boston College has a clear edge there.

Upset Special #2: #12 Wisconsin (53.1%) over #5 Florida State (46.9%). As an avid fan of nearly all ACC teams when it comes to the tournament, this one hurts. The Seminoles enter the big dance as one of the hottest teams in the nation, knocking off (an admittedly wounded) North Carolina on the way to a runner-up finish in the ACC Tournament. Toney Douglas is exactly the kind of player who can go off in a big tournament and carry his team a long way. Wisconsin, meanwhile, is plodding (59.9 possessions ranks 334th out of 344 Division I teams) and mistake-free (#5 in turnovers/possession and #6 in steals/possession in the nation on offense). The Badgers also failed to win twenty games and have no one particularly scary. This is one where I personally would have a hard time following my own simulation, but the Seminoles won just 0.82 games on average, by far the worst among the #5 seeds.

In terms of pure upsets predicted by the simulations, that's it for the first round. In general, if we were grading the committee based upon how well they matched higher seeded teams with higher Pomeroy efficiency ratings, they did pretty well. However, there are quite a few games that are "too close for comfort," when taking the seeds into account.

TCFC #1: #3 Kansas (80.7%) vs #14 North Dakota St (19.35%). NDSU, in its first tournament in its first year of eligibility, is a favorite upset pick among statheads like myself. The numbers were prettier a few weeks ago, but the Thundar (really? Thundar?) have a pretty good offense for a minor-conference team. They can shoot the lights out from three (40.2%, 10th in the nation), and Kansas hasn't defended the 3 very effectively this season. They also protect the ball pretty well (14th in turnovers/possession), while Kansas does not (244th). Bill Self's squad could be in trouble with this one.

TCFC #2: Dueling #13 seeds. Mississippi St (23.8%) and Cleveland St (24.9%) both have much higher chances than their seeds suggest of knocking off their respective 4-seeds (Wake Forest and Washington). While the SEC champs would make for a nice story, the clear media favorite would be Cleveland St, a team which upset Butler in the Horizon League final to make the tournament. The Vikings won't scare anyone offensively, but they have a defense that is among the nation's best at taking the ball away. Washington, meanwhile, is in the middle of the pack in taking care of the ball, and its size should be more than enough to take care of Cleveland St. If I were the Huskies, I wouldn't be sleeping easy about a 1-in-4 chance of losing, however.

As for Wake Forest, I think we're noticing a trend: my simulation hates ACC teams not named Duke or Carolina. The other ACC team not mentioned yet is Maryland, and my simulation has Maryland winning the fewest games on average of any 10 seed, although they have a better shot at winning their opening-round game than Michigan does, barely (35%). The folks filling out their brackets on ESPN disagree strongly, favoring Maryland over Cal 2-to-1.

Most casual bracket-fillers will lose interest once their brackets are busted sometime Sunday evening, but whoever picks the correct surprise Sweet Sixteen teams is going to be the one bragging come Monday morning. So which low-seeded teams have the best chance to be standing after this weekend? These teams showed up in the Sweet Sixteen in at least ten percent of the simulations:

Wisconsin (#12 E): 26.5%
Southern California (#10 MW): 26.3%
Arizona (#12 MW): 17.9%
Michigan (#10 S): 10.9%
Minnesota (#10 E): 10.3%

I think it would be wise to be cautious about picking these #10 seeds to win two games this weekend. To see why, consider what the simulation was doing: picking the winner of each game at random, weighted by expected winning percentage. So in some number of trials, the #2 seeds fell in the first round (Robert Morris and Morgan St. each won 8% of the time, for example). In those scenarios in which the #15 and #10 teams both won, the #10 seed is going to be a heavy favorite in the second-round game. This inflates the chances of a #10 team making it to the Sweet Sixteen; part of that number comes from the easier path rather than from the ability of the #10 seed to beat the #2 seed, by far the more likely opponent.
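To put a rough size on that effect, a #10 seed's Sweet Sixteen chance can be split into the two paths it can take: through the #2 seed, or through the #15 seed. The sketch below does that arithmetic. The function name sweet_sixteen_split and its parameters are made up for illustration; the 65.5% and 8% figures come from this post, while the two second-round probabilities (35% against a #2 seed, 85% against a #15 seed) are hypothetical stand-ins, not numbers from my spreadsheet.

```python
def sweet_sixteen_split(p_first_round, p_two_seed_wins,
                        p_beat_two_seed, p_beat_fifteen_seed):
    """Split a #10 seed's Sweet Sixteen chance by second-round opponent.

    Returns (share that goes through the #2 seed,
             share that goes through the #15 seed).
    """
    via_two = p_first_round * p_two_seed_wins * p_beat_two_seed
    via_fifteen = p_first_round * (1.0 - p_two_seed_wins) * p_beat_fifteen_seed
    return via_two, via_fifteen


if __name__ == "__main__":
    # 0.655 and 0.92 (that is, 1 - 0.08) come from the post;
    # 0.35 and 0.85 are hypothetical second-round win probabilities.
    via_two, via_fifteen = sweet_sixteen_split(0.655, 0.92, 0.35, 0.85)
    total = via_two + via_fifteen
    print(f"through the #2 seed:  {via_two:.1%}")
    print(f"through the #15 seed: {via_fifteen:.1%}")
    print(f"total:                {total:.1%}")
```

Swapping in different second-round guesses shows how much of a #10 seed's total depends on the #2 seed losing first.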

The #12 seed "Cinderellas" (not that major conference teams could ever count as such) are a different story. Their upset win pits them against a #4 seed or, at best, a similarly seeded #13. Their high percentage really does suggest good matchups.

To finish, I present the best chances of winning two games this weekend, by seed:

1 seed: Louisville (80.23%)
2 seed: Memphis (83.98%)
3 seed: Missouri (60.50%)
4 seed: Gonzaga (68.66%)
5 seed: Purdue (47.16%)
6 seed: UCLA (54.30%)
7 seed: Clemson (34.66%)
8 seed: Brigham Young (24.29%)
9 seed: Tennessee (13.42%)
10 seed: Southern California (26.29%)
11 seed: Temple (9.05%)
12 seed: Wisconsin (26.51%)
13 seed: Cleveland St. (8.33%)
14 seed: North Dakota St. (4.01%)
15 seed: Robert Morris (1.62%)
16 seed: East Tennessee St. (1.09%)... yes, they have a 6% shot at beating Pittsburgh....
