Wednesday, March 17, 2010

2010 NCAAT Fan Anxiety Matrix

Simulating the NCAA tournament has been pretty popular this year, although I imagine that many of the Duke haters out there are not very satisfied with the results. As expected, the results agree with the discrete log5 projections published by Basketball Prospectus (Click the links for South, East, West, and Midwest region log5 predictions). The difference between those predictions and my simulations is the type of data that can be pulled out of the simulations.

For instance, below is a list, for each of the 64 teams in the field, of the five teams it is most likely to end its season against. Duke won the whole tournament in 24% of the one million simulations; who did they lose to in the other 76%?

As you can see, California is the team most likely to send Duke home with a disappointing season (and give plenty of bloggers and media mouthpieces lots to puff their chests about). It's not surprising to see that Kansas is one of the top five teams to knock off Duke, since Duke averaged 3.5 wins per simulation, and got to the final game very often, as did Kansas.

We're calling this presentation the Fan Anxiety Matrix. Which team is the most likely to knock out your favorite team? The second tab of the spreadsheet above shows every team's chances of being knocked out by every other team. Click here to see the full spreadsheet.
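Tallying the Matrix is straightforward bookkeeping once every simulated loss is recorded. Below is a minimal Python sketch, assuming each simulation logs a (loser, winner) pair for every game played; the input format and the function names are hypothetical, not the script behind this post. Each percentage is out of all simulations, so a team's row plus its title percentage adds up to 100%.

```python
from collections import Counter, defaultdict

def fan_anxiety_matrix(simulations):
    """Count who ended each team's season.

    simulations: iterable of simulated tournaments, each a list of
    (loser, winner) pairs, one per game played (hypothetical format).
    Returns {loser: Counter({winner: number_of_eliminations})}.
    """
    matrix = defaultdict(Counter)
    for games in simulations:
        for loser, winner in games:
            matrix[loser][winner] += 1
    return matrix

def most_likely_eliminators(matrix, team, n_sims, k=5):
    """The k opponents most likely to end `team`'s season, with the
    fraction of all simulations in which each one did so."""
    return [(opp, n / n_sims) for opp, n in matrix[team].most_common(k)]

# Usage with four toy simulations (only Duke's games are listed;
# Duke wins the title in the last one).
sims = [
    [("Duke", "California")],
    [("Duke", "Kansas")],
    [("Duke", "California")],
    [("Kansas", "Duke")],
]
m = fan_anxiety_matrix(sims)
print(most_likely_eliminators(m, "Duke", len(sims)))
# [('California', 0.5), ('Kansas', 0.25)]
```

Keeping raw counts and dividing only at the end means the same tally can be reported either as a share of all simulations or, dividing by the number of losses instead, as a share of a team's eliminations.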

With a focus on the ACC teams, here are some pie charts to gaze at. Maryland fans will be pleased to see that Duke has only a 2% chance of ending their season. Unfortunately, Maryland also has only a 2% chance of winning the national championship. The other four ACC teams have an uphill battle to make it to the second weekend:

Sunday, March 14, 2010

2010 NCAA Tournament Simulations

Be sure to also check out the "Fan Anxiety Matrix": who will your team lose to this March?

The brackets are set, and for college basketball stat nerds that means simulations. A number have already popped up: 5000 simulations using Sagarin's predictor rating, and an online tool for simulating one random bracket using Pomeroy's statistics.

As I did last year, I take both approaches one step further: using Pomeroy's offensive and defensive efficiency ratings, and the log5 prediction method, I simulated the 2010 NCAA tournament one million times. My script tabulated the results below. (You can also access them in Google Spreadsheet form by clicking here).

Each column is a round of the tournament; each value is the percentage of the one million simulations that a team reached a given round. On the right-hand side is the average number of wins each team in the tournament had in the simulations.
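The average-wins column follows directly from the round columns: for a whole number of wins W, the expected value of W equals the sum, over k = 1, 2, ..., of the probability that W is at least k. A short Python sketch of that identity; the percentages in the usage line are made up for illustration and are not taken from the table.

```python
def average_wins(reach_pct):
    """Average wins per simulation from round-advancement percentages.

    reach_pct[k] is the percentage of simulations in which a team won at
    least k + 1 games (reached the Round of 32, Sweet 16, ..., title).
    For a non-negative whole number of wins W, E[W] = sum over k >= 1 of
    P(W >= k), so the average is simply the sum of those probabilities.
    """
    return sum(p / 100.0 for p in reach_pct)

# Hypothetical percentages for a strong seed (not taken from the table):
# Round of 32, Sweet 16, Elite 8, Final 4, title game, champion.
print(round(average_wins([95, 80, 60, 40, 25, 15]), 2))  # 3.15
```

This is also a handy consistency check: a team's average wins should always match the sum of its round percentages.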

Last year, hardly any first round games were likely to be upsets based on the simulation. This year is much different, thanks to several major discrepancies between Pomeroy's rankings and the seedings made by the committee. Brigham Young, in particular, is seventh overall at kenpom, while stuck in a 7-seed in this year's tournament. The best games to watch this Thursday and Friday will be the 6-11 games, as all four are slated to be near coin-flips.

The intuitive reaction here is to do a face-slap and say "Duh, the teams at the top of the Pomeroy rankings have the best chance in simulations using the Pomeroy rankings!" That dismissal would miss several key features of the simulation, and one interesting exercise is to see how the simulations correspond to our gut instincts about the matchups in each game. For example, Kentucky and Syracuse have rough roads to the national title because of very high (about 25%) chances of losing in the second round. Florida State is the culprit for Syracuse; combine that with the possibility that Arinze Onuaku will not play this weekend, and an FSU-Syracuse game on Sunday suddenly gets very interesting.

Kentucky could run into Texas in the second round, and the Longhorns are ranked much higher by Pomeroy than they are in the RPI or by humans (both the selection committee and the polls). And while Texas has been a bit of an enigma to the national media this year, it is clear that they possess the talent to be efficient on both ends of the floor. Kentucky's road is further blocked by Wisconsin, a team Pomeroy actually ranks higher than the Wildcats. That Sweet Sixteen matchup would be really bruising on the boards, as the nation's #2 offensive rebounder (Cousins) takes on the Badgers' nation-best defensive rebounding squad.

The Pomeroy rankings are typically recognized as a successful post-season assessment of the teams, including the tournament games. All six NCAA champions since Pomeroy's website launched were ranked in the top two post-season, and in the top 15 for both offense and defense. However, the pre-tournament stats are a bit rustier as a predictor: last year the "best simulation" out of one million got 53 games right, and the average simulation got 37 games right. Is it a lack of complete data, a flaw in the system's ability to prognosticate, or just the general stochasticity of the NCAA tournament?
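Scoring each simulated bracket against the real results is one way to produce numbers like "53 right, 37 on average." A minimal sketch, assuming each bracket is stored as a mapping from a game label to that game's winner; the labels and function names are hypothetical.

```python
def games_correct(predicted, actual):
    """Number of games whose predicted winner matches the actual winner.

    Both arguments map a game label (hypothetical, e.g. "South R1 #1")
    to the winning team's name.
    """
    return sum(1 for game, winner in actual.items()
               if predicted.get(game) == winner)

def best_and_average(brackets, actual):
    """Best single-bracket score and mean score over simulated brackets."""
    scores = [games_correct(b, actual) for b in brackets]
    return max(scores), sum(scores) / len(scores)

# Usage with a toy three-game bracket.
actual = {"g1": "A", "g2": "C", "g3": "A"}
brackets = [
    {"g1": "A", "g2": "D", "g3": "A"},  # 2 correct
    {"g1": "B", "g2": "C", "g3": "C"},  # 1 correct
]
print(best_and_average(brackets, actual))  # (2, 1.5)
```

Note that this counts every game equally; a bracket that picks a wrong early winner is often also wrong in later rounds, which is part of why the average score sits well below the best one.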

This will be a very interesting year for comparing the predictive ability of the RPI rating system (used by the committee, and which does not include margin of victory) with that of the Pomeroy rating system (which goes to the opposite extreme, including margin of victory with no cap). If the adjusted efficiencies are the more accurate predictor this tournament, then we are likely to have a mad, mad, mad, mad March.

I wanted to get the results out quickly, but I will have some further analysis in this post and others throughout the week. Thanks as always to Ken Pomeroy for his absolutely terrific website; without his stats, none of the fun simulators would exist!

Thursday, March 11, 2010

One Million ACC Tournament Simulations

One year ago, we had some fun with spreadsheets and used a number of different methods to predict the ACC Tournament. Unfortunately, the raw numbers had no way of knowing the status of Ty Lawson's ankle, so the team predicted to win about 30% of the time grabbed the title, while UNC rested for bigger fish. In the past year the number of Pomeroy Disciples has grown, and so traditional "log5" predictions of the conference tournaments can be found all across teh internets (although this one in particular, from Basketball Prospectus, is a must read).

I like to find my little niche here at the Immaculate Inning, and that means simulating the hell out of things. The method is the same as for last year's ACC tournament. This year, I used raw offensive and defensive efficiencies that were tabulated here. This means that a team's stats were not adjusted for home games or for strength of opponent: the only values in the stat are points scored (or allowed) per possession. Via the Pythagorean Expectation Formula (with KenPom's exponent for unadjusted efficiencies, 8.5), I calculated each team's "expected winning percentage."
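The Pythagorean step can be written in one line: expected winning percentage = OE^x / (OE^x + DE^x), with x = 8.5 for unadjusted efficiencies as stated above. A small Python sketch; the team in the usage line is hypothetical.

```python
def pythagorean_win_pct(off_eff, def_eff, exponent=8.5):
    """Expected winning percentage from offensive and defensive efficiency.

    off_eff / def_eff: points scored / allowed per possession (or per 100
    possessions; only the ratio matters). exponent=8.5 is the value cited
    above for unadjusted efficiencies.
    """
    scored = off_eff ** exponent
    return scored / (scored + def_eff ** exponent)

# Hypothetical team: 110 points scored and 100 allowed per 100 possessions.
print(round(pythagorean_win_pct(110.0, 100.0), 3))  # 0.692
```

The high exponent is what makes a modest efficiency margin translate into a large expected winning percentage: a 10% edge in points per possession is worth roughly a .690 team here.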

To determine the chance that team A beats team B, a form of Bayes' formula is applied, which in the stat-head world has come to be known as "the log5 method." The method could be applied to any scenario where the probability of a single outcome is desired, given the prior probability for each of two alternatives. Here, we have two teams, each with an expected winning percentage, and can calculate the probability of a .900 team beating an .800 team. If we assume that the result of each game is independent, then we can multiply probabilities together to get a team's overall probability of making a certain round.
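Written out, Bill James's log5 formula is P(A beats B) = (pA - pA*pB) / (pA + pB - 2*pA*pB), where pA and pB are the two expected winning percentages. A Python version, using the .900-versus-.800 example from the paragraph above:

```python
def log5(p_a, p_b):
    """Probability that team A beats team B, given each team's expected
    winning percentage (as a fraction) against an average opponent.

    Undefined when both teams are perfect or both are winless (the
    denominator is zero), so inputs should stay strictly between 0 and 1.
    """
    return (p_a - p_a * p_b) / (p_a + p_b - 2 * p_a * p_b)

print(round(log5(0.900, 0.800), 3))  # 0.692
```

Two sanity checks fall out of the formula: a team facing an exactly average (.500) opponent wins at its own expected percentage, and log5(a, b) + log5(b, a) always equals 1.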

Personally, I find the method rather deterministic for what is essentially a stochastic process. Instead, I run the tournament 1 million times and calculate the percentage of simulations in which each team makes it to each round. The results of my simulations for the 2010 ACC Tournament are below:
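The simulation loop itself fits in a few lines. This is a minimal Python sketch, not the script used for these posts: it assumes a bracket given as a list of names in bracket order (a power of two, so no byes, unlike the real 12-team ACC bracket) and a caller-supplied function for the probability that one team beats another, for example log5 applied to Pythagorean percentages. For each team it records the share of simulations in which it won at least k games, the same kind of round-by-round percentages reported in these posts.

```python
import random

def simulate_bracket(teams, win_prob, rng):
    """Play one single-elimination bracket; return {team: games won}.

    teams: names in bracket order (length must be a power of two), so
    teams[0] plays teams[1], teams[2] plays teams[3], and so on.
    win_prob(a, b): probability that a beats b.
    """
    wins = {t: 0 for t in teams}
    alive = list(teams)
    while len(alive) > 1:
        survivors = []
        for a, b in zip(alive[::2], alive[1::2]):
            winner = a if rng.random() < win_prob(a, b) else b
            wins[winner] += 1
            survivors.append(winner)
        alive = survivors
    return wins

def round_reach_pct(teams, win_prob, n_sims=100_000, seed=1):
    """For each team, the percentage of simulations in which it won at
    least k games, for k = 0 .. number of rounds."""
    rng = random.Random(seed)
    n_rounds = len(teams).bit_length() - 1  # 4 teams -> 2 rounds, 64 -> 6
    counts = {t: [0] * (n_rounds + 1) for t in teams}
    for _ in range(n_sims):
        for team, w in simulate_bracket(teams, win_prob, rng).items():
            for k in range(w + 1):
                counts[team][k] += 1
    return {t: [100.0 * c / n_sims for c in row] for t, row in counts.items()}

# Usage: a hypothetical four-team bracket, with log5 on made-up
# expected winning percentages as the game model.
pct = {"A": 0.90, "B": 0.60, "C": 0.80, "D": 0.50}

def log5_prob(a, b):
    pa, pb = pct[a], pct[b]
    return (pa - pa * pb) / (pa + pb - 2 * pa * pb)

result = round_reach_pct(["A", "B", "C", "D"], log5_prob, n_sims=20_000)
for team, row in result.items():
    print(team, [round(x, 1) for x in row])
```

Passing in a seeded `random.Random` keeps runs reproducible, and separating the game model (`win_prob`) from the bracket logic makes it easy to swap in all-games versus conference-only efficiencies.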

The spreadsheet has two tabs, one for a simulation done using stats from all games, while the other is for ACC games only. The way to read it is that each team (row) won a certain number of games (0,1,2,3, or 4) in a certain percentage of the 1 million ACC tournaments I simulated. For the top four seeds, the maximum number of wins is 3, while the other 12 teams could potentially win four games and the tournament.

Duke's chances of winning the tournament are severely reduced if stats from the entire season are used: they go from a near 2-to-1 favorite to not even winning a majority of simulations. Part of this has to do with the raw nature of the efficiencies; accounting for Duke's tough schedule (and it was one of the toughest in the country by almost any measure: KenPom, Sagarin, RPI) would probably account for most of the discrepancy.

On the other end of the spectrum is Miami, whose chances rose dramatically (from 0.1% to 3.4%) because they had a highly positive efficiency margin for all games but a highly negative one in ACC games only. The Canes played very, very well against a bunch of schools I've barely heard of, and then got clobbered in ACC play. Their adjusted efficiency margin is still decent due to the ACC games they played, but it's hard to give the full-season stats much regard in this instance.

The numbers for the ACC-only simulations differ from those seen at Basketball Prospectus; I imagine most of the differences here also have to do with using raw efficiencies rather than Pomeroy's adjusted numbers. The adjusted numbers, Pomeroy claims, are the best for predicting "the chance of beating an average D-1 team on a neutral floor." The raw numbers, then, are skewed by home-court advantage, schedule (remember, the ACC is no longer "balanced"), and the overall strength of the offenses and defenses a team faces. In particular, the predictions differ in that Maryland's chances are reduced, while FSU and VPI get better chances. Both methods agree that fifth-seeded Wake has one hellish path toward an ACC title, much worse than sixth-seeded Clemson's. Overall, it will be interesting to see whether raw or adjusted efficiencies do a better job predicting the ACC tournament.

Another advantage that these simulations have is the amount of fun I can have with the results. Below I present the "Fan Anxiety Matrix." Each cell in the Matrix represents the chances that a team (in the rows) loses in the ACC tournament to a specific team (in the columns):

So, Duke's "Fan Anxiety Matrix" says that, in the 37% of simulations when they didn't win the whole thing, the most common opponent taking down the Blue Devils was Maryland (using the ACC-only stats here). Perhaps no surprise there, but there were still 8.3% of the simulations in which the Blue Devils fell in the semifinals to Virginia Tech. Duke's first game is against either Boston College or Virginia, and the combined percentage of simulations in which the Blue Devils' ACC run ended against those two teams was six percent. It should be of some comfort that UNC's chances of taking down Duke (this would have to be in the finals) clocked in at a tiny 0.029%.

Looking at the matrix, Clemson's path is an interesting one. The Tigers are seeded sixth and must get past NCSU and FSU just to reach the semifinals; the Matrix has them losing to these teams 22.5% and 37.1% of the time, respectively. Maryland (22.4%) and Duke (10.1%) also appear in the double-digit percentages as Clemson's final ACC foe, with the remaining 3.2% speaking for Clemson's ACC title chances.

Virginia Tech's bubble position would certainly be helped by a win in their quarterfinal matchup; things are looking up according to the simulations, which have them falling to Duke in the semifinals 50% of the time. Wake Forest has a rough road to the ACC title, as they must win Thursday versus Miami (losing %: 30.5), Friday versus Virginia Tech (39.5%), and Saturday versus (with 94% probability) Duke, who accounted for a further 25% of Wake's losses in the simulations.

For posterity's sake, here are the official Immaculate Inning ACC Tournament Predictions:

Thursday winners: Virginia, Wake Forest, Georgia Tech, Clemson
Friday winners: Duke, Virginia Tech, Maryland, Clemson
Saturday winners: Duke, Maryland
ACC Champion: Duke 75, Maryland 60

Friday, March 05, 2010

The Last Time

The average US price for a gallon of gasoline was $1.82.

One share of Google stock was $187.40.

The HMS Scott reveals, via mapping of the seafloor, a 100 m landslide at the epicenter of the deadly 2004 earthquake/tsunami in the Indian Ocean.

Year 4702 (Year of the Rooster) began in the Chinese calendar.

The #1 song on the Billboard Hot 100 chart was "Let Me Love You" by Mario.

The top movie at the box office was "Boogeyman." It was about to be replaced by "Hitch."

Cuba begins a ban on smoking in public places.

President George W. Bush, just two weeks into his second term, announces a tax increase.

Two months before his death, Pope John Paul II allows an American cardinal to give the Ash Wednesday address from the Vatican.

#8 Duke beat #2 North Carolina in Cameron Indoor Stadium, 71-70, after Rashad McCants dribbled the ball off his foot with two seconds left.

Duke students, acting like They Have Been There Before, stay in the stands to sing the alma mater, rather than rushing the court. Students proceed to burn shit in a disorderly fashion.

That was Wednesday, February 9, 2005. The last time Duke beat UNC at Cameron.

Carolina delenda est.

Tuesday, March 02, 2010

Five Teams That Should Not Scare Duke In March

Duke fans, I've noticed, have a tendency to be of the self-hating variety. Threads at Duke Basketball Report are filled with skepticism for (supposedly) one's own team, which seems like it just doesn't translate well into pleasurable fan experiences. Some fans even go as far as concocting reasons why a win in a given game is not a big deal: I even saw a fan state that "some years the best team doesn't win it all; if Duke won the national championship, this would be one of those years." How sad a fan existence is that?

To combat the rampant pessimism, this post will identify the types of NCAA tournament teams Duke matches up extremely well against. I will examine whether some of the teams rated top in the country by pollsters and bracketologists are also the types of teams that match up well against Duke in tempo-free statistics. Here I will look at teams that match up poorly against Duke; later I will take a look at some "scarier" teams. My approach is to examine a path towards the National Championship that includes possible opponents at each level of the NCAA Tournament, and which teams/types of teams Duke would love to see. It is unlikely that Duke will have trouble with its first round opponent this year, so that leaves five rounds, and five example teams.

Clearly, this analysis involves looking at precisely zero video. The only team I have consistently watched all year is Duke; I may have tuned into a game here or there for other teams, but that one game could bias my thoughts on non-ACC teams. From the tempo-free perspective, there are a number of things Duke does extremely well:

1) Offensive Rebounding (7th in nation at 40.7%). Brian Zoubek is no longer Duke's best-kept secret weapon, and his improved offensive efficiency rating over the last few games complements his best-in-the-nation offensive rebounding ability.

2) Free Throw Percentage (8th in nation at 75.9%). While other teams can't really control how Duke does from the line, they can certainly be a team that fouls a lot, sending Duke to the line for some free points.

3) Three-Point Shooting Defense (1st in nation at 26.7%). Mid-major teams which lack size and rely on the three-pointer are going to have long nights against Duke in March.

4) Lack of Turnovers (16.4% of possessions, 11th in nation). Duke's turnover rate has been particularly excellent over their last few ACC games (and would be even lower if not for several turnovers by the bench players in garbage time against Virginia).

5) Height. Pomeroy lists two kinds of height on his statsheet. The first is the raw average in inches (Duke's average height is 2nd in the nation at 79.0"). Clearly, height should be weighted by playing time, and Pomeroy's second measure does this. He points out that height is correlated with offensive efficiency with r-squared = 0.27, and with defensive efficiency with r-squared = 0.38. Duke's "Effective Height" is +4.9, meaning they are about five inches taller than the average Division-1 team. Here are the tallest teams in the nation, and Duke has already beaten four of the top ten!
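To see why weighting by playing time matters, here is a sketch of a simple minutes-weighted average height. Pomeroy's published Effective Height has its own definition; this only illustrates the general weighting idea, and the roster and the D-1 average passed in are hypothetical inputs.

```python
def minutes_weighted_height(players):
    """Average height in inches, weighting each player by minutes played.

    players: list of (height_inches, minutes) pairs.
    """
    total_minutes = sum(minutes for _, minutes in players)
    return sum(h * minutes for h, minutes in players) / total_minutes

def height_vs_average(players, d1_average):
    """Inches above (+) or below (-) a supplied Division-1 average
    (d1_average is an assumed input, not a published constant)."""
    return minutes_weighted_height(players) - d1_average

# Hypothetical roster: a 7-footer playing 30 minutes outweighs a
# 6-0 guard playing 10.
roster = [(84, 30), (72, 10), (78, 40)]
print(minutes_weighted_height(roster))  # 79.5 (unweighted average: 78.0)
```

A tall player who rarely leaves the bench barely moves the weighted number, which is the whole point of weighting.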

Last week, I pointed out the aspects of the tempo-free analysis that Duke struggles with, and not much has changed so refer to that post for Duke's weaknesses.

Looking at the 7, 8, 9, and 10 seeds in Joe Lunardi's latest bracketology is a good way to find teams Duke is likely to see in the second round of the tournament. There are a bunch of ACC teams there, but we can leave them out since the committee does not like to put teams from the same conference on collision courses before the regional final. Which of these teams sticks out as matching up poorly against Duke?

1 (Thirsty Thirty-Two) Oklahoma State. Picked to be a #8 seed by Joe Lunardi, and firmly in the "Also Receiving Votes" portion of both human polls, the Cowboys are a possible second-round opponent for Duke. Their overall offensive (112.8; 31st) and defensive (94.1; 62nd) efficiencies are about the middle of the pack for possible NCAAT teams, as you might expect for a team with a middle seed. When broken down into component factors, there's a lot to like about this team from a Duke perspective. First of all, they are a short team, with an effective height more than an inch below the D-1 average; the tallest player on the roster is 6-8 junior Matt Pilgrim, who plays less than 40% of the minutes.

The most convincing matchup advantage for Duke comes with shot selection. Oklahoma State is near the top of the pile in relying on the three-point shot: 32.7% of their points come from behind the arc, 45th in the nation (the highest among possible NCAAT teams). Further, they are not particularly good at making three-pointers, shooting just 35.6%, barely in the top third of all D-1 teams. Finally, the Cowboys struggle to grab offensive rebounds: they rebound just 30.2% of their own missed shots, 264th in the nation. The combination of chucking up threes, not making many of them, and struggling to rebound is a strategy that plays very poorly against the Blue Devils, who have superior rebounding and 3-point defense.
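Stats like "32.7% of their points come from behind the arc" (and the free-throw shares quoted for Pitt and Kansas State below) come from splitting a team's points by shot type. A sketch of that arithmetic, assuming you have made free throws, made twos, and made threes; the season totals in the usage line are hypothetical.

```python
def point_distribution(ftm, fg2m, fg3m):
    """Fraction of a team's points from free throws, twos, and threes."""
    points = {"FT": ftm, "2PT": 2 * fg2m, "3PT": 3 * fg3m}
    total = sum(points.values())
    return {kind: pts / total for kind, pts in points.items()}

# Hypothetical season totals: 400 FTM, 600 two-point FGM, 250 three-point FGM.
shares = point_distribution(400, 600, 250)
print({k: round(v, 3) for k, v in shares.items()})
# {'FT': 0.17, '2PT': 0.511, '3PT': 0.319}
```

Note that this measures reliance, not accuracy: a team can lean heavily on threes while shooting them poorly, which is exactly the Oklahoma State profile described above.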

2 (Sweet Sixteen) Pittsburgh. This team is slated to be seeded in the 3-5 range, and is therefore a possible Sweet Sixteen opponent, and one that the tempo-free stats say would be good for Duke. The biggest reason that jumps out is, once again, height. Pittsburgh has an effective height a full inch below average, especially at guard and forward. They do have 6-10 center Gary McGhee, who plays nearly 60% of the minutes but is not much of a factor offensively. Defensively, McGhee has a good block rate and good defensive rebounding ability, so most of Duke's height advantage in this one comes from Kyle Singler and Jon Scheyer, who tower over their likely defensive counterparts.

One thing Duke would not be doing in a game with Pittsburgh is wasting offensive possessions. While the Panthers are good at limiting their opponents' eFG% (44.1; 20th), they are mediocre-to-bad at the other three factors. Duke protects the basketball pretty well on offense, and Pittsburgh rarely forces their opponents into turnovers (defensive turnover rate: 16.8; 335th). Pitt also doesn't have much success on the defensive boards (DR% 68.9; 106th), which once again sets up the scenario of Duke dominating the offensive glass and making every possession count.

Pitt also plays at a rather plodding pace (62.4 poss/game; 325th), and relies rather heavily on getting to the free throw line to score (23.9% of their points come from the charity stripe, 42nd most in the nation). If this game were played in November, this could have been an advantage for Pitt, but Duke has reduced their fouls in recent weeks, with good results. The tempo-free stats don't say much about the "quick guards" factor, but Pitt does have a high percentage of their baskets accompanied by assists (67.1%; 4th). If this is the signature of a quick team using a lot of backdoor cuts, that could potentially hurt Duke's defense, but this signal is not nearly as strong as the possibility of Duke dominating Pitt on the offensive glass.
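The Pitt breakdown leans on Dean Oliver's "four factors": shooting (eFG%), turnovers (TO%), offensive rebounding (OR%), and getting to the line (free throw rate). A sketch of the standard box-score versions; the free throw rate here is FTA/FGA, possessions are passed in rather than estimated, and the single-game numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class BoxTotals:
    """Counting stats for one side of a game (or a season)."""
    fgm: int   # field goals made
    fga: int   # field goals attempted
    fg3m: int  # three-pointers made
    fta: int   # free throws attempted
    orb: int   # offensive rebounds
    drb: int   # defensive rebounds
    tov: int   # turnovers

def four_factors(team, opp, possessions):
    """Offensive four factors for `team` against `opp`, in percent.

    `possessions` is passed in rather than computed, since estimates vary.
    Defensive factors are the same call with the arguments swapped.
    """
    return {
        "eFG%": 100 * (team.fgm + 0.5 * team.fg3m) / team.fga,
        "TO%": 100 * team.tov / possessions,
        "OR%": 100 * team.orb / (team.orb + opp.drb),
        "FTRate": 100 * team.fta / team.fga,
    }

# Hypothetical single game.
team_a = BoxTotals(fgm=25, fga=60, fg3m=6, fta=20, orb=12, drb=24, tov=13)
team_b = BoxTotals(fgm=22, fga=58, fg3m=5, fta=15, orb=8, drb=22, tov=15)
off = four_factors(team_a, team_b, possessions=65)
dfn = four_factors(team_b, team_a, possessions=65)
print({k: round(v, 1) for k, v in off.items()})
# {'eFG%': 46.7, 'TO%': 20.0, 'OR%': 35.3, 'FTRate': 33.3}
print("DR%:", round(100 - dfn["OR%"], 1))  # DR%: 75.0
```

Because every rebound chance goes to one side or the other, a team's defensive rebounding percentage is just 100 minus its opponent's offensive rebounding percentage, which is why Pitt's modest DR% and Duke's strong OR% describe the same mismatch.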

3 (Elite Eight) New Mexico. I'm all for giving respect to mid-major teams, when they deserve it. Ranked #8 in the nation in the polls, #7 in RPI, and given a #2 seed in Joe Lunardi's latest bracketology, the Lobos have things looking up. With the clear caveat that I have never seen them play, I'm not really sure what all the fuss is about. Ken Pomeroy is with me, and has used New Mexico to drive home a point: as long as the NCAA fails to consider margin of victory when making selection and seeding decisions, it will continue to misjudge, and mis-seed, the best teams in the country. Case in point: the Sagarin rankings have two parts. In one (ELOCHESS), which only considers wins and losses, New Mexico is ranked #8. But when points scored and allowed are taken into account (in the PREDICTOR), New Mexico drops to #33. So, really, most elite teams this year should not be scared of New Mexico.

Duke has some specific advantages, though. One is, again, height: the Lobos are an average-sized team (effective height +0.1"). New Mexico has been pretty good on the defensive boards (DR% 72.7; 11th best), but a mid-major faces mostly short teams, so it takes a little digging to see how they did against tall ones. Indeed, New Mexico's worst game on the defensive boards was their loss at San Diego State, a team that is elite at only one thing: offensive rebounding (6th in the nation). UNM's best rebounder is Darington Hobson, who at 6-7, 205 is not the kind of player you really want battling against Brian Zoubek...

New Mexico has other weaknesses that play into Duke's strengths, such as an above-average reliance on three-pointers, a generally slow pace, and inexperienced players (241st in Pomeroy's experience stat, versus 81st for Duke). However, if New Mexico gets a 2-seed, they won't see Duke until the regional final at the earliest, and the stats suggest that the Lobos will lose well before that point.

4 (Final Four) Villanova. 'Nova left the most recent bitter taste in the mouths of Duke fans. For this reason, a Duke fan may not be able to see past some simple truths: a) Duke, in 09-10, has a better, taller, more versatile team than in 08-09, and b) Villanova's team is about the same as it was last year. Yes, they still have "quick guards," although the tempo-free analysis doesn't really see much beyond their shooting percentages and overall efficiency; Reynolds, Fisher, and Pena all have offensive ratings above 115.0, which is pretty good. The Wildcats are ranked highly enough that Duke would be unlikely to face them until the Final Four, so we will consider them here.

The tempo-free stats don't pull out a whole lot of things that Villanova does really well. Their free throw percentage (75.6%) is comparable to Duke's, and they pull in a good number of offensive rebounds, especially considering their effective size (which is basically average). While teams at the top of the rankings are going to be fairly good at most things, what we're looking for is whether Villanova's particular strengths and weaknesses line up with Duke's weaknesses and strengths (as defined by tempo-free stats). The biggest red flag for Duke is pace: Villanova averages 75.8 possessions/game (11th), which we identified as a potential pitfall for a Duke team that averages just 67.3 poss/g (178th). Villanova also has a balanced attack, not relying too much on 3-pointers, 2-pointers, or foul shots.

But there are more red flags for Villanova. First and foremost: they foul. A lot. A defensive free-throw-rate of 49.7 is the ninth-highest in the nation, and Duke's elite free throw ability will give the Blue Devils lots of free points in this matchup. Villanova has had some atrocious games (at Georgetown and vs UConn) in which the opposing team shot more free throws than they made field goals! That is not a very good winning strategy, especially when in general your defense, relative to other supposed NC-contenders, is lacking. They allow a higher-than-average percentage of 3-point shots (3PTA/FGA = 38.2; 316th lowest), and their opponents make an above-average amount (3PT% against = 33.4; 140th). They are not particularly great on the defensive boards (DR% = 68.5; 122nd), and they don't force many turnovers (Defensive Turnover Rate = 22.0; 100th). It would only take an average-shooting night for Duke to win this one handily.
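Pace numbers like Villanova's 75.8 and Duke's 67.3 possessions per game rest on estimating possessions from the box score. A sketch of the common estimate FGA - OR + TO + c * FTA; the free-throw coefficient c = 0.475 is an assumed value (0.44 is also widely used), and the game numbers are hypothetical.

```python
def estimate_possessions(fga, orb, tov, fta, ft_coef=0.475):
    """Possessions ~= FGA - OR + TO + ft_coef * FTA.

    ft_coef approximates the share of free-throw attempts that end a
    possession; 0.475 is an assumed value (0.44 is also common).
    """
    return fga - orb + tov + ft_coef * fta

def pace(games):
    """Average possessions per game.

    games: list of (team_possessions, opponent_possessions) estimates;
    both teams get (nearly) the same number of possessions, so each
    game's two estimates are averaged first.
    """
    per_game = [(ours + theirs) / 2 for ours, theirs in games]
    return sum(per_game) / len(per_game)

# Hypothetical game: 60 FGA, 12 OR, 13 TO, 20 FTA for one team.
print(round(estimate_possessions(60, 12, 13, 20), 1))  # 70.5
```

Working per possession rather than per game is what lets a fast team like Villanova and a slower team like Duke be compared on equal terms.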

5 (National Championship Game) Kansas State. Okay, I will fully admit that this selection is a little bit of cheating, so let me explain. Should Duke get to the National Championship game, the teams they could face there are necessarily much stronger than those in the earlier rounds. Among the rest of the Top 10 in the human and computer polls, many teams make for matchups that are extremely tough to call from a tempo-free perspective, because each team is pretty good at a lot of things. For teams like Syracuse and Kentucky, there aren't any aspects that scream out "Duke is good at this aspect and the opponent sucks at that aspect!" Or vice versa. There are also teams for which there is convincing evidence that Duke would struggle... but that is for the next post. Finally, picking a Hummel-less Purdue just seems too easy, and without Hummel they probably don't make it to the title game anyway. So of the consensus top teams, the one against which Duke would have the easiest time is Kansas State.

The component of Kansas State's game that speaks the loudest is fouls. I would imagine that most Kansas State games take three hours or more, given how much time the Wildcats (Free Throw Rate: 53.3; 2nd) and their opponents (Free Throw Rate Allowed: 46.7; 309th) spend at the free throw line. Kansas State gets more than a quarter of their points from free throws, way more than Pittsburgh (see above), and the 19th most in the nation. However, once they get there they shoot only 66%, with only one of their regulars (Jacob Pullen) north of 75% individually.

They get around all this fouling by having a revolving door for a frontcourt. Three players (6-9 Wally Judge, 6-10 Luis Colon, and 7-0 Jordan Henriquez-Roberts) each play just 25% of the minutes (that's 10 min/game), and each has a foul rate above 7.0 per 40 minutes. The rest of the frontcourt minutes belong to 6-8 Curtis Kelly, who has decent defensive numbers but isn't a standout offensive star.

From the team perspective, Kansas State does a lot of the same things as Duke. They are one of the few teams better than the Blue Devils on the offensive glass, grabbing 41.2% of their missed shots, which are frequent (they shoot 36% from beyond the arc and 50% inside it, both rather mediocre). However, Kansas State, whether by design or by ability, struggles on the defensive glass, much more so than Duke does. This is a recipe for a physical game, in which both teams extend possessions with offensive rebounds and putbacks or end them at the free throw line. There is one wrinkle that makes the comparison less symmetrical: Kansas State has a high turnover rate (21.4; 216th), and while Duke's defensive turnover rate is not great (22.1; 79th), the turnover battle certainly tips the Blue Devils' way. Overall, in a foul-shooting battle, Duke has the definite advantage, given their much better free throw shooting. The game would come down to which team lost its best interior player to foul trouble.

This window into Duke's potential road to the national championship is far from perfect. It leaves out a lot of factors, some that are "intangible" and others that are tangible but simply not measured (unless someone wants to watch thousands of hours of tape to quantify how "quick" certain teams' guards are). And while my cherry-picking of teams may seem highly favorable to Duke, I hope that I've at least illustrated that there are teams out there against which Duke stacks up mightily. The tempo-free statistics say that Duke is the best team in the nation, by a fairly wide margin. While self-hating fans and Duke-haters alike may not consider Duke to be on their list of "scary" teams, there is clearly room for optimism in Durham.