Monday, December 28, 2009

2009 AFC Playoff Scenarios

One year ago, we constructed a popular post about the playoff permutations in the AFC East. This year, a logjam in the middle of the AFC leaves many meaningful games for week 17.

It is possible that five teams could finish 9-7 (Jets, Denver, Baltimore, Pittsburgh, Houston). In this scenario, record in conference games separates the teams first. This eliminates the Texans and Steelers, since both teams would finish 6-6 in conference, while the other three teams would finish 7-5. Next, record in common games applies:

Versus NE, OAK, IND, and CIN:
Jets: 4-1
Broncos: 3-2
Ravens: 1-3

This gives the Jets the #5 seed. The tiebreaker then restarts between the two remaining teams, and the Ravens take the #6 seed thanks to their head-to-head victory over the Broncos.

A different, crazier possibility exists. With one week to play, EIGHT AFC teams could finish at a mediocre 8-8. In addition to putting a spring in the step of ex-commissioner (and lover of parity) Paul Tagliabue, that would bring down on the NFL the full power of its sometimes mysterious tiebreaking procedures. So what if the stars align and all eight teams finish 8-8?

The key is that ties within divisions are broken first, with only one team per division advancing:

Within Division:
East: Mia > NYJ (head2head)
North: Bal > Pit (Division record)
West: Den
South: Jax (3-way h2h):
Jax 3-1 vs Hou, Ten
Ten 2-2 vs Hou, Jax
Hou 1-3 vs Ten, Jax

In a multi-team tiebreaker, head-to-head only applies if one team swept all the others, and these teams did not all play each other, so Baltimore's win over Denver is irrelevant here.

That leaves the games-within-conference tiebreaker, and an 8-8 Jacksonville would have 7 wins in conference, so they're in.
The remaining three teams would be 6-6 in conference.
The next tiebreaker is record in common games (minimum four), and each team has played NE, PIT, IND, and SD:

Denver 2-3
Baltimore 2-3
Miami 2-3

This, troublesomely, leaves each team at 2-3, causing us to enter the magical land of Strength of Victory. Luckily, in the Everyone 8-8 Scenario, many games have been decided for us.

Denver beat: CIN (11), CLE (4), OAK (6), DAL (10, play PHI), NE (11), SD (12, play WAS), NYG (8, play MIN), KC (4)
Ravens beat: KC (4), SD (12, play WAS), CLE (4), DEN (8), CLE (4), PIT (8), DET (2, play CHI), CHI (5, play MIN and DET)
Miami beat: BUF (5, play IND), NYJ (8), NYJ (8), TB (3, play ATL), CAR (7, play NO), NE (11), JAX (8), PIT (8)

So the opponent win totals we know for sure:

DEN: 66
BAL: 47
MIA: 58

The superior record of teams the Broncos have beaten is insurmountable, and so the Broncos would get the final playoff spot.
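For anyone who wants to check the arithmetic, here is a minimal Python sketch of the cascade above. It assumes a simplified version of the wild-card procedure: only the three steps used in this post (conference record, common games, Strength of Victory), with the head-to-head sweep skipped because these teams did not all play each other. The numbers come straight from the lists above; the function and variable names are my own, not anything official from the NFL. Because every team here has exactly eight wins, comparing the summed win totals of beaten opponents ranks them the same way as comparing their combined winning percentage.

```python
# Sketch of the wild-card tiebreak cascade in the Everyone 8-8 Scenario.
# Simplified: only the three steps used in the post are modeled.

# One 8-8 team per division survives the division tiebreakers.
POOL = ["MIA", "BAL", "DEN", "JAX"]

CONFERENCE_WINS = {"MIA": 6, "BAL": 6, "DEN": 6, "JAX": 7}

# Record vs NE, PIT, IND, and SD (every team left at this step is 2-3).
# Jacksonville is settled before this step, so it is not listed.
COMMON_GAME_WINS = {"MIA": 2, "BAL": 2, "DEN": 2}

# Final win totals of each team a club beat; a team beaten twice appears twice.
BEATEN_OPPONENT_WINS = {
    "DEN": [11, 4, 6, 10, 11, 12, 8, 4],  # CIN, CLE, OAK, DAL, NE, SD, NYG, KC
    "BAL": [4, 12, 4, 8, 4, 8, 2, 5],     # KC, SD, CLE, DEN, CLE, PIT, DET, CHI
    "MIA": [5, 8, 8, 3, 7, 11, 8, 8],     # BUF, NYJ, NYJ, TB, CAR, NE, JAX, PIT
}


def strength_of_victory(team):
    """Total wins of beaten opponents (ranks like SOV when win counts match)."""
    return sum(BEATEN_OPPONENT_WINS[team])


STEPS = [
    ("conference record", lambda t: CONFERENCE_WINS[t]),
    ("common games", lambda t: COMMON_GAME_WINS[t]),
    ("strength of victory", strength_of_victory),
]


def break_tie(teams, steps):
    """Keep only the teams tied for the best value at each step until one is left."""
    remaining = list(teams)
    for _name, key in steps:
        if len(remaining) == 1:
            break
        best = max(key(t) for t in remaining)
        remaining = [t for t in remaining if key(t) == best]
    if len(remaining) != 1:
        raise ValueError(f"still tied after every step: {remaining}")
    return remaining[0]


def seed_wild_cards(pool, steps, spots=2):
    """Award spots one at a time, restarting the procedure with the teams left."""
    remaining = list(pool)
    seeds = []
    for _ in range(spots):
        winner = break_tie(remaining, steps)
        seeds.append(winner)
        remaining.remove(winner)
    return seeds


print(seed_wild_cards(POOL, STEPS))  # ['JAX', 'DEN']
```

Restarting from the top after each spot is awarded mirrors how the post handles the Jets-Ravens-Broncos case: once one team is placed, the remaining teams go back through the steps from the beginning.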

Meanwhile, the wins by the Patriots (over Houston) and Bengals (over the Jets) would put them both at 11-5 and at 8-4 in the conference. Moving on to record in common games against the Jets, Ravens, Texans, and Broncos:

NE: 3-2
CIN: 3-2

Which leaves Strength of Victory:

NE beat: NYJ (8), BAL (8), HOU (8), BUF (5, play IND), ATL (3, play TB), TEN (8), TB (3, play ATL), MIA (8), CAR (7, play NO), BUF (5, play IND), JAX (8) (71 wins)
CIN beat: NYJ (8), BAL (8), BAL (8), PIT (8), GB (10, play ARI), CLE (4), CHI (5, play MIN and DET), PIT (8), CLE (4), DET (2, play CHI), KC (4) (69 wins)

So New England would have multiple ways to increase its lead over Cincinnati in the Strength of Victory department next week, while the Bengals will have to wait for everything to fall into place in order to grab the 3rd seed.

In summary, the Great Mediocrity Scenario of 2009 would create the following playoff matchups:

1. Indianapolis (bye)
2. San Diego (bye)
3. New England vs 6. Denver
4. Cincinnati vs 5. Jacksonville

Feel free to point out any errors I have made in the comments.

Update 1: Fixed the Ravens-Broncos error pointed out in the comments. I blame a difficult-to-read schedule page...

Monday, September 07, 2009

Congrats to Ross Ohlendorf (and belatedly to A.J. Burnett)

On Saturday, Ross Ohlendorf pitched an immaculate seventh inning against the Cardinals. You can watch it here. I'd have to defer to mehmattski on this, but I would bet that it's the first time an Immaculate Inning has been thrown where every strikeout was a dropped third strike.

On June 20th, AJ Burnett threw an immaculate inning of his own. You would think that a blog written by a Yankees fan and a Marlins fan who also follows former Marlins would've noticed when a former Marlin threw an Immaculate Inning for the Yankees, against the Marlins. To our 6 readers and AJ Burnett, we wholeheartedly apologize for not posting about it when it happened.

Tuesday, April 28, 2009

Immaculate Inning: Daniel Bard

We at Immaculate Inning take a lot of pride in chronicling the rare feat that gives our blog its name: striking out three batters in an inning on just nine pitches. It has only happened 41 times in major league history, but unfortunately we have no idea how common the feat is in the minor leagues. A few years ago it came to our attention that Chris Mason twirled an Immaculate Inning in an AA game. The immaculate inning fires take a while to get stoked for these minor league games, but we are proud to recount the feat of Red Sox prospect Daniel Bard. Major hat-tip to the Projo Sox Blog for bringing the performance to my attention.

Daniel Bard is a 23-year-old right-hander pitching for the Pawtucket Red Sox in AAA. He finished last season at AA, retiring 20 of his final 23 batters, and that dominance has continued into this season: he sports a 1.69 ERA and has struck out 18 in 10.3 innings. One of those innings, and three of those strikeouts, came against the Rochester Red Wings on April 22. Some video of the immaculate inning can be found here.

The batters were Jason Pridie, Matt Tolbert, and Luke Hughes, the first three batters in the Rochester order. All of them are career minor leaguers, although the 23-year-old Hughes appears to be a legit prospect. Pridie appears to go down swinging on three straight fastballs right down the pipe. Tolbert follows the advice of a nearby heckler ("swing!") and swings through three more fastballs from Bard. Hughes takes some kind of offspeed pitch (a curveball) for a strike before swinging way late on two fastballs, the second one around his eyes. Bard, meanwhile, looks rather bored with AAA hitters, and should expect a call-up to the majors sometime this season. Regardless of his future, Bard has secured a place in history with his immaculate inning, and we offer him our highest congratulations!

Tuesday, March 24, 2009

Sweet Sixteen Predictions by Simulation

Now that I've taken a day to recover from watching some 40+ hours of basketball over the weekend, let's revisit the predictions made by my NCAA Tournament Simulator. Here's a link to the bracket I picked based on the highest number of average wins in the tournament. As you can see, the picks did pretty well, landing in the 72nd percentile overall on ESPN. Thirteen of the Sweet Sixteen teams were picked correctly, and the bracket lost zero Elite Eight teams over the first weekend of play. The three misses were West Virginia, UCLA, and Wake Forest; the simulation could not have accounted for how uninspired those teams would look. It also missed Western Kentucky's upset of Illinois, since the simulation didn't know about the injury to Chester Frazier.

West Virginia did replace Michigan State in the Most Likely Elite Eight according to the one million simulations. How likely were the actual first-round results overall? I wrote a script to count the number of times the simulation reproduced the exact first-round results in each region:

West = YES! 41733 times!
Midwest = YES! 2325 times!
East = YES! 84894 times!
South = YES! 13648 times!
Overall = Nope. 0 matches.
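The counting itself is simple. Here's a minimal Python sketch, assuming a hypothetical in-memory layout where each simulation maps a region name to the tuple of its first-round winners; the function name and data format are illustrative, not the actual format of my output file. Besides the per-region counts, it tallies how many regions matched in each simulation.

```python
from collections import Counter


def count_exact_first_rounds(simulations, actual):
    """Count exact first-round matches per region, plus a histogram of how
    many regions matched in each simulation.

    simulations: iterable of dicts, region -> tuple of first-round winners
    actual:      dict, region -> tuple of the real first-round winners
    """
    per_region = Counter()
    regions_matched = Counter()
    for sim in simulations:
        matched = [region for region, winners in actual.items()
                   if sim[region] == winners]
        per_region.update(matched)
        regions_matched[len(matched)] += 1
    return per_region, regions_matched


# Toy example with two regions and placeholder team names.
actual = {"West": ("A", "C"), "East": ("E", "G")}
sims = [
    {"West": ("A", "C"), "East": ("E", "H")},
    {"West": ("A", "C"), "East": ("E", "G")},
    {"West": ("B", "C"), "East": ("E", "G")},
]
per_region, regions_matched = count_exact_first_rounds(sims, actual)
print(dict(per_region))              # {'West': 2, 'East': 2}
print(regions_matched[len(actual)])  # 1 simulation matched every region
```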

The Midwest upsets of Wake Forest, Utah, and West Virginia rarely occurred together in a single simulation, and when they did, that simulation missed at least one of the other regions. In fact, in my pool of 1 million simulations, just 66 produced the correct first-round results in three of the four regions. It seems that even if I could have entered all one million simulations, it would not have been enough to win Yahoo's $1 million Perfect Bracket prize. Oh well.
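Why zero perfect first rounds? A back-of-the-envelope check helps. First-round games in different regions are drawn independently in the simulation, so the chance of matching every region is roughly the product of the per-region frequencies above. This Python sketch (my own quick check, not part of the simulator) works that out, along with the expected number of simulations matching exactly three of the four regions. The product comes to roughly 0.1 expected perfect first rounds per million tournaments, and about 60 three-of-four matches, in line with the 66 actually observed.

```python
import math

# Exact first-round matches per region, out of one million simulations.
REGION_HITS = {"West": 41733, "Midwest": 2325, "East": 84894, "South": 13648}
N_SIMS = 1_000_000

p = {region: hits / N_SIMS for region, hits in REGION_HITS.items()}

# Regions' first-round games are drawn independently, so probabilities multiply.
expected_all_four = N_SIMS * math.prod(p.values())

# Exactly three regions right: each term leaves out one region that misses.
expected_three = N_SIMS * sum(
    math.prod(p[r] for r in p if r != miss) * (1 - p[miss]) for miss in p
)

print(f"expected perfect first rounds per million: {expected_all_four:.2f}")
print(f"expected three-of-four matches per million: {expected_three:.1f}")
```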

So what do the Pomeroy ratings tell us about the Sweet Sixteen and beyond? To answer that I have two different approaches. One is to simply report the results of the final simulation from Sunday night, the results of which can be found in the data and graphs in this post. Those results are based on the Pythagorean Winning Percentages posted before the first round of the tournament. Four days and forty-eight games (not counting NIT games) later, the rankings are a bit different. How does the added information enhance or suppress the national title chances of each team left in the tournament?

Elite Eight Chances (chart)
Final Four Chances (chart)
Championship Game Chances (chart)
National Title Chances (chart)

Basically, including the statistics from the tournament games has improved the chances of Connecticut and Memphis winning the national championship, and hurt the chances of nearly everyone else. For Thursday's and Friday's games, the teams that improved the most were Connecticut (+8.2%), Villanova (+5.5%), North Carolina (+4.5%), and Kansas (+4%). Predictably, the teams most hurt by the newer statistics were the immediate opponents of those four teams. UNC-Gonzaga has gone from a tossup (51%-49%) to a more solid edge for the top seed (55%-45%). The closest game of the Sweet Sixteen now projects to be Oklahoma-Syracuse, with the third-seeded Orange winning 52% of the time.

In terms of Final Four chances, Connecticut has actually seen its chances decrease, because a much higher share is now allocated to Memphis and Missouri, but the Huskies still win the West region in 35% of the one million simulations. From the Midwest, Louisville is still the favorite with a slight edge over Kansas; Michigan State saw its chances drop with the inclusion of the new stats. The South is just as open as it was at the start of the tournament, but Syracuse maintains a healthy advantage, followed by Oklahoma, with a huge dropoff after those two to North Carolina and Gonzaga. Finally, the East regional still projects a showdown between Pittsburgh and Duke, with the Blue Devils holding an ever-so-slight edge (29.00% to 28.28% for Pitt).

The updated stats say that the national title game is less likely to have a representative from the East region, compared with pre-Tourney stats. This is because the four remaining South regional teams all improved their title-game chances, while Duke had the biggest drop of all the teams (from 17.01% to 14.82%). The other half of the title game is still most likely to come from the West, which had Connecticut, Memphis, and Missouri all increase their chances with the inclusion of new stats.

It has, so far, been a tournament light on upsets. The simulator predicts that this trend will continue, with one small exception (#3 Syracuse over #2 Oklahoma), although many of the games project to be very close. One thing that could be improved in the model is the log5 predictions for teams with such similar Pythagorean Winning Percentages. This is one of the things I will be taking a look at in the offseason. In the meantime, it's only two more days until things get kicked off in Glendale, Arizona. Hooray basketball!
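For reference, log5 is Bill James' formula for turning two teams' winning percentages into a head-to-head probability. Here's a minimal Python version that takes Pythagorean Winning Percentages as inputs; the ratings in the example are hypothetical, not actual Pomeroy numbers. It shows why closely rated teams deserve a second look: near the top of the scale, a one-point gap in Pythagorean percentage is worth noticeably more than the same gap in the middle.

```python
def log5(p_a, p_b):
    """Probability that team A beats team B, from each team's Pythagorean
    Winning Percentage (Bill James' log5 formula)."""
    denom = p_a + p_b - 2 * p_a * p_b
    if denom == 0:
        raise ValueError("log5 is undefined when both ratings are 0 or both are 1")
    return (p_a - p_a * p_b) / denom


# Hypothetical ratings: the same one-point gap at two spots on the scale.
print(f"{log5(0.95, 0.94):.3f}")  # about 0.548 between two elite teams
print(f"{log5(0.60, 0.59):.3f}")  # about 0.510 between two middling teams
```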

Friday, March 20, 2009

Progression of Final Four Chances

At Immaculate Inning we've been playing all week with different ways to present data generated by our NCAA Tournament simulation. Here is what I feel is the most dynamic view of things: after every set of games on Thursday and Friday, I set the probability of the losing teams to "0" and re-ran the simulation. I then graphed the Final Four chances of every team, by region. You can see the results below (also click here to view the whole source spreadsheet):
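Mechanically, "setting the probability of the losing teams to 0" is easy to sketch. Assuming games are decided by log5 on Pythagorean Winning Percentages, as in my simulator, a team rated 0 can never win a game against a team with a nonzero rating, so each re-run automatically respects the real results. The Python below plays one region repeatedly; the four-team bracket, its ratings, and the function names are hypothetical placeholders for illustration.

```python
import random
from collections import Counter


def log5(p_a, p_b):
    """Chance that A beats B from their Pythagorean Winning Percentages."""
    denom = p_a + p_b - 2 * p_a * p_b
    if denom == 0:
        raise ValueError("both teams rated 0 (or both 1)")
    return (p_a - p_a * p_b) / denom


def play_region(bracket, pyth, rng):
    """Play one single-elimination region; `bracket` lists teams in pairing order."""
    teams = list(bracket)
    while len(teams) > 1:
        teams = [a if rng.random() < log5(pyth[a], pyth[b]) else b
                 for a, b in zip(teams[::2], teams[1::2])]
    return teams[0]


def region_odds(bracket, pyth, eliminated=(), n_sims=100_000, seed=0):
    """Re-simulate with every eliminated team's rating forced to 0."""
    ratings = {t: (0.0 if t in eliminated else p) for t, p in pyth.items()}
    rng = random.Random(seed)
    wins = Counter(play_region(bracket, ratings, rng) for _ in range(n_sims))
    return {t: wins[t] / n_sims for t in bracket}


# Hypothetical four-team region: A plays D, B plays C.
bracket = ["A", "D", "B", "C"]
pyth = {"A": 0.95, "B": 0.90, "C": 0.85, "D": 0.60}
print(region_odds(bracket, pyth))                    # before any games
print(region_odds(bracket, pyth, eliminated={"B"}))  # after C upsets B
```

Zeroing a rating rather than deleting the team keeps the bracket shape intact, so the same code runs unchanged after every set of games.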

You can click on each tab to view the chart for each region. There are lots of interesting trends in each region. The re-simulations are based on the Pomeroy rankings from Thursday, and do not take into account statistics from the first-round games themselves.

South Regional: The story here is the Final Four chances of #1 seed North Carolina. Note that these statistics do not take into account Ty Lawson's injury, and yet UNC has had their chances drop over the last two days. More specifically, they've stayed in about the same place, while four other teams have passed them in Final Four chances. Oklahoma is now the odds-on favorite, their chances jumping tremendously with the upset of Clemson elsewhere in the bracket. That is the story throughout; teams rarely improve their own Final Four chances with a win. Instead it's other teams losing that sends waves through the simulations. Three of the four remaining teams in the bottom half of the regional have a better final four shot than UNC, as does Gonzaga in the top half. Arizona State, meanwhile, has climbed from sixth to second in terms of Final Four chances, because they are favored in their matchup with Syracuse (52%-48%).

East Regional: Not much movement going on here, just some strengthening of chances for the favorites as the upsets just don't come. Remember, Wisconsin was heavily favored over FSU in the simulation, so the Seminoles' overtime loss doesn't have much effect on the rest of the regional. Basically, Wisconsin is now at 9%, having added FSU's original 4% to the Badgers' own 5% chances. Among the remaining teams, Texas has the worst chances, since they could have to go through Duke, UCLA, and Pitt (the top 3 teams, statistically) to make it to Detroit. Pittsburgh has the best chance of winning their second-round game, over Oklahoma St, while the Xavier-Wisconsin game should prove to be the closest of the second round.

Midwest Regional: One of the biggest jumps of the first round was in this regional-- Louisville is no longer the favorite; instead, Kansas wins the Midwest 25% of the time. This had only a little to do with Kansas' win over North Dakota St. As you can see from the graph, the gigantic jump came at 5 PM, when simulation favorite West Virginia went down in an uninspiring performance against Dayton. Louisville, the #1 seed, also reached a Final Four chance of 25% by the end of the day, thanks to the upset of Wake Forest by Cleveland State (an upset we predicted in this post). There seem to be two types of games in the second round: Kansas and Louisville are 80% favorites, while Michigan State and Arizona are favored at a 60-65% rate. If a Sweet Sixteen berth for a low-seeded, mid-major team is your definition of "Cinderella," then Cleveland State's 38% chance of beating Arizona is the best slipper bet.

West Regional: Not much going on here, because there haven't been that many upsets. Our model did not see Maryland taking out Cal, but clearly they are a different team than the one which put up very mediocre numbers throughout the season. If Maryland can click their offense to the tune of 1.23 points/possession like against Cal, Memphis is going to be in for a long day. Purdue vs Washington is a coinflip (50.9% to 49.1%) and should be a very good game, while Missouri could have a tough time with Marquette.

Final Four Picture: There are tossups in pretty much every regional now, with Louisville-Kansas joining Pittsburgh-Duke and Memphis-Connecticut in the two-dog races. The South regional is as open as ever, and sees Oklahoma as the most likely representative. A UConn-Pittsburgh final still seems to be the most likely, while UConn and Memphis are the only teams winning the title in more than 8% of the simulations (both are over 11%). This dynamic should change considerably after this weekend; currently the only major change was the elimination of West Virginia. Gonzaga and Kansas have slipped past Duke and are the fourth and fifth most likely championship teams.

So that's where we stand after the first thirty-two games of the 2009 NCAA tournament. Tomorrow and Sunday I'll be updating frequently with the chances of each team's advancement, and I will follow next week with a new simulation from the Sweet-Sixteen onwards! Till then, may your brackets be less busted than mine!

ACC Teams in NCAAT: Day 2

Above is a real-time progression of the Final Four chances and the number of average wins for the seven ACC teams using my NCAA Tournament simulation. Yesterday there were a number of interesting trends, including the downward trend of Carolina's Final Four chances despite crushing Radford earlier in the day. In fact, they are no longer the favorite to win the South regional. Maryland improved their average number of wins from 0.44 to 1.15, only a little above the one actual win they now have banked. That extra 0.15 reflects the 15% chance that they will beat Memphis on Saturday.
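"Average wins" is just the mean number of games a team wins across all the simulated tournaments, which also equals the sum of its chances of winning each round. Here's a minimal Python sketch, assuming each simulation is stored as a hypothetical team-to-wins mapping. The toy numbers mimic Maryland's situation, with the first-round win locked in and a Memphis upset in 15 of 100 runs (deeper runs are ignored to keep the example simple).

```python
from collections import defaultdict


def average_wins(simulations):
    """Mean games won per team; each simulation maps team -> wins in that run."""
    totals = defaultdict(int)
    for sim in simulations:
        for team, wins in sim.items():
            totals[team] += wins
    return {team: total / len(simulations) for team, total in totals.items()}


# Toy data: the actual win counts as 1 in every run; 15 of 100 runs add a second.
sims = [{"Maryland": 2}] * 15 + [{"Maryland": 1}] * 85
print(average_wins(sims))  # {'Maryland': 1.15}
```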

As we enter Day 2, it will be interesting to see the chances of Boston College, Wake Forest, and Florida St before and after they play their games. It will also be interesting to follow the progression of Duke and Carolina's chances as the number of upsets increases. My next update will be after the 12 PM games. I don't expect there to be much effect on the ACC teams, but some upsets could send waves through other teams' chances (for example, if Stephen F. Austin were to upset Syracuse, it would solidify Oklahoma as the South regional favorite).

Thursday, March 19, 2009

ACC Teams in NCAAT: Real Time Chances

Throughout the day, I'm going to re-simulate the NCAAT as each team loses. Then, I am going to plot for each ACC team, their chances of making the Final Four. The first update should be around 2:30 Eastern, and will definitely have implications for North Carolina. Check back here often to see your team's chances change, in real time*!

Starting Chances:

North Carolina (#1 South): 15.28%
Duke (#2 East): 17.61%
Wake Forest (#4 Midwest): 9.72%
Florida State (#5 East): 4.62%
Boston College (#7 Midwest): 1.61%
Maryland (#10 West): 0.90%

*What the hell does "real time" mean anyway? As opposed to fake time? How would I update in fake time, anyway?

Update 1: 3:18 PM

Just ran a new simulation, taking into account the results of the 12 PM games. Three games, and already one pretty large upset, although you wouldn't know it from the seeds: in the initial simulation, BYU beat Texas A&M 63% of the time. As you can see, there is not much change for the ACC teams. Click on the other tabs to see a handy progression chart for Final Four chances and for Average Wins. The biggest positive effects seem to be on the chances of UConn and Texas A&M making the Final Four (up 3-4% each), while no teams dipped all that much. The next update will be around 5 PM with the results of the 2:30 games, which will have a much bigger impact on the ACC teams, since two of them are playing...

Important note: the Pythagorean Win Percentages used to make this simulation are different from the ones used Sunday. I mistakenly did not save the original rankings, and the new rankings take into account adjustments based on the NIT results... if an NIT team played well, all of their opponents will have better adjusted stats. That is the reason why teams like Wake had their chances change from pre-tourney. For the rest of the first round, I think I will stick with today's statistics, rather than have them adjust each time.

Update #2: 12:52 AM

The results of Day 1 of the NCAA tournament are final. There were some exciting finishes in the first sixteen games, and by the seeds only one true upset. However, by the statistics there were some fairly unlikely results; BYU was favored 2-to-1 over Texas A&M, and Maryland was a 3-to-1 underdog against California. But as they say, that's why they play the games. Overall the "average wins" bracket was 12 for 16 (75%) on the first day, and lost zero teams beyond the second round. One of the games was as close to a coin flip as one can probably get; Butler beat LSU a slim 50.62% of the original simulations.

For the ACC, the major changes are obviously for Clemson, upset by a hot-shooting Michigan team, and Maryland, whose one actual win only improves their "average wins" score by 0.74! In terms of Final Four probability, both Duke and Carolina saw their chances decrease throughout the day, despite winning. This is because, while both teams were heavily favored in their own games, some of the weakest teams in their paths lost: in the simulations where Duke played Minnesota and American on the way to the Elite Eight, Duke was the heavy favorite, and those matchups are now impossible.

Overall, the team with the biggest "bump" today was Memphis, which rose to an 11.73% chance of winning it all, thanks to the Maryland upset. Connecticut also benefited from the Texas A&M "on paper" upset, rising to 11.14%. Those two teams now have a combined 50% chance to win the West region; it doesn't look promising for the challengers there.

The biggest story is probably that North Carolina is no longer favored to win the South regional. Oklahoma's win over Morgan St, plus a very favorable second-round matchup against Michigan, raised the Sooners to a 22.93% chance to make the Final Four. This is exactly the sort of thing we were looking for with these predictions-- how the matchups dictate who has the best chances to survive and advance. This will probably change dramatically tomorrow, especially if Syracuse and Arizona St. hold serve in the rest of Oklahoma's bracket. Certainly something to keep an eye on.

Immaculate Inning Bracket

My NCAA tournament simulations have been the most popular thing I've ever done on Immaculate Inning. With the tournament starting in one hour, I thought I'd get my personal picks out there. First of all, here is the bracket selected simply by picking the team with the most average wins in the tournament:

But there's more to March Madness than simply statistics. Here is what I call the "Educated Intuition" bracket. It resembles the simulation bracket because I used the simulations to educate my decisions. However, I overruled the simulation in several key matchups. Plus, I always have to have one bracket where Duke wins it all!

I'll be coming back to Tournament Simulations and breakdowns throughout the weekend and into next week. Thanks for visiting Immaculate Inning for your tourney prognostication needs!

Tuesday, March 17, 2009

Upset Special!

Hello again, and welcome back to Immaculate Inning as we continue our week-long dive into the NCAA tournament, simulation style. In case you missed the earlier posts, I've simulated the tournament one million times and pulled from the data the most likely championship games and Final Fours. The almighty spreadsheet is linked (here).

This time I'm going to look much earlier in the tournament, as we fast approach the most exciting weekend of the sports year. Everybody loves a Cinderella, and everyone wants to brag about picking the upsets that filled the perfect brackets at work on Monday. This is going to be different from upset analysis you may have seen elsewhere, such as AccuScore, which simulates individual games 10,000 times. I've simulated the result of each game in the tournament once, then repeated the whole tournament one million times. That number of simulations gives me statistical power that not even the flashy WhatIfSports can match.

First, let's look at the upsets that are matters of probability; the efficiency ratings say, point blank, that the lower seed should be favored to win.

Upset Special #1: #10 Southern California (65.5%) over #7 Boston College (34.5%). The Trojans have the highest chance of winning their first-round game of any double-digit seed, and they might not have even been in the tournament if it weren't for capturing the Pac-10 tournament title. Both teams are strong on the offensive glass and weak on the defensive glass, and neither team takes very many threes. This game could be a bruiser in the paint. One trouble spot for a potential USC upset is their poor free-throw shooting; in a close game, Boston College has a clear edge there.

Upset Special #2: #12 Wisconsin (53.1%) over #5 Florida State (46.9%). As an avid fan of nearly all ACC teams come tournament time, I find this one painful. The Seminoles enter the big dance as one of the hottest teams in the nation, knocking off (an admittedly wounded) North Carolina on the way to a runner-up finish in the ACC Tournament. Toney Douglas is exactly the kind of player who can go off in a big tournament and carry his team a long way. Wisconsin, meanwhile, is plodding (its 59.9 possessions per game rank 334th out of 344 Division I teams) and mistake-free on offense (#5 in the nation in turnovers per possession and #6 in opponent steals per possession). The Badgers also failed to win twenty games and have no one particularly scary. This is one where I personally would have a hard time following my own simulation, but Florida State won just 0.82 games on average, by far the worst among the #5 seeds.

In terms of pure upsets predicted by the simulations, that's it for the first round. In general, if we were grading the committee based upon how well they matched higher seeded teams with higher Pomeroy efficiency ratings, they did pretty well. However, there are quite a few games that are "too close for comfort," when taking the seeds into account.

TCFC #1: #3 Kansas (80.7%) vs #14 North Dakota St (19.35%). NDSU, in their first tournament in their first year of eligibility, is a favorite upset pick among statheads like myself. The numbers were prettier a few weeks ago, but the Thundar (really? Thundar?) still put up a pretty good offense for a minor-conference team. They can shoot the lights out (40.2% from three, 10th in the nation), and Kansas hasn't defended the three very effectively this season. They also protect the ball pretty well (14th in turnovers/possession), while Kansas does not (244th). Bill Self's squad could be in trouble with this one.

TCFC #2: Dueling #13 seeds-- Mississippi St (23.8%) and Cleveland St (24.9%) both have roughly a 1-in-4 shot at knocking off their 4-seed opponents (Wake Forest and Washington). While the SEC champs would make for a nice story, the clear media favorite would be Cleveland St, a team that upset Butler in the Horizon League final to make the tournament. The Vikings won't spook anyone offensively, but they have one of the nation's best defenses at taking the ball away. Washington, meanwhile, is in the middle of the pack in taking care of the ball, and its size should be more than enough to handle Cleveland St. If I were the Huskies, however, I wouldn't sleep easy about a 1-in-4 chance of losing.

As for Wake Forest, I think we're noticing a trend: my simulation hates ACC teams not named Duke or Carolina. The other ACC team not mentioned yet is Maryland, and my simulation has Maryland winning the fewest games on average of any 10 seed, although they have a slightly better shot at winning their opening-round game than Michigan does (35%). The folks filling out their brackets on ESPN disagree strongly, favoring Maryland over Cal 2-to-1.

Most casual bracket-fillers will lose interest once their brackets are busted sometime Sunday evening, but the one who picks the correct surprise Sweet Sixteen teams is going to be the one bragging come Monday morning. So which low-seeded teams have the best chance to be standing after this weekend? These teams showed up in the Sweet Sixteen in at least ten percent of the simulations:

Wisconsin (#12 E): 26.5%
Southern California (#10 MW): 26.3%
Arizona (#12 MW): 17.9%
Michigan (#10 S): 10.9%
Minnesota (#10 E): 10.3%

I think it would be wise to be cautious about picking these #10 seeds to win two games this weekend. To see why, consider what the simulation was doing: picking at random (weighted by expected winning percentage) the winner of each game. So in some number of trials, the #2 seeds fell in the first round (Robert Morris and Morgan St. each won 8% of the time, for example). In those scenarios in which the #15 and #10 teams both won, the #10 seed is going to be a heavy favorite in the second-round game. This inflates the chances of a #10 seed making it to the Sweet Sixteen; only part of that has to do with the #10 seed's ability to beat the #2 seed, by far the more likely opponent.
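To make that reasoning concrete, a #10 seed's Sweet Sixteen chance splits into two paths: beat the #7 and then the #2, or beat the #7 and then a #15 that has already upset the #2. Here's a Python sketch of that decomposition. The 65.5% first-round figure (USC over Boston College) and the 8% upset rate (Robert Morris) come from this post; the two second-round probabilities are hypothetical placeholders, since they aren't listed here. Plugging in real numbers shows how much of a #10 seed's percentage rides on the upset path.

```python
def sweet_sixteen_paths(p_beat_7, p_15_upsets_2, p_beat_2, p_beat_15):
    """Split a #10 seed's Sweet Sixteen chance by second-round opponent.

    Returns (total, via_2_seed, via_15_seed); the two first-round games
    are treated as independent, as in the simulation.
    """
    via_2 = p_beat_7 * (1 - p_15_upsets_2) * p_beat_2
    via_15 = p_beat_7 * p_15_upsets_2 * p_beat_15
    return via_2 + via_15, via_2, via_15


total, via_2, via_15 = sweet_sixteen_paths(
    p_beat_7=0.655,      # USC over Boston College, from the post
    p_15_upsets_2=0.08,  # Robert Morris over its #2, from the post
    p_beat_2=0.35,       # hypothetical
    p_beat_15=0.85,      # hypothetical
)
print(f"total {total:.3f}, via the #2 {via_2:.3f}, via the #15 {via_15:.3f}")
```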

This is not the case for the #12 seed "Cinderellas" (not that major-conference teams could ever count as such). Even when the #4 seed is upset, a #12 seed's second-round opponent is a similarly seeded #13 rather than a much weaker team, so their high percentage really does suggest good matchups.

To finish, I present the best chances of winning two games this weekend, by seed:

1 seed: Louisville (80.23%)
2 seed: Memphis (83.98%)
3 seed: Missouri (60.50%)
4 seed: Gonzaga (68.66%)
5 seed: Purdue (47.16%)
6 seed: UCLA (54.30%)
7 seed: Clemson (34.66%)
8 seed: Brigham Young (24.29%)
9 seed: Tennessee (13.42%)
10 seed: Southern California (26.29%)
11 seed: Temple (9.05%)
12 seed: Wisconsin (26.51%)
13 seed: Cleveland St. (8.33%)
14 seed: North Dakota St. (4.01%)
15 seed: Robert Morris (1.62%)
16 seed: East Tennessee St. (1.09%)... yes, they have a 6% shot at beating Pittsburgh....

The Most Likely Final Four

Sorry that it has taken so long since my last post; I know that the masses are in need of more data, and help filling out their brackets. I have been working on a Python script to parse the massive amount of data I produced with my 1 million NCAA tournament simulations. Essentially, the output is a data file containing the winner of every game for each simulation; that file is 611 MB, if you were wondering. I have pulled from that massive file the most common Final Fours and the most common championship games, which I will present in a minute.

Yesterday was the most successful day in Immaculate Inning history, with over 740 unique visitors. I want to take a minute and point out some differences between what you'll find here and what other sites are producing. First, I noticed this article by the Wages of Wins Journal-- they do basically what I did for the ACC tournament, using both Pomeroy and Sagarin ratings. It's important to remember that the data on that site are individual game probabilities multiplied together; with that approach, it's impossible to know how the winner of one game will affect the rest of the tournament.

Next, we have Joel Sokol of Georgia Tech, who uses a logistic regression/Markov chain (LRMC) model, based solely on margin of victory, to rank every team in Division I. He selects his bracket by picking the higher-ranked team, and according to his analysis, this method outperforms every other major bracket-picking method, whether it's seeds, ESPN's experts, or Sagarin rankings. That's pretty impressive, but once again, his choices do not take into account the effect of upsets on a single tournament.

Finally, there's a competing NCAA tourney simulation by Upon Further Review. There are two main differences between that simulation and mine. First, and perhaps most important, he doesn't show his work. A cursory look at the rest of the website shows a predilection for Basketball Prospectus, so perhaps we can assume he used efficiency ratings, but we just don't know. The second difference is that he ran just 1,000 simulations. I'll admit it isn't obvious at first why 1,000 times more simulations is necessarily better, other than the novelty of seeing Alabama State win the tournament once or twice. I'm hoping to convince folks that the one million simulations really are better, because I can produce results like these: (click here to view the full spreadsheet)

The Most Likely Championship Game: Connecticut vs Pittsburgh

I searched my simulation output file for the winners of the two national semifinals, that is, the championship game participants. There were 840 different matchups in the one million simulations. The championship games appearing in at least 1% of simulations (10,000 or more), in order of decreasing likelihood:

Connecticut / Pittsburgh : 2.21%
Memphis / Pittsburgh : 1.86%
Louisville / Pittsburgh : 1.71%
Connecticut / Duke : 1.66%
Connecticut / North Carolina : 1.59%
Memphis / Duke : 1.43%
Memphis / North Carolina : 1.33%
Connecticut / Gonzaga : 1.31%
Louisville / Duke : 1.29%
Connecticut / Oklahoma : 1.28%
Connecticut / Syracuse : 1.27%
Louisville / North Carolina : 1.26%
Connecticut / Arizona St. : 1.22%
Connecticut / UCLA : 1.22%
Memphis / Gonzaga : 1.12%
West Virginia / Pittsburgh : 1.11%
Memphis / Syracuse : 1.09%
Memphis / Oklahoma : 1.09%
Louisville / Gonzaga : 1.02%
Memphis / UCLA : 1.02%
Memphis / Arizona St. : 1.02%
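A list like the one above can be produced by tallying the two finalists from every simulated tournament and keeping the pairs above the cutoff. The sketch below assumes a hypothetical input format, one (left finalist, right finalist) tuple per simulation, and runs on ten made-up title games rather than the real output file; with one million runs and `cutoff=0.01`, the same call would return the pairs seen at least 10,000 times.

```python
from collections import Counter


def common_matchups(finals, cutoff=0.01):
    """Return (matchup, share) pairs seen in at least `cutoff` of simulations.

    `finals` holds one (left_finalist, right_finalist) tuple per simulated
    tournament (a hypothetical format, not the original output file).
    """
    counts = Counter(finals)
    total = len(finals)
    # most_common() lists matchups in order of decreasing frequency.
    return [(m, n / total) for m, n in counts.most_common() if n / total >= cutoff]


# Toy data: ten simulated title games, not real simulation output.
toy = ([("Connecticut", "Pittsburgh")] * 5
       + [("Memphis", "Pittsburgh")] * 3
       + [("Louisville", "Duke")] * 2)
for (left, right), share in common_matchups(toy, cutoff=0.25):
    print(f"{left} / {right} : {share:.2%}")
```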

I'm fairly confident that a simulation of only 1,000 tournaments would be unable to separate the occurrence of one game versus another with any kind of power. As you can see, the three most likely championship games all include Pittsburgh. UCLA and Arizona St, both six seeds, are the lowest seeds commonly making an appearance in these most likely title game matchups. The left side of the bracket, representing the West/Midwest half of the tournament, appears a lot more stable than the right side; with one exception (WV), just three teams are represented: Louisville, Connecticut, and Memphis. The right side of the bracket, meanwhile, has a lot more variability, with three teams from the East and five from the South each making an appearance in the likely title games list.
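One way to put a number on that confidence, offered as a rough sketch that treats each simulated tournament as an independent draw, is to ask how many standard errors separate the two most common title games. Because both shares come from the same multinomial sample, the variance of their difference is [p1(1-p1) + p2(1-p2) + 2*p1*p2]/n. The function name `z_gap` is just an illustrative label.

```python
import math


def z_gap(p1, p2, n):
    """Standard errors separating two outcome shares estimated from the same n runs.

    Uses Var(p1_hat - p2_hat) = [p1(1-p1) + p2(1-p2) + 2*p1*p2] / n, which accounts
    for two outcomes of one multinomial sample being negatively correlated.
    """
    se_diff = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2) + 2 * p1 * p2) / n)
    return (p1 - p2) / se_diff


# Shares of the two most common title games in the table above.
uconn_pitt, memphis_pitt = 0.0221, 0.0186
for n in (1_000, 1_000_000):
    print(f"{n:>9,} runs: gap = {z_gap(uconn_pitt, memphis_pitt, n):.1f} standard errors")
```

With the 2.21% and 1.86% shares, the gap works out to roughly half a standard error at 1,000 runs, which is indistinguishable from noise, and roughly 17 standard errors at one million runs.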

In case you're worried about my arbitrary cutoff of 1%, the next three most common championship games all featured Louisville (vs Syracuse, Oklahoma, and UCLA), followed by a Michigan St-Pittsburgh matchup and yet another Louisville game (vs Arizona St). Following a lone Purdue-Pittsburgh matchup at 0.90%, there is a sharp dropoff in frequency. The first 25 or so matchups are clearly the most common, and therefore the most likely. I suppose this means that if you are looking for a sure thing, Pittsburgh is a good bet to make the title game. However, if you're looking for a sleeper (not a #1 or #2 seed) to make the title game, it would be better to replace Pittsburgh with UCLA, Arizona St, or Gonzaga, because a low seed reaching the title game out of the West or Midwest is just not likely.

The Most Likely Final Four: Connecticut, Louisville, Pittsburgh, Oklahoma

As a Duke fan, I was saddened that Duke did not represent the East region in the most likely Final Four. However, I am overjoyed that the only #1 seed missing from it is North Carolina...
The power of the #1 seeds was actually quite strong-- the five most likely Final Fours, together representing nearly 1 percent of all simulations, all featured UConn, Louisville, and Pittsburgh (one of them also included North Carolina). Anyway, there are 26,790 unique Final Fours in the simulation, 6,134 of which appear only once. Only 2,434 Final Fours occurred more than 100 times (0.01 percent of simulations). The most likely Final Four, listed above, occurred 2009 times (how's that for symmetry), or 0.2 percent.
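For scale, a 64-team field split into four 16-team regions allows 16^4 = 65,536 possible Final Fours, so the 26,790 unique ones that turned up cover about 41% of the possibilities. The snippet below just does that arithmetic with the counts reported above; the four-regions-of-16 structure is the only assumption.

```python
# Assumes a 64-team field split into four 16-team regions, one champion per region.
possible_final_fours = 16 ** 4
observed_unique = 26_790  # unique Final Fours reported in the simulation
simulations = 1_000_000

print(f"possible Final Fours: {possible_final_fours:,}")
print(f"share of possibilities observed: {observed_unique / possible_final_fours:.1%}")
print(f"most likely Final Four: {2009 / simulations:.2%} of simulations")
```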

Once again, the top-heavy nature of the West region was clear; it was not until the 42nd most common Final Four that the West representative was not Connecticut or Memphis (it was Purdue). The nine most common Final Fours list Louisville as the Midwest champ, and some sprinklings of West Virginia and Michigan State follow until the 37th most likely Final Four, which features Kansas. In the East, Pitt did capture those first five spots, and most of the top 20 (replaced by Duke in five of them, then UCLA in the 21st most likely Final Four). The first team to come out of the East that was not Pitt, Duke, or UCLA was Xavier in the 48th most likely Final Four. Finally, the South is just as wide open as we've been advertising, with five different teams in the five most likely scenarios!

What does all of this mean for you, humble bracket filler? It means that under the most common bracket pool rules (more points for late-round games than early-round ones), someone is going to win the pool by picking the correct South regional winner. The other regions are fairly top-heavy, with just a few likely options, but the South is where the money is at. These breakdowns don't really point to a favorite in the five-team cluster, although the initial simulation calls North Carolina the favorite.

It is a bit strange to note that Memphis is neither in the most likely title game, nor the most likely Final Four. They were a slight favorite to win the tournament in the initial simulation, just beating out UConn. I suppose you could say that whoever wins the West regional should be the odds-on favorite to capture the title!

Xenod and I are working on expanding the search through the simulation to incorporate the Elite Eight and Sweet Sixteen. I'm not sure if 1 million simulations is enough to tease apart the variance at those levels, but we will try. I'll also take a look at first and second round matchups from a different perspective. Stay tuned for all the tourney simulations you can handle, right here at Immaculate Inning!

Sunday, March 15, 2009

NCAA Tournament Predictions Using Simulations

If you're looking for 2010 NCAA Tournament Simulations, you can find Immaculate Inning's One Million Simulations right here!

It's been a crazy Championship Week across the NCAA, and parity ruled supreme across the land, leaving many college basketball fans scratching their heads as they attempt to fill out their brackets. Well, we at the Immaculate Inning have a treat for you: a complete breakdown of the newly released NCAA bracket based on the log5 prediction system and Ken Pomeroy's efficiency ratings. I did this for the ACC tournament by painstakingly filling out an Excel spreadsheet and running the numbers essentially by hand. This time was a bit different.

The Method. Briefly, this simulation starts from each team's "Expected Winning Percentage," which converts the number of points a team scores and allows into a win percentage. Instead of using raw scoring figures, I'm using the metrics invented by Ken Pomeroy, which take the tempo of a game out of the equation; we're dealing with how efficient a team's offense or defense is. Next, using the log5 prediction method (linked above), we can calculate how often a team with a given winning percentage is likely to beat another team with a given win percentage. For example, a team with a .600 win percentage is projected to beat a team with a .400 win percentage 69.2% of the time.
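The log5 estimate fits in a few lines. This is the standard Bill James formula written as a plain Python function (the name `log5` is only a label, not taken from the original script); it reproduces the .600-versus-.400 example.

```python
def log5(p_a, p_b):
    """Log5 estimate of the chance that team A (win pct p_a) beats team B (win pct p_b)."""
    return (p_a - p_a * p_b) / (p_a + p_b - 2 * p_a * p_b)


print(f"{log5(0.600, 0.400):.1%}")  # 69.2%
```

Two equal teams come out at 50%, and the two directions of any matchup always sum to 100%, which is what lets the formula drive a game-by-game simulation.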

How well a team does in the NCAA tournament is affected by three things: how good a team is, how good their opponents are, and how likely it is to see a particular opponent. So while Louisiana State may salivate at the possibility of playing Radford in the second round of the tournament, it's just not likely to happen. For the ACC tournament, I calculated discrete probabilities for each matchup. This time I've done things a bit differently. I have created a computer simulation (a script in the Python language, thanks to Xenod for guidance and helpful tips) for the NCAA tournament, and then I run it a bunch of times. The outcome of each game is random, weighted by the expected winning percentage of each team. The result is not just another table of log5 projections, but the outcome of 1 million simulated NCAA tournaments. It's how the tournament looks, "on paper."
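The approach in this paragraph, a weighted random outcome for every game repeated many times, can be sketched as follows. This is not the actual script used for these posts: it assumes a power-of-two bracket with no play-in game, uses log5 to turn two expected winning percentages into a game probability, and runs on a made-up four-team region. Each simulation records how many games every team won, which is the raw material for "at least N wins" columns like the ones in the spreadsheet.

```python
import random
from collections import Counter, defaultdict


def log5(p_a, p_b):
    """Chance that a team with expected win pct p_a beats one with win pct p_b."""
    return (p_a - p_a * p_b) / (p_a + p_b - 2 * p_a * p_b)


def simulate_once(bracket, rng):
    """Play one single-elimination tournament and return {team: games won}.

    `bracket` lists (team, expected_win_pct) pairs in bracket order; its length
    must be a power of two (no byes or play-in games in this sketch).
    """
    wins = {team: 0 for team, _ in bracket}
    alive = list(bracket)
    while len(alive) > 1:
        survivors = []
        for (a, pa), (b, pb) in zip(alive[0::2], alive[1::2]):
            # Weighted coin flip: team a advances with probability log5(pa, pb).
            winner = (a, pa) if rng.random() < log5(pa, pb) else (b, pb)
            wins[winner[0]] += 1
            survivors.append(winner)
        alive = survivors
    return wins


def simulate(bracket, n_sims, seed=2009):
    """Repeat the tournament n_sims times; return {team: Counter(games won -> times)}."""
    rng = random.Random(seed)
    tallies = defaultdict(Counter)
    for _ in range(n_sims):
        for team, won in simulate_once(bracket, rng).items():
            tallies[team][won] += 1
    return tallies


# Hypothetical four-team region with made-up expected winning percentages.
region = [("A", 0.95), ("D", 0.60), ("B", 0.85), ("C", 0.80)]
N = 100_000
results = simulate(region, N)
for team, _ in region:
    print(f"{team}: wins the region in {results[team][2] / N:.1%} of runs")
```

Seeding the generator makes a run reproducible; scaling `N` up to one million only costs time.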

So how did your favorite team fare in my simulations? Take a look at the spreadsheet below to find out! (It can also be accessed here for your sorting pleasure.)

The spreadsheet has tabs for each region; they are currently sorted by the "4" column, which is the chance that a given team will win "at least 4" games, that is, the chance that a team will win its region and advance to the Final Four in Detroit. The other columns are similar, recording the percentage chance a team will win at least that many games. The exception is the "All Teams" tab, which is sorted by "Average Wins." This is the average number of wins a team accrued across the 1 million simulations. It ranges from Memphis (2.81 wins) to Chattanooga (0.02 wins).
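Given a tally of how often a team won 0, 1, 2, ... games, both the "at least k wins" columns and the "Average Wins" figure follow directly. The helper below is a hypothetical sketch applied to a ten-run toy tally, not the real spreadsheet data.

```python
def summarize(win_counts, n_sims):
    """Turn a {games_won: count} tally into 'at least k wins' shares and average wins."""
    max_wins = max(win_counts)
    at_least = {
        k: sum(c for w, c in win_counts.items() if w >= k) / n_sims
        for k in range(1, max_wins + 1)
    }
    average = sum(w * c for w, c in win_counts.items()) / n_sims
    return at_least, average


# Hypothetical tally for one team over 10 simulated tournaments.
tally = {0: 2, 1: 3, 2: 2, 4: 2, 6: 1}
at_least, average = summarize(tally, 10)
print(at_least[4], average)
```

For the toy tally, the team won at least four games in 3 of 10 runs (0.3) and averaged 2.1 wins.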

Now that all the data is out there, what does it mean? I believe this data can tell us a great deal about how the tournament was set up by the committee, and who has the "hardest" and "easiest" roads to the Final Four and the national title. To begin, the finding that Memphis has not only the largest number of average wins, but also the highest chances of winning the title, is not surprising. Pomeroy's ratings place Memphis squarely atop the nation, led by an amazing team defense. John Calipari's team continues to get little respect nationally despite three straight regional final appearances. The statistics say there is a high probability they will make it four straight.

The South region has the most parity, with five teams winning four games (and the region) at least 10% of the time. Interestingly, top-seed North Carolina ranks third in Final Four appearances from this group, behind Oklahoma and Syracuse. However, if North Carolina does survive the region, they have by far the highest number of national titles (5.68%) from the South region.

Five teams made the national championship game in at least ten percent of the simulations: Connecticut, Louisville, Pittsburgh, Memphis, and Duke. Obviously, only one team from each of the UConn-Memphis and Pitt-Duke pairs can make the title game, but I think it speaks to the lower overall level of performance from teams in the West and East regionals. Indeed, in those regions, the #1 and #2 seeds produced the regional champion more than 40% of the time, while in the Midwest, Louisville and Michigan State came close (39.9%). The South, meanwhile, lags far behind-- the champ was either UNC or OK just 32% of the time.

Among the lower seeds, three of the #6 seeds stand out as having higher than average chances of going to Detroit. West Virginia, ranked highly by Pomeroy all season, has the highest Final Four percentage of any team outside the top two seeds, at 15.74%. Their first round matchup against Dayton ranks as one of the least upset-prone games of the first round. How West Virginia fares in this tournament is perhaps a test case for the Pomeroy method-- how important are wins and losses, really, when you play pretty well in all those losses?

A similar case is UCLA, given a 6 seed in the East region despite having one of the best offenses in the country, statistically speaking. While their opening round game against VCU is no joke (and this Duke fan would know about that), they have the best chance of an Elite Eight appearance in this region other than Duke and Pitt. A third six-seed with high hopes could be Arizona State, in the apparently wide open South regional. Should ASU get past a tough Temple matchup in the first round, my simulator likes their chances against either Syracuse or its first-round opponent. Marquette is the odd six seed out in the simulations, with the lowest number of 1-win and 2-win simulations for six seeds.

I will have much more on these simulations in the coming days, eventually culminating in The Immaculate Inning Most Likely Bracket-- which of the 9 quintillion possible brackets would Pomeroy's efficiency ratings tell us to fill out?
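The size of that search space follows from the structure of the main bracket: 64 teams play 63 games (leaving aside the opening-round game), and each game has two possible winners, giving 2^63 possible brackets.

```python
# A 64-team bracket has 63 games, and each game has two possible winners.
games = 64 - 1
brackets = 2 ** games
print(f"{brackets:,}")  # 9,223,372,036,854,775,808, about 9.2 quintillion
```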

If you have any suggestions on what kind of data analysis to do, how to improve the method, or if you'd like a copy of my Tournament Simulation script, comment here or shoot me an e-mail at mehmattski AT gmail DOT com. March Madness baby!

Friday, March 13, 2009

Updated ACC Tourney Probabilities

With the first six games in the ACC Tournament complete, let's revisit the log5 predictions, which are based on tempo-free efficiency ratings accrued in ACC games only:

These are up to date following FSU's escape of Georgia Tech in the second afternoon game. As you can see, North Carolina has increased their chances of winning the tournament to better than 50/50. Duke's tournament chances have actually gone down, caused by no longer having the possibility of playing Virginia. Both of tonight's quarterfinal games have a similar 4-to-1 advantage for favorites Duke and Wake Forest. Maryland doesn't have much of a chance of winning the tournament, but should they pull the upset tonight, would that be enough to get the ACC a seventh team in Teh Dance?

The other story lines remaining in the ACC tournament are all about seedings. Carolina probably locked up their #1 seed with a win, considering that they're actually still playing, unlike UConn, Pitt, and Oklahoma. The ACC results are not in a vacuum; the seedings of Duke and Wake are heavily influenced by the results of the other tournaments. For example, someone upsetting Memphis, or Louisville capturing the Big East tournament, would have implications for the top seeds.

Stay tuned to Immaculate Inning for all your March Madness projection needs. We've got a big project in the works to unveil late Sunday or early Monday. NCAA Hoops- Awesome!

Wednesday, March 11, 2009

ACC Conference Play: Devourer of Stats

A few weeks ago, I made a very critical post about the 2008-2009 Duke team. Having come off of a very poor stretch, the once promising Blue Devils seemed to be succumbing to conference play, with disastrous consequences. I concluded that Duke's pounding of non-conference foes was clouding our view of their standing, statistically speaking. Pomeroy's rankings simply cannot account for the evolution of a team throughout a season; they treat a November blowout win the same as a February blowout win. And conventional wisdom would treat the latter as more indicative of a team's chances in March.

With the ACC season complete I thought I'd take one final look at Duke's performance between conference and non-conference play. The result is not pretty:

In red are all the categories in which Duke is performing worse in ACC play compared to out of conference play (the non-conference numbers include 2009 games against Davidson, Georgetown, and St. John's). With the exception of turnovers on offense, Duke is not playing as well. But clearly, the level of play in the ACC must affect all teams. So I then tallied up every team's tempo-free performances. Rather than post another spreadsheet, I've linked the results here. Some major points:

1) Nearly every team saw both their offensive and defensive efficiencies drop when they were playing against ACC opponents. In fact, nearly every cell in the "Difference" part of my spreadsheet is colored red, meaning teams were also worse in other statistical categories. This probably makes sense, since the ACC is ranked the #1 conference by both Pomeroy and Sagarin.

2) Overall, offense was less affected than defense. During ACC play, the conference teams averaged an efficiency of 104.5. Compared with the national average (100.1), that suggests the ACC is an offense-heavy environment. It would take further analysis to prove this point, but I believe this could have an effect similar to the "ballpark effect" in baseball; if the Oakland A's hit 300 home runs as a team, it would be more impressive than if the Colorado Rockies did it. By analogy, I am suggesting that having a good offense in the ACC is not as impressive as having a good defense. This points a praising finger squarely at teams like FSU and Duke, the only teams to have defensive efficiency ratings below 100 during conference play.
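To make the ballpark analogy concrete, one could express an efficiency relative to the environment it was produced in, much like a park factor in baseball. The index below is an illustrative assumption rather than anything computed in this post: it simply divides by the league average and scales to 100, using the 104.5 (ACC conference play) and 100.1 (national) averages quoted above and a hypothetical offense scoring 110 points per 100 possessions.

```python
def league_relative(efficiency, league_average):
    """Express an efficiency as a percentage of its environment's average (100 = average).

    A park-factor-style index; an illustrative assumption, not a statistic used in this post.
    """
    return 100 * efficiency / league_average


# Hypothetical offense scoring 110 points per 100 possessions.
print(round(league_relative(110, 104.5), 1))  # 105.3 against the ACC conference-play average
print(round(league_relative(110, 100.1), 1))  # 109.9 against the national average
```

The same raw number looks less impressive in the higher-scoring environment, which is the point of the Rockies comparison.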

3) Florida State is a major exception. While everyone else's offensive efficiency was dropping, Florida State actually improved their offensive efficiency in conference play. A large part of this comes from another category-- turnover rate. Along with Duke, the Seminoles are one of two teams to improve their turnover rate on offense against ACC foes. Their effective field goal percentage and offensive rebounding rate were also less affected by conference play.

4) NC State is probably the worst case of Cupcake Syndrome. The Wolfpack's offensive efficiency dropped by 8.4 points in ACC play (the worst drop in the conference), and their defensive efficiency also dropped, by 18.0 points. They were an average team until January, and simply not a very good team in conference play.

5) There is no evidence for the notion that "The ACC refs call more fouls than the rest of the nation." Collectively, the ACC teams went to the charity stripe during 35.4% of their possessions during league play, compared to 41.4% of possessions in non-conference play. While free throw rate is not a perfect proxy for the number of fouls called, it is clear that the ACC refs aren't as whistle-happy as some would have you believe. On the other hand, during non-conference play, the opponents of ACC teams went to the free throw line in just 30.0% of possessions. There is certainly a connection between level of play and the number of fouls called; bad teams have bad defensive positioning and would tend to be whistled more often.

6) Continuing on the foul theme, Duke was near the top of free throw rate in conference (39.9%, 4th) and out of conference (46.2%, 3rd), but by no means high enough to fuel the DukeGetsAllTheCalls morons. In fact, every team (including Duke) saw their opponents go to the free throw line more frequently during ACC play, except one. That would be the Carolina Tar Heels, who inexplicably allowed free throws on 4% fewer possessions, compared to out of conference play. I'm not suggesting a conspiracy; it's probably due to their Swiss cheese approach to half-court defense...

7) Wake Forest's defensive woes may be a bit misleading. Sure, they saw the biggest drop in defensive efficiency (19 points) of any team in the league. But, during league play they still have the best defensive effective field goal percentage, and the best defensive rebounding rate, of any team in the ACC. In this case, I'm guessing the problem was a cupcake pre-conference schedule (ranked 275th by Pomeroy), rather than some exposure by better competition.

8) Finally, we return to the most overanalyzed team in the country: Duke. It's amusing to me that any casual college basketball fan in the country right now can point to seven different reasons why the Blue Devils are not poised for greatness: they lack depth, they can't stop quick guards, they can't stop an inside presence, they don't play enough zone, they don't adapt in-game, ad nauseam. I wonder if those fans can note weaknesses so easily in other top 10 teams? Still, even I was receptive to this line of thinking a few weeks ago. But my comparison is clear: Duke is in the middle of the pack when it comes to their statistics being "affected" somehow by non-conference play.

In fact, contrary to my conclusions a few weeks ago, Duke's defense is one of the least affected by ACC play. On offense, Duke turns the ball over less frequently than any ACC team, and has respectable rebounding numbers for a team with "no inside presence." The lesson: stop making judgments in a vacuum; statistics can be misleading if they are not placed in a relative context.

Monday, March 09, 2009

2009 ACC Tournament Predictions

Some of the hardest days as a sports fan come during early March; the worst of all are the four days between Selection Sunday and the first full day of NCAA tournament games. For this ACC fan, it is equally hard to bear the four days between the Duke-Carolina rematch and the start of the ACC tournament. Sure, there are plenty of actual games between now and then, but few of them actually matter, save the random upset of a top 25 mid-major and the corresponding bubble implications. To pass the time, I repeated an exercise I completed two years ago this week: predictions for the ACC tournament using the log5 method.

There will no doubt be predictions using Ken Pomeroy's rating system, all over the internet. (Here's one simple example.) I want to do something different; how do the predictions change, based on whether I use:

1) Winning Percentage
2) Raw Points Scored/Allowed
3) Pomeroy's Rankings (Full Season)
4) Raw Efficiency (ACC Games Only)

What follows are four Google spreadsheets tallying the information. Each sheet has three tabs. The first shows the calculated winning percentage for each team. For tests 2 through 4, my formula follows Ken Pomeroy's: PF^11.5/(PF^11.5+PA^11.5). The next tab shows the chances that the team in a given column will beat the team listed in a row, using the "log5" formula, discussed here. Finally, mindful of the ACC Tournament Bracket, I predict each team's chances to advance to the Quarterfinals, Semifinals, Finals, and their chances of being 2009 ACC Champion. Let's start with raw winning percentage.
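The spreadsheet pipeline described above (expected winning percentage, then log5, then each team's chances to advance) can be sketched in Python. This is an illustration, not the spreadsheets themselves: the ratings are made up, and the bracket is a four-team power-of-two bracket, whereas the real ACC bracket gives the top four seeds byes, which this sketch does not model. The advancement probabilities are computed exactly rather than by simulation: a team's chance of winning a round is its chance of being alive times the probability-weighted chance of beating each possible opponent from the other half.

```python
def pythag(points_for, points_against, exponent=11.5):
    """Expected winning percentage: PF^11.5 / (PF^11.5 + PA^11.5)."""
    pf, pa = points_for ** exponent, points_against ** exponent
    return pf / (pf + pa)


def log5(p_a, p_b):
    """Chance that a team with win pct p_a beats one with win pct p_b."""
    return (p_a - p_a * p_b) / (p_a + p_b - 2 * p_a * p_b)


def advancement(bracket):
    """Exact chance each team wins each round of a single-elimination bracket.

    `bracket` lists (team, win_pct) pairs in bracket order; its length must be a
    power of two (no byes). Returns {team: [P(win round 1), P(win round 2), ...]}.
    """
    pct = dict(bracket)
    reach = {team: 1.0 for team, _ in bracket}  # P(team is still alive)
    groups = [[team] for team, _ in bracket]    # sub-brackets that merge each round
    history = {team: [] for team, _ in bracket}
    while len(groups) > 1:
        merged = []
        for left, right in zip(groups[0::2], groups[1::2]):
            new_reach = {}
            for side, other in ((left, right), (right, left)):
                for t in side:
                    # Be alive, then beat whichever team emerges from the other half.
                    new_reach[t] = reach[t] * sum(reach[o] * log5(pct[t], pct[o]) for o in other)
            for t, p in new_reach.items():
                reach[t] = p
                history[t].append(p)
            merged.append(left + right)
        groups = merged
    return history


# Hypothetical (points scored, points allowed) per 100 possessions.
ratings = {"A": (118.0, 88.0), "D": (104.0, 99.0), "B": (112.0, 92.0), "C": (108.0, 95.0)}
bracket = [(team, pythag(*ratings[team])) for team in ("A", "D", "B", "C")]
for team, probs in advancement(bracket).items():
    print(team, [f"{p:.1%}" for p in probs])
```

Because the calculation is exact, each round's probabilities sum to the number of winners in that round, which is a handy check on a spreadsheet built by hand.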

So you can see that Duke has an .806 winning percentage, a 31.6% chance of beating UNC, and a 15.6% chance of winning the ACC tournament. Of course, winning percentage is kind of silly, because blowouts and squeakers count exactly the same. For this reason many baseball stat-heads turned to Pythagorean Win Percentage, which calculates a team's likely winning percentage given how much they score and how much they allow. This can be applied to basketball as well, with the following result:

Some pretty big changes already. First off, Duke has vaulted above Wake and is now favored to make the finals against a still-overwhelmingly-favored UNC team. The middle of the pack has changed considerably; Miami has doubled their chances, while Clemson has had theirs halved. Because the Pythagorean Winning Percentage is known to be flawed, Baseball Prospectus also tracks what they call "Third Order Wins." By this they mean that the raw amount of offense and defense is not as important as the context in which the points were scored. To put it in 2009 terms, which team has the better offense:

VMI-- Points/Game: 93.8 Possessions/Game: 81.2
Duke- Points/Game: 78.7 Possessions/Game: 70.1

It is true that VMI scores 15 more points per contest than the Blue Devils; they are the most prolific scorers in the nation. However, VMI plays at the fastest tempo in the country, getting over 11 possessions more per game than Duke. Opponents vary from game to game, and so does the number of possessions. So, a fair comparison of offenses requires looking not at a team's raw scoring numbers, but at how efficiently a team scores in the possessions it gets. Per 100 possessions, the 15-point gap shrinks to about 3 points, and once the quality of the defenses each team faced is taken into account, it is clear that Duke has the better offense.
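The tempo-free comparison is simple division: points per game over possessions per game, scaled to 100 possessions. Run on the figures above, it shows the 15-point scoring gap shrinking to about 3 points per 100 possessions, with VMI still slightly ahead before any adjustment for opponents, so it is the schedule adjustment described in the next paragraph that has to separate the two offenses. The function name is an illustrative label.

```python
def points_per_100(points_per_game, possessions_per_game):
    """Raw, unadjusted offensive efficiency: points scored per 100 possessions."""
    return 100 * points_per_game / possessions_per_game


vmi = points_per_100(93.8, 81.2)
duke = points_per_100(78.7, 70.1)
print(f"VMI: {vmi:.1f}, Duke: {duke:.1f}")  # VMI: 115.5, Duke: 112.3
```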

So what if we were to predict the results of the ACC tournament using Offensive and Defensive Efficiency, as provided by Ken Pomeroy? For this run I will also take each team's schedule into account by using Pomeroy's "Adjusted" efficiency ratings; teams are penalized if they run up high efficiencies against bottom feeding teams. The results are provided in an earlier link, but I'm showing my work:

While the chances of favorite UNC have remained largely the same, the effect of tempo-free statistics and the schedule have boosted Duke's chances by 5%. Most of this comes from an ever-increasing chance of beating Wake Forest on a neutral court: from 46% using just win percentage to 56% with Pythagoras to 62% tempo-free.

Frequently, when I use these tempo-free statistics, some folks are not convinced. They think that the adjustments for schedule made by Pomeroy are not enough, and that teams are different in league play than they were playing non-league foes before the new year. In addition, the ACC tournament is taking place between only ACC teams, so shouldn't statistics within the ACC matter more? On the other hand, the ACC no longer has a balanced schedule; for example, Boston College played Duke once (at home) while they played #12 seed Georgia Tech twice. I have not attempted to adjust for schedule here, so these are raw efficiency numbers:

The most striking result is that the top three teams (UNC, Duke, Wake) have all seen their chances go down, relative to the full-season Pomeroy ratings. Those lost chances have been split among a few teams. Clemson's title chances went up by 3 percentage points. Florida State, whose defense has improved tremendously since the clock ticked to 2009, has doubled its title chances (as has Boston College).

NCAA Tournament Implications:
1) The 8-9 game is not the closest of the first round. That distinction belongs to NCSU vs Maryland, according to all four metrics. That is not a good matchup for anyone who thinks that Maryland is still on the bubble.
2) Virginia Tech is pretty screwed. Like Maryland, they are a 7-9 ACC team, and the committee doesn't usually take kindly to a sub-.500 conference record. They are probably out of the tournament picture unless they make it deep, and the statistics say it's not probable at all.
3) The final 7-9 team, Miami, has to avoid a collapse against Virginia Tech, and then they face 2-to-1 odds against in the matchup with Wake Forest. Should they prevail, would the committee consider what then would be a 20-win ACC team?
4) Statistically, the top three seeds are very heavy favorites to reach the semifinals, with Duke and UNC more likely to be there than Wake. Should Duke win the two games as expected, would they still have to beat Wake Forest to get a #2 seed in the NCAAT? The Deacons, meanwhile, probably need to win the ACC tournament to get their own #2 seed.
5) Clemson and Florida State should both be solidly in the NCAA tournament, but they are playing for favorable seedings. By the ACC numbers and the overall Pomeroy ratings, Clemson is favored in a matchup with Florida State, and the Tigers are more likely to knock off UNC.
6) Spreadsheets are fun!

Monday, February 16, 2009

Remember When...

...Duke had a good basketball team? Those were the days:

Stats taken from Ken Pomeroy's site, which are tempo-free statistics. Scroll over to the right for the defense stats. Efficiency is measured by Points/Possession x 100, and the other percentages are also per possession. The rankings are through games on February 15, showing the clear difference between teams Duke played in the calendar year 2008 compared with those played in 2009 (The ACC season plus Davidson and Georgetown).

Duke has also played about 10 percent less efficiently on both ends of the court in the last thirteen games. Depending on how much schedule played a part, this is either terrible or perhaps expected. Still, here is the defensive efficiency rating turned in by Duke in the last six games, along with that team's Pomeroy offensive rating (in parentheses):

@ Wake Forest (51): 92
vs Virginia (126): 77.8
@ Clemson (15): 118.3
vs Miami (26): 97.4
vs North Carolina (1): 127.6
@ Boston College (22): 116.2

Perhaps this performance is no accident; four of the five best offenses Duke has faced this season have been in the last four games! The only other defensive efforts worse than 1.00 points/possession came against Georgetown (19th ranked offense), at Michigan (66), and against Rhode Island (31). While it may be tempting to cut Duke some slack because of the level of offensive play they've been against, getting far in the NCAA tourney by hoping to play crappy offenses is not exactly a winning strategy.

Breaking it down, the component statistic that most closely parallels Duke's declining efficiency is rebounding. Duke is not a horrible rebounding team on the offensive glass, grabbing about 40% of their own missed shots, good for 15th in the nation. On the defensive glass, however, things are a lot less pretty: Duke's opponents grab 31.8% of their own missed shots (128th nationally). This particular statistic has gotten much worse in ACC play, where opponents are grabbing 35% of their missed shots.

I have not said much about the offense because it hasn't been as much of a problem in 2009, games versus Clemson and Wake Forest excepted. North Carolina is frequently seen to have a soft defense but it still ranks 14th in the nation, and the fourth best defense Duke has seen (behind Purdue, Wake Forest, and Florida State). Duke ran out a 109.9 offensive efficiency against UNC at Cameron, which is not too shabby-- it was the 127.6 stinker on defense (worst of the season for Duke) that was the deciding factor in that game. While Duke is indeed limited by some of the best defenses, such as at Wake Forest (89.3) and Florida State (98.0), it is not as stark a difference as on defense. What Duke's offense is not, however, is an offense that can carry a poor defensive performance to victory against top teams.

Curiously, none of the main component statistics stands out for Duke, which has the fifth best defense in the nation as ranked by Pomeroy. The only stat that kind of stands out is that Duke makes a steal on 12.9% of possessions, which ranks 19th nationally. The other stats, which Pomeroy calls the "four factors," are above average, but none seems to scream "This is why Duke has the fifth-ranked defense." Indeed, just going by raw statistics Duke is ranked 29th, which seems a lot more in line with the components. Pomeroy weights each performance by the strength of schedule, which is how Duke ends up fifth. But if Duke plays exceptionally against poor offenses and poorly against exceptional offenses, weighting by schedule is going to skew our picture of Duke's actual defensive ability.

The conclusion I am forced to draw is that Duke's strong defensive efficiency rating was inflated by a mediocre non-conference schedule (ranked 91st by Pomeroy), and that the defense is incapable of playing at an elite level against top-25 offenses. Last Tuesday night, before the Duke-Carolina game, Mike Krzyzewski spoke to the Cameron Crazies. One thing he said stuck with me: "You guys don't need to chant 'Let's Get Hungry.' This team is hungry, believe me they are hungry." And so when I watched the dismantling of Duke's defense by UNC and by Boston College four days later, I couldn't help but think of this South Park clip:

If Duke is interested in keeping their Pomeroy ranking high, may I humbly suggest abandoning the Atlantic Coast Conference, with a move to the less elite A-10. Or at least the Big Ten. In the meantime, I'm looking forward to the Annual Duke Sweet Sixteen Bowout. Because at this point in the season, even that's a reach for the Blue Devils.

Saturday, February 07, 2009

A Fraudulent Response

Disclaimer: This post is not entirely rational, nor is it backed up by the normal statistics readers of this infrequently updated blog have come to expect. This post is a direct response to this one, by fellow Yankee fan and poster on Bronx Banter, Chyll Will. My previous feelings on the subject can be viewed right here.

I understand the anger. Well, strike that- what I understand is that someone else could have a different reaction than I do to the A-Rod steroid news, and that the reaction may include anger. To be honest, my reaction does include anger, but not at the player or the media. My anger is with the fans who enable the latter to apply uneven standards to the former without repercussion. Those fans then feign their own anger while conveniently ignoring parts of the issue that don't fit into a nice tidy box.

Look, we knew that Alex Rodriguez did ungodly things to his body. He is a professional athlete performing at the highest level of a generation, rising above his peers on the strength of.... something. We (speaking metaphorically as the collection of baseball fans, who apparently voted behind my back to oppose steroid use) were perfectly fine when that something was working out 14 hours a day, with protein shakes or herbal supplements or vitamin cocktails or creatine or monosodium glutamate. Before today, we knew that Alex Rodriguez was doing everything he possibly could to enhance his performance. But now, because there was a chemical that entered his body that was on a list only sort of implicitly banned by baseball at the time, suddenly he is dishonorable for trying to enhance his performance?

An analogy: A leading oncologist at a major research university knows she is on the verge of a breakthrough in cancer treatment, but just can't seem to figure out the last piece of the puzzle. Let's also say there's a chemical that improves focus and mental ability by 300%, but it was implicitly banned by the American Medical Association in a vague statement ten years ago. There are random drug tests, but zero consequences for failing that test. She decides to take the drug and with an improved focus has one of the best years an oncologist has ever had. Her new treatment makes a lot of people feel better. Six years later, the results of her random drug test are made public, and angry cancer patients start showing up at the hospital to boo the poor doctor all day.

Now, replace all the medical terms with baseball terms. Personally, I see absolutely no difference. I think that many sports fans realize the hypocrisy and choose to ignore it. We even practice hypocrisy within the world of sports: when was the last time a football player was hauled in front of Congress, lied about steroid use, had his career ruined, and made Congress angry enough to waste millions of taxpayer dollars on a perjury trial during an economic depression?

I am not a parent yet either. But when I am, the message to my children will be this: We are not slaves to our genetics. Depending on our environment, our talents can either be destroyed or enhanced, and we are the engineers of that decision. So if there is something at which you have talent, something at which you want to be the best you can, by all means make your environment an enhancing one. Having a few sports "heroes" allegedly "cheating" in order to entertain us is not a dangerous message to our children. We want, and in some cases need, to live vicariously through them as they do whatever is necessary to be the best that they can be. The far more damaging lesson is this: "We are a nation of innocence before guilt, of treating people equally. Except, of course, when it comes to home run hitters. Then, whatever any reporter spews is direct evidence of guilt, but only if a player is sorta disliked anyway."

In conclusion, my thoughts restated in more easily accessible bullet form:

  • Why does enhancement of performance through some chemicals count as "cheating," while using other kinds of chemicals counts as "training"?
  • If it's simply a legality issue: Many of the supposed Performance Enhancing Drugs (such as HGH) have never been proven to have an impact on muscle growth or performance, enhancement or otherwise. Why, then, are they on the banned list?
  • If it's the "do anything to get ahead, unfair advantage" issue that is dishonorable: why don't we as fans collectively look down upon players who use herbal supplements or hyperbaric chambers to try and get ahead? To make this argument look even more ridiculous, what about players who pray to God for a better performance? I think Congress needs to waste taxpayer money investigating whether God is indeed enhancing the performance of faithful athletes.
  • Shouldn't players who took ineffectual drugs be treated differently, like a dumbass teenager who smokes a dime bag of oregano?
  • For those "worried about the message to children": Why are people worked up about baseball players who are on steroids, but continue to tolerate/celebrate football players who are known steroid users, such as Shawn Merriman?
  • Players who are caught breaking the steroid rules all face the same official penalty from the league. However, this fairness is not extended to the "court of public opinion" whose instincts are guilt before innocence. For some reason, particular players return to good opinion among fans while others wallow in collective loathing from fans. Shouldn't all suspected steroid users face the same public opinion?
  • And finally, to those who whine about the integrity of the game, or about the sanctity of statistics: clearly, you have no knowledge about the history of baseball.

Monday, January 05, 2009

Reason #812 Why Polls Are Stupid

College basketball polling is stupid; we all realize this, and yet we all are somehow swayed by its charm. These days the rankings matter mostly to fans of a team, for bragging rights, and to ESPN, looking to hype a game between two "top ten teams." You have the preseason rankings, which are about as predictive as picking the Final Four teams out of a hat (so long as the hat only has the names of schools from the BCS conferences). Midseason, the formula for figuring out how a team will do in next week's ranking is simple: did your team lose this week? If yes, then they will be ranked lower; if not, well then they'll probably have the same ranking next week, depending on how the teams above them did. The rankings are not even meaningful at the end of the regular season, because the rankings that might be used by the NCAA selection committee when it fills the brackets on Selection Sunday are from the previous week, and don't reflect any conference tournament results! Luckily, reporters who have infiltrated the committee report that RPI and actual team performance matter a whole lot more than what seventy sportswriters have to say.

To me, it's particularly hilarious watching teams rise and fall solely based on how they performed on Sundays. A team can lose two home games during the week, then knock off the #1 team in the country on the road on Sunday night, and suddenly they are everyone's sleeper. What happened this week is not as extreme an example, but it is pretty telling about the minds of the coaches and writers who vote in the polls.

Let's look at Boston College. Before last night they were 13-2, having won ten in a row since back-to-back November losses to Saint Louis (on the road) and Purdue (neutral court). Their other wins last week included a twenty-point home win over San Francisco followed by a thirteen-point home victory over Seton Hall two days later. Before the game they were ranked outside the top 75 teams in the country according to one statistical ranking. Most importantly, they had never received a single vote in either the AP or Coaches' poll in the first eight weeks (including the preseason poll).

Then, in Chapel Hill yesterday, BC knocked off top-ranked UNC, 85-78. Less than 24 hours later, ESPN published the new college basketball rankings. Boston College miraculously generated 90 points in the Coaches' poll, vaulting them from nowhere into 24th. The AP poll was even more generous, vaulting BC to 17th! From not even making the NCAA tournament to the Sweet Sixteen, all thanks to a poor shooting night by Tyler Hansbrough! Statistically, BC is ranked only 52nd in the nation by that same statistical ranking, alongside such giant killers as UAB and Iowa. Jeff Sagarin, whose rankings are based on teams played and score margin, has BC ranked 37th.

I write this not to belittle the accomplishment of Boston College. It was a gritty win, and it continues to prove that the ACC is the best conference anywhere. But I tend to think that any unranked team in a BCS conference would have leapt to 17th in the AP poll the day after beating God's Gift to Basketball Teams, the 2009 UNC Tar Heels. The writers are not studying game film or getting cross-eyed from looking over statistics to determine which are the 25 best teams in the nation at this moment. What they know, for certain, is that UNC is the best team ever, and Boston College beat them, in Chapel Hill, and so the Eagles must be this great team we didn't know about. UNC is still great, you see, and the voters are still going to have them ranked third, because it's impossible to believe that they overrated UNC, of course.

Let's contrast this with what happened after ESPN's touted "game of the week," between #6 Duke and #7 Xavier at the Meadowlands on December 20. The network actually cut away from the game with Duke up by 25+ points in the second half, but the writers and coaches chose not to use Duke's stomping of the #7 team in the country as a referendum proving Duke was the New Team To Beat. Instead, Duke rose only to #5 in the rankings while Xavier was pushed to #14. The keen poll-watcher would know that there is no way the voters could have pushed Duke any higher, because all four teams ahead of them (UNC, UConn, Pitt, and Oklahoma) were undefeated, and how could a one-loss team possibly be better than an undefeated team?!?

I'm not so much griping as a Duke fan; the loss to Michigan did the same thing for the Wolverines in early December that this win has done for Boston College, before other losses knocked them back out of the top 25. Another hilarious poll oddity happened soon after: Duke dropped from #4 to #7 in both polls following the loss in Ann Arbor, and a week later gained a place despite not playing a single game! I suppose I don't really have a point, just that all these rankings bemuse me. They don't matter one bit, and I feel sorry for the fans who really care about them. When it comes to major college sports, though, I suppose I should feel lucky that the championship in the sport I care about is decided on the court, rather than by a bunch of coaches before the season even starts...