My Google Alert screamed this afternoon with a new hit for Immaculate Inning: this article by Gregg Found at ESPN, which mentions the 44th Immaculate Inning in major league history. Congratulations to Rafael Soriano, the Tampa Bay Rays closer who effortlessly dispatched the Los Angeles Angels of Anaheim, California, USA, Earth by striking out the side on nine pitches last night. His victims were Erick Aybar, pinch hitter Mike Napoli, and Peter Bourjos.
Soriano is not the first pitcher to end a game with an Immaculate Inning; it had happened eight times previously, and two of those (by Ron Guidry and Trevor Wilson) finished off complete games. Closers, meanwhile, don't need to play the cat-and-mouse game of wasting pitches, and there is a high priority placed on not walking anyone. It therefore makes a bit of sense that Soriano joins closers such as Jason Isringhausen in finishing off a game with nine straight strikes.
We honor the Immaculate Inning here because it is a measure of the dominance a pitcher can have over the batters in an inning, and along those lines, Soriano's feat stands out. The Rays' closer got seven swinging strikes (including one foul by Napoli) out of the nine strikes, and all three batters swung through the final pitch. Napoli's at-bat was also interesting: as a pinch hitter he seemed predestined to swing, missing wildly on breaking pitches for strikes one and three. He just missed a fastball right down the middle on Soriano's second pitch; a few centimeters over and we're not talking about an Immaculate Inning.
Soriano picked up his league-leading 38th save for the effort, and the victory secured a tie in the AL East with the Yankees, making this Immaculate Inning, the first in nearly a year, one of the most "clutch" in the history of the feat. Honestly, despite two perfect games, how could we call 2010 the Year of the Pitcher without at least one Immaculate Inning?
Finally, we'll have to agree with Found as he notes that an Immaculate Inning is "a feat with a cool-sounding moniker to match its impressiveness."
Tuesday, August 24, 2010
Saturday, June 26, 2010
The Worst No-Hitter
Yesterday evening I was packing and decided to throw on a baseball game as background noise. Since the Yankees were playing on the West Coast, I went for the Arizona-Tampa Bay contest. I turned off the game in the third inning, after Edwin Jackson weaseled his way out of a bases-loaded jam. I thought nothing of the game until I saw on Baseball Tonight that he had thrown a no-hitter! My first thought was, is this the worst no-hitter of all time?
Jackson walked eight batters in the game, and threw a career-high 149 pitches, just 79 of them for strikes. At one point in the third inning, the Win Probability actually favored Tampa Bay, thanks to Jackson walking the bases loaded! With two outs in the ninth, Jackson walked pinch hitter Willy Aybar on four pitches, which was Aybar's seventh walk of the season. No doubt about it, Jackson pitched rather poorly and still picked up the no-no. Is it possible to have a worse no-hitter than Jackson's?
First of all, some ground rules: Major League Baseball defines a no-hitter as "a game in which a pitcher, or pitchers, gives up no hits while pitching at least nine innings. A pitcher may give up a run or runs so long as he pitches nine innings or more and does not give up a hit."
This excludes some rather infamous no-hit performances, such as the one by Andy Hawkins, who allowed four (unearned) runs in a 4-0 loss while pitching for the hapless 1990 Yankees. In a baseball-reference play index search, I asked for games of 9 IP or more since 1920 with zero hits. I then sorted this list by ascending Game Score, a measure invented by Bill James to assign a single number to a starting pitcher's performance. The best nine-inning performance since 1920, according to Game Score, is Kerry Wood's 20K game in 1998, with a score of 105. The measure is very results-oriented, since it places high value on the number of innings pitched and the number of runs allowed; shutouts are practically guaranteed to be above 80.
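For readers who want to check the numbers, here is a minimal sketch of the Game Score arithmetic as Bill James's formula is usually stated: start at 50, add a point per out, two points per inning completed after the fourth, and a point per strikeout, then subtract two per hit, four per earned run, two per unearned run, and one per walk. The function name and argument names are my own, not anything from baseball-reference.

```python
def game_score(outs, hits, earned_runs, unearned_runs, walks, strikeouts):
    """Bill James's Game Score for one start (as the formula is usually stated)."""
    full_innings = outs // 3
    score = 50
    score += outs                           # +1 for every out recorded
    score += 2 * max(0, full_innings - 4)   # +2 per inning completed after the fourth
    score += strikeouts                     # +1 per strikeout
    score -= 2 * hits                       # -2 per hit
    score -= 4 * earned_runs                # -4 per earned run
    score -= 2 * unearned_runs              # -2 per unearned run
    score -= walks                          # -1 per walk
    return score


# Kerry Wood, 1998: nine innings, one hit, no walks, 20 strikeouts
print(game_score(outs=27, hits=1, earned_runs=0, unearned_runs=0,
                 walks=0, strikeouts=20))   # 105

# A nine-inning no-hit shutout starts from 50 + 27 + 10 = 87,
# so only walks and strikeouts move the number from there.
print(game_score(outs=27, hits=0, earned_runs=0, unearned_runs=0,
                 walks=3, strikeouts=0))    # 84
```

The second call uses the line from the Holtzman game discussed in this post (three walks, no strikeouts), which shows why a no-hitter can land in the mid-80s: every walk costs a point and there are no strikeouts to win it back.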
Click the link for the results, which show Jackson's game last night to be the fourth-worst no-hitter on record, in Game Score. Three games tied with a game score one unit worse:
George Culver, July 29, 1968. Pitching for the Reds, his second of what would be five teams in the pre-free agency era, Culver was nothing short of mediocre in this game, at least for someone who pitched a no-hitter. He walked four in the game but faced 34 batters; one of the extra men (Dick Allen) got aboard thanks to a throwing error by third baseman Tony Perez, and reached second on another error by shortstop Woody Woodward on the same play. Allen then reached third base on a groundout and scored on a sac-fly by Cookie Rojas.
Culver allowed another unearned baserunner in the third inning, with Phillies' starter Chris Short reaching on catcher's interference by Pat Corrales. Culver then retired eleven straight before walking two men with two outs in the sixth, needing a strikeout to get out of that jam. Culver's final two baserunners got on via walks to open the eighth inning, but Culver induced some ground balls to get out of that one, and breezed through the ninth for the no-hitter.
Bill James calculated how likely it was for each pitcher to have thrown a no-hitter, and George Culver came out as one of the ten men least likely to have a no-no.
Ken Holtzman, August 19, 1969. Of the three no-hitters with a Game Score of 84, Holtzman's seems the most impressive. Facing Phil Niekro and staked to a 3-0 lead after the first inning, Holtzman walked just three in his no-hitter. The reason this game has such a low Game Score is that Holtzman struck out precisely zero hitters! Not exactly Nolan Ryan, Holtzman struck out just 5.0 per nine innings in his career, despite playing in an extremely pitcher-friendly era. Only one other no-hitter since 1920 has featured zero strikeouts, by Sad Sam Jones in 1923.
Sabermetrics has taught us the Gospel of the Three True Outcomes, which holds that a pitcher can (really) control only three results of an at-bat: a strikeout, a walk, and a home run. There are fine tunings in there, such as GB/FB rate and his own fielding ability, but once the ball is put in play, a lot is left up to his defense. On that day in 1969, the Cubs' defense shone, and Holtzman never allowed more than one baserunner in an inning. Twelve groundouts, twelve flyouts, and three pop-outs formed an even split among the batted balls in this game. Interestingly, while giving up fly balls tends to raise a pitcher's home-run rate and is therefore bad for his overall success, if one wants to pitch a no-hitter, a fly ball is far more likely to turn into an out.
The shimmering defense was never on finer display than on the last out of the seventh inning, when outfielder Billy Williams climbed the ivy at Wrigley to pluck a home run away from Henry Aaron. Holtzman wrote in Chicago Cubs: Memorable Stories of Cubs Baseball that the home fans were reminding him that he had a no-hitter every inning following the third (so much for jinxes!). Holtzman would go on to throw another no-hitter two years later for the Cubs, and was later elected to the Jewish Sports Hall of Fame.
Joe Cowley, September 19, 1986. From the box score, this is my vote for the worst official no-hitter on record. Cowley registered seven walks (and eight strikeouts), and among the 145 no-hitters on record with Game Scores below 100, Cowley's is one of only two in which the no-no-man gave up an earned run. Much like Jackson, Cowley walked the bases loaded to start the sixth, and then gave up three straight fly balls, the second of which scored Reggie Jackson. Cowley also had two men on in the third, and like Jackson walked a man in the ninth, though this runner was erased on the game-ending double play.
In fact, Cowley threw the worst no-hitter of all time, and then never won another big league game. He lost his next six decisions and was out of baseball within a year of this game. Hopefully, Edwin Jackson can avoid this same fate.
Labels:
Baseball Statistics,
no-hitters,
Rare Events
Thursday, April 01, 2010
Confessions of a Duke Alum
It is time for the truth: the reason why Duke is the #1 team in the nation is because we at the Immaculate Inning paid off Ken Pomeroy.
Since 2007, these e-pages have been graced with seemingly logical discussion of the Duke University men's basketball team. We dissect the game from various angles using the statistics-- and those stats, facts based on what happened in each game, come from one website: kenpom.com. The obsession was so obvious that one time in 2007 the blog was briefly shut down by Google because they thought it was a spam link generator:
Re: #118781655 Blogger beta non-spam review and verification request: mehmattski.blogspot.com
Hello,
Your blog has been reviewed, verified, and cleared for regular use so that
it will no longer appear as potential spam. If you sign out of Blogger and
sign back in again, you should be able to post as normal. Thanks for your
patience, and we apologize for any inconvenience this has caused.
Sincerely,
The Blogger Team
Over the years, some have wondered how it is that mediocre Duke teams were still ranked near the top of Pomeroy's rankings. All this season, fans dismissed the rankings because "everyone knew" that Kansas, Kentucky, and Syracuse were the best teams in the land. Yet, for all but a few weeks of 2010, the Duke Blue Devils were ranked #1 on kenpom.com; no ranking so obviously flawed could be trusted! Well, now you know the reason; with our beloved Blue Devils trashed in the national media for being "alarmingly unathletic," and unable to compete because of our racist coach, we wanted to feel like we were the best at something.
Some of you may ask for proof, so here is the little "Christmas Present" I gave to Mr. Pomeroy. His demands were quite specific.
Most of you will not be surprised by this revelation; it is of course common knowledge that Duke receives special treatment from the NCAA, CBS, and the officials. (Georgia Tech superfreshman Derrick Favors said after the ACC Championship game: "It was very frustrating. We played good defense, and the referees bailed them out.") It's such common knowledge that even after six seasons of "adjustment" by the officials, keeping Duke out of the Final Four, the first explanation for Duke's trip to Indianapolis was that "Duke Gets All the Calls." Of course we do, and there's a very good reason for that too! First, the facts:
It's no secret that referees are corruptible. Tim Donaghy, who was caught wagering on the basketball games he officiated, has claimed that the problem is not an isolated one; he claims that 13 NBA officials are involved in wagering on the game. He further accuses the NBA of "turning a blind eye," because they are more interested in the money than in fairness. The distrust this has created in the casual sports fan has trickled down to mingle with the "Duke Gets All the Calls" meme; no longer is it ridiculous tinfoil hat talk. Real referees are really swayed by real cash.
It's also no secret that Duke alums make a lot of money; in 2009 USA Today ran a "payscale bracket" which picked teams based on median graduate salary. Duke won, and it wasn't close. The Duke Endowment, despite the economic downturn, is still worth nearly $3 billion. The Duke Annual Fund employs banks of undergraduates to call previous donors; this year I made a donation that the university recorded as going to "The Nicholas School of the Environment," and they even made it look good by sending me a thank you letter. All of it was done with a wink, and an understanding.
Truth #2: Every year, Duke alums make a donation to the "Referee Fund," which goes directly to the National Association of College Basketball Referees. Our sizable contribution is made with good faith understanding that it will be paid back with whistles.
We are human, though; we do feel a slight bit of guilt every time we write that check. For six years we had to sit on our hands because the officials said that it would be too obvious to hand Duke a title in 2006, so soon after the officials handed them the 2004 and 2001 Final Four runs. Many alums canceled their contributions after the Duke-UConn game in 2004, and the Duke-LSU game in 2006. It seemed that in those games, the referees were blatantly defying their loyal contributors!
Finally, we have some retribution. Handed the easiest bracket since UNC "won" the 1924 national championship by playing exactly zero postseason games, Duke sailed to the Final Four beating a couple of high school teams and your mother's quilting club. It almost wasn't enough, so the refs did have to step in and prevent Baylor from rebounding any of their (fairly frequent) missed shots. Now, we are two games away from a fourth national championship. But at what cost? We felt it was time to stand up for what was right and come clean.
To the referees in the Final Four: May you call every touch foul and carry on Duke; may every block-charge call go against us. May you clear your conscience and hand West Virginia twice as many free throws. It is your massive control over the game-- more than the coaches, more than the players themselves, you referees always decide the outcome of every game. So, in the name of justice, and as a Duke alum and current student who will be in attendance, please make West Virginia win on Saturday. Only then can we rest in peace.
Wednesday, March 17, 2010
2010 NCAAT Fan Anxiety Matrix
Simulating the NCAA tournament has been pretty popular this year, although I imagine that many of the Duke haters out there are not very satisfied with the results. The results, as expected, agree with the discrete log5 projections published by Basketball Prospectus (Click the links for South, East, West, and Midwest region log5 predictions). The difference between those predictions and my simulations is the type of data that can be pulled out of the simulations.
For instance, below is a list, for each of the 64 teams in the field, of the five opponents most likely to end that team's season. For example, while Duke was found to win the whole tournament in 24% of simulations, who did they lose to in the other 76% of the one million simulations?
As you can see, California is the team most likely to send Duke home with a disappointing season (and give plenty of bloggers and media mouthpieces lots to puff their chests about). It's not surprising to see that Kansas is one of the top five teams to knock off Duke, since Duke averaged 3.5 wins per simulation, and got to the final game very often, as did Kansas.
We're calling this presentation the Fan Anxiety Matrix. Which team is the most likely to knock out your favorite team? The second tab of the spreadsheet above shows every team's chances of being knocked out by every other team. Click here to see the full spreadsheet.
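The bookkeeping behind a table like this is simple: every time a simulated game is played, record who sent the loser home. Below is a minimal sketch of that idea, not the script that produced the spreadsheet. The four-team field, the ratings, and the function names are all hypothetical, and the ratings are assumed to be win percentages strictly between 0 and 1 that can be fed to the log5 formula.

```python
import random
from collections import Counter


def log5(a, b):
    """Chance that a team with win percentage a beats a team with win percentage b."""
    return (a - a * b) / (a + b - 2 * a * b)


def knockout_counts(teams, ratings, n_sims, seed=0):
    """For every team, count how often each opponent ended its season.

    `teams` is in bracket order (neighbours meet in round one) and its
    length is assumed to be a power of two.
    """
    rng = random.Random(seed)
    knocked_out_by = {team: Counter() for team in teams}
    for _ in range(n_sims):
        alive = list(teams)
        while len(alive) > 1:
            survivors = []
            for i in range(0, len(alive), 2):
                a, b = alive[i], alive[i + 1]
                if rng.random() < log5(ratings[a], ratings[b]):
                    winner, loser = a, b
                else:
                    winner, loser = b, a
                knocked_out_by[loser][winner] += 1
                survivors.append(winner)
            alive = survivors
    return knocked_out_by


def anxiety_row(team, knocked_out_by, n_sims, top=5):
    """The opponents most likely to eliminate `team`, as percentages of all simulations."""
    return [(opponent, 100.0 * count / n_sims)
            for opponent, count in knocked_out_by[team].most_common(top)]


# Hypothetical four-team field with made-up ratings
teams = ["Alpha", "Bravo", "Charlie", "Delta"]
ratings = {"Alpha": 0.95, "Bravo": 0.60, "Charlie": 0.85, "Delta": 0.80}
counts = knockout_counts(teams, ratings, n_sims=10_000)
print(anxiety_row("Alpha", counts, n_sims=10_000))
```

Whatever percentage is missing from a team's row is the share of simulations in which nobody knocked that team out, which is to say it won the title.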
With a focus on the ACC teams, here are some pie charts to gaze at. Maryland fans will be pleased to see that Duke has only a 2% chance of ending their season. Unfortunately, Maryland also has only a 2% chance of winning the national championship. The other four ACC teams have an uphill battle to make it to the second weekend (click a Fan Anxiety Index to enlarge):





Labels:
Fan Anxiety Matrix,
NCAA Tourney,
Pomeroy Stats,
simulation
Sunday, March 14, 2010
2010 NCAA Tournament Simulations
Be sure to also check out the "Fan Anxiety Matrix": who will your team lose to this March?
The brackets are set, and so for college basketball stat nerds, that means simulations. A number have already popped up: 5000 simulations using Sagarin's predictor rating; an online tool for simulating one random bracket, using Pomeroy's statistics.
As I did last year, I take both approaches one step further: using Pomeroy's offensive and defensive efficiency ratings, and the log5 prediction method, I simulated the 2010 NCAA tournament one million times. My script tabulated the results below. (You can also access them in Google Spreadsheet form by clicking here).
Each column is a round of the tournament; each value is the percentage of the one million simulations that a team reached a given round. On the right-hand side is the average number of wins each team in the tournament had in the simulations.
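For the curious, here is a minimal sketch of how a simulation like this can be put together. It is not my actual script. The four-team field and its efficiency numbers are made up, the function names are hypothetical, and the Pythagorean exponent of 11.5 used to turn offensive and defensive efficiency into a win percentage is an assumption on my part rather than something taken from kenpom.com. The log5 step itself is the standard formula.

```python
import random
from collections import defaultdict


def pythag(adj_off, adj_def, exponent=11.5):
    """Expected win percentage from efficiencies (the exponent is an assumption)."""
    return adj_off ** exponent / (adj_off ** exponent + adj_def ** exponent)


def log5(a, b):
    """Chance that a team with win percentage a beats a team with win percentage b."""
    return (a - a * b) / (a + b - 2 * a * b)


def simulate_bracket(teams, ratings, rng):
    """Play one single-elimination bracket and return the wins per team.

    `teams` is in bracket order, so neighbours meet in round one.
    """
    wins = defaultdict(int)
    alive = list(teams)
    while len(alive) > 1:
        next_round = []
        for i in range(0, len(alive), 2):
            a, b = alive[i], alive[i + 1]
            winner = a if rng.random() < log5(ratings[a], ratings[b]) else b
            wins[winner] += 1
            next_round.append(winner)
        alive = next_round
    return wins


def run_simulations(teams, ratings, n_sims, seed=0):
    """Return (percentage of simulations winning each round, average wins) per team."""
    rng = random.Random(seed)
    n_rounds = len(teams).bit_length() - 1   # field size assumed to be a power of two
    won_round = {team: [0] * n_rounds for team in teams}
    total_wins = {team: 0 for team in teams}
    for _ in range(n_sims):
        for team, wins in simulate_bracket(teams, ratings, rng).items():
            total_wins[team] += wins
            for r in range(wins):
                won_round[team][r] += 1
    pct = {t: [100.0 * c / n_sims for c in won_round[t]] for t in teams}
    avg = {t: total_wins[t] / n_sims for t in teams}
    return pct, avg


# Hypothetical four-team field: (offensive efficiency, defensive efficiency)
efficiencies = {"Alpha": (120.0, 88.0), "Bravo": (105.0, 100.0),
                "Charlie": (112.0, 92.0), "Delta": (110.0, 95.0)}
ratings = {team: pythag(o, d) for team, (o, d) in efficiencies.items()}
pct, avg = run_simulations(list(efficiencies), ratings, n_sims=10_000)
for team in efficiencies:
    print(team, [round(p, 1) for p in pct[team]], round(avg[team], 2))
```

Each list printed per team corresponds to one row of the table: the share of simulations in which the team won round one, round two, and so on, followed by its average number of wins.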
Last year, hardly any first round games were likely to be upsets based on the simulation. This year is much different, thanks to several major discrepancies between Pomeroy's rankings and the seedings made by the committee. Brigham Young, in particular, is seventh overall at kenpom, while stuck in a 7-seed in this year's tournament. The best games to watch this Thursday and Friday will be the 6-11 games, as all four are slated to be near coin-flips.
The intuition here is to do a face-slap and say "Duh, the teams at the top of the Pomeroy rankings have the best chance in simulations using the Pomeroy rankings!" That dismissal would miss several key features of the simulation, and one interesting thing to do is to see how the simulations correspond to our gut instincts about the basketball matchups in each game. For example, Kentucky and Syracuse have rough roads to the national title because of very high (about 25%) chances of losing in the second round. Florida State is the culprit for Syracuse; if you combine that information with the possibility that Arinze Onuaku will not play this weekend, an FSU-Syracuse game on Sunday suddenly gets very interesting.
Kentucky could run into Texas in the second round, and the Longhorns are ranked much higher by Pomeroy than in the RPI or by humans (both the selection committee and the polls). And while Texas has been a bit of an enigma to the national media this year, it is clear that they possess the talent to be efficient on both ends of the floor. Kentucky's road is further blocked by Wisconsin, a team actually ranked higher than the Wildcats. That Sweet Sixteen matchup would be really bruising on Kentucky's boards, as the nation's #2 offensive rebounder (Collins) takes on the Badgers' nation-best defensive rebounding squad.
The Pomeroy rankings are typically recognized as a successful post-season analysis of the teams, including the tournament games. All six NCAA champions since Pomeroy's website launched were ranked in the top two post-season, and in the top 15 for both offense and defense. However, the accuracy of the pre-tournament stats is a bit shakier; last year the "best simulation" out of one million got 53 games right, and the simulations averaged 37 correct games. Is it a lack of complete data, a flaw in the system's ability to prognosticate, or just the general stochasticity of the NCAA tournament?
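Scoring a simulation against what really happened is just a matter of comparing winners game by game. The sketch below shows one way to get a "best" and an "average" number of correct games. It assumes a game counts as correct only when the simulated winner of that bracket slot matches the actual winner; the eight-team field, the ratings, and the "actual" results are invented for illustration.

```python
import random


def log5(a, b):
    """Chance that a team with win percentage a beats a team with win percentage b."""
    return (a - a * b) / (a + b - 2 * a * b)


def play_bracket(teams, ratings, rng):
    """Return the winner of every game, round by round, in bracket order."""
    winners = []
    alive = list(teams)
    while len(alive) > 1:
        alive = [a if rng.random() < log5(ratings[a], ratings[b]) else b
                 for a, b in zip(alive[0::2], alive[1::2])]
        winners.extend(alive)
    return winners


def correct_picks(simulated, actual):
    """Number of game slots where the simulated winner matches the real one."""
    return sum(s == a for s, a in zip(simulated, actual))


def score_simulations(teams, ratings, actual, n_sims, seed=0):
    """Best and average number of correct games over n_sims simulated brackets."""
    rng = random.Random(seed)
    scores = [correct_picks(play_bracket(teams, ratings, rng), actual)
              for _ in range(n_sims)]
    return max(scores), sum(scores) / n_sims


# Hypothetical eight-team field (seven games) with made-up ratings and results
teams = ["A", "B", "C", "D", "E", "F", "G", "H"]
ratings = dict(zip(teams, [0.95, 0.55, 0.80, 0.75, 0.90, 0.60, 0.85, 0.70]))
actual = ["A", "D", "E", "G",   # round one winners, in bracket order
          "A", "G",             # round two
          "A"]                  # final
best, average = score_simulations(teams, ratings, actual, n_sims=10_000)
print(best, round(average, 2))
```

With a full 64-team field the `actual` list would hold 63 winners, and the gap between the best and the average score gives a feel for how much of a perfect-looking bracket is luck.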
This will be a very interesting year for seeing the ability of the RPI rating system (used by the committee, and which does not include margin of victory) versus the Pomeroy rating system (which goes to the opposite extreme, including margin of victory with no cap). If the adjusted efficiencies are the more accurate predictor this tournament, then we are likely to have a mad, mad, mad, mad March.
I wanted to get the results out quickly, but I will have some further analysis in this post and others throughout the week. Thanks as always to Ken Pomeroy for his absolutely terrific website; without his stats, none of the fun simulators would exist!
The brackets are set, and so for college basketball stat nerds, that means simulations. A number have already popped up: 5000 simulations using Sagarin's predictor rating; an online tool for simulating one random bracket, using Pomeroy's statistics.
As I did last year, I take both approaches one step further: using Pomeroy's offensive and defensive efficiency ratings, and the log5 prediction method, I simulated the 2010 NCAA tournament one million times. My script tabulated the results below. (You can also access them in Google Spreadsheet form by clicking here).
Each column is a round of the tournament; each value is the percentage of the one million simulations that a team reached a given round. On the right-hand side is the average number of wins each team in the tournament had in the simulations.
Last year, hardly any first round games were likely to be upsets based on the simulation. This year is much different, thanks to several major discrepancies between Pomeroy's rankings and the seedings made by the committee. Brigham Young, in particular, is seventh overall at kenpom, while stuck in a 7-seed in this year's tournament. The best games to watch this Thursday and Friday will be the 6-11 games, as all four are slated to be near coin-flips.
The intuition here is to do a face-slap and say "Duh, the teams at the top of the Pomeroy rankings have the best chance in simulations using the Pomeroy rankings!" That dismissal would miss several key features of the simulation, and one interesting thing to do is to see how the simulations correspond to our gut instincts about the basketball matchups in each game. For example, Kentucky and Syracuse have rough roads to the national title because of very high (about 25%) chances of losing in the second round. Florida State is the culprit for Syracuse; if you combine that information with the possibility that Arinze Onuaku will not play this weekend, an FSU-Syracuse game on Sunday suddenly gets very interesting.
Kentucky could run into Texas in the second round, and the Longhorns are ranked much higher by Pomeroy than in the RPI and by humans (both the selection committee and the polls). And while Texas has been a bit of an enigma to the national media this year, it is clear that they possess the talent to be efficient on both ends of the floor. Kentucky's road is further blocked by Wisconsin, a team actually ranked higher than the Wildcats. That Sweet Sixteen matchup would be really bruising on Kentucky's boards, as the nation's #2 offensive rebounder (Collins) takes on the Badgers' nation-best defensive rebounding squad.
Labels:
NCAA Tourney,
Pomeroy Stats,
simulation
Thursday, March 11, 2010
One Million ACC Tournament Simulations
One year ago, we had some fun with spreadsheets and used a number of different methods to predict the ACC Tournament. Unfortunately, the raw numbers had no way of knowing the status of Ty Lawson's ankle, so the team predicted to win about 30% of the time grabbed the title, while UNC rested for bigger fish. In the past year the number of Pomeroy Disciples has grown, and so traditional "log5" predictions of the conference tournaments can be found all across teh internets (although this one in particular, from Basketball Prospectus, is a must read).
I like to find my little niche here at the Immaculate Inning, and that means simulating the hell out of things. The method is the same as for last year's ACC tournament. This year, I used raw offensive and defensive efficiencies that were tabulated here. This means that a team did not have their stats adjusted for home games or for the strength of opponent: the only values in the stats are points scored (or allowed) per possession. Via the Pythagorean Expectation Formula (with KenPom's exponent for unadjusted efficiencies = 8.5), I calculated a team's "expected winning percentage."
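As a rough sketch of that calculation (the function name and the example efficiencies are hypothetical, and this is the standard Pythagorean form rather than a copy of my script):

```python
def pythagorean_expectation(off_eff, def_eff, exponent=8.5):
    """Expected winning percentage from points scored and allowed per possession.

    exponent=8.5 is the value quoted above for unadjusted efficiencies.
    """
    scored = off_eff ** exponent
    allowed = def_eff ** exponent
    return scored / (scored + allowed)


# Hypothetical team: 1.08 points scored and 0.98 points allowed per possession.
print(round(pythagorean_expectation(1.08, 0.98), 3))
```

A team that scores exactly as efficiently as it defends comes out at .500, and the large exponent means small efficiency gaps turn into big differences in expected winning percentage.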
To determine the chances that team A beats team B, a form of Bayes' formula is applied, which in the stat-head world has come to be known as "The log5 Method." The method could be applied to any scenario where the probability of a single outcome is desired, given the prior probability for each of two alternatives. Here, we have two teams, each with an expected winning percentage, and can calculate the probability of a .900 team beating an .800 team. If we assume that the result of each game is independent, then we can multiply probabilities together to get a team's overall probability of making a certain round.
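Here is a small sketch of the log5 formula and of multiplying probabilities through a bracket. The four-team bracket and the function names are assumptions for illustration only; the real ACC bracket is bigger and has byes.

```python
def log5(p_a, p_b):
    """Probability that a team with winning percentage p_a beats one with p_b."""
    return (p_a - p_a * p_b) / (p_a + p_b - 2 * p_a * p_b)


# The .900 versus .800 example from the text: 0.18 / 0.26, or about 0.692.
print(round(log5(0.900, 0.800), 3))


def title_chance_four_teams(a, b, c, d):
    """Chance that team a wins a bracket in which a plays b and c plays d.

    Assumes game results are independent, so probabilities along a path multiply.
    """
    beat_b = log5(a, b)
    c_advances = log5(c, d)
    # Weight the final by which opponent comes out of the other semifinal.
    win_final = c_advances * log5(a, c) + (1 - c_advances) * log5(a, d)
    return beat_b * win_final


# Hypothetical winning percentages for a four-team field.
print(round(title_chance_four_teams(0.90, 0.60, 0.80, 0.50), 3))
```

Note that the formula is undefined when both teams are 0.000 or both are 1.000, which is not a concern for real efficiency-based percentages.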
Personally I find the method rather deterministic, in what essentially is a stochastic process. Instead, I run the tournament 1 million times and calculate the percentage of simulations in which each team makes it to each round. The results of my simulations for the 2010 ACC Tournament are below:
The spreadsheet has two tabs, one for a simulation done using stats from all games, while the other is for ACC games only. The way to read it is that each team (row) won a certain number of games (0, 1, 2, 3, or 4) in a certain percentage of the 1 million ACC tournaments I simulated. For the top four seeds, the maximum number of wins is 3, while the other eight teams could potentially win four games and the tournament.
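The simulate-and-count idea can be sketched as below. This is a simplified illustration, not my actual script: it assumes a four-team field with made-up winning percentages, no byes, and only 20,000 runs so that it finishes quickly.

```python
import random
from collections import Counter


def log5(p_a, p_b):
    """Probability that a team with winning percentage p_a beats one with p_b."""
    return (p_a - p_a * p_b) / (p_a + p_b - 2 * p_a * p_b)


def simulate_once(bracket, win_pct, rng):
    """Play one single-elimination bracket and return {team: games won}."""
    wins = {team: 0 for team in bracket}
    alive = list(bracket)
    while len(alive) > 1:
        next_round = []
        # Adjacent entries in the list meet each other in every round.
        for a, b in zip(alive[0::2], alive[1::2]):
            winner = a if rng.random() < log5(win_pct[a], win_pct[b]) else b
            wins[winner] += 1
            next_round.append(winner)
        alive = next_round
    return wins


def simulate_many(bracket, win_pct, n_sims, seed=0):
    """Tally, for each team, how many simulations ended with 0, 1, 2, ... wins."""
    rng = random.Random(seed)
    tallies = {team: Counter() for team in bracket}
    for _ in range(n_sims):
        for team, games_won in simulate_once(bracket, win_pct, rng).items():
            tallies[team][games_won] += 1
    return tallies


# Hypothetical field; the list length must be a power of two (byes not modeled).
win_pct = {"A": 0.90, "B": 0.60, "C": 0.80, "D": 0.50}
bracket = ["A", "B", "C", "D"]
n_sims = 20_000
tallies = simulate_many(bracket, win_pct, n_sims)

for team in bracket:
    row = {w: round(100 * n / n_sims, 1) for w, n in sorted(tallies[team].items())}
    print(team, row)
```

Each printed row has the same shape as a spreadsheet row: the percentage of simulated tournaments in which that team finished with each win total.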
Duke's chances of winning the tournament are severely reduced if stats from the entire season are used, and they go from a near 2-to-1 favorite to not even winning a majority of simulations. Part of this has to do with the raw nature of the efficiencies; accounting for Duke's tough schedule (and it was one of the toughest in the country by most any measure: KenPom, Sagarin, RPI) would probably account for most of the discrepancy.
On the other end of the spectrum is Miami, which gained an incredibly high percentage (from 0.1% to 3.4%) because they had a highly positive efficiency margin for all games, while it was highly negative in ACC games only. The Canes played very very well against a bunch of schools I've barely heard of, followed by getting clobbered in ACC play. Their adjusted efficiency margin is still decent due to the ACC games they played, but it's hard to give the full season stats much regard in this instance.
The numbers for the ACC-only simulations differ from those seen at Basketball Prospectus; I imagine most of the differences here also have to do with using raw efficiencies rather than Pomeroy's adjusted numbers. The adjusted numbers, Pomeroy claims, are the best for predicting "the chance of beating an average D-1 team on a neutral floor." The raw numbers, then, are skewed based on home-court advantage, schedule (remember, the ACC is no longer "balanced"), and the overall strength of offenses and defenses a team faces. In particular the predictions differ in that Maryland's chances are reduced, at the expense of better chances for FSU and VPI. Both methods agree that fifth-seed Wake has one hellish path towards an ACC title; much worse than sixth-seeded Clemson's chances. Overall, it will be interesting to see whether raw or adjusted efficiencies do a better job predicting the ACC tournament.
Another advantage that these simulations have is the amount of fun I can have with the results. Below I present the "Fan Anxiety Matrix." Each cell in the Matrix represents the chances that a team (in the rows) loses in the ACC tournament to a specific team (in the columns):
So, Duke's "Fan Anxiety Matrix" says that, in the 37% of simulations when they didn't win the whole thing, the most common opponent taking down the Blue Devils was Maryland (using the ACC stats here). Perhaps no surprise there, but then there were still 8.3% of the simulations in which the Blue Devils fell in the semi-finals to Virginia Tech. Duke's first round game is against either Boston College or Virginia, and the combined percentage of simulations in which the Blue Devils' ACC run ended against those two teams was six percent. It should be of some comfort that UNC's chances of taking down Duke (this would have to be in the finals) clocked in at a tiny 0.029%.
Looking at the matrix, Clemson's path is an interesting one. The Tigers are seeded sixth and must at least pass through NCSU and FSU to get to the semifinals; the Matrix has them losing to these teams 22.5% and 37.1%, respectively. Maryland (22.4%) and Duke (10.1%) also appear in the double-digit percentages as Clemson's final ACC foe, with the remaining 3.2% speaking for Clemson's ACC title chances.
Virginia Tech's bubble position would certainly be helped with a win in their quarterfinal matchup; things are looking up according to the simulations, which have them falling to Duke in the semifinals 50% of the time. Wake Forest has a rough road to the ACC title, as they must win Thursday versus Miami (losing %: 30.5), Friday versus Virginia Tech (39.5%), and Saturday versus (with 94% probability) Duke, who accounted for a further 25% of Wake's losses in the simulations.
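For the curious, a Fan Anxiety Matrix can be tallied by recording who knocks out whom in every simulated game. The sketch below assumes a hypothetical four-team bracket with made-up winning percentages and no byes; the function name `anxiety_matrix` is mine for this example only.

```python
import random
from collections import defaultdict


def log5(p_a, p_b):
    """Probability that a team with winning percentage p_a beats one with p_b."""
    return (p_a - p_a * p_b) / (p_a + p_b - 2 * p_a * p_b)


def anxiety_matrix(bracket, win_pct, n_sims, seed=0):
    """Share of simulations in which each row team is eliminated by each column team."""
    rng = random.Random(seed)
    knocked_out_by = defaultdict(lambda: defaultdict(int))
    for _ in range(n_sims):
        alive = list(bracket)
        while len(alive) > 1:
            next_round = []
            for a, b in zip(alive[0::2], alive[1::2]):
                a_wins = rng.random() < log5(win_pct[a], win_pct[b])
                winner, loser = (a, b) if a_wins else (b, a)
                knocked_out_by[loser][winner] += 1
                next_round.append(winner)
            alive = next_round
    return {
        loser: {winner: n / n_sims for winner, n in row.items()}
        for loser, row in knocked_out_by.items()
    }


# Hypothetical field; the list length must be a power of two (byes not modeled).
win_pct = {"A": 0.90, "B": 0.60, "C": 0.80, "D": 0.50}
bracket = ["A", "B", "C", "D"]
matrix = anxiety_matrix(bracket, win_pct, n_sims=20_000)

for loser in bracket:
    row = matrix.get(loser, {})
    print(loser, {w: round(100 * share, 1) for w, share in sorted(row.items())})
```

Whatever is left after summing a team's row is its share of simulated titles, which is how the "remaining" percentage in the Clemson example above is read.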
For posterity's sake, here are the official Immaculate Inning ACC Tournament Predictions:
Thursday winners: Virginia, Wake Forest, Georgia Tech, Clemson
Friday winners: Duke, Virginia Tech, Maryland, Clemson
Saturday winners: Duke, Maryland
ACC Champion: Duke 75, Maryland 60
Labels:
ACC Tournament,
predictions,
simulation
Friday, March 05, 2010
The Last Time
The average US price for a gallon of gasoline was $1.82.
One share of Google stock was $187.40.
The HMS Scott reveals, via mapping of the seafloor, a 100 m landslide at the epicenter of the deadly 2004 earthquake/tsunami in the Indian Ocean.
Year 4702 (Year of the Rooster) began in the Chinese calendar.
The #1 song on the Hot-100 Billboard charts was "Let Me Love You" by Mario.
The top movie at the box office was "Boogeyman." It was about to be replaced by "Hitch."
Cuba begins a ban on smoking in public places.
President George W. Bush, just two weeks into his second term, announces a tax increase.
Two months before his death, Pope John Paul II allows an American cardinal to give the Ash Wednesday address from the Vatican.
#8 Duke beat #2 North Carolina in Cameron Indoor Stadium, 71-70, after Rashad McCants dribbled the ball off his foot with two seconds left.
Duke students, acting like They Have Been There Before, stay in the stands to sing the alma mater, rather than rushing the court. Students proceed to burn shit in a disorderly fashion.
That was Wednesday, February 9, 2005. The last time Duke beat UNC at Cameron.
Carolina delenda est.