Tuesday, January 30, 2007

Diving into DYJS Data, Part 1

Welcome to everyone who has wandered over from waswatching.com after Steve graciously linked to me tonight. Now that I've introduced DYJS, I thought I'd post some interesting preliminary findings. My goal when I set out on this giant data-gathering mission was to find new correlations to October success. I'm going to need more than six teams' worth of data (the 2006 playoff teams) to get that far, so for now I'd like to look at how well DYJS explains 2006 regular season success.

From the data summary, runs scored per game correlates with wins at r = 0.62. To explain, using my (admittedly limited) knowledge of statistics: r measures how strongly two variables move together. An r value of 1.00 means the two variables (in this case W and RS/G) rise and fall in perfect lockstep, while 0.00 means there is no linear relationship between them. The share of the variation in wins that a variable "explains" is r squared, so a team's aggregate offensive output accounts for about 38% (0.62 squared) of the variation in win totals. Many of BP's regular season stats, when compared to post-season success, had an r value close to 0.00. Anyway, I'd like to use this r = 0.62 value as a benchmark for seeing how well each of the metrics I've co-created relates to wins.
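For anyone who wants to check this kind of number at home, here is a minimal Python sketch of the correlation calculation. The function name pearson_r and the sample numbers are my own illustration (the numbers are made up, not actual 2006 team data). It also prints r squared, which is the figure usually read as the share of variation "explained."

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from each mean (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up numbers for illustration only, NOT actual 2006 team data.
runs_per_game = [4.2, 4.6, 4.9, 5.1, 5.4, 5.7]
wins = [70, 78, 76, 88, 85, 97]

r = pearson_r(runs_per_game, wins)
print(f"r = {r:.2f}, r squared = {r * r:.2f}")
```

Swapping in a real column of RS/G and a real column of wins should reproduce the r value from the data summary, assuming the summary uses the same Pearson formula.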

Standard deviation of runs scored on a day-to-day basis was the original aim of this story. And as I pointed out in the previous post, the Yankees did lose to a team with a lower standard deviation of runs scored. However, here's where my statistics knowledge can get me into trouble, because I know just enough to draw bad conclusions. What I'm wondering is what effect the inability to score fewer than zero runs has on the overall picture. In other words, does the Tigers' lower standard deviation come simply from the fact that they scored fewer runs, overall, than the Yankees? Any help from real statisticians would be appreciated.
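One common way to compare spread between teams that score at different levels is the coefficient of variation, which is the standard deviation divided by the mean. The sketch below is only an illustration of that idea, not an answer to the zero-floor question, and the two game logs are invented (the second is simply the first with every score doubled).

```python
from statistics import mean, pstdev


def coefficient_of_variation(runs_by_game):
    """Standard deviation of runs divided by average runs per game."""
    avg = mean(runs_by_game)
    if avg == 0:
        raise ValueError("average runs is zero; ratio is undefined")
    return pstdev(runs_by_game) / avg


# Invented game logs for illustration only, NOT real team data.
lower_scoring = [2, 4, 4, 4, 5, 5, 7, 9]
higher_scoring = [2 * runs for runs in lower_scoring]

for label, log in (("lower", lower_scoring), ("higher", higher_scoring)):
    print(label, pstdev(log), coefficient_of_variation(log))
```

In this made-up case the higher-scoring team has twice the raw standard deviation but exactly the same relative spread, which is the kind of effect the question above is asking about.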

With regard to DYJS, the number that immediately jumps out at me is 0.89. That is the correlation between "DYJS O + D" and wins. What that means is that the number of times a team is able to Do Their Job at the plate and on the mound goes a long way towards explaining the number of wins that team accumulates. This seems intuitive to me, but I co-invented the statistic, so maybe it isn't so for others. At any rate, I think it shows that I am on the right track with this metric, since it's so closely related to wins.

Looking specifically at offense and defense, I find that Doing Your Job on the mound correlates much better to wins (r = 0.71) than does Doing Your Job at the plate (r = 0.59). To look at this another way, I've ranked each team by its DYJ percentage on offense and defense. The teams are ranked by wins, and I've compared each team's rank in wins to its DYJS ranks:

So, what I believe this is telling me is that in the 2006 American League, it was much more important to Do Your Job on the mound than at the plate. In fact, for the Yankees it was crucial: there were 115 games when the Yankees Did Their Job at the plate in 2006, and they won 87 of those games (that's where the 75.7% comes from). On the mound, however, they Did Their Job in 79 games and won 67 of them (84.8%).
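The two Yankees percentages can be checked with a couple of lines of Python. The helper name win_pct is just something I made up for this sketch; the inputs are the game and win counts quoted above.

```python
def win_pct(wins, games):
    """Winning percentage, rounded to one decimal place."""
    if games <= 0:
        raise ValueError("games must be positive")
    return round(100 * wins / games, 1)


# Counts quoted in the paragraph above.
print(win_pct(87, 115))  # DYJ at the plate: 75.7
print(win_pct(67, 79))   # DYJ on the mound: 84.8
```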

In the near future, then, I'm going to take a close look at my numbers for pitching, rather than offense, because I believe I am pushing toward the following conclusion: In terms of regular season success, the ability to keep the opposing team from reaching 5 runs in any given game was a crucial aspect of win total. More to come....
