Baseball Articles
Evaluating Defense
Tom Tippett's Thoughts on Defense
Some people argue that it's impossible to measure the defensive performance of baseball players because the statistics available for that purpose are woefully inadequate. If you're talking about traditional fielding stats -- games, putouts, assists, errors, double plays -- I wouldn't go so far as to say that it's impossible, but I would agree that it's not easy.
In this article, we'll look at those traditional fielding stats and talk about what you can and cannot learn from them. We'll look at more modern fielding statistics such as Pete Palmer's Fielding Runs, the zone ratings from STATS Inc., and Bill James' Win Shares. As the providers of a computer baseball game, one of our ongoing tasks is rating players in all phases of the game, including defense, and we'll talk about how we use detailed play-by-play data from STATS to improve our understanding.
Even with these advances, evaluating defense is not an exact science. If you're a glass-is-half-empty sort of person, you could take that to mean it's not worth the effort. But I believe the availability of play-by-play data has raised the level of the water so the glass is now about 90% full, and if you're interested in joining me for a little stroll through the evolution of fielding analysis, I think you'll end up with a better idea of what we can and cannot learn about defense.
Baseball analysis in general
The idea of using statistical measures to assess the ability to succeed in a certain phase of the game is not a radical one. Baseball people have been doing this for over a century to measure batting and pitching performances. They don't, after all, give the batting title to the guy with the prettiest swing, they give it to the player who hit for the highest average. They don't give the Cy Young to the pitcher with the best mechanics or the guy who throws the hardest, they give it to the one who was deemed to be most effective. They look at results, not form or effort or attitude or any of the other things that a player brings to the game.
But for the most part this tradition has extended only to hitting and pitching. Today's announcers and analysts make increasing use of modern measures like on-base percentage and inherited runners to shed more light on those areas of the game, but you never hear a television or radio analyst talk about meaningful measures of baserunning, throwing or defense. Instead, they talk about their impressions of the player -- how fast he looks, his quickness, strength and athleticism -- and say simplistic things like "they're the best fielding team in the league because they lead in fielding percentage."
Because we do our own analysis, we sometimes find players whose performance is better or worse than you would guess by watching them a few times a year. And while most of our ratings are consistent with the opinions expressed by baseball's leading writers and TV personalities, sometimes we conclude that a player is actually performing at a higher or lower level than his reputation would suggest.
Because we try very hard to provide the most accurate and realistic baseball simulation available, we can't afford to give in to public opinion and rate someone higher than his performance justifies. If we did that for defensive ratings, we'd have these options:
- reduce the rating of one of his teammates so the team's defense isn't overrated
- reduce the effectiveness of the team's pitchers to compensate for the extra plays this player will now make in the simulated season
- disregard these side effects and allow the player, the team, and its pitchers to produce better results than they should
We don't think it's fair to downgrade teammates so we can give a popular player a better rating than he deserves. And we don't think our customers would want us to disregard the side effects and publish a season disk with players and teams who will overperform. So we do our best to rate players based on their actual performance.
Judging by Watching
For a few years now, I've wanted to write a little piece about how difficult it is to judge defensive ability, or any baseball skill for that matter, just by watching a lot of games. Then I found an essay by Bill James in his 1977 Baseball Abstract (a self-published book that predated his debut in bookstores by about five years) that says it far, far better than I ever could.
Here are a few excerpts from this wonderful essay, starting with a comment on how differently most people tend to approach the assessment of hitters and fielders:
"While we might not all be able to agree who the greatest-hitting first baseman ever was, the record books will provide us with a reasonably brief list to choose from: Gehrig, Anson, Foxx, Sisler. That's about it. Nobody's going to argue that it was Joe Judge or Moose Skowron, because the record books simply will not permit it . . .
Fielding statistics provide no such limited clarity. Talk about the greatest fielding shortstops ever . . . and the basic argument for everybody is 'One time he made a play where...'
Suppose we turn that same argument back to hitting. Now Moose Skowron hit some baseballs a long way, but nobody is going to say that he was the greatest hitting first baseman ever because 'One time I saw him hit a baseball so far that...' It is understood, about hitters, that the important question is not how spectacularly but how often. Brooks Robinson is known as a great fielding third baseman not because of the number of plays that he makes, but because he looks so good making them. Nobody talks anymore about what a great hitter Jim Northrup was, although to tell you the truth I never saw anybody who looked better at the plate. It is understood that, notwithstanding appearances, he wasn't an especially good hitter. Hitters are judged on results; fielders, on form."
And he talks about the difficulty of trying to judge effectiveness simply by watching:
"One absolutely cannot tell, by watching, the difference between a .300 hitter and a .275 hitter. The difference is one hit every two weeks. It might be that a reporter, seeing every game the team plays, could sense the difference over the course of the year if no records were kept, but I doubt it . . . the difference between a good hitter and an average hitter is simply not visible."
"a fielder's visible fielding range, which is his ability to move to the ball after it is hit, is vastly less important than his invisible fielding range, which is a matter of adjusting his position a step or two before the ball is hit."
In that essay, Bill went on to propose a scoring system that accomplishes essentially what STATS Inc. is doing now -- recording the location of every batted ball so that we could build a record of fielding performances similar to the statistical records that we use to judge batting and pitching performances.
I'm not saying that it doesn't matter whether you watch games or not. I'm just saying that I agree with Bill that it's very difficult to rate players solely by watching games. We also need useful measures of what they accomplished.
Measuring Defensive Range
Defensive range is the ability to cover ground and get to more balls than the average fielder, and it's one of the hardest elements of fielding performance to measure.
Official fielding stats provide information such as games played, putouts, assists, errors, double plays, and fielding percentage. But using these numbers to assess player skills is extremely difficult, if not impossible. The list of reasons is very long, but they all boil down to the fact that they don't tell you how many chances to make plays were presented to each fielder.
In 2002, for example, Jose Vidro led the majors in assists by a second baseman. Does this mean he was the best second baseman in baseball, or was this just because:
- he played more innings than everyone else?
- he played behind a pitching staff that didn't strike out a lot of batters, so more balls were put in play?
- his pitching staff induced a high percentage of ground balls?
- his pitching staff was heavily right-handed, so they faced more than the normal number of left-handed batters (who hit more ground balls to the right side)?
- his park somehow makes it easier for him to make plays?
- it just happened that more balls were hit to second when he was playing?
Baseball analysts, ourselves included, have made many attempts to devise methods that deal with some of these other factors so that we can isolate the contribution the player is making. Let's review them, and then talk about some newer methods that we've been using.
Range Factors and Defensive Innings
In the 1970s, Bill James introduced the idea of range factors to compensate for playing time. A player's range factor is generally computed as successful chances (putouts plus assists) per game. This was a good first step, even though Bill acknowledged at the time that it wasn't meaningful for pitchers, catchers and first basemen.
One thing that frustrated Bill was the fact that not all games played are equal. Some players play almost every inning of their games. Others split the playing time with a platoon partner. Late-inning defensive specialists often pick up a lot of games played without actually playing a lot. For a while, Bill devised methods to estimate how many innings each fielder was actually in the game at his position, but this is very hard to do. Fortunately, companies like STATS have been publishing accurate counts of defensive innings for the last ten years. So we can now compute range factors on a per-nine-innings basis, just like we do for earned run averages.
Using a range factor based on defensive innings, Pokey Reese moves to the top of the list of 2002 second basemen with 5.86 successful chances per nine innings. Vidro drops to seventh.
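For readers who want to follow along, here's a minimal sketch of both calculations in Python; the stat line is a made-up example, not Reese's or Vidro's actual numbers:

```python
def range_factor_per_game(putouts, assists, games):
    """Traditional range factor: successful chances per game."""
    return (putouts + assists) / games

def range_factor_per_nine(putouts, assists, defensive_innings):
    """Range factor per nine defensive innings, analogous to ERA."""
    return 9 * (putouts + assists) / defensive_innings

# Hypothetical second baseman: 280 putouts, 450 assists, 1120 defensive innings
print(range_factor_per_nine(280, 450, 1120))  # about 5.87 chances per nine
```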
The fixed sum problem
Whether you use games or innings as the basis of a range factor calculation, there's another critical problem with range factors. By measuring plays made per game or per nine innings, the method takes no account of the length of those innings. Consider the following two innings that start out the same way and feature the same mix of batted balls, only with different results:
- strikeout ... ground ball double down the third base line ... line drive single to shallow center ... popup to third ... triple into the right field corner ... ground ball single between first and second ... groundout to third
- strikeout ... great diving stop by third baseman on hard-hit grounder down the line, with the batter out at first on a strong throw by the third baseman ... line drive single to shallow center ... popup to third
In the first version of this inning, the official fielding stats record a putout for the catcher (on the strikeout), one assist (on the inning-ending ground out) and one putout (on the popup) for the third baseman, and one putout (on the grounder) for the first baseman. In the second version of this inning, the official fielding stats are exactly the same. The fact that the defense allowed three more hits in the first one is completely lost.
In this example, there's no way to tell which team defense and which individual fielders were more effective just by looking at the official fielding stats. In the more general case, the best fielders will generally end up making more plays than the poorest defenders. But the number of putouts in a nine-inning game adds up to 27 no matter how many hits are allowed, and the number of assists is mostly a product of the number of ground balls, not the skill of the infielders. So we can't use range factors to evaluate team defense at all, and they don't tell us nearly enough about individual fielders either.
Adjusted Range Factors
Even if we use defensive innings to measure playing time, we still haven't taken into account (a) the number of opportunities presented to each fielder and (b) the fact that some putouts and assists are harder to come by than others. Back in the 1980s, I developed a new type of range factor that adjusts for many of these variables in the following ways:
- it counts the number of balls put in play (excluding homeruns) while each fielder was at his position, removing the strikeout rate of the pitching staff as a potential source of bias
- it counts only those putouts and assists that required the fielder to do some important work (e.g. taking a groundball and getting an out by making a throw or stepping on the bag for a force, spearing a line drive, or tracking down a fly ball) and ignores the ones that don't say much of anything about defensive range (e.g. taking a throw at first base, making the pivot on a double play, or tagging a runner on a steal attempt)
- it tracks balls put in play by left- and right-handed batters separately, since players pull the ball on the ground much more often than they go the other way
- it adjusts for the ground ball percentage of each team's pitching staff
Traditional range factors compute plays made per game or per nine innings. This method computes plays made per 100 batted balls, meaning that we can use it to get a better handle on both team and individual defense. If one team gives up a lot more hits than another, it will need more balls in play to get through a game, and the adjusted range factors for the poor fielding team will be lower.
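Before returning to Vidro, here's a simplified sketch of the core rate; the qualifying-play filter and the left/right and ground-ball adjustments are summarized in comments, and every name here is illustrative rather than the actual implementation:

```python
def adjusted_range_factor(qualifying_plays, balls_in_play):
    """Plays made per 100 balls in play (home runs excluded from BIP).

    qualifying_plays counts only putouts and assists that demonstrate
    range -- fielding a grounder for an out, spearing a line drive,
    tracking down a fly ball -- not taking a throw at first base or
    applying a tag on a steal attempt.
    """
    return 100 * qualifying_plays / balls_in_play

# A fuller version would tally balls in play separately for left- and
# right-handed batters and adjust for the staff's ground ball percentage
# before comparing this rate across fielders.
```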
Here's how these factors affected Vidro:
- his pitching staff was a little above average in strikeouts
- only 12% of Montreal's innings were thrown by lefties. That's a low figure, but the percentage of balls put in play by lefty hitters was about average despite the right-handed nature of his pitching staff. (By the way, if we made an assumption based on the left/right mix of the staff instead of actually counting balls put in play, we would have assumed Vidro got more chances to make plays than he really did.)
- Montreal's pitchers were second in the majors in ground ball percentage, a strong indication that Vidro's numbers were boosted significantly simply because he had more balls hit his way
Based on adjusted range factors, Vidro was a little below average among all major-league 2Bs this year, and while we can't finish our assessment of his play without using more advanced methods, we've already seen enough to conclude that his MLB-leading assist total is highly misleading.
This approach produces much better information than does an ordinary range factor, but we're still left with the fact that we're using these adjustments to make an educated guess at how many opportunities each fielder had to make plays. It goes without saying that it's possible to do better when we have access to play-by-play accounts that note the location of every batted ball.
Total Baseball's Fielding Runs
Before moving on, let me take a moment to say that the Fielding Runs numbers in the Total Baseball encyclopedia can be extremely misleading. I don't enjoy saying this, because they were developed by Pete Palmer, and Pete's a friend and one of the nicest guys I've ever met.
The first problem I have with fielding runs is that they're just a glorified range factor, with different weights for different events. Like range factors, you cannot interpret them accurately unless you know the strikeout rate and groundball/flyball ratio of the pitching staff and what percentage of left-handed batters the fielder faced. For a good example of the distortions that often creep into the fielding runs numbers, see the comments on Frank White and Ryne Sandberg in an article I wrote for ESPN.com in September, 1998.
In addition, I don't agree with some of the formulas, mainly because they put too much weight on certain events. For example, the formula for outfielders is .20(PO + 4A - E + 2DP), meaning that catching a fly ball with the bases empty earns you .20 fielding runs, while catching the same fly ball and throwing out a runner for a double play earns you 1.4 fielding runs. In both cases, the fielder made the best play available, but one counts seven times as much as the other. And suppose one center fielder reaches a ball but muffs it for a one-base error, while another lets it go up the gap for a double -- the guy who reached the ball has .20 fielding runs deducted and the second guy isn't penalized at all.
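The arithmetic behind that complaint is easy to check by coding the formula exactly as quoted:

```python
def outfield_fielding_runs(po, a, e, dp):
    """Pete Palmer's outfielder formula as quoted above: .20(PO + 4A - E + 2DP)."""
    return 0.20 * (po + 4 * a - e + 2 * dp)

print(outfield_fielding_runs(po=1, a=0, e=0, dp=0))  # routine catch: 0.2
print(outfield_fielding_runs(po=1, a=1, e=0, dp=1))  # catch plus DP throw: 1.4
```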
Finally, the fielding runs formula mixes range, errors and throwing into one number, which is appropriate for what Total Baseball is trying to accomplish (an overall player rating), but useless for what we do, which is to assign separate ratings for these skills.
STATS Zone Ratings
The next logical step beyond range factors is a system that counts actual opportunities to make plays. We weren't able to do that until 1989, because nobody tracked the location of every batted ball until then. The folks at STATS were the first to do it, and they developed the zone rating to take advantage of this new information.
STATS says the "zone rating measures all the balls hit in the area where a fielder can reasonably be expected to record an out, then counts the percentage of outs actually made." Instead of having to estimate the number of opportunities to make plays from defensive innings, percentages of balls in play, the left-right composition of the pitching staff, and the staff groundball/flyball ratio, we can actually count the balls hit to each fielder while they are in the game.
The zone rating could have been a tremendous breakthrough, but we disagree with some of the details of their implementation.
First, they don't count all the balls. For example, no infielder is charged with an opportunity when a grounder is hit down the lines, in the holes, or up the middle. The only plays that go into the zone ratings are the ones where the ball is hit more or less at a fielder. The net result is a system that places more emphasis on good hands than range.
Even if you didn't know this, you could infer it from their numbers. The league average zone ratings range from .763 to .885 depending on the position, suggesting that fielders turn something like 80% of all batted balls into outs. But the truth is that only about 70% of all batted balls become outs. It's clear that the most challenging opportunities, the ones that separate the best fielders from the ordinary ones, are left out of their system.
The second issue is that errors are mixed in with the ability to get to the ball in the first place. Let's suppose a player is credited with 500 opportunities in a season, and let's suppose he was very reliable, making 8 fewer errors than the average player with that many plays to make. Those 8 errors become 8 outs and produce a zone rating that is .016 above the league average. Without taking the errors into account, you might conclude that he has above-average range, when in fact he has average range and very good hands.
The third issue no longer applies but needs to be mentioned. Through the 1999 season, when an infielder started a ground ball double play, STATS credited him with two outs and one opportunity. Starting double plays is an important skill for an infielder, but this approach gives a significant boost to infielders who play behind pitchers who put lots of runners on base and/or with a pivot partner who turns the DP well, and it clouds the effort to measure defensive range. STATS doesn't do this any more, but if you have copies of the STATS Player Profiles books from the 1990s, you'll be looking at zone ratings that double-count these DPs.
Once again, let me say that the idea behind the STATS zone rating is sound and has value even with these issues. If you're looking for an overall measure of fielding performance that includes both range and errors, it won't matter to you that they're lumped together. And folks like us who are interested in separating these skills can make an adjustment for error rates to isolate the range portion.
The zones are smaller than we'd like, but my guess is that STATS did this on purpose to avoid running into two other issues that we'll talk about in a bit. First, some batted balls are playable by more than one fielder, and keeping the zones on the small side reduces the number of opportunities for one fielder to affect his neighbors. Second, outfield zones that cover the entire field make the system more vulnerable to distortions arising from different ballpark dimensions and characteristics. Our zone-oriented analysis does cover the whole field, so we've developed some methods for handling the interaction among fielders and accounting for park effects.
Defensive Average
For a few years in the early 1990s, we used a type of zone rating called Defensive Average (DA). It was developed by Pete DeCoursey and Sherri Nichols and used play-by-play data from The Baseball Workshop. Like the STATS zone rating, defensive average counted the batted balls hit into each fielder's zone and the number of plays he made on them. But it covered the whole field and didn't mix apples and oranges by double-counting GDPs. As a result, we felt we got better results from defensive average than from the STATS zone ratings.
When assigning responsibility for balls hit between fielders, the STATS and DA systems are similar if an out is made. Both systems credit the fielder with one opportunity and one play. But things get tricky when the ball falls in for a hit.
If the ball falls into one of the STATS zones, the fielder responsible for that zone is charged with an opportunity. If it falls outside the STATS zones, the play is ignored, and no fielder bears responsibility for the hit.
In the DA system, each player gets charged with half an opportunity when there's a hit that lands between two fielders. That means that someone playing next to a weak fielder tends to look worse than he is, because if the other guy makes the play, there is no opportunity charged, but if the ball falls in, he's charged with half an opportunity even if it's the sort of play the other fielder would be expected to make at least some of the time.
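The bookkeeping difference between the two systems can be expressed in a few lines; this is a sketch of the rules as described above, not either system's actual code:

```python
def charge_hit_between_fielders(system, opportunities, fielder_a, fielder_b):
    """Charge opportunities for a hit that lands between two fielders.

    Under the STATS rules, a hit outside the defined zones is ignored.
    Under Defensive Average, each neighbor is charged half an opportunity.
    """
    if system == "da":
        opportunities[fielder_a] += 0.5
        opportunities[fielder_b] += 0.5
    elif system == "stats":
        pass  # nobody is charged with an opportunity
```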
During the years in which we used the Defensive Average system, we were aware of this limitation and did our best to make intelligent adjustments to compensate for it when assigning player ratings. But we always wanted to see if we could do better.
The Diamond Mind Approach
In 1996, we began using a collection of old methods and new tools to expand our look at defensive performance, and we have been refining and improving these methods ever since. We believe that by using these tools to look at player performance from several angles, we can learn a lot more about who accomplished what in a given season.
To one degree or another, our best tools take advantage of the fact that STATS has been recording the type (grounder, fly ball, line drive, popup, bunt) and location (direction and distance) of every batted ball since the late 1980s. Using this information, our analysis programs aren't vulnerable to the potential biases in traditional fielding stats. We know exactly how often each player was in the field, how often the ball was hit near him, and how many plays he made on those balls.
The field is divided into approximately 80 zones. For each zone, we count the number of balls hit into it, the number of times each fielder made an out, and the number of singles, doubles, triples, and errors that resulted. When we're done, we look at the zone data for all of the major leagues and see how often the players at each position were able to make plays on those balls.
For example, on the 6939 grounders up the middle to the shortstop side of the bag during the 2002 season, MLB shortstops turned 64.4% of those balls into outs and made errors 1.9% of the time. Second basemen ranged to the other side of the bag to make the play 0.8% of the time. Almost all of the remaining grounders in this zone resulted in singles, with a handful of doubles and fielder's choice plays to round things out.
This gives us a baseline that we can use to evaluate performance on balls hit into this zone. Repeating this process for all batted ball types and every zone gives us an overall measure of the playmaking ability of a team and its players.
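Here's a minimal sketch of how such a baseline might be tabulated. The record layout is an assumption for illustration; the real STATS data is richer than this:

```python
from collections import defaultdict

def zone_baselines(batted_balls):
    """League out rate by (zone, batted-ball type, fielding position).

    Each record is a (zone, bb_type, position, result) tuple, where result
    is 'out', 'error', 'single', and so on, and position identifies the
    fielder who made the play (or None if the ball fell in for a hit).
    """
    chances = defaultdict(int)
    outs = defaultdict(int)
    for zone, bb_type, position, result in batted_balls:
        chances[(zone, bb_type)] += 1
        if result == "out":
            outs[(zone, bb_type, position)] += 1
    return {key: outs[key] / chances[key[:2]] for key in outs}
```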
With one exception, our zone-oriented approach includes the entire field and all types of batted balls. Early on, it became clear that we needed to screen out infield popups because they don't tell us anything. Over 99% of these plays result in an out, so they don't distinguish the good fielders from the not-so-good. And because these plays are easy to make, most popups can be handled by any of several players, making the successful completion of this play as much (or more) a matter of preference as of skill.
As I mentioned previously, we need to use measures of team defense to help us deal with the interactions among fielders. If one player doesn't get credit for making a play, it may be because another fielder beat him to it, and the first guy shouldn't be punished for playing next to a superior defender. It's only by looking at measures of team defense that we can distinguish the cases where another guy made the play from those when the ball fell for a hit. So let's take a moment to discuss team defense metrics.
Defense efficiency record (DER)
We usually start by computing the percentage of batted balls, excluding homers, that were turned into outs by the team. This percentage was labelled the Defense Efficiency Record (DER) by Bill James when he wrote about it in the 1980s, and you can find DER information on the Baseball Prospectus web site during the season.
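One common way to approximate DER from standard team totals (several variants exist; this simple one ignores batters who reach on an error):

```python
def der(hits_allowed, home_runs_allowed, balls_in_play):
    """Defense Efficiency Record: share of batted balls, excluding
    homers, that the defense turned into outs."""
    hits_in_play = hits_allowed - home_runs_allowed
    return 1 - hits_in_play / balls_in_play

# balls_in_play is roughly batters faced minus strikeouts, walks,
# hit batsmen, and home runs allowed.
```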
I'm not completely sold on DER as the ultimate measure of team defense, however. For one thing, I've always been troubled by the fact that it's just a variation on batting average, with strikeouts and homeruns removed, and with the focus on the out percentage instead of the hit percentage. After all, league batting averages have ranged from a low in the .230s to a high in the .300s in the past 80 years, so they don't just measure batting skill. They also embody the impact of the rules of the game (strike zone, mound height), the equipment (dead ball, lively ball, juiced ball), and the changing nature of ballparks. Similarly, the league DER figures have risen and fallen by large amounts, indicating that factors other than fielding skill are built into these numbers, too.
A second question about DER is the extent to which it measures pitching versus fielding. I've always believed that DER measures some of both. There is a strong (but not perfect) correlation between a team's rankings in ERA and DER, suggesting that (a) good pitchers make their fielders look better and/or (b) the team's rank in ERA is in large part due to the quality of its defense. It's hard to know which way to look at it, but I believe it works in both directions.
Recent work by Voros McCracken and Dick Cramer suggests that pitchers have little or nothing to do with the percentage of balls in play that are turned into outs. To put it another way, the defense is entirely responsible for a team's DER ranking. I'm not ready to accept that pitchers have nothing to do with these outcomes. While I haven't had time to do any detailed studies in this area, some very preliminary work suggests that good pitchers do improve a team's DER, though only by a few points. But because pitchers allow a very large number of batted balls over the course of a season, these small improvements can have a large effect on the pitcher's ERA.
Another issue with DER is that park effects can play a large role. It's clear that the enormous impact that Coors Field has on scoring isn't entirely due to homeruns. A much higher percentage of balls that stay in the field of play are falling in for hits, too, and that makes Colorado's team defense look much worse than it really is. This is the most extreme example, of course, but there are other parks that make a difference.
In other words, we start our process by computing the DER for each team, but we don't take that figure as a precise measure of the team's ability to make plays in the field. We keep the potential distortions in mind as we go through our rating process.
Other measures of team defense
Our zone-oriented analysis provides us with another way of rating team defenses. We can go zone by zone and compute how many more (or fewer) plays were made by this team than the average team, then do a weighted average of all of the zones to get an overall score for the team. That overall score is expressed as the number of plays made above or below the average. In 2002, for example, Anaheim's defense led the majors by making 120 more plays than the average team (in 4228 opportunities). These figures are not park adjusted, so they're not definitive, but they definitely add value in the process.
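The calculation behind a team score like Anaheim's +120 can be sketched as follows, with the data structures assumed for illustration:

```python
def net_plays(outs_by_zone, chances_by_zone, league_out_rate_by_zone):
    """Plays made above (positive) or below (negative) what an average
    defense would make on the same distribution of batted balls."""
    net = 0.0
    for zone, chances in chances_by_zone.items():
        expected = league_out_rate_by_zone[zone] * chances
        net += outs_by_zone.get(zone, 0) - expected
    return net
```

The same function works at the player level by restricting the tallies to balls hit while a particular fielder was at his position.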
To isolate portions of a team's defense, we rate the infields by computing the percentage of ground balls turned into outs and the outfields based on the percentage of fly balls and line drives that were caught.
Because we use a collection of overall measures (like DER), mid-level measures (such as out rates on grounders), and detailed zone-based analysis, we can examine team defense at several levels of detail. That helps us determine which fielders are getting the job done and which are letting the team down.
Park effects
We can't leave the subject of team defense without looking more closely at the parks.
We mentioned Coors Field a moment ago, but Dodger Stadium is another good example. From 2000 to 2002, that park depressed batting averages by 21 points, making it one of the best pitchers' parks in the game. And it wasn't just because of strikeouts and homers, either. Focusing only on balls hit into the field of play, Dodger Stadium took away 97 hits a year in that period. If half of them came with the Dodgers on defense, measures that ignore park effects (like DER) make LA's team defense appear to be 48 plays better than it really is.
Using play-by-play data, we can also compare the hit rate on different types of batted balls. Dodger Stadium dramatically reduces the percentage of ground balls that go for hits. It also cuts the hit rate on fly balls, but not by a whole lot. Because virtually all of the park's effect is concentrated in the infield, it would be especially easy to overrate the LA infield if we ignored this information.
Evaluating individual players
Most of our work at the player level uses zone-based data. We compare the rate at which each fielder turned batted balls into outs in each zone with the overall averages. If a player made more than the normal number of plays, he gets a plus score for that zone. If he fell short of the overall average, he gets a minus score. By computing a weighted average of all of his zones, we get a figure that tells us how many more (or fewer) plays he made than the average defender. We call this figure "net plays".
In a typical season, the top fielders at each position make 25-30 more plays than the average. Exceptional fielders have posted marks as high as 40-60 net plays, but those are fairly uncommon. Recent examples include Darin Erstad in 2002, Scott Rolen just about every year, and Andruw Jones in his better seasons. The worst fielders tend to be in the minus 25-40 range.
As a reality check, we look at other measures like range factors, adjusted range factors, STATS zone ratings, and our own version of the STATS zone ratings (with larger zones). More often than not, these measures tell similar stories. When they disagree, we look for external factors that might be skewing those other measures. In the end, we put the most weight on our net plays analysis.
But the net plays figures are starting points, not the final answer, because we have several other things to consider before we assign a rating. We've already talked about park effects, so I won't dwell on that any more.
As with the STATS zone ratings, our net plays analysis can be influenced by error rates. So we always look to see whether a fielder is making more plays mainly because he has better hands. Mike Bordick and Alex Rodriguez are two good examples from the 2002 season. In some cases, a player will have a mediocre net plays figure because he made a lot of errors, and we may bump up his range rating to account for the fact that he's getting to more balls in the first place.
For infielders, we have another analysis program that measures their ability to start double plays and get force outs when such opportunities exist. Especially for corner infielders, the ability to make the tough plays can separate the men from the boys. If a first baseman always takes the ball to the bag and doesn't start his share of double plays and force plays, he's not helping the team, even if he does record a normal number of outs.
For middle infielders, we also look at how often they are able to make the pivot on the double play. This is an important part of the second baseman's job, and he can make up for ordinary range by turning two more often. It isn't talked about very often, but we also see differences in the ability of shortstops to complete these plays.
For shortstops, we look at the zone data to see if their net plays score has been artificially depressed by sharing the left side of the infield with an especially talented third baseman. For example, Scott Rolen is way above average on balls to his left, and that cuts down on the number of plays his shortstops can make. If the overall team defense in that zone is still very good, there's no reason to penalize the shortstop. Similarly, we look for first basemen who are taking plays away from the man at second. By looking at the zone data for individual fielders and for the team as a whole, we can tell whether plays not made by one fielder are getting made by someone else.
The same is true in the outfield. For balls hit in the gaps, we look at the zone data to see if an exceptional fielder might be taking plays away from his neighbors.
Another of our analysis programs counts the number of times a player is used as a defensive sub or is removed for a defensive sub. This information doesn't tell us anything about performance, of course, but it is very helpful to know that one fielder was regarded by his manager as being superior to another.
Like many of you, we read a lot, we watch games on local TV and satellite and the highlight shows on ESPN and Fox, because it helps to have an image of a player when we evaluate the performance data. And we compile an extensive database of player notes, so we know who's coming off a knee injury or a shoulder problem that might have affected their ability to make plays.
And when the evidence doesn't match the player's reputation, we double-check our work and look very, very hard for the reasons why. Whenever possible, we talk to people -- local writers, broadcasters and sophisticated fans -- who have seen the player quite a bit to see if we can gain some additional insight into each player's performance.
After rating all of the players, we go back and double-check these individual ratings to see if they add up to something resembling the team's park-adjusted defensive performance. If not, we go back over everything we know about those players and keep at it until it makes sense.
Bill James' Win Shares
In his recent book called Win Shares (published by STATS in 2002), Bill James developed a method for apportioning each team's wins to the players who were most responsible for creating them. A big part of that method involves evaluating defense at both the team and individual level. We're still in the process of evaluating this new approach, but we can point out a few things that you might want to keep in mind as you ponder the role that system should have in evaluating players:
- Bill begins by evaluating overall team defense and then tries to break that down and assign credit/blame to positions and then players. We've been doing that for many years.
- Bill's method is intended to work with players from all eras, including that vast portion of baseball history for which play-by-play data is not available. So he chose to develop new techniques for coping with the biases inherent in traditional fielding stats. We've been aware of those biases for a long time and have always kept them in mind while evaluating traditional fielding stats.
- Bill's system is an attempt to make better estimates of the number of opportunities to make plays and the number of plays made, and it appears that he has come up with at least a few useful ways to do that. On the other hand, using play-by-play data from the 1990s, we can now count those things directly, and we want to spend some time seeing whether Bill's estimates match up with the actual data for that period. If they do, he's made a giant contribution to the field, because we can confidently apply his techniques to seasons for which we don't have first-rate play-by-play data. If they don't, we'll have to figure out why and proceed from there.
- Bill's method is intended to aggregate all aspects of fielding performance into one number, while our goal is to isolate specific skills. We have separate ratings for range, errors and throwing, and we cannot assume that a high number of defensive win shares necessarily indicates a fielder who should get a top range rating. It's possible that his range is average and his value lies in a strong arm and good hands.
- We're not yet sure about the weights Bill put on different fielding skills when coming up with his fielding win shares. To some extent, that doesn't matter to us because we're more interested in rating the individual components of defense anyway. But as fans of baseball analysis, we're curious to see whether Win Shares really works, so we hope to find time to look at this part of his system, too.
The bottom line is that we will continue to rate fielders for modern seasons based on our analysis of play-by-play data. But we're always on the lookout for new and better ways to evaluate fielders, and if our review suggests that the fielding portion of the Win Shares model provides us with some new tools, we'll use them.
Other Approaches to Rating Players
We know that a lot of our customers like our products precisely because we do our own analysis instead of rating everyone based on prevailing opinions. At the same time, we know there are other people who don't buy our products because Tim McCarver says that someone is a brilliant fielder and, because McCarver is a well-known TV analyst and ex-player, they assume he must know a lot more about this stuff than we do.
Let's suppose, for the sake of argument, that we wanted to ditch all of our analysis and rate players based upon what we read and hear from the media. That's a lot harder to do than you might think, for a whole host of reasons.
When someone in the media says "he's the best second baseman in baseball," it's not always clear what it means. It could mean he's the best overall player at his position (including hitting, running, etc.). It could mean he has great hands. It could mean he turns the double play well or that he has great range. Even if it means all of these things to some degree, an overall evaluation doesn't help us. We have separate ratings for separate skills, and we need objective evaluations of each skill.
The media doesn't talk about all the players. We have 1200+ players to rate each year, and only a fraction of them are regularly discussed. Some players may be overrated because they play for teams in media-intensive cities or teams that got a lot of exposure in the playoffs, while good players on small-market teams may be overlooked.
It often seems as if it takes a year or two for someone's reputation to catch up with a change in his performance, for better or worse. In the 15+ years we've been rating players, we've often identified someone who has been making a lot of plays without getting noticed. It's not unusual to see that player start to win Gold Gloves two years later. And then keep winning Gold Gloves for a few years after their performance no longer merits them.
Managers and general managers make public comments about players all the time, but their remarks can be influenced by the needs of the team. Sometimes it's to their advantage to talk about players in certain ways, whether it's to hype someone for marketing purposes or to talk them down in a salary squabble. It's hard to tell when we can take a comment at face value and when we need to discount it because of a hidden agenda.
I'd love to incorporate the opinions of professional baseball scouts because they are trained to see things that other people don't see. But it's difficult to find a collection of scouts who have seen every player and can make their evaluations available to people outside the organizations they work for.
We could base our judgments on how often someone shows up on SportsCenter. But the photogenic play isn't always the best play. The exact same fly ball might produce a routine play for a great fielder, a diving catch for the average fielder, or a single for the poor fielder. The diving catch is the only one that makes the highlight films. The majority of highlight-film plays are made at the edge of the fielder's effective range, whatever that range happens to be.
(A few years ago, I saw a game in Baltimore in which the right fielder broke back on a line drive, realized it wasn't hit that hard, reversed course and recovered in time to make a nice shoestring catch. What should have been a very easy play wound up being shown dozens of times as CNN's Play of the Day.)
We could place a lot of weight on the Gold Glove voting. Putting aside the question of how well the voters do that job, there are still several obstacles. They don't announce the voting, so we have no idea who came second or how close the vote may have been. And even if we were to accept all Gold Glovers as top fielders, we can't award them all our top range rating because Gold Gloves are given for overall fielding performance, and we have to rate players separately for range, throwing, and avoiding errors. For some Gold Glovers, the most accurate way to rate them would be to assign an excellent throwing rating, a very low error rate, and an average range rating.
Summing up
We do our very best to rate players based on performance, not reputation. To that end, we license play-by-play data and spend a lot of time developing new ways to analyze that information and interpreting that information in light of everything we know about that player's performance. The phrase "everything we know" includes our own analysis of team and player fielding skill, other measures like range factors and STATS zone ratings, injury reports, park effects, plus what we see and hear and read as we follow baseball on a daily basis. We hope you like the results as much as we enjoy doing this work.
How to Measure Team Efficiency
Measuring team efficiency
By Tom Tippett
December 5, 2002
It goes without saying that wins and losses are the most important things to consider when judging a team's performance. They are, after all, what the game is all about and what determines who gets to keep playing until there's only one winner left.
The next most important things are runs scored and runs allowed. You win games by outscoring your opponents, so the connection between runs and wins is very strong. It's not perfect, though, and every season produces a few teams that win more or less than you'd expect given their run differential.
If runs are one step removed from wins, then the baseball events that produce runs are two steps removed from wins. You score runs by putting together singles and walks and doubles and steals and homers, and you prevent runs by holding the other team to a minimum of those things.
In most cases, there's a very direct relationship between wins and runs and the underlying events that produce runs. But that's not always the case, and in this review of the 2002 season, we'll identify teams where those relationships didn't hold up. If the past is any guide, this will give us some very strong hints about what is likely to happen with those teams in the future.
To explore the relationship between runs and wins, we'll use the pythagorean method that was developed by Bill James. To explore the relationship between offensive events and runs, I'll introduce a new statistic that I'll call the run efficiency average. This number will tell us which teams were unusually good at turning offensive events into runs and unusually good at keeping the other team from doing the same.
We'll end up with three measures for each team -- one for offensive efficiency, one for defensive efficiency, and one for pythagorean efficiency -- that will tell us which teams squeezed more wins out of the hits and walks and homers and other events that occurred during their games. And which teams squandered their output to the greatest degree.
And we'll take a look at some history. We'll see that teams that are unusually efficient (or inefficient) have exhibited a very strong tendency to revert back to the norm the next year. In other words, if your team was especially inefficient in 2002, there is every reason to believe things will be better next year. And the opposite is true, too. If your team was very efficient this year, don't count on a repeat performance next year.
That's good news for the Cubs, Brewers, Devil Rays and Tigers. And bad news for the Angels, Braves, and Twins. It's way too early to start predicting what's going to happen in 2003, and all thirty teams are quite capable of improving or regressing based on their off-season moves and the development of their younger players and prospects. But we can say that these seven teams (and a few others to a lesser degree) go into the offseason in better or worse condition than it might seem based solely on their 2002 win-loss records.
Converting runs into wins
Others, notably Rob Neyer and the Baseball Prospectus crew, have written extensively on ESPN.com about the Bill James pythagorean method, a well-established formula that says that a team's winning percentage is tightly coupled with runs scored and runs allowed. The expanded standings on ESPN.com include run margins and expected win-loss records that are derived using this formula, and Rob's home page showed pythagorean standings every day. So I'm not going to go over that ground again.
I will, however, try to put the 2002 results into historical context. For instance, the Red Sox and Cubs won 8 fewer games than their run margin would normally produce, while three teams (Minnesota +7, Oakland +6, and Detroit +6) won at least six more than expected. How unusual is this? And what tends to happen to teams that stray from their expected win totals?
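For reference, here is the basic pythagorean calculation with the classic exponent of 2 (later refinements use slightly different exponents), checked against Boston's 2002 totals of 859 runs scored and 665 allowed:

```python
def pythagorean_wins(runs_scored, runs_allowed, games=162):
    """Bill James' pythagorean expectation: win% = RS^2 / (RS^2 + RA^2)."""
    pct = runs_scored ** 2 / (runs_scored ** 2 + runs_allowed ** 2)
    return pct * games

expected = pythagorean_wins(859, 665)   # 2002 Red Sox
print(round(expected))                  # 101
print(93 - round(expected))             # -8, the shortfall noted above
```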
I started by computing the expected and actual win totals for every team since 1962, the first year the 162-game schedule was used in both leagues. The more games you play, the larger the differences between expected and actual wins, so I didn't want to mix seasons with different schedules. For that reason, I left out the strike-shortened 1972, 1981, 1994, and 1995 campaigns, leaving a total of 72 league-seasons.
In those 72 league-seasons, the average team "missed" its pythagorean projection by only 3.2 games, indicating that there is indeed a very strong relationship between runs and wins. How does 2002 compare? This year, AL teams were off by an average of 4.1 games while their NL counterparts missed by an average of 3.0 games. Overall, the 30 teams had an average difference of 3.5 games, slightly higher than the historical average but well within the normal year-to-year fluctuations.
The wackiest season, in the pythagorean sense, was the 1978 National League, whose teams missed their projections by an average of 5.3 games. In the NL West that year, the Dodgers led the league in both scoring and fewest runs allowed, outscoring their opponents by 154 runs, while the Reds were only +22 on run differential. But Cincinnati won nine more games than expected and the Dodgers five fewer, turning what could have been a runaway win by the Dodgers into a close battle that saw LA win by 2-1/2 games. In the NL East, the story was much the same, as the Phillies (-6) edged the Pirates (+2) by two games in a race that was much closer than it could have been. That same year, the Expos matched the Reds with a run margin of +22, but Cincinnati (+9) won 92 games and Montreal (-8) only 76. (The Expos' run margin foreshadowed their improvement; they went on to win 95 games in 1979.)
In contrast, the 1991 American League was closest to pythagorean form, with an average difference of only 1.8 wins.
What tends to happen to teams with large pythagorean differences? Here's a list of the 22 teams that have exceeded their projected win total by at least 8 games, along with their differences in the next year:
Team             Diff  Next
---------------  ----  ----
1974 Padres       +12   + 7
1984 Mets         +12   + 1
1970 Reds         +11   - 3
1963 Astros       +10   + 4
1997 Giants       +10   - 3
1998 Royals       +10   -11
1970 Phillies     + 9   + 3
1977 Orioles      + 9   + 6
1978 Reds         + 9   - 1
1978 Athletics    + 9   + 4
2001 Mets         + 9   - 4
1971 Braves       + 8   + 5
1973 Tigers       + 8   + 8
1974 Tigers       + 8   + 2
1976 Cubs         + 8   + 5
1977 Mariners     + 8     0
1979 Astros       + 8   + 5
1982 Giants       + 8   - 1
1987 Expos        + 8   - 5
1989 Astros       + 8   + 5
1992 Astros       + 8   - 6
1997 Reds         + 8   - 3
As you can see, only a few teams came close to matching their pythagorean differences in the next season. In fact, these 22 teams were collectively 18 wins above their projection the year after, an average of less than one win per team.
(Just to be clear, these next-year numbers don't represent the change in actual win-loss record from the year before, so they don't measure the team's tendency to get better or worse. They represent the difference between actual and pythagorean wins the next season. In other words, they measure the tendency to consistently win more or fewer games than the run margin suggests, not the tendency to produce a better or worse run margin in the first place.)
On the flip side, here are the teams with the biggest negative differences since 1962:
Team             Diff  Next
---------------  ----  ----
1962 Mets         - 8   + 3
1968 Pirates      - 8   - 2
1974 Angels       - 8   + 3
1975 Dodgers      - 8   + 2
1978 Expos        - 8     0
1983 Cubs         - 8   + 4
1983 Rangers      - 8   - 5
1985 Indians      - 8   + 4
1986 Giants       - 8   - 4
2000 Astros       - 8   + 4
1962 Cardinals    - 9   - 2
1963 Twins        - 9   - 9
1964 Twins        - 9   + 1
1966 Yankees      - 9   + 5
1974 Athletics    - 9   - 1
1975 Yankees      - 9   - 1
1980 Brewers      - 9   + 4
1984 Astros       - 9     0
1985 Red Sox      - 9   + 4
1990 Mets         - 9   - 3
1980 Cardinals    -10   + 3  (103 games in 1981)
1993 Padres       -10   - 5
1997 Astros       -10   - 6
2001 Rockies      -10   + 4
1970 Cubs         -10   + 3
1975 Astros       -11   + 3
1999 Royals       -11   + 1
1967 Orioles      -12   - 2
1984 Pirates      -13   - 6
1986 Pirates      -13   + 1
1993 Mets         -14   + 1
These 31 teams were collectively 4 wins above their projection the year after, about as close to zero wins per team as you can get.
These extreme teams do leave us with a few unanswered questions. Why are there more NL teams than AL teams on these lists? Why do the Astros show up as often as they do? How miserable must the mid-1980s Pirates fans have been when their team posted a three-year pythagorean difference of -32 wins from 1984 to 1986? These answers, if they exist, will have to wait for another day.
I'm not going to suggest that I have proven this beyond a reasonable doubt, but I believe luck plays a large part. If you wanted to argue that the over-achievers had big pluses because their manager was especially astute or their roster was full of clutch players, it would be a tough case to make based on the next-year records of these teams. And looking at the under-achievers, it would be even tougher to argue that their manager and players are fundamentally flawed based on their next-year results.
But teams change from year to year, and the under-achievers are much more likely to fire their managers and turn over half their rosters. Perhaps those changes were responsible for bringing them back to pythagorean normalcy. Even though I don't believe this argument would hold up under closer examination, it muddies the water a little.
Still, if managerial skill and clutch performance were the biggest piece of this puzzle, why wouldn't the over-achievers, the teams that would not be making many changes from year to year, be able to maintain their performance to a much greater extent?
Converting offensive events into runs
In the previous section, we took one step back from wins and losses to examine runs. In this section, we'll take another step back and look at the offensive events -- the hits and walks that lead to the runs that generate the wins -- that were produced and allowed by each team.
Just as there is a strong relationship between runs and wins, it's almost always true that the more hits and walks you produce, the more runs you'll score. Sometimes a productive team comes up short on the scoreboard because they didn't hit in the clutch or just because they happened to hit line drives right at people in key situations. Or the opposite could be true. But this relationship holds up most of the time.
To shed some light on this relationship, we need a way to take batting stats and turn them into a measure of overall offensive production. There are several good options here, including Runs Created (Bill James), Batting Runs (Pete Palmer), Equivalent Average (Clay Davenport), and OPS (on-base average plus slugging average). But many of them require a computer, and although we do computer analysis all the time, we also like to use simpler measures that anyone can use whenever they have a page of stats in front of them. The best of these simple methods give up very little accuracy in return for a big gain in usability.
For this exercise, I'll use the sum of total bases and walks, or TBW for short. TBW is not a perfect measure, but it does have a few things going for it. It captures the most important things a team does to produce runs -- singles, extra-base hits, and walks. It's easy to figure without a computer. In the past, I've used both TBW and OPS for this type of analysis, and the results are almost exactly the same, so the accuracy is more than acceptable.
And sometimes it just seems to tell a story more clearly. For instance, the 2002 Yankees had a team OPS of .809 compared to the .769 mark of the Mariners. Even though I've been working with OPS figures for a number of years, I still need to stop and think about what a 40-point advantage means. But if you tell me that the Yankees produced 224 more total bases and walks than the Mariners, that's something I can grasp right away.
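Computing TBW from a standard stat line takes only a couple of steps; here's a quick sketch:

```python
def total_bases(hits, doubles, triples, homers):
    """Total bases: singles plus 2*2B plus 3*3B plus 4*HR."""
    singles = hits - doubles - triples - homers
    return singles + 2 * doubles + 3 * triples + 4 * homers

def tbw(hits, doubles, triples, homers, walks):
    """Total bases plus walks, the production measure used in this article."""
    return total_bases(hits, doubles, triples, homers) + walks
```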
The following table shows the offensive and defensive TBW figures for the American League, along with the difference between these two figures and each team's league rank based on those differences. It also shows runs for and against, the run differential, and the rankings based on run differential. Finally, because we're trying to trace a path from TBW to runs to wins, it also lists the team's win total for the year.
      ---------- TBW ----------    ------- Runs --------
AL     Off   Def  Diff  Rank    Off  Def  Diff  Rank    W
NY    3187  2629  +558     1    897  697  +200     2  103
Bos   3050  2532  +518     2    859  665  +194     3   93
Tor   2921  2995  - 74     8    813  828  - 15     8   78
Bal   2665  2981  -316    11    667  773  -106    11   67
Tam   2640  3219  -579    13    673  918  -245    13   55
Min   2911  2789  +122     7    768  712  + 56     7   94
Chi   3028  2839  +189     6    856  798  + 58     6   81
Cle   2774  2930  -156    10    739  837  - 98    10   74
KC    2728  3189  -461    12    737  891  -154    12   62
Det   2414  3006  -592    14    575  864  -289    14   55
Oak   3009  2595  +414     3    800  654  +146     4  103
Ana   2918  2647  +271     4    851  644  +207     1   99
Sea   2963  2710  +253     5    814  699  +115     5   93
Tex   3112  3205  - 93     9    843  882  - 39     9   72
As you can see, the team rankings using TBW and those using run differentials are very similar. In fact, they're identical except for Anaheim's move from fourth in TBW to first in run margin. The Angels were very efficient on both sides of the ball, finishing 4th in scoring (and only 8 runs out of second) despite trailing six other teams in TBW, and leading the league in fewest runs allowed even though three other teams gave up fewer TBW. (That efficiency didn't carry over to the relationship of runs to wins, however, as they led the league in run margin but were only third in wins.)
In terms of raw production, the Red Sox nearly matched the Yankees, but still managed to come up ten short in the win column. (The same thing happened in 2001.) This comes as no surprise to the long-suffering Boston fans or the incredibly smug New Yorkers who just knew the Sox would find a way to lose despite all their talent.
It's interesting to note that the White Sox were a match for the Twins in production even though Minnesota ran away with the division. For all the talk about the Twins' superior pitching and defense and the problems the White Sox had in those areas, Chicago gave up only 50 more TBW, roughly one base every three games.
And we see yet another example of how strong the AL West was this year, with three teams in the league's top five in TBW and run differentials and the Rangers only a little below the league average. Oakland was a clear winner in TBW but trailed the amazing Angels in run margin, and it took an excellent 32-14 record in one-run games to keep the A's in first place.
Let's take a quick look at the National League before pausing to put these TBW numbers in historical context.
     ---------- TBW ----------   ------- Runs --------
NL    Off   Def  Diff  Rank      Off  Def  Diff  Rank     W
Atl  2808  2529  +279    3       708  565  +143    3    101
Mon  2865  2857  +  8    9       735  718  + 17    7     83
Phi  2970  2769  +201    5       710  724  - 14    9     80
Flo  2810  2927  -117   12       699  763  - 64   11     79
NY   2657  2758  -101   10       690  703  - 13    8     75
StL  2879  2630  +249    4       787  648  +139    4     97
Hou  2886  2793  + 93    6       749  695  + 54    6     84
Cin  2815  2928  -113   11       709  774  - 65   12     78
Pit  2566  2860  -294   13       641  730  - 89   13     72
Chi  2853  2814  + 39    8       706  759  - 53   10     67
Mil  2613  3086  -473   16       627  821  -194   16     56
Ari  2974  2608  +366    2       819  674  +145    2     98
SF   3045  2524  +521    1       783  616  +167    1     95
LA   2701  2653  + 48    7       713  643  + 70    5     92
Col  2826  3195  -369   14       778  898  -120   14     73
SD   2649  3040  -391   15       662  815  -153   15     66
While it's clear that Atlanta was the division's top team, their TBW differential wasn't much better than that of the Phillies, who somehow managed to turn a big edge in raw production into a negative run differential and a losing season. Most of the problem was on offense, where the Phils were 3rd in TBW but only 8th in runs scored. (Before the season, our computer simulations had the Phillies finishing a close second behind the Braves. In the real season, they were a very close second statistically, but that didn't translate into the things that really matter, runs and wins.)
The biggest surprise in the Central division was the Cubs. In fact, most of what I just wrote about Philly applies here, too. Our preseason simulations put Chicago third with a .500 record, and the real Cubs put up TBW numbers that were entirely consistent with being a .500 team. But they ranked a few places lower in runs than in TBW on both sides of the ball and they couldn't win the close games (18-36 in contests decided by one run). By the way, the 2001 Cubs were the division's best team statistically (+175 TBW) but failed to win the pennant; with two straight seasons like this, it's no surprise that a managerial change was made, regardless of whether the manager was to blame.
In the West, San Francisco outproduced Arizona but came up a little short in the standings. Both teams were very strong across the board, however, and the Giants showed during the second season that they really were the best team in the league. Statistically speaking, Los Angeles was much closer to a .500 team than their 92-70 record suggests. In fact, the Dodgers were the anti-Phillies, turning a 12th-place ranking in offensive TBW into a 7th-place finish in scoring. (Warning to LA fans: the Padres had the most efficient offense in baseball in 2001 -- 13th in OPS, 6th in runs -- and look what happened to them in 2002.)
A little TBW history
I've been putting these tables together for a few years now, and I can tell you that TBW differentials are usually in the plus or minus 300 range. With eight teams more than 400 from the midpoint this year, I wondered how these figures stacked up against other teams from the past. Thanks to Retrosheet's database of play-by-play accounts, I ran the numbers for all seasons back to 1974. (It would be nice to go back further, but the official stats don't include doubles and triples allowed by pitchers.)
Here are the top twenty teams from that 29-year period:
Team              Net TBW   Comment
----------------  -------   --------------------------------
1998 Braves        + 664
1998 Yankees       + 662    Won WS
2001 Mariners      + 603
1997 Braves        + 568
2002 Yankees       + 558
1976 Reds          + 550    Won WS
1995 Indians       + 536    144-game schedule (+603 per 162)
2001 Athletics     + 534
1984 Tigers        + 530    Won WS
1988 Mets          + 524
2002 Giants        + 521
2002 Red Sox       + 518    Failed to qualify for postseason
1974 Dodgers       + 509
1978 Brewers       + 493
1996 Indians       + 471
1986 Mets          + 466    Won WS
1999 Yankees       + 454    Won WS
1996 Braves        + 449
1977 Dodgers       + 448
1979 Orioles       + 448
Notes: Only one of these strong teams, the 2002 Red Sox, failed to make the postseason ... only five won the World Series, a reminder that surviving the expanded postseason format is very tough ... could have been a great 1998 World Series if the Braves hadn't lost to the Padres ... the 1978 Brewers also had the best run margin in the AL that year, so this could have been one of the great three-way pennant races in history.
And the bottom twenty teams:
Team              Net TBW   Comment
----------------  -------   --------------------------------
1996 Tigers        - 727
1979 Athletics     - 683
1998 Marlins       - 676    Defending WS champs
1974 Padres        - 630    Expansion
2002 Tigers        - 592
1978 Blue Jays     - 590    Expansion
1979 Blue Jays     - 586    Expansion
2002 Devil Rays    - 579    Expansion
1977 Braves        - 539
1977 Mariners      - 528    Expansion
1980 Mariners      - 522    Expansion
1977 Blue Jays     - 507    Expansion
1993 Rockies       - 487    Expansion
1978 Mariners      - 486    Expansion
1999 Twins         - 484
2001 Devil Rays    - 475    Expansion
2002 Brewers       - 473
2002 Royals        - 461
1989 Tigers        - 447
1982 Athletics     - 443
Notes: Eleven of these twenty teams were expansion franchises in the first seven years of their existence ... four 2002 teams made this list ... maybe two rounds of expansion since 1993 is the reason so many recent teams made both lists ... Billy Martin took over as manager of the A's after their disastrous 1979 season, led them to a winning record in both 1980 and 1981, then made this list again in 1982 before being fired ... the Twins have come a long way since 1999 ... in 1982, Bill James wrote that the Blue Jays might be the worst expansion team in history, but they got better in a hurry after that, so maybe there's reason for Devil Rays fans to have some hope as their young prospects move up.
Run Efficiency Average
Earlier in this article, when discussing the relationship between runs and wins, we saw that teams sometimes win quite a few more or fewer games than their run margin would normally produce, and that those differences don't tend to repeat the next year. It's very rare for a team to over-achieve (or fall short) two years in a row, and there's a very strong tendency to revert to a normal runs-to-wins relationship. Is this also true of TBW and runs?
To identify teams with particularly efficient or inefficient offenses, ones that produce more or fewer than the expected number of runs given the TBW they produced, I divided runs by TBW to get something I'll call the run efficiency average (REA). Plot TBW against runs scored for every full team season since 1974 and you'll see a very strong straight-line relationship between the two. In other words, we can predict runs scored from TBW with a high degree of accuracy.
It turns out that run efficiency averages look an awful lot like team batting averages. From 1974 to 2002, team batting averages ranged from a low of .229 to a high of .294 with a midpoint of .261. Baseball fans know from experience that a team batting average of .280 or higher is very good, and one below the .245 mark is woeful.
In this time period, run efficiency averages have ranged from .225 to .305 with a midpoint of .264. The midpoint and the spread are slightly higher than for team batting averages, but the benchmarks are basically the same. Anything over .280 indicates a very efficient offense, while anything under .245 indicates a team that squandered a lot of its chances.
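To make the definition concrete, here's a minimal sketch (function name mine) that computes REA from the 2002 Yankees' line in the AL table above -- 897 runs on 3187 offensive TBW:

    def rea(runs, tbw):
        # Run efficiency average: runs per total base plus walk.
        return runs / tbw

    print(round(rea(897, 3187), 3))   # 0.281 -- just above the "very good" .280 line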
Like team batting averages, run efficiency averages tend to be higher in the American League (because pitchers don't bat) and rise and fall by a few points from season to season. (They also appear to be higher in good hitters' parks, but I'm leaving park effects out of the equation for the time being. I'll be looking at both offense and defense, so the park effects should cancel out when we subtract one from the other.)
So the best way to evaluate teams is to compare their run efficiency averages to the norm for their league that season and to rank them based on those differences. Here are the offenses that were most efficient in the 1974 to 2002 period, relative to their leagues, and what they did the following year:
Team                REA     Diff    Next
------------------  ----   ------  ------
2000 Rockies        .305   + .035  + .019
1996 Rockies        .303   + .032  + .020
1977 Twins          .298   + .032  - .010
1987 Cardinals      .293   + .030  - .008
1975 Reds           .287   + .029  + .019
1985 Cardinals      .281   + .026  + .002
1982 Brewers        .288   + .026  + .005
1992 Brewers        .287   + .025  + .008
1976 Phillies       .282   + .025  + .017
2000 White Sox      .301   + .025  - .003
2001 Mariners       .295   + .024  + .005
1999 Indians        .299   + .024  + .006
1998 Yankees        .297   + .023  + .003
1974 Dodgers        .283   + .023  - .008
1993 Rangers        .292   + .022  + .003
1991 Brewers        .287   + .022  + .025
2000 Royals         .299   + .022  - .002
1981 Brewers        .280   + .022  + .026
1981 Phillies       .273   + .021  + .001
2002 Angels         .292   + .021     ???
A majority of these teams were above average again the next year, but all but one made a move back toward the middle of the pack. On average, they lost 19 points relative to the league. On a base of 2800 TBW, that's a loss of 53 runs, enough to cost a team five to six wins.
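If you want to check that arithmetic yourself, here's a minimal sketch (names and defaults mine) that converts an REA swing into runs and then into wins, using the nine-runs-per-win rule of thumb that's explained later in this article:

    def rea_points_to_wins(points, tbw_base=2800, runs_per_win=9.0):
        # An REA change applied to a base of TBW yields runs;
        # roughly nine runs buy one win.
        runs = points * tbw_base       # 0.019 * 2800 = 53.2 runs
        return runs / runs_per_win     # 53.2 / 9 = about 5.9 wins

    print(round(rea_points_to_wins(0.019), 1))   # 5.9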
Here are the least efficient offenses of this period:
Team                REA     Diff    Next
------------------  ----   ------  ------
1988 Orioles        .226   - .038  + .003
1998 Devil Rays     .238   - .036  - .003
2002 Tigers         .238   - .032     ???
1983 Mariners       .234   - .032  - .005
1982 Reds           .228   - .031  - .007
1981 Blue Jays      .227   - .031  - .006
1974 Padres         .230   - .029  - .020
1978 Athletics      .231   - .028  - .026
1981 Mets           .225   - .027  - .004
1996 Phillies       .244   - .027  - .010
1978 Athletics      .246   - .026  - .005
1985 Rangers        .240   - .026  + .002
1993 Marlins        .243   - .025  - .019
1976 Expos          .233   - .025  - .019
1980 Blue Jays      .242   - .024  - .031
2001 Mets           .242   - .023  + .003
1996 Angels         .256   - .023  + .011
1980 White Sox      .243   - .023  + .019
1995 Blue Jays      .252   - .023  - .013
1989 Indians        .242   - .022  + .021
Again, all but one team moved up the next year, with an average improvement of 22 points. It's abundantly clear that extreme REA values don't repeat themselves; no matter what the environment, and no matter how good or bad the team, the REA tends to make a big move toward the norm the next year.
This is good news for teams that were the most inefficient this year. Detroit, Baltimore, Philadelphia, Milwaukee, and Tampa Bay can expect to improve their efficiency in 2003. Of course, it's bad news for this year's over-achievers, namely Anaheim, Arizona, Colorado, St. Louis, and the White Sox.
I won't take the space to show top-20 lists for pitching efficiency, but I can tell you that the same pattern held on the other side of the ball. The twenty most efficient pitching staffs moved an average of 21 points toward the norm the next year, while the least efficient improved by 22 points. Not a single team on either list moved further away.
The five most efficient pitching staffs this year, and the five most likely to struggle to match that performance, were Atlanta (which had the lowest run efficiency average in this 29-year period), Anaheim, Oakland, Minnesota, and Los Angeles. On the other hand, improvement is bound to be in store for Colorado, Detroit, Cleveland, Tampa Bay, and the Cubs.
By the way, Detroit was very inefficient on both offense and defense this year, and while their park might have something to do with that, I don't think it's a major factor. If the Tigers move 20 points toward the norm on both sides of the ball, they're looking at a favorable swing of 108 runs, or about 11 wins, even if nothing else changes. (Of course, moving from 54 wins to 65 wins isn't anything to write home about. They need to do even better than that.)
In contrast, Anaheim was on both top-five lists for 2002, and they stand to move back toward the pack offensively and defensively in 2003. That could take a 99-win team and bring them back to the high 80s.
Converting efficiency into wins
Let's try to wrap all of this up into one neat package. We started by showing that runs scored and runs allowed are an accurate predictor of wins and losses. Teams that deviate from this prediction usually revert to form the next year.
Then we showed that offensive production (as measured by total bases plus walks) is an accurate predictor of runs scored. Likewise for defensive production and runs allowed. For both offense and defense, teams that deviate from the predicted number of runs tend to move significantly toward the norm the next year.
In other words, these three forms of efficiency -- which I'll call pythagorean efficiency (turning runs into wins), offensive efficiency (turning TBW into runs scored), and defensive efficiency (limiting runs allowed per TBW allowed) -- can have a major impact on the standings in any one season. But that effect isn't likely to carry over to the next year.
Pythagorean efficiency is already expressed in wins and losses. I'll translate offensive and defensive efficiency into wins by taking the surplus or deficit in runs and dividing by nine. Why nine? According to the pythagorean method, that's the number of runs it takes to add one win in a league where the average team scores about 750 runs. By converting all three types of efficiency to wins, we can add them up to see which teams gained or lost the most due to efficiency in 2002. Here are the figures for all thirty teams:
               Actual   ----- Efficiency -----   Adj
               Wins     Pyth  Off  Def   Tot     Win
New York        103      +3   +4   +1    + 8      95
Boston           93      -8   +4   +2    - 2      95
Toronto          78      -2   +3   -2    - 1      79
Baltimore        66      -2   -6   +4    - 4      70
Tampa Bay        55      -1   -4   -5    -10      65
Minnesota        94      +7   -2   +5    +10      84
Chicago          81      -6   +4   -3    - 5      86
Cleveland        74      +3   -1   -5    - 3      77
Kansas City      62      -4    0   -3    - 7      69
Detroit          55      +6   -9   -6    - 9      64
Oakland         103      +6   -2   +5    + 9      94
Anaheim          99      -4   +7   +8    +11      88
Seattle          93       0   +2   +4    + 6      87
Texas            72      -5    0   -2    - 7      79
Atlanta         101      +3   -1   +9    +11      90
Montreal         83       0    0   +2    + 2      81
Philadelphia     80      +1   -6   -2    - 7      87
Florida          79      +5   -2   -1    + 2      77
New York         75      -4   +1    0    - 3      78
St. Louis        97       0   +5   +3    + 8      89
Houston          84      -3   +1   +2      0      84
Cincinnati       78      +4   -1   -3      0      78
Pittsburgh       72      +2   -2    0      0      72
Chicago          67      -8   -3   -4    -15      82
Milwaukee        56      -4   -5   -3    -12      68
Arizona          98      +1   +6   -1    + 6      92
San Francisco    95      -4    0   +3    - 1      96
Los Angeles      92      +3   +2   +4    + 9      83
Colorado         73      +4   +6   -9    + 1      72
San Diego        66      +2   -2   -4    - 4      70
Let's work through a few examples to make sure it's clear what we're trying to say with this table:
- Oakland won 103 games ... six more games than the pythagorean method says is normal for a team that scored 800 runs and allowed 654, mainly by posting an extraordinary 32-14 record in one-run games ... offensive inefficiency (REA of .266 versus a league average of .270) cost them two wins ... efficient pitching and defense (REA of .252) added five wins ... overall efficiency adds up to nine wins ... a team with Oakland's offensive and defensive stats and average efficiency would therefore be expected to win only 94 games.
- Anaheim won 99 games ... with the best run differential in the AL, they should have won more games than anyone, but they fell four games short of their pythagorean projection ... they had the most efficient offense in the majors, picking up seven extra wins (63 runs) because their offensive REA was .292, twenty-two points above the norm for the league ... they also had the majors' second-most-efficient pitching/defense (REA of .243, saving 71 runs), good for another eight wins ... overall, their efficiency on offense and defense more than offset their pythagorean shortfall for a net gain of 11 wins ... an 88-win season would have been more in line with their offensive and defensive stats.
- Philadelphia won 80 games ... one more than normal for a team that was outscored by 14 runs ... the offense was the third least efficient in the majors (REA of .239 versus a league average of .256), costing them 51 runs and six wins ... defensive inefficiency was also a problem (REA of .261), robbing them of two more wins ... total impact was a loss of seven wins for a team that was the league's fifth-best statistically ... with normal efficiency, they should have won 87 games.
- The Cubs won 67 games ... eight fewer than normal for a team that was outscored by only 53 runs ... offensive inefficiency (REA of .247 versus a league average of .256) cost them another three games ... defensive inefficiency (REA of .270) lowered their win total by another four wins ... this triple whammy of inefficiency cost them 15 wins ... with average efficiency in all three areas, they win 82 games and finish a respectable third in the division.
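In code form, the bookkeeping behind these examples is nothing more than addition and subtraction. Here's a minimal sketch (function name mine) that recovers the adjusted win total from the three efficiency components, using Oakland's row from the table as the test case:

    def adjusted_wins(actual, pyth_eff, off_eff, def_eff):
        # Total efficiency is the sum of the three components; the
        # adjusted figure backs that total out of the actual win count.
        return actual - (pyth_eff + off_eff + def_eff)

    print(adjusted_wins(103, +6, -2, +5))   # 94, matching Oakland's line above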
Before leaving this topic, I want to emphasize that I'm not trying to diminish what the Angels accomplished this year by pointing out that their offensive and defensive stats are more consistent with those of an 88-win team. They did win 99 games in a very tough division by doing all the little things that count: putting the ball in play so even their outs were able to move runners over, hitting in the clutch, playing great defense, getting key outs when they needed them, and so on. They did all that again in the postseason, when time after time they got themselves into a hole against very good teams and found a way to get the job done when it mattered most. It was a great run by a team that was awfully fun to watch.
The Angels remind me a lot of the New England Patriots. Both were expected to do very little before the start of the season. Both got off to slow starts and reached the playoffs by putting together winning streaks late in the year. Both were more impressive on the scoreboard than in the statistical leaderboards. Both were intelligent, fundamentally sound teams that had to scrap for everything they got and came up with big play after big play when things looked bleak. And because of all that, both teams were a lot of fun to watch and served as great examples of why championships are decided on the field, not on paper.
Summing up
I could have used a more sophisticated statistic like Runs Created to measure the efficiency of each team's offense and defense, thereby factoring in things like stolen bases, hit batsmen, and a few other stats that contribute to success. But I'm partial to simpler measures like TBW that are easy to figure, easy to interpret, and tell essentially the same story as the more complicated stats. I especially like the fact that runs divided by TBW, what I'm calling the run efficiency average, produces a figure that looks a lot like a batting average, a happy coincidence that makes it easier to get a feel for what's good, what's normal, and what's bad.
It was also very interesting to discover the strong tendency of teams that are highly efficient or inefficient in these three areas to move significantly toward the norm the following season. It's very rare for teams to excel (or fall short) in this way two years in a row. That's a good thing for team executives to know as they plan for next season.
I recall being very impressed with the Houston Astros, who refused to panic after a disappointing 2000 season that saw them fall 8 games short of their pythagorean projection. Many teams would have fired the manager and turned over half the roster in a futile attempt to blame someone for their poor showing. Instead, they chalked it up to one of those years when things just didn't go right and were rewarded with a tie for the division title in 2001. (Of course, after the 2001 season, they fired the manager for failing to win in the postseason, but that's a topic for another day.)
A number of this year's most inefficient teams have changed managers in recent weeks, and some of those managers are going to look like geniuses when their clubs make big gains in the win column next year. I wouldn't mind being Dusty Baker right now, assuming the front office doesn't destroy the team with ill-advised personnel moves this winter. The Cubs are the team most likely to get a large efficiency-related bounce, and with one of baseball's best-regarded farm systems, they are poised for a strong run in the NL Central.
2012 Team Efficiency
Are you smarter than Buck Showalter?
Measuring 2012 Team Efficiency
By Jim Wheeler
December 10, 2012
Several years ago, Tom Tippett regularly penned an article about the most recently concluded MLB season, aimed at discovering which teams were the most efficient at scoring runs, preventing runs, and turning runs into wins. This is my attempt to resurrect those articles. Most of the thought and boilerplate in this article belongs to Tom Tippett, not me. I am merely trying to keep the flame lit.
Every few seasons, it seems, a special team comes from out of nowhere to rocket into the playoffs. In 2007, it was the Arizona Diamondbacks zooming into the NL playoff picture. In 2012, it was the Buck Showalter-led Baltimore Orioles, who won an amazing 93 games against a Pythagorean prediction of 82 wins. The Orioles were scintillating in one-run games, going 29-9, a factor that no doubt helped them reach the American League playoffs. So the million-dollar question becomes: do you think you can manage the Orioles as skillfully as Buck Showalter did in 2012? Can you replay the "El Birdos" 2012 season and maneuver them to the top of the AL pack? As explained below, doing so will be a real challenge. So as a famous man once said, "Do you feel lucky, punk? Well do you?"
What do we mean by team efficiency?
In a nutshell, you win games by outscoring your opponents, so the connection between runs and wins is very strong, even though every season produces a few teams that win more or less than you'd expect given their run differential. To explore the relationship between runs and wins, we'll use the Pythagorean method that was developed by Bill James.
You score runs by putting together hits, walks, steals, and other offensive events, and you prevent runs by holding the other team to a minimum of those things. In most cases, there's a direct relationship between runs and the underlying events that produce runs.
We use the term efficiency to represent the ability to turn events into runs and runs into wins. An efficient team is one that produces more wins than expected given its run margin, produces more runs than expected given its offensive events, or allows fewer runs than expected given the hits and walks produced by their opponents.
In the 2002 edition of this article, we showed that teams that are unusually efficient (or inefficient) have exhibited a very strong tendency to revert back to the norm the next year. That's good news for some teams and bad news for others. If you'd like to find out who falls into which category, read on.
Converting runs into wins
The Bill James Pythagorean method is a well-established formula based on the idea that a team's winning percentage is tightly coupled with runs scored and runs allowed. Bill's formula is quite simple ... take the square of runs scored and divide it by the sum of the squares of runs scored and runs allowed (RF = runs for, RA = runs allowed):
                             RF ** 2
Projected winning pct = -----------------
                        RF ** 2 + RA ** 2
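In code, the projection is a one-liner. Here's a minimal sketch (function name mine) using the 2012 Orioles -- 712 runs scored, 705 allowed -- as the example:

    def pythagorean_wins(rf, ra, games=162):
        # Winning pct = RF^2 / (RF^2 + RA^2), scaled to a full schedule.
        pct = rf ** 2 / (rf ** 2 + ra ** 2)
        return pct * games

    print(round(pythagorean_wins(712, 705)))   # 82 -- eleven short of Baltimore's actual 93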
In 2012, for instance, 21 of 30 teams finished with win-loss records within three games of their projected records, and 26 of 30 teams finished within five games. In 2010 and 2011, 27 of 30 teams finished within five games of their Pythagorean projection.
We had a very big exception this year. The Orioles won 11 more games than normal for a team with a run margin of +7. They went 29-9 in one-run games and surprised everyone by finishing second in the AL East with 93 victories. Since 1962, when the 162-game schedule was first used in both leagues, no team had ever been more than 12 games better than its Pythagorean projection, so the Orioles came very close to matching that record, which is shared by several teams.
But 50 years of baseball history tells us that such large deviations are unusual and tend not to be repeated the following year. In other words, the Orioles must dramatically improve their run margin in 2013 if they are to come close to matching this year's win total. The same is true of the Cincinnati Reds, who finished 6 wins to the good.
The teams that most underperformed their Pythagorean records were the Rays, Cardinals, Diamondbacks, Red Sox, and Rockies, each roughly five wins short of their projections.
Converting offensive events into runs
Just as there is a strong relationship between runs and wins, it's almost always true that the more hits and walks you produce, the more runs you'll score. Sometimes, of course, a productive team comes up short on the scoreboard because they didn't hit in the clutch, didn't run the bases well, or hit line drives right at people in key situations. But this relationship holds up most of the time.
To shed some light on this relationship, we need a way to take batting stats and turn them into a measure of overall offensive production. There are several good options here, including Runs Created (Bill James), Batting Runs (Pete Palmer), Equivalent Average (Clay Davenport), OPS (on-base average plus slugging average), and Base Runs (David Smyth).
For this exercise, we'll use the sum of total bases and walks, or TBW for short. TBW is not a perfect measure, but it does have a few things going for it. It captures the most important things a team does to produce runs -- singles, extra-base hits, and walks -- and it's easy to figure without a computer.
As with other statistics, a team's TBW total can be significantly influenced by its home park. For that reason, we focus on the difference between the TBW produced by a team's hitters and the TBW allowed by its pitchers. This effectively removes the park from the equation and helps us identify teams that outproduced their opponents.
The following table shows the offensive and defensive TBW figures for the 2012 American League, along with the difference between these two figures and each team's league rank based on those differences. It also shows runs for and against, the run differential, and the rankings based on run differential. Finally, because we're trying to trace a path from TBW to runs to wins, it lists the team's win total and league rank for the year.
The 2012 American League:
                  --------- TBW ---------    -------- Runs --------    - Wins -
Team               Off   Def  Diff  Rank      Off  Def  Diff  Rank     Num Rank
New York (A)      3068  2753   315    2       804  668   136    1       95   1
Baltimore         2799  2769    30    8       712  705     7    8       93   3t
Tampa Bay         2699  2374   325    1       697  577   120    2       90   5
Toronto           2704  2989  -285   13       716  784   -68   10       73  10
Boston            2753  2894  -141   10       734  806   -72   12       69  12
Detroit           2824  2645   179    5       726  670    56    7       88   7
Chicago (A)       2789  2718    71    7       748  676    72    5       85   8
Kansas City       2658  2899  -241   11       676  746   -70   11       72  11
Cleveland         2662  2951  -289   14       667  845  -178   14       68  13
Minnesota         2676  2942  -266   12       701  832  -131   13       66  14
Oakland           2781  2555   226    4       713  614    99    4       94   2
Texas             2971  2696   275    3       808  707   101    3       93   3t
Los Angeles (A)   2845  2674   171    6       767  699    68    6       89   6
Seattle           2493  2608  -115    9       619  651   -32    9       75   9
The AL East produced some very intriguing results. The Yankees finished first in run differential (RD) and, as expected, won 95 games. Of course, we have already discussed the Orioles: with a run differential of +7, they ranked 8th in the American League, and their TBW differential of +30 also placed them 8th. Neither stat bodes well for Orioles fans in 2013. The Rays proved to be the biggest losers in the 2012 race; with a run differential of +120 (2nd) and a TBW differential of +325 (1st), you would have expected them to win more than 90 games.
Bringing up the AL East rear were the Blue Jays and Red Sox. Adding to the Red Sox misery is that they finished five wins worse than their Pythagorean projection.
In the AL Central, the White Sox should have won 89 games and the division. Instead, the Tigers went 8-2 in their last 10 games while the White Sox went 4-6, and the Tigers took the division by three games. Detroit had an RD of +56 and Chicago +72, but Detroit outproduced the White Sox in TBW by a 108-base margin.
The remainder of the AL Central -- KC, Cleveland, and Minnesota -- was nothing spectacular. Despite their lack of production (-289 TBW), the Indians managed to squeeze out six more victories than expected. KC finished 11th in TBW, 11th in RD, and 11th in wins. A trifecta of sorts!
Over in the AL West, the Rangers finished 3rd in TBW, 3rd in RD, and tied for 3rd in AL wins. Yet it was not enough to best the A's, who converted 4th place in both TBW and RD into a division-winning 94 wins, the second-best total in the AL. The Angels never seemed to get rolling in 2012. Their +171 TBW and +68 RD both rated 6th in the AL, and they parlayed this into 89 victories, one win better than their expected total.
The 2012 National League:
                  --------- TBW ---------    -------- Runs --------    - Wins -
Team               Off   Def  Diff  Rank      Off  Def  Diff  Rank     Num Rank
Washington        2880  2537   343    1       731  594   137    1       98   1
Atlanta           2678  2502   176    3       700  600   100    3       94   3t
Philadelphia      2669  2656    13    9       684  680     4    9       81   9
New York (N)      2605  2673   -68   11       650  709   -59   11       74  12
Miami             2561  2692  -131   13       609  724  -115   13       69  13
Cincinnati        2730  2571   159    4       669  588    81    4       97   2
St. Louis         2900  2594   306    2       765  648   117    2       88   5
Milwaukee         2892  2842    50    6       776  733    43    7       83   7
Pittsburgh        2582  2616   -34   10       651  674   -23   10       79  10
Chicago (N)       2492  2864  -372   15       613  759  -146   15       61  15
Houston           2471  2906  -435   16       583  794  -211   16       55  16
San Francisco     2688  2646    42    7       718  649    69    5       94   3t
Los Angeles (N)   2513  2495    18    8       637  597    40    8       86   6
Arizona           2823  2696   127    5       734  688    46    6       81   8
San Diego         2599  2716  -117   12       651  710   -59   12       76  11
Colorado          2884  3216  -332   14       758  890  -132   14       64  14
The old saying used to be, "Washington: first in war, first in peace, and last in the American League." Well, guess what: Washington is now in the National League, and in first place! The Nationals led all of baseball in 2012 with 98 wins and ranked 1st in both RD (+137) and TBW differential (+343). Yet the Cardinals, who underperformed their Pythagorean projection by six wins (88 wins against 94 projected), were victorious in the playoffs over the Nationals. Which leads to one of the great questions of the 2012 season: had Stephen Strasburg been available for the playoffs rather than for games in April and May, would the Nationals have done better? Now you replay managers have your chance to see how history might have been altered with Strasburg in the Nationals' playoff rotation.
The National League Central Division produced two teams that were among the leaders in baseball at being efficient and also inefficient. The Reds finished 1st in the division with 97 victories, six better than their Pythagorean expectation, and they were very efficient at producing wins despite rating only 4th in the NL in RD (+81) and TBW (+159). Meanwhile, last year's darlings, the Cardinals, should have run away with the pennant. The Redbirds had a very potent offense (+306 TBW) and a stingy pitching staff and defense (+117 RD), both good enough for 2nd place in the National League. But when it came to converting these two positives into victories, the Cardinals fell six games short of their expected win total. The NL Central pennant race should have been much tighter than it was. It is worth noting that the Reds bested the Cardinals in one-run games, going 31-21 compared to the Cardinals' 21-26. Perhaps the difference comes down to the performance of "The Cuban Missile," Aroldis Chapman?
Once again the Pirates played well for the first four months of the season and then tailed off, finishing with 79 wins, one fewer than expected. But the Pirates did perform to the level their stats predict: -34 TBW and -23 RD each ranked 10th in the National League, in line with their win total. The Brewers were a mediocre team; they should have won 86 games, but their inefficiency in TBW and RD doomed them to three fewer, closing the season with just 83 wins. The Cubs and Astros were nothing short of horrific in 2012. The Cubs allowed 146 more runs than they scored, while the Astros posted the worst RD in baseball at -211. Will the AL West be any easier for them in 2013?
The National League West mirrored the Central division in that the Giants won five more games than expected while the Diamondbacks lost five more than expected. In fact, the Giants tied the Braves for the third-most wins in the NL at 94, and their expected total of 89 would still have been the best in the division. But neither the Dodgers nor the Diamondbacks could translate their positive TBW and RD into enough victories to give the Giants much of a challenge. Finishing on the two bottom rungs of the division, as projected, were the Padres and Rockies.
Looking ahead
As we've pointed out, it's unusual for teams that are especially efficient or inefficient to sustain those levels the next year. Instead, they tend to revert to the normal relationships between TBW and runs and between runs and wins. That means we can identify teams that are likely to improve or fall back even if they don't make moves that change their talent level significantly.
For that reason, the Rays have some reason for optimism going into 2013. With Baltimore overachieving and the Yankees losing players to injury and free agency, the Rays are in prime position to improve on their 90 wins. The Blue Jays have added some interesting new players but have a huge gap to make up to reach the top of the division. Hopefully, with a new, less controversial manager at the helm, the Red Sox can produce a much better season in 2013.
The Royals have added some much-needed pitching, but will it be enough to topple the Tigers with Cabrera, Fielder, and Verlander? The Royals' RD of -70 could be quickly made up by the new arms of Shields and Davis. In the AL West, Oakland and Texas ran neck and neck in TBW and RD. The unknown status of the Rangers' Josh Hamilton makes this race too close to call based on their 2012 performances.
Expect the Nationals to again dominate the National League. If Strasburg has no innings-pitched limit, the Nationals may be unstoppable. The NL Central looks like a two-team race between the Reds and Cardinals. Out West, the Dodgers' signing of Greinke has to put them in the early driver's seat. However, the Giants and Diamondbacks are still within striking distance, and any swing toward their expected win totals of 89 and 86 will make this division race quite interesting.
Wrapping Up
A lot of things will change between now and Opening Day. This process of looking at TBW differentials and run margins doesn't tell us how the 2013 season will unfold, but it can identify some teams that might have more or less work to do this winter than you may have thought.
I think it's safe to say that the Rays, Rangers, and Cardinals are among the good teams most likely to add to their win totals next season, even without major roster changes. The Braves, Brewers and Diamondbacks are also in line for small efficiency-related bounces.
On the flip side, it will be fascinating to see whether the Orioles and Yankees can sustain their recent successes. Perhaps they were a little lucky. Or perhaps they've figured out how to maximize the impact of the things that don't show up in these measures of expected runs, things such as base running, timely pitching (including a strong bullpen), and the judicious use of one-run strategies.
2007 Team efficiency
Measuring team efficiency
By Charles Wolfson and Tom Tippett
December 10, 2007
In the mid-to-late 1990s, Tom started writing an annual essay about teams whose win-loss records were out of synch with their underlying stats. In the early years, these essays were largely unknown to the baseball community, because they were available only as part of the annual Diamond Mind Season Disk. That changed in 2002, when he wrote an article on the subject for ESPN.com, and they've been published on ESPN.com and/or the Diamond Mind website ever since.
The central idea is simple. Baseball analysts have developed a variety of methods for translating (a) hits and walks and other events into runs, and (b) runs into wins. One can use these methods to identify teams that scored more or fewer runs than they "should have," teams that allowed more or fewer runs than they "should have," and teams that won or lost more games than they "should have," given the runs they actually scored and allowed.
In the years since our Team Efficiency article first appeared on ESPN.com, others have picked up the ball and run with it. The folks at Baseball Prospectus created a team standings page, updated daily during the season, which shows the actual standings and the standings as they would appear if every team had "normal" relationships between events and runs and wins. Bill James is now providing information of this type for the annual Bill James Baseball Handbooks. And The Hardball Times Baseball Annual has included a chapter on this subject.
What do we mean by team efficiency?
In a nutshell, you win games by outscoring your opponents, so the connection between runs and wins is very strong, even though every season produces a few teams that win more or less than you'd expect given their run differential. To explore the relationship between runs and wins, we use the pythagorean method that was developed by Bill James.
You score runs by putting together hits, walks, steals, and other offensive events, and you prevent runs by holding the other team to a minimum of those things. In most cases, there's a direct relationship between runs and the underlying events that produce runs.
We use the term efficiency to represent the ability to turn events into runs and runs into wins. An efficient team is one that produces more wins than expected given its run margin, produces more runs than expected given its offensive events, and/or allows fewer runs than expected given the hits and walks produced by their opponents.
In the 2002 edition of this article, we showed that what goes around, comes around: teams that are unusually efficient (or inefficient) have exhibited a very strong tendency to revert back to the norm the next year. That's good news for some teams and bad news for others. If you'd like to find out who falls into which category, read on.
Converting runs into wins
The Bill James pythagorean method is a well-established formula based on the idea that a team's winning percentage is tightly coupled with runs scored and runs allowed. Bill's formula is quite simple ... take the square of runs scored and divide it by the sum of the squares of runs scored and runs allowed (RF = runs for, RA = runs allowed):
                             RF^2
Projected winning pct = -------------
                         RF^2 + RA^2
The 2007 season was unusual in that just 15 of 30 teams finished with win-loss records within three games, and just 23 of 30 within five games, of their projected record, compared to 18 of 30 and 25 of 30, respectively, in 2006. From 2003 to 2005, 75 of 90 teams finished within five games of their pythagorean projection.
The great outlier in 2006 was the Indians, who won 12 fewer games than normal for a team with their +88 run differential. No team approached that level of frustration in 2007, but the Diamondbacks overachieved by nearly as big a margin, winning 11 more games than normal for a team outscored by 20 runs, a margin topped by just four teams since 1974. The Mariners were nearly as fortunate, winning nine more games than their -19 run differential warranted.
The Red Sox registered an unusual reversal of fortunes. In 2007 they won a major-league-best 96 games despite underperforming their projected win total by seven games, compared to the disappointing 2006 season in which they managed only 86 wins but exceeded their projected win total by six.
Baseball history tells us that large deviations are unusual and tend not to be repeated the following year. In our 2006 article, we suggested that the Indians could well see a big improvement in their win-loss record in 2007, even without major roster changes; for the same reason, fans of the Diamondbacks and Mariners may have reason to view the upcoming 2008 season with some trepidation.
Converting offensive events into runs
Just as there is a strong relationship between runs and wins, it's almost always true that the more hits and walks you produce, the more runs you'll score. Sometimes, of course, a productive team comes up short on the scoreboard because they didn't hit in the clutch, didn't run the bases well, or hit line drives right at people in key situations. But this relationship holds up most of the time.
To shed some light on this relationship, we need a way to take batting stats and turn them into a measure of overall offensive production. There are several good options here, including Runs Created (Bill James), Batting Runs (Pete Palmer), Equivalent Average (Clay Davenport), OPS (on-base average plus slugging average), and Base Runs (David Smyth).
For this exercise, we'll use the sum of total bases and walks, or TBW for short. TBW is not a perfect measure, but it does have a few things going for it. It captures the most important things a team does to produce runs -- singles, extra-base hits, and walks -- and it's easy to figure without a computer.
As with other statistics, a team's TBW total can be significantly influenced by its home park. For that reason, we focus on the difference between the TBW produced by a team's hitters and the TBW allowed by its pitchers. This effectively removes the park from the equation and helps us identify teams that outproduced their opponents.
The following table shows the offensive and defensive TBW figures for the 2007 American League, along with the difference between these two figures and each team's league rank based on those differences. It also shows runs for and against, the run differential, and the rankings based on run differential. Finally, because we're trying to trace a path from TBW to runs to wins, it lists the team's win total and league rank for the year.
     ---------- TBW ----------   ------- Runs --------   - Wins -
AL    Off   Def  Diff  Rank      Off  Def  Diff  Rank    Num Rank
BOS  3170  2620   550    1       867  657   210    1      96  1t
NYA  3286  2907   379    2       968  777   191    2      94  3t
TOR  2854  2633   221    4       753  699    54    6      83  7
BAL  2821  2987  -166   11       756  868  -112   12      69  12t
TBA  2969  3224  -255   13       782  944  -162   14      66  14
CLE  2987  2717   270    3       811  704   107    3      96  1t
DET  3109  2966   143    6       887  797    90    5      88  5t
MIN  2671  2831  -160   10       718  725    -7    7      79  8
CHA  2732  2946  -214   12       693  839  -146   13      72  11
KCA  2573  3011  -438   14       706  778   -72   11      69  12t
LAA  2824  2767    57    7       822  731    91    4      94  3t
SEA  2805  2964  -159    9       794  813   -19    9      88  5t
OAK  2934  2784   150    5       741  758   -17    8      76  9
TEX  2870  3022  -152    8       816  844   -28   10      75  10
In 2007 just seven of 14 AL teams had positive TBW differentials and just six outscored their opponents, compared to ten and nine, respectively, in 2006. The AL did not get quite as big a boost from interleague play in 2007, slipping from 154-98 overall vs. the NL in 2006 to 137-115 last season.
Boston dominated the rankings, leading the league in TBW differential, run margin, and wins. Indeed, their +550 TBW was the ninth best out of 928 team seasons since 1974. The Yankees were a solid second in both categories, and were slightly more efficient than the Red Sox in converting their +379 TBW into +191 runs.
As bad as things were for the bottom-ranked teams, for some it could have been even worse. Kansas City managed to better three other AL teams in run differential, despite a major-league-worst -438 TBW. The White Sox were a comparatively modest -214 TBW, but their -146 run differential was more than double that of the Royals. Nevertheless, Chicago won three more games than the Royals and six more than their second-worst-in-the-majors run differential predicted.
As already noted, Seattle turned in a remarkable performance in 2007, winning 88 games despite ranking ninth in the league in both TBW and run differential. Since 1974, no other team has managed at least 88 wins with a worse TBW than Seattle’s -159. In fact, only eight teams have managed the feat with a TBW in the red at all.
Minnesota and Los Angeles were notable overachievers, each ranking three spots higher in run differential than TBW. The Angels actually had a higher run differential (+91) than TBW differential (+57), the only team in 2007 to achieve that feat. Whether this run efficiency is a tribute to their "small ball" proficiency, or further obscures the true urgency of their need to boost their offense, the 2008 season may reveal.
Oakland’s decline in the standings in 2007 was reflected in their inefficiency. The A’s ranked fifth in the league with a +150 TBW, but just eighth with their -17 run differential. Should the Angels do no more than add Torii Hunter, the Mariners come back down to earth, and the A’s keep their pitching corps intact, they may not be as far from contending in the AL West as the 2007 standings might otherwise suggest.
Moving on to the National League:
     ---------- TBW ----------   ------- Runs --------   - Wins -
NL    Off   Def  Diff  Rank      Off  Def  Diff  Rank    Num Rank
PHI  3246  3104   142    5       892  821    71    4      89  3t
NYN  2971  2842   129    7       804  750    54    6      88  5
ATL  3006  2845   161    3       810  733    77    2      84  7
WAS  2679  2997  -318   16       673  783  -110   15      73  11t
FLO  3044  3170  -126   11       790  891  -101   13      71  14t
CHN  2879  2746   133    6       752  690    62    5      85  6
MIL  3033  2876   157    4       801  776    25    7      83  8
SLN  2747  2922  -175   13       725  829  -104   14      78  10
HOU  2858  3113  -255   14       723  813   -90   12      73  11t
CIN  2983  3078   -95    9       783  853   -70   11      72  13
PIT  2754  3046  -292   15       724  846  -122   16      68  16
ARI  2761  2860   -99   10       712  732   -20    9      90  1t
COL  3111  2881   230    2       860  758   102    1      90  1t
SDN  2862  2583   279    1       741  666    75    3      89  3t
LAN  2788  2722    66    8       735  727     8    8      82  9
SFN  2673  2836  -163   12       683  720   -37   10      71  14t
The NL champion Rockies ranked second in TBW and first in run differential. The team they defeated in the one-game playoff for the wild card, the Padres, did almost as well, ranking first in TBW and third in run differential.
The Diamondbacks, however, were the biggest story, winning 90 games despite a -99 TBW and a -20 run differential. Since 1974, the only other teams to manage 90 wins with a negative TBW were the 1984 Mets (-24) and the 1997 Giants (-9).
In 2006, the Cardinals eked out a division title and a World Series championship, despite a -23 TBW and +19 run differential. We gave them something of a “pass” that year on the basis of lengthy injuries to numerous key players. In 2007, however, the team slipped precipitously to -175 TBW and -104 runs. The fact that they managed to win eight more games than that performance warranted should not mislead anyone into thinking that this is not a team in need of a significant overhaul.
Atlanta was reasonably efficient in converting a +161 TBW (ranked third) into +77 runs (ranked second), but not in turning that positive run differential into wins (ranked seventh). Milwaukee was the league’s most inefficient team, ranking fourth with a +157 TBW but just seventh with +25 runs, a testament, perhaps, to their particularly inept defense.
Looking ahead
As we've pointed out, it's unusual for teams that are especially efficient or inefficient to sustain those levels the next year. Instead, they tend to revert to the normal relationships between TBW and runs and between runs and wins. That means we can identify teams that are likely to improve or fall back even if they don't make moves that change their talent level significantly.
For that reason, the A’s could be a sleeper team in the comparatively weak AL West, with underlying numbers in 2007 comparable to the Angels and overachieving Mariners. And based on their 2007 figures, Red Sox Nation can expect their team to be at least as tough again in 2008.
In the NL, 2008 could be the year the Braves return to the top of the East Division. It’s improbable for any team to put together a run like Colorado did last fall, but there is nothing in the numbers to suggest that the Rockies cannot carry last season’s improvement into 2008. The Diamondbacks, on the other hand, could be making a mistake if they choose to stand pat this winter. And a rebound by the Cardinals, even in the mediocre NL Central, appears unlikely without significant reinforcements.
2006 Team Efficiency
Measuring team efficiency
By Tom Tippett
December 11, 2006
In the mid-to-late 1990s, I started writing an annual essay about teams whose win-loss records were out of synch with their underlying stats. In the early years, these essays were largely unknown to the baseball community because they were available only as part of our annual Season Disk. That changed in 2002 when I wrote an article on this subject for ESPN.com, and we've been publishing them on ESPN.com and/or the Diamond Mind web site ever since.
The central idea is simple. Baseball analysts have developed a variety of methods for translating (a) hits and walks and other events into runs and (b) runs into wins. One can use these methods to identify teams that scored more or fewer runs than they "should have", teams that allowed more or fewer runs than they "should have", and teams that won or lost more games than they "should have" given the runs they actually scored and allowed.
In the years since our Team Efficiency article first appeared on ESPN.com, others have picked up the ball and run with it. The folks at Baseball Prospectus created a team standings page, updated daily during the season, that shows the actual standings and the standings as they would appear if every team had "normal" relationships between events and runs and wins. Bill James is now providing information of this type for the annual Bill James Baseball Handbooks. And The Hardball Times Baseball Annual has included a chapter on this subject.
What do we mean by team efficiency?
In a nutshell, you win games by outscoring your opponents, so the connection between runs and wins is very strong, even though every season produces a few teams that win more or less than you'd expect given their run differential. To explore the relationship between runs and wins, we use the pythagorean method that was developed by Bill James.
You score runs by putting together hits, walks, steals, and other offensive events, and you prevent runs by holding the other team to a minimum of those things. In most cases, there's a direct relationship between runs and the underlying events that produce runs.
We use the term efficiency to represent the ability to turn events into runs and runs into wins. An efficient team is one that produces more wins than expected given its run margin, produces more runs than expected given its offensive events, and/or allows fewer runs than expected given the hits and walks produced by their opponents.
In the 2002 edition of this article, we showed that teams that are unusually efficient (or inefficient) have exhibited a very strong tendency to revert back to the norm the next year. That's good news for some teams and bad news for others. If you'd like to find out who falls into which category, read on.
Converting runs into wins
The Bill James pythagorean method is a well-established formula based on the idea that a team's winning percentage is tightly coupled with runs scored and runs allowed. Bill's formula is quite simple ... take the square of runs scored and divide it by the sum of the squares of runs scored and runs allowed (RF = runs for, RA = runs allowed):
                             RF ** 2
Projected winning pct = -----------------
                        RF ** 2 + RA ** 2
In 2006, for instance, 18 of 30 teams finished with win-loss records within three games of their projected records, and 25 of 30 teams finished within five games. From 2003 to 2005, 75 of 90 teams finished within five games of their pythagorean projection.
We had a very big exception this year. The Indians won 12 fewer games than normal for a team with a run margin of +88. On a run-margin basis, they were more like a 90-win team that should have been in the thick of the division race to the end. Since 1962, when the 162-game schedule was first used in both leagues, only five teams have been at least 12 games worse than their pythagorean projection, so the 2006 Indians have assumed a dubious place in modern baseball history.
In a reversal of their 2005 season, Oakland won 93 games despite a +44 run margin that would normally produce an 86-win season. The year before, their real win-loss record fell short of their pythagorean mark by 6 games, enough to cost them the division title that year.
The Mets, Brewers and Red Sox each won 5 more games than their run margin supported. The Rangers and Braves each fell short by 6 games, while the Rockies came up short by 5 wins.
But 44 years of baseball history tells us that large deviations are unusual and tend not to be repeated the following year. In other words, the Indians could easily see a big improvement in their win-loss record in 2007 even if they don't make major changes to the roster.
Converting offensive events into runs
Just as there is a strong relationship between runs and wins, it's almost always true that the more hits and walks you produce, the more runs you'll score. Sometimes, of course, a productive team comes up short on the scoreboard because they didn't hit in the clutch, didn't run the bases well, or hit line drives right at people in key situations. But this relationship holds up most of the time.
To shed some light on this relationship, we need a way to take batting stats and turn them into a measure of overall offensive production. There are several good options here, including Runs Created (Bill James), Batting Runs (Pete Palmer), Equivalent Average (Clay Davenport), OPS (on-base average plus slugging average), and Base Runs (David Smyth).
For this exercise, we'll use the sum of total bases and walks, or TBW for short. TBW is not a perfect measure, but it does have a few things going for it. It captures the most important things a team does to produce runs -- singles, extra-base hits, and walks -- and it's easy to figure without a computer.
As with other statistics, a team's TBW total can be significantly influenced by its home park. For that reason, we focus on the difference between the TBW produced by a team's hitters and the TBW allowed by its pitchers. This effectively removes the park from the equation and helps us identify teams that outproduced their opponents.
The following table shows the offensive and defensive TBW figures for the 2006 American League, along with the difference between these two figures and each team's league rank based on those differences. It also shows runs for and against, the run differential, and the rankings based on run differential. Finally, because we're trying to trace a path from TBW to runs to wins, it lists the team's win total and league rank for the year.
     ---------- TBW ----------   ------- Runs --------   - Wins -
AL    Off   Def  Diff  Rank      Off  Def  Diff  Rank    Num Rank
NY   3256  2807   449    1       930  767   163    1      97  1
Tor  3104  2839   265    3       809  754    55    6      87  7
Bos  3117  3026    91    9       820  825    -5   10      86  8
Bal  2850  3182  -332   12       768  899  -131   12      70  12
TB   2739  3134  -395   13       689  856  -167   13      61  14
Min  2870  2718   152    8       801  683   118    3      96  2
Det  2961  2728   233    4       822  675   147    2      95  3
Chi  3127  2925   202    5       868  794    74    5      90  5
Cle  3125  2849   276    2       870  782    88    4      78  10t
KC   2770  3319  -549   14       757  971  -214   14      62  13
Oak  2914  2897    17   10       771  727    44    8      93  4
LA   2869  2701   168    6       766  732    34    9      89  6
Tex  3028  2873   155    7       835  784    51    7      80  9
Sea  2810  2982  -172   11       756  792   -36   11      78  10t
Largely because the AL dominated the NL to such a great degree in inter-league play, it was an unusual year. Ten out of fourteen AL teams were above water in TBW differential and nine had positive run margins. Boston, Minnesota, Detroit, Chicago and Seattle dominated their inter-league series, each winning at least 14 of 18 contests against the weaker league.
The Yankees were strong across the board, leading the league in TBW differential, run margin, and wins. The bottom three teams were rotten in every way. In between, things didn't exactly go according to form.
We've already talked about the Indians, who were second in TBW differential, fourth in run margin, and tied for tenth in wins. That's not easy to do. Cleveland's TBW differential of +276 is in the top 12% of all teams in the past third of a century. Fully 90% of those teams won at least 90 games, and the 2006 Indians are only the third team in that group to lose more games than they won. The 1979 Dodgers went 79-83 with a +282 TBW differential in an aberrational year that was sandwiched between two seasons of at least 92 wins. And the 1984 Phillies finished at .500 despite a differential of +333 bases. Unlike the Dodgers, the Phils did not bounce back the next year, sliding to 76 wins in 1985.
The Blue Jays also underachieved, almost entirely on offense, scoring about 70 fewer runs than the Runs Created formula predicts based on their underlying stats. With the league's third-best TBW differential, they're positioned to make a postseason run in 2007 if they can overcome their offensive efficiency problems.
Minnesota's TBW differential of +152 was only 8th in the league and normally wouldn't be enough to put a team in playoff contention. But the Twins turned that into the league's third-best run margin and second-best record to grab the division title on the season's final day.
In 2005, Oakland was vastly better than Los Angeles statistically but fell seven games short in the standings. This year saw an almost complete reversal, with LA having a big lead in TBW differential, being roughly even in run margin, and finishing four games behind Oakland in the standings. Combine the two years and you've got parity across the board, so one division title each is about right, even if each team stole one from the other.
Moving on to the National League:
       ---------- TBW ----------   ------- Runs --------   - Wins -
NL      Off   Def   Diff  Rank      Off   Def  Diff  Rank   Num  Rank
NY     3021  2779    242     1      834   731   103     1    97     1
Phi    3168  3121     47     4      865   812    53     3    85     4
Atl    3066  3026     40     5      849   805    44     5    79     8
Flo    2890  2947    -57    10      758   772   -14     9    78     9
Was    2889  3087   -198    14      746   872  -126    16    71    14
SL     2913  2936    -23     7      781   762    19     6    83     5
Hou    2843  2828     15     6      735   719    16     7    82     6
Cin    2999  3057    -58    11      749   801   -52    12    80     7
Mil    2783  2918   -135    13      730   833  -103    13    75    13
Pit    2664  2995   -331    16      691   797  -106    14    67    15
Chi    2752  3053   -301    15      716   834  -118    15    66    16
SD     2886  2715    171     3      731   679    52     4    88    2t
LA     3035  2855    180     2      820   751    69     2    88    2t
Col    2969  3008    -39     8      813   812     1     8    76   10t
SF     2802  2846    -44     9      746   790   -44    11    76   10t
Ari    2897  2966    -69    12      773   788   -15    10    76   10t
Just like their AL counterparts, the Mets topped the league in TBW differential, run differential, and wins, all by a very comfortable margin, yet failed to make it to the World Series. The rest of the NL East went according to form, with strong relationships between batting events, runs, and wins.
The mediocrity of the Cardinals' regular season is clear from these numbers. Despite being outproduced by 23 bases, St. Louis eked out a +19 run margin and a record a few games over .500. Some have taken this as evidence that the Cardinals weren't worthy of their World Series title. I don't agree.
Their regular-season stats and record were far worse than they should have been because they lost Albert Pujols, Jim Edmonds, Scott Rolen, David Eckstein, Mark Mulder, and Jason Isringhausen to injuries for long stretches. By the time October rolled around, four of those guys were back, so the postseason Cardinals were more like the team that we projected for the best record in the NL. I still thought the Tigers were the better team, but it's just not fair to say the Cards were undeserving.
The Reds have been defying the odds in recent years, posting an actual record at least four games better than their Pythagorean record in four of the past five seasons. (The exception was 2005, and the high-water mark was +10 in 2004.) Still, their stats supported a third-place finish, and that's where they wound up after contending for the top spot almost all year.
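As a reminder, the Pythagorean record comes from Bill James's rule of thumb that winning percentage tracks RS^2 / (RS^2 + RA^2), with later refinements using an exponent closer to 1.83. A quick sketch, using the Reds' 2006 runs from the table above:

    # Bill James's Pythagorean expectation: win% ~ RS^e / (RS^e + RA^e),
    # with e = 2 in the classic version (about 1.83 in refinements).
    def pythagorean_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
        pct = runs_scored ** exponent / (
            runs_scored ** exponent + runs_allowed ** exponent)
        return games * pct

    # 2006 Reds: 749 scored, 801 allowed -> about 75.6 expected wins,
    # more than four fewer than their actual total of 80.
    print(round(pythagorean_wins(749, 801), 1))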
In the West, a chasm opened between the top two teams and the rest of the pack. The Padres and Dodgers posted strong numbers across the board, while the other three teams were clustered together, all of them a little below average. Based on their TBW differentials, you'd expect the top two teams to finish in a tie and the bottom three to be within a game of each other, and that's exactly what happened.
Except for the Braves, who fell a few games short, and the Reds, who picked up a few unexpected wins, it was a case of what-you-see-is-what-you-get, with everyone in the NL posting a win-loss record that matched their underlying stats.
Looking ahead
As we've pointed out, it's unusual for teams that are especially efficient or inefficient to sustain those levels the next year. Instead, they tend to revert to the normal relationships between TBW and runs and between runs and wins. That means we can identify teams that are likely to improve or fall back even if they don't make moves that change their talent level significantly.
For that reason, the Blue Jays and Indians have some reason for optimism going into 2007. Both put up impressive underlying numbers but didn't get the payoff in runs and/or wins this past season. If their offseason moves are talent-neutral or better, both teams can be expected to contend for at least a wild card next year.
Three teams would be making a mistake by focusing too much on their actual 2006 win-loss records. Boston was more like a .500 team than an 86-win team, Minnesota was quite good but not 96-win good, and Oakland was a little fortunate to win the division.
Judging by the money they're spending this winter, Boston understands that they need to put a much better team on the field in order to contend in 2007. Oakland's front office always seems to understand what they need and find a creative way to get it. Minnesota has yet to make any major moves, but they don't need to do much to remain a contender in what has become a very tough division.