Tom Tippett's Thoughts on Defense
Some people argue that it's impossible to measure the defensive performance of baseball players because the statistics available for that purpose are woefully inadequate. If you're talking about traditional fielding stats -- games, putouts, assists, errors, double plays -- I wouldn't go so far as to say that it's impossible, but I would agree that it's not easy.
In this article, we'll look at those traditional fielding stats and talk about what you can and cannot learn from them. We'll look at more modern fielding statistics such as Pete Palmer's Fielding Runs, the zone ratings from STATS Inc., and Bill James' Win Shares. As the providers of a computer baseball game, one of our ongoing tasks is rating players in all phases of the game, including defense, and we'll talk about how we use detailed play-by-play data from STATS to improve our understanding.
Even with these advances, evaluating defense is not an exact science. If you're a the-glass-is-half-empty sort of person, you could take that to mean it's not worth the effort. But I believe the availability of play-by-play data has raised the level of the water so the glass is now about 90% full, and if you're interested in joining me for a little stroll through the evolution of fielding analysis, I think you'll end up with a better idea of what we can and cannot learn about defense.
Baseball analysis in general
The idea of using statistical measures to assess the ability to succeed in a certain phase of the game is not a radical one. Baseball people have been doing this for over a century to measure batting and pitching performances. They don't, after all, give the batting title to the guy with the prettiest swing, they give it to the player who hit for the highest average. They don't give the Cy Young to the pitcher with the best mechanics or the guy who throws the hardest, they give it to the one who was deemed to be most effective. They look at results, not form or effort or attitude or any of the other things that a player brings to the game.
But for the most part this tradition has extended only to hitting and pitching. Today's announcers and analysts make increasing use of modern measures like on-base percentage and inherited runners to shed more light on those areas of the game, but you never hear a television or radio analyst talk about meaningful measures of baserunning, throwing or defense. Instead, they talk about their impressions of the player -- how fast he looks, his quickness, strength and athleticism -- and say simplistic things like "they're the best fielding team in the league because they lead in fielding percentage."
Because we do our own analysis, we sometimes find players whose performance is better or worse than you would guess by watching them a few times a year. And while most of our ratings are consistent with the opinions expressed by baseball's leading writers and TV personalities, sometimes we conclude that a player is actually performing at a higher or lower level than his reputation would suggest.
Because we try very hard to provide the most accurate and realistic baseball simulation available, we can't afford to give in to public opinion and rate someone higher than his performance justifies. If we did that for defensive ratings, we'd have these options:
- reduce the rating of one of his teammates so the team's defense isn't overrated
- reduce the effectiveness of the team's pitchers to compensate for the extra plays this player will now make in the simulated season
- disregard these side effects and allow the player, the team, and its pitchers to produce better results than they should
We don't think it's fair to downgrade teammates so we can give a popular player a better rating than he deserves. And we don't think our customers would want us to disregard the side effects and publish a season disk with players and teams who will overperform. So we do our best to rate players based on their actual performance.
Judging by Watching
For a few years now, I've wanted to write a little piece about how difficult it is to judge defensive ability, or any baseball skill for that matter, just by watching a lot of games. Then I found an essay by Bill James in his 1977 Baseball Abstract (a self-published book that predated his debut in bookstores by about five years) that says it far, far better than I ever could.
Here are a few excerpts from this wonderful essay, starting with a comment on how differently most people tend to approach the assessment of hitters and fielders:
"While we might not all be able to agree who the greatest-hitting first baseman ever was, the record books will provide us with a reasonably brief list to choose from: Gehrig, Anson, Foxx, Sisler. That's about it. Nobody's going to argue that it was Joe Judge or Moose Skowron, because the record books simply will not permit it . . .
Fielding statistics provide no such limited clarity. Talk about the greatest fielding shortstops ever . . . and the basic argument for everybody is 'One time he made a play where...'
Suppose we turn that same argument back to hitting. Now Moose Skowron hit some baseballs a long way, but nobody is going to say that he was the greatest hitting first baseman ever because 'One time I saw him hit a baseball so far that...' It is understood, about hitters, that the important question is not how spectacularly but how often. Brooks Robinson is known as a great fielding third baseman not because of the number of plays that he makes, but because he looks so good making them. Nobody talks anymore about what a great hitter Jim Northrup was, although to tell you the truth I never saw anybody who looked better at the plate. It is understood that, notwithstanding appearances, he wasn't an especially good hitter. Hitters are judged on results; fielders, on form."
And he talks about the difficulty of trying to judge effectiveness simply by watching:
"One absolutely cannot tell, by watching, the difference between a .300 hitter and a .275 hitter. The difference is one hit every two weeks. It might be that a reporter, seeing every game the team plays, could sense the difference over the course of the year if no records were kept, but I doubt it . . . the difference between a good hitter and an average hitter is simply not visible."
"a fielder's visible fielding range, which is his ability to move to the ball after it is hit, is vastly less important than his invisible fielding range, which is a matter of adjusting his position a step or two before the ball is hit."
In that essay, Bill went on to propose a scoring system that accomplishes essentially what STATS Inc. is doing now -- recording the location of every batted ball so that we could build a record of fielding performances similar to the statistical records that we use to judge batting and pitching performances.
I'm not saying that it doesn't matter whether you watch games or not. I'm just saying that I agree with Bill that it's very difficult to rate players solely by watching games. We also need useful measures of what they accomplished.
Measuring Defensive Range
Defensive range is the ability to cover ground and get to more balls than the average fielder, and it's one of the hardest elements of fielding performance to measure.
Official fielding stats provide information such as games played, putouts, assists, errors, double plays, and fielding percentage. But using these numbers to assess player skills is extremely difficult, if not impossible. The list of reasons is very long, but they all boil down to the fact that they don't tell you how many chances to make plays were presented to each fielder.
In 2002, for example, Jose Vidro led the majors in assists by a second baseman. Does this mean he was the best second baseman in baseball, or was this just because:
- he played more innings than everyone else?
- he played behind a pitching staff that didn't strike out a lot of batters, so more balls were put in play?
- his pitching staff induced a high percentage of ground balls?
- his pitching staff was heavily right-handed, so they faced more than the normal number of left-handed batters (who hit more ground balls to the right side)?
- his park somehow makes it easier for him to make plays?
- it just happened that more balls were hit to second when he was playing?
Baseball analysts, ourselves included, have made many attempts to devise methods that deal with some of these other factors so that we can isolate the contribution the player is making. Let's review them, and then talk about some newer methods that we've been using.
Range Factors and Defensive Innings
In the 1970s, Bill James introduced the idea of range factors to compensate for playing time. A player's range factor is generally computed as successful chances (putouts plus assists) per game. This was a good first step, even though Bill acknowledged at the time that it wasn't meaningful for pitchers, catchers and first basemen.
One thing that frustrated Bill was the fact that not all games played are equal. Some players play almost every inning of their games. Others split the playing time with a platoon partner. Late-inning defensive specialists often pick up a lot of games played without actually playing a lot. For a while, Bill devised methods to estimate how many innings each fielder was actually in the game at his position, but this is very hard to do. Fortunately, companies like STATS have been publishing accurate counts of defensive innings for the last ten years. So we can now compute range factors on a per-nine-innings basis, just like we do for earned run averages.
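The per-nine-innings calculation works just like ERA's scaling. Here's a minimal sketch with hypothetical totals (the 5.86 figure for Pokey Reese below comes from the actual 2002 data, not from these made-up numbers):

```python
# Range factor per nine defensive innings, analogous to ERA's per-nine scaling.
# The putout/assist/innings totals here are hypothetical, for illustration only.

def range_factor_per_nine(putouts: int, assists: int, defensive_innings: float) -> float:
    """Successful chances (PO + A) scaled to a nine-inning basis."""
    return 9 * (putouts + assists) / defensive_innings

# A hypothetical second baseman: 250 putouts, 400 assists in 1100 defensive innings.
rf9 = range_factor_per_nine(250, 400, 1100.0)
print(round(rf9, 2))  # 5.32
```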
Using a range factor based on defensive innings, Pokey Reese moves to the top of the list of 2002 second basemen with 5.86 successful chances per nine innings. Vidro drops to seventh.
The fixed sum problem
Whether you use games or innings as the basis of a range factor calculation, there's another critical problem with range factors. By measuring plays made per game or per nine innings, the method takes no account of the length of those innings. Consider the following two innings that start out the same way and feature the same mix of batted balls, only with different results:
- strikeout ... ground ball double down the third base line ... line drive single to shallow center ... popup to third ... triple into the right field corner ... ground ball single between first and second ... groundout to third
- strikeout ... great diving stop by third baseman on hard-hit grounder down the line, with the batter out at first on a strong throw by the third baseman ... line drive single to shallow center ... popup to third
In the first version of this inning, the official fielding stats record a putout for the catcher (on the strikeout), one assist (on the inning-ending ground out) and one putout (on the popup) for the third baseman, and one putout (on the grounder) for the first baseman. In the second version of this inning, the official fielding stats are exactly the same. The fact that the defense allowed three more hits in the first one is completely lost.
In this example, there's no way to tell which team defense and which individual fielders were more effective just by looking at the official fielding stats. In the more general case, the best fielders will generally end up making more plays than the poorest defenders. But the number of putouts in a nine-inning game adds up to 27 no matter how many hits are allowed, and the number of assists is mostly a product of the number of ground balls, not the skill of the infielders. So we can't use range factors to evaluate team defense at all, and they don't tell us nearly enough about individual fielders either.
Adjusted Range Factors
Even if we use defensive innings to measure playing time, we still haven't taken into account (a) the number of opportunities presented to each fielder and (b) the fact that some putouts and assists are harder to come by than others. Back in the 1980s, I developed a new type of range factor that adjusts for many of these variables in the following ways:
- it counts the number of balls put in play (excluding homeruns) while each fielder was at his position, removing the strikeout rate of the pitching staff as a potential source of bias
- it counts only those putouts and assists that required the fielder to do some important work (e.g. taking a groundball and getting an out by making a throw or stepping on the bag for a force, spearing a line drive, or tracking down a fly ball) and ignores the ones that don't say much of anything about defensive range (e.g. taking a throw at first base, making the pivot on a double play, or tagging a runner on a steal attempt)
- it tracks balls put in play by left- and right-handed batters separately, since players pull the ball on the ground much more often than they go the other way
- it adjusts for the ground ball percentage of each team's pitching staff
Traditional range factors compute plays made per game or per nine innings. This method computes plays made per 100 batted balls, meaning that we can use it to get a better handle on both team and individual defense. If one team gives up a lot more hits than another, it will need more balls in play to get through a game, and the adjusted range factors for the poor fielding team will be lower.
Here's how these factors affected Vidro:
- his pitching staff was a little above average in strikeouts
- only 12% of Montreal's innings were thrown by lefties. That's a low figure, but the percentage of balls put in play by lefty hitters was about average despite the right-handed nature of his pitching staff. (By the way, if we made an assumption based on the left/right mix of the staff instead of actually counting balls put in play, we would have assumed Vidro got more chances to make plays than he really did.)
- Montreal's pitchers were second in the majors in ground ball percentage, a strong indication that Vidro's numbers were boosted significantly simply because he had more balls hit his way
Based on adjusted range factors, Vidro was a little below average among all major-league 2Bs this year, and while we can't finish our assessment of his play without using more advanced methods, we've already seen enough to conclude that his MLB-leading assist total is highly misleading.
This approach produces much better information than does an ordinary range factor, but we're still left with the fact that we're using these adjustments to make an educated guess at how many opportunities each fielder had to make plays. It goes without saying that it's possible to do better when we have access to play-by-play accounts that note the location of every batted ball.
Total Baseball's Fielding Runs
Before moving on, let me take a moment to say that the Fielding Runs numbers in the Total Baseball encyclopedia can be extremely misleading. I don't enjoy saying this, because they were developed by Pete Palmer, and Pete's a friend and one of the nicest guys I've ever met.
The first problem I have with fielding runs is that they're just a glorified range factor, with different weights for different events. Like range factors, you cannot interpret them accurately unless you know the strikeout rate and groundball/flyball ratio of the pitching staff and what percentage of left-handed batters the fielder faced. For a good example of the distortions that often creep into the fielding runs numbers, see the comments on Frank White and Ryne Sandberg in an article I wrote for ESPN.com in September, 1998.
In addition, I don't agree with some of the formulas, mainly because they put too much weight on certain events. For example, the formula for outfielders is .20(PO + 4A - E + 2DP), meaning that catching a fly ball with the bases empty earns you .20 fielding runs, while catching the same fly ball and throwing out a runner for a double play earns you 1.4 fielding runs. In both cases, the fielder made the best play available, but one counts seven times as much as the other. And suppose one center fielder reaches a ball but muffs it for a one-base error, while another lets it go up the gap for a double -- the guy who reached the ball has .20 fielding runs deducted and the second guy isn't penalized at all.
Finally, the fielding runs formula mixes range, errors and throwing into one number, which is appropriate for what Total Baseball is trying to accomplish (an overall player rating), but useless for what we do, which is to assign separate ratings for these skills.
STATS Zone Ratings
The next logical step beyond range factors is a system that counts actual opportunities to make plays. We weren't able to do that until 1989, because nobody tracked the location of every batted ball until then. The folks at STATS were the first to do it, and they developed the zone rating to take advantage of this new information.
STATS says the "zone rating measures all the balls hit in the area where a fielder can reasonably be expected to record an out, then counts the percentage of outs actually made." Instead of having to estimate the number of opportunities to make plays from defensive innings, percentages of balls in play, the left-right composition of the pitching staff, and the staff groundball/flyball ratio, we can actually count the balls hit to each fielder while they are in the game.
The zone rating could have been a tremendous breakthrough, but we disagree with some of the details of their implementation.
First, they don't count all the balls. For example, no infielder is charged with an opportunity when a grounder is hit down the lines, in the holes, or up the middle. The only plays that go into the zone ratings are the ones where the ball is hit more or less at a fielder. The net result is a system that places more emphasis on good hands than range.
Even if you didn't know this, you could infer it from their numbers. The league average zone ratings range from .763 to .885 depending on the position, suggesting that fielders are turning well over 80% of all batted balls into outs. But the truth is that only about 70% of all batted balls become outs. It's clear that the most challenging opportunities, the ones that separate the best fielders from the ordinary ones, are left out of their system.
The second issue is that errors are mixed in with the ability to get to the ball in the first place. Let's suppose a player is credited with 500 opportunities in a season, and let's suppose he was very reliable, making 8 fewer errors than the average player with that many plays to make. Those 8 errors become 8 outs and produce a zone rating that is .016 above the league average. Without taking the errors into account, you might conclude that he has above-average range, when in fact he has average range and very good hands.
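Here's that arithmetic as a minimal sketch, with an assumed league-average rate of .800 for the position. Backing the error savings out of the rating is how you isolate range from hands:

```python
# Separating "hands" from range in a zone rating. Per the example in the text:
# a fielder with 500 opportunities who makes 8 fewer errors than average picks
# up .016 of zone rating from reliability alone. All numbers are illustrative.

def zone_rating(outs: int, opportunities: int) -> float:
    return outs / opportunities

league_zr = 0.800                        # assumed league average at the position
player_outs = 408                        # includes 8 extra outs from sure hands
player_zr = zone_rating(player_outs, 500)
error_boost = 8 / 500                    # rating points gained from fewer errors
range_only_zr = player_zr - error_boost  # what's left once hands are backed out
print(round(player_zr, 3), round(error_boost, 3), round(range_only_zr, 3))  # 0.816 0.016 0.8
```

The headline .816 looks like above-average range, but the range component alone is exactly the assumed league average.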
The third issue no longer applies but needs to be mentioned. Through the 1999 season, when an infielder started a ground ball double play, STATS credited him with two outs and one opportunity. Starting double plays is an important skill for an infielder, but this approach gives a significant boost to infielders who play behind pitchers who put lots of runners on base and/or with a pivot partner who turns the DP well, and it clouds the effort to measure defensive range. STATS doesn't do this any more, but if you have copies of the STATS Player Profiles books from the 1990s, you'll be looking at zone ratings that double-count these DPs.
Once again, let me say that the idea behind the STATS zone rating is sound and has value even with these issues. If you're looking for an overall measure of fielding performance that includes both range and errors, it won't matter to you that they're lumped together. And folks like us who are interested in separating these skills can make an adjustment for error rates to isolate the range portion.
The zones are smaller than we'd like, but my guess is that STATS did this on purpose to avoid running into two other issues that we'll talk about in a bit. First, some batted balls are playable by more than one fielder, and keeping the zones on the small side reduces the number of opportunities for one fielder to affect his neighbors. Second, outfield zones that cover the entire field make the system more vulnerable to distortions arising from different ballpark dimensions and characteristics. Our zone-oriented analysis does cover the whole field, so we've developed some methods for handling the interaction among fielders and accounting for park effects.
For a few years in the early 1990s, we used a type of zone rating called Defensive Average (DA). It was developed by Pete DeCoursey and Sherri Nichols and used play-by-play data from The Baseball Workshop. Like the STATS zone rating, defensive average counted the batted balls hit into each fielder's zone and the number of plays he made on them. But it covered the whole field and didn't mix apples and oranges by double-counting GDPs. As a result, we felt we got better results from defensive average than from the STATS zone ratings.
When assigning responsibility for balls hit between fielders, the STATS and DA systems are similar if an out is made. Both systems credit the fielder with one opportunity and one play. But things get tricky when the ball falls in for a hit.
If the ball falls into one of the STATS zones, the fielder responsible for that zone is charged with an opportunity. If it falls outside the STATS zones, the play is ignored, and no fielder bears responsibility for the hit.
In the DA system, each player gets charged with half an opportunity when there's a hit that lands between two fielders. That means that someone playing next to a weak fielder tends to look worse than he is, because if the other guy makes the play, there is no opportunity charged, but if the ball falls in, he's charged with half an opportunity even if it's the sort of play the other fielder would be expected to make at least some of the time.
During the years in which we used the Defensive Average system, we were aware of this limitation and did our best to make intelligent adjustments to compensate for it when assigning player ratings. But we always wanted to see if we could do better.
The Diamond Mind Approach
In 1996, we began using a collection of old methods and new tools to expand our look at defensive performance, and we have been refining and improving these methods ever since. We believe that by using these tools to look at player performance from several angles, we can learn a lot more about who accomplished what in a given season.
To one degree or another, our best tools take advantage of the fact that STATS has been recording the type (grounder, fly ball, line drive, popup, bunt) and location (direction and distance) of every batted ball since the late 1980s. Using this information, our analysis programs aren't vulnerable to the potential biases in traditional fielding stats. We know exactly how often each player was in the field, how often the ball was hit near him, and how many plays he made on those balls.
The field is divided into approximately 80 zones. We count the number of balls hit into that zone, the number of times each fielder made an out, and the number of singles, doubles, triples, and errors that resulted. When we're done, we look at the zone data for all of the major leagues and see how often the players at each position were able to make plays on those balls.
For example, on the 6939 grounders up the middle to the shortstop side of the bag during the 2002 season, MLB shortstops turned 64.4% of those balls into outs and made errors 1.9% of the time. Second basemen ranged to the other side of the bag to make the play 0.8% of the time. Almost all of the remaining grounders in this zone resulted in singles, with a handful of doubles and fielder's choice plays to round things out.
This gives us a baseline that we can use to evaluate performance on balls hit into this zone. Repeating this process for all batted ball types and every zone gives us an overall measure of the playmaking ability of a team and its players.
With one exception, our zone-oriented approach includes the entire field and all types of batted balls. Early on, it became clear that we needed to screen out infield popups because they don't tell us anything. Over 99% of these plays result in an out, so they don't distinguish the good fielders from the not-so-good. And because these plays are easy to make, most popups can be handled by any of several players, making the successful completion of this play as much (or more) a matter of preference as one of skill.
As I mentioned previously, we need to use measures of team defense to help us deal with the interactions among fielders. If one player doesn't get credit for making a play, it may be because another fielder beat him to it, and the first guy shouldn't be punished for playing next to a superior defender. It's only by looking at measures of team defense that we can distinguish the cases where another guy made the play from those when the ball fell for a hit. So let's take a moment to discuss team defense metrics.
Defense efficiency record (DER)
We usually start by computing the percentage of batted balls, excluding homers, that were turned into outs by the team. This percentage was labelled the Defense Efficiency Record (DER) by Bill James when he wrote about it in the 1980s, and you can find DER information on the Baseball Prospectus web site during the season.
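As a formula, DER is just outs on balls in play divided by balls in play. A minimal sketch, with made-up team totals and with "reached base" folding together hits and errors on balls in play (homeruns already excluded, per the definition above):

```python
# Defense Efficiency Record as described here: the percentage of batted balls,
# excluding homers, that the team turned into outs. Team totals are hypothetical.

def der(balls_in_play: int, times_reached: int) -> float:
    """Outs on balls in play divided by balls in play (homeruns excluded).
    times_reached counts hits plus errors on those balls."""
    return (balls_in_play - times_reached) / balls_in_play

# A team whose opponents reached base 1,280 times on 4,300 balls in play:
print(round(der(4300, 1280), 3))  # 0.702
```

The result, around .700, squares with the earlier observation that only about 70% of all batted balls become outs.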
I'm not completely sold on DER as the ultimate measure of team defense, however. For one thing, I've always been troubled by the fact that it's just a variation on batting average, with strikeouts and homeruns removed, and with the focus on the out percentage instead of the hit percentage. But league batting averages have ranged from a low in the .230s to a high in the .300s in the past 80 years, so they don't just measure batting skill. They also embody the impact of the rules of the game (strike zone, mound height), the equipment (dead ball, lively ball, juiced ball), and the changing nature of ballparks. Similarly, the league DER figures have risen and fallen by large amounts, indicating that factors other than fielding skill are built into these numbers, too.
A second question about DER is the extent to which it measures pitching versus fielding. I've always believed that DER measures some of both. There is a strong (but not perfect) correlation between a team's rankings in ERA and DER, suggesting that (a) good pitchers make their fielders look better and/or (b) the team's rank in ERA is in large part due to the quality of its defense. It's hard to know which way to look at it, but I believe it works in both directions.
Recent work by Voros McCracken and Dick Cramer suggests that pitchers have little or nothing to do with the percentage of balls in play that are turned into outs. To put it another way, the defense is entirely responsible for a team's DER ranking. I'm not ready to accept that pitchers have nothing to do with these outcomes. While I haven't had time to do any detailed studies in this area, some very preliminary work suggests that good pitchers do improve a team's DER, though only by a few points. But because pitchers allow a very large number of batted balls over the course of a season, these small improvements can have a large effect on the pitcher's ERA.
Another issue with DER is that park effects can play a large role. It's clear that the enormous impact that Coors Field has on scoring isn't entirely due to homeruns. A much higher percentage of balls that stay in the field of play are falling in for hits, too, and that makes Colorado's team defense look much worse than it really is. This is the most extreme example, of course, but there are other parks that make a difference.
In other words, we start our process by computing the DER for each team, but we don't take that figure as a precise measure of the team's ability to make plays in the field. We keep the potential distortions in mind as we go through our rating process.
Other measures of team defense
Our zone-oriented analysis provides us with another way of rating team defenses. We can go zone by zone and compute how many more (or fewer) plays were made by this team than the average team, then do a weighted average of all of the zones to get an overall score for the team. That overall score is expressed as the number of plays made above or below the average. In 2002, for example, Anaheim's defense led the majors by making 120 more plays than the average team (in 4228 opportunities). These figures are not park adjusted, so they're not definitive, but they definitely add value in the process.
To isolate portions of a team's defense, we rate the infields by computing the percentage of ground balls turned into outs and the outfields based on the percentage of fly balls and line drives that were caught.
Because we use a collection of overall measures (like DER), mid-level measures (such as out rates on grounders), and detailed zone-based analysis, we can examine team defense at several levels of detail. That helps us determine which fielders are getting the job done and which are letting the team down.
We can't leave the subject of team defense without looking more closely at the parks.
We mentioned Coors Field a moment ago, but Dodger Stadium is another good example. From 2000 to 2002, that park depressed batting averages by 21 points, making it one of the best pitchers' parks in the game. And it wasn't just because of strikeouts and homers, either. Focusing only on balls hit into the field of play, Dodger Stadium took away 97 hits a year in that period. If half of them came with the Dodgers on defense, measures that ignore park effects (like DER) make LA's team defense appear to be 48 plays better than it really is.
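The arithmetic behind that estimate is worth making explicit, since the same adjustment applies to any park with a known effect on balls in play:

```python
# The Dodger Stadium arithmetic from the text: if a park removes a known number
# of hits per year on balls in play, roughly half of them come while the home
# team is on defense, so an unadjusted measure like DER overstates that team's
# defense by about that many plays.

hits_removed_per_year = 97           # Dodger Stadium, 2000-2002, balls in play only
defensive_share = 0.5                # roughly half occur with the Dodgers fielding
phantom_plays = hits_removed_per_year * defensive_share
print(round(phantom_plays))  # 48
```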
Using play-by-play data, we can also compare the hit rate on different types of batted balls. Dodger Stadium dramatically reduces the percentage of ground balls that go for hits. It also cuts the hit rate on fly balls, but not by a whole lot. Because virtually all of the park's effect is concentrated in the infield, it would be especially easy to overrate the LA infield if we ignored this information.
Evaluating individual players
Most of our work at the player level uses zone-based data. We compare the rate at which each fielder turned batted balls into outs in each zone with the overall averages. If a player made more than the normal number of plays, he gets a plus score for that zone. If he fell short of the overall average, he gets a minus score. By computing a weighted average of all of his zones, we get a figure that tells us how many more (or fewer) plays he made than the average defender. We call this figure "net plays".
In a typical season, the top fielders at each position make 25-30 more plays than the average. Exceptional fielders have posted marks as high as 40-60 net plays, but those are fairly uncommon. Recent examples include Darin Erstad in 2002, Scott Rolen just about every year, and Andruw Jones in his better seasons. The worst fielders tend to be in the minus 25-40 range.
As a reality check, we look at other measures like range factors, adjusted range factors, STATS zone ratings, and our own version of the STATS zone ratings (with larger zones). More often than not, these measures tell similar stories. When they disagree, we look for external factors that might be skewing those other measures. In the end, we put the most weight on our net plays analysis.
But the net plays figures are starting points, not the final answer, because we have several other things to consider before we assign a rating. We've already talked about park effects, so I won't dwell on that any more.
As with the STATS zone ratings, our net plays analysis can be influenced by error rates. So we always look to see whether a fielder is making more plays mainly because he has better hands. Mike Bordick and Alex Rodriguez are two good examples from the 2002 season. In some cases, a player will have a mediocre net plays figure because he made a lot of errors, and we may bump up his range rating to account for the fact that he's getting to more balls in the first place.
For infielders, we have another analysis program that measures their ability to start double plays and get force outs when such opportunities exist. Especially for corner infielders, the ability to make the tough plays can separate the men from the boys. If a first baseman always takes the ball to the bag and doesn't start his share of double plays and force plays, he's not helping the team, even if he does record a normal number of outs.
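A hedged sketch of that kind of opportunity-based measure (the counts and the two example fielders are invented):

```python
# Hypothetical sketch: rate at which a first baseman converts his
# DP/force opportunities into the extra out, rather than always
# taking the ball to the bag himself. All counts are invented.

def extra_out_rate(opportunities, dps_and_forces_started):
    """Share of DP/force chances on which the fielder got the lead
    runner (or started two) instead of settling for one out."""
    if opportunities == 0:
        return 0.0
    return dps_and_forces_started / opportunities

aggressive_1b = extra_out_rate(80, 28)   # goes for the lead runner
passive_1b = extra_out_rate(80, 12)      # takes the sure out
print(aggressive_1b > passive_1b)        # both record outs on the play,
                                         # but the first helps more
```

The key point is that both players show up with identical putout totals in the traditional stats; only an opportunity-based rate separates them.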
For middle infielders, we also look at how often they are able to make the pivot on the double play. This is an important part of the second baseman's job, and he can make up for ordinary range by turning two more often. It isn't talked about very often, but we also see differences in the ability of shortstops to complete these plays.
For shortstops, we look at the zone data to see if their net plays score has been artificially depressed by sharing the left side of the infield with an especially talented third baseman. For example, Scott Rolen is way above average on balls to his left, and that cuts down on the number of plays his shortstops can make. If the overall team defense in that zone is still very good, there's no reason to penalize the shortstop. Similarly, we look for first basemen who are taking plays away from the man at second. By looking at the zone data for individual fielders and for the team as a whole, we can tell whether plays not made by one fielder are getting made by someone else.
The same is true in the outfield. For balls hit in the gaps, we look at the zone data to see if an exceptional fielder might be taking plays away from his neighbors.
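The shared-zone check described in the last two paragraphs can be sketched like this (a toy example with invented numbers):

```python
# Toy sketch: are the plays one fielder "misses" in a shared zone
# being made by his neighbor instead? All numbers are invented.

league_rate = 0.36     # league out rate on balls in the 5-6 hole
chances = 250          # team's chances in that zone

third_base_outs = 60   # an exceptional 3B ranging well to his left
shortstop_outs = 35    # fewer plays than a typical shortstop makes

team_outs = third_base_outs + shortstop_outs
team_net = team_outs - chances * league_rate   # roughly +5 plays

# The zone as a whole is covered better than average, so the
# shortstop's low individual total shouldn't be held against him.
print(team_net > 0)
```

The same arithmetic works for an outfield gap or for a first baseman crowding the second baseman: compare the team total for the zone to the league rate before blaming any one fielder.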
Another of our analysis programs counts the number of times a player is used as a defensive sub or is removed for a defensive sub. This information doesn't tell us anything about performance, of course, but it is very helpful to know that one fielder was regarded by his manager as being superior to another.
Like many of you, we read a lot and watch games on local TV and satellite, along with the highlight shows on ESPN and Fox, because it helps to have an image of a player when we evaluate the performance data. And we compile an extensive database of player notes, so we know who's coming off a knee injury or a shoulder problem that might have affected his ability to make plays.
And when the evidence doesn't match the player's reputation, we double-check our work and look very, very hard for the reasons why. Whenever possible, we talk to people -- local writers, broadcasters and sophisticated fans -- who have seen the player quite a bit to see if we can gain some additional insight into each player's performance.
After rating all of the players, we go back and double-check these individual ratings to see if they add up to something resembling the team's park-adjusted defensive performance. If not, we go back over everything we know about those players and keep at it until it makes sense.
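That final cross-check amounts to something like the following sketch (the ratings, team figure, and tolerance are all invented for illustration):

```python
# Hypothetical sanity check: do the individual net-plays figures add
# up to roughly the team's park-adjusted total? Numbers are invented.

individual_net = {"C": 2.0, "1B": 1.0, "2B": -5.0, "3B": 22.0,
                  "SS": 18.0, "LF": -8.0, "CF": 12.0, "RF": 0.0}
team_net_park_adjusted = 45.0   # invented team-level figure
tolerance = 5.0                 # how big a gap we'd tolerate

gap = team_net_park_adjusted - sum(individual_net.values())
if abs(gap) > tolerance:
    print("Revisit the individual ratings; they don't add up.")
else:
    print("Individual ratings are consistent with the team total.")
```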
Bill James' Win Shares
In his recent book Win Shares (published by STATS in 2002), Bill James developed a method for apportioning each team's wins to the players who were most responsible for creating them. A big part of that method involves evaluating defense at both the team and individual level. We're still in the process of evaluating this new approach, but we can point out a few things that you might want to keep in mind as you ponder the role that system should play in evaluating players:
- Bill begins by evaluating overall team defense and then tries to break that down and assign credit/blame to positions and then players. We've been doing that for many years.
- Bill's method is intended to work with players from all eras, including that vast portion of baseball history for which play-by-play data is not available. So he chose to develop new techniques for coping with the biases inherent in traditional fielding stats. We've been aware of those biases for a long time and have always kept them in mind while evaluating traditional fielding stats.
- Bill's system is an attempt to make better estimates of the number of opportunities to make plays and the number of plays made, and it appears that he has come up with at least a few useful ways to do that. On the other hand, using play-by-play data from the 1990s, we can now count those things directly, and we want to spend some time seeing whether Bill's estimates match up with the actual data for that period. If they do, he's made a giant contribution to the field, because we can confidently apply his techniques to seasons for which we don't have first-rate play-by-play data. If they don't, we'll have to figure out why and proceed from there.
- Bill's method is intended to aggregate all aspects of fielding performance into one number, while our goal is to isolate specific skills. We have separate ratings for range, errors and throwing, and we cannot assume that a high number of defensive win shares necessarily indicates a fielder who should get a top range rating. It's possible that his range is average and his value lies in a strong arm and good hands.
- We're not yet sure about the weights Bill put on different fielding skills when coming up with his fielding win shares. To some extent, that doesn't matter to us because we're more interested in rating the individual components of defense anyway. But as fans of baseball analysis, we're curious to see whether Win Shares really works, so we hope to find time to look at this part of his system, too.
The bottom line is that we will continue to rate fielders for modern seasons based on our analysis of play-by-play data. But we're always on the lookout for new and better ways to evaluate fielders, and if our review suggests that the fielding portion of the Win Shares model provides us with some new tools, we'll use them.
Other Approaches to Rating Players
We know that a lot of our customers like our products precisely because we do our own analysis instead of rating everyone based on prevailing opinions. At the same time, we know there are other people who don't buy our products because they reason that if Tim McCarver says someone is a brilliant fielder, then McCarver, as a well-known TV analyst and ex-player, must know a lot more about this stuff than we do.
Let's suppose, for the sake of argument, that we wanted to ditch all of our analysis and rate players based upon what we read and hear from the media. That's a lot harder to do than you might think, for a whole host of reasons.
When someone in the media says "he's the best second baseman in baseball," it's not always clear what it means. It could mean he's the best overall player at his position (including hitting, running, etc.). It could mean he has great hands. It could mean he turns the double play well or that he has great range. Even if it means all of these things to some degree, an overall evaluation doesn't help us. We have separate ratings for separate skills, and we need objective evaluations of each skill.
The media doesn't talk about all the players. We have 1200+ players to rate each year, and only a fraction of them are regularly discussed. Some players may be overrated because they play for teams in media-intensive cities or teams that got a lot of exposure in the playoffs, while good players on small-market teams may be overlooked.
It often seems as if it takes a year or two for someone's reputation to catch up with a change in his performance, for better or worse. In the 15+ years we've been rating players, we've often identified someone who has been making a lot of plays without getting noticed. It's not unusual to see that player start to win Gold Gloves two years later, and then keep winning them for a few years after his performance no longer merits them.
Managers and general managers make public comments about players all the time, but their remarks can be influenced by the needs of the team. Sometimes it's to their advantage to talk about players in certain ways, whether it's to hype someone for marketing purposes or to talk them down in a salary squabble. It's hard to tell when we can take a comment at face value and when we need to discount it because of a hidden agenda.
I'd love to incorporate the opinions of professional baseball scouts because they are trained to see things that other people don't see. But it's difficult to find a collection of scouts who have seen every player and can make their evaluations available to people outside the organizations they work for.
We could base our judgments on how often someone shows up on SportsCenter. But the photogenic play isn't always the best play. The exact same fly ball might produce a routine play for a great fielder, a diving catch for the average fielder, or a single for the poor fielder. The diving catch is the only one that makes the highlight films. The majority of highlight-film plays are made at the edge of the fielder's effective range, whatever that range happens to be.
(A few years ago, I saw a game in Baltimore in which the right fielder broke back on a line drive, realized it wasn't hit that hard, reversed course and recovered in time to make a nice shoestring catch. What should have been a very easy play wound up being shown dozens of times as CNN's Play of the Day.)
We could place a lot of weight on the Gold Glove voting. Putting aside the question of how well the voters do that job, there are still several obstacles. The vote totals aren't announced, so we have no idea who came in second or how close the vote may have been. And even if we were to accept all Gold Glovers as top fielders, we can't award them all our top range rating because Gold Gloves are given for overall fielding performance, and we have to rate players separately for range, throwing, and avoiding errors. For some Gold Glovers, the most accurate way to rate them would be to assign an excellent throwing rating, a very low error rate, and an average range rating.
We do our very best to rate players based on performance, not reputation. To that end, we license play-by-play data and spend a lot of time developing new ways to analyze that information and interpreting that information in light of everything we know about that player's performance. The phrase "everything we know" includes our own analysis of team and player fielding skill, other measures like range factors and STATS zone ratings, injury reports, park effects, plus what we see and hear and read as we follow baseball on a daily basis. We hope you like the results as much as we enjoy doing this work.