2004 Predictions -- Keeping Score
By Tom Tippett
October 14, 2004
When we release our annual Projection Disk in the spring, we give our customers a chance to get a head start on the baseball season. With projected statistics and ratings for over 1600 established big leaguers and top minor-league prospects, plus league schedules, park factors, team rosters, projected pitching rotations, bullpen assignments, lineups and depth charts, the Projection Disk gives them everything they need to play out the new season using the Diamond Mind Baseball simulation game.
It also gives us a chance to get a head start on the season. Ever since we created the first Projection Disk in 1998, we've been publishing our projected standings along with comments on the outlook for all 30 teams. Those projected standings are based on the average of a number of full-season simulations using the Projection Disk.
Of course, nobody really knows what's going to happen when the real season starts, but we're always curious to see how our projected results compare to the real thing. And we're equally interested in seeing how our projections stack up against the predictions made by other leading baseball experts and publications. This article takes a look at those preseason predictions and identifies the folks who were closest to hitting the mark in 2004. And because anyone can get lucky and pick the winners in one season, we also look at how everyone has done over a period of years.
In addition to projecting the order of finish, our simulations provide us with projected win-loss records, projected runs for and against, and the probability that each team will make the postseason by winning its division or grabbing the wild card.
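To make the averaging idea concrete, here is a minimal sketch of how a batch of simulated seasons might be boiled down to projected wins and a division-title probability. It is not the Diamond Mind code. It assumes each simulated season is just a mapping of team to win total for one division, it splits credit equally when teams tie for first, and it ignores the wild card. The function name and the sample numbers are hypothetical.

```python
def summarize_simulations(seasons):
    """Average wins and division-title frequency over simulated seasons.

    `seasons` is a list of dicts mapping team -> wins for a single division.
    (Hypothetical input format, assumed for illustration only.)
    """
    teams = list(seasons[0])
    n = len(seasons)

    # Projected wins: the plain average across all simulated seasons.
    avg_wins = {t: sum(s[t] for s in seasons) / n for t in teams}

    # Title probability: share of seasons in which the team had the most wins.
    title_credit = {t: 0.0 for t in teams}
    for s in seasons:
        best = max(s.values())
        leaders = [t for t in teams if s[t] == best]
        for t in leaders:
            # Assumption: a tie for first splits the credit equally.
            title_credit[t] += 1 / len(leaders)
    title_prob = {t: title_credit[t] / n for t in teams}

    return avg_wins, title_prob


# Four made-up simulated seasons for a two-team "division".
sims = [
    {"A": 90, "B": 85},
    {"A": 88, "B": 92},
    {"A": 91, "B": 80},
    {"A": 87, "B": 87},
]
avg_wins, title_prob = summarize_simulations(sims)
print(avg_wins)
print(title_prob)
```

Sorting teams by average wins within each division would then give a projected order of finish.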
Unfortunately, most of the predictions that are published in major newspapers, magazines and web sites don't include projected win-loss records. Instead, they give the projected order of finish without indicating which races are expected to be hotly contested and which will be runaways. Some don't even bother to predict the order of finish, but settle instead for the division winners and wild card teams.
As a result, we do our best to assign a meaningful score to each prediction based solely on order of finish within each division. We borrowed the scoring system from our friend Pete Palmer, co-author of Total Baseball and The Hidden Game of Baseball, who has been projecting team standings for more than 35 years.
Pete's scoring system subtracts each team's actual placement from its projected placement, squares the difference, and adds up the squared differences for all the teams. For example, if you predict a team will finish fourth and it finishes second, that's a difference of two places. Square that, and you get four points. Do this for every team and you get a total score. The lower the score, the more accurate your predictions.
We don't try to break ties. If, for example, two teams tie for first, we say that each team finished in 1.5th place for the purposes of figuring out how many places a prediction was off. Suppose a team was projected to finish third and they tied for first instead. That's a difference of 1.5 places. The square of 1.5 is 2.25, so that would be the point total for this team. That's why you'll see some fractional scores in the tables below.
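The scoring rule and the tie handling can be sketched in a few lines of Python. This is an illustration only, under the assumption that actual places are derived from win totals within a division. The function names and the three-team example are hypothetical.

```python
def finishing_places(wins):
    """Return {team: place}, giving tied teams the average of the places they share."""
    ordered = sorted(wins, key=wins.get, reverse=True)
    places = {}
    start = 0
    while start < len(ordered):
        end = start
        # Extend the group while the next team has the same win total.
        while end + 1 < len(ordered) and wins[ordered[end + 1]] == wins[ordered[start]]:
            end += 1
        # Positions start..end (0-based) cover places start+1..end+1.
        shared_place = ((start + 1) + (end + 1)) / 2
        for team in ordered[start:end + 1]:
            places[team] = shared_place
        start = end + 1
    return places


def prediction_score(predicted_places, actual_places):
    """Sum of squared place differences over all teams (lower is better)."""
    return sum((predicted_places[t] - actual_places[t]) ** 2 for t in actual_places)


# Hypothetical division: A and B tie for first, so each is in "1.5th" place.
actual = finishing_places({"A": 95, "B": 95, "C": 80})
predicted = {"A": 3, "B": 1, "C": 2}
print(actual)
print(prediction_score(predicted, actual))
```

Team A was picked third and tied for first, so it contributes 1.5 squared, or 2.25 points, which matches the example in the text.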
Keeping things in perspective
That first year, we created a little database with our projected standings and those of fourteen national publications, and we were pleased to see that we ended the year with the best accuracy score among those fifteen forecasts. When we wrote up the results and posted them to our web site, however, we were very careful not to make any grand claims, saying:
"I'm not sure what to make of all this. It's just one year, and it's entirely possible that we were just lucky. Time will tell whether our approach to projecting seasons is consistently better than average."
Over time, we expanded our database to include the predictions of prominent baseball writers from major newspapers and other publications. This is easier said than done because some publications and web sites change their approach from year to year. For example, we used to track the predictions of several ESPN.com writers and editors, but they limited their picks to division winners in 2003. So the number of entries in our database can rise and fall depending on what the various publications do and whether we were able to find those predictions in our spring survey.
In the sections below, we'll show you how various prognosticators ranked in 2004 and over a period of years, with the period varying in length depending on when we added that person or publication to our database. We don't make any claims of completeness here -- there are lots of other predictions that are not in our database -- but we think you'll find that our sample is an interesting one.
For several reasons, we want to emphasize that it's important that nobody take these rankings too seriously.
First, this isn't the only scoring system one could use to rank these projections, of course. A fellow named Gerry Hamilton runs a predictions contest every year (see http://www.tidepool.com/~ggh1/index.html) and assigns a score based on how many games each team finished out of its predicted place in the standings. (We came 22nd out of 195 predictions in his 2004 contest after finishing 4th in 2003.)
Second, because of publishing deadlines, the predictions in some spring baseball magazines are made long before spring training starts, others are prepared in early-to-mid March, and some are compiled just before opening day. Obviously, the longer you wait, the more information you have on player movement and injuries.
Third, many newspaper editors ask staff writers to make predictions so their readers have something to chew on for a couple of days. Some writers hate doing them but comply because their editors insist. Some do it even though their main beat is a different sport. Others may make off-the-wall picks just for grins or feel compelled to favor the hometown teams.
Rankings for 2004
It's interesting to see how everyone did this year, but it's even more interesting to look back to see how different people perceived the baseball world before the season started. We'll start by showing you the prediction rankings for the current season, then we'll follow that up with a review of each division race and how those races affected these rankings.
Forecaster                              Score
New York Times                          30
Las Vegas over-under line               32.5
Tony DeMarco, MSNBC.com                 40
Diamond Mind simulations                42
Bob Hohler, Boston Globe                42
Joe Sheehan, Baseball Prospectus        42
Michael Wolverton, Baseball Prospectus  42
David Lipman, ESPN.com                  44
Michael Holley, Boston Globe            46
Gary Huckabay, Baseball Prospectus      46
Team payroll (per USA Today)            46
Poll of SABR members                    48
Athlon                                  48
Eric Mack, CBS SportsLine               48
2003 final standings                    48
MLB Yearbook                            50
Baseball Prospectus                     52
Nate Silver, Baseball Prospectus        52
Lindy's                                 52
Dan Shaughnessy, Boston Globe           52
ESPN.com power rankings                 56
Phil Rogers, ESPN.com                   56
Steve Mann                              56
The Sporting News (Ken Rosenthal)       58
Rany Jazayerli, Baseball Prospectus     58
Charley McCarthey, CBS SportsLine       58
Baseball America                        60
Sports Illustrated                      60
Spring Training Yearbook                60
Tristan Cockroft, CBS SportsLine        60
USA Today                               61.5
Street & Smith                          62
Chris Kahrl, Baseball Prospectus        62
Miami Herald                            64
Derek Zumsteg, Baseball Prospectus      64
USA Today Sports Weekly                 66
Jonah Keri, Baseball Prospectus         66
Pete Palmer                             68
Dallas Morning News                     68
Seattle Times                           68
CBS SportsLine                          72
Gordon Edes, Boston Globe               72
Scott Miller, CBS SportsLine            72
ESPN the magazine (Peter Gammons)       74
Los Angeles Times                       74
Bob Ryan, Boston Globe                  76
Adam Reich, CBS SportsLine              80
Spring training results                 134
The "Diamond Mind simulations" entry is the one representing the average result of simulating the season 100 times. These simulations were done about three weeks before the season started.
There are a few other entries in this list that don't represent the views of a writer or a publication. If you predicted that the 2004 standings would be the same as in 2003, your score would have been 48. If you put together a set of standings based on the Las Vegas over-under line, you'd have racked up an impressively low total of 32.5 points. If you thought the teams would finish in order from highest to lowest payroll, your score would have been 46.
And if you predicted that the regular season standings would match the 2004 spring training standings, your score would have been 134. In other words, the spring training results were almost useless as a predictor of the real season, and that's been true for at least the past four years.
Reviewing the divisions
Much more interesting than the overall scores, in our opinion, are the details. Which teams were consistently under- or over-estimated? Which divisions contained the biggest surprises? Did anyone predict that certain teams would have a sudden change of fortune?
Leaving out the entries that don't represent writers or publications, here are some observations about how the others saw things last spring:
AL East. Everyone had either New York or Boston winning the division, with the Yankees being picked first four more times than the Red Sox. Other than Gary Huckabay, who picked Toronto second and Boston third, everyone had this as a two-team race. A good number of people picked Baltimore third ahead of Toronto, but four people picked the Orioles to finish last, too, so there was no clear consensus on the Orioles.
AL Central. The Kansas City Royals were the downfall for many this year. The young Royals led the division for much of the 2003 season before fading down the stretch, then added some veteran players during the winter. As a result, they were a trendy pick to win the division or finish second behind Minnesota. A good part of the reason our score is among the leaders in 2004 is that we identified the Royals as one of the teams most likely to disappoint. That was based largely on our simulation results, but also based on the fact that the 2003 Royals didn't have the statistical foundation to justify their high placement. Surprisingly, seven predictions had Detroit finishing fourth, in every case because they thought the Indians would be even worse.
AL West. A year ago, our score was significantly improved because we chose to rank the Mariners ahead of the Angels when those two teams finished in a virtual tie for second in our simulations. This year, those teams were again neck and neck, with the Mariners averaging one more win but the Angels having a slightly better run margin. In a decision we'd love to have back, we gave the nod to Seattle. More than twice as many people picked Anaheim to win the division as picked Oakland, with three choosing the Mariners for first place. Everyone picked the Rangers to finish last, meaning that nobody in our survey got this division (or any other division) correct from top to bottom.
NL East. Before the season, the Phillies appeared to be loaded with talent, the Marlins were shedding payroll after winning the World Series, and the Braves seemed quite vulnerable. All three teams were selected by at least one person to win the division, with Philly being the choice about 80% of the time. Most predictions had a clear separation between the top three and the bottom two, but Montreal (five times) and New York (three times) snuck into third place on a few lists.
NL Central. Only two entries (Diamond Mind and Steve Mann) had the Cardinals finishing first in this division. The others seemed caught up in the hype surrounding the Cubs' young pitching (Prior, Wood, Zambrano) and the Astros' older pitching (Clemens, Pettitte). Just about every prediction had the Cubs and Astros duking it out for first with the Cardinals third. The picks for first place were almost evenly split between Chicago and Houston, with the Cubs having a very slight edge. There was some variation in the order of the bottom three teams, but nobody picked any of them to finish in the top half of the division.
NL West. Picking the Dodgers to finish at or near the top was a key to the better-scoring predictions this year, as was picking against the Diamondbacks. We were among those who thought Arizona would finish ahead of Los Angeles, but we were not alone. Approximately 2/3 of the predictions had Arizona beating the Dodgers, with thirteen people picking the D'backs to win the division outright. (In an example of the importance of timing, Arizona finished one game ahead of the Dodgers in our simulations, but the teams would have been reversed had we run them again after Milton Bradley was traded to LA.) It's clear that many people thought this division was wide open, as four of the five teams (everyone but the Rockies) were picked to finish first at least once.
Summing up. For the first time ever, not a single division was nailed by even a single predictor. Certain teams surprised a lot of people by overachieving (Texas, Los Angeles) or falling short (Arizona, Seattle, Kansas City, Toronto). As a result, the prediction scores were much higher this year than in 2003. A year ago, things went more in accordance with expectations.
Here are the rankings for those who were included in our sample every year. There's a new entry this year. We went back and ranked all of the teams based on their payroll as reported in USA Today in April, and we computed a standings score based on the "prediction" that teams would finish in order from highest to lowest payroll. As you can see, that doesn't seem to be a very good predictor.
Forecaster               2004   2003   2002   2001   2000   1999   1998  Total
Diamond Mind             42.0   28.0   40.0   54.5   68.0   42.0   44.5  319.0
Las Vegas over-under     32.5   30.0   46.0   65.5   51.5   48.0   52.0  325.5
Sports Illustrated       60.0   30.0   48.0   56.5   40.0   56.0   54.0  344.5
Steve Mann               56.0   48.0   60.0   38.5   58.0   54.0   44.0  358.5
Sports Weekly            66.0   38.0   42.0   46.5   58.0   51.5   60.0  362.0
Athlon                   48.0   36.0   38.0   67.5   42.0   72.0   72.0  375.5
Sporting News            58.0   44.0   54.0   52.5   38.0   78.0   54.0  378.5
Pete Palmer              68.0   56.0   50.0   70.5   54.0   40.0   58.0  396.5
Street & Smith           62.0   36.0   70.0   68.5   58.0   68.0   64.0  426.5
Previous season          48.0   42.0   48.0   64.5   56.0   70.0  100.0  428.5
Payroll ranking          46.0   64.0  102.0   60.0   88.0   72.0   44.0  476.0
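The Total column is the sum of a forecaster's yearly scores, and the rows are sorted from lowest total to highest. As a small sketch of that bookkeeping, the snippet below recomputes the totals for the first three rows of the 1998-2004 table. The dictionary layout and function name are just one convenient choice, not anything taken from our database.

```python
# Yearly scores (2004 back to 1998) copied from the first three table rows.
yearly_scores = {
    "Sports Illustrated": [60.0, 30.0, 48.0, 56.5, 40.0, 56.0, 54.0],
    "Diamond Mind": [42.0, 28.0, 40.0, 54.5, 68.0, 42.0, 44.5],
    "Las Vegas over-under": [32.5, 30.0, 46.0, 65.5, 51.5, 48.0, 52.0],
}


def rank_by_total(scores_by_forecaster):
    """Return (forecaster, total) pairs sorted from best (lowest) to worst."""
    totals = {name: sum(scores) for name, scores in scores_by_forecaster.items()}
    return sorted(totals.items(), key=lambda item: item[1])


ranking = rank_by_total(yearly_scores)
for name, total in ranking:
    print(f"{name:<22}{total:>7.1f}")
```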
In 1999, we added some writers from the Boston Globe.
Forecaster                    2004   2003   2002   2001   2000   1999  Total
Gordon Edes, Boston Globe     72.0   32.0   54.0   56.5   26.0   28.0  268.5
Las Vegas over-under line     32.5   30.0   46.0   65.5   51.5   48.0  273.5
Diamond Mind simulations      42.0   28.0   40.0   54.5   68.0   42.0  274.5
Sports Illustrated            60.0   30.0   48.0   56.5   40.0   56.0  290.5
USA Today Sports Weekly       66.0   38.0   42.0   46.5   58.0   51.5  302.0
Athlon                        48.0   36.0   38.0   67.5   42.0   72.0  303.5
Baseball America              60.0   28.0   48.0   54.5   54.0   70.0  314.5
Steve Mann                    56.0   48.0   60.0   38.5   58.0   54.0  314.5
Sporting News                 58.0   44.0   54.0   52.5   38.0   78.0  324.5
Previous season standings     48.0   42.0   48.0   64.5   56.0   70.0  328.5
Dan Shaughnessy, Globe        52.0   56.0   70.0   44.5   54.0   58.0  334.5
Pete Palmer                   68.0   56.0   50.0   70.5   54.0   40.0  338.5
Bob Ryan, Boston Globe        76.0   40.0   58.0   84.5   58.0   40.0  356.5
Street & Smith                62.0   36.0   70.0   68.5   58.0   68.0  362.5
Payroll ranking               46.0   64.0  102.0   60.0   88.0   72.0  432.0
The Diamond Mind simulations missed the mark by quite a bit in 2000. We added a new concept to our projection system that year, but we were unhappy with the results, and we took that out of the model before generating our projections in 2001. The results have been much better since. As you can see, the Las Vegas over-under line has been getting much better in recent years.
Forecaster                    2004   2003   2002   2001   2000  Total
Las Vegas over-under line     32.5   30.0   46.0   65.5   51.5  225.5
Athlon                        48.0   36.0   38.0   67.5   42.0  231.5
Diamond Mind simulations      42.0   28.0   40.0   54.5   68.0  232.5
Sports Illustrated            60.0   30.0   48.0   56.5   40.0  234.5
Gordon Edes, Boston Globe     72.0   32.0   54.0   56.5   26.0  240.5
Baseball America              60.0   28.0   48.0   54.5   54.0  244.5
Sporting News                 58.0   44.0   54.0   52.5   38.0  246.5
USA Today Sports Weekly       66.0   38.0   42.0   46.5   58.0  250.5
Previous season standings     48.0   42.0   48.0   64.5   56.0  258.5
Steve Mann                    56.0   48.0   60.0   38.5   58.0  260.5
Dan Shaughnessy, Globe        52.0   56.0   70.0   44.5   54.0  276.5
Street & Smith                62.0   36.0   70.0   68.5   58.0  294.5
Pete Palmer                   68.0   56.0   50.0   70.5   54.0  298.5
Bob Ryan, Boston Globe        76.0   40.0   58.0   84.5   58.0  316.5
Payroll ranking               46.0   64.0  102.0   60.0   88.0  360.0
Lindy's was a strong addition to our survey in 2001. We also added the San Francisco Chronicle that year, but they've been dropped from this list because we couldn't find their 2004 predictions. That paper ranked second from 2001 to 2003.
Forecaster                    2004   2003   2002   2001  Total
Diamond Mind simulations      42.0   28.0   40.0   54.5  164.5
Lindy's                       52.0   40.0   42.0   36.5  170.5
Las Vegas over-under line     32.5   30.0   46.0   65.5  174.0
Tony DeMarco, MSNBC.com       40.0   34.0   34.0   67.5  175.5
Athlon                        48.0   36.0   38.0   67.5  189.5
Baseball America              60.0   28.0   48.0   54.5  190.5
USA Today Sports Weekly       66.0   38.0   42.0   46.5  192.5
Sports Illustrated            60.0   30.0   48.0   56.5  194.5
Steve Mann                    56.0   48.0   60.0   38.5  202.5
Previous season standings     48.0   42.0   48.0   64.5  202.5
Sporting News                 58.0   44.0   54.0   52.5  208.5
Los Angeles Times             74.0   18.0   44.0   73.5  209.5
Gordon Edes, Boston Globe     72.0   32.0   54.0   56.5  214.5
Dan Shaughnessy, Globe        52.0   56.0   70.0   44.5  222.5
Street & Smith                62.0   36.0   70.0   68.5  236.5
Pete Palmer                   68.0   56.0   50.0   70.5  244.5
Bob Ryan, Boston Globe        76.0   40.0   58.0   84.5  258.5
Payroll ranking               46.0   64.0  102.0   60.0  272.0
Spring training results      134.0   70.0   86.0  113.5  403.5
Here's how things looked from 2002 to 2004. The LA Times was unable to follow up the excellent 2003 predictions that put them in the top spot in last year's two-season rankings.
Forecaster                    2004   2003   2002  Total
Tony DeMarco, MSNBC.com       40.0   34.0   34.0  108.0
Las Vegas over-under line     32.5   30.0   46.0  108.5
Diamond Mind simulations      42.0   28.0   40.0  110.0
Bob Hohler, Boston Globe      42.0   32.0   38.0  112.0
Athlon                        48.0   36.0   38.0  122.0
Lindy's                       52.0   40.0   42.0  134.0
Los Angeles Times             74.0   18.0   44.0  136.0
Baseball America              60.0   28.0   48.0  136.0
Sports Illustrated            60.0   30.0   48.0  138.0
Previous season standings     48.0   42.0   48.0  138.0
USA Today Sports Weekly       66.0   38.0   42.0  146.0
USA Today                     61.5   32.0   58.0  151.5
Sporting News                 58.0   44.0   54.0  156.0
Gordon Edes, Boston Globe     72.0   32.0   54.0  158.0
Steve Mann                    56.0   48.0   60.0  164.0
Street & Smith                62.0   36.0   70.0  168.0
Bob Ryan, Boston Globe        76.0   40.0   58.0  174.0
Pete Palmer                   68.0   56.0   50.0  174.0
Dan Shaughnessy, Globe        52.0   56.0   70.0  178.0
Payroll ranking               46.0   64.0  102.0  212.0
Spring training results      134.0   70.0   86.0  290.0
Finally, here's how things have looked over the past two years.
Forecaster                    2004   2003  Total
Las Vegas over-under line     32.5   30.0   62.5
Diamond Mind simulations      42.0   28.0   70.0
Tony DeMarco, MSNBC.com       40.0   34.0   74.0
Bob Hohler, Boston Globe      42.0   32.0   74.0
Athlon                        48.0   36.0   84.0
Baseball America              60.0   28.0   88.0
Sports Illustrated            60.0   30.0   90.0
Previous season standings     48.0   42.0   90.0
MLB Yearbook                  50.0   40.0   90.0
Lindy's                       52.0   40.0   92.0
Los Angeles Times             74.0   18.0   92.0
USA Today                     61.5   32.0   93.5
Street & Smith                62.0   36.0   98.0
Sporting News                 58.0   44.0  102.0
USA Today Sports Weekly       66.0   38.0  104.0
Gordon Edes, Boston Globe     72.0   32.0  104.0
Steve Mann                    56.0   48.0  104.0
Dan Shaughnessy, Globe        52.0   56.0  108.0
Spring Training Yearbook      60.0   48.0  108.0
ESPN the magazine             74.0   36.0  110.0
Payroll ranking               46.0   64.0  110.0
Bob Ryan, Boston Globe        76.0   40.0  116.0
Pete Palmer                   68.0   56.0  124.0
Spring training results      134.0   70.0  204.0
Overall, we've been pretty happy with our results, and if there's one thing that stands out, it's our ability to identify over-rated teams.
In 2004, we saw the Royals as a 2003 overachiever that was unlikely to repeat, we projected the Blue Jays to finish below .500, and we didn't buy all of the hype surrounding the Cubs and Astros.
A year earlier, our simulations correctly indicated that the Mets were likely to finish at the bottom of their division again, the Angels were very unlikely to repeat their 2002 success, and the Dodgers wouldn't score enough runs to make a serious run at the NL West title.
Even so, we're always surprised by something that happens each year. We didn't anticipate the emergence of the Rangers and Dodgers in 2004 or the surprising finishes of the Marlins and Royals the year before. As a result, we have a bunch of test cases to study as we consider possible improvements to our projection system.
More than anything, this process -- projecting the season in March, watching the real thing for six months, and taking a look back after the season -- is highly educational for us. So we'll be back with our projected 2005 team standings in March.
2001 Gold Glove Review
By Tom Tippett
December 10, 2001
If you haven't already done so, please read the introduction to the 2002 Gold Glove Review article for a summary of the techniques we use for evaluating defensive performance.
Pitchers. There's a very strong tendency for Gold Glove voters to fixate on one guy and keep giving him the award year after year after year, as long as he doesn't get hurt or do anything to make it clear that something has changed. This tendency is especially strong for pitchers, perhaps because the voters don't get to see them as often.
At other positions, we can judge performance over a span of 1,000 to 1,400 defensive innings, but even the most durable starting pitchers are in the field only for 200-250 innings. And relievers get only a fraction of the innings of a starting pitcher.
With 14 or 16 teams in the league, a voter might get to see a certain shortstop play 80 innings in the field. That's not much in the context of a whole season, but it sure beats the 10-20 innings they might see of a starting pitcher or the 4-5 innings a reliever might pitch in those games.
So it's hard for anyone to evaluate pitcher defense just by watching, because nobody is in position to watch enough pitchers in enough situations to get a complete picture. And it's hard to evaluate pitchers just by looking at their putouts and assists because a pitcher's tendency to induce ground balls can have a major impact on those numbers. Even if you're a brilliant fielder, you're not going to look good next to Greg Maddux if you're a fly-ball pitcher and they're using traditional fielding stats to evaluate you.
This year, Mike Mussina was chosen for the fifth time, and he's a pretty good pick. He had a good year, handling 43 chances successfully while participating in 5 double plays, making only one error, and doing a very good job holding opposing runners. But there are other deserving candidates.
(By the way, I'll leave it up to you to decide whether holding runners is a pitching skill or a defensive skill. But I'll mention it for those of you who think it's relevant to a Gold Glove debate.)
Freddy Garcia also participated in five double plays and made only one error while handling 68 chances successfully, more than half again as many as Mussina. On the other hand, Garcia creates more chances for himself because he's a ground ball pitcher, and he doesn't hold runners well.
Steve Sparks had 62 successful chances, only one error, and held runners well despite throwing a pitch, the knuckleball, that is easy to run on. He was involved in one double play.
Brad Radke had 57 successful chances, four double plays, and only one error, but wasn't quite as good as Sparks and Mussina at holding runners.
Andy Pettitte was error-free in 49 successful chances with one double play and has a terrific pickoff move, though he is less successful holding runners close when he goes home with the pitch.
Jeff Weaver also handled 49 chances without an error. He was in on four double plays and was in the middle of the pack in holding runners. All things considered, my vote would have gone to Garcia this year.
In the other league, Greg Maddux won his 12th straight, and there's no question that he's a very good fielder. But it must also be said that he has a head start on his competition because he's an extreme ground-ball pitcher who creates for himself a ton of opportunities to make plays. This year, he led the majors by handling 72 chances successfully, making only one error in the process.
But there are two arguments against Maddux's iron grip on this award. First, quite a few others have ranked above Maddux each year in plays made per batted ball in his zone. Second, Maddux has made 14 errors in the past five years; that's a lot for a pitcher, and only three other pitchers have made more in that span.
Consider Kirk Rueter. I'll bet if the voters had picked him a few years ago, they'd keep picking him every year just like they do with Maddux, because if Rueter had once been deemed the best, he's definitely doing enough to reinforce the view that he still is.
This year, Rueter handled 61 chances without an error and took part in eleven (!) double plays. Among pitchers with at least 50 balls hit into their zones, he ranked #1 in converting those chances into outs. And he was almost impossible to run on.
Last year, Rueter handled 52 chances without an error and took part in four double plays. He converted an extremely high number of batted balls into outs and was almost impossible to run on. In 1999, Rueter handled 45 successful chances but made one error.
Over the past five years, Maddux has made 14 errors in 424 chances for a fielding percentage of .967. In the same span, Rueter has made 3 errors in 265 chances for a fielding percentage of .989. Rueter has been involved in seven more double plays (26 to 19) despite pitching about 240 fewer innings. Rueter has converted a noticeably higher percentage of batted balls into outs. The only area where Maddux has the edge is raw totals, and that's only because he generates so many more comebackers than the average pitcher.
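For anyone who wants to check the arithmetic, fielding percentage is successful chances divided by total chances, where successful chances are total chances minus errors. A quick sketch (the helper names are just for illustration):

```python
def fielding_percentage(chances, errors):
    """Share of total chances handled without an error."""
    if chances <= 0:
        raise ValueError("chances must be positive")
    return (chances - errors) / chances


def as_baseball_average(value):
    """Format to three decimals without the leading zero, e.g. 0.967 -> '.967'."""
    return f"{value:.3f}".lstrip("0")


# Five-year totals quoted in the text.
maddux = as_baseball_average(fielding_percentage(424, 14))
rueter = as_baseball_average(fielding_percentage(265, 3))
print(maddux, rueter)
```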
Getting back to the 2001 season, the pitchers who bested Maddux in converting opportunities into outs are Adam Eaton, Rueter, Chris Reitsma, Livan Hernandez, Russ Ortiz, Tom Glavine, Javier Vazquez, and Mike Hampton, in that order.
Eaton only pitched for half the season and made two errors, so I don't consider him to be in the same league as the others, though he's someone to watch for the future. Rueter, Reitsma, Hernandez, Glavine, Vazquez, and Hampton each handled more than fifty chances without making an error.
Maddux was a good choice. Any of these guys I just mentioned would have been a slightly better choice. Rueter was the best of the bunch and deserved the Gold Glove this year. Just as he did last year.
Catchers. Ivan Rodriguez is the owner of one of the best throwing arms in history, and has been a lock for this award for many years. He had another great throwing year, and even though he missed a third of the season due to injury, he's the hands-down choice again this year. For some reason, the best arms have found their way into the other league in the past few years, and there's nobody left in the AL to challenge him.
A year ago, I argued that Brad Ausmus should have been the choice in the AL, partly because he had a great year defensively and partly because Rodriguez missed half the season. Ausmus is now in the NL, where he had another good year throwing (though others bested him in that department) and backed it up by allowing only one passed ball (best in the majors) and making only three errors (tied for second best in the majors).
There were other candidates, of course. Jason LaRue, Mike Matheny, and Henry Blanco threw out a higher percentage of enemy base stealers. But LaRue allowed 15 passed balls, second most in baseball, despite starting only 95 games behind the plate. Blanco started only 94 games himself, and didn't quite match up to Ausmus at any rate.
In my eyes, it's almost impossible to choose between Ausmus and Matheny. Playing time was similar. Ausmus made one fewer error and was charged with five fewer passed balls. On the other hand, Matheny had a better year throwing, though he got more help from his pitchers than Ausmus did. All in all, I think Ausmus was a worthy victor.
First basemen. Based on our analysis, there are four men who could reasonably be thought of as viable candidates at this position, two in each league: Doug Mientkiewicz and Tino Martinez in the AL, Kevin Young and Todd Helton in the NL.
The voters got it right when they chose Mientkiewicz over Martinez. Doug had a better fielding percentage, turned a higher percentage of batted balls into outs, and led the majors in highlight-reel plays. It's actually an easy choice, but I wanted to mention Martinez because he's a very good fielder who had another very good year, and he deserves some recognition.
It's not quite so clear in the NL. The voters picked Helton, who I thought should have won the award over J. T. Snow in 2000, but Young had a terrific year, too. Both the Diamond Mind and STATS methods for assessing range give Young a slight edge over Helton. And after making a boatload of errors in 1999 and 2000, Young got his act together and finished around the league average in fielding percentage. Helton led the league in this category.
Over the past four years, Helton has shown more range than any other first baseman in baseball. Young is second. You rarely hear good things about Young's range because he made far too many errors in two of those four seasons. But the man can cover ground at first base.
Helton and Young were almost on par with each other this year, but I'd agree with the voters and choose Helton. He's been the best in the league since 1998 and this year sustained his high level of play over 157 starts (compared to only 125 for Young).
Second basemen. Here's some of what I wrote a year ago:
"Here we go again. Roberto Alomar won his ninth Gold Glove, and there isn't a baseball writer or television commentator who doesn't gush incessantly about Alomar's brilliance in the field. And I've seen him make some very spectacular plays myself. Problem is, year after year, our analysis (and other measures such as range factors and the STATS zone rating) shows that he doesn't make many more plays than the average second baseman.
Alomar was one of three Cleveland infielders to be rewarded with Gold Gloves this season. But that infield was below the league average in turning ground balls into outs. And according to the STATS Major League Handbook, they were fourth worst in the league in converting double plays when grounders were hit in double-play situations.
And even though they used a lot of different pitchers this year, I don't think you can argue that this defense was made to look worse by a lousy pitching staff. They did, after all, get almost 600 innings from three good starting pitchers (Burba, Colon, Finley) and a bunch more from a group of veteran relievers who have fared quite well playing in front of other defenses in the recent past.
The bottom line is that somebody isn't making nearly as many plays as people think ..."
I'm repeating so much of last year's comment because it's still relevant. This season, Cleveland's infield was 13th in the league in the percentage of ground balls turned into outs. And they were only a hair above the league average in double-play percentage.
You could argue that the infield looks bad because the corner guys -- Jim Thome at first, Travis Fryman and Russ Branyan at third -- don't cover much ground, and you'd be correct. Problem is, there's absolutely no evidence that their middle infielders are doing more than their share, either.
The best case for Alomar's Gold Glove is that he won the fielding percentage title by making only five errors all season. His nearest rivals, Ray Durham and Bret Boone, made ten errors each. But Alomar's range factor was .12 below the league average despite playing behind a ground-ball staff. His STATS zone rating was thirty-five points below the norm for his position. According to our method, Alomar made 20 fewer plays than the average 2B, and he was consistently below average on all types of plays -- line drives, ground balls and popups. And he was 33 years old this year, an age when many middle infielders struggle to keep up with their younger rivals.
Those numbers are indicative of a player who deserves our Fair rating. But we gave him an Average rating anyway. Why? Because he has a great reputation and because it's possible that his pitching staff did indeed make him look worse than he really is.
This is the fifth time in the past nine years that we've given Alomar a rating that's better than our analysis shows is justified. Not once in those nine years has his play-making score been far enough above the league average to merit a Very Good rating.
Every year, though, we say to ourselves that there must be some aspect of his ability that doesn't show up in fielding studies. But don't you think that if Alomar was truly the best at his position in the history of baseball, he'd score well at least once in nine years? Is it really possible that external factors or quirks in the data would make him look worse every single year?
I know that some people will look at this rating and conclude that (a) we're vastly underestimating his ability, (b) we have something against Alomar, and/or (c) we know nothing about baseball. Looking at all of the evidence, however, I have to say that, if anything, we've been generous in how we've rated him over the years.
I'll end this commentary with a quote from The New Bill James Historical Baseball Abstract:
"[Alomar is] an overrated fielder, in my opinion; a good fielder, even a very good one, but no better than some guys who don't win Gold Gloves, like Fernando Vina."
That was written before the 2001 data was available, and I agree with Bill's assessment of Alomar's career. We're now in the late stages of that career, however, and we're seeing evidence of a decline in Alomar's play-making ability.
Other worthy candidates for the AL Gold Glove were Adam Kennedy, Ray Durham, Bret Boone, and Jerry Hairston. Kennedy was the best of this group, but started only 123 games. Nevertheless, I'd go with Kennedy.
The other league's Gold Glove went to Fernando Vina. If Pokey Reese had played the entire year at second, instead of splitting his time between second and short, he would have gotten my vote. But he didn't, and that left things open for Vina, who I nominated as my choice a year ago.
Vina had another good year, with above-average range and a low error rate, and the Cardinals were second in the NL in double play percentage. Those are solid credentials. And he played a lot more than some of the other guys (Ron Belliard, Damian Jackson, Mark Grudzielanek) who could be considered viable candidates.
Third basemen. The voters got it right at this position. Scott Rolen was so amazing that he managed to stand out in a league featuring several other very good players who had very good years. His closest rivals were Robin Ventura and Jeff Cirillo. But Rolen was so good that if there was an award for defense -- an MVP or Cy Young for defense, a single award that crosses all positions -- Rolen would be my choice for NL Defensive Player of the Year.
The AL produced three strong candidates, Eric Chavez (the winner), Corey Koskie, and David Bell. Of the three, Chavez was best in range and sure-handedness, and he played a lot more than Bell. So I agree with this selection, too.
Shortstops. As I mentioned above, the voters tend to settle on one guy and give him the award year after year as long as he doesn't blow it. By posting the second-best fielding percentage in the majors (.989, trailing only Rey Sanchez's .991), and by continuing to ply his trade with grace and style, Omar Vizquel did enough this year to keep the voters' trust, and he was rewarded with his ninth straight Gold Glove.
I'm not going to spend a lot more time writing about the Cleveland defense because I did that in the second base comment above. Suffice it to say that Vizquel's range wasn't all that good this year. If Rey Sanchez hadn't been traded out of the league, I'd nominate him, as he bested Vizquel in both range and steadiness. But Sanchez WAS traded out of the league, and in his stead, my vote goes to Toronto's Alex Gonzalez.
Interestingly, I don't recall hearing any gripes about Orlando Cabrera getting the nod in the NL. I figured that with Rey Ordonez healthy and playing a full season, some in New York would have pushed for him to get it back. But Ordonez' range was nothing special according to the measures we use, and it may be that the lingering effects of his arm and shoulder injuries affected his ability to make certain plays for at least part of the season.
On the other hand, Cabrera showed above-average range and was among the steadiest fielders in either league. Rich Aurilia also looked quite good, but in my opinion, Cabrera was a deserving winner.
Outfielders. There are a lot of good outfield candidates this year, and with one major exception, all of the winners were drawn from that pool. In other words, five of the six choices were at least in the right ballpark.
According to our analysis, five center fielders stood out this year, and all of them are in the AL. They are, from top to bottom, Chris Singleton, Kenny Lofton, Mike Cameron, Darin Erstad, and Torii Hunter. Bobby Higginson and Jacque Jones were the two left fielders who separated themselves from the pack. In right, the top performers were in the NL, with Jermaine Dye and Ichiro Suzuki being the best of the AL contenders.
The voters and I agree on Mike Cameron, so I'll focus on the voters' selection of Torii Hunter and Ichiro.
Given that center field is the most demanding outfield position and that we have a large number of deserving candidates there, I see no reason to choose a corner outfielder. Furthermore, according to our analysis, Ichiro had above-average range and an above-average arm, but he wasn't as far above average as the media would have you believe.
Ichiro's range factor was .26 above the norm, but he played behind a pitching staff that produced almost 200 more fly balls than the average AL team (according to the STATS Player Profiles book). His STATS zone rating was seven points below the major-league average for right fielders.
Nevertheless, based on his reputation and the fact that our fielding analysis shows that Ichiro would almost certainly have made more plays if he wasn't playing next to Cameron, we believe he's worthy of a Very Good rating. But we don't see evidence of Gold Glove range here.
In addition, he had only 8 assists, a below-average number for a RF who played as much as he did. And it's not as if nobody was willing to test him. Runners tried to advance on him a little less often than against the average RF, but not that much less. It does appear as if runners got a little more wary of his arm as the season progressed, but not a lot more wary. So we've rated him Very Good in throwing as well.
The media seems to be saying that Ichiro is unquestionably excellent in all phases of the game. According to our methods, he's excellent at a lot of things (hitting for average, hitting in the clutch, sacrifice bunting, running the bases, stealing bases, avoiding errors, staying healthy), very good at some things (getting to balls in right and keeping runners from taking extra bases), and below average in some ways (drawing walks, hitting for power). That's quite a package, and I'd definitely want this guy on my team. But I just don't see the evidence that he's among the top defensive outfielders in the game.
So, if Ichiro doesn't get my vote, then who does deserve the other two outfield Gold Gloves for the AL? Singleton topped the charts in plays-made-per-opportunity, but he only started 102 games. Lofton only started 123 games. Singleton and Hunter have subpar throwing arms. (Hunter tied for the league lead in assists by a CF with 14, but several of those came on plays where the lead runner scored, and he allowed lots of runners to take extra bases.) Hunter plays in a tough park -- it's easy to lose balls in the Metrodome roof -- so he's better than his numbers suggest, and his numbers are very good to begin with. Erstad made only one error all season, leading all major-league CFs in fielding percentage.
It's a very close call, but there are some big differences in playing time to consider. Performance rates are very important, but when it comes to seasonal awards, the volume of performance is more important. So when someone performs at a high level for 145 games, that trumps someone else who performed at a slightly higher level for 120 games. On that basis, my other two votes would go to Erstad and Hunter.
Over in the NL, the top candidates (in my mind) were Geoff Jenkins in left, Andruw Jones in center, plus Larry Walker, Vladimir Guerrero, and Brian Jordan in right. J. D. Drew would have been on this list were it not for the injury that cost him about 50 games. The voters chose Walker, Jones, and Jim Edmonds.
I agree with the selections of Walker and Jones, but in my opinion, either Jenkins or Guerrero would have been a much better choice than Edmonds. Jenkins is a terrific left fielder, but I have to give it to Guerrero because (a) Jenkins started only 104 games, (b) Guerrero showed great range too, and (c) Guerrero has a cannon for an arm. Guerrero does make too many errors, but his range and arm more than compensate for them.
Jim Edmonds has made some of the most amazing plays I have ever seen, but he simply doesn't cover as much ground as some of the younger players at this position. This year, he was below average in range factor and the STATS zone rating, and according to our method, made 16 fewer plays than the average CF given the opportunities presented to him. He battled groin, toe and knee problems, and he's starting to get up in years. I just don't see any reason to believe that he's a more valuable outfielder than the other guys I mentioned.
Recap. Here's how my selections would agree or disagree with those of the voters:
Pos   Voters                  Diamond Mind
---   ---------------------   -----------------
P     Mussina, Maddux         Garcia, Rueter
C     Rodriguez, Ausmus       same
1B    Mientkiewicz, Helton    same
2B    Alomar, Vina            Kennedy, Vina
3B    Chavez, Rolen           same
SS    Vizquel, Cabrera        Gonzalez, Cabrera
OF    Cameron, Walker         same
OF    Hunter, Jones           same
OF    Ichiro, Edmonds         Erstad, Guerrero
We agree on twelve of the eighteen selections. I haven't been keeping track, but I'm guessing this represents the highest rate of agreement since we began doing this.
Now that we've offered our two cents' worth on the Gold Glove winners, there are some other players worth mentioning:
Bobby Abreu, RF -- According to our system, Abreu's play-making scores have been very erratic lately -- quite good through 1998, subpar in 1999, very good in 2000, and average this year. Looked at in the context of the past three seasons, it now seems as if the Excellent rating we assigned for his performance last year was generous, even though he was clearly in the top tier statistically that season. I'm at a loss to explain these ups and downs.
Craig Biggio, 2B -- This former Gold Glover missed the last two months of the 2000 season with a knee injury that required surgery. In January, his general manager warned that Biggio's range and baserunning ability would most likely be limited, especially early in the year. Those comments proved to be accurate, as Biggio's range was far below its previous level and he stole only seven bases, down from 50 only three years ago. His baserunning instincts are still good, so he was a little above average in that regard, but nowhere near the Excellent level he sustained before he hurt his knee.
Tony Clark, 1B -- A great athlete who has earned our Very Good rating for defense the past two years, Clark has been battling back problems that have kept him out of the lineup and hurt his power and defense. We downgraded his range rating to Fair as a result, but if he regains his health, you can expect it to rebound next year.
Ken Griffey, CF -- Spent much of the season trying to play despite a torn hamstring and its after-effects, and it clearly showed. In a little more than half a season of playing time, Griffey made ten fewer plays than the average CF, thereby earning a Fair rating. Expect that to rise next year if he's back at 100%.
Derek Jeter, SS -- I know we're going to take some heat from New York fans on this one, but I assure you that there is no bias in our decision to assign Jeter a Fair range rating this year.
According to our analysis, Jeter made 32 fewer plays than the average shortstop given the opportunities presented to him. He was below average going to his right, below average going to his left, and below average on balls hit more or less at his position. His STATS zone rating was fifty points below average. His range factor was lowest in the majors among those who played at least 100 games at the position. At one time, Scott Brosius's superior range affected Jeter's numbers, but Brosius has declined from Excellent to Average in recent years and is no longer a factor in evaluating Jeter.
The New York infield ranked 10th in the league in the percentage of ground balls that were turned into outs. And it was 13th in double play percentage. Alfonso Soriano probably deserves most of the blame for the low DP rate, but if Jeter was an outstanding fielder, he would have compensated for Soriano's limitations to some extent, and the team would have been closer to the league average.
In his defense, he played behind a staff that produced 5% fewer ground balls than the average team, so his range factor was artificially depressed. Take that into account, and Jeter's range factor would have been only the second- or third-worst in the majors. And, of course, in the playoffs, he made a couple of very heady and gutsy plays that had everyone talking about his courage, his will to win, and his intelligence.
But a couple of attention-getting plays aren't enough, in my opinion, to offset the mountain of evidence indicating that Jeter simply didn't get to as many balls as most of the other shortstops in the game.
Ryan Klesko, 1B -- Earlier in his career, before he was traded to San Diego, Klesko didn't show much range at first base in the limited amount of time he played that position for Atlanta. In 2000, he showed average range in his first full season as a 1B. We gave him an average rating for that performance, even though we weren't certain that he had improved that much. But there was a major drop this year, and his Pr rating reflects that. Klesko has surprised a lot of people by stealing 23 bases in each of the past two seasons, but his career record is quite poor in both left field and at first base, so it seems as if his 2000 season was the anomaly.
Carlos Lee, LF -- Different fielding metrics suggest that Lee's range in left was anywhere from a little above average to a little below average. Yet his defense was sharply criticized in Sports Illustrated's pre-season baseball issue and again late in the season in a Baseball Weekly note. He was replaced defensively 39 times, and that normally happens only to players who are major liabilities in the field. In this case, however, the guys replacing him were superior defenders like Chris Singleton, so it doesn't necessarily mean that Lee was terrible, only that the other guys were better. We asked several people who follow the Sox, and their opinions ranged from "he's under-rated" to "he looks awkward but gets the job done" to "he's as bad as they say." We've chosen to assign him an Average rating this year. That may be a little generous, and I wouldn't be surprised if he slips back to a Fair rating next year.
Raul Mondesi, RF -- Has a very good reputation for defense, but that's mostly based on his great arm. In terms of range, our analysis shows that he's been slightly above average throughout his career. In the spring, it was reported that Mondesi came to camp carrying some extra weight, and his defensive numbers took a big dive. Coincidence? Maybe, but we felt a Fair rating was an accurate reflection of his 2001 performance. He could easily rebound next year.
Todd Zeile, 1B -- A year ago, we wrote that his Excellent range came as a complete surprise even though third basemen often move across the diamond and look very good relative to the men who play first. But we were skeptical. He's never had a reputation as a good fielder, and we wondered whether he'd be able to keep it up. He didn't, so it may be that last year was a fluke or a case where the various fielding measures over-stated his value for some reason. We rated him Average this year.
Can pitchers prevent hits on balls in play?
July 21, 2003
In January, 2001, Voros McCracken published an article that shook the baseball analysis community.
In an attempt to better understand how to separate the contributions of pitching and defense, McCracken divided the traditional pitching stats into two groups -- those that are under the direct control of the pitcher (hit batsmen, walks, strikeouts, homers) and those that aren't (hits on balls in play). He called the first group defense-independent pitching stats, or DIPS for short.
I'll get into the details shortly, but before I do, the reason McCracken's work caused such a stir is that he reached a conclusion that seems very counter-intuitive and, if true, extremely important. In his own words, he stated his major finding in these two ways, once at the beginning of the article and once at the end:
"hits allowed are not particularly meaningful in the evaluation of pitchers"
"major-league pitchers don't appear to have the ability to prevent hits on balls in play"
McCracken wasn't able to give a reason why this would be true, but stated rather emphatically that it is true.
Ever since I read that article, I've been wondering how this could possibly be. It seems so obvious that certain pitchers must be able to get more than their share of easy outs. Doesn't Greg Maddux produce more than his share of routine ground balls? Doesn't Mariano Rivera's cutter eat up opposing hitters even when they don't strike out? Doesn't a flame-thrower like Roger Clemens induce a lot of weak swings from hitters who are down in the count? Wouldn't a knuckleball lead to more lazy popups from hitters who are just guessing at where that pitch will dance next?
McCracken's analysis used a stat that I'll call in-play average (or IPAvg), which he defined as (H - HR) / (BF - HR - HBP - BB - K). That's just non-homer hits divided by balls in play, and because all but a handful of homers leave the yard, it's a good reflection of how well pitchers and defenses are able to turn batted balls (that stay in the field of play) into outs.
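For readers who like to see a formula as code, here is a minimal sketch of the IPAvg calculation. The function name and the sample stat line are invented for illustration; they don't come from McCracken's article or from any real pitcher.

```python
def in_play_average(h, hr, bf, hbp, bb, k):
    """Non-homer hits divided by balls in play:
    (H - HR) / (BF - HR - HBP - BB - K)."""
    balls_in_play = bf - hr - hbp - bb - k
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (h - hr) / balls_in_play


# Hypothetical stat line, made up purely for illustration.
sample = in_play_average(h=200, hr=20, bf=900, hbp=5, bb=60, k=175)
print(f"{sample:.3f}")  # 180 non-homer hits on 640 balls in play
```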
He found that:
- there are "massive differences in the ability of pitchers" even before considering balls in play. To put it another way, a lot of a pitcher's ERA is explained by his walk rate, strikeout rate, and ability to prevent homers.
- the correlation between a pitcher's IPAvg one year and the next is low, suggesting that pitching ability might not have a major impact on IPAvg, as compared to other factors such as defense and luck
- some of the best pitchers in the game, such as Greg Maddux and Pedro Martinez, have gone from the top to the bottom and back to the top in IPAvg in subsequent seasons, again suggesting that these results are largely out of their control
- the variations in IPAvg decrease when you add park effects and the quality of the defense to the analysis
- projections of next-year pitching stats are more accurate if you use a team's collective IPAvg than if you use each pitcher's personal IPAvg from the year before
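The year-to-year correlation mentioned in the list above is easy to compute once you have each pitcher's IPAvg in two consecutive seasons. The sketch below uses a plain Pearson correlation, which is my assumption about the kind of measure involved; the article doesn't say which one McCracken used, and the five pairs of numbers are placeholders, not real data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Placeholder IPAvg values for the same five pitchers in back-to-back years.
ipavg_year1 = [0.270, 0.285, 0.300, 0.265, 0.290]
ipavg_year2 = [0.295, 0.280, 0.275, 0.288, 0.282]
r = pearson(ipavg_year1, ipavg_year2)
print(f"year-to-year correlation: {r:.2f}")
```

A value near zero means last year's IPAvg tells you little about this year's; a value near one means the stat is stable from season to season.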
My reaction was to think that McCracken was on to something but may have gone too far, so I began to think about how to dig a little deeper.
McCracken appears to have done most of his work using stats from two seasons. I wasn't sure whether those two seasons were representative or not, so I decided to apply his method to all pitcher-seasons since 1913. Why 1913? Because that's the first year my historical database has all of the stats needed to compute IPAvg and the DIPS for every pitcher. And I figured that 90 years would be more than enough to prove the point one way or the other.
After compiling this information and studying it for a while, I discovered a pair of columns by Rob Neyer of ESPN.com. In the first column, Rob described the McCracken article. In the second one, which appeared a couple of days later, Rob included email messages from Craig Wright and Bill James with their take on McCracken's assertion.
Wright described his own work in this area:
"Like McCracken, I've studied hits allowed per ball in play (though with the small difference that I subtract sacrifice hits) ... I agree that this type of hit rate is not as heavily influenced by the pitcher as is commonly believed, but at the same time I am distinctly uncomfortable with McCracken's conclusion."
James wrote that he hadn't studied this issue, but that he shared Wright's reservations and suggested that someone do a large-scale study to find out whether the idea would hold up. It appears that the work I had just finished doing was exactly what Bill was proposing.
In addition, Bill wrote about McCracken's work in the New Bill James Historical Baseball Abstract. Based on a review of an unspecified number of pitching careers and about 400 pitcher-seasons, he concluded that pitchers do have an influence on these outcomes but confirmed McCracken's finding that there's still a lot of random variation in single-season performances.
Finally, in recent months, I've seen more and more references to McCracken's assertion in various baseball articles and posts to baseball research forums. There's enough momentum building behind this idea that a few of our customers have asked how we might change the design of our Diamond Mind Baseball game to reflect this new knowledge about how baseball works.
Before making any changes to our game or our method for projecting player performance, I figured it was worth spending some time looking at this question.
NOTE: In an article published on Baseball Primer last year, McCracken softened his original conclusion a little, saying that there are small differences among pitchers in their ability to prevent hits on balls in play, and those differences are "statistically significant if generally not very relevant." Except for the regulars on Baseball Primer, I don't think many people in the baseball research community are aware of this update to McCracken's thinking.
For every pitcher who appeared in the big leagues since 1913, I computed his HBP rate, walk rate, strikeout rate, homerun rate, and IPAvg for each of his seasons. The first four numbers are computed quite simply -- take the relevant stat and divide by batters faced. The IPAvg figures were computed according to McCracken's formula, which I wrote out a few paragraphs back.
To establish a baseline against which to evaluate those figures, I also computed those stats for each league-season and each team-season since 1913.
This enables us to evaluate every pitcher relative to the norms for his league. Last year, for example, Roger Clemens faced 768 batters and fanned 192 of them. That's a strikeout rate of .250 in a league where the average was only .163. His advantage over the league can be stated in two ways: (a) his rate was .087 higher than the league, and (b) he had 67 more strikeouts than the league-average pitcher would have had if he faced the same number of batters as Clemens. The same method was used to determine how many hit batsmen, walks, and homeruns each pitcher yielded above or below the league average.
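The Clemens arithmetic can be checked with a few lines of code. This is only a sketch; the function name is hypothetical, and the inputs are the three figures quoted above (192 strikeouts, 768 batters faced, a league rate of .163).

```python
def versus_league(events, batters_faced, league_rate):
    """Return (pitcher rate, rate minus league rate, events above league)."""
    rate = events / batters_faced
    rate_diff = rate - league_rate
    # Events a league-average pitcher would record against the same batters.
    expected = league_rate * batters_faced
    return rate, rate_diff, events - expected


rate, rate_diff, extra = versus_league(192, 768, 0.163)
print(f"{rate:.3f} {rate_diff:+.3f} {extra:+.0f}")  # 0.250 +0.087 +67
```

The same function works for hit batsmen, walks and homeruns; only the event count and the league rate change.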
For balls in play, I compared the in-play batting average for each pitcher and subtracted from that the corresponding in-play batting average for the league. As was the case with strikeouts, the result can be expressed either as a number of batting average points above/below the league or a number of hits above/below the league.
But hits on balls in play are subject to some outside influences that make comparisons with the league average a little suspect. Some parks (like Coors Field) tend to inflate batting averages. Some defenses are much better than others. If Jamie Moyer allows 15 fewer hits than normal, how can we decide whether to give Moyer the credit or chalk it up to Safeco Field and the talents of Mike Cameron and Ichiro?
To account for the effects of park and defense, I also computed the in-play average for each team-season in the period from 1913 to 2002. If McCracken is correct when he says that pitchers have virtually no influence over these outcomes, every pitcher on a given team should have roughly the same IPAvg. After all, those pitchers share a common park and a common defense.
If we then (a) compute the IPAvg for each team, (b) compare the IPAvg for each pitcher to that of his team, and (c) study those differences, we should find that the differences in IPAvg between a pitcher and his teammates are random. In other words, those differences should be centered around zero, equally likely to be above zero as below zero, and have no predictive value from one year to the next.
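Steps (a) and (b) might look something like the sketch below. Two things are assumptions on my part for the sake of the example: the team baseline here includes the pitcher's own balls in play, and the numbers are invented rather than taken from any real team.

```python
def net_in_play_hits(pitcher_hits, pitcher_bip, team_hits, team_bip):
    """Compare a pitcher's IPAvg with his team's.

    Returns (IPAvg difference, in-play hits above or below what the
    team rate would predict for the pitcher's balls in play).
    """
    team_ipavg = team_hits / team_bip
    pitcher_ipavg = pitcher_hits / pitcher_bip
    avg_diff = pitcher_ipavg - team_ipavg
    net_hits = pitcher_hits - team_ipavg * pitcher_bip
    return avg_diff, net_hits


# Invented season: pitcher allows 150 in-play hits on 600 balls in play,
# while the team as a whole allows 1260 on 4500 (a .280 IPAvg).
avg_diff, net_hits = net_in_play_hits(150, 600, 1260, 4500)
print(f"{avg_diff:+.3f} {net_hits:+.0f}")  # -0.030 -18
```

Negative values mean the pitcher allowed fewer in-play hits than his team's rate would predict.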
If we find that these differences are not random, there must be another factor, apart from defense and park effects, that accounts for them. And it follows that the missing factor must be an attribute of the pitcher. Because if the pitcher had nothing to do with it, there'd be no reason for that external factor to be evident only for this pitcher.
Studying career totals
At this stage of the process, we now know how much a pitcher exceeded or fell short of his league in five categories -- HBP, BB, K, HR and hits on balls in play -- for every season of his career. And we also know how much a pitcher exceeded or fell short of his teammates on in-play hits for every season of his career. The last step is to sum these values to obtain career totals (from 1913 forward) for every pitcher.
McCracken asserted that pitchers have a lot of control over the defense-independent pitching stats, so I would expect to see substantial differences among pitchers in their career HBP, walk, strikeout, and homerun rates, even after normalizing all of these figures against the league averages for each season.
After crunching the numbers for a total of 29,973 seasons by 6,004 pitchers, we did indeed find very large differences among pitchers in some of the defense-independent statistics, especially walks and strikeouts. That's not likely to surprise any of you. It didn't surprise me, and it's entirely consistent with McCracken's findings.
More importantly, McCracken asserted that pitchers have almost no control over balls in play. If he's right, we would expect to see essentially random values for the career rates of in-play hits, especially for net in-play hits relative to the team baseline.
But we also found meaningful differences in the number of hits allowed on balls in play. In other words, a large number of pitchers consistently demonstrated the ability to limit the number of those hits. Their influence on these outcomes isn't as great as it is on the defense-independent stats, but it is real, and it is large enough to be important.
Here's a partial list of the top pitchers based on the number of career hits they saved relative to the IPAvg of their teams. The list includes two figures for each pitcher, the first without adjustments for park and defense and the second with those adjustments:
Pitcher            IPHits vsLg  IPHits vsTm
-----------------  -----------  -----------
Charlie Hough         -371         -299
Walter Johnson        -277*        -214*
Tom Seaver            -269         -201
Catfish Hunter        -296         -185
Warren Spahn          -266         -183
Fergie Jenkins        -128         -182
Pete Alexander        -197*        -177*
Phil Niekro           -147         -172
Jim Palmer            -315         -170
Ned Garver             -71         -168

* excludes seasons before 1913
Charlie Hough has prevented more hits on balls in play than any other pitcher in our study, and our sample includes the last ninety years, so we've covered most of baseball history. Compared with the league-average pitcher, Hough has allowed 371 fewer hits on balls in play. Compared with his teammates, that figure drops to 299 hits, suggesting that his parks and defenses deserve some of the credit.
How important is 299 hits? Hough would have given up an extra run every three games or so if he had allowed hits on balls in play at the same rate as his teammates over the course of his career. That's a pretty big deal.
Could this happen by chance? No, it couldn't. Hough allowed batters to put 11,586 balls in play over the course of his career. If these results were random, there'd be a 95% chance that his net hits allowed would fall between +93 and -93 and a 99% chance they would fall between +116 and -116. The probability that a pitcher could reduce hits by 299 totally by chance is exceedingly small. (For the statisticians among you, Hough was more than six standard deviations from the mean.)
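For anyone who wants to check the order of magnitude, here is a rough sketch that treats each ball in play as an independent trial with a fixed chance of becoming a hit. The .280 hit probability is an assumption chosen for illustration; the baseline and the exact method behind the ranges quoted above aren't spelled out, so this is a ballpark check rather than a reproduction of those figures.

```python
from math import sqrt

BALLS_IN_PLAY = 11586    # Hough's career total, from the text
HITS_SAVED = 299         # versus his teammates, from the text
ASSUMED_HIT_PROB = 0.28  # assumption: a typical in-play average

# Standard deviation of hits allowed under a simple binomial model.
sd = sqrt(BALLS_IN_PLAY * ASSUMED_HIT_PROB * (1 - ASSUMED_HIT_PROB))

half_width_95 = 1.96 * sd   # normal-approximation 95% band, in hits
z_score = HITS_SAVED / sd   # how many standard deviations 299 hits is

print(f"sd = {sd:.1f} hits")
print(f"95% band = +/- {half_width_95:.0f} hits")
print(f"299 hits = {z_score:.1f} standard deviations")
```

Under these assumptions the 95% band lands in the low-to-mid 90s, and 299 hits comes out a bit above six standard deviations. Moving the assumed hit probability anywhere between .250 and .300 shifts those numbers slightly without changing the conclusion.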
And Hough wasn't the only one, not by a long shot. In a sample of 351 pitchers with at least 6000 career balls in play, more than 12% of them posted results that would happen less than 1% of the time by chance. And that understates the case, too, because you get to keep pitching if you're that much better than the league, but you usually don't make it to 6000 balls in play if you're that much worse than the league. If one end of the distribution hadn't been truncated by job losses, approximately 20% of those pitchers would have fallen outside the range that can be explained by chance.
There are two knuckle-ballers on this list, and while you can't see it here, I can tell you that if I had run this list a little further, you'd have seen 6 knuckle-ballers in the top 35. (The other four are Eddie Rommel, Ted Lyons, Hoyt Wilhelm and Tim Wakefield.)
NOTE: The observation that knuckleball pitchers are especially good in this area is not new. Craig Wright noted the same thing in his email to Rob Neyer in January, 2001, and McCracken made this point in an article on Baseball Primer last year.
Some pitchers got a lot of help from their defense and park -- almost half of Jim Palmer's hits saved can be attributed to his defense (mostly) and his park -- while others look even better after the defense/park adjustment.
Of course, when you rank players based on counts, rather than averages, you're going to see a lot of guys with very long careers at the top of the list. So let's rank them again, this time dividing career hits saved by career balls in play, and setting a minimum of 5000 balls in play:
Pitcher            IPAvg vs Lg  IPAvg vs Tm
-----------------  -----------  -----------
Charlie Hough         -.032        -.026
Don Wilson            -.015        -.023
Andy Messersmith      -.033        -.021
Ned Garver            -.008        -.020
Tim Wakefield         -.020        -.019
Catfish Hunter        -.028        -.017
Bud Black             -.020        -.017
Oral Hildebrand       -.015        -.017
Walter Johnson        -.021        -.016
Dave Stieb            -.022        -.016
Hough remains the career leader by holding enemy hitters to an in-play batting average that was 26 points lower than that of the pitchers on his teams. That's a very substantial advantage, and one that is entirely inconsistent with McCracken's conclusion.
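The rate ranking is just career hits saved divided by career balls in play, with a playing-time cutoff. The sketch below uses Hough's two figures from this article (299 hits saved versus his teams on 11,586 balls in play); the other two entries and the function name are hypothetical and are there only to show the filter and the sort.

```python
MIN_BALLS_IN_PLAY = 5000  # cutoff used for the rate ranking


def rate_leaders(careers, min_bip=MIN_BALLS_IN_PLAY):
    """careers maps name -> (career balls in play, net in-play hits).

    Returns (name, net hits per ball in play) pairs for qualifying
    pitchers, most hits saved per ball in play first.
    """
    rated = [
        (name, net_hits / bip)
        for name, (bip, net_hits) in careers.items()
        if bip >= min_bip
    ]
    return sorted(rated, key=lambda item: item[1])


careers = {
    "Charlie Hough": (11586, -299),  # figures quoted in the article
    "Pitcher X": (4000, -120),       # hypothetical; below the cutoff
    "Pitcher Y": (7000, -70),        # hypothetical
}
for name, rate in rate_leaders(careers):
    print(f"{name:<15} {rate:+.3f}")
```

Hough's line works out to -.026, which agrees with his entry in the table.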
To recap, this examination of career totals suggests very strongly that a meaningful number of pitchers have demonstrated the ability to reduce the rate of hits on balls in play.
Year-to-year variations, part one
By comparing the results for two seasons, McCracken concluded that "there is little correlation between what a pitcher does one year in the stat and what he will do the next." I'll start by looking at a few of the pitchers mentioned in the McCracken article, then expand the study and get a little more scientific.
McCracken pointed out that Greg Maddux had one of the best marks in baseball in 1998, then had one of the worst in 1999, and bounced back with a good in-play average in 2000. The following chart shows his entire career, with bars going up indicating an IPAvg that was worse than average and the bars going down indicating a lower-than-average rate of hits on balls in play:
The wild swings of 1998-2000 look like an anomaly when you examine Maddux's entire career. In fact, it appears that he struggled a bit as a youngster, reeled off a decade of good-to-great performances, then began to lose it as he got into his mid-30s. That sounds like a pretty normal career progression to me.
Pedro Martinez was another pitcher who gave up a lot of in-play hits in 1999 but bounced back in 2000. It should be noted that Pedro had a 2.07 ERA despite all those in-play hits in 1999, so we can only imagine what he would have done if he'd been a little less unlucky. Here's Pedro's career:
There's really only one bad year in this line, but it happened to fall in one of the years McCracken looked at. I think it's fair to say that Pedro has shown an above-average ability to prevent hits on balls in play, but his influence on these results is much less than on strikeouts, where he consistently mowed down an extra 90 or more hitters a year, and an incredible 181 more than average in 1999.
McCracken wrote that "You'll often hear people use names like Randy Johnson, Jamie Moyer and Andy Pettitte [as being very good at preventing hits on balls in play], but by any definition you want to use, these guys are not particularly good in the stat." Here's Moyer's career:
Moyer wasn't very good in this respect, or in most other respects, for the first half of his career. But he figured something out in 1996 and has been consistently better than the league ever since, with the exception of 2000. If I were McCracken, looking only at the 1999 and 2000 seasons, I would have concluded that Moyer isn't particularly effective in preventing hits, but his last seven years say otherwise.
By the way, it's tempting to assume that Safeco Field and a very good Seattle defense are responsible for these recent successes, but that wouldn't be true. First of all, the 1996-1999 numbers were accumulated in a mix of Fenway Park, the Kingdome, and Safeco, with only the second half of 1999 in Safeco. More importantly, these numbers are relative to the in-play average for his teams, so they already factor out the impact of the park and the defense. The bottom line is that Jamie Moyer has been a master at preventing hits on balls in play since 1996.
How about Andy Pettitte? Here's his career:
McCracken was quite correct in pointing out that Pettitte is not a pitcher who prevents hits on balls in play. On the other hand, he's a very good counter-example regarding the claim that pitchers are not consistent in this regard.
Randy Johnson is the third pitcher mentioned by McCracken in the quote I cited above. Here's how Johnson has fared on balls in play over his career:
That's nine straight seasons at or better than the league average, followed by five seasons that were league-average or worse. The shift occurred at the very moment that he moved from the AL to the NL. I'm not sure whether that's meaningful, or whether it has more to do with the fact that he turned 35 in 1998. As with Pedro, Johnson's main asset is not his ability to prevent hits on balls in play; it's his ability to prevent balls in play in the first place. But Johnson was pretty good on those balls in play for nine years.
McCracken also claimed that "Randy Johnson gives up fewer hits than Scott Karl. That's not because batters hit the ball harder off Karl than Johnson, but because they hit the ball more often off Karl than Johnson." Here's Karl's career:
You might be able to make the case that Karl in his prime wasn't any worse than Randy Johnson in his late 30s, but if you compare the two pitchers at the same age, there's a noticeable edge for Johnson.
While we're on the subject of consistency from year to year, let's take a look at some of the knuckleballers, starting with Charlie Hough:
This chart is a little misleading in one respect. There are two bars for 1980, one for each of the teams he played for that year. Hough's IPAvg was awful in his 32 innings with the Dodgers and quite good in his 61 innings with Texas. Overall, he was a little worse than average for the year. The bottom line is that Hough was remarkably good at preventing hits on balls in play for a very long time.
Here's another knuckleballer, Tim Wakefield:
And a third knuckleballer, Phil Niekro:
Hough and Wakefield were remarkably good throughout their careers, and if you ignore the years after his 43rd birthday (1983 to the end), you could say the same about Niekro, too.
Number two on the all-time list was Walter Johnson, whose career looked like this:
Remember, I cut things off at 1913, so this leaves out his early years. It's quite possible that he would have been the all-time leader if those seasons had been included.
Sandy Koufax got some help from Dodger Stadium, but that wasn't the only reason he was so dominant during the last five years of his career. Even with the park and defense factored out, his IPAvg was consistently good during those years:
Finally, here's Jim Palmer, another Hall-of-Famer who was consistently good on balls in play during his career, except for the very beginning and end of his time in the big leagues:
If I had run Palmer's chart showing his performance relative to the league average (instead of his team), it would have been twice as impressive.
We could go on and do a lot more pitchers, but I think we've seen enough to make the point that it's not too hard to find examples where these in-play averages appear to be anything but random. In other words, this is highly persuasive evidence that these pitchers did indeed have the ability to prevent hits on balls in play.
Year-to-year variations, part two
It goes without saying that one cannot prove or disprove the idea that "there is little correlation between what a pitcher does one year in the stat and what he will do the next" by examining only ten or twelve careers.
To get a better handle on this phenomenon, I compiled a database consisting of all pairs of consecutive seasons in which a pitcher faced at least 400 batters in each season. Using this sample of 7,486 season-pairs, I computed the correlation coefficient for the net HBP rate, BB rate, K rate, HR rate, and in-play hit rate.
I found the highest correlation (.73) for strikeout rates. Walk rates (.66) were also highly correlated. The correlation coefficients dropped to .36 for hit batsmen, .29 for homeruns, and .16 for in-play batting average relative to the league. The lowest correlation (.09) was seen for in-play batting average relative to the team.
It may appear to be contradictory to say that certain pitchers appear to be consistently good while the overall correlation rate is quite low. But that's not necessarily so.
If McCracken is right, the difference between a pitcher's IPAvg and that of his team should vary randomly around zero as he moves through his career, and the correlation would be quite weak.
But if pitchers do have some influence over these outcomes, they could still exhibit a weak correlation by varying around some value other than zero that reflects the ability of the pitcher.
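The distinction drawn in the last two paragraphs can be illustrated with a small simulation. This is a minimal sketch, not the method used in the study: the talent spread (.005), the single-season noise (.015), and the function names are all assumptions chosen for illustration. Each simulated pitcher has a fixed, nonzero "true" IPAvg-versus-team value, and each season adds random variation on top of it.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_season_pairs(n_pitchers, talent_sd, noise_sd, seed=0):
    """Return (year1, year2, talents) lists of IPAvg-vs-team values.

    Each pitcher has a fixed true talent (his long-run IPAvg relative to
    his team). Each observed season is that talent plus random noise.
    """
    rng = random.Random(seed)
    talents = [rng.gauss(0.0, talent_sd) for _ in range(n_pitchers)]
    year1 = [t + rng.gauss(0.0, noise_sd) for t in talents]
    year2 = [t + rng.gauss(0.0, noise_sd) for t in talents]
    return year1, year2, talents


# Assumed values: real but small talent differences, large season-to-season noise.
TALENT_SD = 0.005
NOISE_SD = 0.015

# 7,486 matches the number of season-pairs in the study; everything else is made up.
year1, year2, talents = simulate_season_pairs(7486, TALENT_SD, NOISE_SD)
r = pearson(year1, year2)

# Under this model the expected year-to-year correlation is the share of
# total variance that comes from talent.
expected_r = TALENT_SD ** 2 / (TALENT_SD ** 2 + NOISE_SD ** 2)

print(f"simulated correlation: {r:.3f}, expected: {expected_r:.3f}")
```

Under these assumed numbers the year-to-year correlation is weak even though every simulated pitcher has a real, persistent skill level. A low correlation by itself does not show that the underlying ability is zero.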
What about the weaker pitchers?
Most of our work to this point has focused on pitchers who had long and mostly successful careers in the big leagues. How do the DIPS and IPAvg stats of these players compare to those of players who weren't good enough to last that long?
The following table shows how eleven groups of pitchers compared with the overall averages. The first row includes all pitchers who faced fewer than 1,000 batters in their careers. The second row includes all pitchers who faced at least 1,000 batters but fewer than 2,000 batters during their careers. And so on.
Career BF             BF    HBP     BB      K     HR   vsLg   vsTm
1 - 999          401,138   .002   .027  -.017   .002   .017   .015
1000 - 1999      931,981   .001   .013  -.009   .001   .006   .004
2000 - 2999    1,105,712   .001   .007  -.005   .000   .002   .001
3000 - 3999    1,179,916   .000   .006  -.003   .000   .000   .000
4000 - 4999      906,271   .000   .002  -.002   .000   .000   .001
5000 - 5999      920,680   .000   .001   .000   .000   .000   .000
6000 - 6999      647,553   .000  -.004  -.002   .001  -.001  -.001
7000 - 7999      843,937   .000  -.003   .000   .000  -.002  -.001
8000 - 8999      716,200  -.001  -.005   .005   .000  -.002  -.002
9000 - 9999      788,532   .000  -.008  -.001  -.001  -.002  -.001
10000+         2,589,409  -.001  -.010   .008  -.001  -.004  -.003
Let's walk through the first row so it's clear how to read this table. Those pitchers, as a group:
- faced a total of 401,138 batters in their careers
- hit batters at a rate that was .002 above the league average. In other words, they hit two more batters per 1000 BF than did the average pitcher.
- walked 27 more batters per 1000 BF
- struck out 17 fewer batters per 1000 BF
- gave up 2 more homers per 1000 BF
- gave up 17 more hits per 1000 balls in play when compared with the league-average pitcher
- gave up 15 more hits per 1000 balls in play when compared with the in-play averages of their teammates
As you can see from the table, the pitchers with longer careers were progressively better than their shorter-career counterparts in every respect. They walked fewer batters, struck out more hitters, gave up fewer homeruns, and gave up fewer hits on balls in play. The ability to prevent hits on balls in play appears to be as much of a skill as anything else.
It might be easier to see this in chart form, so here are the walk rate, strikeout rate, homerun rate, and in-play averages for these groups of pitchers:
Another interesting aspect of this breakdown by career length is the total number of batters faced by each group. Only a very small percentage of batters are faced by pitchers with short careers. Of the roughly 11 million plate appearances since 1913 (including the Federal League of 1914-15), only 3.6% featured pitchers who finished their careers with fewer than 1000 batters faced.
In fact, the midpoint falls in the 6000-6999 group. A little more than half of the plate appearances since 1913 have been initiated by a pitcher who faced at least 6000 hitters in his career. We, along with other baseball analysts, often compare pitchers to the league average. Those league averages reflect the fact that the majority of plate appearances involve pitchers who are good enough to face thousands of big-league hitters.
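The 3.6% share and the midpoint can be checked directly against the batters-faced column of the career-length table. The sketch below only re-adds the published figures; the helper names are hypothetical.

```python
# Batters faced by career-length group, copied from the career-length table.
BF_BY_GROUP = [
    ("1 - 999", 401_138),
    ("1000 - 1999", 931_981),
    ("2000 - 2999", 1_105_712),
    ("3000 - 3999", 1_179_916),
    ("4000 - 4999", 906_271),
    ("5000 - 5999", 920_680),
    ("6000 - 6999", 647_553),
    ("7000 - 7999", 843_937),
    ("8000 - 8999", 716_200),
    ("9000 - 9999", 788_532),
    ("10000+", 2_589_409),
]


def midpoint_group(groups):
    """Return the label of the group in which the cumulative total of
    batters faced first reaches half of the overall total."""
    total = sum(bf for _, bf in groups)
    running = 0
    for label, bf in groups:
        running += bf
        if running * 2 >= total:
            return label
    raise ValueError("groups must not be empty")


total_bf = sum(bf for _, bf in BF_BY_GROUP)

# Share of all plate appearances involving the shortest-career pitchers.
short_career_share = BF_BY_GROUP[0][1] / total_bf

# Share involving pitchers who faced at least 6000 batters (groups 7-11).
long_career_share = sum(bf for _, bf in BF_BY_GROUP[6:]) / total_bf

print(f"total batters faced: {total_bf:,}")
print(f"short-career share:  {short_career_share:.1%}")
print(f"midpoint group:      {midpoint_group(BF_BY_GROUP)}")
print(f"6000+ share:         {long_career_share:.1%}")
```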
That's a very high standard. And that may explain why it's difficult for any pitcher to consistently perform at a level higher than the league average. The table shows that the pitchers with the longest careers are only a little better than average. (They peak at a higher level, of course, but if you take their entire careers, there's not a huge difference.)
A better indicator may be the comparison of the short-career pitchers to the league averages. The chart shows that these marginal hurlers are far worse than the average in every way. In particular, they give up a lot more hits on balls in play than do the pitchers who are good enough to be big-league regulars for several years.
What's the right baseline?
At this point, we've seen (a) career totals that demonstrate that pitchers do influence these outcomes over the course of their careers, (b) several examples of pitchers who have been very consistent in IPAvg during their careers, and (c) that pitchers with longer careers are better than pitchers with shorter careers in every respect, including IPAvg.
In other words, pitchers do affect the rate of hits on balls in play. That means we can no longer use the team's IPAvg as a baseline against which to evaluate a pitcher. McCracken asserted that the team's IPAvg depended only on the park and the defense, but we've found that it depends on the park, the defense, and the quality of the pitchers on that team. If we use team IPAvg as the baseline, a good pitcher on a good staff is going to look worse than he really is. A good pitcher on a bad staff is going to look better than he really is. A good pitcher on an average team is still going to look a little worse than he really is because his own good performance is included in the team's IPAvg.
That leads to a good question, one that is not easily resolved. Is it better to compare a pitcher's IPAvg to that of his league or his team? If we use the league IPAvg as our baseline, we leave out the impact of the park and the defense. If we use the team's IPAvg as the baseline, we adjust for the park and the defense, but we introduce the quality of the fellow pitchers as a variable that can skew the results.
Neither approach is completely satisfactory. It's probably best to evaluate each pitcher's IPAvg against that of his team but make some accommodation for the quality of the pitching staff before making any judgments about that pitcher and before making any predictions about future performance.
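One piece of that accommodation is easy to illustrate: a pitcher's own balls in play are part of his team's IPAvg, so the team baseline moves toward him. The sketch below compares a pitcher against his whole team and against his teammates only. The hit and ball-in-play totals are invented for illustration, and removing the pitcher's own numbers is just one possible adjustment, not necessarily the one used in this study.

```python
def ip_avg(hits, balls_in_play):
    """In-play batting average: hits allowed per ball in play."""
    return hits / balls_in_play


def vs_team(p_hits, p_bip, team_hits, team_bip):
    """Pitcher's IPAvg minus the IPAvg of the whole team (pitcher included)."""
    return ip_avg(p_hits, p_bip) - ip_avg(team_hits, team_bip)


def vs_teammates(p_hits, p_bip, team_hits, team_bip):
    """Pitcher's IPAvg minus the IPAvg of everyone else on the staff."""
    return ip_avg(p_hits, p_bip) - ip_avg(team_hits - p_hits, team_bip - p_bip)


# Hypothetical season: the pitcher allows 160 hits on 600 balls in play,
# and the team as a whole allows 1280 hits on 4400 balls in play.
P_HITS, P_BIP = 160, 600
TEAM_HITS, TEAM_BIP = 1280, 4400

diff_team = vs_team(P_HITS, P_BIP, TEAM_HITS, TEAM_BIP)
diff_mates = vs_teammates(P_HITS, P_BIP, TEAM_HITS, TEAM_BIP)

print(f"vs team (self included): {diff_team:+.4f}")
print(f"vs teammates only:       {diff_mates:+.4f}")
```

With these made-up totals the pitcher looks a few points better against his teammates than against the full team figure, which is the effect described above for a good pitcher whose own performance is folded into the baseline. This adjustment does not address the separate problem of how good the rest of the staff is.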
In addition to ranking pitchers on IPAvg, this exercise provides a different way of looking at pitching careers. By putting each pitcher's career totals for net HBP, BB, K, HR, and IPHits side by side, we get a very clear picture of the reasons why they were successful.
Let's do a few, starting with Roger Clemens:
How's that for a picture of all-around greatness? Sure, he hit a few more batters than the average pitcher, but compared to the league averages, he walked 173 fewer and struck out 1,355 more, allowed 138 fewer homers, and surrendered 101 fewer hits on balls in play. (The IPHits figures include the defense/park adjustments for all of these profiles.)
Pedro Martinez shows a very similar pattern to that of Roger Clemens, but based on less than half of Clemens' batters faced.
Greg Maddux demonstrates awesome control, an above-average K rate, and the ability to keep the ball in the park. He had some influence on IPAvg, but that was only a part of his success.
By the way, some of those 69 hits saved might be attributable to his own defensive skill rather than his pitching skill. It's also quite possible that the -69 figure significantly understates his contribution. Maddux saved 97 hits relative to the league averages, and now that we've shown that the team IPAvg reflects the ability of the other pitchers on the staff, that figure may represent Maddux's talent more accurately.
This line shows only one dominating characteristic -- the strikeouts. But if you're going to dominate in one area, that's a good one, because they can't get a hit if they can't put the ball in play. Fortunately for Johnson, his control is only a little worse than the norm, and got better in the later stages of his career.
Guys with below-average strikeout rates aren't supposed to be successful, but Moyer's exceptional control and low IPAvg have been the keys, especially in the later stages of his career.
Now here's a guy who didn't strike anyone out and gave up a lot of hits on balls in play, but survived because he had excellent control and kept the ball in the park. In particular, he kept the ball on the ground, meaning that a lot of those extra hits were singles and that a good number of potential rallies were killed by double plays.
John's profile made me think that it would have been a good idea to extend McCracken's work to measure GDP rates, but that notion didn't hit me until it was too late. Some day, I'll go back and add that to the study and see what pops out.
We can't leave this section without looking at the all-time leader in in-play hits saved. As you can see, Hough hit more batters, walked more batters, struck out only a few more batters, and gave up more homers than the average pitcher. His ability to prevent hits on balls in play is the biggest reason he had a long and successful career.
Is there really any doubt that Don Sutton is a Hall-of-Famer when you look at this profile?
Groups of similar pitchers
We could go on forever this way, so let's speed things up by looking at groups of pitchers with similar styles. Maybe we'll see some patterns.
Pitcher             HBP     BB      K     HR  IPHits
Nolan Ryan          +44   +878  +2578   -117    -133
Randy Johnson       +44   +107  +1769    -52     -10
Roger Clemens       +17   -173  +1355   -138    -101
Dazzy Vance         +19    -65  +1122    -20     -19
Steve Carlton       -49     -1  +1042     +5     -31
Bob Feller           -1   +149  +1022    -42     -53
Sandy Koufax        -35    +64  +1015    -12     -94
Pedro Martinez      +27   -152   +974    -60     -47
Obviously, the defining characteristic of these pitchers was their ability to retire batters without help from anyone else. As a group, with the exception of Ryan, they had average control. All of them were better than average on hits per ball in play, but that wasn't the main reason for their success.
Pitcher             HBP     BB      K     HR  IPHits
Rich Gossage        +10    +90   +492    -31     -57
Lee Smith           -17    +25   +447    -21     +12
Tom Henke           -10    -24   +391    -12     -20
Rollie Fingers       +4   -109   +358    -16     +12
Armando Benitez      -3    +78   +332     +1     -41
Trevor Hoffman      -17    -34   +317     -8     -49
John Wetteland       -5    -25   +310     -5     -39
Billy Wagner          0    +17   +295     -6     -23
Robb Nen            -18     -3   +283    -27     +12
Troy Percival         0    +30   +279     -9     -55
Bruce Sutter         -4    -48   +269     +1     -54
This is just a special case of the power pitchers group, but it's interesting to see how many of these guys have posted impressive IPHits numbers even though they pitch many fewer innings than do the power pitchers in the previous table.
Pitcher             HBP     BB      K     HR  IPHits
Robin Roberts       -40   -772    -15    +56     -82
Pete Alexander      -50   -570   +247     -1    -177
Jim Kaat            +19   -566   -264     -4    +144
Ferguson Jenkins    -12   -534   +635   +125    -182
Greg Maddux          -4   -507   +150   -147     -69
Ted Lyons           -43   -481   -366     -7    -121
Dutch Leonard        +7   -477   -100    -53     -64
Don Sutton          -25   -476   +512    +42    -138
Lew Burdette         -9   -445   -611    -13     +32
Walter Johnson      +25   -442   +847    -20    -214
Some of these guys (Roberts, Jenkins, and Sutton) gave up more than their share of homers, but with control this good, plus the ability to reduce hits on balls in play, a lot of those homers were solo shots.
Pitcher             HBP     BB      K     HR  IPHits
Warren Spahn        -63   -437    -36    -44    -183
Bud Black            +2   -110   -204    +23    -114
Randy Jones         -18   -189   -346    -13     -97
Wilbur Wood           0   -238   -135    -13     -84
John Tudor           -3   -146    -50     +5     -82
Kenny Rogers         +3    -39   -105    -40     -74
Larry Gura           +4   -127   -276    +21     -72
Jim Deshaies         -7    +21    -34    +44     -72
Jamie Moyer           0   -238   -153    +15     -65
Don Carman           +4    +44     -4    +36     -65
This is a list of left-handed pitchers with below-average strikeout rates. Most had very good control, but six of them were at least as susceptible to the long ball as the average pitcher. A significant part of their success is/was the ability to keep hitters off balance and keep their in-play batting averages down.
Putting the pieces together
We've seen that there's more than one way to succeed as a big-league pitcher. Robin Roberts walked 772 fewer batters than his peers. Roger Clemens struck out 1355 more batters than average. Greg Maddux yielded 147 fewer homeruns. And Charlie Hough prevented somewhere between 299 and 371 hits on balls in play.
So what's the most important element of a pitcher's repertoire?
Well, the value of various baseball events depends on the era. When scoring is up, as it has been in recent years, an extra baserunner comes around to score more often than during a period like the 1960s. In The Hidden Game of Baseball, Pete Palmer provided a table of run values for various periods in the 20th century, and I'll use those values to evaluate these events.
Palmer puts the value of a walk at about a third of a run, so the 772 walks saved by Robin Roberts are worth about 250 runs over the course of a career. That's not bad.
Clemens struck out 1355 more batters, but if he hadn't, some of those batters would have reached base, and some would have been retired in other ways. If his strikeout rate had been at the league average, it's possible that he would have allowed another 125 walks, 35 homers, and 320 more hits on balls in play. Using Palmer's run values and reasonable assumptions about the distribution of those hits among singles, doubles, and triples, those strikeouts are worth about 250-280 runs.
Palmer puts the value of a homer at about 1.4 runs, so Maddux saved about 200 runs by keeping his homerun rate down.
And the 300+ hits saved by Hough are worth about 150-175 runs.
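The walk and homer figures above are simple multiplications, and it may help to see them spelled out. This sketch uses only the approximate run values quoted in the text ("about a third of a run" per walk, "about 1.4 runs" per homer); treating a third as exactly 1/3 is an assumption, and the function name is hypothetical. The strikeout and in-play estimates depend on additional assumptions about what replaces those events, so they are not reproduced here.

```python
# Approximate run values quoted in the text (from Palmer's tables).
RUNS_PER_WALK = 1 / 3   # "about a third of a run"; exact value is an assumption
RUNS_PER_HOMER = 1.4    # "about 1.4 runs"


def runs_saved(events_saved, runs_per_event):
    """Career runs saved from preventing a given number of events."""
    return events_saved * runs_per_event


# Robin Roberts: 772 fewer walks than an average pitcher.
roberts_runs = runs_saved(772, RUNS_PER_WALK)

# Greg Maddux: 147 fewer homers than an average pitcher.
maddux_runs = runs_saved(147, RUNS_PER_HOMER)

print(f"Roberts, walks saved:  about {roberts_runs:.0f} runs")
print(f"Maddux, homers saved:  about {maddux_runs:.0f} runs")
```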
Those are impressive figures, and they'd be even more impressive if we were evaluating them against replacement level pitchers instead of the league average. As we noted before, the league average is a very high standard.
The bottom line is that success in all four areas is important. You can have a good career if you're average in all four areas or if you can offset one weak area with a strength in another. You can have a very good career if you have no major weaknesses and you have a special ability in one of these respects. And you can have a great career if you're better than average in all four areas.
Having completed this study, I can sum up my own beliefs as follows:
1. Pitchers have more influence over in-play hit rates than McCracken suggested. In fact, some pitchers (like Charlie Hough and Jamie Moyer) owe much of their careers to the ability to excel in this respect.
2. Their influence over in-play hit rates is weaker than their influence over walk and strikeout rates. The most successful pitchers in history have saved only a few hits per season on balls in play, when compared with the league or team average. That seems less impressive than it really is, because the league average is such a high standard. Compared to a replacement-level pitcher, the savings are much greater.
3. The low correlation coefficients for in-play batting average suggest that there's a lot more room for random variation in these outcomes than in the defense-independent outcomes. I believe this follows quite naturally from the physics of the game. When a round bat meets a round ball at upwards of 90 miles per hour, and when that ball has laces and some sort of spin, minuscule differences in the nature of that impact can make the difference between a hit and an out. In other words, there's quite a bit of luck involved.
4. Year-to-year variations in IPAvg-versus-team can occur if the quality of a pitcher's teammates varies from year to year, even if that pitcher's performance is fairly consistent.
5. The fact that there's room for random variation doesn't necessarily mean a pitcher doesn't have any influence over the outcomes. It just means that his year-to-year performances can vary randomly around a value other than zero, a value that reflects his skills.
6. Unusually good or bad in-play hit rates aren't likely to be repeated the next year. This has significant implications for projections of future performance.
7. Even if a pitcher has less influence on in-play averages than on walks and strikeouts, that doesn't necessarily mean that in-play outcomes are less important. Nearly three quarters of all plate appearances result in a ball being put in play. Because these plays are much more frequent, small differences in these in-play hit rates can have a bigger impact on scoring than larger differences in walk and strikeout rates.
The process of separating pitching stats into defense-independent and defense-dependent groups is illuminating. The notion that pitchers don't have as much control over in-play outcomes as they do over defense-independent outcomes is both obvious (in retrospect) and very important. Voros McCracken deserves a lot of credit for introducing this way of thinking.
The bottom line, though, is that I am convinced that pitchers do influence in-play outcomes to a significant degree. There's a reason why Charlie Hough and Jamie Moyer and Phil Niekro and Tom Glavine and Bud Black have had successful careers despite mediocre strikeout rates. There's a reason why the top strikeout pitchers have also suppressed in-play hits at a good rate. Using power or control or deception or a knuckleball, pitchers can keep hitters off balance and induce more than their share of routine grounders, popups, and lazy fly balls.
Written by Tom Tippett
September 15, 2003
After Oakland's 8-6 win over the Red Sox on August 20th, these two quotes appeared in ESPN.com's game story:
"I feel like we stole two games," Oakland third baseman Eric Chavez said. "These aren't the kind of games we're going to win down the line."
"We felt like we had the right people up there at the right time at several points in the game, but we couldn't get more runs across," Boston manager Grady Little said.
Chavez talked about stealing the game because Boston outhit the A's 18 to 11 and drew seven walks to only one for Oakland. Add up the total bases and walks (TBW) for both teams and you find that Boston outproduced the visitors 28 bases to 19. But 17 Red Sox runners were stranded, Oakland bunched their hits with a key Boston error in a four-run eighth inning, and the visiting team went home with the win.
That got us thinking. How often does this happen? How often does a team win the statistical battle yet lose the final-score war?
Measuring team performance in a season
For several years, we've been looking at measures of team production to learn more about why a season played out the way it did and to get a sense for each team's chances the next year. (For our recap of the 2002 season, see Measuring Team Efficiency.)
One of those measures is total bases plus walks. By comparing the TBW produced by each team's hitters with the TBW allowed by its pitchers, we get a good indication of the strength of that team.
Most times, those TBW figures flow quite naturally into runs, which flow quite naturally into wins, and you can see the statistical underpinning for a team's performance. For instance, the 2002 Yankees produced 558 more TBW than they allowed, outscored their opponents by 200 runs, and finished with the AL's best record.
Sometimes, however, these relationships don't hold up. The 2002 Angels were exceptionally good at converting offensive events into runs, compiling a run margin that was a little better than New York's even though their TBW differential was less than half that of the Yankees. By taking full advantage of their opportunities, they finished with 99 wins, beat New York in the divisional series, and didn't stop until they'd won it all.
Measuring team performance in a game
We've been wondering whether we'd learn anything by applying this approach to the results of individual games. How often does the team with the higher TBW figure actually win the game? And do the games that go the other way have a significant effect on the standings?
While the TBW differential is a very good measure of team performance over a season and has the advantage of being easy to figure, it isn't perfect. Among other things, it doesn't include events like hit batsmen and errors.
Most of the time, we can safely ignore those events when evaluating full seasons. The difference between bases gained by a team and given to its opponents in these ways is usually very small and doesn't affect any conclusions one might draw from the TBW differentials.
In a single game, though, HBP and errors can make the difference, so we added them for this project. For every game in the last ten years (through the end of August, 2003) we computed the number of bases produced by each team on hits, walks, HBP and errors that allowed their batters to reach base.
It turns out that the team that produced more bases in these ways was the victor 82% of the time. In 4% of the games, the teams tied in bases produced, so the win could have gone to either team. That leaves 14%, or about one game in seven, in which a team was outproduced but found a way to win anyway.
In a little more than half of the games that went to the less productive team, the winners were outproduced by only one or two bases, leaving about 7% of the games in which one team overcame a deficit of at least three bases. For the rest of this discussion, we'll focus on this subset, and for lack of a better term, we'll call them "stolen games".
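The bookkeeping behind these categories can be sketched in a few lines. This is an illustration rather than the code used for the study: the class and function names are hypothetical, and counting each batter who reaches on an error as one base is an assumption, since the article doesn't say how those plays were scored. The sample lines are the April 15, 2003 Anaheim-Texas game described later in this article, with no errors assumed for either side.

```python
from dataclasses import dataclass


@dataclass
class TeamLine:
    """Offensive events for one team in one game."""
    singles: int = 0
    doubles: int = 0
    triples: int = 0
    homers: int = 0
    walks: int = 0
    hit_by_pitch: int = 0
    reached_on_error: int = 0  # assumed to count as one base each

    def bases_produced(self):
        """Total bases on hits plus walks, HBP, and batters reaching on errors."""
        total_bases = (self.singles + 2 * self.doubles
                       + 3 * self.triples + 4 * self.homers)
        return (total_bases + self.walks
                + self.hit_by_pitch + self.reached_on_error)


def classify(winner, loser, threshold=3):
    """Label a game by how the winner's bases compare with the loser's."""
    deficit = loser.bases_produced() - winner.bases_produced()
    if deficit >= threshold:
        return "stolen"
    if deficit > 0:
        return "outproduced by one or two"
    if deficit == 0:
        return "tied"
    return "more productive team won"


# April 15, 2003: ten Anaheim hits (a triple, two homers, so seven singles)
# plus four walks; six Texas hits (two doubles), three walks, one HBP.
anaheim = TeamLine(singles=7, triples=1, homers=2, walks=4)
texas = TeamLine(singles=4, doubles=2, walks=3, hit_by_pitch=1)

print(anaheim.bases_produced(), texas.bases_produced())
print(classify(winner=texas, loser=anaheim))
```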
The big ones
Two of the biggest steals of the 2003 season came in back-to-back games involving Anaheim and Texas.
On April 15th, at Texas, the Angels drew four walks and pounded out ten hits, including a triple and a pair of homers, for a total of 22 bases on hits and walks. Meanwhile, Jarrod Washburn and Brendan Donnelly held the Rangers to six hits (two doubles), three walks, and a hit batsman, for a total of 12 bases. But Texas won 5-4 because the Anaheim hits were scattered and much of the Texas action was crammed into a single five-run inning.
The tables were turned a day later. Both teams had 13 hits, allowed one hitter to reach on an error, and drew four walks. But the Rangers blasted four homers to none for the Angels. Add it all up and the Texas hitters accounted for 12 more bases. All that production went for naught, however, when Anaheim bunched their hits in a seven-run eighth inning that gave them an 8-7 win. This deficit of 12 bases was the season's largest for a winning team.
In the past ten years, only 17 games (of 22,334 that were played) have exhibited a larger deficit, topped by a pair of games in which the winner overcame an 18-base shortfall.
Winners and losers in 2003
With 7% of all wins going to a team that overcame a deficit of at least three bases, we'd expect each club to have about five wins and five losses of this type through the end of August.
And most did. Twenty-four teams had between 3 and 7 stolen wins, while twenty-six teams gave away between 3 and 7 losses of this type.
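The "about five" expectation follows from the 7% rate. A rough check, assuming each team had played about 135 games through the end of August (that game count is an assumption, not a figure from the study):

```python
STOLEN_RATE = 0.07   # share of games won despite a deficit of three or more bases
GAMES_PLAYED = 135   # assumed games per team through the end of August

# Every stolen game has one winner and one loser, so a typical team
# should be on each side of about half of the stolen games it plays in.
expected_stolen_games = STOLEN_RATE * GAMES_PLAYED
expected_stolen_wins = expected_stolen_games / 2
expected_stolen_losses = expected_stolen_games / 2

print(f"expected stolen wins:   {expected_stolen_wins:.1f}")
print(f"expected stolen losses: {expected_stolen_losses:.1f}")
```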
The Cincinnati Reds were far and away the biggest winners in the 2003 stolen-game sweepstakes. Twelve times the Reds picked up a victory in a game in which they were outproduced by at least three bases. Only once did they lose a game in this fashion. That's why they were able to hang around .500 for a few months despite having the worst run margin and the worst TBW differential in the NL.
Montreal has also improved its standing by winning eight and losing only three of these games. But, like the Reds, the Expos faded after a promising start and are no longer serious contenders for a postseason berth.
Three teams have lost more than their share of these games, but two of them are Detroit (five more stolen losses than wins) and San Diego (six more). Nothing of great importance there, at least in terms of postseason implications.
And then there are the Boston Red Sox. Only three teams had more than seven stolen losses, and Boston heads that list with twelve. With only four stolen wins to their credit, Boston has lost eight more stolen games than they've won, easily the worst imbalance in the majors.
In case you want to check out the boxscores and game logs, here are the games:
Date   Opp    Bases   Score  Comment
5/11   @Min   31-24    8-9   Rally from 8-0 deficit falls short
5/21   NYY    17-13    2-4
5/31   @Tor   27-23    7-10  Five Tor hits bunched in 5-run sixth
6/10   StL    32-29    7-9
6/12   StL    37-31    7-8   Nixon leaves bases loaded four times
6/28   Flo    33-22    9-10  Marlins score four each in 8th and 9th
7/3    @Tam   26-20    5-6
7/25   NYY    19-16    2-4
8/8    Bal    24-20    4-10  O's get 6 of their 13 hits in 7-run inning
8/8    Bal    15-11    2-4
8/10   Bal    23-17    3-5
8/20   Oak    28-19    6-8   Boston strands 17 runners
The four-game series against Baltimore in early August was particularly disheartening for Boston fans. The home team outproduced the O's in every game but still managed to lose the series three games to one.
In 17 of the 19 games against the Yankees (including the three games in September), the more productive side emerged victorious. But both of the stolen games went in New York's favor. So the season series, won 10-9 by the Yankees, turned on these stolen games.
Remember that these twelve losses were in games where Boston outproduced their opponents by at least three bases. They also lost five games in which they had an edge of one or two bases, and their overall record in these games was 5-17. That's a very big deal.
This isn't the only statistical evidence to support the idea that Boston hasn't taken full advantage of its opportunities this year. Their run margin is right up there with Seattle's for the league lead. And their TBW differential (+539 through 9/14) is far better than New York's (+403).
In fact, the Sox are on pace to post the fifth-best TBW differential in the past thirty years. The only teams ahead of them on that list are the 1998 Braves (who finished with a 106-56 record), the 1998 Yankees (114-48), the 2001 Mariners (116-46), and the 1995 Indians (100-44 in a shorter season). That's great company. In other words, this Boston team is a statistical juggernaut that should be leading the league in wins.
Note: These measures of team performance exclude stolen bases. Looking over the boxscores for the dozen games listed above, I found only one game where steals might have made the difference. Boston has been a good running team this season, and I don't believe the conclusions would have changed if we had included stolen bases in our measure of a team's performance in a game.
The outlook for the Red Sox
After they blew the August 20th game against Oakland, I thought the Red Sox were done. Time after time, they had been able to bounce back from tough losses, and they've earned a lot of praise for being a resilient team. But you can only dig a hole and climb out of it so often, and I thought they might have used up their quota.
To their credit, they won the series finale against Oakland, swept the Mariners at home, and took two of three from New York in Yankee Stadium the next weekend. During the toughest part of the schedule, they played their best baseball of the season.
So they're still in position to be playing in October. But had the Red Sox been able to play .500 ball in these stolen games, their magic number for clinching a playoff spot would be in the low single digits right now. Instead, they're fighting tooth and nail just to get in.
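The size of that gap is simple arithmetic, sketched below in Python. It assumes the 5-17 record quoted earlier is Boston's record in stolen games; the names `ACTUAL_WINS`, `ACTUAL_LOSSES`, and `wins_at_pct` are hypothetical and exist only for this example.

```python
# Assumption: 5-17 is Boston's record in stolen games, as quoted earlier.
ACTUAL_WINS, ACTUAL_LOSSES = 5, 17


def wins_at_pct(games, pct):
    """Wins a team would have in `games` games at winning percentage `pct`."""
    return games * pct


games = ACTUAL_WINS + ACTUAL_LOSSES
even_wins = wins_at_pct(games, 0.5)
extra_wins = even_wins - ACTUAL_WINS

print(f"stolen games played: {games}")
print(f"wins at .500: {even_wins:.0f} (actual: {ACTUAL_WINS})")
print(f"additional wins: {extra_wins:.0f}")
```

Since each additional win lowers a team's magic number by one, the extra wins translate directly into a smaller number.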
Does their poor record in these games point to a weakness in the makeup of this team? Or was it just a run of bad luck? I don't know the answer to these questions. I can say that the Red Sox are a very strong team statistically, and if they can put all of this behind them and start posting a win-loss record that is consistent with their production, they can be very dangerous in October.
That, of course, is a very big if. And if Boston doesn't make the playoffs, you can bet that New England is in for a winter full of hot-stove conversations about how the Yankees "know how to win" and how the local nine is missing something.
- Jim Wheeler
- Tags: Measuring team performance
2015 Debut Players
Some of you might be anxious to get started on your pre-draft planning, so we've put together this set of stats for the players who made their major league debuts in 2015. If your league has a rookie draft, these are the players who'll be available, listed alphabetically.
|Jung Ho Kang||29756||PIT||0.287||126||421||121||24||2||15||60||58||17||28||99||5|
|Abel de los Santos||30201||WAS||2||0||0||0||0||5.40||1.7||2||1||1||1||3||1|
|Jose De Paula||29595||NYA||1||0||0||0||0||2.70||3.3||2||1||1||4||2||1|
|Chi Chi Gonzalez||29674||TEX||14||10||4||6||0||3.90||67||49||33||29||32||30||6|
|Sugar Ray Marimon||29812||ATL||16||0||0||1||0||7.36||26||30||21||21||14||14||3|
- Jim Wheeler
- Tags: Stats for Debut Players