Baseball Articles — Baseball Research

Can pitchers prevent hits on balls in play?

Tom Tippett
July 21, 2003

In January, 2001, Voros McCracken published an article that shook the baseball analysis community.

In an attempt to better understand how to separate the contributions of pitching and defense, McCracken divided the traditional pitching stats into two groups -- those that are under the direct control of the pitcher (hit batsmen, walks, strikeouts, homers) and those that aren't (hits on balls in play). He called the first group defense-independent pitching stats, or DIPS for short.

I'll get into the details shortly, but before I do, the reason McCracken's work caused such a stir is that he reached a conclusion that seems very counter-intuitive and, if true, extremely important. In his own words, he stated his major finding in these two ways, once at the beginning of the article and once at the end:

"hits allowed are not particularly meaningful in the evaluation of pitchers"

"major-league pitchers don't appear to have the ability to prevent hits on balls in play"

McCracken wasn't able to give a reason why this would be true, but stated rather emphatically that it is true.

Ever since I read that article, I've been wondering how this could possibly be. It seems so obvious that certain pitchers must be able to get more than their share of easy outs. Doesn't Greg Maddux produce more than his share of routine ground balls? Doesn't Mariano Rivera's cutter eat up opposing hitters even when they don't strike out? Doesn't a flame-thrower like Roger Clemens induce a lot of weak swings from hitters who are down in the count? Wouldn't a knuckleball lead to more lazy popups from hitters who are just guessing at where that pitch will dance next?

McCracken's analysis used a stat that I'll call in-play average (or IPAvg), which he defined as (H - HR) / (BF - HR - HBP - BB - K). That's just non-homer hits divided by balls in play, and because all but a handful of homers leave the yard, it's a good reflection of how well pitchers and defenses are able to turn batted balls (that stay in the field of play) into outs.
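For readers who prefer to see the formula as code, here is a minimal Python sketch of IPAvg. The function name and the sample stat line are made up for illustration; only the formula itself comes from the definition above.

```python
def in_play_average(h, hr, bf, hbp, bb, k):
    """Non-homer hits divided by balls in play: (H - HR) / (BF - HR - HBP - BB - K)."""
    balls_in_play = bf - hr - hbp - bb - k
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (h - hr) / balls_in_play


# Made-up season line, for illustration only (not a real pitcher).
print(round(in_play_average(h=200, hr=20, bf=900, hbp=5, bb=60, k=150), 3))  # 0.271
```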

He found that:

  • there are "massive differences in the ability of pitchers" even before considering balls in play. To put it another way, a lot of a pitcher's ERA is explained by his walk rate, strikeout rate, and ability to prevent homers.
  • the correlation between a pitcher's IPAvg one year and the next is low, suggesting that pitching ability might not have a major impact on IPAvg, as compared to other factors such as defense and luck
  • some of the best pitchers in the game, such as Greg Maddux and Pedro Martinez, have gone from the top to the bottom and back to the top in IPAvg in subsequent seasons, again suggesting that these results are largely out of their control
  • the variations in IPAvg decrease when you add park effects and the quality of the defense to the analysis
  • projections of next-year pitching stats are more accurate if you use a team's collective IPAvg than if you use each pitcher's personal IPAvg from the year before

My reaction was to think that McCracken was on to something but may have gone too far, so I began to think about how to dig a little deeper.

McCracken appears to have done most of his work using stats from two seasons. I wasn't sure whether those two seasons were representative or not, so I decided to apply his method to all pitcher-seasons since 1913. Why 1913? Because that's the first year my historical database has all of the stats needed to compute IPAvg and the DIPS for every pitcher. And I figured that 90 years would be more than enough to prove the point one way or the other.

After compiling this information and studying it for a while, I discovered a pair of columns by Rob Neyer of ESPN.com. In the first column, Rob described the McCracken article. In the second one, which appeared a couple of days later, Rob included email messages from Craig Wright and Bill James with their take on McCracken's assertion.

Wright described his own work in this area:

"Like McCracken, I've studied hits allowed per ball in play (though with the small difference that I subtract sacrifice hits) ... I agree that this type of hit rate is not as heavily influenced by the pitcher as is commonly believed, but at the same time I am distinctly uncomfortable with McCracken's conclusion."

James wrote that he hadn't studied this issue, but that he shared Wright's reservations and suggested that someone do a large-scale study to find out whether the idea would hold up. It appears that the work I had just finished doing was exactly what Bill was proposing.

In addition, Bill wrote about McCracken's work in the New Bill James Historical Baseball Abstract. Based on a review of an unspecified number of pitching careers and about 400 pitcher-seasons, he concluded that pitchers do have an influence on these outcomes but confirmed McCracken's finding that there's still a lot of random variation in single-season performances.

Finally, in recent months, I've seen more and more references to McCracken's assertion in various baseball articles and posts to baseball research forums. There's enough momentum building behind this idea that a few of our customers have asked how we might change the design of our Diamond Mind Baseball game to reflect this new knowledge about how baseball works.

Before making any changes to our game or our method for projecting player performance, I figured it was worth spending some time looking at this question.

NOTE: In an article published on Baseball Primer last year, McCracken softened his original conclusion a little, saying that there are small differences among pitchers in their ability to prevent hits on balls in play, and those differences are "statistically significant if generally not very relevant." Except for the regulars on Baseball Primer, I don't think many people in the baseball research community are aware of this update to McCracken's thinking.

The methodology

For every pitcher who appeared in the big leagues since 1913, I computed his HBP rate, walk rate, strikeout rate, homerun rate, and IPAvg for each of his seasons. The first four numbers are computed quite simply -- take the relevant stat and divide by batters faced. The IPAvg figures were computed according to McCracken's formula, which I wrote out a few paragraphs back.

To establish a baseline against which to evaluate those figures, I also computed those stats for each league-season and each team-season since 1913.

This enables us to evaluate every pitcher relative to the norms for his league. Last year, for example, Roger Clemens faced 768 batters and fanned 192 of them. That's a strikeout rate of .250 in a league where the average was only .163. His advantage over the league can be stated in two ways: (a) his rate was .087 higher than the league, and (b) he had 67 more strikeouts than the league-average pitcher would have had if he faced the same number of batters as Clemens. The same method was used to determine how many hit batsmen, walks, and homeruns each pitcher yielded above or below the league average.
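The Clemens arithmetic can be reproduced with a few lines of Python. This is only a sketch of the calculation described above; the function name is an assumption, and the inputs are the three figures quoted in the text.

```python
def rate_vs_league(count, batters_faced, league_rate):
    """Return (pitcher rate, rate above league, count above league-average pitcher)."""
    rate = count / batters_faced
    net_rate = rate - league_rate
    # Events beyond what a league-average pitcher would record against the same batters.
    net_count = count - league_rate * batters_faced
    return rate, net_rate, net_count


# Clemens: 192 strikeouts in 768 batters faced, league strikeout rate .163.
rate, net_rate, net_count = rate_vs_league(192, 768, 0.163)
print(f"{rate:.3f} {net_rate:+.3f} {net_count:+.0f}")  # 0.250 +0.087 +67
```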

For balls in play, I compared the in-play batting average for each pitcher and subtracted from that the corresponding in-play batting average for the league. As was the case with strikeouts, the result can be expressed either as a number of batting average points above/below the league or a number of hits above/below the league.

But hits on balls in play are subject to some outside influences that make comparisons with the league average a little suspect. Some parks (like Coors Field) tend to inflate batting averages. Some defenses are much better than others. If Jamie Moyer allows 15 fewer hits than normal, how can we decide whether to give Moyer the credit or chalk it up to Safeco Field and the talents of Mike Cameron and Ichiro?

To account for the effects of park and defense, I also computed the in-play average for each team-season in the period from 1913 to 2002. If McCracken is correct when he says that pitchers have virtually no influence over these outcomes, every pitcher on a given team should have roughly the same IPAvg. After all, those pitchers share a common park and a common defense.

If we then (a) compute the IPAvg for each team, (b) compare the IPAvg for each pitcher to that of his team, and (c) study those differences, we should find that the differences in IPAvg between a pitcher and his teammates are random. In other words, those differences should be centered around zero, equally likely to be above zero as below zero, and have no predictive value from one year to the next.

If we find that these differences are not random, there must be another factor, apart from defense and park effects, that accounts for them. And it follows that the missing factor must be an attribute of the pitcher. Because if the pitcher had nothing to do with it, there'd be no reason for that external factor to be evident only for this pitcher.

Studying career totals

At this stage of the process, we now know how much a pitcher exceeded or fell short of his league in five categories -- HBP, BB, K, HR and hits on balls in play -- for every season of his career. And we also know how much a pitcher exceeded or fell short of his teammates on in-play hits for every season of his career. The last step is to sum these values to obtain career totals (from 1913 forward) for every pitcher.
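As a rough sketch of that last step, the Python below computes a pitcher's net in-play hits against his team's IPAvg for each season and sums them into a career total. The season figures are invented for illustration, and the tuple layout is an assumption rather than the structure of the actual database.

```python
def career_net_in_play_hits(seasons):
    """seasons: iterable of (hits_in_play, balls_in_play, team_ipavg).

    Returns (career net hits, career balls in play, net hits per ball in play).
    Negative values mean fewer hits allowed than the team baseline predicts.
    """
    net_hits = 0.0
    total_bip = 0
    for hits_in_play, balls_in_play, team_ipavg in seasons:
        expected_hits = team_ipavg * balls_in_play
        net_hits += hits_in_play - expected_hits
        total_bip += balls_in_play
    if total_bip == 0:
        raise ValueError("no balls in play")
    return net_hits, total_bip, net_hits / total_bip


# Three hypothetical seasons (not a real pitcher).
seasons = [(160, 640, 0.280), (150, 560, 0.290), (170, 600, 0.275)]
net, bip, rate = career_net_in_play_hits(seasons)
print(f"{net:+.1f} hits over {bip} balls in play ({rate:+.4f})")
# -26.6 hits over 1800 balls in play (-0.0148)
```

The same function works for the league comparison if the league IPAvg is passed in place of the team figure.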

McCracken asserted that pitchers have a lot of control over the defense-independent pitching stats, so I would expect to see substantial differences among pitchers in their career HBP, walk, strikeout, and homerun rates, even after normalizing all of these figures against the league averages for each season.

After crunching the numbers for a total of 29,973 seasons by 6,004 pitchers, we did indeed find very large differences among pitchers in some of the defense-independent statistics, especially walks and strikeouts. That's not likely to surprise any of you. It didn't surprise me, and it's entirely consistent with McCracken's findings.

More importantly, McCracken asserted that pitchers have almost no control over balls in play. If he's right, we would expect to see essentially random values for the career rates of in-play hits, especially for net in-play hits relative to the team baseline.

 

But we also found meaningful differences in the number of hits allowed on balls in play. In other words, a large number of pitchers consistently demonstrated the ability to limit the number of those hits. Their influence on these outcomes isn't as great as it is on the defense-independent stats, but it is real, and it is large enough to be important.

Here's a partial list of the top pitchers based on the number of career hits they saved relative to the IPAvg of their teams. The list includes two figures for each pitcher, the first without adjustments for park and defense and the second with those adjustments:

Pitcher            IPHits vsLg  IPHits vsTm
-----------------  -----------  -----------
Charlie Hough          -371         -299
Walter Johnson         -277*        -214*
Tom Seaver             -269         -201
Catfish Hunter         -296         -185
Warren Spahn           -266         -183
Fergie Jenkins         -128         -182
Pete Alexander         -197*        -177*
Phil Niekro            -147         -172
Jim Palmer             -315         -170
Ned Garver              -71         -168

* excludes seasons before 1913

Charlie Hough has prevented more hits on balls in play than any other pitcher in our study, and our sample includes the last ninety years, so we've covered most of baseball history. Compared with the league-average pitcher, Hough has allowed 371 fewer hits on balls in play. Compared with his teammates, that figure drops to 299 hits, suggesting that his parks and defenses deserve some of the credit.

How important is 299 hits? Hough would have given up an extra run every three games or so if he had allowed hits on balls in play at the same rate as his teammates over the course of his career. That's a pretty big deal.

Could this happen by chance? No, it couldn't. Hough allowed batters to put 11,586 balls in play over the course of his career. If these results were random, there'd be a 95% chance that his net hits allowed would fall between +93 and -93 and a 99% chance they would fall between +116 and -116. The probability that a pitcher could reduce hits by 299 totally by chance is exceedingly small. (For the statisticians among you, Hough was more than six standard deviations from the mean.)
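The article does not spell out how those ranges were derived, so the following Python is only a back-of-the-envelope check under a simple binomial model. The in-play average of .280 is an assumed round number, not a figure from the study, and the resulting ranges will not necessarily match the ones quoted above.

```python
import math


def binomial_sd(balls_in_play, hit_prob):
    """Standard deviation of the hit count if every ball in play is an
    independent trial with the same hit probability."""
    return math.sqrt(balls_in_play * hit_prob * (1 - hit_prob))


bip = 11_586           # Hough's career balls in play, from the text
net_hits = -299        # his hits saved relative to his teammates, from the text
assumed_ipavg = 0.280  # ASSUMPTION: a typical in-play average, not from the study

sd = binomial_sd(bip, assumed_ipavg)
z = net_hits / sd
print(f"sd = {sd:.1f} hits, z = {z:.1f}")  # sd = 48.3 hits, z = -6.2
```

Under these assumptions the result is consistent with the "more than six standard deviations" remark.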

And Hough wasn't the only one, not by a long shot. In a sample of 351 pitchers with at least 6000 career balls in play, more than 12% of them posted results that would happen less than 1% of the time by chance. And that understates the case, too, because you get to keep pitching if you're that much better than the league, but you usually don't make it to 6000 balls in play if you're that much worse than the league. If one end of the distribution hadn't been truncated by job losses, approximately 20% of those pitchers would have fallen outside the range that can be explained by chance.

There are two knuckleballers on this list, and while you can't see it here, I can tell you that if I had run this list a little further, you'd have seen six knuckleballers in the top 35. (The other four are Eddie Rommel, Ted Lyons, Hoyt Wilhelm and Tim Wakefield.)

NOTE: The observation that knuckleball pitchers are especially good in this area is not new. Craig Wright noted the same thing in his email to Rob Neyer in January, 2001, and McCracken made this point in an article on Baseball Primer last year.

 

Some pitchers got a lot of help from their defense and park -- almost half of Jim Palmer's hits saved can be attributed to his defense (mostly) and his park -- while others look even better after the defense/park adjustment.

Of course, when you rank players based on counts, rather than averages, you're going to see a lot of guys with very long careers at the top of the list. So let's rank them again, this time dividing career hits saved by career balls in play, and setting a minimum of 5000 balls in play:

Pitcher            IPAvg vs Lg  IPAvg vs Tm
-----------------  -----------  -----------
Charlie Hough         -.032       -.026
Don Wilson            -.015       -.023
Andy Messersmith      -.033       -.021
Ned Garver            -.008       -.020
Tim Wakefield         -.020       -.019
Catfish Hunter        -.028       -.017
Bud Black             -.020       -.017
Oral Hildebrand       -.015       -.017
Walter Johnson        -.021       -.016
Dave Stieb            -.022       -.016

Hough remains the career leader by holding enemy hitters to an in-play batting average that was 26 points lower than that of the pitchers on his teams. That's a very substantial advantage, and one that is entirely inconsistent with McCracken's conclusion.
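The rate version of the ranking is just career hits saved divided by career balls in play, with a playing-time cutoff. A small Python sketch follows; the function name is an assumption, and the inputs are Hough's career figures quoted in this article.

```python
def career_ipavg_diff(net_hits, balls_in_play, min_bip=5000):
    """Net in-play hits per ball in play, or None below the playing-time cutoff."""
    if balls_in_play < min_bip:
        return None
    return net_hits / balls_in_play


hough_bip = 11_586
print(round(career_ipavg_diff(-371, hough_bip), 3))  # -0.032 (vs league)
print(round(career_ipavg_diff(-299, hough_bip), 3))  # -0.026 (vs team)
```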

To recap, this examination of career totals suggests very strongly that a meaningful number of pitchers have demonstrated the ability to reduce the rate of hits on balls in play.

Year-to-year variations, part one

By comparing the results for two seasons, McCracken concluded that "there is little correlation between what a pitcher does one year in the stat and what he will do the next." I'll start by looking at a few of the pitchers mentioned in the McCracken article, then expand the study and get a little more scientific.

McCracken pointed out that Greg Maddux had one of the best marks in baseball in 1998, then had one of the worst in 1999, and bounced back with a good in-play average in 2000. The following chart shows his entire career, with bars going up indicating an IPAvg that was worse than average and the bars going down indicating a lower-than-average rate of hits on balls in play:

The wild swings of 1998-2000 look like an anomaly when you examine Maddux's entire career. In fact, it appears that he struggled a bit as a youngster, reeled off a decade of good-to-great performances, then began to lose it as he got into his mid-30s. That sounds like a pretty normal career progression to me.

Pedro Martinez was another pitcher who gave up a lot of in-play hits in 1999 but bounced back in 2000. It should be noted that Pedro had a 2.07 ERA despite all those in-play hits in 1999, so we can only imagine what he would have done if he'd been a little less unlucky. Here's Pedro's career:

There's really only one bad year in this line, but it happened to fall in one of the years McCracken looked at. I think it's fair to say that Pedro has shown an above-average ability to prevent hits on balls in play, but his influence on these results is much less than on strikeouts, where he consistently mowed down an extra 90 or more hitters a year, and an incredible 181 more than average in 1999.

McCracken wrote that "You'll often hear people use names like Randy Johnson, Jamie Moyer and Andy Pettitte [as being very good at preventing hits on balls in play], but by any definition you want to use, these guys are not particularly good in the stat." Here's Moyer's career:

Moyer wasn't very good in this respect, or in most other respects, for the first half of his career. But he figured something out in 1996 and has been consistently better than the league ever since, with the exception of 2000. If I were McCracken and I were looking at the 1999 and 2000 seasons, I would have concluded that Moyer isn't particularly effective in preventing hits, but his last seven years say otherwise.

By the way, it's tempting to assume that Safeco Field and a very good Seattle defense are responsible for these recent successes, but that wouldn't be true. First of all, the 1996-1999 numbers were accumulated in a mix of Fenway Park, the Kingdome, and Safeco, with only the second half of 1999 in Safeco. More importantly, these numbers are relative to the in-play average for his teams, so they already factor out the impact of the park and the defense. The bottom line is that Jamie Moyer has been a master at preventing hits on balls in play since 1996.

How about Andy Pettitte? Here's his career:

McCracken was quite correct in pointing out that Pettitte is not a pitcher who prevents hits on balls in play. On the other hand, he's a very good counter-example regarding the claim that pitchers are not consistent in this regard.

Randy Johnson is the third pitcher mentioned by McCracken in the quote I cited above. Here's how Johnson has fared on balls in play over his career:

That's nine straight seasons at or better than the league average, followed by five seasons that were league-average or worse. The shift occurred at the very moment that he moved from the AL to the NL. I'm not sure whether that's meaningful, or whether it has more to do with the fact that he turned 35 in 1998. As with Pedro, Johnson's main asset is not his ability to prevent hits on balls in play; it's his ability to prevent balls in play in the first place. But Johnson was pretty good on those balls in play for nine years.

McCracken also claimed that "Randy Johnson gives up fewer hits than Scott Karl. That's not because batters hit the ball harder off Karl than Johnson, but because they hit the ball more often off Karl than Johnson." Here's Karl's career:

You might be able to make the case that Karl in his prime wasn't any worse than Randy Johnson in his late 30s, but if you compare the two pitchers at the same age, there's a noticeable edge for Johnson.

While we're on the subject of consistency from year to year, let's take a look at some of the knuckleballers, starting with Charlie Hough:

This chart is a little misleading in one respect. There are two bars for 1980, one for each of the teams he played for that year. Hough's IPAvg was awful in his 32 innings with the Dodgers and quite good in his 61 innings with Texas. Overall, he was a little worse than average for the year. The bottom line is that Hough was remarkably good at preventing hits on balls in play for a very long time.

Here's another knuckleballer, Tim Wakefield:

And a third knuckleballer, Phil Niekro:

Hough and Wakefield were remarkably good throughout their careers, and if you ignore the years after his 43rd birthday (1983 to the end), you could say the same about Niekro, too.

Number two on the all-time list was Walter Johnson, whose career looked like this:

Remember, I cut things off at 1913, so this leaves out his early years. It's quite possible that he would have been the all-time leader if those seasons had been included.

Sandy Koufax got some help from Dodger Stadium, but that wasn't the only reason he was so dominant during the last five years of his career. Even with the park and defense factored out, his IPAvg was consistently good during those years:

Finally, here's Jim Palmer, another Hall-of-Famer who was consistently good on balls in play during his career, except for the very beginning and end of his time in the big leagues:

If I had run Palmer's chart showing his performance relative to the league average (instead of his team), it would have been twice as impressive.

We could go on and do a lot more pitchers, but I think we've seen enough to make the point that it's not too hard to find examples where these in-play averages appear to be anything but random. In other words, this is highly persuasive evidence that these pitchers did indeed have the ability to prevent hits on balls in play.

Year-to-year variations, part two

It goes without saying that one cannot prove or disprove the idea that "there is little correlation between what a pitcher does one year in the stat and what he will do the next" by examining only ten or twelve careers.

To get a better handle on this phenomenon, I compiled a database consisting of all pairs of consecutive seasons in which a pitcher faced at least 400 batters in each season. Using this sample of 7,486 season-pairs, I computed the correlation coefficient for the net HBP rate, BB rate, K rate, HR rate, and in-play hit rate.

I found the highest correlation (.73) for strikeout rates. Walk rates (.66) were also highly correlated. The correlation coefficients dropped to .36 for hit batsmen, .29 for homeruns, and .16 for in-play batting average relative to the league. The lowest correlation (.09) was seen for in-play batting average relative to the team.
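For anyone who wants to repeat this kind of test on their own data, here is a minimal Python sketch of the two steps: building consecutive-season pairs that meet the 400-batter cutoff, then computing a Pearson correlation coefficient. The data layout, function names, and sample seasons are all assumptions for illustration; the sketch also assumes one row per pitcher-year, so split seasons would need to be combined first.

```python
import math


def season_pairs(seasons, min_bf=400):
    """seasons: list of (year, batters_faced, net_rate) for ONE pitcher.

    Returns (rate in year N, rate in year N+1) for consecutive years in which
    the pitcher met the batters-faced cutoff both times.
    """
    by_year = {year: (bf, rate) for year, bf, rate in seasons}
    pairs = []
    for year, (bf, rate) in sorted(by_year.items()):
        following = by_year.get(year + 1)
        if following and bf >= min_bf and following[0] >= min_bf:
            pairs.append((rate, following[1]))
    return pairs


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant data")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical pitcher: the 1992 season (300 BF) breaks the chain of pairs.
example = [(1990, 800, 0.05), (1991, 750, 0.04), (1992, 300, 0.10),
           (1993, 820, 0.03), (1994, 790, 0.02)]
print(season_pairs(example))  # [(0.05, 0.04), (0.03, 0.02)]
```

In a full study, the pairs from every pitcher would be pooled into two long lists before calling `pearson_r`.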

It may appear to be contradictory to say that certain pitchers appear to be consistently good while the overall correlation rate is quite low. But that's not necessarily so.

If McCracken is right, the difference between a pitcher's IPAvg and that of his team should vary randomly around zero as he moves through his career, and the correlation would be quite weak.

But if pitchers do have some influence over these outcomes, they could still exhibit a weak correlation by varying around some value other than zero that reflects the ability of the pitcher.
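A small simulation can illustrate how real differences in ability and a weak year-to-year correlation can coexist. Everything in this Python sketch is an assumption chosen for illustration (the spread of true talent, 500 balls in play per season, a .280 league in-play average, and a normal approximation to the season-to-season luck); none of it is estimated from the study's data.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_year_pairs(n_pitchers=2000, talent_sd=0.008, bip_per_season=500,
                        league_ipavg=0.280, seed=1):
    """Each simulated pitcher has a fixed 'true' IPAvg offset (his talent);
    each season adds independent luck sized like binomial noise."""
    rng = random.Random(seed)
    noise_sd = math.sqrt(league_ipavg * (1 - league_ipavg) / bip_per_season)
    year1, year2 = [], []
    for _ in range(n_pitchers):
        talent = rng.gauss(0.0, talent_sd)
        year1.append(talent + rng.gauss(0.0, noise_sd))
        year2.append(talent + rng.gauss(0.0, noise_sd))
    return year1, year2


y1, y2 = simulate_year_pairs()
r = pearson_r(y1, y2)
print(f"year-to-year r = {r:.2f}")
```

With these assumed inputs, single-season luck (a standard deviation of about 20 points of IPAvg) swamps the talent spread of 8 points, so the expected correlation is only about .14 even though every simulated pitcher has a genuine, persistent ability level.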

What about the weaker pitchers?

Most of our work to this point has focused on pitchers who had long and mostly successful careers in the big leagues. How do the DIPS and IPAvg stats of these players compare to those of players who weren't good enough to last that long?

The following table shows how eleven groups of pitchers compared with the overall averages. The first row includes all pitchers who faced fewer than 1,000 batters in their careers. The second row includes all pitchers who faced at least 1,000 batters but fewer than 2,000 batters during their careers. And so on.

 Career BF          BF    HBP     BB      K     HR   vsLg   vsTm
   1 -  999    401,138   .002   .027  -.017   .002   .017   .015
1000 - 1999    931,981   .001   .013  -.009   .001   .006   .004
2000 - 2999  1,105,712   .001   .007  -.005   .000   .002   .001
3000 - 3999  1,179,916   .000   .006  -.003   .000   .000   .000
4000 - 4999    906,271   .000   .002  -.002   .000   .000   .001
5000 - 5999    920,680   .000   .001   .000   .000   .000   .000
6000 - 6999    647,553   .000  -.004  -.002   .001  -.001  -.001
7000 - 7999    843,937   .000  -.003   .000   .000  -.002  -.001
8000 - 8999    716,200  -.001  -.005   .005   .000  -.002  -.002
9000 - 9999    788,532   .000  -.008  -.001  -.001  -.002  -.001
10000+       2,589,409  -.001  -.010   .008  -.001  -.004  -.003

Let's walk through the first row so it's clear how to read this table. Those pitchers, as a group:

  • faced a total of 401,138 batters in their careers
  • hit batters at a rate that was .002 above the league average. In other words, they hit two more batters per 1000 BF than did the average pitcher.
  • walked 27 more batters per 1000 BF
  • struck out 17 fewer batters per 1000 BF
  • gave up 2 more homers per 1000 BF
  • gave up 17 more hits per 1000 balls in play when compared with the league-average pitcher
  • gave up 15 more hits per 1000 balls in play when compared with the in-play averages of their teammates
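The grouping itself is easy to sketch in Python: assign each pitcher to a career batters-faced bucket, then pool the net totals within each bucket. The function names and the three sample pitchers are invented, with the sample values deliberately chosen so that the short-career bucket lands on the .027 walk figure from the first row.

```python
from collections import defaultdict


def career_bucket(career_bf, width=1000, top=10_000):
    """Label the career batters-faced group a pitcher belongs to."""
    if career_bf >= top:
        return f"{top}+"
    low = (career_bf // width) * width
    return f"{max(low, 1)} - {low + width - 1}"


def group_net_rates(pitchers):
    """pitchers: iterable of (career_bf, career net count vs league).

    Returns {bucket label: pooled net count per batter faced}.
    """
    totals = defaultdict(lambda: [0, 0.0])
    for career_bf, net_count in pitchers:
        bucket = totals[career_bucket(career_bf)]
        bucket[0] += career_bf
        bucket[1] += net_count
    return {label: net / bf for label, (bf, net) in totals.items()}


# Made-up pitchers: (career BF, net walks vs league).
sample = [(400, 12), (600, 15), (12_000, -120)]
rates = group_net_rates(sample)
print({label: round(rate, 3) for label, rate in rates.items()})
# {'1 - 999': 0.027, '10000+': -0.01}
```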

As you can see from the table, the pitchers with longer careers were progressively better than their shorter-career counterparts in every respect. They walked fewer batters, struck out more hitters, gave up fewer homeruns, and gave up fewer hits on balls in play. The ability to prevent hits on balls in play appears to be as much of a skill as anything else.

It might be easier to see this in chart form, so here are the walk rate, strikeout rate, homerun rate, and in-play averages for these groups of pitchers:

Another interesting aspect of this breakdown by career length is the total number of batters faced by each group. Only a very small percentage of batters are faced by pitchers with short careers. Of the roughly 11 million plate appearances since 1913 (including the Federal League of 1914-15), only 3.6% featured pitchers who finished their careers with fewer than 1,000 batters faced.

In fact, the midpoint falls in the 6000-6999 group. A little more than half of the plate appearances since 1913 have been initiated by a pitcher who faced at least 6000 hitters in his career. We, along with other baseball analysts, often compare pitchers to the league average. Those league averages reflect the fact that the majority of plate appearances involve pitchers who are good enough to face thousands of big-league hitters.
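These shares can be checked directly from the BF column of the table. The Python below is just that arithmetic; the only inputs are the eleven batters-faced totals listed above, and the helper name is an assumption.

```python
bf_by_group = [
    ("1 - 999", 401_138), ("1000 - 1999", 931_981), ("2000 - 2999", 1_105_712),
    ("3000 - 3999", 1_179_916), ("4000 - 4999", 906_271), ("5000 - 5999", 920_680),
    ("6000 - 6999", 647_553), ("7000 - 7999", 843_937), ("8000 - 8999", 716_200),
    ("9000 - 9999", 788_532), ("10000+", 2_589_409),
]


def midpoint_group(groups):
    """Label of the group in which the running BF total first reaches half."""
    total = sum(bf for _, bf in groups)
    running = 0
    for label, bf in groups:
        running += bf
        if running * 2 >= total:
            return label
    raise ValueError("empty table")


total_bf = sum(bf for _, bf in bf_by_group)
short_share = bf_by_group[0][1] / total_bf
long_share = sum(bf for _, bf in bf_by_group[6:]) / total_bf  # 6000+ BF groups

print(f"{total_bf:,}")             # 11,031,329
print(f"{short_share:.1%}")        # 3.6%
print(midpoint_group(bf_by_group)) # 6000 - 6999
print(f"{long_share:.1%}")         # 50.6%
```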

That's a very high standard. And that may explain why it's difficult for any pitcher to consistently perform at a level higher than the league average. The table shows that the pitchers with the longest careers are only a little better than average. (They peak at a higher level, of course, but if you take their entire careers, there's not a huge difference.)

A better indicator may be the comparison of the short-career pitchers to the league averages. The chart shows that these marginal hurlers are far worse than the average in every way. In particular, they give up a lot more hits on balls in play than do the pitchers who are good enough to be big-league regulars for several years.

What's the right baseline?

At this point, we've seen (a) career totals that demonstrate that pitchers do influence these outcomes over the course of their careers, (b) several examples of pitchers who have been very consistent in IPAvg during their careers, and (c) evidence that pitchers with longer careers are better than pitchers with shorter careers in every respect, including IPAvg.

In other words, pitchers do affect the rate of hits on balls in play. That means we can no longer use the team's IPAvg as a baseline against which to evaluate a pitcher. McCracken asserted that the team's IPAvg depended only on the park and the defense, but we've found that it depends on the park, the defense, and the quality of the pitchers on that team. If we use team IPAvg as the baseline, a good pitcher on a good staff is going to look worse than he really is. A good pitcher on a bad staff is going to look better than he really is. A good pitcher on an average team is still going to look a little worse than he really is because his own good performance is included in the team's IPAvg.

That leads to a good question, one that is not easily resolved. Is it better to compare a pitcher's IPAvg to that of his league or his team? If we use the league IPAvg as our baseline, we leave out the impact of the park and the defense. If we use the team's IPAvg as the baseline, we adjust for the park and the defense, but we introduce the quality of the fellow pitchers as a variable that can skew the results.

Neither approach is completely satisfactory. It's probably best to evaluate each pitcher's IPAvg against that of his team but make some accommodation for the quality of the pitching staff before making any judgments about that pitcher and before making any predictions about future performance.
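One simple partial accommodation, offered here only as a sketch and not as the method used in this study, is to take the pitcher's own balls in play out of the team baseline so that his results are compared with those of his teammates alone. That removes the self-inclusion effect but does nothing about the quality of the rest of the staff. The numbers below are hypothetical.

```python
def rest_of_team_ipavg(team_hits_ip, team_bip, pitcher_hits_ip, pitcher_bip):
    """Team in-play average with one pitcher's balls in play removed."""
    rest_bip = team_bip - pitcher_bip
    if rest_bip <= 0:
        raise ValueError("pitcher accounts for all of the team's balls in play")
    return (team_hits_ip - pitcher_hits_ip) / rest_bip


# Hypothetical team: 1,260 in-play hits on 4,500 balls in play (.280).
# Hypothetical pitcher on that team: 160 in-play hits on 640 balls in play (.250).
team_hits, team_bip = 1260, 4500
p_hits, p_bip = 160, 640

pitcher_ipavg = p_hits / p_bip
vs_team = pitcher_ipavg - team_hits / team_bip
vs_rest = pitcher_ipavg - rest_of_team_ipavg(team_hits, team_bip, p_hits, p_bip)
print(f"{vs_team:+.3f} {vs_rest:+.3f}")  # -0.030 -0.035
```

In this made-up case the pitcher looks five points better against his teammates than against a team figure that includes his own work.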

Pitcher Profiles

In addition to ranking pitchers on IPAvg, this exercise provides a different way of looking at pitching careers. By putting each pitcher's career totals for net HBP, BB, K, HR, and IPHits side by side, we get a very clear picture of the reasons why they were successful.

Let's do a few, starting with Roger Clemens:

How's that for a picture of all-around greatness? Sure, he hit a few more batters than the average pitcher, but compared to the league averages, he walked 173 fewer and struck out 1,355 more, allowed 138 fewer homers, and surrendered 101 fewer hits on balls in play. (The IPHits figures include the defense/park adjustments for all of these profiles.)

Pedro Martinez shows a very similar pattern to that of Roger Clemens, but based on less than half of Clemens' batters faced.

 

Greg Maddux demonstrates awesome control, an above-average K rate, and the ability to keep the ball in the park. He had some influence on IPAvg, but that was only a part of his success.

By the way, some of those 69 hits saved might be attributable to his own defensive skill rather than his pitching skill. It's also quite possible that the -69 figure significantly understates his contribution. Maddux saved 97 hits relative to the league averages, and now that we've shown that the team IPAvg reflects the ability of the other pitchers on the staff, that figure may represent Maddux's talent more accurately.

This line shows only one dominating characteristic -- the strikeouts. But if you're going to dominate in one area, that's a good one, because they can't get a hit if they can't put the ball in play. Fortunately for Johnson, his control is only a little worse than the norm, and got better in the later stages of his career.

Guys with below-average strikeout rates aren't supposed to be successful, but Moyer's exceptional control and low IPAvg have been the keys, especially in the later stages of his career.

Now here's a guy who didn't strike anyone out and gave up a lot of hits on balls in play, but survived because he had excellent control and kept the ball in the park. In particular, he kept the ball on the ground, meaning that a lot of those extra hits were singles and that a good number of potential rallies were killed by double plays.

Tommy John's profile made me think that it would have been a good idea to extend McCracken's work to measure GDP rates, but that notion didn't hit me until it was too late. Some day, I'll go back and add that to the study and see what pops out.

We can't leave this section without looking at the all-time leader in in-play hits saved. As you can see, Hough hit more batters, walked more batters, struck out only a few more batters, and gave up more homers than the average pitcher. His ability to prevent hits on balls in play is the biggest reason he had a long and successful career.

Is there really any doubt that Don Sutton is a Hall-of-Famer when you look at this profile?

Groups of similar pitchers

We could go on forever this way, so let's speed things up by looking at groups of pitchers with similar styles. Maybe we'll see some patterns.

Power pitchers

                   HBP    BB      K    HR   IPHits
Nolan Ryan         +44  +878  +2578  -117    -133
Randy Johnson      +44  +107  +1769   -52     -10
Roger Clemens      +17  -173  +1355  -138    -101
Dazzy Vance        +19   -65  +1122   -20     -19
Steve Carlton      -49    -1  +1042    +5     -31
Bob Feller          -1  +149  +1022   -42     -53
Sandy Koufax       -35   +64  +1015   -12     -94
Pedro Martinez     +27  -152   +974   -60     -47

Obviously, the defining characteristic of these pitchers was their ability to retire batters without help from anyone else. As a group, with the exception of Ryan, they had average control. All of them were better than average on hits per ball in play, but that wasn't the main reason for their success.

Closers

                   HBP    BB      K    HR   IPHits

Rich Gossage       +10   +90   +492   -31     -57

Lee Smith          -17   +25   +447   -21     +12

Tom Henke          -10   -24   +391   -12     -20

Rollie Fingers      +4  -109   +358   -16     +12

Armando Benitez     -3   +78   +332    +1     -41

Trevor Hoffman     -17   -34   +317    -8     -49

John Wetteland      -5   -25   +310    -5     -39

Billy Wagner         0   +17   +295    -6     -23

Robb Nen           -18    -3   +283   -27     +12

Troy Percival        0   +30   +279    -9     -55

Bruce Sutter        -4   -48   +269    +1     -54

This is just a special case of the power pitchers group, but it's interesting to see how many of these guys have posted impressive IPHits numbers even though they pitch many fewer innings than do the power pitchers in the previous table.

Control freaks

                   HBP    BB      K    HR   IPHits

Robin Roberts      -40  -772    -15   +56     -82

Pete Alexander     -50  -570   +247    -1    -177

Jim Kaat           +19  -566   -264    -4    +144

Ferguson Jenkins   -12  -534   +635  +125    -182

Greg Maddux         -4  -507   +150  -147     -69

Ted Lyons          -43  -481   -366    -7    -121

Dutch Leonard       +7  -477   -100   -53     -64

Don Sutton         -25  -476   +512   +42    -138

Lew Burdette        -9  -445   -611   -13     +32

Walter Johnson     +25  -442   +847   -20    -214

Some of these guys (Roberts, Jenkins, and Sutton) gave up more than their share of homers, but with control this good, plus the ability to reduce hits on balls in play, a lot of those homers were solo shots.

Crafty lefties

                   HBP    BB      K    HR   IPHits

Warren Spahn       -63  -437    -36   -44    -183

Bud Black           +2  -110   -204   +23    -114

Randy Jones        -18  -189   -346   -13     -97

Wilbur Wood          0  -238   -135   -13     -84

John Tudor          -3  -146    -50    +5     -82

Kenny Rogers        +3   -39   -105   -40     -74

Larry Gura          +4  -127   -276   +21     -72

Jim Deshaies        -7   +21    -34   +44     -72

Jamie Moyer          0  -238   -153   +15     -65

Don Carman          +4   +44     -4   +36     -65

This is a list of left-handed pitchers with below-average strikeout rates. Most had very good control, but six of them were at least as susceptible to the long ball as the average pitcher. A significant part of their success is/was the ability to keep hitters off balance and keep their in-play batting averages down.

Putting the pieces together

We've seen that there's more than one way to succeed as a big-league pitcher. Robin Roberts walked 772 fewer batters than his peers. Roger Clemens struck out 1355 more batters than average. Greg Maddux yielded 147 fewer homeruns. And Charlie Hough prevented somewhere between 299 and 371 hits on balls in play.

So what's the most important element of a pitcher's repertoire?

Well, the value of various baseball events depends on the era. When scoring is up, as it has been in recent years, an extra baserunner comes around to score more often than during a period like the 1960s. In The Hidden Game of Baseball, Pete Palmer provided a table of run values for various periods in the 20th century, and I'll use those values to evaluate these events.

Palmer puts the value of a walk at about a third of a run, so the 772 walks saved by Robin Roberts are worth about 250 runs over the course of a career. That's not bad.

Clemens struck out 1355 more batters, but if he hadn't, some of those batters would have reached base, and some would have been retired in other ways. If his strikeout rate had been at the league average, it's possible that he would have allowed another 125 walks, 35 homers, and 320 more hits on balls in play. Using Palmer's run values and reasonable assumptions about the distribution of those hits among singles, doubles, and triples, those strikeouts are worth about 250-280 runs.

Palmer puts the value of a homer at about 1.4 runs, so Maddux saved about 200 runs by keeping his homerun rate down.

And the 300+ hits saved by Hough are worth about 150-175 runs.

Those are impressive figures, and they'd be even more impressive if we were evaluating them against replacement level pitchers instead of the league average. As we noted before, the league average is a very high standard.
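As a rough check on the arithmetic above, here is a minimal Python sketch. The walk and homer values are the ones quoted from Palmer; the per-hit range is an assumption, backed out of the Hough estimate (300 hits saved worth about 150-175 runs) rather than taken from Palmer's table, and the names are illustrative only. It also treats a strikeout as equal in value to any other out, which is a simplification.

```python
# Run values quoted in the article (Palmer, The Hidden Game of Baseball).
WALK_RUNS = 1 / 3    # "about a third of a run"
HOMER_RUNS = 1.4     # "about 1.4 runs"

# ASSUMPTION: value of a hit on a ball in play, implied by the Hough
# estimate (300 hits ~ 150-175 runs). Not a figure stated in the article.
HIT_RUNS_LOW = 150 / 300
HIT_RUNS_HIGH = 175 / 300

# Roberts: 772 walks saved. Maddux: 147 homers saved.
roberts_walk_runs = 772 * WALK_RUNS
maddux_homer_runs = 147 * HOMER_RUNS


def clemens_strikeout_runs(hit_value):
    """Runs saved if the extra strikeouts replaced 125 walks,
    35 homers, and 320 in-play hits (the article's scenario)."""
    return 125 * WALK_RUNS + 35 * HOMER_RUNS + 320 * hit_value


clemens_low = clemens_strikeout_runs(HIT_RUNS_LOW)
clemens_high = clemens_strikeout_runs(HIT_RUNS_HIGH)

print(f"Roberts walks: {roberts_walk_runs:.0f} runs")
print(f"Maddux homers: {maddux_homer_runs:.0f} runs")
print(f"Clemens strikeouts: {clemens_low:.0f} to {clemens_high:.0f} runs")
```

Under these assumptions the results land close to the figures in the text: roughly 250 runs for Roberts, 200 for Maddux, and a range inside 250-280 for Clemens.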

The bottom line is that success in all four areas is important. You can have a good career if you're average in all four areas or if you can offset one weak area with a strength in another. You can have a very good career if you have no major weaknesses and you have a special ability in one of these respects. And you can have a great career if you're better than average in all four areas.

Summing up

Having completed this study, I can sum up my own beliefs as follows:

1. Pitchers have more influence over in-play hit rates than McCracken suggested. In fact, some pitchers (like Charlie Hough and Jamie Moyer) owe much of their careers to the ability to excel in this respect.

2. Their influence over in-play hit rates is weaker than their influence over walk and strikeout rates. The most successful pitchers in history have saved only a few hits per season on balls in play, when compared with the league or team average. That seems less impressive than it really is, because the league average is such a high standard. Compared to a replacement-level pitcher, the savings are much greater.

3. The low correlation coefficients for in-play batting average suggest that there's a lot more room for random variation in these outcomes than in the defense-independent outcomes. I believe this follows quite naturally from the physics of the game. When a round bat meets a round ball at upwards of 90 miles per hour, and when that ball has laces and some sort of spin, minuscule differences in the nature of that impact can make the difference between a hit and an out. In other words, there's quite a bit of luck involved.

4. Year-to-year variations in IPAvg-versus-team can occur if the quality of a pitcher's teammates varies from year to year, even if that pitcher's performance is fairly consistent.

5. The fact that there's room for random variation doesn't necessarily mean a pitcher doesn't have any influence over the outcomes. It just means that his year-to-year performances can vary randomly around a value other than zero, a value that reflects his skills.

6. Unusually good or bad in-play hit rates aren't likely to be repeated the next year. This has significant implications for projections of future performance.

7. Even if a pitcher has less influence on in-play averages than on walks and strikeouts, that doesn't necessarily mean that in-play outcomes are less important. Nearly three quarters of all plate appearances result in a ball being put in play. Because these plays are much more frequent, small differences in these in-play hit rates can have a bigger impact on scoring than larger differences in walk and strikeout rates.

The process of separating pitching stats into defense-independent and defense-dependent groups is illuminating. The notion that pitchers don't have as much control over in-play outcomes as they do over defense-independent outcomes is both obvious (in retrospect) and very important. Voros McCracken deserves a lot of credit for introducing this way of thinking.

The bottom line, though, is that I am convinced that pitchers do influence in-play outcomes to a significant degree. There's a reason why Charlie Hough and Jamie Moyer and Phil Niekro and Tom Glavine and Bud Black have had successful careers despite mediocre strikeout rates. There's a reason why the top strikeout pitchers have also suppressed in-play hits at a good rate. Using power or control or deception or a knuckleball, pitchers can keep hitters off balance and induce more than their share of routine grounders, popups, and lazy fly balls.

Philadelphia-Houston playoff game, 2005

Last updated: October 12, 2005

The last weekend of the 2005 season featured a tight race for the NL wild card. Philadelphia won its last four but was left out of the postseason parade when Houston took care of business. The folks at the Philadelphia Daily News wondered how a playoff game might have turned out had Houston dropped one of its weekend games to fall into a tie, and we were more than happy to help out.

We set up the rosters using the stats for the 2005 season, set up the fatigue information by entering the number of pitches thrown in recent games by the pitchers on both teams, and played the game once. The paper ran a game story in the October 4th edition, but they didn't have room for the boxscore and play-by-play. Here's how the game turned out:

 

10/3/2005, Hou05-Phi05, Citizens Bank Park

                   1 2 3 4 5 6 7 8 9    R  H  E  LOB DP
2005 Houston       2 0 0 0 0 0 0 0 0    2  5  0    6  0
2005 Philadelphia  0 0 2 3 0 0 0 0 x    5  5  2    3  1

Houston          AB  R  H BI   AVG    Philadelphia     AB  R  H BI   AVG
Biggio 2b         3  0  0  0  .000    Rollins ss        4  1  1  2  .250
Taveras cf        4  1  0  0  .000    Lofton cf         4  0  0  0  .000
Ensberg 3b        4  1  2  2  .500    Utley 2b          3  0  0  0  .000
Berkman lf        4  0  1  0  .250    Abreu rf          3  1  1  0  .333
Lamb 1b           3  0  1  0  .333    Burrell lf        4  0  0  0  .000
Bagwell ph        0  0  0  0  .000    Wagner p          0  0  0  0  .000
Lane rf           4  0  0  0  .000    Howard 1b         2  1  0  0  .000
Ausmus c          4  0  0  0  .000    Bell 3b           3  1  1  3  .333
Everett ss        3  0  1  0  .333    Lieberthal c      3  1  2  0  .667
Vizcaino ph       1  0  0  0  .000    Padilla p         1  0  0  0  .000
Backe p           1  0  0  0  .000    Michaels ph       1  0  0  0  .000
Palmeiro ph       1  0  0  0  .000    Madson p          0  0  0  0  .000
Qualls p          0  0  0  0  .000    Chavez lf         0  0  0  0  .000
Burke ph          1  0  0  0  .000                     28  5  5  5
Gallo p           0  0  0  0  .000
Wheeler p         0  0  0  0  .000
                 33  2  5  2

Houston          INN  H  R ER BB  K  PCH STR    ERA
Backe L 0-1      4.0  4  5  5  3  3   75  41  11.25
Qualls           2.0  1  0  0  0  1   27  18   0.00
Gallo            1.2  0  0  0  0  3   22  16   0.00
Wheeler          0.1  0  0  0  0  1    7   5   0.00
                 8.0  5  5  5  3  8  131  80

Philadelphia     INN  H  R ER BB  K  PCH STR    ERA
Padilla W 1-0    7.0  4  2  1  1  6  101  70   1.29
Madson H 1       1.0  1  0  0  0  0   14   9   0.00
Wagner S 1       1.0  0  0  0  1  2   14   7   0.00
                 9.0  5  2  1  2  8  129  86

Hou: Palmeiro batted for Backe in the 5th
Burke batted for Qualls in the 7th
Bagwell batted for Lamb in the 9th
Vizcaino batted for Everett in the 9th
Phi: Michaels batted for Padilla in the 7th
Chavez inserted at lf in the 9th

E-Utley 2. 2B-Ensberg, Berkman, Everett, Abreu. HR-Ensberg(1), Rollins(1),
Bell(1). K-Taveras, Ensberg, Lamb, Lane 2, Ausmus, Backe, Vizcaino,
Rollins 2, Utley, Burrell 3, Bell, Michaels. BB-Biggio, Bagwell, Utley,
Abreu, Howard. SH-Padilla. WP-Backe, Wagner.
GWRBI: Bell
Temperature: 82, Sky: clear, Wind: out to center at 6 MPH.
 
 
 
10/3/2005, Hou05-Phi05, Citizens Bank Park

                   1 2 3 4 5 6 7 8 9    R  H  E  LOB DP
2005 Houston       2 0 0 0 0 0 0 0 0    2  5  0    6  0
2005 Philadelphia  0 0 2 3 0 0 0 0 x    5  5  2    3  1

Score O Rnr BS Event
----- - --- -- -----

************** Top of the 1st inning, Houston batting
0-0 0 --- 22 Biggio grounded out to the mound (BBFCX)
0-0 1 --- 00 Taveras to first on an error by the second baseman Utley
(X)
0-0 1 1-- 01 Ensberg homered deep to left, Taveras scored (CX)
2-0 1 --- 10 Berkman flied out to left (BX)
2-0 2 --- 11 Lamb lined a single to right center (BFX)
2-0 2 1-- 21 Lane flied out to center (BBCX)

************** Bottom of the 1st inning, Philadelphia batting
2-0 0 --- 22 Rollins struck out (BFFBS)
2-0 1 --- 10 Lofton grounded out to second (BX)
2-0 2 --- 32 Utley walked (BBBCSB)
2-0 2 1-- 32 Abreu grounded out to second (FBBCB>X)

************** Top of the 2nd inning, Houston batting
2-0 0 --- 20 Ausmus flied out to center (BBX)
2-0 1 --- 22 Everett grounded a double down the first base line
(BCBFFX)
2-0 1 -2- 12 Backe struck out (BSSC)
2-0 2 -2- 12 Biggio grounded out to the mound (BCCX)

************** Bottom of the 2nd inning, Philadelphia batting
2-0 0 --- 32 Burrell struck out (BBCCFBC)
2-0 1 --- 01 Howard flied out to center (SX)
2-0 2 --- 11 Bell grounded out to short (FBX)

************** Top of the 3rd inning, Houston batting
2-0 0 --- 10 Taveras to first on an error by the second baseman Utley
(BX)
2-0 0 1-- 32 Ensberg struck out (BFFBBC)
2-0 1 1-- 02 Berkman grounded into a double play, Rollins to Utley to
Howard (CFX)

************** Bottom of the 3rd inning, Philadelphia batting
2-0 0 --- 10 Lieberthal lined a single to left (BX)
2-0 0 1-- 01 Padilla popped out on a bunt to first (CbXb)
2-0 1 1-- 31 Rollins homered deep to right center, Lieberthal scored
(BBFBX)
2-2 1 --- 10 Lofton grounded out to short (BX)
2-2 2 --- 02 Utley lined out to right (CCX)

************** Top of the 4th inning, Houston batting
2-2 0 --- 12 Lamb flied out to left (CFFBX)
2-2 1 --- 02 Lane grounded out to short (CCX)
2-2 2 --- 12 Ausmus struck out (CBFS)

************** Bottom of the 4th inning, Philadelphia batting
2-2 0 --- 32 Abreu walked (CBBFBB)
2-2 0 1-- 12 Burrell struck out (CBFC)
2-2 1 1-- 21 Backe threw a wild pitch, Abreu to second (CBBB)
2-2 1 -2- 31 Howard walked (CBBB.B)
2-2 1 12- 20 Bell homered deep to left, Abreu scored, Howard scored
(BBX)
2-5 1 --- 10 Lieberthal lined a single down the right field line (BX)
2-5 1 1-- 21 Padilla sacrifice bunted to the mound, Lieberthal to
second (BbBbFbXb)
2-5 2 -2- 12 Rollins lined out to center (SFFBFX)

************** Top of the 5th inning, Houston batting
2-5 0 --- 02 Everett flied out to center (CFX)
Palmeiro pinch hitting for Backe
2-5 1 --- 01 Palmeiro flied out to center (FX)
2-5 2 --- 31 Biggio walked (FBBBB)
2-5 2 1-- 02 Taveras struck out (CFC)

************** Bottom of the 5th inning, Philadelphia batting
Qualls now pitching
2-5 0 --- 22 Lofton grounded out to first (BFFFBFX)
2-5 1 --- 10 Utley grounded out to second (BX)
2-5 2 --- 32 Abreu lined a double to right center (FBCBBX)
2-5 2 -2- 12 Burrell popped out to third (FCBX)

************** Top of the 6th inning, Houston batting
2-5 0 --- 02 Ensberg lined out to center (FSX)
2-5 1 --- 32 Berkman lined a double to right center (BFFBBX)
2-5 1 -2- 22 Lamb struck out (BFCBS)
2-5 2 -2- 22 Lane struck out (BCCBFC)

************** Bottom of the 6th inning, Philadelphia batting
2-5 0 --- 10 Howard grounded out to third (BX)
2-5 1 --- 12 Bell struck out (CBFS)
2-5 2 --- 01 Lieberthal grounded out to short (FX)

************** Top of the 7th inning, Houston batting
2-5 0 --- 01 Ausmus flied out to left (FX)
2-5 1 --- 12 Everett lined out to first (BSCFX)
Burke pinch hitting for Qualls
2-5 2 --- 12 Burke flied out to left (CBSX)

************** Bottom of the 7th inning, Philadelphia batting
Gallo now pitching
Michaels pinch hitting for Padilla
2-5 0 --- 32 Michaels struck out (SBBBFS)
2-5 1 --- 22 Rollins struck out (CFFBFFFBS)
2-5 2 --- 01 Lofton flied out to center (CX)

************** Top of the 8th inning, Houston batting
Madson now pitching
2-5 0 --- 11 Biggio grounded out to short (FBX)
2-5 1 --- 22 Taveras flied out to center (CBFBX)
2-5 2 --- 01 Ensberg lined a double to right center (CX)
2-5 2 -2- 21 Berkman grounded out to second (BBCX)

************** Bottom of the 8th inning, Philadelphia batting
2-5 0 --- 12 Utley struck out (CFBC)
2-5 1 --- 00 Abreu grounded out to third (X)
Wheeler now pitching
2-5 2 --- 22 Burrell struck out (CCBBFFS)

************** Top of the 9th inning, Houston batting
Wagner now pitching
Chavez now playing left field
Bagwell pinch hitting for Lamb
2-5 0 --- 30 Bagwell walked (BBBB)
2-5 0 1-- 12 Lane struck out (BCCC)
2-5 1 1-- 00 Wagner threw a wild pitch, Bagwell to second (B)
2-5 1 -2- 10 Ausmus popped out to short (B.X)
Vizcaino pinch hitting for Everett
2-5 2 -2- 12 Vizcaino struck out (CCBS)

 

Winning Early, Young Starters, and Defense

Tom Ruane and Tom Tippett Share Research

This month, I'm delighted to share this space with Tom Ruane, who recently joined the Diamond Mind team as a researcher, programmer and writer. Tom came to my attention earlier this year when he posted a large number of extremely well-researched and well-written items to the mailing list for the Society for American Baseball Research (SABR). He's one of the best baseball analysts I've come across in a long, long time, and I hope you enjoy reading his material as much as I do.

We're going to cover four mostly unrelated topics in this article. First, Tom will explore how much you can read into teams that get off to a quick start in April. Second, he'll delve into the question of babying the arms of young pitchers. Third, I'll present some grass/turf fielding statistics. And, finally, I'll talk about illusions that can be created by range factors.

The Importance of Winning Early, by Tom Ruane

Near the beginning of the 1998 baseball season, Don Zminda of Stats Inc. wrote a column for America On-Line. It dealt with the importance of the first few games of the season and started with a table correlating a team's mark in its first 10 games with its eventual record. For example, of the 218 teams to start the season 4-6 since 1951, only 38.1% of them managed to avoid a losing record, while 63.3% of the 226 teams starting out 6-4 finished up at .500 or better. He continued:

"Interesting stuff, but John Dewan wanted Don to go out an even more of a limb. He asked to see the same chart based on a club's first two or three games. This one simply knocked me over:

   Teams W-L Record in First 2 or 3 Games 1951-97



                    Cumul Finish     Ended Season

           # of    ==============    ============

    Start   Tms     Record     Pct    .500+   Pct



     0- 2   284  21279-23213  .478     119   41.9%

     1- 1   494  38916-39137  .499     251   50.8%

     2- 0   288  23648-21493  .524     189   65.6%



     0- 3   146  10936-11925  .478      56   38.4%

     1- 2   383  29470-30800  .489     182   47.5%

     2- 1   388  30937-30317  .505     213   54.9%

     3- 0   149  12500-10801  .536     108   72.5%

Would you have guessed that there'd be such a big difference between starting out 0-2 versus going 1-1 or 2-0? But there it is, and it seems to be anything but a fluke. At every number of games we looked at from two to 10, there's the same straight-line progression."

Which led Don to conclude that "early-season games ARE a lot more important than you might think. That old baseball adage - 'A win in April is worth two in September' - is not so crazy."

I thought it was very interesting, but decided to see if the phenomenon was unique to games at the start of the season. So here's what I did: rather than concentrate on the first 3 games of the season, I looked at ALL 3-game stretches during the season. If a win in April is really worth 2 at other times, you'd expect the spread of winning percentages to be more pronounced in games 1-3 than, for example, in games 135-7. But it isn't.

     Teams W-L Record in Games 135-137 1951-96



                   Cumul Finish     Ended Season

          # of    ==============    ============

   Record  Tms     Record     Pct    .500+   Pct

   

    0- 3   143  10556-12246  .463      48   33.6%

    1- 2   344  26666-28265  .485     145   42.2%

    2- 1   354  28918-27679  .511     214   60.5%

    3- 0   143  12450-10400  .545     114   79.7%

NOTE: I've removed ties from the equation. I guess I should call them decisions not games, but you get the idea.

The median percentages for these 4 groups of records, starting with games 1-3 and going through to games 160-162, are the following:

   Record  Win Pct    .500+ Pct

    0- 3    .469        35.8%

    1- 2    .490        46.8%

    2- 1    .511        58.7%

    3- 0    .530        70.0%

So what their study shows is not that early games are more significant than later ones, but that a lot more bad teams go 0-3, at ANY time during the season, than do good teams. And so on.
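The windowed tabulation described above could be sketched roughly as follows. This is only an illustration, not the code used for the study: the function name `window_summary`, the `seasons` layout (a label mapped to a list of "W"/"L" decisions with ties already removed), and the toy data are all assumptions.

```python
from collections import defaultdict


def window_summary(seasons, start, length=3):
    """Group team-seasons by wins in decisions [start, start + length).

    Returns {wins_in_window: (teams, final_wins, final_losses, teams_at_500)},
    where the final record is the team's full-season record.
    """
    groups = defaultdict(lambda: [0, 0, 0, 0])
    for decisions in seasons.values():
        window = decisions[start:start + length]
        if len(window) < length:
            continue  # season too short to contain this stretch
        total_w = decisions.count("W")
        total_l = decisions.count("L")
        g = groups[window.count("W")]
        g[0] += 1                     # number of teams
        g[1] += total_w               # cumulative final wins
        g[2] += total_l               # cumulative final losses
        g[3] += total_w >= total_l    # finished at .500 or better
    return {wins: tuple(vals) for wins, vals in groups.items()}


# Toy data (hypothetical six-decision "seasons").
seasons = {
    "A": list("WWWLLW"),
    "B": list("LLLWLL"),
    "C": list("WLWWWL"),
}
first_three = window_summary(seasons, start=0)
last_three = window_summary(seasons, start=3)
print(first_three)
print(last_three)
```

Running the same function for every starting position, and then taking the median of each group's percentages, would give a table shaped like the one above.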

By the way, the worst single game to lose if you want to have a winning season? Game 81. Only 43.4% of the teams losing their 81st decision since 1951 were able to avoid a losing season. The best game to lose? It's a tie: all 5 teams that dropped their 163rd, 164th or 165th decision of the season had great records. I wonder why.

The Care and Feeding of Young Arms, by Tom Ruane

With Kerry Wood in the headlines recently, there has been a lot of discussion in the press about how best to protect his (and other young pitchers') arms. In May, Rob Neyer wrote a "Stats Class" column on ESPN.com about the price pitchers ultimately pay when they rack up a lot of innings at a tender age. He looked at the 19 young pitchers who have thrown 200 innings in a season since 1969 and compared their performance in seasons 1-3 with their performance in seasons 4-6. Here's what he found:

                   IP     K/9    ERA

    Years 1-3   11375    5.97   3.31

    Years 4-6    8565    6.24   3.43

     Pct. +/-   -24.7    +4.5   -3.6

He noted the large dropoff in innings pitched, gave a few examples (including Mark Fidrych) and concluded with:

"You've heard a lot about Kerry Wood and Nolan Ryan. . . . But there is one important difference between the two pitchers. Nolan Ryan didn't total 200-plus innings in a major-league season until he was 25 years old. No one ever talks about this, but it goes a long way toward explaining why Ryan was still throwing 90-plus fastballs two decades later."

Well, that got me interested in what a similar chart would look like for pitchers over the same period (1969-1991) who waited until they were 25 to pitch 200 or more innings. (I picked 1991 as the cutoff because it gave me six years of data to examine for today's pitchers.) This list includes 54 pitchers and looks like this:

                    IP    K/9    ERA

     Years 1-3   31240   5.32   3.77

     Years 4-6   22448   5.44   3.81

      Pct. +/-   -28.1   +2.3   -1.1

So the innings pitched totals for the pitchers who saved themselves until their mid-twenties fell off even more than those who were rushed into action. And given that their performance over the second period was almost identical to the first, I'm assuming that their drop in innings pitched was primarily due to arm problems. Of course, it is possible that their teams expected them to get better and dumped them when they failed to improve. You could argue that we're talking about two different classes of pitchers here and you'd be right; pitchers with the ability to break into a starting rotation in their early twenties are as a group a lot more talented than those who come around four or five years later. Still, the differences in their ability should not have affected their susceptibility to arm woes.
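The "Pct. +/-" rows in the two charts can be reproduced with a few lines of Python. This is a small sketch with illustrative names. Note that for ERA the charts show the change with the sign flipped (a rise in ERA is printed as a negative number), so the raw percentages below are positive for ERA.

```python
def pct_change(before, after):
    """Percentage change from the first period to the second."""
    return (after - before) / before * 100.0


# (years 1-3, years 4-6) for each group, taken from the charts above.
early_starters = {"IP": (11375, 8565), "K/9": (5.97, 6.24), "ERA": (3.31, 3.43)}
late_starters = {"IP": (31240, 22448), "K/9": (5.32, 5.44), "ERA": (3.77, 3.81)}

early_pct = {k: round(pct_change(*v), 1) for k, v in early_starters.items()}
late_pct = {k: round(pct_change(*v), 1) for k, v in late_starters.items()}

print("Early 200-inning pitchers:", early_pct)
print("Waited until 25:          ", late_pct)
```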

By the way, I looked at a sampling of these pitchers, and with some exceptions (like Doyle Alexander), they did not exceed 200 innings pitched in the minors either.

Of course, I'm not recommending that it's okay for Kerry Wood to start throwing 150 pitches a game, or that it is somehow beneficial for a pitcher to top the 200 inning mark early. I guess my point is two-fold:

1) Innings pitched might not be the best measurement here. How many pitches he throws, especially when his arm is tired, is probably a better indicator than yearly innings pitched. Livan Hernandez pitched 96 innings in 1997 (not counting the post-season) and was probably overworked.

2) Pitchers' arms are fragile throughout their twenties. My guess is that the charts above wouldn't have changed that much (except for the number of pitchers involved) if I had picked 23, 27 or 29 as the target age. Pitching strategies are still evolving as we learn more and more about how to protect players from arm problems. It wouldn't surprise me if in a decade or so, most teams have six-man rotations and starting pitchers average between 150 and 175 innings a year. Of course then you'll have to listen to me complain about how they don't make pitchers like they used to. (Why in my day pitchers were men--tough guys who thought nothing of pitching seven, maybe eight innings on occasion-- especially if they had their good stuff, and the weather wasn't too hot or muggy.)

Grass/Turf Effects in Fielding Statistics

I'm sure by now you've seen your share of situational breakdowns in batting and pitching stats in various magazines, books and web sites. But I'll bet you haven't seen too many fielding splits.

I spend a lot of time studying fielding, probably because I'm a little guy who grew up as a good-field, light-hitting shortstop. I have no illusions about being good enough to play professional ball -- I'm definitely not -- but if there was a Designated Fielder position in baseball, I'd be first in line. It wouldn't matter to me if I never came to the plate so long as I could play shortstop every day.

One of the things I've long been curious about is the effect of grass and turf fields on fielding statistics. Do infielders get to more grounders when the infield grass has a chance to slow them down? How much does playing on turf cut down on error rates? Is there any impact on the stats for outfielders?

To find out, I compiled fielding totals by position on a grass/turf basis for the period from 1980-1997. The data originated from Retrosheet (www.retrosheet.org), Project Scoresheet (now defunct), The Baseball Workshop (now part of Total Sports), and Total Sports. Here are the totals:

                Fielding Totals, Grass/Turf, 1980-97



Pos  S  Innings     PO      A    E    DP   PB   FPct  PO/9   A/9  DP/9  PB/9

---- -  ------- ------ ------ ---- ----- ----  ----- ----- ----- ----- -----



 p   t   244805  15818  33919 2365  2246        .955  .582 1.247  .083

 p   g   419065  26917  58977 3940  4325        .956  .578 1.267  .093



 c   t   244805 164211  13642 2198  1682 2158   .988 6.037  .502  .062  .079

 c   g   419065 279432  24399 3585  3206 3796   .988 6.001  .524  .069  .082



1b   t   244805 247210  19629 1994 21838        .993 9.088  .722  .803

1b   g   419065 418629  33284 3681 39900        .992 8.991  .715  .857



2b   t   244805  58706  83712 2503 17445        .983 2.158 3.078  .641

2b   g   419065 101227 141417 4939 31429        .980 2.174 3.037  .675



3b   t   244805  20266  55217 3859  4592        .951  .745 2.030  .169

3b   g   419065  35137  92969 6994  8468        .948  .755 1.997  .182



ss   t   244805  44333  83451 4000 16365        .970 1.630 3.068  .602

ss   g   419065  76695 143548 7658 29614        .966 1.647 3.083  .636



lf   t   244805  55022   1879 1206   300        .979 2.023  .069  .011

lf   g   419065  97607   3215 2311   526        .978 2.096  .069  .011



cf   t   244805  72902   1611  972   386        .987 2.680  .059  .014

cf   g   419065 125194   2507 2000   666        .985 2.689  .054  .014



rf   t   244805  55953   2228 1182   445        .980 2.057  .082  .016

rf   g   419065  96373   3632 2330   769        .977 2.070  .078  .017

As you might expect, fielding percentages among infielders are a little higher on turf than on grass.

The rates of putouts and assists per nine defensive innings are almost identical, which probably says that (a) balls that are slowed down a little by grass are about equally offset by the truer hops you get on turf, (b) because there are exactly 3 outs per inning on both grass and turf, it would be hard for these rates to deviate by much, and (c) one can safely ignore the effects of grass and turf when evaluating players based on range factors or similar stats.

The DP rate is noticeably higher on grass for all infield positions. It's been said that players run faster on turf (based on noticeable increases in stolen base percentages on turf), so this difference is probably due to the turf runner getting to second a little more quickly and the batter getting to first a little more quickly.

Passed ball rates are about 4% higher on grass. My guess is that this is just random and doesn't mean anything.
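For readers who want to rework the rate columns, here is a minimal Python sketch of how they appear to be derived from the raw totals: fielding percentage as (PO + A) / (PO + A + E), and the per-nine rates as the raw count times nine divided by defensive innings. The function name is illustrative, and the formulas are inferred from the table rather than stated in the text. The shortstop rows are used as a check.

```python
def fielding_rates(innings, putouts, assists, errors, double_plays):
    """Rate stats from raw fielding totals (formulas inferred from the table)."""
    chances = putouts + assists + errors
    per_nine = 9.0 / innings
    return {
        "FPct": (putouts + assists) / chances,
        "PO/9": putouts * per_nine,
        "A/9": assists * per_nine,
        "DP/9": double_plays * per_nine,
    }


# Shortstop totals from the table above: turf, then grass.
ss_turf = fielding_rates(244805, 44333, 83451, 4000, 16365)
ss_grass = fielding_rates(419065, 76695, 143548, 7658, 29614)

for label, rates in (("turf", ss_turf), ("grass", ss_grass)):
    print(label, {name: round(value, 3) for name, value in rates.items()})
```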

Range Factor Illusions

For years, I've been writing about the danger of reading too much into range factors. This subject came to my attention again this summer when I received my copies of two terrific books from STATS, Inc.: the All-Time Major League Handbook and the All-Time Baseball Sourcebook. Both volumes are packed with valuable information, including some that you cannot find anywhere else. They're very expensive, but I recommend both.

The All-Time Major League Handbook includes season-by-season fielding statistics for every player, and for each position they played. I'm not aware of any other book with complete fielding information. An early edition of Total Baseball had some fielding data for players with relatively long careers, but this section was dropped in later editions. The MacMillan Baseball Encyclopedia has some fielding data for some players, but it's not broken out by position. In the electronic world, the Fan Park Electronic Baseball Encyclopedia has good fielding data.

The All-Time Major League Handbook includes each player's range factor at each position along with the league average range factors for that position. This is very important, because it gives you a baseline against which to evaluate the player. I wish all sources included league averages to help us put player numbers in context.

Unfortunately, the range factor statistic has some serious weaknesses, and if you're going to use these numbers to draw conclusions about fielding prowess, you need to keep these weaknesses in mind. STATS computes range factors using this simple formula:

  Range Factor = (Putouts + Assists) / Games

Here are some of the reasons why you need to be careful about ranking players based on range factors:

Missing positions. STATS doesn't compile range factors for pitchers, catchers and first basemen, so if you're looking for a way to evaluate fielding at these positions, you're out of luck.

Playing time. The first problem with range factors is that not all games are created equal. Defensive specialists who often enter games in the late innings or are frequently lifted for pinch hitters are still charged for a full game played despite having many fewer opportunities to make plays than the starters. As a result, some of the best fielders in the game have their range factors artificially depressed.

For modern seasons, we use defensive innings to get a more precise measure of playing time. Defensive innings are just like pitcher innings. If you are in the field when an out is recorded, you are credited with one-third of a defensive inning. Defensive specialists and other part-time players are more fairly represented this way. But the STATS All-Time Handbook doesn't use this method because they didn't start compiling defensive innings until the late 1980s.

Grouping outfield positions. For most of baseball history, the official fielding records lumped all three outfield positions together. The STATS book is no different. Almost all of the CFs in the book have range factors higher than the league average. And most of the LFs and RFs are below the league average. But that's largely because (a) the STATS data groups all OF positions together and (b) more balls are hit to CF than the other two positions. So, if you're looking at a player with a range factor of 2.20 in a season where the average outfield range factor was 2.00, and you don't know which OF position he played, you don't really know whether he was better than average or not.

Strikeouts vs balls in play. If someone plays behind pitchers who strike out a lot of batters, fewer balls are put in play. In 1996, for example, the Indians fielders saw 4516 balls put in play (excluding homeruns), while the Yankees defense saw only 4348, in large part because the New York pitchers struck out 106 more batters. And the Athletics defense was presented with 4720 batted balls, almost 400 more than the Yankees. These extra chances can inflate range factors very quickly.

Ground ball percentages. In 1996, the Twins infielders had a crack at only 1746 ground balls, while the Indians saw 2148 grounders. On the flip side, the Twins outfielders got to chase 1956 fly balls, more than any other defense. Range factors made the Twins outfielders look like Gold Glovers, and their infielders look weak, but it had much more to do with their pitchers than their defense.

Left/right splits. Some teams have more left-handed pitchers than others, which usually means they face more right-handed batters than others. And right-handed batters are about twice as likely to hit a ball to 3B as to 1B. In 1996, left-handed batters (including switch-hitters batting left) accounted for 43.5% of the at-bats, yet the Indians saw 50.1% lefties. That's a difference of 371 batters.

Statistical quirks. Not all putouts and assists are created equal. A second baseman, for example, gets a putout each time he makes the tag on a steal play or takes a throw from a shortstop on a force play. Neither play has anything to do with his range, and both can be artificially boosted by playing with a great throwing catcher or a superior defensive shortstop, especially when playing behind a pitching staff that puts a lot of runners on base.

Degree of difficulty. Some fielders might have benefited from an unusually high number of routine plays, such as lazy popups and soft line drives, while others were cowering under a barrage of screaming line drives. It stands to reason that you'll see a few more tough plays when you're playing behind an awful pitching staff than when Maddux, Smoltz, and Glavine are on the mound.

An example -- Ryne Sandberg versus Frank White. For all these reasons, it's very hard to look at range factors and determine how much is due to the player's ability versus external forces. One interesting example is Ryne Sandberg's 1983 season, when he led the majors with 571 assists, a fielding percentage of .986, and a range factor of 5.74. The average range factor for 2Bs was 4.53, meaning that Sandberg had about 27% more chances per game than the average 2B. From these numbers, you would be forced to conclude that Ryno had the best range of any second baseman that season. But there's a little more to the story.

Sandberg's Cubs had a predominantly right-handed pitching staff that year. Consequently, when Ryno was in the field, 45% of the batters who put the ball in play were lefties, compared to a league average of 40%. That translates into 225 more lefties than normal, and since lefties are much more likely to hit the ball to 2B than righties, he got quite a few more chances to make plays.

The Cubs staff was third from last in the NL in strikeouts. As a result, Cubs pitchers allowed 4663 balls in play, second most in the league, and 75 more than the league average. Sandberg got his share of those extra chances.

Sandberg picked up a bunch of assists and putouts on double plays, and he led the majors in DPs by second basemen with 126. But the Cubs staff allowed the highest on-base percentage in the league, meaning that he had more than his share of double play opportunities.

Finally, and most importantly, the Cubs pitching staff led the majors in ground ball percentage, and that translated into more than 300 extra ground balls over the course of the season.

When you remove the effects of facing 225 more lefties, seeing 75 more balls put in play, picking up some extra PO and A on double plays, and seeing 300 more ground balls, Sandberg's numbers are not all that different from the league average. Most of his impressive range factor derived from the large number of opportunities presented to him.

Please understand that I'm not knocking Sandberg. I'm just using his 1983 season to point out that an outstanding range factor doesn't necessarily indicate outstanding range. It can also mean "decent range and good hands and an unusually large number of balls hit his way."

The season before, in 1982, Frank White had a range factor of 5.21, which was 0.51 above the league average. He was helped a little by a Royals staff that put 64 more balls in play than the average team. On the other hand, that staff included more than the normal number of lefties and flyball pitchers, and White saw 317 fewer lefty batters and 175 fewer ground balls than is normal for a 2B who played the same amount. So, even though Sandberg's 1983 range factor was much higher than White's in 1982, I would argue that White had a much better defensive season, because he produced a lot of outs despite many fewer chances to make plays.
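
For reference, here is the raw comparison worked out in Python, using only the range factors quoted above. The helper name is illustrative. Note that this is exactly the unadjusted comparison that the argument above says can mislead, since it ignores handedness, strikeouts and ground-ball rates.

```python
def pct_above_league(player_rf: float, league_rf: float) -> float:
    """Percentage by which a player's range factor exceeds the league average."""
    return (player_rf / league_rf - 1) * 100


# Sandberg, 1983: range factor 5.74 against a league average of 4.53 for 2Bs.
sandberg_1983 = pct_above_league(5.74, 4.53)

# White, 1982: range factor 5.21, which was 0.51 above the league average.
white_1982 = pct_above_league(5.21, 5.21 - 0.51)

print(f"Sandberg 1983: {sandberg_1983:.0f}% above league")
print(f"White 1982:    {white_1982:.0f}% above league")
```

On the raw numbers Sandberg is about 27% above the league and White about 11%, which is why the opportunity adjustments matter so much to the conclusion.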

Note: Total Baseball publishes a number called Fielding Runs that rates players based on putouts, assists, DPs and errors. Like the simple range factor, Fielding Runs doesn't adjust for left/right splits, ground-ball percentages and some of the other factors I've listed. According to Total Baseball, White cost his team 15 runs defensively that year. I couldn't disagree more. When you take into account the characteristics of his pitching staff, I figure he was the best 2B in the league. He won the Gold Glove that year, and I believe the voters got it right.

Adjusted Range Factors. To help us come up with accurate range ratings for our past season disks, I developed a new type of range factor that adjusts for these external forces. It measures playing time by counting balls put in play while each fielder was at his position. It counts only those PO and A in which a fielder's range was really being tested (such as turning a grounder into an out, or catching a line drive or fly ball), while ignoring plays that don't measure range (taking a throw on a steal play, catching a popup). It tracks balls put in play by left- and right-handed batters separately. It produces values for every defensive position and treats the three outfield positions separately. And it adjusts for the fly-ball/ground-ball ratio of the pitching staff. The result is a measure of range that eliminates most of the biases of the simpler range factors published by others.
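
The exact formula isn't spelled out here, so the following Python sketch is only one hypothetical way to combine the ingredients just listed. It assumes you can count balls in play while a fielder was at his position, split by batter handedness and by ground ball versus fly ball, and that you have league-average rates at which his position turns each kind of ball into a range-testing out. The function name, the dictionary layout and every number in the example are invented for illustration and are not real data.

```python
from typing import Dict, Tuple

# Key is (batter hand, ball type), e.g. ("L", "GB") for a lefty's ground ball.
Bucket = Tuple[str, str]


def adjusted_range_ratio(
    range_plays: int,
    balls_in_play: Dict[Bucket, int],
    league_rates: Dict[Bucket, float],
) -> float:
    """Actual range-testing plays divided by the plays an average fielder
    at the same position would be expected to make on the same mix of balls."""
    expected = sum(
        count * league_rates[bucket] for bucket, count in balls_in_play.items()
    )
    if expected <= 0:
        raise ValueError("no expected opportunities for this fielder")
    return range_plays / expected


# Made-up second baseman: more lefty ground balls than usual behind him.
bip = {("L", "GB"): 100, ("R", "GB"): 100}
rates = {("L", "GB"): 0.2, ("R", "GB"): 0.1}  # hypothetical league rates
ratio = adjusted_range_ratio(range_plays=33, balls_in_play=bip, league_rates=rates)
```

A ratio above 1.0 would mean the fielder made more plays than an average fielder given the same opportunities. In this made-up case that is 33 plays against 30 expected, or about 1.1.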

But I'd be remiss if I left you with the impression that I believe these adjusted range factors are the ultimate in fielding statistics. They rest on the assumption that one can make a much better estimate of the number of opportunities to make plays by taking these external forces into account. At the end of the day, it's still an estimate. And, regrettably, it's not yet possible to compute adjusted range factors for much of this century because we don't have enough play-by-play data for older seasons.

Consequently, when I'm rating fielders for Diamond Mind Baseball season disks, I use one of three methods. For modern seasons in which we have access to detailed play-by-play data that includes the location of every batted ball, I use a technique that evaluates the number and difficulty of the chances each fielder was presented with. For slightly older seasons (1980s), where we have play-by-play data without hit location data, I use our adjusted range factors. And for seasons without any play-by-play data, I look at the traditional measures of assists and putouts per game and try to make mental adjustments for playing time, strikeout rates and other factors.

So I'm happy to see that STATS is publishing the fielding statistics and range factors in their All-Time Major League Handbook, partly because I like to see more attention given to fielding, and partly because it gives me more material to work with when rating players for older seasons. But, please, do yourself a favor. Before you take their range factors as a pure measure of defensive range, stop to think about some of the things that might be distorting the numbers.

Copyright © 1998. Diamond Mind, Inc. All rights reserved.

Understanding The Impact of Coors Field

Tom Tippett discusses the impact of Coors Field 

As you know, offense goes sky high in Coors Field. We can see that in the park factors and the home/road splits for individual players. Here are some hitting stats for the NL as a whole (including Coors), Coors only, the NL without Coors, the Coors numbers prorated to 700 plate appearances, and park-adjusted norms for players who play half their games at Coors.

                 NL                 w/o       Per      Half
              Total    Coors      Coors     700PA     Coors
G              1007       72        935       162       162
AB            69049     5194      63855       638       638
H             18184     1637      16547       201       183
2B             3367      305       3062        37        34
3B              418       59        359         7         5
HR             1917      241       1676        30        23
W              6668      509       6159        62        62
K             13309      880      12429       108       116
R              9329      975       8354       120       102
AVG            .263     .315       .259      .315      .287
SPC            .408     .536       .397      .536      .464
Runs/Tm/Gm     4.63     6.77       4.47                5.62

What can we learn from this? Quite a bit, actually:

  • If Coors Field wasn't part of the 1995 NL, the league batting average would have been four points lower and the slugging average would have gone down by eleven points. If you're playing in a Diamond Mind draft league that uses NL players but doesn't include Coors, you can expect your league averages to go down by the same amount.
  • Similarly, if Coors was not included in your draft league, the league average runs per game would go down by .16 runs. Factor in the unearned runs, and you can expect to see the league ERA go down by fourteen points.
  • The fourth column is the most compelling. It says that an average hitter with the opportunity to play 162 games in Coors Field would pile up MVP-type numbers. And this is not Barry Bonds we're talking about. This is the average hitter, including pitchers, middle infielders and September callups. Everybody.
  • The fifth column helps us evaluate real-life players who played in Colorado last year. It's a 50/50 weighting of the Coors numbers in column two and the non-Coors numbers in column three, prorated to 700 plate appearances. These are the figures that the average hitter would compile in 81 games at Coors and 81 games at the other thirteen parks. In other words, any Rockies player who posted those numbers (or the equivalent for the amount of playing time they really had) is merely average. If you move them to another park, you can expect their numbers to drop quite significantly. Keep this in mind when you evaluate your draft choices next time around.
  • Several members of the 1995 Rockies posted great numbers that were largely dismissed because of the park effect. Was this fair? Looked at in this light, it's clear that Burks had a below-average year. Castilla and Galarraga were a little above average. Bichette had a very good year and deserved consideration for the MVP award. Even if he played in the average non-Coors park, he would have hit about .312 with a slugging average of .553, numbers that compare favorably with Bonds, Gant, Sanders, Sosa, and Conine. I'd make a roster spot available for him anytime.
  • If the Rockies had played in a neutral park, their pitching staff would have finished around fourth in the league in ERA and their hitters would have been around ninth. So although they ranked last in pitching and first in hitting, it was superior pitching that carried them to the playoffs.
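
For readers who want to check the fourth and fifth columns of the table, here is a short Python sketch of the proration. It rests on one assumption that is not stated in the text: plate appearances are approximated as at-bats plus walks, since the table doesn't list hit batsmen or sacrifices. Under that assumption the rounded results match the published columns. The function and variable names are illustrative.

```python
# Totals copied from the "Coors" and "w/o Coors" columns of the table.
COORS = {"AB": 5194, "H": 1637, "2B": 305, "3B": 59,
         "HR": 241, "W": 509, "K": 880, "R": 975}
NON_COORS = {"AB": 63855, "H": 16547, "2B": 3062, "3B": 359,
             "HR": 1676, "W": 6159, "K": 12429, "R": 8354}
COORS_GAMES, NON_COORS_GAMES = 72, 935


def per_700_pa(totals, pa=700):
    """Scale every counting stat to `pa` plate appearances (PA assumed = AB + W)."""
    scale = pa / (totals["AB"] + totals["W"])
    return {stat: value * scale for stat, value in totals.items()}


def half_coors(coors, non_coors):
    """50/50 weighting of the Coors and non-Coors lines, each prorated to 700 PA."""
    a, b = per_700_pa(coors), per_700_pa(non_coors)
    return {stat: (a[stat] + b[stat]) / 2 for stat in a}


def runs_per_team_game(runs, games):
    """Two teams bat in every game, so divide by twice the number of games."""
    return runs / (2 * games)


coors_700 = per_700_pa(COORS)
half = half_coors(COORS, NON_COORS)
half_avg = half["H"] / half["AB"]
half_runs = (runs_per_team_game(COORS["R"], COORS_GAMES)
             + runs_per_team_game(NON_COORS["R"], NON_COORS_GAMES)) / 2

for stat in COORS:
    print(f"{stat:>3} {coors_700[stat]:7.0f} {half[stat]:7.0f}")
print(f"AVG {half_avg:.3f}   Runs/Tm/Gm {half_runs:.2f}")
```

To size up a real Rockies hitter against this baseline, scale the Half Coors line to his actual plate appearances before comparing.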