DMB News April 1999

Diamond Mind Email Newsletter #2

April 23, 1999
Written by Tom Tippett

Welcome to the second edition of the Diamond Mind email newsletter. Through these newsletters, we will try to keep you up to date on the latest product and technical information about the Diamond Mind Baseball game, related player disks, and our ongoing baseball research efforts. We expect to produce new issues about once every two months, though we'll occasionally go to a monthly cycle when we have a lot of news or technical information to share. Back issues are available on our website.

If you don't wish to receive these messages in the future, please send an email response with the subject line "unsubscribe". We'll immediately remove your email address from the list. If you know someone who would like to subscribe to this newsletter, we'll be happy to add them to the mailing list if they send us an email message with the subject line "subscribe" and their name and street address in the body of the message.

Topics for this issue:

Contending with the media
Range ratings -- what's in a name?
Playing out of position
Baserunning odds
Rating historical ballparks

Contending with the media

Have you ever read something in the paper or heard something on TV that was just so obviously wrong that you wanted to scream? This happened to me twice in a four-day period, both times involving Peter Gammons reporting on Dodgers SS Mark Grudzielanek.

The first was in Gammons' Boston Globe column on Sunday, February 28th. He quoted Dodger GM Kevin Malone as follows: "I think he [Grudzielanek] can be even better defensively than he is offensively, and he's a rare offensive shortstop."

The second was in Gammons' Diamond Notes segment on ESPN SportsCenter on March 4th, in which Gammons pointed out that Mark's defense was criticized in his recent arbitration hearing. He went on to say that Grudzielanek, who made 33 errors last year, agreed with the negative assessment of his defense and had sought out Bucky Dent to help him improve his footwork and fundamentals during the off-season. At this point, he commented that Grudzielanek is an outstanding offensive shortstop, one of only three shortstops in history who have batted .300 with at least 200 hits, 30 doubles, and 30 stolen bases in the same year.

Let's start with the notion that Grudzielanek is an excellent offensive shortstop. His career batting average is .279, which is good for his position. But he's walked a grand total of 89 times in his CAREER (2318 plate appearances). That translates into a career on-base percentage of .317, well below the .331 NL average during that period (and that average includes pitcher hitting). His career slugging average is .373, or 35-40 points below the league. Last year, he ranked 14th in OPS (on-base plus slugging) among the 19 shortstops with at least 502 plate appearances. And that was a fairly typical season for him -- his 1998 OPS of .673 was only 17 points below his career average.
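The arithmetic behind those figures is easy to check. Here is a minimal Python sketch that uses only the numbers quoted above; the variable names are made up for illustration.

```python
# Career figures quoted in the text for Mark Grudzielanek.
career_obp = 0.317
career_slg = 0.373
ops_1998 = 0.673
walks, plate_appearances = 89, 2318

# OPS is simply on-base percentage plus slugging average.
career_ops = career_obp + career_slg

# Gap between career and 1998 OPS, in "points" (thousandths).
ops_gap_points = round((career_ops - ops_1998) * 1000)

# Walks per plate appearance.
walk_rate = walks / plate_appearances

print(f"career OPS {career_ops:.3f}, 1998 gap {ops_gap_points} points, "
      f"walk rate {walk_rate:.1%}")
```

The walk rate works out to a little under 4% of plate appearances, which is what drags the on-base percentage below the league figure despite a respectable batting average.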

Yes, he had one season (1996) where he had some impressive numbers in some categories (mainly hits and doubles), but even then his on-base percentage was only a little above the league average and quite inadequate for a leadoff hitter. Yes, he can steal 20 bases a year with a success rate near 80%. Yes, his stats have been depressed a little by playing in two pitchers' parks. But, no, an emphatic no, Grudzielanek has not been an asset offensively. He's an average offensive shortstop who was outhit by Mike Bordick, Chris Gomez, Omar Vizquel, Royce Clayton, Rich Aurilia, Rey Sanchez and many others last year. And he's not even remotely in the same class as Alex Rodriguez, Nomar Garciaparra, Derek Jeter, and Barry Larkin.

Defensively, Grudzielanek's fielding percentage has been well below the league average three years running. And that's only part of the story. His range has also been average-to-below-average, even after giving him credit for getting to the balls he made errors on.

In fact, you could make a good case for starting Jose Vizcaino ahead of Grudzielanek on this year's Dodger team, assuming Vizcaino has fully recovered from his 1998 injury. Grudzielanek's career OPS is 26 points higher, but he has given back more than that by making so many more errors. And Vizcaino's range is at least as good and maybe better. If (and this is a very big if) Grudzielanek has improved his defense, he should be the starter. If not, it's a tossup.

So, here we have the Dodger GM calling Grudzielanek a "rare offensive shortstop" when he's really in the bottom half of his peer group. We have a major-league GM and a prominent TV analyst evaluating defensive skills solely on the basis of errors. And we have the same analyst using a misleading bucket of statistical achievements from 1996 to reach a completely bogus conclusion about Grudzielanek's offensive contributions.

And some people still wonder why we don't place more weight on comments by members of the baseball media and the baseball establishment when we put together our player ratings each year.

My purpose here is not to dismiss the media as a whole. There are plenty of smart, hard-working members of the media who say or write intelligent things that help me understand baseball better. I'm simply using this as an example to remind us that some of what we read and hear is pure nonsense.

Fortunately, when the comments focus on hitting or pitching performance, we have a host of statistical evidence to help us evaluate the opinions being expressed. Thanks to Baseball Weekly and the STATS Major League Handbook, it took me about three minutes to compile the information needed to refute the claims about Grudzielanek's offensive prowess.

Unfortunately, when these same commentators wax poetic about someone's fielding skills, we don't have a similar set of well-crafted defensive statistics to turn to. So it's a lot harder to tell when the praise is justified and when it's just as bogus as the claims discussed in this article. That's why we believe in doing original research. We simply want to gather the facts for ourselves rather than having to guess which "expert" opinions to trust.

Range ratings -- what's in a name?

For a while now, I've been regretting the decision to use the term "range" for our most important defensive rating. The term "range" implies the ability to cover ground. And while that's part of the story, it's not the whole story.

Our "range" rating actually measures each fielder's overall playmaking ability (minus throwing and his tendency to commit errors, as we have separate ratings for those things). And playmaking ability is not just about range, it's about the ability to turn all sorts of batted balls into outs.

Think about the types of plays a SS has to make. Yes, he certainly does need to backhand the ball in the hole and make the long throw, go to his left for that grounder up the middle, and range into foul territory down the LF line to snag a popup. But he also needs to leap for line drives, flag down sharply-hit one-hoppers, barehand slow rollers and get off a quick throw, and know when to try for a lead runner and when to take the safe out at first. And even if he reaches those balls in the hole and behind second, he needs the arm to throw out the hitter.

Because we rate players based on overall playmaking ability, there can be a mismatch between how a player looks on the field and how he's rated, especially if he's not so good at some of the less visible aspects of playing the position. And we only make matters worse by calling it a "range" rating instead of using a more comprehensive term.

Let me give you an example. I caught a Mariners game on satellite last week, and it featured these three plays involving 3B Russ Davis:

- on a grounder between third and short, Davis took three steps to his left and made a tremendous diving stop, then made a low throw to first that David Segui couldn't come up with. Two-base error.

- a couple of batters later, the batter dropped a nice bunt down the 3B line. Davis charged, made a nice pickup, and got off a late throw that almost eluded Segui. Bunt single.

- later in the game, Davis took a slow chopper, double-clutched trying to get the ball out of his glove, then made a very strong and accurate throw. But the batter beat it out for a single.

In our system, the first play doesn't hurt Davis's range rating, because we have a separate error rating. In fact, this play helps him, because he got to a ball some 3Bs would not have reached.

But it leaves us with three plays that produced no outs. It's hard to question Davis's range, since he got to every ball and made one outstanding stop. But he didn't get the results due to a bad throw, (possibly) playing too deep on the bunt, and the inability to get the ball out of his glove cleanly.

This sequence illustrates the variety of plays that a fielder must be able to make, and the variety of skills -- physical and mental -- needed to perform at or above the major-league level. It's part of the reason why I prefer to rate players based on their demonstrated play-making ability rather than appearances and reputation. And it shows why the term "range" doesn't encompass all that goes into our rating.

So I think we need a better name. Maybe we should call it a "playmaking" rating in the future. That's more accurate, but it doesn't exactly roll off the tongue. If you have any ideas, please let me know.

Playing out of position

We've received a number of email messages from people who ask how the game treats players who are playing out of position. The answer is ... it depends.

Players can make a relatively painless transition to an easier position that is similar to one they're already rated for. The penalties are much greater for moving to a very different position that is also more difficult to play.

For example, a CF can play LF or RF without suffering much at all. Both positions are similar and easier than the one he's rated for. A LF or RF moving to CF has a more difficult time because there's more ground to cover. Similarly, a move from SS to 2B won't cost you too much, while a move from 2B to SS will hurt more. And the moves that will hurt the most are (a) from any position to catcher, (b) a catcher moving to any position except 1B, and (c) a 1B moving to CF or another infield position.

How will these penalties show up? In lots of ways. More balls in their zones will go for hits. They'll make more errors. Guys without outfielder throwing or catcher throwing ratings will be easier to run on. Unrated catchers will have more passed balls. Unrated middle infielders won't start as many double plays on balls hit to them, and they won't turn two as often when they're the pivot man on the play.

You might ask why we apply penalties even when a player is moving to a less difficult position. Couldn't a top-rated SS play 2B as well or better than the average 2B?

In the dozen years I've been assigning fielding ratings, I've seen a lot of players get higher ratings when they make the transition from a harder position to an easier one (especially SS -> 2B, 3B -> 1B, and CF -> LF), but I've also seen plenty of cases where the player needed some time to learn how to play the new position.

Every position requires mastery of a different set of skills. A CF moving to RF needs to learn how to play the caroms on balls hit down in the corner. A 3B needs great reflexes to handle the hot smashes that come his way, and that might not be the strong suit of a middle infielder moving to 3B. A SS moving to 2B must learn how to make the pivot with his back to the runner.

And I believe that fielders need to see a lot of balls come off the bat from the vantage point of their new position before they learn how to get the best possible jump on the ball.

If our out-of-position adjustments assumed that every player could instantly adapt to a new position, even an easier one, I think it would create too many opportunities for managers to abuse the game by moving players around in ways that real-life managers would never get away with. So the game imposes penalties of varying degrees on all out-of-position players.

Having said that, it's also true that we are considering one change for version 8 and beyond. It seems to me that the move of a CF to a corner outfield position might be the one case where the positions are so similar, and players are so accustomed to being shifted from one position to another in real life, that no penalty should be applied.

Baserunning odds

Over the years, I've sensed that a lot of baseball fans don't have a good handle on the frequency with which runners take extra bases on different types of hits and in different situations. And because Diamond Mind Baseball doesn't make life easy for you by saying that you have a 72% chance to take the extra base on this play, I thought it might help to provide some general information to guide your baserunning decisions.

As is our custom when we want to learn something, we developed a program to compile baserunning information that helps us in the design of the baserunning system and in the assignment of running and throwing ratings. The program counts how often the key runner holds his base, is out advancing, or is safe advancing. It looks at different game situations -- how advance rates vary when there are two outs, how the playing surface affects these rates, and so on. And it ignores plays where baserunning options aren't available -- infield singles and ground-rule doubles are two good examples.

The key runner is the first runner who doesn't automatically score on a play. On a single, we ignore the runner on third and focus on whether the next runner goes from second to home or first to third. On a double, we concede the run to the guys on second and third, and focus on whether a runner scores from first. On fly balls, we look at the lead runner, wherever he is.
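To make the key-runner rule concrete, here is a small Python sketch of the selection logic as described above. It is not the actual program; the function name `key_runner` and the way bases are represented (a set of occupied base numbers) are assumptions made for illustration.

```python
from typing import Optional, Set


def key_runner(hit_type: str, occupied: Set[int]) -> Optional[int]:
    """Return the base (1, 2 or 3) of the key runner, or None if there is none.

    hit_type is "single", "double" or "fly"; occupied is the set of bases
    holding runners before the play.
    """
    if hit_type == "single":
        # The runner on third scores automatically, so skip him.
        candidates = occupied & {1, 2}
    elif hit_type == "double":
        # Runners on second and third are conceded the run.
        candidates = occupied & {1}
    elif hit_type == "fly":
        # On fly balls the lead runner is the key runner, wherever he is.
        candidates = set(occupied)
    else:
        raise ValueError(f"unknown hit type: {hit_type}")
    # The key runner is the most advanced of the remaining candidates.
    return max(candidates) if candidates else None


print(key_runner("single", {1, 3}))   # runner on first is the key runner
print(key_runner("double", {2, 3}))   # nobody left to track
```

With runners on first and second and a single, the sketch picks the runner on second, since he is the first runner who doesn't score automatically.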

Here's an example of the information our program compiles at the player level. I picked these two guys because one (Tony Womack) is a leadoff hitter who steals a lot of bases and the other (Dave Hollins) is a player who consistently ranks at the top of our running ratings despite getting no press in this regard.

                          Tony Womack            Dave Hollins
                          ------------------     ------------------
  Single, 1st->3rd         4 of 18               14 of 17
  Single, 2nd->home       13 of 16, out once      6 of 10, out once
  Double, 1st->home        2 of  3                5 of  8, out once

NOTE: This is summary data. When we assign baserunning ratings, we look separately at balls hit to LF, CF and RF and situations where there are fewer than two outs vs. two outs.
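Turning the counts in the table into rates makes the comparison easier to see. The sketch below is only an illustration: the dictionary layout and the `advance_rate` helper are invented here, and the "out once" notes are left out because the table doesn't say how they fit into the counts.

```python
# (successful advances, opportunities) taken from the table above.
table = {
    "Tony Womack": {
        "single 1st->3rd": (4, 18),
        "single 2nd->home": (13, 16),
        "double 1st->home": (2, 3),
    },
    "Dave Hollins": {
        "single 1st->3rd": (14, 17),
        "single 2nd->home": (6, 10),
        "double 1st->home": (5, 8),
    },
}


def advance_rate(advanced: int, chances: int) -> float:
    """Fraction of opportunities on which the runner took the extra base."""
    return advanced / chances


for player, rows in table.items():
    for situation, (advanced, chances) in rows.items():
        rate = advance_rate(advanced, chances)
        print(f"{player:13s} {situation:17s} {rate:.0%}")
```

On singles with a runner on first, that works out to roughly 22% for Womack and 82% for Hollins. The samples are small, so these are descriptive rates rather than precise estimates of ability.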

Even allowing for the fact that Womack played much more often on artificial turf, and that it's a little harder to get an extra base on a turf single because the ball reaches the outfielder more quickly, it's still surprising to see Hollins going first to third so much more often. Womack's performance in the other two categories was terrific, enough to earn him the Vg rating, but his below-average rate of going first to third on singles was enough to rank him below Hollins overall. I think the baseball media underemphasizes the mental part of the game, and being able to get a great jump by reading the ball off the bat is a big part of running the bases that is rarely discussed.

Now let's turn our attention to the overall averages. In 1998, on grass fields, runners went from first to third 19% of the time on singles hit to LF. The rate was 15% on turf. I'll bet most baseball fans think the rates are higher than that. On singles to center, the rates were 40% and 35%; to right, 48% and 42%. In other words, the runner holds second more than half the time no matter where the ball is hit. And the rate doesn't increase significantly with two out -- the runner can still score from second on a two-out single, so runners and their coaches don't take chances.

The rates are higher for runners scoring from second on singles. Coaches are more willing to gamble when a run is there for the taking, and throws from LF and CF must travel farther to reach home plate than third base. (Most places in RF are about the same distance from third and home.) With 0 or 1 out, runners score from second 55-65% of the time, with the rate rising to 85-90% with two out. As with the first-to-third case, runners take a few more bases on grass fields than on turf.

When the batter doubles, the most important factor in whether the runner can score from first is the type and location of the batted ball. A double in the corner requires a much shorter throw than a ball that goes to the wall in the gap. Last year, with fewer than two out, runners scored from first 20-30% of the time on doubles hit to LF or RF. (The high end of the range was for balls hit to left, reflecting the fact that teams generally put better arms in right field.) The rate was 50-55% on balls handled by the CF. If there were two out, the rates rose to 45-50% for LF/RF and 70-75% for CF.

By the way, real-life teams are very conservative. For every 100 chances to send a runner to third, only about one produces an out. On a typical single to left, for example, you might see 81 runners held, 18 advance, and one thrown out. The out rate rises to about 3 in 100 chances on plays at the plate.
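One way to see how conservative that is: split the 100 chances into runners who were sent and runners who were held. This is a small Python sketch using the illustrative 81/18/1 split from the paragraph above; the variable names are made up.

```python
# Illustrative outcome of 100 first-to-third chances on singles to left,
# using the "typical" split quoted above.
held, safe, out = 81, 18, 1

chances = held + safe + out
attempts = safe + out                 # runners who actually tried for third
attempt_rate = attempts / chances     # how often the runner is sent
success_rate = safe / attempts        # how often a sent runner is safe
out_rate = out / chances              # outs per chance, as quoted in the text

print(f"sent {attempt_rate:.0%} of the time, "
      f"safe {success_rate:.1%} when sent, "
      f"out on {out_rate:.0%} of all chances")
```

In this example the runner is sent on 19 of 100 chances and is safe on 18 of those 19 tries, or a little under 95% of the time.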

One could make a very good case that scoring would increase if more chances were taken, but knowing how much criticism is directed at a coach or manager when someone is thrown out, I'm not surprised at how cautious they are. I suspect that DMB managers are more aggressive than their real-life counterparts, especially with runners with Ex and Vg running ratings.

So, when you're thinking about your baserunning decisions, keep in mind that the real-life advancement rates are quite low in many situations. Even the best runners don't often go first-to-third on a single to left, and nobody goes first-to-third every time.

And remember that Diamond Mind Baseball never asks you if you want to take the extra base if the choice is obvious. If there's no chance, the runner always holds. If the advance is certain, the base is given automatically. So anytime you're asked whether to take the base, there's a chance the runner can be thrown out, no matter how good the runner and how bad the outfielder's arm might be.

Rating historical ballparks

A few people have asked for advice on rating historical ballparks, and I promised them I'd describe our methods in the next newsletter. This is the next newsletter, so here goes...

If you look at the park form, you'll see several groups of related items: (a) information describing the physical aspects of the park, such as wall distances and the size of foul territory, (b) weather information, and (c) statistical park factors. Of these groups, the statistical park factors are by far the most important, and that's what I'll focus on here. Before getting into those details, however, let me take a moment to talk about how these three groups are used in the game.

The statistical factors determine the park's overall impact on the rates of singles, doubles, triples and homeruns that are generated in that park. Because these are overall ratings, they reflect everything that affects games played there -- wall distance, wall height, size of foul territory, playing surface, altitude, quality of the hitting background, shadows, prevailing winds, temperature, and so on.

The weather information doesn't *directly* affect the overall rates of singles and homers and so on that will occur over the course of a season. If the park is rated for 30% more homers than average, it might yield 50% more on a day when it's hot and the wind is blowing out. Or 10% less on a damp April day with a strong wind blowing in. But over the course of the season, the weather won't change the fact that the park is rated to yield about 30% more homers than average.

The physical characteristics of the park affect how outcomes are distributed within the park, but they don't affect the overall rates. If the walls are low and close in one part of the field, you'll see more homers hit to that part of the park than one where the walls are high and deep. But changing the height of a wall won't affect the overall rate of homers in the park unless you also change the homerun factor.

The weather and physical information is also used by the game to determine whether certain types of plays make sense or not, and this in turn affects the play-by-play commentary. You'll see more ground rule doubles on artificial turf than on grass, and you won't see ground rule doubles over the left-field wall in Fenway. The weather on a particular day will affect whether fielders lose balls in the sun, slip on wet turf, or catch a deep fly ball that might have gone out on another day.

But you cannot change the number of homers generated by a park simply by bringing the walls in and making the wind blow out more often. To add or subtract homers, you need to change the homerun factor.

OK, let's move on to how one assigns these factors. As you probably know, there are eight statistical factors in the game -- four (singles, doubles, triples and homers) each for LHB and RHB. And nobody publishes this information. So how do we come up with these factors?

For modern seasons (1980 to the present), we compile our own using play-by-play data from Total Sports and some software we wrote. For older seasons, we start with the data in the Home-Road section of the Macmillan Baseball Encyclopedia. This section gives us two pieces of the puzzle -- the number of homers hit at home and on the road, and the number of runs scored at home and on the road.

Using this information, we can compute two important numbers for each park: an overall run factor and a homerun factor. For example, 670 runs were scored in the 78 home games played by the 1948 Cleveland Indians, for a rate of 8.59 runs per game. In their 77 road games, the teams combined to score 738 runs, or 9.58 per game. As a result, we can conclude that the Indians' home park reduced scoring by 10%.

The same table tells us that 118 homers (1.51 per game) were hit in Cleveland and 119 (1.55 per game) were hit in Cleveland's road games, leading us to conclude that the park reduced homers by 2%.
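The two calculations above follow the same pattern: the per-game rate at home divided by the per-game rate on the road. A minimal Python sketch, with the helper name `park_factor` invented for illustration:

```python
def park_factor(home_total, home_games, road_total, road_games):
    """Per-game rate at home divided by per-game rate on the road, times 100."""
    home_rate = home_total / home_games
    road_rate = road_total / road_games
    return 100 * home_rate / road_rate


# 1948 Cleveland figures quoted above (both teams combined).
run_factor = park_factor(670, 78, 738, 77)
hr_factor = park_factor(118, 78, 119, 77)

print(f"run factor {run_factor:.0f}, homerun factor {hr_factor:.0f}")
```

This yields a run factor of about 90 and a homerun factor of about 98, matching the 10% and 2% reductions in the text.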

Unfortunately, neither of these numbers translates directly into the eight factors we need. If we believe the park affects LHB and RHB the same way, we can go with a homerun factor of 98 for both sides. If not, we need to figure out how to come up with separate factors for lefties and righties. And the portion of the run rate that's not explained by the homeruns must be split into factors for 1B, 2B and 3B. How do we do that?

To get a feel for the park, we use the information in various ballpark books (such as Green Cathedrals) to make judgments about the likely impact of the park on the rates of doubles and triples. And whether the park appears to favor LHB or RHB.

But even with this information, there's no simple answer. We don't have a formula that says "if the park decreases runs by 10%, choose factors X and Y for doubles and triples". So the best you can do is choose values of X and Y that are consistent with a 10% decrease in offense. But how?

The starting point is the relationship between slugging percentage (SPC) and scoring. A few years ago, I plotted the league-wide SPC against the number of runs scored per game for every league-season this century and found a very strong correlation between SPC and runs per game. (In the early part of the century, it was necessary to treat errors as singles to make this work, as the number of runs scored was inflated by the higher error rates of that era. And I later learned that Dick Cramer, founder of STATS Inc., had long ago come up with a terrific little formula that predicts runs by multiplying the on-base percentage by total bases for a team or league.)

So how do we use this relationship? To give us a reality check. Once we've chosen the eight factors, we can calculate the impact they would have on SPC, and thereby learn whether the factors we've chosen are consistent with the overall run factor we're shooting for.

But to translate the factors into a projected effect on SPC, we need to determine how much weight to give to each factor. We start by making an estimate of the percentage of plate appearances by LHB. If you assume that lefties get about 1/3 of all plate appearances, you won't be off by much in most years.

Then we look at the league totals to determine how many total bases came from singles, doubles, triples and homers. Why? Because increasing the triples rate by 10% has almost no impact when triples are scarce and a large impact when triples are more common than homers, as they were at the turn of the century.

Let's use the 1948 American League as an example. Doing a little arithmetic on the numbers in Total Baseball, we find that there were 16088 total bases -- 8206 on singles, 3692 on doubles, 1350 on triples, and 2840 on homeruns. In other words, singles accounted for 51.0% of total bases, doubles 23.0%, triples 8.4%, and homers 17.6%. If we split these percentages between LHB and RHB, we get the following weights for the eight factors:

           Singles    Doubles    Triples    Homeruns
  Left       .170       .077       .028       .059
  Right      .340       .153       .056       .117
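For anyone who wants to repeat this step for another league-season, here is a short Python sketch of the weight calculation. It assumes, as suggested above, that lefties get one third of all plate appearances; the dictionary layout and names are made up for illustration.

```python
# Total bases by hit type for the 1948 American League (from the text).
total_bases = {"singles": 8206, "doubles": 3692, "triples": 1350, "homeruns": 2840}
lhb_share = 1 / 3  # ASSUMPTION: share of plate appearances by left-handed batters

grand_total = sum(total_bases.values())

weights = {}
for hit_type, bases in total_bases.items():
    share = bases / grand_total          # share of all total bases
    weights[("L", hit_type)] = share * lhb_share
    weights[("R", hit_type)] = share * (1 - lhb_share)

for (side, hit_type), w in weights.items():
    print(f"{side} {hit_type:9s} {w:.3f}")
```

Computed straight from the raw totals, a couple of entries can differ from the table by .001, since the table was built from the rounded percentages. The eight weights sum to 1.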

And the impact on SPC, and therefore on scoring, of our eight factors is:

  RF = .170*LS + .077*LD + .028*LT + .059*LH +
       .340*RS + .153*RD + .056*RT + .117*RH

where RF is the run factor and LS/LD/LT/LH are the factors for singles, doubles, triples and homers for lefties, and so on.
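As a sketch of how this formula can serve as a reality check, the Python below evaluates it for a neutral park and for one trial set of factors. The trial values are hypothetical and are not ratings for any real park; the key names ("1B", "2B", and so on) are also just a convenience.

```python
# Weights from the table above: (batter side, hit type) -> share of total bases.
WEIGHTS = {
    ("L", "1B"): 0.170, ("L", "2B"): 0.077, ("L", "3B"): 0.028, ("L", "HR"): 0.059,
    ("R", "1B"): 0.340, ("R", "2B"): 0.153, ("R", "3B"): 0.056, ("R", "HR"): 0.117,
}


def run_factor(factors):
    """Weighted average of the eight park factors (100 = neutral park)."""
    return sum(WEIGHTS[key] * factors[key] for key in WEIGHTS)


neutral = {key: 100 for key in WEIGHTS}

# Hypothetical trial: homers at 98 for both sides, doubles and triples cut to 85.
trial = dict(neutral)
for side in ("L", "R"):
    trial[(side, "HR")] = 98
    trial[(side, "2B")] = 85
    trial[(side, "3B")] = 85

print(round(run_factor(neutral), 1), round(run_factor(trial), 1))
```

A neutral park comes out at 100, as it should, and the trial set lands near 95. If the target were a run factor of 90, that would say these trial factors don't reduce offense enough.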

Let's pause for a moment to recap. First, we get the Macmillan data to give us the target homerun factor and overall run factor to shoot for. We compute weights for the eight factors by making an assumption about the percentage of plate appearances by lefties and by computing the total bases by hit type using Total Baseball. Then, using the homerun rates from Macmillan plus everything we've learned about the park from Green Cathedrals and other sources, we come up with estimated values for the eight factors and let this formula tell us whether they produce the impact on scoring that we're looking for. If they don't, we adjust the factors until they do. And we do this for all the parks in the league.
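The adjustment step described here is a matter of judgment, but a little algebra can show how far the factors would have to move. The sketch below is one mechanical shortcut, not the method described above: it holds some factors fixed and solves for the single value the remaining ones would need to hit a target run factor. The function name and the choice of which factors to hold fixed are assumptions.

```python
# Weights from the 1948 AL table: (batter side, hit type) -> share of total bases.
WEIGHTS = {
    ("L", "1B"): 0.170, ("L", "2B"): 0.077, ("L", "3B"): 0.028, ("L", "HR"): 0.059,
    ("R", "1B"): 0.340, ("R", "2B"): 0.153, ("R", "3B"): 0.056, ("R", "HR"): 0.117,
}


def solve_common_factor(target_rf, fixed, free_keys):
    """Find the one value that, applied to every key in free_keys,
    makes the weighted sum of all eight factors equal target_rf."""
    fixed_part = sum(WEIGHTS[k] * v for k, v in fixed.items())
    free_weight = sum(WEIGHTS[k] for k in free_keys)
    return (target_rf - fixed_part) / free_weight


# Hold singles at 100 and homers at 98; let doubles and triples float together.
fixed = {("L", "1B"): 100, ("R", "1B"): 100, ("L", "HR"): 98, ("R", "HR"): 98}
free_keys = [("L", "2B"), ("R", "2B"), ("L", "3B"), ("R", "3B")]

needed = solve_common_factor(90, fixed, free_keys)
print(round(needed, 1))
```

With these inputs, doubles and triples would both need a factor of about 69 to reach a run factor of 90. A number that extreme would probably prompt a second look at the other factors, or at the data, before accepting it as a rating.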

If this sounds like a lot of work, it's because it is. We will soon finish a project in which we are developing park ratings for every season since 1901. We already had over 60 of those seasons done as part of our Past Season collection, but it still took months of part-time work to do the rest. The good news is that we will soon have a complete set of park ratings for the 20th century, and we plan to include this database in version 8 so anyone creating players with future versions of Diamond Mind Baseball won't have to do park factors on their own.

I want to leave you with a few other thoughts. First, you'll find team-seasons where the homerun factors and run factors don't seem to make sense relative to one another. For instance, the park might have yielded a lot of homers but reduced scoring by a large margin. The only way to keep the homerun factor high while reducing SPC enough to dampen scoring by the "right" amount may be to choose an inappropriately low singles factor (see the next paragraph for more on this). It's likely this is just a statistical fluke. Park effects can fluctuate by quite a bit from year to year, especially in categories where few hits of the given type are generated (triples today, homers in the dead-ball era). So you might have to smooth the data a little to keep things reasonable.

Second, be wary of singles factors that deviate a lot from 100. If you think about it, there aren't that many ways a park can affect singles. And because singles comprise a large percentage of all hits (66% in the 1998 AL), the singles factor has a dramatic impact on batting averages in that park. If you give a park a singles factor of 80, for example, you're removing 20% of all singles, and that's enough by itself to reduce batting averages by 36 points. We have data for 19 seasons' worth of modern parks, and Coors Field is the only park we've found that justifies a singles rating outside the range of 88 to 112, and most other parks are clustered in the 96 to 104 range.
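The 36-point figure can be reproduced with a few lines of Python. Note one assumption not stated in the text: a league batting average of about .271 for the 1998 AL, used here as the starting point.

```python
league_avg = 0.271        # ASSUMPTION: approximate 1998 AL batting average
singles_share = 0.66      # share of all hits that were singles (from the text)
singles_factor = 80       # park rating under test

# Singles per at-bat, then the portion removed by the park factor.
singles_per_ab = league_avg * singles_share
lost_points = singles_per_ab * (1 - singles_factor / 100) * 1000

print(f"batting average drops by about {lost_points:.0f} points")
```

Changing `singles_factor` to 96 or 104 shows why most parks sit in that narrow band: the effect there is only about seven points in either direction.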

Finally, once you're finished rating the parks for a league, make sure that the factors in each category average to 100 or very close to it. For instance, if you end up with a set of homerun factors that average 110, your players will end up hitting more homers than they should.
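A quick way to run that check, and to correct a set of factors that drifts from 100, is sketched below in Python. Proportional rescaling is just one simple option, and the eight homerun factors are hypothetical values chosen to average 110.

```python
def normalize(factors):
    """Rescale one category's factors across a league so they average 100."""
    mean = sum(factors) / len(factors)
    return [f * 100 / mean for f in factors]


# Hypothetical homerun factors for an eight-team league, averaging 110.
hr_factors = [130, 95, 120, 100, 115, 90, 125, 105]
adjusted = normalize(hr_factors)

print(sum(hr_factors) / len(hr_factors))
print([round(f, 1) for f in adjusted])
```

Rescaling keeps each park's standing relative to the others while bringing the league average back to 100.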

For those of you who are creating historical parks, I hope this discussion has been useful. If it turns out that I've been unclear about something, feel free to write with your questions. If I'm not too busy working on version 8 at the time, I'll answer them right away. Otherwise, I'll address them for everyone in the next issue of the newsletter.