DMB News April 2003
Diamond Mind Email Newsletter
April 30, 2003
Written by Tom Tippett
Welcome to the second edition of the Diamond Mind email newsletter for the year 2003. Through these newsletters, we will try to keep you up to date on the latest product and technical information about the Diamond Mind Baseball game, related player disks, and our ongoing baseball research efforts. Back issues are available on our website.
For a while, we've been looking for the best way to publish small pieces of baseball commentary and research, items that may not warrant a full article (such as the ones we've been writing for ESPN.com since 1998) or items that would be outdated by the time our next email newsletter is due to be issued. A weblog is well suited to this purpose.
For those of you who aren't yet familiar with weblogs -- and I was one of those people until a few months ago -- a weblog is nothing more than a web page with a series of writings on a particular topic. Our topic, obviously, is baseball, and we'll post things to our weblog whenever we feel we have something to say.
It might be a comment about a managerial decision or a defensive play we've just seen in a game. It might be a more general comment about the progress of the season. Or a small bit of research that doesn't warrant a full article.
Since we last published a newsletter, we have released the 2003 Projection Disk, published our 2003 Projected Team Standings article, collected a bunch of predictions by others so we can rank them after the season, and watched about four weeks of baseball.
There's a good deal of consensus among the predictors this year:
- in the AL East, everybody thinks the Yankees and Red Sox, in that order, will top the division. Athlon and Lindy's have Baltimore third; everyone else has the Blue Jays in that spot.
- in the AL Central, there's a fairly even split between the Twins and the White Sox to win the division, but everyone agrees that they'll wind up one and two. Cleveland was a unanimous choice to finish third.
In our preseason simulations, Cleveland was three games better than Kansas City, but I'd have to give the nod to the Royals now. As of April 28th, they were already 11-1/2 games up on the Indians, and I'll be surprised if Cleveland can make that up. Of course, a better question is whether the Royals can hang with Minnesota and Chicago all year. If they can, it'll be fun to watch.
- in the AL West, everyone picked Oakland or Anaheim to win the division, with the vast majority placing Seattle third and Texas fourth.
Our simulation results put Anaheim and Seattle in a virtual dead heat for second place. Seattle averaged 0.3 more wins per season, but Anaheim had a slightly better run margin. Because wins are more important than runs, we're going with Seattle as our pick for second for the purposes of our postseason prediction rankings.
- in the NL East, a small majority favored Atlanta over Philadelphia, with one lone voice predicting that the Mets will take the division. In our simulations, the Mets were last, a view nobody else shared. Time will tell whether we alone were able to see the truth or whether there's safety in numbers.
- the NL Central is a lot like its AL counterpart in that there's complete agreement about the two teams that are expected to finish one-two and a sizeable split as to which will be on top. We had St. Louis first with Houston second. It's worth noting that the Cubs took first place in 14% of our simulations, so the Cards and Astros can't just focus on each other.
- the NL West is another division where our results differed from the consensus. Several others predicted that the Dodgers would win, but we had them fourth because they didn't score enough runs.
In our fifty simulated seasons, the Giants averaged about one more win than the Diamondbacks, while the majority of other forecasters had Arizona finishing ahead of San Francisco. We have Colorado finishing third, and only one other publication projected them to finish in that spot; everyone else had them fourth or fifth.
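The picks above lean on a few numbers aggregated across the fifty simulated seasons: average wins, run margin, and how often each team finished first. The Python sketch below shows one way to do that bookkeeping and to apply the wins-before-run-margin tiebreak used for the Seattle/Anaheim call. The `SimSeason` record, its field names, and the two-team sample data are hypothetical illustrations, not Diamond Mind's actual simulation output or tools.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class SimSeason:
    # Hypothetical record: one team's result in one simulated season.
    team: str
    wins: int
    runs_scored: int
    runs_allowed: int
    finished_first: bool


def summarize(seasons):
    """Average wins, average run margin, and share of first-place finishes per team."""
    by_team = {}
    for s in seasons:
        by_team.setdefault(s.team, []).append(s)
    summary = {}
    for team, rows in by_team.items():
        summary[team] = {
            "avg_wins": mean(r.wins for r in rows),
            "avg_margin": mean(r.runs_scored - r.runs_allowed for r in rows),
            "first_pct": sum(r.finished_first for r in rows) / len(rows),
        }
    return summary


def predicted_order(summary):
    """Rank by average wins; run margin only matters when average wins tie exactly."""
    return sorted(
        summary,
        key=lambda t: (summary[t]["avg_wins"], summary[t]["avg_margin"]),
        reverse=True,
    )


# Invented sample: team A wins slightly more, team B has the better run margin.
sample = [
    SimSeason("A", 86, 700, 695, True),
    SimSeason("A", 85, 700, 695, False),
    SimSeason("B", 85, 730, 700, False),
    SimSeason("B", 85, 730, 700, True),
]
summary = summarize(sample)
print(predicted_order(summary))  # ['A', 'B']
```

Sorting on a (wins, margin) tuple is what makes wins the primary criterion: B's much larger run margin never overrides A's half-win edge.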
With the season a month old, I'm quite excited about how things are developing. It's awfully early, of course, but it's nice to see potential surprises brewing in a number of the races.
In recent weeks, we've made great progress in the development of our All-time Greatest Players Disk, and we hope to be able to begin shipping it in the next couple of months.
When we began this project, we weren't sure whether to focus on franchises or players. A franchise focus would have meant going through each franchise and finding the 32-or-so best players in its history. A player focus would have meant finding the best players in history, regardless of who they played for.
The advantage of the franchise approach is that we are pretty much guaranteed to wind up with rosters that cover all of the positions and pitching roles. The disadvantage is that we could end up including less-than-great players just to fill out a roster or excluding other great players who had the misfortune of playing for a team with a long history and a glut of other stars.
So we decided to go with the best players. We've completed most of the ratings and statistical work, and are beginning to think about how to organize that pool of 1100+ players into meaningful rosters.
We could, I suppose, just ship a database of players and let you assemble your own rosters, but we believe it will be a better product if we create teams that you can use right away.
Our tentative plan is to take this pool of players and allocate them to teams that match real-life franchises as much as possible. Certain franchises have too many stars for one team, so we'll split them into two teams. Other franchises are too young to have accumulated a full and balanced roster, so we'll combine them.
It remains to be seen whether this approach will give us a collection of teams that we're happy with. If not, we'll try something else. Of course, no matter what we end up with, you'll be able to use various DMB tools to assemble your own rosters and/or transfer these players to other DMB databases.
Another fundamental question was how to rate the players: Single-season? Peak years? Full career? If peak years, how many years, and which ones? And so on.
We decided to go with a series of consecutive peak years. For each batter, we've chosen his best group of consecutive seasons totaling at least 6000 plate appearances. That means eight years for some players and twelve for others, but it's a meaningful amount of playing time in any case.
Why peak years? Some all-time greats had mediocre-to-poor seasons at the beginning and/or end of their careers, perhaps because they were called up at a very young age or because their names allowed them to hang onto a job after they had lost much of their ability. If we used entire careers, these stars would not stand out from the crowd as much as they should.
Why 6000 plate appearances? We're not wedded to this number, and we may decide to bump it up or down before we're done. We do feel it's important to include a lot of playing time so lesser players with one or two really good seasons aren't rated as highly as others who sustained their success over a much longer period.
If a player had a career of fewer than 4000 plate appearances, he's not eligible for this disk, at least for now. (We may lower that threshold in the future.) If he reached that mark but fell short of 6000 plate appearances, we're rating him based on his entire career.
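Taken together, the eligibility and peak-window rules can be expressed as a small selection routine. This Python sketch assumes each player's seasons arrive in chronological order with plate appearances and a per-season quality rate. That rate (`value`) and the choice to compare windows by their PA-weighted average are assumptions made for illustration, since the newsletter doesn't say how "best" is measured. Ties go to the earliest window.

```python
from typing import List, Optional, Tuple

MIN_CAREER_PA = 4000  # careers below this are not eligible (per the newsletter)
PEAK_PA = 6000        # a peak window must total at least this many PA


def choose_seasons(pa: List[int], value: List[float]) -> Optional[Tuple[int, int]]:
    """Return (start, end) season indices, end exclusive, or None if ineligible.

    pa[i]: plate appearances in season i, in chronological order
    value[i]: hypothetical per-season quality rate; higher is better
    """
    total = sum(pa)
    if total < MIN_CAREER_PA:
        return None            # not eligible for the disk
    if total < PEAK_PA:
        return (0, len(pa))    # rate on the entire career
    best, best_score = None, float("-inf")
    for start in range(len(pa)):
        window_pa, weighted = 0, 0.0
        for end in range(start, len(pa)):
            window_pa += pa[end]
            weighted += pa[end] * value[end]
            # Every consecutive window reaching PEAK_PA is a candidate,
            # scored by its PA-weighted average value.
            if window_pa >= PEAK_PA and weighted / window_pa > best_score:
                best, best_score = (start, end + 1), weighted / window_pa
    return best


print(choose_seasons([3000, 3000, 3000, 3000], [1.0, 2.0, 2.0, 1.0]))  # (1, 3)
print(choose_seasons([2000, 2500], [1.5, 1.2]))                       # (0, 2)
print(choose_seasons([1000, 1000, 1000], [2.0, 2.0, 2.0]))            # None
```

Because only consecutive windows are considered, the routine can never pair an early season with a late one, which is the point of the rule.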
Why consecutive seasons? Because we want to end up with complete players, not hypothetical combinations of skills that may not have existed together at any one point in the player's career.
Players tend to change with age. An outfielder might start out as a superior center fielder with great speed and enough hitting skills to be an asset at the top of the order. As he ages and fills out his body, he might move to left or right field, run less frequently, take more walks, add power, and move to the middle of the batting order.
If we take the approach of rating the player based on his best N seasons, regardless of where they fell in his career, we run the risk of winding up with a few of his early years and a few of his later years. That might be fine in terms of his batting ratings, but what about everything else?
We run the risk of creating a power hitter who could also play a great center field and steal a bunch of bases. Each of his ratings (hitting, defense, running) might reflect his best level in that one area, but as a group, those ratings suggest a player who could do all of those things at once, something that probably didn't happen in real life.
By choosing a series of consecutive seasons, we reduce the risk of combining hitting and non-hitting ratings that don't really belong together. The risk doesn't go away completely, but at least we're putting some boundaries on it. We're no longer faced with the problem of how to combine an age-23 season with an age-38 season.
The question of how to choose each player's best years wasn't the last one we needed to answer. We also needed to choose the period of baseball history from which we would draw the players, and how to create a level playing field in the face of changes to the rules, equipment, ballparks, and other aspects of the game.
How far back in history should we go? In the early years of professional baseball, leagues came and went, teams often folded or moved during the season, and rules were changed on a regular basis.
We've chosen 1894 as our starting point because the rules had settled down to something very much like what we use today. One could argue that 1903 would be an even better choice because it was the first time foul balls counted as strikes in both leagues, but we didn't want to exclude the top players from the 1890s.
How close to the present should we go? We decided to go all the way. Our greatest players disk will include active players who have met our minimum playing-time requirements and have performed at a high enough level to qualify.
An obvious implication of this decision is that, a year from now, more current players will have met that minimum, and we'll have one more year of performance upon which to rate the active players. As a result, we expect to release updates to this disk as time goes on. (Schedule and pricing to be determined.)
How do we level the playing field? We take each player season, adjust the stats for the effect of the player's home park, and evaluate those park-adjusted stats relative to the appropriate set of league averages. This is no different from how we rate players on a single-season basis.
In other words, we're adjusting every player's stats for the time and park in which he played. We don't want to overrate pitchers from a low-offense era like the 1960s or overrate hitters from a high-offense era like the 1930s or 1990s.
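A minimal sketch of the per-season step, in Python. It assumes a single multiplicative park factor that already blends home and road games, which is a simplification; real park adjustments are usually computed stat by stat and applied only to home games. The function name and parameters are hypothetical.

```python
def relative_to_league(rate: float, park_factor: float, league_rate: float) -> float:
    """Park-adjust one season's rate stat, then express it relative to the league average.

    park_factor: hypothetical multiplier for the player's home park on this stat
                 (1.05 = inflates it by 5%), assumed to blend home and road games
    league_rate: league average for the same stat in that season
    """
    if park_factor <= 0 or league_rate <= 0:
        raise ValueError("park_factor and league_rate must be positive")
    park_adjusted = rate / park_factor
    return park_adjusted / league_rate


# The same raw .300 rate in a neutral park, in two different offensive environments:
print(relative_to_league(0.300, 1.00, 0.250))  # 1.2 (20% above a low-offense league)
print(relative_to_league(0.300, 1.00, 0.300))  # 1.0 (merely average in a high-offense league)
```

Expressing every season as a ratio to its own league is what keeps a 1960s pitcher or a 1930s hitter from being overrated on raw numbers alone.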
After we've decided which group of consecutive seasons is the player's best, we combine them, giving more weight to seasons in which the player had more plate appearances, and express the resulting set of park- and era-adjusted statistics in terms of a neutral era.
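The combining step can be sketched the same way. This assumes each peak season has already been reduced to a rate relative to its league average, and that the neutral era is defined by a single league-average rate (`neutral_league_rate`, a hypothetical parameter). Weighting by plate appearances means a full season counts for more than a partial one.

```python
from typing import Iterable, Tuple


def neutral_era_rate(seasons: Iterable[Tuple[int, float]], neutral_league_rate: float) -> float:
    """Combine peak seasons, weighting each by plate appearances, then restate in a neutral era.

    seasons: (plate_appearances, rate relative to league average) for each season
    neutral_league_rate: hypothetical league-average rate that defines the neutral era
    """
    seasons = list(seasons)
    total_pa = sum(pa for pa, _ in seasons)
    if total_pa <= 0:
        raise ValueError("need at least one plate appearance")
    weighted_relative = sum(pa * rel for pa, rel in seasons) / total_pa
    return weighted_relative * neutral_league_rate


# A 600-PA season at 1.20x league and a 400-PA season at 1.10x league combine to 1.16x,
# not the simple average of 1.15x, because the fuller season counts for more.
print(round(neutral_era_rate([(600, 1.20), (400, 1.10)], 0.265), 4))  # 0.3074
```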
All of these examples were expressed in terms of batters, but the concepts apply equally to pitchers. They're also being rated based on era- and park-adjusted stats. And we're making similar historical adjustments when assigning error rates to fielders.
Our belief is that we're in the process of creating something quite unique. We're not aware of any other systematic approach to rating historical and current players based on peak years with adjustments for era and park effects.
We hope to have our All-time Greatest Players disk available for shipment in June. When we've set the price and have a more precise fix on the projected ship date, we'll add this product to our price list and web store and begin taking orders. In the meantime, we'll keep you posted via our web site as more information becomes available.
We've chosen to write about our All-time Greatest Players project in this newsletter, but that doesn't mean we've forgotten about version 9. We continue to make good progress on version 9 in parallel with our greatest players project.
In February, we created a version 9 page on our web site and described some of the features we're working on. As time goes on, we may add to the features list, and we'll get a better handle on the projected ship date. As we know more, we'll let you know through the web site and this newsletter.
Many of you already know this, but in case you don't, here's a reminder about two great games that use baseball simulation technology from Diamond Mind:
Diamond Legends (http://legends.stats.com/intro.asp)
ESPN Classic Fantasy Baseball (http://games.espn.go.com/legends/frontpage)
Both games enable you to join a league, select from a large pool of historical players to build the best team you can within your budget, set up a manager profile, and play a 154- or 162-game season against the other teams in your league.
A custom version of the Diamond Mind game engine is used to play the games. As the season progresses, you can log in to the web site any time to make roster moves, view boxscores, check the standings, browse the leader boards, and pore over a wide range of statistical reports.
Closers always get a lot of attention, but it seems as if they've been in the news even more than usual this year:
- the Red Sox took the money they would have spent on Ugueth Urbina, signed several veteran relievers (Embree, Timlin, Mendoza, Fox), and announced that it doesn't make sense to save your best pitcher for the 9th inning when games are often decided in the 7th or 8th
- several of the game's leading closers (Nen, Hoffman, Isringhausen, Rivera) have yet to pitch because of injury
- some established closers (Benitez, Sasaki) have blown more than their share of saves already this year, while others (Jimenez, Williams, Koch) have sky-high ERAs that haven't yet translated into blown saves
- on one recent day (Sunday, April 27), Kansas City couldn't hold a 9-4 lead in the ninth, St. Louis coughed up a 5-run lead in the 9th and was forced to go 20 innings before taking home the win, and Boston's pen failed to hold a 4-2 lead handed over by Pedro Martinez.
I'm not going to try to sort all this out at once, but there's one aspect of closer performance that has been tugging at my mind for a while. It's the messy save.
The closer comes in to start the 9th with a two-run lead. He walks the leadoff hitter, gets a strikeout, gives up a single, gets the second out on a long fly ball, then blows away the last hitter with high heat.
Mission accomplished. The team wins. The closer notches another save. But the fans are ticked off because it wasn't clean enough:
"I almost had a heart attack! The tying run was on base. The potential winning run was at the plate. For the money we pay these guys, they ought to be able to retire the side in order, wouldn't you think?"
So I decided to look at one-two-three innings. How often do they happen? Are they more common for starters or relievers? How much different is the 9th inning? Does the score matter?
I've been playing around with this analysis, but there's still some work to be done. Time permitting, I'll turn this into a full article sometime. For now, here are some of the early findings based on the period from 1998 to 2002:
- approximately 29% of all innings were 1-2-3 innings
- there wasn't much of a difference between starting pitchers (29.1%) and relievers (28.4%)
- when a team was leading by 1 to 3 runs in the 7th inning or later, 1-2-3 innings were registered 29.4% of the time
- the highest rates of 1-2-3 innings were recorded in the 9th. Starting pitchers polished off their foes in order 31.3% of the time while relievers posted a 30.4% mark.
There is a large amount of selection bias here -- the guys who get to start the 9th inning in a close game are not average pitchers, they're much more likely to be established closers or starters with names like Johnson, Schilling, or Martinez.
- when a team entered the 9th inning with a lead of one to three runs, the rate of 1-2-3 innings was 32.5%, or just less than one third. Again, the top starters and closers get most of these opportunities, so there is some bias here.
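For readers who want to run this kind of count on their own play-by-play data, here is a Python sketch. The `HalfInning` record is hypothetical: it assumes each complete half-inning is credited to the pitcher who started it and carries the count of batters who reached base by any means, so a 1-2-3 inning is simply one where that count is zero. The sample log is invented.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class HalfInning:
    # Hypothetical record: one complete half-inning, credited to the pitcher who started it.
    role: str             # "SP" (starter) or "RP" (reliever)
    inning: int
    lead: int             # pitching team's lead when the inning began; negative if trailing
    runners_reached: int  # batters who reached base by any means (hit, walk, HBP, error, ...)


def is_one_two_three(h: HalfInning) -> bool:
    return h.runners_reached == 0  # three up, three down


def clean_inning_rate(innings: Iterable[HalfInning],
                      keep: Callable[[HalfInning], bool] = lambda h: True) -> float:
    """Share of the selected half-innings that were 1-2-3 innings."""
    chosen = [h for h in innings if keep(h)]
    if not chosen:
        return float("nan")
    return sum(is_one_two_three(h) for h in chosen) / len(chosen)


def ninth_small_lead(h: HalfInning) -> bool:
    # Entering the 9th with a lead of one to three runs
    return h.inning == 9 and 1 <= h.lead <= 3


log = [
    HalfInning("RP", 9, 2, 0),
    HalfInning("RP", 9, 1, 2),
    HalfInning("SP", 3, 0, 0),
    HalfInning("SP", 9, 4, 0),
]
print(clean_inning_rate(log))                            # 0.75
print(clean_inning_rate(log, ninth_small_lead))          # 0.5
print(clean_inning_rate(log, lambda h: h.role == "SP"))  # 1.0
```

Passing the filter as a function keeps one counting routine for every split in the list above: starters versus relievers, late and close, or the 9th with a small lead.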
In other words, if you can get through an inning without allowing a baserunner about 1/3 of the time, you're doing pretty well. Of course, that means that if you've been asked to protect a 1-run or 2-run lead, you're going to bring the tying or winning run to the plate more often than not. And some fans might think you've come up a little short even if you get out of the jam with the lead intact.
Here are the top five 1-2-3 performances in 2002 by pitchers when starting the 9th inning with a lead of 1 to 3 runs (min 20 innings):
Inn 123 Pct
Isringhausen 31 16 .516
Smoltz 51 25 .490
Guardado 48 23 .479
Percival 34 16 .471
Wagner 35 16 .457
The bottom five:
Inn 123 Pct
Irabu 20 5 .250
Alfonseca 25 6 .240
Kim 32 6 .188
Acevedo 29 5 .172
Yan 25 4 .160
And some middle-of-the-pack guys of note:
Inn 123 Pct
Koch 47 17 .362
Urbina 41 14 .341
Nen 45 15 .333
Julio 26 7 .269
Benitez 34 9 .265
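The Pct column is just 1-2-3 innings divided by innings, shown to three decimals. This Python sketch reproduces that formatting and the 20-inning qualifier using a few rows from the tables; the helper names are hypothetical, and the last sample row is invented to show the qualifier at work.

```python
MIN_INNINGS = 20  # the newsletter's qualifying minimum for these lists


def pct(innings: int, clean: int) -> str:
    """Share of 1-2-3 innings, formatted baseball-style with three decimals (e.g. .516)."""
    text = f"{clean / innings:.3f}"
    return text[1:] if text.startswith("0") else text


def leaders(rows, n=5, best=True):
    """rows: (pitcher, innings, one_two_three_innings). Returns the top (or bottom) n qualifiers."""
    qualified = [r for r in rows if r[1] >= MIN_INNINGS]
    qualified.sort(key=lambda r: r[2] / r[1], reverse=best)
    return [(name, inn, clean, pct(inn, clean)) for name, inn, clean in qualified[:n]]


rows = [
    ("Isringhausen", 31, 16), ("Smoltz", 51, 25), ("Koch", 47, 17),
    ("Yan", 25, 4),
    ("Made-up pitcher", 10, 8),  # invented row: too few innings to qualify
]
print(leaders(rows, n=2))              # Isringhausen at .516, then Smoltz at .490
print(leaders(rows, n=1, best=False))  # Yan at .160
```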
Staying with these 9th-inning, small-lead situations, but looking at the non-closers:
- Steve Karsay (6 for 9), Damaso Marte (5 for 9), and Scott Williamson (4 for 8) were impressive in limited opportunities
- TJ Tucker (0 for 6); Steve Kline and Dan Smith (0 for 4); Danys Baez, Mike Stanton and Mike Myers (1 for 6); Matt Herges (1 for 8); Bob Wickman (4 for 19); and Braden Looper (3 for 14) didn't inspire as much confidence, at least in terms of 1-2-3 innings
The bottom line is that fans shouldn't be surprised when the other team threatens in a save situation. It's not easy to get a 1-2-3 inning, and, more often than not, even the best closers are going to give the other guys a chance to tie or win the game on one swing.
We got a call this morning from someone who had joined a DMB league and wanted to buy the DMB game. Because the league is based on the 2002 Season Disk, we said he'd need to buy the game and the season disk in order to play in the league. He said the league commissioner had told him that all he needed was the game because the stats and ratings would be distributed by the league.
Whether he knew it or not, the commissioner was wrong. Our license agreement clearly states that it is a violation of copyright law to distribute a league database to members who do not already own the related season disk(s).
We spend a lot of time and money to acquire and evaluate the play-by-play data needed to put our annual season disks together. Drafting new rosters doesn't change the fact that a league database is full of stats and ratings that are copyrighted by Diamond Mind and STATS, Inc.
Most league commissioners understand this and work with us to make sure they're not inadvertently sending copies of our products to league members who aren't entitled to receive them. As that phone call shows, however, there are some people who either aren't aware or don't care.
We have always tried to deal honestly with our customers. For example, we offer a money-back guarantee so you can return products that don't meet your needs. We provide free technical support because we feel that anyone who spends their hard-earned money with us deserves our help when they need it. And we have resisted using copy-protection schemes because they inconvenience the many honest customers who simply want to make backups or do reasonable things like install the game (for their personal use) on a home computer and a laptop at the same time.
To the majority of you who have always respected our hard work and our legal rights, we will always be grateful for your business and your support. To others who might be violating our rights by distributing league databases to non-owners of the relevant season disk, either knowingly or unknowingly, please do the right thing.