DMB News October 2001

Diamond Mind Email Newsletter

October 18, 2001
Written by Tom Tippett

Welcome to the fifth edition of the Diamond Mind email newsletter for the year 2001. Through these newsletters, we will try to keep you up to date on the latest product and technical information about the Diamond Mind Baseball game, related player disks, and our ongoing baseball research efforts. Back issues are available on our website.

Topics for this issue:

Product announcements / October mailing
Twelve new or upgraded season disks now shipping
2001 Season Disk
Post-season technical tips
Customer-written utilities for version 8
Book review: "Curve Ball"

Product announcements / October mailing

In recent weeks, we have been very busy putting the finishing touches on a number of new and upgraded season disks that we'll describe in the next two topics. All of these products are now available for immediate shipment.

We have also started work on the 2001 Season Disk, which will be ready for shipment on or before December 12th. We are now taking advance orders for that item as well.

Starting today, and for the next two weeks, we are sending a mailing to all registered owners of version 8 that includes a letter describing these new season disks, an order form, and a postage-paid reply envelope for anyone who wishes to order by mail.

If you wish to order with a credit card, you can do so in three ways. The most convenient method is to use our online store by following the link from www.diamond-mind.com. You can also send us a credit card order by mail or call us at 800-400-4803 during our normal business hours (9-5 Eastern time, Mon-Fri).

Twelve new or upgraded season disks now shipping

In the past month, we have begun shipping four new Deluxe Past Seasons (1978 - 1981), two new collections of the best teams in history (All-time Greatest Teams #3 and #4), two new Classic Past Seasons (1935 and 1940), and upgraded editions of the 1982 through 1985 Deluxe Past Seasons that now include real-life transactions and lineups. All of these items are available only in version 8 format.

We told you about the 1978 - 1981 seasons in the last newsletter, so we'll get right to the news about the others.

AGT #3 and #4 contain 24 teams each, and with the addition of these 48 teams, our greatest teams collection now includes at least one team from every franchise that has operated since 1901 (with the exception of Colorado and Tampa Bay, which have yet to post a strong enough season to qualify). Our web site includes a full list of the teams on each of these disks.

With the addition of 1935 and 1940 to our Classic Past Season collection, we now offer all but one season since 1927. That missing season is 1943 and should be available next year.

We recently added a complete set of real-life player transactions and game-by-game starting lineups to our 1982 through 1985 Deluxe Past Seasons. If you are a registered owner of these seasons, you can upgrade to the latest editions for $5 per season. And if these upgrades prove to be popular, we will add transactions and lineups to the 1986-1995 season disks in the future.

2001 Season Disk

We're already hard at work on the 2001 Season Disk. Advance orders are now being accepted, and we expect to begin shipments by December 12th.

As usual, you'll get a ton of information with this season disk. Full rosters that include every player who appeared in the big leagues this season. Official batting, pitching and fielding statistics, including some statistics that you won't see anywhere else. Left/right splits for all batters and pitchers. Games started by position versus left- and right-handed pitchers. A full set of real-life transactions and game-by-game lineups for season replays. Two schedules, the original (as-scheduled) schedule and another (as-played) reflecting rainouts and other rescheduled games. Real-life salaries for all players. And complete manager profiles for all teams.

The 2001 Season Disk will be available only in version 8 format.

Post-season technical tips

Just prior to wrapping up our work on version 8 last year, we added several tools to help you set up and play one or more post-season series after your regular season has been completed.

Those tools were designed to allow you to emulate the post-season setups that have been employed by the big leagues over the years. That is, up to four teams from each league can participate in the post-season, and if you are operating two leagues that belong to the same organization, the league champions can meet in an organization championship series to decide the overall winner.

In recent weeks, several customers have asked whether we can enhance these features in two ways -- by allowing more than four teams from one league to participate and by making it possible to play an organization series even if you didn't set up an organization before the season started.

As we go forward, we'll see if we can modify the game to allow more teams and to allow you to set up an organization after the season has started. In the meantime, here are some ideas on how to accomplish these things using current features.

Let's suppose you want eight teams from the same league in the post-season. To reduce the field from eight to four, you can create a second database in which to play the first round games, then return to your main league database and use DMB's post-season tools for the remaining two rounds of playoffs. Here's how:

- from the File menu, choose "Create new empty database" and give that database a meaningful name

- after that database has been created and is the active database, open the Organizer window, select the Teams tab, and use the Import button to import the eight teams that will participate. (If you're not familiar with importing teams, read the Organizer > Teams > Importing Teams topic in the help system.)

- staying within the Organizer window, select the Leagues tab. Create a league, set the schedule to , and add four teams to the league. Click to save the league. (If you're not familiar with creating a new league, read the Organizer > Leagues > Creating a New League topic in the help system.)

- now Modify the league to set the rules and options and set up the post-season matchups for the four teams assigned to that league.

- while still in the Organizer, click on the Schedules tab. Create a new schedule. Give the schedule a name, set the Scope to League, and set the owner to the new league you just created. Click on OK.

- click on the Modify button to open the schedule editor. Click on the Tools button, go down to Generate, and choose "First round playoff schedule(s)". Set the date as the last day of the season and click on OK to generate a schedule for the teams in this league. Save the schedule and close it.

- repeat the last four steps to create a second league with the other four teams and generate a schedule for that league.

- after you've played the four first round series, use the "Change active database" command on the File menu to switch back to your original league database.

- now that you are down to four teams, you can use the Organizer to modify your league and set the matchups for these four teams, then generate a post-season schedule, and play those games.

If you need more assistance with using the post-season tools, refer to the Organizer section of the help system. The Organization, League and Schedules chapters contain relevant topics.

The second issue we mentioned has to do with playing an organization championship series even if you didn't set up an organization before the season started. For a variety of technical reasons, DMB version 8 does not allow you to create an organization and assign leagues to it after the season has started. Just as it was for an eight-team playoff, the solution is to use another database to get the job done, as follows:

- make sure your database is the active database

- from the File menu, use the "Copy this database" command to create another database that is an exact copy of your original. This will allow you to set up for the organization series without affecting the regular season results that are stored in the original database.

- when the newly-created copy is the active database, use the "Restart a season" command on the Tools menu to reset the statistics for both leagues.

- open the Organizer window, select the Orgs tab, and create a new Organization. Assign the two leagues to that organization and leave the schedule set to . Click on OK to save these settings. You will be asked if you want to combine the league schedules. Select "No".

- Modify the new organization you just created. Click on the Options tab and set the rules for the organization series. Click on the Post Season tab and set the teams, home field advantage, and schedule format. Click on OK.

- while still in the Organizer, click on the Schedules tab. Create a new schedule. Give the schedule a name, set the Scope to Organization, and set the owner to the new organization. Check the "Inter-league play" box. Click on OK.

- Modify the new schedule. Click on the Tools button, go down to Generate, and choose "Organization championship schedule". Set the last day of the season and click on OK. You will now have a schedule for your organization series. Save the schedule and close the Organizer. You're now ready to play.

Customer-written utilities for version 8

Thanks to the efforts of three Diamond Mind customers, you can choose from two utilities that scan Diamond Mind Baseball boxscores and compile a wide variety of interesting information.

Ron Gudykunst and Tom Milne have released BASE 3.0, an inexpensive add-on utility for searching DMB version 8 boxscores and compiling lots of fun facts about your DMB season.

And Mark Cohen has updated his HindSight program to scan both version 7 and version 8 boxscores.

To learn more about these tools, visit:

BASE -- http://hometown.aol.com/basev30/BASE30.html
HindSight -- http://home.att.net/~marklisa/Hindsight/HindSight.htm

We have added links to both from the version 8 support area of the Diamond Mind web site.

Book review: "Curve Ball"

I recently finished reading a very interesting book called Curve Ball by Jim Albert and Jay Bennett, former chairs of the Sports Section of the American Statistical Association. Both are members of SABR and are very familiar with the work of famous baseball analysts (Bill James, Pete Palmer, and the Elias Sports Bureau folks) and other groundbreaking researchers who are not so well known (Earnshaw Cook and George Lindsay).

This book is quite a bit different from other baseball books I've read. Most works of this type set out to rank players in some fashion -- to identify the greatest hitters or pitchers of all time, figure out who the MVPs should have been in a particular season, or something else along these lines. Statistical analysis is a means to an end.

In Curve Ball, the statistical analysis techniques are the central characters. Current and former players are used in the many examples of how to apply formal statistical methods, but the primary goal is the explanation and justification of the techniques themselves. In a nutshell, this is what the authors of Curve Ball are trying to say:

"Baseball statistics are everywhere these days. Some of them are meaningful, but a lot of them are not. It's possible to identify the meaningful ones using formal statistical techniques. And we're going to show you how, using language that you can understand even if you haven't studied math or statistics in college. In the process, we'll describe and evaluate many useful measures of baseball performance."

Chapter one describes how some tabletop baseball games represent the probabilities of various baseball events. Think of this chapter as a pregame warmup for your brain. Its main purpose is to introduce the basic ideas of probability and the use of spinners, dice and other methods to randomly select outcomes based on a set of probabilities.

Chapter two introduces some graphical techniques for displaying and evaluating statistical data. The visual display of information interests me a great deal, and I found this chapter to be very well done.

Chapter three expounds on what may be the most important idea in the entire book -- the difference between performance and ability. When we say that a player "batted .300" over some time period, we're talking about his performance. It's a simple statement that cannot be refuted. On the other hand, when we say that a player "is a .300 hitter", we're implying that .300 is a reflection of his true ability, that it's reasonable to expect him to hit .300 again in the future if he's placed in a similar situation. Those are very different statements, yet many in the media get them confused.

Suppose you toss a coin ten times and it comes up heads seven times. Is it reasonable to conclude that the probability of heads with this coin is 70%? It might be, but it's also entirely possible that it is a fair coin with a true probability of heads of 50%, and it just happened to come up heads seven times in this small number of tosses. How can you work backwards from a set of observed outcomes, such as a batter's performance in day games, and draw inferences about his true ability? That's the lesson of chapter three.
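To put a number on the coin example, here is a minimal Python sketch. It is a plain binomial calculation, not necessarily the method the authors use in the book, and the function names are made up for this illustration. It asks how often a perfectly fair coin would show seven heads in ten tosses.

```python
from math import comb

def prob_heads(k, n, p=0.5):
    """Probability of exactly k heads in n tosses when P(heads) = p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_at_least(k, n, p=0.5):
    """Probability of k or more heads in n tosses when P(heads) = p."""
    return sum(prob_heads(i, n, p) for i in range(k, n + 1))

# A fair coin (p = 0.5) tossed ten times, as in the example above.
exactly_seven = prob_heads(7, 10)
seven_or_more = prob_at_least(7, 10)

print(f"exactly 7 heads: {exactly_seven:.3f}")   # 0.117
print(f"7 or more heads: {seven_or_more:.3f}")   # 0.172
```

A fair coin produces seven or more heads in roughly one of every six ten-toss trials, so seven heads by itself is weak evidence that the coin is biased. The same reasoning applies to a batting split built on a small number of at-bats.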

And that lesson is put to the test in chapter four, where the authors look at all the ways batting performances are broken down by situation -- versus left and right, on grass and turf, in day and night games, pre- and post-all-star game, with runners in scoring position, and so on. Their goal? Identify the ones that mean something and the ones that are essentially random.

Their conclusions? Focusing only on batting averages, they concluded that grass/turf, day/night and pre/post all-star game breakdowns are essentially random. Among the splits they studied, three appear to be meaningful -- the home field advantage, left/right splits, and performance against fly-ball and ground-ball pitchers.

These results correspond to our own thinking on this subject. The home field advantage and left/right dynamics are important ideas in Diamond Mind Baseball. And we have rejected occasional requests to base player performance on other splits such as day/night, grass/turf, and monthly performance because we believe them to be essentially random.

The one difference is the fly-ball/ground-ball aspect. In DMB, all batters and pitchers are rated for their tendency to produce ground balls and fly balls, but we don't change the likelihood of getting hits based on gb/fb matchups. Albert and Bennett claim that "hitters bat 12 points better against ground-ball pitchers than fly-ball pitchers".

That doesn't feel right to me. Our studies show that hits are less common on ground balls than on fly balls and line drives. Popups have the lowest rate of hits, but they don't occur often enough to pull the overall hit rate on balls in the air below the hit rate on grounders. I suspect there's an external factor at work here -- perhaps there's a difference in the strikeout rates between pitchers classified as FB or GB pitchers, and it's the strikeout rates that are driving the results found by the authors. I think this issue needs more study.

Chapter five explores streakiness, and the issue of performance versus ability is once again at the forefront. There's no question that streaks and slumps exist. But are some teams or players inherently streaky while others are inherently consistent?

The authors found very little evidence in support of streakiness at either the team or player level. There was some evidence to suggest that some players might be streaky, but because they didn't adjust for the quality of the opposing pitchers and for ballpark effects, I think the evidence is even weaker than it appears. A stretch of games against good pitchers or a road trip through some of the better hitter's parks can make you look more streaky than you really are.

The next three chapters evaluate statistics that measure the overall contribution of a hitter -- batting average, on-base percentage, slugging percentage, on-base plus slugging (OPS), on-base times slugging, Tom Boswell's total average, Pete Palmer's linear weights, and Bill James' runs created. In each case, the authors examine how well these measures predict the number of runs scored by teams throughout the 125-year history of the game.

Albert and Bennett concluded that Bill James' runs created per game formula was the best predictor of team scoring, with Boswell's total average, Palmer's linear weights and OPS not far behind.
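For readers who have not seen these measures written out, here is a short Python sketch of three of them. It assumes the standard definitions of on-base percentage and slugging percentage and the simplest published form of Bill James' runs created, (H + BB) x TB / (AB + BB). The book may use a more elaborate version, and the team totals below are invented purely for illustration.

```python
def on_base_pct(h, bb, hbp, ab, sf):
    """Times on base divided by plate appearances that count toward OBP."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)

def slugging_pct(tb, ab):
    """Total bases per at-bat."""
    return tb / ab

def runs_created_basic(h, bb, tb, ab):
    """Basic runs created: (H + BB) * TB / (AB + BB)."""
    return (h + bb) * tb / (ab + bb)

# Hypothetical season totals for one team (not real data).
ab, h, bb, hbp, sf, tb = 5500, 1450, 550, 50, 45, 2300

obp = on_base_pct(h, bb, hbp, ab, sf)
slg = slugging_pct(tb, ab)
team_ops = obp + slg            # on-base plus slugging
team_obp_x_slg = obp * slg      # on-base times slugging
team_rc = runs_created_basic(h, bb, tb, ab)

print(f"OBP {obp:.3f}  SLG {slg:.3f}  OPS {team_ops:.3f}")
print(f"Basic runs created: {team_rc:.0f}")
```

To test a measure the way the authors describe, you would compute it for many team-seasons and compare the result with the runs those teams actually scored.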

The next chapter deals with clutch play. Again, the question of performance versus ability comes up -- every season and every playoff series has its share of clutch performances, but are there players who have an innate ability to come up big in clutch situations time and again? Interestingly, the authors don't tackle the question of whether clutch ability exists. Instead, they introduce a way to measure clutch performances that makes use of win probabilities.

What are win probabilities? By studying tens of thousands of games, you can develop a set of probabilities for winning the game in every possible game situation. For example, we can measure how often a team goes on to win when it has runners on first and third with nobody out in the bottom of the third inning while trailing by two runs.

It stands to reason, and the data bears this out, that the situations where win probabilities can change the most dramatically are those in the late innings of close games.

Suppose your team is trailing by two runs and you hit a three-run homer to put your team ahead. That's always a good thing to do. But if you do it in the first inning, it would increase your chances of winning the game from about 30% to around 60%. Do exactly the same thing with two out in the bottom of the ninth and you would increase your team's chances of winning from about 10% to 100%. So it's no wonder that we intuitively view these late-inning at-bats as more important.

The authors apply this line of reasoning in several ways, but the one they explore in greatest detail is the selection of MVPs in post-season series. Each batter or pitcher gets a score for every plate appearance or batter faced based on the change in the win probability for that confrontation. Add up those scores and you get an overall clutch performance score for a game. Do it for a series and you have an objective way to choose the series MVP.
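The scoring idea can be sketched in a few lines of Python. This is an illustration, not the authors' implementation. It credits only the batter with the change in his team's win probability, and the function name and player labels are made up. The two events use the approximate figures from the home run example above.

```python
from collections import defaultdict

def clutch_scores(events):
    """Sum the change in win probability credited to each batter.

    events: iterable of (batter, wp_before, wp_after), where both
    probabilities are the batting team's chance of winning the game.
    """
    totals = defaultdict(float)
    for batter, wp_before, wp_after in events:
        totals[batter] += wp_after - wp_before
    return dict(totals)

# The same three-run homer in two different game situations.
events = [
    ("First-inning hitter", 0.30, 0.60),   # trailing by two, first inning
    ("Ninth-inning hitter", 0.10, 1.00),   # trailing by two, two out, bottom of the ninth
]

scores = clutch_scores(events)
mvp = max(scores, key=scores.get)

for batter, score in scores.items():
    print(f"{batter}: {score:+.2f}")
print("Highest score:", mvp)
```

In a real application, the before and after probabilities would come from a table built from many thousands of games, and the events for a whole game or series would be fed through the same sum.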

An approach like this can also be used to select regular-season MVPs, and while it seems intuitively appealing and may often select the same player you might have chosen subjectively, I worry that techniques like this can be biased by external factors.

Suppose you have two fictional hitters who each get a single every time up. One plays on a bad team, so he often comes to bat with the bases empty and his team trailing by five runs or more. His singles aren't going to count for much in terms of win probabilities.

The other one is on a mediocre team in a great pitcher's park that winds up in a lot of close, low-scoring games. His singles often make the difference between winning and losing, so his win probability score is higher. But is he really a better player? By definition, he can't be, because we started by assuming both players hit a single every time.

The second-last chapter looks at models for predicting things. The authors found that it's very hard to predict the outcome of individual games (there's a surprise!) and go on to present a few interesting ways to predict season and career totals.

The book wraps up by demonstrating that the best team often fails to win in a 162-game season. That won't come as a surprise to anyone who has read Bill James' work on this subject.

This was a thoroughly enjoyable read for me. The topics are relevant, the material thorough and well organized, and the tables and graphs very helpful. It does get harder to keep up with the math as you move into the later chapters, but I believe you can get a lot out of this book even if you don't have any formal statistical training.

As you've seen in the preceding paragraphs, I don't agree with all of their conclusions. In some cases, in order to maintain their focus on explaining the techniques, the authors used a simplified view of the game of baseball. (A good example is leaving out ballpark effects and the quality of opposing pitchers when studying streakiness in hitters.) As game designers, we often don't have that luxury.

But I've never seen a better presentation of the statistical methods that can and should be used to study important questions about how baseball really works, and I recommend this book.

I bought my copy on Amazon.com and it arrived in a few days.