DMB News July 2004

Diamond Mind Email Newsletter

July 9, 2004
Written by Tom Tippett

Welcome to the third edition of the Diamond Mind email newsletter for the year 2004. Through these newsletters, we will try to keep you up to date on the latest product and technical information about the Diamond Mind Baseball game, related player disks, and our ongoing baseball research efforts. Back issues are available on our website.

Topics for this issue:

Office hours
Recent product releases
Encyclopedia fun, part 1
Building a league encyclopedia
Encyclopedia fun, part 2

Office hours

Our offices will be closed for two days on July 15-16 while our staff attends the national convention of the Society for American Baseball Research (SABR). We apologize in advance for being unavailable on those days.

Recent product releases

Although we announced this in our last newsletter, we just wanted to remind you that on Tuesday, May 18th, we posted a version 9 patch on our web site.

If you're a version 9 owner and you haven't already installed this patch, please do so as soon as possible so you have the best possible experience with your version 9 games.

In June, we released an updated edition of the 1993 Deluxe Past Season that includes a complete set of real-life player transactions and game-by-game starting lineups.

That same month, we released an updated 1966 Classic Past Season that includes games started by position, updated ratings and manager profiles, updated park factors based on a review of all boxscores, and real-life player transactions.

Also in June, we released an update to our Historical Parks Database that includes park information for the 2001-2003 seasons, improved park factors for Classic Past Seasons that have been upgraded in recent years, and some minor corrections.

We're continuing to work on an update to our All-time Greatest Players Disk that we hope to have available later this summer. Work is underway on real-life transactions and game-by-game starting lineups for the 1994 Deluxe Past Season, and we plan to release that in the next few months as well.

Encyclopedia fun, part 1

Many of you are aware that our big research project in 2003 was a look at the extent to which pitchers can influence the outcome of a play when the batter puts the ball in play. Partly as a result of that work, we added in-play batting average to version 9 so you could see how your players stack up in this aspect of the game.

That statistic can be applied to batters as well, and a little while back we spotted something interesting when playing around with the version 9 encyclopedia.

One of our encyclopedias is based on Pete Palmer's historical database, which has all of the real-life stats from 1871 to the present. Because version 9 includes a career leaderboard report, it's easy to see who the all-time leaders are in a bunch of categories, including modern stats like in-play batting average.

We'll come back to this at the end of the newsletter, but for now, we'll leave you to think about the all-time career leaders in in-play average for batters. In particular, we're wondering if you can guess which active players are currently sitting in the #2 and #3 spots on the all-time list.

To make sure you're clear on the question, in-play batting average is hits divided by balls in play, not counting homers. So we're looking for guys who rack up the hits even though they're giving the defense a chance to make a play.
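For readers who like to see the arithmetic, here is a minimal Python sketch of that definition. The function name and the sample totals are made up for illustration, and the sketch assumes balls in play can be approximated as at-bats minus strikeouts minus homers, which ignores details such as sacrifice flies.

```python
def in_play_average(at_bats, hits, home_runs, strikeouts):
    """Hits on balls in play divided by balls in play, homers excluded."""
    # Assumption: balls in play = at-bats minus strikeouts minus homers.
    balls_in_play = at_bats - strikeouts - home_runs
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (hits - home_runs) / balls_in_play


# Hypothetical season line, not a real player.
example = in_play_average(at_bats=600, hits=180, home_runs=20, strikeouts=80)
print(f"{example:.3f}")  # 160 hits on 500 balls in play -> 0.320
```

A hitter with the same .300 overall average but more strikeouts would come out higher on this measure, because more of his outs are removed from the denominator.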

Building a league encyclopedia

Unfortunately, I don't have as much time to play Diamond Mind Baseball as many of you do. Working on the game and running the company doesn't leave a lot of spare time, and I like to spend a good bit of that time with my family.

I am, however, in two DMB leagues, one that I joined earlier this year, and another (the World Internet Baseball League, or WIBL) in which I've been a manager for nine years.

Quite frankly, I haven't been a very good manager in the WIBL, and I mean that in two ways. First, my team has had only occasional success on the field. Second, and more important, I don't spend enough time talking trades, preparing for the draft, and participating in the social aspects of the league. Sometimes I think the league would be better off if I turned my team over to someone who could put more effort into it.

Nevertheless, I've held on to my team because (a) it's fun when I'm not feeling pressed for time and (b) using our products the same way you do helps us find ways to make the game better.

Wanting to contribute a little more to the league, I recently took it upon myself to build an encyclopedia of all of the WIBL seasons from 1995 to the present, and I thought it might be useful to share some observations and tips from that experience.

The first stage in the process was assembling the databases from past WIBL seasons. Fortunately, as a long-time member of the league, I still had copies of some of the old seasons. The commissioner of the league was able to find the others.

The 1995 and 1996 seasons were played using DMB version 6, the next three seasons with version 7, the three after that with version 8, and the current season with version 9. So the first task was to get everything into version 9 format.

For most of the older seasons, the commissioner had used PKZip to collect and compress all of the final DMB files into a series of Zip files called 95final.zip, 96final.zip, and so on.

There's more than one way to get these files into version 9, but here are the steps I went through for each season. I'll use the 1995 season for my examples, but if you're doing this yourself, you can substitute the names that make sense for you:

(a) use Windows Explorer to create a new folder for the season. I chose to put it inside my game folder, so the full path to this folder is c:\dmb9\wibl1995.

(b) use PKZip to extract all of the files from the 95final.zip archive and place them into the wibl1995 folder.

(c) use the "Add reference to existing database" command to tell DMB about the new database. This command also converts the files to version 9 and makes it the active database.

(d) use the Organizer to get the databases ready for importing into the encyclopedia, which I'll discuss in more detail in just a moment
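If you have many seasons to unpack, steps (a) and (b) can be scripted. The following is only a sketch in Python, using the standard zipfile module in place of PKZip. The function name is hypothetical, and the demonstration runs in a temporary folder with a stand-in archive rather than a real game folder. Steps (c) and (d) still have to be done inside DMB.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_season(archive_path, game_folder, year):
    """Create <game_folder>/wibl<year> and unpack the season archive into it."""
    season_folder = Path(game_folder) / f"wibl{year}"
    season_folder.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(season_folder)
    return season_folder


# Demonstration in a temporary folder so no real files are touched.
with tempfile.TemporaryDirectory() as scratch:
    archive_path = Path(scratch) / "95final.zip"
    with zipfile.ZipFile(archive_path, "w") as archive:
        # Stand-in for the real season files.
        archive.writestr("placeholder.dat", "stand-in for a season file")
    folder = extract_season(archive_path, Path(scratch) / "dmb9", 1995)
    extracted = sorted(p.name for p in folder.iterdir())
    print(folder.name, extracted)
```

With real archives, you would call extract_season once per season, pointing it at your own game folder and Zip files.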

As I said, there's more than one way to get to this point. For instance, a couple of the more recent seasons were already on my hard disk in the form of version 8 databases, so I used the "Copy database" command to get them in version 9 format.

That was stage one, and it only took a few minutes per season to get this far. Next, I prepared each season for the encyclopedia, which is mostly an exercise in assigning proper Universal IDs (UIDs).

As you probably know already, we use UIDs to link organizations, leagues, teams, parks, and players across different DMB databases.

Why are UIDs needed? Because the player population changes every year. Joe Smith might be player number 842 in one database and player number 906 the next season. By assigning him the same UID in both databases, we know we're dealing with the same Joe Smith and not two different players who happen to have the same name.
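The idea can be illustrated with a few lines of Python. The records and UID values below are invented for the example and say nothing about how DMB stores its data. The point is only that grouping by UID, rather than by name or by the per-database player number, keeps the two Joe Smiths apart.

```python
from collections import defaultdict

# Hypothetical records: (season, player number in that database, name, UID).
records = [
    (1995, 842, "Joe Smith", 5001),
    (1996, 906, "Joe Smith", 5001),
    (1996, 311, "Joe Smith", 7340),  # a different player with the same name
]

# Group the player-seasons by UID to rebuild each player's career.
seasons_by_uid = defaultdict(list)
for season, number, name, uid in records:
    seasons_by_uid[uid].append((season, number, name))

for uid, entries in sorted(seasons_by_uid.items()):
    print(uid, entries)
```

Grouping the same records by name would merge all three rows into one career, and grouping by player number would split the first Joe Smith into two.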

I began the UID checking with our leagues and organizations. The league structure has changed a little over the years. For 1995 and 1996, we had 24 teams divided into two leagues. In 1997, we created an organization that linked the two leagues and began playing inter-league games. In 1998, we expanded to 26 teams.

After some false starts, I realized that it would be helpful to create an organization for each of the 1995 and 1996 seasons. Why bother? Mainly for consistency and ease of use.

When importing seasons, the DMB Encyclopedia asks you to specify the league or organization you wish to import. If you have two unconnected leagues, you must run this command twice, once for each league. The same is true if you want to delete a season. It's easier if the two leagues are connected by an organization.

Also, when you generate reports or a league web site, it's simpler if all your seasons have the same structure. One of my main goals was to generate a web site containing the full history of the league, and that was complicated by having two years without an organization and seven others with an organization.

When creating these organizations, I had to remember to assign them a UID of 1. I could have used any other number, but a DMB database can only have one organization, so 1 was a natural choice. The key is assigning the same UID for each of these seasons. For those seasons that already had organizations, I modified them to have this UID.

I did some of this work a few months ago, and I don't remember whether our leagues already had UIDs or whether I had to add them. In any case, because the version 9 Organizer window displays UIDs, it's easy to spot leagues that still have the default UID of -1, and it only takes a few seconds to assign a new one.

Most of the parks were automatically assigned their UIDs during the conversion to version 9. When it encounters a park without a UID, the version 9 conversion logic uses a lookup table of park info to assign these values. A few parks still had UIDs of -1 because their names didn't match those in our lookup table. In these cases, I double-clicked on each park to open the Modify Park window and used the Search button to quickly assign the correct UID.

Similarly, most of the players were automatically assigned their UIDs during conversion. Version 9 contains a massive lookup table that contains enough information to assign correct UIDs for almost all player-seasons covered by our season disk collection.

A small percentage of the time, perhaps 1%, there's a slight mismatch between the names and stats in the database being converted and the corresponding values in the lookup table. These players end up with the default UID of -1.

Some "dummy" players also had UIDs of -1 after the conversion. Most often, these were players who were on a roster even though they missed the real-life season with an injury. The team manager wanted to hold onto those players, and the dummy player records helped the commissioner track their status.

So I scrolled through the players in the Organizer, looking for guys with -1 in the UID column. When I found one, I double-clicked on his name to display his profile and see whether he was a real player or a dummy player.

If the player had stats for that season, I closed the profile, right-clicked on the player, and chose "Modify player > Player UID" from the popup menu to assign his UID.

I deleted players who did not have stats for that season. I could have assigned UIDs using the same method, but I didn't see the point of loading players with no stats into the encyclopedia.
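The decision rule described above is simple enough to write down. This Python sketch works on made-up player records. The field names and the triage function are hypothetical, and in practice the checking is done by hand in the Organizer.

```python
DEFAULT_UID = -1  # the value a player has when no UID was assigned


def triage(players):
    """Split players lacking a UID into those to fix and those to delete."""
    needs_uid, to_delete = [], []
    for player in players:
        if player["uid"] != DEFAULT_UID:
            continue  # already has a UID, nothing to do
        if player["has_stats"]:
            needs_uid.append(player["name"])  # real player: assign a UID
        else:
            to_delete.append(player["name"])  # dummy player: remove
    return needs_uid, to_delete


# Invented roster entries for illustration.
players = [
    {"name": "Player A", "uid": 5001, "has_stats": True},
    {"name": "Player B", "uid": -1, "has_stats": True},
    {"name": "Player C", "uid": -1, "has_stats": False},
]
needs_uid, to_delete = triage(players)
print("Assign UIDs to:", needs_uid)
print("Delete:", to_delete)
```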

Assigning team UIDs was a little trickier. As is the case in most leagues, as managers came and went, some teams changed names. We've had managers keep their teams but change the names anyway. We've had teams switch divisions and leagues between seasons.

One team, for example, used four different names in nine seasons, but because I made sure it had the same UID in each of these nine seasons, the encyclopedia treats it as one team.

It took about an hour and a half to review these nine seasons and make sure I had the team UIDs set up properly. Fortunately, only a few changes needed to be made, largely because our commissioner used DMB's Migrate command to set up the league each year, and that command carries team and player UIDs forward from season to season.

To be absolutely sure, however, I grabbed a sheet of paper and created a grid showing the transitions and name changes from year to year. In a few cases, especially when teams changed divisions, we had lost the continuity of the UIDs and I had to make some changes.
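A paper grid works fine, but the same layout can be sketched in Python. The team names and UIDs below are invented, not actual WIBL data, and the check at the end is only a rough heuristic: a UID that is missing from some seasons may point to lost continuity, but it could just as easily be an expansion team, so each one still needs a look by hand.

```python
# Hypothetical (season, team name, UID) entries.
teams = [
    (1995, "Anytown Aces", 101),
    (1996, "Anytown Aces", 101),
    (1997, "Big City Bears", 101),      # same franchise, new name
    (1995, "Rivertown Rockets", 102),
    (1996, "Rivertown Rockets", 102),
    (1997, "Rivertown Rockets", 250),   # UID changed: continuity lost?
]

seasons = sorted({season for season, _, _ in teams})

# One row per UID, one column per season.
grid = {}
for season, name, uid in teams:
    grid.setdefault(uid, {})[season] = name

for uid in sorted(grid):
    row = [grid[uid].get(season, "-") for season in seasons]
    print(uid, " | ".join(row))

# UIDs that do not appear in every season deserve a manual look.
suspect_uids = sorted(uid for uid, row in grid.items() if len(row) < len(seasons))
print("Check by hand:", suspect_uids)
```

In this made-up example, UIDs 102 and 250 are flagged, which is the cue to give the 1997 Rivertown team the same UID it had in the earlier seasons.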

As I said, it took only about an hour and a half to make sure the team UIDs were in good shape. Add in the time to assign missing UIDs for the organizations, leagues, parks, and players, and there was a grand total of two to three hours of ID work, or about 20 minutes per season.

That was all of the preparation that was needed. The final step was to fire up the DMB Encyclopedia, create a new empty encyclopedia, and import the nine seasons one at a time. In a few cases, the validation process identified UIDs I had missed, so I had to go back to DMB, fix them, and try again. But this process was very simple.

With nine seasons loaded, the encyclopedia consumes about 133 MB of disk space, and that's with only a few of the boxscores. On the other hand, our league has always had the game-by-game stats feature turned on, and those game-by-game records represent about 3/4 of the space taken. Fortunately, disk space is very cheap these days, and most PCs have plenty.

With all of the seasons loaded, I could move on to the fun part.

The first thing I did was generate the league standings report, cycling easily through the nine seasons to see how my team's fortunes rose and fell over the years.

Next, I generated the career batting and pitching leaders reports, starting with the entire database as the scope. I'll bet I spent an hour looking over the all-time WIBL leaders in batting average, hits, homers, RBI, wins, ERA, strikeouts, and dozens of other categories. Much of that time was spent searching (in vain, mostly) for guys on my team.

Then I changed the scope of these reports to "Players on a roster" so I could see the career leaders for my franchise. It was a great way for me to see which players had contributed the most to my club over the years, though I have to admit that some of those leaders had totals that were not all that impressive.

I also wanted to see the top single-season performances, so I generated the batting and pitching leaders reports for the range of years from 1995 to 2003. Even though I didn't see many of my players on those lists, it was fascinating to see all of the league's single-season records on a few pages.

Finally, I used the standard options to generate an entire web site for the WIBL. On a computer with a 1.6 GHz processor, it took a little over three minutes to generate 1,960 linked HTML reports, or an average of about 10 reports per second. Total space consumption was about 133 MB.

It's great to have all of those reports in one place -- everything from the career leaders for the organization to a league batting register for any season to the team pitching report for any team season to an individual boxscore.

On the other hand, 133 MB is a lot of space. If space were a concern, I could have pruned this tree by including only the organization reports and skipping all of the league reports, removing league- or team-level reports I wasn't interested in, customizing reports to exclude sections I didn't care about, or changing report options so fewer players would show up.

Encyclopedia fun, part 2

If you haven't already read part 1 above, this section will make more sense if you take a moment to read that first.

Through 2003, with a minimum of 2,000 career plate appearances, the all-time leaders in in-play batting average are:

     Ty Cobb             .362
     Derek Jeter         .361
     Bobby Abreu         .357
     Rogers Hornsby      .355
     Rod Carew           .353
     Joe Jackson         .351
     Billy Hamilton      .350
     Reggie Jefferson    .349
     Pete Browning       .347
     Ichiro Suzuki       .347

It's an interesting mix of guys who (a) played when parks were big and it was easier to hit 'em where they ain't than it is today, (b) played in the modern era but still managed to find the holes in the defense, and (c) used their great speed to beat out a bunch of infield hits.

One technical detail is worth mentioning. These averages were derived by subtracting homers and strikeouts from the overall totals for each player. This would be a little more precise if we could distinguish between inside-the-park homers and over-the-wall homers, since an inside-the-park job should count toward in-play average.
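To get a feel for the size of that effect, here is a small Python sketch. The career totals are invented, the function name is hypothetical, and the formula assumes balls in play are at-bats minus strikeouts minus over-the-wall homers. It shows how the average moves when some homers are reclassified as inside-the-park.

```python
def in_play_average(at_bats, hits, home_runs, strikeouts, inside_park_hr=0):
    """In-play average, optionally counting inside-the-park homers as in play."""
    # Only over-the-wall homers are removed from hits and balls in play.
    over_wall = home_runs - inside_park_hr
    balls_in_play = at_bats - strikeouts - over_wall
    return (hits - over_wall) / balls_in_play


# Hypothetical career totals, not a real player.
career = dict(at_bats=8000, hits=2600, home_runs=100, strikeouts=500)

all_homers_removed = in_play_average(**career)
with_inside_park = in_play_average(**career, inside_park_hr=30)

print(f"{all_homers_removed:.3f}")  # 2500 / 7400 -> 0.338
print(f"{with_inside_park:.3f}")    # 2530 / 7430 -> 0.341
```

Each reclassified homer adds one to both the hits and the balls in play, so it can only push the average up.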

Some old-time players would move up the list if we could do that. Then again, the old-timers already have an edge because they played in big parks and against weaker fielders, so I don't feel too sorry for them.

In any case, the next time you watch Derek Jeter beat out a grounder, sneak a ball through the infield, or line a base hit just out of the reach of a fielder, remember that nobody in the last three-quarters of a century has done it better.