I wanted to use the Opening Day MLB rosters for various season replays in my computer-based simulations, but I wasn’t able to locate this information in the usual places (Baseball-Reference, Retrosheet, etc.). I belatedly found out that a number of volunteers in the sabermetric community have already done this work, scouring box scores and transaction pages to cobble together this data. Additionally, some of the current computer baseball simulations such as Out of the Park, Action! PC and Digital Diamond Baseball include built-in functionality or utilities that let users incorporate Opening Day Rosters, As-Played Lineups and Real-Life Transactions into their replays. I also discovered the ATMgrforBBW@groups.io website which supports the Automatic Transaction Manager (ATMgr) developed by Gary Leven for APBA’s Baseball for Windows (BBW).
However, since I went to the trouble of writing an R script to facilitate my needs, I figured that I’d share the results with the community. The script essentially imports one or more years’ worth of Retrosheet event files and iterates through the data to extract the starters and substitutes from every game in the file. The code will also gather the game dates and relevant team information. Once the data is extracted, the resulting data table can be further sorted and manipulated to remove all but the first game in a given season for every player. You can export the results and take the first 25 players for each team as their Opening Day Roster. Individuals who do not wish to go through the entire process outlined below can still acquire the data as I’ve shared it via Google Sheets here ->
What is R?
The definition of the R programming language, as it relates to baseball data, was articulated in the book “Analyzing Baseball Data with R” by Max Marchi and Jim Albert. Quoting from the Preface of the first edition: “R is a system for statistical computation and graphics, and it is a computer language designed for typical and possibly specialized statistical and graphical applications.. The public availability of baseball data and the open-source R software is an attractive marriage. R provides a large range of tools for importing, arranging and organizing large datasets. By the use of built-in functions and collections of packages from the R user-community, one can perform various data and graphical analyses, and communicate this work easily to other baseball enthusiasts.”
Requirements:
You’ll need to download and install R 3.3.0+ and R Studio Desktop for Windows from the following site:
https://posit.co/download/rstudio-desktop/
The installation process for R and R Studio is beyond the scope of this article.
You will need to download the Retrosheet event files and extract them into a folder.
https://www.retrosheet.org/game.htm
You can choose to download the event files for individual seasons or, if you scroll down on the page, look for the section “Regular season event files by decade”. If you have ample available disk space, I suggest downloading the decade event files.
I extract the event files into C:\Retrosheet\Events on my computer. You can change the folder/subfolder location if you wish, but you’ll need to update any references to that folder in the R script.
The resulting files are exported to a folder called C:\OpeningDayRosters. Again, you can export to any folder that you choose, but you’ll need to modify that line of code in the R script.
Optional – merge the event files. Open a command prompt and execute the following commands:
copy 191*.ev* events_1910.csv
copy 192*.ev* events_1920.csv
copy 193*.ev* events_1930.csv
copy 194*.ev* events_1940.csv
copy 195*.ev* events_1950.csv
copy 196*.ev* events_1960.csv
copy 197*.ev* events_1970.csv
copy 198*.ev* events_1980.csv
copy 199*.ev* events_1990.csv
copy 200*.ev* events_2000.csv
copy 201*.ev* events_2010.csv
copy 202*.ev* events_2020.csv
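If you would rather perform the merge from within R, the same concatenation can be sketched roughly as follows. This is only an illustration: the helper name merge_event_files, its arguments and the file-name pattern are assumptions, and the paths in the example call should be adjusted to your own folders.

```r
# Hypothetical helper (not part of the original script): concatenate every
# event file for one decade into a single .csv, similar in spirit to
# "copy 202*.ev* events_2020.csv".
merge_event_files <- function(event_dir, decade_prefix, out_file) {
  # Match names such as 2021BOS.EVA or 2021ATL.EVN (case-insensitive)
  pattern <- paste0("^", decade_prefix, ".*\\.ev[a-z]$")
  files <- sort(list.files(event_dir, pattern = pattern,
                           ignore.case = TRUE, full.names = TRUE))
  # Read each file as plain lines and stack them in file-name order
  all_lines <- as.character(unlist(lapply(files, readLines, warn = FALSE)))
  writeLines(all_lines, out_file)
  invisible(length(files))   # number of files that were merged
}

# Example call (assumed paths):
# merge_event_files("C:/Retrosheet/Events", "202",
#                   "C:/Retrosheet/Events/events_2020.csv")
```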
You’ll need the R script – extract the .zip file into a C:\R_Scripts folder or alternate directory if you have another location that you’d prefer to house your R script files.
A closer look at Retrosheet Event Files
Please check out the following link for a detailed description of the Retrosheet event file contents and scoring system. https://www.retrosheet.org/eventfile.htm
Each game in an event file consists of multiple record types including:
id, version, info, start, sub, play, badj, padj, ladj, radj, presadj, data, com
We will use Jackie Robinson’s MLB debut for the Brooklyn Dodgers on April 15, 1947 as our sample event file. As you’ll see in my explanation of the script that I wrote to determine the Opening Day Rosters, I drop all of the record types except for id, info, start and sub very early in the process. However, you may wish to examine the Event files for play-by-play or other information. In that case, you’ll need to determine which records are pertinent to your project.
The ‘id’ row is fairly self-explanatory. The entry consists of a three-letter abbreviation for the home team followed by the year, month, day and game number (single game (0), first game (1) or second game (2) if a double-header was played on that date).
id,BRO194704150
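As a quick illustration of that layout, the game ID can be split into its pieces with substr. The helper name parse_game_id is hypothetical and is not used anywhere in the roster script.

```r
# Hypothetical helper: split a Retrosheet game ID into its three parts.
parse_game_id <- function(game_id) {
  data.frame(
    homeTeam = substr(game_id, 1, 3),                        # e.g. "BRO"
    date     = as.Date(substr(game_id, 4, 11),
                       format = "%Y%m%d"),                   # year, month, day
    number   = as.integer(substr(game_id, 12, 12)),          # 0, 1 or 2
    stringsAsFactors = FALSE
  )
}

parse_game_id("BRO194704150")
```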
You may safely ignore the ‘version’ row.
version,1
The ‘info’ records encompass approximately 30 rows of data but only a handful of rows are germane to the Opening Day Rosters script. I’m only interested in visteam, hometeam, date and number – the remaining rows will be discarded.
info,inputprogvers,"version 7RS(19) of 07/07/92"
info,visteam,BSN
info,hometeam,BRO
info,date,1947/04/15
info,site,NYC15
info,number,0
info,starttime,0:00PM
info,daynight,day
info,usedh,false
info,umphome,pineb101
info,ump1b,barla901
info,ump2b,(none)
info,ump3b,gorea901
info,scorer,"27,32"
info,translator,"Smith"
info,inputter,"Smith"
info,inputtime,1993/05/15 8:54PM
info,howscored,unknown
info,pitches,none
info,temp,0
info,winddir,unknown
info,windspeed,-1
info,fieldcond,unknown
info,precip,unknown
info,sky,unknown
info,timeofgame,146
info,attendance,26623
info,wp,gregh102
info,lp,sainj101
info,save,caseh101
info,gwrbi,
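To make the idea concrete, here is a small sketch that pulls just those four values out of a handful of the ‘info’ rows shown above. The variable names are assumptions made for the example; the roster script itself handles this inside its loop.

```r
# A few of the 'info' rows from the sample game
info_rows <- c("info,visteam,BSN", "info,hometeam,BRO",
               "info,date,1947/04/15", "info,site,NYC15", "info,number,0")

# Split on commas: field 2 is the key, field 3 is the value
parts <- strsplit(info_rows, ",", fixed = TRUE)
info  <- setNames(vapply(parts, `[`, character(1), 3),
                  vapply(parts, `[`, character(1), 2))

# Keep only the four keys that matter for the rosters
wanted <- info[c("visteam", "hometeam", "date", "number")]
wanted
```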
The ‘start’ rows consist of 18 to 20 lines of data depending on whether the designated hitter is present in the lineup for the particular game. Each row contains the player’s unique Retrosheet ID, full name, team designation (either ‘0’ for visiting or ‘1’ for home team), batting order position and fielding position. Substitutions or ‘sub’ entries are comprised of similar fields and they appear chronologically within the play-by-play rows.
start,culld101,"Dick Culler",0,1,6
start,hoppj102,"Johnny Hopp",0,2,8
start,mccom101,"Mike McCormick",0,3,9
start,ellib103,"Bob Elliott",0,4,5
start,litwd101,"Danny Litwhiler",0,5,7
start,torge101,"Earl Torgeson",0,6,3
start,masip101,"Phil Masi",0,7,2
start,ryanc102,"Connie Ryan",0,8,4
start,sainj101,"Johnny Sain",0,9,1
start,stane101,"Eddie Stanky",1,1,4
start,robij103,"Jackie Robinson",1,2,3
start,reisp101,"Pete Reiser",1,3,8
start,walkd101,"Dixie Walker",1,4,9
start,hermg101,"Gene Hermanski",1,5,7
start,edwab101,"Bruce Edwards",1,6,2
start,jorgs101,"Spider Jorgensen",1,7,5
start,reesp101,"Peewee Reese",1,8,6
start,hattj101,"Joe Hatten",1,9,1
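The ‘start’ and ‘sub’ layout can be checked with a short sketch that reads two records from this game into a data frame. The column names mirror the ones used later in the script; the ‘side’ column is an addition made purely for illustration.

```r
# Two sample records copied from the game above
records <- c('start,robij103,"Jackie Robinson",1,2,3',
             'sub,tatut101,"Tommy Tatum",1,4,12')

# With read.csv's default quote handling, the double quotes around the
# player name are stripped automatically
lineup <- read.csv(text = records, header = FALSE, stringsAsFactors = FALSE,
                   col.names = c("startsub", "retroID", "playerName",
                                 "visitorHome", "lineupPos", "fieldingPos"))

# Translate the 0/1 team flag into a readable label
lineup$side <- ifelse(lineup$visitorHome == 0, "visitor", "home")
lineup
```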
Noteworthy information regarding the game is recorded in the ‘com’ (comment) records.
com,"$Dodgers manager Leo Durocher suspended for the 1947 season for associating"
com,"with known gamblers; coach Clyde Sukeforth managed the team for the first"
com,"two games; debut for Jackie Robinson; debut for umpire Artie Gore"
Play-by-play events are registered in the order in which they occur during the contest. This data features the inning, team (visitor or home), Retrosheet player ID, the ball-strike count when the event occurred, pitch-by-pitch description (when available) and the play/event record such as a ‘K’ for a strikeout, ‘W’ for a walk or a ‘S7’ for a single to left field.
play,1,0,culld101,??,,53
play,1,0,hoppj102,??,,K
play,1,0,mccom101,??,,S8
play,1,0,ellib103,??,,WP.1-2
play,1,0,ellib103,??,,W
play,1,0,litwd101,??,,8/F8D
play,1,1,stane101,??,,43
play,1,1,robij103,??,,53
play,1,1,reisp101,??,,W
play,1,1,walkd101,??,,13
play,2,0,torge101,??,,W
play,2,0,masip101,??,,6
play,2,0,ryanc102,??,,46(1)3/GDP
play,2,1,hermg101,??,,4/P
play,2,1,edwab101,??,,8/F8D
play,2,1,jorgs101,??,,W
play,2,1,reesp101,??,,9/P
play,3,0,sainj101,??,,53
play,3,0,culld101,??,,63
play,3,0,hoppj102,??,,43
play,3,1,hattj101,??,,K
play,3,1,stane101,??,,43
play,3,1,robij103,??,,7
play,4,0,mccom101,??,,S7
play,4,0,ellib103,??,,S8.1-3
play,4,0,litwd101,??,,FC1.3XH(1);1-2;B-1
play,4,0,torge101,??,,K/C
play,4,0,masip101,??,,7/L78
play,4,1,reisp101,??,,W
play,4,1,walkd101,??,,43.1-2
play,4,1,hermg101,??,,S8.2-3
play,4,1,edwab101,??,,54(1)/FO.3-H
play,4,1,jorgs101,??,,43
play,5,0,ryanc102,??,,S8
play,5,0,sainj101,??,,3/SH.1-2
play,5,0,culld101,??,,53/SH.2-3
play,5,0,hoppj102,??,,S7.3-H
play,5,0,mccom101,??,,W.1-2
play,5,0,ellib103,??,,8/P
play,5,1,reesp101,??,,D7
play,5,1,hattj101,??,,S/BG.2-3
play,5,1,stane101,??,,4/L
play,5,1,robij103,??,,64(1)3/GDP
play,6,0,litwd101,??,,HP
play,6,0,torge101,??,,E2/TH/BG.1-2
play,6,0,masip101,??,,53/SH.1-2;2-3
play,6,0,ryanc102,??,,S7.3-H(UR);2-H(UR)
play,6,0,sainj101,??,,13/SH.1-2
play,6,0,culld101,??,,3/FL
play,6,1,reisp101,??,,S7
play,6,1,walkd101,??,,S9.1-3
play,6,1,hermg101,??,,NP
sub,tatut101,"Tommy Tatum",1,4,12
play,6,1,hermg101,??,,9
play,6,1,edwab101,??,,HP.1-2
play,6,1,jorgs101,??,,NP
sub,rackm101,"Marv Rackley",1,6,12
play,6,1,jorgs101,??,,43.1-2;2-3;3-H
play,6,1,reesp101,??,,IW
play,6,1,hattj101,??,,NP
sub,steve101,"Ed Stevens",1,9,11
play,6,1,steve101,??,,K
play,7,0,hoppj102,??,,NP
sub,tatut101,"Tommy Tatum",1,4,9
play,7,0,hoppj102,??,,NP
sub,bragb101,"Bobby Bragan",1,6,2
play,7,0,hoppj102,??,,NP
sub,gregh102,"Hal Gregg",1,9,1
play,7,0,hoppj102,??,,7
play,7,0,mccom101,??,,41
play,7,0,ellib103,??,,W
play,7,0,litwd101,??,,K
play,7,1,stane101,??,,NP
sub,roweb101,"Bama Rowell",0,5,7
play,7,1,stane101,??,,W
play,7,1,robij103,??,,E3/TH1/SH.1-3;B-2
com,"$the throw hit Jackie Robinson and caromed into RF"
play,7,1,reisp101,??,,D9.3-H;2-H(UR)
play,7,1,tatut101,??,,NP
sub,coopm101,"Mort Cooper",0,9,1
play,7,1,tatut101,??,,NP
sub,vauga101,"Arky Vaughan",1,4,11
play,7,1,vauga101,??,,13.2-3
play,7,1,hermg101,??,,8.3-H(UR)
play,7,1,bragb101,??,,63
play,8,0,torge101,??,,NP
sub,furic101,"Carl Furillo",1,4,9
play,8,0,torge101,??,,K
play,8,0,masip101,??,,3/P
play,8,0,ryanc102,??,,S7
play,8,0,coopm101,??,,NP
sub,neilt101,"Tommy Neill",0,9,11
play,8,0,neilt101,??,,HP.1-2
play,8,0,culld101,??,,NP
sub,holmt101,"Tommy Holmes",0,1,11
play,8,0,holmt101,??,,7
play,8,1,jorgs101,??,,NP
sub,sists101,"Sibby Sisti",0,1,6
play,8,1,jorgs101,??,,NP
sub,lanfw101,"Walt Lanfranconi",0,9,1
play,8,1,jorgs101,??,,43
play,8,1,reesp101,??,,K/C
play,8,1,gregh102,??,,K
play,9,0,hoppj102,??,,NP
sub,schuh101,"Howie Schultz",1,2,3
play,9,0,hoppj102,??,,6/L
play,9,0,mccom101,??,,S9
play,9,0,ellib103,??,,W.1-2
play,9,0,roweb101,??,,NP
sub,caseh101,"Hugh Casey",1,9,1
play,9,0,roweb101,??,,3/FL
play,9,0,torge101,??,,K
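If you do want to dig into the play-by-play, the sketch below shows one way to split a few of the ‘play’ records above into columns and isolate the basic play from its modifiers and runner advances. It is a toy example under the assumption that none of the fields contain embedded commas, and the column names are invented for the illustration.

```r
# Four play records copied from the game above
plays <- c("play,1,0,hoppj102,??,,K",
           "play,1,0,mccom101,??,,S8",
           "play,2,0,ryanc102,??,,46(1)3/GDP",
           "play,4,1,hermg101,??,,S8.2-3")

# Split each record on commas; each of these rows has exactly seven fields
fields <- do.call(rbind, strsplit(plays, ",", fixed = TRUE))

pbp <- data.frame(inning      = as.integer(fields[, 2]),
                  homeBatting = fields[, 3] == "1",   # 0 = visitor, 1 = home
                  batter      = fields[, 4],
                  event       = fields[, 7],
                  stringsAsFactors = FALSE)

# Keep only the basic play: drop modifiers after "/" and advances after "."
pbp$basicPlay <- sub("[/.].*$", "", pbp$event)
pbp
```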
The ‘data’ record is currently used to track earned runs allowed for each pitcher in the game. These rows are discarded by the Opening Day Roster script.
data,er,sainj101,3
data,er,coopm101,0
data,er,lanfw101,0
data,er,hattj101,1
data,er,gregh102,0
data,er,caseh101,0
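Although the roster script throws these rows away, they are easy to read on their own. Here is a brief sketch that loads the six ‘data’ rows above and totals the earned runs; the column names are assumptions made for the example.

```r
# The six 'data' rows from the sample game
data_rows <- c("data,er,sainj101,3", "data,er,coopm101,0",
               "data,er,lanfw101,0", "data,er,hattj101,1",
               "data,er,gregh102,0", "data,er,caseh101,0")

er <- read.csv(text = data_rows, header = FALSE, stringsAsFactors = FALSE,
               col.names = c("record", "type", "retroID", "earnedRuns"))

# Total earned runs charged in the game, and the pitcher charged with the most
total_er <- sum(er$earnedRuns)
most_er  <- er$retroID[which.max(er$earnedRuns)]
```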
Walking through the process
Here’s the code with some commentary along the way:
We need to utilize two frequently-used packages containing commands that are not included in the base-R language – dplyr and sqldf.
install.packages("dplyr")
install.packages("sqldf")
library(dplyr)
library(sqldf)
Next, we use setwd to change the working directory within R Studio to the location where you extracted the Retrosheet Event files and merged them into .csv files.
setwd("C:/retrosheet/events")
# change the events_####.csv to reference the decade event file
# that you wish to import
The read.csv command imports the designated .csv file into a dataframe named “lineups_subs”.
lineups_subs <- read.csv("events_2020.csv", header = FALSE, sep = ",",
col.names = c("startsub", "retroID",
"playerName", "visitorHome",
"lineupPos", "fieldingPos",
"unused1"), fill = TRUE, quote = "",
stringsAsFactors = FALSE)
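To see what that call produces without loading a whole decade, the same options can be applied to a few sample lines. In this sketch the ‘demo’ data frame and the four input lines are stand-ins for the real file. Rows with fewer than seven fields are padded thanks to fill = TRUE, and because quote = "" is set, the double quotes around player names are kept as ordinary characters (they are stripped later with gsub).

```r
sample_lines <- c('id,BRO194704150',
                  'info,visteam,BSN',
                  'start,robij103,"Jackie Robinson",1,2,3',
                  'play,1,1,robij103,??,,53')

demo <- read.csv(text = sample_lines, header = FALSE, sep = ",",
                 col.names = c("startsub", "retroID",
                               "playerName", "visitorHome",
                               "lineupPos", "fieldingPos",
                               "unused1"), fill = TRUE, quote = "",
                 stringsAsFactors = FALSE)

demo$playerName                      # quotes are still present at this point
gsub('"', "", demo$playerName[3], fixed = TRUE)   # how they can be removed
```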
As I noted earlier, the Retrosheet Event files include play-by-play and comments along with several other data points that are superfluous to this particular task. We’re using the subset command to remove the ‘play’, ‘com’ and ‘version’ data references first. Then we will perform another subset to remove references to umpires, weather, etc.
lineups_subs_filter2 <- subset(lineups_subs, startsub == 'info' |
startsub == 'start' | startsub == 'sub' |
startsub == 'id')
lineups_subs_filter <- subset(lineups_subs_filter2,
retroID != 'starttime' &
retroID != 'daynight' &
retroID != 'usedh' &
retroID != 'innings' &
retroID != 'tiebreaker' &
retroID != 'umphome' &
retroID != 'ump1b' &
retroID != 'ump2b' &
retroID != 'ump3b' &
retroID != 'umplf' &
retroID != 'umprf' &
retroID != 'inputtime' &
retroID != 'howscored' &
retroID != 'pitches' &
retroID != 'oscorer' &
retroID != 'temp' &
retroID != 'winddir' &
retroID != 'windspeed' &
retroID != 'fieldcond' &
retroID != 'precip' &
retroID != 'sky' &
retroID != 'timeofgame' &
retroID != 'attendance' &
retroID != 'wp' &
retroID != 'lp' &
retroID != 'save' )
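The long chain of != tests works, but the same idea can be written more compactly with %in%. The sketch below is an alternative, not the filter used in the script: instead of listing the ‘info’ keys to drop, it lists the four keys to keep, so it is stricter than the version above. The ‘toy’ data frame is a made-up stand-in with only the two columns the filter reads.

```r
# Toy stand-in for lineups_subs
toy <- data.frame(
  startsub = c("id", "info", "info", "info", "start", "play", "com"),
  retroID  = c("BRO194704150", "visteam", "temp", "date",
               "robij103", "1", "text"),
  stringsAsFactors = FALSE)

keep_types <- c("id", "info", "start", "sub")
keep_info  <- c("visteam", "hometeam", "date", "number")

# Keep id/start/sub rows outright; keep info rows only for the wanted keys
toy_filter <- subset(toy, startsub %in% keep_types &
                          (startsub != "info" | retroID %in% keep_info))
toy_filter
```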
We’re going to iterate through the ‘lineups_subs_filter’ dataframe, but first we need to do some housekeeping and create several new variables to help us keep track of everything.
Let’s create a brand new dataframe called ‘lineups_subs_temp’ while retaining the same structure as ‘lineups_subs_filter’. Note: we’re renaming the ‘unused1’ column to ‘gameID’ and adding 5 new columns to the dataframe – visTeam, HomeTeam, TeamYear, date and number.
lineups_subs_temp <- lineups_subs_filter
names(lineups_subs_temp)[names(lineups_subs_temp) == 'unused1'] <- 'gameID'
lineups_subs_temp[,'visTeam'] = NA
lineups_subs_temp[,'HomeTeam'] = NA
lineups_subs_temp[,'TeamYear'] = NA
lineups_subs_temp[,'date'] = NA
lineups_subs_temp[,'number'] = NA
lineups_subs_temp <- lineups_subs_temp[0, ]
Here are the new variables for tracking information as the for-loop processes. We will track the number of rows in the new dataframe using ‘tempcounter’. The other variables are updated when the loop processes a row in ‘lineups_subs_filter’ that matches exactly on the corresponding information.
tempcounter = 1
currentID <- ""
currentVisTeam <- ""
currentHomeTeam <- ""
currentTeamYear <- ""
currentDate <- ""
currentNumber <- "0" # game number: '0' for a single game,
# '1' for the first game of a double-header,
# '2' for the second game of a double-header
This is where the majority of the magic happens! The variable ‘i’ counts the rows as we search through the lineups_subs_filter dataframe. Then we scan the contents of the current row (based on the value of ‘i’) using several ifelse statements to populate the currentID, currentVisTeam, currentHomeTeam, currentDate and currentNumber variables.
for(i in 1:nrow(lineups_subs_filter)) { # for-loop over rows
currentID <- ifelse(lineups_subs_filter[i,1] == "id", lineups_subs_filter[i,2], currentID)
currentVisTeam <- ifelse(lineups_subs_filter[i,1] == "info" & lineups_subs_filter[i,2] == "visteam", lineups_subs_filter[i,3], currentVisTeam)
currentHomeTeam <- ifelse(lineups_subs_filter[i,1] == "info" & lineups_subs_filter[i,2] == "hometeam", lineups_subs_filter[i,3], currentHomeTeam)
currentDate <- ifelse(lineups_subs_filter[i,1] == "info" & lineups_subs_filter[i,2] == "date", lineups_subs_filter[i,3], currentDate)
currentNumber <- ifelse(lineups_subs_filter[i,1] == "info" & lineups_subs_filter[i,2] == "number", lineups_subs_filter[i,3], currentNumber)
The next set of statements only fires if the value of the first column in the current row is equal to “start” or “sub”. When the statement is true, we create a record in the new dataframe (lineups_subs_temp) containing the following fields:
date, number, visitor, home, player_Team (matching the player’s team based on status of visitorHome field – 0 or 1), startSub, retroID, playerName, lineupPos, fieldingPos
if(lineups_subs_filter[i,1] == "start" | lineups_subs_filter[i,1] == "sub") {
lineups_subs_temp[tempcounter , 1] <- lineups_subs_filter[i,1]
lineups_subs_temp[tempcounter , 2] <- lineups_subs_filter[i,2]
lineups_subs_temp[tempcounter , 3] <- gsub('["]', '',
lineups_subs_filter[i,3])
lineups_subs_temp[tempcounter , 4] <- lineups_subs_filter[i,4]
lineups_subs_temp[tempcounter , 5] <- lineups_subs_filter[i,5]
lineups_subs_temp[tempcounter , 6] <- lineups_subs_filter[i,6]
lineups_subs_temp[tempcounter , 7] <- currentID
lineups_subs_temp[tempcounter , 8] <- currentVisTeam
lineups_subs_temp[tempcounter , 9] <- currentHomeTeam
We confirm whether the current player is a member of the visiting or home team and assign the “TeamYear” accordingly (e.g. CAL_1983, SEA_2001). We use the substr command to extract the year from the currentDate variable.
currentTeamYear <- ifelse(lineups_subs_filter[i,4] == "0",
paste(currentVisTeam,
substr(currentDate,1,4),sep="_"),
paste(currentHomeTeam,
substr(currentDate,1,4),sep="_"))
lineups_subs_temp[tempcounter , 10] <- currentTeamYear
lineups_subs_temp[tempcounter , 11] <- currentDate
lineups_subs_temp[tempcounter , 12] <- currentNumber
tempcounter <- tempcounter + 1
}
} # end of for..loop
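Because the row-by-row loop can be slow on large files, here is a sketch of one possible vectorized alternative. It is not the script itself: the helper per_game and the toy input are assumptions, only a subset of the output columns is built, and unlike the loop it leaves NA (rather than the previous game’s value) when a game is missing an ‘info’ row.

```r
# Toy input in the same column layout as lineups_subs_filter
toy <- data.frame(
  startsub    = c("id", "info", "info", "info", "info", "start", "sub"),
  retroID     = c("BRO194704150", "visteam", "hometeam", "date", "number",
                  "culld101", "tatut101"),
  playerName  = c("", "BSN", "BRO", "1947/04/15", "0",
                  "\"Dick Culler\"", "\"Tommy Tatum\""),
  visitorHome = c("", "", "", "", "", "0", "1"),
  stringsAsFactors = FALSE)

# Each 'id' row starts a new game, so a running count labels every row
game <- cumsum(toy$startsub == "id")

# Hypothetical helper: find one value per game and spread it to all rows
per_game <- function(is_key, value) {
  lookup <- setNames(value[is_key], game[is_key])
  unname(lookup[as.character(game)])
}

is_info  <- toy$startsub == "info"
gameID   <- per_game(toy$startsub == "id", toy$retroID)
visTeam  <- per_game(is_info & toy$retroID == "visteam",  toy$playerName)
homeTeam <- per_game(is_info & toy$retroID == "hometeam", toy$playerName)
gameDate <- per_game(is_info & toy$retroID == "date",     toy$playerName)

# Keep only the player rows and attach the per-game values
p <- toy$startsub %in% c("start", "sub")
result <- data.frame(
  startsub   = toy$startsub[p],
  retroID    = toy$retroID[p],
  playerName = gsub('"', "", toy$playerName[p], fixed = TRUE),
  gameID     = gameID[p],
  TeamYear   = paste(ifelse(toy$visitorHome[p] == "0",
                            visTeam[p], homeTeam[p]),
                     substr(gameDate[p], 1, 4), sep = "_"),
  date       = gameDate[p],
  stringsAsFactors = FALSE)
result
```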
Once the loop completes (this can take several hours if you combined multiple event files as I did), we execute a sqldf statement to generate a new dataframe called OpeningDayRosters_temp. This dataframe will be sorted in ascending order by TeamYear, retroID, date, number.
OpeningDayRosters_temp <- sqldf("SELECT * FROM lineups_subs_temp
ORDER BY TeamYear ASC, retroID ASC,
date ASC, number ASC")
Next, we create new columns for PlayerTeamYear and Year to assist with the sorting process.
OpeningDayRosters_temp$PlayerTeamYear <-
paste(OpeningDayRosters_temp$retroID,
OpeningDayRosters_temp$TeamYear, sep="_")
OpeningDayRosters_temp$year <- substr(OpeningDayRosters_temp$gameID,4,7)
This is where the dplyr library comes into play. We’re utilizing the group_by, mutate and slice commands. First, we group the rows by PlayerTeamYear. Then we use the mutate command to generate a new variable ‘gameNumber’ that contains each row’s rank based on the date. Slice with the 1:1 parameter keeps only the first row in each group; because the rows were already sorted by date and game number, that row is the player’s earliest appearance for that team in that season. The resulting rows are placed into the OpeningDayRosters dataframe. In essence, we’re eliminating every game that a player participated in besides their first game of a given season.
OpeningDayRosters <- OpeningDayRosters_temp %>%
group_by(PlayerTeamYear) %>%
mutate(gameNumber = rank(date)) %>%
slice(1:1)
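For readers who want to see the “first appearance only” step in isolation, the same result can be sketched in base R with order and duplicated. This is an alternative to the dplyr pipeline above, shown on made-up rows rather than real game data.

```r
# Made-up rows: one player with three appearances, another with one
toy <- data.frame(
  PlayerTeamYear = c("robij103_BRO_1947", "robij103_BRO_1947",
                     "furic101_BRO_1947", "robij103_BRO_1947"),
  date   = c("1947/04/17", "1947/04/15", "1947/04/15", "1947/04/15"),
  number = c("0", "0", "0", "0"),
  stringsAsFactors = FALSE)

# Sort by player-team-season, then by date and game number
toy <- toy[order(toy$PlayerTeamYear, toy$date, toy$number), ]

# duplicated() flags every row after the first one within each key,
# so negating it keeps each player's earliest row
first_games <- toy[!duplicated(toy$PlayerTeamYear), ]
first_games
```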
Here we use sqldf again, creating a new dataframe called OpeningDayRosters25 which will be sorted by year, TeamYear and date.
OpeningDayRosters25 <- sqldf("SELECT * FROM OpeningDayRosters
ORDER BY year ASC, TeamYear ASC, date ASC")
Let’s change the working directory in R Studio to the C:\OpeningDayRosters subfolder. The final step is to write the dataframe out to a .csv (comma-separated values) file that you can view in a text editor or edit in a spreadsheet or database application such as Microsoft Excel, Microsoft Access, etc.
setwd("C:/OpeningDayRosters")
# uncomment the next line if you want to export all of the start/sub data for
# a given season or decade
# write.csv(OpeningDayRosters_temp,"OpeningDayRosters_All_2020.csv")
write.csv(OpeningDayRosters25,"OpeningDayRosters_first25_2020.csv")
Results
Here’s an example of the resulting output using the 1983 California Angels. The top 25 rows (from Juan Beniquez through Joe Ferguson) would be considered the Opening Day roster.
TeamYear | playerName | date | startsub | retroID | visHome | lineup | fieldPos |
CAL_1983 | Juan Beniquez | 4/5/1983 | sub | benij001 | 1 | 1 | 7 |
CAL_1983 | Bob Boone | 4/5/1983 | start | boonb001 | 1 | 9 | 2 |
CAL_1983 | Bob Clark | 4/5/1983 | start | clarb002 | 1 | 2 | 9 |
CAL_1983 | Doug DeCinces | 4/5/1983 | start | decid001 | 1 | 6 | 5 |
CAL_1983 | Brian Downing | 4/5/1983 | start | downb001 | 1 | 1 | 7 |
CAL_1983 | Tim Foli | 4/5/1983 | start | folit001 | 1 | 8 | 6 |
CAL_1983 | Bobby Grich | 4/5/1983 | start | gricb001 | 1 | 7 | 4 |
CAL_1983 | Andy Hassler | 4/5/1983 | sub | hassa001 | 1 | 0 | 1 |
CAL_1983 | Reggie Jackson | 4/5/1983 | start | jackr001 | 1 | 4 | 10 |
CAL_1983 | Bruce Kison | 4/5/1983 | start | kisob001 | 1 | 0 | 1 |
CAL_1983 | Fred Lynn | 4/5/1983 | start | lynnf001 | 1 | 5 | 8 |
CAL_1983 | Luis Sanchez | 4/5/1983 | sub | sancl001 | 1 | 0 | 1 |
CAL_1983 | Daryl Sconiers | 4/5/1983 | start | scond001 | 1 | 3 | 3 |
CAL_1983 | Ron Jackson | 4/6/1983 | start | jackr002 | 1 | 6 | 3 |
CAL_1983 | Tommy John | 4/6/1983 | start | johnt001 | 1 | 0 | 1 |
CAL_1983 | Ricky Adams | 4/7/1983 | sub | adamr001 | 1 | 4 | 5 |
CAL_1983 | Rod Carew | 4/7/1983 | sub | carer001 | 1 | 8 | 11 |
CAL_1983 | Dave Goltz | 4/7/1983 | sub | goltd101 | 1 | 0 | 1 |
CAL_1983 | Rob Wilfong | 4/7/1983 | sub | wilfr001 | 1 | 8 | 12 |
CAL_1983 | Mike Witt | 4/7/1983 | start | wittm001 | 1 | 0 | 1 |
CAL_1983 | Doug Corbett | 4/8/1983 | sub | corbd001 | 0 | 0 | 1 |
CAL_1983 | Geoff Zahn | 4/8/1983 | start | zahng001 | 0 | 0 | 1 |
CAL_1983 | Ken Forsch | 4/9/1983 | start | forsk001 | 0 | 0 | 1 |
CAL_1983 | Jack Curtis | 4/10/1983 | sub | curtj001 | 0 | 0 | 1 |
CAL_1983 | Joe Ferguson | 4/11/1983 | start | fergj101 | 1 | 8 | 2 |
CAL_1983 | Ellis Valentine | 5/6/1983 | sub | valee001 | 0 | 6 | 9 |
CAL_1983 | Bill Travers | 5/10/1983 | start | travb101 | 0 | 0 | 1 |
CAL_1983 | Byron McLaughlin | 6/7/1983 | sub | mclab102 | 1 | 0 | 1 |
CAL_1983 | Curt Brown | 6/10/1983 | sub | browc001 | 1 | 0 | 1 |
CAL_1983 | Mike O’Berry | 6/22/1983 | start | oberm001 | 0 | 9 | 2 |
CAL_1983 | Rick Burleson | 6/30/1983 | start | burlr001 | 1 | 1 | 6 |
CAL_1983 | Steve Lubratich | 7/20/1983 | start | lubrs101 | 1 | 2 | 5 |
CAL_1983 | Mike Brown | 7/21/1983 | start | browm002 | 1 | 7 | 9 |
CAL_1983 | Rick Steirer | 7/22/1983 | sub | steir001 | 1 | 0 | 1 |
CAL_1983 | Steve Brown | 8/1/1983 | start | brows001 | 0 | 0 | 1 |
CAL_1983 | Bob Lacey | 9/8/1983 | sub | laceb001 | 0 | 0 | 1 |
CAL_1983 | Jerry Narron | 9/8/1983 | sub | narrj001 | 0 | 6 | 11 |
CAL_1983 | Gary Pettis | 9/8/1983 | start | pettg001 | 0 | 1 | 9 |
CAL_1983 | Dick Schofield | 9/8/1983 | sub | schod001 | 0 | 9 | 6 |
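Trimming the output to the first 25 rows per team can also be done in R rather than by hand. The following is a minimal sketch, assuming the data is already sorted by team and date as above; the ‘toy’ data frame and the ‘rosterSlot’ column are invented for the example.

```r
# Made-up stand-in for the sorted output: 30 players on one team-season,
# 3 on another
toy <- data.frame(
  TeamYear = c(rep("AAA_1983", 30), rep("BBB_1983", 3)),
  retroID  = sprintf("play%03d", 1:33),
  stringsAsFactors = FALSE)

# Number the rows 1, 2, 3, ... within each team-season
toy$rosterSlot <- ave(seq_along(toy$TeamYear), toy$TeamYear, FUN = seq_along)

# Keep at most the first 25 rows for every team-season
first25 <- toy[toy$rosterSlot <= 25, ]
table(first25$TeamYear)
```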
Again, the entire data set is available here:
I would encourage you to venture forward with R or the programming language of your choice. You may wish to utilize this script in its original form or as a basis to mine the event files further as you seek answers within the play-by-play and other records contained in this amazing data set!
References and Resources
Adler, Joseph. Baseball Hacks: Tips & Tools for Analyzing and Winning with Statistics. Sebastopol, CA: O’Reilly Media, 2006. Print.
Baseball-Reference. Web. < http://www.baseball-reference.com >.
Marchi, Max and Albert, Jim. Analyzing Baseball Data with R. Boca Raton, FL: CRC Press, 2014. Print.
Retrosheet. Web. < http://www.retrosheet.org >.
The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at “www.retrosheet.org”.
About the Author
I am a New Jersey native with a passion for baseball, statistics, computers and video games, and I enjoy spending quality time with my family.
“Hardball Architects – Volume 1 (American League)”, published in July 2020, and “Hardball Architects – Volume 2 (National League)”, published in April 2022, examine the trades, free agent acquisitions, draft picks and other transactions for the 30 Major League Baseball franchises, divided into a 2-volume set. Both books are available in paperback and digital (Kindle) format at Amazon.com. All key moves are scrutinized for every team and Sabermetric principles are applied to the roster construction throughout the lifetime of the organization to encapsulate the hits and misses by front office executives. Team performances are analyzed based on transaction type with graphs depicting the WAR (Wins Above Replacement) in every decade. Individual results for each player-transaction are charted over the duration of their stint with the franchise. Every team chapter includes All-Time Rosters and Single-Season Leaders based on transaction type. The Team Trade Record chronicles the WAR and WS (Win Shares) accumulated by players acquired in comparison to those traded to opposing teams. The opening chapter is devoted to the Evolution of the General Manager and incorporates a discussion with former Dodgers GM Fred Claire (along with former Angels and Red Sox GM Mike Port and current Reds GM Nick Krall in Volume 2) on a variety of front office topics.
“Hardball Retroactive”, published in June 2018, is available in paperback and digital (Kindle) format at Amazon.com. Hardball Retroactive is a modest collection of selected articles that I have written for Seamheads.com along with my Baseball Analytics blog since 2010. Exclusive content includes the chapter on “Minors vs. Majors” which assesses every franchise’s minor league successes and failures in relation to their major league operations.
“Hardball Retrospective” is available in paperback and digital (Kindle) format at Amazon.com. Supplemental Statistics, Charts and Graphs along with a discussion forum are offered at TuataraSoftware.com. In Hardball Retrospective, I placed every ballplayer in the modern era (from 1901-present) on their original teams. Using a variety of advanced statistics and methods, I generated revised standings for each season based entirely on the performance of each team’s “original” players. I discuss every team’s “original” players and seasons at length along with organizational performance with respect to the Amateur Draft (or First-Year Player Draft), amateur free agent signings and other methods of player acquisition. Season standings, WAR and Win Shares totals for the “original” teams are compared against the real-time or “actual” team results to assess each franchise’s scouting, development and general management skills.
Don Daglow (Intellivision World Series Major League Baseball, Earl Weaver Baseball, Tony LaRussa Baseball) contributed the foreword for Hardball Retrospective. The foreword and preview of my book are accessible here.
“Hardball Retrospective – Addendum 2014 to 2016” supplements my research for Hardball Retrospective, providing retroactive standings based on Wins Above Replacement (WAR) and Win Shares (WS) for each “original” team over the past three seasons (2014-2016). Team totals from 2010 – 2013 are included for reference purposes. “Addendum” is available in paperback and digital (Kindle) format at Amazon.com.
A lifelong resident of central New Jersey, I enjoy spending quality time with my wife and three children. In my professional life I’ve worked for three local healthcare systems as a server and network administrator over the last 30 years. I have served as co-chair of the SABR Games and Simulations Committee (https://sabrbaseballgaming.com) since August 2022.
My hobbies include baseball, statistics, computers and video games along with freshwater fishing. I have authored five books and contributed articles to Seamheads, Fangraphs and my site, Baseball Analytics. Follow my HardballRetro channels on Twitch for live-streaming of classic and current baseball video games and view the resulting playthrough videos on YouTube!
Visit my Amazon author page to check out my books, promotional videos, and post a review if you're a Hardball Retro fan!
https://www.amazon.com/author/derekbain
My Books:
“Hardball Retro’s Compendium of Baseball Video Games and Electronic Handhelds”, published in September 2024 with co-author John Racanelli, is available in paperback and digital (Kindle) format at Amazon.com.
“Hardball Architects – Volume 1 (American League Teams)”, published in July 2020, is available in paperback and digital (Kindle) format at Amazon.com.
“Hardball Architects – Volume 2 (National League Teams)”, published in April 2022, is available in paperback and digital (Kindle) format at Amazon.com.
“Hardball Architects” examines the trades, free agent acquisitions, draft picks and other transactions for the 30 Major League Baseball franchises, divided into a 2-volume set (American League and National League). All key moves are scrutinized for every team and Sabermetric principles are applied to the roster construction throughout the lifetime of the organization to encapsulate the hits and misses by front office executives.
“Hardball Retroactive”, published in June 2018, is available in paperback and digital (Kindle) format at Amazon.com. A cross-section of essays that I penned for Seamheads.com along with my Baseball Analytics blog spanning nearly a decade touching on subjects including "Taking the Extra Base", "General Manager Scorecard", "Worst Trades", "BABIP By Location" and "Baseball Birthplaces and the Retro World Baseball Classic". Rediscover your favorite hardball arcade and simulations in "Play Retro Baseball Video Games In Your Browser" or take a deep dive into every franchise's minor league successes and failures in relation to their major league operations in "Minors vs. Majors".
Contact me on BlueSky - @hardballretro.bsky.social
Derek,
This reminds me of what I did for finding opening day batting orders. Teams use so many different orders during any one season, I wondered which one would be the team’s #1 choice. I figured out how to find the order that was used most often, but even that is used much less often than one might expect.
I decided that the order used on Opening Day would, more often than not, be the one that a team would like to use as often as possible throughout the season.
The lineups are available in retrosheet game logs (separate from the event files). I store all retrosheet data in MySQL tables so I just had to write a query to find the orders used in the first game of each season. I used the opening day order to compare against the simulated run production from all other possible combinations.
As I store each year in a separate table, I then had to write a PERL script to loop through the seasons.
This (and last night’s meeting) has given me a couple of ideas about simulations that I am going to start working on. If I can find a break in golf season, I could whip up a post.
Cheers,
-Tim