Visualizing Statcast Pitching Data (Part II)

A brand-new coding tutorial showing you how to pull Statcast data and create pitch movement plots

Feb 23, 2024

Yesterday, we posted a few visuals powered by Statcast data, created in R. Today, we’ll dive into the code behind those visuals. If coding and working with data are of no interest to you, feel free to sit this one out.

Important note: The tutorial posts assume that the reader has at least a basic understanding of R. If you've never written a line of code or worked with baseball data, there are many excellent introductory resources. R for Data Science by Hadley Wickham & Garrett Grolemund is a great place to start. It covers everything you'll need to know to begin working with data, performing analysis, and building visualizations.

Project Setup & Data Acquisition

We’ll use a few different libraries to work with and plot our data, so if you don’t already have all of the packages used in this tutorial, install them on your machine before you move on. The code below requests Statcast data one day at a time for a whole season, so it may take a while to finish. Be patient.
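If you are not sure which packages you already have, a quick check like the one below can help. This is a minimal sketch: missing_packages() is a helper name made up for this example, and the commented-out install line assumes all four packages can be installed from CRAN on your setup.

```r
# Hypothetical helper (not part of any package): return the packages
# from `pkgs` that are not installed in the current R library.
missing_packages <- function(pkgs) {
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!is_installed]
}

needed <- c("tidyverse", "baseballr", "mlbplotR", "lubridate")
to_install <- missing_packages(needed)
print(to_install)

# Uncomment to install whatever is missing (assumes CRAN availability)
# if (length(to_install) > 0) install.packages(to_install)
```

An empty result means you are ready to load the libraries.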

# load libraries to be used
library(tidyverse)
library(baseballr)
library(mlbplotR)
library(lubridate)

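# season to pull (year_run was not defined in the original snippet;
# 2023 is an assumed example value, change it to the season you want)
year_run <- 2023

# sequence of dates to use in statcast_search()
# (widen the range if the season you pull starts in March or runs into October)

all_dates <- tibble(
  date_val = seq(ymd(paste0(year_run, "-04-01")), ymd(paste0(year_run, "-09-30")), by = "1 day")
)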

# create empty list to be filled with search results

n <- nrow(all_dates)
date_list <- list()

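# loop over date sequence and acquire statcast data in small portions

for (i in seq_len(n)) {

  date_pk <- all_dates$date_val[i]

  # progress message: which date we are on and how far along we are
  date_info <- paste0(
    sprintf("date number %i of %i (%.2f %%) ", i, n, i / n * 100.0),
    date_pk
  )

  cat(date_info, sep = "\n")

  # try() keeps one failed request from stopping the whole loop
  df <- try(statcast_search(start_date = date_pk,
                            end_date = date_pk,
                            player_type = 'pitcher'))

  # && short-circuits, so nrow() is only checked when df is a data frame
  # (a failed try() returns a "try-error" object, not a data frame)
  if (is.data.frame(df) && nrow(df) > 0) {

    date_list[[i]] <- df

  }

}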

# now do a little data cleanup
statcast_data <- bind_rows(date_list) |> # combine list of dataframes
  filter(
    !pitch_type %in% c("FA", "PO", "EP", "")
  ) |> # drop some unwanted pitch types
  mutate(
    pfx_x_adj = if_else(p_throws == "R", -pfx_x, pfx_x),
    pitching_team = if_else(inning_topbot == "Top", home_team, away_team)
  ) # adjust horizontal movement for handedness, create a pitching team variable
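
To sanity-check the cleanup logic without waiting on a full download, you can run the same filter() and mutate() steps on a few hand-made rows. The sketch below uses invented values: the toy tibble is not real Statcast output, and NYY and BOS are placeholders. Only the column names match the ones used above.

```r
library(dplyr)

# Invented rows that mimic the Statcast columns used in the cleanup step
toy <- tibble(
  pitch_type    = c("FF", "SL", "PO", ""),
  p_throws      = c("R", "L", "R", "L"),
  pfx_x         = c(-0.8, 0.5, 0.1, 0.2),
  inning_topbot = c("Top", "Bot", "Top", "Bot"),
  home_team     = "NYY",
  away_team     = "BOS"
)

toy_clean <- toy |>
  filter(!pitch_type %in% c("FA", "PO", "EP", "")) |> # drops the "PO" and "" rows
  mutate(
    pfx_x_adj     = if_else(p_throws == "R", -pfx_x, pfx_x),
    pitching_team = if_else(inning_topbot == "Top", home_team, away_team)
  )

print(toy_clean)
```

Two rows survive the filter. The right-hander's pfx_x is sign-flipped while the left-hander's is left alone, which puts both hands on a common horizontal scale. The home team pitches in the top of an inning, which is why "Top" maps to home_team.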

Movement Plot(s)

To make it easy to create movement plots for whichever player you wish, it’s best practice to wrap the plotting code in a function. That’s what we’ll do, and we’ll call it movement_plot().
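
As a rough illustration of the idea, here is a minimal sketch of what a function like this could look like. It is not this post's implementation, and everything in it is an assumption: the argument names, the use of the player_name column to pick a pitcher, the styling, and the conversion of pfx_x_adj and pfx_z from feet to inches. The toy rows at the bottom are invented so the example runs on its own.

```r
library(dplyr)
library(ggplot2)

# Sketch only: argument names and styling are assumptions.
# `data` needs the columns player_name, pitch_type, pfx_x_adj and pfx_z.
movement_plot <- function(data, pitcher_name) {
  pitches <- data |>
    # .env$ makes sure we compare against the argument, not a column
    filter(player_name == .env$pitcher_name) |>
    mutate(
      hb_in = pfx_x_adj * 12, # Statcast reports movement in feet
      vb_in = pfx_z * 12
    )

  ggplot(pitches, aes(x = hb_in, y = vb_in, color = pitch_type)) +
    geom_hline(yintercept = 0, linetype = "dashed") +
    geom_vline(xintercept = 0, linetype = "dashed") +
    geom_point(alpha = 0.6) +
    coord_fixed() +
    labs(
      title = pitcher_name,
      x = "Horizontal movement (in.)",
      y = "Vertical movement (in.)",
      color = "Pitch type"
    ) +
    theme_minimal()
}

# Invented rows for illustration, not real Statcast output
toy <- tibble(
  player_name = c("Pitcher, A", "Pitcher, A", "Pitcher, B"),
  pitch_type  = c("FF", "SL", "FF"),
  pfx_x_adj   = c(0.8, -0.4, 0.6),
  pfx_z       = c(1.4, 0.1, 1.2)
)

p <- movement_plot(toy, "Pitcher, A")
```

With the real data you would pass statcast_data and a name exactly as it appears in player_name. Wrapping the plot in a function means a new pitcher is a one-line call instead of a copy of the whole plotting block.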
