Down on the Farm

How To: Scraping MVP Results & WAR Data

Scraping historical MVP results data and plotting it by wins above replacement

Scott & Josh
Nov 11, 2022

Good afternoon! Today’s tutorial walks through how to make the plots from yesterday’s post on historical MVP results and WAR. The tutorial posts will be for premium subscribers only, but we’ll occasionally post them for everyone, like we did a few weeks ago. You can find that post here.

If you are a premium subscriber and don’t care about R or coding, you can skip past this section and go right to the updates from the Arizona Fall League last night by clicking HERE.

Important note: The tutorial posts are going to assume that the reader has at least a basic understanding of R. If you’ve never written a line of code or worked with baseball data, there are many great introductory resources out there. A great place to start is R for Data Science by Hadley Wickham & Garrett Grolemund. It covers everything you’ll need to know to begin working with data, performing analysis, and building visualizations. RStudio also puts out many great resources like this guide for getting started in R.


Scraping MVP Voting & WAR Data from Baseball Reference

We aren’t going to go through the entire post from yesterday line-by-line, but we’ll get you started with some of the code to scrape the data from Baseball-Reference and pop it in some tables and plots.

# load packages needed
library(tidyverse)
library(rvest)
library(janitor)
library(data.table)
library(stringr)
library(gt)
library(gtExtras)
library(ggrepel)
library(prismatic)

# set theme for plot
theme_scott <- function() {

  # start from theme_minimal and override a few elements
  theme_minimal(base_size = 9) %+replace%
    theme(
      panel.grid.minor = element_blank(),
      plot.background = element_rect(fill = "#F9F9F9", color = "#F9F9F9")
    )
}

The first thing we need to do is get our data. We’re going to build a little web scraper using the rvest package that grabs some data from Baseball-Reference.com. We’ll show you how to write a function that will scrape the data, clean it, then combine it by year. But first we’ll break it down in small steps.

# set the year we want for the MVP voting
year = 2021

# create URL needed to grab the results, inserting in the year variable
url = paste0('https://www.baseball-reference.com/awards/awards_', year, '.shtml#AL_MVP_voting_link')

# get the results for MVP voting from the AL in 2021
al = url %>% 
   read_html() %>%
   html_nodes("#AL_MVP_voting") %>% 
   html_table()
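
One thing to watch for: html_table() on a set of nodes returns a list of tables, so al above is a list holding one data frame, not the data frame itself. Here is a rough sketch of how the steps so far might be wrapped into small helpers on the way to a by-year function. This is not the function from the full post: mvp_url() and mvp_table() are hypothetical names we made up for illustration, and the toy HTML uses placeholder values (not real voting results) so the example runs without hitting the site.

```r
library(rvest)

# hypothetical helper: build the awards page URL for a given year
mvp_url <- function(year) {
  paste0("https://www.baseball-reference.com/awards/awards_", year, ".shtml")
}

# hypothetical helper: pull one table out of an already-parsed page
mvp_table <- function(page, table_id = "#AL_MVP_voting") {
  tables <- page %>%
    html_nodes(table_id) %>%
    html_table()
  # html_table() returns a list here, so grab the first element
  tables[[1]]
}

# toy page standing in for the real one (placeholder values only)
toy_page <- minimal_html('
  <table id="AL_MVP_voting">
    <tr><th>Rank</th><th>Name</th><th>WAR</th></tr>
    <tr><td>1</td><td>Player A</td><td>9.0</td></tr>
    <tr><td>2</td><td>Player B</td><td>6.5</td></tr>
  </table>
')

# one URL per year, ready to loop over later
urls <- vapply(2019:2021, mvp_url, character(1))

# a single data frame instead of a list
al_toy <- mvp_table(toy_page)
```

Against the live site, mvp_table(read_html(mvp_url(2021))) should give the same thing as al[[1]], assuming the page layout still matches the selector. If you loop over many years, it is probably worth adding a Sys.sleep() between requests.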
