MSDS 692 - Practicum Project
Author: Pat Kennedy
What effect does an MLB team’s payroll have on the team’s success and output on the field?
- Are there correlations with:
  - Winning percentage?
  - Team statistics?
    - Hitting
    - Pitching
  - Postseason?
Two datasets:
- Historical Dataset: MLB_HistoricalPayroll.csv
  - 2019 – 2023
  - 1st half of dataset: Financial Data (Payroll Information & Ranking)
  - 2nd half of dataset: Performance Data (Win %, batting stats, pitching stats, Postseason - Y/N)
  - Source: https://www.mlb.com/stats/team
- Future Dataset: MLB_FuturePayroll.csv
  - 2024 Financial Data
  - Source: https://www.mlb.com/stats/team
  - No Performance Data (season has yet to start)
  - Goal: Make predictions for 2024 Performance Data
Correlation Heatmap (From Historical Dataset)
- Key:
  - Light Tan = Strong Positive correlation
  - Dark Purple = Strong Negative correlation
Findings: There are strong relationships between Pitching Stats (WHIP / ERA) and making the Postseason.
Payroll Rank vs. Winning Percentage (From Historical Dataset)
- Key:
  - Red = Team made Postseason
  - Blue = Team did not make Postseason
Findings:
- Teams ranked in the top 5 in payroll made the playoffs 14 of 20 times (70%).
- Only 3 teams ranked in the top 5 in payroll finished with a winning percentage below .500.
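Both findings are simple counts over team-seasons filtered by payroll rank. The sketch below is a minimal version, assuming each row carries a season, team, payroll rank (1 = highest payroll), winning percentage, and a Postseason flag; the `TeamSeason` type, the `top_n_summary` helper, and the rows are hypothetical and for illustration only.

```python
from typing import List, NamedTuple, Tuple


class TeamSeason(NamedTuple):
    season: int
    team: str
    payroll_rank: int  # 1 = highest payroll
    win_pct: float
    made_postseason: bool


def top_n_summary(rows: List[TeamSeason], n: int = 5) -> Tuple[int, int, float, int]:
    """Return (postseason trips, team-seasons, postseason rate, seasons below .500)
    for teams ranked in the top n in payroll."""
    top = [r for r in rows if r.payroll_rank <= n]
    made = sum(r.made_postseason for r in top)       # True counts as 1
    below_500 = sum(r.win_pct < 0.500 for r in top)
    return made, len(top), made / len(top), below_500


# Hypothetical rows for illustration only; they are not the project's data.
rows = [
    TeamSeason(2022, "Team A", 1, 0.611, True),
    TeamSeason(2022, "Team B", 2, 0.549, True),
    TeamSeason(2022, "Team C", 3, 0.475, False),
    TeamSeason(2022, "Team D", 4, 0.580, True),
    TeamSeason(2022, "Team E", 5, 0.512, False),
    TeamSeason(2022, "Team F", 12, 0.556, True),
]

made, total, rate, below_500 = top_n_summary(rows)
print(f"{made}/{total} postseason trips ({rate:.0%}), {below_500} below .500")
```

On the project's historical data, the first three values would correspond to the 14 / 20 / 70% figure above, and the last to the 3 below-.500 seasons.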
Since the goal of the project is to predict multiple outputs for the future dataset based on trends from the historical dataset, I decided to use a Random Forest Regressor wrapped in a Multi-Output Regressor.
From the Historical Dataset:
- Features: Financial Data columns
- Targets: Performance Data columns
Scaled the historical dataset, then applied the same fitted scaling to the future dataset.
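The modeling steps above can be sketched with scikit-learn, assuming that is the library used: `StandardScaler`, `RandomForestRegressor`, and `MultiOutputRegressor` are real scikit-learn classes, but the feature and target columns here are synthetic stand-ins (150 rows for 30 teams x 5 seasons, 2 financial features, 3 performance targets), not the project's data. The key detail is that the scaler is fit on the historical features only and then reused, unchanged, on the 2024 features.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.multioutput import MultiOutputRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-ins: 150 team-seasons (30 teams x 5 seasons), 2 financial features.
X_hist = rng.normal(size=(150, 2))  # e.g. payroll, payroll rank (assumed columns)
y_hist = np.column_stack([
    0.5 + 0.05 * X_hist[:, 0] + rng.normal(scale=0.02, size=150),  # win %
    4.2 - 0.30 * X_hist[:, 0] + rng.normal(scale=0.10, size=150),  # ERA
    (X_hist[:, 0] > 0).astype(float),                              # Postseason 0/1
])
X_future = rng.normal(size=(30, 2))  # 2024: financial data only, one row per team

# Fit the scaler on historical features only, then apply the same scaling to 2024.
scaler = StandardScaler().fit(X_hist)
X_hist_s = scaler.transform(X_hist)
X_future_s = scaler.transform(X_future)

# MultiOutputRegressor trains one random forest per target column.
model = MultiOutputRegressor(RandomForestRegressor(n_estimators=100, random_state=0))
model.fit(X_hist_s, y_hist)

pred_2024 = model.predict(X_future_s)  # shape (30, 3): one row per team
postseason_chance = pred_2024[:, 2]    # forest average of 0/1 targets, read as a chance
```

Two design notes: `RandomForestRegressor` also accepts a 2-D target on its own, so the wrapper mainly buys one independently tuned forest per target; and tree splits are insensitive to feature scaling, so the scaler matters most if other model types are compared on the same inputs.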
Since 2022, 12 teams make the MLB Postseason each season (6 from each league).
From all 30 teams, the model selected the 6 teams in each league with the highest predicted chance of making the Postseason. They are listed below with their chance of making the Postseason and where they would be positioned in the Postseason clinchings:
American League
- Kansas City Royals (94%) - Central Division Winner
- Seattle Mariners (93%) - West Division Winner
- Chicago White Sox (89%) - Wild Card Team #1
- Houston Astros (75%) - Wild Card Team #2
- Toronto Blue Jays (71%) - East Division Winner
- Texas Rangers (66%) - Wild Card Team #3
National League
- Los Angeles Dodgers (84%) - West Division Winner
- Philadelphia Phillies (77%) - East Division Winner
- Atlanta Braves (75%) - Wild Card Team #1
- St. Louis Cardinals (68%) - Central Division Winner
- New York Mets (58%) - Wild Card Team #2
- Pittsburgh Pirates (54%) - Wild Card Team #3
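Turning per-team chances into a projected field follows MLB's current format: in each league, the 3 division winners plus 3 Wild Card teams. The sketch below is a minimal version, assuming the team with the highest predicted chance in each division is its projected winner and the next 3 highest chances league-wide are the Wild Cards in order. The `project_bracket` helper and the teams are hypothetical; MLB itself seeds by regular-season record, so ordering by predicted chance is only a proxy.

```python
from typing import Dict, List, Tuple


def project_bracket(chances: Dict[str, float],
                    divisions: Dict[str, str],
                    n_wild_cards: int = 3) -> Tuple[List[str], List[str]]:
    """Project one league's Postseason field from predicted Postseason chances."""
    ranked = sorted(chances, key=chances.get, reverse=True)  # highest chance first
    leaders: Dict[str, str] = {}
    for team in ranked:
        # The first team seen in each division has that division's best chance.
        leaders.setdefault(divisions[team], team)
    division_winners = list(leaders.values())  # already ordered by chance
    wild_cards = [t for t in ranked if t not in division_winners][:n_wild_cards]
    return division_winners, wild_cards


# Hypothetical league for illustration: three divisions, made-up chances.
chances = {"Team A1": 0.94, "Team B1": 0.93, "Team A2": 0.89, "Team B2": 0.75,
           "Team C1": 0.71, "Team B3": 0.66, "Team C2": 0.40, "Team A3": 0.35}
divisions = {"Team A1": "Central", "Team A2": "Central", "Team A3": "Central",
             "Team B1": "West", "Team B2": "West", "Team B3": "West",
             "Team C1": "East", "Team C2": "East"}

winners, wild_cards = project_bracket(chances, divisions)
print("Division winners:", winners)
print("Wild cards:", wild_cards)
```

Note that a team with a high chance (Team A2 here) can still land in a Wild Card spot if a division rival ranks higher, while a lower-ranked team (Team C1) can win a weaker division.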
In this project, I examined the relationship between MLB teams' payroll and their output and success on the field, and used preseason payroll data to predict team statistics. I found strong correlations between pitching stats and payroll rank. In addition, there were interesting correlations among payroll rank, winning percentage, and whether a team made the Postseason that year. To make predictions for the 2024 season, I used a Random Forest Regressor wrapped in a Multi-Output Regressor. I am excited to measure how my preseason predictions hold up at the end of the 2024 regular season in October!