A Data-Driven Approach to Determining Conference Supremacy
This repository implements a novel method for evaluating the best college football conferences using a cross-country scoring mechanism. The method ranks conferences based on the positions of their top teams in the Associated Press (AP) Top 25 poll before, during, and at the end of each college football season, providing an objective, measurable comparison of conference strength for every week of the season.
Inspired by team scoring in cross-country racing, this method sums the "finishing positions" (the AP rankings) of the top five teams in each conference. The conference with the lowest total score is deemed the best, which emphasizes overall depth and strength rather than top-tier performance alone (such as whichever conference produced the most recent national champion) or flawed math like that in the tweet below.
Take a look at the average AP Ranking by Power Five Conference 📊 Which conference surprises you the most? 👀 pic.twitter.com/qJ0AWkQkOm
— FOX College Football (@CFBONFOX) January 18, 2023
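The scoring rule can be sketched in a few lines of Python. This is a minimal illustration, not the repository's implementation: the function names, the `(rank, team, conference)` input format, and the `unranked_penalty` of 26 (one place past the last ranked spot) charged for each missing scorer when a conference has fewer than `n_teams` ranked teams are all assumptions made for the example. The poll data is made up.

```python
from collections import defaultdict


def xc_conference_scores(poll, n_teams=5, unranked_penalty=26):
    """Score conferences like a cross-country meet (hypothetical helper).

    poll: iterable of (rank, team, conference) tuples for one AP poll.
    Each conference scores the sum of its best n_teams ranks; every missing
    scorer is charged unranked_penalty (an assumed rule, not the repo's).
    Conferences with no ranked teams do not appear in the result.
    Returns (conference, score) pairs sorted from best (lowest) to worst.
    """
    ranks_by_conf = defaultdict(list)
    for rank, _team, conference in poll:
        ranks_by_conf[conference].append(rank)

    scores = {}
    for conference, ranks in ranks_by_conf.items():
        scorers = sorted(ranks)[:n_teams]
        missing = n_teams - len(scorers)
        scores[conference] = sum(scorers) + missing * unranked_penalty

    # Lowest total wins, just as in a cross-country team score.
    return sorted(scores.items(), key=lambda item: item[1])


def average_rank(poll):
    """Average AP rank per ranked team in each conference (the tweet's metric)."""
    ranks_by_conf = defaultdict(list)
    for rank, _team, conference in poll:
        ranks_by_conf[conference].append(rank)
    return {conf: sum(r) / len(r) for conf, r in ranks_by_conf.items()}


# Made-up poll: Conf C has one elite team; Conf A and Conf B have depth.
toy_poll = [
    (1, "C1", "Conf C"),
    (2, "A1", "Conf A"),
    (3, "B1", "Conf B"),
    (4, "A2", "Conf A"),
    (5, "B2", "Conf B"),
    (6, "A3", "Conf A"),
    (7, "B3", "Conf B"),
]
print(average_rank(toy_poll))  # {'Conf C': 1.0, 'Conf A': 4.0, 'Conf B': 5.0}
print(xc_conference_scores(toy_poll, n_teams=3))  # [('Conf A', 12), ('Conf B', 15), ('Conf C', 53)]
```

Averaging puts Conf C first on the strength of a single team, while the cross-country score rewards the depth of Conf A and Conf B. Passing `n_teams=4` corresponds to four-team scoring.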
This approach was first introduced in 2015 and updated in 2019 and 2024. You can read more about the method and its evolution in the following blog posts:
- 2015: The Race for Supremacy
- 2019: AP XC - An Update
- 2024: Updating the race for conference realignments | Medium
[Figure: results of this work from 2012-2023]
This code is built on the ESPN College Football API, whose hidden (undocumented) endpoints, identified by Akshay Easwaran, provide reliable AP ranking information back to 2014. As a result, this code depends on the quality and stability of ESPN's API data structure.
- data: Directory containing input data files.
- espn_api.py: Script for fetching data from the ESPN API.
- store_data.py: Script for storing data fetched from external sources.
Make sure all required packages are installed. You can do this in a few ways:
Using your command line or bash terminal:
pip install -r requirements.in
This installs the project's dependencies into your base Python interpreter. This is not recommended, because other Python projects or repositories may require different versions of these packages.
Setting up a virtual Python environment (venv) is recommended to avoid dependency conflicts across your personal projects or with other developers on this project.
To create or update your venv on Windows, this repository includes a setup tool.
The py_venv subdirectory includes setup files that can create your venv for you.
To do this, first open .\py_venv\set_python_path.bat in a text editor.
Set the PYTHON_PATH variable to the base interpreter from which you want the venv to be built.
- Recommended: use the interpreter included with your ArcGIS Pro installation. This ensures all geoprocessing functions are accessible in your venv.
Then, execute the following in a Windows command prompt with the current directory set to the root of this repo:
.\py_venv\setup.bat
This batch file creates a virtual Python environment in the py_venv subdirectory based on the specifications in the .\requirements.in file.
The .\py_venv\requirements.txt file is generated during venv setup. It is a full pip freeze of the most recent environment in which this repository was developed. If you run into package dependency issues, compare your current environment against .\py_venv\requirements.txt.
Remember to select the newly created venv as your Python interpreter. It is located at .\py_venv\venv\Scripts\python.exe.
To activate the venv directly in the Windows command prompt, enter
.\py_venv\venv\Scripts\activate
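To confirm that scripts are actually running under the new venv rather than the base interpreter, a small standard-library check can help. It relies on the fact that `sys.prefix` differs from `sys.base_prefix` inside a venv; the helper names are illustrative, and the expected path assumes you run the script from the repository root.

```python
import sys
from pathlib import Path


def running_in_venv() -> bool:
    """Inside a venv, sys.prefix points at the venv while sys.base_prefix
    still points at the base interpreter's installation."""
    return sys.prefix != sys.base_prefix


def uses_venv_at(venv_dir: Path) -> bool:
    """True if the current interpreter belongs to the venv rooted at venv_dir."""
    return running_in_venv() and Path(sys.prefix).resolve() == Path(venv_dir).resolve()


if __name__ == "__main__":
    # Relative to the repository root, where the setup batch file builds the venv.
    expected_venv = Path("py_venv") / "venv"
    print(f"Interpreter: {sys.executable}")
    print(f"Inside a venv: {running_in_venv()}")
    print(f"Using the repo venv: {uses_venv_at(expected_venv)}")
```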
Warning: placing venvs in locations with excessively long paths may cause errors when installing or importing packages. Make sure to git clone the repository into a folder with a short file path.
To execute a full run that pulls the latest AP rankings from ESPN and scores them as a cross-country meet, run the store_data.py script.
Fetch Data:
Use espn_api.py to fetch the latest college football data from ESPN.
Critical functions:
- full_ap_xc_run(year: int = None, week=None, four_team_score: bool = False) -> dict
  Purpose: Fetches the full AP cross-country run data for a given year and week, with an option for four-team scoring (summing the top four teams rather than the top five).
  Inputs:
  - year: The year for which to fetch data (optional).
  - week: The week for which to fetch data (optional).
  - four_team_score: Boolean indicating whether to use four-team scoring (default is False).
  Outputs:
  - A dictionary containing the fetched data, including conference team data and conference scores.
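As a rough sketch of the parsing step behind a run like `full_ap_xc_run`, the functions below pull AP ranks out of an ESPN rankings response and group them by conference. The payload shape (a `rankings` list whose polls carry a `name` and a `ranks` list of `{"current": ..., "team": {"location": ...}}` entries) is an assumption to verify against a live response. `parse_ap_poll`, `group_by_conference`, and the `team_to_conf` lookup are hypothetical names; you would need to supply the team-to-conference mapping yourself.

```python
from collections import defaultdict


def parse_ap_poll(payload, team_to_conf, poll_name="AP Top 25"):
    """Return sorted (rank, team, conference) tuples for one poll.

    Assumed payload shape (check against ESPN's actual response):
    {"rankings": [{"name": "AP Top 25",
                   "ranks": [{"current": 1, "team": {"location": "..."}}]}]}
    """
    poll = next((p for p in payload.get("rankings", []) if p.get("name") == poll_name), None)
    if poll is None:
        raise ValueError(f"No poll named {poll_name!r} in payload")

    rows = []
    for entry in poll.get("ranks", []):
        team = entry["team"]["location"]
        # Teams missing from the lookup are flagged rather than silently dropped.
        rows.append((int(entry["current"]), team, team_to_conf.get(team, "Unknown")))
    return sorted(rows)


def group_by_conference(rows):
    """Map each conference to its ranked teams, best rank first."""
    grouped = defaultdict(list)
    for rank, team, conference in rows:
        grouped[conference].append((rank, team))
    return dict(grouped)


# Hand-built payload in the assumed shape; team and conference names are placeholders.
sample_payload = {
    "rankings": [
        {"name": "AP Top 25", "ranks": [
            {"current": 2, "team": {"location": "Team Y"}},
            {"current": 1, "team": {"location": "Team X"}},
            {"current": 3, "team": {"location": "Team Z"}},
        ]},
        {"name": "Coaches Poll", "ranks": []},
    ]
}
rows = parse_ap_poll(sample_payload, {"Team X": "Conf A", "Team Y": "Conf B", "Team Z": "Conf A"})
print(group_by_conference(rows))  # {'Conf A': [(1, 'Team X'), (3, 'Team Z')], 'Conf B': [(2, 'Team Y')]}
```

Keeping the conference lookup separate from the poll parsing means the same parser works whether teams are mapped to their current conferences or to a future membership schema.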
Store Data:
Use store_data.py to store the fetched data in a format suitable for analysis.
Critical functions:
- summarize_data(week, conference_score_tuple: list, n_teams_str: str = pent, existing_summary_df: pd.DataFrame = None)
  Purpose: Summarizes data for a given week and conference score tuple. It standardizes the week formatting, handles potential errors, and writes the summary data to a file.
  Inputs:
  - week: The week to summarize.
  - conference_score_tuple: List of conference scores.
  - n_teams_str: String indicating the number of teams (default is pent).
  - existing_summary_df: Existing summary DataFrame (optional).
  Outputs:
  - The summary data as a DataFrame.
- store_weekly_results(year: int = None, week=None, four_team_score: bool = False)
  Purpose: Stores weekly results by calling the functions that fetch, prepare, and write the data.
  Inputs:
  - year: The year to store results for (optional).
  - week: The week to store results for (optional).
  - four_team_score: Boolean indicating whether to use four-team scoring (default is False).
  Outputs:
  - The results of the storage operation.
- store_all_data_2014_to_present()
  Purpose: Stores all data from 2014 to the present year by iterating through each year and week and calling store_weekly_results for both four-team and five-team scoring.
  Inputs: None
  Outputs: None
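The iteration described for `store_all_data_2014_to_present` can be sketched as a triple loop over seasons, weeks, and scoring modes. The store function is passed in so the loop can be exercised without network access. `store_all_seasons` and its fixed `weeks_per_season` are hypothetical: real seasons also include preseason and final polls and vary in length.

```python
from datetime import date
from typing import Callable, Optional


def store_all_seasons(
    store_week: Callable[[int, int, bool], None],
    first_year: int = 2014,
    last_year: Optional[int] = None,
    weeks_per_season: int = 16,  # assumed; real season lengths vary
) -> int:
    """Call store_week(year, week, four_team_score) for every season and week,
    once with four-team scoring and once with five-team scoring.
    Returns the number of calls made."""
    if last_year is None:
        last_year = date.today().year
    calls = 0
    for year in range(first_year, last_year + 1):
        for week in range(1, weeks_per_season + 1):
            for four_team_score in (True, False):
                store_week(year, week, four_team_score)
                calls += 1
    return calls


# Example with a stand-in for store_weekly_results that just records its arguments.
recorded = []
store_all_seasons(lambda y, w, four: recorded.append((y, w, four)), 2014, 2015, weeks_per_season=2)
print(len(recorded), recorded[0])  # 8 (2014, 1, True)
```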
Counterfactual Conference Analysis:
Use counterfactual_conferences_2023.py to impose the 2024 conference membership schema onto the 2023 season results, previewing how the realigned conferences could perform in 2024.
Critical functions:
- realign_teams(df: pd.DataFrame, n_teams_score: int = 5)
  Purpose: Realigns teams according to the 2024 conference membership schema and recalculates conference standings using the 2023 season results. This previews the strength of each conference under the upcoming realignment.
  Inputs:
  - df: DataFrame containing the 2023 season results.
  - n_teams_score: The number of top team scores to sum for each conference (default is 5).
  Outputs:
  - A DataFrame with teams realigned to their new conferences and the recalculated conference standings.
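The realignment step can be approximated with pandas: remap each team to its new conference, then rescore. This is a minimal sketch rather than the code in counterfactual_conferences_2023.py. The `team`, `conference`, and `rank` column names, the `new_conf` mapping, the `realign_and_score` name, and the 26-point penalty for missing scorers are all assumptions, and the sketch scores a single poll rather than a full season. The data is made up.

```python
import pandas as pd


def realign_and_score(df, new_conf, n_teams_score=5, unranked_penalty=26):
    """Reassign teams to new conferences and recompute cross-country scores.

    df: one row per ranked team with 'team', 'conference', and 'rank' columns.
    new_conf: {team: new_conference}; teams not listed keep their old conference.
    Returns (realigned DataFrame, Series of conference scores, best first).
    """
    realigned = df.assign(
        conference=[new_conf.get(t, c) for t, c in zip(df["team"], df["conference"])]
    )

    def xc_score(ranks: pd.Series) -> int:
        scorers = ranks.nsmallest(n_teams_score)
        # Assumed rule: each missing scorer costs one place past the Top 25.
        return int(scorers.sum()) + (n_teams_score - len(scorers)) * unranked_penalty

    scores = realigned.groupby("conference")["rank"].agg(xc_score).sort_values()
    return realigned, scores


# Toy data: Team 2 moves from Conf B to Conf A.
season = pd.DataFrame({
    "team": ["Team 1", "Team 2", "Team 3", "Team 4"],
    "conference": ["Conf A", "Conf B", "Conf B", "Conf C"],
    "rank": [1, 2, 3, 4],
})
realigned, scores = realign_and_score(season, {"Team 2": "Conf A"}, n_teams_score=2)
print(scores.to_dict())  # {'Conf A': 3, 'Conf B': 29, 'Conf C': 30}
```

Under the old alignment Conf B would lead with 5 points; moving a single ranked team flips the standings, which is the kind of shift the counterfactual analysis is meant to surface.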
Contributions are welcome! Please fork the repository and create a pull request with your changes.
Thanks to John-Lee-Cooper, seanreid5454, & akeaswaran for their thought partnership and good ideas over the years on this project.
This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.