Scrap Football data table of previous matches from the official football data websites (fbref...) #1076
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
The process of scraping a football squad stats table from a webpage and saving it to an Excel file involves several key steps. First, necessary Python libraries are imported:
requests
for handling HTTP requests,pandas
for data manipulation and storage, andBeautifulSoup
from thebs4
library for parsing HTML content. The URL of the webpage containing the football squad statistics is specified, and an HTTP GET request is sent to retrieve the webpage content. The success of the request is verified by checking if the status code is 200.Once the webpage content is successfully retrieved,
BeautifulSoup
is used to parse the HTML content. The specific stats table is located using CSS selectors, targeting the third table with the classstats_table
. The data is then extracted by looping through the rows of the table, collecting the text from each cell in the row, and storing it in a list. The first row is assumed to contain the column headers, defining the structure of the data.A Pandas DataFrame is created using the extracted data, with the first row as the headers. Finally, this DataFrame is saved to an Excel file named "Team Score.xlsx". This process allows for efficient extraction, manipulation, and storage of football squad statistics in a structured format suitable for further analysis. If the stats table is not found or the webpage retrieval fails, appropriate messages are printed to indicate the issues.
Resolves: [1067]
Checklist
Screenshots
Additional Notes/Comments
I certify that I have carried out the relevant checks and provided the requisite screenshot for validation by submitting this pull request.
I appreciate your contribution.