-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathsix_nations.Rmd
65 lines (54 loc) · 1.79 KB
/
six_nations.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
---
title: "Scrape and clean Six Nations data"
author: "Anthony Caravaggi"
date: "`r format(Sys.Date())`"
output: github_document
---
This was a simple exercise in scraping and cleaning a table from a website.
Load required libraries
```{r}
library(XML)
library(RCurl)
library(rlist)
library(stringr)
```
Import website. Here I'm extracting data from Pick and Go: http://www.lassen.co.nz/pickandgo.php
```{r}
url.00 <- getURL("http://www.lassen.co.nz/pickandgo.php?fyear=%22,2000:2017,%22&tyear=%22,2000:2017,%22&hema=NH&tourn=6N#hrh",.opts = list(ssl.verifypeer = FALSE) )
```
Read all tables from the site and preserve the largest (which.max)
```{r}
tables <- readHTMLTable(url.00)
tables <- list.clean(tables, fun = is.null, recursive = FALSE)
n.rows <- unlist(lapply(tables, function(t) dim(t)[1]))
sn.00 <- tables[[which.max(n.rows)]]
```
Check the head and tail for unwated data and remove if present
```{r}
head(sn.00)
```
```{r}
tail(sn.00)
```
```{r}
sn.00 <- sn.00[-c(271:273),]
```
Split columns which contain combined data (e.g. teams: Wal v Fra) by character
```{r}
m <- str_split_fixed(sn.00$Match, " v ", 2)
s <- str_split_fixed(sn.00$Score, "-", 2)
t <- str_split_fixed(sn.00$Tries, ":", 2)
p <- str_split_fixed(sn.00$Pnts, "-", 2)
```
Extract the year the competition took place
```{r}
y <- str_sub(sn.00$Date, start= -4)
```
Recombine all into new dataframe and rename columns
```{r}
SixNations <- data.frame(y, sn.00$Rnd, m[,1], m[,2], s[,1], s[,2], t[,1], t[,2], p[,1], p[,2], sn.00$Venue)
names(SixNations) <- c("year", "round", "home", "away", "score_h", "score_a", "tries_h", "tries_a",
"points_h", "points_a", "venue")
head(SixNations)
```
The final dataframe can be found at: https://github.com/arcaravaggi/browncoat/blob/master/six_nations_data/six_nations_00-17.csv