---
title: "Bechdel Test"
author: "[YOUR NAME]"
date: "[DATE]"
output:
  pdf_document:
    fig_height: 4
    fig_width: 9
---
In this mini analysis, we work with the data used in the 2014 FiveThirtyEight story titled ["The Dollar-And-Cents Case Against Hollywood’s Exclusion of Women"](https://fivethirtyeight.com/features/the-dollar-and-cents-case-against-hollywoods-exclusion-of-women/).

## Data and packages

We start by loading the packages we'll use.
```{r load-packages, message=FALSE}
library(fivethirtyeight)
library(tidyverse)
```
The dataset contains information on `r nrow(bechdel)` movies released between `r min(bechdel$year)` and `r max(bechdel$year)`. However, we'll focus our analysis on movies released between 1990 and 2013.
```{r}
bechdel90_13 <- bechdel %>%
  filter(between(year, 1990, 2013))
```
There are ____ such movies.
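
As a quick check of how the filter behaves, here is a minimal sketch on a hypothetical vector of years (`toy_years` is made up and not part of the dataset). `between()` keeps both endpoints, so it acts like `year >= 1990 & year <= 2013`.

```{r}
library(dplyr)

# Hypothetical release years, made up to illustrate the endpoints
toy_years <- c(1989, 1990, 2001, 2013, 2014)

# between() is inclusive on both ends
in_range <- between(toy_years, 1990, 2013)

toy_years[in_range]
```

Only 1989 and 2014 are dropped here; 1990 and 2013 themselves are kept.
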
The financial variables we'll focus on are the following:

- `budget_2013`: Budget in 2013 inflation-adjusted dollars
- `domgross_2013`: Domestic (US) gross revenue in 2013 inflation-adjusted dollars
- `intgross_2013`: Total international (i.e., worldwide, including US) gross revenue in 2013 inflation-adjusted dollars

We'll also use the variables `binary` and `clean_test` for grouping.

## Analysis

Let's take a look at how median budget and gross revenue vary by whether the movie passed the Bechdel test.
```{r}
bechdel90_13 %>%
  group_by(binary) %>%
  summarise(med_budget = median(budget_2013),
            med_domgross = median(domgross_2013, na.rm = TRUE),
            med_intgross = median(intgross_2013, na.rm = TRUE))
```
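
The `na.rm = TRUE` argument matters whenever a column has missing values. Below is a small sketch on a hypothetical vector (`toy_gross` is made up; we are assuming, not verifying, that the gross revenue columns contain some `NA`s, which would explain why the code above sets `na.rm = TRUE` for them).

```{r}
# Hypothetical gross revenues (in dollars), one of them missing
toy_gross <- c(10e6, NA, 30e6)

# Without na.rm, a single missing value makes the median NA
median(toy_gross)

# With na.rm = TRUE, the missing value is dropped before taking the median
median(toy_gross, na.rm = TRUE)
```
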
Next, let's take a look at how median budget and gross revenue vary by a more detailed indicator of the Bechdel test result (`ok` = passes test, `dubious`, `men` = women only talk about men, `notalk` = women don't talk to each other, `nowomen` = fewer than two women).
```{r}
bechdel90_13 %>%
  # ____ %>%
  summarise(med_budget = median(budget_2013),
            med_domgross = median(domgross_2013, na.rm = TRUE),
            med_intgross = median(intgross_2013, na.rm = TRUE))
```
In order to evaluate how return on investment varies among movies that pass and fail the Bechdel test, we'll first create a new variable called `roi` as the ratio of the total international gross revenue to the budget.
```{r}
bechdel90_13 <- bechdel90_13 %>%
  mutate(roi = intgross_2013 / budget_2013)
```
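
To make the ratio concrete, here is a minimal sketch with three hypothetical movies (the titles and dollar amounts in `toy_movies` are made up for illustration and do not come from the dataset).

```{r}
library(dplyr)

# Hypothetical movies; budgets and grosses are invented for illustration
toy_movies <- tibble(
  title         = c("Movie A", "Movie B", "Movie C"),
  budget_2013   = c(10e6, 50e6, 200e6),
  intgross_2013 = c(40e6, 25e6, 600e6)
)

# Same calculation as above: international gross divided by budget
toy_roi <- toy_movies %>%
  mutate(roi = intgross_2013 / budget_2013)

toy_roi
```

With this definition, an `roi` above 1 means a movie grossed more than its budget, and an `roi` below 1 means it grossed less. Note that this is a revenue-to-budget ratio, so an `roi` of 1 corresponds to breaking even on the budget, not to doubling it.
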
Let's see which movies have the highest return on investment.
```{r}
bechdel90_13 %>%
  arrange(desc(roi)) %>%
  select(title, clean_test, binary, roi, budget_2013, intgross_2013)
```
Below is a visualization of the return on investment by test result; however, it's difficult to see the distributions due to a few extreme observations.
```{r}
ggplot(data = bechdel90_13, mapping = aes(x = clean_test, y = roi, color = binary)) +
  geom_boxplot() +
  labs(title = "Return on investment vs. Bechdel test result",
       x = "Detailed Bechdel result",
       y = "___",
       color = "Binary Bechdel result")
```
Zooming in on the movies with `roi < 10` provides a better view of how the medians across the categories compare:
```{r}
ggplot(data = bechdel90_13, mapping = aes(x = clean_test, y = roi, color = binary)) +
  geom_boxplot() +
  ylim(0, 10) +
  labs(title = "Return on investment vs. Bechdel test result",
       subtitle = "___",
       x = "Detailed Bechdel result",
       y = "Return on investment",
       color = "Binary Bechdel result")
```
- What are the advantages of each plot? What are the disadvantages of each plot?
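
One detail that may help when comparing the plots: in ggplot2, `ylim()` removes observations outside the limits before the boxplot statistics are computed, whereas `coord_cartesian()` only changes the visible window. The sketch below illustrates the difference on a hypothetical data frame (`toy` and its values are made up), using `layer_data()` to pull out the median each boxplot would draw. It is not part of the original analysis.

```{r}
library(ggplot2)

# Hypothetical ROI values with one extreme observation
toy <- data.frame(group = "all", roi = c(0.5, 1, 2, 3, 50))

p <- ggplot(toy, aes(x = group, y = roi)) +
  geom_boxplot()

# ylim() drops values outside the limits before the boxplot is computed
# (ggplot2 warns about the removed row; we silence that warning here)
med_ylim <- suppressWarnings(layer_data(p + ylim(0, 10))$middle)

# coord_cartesian() only changes the visible window; all values are kept
med_zoom <- layer_data(p + coord_cartesian(ylim = c(0, 10)))$middle

c(ylim = med_ylim, coord_cartesian = med_zoom)
```

Here the median is 1.5 under `ylim()` (computed from the four values at or below 10) and 2 under `coord_cartesian()` (computed from all five), so the two approaches can draw different boxes for the same data.
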