GitHub: EquiR
Chung Au-Yeung - BCC PHM
Last Updated: 2024-02-29
EquiR is an R package designed specifically to streamline the creation of combined heatmap, column, and bar charts within a single graph, tailored for Birmingham City Council's needs. The primary objective of EquiR is to simplify the process of generating such visualizations, offering a range of functions optimised to accommodate various types of data frames provided by users.
Note: the functions are intended for categorical variables, so it is advisable to convert continuous data into categories before using them.
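As a minimal sketch of that conversion, base R's cut() can turn a continuous variable into labelled bands. The ages and the band boundaries below are illustrative assumptions, not values prescribed by EquiR.

```r
# Hypothetical ages; the bands are illustrative, not prescribed by EquiR.
ages <- c(42, 66, 41, 42, 52)

# cut() converts a numeric vector into a factor of labelled intervals.
# right = FALSE makes each band include its lower bound: [40, 50), [50, 60), ...
age_band <- cut(
  ages,
  breaks = c(0, 40, 50, 60, Inf),
  labels = c("Under 40", "40 to 49", "50 to 59", "60 and over"),
  right  = FALSE
)

# Number of records falling in each band
table(age_band)
```

The resulting factor can then be stored as a new column and passed to the plotting functions in place of the continuous variable.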
The package can be installed from GitHub by typing the following command into the RStudio console:
devtools::install_github("BCC-PHM/EquiR")
EquiR will automatically download any missing prerequisite libraries, so this may take a few minutes the first time you run it on your machine.
Once installed, you can open a new script by clicking the icon in the top left corner of RStudio, underneath "File". Load the library at the start of the script using the following function:
library("EquiR")
The EquiR package supports three types of data frame:
- Record level
- Multidimensional
- Aggregated level
Each type of data frame requires a different function from the EquiR package, as demonstrated below.
Record-level data refers to individual entries or observations within a dataset, each representing a distinct unit or instance of information.
ID | Age | Gender | HEIGHT__value | WEIGHT_value | Smoking_status | Ethnicity_Broad | Outcome | IMD_decile |
---|---|---|---|---|---|---|---|---|
1 | 42 | Male | 1.83 | 84 | Never smoked | Asian | Normal | IMD decile 3+ |
24 | 66 | Male | 1.66 | 72 | Non-smoker - history unknown | Asian | Pre-diabetic | IMD decile 1 |
35 | 41 | Female | 1.515 | 66 | Never smoked | Asian | Normal | IMD decile 1 |
41 | 42 | Female | 1.58 | 65 | Never smoked | Asian | Normal | IMD decile 1 |
54 | 52 | Male | 1.73 | 62 | Never smoked | Asian | Normal | IMD decile 3+ |
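
To make the example concrete, the five rows above can be rebuilt as a data frame in base R. The name example_data matches the call used later in this guide. The cross-tabulation at the end is only a sketch of the kind of count a record-level heatmap summarises; that EquiR computes its counts this way is an assumption.

```r
# Rebuild the five example rows shown above as a data frame.
example_data <- data.frame(
  ID              = c(1, 24, 35, 41, 54),
  Age             = c(42, 66, 41, 42, 52),
  Gender          = c("Male", "Male", "Female", "Female", "Male"),
  HEIGHT__value   = c(1.83, 1.66, 1.515, 1.58, 1.73),
  WEIGHT_value    = c(84, 72, 66, 65, 62),
  Smoking_status  = c("Never smoked", "Non-smoker - history unknown",
                      "Never smoked", "Never smoked", "Never smoked"),
  Ethnicity_Broad = rep("Asian", 5),
  Outcome         = c("Normal", "Pre-diabetic", "Normal", "Normal", "Normal"),
  IMD_decile      = c("IMD decile 3+", "IMD decile 1", "IMD decile 1",
                      "IMD decile 1", "IMD decile 3+"),
  stringsAsFactors = FALSE
)

# Count records for each combination of two categorical variables
# (rows = IMD decile, columns = ethnicity).
counts <- table(example_data$IMD_decile, example_data$Ethnicity_Broad)
counts
```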
The function you will need to use from "EquiR" to make the plot is Ineq_record_level_heatmap(). The function takes the following basic arguments:
- data: Record-level data supplied by the user
- col: The column in data containing the categorical variable to use as the heatmap columns
- row: The column in data containing the categorical variable to use as the heatmap rows
- coln: The label displayed for col on the graph
- rown: The label displayed for row on the graph
- unit: The unit label shown on the graph. If not supplied, the unit is the count, unless percent is set to T, in which case it is automatically the percentage
- percent: Set to T to display values as a percentage of the total; the default display is the number
- colour: The colour of the graph (default: "blue")
Therefore, we can generate the graph by running:
Ineq_record_level_heatmap(
  data = example_data,
  col = "Ethnicity_Broad",
  row = "IMD_decile",
  coln = "Eth",
  rown = "IMD",
  unit = "Count",
  colour = "blue"
)
This produces a graph that looks like this:
Multidimensional data refers to datasets that contain multiple variables or dimensions (three or more), together with a single column that gives the number of observations for individuals meeting each combination of conditions.
LA Code | LA | Ethnic_group | Economic_inactive | Age | Observation |
---|---|---|---|---|---|
E08000025 | Birmingham | White | Retired | Aged 65 years and over | 97864 |
E08000025 | Birmingham | Asian | Student | Aged 16 to 24 years | 30507 |
E08000025 | Birmingham | White | Student | Aged 16 to 24 years | 28167 |
E08000025 | Birmingham | Asian | Looking after home or family | Aged 35 to 49 years | 18123 |
E08000025 | Birmingham | Asian | Retired | Aged 65 years and over | 16280 |
E08000025 | Birmingham | White | Long-term sick or disabled | Aged 50 to 64 years | 13759 |
E08000025 | Birmingham | White | Retired | Aged 50 to 64 years | 10661 |
E08000025 | Birmingham | Black | Student | Aged 16 to 24 years | 10010 |
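
A sketch of how the rows above could be rebuilt as a data frame and collapsed to two dimensions with base R's aggregate(). The name example_data2 matches the call used later in this guide. Whether Ineq_multidi_level_heatmap() sums over the unused dimensions in exactly this way is an assumption.

```r
# Rebuild the eight example rows shown above.
# check.names = FALSE keeps the space in the "LA Code" column name.
example_data2 <- data.frame(
  `LA Code` = rep("E08000025", 8),
  LA = rep("Birmingham", 8),
  Ethnic_group = c("White", "Asian", "White", "Asian",
                   "Asian", "White", "White", "Black"),
  Economic_inactive = c("Retired", "Student", "Student",
                        "Looking after home or family", "Retired",
                        "Long-term sick or disabled", "Retired", "Student"),
  Age = c("Aged 65 years and over", "Aged 16 to 24 years",
          "Aged 16 to 24 years", "Aged 35 to 49 years",
          "Aged 65 years and over", "Aged 50 to 64 years",
          "Aged 50 to 64 years", "Aged 16 to 24 years"),
  Observation = c(97864, 30507, 28167, 18123, 16280, 13759, 10661, 10010),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Sum observations over every dimension except the two being plotted.
by_eth_age <- aggregate(Observation ~ Ethnic_group + Age,
                        data = example_data2, FUN = sum)
by_eth_age
```

In these rows, the two White entries aged 50 to 64 (long-term sick and retired) fall into the same ethnicity and age cell, so they are added together.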
The function you will need to use from "EquiR" to make the plot is Ineq_multidi_level_heatmap(). The function takes the following basic arguments:
- data: Multidimensional data supplied by the user
- col: The column in data containing the categorical variable to use as the heatmap columns
- row: The column in data containing the categorical variable to use as the heatmap rows
- value: The variable that contains the sum of observations
- coln: The label displayed for col on the graph
- rown: The label displayed for row on the graph
- unit: The unit label shown on the graph. If not supplied, the unit is the count, unless percent is set to T, in which case it is automatically the percentage
- percent: Set to T to display values as a percentage of the total; the default display is the number
- colour: The colour of the graph (default: "blue")
Therefore, we can generate the graph by running:
Ineq_multidi_level_heatmap(
  data = example_data2,
  col = "Ethnic_group",
  row = "Age",
  value = "Observation",
  coln = "Eth",
  rown = "Age gp",
  unit = "Count",
  colour = "red"
)
This produces a graph that looks like this:
Aggregated level data within this package refers to information that has been combined or summarised from individual-level data to provide a higher-level perspective or summary. This dataframe is designed to include only two columns of categorical variables and one column for observations.
Ethnicity | reason | Values |
---|---|---|
White | Retired | 108806 |
Asian | Looking after home or family | 38004 |
Asian | Student | 34357 |
Black | Student | 31371 |
White | Long-term sick or disabled | 28841 |
White | Looking after home or family | 21297 |
Other | Retired | 18448 |
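
As an illustration, the rows above can be rebuilt as a data frame and laid out as a row-by-column grid with base R's xtabs(), which is roughly the shape a heatmap displays. The name example_data3 matches the call used later in this guide; the grid is only a sketch for inspecting the data and is not part of EquiR.

```r
# Rebuild the seven example rows shown above.
example_data3 <- data.frame(
  Ethnicity = c("White", "Asian", "Asian", "Black", "White", "White", "Other"),
  reason = c("Retired", "Looking after home or family", "Student", "Student",
             "Long-term sick or disabled", "Looking after home or family",
             "Retired"),
  Values = c(108806, 38004, 34357, 31371, 28841, 21297, 18448),
  stringsAsFactors = FALSE
)

# Aggregated data should hold each (Ethnicity, reason) pair at most once;
# anyDuplicated() returns 0 when there are no repeated pairs.
anyDuplicated(example_data3[c("Ethnicity", "reason")])

# Lay the values out as a grid: rows = reason, columns = Ethnicity.
# Combinations that are absent from the data appear as 0.
grid <- xtabs(Values ~ reason + Ethnicity, data = example_data3)
grid
```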
The function you will need to use from "EquiR" to make the plot is Ineq_aggregated_level_heatmap(). The function takes the following basic arguments:
- data: Aggregated level data supplied by the user
- col: The column in data containing the categorical variable to use as the heatmap columns
- row: The column in data containing the categorical variable to use as the heatmap rows
- value: The variable that contains the sum of observations
- coln: The label displayed for col on the graph
- rown: The label displayed for row on the graph
- unit: The unit label shown on the graph. If not supplied, the unit is the count, unless percent is set to T, in which case it is automatically the percentage
- percent: Set to T to display values as a percentage of the total; the default display is the number
- colour: The colour of the graph (default: "blue")
Therefore, we can generate the graph by running:
Ineq_aggregated_level_heatmap(
  data = example_data3,
  col = "Ethnicity",
  row = "reason",
  value = "Values",
  coln = "Eth",
  rown = "reason",
  unit = "Count",
  colour = "blue"
)
This produces a graph that looks like this:
We can also change the colour palette by setting the colour argument. The default is "blue", and two other palettes are currently available: "red" and "green".
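Because only three palette names are listed, a script might check the name before passing it on. The helper below, check_colour(), is hypothetical and not part of EquiR; it is a small sketch using base R's match.arg().

```r
# Hypothetical helper (not part of EquiR): accept only the palette names
# listed in this guide and fall back to "blue" when none is given.
check_colour <- function(colour = c("blue", "red", "green")) {
  # match.arg() returns the first choice when the argument is left at its
  # default, and raises an error for a value outside the allowed set.
  match.arg(colour)
}

check_colour()       # uses the default palette name
check_colour("red")  # an allowed palette name is returned unchanged
```

Note that match.arg() also accepts unambiguous partial matches, so a stricter check may be preferable in production scripts.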
We can also display the heatmap, bar and column charts as a percentage of the total. Every function in "EquiR" supports this through the percent argument, which is turned on by passing percent = T. If no unit is supplied, the unit is automatically labelled as a percentage.
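For reference, a percentage of the total can be worked out by hand in base R. This is a sketch under the assumption that "percentage of total" means each value divided by the sum of all values; how EquiR computes and rounds it internally is not confirmed here.

```r
# Values taken from the aggregated level example table in this guide.
agg <- data.frame(
  Ethnicity = c("White", "Asian", "Asian", "Black", "White", "White", "Other"),
  reason = c("Retired", "Looking after home or family", "Student", "Student",
             "Long-term sick or disabled", "Looking after home or family",
             "Retired"),
  Values = c(108806, 38004, 34357, 31371, 28841, 21297, 18448),
  stringsAsFactors = FALSE
)

# Each cell as a share of the grand total, expressed as a percentage.
agg$Percent <- 100 * agg$Values / sum(agg$Values)

# Rounded copy for display only; the unrounded column still sums to 100.
transform(agg, Percent = round(Percent, 1))
```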
We can generate the graph in percentage by running:
Ineq_multidi_level_heatmap(
  data = example_data2,
  col = "Ethnic_group",
  row = "Age",
  value = "Observation",
  coln = "Eth",
  rown = "Age gp",
  percent = T,
  colour = "red"
)
This produces a graph that looks like this: