During data exploration, developers often write the same code repeatedly to summarize a dataset by different categories. The goal of this package is to remove that boilerplate. It counts the number of observations per category in a given dataset and returns the top categories.
This package is not on CRAN yet. You can install the development version of functionsnashid from the GitHub repository with:
# install.packages("devtools")  # run first if devtools is not installed
devtools::install_github("stat545ubc-2021/functionsnashid")
See ?count_by_category for a more detailed explanation of the function. The examples below demonstrate its basic usage by counting the number of games per genre in the steam_games dataset.
- Results in descending order by default:
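suppressMessages(library(tidyverse))
suppressMessages(library(datateachr))
library(functionsnashid)
# One row per individual genre. Note: `games` is not used by the calls below.
games <- steam_games %>%
  select(id, name, genre, publisher, developer, original_price, release_date, all_reviews) %>%
  separate_rows(genre, sep = ",", convert = TRUE)
# Counts on the raw steam_games, so combined genre strings are counted as-is.
count_by_category(steam_games, genre, 5)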
#> # A tibble: 5 × 2
#> genre count
#> <chr> <int>
#> 1 Action 2386
#> 2 Action,Indie 2129
#> 3 Casual,Indie 1732
#> 4 Action,Adventure,Indie 1585
#> 5 Adventure,Indie 1520
- Results in ascending order:
count_by_category(steam_games, genre, 5, FALSE)
#> # A tibble: 5 × 2
#> genre count
#> <chr> <int>
#> 1 Accounting,Animation & Modeling,Audio Production,Design & Illustration,… 1
#> 2 Accounting,Education,Software Training,Utilities,Early Access 1
#> 3 Action,Adventure,Casual,Early Access 1
#> 4 Action,Adventure,Casual,Free to Play 1
#> 5 Action,Adventure,Casual,Free to Play,Early Access 1
Here we demonstrate how count_by_category can be used to explore other datasets:
We see that Acer, the maple genus, is the most common tree genus in Vancouver:
count_by_category(vancouver_trees, genus_name, 5)
#> # A tibble: 5 × 2
#> genus_name count
#> <chr> <int>
#> 1 ACER 36062
#> 2 PRUNUS 30683
#> 3 FRAXINUS 7381
#> 4 TILIA 6773
#> 5 QUERCUS 6119
count_by_category(apt_buildings, property_type, 5)
#> # A tibble: 3 × 2
#> property_type count
#> <chr> <int>
#> 1 PRIVATE 2888
#> 2 TCHC 327
#> 3 SOCIAL HOUSING 240
count_by_category(apt_buildings, heating_type, 5)
#> # A tibble: 3 × 2
#> heating_type count
#> <chr> <int>
#> 1 HOT WATER 2789
#> 2 FORCED AIR GAS 315
#> 3 ELECTRIC 265
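For readers curious how a helper like this can be put together, here is a minimal sketch built on dplyr. It is not the package's actual source: the name count_top_sketch and its argument names (n, descending) are hypothetical and chosen only to mirror the calls shown above. The sketch also keeps NA as its own category, which may differ from what count_by_category does.

```r
library(dplyr)

# Hypothetical sketch; not the source of functionsnashid::count_by_category.
count_top_sketch <- function(data, category, n = 5, descending = TRUE) {
  # Count rows per distinct value of `category`; `{{ }}` forwards the
  # unquoted column name to dplyr.
  counts <- count(data, {{ category }}, name = "count")

  # Order by the count column, largest first unless descending = FALSE.
  if (descending) {
    counts <- arrange(counts, desc(.data$count))
  } else {
    counts <- arrange(counts, .data$count)
  }

  # Keep at most n rows; fewer come back when there are fewer categories.
  slice_head(counts, n = n)
}

# Usage on a built-in dataset: the two most common cylinder counts in mtcars.
count_top_sketch(mtcars, cyl, 2)
```

Passing FALSE as the fourth argument of the sketch returns the least frequent categories instead, in the same spirit as the ascending example above.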