Are cupcakes empirically different than muffins? Let's find out!
This repository contains the following R notebooks:
1-web-scraping.Rmd
: Contains code for scraping ingredient lists, calories, and numbers of servings from www.allrecipes.com for both cupcakes and muffins. Exports muffin_raw.rds and cupcake_raw.rds.
2-wrangling.Rmd
: Extracts ingredients, units, and amounts from the raw ingredient lists. Converts the amounts and units of all ingredients into cups or ounces. Filters the data in various ways to ensure data integrity. Imports muffin_raw.rds and cupcake_raw.rds; exports recipes_tidy.rds and nofrosting_tidy.rds.
3-EDA-outlier-removal.Rmd
: Does some exploratory data analysis to detect anomalies and outliers. Exports recipes_wide.rds.
And the following .rds files:
cupcake_raw.rds
: A list of data frames of raw ingredient lists, servings, and calories from cupcake recipes.
muffin_raw.rds
: A list of data frames of raw ingredient lists, servings, and calories from muffin recipes.
recipes_tidy.rds
: The finalized data frame containing all acceptable muffin and cupcake recipes, with ingredients categorized and amounts summarized by recipe (e.g. if there are two kinds of sugar, they are added together). This data frame also contains the number of servings and calories.
nofrosting_tidy.rds
: Obviously, cupcakes have frosting and muffins do not. In this dataset, frosting and decoration ingredients are removed, as is the calories column, since it is no longer accurate.
recipes_wide.rds
: A wide data frame (each ingredient in its own column) ready for multivariate analysis.
nofrosting_wide.rds
: A wide version of nofrosting_tidy.rds.
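
To make the difference between the tidy and wide files concrete, here is a minimal sketch in base R. It is not the code from the notebooks: the column names (recipe, ingredient, amount) and the values are made up for illustration, and the real files may be laid out differently. It writes a small tidy table to a temporary .rds file, reads it back with readRDS(), and spreads it so that each ingredient gets its own column.

```r
# Toy stand-in for a tidy recipe table. The column names and amounts are
# hypothetical and are NOT taken from the repository's actual data.
tidy <- data.frame(
  recipe     = c("m1", "m1", "m1", "c1", "c1", "c1"),
  ingredient = c("flour", "sugar", "fruit", "flour", "sugar", "butter"),
  amount     = c(2, 0.5, 1, 1.5, 1, 0.5),
  stringsAsFactors = FALSE
)

# Round-trip through an .rds file, the format used for the files listed above.
path <- tempfile(fileext = ".rds")
saveRDS(tidy, path)
tidy <- readRDS(path)

# One row per recipe, one column per ingredient.
# Ingredients a recipe does not use are filled with 0.
wide <- as.data.frame.matrix(xtabs(amount ~ recipe + ingredient, data = tidy))
print(wide)
```

In the wide result the recipe identifiers become row names, and a recipe that lacks an ingredient gets a 0 in that column rather than a missing value, which is usually what multivariate methods expect.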