By Eric Burden | December 22, 2020
In the spirit of the holidays (and programming), I’ll be posting my solutions to the Advent of Code 2020 puzzles here, at least one day after they’re posted (no spoilers!). I’ll be implementing the solutions in R because, well, that’s what I like! What I won’t be doing is posting any of the actual answers, just the reasoning behind them.
Also, as a general convention, whenever the puzzle has downloadable input, I’m
saving it in a file named input.txt
.
Day 21 - Allergen Assessment
Find the problem description HERE.
Part One - Inferring Ingredients
According to today’s puzzle, we are either allergic to most everything or we
are a very young child who needs to avoid exposure to allergens in case of
unknown interactions. Probably the first one, right? Either way, it would be
helpful if we spoke the language, but at least we’re pretty good at identifying
patterns! Essentially, we’re looking for ingredients that do not appear in
every recipe containing a particular allergen, for example, if there are two
recipes that contain ‘dairy’ and one of them contains nhms
and the other
doesn’t, then you know nhms
is not your dairy allergen.
# Setup ------------------------------------------------------------------------
test_input <- c(
"mxmxvkd kfcds sqjhc nhms (contains dairy, fish)",
"trh fvjkl sbzzf mxmxvkd (contains dairy)",
"sqjhc fvjkl (contains soy)",
"sqjhc mxmxvkd sbzzf (contains fish)"
)
real_input <- readLines('input.txt')
library(dplyr)
library(purrr)
# Functions --------------------------------------------------------------------
# Helper function, given a line number `line_no` and a list where the first
# element is a space-separated vector of ingredients and the second element
# is a space-separated vector of allergens `split_line`, returns a data frame
# with one row for each combination of `line_no` (as 'recipe'), ingredient,
# and allergen.
line_to_df <- function(line_no, split_line) {
parts <- strsplit(split_line, ' ')
expand.grid(
recipe = line_no,
ingredient = parts[[1]],
allergen = parts[[2]],
stringsAsFactors = F
)
}
# Helper function, given a list of lines from the puzzle input, returns a
# data frame containing one row per line number (recipe), ingredient, and
# allergen
parse_input <- function(input) {
line_count <- length(input)
only_words <- gsub('[\\(\\),]', '', input)
split_lines <- strsplit(only_words, 'contains ')
dfs <- mapply(line_to_df, 1:line_count, split_lines, SIMPLIFY = F)
do.call(rbind, dfs)
}
# Processing -------------------------------------------------------------------
recipe_list <- parse_input(real_input) # Parse the input
# For each type of allergen, return a data frame consisting of that allergen
# and all ingredients that appear in all recipes containing that allergen.
possible_allergens <- recipe_list %>%
group_by(allergen, recipe) %>%
summarise(ingredients = list(ingredient)) %>%
summarise(ingredient = reduce(ingredients, intersect))
# Remove all rows containing ingredients that definitely contain an allergen
# (though we're not sure which one), then count the number of ingredient
# and recipe combinations remaining
answer1 <- recipe_list %>%
filter(!(ingredient %in% possible_allergens$ingredient)) %>%
distinct(recipe, ingredient) %>%
nrow()
Hey, we used a few more pipes! Also, notice that
summarise(x = list(y)) %>% summarise(z = reduce(x, intersect))
pattern. This
is how we create a column containing all the ingredient
that appear in
every recipe for each allergen. This is a pretty uncommon pattern (I found
a couple of StackOverflow questions that either get this wrong or make it
really complicated), so I encourage you to try it out and see how it works.
Part Two - Dietary Detective
Once we remove the ingredients we know won’t cause a reaction, we can figure out which ingredients contain which allergens. Maybe we’ve just got one really severe allergy after all. If you recall Day 16, Part 2, this is practically the same thing.
# Setup ------------------------------------------------------------------------
source('exercise_1.R')
# Processing -------------------------------------------------------------------
# Setup, count the number of ingredients that contain an allergen and create an
# empty data frame to add allergen/ingredient combinations to as we confirm
num_allergen_ingredients <- length(unique(possible_allergens$ingredient))
confirmed_allergens <- data.frame(
allergen = character(0),
ingredient = character(0)
)
# Until the number of confirmed allergens is as large as the number of allergens...
while (nrow(confirmed_allergens) < num_allergen_ingredients) {
# Remove all the previously confirmed allergens from the table of possible
# allergens, then identify which ingredients are now associated with a
# single allergen
newly_confirmed_allergens <- possible_allergens %>%
filter(!(allergen %in% confirmed_allergens$allergen)) %>%
group_by(ingredient) %>%
mutate(n = n()) %>%
filter(n == 1) %>%
select(-n)
# Add the newly confirmed allergens to the ongoing list of confirmed
# allergens
confirmed_allergens <- bind_rows(
confirmed_allergens,
newly_confirmed_allergens
)
}
# Sort the table of confirmed allergen/ingredient combinations by allergen name,
# then collapse the ingredient names with commas between
answer2 <- confirmed_allergens %>%
arrange(allergen) %>%
pull(ingredient) %>%
paste(collapse = ',')
Thankfully, we’ve got a pretty good pattern to follow for this kind of problem.
Wrap-Up
For once, I’m glad to be catching up a bit on AoC challenges, which means I’ve tackled Days 16 and 21 less than five days apart, so it was still pretty fresh on my mind. These types of ‘data manipulation’ heavy challenges are a nice change because they’re a bit more comfortable for me and offer a nice ‘break’ from some of the more ‘fast implementation’ problems that I don’t think about very often. The flip side, though, is I don’t feel like I learn as much from them, but I find the mix of problem types in AoC to be pretty great.
If you found a different solution (or spotted a mistake in one of mine), please drop me a line!