Ch.14 Solutions
Prerequisites
library(tidyverse)
library(babynames)
14.2.4 Exercises:
These can all be done using raw strings as well.
str_1 <- 'He said "That\'s amazing!"'
str_2 <- "\\a\\b\\c\\d"
str_3 <- "\\\\\\\\\\\\"

str_view(str_1)
## [1] │ He said "That's amazing!"
str_view(str_2)
## [1] │ \a\b\c\d
str_view(str_3)
## [1] │ \\\\\\
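For reference, here is a sketch of the raw-string versions of the same three strings. It assumes R 4.0.0 or later, where `r"(...)"` keeps everything between the delimiters literally, so neither the quotes nor the backslashes need escaping:

```r
# Raw strings (R >= 4.0.0): everything between r"( and )" is kept literally
str_1_raw <- r"(He said "That's amazing!")"
str_2_raw <- r"(\a\b\c\d)"
str_3_raw <- r"(\\\\\\)"  # six literal backslashes

# writeLines() prints the strings without R's escaping
writeLines(c(str_1_raw, str_2_raw, str_3_raw))
```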
This is the non-breaking space (U+00A0), which has its own Wikipedia page. It looks like an ordinary space but prevents an automatic line break at its position.
x <- "This\u00a0is\u00a0tricky"
str_view(x)
## [1] │ This{\u00a0}is{\u00a0}tricky
x
## [1] "This is tricky"
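Because it prints like a space, the non-breaking space can quietly break comparisons. A minimal base-R sketch of detecting and replacing it (stringr's `str_replace_all(x, "\u00a0", " ")` would do the same replacement):

```r
# "\u00a0" is the non-breaking space (code point 160)
x <- "This\u00a0is\u00a0tricky"

# It prints like a space but is a different character
same_as_space <- identical(x, "This is tricky")  # FALSE
has_nbsp <- 160L %in% utf8ToInt(x)               # TRUE

# Swap it for an ordinary space before comparing or matching text
x_plain <- gsub("\u00a0", " ", x, fixed = TRUE)
```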
14.3.4 Exercises:
Code below:
paste0("hi ", NA)
## [1] "hi NA"
str_c("hi ", NA)
## [1] NA
paste0(letters[1:2], letters[1:3])
## [1] "aa" "bb" "ac"
# str_c(letters[1:2], letters[1:3])  # This creates an error
str_c() is less flexible with certain inputs: missing values are infectious (any NA input gives an NA result), and the inputs must have the same length or length 1 instead of being silently recycled. Across the tidyverse, this trade-off in flexibility limits unintended results.
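If you do want paste0()-like results from str_c(), the missing values and the recycling have to be handled explicitly. A minimal sketch, assuming stringr is installed: `str_replace_na()` turns NA into the string "NA", and length-1 inputs are the only ones str_c() recycles.

```r
library(stringr)

# Make NA explicit first, mimicking what paste0() does implicitly
hi_na <- str_c("hi ", str_replace_na(NA_character_))

# Length-1 inputs are recycled, so this works ...
bang <- str_c(letters[1:3], "!")

# ... but vectors of lengths 2 and 3 raise an error instead of recycling
recycle_error <- tryCatch(
  str_c(letters[1:2], letters[1:3]),
  error = function(e) e
)
```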
paste0() is a special case of paste() in which the separator between the pieces is the empty string (paste() uses sep = " " by default). You can recreate paste() with str_c() by setting the sep argument:
str_c('x', 'y')
## [1] "xy"
str_c('x', 'y', sep = ' ')
## [1] "x y"
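The equivalence can be checked directly with identical(). A short sketch, assuming stringr is installed:

```r
library(stringr)

# paste0() and str_c() both default to an empty separator
same_default <- identical(paste0("x", "y"), str_c("x", "y"))

# paste() defaults to sep = " ", which str_c() needs spelled out
same_sep <- identical(paste("x", "y"), str_c("x", "y", sep = " "))
```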
Code below:
# 1
food <- 'hotdog'
price <- 100
str_c("The price of ", food, " is ", price)
## [1] "The price of hotdog is 100"
str_glue("The price of {food} is {price}")
## The price of hotdog is 100

# 2
age <- 12
country <- 'Germany'
str_glue("I'm {age} years old and live in {country}")
## I'm 12 years old and live in Germany
str_c("I'm ", age, " years old and live in ", country)
## [1] "I'm 12 years old and live in Germany"

# 3
title <- 'Fido'
str_c("\\section{", title, "}")
## [1] "\\section{Fido}"
# Two backslashes in the source give one in the string, matching str_c();
# "\\\\section" would produce a doubled backslash
str_glue("\\section{{{title}}}")
## \section{Fido}
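A quick way to confirm that each str_c()/str_glue() pair really is equivalent is to compare them with identical() after dropping the "glue" class. A sketch, assuming stringr (and therefore glue) is installed:

```r
library(stringr)

food <- "hotdog"
price <- 100
title <- "Fido"

# str_glue() returns a "glue" object, so convert to plain character first
same_price <- identical(
  as.character(str_glue("The price of {food} is {price}")),
  str_c("The price of ", food, " is ", price)
)

# {{ and }} are literal braces; "\\section" is one backslash + "section"
same_section <- identical(
  as.character(str_glue("\\section{{{title}}}")),
  str_c("\\section{", title, "}")
)
```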
14.5.3 Exercises:
The babynames dataset does not have one row per baby; it has one row per name, sex, and year, with n giving the number of babies. Therefore we need the sum of n (which is what wt = n does in count()) rather than the number of rows. (This is a good example of why you should always read the documentation and examine a dataset before trying to pull information from it.)
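The difference between counting rows and summing n shows up even on a tiny hypothetical table with the same shape as babynames. A sketch, assuming dplyr is installed; the names and counts below are made up:

```r
library(dplyr)

# Hypothetical toy data: one row per name per year, n = number of babies
toy <- data.frame(
  year = c(2000, 2000, 2001),
  name = c("Ann", "Bob", "Ann"),
  n    = c(10, 5, 7)
)

name_totals <- toy |>
  group_by(name) |>
  summarise(
    rows   = n(),    # how many rows mention the name
    babies = sum(n)  # how many babies were given the name
  )
```

Ann appears in only 2 rows but was given to 17 babies, which is why the length distribution has to be weighted by n.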
Names with an even number of letters have no single middle letter, so for those I took the two middle characters instead.
babynames |>
  mutate(
    middle = if_else(
      str_length(name) %% 2 == 0,
      str_sub(name, str_length(name) %/% 2, (str_length(name) %/% 2) + 1),
      str_sub(name, str_length(name) %/% 2 + 1, (str_length(name) %/% 2) + 1)
    )
  ) |>
  select(name, middle)
## # A tibble: 1,924,665 × 2
##    name      middle
##    <chr>     <chr>
##  1 Mary      ar
##  2 Anna      nn
##  3 Emma      mm
##  4 Elizabeth a
##  5 Minnie    nn
##  6 Margaret  ga
##  7 Ida       d
##  8 Alice     i
##  9 Bertha    rt
## 10 Sarah     r
## # ℹ 1,924,655 more rows
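The two if_else() branches can also be collapsed into a single str_sub() call: (len + 1) %/% 2 and len %/% 2 + 1 give the start and end of the middle for both odd and even lengths. A sketch, assuming stringr is installed; `middle_chars()` is a hypothetical helper, not part of stringr:

```r
library(stringr)

# Hypothetical helper: middle letter for odd lengths,
# middle two letters for even lengths
middle_chars <- function(x) {
  len <- str_length(x)
  # odd len: start == end; even len: the range spans the two central letters
  str_sub(x, (len + 1) %/% 2, len %/% 2 + 1)
}

middle_chars(c("Mary", "Elizabeth", "Ida", "Margaret"))
```

This returns "ar", "a", "d", "ga", matching the table above, and could be used as `mutate(middle = middle_chars(name))`.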
I created two graphs: the average name length by year, and the popularity of each first letter by year.
babynames |>
  group_by(year) |>
  summarise(
    # weight by n so each baby, not each name, counts once
    avg_len = weighted.mean(str_length(name), w = n)
  ) |>
  ggplot(aes(year, avg_len)) +
  geom_smooth(method = 'loess', formula = 'y ~ x') +
  labs(x = 'Year', y = 'Avg. Length')
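Weighting by n matters for the same reason as in the first exercise. A tiny hypothetical year with two names shows how far the plain and weighted means can diverge; nchar() stands in for str_length() so the sketch needs only base R:

```r
# Hypothetical single year: 99 babies named Amy, 1 baby named Elizabeth
name_len <- nchar(c("Amy", "Elizabeth"))  # 3 and 9
babies   <- c(99, 1)

# Unweighted: each name counts once
avg_per_name <- mean(name_len)                       # 6

# Weighted by n: each baby counts once
avg_per_baby <- weighted.mean(name_len, w = babies)  # 3.06
```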
babynames |>
  mutate(first_letter = str_sub(name, 1, 1)) |>
  group_by(year, first_letter) |>
  summarise(
    occurrences = sum(n),
    .groups = 'drop_last'  # still grouped by year afterwards
  ) |>
  mutate(
    freq = occurrences / sum(occurrences) * 100,
    label = if_else(year == 2000, first_letter, NA_character_)
  ) |>
  ggplot(aes(year, freq, label = label)) +
  geom_line() +
  geom_label(nudge_y = 1, na.rm = TRUE) +
  facet_wrap(~first_letter) +
  labs(x = 'Year', y = 'Percentage of Year Total') +
  theme(text = element_text(size = 10))
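The percentages work because `.groups = 'drop_last'` removes first_letter from the grouping and leaves the data grouped by year, so sum(occurrences) in the following mutate() is each year's total. A sketch on hypothetical toy data, assuming dplyr is installed; base substr() replaces str_sub() to avoid a stringr dependency:

```r
library(dplyr)

# Hypothetical toy data: two years, names starting with A or B
toy <- data.frame(
  year = c(2000, 2000, 2000, 2001, 2001),
  name = c("Ann", "Amy", "Bob", "Ann", "Bob"),
  n    = c(10, 30, 60, 25, 75)
)

letter_share <- toy |>
  mutate(first_letter = substr(name, 1, 1)) |>
  group_by(year, first_letter) |>
  # drop_last leaves the result grouped by year only
  summarise(occurrences = sum(n), .groups = "drop_last") |>
  # so sum(occurrences) here is the total for each year
  mutate(freq = occurrences / sum(occurrences) * 100) |>
  ungroup()
```

In 2000 the A names account for 40 of 100 babies (40%), and in 2001 for 25 of 100 (25%).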