Ch.14 Solutions

Prerequisites

library(tidyverse)
library(babynames)

14.2.4 Exercises:

These can all be done using raw strings as well.

str_1 <- 'He said "That\'s amazing!"'
str_2 <- "\\a\\b\\c\\d"
str_3 <- "\\\\\\\\\\\\"

str_view(str_1)
## [1] │ He said "That's amazing!"
str_view(str_2)
## [1] │ \a\b\c\d
str_view(str_3)
## [1] │ \\\\\\
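
As a sketch of the raw-string alternative mentioned above (this assumes R 4.0.0 or later; the names raw_1 to raw_3 are illustrative only), the same three strings can be written without any escaping:

```r
# Raw strings: everything between r"( and )" is taken literally
raw_1 <- r"(He said "That's amazing!")"
raw_2 <- r"(\a\b\c\d)"
raw_3 <- r"(\\\\\\)"

# Each raw string is identical to its escaped counterpart
identical(raw_1, 'He said "That\'s amazing!"')
identical(raw_2, "\\a\\b\\c\\d")
identical(raw_3, "\\\\\\\\\\\\")
```

All three comparisons should return TRUE, since only the way the string is typed changes, not its contents.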

"\u00a0" is the non-breaking space, which has its own Wikipedia page. It is a space character that prevents an automatic line break at its position.

x <- "This\u00a0is\u00a0tricky"
str_view(x)
## [1] │ This{\u00a0}is{\u00a0}tricky
x
## [1] "This is tricky"

14.3.4 Exercises:

Code below:
```
paste0("hi ", NA)
## [1] "hi NA"
str_c("hi ", NA)
## [1] NA
paste0(letters[1:2], letters[1:3])
## [1] "aa" "bb" "ac"
# str_c(letters[1:2], letters[1:3])  # error: lengths 2 and 3 cannot be recycled
```
- str_c() is stricter about its inputs: NAs are infectious, and vectors must have the same length or length 1. The tidyverse accepts this loss of flexibility throughout because it reduces the chance of silently wrong results.
paste0() is a special case of paste() in which the separator is the empty string. You can recreate paste() with str_c() by using the sep argument.
```
str_c('x', 'y')
## [1] "xy"
str_c('x', 'y', sep = ' ')
## [1] "x y"
```
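
The stricter behaviour of str_c() described above can be worked with rather than around. The following is a minimal sketch, assuming stringr is installed; the variable name greeting and the sample values are made up for illustration:

```r
library(stringr)

# Length-1 inputs are recycled; any other length mismatch is an error
str_c("x", letters[1:3])
## [1] "xa" "xb" "xc"

# To get paste0()-style output, make the missing value explicit first
greeting <- str_c("hi ", str_replace_na(c("Ann", NA)))
greeting
## [1] "hi Ann" "hi NA"
```

Converting NA with str_replace_na() is a deliberate step, so a missing value never turns into the text "NA" by accident.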

Code below:

#1
food <- 'hotdog'
price <- 100
str_c("The price of ", food, " is ", price)
## [1] "The price of hotdog is 100"
str_glue("The price of {food} is {price}")
## The price of hotdog is 100

#2
age <- 12
country <- 'Germany'
str_glue("I'm {age} years old and live in {country}")
## I'm 12 years old and live in Germany
str_c("I'm ", age, " years old and live in ", country)
## [1] "I'm 12 years old and live in Germany"

#3
title <- 'Fido'
str_c("\\section{", title, "}")
## [1] "\\section{Fido}"
# str_glue() output is shown unescaped, so the single backslash appears as \
str_glue("\\section{{{title}}}")
## \section{Fido}

14.5.3 Exercises:

The babynames dataset does not have one row per baby; it has one row per name per year (and sex), with n giving the number of babies. We therefore need to sum n (wt = n) rather than count rows. (This is a good example of why you should always read the documentation and examine a dataset before attempting to pull information from it.)
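
To make the difference concrete, here is a small sketch on a toy table. It assumes dplyr and stringr are installed; the counts and the names toy, rows_per_length and babies_per_length are invented for illustration and are not taken from babynames:

```r
library(dplyr)
library(stringr)

# Hypothetical data: one row per name, n = number of babies with that name
toy <- tibble(
  name = c("Mary", "Anna", "Elizabeth"),
  n    = c(10, 5, 1)
)

# Unweighted: counts rows, i.e. how many name entries have each length
rows_per_length <- toy |>
  count(length = str_length(name), name = "rows")

# Weighted: sums n, i.e. how many babies have a name of each length
babies_per_length <- toy |>
  count(length = str_length(name), wt = n, name = "babies")
```

Here length 4 has 2 rows but 15 babies, which is why the unweighted count would misrepresent the distribution.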

If the length is even, I take the two middle characters instead of one.

babynames |> 
  mutate(
    middle = if_else(
      str_length(name) %% 2 == 0,
      str_sub(name, str_length(name) %/% 2, (str_length(name) %/% 2) + 1),
      str_sub(name, str_length(name) %/% 2 + 1, (str_length(name) %/% 2) + 1)
    )
  ) |> 
  select(name, middle)
## # A tibble: 1,924,665 × 2
##    name      middle
##    <chr>     <chr> 
##  1 Mary      ar    
##  2 Anna      nn    
##  3 Emma      mm    
##  4 Elizabeth a     
##  5 Minnie    nn    
##  6 Margaret  ga    
##  7 Ida       d     
##  8 Alice     i     
##  9 Bertha    rt    
## 10 Sarah     r     
## # ℹ 1,924,655 more rows
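
The two branches above can also be folded into one expression, because integer division yields the right start and end positions for both cases. This is only a sketch of an alternative, assuming stringr is installed; middle() is a hypothetical helper, not part of any package:

```r
library(stringr)

# Hypothetical helper: middle character(s) of each string
middle <- function(x) {
  len <- str_length(x)
  # Odd length:  start == end == the single middle position
  # Even length: start and end are the two middle positions
  str_sub(x, (len + 1) %/% 2, len %/% 2 + 1)
}

middle(c("Mary", "Elizabeth", "Ida"))
# "ar" "a" "d"
```

For "Mary" (length 4) the positions are 2 and 3, and for "Elizabeth" (length 9) both are 5, matching the table above.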

I decided to plot the average name length by year and the popularity of each first letter by year.

babynames |> 
  group_by(year) |> 
  summarise(
    avg_len = weighted.mean(str_length(name), w = n)
  ) |> 
  ggplot(aes(year, avg_len)) + 
  geom_smooth(method = 'loess', formula = 'y~x') +
  labs(x = 'Year', y = 'Avg. Length')

babynames |> 
  mutate(first_letter = str_sub(name, 1, 1)) |> 
  group_by(year, first_letter) |> 
  summarise(
    occurrences = sum(n), 
    .groups = 'drop_last'
  ) |> 
  mutate(
    freq = occurrences / sum(occurrences) * 100,
    label = if_else(year == 2000, first_letter, NA)
  ) |> 
  ggplot(aes(year, freq, label = label)) +
  geom_line() +
  geom_label(nudge_y = 1, na.rm = TRUE) +
  facet_wrap(~first_letter) +
  labs(x = 'Year', y = 'Percentage of Year Total') +
  theme(text = element_text(size = 10))