ch_26_solutions
Prerequisites:
library(tidyverse)
library(nycflights13)
26.2.8 Exercises:
Parts 1-3 below:
palmerpenguins::penguins |>
  summarise(across(everything(), n_distinct))
## # A tibble: 1 × 8
##   species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##     <int>  <int>          <int>         <int>             <int>       <int>
## 1       3      3            165            81                56          95
## # ℹ 2 more variables: sex <int>, year <int>
mtcars |>
  summarise(across(everything(), mean))
##        mpg    cyl     disp       hp     drat      wt     qsec     vs      am
## 1 20.09062 6.1875 230.7219 146.6875 3.596563 3.21725 17.84875 0.4375 0.40625
##     gear   carb
## 1 3.6875 2.8125
diamonds |>
  group_by(cut, clarity, color) |>
  summarise(
    n = n(),
    across(where(is.numeric), mean),
    .groups = 'drop'
  )
## # A tibble: 276 × 11
##    cut   clarity color     n carat depth table price     x     y     z
##    <ord> <ord>   <ord> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1 Fair  I1      D         4 1.88   65.6  56.8 7383   7.52  7.42  4.90
##  2 Fair  I1      E         9 0.969  65.6  58.1 2095.  6.17  6.06  4.01
##  3 Fair  I1      F        35 1.02   65.7  58.4 2544.  6.14  6.04  4.00
##  4 Fair  I1      G        53 1.23   65.3  57.7 3187.  6.52  6.43  4.23
##  5 Fair  I1      H        52 1.50   65.8  58.4 4213.  6.96  6.86  4.55
##  6 Fair  I1      I        34 1.32   65.7  58.4 3501   6.76  6.65  4.41
##  7 Fair  I1      J        23 1.99   66.5  57.9 5795.  7.55  7.46  4.99
##  8 Fair  SI2     D        56 1.02   64.7  58.6 4355.  6.24  6.17  4.01
##  9 Fair  SI2     E        78 1.02   63.4  59.5 4172.  6.28  6.22  3.96
## 10 Fair  SI2     F        89 1.08   63.8  59.5 4520.  6.36  6.30  4.04
## # ℹ 266 more rows
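When the functions in the list are unnamed, across() just appends _1, _2, _3, etc. to the end of each column name. This is quite unreadable, especially if you don’t have access to the code that shows which function each number refers to.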
diamonds |>
  summarise(
    across(where(is.numeric), list(mean, median))
  )
## # A tibble: 1 × 14
##   carat_1 carat_2 depth_1 depth_2 table_1 table_2 price_1 price_2   x_1   x_2
##     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <dbl> <dbl>
## 1   0.798     0.7    61.7    61.8    57.5      57   3933.    2401  5.73   5.7
## # ℹ 4 more variables: y_1 <dbl>, y_2 <dbl>, z_1 <dbl>, z_2 <dbl>
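For contrast, here is a minimal sketch of the same summary with a named list. The object name named_summary is my own; the only change from the code above is giving each function a name, which across() then uses as the suffix in place of the position number.

```r
library(dplyr)
library(ggplot2)  # provides the diamonds dataset

# Naming the list elements makes across() use the names as suffixes,
# e.g. carat_mean and carat_median instead of carat_1 and carat_2.
named_summary <- diamonds |>
  summarise(
    across(where(is.numeric), list(mean = mean, median = median))
  )

names(named_summary)
```

The suffixes now say which statistic each column holds, so the output can be read without the code.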
The .keep argument to mutate() determines which columns are retained after mutating. With .keep = 'unused', the columns that were used to create new columns (here, the original date column) are dropped. The way I wrote this function doesn’t require embracing, since I am not taking columns as an argument. A more flexible function, such as one that lets the user define the columns to keep or remove, would likely require embracing.
expand_dates <- function(df) {
  df |>
    mutate(
      across(where(is.Date), list(year = year, month = month, day = mday)),
      .keep = 'unused'
    )
}

df_date <- tibble(
  name = c("Amy", "Bob"),
  date = ymd(c("2009-08-03", "2010-01-16"))
)

expand_dates(df_date)
## # A tibble: 2 × 4
##   name  date_year date_month date_day
##   <chr>     <dbl>      <dbl>    <int>
## 1 Amy        2009          8        3
## 2 Bob        2010          1       16
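As a rough sketch of the more flexible variant mentioned above, the function below takes the columns to expand as an argument and embraces it. The name expand_dates_cols, the cols argument, and the df_two example data are all hypothetical, not part of the original solution.

```r
library(dplyr)
library(lubridate)

# Hypothetical variant: `cols` is a tidy-select specification, so it must be
# embraced with {{ }} before being handed to across().
expand_dates_cols <- function(df, cols = where(is.Date)) {
  df |>
    mutate(
      across({{ cols }}, list(year = year, month = month, day = mday)),
      .keep = "unused"
    )
}

# Made-up example data with two date columns
df_two <- tibble(
  name  = c("Amy", "Bob"),
  start = ymd(c("2009-08-03", "2010-01-16")),
  end   = ymd(c("2009-09-01", "2010-02-01"))
)

# Expand only `start`; `end` is untouched and therefore kept
expanded <- expand_dates_cols(df_two, start)
names(expanded)
```

Because only start is used to build new columns, .keep = "unused" drops start and leaves end in place.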
To keep the summary short, the function outputs the count of NAs for each combination of the user-supplied grouping variables, and only for columns that have at least one NA. The special feature of where() (as described in the documentation) is its ability to take purrr-like formulas. The logic of the where() call in this function reminds me of a HAVING clause in a SQL query, which is used to filter data post-aggregation.
show_missing <- function(df, group_vars, summary_vars = everything()) {
  df |>
    group_by(pick({{ group_vars }})) |>
    summarize(
      across({{ summary_vars }}, \(x) sum(is.na(x))),
      .groups = "drop"
    ) |>
    select(where(\(x) any(x > 0)))
}

nycflights13::flights |> show_missing(c(year, month, day))
## # A tibble: 365 × 9
##     year month   day dep_time dep_delay arr_time arr_delay tailnum air_time
##    <int> <int> <int>    <int>     <int>    <int>     <int>   <int>    <int>
##  1  2013     1     1        4         4        5        11       0       11
##  2  2013     1     2        8         8       10        15       2       15
##  3  2013     1     3       10        10       10        14       2       14
##  4  2013     1     4        6         6        6         7       2        7
##  5  2013     1     5        3         3        3         3       1        3
##  6  2013     1     6        1         1        1         3       0        3
##  7  2013     1     7        3         3        3         3       1        3
##  8  2013     1     8        4         4        4         7       1        7
##  9  2013     1     9        5         5        7         9       2        9
## 10  2013     1    10        3         3        3         3       2        3
## # ℹ 355 more rows
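To make the formula point concrete, here is a small sketch comparing the anonymous-function predicate used above with the purrr-style formula form that where() also accepts. The na_counts tibble is made-up data standing in for a table of per-column NA counts.

```r
library(dplyr)

# Made-up NA counts: column `a` has no missing values at all
na_counts <- tibble(a = c(0L, 0L), b = c(1L, 0L), c = c(0L, 2L))

# Anonymous-function predicate, as used in show_missing()
with_lambda <- na_counts |> select(where(\(x) any(x > 0)))

# Equivalent purrr-style formula predicate
with_formula <- na_counts |> select(where(~ any(.x > 0)))

identical(with_lambda, with_formula)
```

Both forms drop column a and keep b and c, so the choice between them is a matter of style.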