class: title-slide, center, bottom # Tune Models ## Tidymodels, Virtually — Session 05 ### Alison Hill --- class: middle, center, frame # tune Functions for fitting and tuning models <tidymodels.github.io/tune/> <iframe src="https://tidymodels.github.io/tune/" width="100%" height="400px"></iframe> --- class: middle, center # `tune()` A placeholder for hyper-parameters to be "tuned" ```r nearest_neighbor(neighbors = tune()) ``` --- .center[ # `tune_grid()` A version of `fit_resamples()` that performs a grid search for the best combination of tuned hyper-parameters. ] .pull-left[ ```r tune_grid( object, resamples, ..., grid = 10, metrics = NULL, control = control_grid() ) ``` ] --- .center[ # `tune_grid()` A version of `fit_resamples()` that performs a grid search for the best combination of tuned hyper-parameters. ] .pull-left[ ```r tune_grid( * object, resamples, ..., grid = 10, metrics = NULL, control = control_grid() ) ``` ] -- .pull-right[ One of: + A parsnip `model` object + A `workflow` ] --- .center[ # `tune_grid()` A version of `fit_resamples()` that performs a grid search for the best combination of tuned hyper-parameters. ] .pull-left[ ```r tune_grid( * object, * preprocessor, resamples, ..., grid = 10, metrics = NULL, control = control_grid() ) ``` ] .pull-right[ A `model` + `recipe` ] --- .center[ # `tune_grid()` A version of `fit_resamples()` that performs a grid search for the best combination of tuned hyper-parameters. ] .pull-left[ ```r tune_grid( object, resamples, ..., * grid = 10, metrics = NULL, control = control_grid() ) ``` ] .pull-right[ One of: + A positive integer. + A data frame of tuning combinations. ] --- .center[ # `tune_grid()` A version of `fit_resamples()` that performs a grid search for the best combination of tuned hyper-parameters. 
] .pull-left[ ```r tune_grid( object, resamples, ..., * grid = 10, metrics = NULL, control = control_grid() ) ``` ] .pull-right[ Number of candidate parameter sets to be created automatically; `10` is the default. ] --- ```r library(modeldata) data(stackoverflow) # split the data set.seed(100) # Important! so_split <- initial_split(stackoverflow, strata = Remote) so_train <- training(so_split) so_test <- testing(so_split) # resample training data set.seed(100) # Important! so_folds <- vfold_cv(so_train, v = 10, strata = Remote) ``` --- class: inverse, middle, center # Aside: -- ## Sub-class sampling --- class: middle, center # Downsampling .pull-left[ <img src="figs/05-tune/uni-biscatter-1.png" width="504" style="display: block; margin: auto;" /> ] -- .pull-right[ <img src="figs/05-tune/unnamed-chunk-9-1.png" width="504" style="display: block; margin: auto;" /> ] --- class: middle, center # Upsampling .pull-left[ <img src="figs/05-tune/unnamed-chunk-10-1.png" width="504" style="display: block; margin: auto;" /> ] -- .pull-right[ <img src="figs/05-tune/unnamed-chunk-11-1.png" width="504" style="display: block; margin: auto;" /> ] --- # .center[`step_downsample()`] Down-sampling is performed on the training set *only*. Default is `skip = TRUE`. .pull-left[ ## Training Set ``` # # A tibble: 2 x 2 # Remote n # <fct> <int> # 1 Remote 435 # 2 Not remote 3761 ``` ] -- .pull-right[ ## "Prepped" Training Set ``` # # A tibble: 2 x 2 # Remote n # <fct> <int> # 1 Remote 435 # 2 Not remote 435 ``` ] --- # .center[`step_downsample()`] Down-sampling is performed on the training set *only*. Default is `skip = TRUE`. 
.pull-left[ ## Test Set ``` # # A tibble: 2 x 2 # Remote n # <fct> <int> # 1 Remote 140 # 2 Not remote 1258 ``` ] -- .pull-right[ ## "Prepped" Test Set ``` # # A tibble: 2 x 2 # Remote n # <fct> <int> # 1 Remote 140 # 2 Not remote 1258 ``` ] --- class: your-turn # Your Turn 1 Here's a new recipe (also in your .Rmd)… ```r so_rec <- recipe(Remote ~ ., data = so_train) %>% step_zv(all_predictors()) %>% step_dummy(all_nominal(), -all_outcomes()) %>% step_lincomb(all_predictors()) %>% step_downsample(Remote) ``` --- class: your-turn # Your Turn 1 …and a new model plus workflow. Can you tell what type of model this is?… ```r rf_spec <- rand_forest() %>% set_engine("ranger") %>% set_mode("classification") rf_workflow <- workflow() %>% add_recipe(so_rec) %>% add_model(rf_spec) ``` --- class: your-turn # Your Turn 1 Here is the output from `fit_resamples()`... ```r set.seed(100) # Important! rf_results <- rf_workflow %>% fit_resamples(resamples = so_folds, metrics = metric_set(roc_auc)) rf_results %>% collect_metrics() # # A tibble: 1 x 5 # .metric .estimator mean n std_err # <chr> <chr> <dbl> <int> <dbl> # 1 roc_auc binary 0.683 10 0.0206 ``` --- class: your-turn # Your Turn 1 Edit the random forest model to tune the `mtry` and `min_n` hyperparameters. Update your workflow to use the tuned model. Then use `tune_grid()` to find the best combination of hyper-parameters to maximize `roc_auc`; let tune set up the grid for you. How does it compare to the average ROC AUC across folds from `fit_resamples()`?
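--- class: middle # Aside: up-sampling in a recipe The recipe in Your Turn 1 down-samples the majority class. To try the up-sampling approach from the earlier slides instead, only the last step changes. A sketch, assuming a `step_upsample()` with the same interface as `step_downsample()` (at the time these steps lived in the recipes package; they have since moved to the themis package):

```r
# Sketch: same recipe as so_rec, but up-sampling the minority class instead.
# Assumes step_upsample() is available (recipes at the time; themis today).
so_up_rec <- recipe(Remote ~ ., data = so_train) %>%
  step_zv(all_predictors()) %>%
  step_dummy(all_nominal(), -all_outcomes()) %>%
  step_lincomb(all_predictors()) %>%
  step_upsample(Remote) # replicate minority-class rows until classes balance
```

Like `step_downsample()`, this step defaults to `skip = TRUE`, so it is applied to the training set only.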
--- ```r rf_tuner <- rand_forest(mtry = tune(), min_n = tune()) %>% set_engine("ranger") %>% set_mode("classification") rf_workflow <- rf_workflow %>% update_model(rf_tuner) set.seed(100) # Important! rf_results <- rf_workflow %>% tune_grid(resamples = so_folds, metrics = metric_set(roc_auc)) ``` --- ```r rf_results %>% collect_metrics() # # A tibble: 10 x 7 # mtry min_n .metric .estimator mean n std_err # <int> <int> <chr> <chr> <dbl> <int> <dbl> # 1 3 13 roc_auc binary 0.685 10 0.0198 # 2 4 15 roc_auc binary 0.684 10 0.0201 # 3 7 36 roc_auc binary 0.684 10 0.0203 # 4 8 20 roc_auc binary 0.683 10 0.0209 # 5 10 28 roc_auc binary 0.683 10 0.0211 # 6 13 21 roc_auc binary 0.680 10 0.0219 # 7 14 8 roc_auc binary 0.668 10 0.0221 # 8 18 31 roc_auc binary 0.676 10 0.0225 # 9 20 5 roc_auc binary 0.659 10 0.0220 # 10 22 38 roc_auc binary 0.678 10 0.0231 ``` --- ```r rf_results %>% collect_metrics(summarize = FALSE) # # A tibble: 100 x 6 # id mtry min_n .metric .estimator .estimate # <chr> <int> <int> <chr> <chr> <dbl> # 1 Fold01 10 28 roc_auc binary 0.726 # 2 Fold01 7 36 roc_auc binary 0.732 # 3 Fold01 13 21 roc_auc binary 0.717 # 4 Fold01 3 13 roc_auc binary 0.741 # 5 Fold01 14 8 roc_auc binary 0.709 # 6 Fold01 18 31 roc_auc binary 0.717 # 7 Fold01 4 15 roc_auc binary 0.744 # 8 Fold01 20 5 roc_auc binary 0.698 # 9 Fold01 22 38 roc_auc binary 0.710 # 10 Fold01 8 20 roc_auc binary 0.729 # # … with 90 more rows ``` --- .center[ # `tune_grid()` A version of `fit_resamples()` that performs a grid search for the best combination of tuned hyper-parameters. ] .pull-left[ ```r tune_grid( object, resamples, ..., * grid = df, metrics = NULL, control = control_grid() ) ``` ] .pull-right[ A data frame of tuning combinations. ] --- class: middle, center # `expand_grid()` Takes one or more vectors, and returns a data frame holding all combinations of their values. 
```r expand_grid(mtry = c(1, 5), min_n = 1:3) # # A tibble: 6 x 2 # mtry min_n # <dbl> <int> # 1 1 1 # 2 1 2 # 3 1 3 # 4 5 1 # 5 5 2 # 6 5 3 ``` -- .footnote[tidyr package; see also base `expand.grid()`] --- class: middle name: show-best .center[ # `show_best()` Shows the .display[n] best combinations of hyper-parameters. ] ```r rf_results %>% show_best(metric = "roc_auc", n = 5) ``` --- template: show-best ``` # # A tibble: 5 x 7 # mtry min_n .metric .estimator mean n std_err # <int> <int> <chr> <chr> <dbl> <int> <dbl> # 1 3 13 roc_auc binary 0.685 10 0.0198 # 2 7 36 roc_auc binary 0.684 10 0.0203 # 3 4 15 roc_auc binary 0.684 10 0.0201 # 4 8 20 roc_auc binary 0.683 10 0.0209 # 5 10 28 roc_auc binary 0.683 10 0.0211 ``` --- class: middle, center # `autoplot()` Quickly visualize tuning results ```r rf_results %>% autoplot() ``` <img src="figs/05-tune/rf-plot-1.png" width="504" style="display: block; margin: auto;" /> --- class: middle, center <img src="figs/05-tune/unnamed-chunk-26-1.png" width="504" style="display: block; margin: auto;" /> --- class: middle name: select-best .center[ # `select_best()` Shows the .display[top] combination of hyper-parameters. ] ```r so_best <- rf_results %>% select_best(metric = "roc_auc") so_best ``` --- template: select-best ``` # # A tibble: 1 x 2 # mtry min_n # <int> <int> # 1 3 13 ``` --- class: middle .center[ # `finalize_workflow()` Replaces `tune()` placeholders in a model/recipe/workflow with a set of hyper-parameter values. ] ```r last_rf_workflow <- rf_workflow %>% finalize_workflow(so_best) ``` --- background-image: url(images/diamonds.jpg) background-size: contain background-position: left class: middle, center background-color: #f5f5f5 .pull-right[ ## We are ready to touch the jewels... ## The .display[testing set]! 
] --- class: middle .center[ # `last_fit()` ] ```r last_rf_fit <- last_rf_workflow %>% last_fit(split = so_split) ``` --- ```r last_rf_fit # # Monte Carlo cross-validation (0.75/0.25) with 1 resamples # # A tibble: 1 x 6 # splits id .metrics .notes .predictions .workflow # <list> <chr> <list> <list> <list> <list> # 1 <split … train/… <tibble … <tibbl… <tibble [1,3… <workflo… ``` --- class: your-turn # Your Turn 2 Use `select_best()`, `finalize_workflow()`, and `last_fit()` to take the best combination of hyper-parameters from `rf_results` and use them to predict the test set. How does our actual test ROC AUC compare to our cross-validated estimate?
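--- class: middle # What `last_fit()` does `last_fit()` packs two steps into one call. Roughly — this is a sketch of the idea, not the actual internals — it is equivalent to fitting the finalized workflow on the full training set and then predicting the held-out test set:

```r
# Sketch: what last_fit(split = so_split) does, approximately.
fitted_wf  <- last_rf_workflow %>% fit(data = training(so_split))
test_preds <- predict(fitted_wf, new_data = testing(so_split), type = "prob")
# last_fit() additionally computes metrics on these test-set predictions
# and bundles splits, predictions, and the fitted workflow into one tibble.
```

This is why the test set is touched only once, at the very end.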
--- ```r so_best <- rf_results %>% select_best(metric = "roc_auc") last_rf_workflow <- rf_workflow %>% finalize_workflow(so_best) last_rf_fit <- last_rf_workflow %>% last_fit(split = so_split) last_rf_fit %>% collect_metrics() ``` --- class: middle, frame .center[ # Final metrics ] ```r last_rf_fit %>% collect_metrics() # # A tibble: 2 x 3 # .metric .estimator .estimate # <chr> <chr> <dbl> # 1 accuracy binary 0.677 # 2 roc_auc binary 0.735 ``` --- class: middle .center[ # Final test predictions ] ```r last_rf_fit %>% collect_predictions() # # A tibble: 1,398 x 6 # id .pred_Remote `.pred_Not remo… .row .pred_class # <chr> <dbl> <dbl> <int> <fct> # 1 trai… 0.505 0.495 1 Remote # 2 trai… 0.410 0.590 6 Not remote # 3 trai… 0.185 0.815 18 Not remote # 4 trai… 0.654 0.346 23 Remote # 5 trai… 0.463 0.537 30 Not remote # 6 trai… 0.547 0.453 50 Remote # 7 trai… 0.855 0.145 53 Remote # 8 trai… 0.432 0.568 56 Not remote # 9 trai… 0.588 0.412 63 Remote # 10 trai… 0.315 0.685 68 Not remote # # … with 1,388 more rows, and 1 more variable: Remote <fct> ``` --- ```r roc_values <- last_rf_fit %>% collect_predictions() %>% roc_curve(truth = Remote, estimate = .pred_Remote) autoplot(roc_values) ``` <img src="figs/05-tune/unnamed-chunk-35-1.png" width="504" style="display: block; margin: auto;" />
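--- class: middle # Bonus: a hand-built grid The grids above were created automatically (`grid = 10`). To combine `expand_grid()` with `tune_grid()` yourself, pass the data frame as the `grid` argument. A sketch — the `mtry` and `min_n` values here are illustrative, not recommendations:

```r
# Sketch: hand-built tuning grid; 3 x 3 = 9 candidate parameter sets.
rf_grid <- expand_grid(mtry = c(5, 10, 20), min_n = c(2, 8, 32))

set.seed(100) # Important!
rf_results <- rf_workflow %>%
  tune_grid(resamples = so_folds,
            grid = rf_grid,                # instead of grid = 10
            metrics = metric_set(roc_auc))
```

`collect_metrics()`, `show_best()`, and `select_best()` then work exactly as before.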