class: title-slide, center, bottom # Tune Models ## Tidymodels, Virtually — Session 05 ### Alison Hill --- class: middle, center, frame # tune Functions for fitting and tuning models <tidymodels.github.io/tune/> <iframe src="https://tidymodels.github.io/tune/" width="100%" height="400px"></iframe> --- class: middle, center # `tune()` A placeholder for hyper-parameters to be "tuned" ```r nearest_neighbor(neighbors = tune()) ``` --- .center[ # `tune_grid()` A version of `fit_resamples()` that performs a grid search for the best combination of tuned hyper-parameters. ] .pull-left[ ```r tune_grid( object, resamples, ..., grid = 10, metrics = NULL, control = control_grid() ) ``` ] --- .center[ # `tune_grid()` A version of `fit_resamples()` that performs a grid search for the best combination of tuned hyper-parameters. ] .pull-left[ ```r tune_grid( * object, resamples, ..., grid = 10, metrics = NULL, control = control_grid() ) ``` ] -- .pull-right[ One of: + A parsnip `model` object + A `workflow` ] --- .center[ # `tune_grid()` A version of `fit_resamples()` that performs a grid search for the best combination of tuned hyper-parameters. ] .pull-left[ ```r tune_grid( * object, * preprocessor, resamples, ..., grid = 10, metrics = NULL, control = control_grid() ) ``` ] .pull-right[ A `model` + `recipe` ] --- .center[ # `tune_grid()` A version of `fit_resamples()` that performs a grid search for the best combination of tuned hyper-parameters. ] .pull-left[ ```r tune_grid( object, resamples, ..., * grid = 10, metrics = NULL, control = control_grid() ) ``` ] .pull-right[ One of: + A positive integer. + A data frame of tuning combinations. ] --- .center[ # `tune_grid()` A version of `fit_resamples()` that performs a grid search for the best combination of tuned hyper-parameters. 
] .pull-left[ ```r tune_grid( object, resamples, ..., * grid = 10, metrics = NULL, control = control_grid() ) ``` ] .pull-right[ Number of candidate parameter sets to be created automatically; `10` is the default. ] --- ```r library(modeldata) data(stackoverflow) # split the data set.seed(100) # Important! so_split <- initial_split(stackoverflow, strata = Remote) so_train <- training(so_split) so_test <- testing(so_split) # resample training data set.seed(100) # Important! so_folds <- vfold_cv(so_train, v = 10, strata = Remote) ``` --- class: inverse, middle, center # Aside: -- ## Sub-class sampling --- class: middle, center # Downsampling .pull-left[ <img src="figs/05-tune/uni-biscatter-1.png" width="504" style="display: block; margin: auto;" /> ] -- .pull-right[ <img src="figs/05-tune/unnamed-chunk-9-1.png" width="504" style="display: block; margin: auto;" /> ] --- class: middle, center # Upsampling .pull-left[ <img src="figs/05-tune/unnamed-chunk-10-1.png" width="504" style="display: block; margin: auto;" /> ] -- .pull-right[ <img src="figs/05-tune/unnamed-chunk-11-1.png" width="504" style="display: block; margin: auto;" /> ] --- # .center[`step_downsample()`] Down-sampling is performed on the training set *only*. Default is `skip = TRUE`. .pull-left[ ## Training Set ``` # # A tibble: 2 x 2 # Remote n # <fct> <int> # 1 Remote 435 # 2 Not remote 3761 ``` ] -- .pull-right[ ## "Prepped" Training Set ``` # # A tibble: 2 x 2 # Remote n # <fct> <int> # 1 Remote 435 # 2 Not remote 435 ``` ] --- # .center[`step_downsample()`] Down-sampling is performed on the training set *only*. Default is `skip = TRUE`. 
.pull-left[ ## Test Set ``` # # A tibble: 2 x 2 # Remote n # <fct> <int> # 1 Remote 140 # 2 Not remote 1258 ``` ] -- .pull-right[ ## "Prepped" Test Set ``` # # A tibble: 2 x 2 # Remote n # <fct> <int> # 1 Remote 140 # 2 Not remote 1258 ``` ] --- class: your-turn # Your Turn 1 Here's a new recipe (also in your .Rmd)… ```r so_rec <- recipe(Remote ~ ., data = so_train) %>% step_zv(all_predictors()) %>% step_dummy(all_nominal(), -all_outcomes()) %>% step_lincomb(all_predictors()) %>% step_downsample(Remote) ``` --- class: your-turn # Your Turn 1 …and a new model plus workflow. Can you tell what type of model this is?… ```r rf_spec <- rand_forest() %>% set_engine("ranger") %>% set_mode("classification") rf_workflow <- workflow() %>% add_recipe(so_rec) %>% add_model(rf_spec) ``` --- class: your-turn # Your Turn 1 Here is the output from `fit_resamples()`... ```r set.seed(100) # Important! rf_results <- rf_workflow %>% fit_resamples(resamples = so_folds, metrics = metric_set(roc_auc)) rf_results %>% collect_metrics() # # A tibble: 1 x 5 # .metric .estimator mean n std_err # <chr> <chr> <dbl> <int> <dbl> # 1 roc_auc binary 0.683 10 0.0206 ``` --- class: your-turn # Your Turn 1 Edit the random forest model to tune the `mtry` and `min_n` hyperparameters. Update your workflow to use the tuned model. Then use `tune_grid()` to find the best combination of hyper-parameters to maximize `roc_auc`; let tune set up the grid for you. How does it compare to the average ROC AUC across folds from `fit_resamples()`?
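--- class: middle # Aside: up-sampling in a recipe The recipe in Your Turn 1 down-samples the majority class. To try the up-sampling approach from the earlier slides instead, only the last step changes. A sketch, assuming a `step_upsample()` with the same interface as `step_downsample()` (at the time these steps lived in the recipes package; they have since moved to the themis package):

```r
# Sketch: same recipe as so_rec, but up-sampling the minority class instead.
# Assumes step_upsample() is available (recipes at the time; themis today).
so_up_rec <- recipe(Remote ~ ., data = so_train) %>%
  step_zv(all_predictors()) %>%
  step_dummy(all_nominal(), -all_outcomes()) %>%
  step_lincomb(all_predictors()) %>%
  step_upsample(Remote) # replicate minority-class rows until classes balance
```

Like `step_downsample()`, this step defaults to `skip = TRUE`, so it is applied to the training set only.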
--- ```r rf_tuner <- rand_forest(mtry = tune(), min_n = tune()) %>% set_engine("ranger") %>% set_mode("classification") rf_workflow <- rf_workflow %>% update_model(rf_tuner) set.seed(100) # Important! rf_results <- rf_workflow %>% tune_grid(resamples = so_folds, metrics = metric_set(roc_auc)) ``` --- ```r rf_results %>% collect_metrics() # # A tibble: 10 x 7 # mtry min_n .metric .estimator mean n std_err # <int> <int> <chr> <chr> <dbl> <int> <dbl> # 1 3 13 roc_auc binary 0.685 10 0.0198 # 2 4 15 roc_auc binary 0.684 10 0.0201 # 3 7 36 roc_auc binary 0.684 10 0.0203 # 4 8 20 roc_auc binary 0.683 10 0.0209 # 5 10 28 roc_auc binary 0.683 10 0.0211 # 6 13 21 roc_auc binary 0.680 10 0.0219 # 7 14 8 roc_auc binary 0.668 10 0.0221 # 8 18 31 roc_auc binary 0.676 10 0.0225 # 9 20 5 roc_auc binary 0.659 10 0.0220 # 10 22 38 roc_auc binary 0.678 10 0.0231 ``` --- ```r rf_results %>% collect_metrics(summarize = FALSE) # # A tibble: 100 x 6 # id mtry min_n .metric .estimator .estimate # <chr> <int> <int> <chr> <chr> <dbl> # 1 Fold01 10 28 roc_auc binary 0.726 # 2 Fold01 7 36 roc_auc binary 0.732 # 3 Fold01 13 21 roc_auc binary 0.717 # 4 Fold01 3 13 roc_auc binary 0.741 # 5 Fold01 14 8 roc_auc binary 0.709 # 6 Fold01 18 31 roc_auc binary 0.717 # 7 Fold01 4 15 roc_auc binary 0.744 # 8 Fold01 20 5 roc_auc binary 0.698 # 9 Fold01 22 38 roc_auc binary 0.710 # 10 Fold01 8 20 roc_auc binary 0.729 # # … with 90 more rows ``` --- .center[ # `tune_grid()` A version of `fit_resamples()` that performs a grid search for the best combination of tuned hyper-parameters. ] .pull-left[ ```r tune_grid( object, resamples, ..., * grid = df, metrics = NULL, control = control_grid() ) ``` ] .pull-right[ A data frame of tuning combinations. ] --- class: middle, center # `expand_grid()` Takes one or more vectors, and returns a data frame holding all combinations of their values. 
```r expand_grid(mtry = c(1, 5), min_n = 1:3) # # A tibble: 6 x 2 # mtry min_n # <dbl> <int> # 1 1 1 # 2 1 2 # 3 1 3 # 4 5 1 # 5 5 2 # 6 5 3 ``` -- .footnote[tidyr package; see also base `expand.grid()`] --- class: middle name: show-best .center[ # `show_best()` Shows the .display[n] best combinations of hyper-parameters. ] ```r rf_results %>% show_best(metric = "roc_auc", n = 5) ``` --- template: show-best ``` # # A tibble: 5 x 7 # mtry min_n .metric .estimator mean n std_err # <int> <int> <chr> <chr> <dbl> <int> <dbl> # 1 3 13 roc_auc binary 0.685 10 0.0198 # 2 7 36 roc_auc binary 0.684 10 0.0203 # 3 4 15 roc_auc binary 0.684 10 0.0201 # 4 8 20 roc_auc binary 0.683 10 0.0209 # 5 10 28 roc_auc binary 0.683 10 0.0211 ``` --- class: middle, center # `autoplot()` Quickly visualize tuning results ```r rf_results %>% autoplot() ``` <img src="figs/05-tune/rf-plot-1.png" width="504" style="display: block; margin: auto;" /> --- class: middle, center <img src="figs/05-tune/unnamed-chunk-26-1.png" width="504" style="display: block; margin: auto;" /> --- class: middle name: select-best .center[ # `select_best()` Shows the .display[top] combination of hyper-parameters. ] ```r so_best <- rf_results %>% select_best(metric = "roc_auc") so_best ``` --- template: select-best ``` # # A tibble: 1 x 2 # mtry min_n # <int> <int> # 1 3 13 ``` --- class: middle .center[ # `finalize_workflow()` Replaces `tune()` placeholders in a model/recipe/workflow with a set of hyper-parameter values. ] ```r last_rf_workflow <- rf_workflow %>% finalize_workflow(so_best) ``` --- background-image: url(images/diamonds.jpg) background-size: contain background-position: left class: middle, center background-color: #f5f5f5 .pull-right[ ## We are ready to touch the jewels... ## The .display[testing set]! 
] --- class: middle .center[ # `last_fit()` ] ```r last_rf_fit <- last_rf_workflow %>% last_fit(split = so_split) ``` --- ```r last_rf_fit # # Monte Carlo cross-validation (0.75/0.25) with 1 resamples # # A tibble: 1 x 6 # splits id .metrics .notes .predictions .workflow # <list> <chr> <list> <list> <list> <list> # 1 <split … train/… <tibble … <tibbl… <tibble [1,3… <workflo… ``` --- class: your-turn # Your Turn 2 Use `select_best()`, `finalize_workflow()`, and `last_fit()` to take the best combination of hyper-parameters from `rf_results` and use them to predict the test set. How does our actual test ROC AUC compare to our cross-validated estimate?
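--- class: middle # What `last_fit()` does `last_fit()` packs two steps into one call. Roughly — this is a sketch of the idea, not the actual internals — it is equivalent to fitting the finalized workflow on the full training set and then predicting the held-out test set:

```r
# Sketch: what last_fit(split = so_split) does, approximately.
fitted_wf  <- last_rf_workflow %>% fit(data = training(so_split))
test_preds <- predict(fitted_wf, new_data = testing(so_split), type = "prob")
# last_fit() additionally computes metrics on these test-set predictions
# and bundles splits, predictions, and the fitted workflow into one tibble.
```

This is why the test set is touched only once, at the very end.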
--- ```r so_best <- rf_results %>% select_best(metric = "roc_auc") last_rf_workflow <- rf_workflow %>% finalize_workflow(so_best) last_rf_fit <- last_rf_workflow %>% last_fit(split = so_split) last_rf_fit %>% collect_metrics() ``` --- class: middle, frame .center[ # Final metrics ] ```r last_rf_fit %>% collect_metrics() # # A tibble: 2 x 3 # .metric .estimator .estimate # <chr> <chr> <dbl> # 1 accuracy binary 0.677 # 2 roc_auc binary 0.735 ``` --- class: middle .center[ # Final test predictions ] ```r last_rf_fit %>% collect_predictions() # # A tibble: 1,398 x 6 # id .pred_Remote `.pred_Not remo… .row .pred_class # <chr> <dbl> <dbl> <int> <fct> # 1 trai… 0.505 0.495 1 Remote # 2 trai… 0.410 0.590 6 Not remote # 3 trai… 0.185 0.815 18 Not remote # 4 trai… 0.654 0.346 23 Remote # 5 trai… 0.463 0.537 30 Not remote # 6 trai… 0.547 0.453 50 Remote # 7 trai… 0.855 0.145 53 Remote # 8 trai… 0.432 0.568 56 Not remote # 9 trai… 0.588 0.412 63 Remote # 10 trai… 0.315 0.685 68 Not remote # # … with 1,388 more rows, and 1 more variable: Remote <fct> ``` --- ```r roc_values <- last_rf_fit %>% collect_predictions() %>% roc_curve(truth = Remote, estimate = .pred_Remote) autoplot(roc_values) ``` <img src="figs/05-tune/unnamed-chunk-35-1.png" width="504" style="display: block; margin: auto;" />
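--- class: middle # Bonus: a hand-built grid The grids above were created automatically (`grid = 10`). To combine `expand_grid()` with `tune_grid()` yourself, pass the data frame as the `grid` argument. A sketch — the `mtry` and `min_n` values here are illustrative, not recommendations:

```r
# Sketch: hand-built tuning grid; 3 x 3 = 9 candidate parameter sets.
rf_grid <- expand_grid(mtry = c(5, 10, 20), min_n = c(2, 8, 32))

set.seed(100) # Important!
rf_results <- rf_workflow %>%
  tune_grid(resamples = so_folds,
            grid = rf_grid,                # instead of grid = 10
            metrics = metric_set(roc_auc))
```

`collect_metrics()`, `show_best()`, and `select_best()` then work exactly as before.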