Resampling methods to BOTH select and evaluate models
Most common scenario requires both model selection and final model evaluation
Need data for training, validation, and test set(s)
Test set
All methods (other than nested) use a single test set
20-30% of data
The problems associated with a small test set motivate nested resampling (What problems?)
Conceptually, get test set first
Use initial_split() for all approaches except the single validation set approach (now use initial_validation_split() for that approach); see the sketch below
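A minimal rsample sketch of getting the test set first. The mtcars data frame and the 75/25 and 60/20/20 allocations are placeholders, not prescriptions:

```r
library(rsample)
set.seed(123)

d <- mtcars  # placeholder data frame; substitute your own data

# Get the test set first: hold out ~25% and keep the rest for
# training + validation resampling
splits    <- initial_split(d, prop = 0.75)
data_trn  <- training(splits)   # used to fit and tune (train + validation)
data_test <- testing(splits)    # touched only once, for final evaluation

# For the single validation set approach, make all three sets in one step
splits_3  <- initial_validation_split(d, prop = c(0.6, 0.2))  # 60% train, 20% val, 20% test
data_trn  <- training(splits_3)
data_val  <- validation(splits_3)
data_test <- testing(splits_3)
```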
Train and Validation set
Describe how each of these methods produces training and validation sets (see the rsample sketch after this list)
Single validation set
LOOCV
K-fold
Repeated k-fold
Grouped k-fold
Bootstrap
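If you are working in rsample, each of these methods corresponds to a resampling function. A sketch, again using mtcars as a placeholder and adding a hypothetical subid grouping variable so grouped k-fold can run:

```r
library(rsample)
set.seed(456)

# Placeholder held-in data; in practice this is data_trn from initial_split()
data_trn <- mtcars
data_trn$subid <- rep(1:8, length.out = nrow(data_trn))  # hypothetical grouping variable

# Single validation set: one train/validation division (see initial_validation_split() above)

# LOOCV: n splits; each held-out set is a single observation
splits_loo <- loo_cv(data_trn)

# K-fold: every observation is held out exactly once across v folds
splits_kfold <- vfold_cv(data_trn, v = 10)

# Repeated k-fold: the full k-fold procedure repeated with new random fold assignments
splits_rep <- vfold_cv(data_trn, v = 10, repeats = 3)

# Grouped k-fold: all rows from the same group (here, subid) stay in the same fold
splits_grp <- group_vfold_cv(data_trn, group = "subid")

# Bootstrap: held-in sets are n rows sampled with replacement;
# the held-out ("out-of-bag") set is whatever was not sampled
splits_boot <- bootstraps(data_trn, times = 100)
```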
No performance estimate necessary
When might you just want to know the best model configuration but NOT care about how it performs? (though I think you might always want to know!)
How would you modify the resampling procedure?
In this instance, we might call the held-out sets validation sets when using that terminology
Only one model configuration
When might you have only one model configuration? (is this ever really true?)
How would you modify the resampling procedure?
In this instance, we might call the held-out sets test sets when using that terminology (a sketch covering both scenarios follows below)
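A tidymodels sketch of both scenarios, assuming the glmnet package is available; mtcars, the lasso specification, and the penalty grid are placeholders, not a prescribed analysis:

```r
library(tidymodels)
set.seed(789)

d <- mtcars                   # placeholder data
folds <- vfold_cv(d, v = 10)  # resamples of the held-in data

# (a) Selection only: tune a (placeholder) lasso and keep the best
#     configuration. The held-out folds act as validation sets; their metric
#     guided selection, so we do not report it as an estimate for new data.
rec  <- recipe(mpg ~ ., data = d) |> step_normalize(all_predictors())
spec <- linear_reg(penalty = tune(), mixture = 1) |> set_engine("glmnet")

fits_select <- tune_grid(workflow(rec, spec),
                         resamples = folds,
                         grid = tibble(penalty = 10^seq(-3, 0, length.out = 10)))
best_config <- select_best(fits_select, metric = "rmse")

# (b) Only one configuration: nothing to select, so the held-out folds act as
#     test sets and their averaged metric is our performance estimate.
spec_single <- linear_reg() |> set_engine("lm")
fits_eval   <- fit_resamples(workflow(rec, spec_single), resamples = folds)
collect_metrics(fits_eval)
```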
Bias and Variance - The MOST Fundamental Issues in Inferential Statistics
Describe these two properties of estimates in general
need to think about repeated estimation of something (\(\hat{f}\), \(\hat{DGP}\); \(\hat{accuracy}\), \(\hat{rmse}\))
Describe bias and variance associated with developing models (estimates of the DGP)
Examples for our models (\(\hat{f}\), \(\hat{DGP}\))
What factors affect each?
Describe bias and variance of our performance estimates
Examples for our performance estimates (\(\hat{accuracy}\), \(\hat{rmse}\))
What factors affect each? (see the simulation sketch below)
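One way to make these properties concrete is a small simulation. The sketch below (base R; the true accuracy of .80 and the fixed model are assumptions for illustration) repeatedly estimates \(\hat{accuracy}\) on held-out data of different sizes: the estimate is essentially unbiased for a fixed model, but its variance shrinks as the held-out n grows.

```r
# Simulation sketch (assumed true accuracy = .80, model held fixed):
# repeated estimation of accuracy on held-out data of different sizes
set.seed(101)
true_accuracy <- 0.80

estimate_once <- function(n_heldout) {
  # each held-out observation is classified correctly with prob = true_accuracy
  mean(rbinom(n_heldout, size = 1, prob = true_accuracy))
}

summarize_estimates <- function(n_heldout, reps = 10000) {
  estimates <- replicate(reps, estimate_once(n_heldout))
  c(n_heldout = n_heldout,
    bias = mean(estimates) - true_accuracy,  # ~0: unbiased for a fixed model
    sd   = sd(estimates))                    # variance shrinks as n_heldout grows
}

round(rbind(summarize_estimates(25),
            summarize_estimates(100),
            summarize_estimates(1000)), 4)
```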
Connect Bias and Variance to our Resampling Methods
Our resampling methods yield an ESTIMATE of a (held-out / out-of-sample) performance metric
We can use it to select among model configurations (validation sets)
We can use it to evaluate a single (or best) model configuration (test sets)
Let's consider the bias and variance of this ESTIMATE of a performance metric for each METHOD
Use held-in/held-out terminology
Discuss the bias and variance of the performance estimate for a given model configuration. Consider implications of:
The method
Size of the held-in data
Size of the held-out data (concrete held-in/held-out sizes for each method appear in the sketch after the method list below)
Our METHODS
Single validation set
LOOCV
K-fold
Repeated k-fold
Bootstrap
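To make the held-in/held-out sizes concrete, here is a small rsample sketch (mtcars as a placeholder, 32 rows) that reports the analysis (held-in) and assessment (held-out) sizes produced by several of these methods:

```r
library(rsample)
set.seed(202)

d <- mtcars  # placeholder data; 32 rows

# Report held-in (analysis) and held-out (assessment) sizes for an rset
sizes <- function(rset) {
  data.frame(
    held_in  = sapply(rset$splits, function(s) nrow(analysis(s))),
    held_out = sapply(rset$splits, function(s) nrow(assessment(s)))
  )
}

head(sizes(vfold_cv(d, v = 10)))       # ~90% held-in, ~10% held-out per fold
head(sizes(loo_cv(d)))                 # n - 1 held-in, 1 held-out per split
head(sizes(bootstraps(d, times = 5)))  # n held-in (sampled with replacement); ~1/3 of rows held out (out-of-bag)
```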
Biased Performance Estimates
ALL resampling methods yield performance estimates with some degree of bias when used to evaluate the performance of the final model we will implement, which is trained on ALL the data
WHY?
Optimization bias
What are the implications if you use that same performance estimate to select the best model configuration AND to evaluate that best model configuration (i.e., estimate its performance in new data)?
Can this source of bias be eliminated?
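A small simulation sketch of optimization bias (base R; the true accuracy, number of configurations, and held-out n are arbitrary assumptions): when many configurations are equally good, the best estimated performance systematically overstates the true performance of the selected configuration, so re-using that estimate for evaluation is optimistic.

```r
# Optimization bias sketch: every configuration has the same (assumed) true
# accuracy, yet the best *estimated* accuracy is systematically too high
set.seed(303)
true_accuracy <- 0.75   # assumed, identical for all configurations
n_configs     <- 20     # number of configurations compared
n_heldout     <- 100    # size of held-out data used for each estimate

selected_estimate <- replicate(10000, {
  estimates <- rbinom(n_configs, size = n_heldout, prob = true_accuracy) / n_heldout
  max(estimates)  # the estimate we would report for the "best" configuration
})

mean(selected_estimate) - true_accuracy  # positive: selection inflates the estimate
```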
Nested CV
When used?
Why needed?
Describe how to do it using held-in/held-out terminology in addition to train/val/test
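A minimal sketch of setting this up with rsample's nested_cv(), using mtcars as a placeholder and arbitrary choices of 10 outer folds and 25 inner bootstraps:

```r
library(rsample)
set.seed(404)

d <- mtcars  # placeholder data

# Outer loop: 10-fold CV; each outer held-out fold serves as a TEST set for
#   evaluating whatever configuration the inner loop selects.
# Inner loop: bootstraps of each outer held-in set; the out-of-bag samples
#   serve as VALIDATION sets for selecting the best configuration.
nested <- nested_cv(d,
                    outside = vfold_cv(v = 10),
                    inside  = bootstraps(times = 25))

nested$splits[[1]]           # one outer held-in / held-out (train+val / test) split
nested$inner_resamples[[1]]  # inner resamples built from that outer held-in set
```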
Discuss the bias and variance of the performance estimate used to select the best configuration. Consider implications of:
The method
Size of the held-in data
Size of the held-out data
Discuss the bias and variance of the performance estimate used to evaluate that best configuration. Consider implications of: