train/val (dev)/test vs. k-fold/bootstrap with test vs. nested CV
choosing sample sizes for train, val, test
Random vs. non-random splits into train/val/test. What needs to be true? What is less important?
Overfitting to val? Using test more than once? Dos vs. don'ts and risks
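To make the splitting questions concrete, here is a minimal sketch of a random train/val/test split using only the standard library. The function name `split_indices`, the 70/15/15 proportions, and the seed are assumptions for illustration, not recommendations from these notes. The point is that the three sets are disjoint and the test indices are set aside once.

```python
import random


def split_indices(n, val_frac=0.15, test_frac=0.15, seed=0):
    """Randomly partition range(n) into disjoint train/val/test index lists.

    The fractions and seed are illustrative assumptions.
    """
    if not 0 <= val_frac + test_frac < 1:
        raise ValueError("val_frac + test_frac must be in [0, 1)")
    idx = list(range(n))
    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(idx)
    n_test = int(round(n * test_frac))
    n_val = int(round(n * val_frac))
    test = idx[:n_test]                  # held out; evaluate on it once at the end
    val = idx[n_test:n_test + n_val]     # used for model selection and tuning
    train = idx[n_test + n_val:]         # used for fitting
    return train, val, test


train, val, test = split_indices(1000)
print(len(train), len(val), len(test))
```

A purely random split like this assumes observations are independent; grouped or time-ordered data may call for a non-random split instead.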
Review 12 Takeaways
why
start with a baseline/basic system first: why?
eyeball and black box dev sets
size of eyeball sample?
risk of using all dev as eyeball sample
error analysis for regression?
Review 19 Takeaways
We do not use training error as an estimate of model performance in new data. Why?
We can compare training and val error (e.g., RMSE) to coarsely partition error into separate sources (bias vs. variance).
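The comparison above can be sketched as a small calculation. This is a rough illustration, not a formal decomposition: it treats the gap between training error and some target error as a coarse indicator of bias, and the gap between val and training error as a coarse indicator of variance. The names `rmse` and `partition_error`, the `target_rmse` argument, and all numbers below are made up for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def partition_error(train_rmse, val_rmse, target_rmse=0.0):
    """Coarsely split error into bias-like and variance-like parts.

    target_rmse is an assumed "best achievable" error (hypothetical);
    with the default of 0.0, all training error is counted as bias.
    """
    return {
        "bias": train_rmse - target_rmse,    # how far training error is from the target
        "variance": val_rmse - train_rmse,   # how much worse val is than training
    }


# Toy values, invented for this example.
print(rmse([3.0, 5.0], [2.0, 6.0]))
print(partition_error(train_rmse=1.0, val_rmse=3.0, target_rmse=0.5))
```

A large variance term relative to the bias term suggests overfitting; a large bias term with a small gap suggests the model is underfitting.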
Also discuss
example 1
example 2
example 3