When we split the data in a tree on a variable, how is the threshold specified? Does the algorithm determine it internally?
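A minimal sketch of the idea, assuming a greedy CART-style search: for each candidate cut point of a numeric predictor, the algorithm computes the impurity of the resulting split and keeps the threshold with the lowest weighted impurity, so the threshold is determined internally from the data rather than supplied by the user. The helper names below (`gini`, `best_threshold`) are illustrative, not from any package.

```r
# Greedy search for the best split threshold on one numeric predictor (CART-style).
# Real implementations (e.g., rpart) repeat this internally for every predictor at every node.

gini <- function(y) {
  p <- table(y) / length(y)
  1 - sum(p^2)
}

best_threshold <- function(x, y) {
  # Candidate cut points: midpoints between consecutive sorted unique values of x
  xs   <- sort(unique(x))
  cuts <- (head(xs, -1) + tail(xs, -1)) / 2
  impurity <- sapply(cuts, function(ct) {
    left  <- y[x <= ct]
    right <- y[x >  ct]
    (length(left) * gini(left) + length(right) * gini(right)) / length(y)
  })
  cuts[which.min(impurity)]
}

# Example: this should land close to the cut rpart chooses for its first split on iris (~2.45)
best_threshold(iris$Petal.Length, iris$Species)
```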
Can we discuss further how tree models handle nonlinear effects and interactions natively?
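As a small illustration (not from the course materials): a single tree approximates a nonlinear curve with a piecewise-constant step function and picks up an interaction through nested splits, with no polynomial or interaction terms specified by hand. The simulated data and the `rpart` package are assumptions here.

```r
library(rpart)

set.seed(1)
n  <- 500
x1 <- runif(n, -3, 3)
x2 <- runif(n, -3, 3)

# Nonlinear effect: y depends on sin(x1); interaction: a bonus only when x1 > 0 AND x2 > 0
y <- sin(x1) + 2 * (x1 > 0 & x2 > 0) + rnorm(n, sd = 0.3)

fit <- rpart(y ~ x1 + x2, data = data.frame(y, x1, x2))

# The printed tree shows nested splits on x1 and x2: the interaction is captured
# automatically because a split on x2 can occur inside a branch already defined by x1.
print(fit)
```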
What determines the tree depth that we set in decision trees and random forests?
How do trees handle missing data? Do we ever want to impute missing data in our recipes?
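Implementations differ: `rpart` can route missing values with surrogate splits, while engines such as `ranger` generally require complete predictors, so an imputation step in a recipe can still matter. A minimal tidymodels sketch, assuming made-up data (`outcome`, `age`, `income`) with some missing values and that the `ranger` package is installed:

```r
library(tidymodels)

# Toy data with missing values in a numeric predictor (illustrative column names)
set.seed(1)
dat <- tibble(
  outcome = factor(sample(c("yes", "no"), 200, replace = TRUE)),
  age     = ifelse(runif(200) < 0.1, NA_real_, rnorm(200, 40, 10)),
  income  = rnorm(200, 5e4, 1e4)
)

# rpart can handle NAs via surrogate splits, but many engines (e.g., ranger) cannot,
# so an imputation step keeps the workflow engine-agnostic.
rec <- recipe(outcome ~ ., data = dat) %>%
  step_impute_median(all_numeric_predictors())

rf_spec <- rand_forest(trees = 500) %>%
  set_engine("ranger") %>%
  set_mode("classification")

wf <- workflow() %>%
  add_recipe(rec) %>%
  add_model(rf_spec)

fit(wf, data = dat)
```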
In what situations might additional feature engineering (e.g., alternative handling of missing values or categorical aggregation) still improve decision tree performance?
Can you give more examples of how to interpret the decision tree graph?
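If it helps, a quick sketch using the `rpart.plot` package (assumed installed): with the default settings, each node box typically shows the predicted class, the fitted class probabilities, and the share of training observations reaching that node, and you read the graph by following the split conditions down from the root.

```r
library(rpart)
library(rpart.plot)   # assumed installed

fit <- rpart(Species ~ ., data = iris)

# Reading the graph: start at the root, go left when the split condition is TRUE ("yes")
# and right when it is FALSE ("no"), until you reach a leaf. Each node box typically
# shows the predicted class, the class probabilities, and the % of observations in the node.
rpart.plot(fit)
```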
A further explanation of what a base learner is.
When and why does bagging improve model performance?
Can you talk more about bagging and how it utilizes bootstrapping techniques?
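A minimal from-scratch sketch of the idea (the loop and object names are mine, not from any package): each tree is fit to a bootstrap resample of the training data, and the trees' predictions are combined by majority vote.

```r
library(rpart)

set.seed(42)
idx   <- sample(nrow(iris), 100)
train <- iris[idx, ]
test  <- iris[-idx, ]

B <- 25  # number of bagged trees (illustrative)

# Bagging by hand: fit one tree per bootstrap resample of the training data
bag <- lapply(seq_len(B), function(b) {
  boot <- train[sample(nrow(train), replace = TRUE), ]  # bootstrap sample (with replacement)
  rpart(Species ~ ., data = boot)
})

# Aggregate: majority vote across the B trees' predictions
votes <- sapply(bag, function(tree) as.character(predict(tree, test, type = "class")))
pred  <- apply(votes, 1, function(v) names(which.max(table(v))))

mean(pred == test$Species)  # bagged accuracy on the held-out rows
```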
Can you explain more about why we should de-correlate the bagged trees? Do we not need to de-correlate if we are not using bagging?
Besides using the same resampling technique (bootstrapping), how is the inner-loop/outer-loop structure in bagging and random forests distinct from nested CV?
It's not similar.
Out-of-bag error estimation: I did not really understand the logic behind this.
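The logic, briefly: each bootstrap sample leaves out roughly one third of the training rows, so every row can be predicted by the trees that never saw it, and aggregating those predictions gives a nearly free validation-style error estimate. A minimal sketch with the `randomForest` package (assumed installed):

```r
library(randomForest)

set.seed(42)
rf <- randomForest(Species ~ ., data = iris, ntree = 500)

# Each observation is predicted only by the ~37% of trees whose bootstrap sample
# did not contain it; aggregating those predictions gives the OOB error estimate.
rf$err.rate[500, "OOB"]   # OOB error after all 500 trees
print(rf)                 # also reports the "OOB estimate of error rate"
```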
Understanding mtry
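For context, mtry is the number of predictors randomly offered as split candidates at each node; smaller values de-correlate the trees more, and using all predictors reduces a random forest to plain bagging. A rough sketch comparing OOB error across a few mtry values with `randomForest` (the values are chosen just for illustration):

```r
library(randomForest)

set.seed(42)

# mtry = number of predictors sampled as split candidates at each node.
# Smaller mtry -> more de-correlated trees; mtry = all predictors -> plain bagging.
oob_error <- sapply(1:4, function(m) {
  rf <- randomForest(Species ~ ., data = iris, mtry = m, ntree = 500)
  rf$err.rate[500, "OOB"]
})

names(oob_error) <- paste0("mtry=", 1:4)
oob_error  # the default for classification is roughly sqrt(number of predictors)
```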
How do we decide which stopping rule to use for trees?
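For reference, a sketch of the main stopping and pruning controls in `rpart` (the specific values below are arbitrary): `minsplit` and `minbucket` stop splitting small nodes, `maxdepth` caps depth, and `cp` requires each split to improve the fit by a minimum amount, which is what cost-complexity pruning tunes.

```r
library(rpart)

# Common stopping rules, set through rpart.control (values are illustrative only):
fit <- rpart(
  Species ~ ., data = iris,
  control = rpart.control(
    minsplit  = 20,    # don't attempt a split in nodes with fewer than 20 observations
    minbucket = 7,     # every leaf must contain at least 7 observations
    maxdepth  = 4,     # cap the depth of the tree
    cp        = 0.01   # a split must improve the overall fit by at least cp
  )
)

# The cross-validated cp table is the usual basis for choosing how far to prune
# (cost-complexity pruning) rather than hand-picking a single stopping rule.
printcp(fit)
```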
Can we go over the advantages and disadvantages of the different tree algorithms, like we did in class with QDA, LDA, KNN, logistic regression, and RDA?