When we split the data in a tree on a variable, how is the threshold specified? Does the algorithm determine it internally?
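A minimal sketch of the idea, assuming a greedy CART-style search: for each candidate cut point of a numeric predictor, the algorithm computes the impurity of the resulting split and keeps the threshold with the lowest weighted impurity, so the threshold is determined internally from the data rather than supplied by the user. The helper names below (`gini`, `best_threshold`) are illustrative, not from any package.

```r
# Greedy search for the best split threshold on one numeric predictor (CART-style).
# Real implementations (e.g., rpart) repeat this internally for every predictor at every node.

gini <- function(y) {
  p <- table(y) / length(y)
  1 - sum(p^2)
}

best_threshold <- function(x, y) {
  # Candidate cut points: midpoints between consecutive sorted unique values of x
  xs   <- sort(unique(x))
  cuts <- (head(xs, -1) + tail(xs, -1)) / 2
  impurity <- sapply(cuts, function(ct) {
    left  <- y[x <= ct]
    right <- y[x >  ct]
    (length(left) * gini(left) + length(right) * gini(right)) / length(y)
  })
  cuts[which.min(impurity)]
}

# Example: this should land close to the cut rpart chooses for its first split on iris (~2.45)
best_threshold(iris$Petal.Length, iris$Species)
```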
Can we discuss further how tree models handle nonlinear effects and interactions natively?
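As a small illustration (not from the course materials): a single tree approximates a nonlinear curve with a piecewise-constant step function and picks up an interaction through nested splits, with no polynomial or interaction terms specified by hand. The simulated data and the `rpart` package are assumptions here.

```r
library(rpart)

set.seed(1)
n  <- 500
x1 <- runif(n, -3, 3)
x2 <- runif(n, -3, 3)

# Nonlinear effect: y depends on sin(x1); interaction: a bonus only when x1 > 0 AND x2 > 0
y <- sin(x1) + 2 * (x1 > 0 & x2 > 0) + rnorm(n, sd = 0.3)

fit <- rpart(y ~ x1 + x2, data = data.frame(y, x1, x2))

# The printed tree shows nested splits on x1 and x2: the interaction is captured
# automatically because a split on x2 can occur inside a branch already defined by x1.
print(fit)
```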
What determines the tree depth that we set in decision trees and random forests?
How do trees handle missing data? Do we ever want to impute missing data in our recipes?
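Implementations differ: `rpart` can route missing values with surrogate splits, while engines such as `ranger` generally require complete predictors, so an imputation step in a recipe can still matter. A minimal tidymodels sketch, assuming made-up data (`outcome`, `age`, `income`) with some missing values and that the `ranger` package is installed:

```r
library(tidymodels)

# Toy data with missing values in a numeric predictor (illustrative column names)
set.seed(1)
dat <- tibble(
  outcome = factor(sample(c("yes", "no"), 200, replace = TRUE)),
  age     = ifelse(runif(200) < 0.1, NA_real_, rnorm(200, 40, 10)),
  income  = rnorm(200, 5e4, 1e4)
)

# rpart can handle NAs via surrogate splits, but many engines (e.g., ranger) cannot,
# so an imputation step keeps the workflow engine-agnostic.
rec <- recipe(outcome ~ ., data = dat) %>%
  step_impute_median(all_numeric_predictors())

rf_spec <- rand_forest(trees = 500) %>%
  set_engine("ranger") %>%
  set_mode("classification")

wf <- workflow() %>%
  add_recipe(rec) %>%
  add_model(rf_spec)

fit(wf, data = dat)
```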
In what situations might additional feature engineering (e.g., alternative handling of missing values or categorical aggregation) still improve decision tree performance?
Can you give more examples of how to interpret the decision tree graph?
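If it helps, a quick sketch using the `rpart.plot` package (assumed installed): with the default settings, each node box typically shows the predicted class, the fitted class probabilities, and the share of training observations reaching that node, and you read the graph by following the split conditions down from the root.

```r
library(rpart)
library(rpart.plot)   # assumed installed

fit <- rpart(Species ~ ., data = iris)

# Reading the graph: start at the root, go left when the split condition is TRUE ("yes")
# and right when it is FALSE ("no"), until you reach a leaf. Each node box typically
# shows the predicted class, the class probabilities, and the % of observations in the node.
rpart.plot(fit)
```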
A further explanation of what a base learner is.
When and why does bagging improve model performance?
Can you talk more about bagging and how it utilizes bootstrapping techniques?
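A minimal from-scratch sketch of the idea (the loop and object names are mine, not from any package): each tree is fit to a bootstrap resample of the training data, and the trees' predictions are combined by majority vote.

```r
library(rpart)

set.seed(42)
idx   <- sample(nrow(iris), 100)
train <- iris[idx, ]
test  <- iris[-idx, ]

B <- 25  # number of bagged trees (illustrative)

# Bagging by hand: fit one tree per bootstrap resample of the training data
bag <- lapply(seq_len(B), function(b) {
  boot <- train[sample(nrow(train), replace = TRUE), ]  # bootstrap sample (with replacement)
  rpart(Species ~ ., data = boot)
})

# Aggregate: majority vote across the B trees' predictions
votes <- sapply(bag, function(tree) as.character(predict(tree, test, type = "class")))
pred  <- apply(votes, 1, function(v) names(which.max(table(v))))

mean(pred == test$Species)  # bagged accuracy on the held-out rows
```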
Can you explain more about why we should de-correlate the bagged trees? Do we not need to de-correlate if we are not using bagging?
Besides using the same resampling technique (bootstrapping), how is the inner-loop/outer-loop structure in bagging and random forests distinct from nested CV?
It's not similar.
Out-of-bag error estimation: I did not really understand the logic behind this.
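The logic, briefly: each bootstrap sample leaves out roughly one third of the training rows, so every row can be predicted by the trees that never saw it, and aggregating those predictions gives a nearly free validation-style error estimate. A minimal sketch with the `randomForest` package (assumed installed):

```r
library(randomForest)

set.seed(42)
rf <- randomForest(Species ~ ., data = iris, ntree = 500)

# Each observation is predicted only by the ~37% of trees whose bootstrap sample
# did not contain it; aggregating those predictions gives the OOB error estimate.
rf$err.rate[500, "OOB"]   # OOB error after all 500 trees
print(rf)                 # also reports the "OOB estimate of error rate"
```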
Understanding mtry
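For context, mtry is the number of predictors randomly offered as split candidates at each node; smaller values de-correlate the trees more, and using all predictors reduces a random forest to plain bagging. A rough sketch comparing OOB error across a few mtry values with `randomForest` (the values are chosen just for illustration):

```r
library(randomForest)

set.seed(42)

# mtry = number of predictors sampled as split candidates at each node.
# Smaller mtry -> more de-correlated trees; mtry = all predictors -> plain bagging.
oob_error <- sapply(1:4, function(m) {
  rf <- randomForest(Species ~ ., data = iris, mtry = m, ntree = 500)
  rf$err.rate[500, "OOB"]
})

names(oob_error) <- paste0("mtry=", 1:4)
oob_error  # the default for classification is roughly sqrt(number of predictors)
```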
How do we decide which stopping rule to use for trees?
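For reference, a sketch of the main stopping and pruning controls in `rpart` (the specific values below are arbitrary): `minsplit` and `minbucket` stop splitting small nodes, `maxdepth` caps depth, and `cp` requires each split to improve the fit by a minimum amount, which is what cost-complexity pruning tunes.

```r
library(rpart)

# Common stopping rules, set through rpart.control (values are illustrative only):
fit <- rpart(
  Species ~ ., data = iris,
  control = rpart.control(
    minsplit  = 20,    # don't attempt a split in nodes with fewer than 20 observations
    minbucket = 7,     # every leaf must contain at least 7 observations
    maxdepth  = 4,     # cap the depth of the tree
    cp        = 0.01   # a split must improve the overall fit by at least cp
  )
)

# The cross-validated cp table is the usual basis for choosing how far to prune
# (cost-complexity pruning) rather than hand-picking a single stopping rule.
printcp(fit)
```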
Can we go over the advantages and disadvantages of the different tree algorithms, like we did in class with QDA, LDA, KNN, logistic regression, and RDA?