IAML Unit 8: Discussion

Announcements

General

  • Can we go over whether/how post hoc analyses differ in this class from 610? It seems like we do more with them in this class than we did before.

Confusion matrix

                 Ground Truth
Prediction       Positive    Negative
Positive         TP          FP
Negative         FN          TN

Definitions:

  • TP: True positive

  • TN: True negative

  • FP: False positive (Type 1 error/false alarm)

  • FN: False negative (Type 2 error/miss)

  • More generally

    • Define accuracy, sens, spec, ppv, npv, bal_accuracy, and f1 (formulas below)
    • What are the costs of a FP vs. a FN?
    • How do we choose between metrics? (see the worked comparison after the example matrices below)
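For reference, the standard definitions in terms of the confusion matrix cells above:

$$
\begin{aligned}
\text{accuracy} &= \frac{TP + TN}{TP + FP + FN + TN} \\
\text{sens (recall, TPR)} &= \frac{TP}{TP + FN} \\
\text{spec (TNR)} &= \frac{TN}{TN + FP} \\
\text{ppv (precision)} &= \frac{TP}{TP + FP} \\
\text{npv} &= \frac{TN}{TN + FN} \\
\text{bal\_accuracy} &= \frac{\text{sens} + \text{spec}}{2} \\
\text{f1} &= \frac{2 \cdot \text{ppv} \cdot \text{sens}}{\text{ppv} + \text{sens}}
\end{aligned}
$$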
Example 1:

                 Ground Truth
Prediction       Positive    Negative
Positive         95          5
Negative         5           95

Example 2:

                 Ground Truth
Prediction       Positive    Negative
Positive         95          50
Negative         5           950
  • Predicting lapses
  • COVID testing early vs. late in the pandemic
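A minimal sketch comparing the two example matrices above with yardstick (metrics_from_counts is an illustrative helper, not a package function). Sensitivity and specificity are identical across the two examples, but ppv collapses under class imbalance: 95/(95 + 50) ≈ .66 in Example 2 vs. .95 in Example 1.

library(yardstick)

# Illustrative helper: rebuild truth/prediction vectors from cell counts
# ("pos" is the event, i.e., the first factor level)
metrics_from_counts <- function(tp, fp, fn, tn) {
  truth <- factor(c(rep("pos", tp), rep("neg", fp),
                    rep("pos", fn), rep("neg", tn)),
                  levels = c("pos", "neg"))
  pred  <- factor(c(rep("pos", tp + fp), rep("neg", fn + tn)),
                  levels = c("pos", "neg"))
  tibble::tibble(sens = sens_vec(truth, pred),
                 spec = spec_vec(truth, pred),
                 ppv  = ppv_vec(truth, pred),
                 bal_accuracy = bal_accuracy_vec(truth, pred),
                 f1   = f_meas_vec(truth, pred))
}

metrics_from_counts(tp = 95, fp = 5,  fn = 5, tn = 95)   # Example 1
metrics_from_counts(tp = 95, fp = 50, fn = 5, tn = 950)  # Example 2: ppv drops to ~.66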

Resampling

  • Options? (a sketch of common ones follows this list)
  • Similarities and differences, relative comparisons?
  • We resample held-in data but not held-out data. Why?
  • Why does it work?
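A minimal sketch of common resampling options using rsample (from tidymodels; the dataset here is just a stand-in). Note that every resample is drawn from the held-in data only; the held-out set stays untouched until final evaluation.

library(rsample)

set.seed(123)
d <- mtcars  # stand-in dataset for illustration

# Held-in/held-out split; all resampling happens inside held-in
splits   <- initial_split(d, prop = 3/4)
held_in  <- training(splits)
held_out <- testing(splits)  # untouched until final evaluation

# Common resampling options, applied to held-in data only
vfold_cv(held_in, v = 10)                # k-fold cross-validation
vfold_cv(held_in, v = 10, repeats = 3)   # repeated k-fold
bootstraps(held_in, times = 100)         # bootstrap resampling
mc_cv(held_in, prop = 3/4, times = 100)  # Monte Carlo cross-validation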

ROC

  • Axes and various names/representations
library(tidyverse)
theme_set(theme_classic())
devtools::source_url("https://github.com/jjcurtin/lab_support/blob/main/format_path.R?raw=true")

path_models <- "~/mnt/private/studydata/risk/models/ema" # format_path("studydata/risk/models/ema", "restricted")

# Read held-out predictions and compute the ROC curve points
preds_hour <- read_rds(file.path(path_models,
                                 "outer_preds_1hour_0_v5_nested_main.rds"))
roc_hour <- preds_hour |>
  yardstick::roc_curve(truth = label, prob_beta)
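roc_curve() returns a tibble with one row per candidate threshold (columns .threshold, specificity, and sensitivity); all of the plot options below are different ways of drawing this same object:

glimpse(roc_hour)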

Option 1 - Sensitivity vs. 1 - Specificity

  • More common terms
  • 1 - Specificity is confusing!
roc_hour |>
  ggplot(aes(x = 1 - specificity, y = sensitivity)) +
  geom_path(linewidth = 1.25) +
  geom_abline(lty = 3) +  # diagonal = chance performance
  coord_fixed(xlim = c(0, 1), ylim = c(0, 1)) +
  labs(x = "(1 - Specificity)",
       y = "Sensitivity") +
  scale_x_continuous(breaks = seq(0, 1, 0.25))

Option 2 - Sensitivity vs. Specificity (reversed axis)

  • More common terms
  • Sensible (though reversed) axis
  • My preferred format
roc_hour |>
  ggplot(aes(x = 1 - specificity, y = sensitivity)) +
  geom_path(linewidth = 1.25) +
  geom_abline(lty = 3) +
  coord_fixed(xlim = c(0, 1), ylim = c(0, 1)) +
  labs(x = "Specificity",
       y = "Sensitivity") +
  # relabel the x axis so it reads as specificity (1.00 at left)
  scale_x_continuous(breaks = seq(0, 1, 0.25),
                     labels = sprintf("%.2f", seq(1, 0, -0.25)))

Option 3 - TPR vs. FPR

  • Sensible axes
  • Less common terms
  • But I'm coming to prefer this format (depending on the audience)
roc_hour |>
  ggplot(aes(x = 1 - specificity, y = sensitivity)) +
  geom_path(linewidth = 1.25) +
  geom_abline(lty = 3) +
  coord_fixed(xlim = c(0, 1), ylim = c(0, 1)) +
  labs(x = "False Positive Rate (FPR)",
       y = "True Positive Rate (TPR)") +
  scale_x_continuous(breaks = seq(0, 1, 0.25))
  • Why is the top-left corner (sens = 1, spec = 1) perfect performance?
  • Thresholds
  • Interpretation of auROC and the diagonal line (see the sketch after this list)
  • Can we go over some more concrete (real-world) examples of when it would be a good idea to use a different threshold for classification?
  • Are there any techniques to optimize decision thresholds, or is it trial and error? (one common approach is sketched below)
  • How does auROC perform in highly imbalanced datasets?
  • Sensitivity vs. specificity trade-offs: is it always true that an increase in one sacrifices the other?
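Two quick follow-ups in code. First, roc_auc() summarizes the whole curve, and 0.5 corresponds to the diagonal (chance) line. Second, one common alternative to trial and error for picking a threshold (an illustration, not the only approach) is to maximize Youden's J = sensitivity + specificity - 1 over the roc_curve() output:

# auROC; 0.5 corresponds to the diagonal (chance) line
preds_hour |>
  yardstick::roc_auc(truth = label, prob_beta)

# Threshold that maximizes Youden's J = sensitivity + specificity - 1
roc_hour |>
  mutate(j = sensitivity + specificity - 1) |>
  slice_max(j, n = 1)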
roc_hour |>
  ggplot(aes(x = 1 - specificity, y = sensitivity)) +
  geom_path(linewidth = 1.25) +
  geom_abline(lty = 3) +
  coord_fixed(xlim = c(0, 1), ylim = c(0, 1)) +
  labs(x = "Specificity",
       y = "Sensitivity") +
  scale_x_continuous(breaks = seq(0, 1, 0.25),
                     labels = sprintf("%.2f", seq(1, 0, -0.25)))
  • “Means” of sensitivity/specificity vs. precision (ppv)/recall (sensitivity)
  • Balanced accuracy is the arithmetic mean of sensitivity and specificity
  • F1 is the harmonic mean of precision and recall

Harmonic mean example: if a vehicle travels 60 km outbound at a speed of 60 km/h and returns the same distance at a speed of 20 km/h, then its average speed is the harmonic mean of 60 and 20 (30 km/h), not the arithmetic mean (40 km/h).

The harmonic mean of

  • .5 and .5 is .5
  • .75 and .75 is .75
  • .25 and .75 is .375
  • 0 and 1 is 0
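A quick check in R (h_mean is just an illustrative helper). The harmonic mean is pulled toward the smaller value, which is why f1 punishes a model that is strong on precision but weak on recall (or vice versa):

# Harmonic mean of two values: 2xy / (x + y)
h_mean <- function(x, y) 2 * x * y / (x + y)

h_mean(60, 20)      # 30: the travel-speed example
h_mean(0.25, 0.75)  # 0.375: pulled toward the smaller value
h_mean(0, 1)        # 0: one zero component zeroes the whole mean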

Kappa

  • Expected accuracy
    • given the majority class proportion
    • by chance, given the base rates
  • Kappa is the ratio of improvement above expected accuracy to the maximum possible improvement (formula below)
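Written out (the standard Cohen's kappa form), with acc_obs the observed accuracy and acc_exp the accuracy expected by chance from the base rates:

$$
\kappa = \frac{\text{acc}_{\text{obs}} - \text{acc}_{\text{exp}}}{1 - \text{acc}_{\text{exp}}}
$$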