Unit 3: Inferences About a Single Mean (1 Parameter Models)
Units 3-4 Organization
First, consider details of simplest model (one parameter estimate; mean-only model; no \(X\)s)
Next, examine simple (bivariate) regression (two parameter estimates; one \(X\) for one quantitative predictor)
These provide a critical foundation for all linear models
Subsequent units will generalize to one dichotomous variable (Unit 5), multiple predictors (Units 6-7), and beyond…
Linear Models as Models
Linear models (including regression) are models.
\(DATA = MODEL + ERROR\)
Three general uses for models:
Describe and summarize DATA (\(Y\)s) in a simpler form using MODEL.
Predict DATA (\(Y\)s) from MODEL.
Understand (test inferences about) complex relationships between individual regressors (\(X\)s) in MODEL and the DATA (\(Y\)s). How precise are estimates of relationship?
\(DATA = MODEL + ERROR\)
MODELS are simplifications of reality
As such, there is ERROR
They also make assumptions that must be evaluated
Fear Potential Startle
We were interested in producing anxiety in the laboratory
To do this, we developed a procedure where we expose people to periods of unpredictable electric shock administration alternating with periods of safety
We measure their startle response in the shock and safe periods
We use the difference between startle during the shock periods and the safe periods (shock – safe) to determine if they are anxious
This difference is called fear-potentiated startle (FPS). Our procedure works if FPS > 0, so we need a model of FPS scores to determine if FPS > 0
Fear Potentiated Startle: One parameter model
A very simple model for the population of FPS scores would predict the same value (\(\beta_0\)) for everyone in the population.
\(\hat{Y}_i=\beta_0\)
We would like this value to be the best prediction.
Question
In the context of DATA = MODEL + ERROR, how can we quantify best?
We want to predict some characteristic about the population of FPS scores that minimizes the ERROR from our model.
ERROR = DATA – MODEL
\(\varepsilon_i=Y_i-\hat{Y}_i\)
There is an error (\(\varepsilon_i\)) for each population score.
Total Error
Question
How might we quantify total model error?
1. Sum of errors across all scores in the population isn’t ideal because positive and negative errors will tend to cancel each other out. We won’t use this.
\(\sum(Y_i-\hat{Y}_i)\)
2. Sum of absolute values of errors could work. If we selected \(\beta_0\) to minimize the sum of the absolute value of errors, \(\beta_0\) would equal the median of the population. We could use this, but there is a better option.
\(\sum(|Y_i-\hat{Y}_i|)\)
3. Sum of squared errors (SSE) could work. If we selected \(\beta_0\) to minimize the sum of squared errors, \(\beta_0\) would equal the mean of the population. This is the best option and what we will use.
\(\sum(Y_i-\hat{Y}_i)^2\)
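The claims above can be checked numerically. This is a minimal sketch using a small set of hypothetical scores (not the FPS data): a grid search shows that the sum of absolute errors is minimized when \(b_0\) equals the median, while the sum of squared errors is minimized when \(b_0\) equals the mean.

```r
# Toy illustration with hypothetical scores (not the FPS data):
# the median minimizes the sum of absolute errors, and the
# mean minimizes the sum of squared errors.
y <- c(2, 3, 7, 10, 40)

sae <- function(b0) sum(abs(y - b0))  # sum of absolute errors
sse <- function(b0) sum((y - b0)^2)   # sum of squared errors

# Evaluate each criterion over a grid of candidate values for b0
b0_grid <- seq(0, 50, by = 0.1)
b0_min_sae <- b0_grid[which.min(sapply(b0_grid, sae))]
b0_min_sse <- b0_grid[which.min(sapply(b0_grid, sse))]

b0_min_sae  # equals median(y)
b0_min_sse  # equals mean(y)
```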
One Parameter Model for FPS
For the moment, let's just accept that we prefer to minimize SSE (more on that in a moment). Given this criterion, you should predict the population mean FPS for everyone.
\(\hat{Y}_i=\beta_0\) … where \(\beta_0=\mu\)
Question
What is the problem with this model?
We don’t know the population mean for FPS scores (\(\mu\)).
Question
How could we solve this problem to develop a model?
We can collect a sample from the population and use the sample mean (\(\overline{X}\)) as an estimate of the population mean (\(\mu\)).
\(\overline{X}\) is an unbiased estimate for \(\mu\).
Model Parameter Estimation
DATA = MODEL + ERROR
Population model
\(Y_i=\beta_0+\varepsilon_i\)
\(\hat{Y}_i=\beta_0\) … where \(\beta_0=\mu\)
Estimate population parameters from sample
\(Y_i=b_0+e_i\)
\(\hat{Y}_i=b_0\) … where \(b_0=\overline{X}\)
Least Squares Criterion
In ordinary least squares (OLS) regression and other least squares linear models, the model parameter estimates (e.g., \(b_0\)) are calculated such that they minimize the sum of squared errors (SSE) in the sample in which you estimate the model.
\(\text{SSE}=\sum(Y_i-\hat{Y}_i)^2\)
\(\text{SSE}=\sum e_i^2\)
Properties of Parameter Estimates
There are three properties that make a parameter estimate attractive.
Unbiased: Mean of the sampling distribution for the parameter estimate is equal to the value for that parameter in the population.
Efficient: The sample estimates are close to the population parameter. In other words, the narrower the sampling distribution for any specific sample size \(N\), the more efficient the estimator. Efficient means small SE for parameter estimate.
Consistent: As the sample size increases, the sampling distribution becomes narrower (more efficient). Consistent means as \(N\) increases, SE for parameter estimate decreases
Least Squares Criterion (Continued)
If the \(\varepsilon_i\) are normally distributed, both the median and the mean are unbiased and consistent estimators.
The variance of the sampling distribution for the mean is:
\(\frac{\sigma^2}{N}\)
The variance of the sampling distribution for the median is:
\(\frac{\pi\sigma^2}{2N}\)
Therefore, the mean is the more efficient estimator.
For this reason, we tend to prefer to estimate our models by minimizing the sum of squared errors.
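The efficiency difference can be seen in a quick simulation. This is a sketch, not part of the course materials: it draws many samples from a standard normal population and compares the variance of the sample means to the variance of the sample medians; the ratio should come out near \(\pi/2 \approx 1.57\).

```r
# Simulation sketch: for normally distributed scores, the sampling
# distribution of the median has larger variance than that of the
# mean by a factor of about pi/2.
set.seed(1)
n_sims <- 10000
N <- 100

means   <- replicate(n_sims, mean(rnorm(N)))
medians <- replicate(n_sims, median(rnorm(N)))

var(means)                 # close to sigma^2 / N = 0.01
var(medians)               # close to pi * sigma^2 / (2 * N)
var(medians) / var(means)  # close to pi / 2
```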
We will frequently use the tidy() function from the broom package, so I am loading that package here. The alternative is to not load the package and instead call broom::tidy() each time we want to use the function. I prefer to load the package given that it's part of the extended tidyverse.
My preferred theme for ggplots. Set it here to affect all plots in this document.
data <- read_csv(here::here(path_data, "03_single_mean_fps.csv"), show_col_types = FALSE) |>
  glimpse()
Here we are fitting a linear model with FPS regressed on the intercept. In other words, we are fitting a model that predicts FPS using only the sample mean.
We can get the predicted value for each individual in the sample using this model with the function predict().
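The fit described above can be sketched in base R. Since the course data file isn't available here, this example simulates a stand-in `data` frame (the mean of 32 and SD of 37 are rough assumptions, not the real FPS summary statistics); the document itself uses `tidy()` from broom to inspect the fit, while this sketch uses base `coef()` and `predict()`.

```r
# Sketch of fitting the intercept-only (mean-only) model.
# The document reads the FPS data from a csv; here we simulate a
# stand-in data frame so the example is self-contained.
set.seed(123)
data <- data.frame(fps = rnorm(96, mean = 32, sd = 37))

m <- lm(fps ~ 1, data = data)  # regress fps on the intercept only

coef(m)[["(Intercept)"]]       # b0 equals mean(data$fps)
unique(predict(m))             # every person gets the same prediction
```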
Describe the logic of how the standard error, t statistic and p-value were determined given your understanding of sampling distributions.
1. Establish null and alternative hypotheses.
\(H_0: \beta_0 = 0; H_a: \beta_0 \neq 0\)
2. If \(H_0\) is true, the sampling distribution for \(b_0\) will have a mean of 0. We can estimate standard deviation of the sampling distribution with SE for \(b_0\).
3. \(b_0\) is approximately 8.4 standard errors above the expected mean of the sampling distribution if \(H_0\) is true.
4. We can use pt() to calculate the exact \(p\) value for this parameter estimate given \(H_0\).
pt(8.40, 95, lower.tail = FALSE) * 2
[1] 0.0000000000004293253
The probability of obtaining a sample \(b_0\) = 32.2 (or more extreme) if \(H_0\) is true is very low (< .05). Therefore we reject \(H_0\) and conclude that \(\beta_0 \neq 0\) and \(b_0\) is our best (unbiased) estimate of it.
Of course, we don’t need to calculate the p-value with pt() because we got it directly from tidy().
tibble(b0 = seq(-40, 40, .01),
       probability = dt(b0 / tidy(m)$std.error, m$df.residual)) |>
  ggplot(aes(x = b0, y = probability)) +
  geom_line() +
  geom_vline(xintercept = tidy(m)$estimate, color = "red") +
  labs(title = "Sampling Distribution for b0")
Statistical Inference and Model Comparisons
Statistical inference about parameters is fundamentally about model comparisons.
You are implicitly (t-test of parameter estimate) or explicitly (F-test of model comparison) comparing two different models of your data.
We follow Judd et al. and call these two models the compact model and the augmented model.
The compact model will represent reality as the null hypothesis predicts. The augmented model will represent reality as the alternative hypothesis predicts.
The compact model is simpler (fewer parameters) than the augmented model. It is also nested in the augmented model (i.e., its parameters are a subset of the augmented model's parameters).
Model Comparisons: Testing Inferences about \(\beta_0\)
\(\hat{\text{FPS}_i}=\beta_0\)
\(H_0: \beta_0 = 0\)
\(H_a: \beta_0 \neq 0\)
Compact model:
\(\hat{\text{FPS}_i}=0\)
We estimate 0 parameters (\(P=0\)) in this compact model.
Augmented model:
\(\hat{\text{FPS}_i}=\beta_0 (\approx b_0)\)
We estimate 1 parameter (\(P=1\)) in this augmented model.
Choosing between these two models is equivalent to testing if \(\beta_0 = 0\) as you did with the t-test.
Model Comparison Plots
data |>
  ggplot(aes(x = "", y = fps)) +
  geom_jitter(width = 0.1, alpha = .6, size = 2) +
  theme(axis.line.x = element_blank(),
        axis.ticks.x = element_blank()) +
  xlab(NULL) +
  geom_hline(yintercept = 0, color = "red", linewidth = 1) +
  geom_hline(yintercept = tidy(m)$estimate, color = "blue", linewidth = 1)
Model Comparisons: Testing Inferences about \(\beta_0\) (Continued)
Compact model:
\(\hat{\text{FPS}_i}=0\)
\(\text{SSE}_c = \sum(Y_i-0)^2\)
Augmented model:
\(\hat{\text{FPS}_i}=\beta_0 (\approx b_0)\)
\(\text{SSE}_a = \sum(Y_i-\hat{Y}_i)^2\)
We can compare (and choose between) these two models by comparing their total error (SSE) in our sample.
For any model: \(\text{SSE} = \sum(Y_i-\hat{Y}_i)^2\)
We can calculate for the SSE for the compact model (\(\text{SSE}_c = \sum(Y_i-0)^2\)) directly:
sum((data$fps - 0)^2)
[1] 233368.3
We can calculate the SSE for the augmented model (\(\text{SSE}_a = \sum(Y_i- 32.2 )^2\)) directly:
sum((data$fps - tidy(m)$estimate)^2)
[1] 133888.3
We can also get the SSE for the augmented model by summing the squared errors/residuals of the fitted model.
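A sketch of that residual-based computation, using simulated stand-in data (the document's value of 133888.3 comes from the real FPS sample): summing the squared residuals of the fitted model matches the direct SSE calculation.

```r
# Sketch: SSE for the augmented model via the model's residuals.
# Simulated stand-in data, not the real FPS scores.
set.seed(123)
data <- data.frame(fps = rnorm(96, mean = 32, sd = 37))
m <- lm(fps ~ 1, data = data)

sse_a <- sum(residuals(m)^2)                      # from the fitted model
sse_a_direct <- sum((data$fps - mean(data$fps))^2) # direct calculation
all.equal(sse_a, sse_a_direct)                    # the two agree
```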
Our more complex model that includes \(\beta_0\) reduces prediction error (SSE) by approximately 43%. Not bad!
Note
We won’t introduce eta squared/\(\Delta R^2\) until we have two parameters in the model.
Confidence Interval for \(b_0\)
A confidence interval (CI) is an interval for a parameter estimate in which you can be fairly confident that you will capture the true population parameter (in this case, \(\beta_0\)).
Most commonly reported is the 95% CI.
Across repeated samples, 95% of the calculated CIs will include the population parameter.
Use the confint() function to calculate confidence intervals. The default is to provide 95% CIs, but you can change this using the level parameter if you wish.
2.5 % 97.5 %
(Intercept) 24.58426 39.79742
Question
Given what you now know about confidence intervals and sampling distributions, what should the formula be?
\(\text{CI}(b_0) = b_0 \pm t_{.975;\,N-P}\,\text{SE}_{b_0}\). For the 95% confidence interval, this is approximately 2 SEs around our unbiased estimate of \(\beta_0\).
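That hand calculation can be sketched in R. Again using simulated stand-in data (not the real FPS sample), the interval built from the exact t critical value matches confint().

```r
# Sketch: 95% CI by hand, using the t critical value (about 2 for
# moderate-to-large N). Simulated stand-in data.
set.seed(123)
data <- data.frame(fps = rnorm(96, mean = 32, sd = 37))
m <- lm(fps ~ 1, data = data)

b0 <- coef(m)[["(Intercept)"]]
se <- summary(m)$coefficients[1, "Std. Error"]
t_crit <- qt(.975, df = m$df.residual)

ci_manual <- c(b0 - t_crit * se, b0 + t_crit * se)
ci_manual  # matches confint(m)
```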
Question
How can we tell if a parameter is significant using only the confidence interval?
If a parameter estimate is significantly different from 0 at \(\alpha = .05\), then the 95% confidence interval for that estimate will not include 0.
The same logic applies to testing whether the population parameter equals any other non-zero value: the test is significant exactly when that value falls outside the confidence interval.
The one parameter (mean-only) model: Special Case
Question
What special case (specific analytic test) is statistically equivalent to the test of the null hypothesis: \(\beta_0\) = 0 in the one parameter model?
The one sample t-test testing if a population mean = 0.
t.test(data$fps)
One Sample t-test
data: data$fps
t = 8.4015, df = 95, p-value = 0.0000000000004261
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
24.58426 39.79742
sample estimates:
mean of x
32.19084
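The equivalence claimed above is easy to verify. This sketch (simulated stand-in data, not the real FPS sample) shows that t.test() and the intercept-only lm() produce the same t statistic and p-value.

```r
# Sketch: the one-sample t-test and the intercept-only linear model
# give identical t statistics and p-values. Simulated stand-in data.
set.seed(123)
data <- data.frame(fps = rnorm(96, mean = 32, sd = 37))

tt <- t.test(data$fps)
m  <- lm(fps ~ 1, data = data)
sm <- summary(m)$coefficients

c(tt$statistic, sm[1, "t value"])   # same t statistic
c(tt$p.value,   sm[1, "Pr(>|t|)"])  # same p-value
```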
Testing \(\beta_0\) = non-zero values
Question
How could you test an \(H_0\) regarding \(\beta_0\) = some value other than 0 (e.g., 10)? HINT: There are at least three methods.
Option 1: Compare SSE for the augmented model (\(\hat{Y}_i= \beta_0\)) to SSE from a different compact model for this new \(H_0\) (\(\hat{Y}_i= 10\)).
compact model: \(\hat{\text{FPS}_i}=10\)
augmented model: \(\hat{\text{FPS}_i}=\beta_0\)
Option 2: Recalculate t-statistic using this new \(H_0\).
\(t= \frac{b_0 - 10}{\text{SE}_{b_0}}\)
Option 3: Does the confidence interval for the parameter estimate contain this other value? No p-value provided.
confint(m)
2.5 % 97.5 %
(Intercept) 24.58426 39.79742
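Option 2 can be sketched directly. Using simulated stand-in data (not the real FPS sample), we recenter the t statistic at 10; as a check, an equivalent shortcut is to shift the outcome by 10 and test that model's intercept against 0.

```r
# Sketch of Option 2: test H0: beta0 = 10 by recentering the
# t statistic. Simulated stand-in data.
set.seed(123)
data <- data.frame(fps = rnorm(96, mean = 32, sd = 37))
m <- lm(fps ~ 1, data = data)

b0 <- coef(m)[["(Intercept)"]]
se <- summary(m)$coefficients[1, "Std. Error"]

t_10 <- (b0 - 10) / se
p_10 <- 2 * pt(abs(t_10), df = m$df.residual, lower.tail = FALSE)

# Equivalent shortcut: shift the outcome and test the intercept vs 0
m_10 <- lm(I(fps - 10) ~ 1, data = data)
```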
Intermission…
One parameter (\(\beta_0\)) mean-only model
Description: \(b_0\) describes the mean of \(Y\).
Prediction: \(b_0\) is the predicted value that minimizes sample SSE.
Inference: Use \(b_0\) to test if \(\beta_0 = 0\) (default) or any other value. Equivalent to the one-sample t-test.
Two parameter (\(\beta_0, \beta_1\)) model
Description: \(b_1\) describes how \(Y\) changes as a function of \(X_1\). \(b_0\) describes the expected value of \(Y\) at a specific value (0) of \(X_1\).
Prediction: \(b_0\) and \(b_1\) yield predicted values that vary by \(X_1\) and minimize SSE in the sample.
Inference: Test if \(\beta_1 = 0\) (equivalent to Pearson's \(r\) or the independent-samples t-test). Test if \(\beta_0 = 0\) (analogous to a one-sample t-test controlling for \(X_1\), if \(X_1\) is mean-centered). Very flexible!