Unit 12: Inferences about Two Dichotomous Predictors and their Interaction

Learning Objectives

For models with dichotomous independent variables, you will learn:

Basic terminology from ANOVA framework.
How to identify main effects, simple effects, and interactions in table of means and figures.
How to link coefficients from interactive models with each coding system to table of means and figures (both directions).
How to calculate simple effects.
How to write up and display results.

An Example

Example:
Attitudes about abortion in US as a function of sex and religion (based loosely on Pew Research Center data but simplified group means to easier numbers!)

Sex:

Male vs. Female

Religion:

Catholic vs. Jewish

Attitude:

1–20 with higher score indicating more permissive attitudes

Equal \(n = 20\) in each cell (\(N = 80\)).

Open data and set up categorical variables as factors

data <- read_csv(here::here("data_lecture/12_interactions_abortion.csv"),
                 show_col_types = FALSE) |> 
  mutate(sex = factor(sex, levels = c("male", "female")),
         rel = factor(rel, levels = c("Catholic", "Jewish")))

Make and display table of data using kableExtra

Code

ex_tbl <- data |> 
  group_by(sex, rel) |> 
  summarise(mean = mean(att), .groups = "drop") |> 
  pivot_wider(names_from = sex, values_from = mean) |> 
  mutate(all = (male + female)/2) |> 
  add_row(rel = "all", male = 13, female = 16, all = 14.5) |> 
  rename(` ` = rel) |> 
  kableExtra::kable() |> 
  kableExtra::kable_classic()

ex_tbl

	male	female	all
Catholic	13	15	14.0
Jewish	13	17	15.0
all	13	16	14.5

Terms and Brief Definitions

	male	female	all
Catholic	13	15	14.0
Jewish	13	17	15.0
all	13	16	14.5

Common terminology: One-way ANOVA, Two-way ANOVA, Three-way ANOVA; 2 X 2 ANOVA; 2 X 3 X 2 ANOVA; Factorial ANOVA.

Cell mean: The mean of a group of participants at specific levels on each factor (e.g., Catholic men, Jewish women, etc.).

Marginal mean: The mean of cell means across a row or column.

Grand mean: The mean of all cell means.

Unweighted vs. weighted means:

Unweighted means are the average of the cell means.
Weighted means are the average of the cell means, but the cells are weighted by the number of observations in each cell.
In our fields, we generally want unweighted means because sampling is not representative

Terms and Brief Definitions

	male	female	all
Catholic	13	15	14.0
Jewish	13	17	15.0
all	13	16	14.5

Main effect: The average effect of an IV on the DV across the levels of another IV in the model (The effect of an IV at the average level of another IV). Evaluated with marginal means for IV.

Simple effect: The effect of an IV on the DV at a specific level of the other IV. Evaluated with cell means for focal IV at specific level of other IV (moderator).

Interaction: The simple effect of a (focal) IV on the DV differs across the levels of another (moderator) IV in the model.

Describing Effects

	male	female	all
Catholic	13	15	14.0
Jewish	13	17	15.0
all	13	16	14.5

Question

What is the magnitude of the main effect of sex on attitudes?

Women have 3 points more permissive attitudes about abortion than men.

Describing Effects

	male	female	all
Catholic	13	15	14.0
Jewish	13	17	15.0
all	13	16	14.5

Question

What is the magnitude of the main effect of religion on attitudes?

Jews have 1 point more permissive attitudes about abortion than Catholics.

	male	female	all
Catholic	13	15	14.0
Jewish	13	17	15.0
all	13	16	14.5

Question

What is the magnitude of the simple effects of sex on attitudes?

Among Catholics, women have 2 points more permissive attitudes about abortion than men.

Among Jews, women have 4 points more permissive attitudes about abortion than men.

Describing Effects

	male	female	all
Catholic	13	15	14.0
Jewish	13	17	15.0
all	13	16	14.5

Question

What is the magnitude of the simple effects of religion on attitudes?

Among men, there is no different in attitudes between Jews and Catholics.

Among women, Jews have 2 points more permissive attitudes about abortion than Catholics.

Describing Effects

	male	female	all
Catholic	13	15	14.0
Jewish	13	17	15.0
all	13	16	14.5

Question

Does there appear to be an interaction between sex and religion? Why or why not?

Yes, there does appear to be an interaction. The simple effects of sex appear to be different across Jews (4 points) than Catholics (2 points). Alternatively, the simple effects of religion appear to be different across women (2 points) and men (0 points).

Of course, you can’t tell if any of these (interaction, main effects, simple effects) are significant by looking at the data descriptively, you have to test the parameter estimates for each effect.

Describing Effects

	male	female	all
Catholic	13	15	14.0
Jewish	13	17	15.0
all	13	16	14.5

Question

How might you quantify the magnitude of the interaction?

The simple effect of sex increases by 2 points from Catholics (2) to Jews (4).

The simple effect of religion increases by 2 points from men (0) to women (2).

This will be the parameter estimate for the interaction!

You have learned that there are two common options for coding the regressors for categorical predictors: Dummy codes and Centered codes.

Centered

sex_c: male (-0.5), female (0.5)
rel_c: Catholic (-0.5), Jewish (0.5)

Dummy

sex_c: male (0), female (1)
rel_c: Catholic (0), Jewish (1)

The two systems yield the same result (except for \(b_0\)) for additive models.

Centered Codes for Additive Model

Centered

sex_c: male (-0.5), female (0.5)
rel_c: Catholic (-0.5), Jewish (0.5)

\(att = 14.5 + 3*sex_c + 1*rel_c\)

Exercise

Link parameter estimates \(b_0\), \(b_1\), & \(b_2\) to the figure below.

Centered

sex_c: male (-0.5), female (0.5)
rel_c: Catholic (-0.5), Jewish (0.5)

\(att = 14.5 + 3*sex_c + 1*rel_c\)

Exercise

Link parameter estimates \(b_0\), \(b_1\), & \(b_2\) to the table of means below.

	male	female	all
Catholic	13	15	14.0
Jewish	13	17	15.0
all	13	16	14.5

\(b_0\) is the predicted value for attitudes for 0 on both regressors. This is the grand mean.

\(b_1\) is the effect of sex. It will be forced to be constant across religions (6 – 3 = 3).

\(b_2\) is the effect of religion constant across sexes (5 – 4 = 1).

Centered

sex_c: male (-0.5), female (0.5)
rel_c: Catholic (-0.5), Jewish (0.5)

\(att = 14.5 + 3*sex_c + 1*rel_c\)

Dummy

sex_c: male (0), female (1)
rel_c: Catholic (0), Jewish (1)

	male	female	all
Catholic	13	15	14.0
Jewish	13	17	15.0
all	13	16	14.5

Exercise

Only the intercept would change in the linear model if we used dummy codes (above). What would the intercept be for the dummy coded model?

\(att = 13 + 3*sex_c + 1*rel_c\)

Question

What constraints were imposed in the additive model?

The effect of sex was constrained to be the same for both Catholics and Jews. 3 is the best parameter value because it is the average of the two simple effects of sex (2 vs. 4).

The effect of religion was constrained to be the same for both men and women. 1 is the best parameter value because it is the average of the two simple effects of religion (0 vs. 2).

Interactive Models

	male	female	all
Catholic	13	15	14.0
Jewish	13	17	15.0
all	13	16	14.5

Question

How do you relax this constraint and test the interaction between sex and religion?

Regress attitudes on sex, religion and their interaction (calculated as the product of sex * religion).

\(att = b_0 + b_1*sex + b_2*rel + b_3*sex*rel\)

The test of the \(b_3\) coefficient against zero is the test of the interaction (Alternatively: the comparison of this model to the compact model:

\(att = b_0 + b_1*sex + b_2*rel + 0*sex*rel\)

In interactive models, centered codes and dummy codes yield very different results for all of the parameters estimates (other than the interaction) because the parameter estimates are testing different questions depending on the coding of the \(X\)s.

You will use centered codes when you want to test main effects and interactions. The parameter estimate for the regressors coding for the IVs test the main effect of each IV. This is the approach you will almost always use for your primary analysis.

You will use dummy codes when you want to test simple effects. The parameter estimate for the regressors coding for the IVs will test a specific simple effect of each IV. You will often need to recode your IVs more than once to test all relevant simple effects.

The parameter estimate for the interaction and its interpretation is the same across these systems.

Centered Codes for Interactive Model

Centered

sex_c: male (-0.5), female (0.5)
rel_c: Catholic (-0.5), Jewish (0.5)

\(att = 14.5 + 3*sex_c + 1*rel_c + 2*sex_c*rel_c\)

Exercise

Link parameter estimates \(b_0\), \(b_1\), & \(b_2\) to the table of means below.

	male	female	all
Catholic	13	15	14.0
Jewish	13	17	15.0
all	13	16	14.5

\(b_0\) is the predicted value for attitudes for 0 on both \(X\)s. This is the grand mean.

\(b_1\) is the main effect of sex (16 – 13 = 3).

\(b_2\) is the main effect of religion (15 - 14 = 1).

Centered

sex_c: male (-0.5), female (0.5)
rel_c: Catholic (-0.5), Jewish (0.5)

\(att = 4.5 + 3*sex_c + 1*rel_c + 2*sex_c*rel_c\)

	male	female	all
Catholic	13	15	14.0
Jewish	13	17	15.0
all	13	16	14.5

Exercise

Link \(b_3\) to the table of means above considering sex as the focal variable.

\(b_3\) is the change in the magnitude of the (simple) sex effect across religions.

(17 - 13; Jewish) - (15 - 13; Catholic) = 2

Centered

sex_c: male (-0.5), female (0.5)
rel_c: Catholic (-0.5), Jewish (0.5)

\(att = 4.5 + 3*sex_c + 1*rel_c + 2*sex_c*rel_c\)

	male	female	all
Catholic	13	15	14.0
Jewish	13	17	15.0
all	13	16	14.5

Exercise

Link \(b_3\) to the table of means above considering religion as the focal variable.

\(b_3\) is the change in the magnitude of the (simple) religion effect across sexes.

(17 - 15; female) - (13 - 13; male) = 2

Centered

sex_c: male (-0.5), female (0.5)
rel_c: Catholic (-0.5), Jewish (0.5)

\(att = 14.5 + 3*sex_c + 1*rel_c + 2*sex_c*rel_c\)

Exercise

Link all coefficients to the figure below.

Centered

sex_c: male (-0.5), female (0.5)
rel_c: Catholic (-0.5), Jewish (0.5)

\(att = 4.5 + 3*sex_c + 1*rel_c + 2*sex_c*rel_c\)

	male	female	all
Catholic	13	15	14.0
Jewish	13	17	15.0
all	13	16	14.5

You can use the model to reproduce cell means.

Catholic men: \(14.5 + 3*(-.5) + 1*(-.5) + 2*(-.5)*(-.5) = 13\)

Jewish women: \(14.5 + 3*(.5) + 1*(.5) + 2*(.5)*(.5) = 17\)

Etc…

Dummy Codes for Interactive Model

Dummy

sex_c: male (0), female (1)
rel_c: Catholic (0), Jewish (1)

\(att = 13 + 2*sex_d + 0*rel_d + 2*sex_d*rel_d\)

Exercise

Link parameter estimates \(b_0\), \(b_1\), & \(b_2\) to the table of means below.

	male	female	all
Catholic	13	15	14.0
Jewish	13	17	15.0
all	13	16	14.5

\(b_0\) is the predicted value for attitudes for 0 on both regressors. This is the predicted value for male Catholics.

\(b_1\) is the simple effect of sex for Catholics (15 - 13 = 2).

\(b_2\) is the simple effect of religion for men (13 – 13 = 0).

Dummy

sex_c: male (0), female (1)
rel_c: Catholic (0), Jewish (1)

\(att = 13 + 2*sex_d + 0*rel_d + 2*sex_d*rel_d\)

	male	female	all
Catholic	13	15	14.0
Jewish	13	17	15.0
all	13	16	14.5

Exercise

Link \(b_3\) to the table of means above considering sex as the focal variable.

\(b_3\) is the change in the magnitude of the (simple) sex effect across religions.

(17 - 13; Jewish) - (15 - 13; Catholic) = 2

Dummy

sex_c: male (0), female (1)
rel_c: Catholic (0), Jewish (1)

\(att = 13 + 2*sex_d + 0*rel_d + 2*sex_d*rel_d\)

	male	female	all
Catholic	13	15	14.0
Jewish	13	17	15.0
all	13	16	14.5

Exercise

Link \(b_3\) to the table of means below considering religion as the focal variable.

\(b_3\) is the change in the magnitude of the (simple) religion effect across sexes.

(17 - 15; female) - (13 - 13; male) = 2

Dummy

sex_c: male (0), female (1)
rel_c: Catholic (0), Jewish (1)

\(att = 3 + 2*sex_d + 0*rel_d + 2*sex_d*rel_d\)

Exercise

Link all coefficients to the figure below.

Dummy

sex_c: male (0), female (1)
rel_c: Catholic (0), Jewish (1)

\(att = 3 + 2*sex_d + 0*rel_d + 2*sex_d*rel_d\)

	male	female	all
Catholic	13	15	14.0
Jewish	13	17	15.0
all	13	16	14.5

You can use the model to reproduce cell means.

Catholic men: \(13 + 2*(0) + 0*(0) + 2*(0)*(0) = 13\)

Jewish women: \(13 + 2*(1) + 0*(1) + 2*(1)*(1) = 17\)

Etc…

Dummy

sex_c: male (0), female (1)
rel_c: Catholic (0), Jewish (1)

\(att = 3 + 2*sex_d + 0*rel_d + 2*sex_d*rel_d\)

Question

The above dummy coding system gave us the above regression equation where \(b_1\) was the simple effect of sex for Catholics. How do we get the simple effect of sex for Jews?

Recode religion with dummy codes: Jewish = 0 and Catholic = 1. Now the sex effect will be the simple effect for Jews because Jews = 0.

Dummy

sex_c: male (0), female (1)
rel_c: Catholic (1), Jewish (0)

	male	female	all
Catholic	13	15	14.0
Jewish	13	17	15.0
all	13	16	14.5

Question

What will the regression equation be now?

\(att = 13 + 4*sex_d + 0*rel_d + -2*sex_d*rel_d\)

Recoding the moderator changes the simple effect for the focal variable. With dummy codes, the parameter estimate for the focal variable is always its simple effect when the moderator = 0.

Note that the coefficient for the interaction also changed direction because the change for religion reversed direction.

Here are all the relevant analyses

Code

sse <- function(model) {
  sum(residuals(model)^2)
}

pre <- function(compact, augmented) {
  sse_c <- sse(compact)
  sse_a <- sse(augmented)
  (sse_c - sse_a) / sse_c
}

# centered interactive model
m_int <- lm(att ~ sex_c*rel_c, data)
tidy(m_int)

# A tibble: 4 × 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)    14.5      0.195     74.2  1.11e-72
2 sex_c           3        0.391      7.68 4.52e-11
3 rel_c           1.00     0.391      2.56 1.25e- 2
4 sex_c:rel_c     2.00     0.781      2.56 1.25e- 2

Code

confint(m_int)

                 2.5 %    97.5 %
(Intercept) 14.1109458 14.889054
sex_c        2.2218915  3.778108
rel_c        0.2218915  1.778108
sex_c:rel_c  0.4437830  3.556217

Code

# peta sex main effect
pre(lm(att ~ rel_c + sex_c:rel_c, data), m_int)

[1] 0.4368932

Code

# peta interaction
pre(lm(att ~ rel_c + sex_c, data), m_int)

[1] 0.07936508

Code

# simple effect of sex for Jews (coded 0)
data <- data |> 
  mutate(rel_d = if_else(rel == "Catholic", 1, 0))

m_int_se_j <- lm(att ~ rel_d*sex_d, data)
tidy(m_int_se_j)

# A tibble: 4 × 5
  term         estimate std.error statistic  p.value
  <chr>           <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)  1.30e+ 1     0.391  3.33e+ 1 4.66e-47
2 rel_d        4.47e-16     0.553  8.09e-16 1.00e+ 0
3 sex_d        4   e+ 0     0.553  7.24e+ 0 3.09e-10
4 rel_d:sex_d -2   e+ 0     0.781 -2.56e+ 0 1.25e- 2

Code

confint(m_int_se_j)

                2.5 %    97.5 %
(Intercept) 12.221892 13.778108
rel_d       -1.100412  1.100412
sex_d        2.899588  5.100412
rel_d:sex_d -3.556217 -0.443783

Code

pre(lm(att ~ rel_d + sex_d:rel_d, data), m_int_se_j)

[1] 0.4081633

Code

# simple effect sex for Catholic (coded 0)
data <- data |> 
  mutate(rel_d = if_else(rel == "Catholic", 0, 1))

m_int_se_c <- lm(att ~ rel_d*sex_d, data)
tidy(m_int_se_c)

# A tibble: 4 × 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept) 1.30e+ 1     0.391  3.33e+ 1 4.66e-47
2 rel_d       4.84e-15     0.553  8.76e-15 1.00e+ 0
3 sex_d       2.00e+ 0     0.553  3.62e+ 0 5.29e- 4
4 rel_d:sex_d 2.00e+ 0     0.781  2.56e+ 0 1.25e- 2

Code

confint(m_int_se_c)

                 2.5 %    97.5 %
(Intercept) 12.2218915 13.778108
rel_d       -1.1004116  1.100412
sex_d        0.8995884  3.100412
rel_d:sex_d  0.4437830  3.556217

Code

pre(lm(att ~ rel_d + sex_d:rel_d, data), m_int_se_c)

[1] 0.1470588

Sample Writeup

We analyzed attitudes about abortion in a general linear model with centered, unit-weighted regressors for sex (female vs. male), religion (Jewish vs. Catholic) and their interaction. We report both raw GLM coefficients (\(b\)) and partial eta-squared (\(\eta_p^2\)) to document effect sizes.

The main effect of sex was significant, 95% CI(\(b\))=[2.22, 3.78], \(\eta_p^2= 0.44, t(76) = 7.68, p < .001\), indicating that women’s attitudes about abortion were 3 points higher than men on average. However, the sex X religion interaction was also significant, 95% CI(\(b\))=[0.44, 3.56], \(\eta_p^2= 0.08, t(76) = 2.56, p = .001\) (see Figure 1). This indicates that the magnitude of the sex effect was significantly greater in Jews (95% CI(\(b\))=[2.90, 5.10], \(\eta_p^2= 0.41, p < .001\)) than in Catholics (95% CI(\(b\)%)=[0.90, 3.10], \(\eta_p^2 = 0.15, p < .001\)).

Figure

Code

data_plot <-  expand.grid(sex_c = c(-.5, .5), rel_c = c(-.5, .5)) |> 
  mutate(att = predict(m_int,
                       expand.grid(sex_c = c(-.5, .5), 
                                   rel_c = c(-.5, .5))))

data_plot |> 
  mutate(rel_c = factor(rel_c, 
                        levels = c(-.5, .5), 
                        labels = c("Catholic", "Jewish")),
         sex_c = factor(sex_c, 
                        levels = c(-.5, .5),
                        labels = c("Male", "Female"))) |> 
  ggplot(aes(x = rel_c, y = att, group = sex_c, fill = sex_c)) +
  geom_col(position = "dodge", color = "black") +
  labs(fill = NULL,
       y = "Attitudes about abortion",
       x = NULL) +
  scale_fill_manual(values = c("light blue", "orange"))

Question

What else should be included in this publication quality figure?

Confidence intervals (+1 SE)

Raw data points (4 columns, jittered)

Notation to indicate simple effects for focal variable?

On your Own

	certain	uncertain	all
sober	100	120	110
drunk	90	90	90
all	95	105	100

Centered

group_c: sober (-0.5), drunk (0.5)
threat_c: certain (-0.5), uncertain (0.5)

Exercise

Provide the specific values for the parameter estimates in this model based on the table above

\(fps = b_0 + b_1*group_c + b_2*threat_c + b_3*group_c*threat_c\)

\(fps = 100 + -20*group_c + 10*threat_c + -20*group_c*threat_c\)

On your Own (Continued)

	certain	uncertain	all
sober	100	120	110
drunk	90	90	90
all	95	105	100

Dummy

group_d: sober (0), drunk (1)
threat_d: certain (0), uncertain (1)

Exercise

Provide the specific values for the parameter estimates in this model based on the table above

\(fps = b_0 + b_1*group_d + b_2*threat_d + b_3*group_d*threat_d\)

\(fps = 100 + -10*group_d + 20*threat_d + -20*group_d*threat_d\)

Learning Outcomes

For models with dichotomous independent variables, you learned:

Basic terminology from ANOVA framework.
How to identify main effects, simple effects, and interactions in table of means and figures.
How to link coefficients from interactive models with each coding system to table of means and figures (both directions).
How to calculate simple effects.
How to write up and display results.