BS — Battle of the Sexes

GT Behaviour · GTEMO Experiment

Author

Eric Guerci

Published

March 22, 2026

Payoff matrix — Row payoff, Column payoff

	L / Yield (col)	R / Demand (col)
T / Yield (row)	0, 0	12, 36 ♦ P2 Pref
D / Demand (row)	36, 12 ★ P1 Pref	0, 0

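In Battle of the Sexes, the conflict of interest is asymmetric: both players want to coordinate, but each prefers a different equilibrium. Action “D” corresponds to demanding one’s preferred equilibrium, while “T” corresponds to yielding to the other’s preferred equilibrium. For the column player, L is Yield and R is Demand, so elsewhere in this document both players’ actions are also coded T (Yield) or D (Demand).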
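The equilibrium structure can be checked mechanically. The R sketch below marks each player's best responses in the payoff matrix and keeps the cells where both players are best-responding; `pure_nash` and the matrix layout are illustrative assumptions, not part of the report's pipeline. It returns the two coordinated cells, (D,L) and (T,R).

```r
# Payoffs from the matrix above. Rows: row player's action (T = Yield, D = Demand);
# columns: column player's action (L = Yield, R = Demand).
row_pay <- matrix(c(0, 36, 12, 0), nrow = 2,
                  dimnames = list(c("T", "D"), c("L", "R")))
col_pay <- matrix(c(0, 12, 36, 0), nrow = 2,
                  dimnames = list(c("T", "D"), c("L", "R")))

pure_nash <- function(row_pay, col_pay) {
  # TRUE where the row action is a best response to that column's action
  row_best <- sweep(row_pay, 2, apply(row_pay, 2, max), `==`)
  # TRUE where the column action is a best response to that row's action
  col_best <- sweep(col_pay, 1, apply(col_pay, 1, max), `==`)
  # A pure-strategy Nash equilibrium is a cell where both conditions hold
  eq <- which(row_best & col_best, arr.ind = TRUE)
  paste0("(", rownames(row_pay)[eq[, "row"]], ",", colnames(row_pay)[eq[, "col"]], ")")
}

pure_nash(row_pay, col_pay)
```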

1 — Equilibria & coordination

Objective

Describe the equilibrium outcomes reached by each couple in Part 1 and Part 2, and test whether cheap talk shifted coordination rates using paired McNemar tests.

Equilibrium labels (BS)

P1 Preferred (D,L) — Coordinated; P1 gets 36, P2 gets 12 ★
P2 Preferred (T,R) — Coordinated; P1 gets 12, P2 gets 36 ♦
Both Demand (D,R) — Miscoordinated; both get 0
Both Yield (T,L) — Miscoordinated; both get 0
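A sketch of how these labels could be assigned from player-level choices, assuming P1 is the row player and both players' actions are coded "T"/"D" (column L = T, R = D). `classify_bs` is a hypothetical helper, not the report's code.

```r
# Hypothetical helper: map (P1 action, P2 action), both coded "T"/"D",
# to the four outcome labels listed above.
classify_bs <- function(row_action, col_action) {
  label <- ifelse(row_action == "D" & col_action == "T", "P1 Preferred",
           ifelse(row_action == "T" & col_action == "D", "P2 Preferred",
           ifelse(row_action == "D", "Both Demand", "Both Yield")))
  data.frame(outcome = label,
             coordinated = label %in% c("P1 Preferred", "P2 Preferred"),
             stringsAsFactors = FALSE)
}

classify_bs(c("D", "T", "D", "T"), c("T", "D", "D", "T"))
```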

Equilibrium distributions

p_eq_bs

Figure 1: BS — Equilibrium distributions in Part 1 (top) and Part 2 (bottom).

Coordination rates: Part 1 vs Part 2

coord_bs |>
  dplyr::select(part, x, n, pct, ci95) |>
  gt::gt() |>
  gt::tab_header(title    = "BS — Coordination rates: Part 1 vs Part 2",
                 subtitle = "95% Clopper-Pearson CI") |>
  gt::cols_label(part = "Phase", x = "n coordinated", n = "N",
                 pct = "%", ci95 = "95% CI") |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_column_labels())
p_coord_bs

BS — Coordination rates: Part 1 vs Part 2
95% Clopper-Pearson CI
Phase	n coordinated	N	%	95% CI
Part 1	6	16	37.5%	[15.2%, 64.6%]
Part 2	9	16	56.2%	[29.9%, 80.2%]

Figure 2: BS — Coordination rate in Part 1 vs Part 2 (couple level). 95% Clopper-Pearson CI.
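The exact intervals in this table can be reproduced from beta quantiles. A sketch of the Clopper-Pearson construction, which is the interval base R's `binom.test()` reports; `clopper_pearson` is an illustrative helper name:

```r
# Exact (Clopper-Pearson) CI for x successes out of n, from beta quantiles.
clopper_pearson <- function(x, n, conf = 0.95) {
  alpha <- 1 - conf
  lower <- if (x == 0) 0 else qbeta(alpha / 2, x, n - x + 1)
  upper <- if (x == n) 1 else qbeta(1 - alpha / 2, x + 1, n - x)
  c(lower = lower, upper = upper)
}

round(100 * clopper_pearson(6, 16), 1)   # Part 1: 6 of 16 couples coordinated
round(100 * clopper_pearson(9, 16), 1)   # Part 2: 9 of 16 couples coordinated
```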

McNemar tests

tab_mc_bs |>
  gt::gt() |>
  gt::tab_header(title    = "McNemar tests — BS (couple level)",
                 subtitle = "Paired Part 1 vs Part 2") |>
  gt::cols_label(label = "Test", statistic = "χ²", p_value = "p-value",
                 note = "Note") |>
  gt::fmt_number(columns = c(statistic, p_value), decimals = 4, rows = !is.na(statistic)) |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_body(columns = p_value,
                                           rows = !is.na(p_value) & p_value < 0.05)) |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_column_labels())

McNemar tests — BS (couple level)
Paired Part 1 vs Part 2
Test	χ²	p-value	Note
BS — coord Part1 vs Part2	0.4440	0.5050	OK
BS — mutual Demand (D) Part1 vs Part2	0.4440	0.5050	OK

Note

Coordination rose from 37.5% of couples in Part 1 (6/16) to 56.2% in Part 2 (9/16). In BS, coordination means landing on either the P1 Preferred (D,L) or the P2 Preferred (T,R) equilibrium; both give a joint payoff of 48 (36 + 12). A significant McNemar result would indicate that cheap talk systematically shifted couples towards coordinated outcomes and away from miscoordination ((D,R) or (T,L), both yielding 0€ for each player). Here the test is not significant (χ² = 0.444, p = 0.505), so the net gain of 3 coordinated couples cannot be distinguished from chance at this sample size.
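McNemar's test uses only the discordant couples. With 6 of 16 couples coordinated in Part 1, 9 of 16 in Part 2 and a χ² of 0.444 (R's default continuity correction), the only consistent paired table has 3 couples coordinated in both parts, 3 that lost coordination, 6 that gained it and 4 that never coordinated. The sketch below rebuilds that table (derived by arithmetic from the reported figures, not read from the raw data) and recovers the reported test with base R's `mcnemar.test()`.

```r
# Rows = Part 1, columns = Part 2; counts derived from the reported margins and chi^2
coord_tab <- matrix(c(3, 6, 3, 4), nrow = 2,
                    dimnames = list(part1 = c("coord", "miscoord"),
                                    part2 = c("coord", "miscoord")))
rowSums(coord_tab)   # Part 1 margins: 6 coordinated, 10 not
colSums(coord_tab)   # Part 2 margins: 9 coordinated, 7 not

mc <- mcnemar.test(coord_tab)   # statistic = (|3 - 6| - 1)^2 / (3 + 6)
round(c(statistic = unname(mc$statistic), p = mc$p.value), 4)
```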

Conditioning on session gender

tab_cond_coord_bs |>
  gt::gt() |>
  gt::tab_header(title    = "BS — Coordination and Demand choice by session gender",
                 subtitle = "χ² with Monte Carlo simulated p-value (B = 2000), couple level") |>
  gt::cols_label(Outcome = "Outcome", Factor = "Factor", Test = "χ²(sim.) test") |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_column_labels()) |>
  gt::opt_stylize(style = 1) |>
  gt::tab_options(table.font.size = 13)
p_coord_gender_bs

BS — Coordination and Demand choice by session gender
χ² with Monte Carlo simulated p-value (B = 2000), couple level
Outcome	Factor	χ²(sim.) test
Coordination Part 2	Session gender	χ²(sim.): p = 1.000 ns
Coordination Part 1	Session gender	χ²(sim.): p = 0.610 ns
Mutual Demand (D) Part 2	Session gender	χ²(sim.): p = 1.000 ns
Mutual Demand (D) Part 1	Session gender	χ²(sim.): p = 0.621 ns

Figure 3: BS — P(Coordination Part 2) by session gender (couple level). Error bars = 95% Clopper-Pearson CI.
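The χ²(sim.) entries can be produced with base R's `chisq.test()` using `simulate.p.value = TRUE` and `B = 2000`. The sketch below is a hypothetical formatting helper, not the report's code: the input vectors are toy values, the significance-star cut-offs are assumptions, and the seed is fixed because Monte Carlo p-values vary between runs.

```r
# Hypothetical helper: Monte Carlo chi-squared test formatted like the table above
chisq_sim_label <- function(outcome, group, B = 2000, seed = 1) {
  set.seed(seed)  # simulated p-values change from run to run without a seed
  tst <- chisq.test(table(group, outcome), simulate.p.value = TRUE, B = B)
  p <- tst$p.value
  stars <- if (p < 0.001) "***" else if (p < 0.01) "**" else if (p < 0.05) "*" else "ns"
  sprintf("\u03c7\u00b2(sim.): p = %.3f %s", p, stars)
}

# Toy inputs (hypothetical): Part 2 coordination (1/0) and session gender per couple
coord2  <- c(1, 0, 1, 1, 0, 1, 0, 1)
session <- c("F", "F", "M", "M", "F", "M", "F", "M")
chisq_sim_label(coord2, session)
```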

2 — Choice & signal distributions

Objective

Describe the marginal distributions of choices and signals (Part 1 choice → signal sent → Part 2 choice), and examine how the opponent’s signal shapes Part 2 behaviour. All proportions use exact 95% Clopper-Pearson CIs.

Distributions table

tab_bs

BS — Choice and signal distributions
95% Clopper-Pearson CI
Choice / Signal	n	N	%	95% CI
Part 1
T	8	32	25.0%	[11.5%, 43.4%]
D	24	32	75.0%	[56.6%, 88.5%]
Signal
T	11	32	34.4%	[18.6%, 53.2%]
D	21	32	65.6%	[46.8%, 81.4%]
Part 2
T	11	32	34.4%	[18.6%, 53.2%]
D	21	32	65.6%	[46.8%, 81.4%]

Proportions by phase

p_bs_dist

Figure 4: BS — Proportions of each choice/signal type with 95% Clopper-Pearson CIs. Left: Part 1 choices; centre: signals sent; right: Part 2 choices.

Part 1 → Part 2 snapshot

Demand (D) was chosen by 75.0% of players in Part 1. D was also the most frequent signal (65.6%), and the Demand rate in Part 2 was 65.6%, a drop of 9.4 percentage points from Part 1. In BS, neither Demand (D) nor Yield (T) is a dominant strategy: the optimal action depends entirely on what the opponent plays. A high D signal rate would reflect players trying to claim their preferred equilibrium via cheap talk, while a shift in D choice from Part 1 to Part 2 would indicate that pre-play communication influenced strategic behaviour.

Within-subject shift: McNemar test (Part 1 vs Part 2)

tab_mcnemar_bs |>
  gt::gt() |>
  gt::tab_header(
    title    = "McNemar test — BS: choice1_D vs choice2_D",
    subtitle = "Paired within-individual"
  ) |>
  gt::cols_label(statistic = "χ²", p_value = "p-value", n = "n", note = "Note") |>
  gt::fmt_number(columns = c(statistic, p_value), decimals = 4) |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_body(columns = p_value,
                                           rows = !is.na(p_value) & p_value < 0.05)) |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_column_labels())

McNemar test — BS: choice1_D vs choice2_D
Paired within-individual
χ²	p-value	n	Note
0.3080	0.5791	32	OK

Note

A significant McNemar result (p < 0.05) would indicate that pre-play cheap talk systematically shifted individual Demand (D) choice rates; the observed result is not significant (χ² = 0.308, p = 0.579, n = 32). In BS, neither D nor T is a dominant strategy: the best response is the opposite of the opponent's action (Demand against Yield, Yield against Demand). Cheap talk can therefore be credible here. A player who signals D (Demand) announces an intention to claim their own preferred equilibrium. When one player signals D and the other signals T (Yield), following the signals leads to one of the two efficient equilibria (P1 Preferred or P2 Preferred). Signals can therefore serve as (non-binding) commitment devices, helping couples resolve the equilibrium-selection problem.
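The individual-level test can be reconstructed the same way. With 24 D choices in Part 1, 21 in Part 2 and 13 switchers, the discordant cells must be 8 players moving D → T and 5 moving T → D, which leaves 16 D → D and 3 T → T. These counts are derived arithmetically from the reported figures, not taken from the data. The continuity-corrected statistic is (|8 - 5| - 1)² / 13 = 4/13 ≈ 0.308, matching the table.

```r
# Rows = Part 1 choice, columns = Part 2 choice (counts derived from reported margins)
choice_tab <- matrix(c(16, 5, 8, 3), nrow = 2,
                     dimnames = list(part1 = c("D", "T"), part2 = c("D", "T")))
# Sanity checks against the distributions table: 32 players, 24 D in Part 1, 21 D in Part 2
stopifnot(sum(choice_tab) == 32, sum(choice_tab["D", ]) == 24, sum(choice_tab[, "D"]) == 21)

mc <- mcnemar.test(choice_tab)   # continuity correction is R's default
round(c(statistic = unname(mc$statistic), p = mc$p.value), 4)
```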

Opponent signal and the information set before Part 2

Important

Decision sequence. After sending their own signal and before making the Part 2 choice, each player observes the opponent’s signal. The Part 2 decision is therefore taken with a two-dimensional information set: (own signal sent) × (opponent’s signal received). Written as own/opponent, the four possible information sets are T/T, T/D, D/T and D/D.
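A minimal sketch of how the opponent's signal and the own/opponent information set could be attached to long-format data. The column names (`couple_id`, `player`, `signal`) and the toy rows are assumptions for illustration; the report's actual variable is `opp_signal_received`, whose construction is not shown. The approach assumes exactly two rows per couple.

```r
# Hypothetical long-format data: one row per player, two players per couple
players <- data.frame(
  couple_id = c(1, 1, 2, 2, 3, 3),
  player    = c("P1", "P2", "P1", "P2", "P1", "P2"),
  signal    = c("D", "T", "D", "D", "T", "T"),
  stringsAsFactors = FALSE
)

# Opponent's signal = the other player's signal within the same couple
players$opp_signal <- ave(players$signal, players$couple_id, FUN = rev)
# Information set written as own/opponent
players$info_set <- paste(players$signal, players$opp_signal, sep = "/")

table(factor(players$info_set, levels = c("T/T", "T/D", "D/T", "D/D")))
```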

p_sig_heatmap_bs

Figure 5: BS — Joint distribution of own signal × opponent signal received. Values show count and share.

p_choice2_by_oppsig_bs

Figure 6: BS — P(choice₂ = D) stratified by opponent’s signal. 95% Clopper-Pearson CI.

Cheap talk in BS: can signals shift Demand choice?

In the Battle of the Sexes, neither T (Yield) nor D (Demand) is a strictly dominant strategy. The game has two pure-strategy Nash equilibria: the P1 Preferred equilibrium (D,L), yielding (36, 12), and the P2 Preferred equilibrium (T,R), yielding (12, 36). Miscoordination ((D,R) or (T,L)) yields (0, 0) for both players. This structure makes cheap talk potentially credible: unlike the PD, where defection dominates regardless of what is said, in BS both players want to coordinate and disagree only on which equilibrium. A player who signals D (Demand) is asserting a preference for their own preferred equilibrium; if the opponent then yields (T), coordination is achieved. Such signals are self-committing (following through after signalling D is rational if the opponent yields) and may function as credible announcements of intent. A large gap in Demand choice rates conditional on receiving D versus T from the opponent (lower after receiving D, higher after receiving T) would indicate that cheap talk helps resolve the equilibrium-selection problem in BS.
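The best-response logic can be made explicit. Because P1 and P2 face mirror-image payoffs (36 from their own preferred equilibrium, 12 from the other, 0 from miscoordination), the same calculation applies to both. The sketch computes expected payoffs for a belief p that the opponent demands; Demand beats Yield only when 36(1 - p) > 12p, that is p < 0.75. In the symmetric mixed-strategy equilibrium each player demands with probability 0.75, so couples coordinate with probability 2 × 0.75 × 0.25 = 0.375. This is a theoretical benchmark only; for comparison, the observed Part 1 rates reported above are 75.0% Demand and 37.5% coordination.

```r
# Expected payoff of Demand vs Yield for a player who believes the opponent
# demands with probability p (own preferred outcome pays 36, the other one 12).
exp_payoff <- function(p) {
  cbind(p      = p,
        demand = 36 * (1 - p),  # 36 if the opponent yields, 0 if both demand
        yield  = 12 * p)        # 12 if the opponent demands, 0 if both yield
}

# Demand is a best response only while p < 36 / (36 + 12)
p_star <- 36 / (36 + 12)
exp_payoff(c(0.5, p_star, 0.9))

# Symmetric mixed-strategy equilibrium: each player demands with probability p_star,
# and a couple coordinates when exactly one of the two demands.
coord_mixed <- 2 * p_star * (1 - p_star)
coord_mixed
```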

p_choice2_infoset_bs

Figure 7: BS — P(choice₂ = D) by full information set (own/opp). Colour = own signal. 95% CI. Information sets with no observations are omitted.

BS interpretation

The critical test is the D/D information set, where both players have signalled Demand. If both then try to claim their preferred equilibrium, the outcome is the (D,R) miscoordination trap (0 for both). Demand choice rates at D/D therefore measure strategic stubbornness when neither party is willing to yield. The T/T information set (both signalled Yield) represents mutual deference: if players simply follow their own signals, Yield choices would dominate and the outcome would be the (T,L) miscoordination trap. The most informative cells are the asymmetric information sets D/T and T/D, where one player signals D and the other T. If signals are effective, the D-signaller should mostly Demand and the T-signaller should mostly Yield, producing one of the efficient equilibria, (D,L) or (T,R). Because the best response to an expected Demand is to Yield, the clearest evidence of cheap-talk effectiveness in BS is a markedly lower Demand rate after receiving D than after receiving T.

p_follow_opp_bs

Figure 8: BS — Proportion of players whose Part 2 choice matches the opponent’s signal received.

Conditioning on gender and role

tab_cond_sig_bs |>
  gt::gt() |>
  gt::tab_header(title = "BS — Choice and signal distributions by gender and role",
                 subtitle = "χ² with Monte Carlo simulated p-value (B = 2000)") |>
  gt::cols_label(Outcome = "Outcome", Factor = "Factor", Test = "χ²(sim.) test") |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_column_labels()) |>
  gt::opt_stylize(style = 1) |>
  gt::tab_options(table.font.size = 13)

BS — Choice and signal distributions by gender and role
χ² with Monte Carlo simulated p-value (B = 2000)
Outcome	Factor	χ²(sim.) test
Part 1 = T	Gender	χ²(sim.): p = 1.000 ns
Part 1 = T	Role	χ²(sim.): p = 0.675 ns
Signal = T	Gender	χ²(sim.): p = 0.456 ns
Signal = T	Role	χ²(sim.): p = 1.000 ns
Part 2 = T	Gender	χ²(sim.): p = 1.000 ns
Part 2 = T	Role	χ²(sim.): p = 0.472 ns

p_cond_sig_bs

Figure 9: BS — P(Signal = T) by gender and role. Error bars = 95% Clopper-Pearson CI. Dashed line = 50%.

3 — Signal honesty & consistency

Objective

Examine whether signals are honest (the signal equals the action eventually taken in Part 2) and consistent with Part 1 choices (the signal equals the Part 1 choice). Assess the prevalence of strategy switches between Part 1 and Part 2.

Honesty and consistency proportions

tab_sec2_bs |>
  gt::gt() |>
  gt::tab_header(title    = "BS — Signal honesty and consistency",
                 subtitle = "95% Clopper-Pearson CI. Each row is a binary indicator (1 = yes, 0 = no).") |>
  gt::cols_label(variable = "Measure", n = "n (=1)", N = "N",
                 pct = "%", ci95 = "95% CI") |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_column_labels()) |>
  gt::tab_footnote(
    footnote = "1 if the signal sent equals the Part 2 choice (e.g. sent T and chose T in Part 2). Measures whether players followed through on their signal.",
    locations = gt::cells_body(columns = variable, rows = 1)
  ) |>
  gt::tab_footnote(
    footnote = "1 if the signal sent equals the Part 1 choice (e.g. signalled T and had also chosen T in Part 1). Measures whether the signal reflects past behaviour — independent of Part 2.",
    locations = gt::cells_body(columns = variable, rows = 2)
  ) |>
  gt::tab_footnote(
    footnote = "1 if the Part 2 choice differs from the Part 1 choice (choice1 \u2260 choice2). Measures switching behaviour across rounds, independent of the signal. N may differ from rows 1\u20132 due to missing values in different variables.",
    locations = gt::cells_body(columns = variable, rows = 3)
  ) |>
  gt::tab_options(table.font.size = 13)

BS — Signal honesty and consistency
95% Clopper-Pearson CI. Each row is a binary indicator (1 = yes, 0 = no).
Measure	n (=1)	N	%	95% CI
Signal honest (signal = choice2)¹	20	32	62.5%	[43.7%, 78.9%]
Signal consistent with Part 1 (signal = choice1)²	21	32	65.6%	[46.8%, 81.4%]
Strategy changed Part 1 → Part 2 (choice1 ≠ choice2)³	13	32	40.6%	[23.7%, 59.4%]
¹ 1 if the signal sent equals the Part 2 choice (e.g. sent T and chose T in Part 2). Measures whether players followed through on their signal.
² 1 if the signal sent equals the Part 1 choice (e.g. signalled T and had also chosen T in Part 1). Measures whether the signal reflects past behaviour — independent of Part 2.
³ 1 if the Part 2 choice differs from the Part 1 choice (choice1 ≠ choice2). Measures switching behaviour across rounds, independent of the signal. N may differ from rows 1–2 due to missing values in different variables.
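A sketch of how the three indicators and their n/N could be computed from player-level vectors coded T/D. The vectors below are hypothetical; missing values (NA) propagate into the comparisons, which is one way N can differ between rows, as the third footnote notes.

```r
# Hypothetical per-player vectors (T/D codes, NA = missing); not study data
choice1 <- c("D", "D", "T", "D", NA)
signal  <- c("D", "T", "T", "D", "D")
choice2 <- c("D", "D", "D", "T", "D")

indicators <- data.frame(
  honest     = as.integer(signal == choice2),   # followed through on the signal
  consistent = as.integer(signal == choice1),   # signal repeats the Part 1 choice
  changed    = as.integer(choice1 != choice2)   # switched between Part 1 and Part 2
)

# n (=1) and N per measure; N excludes rows where an input is missing
summary_tab <- data.frame(n = colSums(indicators, na.rm = TRUE),
                          N = colSums(!is.na(indicators)))
summary_tab
```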

Binomial test: signal honesty vs 50%

tab_binom_honest_bs |>
  gt::gt() |>
  gt::tab_header(title    = "Binomial test: P(Honest) vs H₀ = 0.50",
                 subtitle = "Two-sided test; 95% Clopper-Pearson CI") |>
  gt::cols_label(x = "n honest", n = "N", pct = "%", ci95 = "95% CI",
                 p_value = "p-value") |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_body(columns = p_value,
                                           rows = p_value < 0.05)) |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_column_labels())

Binomial test: P(Honest) vs H₀ = 0.50
Two-sided test; 95% Clopper-Pearson CI
n honest	N	%	95% CI	p-value
20	32	62.5%	[43.7%, 78.9%]	0.2153
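The test can be reproduced with base R's `binom.test()`, which also returns the Clopper-Pearson interval; a minimal sketch:

```r
# 20 honest signals out of 32, tested against H0: P(Honest) = 0.5
bt <- binom.test(x = 20, n = 32, p = 0.5, alternative = "two.sided")
round(bt$p.value, 4)          # two-sided exact p-value
round(100 * bt$conf.int, 1)   # Clopper-Pearson 95% CI, in %
```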

Note

In Battle of the Sexes, a signal is honest if the player’s Part 2 action matches what they signalled, so honesty in this game reflects commitment. A player who signals “Demand” and plays “Demand” is using cheap talk to try to commit and induce the opponent to yield. A player who signals “Demand” but then plays “Yield” is effectively “chickening out” of their threat. The aggregate honesty rate (62.5%) therefore mainly captures the credibility of these strategic demands; it is not significantly different from 50% (binomial p = 0.2153).

Strategic transition heatmaps

p_sankey_bs

Figure 10: BS — Strategic transitions. Left: Part 1 → Signal (row %: conditional on Part 1 choice). Right: Signal → Part 2 (row %: conditional on signal sent).
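The row percentages in the heatmaps condition on the row variable, so each row sums to 100%. A minimal sketch with `prop.table(margin = 1)`, using hypothetical toy vectors rather than the study data; the right-hand panel would use the signal and Part 2 choice in the same way.

```r
# Hypothetical toy vectors (not study data): Part 1 choice and signal sent
choice1 <- c("D", "D", "D", "D", "T", "T")
signal  <- c("D", "T", "D", "D", "T", "D")

# Row %: distribution of signals conditional on each Part 1 choice
trans <- prop.table(table(part1 = choice1, signal = signal), margin = 1)
round(100 * trans, 1)
```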

Conditioning on gender and role

tab_cond_honest_bs |>
  gt::gt() |>
  gt::tab_header(title    = "BS — Signal honesty and strategy change by gender and role",
                 subtitle = "χ² with Monte Carlo simulated p-value (B = 2000)") |>
  gt::cols_label(Outcome = "Outcome", Factor = "Factor", Test = "χ²(sim.) test") |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_column_labels()) |>
  gt::opt_stylize(style = 1) |>
  gt::tab_options(table.font.size = 13)

BS — Signal honesty and strategy change by gender and role
χ² with Monte Carlo simulated p-value (B = 2000)
Outcome	Factor	χ²(sim.) test
Signal honest	Gender	χ²(sim.): p = 0.715 ns
Signal honest	Role	χ²(sim.): p = 0.720 ns
Strategy changed	Gender	χ²(sim.): p = 0.477 ns
Strategy changed	Role	χ²(sim.): p = 0.478 ns

p_cond_honest_bs

Figure 11: BS — P(Honest) by gender (left) and role (right). Error bars = 95% Clopper-Pearson CI. Dashed line = 50%.

4 — Belief accuracy & bonus

The two belief questions

After Part 2, each player answered two incentivised questions about their beliefs:

Belief 1 — First-order belief: “What do you think your opponent chose in Part 2?” (T or D). Scored correct (GT_right_guess1 = 1) if the player’s prediction matched the opponent’s actual Part 2 choice. Bonus: +2€ if correct.

Belief 2 — Second-order belief: “What do you think your opponent believes you chose in Part 2?” (T or D). Scored correct (GT_right_guess2 = 1) if the player correctly identified what the opponent believed about the player’s own choice. Bonus: +2€ if correct.

The belief accuracy score = GT_right_guess1 + GT_right_guess2 ∈ {0, 1, 2}. The belief bonus = score × 2€ ∈ {0€, 2€, 4€}.
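A sketch of the scoring rule as described above. The variable names are hypothetical, and the second-order rule (a player's guess 2 counts as correct when it equals the opponent's actual guess 1) is an assumption about how "what the opponent believed" was operationalised.

```r
# Hypothetical scoring; guess1 = "what did my opponent choose?",
# guess2 = "what does my opponent think I chose?"
score_beliefs <- function(guess1, guess2, opp_choice2, opp_guess1) {
  right1 <- as.integer(guess1 == opp_choice2)  # first-order belief correct
  right2 <- as.integer(guess2 == opp_guess1)   # second-order: matches opponent's guess 1 (assumed rule)
  score  <- right1 + right2                    # belief accuracy score: 0, 1 or 2
  data.frame(right1, right2, score, bonus_eur = 2 * score)  # 2 euros per correct belief
}

score_beliefs(guess1 = c("D", "T"), guess2 = c("T", "T"),
              opp_choice2 = c("D", "D"), opp_guess1 = c("T", "D"))
```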

Objective

Describe the distribution of belief accuracy scores (0 = both beliefs wrong; 1 = one correct; 2 = both correct) and the associated belief bonus payoff. Assess whether belief accuracy is correlated with coordination outcomes at the couple level.

Belief accuracy distribution

p_belief_bar_bs

Figure 12: BS — Distribution of belief accuracy scores.

Hypothesis tests: beliefs, cognitive ability & coordination

BS — Belief accuracy: hypothesis comparison
H1–H2: individual level. H3: couple level. Stat = Spearman ρ or φ (phi).
ID	Level	Hypothesis	X	Y	Test	Expected	Stat	p	n
H1	Individual	Reflective thinkers (high CRT) predict opponent’s choice more accurately	CRT score (0–4)	Belief accuracy (0–2)	Spearman ρ	Positive	0.169	0.354	32
H2	Individual	Players who made more quiz errors have less accurate beliefs	Quiz errors [log(1+x)]	Belief accuracy (0–2)	Spearman ρ	Negative	0.099	0.591	32
H3	Couple	Couples where both players have perfect beliefs coordinate more in Part 2	Coord. Part 2 (0/1)	Both perfect beliefs (0/1)	Fisher exact + φ¹	Positive	0.424	0.213	16
¹ H3 stat = phi coefficient (φ); p-value from two-sided Fisher exact test on 2×2 contingency table.

Note

H1 tests whether more reflective players (higher CRT) are better at predicting their opponent: if strategic reasoning drives belief formation, a positive Spearman ρ is expected. H2 tests whether players who struggled with game comprehension (more quiz errors) hold less accurate beliefs; the expected direction is negative. H3, the only couple-level hypothesis, tests whether couples in which both players had perfect beliefs were more likely to coordinate in Part 2; it uses a Fisher exact test given the small N and binary outcomes. None of the three associations is significant (p = 0.354, 0.591 and 0.213), and the H2 estimate (ρ = 0.099) has the opposite sign to the expected one. H1 and H2 operate at the individual level while H3 is at the couple level, so their statistics are not directly comparable.
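For H3, φ for a 2×2 table equals the Pearson correlation between the two binary indicators, and the p-value comes from `fisher.test()`. The sketch below uses a hypothetical couple-level table; the counts are illustrative and are not the study's table.

```r
# Phi coefficient for a 2x2 table: (n11 * n22 - n12 * n21) / sqrt(product of margins)
phi_2x2 <- function(tab) {
  n11 <- tab[1, 1]; n12 <- tab[1, 2]; n21 <- tab[2, 1]; n22 <- tab[2, 2]
  (n11 * n22 - n12 * n21) /
    sqrt((n11 + n12) * (n21 + n22) * (n11 + n21) * (n12 + n22))
}

# Hypothetical table: rows = coordinated in Part 2, columns = both beliefs perfect
tab <- matrix(c(4, 1, 5, 6), nrow = 2,
              dimnames = list(coord2 = c("yes", "no"), both_perfect = c("yes", "no")))
phi_2x2(tab)
fisher.test(tab)$p.value   # two-sided by default
```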

Conditioning on gender and role

Show code

p_cond_belief_bs

Figure 13: BS — Belief accuracy score (0/1/2) by gender (left) and role (right). Bars show proportion within each group; labels show % and count. Score 0 = both beliefs wrong, 1 = one correct, 2 = both correct.

5 — Econometric models

5.1 — Determinants of belief accuracy

Estimate an ordered logit (proportional-odds model) for the belief accuracy score (0 = both wrong, 1 = one correct, 2 = both correct) using Theory of Mind (MASC) and IRI subscales as predictors. The proportional-odds assumption implies a single log-odds shift per unit increase in each predictor, shared across both thresholds (0→1 and 1→2).

Small-sample caveat

With n = 32 participants (score 0: n=5; 1: n=16; 2: n=11), EPV is computed as min(n₀, n₂) / k: M1 = 5, M2 = 2.5, M3 = 1.7. All are well below the recommended 10. All results are exploratory and should be treated as hypothesis-generating.

Model specifications

\[ \begin{aligned} \text{M1:} \quad & \text{logit}\,P(Y \le j) = \alpha_j - \beta_1\,\text{MASC}_z \\ \text{M2:} \quad & \text{logit}\,P(Y \le j) = \alpha_j - \beta_1\,\text{MASC}_z - \beta_2\,\text{IRI-PT}_z \\ \text{M3:} \quad & \text{logit}\,P(Y \le j) = \alpha_j - \beta_1\,\text{MASC}_z - \beta_2\,\text{IRI-PT}_z - \beta_3\,\text{IRI-PD}_z \end{aligned} \]

MASC = Theory of Mind total score; IRI-PT = Perspective Taking subscale (cognitive empathy — most directly linked to predicting opponents’ decisions); IRI-PD = Personal Distress subscale (self-oriented distress — may impair strategic prediction). All scores z-standardised. OR > 1 shifts probability towards higher belief accuracy.
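To see what an odds ratio means in this parametrisation, the sketch converts cut-points and a slope into probabilities for the three scores. The cut-points are illustrative (those of an intercept-only fit to the observed 5/16/11 split), and the slope is the M1 MASC estimate; combining them is a didactic device, not the fitted model. The last lines show the proportional-odds property: one more SD of MASC multiplies the odds of exceeding either cut-point by the same factor.

```r
# Category probabilities under logit P(Y <= j) = alpha_j - beta * x
olog_probs <- function(x, alpha, beta) {
  cum <- plogis(alpha - beta * x)        # P(Y <= 0), P(Y <= 1)
  setNames(diff(c(0, cum, 1)), c("score0", "score1", "score2"))
}

# Illustrative cut-points: intercept-only fit to the 5/16/11 split (not the M1 estimates)
alpha <- qlogis(cumsum(c(5, 16)) / 32)
beta  <- log(1.371)                      # M1 odds ratio for MASC (z)

round(rbind(`MASC -1 SD` = olog_probs(-1, alpha, beta),
            `MASC +1 SD` = olog_probs( 1, alpha, beta)), 3)

# Proportional odds: odds of scoring above cut-point j, i.e. P(Y > j) / P(Y <= j)
odds_above <- function(x, j) exp(beta * x - alpha[j])
c(cut1 = odds_above(1, 1) / odds_above(0, 1),
  cut2 = odds_above(1, 2) / odds_above(0, 2))
```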

Coefficient table

tab_olog_gt

BS — Ordered logit: determinants of belief accuracy
DV = belief accuracy score (0/1/2, ordered). n = 32 (score 0: n=5; 1: n=16; 2: n=11). MASC/IRI z-scored. OR > 1 increases probability of higher accuracy.¹
Predictor	β	SE	t	p	OR	OR 2.5%	OR 97.5%
M1: MASC only
MASC ToM score (z)	0.316	0.349	0.904	0.3661	1.371	0.691	2.719
M2: MASC + IRI-PT
MASC ToM score (z)	0.269	0.359	0.751	0.4529	1.309	0.648	2.695
IRI Perspective Taking (z)	-0.283	0.365	-0.775	0.4385	0.754	0.361	1.544
M3: MASC + IRI-PT + IRI-PD
MASC ToM score (z)	0.266	0.360	0.740	0.4595	1.305	0.644	2.690
IRI Perspective Taking (z)	-0.276	0.369	-0.750	0.4531	0.758	0.361	1.562
IRI Personal Distress (z)	0.044	0.342	0.129	0.8977	1.045	0.527	2.050
¹ Ordered logit (proportional-odds, MASS::polr). p-values from two-tailed z-test on t-statistic. CI from profile likelihood where convergent, otherwise Wald. All models MLE; EPV < 10 — interpret cautiously.

Goodness of fit

tab_gof_olog_gt

BS — Ordered logit: goodness of fit
DV = belief accuracy score (0/1/2). EPV = min(n₀, n₂) / k.
Model	n	Predictors	EPV	AIC	McFadden R²
M1: MASC only	32	1	5.0	69.4	0.0129
M2: MASC + IRI-PT	32	2	2.5	70.8	0.0223
M3: MASC + IRI-PT + IRI-PD	32	3	1.7	72.8	0.0225
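The McFadden R² column can be cross-checked from the AIC values. The intercept-only log-likelihood follows from the score counts (5, 16, 11), and each model's log-likelihood from AIC = 2k - 2 log L, with k = 2 cut-points plus the number of predictors. Because the AICs are rounded to one decimal, the sketch agrees with the table only to about three decimals.

```r
counts <- c(5, 16, 11)                             # score 0 / 1 / 2
ll0 <- sum(counts * log(counts / sum(counts)))     # intercept-only log-likelihood

aic <- c(M1 = 69.4, M2 = 70.8, M3 = 72.8)          # from the table above
k   <- 2 + c(M1 = 1, M2 = 2, M3 = 3)               # cut-points + predictors
ll  <- (2 * k - aic) / 2                           # log L = (2k - AIC) / 2

round(1 - ll / ll0, 4)                             # McFadden R^2
```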

Forest plot: odds ratios

p_forest_olog

Figure 14: BS — Ordered logit: odds ratios for belief accuracy. OR > 1 increases probability of higher accuracy score. Error bars = 95% CI. Dashed line = OR 1 (no effect). x-axis log scale.

Interpretation

MASC ToM (M1): OR = 1.371, p = 0.3661; the estimate is similar in M2 and M3 (OR ≈ 1.31). Higher Theory of Mind ability may improve belief accuracy by enabling better prediction of opponents’ decisions, and an OR > 1 is consistent with this interpretation. IRI Perspective Taking (M2): OR = 0.754, p = 0.4385. Cognitive empathy is directly relevant to inferring opponents’ intended strategies, so an OR > 1 would support a link between perspective-taking and prediction accuracy; the point estimate goes in the opposite direction. IRI Personal Distress (M3): OR = 1.045, p = 0.8977. Self-oriented distress may interfere with accurate belief formation (OR < 1 expected), but the estimate is essentially null. Given EPV ≤ 5 across all models, all estimates carry substantial uncertainty and none is significant.

5.2 — Demand choice when opponent signals Demand

Small-sample note

This analysis restricts the sample to participants who received a D (Demand) signal from their opponent (opp_signal_received = D). The dependent variable is whether they themselves also chose D (Demand), which, if the opponent follows through on the signal, leads to the (D,D) miscoordination outcome (0, 0). Given the small subsample, EPV falls below 10 for at least one specification, and Firth penalised logit is applied automatically.

Models

\[ \begin{aligned} \text{M1:} \quad & \text{logit}\,P(D) = \beta_0 + \beta_1\,\text{quiz\_err} \\ \text{M2:} \quad & \text{logit}\,P(D) = \beta_0 + \beta_1\,\text{quiz\_err} + \beta_2\,\text{CRT} \\ \text{M3:} \quad & \text{logit}\,P(D) = \beta_0 + \beta_1\,\text{CRT} \end{aligned} \]

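Sample: opp_signal_received = D only. n = 21, events = 12. EPV (events / k): M1 = 12, M2 = 6, M3 = 12; based on the less frequent outcome (21 - 12 = 9 non-events), EPV would be 9, 4.5 and 9. All models are estimated with Firth penalised logit.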

Results

BS — Demand (D) when opponent signals D (Demand)
DV = choice2=D | opp_signal=D. n=21, events=12. EPV: M1=12, M2=6, M3=12. All Firth penalised logit (brglm2).¹
Predictor	β	SE	z	p	OR	OR 2.5%	OR 97.5%
M1: Quiz only
Quiz errors [log(1+x)]	-0.112	0.523	-0.21	0.831	0.89	0.30	2.64
M2: Quiz + CRT
Quiz errors [log(1+x)]	-0.133	0.537	-0.25	0.804	0.88	0.31	2.51
CRT score (0–4)	-0.203	0.483	-0.42	0.674	0.82	0.32	2.10
M3: CRT only
CRT score (0–4)	-0.250	0.477	-0.52	0.600	0.78	0.28	1.95
¹ Firth penalised logit used for all models (small subsample; EPV < 10 in M2). OR > 1 → increases P(Demand (D) | opp signals D). 95% Wald CI.

Figure 15: Demand choice when opponent signals D (Demand): M1 (quiz only), M2 (quiz + CRT), M3 (CRT only). Odds ratios with 95% CI. OR > 1 increases P(Demand (D)). All Firth. x-axis log scale.

Interpretation

Quiz errors (M2: OR = 0.88, p = 0.804): a higher error rate on the comprehension quiz may reflect weaker understanding of the game’s conflict structure, which could increase stubborn Demand even when the opponent has already signalled Demand (risking (D,D) miscoordination). That conjecture predicts OR > 1; the estimate is below 1 and far from significant. CRT score (M2: OR = 0.82, p = 0.674): more reflective thinkers may better recognise that, when the opponent signals D (Demand), the best response for avoiding (0, 0) is to Yield (T) and accept the opponent’s preferred equilibrium (12 for oneself, 36 for the opponent) rather than insist on Demand and risk mutual miscoordination. An OR < 1 for CRT is consistent with this interpretation, but it is not significant. With EPV = 6 in M2, the estimates are exploratory and should be interpreted with caution.