SH — Stag Hunt

GT Behaviour · GTEMO Experiment

Author

Eric Guerci

Published

March 22, 2026

Payoff matrix — Row payoff, Column payoff

	L = T, Hare (col)	R = D, Stag (col)
T = Hare (row)	14, 14 ★	14, 0
D = Stag (row)	0, 14	20, 20 ♦

★ = Risk-dominant Nash equilibrium (Hare-Hare). ♦ = Pareto-dominant Nash equilibrium (Stag-Stag). In Stag Hunt, cheap talk is expected to be highly effective because both players want to coordinate, and signals are self-signaling and self-committing.
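The labels above can be checked directly against the matrix. The sketch below uses base R, and the object names (`A`, `B`, `is_ne`) are illustrative rather than taken from the report's code. It finds the pure-strategy Nash equilibria, compares the Harsanyi-Selten deviation-loss products that define risk dominance, and computes the belief threshold at which Stag becomes a best response.

```r
# Stag Hunt payoffs from the matrix above; strategies ordered T (Hare), D (Stag).
# A = row player's payoffs, B = column player's payoffs (symmetric game: B = t(A)).
A <- matrix(c(14, 14,
               0, 20),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("T", "D"), c("T", "D")))
B <- t(A)

# Cell (i, j) is a pure-strategy NE if i is a best response to j for the row
# player and j is a best response to i for the column player.
is_ne <- function(i, j) {
  A[i, j] == max(A[, j]) && B[i, j] == max(B[i, ])
}
cells <- expand.grid(row = c("T", "D"), col = c("T", "D"), stringsAsFactors = FALSE)
cells$ne <- mapply(is_ne, cells$row, cells$col)

# Harsanyi-Selten risk dominance: product of unilateral deviation losses.
loss_TT <- (A["T", "T"] - A["D", "T"]) * (B["T", "T"] - B["T", "D"])  # 14 * 14
loss_DD <- (A["D", "D"] - A["T", "D"]) * (B["D", "D"] - B["D", "T"])  # 6 * 6

# Belief threshold: Stag is a best response iff P(opponent plays D) >= p_star
# (here 20 * p >= 14).
p_star <- (A["T", "T"] - A["D", "T"]) /
  ((A["T", "T"] - A["D", "T"]) + (A["D", "D"] - A["T", "D"]))

print(cells)
cat("Deviation-loss products: Hare-Hare =", loss_TT, "; Stag-Stag =", loss_DD, "\n")
cat("Stag is a best response if P(opponent Stag) >=", p_star, "\n")
```

The products are 14 × 14 = 196 for Hare-Hare and 6 × 6 = 36 for Stag-Stag. Hare-Hare is therefore risk-dominant, even though Stag-Stag is Pareto-dominant. Equivalently, a player should hunt Stag only if they believe the opponent will also hunt Stag with probability of at least 0.7.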

1 — Equilibria & coordination

Objective

Describe the outcomes (equilibrium or miscoordination) reached by each couple in Part 1 and Part 2, and use paired McNemar tests to test whether cheap talk shifted coordination rates.

Equilibrium labels (SH)

(D,D) — Stag-Stag — Pareto-dominant NE; both hunt Stag → 20, 20 € ♦
(T,T) — Hare-Hare — Risk-dominant NE; both hunt Hare → 14, 14 € ★
(T,D) — Hare-Stag: P1 hunts Hare (14 €), P2 hunts Stag alone (0 €)
(D,T) — Stag-Hare: P1 hunts Stag alone (0 €), P2 hunts Hare (14 €)
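If the data store each player's choice as `T`/`D`, the labels above can be assigned mechanically. The sketch below assumes character inputs. `label_outcome()` and `is_coordinated()` are hypothetical helpers, and "coordinated" is taken to mean that both players chose the same action, i.e. one of the two pure equilibria.

```r
# Equilibrium labels for a couple's choices (P1 = row player, P2 = column player).
# Inputs are assumed to be character vectors with values "T" (Hare) or "D" (Stag).
label_outcome <- function(choice_p1, choice_p2) {
  labels <- c("D,D" = "Stag-Stag", "T,T" = "Hare-Hare",
              "T,D" = "Hare-Stag", "D,T" = "Stag-Hare")
  unname(labels[paste(choice_p1, choice_p2, sep = ",")])  # NA for unexpected codes
}

# Coordination: both players picked the same action, i.e. one of the two pure NE.
is_coordinated <- function(choice_p1, choice_p2) choice_p1 == choice_p2

label_outcome(c("D", "T", "T"), c("D", "T", "D"))
# returns c("Stag-Stag", "Hare-Hare", "Hare-Stag")
```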

Equilibrium distributions

p_eq_sh

Figure 1: SH — Equilibrium distributions in Part 1 (top) and Part 2 (bottom).

Coordination rates: Part 1 vs Part 2

coord_sh |>
  dplyr::select(part, x, n, pct, ci95) |>
  gt::gt() |>
  gt::tab_header(title    = "SH — Coordination rates: Part 1 vs Part 2",
                 subtitle = "95% Clopper-Pearson CI") |>
  gt::cols_label(part = "Phase", x = "n coordinated", n = "N",
                 pct = "%", ci95 = "95% CI") |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_column_labels())
p_coord_sh

SH — Coordination rates: Part 1 vs Part 2
95% Clopper-Pearson CI
Phase	n coordinated	N	%	95% CI
Part 1	8	16	50.0%	[24.7%, 75.3%]
Part 2	12	16	75.0%	[47.6%, 92.7%]

Figure 2: SH — Coordination rate in Part 1 vs Part 2 (couple level). 95% Clopper-Pearson CI.
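The intervals in this table are exact Clopper-Pearson intervals, which base R's `binom.test()` returns in `$conf.int`. The helper name `cp_row()` below is hypothetical. The counts are the couple-level counts reported above.

```r
# Exact (Clopper-Pearson) 95% CI for a binomial proportion.
# stats::binom.test() returns the Clopper-Pearson interval in $conf.int.
cp_row <- function(x, n, conf = 0.95) {
  ci <- binom.test(x, n, conf.level = conf)$conf.int
  data.frame(x = x, n = n, pct = 100 * x / n,
             lower = 100 * ci[1], upper = 100 * ci[2])
}

# Couple-level counts from the table above (coordinated couples / all couples)
coord <- rbind(cp_row(8, 16), cp_row(12, 16))
coord$part <- c("Part 1", "Part 2")
print(coord, digits = 3)
```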

McNemar tests

tab_mc_sh |>
  gt::gt() |>
  gt::tab_header(title    = "McNemar tests — SH (couple level)",
                 subtitle = "Paired Part 1 vs Part 2") |>
  gt::cols_label(label = "Test", statistic = "χ²", p_value = "p-value",
                 note = "Note") |>
  gt::fmt_number(columns = c(statistic, p_value), decimals = 4, rows = !is.na(statistic)) |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_body(columns = p_value,
                                           rows = !is.na(p_value) & p_value < 0.05)) |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_column_labels())

McNemar tests — SH (couple level)
Paired Part 1 vs Part 2
Test	χ²	p-value	Note
SH — coord Part1 vs Part2	2.2500	0.1336	OK
SH — mutual Stag choice Part1 vs Part2	0.0000	1.0000	OK

Note

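The Pareto-dominant (Stag-Stag) outcome was reached by 50.0% of pairs in Part 1 and by 50.0% in Part 2. Because coordination = Stag-Stag + Hare-Hare, the net rise in coordination from 8 to 12 couples came entirely from Hare-Hare outcomes: 0 couples in Part 1 versus 4 in Part 2. A significant McNemar result on the mutual-Stag row would indicate that cheap talk systematically changed how many couples reached the Pareto-dominant (Stag-Stag) equilibrium. Here the Stag-Stag rate is unchanged (χ² = 0, p = 1).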
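The paired structure behind the first McNemar row can be made explicit. R's default `mcnemar.test()` statistic is continuity-corrected: (|b − c| − 1)² / (b + c), where b and c are the discordant cells. With 8 and 12 coordinated couples and χ² = 2.25, the only consistent split is b = 0 couples coordinating in Part 1 only and c = 4 in Part 2 only. The sketch below rebuilds that 2×2 table. This is a reconstruction from the reported figures, not the raw data.

```r
# Couple-level paired table: rows = Part 1 outcome, columns = Part 2 outcome.
# Reconstructed from the reported margins (8/16 and 12/16 coordinated) and
# the continuity-corrected statistic of 2.25; not taken from the raw data.
coord_tab <- matrix(c(8, 4,   # column 1, Part 2 coordinated: Part 1 coord / miscoord
                      0, 4),  # column 2, Part 2 miscoord:    Part 1 coord / miscoord
                    nrow = 2,
                    dimnames = list(part1 = c("coord", "miscoord"),
                                    part2 = c("coord", "miscoord")))
mc <- mcnemar.test(coord_tab)  # correct = TRUE by default: (|b - c| - 1)^2 / (b + c)
mc$statistic                   # 2.25
mc$p.value                     # about 0.134

# Coordination = Stag-Stag + Hare-Hare, so net Hare-Hare counts follow directly.
coordinated <- c(part1 = 8, part2 = 12)
stag_stag   <- c(part1 = 8, part2 = 8)   # 50.0% of 16 couples in both parts
hare_hare   <- coordinated - stag_stag   # part1 = 0, part2 = 4
```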

Conditioning on session gender

tab_cond_coord_sh |>
  gt::gt() |>
  gt::tab_header(title    = "SH — Coordination and Stag choice by session gender",
                 subtitle = "χ² with Monte Carlo simulated p-value (B = 2000), couple level") |>
  gt::cols_label(Outcome = "Outcome", Factor = "Factor", Test = "χ²(sim.) test") |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_column_labels()) |>
  gt::opt_stylize(style = 1) |>
  gt::tab_options(table.font.size = 13)
p_coord_gender_sh

SH — Coordination and Stag choice by session gender
χ² with Monte Carlo simulated p-value (B = 2000), couple level
Outcome	Factor	χ²(sim.) test
Coordination Part 2	Session gender	χ²(sim.): p = 0.544 ns
Coordination Part 1	Session gender	χ²(sim.): p = 1.000 ns
Mutual Stag choice Part 2	Session gender	χ²(sim.): p = 1.000 ns
Mutual Stag choice Part 1	Session gender	χ²(sim.): p = 1.000 ns

Figure 3: SH — P(Coordination Part 2) by session gender (couple level). Error bars = 95% Clopper-Pearson CI.
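The conditioning tables report Pearson χ² tests with simulated rather than asymptotic p-values. This is the safer choice when expected counts are this small. In base R this corresponds to `chisq.test(..., simulate.p.value = TRUE, B = 2000)`. The sketch below uses a hypothetical 2×2 table of session gender (groups `A`/`B`, not the experiment's labels) by Part 2 coordination.

```r
# Pearson chi-squared test with a Monte Carlo p-value (B = 2000 tables simulated
# with the observed margins). Hypothetical couple-level counts; group labels A/B.
set.seed(2026)
tab <- matrix(c(6, 6,   # coordinated in Part 2: group A / group B
                2, 2),  # not coordinated:       group A / group B
              nrow = 2,
              dimnames = list(session_gender = c("A", "B"),
                              coord_part2 = c("yes", "no")))
ct <- chisq.test(tab, simulate.p.value = TRUE, B = 2000)
ct$statistic  # 0: observed counts equal expected counts
ct$p.value    # (1 + 2000) / (2000 + 1) = 1
```

The simulated p-value is (1 + number of simulated statistics at least as large as the observed one) / (B + 1). When the observed table sits exactly at its expected counts, as in this toy example, every simulated statistic qualifies, so p = 1. The p = 1.000 entries above most likely reflect tables at, or very close to, their expected counts.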

2 — Choice & signal distributions

Objective

Describe the marginal distributions of choices and signals (Part 1 choice → signal sent → Part 2 choice), and examine how the opponent’s signal shapes Part 2 behaviour. All proportions use exact 95% Clopper-Pearson CIs.

Distributions table

tab_sh

SH — Choice and signal distributions
95% Clopper-Pearson CI
Choice / Signal	n	N	%	95% CI
Part 1
T	8	32	25.0%	[11.5%, 43.4%]
D	24	32	75.0%	[56.6%, 88.5%]
Signal
T	9	32	28.1%	[13.7%, 46.7%]
D	23	32	71.9%	[53.3%, 86.3%]
Part 2
T	12	32	37.5%	[21.1%, 56.3%]
D	20	32	62.5%	[43.7%, 78.9%]

Proportions by phase

p_sh_dist

Figure 4: SH — Proportions of each choice/signal type with 95% Clopper-Pearson CIs. Left: Part 1 choices; centre: signals sent; right: Part 2 choices.

Part 1 → Part 2 snapshot

Stag choice (D) in Part 1: 75.0%. The dominant signal was D (71.9%). Stag choice in Part 2: 62.5%. An increase in D (Stag) from Part 1 to Part 2 would be consistent with cheap talk shifting players from the safe Hare equilibrium to the Pareto-dominant Stag equilibrium. Here the Stag rate moved in the opposite direction (75.0% → 62.5%). In the PD, defection is strictly dominant, and MP has no pure-strategy equilibrium to coordinate on. SH, by contrast, has two strict pure-strategy equilibria, and Hare is merely risk-dominant rather than dominant, so signals can plausibly serve as coordination devices.

Within-subject shift: McNemar test (Part 1 vs Part 2)

tab_mcnemar_sh |>
  gt::gt() |>
  gt::tab_header(
    title    = "McNemar test — SH: choice1_D vs choice2_D",
    subtitle = "Paired within-individual"
  ) |>
  gt::cols_label(statistic = "χ²", p_value = "p-value", n = "n", note = "Note") |>
  gt::fmt_number(columns = c(statistic, p_value), decimals = 4) |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_body(columns = p_value,
                                           rows = !is.na(p_value) & p_value < 0.05)) |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_column_labels())

McNemar test — SH: choice1_D vs choice2_D
Paired within-individual
χ²	p-value	n	Note
1.1250	0.2888	32	OK

Note

A significant McNemar result (p < 0.05) would indicate a systematic shift in individual Stag (D) choice rates between Part 1 and Part 2, consistent with an effect of pre-play cheap talk. Here p = 0.289, so no significant shift is detected. In SH, Hare is not a dominant strategy: it is the risk-dominant equilibrium choice, but Stag is the best response if the opponent plays Stag. Cheap talk is theoretically credible in SH because both players prefer the Pareto-dominant equilibrium (20, 20) over the risk-dominant one (14, 14), which makes signals self-signaling and self-committing.
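The individual-level table can be rebuilt in the same way from counts reported elsewhere in this report. Stag choices fell from 24 to 20 out of 32, and 8 players switched action, which forces 6 D → T and 2 T → D switches. The sketch below reconstructs the paired table (again, not the raw data). It then adds the exact binomial form of McNemar's test, which some analysts prefer when there are few discordant pairs.

```r
# Individual-level paired table: rows = Part 1 choice, columns = Part 2 choice.
# Reconstructed from reported counts (Stag: 24 -> 20 of 32; 8 switchers), which
# force 6 D -> T and 2 T -> D switches; not taken from the raw data.
choice_tab <- matrix(c(18, 2,   # Part 2 = D: Part 1 D / Part 1 T
                        6, 6),  # Part 2 = T: Part 1 D / Part 1 T
                     nrow = 2,
                     dimnames = list(part1 = c("D", "T"), part2 = c("D", "T")))
mcnemar.test(choice_tab)$statistic  # (|6 - 2| - 1)^2 / 8 = 1.125

# Exact McNemar variant: under H0 each of the 8 discordant players is equally
# likely to switch in either direction, so test 2 T -> D switches out of 8.
n_disc <- choice_tab["D", "T"] + choice_tab["T", "D"]
exact  <- binom.test(choice_tab["T", "D"], n_disc, p = 0.5)
exact$p.value  # 74 / 256, about 0.289
```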

Opponent signal and the information set before Part 2

Important

Decision sequence. After sending their own signal and before making the Part 2 choice, each player observes the opponent’s signal. The Part 2 decision is taken with a two-dimensional information set: (own signal sent) × (opponent’s signal received). The four possible information sets are: T/T, T/D, D/T, D/D.
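The information set is easy to derive from long-format data. The sketch below assumes one row per player and exactly two players per couple, ordered within couple. The data frame and its values are hypothetical; only the column name `opp_signal_received` appears in the report.

```r
# Hypothetical long-format data: one row per player, two players per couple.
d <- data.frame(
  couple = c(1, 1, 2, 2, 3, 3),
  player = c(1, 2, 1, 2, 1, 2),
  signal = c("D", "D", "T", "D", "T", "T"),
  stringsAsFactors = FALSE
)
d <- d[order(d$couple, d$player), ]

# Each player receives the partner's signal: reverse the two rows within each couple.
d$opp_signal_received <- ave(d$signal, d$couple, FUN = rev)

# Information set before Part 2 = own signal / opponent's signal.
d$info_set <- factor(paste(d$signal, d$opp_signal_received, sep = "/"),
                     levels = c("T/T", "T/D", "D/T", "D/D"))
table(d$info_set)
```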

p_sig_heatmap_sh

Figure 5: SH — Joint distribution of own signal × opponent signal received. Values show count and share.

p_choice2_by_oppsig_sh

Figure 6: SH — P(choice₂ = D, Stag) stratified by opponent’s signal. 95% Clopper-Pearson CI.

Cheap talk in SH: signal credibly shifts Stag choice

In the Stag Hunt, neither T (Hare) nor D (Stag) is a strictly dominant strategy. Hare is merely the risk-dominant choice: it is the safer option under uncertainty, but both players prefer mutual Stag (20, 20) over mutual Hare (14, 14). This makes cheap talk highly credible. A player who signals D (Stag) is credibly committing to the Pareto-dominant equilibrium, because following through is individually rational as long as the opponent also plays Stag. Signals are both self-signaling (it is rational to send a D signal only if you intend to play D) and self-committing (it is rational to follow a D signal once sent, provided the opponent is expected to follow theirs). A large gap in Stag choice rates between players who received a D signal and those who received a T signal would support the effectiveness of cheap talk in SH.
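The gap can be computed from counts reported elsewhere in this document. Nine players received a T signal, matching the 9 T signals sent, and 4 of them chose D (Section 5.2). The other 23 received D, and since 20 players chose D in Part 2, 16 of those 23 chose D. These derived counts should correspond to the quantities plotted in Figure 6. The object names below are illustrative.

```r
# P(choice2 = D) by the opponent's signal, with Clopper-Pearson 95% CIs.
# Counts derived from totals in this report: 9 players received a T signal and
# 4 of them chose D; the other 23 received D, and 20 - 4 = 16 of them chose D.
stag_by_sig <- data.frame(opp_signal = c("T", "D"), x = c(4, 16), n = c(9, 23))
ci <- t(mapply(function(x, n) binom.test(x, n)$conf.int,
               stag_by_sig$x, stag_by_sig$n))
stag_by_sig$pct   <- 100 * stag_by_sig$x / stag_by_sig$n
stag_by_sig$lower <- 100 * ci[, 1]
stag_by_sig$upper <- 100 * ci[, 2]
print(stag_by_sig, digits = 3)

# Gap in Stag rates (received D minus received T), in percentage points
gap <- stag_by_sig$pct[2] - stag_by_sig$pct[1]
```

This gives 69.6% versus 44.4%, a gap of about 25 percentage points. The two CIs overlap widely, because only 9 players received a T signal.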

p_choice2_infoset_sh

Figure 7: SH — P(choice₂ = D, Stag) by full information set (own/opp). Colour = own signal. 95% CI. Information sets with no observations are omitted.

SH interpretation

The critical test is the D/D information set, where both players have signalled Stag (D). If both players trust the signal, this should produce nearly 100% Stag choice and converge on the Pareto-dominant (D,D) equilibrium. Values below 100% may reflect residual strategic risk aversion (fear of being the only Stag hunter). The T/T information set (both signalled Hare) represents the pull of the risk-dominant equilibrium, and here Hare choice should be near 100%. A large D/D vs T/T gap in Stag choice rates is the clearest evidence of cheap talk effectiveness in SH.

p_follow_opp_sh

Figure 8: SH — Proportion of players whose Part 2 choice matches the opponent’s signal received.

Conditioning on gender and role

tab_cond_sig_sh |>
  gt::gt() |>
  gt::tab_header(title = "SH — Choice and signal distributions by gender and role",
                 subtitle = "χ² with Monte Carlo simulated p-value (B = 2000)") |>
  gt::cols_label(Outcome = "Outcome", Factor = "Factor", Test = "χ²(sim.) test") |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_column_labels()) |>
  gt::opt_stylize(style = 1) |>
  gt::tab_options(table.font.size = 13)

SH — Choice and signal distributions by gender and role
χ² with Monte Carlo simulated p-value (B = 2000)
Outcome	Factor	χ²(sim.) test
Part 1 = T	Gender	χ²(sim.): p = 1.000 ns
Part 1 = T	Role	χ²(sim.): p = 0.667 ns
Signal = T	Gender	χ²(sim.): p = 1.000 ns
Signal = T	Role	χ²(sim.): p = 0.432 ns
Part 2 = T	Gender	χ²(sim.): p = 0.722 ns
Part 2 = T	Role	χ²(sim.): p = 1.000 ns

p_cond_sig_sh

Figure 9: SH — P(Signal = T) by gender and role. Error bars = 95% Clopper-Pearson CI. Dashed line = 50%.

3 — Signal honesty & consistency

Objective

Examine whether signals are honest (i.e. match the action eventually taken in Part 2) and consistent with Part 1 choices, and assess how often players switched strategy between Part 1 and Part 2.

Honesty and consistency proportions

tab_sec2_sh |>
  gt::gt() |>
  gt::tab_header(title    = "SH — Signal honesty and consistency",
                 subtitle = "95% Clopper-Pearson CI. Each row is a binary indicator (1 = yes, 0 = no).") |>
  gt::cols_label(variable = "Measure", n = "n (=1)", N = "N",
                 pct = "%", ci95 = "95% CI") |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_column_labels()) |>
  gt::tab_footnote(
    footnote = "1 if the signal sent equals the Part 2 choice (e.g. sent T and chose T in Part 2). Measures whether players followed through on their signal.",
    locations = gt::cells_body(columns = variable, rows = 1)
  ) |>
  gt::tab_footnote(
    footnote = "1 if the signal sent equals the Part 1 choice (e.g. signalled T and had also chosen T in Part 1). Measures whether the signal reflects past behaviour — independent of Part 2.",
    locations = gt::cells_body(columns = variable, rows = 2)
  ) |>
  gt::tab_footnote(
    footnote = "1 if the Part 2 choice differs from the Part 1 choice (choice1 \u2260 choice2). Measures switching behaviour across rounds, independent of the signal. N may differ from rows 1\u20132 due to missing values in different variables.",
    locations = gt::cells_body(columns = variable, rows = 3)
  ) |>
  gt::tab_options(table.font.size = 13)

SH — Signal honesty and consistency
95% Clopper-Pearson CI. Each row is a binary indicator (1 = yes, 0 = no).
Measure	n (=1)	N	%	95% CI
Signal honest (signal = choice2)¹	25	32	78.1%	[60.0%, 90.7%]
Signal consistent with Part 1 (signal = choice1)²	29	32	90.6%	[75.0%, 98.0%]
Strategy changed Part 1 → Part 2 (choice1 ≠ choice2)³	8	32	25.0%	[11.5%, 43.4%]
¹ 1 if the signal sent equals the Part 2 choice (e.g. sent T and chose T in Part 2). Measures whether players followed through on their signal.
² 1 if the signal sent equals the Part 1 choice (e.g. signalled T and had also chosen T in Part 1). Measures whether the signal reflects past behaviour — independent of Part 2.
³ 1 if the Part 2 choice differs from the Part 1 choice (choice1 ≠ choice2). Measures switching behaviour across rounds, independent of the signal. N may differ from rows 1–2 due to missing values in different variables.
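The three indicators in the footnotes are simple element-wise comparisons. The sketch below uses hypothetical vectors named after the variables in the footnotes (`choice1`, `signal`, `choice2`), with one missing Part 2 answer to show why N can differ between rows.

```r
# Hypothetical individual-level vectors (values "T"/"D"); NA marks a missing answer.
choice1 <- c("D", "D", "T", "D", "T")
signal  <- c("D", "D", "T", "T", "D")
choice2 <- c("D", "T", "T", "T", NA)

honest     <- as.integer(signal == choice2)   # followed through on own signal
consistent <- as.integer(signal == choice1)   # signal repeats Part 1 choice
changed    <- as.integer(choice1 != choice2)  # switched action between parts

# n (= 1) and N among non-missing values; N can differ across indicators.
summarise_ind <- function(v) c(n = sum(v, na.rm = TRUE), N = sum(!is.na(v)))
rbind(honest     = summarise_ind(honest),
      consistent = summarise_ind(consistent),
      changed    = summarise_ind(changed))
```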

Binomial test: signal honesty vs 50%

tab_binom_honest_sh |>
  gt::gt() |>
  gt::tab_header(title    = "Binomial test: P(Honest) vs H₀ = 0.50",
                 subtitle = "Two-sided test; 95% Clopper-Pearson CI") |>
  gt::cols_label(x = "n honest", n = "N", pct = "%", ci95 = "95% CI",
                 p_value = "p-value") |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_body(columns = p_value,
                                           rows = p_value < 0.05)) |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_column_labels())

Binomial test: P(Honest) vs H₀ = 0.50
Two-sided test; 95% Clopper-Pearson CI
n honest	N	%	95% CI	p-value
25	32	78.1%	[60.0%, 90.7%]	0.0021
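This row corresponds to a standard exact binomial test. A minimal base-R version is shown below; `bt` is an illustrative name. Because the null distribution is symmetric at p = 0.5, the two-sided p-value is twice the upper-tail probability P(X ≥ 25).

```r
# Two-sided exact binomial test of P(honest) = 0.5, with Clopper-Pearson 95% CI.
bt <- binom.test(x = 25, n = 32, p = 0.5, alternative = "two.sided")
bt$p.value   # about 0.0021
bt$conf.int  # about 0.600 to 0.907

# With p = 0.5 the null distribution is symmetric, so the two-sided p-value is
# twice the upper tail P(X >= 25).
2 * pbinom(24, size = 32, prob = 0.5, lower.tail = FALSE)
```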

Note

In Stag Hunt, a signal is honest if the player's Part 2 action matches what they signalled. In the PD, honesty is only meaningful for the dominated cooperative strategy; in SH, by contrast, both signals (Stag/D and Hare/T) correspond to Nash equilibria. Honesty therefore measures pure coordination commitment. A player who signals Stag and plays Stag is attempting the Pareto-dominant equilibrium, while a player who signals Hare and plays Hare is securing the risk-dominant equilibrium. The aggregate honesty rate thus measures how far players follow through on their cheap talk when trying to coordinate on an equilibrium.

Strategic transition heatmaps

p_sankey_sh

Figure 10: SH — Strategic transitions. Left: Part 1 → Signal (row %: conditional on Part 1 choice). Right: Signal → Part 2 (row %: conditional on signal sent).

Conditioning on gender and role

tab_cond_honest_sh |>
  gt::gt() |>
  gt::tab_header(title    = "SH — Signal honesty and strategy change by gender and role",
                 subtitle = "χ² with Monte Carlo simulated p-value (B = 2000)") |>
  gt::cols_label(Outcome = "Outcome", Factor = "Factor", Test = "χ²(sim.) test") |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_column_labels()) |>
  gt::opt_stylize(style = 1) |>
  gt::tab_options(table.font.size = 13)

SH — Signal honesty and strategy change by gender and role
χ² with Monte Carlo simulated p-value (B = 2000)
Outcome	Factor	χ²(sim.) test
Signal honest	Gender	χ²(sim.): p = 1.000 ns
Signal honest	Role	χ²(sim.): p = 1.000 ns
Strategy changed	Gender	χ²(sim.): p = 0.681 ns
Strategy changed	Role	χ²(sim.): p = 0.683 ns

p_cond_honest_sh

Figure 11: SH — P(Honest) by gender (left) and role (right). Error bars = 95% Clopper-Pearson CI. Dashed line = 50%.

4 — Belief accuracy & bonus

The two belief questions

After Part 2, each player answered two incentivised questions about their beliefs:

Belief 1 — First-order belief: “What do you think your opponent chose in Part 2?” (T or D). Scored correct (GT_right_guess1 = 1) if the player’s prediction matched the opponent’s actual Part 2 choice. Bonus: +2€ if correct.

Belief 2 — Second-order belief: “What do you think your opponent believes you chose in Part 2?” (T or D). Scored correct (GT_right_guess2 = 1) if the player correctly identified what the opponent believed about the player’s own choice. Bonus: +2€ if correct.

The belief accuracy score = GT_right_guess1 + GT_right_guess2 ∈ {0, 1, 2}. The belief bonus = score × 2€ ∈ {0€, 2€, 4€}.
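The scoring rule can be written out for a single couple. The sketch below uses hypothetical answers. The second-order scoring is an assumption based on our reading of the definition above: a player's guess about what the opponent believes is compared with the opponent's actual first-order answer. `GT_right_guess1`/`GT_right_guess2` are the report's variable names; everything else is illustrative.

```r
# One hypothetical couple; all answers coded "T"/"D".
choice2 <- c(p1 = "D", p2 = "D")   # actual Part 2 choices
guess1  <- c(p1 = "D", p2 = "T")   # "What did your opponent choose in Part 2?"
guess2  <- c(p1 = "T", p2 = "D")   # "What does your opponent think you chose?"
opp     <- c(p1 = "p2", p2 = "p1") # each player's opponent

# First-order belief: correct if it matches the opponent's actual choice.
GT_right_guess1 <- as.integer(guess1 == choice2[opp])
# Second-order belief (assumed scoring): correct if it matches the opponent's
# first-order answer, i.e. what the opponent believed about this player.
GT_right_guess2 <- as.integer(guess2 == guess1[opp])

belief_score <- GT_right_guess1 + GT_right_guess2  # in {0, 1, 2}
belief_bonus <- 2 * belief_score                   # euros: 0, 2 or 4
belief_score  # 2 1
belief_bonus  # 4 2
```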

Objective

Describe the distribution of belief accuracy scores (0 = both beliefs wrong; 1 = one correct; 2 = both correct) and the associated belief bonus payoff. Assess whether belief accuracy is correlated with coordination outcomes at the couple level.

Belief accuracy distribution

p_belief_bar_sh

Figure 12: SH — Distribution of belief accuracy scores.

Hypothesis tests: beliefs, cognitive ability & coordination

SH — Belief accuracy: hypothesis comparison
H1–H2: individual level. H3: couple level. Stat = Spearman ρ or φ (phi).
	Level	Hypothesis	X	Y	Test	Expected	Stat	p	n
H1	Individual	Reflective thinkers (high CRT) predict opponent’s choice more accurately	CRT score (0–4)	Belief accuracy (0–2)	Spearman ρ	Positive	-0.017	0.927	32
H2	Individual	Players who made more quiz errors have less accurate beliefs	Quiz errors [log(1+x)]	Belief accuracy (0–2)	Spearman ρ	Negative	0.141	0.441	32
H3	Couple	Couples where both players have perfect beliefs coordinate more in Part 2	Coord. Part 2 (0/1)	Both perfect beliefs (0/1)	Fisher exact + φ¹	Positive	0.577	0.077	16
¹ H3 stat = phi coefficient (φ); p-value from two-sided Fisher exact test on 2×2 contingency table.

Note

H1 tests whether more reflective players (higher CRT) are better at predicting their opponent: if strategic reasoning drives belief formation, a positive Spearman ρ is expected. H2 tests whether players who struggled with game comprehension (more quiz errors) hold less accurate beliefs; the expected direction is negative. H3 tests whether couples in which both players had perfect beliefs were more likely to coordinate in Part 2. It is the only couple-level hypothesis and uses Fisher's exact test, given the small N and binary outcomes. H1 and H2 operate at the individual level while H3 operates at the couple level, so their statistics are not directly comparable.
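For H3, the φ coefficient and the two-sided Fisher test come from the same 2×2 table. The report does not display that table, so the counts below are hypothetical. They share the 12-of-16 coordination margin, and with them the sketch returns φ ≈ 0.577 and p ≈ 0.077. `phi()` is an illustrative helper, not a function from the report's code.

```r
# Couple-level 2x2 table: rows = both players had perfect beliefs (yes/no),
# columns = coordinated in Part 2 (yes/no). Hypothetical counts.
tab <- matrix(c(8, 4, 0, 4), nrow = 2,
              dimnames = list(both_perfect = c("yes", "no"),
                              coord2 = c("yes", "no")))

# Phi coefficient for a 2x2 table: (ad - bc) / sqrt(product of all margins)
phi <- function(m) {
  (m[1, 1] * m[2, 2] - m[1, 2] * m[2, 1]) /
    sqrt(prod(rowSums(m)) * prod(colSums(m)))
}
phi(tab)                  # about 0.577
fisher.test(tab)$p.value  # two-sided, about 0.077
```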

Conditioning on gender and role

p_cond_belief_sh

Figure 13: SH — Belief accuracy score (0/1/2) by gender (left) and role (right). Bars show proportion within each group; labels show % and count. Score 0 = both beliefs wrong, 1 = one correct, 2 = both correct.

5 — Econometric models

5.1 — Determinants of belief accuracy

Estimate an ordered logit (proportional-odds model) for the belief accuracy score (0 = both wrong, 1 = one correct, 2 = both correct), using Theory of Mind (MASC) and IRI subscales as predictors. The proportional-odds assumption implies a single log-odds shift per one-standard-deviation increase in each (z-scored) predictor, shared across both thresholds (0→1 and 1→2).

Small-sample caveat

With n = 32 participants (score 0: n = 4; 1: n = 9; 2: n = 19), EPV (events per variable) is computed as min(n₀, n₂) / k, where k is the number of predictors: M1 = 4, M2 = 2, M3 = 1.3. All are well below the commonly recommended minimum of 10, so all results are exploratory and should be treated as hypothesis-generating.
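The EPV values follow directly from the score counts and the number of predictors in each model:

```r
# Events per variable as defined above: min(n0, n2) / k, k = number of predictors.
n_by_score <- c(`0` = 4, `1` = 9, `2` = 19)
k <- c(M1 = 1, M2 = 2, M3 = 3)
epv <- min(n_by_score["0"], n_by_score["2"]) / k
round(epv, 1)  # M1 = 4, M2 = 2, M3 = 1.3
```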

Model specifications

\[ \begin{aligned} \text{M1:} \quad & \text{logit}\,P(Y \le j) = \alpha_j - \beta_1\,\text{MASC}_z \\ \text{M2:} \quad & \text{logit}\,P(Y \le j) = \alpha_j - \beta_1\,\text{MASC}_z - \beta_2\,\text{IRI-PT}_z \\ \text{M3:} \quad & \text{logit}\,P(Y \le j) = \alpha_j - \beta_1\,\text{MASC}_z - \beta_2\,\text{IRI-PT}_z - \beta_3\,\text{IRI-PD}_z \end{aligned} \]

MASC = Theory of Mind total score; IRI-PT = Perspective Taking subscale (cognitive empathy — most directly linked to predicting opponents’ decisions); IRI-PD = Personal Distress subscale (self-oriented distress — may impair strategic prediction). All scores z-standardised. OR > 1 shifts probability towards higher belief accuracy.
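These specifications map onto `MASS::polr`, whose parametrisation logit P(Y ≤ j) = ζ_j − η matches the equations above, with α_j = ζ_j. The sketch below fits M3 on simulated stand-in data, not the experiment's data; the column names `MASC_z`, `IRI_PT_z` and `IRI_PD_z` are hypothetical. It then derives β, SE, t, the two-tailed normal p-value and the OR, as in the coefficient table. For a fitted `polr` object, `exp(confint(m3))` would give profile-likelihood intervals for the ORs.

```r
library(MASS)  # polr(): proportional-odds (ordered logit) model

# Simulated stand-in data (NOT the experiment's data): 32 players with the same
# score distribution as the report (4 / 9 / 19) and three z-scored predictors.
set.seed(1)
score <- rep(0:2, times = c(4, 9, 19))
dat <- data.frame(
  accuracy = factor(score, levels = 0:2, ordered = TRUE),
  MASC_z   = as.numeric(scale(score + rnorm(32, sd = 2))),
  IRI_PT_z = as.numeric(scale(rnorm(32))),
  IRI_PD_z = as.numeric(scale(rnorm(32)))
)

# M3: logit P(Y <= j) = zeta_j - (b1 MASC + b2 IRI-PT + b3 IRI-PD)
m3 <- polr(accuracy ~ MASC_z + IRI_PT_z + IRI_PD_z, data = dat, Hess = TRUE)

# Slope rows only (drop the cut-points "0|1" and "1|2")
est <- summary(m3)$coefficients[names(coef(m3)), , drop = FALSE]
out <- data.frame(
  beta = est[, "Value"],
  se   = est[, "Std. Error"],
  t    = est[, "t value"],
  p    = 2 * pnorm(-abs(est[, "t value"])),  # two-tailed normal-approximation p
  OR   = exp(est[, "Value"])
)
round(out, 3)
```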

Coefficient table

tab_olog_gt

SH — Ordered logit: determinants of belief accuracy
DV = belief accuracy score (0/1/2, ordered). n = 32 (score 0: n=4; 1: n=9; 2: n=19). MASC/IRI z-scored. OR > 1 increases probability of higher accuracy.¹
Predictor	β	SE	t	p	OR	OR 2.5%	OR 97.5%
M1: MASC only
MASC ToM score (z)	0.277	0.331	0.835	0.4036	1.319	0.689	2.525
M2: MASC + IRI-PT
MASC ToM score (z)	0.341	0.346	0.985	0.3248	1.406	0.706	2.865
IRI Perspective Taking (z)	0.259	0.360	0.720	0.4713	1.296	0.635	2.702
M3: MASC + IRI-PT + IRI-PD
MASC ToM score (z)	0.479	0.366	1.307	0.1911	1.615	0.788	3.457
IRI Perspective Taking (z)	0.404	0.422	0.955	0.3394	1.497	0.662	3.651
IRI Personal Distress (z)	-0.759	0.431	-1.762	0.0780	0.468	0.183	1.031
¹ Ordered logit (proportional-odds, MASS::polr). p-values from two-tailed z-test on t-statistic. CI from profile likelihood where convergent, otherwise Wald. All models MLE; EPV < 10 — interpret cautiously.

Goodness of fit

tab_gof_olog_gt

SH — Ordered logit: goodness of fit
DV = belief accuracy score (0/1/2). EPV = min(n₀, n₂) / k.
Model	n	Predictors	EPV	AIC	McFadden R²
M1: MASC only	32	1	4.0	64.6	0.0117
M2: MASC + IRI-PT	32	2	2.0	66.1	0.0205
M3: MASC + IRI-PT + IRI-PD	32	3	1.3	64.5	0.0801

Forest plot: odds ratios

p_forest_olog

Figure 14: SH — Ordered logit: odds ratios for belief accuracy. OR > 1 increases probability of higher accuracy score. Error bars = 95% CI. Dashed line = OR 1 (no effect). x-axis log scale.

Interpretation

MASC ToM: OR = 1.319 in M1 (p = 0.404), 1.406 in M2 (p = 0.325) and 1.615 in M3 (p = 0.191). Higher Theory of Mind ability may improve belief accuracy by enabling better prediction of opponents' decisions, and OR > 1 is consistent with this interpretation. IRI Perspective Taking: OR = 1.296 in M2 (p = 0.471) and 1.497 in M3 (p = 0.339). Cognitive empathy is directly relevant to inferring opponents' intended strategies, so OR > 1 is in the expected direction. IRI Personal Distress (M3 only): OR = 0.468, p = 0.078, 95% CI [0.183, 1.031]. Self-oriented distress may interfere with accurate belief formation (OR < 1 expected). No estimate is significant at the 5% level, and with EPV ≤ 4 in all models, all estimates carry substantial uncertainty.

5.2 — Stag choice under Hare signal

Small-sample note

This analysis restricts the sample to participants who received a T (Hare) signal from their opponent (opp_signal_received = T). The dependent variable is whether they nonetheless chose D (Stag), the Pareto-dominant action. Because the subsample is small, EPV falls below 10, so Firth's penalised logit is applied. The penalty reduces small-sample bias and keeps estimates finite even under complete separation.

Models

\[ \begin{aligned} \text{M1:} \quad & \text{logit}\,P(D) = \beta_0 + \beta_1\,\text{quiz\_err} \\ \text{M2:} \quad & \text{logit}\,P(D) = \beta_0 + \beta_1\,\text{quiz\_err} + \beta_2\,\text{CRT} \\ \text{M3:} \quad & \text{logit}\,P(D) = \beta_0 + \beta_1\,\text{CRT} \end{aligned} \]

Sample: opp_signal_received = T only. n = 9, events = 4. EPV: M1 = 4, M2 = 2, M3 = 4. All estimated with Firth penalised logit.
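The report fits these models with brglm2. With that package loaded, the usual call is `glm(..., family = binomial(), method = "brglmFit")`. As a dependency-free illustration of what Firth's penalty does, the sketch below implements it directly in base R. It is not the report's code: the function name `firth_logit` and the example data are hypothetical. It iterates on Firth's modified score, using the Fisher information as the step matrix.

```r
# Firth bias-reduced logistic regression (hypothetical helper, base R only).
# Modified score: U*(beta) = X' (y - p + h * (1/2 - p)), h = weighted leverages.
firth_logit <- function(X, y, tol = 1e-8, max_iter = 100) {
  beta <- rep(0, ncol(X))
  for (iter in seq_len(max_iter)) {
    p <- plogis(drop(X %*% beta))
    w <- p * (1 - p)
    info_inv <- solve(crossprod(X, X * w))            # (X'WX)^{-1}
    h <- w * rowSums((X %*% info_inv) * X)           # diagonal of the hat matrix
    score <- crossprod(X, y - p + h * (0.5 - p))     # Firth-modified score
    step <- drop(info_inv %*% score)
    beta <- beta + step
    if (max(abs(step)) < tol) break
  }
  p <- plogis(drop(X %*% beta))
  se <- sqrt(diag(solve(crossprod(X, X * (p * (1 - p))))))
  names(beta) <- names(se) <- colnames(X)
  list(coef = beta, se = se, z = beta / se, p = 2 * pnorm(-abs(beta / se)))
}

# Hypothetical data for a 9-player subsample with 4 events (NOT the experiment's data).
y   <- c(1, 0, 1, 0, 0, 1, 0, 1, 0)
crt <- c(3, 1, 4, 2, 0, 2, 1, 3, 2)
X   <- cbind(`(Intercept)` = 1, CRT = crt)
fit <- firth_logit(X, y)
exp(fit$coef)  # odds ratios
```

A useful sanity check: in an intercept-only model, the Firth estimate of the event probability is (events + 0.5) / (n + 1), e.g. 4.5 / 10 = 0.45 for 4 events out of 9.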

Results

SH — Stag choice when opponent signals T (Hare)
DV = choice2=D | opp_signal=T. n=9, events=4. EPV: M1=4, M2=2, M3=4. All Firth penalised logit (brglm2).¹
Predictor	β	SE	z	p	OR	OR 2.5%	OR 97.5%
M1: Quiz only
Quiz errors [log(1+x)]	-0.263	0.877	-0.30	0.764	0.77	0.14	4.28
M2: Quiz + CRT
Quiz errors [log(1+x)]	0.099	1.036	0.10	0.924	1.10	0.14	8.41
CRT score (0–4)	0.834	1.006	0.83	0.407	2.30	0.32	16.54
M3: CRT only
CRT score (0–4)	0.952	0.955	1.00	0.319	2.59	0.40	16.83
¹ Firth penalised logit used for all models (EPV < 10). OR > 1 → increases P(choose Stag | opp signals T). 95% Wald CI.
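Every column after β and SE in this table follows from those two numbers: z = β / SE, a two-sided normal p-value, OR = exp(β) and the 95% Wald CI exp(β ± 1.96 SE). The helper name `wald_or()` is hypothetical. Applied to the rounded β and SE, it can differ from the table in the last digit (e.g. 4.29 vs 4.28).

```r
# From a logit coefficient and its SE to z, p, OR and a Wald CI for the OR.
wald_or <- function(beta, se, level = 0.95) {
  zcrit <- qnorm(1 - (1 - level) / 2)  # 1.96 for a 95% interval
  z <- beta / se
  data.frame(beta = beta, se = se, z = z,
             p     = 2 * pnorm(-abs(z)),
             OR    = exp(beta),
             OR_lo = exp(beta - zcrit * se),
             OR_hi = exp(beta + zcrit * se))
}

# M1 (quiz errors) and M3 (CRT) rows from the table above
r <- wald_or(beta = c(-0.263, 0.952), se = c(0.877, 0.955))
round(r, 2)
```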

Figure 15: Stag choice when opponent signals T (Hare) — M1 (quiz only), M2 (quiz + CRT), M3 (CRT only). Odds ratios with 95% CI. OR > 1 increases P(choose Stag). All Firth. x-axis log scale.

Interpretation

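Quiz errors (M1: OR = 0.77, p = 0.764; M2: OR = 1.10, p = 0.924): more errors on the comprehension quiz may reflect a weaker understanding of the game and could reduce willingness to attempt the risky Stag choice after a Hare (T) signal. However, the estimate changes sign across specifications and is far from significant. CRT score (M2: OR = 2.30, p = 0.407; M3: OR = 2.59, p = 0.319): more reflective thinkers might be expected to take the Hare signal at face value and avoid Stag. Against a Hare player, Stag yields 0 € versus 14 € for Hare, so Stag is then not a best response, although it is not a dominated strategy. The point estimates go the other way (OR > 1), but the 95% CIs are wide and include 1. With n = 9 and EPV between 2 and 4, these estimates are exploratory and should be interpreted with caution.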