GT Behaviour · GTEMO Experiment

Author

Eric Guerci

Published

March 22, 2026

Tip: Payoff matrix (row payoff, column payoff)

| | L (col) | R (col) |
|---|---|---|
| T (row) | 14, 14 ★ | 14, 0 |
| D (row) | 0, 14 | 20, 20 ♦ |

★ = Risk-dominant Nash equilibrium (Hare-Hare). ♦ = Pareto-dominant Nash equilibrium (Stag-Stag). In Stag Hunt, cheap talk is expected to be effective because both players want to coordinate, and signals are self-signaling and self-committing.
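The ★ and ♦ labels can be checked directly from the payoff matrix. The sketch below is illustrative and not part of the original analysis; the object names (`payoff_row`, `loss_hare`, `loss_stag`, `p_star`) are hypothetical. It applies the product-of-deviation-losses criterion for risk dominance in a symmetric 2×2 game and computes the belief threshold above which Stag is a best response.

```r
# Row player's payoffs from the matrix above (the game is symmetric).
# Rows = own action, columns = opponent's action; T = Hare, D = Stag.
payoff_row <- matrix(c(14, 14,
                        0, 20),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(own = c("T", "D"), opp = c("T", "D")))

# Deviation loss at each pure equilibrium: what a player gives up by
# unilaterally leaving it.
loss_hare <- payoff_row["T", "T"] - payoff_row["D", "T"]  # 14 - 0
loss_stag <- payoff_row["D", "D"] - payoff_row["T", "D"]  # 20 - 14

# In a symmetric 2x2 game the equilibrium with the larger product of
# deviation losses (here: the larger squared loss) is risk-dominant.
risk_dominant <- if (loss_hare^2 > loss_stag^2) "Hare-Hare" else "Stag-Stag"

# Stag is a best response when P(opponent plays Stag) is at least p_star.
p_star <- loss_hare / (loss_hare + loss_stag)

print(risk_dominant)
print(p_star)
```

Under these payoffs a player needs to believe the partner hunts Stag with probability of at least 0.7 before Stag becomes the better choice, which is why Hare-Hare is the risk-dominant equilibrium even though Stag-Stag pays more.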


1 — Equilibria & coordination

Objective

Describe the equilibrium outcomes reached by each couple in Part 1 and Part 2, and test whether cheap talk shifted coordination rates using paired McNemar tests.

Tip: Equilibrium labels (SH)
  • (D,D) — Stag-Stag — Pareto-dominant NE; both hunt Stag → 20, 20 € ♦
  • (T,T) — Hare-Hare — Risk-dominant NE; both hunt Hare → 14, 14 € ★
  • (T,D) — Hare-Stag: P1 hunts Hare (14 €), P2 hunts Stag alone (0 €)
  • (D,T) — Stag-Hare: P1 hunts Stag alone (0 €), P2 hunts Hare (14 €)

Equilibrium distributions

p_eq_sh
Figure 1: SH — Equilibrium distributions in Part 1 (top) and Part 2 (bottom).

Coordination rates: Part 1 vs Part 2

coord_sh |>
  dplyr::select(part, x, n, pct, ci95) |>
  gt::gt() |>
  gt::tab_header(title    = "SH — Coordination rates: Part 1 vs Part 2",
                 subtitle = "95% Clopper-Pearson CI") |>
  gt::cols_label(part = "Phase", x = "n coordinated", n = "N",
                 pct = "%", ci95 = "95% CI") |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_column_labels())
p_coord_sh
SH — Coordination rates: Part 1 vs Part 2
95% Clopper-Pearson CI
| Phase | n coordinated | N | % | 95% CI |
|---|---|---|---|---|
| Part 1 | 8 | 16 | 50.0% | [24.7%, 75.3%] |
| Part 2 | 12 | 16 | 75.0% | [47.6%, 92.7%] |
Figure 2: SH — Coordination rate in Part 1 vs Part 2 (couple level). 95% Clopper-Pearson CI.
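The intervals in the table can be reproduced with base R. This is a minimal sketch, assuming the report's `ci95` column comes from an exact binomial interval such as the one returned by `stats::binom.test()`; the helper name `cp_ci` is hypothetical.

```r
# Exact (Clopper-Pearson) CI for a binomial proportion, as returned by
# stats::binom.test(). Counts are taken from the coordination table above.
cp_ci <- function(x, n, level = 0.95) {
  ci <- binom.test(x, n, conf.level = level)$conf.int
  c(lower = ci[1], upper = ci[2])
}

ci_part1 <- cp_ci(8, 16)   # Part 1: 8 of 16 couples coordinated
ci_part2 <- cp_ci(12, 16)  # Part 2: 12 of 16 couples coordinated

# Print both intervals as percentages
print(round(100 * rbind(`Part 1` = ci_part1, `Part 2` = ci_part2), 1))
```

With only 16 couples the two intervals overlap heavily, which is consistent with the non-significant paired test reported in the next subsection.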

McNemar tests

tab_mc_sh |>
  gt::gt() |>
  gt::tab_header(title    = "McNemar tests — SH (couple level)",
                 subtitle = "Paired Part 1 vs Part 2") |>
  gt::cols_label(label = "Test", statistic = "χ²", p_value = "p-value",
                 note = "Note") |>
  gt::fmt_number(columns = c(statistic, p_value), decimals = 4, rows = !is.na(statistic)) |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_body(columns = p_value,
                                           rows = !is.na(p_value) & p_value < 0.05)) |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_column_labels())
McNemar tests — SH (couple level)
Paired Part 1 vs Part 2
| Test | χ² | p-value | Note |
|---|---|---|---|
| SH — coord Part1 vs Part2 | 2.2500 | 0.1336 | OK |
| SH — mutual Stag choice Part1 vs Part2 | 0.0000 | 1.0000 | OK |
Note

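The Pareto-dominant (Stag-Stag) equilibrium was reached by 50.0% of couples in Part 1 and by 50.0% in Part 2, so the mutual-Stag rate did not move (χ² = 0, p = 1). Overall coordination rose from 50.0% to 75.0%, but the change is not significant at the 5% level (χ² = 2.25, p = 0.1336). Because the Stag-Stag count stayed at 8 of 16 couples while coordination rose from 8 to 12, the additional coordination in Part 2 came from Hare-Hare outcomes. A significant McNemar result on mutual Stag choice would have indicated that cheap talk systematically shifted couples from the risk-dominant (Hare-Hare) to the Pareto-dominant (Stag-Stag) equilibrium.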
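The coordination test can be retraced with `stats::mcnemar.test()`. The report does not show the paired 2×2 table, so the cell counts below are an assumption: they are reconstructed from the margins (8/16 and 12/16) and from the reported statistic, which under the default continuity correction equals 9 / (b + c) = 2.25 and therefore implies b + c = 4 discordant couples.

```r
# Couple-level 2x2 table, Part 1 (rows) by Part 2 (columns).
# ASSUMPTION: cell counts are reconstructed from the reported margins and
# statistic; they are not printed in the report itself.
coord_tab <- matrix(c(8, 0,
                      4, 4),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(part1 = c("coord", "no coord"),
                                    part2 = c("coord", "no coord")))

# mcnemar.test() applies the continuity correction by default:
# (|b - c| - 1)^2 / (b + c) = (|0 - 4| - 1)^2 / 4 = 2.25
mc <- mcnemar.test(coord_tab)

print(unname(mc$statistic))
print(round(mc$p.value, 4))
```

Only the off-diagonal (discordant) cells enter the statistic, so with four discordant couples the test has very little power regardless of the direction of the shift.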

Conditioning on session gender

tab_cond_coord_sh |>
  gt::gt() |>
  gt::tab_header(title    = "SH — Coordination and Stag choice by session gender",
                 subtitle = "χ² with Monte Carlo simulated p-value (B = 2000), couple level") |>
  gt::cols_label(Outcome = "Outcome", Factor = "Factor", Test = "χ²(sim.) test") |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_column_labels()) |>
  gt::opt_stylize(style = 1) |>
  gt::tab_options(table.font.size = 13)
p_coord_gender_sh
SH — Coordination and Stag choice by session gender
χ² with Monte Carlo simulated p-value (B = 2000), couple level
| Outcome | Factor | χ²(sim.) test |
|---|---|---|
| Coordination Part 2 | Session gender | χ²(sim.): p = 0.544 ns |
| Coordination Part 1 | Session gender | χ²(sim.): p = 1.000 ns |
| Mutual Stag choice Part 2 | Session gender | χ²(sim.): p = 1.000 ns |
| Mutual Stag choice Part 1 | Session gender | χ²(sim.): p = 1.000 ns |
Figure 3: SH — P(Coordination Part 2) by session gender (couple level). Error bars = 95% Clopper-Pearson CI.
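A χ² test with a Monte Carlo p-value, as named in the table subtitle, can be obtained from `stats::chisq.test()` with `simulate.p.value = TRUE`. The sketch below uses a made-up 2×2 table; the session labels "A" and "B" and all counts are hypothetical and do not come from the experiment.

```r
# Hypothetical couple-level 2x2 table: session gender by Part 2 coordination.
# The counts are invented for illustration only.
tab <- matrix(c(7, 1,
                5, 3),
              nrow = 2, byrow = TRUE,
              dimnames = list(session = c("A", "B"),
                              coord2  = c("yes", "no")))

set.seed(1)  # the simulated p-value depends on the random seed

# B = 2000 replicates, matching the subtitle of the table above.
res <- chisq.test(tab, simulate.p.value = TRUE, B = 2000)

print(res$p.value)
```

The simulated p-value avoids the large-sample χ² approximation, which is unreliable when expected cell counts are small, as they are with 16 couples.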

2 — Choice & signal distributions

Objective

Describe the marginal distributions of choices and signals (Part 1 choice → signal sent → Part 2 choice), and examine how the opponent’s signal shapes Part 2 behaviour. All proportions use exact 95% Clopper-Pearson CIs.

Distributions table

tab_sh
SH — Choice and signal distributions
95% Clopper-Pearson CI
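| Choice / Signal | n | N | % | 95% CI |
|---|---|---|---|---|
| Part 1 | | | | |
| T | 8 | 32 | 25.0% | [11.5%, 43.4%] |
| D | 24 | 32 | 75.0% | [56.6%, 88.5%] |
| Signal | | | | |
| T | 9 | 32 | 28.1% | [13.7%, 46.7%] |
| D | 23 | 32 | 71.9% | [53.3%, 86.3%] |
| Part 2 | | | | |
| T | 12 | 32 | 37.5% | [21.1%, 56.3%] |
| D | 20 | 32 | 62.5% | [43.7%, 78.9%] |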

Proportions by phase

p_sh_dist
Figure 4: SH — Proportions of each choice/signal type with 95% Clopper-Pearson CIs. Left: Part 1 choices; centre: signals sent; right: Part 2 choices.
Note: Part 1 → Part 2 snapshot

Stag choice (D) in Part 1: 75.0%. The dominant signal was D (71.9%). Stag choice in Part 2: 62.5%. An increase in D (Stag) from Part 1 to Part 2 would have been consistent with cheap talk shifting players from the safe Hare equilibrium to the Pareto-dominant Stag equilibrium; here the Stag rate instead fell by 12.5 percentage points (from 24 to 20 of 32 players). Unlike in PD or MP, Hare is not a strictly dominant strategy in SH. It is merely the risk-dominant choice, so signals can plausibly serve as coordination devices.

Within-subject shift: McNemar test (Part 1 vs Part 2)

tab_mcnemar_sh |>
  gt::gt() |>
  gt::tab_header(
    title    = "McNemar test — SH: choice1_D vs choice2_D",
    subtitle = "Paired within-individual"
  ) |>
  gt::cols_label(statistic = "χ²", p_value = "p-value", n = "n", note = "Note") |>
  gt::fmt_number(columns = c(statistic, p_value), decimals = 4) |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_body(columns = p_value,
                                           rows = !is.na(p_value) & p_value < 0.05)) |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_column_labels())
McNemar test — SH: choice1_D vs choice2_D
Paired within-individual
| χ² | p-value | n | Note |
|---|---|---|---|
| 1.1250 | 0.2888 | 32 | OK |
Note

A significant McNemar result (p < 0.05) would indicate that pre-play cheap talk systematically shifted individual Stag (D) choice rates. Here the shift is not significant (χ² = 1.125, p = 0.2888, n = 32). In SH, Hare is not a dominant strategy: it is the risk-dominant choice, but Stag is the best response if the opponent also plays Stag. Cheap talk is theoretically credible in SH because both players prefer the Pareto-dominant equilibrium (20, 20) to the risk-dominant one (14, 14), which makes signals self-signaling and self-committing.
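The reported statistic can be recovered from counts that appear elsewhere in this report, assuming the test used the continuity correction that `mcnemar.test()` applies by default. Stag choices fell from 24 to 20 of 32 players, and 8 players changed strategy between Part 1 and Part 2 (Section 3), so the discordant counts must be b = 6 (D in Part 1, T in Part 2) and c = 2 (T in Part 1, D in Part 2):

\[ \chi^2 = \frac{(|b - c| - 1)^2}{b + c} = \frac{(|6 - 2| - 1)^2}{6 + 2} = \frac{9}{8} = 1.125, \qquad p = P(\chi^2_1 > 1.125) \approx 0.2888 \]

Three times as many players moved away from Stag as moved towards it, but with only eight switchers the imbalance is not statistically distinguishable from chance.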

Opponent signal and the information set before Part 2

Important

Decision sequence. After sending their own signal and before making the Part 2 choice, each player observes the opponent’s signal. The Part 2 decision is taken with a two-dimensional information set: (own signal sent) × (opponent’s signal received). The four possible information sets are: T/T, T/D, D/T, D/D.
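The two-dimensional information set can be built by pasting the two signals together. The sketch below uses a small made-up data frame; the column names `own_signal`, `opp_signal` and `choice2` are hypothetical and may differ from those in the actual dataset.

```r
# Hypothetical toy data: one row per player (not the experimental sample).
toy <- data.frame(
  own_signal = c("D", "D", "T", "D", "T", "D"),
  opp_signal = c("D", "T", "D", "D", "T", "D"),
  choice2    = c("D", "T", "D", "D", "T", "D")
)

# Information set = own signal sent / opponent's signal received.
toy$info_set <- factor(paste(toy$own_signal, toy$opp_signal, sep = "/"),
                       levels = c("T/T", "T/D", "D/T", "D/D"))

# Share of Stag (D) choices in Part 2 within each information set.
stag_by_set <- tapply(toy$choice2 == "D", toy$info_set, mean)

print(table(toy$info_set))
print(stag_by_set)
```

Fixing the factor levels keeps all four information sets in the output even when one of them has no observations, which makes empty cells visible in the counts.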

p_sig_heatmap_sh
Figure 5: SH — Joint distribution of own signal × opponent signal received. Values show count and share.
p_choice2_by_oppsig_sh
Figure 6: SH — P(choice₂ = D, Stag) stratified by opponent’s signal. 95% Clopper-Pearson CI.
Note: Cheap talk in SH: signal credibly shifts Stag choice

In the Stag Hunt, neither T (Hare) nor D (Stag) is a strictly dominant strategy. Hare is merely the risk-dominant choice — it is the safer option under uncertainty, but both players would prefer mutual Stag (20, 20) over mutual Hare (14, 14). This makes cheap talk highly credible: a player who signals D (Stag) is credibly committing to the Pareto-dominant equilibrium, because following through is individually rational as long as the opponent also plays Stag. Signals are both self-signaling (it is rational to send a D signal only if you intend to play D) and self-committing (it is rational to follow a D signal once sent). A large gap in Stag choice rates between receiving a D signal vs a T signal would confirm the effectiveness of cheap talk in SH.

p_choice2_infoset_sh
Figure 7: SH — P(choice₂ = D, Stag) by full information set (own/opp). Colour = own signal. 95% CI. Information sets with no observations are omitted.
Note: SH interpretation

The critical test is the (D/D) information set — where both players have signalled Stag (D). If both players trust the signal, this should produce nearly 100% Stag choice, converging on the Pareto-dominant (D,D) equilibrium. Values below 100% reflect residual strategic risk aversion (fear of being the only Stag hunter). The (T/T) information set (both signalled Hare) represents the risk-dominant equilibrium pull — here Hare choice should be near 100%. A large D/D vs T/T gap in Stag choice rates is the clearest evidence of cheap talk effectiveness in SH.

p_follow_opp_sh
Figure 8: SH — Proportion of players whose Part 2 choice matches the opponent’s signal received.

Conditioning on gender and role

tab_cond_sig_sh |>
  gt::gt() |>
  gt::tab_header(title = "SH — Choice and signal distributions by gender and role",
                 subtitle = "χ² with Monte Carlo simulated p-value (B = 2000)") |>
  gt::cols_label(Outcome = "Outcome", Factor = "Factor", Test = "χ²(sim.) test") |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_column_labels()) |>
  gt::opt_stylize(style = 1) |>
  gt::tab_options(table.font.size = 13)
SH — Choice and signal distributions by gender and role
χ² with Monte Carlo simulated p-value (B = 2000)
| Outcome | Factor | χ²(sim.) test |
|---|---|---|
| Part 1 = T | Gender | χ²(sim.): p = 1.000 ns |
| Part 1 = T | Role | χ²(sim.): p = 0.667 ns |
| Signal = T | Gender | χ²(sim.): p = 1.000 ns |
| Signal = T | Role | χ²(sim.): p = 0.432 ns |
| Part 2 = T | Gender | χ²(sim.): p = 0.722 ns |
| Part 2 = T | Role | χ²(sim.): p = 1.000 ns |
p_cond_sig_sh
Figure 9: SH — P(Signal = T) by gender and role. Error bars = 95% Clopper-Pearson CI. Dashed line = 50%.

3 — Signal honesty & consistency

Objective

Examine whether signals are honest (= same as the action eventually taken in Part 2) and consistent with Part 1 choices. Assess the prevalence of strategy switches between Part 1 and Part 2.

Honesty and consistency proportions

tab_sec2_sh |>
  gt::gt() |>
  gt::tab_header(title    = "SH — Signal honesty and consistency",
                 subtitle = "95% Clopper-Pearson CI. Each row is a binary indicator (1 = yes, 0 = no).") |>
  gt::cols_label(variable = "Measure", n = "n (=1)", N = "N",
                 pct = "%", ci95 = "95% CI") |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_column_labels()) |>
  gt::tab_footnote(
    footnote = "1 if the signal sent equals the Part 2 choice (e.g. sent T and chose T in Part 2). Measures whether players followed through on their signal.",
    locations = gt::cells_body(columns = variable, rows = 1)
  ) |>
  gt::tab_footnote(
    footnote = "1 if the signal sent equals the Part 1 choice (e.g. signalled T and had also chosen T in Part 1). Measures whether the signal reflects past behaviour — independent of Part 2.",
    locations = gt::cells_body(columns = variable, rows = 2)
  ) |>
  gt::tab_footnote(
    footnote = "1 if the Part 2 choice differs from the Part 1 choice (choice1 \u2260 choice2). Measures switching behaviour across rounds, independent of the signal. N may differ from rows 1\u20132 due to missing values in different variables.",
    locations = gt::cells_body(columns = variable, rows = 3)
  ) |>
  gt::tab_options(table.font.size = 13)
SH — Signal honesty and consistency
95% Clopper-Pearson CI. Each row is a binary indicator (1 = yes, 0 = no).
| Measure | n (=1) | N | % | 95% CI |
|---|---|---|---|---|
| Signal honest (signal = choice2)¹ | 25 | 32 | 78.1% | [60.0%, 90.7%] |
| Signal consistent with Part 1 (signal = choice1)² | 29 | 32 | 90.6% | [75.0%, 98.0%] |
| Strategy changed Part 1 → Part 2 (choice1 ≠ choice2)³ | 8 | 32 | 25.0% | [11.5%, 43.4%] |

¹ 1 if the signal sent equals the Part 2 choice (e.g. sent T and chose T in Part 2). Measures whether players followed through on their signal.
² 1 if the signal sent equals the Part 1 choice (e.g. signalled T and had also chosen T in Part 1). Measures whether the signal reflects past behaviour — independent of Part 2.
³ 1 if the Part 2 choice differs from the Part 1 choice (choice1 ≠ choice2). Measures switching behaviour across rounds, independent of the signal. N may differ from rows 1–2 due to missing values in different variables.
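The three binary indicators defined in the footnotes can be computed in one step each. This is a minimal sketch on invented data; the column names `choice1`, `signal` and `choice2` follow the labels used in the table but are assumptions about the underlying dataset.

```r
# Hypothetical toy data: one row per player (not the experimental sample).
toy <- data.frame(
  choice1 = c("D", "D", "T", "D", "T"),
  signal  = c("D", "D", "T", "T", "D"),
  choice2 = c("D", "T", "T", "T", "D")
)

toy$honest     <- as.integer(toy$signal == toy$choice2)   # followed through on signal
toy$consistent <- as.integer(toy$signal == toy$choice1)   # signal repeats Part 1 choice
toy$changed    <- as.integer(toy$choice1 != toy$choice2)  # switched between parts

# Share of players with each indicator equal to 1
print(colMeans(toy[c("honest", "consistent", "changed")]))
```

The indicators are logically distinct: a player can be honest without being consistent (rows 4 and 5 of the toy data), which is why the report tabulates them separately.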

Binomial test: signal honesty vs 50%

tab_binom_honest_sh |>
  gt::gt() |>
  gt::tab_header(title    = "Binomial test: P(Honest) vs H₀ = 0.50",
                 subtitle = "Two-sided test; 95% Clopper-Pearson CI") |>
  gt::cols_label(x = "n honest", n = "N", pct = "%", ci95 = "95% CI",
                 p_value = "p-value") |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_body(columns = p_value,
                                           rows = p_value < 0.05)) |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_column_labels())
Binomial test: P(Honest) vs H₀ = 0.50
Two-sided test; 95% Clopper-Pearson CI
| n honest | N | % | 95% CI | p-value |
|---|---|---|---|---|
| 25 | 32 | 78.1% | [60.0%, 90.7%] | 0.0021 |
Note

In Stag Hunt, a signal is honest if the player’s Part 2 action matches what they signalled. Unlike PD where honesty is only meaningful for the dominated cooperative strategy, in SH both signals (Stag/D and Hare/T) correspond to Nash equilibria. Honesty therefore measures pure coordination commitment. A player who signals Stag and plays Stag is attempting the Pareto-dominant equilibrium; a player who signals Hare and plays Hare is securing the risk-dominant equilibrium. The aggregate honesty rate is a genuine measure of how much players rely on cheap talk to coordinate their equilibria.
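The binomial test in the table can be reproduced from the two counts alone. A minimal sketch, assuming the report relies on an exact two-sided test such as `stats::binom.test()`:

```r
# Exact two-sided binomial test of P(honest) against H0 = 0.50,
# using the counts from the table above (25 honest signals out of 32).
bt <- binom.test(x = 25, n = 32, p = 0.5, alternative = "two.sided")

print(round(bt$p.value, 4))
print(round(100 * bt$conf.int, 1))  # Clopper-Pearson interval, in percent
```

The 50% benchmark corresponds to signals that carry no information about the Part 2 action; the lower bound of the interval sits well above it.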

Strategic transition heatmaps

p_sankey_sh
Figure 10: SH — Strategic transitions. Left: Part 1 → Signal (row %: conditional on Part 1 choice). Right: Signal → Part 2 (row %: conditional on signal sent).

Conditioning on gender and role

tab_cond_honest_sh |>
  gt::gt() |>
  gt::tab_header(title    = "SH — Signal honesty and strategy change by gender and role",
                 subtitle = "χ² with Monte Carlo simulated p-value (B = 2000)") |>
  gt::cols_label(Outcome = "Outcome", Factor = "Factor", Test = "χ²(sim.) test") |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_column_labels()) |>
  gt::opt_stylize(style = 1) |>
  gt::tab_options(table.font.size = 13)
SH — Signal honesty and strategy change by gender and role
χ² with Monte Carlo simulated p-value (B = 2000)
| Outcome | Factor | χ²(sim.) test |
|---|---|---|
| Signal honest | Gender | χ²(sim.): p = 1.000 ns |
| Signal honest | Role | χ²(sim.): p = 1.000 ns |
| Strategy changed | Gender | χ²(sim.): p = 0.681 ns |
| Strategy changed | Role | χ²(sim.): p = 0.683 ns |
p_cond_honest_sh
Figure 11: SH — P(Honest) by gender (left) and role (right). Error bars = 95% Clopper-Pearson CI. Dashed line = 50%.

4 — Belief accuracy & bonus
Note: The two belief questions

After Part 2, each player answered two incentivised questions about their beliefs:

Belief 1 — First-order belief: “What do you think your opponent chose in Part 2?” (T or D). Scored correct (GT_right_guess1 = 1) if the player’s prediction matched the opponent’s actual Part 2 choice. Bonus: +2€ if correct.

Belief 2 — Second-order belief: “What do you think your opponent believes you chose in Part 2?” (T or D). Scored correct (GT_right_guess2 = 1) if the player correctly identified what the opponent believed about the player’s own choice. Bonus: +2€ if correct.

The belief accuracy score = GT_right_guess1 + GT_right_guess2 ∈ {0, 1, 2}. The belief bonus = score × 2€ ∈ {0€, 2€, 4€}.
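The score and bonus defined above amount to a sum and a multiplication. The sketch below enumerates the four possible answer patterns; the data frame `toy` is hypothetical, while `GT_right_guess1` and `GT_right_guess2` are the variable names given in the text.

```r
# All four combinations of (first-order correct, second-order correct).
toy <- data.frame(
  GT_right_guess1 = c(1, 1, 0, 0),
  GT_right_guess2 = c(1, 0, 1, 0)
)

# Belief accuracy score in {0, 1, 2}
toy$belief_score <- toy$GT_right_guess1 + toy$GT_right_guess2

# Belief bonus in euros: 2 per correct belief
toy$belief_bonus <- 2 * toy$belief_score

print(toy)
```

A score of 1 pools two different cases (only the first-order or only the second-order belief correct), so the score alone does not reveal which belief was accurate.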

Objective

Describe the distribution of belief accuracy scores (0 = both beliefs wrong; 1 = one correct; 2 = both correct) and the associated belief bonus payoff. Assess whether belief accuracy is correlated with coordination outcomes at the couple level.

Belief accuracy distribution

p_belief_bar_sh
Figure 12: SH — Distribution of belief accuracy scores.

Hypothesis tests: beliefs, cognitive ability & coordination

SH — Belief accuracy: hypothesis comparison
H1–H2: individual level. H3: couple level. Stat = Spearman ρ or φ (phi).
| | Level | Hypothesis | X | Y | Test | Expected | Stat | p | n |
|---|---|---|---|---|---|---|---|---|---|
| H1 | Individual | Reflective thinkers (high CRT) predict opponent’s choice more accurately | CRT score (0–4) | Belief accuracy (0–2) | Spearman ρ | Positive | -0.017 | 0.927 | 32 |
| H2 | Individual | Players who made more quiz errors have less accurate beliefs | Quiz errors [log(1+x)] | Belief accuracy (0–2) | Spearman ρ | Negative | 0.141 | 0.441 | 32 |
| H3 | Couple | Couples where both players have perfect beliefs coordinate more in Part 2 | Coord. Part 2 (0/1) | Both perfect beliefs (0/1) | Fisher exact + φ¹ | Positive | 0.577 | 0.077 | 16 |

¹ H3 stat = phi coefficient (φ); p-value from two-sided Fisher exact test on 2×2 contingency table.
Note

H1 tests whether more reflective players (higher CRT) are better at predicting their opponent — if strategic reasoning drives belief formation, a positive Spearman ρ is expected. H2 tests whether players who struggled with game comprehension (more quiz errors) hold less accurate beliefs — expected direction is negative. H3 tests whether couples where both players had perfect beliefs were more likely to coordinate in Part 2 — the only couple-level hypothesis; uses Fisher exact given small N and binary outcomes. Note that H1 and H2 operate at the individual level while H3 is at the couple level; they are not directly comparable.
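The H3 statistic combines a phi coefficient with a Fisher exact p-value on the same 2×2 table. The report does not print that table, so the counts below are an assumption: they form one table that is consistent with the Part 2 margin (12 of 16 couples coordinated) and with the reported φ = 0.577 and p = 0.077.

```r
# ASSUMPTION: cell counts are not shown in the report; this is one 2x2 table
# consistent with the reported margin, phi and p-value.
tab <- matrix(c(8, 0,
                4, 4),
              nrow = 2, byrow = TRUE,
              dimnames = list(both_perfect = c("yes", "no"),
                              coord2       = c("yes", "no")))

# Phi coefficient: (ad - bc) / sqrt(product of all four margins)
phi <- (tab[1, 1] * tab[2, 2] - tab[1, 2] * tab[2, 1]) /
  sqrt(prod(rowSums(tab)) * prod(colSums(tab)))

# Two-sided Fisher exact test on the same table
ft <- fisher.test(tab)

print(round(phi, 3))
print(round(ft$p.value, 3))
```

Phi is symmetric in the two variables, so the X and Y labels in the table do not affect the statistic.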

Conditioning on gender and role

p_cond_belief_sh
Figure 13: SH — Belief accuracy score (0/1/2) by gender (left) and role (right). Bars show proportion within each group; labels show % and count. Score 0 = both beliefs wrong, 1 = one correct, 2 = both correct.

5 — Econometric models

5.1 — Determinants of belief accuracy

Estimate an ordered logit (proportional-odds model) for the belief accuracy score (0 = both wrong, 1 = one correct, 2 = both correct) using Theory of Mind (MASC) and IRI subscales as predictors. The proportional-odds assumption implies a single log-odds shift per unit increase in each predictor, shared across both thresholds (0→1 and 1→2).

Warning: Small-sample caveat

With n = 32 participants (score 0: n=4; 1: n=9; 2: n=19), EPV is computed as min(n₀, n₂) / k: M1 = 4, M2 = 2, M3 = 1.3. All are well below the recommended 10. All results are exploratory and should be treated as hypothesis-generating.
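The EPV figures follow directly from the category counts and the number of predictors. A short sketch of the arithmetic, with hypothetical object names:

```r
# Category sizes of the belief accuracy score, as stated above.
n_by_score <- c(`0` = 4, `1` = 9, `2` = 19)

# Number of predictors in each model.
k <- c(M1 = 1, M2 = 2, M3 = 3)

# EPV = min(n0, n2) / k, the rule used in this report.
epv <- min(n_by_score["0"], n_by_score["2"]) / k

print(round(epv, 1))
```

The smallest extreme category (four players with both beliefs wrong) drives all three values, so adding predictors quickly exhausts the information available.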

Model specifications

\[ \begin{aligned} \text{M1:} \quad & \text{logit}\,P(Y \le j) = \alpha_j - \beta_1\,\text{MASC}_z \\ \text{M2:} \quad & \text{logit}\,P(Y \le j) = \alpha_j - \beta_1\,\text{MASC}_z - \beta_2\,\text{IRI-PT}_z \\ \text{M3:} \quad & \text{logit}\,P(Y \le j) = \alpha_j - \beta_1\,\text{MASC}_z - \beta_2\,\text{IRI-PT}_z - \beta_3\,\text{IRI-PD}_z \end{aligned} \]

MASC = Theory of Mind total score; IRI-PT = Perspective Taking subscale (cognitive empathy — most directly linked to predicting opponents’ decisions); IRI-PD = Personal Distress subscale (self-oriented distress — may impair strategic prediction). All scores z-standardised. OR > 1 shifts probability towards higher belief accuracy.
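The M1 specification can be illustrated with `MASS::polr()`, which uses the same parameterisation as the equations above (thresholds α_j minus the linear predictor). The data below are simulated purely for illustration, with a larger n than the experiment so that the fit is stable; the variable names and the true coefficient of 0.5 are assumptions, not estimates from the study.

```r
library(MASS)  # provides polr()

set.seed(42)
n <- 300

# SIMULATED data, for illustration only (not the experimental sample).
masc_z <- rnorm(n)
latent <- 0.5 * masc_z + rlogis(n)            # latent accuracy with logistic noise
score  <- cut(latent, breaks = c(-Inf, -1, 0.5, Inf),
              labels = c("0", "1", "2"), ordered_result = TRUE)

# Proportional-odds model: logit P(Y <= j) = alpha_j - beta * MASC_z
fit <- polr(score ~ masc_z, method = "logistic", Hess = TRUE)

beta <- coef(fit)   # single log-odds shift, shared across both thresholds
or   <- exp(beta)   # OR > 1: higher MASC shifts probability to higher scores

print(fit$zeta)     # the two estimated thresholds (0|1 and 1|2)
print(or)
```

Because of the minus sign in the specification, a positive β lowers P(Y ≤ j) at both thresholds, which is why OR > 1 corresponds to higher belief accuracy.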

Coefficient table

tab_olog_gt
SH — Ordered logit: determinants of belief accuracy
DV = belief accuracy score (0/1/2, ordered). n = 32 (score 0: n=4; 1: n=9; 2: n=19). MASC/IRI z-scored. OR > 1 increases probability of higher accuracy.¹

| Predictor | β | SE | t | p | OR | OR 2.5% | OR 97.5% |
|---|---|---|---|---|---|---|---|
| M1: MASC only | | | | | | | |
| MASC ToM score (z) | 0.277 | 0.331 | 0.835 | 0.4036 | 1.319 | 0.689 | 2.525 |
| M2: MASC + IRI-PT | | | | | | | |
| MASC ToM score (z) | 0.341 | 0.346 | 0.985 | 0.3248 | 1.406 | 0.706 | 2.865 |
| IRI Perspective Taking (z) | 0.259 | 0.360 | 0.720 | 0.4713 | 1.296 | 0.635 | 2.702 |
| M3: MASC + IRI-PT + IRI-PD | | | | | | | |
| MASC ToM score (z) | 0.479 | 0.366 | 1.307 | 0.1911 | 1.615 | 0.788 | 3.457 |
| IRI Perspective Taking (z) | 0.404 | 0.422 | 0.955 | 0.3394 | 1.497 | 0.662 | 3.651 |
| IRI Personal Distress (z) | -0.759 | 0.431 | -1.762 | 0.0780 | 0.468 | 0.183 | 1.031 |

¹ Ordered logit (proportional-odds, MASS::polr). p-values from two-tailed z-test on t-statistic. CI from profile likelihood where convergent, otherwise Wald. All models MLE; EPV < 10 — interpret cautiously.
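The p-value and OR columns can be recomputed from β and the t-statistic as described in the footnote. The sketch below does this for two rows of the table; small differences in the last digit are expected because the inputs are the rounded values printed above.

```r
# Rounded values copied from the coefficient table.
coefs <- data.frame(
  term = c("MASC (M1)", "IRI-PD (M3)"),
  beta = c(0.277, -0.759),
  t    = c(0.835, -1.762)
)

# Two-tailed p-value from a standard normal reference distribution
coefs$p  <- 2 * pnorm(-abs(coefs$t))

# Odds ratio = exp(log-odds coefficient)
coefs$OR <- exp(coefs$beta)

print(coefs)
```

`MASS::polr()` reports the ratio β/SE as a "t value" without a p-value, so applying a normal reference distribution is a choice made in the report rather than something the package does by default.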

Goodness of fit

tab_gof_olog_gt
SH — Ordered logit: goodness of fit
DV = belief accuracy score (0/1/2). EPV = min(n₀, n₂) / k.
| Model | n | Predictors | EPV | AIC | McFadden R² |
|---|---|---|---|---|---|
| M1: MASC only | 32 | 1 | 4.0 | 64.6 | 0.0117 |
| M2: MASC + IRI-PT | 32 | 2 | 2.0 | 66.1 | 0.0205 |
| M3: MASC + IRI-PT + IRI-PD | 32 | 3 | 1.3 | 64.5 | 0.0801 |

Forest plot: odds ratios

p_forest_olog
Figure 14: SH — Ordered logit: odds ratios for belief accuracy. OR > 1 increases probability of higher accuracy score. Error bars = 95% CI. Dashed line = OR 1 (no effect). x-axis log scale.
Note: Interpretation

MASC ToM: OR = 1.319, p = 0.4036 in M1 (OR = 1.406 in M2 and 1.615 in M3). Higher Theory of Mind ability may improve belief accuracy by enabling better prediction of opponents’ decisions, and an OR > 1 is consistent with this interpretation. IRI Perspective Taking: OR = 1.296, p = 0.4713 in M2 (OR = 1.497 in M3). Cognitive empathy is directly relevant to inferring opponents’ intended strategies, so an OR > 1 would support the link between perspective-taking and prediction accuracy. IRI Personal Distress (M3): OR = 0.468, p = 0.078. Self-oriented distress may interfere with accurate belief formation (OR < 1 expected). No coefficient is significant at the 5% level, and with EPV ≤ 4 in every model all estimates carry substantial uncertainty.


5.2 — Stag choice under Hare signal

Warning: Small-sample note

This analysis restricts the sample to participants who received a T signal (Hare) from their opponent (opp_signal_received = T). The dependent variable is whether they nonetheless chose D (Stag) — the Pareto-dominant action — despite receiving a Hare signal. Given the small subsample size, EPV may be below 10 and Firth penalised logit is applied automatically.

Models

\[ \begin{aligned} \text{M1:} \quad & \text{logit}\,P(D) = \beta_0 + \beta_1\,\text{quiz\_err} \\ \text{M2:} \quad & \text{logit}\,P(D) = \beta_0 + \beta_1\,\text{quiz\_err} + \beta_2\,\text{CRT} \\ \text{M3:} \quad & \text{logit}\,P(D) = \beta_0 + \beta_1\,\text{CRT} \end{aligned} \]

Sample: opp_signal_received = T only. n = 9, events = 4. EPV: M1 = 4, M2 = 2, M3 = 4. All estimated with Firth penalised logit.

Results

SH — Stag choice when opponent signals T (Hare)
DV = choice2=D | opp_signal=T. n=9, events=4. EPV: M1=4, M2=2, M3=4. All Firth penalised logit (brglm2).¹

| Predictor | β | SE | z | p | OR | OR 2.5% | OR 97.5% |
|---|---|---|---|---|---|---|---|
| M1: Quiz only | | | | | | | |
| Quiz errors [log(1+x)] | -0.263 | 0.877 | -0.30 | 0.764 | 0.77 | 0.14 | 4.28 |
| M2: Quiz + CRT | | | | | | | |
| Quiz errors [log(1+x)] | 0.099 | 1.036 | 0.10 | 0.924 | 1.10 | 0.14 | 8.41 |
| CRT score (0–4) | 0.834 | 1.006 | 0.83 | 0.407 | 2.30 | 0.32 | 16.54 |
| M3: CRT only | | | | | | | |
| CRT score (0–4) | 0.952 | 0.955 | 1.00 | 0.319 | 2.59 | 0.40 | 16.83 |

¹ Firth penalised logit used for all models (EPV < 10). OR > 1 → increases P(choose Stag | opp signals T). 95% Wald CI.
Figure 15: Stag choice when opponent signals T (Hare) — M1 (quiz only), M2 (quiz + CRT), M3 (CRT only). Odds ratios with 95% CI. OR > 1 increases P(choose Stag). All Firth. x-axis log scale.
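The odds ratios and 95% Wald intervals in the results table follow from β and SE alone. The helper `wald_or` below is hypothetical and simply applies exp(β ± z·SE) to the rounded values printed in the table, so agreement is only up to rounding.

```r
# Wald confidence interval on the odds-ratio scale.
wald_or <- function(beta, se, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)  # about 1.96 for a 95% interval
  exp(c(OR = beta, lower = beta - z * se, upper = beta + z * se))
}

m1_quiz <- wald_or(-0.263, 0.877)  # M1: quiz errors
m3_crt  <- wald_or(0.952, 0.955)   # M3: CRT score

print(round(rbind(m1_quiz, m3_crt), 2))
```

Both intervals span more than an order of magnitude and include 1, which reflects the nine-observation subsample rather than any property of the predictors.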
Note: Interpretation

Quiz errors (M2: OR = 1.10, p = 0.924): a higher error rate on the comprehension quiz may reflect lower understanding of the game, potentially reducing willingness to attempt the risky Stag choice after the opponent has signalled Hare (T). The estimate is close to 1 and changes direction between M1 (OR = 0.77) and M2, so the data support neither direction. CRT score (M2: OR = 2.30, p = 0.407): more reflective thinkers may be better at recognising that Stag is not a best response to an opponent who has announced Hare (0 € instead of 14 € if the opponent follows through), which would imply OR < 1. The point estimate goes the other way (OR > 1) but is far from significant. With n = 9 and EPV between 2 and 4, all estimates are exploratory and should be interpreted with caution.