MP — Matching Pennies

GT Behaviour · GTEMO Experiment

Author

Eric Guerci

Published

March 22, 2026

Payoff matrix — Row payoff, Column payoff (€)

	T (col)	D (col)
T (row)	24, 0	0, 24
D (row)	0, 24	24, 0

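Constant-sum game (strategically equivalent to a zero-sum game): the two payoffs always sum to 24 €. P1 (row) wins when both players choose the same action; P2 (col) wins when they differ. There is no pure-strategy Nash equilibrium, because in every cell the losing player gains by switching. The unique NE is in mixed strategies (each player plays T with probability 0.5 and D with probability 0.5).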
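As a quick check of the mixed-strategy claim, the following R sketch computes each player's expected payoff against a 50/50 opponent. The object names (`row_payoff`, `col_payoff`, `p`, `q`) are illustrative and are not taken from the analysis code.

```r
# Row player's payoffs (EUR) from the matrix above:
# rows = row player's action, columns = column player's action.
row_payoff <- matrix(c(24, 0,
                        0, 24), nrow = 2, byrow = TRUE,
                     dimnames = list(row = c("T", "D"), col = c("T", "D")))
col_payoff <- 24 - row_payoff        # payoffs always sum to 24

p <- c(T = 0.5, D = 0.5)             # row player's mix
q <- c(T = 0.5, D = 0.5)             # column player's mix

eu_row <- drop(row_payoff %*% q)     # row player's expected payoff per action
eu_col <- drop(p %*% col_payoff)     # column player's expected payoff per action
joint  <- outer(p, q)                # joint outcome probabilities under independence

eu_row
eu_col
joint
```

Both actions yield the same expected payoff for each player, so neither can gain by deviating from the 50/50 mix, and every joint outcome has probability 0.25.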

1 — Outcomes & P1 Win

Objective

Describe the joint outcomes reached by each couple in Part 1 and Part 2, and test whether the observed distribution deviates from the uniform prediction implied by the mixed-strategy Nash equilibrium (\(p = 0.25\) per outcome).

Outcome labels (MP)

(T,T) — P1 wins: both choose T → P1 gets 24 €, P2 gets 0 €
(D,D) — P1 wins: both choose D → P1 gets 24 €, P2 gets 0 €
(T,D) — P2 wins: choices differ → P1 gets 0 €, P2 gets 24 €
(D,T) — P2 wins: choices differ → P1 gets 0 €, P2 gets 24 €

No outcome is a pure-strategy Nash equilibrium. The unique NE is mixed (50/50).

Outcome distributions

Show code

p_eq_mp

Figure 1: MP — Outcome distributions in Part 1 (top) and Part 2 (bottom).

Test: are joint outcomes uniformly distributed?

Null hypothesis (NE prediction)

Under the unique mixed-strategy Nash equilibrium each player randomises independently with \(p(\text{T}) = p(\text{D}) = 0.50\). Joint independence implies that each of the four outcomes is equally likely:

\[H_0: P(\text{T,T}) = P(\text{D,D}) = P(\text{T,D}) = P(\text{D,T}) = 0.25\]

A significant \(\chi^2\) goodness-of-fit test against this uniform distribution indicates that observed play deviates from the NE prediction.

Show code

tab_obs_exp_mp |>
  gt::gt(groupname_col = "Part") |>
  gt::tab_header(
    title    = "MP — Observed vs expected outcome counts",
    subtitle = "Expected = N × 0.25 under uniform NE prediction"
  ) |>
  gt::cols_label(Outcome = "Outcome", Observed = "Observed",
                 Expected = "Expected (NE)", N = "N couples") |>
  gt::cols_hide(N) |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_column_labels()) |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_row_groups())

MP — Observed vs expected outcome counts
Expected = N × 0.25 under uniform NE prediction
Outcome	Observed	Expected (NE)
Part 1
(D,D)	3	3.75
(D,T)	6	3.75
(T,D)	3	3.75
(T,T)	3	3.75
Part 2
(D,D)	0	3.75
(D,T)	6	3.75
(T,D)	3	3.75
(T,T)	6	3.75

Show code

tab_chi_uniform_mp |>
  gt::gt() |>
  gt::tab_header(
    title    = "MP — χ² goodness-of-fit vs uniform distribution",
    subtitle = "H₀: each joint outcome has probability 0.25 (mixed-strategy NE). p-value Monte Carlo (B = 9 999)."
  ) |>
  gt::cols_label(Part = "Phase", N = "N", chi2 = "χ²",
                 df = "df", p_sim = "p (sim.)", note = "Note") |>
  gt::fmt_number(columns = chi2, decimals = 3) |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_body(columns = p_sim,
                                           rows = p_sim < 0.05)) |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_column_labels())

MP — χ² goodness-of-fit vs uniform distribution
H₀: each joint outcome has probability 0.25 (mixed-strategy NE). p-value Monte Carlo (B = 9 999).
Phase	N	χ²	df	p (sim.)	Note
Part 1	15	1.800	NA	0.6999	Expected per cell = 3.75 — p-value Monte Carlo (B = 9 999)
Part 2	15	6.600	NA	0.0963	Expected per cell = 3.75 — p-value Monte Carlo (B = 9 999)
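The χ² statistics above can be reproduced from the observed counts with base R's `chisq.test()`. This is a minimal sketch rather than the report's own code: the list name `obs` and the seed are assumptions, so the simulated p-values will differ slightly from the reported ones. With `simulate.p.value = TRUE`, `chisq.test()` returns `NA` for the degrees of freedom, which is why the df column is empty.

```r
# Observed outcome counts per part, copied from the table above.
obs <- list(
  "Part 1" = c("(D,D)" = 3, "(D,T)" = 6, "(T,D)" = 3, "(T,T)" = 3),
  "Part 2" = c("(D,D)" = 0, "(D,T)" = 6, "(T,D)" = 3, "(T,T)" = 6)
)

set.seed(1)  # arbitrary seed (assumption); only affects the simulated p-values
res <- lapply(obs, function(x) {
  chisq.test(x, p = rep(0.25, 4), simulate.p.value = TRUE, B = 9999)
})

chi2  <- sapply(res, function(r) unname(r$statistic))  # test statistics
p_sim <- sapply(res, function(r) r$p.value)            # Monte Carlo p-values
df    <- sapply(res, function(r) unname(r$parameter))  # NA under simulation

data.frame(Part = names(obs), chi2 = chi2, df = df, p_sim = round(p_sim, 4))
```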

Conditioning on session gender

Show code

tab_cond_coord_mp |>
  gt::gt() |>
  gt::tab_header(title    = "MP — P1 Win and choice T by session gender",
                 subtitle = "χ² with Monte Carlo simulated p-value (B = 2000), couple level") |>
  gt::cols_label(Outcome = "Outcome", Factor = "Factor", Test = "χ²(sim.) test") |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_column_labels()) |>
  gt::opt_stylize(style = 1) |>
  gt::tab_options(table.font.size = 13)
p_eq_cond_mp

MP — P1 Win and choice T by session gender
χ² with Monte Carlo simulated p-value (B = 2000), couple level
Outcome	Factor	χ²(sim.) test
Coordination Part 2	Session gender	χ²(sim.): p = 0.317 ns
Coordination Part 1	Session gender	χ²(sim.): p = 1.000 ns
Mutual choice T Part 2	Session gender	χ²(sim.): p = 0.323 ns
Mutual choice T Part 1	Session gender	χ²(sim.): p = 1.000 ns

Figure 2: MP — Outcome distribution by session gender. Left: Part 1; Right: Part 2. Proportions are within-group (Male session vs Female session).

2 — Choice & signal distributions

Objective

Describe the marginal distributions of choices and signals (Part 1 choice → signal sent → Part 2 choice), and examine how the opponent’s signal shapes Part 2 behaviour. All proportions use exact 95% Clopper-Pearson CIs.

Distributions table

Show code

tab_mp

MP — Choice and signal distributions
95% Clopper-Pearson CI
Choice / Signal	n	N	%	95% CI
Part 1
T	15	30	50.0%	[31.3%, 68.7%]
D	15	30	50.0%	[31.3%, 68.7%]
Signal
T	19	30	63.3%	[43.9%, 80.1%]
D	11	30	36.7%	[19.9%, 56.1%]
Part 2
T	21	30	70.0%	[50.6%, 85.3%]
D	9	30	30.0%	[14.7%, 49.4%]
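The exact Clopper-Pearson limits in the table come from beta quantiles. The helper below is a sketch of that formula (the function name `clopper_pearson` is hypothetical); it should agree with the interval that base R's `binom.test()` reports.

```r
# Exact (Clopper-Pearson) confidence interval for a binomial proportion.
clopper_pearson <- function(x, n, conf = 0.95) {
  alpha <- 1 - conf
  lower <- if (x == 0) 0 else qbeta(alpha / 2, x, n - x + 1)
  upper <- if (x == n) 1 else qbeta(1 - alpha / 2, x + 1, n - x)
  c(lower = lower, upper = upper)
}

ci_part2_T <- clopper_pearson(21, 30)             # choice T in Part 2: 21 of 30
ci_builtin <- as.numeric(binom.test(21, 30)$conf.int)

round(100 * ci_part2_T, 1)                        # compare with the Part 2 row for T
```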

Proportions by phase

Show code

p_mp_dist

Figure 3: MP — Proportions of each choice/signal type with 95% Clopper-Pearson CIs. Left: Part 1 choices; centre: signals sent; right: Part 2 choices.

Part 1 → Part 2 snapshot

Choice T in Part 1: 50.0%. The dominant signal was T (63.3%). Choice T in Part 2: 70.0%. In MP there is no dominant strategy: each player’s best response depends on what the opponent plays, so the NE prescribes a 50/50 mix. A deviation from 50% T can reflect individual bias, the influence of pre-play communication, or sampling variation: with N = 30, the Part 2 interval [50.6%, 85.3%] only just excludes 50%.

Within-subject choice shift: McNemar test (Part 1 vs Part 2)

Show code

tab_mcnemar_mp |>
  gt::gt() |>
  gt::tab_header(
    title    = "McNemar test — MP: choice_1_T vs choice_2_T",
    subtitle = "Paired within-individual"
  ) |>
  gt::cols_label(statistic = "χ²", p_value = "p-value", n = "n", note = "Note") |>
  gt::fmt_number(columns = c(statistic, p_value), decimals = 4) |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_body(columns = p_value,
                                           rows = !is.na(p_value) & p_value < 0.05)) |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_column_labels())

McNemar test — MP: choice_1_T vs choice_2_T
Paired within-individual
χ²	p-value	n	Note
2.5000	0.1138	30	OK

Note

A significant McNemar result (p < 0.05) would indicate that pre-play cheap talk systematically shifted individual rates of choice T. That would be notable in MP, where theory predicts that signals have no effect because players have strictly opposing interests and no credible commitment is possible. Here the test is not significant (χ² = 2.50, p = 0.114), so the rise in choice T from 50.0% in Part 1 to 70.0% in Part 2 cannot be distinguished from chance at the 5% level.
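The reported statistic can be reconstructed from figures elsewhere in this report. The 2×2 table below is inferred, not taken from the raw data: 15 of 30 chose T in Part 1, 21 of 30 chose T in Part 2, and 10 participants changed their choice, which implies 8 switches from D to T and 2 from T to D. Under that assumption, base R's `mcnemar.test()` with its default continuity correction gives the same χ² and p-value.

```r
# Inferred paired table (assumption): rows = Part 1 choice, columns = Part 2 choice.
tab <- matrix(c(13, 2,    # Part 1 = T: stayed T, switched to D
                 8, 7),   # Part 1 = D: switched to T, stayed D
              nrow = 2, byrow = TRUE,
              dimnames = list(part1 = c("T", "D"), part2 = c("T", "D")))

res <- mcnemar.test(tab)  # continuity correction is applied by default

# Statistic by hand: (|b - c| - 1)^2 / (b + c), with b = 2 and c = 8.
chi2_manual <- (abs(2 - 8) - 1)^2 / (2 + 8)

c(chi2 = unname(res$statistic), p = res$p.value, chi2_manual = chi2_manual)
```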

Opponent signal and the information set before Part 2

Important

Decision sequence. After sending their own signal and before making the Part 2 choice, each player observes the opponent’s signal. The Part 2 decision is taken with a two-dimensional information set: (own signal sent) × (opponent’s signal received). The four possible information sets are: T/T, T/D, D/T, D/D.

Show code

p_sig_heatmap_mp

Figure 4: MP — Joint distribution of own signal × opponent signal received. Values show count and share.

Show code

p_choice2_by_oppsig_mp

Figure 5: MP — P(choice₂ = T) stratified by opponent’s signal. 95% Clopper-Pearson CI.

Cheap talk in MP: signal has no credible effect on Part 2 behaviour

In Matching Pennies there is no dominant strategy — each player’s optimal action depends entirely on what the opponent does, which is exactly why only a mixed-strategy NE exists. This also means signals cannot be credible: P1 wants to match P2’s action, while P2 wants to mismatch P1’s. Any signal P1 sends gives P2 information to exploit (play the opposite), so a rational P1 should never truthfully reveal their intended action. By the same logic, a rational P2 should not trust a T signal as evidence that P1 will choose T. Near-equal choice T rates conditional on receiving a T vs D signal would confirm this theoretical prediction: the opponent’s signal carries no useful information in equilibrium.

Show code

p_choice2_infoset_mp

Figure 6: MP — P(choice₂ = T) by full information set (own/opp). Colour = own signal. 95% CI. Information sets with no observations are omitted.

MP interpretation

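The critical test is the (T/T) information set, where both players have announced T. Even here, choice T in Part 2 need not approach 100%: if signals were believed, P1 (who wins by matching) would play T but P2 (who wins by mismatching) would play D. This is consistent with the theoretical prediction that cheap talk is non-binding and strategic distrust persists. When the opponent signals D, choice T is lower than the overall 70.0%: 6 of the 11 players who received a D signal chose T (see Section 5.2).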

Show code

p_follow_opp_mp

Figure 7: MP — Proportion of players whose Part 2 choice matches the opponent’s signal received.

Conditioning on gender and role

Show code

tab_cond_sig_mp |>
  gt::gt() |>
  gt::tab_header(title = "MP — Choice and signal distributions by gender and role",
                 subtitle = "χ² with Monte Carlo simulated p-value (B = 2000)") |>
  gt::cols_label(Outcome = "Outcome", Factor = "Factor", Test = "χ²(sim.) test") |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_column_labels()) |>
  gt::opt_stylize(style = 1) |>
  gt::tab_options(table.font.size = 13)

MP — Choice and signal distributions by gender and role
χ² with Monte Carlo simulated p-value (B = 2000)
Outcome	Factor	χ²(sim.) test
Part 1 = T	Gender	χ²(sim.): p = 0.714 ns
Part 1 = T	Role	χ²(sim.): p = 0.481 ns
Signal = T	Gender	χ²(sim.): p = 0.059 ns
Signal = T	Role	χ²(sim.): p = 1.000 ns
Part 2 = T	Gender	χ²(sim.): p = 0.454 ns
Part 2 = T	Role	χ²(sim.): p = 0.415 ns

Show code

p_cond_sig_mp

Figure 8: MP — P(Signal = T) by gender and role. Error bars = 95% Clopper-Pearson CI. Dashed line = 50%.

3 — Signal honesty & consistency

Objective

Examine whether signals are honest (= same as the action eventually taken in Part 2) and consistent with Part 1 choices. Assess the prevalence of strategy switches between Part 1 and Part 2.

Honesty and consistency proportions

Show code

tab_sec2_mp |>
  gt::gt() |>
  gt::tab_header(title    = "MP — Signal honesty and consistency",
                 subtitle = "95% Clopper-Pearson CI. Each row is a binary indicator (1 = yes, 0 = no).") |>
  gt::cols_label(variable = "Measure", n = "n (=1)", N = "N",
                 pct = "%", ci95 = "95% CI") |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_column_labels()) |>
  gt::tab_footnote(
    footnote = "1 if the signal sent equals the Part 2 choice (e.g. sent T and chose T in Part 2). Measures whether players followed through on their signal.",
    locations = gt::cells_body(columns = variable, rows = 1)
  ) |>
  gt::tab_footnote(
    footnote = "1 if the signal sent equals the Part 1 choice (e.g. signalled T and had also chosen T in Part 1). Measures whether the signal reflects past behaviour — independent of Part 2.",
    locations = gt::cells_body(columns = variable, rows = 2)
  ) |>
  gt::tab_footnote(
    footnote = "1 if the Part 2 choice differs from the Part 1 choice (choice1 \u2260 choice2). Measures switching behaviour across rounds, independent of the signal. N may differ from rows 1\u20132 due to missing values in different variables.",
    locations = gt::cells_body(columns = variable, rows = 3)
  ) |>
  gt::tab_options(table.font.size = 13)

MP — Signal honesty and consistency
95% Clopper-Pearson CI. Each row is a binary indicator (1 = yes, 0 = no).
Measure	n (=1)	N	%	95% CI
Signal honest (signal = choice2)¹	20	30	66.7%	[47.2%, 82.7%]
Signal consistent with Part 1 (signal = choice1)²	20	30	66.7%	[47.2%, 82.7%]
Strategy changed Part 1 → Part 2 (choice1 ≠ choice2)³	10	30	33.3%	[17.3%, 52.8%]
¹ 1 if the signal sent equals the Part 2 choice (e.g. sent T and chose T in Part 2). Measures whether players followed through on their signal.
² 1 if the signal sent equals the Part 1 choice (e.g. signalled T and had also chosen T in Part 1). Measures whether the signal reflects past behaviour — independent of Part 2.
³ 1 if the Part 2 choice differs from the Part 1 choice (choice1 ≠ choice2). Measures switching behaviour across rounds, independent of the signal. N may differ from rows 1–2 due to missing values in different variables.

Binomial test: signal honesty vs 50%

Show code

tab_binom_honest_mp |>
  gt::gt() |>
  gt::tab_header(title    = "Binomial test: P(Honest) vs H₀ = 0.50",
                 subtitle = "Two-sided test; 95% Clopper-Pearson CI") |>
  gt::cols_label(x = "n honest", n = "N", pct = "%", ci95 = "95% CI",
                 p_value = "p-value") |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_body(columns = p_value,
                                           rows = p_value < 0.05)) |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_column_labels())

Binomial test: P(Honest) vs H₀ = 0.50
Two-sided test; 95% Clopper-Pearson CI
n honest	N	%	95% CI	p-value
20	30	66.7%	[47.2%, 82.7%]	0.0987
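The p-value in this table follows directly from the counts (20 honest signals out of 30). A minimal sketch with base R's `binom.test()`, checked against the tail probability computed by hand (the object names are illustrative):

```r
# Two-sided exact binomial test of P(Honest) = 0.50 with 20 successes in 30 trials.
bt <- binom.test(x = 20, n = 30, p = 0.5, alternative = "two.sided")

# Under p = 0.5 the distribution is symmetric, so the two-sided p-value is
# twice the upper tail P(X >= 20).
p_manual <- 2 * pbinom(19, size = 30, prob = 0.5, lower.tail = FALSE)

round(c(p_binom_test = bt$p.value, p_manual = p_manual), 4)
```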

Note

In MP, a signal is honest if the player’s Part 2 action matches what they signalled. Unlike PD, where a D→D case is “trivially honest” because D is dominant, in MP no action is dominant. More importantly, honesty in a zero-sum game has an adverse strategic consequence: a player who honestly signals T and then plays T reveals their intention to an opponent who can exploit it. Systematic lying is no better, because an opponent who expects the signal to be reversed can exploit it in the same way. In equilibrium the signal must therefore be uninformative about the action, which gives the 50% honesty benchmark tested above. An honesty rate well above 50% would indicate that players reveal exploitable information, perhaps due to social norms, experimenter demand, or misunderstanding of the zero-sum structure. The observed rate is 66.7% (20 of 30), which is not significantly different from 50% (p = 0.0987).

Strategic transition heatmaps

Show code

p_sankey_mp

Figure 9: MP — Strategic transitions. Left: Part 1 → Signal (row %: conditional on Part 1 choice). Right: Signal → Part 2 (row %: conditional on signal sent).

Conditioning on gender and role

Show code

tab_cond_honest_mp |>
  gt::gt() |>
  gt::tab_header(title    = "MP — Signal honesty and strategy change by gender and role",
                 subtitle = "χ² with Monte Carlo simulated p-value (B = 2000)") |>
  gt::cols_label(Outcome = "Outcome", Factor = "Factor", Test = "χ²(sim.) test") |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_column_labels()) |>
  gt::opt_stylize(style = 1) |>
  gt::tab_options(table.font.size = 13)

MP — Signal honesty and strategy change by gender and role
χ² with Monte Carlo simulated p-value (B = 2000)
Outcome	Factor	χ²(sim.) test
Signal honest	Gender	χ²(sim.): p = 1.000 ns
Signal honest	Role	χ²(sim.): p = 0.697 ns
Strategy changed	Gender	χ²(sim.): p = 1.000 ns
Strategy changed	Role	χ²(sim.): p = 1.000 ns

Show code

p_cond_honest_mp

Figure 10: MP — P(Honest) by gender (left) and role (right). Error bars = 95% Clopper-Pearson CI. Dashed line = 50%.

4 — Belief accuracy & bonus

The two belief questions

After Part 2, each player answered two incentivised questions about their beliefs:

Belief 1 — First-order belief: “What do you think your opponent chose in Part 2?” (T or D). Scored correct (GT_right_guess1 = 1) if the player’s prediction matched the opponent’s actual Part 2 choice. Bonus: +2€ if correct.

Belief 2 — Second-order belief: “What do you think your opponent believes you chose in Part 2?” (T or D). Scored correct (GT_right_guess2 = 1) if the player correctly identified what the opponent believed about the player’s own choice. Bonus: +2€ if correct.

The belief accuracy score = GT_right_guess1 + GT_right_guess2 ∈ {0, 1, 2}. The belief bonus = score × 2€ ∈ {0€, 2€, 4€}.
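The scoring rule can be written out in a few lines of R. The four rows below are made-up illustrative values, not observations from the experiment; only the column names `GT_right_guess1` and `GT_right_guess2` come from the text, while `score` and `bonus_eur` are hypothetical names.

```r
# Illustrative data only: one row per player, 1 = belief correct, 0 = wrong.
beliefs <- data.frame(
  GT_right_guess1 = c(1, 0, 1, 0),   # first-order belief correct?
  GT_right_guess2 = c(1, 1, 0, 0)    # second-order belief correct?
)

beliefs$score     <- beliefs$GT_right_guess1 + beliefs$GT_right_guess2  # 0, 1 or 2
beliefs$bonus_eur <- 2 * beliefs$score                                   # 0, 2 or 4 EUR

beliefs
```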

Objective

Describe the distribution of belief accuracy scores (0 = both beliefs wrong; 1 = one correct; 2 = both correct) and the associated belief bonus payoff. Assess whether belief accuracy is correlated with P1 Win outcomes at the couple level.

Belief accuracy distribution

Show code

p_belief_bar_mp

Figure 11: MP — Distribution of belief accuracy scores.

Hypothesis tests: beliefs, cognitive ability & P1 Win

MP — Belief accuracy: hypothesis comparison
H1–H2: individual level. H3: couple level. Stat = Spearman ρ or φ (phi).
	Level	Hypothesis	X	Y	Test	Expected	Stat	p	n
H1	Individual	Reflective thinkers (high CRT) predict opponent’s choice more accurately	CRT score (0–4)	Belief accuracy (0–2)	Spearman ρ	Positive	-0.275	0.141	30
H2	Individual	Players who made more quiz errors have less accurate beliefs	Quiz errors [log(1+x)]	Belief accuracy (0–2)	Spearman ρ	Negative	0.046	0.809	30
H3	Couple	Couples where both players have perfect beliefs more often result in P1 Win in Part 2	P1 Win Part 2 (0/1)	Both perfect beliefs (0/1)	Fisher exact + φ¹	Ambiguous	0.327	0.400	15
¹ H3 stat = phi coefficient (φ); p-value from two-sided Fisher exact test on the 2×2 contingency table (n = 15 couples).

Note

H1 tests whether more reflective players (higher CRT) are better at predicting their opponent’s Part 2 choice: if strategic reasoning drives belief formation, a positive Spearman ρ is expected. H2 tests whether players who struggled with game comprehension (more quiz errors) hold less accurate beliefs, so the expected direction is negative. H1 and H2 are at the individual level. H3, the only couple-level hypothesis, tests whether couples in which both players had perfect beliefs were more likely to end in a P1 Win in Part 2; it uses the Fisher exact test given the small N and binary outcomes. The expected direction for H3 is ambiguous in MP: if P1 correctly predicts P2’s action, P1 can match and win, but if P2 also predicts correctly, P2 can mismatch and P1 loses. Mutually correct beliefs cannot coexist with both players best-responding to them, since that would amount to a pure-strategy NE, which does not exist in MP.

None of the three tests is significant. The point estimates for H1 (ρ = -0.275, p = 0.141) and H2 (ρ = 0.046, p = 0.809) both have the opposite sign to the expected one, and H3 gives φ = 0.327 (p = 0.400, n = 15 couples).

Conditioning on gender and role

Show code

p_cond_belief_mp

Figure 12: MP — Belief accuracy score (0/1/2) by gender (left) and role (right). Bars show proportion within each group; labels show % and count. Score 0 = both beliefs wrong, 1 = one correct, 2 = both correct.

5 — Econometric models

5.1 — Determinants of belief accuracy

Estimate an ordered logit (MASS::polr) for the belief accuracy score (0 = both beliefs wrong; 1 = one correct; 2 = both correct) across three nested specifications adding Theory of Mind (MASC) and empathy/perspective-taking (IRI) measures. The outcome is treated as an ordered categorical variable: higher scores reflect better first- and second-order belief formation.

Small-sample caveat

With n = 30 observations and only 6 in the smallest boundary category (min of score 0 and score 2), the events-per-variable (EPV) for M2 is 3 and for M3 is 2 — well below the recommended minimum of 10. polr uses standard MLE and may be biased with small boundary cells. All results are exploratory and should not be used for causal inference.

Model specifications

\[ \begin{aligned} \text{M1:} \quad & \text{logit}\,P(Y \leq j) = \alpha_j - \beta_1\,\text{MASC}_z \\ \text{M2:} \quad & \text{logit}\,P(Y \leq j) = \alpha_j - \beta_1\,\text{MASC}_z - \beta_2\,\text{IRI-PT}_z \\ \text{M3:} \quad & \text{logit}\,P(Y \leq j) = \alpha_j - \beta_1\,\text{MASC}_z - \beta_2\,\text{IRI-PT}_z - \beta_3\,\text{IRI-PD}_z \end{aligned} \]

\(Y\) = belief accuracy score ∈ {0, 1, 2}; \(\alpha_j\) = threshold for category \(j\). MASC = total Theory of Mind score; IRI-PT = Perspective Taking subscale; IRI-PD = Personal Distress subscale. All predictors are z-standardised. OR > 1 indicates a higher probability of achieving a higher accuracy score.
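A minimal sketch of how the first two specifications could be fitted with `MASS::polr`. Everything here is synthetic: the data are simulated with made-up coefficients and a sample size of 200 chosen only so that the example fits cleanly, and the column names (`masc_z`, `iri_pt_z`, `score`) are assumptions rather than the names used in the study's dataset.

```r
set.seed(42)                        # arbitrary seed for the synthetic data
n <- 200                            # synthetic sample size, NOT the study's n = 30
d <- data.frame(masc_z   = as.numeric(scale(rnorm(n))),
                iri_pt_z = as.numeric(scale(rnorm(n))))

# Latent-variable data-generating process with made-up coefficients.
latent  <- 0.8 * d$masc_z + 0.3 * d$iri_pt_z + rlogis(n)
d$score <- cut(latent, breaks = c(-Inf, -1, 1, Inf),
               labels = c("0", "1", "2"), ordered_result = TRUE)

# Proportional-odds models; Hess = TRUE keeps the Hessian for standard errors.
m1 <- MASS::polr(score ~ masc_z, data = d, Hess = TRUE)
m2 <- MASS::polr(score ~ masc_z + iri_pt_z, data = d, Hess = TRUE)

coefs      <- coef(summary(m2))                    # slopes first, then thresholds
p_val      <- 2 * pnorm(-abs(coefs[, "t value"]))  # two-tailed z-test on t
odds_ratio <- exp(coef(m2))                        # OR > 1: higher scores more likely

round(cbind(coefs, p = p_val), 3)
round(odds_ratio, 3)
```

Because `polr` parameterises the model as logit P(Y ≤ j) = α_j − βx, a positive β (OR > 1) shifts probability towards higher score categories, matching the sign convention in the equations above.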

Coefficient table

Show code

tab_olog_gt

MP — Ordered logit: determinants of belief accuracy
DV = belief accuracy score (0/1/2, ordered). n = 30 (score 0: n=6; 1: n=17; 2: n=7). MASC/IRI z-scored. OR > 1 increases probability of higher accuracy.¹
Predictor	β	SE	t	p	OR	OR 2.5%	OR 97.5%
M1: MASC only
MASC ToM score (z)	-0.459	0.386	-1.190	0.2339	0.632	0.297	1.346
M2: MASC + IRI-PT
MASC ToM score (z)	-0.504	0.394	-1.279	0.2010	0.604	0.268	1.289
IRI Perspective Taking (z)	-0.203	0.377	-0.539	0.5900	0.816	0.383	1.714
M3: MASC + IRI-PT + IRI-PD
MASC ToM score (z)	-0.432	0.408	-1.059	0.2895	0.649	0.282	1.426
IRI Perspective Taking (z)	-0.237	0.381	-0.623	0.5335	0.789	0.366	1.666
IRI Personal Distress (z)	-0.272	0.375	-0.726	0.4679	0.762	0.355	1.572
¹ Ordered logit (proportional-odds, MASS::polr). p-values from two-tailed z-test on t-statistic. CI from profile likelihood where convergent, otherwise Wald. All models MLE; EPV < 10 — interpret cautiously.

Goodness of fit

Show code

tab_gof_olog_gt

MP — Ordered logit: goodness of fit
DV = belief accuracy score (0/1/2). EPV = min(n₀, n₂) / k.
Model	n	Predictors	EPV	AIC	McFadden R²
M1: MASC only	30	1	6	63.5	0.0248
M2: MASC + IRI-PT	30	2	3	65.2	0.0297
M3: MASC + IRI-PT + IRI-PD	30	3	2	66.7	0.0388
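The EPV column follows from the category counts reported with the coefficient table (score 0: 6, score 1: 17, score 2: 7) and the number of predictors k in each model. A short sketch of that arithmetic, with illustrative object names:

```r
score_counts <- c("0" = 6, "1" = 17, "2" = 7)  # category sizes from the coefficient table
k <- c(M1 = 1, M2 = 2, M3 = 3)                 # number of predictors per model

# EPV = size of the smaller boundary category divided by the number of predictors.
epv <- min(score_counts[c("0", "2")]) / k
epv
```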

Forest plot: odds ratios

Show code

p_forest_olog

Figure 13: MP — Ordered logit odds ratios for belief accuracy score (0/1/2). OR > 1 = higher probability of a higher belief accuracy score. Error bars = 95% CI. Dashed line = OR 1 (no effect).

Interpretation

All estimated odds ratios are below 1 and none is statistically significant, so the data provide no evidence that the measured traits predict belief accuracy. MASC ToM (M1): OR = 0.632, p = 0.2339. An OR above 1 (positive β) was expected if higher Theory of Mind ability goes with more accurate first- and second-order beliefs, consistent with MASC measuring the capacity to model others’ mental states; the point estimate goes the other way. IRI Perspective Taking (M2): OR = 0.816, p = 0.5900. Cognitive perspective-taking was likewise expected to improve belief accuracy by helping players simulate the opponent’s reasoning, but the estimate is below 1. IRI Personal Distress (M3): OR = 0.762, p = 0.4679. Higher personal distress may interfere with clear-headed belief formation, so an OR below 1 (negative β) is theoretically plausible and matches the sign of the estimate. MASC in M3: OR = 0.649, p = 0.2895. Given EPV = 2, all M3 estimates carry substantial uncertainty and should be treated as hypothesis-generating only.

5.2 — Choice T under a D signal

Small-sample note

This analysis restricts the sample to participants who received a D signal from their opponent (opp_signal_received = D). The dependent variable indicates whether they nonetheless chose T. The subsample is small (n = 11, 6 events), so EPV is below 10 in every model and Firth penalised logit is used throughout.

Models

\[ \begin{aligned} \text{M1:} \quad & \text{logit}\,P(T) = \beta_0 + \beta_1\,\text{quiz\_err} \\ \text{M2:} \quad & \text{logit}\,P(T) = \beta_0 + \beta_1\,\text{quiz\_err} + \beta_2\,\text{CRT} \\ \text{M3:} \quad & \text{logit}\,P(T) = \beta_0 + \beta_1\,\text{CRT} \end{aligned} \]

Sample: opp_signal_received = D only. n = 11, events = 6. EPV: M1 = 6, M2 = 3, M3 = 6. All estimated with Firth penalised logit.

Results

MP — Choice T when opponent signals D
DV = chose T | opp_signal = D. n = 11, events = 6. EPV: M1 = 6, M2 = 3, M3 = 6. All Firth penalised logit (brglm2).¹
Predictor	β	SE	z	p	OR	OR 2.5%	OR 97.5%
M1: Quiz only
Quiz errors [log(1+x)]	-0.052	0.559	-0.09	0.926	0.95	0.32	2.84
M2: Quiz + CRT
Quiz errors [log(1+x)]	-0.245	0.632	-0.39	0.699	0.78	0.23	2.70
CRT score (0–4)	-1.684	1.397	-1.21	0.228	0.19	0.01	2.87
M3: CRT only
CRT score (0–4)	-1.686	1.339	-1.26	0.208	0.19	0.01	2.55
¹ Firth penalised logit used for all models (EPV < 10). OR > 1 → increases P(chose T | opponent signals D). 95% Wald CI.
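The z, p, OR and Wald CI columns are all functions of β and SE. The sketch below recomputes them from the rounded values in the table; it does not refit the Firth models, and the data frame `est` and its column names are illustrative. Small differences in the last digit are expected because the inputs are rounded to three decimals.

```r
# Coefficients and standard errors copied from the results table above.
est <- data.frame(
  model = c("M1", "M2", "M2", "M3"),
  term  = c("quiz_err", "quiz_err", "CRT", "CRT"),
  beta  = c(-0.052, -0.245, -1.684, -1.686),
  se    = c(0.559, 0.632, 1.397, 1.339)
)

z_crit    <- qnorm(0.975)                       # about 1.96 for a 95% interval
est$z     <- est$beta / est$se                  # Wald z statistic
est$p     <- 2 * pnorm(-abs(est$z))             # two-sided p-value
est$or    <- exp(est$beta)                      # odds ratio
est$or_lo <- exp(est$beta - z_crit * est$se)    # lower 95% Wald limit
est$or_hi <- exp(est$beta + z_crit * est$se)    # upper 95% Wald limit

cbind(est[, c("model", "term")], round(est[, c("z", "p", "or", "or_lo", "or_hi")], 3))
```

Every interval contains 1, which is the Wald-CI counterpart of the non-significant p-values.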

Figure 14: Choice T when opponent signals D — M1 (quiz only), M2 (quiz + CRT), M3 (CRT only). Odds ratios with 95% CI. OR > 1 increases P(choose T). All Firth. x-axis log scale.

Interpretation

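Quiz errors (M2: OR = 0.78, p = 0.699). More errors on the comprehension quiz were expected to raise the probability of choosing T even after a D signal, consistent with incomplete understanding of the zero-sum structure. The point estimate is below 1, which is the opposite direction, and it is far from significant. CRT score (M2: OR = 0.19, p = 0.228). More reflective thinkers were expected to be better at strategic anticipation: recognising that an opponent who signals D may be deceiving, and responding with T based on mixed-strategy reasoning rather than naively following the signal. The estimate again points the other way (higher CRT goes with a lower probability of choosing T), with a confidence interval from 0.01 to 2.87 that includes 1. With EPV = 3 in M2, these estimates are exploratory.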