MP — Matching Pennies

GT Behaviour · GTEMO Experiment

Author

Eric Guerci

Published

March 22, 2026

Tip: Payoff matrix (row payoff, column payoff, in €)

Row \ Col | T (col) | D (col)
T (row) | 24, 0 | 0, 24
D (row) | 0, 24 | 24, 0

Constant-sum game (strategically equivalent to a zero-sum game): the two payoffs always add up to 24 €. P1 (row) wins when both players choose the same action; P2 (col) wins when they differ. There is no pure-strategy Nash equilibrium. The unique NE is in mixed strategies, with each player choosing T and D with probability 0.5.
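The 50/50 equilibrium follows from the indifference conditions: each player mixes so that the opponent is indifferent between T and D. Below is a minimal base-R sketch using only the payoffs in the matrix above. The object names (`A`, `B`, `p_star`, `q_star`) are illustrative and do not come from the report's code.

```r
# Payoffs from the matrix above (rows = P1 action, columns = P2 action)
A <- matrix(c(24, 0, 0, 24), nrow = 2, byrow = TRUE,
            dimnames = list(P1 = c("T", "D"), P2 = c("T", "D")))  # P1 payoffs
B <- 24 - A                                                       # P2 payoffs (constant sum 24)

# q = P(P2 plays T) making P1 indifferent:
# A[1,1]*q + A[1,2]*(1-q) == A[2,1]*q + A[2,2]*(1-q)
q_star <- (A[2, 2] - A[1, 2]) / (A[1, 1] - A[1, 2] - A[2, 1] + A[2, 2])

# p = P(P1 plays T) making P2 indifferent:
# B[1,1]*p + B[2,1]*(1-p) == B[1,2]*p + B[2,2]*(1-p)
p_star <- (B[2, 2] - B[2, 1]) / (B[1, 1] - B[2, 1] - B[1, 2] + B[2, 2])

# Joint outcome probabilities under independent mixing
joint <- outer(c(T = p_star, D = 1 - p_star), c(T = q_star, D = 1 - q_star))
print(c(p_star = p_star, q_star = q_star))
print(joint)
```

Both mixing probabilities come out at 0.5. Independent mixing then gives probability 0.25 to each joint outcome, which is the null hypothesis used in Section 1.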


1 — Outcomes & P1 Win

Objective

Describe the joint outcomes reached by each couple in Part 1 and Part 2, and test whether the observed distribution deviates from the uniform prediction implied by the mixed-strategy Nash equilibrium (\(p = 0.25\) per outcome).

Tip: Outcome labels (MP)
  • (T,T) — P1 wins: both choose T → P1 gets 24 €, P2 gets 0 €
  • (D,D) — P1 wins: both choose D → P1 gets 24 €, P2 gets 0 €
  • (T,D) — P2 wins: choices differ → P1 gets 0 €, P2 gets 24 €
  • (D,T) — P2 wins: choices differ → P1 gets 0 €, P2 gets 24 €

No outcome is a pure-strategy Nash equilibrium. The unique NE is mixed (50/50).

Outcome distributions

```r
p_eq_mp
```
Figure 1: MP — Outcome distributions in Part 1 (top) and Part 2 (bottom).

Test: are joint outcomes uniformly distributed?

Tip: Null hypothesis (NE prediction)

Under the unique mixed-strategy Nash equilibrium each player randomises independently with \(p(\text{T}) = p(\text{D}) = 0.50\). Joint independence implies that each of the four outcomes is equally likely:

\[H_0: P(\text{T,T}) = P(\text{D,D}) = P(\text{T,D}) = P(\text{D,T}) = 0.25\]

A significant \(\chi^2\) goodness-of-fit test against this uniform distribution indicates that observed play deviates from the NE prediction.
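The goodness-of-fit test can be reproduced in base R from the observed counts reported in the observed-vs-expected table of this section. This is a sketch, not the report's own code. The seed, and therefore the exact Monte Carlo p-values, are assumptions. The χ² statistics (1.8 and 6.6) are deterministic.

```r
# Observed joint-outcome counts per part (15 couples each)
obs <- rbind(
  `Part 1` = c(`(D,D)` = 3, `(D,T)` = 6, `(T,D)` = 3, `(T,T)` = 3),
  `Part 2` = c(`(D,D)` = 0, `(D,T)` = 6, `(T,D)` = 3, `(T,T)` = 6)
)

set.seed(2026)  # Monte Carlo p-values depend on the seed
res <- lapply(rownames(obs), function(part) {
  x <- obs[part, ]
  # H0: uniform 0.25 per outcome; the df is reported as NA when simulating
  mc <- chisq.test(x, p = rep(0.25, 4), simulate.p.value = TRUE, B = 9999)
  stat <- unname(mc$statistic)
  data.frame(Part = part, N = sum(x), chi2 = stat,
             p_asymptotic = pchisq(stat, df = 3, lower.tail = FALSE),
             p_sim = mc$p.value)
})
tab <- do.call(rbind, res)
print(tab)
```

The asymptotic p-value (3 df) is printed only for comparison. With expected counts of 3.75 per cell, the simulated p-value is the safer of the two.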

```r
# Observed vs expected (uniform NE) outcome counts, grouped by Part
tab_obs_exp_mp |>
  gt::gt(groupname_col = "Part") |>
  gt::tab_header(
    title    = "MP — Observed vs expected outcome counts",
    subtitle = "Expected = N × 0.25 under uniform NE prediction"
  ) |>
  gt::cols_label(Outcome = "Outcome", Observed = "Observed",
                 Expected = "Expected (NE)", N = "N couples") |>
  gt::cols_hide(N) |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_column_labels()) |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_row_groups())
```
MP — Observed vs expected outcome counts
Expected = N × 0.25 under uniform NE prediction

Outcome | Observed | Expected (NE)
Part 1
(D,D) | 3 | 3.75
(D,T) | 6 | 3.75
(T,D) | 3 | 3.75
(T,T) | 3 | 3.75
Part 2
(D,D) | 0 | 3.75
(D,T) | 6 | 3.75
(T,D) | 3 | 3.75
(T,T) | 6 | 3.75
```r
# χ² goodness-of-fit vs uniform; simulated p-values below 0.05 shown in bold
tab_chi_uniform_mp |>
  gt::gt() |>
  gt::tab_header(
    title    = "MP — χ² goodness-of-fit vs uniform distribution",
    subtitle = "H₀: each joint outcome has probability 0.25 (mixed-strategy NE). p-value Monte Carlo (B = 9 999)."
  ) |>
  gt::cols_label(Part = "Phase", N = "N", chi2 = "χ²",
                 df = "df", p_sim = "p (sim.)", note = "Note") |>
  gt::fmt_number(columns = chi2, decimals = 3) |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_body(columns = p_sim,
                                           rows = p_sim < 0.05)) |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_column_labels())
```
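MP — χ² goodness-of-fit vs uniform distribution
H₀: each joint outcome has probability 0.25 (mixed-strategy NE). p-value Monte Carlo (B = 9 999).

Phase | N | χ² | df | p (sim.) | Note
Part 1 | 15 | 1.800 | NA | 0.6999 | Expected per cell = 3.75 — p-value Monte Carlo (B = 9 999)
Part 2 | 15 | 6.600 | NA | 0.0963 | Expected per cell = 3.75 — p-value Monte Carlo (B = 9 999)

The df column shows NA because the p-value is simulated rather than read from the asymptotic χ² distribution, which would have 3 df. Neither part deviates significantly from the uniform NE prediction at the 5% level.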

Conditioning on session gender

```r
# Session-gender tests (table), followed by the corresponding figure
tab_cond_coord_mp |>
  gt::gt() |>
  gt::tab_header(title    = "MP — P1 Win and choice T by session gender",
                 subtitle = "χ² with Monte Carlo simulated p-value (B = 2000), couple level") |>
  gt::cols_label(Outcome = "Outcome", Factor = "Factor", Test = "χ²(sim.) test") |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_column_labels()) |>
  gt::opt_stylize(style = 1) |>
  gt::tab_options(table.font.size = 13)
p_eq_cond_mp
```
MP — P1 Win and choice T by session gender
χ² with Monte Carlo simulated p-value (B = 2000), couple level

Outcome | Factor | χ²(sim.) test
Coordination Part 2 | Session gender | χ²(sim.): p = 0.317 ns
Coordination Part 1 | Session gender | χ²(sim.): p = 1.000 ns
Mutual choice T Part 2 | Session gender | χ²(sim.): p = 0.323 ns
Mutual choice T Part 1 | Session gender | χ²(sim.): p = 1.000 ns

In this table, "Coordination" denotes a matching outcome, (T,T) or (D,D), i.e. a P1 Win. "Mutual choice T" denotes (T,T).
Figure 2: MP — Equilibrium distribution by session gender. Left: Part 1; Right: Part 2. Proportions are within-group (Male session vs Female session).

2 — Choice & signal distributions

Objective

Describe the marginal distributions of choices and signals (Part 1 choice → signal sent → Part 2 choice), and examine how the opponent’s signal shapes Part 2 behaviour. All proportions use exact 95% Clopper-Pearson CIs.
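The Clopper-Pearson interval inverts two one-sided binomial tests, which reduces to quantiles of beta distributions. The sketch below uses a hypothetical helper, `clopper_pearson`, and checks it against base R's `binom.test()`, whose `conf.int` is the same exact interval.

```r
# Exact (Clopper-Pearson) CI from beta quantiles
clopper_pearson <- function(x, n, conf = 0.95) {
  alpha <- 1 - conf
  lower <- if (x == 0) 0 else qbeta(alpha / 2, x, n - x + 1)
  upper <- if (x == n) 1 else qbeta(1 - alpha / 2, x + 1, n - x)
  c(lower = lower, upper = upper)
}

# Example: signal T was sent by 19 of 30 players
ci_manual <- clopper_pearson(19, 30)
ci_binom <- binom.test(19, 30)$conf.int
round(100 * ci_manual, 1)   # in %, comparable with the distributions table
```

Because the interval is built from the binomial distribution itself, it keeps at least nominal coverage even at N = 30, at the cost of being somewhat conservative.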

Distributions table

```r
tab_mp
```
MP — Choice and signal distributions
95% Clopper-Pearson CI

Choice / Signal | n | N | % | 95% CI
Part 1
T | 15 | 30 | 50.0% | [31.3%, 68.7%]
D | 15 | 30 | 50.0% | [31.3%, 68.7%]
Signal
T | 19 | 30 | 63.3% | [43.9%, 80.1%]
D | 11 | 30 | 36.7% | [19.9%, 56.1%]
Part 2
T | 21 | 30 | 70.0% | [50.6%, 85.3%]
D | 9 | 30 | 30.0% | [14.7%, 49.4%]

Proportions by phase

```r
p_mp_dist
```
Figure 3: MP — Proportions of each choice/signal type with 95% Clopper-Pearson CIs. Left: Part 1 choices; centre: signals sent; right: Part 2 choices.
Note: Part 1 → Part 2 snapshot

Choice T in Part 1: 50.0%. The most frequent signal was T (63.3%). Choice T in Part 2: 70.0%, with a 95% CI of [50.6%, 85.3%] that only just excludes 50%. In MP there is no dominant strategy: each player’s best response depends on what the opponent plays, so the NE prescribes a 50/50 mix. A systematic deviation from 50% T therefore points to individual bias or to the influence of pre-play communication.
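The Part 2 rate of 21/30 choice T can be compared with the 50% NE benchmark using an exact binomial test. This sketch makes a simplifying assumption: it treats the 30 choices as independent, although players are paired in 15 couples.

```r
# Part 2: 21 of 30 players chose T (distributions table)
bt <- binom.test(x = 21, n = 30, p = 0.5, alternative = "two.sided")
bt$p.value                    # about 0.043
round(100 * bt$conf.int, 1)   # Clopper-Pearson CI, in %
```

For H₀ = 0.5 the exact two-sided test and the Clopper-Pearson interval agree: the interval excludes 50% exactly when p < 0.05.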

Within-subject choice shift: McNemar test (Part 1 vs Part 2)

```r
# Paired McNemar test of choice T in Part 1 vs Part 2; p < 0.05 shown in bold
tab_mcnemar_mp |>
  gt::gt() |>
  gt::tab_header(
    title    = "McNemar test — MP: choice_1_T vs choice_2_T",
    subtitle = "Paired within-individual"
  ) |>
  gt::cols_label(statistic = "χ²", p_value = "p-value", n = "n", note = "Note") |>
  gt::fmt_number(columns = c(statistic, p_value), decimals = 4) |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_body(columns = p_value,
                                           rows = !is.na(p_value) & p_value < 0.05)) |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_column_labels())
```
McNemar test — MP: choice_1_T vs choice_2_T
Paired within-individual

χ² | p-value | n | Note
2.5000 | 0.1138 | 30 | OK
Note

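Here the test is not significant (χ² = 2.50, p = 0.114), so the rise from 50% to 70% choice T between Part 1 and Part 2 could be due to chance. A significant McNemar result (p < 0.05) would indicate a systematic within-individual shift in choice T rates after pre-play cheap talk. That would be consistent with cheap talk influencing behaviour even in MP, where theory predicts signals should have no effect because players have strictly opposing interests and no credible commitment is possible.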
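The paired 2×2 table is not printed in the report, but it can be reconstructed from reported figures. Part 1 has 15 T choices and Part 2 has 21. Section 3 reports 10 switchers. Together these imply 2 switches from T to D and 8 from D to T. The sketch below reproduces the McNemar statistic from that reconstructed table. It also shows the exact binomial version, which is often preferred when there are only 10 discordant pairs.

```r
# Rows = Part 1 choice, columns = Part 2 choice (reconstructed from reported margins)
tab <- matrix(c(13, 8, 2, 7), nrow = 2,
              dimnames = list(Part1 = c("T", "D"), Part2 = c("T", "D")))

mc <- mcnemar.test(tab, correct = TRUE)   # continuity-corrected
unname(mc$statistic)                      # (|2 - 8| - 1)^2 / 10 = 2.5
mc$p.value                                # about 0.114

# Exact alternative: binomial test on the discordant pairs only
exact <- binom.test(tab["T", "D"], tab["T", "D"] + tab["D", "T"], p = 0.5)
exact$p.value                             # 112 / 1024, about 0.109
```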

Opponent signal and the information set before Part 2

Important

Decision sequence. After sending their own signal and before making the Part 2 choice, each player observes the opponent’s signal. The Part 2 decision is taken with a two-dimensional information set: (own signal sent) × (opponent’s signal received). The four possible information sets are: T/T, T/D, D/T, D/D.

```r
p_sig_heatmap_mp
```
Figure 4: MP — Joint distribution of own signal × opponent signal received. Values show count and share.
```r
p_choice2_by_oppsig_mp
```
Figure 5: MP — P(choice₂ = T) stratified by opponent’s signal. 95% Clopper-Pearson CI.
Note: Cheap talk in MP, where signals should have no credible effect on Part 2 behaviour

In Matching Pennies there is no dominant strategy. Each player’s optimal action depends entirely on what the opponent does, which is exactly why only a mixed-strategy NE exists. This also means signals cannot be credible. P1 wants to match P2’s action, while P2 wants to mismatch P1’s, so any signal that truthfully revealed a player’s intended action would be exploited: P2 would play the opposite of P1’s announced action, and P1 would copy P2’s. A rational player should therefore never reveal their intended action. By the same logic, a rational receiver should not treat a T signal as evidence that the sender will choose T. Near-equal choice T rates after receiving a T versus a D signal would be consistent with this prediction: in equilibrium, the opponent’s signal carries no useful information.
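The observable implication above can be checked by comparing P(choice₂ = T) between players who received a T signal and players who received a D signal. The sketch uses a hypothetical helper, `choice_by_signal`, and reports Clopper-Pearson CIs and Fisher's exact test.

The counts in the usage line are not reported directly and rest on an assumption. They take the Section 5.2 subsample (11 players who received a D signal, 6 "events") to count Part 2 choice T, as its text states. The T-signal counts then follow by subtraction: 21 − 6 = 15 of the 30 − 11 = 19 players who received a T signal. Players are nested in couples, so the independence assumed by both tests is only approximate.

```r
# Hypothetical helper: P(choice2 = T) by received signal, with exact CIs and Fisher's test
choice_by_signal <- function(t_after_T, n_after_T, t_after_D, n_after_D) {
  rates <- c(after_T = t_after_T / n_after_T, after_D = t_after_D / n_after_D)
  ci_T <- binom.test(t_after_T, n_after_T)$conf.int
  ci_D <- binom.test(t_after_D, n_after_D)$conf.int
  tab <- matrix(c(t_after_T, n_after_T - t_after_T,
                  t_after_D, n_after_D - t_after_D),
                nrow = 2, byrow = TRUE,
                dimnames = list(opp_signal = c("T", "D"), choice2 = c("T", "D")))
  list(rates = rates,
       ci = rbind(after_T = as.numeric(ci_T), after_D = as.numeric(ci_D)),
       table = tab,
       fisher_p = fisher.test(tab)$p.value)
}

# Derived counts (assumption stated above)
out <- choice_by_signal(15, 19, 6, 11)
out$rates
out$fisher_p
```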

```r
p_choice2_infoset_mp
```
Figure 6: MP — P(choice₂ = T) by full information set (own/opp). Colour = own signal. 95% CI. Information sets with no observations are omitted.
Note: MP interpretation

The critical test is the T/T information set, in which both players signalled T. Even here, the rate of choice T in Part 2 may fall well below 100%, consistent with the theoretical prediction that cheap talk is non-binding and strategic distrust persists. When the opponent signals D, choice T may fall further.

```r
p_follow_opp_mp
```
Figure 7: MP — Proportion of players whose Part 2 choice matches the opponent’s signal received.

Conditioning on gender and role

```r
# Choice and signal rates tested against gender and role
tab_cond_sig_mp |>
  gt::gt() |>
  gt::tab_header(title = "MP — Choice and signal distributions by gender and role",
                 subtitle = "χ² with Monte Carlo simulated p-value (B = 2000)") |>
  gt::cols_label(Outcome = "Outcome", Factor = "Factor", Test = "χ²(sim.) test") |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_column_labels()) |>
  gt::opt_stylize(style = 1) |>
  gt::tab_options(table.font.size = 13)
```
MP — Choice and signal distributions by gender and role
χ² with Monte Carlo simulated p-value (B = 2000)

Outcome | Factor | χ²(sim.) test
Part 1 = T | Gender | χ²(sim.): p = 0.714 ns
Part 1 = T | Role | χ²(sim.): p = 0.481 ns
Signal = T | Gender | χ²(sim.): p = 0.059 ns
Signal = T | Role | χ²(sim.): p = 1.000 ns
Part 2 = T | Gender | χ²(sim.): p = 0.454 ns
Part 2 = T | Role | χ²(sim.): p = 0.415 ns
```r
p_cond_sig_mp
```
Figure 8: MP — P(Signal = T) by gender and role. Error bars = 95% Clopper-Pearson CI. Dashed line = 50%.

3 — Signal honesty & consistency

Objective

Examine whether signals are honest (i.e. match the action eventually taken in Part 2) and consistent with Part 1 choices, and assess how often players switch strategy between Part 1 and Part 2.

Honesty and consistency proportions

```r
# Honesty, consistency and switching indicators with footnoted definitions
tab_sec2_mp |>
  gt::gt() |>
  gt::tab_header(title    = "MP — Signal honesty and consistency",
                 subtitle = "95% Clopper-Pearson CI. Each row is a binary indicator (1 = yes, 0 = no).") |>
  gt::cols_label(variable = "Measure", n = "n (=1)", N = "N",
                 pct = "%", ci95 = "95% CI") |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_column_labels()) |>
  gt::tab_footnote(
    footnote = "1 if the signal sent equals the Part 2 choice (e.g. sent T and chose T in Part 2). Measures whether players followed through on their signal.",
    locations = gt::cells_body(columns = variable, rows = 1)
  ) |>
  gt::tab_footnote(
    footnote = "1 if the signal sent equals the Part 1 choice (e.g. signalled T and had also chosen T in Part 1). Measures whether the signal reflects past behaviour — independent of Part 2.",
    locations = gt::cells_body(columns = variable, rows = 2)
  ) |>
  gt::tab_footnote(
    footnote = "1 if the Part 2 choice differs from the Part 1 choice (choice1 \u2260 choice2). Measures switching behaviour across rounds, independent of the signal. N may differ from rows 1\u20132 due to missing values in different variables.",
    locations = gt::cells_body(columns = variable, rows = 3)
  ) |>
  gt::tab_options(table.font.size = 13)
```
MP — Signal honesty and consistency
95% Clopper-Pearson CI. Each row is a binary indicator (1 = yes, 0 = no).

Measure | n (=1) | N | % | 95% CI
Signal honest (signal = choice2) [1] | 20 | 30 | 66.7% | [47.2%, 82.7%]
Signal consistent with Part 1 (signal = choice1) [2] | 20 | 30 | 66.7% | [47.2%, 82.7%]
Strategy changed Part 1 → Part 2 (choice1 ≠ choice2) [3] | 10 | 30 | 33.3% | [17.3%, 52.8%]

[1] 1 if the signal sent equals the Part 2 choice (e.g. sent T and chose T in Part 2). Measures whether players followed through on their signal.
[2] 1 if the signal sent equals the Part 1 choice (e.g. signalled T and had also chosen T in Part 1). Measures whether the signal reflects past behaviour, independent of Part 2.
[3] 1 if the Part 2 choice differs from the Part 1 choice (choice1 ≠ choice2). Measures switching behaviour across rounds, independent of the signal. N may differ from rows 1–2 due to missing values in different variables.

Binomial test: signal honesty vs 50%

```r
# Exact binomial test of the honesty rate against 0.50; p < 0.05 in bold
tab_binom_honest_mp |>
  gt::gt() |>
  gt::tab_header(title    = "Binomial test: P(Honest) vs H₀ = 0.50",
                 subtitle = "Two-sided test; 95% Clopper-Pearson CI") |>
  gt::cols_label(x = "n honest", n = "N", pct = "%", ci95 = "95% CI",
                 p_value = "p-value") |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_body(columns = p_value,
                                           rows = p_value < 0.05)) |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_column_labels())
```
Binomial test: P(Honest) vs H₀ = 0.50
Two-sided test; 95% Clopper-Pearson CI

n honest | N | % | 95% CI | p-value
20 | 30 | 66.7% | [47.2%, 82.7%] | 0.0987
Note

In MP, a signal is honest if the player’s Part 2 action matches what they signalled. Unlike PD, where a D→D case is “trivially honest” because D is dominant, in MP no action is dominant. More importantly, in a zero-sum game any systematic link between signal and action can be exploited. A player who honestly signals T and then plays T reveals their intention to an opponent who can use it (P2 by mismatching, P1 by matching). A player who always signals one action and plays the other is just as predictable. In equilibrium, signals should therefore carry no information about Part 2 actions. With 50/50 mixing, a signal then matches the Part 2 choice half of the time, which is why 0.50 is the null value in the binomial test above. The observed honesty rate of 66.7% is above this benchmark but not significantly so (p = 0.0987). Honesty rates well above 50% would indicate that players fail to conceal their intentions, perhaps due to social norms, experimenter demand, or misunderstanding the zero-sum structure.
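The binomial test on honesty can be reproduced with `binom.test()`. A short simulation then shows why H₀ = 0.50 is the natural benchmark: if the Part 2 choice is an independent 50/50 draw, it matches any signal half of the time, whatever the signalling rule. The 70% T signalling rule used in the simulation is an arbitrary illustrative assumption.

```r
# 20 of 30 signals matched the Part 2 choice (table above)
bt_honest <- binom.test(20, 30, p = 0.5)
bt_honest$p.value                      # about 0.0987
round(100 * bt_honest$conf.int, 1)     # Clopper-Pearson CI, in %

# Benchmark: NE mixing independent of the signal gives honesty of about 0.5
set.seed(1)
signal  <- sample(c("T", "D"), 1e5, replace = TRUE, prob = c(0.7, 0.3))  # any signalling rule
choice2 <- sample(c("T", "D"), 1e5, replace = TRUE)                      # 50/50, independent
mean(signal == choice2)                # close to 0.5
```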

Strategic transition heatmaps

```r
p_sankey_mp
```
Figure 9: MP — Strategic transitions. Left: Part 1 → Signal (row %: conditional on Part 1 choice). Right: Signal → Part 2 (row %: conditional on signal sent).

Conditioning on gender and role

```r
# Honesty and strategy change tested against gender and role
tab_cond_honest_mp |>
  gt::gt() |>
  gt::tab_header(title    = "MP — Signal honesty and strategy change by gender and role",
                 subtitle = "χ² with Monte Carlo simulated p-value (B = 2000)") |>
  gt::cols_label(Outcome = "Outcome", Factor = "Factor", Test = "χ²(sim.) test") |>
  gt::tab_style(style = gt::cell_text(weight = "bold"),
                locations = gt::cells_column_labels()) |>
  gt::opt_stylize(style = 1) |>
  gt::tab_options(table.font.size = 13)
```
MP — Signal honesty and strategy change by gender and role
χ² with Monte Carlo simulated p-value (B = 2000)

Outcome | Factor | χ²(sim.) test
Signal honest | Gender | χ²(sim.): p = 1.000 ns
Signal honest | Role | χ²(sim.): p = 0.697 ns
Strategy changed | Gender | χ²(sim.): p = 1.000 ns
Strategy changed | Role | χ²(sim.): p = 1.000 ns
```r
p_cond_honest_mp
```
Figure 10: MP — P(Honest) by gender (left) and role (right). Error bars = 95% Clopper-Pearson CI. Dashed line = 50%.

4 — Belief accuracy & bonus
Note: The two belief questions

After Part 2, each player answered two incentivised questions about their beliefs:

Belief 1 — First-order belief: “What do you think your opponent chose in Part 2?” (T or D). Scored correct (GT_right_guess1 = 1) if the player’s prediction matched the opponent’s actual Part 2 choice. Bonus: +2€ if correct.

Belief 2 — Second-order belief: “What do you think your opponent believes you chose in Part 2?” (T or D). Scored correct (GT_right_guess2 = 1) if the player’s answer matched the opponent’s own Belief 1 answer, i.e. what the opponent actually believed about the player’s Part 2 choice. Bonus: +2€ if correct.

The belief accuracy score = GT_right_guess1 + GT_right_guess2 ∈ {0, 1, 2}. The belief bonus = score × 2€ ∈ {0€, 2€, 4€}.
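The scoring rules above can be written as a small function for one couple. This is a sketch: the function and argument names are hypothetical, `right1`/`right2` stand in for GT_right_guess1/GT_right_guess2, and the example inputs are invented rather than taken from the study data.

```r
# Hypothetical helper: score both belief questions for the two players of one couple
# choice2: each player's Part 2 choice; belief1: guess of the opponent's choice;
# belief2: guess of the opponent's belief1 about oneself
score_beliefs <- function(choice2, belief1, belief2) {
  opp <- c(2, 1)                                  # index of each player's opponent
  right1 <- as.integer(belief1 == choice2[opp])   # first-order belief correct?
  right2 <- as.integer(belief2 == belief1[opp])   # matches the opponent's actual belief?
  score <- right1 + right2                        # 0, 1 or 2
  data.frame(player = 1:2, right1, right2, score, bonus_eur = 2 * score)
}

res <- score_beliefs(choice2 = c("T", "D"),
                     belief1 = c("D", "T"),
                     belief2 = c("T", "T"))
res
```

In this invented example player 1 gets both questions right (score 2, bonus 4 €). Player 2 predicts the opponent’s choice correctly but misjudges the opponent’s belief (score 1, bonus 2 €).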

Objective

Describe the distribution of belief accuracy scores (0 = both beliefs wrong; 1 = one correct; 2 = both correct) and the associated belief bonus payoff. Assess whether belief accuracy is correlated with P1 Win outcomes at the couple level.

Belief accuracy distribution

```r
p_belief_bar_mp
```
Figure 11: MP — Distribution of belief accuracy scores.

Hypothesis tests: beliefs, cognitive ability & P1 Win

MP — Belief accuracy: hypothesis comparison
H1–H2: individual level. H3: couple level. Stat = Spearman ρ or φ (phi).

ID | Level | Hypothesis | X | Y | Test | Expected | Stat | p | n
H1 | Individual | Reflective thinkers (high CRT) predict opponent’s choice more accurately | CRT score (0–4) | Belief accuracy (0–2) | Spearman ρ | Positive | -0.275 | 0.141 | 30
H2 | Individual | Players who made more quiz errors have less accurate beliefs | Quiz errors [log(1+x)] | Belief accuracy (0–2) | Spearman ρ | Negative | 0.046 | 0.809 | 30
H3 | Couple | Couples where both players have perfect beliefs more often result in P1 Win in Part 2 | P1 Win Part 2 (0/1) | Both perfect beliefs (0/1) | Fisher exact + φ [1] | Ambiguous | 0.327 | 0.400 | 15
[1] H3 stat = phi coefficient (φ); p-value from two-sided Fisher exact test on the 2×2 contingency table.
Note

H1 tests whether more reflective players (higher CRT) are better at predicting their opponent’s Part 2 choice: if strategic reasoning drives belief formation, a positive Spearman ρ is expected. The observed ρ = -0.275 points the other way but is not significant (p = 0.141). H2 tests whether players who struggled with game comprehension (more quiz errors) hold less accurate beliefs. The expected direction is negative, and the observed ρ = 0.046 (p = 0.809) shows no association. H3, the only couple-level hypothesis, tests whether couples in which both players had perfect beliefs were more likely to end in a P1 Win in Part 2. It uses Fisher’s exact test given the small N and binary outcomes. The expected direction for H3 is ambiguous in MP: if P1 correctly predicts P2’s action, P1 can match and win, but if P2 also predicts correctly, P2 can mismatch and P1 loses. Mutually correct beliefs combined with best responses are self-defeating in a zero-sum game, because they would amount to a pure-strategy NE, which does not exist in MP. H1 and H2 are at the individual level; H3 is at the couple level.
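H3’s statistics, φ and Fisher’s exact p-value, come from a 2×2 couple-level table. The sketch below uses an invented table with 15 couples purely to show the mechanics; it is not the study’s table, and the helper `phi_2x2` is hypothetical. For a 2×2 table, φ² equals the uncorrected χ² statistic divided by N, which gives a convenient cross-check.

```r
# Hypothetical helper: phi coefficient of a 2x2 table
phi_2x2 <- function(tab) {
  stopifnot(identical(dim(tab), c(2L, 2L)))
  n11 <- tab[1, 1]; n12 <- tab[1, 2]; n21 <- tab[2, 1]; n22 <- tab[2, 2]
  (n11 * n22 - n12 * n21) /
    sqrt((n11 + n12) * (n21 + n22) * (n11 + n21) * (n12 + n22))
}

# Invented counts (NOT the study data): rows = both players had perfect beliefs,
# columns = P1 Win in Part 2
tab <- matrix(c(3, 3, 2, 7), nrow = 2,
              dimnames = list(both_perfect = c("yes", "no"),
                              p1_win_part2 = c("yes", "no")))

phi <- phi_2x2(tab)                    # > 0: P1 Win more common when both perfect
fisher_p <- fisher.test(tab)$p.value   # two-sided by default
c(phi = phi, fisher_p = fisher_p)
```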

Conditioning on gender and role

```r
p_cond_belief_mp
```
Figure 12: MP — Belief accuracy score (0/1/2) by gender (left) and role (right). Bars show proportion within each group; labels show % and count. Score 0 = both beliefs wrong, 1 = one correct, 2 = both correct.

5 — Econometric models

5.1 — Determinants of belief accuracy

Estimate an ordered logit (MASS::polr) for the belief accuracy score (0 = both beliefs wrong; 1 = one correct; 2 = both correct) across three nested specifications adding Theory of Mind (MASC) and empathy/perspective-taking (IRI) measures. The outcome is treated as an ordered categorical variable: higher scores reflect better first- and second-order belief formation.

Warning: Small-sample caveat

With n = 30 observations and only 6 in the smaller boundary category (the lesser of the score-0 and score-2 cell counts), the events per variable (EPV) is 6 for M1, 3 for M2 and 2 for M3, well below the recommended minimum of 10. polr uses standard (unpenalised) maximum likelihood, which can be biased when boundary cells are small. All results are exploratory and should not be used for causal inference.

Model specifications

\[ \begin{aligned} \text{M1:} \quad & \text{logit}\,P(Y \leq j) = \alpha_j - \beta_1\,\text{MASC}_z \\ \text{M2:} \quad & \text{logit}\,P(Y \leq j) = \alpha_j - \beta_1\,\text{MASC}_z - \beta_2\,\text{IRI-PT}_z \\ \text{M3:} \quad & \text{logit}\,P(Y \leq j) = \alpha_j - \beta_1\,\text{MASC}_z - \beta_2\,\text{IRI-PT}_z - \beta_3\,\text{IRI-PD}_z \end{aligned} \]

\(Y\) = belief accuracy score ∈ {0, 1, 2}; \(\alpha_j\) = threshold for category \(j\). MASC = total Theory of Mind score; IRI-PT = Perspective Taking subscale; IRI-PD = Personal Distress subscale. All predictors are z-standardised. OR > 1 indicates a higher probability of achieving a higher accuracy score.
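The following minimal `MASS::polr` sketch fits M1 on simulated data (not the study data) to show how the quantities in the coefficient table are obtained. polr parameterises the model as logit P(Y ≤ j) = ζ_j − η, the same sign convention as the equations above, so exp(β) > 1 shifts probability towards higher scores. As the table footnote describes, p-values use a normal approximation to the t value. The sketch shows Wald intervals; `confint()` on a polr fit gives the profile-likelihood intervals instead.

```r
library(MASS)  # provides polr(); MASS is a recommended package shipped with R

set.seed(42)
n <- 30
# Simulated, purely illustrative data (NOT the study data)
masc_z <- as.numeric(scale(rnorm(n)))
latent <- 0.5 * masc_z + rlogis(n)
score <- cut(latent, breaks = c(-Inf, -1, 1, Inf),
             labels = c("0", "1", "2"), ordered_result = TRUE)
d <- data.frame(score, masc_z)

# M1: logit P(score <= j) = zeta_j - beta * masc_z
fit <- polr(score ~ masc_z, data = d, Hess = TRUE, method = "logistic")

beta <- coef(fit)
se <- sqrt(diag(vcov(fit)))[names(beta)]
t_val <- beta / se
p_val <- 2 * pnorm(-abs(t_val))        # two-tailed normal approximation
or <- exp(beta)                        # OR > 1: higher scores more likely
or_wald <- exp(cbind(lower = beta - qnorm(0.975) * se,
                     upper = beta + qnorm(0.975) * se))

print(cbind(beta, se, t = t_val, p = p_val, OR = or, or_wald))
print(fit$zeta)                        # thresholds alpha_1, alpha_2
```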

Coefficient table

```r
tab_olog_gt
```
MP — Ordered logit: determinants of belief accuracy
DV = belief accuracy score (0/1/2, ordered). n = 30 (score 0: n=6; 1: n=17; 2: n=7). MASC/IRI z-scored. OR > 1 increases probability of higher accuracy. [1]

Predictor | β | SE | t | p | OR | OR 2.5% | OR 97.5%
M1: MASC only
MASC ToM score (z) | -0.459 | 0.386 | -1.190 | 0.2339 | 0.632 | 0.297 | 1.346
M2: MASC + IRI-PT
MASC ToM score (z) | -0.504 | 0.394 | -1.279 | 0.2010 | 0.604 | 0.268 | 1.289
IRI Perspective Taking (z) | -0.203 | 0.377 | -0.539 | 0.5900 | 0.816 | 0.383 | 1.714
M3: MASC + IRI-PT + IRI-PD
MASC ToM score (z) | -0.432 | 0.408 | -1.059 | 0.2895 | 0.649 | 0.282 | 1.426
IRI Perspective Taking (z) | -0.237 | 0.381 | -0.623 | 0.5335 | 0.789 | 0.366 | 1.666
IRI Personal Distress (z) | -0.272 | 0.375 | -0.726 | 0.4679 | 0.762 | 0.355 | 1.572

[1] Ordered logit (proportional-odds, MASS::polr). p-values from a two-tailed z-test on the t-statistic. CI from profile likelihood where convergent, otherwise Wald. All models MLE; EPV < 10, interpret cautiously.

Goodness of fit

```r
tab_gof_olog_gt
```
MP — Ordered logit: goodness of fit
DV = belief accuracy score (0/1/2). EPV = min(n₀, n₂) / k.

Model | n | Predictors | EPV | AIC | McFadden R²
M1: MASC only | 30 | 1 | 6 | 63.5 | 0.0248
M2: MASC + IRI-PT | 30 | 2 | 3 | 65.2 | 0.0297
M3: MASC + IRI-PT + IRI-PD | 30 | 3 | 2 | 66.7 | 0.0388
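McFadden’s R² compares each model’s log-likelihood with that of a thresholds-only model. For an ordered logit, the thresholds-only model reproduces the marginal category shares (6, 17, 7 from the coefficient-table subtitle). Below is a worked check in base R; the helper names are hypothetical. The reported AIC is rounded to one decimal, so the recovered R² only approximately matches the table.

```r
# Marginal counts of the belief accuracy score
counts <- c(`0` = 6, `1` = 17, `2` = 7)

# Thresholds-only log-likelihood = multinomial log-likelihood at the marginal shares
ll_null <- sum(counts * log(counts / sum(counts)))   # about -29.50

# AIC = -2 logLik + 2k  =>  logLik = k - AIC / 2, with k = 2 thresholds + slopes
ll_from_aic <- function(aic, n_slopes, n_thresholds = 2) {
  (n_thresholds + n_slopes) - aic / 2
}
mcfadden_r2 <- function(ll_model, ll_null) 1 - ll_model / ll_null

ll_m1 <- ll_from_aic(63.5, n_slopes = 1)
round(c(ll_null = ll_null, ll_m1 = ll_m1, r2_m1 = mcfadden_r2(ll_m1, ll_null)), 4)
```

Working backwards, the reported R² of 0.0248 implies logLik ≈ -28.77 and AIC ≈ 63.54 for M1, consistent with the rounded 63.5 in the table.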

Forest plot: odds ratios

```r
p_forest_olog
```
Figure 13: MP — Ordered logit odds ratios for belief accuracy score (0/1/2). OR > 1 = higher probability of a higher belief accuracy score. Error bars = 95% CI. Dashed line = OR 1 (no effect).
Note: Interpretation

All odds ratios are below 1 and none is statistically significant.

- MASC ToM (M1): OR = 0.632, p = 0.2339. An OR above 1 would indicate that higher Theory of Mind ability is associated with more accurate first- and second-order beliefs, consistent with MASC measuring the capacity to model others’ mental states. The point estimate runs the other way.
- IRI Perspective Taking (M2): OR = 0.816, p = 0.5900. Cognitive perspective-taking could similarly improve belief accuracy by helping players simulate the opponent’s reasoning, which would also imply an OR above 1.
- IRI Personal Distress (M3): OR = 0.762, p = 0.4679. Higher personal distress may interfere with clear-headed belief formation, so an OR below 1 is theoretically plausible.
- MASC in M3: OR = 0.649, p = 0.2895.

Given EPV = 2, all M3 estimates carry substantial uncertainty and should be treated as hypothesis-generating only.


5.2 — Choice T under a D signal

Warning: Small-sample note

This analysis restricts the sample to participants who received a D signal from their opponent (opp_signal_received = D). The dependent variable is whether they nonetheless chose T in Part 2. Because the subsample is small, EPV is below 10 for every model, so Firth penalised logit is used throughout.

Models

\[ \begin{aligned} \text{M1:} \quad & \text{logit}\,P(T) = \beta_0 + \beta_1\,\text{quiz\_err} \\ \text{M2:} \quad & \text{logit}\,P(T) = \beta_0 + \beta_1\,\text{quiz\_err} + \beta_2\,\text{CRT} \\ \text{M3:} \quad & \text{logit}\,P(T) = \beta_0 + \beta_1\,\text{CRT} \end{aligned} \]

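Sample: opp_signal_received = D only. n = 11, events = 6. EPV (events per predictor): M1 = 6, M2 = 3, M3 = 6. Counting the rarer outcome instead (5 non-events), it would be 5, 2.5 and 5. All models are estimated with Firth penalised logit.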
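Firth’s method adds a Jeffreys-prior penalty to the likelihood. This removes the leading small-sample bias of the MLE and keeps estimates finite under separation. The report uses brglm2. The base-R sketch below is an independent illustration of the same idea (a modified-score iteration with a capped step), not the report’s code. For an intercept-only model it has a closed form, p̂ = (events + 0.5) / (n + 1), which gives a convenient check at the 5.2 subsample size (n = 11, 6 events).

```r
# Firth (Jeffreys-prior) penalised logistic regression in base R.
# X: model matrix including an intercept column; y: 0/1 outcome.
firth_logit <- function(X, y, tol = 1e-10, max_iter = 200, max_step = 5) {
  beta <- rep(0, ncol(X))
  for (iter in seq_len(max_iter)) {
    p <- plogis(drop(X %*% beta))
    w <- p * (1 - p)                                 # binomial variance weights
    info_inv <- solve(crossprod(X, w * X))           # (X'WX)^{-1}
    h <- w * rowSums((X %*% info_inv) * X)           # diagonal of the hat matrix
    score <- crossprod(X, y - p + h * (0.5 - p))     # Firth-modified score
    step <- drop(info_inv %*% score)
    big <- max(abs(step))
    if (big > max_step) step <- step * (max_step / big)  # cap the step size
    beta <- beta + step
    if (big < tol) break
  }
  names(beta) <- colnames(X)
  list(coef = beta, se = sqrt(diag(info_inv)), iterations = iter)
}

# Intercept-only: Firth gives (6 + 0.5) / 12, shrunk towards 0.5 vs the MLE 6 / 11
y <- c(rep(1, 6), rep(0, 5))
X0 <- matrix(1, nrow = 11, ncol = 1, dimnames = list(NULL, "(Intercept)"))
fit0 <- firth_logit(X0, y)
plogis(fit0$coef)
plogis(coef(glm(y ~ 1, family = binomial)))   # ordinary MLE for comparison

# Completely separated toy data: ordinary ML diverges, Firth stays finite
fit_sep <- firth_logit(cbind(`(Intercept)` = 1, x = 1:6), c(0, 0, 0, 1, 1, 1))
fit_sep$coef
```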

Results

MP — Choice T when opponent signals D
DV = chose T in Part 2 | opp_signal = D. n = 11, events = 6. EPV: M1 = 6, M2 = 3, M3 = 6. All Firth penalised logit (brglm2). [1]

Predictor | β | SE | z | p | OR | OR 2.5% | OR 97.5%
M1: Quiz only
Quiz errors [log(1+x)] | -0.052 | 0.559 | -0.09 | 0.926 | 0.95 | 0.32 | 2.84
M2: Quiz + CRT
Quiz errors [log(1+x)] | -0.245 | 0.632 | -0.39 | 0.699 | 0.78 | 0.23 | 2.70
CRT score (0–4) | -1.684 | 1.397 | -1.21 | 0.228 | 0.19 | 0.01 | 2.87
M3: CRT only
CRT score (0–4) | -1.686 | 1.339 | -1.26 | 0.208 | 0.19 | 0.01 | 2.55

[1] Firth penalised logit used for all models (EPV < 10). OR > 1 → increases P(choose T in Part 2 | opponent signals D). 95% Wald CI.
Figure 14: Choice T when opponent signals D — M1 (quiz only), M2 (quiz + CRT), M3 (CRT only). Odds ratios with 95% CI. OR > 1 increases P(choose T). All Firth. x-axis log scale.
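Note: Interpretation

Quiz errors (M2: OR = 0.78, p = 0.699). The point estimate suggests that players with more comprehension-quiz errors were slightly less likely to choose T after receiving a D signal. However, the 95% CI [0.23, 2.70] is wide and includes 1, so no effect can be inferred.

CRT score (M2: OR = 0.19, p = 0.228). The estimate runs against the idea that more reflective players, anticipating that an opponent who signals D may be deceiving, respond with T. If anything, higher-CRT players were less likely to choose T. The CI [0.01, 2.87] is too wide to distinguish either pattern from no effect.

The better reply to a D signal also depends on role. A P1 who expects the opponent to follow through on D should play D (to match). A P2 with the same expectation should play T (to mismatch). EPV = 3 for M2, so all estimates are exploratory.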
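The odds ratios, Wald intervals and p-values in the results table follow directly from β and its SE. The hypothetical helper `wald_or` below reproduces the CRT row of M2 from the rounded table values.

```r
# Wald z-test and CI on the odds-ratio scale from a coefficient and its SE
wald_or <- function(beta, se, level = 0.95) {
  z_crit <- qnorm(1 - (1 - level) / 2)
  z <- beta / se
  c(OR = exp(beta),
    lower = exp(beta - z_crit * se),
    upper = exp(beta + z_crit * se),
    z = z,
    p = 2 * pnorm(-abs(z)))
}

# CRT in M2: beta = -1.684, SE = 1.397
round(wald_or(-1.684, 1.397), 3)   # OR about 0.186, CI about [0.012, 2.87], p about 0.228
```

The very wide interval reflects the large SE relative to β. With 11 observations, even an OR of 0.19 is compatible with no effect.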