GT Behaviour · GTEMO Experiment
Eric Guerci
March 22, 2026
| | L (col) | R (col) |
|---|---|---|
| T (row) | 14, 14 ★ | 14, 0 |
| D (row) | 0, 14 | 20, 20 ♦ |
★ = Risk-dominant Nash equilibrium (Hare-Hare). ♦ = Pareto-dominant Nash equilibrium (Stag-Stag). In Stag Hunt, cheap talk is expected to be effective in theory: both players want to coordinate, and signals are self-signaling and self-committing.
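The risk-dominance label can be checked directly from the payoff matrix. The base R sketch below compares the unilateral deviation losses at the two equilibria (for a symmetric 2×2 game, the equilibrium with the larger loss is risk-dominant) and computes the belief about the opponent above which Stag becomes a best response. The object names (`u`, `q_star`, and so on) are illustrative and are not taken from the project's code.

```r
# Own payoffs from the matrix above; for both players the first action is Hare, the second Stag.
u <- matrix(c(14, 14,
               0, 20),
            nrow = 2, byrow = TRUE,
            dimnames = list(own = c("Hare", "Stag"), opp = c("Hare", "Stag")))

# Loss from deviating unilaterally at each pure equilibrium.
loss_hare <- u["Hare", "Hare"] - u["Stag", "Hare"]   # 14 - 0  = 14
loss_stag <- u["Stag", "Stag"] - u["Hare", "Stag"]   # 20 - 14 = 6

# Symmetric 2x2 game: the equilibrium with the larger deviation loss is risk-dominant.
risk_dominant <- if (loss_hare > loss_stag) "Hare-Hare" else "Stag-Stag"

# Stag is a best response when q * 20 >= 14, with q = P(opponent plays Stag).
q_star <- loss_hare / (loss_hare + loss_stag)
```

With these payoffs `q_star` is 0.7: a player needs to believe the opponent hunts Stag with probability of at least 70% before Stag pays off in expectation.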
Describe the equilibrium outcomes reached by each couple in Part 1 and Part 2, and test whether cheap talk shifted coordination rates using paired McNemar tests.
coord_sh |>
dplyr::select(part, x, n, pct, ci95) |>
gt::gt() |>
gt::tab_header(title = "SH — Coordination rates: Part 1 vs Part 2",
subtitle = "95% Clopper-Pearson CI") |>
gt::cols_label(part = "Phase", x = "n coordinated", n = "N",
pct = "%", ci95 = "95% CI") |>
gt::tab_style(style = gt::cell_text(weight = "bold"),
locations = gt::cells_column_labels())
p_coord_sh

SH — Coordination rates: Part 1 vs Part 2. 95% Clopper-Pearson CI.

| Phase | n coordinated | N | % | 95% CI |
|---|---|---|---|---|
| Part 1 | 8 | 16 | 50.0% | [24.7%, 75.3%] |
| Part 2 | 12 | 16 | 75.0% | [47.6%, 92.7%] |
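The exact intervals in this table can be recomputed from the counts alone. A minimal sketch, assuming the intervals come from `stats::binom.test()` (the helper name `cp_ci` is hypothetical and not part of the project's code):

```r
# Exact (Clopper-Pearson) CI for a binomial proportion, as returned by binom.test().
cp_ci <- function(x, n, level = 0.95) {
  ci <- binom.test(x, n, conf.level = level)$conf.int
  c(lower = ci[1], upper = ci[2])
}

ci_part1 <- cp_ci(8, 16)    # coordinated couples in Part 1
ci_part2 <- cp_ci(12, 16)   # coordinated couples in Part 2

# Percentages rounded to one decimal, for comparison with the table.
round(100 * rbind(`Part 1` = ci_part1, `Part 2` = ci_part2), 1)
```

The two intervals overlap widely, which is expected with only 16 couples.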
tab_mc_sh |>
gt::gt() |>
gt::tab_header(title = "McNemar tests — SH (couple level)",
subtitle = "Paired Part 1 vs Part 2") |>
gt::cols_label(label = "Test", statistic = "χ²", p_value = "p-value",
note = "Note") |>
gt::fmt_number(columns = c(statistic, p_value), decimals = 4, rows = !is.na(statistic)) |>
gt::tab_style(style = gt::cell_text(weight = "bold"),
locations = gt::cells_body(columns = p_value,
rows = !is.na(p_value) & p_value < 0.05)) |>
gt::tab_style(style = gt::cell_text(weight = "bold"),
locations = gt::cells_column_labels())

McNemar tests — SH (couple level). Paired Part 1 vs Part 2.

| Test | χ² | p-value | Note |
|---|---|---|---|
| SH — coord Part1 vs Part2 | 2.2500 | 0.1336 | OK |
| SH — mutual Stag choice Part1 vs Part2 | 0.0000 | 1.0000 | OK |
Pareto equilibrium (Stag-Stag) rate in Part 1: 50.0% of pairs → Part 2: 50.0%. A significant McNemar result would indicate that cheap talk systematically shifted couples from the risk-dominant (Hare-Hare) to the Pareto-dominant (Stag-Stag) equilibrium.
tab_cond_coord_sh |>
gt::gt() |>
gt::tab_header(title = "SH — Coordination and Stag choice by session gender",
subtitle = "χ² with Monte Carlo simulated p-value (B = 2000), couple level") |>
gt::cols_label(Outcome = "Outcome", Factor = "Factor", Test = "χ²(sim.) test") |>
gt::tab_style(style = gt::cell_text(weight = "bold"),
locations = gt::cells_column_labels()) |>
gt::opt_stylize(style = 1) |>
gt::tab_options(table.font.size = 13)
p_coord_gender_sh

SH — Coordination and Stag choice by session gender. χ² with Monte Carlo simulated p-value (B = 2000), couple level.

| Outcome | Factor | χ²(sim.) test |
|---|---|---|
| Coordination Part 2 | Session gender | χ²(sim.): p = 0.544 ns |
| Coordination Part 1 | Session gender | χ²(sim.): p = 1.000 ns |
| Mutual Stag choice Part 2 | Session gender | χ²(sim.): p = 1.000 ns |
| Mutual Stag choice Part 1 | Session gender | χ²(sim.): p = 1.000 ns |
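A χ² test with a Monte Carlo p-value of this kind can be obtained in base R with `chisq.test(..., simulate.p.value = TRUE, B = 2000)`. The sketch below uses a small hypothetical table; the counts and the session labels "A" and "B" are invented for illustration and are not the experimental data.

```r
set.seed(123)  # the simulated p-value depends on the RNG seed

# Hypothetical couple-level counts (NOT the experimental data):
# rows = session type, columns = coordinated in Part 2 (no / yes).
toy <- matrix(c(3, 5,
                1, 7),
              nrow = 2, byrow = TRUE,
              dimnames = list(session = c("A", "B"),
                              coordinated = c("no", "yes")))

# The p-value is estimated from B tables simulated with the observed margins.
res <- chisq.test(toy, simulate.p.value = TRUE, B = 2000)
res$p.value
```

Simulation avoids relying on the asymptotic χ² approximation, which is unreliable when expected cell counts are small, as they are with 16 couples.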
Describe the marginal distributions of choices and signals (Part 1 choice → signal sent → Part 2 choice), and examine how the opponent’s signal shapes Part 2 behaviour. All proportions use exact 95% Clopper-Pearson CIs.
Stag choice (D) in Part 1: 75.0%. The dominant signal was D (71.9%). Stag choice in Part 2: 62.5%. An increase in D (Stag) from Part 1 to Part 2 would be consistent with cheap talk successfully shifting players from the safe Hare equilibrium to the Pareto-dominant Stag equilibrium. Unlike PD or MP, Hare is not a strictly dominant strategy in SH — it is merely risk-dominant — so signals can plausibly serve as coordination devices.
tab_mcnemar_sh |>
gt::gt() |>
gt::tab_header(
title = "McNemar test — SH: choice1_D vs choice2_D",
subtitle = "Paired within-individual"
) |>
gt::cols_label(statistic = "χ²", p_value = "p-value", n = "n", note = "Note") |>
gt::fmt_number(columns = c(statistic, p_value), decimals = 4) |>
gt::tab_style(style = gt::cell_text(weight = "bold"),
locations = gt::cells_body(columns = p_value,
rows = !is.na(p_value) & p_value < 0.05)) |>
gt::tab_style(style = gt::cell_text(weight = "bold"),
locations = gt::cells_column_labels())

McNemar test — SH: choice1_D vs choice2_D. Paired within-individual.

| χ² | p-value | n | Note |
|---|---|---|---|
| 1.1250 | 0.2888 | 32 | OK |
A significant McNemar result (p < 0.05) would indicate that pre-play cheap talk systematically shifted individual Stag (D) choice rates. In SH, Hare is not a dominant strategy: it is the risk-dominant equilibrium action, but Stag is the best response if the opponent also plays Stag. Cheap talk is theoretically credible in SH because both players prefer the Pareto equilibrium (20, 20) over the risk-dominant one (14, 14), making signals self-signaling and self-committing.
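The McNemar statistic depends only on the players who switched. The sketch below rebuilds the Part 1 × Part 2 table from the reported totals (24 of 32 chose D in Part 1, 20 of 32 in Part 2, and 8 players switched, which implies 6 moves from D to T and 2 from T to D). The reconstruction assumes no missing values, and the object names are illustrative.

```r
# Part 1 (rows) x Part 2 (columns) choices, reconstructed from the reported totals.
tab <- matrix(c(18, 6,
                 2, 6),
              nrow = 2, byrow = TRUE,
              dimnames = list(part1 = c("D", "T"), part2 = c("D", "T")))

n_DT <- tab["D", "T"]   # switched Stag -> Hare
n_TD <- tab["T", "D"]   # switched Hare -> Stag

# Continuity-corrected McNemar statistic: only the discordant cells enter.
chi2 <- (abs(n_DT - n_TD) - 1)^2 / (n_DT + n_TD)
p    <- pchisq(chi2, df = 1, lower.tail = FALSE)

# Same result from the built-in test (correct = TRUE is the default).
mc <- mcnemar.test(tab)
```

Under these assumptions the hand computation and `mcnemar.test()` both give χ² = 1.125, matching the table.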
Decision sequence. After sending their own signal and before making the Part 2 choice, each player observes the opponent’s signal. The Part 2 decision is taken with a two-dimensional information set: (own signal sent) × (opponent’s signal received). The four possible information sets are: T/T, T/D, D/T, D/D.
In the Stag Hunt, neither T (Hare) nor D (Stag) is a strictly dominant strategy. Hare is merely the risk-dominant choice — it is the safer option under uncertainty, but both players would prefer mutual Stag (20, 20) over mutual Hare (14, 14). This makes cheap talk highly credible: a player who signals D (Stag) is credibly committing to the Pareto-dominant equilibrium, because following through is individually rational as long as the opponent also plays Stag. Signals are both self-signaling (it is rational to send a D signal only if you intend to play D) and self-committing (it is rational to follow a D signal once sent). A large gap in Stag choice rates between receiving a D signal vs a T signal would confirm the effectiveness of cheap talk in SH.
The critical test is the (D/D) information set — where both players have signalled Stag (D). If both players trust the signal, this should produce nearly 100% Stag choice, converging on the Pareto-dominant (D,D) equilibrium. Values below 100% reflect residual strategic risk aversion (fear of being the only Stag hunter). The (T/T) information set (both signalled Hare) represents the risk-dominant equilibrium pull — here Hare choice should be near 100%. A large D/D vs T/T gap in Stag choice rates is the clearest evidence of cheap talk effectiveness in SH.
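To make the information-set comparison concrete, the sketch below enumerates the four sets and tabulates the Stag rate within each one. The data frame `toy` is hypothetical and stands in for the player-level data; its column names are assumptions, not the project's variable names.

```r
# The four information sets: own signal sent x opponent's signal received.
info_sets <- expand.grid(own = c("T", "D"), opp = c("T", "D"),
                         stringsAsFactors = FALSE)
info_sets$label <- paste(info_sets$own, info_sets$opp, sep = "/")

# Hypothetical Part 2 data (NOT the experimental data), one row per player.
toy <- data.frame(
  own     = c("D", "D", "D", "D", "T", "T", "D", "T"),
  opp     = c("D", "D", "D", "T", "D", "T", "D", "T"),
  choice2 = c("D", "D", "D", "T", "D", "T", "T", "T")
)
toy$info_set <- paste(toy$own, toy$opp, sep = "/")

# Share of Stag (D) choices within each observed information set.
stag_rate <- tapply(toy$choice2 == "D", toy$info_set, mean)
stag_rate
```

The quantity of interest is the difference between `stag_rate[["D/D"]]` and `stag_rate[["T/T"]]`; in real data each cell would also carry its own Clopper-Pearson interval.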
tab_cond_sig_sh |>
gt::gt() |>
gt::tab_header(title = "SH — Choice and signal distributions by gender and role",
subtitle = "χ² with Monte Carlo simulated p-value (B = 2000)") |>
gt::cols_label(Outcome = "Outcome", Factor = "Factor", Test = "χ²(sim.) test") |>
gt::tab_style(style = gt::cell_text(weight = "bold"),
locations = gt::cells_column_labels()) |>
gt::opt_stylize(style = 1) |>
gt::tab_options(table.font.size = 13)

SH — Choice and signal distributions by gender and role. χ² with Monte Carlo simulated p-value (B = 2000).

| Outcome | Factor | χ²(sim.) test |
|---|---|---|
| Part 1 = T | Gender | χ²(sim.): p = 1.000 ns |
| Part 1 = T | Role | χ²(sim.): p = 0.667 ns |
| Signal = T | Gender | χ²(sim.): p = 1.000 ns |
| Signal = T | Role | χ²(sim.): p = 0.432 ns |
| Part 2 = T | Gender | χ²(sim.): p = 0.722 ns |
| Part 2 = T | Role | χ²(sim.): p = 1.000 ns |
Examine whether signals are honest (= same as the action eventually taken in Part 2) and consistent with Part 1 choices. Assess the prevalence of strategy switches between Part 1 and Part 2.
tab_sec2_sh |>
gt::gt() |>
gt::tab_header(title = "SH — Signal honesty and consistency",
subtitle = "95% Clopper-Pearson CI. Each row is a binary indicator (1 = yes, 0 = no).") |>
gt::cols_label(variable = "Measure", n = "n (=1)", N = "N",
pct = "%", ci95 = "95% CI") |>
gt::tab_style(style = gt::cell_text(weight = "bold"),
locations = gt::cells_column_labels()) |>
gt::tab_footnote(
footnote = "1 if the signal sent equals the Part 2 choice (e.g. sent T and chose T in Part 2). Measures whether players followed through on their signal.",
locations = gt::cells_body(columns = variable, rows = 1)
) |>
gt::tab_footnote(
footnote = "1 if the signal sent equals the Part 1 choice (e.g. signalled T and had also chosen T in Part 1). Measures whether the signal reflects past behaviour — independent of Part 2.",
locations = gt::cells_body(columns = variable, rows = 2)
) |>
gt::tab_footnote(
footnote = "1 if the Part 2 choice differs from the Part 1 choice (choice1 \u2260 choice2). Measures switching behaviour across rounds, independent of the signal. N may differ from rows 1\u20132 due to missing values in different variables.",
locations = gt::cells_body(columns = variable, rows = 3)
) |>
gt::tab_options(table.font.size = 13)

SH — Signal honesty and consistency. 95% Clopper-Pearson CI. Each row is a binary indicator (1 = yes, 0 = no).

| Measure | n (=1) | N | % | 95% CI |
|---|---|---|---|---|
| Signal honest (signal = choice2)¹ | 25 | 32 | 78.1% | [60.0%, 90.7%] |
| Signal consistent with Part 1 (signal = choice1)² | 29 | 32 | 90.6% | [75.0%, 98.0%] |
| Strategy changed Part 1 → Part 2 (choice1 ≠ choice2)³ | 8 | 32 | 25.0% | [11.5%, 43.4%] |

¹ 1 if the signal sent equals the Part 2 choice (e.g. sent T and chose T in Part 2). Measures whether players followed through on their signal.
² 1 if the signal sent equals the Part 1 choice (e.g. signalled T and had also chosen T in Part 1). Measures whether the signal reflects past behaviour — independent of Part 2.
³ 1 if the Part 2 choice differs from the Part 1 choice (choice1 ≠ choice2). Measures switching behaviour across rounds, independent of the signal. N may differ from rows 1–2 due to missing values in different variables.
tab_binom_honest_sh |>
gt::gt() |>
gt::tab_header(title = "Binomial test: P(Honest) vs H₀ = 0.50",
subtitle = "Two-sided test; 95% Clopper-Pearson CI") |>
gt::cols_label(x = "n honest", n = "N", pct = "%", ci95 = "95% CI",
p_value = "p-value") |>
gt::tab_style(style = gt::cell_text(weight = "bold"),
locations = gt::cells_body(columns = p_value,
rows = p_value < 0.05)) |>
gt::tab_style(style = gt::cell_text(weight = "bold"),
locations = gt::cells_column_labels())

Binomial test: P(Honest) vs H₀ = 0.50. Two-sided test; 95% Clopper-Pearson CI.

| n honest | N | % | 95% CI | p-value |
|---|---|---|---|---|
| 25 | 32 | 78.1% | [60.0%, 90.7%] | 0.0021 |
In Stag Hunt, a signal is honest if the player’s Part 2 action matches what they signalled. Unlike PD where honesty is only meaningful for the dominated cooperative strategy, in SH both signals (Stag/D and Hare/T) correspond to Nash equilibria. Honesty therefore measures pure coordination commitment. A player who signals Stag and plays Stag is attempting the Pareto-dominant equilibrium; a player who signals Hare and plays Hare is securing the risk-dominant equilibrium. The aggregate honesty rate is a genuine measure of how much players rely on cheap talk to coordinate their equilibria.
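The binomial test on the honesty rate can be reproduced from the two reported counts. A minimal sketch, assuming the table was produced with `stats::binom.test()`:

```r
# 25 of 32 players chose in Part 2 the action they had signalled.
honest <- binom.test(x = 25, n = 32, p = 0.5, alternative = "two.sided")

honest$p.value    # exact two-sided p-value against H0: P(honest) = 0.5
honest$conf.int   # exact (Clopper-Pearson) 95% interval
```

Because the null proportion is 0.5, the two-sided p-value is simply twice the upper-tail probability P(X ≥ 25) under a Binomial(32, 0.5) distribution.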
tab_cond_honest_sh |>
gt::gt() |>
gt::tab_header(title = "SH — Signal honesty and strategy change by gender and role",
subtitle = "χ² with Monte Carlo simulated p-value (B = 2000)") |>
gt::cols_label(Outcome = "Outcome", Factor = "Factor", Test = "χ²(sim.) test") |>
gt::tab_style(style = gt::cell_text(weight = "bold"),
locations = gt::cells_column_labels()) |>
gt::opt_stylize(style = 1) |>
gt::tab_options(table.font.size = 13)

SH — Signal honesty and strategy change by gender and role. χ² with Monte Carlo simulated p-value (B = 2000).

| Outcome | Factor | χ²(sim.) test |
|---|---|---|
| Signal honest | Gender | χ²(sim.): p = 1.000 ns |
| Signal honest | Role | χ²(sim.): p = 1.000 ns |
| Strategy changed | Gender | χ²(sim.): p = 0.681 ns |
| Strategy changed | Role | χ²(sim.): p = 0.683 ns |
After Part 2, each player answered two incentivised questions about their beliefs:
Belief 1 — First-order belief: “What do you think your opponent chose in Part 2?” (T or D). Scored correct (GT_right_guess1 = 1) if the player’s prediction matched the opponent’s actual Part 2 choice. Bonus: +2€ if correct.
Belief 2 — Second-order belief: “What do you think your opponent believes you chose in Part 2?” (T or D). Scored correct (GT_right_guess2 = 1) if the player correctly identified what the opponent believed about the player’s own choice. Bonus: +2€ if correct.
The belief accuracy score = GT_right_guess1 + GT_right_guess2 ∈ {0, 1, 2}. The belief bonus = score × 2€ ∈ {0€, 2€, 4€}.
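The scoring rule is a plain sum of the two indicators. The sketch below applies it to four hypothetical players; only the variable names `GT_right_guess1` and `GT_right_guess2` come from the text above, and the values are invented.

```r
# Hypothetical indicators for four players (NOT the experimental data).
GT_right_guess1 <- c(1, 0, 1, 0)  # first-order belief correct?
GT_right_guess2 <- c(1, 1, 0, 0)  # second-order belief correct?

belief_score <- GT_right_guess1 + GT_right_guess2   # 0, 1 or 2
belief_bonus <- 2 * belief_score                    # in euros: 0, 2 or 4

# Ordered factor, the form an ordered logit expects as its response.
belief_score_ord <- factor(belief_score, levels = 0:2, ordered = TRUE)
```

Note that a score of 1 pools two different cases (only the first-order belief correct, or only the second-order belief correct).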
Describe the distribution of belief accuracy scores (0 = both beliefs wrong; 1 = one correct; 2 = both correct) and the associated belief bonus payoff. Assess whether belief accuracy is correlated with coordination outcomes at the couple level.
SH — Belief accuracy: hypothesis comparison. H1–H2: individual level. H3: couple level. Stat = Spearman ρ or φ (phi).

| | Level | Hypothesis | X | Y | Test | Expected | Stat | p | n |
|---|---|---|---|---|---|---|---|---|---|
| H1 | Individual | Reflective thinkers (high CRT) predict opponent’s choice more accurately | CRT score (0–4) | Belief accuracy (0–2) | Spearman ρ | Positive | -0.017 | 0.927 | 32 |
| H2 | Individual | Players who made more quiz errors have less accurate beliefs | Quiz errors [log(1+x)] | Belief accuracy (0–2) | Spearman ρ | Negative | 0.141 | 0.441 | 32 |
| H3 | Couple | Couples where both players have perfect beliefs coordinate more in Part 2 | Coord. Part 2 (0/1) | Both perfect beliefs (0/1) | Fisher exact + φ¹ | Positive | 0.577 | 0.077 | 16 |

¹ H3 stat = phi coefficient (φ); p-value from two-sided Fisher exact test on 2×2 contingency table.
H1 tests whether more reflective players (higher CRT) are better at predicting their opponent — if strategic reasoning drives belief formation, a positive Spearman ρ is expected. H2 tests whether players who struggled with game comprehension (more quiz errors) hold less accurate beliefs — expected direction is negative. H3 tests whether couples where both players had perfect beliefs were more likely to coordinate in Part 2 — the only couple-level hypothesis; uses Fisher exact given small N and binary outcomes. Note that H1 and H2 operate at the individual level while H3 is at the couple level; they are not directly comparable.
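For H3, the φ coefficient and the Fisher p-value both come from a single 2×2 table. The sketch below shows the computation in base R. The cell counts are an assumption: they agree with the 12 of 16 couples that coordinated in Part 2 and reproduce the reported φ and p, but the actual table is not shown here.

```r
# Assumed 2x2 table (couple level); the split by belief accuracy is NOT given in the report.
tab <- matrix(c(8, 0,
                4, 4),
              nrow = 2, byrow = TRUE,
              dimnames = list(both_perfect = c("yes", "no"),
                              coordinated  = c("yes", "no")))

# Phi for a 2x2 table: (ad - bc) / sqrt(product of the four marginal totals).
phi <- (tab[1, 1] * tab[2, 2] - tab[1, 2] * tab[2, 1]) /
  sqrt(prod(rowSums(tab)) * prod(colSums(tab)))

# Fisher's exact test is two-sided by default.
p_fisher <- fisher.test(tab)$p.value
```

φ measures the strength of the association, while the Fisher test supplies a p-value that remains valid with cell counts this small.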
Estimate an ordered logit (proportional-odds model) for the belief accuracy score (0 = both wrong, 1 = one correct, 2 = both correct) using Theory of Mind (MASC) and IRI subscales as predictors. The proportional-odds assumption implies a single log-odds shift per unit increase in each predictor, shared across both thresholds (0→1 and 1→2).
With n = 32 participants (score 0: n=4; 1: n=9; 2: n=19), EPV is computed as min(n₀, n₂) / k: M1 = 4, M2 = 2, M3 = 1.3. All are well below the recommended 10. All results are exploratory and should be treated as hypothesis-generating.
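The EPV figures follow from the score frequencies and the number of predictors in each model. A short sketch of that arithmetic (object names are illustrative):

```r
n_by_score <- c(`0` = 4, `1` = 9, `2` = 19)   # reported score frequencies, n = 32
k <- c(M1 = 1, M2 = 2, M3 = 3)                # number of predictors per model

# Limiting count: the smaller of the two extreme categories, as in min(n0, n2) / k.
limiting <- min(n_by_score[["0"]], n_by_score[["2"]])
epv <- limiting / k
round(epv, 1)
```

Only four participants scored 0, so that category caps the EPV for every model.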
\[ \begin{aligned} \text{M1:} \quad & \text{logit}\,P(Y \le j) = \alpha_j - \beta_1\,\text{MASC}_z \\ \text{M2:} \quad & \text{logit}\,P(Y \le j) = \alpha_j - \beta_1\,\text{MASC}_z - \beta_2\,\text{IRI-PT}_z \\ \text{M3:} \quad & \text{logit}\,P(Y \le j) = \alpha_j - \beta_1\,\text{MASC}_z - \beta_2\,\text{IRI-PT}_z - \beta_3\,\text{IRI-PD}_z \end{aligned} \]
MASC = Theory of Mind total score; IRI-PT = Perspective Taking subscale (cognitive empathy — most directly linked to predicting opponents’ decisions); IRI-PD = Personal Distress subscale (self-oriented distress — may impair strategic prediction). All scores z-standardised. OR > 1 shifts probability towards higher belief accuracy.
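As a rough illustration of the M1 specification, the sketch below fits a proportional-odds model with `MASS::polr()` on simulated data. The data, the effect size and the cut points are all invented, and the sample is deliberately larger than the n = 32 available in the study; it shows the mechanics only, not the reported estimates.

```r
library(MASS)  # polr(); MASS ships with standard R installations

set.seed(42)
n <- 200  # simulated sample size (assumption, for a stable fit)

# Simulated predictor and latent outcome (NOT the experimental data).
sim <- data.frame(masc_z = rnorm(n))
latent <- 0.6 * sim$masc_z + rlogis(n)
sim$score <- cut(latent, breaks = c(-Inf, -1, 0.5, Inf),
                 labels = c("0", "1", "2"), ordered_result = TRUE)

# M1-style model: one slope shared by both thresholds (0|1 and 1|2).
fit <- polr(score ~ masc_z, data = sim, Hess = TRUE)

beta       <- coef(fit)   # log-odds shift per 1 SD of the predictor
odds_ratio <- exp(beta)   # OR > 1 favours higher accuracy categories
thresholds <- fit$zeta    # the two cut points alpha_1 and alpha_2
```

`polr()` uses the parameterisation logit P(Y ≤ j) = α_j − βx, the same sign convention as the equations above, so a positive β shifts probability towards higher scores.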
SH — Ordered logit: determinants of belief accuracy. DV = belief accuracy score (0/1/2, ordered). n = 32 (score 0: n=4; 1: n=9; 2: n=19). MASC/IRI z-scored. OR > 1 increases probability of higher accuracy.¹

| Predictor | β | SE | t | p | OR | OR 2.5% | OR 97.5% |
|---|---|---|---|---|---|---|---|
| M1: MASC only | | | | | | | |
| MASC ToM score (z) | 0.277 | 0.331 | 0.835 | 0.4036 | 1.319 | 0.689 | 2.525 |
| M2: MASC + IRI-PT | | | | | | | |
| MASC ToM score (z) | 0.341 | 0.346 | 0.985 | 0.3248 | 1.406 | 0.706 | 2.865 |
| IRI Perspective Taking (z) | 0.259 | 0.360 | 0.720 | 0.4713 | 1.296 | 0.635 | 2.702 |
| M3: MASC + IRI-PT + IRI-PD | | | | | | | |
| MASC ToM score (z) | 0.479 | 0.366 | 1.307 | 0.1911 | 1.615 | 0.788 | 3.457 |
| IRI Perspective Taking (z) | 0.404 | 0.422 | 0.955 | 0.3394 | 1.497 | 0.662 | 3.651 |
| IRI Personal Distress (z) | -0.759 | 0.431 | -1.762 | 0.0780 | 0.468 | 0.183 | 1.031 |

¹ Ordered logit (proportional-odds, MASS::polr). p-values from two-tailed z-test on t-statistic. CI from profile likelihood where convergent, otherwise Wald. All models MLE; EPV < 10 — interpret cautiously.
MASC ToM (M1–M3): OR = 1.319, p = 0.4036. Higher Theory of Mind ability may improve belief accuracy by enabling better prediction of opponents’ decisions — an OR > 1 is consistent with this interpretation. IRI Perspective Taking (M2–M3): OR = 1.296, p = 0.4713 — cognitive empathy is directly relevant to inferring opponents’ intended strategies; a positive OR would support the link between perspective-taking and prediction accuracy. IRI Personal Distress (M3): OR = 0.468, p = 0.078 — self-oriented distress may interfere with accurate belief formation (OR < 1 expected). Given EPV ≤ 4 across all models, all estimates carry substantial uncertainty.
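The odds ratio, a Wald-type interval and the p-value can all be recovered from a coefficient and its standard error. The sketch below does this for the M1 MASC row. It starts from the rounded values printed in the table and uses a Wald interval, whereas the table reports profile-likelihood intervals where available, so small discrepancies are expected.

```r
beta <- 0.277   # reported coefficient for MASC (M1)
se   <- 0.331   # reported standard error

or      <- exp(beta)                                  # odds ratio
wald_ci <- exp(beta + c(-1, 1) * qnorm(0.975) * se)   # 95% Wald interval on the OR scale
z       <- beta / se
p       <- 2 * pnorm(-abs(z))                         # two-tailed normal p-value

round(c(OR = or, lower = wald_ci[1], upper = wald_ci[2], p = p), 3)
```

The interval comfortably contains OR = 1, which is why none of the directional interpretations above can be confirmed from these data.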
This analysis restricts the sample to participants who received a T signal (Hare) from their opponent (opp_signal_received = T). The dependent variable is whether they nonetheless chose D (Stag) — the Pareto-dominant action — despite receiving a Hare signal. Given the small subsample size, EPV may be below 10 and Firth penalised logit is applied automatically.
\[ \begin{aligned} \text{M1:} \quad & \text{logit}\,P(D) = \beta_0 + \beta_1\,\text{quiz\_err} \\ \text{M2:} \quad & \text{logit}\,P(D) = \beta_0 + \beta_1\,\text{quiz\_err} + \beta_2\,\text{CRT} \\ \text{M3:} \quad & \text{logit}\,P(D) = \beta_0 + \beta_1\,\text{CRT} \end{aligned} \]
Sample: opp_signal_received = T only. n = 9, events = 4. EPV: M1 = 4, M2 = 2, M3 = 4. All estimated with Firth penalised logit.
SH — Stag choice when opponent signals T (Hare). DV = choice2=D | opp_signal=T. n=9, events=4. EPV: M1=4, M2=2, M3=4. All Firth penalised logit (brglm2).¹

| Predictor | β | SE | z | p | OR | OR 2.5% | OR 97.5% |
|---|---|---|---|---|---|---|---|
| M1: Quiz only | | | | | | | |
| Quiz errors [log(1+x)] | -0.263 | 0.877 | -0.30 | 0.764 | 0.77 | 0.14 | 4.28 |
| M2: Quiz + CRT | | | | | | | |
| Quiz errors [log(1+x)] | 0.099 | 1.036 | 0.10 | 0.924 | 1.10 | 0.14 | 8.41 |
| CRT score (0–4) | 0.834 | 1.006 | 0.83 | 0.407 | 2.30 | 0.32 | 16.54 |
| M3: CRT only | | | | | | | |
| CRT score (0–4) | 0.952 | 0.955 | 1.00 | 0.319 | 2.59 | 0.40 | 16.83 |

¹ Firth penalised logit used for all models (EPV < 10). OR > 1 → increases P(choose Stag | opp signals T). 95% Wald CI.
Quiz errors (M2: OR = 1.10, p = 0.924): a higher error rate on the comprehension quiz may reflect lower understanding of the game, potentially reducing willingness to attempt the risky Stag choice after the opponent signals Hare (T); the estimate is indistinguishable from no effect. CRT score (M2: OR = 2.30, p = 0.407): more reflective thinkers were expected to recognise that Stag is a poor response once the opponent has announced Hare, since Stag pays 0 against Hare while Hare pays a sure 14. The point estimate (OR > 1) runs in the opposite direction, but its interval (0.32 to 16.54) is far too wide to support any conclusion. With EPV = 2 in M2, these estimates are exploratory and should be interpreted with caution.
---
title: "SH — Stag Hunt"
subtitle: "GT Behaviour · GTEMO Experiment"
author: "Eric Guerci"
date: today
format:
html:
theme: flatly
toc: true
toc-depth: 2
toc-title: "Contents"
number-sections: false
code-fold: true
code-summary: "Show code"
code-tools: true
fig-width: 9
fig-height: 4
fig-dpi: 150
smooth-scroll: true
embed-resources: true
execute:
echo: true
warning: false
message: false
---
```{r setup}
#| include: false
source("code.R")
```
::: {.callout-tip icon="false"}
##### Payoff matrix — Row payoff, Column payoff
| | **L** (col) | **R** (col) |
|:---:|:---:|:---:|
| **T** (row) | **14, 14** ★ | 14, 0 |
| **D** (row) | 0, 14 | **20, 20** ♦ |
★ = Risk-dominant Nash equilibrium (Hare-Hare). ♦ = Pareto-dominant Nash equilibrium (Stag-Stag). In Stag Hunt, cheap talk is expected to be effective in theory: both players want to coordinate, and signals are self-signaling and self-committing.
:::
---
```{=html}
<details open>
<summary><strong>1 — Equilibria & coordination</strong></summary>
<div style="padding: 0.75em 0.5em 0.5em 0.5em;">
```
### Objective
Describe the **equilibrium outcomes** reached by each couple in Part 1 and Part 2, and test whether cheap talk shifted coordination rates using paired McNemar tests.
::: {.callout-tip icon="false"}
##### Equilibrium labels (SH)
- **(D,D) — Stag-Stag** — Pareto-dominant NE; both hunt Stag → 20, 20 € ♦
- **(T,T) — Hare-Hare** — Risk-dominant NE; both hunt Hare → 14, 14 € ★
- **(T,D)** — Hare-Stag: P1 hunts Hare (14 €), P2 hunts Stag alone (0 €)
- **(D,T)** — Stag-Hare: P1 hunts Stag alone (0 €), P2 hunts Hare (14 €)
:::
### Equilibrium distributions
```{r}
#| label: fig-eq-sh
#| fig-cap: "SH — Equilibrium distributions in Part 1 (top) and Part 2 (bottom)."
#| fig-height: 8
#| fig-width: 8
p_eq_sh
```
### Coordination rates: Part 1 vs Part 2
:::: {layout="[1,1]" layout-valign="top"}
```{r}
#| label: tab-coord-rates-sh
coord_sh |>
dplyr::select(part, x, n, pct, ci95) |>
gt::gt() |>
gt::tab_header(title = "SH — Coordination rates: Part 1 vs Part 2",
subtitle = "95% Clopper-Pearson CI") |>
gt::cols_label(part = "Phase", x = "n coordinated", n = "N",
pct = "%", ci95 = "95% CI") |>
gt::tab_style(style = gt::cell_text(weight = "bold"),
locations = gt::cells_column_labels())
```
```{r}
#| label: fig-coord-sh
#| fig-cap: "SH — Coordination rate in Part 1 vs Part 2 (couple level). 95% Clopper-Pearson CI."
#| fig-height: 4
#| fig-width: 4
p_coord_sh
```
::::
### McNemar tests
```{r}
#| label: tab-mc-sh
tab_mc_sh |>
gt::gt() |>
gt::tab_header(title = "McNemar tests — SH (couple level)",
subtitle = "Paired Part 1 vs Part 2") |>
gt::cols_label(label = "Test", statistic = "χ²", p_value = "p-value",
note = "Note") |>
gt::fmt_number(columns = c(statistic, p_value), decimals = 4, rows = !is.na(statistic)) |>
gt::tab_style(style = gt::cell_text(weight = "bold"),
locations = gt::cells_body(columns = p_value,
rows = !is.na(p_value) & p_value < 0.05)) |>
gt::tab_style(style = gt::cell_text(weight = "bold"),
locations = gt::cells_column_labels())
```
```{r}
#| echo: false
coop1_pct <- scales::percent(coop_rates_sh$coop1, accuracy = 0.1)
coop2_pct <- scales::percent(coop_rates_sh$coop2, accuracy = 0.1)
```
::: callout-note
Pareto equilibrium (Stag-Stag) rate in Part 1: **`r coop1_pct`** of pairs → Part 2: **`r coop2_pct`**. A significant McNemar result would indicate that cheap talk systematically shifted couples from the risk-dominant (Hare-Hare) to the Pareto-dominant (Stag-Stag) equilibrium.
:::
### Conditioning on session gender
:::: {layout="[1,1]" layout-valign="top"}
```{r}
#| label: tab-cond-coord-sh
tab_cond_coord_sh |>
gt::gt() |>
gt::tab_header(title = "SH — Coordination and Stag choice by session gender",
subtitle = "χ² with Monte Carlo simulated p-value (B = 2000), couple level") |>
gt::cols_label(Outcome = "Outcome", Factor = "Factor", Test = "χ²(sim.) test") |>
gt::tab_style(style = gt::cell_text(weight = "bold"),
locations = gt::cells_column_labels()) |>
gt::opt_stylize(style = 1) |>
gt::tab_options(table.font.size = 13)
```
```{r}
#| label: fig-coord-gender-sh
#| fig-cap: "SH — P(Coordination Part 2) by session gender (couple level). Error bars = 95% Clopper-Pearson CI."
#| fig-height: 4
#| fig-width: 4
p_coord_gender_sh
```
::::
```{=html}
</div>
</details>
```
---
```{=html}
<details>
<summary><strong>2 — Choice & signal distributions</strong></summary>
<div style="padding: 0.75em 0.5em 0.5em 0.5em;">
```
### Objective
Describe the **marginal distributions of choices and signals** (Part 1 choice → signal sent → Part 2 choice), and examine how the opponent's signal shapes Part 2 behaviour. All proportions use exact 95% Clopper-Pearson CIs.
### Distributions table
```{r}
#| label: tab-sh-dist
tab_sh
```
### Proportions by phase
```{r}
#| label: fig-sh-dist
#| fig-cap: "SH — Proportions of each choice/signal type with 95% Clopper-Pearson CIs. Left: Part 1 choices; centre: signals sent; right: Part 2 choices."
#| fig-height: 4
p_sh_dist
```
::: {.callout-note icon="false"}
##### Part 1 → Part 2 snapshot
```{r}
#| echo: false
sh_t1 <- tab_sh_dist_long |> dplyr::filter(phase == "Part 1", level == "D") |> dplyr::pull(pct)
sh_t2 <- tab_sh_dist_long |> dplyr::filter(phase == "Part 2", level == "D") |> dplyr::pull(pct)
sh_sig <- tab_sh_dist_long |> dplyr::filter(phase == "Signal") |> dplyr::arrange(dplyr::desc(n))
sh_t1 <- if (length(sh_t1) == 0) "—" else sh_t1
sh_t2 <- if (length(sh_t2) == 0) "—" else sh_t2
```
Stag choice (D) in Part 1: **`r sh_t1`**. The dominant signal was **`r sh_sig$level[1]`** (`r sh_sig$pct[1]`). Stag choice in Part 2: **`r sh_t2`**. An increase in D (Stag) from Part 1 to Part 2 would be consistent with cheap talk successfully shifting players from the safe Hare equilibrium to the Pareto-dominant Stag equilibrium. Unlike PD or MP, Hare is not a strictly dominant strategy in SH — it is merely risk-dominant — so signals can plausibly serve as coordination devices.
:::
### Within-subject shift: McNemar test (Part 1 vs Part 2)
```{r}
#| label: tab-mcnemar-sh
tab_mcnemar_sh |>
gt::gt() |>
gt::tab_header(
title = "McNemar test — SH: choice1_D vs choice2_D",
subtitle = "Paired within-individual"
) |>
gt::cols_label(statistic = "χ²", p_value = "p-value", n = "n", note = "Note") |>
gt::fmt_number(columns = c(statistic, p_value), decimals = 4) |>
gt::tab_style(style = gt::cell_text(weight = "bold"),
locations = gt::cells_body(columns = p_value,
rows = !is.na(p_value) & p_value < 0.05)) |>
gt::tab_style(style = gt::cell_text(weight = "bold"),
locations = gt::cells_column_labels())
```
::: callout-note
A significant McNemar result (p < 0.05) would indicate that pre-play cheap talk systematically shifted individual Stag (D) choice rates. In SH, Hare is **not** a dominant strategy: it is the risk-dominant equilibrium action, but Stag is the best response if the opponent also plays Stag. Cheap talk is theoretically credible in SH because both players prefer the Pareto equilibrium (20, 20) over the risk-dominant one (14, 14), making signals self-signaling and self-committing.
:::
### Opponent signal and the information set before Part 2
::: callout-important
**Decision sequence.** After sending their own signal and *before* making the Part 2 choice, each player observes the **opponent's signal**. The Part 2 decision is taken with a two-dimensional information set: **(own signal sent) × (opponent's signal received)**. The four possible information sets are: T/T, T/D, D/T, D/D.
:::
```{r}
#| label: fig-sig-heatmap-sh
#| fig-cap: "SH — Joint distribution of own signal × opponent signal received. Values show count and share."
#| fig-height: 4
#| fig-width: 7
p_sig_heatmap_sh
```
```{r}
#| label: fig-choice2-oppsig-sh
#| fig-cap: "SH — P(choice₂ = D, Stag) stratified by opponent's signal. 95% Clopper-Pearson CI."
#| fig-height: 4
#| fig-width: 6
p_choice2_by_oppsig_sh
```
::: {.callout-note icon="false"}
##### Cheap talk in SH: signal credibly shifts Stag choice
In the Stag Hunt, neither T (Hare) nor D (Stag) is a strictly dominant strategy. Hare is merely the **risk-dominant** choice — it is the safer option under uncertainty, but both players would prefer mutual Stag (20, 20) over mutual Hare (14, 14). This makes cheap talk highly credible: a player who signals D (Stag) is credibly committing to the Pareto-dominant equilibrium, because following through is individually rational as long as the opponent also plays Stag. Signals are both **self-signaling** (it is rational to send a D signal only if you intend to play D) and **self-committing** (it is rational to follow a D signal once sent). A large gap in Stag choice rates between receiving a D signal vs a T signal would confirm the effectiveness of cheap talk in SH.
:::
```{r}
#| label: fig-choice2-infoset-sh
#| fig-cap: "SH — P(choice₂ = D, Stag) by full information set (own/opp). Colour = own signal. 95% CI. Information sets with no observations are omitted."
#| fig-height: 5
#| fig-width: 9
p_choice2_infoset_sh
```
::: {.callout-note icon="false"}
##### SH interpretation
The critical test is the **(D/D) information set** — where both players have signalled Stag (D). If both players trust the signal, this should produce nearly 100% Stag choice, converging on the Pareto-dominant (D,D) equilibrium. Values below 100% reflect residual strategic risk aversion (fear of being the only Stag hunter). The **(T/T) information set** (both signalled Hare) represents the risk-dominant equilibrium pull — here Hare choice should be near 100%. A large D/D vs T/T gap in Stag choice rates is the clearest evidence of cheap talk effectiveness in SH.
:::
```{r}
#| label: fig-follow-opp-sh
#| fig-cap: "SH — Proportion of players whose Part 2 choice matches the opponent's signal received."
#| fig-height: 4
#| fig-width: 5
p_follow_opp_sh
```
### Conditioning on gender and role
```{r}
#| label: tab-cond-sig-sh
tab_cond_sig_sh |>
gt::gt() |>
gt::tab_header(title = "SH — Choice and signal distributions by gender and role",
subtitle = "χ² with Monte Carlo simulated p-value (B = 2000)") |>
gt::cols_label(Outcome = "Outcome", Factor = "Factor", Test = "χ²(sim.) test") |>
gt::tab_style(style = gt::cell_text(weight = "bold"),
locations = gt::cells_column_labels()) |>
gt::opt_stylize(style = 1) |>
gt::tab_options(table.font.size = 13)
```
```{r}
#| label: fig-cond-sig-sh
#| fig-cap: "SH — P(Signal = T) by gender and role. Error bars = 95% Clopper-Pearson CI. Dashed line = 50%."
#| fig-height: 4
#| fig-width: 9
p_cond_sig_sh
```
```{=html}
</div>
</details>
```
---
```{=html}
<details>
<summary><strong>3 — Signal honesty & consistency</strong></summary>
<div style="padding: 0.75em 0.5em 0.5em 0.5em;">
```
### Objective
Examine whether signals are **honest** (= same as the action eventually taken in Part 2) and **consistent** with Part 1 choices. Assess the prevalence of strategy switches between Part 1 and Part 2.
### Honesty and consistency proportions
```{r}
#| label: tab-sec2-sh
tab_sec2_sh |>
gt::gt() |>
gt::tab_header(title = "SH — Signal honesty and consistency",
subtitle = "95% Clopper-Pearson CI. Each row is a binary indicator (1 = yes, 0 = no).") |>
gt::cols_label(variable = "Measure", n = "n (=1)", N = "N",
pct = "%", ci95 = "95% CI") |>
gt::tab_style(style = gt::cell_text(weight = "bold"),
locations = gt::cells_column_labels()) |>
gt::tab_footnote(
footnote = "1 if the signal sent equals the Part 2 choice (e.g. sent T and chose T in Part 2). Measures whether players followed through on their signal.",
locations = gt::cells_body(columns = variable, rows = 1)
) |>
gt::tab_footnote(
footnote = "1 if the signal sent equals the Part 1 choice (e.g. signalled T and had also chosen T in Part 1). Measures whether the signal reflects past behaviour — independent of Part 2.",
locations = gt::cells_body(columns = variable, rows = 2)
) |>
gt::tab_footnote(
footnote = "1 if the Part 2 choice differs from the Part 1 choice (choice1 \u2260 choice2). Measures switching behaviour across rounds, independent of the signal. N may differ from rows 1\u20132 due to missing values in different variables.",
locations = gt::cells_body(columns = variable, rows = 3)
) |>
gt::tab_options(table.font.size = 13)
```
### Binomial test: signal honesty vs 50%
```{r}
#| label: tab-binom-honest-sh
tab_binom_honest_sh |>
gt::gt() |>
gt::tab_header(title = "Binomial test: P(Honest) vs H₀ = 0.50",
subtitle = "Two-sided test; 95% Clopper-Pearson CI") |>
gt::cols_label(x = "n honest", n = "N", pct = "%", ci95 = "95% CI",
p_value = "p-value") |>
gt::tab_style(style = gt::cell_text(weight = "bold"),
locations = gt::cells_body(columns = p_value,
rows = !is.na(p_value) & p_value < 0.05)) |>
gt::tab_style(style = gt::cell_text(weight = "bold"),
locations = gt::cells_column_labels())
```
::: callout-note
In Stag Hunt, a signal is *honest* if the player's Part 2 action matches what they signalled. Unlike in the PD, where honesty is only meaningful for the dominated cooperative strategy, both SH signals (Stag/D and Hare/T) correspond to a Nash equilibrium. Honesty therefore measures commitment to coordinating on the announced equilibrium. A player who signals Stag and plays Stag is attempting the Pareto-dominant equilibrium; a player who signals Hare and plays Hare is securing the risk-dominant equilibrium. The aggregate honesty rate thus indicates how far players rely on cheap talk to coordinate on an equilibrium.
:::
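A two-sided exact binomial test with a Clopper-Pearson interval can be obtained from base R's `binom.test()`. The sketch below uses hypothetical counts (`x_honest`, `n_total`), not the experiment's results, and only mirrors the column names shown in the table above.

```r
# Hypothetical counts, for illustration only.
x_honest <- 12  # players whose Part 2 action matched their signal
n_total  <- 16  # players with a non-missing signal and Part 2 choice

# Exact two-sided test of H0: P(Honest) = 0.50.
# binom.test() reports the Clopper-Pearson interval in $conf.int.
bt <- binom.test(x_honest, n_total, p = 0.5, alternative = "two.sided")

out <- data.frame(
  x       = x_honest,
  n       = n_total,
  pct     = sprintf("%.1f%%", 100 * x_honest / n_total),
  ci95    = sprintf("[%.1f%%, %.1f%%]", 100 * bt$conf.int[1], 100 * bt$conf.int[2]),
  p_value = bt$p.value
)
print(out)
```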
### Strategic transition heatmaps
```{r}
#| label: fig-sankey-sh
#| fig-cap: "SH — Strategic transitions. Left: Part 1 → Signal (row %: conditional on Part 1 choice). Right: Signal → Part 2 (row %: conditional on signal sent)."
#| fig-height: 4
#| fig-width: 11
p_sankey_sh
```
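The row percentages in the left panel (Part 1 → Signal) condition on the Part 1 choice. A minimal sketch of that computation, on hypothetical vectors `part1` and `signal` rather than the data behind `p_sankey_sh`:

```r
# Hypothetical toy data: T = Hare, D = Stag.
part1  <- factor(c("T", "T", "D", "D", "T", "D"), levels = c("T", "D"))
signal <- factor(c("D", "T", "D", "D", "T", "T"), levels = c("T", "D"))

# Cross-tabulate, then normalise within each row (margin = 1),
# so every row sums to 100% of players who made that Part 1 choice.
tab     <- table(part1 = part1, signal = signal)
row_pct <- round(100 * prop.table(tab, margin = 1), 1)
print(row_pct)
```

The right panel follows the same pattern with the signal as the row variable and the Part 2 choice as the column variable.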
### Conditioning on gender and role
```{r}
#| label: tab-cond-honest-sh
tab_cond_honest_sh |>
gt::gt() |>
gt::tab_header(title = "SH — Signal honesty and strategy change by gender and role",
subtitle = "χ² with Monte Carlo simulated p-value (B = 2000)") |>
gt::cols_label(Outcome = "Outcome", Factor = "Factor", Test = "χ²(sim.) test") |>
gt::tab_style(style = gt::cell_text(weight = "bold"),
locations = gt::cells_column_labels()) |>
gt::opt_stylize(style = 1) |>
gt::tab_options(table.font.size = 13)
```
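A χ² test with a Monte Carlo simulated p-value is available through `chisq.test(simulate.p.value = TRUE)`. The sketch below assumes hypothetical vectors `honest` and `gender`; it is not the code that produced `tab_cond_honest_sh`.

```r
set.seed(2026)  # the simulated p-value depends on the random seed

# Hypothetical toy data: honesty indicator and gender for 12 players.
honest <- c(1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1)
gender <- c("F", "M", "F", "F", "M", "M", "F", "M", "F", "M", "F", "M")

tab <- table(gender, honest)

# B = 2000 resampled tables with the observed margins held fixed;
# this avoids relying on the asymptotic chi-squared approximation in small cells.
res <- chisq.test(tab, simulate.p.value = TRUE, B = 2000)
print(res)
```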
```{r}
#| label: fig-cond-honest-sh
#| fig-cap: "SH — P(Honest) by gender (left) and role (right). Error bars = 95% Clopper-Pearson CI. Dashed line = 50%."
#| fig-height: 4
#| fig-width: 9
p_cond_honest_sh
```
```{=html}
</div>
</details>
```
---
```{=html}
<details>
<summary><strong>4 — Belief accuracy & bonus</strong></summary>
<div style="padding: 0.75em 0.5em 0.5em 0.5em;">
```
::: {.callout-note icon="false"}
##### The two belief questions
After Part 2, each player answered two incentivised questions about their beliefs:
**Belief 1 — First-order belief:** *"What do you think your opponent chose in Part 2?"* (T or D).
Scored correct (`GT_right_guess1 = 1`) if the player's prediction matched the opponent's actual Part 2 choice. Bonus: +2€ if correct.
**Belief 2 — Second-order belief:** *"What do you think your opponent believes you chose in Part 2?"* (T or D).
Scored correct (`GT_right_guess2 = 1`) if the player correctly identified what the opponent believed about the player's own choice. Bonus: +2€ if correct.
The **belief accuracy score** = `GT_right_guess1 + GT_right_guess2` ∈ {0, 1, 2}. The **belief bonus** = score × 2€ ∈ {0€, 2€, 4€}.
:::
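The score and bonus defined above amount to a simple sum and a rescaling. A minimal sketch, where the vectors `guess1` and `guess2` are hypothetical stand-ins for `GT_right_guess1` and `GT_right_guess2`:

```r
# Hypothetical correctness indicators for five players (1 = correct, 0 = wrong).
guess1 <- c(1, 0, 1, 1, 0)  # first-order belief
guess2 <- c(1, 0, 0, 1, 1)  # second-order belief

score <- guess1 + guess2    # belief accuracy score in {0, 1, 2}
bonus <- 2 * score          # belief bonus in euros: 0, 2 or 4

# Distribution of scores, keeping empty levels so all three categories appear.
score_dist <- table(factor(score, levels = 0:2))
print(score_dist)
```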
### Objective
Describe the distribution of **belief accuracy scores** (0 = both beliefs wrong; 1 = one correct; 2 = both correct) and the associated **belief bonus** payoff. Assess whether belief accuracy is correlated with coordination outcomes at the couple level.
### Belief accuracy distribution
```{r}
#| label: fig-belief-bar-sh
#| fig-cap: "SH — Distribution of belief accuracy scores."
#| fig-height: 5
#| fig-width: 7
p_belief_bar_sh
```
### Hypothesis tests: beliefs, cognitive ability & coordination
```{r}
#| label: tab-belief-hyp-sh
#| echo: false
tab_belief_hyp_gt
```
::: callout-note
**H1** tests whether more reflective players (higher CRT) are better at predicting their opponent: if strategic reasoning drives belief formation, a positive Spearman ρ is expected. **H2** tests whether players who struggled with game comprehension (more quiz errors) hold less accurate beliefs; the expected direction is negative. **H3** tests whether couples in which *both* players had perfect beliefs were more likely to coordinate in Part 2; it uses Fisher's exact test given the small N and binary outcomes. H1 and H2 operate at the individual level while H3 is at the couple level, so their results are not directly comparable.
:::
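The two kinds of test described in the note can be sketched with base R. All vectors below are hypothetical toy data; the actual variables behind `tab_belief_hyp_gt` may be named and coded differently.

```r
# H1 (individual level): Spearman correlation between CRT and belief accuracy score.
crt   <- c(0, 1, 2, 3, 4, 1, 2, 3, 0, 4)
score <- c(0, 1, 1, 2, 2, 0, 2, 1, 1, 2)
# exact = FALSE because ties make an exact p-value unavailable.
h1 <- cor.test(crt, score, method = "spearman", exact = FALSE)

# H3 (couple level): did couples where both players scored 2 coordinate more often?
both_perfect <- c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE)
coordinated  <- c(TRUE, TRUE, FALSE, TRUE,  TRUE, FALSE, FALSE, TRUE)
h3 <- fisher.test(table(both_perfect, coordinated))

print(c(rho = unname(h1$estimate), p_h1 = h1$p.value, p_h3 = h3$p.value))
```

H2 would use the same `cor.test()` call with quiz errors in place of `crt`.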
### Conditioning on gender and role
```{r}
#| label: fig-cond-belief-sh
#| fig-cap: "SH — Belief accuracy score (0/1/2) by gender (left) and role (right). Bars show proportion within each group; labels show % and count. Score 0 = both beliefs wrong, 1 = one correct, 2 = both correct."
#| fig-height: 4
#| fig-width: 10
p_cond_belief_sh
```
```{=html}
</div>
</details>
```
---
```{=html}
<details>
<summary><strong>5 — Econometric models</strong></summary>
<div style="padding: 0.75em 0.5em 0.5em 0.5em;">
```
---
### 5.1 — Determinants of belief accuracy
Estimate an **ordered logit (proportional-odds model)** for the belief accuracy score (0 = both wrong, 1 = one correct, 2 = both correct) using Theory of Mind (MASC) and IRI subscales as predictors. The proportional-odds assumption implies a single log-odds shift per unit increase in each predictor, shared across both thresholds (0→1 and 1→2).
::: {.callout-warning icon="true"}
##### Small-sample caveat
With n = `r n_olog` participants (score 0: n=`r n_olog_0`; 1: n=`r n_olog_1`; 2: n=`r n_olog_2`), events per variable (EPV) is computed as min(n₀, n₂) / k, where k is the number of predictors: M1 = `r epv_B1`, M2 = `r epv_B2`, M3 = `r epv_B3`. All are well below the recommended 10. **All results are exploratory and should be treated as hypothesis-generating.**
:::
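The EPV rule stated in the caveat can be reproduced with a few lines. The category counts below are hypothetical, and taking k as the number of predictors in each model is an assumption of this sketch.

```r
# Hypothetical counts per belief accuracy score (not the experiment's data).
n_by_score <- c(`0` = 6, `1` = 14, `2` = 12)

# Assumed number of predictors in each model.
k <- c(M1 = 1, M2 = 2, M3 = 3)

# EPV = size of the smaller extreme category divided by the number of predictors.
epv <- min(n_by_score[c("0", "2")]) / k
print(epv)

# Flag models below the conventional threshold of 10.
print(epv < 10)
```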
#### Model specifications
$$
\begin{aligned}
\text{M1:} \quad & \text{logit}\,P(Y \le j) = \alpha_j - \beta_1\,\text{MASC}_z \\
\text{M2:} \quad & \text{logit}\,P(Y \le j) = \alpha_j - \beta_1\,\text{MASC}_z - \beta_2\,\text{IRI-PT}_z \\
\text{M3:} \quad & \text{logit}\,P(Y \le j) = \alpha_j - \beta_1\,\text{MASC}_z - \beta_2\,\text{IRI-PT}_z - \beta_3\,\text{IRI-PD}_z
\end{aligned}
$$
**MASC** = Theory of Mind total score; **IRI-PT** = Perspective Taking subscale (cognitive empathy — most directly linked to predicting opponents' decisions); **IRI-PD** = Personal Distress subscale (self-oriented distress — may impair strategic prediction). All scores z-standardised. OR > 1 shifts probability towards higher belief accuracy.
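As an illustration of the specification, the sketch below fits an M2-style model on simulated data with `MASS::polr()`, which uses the same sign convention, logit P(Y ≤ j) = α_j − βx. The data, effect sizes and object names are all hypothetical, and the simulated sample is deliberately much larger than the real one so that the toy fit is stable; this is not necessarily how `tab_olog_all` was produced.

```r
library(MASS)  # provides polr()

set.seed(123)
n   <- 300
toy <- data.frame(MASC_z = rnorm(n), IRI_PT_z = rnorm(n))

# Simulate a latent accuracy propensity and cut it into the three score levels.
latent    <- 0.8 * toy$MASC_z + 0.3 * toy$IRI_PT_z + rlogis(n)
toy$score <- cut(latent, breaks = c(-Inf, -0.5, 1, Inf),
                 labels = c("0", "1", "2"), ordered_result = TRUE)

# Proportional-odds fit: one slope per predictor, two thresholds (0|1 and 1|2).
fit_m2 <- polr(score ~ MASC_z + IRI_PT_z, data = toy, Hess = TRUE)

# Keep the slope rows only and convert to odds ratios with Wald p-values.
coefs  <- coef(summary(fit_m2))
slopes <- coefs[names(coef(fit_m2)), , drop = FALSE]
or_table <- data.frame(
  term    = rownames(slopes),
  OR      = exp(slopes[, "Value"]),
  p_value = 2 * pnorm(abs(slopes[, "t value"]), lower.tail = FALSE)
)
print(or_table)
```

Because of the minus sign in the linear predictor, a positive slope lowers P(Y ≤ j), so OR > 1 corresponds to higher belief accuracy.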
#### Coefficient table
```{r}
#| label: tab-olog-sh
tab_olog_gt
```
#### Goodness of fit
```{r}
#| label: tab-gof-olog-sh
tab_gof_olog_gt
```
#### Forest plot: odds ratios
```{r}
#| label: fig-forest-olog-sh
#| fig-cap: "SH — Ordered logit: odds ratios for belief accuracy. OR > 1 increases probability of higher accuracy score. Error bars = 95% CI. Dashed line = OR 1 (no effect). x-axis log scale."
#| fig-height: 4
#| fig-width: 8
p_forest_olog
```
::: {.callout-note icon="false"}
##### Interpretation
```{r}
#| echo: false
m1_or_masc <- tab_olog_all |> dplyr::filter(model == "M1: MASC only", term == "MASC_z") |> dplyr::pull(OR)
m1_p_masc <- tab_olog_all |> dplyr::filter(model == "M1: MASC only", term == "MASC_z") |> dplyr::pull(p_value)
m2_or_iript <- tab_olog_all |> dplyr::filter(model == "M2: MASC + IRI-PT", term == "IRI_PT_z") |> dplyr::pull(OR)
m2_p_iript <- tab_olog_all |> dplyr::filter(model == "M2: MASC + IRI-PT", term == "IRI_PT_z") |> dplyr::pull(p_value)
m3_or_irish <- tab_olog_all |> dplyr::filter(model == "M3: MASC + IRI-PT + IRI-PD", term == "IRI_PD_z") |> dplyr::pull(OR)
m3_p_irish <- tab_olog_all |> dplyr::filter(model == "M3: MASC + IRI-PT + IRI-PD", term == "IRI_PD_z") |> dplyr::pull(p_value)
```
**MASC ToM (included in M1–M3; M1 estimate shown):** OR = `r m1_or_masc`, p = `r m1_p_masc`. Higher Theory of Mind ability may improve belief accuracy by enabling better prediction of opponents' decisions; an OR > 1 is consistent with this interpretation. **IRI Perspective Taking (included in M2–M3; M2 estimate shown):** OR = `r m2_or_iript`, p = `r m2_p_iript`. Cognitive empathy is directly relevant to inferring opponents' intended strategies, so an OR > 1 would support the link between perspective-taking and prediction accuracy. **IRI Personal Distress (M3):** OR = `r m3_or_irish`, p = `r m3_p_irish`. Self-oriented distress may interfere with accurate belief formation (OR < 1 expected). Given EPV ≤ `r epv_B1` across all models, all estimates carry substantial uncertainty.
:::
---
### 5.2 — Stag choice under Hare signal
::: {.callout-warning icon="false"}
##### Small-sample note
This analysis restricts the sample to participants who received a **T signal (Hare)** from their opponent (`opp_signal_received = T`). The dependent variable is whether they nonetheless chose **D (Stag)** — the Pareto-dominant action — despite receiving a Hare signal. Given the small subsample size, EPV may be below 10 and Firth penalised logit is applied automatically.
:::
#### Models
$$
\begin{aligned}
\text{M1:} \quad & \text{logit}\,P(D) = \beta_0 + \beta_1\,\text{quiz\_err} \\
\text{M2:} \quad & \text{logit}\,P(D) = \beta_0 + \beta_1\,\text{quiz\_err} + \beta_2\,\text{CRT} \\
\text{M3:} \quad & \text{logit}\,P(D) = \beta_0 + \beta_1\,\text{CRT}
\end{aligned}
$$
**Sample**: `opp_signal_received = T` only. n = `r n_T`, events = `r n_events_T`. EPV: M1 = `r round(epv_T_m1, 1)`, M2 = `r round(epv_T_m2, 1)`, M3 = `r round(epv_T_m3, 1)`. All estimated with Firth penalised logit.
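A minimal sketch of the estimation logic, assuming EPV is events divided by the number of predictors and that the Firth fit comes from the `logistf` package. The toy data and all object names are hypothetical; this is not the code behind `tab_T_gt`.

```r
# Hypothetical subsample of 20 players who received a T (Hare) signal.
toy <- data.frame(
  quiz_err = c(0, 1, 2, 0, 3, 1, 0, 2, 1, 4, 0, 1, 2, 0, 1, 3, 0, 2, 1, 0),
  CRT      = c(4, 3, 1, 2, 0, 3, 4, 1, 2, 0, 3, 2, 1, 4, 2, 0, 3, 1, 2, 4),
  stag     = c(0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0)
)

events <- sum(toy$stag)   # number of Stag (D) choices
k      <- 2               # predictors in an M2-style model
epv    <- events / k      # assumed EPV definition

# Use the Firth penalised likelihood when EPV is below 10 and logistf is installed;
# otherwise fall back to an ordinary logit.
firth_or <- NULL
if (epv < 10 && requireNamespace("logistf", quietly = TRUE)) {
  fit      <- logistf::logistf(stag ~ quiz_err + CRT, data = toy)
  firth_or <- exp(fit$coefficients)
  print(data.frame(OR = firth_or, p_value = fit$prob))
} else {
  fit <- glm(stag ~ quiz_err + CRT, family = binomial, data = toy)
  print(exp(coef(fit)))
}
```

The penalty keeps estimates finite even when a predictor nearly separates the outcome, which is a common problem in subsamples of this size.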
#### Results
```{r}
#| label: tab-logit-D
#| echo: false
#| message: false
tab_T_gt
```
```{r}
#| label: fig-forest-D
#| echo: false
#| message: false
#| fig-cap: "Stag choice when opponent signals T (Hare) — M1 (quiz only), M2 (quiz + CRT), M3 (CRT only). Odds ratios with 95% CI. OR > 1 increases P(choose Stag). All Firth. x-axis log scale."
#| fig-height: 3.5
#| fig-width: 8
p_forest_T
```
::: {.callout-note icon="false"}
##### Interpretation
```{r}
#| echo: false
d_or_quiz <- round(exp(coef(m_stag_hare)["log_quiz_err"]), 2)
d_p_quiz <- round(summary(m_stag_hare)$coefficients["log_quiz_err", 4], 3)
d_or_crt <- round(exp(coef(m_stag_hare)["CRT4"]), 2)
d_p_crt <- round(summary(m_stag_hare)$coefficients["CRT4", 4], 3)
```
**Quiz errors** (OR = `r d_or_quiz`, p = `r d_p_quiz`): a higher error rate on the comprehension quiz may reflect lower understanding of the game, potentially reducing willingness to attempt the risky Stag choice after the opponent has signalled Hare (T). **CRT score** (OR = `r d_or_crt`, p = `r d_p_crt`): more reflective thinkers may be better at recognising that Stag is a poor reply to a Hare signal. If the opponent follows through on Hare, Stag pays 0 whereas Hare guarantees 14, although the signal is cheap talk and not binding. EPV = `r round(epv_T, 1)`, so estimates are exploratory and should be interpreted with caution.
:::
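The payoff logic behind this interpretation can be made explicit with the game's payoff matrix (Hare/T pays 14 regardless; Stag/D pays 20 against Stag and 0 against Hare). The sketch below computes the belief threshold at which Stag becomes the better choice; the function and variable names are illustrative.

```r
payoff_hare       <- 14  # T (Hare): same payoff whatever the opponent does
payoff_stag_match <- 20  # D (Stag) when the opponent also plays Stag
payoff_stag_miss  <- 0   # D (Stag) when the opponent plays Hare

# Expected payoff of Stag given belief q = P(opponent plays Stag).
expected_stag <- function(q) {
  q * payoff_stag_match + (1 - q) * payoff_stag_miss
}

# Stag is the better choice only if q exceeds this threshold.
q_star <- (payoff_hare - payoff_stag_miss) / (payoff_stag_match - payoff_stag_miss)
print(q_star)

# A player who takes the Hare signal at face value (q = 0) expects 0 from Stag.
print(expected_stag(0))
```

Under these payoffs a player needs to believe the opponent will play Stag with probability above 0.7 before Stag is worthwhile, which a Hare signal makes less plausible.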
```{=html}
</div>
</details>
```