GT Behaviour · GTEMO Experiment
Eric Guerci
March 22, 2026
| | L / Yield (col) | R / Demand (col) |
|---|---|---|
| T / Yield (row) | 0, 0 | 12, 36 ♦ P2 Pref |
| D / Demand (row) | 36, 12 ★ P1 Pref | 0, 0 |
In Battle of the Sexes, there is an asymmetric conflict of interest. Both players want to coordinate, but on different equilibria. Action “D” corresponds to demanding one’s preferred equilibrium, while “T” corresponds to yielding to the other’s preferred equilibrium.
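As a quick check of the payoff matrix above, the following minimal sketch encodes the two payoff tables and searches for pure-strategy Nash equilibria by testing mutual best responses. The object names (`u_row`, `u_col`, `pure_ne`) are illustrative and are not taken from the project's `code.R`.

```r
# Payoffs from the matrix above: rows = row player's action, columns = column player's action.
acts_row <- c("T", "D")   # T = Yield, D = Demand
acts_col <- c("L", "R")   # L = Yield, R = Demand
u_row <- matrix(c(0, 12,
                  36, 0), nrow = 2, byrow = TRUE,
                dimnames = list(acts_row, acts_col))
u_col <- matrix(c(0, 36,
                  12, 0), nrow = 2, byrow = TRUE,
                dimnames = list(acts_row, acts_col))

# A cell is a pure-strategy Nash equilibrium when both payoffs are best responses:
# the row payoff is maximal within its column, the column payoff maximal within its row.
row_best <- sweep(u_row, 2, apply(u_row, 2, max), "==")
col_best <- sweep(u_col, 1, apply(u_col, 1, max), "==")
ne <- which(row_best & col_best, arr.ind = TRUE)

pure_ne <- paste0("(", acts_row[ne[, "row"]], ",", acts_col[ne[, "col"]], ")")
print(pure_ne)
```

The two cells returned are the P1 Preferred and P2 Preferred outcomes marked ★ and ♦ in the matrix.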
Describe the equilibrium outcomes reached by each couple in Part 1 and Part 2, and test whether cheap talk shifted coordination rates using paired McNemar tests.
coord_bs |>
dplyr::select(part, x, n, pct, ci95) |>
gt::gt() |>
gt::tab_header(title = "BS — Coordination rates: Part 1 vs Part 2",
subtitle = "95% Clopper-Pearson CI") |>
gt::cols_label(part = "Phase", x = "n coordinated", n = "N",
pct = "%", ci95 = "95% CI") |>
gt::tab_style(style = gt::cell_text(weight = "bold"),
locations = gt::cells_column_labels())
p_coord_bs

BS — Coordination rates: Part 1 vs Part 2. 95% Clopper-Pearson CI.

| Phase | n coordinated | N | % | 95% CI |
|---|---|---|---|---|
| Part 1 | 6 | 16 | 37.5% | [15.2%, 64.6%] |
| Part 2 | 9 | 16 | 56.2% | [29.9%, 80.2%] |
tab_mc_bs |>
gt::gt() |>
gt::tab_header(title = "McNemar tests — BS (couple level)",
subtitle = "Paired Part 1 vs Part 2") |>
gt::cols_label(label = "Test", statistic = "χ²", p_value = "p-value",
note = "Note") |>
gt::fmt_number(columns = c(statistic, p_value), decimals = 4, rows = !is.na(statistic)) |>
gt::tab_style(style = gt::cell_text(weight = "bold"),
locations = gt::cells_body(columns = p_value,
rows = !is.na(p_value) & p_value < 0.05)) |>
gt::tab_style(style = gt::cell_text(weight = "bold"),
locations = gt::cells_column_labels())

McNemar tests — BS (couple level). Paired Part 1 vs Part 2.

| Test | χ² | p-value | Note |
|---|---|---|---|
| BS — coord Part1 vs Part2 | 0.4440 | 0.5050 | OK |
| BS — mutual Demand (D) Part1 vs Part2 | 0.4440 | 0.5050 | OK |
Coordination rate in Part 1: 37.5% of pairs → Part 2: 56.2%. In BS, coordination means landing on either the P1 Preferred (D,T) or P2 Preferred (T,D) equilibrium; in both, each player earns a positive payoff (36 for the favoured player, 12 for the other). A significant McNemar result would indicate that cheap talk systematically shifted couples towards coordinated outcomes and away from miscoordination ((D,D) or (T,T), both yielding 0€ for each player). Here neither test is significant (p = 0.505), so the rise from 6 to 9 coordinated couples out of 16 cannot be distinguished from chance.
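To make the couple-level McNemar test concrete, here is a minimal sketch using `stats::mcnemar.test`. The 2×2 cell counts are an assumption: they are reconstructed from the reported margins (6 and 9 coordinated couples out of 16) and the reported statistic, assuming the default continuity correction was used. They are not read from the raw data.

```r
# 2x2 table of couples: rows = Part 1 outcome, columns = Part 2 outcome.
# Counts are reconstructed from the reported margins and statistic (assumption).
tab <- matrix(c(3, 3,
                6, 4),
              nrow = 2, byrow = TRUE,
              dimnames = list(part1 = c("coordinated", "miscoordinated"),
                              part2 = c("coordinated", "miscoordinated")))

# Sanity checks against the coordination-rate table: N = 16, 6 in Part 1, 9 in Part 2.
stopifnot(sum(tab) == 16, sum(tab[1, ]) == 6, sum(tab[, 1]) == 9)

# McNemar's test uses only the discordant cells (3 and 6 here);
# the continuity correction is applied by default.
res <- mcnemar.test(tab)
round(c(statistic = unname(res$statistic), p_value = res$p.value), 3)
```

Under these assumed counts the statistic and p-value agree with the first row of the McNemar table to three decimals.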
tab_cond_coord_bs |>
gt::gt() |>
gt::tab_header(title = "BS — Coordination and Demand choice by session gender",
subtitle = "χ² with Monte Carlo simulated p-value (B = 2000), couple level") |>
gt::cols_label(Outcome = "Outcome", Factor = "Factor", Test = "χ²(sim.) test") |>
gt::tab_style(style = gt::cell_text(weight = "bold"),
locations = gt::cells_column_labels()) |>
gt::opt_stylize(style = 1) |>
gt::tab_options(table.font.size = 13)
p_coord_gender_bs

BS — Coordination and Demand choice by session gender. χ² with Monte Carlo simulated p-value (B = 2000), couple level.

| Outcome | Factor | χ²(sim.) test |
|---|---|---|
| Coordination Part 2 | Session gender | χ²(sim.): p = 1.000 ns |
| Coordination Part 1 | Session gender | χ²(sim.): p = 0.610 ns |
| Mutual Demand (D) Part 2 | Session gender | χ²(sim.): p = 1.000 ns |
| Mutual Demand (D) Part 1 | Session gender | χ²(sim.): p = 0.621 ns |
Describe the marginal distributions of choices and signals (Part 1 choice → signal sent → Part 2 choice), and examine how the opponent’s signal shapes Part 2 behaviour. All proportions use exact 95% Clopper-Pearson CIs.
Demand choice (D) in Part 1: 75.0%. The most frequent signal was D (65.6%). Demand choice in Part 2: 65.6%. In BS, neither Demand (D) nor Yield (T) is a dominant strategy: the optimal action depends entirely on what the opponent plays. A high D signal rate would reflect players trying to claim their preferred equilibrium via cheap talk, while a shift in D choice from Part 1 to Part 2 would indicate that pre-play communication influenced strategic behaviour.
tab_mcnemar_bs |>
gt::gt() |>
gt::tab_header(
title = "McNemar test — BS: choice1_D vs choice2_D",
subtitle = "Paired within-individual"
) |>
gt::cols_label(statistic = "χ²", p_value = "p-value", n = "n", note = "Note") |>
gt::fmt_number(columns = c(statistic, p_value), decimals = 4) |>
gt::tab_style(style = gt::cell_text(weight = "bold"),
locations = gt::cells_body(columns = p_value,
rows = !is.na(p_value) & p_value < 0.05)) |>
gt::tab_style(style = gt::cell_text(weight = "bold"),
locations = gt::cells_column_labels())

McNemar test — BS: choice1_D vs choice2_D. Paired within-individual.

| χ² | p-value | n | Note |
|---|---|---|---|
| 0.3080 | 0.5791 | 32 | OK |
A significant McNemar result (p < 0.05) would indicate that pre-play cheap talk systematically shifted individual Demand (D) choice rates; with p = 0.5791, no such shift is detected here. In BS, neither D nor T is a dominant strategy: each player’s best response is to play the opposite of the opponent’s action. Cheap talk can be credible here: a player who signals D (Demand) is announcing their intention to claim their own preferred equilibrium. If the opponent signals T (Yield), the resulting signal pair, D/T (P1 Preferred) or T/D (P2 Preferred), allows efficient coordination. Signals can therefore serve as commitment devices, helping couples resolve the equilibrium-selection problem.
Decision sequence. After sending their own signal and before making the Part 2 choice, each player observes the opponent’s signal. The Part 2 decision is taken with a two-dimensional information set: (own signal sent) × (opponent’s signal received). The four possible information sets are: T/T, T/D, D/T, D/D.
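The four information sets can be enumerated directly. The sketch below is illustrative only (the column names `own`, `opp` and `outcome_if_followed` are hypothetical) and adds the outcome that would result if both players simply played the action they signalled.

```r
signals <- c("T", "D")   # T = Yield, D = Demand

# One row per information set: own signal sent x opponent's signal received.
info_sets <- expand.grid(own = signals, opp = signals, stringsAsFactors = FALSE)
info_sets$label <- paste(info_sets$own, info_sets$opp, sep = "/")

# If both players follow through on their own signal, asymmetric signal pairs
# coordinate and symmetric pairs miscoordinate.
info_sets$outcome_if_followed <- ifelse(info_sets$own != info_sets$opp,
                                        "coordination", "miscoordination")
print(info_sets)
```

Only the two asymmetric sets (D/T and T/D) lead to coordination when signals are taken at face value.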
In the Battle of the Sexes, neither T (Yield) nor D (Demand) is a strictly dominant strategy — the game has two pure-strategy Nash equilibria: the P1 Preferred equilibrium (D,T) yielding (36, 12), and the P2 Preferred equilibrium (T,D) yielding (12, 36). Miscoordination ((D,D) or (T,T)) yields (0, 0) for both players. This structure makes cheap talk potentially credible: unlike PD where defection dominates regardless, in BS both players want to coordinate — they just disagree on which equilibrium. A player who signals D (Demand) is asserting their preference for the (D,T) equilibrium; if the opponent then yields (T), coordination is achieved. Signals are self-committing (following through after signalling D is rational if the opponent yields) and may function as credible announcements of intent. A large gap in Demand choice rates conditional on receiving D vs T from the opponent would confirm that cheap talk effectively resolves the equilibrium-selection problem in BS.
The critical test is the (D/D) information set, where both players have signalled Demand. If both players try to claim their preferred equilibrium, the outcome is the (D,D) miscoordination trap (0, 0 for both). Demand choice rates at (D/D) reflect the degree of strategic stubbornness when both parties refuse to yield. The (T/T) information set (both signalled Yield) represents mutual deference: if players follow through on their signals, Yield choice should be near 100% and the (T,T) miscoordination trap should dominate. The most informative cells are the asymmetric (D/T) and (T/D) information sets, where one player signals D and the other T. If signals are effective, the D-signaller should mostly Demand and the T-signaller should mostly Yield, producing the efficient (D,T) or (T,D) equilibria. A higher rate of Demand choice among players who received T than among those who received D is the clearest evidence of cheap talk effectiveness in BS.
tab_cond_sig_bs |>
gt::gt() |>
gt::tab_header(title = "BS — Choice and signal distributions by gender and role",
subtitle = "χ² with Monte Carlo simulated p-value (B = 2000)") |>
gt::cols_label(Outcome = "Outcome", Factor = "Factor", Test = "χ²(sim.) test") |>
gt::tab_style(style = gt::cell_text(weight = "bold"),
locations = gt::cells_column_labels()) |>
gt::opt_stylize(style = 1) |>
gt::tab_options(table.font.size = 13)

BS — Choice and signal distributions by gender and role. χ² with Monte Carlo simulated p-value (B = 2000).

| Outcome | Factor | χ²(sim.) test |
|---|---|---|
| Part 1 = T | Gender | χ²(sim.): p = 1.000 ns |
| Part 1 = T | Role | χ²(sim.): p = 0.675 ns |
| Signal = T | Gender | χ²(sim.): p = 0.456 ns |
| Signal = T | Role | χ²(sim.): p = 1.000 ns |
| Part 2 = T | Gender | χ²(sim.): p = 1.000 ns |
| Part 2 = T | Role | χ²(sim.): p = 0.472 ns |
Examine whether signals are honest (= same as the action eventually taken in Part 2) and consistent with Part 1 choices. Assess the prevalence of strategy switches between Part 1 and Part 2.
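The three binary indicators used in this section can be derived from the Part 1 choice, the signal and the Part 2 choice. The sketch below uses a hypothetical four-player toy data frame (not the experimental data), and the column names are assumptions rather than the names used in `code.R`.

```r
# Hypothetical toy records, one row per player (not the experimental data).
toy <- data.frame(
  choice1 = c("D", "D", "T", "D"),
  signal  = c("D", "T", "T", "D"),
  choice2 = c("D", "D", "T", "T")
)

toy$honest     <- as.integer(toy$signal == toy$choice2)   # followed through on the signal
toy$consistent <- as.integer(toy$signal == toy$choice1)   # signal matches Part 1 behaviour
toy$changed    <- as.integer(toy$choice1 != toy$choice2)  # switched between parts

# Share of players for which each indicator equals 1.
shares <- colMeans(toy[, c("honest", "consistent", "changed")])
print(shares)
```

Note that the three indicators are logically distinct: in the toy data the fourth player is consistent with Part 1 yet dishonest, because they signalled D and then yielded.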
tab_sec2_bs |>
gt::gt() |>
gt::tab_header(title = "BS — Signal honesty and consistency",
subtitle = "95% Clopper-Pearson CI. Each row is a binary indicator (1 = yes, 0 = no).") |>
gt::cols_label(variable = "Measure", n = "n (=1)", N = "N",
pct = "%", ci95 = "95% CI") |>
gt::tab_style(style = gt::cell_text(weight = "bold"),
locations = gt::cells_column_labels()) |>
gt::tab_footnote(
footnote = "1 if the signal sent equals the Part 2 choice (e.g. sent T and chose T in Part 2). Measures whether players followed through on their signal.",
locations = gt::cells_body(columns = variable, rows = 1)
) |>
gt::tab_footnote(
footnote = "1 if the signal sent equals the Part 1 choice (e.g. signalled T and had also chosen T in Part 1). Measures whether the signal reflects past behaviour — independent of Part 2.",
locations = gt::cells_body(columns = variable, rows = 2)
) |>
gt::tab_footnote(
footnote = "1 if the Part 2 choice differs from the Part 1 choice (choice1 \u2260 choice2). Measures switching behaviour across rounds, independent of the signal. N may differ from rows 1\u20132 due to missing values in different variables.",
locations = gt::cells_body(columns = variable, rows = 3)
) |>
gt::tab_options(table.font.size = 13)

BS — Signal honesty and consistency. 95% Clopper-Pearson CI. Each row is a binary indicator (1 = yes, 0 = no).

| Measure | n (=1) | N | % | 95% CI |
|---|---|---|---|---|
| Signal honest (signal = choice2)¹ | 20 | 32 | 62.5% | [43.7%, 78.9%] |
| Signal consistent with Part 1 (signal = choice1)² | 21 | 32 | 65.6% | [46.8%, 81.4%] |
| Strategy changed Part 1 → Part 2 (choice1 ≠ choice2)³ | 13 | 32 | 40.6% | [23.7%, 59.4%] |

¹ 1 if the signal sent equals the Part 2 choice (e.g. sent T and chose T in Part 2). Measures whether players followed through on their signal.
² 1 if the signal sent equals the Part 1 choice (e.g. signalled T and had also chosen T in Part 1). Measures whether the signal reflects past behaviour — independent of Part 2.
³ 1 if the Part 2 choice differs from the Part 1 choice (choice1 ≠ choice2). Measures switching behaviour across rounds, independent of the signal. N may differ from rows 1–2 due to missing values in different variables.
tab_binom_honest_bs |>
gt::gt() |>
gt::tab_header(title = "Binomial test: P(Honest) vs H₀ = 0.50",
subtitle = "Two-sided test; 95% Clopper-Pearson CI") |>
gt::cols_label(x = "n honest", n = "N", pct = "%", ci95 = "95% CI",
p_value = "p-value") |>
gt::tab_style(style = gt::cell_text(weight = "bold"),
locations = gt::cells_body(columns = p_value,
rows = p_value < 0.05)) |>
gt::tab_style(style = gt::cell_text(weight = "bold"),
locations = gt::cells_column_labels())

Binomial test: P(Honest) vs H₀ = 0.50. Two-sided test; 95% Clopper-Pearson CI.

| n honest | N | % | 95% CI | p-value |
|---|---|---|---|---|
| 20 | 32 | 62.5% | [43.7%, 78.9%] | 0.2153 |
In Battle of the Sexes, a signal is honest if the player’s Part 2 action matches what they signalled. Honesty in this game reflects commitment. A player who signals “Demand” and plays “Demand” is using cheap talk to credibly commit and force the opponent to yield. A player who signals “Demand” but then plays “Yield” is effectively “chickening out” of their threat. The aggregate honesty rate primarily captures the credibility of these strategic demands.
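The binomial test and its exact interval can be reproduced from the two reported counts alone. This is a minimal sketch with base R's `stats::binom.test`, which returns the Clopper-Pearson interval; whether `code.R` calls this exact function is an assumption.

```r
# 20 honest signals out of 32 players, tested against H0: P(honest) = 0.50.
res <- binom.test(x = 20, n = 32, p = 0.5, alternative = "two.sided")

p_value <- res$p.value
ci_pct  <- 100 * res$conf.int   # exact (Clopper-Pearson) 95% interval, in percent

round(p_value, 4)
round(ci_pct, 1)
```

Because the interval contains 50%, the observed 62.5% honesty rate is not distinguishable from coin-flip honesty at the 5% level.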
tab_cond_honest_bs |>
gt::gt() |>
gt::tab_header(title = "BS — Signal honesty and strategy change by gender and role",
subtitle = "χ² with Monte Carlo simulated p-value (B = 2000)") |>
gt::cols_label(Outcome = "Outcome", Factor = "Factor", Test = "χ²(sim.) test") |>
gt::tab_style(style = gt::cell_text(weight = "bold"),
locations = gt::cells_column_labels()) |>
gt::opt_stylize(style = 1) |>
gt::tab_options(table.font.size = 13)

BS — Signal honesty and strategy change by gender and role. χ² with Monte Carlo simulated p-value (B = 2000).

| Outcome | Factor | χ²(sim.) test |
|---|---|---|
| Signal honest | Gender | χ²(sim.): p = 0.715 ns |
| Signal honest | Role | χ²(sim.): p = 0.720 ns |
| Strategy changed | Gender | χ²(sim.): p = 0.477 ns |
| Strategy changed | Role | χ²(sim.): p = 0.478 ns |
After Part 2, each player answered two incentivised questions about their beliefs:
Belief 1 — First-order belief: “What do you think your opponent chose in Part 2?” (T or D). Scored correct (GT_right_guess1 = 1) if the player’s prediction matched the opponent’s actual Part 2 choice. Bonus: +2€ if correct.
Belief 2 — Second-order belief: “What do you think your opponent believes you chose in Part 2?” (T or D). Scored correct (GT_right_guess2 = 1) if the player correctly identified what the opponent believed about the player’s own choice. Bonus: +2€ if correct.
The belief accuracy score = GT_right_guess1 + GT_right_guess2 ∈ {0, 1, 2}. The belief bonus = score × 2€ ∈ {0€, 2€, 4€}.
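The scoring rule above amounts to a sum and a multiplication. The sketch below applies it to three hypothetical players; the indicator values are made up for illustration, and only the variable names `GT_right_guess1` and `GT_right_guess2` come from the text.

```r
# Hypothetical indicators for three players (1 = correct guess, 0 = wrong).
GT_right_guess1 <- c(1, 0, 1)   # first-order belief correct?
GT_right_guess2 <- c(1, 0, 0)   # second-order belief correct?

belief_score <- GT_right_guess1 + GT_right_guess2   # takes values in {0, 1, 2}
belief_bonus <- 2 * belief_score                    # euros, takes values in {0, 2, 4}

data.frame(GT_right_guess1, GT_right_guess2, belief_score, belief_bonus)
```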
Describe the distribution of belief accuracy scores (0 = both beliefs wrong; 1 = one correct; 2 = both correct) and the associated belief bonus payoff. Assess whether belief accuracy is correlated with coordination outcomes at the couple level.
BS — Belief accuracy: hypothesis comparison. H1–H2: individual level. H3: couple level. Stat = Spearman ρ or φ (phi).

| | Level | Hypothesis | X | Y | Test | Expected | Stat | p | n |
|---|---|---|---|---|---|---|---|---|---|
| H1 | Individual | Reflective thinkers (high CRT) predict opponent’s choice more accurately | CRT score (0–4) | Belief accuracy (0–2) | Spearman ρ | Positive | 0.169 | 0.354 | 32 |
| H2 | Individual | Players who made more quiz errors have less accurate beliefs | Quiz errors [log(1+x)] | Belief accuracy (0–2) | Spearman ρ | Negative | 0.099 | 0.591 | 32 |
| H3 | Couple | Couples where both players have perfect beliefs coordinate more in Part 2 | Coord. Part 2 (0/1) | Both perfect beliefs (0/1) | Fisher exact + φ¹ | Positive | 0.424 | 0.213 | 16 |

¹ H3 stat = phi coefficient (φ); p-value from two-sided Fisher exact test on 2×2 contingency table.
H1 tests whether more reflective players (higher CRT) are better at predicting their opponent — if strategic reasoning drives belief formation, a positive Spearman ρ is expected. H2 tests whether players who struggled with game comprehension (more quiz errors) hold less accurate beliefs — expected direction is negative. H3 tests whether couples where both players had perfect beliefs were more likely to coordinate in Part 2 — the only couple-level hypothesis; uses Fisher exact given small N and binary outcomes. Note that H1 and H2 operate at the individual level while H3 is at the couple level; they are not directly comparable.
Estimate an ordered logit (proportional-odds model) for the belief accuracy score (0 = both wrong, 1 = one correct, 2 = both correct) using Theory of Mind (MASC) and IRI subscales as predictors. The proportional-odds assumption implies a single log-odds shift per unit increase in each predictor, shared across both thresholds (0→1 and 1→2).
With n = 32 participants (score 0: n=5; 1: n=16; 2: n=11), EPV is computed as min(n₀, n₂) / k: M1 = 5, M2 = 2.5, M3 = 1.7. All are well below the recommended 10. All results are exploratory and should be treated as hypothesis-generating.
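The EPV figures follow directly from the outcome counts and the number of predictors per model. A minimal sketch of that arithmetic, using the counts stated above (object names are illustrative):

```r
# Participants per belief accuracy score, as reported above.
n_by_score <- c(`0` = 5, `1` = 16, `2` = 11)

# Number of predictors in each model.
k <- c(M1 = 1, M2 = 2, M3 = 3)

# EPV as defined in the text: smallest extreme category divided by predictor count.
epv <- min(n_by_score[c("0", "2")]) / k
round(epv, 1)
```

The binding constraint is the smallest extreme category (score 0, n = 5), so adding predictors reduces EPV proportionally.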
\[ \begin{aligned} \text{M1:} \quad & \text{logit}\,P(Y \le j) = \alpha_j - \beta_1\,\text{MASC}_z \\ \text{M2:} \quad & \text{logit}\,P(Y \le j) = \alpha_j - \beta_1\,\text{MASC}_z - \beta_2\,\text{IRI-PT}_z \\ \text{M3:} \quad & \text{logit}\,P(Y \le j) = \alpha_j - \beta_1\,\text{MASC}_z - \beta_2\,\text{IRI-PT}_z - \beta_3\,\text{IRI-PD}_z \end{aligned} \]
MASC = Theory of Mind total score; IRI-PT = Perspective Taking subscale (cognitive empathy — most directly linked to predicting opponents’ decisions); IRI-PD = Personal Distress subscale (self-oriented distress — may impair strategic prediction). All scores z-standardised. OR > 1 shifts probability towards higher belief accuracy.
BS — Ordered logit: determinants of belief accuracy. DV = belief accuracy score (0/1/2, ordered). n = 32 (score 0: n=5; 1: n=16; 2: n=11). MASC/IRI z-scored. OR > 1 increases probability of higher accuracy.¹

| Predictor | β | SE | t | p | OR | OR 2.5% | OR 97.5% |
|---|---|---|---|---|---|---|---|
| M1: MASC only | | | | | | | |
| MASC ToM score (z) | 0.316 | 0.349 | 0.904 | 0.3661 | 1.371 | 0.691 | 2.719 |
| M2: MASC + IRI-PT | | | | | | | |
| MASC ToM score (z) | 0.269 | 0.359 | 0.751 | 0.4529 | 1.309 | 0.648 | 2.695 |
| IRI Perspective Taking (z) | -0.283 | 0.365 | -0.775 | 0.4385 | 0.754 | 0.361 | 1.544 |
| M3: MASC + IRI-PT + IRI-PD | | | | | | | |
| MASC ToM score (z) | 0.266 | 0.360 | 0.740 | 0.4595 | 1.305 | 0.644 | 2.690 |
| IRI Perspective Taking (z) | -0.276 | 0.369 | -0.750 | 0.4531 | 0.758 | 0.361 | 1.562 |
| IRI Personal Distress (z) | 0.044 | 0.342 | 0.129 | 0.8977 | 1.045 | 0.527 | 2.050 |

¹ Ordered logit (proportional-odds, MASS::polr). p-values from two-tailed z-test on t-statistic. CI from profile likelihood where convergent, otherwise Wald. All models MLE; EPV < 10, so interpret cautiously.
MASC ToM (M1): OR = 1.371, p = 0.3661; the estimate is similar in M2 and M3. Higher Theory of Mind ability may improve belief accuracy by enabling better prediction of opponents’ decisions, and an OR > 1 is consistent with this interpretation, although the effect is not significant. IRI Perspective Taking (M2): OR = 0.754, p = 0.4385. Cognitive empathy is directly relevant to inferring opponents’ intended strategies, so an OR > 1 would support the link between perspective-taking and prediction accuracy; the estimate is below 1 and not significant. IRI Personal Distress (M3): OR = 1.045, p = 0.8977. Self-oriented distress may interfere with accurate belief formation (OR < 1 expected), but the estimate is essentially null. Given EPV ≤ 5 across all models, all estimates carry substantial uncertainty.
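For readers who want to see how the OR and p-value columns relate to β, SE and t, the sketch below recomputes them for the MASC row of M1 from the rounded values in the table. It uses a Wald interval as an approximation; the table itself reports profile-likelihood limits where available, so the interval bounds are expected to differ slightly.

```r
# MASC row of M1, as reported (rounded) in the ordered logit table.
beta   <- 0.316
se     <- 0.349
t_stat <- 0.904

or      <- exp(beta)                                  # odds ratio
wald_ci <- exp(beta + c(-1, 1) * qnorm(0.975) * se)   # approximate 95% Wald CI on the OR scale
p_val   <- 2 * pnorm(-abs(t_stat))                    # two-tailed z-test on the t-statistic

round(c(OR = or, lower = wald_ci[1], upper = wald_ci[2], p = p_val), 3)
```

The wide interval, which spans values well below and above 1, is what the low EPV would lead one to expect.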
This analysis restricts the sample to participants who received a D signal (Demand) from their opponent (opp_signal_received = D). The dependent variable is whether they themselves also chose D (Demand), leading to the (D,D) miscoordination outcome (0, 0). Given the small subsample size, EPV may be below 10 and Firth penalised logit is applied automatically.
\[ \begin{aligned} \text{M1:} \quad & \text{logit}\,P(D) = \beta_0 + \beta_1\,\text{quiz\_err} \\ \text{M2:} \quad & \text{logit}\,P(D) = \beta_0 + \beta_1\,\text{quiz\_err} + \beta_2\,\text{CRT} \\ \text{M3:} \quad & \text{logit}\,P(D) = \beta_0 + \beta_1\,\text{CRT} \end{aligned} \]
Sample: opp_signal_received = D only. n = 21, events = 12. EPV: M1 = 12, M2 = 6, M3 = 12. All estimated with Firth penalised logit.
BS — Demand (D) when opponent signals D. DV = choice2=D | opp_signal=D. n=21, events=12. EPV: M1=12, M2=6, M3=12. All Firth penalised logit (brglm2).¹

| Predictor | β | SE | z | p | OR | OR 2.5% | OR 97.5% |
|---|---|---|---|---|---|---|---|
| M1: Quiz only | | | | | | | |
| Quiz errors [log(1+x)] | -0.112 | 0.523 | -0.21 | 0.831 | 0.89 | 0.30 | 2.64 |
| M2: Quiz + CRT | | | | | | | |
| Quiz errors [log(1+x)] | -0.133 | 0.537 | -0.25 | 0.804 | 0.88 | 0.31 | 2.51 |
| CRT score (0–4) | -0.203 | 0.483 | -0.42 | 0.674 | 0.82 | 0.32 | 2.10 |
| M3: CRT only | | | | | | | |
| CRT score (0–4) | -0.250 | 0.477 | -0.52 | 0.600 | 0.78 | 0.28 | 1.95 |

¹ Firth penalised logit used for all models (EPV < 10). OR > 1 → increases P(Demand (D) | opp signals D). 95% Wald CI.
Quiz errors (OR = 0.88, p = 0.804): a higher error rate on the comprehension quiz may reflect lower understanding of the game’s conflict structure, potentially increasing stubborn Demand even when the opponent has already signalled Demand (risking (D,D) miscoordination). CRT score (OR = 0.82, p = 0.674): more reflective thinkers may better recognise that when the opponent signals D (Demand), the individually rational response to avoid (0,0) is to Yield (T) and accept the opponent’s preferred equilibrium (earning 12 instead of 36) rather than insist on Demand and risk mutual miscoordination. An OR below 1 for CRT (a negative β) would be consistent with this interpretation; the estimate points in that direction but is not significant. EPV = 6 in M2, so estimates are exploratory and should be interpreted with caution.
---
title: "BS — Battle of the Sexes"
subtitle: "GT Behaviour · GTEMO Experiment"
author: "Eric Guerci"
date: today
format:
html:
theme: flatly
toc: true
toc-depth: 2
toc-title: "Contents"
number-sections: false
code-fold: true
code-summary: "Show code"
code-tools: true
fig-width: 9
fig-height: 4
fig-dpi: 150
smooth-scroll: true
embed-resources: true
execute:
echo: true
warning: false
message: false
---
```{r setup}
#| include: false
source("code.R")
```
::: {.callout-tip icon="false"}
##### Payoff matrix — Row payoff, Column payoff
| | **L / Yield** (col) | **R / Demand** (col) |
|:---:|:---:|:---:|
| **T / Yield** (row) | 0, 0 | **12, 36** ♦ P2 Pref |
| **D / Demand** (row) | **36, 12** ★ P1 Pref | 0, 0 |
In Battle of the Sexes, there is an asymmetric conflict of interest. Both players want to coordinate, but on different equilibria. Action "D" corresponds to demanding one's preferred equilibrium, while "T" corresponds to yielding to the other's preferred equilibrium.
:::
---
```{=html}
<details open>
<summary><strong>1 — Equilibria & coordination</strong></summary>
<div style="padding: 0.75em 0.5em 0.5em 0.5em;">
```
### Objective
Describe the **equilibrium outcomes** reached by each couple in Part 1 and Part 2, and test whether cheap talk shifted coordination rates using paired McNemar tests.
::: {.callout-tip icon="false"}
##### Equilibrium labels (BS)
- **P1 Preferred (D,L)** — Coordinated; P1 gets 36, P2 gets 12 ★
- **P2 Preferred (T,R)** — Coordinated; P1 gets 12, P2 gets 36 ♦
- **Both Demand (D,R)** — Miscoordinated; both get 0
- **Both Yield (T,L)** — Miscoordinated; both get 0
:::
### Equilibrium distributions
```{r}
#| label: fig-eq-bs
#| fig-cap: "BS — Equilibrium distributions in Part 1 (top) and Part 2 (bottom)."
#| fig-height: 8
#| fig-width: 8
p_eq_bs
```
### Coordination rates: Part 1 vs Part 2
:::: {layout="[1,1]" layout-valign="top"}
```{r}
#| label: tab-coord-rates-bs
coord_bs |>
dplyr::select(part, x, n, pct, ci95) |>
gt::gt() |>
gt::tab_header(title = "BS — Coordination rates: Part 1 vs Part 2",
subtitle = "95% Clopper-Pearson CI") |>
gt::cols_label(part = "Phase", x = "n coordinated", n = "N",
pct = "%", ci95 = "95% CI") |>
gt::tab_style(style = gt::cell_text(weight = "bold"),
locations = gt::cells_column_labels())
```
```{r}
#| label: fig-coord-bs
#| fig-cap: "BS — Coordination rate in Part 1 vs Part 2 (couple level). 95% Clopper-Pearson CI."
#| fig-height: 4
#| fig-width: 4
p_coord_bs
```
::::
### McNemar tests
```{r}
#| label: tab-mc-bs
tab_mc_bs |>
gt::gt() |>
gt::tab_header(title = "McNemar tests — BS (couple level)",
subtitle = "Paired Part 1 vs Part 2") |>
gt::cols_label(label = "Test", statistic = "χ²", p_value = "p-value",
note = "Note") |>
gt::fmt_number(columns = c(statistic, p_value), decimals = 4, rows = !is.na(statistic)) |>
gt::tab_style(style = gt::cell_text(weight = "bold"),
locations = gt::cells_body(columns = p_value,
rows = !is.na(p_value) & p_value < 0.05)) |>
gt::tab_style(style = gt::cell_text(weight = "bold"),
locations = gt::cells_column_labels())
```
```{r}
#| echo: false
coop1_pct <- scales::percent(coop_rates_bs$coop1, accuracy = 0.1)
coop2_pct <- scales::percent(coop_rates_bs$coop2, accuracy = 0.1)
```
::: callout-note
Coordination rate in Part 1: **`r coop1_pct`** of pairs → Part 2: **`r coop2_pct`**. In BS, coordination means landing on either the P1 Preferred (D,T) or P2 Preferred (T,D) equilibrium; in both, each player earns a positive payoff (36 for the favoured player, 12 for the other). A significant McNemar result would indicate that cheap talk systematically shifted couples towards coordinated outcomes and away from miscoordination ((D,D) or (T,T), both yielding 0€ for each player).
:::
### Conditioning on session gender
:::: {layout="[1,1]" layout-valign="top"}
```{r}
#| label: tab-cond-coord-bs
tab_cond_coord_bs |>
gt::gt() |>
gt::tab_header(title = "BS — Coordination and Demand choice by session gender",
subtitle = "χ² with Monte Carlo simulated p-value (B = 2000), couple level") |>
gt::cols_label(Outcome = "Outcome", Factor = "Factor", Test = "χ²(sim.) test") |>
gt::tab_style(style = gt::cell_text(weight = "bold"),
locations = gt::cells_column_labels()) |>
gt::opt_stylize(style = 1) |>
gt::tab_options(table.font.size = 13)
```
```{r}
#| label: fig-coord-gender-bs
#| fig-cap: "BS — P(Coordination Part 2) by session gender (couple level). Error bars = 95% Clopper-Pearson CI."
#| fig-height: 4
#| fig-width: 4
p_coord_gender_bs
```
::::
```{=html}
</div>
</details>
```
---
```{=html}
<details>
<summary><strong>2 — Choice & signal distributions</strong></summary>
<div style="padding: 0.75em 0.5em 0.5em 0.5em;">
```
### Objective
Describe the **marginal distributions of choices and signals** (Part 1 choice → signal sent → Part 2 choice), and examine how the opponent's signal shapes Part 2 behaviour. All proportions use exact 95% Clopper-Pearson CIs.
### Distributions table
```{r}
#| label: tab-bs-dist
tab_bs
```
### Proportions by phase
```{r}
#| label: fig-bs-dist
#| fig-cap: "BS — Proportions of each choice/signal type with 95% Clopper-Pearson CIs. Left: Part 1 choices; centre: signals sent; right: Part 2 choices."
#| fig-height: 4
p_bs_dist
```
::: {.callout-note icon="false"}
##### Part 1 → Part 2 snapshot
```{r}
#| echo: false
bs_t1 <- tab_bs_dist_long |> dplyr::filter(phase == "Part 1", level == "D") |> dplyr::pull(pct)
bs_t2 <- tab_bs_dist_long |> dplyr::filter(phase == "Part 2", level == "D") |> dplyr::pull(pct)
bs_sig <- tab_bs_dist_long |> dplyr::filter(phase == "Signal") |> dplyr::arrange(dplyr::desc(n))
bs_t1 <- if (length(bs_t1) == 0) "—" else bs_t1
bs_t2 <- if (length(bs_t2) == 0) "—" else bs_t2
```
Demand choice (D) in Part 1: **`r bs_t1`**. The most frequent signal was **`r bs_sig$level[1]`** (`r bs_sig$pct[1]`). Demand choice in Part 2: **`r bs_t2`**. In BS, neither Demand (D) nor Yield (T) is a dominant strategy: the optimal action depends entirely on what the opponent plays. A high D signal rate would reflect players trying to claim their preferred equilibrium via cheap talk, while a shift in D choice from Part 1 to Part 2 would indicate that pre-play communication influenced strategic behaviour.
:::
### Within-subject shift: McNemar test (Part 1 vs Part 2)
```{r}
#| label: tab-mcnemar-bs
tab_mcnemar_bs |>
gt::gt() |>
gt::tab_header(
title = "McNemar test — BS: choice1_D vs choice2_D",
subtitle = "Paired within-individual"
) |>
gt::cols_label(statistic = "χ²", p_value = "p-value", n = "n", note = "Note") |>
gt::fmt_number(columns = c(statistic, p_value), decimals = 4) |>
gt::tab_style(style = gt::cell_text(weight = "bold"),
locations = gt::cells_body(columns = p_value,
rows = !is.na(p_value) & p_value < 0.05)) |>
gt::tab_style(style = gt::cell_text(weight = "bold"),
locations = gt::cells_column_labels())
```
::: callout-note
A significant McNemar result (p < 0.05) would indicate that pre-play cheap talk systematically shifted individual Demand (D) choice rates. In BS, neither D nor T is a dominant strategy: each player's best response is to play the opposite of the opponent's action. Cheap talk can be credible here: a player who signals D (Demand) is announcing their intention to claim their own preferred equilibrium. If the opponent signals T (Yield), the resulting signal pair, D/T (P1 Preferred) or T/D (P2 Preferred), allows efficient coordination. Signals can therefore serve as commitment devices, helping couples resolve the equilibrium-selection problem.
:::
### Opponent signal and the information set before Part 2
::: callout-important
**Decision sequence.** After sending their own signal and *before* making the Part 2 choice, each player observes the **opponent's signal**. The Part 2 decision is taken with a two-dimensional information set: **(own signal sent) × (opponent's signal received)**. The four possible information sets are: T/T, T/D, D/T, D/D.
:::
```{r}
#| label: fig-sig-heatmap-bs
#| fig-cap: "BS — Joint distribution of own signal × opponent signal received. Values show count and share."
#| fig-height: 4
#| fig-width: 7
p_sig_heatmap_bs
```
```{r}
#| label: fig-choice2-oppsig-bs
#| fig-cap: "BS — P(choice₂ = D, Demand) stratified by opponent's signal. 95% Clopper-Pearson CI."
#| fig-height: 4
#| fig-width: 6
p_choice2_by_oppsig_bs
```
::: {.callout-note icon="false"}
##### Cheap talk in BS: signal credibly shifts Demand choice
In the Battle of the Sexes, neither T (Yield) nor D (Demand) is a strictly dominant strategy — the game has two pure-strategy Nash equilibria: the P1 Preferred equilibrium (D,T) yielding (36, 12), and the P2 Preferred equilibrium (T,D) yielding (12, 36). Miscoordination ((D,D) or (T,T)) yields (0, 0) for both players. This structure makes cheap talk potentially credible: unlike PD where defection dominates regardless, in BS both players want to coordinate — they just disagree on *which* equilibrium. A player who signals D (Demand) is asserting their preference for the (D,T) equilibrium; if the opponent then yields (T), coordination is achieved. Signals are **self-committing** (following through after signalling D is rational if the opponent yields) and may function as credible announcements of intent. A large gap in Demand choice rates conditional on receiving D vs T from the opponent would confirm that cheap talk effectively resolves the equilibrium-selection problem in BS.
:::
```{r}
#| label: fig-choice2-infoset-bs
#| fig-cap: "BS — P(choice₂ = D, Demand) by full information set (own/opp). Colour = own signal. 95% CI. Information sets with no observations are omitted."
#| fig-height: 5
#| fig-width: 9
p_choice2_infoset_bs
```
::: {.callout-note icon="false"}
##### BS interpretation
The critical test is the **(D/D) information set**, where both players have signalled Demand. If both players try to claim their preferred equilibrium, the outcome is the (D,D) miscoordination trap (0, 0 for both). Demand choice rates at (D/D) reflect the degree of strategic stubbornness when both parties refuse to yield. The **(T/T) information set** (both signalled Yield) represents mutual deference: if players follow through on their signals, Yield choice should be near 100% and the (T,T) miscoordination trap should dominate. The most informative cells are the asymmetric **(D/T) and (T/D) information sets**, where one player signals D and the other T. If signals are effective, the D-signaller should mostly Demand and the T-signaller should mostly Yield, producing the efficient (D,T) or (T,D) equilibria. A higher rate of Demand choice among players who received T than among those who received D is the clearest evidence of cheap talk effectiveness in BS.
:::
```{r}
#| label: fig-follow-opp-bs
#| fig-cap: "BS — Proportion of players whose Part 2 choice matches the opponent's signal received."
#| fig-height: 4
#| fig-width: 5
p_follow_opp_bs
```
### Conditioning on gender and role
```{r}
#| label: tab-cond-sig-bs
tab_cond_sig_bs |>
gt::gt() |>
gt::tab_header(title = "BS — Choice and signal distributions by gender and role",
subtitle = "χ² with Monte Carlo simulated p-value (B = 2000)") |>
gt::cols_label(Outcome = "Outcome", Factor = "Factor", Test = "χ²(sim.) test") |>
gt::tab_style(style = gt::cell_text(weight = "bold"),
locations = gt::cells_column_labels()) |>
gt::opt_stylize(style = 1) |>
gt::tab_options(table.font.size = 13)
```
```{r}
#| label: fig-cond-sig-bs
#| fig-cap: "BS — P(Signal = T) by gender and role. Error bars = 95% Clopper-Pearson CI. Dashed line = 50%."
#| fig-height: 4
#| fig-width: 9
p_cond_sig_bs
```
```{=html}
</div>
</details>
```
---
```{=html}
<details>
<summary><strong>3 — Signal honesty & consistency</strong></summary>
<div style="padding: 0.75em 0.5em 0.5em 0.5em;">
```
### Objective
Examine whether signals are **honest** (the signal matches the action taken in Part 2) and **consistent** (the signal matches the Part 1 choice). Assess the prevalence of strategy switches between Part 1 and Part 2.
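As a minimal sketch of the three binary indicators, assuming hypothetical columns `choice1`, `signal` and `choice2` coded "T"/"D" (the toy rows are invented, not experimental data):

```r
# Hypothetical toy data: one row per player.
toy <- data.frame(
  choice1 = c("T", "D", "D", "T"),
  signal  = c("T", "D", "T", "D"),
  choice2 = c("T", "T", "T", "D")
)

toy$honest     <- as.integer(toy$signal == toy$choice2)   # followed through on the signal
toy$consistent <- as.integer(toy$signal == toy$choice1)   # signal matches past behaviour
toy$switched   <- as.integer(toy$choice1 != toy$choice2)  # changed action between parts

# Proportion of 1s for each indicator.
colMeans(toy[, c("honest", "consistent", "switched")])
```

Note that the three indicators are logically distinct: a player can be honest without being consistent, for example by signalling a new action and then playing it.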
### Honesty and consistency proportions
```{r}
#| label: tab-sec2-bs
tab_sec2_bs |>
gt::gt() |>
gt::tab_header(title = "BS — Signal honesty and consistency",
subtitle = "95% Clopper-Pearson CI. Each row is a binary indicator (1 = yes, 0 = no).") |>
gt::cols_label(variable = "Measure", n = "n (=1)", N = "N",
pct = "%", ci95 = "95% CI") |>
gt::tab_style(style = gt::cell_text(weight = "bold"),
locations = gt::cells_column_labels()) |>
gt::tab_footnote(
footnote = "1 if the signal sent equals the Part 2 choice (e.g. sent T and chose T in Part 2). Measures whether players followed through on their signal.",
locations = gt::cells_body(columns = variable, rows = 1)
) |>
gt::tab_footnote(
footnote = "1 if the signal sent equals the Part 1 choice (e.g. signalled T and had also chosen T in Part 1). Measures whether the signal reflects past behaviour — independent of Part 2.",
locations = gt::cells_body(columns = variable, rows = 2)
) |>
gt::tab_footnote(
footnote = "1 if the Part 2 choice differs from the Part 1 choice (choice1 \u2260 choice2). Measures switching behaviour between Part 1 and Part 2, independent of the signal. N may differ from rows 1\u20132 due to missing values in different variables.",
locations = gt::cells_body(columns = variable, rows = 3)
) |>
gt::tab_options(table.font.size = 13)
```
### Binomial test: signal honesty vs 50%
```{r}
#| label: tab-binom-honest-bs
tab_binom_honest_bs |>
gt::gt() |>
gt::tab_header(title = "Binomial test: P(Honest) vs H₀ = 0.50",
subtitle = "Two-sided test; 95% Clopper-Pearson CI") |>
gt::cols_label(x = "n honest", n = "N", pct = "%", ci95 = "95% CI",
p_value = "p-value") |>
gt::tab_style(style = gt::cell_text(weight = "bold"),
locations = gt::cells_body(columns = p_value,
rows = !is.na(p_value) & p_value < 0.05)) |>
gt::tab_style(style = gt::cell_text(weight = "bold"),
locations = gt::cells_column_labels())
```
::: callout-note
In Battle of the Sexes, a signal is *honest* if the player's Part 2 action matches what they signalled. Honesty in this game reflects commitment. A player who signals "Demand" and plays "Demand" is using cheap talk to credibly commit and force the opponent to yield. A player who signals "Demand" but then plays "Yield" is effectively "chickening out" of their threat. The aggregate honesty rate primarily captures the credibility of these strategic demands.
:::
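The two-sided binomial test against 0.50 with a Clopper-Pearson interval can be obtained from base R's `binom.test()`, whose confidence interval is the exact (Clopper-Pearson) one. The sketch below does not use the honesty counts, which are not reproduced here; it borrows 6 successes out of 16 from the Part 1 row of the coordination table earlier in the report purely as a worked number.

```r
x <- 6   # illustrative count of successes (Part 1 coordination row)
n <- 16  # illustrative number of trials

bt <- binom.test(x, n, p = 0.5, alternative = "two.sided")

bt$estimate                    # observed proportion
round(100 * bt$conf.int, 1)    # exact 95% CI in percent
bt$p.value                     # two-sided p-value against H0 = 0.50
```

With these inputs the interval should agree with the [15.2%, 64.6%] reported for Part 1 in the coordination table, which is a quick way to confirm that the same interval method is in use.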
### Strategic transition heatmaps
```{r}
#| label: fig-sankey-bs
#| fig-cap: "BS — Strategic transitions. Left: Part 1 → Signal (row %: conditional on Part 1 choice). Right: Signal → Part 2 (row %: conditional on signal sent)."
#| fig-height: 4
#| fig-width: 11
p_sankey_bs
```
### Conditioning on gender and role
```{r}
#| label: tab-cond-honest-bs
tab_cond_honest_bs |>
gt::gt() |>
gt::tab_header(title = "BS — Signal honesty and strategy change by gender and role",
subtitle = "χ² with Monte Carlo simulated p-value (B = 2000)") |>
gt::cols_label(Outcome = "Outcome", Factor = "Factor", Test = "χ²(sim.) test") |>
gt::tab_style(style = gt::cell_text(weight = "bold"),
locations = gt::cells_column_labels()) |>
gt::opt_stylize(style = 1) |>
gt::tab_options(table.font.size = 13)
```
```{r}
#| label: fig-cond-honest-bs
#| fig-cap: "BS — P(Honest) by gender (left) and role (right). Error bars = 95% Clopper-Pearson CI. Dashed line = 50%."
#| fig-height: 4
#| fig-width: 9
p_cond_honest_bs
```
```{=html}
</div>
</details>
```
---
```{=html}
<details>
<summary><strong>4 — Belief accuracy & bonus</strong></summary>
<div style="padding: 0.75em 0.5em 0.5em 0.5em;">
```
::: {.callout-note icon="false"}
##### The two belief questions
After Part 2, each player answered two incentivised questions about their beliefs:
**Belief 1 — First-order belief:** *"What do you think your opponent chose in Part 2?"* (T or D).
Scored correct (`GT_right_guess1 = 1`) if the player's prediction matched the opponent's actual Part 2 choice. Bonus: +2€ if correct.
**Belief 2 — Second-order belief:** *"What do you think your opponent believes you chose in Part 2?"* (T or D).
Scored correct (`GT_right_guess2 = 1`) if the player correctly identified what the opponent believed about the player's own choice. Bonus: +2€ if correct.
The **belief accuracy score** = `GT_right_guess1 + GT_right_guess2` ∈ {0, 1, 2}. The **belief bonus** = score × 2€ ∈ {0€, 2€, 4€}.
:::
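The scoring rule above is simple arithmetic; a minimal sketch on invented rows follows. `GT_right_guess1` and `GT_right_guess2` are the indicator names given in the callout, while `belief_score` and `belief_bonus` are hypothetical column names chosen for illustration.

```r
# Hypothetical toy data: four players with 0/1 correctness indicators.
toy <- data.frame(
  GT_right_guess1 = c(1, 0, 1, 0),
  GT_right_guess2 = c(1, 1, 0, 0)
)

toy$belief_score <- toy$GT_right_guess1 + toy$GT_right_guess2  # 0, 1 or 2
toy$belief_bonus <- 2 * toy$belief_score                        # euros: 0, 2 or 4

# Frequency of each score, keeping empty levels visible.
table(factor(toy$belief_score, levels = 0:2))
```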
### Objective
Describe the distribution of **belief accuracy scores** (0 = both beliefs wrong; 1 = one correct; 2 = both correct) and the associated **belief bonus** payoff. Assess whether belief accuracy is correlated with coordination outcomes at the couple level.
### Belief accuracy distribution
```{r}
#| label: fig-belief-bar-bs
#| fig-cap: "BS — Distribution of belief accuracy scores."
#| fig-height: 5
#| fig-width: 7
p_belief_bar_bs
```
### Hypothesis tests: beliefs, cognitive ability & coordination
```{r}
#| label: tab-belief-hyp-bs
#| echo: false
tab_belief_hyp_gt
```
::: callout-note
**H1** tests whether more reflective players (higher CRT) are better at predicting their opponent — if strategic reasoning drives belief formation, a positive Spearman ρ is expected. **H2** tests whether players who struggled with game comprehension (more quiz errors) hold less accurate beliefs — expected direction is negative. **H3** tests whether couples where *both* players had perfect beliefs were more likely to coordinate in Part 2 — the only couple-level hypothesis; uses Fisher exact given small N and binary outcomes. Note that H1 and H2 operate at the individual level while H3 is at the couple level; they are not directly comparable.
:::
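The tests named in the note map onto base R functions as sketched below. All values are invented and the variable names are hypothetical; the construction of `tab_belief_hyp_gt` is not shown in this section, so this is an illustration of the test types rather than the report's actual code.

```r
# H1-style test: Spearman rank correlation between CRT and belief accuracy.
crt   <- c(0, 1, 2, 3, 4, 2, 1, 3)   # hypothetical CRT scores
score <- c(0, 1, 1, 2, 2, 1, 0, 2)   # hypothetical belief accuracy scores
h1 <- cor.test(crt, score, method = "spearman", exact = FALSE)  # ties present
h1$estimate

# H3-style test: Fisher exact test on a couple-level 2 x 2 table.
tab <- matrix(c(4, 1,
                2, 5),
              nrow = 2, byrow = TRUE,
              dimnames = list(both_correct = c("yes", "no"),
                              coordinated  = c("yes", "no")))
h3 <- fisher.test(tab)
h3$p.value
```

`exact = FALSE` is used for the Spearman test because tied ranks rule out an exact p-value.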
### Conditioning on gender and role
```{r}
#| label: fig-cond-belief-bs
#| fig-cap: "BS — Belief accuracy score (0/1/2) by gender (left) and role (right). Bars show proportion within each group; labels show % and count. Score 0 = both beliefs wrong, 1 = one correct, 2 = both correct."
#| fig-height: 4
#| fig-width: 10
p_cond_belief_bs
```
```{=html}
</div>
</details>
```
---
```{=html}
<details>
<summary><strong>5 — Econometric models</strong></summary>
<div style="padding: 0.75em 0.5em 0.5em 0.5em;">
```
---
### 5.1 — Determinants of belief accuracy
Estimate an **ordered logit (proportional-odds model)** for the belief accuracy score (0 = both wrong, 1 = one correct, 2 = both correct) using Theory of Mind (MASC) and IRI subscales as predictors. The proportional-odds assumption implies a single log-odds shift per unit increase in each predictor, shared across both thresholds (0→1 and 1→2).
::: {.callout-warning icon="true"}
##### Small-sample caveat
With n = `r n_olog` participants (score 0: n=`r n_olog_0`; 1: n=`r n_olog_1`; 2: n=`r n_olog_2`), events per variable (EPV) is computed as min(n₀, n₂) / k, where k is the number of predictors in the model: M1 = `r epv_B1`, M2 = `r epv_B2`, M3 = `r epv_B3`. All are well below the recommended 10. **All results are exploratory and should be treated as hypothesis-generating.**
:::
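The EPV rule stated in the caveat, min(n₀, n₂) / k, can be illustrated with a short sketch. The category counts below are hypothetical placeholders, not the study's counts, and `epv()` is a helper defined only for this example.

```r
# EPV as defined above: smaller extreme category divided by number of predictors.
epv <- function(n_low, n_high, k) min(n_low, n_high) / k

# Hypothetical counts for scores 0, 1 and 2.
counts <- c(`0` = 6, `1` = 14, `2` = 12)

# M1, M2 and M3 use k = 1, 2 and 3 predictors respectively.
epv_by_model <- sapply(1:3, function(k) epv(counts[["0"]], counts[["2"]], k))
names(epv_by_model) <- c("M1", "M2", "M3")
epv_by_model
```

Because the numerator is fixed by the data, EPV can only fall as predictors are added, so M1 always has the largest EPV of the three models.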
#### Model specifications
$$
\begin{aligned}
\text{M1:} \quad & \text{logit}\,P(Y \le j) = \alpha_j - \beta_1\,\text{MASC}_z \\
\text{M2:} \quad & \text{logit}\,P(Y \le j) = \alpha_j - \beta_1\,\text{MASC}_z - \beta_2\,\text{IRI-PT}_z \\
\text{M3:} \quad & \text{logit}\,P(Y \le j) = \alpha_j - \beta_1\,\text{MASC}_z - \beta_2\,\text{IRI-PT}_z - \beta_3\,\text{IRI-PD}_z
\end{aligned}
$$
**MASC** = Theory of Mind total score; **IRI-PT** = Perspective Taking subscale (cognitive empathy — most directly linked to predicting opponents' decisions); **IRI-PD** = Personal Distress subscale (self-oriented distress — may impair strategic prediction). All scores z-standardised. OR > 1 shifts probability towards higher belief accuracy.
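A minimal sketch of the M1 specification on simulated data, assuming `MASS::polr()` as the fitting function (the report does not state which function was used). `polr()` parametrises the model as logit P(Y ≤ j) = ζ_j − η, which matches the sign convention of the equations above, so exponentiated coefficients read as odds ratios towards higher scores. The data, seed and effect size are all invented.

```r
set.seed(42)
n <- 200
MASC_z <- rnorm(n)                       # simulated standardised predictor

# Simulate an ordered outcome from a latent logistic variable with beta = 1.
latent <- 1.0 * MASC_z + rlogis(n)
score  <- cut(latent, breaks = c(-Inf, -1, 1, Inf),
              labels = c("0", "1", "2"), ordered_result = TRUE)
toy <- data.frame(score = score, MASC_z = MASC_z)

# Proportional-odds model; Hess = TRUE stores the Hessian for standard errors.
fit <- MASS::polr(score ~ MASC_z, data = toy, Hess = TRUE)

or <- exp(coef(fit))   # odds ratio per 1 SD increase in MASC_z
or
```

The simulated sample is deliberately much larger than the experimental one so that the fit is stable; with the small n noted in the caveat, estimates would be far noisier.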
#### Coefficient table
```{r}
#| label: tab-olog-bs
tab_olog_gt
```
#### Goodness of fit
```{r}
#| label: tab-gof-olog-bs
tab_gof_olog_gt
```
#### Forest plot: odds ratios
```{r}
#| label: fig-forest-olog-bs
#| fig-cap: "BS — Ordered logit: odds ratios for belief accuracy. OR > 1 increases probability of higher accuracy score. Error bars = 95% CI. Dashed line = OR 1 (no effect). x-axis log scale."
#| fig-height: 4
#| fig-width: 8
p_forest_olog
```
::: {.callout-note icon="false"}
##### Interpretation
```{r}
#| echo: false
m1_or_masc <- tab_olog_all |> dplyr::filter(model == "M1: MASC only", term == "MASC_z") |> dplyr::pull(OR)
m1_p_masc <- tab_olog_all |> dplyr::filter(model == "M1: MASC only", term == "MASC_z") |> dplyr::pull(p_value)
m2_or_iript <- tab_olog_all |> dplyr::filter(model == "M2: MASC + IRI-PT", term == "IRI_PT_z") |> dplyr::pull(OR)
m2_p_iript <- tab_olog_all |> dplyr::filter(model == "M2: MASC + IRI-PT", term == "IRI_PT_z") |> dplyr::pull(p_value)
m3_or_iribs <- tab_olog_all |> dplyr::filter(model == "M3: MASC + IRI-PT + IRI-PD", term == "IRI_PD_z") |> dplyr::pull(OR)
m3_p_iribs <- tab_olog_all |> dplyr::filter(model == "M3: MASC + IRI-PT + IRI-PD", term == "IRI_PD_z") |> dplyr::pull(p_value)
```
**MASC ToM (enters M1–M3; M1 estimate shown):** OR = `r m1_or_masc`, p = `r m1_p_masc`. Higher Theory of Mind ability may improve belief accuracy by enabling better prediction of opponents' decisions; an OR > 1 is consistent with this interpretation. **IRI Perspective Taking (enters M2–M3; M2 estimate shown):** OR = `r m2_or_iript`, p = `r m2_p_iript`. Cognitive empathy is directly relevant to inferring opponents' intended strategies; an OR > 1 would support the link between perspective-taking and prediction accuracy. **IRI Personal Distress (M3):** OR = `r m3_or_iribs`, p = `r m3_p_iribs`. Self-oriented distress may interfere with accurate belief formation (OR < 1 expected). Given EPV ≤ `r epv_B1` across all models, all estimates carry substantial uncertainty.
:::
---
### 5.2 — Demand choice when opponent signals Demand
::: {.callout-warning icon="false"}
##### Small-sample note
This analysis restricts the sample to participants who received a **D signal (Demand)** from their opponent (`opp_signal_received = D`). The dependent variable is whether they themselves also chose **D (Demand)**, leading to the (D,D) miscoordination outcome (0, 0). Given the small subsample size, EPV may be below 10 and Firth penalised logit is applied automatically.
:::
#### Models
$$
\begin{aligned}
\text{M1:} \quad & \text{logit}\,P(D) = \beta_0 + \beta_1\,\text{quiz\_err} \\
\text{M2:} \quad & \text{logit}\,P(D) = \beta_0 + \beta_1\,\text{quiz\_err} + \beta_2\,\text{CRT} \\
\text{M3:} \quad & \text{logit}\,P(D) = \beta_0 + \beta_1\,\text{CRT}
\end{aligned}
$$
**Sample**: `opp_signal_received = D` only. n = `r n_D`, events = `r n_events_D`. EPV: M1 = `r round(epv_D_m1, 1)`, M2 = `r round(epv_D_m2, 1)`, M3 = `r round(epv_D_m3, 1)`. All estimated with Firth penalised logit.
#### Results
```{r}
#| label: tab-logit-D
#| echo: false
#| message: false
tab_D_gt
```
```{r}
#| label: fig-forest-D
#| echo: false
#| message: false
#| fig-cap: "Demand choice when opponent signals D (Demand): M1 (quiz only), M2 (quiz + CRT), M3 (CRT only). Odds ratios with 95% CI. OR > 1 increases P(Demand (D)). All Firth. x-axis log scale."
#| fig-height: 3.5
#| fig-width: 8
p_forest_D
```
::: {.callout-note icon="false"}
##### Interpretation
```{r}
#| echo: false
d_or_quiz <- round(exp(coef(m_demand_demand)["log_quiz_err"]), 2)
d_p_quiz <- round(summary(m_demand_demand)$coefficients["log_quiz_err", 4], 3)
d_or_crt <- round(exp(coef(m_demand_demand)["CRT4"]), 2)
d_p_crt <- round(summary(m_demand_demand)$coefficients["CRT4", 4], 3)
```
**Quiz errors** (OR = `r d_or_quiz`, p = `r d_p_quiz`): a higher error rate on the comprehension quiz may reflect a weaker understanding of the game's conflict structure, potentially increasing stubborn Demand even when the opponent has already signalled Demand (risking (D,D) miscoordination). **CRT score** (OR = `r d_or_crt`, p = `r d_p_crt`): more reflective thinkers may better recognise that when the opponent signals D (Demand), the best response that avoids (0, 0) is to Yield (T) and accept the opponent's preferred equilibrium (12 for the yielding player, 36 for the demanding player) rather than insist on Demand and risk mutual miscoordination. An OR < 1 for CRT would be consistent with this interpretation. EPV = `r round(epv_D, 1)`, so estimates are exploratory and should be interpreted with caution.
:::
```{=html}
</div>
</details>
```