Movie for the Assessment of Social Cognition · GTEMO Experiment

Author

Eric Guerci

Published

March 22, 2026

1 Background

Theory of Mind (ToM) is the ability to attribute mental states — beliefs, intentions, desires, emotions — to others and to understand that these may differ from one’s own. It is a core dimension of social cognition and underlies strategic behaviour in interactive settings: anticipating what others know, want, and believe is a prerequisite for effective communication, negotiation, and cooperation.

The Movie for the Assessment of Social Cognition (MASC) is a validated film-based instrument developed by Dziobek et al. (2006). Participants watch short video clips of social interactions and answer multiple-choice questions about the characters’ thoughts and feelings. The MASC is designed to capture ecological ToM by embedding mental-state inference in naturalistic, dynamic social scenes — closer to real-world interaction than classic vignette-based tasks.

The instrument yields five scores:

Variable Description Scale
MASC_ToM_score Total correct ToM responses 0 – 45
MASC_dimToM_score Diminishing errors — under-mentalising 0 – 45
MASC_excToM_score Exceeding errors — over-mentalising 0 – 45
MASC_noToM_score No ToM errors — no mental-state attribution 0 – 45
MASC_attention_score Correct attention-check items (control) 0 – 6

Items are further classified as affective (emotion inference, 18 items) or cognitive (belief/intention inference, 27 items), yielding two proportion scores (MASC_affective_perc_score, MASC_cognitive_perc_score) that allow dissociation of the two ToM components.
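As a rough illustration of how these scores relate to item-level data, the sketch below scores one toy participant. The response codes ("correct", "dim", "exc", "no"), the function name score_masc, and the six-item classification are all hypothetical; the real instrument has 45 items and its own answer key.

```r
# Toy example: one participant, 6 items instead of the real 45.
# Response codes and the item classification are hypothetical.
responses <- c("correct", "dim", "correct", "exc", "correct", "no")
item_type <- c("affective", "cognitive", "affective",
               "cognitive", "cognitive", "cognitive")

score_masc <- function(responses, item_type) {
  correct <- responses == "correct"
  list(
    ToM            = sum(correct),               # total correct
    dimToM         = sum(responses == "dim"),    # under-mentalising errors
    excToM         = sum(responses == "exc"),    # over-mentalising errors
    noToM          = sum(responses == "no"),     # no mental-state attribution
    affective_perc = mean(correct[item_type == "affective"]),
    cognitive_perc = mean(correct[item_type == "cognitive"])
  )
}

scores <- score_masc(responses, item_type)
str(scores)
```

Under this coding the four response-type counts always sum to the number of items, which is why the three error scores and the total share the same 0 to 45 scale.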

2 Data overview

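```r
library(dplyr)   # select()
library(skimr)   # skim()

# Summarise the MASC scores together with the game identifier
df |>
  select(game_id,
         MASC_ToM_score, MASC_dimToM_score, MASC_excToM_score,
         MASC_noToM_score, MASC_attention_score,
         MASC_affective_perc_score, MASC_cognitive_perc_score) |>
  skim()
```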
Data summary
Name select(…)
Number of rows 122
Number of columns 8
_______________________
Column type frequency:
factor 1
numeric 7
________________________
Group variables None

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
game_id 0 1 FALSE 4 BS: 32, SH: 32, MP: 30, PD: 28

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
MASC_ToM_score 0 1 31.90 3.35 21.00 30.00 32.00 34.00 39 ▁▂▅▇▂
MASC_dimToM_score 0 1 5.96 2.60 0.00 4.00 6.00 7.00 14 ▂▆▇▂▁
MASC_excToM_score 0 1 5.59 2.48 0.00 4.00 5.00 7.00 15 ▂▇▃▁▁
MASC_noToM_score 0 1 1.55 1.44 0.00 0.00 1.00 2.00 7 ▇▃▂▁▁
MASC_attention_score 0 1 4.28 1.05 1.00 4.00 4.00 5.00 6 ▁▃▆▇▂
MASC_affective_perc_score 0 1 0.64 0.09 0.39 0.61 0.67 0.67 1 ▂▇▇▂▁
MASC_cognitive_perc_score 0 1 0.58 0.08 0.41 0.52 0.56 0.63 1 ▅▇▂▁▁

Descriptive skim of MASC variables including the attention control score.

3 Descriptive statistics by game

```r
tab_masc
```
| Characteristic | Overall, N = 122¹ | BS, N = 32¹ | MP, N = 30¹ | PD, N = 28¹ | SH, N = 32¹ | p-value² | Effect size³ |
|---|---|---|---|---|---|---|---|
| Correct ToM (0–45) | 32.000 (30.000, 34.000) | 31.500 (29.500, 34.500) | 31.000 (29.000, 34.000) | 33.000 (30.000, 34.000) | 33.500 (31.000, 35.000) | 0.126 | η² = 0.023 (small) |
| Diminishing (under-mentalising) | 6.000 (4.000, 7.000) | 6.000 (4.500, 7.000) | 7.000 (6.000, 8.000) | 5.000 (4.000, 6.000) | 6.000 (4.000, 7.000) | 0.026 | η² = 0.053 (small) |
| Exceeding (over-mentalising) | 5.000 (4.000, 7.000) | 5.000 (4.000, 7.000) | 6.000 (4.000, 7.000) | 5.500 (5.000, 7.000) | 5.000 (4.000, 6.000) | 0.479 | η² = -0.004 (small) |
| No ToM (wrong) | | | | | | 0.360 | |
| 0 | 33 (27%) | 9 (28%) | 6 (20%) | 7 (25%) | 11 (34%) | | |
| 1 | 33 (27%) | 7 (22%) | 9 (30%) | 7 (25%) | 10 (31%) | | |
| 2 | 31 (25%) | 7 (22%) | 6 (20%) | 9 (32%) | 9 (28%) | | |
| 3 | 14 (11%) | 3 (9.4%) | 6 (20%) | 3 (11%) | 2 (6.3%) | | |
| 4 | 8 (6.6%) | 4 (13%) | 3 (10%) | 1 (3.6%) | 0 (0%) | | |
| 6 | 1 (0.8%) | 0 (0%) | 0 (0%) | 1 (3.6%) | 0 (0%) | | |
| 7 | 2 (1.6%) | 2 (6.3%) | 0 (0%) | 0 (0%) | 0 (0%) | | |
| Attention checks correct | | | | | | 0.907 | |
| 1 | 1 (0.8%) | 0 (0%) | 0 (0%) | 0 (0%) | 1 (3.1%) | | |
| 2 | 5 (4.1%) | 1 (3.1%) | 2 (6.7%) | 1 (3.6%) | 1 (3.1%) | | |
| 3 | 22 (18%) | 4 (13%) | 5 (17%) | 7 (25%) | 6 (19%) | | |
| 4 | 37 (30%) | 9 (28%) | 7 (23%) | 11 (39%) | 10 (31%) | | |
| 5 | 45 (37%) | 13 (41%) | 13 (43%) | 8 (29%) | 11 (34%) | | |
| 6 | 12 (9.8%) | 5 (16%) | 3 (10%) | 1 (3.6%) | 3 (9.4%) | | |
| Affective ToM (proportion correct) | 0.667 (0.611, 0.667) | 0.611 (0.556, 0.667) | 0.611 (0.556, 0.667) | 0.667 (0.611, 0.667) | 0.667 (0.611, 0.722) | 0.407 | η² = -0.001 (small) |
| Cognitive ToM (proportion correct) | 0.556 (0.519, 0.630) | 0.593 (0.519, 0.630) | 0.593 (0.556, 0.630) | 0.556 (0.500, 0.611) | 0.556 (0.519, 0.611) | 0.465 | η² = -0.004 (small) |

¹ Median (Q1, Q3); n (%)
² Kruskal-Wallis rank sum test; Pearson’s Chi-squared test with simulated p-value (based on 2000 replicates)
³ η² (Kruskal-Wallis). Small / medium / large: η² ≥ 0.01 / 0.06 / 0.14.
Note

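Continuous scores are summarised as median (Q1, Q3) and compared across the 4 games with a Kruskal-Wallis test; the No ToM and attention-check counts are tabulated as n (%) and compared with a chi-squared test using a simulated p-value. η² quantifies the effect size (small ≥ 0.01, medium ≥ 0.06, large ≥ 0.14); estimates at or below zero indicate no detectable effect, whatever the automatic magnitude label says. A significant p indicates heterogeneity in ToM profiles across games, which is relevant for interpreting group-level strategic differences in the behaviour sections.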
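The effect size reported next to each Kruskal-Wallis test can be sketched as follows, assuming it is the H-based estimator η² = (H − k + 1) / (n − k), with k groups and n observations. That choice is an assumption, although the tabled values appear consistent with it. The helper name kw_eta2 and the simulated data are illustrative only.

```r
# eta-squared based on the Kruskal-Wallis H statistic:
#   eta2_H = (H - k + 1) / (n - k)
# where k = number of groups and n = total sample size.
kw_eta2 <- function(x, g) {
  g  <- factor(g)
  kw <- kruskal.test(x, g)
  H  <- unname(kw$statistic)
  k  <- nlevels(g)
  n  <- length(x)
  c(H = H, p = kw$p.value, eta2 = (H - k + 1) / (n - k))
}

# Simulated scores with the same group sizes as the report (not the real data)
set.seed(1)
game  <- rep(c("BS", "MP", "PD", "SH"), times = c(32, 30, 28, 32))
score <- rnorm(length(game), mean = 32, sd = 3)
res   <- kw_eta2(score, game)
res

# The estimator is negative whenever H < k - 1,
# e.g. H = 0.5 with k = 4 groups and n = 122:
eta2_small_H <- (0.5 - 4 + 1) / (122 - 4)
eta2_small_H
```

Under this estimator a negative value simply means that H fell below its expected value under the null hypothesis, so it is best read as an effect of zero.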

4 Affective vs cognitive ToM

4.1 Sample-level summary

```r
library(dplyr)   # select(), group_by(), summarise()
library(tidyr)   # pivot_longer()
library(gt)      # gt(), fmt_number(), tab_header()

# Reshape the two proportion scores to long format, then report median and quartiles
df |>
  select(`Affective ToM` = MASC_affective_perc_score,
         `Cognitive ToM` = MASC_cognitive_perc_score) |>
  pivot_longer(everything(), names_to = "Dimension", values_to = "score") |>
  group_by(Dimension) |>
  summarise(
    Median = median(score, na.rm = TRUE),
    Q1     = quantile(score, 0.25, na.rm = TRUE),
    Q3     = quantile(score, 0.75, na.rm = TRUE),
    .groups = "drop"
  ) |>
  gt() |>
  fmt_number(columns = c(Median, Q1, Q3), decimals = 3) |>
  tab_header(title = "Affective vs Cognitive ToM: sample-level summary (median, IQR)")
```
Affective vs Cognitive ToM: sample-level summary (median, IQR)
Dimension Median Q1 Q3
Affective ToM 0.667 0.611 0.667
Cognitive ToM 0.556 0.519 0.630

A paired Wilcoxon signed-rank test then checks whether affective and cognitive ToM accuracy differ within individuals; a rank-based test is used because the scores are bounded proportions.

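```r
library(tibble)  # tibble()
library(gt)      # gt(), tab_header(), tab_style()

# wilcox_res and wilcox_es are computed upstream (not shown here). Their fields
# suggest an htest object from a paired wilcox.test(..., conf.int = TRUE) and an
# effect-size table with effsize / magnitude columns, respectively.
# Mixing numbers and text in c() stores the whole Value column as character.
tibble(
  Statistic      = c("V (Wilcoxon)", "p-value", "Pseudo-median diff. (H-L)",
                     "95% CI lower", "95% CI upper",
                     "Effect size r", "Magnitude"),
  Value          = c(
    round(wilcox_res$statistic,  1),
    signif(wilcox_res$p.value,   3),
    round(wilcox_res$estimate,   4),
    round(wilcox_res$conf.int[1],4),
    round(wilcox_res$conf.int[2],4),
    round(wilcox_es$effsize,     3),
    as.character(wilcox_es$magnitude)
  )
) |>
  gt() |>
  tab_header(
    title    = "Wilcoxon signed-rank: Affective vs Cognitive ToM",
    subtitle = "Pseudo-median difference = Hodges-Lehmann estimator (Affective \u2212 Cognitive)"
  ) |>
  tab_style(style = cell_text(weight = "bold"),
            locations = cells_column_labels())
```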
Wilcoxon signed-rank: Affective vs Cognitive ToM
Pseudo-median difference = Hodges-Lehmann estimator (Affective − Cognitive)
Statistic Value
V (Wilcoxon) 5451
p-value 5.29e-08
Pseudo-median diff. (H-L) 0.0649
95% CI lower 0.0463
95% CI upper 0.0834
Effect size r 0.494
Magnitude moderate
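The quantities in this table could be obtained along the following lines. This is a base-R sketch on simulated paired proportions, not the report's own pipeline: the vectors affective and cognitive are made up, and the effect size is approximated as r = |Z| / √n with Z recovered from the two-sided p-value, which may differ slightly from what a dedicated effect-size function returns.

```r
# Simulated paired accuracy scores (not the study data)
set.seed(42)
n         <- 122
affective <- rbeta(n, 16, 9)    # centred near 0.64
cognitive <- rbeta(n, 14, 10)   # centred near 0.58

# Paired signed-rank test; conf.int = TRUE also returns the
# Hodges-Lehmann pseudo-median of the paired differences and its CI
res <- wilcox.test(affective, cognitive, paired = TRUE,
                   conf.int = TRUE, conf.level = 0.95)

# Approximate effect size: r = |Z| / sqrt(n), with Z from the two-sided p-value
z <- qnorm(res$p.value / 2, lower.tail = FALSE)
r <- z / sqrt(n)

c(V = unname(res$statistic), p = res$p.value,
  HL = unname(res$estimate), lo = res$conf.int[1], hi = res$conf.int[2], r = r)
```

Applying the same approximation to the reported p-value with n = 122 gives an r of roughly 0.49, in line with the tabled effect size.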

5 Figures

5.1 Response type distribution

```r
p_stacked
```
Figure 1: Average proportion of the 4 MASC response types per experimental condition. Correct responses dominate; diminishing (under-mentalising) is the most frequent error type, consistent with non-clinical samples.

5.2 ToM score distribution by game

```r
p_violin_tom
```
Figure 2: Distribution of total correct ToM score (0–45) by experimental condition.

5.3 MASC dimensions heatmap

```r
p_heat_masc
```
Figure 3: Within-variable standardised means (z-scores) across games — colour encodes relative position within each dimension. Cell labels show raw means; Affective (%) and Cognitive (%) labels are multiplied ×100 for readability.
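One way to read the heatmap's colour scale is sketched below, assuming the z-scores are obtained by standardising the four game means within each variable; the report may instead average participant-level z-scores. The data frame toy and its two columns are simulated stand-ins.

```r
# Simulated stand-in data: 4 games, 10 participants each
set.seed(7)
toy <- data.frame(
  game_id = rep(c("BS", "MP", "PD", "SH"), each = 10),
  ToM     = rnorm(40, mean = 32, sd = 3),
  dimToM  = rnorm(40, mean = 6,  sd = 2.5)
)

# Raw means per game: these would be the cell labels
means <- aggregate(cbind(ToM, dimToM) ~ game_id, data = toy, FUN = mean)

# Standardise each variable's four game means: these would drive the colours
z <- scale(as.matrix(means[, -1]))
rownames(z) <- means$game_id

means
round(z, 2)
```

Because each column is standardised separately, colours are comparable within a variable but not across variables, which is why the raw means are printed in the cells.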

5.4 Affective vs Cognitive ToM by game

```r
p_aff_cog
```
Figure 4: Distributions of affective and cognitive ToM accuracy by experimental condition. Accuracy displayed as proportion (0–1). Subtitles report Kruskal-Wallis tests (η²) across the 4 game conditions separately for each ToM dimension.

5.5 Cognitive vs affective ToM scatter (pooled sample)

```r
p_scatter
```
Figure 5: Pooled scatter of cognitive vs affective ToM accuracy with a single OLS regression line (grey). The top-left label reports β, R², and significance. Points coloured by game.

5.6 Attention control vs ToM scores

The MASC includes attention-check items that do not require mental-state inference. Correlating the attention score with ToM scores helps assess whether performance differences are driven by general task engagement rather than ToM ability per se.

```r
p_attention_panel
```
Figure 6: Scatter plots of the MASC attention-check score against overall ToM score (left), affective ToM accuracy (centre), and cognitive ToM accuracy (right). OLS line with 95% CI; annotation reports β, R², p-value. Points coloured by game.
Note

A strong positive association between attention score and ToM scores would indicate that overall task engagement (rather than ToM specifically) drives performance. A weak or absent association is more consistent with ToM scores reflecting the construct of interest.
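The β, R², and p-value annotated on these scatter plots can be extracted from a simple linear model, as in this sketch. The data are simulated and the column names attention and tom are placeholders for the report's variables.

```r
# Simulated stand-in data (not the study data)
set.seed(3)
toy <- data.frame(attention = sample(1:6, 122, replace = TRUE))
toy$tom <- 30 + 0.4 * toy$attention + rnorm(122, sd = 3)

fit <- lm(tom ~ attention, data = toy)
s   <- summary(fit)

beta <- coef(fit)[["attention"]]            # slope
r2   <- s$r.squared                         # share of variance explained
p    <- coef(s)["attention", "Pr(>|t|)"]    # p-value of the slope
ci   <- confint(fit)["attention", ]         # 95% CI of the slope

round(c(beta = beta, R2 = r2, p = p, ci), 3)
```

With a single predictor, R² equals the squared Pearson correlation, so a small R² here would support the reading that attention explains little of the variation in ToM scores.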

6 Conditioning on gender and role

Note

Distributions stratified by gender and role, followed by OLS models with game, gender, and role entered simultaneously. Reference category: game = BS, gender = Male, role = P1 (LEEN).

6.1 MASC by gender and role (pooled)

```r
p_masc_cond_pooled
```
Figure 7: MASC overall ToM score by gender (left) and experimental role (right), pooled across game conditions. Mann-Whitney U test with rank-biserial r effect size.
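For two-group comparisons like these, the rank-biserial correlation can be derived directly from the Mann-Whitney statistic. The sketch below uses simulated scores for two hypothetical groups x and y and the identity r = 2U / (n₁n₂) − 1, where U is the statistic R reports as W.

```r
# Simulated scores for two groups (not the study data)
set.seed(11)
x <- rnorm(60, mean = 32, sd = 3)
y <- rnorm(62, mean = 31, sd = 3)

mw <- wilcox.test(x, y)          # Mann-Whitney U / Wilcoxon rank-sum test
U  <- unname(mw$statistic)       # number of (x, y) pairs with x > y

# Rank-biserial r = P(x > y) - P(x < y), ranging from -1 to 1
r_rb <- 2 * U / (length(x) * length(y)) - 1

# Direct check over all pairs
r_direct <- mean(outer(x, y, ">")) - mean(outer(x, y, "<"))

c(U = U, p = mw$p.value, r_rb = r_rb, r_direct = r_direct)
```

A value near zero means that a randomly drawn member of either group is about equally likely to score higher than a member of the other.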

6.2 OLS with demographic controls

```r
gt_ols_masc
```
OLS: MASC accuracy ~ game + gender + role
OLS. Reference: game = BS, gender = Male, role = P1 (LEEN). 95% CI from confint().
Outcome Predictor β SE 95% CI lo 95% CI hi t p Sig.
Outcome: Cognitive ToM (%)
Cognitive ToM (%) Game: MP vs BS 0.006 0.020 -0.034 0.046 0.306 0.76
Cognitive ToM (%) Game: PD vs BS -0.021 0.021 -0.062 0.020 -1.030 0.305
Cognitive ToM (%) Game: SH vs BS 0.001 0.020 -0.038 0.040 0.058 0.954
Cognitive ToM (%) genderMale 0.014 0.014 -0.015 0.042 0.962 0.338
Cognitive ToM (%) Role: CoCoLab vs LEEN 0.001 0.014 -0.028 0.029 0.042 0.966
Outcome: Affective ToM (%)
Affective ToM (%) Game: MP vs BS -0.011 0.023 -0.057 0.035 -0.473 0.637
Affective ToM (%) Game: PD vs BS 0.016 0.024 -0.030 0.063 0.690 0.492
Affective ToM (%) Game: SH vs BS 0.030 0.023 -0.015 0.075 1.299 0.197
Affective ToM (%) genderMale 0.026 0.017 -0.007 0.059 1.575 0.118
Affective ToM (%) Role: CoCoLab vs LEEN -0.019 0.016 -0.052 0.013 -1.162 0.248
Outcome: Overall ToM (0–45)
Overall ToM (0–45) Game: MP vs BS -0.788 0.854 -2.479 0.903 -0.923 0.358
Overall ToM (0–45) Game: PD vs BS 0.633 0.870 -1.091 2.356 0.727 0.469
Overall ToM (0–45) Game: SH vs BS 0.750 0.840 -0.913 2.413 0.893 0.374
Overall ToM (0–45) genderMale 0.140 0.610 -1.068 1.348 0.230 0.819
Overall ToM (0–45) Role: CoCoLab vs LEEN 0.131 0.608 -1.073 1.336 0.216 0.83
β = OLS coefficient. * p < .05 ** p < .01 *** p < .001.
```r
p_forest_masc8
```
Figure 8: Forest plot: OLS β with 95% CI for MASC outcomes (game + gender + role). Top panel: Overall ToM (0–45 scale); bottom panel: Affective ToM and Cognitive ToM overlaid (both proportion scale, 0–1). Free x-axis per panel. Dashed line = 0.
Note

Interpretation. Game coefficients estimate the difference between each game and the reference game (BS), holding gender and role constant. None of the game, gender, or role coefficients reaches p < .05 in any of the three models. A game effect that is significant unconditionally (Kruskal-Wallis) but non-significant here would suggest partial confounding by demographics.
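A model of this form can be fitted as in the sketch below, which uses simulated data with the report's group sizes. The gender levels ("Female", "Male") and the outcome column tom are assumptions made for illustration.

```r
# Simulated stand-in data (not the study data)
set.seed(5)
n   <- 122
toy <- data.frame(
  game_id = factor(rep(c("BS", "MP", "PD", "SH"), times = c(32, 30, 28, 32))),
  gender  = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  role    = factor(sample(c("LEEN", "CoCoLab"), n, replace = TRUE)),
  tom     = rnorm(n, mean = 32, sd = 3.3)
)

# Set the reference categories explicitly
toy$game_id <- relevel(toy$game_id, ref = "BS")
toy$role    <- relevel(toy$role,    ref = "LEEN")

fit <- lm(tom ~ game_id + gender + role, data = toy)

# Coefficients, standard errors, t, p, and 95% CI in one table
tab <- cbind(coef(summary(fit)), confint(fit))
round(tab, 3)
```

With R's default treatment coding, a row named genderMale is the contrast of Male against the gender reference level, and each game_id row is the contrast of that game against BS.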

7 Response times

7.1 MASC response times by dimension

```r
tab_resp_times
```
| Characteristic | Overall, N = 122¹ | BS, N = 32¹ | MP, N = 30¹ | PD, N = 28¹ | SH, N = 32¹ | p-value² | Effect size³ |
|---|---|---|---|---|---|---|---|
| MASC – avg response time (all items) | 11.4 (9.6, 13.7) | 11.6 (9.6, 14.0) | 11.1 (9.4, 13.9) | 11.1 (9.5, 13.3) | 11.7 (10.1, 14.0) | 0.913 | η² = -0.021 (small) |
| MASC – avg response time (affective) | 11.5 (9.9, 14.1) | 11.4 (9.7, 14.3) | 11.3 (9.2, 14.1) | 11.4 (9.7, 13.5) | 11.6 (10.3, 14.3) | 0.842 | η² = -0.018 (small) |
| MASC – avg response time (cognitive) | 11.4 (9.5, 13.7) | 11.7 (9.2, 14.5) | 11.3 (9.2, 14.4) | 11.1 (9.4, 13.3) | 11.6 (9.7, 13.4) | 0.955 | η² = -0.023 (small) |
| IRI – total time (28 items) | 190.0 (158.0, 237.0) | 184.0 (149.0, 237.5) | 175.5 (159.0, 233.0) | 192.0 (157.0, 230.0) | 196.5 (165.5, 238.5) | 0.740 | η² = -0.015 (small) |
| CRT – total time (4 items) | 58.0 (46.0, 76.0) | 51.5 (42.0, 73.0) | 57.0 (45.0, 76.0) | 62.0 (51.5, 73.5) | 58.0 (52.0, 83.0) | 0.365 | η² = 0.002 (small) |

¹ Median (Q1, Q3)
² Kruskal-Wallis rank sum test
³ η² (Kruskal-Wallis). Small / medium / large: η² ≥ 0.01 / 0.06 / 0.14.

7.2 Speed–accuracy trade-off

```r
p_masc_rt
```
Figure 9: Distribution of MASC response times by dimension (overall / affective / cognitive) across games. Faster response times may reflect overconfidence or heuristic use; slower times suggest deliberative mentalising.
```r
p_masc_rt_cond
```
Figure 10: MASC average response times (Overall / Affective / Cognitive) by gender (top) and experimental role (bottom), pooled across game conditions. Mann-Whitney U test with rank-biserial r annotated per facet.
Note

Response time differences by gender and role. A significant Mann-Whitney result would indicate that one group responds systematically faster or slower across MASC dimensions — potentially reflecting differences in deliberative mentalising strategies, not just accuracy. Results should be interpreted alongside the ToM score conditioning (see § Conditioning on gender and role).

8 Preliminary interpretation

The sample shows a median MASC ToM score of 32 (IQR = 4) out of 45 items, consistent with adequate mentalising ability in a non-clinical adult population. Within individuals, affective accuracy (median 66.7%) exceeds cognitive accuracy (median 55.6%): the Wilcoxon signed-rank test yields p = 5.3 × 10⁻⁸ with a moderate effect size (r = 0.49) and a Hodges-Lehmann difference of 0.065 (95% CI 0.046 to 0.083), so the two ToM dimensions differ significantly at the sample level.

Differences in MASC profiles across games are informative only to the extent that randomisation was imperfect or that participant sorting occurred. In the descriptive table, only diminishing errors differ across games at the 5% level (Kruskal-Wallis p = 0.026, η² = 0.053); any such effect will be noted as a potential covariate in the inferential sections.