MASC (ToM)

Movie for the Assessment of Social Cognition · GTEMO Experiment

Author

Eric Guerci

Published

March 22, 2026

1 Background

Theory of Mind (ToM) is the ability to attribute mental states — beliefs, intentions, desires, emotions — to others and to understand that these may differ from one’s own. It is a core dimension of social cognition and underlies strategic behaviour in interactive settings: anticipating what others know, want, and believe is a prerequisite for effective communication, negotiation, and cooperation.

The Movie for the Assessment of Social Cognition (MASC) is a validated film-based instrument developed by Dziobek et al. (2006). Participants watch short video clips of social interactions and answer multiple-choice questions about the characters’ thoughts and feelings. The MASC is designed to capture ecological ToM by embedding mental-state inference in naturalistic, dynamic social scenes — closer to real-world interaction than classic vignette-based tasks.

The instrument yields five scores:

Variable	Description	Scale
`MASC_ToM_score`	Total correct ToM responses	0 – 45
`MASC_dimToM_score`	Diminishing errors — under-mentalising	0 – 45
`MASC_excToM_score`	Exceeding errors — over-mentalising	0 – 45
`MASC_noToM_score`	No ToM errors — no mental-state attribution	0 – 45
`MASC_attention_score`	Correct attention-check items (control)	0 – 6

Items are further classified as affective (emotion inference, 18 items) or cognitive (belief/intention inference, 27 items), yielding two proportion-correct scores (`MASC_affective_perc_score`, `MASC_cognitive_perc_score`) that allow the two ToM components to be dissociated.
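As a consistency check on the scores above: the four response-type scores share the same 0 – 45 range, and in the standard MASC format each item is answered with exactly one option, so the four counts should sum to 45 for every participant. The following is a minimal sketch of that check on a hypothetical toy data frame (`toy`, with made-up values), not the study data.

```r
# Hypothetical toy data using the same column names as the report's `df`;
# the values are invented for illustration only.
toy <- data.frame(
  MASC_ToM_score    = c(32, 30, 39),
  MASC_dimToM_score = c(6, 7, 3),
  MASC_excToM_score = c(5, 6, 2),
  MASC_noToM_score  = c(2, 2, 1)
)

# Sum the four mutually exclusive response types per participant.
toy$MASC_total <- with(
  toy,
  MASC_ToM_score + MASC_dimToM_score + MASC_excToM_score + MASC_noToM_score
)

# Every participant should account for all 45 items.
all_complete <- all(toy$MASC_total == 45)
print(all_complete)
```

The sample means reported in the data overview are consistent with this identity: 31.90 + 5.96 + 5.59 + 1.55 = 45.00.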

2 Data overview

df |>
  select(game_id,
         MASC_ToM_score, MASC_dimToM_score, MASC_excToM_score,
         MASC_noToM_score, MASC_attention_score,
         MASC_affective_perc_score, MASC_cognitive_perc_score) |>
  skim()

Data summary
Name	select(…)
Number of rows	122
Number of columns	8
_______________________
Column type frequency:
factor	1
numeric	7
________________________
Group variables	None

Variable type: factor

skim_variable	n_missing	complete_rate	ordered	n_unique	top_counts
game_id	0	1	FALSE	4	BS: 32, SH: 32, MP: 30, PD: 28

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
MASC_ToM_score	1	31.90	3.35	21.00	30.00	32.00	34.00	39	▁▂▅▇▂
MASC_dimToM_score	1	5.96	2.60	0.00	4.00	6.00	7.00	14	▂▆▇▂▁
MASC_excToM_score	1	5.59	2.48	0.00	4.00	5.00	7.00	15	▂▇▃▁▁
MASC_noToM_score	1	1.55	1.44	0.00	0.00	1.00	2.00	7	▇▃▂▁▁
MASC_attention_score	1	4.28	1.05	1.00	4.00	4.00	5.00	6	▁▃▆▇▂
MASC_affective_perc_score	1	0.64	0.09	0.39	0.61	0.67	0.67	1	▂▇▇▂▁
MASC_cognitive_perc_score	1	0.58	0.08	0.41	0.52	0.56	0.63	1	▅▇▂▁▁

Descriptive skim of MASC variables including the attention control score.

3 Descriptive statistics by game

tab_masc

Characteristic	Overall N = 122¹	BS N = 32¹	MP N = 30¹	PD N = 28¹	SH N = 32¹	p-value²	Effect size³
Correct ToM (0–45)	32.000 (30.000, 34.000)	31.500 (29.500, 34.500)	31.000 (29.000, 34.000)	33.000 (30.000, 34.000)	33.500 (31.000, 35.000)	0.126	η² = 0.023 (small)
Diminishing — under-mentalising	6.000 (4.000, 7.000)	6.000 (4.500, 7.000)	7.000 (6.000, 8.000)	5.000 (4.000, 6.000)	6.000 (4.000, 7.000)	0.026	η² = 0.053 (small)
Exceeding — over-mentalising	5.000 (4.000, 7.000)	5.000 (4.000, 7.000)	6.000 (4.000, 7.000)	5.500 (5.000, 7.000)	5.000 (4.000, 6.000)	0.479	η² = -0.004 (negligible)
No ToM (wrong)						0.360
0	33 (27%)	9 (28%)	6 (20%)	7 (25%)	11 (34%)
1	33 (27%)	7 (22%)	9 (30%)	7 (25%)	10 (31%)
2	31 (25%)	7 (22%)	6 (20%)	9 (32%)	9 (28%)
3	14 (11%)	3 (9.4%)	6 (20%)	3 (11%)	2 (6.3%)
4	8 (6.6%)	4 (13%)	3 (10%)	1 (3.6%)	0 (0%)
6	1 (0.8%)	0 (0%)	0 (0%)	1 (3.6%)	0 (0%)
7	2 (1.6%)	2 (6.3%)	0 (0%)	0 (0%)	0 (0%)
Attention checks correct						0.907
1	1 (0.8%)	0 (0%)	0 (0%)	0 (0%)	1 (3.1%)
2	5 (4.1%)	1 (3.1%)	2 (6.7%)	1 (3.6%)	1 (3.1%)
3	22 (18%)	4 (13%)	5 (17%)	7 (25%)	6 (19%)
4	37 (30%)	9 (28%)	7 (23%)	11 (39%)	10 (31%)
5	45 (37%)	13 (41%)	13 (43%)	8 (29%)	11 (34%)
6	12 (9.8%)	5 (16%)	3 (10%)	1 (3.6%)	3 (9.4%)
Affective ToM (proportion correct)	0.667 (0.611, 0.667)	0.611 (0.556, 0.667)	0.611 (0.556, 0.667)	0.667 (0.611, 0.667)	0.667 (0.611, 0.722)	0.407	η² = -0.001 (negligible)
Cognitive ToM (proportion correct)	0.556 (0.519, 0.630)	0.593 (0.519, 0.630)	0.593 (0.556, 0.630)	0.556 (0.500, 0.611)	0.556 (0.519, 0.611)	0.465	η² = -0.004 (negligible)
¹ Median (Q1, Q3); n (%)
² Kruskal-Wallis rank sum test; Pearson’s Chi-squared test with simulated p-value (based on 2000 replicates)
³ η² (Kruskal-Wallis). Small / medium / large: η² ≥ 0.01 / 0.06 / 0.14.

Note

Statistics are median (Q1, Q3). The Kruskal-Wallis test checks whether distributions differ across the 4 games; η² quantifies the effect size (small ≥ 0.01, medium ≥ 0.06, large ≥ 0.14). A significant p indicates heterogeneity in ToM profiles across games — relevant for interpreting group-level strategic differences in the behaviour sections.
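The η² values in the table are consistent with the usual H-based estimator, η² = (H − k + 1) / (n − k), where H is the Kruskal-Wallis statistic, k the number of groups, and n the sample size. The sketch below assumes that estimator (the code behind `tab_masc` is not shown) and runs on simulated scores; `kw_eta2` and `eta2_label` are hypothetical helper names, and the "negligible" label for values below 0.01 is an assumption, since the footnote only names small, medium, and large. Because the numerator is negative whenever H < k − 1, the estimate can fall below zero when the groups are nearly identical.

```r
# H-based eta-squared for a Kruskal-Wallis test (assumed estimator).
kw_eta2 <- function(h, k, n) (h - k + 1) / (n - k)

# Magnitude labels using the thresholds quoted in the table footnote;
# "negligible" for values below 0.01 is an assumed label.
eta2_label <- function(eta2) {
  if (eta2 >= 0.14) {
    "large"
  } else if (eta2 >= 0.06) {
    "medium"
  } else if (eta2 >= 0.01) {
    "small"
  } else {
    "negligible"
  }
}

# Simulated scores for 4 games with the group sizes shown in the table.
set.seed(123)
sim <- data.frame(
  game  = factor(rep(c("BS", "MP", "PD", "SH"), times = c(32, 30, 28, 32))),
  score = rnorm(122, mean = 32, sd = 3.35)
)

kw   <- kruskal.test(score ~ game, data = sim)
eta2 <- kw_eta2(unname(kw$statistic), k = nlevels(sim$game), n = nrow(sim))

cat(sprintf("H = %.2f, p = %.3f, eta2 = %.3f (%s)\n",
            kw$statistic, kw$p.value, eta2, eta2_label(eta2)))
```

With k = 4 and n = 122, any H below 3 gives a negative η², which is how values such as −0.004 can arise.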

4 Affective vs cognitive ToM

4.1 Sample-level summary

df |>
  select(`Affective ToM` = MASC_affective_perc_score,
         `Cognitive ToM` = MASC_cognitive_perc_score) |>
  pivot_longer(everything(), names_to = "Dimension", values_to = "score") |>
  group_by(Dimension) |>
  summarise(
    Median = median(score, na.rm = TRUE),
    Q1     = quantile(score, 0.25, na.rm = TRUE),
    Q3     = quantile(score, 0.75, na.rm = TRUE),
    .groups = "drop"
  ) |>
  gt() |>
  fmt_number(columns = c(Median, Q1, Q3), decimals = 3) |>
  tab_header(title = "Affective vs Cognitive ToM: sample-level summary (median, IQR)")

Affective vs Cognitive ToM: sample-level summary (median, IQR)
Dimension	Median	Q1	Q3
Affective ToM	0.667	0.611	0.667
Cognitive ToM	0.556	0.519	0.630

The table below reports a paired Wilcoxon signed-rank test of whether affective and cognitive ToM accuracy differ within individuals; a rank-based test is used because the scores are bounded proportions.

tibble(
  Statistic      = c("V (Wilcoxon)", "p-value", "Pseudo-median diff. (H-L)",
                     "95% CI lower", "95% CI upper",
                     "Effect size r", "Magnitude"),
  Value          = c(
    round(wilcox_res$statistic,  1),
    signif(wilcox_res$p.value,   3),
    round(wilcox_res$estimate,   4),
    round(wilcox_res$conf.int[1],4),
    round(wilcox_res$conf.int[2],4),
    round(wilcox_es$effsize,     3),
    as.character(wilcox_es$magnitude)
  )
) |>
  gt() |>
  tab_header(
    title    = "Wilcoxon signed-rank: Affective vs Cognitive ToM",
    subtitle = "Pseudo-median difference = Hodges-Lehmann estimator (Affective \u2212 Cognitive)"
  ) |>
  tab_style(style = cell_text(weight = "bold"),
            locations = cells_column_labels())

Wilcoxon signed-rank: Affective vs Cognitive ToM
Pseudo-median difference = Hodges-Lehmann estimator (Affective − Cognitive)
Statistic	Value
V (Wilcoxon)	5451
p-value	5.29e-08
Pseudo-median diff. (H-L)	0.0649
95% CI lower	0.0463
95% CI upper	0.0834
Effect size r	0.494
Magnitude	moderate
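The objects `wilcox_res` and `wilcox_es` used in the chunk above are created elsewhere in the report. As a rough illustration of how comparable quantities could be obtained with base R only, the sketch below runs a paired test on simulated accuracies (hypothetical values, not the study data) and approximates the effect size as r = Z / √n, with Z recovered from the two-sided p-value. The report's own effect size may come from a dedicated package and differ slightly.

```r
# Simulated paired accuracies (hypothetical values, not the study data).
set.seed(42)
n   <- 122
cog <- rnorm(n, mean = 0.58, sd = 0.08)
aff <- cog + rnorm(n, mean = 0.065, sd = 0.10)

# Paired Wilcoxon signed-rank test; conf.int = TRUE also returns the
# Hodges-Lehmann pseudo-median of the differences (aff - cog) and its CI.
res <- wilcox.test(aff, cog, paired = TRUE, conf.int = TRUE, exact = FALSE)

# Effect size r = Z / sqrt(n), with |Z| recovered from the two-sided
# p-value (an approximation of the test's standardised statistic).
z <- qnorm(res$p.value / 2, lower.tail = FALSE)
r <- z / sqrt(n)

print(res$estimate)   # pseudo-median difference
print(res$conf.int)   # 95% confidence interval
print(round(r, 3))    # approximate effect size r
```

Applying the same approximation to the reported p = 5.29e-08 with n = 122 gives r of roughly 0.49, in line with the tabulated 0.494.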

5 Figures

5.1 Response type distribution

p_stacked

Figure 1: Average proportion of the 4 MASC response types per experimental condition. Correct responses dominate; diminishing (under-mentalising) is the most frequent error type, consistent with non-clinical samples.

5.2 ToM score distribution by game

p_violin_tom

Figure 2: Distribution of total correct ToM score (0–45) by experimental condition.

5.3 MASC dimensions heatmap

p_heat_masc

Figure 3: Within-variable standardised means (z-scores) across games — colour encodes relative position within each dimension. Cell labels show raw means; Affective (%) and Cognitive (%) labels are multiplied ×100 for readability.
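One reading of the standardisation described in the caption is that the per-game means of each variable are z-scored across the four games, so that colours are comparable between variables on different scales. The sketch below illustrates that reading with made-up means; the data frame `game_means` and its columns are hypothetical, and the code behind `p_heat_masc` may differ.

```r
# Hypothetical per-game means for two MASC variables (made-up numbers).
game_means <- data.frame(
  game = c("BS", "MP", "PD", "SH"),
  tom  = c(31.5, 31.0, 32.5, 33.0),
  dim  = c(6.0, 7.0, 5.0, 6.0)
)

# z-score each variable across the four games:
# (game mean - mean of game means) / sd of game means.
z <- scale(as.matrix(game_means[, c("tom", "dim")]))

# Combine with the game labels for display; colour would encode these values,
# while the cell labels would show the raw means.
z_means <- data.frame(game = game_means$game, round(z, 2))
print(z_means)
```

After this transformation each column has mean 0 and standard deviation 1, so a cell's colour reflects only its position relative to the other games.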

5.4 Affective vs Cognitive ToM by game

p_aff_cog

Figure 4: Distributions of affective and cognitive ToM accuracy by experimental condition. Accuracy displayed as proportion (0–1). Subtitles report Kruskal-Wallis tests (η²) across the 4 game conditions separately for each ToM dimension.

5.5 Cognitive vs affective ToM scatter (pooled sample)

p_scatter

Figure 5: Pooled scatter of cognitive vs affective ToM accuracy with a single OLS regression line (grey). The top-left label reports β, R², and significance. Points coloured by game.

5.6 Attention control vs ToM scores

The MASC includes attention-check items that do not require mental-state inference. Correlating the attention score with ToM scores helps assess whether performance differences are driven by general task engagement rather than ToM ability per se.

p_attention_panel

Figure 6: Scatter plots of the MASC attention-check score against overall ToM score (left), affective ToM accuracy (centre), and cognitive ToM accuracy (right). OLS line with 95% CI; annotation reports β, R², p-value. Points coloured by game.

Note

A strong positive association between attention score and ToM scores would indicate that overall task engagement (rather than ToM specifically) drives performance. A weak or absent association is more consistent with ToM scores reflecting the construct of interest.
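The quantities annotated in the attention panels (β, R², p-value) can be extracted from a simple linear model. The sketch below shows one way to do this on simulated data; the variable names `attention` and `tom` and the simulated relationship are hypothetical. A Spearman correlation is added as a rank-based cross-check, which may be useful because the attention score takes only a few integer values.

```r
# Simulated data (hypothetical): attention score 1-6 and a ToM score.
set.seed(11)
sim <- data.frame(attention = sample(1:6, 122, replace = TRUE))
sim$tom <- 30 + 0.4 * sim$attention + rnorm(122, sd = 3)

# OLS of ToM on attention, as in the figure annotation.
fit  <- lm(tom ~ attention, data = sim)
s    <- summary(fit)
beta <- coef(fit)[["attention"]]
r2   <- s$r.squared
pval <- coef(s)["attention", "Pr(>|t|)"]

# Rank-based alternative that does not assume linearity.
rho <- cor(sim$attention, sim$tom, method = "spearman")

cat(sprintf("beta = %.3f, R2 = %.3f, p = %.3f, Spearman rho = %.3f\n",
            beta, r2, pval, rho))
```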

6 Conditioning on gender and role

Note

Distributions stratified by gender and role, followed by OLS models with game, gender, and role entered simultaneously. Reference category: game = BS, gender = Male, role = P1 (LEEN).

6.1 MASC by gender and role (pooled)

p_masc_cond_pooled

Figure 7: MASC overall ToM score by gender (left) and experimental role (right), pooled across game conditions. Mann-Whitney U test with rank-biserial r effect size.
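For reference, the rank-biserial correlation can be derived directly from the Mann-Whitney U statistic as r = 2U / (n₁n₂) − 1. The sketch below assumes this definition and uses simulated scores for two hypothetical groups (`group_a`, `group_b`); `rank_biserial` is a hypothetical helper, and packages may use the opposite sign convention.

```r
# Rank-biserial correlation from the Mann-Whitney U statistic:
# r = 2U / (n1 * n2) - 1, where U counts the pairs with x > y
# (ties count one half). R reports this U as the statistic W.
rank_biserial <- function(x, y) {
  u <- unname(wilcox.test(x, y, exact = FALSE)$statistic)
  2 * u / (length(x) * length(y)) - 1
}

# Simulated ToM scores for two hypothetical groups.
set.seed(7)
group_a <- rnorm(60, mean = 32.0, sd = 3.3)
group_b <- rnorm(62, mean = 31.8, sd = 3.3)

mw <- wilcox.test(group_a, group_b, exact = FALSE)
cat(sprintf("W = %.0f, p = %.3f, r_rb = %.3f\n",
            mw$statistic, mw$p.value, rank_biserial(group_a, group_b)))
```

The statistic ranges from −1 (every value in the second group exceeds every value in the first) to +1 (the reverse), with 0 indicating complete overlap.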

6.2 OLS with demographic controls

gt_ols_masc

OLS: MASC accuracy ~ game + gender + role
OLS. Reference: game = BS, gender = Male, role = P1 (LEEN). 95% CI from confint().
Outcome	Predictor	β	SE	95% CI lo	95% CI hi	t	p
Outcome: Cognitive ToM (%)
Cognitive ToM (%)	Game: MP vs BS	0.006	0.020	-0.034	0.046	0.306	0.76
Cognitive ToM (%)	Game: PD vs BS	-0.021	0.021	-0.062	0.020	-1.030	0.305
Cognitive ToM (%)	Game: SH vs BS	0.001	0.020	-0.038	0.040	0.058	0.954
Cognitive ToM (%)	genderMale	0.014	0.014	-0.015	0.042	0.962	0.338
Cognitive ToM (%)	Role: CoCoLab vs LEEN	0.001	0.014	-0.028	0.029	0.042	0.966
Outcome: Affective ToM (%)
Affective ToM (%)	Game: MP vs BS	-0.011	0.023	-0.057	0.035	-0.473	0.637
Affective ToM (%)	Game: PD vs BS	0.016	0.024	-0.030	0.063	0.690	0.492
Affective ToM (%)	Game: SH vs BS	0.030	0.023	-0.015	0.075	1.299	0.197
Affective ToM (%)	genderMale	0.026	0.017	-0.007	0.059	1.575	0.118
Affective ToM (%)	Role: CoCoLab vs LEEN	-0.019	0.016	-0.052	0.013	-1.162	0.248
Outcome: Overall ToM (0–45)
Overall ToM (0–45)	Game: MP vs BS	-0.788	0.854	-2.479	0.903	-0.923	0.358
Overall ToM (0–45)	Game: PD vs BS	0.633	0.870	-1.091	2.356	0.727	0.469
Overall ToM (0–45)	Game: SH vs BS	0.750	0.840	-0.913	2.413	0.893	0.374
Overall ToM (0–45)	genderMale	0.140	0.610	-1.068	1.348	0.230	0.819
Overall ToM (0–45)	Role: CoCoLab vs LEEN	0.131	0.608	-1.073	1.336	0.216	0.83
β = OLS coefficient. * p < .05, ** p < .01, *** p < .001.

p_forest_masc8

Figure 8: Forest plot: OLS β with 95% CI for MASC outcomes (game + gender + role). Top panel: Overall ToM (0–45 scale); bottom panel: Affective ToM and Cognitive ToM overlaid (both proportion scale, 0–1). Free x-axis per panel. Dashed line = 0.

Note

Interpretation. Game coefficients represent the conditional effect of game assignment given equal gender and role composition. A game effect that is significant unconditionally (Kruskal-Wallis) but non-significant here suggests partial confounding by demographics.
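A model of this form can be fitted with `lm()` after setting the reference categories explicitly. The sketch below uses simulated data; the level names `Female` and `P2` and the outcome `tom` are hypothetical stand-ins, and the code behind `gt_ols_masc` is not shown in this report.

```r
# Simulated data with hypothetical factor levels (not the study data).
set.seed(2026)
n <- 122
sim <- data.frame(
  game   = factor(sample(c("BS", "MP", "PD", "SH"), n, replace = TRUE)),
  gender = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  role   = factor(sample(c("P1", "P2"), n, replace = TRUE)),
  tom    = rnorm(n, mean = 32, sd = 3.35)
)

# Set reference categories explicitly. R names each coefficient after the
# NON-reference level that it contrasts with the reference.
sim$game   <- relevel(sim$game,   ref = "BS")
sim$gender <- relevel(sim$gender, ref = "Male")
sim$role   <- relevel(sim$role,   ref = "P1")

fit <- lm(tom ~ game + gender + role, data = sim)

# Coefficient table with 95% confidence intervals from confint().
out <- cbind(coef(summary(fit)), confint(fit))
print(round(out, 3))
```

With gender = Male as the reference, the gender coefficient is named `genderFemale`; a coefficient named `genderMale` indicates that the other level is serving as the reference.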

7 Response times

7.1 MASC response times by dimension

tab_resp_times

Characteristic	Overall N = 122¹	BS N = 32¹	MP N = 30¹	PD N = 28¹	SH N = 32¹	p-value²	Effect size³
MASC – avg response time (all items)	11.4 (9.6, 13.7)	11.6 (9.6, 14.0)	11.1 (9.4, 13.9)	11.1 (9.5, 13.3)	11.7 (10.1, 14.0)	0.913	η² = -0.021 (negligible)
MASC – avg response time (affective)	11.5 (9.9, 14.1)	11.4 (9.7, 14.3)	11.3 (9.2, 14.1)	11.4 (9.7, 13.5)	11.6 (10.3, 14.3)	0.842	η² = -0.018 (negligible)
MASC – avg response time (cognitive)	11.4 (9.5, 13.7)	11.7 (9.2, 14.5)	11.3 (9.2, 14.4)	11.1 (9.4, 13.3)	11.6 (9.7, 13.4)	0.955	η² = -0.023 (negligible)
IRI – total time (28 items)	190.0 (158.0, 237.0)	184.0 (149.0, 237.5)	175.5 (159.0, 233.0)	192.0 (157.0, 230.0)	196.5 (165.5, 238.5)	0.740	η² = -0.015 (negligible)
CRT – total time (4 items)	58.0 (46.0, 76.0)	51.5 (42.0, 73.0)	57.0 (45.0, 76.0)	62.0 (51.5, 73.5)	58.0 (52.0, 83.0)	0.365	η² = 0.002 (negligible)
¹ Median (Q1, Q3)
² Kruskal-Wallis rank sum test
³ η² (Kruskal-Wallis). Small / medium / large: η² ≥ 0.01 / 0.06 / 0.14.

7.2 Speed–accuracy trade-off

p_masc_rt

Figure 9: Distribution of MASC response times by dimension (overall / affective / cognitive) across games. Faster response times may reflect overconfidence or heuristic use; slower times suggest deliberative mentalising.

p_masc_rt_cond

Figure 10: MASC average response times (Overall / Affective / Cognitive) by gender (top) and experimental role (bottom), pooled across game conditions. Mann-Whitney U test with rank-biserial r annotated per facet.

Note

Response time differences by gender and role. A significant Mann-Whitney result would indicate that one group responds systematically faster or slower across MASC dimensions — potentially reflecting differences in deliberative mentalising strategies, not just accuracy. Results should be interpreted alongside the ToM score conditioning (see § Conditioning on gender and role).

8 Preliminary interpretation

The sample shows a median MASC ToM score of 32 out of 45 items (Q1 = 30, Q3 = 34; IQR = 4), consistent with adequate mentalising ability in a non-clinical adult population. Within individuals, affective accuracy (median 66.7%) exceeds cognitive accuracy (median 55.6%): the Wilcoxon signed-rank test yields p = 5.3 × 10^{-8} with a moderate effect size (r = 0.49) and a Hodges-Lehmann pseudo-median difference of 0.065 (95% CI 0.046 to 0.083), indicating a statistically significant difference between the two ToM dimensions at the sample level.

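Differences in MASC profiles across games are informative to the extent that randomisation was imperfect or that participant sorting occurred. In the descriptive table, only diminishing errors differ across games at the 5% level (Kruskal-Wallis p = 0.026, η² = 0.053, small); correct ToM, exceeding errors, and the affective and cognitive proportions do not (all p > 0.12). Significant Kruskal-Wallis effects will be carried forward as potential covariates in the inferential sections.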