Modeling Judges’ Scores in Artistic Gymnastics

Table 2: Comparison of the judges’ original scores for the evaluation trials and the predicted scores from the different model variants (means ± standard deviations), as well as the results of the Wilcoxon rank-sum test.

	Model Variant
Apparatus	Best/Worst	Z	p	Nearest Neighbor	Z	p	Three out of Five	Z	p	Recurrent Neural Network	Z	p
Floor	3.73 ± 1.22	1.47	.140	3.46 ± 1.75	1.85	.063	3.84 ± 1.69	0.34	.735	3.70 ± 0.79	2.18	.029*
Beam	3.87 ± 2.38	0.85	.394	3.65 ± 2.13	0.31	.756	3.91 ± 2.11	0.89	.372	4.30 ± 1.00	0.86	.390
Vault	3.30 ± 2.10	0.06	.947	3.34 ± 1.89	0.07	.946	3.53 ± 1.75	0.39	.695	3.10 ± 1.11	1.88	.060

Note: * denotes a statistically significant difference between the original and predicted scores.
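
To illustrate how the Z and p columns relate, the following is a minimal sketch of a Wilcoxon rank-sum test using the large-sample normal approximation. It makes several assumptions that the table does not confirm: that the reported Z values are absolute values, that p is two-sided, and that no tie or continuity correction was applied. The function names (`two_sided_p`, `midranks`, `rank_sum_z`) and the toy samples are hypothetical and are not taken from the study.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def midranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def rank_sum_z(x, y):
    """Rank-sum Z for sample x against sample y (normal approximation,
    without tie or continuity correction; an assumption of this sketch)."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    w = sum(ranks[:n1])                      # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2.0        # expected rank sum under H0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return (w - mean_w) / sd_w


# Z values taken from the Recurrent Neural Network column of Table 2.
print(round(two_sided_p(2.18), 3))  # Floor: 0.029
print(round(two_sided_p(1.88), 3))  # Vault: 0.06

# Hypothetical toy samples, not data from the study.
z = rank_sum_z([1, 2, 3], [4, 5, 6])
print(round(z, 3), round(two_sided_p(z), 3))
```

Under these assumptions, the Floor and Vault values for the recurrent neural network reproduce the tabulated p-values of .029 and .060. Small deviations in the third decimal for other cells are expected, because the Z values in the table are rounded to two decimals.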