Maximizing Diver Score by Examining Discrepancies in Diver Competency and Judges' Marks


Central to diving competitions is the diver's "dive list", the list of dives an athlete will perform during a competition. Creating a dive list that contains enough difficulty to be competitive yet not beyond the capability of the diver is an important consideration in diving. In this work, we examine the discrepancy between a diver's ability and judges' scores in springboard diving meets, with the purpose of discovering biases in scoring that might aid a diver in completing a dive list. As a measure of a diver's ability, we calculate a mean score over all dives and all meets in which the diver has participated; we call this mean the diver's competency score. We use the difference between judges' scores within a given meet and the diver's competency to define a discrepancy: the difference between a judge's estimation of a diver's ability and the diver's true ability. The notions of competency and discrepancy are applied to a data set gathered from divemeets.com covering high-school one-meter diving competitions in the US from 2017 to 2022.


💡 Research Summary

The paper investigates scoring discrepancies in U.S. high‑school one‑meter springboard diving by comparing each diver's "competency score" – defined as the average of all scores the diver has earned across all meets from 2017 to 2022 – with the individual judges' marks awarded in specific meets. Using a dataset scraped from divemeets.com, the authors assembled roughly 38,000 rows of data covering athletes aged 14–19, both genders, and a variety of meet types (dual and prestige). For each dive the dataset records the degree of difficulty (DD, ranging from 1.2 to 3.2), the round number (1–11), direction, position, number of half‑rotations, each judge's raw score, the trimmed‑mean net score (with the fraction of scores trimmed determined by the size of the judging panel), and the final "award" score obtained by multiplying the net score by the DD.
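As a concrete illustration of that scoring pipeline, here is a minimal Python sketch of the net and award computation. The function name `award_score`, the one-score-from-each-end trimming default, and the sample panel are illustrative assumptions; as noted above, the amount of trimming in the actual data depends on the number of judges.

```python
import numpy as np

def award_score(judge_scores, dd, n_trim=1):
    """Sketch of the scoring pipeline described above (names are illustrative).

    judge_scores : raw 0-10 marks from the judging panel
    dd           : degree of difficulty (1.2-3.2 in this dataset)
    n_trim       : scores dropped from each end; larger panels trim more,
                   so a fixed default of 1 is an assumption.
    """
    s = np.sort(np.asarray(judge_scores, dtype=float))
    trimmed = s[n_trim:len(s) - n_trim]   # drop the extreme marks
    net = trimmed.mean()                  # trimmed-mean net score
    return net * dd                       # final "award" score

# A 5-judge panel scoring a 2.4-DD dive:
print(award_score([6.0, 6.5, 7.0, 7.0, 8.0], dd=2.4))  # ~16.4
```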

The methodological framework proceeds in three steps. First, a diver’s competency score is calculated as the simple arithmetic mean of that diver’s award scores across all recorded dives. Second, for every judge‑diver‑round observation the authors compute a “discrepancy” as the difference between the judge’s raw score and the diver’s competency score, interpreting a positive discrepancy as the judge awarding more than the diver’s historical average would predict. Third, these discrepancies are aggregated by difficulty level and by round to assess two primary forms of bias: difficulty bias (systematic over‑ or under‑scoring of high‑DD dives) and round bias (systematic variation of scores across the sequence of rounds).
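In code, these first two steps reduce to a groupby and a subtraction. The sketch below uses toy rows and illustrative column names (diver, judge, raw, award), since the paper does not publish a schema:

```python
import pandas as pd

# Toy rows shaped like the divemeets.com data described above.
df = pd.DataFrame({
    "diver": ["A", "A", "A", "B", "B"],
    "judge": ["J1", "J2", "J3", "J1", "J2"],
    "raw":   [6.5, 7.0, 6.0, 5.5, 5.0],     # individual judge marks
    "award": [15.6, 15.6, 15.6, 9.9, 9.9],  # net score x DD, one value per dive
})

# Step 1: competency = mean award score over all of a diver's recorded dives.
competency = df.groupby("diver")["award"].mean().rename("competency")

# Step 2: discrepancy = judge's raw score minus the diver's competency.
df = df.join(competency, on="diver")
df["discrepancy"] = df["raw"] - df["competency"]
print(df[["diver", "judge", "discrepancy"]])
```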

The analysis reveals that judges tend to award slightly higher raw scores on high‑DD dives than would be expected from the diver’s overall average. This difficulty bias suggests a strategic opportunity: divers could place their most difficult dives early in a meet to capitalize on the inflated scoring tendency. Additionally, the authors observe a modest but consistent pattern of round‑related variation—scores are marginally higher in early rounds, dip in middle rounds, and rise again toward the final rounds—indicating a possible round bias. Because the dataset lacks information on dive order, seeding, or judge affiliation, the study cannot evaluate conformity, order, or reputation biases that have been documented in elite international competitions.
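The aggregation behind these two comparisons amounts to a pair of groupby operations; a sketch under the same assumed column names (dd, round, discrepancy), with illustrative DD bin edges:

```python
import pandas as pd

def bias_tables(df: pd.DataFrame):
    """Mean discrepancy by difficulty bin and by round (columns are assumed)."""
    # Difficulty bias: average discrepancy within DD bins.
    dd_bins = pd.cut(df["dd"], bins=[1.2, 1.8, 2.4, 3.2], include_lowest=True)
    difficulty_bias = df.groupby(dd_bins, observed=True)["discrepancy"].mean()

    # Round bias: average discrepancy per round (1-11 in this dataset).
    round_bias = df.groupby("round")["discrepancy"].mean()
    return difficulty_bias, round_bias
```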

Several limitations are acknowledged. The competency score is a crude average that does not capture temporal improvement, intra‑season variability, or the effect of different meet types (dual vs. prestige). The use of a trimmed‑mean net score follows standard practice but the paper does not test the sensitivity of results to the trimming proportion or to individual judge consistency. No formal statistical modeling (e.g., mixed‑effects regression) is employed to quantify the significance of the observed biases, nor are confidence intervals reported for discrepancy estimates. Moreover, the absence of dive‑order and seeding data precludes analysis of sequential or seed bias, which are known to affect subjective scoring in other sports.
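For readers reproducing the analysis, one simple remedy for the missing interval estimates is a percentile bootstrap over each group's discrepancies. This is a suggestion, not the authors' procedure:

```python
import numpy as np

def bootstrap_ci(discrepancies, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for a mean discrepancy (not from the paper)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(discrepancies, dtype=float)
    # Resample with replacement and record each resample's mean.
    boots = rng.choice(x, size=(n_boot, x.size), replace=True).mean(axis=1)
    lo, hi = np.quantile(boots, [alpha / 2, 1 - alpha / 2])
    return lo, hi
```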

Despite these constraints, the work makes a valuable contribution by assembling a large, publicly available high‑school diving dataset and by introducing the competency‑discrepancy framework as a practical tool for athletes and coaches. The findings provide actionable insight: divers can tailor their dive lists to exploit identified scoring tendencies, potentially improving competitive outcomes without altering technical execution.

Future research directions proposed include: (1) extending the competency model to a hierarchical or longitudinal framework that accounts for age, gender, and meet prestige; (2) estimating individual judge bias parameters using mixed‑effects models to produce calibrated scores; (3) integrating video‑based objective metrics (e.g., splash size, rotation speed) to develop a hybrid “fairness score” that blends subjective and objective components; and (4) collecting additional metadata such as dive order, seeding, and judge affiliation to enable a comprehensive assessment of conformity, order, and reputation biases. Such extensions would deepen our understanding of scoring dynamics in diving and could inform policy changes aimed at enhancing fairness and transparency in the sport.
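As a sketch of direction (2), a mixed-effects model in statsmodels could attach a random intercept to each judge, absorbing that judge's systematic leniency or severity, while DD and round enter as fixed effects. The column names and model form below are assumptions, not the paper's specification:

```python
import pandas as pd
import statsmodels.formula.api as smf

def fit_judge_bias(df: pd.DataFrame):
    """Random intercept per judge; DD and round as fixed effects.

    Expects one row per judge-diver-round observation with columns
    'discrepancy', 'dd', 'round', and 'judge' (illustrative names).
    """
    model = smf.mixedlm("discrepancy ~ dd + C(round)",
                        data=df, groups=df["judge"])
    return model.fit()

# The fitted random effects give each judge's estimated bias:
# fit_judge_bias(df).random_effects
```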

