CEFR-Annotated WordNet: LLM-Based Proficiency-Guided Semantic Database for Language Learning
Although WordNet is a valuable resource because of its structured semantic networks and extensive vocabulary, its fine-grained sense distinctions can be challenging for second-language learners. To address this issue, we developed a version of WordNet annotated with the Common European Framework of Reference for Languages (CEFR), integrating its semantic networks with language-proficiency levels. We automated this process using a large language model to measure the semantic similarity between sense definitions in WordNet and entries in the English Vocabulary Profile Online. To validate our approach, we constructed a large-scale corpus containing both sense and CEFR-level information from the annotated WordNet and used it to develop contextual lexical classifiers. Our experiments demonstrate that models fine-tuned on this corpus perform comparably to those fine-tuned on gold-standard annotations. Furthermore, by combining this corpus with the gold-standard data, we developed a practical classifier that achieves a Macro-F1 score of 0.81. This result provides indirect evidence that the transferred labels are largely consistent with the gold-standard levels. The annotated WordNet, corpus, and classifiers are publicly available to help bridge the gap between natural language processing and language education, thereby facilitating more effective and efficient language learning.
💡 Research Summary
The paper tackles a long‑standing obstacle in leveraging WordNet for second‑language (L2) instruction: its fine‑grained sense distinctions and the sheer number of senses per word impose a heavy cognitive load on learners. To make WordNet learner‑friendly, the authors integrate the Common European Framework of Reference for Languages (CEFR) proficiency levels directly onto WordNet senses, creating a “CEFR‑Annotated WordNet.” The CEFR provides six standardized proficiency tiers (A1, A2, B1, B2, C1, C2) that describe concrete communicative abilities, making it an ideal educational meta‑label.
The annotation pipeline consists of three steps. First, for each lemma‑POS pair, glosses are extracted from both WordNet and the English Vocabulary Profile (EVP) Online, which already supplies CEFR levels for individual senses. Second, a large language model (LLM)—specifically GPT‑4o—is prompted to rate the semantic similarity between a WordNet gloss and an EVP gloss on a seven‑point scale (1 = identical, 7 = completely different). This graded rating, rather than a binary alignment decision, allows partial overlaps to be captured. Third, when the similarity score is 1 or 2 (identical or almost identical), the CEFR level attached to the EVP gloss is transferred to the corresponding WordNet sense; otherwise, no label is assigned. The threshold of ≤ 2 balances coverage (≈ 80 % of single‑word EVP senses) against precision.
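The third step above can be sketched in a few lines. This is a minimal illustration, not the authors' code: `transfer_label` and its arguments are hypothetical names, and the LLM call that produces the similarity score is assumed to happen elsewhere.

```python
# Sketch of the score-threshold transfer (step 3 of the pipeline).
# The similarity score is assumed to come from the GPT-4o prompt described
# above; function and variable names here are illustrative only.

SIMILARITY_THRESHOLD = 2  # 1 = identical, 2 = almost identical

def transfer_label(similarity_score, evp_cefr_level):
    """Transfer the EVP CEFR level to a WordNet sense only when the LLM
    judged the two glosses identical or almost identical."""
    if similarity_score <= SIMILARITY_THRESHOLD:
        return evp_cefr_level
    return None  # weaker matches receive no label

print(transfer_label(2, "B1"))  # B1
print(transfer_label(5, "B1"))  # None
```

Raising the threshold would label more senses at the cost of precision; the paper's choice of ≤ 2 is the reported coverage/precision trade-off.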
Applying this automatic mapping yields CEFR labels for 10,644 WordNet senses covering 5,645 lemmas. To demonstrate the practical utility of the resource, the authors propagate the labels onto the SemCor 3.0 corpus, producing a “SemCor‑CEFR” dataset with more than 110,000 sense‑level CEFR annotations across nouns, verbs, adjectives, and adverbs. This corpus is roughly ten times larger than existing lexical‑complexity datasets such as CompLex 2.0, and it aligns directly with WordNet’s sense inventory.
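Because SemCor tokens carry sense annotations, propagating the labels reduces to a lookup from sense identifier to CEFR level. The sketch below uses invented sense keys and a plain dictionary; the actual resource's file format and identifiers may differ.

```python
# Sketch of propagating sense-level CEFR labels onto a sense-tagged corpus.
# `cefr_by_sense` stands in for the CEFR-Annotated WordNet mapping, and the
# token dicts for SemCor-style annotations; all keys/values are illustrative.

cefr_by_sense = {
    "dog%1:05:00::": "A1",
    "run%2:38:00::": "A1",
    "intricate%3:00:00::": "C1",
}

tokens = [
    {"surface": "dog", "sense_key": "dog%1:05:00::"},
    {"surface": "ran", "sense_key": "run%2:38:00::"},
    {"surface": "quickly", "sense_key": None},  # token without a mapped sense
]

# Tokens whose sense is not in the mapping simply get no CEFR label (None).
labeled = [{**tok, "cefr": cefr_by_sense.get(tok["sense_key"])} for tok in tokens]

for tok in labeled:
    print(tok["surface"], tok["cefr"])
```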
Two families of contextual CEFR‑level classifiers are then trained. The first uses only the automatically annotated data; the second combines the gold‑standard EVP annotations with the automatically transferred labels (a hybrid training set). Both models are fine‑tuned from a BERT‑style encoder, and their performance is evaluated with Macro‑F1. The hybrid model achieves a Macro‑F1 of 0.81, comparable to a model trained solely on gold‑standard data, thereby providing indirect evidence that the automatic transfers are largely consistent with human‑curated labels.
The paper’s contributions are fourfold: (1) a CEFR‑Annotated WordNet linking lexical semantics to proficiency standards; (2) a prompt‑only LLM annotation method that is simple, reproducible, and cost‑effective; (3) the large‑scale SemCor‑CEFR corpus, publicly released; and (4) contextual lexical classifiers that validate the annotation quality and offer a ready‑to‑use tool for educational applications. All resources are made available via Zenodo (doi:10.5281/zenodo.17395388).
Limitations are acknowledged. The similarity judgments rely on a single LLM and a manually chosen score threshold, which may introduce subjectivity and affect the precision‑recall trade‑off. The approach is currently limited to English WordNet; extending it to multilingual WordNets and incorporating human validation would strengthen the resource. Nonetheless, the study bridges a gap between natural language processing and language pedagogy, providing a scalable infrastructure for learner‑adapted vocabulary acquisition, automated assessment, and CEFR‑aware CALL systems.