Navigating Taxonomic Expansions of Entity Sets Driven by Knowledge Bases

Reading time: 5 minutes

📝 Original Info

  • Title: Navigating Taxonomic Expansions of Entity Sets Driven by Knowledge Bases
  • ArXiv ID: 2512.16953
  • Date: 2025-12-17
  • Authors: Giovanni Amendola, Pietro Cofone, Marco Manna, Aldo Ricioppo

📝 Abstract

Recognizing similarities among entities is central to both human cognition and computational intelligence. Within this broader landscape, Entity Set Expansion is one prominent task aimed at taking an initial set of (tuples of) entities and identifying additional ones that share relevant semantic properties with the former, potentially repeating the process to form increasingly broad sets. However, this “linear” approach does not unveil the richer “taxonomic” structures present in knowledge resources. A recent logic-based framework introduces the notion of an expansion graph: a rooted directed acyclic graph where each node represents a semantic generalization labeled by a logical formula, and edges encode strict semantic inclusion. This structure supports taxonomic expansions of entity sets driven by knowledge bases. Yet, the potentially large size of such graphs may make full materialization impractical in real-world scenarios. To overcome this, we formalize reasoning tasks that check whether two tuples belong to comparable, incomparable, or the same nodes in the graph. Our results show that, under realistic assumptions (such as bounding the input or limiting entity descriptions), these tasks can be implemented efficiently. This enables local, incremental navigation of expansion graphs, supporting practical applications without requiring full graph construction.

💡 Deep Analysis

Figure 1

📄 Full Content

1. Introduction

1.1. Context

Similarities play a key role in many real-world scenarios, underpinning both cognitive processes and computational tasks [H+23]. Humans routinely identify similarities and differences among entities or sets of entities to categorize objects, assess qualities, make informed choices, and stimulate creative thinking. At the heart of these activities lies the recognition of interconnected properties shared by entities, hereinafter referred to as nexus of similarity. This capacity extends to both tangible and intangible domains, shaping reasoning patterns and influencing behaviors in diverse contexts, from everyday judgments to strategic planning.

Researchers from various fields have proposed a variety of approaches to measure semantic similarity between entities, typically expressed as descriptive ratings or numerical scores [GF13]. For instance, modern systems can assign a high similarity score to the pair ⟨Paris⟩ and ⟨Rome⟩ by taking into account that both are “European cities”, “places situated on rivers”, “capitals”, and so forth.

Inspired by “Google Sets” [Cir07], considerable academic and commercial effort has also been devoted to expanding a given set of entities with similar ones. The main tasks are: entity set expansion [PCB+09], entity recommendation [BCMT13], tuples expansion [EAB16], and entity suggestion [Z+17]. For example, one can expand the set U = {⟨Paris⟩, ⟨Rome⟩} and obtain U′ = U ∪ {⟨Amsterdam⟩}; then, one can reapply the process starting from U′ to obtain the set U′′ = U′ ∪ {⟨Brussels⟩, ⟨Rio de Janeiro⟩, ⟨Vienna⟩}. Indeed, all these entities share one or more of the aforementioned properties, e.g., “places situated on rivers” or “European cities”.
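The linear chain U ⊂ U′ ⊂ U′′ described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy property sets, the names `TOY_KB`, `common_properties`, and `expand`, and the threshold-based relaxation are hypothetical and are not taken from any of the cited systems.

```python
# Toy data: entity -> set of properties (illustrative, not taken from a real KB).
TOY_KB = {
    "Paris": {"city", "european", "capital"},
    "Rome": {"city", "european", "capital"},
    "Amsterdam": {"city", "european", "capital"},
    "Vienna": {"city", "european", "capital"},
    "Milan": {"city", "european"},
    "Tokyo": {"city", "capital"},
    "Danube": {"river", "european"},
}


def common_properties(entities, kb):
    """Properties shared by every entity in `entities`."""
    return set.intersection(*(kb[e] for e in entities))


def expand(seed, kb, min_shared):
    """Add every entity having at least `min_shared` of the seed's common properties."""
    shared = common_properties(seed, kb)
    return set(seed) | {e for e, props in kb.items() if len(props & shared) >= min_shared}


u0 = {"Paris", "Rome"}
u1 = expand(u0, TOY_KB, min_shared=3)  # strict step: all three shared properties
u2 = expand(u1, TOY_KB, min_shared=2)  # relaxed step: any two of them
print(sorted(u1))
print(sorted(u2))
```

In this toy run, Milan and Tokyo enter at the same step even though they share different properties with the seed. A linear chain of sets does not record that difference.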

Complementary approaches, ranging from Description Logics [CBH92] to Semantic Web [CDGS16, PKGH19] and Database Theory [tCDFL23], studied the task of explaining (i.e., recognizing and formally expressing) nexus of similarity (a.k.a. commonalities) between entities within a Knowledge Base (KB) [CBH92, BKM99, BST07, CDGS16, HGJ17, PSGH17, PKGH19, JLW20, JLPW22, tCDFL23, CDS24, C+25]. Such approaches vary across some key dimensions: the form of the input (e.g., pairs of entities, sets of entities, sets of entity tuples); the type of KBs they can handle (e.g., Description Logic KBs, RDF documents, relational databases); the scope of knowledge considered for each input item (e.g., the entire KB or selected excerpts); and the specific formalism to express commonalities (e.g., Description Logics Concepts, r-graphs, (U)CQs, rooted-CQs, SPARQL queries). For example, to express that the tuples of U = {⟨Paris⟩, ⟨Rome⟩} are “European cities”, one can use formulas of the following shapes:

city ⊓ ∃located.{Europe} or

x ← city(x), located(x, Europe) or

x ← isa(x, city), located(x, Europe).
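To make the two rule-shaped formulas above concrete, here is a minimal sketch of evaluating such a conjunctive query over a handful of facts. The fact sets, the `?x` variable convention, and the helpers `match` and `answers` are hypothetical choices made for this illustration; they are not part of any formalism cited above.

```python
def is_var(term):
    """Variables are written with a leading '?', e.g. '?x' (a convention of this sketch)."""
    return term.startswith("?")


def match(atom, fact, binding):
    """Extend `binding` so that `atom` matches `fact`; return None if impossible."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    new = dict(binding)
    for term, value in zip(atom[1:], fact[1:]):
        if is_var(term):
            if new.setdefault(term, value) != value:
                return None  # variable already bound to a different constant
        elif term != value:
            return None  # constant mismatch
    return new


def answers(head_var, body, facts):
    """Values of `head_var` over all bindings that satisfy every atom in `body`."""
    bindings = [{}]
    for atom in body:
        bindings = [b2 for b in bindings for f in facts
                    if (b2 := match(atom, f, b)) is not None]
    return {b[head_var] for b in bindings}


# Toy facts for the shape  x <- city(x), located(x, Europe)
FACTS_UNARY = {
    ("city", "Paris"), ("city", "Rome"), ("city", "Tokyo"),
    ("located", "Paris", "Europe"), ("located", "Rome", "Europe"),
    ("located", "Tokyo", "Asia"),
}
# Toy facts for the shape  x <- isa(x, city), located(x, Europe)
FACTS_ISA = {
    ("isa", "Paris", "city"), ("isa", "Rome", "city"), ("isa", "Tokyo", "city"),
    ("located", "Paris", "Europe"), ("located", "Rome", "Europe"),
    ("located", "Tokyo", "Asia"),
}

q_unary = [("city", "?x"), ("located", "?x", "Europe")]
q_isa = [("isa", "?x", "city"), ("located", "?x", "Europe")]
print(sorted(answers("?x", q_unary, FACTS_UNARY)))
print(sorted(answers("?x", q_isa, FACTS_ISA)))
```

Both queries select the same entities in this toy setting; only the vocabulary used to encode "being a city" differs.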

To explain, in a comprehensive way, the nexus of similarity between tuples of entities, however, it is crucial to have a suitable formal semantics on top of the given formalism expressing commonalities. To this aim, Amendola et al. [AMR24] designed a unifying general logic-based framework, endowed with an appropriate semantics, for characterizing nexus of similarity within KBs, that is, for explaining them in a formal and comprehensive way. In particular, this framework is able to: (i) accommodate different types of KBs; (ii) adopt a notion of summary selector to focus on relevant knowledge about the input items; (iii) handle sets of entity tuples as input; (iv) ensure that nexus explanations and nexus characterizations always exist, and admit concise and human-understandable equivalent representations; and (v) extend the classical notion of linear expansions (i.e., U ⊂ U′ ⊂ U′′ ⊂ ...), to generalize (tuples of) entities in a taxonomic way.

More precisely, the framework is based on the notion of a selective KB, denoted by S = (K, ς), which enriches any knowledge base K with a so-called summary selector ς. For each tuple τ of entities, ς identifies a relevant portion of the knowledge “entailed” by K that meaningfully describes τ. To formally express commonalities in this setting, the framework introduces a dedicated nexus explanation language, called NCF, equipped with suitable semantics. Within this language, one can define NCF-formulas playing the role of explanations and characterizations for tuples of entities, together with succinct forms of the latter called canonical characterizations and core characterizations. Importantly, all these formulas are guaranteed to exist and are effectively computable. In addition, the framework defines an expansion graph, which generalizes the classical notion of linear expansion by capturing taxonomic structures that naturally emerge from similarity relations. These structures can be intuitively viewed as taxonomic entity set expansions, reflecting how humans group and generalize concepts along meaningful semantic hierarchies. Key reasoning tasks related to th
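The taxonomic view, and the abstract's idea of checking whether two tuples fall in the same, comparable, or incomparable nodes without building the whole graph, can be sketched in a drastically simplified form. This is an assumption-laden toy, not the framework's definitions or algorithms: plain property sets stand in for the summaries chosen by a summary selector, set intersection stands in for an NCF characterization, and the names `SUMMARIES`, `Relation`, `shared_with_seed`, and `compare` are hypothetical.

```python
from enum import Enum


class Relation(Enum):
    SAME = "same node"
    COMPARABLE = "comparable nodes"
    INCOMPARABLE = "incomparable nodes"


# Toy stand-in for per-entity summaries (illustrative data only).
SUMMARIES = {
    "Paris": {"city", "european", "capital"},
    "Rome": {"city", "european", "capital"},
    "Amsterdam": {"city", "european", "capital"},
    "Vienna": {"city", "european", "capital"},
    "Milan": {"city", "european"},
    "Tokyo": {"city", "capital"},
}


def shared_with_seed(seed, entity, summaries):
    """Properties common to every seed entity and to `entity`."""
    return set.intersection(*(summaries[e] for e in seed), summaries[entity])


def compare(seed, a, b, summaries):
    """Relate a and b using only the seed and their own summaries (no graph is built)."""
    pa = shared_with_seed(seed, a, summaries)
    pb = shared_with_seed(seed, b, summaries)
    if pa == pb:
        return Relation.SAME
    if pa < pb or pb < pa:  # one generalization strictly refines the other
        return Relation.COMPARABLE
    return Relation.INCOMPARABLE


seed = {"Paris", "Rome"}
print(compare(seed, "Amsterdam", "Vienna", SUMMARIES).value)
print(compare(seed, "Vienna", "Milan", SUMMARIES).value)
print(compare(seed, "Milan", "Tokyo", SUMMARIES).value)
```

Each call inspects only the seed and the two entities being compared, which mirrors, in spirit only, the local and incremental navigation the abstract describes.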


Reference

This content is AI-processed based on open access ArXiv data.
