Redundancy, Deduction Schemes, and Minimum-Size Bases for Association Rules
Association rules are among the most widely employed data analysis methods in the field of Data Mining. An association rule is a form of partial implication between two sets of binary variables. In the most common approach, association rules are parameterized by a lower bound on their confidence, which is the empirical conditional probability of their consequent given the antecedent, and/or by some other parameter bounds such as “support” or deviation from independence. We study here notions of redundancy among association rules from a fundamental perspective. We see each transaction in a dataset as an interpretation (or model) in the propositional logic sense, and consider existing notions of redundancy, that is, of logical entailment, among association rules, of the form “any dataset in which this first rule holds must also satisfy that second rule, therefore the second is redundant”. We discuss several existing alternative definitions of redundancy between association rules and provide new characterizations and relationships among them. We show that the main alternatives we discuss correspond actually to just two variants, which differ in the treatment of full-confidence implications. For each of these two notions of redundancy, we provide a sound and complete deduction calculus, and we show how to construct complete bases (that is, axiomatizations) of absolutely minimum size in terms of the number of rules. Finally, we explore an approach to redundancy with respect to several association rules, and fully characterize its simplest case of two partial premises.
💡 Research Summary
This paper conducts a thorough logical investigation of redundancy among association rules, a cornerstone technique in data mining. An association rule X → Y is interpreted as a partial implication between two itemsets, characterized by confidence (the empirical conditional probability of Y given X) and support (the frequency of X∪Y). The authors model each transaction as a propositional interpretation and define redundancy as logical entailment: a rule r₂ is redundant with respect to rule r₁ if every dataset in which r₁ holds also forces r₂ to hold.
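The two parameters above can be made concrete with a few lines of code. The following is a minimal sketch (the dataset and function names are illustrative, not from the paper) computing the support of an itemset and the confidence of a rule X → Y over a list of transactions:

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Empirical conditional probability of Y given X: supp(X ∪ Y) / supp(X)."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

transactions = [
    {"bread", "milk"},
    {"bread", "milk", "butter"},
    {"bread"},
    {"milk"},
]

# X = {bread}, Y = {milk}: 3 of 4 transactions contain bread, 2 of those also milk.
print(support({"bread", "milk"}, transactions))       # 0.5
print(confidence({"bread"}, {"milk"}, transactions))  # 0.666...
```

A rule with confidence 1 (every transaction containing X also contains Y) is what the summary below calls a full-confidence implication.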
The literature contains many ad‑hoc definitions of redundancy, often based on comparing confidence and support thresholds. By systematically reviewing these proposals, the authors discover that, fundamentally, only two distinct notions of redundancy exist, differing solely in how they treat full‑confidence implications (rules with confidence = 1).
Variant A treats full‑confidence implications as ordinary rules. In this setting the classic Armstrong axioms (reflexivity, augmentation, transitivity) no longer apply directly because transitivity does not preserve confidence: from X→Y and Y→Z with confidence ≥ γ we can only infer X→Z with a lower bound γ² (for γ < 1). Augmentation fails entirely. The authors therefore introduce a new sound and complete deductive calculus consisting of:
- Reflexivity, together with the equivalence of X→Y and X→X∪Y (adding the antecedent to the consequent changes neither confidence nor support);
- Partial transitivity (the γ² bound);
- Interaction rules that describe how a full‑confidence implication can be combined with a partial rule without degrading confidence.
They prove that this system derives exactly the set of rules that are semantically entailed under Variant A.
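The failure of confidence-preserving transitivity, and the tightness of the γ² bound, can be checked on a small constructed dataset (made up for illustration, not taken from the paper). Here conf(a→b) = conf(b→c) = 0.5, yet conf(a→c) is only 0.25 = 0.5²:

```python
def conf(X, Y, transactions):
    """Confidence of the rule X -> Y: among transactions covering X,
    the fraction that also cover Y."""
    covers = [t for t in transactions if X <= t]
    return sum(1 for t in covers if Y <= t) / len(covers)

# 'b' occurs only with 'a'; 'c' occurs only with 'a' and 'b'.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a"}, {"a"}]

print(conf({"a"}, {"b"}, transactions))  # 0.5
print(conf({"b"}, {"c"}, transactions))  # 0.5
print(conf({"a"}, {"c"}, transactions))  # 0.25 — the γ² lower bound is attained
```

Since γ² < γ for any γ < 1, chaining two partial rules can never be assumed to stay above the original confidence threshold.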
Variant B separates full‑confidence implications from partial rules. Full‑confidence implications are handled with the traditional Armstrong axioms (they form a Horn theory closed under set‑intersection), while partial rules are governed by the new calculus of Variant A. This separation exploits the combinatorial simplicity of Horn implications and yields strictly smaller redundancy‑free bases.
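The Horn machinery underlying Variant B amounts to a closure computation. As a sketch (the pair-of-frozensets representation is an assumption for illustration), the closure of an itemset under a set of full-confidence implications can be computed by forward chaining to a fixpoint:

```python
def closure(itemset, implications):
    """Smallest superset of `itemset` closed under the given implications,
    each implication being a (antecedent, consequent) pair of frozensets."""
    closed = set(itemset)
    changed = True
    while changed:
        changed = False
        for antecedent, consequent in implications:
            if antecedent <= closed and not consequent <= closed:
                closed |= consequent
                changed = True
    return closed

implications = [
    (frozenset({"a"}), frozenset({"b"})),
    (frozenset({"a", "b"}), frozenset({"c"})),
]

print(closure({"a"}, implications))  # {'a', 'b', 'c'}
```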
For each variant the paper constructs a basis (a minimal generating set of rules) of absolute minimum size. The authors show that the previously known “Essential Rules” or “Representative Rules” are precisely the minimal bases for Variant B, providing the first formal proof of their optimality. The construction relies on the lattice of closed itemsets: each closed frequent set contributes at most one rule to the basis, and the basis size equals the number of non‑redundant closed sets.
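The closed itemsets mentioned above come from a standard Galois-connection construction: the closure of an itemset with respect to a dataset is the intersection of all transactions containing it, and the closed sets are exactly the fixpoints of this operator. A minimal sketch (dataset and names illustrative):

```python
def dataset_closure(itemset, transactions):
    """Intersection of all transactions containing `itemset`;
    an itemset is closed iff it equals its own closure."""
    covering = [t for t in transactions if itemset <= t]
    if not covering:
        return set(itemset)  # unsupported itemset: no transaction constrains it
    result = set(covering[0])
    for t in covering[1:]:
        result &= t
    return result

transactions = [{"a", "b", "c"}, {"a", "b"}, {"b", "c"}]

print(dataset_closure({"a"}, transactions))       # {'a', 'b'} — {'a'} is not closed
print(dataset_closure({"a", "b"}, transactions))  # {'a', 'b'} — already closed
```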
Beyond pairwise redundancy, the authors explore redundancy with respect to multiple premises. While naïve intuition suggests that combining two partial rules of confidence ≥ γ would always lower confidence, they identify a special structural case where a new rule of confidence at least γ is non‑trivially entailed by two premises. They fully characterize this case (essentially when the antecedents and consequents overlap in a specific way) and introduce an additional inference rule to capture it.
The paper also discusses practical implications. By separating full‑confidence implications, one can compute closures once and reuse them for all partial rules, dramatically reducing the computational burden. The minimal bases enable users to inspect a compact set of representative rules without loss of information, and the deductive calculus can be employed to generate any omitted rule on demand.
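The reuse idea rests on a simple invariant: an itemset and its closure have the same support, so closures can be computed and cached once and then shared across all candidate partial rules. A small sketch under these assumptions (the caching scheme and names are illustrative, not the paper's code):

```python
transactions = [{"a", "b", "c"}, {"a", "b"}, {"b", "c"}]
_cache = {}

def closure(itemset):
    """Intersection of the transactions containing `itemset`, memoized."""
    key = frozenset(itemset)
    if key not in _cache:
        covering = [t for t in transactions if key <= t]
        result = set(covering[0])
        for t in covering[1:]:
            result &= t
        _cache[key] = frozenset(result)
    return _cache[key]

def supp(itemset):
    return sum(1 for t in transactions if set(itemset) <= t) / len(transactions)

X = {"a"}
# Support is invariant under closure, so confidences of many rules can be
# evaluated on the (cached) closed sets instead of the raw itemsets.
assert supp(X) == supp(closure(X))
```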
In conclusion, the work clarifies the theoretical landscape of redundancy in association rule mining, provides two rigorously justified notions of redundancy, supplies sound and complete inference systems for each, and proves that the known essential‑rule bases are size‑optimal. It opens avenues for further research on redundancy involving larger sets of premises, alternative confidence measures, and empirical validation on large‑scale datasets.