Query Rewriting On Path Views Without Integrity Constraints

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv paper.

A view with a binding pattern is a parameterised query on a database. Such views are used, e.g., to model Web services. To answer a query on such views, one has to orchestrate the views together in execution plans. The goal is usually to find equivalent rewritings, which deliver precisely the same results as the query on all databases. However, such rewritings are usually possible only in the presence of integrity constraints - and not all databases have such constraints. In this paper, we describe a class of plans that give practical guarantees about their result even if there are no integrity constraints. We provide a characterisation of such plans and a complete and correct algorithm to enumerate them. Finally, we show that our method can find plans on real-world Web Services.


💡 Research Summary

The paper addresses the problem of answering atomic queries using a set of parameterised views that model Web services, where each view is a path‑shaped function with a binding pattern. Traditional approaches aim at finding equivalent rewritings—execution plans that return exactly the same results as the original query on every possible database. Such rewritings, however, rely heavily on integrity constraints (e.g., inclusion dependencies) that are rarely available or may be violated in real‑world data sources. Consequently, in many practical settings no equivalent rewriting exists, and maximally contained rewritings, while broader, offer no guidance on which plan is more likely to succeed.

To overcome this limitation, the authors introduce the notion of a smart plan. A plan π is called smart for an atomic query q if, for every database instance I, the following holds: whenever the filter‑free version of π (i.e., the same sequence of function calls but without any equality filters) produces a result on I, the original plan π returns exactly the answers of q on I. In other words, as long as the unfiltered sequence of calls returns anything at all, the filtered plan neither misses answers nor yields spurious ones. A weaker notion, weakly smart, requires only that the plan’s results be a superset of the query answers whenever the filter‑free version succeeds. Weakly smart plans are useful for privacy‑preserving scenarios where a data provider wishes to avoid exposing certain attributes.
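To make the two definitions concrete, here is a minimal Python sketch that tests them on a handful of sample databases, following the definitions exactly as stated above. Because smartness quantifies over every database, such a check can only refute smartness, never prove it. The employee example (relations `worksFor` and `jobTitle`, the constant Anna, and the helper names `is_smart_on`, `staff_titles`, `plan`, `plan_ff`) is hypothetical and chosen only for illustration; it is not taken from the paper's code.

```python
from typing import Callable, Iterable, Set, Tuple

Fact = Tuple[str, str, str]  # (subject, relation, object)
Database = Set[Fact]
Answers = Set[str]


def is_smart_on(
    plan: Callable[[Database], Answers],
    plan_ff: Callable[[Database], Answers],
    query: Callable[[Database], Answers],
    instances: Iterable[Database],
    weak: bool = False,
) -> bool:
    """Check the (weakly) smart condition on sample databases only.

    A False result refutes smartness; True proves nothing, since the
    definition quantifies over all databases.
    """
    for db in instances:
        if not plan_ff(db):
            continue  # filter-free plan returns nothing: the condition is vacuous
        got, expected = plan(db), query(db)
        if (weak and not got >= expected) or (not weak and got != expected):
            return False
    return True


# Hypothetical example: query jobTitle(Anna, x), answered through Anna's employer.
def query(db: Database) -> Answers:
    return {o for s, r, o in db if s == "Anna" and r == "jobTitle"}


def staff_titles(db: Database) -> Set[Tuple[str, str]]:
    """Join worksFor(Anna, c), worksFor(e, c), jobTitle(e, t) and return (e, t)."""
    companies = {o for s, r, o in db if s == "Anna" and r == "worksFor"}
    staff = {s for s, r, o in db if r == "worksFor" and o in companies}
    return {(s, o) for s, r, o in db if r == "jobTitle" and s in staff}


def plan_ff(db: Database) -> Answers:
    return {t for _, t in staff_titles(db)}


def plan(db: Database) -> Answers:
    return {t for e, t in staff_titles(db) if e == "Anna"}  # the filter e = Anna


db1 = {("Anna", "worksFor", "Acme"), ("Anna", "jobTitle", "Engineer"),
       ("Bob", "worksFor", "Acme"), ("Bob", "jobTitle", "Manager")}
db2 = {("Anna", "jobTitle", "Engineer")}  # no employer: filter-free plan is empty
print(is_smart_on(plan, plan_ff, query, [db1, db2]))           # True
print(is_smart_on(plan_ff, plan_ff, query, [db1]))             # False
print(is_smart_on(plan_ff, plan_ff, query, [db1], weak=True))  # True
```

The unfiltered call sequence on its own also returns Bob's title, so it fails the smart test yet passes the weakly smart one; adding the filter e = Anna removes the spurious answer.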

The paper focuses on path functions, which are conjunctive queries over binary relations whose body atoms form a linear chain: f(x₀, y₁, …, yₘ) ← r₁(x₀, x₁), r₂(x₁, x₂), …, rₙ(xₙ₋₁, xₙ), where the outputs y₁, …, yₘ are drawn from the variables x₁, …, xₙ. The variable x₀ of the first atom is the function’s input, and each subsequent atom’s first argument is the previous atom’s second argument. This structure mirrors typical RESTful service compositions where the output of one call becomes the input of the next.
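A path function of this shape can be evaluated by extending partial bindings x₀, x₁, … one atom at a time. The sketch below is illustrative only and rests on a few assumptions: facts are stored as (subject, relation, object) triples, a relation name ending in "-" denotes the inverse relation, and the hypothetical parameter `outputs` lists the 1-based positions of the variables among x₁, …, xₙ that the function returns as y₁, …, yₘ.

```python
from typing import Dict, List, Sequence, Set, Tuple

Fact = Tuple[str, str, str]  # (subject, relation, object)


def eval_path_function(
    db: Set[Fact], relations: Sequence[str], outputs: Sequence[int], x0: str
) -> Set[Tuple[str, ...]]:
    """Evaluate f(x0, y1, ..., ym) <- r1(x0, x1), ..., rn(x_{n-1}, xn).

    `outputs` holds the 1-based positions of the returned variables among x1..xn.
    """
    # Index facts in both directions; "r-" denotes the inverse of r (assumed convention).
    index: Dict[Tuple[str, str], List[str]] = {}
    for s, r, o in db:
        index.setdefault((s, r), []).append(o)
        index.setdefault((o, r + "-"), []).append(s)
    # Each binding is the list [x0, x1, ..., xk] of values matched so far.
    bindings: List[List[str]] = [[x0]]
    for rel in relations:
        bindings = [b + [nxt] for b in bindings for nxt in index.get((b[-1], rel), [])]
    return {tuple(b[k] for k in outputs) for b in bindings}


db = {
    ("Anna", "worksFor", "Acme"), ("Anna", "jobTitle", "Engineer"),
    ("Bob", "worksFor", "Acme"), ("Bob", "jobTitle", "Manager"),
}
# Hypothetical function: from a person to the colleagues at their employer and their titles.
result = eval_path_function(db, ["worksFor", "worksFor-", "jobTitle"], [2, 3], "Anna")
print(sorted(result))  # [('Anna', 'Engineer'), ('Bob', 'Manager')]
```

Carrying the whole binding list makes it easy to expose any intermediate variable as an output, at the cost of memory proportional to the number of matching walks.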

A central technical contribution is the characterisation of smart plans under the Optional Edge Semantics. In many Web services, some attributes may be missing; the service returns a null value rather than failing. The authors model this behaviour by treating certain atoms as optional: if the optional atom cannot be satisfied, the function still succeeds, producing a null for the corresponding output variable. They formalise optionality through sub‑functions that expose only a prefix of the original path. Under the assumption that all optional edges behave according to this semantics, the authors prove that a plan is smart exactly when every intermediate variable that participates in a filter is guaranteed to be bound by a preceding function call, and no filter eliminates a tuple that could have satisfied the original query.
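One way to picture the optional-edge behaviour is a variant of path evaluation in which a missing optional atom yields a null (here `None`) instead of discarding the binding. This is a minimal sketch under a simplifying assumption that is not taken from the paper: all optional atoms form a suffix of the path starting at the hypothetical index `first_optional`, so a call that returns nulls corresponds to a prefix sub-function padded with nulls. Triples and the "-" suffix for inverse relations are also assumed conventions.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

Fact = Tuple[str, str, str]  # (subject, relation, object)


def eval_with_optional_suffix(
    db: Set[Fact],
    relations: Sequence[str],
    outputs: Sequence[int],
    x0: str,
    first_optional: int,
) -> Set[Tuple[Optional[str], ...]]:
    """Evaluate a path function whose atoms from index first_optional on are optional.

    A missing required atom discards the binding; a missing optional atom (or one
    whose input is already null) extends the binding with None instead.
    """
    index: Dict[Tuple[str, str], List[str]] = {}
    for s, r, o in db:
        index.setdefault((s, r), []).append(o)
        index.setdefault((o, r + "-"), []).append(s)  # "r-" = inverse of r (assumed)
    bindings: List[List[Optional[str]]] = [[x0]]
    for i, rel in enumerate(relations):
        extended: List[List[Optional[str]]] = []
        for b in bindings:
            last = b[-1]
            matches = index.get((last, rel), []) if last is not None else []
            if matches:
                extended.extend(b + [m] for m in matches)
            elif i >= first_optional:
                extended.append(b + [None])  # optional edge missing: null output
            # otherwise a required edge is missing and the binding fails
        bindings = extended
    return {tuple(b[k] for k in outputs) for b in bindings}


db = {("Anna", "worksFor", "Acme")}
path = ["worksFor", "worksFor-", "jobTitle"]
print(eval_with_optional_suffix(db, path, [2, 3], "Anna", first_optional=2))  # {('Anna', None)}
print(eval_with_optional_suffix(db, path, [2, 3], "Anna", first_optional=3))  # set()
```

With the job-title edge optional, the call still succeeds and reports a null title; with every edge required, the same call returns nothing.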

Based on this characterisation, the paper presents a complete enumeration algorithm for smart plans. The algorithm proceeds in three main phases:

  1. Path Exploration – It enumerates all possible sequences of functions that connect the query’s constant input to a relation containing the desired output variable, respecting the directionality of the path functions.
  2. Filter‑Free Feasibility Check – For each candidate sequence, it verifies that the filter‑free version would produce a non‑empty result on any database that contains the necessary tuples. This involves checking that each variable used as a filter appears as an output of some earlier function in the sequence.
  3. Filter Insertion and Redundancy Elimination – Once a feasible sequence is identified, the algorithm adds the required equality filters that fix a variable to a constant (e.g., y = Anna) and removes any superfluous function calls that do not contribute to binding the output variable.
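The three phases above can be illustrated with a deliberately simplified enumerator. It is a toy, not the paper's algorithm: it considers only linear chains in which each call consumes the previous call's last variable, ignores optional edges, and accepts a chain only if its concatenated path has the shape p, then the inverse of p walked backwards, then the query relation r. In that case a filter fixes the variable reached after the round trip to the query constant a; under these assumptions the plan then returns exactly the r-successors of a whenever its filter-free version returns anything. Relation names ending in "-" denote inverses, and the catalogue and function names are hypothetical.

```python
from itertools import product
from typing import Dict, Iterator, List, Optional, Sequence, Tuple

# Hypothetical catalogue entry: (relation path, exposed 1-based output positions).
Function = Tuple[Sequence[str], Sequence[int]]
Plan = Tuple[Tuple[str, ...], Optional[int], int]  # (calls, filter position, answer position)


def inverse(rel: str) -> str:
    """Map r to r- and back (hypothetical naming convention for inverse relations)."""
    return rel[:-1] if rel.endswith("-") else rel + "-"


def enumerate_toy_plans(
    functions: Dict[str, Function], query_rel: str, max_calls: int = 3
) -> Iterator[Plan]:
    """Yield toy plans for the query query_rel(a, x), shortest first.

    Positions index the variables of the concatenated path: 0 is the constant a,
    position i is the object of the i-th atom.
    """
    accepted: List[Tuple[str, ...]] = []
    for length in range(1, max_calls + 1):
        for seq in product(functions, repeat=length):
            # Simplified redundancy elimination: skip extensions of accepted plans.
            if any(seq[: len(done)] == done for done in accepted):
                continue
            # Phases 1 and 2: chain the calls. Each call takes the previous call's
            # last variable as input, so that variable must be an exposed output.
            path: List[str] = []
            exposed = {0}  # the query constant is bound from the start
            feasible = True
            for name in seq:
                rels, outs = functions[name]
                if len(path) not in exposed:
                    feasible = False
                    break
                exposed.update(len(path) + k for k in outs)
                path.extend(rels)
            if not feasible or not path or path[-1] != query_rel:
                continue
            # Phase 3: accept only paths shaped p + inverse(p reversed) + [query_rel].
            prefix = path[:-1]
            half = len(prefix) // 2
            p, back = prefix[:half], prefix[half:]
            if len(prefix) % 2 or back != [inverse(r) for r in reversed(p)]:
                continue
            answer = len(path)
            filt = len(prefix) if prefix else None  # filter: variable at filt equals a
            if answer in exposed and (filt is None or filt in exposed):
                accepted.append(seq)
                yield seq, filt, answer


catalogue: Dict[str, Function] = {
    "getCompany": (["worksFor"], [1]),                # person -> employer
    "getStaff": (["worksFor-", "jobTitle"], [1, 2]),  # employer -> (employee, title)
}
plans = list(enumerate_toy_plans(catalogue, "jobTitle", max_calls=2))
print(plans)  # [(('getCompany', 'getStaff'), 2, 3)]
```

The single plan found calls getCompany on a, feeds the employer into getStaff, filters the returned employee (position 2) to equal a, and reads the title at position 3. Enumerating by increasing chain length and discarding extensions of accepted chains is only a crude stand-in for the redundancy-elimination step.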

The algorithm runs in time polynomial in the number of functions and the maximum path length, and it guarantees completeness (all smart plans are produced) and soundness (every produced plan satisfies the smartness definition) under the optional edge condition.

The experimental evaluation uses both synthetic datasets and a large collection of real‑world REST services harvested from programmableweb.com (over 22 000 services). The authors compare their smart‑plan enumeration against two baselines: (i) traditional equivalent‑rewriting algorithms that assume full integrity constraints, and (ii) maximally contained rewriting generators that ignore any notion of plan quality. Results show that smart plans retrieve significantly more answers than maximally contained rewritings, especially when the underlying data is incomplete. Moreover, the enumeration time remains practical (seconds to minutes) even for large service catalogs, demonstrating feasibility for on‑the‑fly service orchestration.

In conclusion, the paper makes three key contributions:

  • It formalises a new class of execution plans, smart plans, that provide strong correctness guarantees without requiring integrity constraints.
  • It offers a precise characterisation of smart plans under realistic optional‑edge behaviour of Web services.
  • It delivers a provably correct and efficient algorithm to enumerate all smart plans for a given atomic query and a set of path functions, validated on real‑world service data.

The work opens several avenues for future research, including extending the framework to functions with branching (non‑linear) structures, handling multiple inputs, and integrating cost‑based optimisation to prefer cheaper smart plans in large‑scale service composition scenarios.

