# Query Languages for Semistructured Data

CSE 636 Data Integration Answering Queries Using Views Bucket Algorithm The Bucket Algorithm Each subgoal g of Q must be covered by some view Make a list of candidates (buckets) per query subgoal Consider combinations of candidates from different buckets Not all combos are compatible Keep the compatible ones and minimize them Discard the ones contained in another Take their union 2 The Bucket Algorithm

q(X,Y,R) :- ForSale(X,Y,C,auto), Review(X,R,auto), Y > 1985 Step 1: For each subgoal, put the relevant sources into a bucket: V1(name, year) :- ForSale(name, year, France, auto), year > 1990 would be relevant V3(name, year) :- ForSale(name, year, France, cheese) would be irrelevant Step 2: Take the Cartesian product of the buckets Algorithm produces maximally contained rewriting Ignores interactions between subgoals in Step 1 3 The Bucket Algorithm: Example V1(Std,Crs,Qtr,Title) :- reg(Std,Crs,Qtr), course(Crs,Title), Crs 500, Qtr Aut98 V2(Std,Prof,Crs,Qtr) :- reg(Std,Crs,Qtr), teaches(Prof,Crs,Qtr) V3(Std,Crs) :- reg(Std,Crs,Qtr), Qtr Aut94 V4(Prof,Crs,Title,Qtr) :- reg(Std,Crs,Qtr), course(Crs,Title), teaches(Prof,Crs,Qtr), Qtr Aut97

q(S,C,P) :- teaches(P,C,Q), reg(S,C,Q), course(C,T), C 300, Q Aut95 Step 1: For each query subgoal, put the relevant sources into a bucket 4 The Bucket Algorithm: Example V1(Std,Crs,Qtr,Title) :- reg(Std,Crs,Qtr), course(Crs,Title), Crs 500, Qtr Aut98 V2(Std,Prof,Crs,Qtr) :- reg(Std,Crs,Qtr), teaches(Prof,Crs,Qtr) V3(Std,Crs) :- reg(Std,Crs,Qtr), Qtr Aut94 V4(Prof,Crs,Title,Qtr) :- reg(Std,Crs,Qtr), course(Crs,Title), teaches(Prof,Crs,Qtr), Qtr Aut97 q(S,C,P) :- teaches(P,C,Q), reg(S,C,Q), course(C,T), C 300, Q Aut95 Buckets teaches PProf, CCrs, QQtr

reg course V2 V4 Note: Arithmetic predicates dont pose a problem 5 The Bucket Algorithm: Example V1(Std,Crs,Qtr,Title) :- reg(Std,Crs,Qtr), course(Crs,Title), Crs 500, Qtr Aut98 V2(Std,Prof,Crs,Qtr) :- reg(Std,Crs,Qtr), teaches(Prof,Crs,Qtr) V3(Std,Crs) :- reg(Std,Crs,Qtr), Qtr Aut94 V4(Prof,Crs,Title,Qtr) :- reg(Std,Crs,Qtr), course(Crs,Title), teaches(Prof,Crs,Qtr), Qtr Aut97 q(S,C,P) :- teaches(P,C,Q), reg(S,C,Q), course(C,T), C 300, Q Aut95

Buckets SStd, CCrs, QQtr teaches reg V2 V4 V1 V2 course Note: V3 doesnt work: arithmetic predicates not consistent V4 doesnt work: S not in the output of V4 6 The Bucket Algorithm: Example

V1(Std,Crs,Qtr,Title) :- reg(Std,Crs,Qtr), course(Crs,Title), Crs 500, Qtr Aut98 V2(Std,Prof,Crs,Qtr) :- reg(Std,Crs,Qtr), teaches(Prof,Crs,Qtr) V3(Std,Crs) :- reg(Std,Crs,Qtr), Qtr Aut94 V4(Prof,Crs,Title,Qtr) :- reg(Std,Crs,Qtr), course(Crs,Title), teaches(Prof,Crs,Qtr), Qtr Aut97 q(S,C,P) :- teaches(P,C,Q), reg(S,C,Q), course(C,T), C 300, Q Aut95 Buckets CCrs, TTitle teaches reg course V2 V4

V1 V2 V1 V4 7 The Bucket Algorithm: Example Step 2: Try all combos of views, one each from a bucket Test satisfaction of arithmetic predicates in each case e.g., two views may not overlap, i.e., they may be inconsistent Desired rewriting = union of surviving ones Query rewriting 1: teaches reg

course V2 V4 V1 V2 V1 V4 q1(S,C,P) :- V2(S,P,C,Q), V1(S,C,Q,T), V1(S,C,Q,T) no problem from arithmetic predicates (none in V2) May or may not be minimal (why?) 8 The Bucket Algorithm: Example Unfolding of rewriting 1:

q1(S,C,P) :- r(S,C,Q), t(P,C,Q), r(S,C,Q), c(C,T), r(S,C,Q), c(C,T), C 500, Q Aut98, C 500, Q Aut98 Black rs can be mapped to green r: SS, SS, QQ Black c can be mapped to green c: just extend above mapping to TT Minimized unfolding of rewriting 1: q1m(S,C,P) :- t(P,C,Q), r(S,C,Q), c(C,T), C 500, Q Aut98 Minimized rewriting 1: q1m(S,C,P) :- V2(S,P,C,Q), V1(S,C,Q,T) 9 The Bucket Algorithm: Example Query Rewriting 2: teaches reg

course V2 V4 V1 V2 V1 V4 q2(S,C,P) :- V2(S,P,C,Q), V1(S,C,Q,T), V4(P,C,T,Q) q2(S,C,P) :- r(S,C,Q), t(P,C,Q), r(S,C,Q), r(S,C,Q), c(C,T), C 500, Q Aut98, r(S,C,Q), c(C,T), t(P,C,Q), Q Aut97 This combo is infeasible: consider the conjunction of arithmetic predicates in V1 and V4 Query rewriting 3:

teaches reg course V2 V4 V1 V2 V1 V4 q3(S,C,P) :- V2(S,P,C,Q), V2(S,P,C,Q), V4(P,C,T,Q) 10 The Bucket Algorithm: Example Unfolding of rewriting 3:

q3(S,C,P) :- r(S,C,Q), t(P,C,Q), r(S,C,Q), t(P,C,Q), r(S,C,Q), c(C,T), t(P,C,Q), Q Aut97 The green subgoals can cover the black ones under the mapping: SS, SS, PP, PP, QQ Minimized rewriting 3: q3m(S,C,P) :- V2(S,P,C,Q), V4(P,C,T,Q) Verify that there are only two rewritings that are not covered by others Maximally Contained Rewriting: q = q1m q3m 11 The Bucket Algorithm: Example 2 Query: q(X) :- cites(X,Y), cites(Y,X), sameTopic(X,Y) Views: V4(A) :- cites(A,B), cites(B,A) V5(C,D) :- sameTopic(C,D) V6(F,H) :- cites(F,G), cites(G,H), sameTopic(F,G) Buckets

cites cites sameTopic V4 V4 V5 V6 V6 V6 Note: Should we list V4(X) twice in the buckets? 12

The Bucket Algorithm: Example 2 Consider all combos & check for containment of the unfolded rewriting in Q V4(X) cannot be combined with anything (why?) Try q1(X) :- V4(X), V4(X), V5(X,Y) Try q2(X) :- V4(X), V6(X,Y), V5(X,Y) Does any of these work? When can we discard a view from consideration? 13 The Bucket Algorithm: Example 2 Here is a successful rewriting: q3(X) :- V6(X,Y), V6(X,Y), V6(X,Y) By itself is not contained in Q But, with subgoal X=Y added, it is! By minimizing the rewriting, we get:

q3m(X,Y) :- V6(X,X) 14 The Bucket Algorithm: Example 2 Remarks: V4 didnt contribute to any rewrite, but the bucket algorithm doesnt recognize it ahead Consider: q2(X,Y) :- cites(X,Y), cites(Y,X) Then both cites predicates can be folded into V4 Not recognized by the bucket algorithm 15 The State of Affairs Bucket algorithm: deals well with predicates, Cartesian product can be large (containment check required for every candidate

rewriting) Inverse rules: modular (extensible to binding patterns, FDs) no treatment of predicates resulting rewritings need significant further optimization Neither scales up The MINICON algorithm: change perspective: look at query variables 16 References Querying Heterogeneous Information Sources Using Source Descriptors By Alon Y. Levy, Anand Rajaraman and Joann J. Ordille VLDB, 1996

Laks VS Lakshmanan Lecture Slides Alon Halevy Answering Queries Using Views: A Survey VLDB Journal, 2000 http://citeseer.ist.psu.edu/halevy00answering.html 17

## Recently Viewed Presentations

• St. Joseph Mercy Health System Missouri Center for Patient Safety. Ann Arbor, MI Jefferson City, MO. [email protected] [email protected] Coaching Call 3: Hardwiring Multidisciplinary Rounds with Daily Goals; Sample Huddles. August 16, 2011. Document 1
• Raisin Bun Model. He suggested that all atoms must contain electrons (negative charge). His model pictured a positively charged ball with the negatively charged electrons embedded in it. Thomson's Model... Raisin Bun Model. Ernst Rutherford (1871-1937)
• The top five consequences all showed an increase from the prior year; loss of trust continued to be the most common consequence from online risks. Overall, 71% of respondents reported at least one consequence. Offline consequences were some of the...
• PMI-ACP® Certification & Exam. PMI provides information regarding PMI-ACP® exam content in the PMI Exam Content Outline document on their website. 50% of the exam covers Agile tools & techniques across 6 specific domain groups. The other 50% of the...
• What Caused the Financial Crisis of 2008? ... leading to a successful law suit against Bankers Trust 1995 Gibson Greeting Cards lost \$23 million from interest rate swaps, leading the SEC to fine Bankers Trust \$10 million 1994 Orange County,...
• DER based Planning (DRP) Grid Modernization. Structural changes . in distribution grid designs are required - grid we have is not the grid we need* Resilient, 2-way networks are essential - CA's distribution grids need to become generation ties &...
• In contrast, Owen's poem, "Exposure" is an unashamedly angry and bitter piece about the waste of life in war. Interestingly in this poem, Owen closely associates violence with natural imagery to emphasise both the physical and emotional hardships of war:...
• Serial Killers Video video 2 science of the mind video 3 science of psychopaths Joe Ball Operated during the late 1930s Killed at least 5, probably 14, waitresses at his tavern (The Sociable Inn) in Texas Threw them into a...