Introduction to XQuery
Resources:
Official URL: www.w3.org/TR/xquery
Short intros:
http://www.xml.com/pub/a/2002/10/16/xquery.html
www.brics.dk/~amoeller/XML/querying
Or see Ramakrishnan & Gehrke text
Lecture modified from slides by Dan Suciu
XML vs. Relational Data
Relation:
name   phone
John   3634
Sue    6343
Dick   6363
… in XML:
{ row: { name: "John", phone: 3634 },
  row: { name: "Sue", phone: 6343 },
  row: { name: "Dick", phone: 6363 }
}
[Figure: tree with three row nodes; each row has a name child and a phone child, with leaves "John" 3634, "Sue" 6343, "Dick" 6363]
Relational to XML Data
•A relation instance is basically a tree with:
–Unbounded fanout at level 1 (i.e., any # of rows)
–Fixed fanout at level 2 (i.e., fixed # fields)
•XML data is essentially an arbitrary tree
–Unbounded fanout at all nodes/levels
–Any number of levels
–Variable # of children at different nodes, variable
path lengths
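The correspondence between a relation and a two-level tree can be sketched in Python with the standard library's xml.etree.ElementTree. The helper name relation_to_xml and the root tag "table" are assumptions made for illustration; the rows are the ones from the previous slide.

```python
import xml.etree.ElementTree as ET

def relation_to_xml(rows, root_tag="table"):
    """Encode a list of dicts (a relation instance) as a two-level XML tree."""
    root = ET.Element(root_tag)
    for row in rows:                      # level 1: any number of rows
        row_el = ET.SubElement(root, "row")
        for field, value in row.items():  # level 2: one child per field
            ET.SubElement(row_el, field).text = str(value)
    return root

rows = [
    {"name": "John", "phone": 3634},
    {"name": "Sue", "phone": 6343},
    {"name": "Dick", "phone": 6363},
]
tree = relation_to_xml(rows)
print(ET.tostring(tree, encoding="unicode"))
```

Every row element ends up with the same number of children, which is the fixed fanout at level 2; arbitrary XML has no such restriction.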
Query Language for XML
•Must be high-level; “SQL for XML”
•Must conform to XML Schema
–But also work in absence of schema info
•Support simple and complex/nested datatypes
•Support universal and existential quantifiers,
aggregation
•Operations on sequences and hierarchies of document
structures
•Capability to transform and create XML structures
XQuery
•Influenced by XML-QL, Lorel, Quilt, YATL
–Also, XPath and XML Schema
•Reads a sequence of XML fragments or
atomic values and returns a sequence of
XML fragments or atomic values
–Inputs/outputs are objects defined by the XML-Query
data model, rather than strings in XML syntax
Overview of XQuery
•Path expressions
•Element constructors
•FLWOR (“flower”) expressions
–Several other kinds of expressions as well, including
conditional expressions, list expressions, quantified
expressions, etc.
•Expressions evaluated w.r.t. a context:
–Context item (current node)
–Context position (in sequence being processed)
–Context size (of the sequence being processed)
–Context also includes namespaces, variables, functions,
date, etc.
Path Expressions
Examples:
•Bib/paper
•Bib/book/publisher
•Bib/paper/author/lastname
Given an XML document, the value of a path
expression p is a sequence of objects
Path Expression Examples
Doc =
[Figure: document tree rooted at Bib (&o1) with children paper (&o12), book (&o24), and paper (&o29). Edge labels include references, author, title, year, http, publisher, page, firstname, lastname, first, and last; leaf values include "Serge", "Abiteboul", 1997, "Victor", "Vianu", 122, and 133.]
Bib/paper = <&o12,&o29>
Bib/book/publisher = <&o51>
Bib/paper/author/lastname = <&o71,&206>
Note that order of
elements matters!
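The three path expressions can be tried out with a rough Python sketch using xml.etree.ElementTree. The document below is a hypothetical stand-in for the figure (titles and the book's author and publisher are placeholders), and findall is only an approximation of XQuery path evaluation; it does return matches in document order.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the figure's document; tags follow the slide.
DOC = """
<Bib>
  <paper><author><firstname>Serge</firstname><lastname>Abiteboul</lastname></author>
         <title>t1</title><year>1997</year></paper>
  <book><author>a</author><title>t2</title><publisher>p</publisher></book>
  <paper><author><firstname>Victor</firstname><lastname>Vianu</lastname></author>
         <title>t3</title></paper>
</Bib>
"""
bib = ET.fromstring(DOC)  # bib is the <Bib> root element

# Paths are relative to the root, so Bib/paper becomes "paper".
papers = bib.findall("paper")
publishers = bib.findall("book/publisher")
lastnames = [e.text for e in bib.findall("paper/author/lastname")]
print(lastnames)
```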
Element Construction
•An XQuery expression can construct new
values or structures
•Example: Consider the path expressions
from the previous slide.
–Each of them returns a newly constructed
sequence of elements
–Key point is that we don’t just return existing
structures or atomic values; we can re-arrange
them as we wish into new structures
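As a minimal sketch of element construction, the Python below builds a new element out of pieces of an existing tree. The two-book input and the tag name "titles" are assumptions chosen for illustration.

```python
import copy
import xml.etree.ElementTree as ET

bib = ET.fromstring(
    "<bib><book><title>abc</title><year>1997</year></book>"
    "<book><title>def</title><year>1999</year></book></bib>"
)

# Construct a new <titles> element that did not exist in the input.
titles = ET.Element("titles")
for title in bib.findall("book/title"):
    titles.append(copy.deepcopy(title))  # copy, so the input tree is untouched

print(ET.tostring(titles, encoding="unicode"))
```

The copies are rearranged under a new parent while the source document keeps its original shape.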
FLWOR Expressions
•FOR-LET-WHERE-ORDERBY-RETURN = FLWOR
FOR / LET Clauses
  → List of tuples
WHERE Clause
  → List of tuples
ORDERBY/RETURN Clause
  → Instance of XQuery data model
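The pipeline can be mimicked stage by stage in plain Python. This is only a sketch: the (title, year) pairs are hypothetical stand-ins for book elements, and each tuple of variable bindings is modeled as a dict.

```python
# Hypothetical input: (title, year) pairs standing in for book elements.
books = [("ghi", 2001), ("abc", 1997), ("old", 1990), ("def", 1999)]

# FOR / LET: produce a list of tuples of variable bindings.
tuples = [{"b": book, "n": len(books)} for book in books]

# WHERE: keep only the tuples that satisfy the predicate.
tuples = [t for t in tuples if t["b"][1] > 1995]

# ORDER BY: sort the surviving tuples.
tuples.sort(key=lambda t: t["b"][0])

# RETURN: evaluate the return expression once per tuple.
result = [t["b"][0] for t in tuples]
print(result)
```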
FOR vs. LET
•FOR $x IN list-expr
–Binds $x in turn to each value in the list expr
•LET $x := list-expr
–Binds $x to the entire list expr
–Useful for common sub-expressions and for
aggregations
FOR vs. LET: Example
FOR $x IN document("bib.xml")/bib/book
RETURN <result> { $x } </result>
Returns:
<result> <book>...</book></result>
<result> <book>...</book></result>
<result> <book>...</book></result>
...
Notice that the output has several result elements
LET $x := document("bib.xml")/bib/book
RETURN <result> { $x } </result>
Returns:
<result> <book>...</book>
<book>...</book>
<book>...</book>
...
</result>
Notice that the output has exactly one result element
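The difference between the two bindings can be sketched in Python. The three-book document and the helper name wrap are assumptions made for illustration.

```python
import copy
import xml.etree.ElementTree as ET

bib = ET.fromstring("<bib><book>b1</book><book>b2</book><book>b3</book></bib>")
books = bib.findall("book")

def wrap(items):
    """Build one <result> element containing copies of the given items."""
    result = ET.Element("result")
    result.extend([copy.deepcopy(item) for item in items])
    return result

# FOR: $x is bound to each book in turn, so RETURN runs once per book.
for_results = [wrap([x]) for x in books]

# LET: $x is bound to the whole sequence, so RETURN runs exactly once.
let_results = [wrap(books)]

print(len(for_results), len(let_results))
```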
XQuery Example 1
Find the titles of all books published after 1995:
FOR $x IN document("bib.xml")/bib/book
WHERE $x/year > 1995
RETURN $x/title
Result:
<title> abc </title>
<title> def </title>
<title> ghi </title>
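The same query, sketched as a Python comprehension over a hypothetical three-book document (the book data is made up for illustration):

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("""
<bib>
  <book><title>abc</title><year>1997</year></book>
  <book><title>old</title><year>1990</year></book>
  <book><title>def</title><year>1999</year></book>
</bib>
""")

titles = [
    book.find("title")                      # RETURN $x/title
    for book in bib.findall("book")         # FOR $x IN .../bib/book
    if int(book.findtext("year")) > 1995    # WHERE $x/year > 1995
]
print([t.text for t in titles])
```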
XQuery Example 2
For each author of a book by Morgan
Kaufmann, list all books she published:
FOR $a IN distinct(document("bib.xml")
    /bib/book[publisher="Morgan Kaufmann"]/author)
RETURN <result>
    { $a,
      FOR $t IN document("bib.xml")/bib/book[author=$a]/title
      RETURN $t }
</result>
distinct = a function that eliminates duplicates (after
converting inputs to atomic values)
Results for Example 2
<result>
<author>Jones</author>
<title> abc </title>
<title> def </title>
</result>
<result>
<author> Smith </author>
<title> ghi </title>
</result>
Observe how nested
structure of result
elements is determined
by the nested structure
of the query.
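The nesting can be reproduced with a Python sketch in which the outer loop plays the role of the outer FOR and the inner loop the nested FOR. The three-book document is hypothetical, chosen so that the output matches the results shown above; duplicate elimination is approximated with dict.fromkeys on author text.

```python
import xml.etree.ElementTree as ET

bib = ET.fromstring("""
<bib>
  <book><publisher>Morgan Kaufmann</publisher><author>Jones</author><title>abc</title></book>
  <book><publisher>Morgan Kaufmann</publisher><author>Smith</author><title>ghi</title></book>
  <book><publisher>Other Press</publisher><author>Jones</author><title>def</title></book>
</bib>
""")
books = bib.findall("book")

# distinct(...): author names of Morgan Kaufmann books, duplicates removed,
# first-seen order kept (dict keys preserve insertion order).
authors = list(dict.fromkeys(
    a.text
    for b in books if b.findtext("publisher") == "Morgan Kaufmann"
    for a in b.findall("author")
))

results = []
for name in authors:                       # outer FOR $a
    result = ET.Element("result")
    ET.SubElement(result, "author").text = name
    for b in books:                        # inner FOR $t, over all books
        if any(a.text == name for a in b.findall("author")):
            ET.SubElement(result, "title").text = b.findtext("title")
    results.append(result)

for r in results:
    print(ET.tostring(r, encoding="unicode"))
```

Note that the inner loop ranges over every book by the author, not only those from Morgan Kaufmann, which is what the query asks for.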
XQuery Example 3
count = (aggregate) function that returns the
number of elements
<big_publishers>
{ FOR $p IN distinct(document("bib.xml")//publisher)
  LET $b := document("bib.xml")//book[publisher = $p]
  WHERE count($b) > 100
  RETURN $p }
</big_publishers>
For each publisher p:
-Let the list of books published by p be b
-Count the # books in b, and return p if the count > 100
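A Python sketch of the same FOR / LET / WHERE pattern follows. The function name big_publishers, its threshold parameter, and the four-book test document are assumptions made so the example runs on small data; the slide's query uses a fixed threshold of 100.

```python
import xml.etree.ElementTree as ET

def big_publishers(bib, threshold=100):
    """Return distinct publisher names with more than `threshold` books."""
    books = list(bib.iter("book"))                                 # //book
    names = dict.fromkeys(p.text for p in bib.iter("publisher"))   # distinct
    big = ET.Element("big_publishers")
    for name in names:                                             # FOR $p
        b = [bk for bk in books if bk.findtext("publisher") == name]  # LET $b
        if len(b) > threshold:                                     # WHERE count($b) > threshold
            ET.SubElement(big, "publisher").text = name            # RETURN $p
    return big

bib = ET.fromstring(
    "<bib>"
    + "<book><publisher>P1</publisher></book>" * 3
    + "<book><publisher>P2</publisher></book>"
    + "</bib>"
)
out = big_publishers(bib, threshold=2)
print(ET.tostring(out, encoding="unicode"))
```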
XQuery Example 4
Find books whose price is larger than average:
LET $a := avg(document("bib.xml")/bib/book/price)
FOR $b IN document("bib.xml")/bib/book
WHERE $b/price > $a
RETURN $b
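Here LET binds the aggregate once and FOR then iterates; a Python sketch over a hypothetical three-book document (prices are made up) looks like this:

```python
import xml.etree.ElementTree as ET
from statistics import mean

bib = ET.fromstring("""
<bib>
  <book><title>abc</title><price>10</price></book>
  <book><title>def</title><price>20</price></book>
  <book><title>ghi</title><price>60</price></book>
</bib>
""")
books = bib.findall("book")

# LET $a := avg(...): computed once over the whole sequence of prices.
a = mean(float(p.text) for p in bib.findall("book/price"))

# FOR $b ... WHERE $b/price > $a RETURN $b
expensive = [b for b in books if float(b.findtext("price")) > a]
print(a, [b.findtext("title") for b in expensive])
```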
Collections in XQuery
•Ordered and unordered collections
–/bib/book/author = an ordered collection
–distinct(/bib/book/author) = an unordered collection
•Examples:
–LET $a := /bib/book → $a is a collection; stmt iterates
over all books in collection
–$b/author → also a collection (several authors...)
However:
RETURN <result> { $b/author } </result>
Returns a single collection!
<result> <author>...</author>
<author>...</author>
<author>...</author>
...
</result>
Collections in XQuery
What about collections in expressions ?
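•$b/price → list of n prices
•$b/price * 0.7 → list of n numbers??
•$b/price * $b/quantity → list of n x m numbers??
–Valid only if the two sequences have at most one element
–Atomization: each operand is reduced to its atomic value first
•$book1/author eq "Kennedy" - Value Comparison (each operand must be a single item)
•$book1/author = "Kennedy" - General Comparison (true if any author equals "Kennedy")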
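The contrast between the two comparisons can be sketched in Python, with sequences modeled as lists of already-atomized values. The helper names atomize_one, value_eq, and general_eq are hypothetical, and the sketch simplifies the XQuery rules (for instance, it ignores type promotion).

```python
def atomize_one(seq):
    """Arithmetic and value comparison accept at most one item per operand."""
    if len(seq) > 1:
        raise TypeError("sequence of more than one item")
    return seq[0] if seq else None

def value_eq(left, right):
    """Sketch of `eq`: compares two single atomic values."""
    a, b = atomize_one(left), atomize_one(right)
    if a is None or b is None:
        return None  # an empty operand gives an empty result
    return a == b

def general_eq(left, right):
    """Sketch of `=`: true if some pair of items is equal."""
    return any(l == r for l in left for r in right)

authors = ["Kennedy", "Smith"]   # a book with two authors
one_author = ["Kennedy"]

print(value_eq(one_author, ["Kennedy"]))
print(general_eq(authors, ["Kennedy"]))
```

With the two-author list, general_eq succeeds while value_eq raises an error, which mirrors why $b/price * $b/quantity is only valid when each side has at most one element.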
Other Stuff in XQuery
•Before and After
–for dealing with order in the input
•Filter
–deletes some edges in the result tree
•Recursive functions
•Namespaces
•References, links …
•Lots more stuff …
Appendix
XML Schema and
XQuery Data Model
XML Schema
•Includes primitive data types (integers,
strings, dates, etc.)
•Supports value-based constraints (integers >
100)
•User-definable structured types
•Inheritance (extension or restriction)
•Foreign keys
•Element-type reference constraints
XML-Query Data Model
•Describes XML data as a tree
•Node ::= DocNode |
ElemNode |
ValueNode |
AttrNode |
NSNode |
PINode |
CommentNode |
InfoItemNode |
RefNode
http://www.w3.org/TR/query-datamodel/ 2/2001
XML-Query Data Model
Element node (simplified definition):
•elemNode: (QNameValue,
{AttrNode},
[ ElemNode | ValueNode ])
→ ElemNode
•QNameValue means “a tag name”
Reads: “Give me a tag, a set of attributes, a list of
elements/values, and I will return an element”
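The simplified constructor signature can be mirrored with Python dataclasses. This is a sketch of the shape only, not the W3C data model: the class fields, the elem_node helper, and the "id" attribute are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass(frozen=True)
class ValueNode:
    value: str

@dataclass(frozen=True)
class AttrNode:
    name: str
    value: str

@dataclass
class ElemNode:
    tag: str                                   # QNameValue: a tag name
    attrs: frozenset = frozenset()             # {AttrNode}: an unordered set
    children: List[Union["ElemNode", ValueNode]] = field(default_factory=list)

def elem_node(tag, attrs, children):
    """Mirror of elemNode: (tag, attributes, children) -> ElemNode."""
    return ElemNode(tag, frozenset(attrs), list(children))

name = elem_node("name", [], [ValueNode("John")])
row = elem_node("row", [AttrNode("id", "1")], [name])
print(row.tag, len(row.attrs), len(row.children))
```

Attributes are held in a set because their order carries no meaning, while children are held in a list because order matters.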