DB2 For i XMLTABLE, Part 1: Convert XML to Tabular Data

June 13, 2012 Michael Sansoterra

DB2 for i 7.1 brought many new features including the ability to compose and decompose XML data. However, decomposing an XML document in DB2 for i 7.1 requires quite a few steps and creates, sometimes unwanted, permanent objects. Further, for all of the XML processing features, one lacking feature is the ability to easily and dynamically shred XML. Fortunately, that missing feature is now present with the technology refresh 4 update, which adds the XMLTABLE table function to the SQL developer’s arsenal. In this article, part 1 of my series on DB2 for i, we will look at how the XMLTABLE function can convert XML to tabular data.

XMLTABLE is powerful because it can analyze an XML document on the fly and retrieve the desired portions of the XML document using XPATH expressions. XMLTABLE is defined in the SQL/XML 2006 specification and is supported by other members of the DB2 family, Oracle, and other RDBMSs.

To demonstrate the power of XMLTABLE, I’m going to start by using a Microsoft sample XML file called books.xml. Here is an abbreviated copy of the XML, which shows only two of the 12 books.

<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications 
      with XML. </description>
   </book>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-12-16</publish_date>
      <description>A former architect battles corporate zombies, 
      an evil sorceress, and her own childhood to become queen 
      of the world. </description>
   </book>
</catalog>

Let’s say there is a DB2 for i table called SPECIALTY_CATALOG that has an XML column that holds book information in the XML format shown above. Here is the table definition:

CREATE TABLE SPECIALTY_CATALOG (
CATALOG_ID INT NOT NULL GENERATED ALWAYS AS IDENTITY,
CATALOG_DESCRIPTION VARCHAR(64) NOT NULL,
CATALOG_BOOK_LIST XML)

Now XML is great, but DB2 for i can’t do much with it until we convert it to a tabular format suitable for a relational database. This is where XMLTABLE comes in. The following query will retrieve each specialty catalog and join it to each book within the XML data. So if the first specialty catalog ID contains 27 books in its XML column, the query would return a total of 27 rows for that catalog ID. If the second catalog ID XML contains 15 books, then the query will return 15 rows, etc.

SELECT CATALOG_ID,CATALOG_DESCRIPTION,XML_BOOKS.*
FROM SPECIALTY_CATALOG SC,
XMLTABLE(
'$doc/catalog/book' 
PASSING SC.CATALOG_BOOK_LIST AS "doc"
COLUMNS 
Book_Id      VARCHAR(10)  PATH '@id',
Author       VARCHAR(64)  PATH 'author',
Title        VARCHAR(128) PATH 'title',
Genre        VARCHAR(12)  PATH 'genre',
Price        DECIMAL(7,2) PATH 'price',
Publish_Date DATE         PATH 'publish_date',
Description  VARCHAR(128) PATH 'description',
Book_Sequence_No FOR ORDINALITY
) AS XML_BOOKS

Ok, so here’s how it works. This query will retrieve a row from SPECIALTY_CATALOG and pass the content of the CATALOG_BOOK_LIST XML column into the XMLTABLE table function. XMLTABLE is powered by an XML parser that will extract requested pieces from the XML column and convert them to a tabular format.

In the query, the beginning and the end of the XMLTABLE table function is shown in blue. The first argument passed in the example is called a “row-xquery-expression-constant” and instructs XMLTABLE which elements in the XML to concern itself with:

'$doc/catalog/book' 
PASSING SC.CATALOG_BOOK_LIST AS "doc"

In the sample code shown above, “/catalog/book” is an XPath expression. While it’s beyond the article’s scope to explain XPATH, it’s basically a language designed to query XML documents. In this case, we’re searching for all <book> elements within the XML that are placed directly under the <catalog> root element. If an XML document contains 24 book elements under the <catalog> root, XMLTABLE will generate 24 rows. You can think of this portion as the row selector for the table function; whatever portions of the XML match this expression will be returned as a row to the table function. The “$doc” is a kind of substitution variable in the xquery expression. Note: What gets substituted into $doc when the statements runs is explained in the next portion of the statement. I called it $doc because it represents an XML document.

This expression is followed by a PASSING keyword that instructs DB2 on where to get the XML info ($doc). We’re going to pass the CATALOG_BOOK_LIST XML column from table SPECIALTY_CATALOG (aliased as SC) and give it a name of “doc” (the double quotes are important in DB2 to specify the alias as lowercase). This is the link that connects the $doc variable with the XML column (now aliased as “doc”) in the table. It’s no coincidence that $doc and “doc” are named the same; that’s how the substitution occurs, but the DB2 alias name doesn’t require the leading dollar sign ($). Also, in case you’re new to XML, be aware that XML is case sensitive! If you don’t type the case correctly, you’re in for a world of annoying troubleshooting.

The next argument given to XMLTABLE is a list of mapped columns. For each “/catalog/book” element that XMLTABLE finds in the XML data it is passed (remember this represents a row for the table function to return), it will go through the column list and attempt to map each requested piece of XML information to a column. Look at two of the column definitions:

Book_Id      VARCHAR(10)  PATH '@id',
Author       VARCHAR(64)  PATH 'author',

Notice the column name is specified first (and is developer defined), followed by the column’s data type, followed by an XPath expression for how to get the data from the XML. The “Book_Id” column will be populated by the value of the “id” attribute on the <book> element. (The “@ ” in XPath instructs the parser to get an element attribute value instead of an element value.)

Similarly, the “Author” column will be populated by finding the value of the <author> element within the XML. Keep in mind that a column’s XPath expressions is combined with the main XPath expression so that the full XPath to retrieve “Book_id” would really look like this:

/catalog/book/@id

The “Book_Sequence_No FOR ORDINALITY” column definition is an optional and special column definition that contains an incremental counter for each row it returns. This is useful because it can be used to track the order of the elements as they originally appeared in the XML.

The first two rows from the query result look like this (the column headings have been shortened here):

(Click graphic to enlarge.)

In summary, XMLTABLE converts an XML document into rows and columns so that a developer can use SQL to do normal JOIN, ORDER BY, and filtering operations on an XML document. For example, adding to the prior example, you can filter for a specific author by adding a WHERE clause:

SELECT CATALOG_ID,CATALOG_DESCRIPTION,XML_BOOKS.*
FROM SPECIALTY_CATALOG SC,
XMLTABLE(
'$doc/catalog/book' PASSING SC.CATALOG_BOOK_LIST AS "doc"
COLUMNS 
Book_Id      VARCHAR(10)  PATH '@id',
... (more columns here) ...
Book_Sequence_No FOR ORDINALITY
) AS XML_BOOKS
WHERE XML_BOOKS.Author='Corets, Eva'

However, we can also modify our row-selector XPath expression to filter this for us, such that XMLTABLE will return only the desired author:

SELECT CATALOG_ID,CATALOG_DESCRIPTION,XML_BOOKS.*
FROM SPECIALTY_CATALOG SC,
XMLTABLE(
'$doc/catalog/book[author="Corets, Eva"]' 
PASSING SC.CATALOG_BOOK_LIST AS "doc"
COLUMNS 
Book_Id      VARCHAR(10)  PATH '@id',
…
Book_Sequence_No FOR ORDINALITY
) AS XML_BOOKS

Notice how XPath allows the addition of a filter:

[author="Corets, Eva"]

This expression asks for all books under the <catalog> root element that have an author element with the value of “Corets, Eva.” Pretty powerful stuff! In the first example, the XML parser would parse all of the rows, pass them to DB2, and let DB2 filter it out. That could take a while for a big XML file. Specifying the filter as shown above makes the XML parser do the filtering and then it passes only the requested rows to DB2 and requires nothing further from DB2.

This example was hard coded, but we can “soft code” the XPath expression as follows:

SELECT CATALOG_ID,CATALOG_DESCRIPTION,XML_BOOKS.*
FROM SPECIALTY_CATALOG SC,
XMLTABLE(
'$doc/catalog/book[author=$requested_author]'
PASSING SC.CATALOG_BOOK_LIST AS "doc", 
'Corets, Eva' AS "requested_author"
COLUMNS 
Book_Id      VARCHAR(10)  PATH '@id',
...
Book_Sequence_No FOR ORDINALITY
) AS XML_BOOKS

Now, $requested_author is a second substitution variable that must also be supplied by the PASSING keyword. For simplicity, I hard-coded the author name again; normally this value would come from a variable or column. Further, notice that the alias name “requested_author” is in lowercase in order to match the “$requested_author substitution” variable.

There is one more important thing to note. In this example, the placement of the SPECIALTY_CATALOG and the XMLTABLE in the FROM clause mimics an INNER JOIN from the SPECIALTY_CATALOG row to all of the corresponding rows in the XML:

FROM SPECIALTY_CATALOG SC, XMLTABLE(…) XML_BOOKS

For cases when XMLTABLE doesn’t return any data and you still want the info from the SPECIALTY_CATALOG to show in the query results, you can restructure the query using a LEFT JOIN with a “dummy” correlation:

FROM SPECIALTY_CATALOG SC
LEFT JOIN XMLTABLE(…) XML_BOOKS ON 1=1

As you can see, XMLTABLE is a great tool for the DB2 for i developer. Before I end, here are a few notes that will help you when using XMLTABLE:

• Remember that XML is case sensitive.

• Similarly, substitution variables and their aliases specified by PASSING must match in case.

• Sometimes it takes a while to get used to the XPath syntax. Don’t give up!

• When passing values to an XML substitution value, make sure to supply a column or variable defined with the XML data type. If you have XML data in a character variable or column, use the XMLPARSE function to convert the character data to an XML document before passing it to XMLTABLE. I had some unexpected errors when I didn’t take this step (this may just be a bug).

• Remember, XMLTABLE dynamically parses XML data so there is a potential performance penalty for its use.

• DB2 for i does not support all XPath features. For example, I couldn’t get the fn:namespace-uri function to work, and I found I couldn’t use all of the axes. Here is the error I received trying to specify the following-sibling axis in an XPath expression:

Message: [SQ16007] The XPath path expression references an axis 
following-sibling that is not supported. Cause . . . . . :   
The following-sibling that is specified is not supported because 
DB2 XPath does not support the Full Axis Feature. The supported
axes include: child, attribute, descendant, self, descendant-or-self,
and parent. The XPath expression cannot be processed.
Recovery  . . . :   Specify an axis that is supported.

In part 2 of this series, I’ll show how to use XMLTABLE with namespaces and give an example of how to easily process an XML file on the IFS. Until then, happy XML shredding!

Michael Sansoterra is a DBA for Broadway Systems in Grand Rapids, Michigan. Send your questions or comments for Mike via the IT Jungle Contact page.

                     Post this story to del.icio.us
               Post this story to Digg
    Post this story to Slashdot