Guru: Reading Nested XML Using SQL
August 31, 2020 Jonathan M. Heinz
XML is a data-interchange format, not a relational database management system. For this reason, using SQL to query XML data can be challenging, as what would be stored in two relational tables are placed in one element of XML. To put it another way, detail data is nested under the header data.
I would like to share a way of using SQL to extract nested data from an XML file. I found this method useful when testing a change to a process that creates XML to be sent to customers. I can use this SQL to quickly check that the process is populating the XML correctly.
The XML contained invoice information with detailed product rows. I needed to extract the product rows for each invoice number. At first glance, this should be easy, but it isn’t, because of the repeated product rows.
Here is some simplified example XML that has multiple nested elements. It has two invoices to be sent to two customers, and each invoice contains two product rows. In real life it could contain thousands of invoices and each invoice might have thousands of product rows.
<?xml version='1.0' encoding='UTF-8'? > <INVOICES> <INVOICE> <HEADER> <INVOICE_ID>123456</INVOICE_ID> <CURRENCY> <CODE>EUR</CODE> </CURRENCY> </HEADER> <RECEIVER> <CUSTOMER_INFORMATION> <CUSTOMER_NAME>COMPANY1</CUSTOMER_NAME> <CUSTOMER_ID>00000001</CUSTOMER_ID> <ADDRESS> <STREET_ADDRESS1>Lane 1</STREET_ADDRESS1> <STREET_ADDRESS2></STREET_ADDRESS2> <STREET_ADDRESS3></STREET_ADDRESS3> <POSTAL_CODE>10</POSTAL_CODE> <COUNTRY>UK</COUNTRY> </ADDRESS> </CUSTOMER_INFORMATION> </RECEIVER> <ROWS> <ROW> <ROW_NUMBER>1</ROW_NUMBER> <PRODUCT> <PRODUCT_ID>111112</PRODUCT_ID> <PRODUCT_NAME>Some good stuff</PRODUCT_NAME> </PRODUCT> <ROW_TOTAL> <AMOUNT SIGN="+" VAT="EXCLUDED">10.000</AMOUNT> </ROW_TOTAL> </ROW> <ROW> <ROW_NUMBER>2</ROW_NUMBER> <PRODUCT> <PRODUCT_ID>111114</PRODUCT_ID> <PRODUCT_NAME>Some other good stuff</PRODUCT_NAME> </PRODUCT> <ROW_TOTAL> <AMOUNT SIGN="+" VAT="EXCLUDED">5.350</AMOUNT> </ROW_TOTAL> </ROW> </ROWS> </INVOICE> <INVOICE> <HEADER> <INVOICE_ID<123457</INVOICE_ID> <CURRENCY> <CODE>EUR</CODE> </CURRENCY> </HEADER> <RECEIVER> <CUSTOMER_INFORMATION> <CUSTOMER_NAME>COMPANY2</CUSTOMER_NAME> <CUSTOMER_ID>00000002</CUSTOMER_ID> <ADDRESS> <STREET_ADDRESS1>Lane 2</STREET_ADDRESS1> <STREET_ADDRESS2></STREET_ADDRESS2> <STREET_ADDRESS3></STREET_ADDRESS3> <POSTAL_CODE>20</POSTAL_CODE> <COUNTRY>UK</COUNTRY> </ADDRESS> </CUSTOMER_INFORMATION> </RECEIVER> <ROWS> <ROW> <ROW_NUMBER>1</ROW_NUMBER> <PRODUCT> <PRODUCT_ID>111112</PRODUCT_ID> <PRODUCT_NAME>Some good stuff</PRODUCT_NAME> </PRODUCT> <ROW_TOTAL> <AMOUNT SIGN="+" VAT="EXCLUDED">10.000</AMOUNT> </ROW_TOTAL> </ROW> <ROW> <ROW_NUMBER>2</ROW_NUMBER> <PRODUCT> <PRODUCT_ID>111115</PRODUCT_ID> <PRODUCT_NAME>Some more good stuff</PRODUCT_NAME> </PRODUCT> <ROW_TOTAL> <AMOUNT SIGN="+" VAT="EXCLUDED">5.350</AMOUNT> </ROW_TOTAL> </ROW> </ROWS> </INVOICE> </INVOICES>
I ran the following queries using Run SQL Scripts in Access Client Solutions (ACS). You must use commitment control when working with XML, otherwise the system will respond with SQL state 42926 (LOB and XML locators are not allowed with COMMIT(*NONE)). If your connection is not set to use commitment control, issue the following command before running the queries.
set transaction isolation level read committed;
Extracting the invoice ID and customer information (the header data) is easily done with the following SQL query.
select a.* from xmltable('INVOICES/INVOICE' passing (xmlparse(document get_xml_file('/SomeDir/INVOICE.XML'))) columns InvoiceID varchar(6) Path 'HEADER/INVOICE_ID', CustomerName varchar(20) Path 'RECEIVER/CUSTOMER_INFORMATION/CUSTOMER_NAME', CustomerNumber varchar(10) Path 'RECEIVER/CUSTOMER_INFORMATION/CUSTOMER_ID' ) as a;
INVOICEID | CUSTOMERNAME | CUSTOMERNUMBER |
123456 | COMPANY1 | 00000001 |
123457 | COMPANY2 | 00000002 |
This is quite straightforward. What if I want to return all the product rows? That’s just as easy.
select a.* from xmltable ('INVOICES/INVOICE/ROWS/ROW' Passing ( xmlparse(document get_xml_file('/SomeDir/INVOICE.XML'))) columns ROWNUM integer Path 'ROW_NUMBER', PRODUCTID varchar(20) Path 'PRODUCT/PRODUCT_ID', PRODUCTNAME varchar(50) Path 'PRODUCT/PRODUCT_NAME' ) as a;
ROWNUM | PRODUCTID | PRODUCTNAME |
1 | 111112 | Some good stuff |
2 | 111114 | Some other good stuff |
1 | 111112 | Some good stuff |
2 | 111115 | Some more good stuff |
However, what if we want to know the ID of the invoice each product belongs to? This presents a problem. We can’t just add the product ID and product name to the first query because the product is a repeating group. The database responds with SQL state 10507. (An XPath expression has a type that is not valid for the context in which the expression occurs.)
One way around this obstacle is to use a CTE (Common Table Expression) with the XML data type.
with f1 as (select * from xmltable ('INVOICES/INVOICE' passing ( xmlparse(document get_xml_file('/SomeDir/INVOICE.XML'))) columns InvoiceID varchar(6) Path 'HEADER/INVOICE_ID', ALLROWS xml Path 'ROWS')), f2 as (select f1.*, p.* from f1, xmltable ('ROWS/ROW' passing ALLROWS columns ROWNUM integer Path 'ROW_NUMBER', PRODUCTID varchar(20) Path 'PRODUCT/PRODUCT_ID', PRODUCTNAME varchar(50) Path 'PRODUCT/PRODUCT_NAME') as P) select f2.InvoiceID, f2.RowNum, f2.ProductID, f2.ProductName from f2;
INVOICEID | ROWNUM | PRODUCTID | PRODUCTNAME |
123456 | 1 | 111112 | Some good stuff |
123456 | 2 | 111114 | Some other good stuff |
123457 | 1 | 111112 | Some good stuff |
123457 | 2 | 111115 | Some more good stuff |
I’ve defined two common table expressions, F1 and F2. F1 retrieves the invoice ID as a single column and all the product information as a second column, which I named ALLROWS.
The second CTE, F2, selects the InvoiceID from F1 and also extracts the product information from the ALLROWS column in F1. The result is similar to what we would get by querying joined header and detail tables. If you select all columns from F2, you can see the XML column.
Using SQL to extracting nested data from an XML takes a little imagination, but it isn’t impossible. It isn’t even difficult.
RELATED STORY
Why not using an x-path-expression to refer to the appropriate parent element to get the Invoice ID like in the statement below:
select a.* from xmltable (‘INVOICES/INVOICE/ROWS/ROW’
Passing ( xmlparse(document get_xml_file(‘/SomeDir/INVOICE.XML’)))
columns
InvoiceID varchar(6) Path ‘../../HEADER/INVOICE_ID’,
ROWNUM integer Path ‘ROW_NUMBER’,
PRODUCTID varchar(20) Path ‘PRODUCT/PRODUCT_ID’,
PRODUCTNAME varchar(50) Path ‘PRODUCT/PRODUCT_NAME’
) as a;