Guru: SQL and QTEMP
January 4, 2021 Ted Holt
Hey, Ted.
For many years IBM i developers, operators, and others have taken advantage of an operating system feature called the QTEMP library. Through the years you have referenced it various times in IT Jungle as a useful feature of IBM i. I recently read an article in which a respected IBM expert from the Rochester lab services team told people not to use the QTEMP library when working in SQL. It would be very interesting if you could dive into this topic and explain when/if there is still a time and place for QTEMP. I think a lot of people would read this.
— Bill
I know these lab services guys, having attended conferences at which they spoke. I have the highest respect for them. I listen to them speak every opportunity I get and I read everything they write. I take everything they say seriously. I have learned so much from them. I can’t say enough good things about them.
An important part of their work is helping IBM customers with performance problems. This is the context in which they are speaking about QTEMP. They’ve seen performance problems that were directly caused by the use of QTEMP. To be more specific, it is the use of scratch tables in QTEMP that they are warning us against. Among the reasons they give are that (1) creating a table and loading it with data are slow operations, and (2) tables in QTEMP are not usually indexed. Instead, we should use other techniques, in particular the use of views, common table expressions, and nested table expressions (also called derived tables), all three of which are different ways to implement a SELECT statement.
Here’s an example. Let’s say that someone I serve wants to know who owes us money, how much money each customer owes us, the percentage a customer’s debt is of the combined debt of the customers in the same state, and the percentage a customer’s debt is of the total debt. To get the percentages requires summing up all debts. In days of yore, I might have used a non-SQL equivalent of the following SQL code to create two scratch tables in QTEMP.
declare global temporary table StateTotals as
   (select state, sum(baldue) as StateBalDue
      from qiws.qcustcdt
     group by state)
   with data;

declare global temporary table GrandTotal as
   (select sum(StateBalDue) as GrandBalDue
      from session.StateTotals)
   with data;

select d.state, d.cusnum, d.lstnam, d.city, d.baldue,
       case when s.StateBalDue is null or s.StateBalDue = 0
            then 0
            else dec(round(d.baldue / s.StateBalDue,4) * 100,5,2)
       end as PercentOfState,
       dec(round(d.baldue / g.GrandBalDue,4) * 100,5,2) as PercentOfTotal
  from qiws.qcustcdt as d
  join session.StateTotals as s
    on d.state = s.state
 cross join session.GrandTotal as g
 order by d.State, PercentOfTotal desc;
Do I get the correct answers? Certainly. But that doesn’t mean that I should continue to use this tried-and-true technique. Modern computers are faster, memory is more abundant and cheaper, and SQL has gotten more and more powerful.
The two scratch tables are easily replaced with common table expressions of the same names.
with StateTotals as
   (select state, sum(baldue) as StateBalDue
      from qiws.qcustcdt
     group by state),
GrandTotal as
   (select sum(StateBalDue) as GrandBalDue
      from StateTotals)
select d.cusnum, d.lstnam, d.city, d.state, d.baldue,
       case when s.StateBalDue is null or s.StateBalDue = 0
            then 0
            else dec(round(d.baldue / s.StateBalDue,4) * 100,5,2)
       end as PercentOfState,
       dec(round(d.baldue / g.GrandBalDue,4) * 100,5,2) as PercentOfTotal
  from qiws.qcustcdt as d
  join StateTotals as s
    on d.state = s.state
 cross join GrandTotal as g
 order by d.State, PercentOfTotal desc;
Performance is not a concern for this small example, as file QCUSTCDT only has a dozen records, but if the file were much larger, the second version would run noticeably faster than the first one.
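The third technique mentioned above, the nested table expression (derived table), gets the same result by placing the subqueries directly in the FROM clause. Here is a sketch of the same query written that way, equivalent to the common table expression version:

```sql
select d.cusnum, d.lstnam, d.city, d.state, d.baldue,
       case when s.StateBalDue is null or s.StateBalDue = 0
            then 0
            else dec(round(d.baldue / s.StateBalDue,4) * 100,5,2)
       end as PercentOfState,
       dec(round(d.baldue / g.GrandBalDue,4) * 100,5,2) as PercentOfTotal
  from qiws.qcustcdt as d
  join (select state, sum(baldue) as StateBalDue
          from qiws.qcustcdt
         group by state) as s
    on d.state = s.state
 cross join (select sum(baldue) as GrandBalDue
               from qiws.qcustcdt) as g
 order by d.state, PercentOfTotal desc;
```

The common table expression version is easier to read when the same expression is referenced more than once, but the two are logically interchangeable.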
One very common use of scratch tables is in chains of queries. That is, one or more queries build one or more scratch tables, often in QTEMP, but not necessarily there, before producing the final output. This problem is easily avoided by ceasing to use Query for IBM i, but Query is the software that refuses to die. I’ve already written about this topic, so if you want to know how to replace query chains with common table expressions, see my article Common Table Expressions Can Replace Query Chains.
I’m not going to presume to give my approval to what the lab services guys have said. That would be like Jack Benny approving the violin playing of Itzhak Perlman. Instead, I can only add to the discussion by addressing the question from a different perspective.
Let’s back up a step. What is the purpose of QTEMP? Just this: to allow two or more jobs to run the same program at the same time without interfering with one another. This is not a new problem. I faced it in my S/34 and S/36 days, as files were global to the system. This will look familiar to old-timers.
// LOAD #GSORT
// FILE NAME-INPUT,LABEL-HISTORY
// FILE NAME-OUTPUT,LABEL-SORT?WS?,RECORDS-?F'A,HISTORY'?
// RUN
The ?WS? expression retrieves the workstation ID. Two users could run in parallel because the scratch files had different names. Imagine my joy when I started programming the S/38 and had the QTEMP library for such tasks.
When IBM says that you shouldn’t use SQL with QTEMP, they are talking from the perspective of performance. They are not addressing concurrent program execution. They don’t need to. If you use a view, common table expression, or nested table expression instead of scratch files, no two jobs can possibly interfere with one another! Magnificent!
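A view is the simplest illustration of this. Create it once in a permanent library and every job shares the same definition, with no scratch object to collide over. A minimal sketch (the library and view names here are my own invention):

```sql
-- Created once; the optimizer expands the view into each query that uses it
create or replace view MyLib.StateTotals_V as
   select state, sum(baldue) as StateBalDue
     from qiws.qcustcdt
    group by state;
```

Any number of jobs can join to such a view at the same time, because nothing job-specific is ever written anywhere.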
The question, then, boils down to this: Are there times when SQL queries won’t serve the purpose, and scratch tables are unavoidable? I think so. I can’t give an exhaustive list, but here are a few situations that come to mind:
- Reading an outfile produced by a CL command. I’m most grateful for the IBM i Services and DB2 for i Services that IBM provides. They give us SQL interfaces to the operating system. Sometimes there is no service that retrieves the information I want, so I use an outfile, which I typically place into QTEMP.
- Joining to a one-row table for row selection. I’ve been using this technique since my S/36 days. A more modern technique would be to use global variables. I haven’t tried to compare performance of the two techniques.
- Subsetting data for use by several queries in a job stream. Suppose a batch job stream runs several queries, all of which need a certain subset of data. Depending on the situation, I might put that data into a scratch table to avoid selecting the same data time and time again.
- Dealing with bad database design. The problem with the examples I and other writers use in articles and presentations is that our examples reflect a perfect world, not the reality many of us have to deal with day after day. Many of us are forced to work with what I call a “doo-doo base”, a set of tables and/or physical files in which data is improperly stored. Over the years I have seen some bizarre ways to shoehorn data into files. I’ve seen numerous violations of first, second, and third normal forms. As I wrote a few months ago, I’ve seen many repurposed fields in the applications I’ve worked on. If you regularly work with a doo-doo base, you may have no choice but to run one or two or three or more RPG programs and/or SQL queries to build a scratch table, which you can then query using SQL. If you don’t need concurrent execution, you can build the scratch table, along with needed indexes, in a permanent library. If you do need to support the possibility of concurrent execution, QTEMP is a good place to store such a table.
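To illustrate the first point, here is the outfile pattern I mean, sketched with DSPOBJD, though any outfile-producing CL command works the same way (the library and outfile names are placeholders of my own invention):

```
DSPOBJD OBJ(MYLIB/*ALL) OBJTYPE(*PGM) OUTPUT(*OUTFILE) OUTFILE(QTEMP/PGMLIST)
```

Then query the outfile with SQL, e.g. `select odobnm, odobtx from qtemp.pgmlist order by odobnm`. Because the outfile lives in QTEMP, two jobs running this at once cannot step on each other’s lists.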
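As for the second point, the modern global-variable alternative looks something like this sketch (the variable name and library are my own invention):

```sql
-- Created once; each job or session gets its own private value
create or replace variable MyLib.Cutoff_Amount decimal(7,2) default 0;

-- Set the value for this job, then reference it like a column
set MyLib.Cutoff_Amount = 100;

select cusnum, lstnam, baldue
  from qiws.qcustcdt
 where baldue >= MyLib.Cutoff_Amount;
```

The variable object is shared, but its value is per-session, which gives the same concurrency safety the one-row QTEMP table provided.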
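For the last two points, the scratch-table pattern itself is plain SQL. A sketch, with invented table, index, and column choices:

```sql
-- Build the subset once in QTEMP instead of reselecting it in every query
create table qtemp.OpenItems as
   (select cusnum, state, baldue
      from qiws.qcustcdt
     where baldue > 0)
   with data;

-- An index helps if the later queries select or join on this column
create index qtemp.OpenItems_IX on qtemp.OpenItems (state);

-- Subsequent queries in the job stream read the subset
select state, count(*) as Customers, sum(baldue) as StateBalDue
  from qtemp.OpenItems
 group by state;
```

Since QTEMP vanishes when the job ends, there is no cleanup step, and no other job can see the table.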
Those are some things that come to mind. I don’t have any hard rules about when to use QTEMP and when not to use it. I can say that thanks to techniques such as views, common table expressions, and nested table expressions, I use QTEMP in SQL queries much less than I used to.
Keep in mind, too, that performance is relative. Is five seconds too long for a query to run? It depends on the context. If it’s in the overnight batch process, it’s almost certainly acceptable. If it’s on the server end of a request from a Web browser, it’s probably much too slow. Not every query has to run optimally. Good enough is good enough. The IBM guys kindly brought this matter of QTEMP to our attention because the use of QTEMP was the cause of performance problems. That doesn’t mean that every query that uses QTEMP is a problem and needs to be accelerated.
In short, QTEMP is a wonderful tool. It would be foolish not to use it at all. I subscribe to the idea that a master of his craft knows his tools, the capabilities, strengths, weaknesses, advantages and disadvantages of each one, and is able to make an informed decision when doing a task. In other words, use some common sense.
I welcome your comments below or in email. It would be especially helpful to all of us if the database masters from IBM would add their comments.
RELATED STORIES
Using Table Expressions (IBM Knowledge Center)
Guru: Common Table Expressions Can Replace Query Chains
Ted,
Well said. QTEMP and SQL are both useful tools, but like all tools, they need to be used wisely. I’ve seen both good and poor examples of each through the many years that I’ve been using them.
I also agree with you on the IBM Lab Services Team and the work that they do. There is a cost vs. benefit trade-off with every project as far as performance is concerned.
>>>> I recently read an article in which a respected IBM expert from the Rochester lab services team told people not to use the QTEMP library when working in SQL. <<<<
The article Bill refers to was about the negative performance impact of accessing data in QTEMP with SQL, which applies only when a large amount of data is being accessed, or joined and accessed. If the data being accessed from QTEMP is not too large and the hardware involved is fast (e.g., SSD in place of spinning disk, a POWER8 or newer CPU), then it is generally fine to use SQL to get data from tables in QTEMP.