Guru: IBM i Save File Compression Options
April 1, 2019 Michael Sansoterra
As I finished populating some test tables with a large volume of data on a small, transient IBM i partition in the cloud, I thought life was good. But my countenance fell as I realized that the tables plus the OS hogged over 70 percent of the disk space. I wondered how to get all the data into a single save file for safekeeping.
The buzzer in my mind was loud and clear: it ain’t gonna work, you don’t have enough room. As I loathed the thought of using multiple save files to save my test data, I remembered that most save commands have a data compression (DTACPR) parameter. I had never used it, so I decided to try it with Save Library (SAVLIB) to see how well it worked. I executed SAVLIB with DTACPR(*HIGH) and was pleased that the compression was good enough to let me save the entire test library with about 7 percent of storage to spare on the system.
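For reference, the save looked something like the following (the library and save file names here are illustrative, not my actual objects):

CRTSAVF FILE(QGPL/TESTSAVF)
SAVLIB LIB(TESTLIB) DEV(*SAVF) SAVF(QGPL/TESTSAVF) DTACPR(*HIGH)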
IBM offers three software compression levels (*LOW, *MEDIUM, and *HIGH) in addition to the *NO and *YES values. Shown below is how IBM’s documentation describes each option (the emphasis is mine):
- *NO — No data compression is performed.
- *YES — If the save is to tape and the target device supports compression, hardware compression is performed. If compression is not supported, or if the save data is written to optical media or to a save file, software compression is performed. Low software compression is used for all devices except optical DVD, which uses medium software compression.
- *LOW — If the save operation is to a save file or optical, software data compression is performed with the **SNA** algorithm. Low compression is usually faster and the compressed data is usually larger than if medium or high compression is used.
- *MEDIUM — If the save operation is to a save file or optical, software data compression is performed with the **TERSE** algorithm. Medium compression is usually slower than low compression but faster than high compression. The compressed data is usually smaller than if low compression is used and larger than if high compression is used.
- *HIGH — If the save operation is to a save file or optical, software data compression is performed with the **LZ1** algorithm. High compression is usually slower and the compressed data is usually smaller than if low or medium compression is used.
These are all older compression algorithms, and LZ1 was the only one I had heard of.
I decided to go back and compare the available compression options. I used the Save Object (SAVOBJ) command to save an 8GB CUSTOMER table into a save file as follows:
SAVOBJ OBJ(CUSTOMER) LIB(MYDATA) DEV(*SAVF) OBJTYPE(*FILE) SAVF(QGPL/MYSAVF) CLEAR(*REPLACE) DTACPR(*NO)
I cleared and re-used the same save file (SAVF) with each test. The results for each variation of the data compression option are shown in the table below:
| DTACPR Option | Avg CPU Utilization | SAVOBJ Duration (mm:ss) | SAVF Size (bytes) | % of Original Size |
|---------------|---------------------|-------------------------|-------------------|--------------------|
| *NO | 4% | 5:41 | 8,774,656,000 | 100.0% |
| *HIGH | 40% | 13:11 | 5,687,762,944 | 64.8% |
| *MEDIUM | 33% | 10:46 | 5,701,132,288 | 65.0% |
| *LOW | 14% | 3:42 | 6,383,755,264 | 72.8% |
This test was done on a POWER9 cloud partition running IBM i 7.3 with two vCPUs, 4GB of RAM, and 200GB of disk.
The average CPU utilization in the table isn’t a high-precision metric; it was basically me eyeballing the Work with System Activity (WRKSYSACT) command and watching the average CPU utilization over time. Even though the system wasn’t doing much besides these save tests, there is still some CPU cost to run everything: this machine varied between 0.5 percent and 1.5 percent while “idle.” The majority of the CPU was definitely due to the compression operation.
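If you want to reproduce this kind of test, the save file size can at least be read from CL rather than eyeballed. A minimal sketch, reusing my QGPL/MYSAVF save file:

CLRSAVF FILE(QGPL/MYSAVF) /* Empty the save file between runs */
SAVOBJ OBJ(CUSTOMER) LIB(MYDATA) DEV(*SAVF) OBJTYPE(*FILE) SAVF(QGPL/MYSAVF) DTACPR(*HIGH)
DSPOBJD OBJ(QGPL/MYSAVF) OBJTYPE(*FILE) DETAIL(*FULL) /* Shows the resulting save file size */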
The table demonstrates that requesting the *HIGH or *MEDIUM compression levels can be quite expensive in terms of CPU, though admittedly this machine only had two vCPUs. Even so, you would certainly want to make sure your system has enough CPU capacity before running a save command (SAVnnn) with one of these compression options.
For my customer table, there wasn’t much space savings between *HIGH and *MEDIUM compression (only about 0.2 percent). While the *LOW option wasn’t as efficient at saving space (by about 8 percent compared to *HIGH), it was the fastest of all the methods, including no compression at all. If time is of the essence, beware: the *HIGH and *MEDIUM options took quite a bit longer than a save without compression.
Of course, your results may vary depending on how conducive your data objects are to compression. Data with many repetitive elements typically compresses well. Admittedly, my test CUSTOMER table had a bunch of random characters in it, so the odds are you can expect a better compression ratio for “normal” data.
I decided to do a secondary test to see how well “compressible” data, such as a large plain text file, would do. I downloaded the free list of Great Britain postal codes from the Geonames.org website, unzipped it to /tmp/GB_full.txt on the IBM i, and used the Save (SAV) command to save this text file from the IFS to a save file:
SAV DEV('/qsys.lib/qgpl.lib/mysavf.file') OBJ(('/tmp/GB_full.txt')) CLEAR(*REPLACE) DTACPR(*HIGH)
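For completeness, restoring the text file from the save file back to the IFS would look something like this (an untested sketch using the same names):

RST DEV('/qsys.lib/qgpl.lib/mysavf.file') OBJ(('/tmp/GB_full.txt'))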
This table shows the resulting save file sizes for each data compression (DTACPR) option:
| File Description | File Size (bytes) | % of Original Size |
|------------------|-------------------|--------------------|
| Uncompressed file | 173,821,160 | 100.0% |
| Zip file (original download) | 13,946,129 | 8.0% |
| Save file, no compression | 184,705,024 | 106.3% |
| Save file, high compression | 18,907,136 | 10.9% |
| Save file, medium compression | 32,538,624 | 18.7% |
| Save file, low compression | 173,039,616 | 99.6% |
I did not include duration or CPU utilization for this test because the elapsed time of the save operation wasn’t significant. I’m glad I did this test, because the results differ quite a bit from the first test in how well the various compression levels performed.
Zip compression was the clear winner compared to IBM i’s older compression algorithms. Keep in mind that you can use the jar command in QSHELL for zipping and unzipping IFS files, as shown below. If you don’t mind searching the internet, a number of utilities supporting other archive and compression formats (including 7z and tar) can also be used from QSHELL to compress IFS files, if getting a significant size reduction or sharing data without a save file is your primary goal. If needed, you could always place the zip file into a save file to have the best of both worlds!
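As a rough sketch (the paths are illustrative, and the jar tool requires Java to be installed on the partition), zipping and unzipping from CL via QSHELL might look like this:

QSH CMD('jar -cfM /tmp/gb_full.zip -C /tmp GB_full.txt') /* Zip the text file */
QSH CMD('cd /tmp && jar -xf /tmp/gb_full.zip') /* Extract it again into /tmp */
SAV DEV('/qsys.lib/qgpl.lib/mysavf.file') OBJ(('/tmp/gb_full.zip')) CLEAR(*REPLACE) /* Optionally wrap the zip in a save file */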
Unlike the first compression test, there was quite a bit of difference between the resulting file sizes for the different compression types. Whereas *LOW compression was quite useful in the prior test, with the plain text file *LOW accomplished almost nothing.
In conclusion, when saving data to a save file, it pays to experiment to gauge the cost (CPU utilization and duration) versus the benefit (disk space savings) of a particular data compression option. Don’t forget, the optimal setting will depend on your data set (for example, program objects vs. table data and journal receivers, plain text vs. binary data, etc.), so remember to test each variation. If you’re only concerned with compressing IFS data, then other compression options are available.
It is worth considering that a smaller saved object can result in faster recovery times and a lot less storage for multiple backup copies. Using compression is one answer.
Choosing not to save access paths (the ACCPTH parameter on the save commands) can also dramatically reduce save time, backup storage, and therefore CPU time, albeit at the expense of a lot of extra time and CPU when restoring, since the access paths must be rebuilt. This may be OK for a small test system.
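For example, a sketch with illustrative names, combining compression with skipping access paths:

SAVLIB LIB(TESTLIB) DEV(*SAVF) SAVF(QGPL/TESTSAVF) DTACPR(*HIGH) ACCPTH(*NO)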
In my experience, though, recovery time is usually a more precious resource than CPU.
Also, for the CL fans: at 7.2 and later there are the CPYTOARCF and CPYFRMARCF commands for zipping and unzipping files.
I used CPYTOARCF on V7R3 and it zipped a library to the IFS. However, CPYFRMARCF does not allow me to nominate a library to unzip the files into; it gives the error “CPFA0A2 Information passed to this operation was not valid”. The TODIR parameter seems to accept only a directory name, which is useless for restoring library objects.
The help text for the command gives this example:

CPYFRMARCF FROMARCF('/MYDIR/MyArchiveFile.zip')
           TODIR('/QSYS.LIB/MYLIB.LIB/')
           RPLDTA(*YES)

but the command doesn’t seem to want to do that. I cannot find any information on this problem from Google or IBM, other than comments from others who have the same problem.