Guru: Finding Large Files With Python
July 15, 2019 Mike Larsen
It’s always a good idea to purge files that aren’t needed any longer. Chances are that you already have procedures in place to purge data from Db2 files and tables, but what about files that reside in the IFS? Do you have a good solution for keeping the IFS clean?
Perhaps you have old order files stored in the IFS. If you work for a large company, these types of files can accumulate quickly. I’ve written processes in the past to remove files from the IFS using RPG, but I’d like to offer an alternative. I’m going to show how you can use Python to search a directory in the IFS, display each file’s attributes, and then delete the file. To take it one step further, I’ll show how to select only files whose names match a pattern. In this example, I’m going to filter for text files.
This story contains code, which you can download here.
Figure 1. IFS folder with various types of documents.
With the goal set, let’s jump right into the code. I start by importing a few Python modules that help me with various tasks:
from datetime import datetime
import os
import fnmatch
The ‘datetime’ module helps me display a nicely formatted date; in this example, the last modified date. If I don’t format it, I get a raw epoch timestamp, the number of seconds since January 1, 1970 (Figure 2), which is not easily deciphered (at least not by me).
The ‘os’ module will help me read the contents of an IFS directory. At the end of the process, I also use it to delete the file from the IFS.
Finally, the ‘fnmatch’ module matches file names against a shell-style wildcard pattern, which ensures I only process text files, since that’s what I’ve chosen to do in this process.
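A caveat about ‘fnmatch.fnmatch’: it only folds case where ‘os.path.normcase’ does, which is on Windows. Python on IBM i runs in PASE and reports a POSIX platform, so as far as I can tell a pattern of ‘*.txt’ will not match a name such as ‘ORDERS.TXT’, even though the IFS root file system that holds /home is itself case-insensitive. If your files may arrive with upper-case extensions, a minimal sketch of a case-insensitive check, using a hypothetical helper named ‘is_text_file’, is to lowercase both sides and call ‘fnmatch.fnmatchcase’:

```python
import fnmatch


def is_text_file(name, pattern='*.txt'):
    """Return True if name matches pattern, ignoring case.

    is_text_file is a hypothetical helper, not part of the article's script.
    Lowercasing both the name and the pattern makes the match
    case-insensitive on every platform.
    """
    return fnmatch.fnmatchcase(name.lower(), pattern.lower())


if __name__ == '__main__':
    for name in ['orders.txt', 'ORDERS.TXT', 'orders.txt.bak', 'report.csv']:
        print(name, is_text_file(name))
```

Because the pattern is lowercased too, a caller can pass ‘*.TXT’ or ‘*.txt’ interchangeably.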
To format the date, I created a function that will be executed for each file I read in the directory.
def convert_date(timestamp):
    # Convert a POSIX timestamp (seconds since 1 Jan 1970 UTC) to a date such as '15 Jul 2019'
    d = datetime.utcfromtimestamp(timestamp)
    formatted_date = d.strftime('%d %b %Y')
    return formatted_date
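The function above works as written, but ‘datetime.utcfromtimestamp()’ is deprecated as of Python 3.12. If you run a newer Python, a drop-in variant that produces the same string with a timezone-aware ‘datetime’ might look like this (note the added ‘timezone’ import):

```python
from datetime import datetime, timezone


def convert_date(timestamp):
    """Format a POSIX timestamp (seconds since 1 Jan 1970 UTC) as, e.g., '15 Jul 2019'.

    Same output as the original version, but built on a timezone-aware
    datetime instead of the deprecated datetime.utcfromtimestamp().
    """
    d = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return d.strftime('%d %b %Y')


print(convert_date(0))  # 01 Jan 1970
```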
The next (and final) section of code does the heavy lifting. I’ll show the entire snippet of code, followed by an explanation.
# The 'with' block closes the directory iterator when the loop finishes
with os.scandir('/home/MLARSEN/test_folder/') as dir_entries:
    for entry in dir_entries:
        # Skip subdirectories and anything else that is not a regular file
        if entry.is_file():
            # Only consider text files
            if fnmatch.fnmatch(entry.name, '*.txt'):
                info = entry.stat()
                # I just picked an arbitrary file size for which to look
                if info.st_size > 207:
                    print(f'{entry.path}\t {entry.name}\t Last Modified: {convert_date(info.st_mtime)}\t Size in bytes: {info.st_size}')
                    os.remove(entry.path)
I start by scanning the directory that holds my files. In a production process, you’d likely soft code the path, but I hard-coded it here to make the example more readable.
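‘os.scandir’ only looks at the top level of the folder. If old files can also sit in subfolders, one way to cover them is ‘os.walk’. The sketch below uses a hypothetical generator named ‘find_files’ that applies the same name pattern and 207-byte threshold as the script above but only reports matches, leaving deletion to the caller:

```python
import fnmatch
import os


def find_files(top, pattern='*.txt', min_size=207):
    """Yield (path, size) for every file under top, including subfolders,
    whose name matches pattern and whose size is greater than min_size bytes.

    find_files is a hypothetical helper, not part of the article's script.
    """
    for dirpath, dirnames, filenames in os.walk(top):
        # fnmatch.filter keeps only the names that match the pattern
        for name in fnmatch.filter(filenames, pattern):
            path = os.path.join(dirpath, name)
            try:
                size = os.stat(path).st_size
            except OSError:
                # The file vanished or cannot be read; skip it
                continue
            if size > min_size:
                yield path, size


if __name__ == '__main__':
    for path, size in find_files('/home/MLARSEN/test_folder/'):
        print(f'{path}\t Size in bytes: {size}')
```

Returning matches instead of deleting them makes the search easy to test and reuse; the caller decides what to do with each path.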
Next, I loop through the directory entries and perform a few checks. I want to ensure that each entry is a file (rather than a directory or some other object) and that it’s a text file (its name ends in .txt). If both conditions are true, I grab the file’s attributes and check its size. In my example, I’m looking for files larger than 207 bytes, an arbitrary number I made up for this example. You can use whatever threshold you like, and you might also want to soft code it.
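Soft coding the folder and the size threshold can be as simple as reading them from the command line. Here is a minimal sketch with the standard ‘argparse’ module; the option names ‘--pattern’ and ‘--min-size’ are my own hypothetical choices, with defaults matching the values used in this article:

```python
import argparse


def parse_args(argv=None):
    """Read the folder, file-name pattern, and size threshold from the command line.

    The option names --pattern and --min-size are hypothetical choices for
    this sketch. With argv=None, argparse reads sys.argv.
    """
    parser = argparse.ArgumentParser(
        description='Find large files in an IFS folder.')
    parser.add_argument('folder',
                        help='IFS directory to scan, e.g. /home/MLARSEN/test_folder/')
    parser.add_argument('--pattern', default='*.txt',
                        help='shell-style file-name pattern (default: *.txt)')
    parser.add_argument('--min-size', type=int, default=207,
                        help='only report files larger than this many bytes (default: 207)')
    return parser.parse_args(argv)


# Example: parse a hard-coded argument list instead of sys.argv
args = parse_args(['/home/MLARSEN/test_folder/', '--min-size', '1024'])
print(args.folder, args.pattern, args.min_size)
```

In a real script you would call ‘parse_args()’ with no argument so it reads ‘sys.argv’, then pass ‘args.folder’, ‘args.pattern’, and ‘args.min_size’ to ‘os.scandir’, ‘fnmatch.fnmatch’, and the size check.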
I print some of the file attributes to the terminal, then delete the file using ‘os.remove’. That’s it! With a few lines of code, I’ve built a very powerful process.
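Because ‘os.remove’ is permanent, you may want a way to preview what would be deleted, and you may not want one locked or already-removed file to stop the whole run with an exception. A minimal sketch of both ideas, using a hypothetical ‘purge’ function whose ‘dry_run’ flag defaults to reporting only:

```python
import os


def purge(paths, dry_run=True):
    """Delete each path, or only report it when dry_run is True.

    purge and dry_run are hypothetical names for this sketch. A file that
    cannot be removed (missing, locked, no authority) is reported and
    skipped so the rest of the cleanup continues.
    Returns the list of paths that were actually removed.
    """
    removed = []
    for path in paths:
        if dry_run:
            print(f'Would delete: {path}')
            continue
        try:
            os.remove(path)
        except OSError as err:
            print(f'Could not delete {path}: {err}')
        else:
            removed.append(path)
    return removed
```

One way to use it: run once with ‘dry_run=True’, review the list, then run again with ‘dry_run=False’.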
Now that the script is built, it’s time to test it out. To run Python scripts, I like to use SSH Terminal from ACS (Figure 3).
When I use SSH Terminal, it opens a PuTTY session for me where I can execute my script (Figure 4).
When I execute the script, it finds three files that meet my criteria. For illustration, I print their attributes to the terminal before the process deletes them. When I go back to view the IFS from ACS (Figure 5), I see that the three files have been deleted.
With a small amount of code, I built a very productive piece of software. Using RPG for this task, as I have in the past, worked just fine, but Python made it a lot easier and the code is much more concise. The complete code for the Python script used in this article is available for download.