Guru: Creating Excel Spreadsheets With Python
April 29, 2019 Mike Larsen
Over the past several years, I’ve seen a growing demand to provide business data in Excel. Some shops are content to receive their data in .csv format and then manually apply formatting to make it presentable. Wouldn’t it be better if we could provide a spreadsheet that is already formatted?
In my opinion, the answer is yes. Most recently, I’ve been using PHP to accomplish this task. PHP works great for me, but I always like to have different options from which to choose. This is important to me because some shops may not have PHP installed. Based on that fact, I decided to see what Python has to offer in this area.
This story contains code, which you can download here.
I’m making the assumption you have the latest version of Python running on your IBM i. For this example, I’m using Python 3.6. You’ll also need to have the ‘ibm_db_dbi’ package and ‘xlsxwriter’ module installed as seen in Figure 1.
Figure 1. Import ‘ibm_db_dbi’ and ‘xlsxwriter’
import ibm_db_dbi as db2 from xlsxwriter import Workbook
I start by creating a connection to the database and declaring a cursor (Figure 2). It’s important to note that the connection will default to the current user’s credentials if no parameters are specified.
Figure 2. Create a connection and declare cursor
conn = db2.connect() cursor = conn.cursor()
Next, I construct and execute my query (Figure 3). I’m using a sample employee table that is installed on my system, but you can use any table you like.
Figure 3. Query over the employee table
salaryStart = 25000.00 salaryEnd = 75000.00 query = ("Select Empno, FirstNme, LastName, Job, Salary from sample.employee " "Where Salary between ? and ? ") cursor.execute(query, (salaryStart, salaryEnd))
This is a simple query where I select employees within a certain salary range. I use parameter markers for the salary range variables to help prevent SQL injection.
After I execute my query, I need to store the results somewhere prior to loading them into Excel. This is done by using a list. A list in Python is a collection of data elements. I create a list for each of the data elements in my query (Figure 4), then load them by looping through my cursor (Figure 5).
Figure 4. Create a list for each data element
#--------------------------------------------------------------------- # create the lists #--------------------------------------------------------------------- employeeNumbers = [] firstNames = [] lastNames = [] jobTitles = [] salaries = []
Figure 5. Load the lists
#--------------------------------------------------------------------- # loop thru and load the lists #--------------------------------------------------------------------- for row in cursor: employeeNumbers.append(row[0]) firstNames.append(row[1]) lastNames.append(row[2]) jobTitles.append(row[3]) salaries.append(row[4])
Now that I have all the data I need, I can begin working with Excel. I start by creating a workbook (Figure 6).
Figure 6. Create a workbook
with Workbook('employee_listing.xlsx') as workbook:
As indicated, the name of my Excel workbook is ’employee_listing.xlsx’.
I chose to set some properties on my Excel workbook to give additional information about this workbook. This is accomplished by using the ‘set_properties’ method of xlsxwriter (Figure 7).
Figure 7. Set properties to the Excel spreadsheet
#----------------------------------------------------------------- # Set some properties to the workbook. #----------------------------------------------------------------- workbook.set_properties({ 'title': 'This is an example spreadsheet', 'subject': 'With document properties', 'author': 'Mike Larsen', 'manager': 'Lexie', 'company': 'Central Park Data Systems', 'category': 'Example spreadsheets', 'keywords': 'Sample, Example, Properties', 'comments': 'Created with Python and XlsxWriter', 'status': 'Final', })
Before I load the data to the worksheet, I set up formatting objects (Figure 8) so I can apply them to the spreadsheet. There are many ways to format the data; I’m only showing a few of them here.
Figure 8. Create format objects
#----------------------------------------------------------------- # Set up some formatting and text to highlight the panes. #----------------------------------------------------------------- header_format = workbook.add_format({'bold': True, 'align': 'center', 'valign': 'vcenter', 'fg_color': '#D7E4BC', 'border': 1}) center_format = workbook.add_format({'align': 'center'}) money_format = workbook.add_format({'num_format': '$#,##0.00'}) bold_format = workbook.add_format({'bold': True})
I created a format that is applied to my header row (header_format), as well as other formats that I apply to various cells.
Before I actually load the data, I need to create a worksheet to hold it (Figure 9).
Figure 9. Create a worksheet
ws = workbook.add_worksheet() ws2 = workbook.add_worksheet()
I’m finally ready to write the data to the worksheet. This is accomplished by writing the lists I created earlier into the worksheet columns (Figure 10). Note that I’m starting in row 2 as I want to add a header row to my worksheet.
Figure 10. Write the lists to worksheet columns
#--------------------------------------------------------------------- # write the lists to columns ws.write_column('A2', employeeNumbers) ws.write_column('B2', firstNames) ws.write_column('C2', lastNames) ws.write_column('D2', jobTitles) ws.write_column('E2', salaries, money_format)
Since column E contains salaries, I apply the ‘money_format’ to that column to make the data more presentable.
In the next step, I add a header row (Figure 11) and apply the ‘header_format’ to it.
Figure 11. Add a header row
# row, column, text, format ws.write(0, 0, 'Employee number', header_format) ws.write(0, 1, 'First name', header_format) ws.write(0, 2, 'Last name', header_format) ws.write(0, 3, 'Job title', header_format) ws.write(0, 4, 'Salary', header_format)
You may have wondered why I created two worksheets earlier. I did that because I wanted to demonstrate how you can change the tab colors on the worksheets. I do this by setting the tab color (Figure 12).
Figure 12. Change worksheet tab color
ws.set_tab_color('green')ws2.set_tab_color('red')
At this point, I have a really nice-looking spreadsheet. However, I want to demonstrate how to apply conditional formatting to individual cells or a range of cells. Conditional formatting allows me to use criteria over the cell contents to apply various types of formatting. In this example, I apply a red color to salaries that are greater than or equal to $44,000.00 and a green color to salaries that are less than $44,000.00 (Figure 13). The conditional formatting is applied to column ‘E’ as that is the column that has the salary information. I hard code the cell range in here, but in a production script you may want to soft code it.
Figure 13. Apply conditional formatting
# Add a format. Light red fill with dark red text. format1 = workbook.add_format({'bg_color': '#FFC7CE', 'font_color': '#9C0006'}) # Add a format. Green fill with dark green text. format2 = workbook.add_format({'bg_color': '#C6EFCE', 'font_color': '#006100'}) # Write a conditional format over a range. if salary >= $44,000.00 ws.conditional_format('E2:E26', {'type': 'cell', 'criteria': '>=', 'value': 44000, 'format': format1}) # Write another conditional format over the same range. if salary < $44,000.00 ws.conditional_format('E2:E26', {'type': 'cell', 'criteria': '<', 'value': 44000, 'format': format2})
With the Python script complete, it’s time to run it and see the spreadsheet it produces. Python scripts can be run by calling QP2TERM from a command prompt (Figure 14) on the green screen or by using SSH (Figure 15). For SSH, I’m using PuTTY.
My script ran successfully and I have a nicely formatted Excel spreadsheet (Figure 16).
This article showed how to query a DB2 table in Python and produce a nicely formatted Excel spreadsheet. Although most of the code used is shown throughout the article, the full version of the Python script is available for download.
thanks for this Mike, i’ll surely give it a try,
your usual fan ReynaldoDandreb.
Mike – is there any particular benefit to building entire columns in memory and then writing them as opposed to producing a row at a time from the SQL result set? It just seems to me that this could cause memory issues with large result sets. Plus of course you can’t have group totals in the results very easily.