A Few Excel Export to CSV Tips
January 27, 2010 Michael Sansoterra
Comma separated variable (CSV) files are frequently used when exporting DB2 data for use with Microsoft Excel. However, because CSV files contain text without any additional formatting instructions, it can be somewhat time-consuming for users to format their worksheets whenever they get a new export. This tip will address a few of the shortcomings of using CSV files with Excel and present a few possible workarounds to make the lives of your Excel user community easier. Many of these ideas are only applicable to CSV exports intended for Excel usage. If the export file is dual purpose (one for users and another for a computer system), the ideas may not be applicable. Also, many of these tips will only work if you have the ability to manipulate the data programmatically (with RPG, COBOL, SQL, etc.) before or during the creation of the CSV file. (Note: If you’re comfortable with Java and are looking for a flexible, programmatic solution for creating native Excel worksheets complete with formatting, check out the Apache POI project, which is called the Java API for Microsoft Documents. This site contains Java classes that enable developers to read and write native Excel and other Microsoft Office file formats.) Display Issues with Numbers It is quite common for CSV export files to contain text columns that contain numeric data. Unfortunately, Excel pays no attention to the fact that these columns should be treated as text fields and instead formats the data as it sees fit. Two often witnessed problems with these “text” columns are: the display of large numbers in scientific notation, and the removal of leading zeros that are intended for display. For example, a large GL account number such as “1000084002402” will display something like 1E+12 or 1.00008E+12 (depending on the width of the column). This scientific notation format is a pain to read and users end up re-formatting the column as text. Zip codes are an example of string data that may contain leading zeros. For instance, some Massachusetts Zip codes have a leading zero: 02351. But there’s nothing more annoying than these Zip codes showing up in a spreadsheet without the leading zero! The solution to resolve both these issues is to simply prefix a single quote (‘) in front of the data column in the CSV file. So if you want Excel to leave a number like this alone: “001236412” Simply put it in the CSV file as: “‘001236412” This will tell Excel to display the number as is. Display Issues with Embedded Double Quotes In CSV files, the most common string delimiter is the double quote (“). An item description within a CSV file may look like this: “Red Bicycle” A description for 12-inch PVC pipe might look like this: “12” PVC Pipe” Having the delimiter embedded within the description causes Excel to misinterpret where the data for the column begins and ends. In the above PVC description, Excel will interpret the data as two columns: a column with “12” and another column containing “PVC Pipe.” One way to fix this problem is to replace the double quote with a rare substitute character such as a tilde when creating the CSV file. However, once the user opens the CSV file in Excel, he will have to do a replace operation (use Control+H) to replace the substitute character with the double quote in order to return things to normal. A better solution is to “double up” each double quote within the data itself. Excel will interpret this “doubled up” quote as one quote character instead of a delimiter: “12” PVC Pipe” If you have the liberty to use SQL to manipulate your data table before exporting it to CSV, this operation becomes a piece of cake by simply using the REPLACE built-in function with an UPDATE statement: Update MyTable Set MyData=Replace(MyData, '"', '""') The one caveat here is that the MyData column definition may need to be expanded in order to hold a few additional characters for the storage of the extra quotes. Trimming Spaces Extraneous spaces in CSV files make it cumbersome for a user to work with the data. No one likes editing a cell only to have to backspace over a large number of trailing spaces. Trimming extraneous spaces will save file space and reduce some drudgery for users. If you create CSV files using the CPYTOIMPF command, don’t overlook the fact that this command has a “remove blank parameter” for trimming leading and/or trailing spaces: RMVBLANK(*BOTH). Formatting Column Widths One big pet peeve I have when working with CSV files in Excel is that they usually need some sort of formatting when they’re first opened. One of the most time consuming tasks is sizing the columns to fit the data. An easy way to automatically size the columns once the spreadsheet is open is to click on the handle that sits in the upper-left corner of the spreadsheet (to the left of the A column and above the first row) as shown in Figure 1.
Single clicking on this handle will highlight the entire spreadsheet. Then, let your mouse hover on the line that separates the A and B column headers. When hovering over this line, the mouse icon will change to the crosshair symbol. When it does, simply double-click the line between the two column headers and watch all the columns in the spreadsheet dynamically resize to fit the data. Keep in mind the entire sheet must be selected for this behavior to work for all columns. This trick also works for resizing all rows: simply repeat the process for the columns except double-click the line between the indicators for row one and two. Implementing these few tips can add up to substantial time savings for users as they’ll be spending much less time trying to fix up their spreadsheets. Michael Sansoterra is a DBA for Broadway Systems in Grand Rapids, Michigan. Send your questions or comments for Mike via the IT Jungle Contact page.
|