SSD Performance: Be Careful Before You Buy
November 30, 2009 Doug Mewmaw
The other day I was at an office supply store picking up a flash drive for my wife. As a teacher, the inexpensive technology is just perfect for her storage needs. Did you chuckle when I said the technology was inexpensive? I remember when a flash drive cost over $50, and now they are practically giving them away. I purchased a 4 GB flash drive for less than $10! Does anyone remember what we paid for our first VCR? As a passionate golfer, I see this phenomenon in the golf club industry, too. At the beginning of the year, the top new drivers are announced with a price tag of around $500. A year later, they are all leaving the store for under $200. There is nothing better than having a technology around long enough to see the price drop dramatically, and to benefit from waiting it out.

I bring this up because I was thinking about a recent customer I’ve been working with. You see, this customer implemented solid state disks (SSDs). Right now, we can all agree that SSD is like the new iPhone craze. It’s really cool technology, but do I want to fork out that kind of money when my existing phone works just fine? With five kids in my family, every dollar counts, especially during a recession. I think it’s safe to say corporate America has the same mentality, especially when it comes to major expenditures. In my family, I don’t mind spending the money as long as I’m getting one thing back: bang for the buck. It made me wonder if my customer was getting SSD bang for the buck. Let’s look at what you need to do during the SSD process to ensure you’re getting the most for your dollar, and then measure the impact of a new SSD environment.

Identify Your SSD Jobs

For SSD implementation, the key is to see whether any jobs in your system qualify as good candidates for SSD. IBM has an analyzer tool, but anyone can do this simply by looking at your current performance data. The key is to inventory your jobs into three categories: jobs that are good candidates for SSD, jobs that may be good candidates, and jobs that are not good candidates.
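As a rough illustration of that inventory step, here is a minimal sketch in Python that buckets jobs using the disk read wait thresholds from the best practice guidelines discussed below. The job names, numbers, field names, and helper functions are all hypothetical; in practice the inputs would come from your own performance data.

# Minimal sketch: triage jobs into SSD candidacy buckets using the disk read
# wait guidelines discussed in this article. The job list and field names are
# made up for illustration; pull the real numbers from your own performance data.

from dataclasses import dataclass

@dataclass
class JobStats:
    name: str
    avg_disk_read_wait_ms: float   # average disk read wait per read
    run_time_minutes: float        # how long the job runs
    runs_per_day: int = 1          # frequency matters for short jobs

def ssd_category(job: JobStats) -> str:
    """Bucket a job by the best-practice disk read wait thresholds."""
    if job.avg_disk_read_wait_ms > 3.5:
        return "good candidate"
    if job.avg_disk_read_wait_ms >= 1.5:
        return "maybe a candidate"
    return "not a candidate"

def worth_a_look(job: JobStats) -> bool:
    """Ignore short, infrequent jobs even if their read waits look high."""
    return job.run_time_minutes >= 30 or job.runs_per_day >= 1000

if __name__ == "__main__":
    jobs = [
        JobStats("NIGHTLY_BATCH", avg_disk_read_wait_ms=4.2, run_time_minutes=120),
        JobStats("EOD_POSTING",   avg_disk_read_wait_ms=1.5, run_time_minutes=104),
        JobStats("QUICK_REPORT",  avg_disk_read_wait_ms=5.0, run_time_minutes=1),
    ]
    for job in jobs:
        flag = "" if worth_a_look(job) else " (too short/infrequent to matter)"
        print(f"{job.name}: {ssd_category(job)}{flag}")

The point of the second check is the same one made below: a high disk read wait average only matters if the job runs long enough, or often enough, for the wait time to add up.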
Here is a great real-life example:

What Are the SSD Guidelines?

First, we will start with how much a job is waiting on disk I/O and use the disk read wait average. Here are some best practice guidelines you can refer to:

Disk Read Wait Average > 3.5 milliseconds — Jobs that are good candidates for SSD.
Disk Read Wait Average 1.5 to 3.5 milliseconds — Jobs that may be good candidates for SSD.

The key is to look for jobs that not only have a lot of disk read waits, but that also run for a long time. In other words, who cares if a job has a high disk read wait average when it runs for only one minute? The only time I would be concerned with quick-running jobs is in an environment where the job runs thousands of times per day. In our real-life example, my customer simply wanted to cut down his nightly batch window, so his environment was one with the typical long-running batch jobs.

The next step for my customer was to implement the SSD environment. My customer went from 72 SAS drives to 60 SAS drives plus four SSDs. Now, let’s measure the impact of the new disk environment.

Measuring the SSD Impact

Before measuring the SSD impact, let’s first understand the job stream that was chosen for SSD. Below we see a batch job summary report. Some observations:
Next, the customer moved his files to the SSDs so we could do a before/after analysis to see if he truly got bang for the buck. Let’s look at this analysis now:
Next, let’s look at a job disk read wait average before the SSD environment was implemented:
Measuring the job’s seven intervals, we see that the disk read wait average was 1.5 milliseconds (1.494). It’s interesting that the customer chose a job that was categorized as a “maybe” in regard to the potential SSD performance gain. That is, the above graph shows that all intervals were under the 3.5 millisecond disk read wait best practice guideline. The maximum disk read wait was under the guideline as well (2.3 milliseconds). This is a great real-life example where we can see whether the customer made the right SSD decision.

Next, we look at the potential performance gain for this job: I like this graph because it illustrates what I would call a best-case scenario performance improvement. In other words, it shows exactly what can be gained with SSD. We see that, from start to finish, the job waited over 3,000 seconds due to disk waits. What does this mean? In theory, if we moved this job and its related files into an SSD environment, we have the potential to save over 50 minutes of job run time.

Since the customer’s number one goal was to shorten the end-of-day processing, it’s important to understand our baseline data. Below we see the job run time before the SSD project was implemented. The job run time statistics are as follows:

Job Started: 12:17
Job Ended: 2:01
Total Run Time: 1 hour, 44 minutes
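To make that best-case arithmetic concrete, here is a small sketch using the figures quoted above (treat them as approximate): it converts the accumulated disk read wait into a theoretical ceiling on run-time savings against the 1 hour, 44 minute baseline.

# Back-of-the-envelope check of the "best case" numbers above. If the job
# accumulated roughly 3,000 seconds of disk read wait over its run, eliminating
# all of that wait is the theoretical ceiling for SSD savings.

disk_wait_seconds = 3000            # total disk read wait over the job's run
run_time_minutes = 1 * 60 + 44      # baseline run time: 1 hour, 44 minutes

potential_savings_minutes = disk_wait_seconds / 60
ceiling_pct = 100 * potential_savings_minutes / run_time_minutes

print(f"Potential savings: ~{potential_savings_minutes:.0f} minutes")
print(f"Best-case run time reduction: ~{ceiling_pct:.0f}% of {run_time_minutes} minutes")
# Potential savings: ~50 minutes
# Best-case run time reduction: ~48% of 104 minutes

That ceiling assumes every second of disk read wait disappears, which no real configuration delivers; it simply bounds what the SSD move could possibly buy.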
Following is a job disk read wait average after the SSD environment was implemented:
Measuring the job’s seven intervals again, we see that the disk read wait average decreased to only 1.2 milliseconds, which represents a 20 percent improvement. The maximum disk read wait decreased from 2.3 milliseconds to 2.0 milliseconds (a 13 percent improvement). But did we get bang for the buck? Was the customer’s goal of decreasing the job run time, and thus shortening the end-of-day processing, met? Let’s look at the job run time stats after SSD was implemented. The job run time statistics are as follows:

Job Started: 12:01
Job Ended: 1:43
Total Run Time: 1 hour, 42 minutes

Did the Customer Get SSD Bang For the Buck?

This falls into that good news/bad news scenario. The good news is that with SSD, the disk read wait average definitely improved by 20 percent. The bad news is that the job run time only improved 1.4 percent. It only ran two minutes faster. So the answer to the question is: in this situation the customer did not receive the bang for the buck they were expecting.

However, it’s not because SSD is a bad idea. This real-life example shows how important the SSD data analysis process is. Remember, the job selected was only a “maybe” in regard to the possible performance gain, and sure enough, the performance gain was minimal. Also, remember there are a lot of factors that affect performance. Did memory change in the environment? Is there a CPU bottleneck? Are more jobs processing on the system now? To measure the impact of SSD accurately, it’s important not to make an apples-to-oranges comparison between environments. The golden rule in measuring the impact of change is simple: change one thing and measure the impact. Hopefully, the customer did that.

And for the record, I am one of those people who thought the iPhone craze was just silly. I didn’t need a fancy phone when my existing phone worked just fine. Of course, I didn’t anticipate getting an iPhone for Father’s Day. Truth be told, after having an iPhone for months, I can’t imagine life without it. I wonder if someday we’ll all be saying the same thing about SSD?

Doug Mewmaw is a “jack of all trades” IT veteran who currently is director of education and analysis at Midrange Performance Group, an i business partner that specializes in performance management and capacity planning.