Tech Notes » SAN performance storage i/o bottleneck

Using VirtualWisdom to Reclaim Unused Disk Space

Best Practices, LUN, SAN performance storage i/o bottleneck, VirtualWisdom

I was talking with an independent contractor a few days ago, and she mentioned that more than a few customers justify buying storage management tools by using them to find unused disk space. It’s pretty common to find allocated but unused capacity worth tens or even hundreds of thousands of dollars. Though VirtualWisdom isn’t usually thought of as a storage capacity monitor, by watching for I/O activity you can easily find opportunities to reclaim unused LUNs.

Below is a step-by-step process with screen grabs to illustrate exactly how this is done.

Start your VirtualWisdom Views client, which is the administration interface for VirtualWisdom. The Views client allows you to configure VirtualWisdom, create reports, set alarms, and monitor the data VirtualWisdom collects. Since we are looking for I/O traffic to a LUN, use the SAN Performance Probe to monitor the Frames/Sec metric in the SCSI metric set; the screen shot is below.

Then select the LUN and Storage fields.

On the Data Groupings tab, sort the data first by Frames/Sec, then by Storage and LUN.

Then, in the Data Views tab, select the Summary Table to list each LUN, and a Trend chart to show the peak value for each period. The Trend chart is important because the Summary Table shows only the average for a period, and a small amount of traffic over a long period can average out to zero; the Trend chart lets us spot those values.

Go to the Reports tab in the Views client, set a period, say 30 days, and generate the report. In our small test lab, the tool found one LUN with zero activity in 30 days. With the SAN Performance Probe it’s easy to inspect the LUN and figure out why it hasn’t had any traffic.
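VirtualWisdom generates this report for you, but the underlying logic is easy to illustrate. Here is a minimal Python sketch, assuming you have exported per-LUN Frames/Sec samples to a CSV (the file name and column names below are hypothetical, not a VirtualWisdom format):

```python
import csv
from collections import defaultdict

# Hypothetical export: one row per sample with columns
# "storage", "lun", and "frames_per_sec".
peak = defaultdict(float)  # max Frames/Sec observed per LUN

with open("lun_frames_per_sec_30d.csv") as f:
    for row in csv.DictReader(f):
        key = (row["storage"], row["lun"])
        peak[key] = max(peak[key], float(row["frames_per_sec"]))

# A LUN whose *peak* is zero over the whole window is a reclaim
# candidate. Checking the peak, not just the average, mirrors the
# Trend-chart step above: rare bursts that average out to ~0 still
# show up as a nonzero peak.
for (storage, lun), p in sorted(peak.items()):
    if p == 0.0:
        print(f"Reclaim candidate: storage={storage} lun={lun}")
```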

You can use the same report with different selection criteria to look for underutilized LUNs. It’s easy, quick, and the ROI can be substantial. For a short video of this VirtualWisdom use case take a look below:

Using VirtualWisdom to De-risk a Migration / Consolidation Project

Best Practices, SAN performance storage i/o bottleneck, VirtualWisdom

I wanted to share a real-life example of how VirtualWisdom can be used to de-risk your migration and consolidation projects:

Recently, one of our customers used VirtualWisdom to help successfully migrate a datacenter, and at the same time, consolidate two mission-critical, Oracle-based applications from two older-generation storage systems to one new storage system.

Pre-migration analysis for the new data center ensured that it was “production” ready. VirtualWisdom was used to identify naming issues in the zone configuration, a server with incorrectly configured multipathing, queue-depth configuration issues, physical layer problems, and miscellaneous performance concerns. It’s worth noting that the physical layer issues involved two links that were borderline within specification at 4Gb and several other ports that were outside of specification at 8Gb; all were addressed before the migration occurred. We highly recommend paying particular attention to physical layer issues when migrating to 8Gb SANs, as what worked fine at 4Gb may not work so well at 8Gb.

Before the move, the applications were benchmarked to help tune performance. During the spin-up of the new site, which occurred on a weekend when traffic was low, VirtualWisdom reported an intermittent latency issue that occurred for only a second or two every minute. The vendor performance tool the customer was using could not detect the issue because it averaged the latency metric and was not granular enough to pick up the anomaly. The issue was serious enough that the team had to fix it by Monday or they forecasted an outage; the fall-back plan was to re-deploy on the older storage arrays. VirtualWisdom, which aggregates metrics down to the one-second level, identified a one-second process that was causing the problem, and once the offending process was remediated, the problem disappeared. The new site went fully live, the Oracle-based applications functioned as predicted, and VirtualWisdom confirmed that the infrastructure performance of the new site, with the consolidated array, met its SLAs.
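The averaging pitfall in this story is easy to reproduce. Here is a minimal Python sketch with synthetic numbers (not the customer’s data): a 500 ms spike lasting one second out of every sixty all but disappears in a per-minute average, yet is obvious at one-second granularity.

```python
# Synthetic one-hour latency trace: ~1 ms baseline, with a 500 ms
# spike for one second out of every sixty (hypothetical numbers).
samples_ms = [500.0 if s % 60 == 0 else 1.0 for s in range(3600)]

# One-minute averaging (what the averaging tool effectively sees):
per_minute = [sum(samples_ms[m * 60:(m + 1) * 60]) / 60 for m in range(60)]
print(f"worst per-minute average: {max(per_minute):.1f} ms")  # ~9.3 ms, looks tame

# One-second granularity (what a one-second aggregator sees):
print(f"one-second peak:          {max(samples_ms):.1f} ms")  # 500.0 ms, the real spike
```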

For more information on how VirtualWisdom can be used to de-risk your migration and consolidation projects, check out this tech brief on private cloud de-risking: http://www.virtualinstruments.com/files/pdfs/tech-brief-private-cloud.pdf, this blog on private cloud migration best practices: http://www.virtualinstruments.com/sanbestpractices/best-practices/three-steps-to-de-risking-migration-to-the-private-cloud/, or this whitepaper on datacenter consolidation best practices: http://www.virtualinstruments.com/files/pdfs/WP_Storage-Consolidation-Best-Practices.pdf. If you would like to talk with the customer in this story to learn more, contact your Virtual Instruments account team and they can arrange it.

Understanding IOPS, MB/s, and Why They Aren’t Enough

Best Practices, SAN, SAN performance storage i/o bottleneck, VirtualWisdom

People often don’t understand why their performance monitors fail to either predict or find performance problems. The full answer could take a book, but a simple first step is understanding what IOPS is telling you, and why, in an FC SAN, you need to look at frames per second.

I/Os per second, or IOPS, is commonly recognized as a standard measurement of performance, whether measuring a storage array’s back-end drives or the performance of the SAN. IOPS varies based on a number of factors, including a system’s balance of read and write operations; whether the traffic is sequential, random, or mixed; the storage drivers; the OS background operations; and even the I/O block size.

Block size is usually determined by the application, with different applications using different block sizes for various circumstances. For example, Oracle will typically use block sizes of 2 KB or 4 KB for online transaction processing, and larger block sizes of 8 KB, 16 KB, or 32 KB, for decision support system workload environments. Exchange 2007 may use an 8 KB block size, SQL may use a minimum of 8 KB, and SAP may use 64 KB, or even more.

In addition, when IOPS is used as a measure of performance, it’s standard practice to look at throughput (MB/sec) as well, because the two stress different parts of the infrastructure. For example, an application with only 100 MB/sec of throughput but 20,000 IOPS may not cause bandwidth issues, but with so many small commands the storage array is put under significant pressure, as its front-end and back-end processors have an immense workload to deal with. Alternatively, if an application has a low number of IOPS but significant throughput, such as long sustained reads, the pressure falls on the bandwidth of the SAN links. Even understanding this relationship, MB/sec and IOPS are still insufficient measures of performance if you don’t take frames per second into consideration.
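The arithmetic behind that example is simply throughput = IOPS × I/O size. A quick sketch, using the values from the paragraph above:

```python
def throughput_mb_per_sec(iops: float, io_size_kb: float) -> float:
    """Throughput implied by an IOPS rate at a given I/O size."""
    return iops * io_size_kb / 1024.0

def avg_io_size_kb(throughput_mb: float, iops: float) -> float:
    """Average I/O size implied by a throughput/IOPS pair."""
    return throughput_mb * 1024.0 / iops

# The 100 MB/sec @ 20,000 IOPS example implies ~5 KB commands:
# lots of work for the array's processors, little load on the links.
print(avg_io_size_kb(100, 20_000))        # 5.12 KB per I/O

# Conversely, long sustained 256 KB reads need few IOPS to load the
# links: ~1,600 IOPS is already 400 MB/sec of bandwidth.
print(throughput_mb_per_sec(1_600, 256))  # 400.0 MB/sec
```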

Why is this? Let’s look at the FC frame. A standard FC frame has a data payload of approximately 2K, so an application with an 8K I/O block size requires 4 FC frames to carry that data; in this instance, one I/O is 4 frames. To get a true picture of utilization, looking at IOPS alone is not sufficient, because I/O sizes differ widely between applications, ranging from 2K up to 256K.
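In code form, the frame count for a given I/O size is just a ceiling division by the payload size. A small sketch using the simplified ~2K payload from the text (the actual FC data field is up to 2,112 bytes and varies with headers and optional fields):

```python
import math

FC_PAYLOAD_BYTES = 2048  # simplified "approx 2K" payload from the text

def frames_per_io(io_size_bytes: int) -> int:
    """Number of FC data frames needed to carry one I/O."""
    return math.ceil(io_size_bytes / FC_PAYLOAD_BYTES)

for kb in (2, 8, 64, 256):
    print(f"{kb:>3} KB I/O -> {frames_per_io(kb * 1024)} frames")
# 2 KB -> 1, 8 KB -> 4, 64 KB -> 32, 256 KB -> 128
```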

Looking at a metric such as the ratio of MB/sec to frames/sec, as displayed in this VirtualWisdom dashboard widget, we get a better picture and understanding of the environment and its performance. On this graph of the MB/sec to frames/sec ratio, the line should never fall below 0.2 on the y-axis, the level that corresponds to the 2K data payload.

If the ratio falls below this, say to the 0.1 level, as in the widget below, we know that data is not being passed efficiently, even though throughput, as measured in MB/sec, is being maintained.
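A rough sketch of the same check, assuming the ratio is normalized so that full ~2 KB payloads read as 0.2, to match the widget’s y-axis (the exact scaling VirtualWisdom uses isn’t documented here, so treat this as an assumption):

```python
FULL_PAYLOAD_BYTES = 2048   # "approx 2K" FC data payload
FULL_PAYLOAD_RATIO = 0.2    # assumed y-axis value for completely full frames

def payload_ratio(mb_per_sec: float, frames_per_sec: float) -> float:
    """MB/sec-to-frames/sec ratio, scaled so full 2 KB frames score 0.2."""
    bytes_per_frame = mb_per_sec * 1024 * 1024 / frames_per_sec
    return FULL_PAYLOAD_RATIO * bytes_per_frame / FULL_PAYLOAD_BYTES

# Healthy link: every frame carries a full 2 KB payload.
print(round(payload_ratio(100, 51_200), 2))  # 0.2

# Suspect link: same frame rate, half the data moved. Frames are
# half-empty on average, hinting at management frames, not data.
print(round(payload_ratio(50, 51_200), 2))   # 0.1
```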

This lets you proactively identify when a large share of the traffic is management frames rather than data, busily reporting on the physical-layer device errors that are occurring.

Without taking frames per second into consideration, and without insight into its ratio to MB/sec, it’s easy to believe that everything is OK and that data is being passed efficiently, since you see lots of traffic. In actuality, all you might be seeing are management frames reporting a problem. By ignoring frames per second, you run the risk of needlessly prolonging troubleshooting and increasing OPEX, simply by failing to identify the root cause of the performance degradation of your critical applications.

For a more complete explanation, and an example of how this applies to identifying slow-draining devices, check out this short video.

 
