Expanding the Reach of Real-time Monitoring

Best Practices, Real-Time Monitoring, SAN, SNW, VirtualWisdom

It’s been a busy and exciting few months at Virtual Instruments. At SNW Fall last month, we introduced the new high-density VirtualWisdom SAN Performance Probe. By doubling the density and supporting up to 16 Fibre Channel links per unit, the ProbeFC8-HD enables customers to monitor more of their infrastructure for less. In fact, customers can expect to reduce the cost of real-time monitoring by 25 percent and lower power consumption by 40 percent.

We also announced enhanced support for FCoE. With FCoE-specific enhancements to the current SAN Availability Probe module we’re able to deliver improved monitoring of top-of-rack FCoE switches, extending visibility into infrastructure performance, health and utilization across converged network environments.

We had the chance to meet with a number of customers, press and analysts at SNW to share our news. Check out the news and learn more about our VirtualWisdom platform, courtesy of W. Curtis Preston, Truebit.TV.

What’s Hotter Than SSDs in the Storage Market? Knowing When and How to Use Them.

Best Practices, Infrastructure Performance Management, solid state disks

According to searchstorage.com, there are over 400 companies designing and marketing solid state disks today. Wow. And according to the folks at TheInfoPro, a service of 451 Research that provides insight based on one-on-one, in-depth interviews with a proprietary network of the world’s largest buyers and users of IT, auto-tiering is perhaps the hottest storage technology today, as FC drives are being replaced with SSD and SAS blends (this figure comes from their latest TIP Insight report). So I have to ask: are our performance problems over? OK, it’s a rhetorical question. But I have to laugh when someone points to IOPS or MB/s as their performance benchmark. We’ve got plenty of examples where users were getting great utilization numbers, but their application owners were complaining of performance problems. For example, in the time-correlated dashboard below, the top graph shows R/W MB/s and the middle graph shows application latency. You probably can’t see the detail in this blog, but latency skyrockets from less than 25ms to over 150ms. Yet MB/s doesn’t seem to be affected.

So, now that we know we have a problem, the solution is to apply SSD, right?  No.  Under normal circumstances, there is no performance problem, and SSDs won’t make any difference.  The bottom graph shows a metric called Pending Exchanges.  Here, there is a very close correlation between the HBA queue depth settings and application latency.  The customer adjusted the queue depth settings and the problem was remediated.
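The queue-depth diagnosis above lends itself to a quick sanity check: if a queue-pressure metric moves in lockstep with application latency, the queue depth settings are a prime suspect. Here is a minimal Python sketch of that check; the metric names and sample values are invented for illustration and are not VirtualWisdom API calls.

```python
# Hypothetical sketch: correlating a queue-pressure metric (e.g. pending
# exchanges) with application latency sampled over the same intervals.

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Illustrative one-second samples taken during the slowdown.
pending_exchanges = [4, 6, 32, 60, 58, 61, 30, 8, 5, 4]
latency_ms        = [22, 25, 90, 148, 152, 155, 95, 30, 24, 23]

r = pearson(pending_exchanges, latency_ms)
print(f"correlation: {r:.2f}")  # a value near 1.0 points at queue depth
```

A correlation near 1.0, as in this made-up data, is the kind of signal that justified changing the queue depth settings rather than buying SSDs.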

Maximizing application performance is one of IT’s most important tasks. The problem is that accurately identifying the root causes of poor performance is difficult, especially when the wrong metrics are used. This often leads to the assumption that storage subsystems are the bottleneck, resulting in expensive upgrades that still may not solve the problem. Through the use of a real-time, systems-level Infrastructure Performance Management platform and metrics like transaction latency, IT can identify the real causes of application performance issues and fix them.  Sometimes the best solution is to re-tier your application, applying SSD technologies.  But the best practice is to use the right tools to find out where these make sense, and where they don’t.

If you’d like to learn more about how to find the true cause of performance bottlenecks, call us, or call one of our resellers, and we’ll put you together with a performance specialist who can help you figure out when SSDs can help and when they can’t.  Or check out this new report by Storage Switzerland, “How Do You Know If You Really Need SSD?”



Using VirtualWisdom to Reclaim Unused Disk Space

Best Practices, LUN, SAN performance storage i/o bottleneck, VirtualWisdom

I was talking with an independent contractor a few days ago and she mentioned that more than a few customers justify buying storage management tools by using them to find unused disk space. It’s pretty common to find allocated but unused space that often amounts to tens or even hundreds of thousands of dollars’ worth of space. Though VirtualWisdom isn’t thought of as a storage capacity monitor, by watching for I/O activity, you can easily find opportunities to reclaim unused LUNs.

Below is a step-by-step process with screen grabs to illustrate exactly how this is done.

Start your VirtualWisdom Views client, which is the administration interface for VirtualWisdom. The Views client allows you to configure VirtualWisdom, create reports, set alarms and monitor the data collected by VirtualWisdom. Since we are looking for I/O traffic to a LUN, use the SAN Performance Probe to monitor the frames/sec metric in the SCSI metric set; the screen shot is below.

Then select the LUN and storage fields.

Sort the data first by Frames/Sec, then by Storage and LUN, via the Data Groupings tab.

Then, in the Data Views tab, select the Summary Table to list each LUN, and a Trend chart to show the peak data for each period. The Trend chart is important because the Summary Table shows only the average for a period, and a small amount of traffic over a long period can average out to nearly zero. The Trend chart lets us spot these small but nonzero values.

Go to the Reports tab in the Views client. Set a period, say 30 days, and generate the report. For our small test lab, you can see that our tool found one LUN with zero activity over 30 days. With the SAN Performance Probe it’s easy to inspect the LUN and figure out why it hasn’t had any traffic.
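For readers who like to see the logic spelled out, here is a hedged Python sketch of what the report effectively computes: group samples by storage array and LUN, keep the peak Frames/Sec for each (the peak, not the average, so low-but-real traffic isn’t averaged away), and flag LUNs whose peak is zero. The data layout is hypothetical; VirtualWisdom builds this report through its GUI, not through code like this.

```python
# Illustrative sketch of the zero-activity LUN report.
from collections import defaultdict

# Hypothetical (storage_array, lun, frames_per_sec) samples over 30 days.
samples = [
    ("array-A", "LUN-01", 120.0),
    ("array-A", "LUN-01", 85.5),
    ("array-A", "LUN-02", 0.0),
    ("array-A", "LUN-02", 0.0),
    ("array-B", "LUN-07", 0.0),
    ("array-B", "LUN-07", 3.2),   # occasional traffic: not reclaimable
]

peak = defaultdict(float)
for array, lun, fps in samples:
    key = (array, lun)
    peak[key] = max(peak[key], fps)  # track the peak, not the average

idle = [key for key, p in peak.items() if p == 0.0]
print(idle)  # candidates for reclamation
```

In this made-up data, only array-A’s LUN-02 never saw a single frame, so it is the one worth investigating for reclamation.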

You can use the same report with different selection criteria to look for underutilized LUNs. It’s easy, quick, and the ROI can be substantial. For a short video of this VirtualWisdom use case, take a look below:

A Simple Strategy for Reducing Your Reactive Tickets

Best Practices, datacenter migration, VirtualWisdom

One of Virtual Instruments’ missions is to reduce the big, messy outages that happen in our customers’ datacenters. Across many sectors of the industry, we work to find ways to catch problems early and avoid outages. Numerous competitors have analytics engines and all kinds of software to detect problems before they become outages, but they are not all equal.

A few weeks ago, I was reviewing a customer’s successful datacenter migration.  Their strategy for success with VirtualWisdom is to tag all the tickets that result from our solution. The IT staff is told to work the VirtualWisdom tickets first, and the customer found that, over time, the number of reactive tickets decreased. What was happening was that VirtualWisdom found all the small problems that might otherwise be overlooked, such as increases in physical errors that precede the failure of an SFP fiber module, traffic on a link creeping past the threshold where it can no longer fail over successfully to its backup link, and misconfigured paths. VirtualWisdom quietly and diligently finds these problems, and the customer found that fixing them avoided bigger problems.

I wanted to share this policy with my blog readers because it makes a lot of sense and is simple to implement. If anyone else tries this strategy, let me know how well it works for you.

Using VirtualWisdom to De-risk a Migration / Consolidation Project

Best Practices, SAN performance storage i/o bottleneck, VirtualWisdom

I wanted to share a real-life example of how VirtualWisdom can be used to de-risk your migration and consolidation projects:

Recently, one of our customers used VirtualWisdom to help successfully migrate a datacenter, and at the same time, consolidate two mission-critical, Oracle-based applications from two older-generation storage systems to one new storage system.

Pre-migration analysis for the new data center ensured that it was “production” ready.  VirtualWisdom was used to identify naming issues in the zone configuration, a server with an incorrect multipathing configuration, queue depth configuration issues, physical layer problems, and miscellaneous performance concerns.  It’s worth noting that the physical layer issues concerned two links found to be borderline within specification at 4Gb, and several other ports found to be outside of specification at 8Gb; all were addressed before the migration occurred.  We highly recommend paying particular attention to physical layer issues when migrating to 8Gb SANs, as what worked fine at 4Gb may not work so well at 8Gb.

Before the move, the applications were benchmarked to establish a performance baseline.  During the spin-up of the new site, which occurred on a weekend when traffic was low, VirtualWisdom reported an intermittent latency issue that occurred for only a second or two every minute.  The vendor performance tool the customer was using could not detect the issue because it averaged the latency metric and was not granular enough to pick up the anomaly.  The issue was serious enough that the team had to fix it by Monday or they forecasted an outage; the fall-back plan was to re-deploy on the older storage arrays.  VirtualWisdom, which aggregates metrics to the one-second level, found a one-second process that was causing the problem.  Once the offending process was identified and remediated, the problem disappeared.  The new site went fully live, the Oracle-based applications functioned as predicted, and VirtualWisdom was able to confirm that the infrastructure performance of the new site, with the consolidated array, met its SLAs.
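The averaging pitfall in this story is easy to demonstrate with arithmetic. In the Python sketch below (the numbers are invented, not the customer’s data), a 150 ms latency spike lasting one second out of sixty barely moves the one-minute average, while per-second samples expose it immediately.

```python
# Why one-minute averaging hid a one-second anomaly (illustrative numbers).
baseline_ms, spike_ms = 5.0, 150.0

# 60 one-second samples: one spike per minute, as the customer observed.
per_second = [baseline_ms] * 59 + [spike_ms]

minute_avg = sum(per_second) / len(per_second)
per_second_max = max(per_second)

print(f"one-minute average: {minute_avg:.1f} ms")     # ~7.4 ms, looks healthy
print(f"one-second peak:    {per_second_max:.1f} ms")  # 150 ms, the real problem
```

A tool reporting only the 7.4 ms minute average would see nothing wrong, which is exactly why one-second granularity mattered here.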

For more information on how VirtualWisdom can be used to de-risk your migration and consolidation projects, check out this tech brief on private cloud de-risking:  http://www.virtualinstruments.com/files/pdfs/tech-brief-private-cloud.pdf, or this blog on private cloud migration best practices:  http://www.virtualinstruments.com/sanbestpractices/best-practices/three-steps-to-de-risking-migration-to-the-private-cloud/ or this whitepaper on datacenter consolidation best practices at: http://www.virtualinstruments.com/files/pdfs/WP_Storage-Consolidation-Best-Practices.pdf.  If you would like to talk with the customer in this story to learn more, contact your Virtual Instruments account team and they can arrange it for you.

SAN Troubleshooting Best Practices

Best Practices, bottlenecks, Real-Time Monitoring, SAN, troubleshooting

People often ask — “Are there any best practices that the troubleshooting experts recommend?” I asked a couple of our top services guys for their recommendations, and I’m sharing them with all of you today:

  1. Don’t stop looking just because you’ve removed the symptom, because if you do, you’re likely to see the same problems later. Sure, to alleviate the immediate problem, you may have to remove users or applications that are less critical, perhaps stop backups, and remove other potential bottlenecks. While this may fix the immediate problem, it often prevents the underlying cause from being discovered.
  2. Use “real” real-time monitoring for alerts that get you in front of the issues before the application users feel the pain.
  3. Sometimes you have to broaden your approach beyond what the user is reporting. If you stop there, you will often miss larger issues that may affect other, slightly less latency-sensitive apps.
  4. As a first step in triage, try to isolate whether the cause is on the server or in the SAN. Comparing your baseline Exchange Completion Time (ECT) with ECT during the slowdown will tell you immediately where to start, and where to stop looking. Your vendors will appreciate it, too.
  5. Try to find the finest granularity in your historical reporting to see which event preceded another, for cause and effect. A one-minute interval is often not sufficiently granular.
  6. Look at your historical I/O patterns, busy times of day, multipath configurations, queue depth settings, top talkers, etc. to gain a profile of behavior. Then compare with your healthy baseline, and rule out things that haven’t changed. You might find six things that appear to be going wrong, but if only one of them seems to have occurred when the problem was reported, you can focus on that one immediately. Later on, you can go back and look at the others.
  7. When changes are made to fix the incident, you should get immediate feedback. Without an immediate response, customers often take one of two approaches: 1) they delay or stagger fixes until they can determine the effect of each one; or 2) they make all the changes at the same time and are then left wondering which change fixed the problem.
  8. Lastly, ask for help sooner rather than later. We’ve heard of problems dragging on for months, vendors getting kicked out of accounts, and literally millions of dollars wasted on adding expensive hardware. Waiting days or weeks to find the root cause of a problem is unacceptable. Bring in a performance pro.
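Tip #4 above can be reduced to a tiny decision rule. The sketch below is an illustration only: the two-times factor and the sample values are assumptions I’ve made for the example, not an official triage threshold.

```python
# Hypothetical triage rule based on Exchange Completion Time (ECT).

def triage(baseline_ect_ms, current_ect_ms, factor=2.0):
    """If ECT has grown well past its baseline, start on the SAN/storage
    side; otherwise the server side is the better first suspect."""
    if current_ect_ms > baseline_ect_ms * factor:
        return "start with the SAN/storage side"
    return "start with the server side"

print(triage(baseline_ect_ms=8.0, current_ect_ms=45.0))  # SAN/storage side
print(triage(baseline_ect_ms=8.0, current_ect_ms=9.0))   # server side
```

The point is not the threshold itself but the discipline: measure ECT against a healthy baseline before pointing fingers at any one vendor.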



Controlling Over-Provisioning of Your Storage Ports

Best Practices, latency, over-provisioning, SAN, storage arrays, VirtualWisdom

While it’s generally accepted that SAN storage utilization is low, only a few industry luminaries, such as Jon Toigo, have talked about the severe underutilization of Fibre Channel (FC) SAN fabrics.  The challenge, of course, is that few IT shops have actually instrumented their SANs to enable accurate measurements of fabric utilization.  Instead, 100% of enterprise applications get the bandwidth that perhaps only 5% of them need, wasting CAPEX.

In dealing with several dozen large organizations, we have found that nearly all FC storage networks are seriously over-provisioned, with average utilization rates well below 10%.  Here’s a VirtualWisdom dashboard widget (below) that shows the most heavily utilized storage ports on two storage arrays, taken from an F500 customer.  The figures refer to “% utilization.”

Beyond the obvious unnecessary expense, the reality is that with such low utilization rates, simply building in more SAN hardware to address performance and availability challenges does nothing more than add complexity and increase risk.  With VirtualWisdom, you can consolidate your ports, or avoid buying new ones, and track the net effect on your application latency to the millisecond.  The dashboard widgets below show the “before” and “after” latency figures that resulted from the configuration changes to this SAN, using VirtualWisdom.  They demonstrate a negligible effect.
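To make the consolidation exercise concrete, here is a small Python sketch that flags consolidation candidates from per-port utilization figures like those in the dashboard widget. The port names and the 10% threshold are illustrative assumptions, and any real consolidation should be validated against application latency, as described above.

```python
# Hypothetical per-port average % utilization, as a dashboard might show.
port_utilization = {
    "array1-p0": 2.1,
    "array1-p1": 0.8,
    "array1-p2": 14.5,
    "array2-p0": 4.3,
    "array2-p1": 22.0,
}

THRESHOLD_PCT = 10.0  # assumed cutoff for "seriously underutilized"

candidates = sorted(p for p, u in port_utilization.items()
                    if u < THRESHOLD_PCT)
print(candidates)  # ports worth consolidating, pending a latency check
```

In this made-up data, three of the five ports fall under the threshold, which mirrors the pattern we see in the field: most FC ports run well below 10% utilization.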

Latency “before”

Latency “after”

Our most successful customers have tripled utilization and have been able to reduce future storage port purchases by 50% or more, saving $100K–$300K per new storage array.

For a more detailed discussion of SAN over-provisioning, click here, or check out this ten-minute video discussing this issue and over-tiering.

Eager Attendees Ready to Learn During Hands-On-Lab Sessions at Spring SNW 2012

Best Practices, Hands-On Lab, SAN, SNW, storage, VirtualWisdom

At the spring Storage Networking World (SNW) show in Dallas, I had the pleasure of teaching the hands-on lab session for VirtualWisdom with Andrew Benrey, VI Solutions Consultant, and we had a fantastic response to our “Storage Implications for Server Virtualization” session. We co-presented with Avere and HP 3PAR, and during the two-hour session we covered how to use VirtualWisdom to administer and optimize a Fibre Channel SAN, NAS optimization with the Avere appliance, and thin provisioning and reclamation using the HP 3PAR arrays.

The lab exercises covered all areas of SAN administration. The first exercise looked at how we discover and report physical layer errors. We then looked at queue depth performance, imbalanced paths, and detection of slow-draining devices using buffer-to-buffer credits. In the last exercise, we reviewed a VMware infrastructure showing the virtual machines, Fibre Channel fabric, and SCSI performance.

I found it interesting that in most of the lab sessions, many students picked the VirtualWisdom lab to start with. I believe that with the demand for proactive SAN management, more and more people are finding out about the benefits of VirtualWisdom, and came to the hands-on lab to see for themselves. Looking at the attendance numbers, our lab was sold out for most sessions; our most popular session had a sign-up list of 52 for 20 seats.  During the six sessions we conducted, we were able to talk in depth with almost 500 attendees about the need for tools like VirtualWisdom and the advantages the platform offers SAN teams working in a virtualized environment.  Attendees liked the ability to quickly walk through the infrastructure from the ESXi server down to the storage array and spot anomalies. The ability to go back in time was also important. Several customers were in the lab as part of their product evaluation.

Those of you who have seen VirtualWisdom understand how rich our user interface can be. For the lab exercises, I specifically divided up exercises so that the lab attendees had a much simpler and more easily understood interface in which to work. This turned out well as very few of the attendees needed additional help in working with the Dashboard interface.

Storage Network World Hands-On Lab Infrastructure

De-risking SAP Performance and Availability

Best Practices, SAP, VirtualWisdom

It’s no secret that many mission-critical enterprise IT implementations depend on SAP.  In 2008, the Standish Group estimated the average cost of ERP downtime at $888K per hour. If you’re an SAP user, you probably have some idea of your cost of downtime.

What’s surprising to me is that often companies still rely on massive over-provisioning to handle the database growth and ensure that their infrastructure can meet the level of performance and availability required for informal or formal Service Level Agreements.  On one level, it’s understandable, because the stakes are so high.  But we’re starting to see a trend towards better instrumentation and monitoring, because, while the stakes are high, so are the costs.

The truth is, the performance of SAP is usually not bottlenecked by server-side issues, but rather by I/O issues.  Unfortunately, most of today’s monitoring solutions, including the best-known APM solutions, have a tough time correlating your applications with your infrastructure.  The “link” between the application and the infrastructure is often inferred, or is so high-level that deriving actual cause and effect is still a guessing game.

Many of our largest customers de-risk their SAP applications using VirtualWisdom to directly correlate the infrastructure latency to their application instances.  In this simple dashboard widget (below), an application owner tracks, in real time, the application latency, in milliseconds, caused by the SAN infrastructure.

With this level of tracking and correlation, many of the largest SAP and VirtualWisdom customers have successfully de-risked their growing, mission-critical SAP deployments.

To hear our Director of Solutions Consulting Alex D’Anna discuss this issue in more detail, I encourage you to attend his 35-minute On-Demand webcast.

Spring 2012: Storage Networking World

Best Practices, Dallas, SNW, storage, virtualization, VirtualWisdom

It was great to be at the Storage Networking World (SNW) show in Dallas last week. We saw more customers sending people from both the operations and the architecture/planning groups. It’s important for operations and architecture/planning to work together on SAN infrastructure, so it was good to see this, and to hear some attendees remark that they were hired to bridge the gap between these groups.

In a panel of CIOs from medium to large companies, all agreed that staffing remains a huge issue.  No one is getting new headcount, yet the number of new technologies they have to work with continues to grow.  Some saw a solution in cross-training IT staff.  One CIO is creating “pods” where architects and planners work closely with operations.  Everyone agreed that even though training and cross-training staff often results in “poaching,” it was still worth it to have a better-trained staff.  At Virtual Instruments, we agree with this trend and see cross-domain expertise taking on a more important role. VirtualWisdom, for instance, is designed for use by everyone in the infrastructure, from the DBAs and server admins to the fabric and storage admins.

Stew Carless, Virtual Instruments Solutions Architect, held a well-attended session, “Exploiting Storage Performance Metrics to Optimize Storage Management Processes.”  In the session, Stew talked about how the right instrumentation can go a long way towards eliminating the guessing game that often accompanies provisioning decisions.

Over at the Hands-on-Lab, Andrew Benrey and I led the Virtual Instruments part of the “Storage Implications for Server Virtualization” session. We had a full house for most of the sessions and we were pleased that many of the lab attendees were familiar with Virtual Instruments before they participated in the lab.

In a real-time illustration of managing the unexpected: the big news at the show came from the U.S. National Weather Service, when a series of tornadoes ripped through the Dallas area to the east and west of the hotel. The SNW staff and the hotel did an excellent job of gathering everyone on the expo floor and sharing updates on what was happening. After a two-hour interruption, the SNW staff did a great job of getting the conference back underway. The expo exhibitors enjoyed the two hours of a captive audience!

With a couple of exceptions, many of the big vendors weren’t at SNW, which we see as a positive trend.  People come to these events to learn about new things, and frankly, the newest things come from the newest, smallest vendors.  At SNW, the floor was full of smaller, newer vendors who may not have direct sales forces that can blanket continents, but whose fresh perspectives and new approaches provided valuable insights for the SAN community.  I didn’t hear one end user complain that their favorite big vendor wasn’t there.

The next Storage Networking World show will be in Santa Clara this October. We look forward to meeting everyone again and catching up on what’s going on.


