Bufferpools and RAM Commit/Deliver


Bufferpool configuration is an often-overlooked issue because of how rarely it bites you, but in those rare cases it can be important.

A bufferpool is simply a resource limit on a collection of RAM. Typically this is a buffer, i.e. in-RAM space that cannot be swapped out because it represents in-flight transactions, uncommitted pages, or pre-fetched content that will be needed very soon.

A commit is the RAM that is offered to a process; in glibc, this can default to 2G. This doesn’t mean that every process automatically consumes 2G of RAM, but that the kernel offers up to 2G to the process. Recall that, due to sparse garbage in RAM pages, RAM offered to a process is NEVER reaped by the system back into the common pool.

A commit can be dangerous when the OS over-commits RAM in a long-lived environment: if up to 100G is offered on a system with only 32G, you can see how, as many threads grow their demand for RAM, the system will swap out some processes to meet demand. This is a typical action in a multiuser system with swap active. (NOTE: Motorola tuned their commits on smartphones because, on diskless systems, there’s obviously no swap.)

In a long-lived database process, if the bufferpool is a commit, it will soon grow to its maximum commit. It can never be swapped out unless the database has bursty use-cases and no active sessions for long periods. The configured bufferpools may be in addition to the heap space taken up by the process itself (in un-pooled resource space); the database itself may limit bufferpools, yet still consume a number of GB beyond the configured bufferpool space.
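As a loose analogy in JVM terms (this is a sketch of the commit-versus-consumption distinction only, not of any particular database), the Java runtime exposes its own committed-versus-used heap figures:

```java
public class CommitDemo {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory():   the most the JVM may ever commit (the "offer")
        // totalMemory(): heap currently committed to this process
        // used:          committed pages actually consumed; the committed
        //                remainder is held by the process, not returned
        long max = rt.maxMemory();
        long committed = rt.totalMemory();
        long used = committed - rt.freeMemory();
        System.out.printf("max=%dM committed=%dM used=%dM%n",
                max >> 20, committed >> 20, used >> 20);
    }
}
```

On a long-lived process, watching `used` creep up toward `committed`, and `committed` toward `max`, is exactly the growth pattern described above.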

The other applications on the system can also demand up to their committed RAM: why limit one process while letting the others run amok on your server?

On long-lived systems, committed RAM becomes allocated and consumed RAM. Bufferpools need to be configured, and RAM usage monitored (or at least traps/exceptions raised when a critical DB starts swapping, an indication that review is critically needed).

Bufferpools and Commit/Demand discrepancies are silent but deadly killers, like the sharks and heart-disease of the resource-management domain.

VirtualWisdom UDCs for Script Variables


I worked with a user of a SAN monitoring product called VirtualWisdom, which allows the user to define a metric for devices based on filters: basically, evaluating unrelated filter expressions in order, stopping at the first hit found, and providing a “catch-all”.

We wanted to let the SAN administrator keep migrating content off EVAs to remove an interposing virtualization product, with alerts to indicate when some content was in a critical state without the admin having to continuously poll it. We wanted to work with the user-visible response time of the SAN (i.e. keep response time below 8ms) but had to settle for Capacity/Utilization metrics.

In our case, we looked at running evaperf on a disk array showing poor throughput: as soon as the user-visible response time degrades (we call this “Exchange Completion Time”), evaperf is run against that array. The problem is that the name of the array (i.e. the parameter to evaperf that indicates which array) is uppercase, and doesn’t match the various names of users (i.e. servers named such as oradb07a2) or arrays (arrays named such as westeva8). We used a UDC:

  1. if the ITL target is “wdceva1”, UDC “EVA” value is “EVA01”
  2. if the ITL target is “edceva2”, UDC “EVA” value is “EVA02”
  3. if the ITL target is “wdceva3”, UDC “EVA” value is “EVA03”
  4. if the ITL target is “edceva4”, UDC “EVA” value is “EVA04”
  5. if the ITL target is “westeva7”, UDC “EVA” value is “EVA07” (yes, a change in naming format)
  6. if the ITL target is “easteva8”, UDC “EVA” value is “EVA08”
  7. … etc …
  8. otherwise, UDC value is “EVA00” as a catch-all
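Outside of VirtualWisdom, the same first-match-with-catch-all lookup can be sketched as an ordered map (the class and method names here are illustrative, not part of the product):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EvaUdc {
    // Ordered mapping from ITL target to the uppercase evaperf array name,
    // mirroring the UDC rules above; "EVA00" is the catch-all.
    static final Map<String, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put("wdceva1", "EVA01");
        RULES.put("edceva2", "EVA02");
        RULES.put("wdceva3", "EVA03");
        RULES.put("edceva4", "EVA04");
        RULES.put("westeva7", "EVA07"); // note the change in naming format
        RULES.put("easteva8", "EVA08");
    }

    // First match wins; anything unknown falls through to the catch-all.
    static String evaFor(String itlTarget) {
        return RULES.getOrDefault(itlTarget, "EVA00");
    }
}
```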

What this allowed us to do is create an alarm:

  1. groups by link, channel
  2. filter: EVA is not “EVA00” to avoid running the script where we don’t know the EVA
  3. if utilization exceeds 80%, run the evaperf external-script
  4. script parameters include $EVA$, which is defined as the value of the UDC “EVA” when the alarm triggers
  5. the “re-arm” stage sends an email to alert the administrator that evaperf was run and there is a report to pick up

This alarm forms the basis of an ECT-based alarm, but the metrics do not coincide, so we’re still looking at how to do that.

Currently, the administrator can apply this alarm to all switches, and as EVA utilization gets too high, the alarm automatically runs the evaperf tool to show what LUNs are being heavily used. The re-arm is set fairly long so that the running of evaperf (which we expect to cause some slowdown in overall processing) doesn’t get detected (via performance impact) as ANOTHER reason to run the evaperf script.

This alarm, as it is, allows the administrator to focus on migrating content off the EVAs, proceeding at an orderly pace until utilization on an existing link needs to be distributed off to another link; the evaperf tool tells him which LUNs to move.

SAN Aliases for WWNs from Zonesets: Voting


I had a problem:

1) the switch I’m looking at has no aliases/nicknames for WWNs
2) the zonesets include names, but no ordering
3) I needed to produce tuples of {WWN, alias}, with no duplicate WWNs or aliases, using the most likely pairs

The input looks like:

Active Zoneset:
  Zone: FAB12SW33_ORAC4_HBA0_0899_FA_4CA
    ZoneMember: 10:00:00:00:C9:7D:B5:04
    ZoneMember: 50:06:04:8A:D5:31:AC:23
  Zone: FAB12SW33_ORAC4_HBA1_0899_FA_4CA
    ZoneMember: 50:06:04:8A:D5:31:AC:23
    ZoneMember: 10:00:00:00:C9:7D:B5:05
  Zone: FAB12SW33_ORAC4_HBA1_0899_FA_13DB
    ZoneMember: 10:00:00:00:C9:7D:B5:05
    ZoneMember: 50:06:04:8A:D5:31:AC:27
  Zone: FAB12SW33_ORAC4_HBA1_0899_FA_14DB
    ZoneMember: 10:00:00:00:C9:7D:B5:05
    ZoneMember: 50:06:04:8A:D5:31:AC:27
    ZoneMember: 10:00:00:00:C9:7D:B5:04

The intended output would be like:

10:00:00:00:C9:7D:B5:04, ORAC4_HBA0
10:00:00:00:C9:7D:B5:05, ORAC4_HBA1
50:06:04:8A:D5:31:AC:23, 0899_FA_4CA
50:06:04:8A:D5:31:AC:27, 0899_FA_13DB

You can see how the Zone name has to be chopped up, and the ZoneMember items are not ordered… but because a WWN and its alias tend to co-occur across more zones than unrelated pairs do, I chose a voting algorithm:

1) the first pass simply cleans things up, chops out the Active Zoneset
2) the second pass tries to order the Zone names and normalize the format
3) the third pass breaks apart the Zone name and produces a weighted vote:

3a) the weight of a set is 100, so if there are three WWNs, each gets 33; two WWNs, 50 each
3b) each alias gets a weighted vote from each WWN
3c) vote/ballot totals are tallied in a { WWN, alias } == total_votes format
4) the fourth pass orders the tuples by highest votes first
5) the fifth pass removes dupes and outputs the tuples with the highest vote where the WWN and alias have not been seen before
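The voting passes (3a–3c) and the greedy dedupe (passes 4–5) can be sketched as follows; Java here for illustration (the real tool is awk), with zone-name parsing assumed already done, so each zone arrives as its alias fragments plus its member WWNs:

```java
import java.util.*;

public class AliasVote {
    // One zone: alias fragments parsed out of the zone name, plus member WWNs.
    record Zone(List<String> aliases, List<String> wwns) {}

    static List<Map.Entry<String, Integer>> tally(List<Zone> zones) {
        Map<String, Integer> votes = new HashMap<>(); // key: "wwn|alias"
        for (Zone z : zones) {
            int weight = 100 / z.wwns().size();       // 3a: split 100 among the WWNs
            for (String wwn : z.wwns())
                for (String alias : z.aliases())      // 3b: weighted vote per alias
                    votes.merge(wwn + "|" + alias, weight, Integer::sum); // 3c: tally
        }
        List<Map.Entry<String, Integer>> ordered = new ArrayList<>(votes.entrySet());
        ordered.sort((a, b) -> b.getValue() - a.getValue()); // pass 4: highest first
        return ordered;
    }

    // Pass 5: greedily keep the highest-vote pair whose WWN and alias are both unseen.
    static Map<String, String> resolve(List<Map.Entry<String, Integer>> ordered) {
        Map<String, String> out = new LinkedHashMap<>();
        Set<String> seenAliases = new HashSet<>();
        for (Map.Entry<String, Integer> e : ordered) {
            String[] pair = e.getKey().split("\\|");
            if (!out.containsKey(pair[0]) && seenAliases.add(pair[1]))
                out.put(pair[0], pair[1]);
        }
        return out;
    }
}
```

Run against the four sample zones above, this reproduces the intended output: the HBA1 WWN, which appears in three zones, accumulates the most votes for ORAC4_HBA1, and the singleton pairings fall out of the remaining ballots.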

The result, unfortunately, was only 500 aliases, and it took 4 hours of work; it’s completed in awk. I’m sure someone will do this in perl or Java, but awk was my portable tool. I should be able to throw zonesets from all fabrics at this same tool, including from i10k switches, in the future.

Checksums on Transfers


I used to send people checksums for downloads; recently, we seem to need to do this as a company.

Whenever a file is uploaded to the EU FTP server that I manage, the server sends me a sanity-check email something like:

size: 2628153754
user: abc_cheese
md5sum:      23fe22d1afad721740c0178b6ab842b0  /home/abccheese/backup-2010-11-29-12-21.zip
BSD sum -r:  38354 2566557
SYSV sum -s: 43793 5133113 /home/abccheese/backup-2010-11-29-12-21.zip

Processing archive: /home/abccheese/backup-2010-11-29-12-21.zip

Error: Can not open file as archive

When I was having to double-check uploads, I found it was easier if the server itself told me when the upload was finished and, better, whether the file is OK. That doesn’t necessarily say the file is complete, simply that, for certain file types, it looks sane: Zip files and 7-zips should pass a zip-defined sanity check, for example.
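An archive test like the one in that email can be sketched with java.util.zip as a stand-in for whatever archiver the server actually runs (the class and method names here are illustrative):

```java
import java.io.File;
import java.io.IOException;
import java.util.zip.ZipFile;

public class ZipSanity {
    // True if the file opens as a zip archive and its central directory is
    // readable: "does it look sane?", not "is every byte intact?".
    static boolean looksSane(File f) {
        try (ZipFile zf = new ZipFile(f)) {
            return zf.size() >= 0; // opening already parsed the central directory
        } catch (IOException e) {
            return false; // the archiver's "Can not open file as archive" case
        }
    }
}
```

A truncated or garbage upload fails to open as an archive, which is exactly the error shown in the sample email above.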

On an old FTP server, I enabled the “SITE” command to allow checksums, which I later had to optimize to return a pre-calculated checksum (using “make” logic to update on changed files) to avoid a DoS on the server when the checksum took too long to generate. The intent was to let a random user calculate the checksum of a file to confirm that the transfer was successful, reducing the possible errors when “something didn’t work out”. Like an XML schema, it confirms that “the FTP server made specific delivery: the obligation of providing a file accurately was performed”, just as a schema splits the “where did the error occur?” question in half.
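The “make” logic amounts to caching the digest alongside the file’s modification time and recomputing only when the file is newer. A minimal sketch under that assumption (class name illustrative):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

public class ChecksumCache {
    private record Entry(long mtime, String md5) {}
    private final Map<Path, Entry> cache = new HashMap<>();

    // Recompute only when the file is newer than the cached digest,
    // so repeated SITE requests return instantly instead of re-hashing.
    public String md5Of(Path file) throws IOException, NoSuchAlgorithmException {
        long mtime = Files.getLastModifiedTime(file).toMillis();
        Entry e = cache.get(file);
        if (e == null || e.mtime() < mtime) {
            MessageDigest d = MessageDigest.getInstance("MD5");
            d.update(Files.readAllBytes(file));
            StringBuilder hex = new StringBuilder();
            for (byte b : d.digest()) hex.append(String.format("%02x", b));
            e = new Entry(mtime, hex.toString());
            cache.put(file, e);
        }
        return e.md5();
    }
}
```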

In the “SantaSack” project, every file was stored by its checksum; yes, I had to do collision-avoidance in an MD5 signature store. I joined the “mysqlfs” project as a replacement for SantaSack, with the intent of developing a layer that pre-calculates MD5s and SHAs asynchronously on change, storing them as file attributes for later query. I’m still considering that for MDS on OSX.

My company is looking into checksums on transferred files, now; it seems self-gratifying in an arrogant sense to see them crossing ground I’ve been over, but I regret that I’m not better-prepared.

MD5 checksums are everywhere: UNIX (md5sum {file}), BSD derivatives of UNIX (md5 {file}), Windows, and cross-platform in Java (martin: 2010-07-28_13:49:49):

        static String getMd5Digest(String pInput) {
            try {
                MessageDigest lDigest = MessageDigest.getInstance("MD5");
                lDigest.update(pInput.getBytes()); // feed the input into the digest
                BigInteger lHashInt = new BigInteger(1, lDigest.digest());
                return String.format("%1$032X", lHashInt);
            } catch (NoSuchAlgorithmException lException) {
                throw new RuntimeException(lException);
            }
        }

All MD5 routines (both the opensource and the RSA versions) follow the same pattern: start with a basic signature, update it with a variable-length buffer of entropy, and report the resulting value. This can be done on a buffer as it’s used for other things, which I’ve done in the ftpkgadd tool, a pkgadd that worked from FTP URLs rather than filenames (connecting the socket for the GET to the inbound stream of the pkgadd decompressor). The same could be done in a layer such as a compressor that also MD5Update()s a buffer when compressed, or when written to the output stream. In this way, the checksum is ready when the archive is, at little additional cost.
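That update-as-you-go pattern maps directly onto java.security.DigestInputStream: every read feeds the digest, so the checksum is ready the moment the stream is exhausted. A sketch (names illustrative, not the original ftpkgadd code):

```java
import java.io.IOException;
import java.io.InputStream;
import java.math.BigInteger;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class StreamDigest {
    // Wraps any input stream so every read also updates the digest; the
    // checksum costs almost nothing beyond the I/O already being done.
    static String md5While(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (DigestInputStream dis = new DigestInputStream(in, md)) {
            byte[] buf = new byte[8192];
            while (dis.read(buf) != -1) {
                // hand buf to the decompressor (or any other consumer) here
            }
        }
        return String.format("%032x", new BigInteger(1, md.digest()));
    }
}
```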

MD5 is fairly ubiquitous, but sadly I don’t have much of this implemented anymore, save the upload-sanity-check on the FTP server.
