May 18

After thrashing with sed, awk, and various other attempts to cleanly edit XML, I kept thinking “why don’t I use xpathset?”

xpathset is a tool based on an example in xmlsoft — I’ve got a copy on my cnp-tools project — but it runs on Linux, and my employer uses Windows for all supported installs of its Java-based product. That seems a non-happener.

Sometime last night, I remembered that we are a Java company, and I can compile and share Java code without incurring support issues, additional compiler toolchains, licenses, etc., while still leveraging the underlying strength of my coworkers where needed.

From 7pm to 9pm I built XPathSet.java, using XPathTool.java, to get behavior similar to xpathset; it took another 4 hours to clean up and document, but it’s available if you would like to re-use it. Although I didn’t open up xpathset while I was working, I’m certain I reused the same concepts I used in that tool. Since xpathset is based on an xmlsoft example (hence its MIT license), I should license this as MIT too.

In this example, the steps run in sequence:

  1. load input.xml
  2. register “textfile.txt” as the replacement value
  3. search for the XPath “//ScanTask[@name='scanExample1']/@file” and replace all matches with “textfile.txt”
  4. write the result to output.xml
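
For comparison only, here is roughly how the same four steps look on a box where xmlstarlet happens to be installed. This is an assumption on my part and not the Java tool described here; set_xpath_value is a made-up helper name. It writes through a temporary file so that input and output may be the same path.

```shell
# set_xpath_value INPUT OUTPUT XPATH VALUE
# Sketch using xmlstarlet (assumed installed); not the XPathSet.java tool.
set_xpath_value() {
    local input="$1" output="$2" xpath="$3" value="$4" tmp
    tmp=$(mktemp) || return 1
    # "ed -u XPATH -v VALUE" updates every node matching XPATH
    if xmlstarlet ed -u "${xpath}" -v "${value}" "${input}" > "${tmp}"; then
        mv "${tmp}" "${output}"
    else
        rm -f "${tmp}"
        return 1
    fi
}

# Usage, following the four steps:
#   set_xpath_value input.xml output.xml \
#       "//ScanTask[@name='scanExample1']/@file" "textfile.txt"
```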

The example I did this for is a filename fix in an Axeda EDD_TEXT.xml file. It is possible to read and write the same file, effecting an in-place edit. With Axeda there may be a timing issue, though: I’ve found that the file occasionally cannot be written.

The side-effect of the underlying javax.xml.transform technology used is that the attributes are alphabetized on the way in or out of the DOM, so don’t be too surprised if your attributes are re-ordered. Also, indentation may change.

May 09

Recently, we had a strange situation where certain critical users could not log into an FTP server. Of course, Icinga is helping me to check this going forward:

First, define a service check:

define service{
use bidaily-service
host_name ftp.example.com
service_description FTP Login ftp.example.com-scott
check_command check_ncftpls!'ftp://scott:tiger@ftp.example.com/'
notifications_enabled 0
}

Next, catch that odd case when the script itself is missing (in the past, the payload of Nagios packages has added/dropped parts that I need):

define servicedependency{
dependent_host_name ftp.example.com
dependent_service_description FTP Login ftp.example.com-scott
host_name localhost
service_description Runnable check_ncftpls
execution_failure_criteria w,c,u
notification_failure_criteria w,c,u
}
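
The “Runnable check_ncftpls” service that this dependency points at isn’t shown here. A minimal sketch of what such a check could look like, assuming the usual Nagios plugin exit codes (0 = OK, 2 = CRITICAL); check_runnable and the plugin path in the usage line are hypothetical:

```shell
# check_runnable PLUGIN_PATH
# Prints a one-line status and returns a Nagios-style code:
# 0 (OK) if the plugin exists and is executable, 2 (CRITICAL) otherwise.
check_runnable() {
    local plugin="$1"
    if [ -x "${plugin}" ]; then
        echo "OK ${plugin} is runnable"
        return 0
    fi
    echo "CRITICAL ${plugin} is missing or not executable"
    return 2
}

# Usage inside a plugin script (path is an assumption):
#   check_runnable /usr/lib64/nagios/plugins/check_ncftpls; exit $?
```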

Finally, the script itself:

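#!/bin/bash
# Nagios/Icinga plugin: exit 0 (OK) if ncftpls can list the given URL, else 2 (CRITICAL)

# locate ncftpls; report CRITICAL if it is missing or not executable
NCFTPLS=$(which ncftpls) || { echo "FAIL ncftpls not found"; exit 2; }
test -x "${NCFTPLS}" || { echo "FAIL ${NCFTPLS} not runnable"; exit 2; }

# try the listing and discard its output; only the exit status matters
"${NCFTPLS}" "$@" >/dev/null 2>&1 && { echo "OK"; exit 0; }

echo "${NCFTPLS} failed"
exit 2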

Now, I could’ve/should’ve used the hostname in the check itself, but I was more interested in just getting it there. I will probably clean it up someday, make it more reusable, but there it is.

Note that I did not establish a dependency on the ncftpls-bearing package itself in my RPM hierarchy, simply because it’s perfectly fine for the “runnable” check to fail: the script itself will never hit the FTP server until the script is safely runnable. Sure, it’s listed as a failure, but it’s a choice against a huge dependency that typically brings in 100 packages of inconsistent perl and such (hey, “just hit cpan”, they’ll do that in datacenters, sure).

May 07

I’m updating my LDAP patch for Nagios based on the most-recent release; I’m also doing it as a git repo so that it’s reusable in a more independent way.

First, there are a few non-LDAP-specific changes needed:
1) commit 06d6ca4e7dfc44b1f93dcd836625ec20a1bbc3f1 — use true/false rather than only 0/1 for booleans
2) commit b37f9f5cbc8cc93796ec68d7f7359634eca56ed3 — propagates EPOCH and BROKER build flags through specfile

Next, there are LDAP-specific changes:
1) commit 561f2521aac88244694dcd0ea264acaa3c6796a2 — read in the LDAP-based config as described in http://wiki.nagios.org/index.php/LDAP-Configured_Nagios

This is all available in git://git.chickenandporn.com/nagios.git

I haven’t ported over my test-harness, so it’s fairly unknown code right now. I’m using it, but shifting back to Icinga.

May 04

I was creating a dropbox for photo-import, and I found that I could not select iPhoto’s “Auto Import” folder for sharing.

Instead, I found that “creating an alias” (ie a softlink) gave me the solution:

  1. control-click or right-click the “iPhoto Library” (in your Pictures folder), “Show Package Contents” to see inside
  2. Navigate to the “Auto Import” directory
  3. Right-click Auto Import, “Make Alias”, and drag the “Auto Import Alias” to your desktop or home directory
  4. System Preferences, “Sharing”, check the “File Sharing” to activate sharing
  5. Click the “+” under the list of shared folders to add a folder, and navigate to your “Auto Import Alias” — but don’t click OK
  6. Clicking a second time on “Auto Import Alias” will turn the filename in the top of the browser to “Auto Import” — click OK now
  7. On the newly-added “Auto Import” item, select “Everyone” in the “Users” column, and make sure “Write Only (dropbox)” is selected in the third column to make sure no one can read your photos before they’re imported

What you’ve done is “follow symlinks” — follow the Alias to its origin — much like a spawning salmon seeks the streamhead where it hatched (yeah, a sex analogy, but it’s not a sex blog!)

Effectively, the “alias” or “softlink” or “symlink” has allowed you to access a folder that MacOSX probably doesn’t intend you to… software guys would say I was exposing a protected/private member function (“exposing”, “member function” — I swear it’s not a sex blog).

Now, you can sit on a remote system and drag-n-drop photos to the import folder. iPhoto will not import them unless it’s running; if it’s not running, it’ll import on next startup.

NOTE: allowing anyone to drag-n-drop import files to your photos on a portable laptop might be a risky idea. “seriously, officer, that childporn is not mine”.

Mar 10

In looking for a quick Red-Amber-Green check for NTP, I’ve found that “ntpdc -c sysinfo” gives about as much as needed in a first-pass:

  • system peer: 0.0.0.0 ; (less than 15 min): AMBER: service starting
  • system peer: LOCAL(0) : RED: server is self-drifting
  • system peer: example.com ; stratum: [less than 10]: GREEN: peers reached, clock converging

RED

A Failed/Unreliable state for NTP would be when no peers are reached, and either NTP has no peer, or is using LOCAL(0) (127.127.1.0)

system peer: LOCAL(0)
system peer mode: client
stratum: 11
reference ID: [127.127.1.0]

An AMBER state that persists longer than 15 minutes should be considered a “cannot reach any peer” state, a RED state.

AMBER
When NTP has gone temporarily unusable, but should return to service, I would post an AMBER. Since there’s no state-tracking (unless monitoring with Nagios or Icinga) we cannot check whether “it was just OK, now it’s bad”, so all we can do is track the startup condition.

When starting, and until 8 × 64 s = 512 s (about 8.5 min) later, NTP will show:
system peer: 0.0.0.0
system peer mode: unspec
stratum: 16

Unfortunately, there’s no clear indication that 8 cycles of 64 seconds have gone past, meaning that no peers are reachable beyond this infancy/restart period.

Unfortunately, stratum==16 seems to persist past associating (at least on Win2008r2 Enterprise on a VM)

GREEN
Clearly, “green” status is when NTP has reached peers, is associated to one, and is gradually converging the clock:

system peer: time7.apple.com
system peer mode: client
stratum: 3
reference ID: [17.151.16.23]
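
Pulling the three states together, a first-pass classifier could look something like the sketch below. It reads “ntpdc -c sysinfo” output on stdin and only looks at the “system peer” and “stratum” lines. The function name classify_ntp is made up, and two choices are my own assumptions: missing output is treated as RED, and a real peer with stratum 10 or higher is treated as AMBER. It does not track the 15-minute limit; that needs state kept elsewhere.

```shell
# classify_ntp: read "ntpdc -c sysinfo" output on stdin, print RED, AMBER or GREEN
classify_ntp() {
    local key value peer="" stratum=""
    while IFS=: read -r key value || [ -n "${key}" ]; do
        # trim leading and trailing whitespace from the value
        value="${value#"${value%%[![:space:]]*}"}"
        value="${value%"${value##*[![:space:]]}"}"
        case "${key}" in
            "system peer") peer="${value}" ;;
            "stratum")     stratum="${value}" ;;
        esac
    done
    case "${peer}" in
        ""|"LOCAL("*|127.127.*)
            echo "RED" ;;      # no sysinfo at all (assumption) or local clock only
        "0.0.0.0")
            echo "AMBER" ;;    # still starting; the caller decides when 15 min is up
        *)
            # a real peer: GREEN only when the stratum is below 10
            if [ "${stratum:-16}" -lt 10 ] 2>/dev/null; then
                echo "GREEN"
            else
                echo "AMBER"   # assumption: high or unreadable stratum is not yet trusted
            fi ;;
    esac
}

# Usage:
#   ntpdc -c sysinfo | classify_ntp
```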

Feb 28

It’s well-documented that Windows7 is a serious backstep in networking core, even missing some enhancements and fixes that are in Vista. Yeah, Vista.

Of course, you can’t tell that to Windows people because it works fine with the $1000 of other software Microsoft will sell you. … so even though your DAV works fine for everything non-microsoft, and for XP, Vista, and earlier, “Windows7 is the latest, greatest, ever” is the party-line. Toe it. Just make your DAV work.

So here’s what works for me. A few notes:

  1. a location of “/” doesn’t work on Windows, so even though Cadaver and MacOSX work against it, Windows will not. Add a path.
  2. BasicAuth is a non-happener: Windows7 doesn’t react to an unauthenticated second action failing (401) after it’s authenticated once in its lifetime. Use Digest. Of course, that means no AuthUserFile /etc/shadow
  3. All shares from a certain hostname need to be the same user/pass (Windows7 cannot understand different shares under different passwords) so use CNAMEs and VirtualHosts

My Config (a virtualhost, of course!)

<VirtualHost *:80>
ServerAdmin username@example.com
ServerName serv1.example.com

DocumentRoot /home/serv1
ErrorLog logs/error_log
CustomLog logs/access_log vhost
# there is a DAVLockDB elsewhere in my config

# from http://www.perlcode.org/tutorials/sysadmin/mod_dav.txt
  BrowserMatch "^WebDAVFS/1.[012]" redirect-carefully
  BrowserMatch "Microsoft Data Access Internet Publishing Provider" redirect-carefully
  BrowserMatch "^Microsoft-WebDAV-MiniRedir/" redirect-carefully
  BrowserMatch "^WebDrive" redirect-carefully
  BrowserMatch "^WebDAVFS" redirect-carefully

# from http://www.debian-administration.org/articles/279
<IfModule !mod_headers.c>
LoadModule headers_module   modules/mod_headers.so
</IfModule>
  Header add MS-Author-Via "DAV"

# Location / and /shared are exactly the same, but only "/shared" is accessible from Windows
<Location />
   DAV on

   <LimitExcept PROPFIND OPTIONS>
      AuthType Digest
      AuthName "Server1 Access"
      AuthDigestProvider file
      AuthDigestDomain /shared/
      AuthUserFile  /etc/server1/DAV.htdigest
      AuthGroupFile /etc/server1/DAV.groups
      Require group server1group
   </LimitExcept>
</Location>

<Location /shared>
   DAV on

   <LimitExcept PROPFIND OPTIONS>
      AuthType Digest
      AuthName "Server1 Access"
      AuthDigestProvider file
      AuthDigestDomain /shared/
      AuthUserFile  /etc/server1/DAV.htdigest
      AuthGroupFile /etc/server1/DAV.groups
      Require group server1group
   </LimitExcept>
</Location>

</VirtualHost>

Also, the documentation doesn’t describe the “Realm” so well. To match this configuration, the htdigest command to use is (to add the user “scott”):

First time:
htdigest -c /etc/server1/DAV.htdigest 'Server1 Access' scott

After the file’s been created:

htdigest /etc/server1/DAV.htdigest 'Server1 Access' scott
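
The realm matters because each line of the htdigest file stores an MD5 hash of user:realm:password, so the realm typed on the command line has to match AuthName exactly. A small sketch that builds the same kind of line by hand, assuming md5sum (GNU coreutils) is available; htdigest_line is a made-up helper, and “tiger” is only a sample password:

```shell
# htdigest_line USER REALM PASSWORD
# Prints one line in the user:realm:hash layout used by htdigest files,
# where hash = MD5("user:realm:password").
htdigest_line() {
    local user="$1" realm="$2" password="$3" hash
    hash=$(printf '%s:%s:%s' "${user}" "${realm}" "${password}" | md5sum | cut -d' ' -f1)
    printf '%s:%s:%s\n' "${user}" "${realm}" "${hash}"
}

# Usage (prints a line; compare it with what htdigest wrote):
#   htdigest_line scott 'Server1 Access' tiger
```

Change the realm and the hash changes too, which is why a mismatched AuthName makes every login fail.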

Also, don’t forget to chown -R apache /home/serv1/

Users should connect to this share using http://serv1.example.com:80/shared (note: use “:80” on XP to force the older handler). Windows7 users should:

  1. use the Windows Explorer,
  2. choose “My Computer”,
  3. choose the “Map Network Drive” that shows up with “My Computer” selected,
  4. type in “http://serv1.example.com:80/shared” (no trailing “/”),
  5. select “login with other credentials” and click “next”
  6. carefully, give the username (ie “scott”) and the password
  7. the filesystem should be available. test by creating a file, creating a directory, and moving the test file into the test directory
  8. unmounting is done by restarting the system
  9. If you type your password incorrectly, you must restart your system to retry

Mac Users just Command-K, type in “http://serv1.example.com” or “http://serv1.example.com:80/” or “http://serv1.example.com/shared” (it doesn’t matter with Mac, it just works) and when connecting, they’ll be asked for the authentication details if it’s not in their keychain. Otherwise, it’ll just mount the filesystem, easy-peasy. Unmounts like any ejected disk or filesystem.

Jan 26

Previously, I posted that a quick way to throttle back your timemachine is to set your defaults:

I’m currently testing TimeMachineScheduler, which would give more control over the configuration. The magic is how the author has worked around this problem:
1) disable automatic backup
2) enable scheduled backup from launchd:
/Library/LaunchDaemons/com.klieme.TimeMachineScheduler.plist

Literally, he’s created a launchd process that wakes up at the user-specified interval, and (presumably) if the time is within the “Skip backup between” values, his app must short-circuit the timemachine run and shut itself down until the next “wake” interval.
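
I haven’t read his code, so this is only a guess at the shape of that short-circuit: a check that the current time falls inside the “Skip backup between” window, including windows that wrap past midnight. The names to_minutes and in_skip_window are made up, and the actual backup trigger is left out because I don’t know what he calls.

```shell
# to_minutes HH:MM -> minutes since midnight
to_minutes() {
    local h="${1%%:*}" m="${1##*:}"
    # strip one leading zero so "08" and "09" are not read as octal
    h="${h#0}"; m="${m#0}"
    echo $(( ${h:-0} * 60 + ${m:-0} ))
}

# in_skip_window NOW START END (all HH:MM); returns 0 when NOW is in [START, END)
in_skip_window() {
    local now start end
    now=$(to_minutes "$1"); start=$(to_minutes "$2"); end=$(to_minutes "$3")
    if [ "${start}" -le "${end}" ]; then
        [ "${now}" -ge "${start}" ] && [ "${now}" -lt "${end}" ]
    else
        # window wraps past midnight, e.g. 22:00 to 06:00
        [ "${now}" -ge "${start}" ] || [ "${now}" -lt "${end}" ]
    fi
}

# Usage at each wake-up:
#   if in_skip_window "$(date +%H:%M)" 08:00 18:00; then exit 0; fi
#   ...otherwise start the backup (trigger not shown)
```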

I’m impressed.

The old method was a defaults write com.apple.something.something 3 (for a three-hour throttle), which is a bit difficult for non-geeks to type and behaves differently: it skips the TimeMachine run if the previous run was within the given number of hours. Compare the two with a 3-hour period for both and a skip window of 08:00-18:00. With the defaults method, a change would be backed up within 1.5 hrs (3/2) on average, but backups would slow down your user-experience during the day. With the scheduler method, backups run at regular intervals outside the skip window; the defaults method, by contrast, doesn’t always (ever) work correctly.

As an aside, I’m also curious whether I would get better throughput offering the timemachine disk as an iSCSI Target

While researching, I also came across a method of limiting TimeMachine from filling the volume

Jan 15

Wow, it’s possible to never have to reformat my resume again?

Apparently, a 5-year-stale project called “XML Resume” gets us a little closer to this impossible goal, but support may be challenging. Also, generating formats from the common XML base requires some dependencies (as can be expected).

Other attempts are listed at MicroFormats, but if I can get work that lets me actually build things, I won’t need a resume for a while. I have a great company, but I spend a lot of time on disposable work and/or sitting in airports, hotels, car rentals.

Actually, MicroFormats — which lists the HackCollege content — is a great place to start. …so that makes this entry merely a redirection to that blog entry. Mostly. But mine’s better because it’s written in Lisp. Or whatever.

Jan 13

I’ve been having a problem on an FTP server that may well be resolved using FUSE, curl, and autofs, plus some blind rsync:

When I sync content across the Atlantic to another server, it’s always truncated despite my best efforts. It seems that if there’s too much latency, the Windows-based FTP service chokes and dies, and the upload merely stops. FTP is a bit of a dumb protocol: it doesn’t really notice that an upload ended due to an abort rather than an orderly finish unless the command channel reflects this; if the ftp-data socket closes, it’s done.

I’ve asked for rsync service to avoid this (and to make a more gentle sync than recursive wget) to no avail — the people discussing the request don’t ask me questions, and seem to refuse on assumption.

  • FTP mounted as a filesystem allows an rsync
  • Rsync allows a sync with retry and continues a bit more gracefully
  • FUSE can mount a filesystem in userspace
  • CurlFTPfs can mount an FTP server via FUSE
  • AutoFS allows the filesystem to unmount when idle, isolating us from server restarts (it IS windows after all)

in /etc/auto.master, I added:
# --- a separate indirect-map automount (large space is a tab)
/ftpfs /etc/auto.ftpfs
#

I created /etc/auto.ftpfs:
# from USA server: (large spaces are tabs) (area01 is all one line, no space between password and @exam...)
area01 -fstype=fuse,allow_other,ro curlftpfs#username01:PassWord01@ftp01.example.com/
area02 -fstype=fuse,allow_other,ro curlftpfs#username02:PassWord02@ftp02.example.com/

I added this to an RPM that requires fuse-curlftpfs, and made it run “service autofs reload” on install and uninstall (postin and postun)

Now, I can do the following: (still testing):

rsync /ftpfs/area01/* /my/backup/dir/area01/
rsync /ftpfs/area02/* /my/backup/dir/area02/

… and this command will implicitly automount the filesystems in turn, allowing the rsync to sync content. Of course, it’s more efficient if the FTP server simply activated the rsync service, but it’s Windows, and rsync is only a decade old, so Windows won’t have it yet (and the administrator fears both the Cygwin dependency issues and the simplified route)
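
Since the whole point is “sync with retry”, the rsync can be wrapped in a small retry loop. This is a sketch I have not run against the real FTP server; retry is a made-up helper, and the attempt count, delay and timeout in the usage line are arbitrary picks.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
retry() {
    local max="$1" delay="$2" attempt=1
    shift 2
    while true; do
        "$@" && return 0
        [ "${attempt}" -ge "${max}" ] && return 1
        attempt=$((attempt + 1))
        sleep "${delay}"
    done
}

# Usage: --partial keeps half-transferred files so the next attempt can
# pick up from them, and --timeout gives up on a stalled transfer.
#   retry 5 60 rsync -a --partial --timeout=120 /ftpfs/area01/ /my/backup/dir/area01/
```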

The rsync might do horrible things with the FTP server, I don’t know. If it does, maybe that will open up a cooperative dialogue (Kidding! Forcing someone’s hand is never a good way to start!).

Dec 13

I worked with a user of a SAN monitoring product called VirtualWisdom, which allows the user to define a metric for devices based on filters: basically, it evaluates unrelated filter expressions in order, stops at the first hit found, and provides a “catch-all”.

We wanted to allow the SAN administrator to keep migrating content off EVAs to remove an interposing virtualization product, with alerts to indicate when some content was in critical state without the admin having to continuously poll it. We wanted to work with user-visible response-time of the SAN (ie keep response time below 8ms) but had to settle for Capacity/Utilization metrics.

In our case, we looked at running evaperf on a disk array showing poor throughput: as soon as the user-visible response time (we call this “Exchange Completion Time”) degrades, evaperf is run against that array. The problem is that the name of the array (ie the parameter to evaperf that indicates which array) is uppercase, and doesn’t match the various names of users (ie servers named such as oradb07a2) or arrays (arrays named such as westeva8). We used UDC:

  1. if the ITL target is “wdceva1”, UDC “EVA” value is “EVA01”
  2. if the ITL target is “edceva2”, UDC “EVA” value is “EVA02”
  3. if the ITL target is “wdceva3”, UDC “EVA” value is “EVA03”
  4. if the ITL target is “edceva4”, UDC “EVA” value is “EVA04”
  5. if the ITL target is “westeva7”, UDC “EVA” value is “EVA07” (yes, change in naming format)
  6. if the ITL target is “easteva8”, UDC “EVA” value is “EVA08”
  7. … etc …
  8. otherwise, UDC value is “EVA00” as a catch-all
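
The UDC itself is configured inside VirtualWisdom, not in a script, but the first-match-wins logic is the same as a plain case statement. A sketch covering only the rules listed above (the “etc” entries are left out); eva_for_target is a made-up name:

```shell
# eva_for_target ITL_TARGET
# First matching rule wins; anything unknown falls through to EVA00.
eva_for_target() {
    case "$1" in
        wdceva1)  echo "EVA01" ;;
        edceva2)  echo "EVA02" ;;
        wdceva3)  echo "EVA03" ;;
        edceva4)  echo "EVA04" ;;
        westeva7) echo "EVA07" ;;
        easteva8) echo "EVA08" ;;
        *)        echo "EVA00" ;;   # catch-all
    esac
}
```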

What this allowed us to do is create an alarm:

  1. groups by link, channel
  2. filter: EVA is not “EVA00” to avoid running the script where we don’t know the EVA
  3. if utilization exceeds 80%, run the evaperf external-script
  4. script parameters include $EVA$, which is defined as the value of the UDC “EVA” when the alarm triggers
  5. the “re-arm” stage sends an email to alert the administrator that evaperf was run and there is a report to pick up
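
Written out as a script, the decision the alarm makes is roughly the following. This is a sketch of the logic only: the real alarm lives in the product, utilization is assumed to arrive as a whole-number percentage, and the echo stands in for the evaperf call, whose actual arguments I’m not showing here. Both function names are made up.

```shell
# should_run_evaperf EVA UTILIZATION_PERCENT
# True only when the EVA is known (not the EVA00 catch-all)
# and utilization exceeds 80%.
should_run_evaperf() {
    local eva="$1" utilization="$2"
    [ "${eva}" != "EVA00" ] && [ "${utilization}" -gt 80 ]
}

# on_alarm EVA UTILIZATION_PERCENT
on_alarm() {
    if should_run_evaperf "$1" "$2"; then
        echo "run evaperf for $1"   # placeholder for the external script
    fi
}
```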

This alarm forms the basis of an ECT-based alarm, but the metrics do not coincide, so we’re still looking at how to do that.

Currently, the administrator can apply this alarm to all switches, and as EVA utilization gets too high, the alarm automatically runs the evaperf tool to show what LUNs are being heavily used. The re-arm is set fairly long so that the running of evaperf (which we expect to cause some slowdown in overall processing) doesn’t get detected (via performance impact) as ANOTHER reason to run the evaperf script.

This alarm as it is allows the administrator to focus on migrating content off the EVAs, but proceed at an orderly pace until utilization on an existing link needs to be distributed off to another link — and the evaperf tool tells him which LUNs to move.