Mar 10

In looking for a quick Red-Amber-Green check for NTP, I’ve found that “ntpdc -c sysinfo” gives about as much as needed in a first-pass:

  • system peer: 0.0.0.0 ; (less than 15 min): AMBER: service starting
  • system peer: LOCAL(0) : RED: server is self-drifting
  • system peer: example.com ; stratum: [less than 10]: GREEN: peers reached, clock converging

RED

A Failed/Unreliable state for NTP is when no peers are reached, and NTP either has no peer or is using LOCAL(0) (127.127.1.0):

system peer: LOCAL(0)
system peer mode: client
stratum: 11
reference ID: [127.127.1.0]

An AMBER state that persists longer than 15 minutes should be considered a “cannot reach any peer” state, a RED state.

AMBER
When NTP has gone temporarily unusable, but should return to service, I would post an AMBER. Since there’s no state-tracking (unless monitoring with Nagios or Icinga) we cannot check whether “it was just OK, now it’s bad”, so all we can do is track the startup condition.

When starting, and until 8 × 64 s = 512 s (about 8.5 min) later, NTP will show:
system peer: 0.0.0.0
system peer mode: unspec
stratum: 16

Unfortunately, there’s no clear indication that 8 cycles of 64 seconds have gone past, meaning that no peers are reachable beyond this infancy/restart period.

Unfortunately, stratum==16 seems to persist past associating (at least on Win2008r2 Enterprise on a VM)

GREEN
Clearly, “green” status is when NTP has reached peers, is associated to one, and is gradually converging the clock:

system peer: time7.apple.com
system peer mode: client
stratum: 3
reference ID: [17.151.16.23]
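
The three states above can be folded into a small shell filter. This is only a sketch, not a tested Nagios plugin: the function name ntp_rag and its argument (seconds since ntpd started, which the caller has to obtain, since ntpdc does not report it) are assumptions, and treating a named peer at stratum 10 or worse as AMBER is a guess that the rules above don't cover.

```shell
# Classify `ntpdc -c sysinfo` output (read on stdin) as RED, AMBER or GREEN.
# $1 = seconds since ntpd started (assumed to be supplied by the caller).
ntp_rag() {
    awk -v up="${1:-0}" '
        /^system peer:/ { peer = $3 }
        /^stratum:/     { stratum = $2 }
        END {
            if (peer == "LOCAL(0)")
                print "RED"                                   # self-drifting
            else if (peer == "" || peer == "0.0.0.0")
                print ((up + 0 < 900) ? "AMBER" : "RED")      # 15 min grace
            else if (stratum != "" && stratum + 0 < 10)
                print "GREEN"                                 # real peer, sane stratum
            else
                print "AMBER"                                 # assumption: not covered above
        }'
}
```

Usage would be along the lines of: ntpdc -c sysinfo | ntp_rag 120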

Feb 28

It’s well-documented that Windows7 is a serious backstep in networking core, even missing some enhancements and fixes that are in Vista. Yeah, Vista.

Of course, you can’t tell that to Windows people because it works fine with the $1000 of other software Microsoft will sell you. … so even though your DAV works fine for everything non-microsoft, and for XP, Vista, and earlier, “Windows7 is the latest, greatest, ever” is the party-line. Toe it. Just make your DAV work.

So here’s what works for me. A few notes:

  1. a location of “/” doesn’t work on Windows: even though Cadaver and Mac OS X can use it, Windows cannot. Add a path.
  2. BasicAuth is a non-happener: Windows7 doesn’t react to an unauthenticated second action failing (401) after it’s authenticated once in its lifetime. Use Digest. Of course, that means no AuthUserFile /etc/shadow
  3. All shares from a certain hostname need to be the same user/pass (Windows7 cannot understand different shares under different passwords) so use CNAMEs and VirtualHosts

My Config (a virtualhost, of course!)

<VirtualHost *:80>
ServerAdmin username@example.com
ServerName serv1.example.com

DocumentRoot /home/serv1
ErrorLog logs/error_log
CustomLog logs/access_log vhost
# there is a DAVLockDB elsewhere in my config

# from http://www.perlcode.org/tutorials/sysadmin/mod_dav.txt
  BrowserMatch "^WebDAVFS/1.[012]" redirect-carefully
  BrowserMatch "Microsoft Data Access Internet Publishing Provider" redirect-carefully
  BrowserMatch "^Microsoft-WebDAV-MiniRedir/" redirect-carefully
  BrowserMatch "^WebDrive" redirect-carefully
  BrowserMatch "^WebDAVFS" redirect-carefully

# from http://www.debian-administration.org/articles/279
<IfModule !mod_headers.c>
LoadModule headers_module   modules/mod_headers.so
</IfModule>
  Header add MS-Author-Via "DAV"

# Location / and /shared are exactly the same, but only "/shared" is accessible from Windows
<Location />
   DAV on

   <LimitExcept PROPFIND OPTIONS>
      AuthType Digest
      AuthName "Server1 Access"
      AuthDigestProvider file
      AuthDigestDomain /shared/
      AuthUserFile  /etc/server1/DAV.htdigest
      AuthGroupFile /etc/server1/DAV.groups
      Require group server1group
   </LimitExcept>
</Location>

<Location /shared>
   DAV on

   <LimitExcept PROPFIND OPTIONS>
      AuthType Digest
      AuthName "Server1 Access"
      AuthDigestProvider file
      AuthDigestDomain /shared/
      AuthUserFile  /etc/server1/DAV.htdigest
      AuthGroupFile /etc/server1/DAV.groups
      Require group server1group
   </LimitExcept>
</Location>

</VirtualHost>

Also, the documentation doesn’t describe the “Realm” so well. To match this configuration, the htdigest command to use is (to add the user “scott”):

First time:
htdigest -c /etc/server1/DAV.htdigest 'Server1 Access' scott

After the file’s been created:

htdigest /etc/server1/DAV.htdigest 'Server1 Access' scott
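
On the realm: with Digest auth, the AuthName string is the realm, and it is hashed into every entry in the password file, so the realm given to htdigest has to match AuthName exactly. A rough sketch of what an entry contains, assuming md5sum from CoreUtils is available. The helper htdigest_line is made up for illustration and is not a replacement for htdigest (it takes the password as an argument, which shows up in the process list):

```shell
# Print an htdigest-style entry: user:realm:MD5(user:realm:password)
htdigest_line() {
    user=$1
    realm=$2
    pass=$3
    # md5sum prints "<hash>  -"; keep only the hash
    hash=$(printf '%s:%s:%s' "$user" "$realm" "$pass" | md5sum | cut -d' ' -f1)
    printf '%s:%s:%s\n' "$user" "$realm" "$hash"
}
```

Running htdigest_line scott 'Server1 Access' tiger and then again with a different realm gives two different hashes for the same password, which is why an entry created under the wrong realm never authenticates.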

Also, don’t forget to chown -R apache /home/serv1/

Users should connect to this share using http://serv1.example.com:80/shared (note: use “:80” on XP to force the older handler). Windows7 users should:

  1. use the Windows Explorer,
  2. choose “My Computer”,
  3. choose the “Map Network Drive” that shows up with “My Computer” selected,
  4. type in “http://serv1.example.com:80/shared” (no trailing “/”),
  5. select “login with other credentials” and click “next”
  6. carefully give the username (i.e. “scott”) and the password
  7. the filesystem should be available. test by creating a file, creating a directory, and moving the test file into the test directory
  8. unmounting is done by restarting the system
  9. If you type your password incorrectly, you must restart your system to retry

Mac Users just Command-K, type in “http://serv1.example.com” or “http://serv1.example.com:80/” or “http://serv1.example.com/shared” (it doesn’t matter with Mac, it just works) and when connecting, they’ll be asked for the authentication details if it’s not in their keychain. Otherwise, it’ll just mount the filesystem, easy-peasy. Unmounts like any ejected disk or filesystem.

Feb 09

I just found that a tool I use often has been added to CoreUtils so now I don’t need to carry my version around.

For years, I’ve used a version of a program called variously “limitedrun” or “timeout” in order to run servers so I can self-test against them. An example is my work on Apache (minimal) and Nagios (lots, especially testing Nagios-LDAP). If the autotest failed, which meant that the script would immediately stop, the service would not be stopped by the cleanup commands, so the shell would not complete, and the self-test would appear to hang.

Instead, I used a limited-run facility to add the logic “well, if you ran for 30 seconds without being shut down, the test must have failed, so … BLAM snuff it!”… which would allow the self-test to complete and report the failure. A passing self-test would gracefully shut down the services used.

I think on my work on Nagios, it’s even called “timeout”.

At v7.0-beta, CoreUtils gained timeout, which I haven’t checked for compatibility in featureset (not hard, few features in mine) but now I can use that rather than keep packing mine around (the issue is when I added to it: I had to go and update every project that had a copy, or I needed my custom RPM on every testbox).

Less work for me! More standardized! Colour me happy :)
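
A sketch of how the CoreUtils timeout can stand in for a home-grown limited-run tool in a self-test. The wrapper name limited_run is made up here; the only behaviour relied on is that timeout exits with status 124 when it had to kill the command.

```shell
# Run a command under a time limit and say so if it had to be killed.
# $1 = limit in seconds, remaining arguments = the command to run.
limited_run() {
    limit=$1
    shift
    status=0
    timeout "$limit" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        # 124 is the exit status timeout uses when the limit expired
        echo "killed after ${limit}s: $*" >&2
    fi
    return "$status"
}
```

In a self-test this might look like limited_run 30 ./start-test-server & where start-test-server is a placeholder for whatever launches the service under test.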

Jan 28

Clearly, if a tool relies on fork()/exec() or System.Exec() to run an OS tool that determines data affecting licensing/control, spoofing that tool to return predictable data is the easiest way to mislead the licensing subsystem.

This is the same as shimming a DLL or Shared-Object library.

My grasp of the obvious is so exceptional — and this method so easy — that I felt this was worth mentioning.

Jan 26

Previously, I posted that a quick way to throttle back your timemachine is to set your defaults:

I’m currently testing TimeMachineScheduler, which would give more control over the configuration. The magic is how the author has worked around this problem:
1) disable automatic backup
2) enable scheduled backup from launchd:
/Library/LaunchDaemons/com.klieme.TimeMachineScheduler.plist

Literally, he’s created a launchd process that wakes up at the user-specified interval, and (presumably) if the time is within the “Skip backup between” values, his app must short-circuit the timemachine run and shut itself down until the next “wake” interval.

I’m impressed.
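
The actual logic inside TimeMachineScheduler isn't shown here, so the following only sketches the presumed "skip backup between" check. It takes whole hours as input, and the function name in_skip_window is made up. The one subtlety is a window that wraps past midnight.

```shell
# Succeed (exit 0) when hour $1 falls inside the skip window [$2, $3).
in_skip_window() {
    now=$1
    start=$2
    end=$3
    if [ "$start" -le "$end" ]; then
        # ordinary window, e.g. 8 -> 18
        [ "$now" -ge "$start" ] && [ "$now" -lt "$end" ]
    else
        # window wraps past midnight, e.g. 22 -> 6
        [ "$now" -ge "$start" ] || [ "$now" -lt "$end" ]
    fi
}
```

A launchd-fired wrapper would then do something like: in_skip_window "$(date +%H)" 8 18 && exit 0 before starting the backup.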

The old method was a defaults write com.apple.something.something 3 (for a three-hour throttle), which is a bit difficult for non-geeks to type and behaved differently: it would skip the TimeMachine run if the previous run was within the given number of hours. The difference (period of 3 hours for both, skipping 08:00-18:00 in the scheduler) is that with the defaults method a change would be backed up within 1.5 hrs on average (3/2), but backups would slow down your user experience during the day. With the scheduler method, backups run at regular intervals, and the defaults method doesn’t always (ever) work correctly.

As an aside, I’m also curious whether I would get better throughput offering the timemachine disk as an iSCSI Target

While researching, I also came across a method of limiting TimeMachine from filling the volume

Jan 25

My company uses Exchange — and it’s not bad, considering that it brings in the SyncML (I think) technology that Gmail also has — if only it had the rest of what Gmail has, but I can understand if we’re not moving to avoid thrashing about.

The problem is that when I send mail, I want to receive a copy. I don’t want a bcc:, but Entourage (the only Exchange client for a laptop that doesn’t die) only allows saving a copy in the “Sent Items” folder. I know, it’s simple enough to copy stuff around, but hey, I can get a cronjob to do that…

  1. create a file (with restrictive permissions) containing only the password:
    — /home/scott/imapsync-pw-exchange —
    tiger

    In my example, user “scott” has password “tiger”. Bonus points if you know where that user/pass comes from :)

  2. drop a cron.hourly consisting of:
    — /etc/cron.hourly/scott-exchange-sync —

    imapsync \
      --host1 exchange.example.com  --port1 993 --authmech1 PLAIN \
      --host2 exchange.example.com  --port2 993 --authmech2 PLAIN \
      --user1 scott --ssl1 --passfile1 /home/scott/imapsync-pw-exchange \
      --user2 scott --ssl2 --passfile2 /home/scott/imapsync-pw-exchange \
      --folder 'Sent Items' --regextrans2 's/Sent Items/INBOX/g' 

    (or, better, crontab -e yourself a cron job that fires on 5-minute accuracy)

  3. profit!!!1!one!! … oh wait… uh… sit and relax.

The trick here is that we’re using imapsync to connect to our own server twice, as two clients, to sync the “Sent Items” folder. Yes, those two sections of parameters are exactly the same (except for s/1/2/, and they use the same password file) on purpose. The “regextrans2” tells it that we want to “translate” one folder to another (that may exist). Note that we’re deleting and expunging the moved files to avoid dupes.

Jan 15

Wow, it’s possible to never have to reformat my resume again?

Apparently, a 5-year-stale project called “XML Resume” gets us a little closer to this impossible goal, but support may be challenging. Also, generating formats from the common XML base requires some dependencies (as can be expected).

Other attempts are listed at MicroFormats, but if I can get work that lets me actually build things, I won’t need a resume for a while. I have a great company, but I spend a lot of time on disposable work and/or sitting in airports, hotels, car rentals.

Actually, MicroFormats — which lists the HackCollege content — is a great place to start. …so that makes this entry merely a redirection to that blog entry. Mostly. But mine’s better because it’s written in Lisp. Or whatever.

Jan 13

I’ve been having a problem on an FTP server that may well be resolved using FUSE, curl, and autofs, plus some blind rsync:

When I sync content across the Atlantic to another server, it’s always truncated despite my best efforts. It seems that if there’s too much latency, the Windows-based FTP service chokes and dies, and the uploaded content merely stops. FTP is a bit of a dumb protocol: it can’t tell that an upload ended because of an abort rather than an orderly finish unless the command channel reflects this. If the ftp-data socket closes, it’s done.

I’ve asked for rsync service to avoid this (and to make a more gentle sync than recursive wget) to no avail — the people discussing the request don’t ask me questions, and seem to refuse on assumption.

  • FTP mounted as a filesystem allows a rsync
  • Rsync allows a sync with retry and continues a bit more gracefully
  • FUSE can mount a filesystem in userspace
  • CurlFTPfs can mount a FTP server via FUSE
  • AutoFS allows the filesystem to unmount when idle, isolating us from server restarts (it IS windows after all)

in /etc/auto.master, I added:
# --- a separate indirect-map automount (large space is a tab)
/ftpfs /etc/auto.ftpfs
#

I created /etc/auto.ftpfs:
# from USA server: (large spaces are tabs) (area01 is all one line, no space between password and @exam...)
area01 -fstype=fuse,allow_other,ro curlftpfs#username01:PassWord01@ftp01.example.com/
area02 -fstype=fuse,allow_other,ro curlftpfs#username02:PassWord02@ftp02.example.com/

I added this to an RPM that requires fuse-curlftpfs, and made it run service autofs reload on install and removal (postin and postun).

Now, I can do the following: (still testing):

rsync /ftpfs/area01/* /my/backup/dir/area01/
rsync /ftpfs/area02/* /my/backup/dir/area02/

… and this command will implicitly automount the filesystems in turn, allowing the rsync to sync content. Of course, it’s more efficient if the FTP server simply activated the rsync service, but it’s Windows, and rsync is only a decade old, so Windows won’t have it yet (and the administrator fears both the Cygwin dependency issues and the simplified route)
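
Since the mount can stall or vanish mid-transfer, the rsync probably wants a retry wrapper around it. A minimal sketch: retry is a hypothetical helper, and the attempt count, pause, and rsync flags in the usage line below are placeholders rather than tested values.

```shell
# Retry a command up to $1 times, sleeping $2 seconds between attempts.
retry() {
    attempts=$1
    pause=$2
    shift 2
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            echo "giving up after $n attempts: $*" >&2
            return 1
        fi
        n=$((n + 1))
        sleep "$pause"
    done
}
```

For example: retry 5 60 rsync -r --partial /ftpfs/area01/ /my/backup/dir/area01/ (each new attempt touches /ftpfs again, so autofs gets a chance to remount if the mount went away).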

The rsync might do horrible things with the FTP server, I don’t know. If it does, maybe that will open up a cooperative dialogue (Kidding! Forcing someone’s hand is never a good way to start!).

Jan 07

Why can’t Microsoft’s updater realize that it’s the app it wants me to shut down?

Besides locking the MDS and slowing down a Mac, is there any real engineering put into this product by Microsoft?

Oh, and why is Microsoft affecting Safari? That seems fishy…

Dec 15

Bufferpool config is an often-overlooked issue because it so rarely nails you, but it can be important in those rare cases.

A Bufferpool is simply a resource limitation on a collection of RAM. Typically this is a buffer, i.e. in-RAM space that cannot be swapped out because it represents in-flight transactions, uncommitted pages, or pre-fetched content that will be needed very soon.

A Commit is the RAM that is offered to a process; in glibc, this can default to 2G. This doesn’t say that every process automatically consumes 2G of RAM, but that the Kernel offers up to 2G to the process. Recall that due to sparse garbage in RAM pages, RAM offered to a process is NEVER reaped by the system back to the common pool.

A Commit can be dangerous when the OS over-commits RAM in a long-lived environment: if up to 100G is offered on a system with only 32G, you can see how if many threads grow their demand for RAM, the system will swap out some processes to meet demand. This is a typical action in a multiuser system with swap active (NOTE: Motorola tuned their commits on smartphones because on diskless systems, there’s obviously no swap)

In a long-lived database process, if the bufferpool is a commit, then it will soon grow to maximum commit. It can never be swapped out unless the database has bursty use-cases and has no active sessions for long periods. The bufferpools configured may be in addition to the heap space taken up by the process itself (in un-pooled resource space). The database itself may limit bufferpools, but consume a number of GB over the configured bufferpool space.

The other applications on the system also can demand up to their committed RAM — why limit one process while letting the others run amok on your server?

On long-lived systems, committed RAM becomes allocated and consumed RAM. Bufferpools need to be configured, and RAM usage monitored (or at least traps/exceptions raised when a critical DB starts swapping, an indication that review is critically needed)
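
As a starting point for that monitoring on Linux, /proc/meminfo exposes MemTotal and Committed_AS (the address space currently promised to processes). A small sketch that turns them into a percentage. The function name commit_percent is an assumption, and any alert threshold is left for you to pick.

```shell
# Read /proc/meminfo-style text on stdin and print committed address space
# as a whole-number percentage of physical RAM.
commit_percent() {
    awk '
        /^MemTotal:/     { total = $2 }
        /^Committed_AS:/ { committed = $2 }
        END { if (total > 0) printf "%d\n", committed * 100 / total }'
}
```

Run it as commit_percent < /proc/meminfo; anything well over 100 means more RAM has been promised than physically exists, which is the 100G-on-32G situation described above.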

Bufferpools and Commit/Demand discrepancies are silent but deadly killers, like the sharks and heart-disease of the resource-management domain.