Date Formats and Smart People

Uncategorized No Comments »

For roughly 19 years, I’ve pushed international date formats; date formatting is a small detail that has nonetheless come to matter a great deal to me.

Someone (“lion”) whose industry knowledge I strongly respect, and whose attitude is “what will XXX do for me today?” (i.e. he doesn’t change direction on anything without a clear benefit), admitted to switching to the international date format.

Why is that so cool to me?

  • it validates my position, but may not yet validate the stubbornness with which I’ve pursued this
  • confirms the benefits are not merely in my biased opinion
  • it’s easier for me personally 🙂 (it’s all about me)

In this case the benefit is that the ISO-8601 format (the basis for the W3C and RFC-3339 profiles, and used by tools such as XML Schema’s recommended datatypes) writes dates as yyyy-mm-dd, which is unambiguously understood in other countries, and which sorts where you least expect it (filenames of presentations, revisions of tech pubs) because it works with sort algorithms no more complex than strcmp(). Those benefits have made it easier to find the earliest/latest/newest versions of documents where dozens with similar content may exist.
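
A quick sketch of the sorting claim: with yyyy-mm-dd embedded in filenames, a plain lexicographic sort (the strcmp() level of sophistication) is already a chronological sort. The filenames below are made up for illustration.

```python
# ISO-8601 dates in filenames: lexicographic order == chronological order.
files = [
    "report-2024-11-03.txt",
    "report-2023-01-15.txt",
    "report-2024-02-28.txt",
]
print(sorted(files))  # oldest first, newest last, no date parsing needed

# Contrast with dd-mm-yy names, where the same naive sort is meaningless:
ambiguous = ["report-03-11-24.txt", "report-15-01-23.txt", "report-28-02-24.txt"]
```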

I still argue that:

  1. Locale is not language: just because I’m in China, I may not speak Chinese (and which China? Which Chinese?)
  2. Locale is not format: in Canada, we use the date formats dd/mm/yy, dd-mmm-yy, and yymmdd, so which one?
  3. Language is not format: en_CA is similar to en_US, but 2/3/4 is a different date than you think
  4. There are more cities than timezones: if you’re choosing a timezone, choose a timezone, not a city, unless the city helps to suggest the actual timezone
  5. Don’t accidentally get into politics: Wales and London are not so United a Kingdom; if you don’t offer “Cardiff” and “Bristol” as cities, then allow the user to choose their timezone without claiming to be British (or is that “English”?). If you don’t understand that, spend a week in Wales.
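
Point 3 can be demonstrated in a few lines: the same string yields two different dates depending on which convention the parser assumes (the format strings here are chosen for illustration).

```python
from datetime import datetime

s = "2/3/04"
as_us = datetime.strptime(s, "%m/%d/%y")  # en_US habit: February 3, 2004
as_ca = datetime.strptime(s, "%d/%m/%y")  # en_CA/en_GB habit: March 2, 2004
print(as_us.date(), as_ca.date())  # same string, two different dates
```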

RFC-822 and RFC-2822 force foreign countries to learn English day names and month names (not hard, but an entry point for error) and require non-trivial parsers, yet some developers just don’t seem to consider that there might be an easier way.
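
For contrast, a sketch of what parsing each style takes in Python: the RFC-2822 form needs a dedicated parser that knows the English names, while the ISO form needs almost nothing. The example timestamps are made up.

```python
from datetime import date
from email.utils import parsedate_to_datetime

# RFC-2822 dates need English month/day names and a real parser:
rfc = parsedate_to_datetime("Tue, 02 Mar 2004 09:00:00 +0000")

# The ISO-8601 date needs neither:
iso = date.fromisoformat("2004-03-02")

print(rfc.date(), iso)  # both denote the same calendar day
```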

Tools that follow java.util.Date or PHP date formats, or that make assumptions based on /etc/tz values, will continue to limit their users’ choices, further limiting their users’ end-users’ choices. Think about what constraints you’re imposing. “Rome wasn’t built in a day”, “you can’t turn a supertanker”, “you can’t change China”: you don’t have to try to change the world, just don’t participate in the constraint.

Hey, it’s nice to be right on occasion. Unfortunately, I still need to choose Japan or China as locales in order to get usable dates.

SNMP: Brocade: How to read Link Reset by SNMP

My company’s flagship product collects in-depth content via SNMP. Outside of random port resets and some bit-shifting, switches usually give us information that accurately matches the port metrics they show in a portstatshow.

snmpget -v1 -c {community} {switch IP}{portIndex - 1}

for example: port 91, switch, community BatM4n:

snmpget -v1 -c BatM4n

Successive checks of this value can show the deltas that VW is reporting.
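
A minimal sketch of that delta computation, assuming a 32-bit SNMP Counter32 that may wrap between polls (the sample values are invented):

```python
def counter_delta(prev, curr, bits=32):
    """Delta between successive SNMP counter reads, handling one wrap."""
    if curr >= prev:
        return curr - prev
    # counter wrapped past 2^bits between the two polls
    return (1 << bits) - prev + curr

# e.g. polling a link-reset counter once a minute:
print(counter_delta(100, 250))       # normal case: 150 resets
print(counter_delta(4294967290, 4))  # wrapped past 2^32: 10 resets
```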

I’m using the following:

Meanwhile, OIDView is a great tool if you’re not used to using “vi” as your MIB browser. In this case, it would have helped me find the values (rather than using “/” to search), but I’d still need to choose the correct MIB to start with.

Link Reset Notes

Some notes in tracking down LinkResets and similar exceptions on a switch:

/usr/bin/snmpget -c public -v1 -mALL -t5 .

(1588 == BRCD.mib private number)
=(BRCD.mib) bsci.commDev.fibrechannel.fcSwitch.
=(SW.mib) bsci.commDev.fibrechannel.fcSwitch.sw.swGroup.swGroupTable.swGroupEntry

fcFxPortLinkResetIns: The number of Link Reset Protocol received by this FxPort from the attached NxPort.
=(FE_RFC2837.mib) fcFeMIB.fcFeMIBObjects.fcFeError.fcFxPortErrorTable.fcFxPortErrorEntry.fcFxPortLinkResetIns
Bytes Transmitted:
snmpwalk private enterprises.1588.
unrel example
/usr/bin/snmpget -c public -v1 -mALL -t5 .

WoW on Mac: TCP Tuning

howto No Comments »

My sister plays WoW, and had some latency issues. Rather than go to a higher-speed WAN connection (hey, Wifi-B works OK for most people, but not when you’re raiding), she drilled a bunch of holes in her floors and went direct-wired.

Not to knock direct-attached LAN connections: they’re faster overall, and your latency/jitter cannot be influenced by a steel stovepipe, or by driving a car between your PC and your router. Unfortunately, it may have the effect of moving the upstream bottleneck (data blocks, or ACKs stuck behind them) to the router.

Since Wifi bandwidth already exceeds broadband bandwidth, your speed won’t go up by doing this, but latency improves (insert the first dweeb quoting Linus Torvalds with a “because Linus sez so! Linus 3:16!” here).

Latency can also be a factor of buffering in terms of number of sliding windows, window size, etc. In cases of raw video, you can get better performance (ie less jitter) at the cost of a few dropped frames if you reduce your buffering, for example.

I would take notes on performance (which might be a subjective “feels better” or “feels sluggish”) and then twist a few knobs, as follows. DO NOT change more than one at a time, lest an effect be misattributed to the wrong change.

  • reduce net.inet.tcp.sendspace
    • sudo sysctl -w net.inet.tcp.sendspace=250000
    • make sure that kern.ipc.maxsockbuf = (net.inet.tcp.recvspace + net.inet.tcp.sendspace)
    • net.inet.tcp.sockthreshold may need to be set lower (0 to disable) so that sendspace/recvspace are respected earlier on
  • reduce net.inet.tcp.mssdflt to 1500 - (20 * wrappers)
    • in most cases this is 1480, because of 20 bytes of overhead for a PPPoE link
    • it’s OK to reduce that further without a huge drop in performance
    • further drop because of WiFi? Not logical, but it does protect your stream in the event of unseen X-over-Y tunneling
    • 1440 is OK on a local LAN, even a gigabit; if all LAN members permit jumboframes (9k), use 8940
  • I’m not sure there’s a benefit to increasing net.inet.tcp.win_scale_factor above 3 (for gigabit ethernet), because the bottleneck at the router and cable-modem/DSL will only be exacerbated. Congestion should be caught at the desktop to avoid filling the queue at the cable/DSL link for outbound traffic.
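
The mssdflt arithmetic above can be sketched as follows; the wrapper counts are my illustrative reading of the 1480/1440/8940 figures, not a statement about any particular stack:

```python
def mssdflt(wrappers, mtu=1500, header=20):
    """Rule of thumb from the list above: MTU minus 20 bytes per wrapper."""
    return mtu - header * wrappers

print(mssdflt(1))            # 1480: one PPPoE-style wrapper
print(mssdflt(3))            # 1440: headroom for a few unseen wrappers on a LAN
print(mssdflt(3, mtu=9000))  # 8940: same headroom on a jumbo-frame (9k) LAN
```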

These reductions are an attempt at reducing queuing at various hops that can reduce the effectiveness of TCP’s congestion algorithm.

If I get other ideas, I’ll add them here.

As always, Netalyzr is a good first-flinch when checking out an unknown network, even if you think you’ve used that net for months.

Big Sliding Windows Increase Impact of Retransmission

dataflow No Comments »

I was analyzing a situation at a local company, and found that large buffers on low-latency connections can be counter-productive.

Consider the following:

  • Big server has 12x HBAs to Fibrechannel SAN. This could be anything: teamed NICs, for example
  • big writes (for example, 12 MByte) are load-balanced (each HBA takes a 1 MByte buffer to send)
  • all 12 buffers need to arrive at once for a two-stage commit

So the problem is that in Fibrechannel, this is broken up into 2112-byte frames (think of an MTU of 1480 or 1500 in Ethernet). The smallest atomic chunk is 2112 bytes, so the megabyte is actually 497 frames. If any of those frames is discarded or corrupted, the session is retransmitted.

The important impact of the third item above is that it’s actually 12 MByte in a single transfer, only “shotgunned” by load-balancing, but all of it must arrive, or none of it. This means that (12 x 497) frames must all arrive, or all need to be retransmitted (as a result of the host-side multiplexing; the SAN faithfully sends the other content perfectly fine).

So with 5964 frames, a per-frame failure rate of only 1/10000 causes roughly every second transmission to fail. At 1/100000, about 1 in 20 fails.
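
The arithmetic behind those rates, as I read it: the session fails if any one of its frames is lost, so the session failure probability is 1 - (1 - p)^n.

```python
def p_session_fail(frames, per_frame_loss):
    """Probability that at least one of `frames` frames is lost,
    forcing the whole multiplexed session to retransmit."""
    return 1 - (1 - per_frame_loss) ** frames

print(p_session_fail(5964, 1e-4))  # ~0.45: roughly every second transfer fails
print(p_session_fail(5964, 1e-5))  # ~0.06: roughly 1 in 17-20 transfers fails
```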

In the multiplexing application’s recovery phase, it needs to wait 30 seconds for a failure in some cases: even though FibreChannel immediately aborts in 496 of 497 failure cases, the multiplexer doesn’t get alerted until its own timeout has expired. That timeout seems designed for slower connections, such as IPIP, FCoE, DWDM, or similar slower-than-they-seem links with larger latencies.

That means that a system processing 51 MBit/sec, or 6.4 MByte/sec, can buffer a sequence of 193 MByte before a retransmission is required. If that happens 1 transfer in 20, then you’ll only get (Poisson distribution, I know) roughly 228 MByte between failures, which gives you roughly 44% efficiency.

Part and parcel with that, a failure shows a huge delay (30 sec) in response time while the failure is being detected, during which no transaction can hit the storage; when things finally free up, the backlog of nearly 30 sec needs to be retransmitted. Failures may also occur more frequently as data ramps up (such as link-level congestion exhausting buffers). That means that during the busiest times, failures occur more often, and can cause neighbouring systems sharing resources to be similarly impacted, in a flip-flop action like trading the “Old Maid” around in a children’s card game.

So what happens when you reduce the session size? What about 8k pages, which give you 4 frames per session? As with cut/join FTP uploads that reduce the retransmission cost, more of those 8k pages arrive intact, and although a 30-sec timeout is still a 30-sec timeout, the in-flight retransmission is only (4 x 12) 48 frames, less than 1% of the big-buffer cost, mirroring the difference in size of each buffer. Padding efficiency drops, since a 3.1% inefficiency (8448/8192) replaces a 0.104% inefficiency (1049664/1048576), but the overall throughput in a sub-optimal situation should be much higher due to reduced retransmission.
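
The frame counts and padding overheads above can be reproduced with a few lines of arithmetic (2112-byte frames, per the text):

```python
import math

FRAME = 2112  # smallest atomic Fibrechannel chunk, per the text

def frames(session_bytes):
    """Frames needed to carry one session (last frame is padded)."""
    return math.ceil(session_bytes / FRAME)

def overhead(session_bytes):
    """Padding inefficiency: bytes on the wire vs. useful bytes, minus 1."""
    return frames(session_bytes) * FRAME / session_bytes - 1

print(frames(1024 * 1024), overhead(1024 * 1024))  # 497 frames, ~0.104%
print(frames(8192), overhead(8192))                # 4 frames, ~3.1%
```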

Reducing the timeout in the multiplexer application should reduce the retransmission cost, so long as the timeout is not so low that successful transactions are failed. Considering Fibre’s fast response time (typically 3 ms, rarely exceeding 12 ms during spikes so long as no single server has enough queue depth to rob the SAN of its buffer resources), a much shorter timeout should be safe.

Skype Wake-On-LAN Control Proxy

It would be great to be able to set a Skype status message from my meetings.

I get meetings throughout the day; sometimes I’m the guy presenting a gotomeeting (which is a great tool), during which I have to hide the constant stream from the Skype window. The bad part of that is that I cannot see chatter from coworkers… well, that’s not the bad part. The bad part is that when they send me messages, they assume I see them.

How can I remind them that I cannot see the messages? (should I care?)

One idea is to link the showing of my screen to a status that says “I’m ignoring Skype right now”, for which I looked at the Skype API. There’s scant detail, and the easiest (lightest) way seems to be an Applescript remote-executor. An rexec()-style proxy?

  1. wakes on LAN using a UDP packet to a known port, from which the service starts, and times out 5 minutes after going idle? (this was a standard feature of inetd-run UDP ONC/RPC)
  2. reads a UDP-based command with PKI to confirm permissions (i.e. “here, take this cookie”)
  3. sends the request to Skype (whitelist of accepted commands?) and sends the response back. The response might be too big for UDP, or suffer out-of-order delivery
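
Steps 2 and 3 might be sketched roughly as below; the shared-secret check, and the command whitelist are all hypothetical stand-ins (a real version would need actual PKI and the real Skype/AppleScript bridge). The sketch demonstrates itself with a loopback round-trip.

```python
import socket
import threading

# Hypothetical stand-ins: these names are illustrative, not any real Skype API.
ALLOWED = {"SET USERSTATUS DND", "SET USERSTATUS ONLINE"}
SECRET = b"cookie"  # stand-in for the PKI "here, take this cookie" check

def serve_once(sock):
    """Steps 2/3: authenticate one UDP request, whitelist it, reply."""
    data, addr = sock.recvfrom(2048)
    token, _, cmd = data.partition(b" ")
    if token != SECRET:
        return  # drop unauthenticated requests silently
    if cmd.decode("ascii", "replace").strip() not in ALLOWED:
        sock.sendto(b"DENIED", addr)
        return
    # a real proxy would forward cmd to Skype here (AppleScript/rexec-style)
    sock.sendto(b"OK " + cmd, addr)

# loopback demonstration
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))  # ephemeral port instead of a fixed one
t = threading.Thread(target=serve_once, args=(server,))
t.start()

client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.sendto(b"cookie SET USERSTATUS DND", server.getsockname())
reply, _ = client.recvfrom(2048)
t.join()
print(reply)
```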

The net effect is that after a few days’ work (OK, a few hours, but try to get those in a single calendar day) I could tell my coworkers that I really am ignoring them. …or I can let them figure it out, because I work with really smart people (they tend to be smart about non-details; no sarcasm here, they really are very smart).
