Tech Notes » dataflow

Cloaked Twittering in Dangerous Places

dataflow

Recently I’ve been bothered by concepts such as censorship, the Arab Spring, and crowd discussions.

Many of the censorship tools in use are just tools built by engineers who are not political, just building tools. Just doing their jobs. Often the desire to fulfill a challenging objective can blind the engineer to the possible uses, or the engineer simply doesn’t care (i.e. has bigger issues to care about than some foreign country’s citizens’ free speech).

I have an idea I’d call Qloak (a compression of “Quacking” and “Cloaked”; “Quacking” after the Chinese slang for gossip) that would allow:

    • Twitter posts to get through firewalls and most paywalled wifi APs
    • Foursquare checkins to also get through
    • the ability to check whether the app needs to self-destruct, flushing its history

Much of this technology works the way I used to configure the “ext” system as a phonebook at a past employer; as well, acting as the head-end of a Tor or VPN connection may consistently get through.

I don’t judge Egypt, or Libya, or China, but I worry over the limiting/chilling/hushing effect of some engineering talent mis-applied.

I would prefer that more people are in the conversation.

Countries, companies, and people who claim to support freedom of speech should act to support it. Build a Tor gateway. Support free opinions in other countries. Listen to everyone, even the Gay, the Religious Fanatic, the Type-B personality, the Nature Fanatic, the Raging Republican; whatever grouping you put people into, those people will be oppressed in other countries. Listen to them, however much you may disagree.

Yes, if I built an app for this, I would give away free signups to anyone at an email address with a domain such as .cn, .ly, etc.

Data Quality as Meta?

dataflow

When presenting data, try to include some sense of quality or accuracy, even if it’s just a flag: “I derived this”, “I got this from a very accurate source”, or “this is a space-filler”.

I wanted to highlight something I found quite interesting in Axeda Corporation’s Gateway and Connector technologies: quality of metrics. Axeda uses an enumeration of simple qualities (Good, Bad, or Unknown), and this could theoretically be used when choosing which of two conflicting data points to show.

The simple act of collecting and summarizing metrics is not necessarily made easier when this precision meta is tracked, but it can help the end-user make better decisions: if you see an aberrant data point, do you know it’s seriously out-of-norm and needs to be acted upon, or is it perhaps based on a ratio with a questionable denominator, and should be taken with a bit of skepticism?
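As a sketch of how such a quality flag could be carried alongside a metric and used to resolve a conflict (hypothetical Python; the names are mine, not Axeda’s API):

from dataclasses import dataclass
from enum import Enum

class Quality(Enum):      # Axeda-style enumeration: Good, Bad, or Unknown
    BAD = 0
    UNKNOWN = 1
    GOOD = 2

@dataclass
class Metric:
    name: str
    value: float
    quality: Quality      # the quality meta travels with the measurement

def prefer(a: Metric, b: Metric) -> Metric:
    # When two sources disagree, show the reading with the better quality flag.
    return a if a.quality.value >= b.quality.value else b

measured = Metric("cpu_load", 0.42, Quality.GOOD)
derived = Metric("cpu_load", 0.97, Quality.UNKNOWN)   # ratio with a questionable denominator
print(prefer(measured, derived).value)                # 0.42, the Good reading wins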

Consider precision, or at least define why it’s out-of-scope for your work.

Big Sliding Windows Increase Impact of Retransmission

dataflow

I was analyzing a situation at a local company, and found that large buffers on low-latency connections can be counter-productive.

Consider the following:

  • A big server has 12 HBAs to a FibreChannel SAN. This could be anything: teamed NICs, for example
  • Big writes (for example, 12 MByte) are load-balanced (each HBA takes a 1-MByte buffer to send)
  • All 12 buffers need to arrive at once for a two-phase commit

So the problem is that in FibreChannel, this is broken up into 2112-byte frames (think of an MTU of 1480 or 1500 in Ethernet). The smallest atomic chunk is 2112 bytes, so the megabyte is actually 497 frames. If any of those frames is discarded or corrupted, the session is retransmitted.

The important impact of the third item above is that it’s actually 12 MByte in a single transfer, only “shotgunned” by load-balancing, but all must arrive, or none must. This means that all (12 x 497) frames must arrive, or all need to be retransmitted (as a result of the host-side multiplexing; the SAN faithfully delivers the other content perfectly fine).

So with 5964 frames, you only need a 1/10,000 per-frame failure rate to cause roughly every second transmission to fail. At 1/100,000, roughly 1 in 17 fails.
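A quick back-of-envelope check of those numbers (a sketch that assumes independent per-frame loss, which real links don’t quite obey, but the scale holds):

import math

FRAME = 2112                                     # bytes per FibreChannel frame
frames_per_mb = math.ceil(1024 * 1024 / FRAME)   # 497 frames to carry 1 MByte
total_frames = 12 * frames_per_mb                # 5964 frames, all-or-nothing

for per_frame_loss in (1e-4, 1e-5):
    p_fail = 1 - (1 - per_frame_loss) ** total_frames
    print(per_frame_loss, round(p_fail, 3))
# 0.0001  -> ~0.449: roughly every second transfer fails
# 0.00001 -> ~0.058: roughly 1 in 17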

In the multiplexing application’s recovery phase, it needs to wait 30 seconds to declare a failure in some cases: even though FibreChannel immediately aborts in 496 of 497 failure cases, the multiplexer doesn’t get alerted until its own timeout has expired. It seems this timeout might have been designed for slower connections, such as across IPIP, FCoE, DWDM, or similar slower-than-they-seem connections with larger latencies.

That means that a system processing 51 MBit/sec, or roughly 6.4 MByte/sec, can buffer a sequence of about 193 MByte before the retransmission is even requested. If 1 transfer in 20 fails, then you’ll only get (Poisson distribution, I know) about 228 MByte between failures, which gives you roughly 44% efficiency.
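Reconstructing that arithmetic (a rough sketch; the exact efficiency figure depends on what you count as waste):

rate = 51 / 8            # 51 Mbit/s is roughly 6.4 MByte/s
backlog = rate * 30      # ~191 MByte piles up during the 30-second timeout
good_run = 19 * 12       # at 1-in-20 failures: 19 good 12-MByte transfers = 228 MByte

# Bounds, depending on whether the stalled backlog is counted once
# (dead time only) or twice (dead time plus its retransmission):
print(good_run / (good_run + backlog))        # ~0.54
print(good_run / (good_run + 2 * backlog))    # ~0.37; the ~44% above sits between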

Part-and-parcel with that, a failure will show a huge delay (30 sec) in response time while the failure is being detected, during which no transaction can hit the storage; when it finally gets freed up, the backlog of nearly 30 seconds needs to be retransmitted. The failure cases may occur more frequently as data ramps up (such as link-level congestion exhausting buffers). That means that during the busiest times, failures will occur more frequently, and can cause neighbouring systems sharing resources to be similarly impacted, in a flip-flop action like trading around the “Old Maid” in a children’s card game.

So what happens when you reduce the session size? What about 8k pages, which give you 4 frames per session? Similar to cut/join FTP uploads reducing the retransmission cost, more of those 8k pages arrive intact, and although a 30-sec timeout is still a 30-sec timeout, the in-flight retransmission is only (4 x 12) 48 frames: less than 1% of the big-buffer cost, mirroring the difference in size of each buffer. Framing efficiency drops, since a 3.1% overhead (8448/8192) replaces a 0.104% overhead (1049664/1048576, i.e. 497 frames of 2112 bytes carrying 1 MiB), but the overall throughput in a sub-optimal situation should be much higher due to reduced retransmission.
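The same arithmetic for the smaller session size (again a sketch):

import math

FRAME = 2112

def frames(payload_bytes):
    return math.ceil(payload_bytes / FRAME)

big = frames(1024 * 1024)    # 497 frames per 1-MByte buffer; x12 = 5964 in flight
small = frames(8 * 1024)     # 4 frames per 8k page; x12 = 48 in flight

print(12 * small, 12 * big)              # 48 vs 5964 frames at risk per commit
print(small * FRAME / 8192 - 1)          # ~0.031: 3.1% framing overhead for 8k pages
print(big * FRAME / (1024 * 1024) - 1)   # ~0.001: 0.104% overhead for 1-MByte buffers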

Reducing the timeout in the multiplexer application should reduce the retransmission cost, so long as the timeout is not so low that successful transactions are falsely failed. Considering Fibre’s fast response time (typically 3 ms, rarely exceeding 12 ms during spike situations, so long as no single server has enough queue depth to rob the SAN of its buffer resource), a timeout well under a second should catch failures quickly without killing healthy transfers.

MySQL Replication Walkthru: Activate Secondary

dataflow, howto

After Enabling Replication and Making a Remote Backup, we can activate the secondary server.

Already, our Primary has been returned to service, and we don’t really need to alter it from this point forward; all our work after enabling replication has been on the Secondary. We’ve saved our remote backup on Secondary’s disk, but not yet started the secondary server.

We will:

  1. start the Secondary (skip-slave-start is still in the config file)
  2. Configure the replication
  3. Import the backup file
  4. Start the Replication process on the Secondary
  5. Configure the Secondary so that it will always start the Replication automatically

Our servers:

Primary: Sleepy (192.168.44.3)
Secondary: Doc (192.168.44.4)
MySQL: 5.1.41-enterprise-commercial-pro
OS: Windows 2008R2

Start the Secondary
We still have the option “skip-slave-start” in our my.ini (my.cnf), and now we’re going to start the server; this is as simple as using the Windows services.msc to start the MySQL service.

On a Unix-like server, your init system would start the service; under the SysVinit monstrosity, /etc/init.d/mysql* start or something similar would do it. We can discuss why a script would reside in /etc/ some other time, when the SystemV documents and the Linux FHS are both at hand (hint: it violates both; /etc is for config files only).

Configure the Replication
Replication can also be configured using the config file, but I did it using the CLI, as follows:

(On Doc/Secondary)

mysql.exe -u root mysql
mysql> CHANGE MASTER TO MASTER_HOST='sleepy.example.com', MASTER_USER='repl', MASTER_PASSWORD='R3plPassw0rd';
mysql> SHOW SLAVE STATUS;

The output of “SHOW SLAVE STATUS” should show the proper Master_Host and Master_User (the password is not displayed), but the log file (Master_Log_File) and log position (Read_Master_Log_Pos) will still be incorrect; importing the backup in the next step fixes them.

Import the Backup File

We stored the backup file we made as repldbdump.db. We should already have a user on the local server that can create and insert (by default, “root” can do this), and we’ll import it using:

(On Secondary/Doc)

mysql.exe -u root < repldbdump.db

Another benefit of storing this backup/dump with the --master-data option is that it corrects the master log file and master log position for us: near the top of the dump is a CHANGE MASTER TO MASTER_LOG_FILE='…', MASTER_LOG_POS=… statement. A repeat of “SHOW SLAVE STATUS” should show that the MASTER_LOG_FILE and MASTER_LOG_POS in repldbdump.db have set things right.

Start the Replication process on the Secondary

We have the data loaded from the Primary, and we have our replication configured. The import has configured our replication binlog files and positions, so we’re ready to start.

(On Doc/Secondary)

mysql.exe -u root mysql
mysql> START SLAVE;

Configure the Secondary so that it will always start the Replication automatically

Finally, we can remove the config entry we put into our Secondary so that it would start with a crippled Replication config; comment out the skip-slave-start in your my.ini (my.cnf):

(Secondary/Doc)

[mysqld]
...
server-id=2
#skip-slave-start

There’s no need to restart the Secondary, but if it does restart, it will automatically get back into replication.

MySQL Replication Walkthru: Making a Remote Backup

dataflow, howto

After Enabling Replication (or, if you are not using replication, right after Enabling Network Access), we can use this config to make a remote backup of the database.

Our servers:

Primary: Sleepy (192.168.44.3)
Secondary: Doc (192.168.44.4)
MySQL: 5.1.41-enterprise-commercial-pro
OS: Windows 2008R2

Use Remote Access to Pull a Remote Backup

By this point, you can connect to your server remotely to make queries; now we want to pull a backup.

(on Secondary/Doc)

mysqldump.exe -u repl -h 192.168.44.3 --password=R3plPassw0rd --add-drop-table --all-databases > repldbdump.db

If you’re setting up replication, then you’ll want the additional replication content provided by using this command instead:

(on Secondary/Doc)

mysqldump.exe -u repl -h 192.168.44.3 --password=R3plPassw0rd --add-drop-table --all-databases --master-data > repldbdump.db

The benefit of this additional data is that it sets the replication master file and log position for when you continue to Activate the Secondary.

MySQL Replication Walkthru: Enable Replication

dataflow, howto

After Enabling Network Access, we can Enable Replication before Making a Remote Backup of the Database. If you’re reading this to simply make recurring backups of your MySQL remotely, then you can ignore this step.

In this step, we’ll

  1. assign IDs to the Primary and Secondary
  2. restart the primary
  3. proceed to back up the database before starting the Secondary

Primary: Sleepy (ID = 1)
Secondary: Doc (ID = 2)
MySQL: 5.1.41-enterprise-commercial-pro
OS: Windows 2008R2

Assign ID to the Primary
While assigning a replication ID, we can also define the binary log for replication; I did this with a few parameters in the my.ini (my.cnf) file:

(Primary/Sleepy)

[mysqld]
...
#bind-address=127.0.0.1 # commented to bind to all interfaces
log-bin="E:\Data\repl-bin"
binlog-format=ROW
server-id=1

(Secondary/Doc)

[mysqld]
...
server-id=2
skip-slave-start

Start the Primary server, but don’t start the Secondary yet. Note that “skip-slave-start” lives in the config file, as opposed to running the Secondary with the command-line option “--skip-slave-start”, which is difficult to do using Windows’ service stop/start. This option is only there for the first run of the Secondary.

(On Sleepy/Primary)

mysql.exe -u root mysql
mysql> GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';
mysql> FLUSH PRIVILEGES;

You should notice that when the Primary server starts up again, it begins creating the E:\Data\repl-bin.index and E:\Data\repl-bin.000001 files.

MySQL Replication Walkthru: Enable Network Access

dataflow, howto

In my replication setup, I needed to make a backup, and I needed to enable TCP/IP access eventually, so I did them as a single step.

Primary: Sleepy
Secondary: Doc
MySQL: 5.1.41-enterprise-commercial-pro
OS: Windows 2008R2

In order to allow Doc (Secondary) access into Sleepy (Primary), Sleepy had to accept remote TCP clients. The process for that was:

  1. create a username/password pair for a new remote user (wildcard host, or a specific host)
  2. configure MySQL to accept remote client access

Create a User/Pass Pair

I wanted to ensure I could access the server’s authentication, so I restarted with “--skip-grant-tables”. In my case, I added this to the C:\WINDOWS\my.ini, but Linux and Unix-like users (including BSD) might find /etc/my.cnf or /etc/inet/my.cnf. My config looked like:

(On Sleepy/Primary)

[mysqld]
...
...
skip-grant-tables
...

Restart the server.

Next, I connected and ran a GRANT command:

(On Sleepy/Primary)

mysql.exe -u root mysql
mysql> CREATE USER 'repl'@'%' IDENTIFIED BY 'R3plPassw0rd';
mysql> GRANT SELECT ON *.* TO 'repl'@'%';
mysql> FLUSH PRIVILEGES;

Note: “mysql.exe” is obviously “mysql” on Unix-like systems. “%” is a wildcard in MySQL’s world.

The FLUSH PRIVILEGES is not strictly necessary here, because in our next step we’ll be restarting the database anyhow. One caveat: while the server runs with skip-grant-tables, it may refuse account-management statements such as CREATE USER and GRANT; if that happens, run FLUSH PRIVILEGES first to load the grant tables, then repeat the statements.

If you cannot connect, check that the unix-socket is present, and check for socket configs in the my.ini (my.cnf).

Configure MySQL to Accept Remote Client Access
In order to open up the external port (which might already be open, depending on your configuration), I commented out the bind-address in my my.ini (my.cnf) config file:

(On Sleepy/Primary)

[mysqld]
...
...
#bind-address=127.0.0.1
skip-grant-tables
...

After I restarted, I noticed that I could connect using “-h 127.0.0.1” (as I could before) but also using the external address (192.168.44.3):

(On Sleepy/Primary)

mysql.exe -u root -h 127.0.0.1 mysql
mysql> exit
mysql.exe -u repl --password=R3plPassw0rd -h 127.0.0.1 mysql
mysql> exit
mysql.exe -u root -h 192.168.44.3 mysql
(fails, as expected)
mysql.exe -u repl --password=R3plPassw0rd -h 192.168.44.3 mysql
mysql> exit

(On Doc/Secondary)

mysql.exe -u repl --password=R3plPassw0rd -h 192.168.44.3 mysql
mysql> exit

If you cannot connect with “-h 127.0.0.1”, check that the “bind-address” is defined properly or absent completely from the my.ini (my.cnf) file, and that you have restarted the server since you made that change. “netstat -an” will confirm whether mysql is listening on port 3306. “telnet 127.0.0.1 3306” or “nc 127.0.0.1 3306” will confirm whether MySQL (or something else) is answering on that port.

If you cannot connect with your external IP address, check that you have the right address, and confirm (using telnet or nc) that you have a service responding there.

If that works fine, comment out your “skip-grant-tables” and restart, then recheck with the same OS-level mysql(.exe) statements as above. Connectivity should work and fail as above.

(On Sleepy/Primary)

[mysqld]
...
#bind-address=127.0.0.1
#skip-grant-tables
...

MySQL Replication Walkthru

dataflow, howto

This walkthru shows my steps to configure trivial replication using MySQL’s standard capability on a pair of servers.

Replication and HA are often difficult to configure and maintain in their little nuances and corner-cases. It’s so easy to make a mistake that doesn’t show until it’s time to bring the backup online, or until a logfile slowly fills up the disk and chokes the server 212 days later. That said, parallel servers bring in the complexity of ACID/idempotent actions, split-brain issues, and load-balancing; at the core, one server remains the “master”, with a very small delta between “master” and “subordinate”.

The problem with active/passive is that you can only reap 50% of your investment, and the service retains a limitation in overall throughput. Eventually, even through organic growth, the service will be unable to maintain the same responsiveness to queries while handling the continual updates and inserts. At some point, there needs to be a split.

As opposed to going back to parallelism, a PRM/SRM relationship can be used. A “Primary Replication Manager” (PRM, what we used to call the “master”) accepts updates and offers serialized updates (i.e. logical-clock ordered, not as in Java “Serialization”, “persistence”, and “hibernation”) to a number of “Secondary Replication Managers” (SRMs). This is the process most commonly seen in DNS, but I’ve configured 8 SRMs on a single LDAP PRM (Hi, Phorm), 4 Backup Domain Controllers under a single (hidden) Primary Domain Controller on Windows NT 3.51, and my own NIS secondaries from a MySQL-fed NIS primary at Connected Networks, an ISP we used to own. The complexity is shifted from the maintenance to the initial setup, which (in MySQL, LDAP, and NIS) can be done as a “smart postinstall trigger” approach to reduce the risk of human error.
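A minimal sketch of the PRM/SRM idea (invented Python names, nothing MySQL-specific): the primary stamps each update with a logical clock, and secondaries apply the stream strictly in stamp order, so every replica converges to the same state.

import itertools

class PRM:
    def __init__(self):
        self.clock = itertools.count(1)
        self.log = []                 # the serialized update stream

    def update(self, stmt):
        self.log.append((next(self.clock), stmt))

class SRM:
    def __init__(self):
        self.applied = 0              # last logical-clock stamp applied
        self.state = []

    def pull(self, prm):
        for seq, stmt in prm.log[self.applied:]:
            assert seq == self.applied + 1   # ordered and gap-free
            self.state.append(stmt)
            self.applied = seq

prm = PRM()
prm.update("INSERT ...")
prm.update("UPDATE ...")
srm = SRM()
srm.pull(prm)
print(srm.applied)   # 2: this secondary has caught up to the primary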

The MySQL case is a bit more difficult, so here’s my walkthru, mostly to remind myself, but maybe someone else will benefit.

In my case, I wanted to evaluate the feasibility and difference in performance — in very crude numbers — when replication is configured.

My walkthru features the following players:

PRM: Sleepy
SRM: Doc
MySQL: 5.1.41-enterprise-commercial-pro
OS: Windows 2008R2

I’ve cut this into separate articles not because I have some advertising quota to make, but simply because that makes bite-sized actions that a SysAdmin can take while fighting fires and repairing the Dell Retractible Drink Holder labeled “DVD” on some desktop systems.

  1. Enable Network Access
  2. Enable Replication on the Primary
  3. Make a Remote Backup
  4. Activate the Secondary Server

For a detailed example, the walkthru from the Replication Eval done in Scotts Valley on 2010-10-08 is online on the VI blog.

AppleTV and Firefly Media Server

dataflow

The new AppleTV does not work with the iTunes Server on many purpose-built NAS devices simply because the underlying technology (Firefly Media Server) cannot authenticate with Apple Home Sharing. The Second-Gen AppleTV requires Home Sharing to work.

Update 2010-12-13: James Danylik reports that it’s possible to replace Firefly with forked-daapd, a rewrite/fork of mt-daapd, which is how Firefly started. …so there’s a rewrite in progress; the dependencies are problematic, but it shouldn’t be long before it’s fully-baked.

Update 2011-06-09: Additional work has been done on metadata. Haven’t found an upgrade path yet. Scratch my own itch?

CSV Isn't Versioned — Risk Being Incompatible With Yourself

dataflow

Consider a simple example:

(example-2.0.1.csv)

Bloggins, Scott, "+1-212-555-1212", Engineer, 95060
Clark, Allan, "+1-424-242-2668", FAE, 98107

OK, so that’s fine. Obviously, we wanted a name, number, job code, and zipcode, and when we parse that, we simply say:
lastname = $1
firstname = $2
phone = $3
job = $4
zipcode = $5

…but shoot, in version 2.0.2, we needed to add the street address (because we’ll use the zipcode to figure out the city/state). Simple enough, we’ll just stick that in:

(example-2.0.2.csv)

Bloggins, Scott, "+1-212-555-1212", Engineer, 100 Enterprise Way, 95060
Clark, Allan, "+1-HA-HA-HA-Boot", FAE, 2237 Starbucks Street, 98107

That’s fine, but now the parsing is broken — for example, Scott Bloggins:
job = Engineer
zipcode = “100 Enterprise Way”

Wait a minute. That got all screwed up, and CSV cannot indicate its version number (yes, a commented preamble has been discussed, and has screwed up parsers already; abandon all hope, ye who enter here).

OK, now we’ll get around that by saying “well, if there are six entries, we’ll treat it like v2.0.2, but 5 entries, v2.0.1”:

job = $4
if (NF > 5); then
  address = $5
  zipcode = $6
else
  address = (null)
  zipcode = $5
endif

Tell me that doesn’t get cumbersome soon; besides, it ignores optional content, so if anything is skipped, you eventually have:

Bloggins, Scott,,,,,,96050

Sounding a bit like Clint Eastwood: “Did I type 12, or 17 commas? Do you feel lucky, punk? Do ya?”

This is why XML was invented: it has version numbers, optional content, and rock-solid parsing. There are libraries for this, the schema is obvious when reading it, and it still compresses nicely (plaintext with repeated syntactical sugar).
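For contrast, a minimal sketch of the same records as versioned XML (the element names are invented for illustration): the file declares its own version, and optional fields are simply absent rather than positional.

import xml.etree.ElementTree as ET

doc = """<contacts version="2.0.2">
  <contact>
    <last>Bloggins</last><first>Scott</first>
    <phone>+1-212-555-1212</phone><job>Engineer</job>
    <address>100 Enterprise Way</address><zip>95060</zip>
  </contact>
  <contact>
    <last>Clark</last><first>Allan</first>
    <phone>+1-424-242-2668</phone><job>FAE</job>
    <zip>98107</zip>
  </contact>
</contacts>"""

root = ET.fromstring(doc)
print(root.get("version"))        # the file tells you its own version: 2.0.2
for c in root.findall("contact"):
    addr = c.findtext("address")  # optional content: None when absent, no comma-counting
    print(c.findtext("last"), c.findtext("zip"), addr)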
