Planet SysAdmin

Upcoming sysadmin conferences/events

Contact me to have your event listed here.

September 02, 2014

Chris Siebenmann

An IPv6 dilemma for us: 'sandbox' machine DNS

In our current IPv4 network layout, we have a number of internal 'sandbox' networks for various purposes. These networks all use RFC 1918 private address space and with our split horizon DNS they have entirely internal names (and we have PTR resolution for them and so on). In a so far hypothetical IPv6 future, we would presumably give all of those sandbox machines public IPv6 addresses, because why not (they'd stay behind a firewall, of course). Except that this exposes a little question: what public DNS names do we give them? Especially, what's the result of doing a reverse lookup on one of their IPv6 addresses?

(Despite our split horizon DNS, we do have one RFC 1918 IP address that we've been forced to leak out.)

We can't expose our internal names for these machines because they're not valid global DNS names; they live in an entirely synthetic and private top level zone. We probably don't want to not have any reverse mapping for their IPv6 addresses because that's unfriendly (on various levels) and is likely to trigger various anti-abuse precautions on remote machines that they try to talk to. I think the only plausible answer is that we must expose reverse and forward mappings under our organizational zone (probably under a subzone to avoid dealing with name collision issues). One variant of this would be to expose only completely generic and autogenerated name mappings, eg 'ipv6-NNN.GROUP.etc' or the like; this would satisfy things that need reverse mappings with minimal work and no leakage of internal names.

If we expose the real names of machines through IPv6 DNS people will start using these names, for example for granting access to things. This is fine, except that of course these names only work for IPv6. This too is probably okay because most of these machines don't actually have externally visible IPv4 addresses anyways (they get NAT'd to a firewall IP when they talk to the outside world, and of course the NAT IP address is shared between many internal machines).

(There are some machines that are publically accessible through bidirectional NAT. These machines already have a public name to attach an IPv6 address to and we could make the reverse lookup work as well.)

Overall, I think the simplest solution is to have completely generic autogenerated IPv6 reverse and forward zones that are only visible in our external DNS view and then add IPv6 forward and reverse DNS for appropriate sandboxes to our existing internal zones. This does the minimal amount of work to pacify external things that want reverse DNS while preserving the existing internal names for machines even when you're using IPv6 with them.

The fly in this ointment is that I have no idea if the OpenBSD BIND can easily and efficiently autogenerate IPv6 reverse and forward names, given that there are a lot more of them than there are in typical autogenerated IPv4 names. If it's a problem, I suppose we can have a script that autogenerates the public IPv6 names for any IPv6 address we add to internal DNS.

by cks at September 02, 2014 02:27 AM

September 01, 2014

Everything Sysadmin

TPOCSA Book Tour announcement!

I'm excited to announce my "book tour" to promote The Practice of Cloud System Administration, which starts shipping on Friday, September 5!

I'll be speaking and/or doing book signings at the following events. More dates to be announced soon.

This book is the culmination of 2 years of research on the best practices for modern IT / DevOps / cloud / distributed computing. It is all new material. We're very excited by the early reviews and hope you find the book educational as well as fun to read.

I'd be glad to autograph you copy if you bring it to any of these events. (I have something special planned for ebook owners.)

Information about the book is available on Read a rough draft on Safari Books Online. For a limited time, save 35% by using discount code TPOSA35 on

I look forward to seeing you at one or more of these events!

September 01, 2014 09:00 PM

toolsmith - Jay and Bob Strike Back: Data-Driven Security


Data-Driven Security: Analysis, Visualization and Dashboards
R and RStudio as we’ll only focus on the R side of the discussion
All other dependencies for full interactive use of the book’s content are found in Tools You Will Need in the books Introduction.

When last I referred you to a book as a tool we discussed TJ O’Connor’s Violent Python. I’ve since been knee deep in learning R and quickly discovered Data-Driven Security: Analysis, Visualization and Dashboards from Jay Jacobs and Bob Rudis, hereafter referred to a Jay and Bob (no, not these guys).

Jay and Silent Bob Strike Back :-)
Just so you know whose company you’re actually keeping here Jay is a coauthor of Verizon Data Breach Investigation Reports and Bob Rudis was named one of the Top 25 Influencers in Information Security by Tripwire.
I was looking to make quick use of R as specific to my threat intelligence & engineering practice as it so capably helps make sense of excessive and oft confusing data. I will not torment you with another flagrant misuse of big data vendor marketing spew; yes, data is big, we get it, enough already. Thank goodness, the Internet of Things (IoT) is now the most abused, overhyped FUD-fest term. Yet, the reality is, when dealing with a lot of data, tools such as R and Python are indispensable particularly when trying to quantify the data and make sense of it. Most of you are likely familiar with Python but if you haven’t heard of R, it’s a scripting language for statistical data manipulation and analysis. There are a number of excellent books on R, but nowhere will you find a useful blending of R and Python to directly support your information security analysis practice as seen in Jay and Bob’s book. I pinged Jay and Bob for their perspective and Bob provided optimally:
“Believe it or not, we (and our readers) actually have ZeroAccess to thank for the existence of Data-Driven Security (the book, blog and podcast). We started collaborating on security data analysis & visualization projects just about a year before we began writing the book, and one of the more engaging efforts was when we went from a boatload of ZeroAccess latitude & longitude pairs (and only those pairs) to maps, statistics and even graph analyses. We kept getting feedback (both from observation and direct interaction) that there was a real lack of practical data analysis & visualization materials out there for security practitioners and the domain-specific, vendor-provided tools were and are still quite lacking. It was our hope that we could help significantly enhance the capabilities and effectiveness of organizations by producing a security-centric guide to using modern, vendor-agnostic tools for analytics, a basic introduction to statistics and machine learning, the science behind effective visual communications and a look at how to build a great security data science team.
One area we discussed in the book, but is worth expanding on is how essential it is for information security professionals to get plugged-in to the broader "data science" community. Watching "breaker-oriented" RSS feeds/channels is great, but it's equally as important to see what other disciplines are successfully using to gain new insights into tough problems and regularly tap into the wealth of detailed advice on how to communicate your messages as effectively as possible. There's no need to reinvent the wheel or use yesterday's techniques when trying to stop tomorrow's threats.”
Well said, I’m a major advocate for the premise of moving threat intelligence beyond data brokering as Bob mentions. This books endeavors and provides the means with which to conduct security data science. According to Booz Allen’s The Field Guide to Data Science, “data science is a team sport.” While I’m biased, nowhere is that more true than the information security field. As you embark on the journey Data-Driven Security: Analysis, Visualization and Dashboards (referred to hereafter as DDSecBook) intends to take you on you’ll be provided with direction on all the tools you need, so we’ll not spend much time there and instead focus on the applied use of this rich content. I will be focusing solely on the R side of the discussion though as that is an area of heavy focus for me at present.  DDSecBook is described with the byline Uncover hidden patterns of data and respond with countermeasures. Awesome, let’s do just that.

Data-Driven Security

DDSecBook is laid out in such a manner as to allow even those with only basic coding or scripting (like me; I am the quintessential R script kiddie) to follow along and grow while reading and experimenting:
1.       The Journey to Data-Driven Security
2.       Building Your Analytics Toolbox: A Primer on Using R and Python for Security Analysis
3.       Learning the “Hello World” of Security Data Analysis
4.       Performing Exploratory Security Data Analysis
5.       From Maps to Regression
6.       Visualizing Security Data
7.       Learning from Security Breaches
8.       Breaking Up with Your Relational Database
9.       Demystifying Machine Learning
10.   Designing Effective Security Dashboards
11.   Building Interactive Security Visualizations
12.   Moving Toward Data-Driven Security

For demonstrative purposes of making quick use of the capabilities described, I’ll focus our attention on chapters 4 and 6. As a longtime visualization practitioner I nearly flipped out when I realized what I’d been missing in R, so chapters 4 and 6 struck close to home for me. DDSecBook includes code downloads for each chapter and the related data so you can and should play along as you read. Additionally, just to keep things timely and relevant, I’ll apply some of the techniques described in DDSecBook to current data of interest to me so you can see how repeatable and useful these methods really are.

Performing Exploratory Security Data Analysis

Before you make use of DDSecBook, if you’re unfamiliar with R, you should read An Introduction to R, Notes on R: A Programming Environment for DataAnalysis and Graphics and run through Appendix A. This will provide at least an inkling of the power at your fingertips.
This chapter introduces concepts specific to dissecting IP addresses including their representation, conversion to and from 32-bit integers, segmenting, grouping, and locating, all of which leads to augmenting IP address data with the likes of IANA data. This is invaluable when reviewing datasets such as the AlienVault reputation data, mentioned at length in Chapter 3, and available as updated hourly.
We’ll jump ahead here to Visualizing Your Firewall Data (Listing 4-16) as it provides a great example of taking methods described in the book and applying it immediately to yourdata. I’m going to set you up for instant success but you will have to work for it a bit. The script we’re about to discuss takes a number of dependencies created earlier in the chapter; I’ll meet them in the script for you (you can download it from my site), but only if you promise to buy this book and work though all prior exercises for yourself. Trust me, it’s well worth it. Here’s the primary snippet of the script, starting at line 293 after all the dependencies are met. What I’ve changed most importantly is the ability to measure an IP list against the very latest AlienVault reputation data. Note, I found a bit of a bug here that you’ll need to update per the DDSecBook blog. This is otherwise all taken directly ch04.r in the code download with specific attention to Listing 4-16 as seen in Figure 2.

FIGURE 2: R code to match bad IPs to AlienVault reputation data
I’ve color coded each section to give you a quick walk-through of what’s happening.
1)      Defines the URL from which to download the AlienVault reputation data and provides a specific destination to download it to.
2)      Reads in the AlienVault reputation data, creates a data framefrom the data and provides appropriate column names. If you wanted to read the top of that data from the data frame, using head(av.df, 10) would result in Figure 3.

FIGURE 3: The top ten entries in the Alien Vault data frame
3)      Reads in the list of destination IP addresses, from a firewall log list as an example, and compares it against matches on the reliability column from the AlienVault reputation data.
4)      Reduces the dataset down to only matches for reliability above a rating of 6 as lower tends to be noise and of less value.
5)      Produces a graph with the function created earlier in the ch04.r code listing.
The results are seen in Figure 4 where I mapped against the Alien Vault reputation data provided with the chapter 4 download versus brand new AlienVault data as of 25 AUG 2014.

FIGURE 4: Bad IPs mapped against Alien Vault reputation data by type and country
What changed, you ask? The IP list provided with chapter 4 data is also a bit dated (over a year now) and has likely been cleaned up and is no longer of ill repute. When I ran a list 6100 IPs I had that were allegedly spammers, only two were identified as bad, one a scanning host, the other for malware distribution. 
Great stuff, right? You just made useful, visual sense of otherwise clunky data, in a manner that even a C-level executive could understand. :-)

Another example the follows the standard set in Chapter 6 comes directly from a project I’m currently working on. It matches the principles of said chapter as built from a quote from Colin Ware regarding information visualization:
“The human visual system is a pattern seeker of enormous power and subtlety. The eye and the visual cortex of the brain form a massively parallel processor that provides the highest bandwidth channel into human cognitive centers.”
Yeah, baby, plug me into the Matrix! Jay and Bob paraphrase Colin to describe the advantages of data visualization:
·         Data visualizations communicate complexity quickly.
·         Data visualizations enable recognition of latent patterns.
·         Data visualizations enable quality control on the data.
·         Data visualizations can serve as a muse.
To that end, our example.
I was originally receiving data for a particular pet peeve of mine (excessively permissive open SMB shares populated with sensitive data) in the form of a single Excel workbook with data for specific dates created as individual worksheets (tabs). My original solution was to save each worksheet as individual CSVs then use the read.csv function to parse each CSV individually for R visualization. Highly inefficient given the like of the XLConnect library that allows you to process the workbook and its individual worksheets without manipulating the source file.
raw - data="" harestats0727.csv="" openshares="" read.csv="" span="">
h - ostct="" raw="" span="" sum="">
s - harect="" raw="" span="" sum="">
sharestats - data="" harestats_8_21.xlsx="" loadworkbook="" openshares="" span="">
sheet1 - readworksheet="" sharestats="" sheet="1)</span">
h1 - ostct="" sheet1="" span="" sum="">
s1 - harect="" sheet1="" span="" sum="">
The first column of the data represented the number of hosts with open shares specific to a business unit, the second column represented the number of shares specific to that same host. I was interested in using R to capture a total number of hosts with open shares and the total number of open shares over all and visualize in order to show trending over time. I can’t share the source data with you as its proprietary, but I’ve hosted the R code for you. You’ll need to set your own working directory and the name and the path of the workbook you’d like to load. You’ll also need to define variables based on your column names. The result of my effort is seen in Figure 5.

FIGURE 5: Open shares host and shares counts trending over time
As you can see, I clearly have a trending problem, up versus down is not good in this scenario.
While this is a simple example given my terrible noob R skills, there is a vast green field of opportunity using R and Python to manipulate data in such fashion. I can’t insist enough that you give it a try.

In Conclusion

Don’t be intimidated by what you see in the way of code while reading DDSecBook. Grab R and R Studio, download the sample sets, open the book and play along while you read. I also grabbed three other R books to help me learn including The R Cookbookby Paul Teeter, R for Everyone by Jared Lander, and The Art of R Programming by Normal Matloff. There are of course many others to choose from. Force yourself out of your comfort zone if you’re not a programmer, add R to your list if you are, and above all else, as a security practitioner make immediate use of the techniques, tactics, and procedures inherent to Jay and Bob’s most excellent Data-Driven Security: Analysis, Visualization and Dashboards.
Ping me via email if you have questions (russ at holisticinfosec dot org).
Cheers…until next month.


Bob Rudis, @hrbrmstr, DDSecBook co-author, for his contributions to this content and the quick bug fix, and Jay Jacobs, @jayjacobs, DDSecBook co-author.

by (Russ McRee) at September 01, 2014 05:43 PM

Chris Siebenmann

We don't believe in DHCP for (our) servers

I understand that in some places it's popular for servers to get their IP addresses on boot through DHCP (presumably or usually static IP addresses). I understand the appeal of this for places with large fleets of servers that are frequently being deployed or redeployed; to put it one way I imagine that it involves touching machines less. However it is not something that we believe in or do in our core network, for at least two reasons.

First and lesser, it would add an extra step when setting up a machine or doing things like changing its network configuration (for example to move it to 10G-T). Not only would you have to rack it and connect it up but you'd also have to find out and note down the Ethernet address and then enter it into the DHCP server. Perhaps someday we will all have smartphones and all servers that we buy will come with machine readable QR codes of their MACs, but today neither is at all close to being true (and never mind the MACs of ports on expansion cards).

(By the way, pretty much every vendor who prints MACs on their cases uses way too small a font size.)

Second and greater, it would add an extra dependency to the server boot process. In fact it would add several extra dependencies; we'd be dependent not just on the DHCP server being up and having good data, but also on the network path between the booting server and the DHCP server (here I'm thinking of switches, cables, and so on). The DHCP server would become a central point of near total failure, and we don't like having those unless we're forced into it.

(Sure, we could have two or more DHCP servers. But then we'd have to make very sure their data stayed synchronized and we'd be devoting two servers to something that we don't see much or any advantage from. And two servers with synchronized data doesn't protect against screwups in the data itself. The DHCP server data is a real single point of failure where a single mistake has great potential for serious damage.)

A smaller side issue is that we label our physical servers with what host they are, so assigning IPs (and thus hostnames) through DHCP creates new and exciting ways for a machine's label to not match the actual reality. We also have their power feeds labeled in the network interfaces of our smart PDUs, which would create similar possibilities for exciting mismatches.

by cks at September 01, 2014 01:41 AM

August 31, 2014

Security Monkey

Shakacon: Open Source Threat Intelligence [VIDEO]


If there's one security conference in the USA that I wish I'd get to, it's Shakacon, held in the state of Hawaii.  :)  


One particular talk from Shakacon 5 that caught my eye (and continuing my discussion on threat intelligence from my previous posting) was this talk by a member of the Verizon Security team on open source threat intelligence.  


From the description:

August 31, 2014 10:01 PM

Scumblr and Sketchy: Gathering Threat Intel on a Budget!

One can never have enough good threat intelligence about potential threats to one's organization and assets.  Just as nations spend millions of dollars on intelligence agency budgets, organizations must make similar investments in people, tools and processes that allow them to find, process, and use valuable intelligence about potential threats.


Putting the discussion around private, commercial, and open source intelligence feeds and their value aside, many smaller

August 31, 2014 09:07 PM

Latest "CelebGate" Photo Leak is New News of an Old Problem

Tabloid magazines, TV shows and web sites make millions every year off of celebrity "oops" pictures that involve everything from a wardrobe malfunction, paparazzi photographs, and the leakage of private photographs.


This week's "Celebgate" leakage of nude photographs from users on 4cha

August 31, 2014 07:58 PM

Steve Kemp's Blog

A diversion - The National Health Service

Today we have a little diversion to talk about the National Health Service. The NHS is the publicly funded healthcare system in the UK.

Actually there are four such services in the UK, only one of which has this name:

  • The national health service (England)
  • Health and Social Care in Northern Ireland.
  • NHS Scotland.
  • NHS Wales.

In theory this doesn't matter, if you're in the UK and you break your leg you get carried to a hospital and you get treated. There are differences in policies because different rules apply, but the basic stuff "free health care" applies to all locations.

(Differences? In Scotland you get eye-tests for free, in England you pay.)

My wife works as an accident & emergency doctor, and has recently changed jobs. Hearing her talk about her work is fascinating.

The hospitals she's worked in (Dundee, Perth, Kirkcaldy, Edinburgh, Livingstone) are interesting places. During the week things are usually reasonably quiet, and during the weekend things get significantly more busy. (This might mean there are 20 doctors to hand, versus three at quieter times.)

Weekends are busy largely because people fall down hills, get drunk and fight, and are at home rather than at work - where 90% of accidents occur.

Of course even a "quiet" week can be busy, because folk will have heart-attacks round the clock, and somebody somewhere will always be playing with a power tool, a ladder, or both!

So what was the point of this post? Well she's recently transferred to working for a childrens hospital (still in A&E) and the patiences are so very different.

I expected the injuries/patients she'd see to differ. Few 10 year olds will arrive drunk (though it does happen), and few adults fall out of trees, or eat washing machine detergent, but talking to her about her day when she returns home is fascinating how many things are completely different from how I expected.

Adults come to hospital mostly because they're sick, injured, or drunk.

Children come to hospital mostly because their parents are paranoid.

A child has a rash? Doctors are closed? Lets go to the emergency ward!

A child has fallen out of a tree and has a bruise, a lump, or complains of pain? Doctors are closed? Lets go to the emergency ward!

I've not kept statistics, though I wish I could, but it seems that she can go 3-5 days between seeing an actually injured or chronicly-sick child. It's the first-time-parents who bring kids in when they don't need to.

Understandable, completely understandable, but at the same time I'm sure it is more than a little frustrating for all involved.

Finally one thing I've learned, which seems completely stupid, is the NHS-Scotland approach to recruitment. You apply for a role, such as "A&E doctor" and after an interview, etc, you get told "You've been accepted - you will now work in Glasgow".

In short you apply for a post, and then get told where it will be based afterward. There's no ability to say "I'd like to be a Doctor in city X - where I live", you apply, and get told where it is post-acceptance. If it is 100+ miles away you either choose to commute, or decline and go through the process again.

This has lead to Kirsi working in hospitals with a radius of about 100km from the city we live in, and has meant she's had to turn down several posts.

And that is all I have to say about the NHS for the moment, except for the implicit pity for people who have to pay (inflated and life-changing) prices for things in other countries.

August 31, 2014 11:51 AM

Chris Siebenmann

The downside of expanding your storage through bigger disks

As I mentioned recently, one of the simplest ways of expanding your storage space is simply to replace your current disks with bigger disks and then tell your RAID system, file system, or volume manager to grow into the new space. Assuming that you have some form of redundancy so you can do this on the fly, it's usually the simplest and easiest approach. But it has some potential downsides.

The simplest way to put the downsides is that this capacity expansion is generally blind and not so much inflexible as static. Your storage systems (and thus your groups) get new space in proportion to however much space (or how many disks) they're currently using, and that's it. Unless you already have shared storage, you can't reassign this extra new space from one entity to another because (for example) one group with a lot of space doesn't need more but another group with only a little space used now is expanding a lot.

This is of course perfectly fine if all of your different groups or filesystems or whatever are all going to use the extra space that you've given them, or if you only have one storage system anyways (so all space flowing to it is fine). But in other situations this rigidity in assigning new space may cause you heartburn and make you wish to reshape the storage to a lesser or greater amount.

(One assumption I'm making is that you're going to do basically uniform disk replacement and thus uniform expansion; you're not going to replace only some disks or use different sizes of replacement disks. I make that assumption because mixed disks are as much madness as any other mixed hardware situation.)

by cks at August 31, 2014 04:35 AM

August 30, 2014

Everything Sysadmin

Tom speaking at LOPSA-NJ September meeting (near Princeton/Trenton)

I'll be the speaker at the September LOPSA-NJ meeting. My topic will be the more radical ideas in our new book, The Practice of Cloud System Administration. This talk is (hopefully) useful whether you are legacy enterprise, fully cloud, or anywhere in between.

  • Topic: Radical ideas from The Practice of Cloud Computing
  • Date: Thursday, September 4, 2014
  • Time: 7:00pm (social), 7:30pm (discussion)

This is material from our newest book, which starts shipping the next day. Visit for more info.

August 30, 2014 05:00 PM

Chris Siebenmann

How to change your dm-cache write mode on the fly in Linux

Suppose that you are using a dm-cache based SSD disk cache, probably through the latest versions of LVM (via Lars, and see also his lvcache). Dm-cache is what I'll call an 'interposition' disk read cache, where writes to your real storage go through it; as a result it can be in either writethrough or writeback modes. It would be nice to be able to find out what mode your cache is and also to be able to change it. As it happens this is possible, although far from obvious. This procedure also comes with a bunch of caveats and disclaimers.

(The biggest disclaimer is that I'm fumbling my way around all of this stuff and I am in no way an expert on dm-cache. Instead I'm just writing down what I've discovered and been able to do so far.)

Let's assume we're doing our caching through LVM, partly because that's what I have and partly because if you're dealing directly with dm-cache you probably already know this stuff. Our cache LV is called testing/test and it has various sub-LVs under it (the original LV, the cache metadata LV, and the cache LV itself). Our first job is to find out what the device mapper calls it.

# dmsetup ls --tree
testing-test (253:5)
 |-testing-test_corig (253:4)
 |  `- (9:10)
 |-testing-cache_data_cdata (253:2)
 |  `- (8:49)
 `-testing-cache_data_cmeta (253:3)
    `- (8:49)

We can see the current write mode with 'dmsetup status' on the top level object, although the output is what they call somewhat hard to interpret:

# dmsetup status testing-test
0 10485760 cache 8 259/128000 128 4412/81920 16168 6650 39569 143311 0 0 0 1 writeback 2 migration_threshold 2048 mq 10 random_threshold 4 sequential_threshold 512 discard_promote_adjustment 1 read_promote_adjustment 4 write_promote_adjustment 8

I have bolded the important bit. This says that the cache is in the potentially dangerous writeback mode. To change the writeback mode we must redefine the cache, which is a little bit alarming and also requires temporarily suspending and unsuspending the device; the latter may have impacts if, for example, you are actually using it for a mounted filesystem at the time.

DM devices are defined through what the dmsetup manpage describes as 'a table that specifies a target for each sector in the logical device'. Fortunately the tables involved are relatively simple and better yet, we can get dmsetup to give us a starting point:

# dmsetup table testing-test
0 10485760 cache 253:3 253:2 253:4 128 0 default 0

To change the cache mode, we reload an altered table and then suspend and resume the device to activate our newly loaded table. For now I am going to just present the new table with the changes bolded:

# dmsetup reload --table '0 10485760 cache 253:3 253:2 253:4 128 1 writethrough default 0' testing-test
# dmsetup suspend testing-test
# dmsetup resume testing-test

At this point you can rerun 'dmsetup status' to see that the cache device has changed to writethrough.

So let's talk about the DM table we (re)created here. The first two numbers are the logical sector range and the rest of it describes the target specification for that range. The format of this specification is, to quote the big comment in the kernel's devices/md/dm-cache-target.c:

cache <metadata dev> <cache dev> <origin dev> <block size>
      <#feature args> [<feature arg>]*
      <policy> <#policy args> [<policy arg>]*

The original table's ending of '0 default 0' thus meant 'no feature arguments, default policy, no policy arguments'. Our new version of '1 writethrough default 0' is a change to '1 feature argument of writethrough, still the default policy, no policy arguments'. Also, if you're changing from writethrough back to writeback you don't end the table with '1 writeback default 0' because it turns out that writeback isn't a feature, it's just the default state. So you write the end of the table as '0 default 0' (as it was initially here).

Now it's time for the important disclaimers. The first disclaimer is that I'm not sure what happens to any dirty blocks on your cache device if you switch from writeback to writethrough mode. I assume that they still get flushed back to your real device and that this happens reasonably fast, but I can't prove it from reading the kernel source or my own tests and I can't find any documentation. At the moment I would call this caveat emptor until I know more. In fact I'm not truly confident what happens if you switch between writethrough and writeback in general.

(I do see some indications that there is a flush and it is rapid, but I feel very nervous about saying anything definite until I'm sure of things.)

The second disclaimer is that at the moment the Fedora 20 LVM cannot set up writethrough cache LVs. You can tell it to do this and it will appear to have succeeded, but the actual cache device as created at the DM layer will be writeback. This issue is what prompted my whole investigation of this at the DM level. I have filed Fedora bug #1135639 about this, although I expect it's an upstream issue.

The third disclaimer is that all of this is as of Fedora 20 and its 3.15.10-200.fc20 kernel (on 64-bit x86, in specific). All of this may change over time and probably will, as I doubt that the kernel people consider very much of this to be a stable interface.

Given all of the uncertainties involved, I don't plan to consider using LVM caching until LVM can properly create writethrough caches. Apart from the hassle involved, I'm just not happy with converting live dm-cache setups from one mode to the other right now, not unless someone who really knows this system can tell us more about what's really going on and so on.

(A great deal of my basic understanding of dmsetup usage comes from Kyle Manna's entry SSD caching using dm-cache tutorial.)

Sidebar: forcing a flush of the cache

In theory, if you want to be more sure that the cache is clean in a switch between writeback and writethrough you can explicitly force a cache clean by switching to the cleaner policy first and waiting for it to stabilize.

# dmsetup reload --table '0 10485760 cache 253:3 253:2 253:4 128 0 cleaner 0' testing-test
# dmsetup wait testing-test

I don't know how long you have to wait for this to be safe. If the cache LV (or other dm-cache device) is quiescent at the user level, I assume that you should be okay when IO to the actual devices goes quiet. But, as before, caveat emptor applies; this is not really well documented.

Sidebar: the output of 'dmsetup status'

The output of 'dmsetup status' is mostly explained in a comment in front of the cache_status() function in devices/md/dm-cache-target.c. To save people looking for it, I will quote it here:

<metadata block size> <#used metadata blocks>/<#total metadata blocks>
<cache block size> <#used cache blocks>/<#total cache blocks>
<#read hits> <#read misses> <#write hits> <#write misses>
<#demotions> <#promotions> <#dirty>
<#features> <features>*
<#core args> <core args>
<policy name> <#policy args> <policy args>*

Unlike with the definition table, 'writeback' is considered a feature here. By cross-referencing this with the earlier 'dmsetup status' output we can discover that mq is the default policy and it actually has a number of arguments, the exact meanings of which I haven't researched (but see here and here).

by cks at August 30, 2014 04:56 AM

August 29, 2014

Everything Sysadmin

Save up to 55% off on our new book for a short time

The Practice Of Cloud System Administration is featured in the InformIT Labor Day sale. Up to 55% off!

Here is the link

Buy 3 or more and save 55% on your purchase. Buy 2 and save 45%, or buy 1 and save 35% off list price on books, video tutorials, and eBooks. Enter code LABORDAY at checkout.

You will also receive a 30-day trial to Safari Books Online, free with your purchase. Plus, get free shipping within the U.S.

The book won't be shipping until Sept 5th, but you can read a recent draft via Safari Online.

August 29, 2014 04:46 PM

Steve Kemp's Blog

Migration of services and hosts

Yesterday I carried out the upgrade of a Debian host from Squeeze to Wheezy for a friend. I like doing odd-jobs like this as they're generally painless, and when there are problems it is a fun learning experience.

I accidentally forgot to check on the status of the MySQL server on that particular host, which was a little embarassing, but later put together a reasonably thorough serverspec recipe to describe how the machine should be setup, which will avoid that problem in the future - Introduction/tutorial here.

The more I use serverspec the more I like it. My own personal servers have good rules now:

shelob ~/Repos/ $ make
Finished in 1 minute 6.53 seconds
362 examples, 0 failures

Slow, but comprehensive.

In other news I've now migrated every single one of my personal mercurial repositories over to git. I didn't have a particular reason for doing that, but I've started using git more and more for collaboration with others and using two systems felt like an annoyance.

That means I no longer have to host two different kinds of repositories, and I can use the excellent gitbucket software on my git repository host.

Needless to say I wrote a policy for this host too:

#  The host should be wheezy.
describe command("lsb_release -d") do
  its(:stdout) { should match /wheezy/ }

# Our gitbucket instance should be running, under runit.
describe supervise('gitbucket') do
  its(:status) { should eq 'run' }

# nginx will proxy to our back-end
describe service('nginx') do
  it { should be_enabled   }
  it { should be_running   }
describe port(80) do
  it { should be_listening }

#  Host should resolve
describe host("" ) do
  it { should'dns') }

Simple stuff, but being able to trigger all these kind of tests, on all my hosts, with one command, is very reassuring.

August 29, 2014 01:28 PM

Warren Guy

Global DNS Tester Update

Recent changes and improvements to the Global DNS Tester:

  • TTLs for responses are now displayed. This is the number of remaining seconds that the returning server will cache the given answer for before asking an authoritative nameserver again (thanks for the suggestion Alex).
  • Retrying is now more robust. Now when a request fails due to timeout, it has been attempted 5 times, with the last attempt waiting 5 seconds for a response.
  • Excluded nameservers in Iran from 'all servers'. Almost all of them appear to be unreachable from location of the new server.
  • Moved service from Heroku to self hosted server. Primarily because Heroku doesn't support IPv6, so tests against IPv6 nameservers would fail.
  • Minor visual changes. Now scales to full browser window width. Minor formatting changes.

Read full post

August 29, 2014 09:58 AM

Chris Siebenmann

A hazard of using synthetic data in tests, illustrated by me

My current programming enthusiasm is a 'sinkhole' SMTP server that exists to capture incoming email for spam analysis purposes. As part of this it supports matching DNS names against hostnames or hostname patterns that you supply, so you can write rules like:

@from reject host .boring.spammer with message "We already have enough"

Well, that was the theory. The practice was that until very recently this feature didn't actually work; hostname matches always failed. The reason I spent so much time not noticing this is that the code's automated tests passed. Like a good person I had written the code to do this matching and then written tests for it, in fact tests for it even (and especially) in the context of these rules. All of these tests passed with flying colours, so everything looked great (right up until it clearly failed in practice while I was lucky enough to be watching).

One of the standard problems of testing DNS-based features (such as testing matching against the DNS names of an IP address) is that DNS is an external dependency and a troublesome one. If you make actual DNS queries to actual Internet DNS servers, you're dependent on both a working Internet connection and the specific details of the results returned by those DNS servers. As a result people often mock out DNS query results in tests, especially low level tests. I was no exception here; my test harness made up a set of DNS results for a set of IPs.

(Mocking DNS query results is especially useful if you want to test broken things, such as IP addresses with predictably wrong reverse DNS.)

Unfortunately I got those DNS results wrong. The standard library for my language puts a . at the end of all reverse DNS queries, eg the result of looking up the name of is (currently) '' (note the end). Most standard libraries for most languages don't do that, and while I knew that Go's was an exception I had plain overlooked this while writing the synthetic DNS results in my tests. So my code was being tested against 'DNS names' without the trailing dot and matched them just fine, but it could never match actual DNS results in live usage because of the surprise final '.'.

This shows one hazard of using synthetic data in your tests: if you use synthetic data, you need to carefully check that it's accurate. I skipped doing that and I paid the price for it here.

(The gold standard for synthetic data is to make it real data that you capture once and then use forever after. This is relatively easy in algnauges with a REPL but is kind of a pain in a compiled language where you're going to have to write and debug some one-use scratch code.)

Sidebar: how the Go library tests deal with this

I got curious and looked at the tests for Go's standard library. It appears that they deal with this by making DNS and other tests that require external resources be optional (and by hardcoding some names and eg Google's public DNS servers). I think that this is a reasonably good solution to the general issue, although it wouldn't have solved my testing challenges all by itself.

(Since I want to test results for bad reverse DNS lookups and so on, I'd need a DNS server that's guaranteed to return (or not return) all sorts of variously erroneous things in addition to some amount of good data. As far as I know there are no public ones set up for this purpose.)

by cks at August 29, 2014 04:33 AM

RISKS Digest

August 28, 2014

Everything Sysadmin

I'm giving away 1 free ticket to PuppetConf 2014!

PuppetConf 2014 is the 4th annual IT automation event of the year, taking place in San Francisco September 20-24. Join the Puppet Labs community and over 2,000 IT pros for 150 track sessions and special events focused on DevOps, cloud automation and application delivery. The keynote speaker lineup includes tech professionals from DreamWorks Animation, Sony Computer Entertainment America, Getty Images and more.

If you're interested in going to PuppetConf this year, I will be giving away one free ticket to a lucky winner who will get the chance to participate in educational sessions and hands-on labs, network with industry experts and explore San Francisco. Note that these tickets only cover the cost of the conference (a $970+ value), but you'll need to cover your own travel and other expenses (discounted rates available). You can learn more about the conference at:

To enter, click here.

Even if you aren't selected, you can save 20% off registration by reading the note at the top of the form.

ALL ENTRIES MUST BE RECEIVED BY Tue, September 3 at 8pm US/Eastern time.

August 28, 2014 06:40 PM

Chris Siebenmann

One reason why we have to do a major storage migration

We're currently in the process of migrating from our old fileservers to our new fileservers. We're doing this by manually copying data around instead of anything less user visible and sysadmin intensive. You might wonder why, since the old environment and the new architecture are the same (with ZFS, iSCSI backends, mirroring, and so on). One reason is that the ZFS 4K sector issue has forced our hands. But even if that wasn't there it turns out that we probably would have had to do a big user visible migration because of a collision of technology and politics (and ZFS limitations). The simple version of the organizational politics are that we can't just give people free space.

Most storage migrations will increase the size of disks involved, and ours was no exception; we're going from a mixture of 750 GB and 1 TB drives to 2 TB drives. If you just swap drives out for bigger ones, pretty much any decent RAID or filesystem can expand into the newly available space without any problems at all; ZFS is no exception here. If you went from 1 TB to 2 TB disks in a straightforward way, all of your ZFS pools would or could immediately double in size.

(Even if you can still get your original sized disks during an equipment turnover, you generally want to move up in size so that you can reduce the overhead per GB or TB. Plus, smaller drives may actually have worse cost per GB than larger ones.)

Except that we can't give space away for free, so if we did this we would have had to persuade people to buy all this extra space. There was very little evidence that we could do this, since people weren't exactly breaking down our doors to buy more space as it stood.

If you can't expand disks in place and give away (or sell) the space, what you want (and need) to do is to reshape your existing storage blobs. If you were using N disks (or mirror pairs) for a storage blob before the disk change, you want to move to using N/2 disks afterwards; you get the same space but on fewer disks. Unfortunately ZFS simply doesn't support this sort of reshaping of pool storage with existing pools; once a pool has so many disks (or mirrors), you're stuck with that many. This left us with having to do it by hand, in other words a manual migration from old pools with N disks to new pools with N/2 disks.

(If you've decided to go from smaller disks to larger disks you've already implicitly decided to sacrifice overall IOPs per TB in the name of storage efficiency.)

Sidebar: Some hacky workarounds and their downsides

The first workaround would be to double the raw underlying space but then somehow limit people's ability to use it (ZFS can do this via quotas, for example). The problem is that this will generally leave you with a bit less than half of your space allocated but unused on a long term basis unless people wake up and suddenly start buying space. It's also awkward if the wrong people wake up and want a bunch more space, since your initial big users (who may not need all that much more space) are holding down most of the new space.

The second workaround is to break your new disks up into multiple chunks, where one chunk is roughly the size of your old disks. This has various potential drawbacks but in our case it would simply have put too many individual chunks on one disk as our old 1 TB disks already had four (old) chunks, which was about as many ways as we wanted to split up one disk.

by cks at August 28, 2014 03:52 AM

RISKS Digest

August 27, 2014

Everything Sysadmin


The schedule for 2014 has been published and OMG it looks like an entirely new conference. By "new" I mean "new material"... I don't see slots filled with the traditional topics that used to get repeated each year. By "new" I also mean that all the sessions are heavily focused on forward-thinking technologies instead of legacy systems. This conference looks like the place to go if you want to move your career forward.

LISA also has a new byline: LISA: Where systems engineering and operations professionals share real-world knowledge about designing, building, and maintaining the critical systems of our interconnected world.

The conference website is here:

I already have approval from my boss to attend. I can't wait!

Disclaimer: I helped recruit for the tutorials, but credit goes to the tutorial coordinator and Usenix staff who did much more work than I did.

August 27, 2014 05:00 PM

How sysadmins can best understand Burning Man

So, you've heard about Burning Man. It has probably been described to you as either a hippie festival, where rich white people go to act cool, naked chicks, or a drug-crazed dance party in the desert. All of those descriptions are wrong..ish. Burning Man is a lot of things and can't be summarized in a sound bite. I've never been to Burning Man, but a lot of my friends are burners, most of whom are involved in organizing their own group that attends, or volunteer directly with the organization itself.

Imagine 50,000 people (really!) showing up for a 1-week event. You essentially have to build a city and then remove it. As it is a federal land, they have to "leave no trace" (leave it as clean as they found it). That's a lot of infrastructure to build and projects to coordinate.

We, as sysadmins, love infrastructure and are often facinated by how large projects are managed. Burning Man is a great example.

There is a documentary called Burning Man: Beyond Black Rock which not only explains the history and experience that is Burning Man, but it goes into a lot of "behind the scenes" details of how the project management and infrastructure management is done.

You can watch the documentary many ways:

There is a 4 minute trailer on (warning: autoplay)

The reviews on IMDB are pretty good. One noteworthy says:

I cannot say enough about the job these filmmakers did and the monumental task they took on in making this film. First, Burning Man is a topic that has been incredibly marginalized by the media to the point of becoming a recurring joke in The Simpsons (although the director of the Simpsons is at Burning Man every year), and second, to those who DO know about it, its such a sensitive topic and so hard to deliver something that will please the core group.

Well, these guys did it, and in style I must say. This doc is witty, fast moving, and most importantly profoundly informational and moving without seeming too close to the material.

I give mad props to anyone that can manage super huge projects so well. I found the documentary a powerful look at an amazing organization. Plus, you get to see a lot of amazing art and music.

I highly recommend this documentary.

Update: The post originally said the land is a national park. It is not. It is federal land. (Thanks to Mike for the correction!)

August 27, 2014 03:00 PM

Debian Administration

Using the haproxy load-balancer for increased availability

HAProxy is a TCP/HTTP load-balancer, allowing you to route incoming traffic destined for one address to a number of different back-ends. The routing is very flexible and it can be a useful component of a high-availability setup.

by Steve at August 27, 2014 02:12 PM

Chris Siebenmann

The difference between Linux and FreeBSD boosters for me

A commentator on my entry about my cultural bad blood with FreeBSD said, in very small part:

I'm surprised that you didn't catch this same type of bad blood from the linux world. [...]

This is a good question and unfortunately my answers involve a certain amount of hand waving.

First off, I think I simply saw much less of the Linux elitism than I did of the FreeBSD elitism, partly because I wasn't looking in the places where it probably mostly occurred and partly because by the time I was looking at all, Linux was basically winning and so Linux people did less of it. To put it one way, I'm much more inclined towards the kind of places you found *BSD people in the early 00s than the kinds of places that were overrun by bright-eyed Linux idiots.

(I don't in general hold the enthusiasms of bright-eyed idiots of any stripe against a project. Bright-eyed idiots without enough experience to know better are everywhere and are perfectly capable of latching on to anything that catches their fancy.)

But I think a large part of it was that the Linux elitism I saw was both of a different sort than the *BSD elitism and also in large part so clearly uninformed and idiotic that it was hard to take seriously. To put it bluntly, the difference I can remember seeing between the informed people on both sides was that the Linux boosters mostly just argued that it was better while the *BSD boosters seemed to have a need to go further and slam Linux, Linux users, and Linux developers while wrapping themselves in the mantle of UCB BSD and Bell Labs Unix.

(Linux in part avoided this because it had no historical mantle to wrap itself in. Linux was so clearly ahistorical and hacked together (in a good sense) that no one could plausibly claim some magic to it deeper than 'we attracted the better developers'.)

I find arguments and boosterism about claimed technical superiority to be far less annoying and offputting than actively putting down other projects. While I'm sure there were Linux idiots putting down the *BSDs (because I can't imagine that there weren't), they were at most a a fringe element of what I saw as the overall Linux culture. This was not true of the FreeBSD and the *BSDs, where the extremely jaundiced views seemed to be part of the cultural mainline.

Or in short: I don't remember seeing as much Linux elitism as I saw *BSD elitism and the Linux boosterism I saw irritated me less in practice for various reasons.

(It's certainly possible that I was biased in my reactions to elitism on both sides. By the time I was even noticing the Linux versus *BSD feud we had already started to use Linux. I think before then I was basically ignoring the whole area of PC Unixes, feuds and all.)

by cks at August 27, 2014 06:26 AM

August 26, 2014

RISKS Digest

Chris Siebenmann

Why I don't like HTTP as a frontend to backend transport mechanism

An extremely common deployment pattern in modern web applications is not to have the Internet talk HTTP directly to your application but instead to put it behind a (relatively lightweight) frontend server like Apache, nginx, or lighttpd. While this approach has a number of advantages, it does leave you with the question of how you're going to transport requests and replies between your frontend web server and your actual backend web application. While there are a number of options, one popular answer is to just use HTTP; effectively you're using a reverse proxy.

(This is the method of, for example, Ruby's Unicorn and clones of it in other languages such as Gunicorn.)

As it happens, I don't like using HTTP for a transport this way; I would much rather use something like SCGI or FastCGI or basically any protocol that is not HTTP. My core problem with using HTTP as a transport protocol can be summarized by saying that standard HTTP does not have transparent encapsulation. Sadly that is jargon, so I'd better explain this better.

An incoming HTTP request from the outside world comes with a bunch of information and details; it has a source IP, a Host:, the URL it was requesting, and so on. Often some or many of these are interesting to your backend. However, many of them are going to be basically overwritten when your frontend turns around and makes its own HTTP request to your backend, because the new HTTP request has to use them itself. The source IP will be that of the frontend, the URL may well be translated by the frontend, the Host: may be forced to the name of your backend, and so on. The problem is that standard HTTP doesn't define a way to take the entire HTTP request, wrap it up intact and unaltered, and forward it off to some place for you to unwrap. Instead things are and have to be reused and overwritten in the frontend to backend HTTP request, so what your backend sees is a mixture of the original request plus whatever changes the frontend had to make in order to make a proper HTTP request to you.

You can hack around this; for example, your frontend can add special headers that contain copies of the information it has to overwrite and the backend can know to fish the information out of these headers and pretend that the request had them all the time. But this is an extra thing on top of HTTP, not a standard part of it, and there are all sorts of possibilities for incomplete and leaky abstractions here.

A separate transport protocol avoids all of this by completely separating the client's HTTP request to the frontend from the way it's transported to the backend. There's no choice but to completely encapsulate the HTTP request (and the reply) somehow and this enforces a strong separation between HTTP request information and transport information. In any competently designed protocol you can't possibly confuse one for the other.

Of course you could do the same thing with HTTP by defining an HTTP-in-HTTP encapsulation protocol. But as far as I know there is no official or generally supported protocol for this, so you won't find standard servers or clients for such a thing the way you can for SCGI, FastCGI, and so on.

(I feel that there are other pragmatic benefits of non-HTTP transport protocols, but I'm going to defer them to another entry.)

Sidebar: another confusion that HTTP as a transport causes

So far I've talked about HTTP requests, but there's an issue with HTTP replies as well because they aren't encapsulated either. In a backend server you have two sorts of errors, errors for the client (which should be passed through to them) and errors for the frontend server that tells it, for example, that something has gone terribly wrong in the backend. Because replies are not encapsulated you have no really good way of telling these apart. Is a 404 error a report from the web application to the client or an indication that your frontend is trying to talk to a missing or misconfigured endpoint on the backend server?

by cks at August 26, 2014 04:14 AM

August 25, 2014

Steve Kemp's Blog

Updates on git-hosting and load-balancing

To round up the discussion of the Debian Administration site yesterday I flipped the switch on the load-balancing. Rather than this:

  https -> pound \
  http  -------------> varnish  --> apache

We now have the simpler route for all requests:

http  -> haproxy -> apache
https -> haproxy -> apache

This means we have one less HTTP-request for all incoming secure connections, and these days secure connections are preferred since a Strict-Transport-Security header is set.

In other news I've been juggling git repositories; I've setup an installation of GitBucket on my git-host. My personal git repository used to contain some private repositories and some mirrors.

Now it contains mirrors of most things on github, as well as many more private repositories.

The main reason for the switch was to get a prettier interface and bug-tracker support.

A side-benefit is that I can use "groups" to organize repositories, so for example:

Most of those are mirrors of the github repositories, but some are new. When signed in I see more sources, for example the source to

I've been pleased with the setup and performance, though I had to add some caching and some other magic at the nginx level to provide /robots.txt, etc, which are not otherwise present.

I'm not abandoning github, but I will no longer be using it for private repositories (I was gifted a free subscription a year or three ago), and nor will I post things there exclusively.

If a single canonical source location is required for a repository it will be one that I control, maintain, and host.

I don't expect I'll give people commit access on this mirror, but it is certainly possible. In the past I've certainly given people access to private repositories for collaboration, etc.

August 25, 2014 08:16 AM

Chris Siebenmann

10G Ethernet is a sea change for my assumptions

We're soon going to migrate a bunch of filesystems to a SSD-based new fileserver, all at once. Such migrations force us to do full backups of migrated filesystems (to the backup system they appear as new filesystems), so a big move means a sudden surge in backup volume. As part of how to handle this surge, I had the obvious thought: we should upgrade the backup server that will handle the migrated filesystems to 10G Ethernet now. The 10G transfer speeds plus the source data being on SSDs would make it relatively simple to back up even this big migration overnight during our regular backup period.

Except I realized that this probably wasn't going to be the case. Our backup system writes backups to disk, specifically to ordinary SATA disks that are not aggregated together in any sort of striped setup, and an ordinary SATA disk might write at 160 Mbytes per second on a good day. This is only slightly faster than 1G Ethernet and certainly nowhere near the reasonable speeds of 10G Ethernet in our environment. We can read data off the SSD-based fileserver and send it over the network to the backup server very fast, but that doesn't really do us anywhere near as much good as it looks when the entire thing is then going to come to a screeching halt by the inconvenient need to write the data to disk on the backup server. 10G will probably help the backup servers a bit, but it isn't going to be anywhere near a great speedup.

What this points out is that my reflexive assumptions are calibrated all wrong for 10G Ethernet. I'm used to thinking of the network as slower than the disks, often drastically, but this is no longer even vaguely true. Even so-so 10G Ethernet performance (say 400 to 500 Mbytes/sec) utterly crushes single disk bandwidth for anything except SSDs. If we get good 10G speeds, we'll be crushing even moderate multi-disk bandwidth (and that's assuming we get full speed streaming IO rates and we're not seek limited). Suddenly the disks are the clear limiting factor, not the network. In fact even a single SSD can't keep up with a 10G Ethernet at full speed; we can see this from the mere fact that SATA interfaces themselves currently max out at 6 Gbits/sec on any system we're likely to use.

(I'd run into this before even for 1G Ethernet, eg here, but it evidently hadn't really sunk into my head.)

PS: I don't know what this means for our backup servers and any possible 10G networking in their future. 10G is likely to improve things somewhat, but the dual 10G-T Intel cards we use don't grow on trees and maybe it's not quite cost effective for them right now. Or maybe the real answer is working out how to give them striped staging disks for faster write speeds.

by cks at August 25, 2014 03:43 AM

August 24, 2014

RISKS Digest