Planet SysAdmin

January 27, 2015

January 26, 2015


The sysadmin skills-path.

Tom Limoncelli posted a question today.

What is the modern rite of passage for sysadmins? I want to know.

That's a hard one, but it got me thinking about career-paths and skills development, and how it has changed since I did it. Back when I started, the Internet was just becoming a big source of information. If it wasn't on Usenet, the vendor's web-site might have a posted knowledge-base. You could learn a lot from those. I also learned a lot from other admins I was working with.

One of the big lamentations I hear on ServerFault is that kids these days expect a HOWTO for everything.

Well, they're right. I believe that's because friendly bloggers like myself have trained others to expect one when finding out how to do stuff. So I posit this progression of skill-set for a budding sysadmin deploying a NewThing.

  1. There is always a checklist if you google hard enough. If that one doesn't work, look for another one.
    • And if that doesn't work, ask a batch of likely experts (or bother the expert in the office) to make one for you. It works sometimes.
    • And if that doesn't work, give up in disgust.
  2. Google for checklists. Find one. Hit a snag. Look for another one. Hit another snag. Titrate between the two to get a good install/config.
    • If that doesn't work, follow the step-1 progression to get a good config. You'll have better luck with the experts this time.
  3. Google for checklists. Find a couple. Analyze them for failure points and look for gotcha-docs. Build a refined procedure to get a good install/config.
  4. Google for checklists. Find a few. Generalize a good install/config procedure out of them and write your own checklist.
    • If it works, blog about it.
  5. Google for checklists. Find a few, and some actual documentation. Make decisions about what settings you need to change and why, based on documentation evidence and other people's experience. Install it.
    • If it works, write it up for the internal wiki.
  6. [Graduation] Plunder support-forums for problem-reports to see where installs have gone wrong. Revise your checklist accordingly.
    • If it works, go to local Meetups to give talks about your deploy experience.

That seems about right. When you get to the point where your first thought about deploying a new thing is, "what can go wrong that I need to know about," you've arrived.

by SysAdmin1138 at January 26, 2015 09:45 PM

Everything Sysadmin

Rites of Passage for a modern sysadmin?

Dear readers: I need your help. I feel like I've lost touch with what new sysadmins go through. I learned system administration 20+ years ago. I can't imagine what new sysadmins go through now.

In particular, I'd like to hear from new sysadmins about what their "rite of passage" was that made them feel like a "real sysadmin".

When I was first learning system administration, there was a rite of passage called "setting up an email server". Everyone did it.

This was an important project because it touches on so many different aspects of system administration: DNS, SMTP, Sendmail configuration, POP3/IMAP4, setting up a DNS server, debugging clients, and so on and so on. A project like this might take weeks or months depending on what learning resources you have, if you have a mentor, and how many features you want to enable and experiment with.
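The DNS piece alone illustrates why this was such a good learning project; a minimal sketch of the records a toy mail domain needs, assuming standard BIND-style zone syntax (the domain and address are documentation placeholders, not from the original post):

```
; zone file fragment for example.com (illustrative only)
example.com.      IN MX 10 mail.example.com.   ; mail for the domain goes to this host
mail.example.com. IN A     192.0.2.25          ; the mail server's address
```

Getting even this fragment right (trailing dots, MX priority, a resolvable A record) was the kind of detail that the rite of passage taught.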

Nowadays it is easier to do that: Binary packages and better defaults have eliminated most of the complexity. Starter documentation is plentiful, free, and accessible on the web. DNS domain registrars host the zone too, and make updates easy. Email addressing has become banal, mostly thanks to uniformity (and the end of UUCP).

More "nails in the coffin" for this rite of passage include the fact that ISPs now provide email service (this didn't use to be true), hosted email services like Google Apps have more features than most open source products, and ...oh yeah... email is passé.

What is the modern rite of passage for sysadmins? I want to know.

If you became a sysadmin in the last 10 years: What project or "rite of passage" made you feel like you had gone from "beginner" to being "a real sysadmin!"

Please tell me here.

January 26, 2015 03:00 PM

Chris Siebenmann

Some notes on keeping up with Go packages and commands

Bearing in mind that just go get'ing things is a bad way to remember what packages you're interested in, it can be useful to keep an eye on updates to Go packages and commands. My primary tool for this is Dmitri Shuralyov's Go-Package-Store, which lets you keep an eye on not only what stuff in $GOPATH/src has updates but what they are. However, there are a few usage notes that I've accumulated.

The first and most important thing to know about Go-Package-Store, and something that I only realized recently myself (oh the embarrassment), is that Go-Package-Store does not rebuild packages or commands. All it does is download new versions (including fetching and updating their dependencies). You can see this in the commands it's running if you pay attention, since it specifically runs 'go get -u -d'. This decision is sensible and basically necessary, since many commands and (sub) packages aren't installed with 'go get <repo top level>', but it does mean that you're going to have to do the rebuilds yourself when you want them.

So, the first thing this implies is that you need to keep track of the 'go get' command to rebuild each command in $GOPATH/bin that you care about; otherwise, sooner or later you'll be staring at a program in $GOPATH/bin and resorting to web searches to find out what repository it came from and how it's built. I suggest putting this information in a simple shell script that does a mass rebuild, with one 'go get' per line; when I want to rebuild just a specific command, I cut and paste its line.

(Really keen people will turn the text file into a script so that you can do things like 'rebuild <command>' to run the right 'go get' to rebuild the given command.)
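A minimal sketch of such a script: the repository paths are illustrative examples, not taken from the post, and by default it only echoes the 'go get' commands it would run (set GO=go to actually rebuild):

```shell
#!/bin/sh
# Hypothetical rebuild script: one 'go get' per command in $GOPATH/bin that
# you care about. Run with an argument to rebuild only the matching command,
# e.g. 'rebuild.sh goimports'.
GO=${GO:-echo go}   # dry-run by default; use GO=go to really rebuild
FILTER="$1"

want() {
    # Skip entries that don't match the optional filter argument.
    case "$1" in
        *"$FILTER"*) $GO get -u "$1" ;;
    esac
}

want github.com/shurcooL/Go-Package-Store/cmd/Go-Package-Store
want golang.org/x/tools/cmd/goimports
```

Cut-and-pasting a single 'want' line still works, and the filter argument gives you the 'rebuild <command>' behaviour without a separate script.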

The next potentially tricky area is dependent packages, in several ways. The obvious thing is that having G-P-S update a dependent package doesn't in any way tell you that you should rebuild the command that uses it; in fact G-P-S doesn't particularly know what uses what package. The easy but brute-force way to deal with this is just to rebuild all commands every so often (well, run 'go get -u' against them; I'm not sure how much Make-like dependency checking it does).

The next issue is package growth. What I've noticed over time is that using G-P-S winds up with me having extra packages that aren't needed by the commands (and packages) that I have installed. As a result I both pay attention to what packages G-P-S is presenting updates for and periodically look through $GOPATH/src for packages that make me go 'huh?'. Out of place packages get deleted instead of updated, on the grounds that if they're actual dependencies of something I care about they'll get re-fetched when I rebuild commands.

(I also delete $GOPATH/pkg/* every so often. One reason that all of this rebuilding doesn't bother me very much is that I track the development version of Go itself, so I actively want to periodically rebuild everything with the latest compiler. People with big code bases and stable compilers may not be so sanguine about routinely deleting compiled packages and so on.)

I think that an explicit 'go get -u' of commands and packages that you care about will reliably rebuild dependent packages that have been updated but not (re)built in the past by Go-Package-Store, but I admit that I sometimes resort to brute force (ie deleting $GOPATH/pkg/*) just to be sure. Go things build very fast and I'm not building big things, so my attitude is 'why not?'.

Sidebar: Where I think the extra packages come from

This is only a theory. I haven't tested it directly; it's just the only cause I can think of.

Suppose you have a command that imports a sub-package from a repository. When you 'go get' the command, I believe that Go only fetches the further imported dependencies of the sub-package itself. Now, later on Go-Package-Store comes along, reports that the repository is out of date, and when you tell it to update things it does a 'go get' on the entire repository (not just the sub-package initially used by the command). This full-repo 'go get' presumably imports either all dependencies used in the repository or all dependencies of the code in the top level of the repository (I'm not sure which), which may well add extra dependencies over what the sub-package needed.

(The other possible cause is shifting dependencies in packages that I use directly, but some stray packages are so persistent in their periodic returns that I don't really believe that.)

by cks at January 26, 2015 06:54 AM

January 25, 2015

The Lone Sysadmin

How to Install CrashPlan on Linux

I like CrashPlan. They support a wider range of operating systems than some of their competitors, they have a simple pricing model, unlimited storage & retention, and nice local, mobile, and web interfaces. I’ve been a customer for a few years now, and recently have switched a few of my clients’ businesses over to […]

The post How to Install CrashPlan on Linux appeared first on The Lone Sysadmin. Head over to the source to read the full post!

by Bob Plankers at January 25, 2015 09:01 PM

Chris Siebenmann

The long term problem with ZFS on Linux is its license

Since I've recently praised ZFS on Linux as your only real choice today for an advanced filesystem, I need to bring up the long term downside because, awkwardly, I do believe that btrfs is probably going to be the best pragmatic option in the long term and is going to see wider adoption once it works reliably.

The core of the problem is ZFS's license, which I've written about before. What I didn't write about back then, because I didn't know enough at the time, was the full effect on ZoL of not being included in distributions. The big effect is that it will probably never be easy or supported to make your root filesystem a ZFS pool. Unless distributions restructure their installers (and they have no reason to do so), a ZFS root filesystem needs first class support in the installer and it will almost certainly be rather difficult (both politically and otherwise) to add this. Since no installer-created filesystem can be a ZFS one and the root filesystem has to be created in the installer, a ZFS root is effectively ruled out.

(Okay, you can shuffle around your root filesystem after the basic install is done. But that's a big pain.)

In turn this means that ZFS on Linux is probably always going to be a thing for experts. To use it you need to leave disk space untouched in the installer (or add disk space later), then at least fetch the ZoL packages from an additional repository and have them auto-install on your kernel. And of course you have to live with a certain amount of lack of integration in all of the bits (especially if you go out of your way to use a ZFS root filesystem).

(And as I've seen there are issues with mixing ZFS and non-ZFS filesystems. I suspect that these issues will turn out to be relatively difficult to fix, if they can be at all. Certainly things seem much more likely to work well if all of your filesystems are ZFS filesystems.)

PS: Note that in general having non-GPLv2, non-bundled kernel modules is not an obstacle to widespread adoption if people want what you have to offer. A large number of people have installed binary modules for their graphics cards, for one glaring example. But I don't think that fetching these modules has been integrated into installers despite how popular they are.

(Also, I may be wrong here. If ZFS becomes sufficiently popular, distributions might at least make it easy for people to make third party augmented installers that have support for ZFS. Note that ZFS support in an installer isn't as simple as the choice of another filesystem; ZFS pools are set up quite differently from normal filesystems and good ZFS root pool support has to override things like setup for software RAID mirroring.)

by cks at January 25, 2015 09:21 AM

The Lone Sysadmin

Why Use SD Cards For VMware ESXi?

I’ve had four interactions now regarding my post on replacing a failed SD card in one of my servers. They’ve ranged from inquisitive: @plankers why would you use an SD card in a server. I’m not a sys admin, but just curious. — Allan Çelik (@Allan_Celik) January 22, 2015 to downright rude: “SD cards are […]

The post Why Use SD Cards For VMware ESXi? appeared first on The Lone Sysadmin. Head over to the source to read the full post!

by Bob Plankers at January 25, 2015 12:24 AM

January 24, 2015


The Next Version of

Longtime TaoSecurity Blog readers are likely to remember me mentioning a Web site that returns nothing more than

uid=0(root) gid=0(root) groups=0(root)

This content triggers a Snort intrusion detection system alert, due to the signature

alert ip any any -> any any (msg:"GPL ATTACK_RESPONSE id check returned root"; content:"uid=0|28|root|29|"; fast_pattern:only; classtype:bad-unknown; sid:2100498; rev:8;)
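The rule's `content` option mixes literal text with Snort's `|…|` hex-byte notation: `|28|` is byte 0x28, `(`, and `|29|` is byte 0x29, `)`. A quick shell check of the reconstructed match string (octal escapes `\050`/`\051` stand in for hex 0x28/0x29, since plain `printf` only guarantees octal):

```shell
# Rebuild the matched content 'uid=0|28|root|29|' as literal bytes;
# \050 and \051 are octal for hex 0x28 '(' and 0x29 ')'.
printf 'uid=0\050root\051\n'
```

This prints `uid=0(root)`, the substring the rule looks for anywhere in the HTTP response.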

You can see the Web page in Firefox, and the alert in Sguil, below.

A visit to this Web site is a quick way to determine if your NSM sensor sees what you expect it to see, assuming you're running a tool that will identify the activity as suspicious. You might just want to ensure your other NSM data records the visit, as well.

Site owner Chas Tomlin emailed me today to let me know he's adding some new features to the site. You can read about them in this blog post. For example, you could download a malicious .exe, or other files.

Chas asked me what other sorts of tests I might like to see on his site. I'm still thinking about it. Do you have any ideas?

by Richard Bejtlich at January 24, 2015 09:52 PM

LZone - Sysadmin

Debugging dovecot ACL Shared Mailboxes Not Showing in Thunderbird

When you can't get ACL shared mailboxes visible with Dovecot and Thunderbird, here are some debugging tips:

  1. Thunderbird fetches the ACLs on startup (and maybe at some other interval). So for testing, restart Thunderbird after each change you make.
  2. Ensure the shared mailboxes index can be written. You probably have it configured like
    plugin {
      acl_shared_dict = file:/var/lib/dovecot/db/shared-mailboxes.db
    }

    Check if such a file was created and is populated with new entries when you add ACLs from the mail client. As long as entries do not appear here, nothing can work.

  3. Enable debugging in the dovecot log or use the "debug" flag and check the ACLs for the user who should see a shared mailbox like this:
    doveadm acl debug -u <user> shared/users/box
    • Watch out for missing directories
    • Watch out for permission issues
    • Watch out for strangely created paths; this could hint at a misconfigured namespace prefix
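For step 3, the "enable debugging in the dovecot log" part comes down to two standard Dovecot 2.x options; a minimal config sketch (remember to reload Dovecot afterwards, and turn these off again when done):

```
# dovecot.conf -- temporary debugging settings
mail_debug = yes
auth_debug = yes
```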

by Lars Windolf at January 24, 2015 04:18 PM

Everything Sysadmin

How to move PCs to a corporate standard?

Someone asked me in email for advice about how to move many machines to a new corporate standard. I haven't dealt with desktop/laptop PC administration ("fleet management") in a while, but I explained this experience and thought I'd share it on my blog:

I favor using "the carrot" over "the stick". The carrot is making the new environment better for the users so they want to adopt it, rather than using management fiat or threats to motivate people. Each has its place.

The more people feel involved in the project the more likely they are to go along with it. If you start by involving typical users by letting them try out the new configuration in a test lab or even loaning them a machine for a week, they'll feel like they are being listened to and will be your partner instead of a roadblock.

Once I was in a situation where we had to convert many PCs to a corporate standard.

First we made one single standard PC. We let people try it out and find problems. We resolved or found workarounds to any problems or concerns raised.

At that point we had a rule: all new PCs would be built using the standard config. No regressions. The number of standard PCs should only increase. If we did that and nothing more, eventually everything would be converted as PCs only last 3 years.

That said, preventing any back-sliding (people installing PCs with the old configuration by mistake, out of habit, or wanting an "exception") was a big effort. The IT staff had to be vigilant. "No regressions!" was our battlecry. Management had to have a backbone. We on the team had to police ourselves and our users.

We knew waiting for the conversion to happen over 3 years was much too slow. However before we could accelerate the process, we had to get those basics correct.

The next step was to convert the PCs of people that were willing and eager. The configuration was better, so some people were eager to convert. Updates happened automatically. They got a lot of useful software pre-installed. We were very public about how the helpdesk was able to support people with the new configuration better and faster than the old configuration.

Did some people resist? Yes. However there were enough willing and eager people to keep us busy. We let those "late adopters" have their way. Though, we'd mentally prepare them for the eventual upgrade by saying things like (with a cheerful voice), "Oh, we're a late adopter! No worries. We'll see you in a few months." By calling them "late adopters" instead of "resisters" or "hard cases" it mentally reframed the issue as them being "eventual" not "never".

Some of our "late adopters" volunteered to convert on their own. They got a new machine and didn't have a choice. Or, they saw that other people were happy with the new configuration and didn't want to be left behind. Nobody wants to be the only kid on the block without the new toy that all the cool kids have.

(Oh, did I mention the system for installing PCs the old way is broken and we can't fix it? Yeah, kind of like how parents tell little kids the "Frozen" disc isn't working and we'll have to try again tomorrow.)

Eventually those conversions were done and we had the time and energy to work on the long tail of "late adopters". Some of these people had verified technical issues such as software that didn't work on the new system. Each of these could take many hours or days of helping the user make the software work or finding replacement products. In some cases, we'd extract the user's disk into a virtual machine (p2v) so that it could run in the old environment.

However eventually we had to get rid of the last few hold-outs. The support cost of keeping the old configuration alive was $x, and if there are 100 remaining machines, $x/100 per machine isn't a lot of money. When there are 50 remaining machines the cost is $x/50. Eventually the cost is $x/1, and that makes the last machine very, very expensive. The faster we can get to zero, the better.
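The cost curve is worth making concrete; a toy calculation with a made-up fixed support cost (the $50,000 figure is purely illustrative):

```shell
# Fixed yearly cost x to keep the old configuration supported, divided
# over however many unconverted machines remain.
x=50000
for remaining in 100 50 10 1; do
    echo "$remaining remaining: \$$((x / remaining)) per machine per year"
done
```

The per-machine cost climbs from $500 at 100 machines to the full $50,000 for the last hold-out, which is why the long tail is worth chasing.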

We announced that unconverted machines would be unsupported after date X, and would stop working (the file servers wouldn't talk to them) by date Y. We had to get management support on X and Y, and a commitment to not make any exceptions. We communicated the dates broadly at first, then eventually only the specific people affected (and their manager) received the warnings. Some people figured out that they could convince (trick?) their manager into buying them a new PC as part of all this... we didn't care as long as we got rid of the old configuration. (If I was doing this today, I'd use 802.1X to kick old machines off the network after date Z.)

One excuse we could not tolerate was "I'll just support it myself". The old configuration didn't automatically receive security patches and "self-supported machines" were security problems waiting to happen. The virtual machines were enough of a risk.

Speaking of which... the company had a loose policy about people taking home equipment that was discarded. A lot of kids got new (old) PCs. We were sure to wipe the disks and be clear that the helpdesk would not assist them with the machine once disposed. (In hindsight, we should have put a sticker on the machine saying that.)

Conversion projects like this pop up all the time. Sometimes it is due to a smaller company being bought by a larger company, a division that didn't use centralized IT services adopting them, or moving from an older OS to a newer OS.

If you are tasked with a similar conversion project you'll find you need to adjust the techniques you use depending on many factors. Doing this for 10 machines, 500 machines, or 10,000 machines all require adjusting the techniques for the situation.

If you manage server farms instead of desktop/laptop PC fleets, similar techniques work.

January 24, 2015 01:25 PM

Chris Siebenmann

Web applications and generating alerts due to HTTP requests

One of the consequences and corollaries of never trusting anything you get from the network is that you should think long and hard before you make your web application generate alerts based on anything in incoming HTTP requests. Because outside people can put nearly anything into HTTP requests and because the Internet is very big, it's very likely that sooner or later some joker will submit really crazy HTTP requests with all sorts of bad or malicious content. If you're alerting on this, well, you can wind up with a large pile of alerts (or with an annoying trickle of alerts that numbs you to them and to potential problems).

Since the Internet is very big and much of it doesn't give much of a damn about your complaints, 'alerts' about bad traffic from random bits of the Internet are unlikely to be actionable alerts. You can't get the traffic stopped at its source (although you can waste a lot of time trying) and if your web application is competently coded it shouldn't be vulnerable to these malicious requests anyways. So it's reporting that someone rattled the doorknobs (or tried to kick the door in); well, that happens all the time (ask any sysadmin with an exposed SSH port). It's still potentially useful to feed this information to a trend monitoring system, but 'HTTP request contains bad stuff' should not be an actual alert that goes to humans.

(However, if your web application is only exposed inside what is supposed to be a secured and controlled environment, bad traffic may well be an alert-worthy thing because it's something that's really never supposed to happen.)

A corollary to this is that web frameworks should not default to treating 'HTTP request contains bad stuff' as any sort of serious error that generates an alert. Serious errors are things like 'cannot connect to database' or 'I crashed'; 'HTTP request contains bad stuff' is merely a piece of information. Sadly there are frameworks that get this wrong. And yes, under normal circumstances a framework's defaults should be set for operation on the Internet, not in a captive internal network, because this is the safest and most conservative assumption (for a definition of 'safest' that is 'does not deluge people with pointless alerts').

(This implies that web frameworks should have a notion of different types or priorities of 'errors' and should differentiate what sort of things get what priorities. They should also document this stuff.)

by cks at January 24, 2015 05:16 AM

January 23, 2015


Is an Alert Review Time of Less than Five Hours Enough?

This week, FireEye released a report titled The Numbers Game: How Many Alerts are too Many to Handle? FireEye hired IDC to survey "over 500 large enterprises in North America, Latin America, Europe, and Asia" and asked director-level and higher IT security practitioners a variety of questions about how they manage alerts from security tools. In my opinion, the following graphic was the most interesting:

As you can see in the far right column, 75% of respondents report reviewing critical alerts in "less than 5 hours." I'm not sure if that is really "less than 6 hours," because the next value is "6-12 hours." In any case, is it sufficient for organizations to have this level of performance for critical alerts?

In my last large enterprise job, as director of incident response for General Electric, our CIO demanded 1 hour or less for critical alerts, from time of discovery to time of threat mitigation. This means we had to do more than review the alert; we had to review it and pass it to a business unit in time for them to do something to contain the affected asset.

The strategy behind this requirement was one of fast detection and response to limit the damage posed by an intrusion. (Sound familiar?)

Also, is it sufficient to have fast response for only critical alerts? My assessment is no. Alert-centric response, which I call "matching" in The Practice of Network Security Monitoring, is only part of the operational campaign model for a high-performing CIRT. The other part is hunting.

Furthermore, it is dangerous to rely on accurate judgement concerning alert rating. It's possible a low or moderate level alert is more important than a critical alert. Who classified the alert? Who wrote it? There are a lot of questions to be answered.

I'm in the process of doing research for my PhD in the war studies department at King's College London. I'm not sure if my data or research will be able to answer questions like this, but I plan to investigate it.

What do you think?

by Richard Bejtlich at January 23, 2015 07:45 PM

Try the Critical Stack Intel Client

You may have seen in my LinkedIn profile that I'm advising a security startup called Critical Stack. If you use Security Onion or run the Bro network security monitoring platform (NSM), you're ready to try the Critical Stack Intel Client.

Bro is not strictly an intrusion detection system that generates alerts, like Snort. Rather, Bro generates a range of NSM data, including session data, transaction data, extracted content data, statistical data, and even alerts -- if you want them.

Bro includes an intelligence framework that facilitates integrating various sources into Bro. These sources can include more than just IP addresses. This Bro blog post explains some of the options.

This Critical Stack Intel Client makes it easy to subscribe to over 30 threat feeds for the Bro intelligence framework. The screen capture below shows some of the feeds:

Visit the Critical Stack site and follow the wizard to get started. Basically, you begin by creating a Collection. A Collection is a container for the threat intelligence you want. Next you select the threat intelligence Feeds you want to populate your collection. Finally you create a Sensor, which is the system where you will deploy the threat intelligence Collection. When done you have an API key that your client will use to access the service.

I wrote a document explaining how to move beyond the wizard and test the client on a sensor running Bro -- either Bro by itself, or as part of the Security Onion NSM distro.

The output of the Critical Stack Intel Client will be new entries in an intel.log file, stored with other Bro logs.

If Bro is completely new to you, I discuss how to get started with it in my latest book The Practice of Network Security Monitoring.

Please take a look at this new free software and let me know what you think.

by Richard Bejtlich at January 23, 2015 07:01 AM

Chris Siebenmann

A problem with gnome-terminal in Fedora 21, and tracking it down

Today I discovered that Fedora 21 subtly broke some part of my environment to the extent that gnome-terminal refuses to start. More than that, it refuses to start with a completely obscure error message:

; gnome-terminal
Error constructing proxy for org.gnome.Terminal:/org/gnome/Terminal/Factory0: Error calling StartServiceByName for org.gnome.Terminal: GDBus.Error:org.freedesktop.DBus.Error.Spawn.ChildExited: Process org.gnome.Terminal exited with status 8

If you're here searching for the cause of this error message, let me translate it: what it really means is that your session's dbus-daemon could not start /usr/libexec/gnome-terminal-server when gnome-terminal asked it to. In many cases, it may be because your system's environment had not initialized $LC_CTYPE or $LANG to some UTF-8 locale at the time that your session was being set up (even if one of these environment variables gets set later, by the time you're running gnome-terminal). In the modern world, an increasing number of Gnome bits absolutely insist on being in a UTF-8 locale and fail hard if they aren't.

Some of you may be going 'what?' here. What you suspect is correct; the modern Gnome 3 'gnome-terminal' program is basically a cover script rather than an actual terminal emulator. Instead of opening up a terminal window itself, it exists to talk over DBus to a master gnome-terminal-server process (which will theoretically get started on demand). It is the g-t-s process that is the actual terminal emulator, creates the windows, starts the shells, and all. And yes, one process handles all of your gnome-terminal windows; if that process ever hits a bug (perhaps because of something happening in one window) and dies, all of them die. Let's hope g-t-s doesn't have any serious bugs.

To find the cause of this issue, well, if I'm being honest a bunch of this was found with an Internet search of the error message. This didn't turn up my exact problem but it did turn up people reporting locale problems and also a mention of gnome-terminal-server, which I hadn't known about before. For actual testing and verification I did several things:

  • first I used strace on gnome-terminal itself, which told me nothing useful.

  • I discovered that starting gnome-terminal-server by hand before running gnome-terminal made everything work.

  • I used dbus-monitor --session to watch DBus messages when I tried to start gnome-terminal. This didn't really tell me anything that I couldn't have seen from the error message, but it did verify that there was really a DBus message being sent.

  • I found the dbus-daemon process that was handling my session DBus and used 'strace -f -p ...' on it while I ran gnome-terminal. This eventually wound up with it starting gnome-terminal-server and g-t-s exiting after writing a message to standard error. Unfortunately the default strace settings truncated the message, so I reran strace while adding '-e write=2' to completely dump all messages to standard error. This got me the helpful error message from g-t-s:
    Non UTF-8 locale (ANSI_X3.4-1968) is not supported!

    (If you're wondering if dbus-daemon sends standard error from either itself or processes that it starts to somewhere useful, ha ha no, sorry, we're all out of luck. As far as I can tell it specifically sends standard error to /dev/null.)

  • I dumped the environment of the dbus-daemon process with 'tr '\0' '\n' </proc/<PID>/environ | less' and inspected what environment variables it had set. This showed that it had been started without my usual $LC_CTYPE setting (cf).

With this in hand I could manually reproduce the problem by trying to start gnome-terminal-server with $LC_CTYPE unset, and then I could fix up my X startup scripts to set $LC_CTYPE before they ran dbus-launch.

(This entry is already long enough so I am going to skip my usual rant about Gnome and especially Gnome 3 making problems like this very difficult for even experienced system administrators to debug because there are now so many opaque moving parts to even running Gnome programs standalone, much less in a full Gnome environment. How is anyone normal supposed to debug this when gnome-terminal can't even be bothered to give you a useful error summary in addition to the detailed error report from DBus?)

by cks at January 23, 2015 06:55 AM

January 22, 2015


Notes on Stewart Baker Podcast with David Sanger

Yesterday Steptoe and Johnson LLP released the 50th edition of their podcast series, titled Steptoe Cyberlaw Podcast - Interview with David Sanger. Stewart Baker's discussion with New York Times reporter David Sanger (pictured at left) begins at the 20:15 mark. The interview was prompted by the NYT story NSA Breached North Korean Networks Before Sony Attack, Officials Say. I took the following notes for those of you who would like some highlights.

Sanger has reported on the national security scene for decades. When he saw President Obama's definitive statement on December 19, 2014 -- "We can confirm that North Korea engaged in this attack [on Sony Pictures Entertainment]." -- Sanger knew the President must have had solid attribution. He wanted to determine what evidence had convinced the President that the DPRK was responsible for the Sony intrusion.

Sanger knew from his reporting on the Obama presidency, including his book Confront and Conceal: Obama's Secret Wars and Surprising Use of American Power, that the President takes a cautious approach to intelligence. Upon assuming his office, the President had little experience with intelligence or cyber issues (except for worries about privacy).

Obama had two primary concerns about intelligence, involving "leaps" and "leaks." First, he feared making "leaps" from intelligence to support policy actions, such as the invasion of Iraq. Second, he worried that leaks of intelligence could "create a groundswell for action that the President doesn't want to take." An example of this second concern is the (mis)handling of the "red line" on Syrian use of chemical weapons.

In early 2009, however, the President became deeply involved with Olympic Games, reported by Sanger as the overall program for the Stuxnet operation. Obama also increased the use of drones for targeted killing. These experiences helped the President overcome some of his concerns with intelligence, but he was still likely to demand proof before taking actions.

Sanger stated in the podcast that, in his opinion, "the only way" to have solid attribution is to be inside adversary systems before an attack, such that the intelligence community can see attacks in progress. In this case, evidence from inside DPRK systems and related infrastructure (outside North Korea) convinced the President.

(I disagree that this is "the only way," but I believe it is an excellent option for performing attribution. See my 2009 post Counterintelligence Options for Digital Security for more details.)

Sanger would not be surprised if we see more leaks about what the intelligence community observed. "There's too many reporters inside the system" to ignore what's happening, he said. The NYT talks with government officials "several times per month" to discuss reporting on sensitive issues. The NYT has a "presumption to publish" stance, although Sanger held back some details in his latest story that would have enabled the DPRK or others to identify "implants in specific systems."

Regarding the purpose of announcing attribution against the DPRK, Sanger stated that deterrence against the DPRK and other actors is one motivation. Sanger reported meeting with NSA director Admiral Mike Rogers, who said the United States needs a deterrence capability in cyberspace. More importantly, the President wanted to signal to the North Koreans that they had crossed a red line. This was a destructive attack, coupled with a threat of physical harm against movie goers. The DPRK has become comfortable using "cyber weapons" because they are more flexible than missiles or nuclear bombs. The President wanted the DPRK to learn that destructive cyber attacks would not be tolerated.

Sanger and Baker then debated the nature of deterrence, arms control, and norms. Sanger stated that it took 17 years after Hiroshima and Nagasaki before President Kennedy made a policy announcement about seeking nuclear arms control with the Soviet Union. Leading powers don't want arms control, until their advantage deteriorates. Once the Soviet Union's nuclear capability exceeded the comfort level of the United States, Kennedy pitched arms control as an option. Sanger believes the nuclear experience offers the right set of questions to ask about deterrence and arms control, although all the answers will be different. He also hopes the US moves faster on deterrence, arms control, and norms than shown by the nuclear case, because other actors (China, Russia, Iran, North Korea, etc.) are "catching up fast."

(Incidentally, Baker isn't a fan of deterrence in cyberspace. He stated that he sees deterrence through the experience of bombers in the 1920s and 1930s.)

According to Sanger, the US can't really discuss deterrence, arms control, and norms until it is willing to explain its offensive capabilities. The experience with drone strikes is illustrative, to a certain degree. However, to this day, no government official has confirmed Olympic Games.

I'd like to thank Stewart Baker for interviewing David Sanger, and I thank David Sanger for agreeing to be interviewed. I look forward to podcast 51, featuring my PhD advisor Dr Thomas Rid.

by Richard Bejtlich at January 22, 2015 09:43 AM

Chris Siebenmann

How to set up static networking with systemd-networkd, or at least how I did

I recently switched my Fedora 21 office workstation from Fedora's old /etc/init.d/network init script based method of network setup to using the (relatively new) systemd network setup functionality, for reasons that I covered yesterday. The systemd documentation is a little bit scant and not complete, so in the process I accumulated some notes that I'm going to write down.

First, I'm going to assume that you're having networkd take over everything from the ground up, possibly including giving your physical network devices stable names. If you were previously doing this through udev, you'll need to comment out bits of /etc/udev/rules.d/70-persistent-net.rules (or wherever your system put it).

To configure your networking you need to set up two files for each network connection. The first file will describe the underlying device, using .link files for physical devices and .netdev files for VLANs, bridges, and so on. For physical links, you can use various things to identify the device (I use just the MAC address, which matches what I was doing in udev) and then set its name with 'Name=' in the '[Link]' section. Just to make you a bit confused, the VLANs set up on a physical device are not configured in its .link file.
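As a sketch of what such a .link file looks like (the MAC address and interface name here are invented examples, not from my setup):

```ini
# /etc/systemd/network/em0.link -- filename is arbitrary; MAC and Name= are examples
[Match]
MACAddress=00:11:22:33:44:55

[Link]
Name=em0
```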

The second file describes the actual networking on the device (physical or virtual), including virtual devices associated with it; this is done with .network files. Again you can use various things to identify which device you want to operate on; I used the name of the device (a [Match] section with Name=<whatever>). Most of the setup will be done in the [Network] section, including telling networkd what VLANs to create. If you want IP aliases on a given interface, specify multiple addresses. Although it's not documented, experimentally the last address specified becomes the primary (default) address of the interface, ie the default source address for traffic going out that interface.

(This is unfortunately reversed from what I expected, which was that the first address specified would be the primary. Hopefully the systemd people will not change this behavior but document it, and then provide a way of specifying primary versus secondary addresses.)
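A sketch of a .network file along these lines (the interface name, VLAN name, and addresses are invented examples):

```ini
# /etc/systemd/network/em0.network -- example values throughout
[Match]
Name=em0

[Network]
VLAN=em0.151
# Multiple Address= lines give you IP aliases; the last one listed
# becomes the primary (default source) address, per the behavior above.
Address=192.0.2.51/24
Address=192.0.2.50/24
Gateway=192.0.2.1
```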

If you're setting up IP aliases for an interface, it's important to know that ifconfig will now be misleading. In the old approach, alias interfaces got created (eg 'em0:0') and showed the alias IP. In the networkd world those interfaces are not created and you need to turn to 'ip addr list' in order to see your IP aliases. Not knowing this can be very alarming, since in ifconfig it looks like your aliases disappeared. In general you can expect networkd to give you somewhat different ifconfig and ip output because it does stuff somewhat differently.

For setting up VLANs, the VLAN= name in your physical device's .network file is paired up with the [NetDev] Name= setting in your VLAN's .netdev file. You then create another .network file with a [Match] Name= setting of your VLAN's name to configure the VLAN interface's IP address and so on. Unfortunately this is a bit tedious, since your .netdev VLAN file basically exists to set a single value (the [VLAN] Id= setting); it would be more convenient (although less pure) if you could just put that information into a new [VLAN] section in the .network file that specified Name and Id together.
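As a concrete sketch of that pairing (interface name, VLAN id, and address are invented examples):

```ini
# /etc/systemd/network/em0.151.netdev -- defines the VLAN device itself;
# [NetDev] Name= here pairs with VLAN=em0.151 in the physical .network file
[NetDev]
Name=em0.151
Kind=vlan

[VLAN]
Id=151

# /etc/systemd/network/em0.151.network -- a separate file that configures
# IP on the VLAN interface, matched by name
[Match]
Name=em0.151

[Network]
Address=198.51.100.10/24
```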

If you're uniquely specifying physical devices in .link files (eg with a MAC address for all of them, with no wildcards) and devices in .network files, I believe that the filenames of all of these files are arbitrary. I chose to give my VLANs filenames of eg 'em0.151.netdev' (where em0.151 is the interface name) just in case. As you can see, there seems to be relatively little constraint on the interface names and I was able to match the names required by my old Fedora ifcfg-* setup so that I didn't have to change any of my scripts et al.

You don't need to define a lo interface; networkd will set one up automatically and do the right thing.

Once you have everything set up in /etc/systemd/network, you need to enable this by (in my case) 'chkconfig --del network; systemctl enable systemd-networkd' and then rebooting. If you have systemd .service units that want to wait for networking to be up, you also want to enable the systemd-networkd-wait-online.service unit, which does what it says in its manpage, and then make your units depend on it in the usual way. Note that this is not quite the same as setting your SysV init script ordering so that your init scripts came after network, since this service waits for at least one interface to be plugged in to something (unfortunately there's no option to override this). While systemd still creates the 'sys-subsystem-net-devices-<name>.device' pseudo-devices, they will now appear faster and with less configured than they did with the old init scripts.

(I used to wait for the appearance of the em0.151 device as a sign that the underlying em0 device had been fully configured with IP addresses attached and so on. This is no longer the case in the networkd world, so this hack broke on me.)
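The dependency wiring described above can be sketched as a drop-in for a unit that needs the network up (the service name is hypothetical):

```ini
# /etc/systemd/system/myservice.service.d/wait-online.conf (hypothetical unit)
[Unit]
Wants=systemd-networkd-wait-online.service
After=systemd-networkd-wait-online.service
```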

Another unfortunate thing: there's no syntax checker for networkd files, and it is somewhat hard to get warning messages. networkd will log complaints to the systemd journal, but it won't print them out on the console during boot or anything (at least not that I saw). However, I believe that you can start or restart it while the system is live and then see if things complain.

(Why yes I did make a mistake the first time around. It turns out that the Label= setting in the [Address] section of .network files is not for a description of what the address is and does not like 'labels' that have spaces or other funny games in them.)

On the whole, systemd-networkd doesn't cover all of the cases but then neither did Fedora ifcfg- files. I was able to transform all of my rather complex ifcfg- setup into networkd control files with relatively little effort and hassle and the result came very close to working the first time. My networkd config files have a few more lines than my ifcfg-* files, but on the other hand I feel that I fully understand my networkd files and will in the future even after my current exposure to them fades.

(My ifcfg-* files also contain a certain amount of black magic and superstition, which I'm happy to not be carrying forward, and at least some settings that turn out to be mistakes now that I've actually looked them up.)

by cks at January 22, 2015 05:43 AM

January 21, 2015

Chris Siebenmann

Why I'm switching to systemd's networkd stuff for my networking

Today I gave in to temptation and switched my Fedora 21 office workstation from doing networking through Fedora's old /etc/rc.d/init.d/network init script and its /etc/sysconfig/network-scripts/ifcfg-* system to using systemd-networkd. Before I write about what you have to set up to do this, I want to ramble a bit about why I even thought about it, much less went ahead.

The proximate cause is that I was hoping to get a faster system boot. At some point in the past few Fedora versions, bringing up my machine's networking through the network init script became the single slowest part of booting by a large margin, taking on the order of 20 to 30 seconds (and stalling a number of downstream startup jobs). I had no idea just what was taking so long, but I hoped that by switching to something else I could improve the situation.

The deeper cause is that Fedora's old network init script system is a serious mess. All of the work is done by a massive set of intricate shell scripts that use relatively undocumented environment variables set in ifcfg-* files (and the naming of the files themselves). Given the pile of scripts involved, it's absolutely no surprise to me that it takes forever to grind through processing all of my setup. In general the whole thing has all of the baroque charm of the evolved forms of System V init; the best thing I can say about it is that it generally works and you can build relatively sophisticated static setups with it.

(While there is some documentation for what variables can be set hiding in /usr/share/doc/initscripts/sysconfig.txt, it's not complete and for some things you get to decode the shell scripts yourself.)

What systemd's networkd stuff brings to the table for this is the same thing that systemd brings to the table relative to SysV init scripts: you have a well documented way of specifying what you want, which is then directly handled instead of being run through many, many layers of shell scripts. As an additional benefit it gets handled faster and perhaps better.

(I firmly believe that a mess of fragile shell scripts that source your ifcfg-* files and do magic things is not the right architecture. Robust handling of configuration files requires real parsing and so on, not shell script hackery. I don't really care who takes care of this (I would be just as happy with a completely separate system) and I will say straight up that systemd-networkd is not my favorite implementation of this idea and suffers from various flaws. But I like it more than the other options.)

In theory NetworkManager might fill this ecological niche already. In practice NetworkManager has never felt like something that was oriented towards my environment, instead feeling like it targeted machines and people who were going to do all of this through GUIs, and I've run into some issues with it. In particular I'm pretty sure that I'd struggle quite a bit to find documentation on how to set up a NM configuration (from the command line or in files) that duplicates my current network setup; with systemd, it was all in the manual pages. There is a serious (re)assurance value from seeing what you want to configure be clearly documented.

(My longer range reason for liking systemd's move here is that it may bring more uniformity to how you configure networking setups across various Linux flavours.)

by cks at January 21, 2015 07:09 AM

January 20, 2015

Anton Chuvakin - Security Warrior

Annual Blog Round-Up – 2014

Here is my annual "Security Warrior" blog round-up of top 10 popular posts/topics in 2014.
  1. “Why No Open Source SIEM, EVER?” contains some of my SIEM thinking from 2009. Is it relevant now? Well, you be the judge.  The current emergence of open source log search tools (ELK FTW!), BTW, does not break the logic of that post.
  2. “Simple Log Review Checklist Released!” is often at the top of this list – the checklist is still a very useful tool for many people. “On Free Log Management Tools” is a companion to the checklist (updated version).
  3. “New SIEM Whitepaper on Use Cases In-Depth OUT!” (dated 2010) presents a whitepaper on select SIEM use cases described in depth with rules and reports [using a now-defunct SIEM product]; also see this SIEM use case in depth and this for a more current list of popular SIEM use cases.
  4. My classic PCI DSS Log Review series is always hot! The series of 18 posts covers a comprehensive log review approach (OK for PCI DSS 3.0 in 2015 as well), useful for building log review processes and procedures, whether regulatory or not. It is also described in more detail in our Log Management book and mentioned in our PCI book (just out in its 4th edition)!
  5. “Top 10 Criteria for a SIEM?” came from one of the last projects I did when running my SIEM consulting firm in 2009-2011 (for my recent work on evaluating SIEM tools, see this document).
  6. “SANS Top 6 Log Reports Reborn!” highlights the re-release of the most popular log reports list.
  7. “SIEM Resourcing or How Much the Friggin’ Thing Would REALLY Cost Me?” is a quick framework for assessing the costs of a SIEM project (well, a program, really) at an organization (much more detail on this here in this paper).
  8. “My Best PCI DSS Presentation EVER!” is my conference presentation where I make a passionate claim that PCI DSS is actually useful for security (do read the PCI book as well)!
  9. “On Choosing SIEM” is about the least wrong way of choosing a SIEM tool – as well as about why the right way is so unpopular.
  10. “How to Write an OK SIEM RFP?” (from 2010) contains Anton’s least hated SIEM RFP writing tips (I don’t have favorite tips since I hate the RFP process).
Disclaimer: all this content was written before I joined Gartner on Aug 1, 2011 and is solely my personal view at the time of writing. For my current security blogging, go here.
Also see my past monthly and annual “Top Posts” – 2007, 2008, 2009, 2010, 2011, 2012, 2013.

by Anton Chuvakin at January 20, 2015 06:19 PM

The Tech Teapot

My 2014 Reading Log

A list of all of the books I read in 2014 and logged in Goodreads. I read a few more technical books but didn’t log them, for whatever reason.


Andrew Smith: Moondust: In Search Of The Men Who Fell To Earth (Non-Fiction)



William Golding: The Spire (Fiction)

Niall Ferguson: The Pity of War: Explaining World War 1 (Non-Fiction)


Simon Parkes: Live at the Brixton Academy: A riotous life in the music business (Non-Fiction)


Christopher Priest: Inverted World (Fiction)

Philip K. Dick: VALIS (Fiction)

Lee Campbell: Introduction to Rx (Non-Fiction)


Neil Gaiman: Neverwhere (Fiction)


Fred Hoyle: A for Andromeda (Fiction)


Mark Ellen: Rock Stars Stole My Life! (Non-Fiction)

John Crowley: The Deep (Fiction)

Neil Gaiman: The Ocean at the End of the Lane (Fiction)

Philip K. Dick: A Maze of Death (Fiction)


Paul Ham: Hiroshima Nagasaki: The Real Story of the Atomic Bombings and their Aftermath (Non-Fiction)


A total of 14 books, 8 fiction (mostly science fiction and fantasy) and 6 non-fiction. Of the two autobiographies, I found Mark Ellen’s Rock Stars Stole My Life! to be very good, taking the reader through the rock landscape of the 1960s to the present day through the eyes of a music journalist.

The book I most enjoyed was Neil Gaiman’s The Ocean at the End of the Lane. Neil Gaiman was certainly my find of 2014. He manages to write quite original fantasy in a way that doesn’t make the book feel like fantasy.

The post My 2014 Reading Log appeared first on Openxtra Tech Teapot.

by Jack Hughes at January 20, 2015 12:43 PM

System Administration Advent Calendar

Day 15 - Cook your own packages: Getting more out of fpm

Written by: Mathias Lafeldt (@mlafeldt)
Edited by: Joseph Kern (@josephkern)


When it comes to building packages, there is one particular tool that has grown in popularity over the last few years: fpm. fpm’s honorable goal is to make it as simple as possible to create native packages for multiple platforms, all without having to learn the intricacies of each distribution’s packaging format (.deb, .rpm, etc.) and tooling.

With a single command, fpm can build packages from a variety of sources including Ruby gems, Python modules, tarballs, and plain directories. Here’s a quick example showing you how to use the tool to create a Debian package of the AWS SDK for Ruby:

$ fpm -s gem -t deb aws-sdk
Created package {:path=>"rubygem-aws-sdk_1.59.0_all.deb"}

It is this simplicity that makes fpm so popular. Developers are able to easily distribute their software via platform-native packages. Businesses can manage their infrastructure on their own terms, independent of upstream vendors and their policies. All of this has been possible before, but never with this little effort.

In practice, however, things are often more complicated than the one-liner shown above. While it is absolutely possible to provision production systems with packages created by fpm, it will take some work to get there. The tool can only help you so far.

In this post we’ll take a look at several best practices covering: dependency resolution, reproducible builds, and infrastructure as code. All examples will be specific to Debian and Ruby, but the same lessons apply to other platforms/languages as well.

Resolving dependencies

Let’s get back to the AWS SDK package from the introduction. With a single command, fpm converts the aws-sdk Ruby gem to a Debian package named rubygem-aws-sdk. This is what happens when we actually try to install the package on a Debian system:

$ sudo dpkg --install rubygem-aws-sdk_1.59.0_all.deb
dpkg: dependency problems prevent configuration of rubygem-aws-sdk:
 rubygem-aws-sdk depends on rubygem-aws-sdk-v1 (= 1.59.0); however:
  Package rubygem-aws-sdk-v1 is not installed.

As we can see, our package can’t be installed due to a missing dependency (rubygem-aws-sdk-v1). Let’s take a closer look at the generated .deb file:

$ dpkg --info rubygem-aws-sdk_1.59.0_all.deb
 Package: rubygem-aws-sdk
 Version: 1.59.0
 License: Apache 2.0
 Vendor: Amazon Web Services
 Architecture: all
 Maintainer: <vagrant@wheezy-buildbox>
 Installed-Size: 5
 Depends: rubygem-aws-sdk-v1 (= 1.59.0)
 Provides: rubygem-aws-sdk
 Section: Languages/Development/Ruby
 Priority: extra
 Description: Version 1 of the AWS SDK for Ruby. Available as both `aws-sdk` and `aws-sdk-v1`.
  Use `aws-sdk-v1` if you want to load v1 and v2 of the Ruby SDK in the same

fpm did a great job at populating metadata fields such as package name, version, license, and description. It also made sure that the Depends field contains all required dependencies that have to be installed for our package to work properly. Here, there’s only one direct dependency – the one we’re missing.

While fpm goes to great lengths to provide proper dependency information – and this is not limited to Ruby gems – it does not automatically build those dependencies. That’s our job. We need to find a set of compatible dependencies and then tell fpm to build them for us.

Let’s build the missing rubygem-aws-sdk-v1 package with the exact version required and then observe the next dependency in the chain:

$ fpm -s gem -t deb -v 1.59.0 aws-sdk-v1
Created package {:path=>"rubygem-aws-sdk-v1_1.59.0_all.deb"}

$ dpkg --info rubygem-aws-sdk-v1_1.59.0_all.deb | grep Depends
 Depends: rubygem-nokogiri (>= 1.4.4), rubygem-json (>= 1.4), rubygem-json (<< 2.0)

Two more packages to take care of: rubygem-nokogiri and rubygem-json. By now, it should be clear that resolving package dependencies like this is no fun. There must be a better way.

In the Ruby world, Bundler is the tool of choice for managing and resolving gem dependencies. So let’s ask Bundler for the dependencies we need. For this, we create a Gemfile with the following content:

# Gemfile
source ""
gem "aws-sdk", "= 1.59.0"
gem "nokogiri", "~> 1.5.0" # use older version of Nokogiri

We then instruct Bundler to resolve all dependencies and store the resulting .gem files into a local folder:

$ bundle package
Updating files in vendor/cache
  * json-1.8.1.gem
  * nokogiri-1.5.11.gem
  * aws-sdk-v1-1.59.0.gem
  * aws-sdk-1.59.0.gem

We specifically asked Bundler to create .gem files because fpm can convert them into Debian packages in a matter of seconds:

$ find vendor/cache -name '*.gem' | xargs -n1 fpm -s gem -t deb
Created package {:path=>"rubygem-aws-sdk-v1_1.59.0_all.deb"}
Created package {:path=>"rubygem-aws-sdk_1.59.0_all.deb"}
Created package {:path=>"rubygem-json_1.8.1_amd64.deb"}
Created package {:path=>"rubygem-nokogiri_1.5.11_amd64.deb"}

As a final test, let’s install those packages…

$ sudo dpkg -i *.deb
Setting up rubygem-json (1.8.1) ...
Setting up rubygem-nokogiri (1.5.11) ...
Setting up rubygem-aws-sdk-v1 (1.59.0) ...
Setting up rubygem-aws-sdk (1.59.0) ...

…and verify that the AWS SDK actually can be used by Ruby:

$ ruby -e "require 'aws-sdk'; puts AWS::VERSION"


The purpose of this little exercise was to demonstrate one effective approach to resolving package dependencies for fpm. By using Bundler – the best tool for the job – we get fine control over all dependencies, including transitive ones (like Nokogiri, see Gemfile). Other languages provide similar dependency tools. We should make use of language specific tools whenever we can.

Build infrastructure

After learning how to build all packages that make up a piece of software, let’s consider how to integrate fpm into our build infrastructure. These days, with the rise of the DevOps movement, many teams have started to manage their own infrastructure. Even though each team is likely to have unique requirements, it still makes sense to share a company-wide build infrastructure, as opposed to reinventing the wheel each time someone wants to automate packaging.

Packaging is often only a small step in a longer series of build steps. In many cases, we first have to build the software itself. While fpm supports multiple source formats, it doesn’t know how to build the source code or determine dependencies required by the package. Again, that’s our job.

Creating a consistent build and release process for different projects across multiple teams is hard. Fortunately, there’s another tool that does most of the work for us: fpm-cookery. fpm-cookery sits on top of fpm and provides the missing pieces to create a reusable build infrastructure. Inspired by projects like Homebrew, fpm-cookery builds packages based on simple recipes written in Ruby.

Let’s turn our attention back to the AWS SDK. Remember how we initially converted the gem to a Debian package? As a warm up, let’s do the same with fpm-cookery. First, we have to create a recipe.rb file:

# recipe.rb
class AwsSdkGem < FPM::Cookery::RubyGemRecipe
  name    "aws-sdk"
  version "1.59.0"
end

Next, we pass the recipe to fpm-cook, the command-line tool that comes with fpm-cookery, and let it build the package for us:

$ fpm-cook package recipe.rb
===> Starting package creation for aws-sdk-1.59.0 (debian, deb)
===> Verifying build_depends and depends with Puppet
===> All build_depends and depends packages installed
===> [FPM] Trying to download {"gem":"aws-sdk","version":"1.59.0"}
===> Created package: /home/vagrant/pkg/rubygem-aws-sdk_1.59.0_all.deb

To complete the exercise, we also need to write a recipe for each remaining gem dependency. This is what the final recipes look like:

# recipe.rb
class AwsSdkGem < FPM::Cookery::RubyGemRecipe
  name       "aws-sdk"
  version    "1.59.0"
  maintainer "Mathias Lafeldt <>"

  chain_package true
  chain_recipes ["aws-sdk-v1", "json", "nokogiri"]
end

# aws-sdk-v1.rb
class AwsSdkV1Gem < FPM::Cookery::RubyGemRecipe
  name       "aws-sdk-v1"
  version    "1.59.0"
  maintainer "Mathias Lafeldt <>"
end

# json.rb
class JsonGem < FPM::Cookery::RubyGemRecipe
  name       "json"
  version    "1.8.1"
  maintainer "Mathias Lafeldt <>"
end

# nokogiri.rb
class NokogiriGem < FPM::Cookery::RubyGemRecipe
  name       "nokogiri"
  version    "1.5.11"
  maintainer "Mathias Lafeldt <>"

  build_depends ["libxml2-dev", "libxslt1-dev"]
  depends       ["libxml2", "libxslt1.1"]
end

Running fpm-cook again will produce Debian packages that can be added to an APT repository and are ready for use in production.

Three things worth highlighting:

  • fpm-cookery is able to build multiple dependent packages in a row (configured by chain_* attributes), allowing us to build everything with a single invocation of fpm-cook.
  • We can use the attributes build_depends and depends to specify a package’s build and runtime dependencies. When running fpm-cook as root, the tool will automatically install missing dependencies for us.
  • I deliberately set the maintainer attribute in all recipes. It’s important to take responsibility for the work that we do. We should make it as easy as possible for others to identify the person or team responsible for a package.

fpm-cookery provides many more attributes to configure all aspects of the build process. Among other things, it can download source code from GitHub before running custom build instructions (e.g. make install). The fpm-recipes repository is an excellent place to study some working examples. This final example, a recipe for chruby, is a foretaste of what fpm-cookery can actually do:

# recipe.rb
class Chruby < FPM::Cookery::Recipe
  description "Changes the current Ruby"

  name     "chruby"
  version  "0.3.8"
  homepage ""
  source   "{version}.tar.gz"
  sha256   "d980872cf2cd047bc9dba78c4b72684c046e246c0fca5ea6509cae7b1ada63be"

  maintainer "Jan Brauer <>"

  section "development"

  config_files "/etc/profile.d/"

  def build
    # nothing to do here
  end

  def install
    make :install, "PREFIX" => prefix
    etc("profile.d").install workdir("")
  end
end


Wrapping up

fpm has changed the way we build packages. We can get even more out of fpm by using it in combination with other tools. Dedicated programs like Bundler can help us with resolving package dependencies, which is something fpm won’t do for us. fpm-cookery adds another missing piece: it allows us to describe our packages using simple recipes, which can be kept under version control, giving us the benefits of infrastructure as code: repeatability, automation, rollbacks, code reviews, etc.

Last but not least, it’s a good idea to pair fpm-cookery with Docker or Vagrant for fast, isolated package builds. This, however, is outside the scope of this article and left as an exercise for the reader.

Further reading

by Christopher Webber at January 20, 2015 08:07 AM

Chris Siebenmann

A gotcha with Python tuples

Here's a little somewhat subtle Python syntax issue that I recently got to relearn (or be reminded of) by stubbing my toe on it. Let's start with an example, taken from our Django configuration:

# Tuple of directories to find templates
TEMPLATE_DIRS = (
    "/some/template/dir"
)

This looks good (and used to be accepted by Django), but it's wrong. I'm being tripped up by the critical difference in Python between '(A)' and '(A,)'. While I intended to define a one-element tuple, what I've actually done is set TEMPLATE_DIRS to a single string, which I happened to write in parentheses for no good reason (as far as the Python language is concerned, at least). This is still the case even though I've split the parenthesized expression over three lines; Python doesn't care about how many lines I use (or even how I indent them).

(Although it is not defined explicitly in the language reference, which is not a specification, this behavior is embedded in CPython; CPython silently ignores almost all newlines and whitespace inside ('s, ['s, and {'s.)
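A quick check makes the difference concrete; this is a generic sketch (the string value is just a stand-in):

```python
# A parenthesized string is still just a string; the trailing comma is
# what actually makes a one-element tuple, even split over several lines.
just_a_string = (
    "/some/template/dir"
)
real_tuple = (
    "/some/template/dir",
)

assert isinstance(just_a_string, str)   # not a tuple!
assert isinstance(real_tuple, tuple)
assert len(real_tuple) == 1
```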

I used to be very conscious of this difference and very careful about putting a , at the end of my single-element tuples. I think I got into the habit of doing so when I at least thought that the % string formatting operation only took a tuple and would die if given a single element. At some point % started accepting bare single elements (or at least I noticed it did) and after that I got increasingly casual about "..." % (a,) versus "..." % (a) (which I soon changed to "..." % a, of course). Somewhere along the way the reflexive add-a-comma behavior fell out of my habits and, well, I wound up writing the example above.

(And Django accepted it for years, probably because any number of people wrote it like I did so why not be a bit friendly and magically assume things. Note that I don't blame Django for tightening up their rules here; it's probably a good idea as well as being clearly correct. Django already has enough intrinsic magic without adding more.)

As a side note, I think Python really has to do things this way. Given that () is used for two purposes, '(A)' for a plain A value is at least ambiguous. Adopting a heuristic that people really wanted a single element tuple instead of a uselessly parenthesized expression strikes me as too much magic for a predictable language, especially when you can force the tuple behavior with a ','.

by cks at January 20, 2015 04:20 AM

January 19, 2015

Chris Siebenmann

Why user-hostile policies are a bad thing and a mistake

One reasonable reaction to limited email retention policies being user-hostile is to say basically 'so what'. It's not really nice that policies make work for users, but sometimes that's just life; people will cope. I feel that this view is a mistake.

The problem with user-hostile policies is that users will circumvent them. Generously, let's assume that you enacted this policy to achieve some goal (not just to say that you have a policy and perhaps point to a technical implementation as proof of it). What you really want is not for the policy to be adhered to but to achieve your goal; the policy is just a tool in getting to the goal. If you enact a policy and then your users do things that defeat the goals of the policy, you have not actually achieved your overall goal. Instead you've made work, created resentment, and may have deluded yourself into thinking that your goal has actually been achieved because after all the policy has been applied.

(Clearly you won't have inconvenient old emails turn up because you're deleting all email after sixty days, right?)

In extreme cases, a user-hostile policy can actually move you further away from your goal. If your goal is 'minimal email retention', a policy that winds up causing users to automatically archive all emails locally because that's the most convenient way to handle things is actually moving you backwards. You were probably better off letting people keep as much email on the server as they wanted, because at least they were likely to delete some of it.

By the way, I happen to think that threatening punishment to people who take actions that go against the spirit or even the letter of your policy is generally not an effective thing from a business perspective in most environments, but that's another entry.

(As for policies for the sake of having policies, well, I would be really dubious of the idea that saying 'we have an email deletion policy so there's only X days of email on the mail server' will do you much good against either attackers or legal requests. To put it one way, do you think the police would accept that answer if they thought you had incriminating email and might have saved it somewhere?)

by cks at January 19, 2015 05:23 AM

January 18, 2015


AWS Cloudformation: Defining your stack

AWS Cloudformation is a service that allows you to define your infrastructure on AWS in a template. The template is just a JSON file describing all the resources you want to create in your infrastructure. This is really useful for keeping track of all your infrastructure changes under the version control system of your choice, rolling back changes, and replicating your environment elsewhere in a matter of minutes.

When you define a template, you are defining a stack: a set of logical resources needed to provide a service. For example, imagine a typical architecture for a web application, composed basically of a web layer and a database layer. Depending on the size of the project we'll need more than one web server to serve content to clients, so we'll need a load balancer to distribute the load across the web servers. The web server layer can be placed in an auto scaling group to scale the number of servers up or down depending on the load on our web servers. So far, our basic web stack is defined by:

– Web server instances.
– Database server instance.
– Auto scaling group for web servers.
– Load balancer.


So, based on the web application example, CloudFormation lets us define all these resources in a JSON file as a stack, and CloudFormation takes care of creating all the resources for us automatically. After the stack is created you can add, update or delete resources by modifying the template and updating the stack; it's also possible to protect resources that are critical for our service from being modified or deleted by creating a stack policy. Now let's look at the basic anatomy of a CloudFormation template:

{
  "AWSTemplateFormatVersion" : "version date",

  "Description" : "JSON string",

  "Parameters" : {
    set of parameters
  },

  "Mappings" : {
    set of mappings
  },

  "Conditions" : {
    set of conditions
  },

  "Resources" : {
    set of resources
  },

  "Outputs" : {
    set of outputs
  }
}
AWSTemplateFormatVersion: The CloudFormation template format used; most of the time this is the latest one, “2010-09-09”.
Description: A short explanation of our stack and its resources.
Parameters: The values passed to our resources at stack creation time, for example the administrator user and password for the database instance, the number of initial instances to launch, an elastic IP to associate with an EC2 instance, etc.
Mappings: A kind of lookup table where you can store key:value definitions and retrieve the value with the intrinsic function Fn::FindInMap. This is useful, for example, when we need to launch EC2 instances with different AMIs depending on the region the stack is created in.
Conditions: Statements to conditionally create or associate resources in our stack. For example, imagine a single stack definition for both a test and a production environment that conditionally creates t2.small EC2 instances for the testing environment and m3.xlarge instances for production.
Resources: The central part of our template; here we define all the resources of the stack, such as S3 buckets, EC2 instances, load balancers, etc.
Outputs: Values returned by the different resources created, for example the URL of an S3 bucket, the DNS record of a load balancer, the elastic IP address associated with an EC2 instance, etc.
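As a cross-check of the anatomy above, here is a minimal skeleton assembled in Python and serialized to JSON. All the values are placeholders of my own (the AMI id, parameter and resource names are invented), not a working stack:

```python
import json

# Skeleton with the sections described above; of these, only Resources
# is strictly required by CloudFormation.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Minimal skeleton showing the template anatomy",
    "Parameters": {
        "InstanceType": {"Type": "String", "Default": "t2.small"}
    },
    "Mappings": {
        "RegionAMI": {"us-east-1": {"ami": "ami-00000000"}}
    },
    "Conditions": {},
    "Resources": {
        "LogBucket": {"Type": "AWS::S3::Bucket"}
    },
    "Outputs": {
        "BucketName": {"Value": {"Ref": "LogBucket"}}
    }
}

body = json.dumps(template, indent=2)
print(sorted(template))
```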

CloudFormation is well documented, so understanding the anatomy of a template and following the documentation and some examples is enough to start working with it. I'll leave a link to my GitHub repository, where I've defined a small stack for my environment:

Basically this stack creates an EC2 instance with an elastic IP associated, an RDS database instance, two S3 buckets with lifecycle rules to store backups and logs, a couple of security groups associated with the web server instance and the RDS instance, and an IAM role with policies granting the new EC2 instance access to EC2, the S3 buckets and CloudFormation resources. Let's look at the different resources defined in this stack in a bit more detail:

AWS::EC2::SecurityGroup: Creates two security groups: one for the EC2 instance, giving access to the HTTP, HTTPS and SSH services, and one for the RDS instance, giving the entire internal subnet access to port 3306. Note how parameters are referenced with the intrinsic function {"Ref" : "VpcId"}, which associates the security groups with the VPC id passed in.

  "WebServerSecurityGroup" : {
      "Type" : "AWS::EC2::SecurityGroup",
      "Properties" : {
        "GroupDescription" : "Enable HTTP, HTTPS and SSH access",
        "VpcId" : {"Ref" : "VpcId"},
        "SecurityGroupIngress" : [
          {"IpProtocol" : "tcp", "FromPort" : "80", "ToPort" : "80", "CidrIp" : ""},
          {"IpProtocol" : "tcp", "FromPort" : "443", "ToPort" : "443", "CidrIp" : ""},
          {"IpProtocol" : "tcp", "FromPort" : "22", "ToPort" : "22", "CidrIp" : ""}
        ]
      }
    },
    "DBEC2SecurityGroup": {
      "Type": "AWS::EC2::SecurityGroup",
      "Properties" : {
        "GroupDescription" : "Frontend Access",
        "VpcId"            : {"Ref" : "VpcId"},
        "SecurityGroupIngress" : [{
          "IpProtocol" : "tcp",
          "FromPort"   : { "Ref" : "DBPort" },
          "ToPort"     : { "Ref" : "DBPort" },
          "CidrIp"     : ""
        }]
      }
    },
AWS::S3::Bucket: Defines the two buckets for logs and backups. The DeletionPolicy ensures that if the stack is deleted, the S3 buckets are preserved. AccessControl defines the ACL for the bucket; in this case both are private. LifecycleConfiguration lets you attach a lifecycle policy to the bucket; here both buckets remove files older than 15 or 30 days, but you could also, for example, archive files to AWS Glacier.

  "S3BackupBucket" : {
      "Type" : "AWS::S3::Bucket",
      "DeletionPolicy" : "Retain",
      "Properties" : {
        "AccessControl" : "Private",
        "BucketName" : "opentodo-backups",
        "LifecycleConfiguration" : {
          "Rules" : [ {
            "ExpirationInDays" : 15,
            "Status" : "Enabled"
          } ]
        }
      }
    },
    "S3LogBucket" : {
      "Type" : "AWS::S3::Bucket",
      "DeletionPolicy" : "Retain",
      "Properties" : {
        "AccessControl" : "Private",
        "BucketName" : "opentodo-logs",
        "LifecycleConfiguration" : {
          "Rules" : [ {
            "ExpirationInDays" : 30,
            "Status" : "Enabled"
          } ]
        }
      }
    },

AWS::IAM::Role: Lets the instance make API requests to AWS services without using access and secret keys, by means of Temporary Security Credentials. This role defines policies giving access to the backup and log S3 buckets, to EC2, and to CloudFormation resources.

  "WebServerRole": {
      "Type": "AWS::IAM::Role",
      "Properties": {
        "AssumeRolePolicyDocument": {
          "Version" : "2012-10-17",
          "Statement": [ {
            "Effect": "Allow",
            "Principal": {
              "Service": [ "" ]
            },
            "Action": [ "sts:AssumeRole" ]
          } ]
        },
        "Path": "/",
        "Policies": [
          { "PolicyName": "EC2Access",
            "PolicyDocument": {
              "Version" : "2012-10-17",
              "Statement": [ {
                "Effect": "Allow",
                "Action": ["ec2:*","autoscaling:*"],
                "Resource": "*"
              } ]
            }
          },
          { "PolicyName": "S3Access",
            "PolicyDocument": {
              "Version" : "2012-10-17",
              "Statement": [ {
                "Effect": "Allow",
                "Action": "s3:*",
                "Resource": ["arn:aws:s3:::opentodo-backups","arn:aws:s3:::opentodo-backups/*","arn:aws:s3:::opentodo-logs","arn:aws:s3:::opentodo-logs/*"]
              } ]
            }
          },
          { "PolicyName": "CfnAccess",
            "PolicyDocument": {
              "Version" : "2012-10-17",
              "Statement": [ {
                "Effect": "Allow",
                "Action": ["cloudformation:DescribeStackResource"],
                "Resource": "*"
              } ]
            }
          }
        ]
      }
    },

AWS::IAM::InstanceProfile: References the IAM role. This is just a container for the role, and it is what lets us assign the role to an EC2 instance.

  "WebServerInstanceProfile": {
      "Type": "AWS::IAM::InstanceProfile",
      "Properties": {
        "Path": "/",
        "Roles": [ {
          "Ref": "WebServerRole"
        } ]
      }
    },

AWS::EC2::Instance: Creates the EC2 instance from AMI id ami-df1696a8 in subnet id subnet-7d59d518, assigns it the instance profile defined above, and uses the instance size and key pair passed as parameters. The UserData property lets you provide a script to run during the startup process. The commands passed in user-data are run by the cloud-init service, which is included in the public AMIs provided by AWS. The user-data here installs the python-setuptools package and then the CloudFormation helper scripts, a set of Python scripts that install packages, run commands, create files and start services as part of the CloudFormation stack on EC2 instances. The cfn-init command fetches the CloudFormation metadata to find out which tasks the instance has to run (which is why we included the cloudformation:DescribeStackResource policy in the IAM role earlier). The metadata is defined under the AWS::CloudFormation::Init key, which basically installs some packages, including the awscli tool, and creates a couple of files: /root/.my.cnf for access to the RDS instance, filled in with the attributes obtained after the RDS instance is created, and /etc/bash_completion.d/awscli for awscli tab completion. The cfn-signal command in the user-data reports whether the EC2 instance was successfully created or updated; it is handled by the CreationPolicy attribute, which waits for cfn-init to finish, with a timeout of 5 minutes.

  "WebServerEc2Instance" : {
      "Type" : "AWS::EC2::Instance",
      "Metadata" : {
        "AWS::CloudFormation::Init" : {
          "config" : {
            "packages" : {
              "apt" : {
                "nginx" : [],
                "php5-fpm" : [],
                "git" : [],
                "etckeeper" : [],
                "fail2ban" : [],
                "mysql-client" : []
              },
              "python" : {
                "awscli" : []
              }
            },
            "files" : {
              "/root/.my.cnf" : {
                "content" : { "Fn::Join" : ["", [
                  "user=", { "Ref" : "DBUser" }, "\n",
                  "password=", { "Ref" : "DBPassword" }, "\n",
                  "host=", { "Fn::GetAtt" : [ "DBInstance", "Endpoint.Address" ] }, "\n",
                  "port=", { "Fn::GetAtt" : [ "DBInstance", "Endpoint.Port" ] }, "\n"
                ] ] },
                "mode"  : "000600",
                "owner" : "root",
                "group" : "root"
              },
              "/etc/bash_completion.d/awscli" : {
                "content" : { "Fn::Join" : ["", [
                  "complete -C aws_completer aws\n"
                ] ] },
                "mode"  : "000644",
                "owner" : "root",
                "group" : "root"
              }
            }
          }
        }
      },
      "Properties" : {
        "ImageId" : "ami-df1696a8",
        "InstanceType"   : { "Ref" : "InstanceType" },
        "SecurityGroupIds" : [ {"Ref" : "WebServerSecurityGroup"} ],
        "KeyName"        : { "Ref" : "KeyPair" },
        "IamInstanceProfile" : { "Ref" : "WebServerInstanceProfile" },
        "SubnetId" : "subnet-7d59d518",
        "UserData": {
          "Fn::Base64": {
            "Fn::Join": [ "", [
              "#!/bin/bash\n",
              "aptitude update\n",
              "aptitude -y install python-setuptools\n",
              "# Install the files and packages from the metadata\n",
              "cfn-init --stack ", { "Ref" : "AWS::StackName" }," --resource WebServerEc2Instance --region ", { "Ref" : "AWS::Region" }, "\n",
              "# Signal the status from cfn-init\n",
              "cfn-signal -e $? ","--stack ", { "Ref" : "AWS::StackName" }," --resource WebServerEc2Instance --region ", { "Ref" : "AWS::Region" }, "\n"
            ] ]
          }
        }
      },
      "CreationPolicy" : {
        "ResourceSignal" : {
          "Timeout" : "PT5M"
        }
      }
    },

AWS::EC2::EIPAssociation: Associates the elastic IP passed as a parameter with the EC2 instance. The elastic IP must have been allocated in AWS beforehand.

  "EIPAssociation" : {
      "Type" : "AWS::EC2::EIPAssociation",
      "Properties" : {
        "InstanceId" : {"Ref" : "WebServerEc2Instance"},
        "EIP" : {"Ref" : "ElasticIP"}
      }
    },

AWS::RDS::DBSubnetGroup: Creates a DB subnet group from the given subnet ids; the RDS instance will be set up in these subnets.

  "DBSubnetGroup" : {
      "Type" : "AWS::RDS::DBSubnetGroup",
      "Properties" : {
        "DBSubnetGroupDescription" : "WebServer DB subnet group",
        "SubnetIds" : [ "subnet-058c0560", "subnet-2072c457" ]
      }
    },

AWS::RDS::DBInstance: Creates the RDS instance in the subnet group created above, with some properties passed as parameters.

  "DBInstance" : {
      "Type": "AWS::RDS::DBInstance",
      "Properties": {
        "DBInstanceIdentifier" : "WebServerRDS",
        "Engine"            : "MySQL",
        "MultiAZ"           : { "Ref": "MultiAZDatabase" },
        "MasterUsername"    : { "Ref" : "DBUser" },
        "MasterUserPassword": { "Ref" : "DBPassword" },
        "DBInstanceClass"   : { "Ref" : "DBClass" },
        "AllocatedStorage"  : { "Ref" : "DBAllocatedStorage" },
        "DBSubnetGroupName" : { "Ref" : "DBSubnetGroup" },
        "Port"              : { "Ref" : "DBPort" },
        "StorageType" : "gp2",
        "AutoMinorVersionUpgrade" : "true",
        "BackupRetentionPeriod" : 5,
        "PreferredBackupWindow" : "02:30-03:30",
        "PreferredMaintenanceWindow" : "sun:04:30-sun:05:30",
        "VPCSecurityGroups": [ { "Fn::GetAtt": [ "DBEC2SecurityGroup", "GroupId" ] } ]
      }
    }

As I said, AWS documentation is very good, so everything you need can be found in the doc pages, along with very useful examples:

Github repository with the full template:

by ivanmp91 at January 18, 2015 08:11 PM

Chris Siebenmann

Limited retention policies for email are user-hostile

I periodically see security people argue for policies and technology to limit the retention of email and other files, ie to enact policies like 'all email older than X days is automatically deleted for you'. Usually the reasons given are that this limits the damage done in a compromise (for example), as attackers can't copy things that have already been deleted. The problem with this is that limited retention periods are clearly user hostile.

The theory of limited retention policies is that people will manually save the small amount of email that they really need past the retention period. The reality is that many people can't pick out in advance all of the email that will be needed later or that will turn out to be important. This is a lesson I've learned over and over myself; many times I've fished email out of my brute force archive that I'd otherwise deleted because I had no idea I'd want it later. The inevitable result is that either people don't save email and then wind up wanting it or they over-save (up to and including 'everything') just in case.

Beyond that, such policies clearly force make-work on people in order to deal with them. Unless you adopt an 'archive everything' policy that you can automate, you're going to spend some amount of your time trying to sort out which email you need to save and then saving it off somewhere before it expires. This is time that you're not doing your actual job and taking care of your actual work. It would clearly be much less work to keep everything sitting around and not have to worry that some of your email will be vanishing out from underneath you.

The result is that a limited retention policy is a classical 'bad' security policy in most environments. It's a policy that wouldn't exist without security (or legal) concerns, it makes people's lives harder, and it actively invites people to get around it (in fact you're supposed to get around it to some extent, just not too much).

(I can think of less user hostile ways to deal with the underlying problem, but what you should do depends on what you think the problem is.)

by cks at January 18, 2015 08:16 AM

OpenVZ/Proxmox - pre-backup all container dump script

This simple script creates a vzdump of all the OpenVZ containers on a machine. It can be used before an actual backup; in my case the actual backup excludes the container path /var/lib/vz/private. This is because a dump is easier to back up, since it contains far fewer files.

January 18, 2015 12:00 AM

January 17, 2015

2014 Toolsmith Tool of the Year: SimpleRisk

Congratulations to Josh Sokol of SimpleRisk, LLC.
SimpleRisk is the 2014 Toolsmith Tool of the Year.
We mustered 933 total votes this year of which 438 went to SimpleRisk.
In Josh's own words, "I began writing SimpleRisk because I needed a tool to aide in my risk management activities and spreadsheets just weren't cutting it. But once I had a POC created, I knew that it was too good to keep to myself. I've always wanted to give back to the security community that has given so much to me. That's why I decided to release SimpleRisk under a Mozilla Public License 2.0. I hope it's as useful to you as it is for me."
Voters agree, SimpleRisk is definitely useful. :-)
Here's how the votes broke down.

Congratulations to all toolsmith entries and participants this year, and in particular to runners up Artillery from Dave Kennedy and Binary Defense Systems and ThreadFix from the Denim Group.
2015 promises us another great year of tools for information security practitioners and as always, if there are tools you'd like me to cover in toolsmith, please feel free to submit your favorites for consideration.

by Russ McRee at January 17, 2015 10:20 PM

Security Monkey

Advice to Students Curious About Entering Forensics Field

I responded to a Redditor in /r/computerforensics who posed the following question:


Hi /r/computerforensics, I'm a junior in HS interested in this career path. Mind answering a few questions?


I'll try to make this short and sweet. I've looked at a few similar previous threads that answered a few of my ques

January 17, 2015 11:37 AM

Chris Siebenmann

Node.js is not for me (and why)

I've been aware of and occasionally poking at node.js for a fairly long time now, and periodically I've considered writing something in it; I also follow a number of people on Twitter who are deeply involved with and passionate about node.js and the whole non-browser Javascript community. But I've never actually done anything with node.js and more or less ever since I got on Twitter and started following those node enthusiasts I've been feeling increasingly like I never would. Recently all of this has coalesced and now I think I can write down why node is not for me.

(These days there is also io.js, which is a compatible fork split off from node.js for reasons both technical and political.)

Node is fast server-side JavaScript in an asynchronous event based environment that uses callbacks for most event handling; a highly vibrant community and package ecosystem has coalesced around it. It's probably the fastest dynamic language you can run on servers.

My disengagement with node is because none of those appeal to me at all. While I accept that JavaScript is an okay language it doesn't appeal to me and I have no urge to write code in it, however fast it might be on the server once everything has started. As for the rest, I think that asynchronous event-based programming that requires widespread use of callbacks is actively the wrong programming model for dealing with concurrency, as it forces more or less explicit complexity on the programmer instead of handling it for you. A model of concurrency like Go's channels and coroutines is much easier to write code for, at least for me, and is certainly less irritating (even though the channel model has limits).

(I also think that a model with explicit concurrency is going to scale to a multi-core environment much better. If you promise 'this is pure async, two things never happen at once' you're now committed to a single thread of control model, and that means only using a single core unless your language environment can determine that two chunks of code don't interact with each other and so can't tell if they're running at the same time.)
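To make the difference in programming models concrete, here's a sketch in Python rather than JavaScript: the same three-step pipeline written first with explicit callbacks and then with coroutines (asyncio standing in for node's event loop):

```python
import asyncio

# Callback style: the continuation of the work is passed explicitly,
# so a three-step sequence becomes three nested functions.
def fetch_cb(value, callback):
    callback(value + 1)

def run_callbacks(start, done):
    fetch_cb(start, lambda a:
             fetch_cb(a, lambda b:
                      fetch_cb(b, done)))

# Coroutine style: the same sequence reads top to bottom; the runtime
# handles suspension and resumption for you.
async def fetch(value):
    await asyncio.sleep(0)   # stand-in for real async I/O
    return value + 1

async def run_coroutines(start):
    a = await fetch(start)
    b = await fetch(a)
    c = await fetch(b)
    return c

results = []
run_callbacks(0, results.append)
results.append(asyncio.run(run_coroutines(0)))
print(results)  # [3, 3]
```

Both produce the same result; the callback version nests one level deeper per step, while the coroutine version stays flat.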

As for the package availability, well, it's basically irrelevant given the lack of the appeal of the core. You'd need a really amazingly compelling package to get me to use a programming environment that doesn't appeal to me.

Now that I've realized all of this I'm going to do my best to let go of any lingering semi-guilty feelings that I should pay attention to node and maybe play around with it and so on, just because it's such a big presence in the language ecosystem at the moment (and because people whose opinions I respect love it). The world is a big place and we don't have to all agree with each other, even about programming things.

PS: None of this means that node.js is bad. Lots of people like JavaScript (or at least have a neutral 'just another language' attitude) and I understand that there are programming models for node.js that somewhat tame the tangle of event callbacks and so on. As mentioned, it's just not for me.

by cks at January 17, 2015 04:07 AM

Filtering IMAP mail with imapfilter

I have several email accounts at different providers. Most of them don't offer filtering capabilities like Sieve, or offer only their own non-exportable rule system (Google Apps). My mail client of choice, Thunderbird, has filtering capabilities but my phone does not, and I don't want to leave my machine running Thunderbird all the time since it gets quite slow with huge mailboxes. Imapfilter is a mail filtering utility written in Lua which connects to one or more IMAP accounts and filters on the server using IMAP queries. It is a lightweight command line utility, its configuration is simple text that can be versioned, and it is very fast.

January 17, 2015 12:00 AM

January 16, 2015

The Lone Sysadmin

How to Replace an SD Card in a Dell PowerEdge Server

We use the Dell Internal Dual SD module (IDSDM) for our VMware ESXi hosts. It works great, and saves us a bunch of money per server in that we don’t need RAID controllers, spinning disks, etc. Ours are populated with two 2 GB SD cards from the factory, and set to Mirror Mode in the […]

The post How to Replace an SD Card in a Dell PowerEdge Server appeared first on The Lone Sysadmin. Head over to the source to read the full post!

by Bob Plankers at January 16, 2015 07:27 PM

Chris Siebenmann

Using systemd-run to limit something's RAM consumption on the fly

A year ago I wrote about using cgroups to limit something's RAM consumption, for limiting the resources that make'ing Firefox could use when I did it. At the time my approach with an explicitly configured cgroup and the direct use of cgexec was the only way to do it on my machines; although systemd has facilities to do this in general, my version could not do this for ad hoc user-run programs. Well, I've upgraded to Fedora 21 and that's now changed, so here's a quick guide to doing it the systemd way.

The core command is systemd-run, which we use to start a command with various limits set. The basic command is:

systemd-run --user --scope -p LIM1=VAL1 -p LIM2=VAL2 [...] CMD ARG [...]

The --user makes things run as ourselves with no special privileges, and is necessary to get things to run. The --scope basically means 'run this as a subcommand', although systemd considers it a named object while it's running. Systemd-run will make up a name for it (and report the name when it starts your command), or you can use --unit NAME to give it your own name.

The limits you can set are covered in systemd.resource-control. Since systemd is just using cgroups, the limits you can set up are just the cgroup limits (and the documentation will tell you exactly what the mapping is, if you need it). Conveniently, systemd-run allows you to specify memory limits in Gb (or Mb), not just bytes. The specific limits I set up in the original entry give us a final command of:

systemd-run --user --scope -p MemoryLimit=3G -p CPUShares=512 -p BlockIOWeight=500 make

(Here I'm once again running make as my example command.)

You can inspect the parameters of your new scope with 'systemctl show --user <scope>', and change them on the fly with 'systemctl set-property --user <scope> LIM=VAL'. I'll leave potential uses of this up to your imagination. systemd-cgls can be used to show all of the scopes and find any particular one that's running this way (and show its processes).

(It would be nice if systemd-cgtop gave you a nice rundown of what resources were getting used by your confined scope, but as far as I can tell it doesn't. Maybe I'm missing a magic trick here.)

Now, there's a subtle semantic difference between what we're doing here and what I did in the original entry. With cgexec, everything that ran in our confine cgroup shared the same limit even if they were started completely separately. With systemd-run, separately started commands have separate limits; if you start two makes in parallel, each of them can use 3 GB of RAM. I'm not sure yet how you fix this in the official systemd way, but I think it involves defining a slice and then attaching our scopes to it.

(On the other hand, this separation of limits for separate commands may be something you consider a feature.)

Sidebar: systemd-run versus cgexec et al

In Fedora 20 and Fedora 21, cgexec works okay for me but I found that systemd would periodically clear out my custom confine cgroup and I'd have to do 'systemctl restart cgconfig' to recreate it (generally anything that caused systemd to reload itself would do this, including yum package updates that poked systemd). Now that the Fedora 21 version of systemd-run supports -p, using it and doing things the systemd way is just easier.

(I wrap the entire invocation up in a script, of course.)

by cks at January 16, 2015 07:01 AM

January 15, 2015

Chris Siebenmann

Link: Against DNSSEC by Thomas Ptacek

Against DNSSEC by Thomas Ptacek (@tqbf) is what it says in the title; lucid and to my mind strong reasons against using or supporting DNSSEC. I've heard some of these from @tqbf before in Tweets (and others are ambient knowledge in the right communities), but now that he's written this I don't have to try to dig those tweets out and make a coherent entry out of them.

For what it's worth, from my less informed perspective I agree with all of this. It would be nice if DNSSEC could bootstrap a system to get us out of the TLS CA racket but I've become persuaded (partly by @tqbf) that this is not viable and the cure is at least as bad as the disease. See eg this Twitter conversation.

(You may know of Thomas Ptacek from the days when he was at Matasano Security, where he was the author of such classics as If You're Typing the Letters A-E-S Into Your Code You're Doing It Wrong. See also eg his Hacker News profile.)

Update: there's a Hacker News discussion of this with additional arguments and more commentary from Thomas Ptacek here.

by cks at January 15, 2015 10:35 PM


FBI Is Part of US Intelligence Community

Are you surprised to learn that the FBI is officially part of the United States Intelligence Community? Did you know there's actually a list?

If you visit the Intelligence Community Web site, you can learn more about the IC. The member agencies page lists all 17 organizations.

The FBI didn't always emphasize an intelligence role. The Directorate of Intelligence appeared in 2005 and was part of the National Security Branch, as described here.

Now, as shown on the latest organizational chart, Intelligence is a peer with the National Security Branch. Each has its own Executive Assistant Director. NSB currently houses a division for Counterterrorism, a division for Counterintelligence, and a directorate for Weapons of Mass Destruction.

You may notice that there is a Cyber Division within a separate branch for "Criminal, Cyber, Response, and Services." If the Bureau continues to stay exceptionally engaged in investigating and countering cyber crime, espionage, and sabotage, we might see a separate Cyber branch at some point.

The elevation of the Bureau's intelligence function was a consequence of 9-11 and the Intelligence Reform and Terrorism Prevention Act of 2004.

If you want to read a book on the IC, Jeffrey Richelson publishes every few years on the topic. His sixth edition dates to 2011. I read an earlier edition, and noticed his writing is fairly dry.

Mark Lowenthal's book is also in its sixth edition. I was able to find my review of the fourth edition, if you want my detailed opinion.

In general these books are suitable for students and participants in the IC. Casual readers will probably not find them exciting enough. Reading them and related .gov sites will help keep you up to date on the nature and work of the IC, however.

With this information in mind, it might make more sense to some why the FBI acted both as investigator for recent intrusions and as a spokesperson for the IC.

by Richard Bejtlich at January 15, 2015 10:24 PM

The Lone Sysadmin

Deduplication & Write Once, Read Many

It’s probably sad that I see this and think about deduplication & WORM. This fellow achieved a 27% deduplication rate, though. Think of all the extra letters he could tattoo on his back now! For those of you who don’t speak English natively I assume he was going for “Mississippi.”  

The post Deduplication & Write Once, Read Many appeared first on The Lone Sysadmin. Head over to the source to read the full post!

by Bob Plankers at January 15, 2015 08:35 PM

Chris Siebenmann

ZFS should be your choice today if you need an advanced filesystem on Unix

The other day I ran down why ZFS is your only real choice on Linux if you need an advanced filesystem. Well, it's not just Linux. I don't know enough about the landscape of filesystems on other Unixes to confidently say it's your only choice on non-Linux Unixes, but I do think it is by far your best choice for this job.

Let's start with the first and biggest reason: ZFS works and has for years. At this point ZFS has been running in production environments (some of them very big and/or demanding) for more than half a decade. Yes, development is still happening and people are still finding issues (just as they are with essentially any filesystem), but the bulk of ZFS (in code and features) is extensively battle tested and is highly unlikely to blow up on you. Using ZFS puts you nowhere near the bleeding edge, and I don't think there's any other advanced Unix filesystem that can really make that claim.

The second reason is that ZFS is cross-platform, and in fact with care and advanced planning you can move ZFS pools between Unixes (instead of just shifting from one Unix to another for your next storage server build or the like). Choosing ZFS does not lock you into Solaris the way it once did; these days you can run ZFS reasonably sensibly on your choice of Oracle Solaris, anything Illumos based (eg OmniOS or SmartOS), FreeBSD, and Linux. People are running ZFS in production on all of these. This covers almost all of the gamut of major remaining Unixes and gives you plenty of choices as far as style, administration, user level tools, and so on. And if your first choice of Unix for ZFS turns out to have problems, well, no big deal; you can move your basic architecture to another Unix. In a way you can treat ZFS much the way you treat applications like Apache or Exim and basically host it on the Unix substrate of your choice.
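The move itself is just a pool export and import; a minimal sketch, assuming a hypothetical pool named 'tank' (and assuming the destination OS supports the pool's version and feature flags):

```shell
# On the outgoing server: flush everything and cleanly detach the pool.
zpool export tank

# Attach the disks (or iSCSI LUNs) to the new server, then:
zpool import          # scan devices and list importable pools
zpool import tank     # bring the pool in under its old name
```

Being conservative about the pool version and which feature flags you enable at creation time is most of what the "care and advanced planning" above amounts to.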

(Yes, yes, the OpenBSD and NetBSD and DragonflyBSD and OS X Server crowds are now annoyed at my over-generalization. Sorry, you're all relatively marginal in this space at this time.)

Finally, and it's high time I said this explicitly, ZFS not merely works; it's decently good and has generally made the right design decisions. I can quibble with some of its architectural choices and it's (still) missing a few features that people continually yearn for, but on the whole it is not merely a filesystem that is there but a filesystem that is actually nice, with choices that are sensible and usable.

This last reason is actually kind of important. If there are no other production ready advanced filesystems on Unixes (Linux or otherwise) then ZFS would win today by default. But if it kind of sucked, you might want to stick with current filesystems and wait unless data integrity was really, really important to you. As it is ZFS is decent and deciding to use ZFS doesn't constrain your Unix choices much, so I think you should seriously consider using ZFS for its data integrity advantages.

(There are any number of situations where using ZFS is still not worth the hassles or the risks. Or to be direct, ZFS is certainly not to the point on Linux and maybe FreeBSD where I think using it is a slam dunk obvious choice. Of course on Solaris or Illumos you basically have no option, so go with the flow.)

by cks at January 15, 2015 08:21 PM


Cass Sunstein on Red Teaming

On January 7, 2015, FBI Director James Comey spoke to the International Conference on Cyber Security at Fordham University. Part of his remarks addressed controversy over the US government's attribution of North Korea as being responsible for the digital attack on Sony Pictures Entertainment.

Near the end of his talk he noted the following:

We brought in a red team from all across the intelligence community and said, “Let’s hack at this. What else could be explaining this? What other explanations might there be? What might we be missing? What competing hypothesis might there be? Evaluate possible alternatives. What might we be missing?” And we end up in the same place.

I noticed some people in the technical security community expressing confusion about this statement. Isn't a red team a bunch of hackers who exploit vulnerabilities to demonstrate defensive flaws?

In this case, "red team" refers to a group performing the actions Director Comey outlined above. Harvard Professor and former government official Cass Sunstein explains the sort of red team mentioned by Comey in his new book Wiser: Getting Beyond Groupthink to Make Groups Smarter. In this article published by Fortune, Sunstein and co-author Reid Hastie advise the following as one of the ways to avoid group think to improve decision making:

Appoint an adversary: Red-teaming

Many groups buy into the concept of devil’s advocates, or designating one member to play a “dissenting” role. Unfortunately, evidence for the efficacy of devil’s advocates is mixed. When people know that the advocate is not sincere, the method is weak. A much better strategy involves “red-teaming.”

This is the same concept as devil’s advocacy, but amplified: In military training, red teams play an adversary role and genuinely try to defeat the primary team in a simulated mission. In another version, the red team is asked to build the strongest case against a proposal or plan. Versions of both methods are used in the military and in many government offices, including NASA’s reviews of mission plans, where the practice is sometimes called a “murder board.”

Law firms have a long-running tradition of pre-trying cases or testing arguments with the equivalent of red teams. In important cases, some law firms pay attorneys from a separate firm to develop and present a case against them. The method is especially effective in the legal world, as litigators are naturally combative and accustomed to arguing a position assigned to them by circumstance. A huge benefit of legal red teaming is that it can help clients understand the weaknesses of their side of a case, often leading to settlements that avoid the devastating costs of losing at trial.

One size does not fit all, and cost and feasibility issues matter. But in many cases, red teams are worth the investment. In the private and public sectors, a lot of expensive mistakes can be avoided with the use of red teams.

Some critics of the government's attribution statements have ignored the fact that the FBI took this important step. An article in Reuters, titled In cyberattacks such as Sony strike, Obama turns to 'name and shame', adds some color to this action:

The new [name and shame] policy has meant wresting some control of the issue from U.S. intelligence agencies, which are traditionally wary of revealing much about what they know or how they know it.

Intelligence officers initially wanted more proof of North Korea's involvement before going public, according to one person briefed on the matter. A step that helped build consensus was the creation of a team dedicated to pursuing rival theories - none of which panned out.

If you don't trust the government, you're unlikely to care that the intelligence community (which includes the FBI) red-teamed the attribution case. Nevertheless, it's important to understand the process involved. The government and IC are unlikely to release additional details, unless and until they pursue an indictment similar to the one against the PLA and five individuals from Unit 61398 last year.

Thanks to Augusto Barros for pointing me to the new "Wiser" book.

by Richard Bejtlich at January 15, 2015 11:17 AM

Chris Siebenmann

General ZFS pool shrinking will likely be coming to Illumos

Here is some great news. It started with this tweet from Alex Reece (which I saw via @bdha):

Finally got around to posting the device removal writeup for my first open source talk on #openzfs device removal! <link>

'Device removal' sounded vaguely interesting but I wasn't entirely sure why it called for a talk, since ZFS can already remove devices. Still, I'll read ZFS related things when I see them go by on Twitter, so I did. And my eyes popped right open.

This is really about being able to remove vdevs from a pool. In its current state I think the code requires all vdevs to be bare disks, which is not too useful for real configurations, but now that the big initial work has been done I suspect that there will be a big rush of people to improve it to cover more cases once it goes upstream to mainline Illumos (or before). Even being able to remove bare disks from pools with mirrored vdevs would be a big help for the 'I accidentally added a disk as a new vdev instead of as a mirror' situation that comes up periodically.

(This mistake is the difference between 'zpool add POOL DEV1 DEV2' and 'zpool add POOL mirror DEV1 DEV2'. You spotted the one word added to the second command, right?)
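Spelled out with hypothetical device names, the two commands and the eventual escape hatch look like this (the `zpool remove` of an ordinary top-level vdev is exactly the new feature being described; today it fails):

```shell
zpool add tank mirror da1 da2   # intended: one new mirrored vdev
zpool add tank da1 da2          # oops: TWO new single-disk top-level vdevs

# With general device removal, the mistake would become undoable:
zpool remove tank da2           # evacuate the stray vdev's data, then detach it
```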

While this is not quite the same thing as an in-place reshape of your pool, a fully general version of this would let you move a pool from, say, mirroring to raidz provided that you had enough scratch disks for the transition (either because you are the kind of place that has them around or because you're moving to new disks anyways and you're just arranging them differently).

(While you can do this kind of 'reshaping' today by making a completely new pool and using zfs send and zfs receive, there are some advantages to being able to do it transparently and without interruptions while people are actively using the pool).

This feature has been a wishlist item for ZFS for so long that I'd long since given up on ever seeing it. To have even a preliminary version of it materialize out of the blue like this is simply amazing (and I'm a little bit surprised that this is the first I heard of it; I would have expected an explosion of excitement as the news started going around).

(Note that there may be an important fundamental limitation about this that I'm missing in my initial enthusiasm and reading. But still, it's the best news about this I've heard for, well, years.)

by cks at January 15, 2015 05:26 AM

January 14, 2015

Chris Siebenmann

What /etc/shells is and isn't

In traditional Unix, /etc/shells has only one true purpose: it lists programs that chsh will let you change your shell to (if it lets you do anything). Before people are tempted to make other programs use this file for something else, it is important to understand the limits of /etc/shells. These include but are not limited to:

  • Logins may have /etc/passwd entries that list other shells. For example, back when restricted shells were popular it was extremely common to not list them in /etc/shells so you couldn't accidentally chsh yourself into a restricted shell and then get stuck.

    Some but not all programs have used the absence of a shell from /etc/shells as a sign that it is a restricted shell (or not a real shell at all) and they should restrict a user with that shell in some way. Other programs have used different tests, such as matching against specific shell names or name prefixes.

    (It's traditional for the FTP daemon to refuse access for accounts that do not have a shell that's in /etc/shells and so this is broadly accepted. Other programs are on much thinner ice.)

  • On the other hand, sometimes you can find restricted shells in /etc/shells; a number of systems (Ubuntu and several FreeBSD versions) include rbash, the restricted version of Bash, if it's installed.

  • Not all normal shells used in /etc/passwd or simply installed on the system necessarily appear in /etc/shells for various reasons. In practice there are all sorts of ways for installed shells to fall through the cracks. Of course this makes them hard to use as your login shell (since you can't chsh to them), but this can be worked around in various ways.

    For example, our Ubuntu systems have /bin/csh and /bin/ksh (and some people use them as their login shells) but neither are in /etc/shells.

  • The (normal and unrestricted) shell someone's actually using isn't necessarily in either /etc/shells or their /etc/passwd entry. Unix is flexible and easily lets you use $SHELL and some dotfile hacking to switch basically everything over to running your own personal choice of shell, per my entry on using an alternate shell.

    (Essentially everything on Unix that spawns what is supposed to be your interactive shell has been clubbed into using $SHELL, partly because the code to use $SHELL is easier to write than the code to look up someone's /etc/passwd entry to find their official login shell. This feature probably came into Unix with BSD Unix, which was basically the first Unix to have two shells.)

  • Entries in /etc/shells don't necessarily exist.
  • Entries in /etc/shells are not necessarily shells. Ubuntu 14.04 lists screen.

  • Not all systems even have an /etc/shells. Solaris and derivatives such as Illumos and OmniOS don't.

In the face of all of this, most programs should simply use $SHELL and assume that it is what the user wants and/or what the sysadmin wants the user to get. It's essentially safe to assume that $SHELL always exists, because it is part of the long-standing standard Unix login environment. As a corollary, a program should not change $SHELL unless it has an excellent reason to do so.

Note particularly that a user's $SHELL not being listed in /etc/shells means essentially nothing. As outlined above, there are any number of non-theoretical ways that this can and does happen on real systems that are out there in the field. As a corollary your program should not do anything special in this case unless it has a really strong reason to do so, generally a security-related reason. Really, you don't even want to look at /etc/shells unless you're chsh or ftpd or sudo or the like.
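As a concrete sketch of that advice, a program that needs the user's interactive shell can do something like this from shell (using `getent` to consult the passwd database is my assumption; the point is the order of preference):

```shell
# Prefer $SHELL; fall back to the login shell recorded in the
# passwd database only if $SHELL is unset or empty.
user_shell="${SHELL:-$(getent passwd "$(id -un)" | cut -d: -f7)}"
echo "$user_shell"
```

Notably there is no /etc/shells check anywhere in this; per the entry above, a shell's absence from /etc/shells tells you essentially nothing.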

(This entry is sadly brought to you by a program getting this wrong.)

by cks at January 14, 2015 06:24 AM

January 13, 2015

Everything Sysadmin

ACM's new Applicative conf, Feb. 25-27, NYC!

Are you a software developer that is facing rapidly changing markets, technologies and platforms? This new conference is for you.

ACM's new Applicative conference, Feb. 25-27, 2015 in Midtown Manhattan, is for software developers who work in rapidly changing environments. Technical tracks will focus on emerging technologies in system-level programming and application development.

The list of speakers is very impressive. I'd also recommend sysadmins attend as a way to stay in touch with the hot technologies that your developers will be using (and demanding) soon.

Early bird rates are available through Jan. 28.

January 13, 2015 02:25 PM


Does This Sound Familiar?

I read the following in the 2009 book Streetlights and Shadows: Searching for the Keys to Adaptive Decision Making by Gary Klein. It reminded me of the myriad ways operational information technology and security processes fail.

This is a long excerpt, but it is compelling.

== Begin ==

A commercial airliner isn't supposed to run out of fuel at 41,000 feet. There are too many safeguards, too many redundant systems, too many regulations and checklists. So when that happened to Captain Bob Pearson on July 23, 1983, flying a twin-engine Boeing 767 from Ottawa to Edmonton with 61 passengers, he didn't have any standard flight procedures to fall back on.

First the fuel pumps for the left engine quit. Pearson could work around that problem by turning off the pumps, figuring that gravity would feed the engine. The computer showed that he had plenty of fuel for the flight.

Then the left engine itself quit. Down to one engine, Pearson made the obvious decision to divert from Edmonton to Winnipeg, only 128 miles away. Next, the fuel pumps on the right engine went.

Shortly after that, the cockpit warning system emitted a warning sound that neither Pearson nor the first officer had ever heard before. It meant that both the engines had failed.

And then the cockpit went dark. When the engines stopped, Pearson lost all electrical power, and his advanced cockpit instruments went blank, leaving him only with a few battery-powered emergency instruments that were barely enough to land; he could read the instruments because it was still early evening.

Even if Pearson did manage to come in for a landing, he didn't have any way to slow the airplane down. The engines powered the hydraulic system that controlled the flaps used in taking off and in landing. Fortunately, the designers had provided a backup generator that used wind power from the forward momentum of the airplane.

With effort, Pearson could use this generator to manipulate some of his controls to change the direction and pitch of the airplane, but he couldn't lower the flaps and slats, activate the speed brakes, or use normal braking to slow down when landing. He couldn't use reverse thrust to slow the airplane, because the engines weren't providing any thrust. None of the procedures or flight checklists covered the situation Pearson was facing.

Pearson, a highly experienced pilot, had been flying B-767s for only three months, almost as long as the airplane had been in the Air Canada fleet. Somehow, he had to fly the plane to Winnipeg. However, "fly" is the wrong term. The airplane wasn't flying. It was gliding, and poorly. Airliners aren't designed to glide very well: they are too heavy, their wings are too short, and they can't take advantage of thermal currents. Pearson's airplane was dropping more than 20 feet per second.

Pearson guessed that the best glide ratio speed would be 220 knots, and maintained that speed in order to keep the airplane going for the longest amount of time. Maurice Quintal, the first officer, calculated that they wouldn't make it to Winnipeg. He suggested instead a former Royal Canadian Air Force base that he had used years earlier. It was only 12 miles away, in Gimli, a tiny community originally settled by Icelanders in 1875. So Pearson changed course once again.

Pearson had never been to Gimli but he accepted Quintal's advice and headed for the Gimli runway. He steered by the texture of the clouds underneath him. He would ask Winnipeg Central for corrections in his heading, turn by about the amount requested, then ask the air traffic controllers whether he had made the correct turn. Near the end of the flight he thought he spotted the Gimli runway, but Quintal corrected him.

As Pearson got closer to the runway, he knew that the airplane was coming in too high and too fast. Normally he would try to slow to 130 knots when the wheels touched down, but that was not possible now and he was likely to crash.

Luckily, Pearson was also a skilled glider pilot. (So was Chesley Sullenberger, the pilot who landed a US Airways jetliner in the Hudson River in January of 2009. We will examine the Hudson River landing in chapter 6.) Pearson drew on some techniques that aren't taught to commercial pilots. In desperation, he tried a maneuver called a sideslip, skidding the airplane forward in the way ice skaters twist their skates to skid to a stop.

He pushed the yoke to the left, as if he was going to turn, but pressed hard on the right rudder pedal to counter the turn. That kept the airplane on course toward the runway. Pearson used the ailerons and the rudder to create more drag. Pilots use this maneuver with gliders and light aircraft to produce a rapid drop in altitude and airspeed, but it had never been tried with a commercial jet. The sideslip maneuver was Pearson's only hope, and it worked.

  When the plane was only 40 feet off the ground, Pearson eased up on the controls, straightened out the airplane, and brought it in at 175 knots, almost precisely on the normal runway landing point. All the passengers and the crewmembers were safe, although a few had been injured in the scramble to exit the plane after it rolled to a stop.

The plane was repaired at Gimli and was flown out two days later. It returned to the Air Canada fleet and stayed in service another 25 years, until 2008. It was affectionately called "the Gimli Glider."

The story had a reasonably happy ending, but a mysterious beginning. How had the plane run out of fuel? Four breakdowns, four strokes of bad luck, contributed to the crisis.

Ironically, safety features built into the instruments had caused the first breakdown. The Boeing 767, like all sophisticated airplanes, monitors fuel flow very carefully. It has two parallel systems measuring fuel, just to be safe. If either channel 1 or channel 2 fails, the other serves as a backup.

However, when you have independent systems, you also have to reconcile any differences between them. Therefore, the 767 has a separate computer system to figure out which of the two systems is more trustworthy. Investigators later found that a small drop of solder in Pearson's airplane had created a partial connection in channel 2. The partial connection allowed just a small amount of current to flow-not enough for channel 2 to operate correctly, but just enough to keep the default mode from kicking in and shifting to channel 1.

The partial connection confused the computer, which gave up. This problem had been detected when the airplane had landed in Edmonton the night before. The Edmonton mechanic, Conrad Yaremko, wasn't able to diagnose what caused the fault, nor did he have a spare fuel-quantity processor. But he had figured out a workaround. If he turned channel 2 off, that circumvented the problem; channel 1 worked fine as long as the computer let it.

The airplane could fly acceptably using just one fuel-quantity processor channel. Yaremko therefore pulled the circuit breaker to channel 2 and put tape over it, marking it as inoperative. The next morning, July 23, a crew flew the plane from Edmonton to Montreal without any trouble.

The second breakdown was a Montreal mechanic's misguided attempt to fix the problem. The Montreal mechanic, Jean Ouellet, took note of the problem and, out of curiosity, decided to investigate further. Ouellet had just completed a two-month training course for the 767 but had never worked on one before. He tinkered a bit with the faulty Fuel Quantity Indicator System without success. He re-enabled channel 2; as before, the fuel gauges in the cockpit went blank. Then he got distracted by another task and failed to pull the circuit breaker for channel 2, even though he left the tape in place showing the channel as inoperative. As a result, the automatic fuel-monitoring system stopped working and the fuel gauges stayed blank.

A third breakdown was confusion about the nature of the fuel gauge problem. When Pearson saw the blank fuel gauges and consulted a list of minimum requirements, he knew that the airplane couldn't be flown in that condition. He also knew that the 767 was still very new; it had first entered into airline service in 1982. The minimum requirements list had already been changed 55 times in the four months that Air Canada had been flying 767s. Therefore, pilots depended more on the maintenance crew to guide their judgment than on the lists and manuals.

Pearson saw that the maintenance crews had approved this airplane to keep flying despite the problem with the fuel gauges. Pearson didn't understand that the crew had approved the airplane to fly using only channel 1. In talking with the pilot who had flown the previous legs, Pearson had gotten the mistaken impression that the airplane had just flown from Edmonton to Ottawa to Montreal with blank fuel gauges. That pilot had mentioned a "fuel gauge problem." When Pearson climbed into the cockpit and saw that the fuel gauges were blank, he assumed that was the problem the previous pilot had encountered, which implied that it was somehow acceptable to continue to operate that way.

The mechanics had another way to provide the pilots with fuel information. They could use a drip-stick mechanism to measure the amount of fuel currently stored in each of the tanks, and they could manually enter that information into the computer. The computer system could then calculate, fairly accurately, how much fuel was remaining all through the flight.

In this case, the mechanics carefully determined the amount of fuel in the tanks. But they made an error when they converted that to weight. This error was the fourth breakdown.

Canada had converted to the metric system only a few years earlier, in 1979. The government had pressed Air Canada to direct Boeing to build the new 767s using metric measurements of liters and kilograms instead of gallons and pounds; the 767 was the first, and at that time the only, airplane in the Air Canada fleet to use the metric system. The mechanics in Montreal weren't sure about how to make the conversion (on other airplanes the flight engineer did that job, but the 767 didn't use a flight engineer), and they got it wrong.

In using the drip-stick measurements, the mechanics plugged in the weight in pounds instead of kilograms. No one caught the error. Because of the error, everyone believed they had 22,300 kg of fuel on board, the amount needed to get them to Edmonton, but in fact they had only a little more than 10,000 kg, less than half the amount they needed.

  Pearson was understandably distressed by the thought of not being able to monitor the fuel flow directly. Still, the figures had been checked repeatedly, showing that the airplane had more fuel than was necessary. The drip test had been repeated several times, just to be sure.

That morning, the airplane had gotten approval to fly from Edmonton to Montreal despite having fuel gauges that were blank. (In this Pearson was mistaken; the airplane used channel 1 and did have working fuel gauges.) Pearson had been told that maintenance control had cleared the airplane.

The burden of proof had shifted, and Pearson would have to justify a decision to cancel this flight. On the basis of what he knew, or believed he knew, he couldn't justify that decision. Thus, he took off, and everything went well until he ran out of fuel and both his engines stopped.

== End ==
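As an aside, the unit mix-up in the excerpt is easy to reproduce numerically. The litre figure below is hypothetical, picked to match the quoted totals; the widely reported conversion factors in this incident are about 1.77 lb per litre (used by mistake) versus roughly 0.80 kg per litre (correct for jet fuel):

```shell
litres=12600   # hypothetical drip-stick volume reading
believed=$(awk "BEGIN { printf \"%.0f\", $litres * 1.77 }")  # pounds, read as kg
actual=$(awk "BEGIN { printf \"%.0f\", $litres * 0.80 }")    # true kilograms
echo "believed ${believed} kg aboard, actually ${actual} kg"
```

That reproduces the quoted figures: roughly 22,300 kg believed versus a little more than 10,000 kg actually aboard.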

This story illustrates that one cannot build "unhackable systems." I also believe it demonstrates that operational and decision-based failures will continue to plague technology. It is no use building systems that theoretically "have no vulnerabilities" so long as people operate those systems and make decisions based on them.

If you liked this post, I've written about engineering disasters in the past.

You can buy the book in which this story was published.

by Richard Bejtlich at January 13, 2015 11:11 AM

Chris Siebenmann

Our tradeoffs on ZFS ZIL SLOG devices for pools

As I mentioned in my entry on the effects of losing a SLOG device, our initial plan (or really idea) for SLOGs in our new fileservers was to use a mirrored pair for each pool that we gave a SLOG to, split between iSCSI backends as usual. This is clearly the most resilient choice for a SLOG setup, assuming that you have SSDs with supercaps; it would take a really unusual series of events to lose any committed data in the pool.

On ZFS mailing lists that I've read, there are plenty of people who think that using mirrored SSDs for your SLOG is overkill for the extremely unlikely event of a simultaneous server and SLOG failure. This would obviously save us one SLOG device (or chunk) per pool, which has its obvious attractions.

If we're willing to drop to one SLOG device per pool and live with the resulting small chance of data loss, a more extreme possibility is to put the SLOG device on the fileserver itself instead of on an iSCSI backend. The potential big win here would be moving from iSCSI to purely local IO, which presumably has lower latency and thus would enable the fileserver to respond to synchronous NFS operations faster. The drawback is that we couldn't fail over pools to another fileserver without either abandoning the SLOG (with potential data loss) or physically moving the SLOG device to the other fileserver. While we've almost never failed over pools, especially remotely, I'm not sure we want to abandon the possibility quite so definitely.
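For reference, the mirrored and unmirrored backend-based SLOG setups being weighed look like this on the command line (pool and device names are hypothetical):

```shell
# Mirrored SLOG, one half on each iSCSI backend:
zpool add tank log mirror backendA-slog backendB-slog

# Single unmirrored SLOG, accepting the small data-loss window:
zpool add tank log backendA-slog
```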

(And before we went down this road we'd definitely want to measure the IO latencies of SLOG writes to a local SSD versus SLOG writes to an iSCSI SSD. It may well be that there's almost no difference, at which point giving up the failover advantages would be relatively crazy.)

Since we aren't yet at the point of trying SLOGs on any pools or even measuring our volume of ZIL writes, all of this is idle planning for now. But I like to think ahead and to some extent it affects things like how many bays we fill in the iSCSI backends (we're currently reserving two bays on each backend for future SLOG SSDs).

PS: Even if we have a low volume of ZIL writes in general, we may find that we hit the ZIL hard during certain sorts of operations (perhaps eg unpacking tarfiles or doing VCS operations) and it's worth adding SLOGs just so we don't perform terribly when people do them. Of course this is going to be quite affected by the price of appropriate SSDs.

by cks at January 13, 2015 05:39 AM

January 12, 2015

Chris Siebenmann

I've now seen comment spam attempts from Tor exit nodes

As I mentioned on Twitter, I've recently started seeing some amount of comment spam attempts from IPs that are more or less explicitly labeled as Tor exit nodes. While I haven't paid exhaustive attention to comment spam sources over time, to the best of my awareness this is relatively new behavior on the part of my comment spammers. To date not very many comment spam attempts have been made from Tor IPs and other sources still dominate.

Since none of the comment spam attempts have succeeded, I face no temptation to block the Tor exit nodes. There are plenty of legitimate uses for Tor and I'd much rather have my logs be a little bit noisier with more failed comment spam attempts than even block a legitimate anonymous comment.

(Really I only block comment spam sources because I'm irritated at them, not because I think they represent any particular danger of succeeding. So far I've seen no sign that the robotic form stuffers are changing their behavior in any way; they've been failing for more than half a decade and I expect them to keep failing for at least the next half a decade. It's very unlikely that my little corner of the web is important enough to attract actual human programming attention.)

Given that this is a recent change, my suspicion is that Tor has simply become increasingly visible and well known to spammers through its appearance in stories about Silk Road and other hidden services (and people using it). Apparently some malware is now starting to use Tor to contact its command and control infrastructure, too, and certainly we've seen attackers use Tor to hide their IP origin when they access cracked accounts.

(Ironically this makes access from Tor exit nodes a glaring sign of a cracked account for us, since basically none of our users do this normally. Conveniently there are sources for lists of Tor exit nodes (also).)

by cks at January 12, 2015 06:08 AM