Planet SysAdmin

March 06, 2015

Chris Siebenmann

The simple way CPython does constant folding

When I looked into CPython's constant folding as part of writing yesterday's entry, I expected to find something clever and perhaps intricate, involving examining the types of constants and knowing about special combinations and so on. It turns out that CPython doesn't bother with all of this and it has a much simpler approach.

To do constant folding, CPython basically just calls the same general code that it would call to do, say, '+' on built-in types during execution. There is no attempt to recognize certain type combinations or operation combinations and handle them specially; any binary operation on any suitable constants will get tried. This includes combinations that will fail (such as '1 + "a"') and combinations that you might not expect to succeed, such as '"foo %d" % 10'.

(Note that there are some limits on what will get stored as a new folded constant, including that it's not too long. '"a" * 1000' won't be constant folded, for example, although '"a" * 5' will.)

What is a suitable constant is both easy and hard to define. The easy definition is that a suitable constant is anything that winds up in func_code.co_consts and so is accessed with a LOAD_CONST bytecode instruction. Roughly speaking this is any immutable basic type, which I believe currently is essentially integers, strings, and floating point numbers. In Python 3, tuples of these types will also be candidates for taking part in constant folding.
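You can watch this happen by inspecting a function's code object. A quick check (run on a current CPython 3.x, where the folding has moved from the peephole pass into the AST optimizer, but the observable result is the same):

```python
import dis

def f():
    return "fo" + "o"

# The concatenation was folded at compile time: the code object's
# constants contain the literal 'foo', and no BINARY_* / BINARY_OP
# instruction is left to execute at runtime.
print(f.__code__.co_consts)
print([ins.opname for ins in dis.get_instructions(f)])
```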

At first this approach to constant folding seemed alarming to me, since CPython is calling general purpose evaluation code, code that's normally used much later during bytecode execution. But then I realized that CPython is only doing this in very restricted circumstances; since it's only doing this with a very few types of immutable objects, it knows a lot about what C code is actually getting called here (and this code is part of the core interpreter). The code for these basic types has an implicit requirement that it can also be called as bytecode is being optimized, and the Python authors can make sure this is respected when things change. This would be unreasonable for arbitrary types, even arbitrary C level types, but is perfectly rational here and is beautifully simple.

In short, CPython has avoided the need to write special constant folding evaluation code by making sure that its regular evaluation code for certain basic types can also be used in this situation and then just doing so. In the process it opened up some surprising constant folding opportunities.

(And it can automatically open up more in the future, since anything that winds up in co_consts is immediately a candidate for constant folding.)

Sidebar: What happens with tuples in Python 2

In the compiled bytecode, tuples of constant values do not actually start out as constants; instead they start out as a series of 'load constant' bytecodes followed by a BUILD_TUPLE instruction. Part of CPython's peephole optimizations is to transform this sequence into a new prebuilt constant tuple (and a LOAD_CONST instruction to access it).

In Python 2, the whole peephole optimizer apparently effectively doesn't reconsider the instruction stream after doing this optimization. So if you have '(1, 2) + (3, 4)' you get a first transformation to turn the two tuples into constants, but CPython never goes on to do constant folding for the + operation itself; by the time + actually has two constant operands, it's apparently too late. In Python 3, this limitation is gone and so the + will be constant folded as well.

(Examining __code__.co_consts in Python 3 shows that the intermediate step still happens; the co_consts for a function that just has a 'return' of this is '(None, 1, 2, 3, 4, (1, 2), (3, 4), (1, 2, 3, 4))', where we see the intermediate tuples being built before we wind up with the final version. In general constant folding appears to leave intermediate results around, eg for '10 + 20 + 30 + 40' you get several intermediate constants as well as 100.)
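We can check the end result of the tuple folding directly (again on a current CPython 3.x; since the folding now happens in the AST optimizer, the intermediate tuples may no longer be left behind in co_consts the way they were back then):

```python
def g():
    return (1, 2) + (3, 4)

# Both the tuple construction and the + are folded away;
# the final tuple shows up as a single constant.
print(g.__code__.co_consts)
```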

by cks at March 06, 2015 07:21 AM

Steve Kemp's Blog

Free hosting, and key-signing

Over the past week I've mailed many of the people who had signed my previous GPG key and who had checked my ID as part of that process. My intention was to ask "Hey you trusted me before, would you sign my new key?".

So far no replies. I may have to be more dedicated and do the local-thing with people.

In other news Bytemark, who have previously donated a blade server, sponsored Debconf, and done other similar things, have now started offering free hosting to Debian-developers.

There is a list of such offers here:

I think that concludes this month's blog-posting quota. Although who knows? I turn 39 in a couple of days, and that might allow me to make a new one.

March 06, 2015 12:00 AM

March 05, 2015

LZone - Sysadmin

HowTo: Implement Consistent Hashing with Different memcached Bindings

A short HowTo on memcached consistent hashing. Of course this also works with memcached-protocol-compatible software such as CouchBase, MySQL...


Papers to read to learn about what consistent hashing is about:

Consistent Hashing with nginx

 upstream somestream {
      consistent_hash $request_uri;
      # backend memcached instances (placeholder addresses)
      server server1:11211;
      server server2:11211;
 }

Consistent Hashing with PHP

Note: the order of setOption() and addServers() is important. When using OPT_LIBKETAMA_COMPATIBLE
the hashing is compatible with all other runtimes using libmemcached.

$memcached = new Memcached();
$memcached->setOption(Memcached::OPT_DISTRIBUTION, Memcached::DISTRIBUTION_CONSISTENT);
$memcached->setOption(Memcached::OPT_LIBKETAMA_COMPATIBLE, true);
// servers are added after the options (placeholder addresses)
$memcached->addServers(array(
    array('server1', 11211),
    array('server2', 11211),
));

Consistent Hashing in Perl

As in PHP the order of setOptions() and addServers() matters. After all both languages use the same library in the background, so behaviour is the same.

$m = new Memcached('mymemcache');
# reconstructed: the options are set before the servers are added
$m->setOptions({
   Memcached::OPT_DISTRIBUTION         => Memcached::DISTRIBUTION_CONSISTENT,
   Memcached::OPT_LIBKETAMA_COMPATIBLE => true,
});
$m->addServers(\@servers);
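To make the behavior these options enable concrete, here is a toy consistent-hash ring in Python. This is only a sketch of the idea (it is not libketama-compatible, so don't expect it to agree with the bindings above about key placement):

```python
import bisect
import hashlib

class HashRing:
    """Toy consistent-hash ring; a sketch, NOT libketama-compatible."""
    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self.ring = []          # sorted list of (point, node)
        for node in nodes:
            self.add(node)

    def _points(self, node):
        # each node gets several points on the ring to even out load
        for i in range(self.replicas):
            digest = hashlib.md5(("%s:%d" % (node, i)).encode()).hexdigest()
            yield int(digest[:8], 16)

    def add(self, node):
        for p in self._points(node):
            bisect.insort(self.ring, (p, node))

    def remove(self, node):
        self.ring = [(p, n) for p, n in self.ring if n != node]

    def get(self, key):
        # a key maps to the first node point at or after its own hash point
        digest = hashlib.md5(key.encode()).hexdigest()
        point = int(digest[:8], 16)
        idx = bisect.bisect(self.ring, (point, "")) % len(self.ring)
        return self.ring[idx][1]
```

The property worth noticing: removing a node only remaps the keys that lived on it, while every other key keeps its server. That is the whole point compared to naive modulo hashing, where removing one server reshuffles nearly everything.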

by Lars Windolf at March 05, 2015 08:39 PM


Encryption is hard

I've run into this workflow problem before, but it happened again so I'm sharing.

We have a standard.

No passwords in plain-text. If passwords need to be emailed, the email will be encrypted with S/MIME.

Awesome. I have certificates, and so do my coworkers. Should be awesome!

To: coworker
From: me
Subject: Anti-spam appliance password

[The content can't be displayed because the S/MIME control isn't available]

Standard followed, mischief managed.

To: me
From: coworker
Subject: RE: Anti-spam appliance password
Thanks! Worked great.

To: coworker
From: me
uid: admin1792
pw: 92*$&diq38yljq3


Encryption is hard. It would be awesome if a certain mail-client defaulted to replying-in-kind to encrypted emails. But it doesn't, and users have to remember to click the button. Which they never do.

by SysAdmin1138 at March 05, 2015 04:46 PM

Chris Siebenmann

An interesting excursion with Python strings and is

Let's start with the following surprising interactive example from @xlerb's tweet:

>>> "foo" is ("fo" + "o")
True
>>> "foo" is ("fo".__add__("o"))
False
>>> "foo" == ("fo".__add__("o"))
True

The last two cases aren't surprising at all; they demonstrate that equality is bigger than mere object identity, which is what is tests (as I described in my entry on Python's two versions of equality). The surprising case is the first one; why do the two sides of that result in exactly the same object? There turn out to be two things going on here, both of them quite interesting.

The first thing going on is that CPython does constant folding on string concatenation as part of creating bytecode. This means that the '"fo" + "o"' turns into a literal "foo" in the actual bytecodes that are executed. On the surface, this is enough to explain the check succeeding in some contexts. To make life simpler while simultaneously going further down the rabbit hole, consider a function like the following:

def f():
  return "foo" is ("fo"+"o")

Compiled functions have (among other things) a table of strings and other constants used in the function. Given constant folding and an obvious optimization, you would expect "foo" to appear in this table exactly once. Well, actually, that's wrong; here's what func_code.co_consts is for this function in Python 2:

(None, 'foo', 'fo', 'o', 'foo')

(It's the same in Python 3, but now it's in __code__.co_consts.)

Given this we can sort of see what happened. Probably the bytecode was originally compiled without constant folding and then a later pass optimized the string concatenation away and added the folded version to co_consts, operating on the entirely rational assumption that it didn't duplicate anything already there. This would be a natural fit for a simple peephole optimizer, which is in fact exactly what we find in Python/peephole.c in the CPython 2 source code.

But how does this give us object identity? The answer has to be that CPython interns at least some of the literal strings used in CPython code. In fact, if we check func_code.co_consts for our function up above, we can see that both "foo" strings are in fact already the same object even though there are two entries in co_consts. The effect is actually fairly strong; for example, the same literal string in two different modules can be interned to be the same object. I haven't been able to find the CPython code that actually does this, so I can't tell you what the exact conditions are.

(Whether or not a literal string is interned appears to depend partly on whether or not it has spaces in it. This rabbit hole goes a long way down.)
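Here's a small demonstration of the interning side of this (a CPython implementation detail, not a language guarantee):

```python
import sys

t = "hello"                   # string literal: interned by the compiler
s = "".join(["hel", "lo"])    # equal string built at runtime: not interned
print(s == t)                 # equal values...
print(s is t)                 # ...but distinct objects
print(sys.intern(s) is t)     # intern() maps it back to the canonical object
```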

PS: I believe that this means I was wrong about some things I said in my entry on instance dictionaries and attribute names, in that more things get interned than I thought back then. Or maybe CPython grew more string interning optimizations since then.

by cks at March 05, 2015 05:23 AM

March 04, 2015

Chris Siebenmann

What creates inheritance?

It all started with an @eevee tweet:

people who love inheritance: please tell me what problems it has solved for you that you don't think could be solved otherwise

My answer was code reuse, but that started me down the road of wondering what the boundaries are of what people will call 'inheritance', at least as far as code reuse goes.

These days, clearly the center of the 'inheritance' circle is straight up traditional subclassing in languages like Python. But let's take a whole series of other approaches to code and object reuse and ask if they provide inheritance.

First up, proxying. In Python it's relatively easy to build explicit proxy objects, where there is no subclass relationship but everything except some selected operations is handed off to another object's methods and you thus get to use them. I suspect that the existence of two objects makes this not inheritance in most people's eyes.
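A minimal sketch of such a proxy (hypothetical class and method names, just to show the shape):

```python
import io

class LoggingProxy:
    """Override write(); hand everything else off to the wrapped object."""
    def __init__(self, target):
        self._target = target

    def write(self, data):
        # the one operation we choose to intercept
        print("writing %d characters" % len(data))
        return self._target.write(data)

    def __getattr__(self, name):
        # only called for attributes not found on the proxy itself,
        # so every other method is handed off to the target
        return getattr(self._target, name)

buf = io.StringIO()
p = LoggingProxy(buf)
p.write("hello")       # intercepted
p.seek(0)              # delegated via __getattr__
```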

What about Go's structure embedding and interfaces? In Go you can get code reuse by embedding an anonymous instance of something inside your own struct. Any methods defined for the anonymous instance can now be (syntactically) called on your own struct, and you can define your own methods that override some of them. With use of interfaces you can relatively transparently mix instances of your struct with instances of the original. This gets you something that's at least very like inheritance without a subclass or a proxy object in sight.

(This is almost like what I did here, but I didn't make the *bufio.Writer anonymous because there was no point.)

How about traits, especially traits that allow you to override methods in the base object? You certainly don't have a subclass relationship here, but you do have code reuse with modifications and some languages may be dynamic enough to allow the base object's methods to use methods from a trait.

So, as I wound up theorizing, maybe what creates inheritance is simply having a method resolution order, or more exactly having a need for it; this happens in a language where you can have multiple sources of methods for a single object. On the other hand this feels somewhat like a contorted definition and I don't know where people draw the line in practice. I don't even know exactly where I draw the line.

by cks at March 04, 2015 06:09 AM

toolsmith: Faraday IPE - When Tinfoil Won’t Work for Pentesting

Typically *nix, tested on Debian, Ubuntu, Kali, etc.
Kali 1.1.0 recommended, virtual machine or physical

I love me some tinfoil-hat-wearing conspiracy theorists, nothing better than sparking up a lively conversation with a “Hey man, what was that helicopter doing over your house?” and you’re off to the races. Me, I just operate on the premise that everyone is out to get me and I’m good to go. For the more scientific amongst you, there’s always a Faraday option. What? You don’t have a Faraday Cage in your house? You’re going to need more tinfoil. :-)

Figure 1 – Tinfoil coupon

In all seriousness, Faraday, in the toolsmith context, is an Integrated Penetration-Test Environment (IPE); think of it as an IDE for penetration testing designed for distribution, indexation, and analysis of the generated data during the process of a security audit (pentest) conducted with multiple users. It was some years ago when we discussed them in toolsmith, but Raphael Mudge's Armitage is a similar concept for Metasploit, while Dradis provides information sharing for pentest teams.
Faraday now includes plugin support for over 40 tools, including some toolsmith topics and favorites such as OpenVAS, BeEF, Arachni, Skipfish, and ZAP.
The Faraday project offers a robust wiki and a number of demo videos you should watch as well.
I pinged Federico Kirschbaum, Infobyte’s CTO and project lead for Faraday.
He stated that, as learned from doing security assessments, they always had the need to know what the results were from the tests performed by other team members. Sharing partial knowledge of target systems proved to be useful not only to avoid overlapping but also to reuse discoveries and build a complete picture. During penetration tests where the scope is quite large, it is common that a vulnerability detected in one part of the network can be exploited somewhere else as well. Faraday’s purpose is to aid security professionals and its development is driven by this desire to truly convert penetration testing into a community experience.
Federico also described their goal to provide an environment where all the data generated during a pentest can be transformed into meaningful, indexed information. Results can then be easily distributed between team members in real time without the need to change workflow or tools, allowing them to benefit from the shared knowledge. Pentesters use a lot of tools on a daily basis, and everybody has a "favorite" toolset, ranging from full blown vulnerability scanners to in-house tools; instead of trying to change the way people like to work the team designed Faraday as a bridge that allows tools to work in a collaborative way. Faraday's plug-in engine currently supports more than 40 well known tools and also provides an easy-to-use API to support custom tools.
Information persisted in Faraday can be queried, filtered, and exported to feed other tools. As an example, one could extract all hosts discovered running SSH in order to perform mass brute force attacks or see which commands or tools have been executed.
Federico pointed out that Faraday wasn't built thinking only about pentesters. Project managers can also benefit from a central database containing several assessments at once while being able to easily see the progress of their teams and have the ability to export information to send status reports.
It was surprising to the Infobytes team that many of the companies that use Faraday today are pentest clients rather than the actual pentest consultant. This is further indication of why it is always useful to have a repository of penetration test results whether they be internal or through outside vendors.
Faraday comes in three flavors - Community, Professional and Corporate. All of the features mentioned above are available in the Community version, which is Open Source. I tested Community for this effort as it is free.
Federico, in closing, pointed out that one of the main features in the commercial version is the ability to export reports for MS Word containing all the vulnerabilities, graphs, and progress status. This makes reporting, a pentester’s bane (painful, uncomfortable, unnatural even), into a one-click operation that can be executed by any team member at any time. See the product comparison page for more features and details for versions, based on your budget and needs.

Faraday preparation

The easiest way to run Faraday, in my opinion, is from Kali. This is a good time to mention that Kali 1.1.0 is available as of 9 FEB 2015; if you haven't yet upgraded, I recommend doing so soon.
At the Kali terminal prompt, execute:
git clone faraday-dev
cd faraday-dev
The installer will download and install dependencies, but you’ll need to tweak CouchDB to make use of the beautiful HTML5 reporting interface. Use vim or Leafpad to edit /etc/couchdb/local.ini and uncomment (remove the semicolon) the port and bind_address settings on lines 11 and 12. You may want to use the Kali instance's IP address, rather than the loopback address, to allow remote connections (other users). You can also change the port to your liking. Then restart the CouchDB service with service couchdb restart. You can manipulate SSL and authentication mechanisms in local.ini as well. Now issue ./ -d. I recommend running with -d as it gives you all the debug content in the logging console. The service will start, the QT GUI will spawn, and if all goes well, you’ll receive an INFO message telling you where to point your browser for the CouchDB reporting interface. Note that there are limitations specific to reporting in the Community version as compared to its commercial peers.

Figure 2 – Initial Faraday GUI QT
Fragging with Faraday

The first thing you should do in the Faraday UI is create a workspace: Workspace | Create. Be sure to save it as CouchDB as opposed to FS. I didn’t enable replication as I worked alone for this assessment.
Shockingly, I named mine toolsmith. Explore the plugins available thereafter with either Tools | Plugin or use the Plugin button, fourth from the right on the toolbar. I started my assessment exercise against a vulnerable virtual machine with a quick ping and nmap via the Faraday shell (Figure 3). To ensure the default visualizations for Top Services and Top Hosts populated in the Faraday Dashboard, I also scanned a couple of my gateways.

Figure 3 – Preliminary Faraday results
As we can see in Figure 3, our target host appears to be listening on port 80, indicating a web server, and a great time to utilize a web application scanner. Some tools such as the commercial Burp Suite Pro have a Faraday plugin for direct integration, but you can still make use of free Burp Suite data, as well as results from the likes of the free and fabulous OWASP ZAP. To do so, conduct a scan and save the results as XML to the applicable workspace directory, ~/.faraday/report/toolsmith in my case. The results become evident when you right-click the target host in the Host Tree as seen in Figure 4.

Figure 4 – Faraday incorporates OWASP ZAP results
We can see as we scroll through findings we’ve discovered a SQL injection vulnerability; no better time to use sqlmap, also supported by Faraday. Via the Faraday shell I ran the following, based on my understanding of the target apps discovered with ZAP.
To enumerate the databases:
sqlmap -u '' --dbs
To enumerate the tables present in the Joomla database:
sqlmap -u '' -D joomla --tables
To dump the users from the Joomla database:
sqlmap -u '' --dump -D joomla -T j25_users
Unfortunately, late in the game as this was being written, we discovered a change in sqlmap behavior that caused some misses for the Faraday sqlmap plugin, preventing sqlmap data from being populated in CouchDB and thus the Faraday host tree. Federico immediately noted the issue and was issuing a patch as I was writing; by the time you read this you’ll likely be working with an updated version. I love sqlmap so much though and wanted you to see the Faraday integration. Figure 5 gives you a general sense of the Faraday GUI accommodating all this sqlmap mayhem.

Figure 5 – Faraday shell and sqlmap
That being said, here’s where all the real Faraday superpowers kick in. You’ve enumerated, assessed, and even exploited, now to see some truly beautified HTML5 results. Per Figure 6, the Faraday Dashboard is literally one of the most attractive I’ve ever seen and includes different workspace views, hover-over functionality and host drilldown.

Figure 6 – Faraday Dashboard
There’s also the status report view, which should speak for itself but also allows really flexible filtering, as seen in Figure 7.

Figure 7 – Faraday Status
Those pentesters and pentest PMs who are looking for a data management solution should now be fully inspired to check out Faraday in its various versions and support levels. It’s an exciting tool for a critical cause.

In Conclusion

Faraday is a project that benefits from your feedback, feature suggestions, bug reports, and general support. They’re an engaged team with a uniquely specialized approach to problem solving for the red team cause, and I look forward to future releases and updates. I know more than one penetration testing team to whom I will strongly suggest Faraday consideration.
Ping me via email or Twitter if you have questions (russ at holisticinfosec dot org or @holisticinfosec).
Cheers…until next month.


Federico Kirschbaum (@fede_k), Faraday (@faradaysec) project lead, CTO Infobyte LLC (@infobytesec).

by (Russ McRee) at March 04, 2015 04:00 AM

March 03, 2015

Everything Sysadmin

Hiring a network engineer for SRE team (NYC only)

Stack Exchange, Inc. is looking to hire a sysadmin/network admin/SRE/DevOps engineer that will focus on network-related projects. The position will work out of the NYC office, so you must be in NYC or be willing to relocate.

If 3 or more of these projects sound like fun to you, contact us!

  • Automate Cisco LAN port configuration via Puppet
  • Make our site-to-site VPN more reliable
  • Tune NIC parameters for maximum performance / lowest latency
  • Lead the network design of our global datacenter network deployment strategy
  • Wrangle our BGP configurations for ease of updating and security
  • Establish operational procedures for when ISPs report they can't reach us

Sound interesting? The full job advertisement and resume submission instructions are here:

This position will work on the same team that I'm on, the Stack Exchange SRE team.

March 03, 2015 03:52 PM

The Tech Teapot

New domain and last chance to subscribe via email

If you receive The Tech Teapot via email, this is your last chance to continue doing so. From now The Tech Teapot is moving over to use MailChimp instead of Google Feedburner for delivering email with the latest posts.

Feedburner has been withering on the vine since being taken over by Google. The reason that this is your final chance is that Google doesn’t manage the email list. Of the 500+ email subscribers, not a single one has been removed from the list due to email bouncing. I find it hard to believe that no subscriber has moved jobs in the last 8 years. Consequently, the Feedburner list is full of invalid, out-of-date email addresses, and MailChimp will not import it.

Sorry about that, but fear not, you can sign-up here.

P.S. The more observant may have noticed the change of domain. I’ve owned the domain for a while and thought it about time the blog was moved onto its very own domain. Hope you like it.

P.P.S. Feed subscribers will also need to update the feed URL to

Update 5 March 2015: turns out that when you delete your feed in Feedburner you are given the option to redirect the Feedburner feed back to the original feed. So, if you’re subscribed to the feed through your feed reader, you will not need to change anything in order to continue receiving new updates :)

by Jack Hughes at March 03, 2015 01:31 PM

Standalone Sysadmin

Annoying pfSense Issue with 2.1.5 -> 2.2 Upgrade

I run several pfSense boxes throughout my network. Although the platform doesn't have an API, and it can be a pain to configure manually in certain cases, it's generally very reliable once running, and because it's essentially a skinned BSD, it's very easy on resources. There's also a really nice self-update feature that I use to get things to the newest release when they're available.

It's that last feature that bit me in my butt Sunday night. After doing the upgrade at midnight or so, I went to bed after everything seemed to work alright, but then this morning, I started getting reports that people couldn't log into the captive portal that we use for our "guest" wired connections.

I thought, "That's strange...everything seemed to work after the upgrade, but I'll check it out", and sure enough, as far as I could tell, all of the networks were working fine on that machine, but there was no one logged into the captive portal.

Taking a look at the logs, I found this error:

logportalauth[42471]: Zone: cpzone - Error during table cpzone creation.
Error message: file is encrypted or is not a database

Well, hrm. "Error during table cpzone creation" is strange, but "file is encrypted or is not a database" is even weirder. Doing a quick google search, I came across this thread on the pfSense forums where someone else (maybe the only other person?) has encountered the same problem I have.

As it turns out, prior to version 2.2, pfSense was still using sqlite2, but now, it's on sqlite3, and the database formats are incompatible. A mention of that in the upgrade notes would have been, you know, swell.

The thread on the forums suggests shutting off the captive portal service, removing the .db files, and then restarting the service. I tried that, and it didn't work for me, so what I did after that was to shut down the captive portal (to release any file locks), remove the db files, and then from the text-mode administrative menu, force a re-installation of pfSense itself.

Although I haven't actually tested the captive portal yet (I'm at home doing this remotely, because #YOLO), a new database file has been created (/var/db/captiveportalcpzone.db) and inspecting it seems to show sqlite3 working:

[2.2-RELEASE][root@host]/var/db: sqlite3 captiveportalcpzone.db
SQLite version 2014-11-18 20:57:56
Enter ".help" for usage hints.
sqlite> .databases
seq  name             file
---  ---------------  ----------------------------------------------------------
0    main             /var/db/captiveportalcpzone.db
sqlite> .quit

This is as opposed to some of the older database files created prior to upgrade:

[2.2-RELEASE][root@host]/var/db/backup: sqlite3 captiveportalinterface.db
SQLite version 2014-11-18 20:57:56
Enter ".help" for usage hints.
sqlite> .databases
Error: file is encrypted or is not a database

What I don't understand is that the normal way to convert from sqlite2 to sqlite3 is to dump and restore, but it doesn't look like this process did that at all. It would be incredibly easy to do a database dump/restore during an upgrade, ESPECIALLY when revving major database versions like this.

Anyway, this kind of experience is very unusual for me with pfSense. Normally it's "set it and forget it". Hopefully this will work and I can get back to complaining about a lack of API.

by Matt Simmons at March 03, 2015 08:01 AM

Chris Siebenmann

The latest xterm versions mangle $SHELL in annoying ways

As of patch #301 (and with changes since then), the canonical version of xterm has some unfortunate behavior changes surrounding the $SHELL environment variable and how xterm interacts with it. The full details are in the xterm manpage in the OPTIONS section, but the summary is that xterm now clears or changes $SHELL if the $SHELL value is not in /etc/shells, and sometimes even if it is. As far as I can tell, the decision tree goes like this:

  1. if xterm is (explicitly) running something that is in /etc/shells (as 'xterm /some/thing', not 'xterm -e /some/thing'), $SHELL will be rewritten to that thing.

  2. if xterm is running anything (including running $SHELL itself via being invoked as just 'xterm') and $SHELL is not in /etc/shells but your login shell is, $SHELL will be reset to your login shell.

  3. otherwise $SHELL will be removed from the environment, resulting in a shell environment with $SHELL unset. This happens even if you run plain 'xterm' and so xterm is running $SHELL.
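In Python, my reading of that decision tree looks roughly like this (a sketch of the manpage's description, not xterm's actual C logic; all names here are made up):

```python
def effective_shell(explicit_cmd, env_shell, login_shell, etc_shells):
    """Sketch of xterm patch #301's $SHELL rewriting, per the manpage."""
    # 1. explicitly running something that is itself listed in /etc/shells:
    #    $SHELL is rewritten to that command
    if explicit_cmd is not None and explicit_cmd in etc_shells:
        return explicit_cmd
    # 2. $SHELL isn't a listed shell, but the login shell is:
    #    $SHELL is reset to the login shell
    if env_shell not in etc_shells and login_shell in etc_shells:
        return login_shell
    # 3. a listed $SHELL survives; anything else is removed (None)
    return env_shell if env_shell in etc_shells else None
```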

It is difficult for me to summarize concisely how wrong this is and how many ways it can cause problems. For a start, this is a misuse of /etc/shells, per my entry on what it is and isn't; /etc/shells is in no way a complete list of all of the shells (or all of the good shells) that are in use on the system. You cannot validate the contents of $SHELL against /etc/shells because that is not what /etc/shells is there for.

This xterm change causes significant problems for anyone with their shell set to something that is not in /etc/shells, anyone using an alternate personal shell (which is not in /etc/shells for obvious reasons), any program that assumes $SHELL is always set (historically a safe assumption), and any environment that assumes $SHELL is not reset when set to something non-standard such as a captive or special purpose 'shell'.

(Not all versions of chsh restrict you to what's in /etc/shells, for that matter; some will let you set other things if you really ask them to.)

If you fall into one or more of these categories and you use xterm, you're going to need to change your environment at some point. Unfortunately it seems unlikely that this change will be reverted, so if your version of Unix updates xterm at all you're going to have it sooner or later (so far only a few Linux distributions are recent enough to have it).

PS: Perhaps this should be my cue to switch to urxvt. However my almost-default configuration of it is still just enough different from xterm to be irritating for me, although maybe I could fix that with enough customization work. For example, I really want its double-click selection behavior to exactly match xterm because that's what my reflexes expect and demand by now. See also.

PPS: Yes, I do get quite irritated at abrupt incompatible changes in the behavior of long-standing Unix programs, at least when they affect me.

by cks at March 03, 2015 05:09 AM

Anton Chuvakin - Security Warrior

Monthly Blog Round-Up – February 2015

Here is my next monthly "Security Warrior" blog round-up of top 5 popular posts/topics this month:
  1. “Why No Open Source SIEM, EVER?” contains some of my SIEM thinking from 2009. Is it relevant now? Well, you be the judge. The current emergence of open source log search tools, BTW, does not break the logic of that post. SIEM requires a lot of work, whether you paid for the software or not.
  2. “Simple Log Review Checklist Released!” is often at the top of this list – the checklist is still a very useful tool for many people. “On Free Log Management Tools” is a companion to the checklist (updated version).
  3. “Top 10 Criteria for a SIEM?” came from one of the last projects I did when running my SIEM consulting firm in 2009-2011 (for my recent work on evaluating SIEM, see this document).
  4. My classic PCI DSS Log Review series is always popular! The series of 18 posts covers a comprehensive log review approach (OK for PCI DSS 3.0 as well), useful for building log review processes and procedures, whether regulatory or not. It is also described in more detail in our Log Management book and mentioned in our PCI book (just out in its 4th edition!).
  5. “New SIEM Whitepaper on Use Cases In-Depth OUT!” (dated 2010) presents a whitepaper on select SIEM use cases described in depth with rules and reports [using a now-defunct SIEM product]; also see this SIEM use case in depth and this for a more current list of popular SIEM use cases.
In addition, I’d like to draw your attention to a few recent posts from my Gartner blog:

Current research on security analytics:
Miscellaneous fun posts:

(see all my published Gartner research here)

Also see my past monthly and annual “Top Popular Blog Posts” – 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014.
Disclaimer: most content at SecurityWarrior blog was written before I joined Gartner on Aug 1, 2011 and is solely my personal view at the time of writing. For my current security blogging, go here.
Previous post in this endless series:

by (Anton Chuvakin) at March 03, 2015 04:01 AM

March 02, 2015


Why Would Iran Welcome Western Tech?

I noticed an AFP story posted by Al Jazeera America titled Iran could allow in Google, other tech companies if they follow rules. It included the following:

Iran could allow Internet giants such as Google to operate in the country if they respect its "cultural" rules, Fars news agency said on Sunday, quoting a senior official.

"We are not opposed to any of the entities operating in global markets who want to offer services in Iran," Deputy Telecommunications and Information Technology Minister Nasrollah Jahangard reportedly told Fars.

"We are ready to negotiate with them and if they accept our cultural rules and policies they can offer their services in Iran," he said.

Jahangard said Iran is "also ready to provide Google or any other company with facilities" that could enable them to provide their services to the region.

These statements caught my eye because they contrast with China's actions, in the opposite direction. For example, on Friday the Washington Post published China removes top U.S. tech firms from government purchasing list, which said in part:

China has dropped several top U.S. technology companies, including Cisco and Apple, from a list of brands that are approved for state purchases, amid a widening rift with the United States about cyberspace...

Other companies dropped included Apple, Intel’s McAfee security software firm, and network and server software company Citrix Systems. Hewlett-Packard and Dell products remained on the list.

“The main reason for dropping foreign brands is out of national security. It’s the effect of Snowden and PRISM,” said Mei Xinyu, a researcher with the Ministry of Commerce. “When it comes to national security, no country should let their guard down.”

So why would Iran "let their guard down," to use Mei Xinyu's suggestion?

It's possible Iran is trying to encourage a favorable resolution to the nuclear power negotiations currently underway. I don't think its stance on technology is going to move the negotiations one way or another, however.

It's more likely that Iran recognizes that it lacks the sorts of national champions found in China. Iran isn't at the point where a local version of Cisco or Apple could replace the American brands. China, in contrast, has Huawei and ZTE for telecoms and Xiaomi (and others) for smartphones.

Iran might also be smart enough to realize that American brands could be the "safest" and most "secure" brands available, given the resistance of American tech companies to perceptions that they work on behalf of the US intelligence community.

At the New America cyber event last week, Bruce Schneier noted that the Cold War mission of the NSA was to "attack their stuff, and defend our stuff." However, when we "all use the same stuff," it's tougher for the NSA to follow its Cold War methodology.

I stated several times last week in various locations that countries like China who adopt their own national tech champions are essentially restoring the Cold War situation. If China rejects American technology, and runs its own, it will once again be possible for the NSA to "attack their stuff, and defend our stuff."

In that respect, I encourage the Chinese to run their own gear.

by Richard Bejtlich at March 02, 2015 10:10 PM

Chris Siebenmann

My view of the difference between 'pets' and 'cattle'

A few weeks ago I wrote about how all of our important machines are pets. When I did that I did not strongly define how I view the difference between pets and cattle, partly because I thought it was obvious. Subsequent commentary in various places showed me that I was wrong about this, so now I'm going to nail things down.

To me the core distinction is not in whether you hand-build machines or have them automatically configured. Obviously when you have a large herd of cattle you cannot hand-build them, but equally obviously the current best practice is to use automated setups even for one-off machines and in small environments. Instead the real distinction is how much you care about each individual machine. In the cattle approach, any individual machine is more or less expendable. Does it have problems? Your default answer is to shoot it and start a new one (which your build automation and scaling systems should make easy). In the pet approach each individual machine is precious; if it has problems you attempt to nurse it back to health, just as you would with a loved pet, and building a new one is only a last resort even if your automation means that you can do this rapidly.

If you don't have build automation and so on, replacing any machine is a time-consuming thing, so you wind up with pets by default. But even if you do have fast automated builds, you can still have pets due to things like machines having local state of some sort. Sure, you have backups and so on of that state, but you resort to hand care because restoring a machine to full service is slower than a plain rebuild that only gets the software up.

(This view of pets versus cattle is supported by, eg, the discussion here. The author of that email clearly sees the distinction not in how machines are created but in significant part in how machines with problems are treated. If machines are expendable, you have cattle.)

It's my feeling that there are any number of situations where you will naturally wind up with a pet model unless you're operating at a very big scale, but that's another entry.

by cks at March 02, 2015 05:10 AM

March 01, 2015

Chris Siebenmann

Sometimes why we have singleton machines is that failover is hard

One of our single points of failure around here is that we have any number of singleton machines that provide important service, for example DHCP for some of our most important user networks. We build such machines with amenities like mirrored system disks and we can put together a new instance in an hour or so (most of which just goes to copying things to the local disk), but that still means some amount of downtime in the event of a total failure. So why don't we build redundant systems for these things?

One reason is that there are a lot of services where failover and what I'll call 'cohabitation' are not easy. On the really easy side is something like caching DNS servers; it's easy to have two on the network at once and most clients can be configured to talk to both of them. If the first one goes down there will be some amount of inconvenience, but most everyone will wind up talking to the second one without anyone having to do anything. On the difficult side is something like a DHCP server with continually updated DHCP registration. You can't really have two active DHCP servers on the network at once, plus the backup one needs to be continually updated from the master. Switching from one DHCP server to the other requires doing something active, either by hand or through automation (and automation has hazards, like accidental or incomplete failover).

(In the specific case of DHCP you can make this easier with more automation, but then you have custom automation. Other services, like IMAP, are much less tractable for various reasons, although in some ways they're very easy if you're willing to tell users 'in an emergency change the IMAP server name to imap2.cs'.)
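For the easy caching-DNS case, the redundancy lives entirely in client configuration; a minimal sketch (the server addresses are invented for illustration):

```
# /etc/resolv.conf on each client: list both caching DNS servers, so
# if the first is unreachable the resolver falls back to the second
# (after a timeout).
nameserver 192.0.2.53
nameserver 192.0.2.54
```

There is no equivalently simple client-side knob for the hard cases like DHCP; that is exactly what makes them hard.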

Of course this is kind of an excuse. Having a prebuilt second server for many of these things would speed up bringing the service back if the worst came to the worst, even if it took manual intervention. But it's a tradeoff here; prebuilding second servers would require more servers and at least partially complicate how we administer things. It's simpler if we don't wrestle with this and so far our servers have been reliable enough that I can't remember any failures.

(This reliability is important. Building a second server is in a sense a gamble; you're investing up-front effort in the hopes that it will pay off in the future. If there is no payoff because you never need the second server, your effort turns into pure overhead and you may wind up feeling stupid.)

Another part of this is that I think we simply haven't considered building second servers for most of these roles; we've never sat down to consider the pros and cons, to evaluate how many extra servers it would take, to figure out how critical some of these pieces of infrastructure really are, and so on. Some of our passive decisions here were undoubtedly formed at a time when how our networks were used looked different than it does now.

(Eg, it used to be the case that many fewer people brought in their own devices than today; the natural result of this is that a working 'laptop' network is now much more important than before. Similar things probably apply to our wireless network infrastructure, although somewhat less so since users have alternatives in an emergency (such as the campus-wide wireless network).)

by cks at March 01, 2015 04:41 AM

February 28, 2015

Everything Sysadmin

Baltimore area folks: Come see me this Wednesday!

As previously blogged, if you are in the Baltimore area, check out the Baltimore LOPSA chapter meeting ("Crabby Admins") on Wednesday, March 4th when I'll be talking about my new book, The Practice of Cloud System Administration.

Even if you have zero interest in "the cloud", I assure you this talk will be relevant to you.

More info here.

(The meetings are at the office of OmniTI, in Fulton, MD)

February 28, 2015 09:58 PM

Anton Chuvakin - Security Warrior

Chris Siebenmann

Email from generic word domains is usually advance fee fraud spam

One of the patterns I've observed in the email sent to my sinkhole SMTP server is what I'll call the 'generic word domain' one. Pretty much any email that is from an address at any generic word domain (such as '', '', '', or '') is an advance fee fraud spam. It isn't sent from or associated with the actual servers involved in the domain (if there's anything more than a parking web page full of ads), it's just that advance fee fraud spammers seem to really like using those domains as their MAIL FROM addresses and often (although not always) the 'From:' in their message.

Advance fee fraud spammers use other addresses, of course, and I haven't done enough of a study to see if my collection of them prefers generic nouns, other addresses (eg various free email providers), or just whatever address is attached to the account or email server they're exploiting to send out their spam. I was going to say that I'd seen only a tiny bit of phish spam that used this sort of domain name, but it turns out that a recent cluster of phish spam follows this pattern (using addresses like '', '', and '').

I assume that advance fee fraud spammers are doing this to make their spam sound more official and real, just as they like to borrow the domains of things associated with the particular variant of the scam they're using (eg a spam from someone who claims to be a UN staff member may well be sent from a UN-related domain, or at least from something that sounds like it). I expect that the owners of most of these 'generic word' domains are just using them to collect ad revenues, not email, and so don't particularly care about the email being sent 'from' them.

(Although I did discover while researching this that '' is a real company that may even send email on occasion, rather to my surprise. I suspect that they bought their domain name from the original squatter.)

(This elaborates on a tweet of mine, and is something that I've been noticing for many years.)
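The pattern described here is mechanical enough to sketch as a filter heuristic. This is a hypothetical illustration, not anything from the post: the function name and tiny word list are made up, and a real filter would use a proper dictionary and more signals than the sender domain alone.

```python
# Hypothetical heuristic: flag a MAIL FROM address whose domain is a
# bare generic dictionary word (e.g. 'consultant.com'). The word list
# below is a stand-in for a real dictionary.
GENERIC_WORDS = {"consultant", "accountant", "barrister", "lottery"}

def is_generic_word_domain(mail_from: str) -> bool:
    try:
        domain = mail_from.rsplit("@", 1)[1].lower()
    except IndexError:
        return False  # no '@' at all; not a usable address
    # split 'consultant.com' into 'consultant' + '.' + 'com'
    label, sep, _tld = domain.rpartition(".")
    return sep == "." and label in GENERIC_WORDS

print(is_generic_word_domain("dr.john@consultant.com"))  # True
print(is_generic_word_domain("cks@cs.toronto.edu"))      # False
```

Multi-label domains ('consultant.co.uk' and the like) slip past this sketch; handling them properly needs a public-suffix list.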

by cks at February 28, 2015 04:17 AM

February 27, 2015

The Tech Teapot

Top 3 Cable Tracing Technologies

Firstly, why would you need to trace network cabling? In a perfect world you wouldn’t need to, but even if a network begins life properly labelled, things have a habit of changing. Documentation and cable labelling don’t always keep up when changes are made.

Network Cable Mess

A jumble of network cables in a network cabinet.


When you need to re-arrange the cabling in your patch panel, can you be 100% certain that the label is correct? You can be reasonably certain if you installed and maintain the network yourself, but what if others are involved? Are they as fastidious as you are in keeping the network documentation up to date?

Before cable moves, it doesn't do any harm to make sure the label is up to date. It is one thing to disconnect a single phone or workstation, quite another to disconnect the server.

A number of different technologies exist for tracing network cabling. Each has its plus points and its downsides. A brief explanation of each technology follows, outlining when it should be applied.

Tone Tracing

Tone tracing is pretty much as old as copper cabling itself; it is the granddaddy of all cable tracer technologies. The basic idea is that you place an electrical signal onto one end of the cable, using a tone generator, and then trace that signal, using a tone tracer, in order to work out where the cable runs.

It could not really be much simpler in theory, and usually it really is that simple in practice, but some characteristics of network cabling in particular can make things a little more complex.

The design characteristics of the network cable are working against you. Unshielded Twisted Pair (UTP) cabling is designed to reduce interference between the pairs of copper wire that make up the cable as a whole; the CAT5, CAT5e and CAT6 cable types all have four pairs of copper wire carefully twisted together to minimise interference. This presents a problem for anyone attempting to trace a category 5, 5e or 6 cable, because you want to maximise the signal on the cable in order to increase the strength of the signal you can detect.

The best way to minimise the damping effect of the cable twists is to place the signal on a single wire within the cable whilst the other wire of the pair is earthed. If a signal is placed onto both wires of a pair, the twists in the cable will work to dampen it; placing the signal onto a single wire helps avoid this dampening effect.

Tone tracers have traditionally been analog. More recently, digital or asymmetric tone tracers have arrived on the market. Asymmetric toners have a number of advantages over more traditional analog tracers.

Tone Generator and Trace Diagram

A diagram showing how a tone generator and tone tracer works.

Tone tracing is the only cable tracing technology that can be performed on a live cable. Modern asymmetric tone tracers operate at frequencies well above those used by even very modern cable like CAT7. Consequently, the tone signal does not interfere with the network signal.

Locating exactly where the fault lies can be very useful in deciding whether a cable run needs replacing or repairing. A fault close to the end of the cable may be repairable. Conversely, a fault in the middle of the cable would, in all likelihood, require a replacement. A tone tracer can be used to locate a fault, though it is a laborious process tracing each wire until the break is located. A full featured cable tester or TDR tester would be much faster and consequently a much better use of your time.

Continuity Testing

A continuity tester provides the ideal way to locate and label your network cabling. Whilst a toner can only work one cable at a time, a continuity tester can locate up to 20 cables at a time.

Each continuity tester is supplied with at least one remote. The remote fixes onto one end of your cable and you place the tester itself on the other end. When the tester and the remote are connected to the same cable, the continuity tester will show the number of the connected remote. Additional remotes can usually be supplied for most continuity testers or are included in the kit form of the tester. The additional remotes are numbered sequentially allowing you to locate and label a batch of cables at a time. If you have a large number of cables to locate, this can be a real time saver and will save you a lot of leg work.

In addition, a continuity tester is also capable of simple cable testing ensuring that all of the pairs of wires have no breaks and are connected correctly. It must be stressed that low end continuity testers can be fooled into giving a positive result when the cable is incorrectly wired.

Most continuity testers have a tone generator built in, so, with the addition of a tone tracer, they can be used for tone tracing as well. Using the built-in tone generator on a continuity tester saves you from carrying around an extra tool, and saves space and weight in your kit bag.

Hub Blink

Most modern hubs and switches have activity lights for each port indicating the traffic level and status. A recent addition to many cable testers and outlet identifiers has been the ability to blink the lights on a port. This feature is called hub blink.

Of course, this feature is only useful if the cables you wish to locate are connected to a live network. Hub blink is completely useless if you are trying to locate bare wire cables or before the network infrastructure has been installed.


Which technology is the best fit for you is largely dictated by a few factors, like how many cables you need to track and whether the cables are live.

If you are tracking cables that are not terminated, your options are a traditional tone generator and tracer or a continuity tester. If the cable is live, your only option is an asymmetric toner.

Hub blinking is also fine so long as your switch is relatively small. If you’ve got a huge switch with hundreds of ports, you may well struggle to identify exactly which port is blinking.

If you have a lot of cables to find, then a continuity tester with multiple remotes will allow you to identify cables in batches, speeding up the process of identification and labelling.


by Jack Hughes at February 27, 2015 05:39 PM

Chris Siebenmann

What limits how fast we can install machines

Every so often I read about people talking about how fast they can get new machines installed and operational, generally in the context of how some system management framework or another accelerates the whole process. This has always kind of amused me, not because our install process is particularly fast but instead because of why it's not so fast:

The limit on how fast we install machines is how fast they can unpack packages to the local disk.

That's what takes almost all of the time: fetching (from a local mirror or the install media) and then unpacking a variegated pile of Ubuntu packages. A good part of this is the read speed of the install media, some of it is the write speed of the system's disks, and some of it is all of the fiddling around that dpkg does in the process of installing packages, running postinstall scripts, and so on. The same thing is true of installing CentOS machines, OmniOS machines, and so on; almost all of the time is in the system installer and packaging system. What framework we wrap around this doesn't matter because we spend almost no time in said framework or doing things by hand.

The immediate corollary to this is that the only way to make any of our installs go much faster would be to do less work, ranging from installing fewer packages to drastic approaches where we reduce our 'package installs' towards 'unpack a tarball' (which would minimize package manager overhead). There are probably ways to approach this, but again they have relatively little to do with what system install framework we use.

(I think part of the slowness is simply package manager overhead instead of raw disk IO speed limits. But this is inescapable unless we somehow jettison the package manager entirely.)
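As a back-of-envelope check on the point that raw disk speed isn't the whole story, here is a sketch with invented numbers (the package volume and write speed are assumptions, not measurements from the post):

```python
# If unpacking were purely disk-bound, a whole install's worth of
# packages would hit the disk in well under a minute; anything beyond
# that is fetch time and package-manager overhead.
package_bytes = 2 * 1024**3         # assume ~2 GiB of unpacked packages
disk_write_bytes_s = 100 * 1024**2  # assume ~100 MiB/s sequential writes

raw_write_seconds = package_bytes / disk_write_bytes_s
print(f"raw write time: ~{raw_write_seconds:.0f} seconds")  # ~20 seconds
```

Real installs of this size routinely take many minutes, which is the gap the package manager and installer overhead have to account for.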

Sidebar: an illustration of how media speeds matter

Over time I've observed that both installs in my testing virtual machines and installs using the virtual DVDs provided by many KVM over IP management processors are clearly faster than installs done from real physical DVDs plugged into the machine. I've always assumed that this is because reading a DVD image from my local disk is faster than doing it from a real DVD drive (even including any KVM over IP virtual device network overhead).

by cks at February 27, 2015 06:25 AM

February 26, 2015

Chris Siebenmann

My current issues with systemd's networkd in Fedora 21

On the whole I'm happy with my switch to systemd-networkd, which I made for reasons covered here; my networking works and my workstation boots faster. But right now there are some downsides and limitations to networkd, and in the interests of equal time for the not so great bits I feel like running them down today. I covered some initial issues in my detailed setup entry; the largest one is that there is no syntax checker for the networkd configuration files and networkd itself doesn't report anything to the console if there are problems. Beyond that we get into a collection of operational issues.

What I consider the largest issue with networkd right now is that it's a daemon (as opposed to something that runs once and stops) but there is no documented way of interacting with it while it's running. There are two or three sides to this: information, temporary manipulation, and large changes. On the information front, networkd exposes no good way to introspect its full running state, including what network devices it's doing what to, or to wait for it to complete certain operations. On the temporary manipulation front, there's no way I know of to tell networkd to temporarily take down something and then later bring it back (the equivalent of ifdown and ifup). Perhaps you're supposed to do those with manual commands outside of networkd. Finally, on more permanent changes, if you add or remove or modify a configuration file in /etc/systemd/network and want networkd to notice, well, I don't know how you do that. Perhaps you restart networkd; perhaps you shut networkd down, modify things, and restart it; perhaps you reboot your machine. Perhaps networkd notices some changes on its own.

(Okay, it turns out that there's a networkctl command that queries some information from networkd, although it's not actually documented in the Fedora 21 version of systemd. This still doesn't allow you to poke networkd to do various operations.)

This points to a broader issue: there's a lot about networkd that's awfully underdocumented. I should not have to wonder about how to get networkd to notice configuration file updates; the documentation should tell me one way or another. As I write this the current systemd 219 systemd-networkd manpage is a marvel of saying very little, and there are also omissions and lack of clarity in the manpages for the actual configuration files. All told networkd's documentation is not up to the generally good systemd standards.

The next issue is that networkd has forgotten everything that systemd learned about the difference between present configuration files and active configuration files. To networkd those are one and the same; if you have a file in /etc/systemd/network, it is live. Want it not to be live? Better move it out of the directory (or edit it, although there is no explicit 'this is disabled' option you can set). Want to override something in /usr/lib/systemd/network? I'm honestly not sure how you'd do that short of removing it or editing it. This is an unfortunate step backwards.

(It's also a problem in some situations where you have multiple configurations for a particular port that you want to swap between. In Fedora's static configuration world you can have multiple ifcfg-* files, all with ONBOOT=no, and then ifup and ifdown them as you need them; there is no networkd equivalent.)
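To make the 'present means active' point concrete, here is a minimal .network file (the interface name and addresses are invented for illustration). Merely dropping it into /etc/systemd/network makes it live; there is no ONBOOT=no equivalent to keep it parked:

```ini
# /etc/systemd/network/50-wired.network -- illustrative values only
[Match]
Name=em1

[Network]
Address=192.0.2.10/24
Gateway=192.0.2.1
DNS=192.0.2.53
```

The only way to 'disable' such a file is to move it out of the directory or edit it, which is exactly the step backwards described above.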

I'm not going to count networkd's lack of general support for 'wait for specific thing <X> to happen' as an issue. But it certainly would be nice if systemd-networkd-wait-online was more generic and so could be more easily reused for various things.

I do think (as mentioned) that some of networkd's device and link configuration is unnecessarily tedious and repetitive. I see why it happened, but it's the easy way instead of the best way. I hope that it can be improved and I think that it can be. In theory I think you could go as far as optionally merging .link files with .network files to cover many cases much more simply, as the sections in each file today basically don't clash with each other.

In general I certainly hope that all of these issues will get better over time, although some of them will inevitably make networkd more complicated. Systemd's network configuration support is relatively young and I'm willing to accept some rough edges under the circumstances. I even sort of accept that networkd's priority right now probably needs to be supporting more types of networking instead of improving the administration experience, even if it doesn't make me entirely happy (but I'm biased, as my needs are already met there).

(To emphasize, my networkd issues are as of the state of networkd in Fedora 21, which has systemd 216, with a little bit of peeking at the latest systemd 219 documentation. In a year the situation may look a lot different, and I sure hope it does.)

by cks at February 26, 2015 04:05 AM

February 25, 2015

Chris Siebenmann

My Linux container temptation: running other Linuxes

We use a very important piece of (commercial) software that is only supported on Ubuntu 10.04 and RHEL/CentOS 6, not anything later (and it definitely doesn't work on Ubuntu 12.04, we've tried that). It's currently on a 10.04 machine but 10.04 is going to go out of support quite soon. The obvious alternative is to build a RHEL 6 machine, except I don't really like RHEL 6 and it would be our sole RHEL 6 host (well, CentOS 6 host, same thing). All of this has led me to a temptation, namely Linux containers. Specifically, using Linux containers to run one Linux as the host operating system (such as Ubuntu 14.04) while providing a different Linux to this software.

(In theory Linux containers are sort of overkill and you could do most or all of what we need in a chroot install of CentOS 6. In practice it's probably easier and surer to set up an actual container.)

Note that I specifically don't want something like Docker, because the Docker model of application containers doesn't fit how the software natively works; it expects an environment with cron and multiple processes and persistent log files it writes locally and so on and so forth. I just want to provide the program with the CentOS 6 environment it needs to not crash without having to install or actually administer a CentOS 6 machine more than a tiny bit.

Ubuntu 14.04 has explicit support for LXC with documentation and appears to support CentOS containers, so that's the obvious way to go for this. It's certainly a tempting idea; I could play with some interesting new technology while getting out of dealing with a Linux that I don't like.

On the other hand, is it a good idea? This is certainly a lot of work to go to in order to avoid most of running a CentOS 6 machine (I think we'd still need to watch for eg CentOS glibc security updates and apply them). Unless we make more use of containers later, it would also leave us with a unique and peculiar one-off system that'll require special steps to administer. And virtualization has failed here before.

(I'd feel more enthused about this if I thought we had additional good uses for containers, but I don't see any other ones right now.)

by cks at February 25, 2015 06:40 AM

Openstack Glance Image Download, download Openstack images

This guide shows you how to download Openstack images to your local machine using the command line Glance client. You can use this, for example, to download a copy of an image created from a VM, or to download the images your Openstack provider provides and adapt those.

February 25, 2015 12:00 AM

February 24, 2015

Everything Sysadmin

Blackbox now available via "MacPorts"

My open source project BlackBox is now available in the MacPorts collection. If you use MacPorts, simply type "sudo port install vcs_blackbox". There was already a package called "blackbox" so I had to call it something else.

Blackbox is a set of bash scripts that let you safely store secrets in a VCS repo (i.e. Git, Mercurial, or Subversion) using Gnu Privacy Guard (GPG). For more info, visit the homepage:

I'm looking for volunteers to maintain packages for Brew, Debian, and other package formats. If you are looking to learn how to make packages, this is a good starter project and will help people keep their files secure. Interested? Contact me by email or open an issue on GitHub.

February 24, 2015 04:07 PM



In an effort to better understand the challenges facing the ops team of a particular project here at $DayJob, a project manager asked this question:

How many users per [sysadmin] can our system support?

The poor lead sysadmin on that side of the house swiveled her chair over and said to me, "there is no answer to this question!" And we had a short but spirited discussion about the various ratios of users and machines to admin staff at the places we've been. Per-user is useless, we agreed. Machine/Instance count per admin? Slightly better. But even then. Between us we compiled a short list of places we've been and places we've read about.

  • Company A: 1000:1 And most of that 1 FTE was parts-monkey to keep the install-base running. The engineer to system ratio was closer to 10K:1. User count: global internet
  • Company B: 200:1 Which was desperately understaffed, as the ops team was frantically trying to keep up with a runaway application and a physical plant that was rotting under the load. User count: most of the US.
  • Company C: 150:1 Which was just right! User count: none, it was a product still in development.
  • Company D: 60:1 And the admin was part-time because there wasn't enough work. User count: 200
  • Company E: 40:1 Largely because 25-30 of those 40 systems were one-offs. It was a busy team. Monocultures are for wimps. User count: 20K.

This chart was used to explain to the project manager in question the "it depends" nature of admin staffing levels, and why you can't rely on industry norms to determine the target we should be hitting. Everyone wants to be like Company A. Almost no one gets there.
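The systems-per-admin figures from the list above can be restated numerically to show just how wide the spread is (only the ratio numbers are taken from the list; the point is the spread, not any single number):

```python
# Systems-per-admin ratios for companies A through E, from the list above.
systems_per_admin = {"A": 1000, "B": 200, "C": 150, "D": 60, "E": 40}

spread = max(systems_per_admin.values()) / min(systems_per_admin.values())
print(f"best-to-worst spread: {spread:.0f}x")  # 25x
```

A 25x spread across just five shops is the "it depends" argument in one number.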

What are the ratios you've worked with? Let me know @sysadm1138

by SysAdmin1138 at February 24, 2015 12:36 PM

Standalone Sysadmin

Overly-Late Notice: Cascadia IT Conference, March 13-14 in Seattle

It's way too late for this notice, but I can't forget to mention that Cascadia IT Conference 2015 is coming up very, very soon!

It's being held March 13-14 in Seattle Washington at the Hotel Deca.

As always, lots of great tutorials and tech program contents. If you're anywhere near the Pacific North-west, make sure to check it out today!

by Matt Simmons at February 24, 2015 12:30 PM

Chris Siebenmann

How we do and document machine builds

I've written before about our general Ubuntu install system and I've mentioned before that we have documented build procedures but we don't really automate them. But I've never discussed how we do reproducible builds and so on. Basically we do them by hand, but we do them systematically.

Our Ubuntu login and compute servers are essentially entirely built through our standard install system. For everything else, the first step is a base install with the same system. As part of this base install we make some initial choices, like what sort of NFS mounts this machine will have (all of them, only our central administrative filesystem, etc).

After the base install we have a set of documented additional steps; almost all of these steps are either installing additional packages or copying configuration files from that central filesystem. We try to make these steps basically cut and paste, often with the literal commands to run interlaced with an explanation of what they do. An example is:

* install our Dovecot config files:
     cd /etc/dovecot/conf.d/
     rsync -a /cs/site/machines/aviary/etc/dovecot/conf.d/*.conf .

Typically we do all of this over a SSH connection, so we are literally cutting and pasting from the setup documentation to the machine.

(In theory we have a system for automatically installing additional Ubuntu packages only on specific systems. In practice there are all sorts of reasons that this has wound up relatively disused; for example it's tied to the hostname of what's being installed and we often install new versions of a machine under a different hostname. Since machines rarely have that many additional packages installed, we've moved away from preconfigured packages in favour of explicitly saying 'install these packages'.)

We aren't neurotic about doing everything with cut and paste; sometimes it's easier to describe an edit to do to a configuration file in prose rather than to try to write commands to do it automatically (especially since those are usually not simple). There can also be steps like 'recover the DHCP files from backups or copy them from the machine you're migrating from', which require a bit of hand attention and decisions based on the specific situation you're in.

(This setup documentation is also a good place to discuss general issues with the machine, even if it's not strictly build instructions.)

When we build non-Ubuntu machines the build instructions usually follow a very similar form: we start with 'do a standard base install of <OS>' and then we document the specific customizations for the machine or type of machine; this is what we do for our OpenBSD firewalls and our CentOS based iSCSI backends. Setup of our OmniOS fileservers is sufficiently complicated and picky that a bunch of it is delegated to a couple of scripts. There are still a fair number of by-hand commands, though.

In theory we could turn any continuous run of cut and paste commands into a shell script; for most machines this would probably cover at least 90% of the install. Despite what I've written in the past, doing so would have various modest advantages; for example, it would make sure that we never skip a step by accident. I don't have a simple reason for why we don't do it except 'it's never seemed like that much of an issue', given that we build and rebuild this sort of machine very infrequently (typically once every Ubuntu version or every other Ubuntu version, as our servers generally haven't failed).

(I think part of the issue is that it would be a lot of work to get a completely hands-off install for a number of machines, per my old entry on this. Many machines have one or two little bits that aren't just running cut & paste commands, which means that a simple script can't cover all of the install.)
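
As a minimal sketch of what turning one of these cut-and-paste runs into a script might look like (the package names are illustrative and the /cs/site path is taken from the Dovecot example above; the dry-run wrapper is purely hypothetical):

```shell
#!/bin/sh
# Hypothetical sketch of a per-machine-type build script.
# set -e ensures a failed step stops the build instead of being
# silently skipped, which is the main advantage over cut and paste.
set -e

run() {
    # dry-run by default: print each step instead of executing it;
    # replace 'echo "+ $*"' with '"$@"' to actually run the commands
    echo "+ $*"
}

# install additional packages for this machine type
run apt-get install -y dovecot-imapd dovecot-pop3d

# install our Dovecot config files from the central filesystem
run rsync -a /cs/site/machines/aviary/etc/dovecot/conf.d/ /etc/dovecot/conf.d/
```

The dry-run default means you can read the script against the setup documentation before committing to running it for real.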

by cks at February 24, 2015 07:20 AM

RISKS Digest

February 23, 2015

Everything Sysadmin

Usenix SREcon15: Now with more con!

Usenix has announced the schedule for the second SREcon, and the big surprise is that it is now two days long. The previous SREcon was a single day.

I wasn't able to attend last year's conference, but I read numerous conference reports that were all enthusiastic about the presentations (you can see them online... I highly recommend the keynote).

I'm excited to also announce that my talk proposal was accepted. It is a case study of our experiences adopting SRE techniques at The full description is here.

I've heard the hotel is nearly full (or full), so register fast and book your room faster. More info about the conference is on the Usenix web site.

See you there!

February 23, 2015 03:00 PM

LZone - Sysadmin

Puppet: List Changed Files

If you want to know which files were changed by puppet in the last few days:

cd /var/lib/puppet
for i in $(find clientbucket/ -name paths); do
	echo "$(stat -c %y "$i" | sed 's/\..*//')       $(cat "$i")";
done | sort -n

will give you an output like

2015-02-10 12:36:25       /etc/resolv.conf
2015-02-17 10:52:09       /etc/bash.bashrc
2015-02-20 14:48:18       /etc/snmp/snmpd.conf
2015-02-20 14:50:53       /etc/snmp/snmpd.conf
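
If you really only want the last few days, you can also let find itself filter on the paths files' modification times. Here is a variant of the loop above wrapped in a shell function for illustration (the -mtime test is standard find; the function wrapper and its name are my own addition):

```shell
# Variant of the loop above, wrapped in a function so it can be pointed
# at any puppet state directory (normally /var/lib/puppet).
# "-mtime -7" makes find skip paths files older than seven days, so
# only recently bucketed changes are printed.
list_changed() {
    cd "$1" || return 1
    for i in $(find clientbucket/ -name paths -mtime -7); do
        echo "$(stat -c %y "$i" | sed 's/\..*//')       $(cat "$i")"
    done | sort -n
}
```

On a real system you would then call list_changed /var/lib/puppet and get the same style of output, restricted to the last week.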

by Lars Windolf at February 23, 2015 02:39 PM