Planet SysAdmin

June 30, 2016

Everything Sysadmin

LISA Conversations Episode 11: Russell Pavlicek on "Unleashing the Power of the Unikernel"

Episode 11 of LISA Conversations is Russell Pavlicek, who presented Unleashing the Power of the Unikernel at LISA '15.

You won't want to miss this!

by Tom Limoncelli at June 30, 2016 06:00 PM

Chris Siebenmann

What makes a email MIME part an attachment?

If you want to know what types of files your users are getting in email, it's likely that an important prerequisite is being able to recognize attachments in the first place. In a sane and sensible world, this would be easy; it would just be any MIME part with a Content-Disposition header of attachment.

I regret to tell you that this is not a sane world. There are mail clients that give every MIME part an inline Content-Disposition, so naturally this means that most mail clients can't trust an inline C-D and make their attachment versus non-attachment decisions based on other things. (I expect that there are mail clients that ignore a C-D of attachment, too, and will display some of those parts inline if they feel like it, but for logging we don't care much about that.)

MIME parts may have (proposed, nominal) filenames associated with them, from either Content-Type or Content-Disposition. However, neither the presence nor the absence of a MIME filename determines something's attachment status. Real attachments may have no proposed filename, and there are mail clients that attach filenames to things like inline images. And really, I can't argue with them; if the user told you that this (inline) picture is mydog.jpg, you're certainly within your rights to pass this information on in the MIME headers.

The MIME Content-Type provides at least hints, in that you can probably assume that most mail clients will treat things with any application/* C-T as attachments and not try to show them inline. And if you over-report here (logging information on 'attachments' that will really be shown inline), it's relatively harmless. It's possible that mail clients do some degree of content sniffing, so the C-T is not necessarily going to determine how a mail client processes a MIME part.

(At one point web browsers were infamous for being willing to do content sniffing on HTTP replies, so that what you served as eg text/plain might not be interpreted that way by some browsers. One can hope that mail clients are more sane, but I'm not going to hold my breath there.)

One caution here: trying to make decisions based on things having specific Content-Type values is a mug's game. For example, if you're trying to pick out ZIP files based on them having a C-T of application/zip, you're going to miss a ton of them; actual real email has ZIP files with all sorts of MIME types (including the catch-all value of application/octet-stream). My impression is that the most reliable predictor of how a mail client will interpret an attachment is actually the extension of its MIME filename.

(While the gold standard for figuring out if something is a ZIP file or whatever is actually looking at the data for the MIME part, please don't use file (or libmagic) for general file classification.)

One solution is certainly to just throw up our hands and log everything; inline, attachment, whatever, just log it all and we can sort it out later. The drawback on this is that it's going to be pretty verbose, even if you exclude inline text/plain and text/html, since lots of email comes with things like attached images and so on.

The current approach I'm testing is to use a collection of signs to pick out attachment-like things, with some heuristics attached. Does a MIME part declare a MIME filename ending in .zip? Then we'll log some information about it. Ditto if it has a Content-Disposition of attachment, ditto if it has a Content-Type of application/*, and so on. I'm probably logging information about some things that mail clients will display inline, but it's better than logging too little and missing things.

(Then there is the fun game of deciding how to exclude frequently emailed attachment types that you don't care about because you'll never be able to block them, like PDFs. Possibly the answer is to log them anyways just so you know something about the volume, rather than to try to be clever.)

by cks at June 30, 2016 05:04 AM

June 29, 2016

Chris Siebenmann

Modern DNS servers (especially resolvers) should have query logging

Since OpenBSD has shifted to using Unbound as their resolving DNS server, we've been in the process of making this shift ourselves as we upgrade, for example, our local OpenBSD-based resolver machines. One of the things this caused me to look into again is what Unbound offers for logging, and this has made me just a little bit grumpy.

So here is my opinion:

Given the modern Internet environment, every DNS server should be capable of doing compact query logging.

By compact query logging, I mean something that logs a single line with the client IP, the DNS lookup, and the resolution result for each query. This logging is especially important for resolving nameservers, because they're the place where you're most likely to want this data.

(How the logging should be done is an interesting question. Sending it to syslog is probably the easiest way; the best is probably to provide a logging plugin interface.)

What you want this for is pretty straightforward: you want to be able to spot and find compromised machines that are trying to talk to their command & control nodes. These machines leave traces in their DNS traffic, so you can use logs of that traffic to try to pick them out (either at the time or later, as you go back through log records). Sometimes what you want to know and search is the hosts and domains being looked up; other times, you want to search and know what IP addresses are coming back (attackers may use fast-flux host names but point them all at the same IPs).

(Quality query logging will allow you relatively fine grained control over what sort of queries from who get logged. For example, you might decide that you're only interested in successful A record lookups and then only for outside domains, not your own ones.)

Query logging for authoritative servers is probably less useful, but I think that it should still be included. You might not turn it on for your public DNS servers, but there are other cases such as internal DNS.

As for Unbound, it can sort of do query logging but it's rather verbose. Although I haven't looked in detail, it seems to be just the sort of thing you'd want when you have to debug DNS name resolution problems, but not at all what you want to deal with if you're trying to do this sort of DNS query monitoring.

by cks at June 29, 2016 05:17 AM

Racker Hacker

Talk recap: The friendship of OpenStack and Ansible

When flexibility met simplicity: The friendship of OpenStack and AnsibleThe 2016 Red Hat Summit is underway in San Francisco this week and I delivered a talk with Robyn Bergeron earlier today. Our talk, When flexibility met simplicity: The friendship of OpenStack and Ansible, explained how Ansible can reduce the complexity of OpenStack environments without sacrificing the flexibility that private clouds offer.

The talk started at the same time as lunch began and the Partner Pavilion first opened, so we had some stiff competition for attendees’ attention. However, the live demo worked without issues and we had some good questions when the talk was finished.

This post will cover some of the main points from the talk and I’ll share some links for the talk itself and some of the playbooks we ran during the live demo.

IT is complex and difficult

Getting resources for projects at many companies is challenging. OpenStack makes this a little easier by delivering compute, network, and storage resources on demand. However, OpenStack’s flexibility is a double-edged sword. It makes it very easy to obtain virtual machines, but it can be challenging to install and configure.

Ansible reduces some of that complexity without sacrificing flexibility. Ansible comes with plenty of pre-written modules that manage an OpenStack cloud at multiple levels for multiple types of users. Consumers, operators, and deployers can save time and reduce errors by using these modules and providing the parameters that fit their environment.

Ansible and OpenStack

Ansible and OpenStack are both open source projects that are heavily based on Python. Many of the same dependencies needed for Ansible are needed for OpenStack, so there is very little additional software required. Ansible tasks are written in YAML and the user only needs to pass some simple parameters to an existing module to get something done.

Operators are in a unique position since they can use Ansible to perform typical IT tasks, like creating projects and users. They can also assign fine-grained permissions to users with roles via reusable and extensible playbooks. Deployers can use projects like OpenStack-Ansible to deploy a production-ready OpenStack cloud.

Let’s build something

In the talk, we went through a scenario for a live demo. In the scenario, the marketing team needed a new website for a new campaign. The IT department needed to create a project and user for them, and then the marketing team needed to build a server. This required some additional tasks, such as adding ssh keys, creating a security group (with rules) and adding a new private network.

The files from the live demo are up on GitHub:

In the operator-prep.yml, we created a project and added a user to the project. That user was given the admin role so that the marketing team could have full access to their own project.

From there, we went through the tasks as if we were a member of the marketing team. The marketing.yml playbook went through all of the tasks to prepare for building an instance, actually building the instance, and then adding that instance to the dynamic inventory in Ansible. That playbook also verified the instance was up and performed additional configuration of the virtual machine itself — all in the same playbook.

What’s next?

Robyn shared lots of ways to get involved in the Ansible community. AnsibleFest 2016 is rapidly approaching and the OpenStack Summit in Barcelona is happening this October.


The presentation is available in a few formats:

The post Talk recap: The friendship of OpenStack and Ansible appeared first on

by Major Hayden at June 29, 2016 03:43 AM

June 28, 2016

Everything Sysadmin

Watch us live today! LISA Conversations Episode 11: Russell Pavlicek on "Unleashing the Power of the Unikernel"

Today (Tuesday, June 28, 2016) we'll be recording episode #11 of LISA Conversations. Join the Google Hangout and submit questions live via this link.

Our guest will be Russell Pavlicek. We'll be discussing his talk Unleashing the Power of the Unikernel from LISA '15.

  • The video we'll be discussing:

    • Unleashing the Power of the Unikernel
    • Russell Pavlicek
    • Recorded at LISA '15
    • Talk Description
  • Watch us record the episode live!

The recorded episode will be available shortly afterwards on YouTube.

You won't want to miss this!

by Tom Limoncelli at June 28, 2016 03:00 PM


Latest PhD Thesis Title and Abstract

In January I posted Why a War Studies PhD? I recently decided to revise my title and abstract to include attention to both offensive and defensive aspects of intrusion campaigns.

I thought some readers might be interested in reading about my current plans for the thesis, which I plan to finish and defend in early 2018.

The following offers the title and abstract for the thesis.

Network Intrusion Campaigns: Operational Art in Cyberspace 

Campaigns, Not Duels: The Operational Art of Cyber Intrusions*

Intruders appear to have the upper hand in cyberspace, eroding users' trust in networked organizations and the data that is their lifeblood. Three assumptions prevail in the literature and mainstream discussion of digital intrusions. Distilled, these assumptions are that attacks occur at blinding speed with immediate consequences, that victims are essentially negligent, and that offensive initiative dominates defensive reaction. 
This thesis examines these assumptions through two research questions. First, what characterizes network intrusions at different levels of war? Second, what role does operational art play in network intrusion campaigns? 
By analyzing incident reports and public cases, the thesis refutes the assumptions and leverages the results to improve strategy.  
The thesis reveals that strategically significant attacks are generally not "speed-of-light" events, offering little chance for recovery.  Digital defenders are hampered by a range of constraints that reduce their effectiveness while simultaneously confronting intruders who lack such restrictions. Offense does not necessarily overpower defense, constraints notwithstanding, so long as the defenders conduct proper counter-intrusion campaigns. 
The thesis structure offers an introduction to the subject, and an understanding of cybersecurity challenges and trade-offs. It reviews the nature of digital intrusions and the levels of war, analyzing the interactions at the levels of tools/tactics/technical details, operations and campaigns, and strategy and policy. The thesis continues by introducing historical operational art, applying lessons from operational art to network intrusions, and applying lessons from network intrusions to operational art. The thesis concludes by analyzing the limitations of operational art in evolving digital environments.

*See the post Updated PhD Thesis Title for details on the new title.

by Richard Bejtlich ( at June 28, 2016 01:42 PM

Sean's IT Blog

Horizon 7.0 Part 5–SSL Certificates

SSL certificates are an important part of all Horizon environments .  They’re used to secure communications from client to server as well as between the various servers in the environment.  Improperly configured or maintained certificate authorities can bring an environment to it’s knees – if a connection server cannot verify the authenticity of a certificate – such as an expired revocation list from an offline root CA, communications between servers will break down.  This also impacts client connectivity – by default, the Horizon client will not connect to Connection Servers, Security Servers, or Access Points unless users change the SSL settings.

Most of the certificates that you will need for your environment will need to be minted off of an internal certificate authority.  If you are using a security server to provide external access, you will need to acquire a certificate from a public certificate authority.  If you’re building a test lab or don’t have the budget for a commercial certificate from one of the major certificate providers, you can use a free certificate authority such as StartSSL or a low-cost certificate provider such as NameCheap. 


Before you can begin creating certificates for your environment, you will need to have a certificate authority infrastructure set up.  Microsoft has a great 2-Tier PKI walkthrough on TechNet. 

Note: If you use the walkthrough to set up your PKI environment., you will need to alter the configuration file to remove the  AlternateSignatureAlgorithm=1 line.  This feature does not appear to be supported on vCenter  and can cause errors when importing certificates.

Once your environment is set up, you will want to create a template for all certificates used by VMware products.  Derek Seaman has a good walkthrough on creating a custom VMware certificate template. 

Note: Although a custom template isn’t required, I like to create one per Derek’s instructions so all VMware products are using the same template.  If you are unable to do this, you can use the web certificate template for all Horizon certificates.

Creating The Certificate Request

Horizon 7.0 handles certificates on the Windows Server-based components the same way as Horizon 6.x and Horizon 5.3.  Certificates are stored in the Windows certificate store, so the best way of generating certificate requests is to use the certreq.exe certificate tool.  This tool can also be used to submit the request to a local certificate authority and accept and install a certificate after it has been issued.

Certreq.exe can use a custom INF file to create the certificate request.  This INF file contains all of the parameters that the certificate request requires, including the subject, the certificate’s friendly name, if the private key can be exported, and any subject alternate names that the certificate requires.

If you plan to use Subject Alternate Names on your certificates, I highly recommend reviewing this article from Microsoft.  It goes over how to create a certificate request file for SAN certificates.

A  certificate request inf file that you can use as a template is below.  To use this template, copy and save the text below into a text file, change the file to match your environment, and save it as a .inf file.

;----------------- request.inf -----------------

Signature="$Windows NT$"


Subject = "CN=<Server Name>, OU=<Department>, O=<Company>, L=<City>, S=<State>, C=<Country>" ; replace attribues in this line using example below
KeySpec = 1
KeyLength = 2048
; Can be 2048, 4096, 8192, or 16384.
; Larger key sizes are more secure, but have
; a greater impact on performance.
Exportable = TRUE
FriendlyName = "vdm"
MachineKeySet = TRUE
SMIME = False
PrivateKeyArchive = FALSE
UserProtected = FALSE
UseExistingKeySet = FALSE
ProviderName = "Microsoft RSA SChannel Cryptographic Provider"
ProviderType = 12
RequestType = PKCS10
KeyUsage = 0xa0


OID= ; this is for Server Authentication

[Extensions] = "{text}"
_continue_ = "dns=<DNS Short Name>&"
_continue_ = "dns=<Server FQDN>&"
_continue_ = "dns=<Alternate DNS Name>&"


CertificateTemplate = VMware-SSL


Note: When creating a certificate, the state or province should not be abbreviated.  For instance, if you are in Wisconsin, the full state names should be used in place of the 2 letter state postal abbreviation.

Note:  Country names should be abbreviated using the ISO 3166 2-character country codes.

The command the generate the certificate request is:

certreq.exe –New <request.inf> <certificaterequest.req>

Submitting the Certificate Request

Once you have a certificate request, it needs to be submitted to the certificate authority.  The process for doing this can vary greatly depending on the environment and/or the third-party certificate provider that you use. 

If your environment allows it, you can use the certreq.exe tool to submit the request and retrieve the newly minted certificate.  The command for doing this is:

certreq –submit -config “<ServerName\CAName>” “<CertificateRequest.req>” “<CertificateResponse.cer>

If you use this method to submit a certificate, you will need to know the server name and the CA’s canonical name in order to submit the certificate request.

Accepting the Certificate

Once the certificate has been generated, it needs to be imported into the server.  The import command is:

certreq.exe –accept “<CertificateResponse.cer>

This will import the generated certificate into the Windows Certificate Store.

Using the Certificates

Now that we have these freshly minted certificates, we need to put them to work in the Horizon environment.  There are a couple of ways to go about doing this.

1. If you haven’t installed the Horizon Connection Server components on the server yet, you will get the option to select your certificate during the installation process.  You don’t need to do anything special to set the certificate up.

2. If you have installed the Horizon components, and you are using a self-signed certificate or a certificate signed from a different CA, you will need to change the friendly name of the old certificate and restart the Connection Server or Security Server services.

Horizon requires the Connection Server certificate to have a friendly name value of vdm.  The template that is posted above sets the friendly name of the new certificate to vdm automatically, but this will conflict with any existing certificates. 

Friendly Name

The steps for changing the friendly name are:

  1. Go to Start –> Run and enter MMC.exe
  2. Go to File –> Add/Remove Snap-in
  3. Select Certificates and click Add
  4. Select Computer Account and click Finish
  5. Click OK
  6. Right click on the old certificate and select Properties
  7. On the General tab, delete the value in the Friendly Name field, or change it to vdm_old
  8. Click OK
  9. Restart the Horizon service on the server


Certificates and Horizon Composer

Unfortunately, Horizon Composer uses a different method of managing certificates.  Although the certificates are still stored in the Windows Certificate store, the process of replacing Composer certificates is a little more involved than just changing the friendly name.

The process for replacing or updating the Composer certificate requires a command prompt and the SVIConfig tool.  SVIConfig is the Composer command line tool.  If you’ve ever had to remove a missing or damaged desktop from your Horizon environment, you’ve used this tool.

The process for replacing the Composer certificate is:

  1. Open a command prompt as Administrator on the Composer server
  2. Change directory to your VMware Horizon Composer installation directory
    Note: The default installation directory is C:\Program Files (x86)\VMware\VMware View Composer
  3. Run the following command: sviconfig.exe –operation=replacecertificate –delete=false
  4. Select the correct certificate from the list of certificates in the Windows Certificate Store
  5. Restart the Composer Service


A Successful certificate swap

At this point, all of your certificates should be installed.  If you open up the Horizon Administrator web page, the dashboard should have all green lights.  If you do not see all green lights, you may need to check the health of your certificate environment to ensure that the Horizon servers can check the validity of all certificates and that a CRL hasn’t expired.

If you are using a certificate signed on an internal CA for servers that your end users connect to, you will need to deploy your root and intermediate certificates to each computer.  This can be done through Group Policy for Windows computers or by publishing the certificates in the Active Directory certificate store.  If you’re using Teradici PCoIP Zero Clients, you can deploy the certificates as part of a policy with the management VM.  If you don’t deploy the root and intermediate certificates, users will not be able to connect without disabling certificate checking in the Horizon client.

Access Point Certificates

Unlike Connection Servers and Composer, Horizon Access Point does not run on Windows.  It is a Linux-based virtual appliance, and it requires a different certificate management technique.  The certificate request and private key file for the Access Point should be generated with OpenSSL, and the certificate chain needs to be concatenated into a single file.  The certificate is also installed during, or shortly after, deployment using Chris Halstead’s fling or the appliance REST API. 

Because this is an optional component, I won’t go into too much detail on creating and deploying certificates for the Access Point in this section.  I will cover that in more detail in the section on deploying the Access Point appliance later in the series.

by seanpmassey at June 28, 2016 01:00 PM

Chris Siebenmann

Today's lesson on the value of commenting your configuration settings

We have a relatively long-standing Django web application that was first written for Django 1.2 and hadn't been substantially revised since. Earlier this year I did a major rework in order to fully update it for Django 1.9; not just to be compatible, but to be (re)structured into the way that Django now wants apps to be set up.

Part of Django's app structure is a file that contains, well, all sorts of configuration settings for your system; you normally get your initial version of this by having Django create it for you. What Django wants you to have in this file and how it's been structured has varied over Django versions, so if you have a five year old app its file can look nothing like what Django would now create. Since I was doing a drastic restructuring anyways, I decided to deal with this issue the simple way. I'd have Django write out a new stock file for me, as if I was starting a project from scratch, and then I would recreate all of the settings changes we needed. In the process I would drop any settings that were now quietly obsolete and unnecessary.

(Since the settings file is just ordinary Python, it's easy to wind up setting 'configuration options' that no longer exist. Nothing complains that you have some extra variables defined, and in fact you're perfectly free to define your own settings that are used only by your app, so Django can't even tell.)

In the process of this, I managed to drop (well, omit copying) the ADMINS setting that makes Django send us email if there's an uncaught exception (see Django's error reporting documentation). I didn't spot this when we deployed the new updated version of the application (I'm not sure I even remembered this feature). I only discovered the omission when Derek's question here sent me looking at our configuration file to find out just what we'd set and, well, I discovered that our current version didn't have anything. Oops, as they say.

Looking back at our old, I'm pretty certain that I omitted ADMINS simply because it didn't have any comments around it to tell me what it did or why it was there. Without a comment, it looked like something that old versions of Django set up but new versions didn't need (and so didn't put into their stock Clearly if I'd checked what if anything ADMINS meant in Django 1.9 I'd have spotted my error, but, well, people take shortcuts, myself included.

(Django does have global documentation for settings, but there is no global alphabetical index of settings so you can easily see what is and isn't a Django setting. Nor are settings grouped into namespaces to make it clear what they theoretically affect.)

This is yet another instance of some context being obvious to me at the time I did something but it being very inobvious to me much later. I'm sure that when I put the ADMINS setting into the initial I knew exactly what I was doing and why, and it was so obvious I didn't think it needed anything. Well, it's five years later and all of that detail fell out of my brain and here I am, re-learning this lesson yet again.

(When I put the ADMINS setting and related bits and pieces back into our, you can bet that I added a hopefully clear comment too. Technically it's not in, but that's a topic for another blog entry.)

by cks at June 28, 2016 02:49 AM

June 27, 2016


Undefined Pointer Arithmetic

In commits a004e72b9 (1.0.2) and 6f35f6deb (1.0.1) we released a fix for CVE-2016-2177. The fix corrects a common coding idiom present in OpenSSL 1.0.2 and OpenSSL 1.0.1 which actually relies on a usage of pointer arithmetic that is undefined in the C specification. The problem does not exist in master (OpenSSL 1.1.0) which refactored this code some while ago. This usage could give rise to a low severity security issue in certain unusual scenarios. The OpenSSL security policy ( states that we publish low severity issues directly to our public repository, and they get rolled up into the next release whenever that happens. The rest of this blog post describes the problem in a little more detail, explains the scenarios where a security issue could arise and why this issue has been rated as low severity.

The coding idiom we are talking about here is this one:

if (p + len > limit)
    return; /* Too long */

Where p points to some malloc’d data of SIZE bytes and limit == p + SIZE. len here could be from some externally supplied data (e.g. from a TLS message).

The idea here is that we are performing a length check on peer supplied data to ensure that len (received from the peer) is actually within the bounds of the supplied data. The problem is that this pointer arithmetic is only defined by the C90 standard (which we aim to conform to) if len <= SIZE, so if len is too long then this is undefined behaviour, and of course (theoretically) anything could happen. In practice of course it usually works the way you expect it to work and there is no issue except in the case where an overflow occurs.

In order for an overflow to occur p + len would have to be sufficiently large to exceed the maximum addressable location. Recall that len here is received from the peer. However in all the instances that we fixed it represents either one byte or two bytes, i.e. its maximum value is 0xFFFF. Assuming a flat 32 bit address space this means that if p < 0xFFFF0000 then there is no issue, or putting it another way, approx. 0.0015% of all conceivable addresses are potentially “at risk”.

Most architectures do not have the heap at the edge of high memory. Typically, one finds:

    0: Reserved low-memory
    LOW:    TEXT
            mapped libraries and files

And often (on systems with a unified kernel/user address space):


I don’t claim to know much about memory lay outs so perhaps there are architectures out there that don’t conform to this. So, assuming that they exist, what would the impact be if p was sufficiently large that an overflow could occur?

There are two primary locations where a 2 byte len could be an issue (I will ignore 1 byte lens because it seems highly unlikely that we will ever run into a problem with those in reality). The first location is whilst reading in the ciphersuites in a ClientHello, and the second is whilst reading in the extensions in the ClientHello.

Here is the (pre-patch) ciphersuite code:

n2s(p, i);

if (i == 0) {
    goto f_err;

/* i bytes of cipher data + 1 byte for compression length later */
if ((p + i + 1) > (d + n)) {
    /* not enough data */
    goto f_err;
if (ssl_bytes_to_cipher_list(s, p, i, &(ciphers)) == NULL) {
    goto err;
p += i;

Here i represents the two byte length read from the peer, p is the pointer into our buffer, and d + n is the end of the buffer. If p + i + 1 overflows then we will end up passing an excessively large i value to ssl_bytes_to_cipher_list(). Analysing that function it can be seen that it will loop over all of the memory from p onwards interpreting it as ciphersuite data bytes and attempting to read its value. This is likely to cause a crash once we overflow (if not before).

The analysis of the extensions code is similar and more complicated, but the outcome is the same. It will loop over the out-of-bounds memory interpreting it as extension data and will eventually crash.

It seems that the only way the above two issues could be exploited is via a DoS.

The final consideration in all of this, is how much of the above is under the control of the attacker. Clearly len is, but not the value of p. In order to exploit this an attacker would have to first find a system which is likely to allocate at the very highest end of address space and then send a high number of requests through until one “got lucky” and a DoS results. A possible alternative approach is to send out many requests to many hosts attempting to find one that is vulnerable.

It appears that such attacks, whilst possible, are unlikely and would be difficult to achieve successfully in reality - with a worst case scenario being a DoS. Given the difficulty of the attack and the relatively low impact we rated this as a low severity issue.

With thanks to Guido Vranken who originally reported this issue.

June 27, 2016 05:00 PM


Updated PhD Thesis Title

Yesterday I posted Latest PhD Thesis Title and Abstract. One of my colleagues Ben Buchanan subsequently contacted me via Twitter and we exchanged a few messages. He prompted me to think about the title.

Later I ruminated on the title of a recent book by my advisor, Dr. Thomas Rid. He wrote Cyber War Will Not Take Place. One of the best parts of the book is the title. In six words you get his argument as succinctly as possible. (It could be five words if you pushed "cyber" and "war" together, but the thought alone makes me cringe, in the age of cyber-everything.)

I wondered if I could transform my latest attempt at a thesis title into something that captured my argument in a succinct form.

I thought about the obsession of the majority of the information security community on the tool and tactics level of war. Too many technicians think about security as a single-exchange contest between an attacker and a defender, like a duel.

That reminded me of a problem I have with Carl von Clausewitz's definition of war.

We shall not enter into any of the abstruse definitions of war used by publicists. We shall keep to the element of the thing itself, to a duel. War is nothing but a duel on an extensive scale.

- On War, Chapter 1

Clausewitz continues by mentioning "the countless number of duels which make up a war," and then makes his famous statement that "War therefore is an act of violence to compel our opponent to fulfill our will." However, I've never liked the tactically-minded idea that war is a "duel."

This concept, plus the goal to deliver a compact argument, inspired me to revise my thesis title and subtitle to the following:

Campaigns, Not Duels: The Operational Art of Cyber Intrusions

In the first three words I deliver my argument, and in the subtitle I provide context by including my key perspective ("operational art"), environment ("cyber," yes, a little part of me is dying, but it's a keyword), and "intrusions."

When I publish the thesis as a book in 2018, I hope to use the same words in the book title.

by Richard Bejtlich ( at June 27, 2016 03:24 PM

Sean's IT Blog

Introducing StacksWare

Who uses this application?  How often are they using it?  Are we getting business value out of the licensing we purchased?  Is our licensing right-sized for our usage or environment?  How do we effectively track application usage in the era of non-persistent virtual desktops with per-user application layers?

These are common questions that the business asks when planning for an end-user computing environment, when planning to add licenses, or when maintenance and support are up for renewal.  They aren’t easy questions to answer either, and while IT can usually see where applications are installed, it’s not always easy to see who is using them and  how often they are being used.

The non-persistent virtual desktop world also has it’s own challenges.  The information required to effectively troubleshoot problems and issues is lost the minute the user logs out of the desktop.

Most tools that cover the application licensing compliance and monitoring space are difficult to set up and use.

I ran into StacksWare in the emerging vendors section at last year’s VMworld, and their goal is to address these challenges.  Their goal is to provide an easy-to-use tool that provides real-time insight into the   The company  was started as a research project at Stanford University that was sponsored by Sanjay Poonen of VMware, and it has received $2 million in venture capital funding from Greylock and Lightspeed.

StacksWare is a cloud-based application that uses an on-premises virtual appliance to track endpoints, users, and applications.  The on-premises portion integrates with vSphere and Active Directory to retrieve information about the environment, and data is presented in a cloud-based web interface.

It offers an HTML5 interface with graphs that update in real-time.  The interface is both fast and attractive, and it does not offer a lot of clutter or distraction.  It’s very well organized as well, and the layout makes it easy to find the information that you’re looking for quickly.


Caption: The StacksWare main menu located on the left side of the interface.


Caption: Application statistics for a user account in my lab.


Caption: Application list and usage statistics.

StacksWare can track both individual application usage and application suite usage.  So rather than having to compile the list of users who are using Word, Excel, and PowerPoint individually, you can track usage by the version of the Microsoft Office Suite that you have installed.  You can also assign license counts, licensing type, and costs associated with licensing.  StacksWare can track this information and generate licensing reports that can be used for compliance or budgetary planning.


Caption: The license management screen

One thing I like about StacksWare’s on-premises appliance is that it is built on Docker.  Upgrading the code on the local appliance is as simple as rebooting it, and it will download and install any updated containers as part of the bootup process.  This simplifies the updating process and ensures that customers get access to the latest code without any major hassles.

One other nice feature of StacksWare is the ability to track sessions and activity within sessions.  StacksWare can show me what applications I’m launching and when I am launching them in my VDI session. 


Caption: Application details for a user session.

For more information on StacksWare, you can check out their website at  You can also see some of their newer features in action over at this YouTube video.  StacksWare will also be at BriForum and VMworld this year, and you can check them out in the vendor solutions area.

by seanpmassey at June 27, 2016 01:45 PM

June 26, 2016

Racker Hacker

My list of must-see sessions at Red Hat Summit 2016

Red Hat Summit 2016The Red Hat Summit starts this week in San Francisco, and a few folks asked me about the sessions that they shouldn’t miss. The schedule can be overwhelming for first timers and it can be difficult at times to discern the technical sessions from the ones that are more sales-oriented.

If you’re in San Francisco, and you want to learn a bit more about using Ansible to manage OpenStack environments, come to the session that I am co-presenting with Robyn Bergeron: When flexibility met simplicity: The friendship of OpenStack and Ansible.

Confirmed good sessions

Here’s a list of sessions that I’ve seen in the past and I still highly recommend:

Dan is also doing a lab for container security called “A practical introduction to container security” that should be really interesting if you enjoy his regular sessions.

Sessions I want to see this year

If you want to talk OpenStack, Ansible, or Rackspace at the Summit, send me something on Twitter. Have a great time!

The post My list of must-see sessions at Red Hat Summit 2016 appeared first on

by Major Hayden at June 26, 2016 10:52 AM

June 24, 2016

Everything Sysadmin

Reminder: Do your homework for next week's LISA Conversations: Russell Pavlicek on "Unleashing the Power of the Unikernel"

This weekend is a good time to watch the video we'll be discussing on the next episode of LISA conversations: Russell Pavlicek's talk from LISA '15 titled Unleashing the Power of the Unikernel.

  • Homework: Watch his talk ahead of time.

Then you'll be prepared when we record the episode on Tuesday, June 28, 2016 at 3:30-4:30 p.m. PT. Register (optional) and watch via this link. Watching live makes it possible to participate in the Q&A.

The recorded episode will be available shortly afterwards on YouTube.

You won't want to miss this!

by Tom Limoncelli at June 24, 2016 03:00 PM

June 23, 2016

Errata Security

Use the freakin' debugger

This post is by a guy who does "not use a debugger". That's stupid. Using a friendly source-level debugger (Visual Studio, XCode, Eclipse) to step line-by-line through working code is what separates the 10x programmers from the wannabes. Yes, it's a bit of a learning hurdle, and creating "project" files for small projects is a bit of a burden, but do it. It'll vastly improve your coding skill.

That post quotes people like Rob Pike saying that stepping line-by-line is a crutch, that instead you should be able to reason about code. And that's true, if you understand what you are doing completely.

But in the real world, you never do. Programmers are constantly forced to stretch and use unfamiliar languages. Worse yet, they are forced to use unfamiliar libraries. Documentation sucks, there's no possible way to understand APIs than to step through code -- either watching the returned values, or compiling their source and stepping into it.

As an experienced programmer, it's true I often don't step through every line. The lines I understand completely, the ones I can fully reason about, I don't bother. But the programmer spends only a small percentage of their time on things they understand -- most of the time spent coding is noodling on the things they don't understand, and that's where the debugger comes in.

And this doesn't even take into account that in the real world, where programmers spend a lot of time working on other people's code. Sometimes the only way to figure it out is to set a breakpoint and run the unit test until it reaches that point.

Programmers fetishize editors. Real programmers, those who produce a lot of code, fetishize debuggers, both the one built into the IDE for debugging working code, and also the specialized tools for diagnosing problems in buggy code.

Seriously, if you are learning to program, learn to use the debugger in the integrated IDE. Step line-by-line through every line of code, until you grok it.. Microsoft's Visual Code is a good system for debugging JavaScript (which is a good starting language to learn). You'll thank me later when you are pulling down seven figures as a 10x programmer.

by Robert Graham ( at June 23, 2016 08:41 AM

Toolsmith Tidbit: XssPy

You've likely seen chatter recently regarding the pilot Hack the Pentagon bounty program that just wrapped up, as facilitated by HackerOne. It should come as no surprise that the most common vulnerability reported was cross-site scripting (XSS). I was invited to participate in the pilot, yes I found and submitted an XSS bug, but sadly, it was a duplicate finding to one already reported. Regardless, it was a great initiative by DoD, SecDef, and the Defense Digital Service, and I'm proud to have been asked to participate. I've spent my share of time finding XSS bugs and had some success, so I'm always happy when a new tool comes along to discover and help eliminate these bugs when responsibly reported.
XssPy is just such a tool.
A description as paraphrased from it's Github page:
XssPy is a Python tool for finding Cross Site Scripting vulnerabilities. XssPy traverses websites to find all the links and subdomains first, then scans each and every input on each and every page discovered during traversal.
XssPy uses small yet effective payloads to search for XSS vulnerabilities.
The tool has been tested in parallel with commercial vulnerability scanners, most of which failed to detect vulnerabilities that XssPy was able to find. While most paid tools typically scan only one site, XssPy first discovers sub-domains, then scans all links.
XssPy includes:
1) Short Scanning
2) Comprehensive Scanning
3) Subdomain discovery
4) Comprehensive input checking
XssPy has discovered cross-site scripting vulnerabilities in the websites of MIT, Stanford, Duke University, Informatica, Formassembly, ActiveCompaign, Volcanicpixels, Oxford, Motorola, Berkeley, and many more.

Install as follows:
git clone /opt/xsspy
Python 2.7 is required and you should have mechanize installed. If mechanize is not installed, type pip install mechanize in the terminal.

Run as follows:
python (no http:// or www).

Let me know what successes you have via email or Twitter and let me know if you have questions (russ at holisticinfosec dot org or @holisticinfosec).
Cheers…until next time.

by Russ McRee ( at June 23, 2016 06:46 AM

June 22, 2016

Errata Security

Reverse Turing testing tech support

So I have to get a new Windows license for a new PC. Should I get Windows 10 Home or Windows 10 Professional? What's the difference?

So I google the question, which gives me this website:

Ooh, a button that says "Download Table". That's exactly what I want -- a technical list without all the fluff. I scroll down to the parts that concern me, like encryption.

Wait, what? What's the difference between "Device Encryption" and "BitLocker"? I though BitLocker was Device Encryption?? Well, the purchase screen for Windows 10 has this friendly little pop out offering to help. Of course, as a techy, I know that such things are worse than useless, but I haven't tried one in a while, so I thought if I'd see if anything changed.

So up pops a chat window and we start chatting:

So at first he says they are the same. When I press him on the difference, he then admits they are different. He can't read the document I'm reading, because it's on a non-Microsoft "third party" site. While it's true it's on "", that's still a Microsoft site, but apparently he's not allowed to access it. I appears Microsoft firewalls their access to the Internet so jerks like me can't social engineer them.

So he goes on to find other differences:

At this point, he's acting as a Markov bot, searching Microsoft's internal site with the terms I give him, then selecting random phrases to spew back at me, with no understanding. Support for TPM has nothing to do with the difference.

Finally, he admits he can't answer the question, and offers to send me to more technical people:

I know this isn't going to work, but I'm in this far, and already planning to write a blog post, I continue the game.

At this point, I've learned to be more precise in my questioning:

It takes him awhile to research the answer. During this time, with more careful sleuthing with Google, I find the real answer (below). But eventually he comes back with this:

Like the previous person, it's all gibberish. He looked up my phrases, then spewed back random sampling of tech terms.

So I did figure eventually what "Device Encryption" was. It's described in this Ars Technica post. It's designed for when Windows is installed on tablet-style devices -- things that look more like an iPad and less like a notebook computer. It has strict hardware requirements, so it's probably not going to work on a desktop or even laptop computer. It requires a TPM, SSD, no fans (at least while sleeping), and so on.

The tl;dr is for people in my situation with desktop computer, Win10-Home's "Device Encryption" won't work -- it only works for tablets. If I want full disk encryption on my desktop, I'll need "Win10-Pro".

The idea is that in the future, tech support will be replaced AI bots that use natural language processing to answer questions like. But that's what we already have: tech support search text, finds plausible answers they don't understand, and regurgitates them back at us.

In other words, when the Turing Test is finally won, it's going to be in tech support, where a well-designed bot will outperform humans on answering such questions.

by Robert Graham ( at June 22, 2016 04:27 AM

June 20, 2016

Use the Nitrokey HSM or SmartCard-HSM with mod_nss and Apache

This is a guide on using the Nitrokey HSM with mod_nss and the Apache webserver. The HSM allows you to store the private key for a SSL certificate inside the HSM (instead of on the filesystem), so that it can never leave the device and thus never be stolen. The guide covers the installation and configuration of mod_nss, coupling the HSM to NSS, generating the keys and configuring Apache, and last but not least we also do some benchmarks on Apache with the HSM and different key sizes.

June 20, 2016 12:00 AM

June 19, 2016

Errata Security

Tesla review: What you need to know about charging

Before you buy an electric car, you need to understand charging. It’s a huge deal. You think it works almost like filling the gas tank. It doesn’t. Before going on long trips, you first need to do math and a bit of planning.

The Math

Like BMW model numbers indicate engine size, Tesla model numbers indicate the size of the battery, so my "Tesla S P90D" has a 90kwh (killowatt-hour) battery, with a 286mile range. Their lowest end model is the “Tesla S 60”, which has a 60kwh hour battery, or a 208mile advertised range.

In the United States, a typical plug is a 120volt circuit with a maximum of 15amps. Doing the math, this is how long it’ll take for me to recharge the battery:

That’s right, 1.4 days (or 2.1 days for a 90kwh car). This is the absolute worse case scenario, mind you, but it demonstrates that you have to pay attention to charging. You can't simply drive up to a station, fill up the tank in a couple minutes, and drive away.

Let’s say you live in Austin, Texas, and you have a meeting in Dallas. You think that you can drive up to Dallas in your new Tesla S 60, let the car charge while you are in the meeting, and then drive home. Or, maybe you have dinner there, letting the car charge longer. Or maybe you even stay overnight.

Nope, even 24 hours later, you still might not have enough charge left to get home. At 195 miles, it's at the range of the 60kwh battery, which would take more than a day to recharge using a normal electric circuit.

Faster Charging

That was a worst case scenario. Luckily, you probably won’t be charging using a normal 120volt/15amp circuit. That’s just the emergency backup if all else fails.

In your home, for high-watt devices like ovens, air conditioners, and clothes dryers, you have higher wattage circuits. The typical max in your home will be a 240volt/50amp circuit. It has a different power connector than a normal circuit, thicker wires, and so forth. Doing the math on this sucker, you get:

For our 190 mile drive, then, you can except to drive to Dallas, charge during the meeting and dinner for 5 hours, then you’ll have enough juice to get back home.

When you buy a Tesla, the first thing you’ll do is hire and electrician, and for $1000 to $5000, pay them to install this high-end circuit in your garage or car port. Since you garage is usually where the circuit breaker is located anyway, it’s usually the low-end of this range. You have to choose either the NEMA 14-50 plug, which can be used to power any electric car, or the Tesla HPWC (“High Power Wall Charger”) that just bundles the cord and everything together, making it easier to charge. Just back into your garage, get out of the car, pull off the cord, and plug it in. Real simple.

Standard NEMA 14-50 plug.
Different layout so you don't accidentally plug the wrong thing into it and blow a circuit.
Tesla proprietary wall charger.
Now for our trip to Dallas, though, we have a problem. While we can get the right charging circuit at home, we might not be able to find one on the road. How common can they possibly be? They sound like they'll be hard to find.

Well, no. Electric cars have become a thing. People are into them, unnaturally so. Even if you haven't noticed they EV (electric vehicle) plugs around you, they are everywhere. People are way ahead of you one this.

In our story of driving to Dallas, the first thing you'll do is visit, and lookup where to find charging stations. You'll find a ton of them, even in oil-rich cities like Dallas:

On the left coast, like California, it's insane. Chances are if you go to a business meeting, you'll find one in the parking lot, if not that building, then one next door. Drive in, go to the meeting, have some drinks or dinner afterwards, and you'll be able to drive home on a full charge.

Note that these charging stations primarily use the J1772 plug, a standard that all electric cars support. Your car comes with the standard electrical plug, the NEMA 14-050, and a J1772, so you can use any.

Also note that these charging stations are either run for profit, or part of a network. Even if the charging station is free, you still have to become a member. The most popular network nationwide is ChargePoint, but which one is most popular in a city varies. You may have to join a couple networks (I've just joined ChargePoint -- they have a free one down the street in a park, so I drive there and go for a walk and suck free juice).

These are sort of a franchise network. Somebody owns the parking space. They sign up with a network like ChargePoint, buy their unit and pay for installation, then get payments back from ChargePoint when they use the parking space. Since some businesses want to encourage you to visit, they don't charge you.

Ideally, all these charging stations should deliver max power. In practice, they are usually a bit weaker. Luckily, you can read people's reviews online and figure that out before you go.

One thing I want to stress is there is that charging is parking. The cost of electricity is negligible, that's not what they are charging your for. Instead, they charge your for time. Almost always, you'll be charge for how much time you leave your car parked there, not how much power you use.

As a Tesla owner, you can use these plugs, but also special Tesla plugs. Tesla makes up to two HPWC chargers available for free to business owners, especially hotels. They call this "destination charging", because you charge once you reach your destination. This is rather awesome as a Tesla owner, that you get vastly more options to charge than normal electric cars.

Level 1 and Level 2 charging

When you look at a charging map, like this one from ChargePoint, you'll see it mentions different "levels" of charging. What does this mean?

Level 1 means the standard 120volt/15amp, 1.8kw plug that you have for all electrical devices. Business (or even homes) that have an external plug will often put themselves on the map, but you don't care, since the charging at this level is so slow.

Level 2 means anything faster than Level 1, using the J1772 connector usually. There are a wide range of charging speeds. Original Nissan Leaf could only charge at 3.3kw, so a lot are in that range. More cars can deal with 6.6kw, so some are in that range. Only the Tesla and a Mercedes model go to the full 10kw, so many chargers don't support that much juice.

"Level 2 Tesla", in the map above means the HPWC mentioned above. They have appeared in the last 6 months as Tesla has aggressively pushed them to businesses, though it's usually just hotels. These may be 10kw (40amp), but may go up to 20kw (80amp). Note your car can only handle the 10kw/40amp speeds unless you upgrade to 20kw/80amp dual-charger.

"Level 2 NEMA" didn't use to be on the charging maps 6 months ago, but have appeared now. From what I can tell, a big reason is that when businesses put in a Tesla HPWC, they also put in the NEMA plug because it's really cheap, and allows them to attract more cars than just Teslas (though many don't come standard with that plug). Another reason this exists is because camping parks usually have the plug. You drive in with your campter/trailer, then hook up to the local electricity with this plug. You can also drive in with your Tesla. Indeed, the back of your car is big enough to sleep in.

The next options I'm going to describe below.

DC Fast Charging

Electricity from our walls is AC, or alternating current. Batteries, however, deal only with DC, or direct current. When you plug in your car, a device inside called the charger must convert the current. That's why cars have limitations on how much juice they consume, they are limited by the charger's capacity.

An alternative is to place the charger outside the car, where it can be much bigger, and feed DC direct current to the car. There is a lot of complexity here, because the car's computers need to talk to the external charger many times a second in order to adjust the flow of current. It's as much as a computer networking protocol as it is a copper connection for power.

If you look at your car, you'll see that the regenerative braking often charges the battery at 50kw, which is already many times faster than the AC Level 2 chargers mentioned above. We know the battery pack can handle it. Indeed, according to Tesla, SuperChargers can charge at a rate of 120kw, or 170 miles driving range in 30 minutes of charging.

Tesla has a network of SuperChargers placed along freeways in the United States so that you can drive across country. The idea is that every couple hours, you stop for 30 minutes, relax, drink some coffee, take a pee, and charge your car for the next stage.

In our driving scenario above, there's a SuperCharger in Waco, halfway between Austin and Dallas with 8 stalls available:

From Waco, it’s 76 miles to Dallas, meaning if you fully charge there, you can make it to Dallas and back to Waco without recharging – though it’s cutting it a bit close with the 60kwh model. Though, if your destination is the east side of Dallas, then maybe going through Corsicana and using their SuperCharger would be easier.

The SuperCharger is Tesla's solution to the problem but there are two other standards. There is the Asian standard known as CHAdeMO, and the European standard often called the COMBO adapter, as the DC component is combined with the J1772 AC standard. The CHAdeMO was the early standard coming with Japanese cars, but the COMBO adapter appears to be the winning standard in the long run. Tesla doesn't support either of these standards. These other standards are also far behind the Tesla in charging speed, maxing out around 60kw (and usually less), whereas the Tesla does 120kw.

As I mentioned, the direct DC charging dumps power into the battery as fast as the battery can take it -- which means it's the battery that now becomes the bottleneck. As you know from experience charging phones and laptops, charging to 50% full, but it seems to take forever to get from 90% to 100%. The same is true of your Tesla.

So here's a trick. The 60kwh car actually ships with the 75kwh battery. Tesla is allows you to "upgrade" your car later to the higher range for $9k, which will consist of just Tesla turning on a switch enabling the entire battery. But, for SuperCharging, it's still a 75kwh battery. 60 is 80% of 75. That means, you can charge to 100% full in 40minutes rather than 75minutes.

Charging to 100% lowers the lifetime of the battery, to Tesla recommends you only charge to 80% anyway. Again, with the 60kwh battery, charging to 100% means only 80% anyway. Thus, you get most of the benefits of a larger battery without paying for it. If you truly want extra range, you might consider the 90kwh upgrade instead of going from 60kwh to 75kwh.

The new Model 3s that come out in a few years won't have free SuperCharger access. That may also be true of the cheapest Model S (they keep changing this). You should check on this.

Off-peak Charging

Among the many revolutions recently has been smart electrical meters. They can now bill you by time-of-day when you consume power, charging you more during peak hours, and less during off-peak. If you haven't been paying attention to this, you are still probably on the old billing plan. You might want to look into changing.

Where I live, they charge $0.01 (one cent) per kilowatt-hour during off-off-peak, between 11pm and 7am. That's insane, compared to $0.20 elsewhere in the country.

Your Tesla can be configured to when it charges, so you get home, plug it in, and it won't start charging right away, but wait until off-peak or off-off-peak to start charging.


Electric charging math has given me a new appreciation for the power of gasoline. Filling up your gas tank is the equivalent of charging at multiple megawatt speeds – something electric cars will never get close to. Driving at the range limits of the car requires planning – you just can’t do it without doing some math.

by Robert Graham ( at June 19, 2016 03:17 AM

HTTP Strict Transport Security for Apache, NGINX and Lighttpd

HTTP Strict Transport Security (often abbreviated as HSTS) is a security feature that lets a web site tell browsers that it should only be communicated with using HTTPS, instead of using HTTP. This tutorial will show you how to set up HSTS in Apache2, NGINX and Lighttpd.

June 19, 2016 12:00 AM

June 18, 2016

How I configure a docker host with CFEngine

DockerAfter some lengthy busy times I’ve been able to restart my work on Docker. Last time I played with some containers to create a Consul cluster using three containers running on the same docker host — something you will never want to do in production.

And the reason why I was playing with a Consul cluster on docker was that you need a key/value store to play with overlay networks in Docker, and Consul is one of the supported stores. Besides, Consul is another technology I wanted to play with since the first minute I’ve known it.

To run an overlay network you need more than one Docker host otherwise it’s pretty pointless. That suggested me that it was time to automate the installation of a Docker host, so that I could put together a test lab quickly and also maintain it. And, as always, CFEngine was my friend. The following policy will not work out of the box for you since it uses a number of libraries of mine, but I’m sure you’ll get the idea.

bundle agent manage_docker(docker_users_list)
# This bundle
# - configures the official apt repo for docker
# - installs docker from the official repositories
# - adds @(docker_users_list) to the docker group
# - ensures the service is running
        slist => { "deb debian-$(inventory_lsb.codename) main" },
        comment => "APT source for docker, based on current Debian distro" ;

        slist => {
        comment => "Packages to ensure on the system" ;

      "report docker"
        usebundle => report("$(this.bundle)","bundle","INFO",
                            "Managing docker installation on this node") ;

      # Installation instructions from
        usebundle => add_local_apt_key("$(this.promise_dirname)/docker.key"),
        comment => "Ensure the apt key for docker is up to date" ;

      "configure docker's apt source"
        usebundle => add_apt_sourcefile("docker","@(manage_docker.source)"),
        comment => "Ensure the apt source for docker is up to date" ;

      "install docker"
        usebundle => install_packs("@(manage_docker.packages)"),
        comment => "Ensure all necessary packages are installed" ;

      "add users to docker group"
        usebundle => add_user_to_system_group("$(docker_users_list)",
        comment => "Ensure user $(docker_users_list) in group docker" ;

      "watch docker service"
        usebundle => watch_service("docker",                 # service name
                                   "/usr/bin/docker daemon", # process name
                                   "up"),                    # desired state
        comment => "Ensure the docker service is up and running" ;

It’s basically the automation of the installation steps for Debian Jessie from the official documentation. Neat, isn’t it?

by bronto at June 18, 2016 09:48 PM

June 17, 2016

Sean's IT Blog

Full-Stack Engineering Starts With A Full-Stack Education

A few weeks ago, my colleague Eric Shanks wrote a good piece called “So You Wanna Be  A Full Stack Engineer…”  In the post, Eric talks about some the things you can expect when you get into a full-stack engineering role. 

You may be wondering what exactly a full-stack engineer is.  It’s definitely a hot buzzword.  But what does someone in a full-stack engineering role do?  What skills do they need to have? 

The closest definition I can find for someone who isn’t strictly a developer is “an engineer who is responsible for an entire application stack including databases, application servers, and hardware.”  This is an overly broad definition, and it may include things like load balancers, networking, and storage.  It can also include software development.  The specifics of the role will vary by environment – one company’s full-stack engineer may be more of an architect while another may have a hands-on role in managing all aspects of an application or environment.  The size and scope of the environment play a role too – a sysadmin or developer in a smaller company may be a full-stack engineer by default because the team is small.

However that role shakes out, Eric’s post gives a good idea of what someone in a full-stack role can expect.

Sounds fun, doesn’t it?  But how do you get there? 

In order to be a successful full-stack engineer, you need to get a full-stack education.  A full-stack education is a broad, but shallow, education in multiple areas of IT.    Someone in a full-stack engineering role needs to be able to communicate effectively with subject-matter experts in their environments, and that requires knowing the fundamentals of each area and how to identify, isolate, and troubleshoot issues. 

A full-stack engineer would be expected to have some knowledge of the following areas:

  • Virtualization: Understanding the concepts of a hypervisor, how it interacts with networking and storage, and how to troubleshoot using some basic tools for the platform
  • Storage: Understanding the basic storage concepts for local and remote (SAN/NAS) storage and how it integrates with the environment.  How to troubleshoot basic storage issues.
  • Network: Understand the fundamentals of TCP/IP, how the network interacts with the environment,and the components that exist on the network.  How to identify different types of networking issues using various built-in tools like ping, traceroute, and telnet.
    Networking, in particular, is a very broad topic that has many different layers where problems can occur.  Large enterprise networks can also be very complex when VLANs, routing, load balancers and firewalls are considered.  A full-stack engineer wouldn’t need to be an expert on all of the different technologies that make up the network, but one should have enough networking experience to identify and isolate where potential problems can occur so they can communicate with the various networking subject matter experts to resolve the issue.
  • Databases: Understand how the selected database technology works, how to query the database, and how troubleshoot performance issues using built-in or 3rd-party database tools.  The database skillset can be domain-specific as there are different types of databases (SQL vs. No-SQL vs. Column-Based vs. Proprietary), differences in the query languages between SQL database vendors (Oracle, MS SQL Server, and PostgreSQL) and how the applications interact with the database.
  • Application Servers: Understand how to deploy, configure, and troubleshoot the application servers.  Two of the most important things to note are the location of log files and how it interacts with the database (ie – can the application automatically reconnect to the database or does it need to be restarted). This can also include a middleware tier that sits between the application servers  The skills required for managing the middleware tier will greatly depend on the application, the underlying operating system, and how it is designed. 
  • Operating Systems: Understand the basics of the operating system, where to find the logs, and how to measure performance of the system.
  • Programming: Understand the basics of software development and tools that developers use for maintaining code and tracking bugs.  Not all full-stack engineers develop code, but one might need to check out production ready code to deploy it into development and production environments.  Programming skills are also useful for automating tasks, and this can lead to faster operations and fewer mistakes.

Getting A Full-Stack Education

So how does one get exposure and experience in all of the areas above?

There is no magic bullet, and much of the knowledge required for a full-stack engineering role is gained through time and experience.  There are a few ways you can help yourself, though.

  • Get a theory-based education: Get into an education program that covers the fundamental IT skills from a theory level.  Many IT degree programs are hands-on with today’s hot technologies, and they may not cover the business and technical theory that will be valuable across different technology silos.  Find ways to apply that theory outside of the classroom with side projects or in the lab.
  • Create learning opportunities: Find ways to get exposure to new technology.  This can be anything from building a home lab and learning technology hands-on to shadowing co-workers when they perform maintenance.
  • Be Curious: Don’t be afraid to ask questions, pick up a book, or read a blog on something you are interested in and want to learn.

by seanpmassey at June 17, 2016 08:28 PM

Get started with the Nitrokey HSM or SmartCard-HSM

This is a guide to get started with the NitroKey HSM (or SmartCard-HSM). It covers what a HSM is and what it can be used for. It also goes over software installation and initializing the device, including backups of the device and the keys. Finally we do some actual crypto operatons via pkcs11, OpenSSL, Apache and OpenSSH.

June 17, 2016 12:00 AM

June 16, 2016

Racker Hacker

New SELinux shirts are available

With the upcoming Red Hat Summit 2016 in San Francisco almost upon us, I decided to update the old SELinux shirts with two new designs:

Make SELinux enforcing again shirtsetenforce 1 shirt 2016

You can buy these now over at Spreadshirt! There are styles there for men and women and I’ve priced them as low as the store will allow.

Spreadshirt is also running a sale for 15% off T-shirts until June 21st with the code TSHIRT16. Let’s make SELinux enforcing again!

The post New SELinux shirts are available appeared first on

by Major Hayden at June 16, 2016 09:22 PM

June 15, 2016

Cryptography Engineering

What is Differential Privacy?

Yesterday at the WWDC keynote, Apple announced a series of new security and privacy features, including one feature that's drawn a bit of attention -- and confusion. Specifically, Apple announced that they will be using a technique called "Differential Privacy" (henceforth: DP) to improve the privacy of their data collection practices.

The reaction to this by most people has been a big "???", since few people have even heard of Differential Privacy, let alone understand what it means. Unfortunately Apple isn't known for being terribly open when it comes to sharing the secret sauce that drives their platform, so we'll just have to hope that at some point they decide to publish more. What we know so far comes from Apple's iOS 10 Preview guide:
Starting with iOS 10, Apple is using Differential Privacy technology to help discover the usage patterns of a large number of users without compromising individual privacy. To obscure an individual’s identity, Differential Privacy adds mathematical noise to a small sample of the individual’s usage pattern. As more people share the same pattern, general patterns begin to emerge, which can inform and enhance the user experience. In iOS 10, this technology will help improve QuickType and emoji suggestions, Spotlight deep link suggestions and Lookup Hints in Notes.
To make a long story short, it sounds like Apple is going to be collecting a lot more data from your phone. They're mainly doing this to make their services better, not to collect individual users' usage habits. To guarantee this, Apple intends to apply sophisticated statistical techniques to ensure that this aggregate data -- the statistical functions it computes over all your information -- don't leak your individual contributions. In principle this sounds pretty good. But of course, the devil is always in the details.

While we don't have those details, this seems like a good time to at least talk a bit about what Differential Privacy is, how it can be achieved, and what it could mean for Apple -- and for your iPhone.

The motivation

In the past several years, "average people" have gotten used to the idea that they're sending a hell of a lot of personal information to the various services they use. Surveys also tell us they're starting to feel uncomfortable about it.

This discomfort makes sense when you think about companies using our personal data to market (to) us. But sometimes there are decent motivations for collecting usage information. For example, Microsoft recently announced a tool that can diagnose pancreatic cancer by monitoring your Bing queries. Google famously runs Google Flu Trends. And of course, we all benefit from crowdsourced data that improves the quality of the services we use -- from mapping applications to restaurant reviews.

Unfortunately, even well-meaning data collection can go bad. For example, in the late 2000s, Netflix ran a competition to develop a better film recommendation algorithm. To drive the competition, they released an "anonymized" viewing dataset that had been stripped of identifying information. Unfortunately, this de-identification turned out to be insufficient. In a well-known piece of work, Narayanan and Shmatikov showed that such datasets could be used to re-identify specific users -- and even predict their political affiliation! -- if you simply knew a little bit of additional information about a given user.

This sort of thing should be worrying to us. Not just because companies routinely share data (though they do) but because breaches happen, and because even statistics about a dataset can sometimes leak information about the individual records used to compute it. Differential Privacy is a set of tools that was designed to address this problem.

What is Differential Privacy?

Differential Privacy is a privacy definition that was originally developed by Dwork, Nissim, McSherry and Smith, with major contributions by many others over the years. Roughly speaking, what it states can summed up intuitively as follows:
Imagine you have two otherwise identical databases, one with your information in it, and one without it. Differential Privacy ensures that the probability that a statistical query will produce a given result is (nearly) the same whether it's conducted on the first or second database.
One way to look at this is that DP provides a way to know if your data has a significant effect on the outcome of a query. If it doesn't, then you might as well contribute to the database -- since there's almost no harm that can come of it. Consider a silly example:

Imagine that you choose to enable a reporting feature on your iPhone that tells Apple if you like to use the 💩  emoji routinely in your iMessage conversations. This report consists of a single bit of information: 1 indicates you like 💩 , and 0 doesn't. Apple might receive these reports and fill them into a huge database. At the end of the day, it wants to be able to derive a count of the users who like this particular emoji.

It goes without saying that the simple process of "tallying up the results" and releasing them does not satisfy the DP definition, since computing a sum on the database that contains your information will potentially produce a different result from computing the sum on a database without it. Thus, even though these sums may not seem to leak much information, they reveal at least a little bit about you. A key observation of the Differential Privacy research is that in many cases, DP can be achieved if the tallying party is willing to add random noise to the result. For example, rather than simply reporting the sum, the tallying party can inject noise from a Laplace or gaussian distribution, producing a result that's not quite exact -- but that masks the contents of any given row. (For other interesting functions, there are many other techniques as well.) 

Even more usefully, the calculation of "how much" noise to inject can be made without knowing the contents of the database itself (or even its size). That is, the noise calculation can be performed based only on knowledge of the function to be computed, and the acceptable amount of data leakage. 

A tradeoff between privacy and accuracy

Now obviously calculating the total number of 💩 -loving users on a system is a pretty silly example. The neat thing about DP is that the same overall approach can be applied to much more interesting functions, including complex statistical calculations like the ones used by Machine Learning algorithms. It can even be applied when many different functions are all computed over the same database.

But there's a big caveat here. Namely, while the amount of "information leakage" from a single query can be bounded by a small value, this value is not zero. Each time you query the database on some function, the total "leakage" increases -- and can never go down. Over time, as you make more queries, this leakage can start to add up.

This is one of the more challenging aspects of DP. It manifests in two basic ways:
  1. The more information you intend to "ask" of your database, the more noise has to be injected in order to minimize the privacy leakage. This means that in DP there is generally a fundamental tradeoff between accuracy and privacy, which can be a big problem when training complex ML models.
  2. Once data has been leaked, it's gone. Once you've leaked as much data as your calculations tell you is safe, you can't keep going -- at least not without risking your users' privacy. At this point, the best solution may be to just to destroy the database and start over. If such a thing is possible.
The total allowed leakage is often referred to as a "privacy budget", and it determines how many queries will be allowed (and how accurate the results will be). The basic lesson of DP is that the devil is in the budget. Set it too high, and you leak your sensitive data. Set it too low, and the answers you get might not be particularly useful.

Now in some applications, like many of the ones on our iPhones, the lack of accuracy isn't a big deal. We're used to our phones making mistakes. But sometimes when DP is applied in complex applications, such as training Machine Learning models, this really does matter.

Mortality vs. info disclosure, from Frederikson et al.
The red line is partient mortality.
To give an absolutely crazy example of how big the tradeoffs can be, consider this paper by Frederikson et al. from 2014. The authors began with a public database linking Warfarin dosage outcomes to specific genetic markers. They then used ML techniques to develop a dosing model based on their database -- but applied DP at various privacy budgets while training the model. Then they evaluated both the information leakage and the model's success at treating simulated "patients".

The results showed that the model's accuracy depends a lot on the privacy budget on which it was trained. If the budget is set too high, the database leaks a great deal of sensitive patient information -- but the resulting model makes dosing decisions that are about as safe as standard clinical practice. On the other hand, when the budget was reduced to a level that achieved meaningful privacy, the "noise-ridden" model had a tendency to kill its "patients". 

Now before you freak out, let me be clear: your iPhone is not going to kill you. Nobody is saying that this example even vaguely resembles what Apple is going to do on the phone. The lesson of this research is simply that there are interesting tradeoffs between effectiveness and the privacy protection given by any DP-based system -- these tradeoffs depend to a great degree on specific decisions made by the system designers, the parameters chosen by the deploying parties, and so on. Hopefully Apple will soon tell us what those choices are.

How do you collect the data, anyway?

You'll notice that in each of the examples above, I've assumed that queries are executed by a trusted database operator who has access to all of the "raw" underlying data. I chose this model because it's the traditional model used in most of the literature, not because it's a particularly great idea.

In fact, it would be worrisome if Apple was actually implementing their system this way. That would require Apple to collect all of your raw usage information into a massive centralized database, and then ("trust us!") calculate privacy-preserving statistics on it. At a minimum this would make your data vulnerable to subpoenas, Russian hackers, nosy Apple executives and so on.

Fortunately this is not the only way to implement a Differentially Private system. On the theoretical side, statistics can be computed using fancy cryptographic techniques (such as secure multi-party computation or fully-homomorphic encryption.) Unfortunately these techniques are probably too inefficient to operate at the kind of scale Apple needs. 

A much more promising approach is not to collect the raw data at all. This approach was recently pioneered by Google to collect usage statistics in their Chrome browser. The system, called RAPPOR, is based on an implementation of the 50-year old randomized response technique. Randomized response works as follows:
  1. When a user wants to report a piece of potentially embarrassing information (made up example: "Do you use Bing?"), they first flip a coin, and if the coin comes up "heads", they return a random answer -- calculated by flipping a second coin. Otherwise they answer honestly.
  2. The server then collects answers from the entire population, and (knowing the probability that the coins will come up "heads"), adjusts for the included "noise" to compute an approximate answer for the true response rate.
Intuitively, randomized response protects the privacy of individual user responses, because a "yes" result could mean that you use Bing, or it could just be the effect of the first mechanism (the random coin flip). More formally, randomized response has been shown to achieve Differential Privacy, with specific guarantees that can adjusted by fiddling with the coin bias. 

 I've met Craig Federighi. He actually
looks like this in person.
RAPPOR takes this relatively old technique and turns it into something much more powerful. Instead of simply responding to a single question, it can report on complex vectors of questions, and may even return complicated answers, such as strings -- e.g., which default homepage you use. The latter is accomplished by first encoding the string into a Bloom filter -- a bitstring constructed using hash functions in a very specific way. The resulting bits are then injected with noise, and summed, and the answers recovered using a (fairly complex) decoding process.

While there's no hard evidence that Apple is using a system like RAPPOR, there are some small hints. For example, Apple's Craig Federighi describes Differential Privacy as "using hashing, subsampling and noise injection to enable…crowdsourced learning while keeping the data of individual users completely private." That's pretty weak evidence for anything, admittedly, but presence of the "hashing" in that quote at least hints towards the use of RAPPOR-like filters.

The main challenge with randomized response systems is that they can leak data if a user answers the same question multiple times. RAPPOR tries to deal with this in a variety of ways, one of which is to identify static information and thus calculate "permanent answers" rather than re-randomizing each time. But it's possible to imagine situations where such protections could go wrong. Once again, the devil is very much in the details -- we'll just have to see. I'm sure many fun papers will be written either way.

So is Apple's use of DP a good thing or a bad thing?

As an academic researcher and a security professional, I have mixed feelings about Apple's announcement. On the one hand, as a researcher I understand how exciting it is to see research technology actually deployed in the field. And Apple has a very big field.

On the flipside, as security professionals it's our job to be skeptical -- to at a minimum demand people release their security-critical code (as Google did with RAPPOR), or at least to be straightforward about what it is they're deploying. If Apple is going to collect significant amounts of new data from the devices that we depend on so much, we should really make sure they're doing it right -- rather than cheering them for Using Such Cool Ideas. (I made this mistake already once, and I still feel dumb about it.)

But maybe this is all too "inside baseball". At the end of the day, it sure looks like Apple is honestly trying to do something to improve user privacy, and given the alternatives, maybe that's more important than anything else.  

by Matthew Green ( at June 15, 2016 02:51 PM

June 14, 2016

Anton Chuvakin - Security Warrior

June 12, 2016

Sarah Allen

sailsjs testing: how to truncate the database

SailsJS is a NodeJS MVC framework that we use for the OpenOpps Platform. Sails has some basic testing docs, but it doesn’t explain how to set up the framework nicely where the database is dropped in between tests. I find myself always re-figuring out these patterns when I write experimental apps.

With an in-memory database, this bootstrap.test.js will drop the database between tests:

var sails = require('sails');

before(function(done) {
    // test config
    environment: 'test',
    port: 9999,   // so we can run the app and tests at the same time
    hostName: 'localhost:9999',
    connections: {
      testDB: {
        adapter: 'sails-memory'
    models: {
      connection: 'testDB'
  }, function(err, server) {
    if (err) return done(err);
    done(err, sails);

after(function(done) {

beforeEach(function(done) {
  // Drops database between each test.  This works because we use
  // the memory database
  sails.once('hook:orm:reloaded', done);

When I need to use postgres, Waterline doesn’t expose SQL truncate which would be much faster, instead this bootstrap.test.js will destroy all the models:

var async = require('async');
var sails = require('sails');

before(function(done) {
    // test config
    environment: 'test',
    port: 9999,   // so we can run the app and tests at the same time
    hostName: 'localhost:9999',
    models: {
      connection: 'postgresql'
  }, function(err, server) {
    if (err) return done(err);
    done(err, sails);

afterEach(function(done) {
  destroyFuncs = [];
  for (modelName in sails.models) {
    destroyFuncs.push(function(callback) {
      .exec(function(err) {
        callback(null, err)
  async.parallel(destroyFuncs, function(err, results) {

after(function(done) {

The post sailsjs testing: how to truncate the database appeared first on the evolving ultrasaurus.

by sarah at June 12, 2016 02:10 AM

June 11, 2016

Feeding the Cloud

Cleaning up obsolete config files on Debian and Ubuntu

As part of regular operating system hygiene, I run a cron job which updates package metadata and looks for obsolete packages and configuration files.

While there is already some easily available information on how to purge unneeded or obsolete packages and how to clean up config files properly in maintainer scripts, the guidance on how to delete obsolete config files is not easy to find and somewhat incomplete.

These are the obsolete conffiles I started with:

$ dpkg-query -W -f='${Conffiles}\n' | grep 'obsolete$'
 /etc/apparmor.d/abstractions/evince ae2a1e8cf5a7577239e89435a6ceb469 obsolete
 /etc/apparmor.d/tunables/ntpd 5519e4c01535818cb26f2ef9e527f191 obsolete
 /etc/apparmor.d/usr.bin.evince 08a12a7e468e1a70a86555e0070a7167 obsolete
 /etc/apparmor.d/usr.sbin.ntpd a00aa055d1a5feff414bacc89b8c9f6e obsolete
 /etc/bash_completion.d/initramfs-tools 7eeb7184772f3658e7cf446945c096b1 obsolete
 /etc/bash_completion.d/insserv 32975fe14795d6fce1408d5fd22747fd obsolete
 /etc/dbus-1/system.d/com.redhat.NewPrinterNotification.conf 8df3896101328880517f530c11fff877 obsolete
 /etc/dbus-1/system.d/com.redhat.PrinterDriversInstaller.conf d81013f5bfeece9858706aed938e16bb obsolete

To get rid of the /etc/bash_completion.d/ files, I first determined what packages they were registered to:

$ dpkg -S /etc/bash_completion.d/initramfs-tools
initramfs-tools: /etc/bash_completion.d/initramfs-tools
$ dpkg -S /etc/bash_completion.d/insserv
initramfs-tools: /etc/bash_completion.d/insserv

and then followed Paul Wise's instructions:

$ rm /etc/bash_completion.d/initramfs-tools /etc/bash_completion.d/insserv
$ apt install --reinstall initramfs-tools insserv

For some reason that didn't work for the /etc/dbus-1/system.d/ files and I had to purge and reinstall the relevant package:

$ dpkg -S /etc/dbus-1/system.d/com.redhat.NewPrinterNotification.conf
system-config-printer-common: /etc/dbus-1/system.d/com.redhat.NewPrinterNotification.conf
$ dpkg -S /etc/dbus-1/system.d/com.redhat.PrinterDriversInstaller.conf
system-config-printer-common: /etc/dbus-1/system.d/com.redhat.PrinterDriversInstaller.conf

$ apt purge system-config-printer-common
$ apt install system-config-printer

The files in /etc/apparmor.d/ were even more complicated to deal with because purging the packages that they come from didn't help:

$ dpkg -S /etc/apparmor.d/abstractions/evince
evince: /etc/apparmor.d/abstractions/evince
$ apt purge evince
$ dpkg-query -W -f='${Conffiles}\n' | grep 'obsolete$'
 /etc/apparmor.d/abstractions/evince ae2a1e8cf5a7577239e89435a6ceb469 obsolete
 /etc/apparmor.d/usr.bin.evince 08a12a7e468e1a70a86555e0070a7167 obsolete

I was however able to get rid of them by also purging the apparmor profile packages that are installed on my machine:

$ apt purge apparmor-profiles apparmor-profiles-extra evince ntp
$ apt install apparmor-profiles apparmor-profiles-extra evince ntp

Not sure why I had to do this but I suspect that these files used to be shipped by one of the apparmor packages and then eventually migrated to the evince and ntp packages directly and dpkg got confused.

If you're in a similar circumstance, you want want to search for the file you're trying to get rid of on Google and then you might end up on which could lead you to the old package that used to own this file.

June 11, 2016 09:40 PM

June 09, 2016

Toolsmith Feature Highlight: Autopsy 4.0.0's case collaboration

First, here be changes.
After nearly ten years of writing toolsmith exactly the same way once a month, now for the 117th time, it's time to mix things up a bit.
1) Tools follow release cycles, and often may add a new feature that could be really interesting, even if the tool has been covered in toolsmith before.
2) Sometimes there may not be a lot to say about a tool if its usage and feature set are simple and easy, yet useful to us.
3) I no longer have an editor or publisher that I'm beholden too, there's no reason to only land toolsmith content once a month at the same time.
Call it agile toolsmith. If there's a good reason for a short post, I'll do so immediately, such as a new release of feature, and every so often, when warranted, I'll do a full coverage analysis of a really strong offering.
For tracking purposes, I'll use title tags (I'll use these on Twitter as well):
  • Toolsmith Feature Highlight
    • new feature reviews
  • Toolsmith Release Advisory
    • heads up on new releases
  • Toolsmith Tidbit
    • infosec tooling news flashes
  • Toolsmith In-depth Analysis
    • the full monty
That way you get the tl;dr so you know what you're in for.

On to our topic.
This is definitely in the "in case you missed it" category, I was clearly asleep at the wheel, but Autopsy 4.0.0 was released Nov 2015. The major highlight of this release is specifically the ability to setup a multi-user environment, including "multi-user cases supported that allow collaboration using network-based services." Just in case you aren't current on free and opensource DFIR tools, "Autopsy® is a digital forensics platform and graphical interface to The Sleuth Kit® and other digital forensics tools." Thanks to my crew, Luiz Mello for pointing the v4 release out to me, and to Mike Fanning for a perfect small pwned system to test v4 with.

Autopsy 4.0.0 case creation walk-through

I tested the latest Autopsy with an .e01 image I'd created from a 2TB victim drive, as well as against a mounted VHD.

Select the new case option via the opening welcome splash (green plus), the menu bar via File | New Case, or Ctrl+N:
New case
Populate your case number and examiner:
Case number and examiner
Point Autopsy at a data source. In this case I refer to my .e01 file, but also mounted a VHD as a local drive during testing (an option under select source type drop-down.
Add data source
Determine which ingest modules you'd like to use. As I examined both a large ext4 filesystem as well as a Windows Server VHD, I turned off Android Analyzer...duh. :-)
Ingest modules
After the image or drive goes through initial processing you'll land on the Autopsy menu. The Quick Start Guide will get you off to the races.

The real point of our discussion here is the new Autopsy 4.0.0 case collaboration feature, as pulled directly from Autopsy User Documentation: Setting Up Multi-user Environment

Multi-user Installation

Autopsy can be setup to work in an environment where multiple users on different computers can have the same case open at the same time. To set up this type of environment, you will need to configure additional (free and open source) network-based services.

Network-based Services

You will need the following that all Autopsy clients can access:

  • Centralized storage that all clients running Autopsy have access to. The central storage should be either mounted at the same Windows drive letter or UNC paths should be used everywhere. All clients need to be able to access data using the same path.
  • A central PostgreSQL database. A database will be created for each case and will be stored on the local drive of the database server. Installation and configuration is explained in Install and Configure PostgreSQL.
  • A central Solr text index. A Solr core will be created for each case and will be stored in the case folder (not on the local drive of the Solr server). We recommend using Bitnami Solr. This is explained in Install and Configure Solr.
  • An ActiveMQ messaging server to allow the various clients to communicate with each other. This service has minimal storage requirements. This is explained in Install and Configure ActiveMQ.

When you setup the above services, securely document the addresses, user names, and passwords so that you can configure each of the client systems afterwards.

The Autopsy team recommends using at least two dedicated computers for this additional infrastructure. Spreading the services out across several machines can improve throughput. If possible, place Solr on a machine by itself, as it utilizes the most RAM and CPU among the servers.

Ensure that the central storage and PostgreSQL servers are regularly backed up.

Autopsy Clients

Once the infrastructure is in place, you will need to configure Autopsy to use them.

Install Autopsy on each client system as normal using the steps from Installing Autopsy.
Start Autopsy and open the multi-user settings panel from "Tools", "Options", "Multi-user". As shown in the screenshot below, you can then enter all of the address and authentication information for the network-based services. Note that in order to create or open Multi-user cases, "Enable Multi-user cases" must be checked and the settings below must be correct.

Multi-user settings
In closing

Autopsy use is very straightforward and well documented. As of version 4.0.0, the ability to utilize a multi-user is a highly beneficial feature for larger DFIR teams. Forensicators and responders alike should be able to put it to good use.
Ping me via email or Twitter if you have questions (russ at holisticinfosec dot org or @holisticinfosec).
Cheers…until next month time.

by Russ McRee ( at June 09, 2016 05:31 AM

June 08, 2016

Feeding the Cloud

Simple remote mail queue monitoring

In order to monitor some of the machines I maintain, I rely on a simple email setup using logcheck. Unfortunately that system completely breaks down if mail delivery stops.

This is the simple setup I've come up with to ensure that mail doesn't pile up on the remote machine.

Server setup

The first thing I did on the server-side is to follow Sean Whitton's advice and configure postfix so that it keeps undelivered emails for 10 days (instead of 5 days, the default):

postconf -e maximal_queue_lifetime=10d

Then I created a new user:

adduser mailq-check

with a password straight out of pwgen -s 32.

I gave ssh permission to that user:

adduser mailq-check sshuser

and then authorized my new ssh key (see next section):

sudo -u mailq-check -i
mkdir ~/.ssh/
cat - > ~/.ssh/authorized_keys

Laptop setup

On my laptop, the machine from where I monitor the server's mail queue, I first created a new password-less ssh key:

ssh-keygen -t ed25519 -f .ssh/egilsstadir-mailq-check
cat ~/.ssh/

which I then installed on the server.

Then I added this cronjob in /etc/cron.d/egilsstadir-mailq-check:

0 2 * * * francois /usr/bin/ssh -i /home/francois/.ssh/egilsstadir-mailq-check mailq-check@egilsstadir mailq | grep -v "Mail queue is empty"

and that's it. I get a (locally delivered) email whenever the mail queue on the server is non-empty.

There is a race condition built into this setup since it's possible that the server will want to send an email at 2am. However, all that does is send a spurious warning email in that case and so it's a pretty small price to pay for a dirt simple setup that's unlikely to break.

June 08, 2016 05:30 AM

June 07, 2016

Anton Chuvakin - Security Warrior

Monthly Blog Round-Up – May 2016

Here is my next monthly "Security Warrior" blog round-up of top 5 popular posts/topics this month:
  1. Why No Open Source SIEM, EVER?” contains some of my SIEM thinking from 2009. Is it relevant now? Well, you be the judge.  Succeeding with SIEM requires a lot of work, whether you paid for the software, or not. BTW, this post has an amazing “staying power” that is hard to explain – I suspect it has to do with people wanting “free stuff” and googling for “open source SIEM” …  [262 pageviews]
  2. “A Myth of An Expert Generalist” is a fun rant on what I think it means to be “a security expert” today; it argues that you must specialize within security to really be called an expert [103 pageviews]
  3. Top 10 Criteria for a SIEM?” came from one of my last projects I did when running my SIEM consulting firm in 2009-2011 (for my recent work on evaluating SIEM, see this document [2015 update]) [80 pageviews]
  4. “New SIEM Whitepaper on Use Cases In-Depth OUT!” (dated 2010) presents a whitepaper on select SIEM use cases described in depth with rules and reports [using now-defunct SIEM product]; also see this SIEM use case in depth and this for a more current list of popular SIEM use cases. Finally, see our new 2016 research on security monitoring use cases here! [74 pageviews]
  5. Simple Log Review Checklist Released!” is often at the top of this list – this aging checklist is still a very useful tool for many people. “On Free Log Management Tools” is a companion to the checklist (updated version) [65 pageviews of total 3971 pageviews to all blog pages]
In addition, I’d like to draw your attention to a few recent posts from my Gartner blog [which, BTW, now has about 3X of the traffic of this blog]: 
Current research on SOC and threat intelligence [2 projects]:
Past research on IR:
Past research on EDR:
Miscellaneous fun posts:

(see all my published Gartner research here)
Also see my past monthly and annual “Top Popular Blog Posts” – 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015.
Disclaimer: most content at SecurityWarrior blog was written before I joined Gartner on Aug 1, 2011 and is solely my personal view at the time of writing. For my current security blogging, go here.

Previous post in this endless series:

by Anton Chuvakin ( at June 07, 2016 06:08 PM

Alien Deb to RPM convert fails with error: "Use of uninitialized value in split…" [FIXED]

TL;DR: Run alien under the script tool.

I was trying to get my build server to build packages for one of my projects. One step involves converting a Debian package to a RPM by means of the Alien tool. Unfortunately it failed with the following error:

alien -r -g ansible-cmdb-9.99.deb
Warning: alien is not running as root!
Warning: Ownerships of files in the generated packages will probably be wrong.
Use of uninitialized value in split at /usr/share/perl5/Alien/Package/ line 52.
no entry data.tar.gz in archive

gzip: stdin: unexpected end of file
tar: This does not look like a tar archive
tar: Exiting with failure status due to previous errors
Error executing "ar -p 'ansible-cmdb-9.99.deb' data.tar.gz | gzip -dc | tar tf -":  at /usr/share/perl5/Alien/ line 481.
make: *** [release_rpm] Error 2

The same process ran fine manually from the commandline, so I suspected something related with the controlling terminal. One often trick is to pretend to a script we actually have a working interactive TTY using the script tool. Here's how that looks:

script -c "sh"

The job now runs fine:

alien -r -g ansible-cmdb-9.99.deb
Warning: alien is not running as root!
Warning: Ownerships of files in the generated packages will probably be wrong.
Directory ansible-cmdb-9.99 prepared.
sed -i '\:%dir "/":d' ansible-cmdb-9.99/ansible-cmdb-9.99-2.spec
sed -i '\:%dir "/usr/":d' ansible-cmdb-9.99/ansible-cmdb-9.99-2.spec
sed -i '\:%dir "/usr/share/":d' ansible-cmdb-9.99/ansible-cmdb-9.99-2.spec
sed -i '\:%dir "/usr/share/man/":d' ansible-cmdb-9.99/ansible-cmdb-9.99-2.spec
sed -i '\:%dir "/usr/share/man/man1/":d' ansible-cmdb-9.99/ansible-cmdb-9.99-2.spec
sed -i '\:%dir "/usr/lib/":d' ansible-cmdb-9.99/ansible-cmdb-9.99-2.spec
sed -i '\:%dir "/usr/bin/":d' ansible-cmdb-9.99/ansible-cmdb-9.99-2.spec
cd ansible-cmdb-9.99 && rpmbuild --buildroot='/home/builder/workspace/ansible-cmdb/ansible-cmdb-9.99/' -bb --target noarch 'ansible-cmdb-9.99-2.spec'
Building target platforms: noarch
Building for target noarch
Processing files: ansible-cmdb-9.99-2.noarch
Provides: ansible-cmdb = 9.99-2
Requires(rpmlib): rpmlib(CompressedFileNames) <= 3.0.4-1 rpmlib(PayloadFilesHavePrefix) <= 4.0-1
Requires: /usr/bin/env
Checking for unpackaged file(s): /usr/lib/rpm/check-files /home/builder/workspace/ansible-cmdb/ansible-cmdb-9.99
warning: Installed (but unpackaged) file(s) found:
Wrote: ../ansible-cmdb-9.99-2.noarch.rpm
Executing(%clean): /bin/sh -e /var/tmp/rpm-tmp.3ssxv0
+ umask 022
+ cd /home/builder/rpmbuild/BUILD
+ /bin/rm -rf /home/builder/workspace/ansible-cmdb/ansible-cmdb-9.99
+ exit 0

Hope that helps someone out there.

by admin at June 07, 2016 07:44 AM

June 06, 2016

Carl Chenet

My Free Activities in May 2015

Follow me also on Diaspora*diaspora-banner or Twitter 

Trying to catch up with my blog posts about My Free Activities. This blog post will tell you about my free activities from January to May 2016.

1. Personal projects

2. Journal du hacker

That’s all folks! See you next month!

by Carl Chenet at June 06, 2016 10:00 PM

June 05, 2016

Sarah Allen

serving those who served us

There is a tremendous sea of good will in the public to volunteer their time, energy, and services to support Veterans and their families. The Department of Veterans Innovators Network has a lengthy list of potential innovation opportunities to serve Veterans, ranging from developing reimagined service journeys for Veterans seeking mental health services to designing 3D printed personalized assistive technologies for Veterans with disabilities to developing a machine learning algorithm to scan retinal images to detect blindness in Veterans.

The question becomes: how to match the talents and spirit to serve of citizens of the United States with the opportunities and needs of employees within the Department of Veteran Affairs?

Our goal is to harness the ingenuity and brainpower of the public and pair this talent with projects in need across the Department of Veterans Affairs (VA). On the platform “Serving Those Who Served Us”, the public will be able to come for an “innovation volunteer tour” and collaborate with VA employees, Veterans, and Veterans supporters on a portfolio of innovation-related projects across the Department. These are short term projects ranging from activities that could take a couple hours to a couple months and are in need of the brain power + talent of innovators, such as designers, engineers, developers, entrepreneurs, and data scientists! Most volunteer activities can happen virtually with volunteers and staff collaborating from different locations.

This is how it will work:

  • VA employees will post innovation project opportunities on to the “Serving Those Who Served Us” platform. These projects will list out their problem statement, a short summary of the project, and the potential tasks for the project and skillsets needed.
  • External innovators can create a profile and browse opportunities to volunteer.
  • When the external innovator finds a project of interest, they can then contact the VA employee through the platform and they can begin conversations to work together!
  • External innovators can earn “badges” and recognition for volunteering their efforts to serve our Nation’s Veterans and their families! Also volunteers who volunteer often (and earn the most badges!) or get nominated from their VA employee will earn a special award from VA leadership!

To help accomplish this goal, the VA is partnering with the Open Opportunities team to develop this platform and working with a local high school to build this platform as part of a STEM student project. We are looking for a team of developer mentors to help provide guidance to this team of high school students in leveraging the open source code of the open opportunities platform to build this platform to serve Veterans!

Anyone interested in volunteering to mentor the kids? It could be completely remote, though if you are in DC, we could probably set up time for you to meet the students in person too. We’re kicking off the project on Monday, but the kids don’t start for two weeks. We’re looking for a few more Javascript developers to help. Feel free to contact me directly or reply on our mailing list.

The Open Opportunities project is supported by:

eagle image within a seal and golden banner
Presidential Innovation Fellows Foundation

blue cloud with number 9
Cloud9 IDE

The post serving those who served us appeared first on the evolving ultrasaurus.

by sarah at June 05, 2016 11:42 PM

June 03, 2016

Security Monkey

May 31, 2016

LZone - Sysadmin

Workaround OpenSSH 7.0 Problems

OpenSSH 7+ deprecates weak key exchange algorithm diffie-hellman-group1-sha1 and DSA public keys for both host and user keys which lead to the following error messages:
Unable to negotiate with port 22: no matching key exchange method found. Their offer: diffie-hellman-group1-sha1
or a simple permission denied when using a user DSA public key or
Unable to negotiate with no matching host key type found.
Their offer: ssh-dss
when connecting to a host with a DSA host key.


Allow the different deprecated features in ~/.ssh/config
Host myserver
  # To make pub ssh-dss keys work again

# To make host ssh-dss keys work again HostkeyAlgorithms ssh-dss

# To allow weak remote key exchange algorithm KexAlgorithms +diffie-hellman-group1-sha1
Alternatively pass those three options using -o. For example allow the key exchange when running SSH
ssh -oKexAlgorithms=+diffie-hellman-group1-sha1 <host>


Replace all your dss keys to avoid keys stopping to work. And upgrade all SSH version to avoid offering legacy key exchange algorithms.

May 31, 2016 09:32 AM

May 30, 2016

SaltStack Minion communication and missing returns

Setting up SaltStack is a fairly easy task. There is plenty of documentation here. This is not an install tutorial, this is an explanation and trouble shooting of what is going on with SaltStack Master and Minion communication. Mostly when using the CLI to send commands from the Master to the Minions.

Basic Check List

After you have installed your Salt Master and your Salt Minions software the first thing to do after starting your Master is open your Minion's config file in /etc/salt/minion and fill out the line "master: " to tell the Minion where his Master is. Then start/restart your Salt Minion. Do this for all your Minions.

Go back to the Master and accept all of of the Minions keys. See here on how to do this. If you don't see a certain Minions key here are some things you should check.

  1. Is your Minion and Master running the same software version? The Master can usually work at a higher version. Try to keep them the same if possible.
  2. Is your salt-Minion service running? Make sure it is set to run on start as well.
  3. Has the Minions key been accepted by the Master? If you don't even see a key request from the Minion then the Minion is not even talking to the Master .
  4. Does the Minion have an unobstructed network path back to TCP port 4505 on the Master? The Minions initialize a TCP connection back to the Master so they don't need any ports open. Watch out for those Firewalls.
  5. Check your Minions log file in /var/log/salt/minion for key issues or any other issues.

Basic Communication

Now lets say you have all of basic network and and key issues worked out and would like to send some jobs to your Minions. You can do this via the Salt CLI. Something like salt \* 'echo HI'. This is considered a job by Salt. The Minions get this request and run the command and return the job information to the Master. The CLI talks to the Master who is listening for the return messages as they are coming in on the ZMQ bus. The CLI then reports back that status and output of the job.

That is a basic view of this process. But, sometimes Minions don't return job information. Then you ask yourself what the heck happened. You know the Minion is running fine. Eventually you find out you don't really understand Minion Master job communication at all.

Detailed Breakdown of Master Minion CLI Communication

By default when the job information gets returned to the Master and is stored on disk in the job cache. We will assume this is the case below.

The Salt CLI is just an small bit of code that interfaces with the API SaltStack has written that allows anyone to send commands to the Minions programmatically. The CLI is not connected directly to the Minions when the job request is made. When the CLI makes a job request, is handed to the Master to fulfill.

There are 2 key timeout periods you need be aware of before we go into a explanation of how a job request is handled. They are "timeout" and "gather_job_timeout".

  • In the Master config the "timeout:" setting tells the Master how often to poll the Minions about their job status. After the timeout period expires the Master fires off a "find_job" query. If you did this on the command line it would look like salt \* saltutil.find_job <jid> to all the Minions that were targeted. This asks all of the Minions if they are still running the job they were assigned. This setting can also be set from the CLI with the "-t" flag. Do not make the mistake that this timeout is how long the CLI command will wait until it times out and kills itself. The default value for "timeout" is 5 seconds.
  • In the Master config the "gather_job_timeout" setting defines how long the Master waits for a Minion to respond to the "find_job" query issued by the "timeout" setting mentioned above. If a Minion does not respond in the defined gather_job_timeout time then it is marked by the Master as "non responsive" for that polling period. This setting can not be set from the CLI. It can only be set in the Master config file. The default was 5 seconds, but current configs show it might be 10 seconds now.

When the CLI command is issued, the Master gathers a list of Minions with valid keys so it knows which Minions are on the system. It validates and filters the targeting information from the given target list and sets that as its list (targets) of Minions for the job. Now the Master has a list of who should return information when queried. The Master takes the requested command, target list, job id, and a few pieces of info, and broadcasts a message on the ZeroMQ bus to all of Minions. When all Minions get the message, they look at the target list and decide if they should execute the command or not. If the Minion sees he is in the target list he executes the command. If a Minion sees he is not part of the target list, he just ignores the message. The Minion that decided to run the command creates an local job id for the job and then performs the work.

When ever the Minion finishes job, it will return a message to the Master with the output of the job. The Master will mark that Minion off the target list as "Job Finished". While the Minions are working their jobs the Master is asking all of the Minions every 5 seconds (timeout) if they are still running the job they were assigned by providing them with the job id from the original job request. The Minion responds to "find job" request with a status of "still working" or "Job Finished". If a Minion does not respond to the request within the gather_job_timeout time period (5 secs), the Master marks the Minion as "non responsive" for the polling interval. All Minions will keep being queried on the polling interval until all of the responding Minions have been marked as responding with "Job Finished" as some point. Any Minion not responding during each of these intervals will keep being marked as "non responsive". After the last Minion that has been responding responds with "Job Finished", the Master polls the Minions one last time. Any Minion that has not responded in the final polling period is marked as "non responsive".

The CLI will show the output from the Minions as they finish their jobs. For the Minions that did not respond, but are connected to the Master, you will see the message "Minion did not return". If a Minion does not even look like it has a TCP connection with the Master, you will see "Minion did not return. [Not connected]".

By this time the Master should have marked the job as finished. The jobs info should now be available in the job cache. The above explanation is a high level explanation of how Master and Minions communicate. There are more details to this process than the above info, but this should give you a basic idea of how it works.

Takeaways From This Info

  1. There is no defined period on how long a job will take. The job will finish when the last responsive Minion has said it is done.
  2. If a Minion is not up or connected when a job request it sent out, then the Minion just misses that job. It is _not_ queued by the Master, and sent at a later time.
  3. Currently there is no hard timeout to force the Master to stop listening after a certain amount of time.
  4. If you set your timeout (-t) to be something silly like 3600, then if even one Minion is not responding the CLI will wait the full 3600 seconds to return. Beware!

Missing Returns

Sometimes you know there are Minions up and working, but you get "Minion did not return" or you did not see any info from the Minion at all before the CLI timed out. It is frustrating, as you can send the same Minion that just failed a job and it finishes it with no problem. There can be many reasons for this. Try/check the following things.

  • Did the Minion actually get the job request message? Start the Minion with log level "info" or higher. Then check the Minion log /var/log/salt/minion for job acceptance and completion.
  • After the CLI has exited, check job log. Use the jobs.list_jobs runner to find the job id, then list the output of the job with the jobs.lookup_jid runner. The CLI can miss returns, especially when the Master is overloaded. See the next bit about a overloaded Master.
  • The Master server getting overloaded is often the answer to missed returns. The Master is bound mostly by CPU and disk IO. Make sure these are not being starved. Bump up the threads in the Master config with the setting "worker_threads". Don't bump the threads past the amount of available CPU cores.
  • Bump your timeout (-t) values higher to give the Minions longer periods to finish in between asking for job status. You could also play with gather_job_timeout value in the Master to give the Minions more time to respond (they could be under heavy load) before marking them "non responsive" in between queries.

by at May 30, 2016 04:42 AM

May 29, 2016

Sarah Allen

a new chapter

Last week was a big week for me. I started a new job at Google and my kid graduated from high school. These big changes seem to have arrived all at once, though in reality change is constant.

I’ve spent a lot of time over the past 4 months reflecting on US Government transformation and the small, yet significant impact that one human can make. When I decided to leave at the end of my two-year term at 18F, I knew exactly what I wanted: find a group of smart, fun people who are trying to do something challenging, yet possible, that has the potential to make a big impact on a lot of people’s lives. I wanted to make (or improve) products for regular people. I didn’t need to save the world. I wanted something intellectually challenging that would let me solve different kinds of hard problems, the familiar kind that I’ve been solving my whole career.

At this point in my career, given my tech experience, folks think it is easy to find a job. And, yes, it is easy to find a job, but finding the job where I’ll be happy and be able to effectively apply my skills is hard. Those of us who don’t look like Mark Zuckerberg need to do a bit of homework to validate that we’re joining the right kind of team, and a company where you can be an effective leader and collaborate with difference. I knew the kind of people I wanted to work with, but was pretty flexible of the kind of tech, so it took a bit of research.

I started to get together with people I’ve worked with before who I respect and they introduced me to new people… in almost 5 months I only got about a quarter of the way through my list and met many more folks. To keep track of my random schedule of meetings and introductions, I started tracking everything in a spreadsheet… since December 26th, I’ve had almost 100 meetings or phone calls with 83 people, which led me to seriously consider 10 companies. I interviewed with 5 of them, had job offers from 3, and finally had a really difficult choice. This was actually my goal. I’ve never actually done this kind of job search before, but it is similar to how I usually approach hiring. I like to have good, tough choices.

On of the reasons the choice was hard was that I was biased against Google based on their hiring process. I still think their hiring process is a problem for them attracting and selecting the best talent, but having spent the last three years working for the federal government, I have a lot more empathy for large systems and organizations. I talked to a lot of people inside Google to assess their internal culture and the groups I considered joining. I found a lot of genuinely good and caring people who share my values about building great teams and creating great software. Also, after working for the federal government, I joke that I’m ready to join a small company, like Google, which is about half the size of the US Dept. of Agriculture.

In selecting a new job, I found that the journey, as well as the destination, helped me figure out some things that really matter to me.

An Open Source Project

I hadn’t really understood until last month that open source is not only something I believe is important for government services and non-profit work, but that I value how it enables me to work. I love the impromptu collaboration with like-minded strangers that happens when you work in the open.

Some time ago, Google quietly released Vanadium on github. If you search for it on google you find lots of information on the element which is an essential mineral in the lives of sea squirts.


The technology emerged from a research group at Google, and allows for secure peer-to-peer messaging.

Connected Applications

Connected applications have become widespread in the last decade, with the rise of social networks, messaging apps, and online collaborative tools like Google Docs which let people see each others’ edits in real-time. For the convenience of developers, these online interactions require people to be connected to the Internet. We have transformed the Internet from a resilient, distributed network to a more fragile place where we all have to be connected via central servers. I love Google docs, and understand how powerful it can be to use in the classroom; however, I agree with parents’ concerns about privacy. There’s really no technical need to store all of our personal information in the “cloud.” Technically we could create local clouds, where information never leaves a classroom, home or office. This could have a significant positive impact for privacy, and also for the developing world where Internet connectivity is expensive, intermittent or simply unavailable. So, I guess part of me wants to keep trying to save the world, after all.

What’s Next

I’ll be leading a small team within the Vanadium group. For me, this connects to the work I was doing at the turn of the century with multiparty communication. Under-the-hood it’s a lot like what I was working on when we created the Flash Media Server, except it’s not focused on media, and it doesn’t require a server, and it has nothing to do with Flash. I have a lot of ideas about how technology can help humans connect with each other, and maybe I can help bring some of those to life… or, even more exciting, develop new ideas that make sense in this future we’re creating.

I just started at Google. I’m not sure yet whether all of my ideas align with theirs, but I can already tell that I’m joining an ambitious group of talented engineers who aspire to create a positive impact on the world. I’m looking forward to whatever happens next!

The post a new chapter appeared first on the evolving ultrasaurus.

by sarah at May 29, 2016 07:51 PM

May 28, 2016

Security Monkey

May 27, 2016

The Lone Sysadmin

esxupdate Error Code 99

So I’ve got a VMware ESXi 6.0 host that’s been causing me pain lately. It had some storage issues, and now it won’t let VMware Update Manager scan it, throwing the error: The host returns esxupdate error code:99. An unhandled exception was encountered. Check the Update Manager log files and esxupdate log files for more […]

The post esxupdate Error Code 99 appeared first on The Lone Sysadmin. Head over to the source to read the full post!

by Bob Plankers at May 27, 2016 05:43 PM

May 25, 2016

The Geekess

Ditch “Culture Fit”

A couple different talks at OSCON got me thinking about the unhealthy results of hiring on the basis of “culture fit”.


Slide from Casey West’s OSCON talk that says “Never in the history of my career has my ability to drink beer made me better at solving a business problem.”

What is company culture? Is it celebrating with co-workers around the company keg? Or would that exclude non-drinkers? Does your company value honest and direct feedback in meetings? Does that mean introverts and remote workers are talked over? Are long working hours and individual effort rewarded, to the point that people who value family are passed up for promotion?

Often times teams who don’t have a diverse network end up hiring people who have similar hobbies, backgrounds, and education. Companies need to avoid “group think” and focus on increasing diversity, because studies have shown that gender-diverse companies are 15% more likely to financially outperform other companies, and racially-diverse companies are 35% more likely to outperform. Other studies have shown that diversity can lead to more internal conflict, but the end result is a more productive team.

How do you change your company culture to value a diverse team? It’s much more than simply hiring more diverse people or making people sit through an hour of unconscious bias training. At OSCON, Casey West talked about some examples of company culture that create an inclusive environment where diverse teams can thrive:

  • Blame-free teams
  • Knowledge sharing culture
  • Continuous learning
  • No judgement on asking questions
  • Continuous feedback
  • Curiosity about different cultures
  • Individually defined work-life balance
  • Valuing empathy

For example, if you have a culture where there’s no judgement on asking questions or raising issues and people are naturally curious about different cultures, it’s easy for a team member to suggest a new feature that might make your product appeal to a broader customer base. After years of analyzing teams, Google found that the most productive teams foster a sense of “psychological safety”, a shared belief in expressing ideas without fear of humiliation.

The other problem with “culture fit” is that it’s an unevenly applied standard. An example of this was Kevin Stewart’s OSCON talk called “Managing While Black”. When Kevin emulated the company culture of pushing back on unnecessary requirements and protecting his team, he was told to “work on his personal brand”. White coworkers were reading him as “the angry black guy.” When he dialed it back, he was told he was “so articulate”, which is a non-compliment that relies on the stereotype that all African Americans are either uneducated or recent immigrants.

In both cases, even though his project was successful, Kevin had his team (and his own responsibilities) scaled back. After years of watching less successful white coworkers get promoted, he was told by management that they simply didn’t “see him in a leadership role.” Whether or not people of color emulate the white leadership behavior and corporate culture around them, they are punished because their coworkers are biased towards white leaders.

As a woman in technical leadership positions, I’ve faced similar “culture fit” issues. I’ve been told by one manager that I needed to be the “one true technical voice” (meaning as a leader I need to shout over the mansplainy guys on my team). And yet, when I clearly articulate valid technical or resourcing concerns to management, I’m “dismissive” of their goals. When I was a maintainer in the Linux kernel and adamantly pushed back on a patch that wall-papered over technical debt, I was told by another maintainer to “calm down”. (If you don’t think that’s a gendered slur based on the stereotype that women are “too emotional”, try imagining telling Linus Torvalds to calm down when he gets passionate about technical debt.)

The point is, traditional “cultural fit” narratives and leadership behaviors only benefit the white cis males that created these cultural norms. Culture can be manipulated in the span of a couple years to enforce or change the status quo. For example, computer programming used to be dominated by women, before hiring “personality tests” biased for men who displayed “disinterest in people”.

We need to be deliberate about the company culture we cultivate. By hiring for empathy, looking for coworkers who are curious about different cultures, and rewarding leaders who don’t fit our preconceived notions, we create an inclusive work environment where people are free to be their authentic selves. Project Include has more resources and examples for people who are interested in changing their company’s culture.

Thanks for reading! If you want me to write more posts on diversity in tech, please consider donating to my Patreon.

by sarah at May 25, 2016 07:40 AM

May 24, 2016

Carl Chenet

Tweet your database with db2twitter

Follow me also on Diaspora*diaspora-banner or Twitter 

You have a database (MySQL, PostgreSQL, see supported database types), a tweet pattern and wants to automatically tweet on a regular basis? No need for RSS, fancy tricks, 3rd party website to translate RSS to Twitter or whatever. Just use db2twitter.

A quick example of a tweet generated by db2twitter:


The new version 0.6 offers the support of tweets with an image. How cool is that?


db2twitter is developed by and run for, the job board of th french-speaking Free Software and Opensource community.


db2twitter also has cool options like;

  • only tweet during user-specified time (e.g 9AM-6PM)
  • use user-specified SQL filter in order to get data from the database (e.g only fetch rows where status == « edited »)

db2twitter is coded in Python 3.4, uses SQlAlchemy (see supported database types) and  Tweepy. The official documentation is available on readthedocs.


by Carl Chenet at May 24, 2016 10:00 PM

May 23, 2016

LZone - Sysadmin

Scan Linux for Vulnerable Packages

How do you know wether your Linux server (which has no desktop update notifier or unattended security updates running) does need to be updated? Of course an
apt-get update && apt-get --dry-run upgrade
might give an indication. But what of the package upgrades do stand for security risks and whose are only simple bugfixes you do not care about?

Check using APT

One useful possibility is apticron which will tell you which packages should be upgraded and why. It presents you the package ChangeLog to decided wether you want to upgrade a package or not. Similar but less details is cron-apt which also informs you of new package updates.

Analyze Security Advisories

Now with all those CERT newsletters, security mailing lists and even security news feeds out there: why can't we check the other way around? Why not find out:
  1. Which security advisories do affect my system?
  2. Which ones I have already complied with?
  3. And which vulnerabilities are still there?
My mad idea was to take those security news feeds (as a start I tried with the ones from Ubuntu and CentOS) and parse out the package versions and compare them to the installed packages. The result was a script producing the following output:

screenshot of

In the output you see lines starting with "CEBA-2012-xxxx" which is CentOS security advisory naming schema (while Ubuntu has USN-xxxx-x). Yellow color means the security advisory doesn't apply because the relevant packages are not installed. Green means the most recent package version is installed and the advisory shouldn't affect the system anymore. Finally red, of course meaning that the machine is vulnerable.

Does it Work Reliably?

The script producing this output can be found here. I'm not yet satisfied with how it works and I'm not sure if it can be maintained at all given the brittle nature of the arbitrarily formatted/rich news feeds provided by the distros. But I like how it gives a clear indication of current advisories and their effect on the system.

Maybe persuading the Linux distributions into using a common feed format with easy to parse metadata might be a good idea...

How do you check your systems? What do you think of a package scanner using XML security advisory feeds?

May 23, 2016 03:52 PM

May 21, 2016

Geek and Artist - Tech

Catching Up

I haven’t posted anything for quite some time (which I feel a little bad about), so this is something of a randomly-themed catch-up post. According to my LinkedIn profile I’ve been doing this engineering management thing for about two years, which at least provides some explanation for a relative lack of technical-oriented blog posts. Of course in that time I have certainly not revoked my Github access, deleted all compilers/runtimes/IDEs/etc and entirely halted technical work, but the nature of the work of course has changed. In short, I don’t find myself doing so much “interesting” technical work that leads to blog-worthy posts and discoveries.

So what does the work look like at the moment? I’ll spare you the deep philosophical analysis – there are many, many (MANY) posts and indeed books on making the transition from a technical contributor to a team manager or lead of some sort. Right back at the beginning I did struggle with the temptation to continue coding and contribute also on my management tasks – it is difficult to do both adequately at the same time. More recently (and perhaps in my moments of less self-control) I do allow myself to do some technical contributions. These usually look like the following:

  • Cleaning up some long-standing technical debt that is getting in the way of the rest of the team being productive, but is not necessarily vital to their learning/growth or knowledge of our technology landscape.
  • Data analysis – usually ElasticSearch, Pig/Hive/Redshift/MapReduce jobs to find the answer to a non-critical but still important question.
  • Occasionally something far down the backlog that is a personal irritation for me, but is not in the critical path.
  • Something that enables the rest of the team in some way, or removes a piece of our technology stack that was previously only known by myself (i.e. removes the need for knowledge transfer)
  • Troubleshooting infrastructure (usually also coupled with data analysis).

I’d like to say I’ve been faithful to that list but I haven’t always. The most recent case was probably around a year ago I guess, when I decided I’d implement a minimum-speed data transfer monitor to our HLS server. This ended up taking several weeks and was a far more gnarly problem than I realised. The resulting code was also not of the highest standard – when you are not coding day-in and day-out, I find that my overall code quality and ability to perceive abstractions and the right model for a solution is impaired.

Otherwise, the tasks that I perhaps should be filling my day with (and this is not an exhaustive list, nor ordered, just whatever comes to mind right now) looks more like this:

  • Assessing the capacity and skills make up of the team on a regular basis, against our backlog and potential features we’d like to deliver. Do we have the right skills and are we managing the “bus factor”? If the answer is “no” (and it almost always is), I should be hiring.
  • Is the team able to deliver? Are there any blockers?
  • Is the team happy? Why or why not? What can I do to improve things for them?
  • How are the team-members going on their career paths? How can I facilitate their personal growth and helping them become better engineers?
  • What is the overall health of our services and client applications? Did we have any downtime last night? What do I need to jump on immediately to get these problems resolved? I would usually consider this the first item to check in my daily routine – if something has been down we need to get it back up and fix the problems as a matter of urgency.
  • What is the current state of our technical debt; are there any tactical or strategic processes we need to start in order to address it?
  • How are we matching up in terms of fitting in with technology standards in the rest of the organisation? Are we falling behind or leading the way in some areas? Are there any new approaches that have worked well for us that could be socialised amongst the rest of the organisation?
  • Are there any organisational pain-points that I can identify and attempt to gather consensus from my peer engineering managers? What could we change on a wider scale that would help the overall organisation deliver user value faster, or with higher quality?
  • Could we improve our testing processes?
  • How are we measuring up against our KPIs? Have we delivered something new recently that needs to be assessed for impact, and if so has it been a success or not matched up to expectations? Do we need to rethink our approach or iterate on that feature?
  • Somewhat related: have there been any OS or platform updates on any of our client platforms that might have introduced bugs that we need to address? Ideally we would be ahead of the curve and anticipate problems before they happen, but if you have a product that targets web browsers or Android phones, there are simply too many to adequately test ahead of general public releases before potential problems are discovered by the public.
  • Is there any free-range experimentation the team could be doing? Let’s have a one-day offsite to explore something new! (I usually schedule at least one offsite a month for this kind of thing, with a very loose agenda.)
  • How am I progressing on my career path? What aspects of engineering management am I perhaps not focussing enough on? What is the next thing I need to be learning?

I could probably go on and on about this for a whole day. After almost two years (and at several points before that) it is natural to question whether the engineering management track is the one I should be on. Much earlier (perhaps 6 months in) I was still quite unsure – if you are still contributing a lot of code as part of your day to day work, the answer to the question is that much harder to arrive at since you have blurred the lines of what your job description should look like. It is much easier to escape the reality of settling permanently on one side or the other.

Recently I had some conversations with people which involved talking in depth about either software development or engineering management. On the one hand, exploring the software development topics with someone, I definitely got the feeling that there was a lot I am getting progressively more and more rusty on. To get up to speed again I feel would take some reasonable effort on my part. In fact, one of the small technical debt “itches” I scratched at the end of last year was implementing a small application to consume from AWS Kinesis, do some minor massaging of the events and then inject them into ElasticSearch. I initially thought I’d write it in Scala, but the cognitive burden of learning the language at that point was too daunting. I ended up writing it in Java 8 (which I have to say is actually quite nice to use, compared to much older versions of Java) but this is not a struggle a competent engineer coding on a daily basis would typically have.

On the other hand, the conversations around engineering management felt like they could stretch on for ever. I could literally spend an entire day talking about some particular aspect of growing an organisation, or a team, or on technical decision-making (and frequently do). Some of this has been learned through trial and error, some by blind luck and I would say a decent amount through reading good books and the wonderful leadership/management training course at SoundCloud (otherwise known as LUMAS). I and many other first-time managers took this course (in several phases) starting not long after I started managing the team, and I think I gained a lot from it. Unfortunately it’s not something anyone can simply take, but at least I’d like to recommend some of the books we were given during the course – I felt I got a lot out of them as well.

  • Conscious Business by Fred Kofman. It might start out a bit hand-wavy, and feel like it is the zen master approach to leadership but if you persist you’ll find a very honest, ethical approach to business and leadership. I found it very compelling.
  • Five Dysfunctions of a Team by Patrick Lencioni. A great book, and very easy read with many compelling stories as examples – for building healthy teams. Applying the lessons is a whole different story, and I would not say it is easy by any measure. But avoiding it is also a path to failure.
  • Leadership Presence by Kathy Lubar and Belle Linda Halpern. Being honest and genuine, knowing yourself, establishing a genuine connection and empathy to others around you and many other gems within this book are essential to being a competent leader. I think this is a book I’ll keep coming back to for quite some time.

In addition I read a couple of books on Toyota and their lean approach to business (they are continually referenced in software development best practices). I have to admit that making a solid connection between all of TPS can be a challenge, and I hope to further learn about them in future and figure out which parts are actually relevant and which parts are not. There were a few other books around negotiation and other aspects of leadership which coloured my thinking but were not significant enough to list. That said, I still have something like 63 books on my wish list still to be ordered and read!

In order to remain “relevant” and in touch with technical topics I don’t want to stop programming, of course, but this will have to remain in the personal domain. To that end I’m currently taking a game programming course in Unity (so, C#) and another around 3D modelling using Blender. Eventually I’ll get back to the machine learning courses I was taking a long time ago but still need to re-take some beginner linear algebra in order to understand the ML concepts properly. Then there are a tonne of other personal projects in various languages and to various ends. I’ll just keep fooling myself that I’ll have free time for all of these things 🙂

by oliver at May 21, 2016 08:14 PM

May 17, 2016


Creating a swap file for tiny cloud servers

A few months ago while setting up a few cloud servers to host one of my applications. I started running into an interesting issue while building Docker containers. During the docker build execution my servers ran out of memory causing the Docker build to fail.

The servers in question only have about 512MB of RAM and the Docker execution was using the majority of the available memory. My solution to this problem was simple, add a swap file.

A swap file is a file that lives on a file system and is used like a swap device.

While adding the swap file I thought this would make for an interesting article. Today's post is going to cover how to add a swap file to a Linux system. However, before we jump into creating a swap file, let's explore what exactly swap is and how it works.

What is swap

When Linux writes data to physical memory, or RAM it does so in chunks that are called pages. Pages can vary in size from system to system but in general a page is a small amount of data. The Linux kernel will keep these pages in physical memory for as long as it has available memory to spare and the pages are relevant.

When the kernel needs to make room for new memory pages it will move some memory pages out of physical memory, where the destination is usually to a swap device. Swap devices are hard-drive partitions or files within a file system used to expand the memory capacity of a system.

A simple way to think of swap is to think of it as an overflow area for system memory. With that thought, the general rule of thumb is that memory pages being written to swap is a bad thing. In my case however, since the only time I require swap is during the initial build of my Docker containers. Swapping isn't necessarily a bad thing.

In fact, my use case is exactly the reason why swap exists. I have a process that requires more memory than what is available on my system, and I would prefer to use some disk space to perform the task rather than add memory to the system. The key is that my system does not swap during normal operations, only during the Docker build execution.

Determining if a system is Swapping

In summary, Swapping is only a bad thing, if it happens during unexpected times. A simple way to determine if a system is actively swapping is to run the vmstat command.

$ vmstat -n 5 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0    100 207964  22056 129964    0    0    14     1   16   32  0  0 100  0  0
 0  0    100 207956  22056 129956    0    0     0     0   16   26  0  0 100  0  0
 0  0    100 207956  22064 129956    0    0     0     2   16   27  0  0 100  0  0
 0  0    100 207956  22064 129956    0    0     0     0   14   23  0  0 100  0  0
 0  0    100 207956  22064 129956    0    0     0     0   14   24  0  0 100  0  0

The vmstat command can be used to show memory statistics from the system in question. In the above there are two columns, si (Swapped In) and so (Swapped Out) which are used to show the number of pages swapped in and swapped out.

Since the output above shows that the example system is neither swapping in, or swapping out memory pages, we can assume that swapping on this server is probably not an issue. The fact that at one point the system has swapped 100KB of data does not necessarily mean there is an issue.

When does a system swap

A common misconception about swap and Linux is that the system will swap only when it is completely out of memory. While this was mostly true a long time ago, recent versions of the Linux kernel include a tunable parameter called vm.swappiness. The vm.swappiness parameter is essentially an aggressiveness setting. The higher the vm.swappiness value (0 - 100), the more aggressive the kernel will swap. When set to 0, the kernel will only swap once the system memory is below the vm.min_free_kbytes value.

What this means is, unless vm.swappiness is set at 0. The system may swap at any time, even without memory being completely used. Since by default the vm.swappiness value is set to 60, which is somewhat on the aggressive side. A typical Linux system may swap without ever hitting the vm.min_free_kbytes threshold.

This aggressiveness does not often impact performance (though it can in some cases) because when the kernel decides to start swapping it will pick memory pages based on the last accessed time of those pages. What this means is, data that is frequently accessed is kept in physical memory longer, and data that is not frequently accessed is usually the primary candidates for moving to swap.

Now that we have a better understanding of swap, let's add a swap device to the cloud servers in question.

Adding a swap device

Adding a swap device is fairly easy and can be done with only a few commands. Let's take a look at the memory available on the system today. We will do this with the free command, using the -m (Megabytes) flag.

$ free -m
             total       used       free     shared    buffers     cached
Mem:           489        281        208          0         17        126
-/+ buffers/cache:        136        353
Swap:            0          0          0

From the free command's output we can see that this server currently has no swap space at all. We will change that by creating a 2GB swap file.

Creating a swap file

The first step in creating a swap file is to create a file with the dd command.

$ sudo dd if=/dev/zero of=/swap bs=1M count=2048
2048+0 records in
2048+0 records out
2147483648 bytes (2.1 GB) copied, 5.43507 s, 395 MB/s

The reason we used dd to create the file is because we can create more than a simple empty file. With the dd command above we created a 2GB file (writing 1MB chunks 2048 times) with the contents being populated from /dev/zero. This gives us the data space that will make up the swap device. Whatever size we make the file in this step, is the size the swap device will be.

With the file created, we can now use the mkswap command to make the file a swap device.

$ sudo mkswap /swap
Setting up swapspace version 1, size = 2097148 KiB
no label, UUID=0d64a557-b7f9-407e-a0ef-d32ccd081f4c

The mkswap command is used to setup a file or device as a swap device. This is performed by formatting the device or file as a swap partition.

Mounting the swap device

With our file now formatted as a swap device we can start the steps to mount it. Like any other filesystem device, in order to mount our swap device we will need to specify it within the /etc/fstab file. Let's start by appending the below contents into the /etc/fstab file.

/swap   swap  swap  defaults  0 0

With the above step complete, we can now mount the swap file. To do this we will use the swapon command.

$ sudo swapon -a

The -a flag for swapon tells the command to mount and activate all defined swap devices within the /etc/fstab file. To check if our swap file was successfully mounted we can execute the swapon command again with the -s option.

$ sudo swapon -s
Filename                Type        Size    Used    Priority
/swap                                   file        2097148    0    -1

From the above we can see that /swap, the file we created is mounted and currently in use as a swap device. If we were to re-run the free command from above we should also see the swap space available.

$ free -m
             total       used       free     shared    buffers     cached
Mem:           489        442         47          0         11        291
-/+ buffers/cache:        139        350
Swap:         2047          0       2047

In the previous execution of the free command the swap size was 0 and now it is 2047 MB. This means we have roughly 2GB of swap to use with this system.

A useful thing to know about Linux is that you can have multiple swap devices. The priority value of the swap device is used to determine in what order the data should be written. When 2 or more swap devices have the same priority swap usage is balanced across both. This means you can gain some performance during times of swap usage by creating multiple swap devices on different hard-drives.

Got any other swap tips? Add them below in the comments.

Posted by Benjamin Cane

May 17, 2016 08:30 PM

LZone - Sysadmin

Do Not List Iptables NAT Rules Without Care

What I do not want to do ever again is running
iptables -L -t nat
on a core production server with many many connections.

And why?

Well because running "iptables -L" auto-loads the table specific iptables kernel module which for the "nat" table is "iptables_nat" which has a dependency on "nf_conntrack".

While "iptables_nat" doesn't do anything when there are no configured iptables rules, "nf_conntrack" immediately starts to drop connections as it cannot handle the many many connections the server has.

The probably only safe way to check for NAT rules is:
grep -q ^nf_conntrack /proc/modules && iptables -L -t nat

May 17, 2016 04:05 PM

May 11, 2016

A Year in the Life of a BSD Guru

May 10, 2016

A Year in the Life of a BSD Guru

EuroBSDCon 2010

I'll be heading to the airport later this afternoon, enroute to Karlsruhe for this year's EuroBSDCon. Here are my activities for the conference:

May 10, 2016 12:05 PM

Security Monkey

May 09, 2016

Standalone Sysadmin

Debian Jessie Preseed – Yes, please

What could possibly draw me out of blogging dormancy? %!@^ing Debian, that’s what.

I’m working on getting a Packer job going so I can make an updated Vagrant box on a regular basis, but in order to do that, I need to get a preseed file arranged. Well, everything goes swimmingly using examples I’ve found on the internet (because the Debian-provided docs are awful) until this screen:

partition disks if you continue the changes listed below will be written to the disks write the changes to disks yes no

That’s weird. I’ve got the right answers in the preseed:

d-i partman-auto/method                 string lvm
d-i partman-auto-lvm/guided_size        string max
d-i partman-lvm/device_remove_lvm       boolean true
d-i partman-lvm/confirm                 boolean true
d-i partman-auto-lvm/confirm            boolean true
d-i partman-partitioning/confirm_write_new_label   boolean true
d-i partman-partitioning/confirm        boolean true
d-i partman-lvm/confirm_nooverwrite     boolean true

and I’ve even got this, as a backup:

d-i partman/confirm                     boolean true
d-i partman/confirm_nooverwrite         boolean true

The “partman-lvm” is what should be the answer, because of the “partman-auto/method” line. Well, it doesn’t work. That’s weird. So I start Googling. And nothing. And nothing. Hell, I can’t even figure out if preseed files are order-specific. The last time someone apparently asked that question on the internet was 2008.

Anyway. My coworker found a link to a Debian 6 question on ServerFault and suggested that I try something crazy: set the preseed answer for the partman RAID setting:

d-i partman-md/confirm_nooverwrite      boolean true

which makes no freaking sense. So, of course, it worked.

Why you have to do that to make the partman-lvm installer work, I have no idea. Please comment if you have any inkling. Or just want to rant against preseeds too. It’s 8:45am and I’m ready to go home.

by Matt Simmons at May 09, 2016 03:49 PM

May 08, 2016

toolsmith #116: vFeed & vFeed Viewer


In case you haven't guessed by now, I am an unadulterated tools nerd. Hopefully, ten years of toolsmith have helped you come to that conclusion on your own. I rejoice when I find like-minded souls, I found one in Nabil (NJ) Ouchn (@toolswatch), he of Black Hat Arsenal and fame. In addition to those valued and well-executed community services, NJ also spends a good deal of time developing and maintaining vFeed. vFeed included a Python API and the vFeed SQLite database, now with support for Mongo. It is, for all intents and purposes a correlated community vulnerability and threat database. I've been using vFeed for quite a while now having learned about it when writing about FruityWifi a couple of years ago.
NJ fed me some great updates on this constantly maturing product.
Having achieved compatibility certifications (CVE, CWE and OVAL) from MITRE, the vFeed Framework (API and Database) has started to gain more than a little gratitude from the information security community and users, CERTs and penetration testers. NJ draws strength from this to add more features now and in the future. The actual vFeed roadmap is huge. It varies from adding new sources such as security advisories from industrial control system (ICS) vendors, to supporting other standards such as STIX, to importing/enriching scan results from 3rd party vulnerability and threat scanners such as Nessus, Qualys, and OpenVAS.
There have a number of articles highlighting impressive vFeed uses cases of vFeed such as:
Needless to say, some fellow security hackers and developers have included vFeed in their toolkit, including Faraday (March 2015 toolsmith), Kali Linux, and more (FruityWifi as mentioned above).

The upcoming version vFeed will introduce support for CPE 2.3, CVSS 3, and new reference sources. A proof of concept to access the vFeed database via a RESTFul API is in testing as well. NJ is fine-tuning his Flask skills before releasing it. :) NJ, does not consider himself a Python programmer and considers himself unskilled (humble but unwarranted). Luckily Python is the ideal programming language for someone like him to express his creativity.
I'll show you all about woeful programming here in a bit when we discuss the vFeed Viewer I've written in R.

First, a bit more about vFeed, from its Github page:
The vFeed Framework is CVE, CWE and OVAL compatible and provides structured, detailed third-party references and technical details for CVE entries via an extensible XML/JSON schema. It also improves the reliability of CVEs by providing a flexible and comprehensive vocabulary for describing the relationship with other standards and security references.
vFeed utilizes XML-based and  JSON-based formatted output to describe vulnerabilities in detail. This output can be leveraged as input by security researchers, practitioners and tools as part of their vulnerability analysis practice in a standard syntax easily interpreted by both human and machine.
The associated vFeed.db (The Correlated Vulnerability and Threat Database) is a detective and preventive security information repository useful for gathering vulnerability and mitigation data from scattered internet sources into an unified database.
vFeed's documentation is now well populated in its Github wiki, and should be read in its entirety:
  1. vFeed Framework (API & Correlated Vulnerability Database)
  2. Usage (API and Command Line)
  3. Methods list
  4. vFeed Database Update Calendar
vFeed features include:
  • Easy integration within security labs and other pentesting frameworks 
  • Easily invoked via API calls from your software, scripts or from command-line. A proof of concept python is provided for this purpose
  • Simplify the extraction of related CVE attributes
  • Enable researchers to conduct vulnerability surveys (tracking vulnerability trends regarding a specific CPE)
  • Help penetration testers analyze CVEs and gather extra metadata to help shape attack vectors to exploit vulnerabilities
  • Assist security auditors in reporting accurate information about findings during assignments. vFeed is useful in describing a vulnerability with attributes based on standards and third-party references(vendors or companies involved in the standardization effort)
vFeed installation and usage

Installing vFeed is easy, just download the ZIP archive from Github and unpack it in your preferred directory or, assuming you've got Git installed, run git clone
You'll need a Python interpreter installed, the latest instance of 2.7 is preferred. From the directory in which you installed vFeed, just run python -h followed by python -u to confirm all is updated and in good working order; you're ready to roll.

You've now read section 2 (Usage) on the wiki, so you don't need a total usage rehash here. We'll instead walk through a few options with one of my favorite CVEs: CVE-2008-0793.

If we invoke python -m get_cve CVE-2008-0793, we immediately learn that it refers to a Tendenci CMS cross-site scripting vulnerability. The -m parameter lets you define the preferred method, in this case, get_cve.

Groovy, is there an associated CWE for CVE-2008-0793? But of course. Using the get_cwe method we learn that CWE-79 or "Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting')" is our match.

If you want to quickly learn all the available methods, just run python --list.
Perhaps you'd like to determine what the CVSS score is, or what references are available, via the vFeed API? Easy, if you run...

from lib.core.methods.risk import CveRisk
cve = "CVE-2014-0160"
cvss = CveRisk(cve).get_cvss()
print cvss

You'll retrieve...

For reference material...

from lib.core.methods.ref import CveRef
cve = "CVE-2008-0793"
ref = CveRef(cve).get_refs()
print ref


And now you know...the rest of the story. CVE-2008-0793 is one of my favorites because a) I discovered it, and b) the vendor was one of the best of many hundreds I've worked with to fix vulnerabilities.

vFeed Viewer

If NJ thinks his Python skills are rough, wait until he sees this. :-)
I thought I'd get started on a user interface for vFeed using R and Shiny, appropriately name vFeed Viewer and found on Github here. This first version does not allow direct queries of the vFeed database as I'm working on SQL injection prevention, but it does allow very granular filtering of key vFeed tables. Once I work out safe queries and sanitization, I'll build the same full correlation features you enjoy from NJ's Python vFeed client.
You'll need a bit of familiarity with R to make use of this viewer.
First install R, and RStudio.  From the RStudio console, to ensure all dependencies are met, run install.packages(c("shinydashboard","RSQLite","ggplot2","reshape2")).
Download and install the vFeed Viewer in the root vFeed directory such that app.R and the www directory are side by side with, etc. This ensures that it can read vfeed.db as the viewer calls it directly with dbConnect and dbReadTable, part of the RSQLite package.
Open app.R with RStudio then, click the Run App button. Alternatively, from the command-line, assuming R is in your path, you can run R -e "shiny::runApp('~/shinyapp')" where ~/shinyapp is the path to where app.R resides. In my case, on Windows, I ran R -e "shiny::runApp('c:\\tools\\vfeed\\app.R')". Then browser to the localhost address Shiny is listening on. You'll probably find the RStudio process easier and faster.
One important note about R, it's not known for performance, and this app takes about thirty seconds to load. If you use Microsoft (Revolution) R with the MKL library, you can take advantage of multiple cores, but be patient, it all runs in memory. Its fast as heck once it's loaded though.
The UI is simple, including an overview.

At present, I've incorporated NVD and CWE search mechanisms that allow very granular filtering.

 As an example, using our favorite CVE-2008-0793, we can zoom in via the search field or the CVE ID drop down menu. Results are returned instantly from 76,123 total NVD entries at present.

From the CWE search you can opt to filter by keywords, such as XSS for this scenario, to find related entries. If you drop cross-site scripting in the search field, you can then filter further via the cwetitle filter field at the bottom of the UI. This is universal to this use of Shiny, and allows really granular results.

You can also get an idea of the number of vFeed entries per vulnerability category entities. I did drop CPEs as their number throws the chart off terribly and results in a bad visualization.

I'll keep working on the vFeed Viewer so it becomes more useful and helps serve the vFeed community. It's definitely a work in progress but I feel as if there's some potential here.


Thanks to NJ for vFeed and all his work with the infosec tools community, if you're going to Black Hat be certain to stop by Arsenal. Make use of vFeed as part of your vulnerability management practice and remember to check for updates regularly. It's a great tool, and getting better all the time.
Ping me via email or Twitter if you have questions (russ at holisticinfosec dot org or @holisticinfosec).
Cheers…until next month.


Nabil (NJ) Ouchn (@toolswatch)

by Russ McRee ( at May 08, 2016 07:23 PM

May 03, 2016

Anton Chuvakin - Security Warrior

Monthly Blog Round-Up – April 2016

Here is my next monthly "Security Warrior" blog round-up of top 5 popular posts/topics this month:
  1. Why No Open Source SIEM, EVER?” contains some of my SIEM thinking from 2009. Is it relevant now? Well, you be the judge.  Succeeding with SIEM requires a lot of work, whether you paid for the software, or not. BTW, this post has an amazing “staying power” that is hard to explain – I suspect it has to do with people wanting “free stuff” and googling for “open source SIEM” …  [223 pageviews]
  2. Top 10 Criteria for a SIEM?” came from one of my last projects I did when running my SIEM consulting firm in 2009-2011 (for my recent work on evaluating SIEM, see this document [2015 update]) [116 pageviews]
  3. “New SIEM Whitepaper on Use Cases In-Depth OUT!” (dated 2010) presents a whitepaper on select SIEM use cases described in depth with rules and reports [using now-defunct SIEM product]; also see this SIEM use case in depth and this for a more current list of popular SIEM use cases. Finally, see our new 2016 research on security monitoring use cases here! [84 pageviews]
  4. “A Myth of An Expert Generalist” is a fun rant on what I think it means to be “a security expert” today; it argues that you must specialize within security to really be called an expert [80 pageviews]
  5. Simple Log Review Checklist Released!” is often at the top of this list – this aging checklist is still a very useful tool for many people. “On Free Log Management Tools” is a companion to the checklist (updated version) [78 pageviews of total 3857 pageviews to all blog pages]
In addition, I’d like to draw your attention to a few recent posts from my Gartner blog [which, BTW, now has about 3X the traffic of this blog]: 
Current research on IR:
Current research on EDR:
Miscellaneous fun posts:

(see all my published Gartner research here)
Also see my past monthly and annual “Top Popular Blog Posts” – 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015.
Disclaimer: most content at SecurityWarrior blog was written before I joined Gartner on Aug 1, 2011 and is solely my personal view at the time of writing. For my current security blogging, go here.

Previous post in this endless series:

by Anton Chuvakin ( at May 03, 2016 08:00 PM

Carl Chenet

Feed2tweet, your RSS feed to Twitter Python self-hosted app

Feed2tweet is a self-hosted Python app to send you RSS feed to Twitter.

Feed2tweet is in production for Le Journal du hacker, a French Hacker News-style FOSS website and, the job board of the French-speaking FOSS community.


Feed2tweet 0.3 now only runs with Python 3.  It also fixes a nasty bug with RSS feeds modifying the RSS entry orders. Have a look at the Feed2tweet 0.3 changelog:

Using Feed2tweet? Send us bug reports/feature requests/push requests/comments about it!

by Carl Chenet at May 03, 2016 04:15 PM

May 02, 2016

Vincent Bernat

Pragmatic Debian packaging

While the creation of Debian packages is abundantly documented, most tutorials are targeted to packages implementing the Debian policy. Moreover, Debian packaging has a reputation of being unnecessarily difficult1 and many people prefer to use less constrained tools2 like fpm or CheckInstall.

However, I would like to show how building Debian packages with the official tools can become straightforward if you bend some rules:

  1. No source package will be generated. Packages will be built directly from a checkout of a VCS repository.

  2. Additional dependencies can be downloaded during build. Packaging individually each dependency is a painstaking work, notably when you have to deal with some fast-paced ecosystems like Java, Javascript and Go.

  3. The produced packages may bundle dependencies. This is likely to raise some concerns about security and long-term maintenance, but this is a common trade-off in many ecosystems, notably Java, Javascript and Go.

Pragmatic packages 101§

In the Debian archive, you have two kinds of packages: the source packages and the binary packages. Each binary package is built from a source package. You need a name for each package.

As stated in the introduction, we won’t generate a source package but we will work with its unpacked form which is any source tree containing a debian/ directory. In our examples, we will start with a source tree containing only a debian/ directory but you are free to include this debian/ directory into an existing project.

As an example, we will package memcached, a distributed memory cache. There are four files to create:

  • debian/compat,
  • debian/changelog,
  • debian/control, and
  • debian/rules.

The first one is easy. Just put 9 in it:

echo 9 > debian/compat

The second one has the following content:

memcached (0-0) UNRELEASED; urgency=medium

  * Fake entry

 -- Happy Packager <>  Tue, 19 Apr 2016 22:27:05 +0200

The only important information is the name of the source package, memcached, on the first line. Everything else can be left as is as it won’t influence the generated binary packages.

The control file§

debian/control describes the metadata of both the source package and the generated binary packages. We have to write a block for each of them.

Source: memcached
Maintainer: Vincent Bernat <>

Package: memcached
Architecture: any
Description: high-performance memory object caching system

The source package is called memcached. We have to use the same name as in debian/changelog.

We generate only one binary package: memcached. In the remaining of the example, when you see memcached, this is the name of a binary package. The Architecture field should be set to either any or all. Use all exclusively if the package contains only arch-independent files. In doubt, just stick to any.

The Description field contains a short description of the binary package.

The build recipe§

The last mandatory file is debian/rules. It’s the recipe of the package. We need to retrieve memcached, build it and install its file tree in debian/memcached/. It looks like this:

#!/usr/bin/make -f

DISTRIBUTION = $(shell lsb_release -sr)
VERSION = 1.4.25
TARBALL = memcached-$(VERSION).tar.gz

    dh $@

    wget -N --progress=dot:mega $(URL)
    tar --strip-components=1 -xf $(TARBALL)
    ./configure --prefix=/usr
    make install DESTDIR=debian/memcached

    dh_gencontrol -- -v$(PACKAGEVERSION)

The empty targets override_dh_auto_clean, override_dh_auto_test and override_dh_auto_build keep debhelper from being too smart. The override_dh_gencontrol target sets the package version3 without updating debian/changelog. If you ignore the slight boilerplate, the recipe is quite similar to what you would have done with fpm:

DISTRIBUTION=$(lsb_release -sr)

wget -N --progress=dot:mega ${URL}
tar --strip-components=1 -xf ${TARBALL}
./configure --prefix=/usr
make install DESTDIR=/tmp/installdir

# Build the final package
fpm -s dir -t deb \
    -n memcached \
    -C /tmp/installdir \
    --description "high-performance memory object caching system"

You can review the whole package tree on GitHub and build it with dpkg-buildpackage -us -uc -b.

Pragmatic packages 102§

At this point, we can iterate and add several improvements to our memcached package. None of those are mandatory but they are usually worth the additional effort.

Build dependencies§

Our initial build recipe only work when several packages are installed, like wget and libevent-dev. They are not present on all Debian systems. You can easily express that you need them by adding a Build-Depends section for the source package in debian/control:

Source: memcached
Build-Depends: debhelper (>= 9),
               wget, ca-certificates, lsb-release,

Always specify the debhelper (>= 9) dependency as we heavily rely on it. We don’t require make or a C compiler because it is assumed that the build-essential meta-package is installed and it pulls those. dpkg-buildpackage will complain if the dependencies are not met. If you want to install those packages from your CI system, you can use the following command4:

mk-build-deps \
    -t 'apt-get -o Debug::pkgProblemResolver=yes --no-install-recommends -qqy' \
    -i -r debian/control

You may also want to investigate pbuilder or sbuild, two tools to build Debian packages in a clean isolated environment.

Runtime dependencies§

If the resulting package is installed on a freshly installed machine, it won’t work because it will be missing libevent, a required library for memcached. You can express the dependencies needed by each binary package by adding a Depends field. Moreover, for dynamic libraries, you can automatically get the right dependencies by using some substitution variables:

Package: memcached
Depends: ${misc:Depends}, ${shlibs:Depends}

The resulting package will contain the following information:

$ dpkg -I ../memcached_1.4.25-0\~unstable0_amd64.deb | grep Depends
 Depends: libc6 (>= 2.17), libevent-2.0-5 (>= 2.0.10-stable)

Integration with init system§

Most packaged daemons come with some integration with the init system. This integration ensures the daemon will be started on boot and restarted on upgrade. For Debian-based distributions, there are several init systems available. The most prominent ones are:

  • System-V init is the historical init system. More modern inits are able to reuse scripts written for this init, so this is a safe common denominator for packaged daemons.
  • Upstart is the less-historical init system for Ubuntu (used in Ubuntu 14.10 and previous releases).
  • systemd is the default init system for Debian since Jessie and for Ubuntu since 15.04.

Writing a correct script for the System-V init is error-prone. Therefore, I usually prefer to provide a native configuration file for the default init system of the targeted distribution (Upstart and systemd).


If you want to provide a System-V init script, have a look at /etc/init.d/skeleton on the most ancient distribution you want to target and adapt it5. Put the result in debian/memcached.init. It will be installed at the right place, invoked on install, upgrade and removal. On Debian-based systems, many init scripts allow user customizations by providing a /etc/default/memcached file. You can ship one by putting its content in debian/memcached.default.


Providing an Upstart job is similar: put it in debian/memcached.upstart. For example:

description "memcached daemon"

start on runlevel [2345]
stop on runlevel [!2345]
respawn limit 5 60
expect daemon

  . /etc/default/memcached
  exec memcached -d -u $USER -p $PORT -m $CACHESIZE -c $MAXCONN $OPTIONS
end script

When writing an Upstart job, the most important directive is expect. Be sure to get it right. Here, we use expect daemon and memcached is started with the -d flag.


Providing a systemd unit is a bit more complex. The content of the file should go in debian/memcached.service. For example:

Description=memcached daemon

ExecStart=/usr/bin/memcached -d -u $USER -p $PORT -m $CACHESIZE -c $MAXCONN $OPTIONS


We reuse /etc/default/memcached even if it is not considered a good practice with systemd6. Like for Upstart, the directive Type is quite important. We used forking as memcached is started with the -d flag.

You also need to add a build-dependency to dh-systemd in debian/control:

Source: memcached
Build-Depends: debhelper (>= 9),
               wget, ca-certificates, lsb-release,

And you need to modify the default rule in debian/rules:

    dh $@ --with systemd

The extra complexity is a bit unfortunate but systemd integration is not part of debhelper7. Without those additional modifications, the unit will get installed but you won’t get a proper integration and the service won’t be enabled on install or boot.

Dedicated user§

Many daemons don’t need to run as root and it is a good practice to ship a dedicated user. In the case of memcached, we can provide a _memcached user8.

Add a debian/memcached.postinst file with the following content:


set -e

case "$1" in
        adduser --system --disabled-password --disabled-login --home /var/empty \
                --no-create-home --quiet --force-badname --group _memcached


exit 0

There is no cleanup of the user when the package is removed for two reasons:

  1. Less stuff to write.
  2. The user could still own some files.

The utility adduser will do the right thing whatever the requested user already exists or not. You need to add it as a dependency in debian/control:

Package: memcached
Depends: ${misc:Depends}, ${shlibs:Depends}, adduser

The #DEBHELPER# marker is important as it will be replaced by some code to handle the service configuration files (or some other stuff).

You can review the whole package tree on GitHub and build it with dpkg-buildpackage -us -uc -b.

Pragmatic packages 103§

It is possible to leverage debhelper to reduce the recipe size and to make it more declarative. This section is quite optional and it requires understanding a bit more how a Debian package is built. Feel free to skip it.

The big picture§

There are four steps to build a regular Debian package:

  1. debian/rules clean should clean the source tree to make it pristine.

  2. debian/rules build should trigger the build. For an autoconf-based software, like memcached, this step should execute something like ./configure && make.

  3. debian/rules install should install the file tree of each binary package. For an autoconf-based software, this step should execute make install DESTDIR=debian/memcached.

  4. debian/rules binary will pack the different file trees into binary packages.

You don’t directly write each of those targets. Instead, you let dh, a component of debhelper, do most of the work. The following debian/rules file should do almost everything correctly with many source packages:

#!/usr/bin/make -f
    dh $@

For each of the four targets described above, you can run dh with --no-act to see what it would do. For example:

$ dh build --no-act

Each of those helpers has a manual page. Helpers starting with dh_auto_ are a bit “magic”. For example, dh_auto_configure will try to automatically configure a package prior to building: it will detect the build system and invoke ./configure, cmake or Makefile.PL.

If one of the helpers do not do the “right” thing, you can replace it by using an override target:

    ./configure --with-some-grog

Those helpers are also configurable, so you can just alter a bit their behaviour by invoking them with additional options:

    dh_auto_configure -- --with-some-grog

This way, ./configure will be called with your custom flag but also with a lot of default flags like --prefix=/usr for better integration.

In the initial memcached example, we overrode all those “magic” targets. dh_auto_clean, dh_auto_configure and dh_auto_build are converted to no-ops to avoid any unexpected behaviour. dh_auto_install is hijacked to do all the build process. Additionally, we modified the behavior of the dh_gencontrol helper by forcing the version number instead of using the one from debian/changelog.

Automatic builds§

As memcached is an autoconf-enabled package, dh knows how to build it: ./configure && make && make install. Therefore, we can let it handle most of the work with this debian/rules file:

#!/usr/bin/make -f

DISTRIBUTION = $(shell lsb_release -sr)
VERSION = 1.4.25
TARBALL = memcached-$(VERSION).tar.gz

    dh $@ --with systemd

    wget -N --progress=dot:mega $(URL)
    tar --strip-components=1 -xf $(TARBALL)

    # Don't run the whitespace test
    rm t/whitespace.t

    dh_gencontrol -- -v$(PACKAGEVERSION)

The dh_auto_clean target is hijacked to download and setup the source tree9. We don’t override the dh_auto_configure step, so dh will execute the ./configure script with the appropriate options. We don’t override the dh_auto_build step either: dh will execute make. dh_auto_test is invoked after the build and it will run the memcached test suite. We need to override it because one of the test is complaining about odd whitespaces in the debian/ directory. We suppress this rogue test and let dh_auto_test executes the test suite. dh_auto_install is not overriden either, so dh will execute some variant of make install.

To get a better sense of the difference, here is a diff:

--- memcached-intermediate/debian/rules 2016-04-30 14:02:37.425593362 +0200
+++ memcached/debian/rules  2016-05-01 14:55:15.815063835 +0200
@@ -12,10 +12,9 @@
    wget -N --progress=dot:mega $(URL)
    tar --strip-components=1 -xf $(TARBALL)
-   ./configure --prefix=/usr
-   make
-   make install DESTDIR=debian/memcached
+   # Don't run the whitespace test
+   rm t/whitespace.t
+   dh_auto_test

It is up to you to decide if dh can do some work for you, but you could try to start from a minimal debian/rules and only override some targets.

Install additional files§

While make install installed the essential files for memcached, you may want to put additional files in the binary package. You could use cp in your build recipe, but you can also declare them:

  • files listed in debian/ will be copied to /usr/share/doc/memcached by dh_installdocs,
  • files listed in debian/memcached.examples will be copied to /usr/share/doc/memcached/examples by dh_installexamples,
  • files listed in debian/memcached.manpages will be copied to the appropriate subdirectory of /usr/share/man by dh_installman,

Here is an example using wildcards for debian/


If you need to copy some files to an arbitrary location, you can list them along with their destination directories in debian/memcached.install and dh_install will take care of the copy. Here is an example:

scripts/memcached-tool usr/bin

Using those files make the build process more declarative. It is a matter of taste and you are free to use cp in debian/rules instead. You can review the whole package tree on GitHub.

Other examples§

The GitHub repository contains some additional examples. They all follow the same scheme:

  • dh_auto_clean is hijacked to download and setup the source tree
  • dh_gencontrol is modified to use a computed version

Notably, you’ll find daemons in Java, Go, Python and Node.js. The goal of those examples is to demonstrate that using Debian tools to build Debian packages can be straightforward. Hope this helps.

  1. People may remember the time before debhelper 7.0.50 (circa 2009) where debian/rules was a daunting beast. However, nowaday, the boilerplate is quite reduced. 

  2. The complexity is not the only reason. Those alternative tools enable the creation of RPM packages, something that Debian tools obviously don’t. 

  3. There are many ways to version a package. Again, if you want to be pragmatic, the proposed solution should be good enough for Ubuntu. On Debian, it doesn’t cover upgrade from one distribution version to another, but we assume that nowadays, systems get reinstalled instead of being upgraded. 

  4. You also need to install devscripts and equivs package. 

  5. It’s also possible to use a script provided by upstream. However, there is no such thing as an init script that works on all distributions. Compare the proposed with the skeleton, check if it is using start-stop-daemon and if it sources /lib/lsb/init-functions before considering it. If it seems to fit, you can install it yourself in debian/memcached/etc/init.d/. debhelper will ensure its proper integration. 

  6. Instead, a user wanting to customize the options is expected to edit the unit with systemctl edit

  7. See #822670 

  8. The Debian Policy doesn’t provide any hint for the naming convention of those system users. A common usage is to prefix the daemon name with an underscore (like _memcached). Another common usage is to use Debian- as a prefix. The main drawback of the latest solution is that the name is likely to be replaced by the UID in ps and top because of its length. 

  9. We could call dh_auto_clean at the end of the target to let it invoke make clean. However, it is assumed that a fresh checkout is used before each build. 

by Vincent Bernat at May 02, 2016 07:25 PM

April 30, 2016


Resumè of failure

There has been a Thing going through twitter lately, about a Princeton Prof who posted a resume of failures.

About that...

This is not a bad idea, especially for those of us in Ops or bucking for 'Senior' positions. Why? Because in my last job hunt, a very large percentage of interviewers asked a question like this:

Please describe your biggest personal failure, and what you learned from it?

That's a large scope. How to pick which one?

What was your biggest interpersonal failure, and how did you recover from it?

In a 15+ year career, figuring out which is 'biggest' is a challenge. But first, I need to remember what they are. For this one, I've been using events that happened around 2001; far enough back that they don't really apply to the person I am now. This is going to be a problem soon.

What was your biggest production-impacting failure, and how was the post-mortem process handled?

Oh, do I go for 'most embarrassing,' or, 'most educational'? I'd snark about 'so many choices', but my memory tends to wallpaper over the more embarassing fails in ways that make remembering them during an interview next to impossible. And in this case, the 'post-mortem process' bit at the end actually rules out my biggest production-impacting problem... there wasn't a post-mortem, other than knowing looks of, you're not going to do that again, right?

Please describe your biggest failure of working with management on something.

Working in service-organizations as long as I have, I have a lot of 'failure' here. Again, picking the right one to use in an interview is a problem.

You begin to see what I'm talking about here. If I had realized that my failures would be something I needed to both keep track of, and keep adequate notes on to refer back to them 3, 5, 9, 14 years down the line, I would have been much better prepared for these interviews. The interviewers are probing how I behave when Things Are Not Going Right, since that sheds far more light on a person than Things Are Going Perfectly projects.

A Resumè of Failure would have been an incredibly useful thing to have. Definitely do not post it online, since hiring managers are looking for rule-outs to thin the pile of applications. But keep it next to your copy of your resume, next to your References from Past Managers list.

by SysAdmin1138 at April 30, 2016 01:53 PM

April 26, 2016

Ansible-cmdb v1.14: Generate a host overview of Ansible facts.

I've just released ansible-cmdb v1.14. Ansible-cmdb takes the output of Ansible's fact gathering and converts it into a static HTML overview page containing system configuration information. It supports multiple templates and extending information gathered by Ansible with custom data.

This release includes the following bugfixes and feature improvements:

  • Look for ansible.cfg and use hostfile setting.
  • html_fancy: Properly sort vcpu and ram columns.
  • html_fancy: Remember which columns the user has toggled.
  • html_fancy: display groups and hostvars even if no host information was collected.
  • html_fancy: support for facter and custom facts.
  • html_fancy: Hide sections if there are no items for it.
  • html_fancy: Improvements in the rendering of custom variables.
  • Apply Dynamic Inventory vars to all hostnames in the group.
  • Many minor bugfixes.

As always, packages are available for Debian, Ubuntu, Redhat, Centos and other systems. Get the new release from the Github releases page.

by admin at April 26, 2016 01:30 PM

April 25, 2016

The Geekess

White Corporate Feminism

When I first went to the Grace Hopper Celebration of Women in Computing conference, it was magical. Being a woman in tech means I’m often the only woman in a team of male engineers, and if there’s more than one woman on a team, it’s usually because we have a project manager or marketing person who is a woman.

Going to the Grace Hopper conference, and being surrounded by women engineers and women computer science students, allowed me to relax in a way that I could never do in a male-centric space. I could talk with other women who just understood things like the glass ceiling and having to be constantly on guard in order to “fit in” with male colleagues. I had crafted a persona, an armor of collared shirts and jeans, trained myself to interrupt in order to make my male colleagues listen, and lied to myself and others that I wasn’t interested in “girly” hobbies like sewing or knitting. At Grace Hopper, surrounded by women, I could stop pretending, and try to figure out how to just be myself. To take a breath, stop interrupting, and cherish the fact that I was listened to and given space to listen to others.

However, after a day or so, I began to feel uneasy about two particular aspects of the Grace Hopper conference. I felt uneasy watching how aggressively the corporate representatives at the booths tried to persuade the students to join their companies. You couldn’t walk into the ballroom for keynotes without going through a gauntlet of recruiters. When I looked around the ballroom at the faces of the women surrounding me, I realized the second thing that made me uneasy. Even though Grace Hopper was hosted in Atlanta that year, a city that is 56% African American, there weren’t that many women of color attending. We’ve also seen the Grace Hopper conference feature more male keynote speakers, which is problematic when the goal of the conference is to allow women to connect to role models that look like them.

When I did a bit of research for this blog post, I looked at the board member list for Anita Borg Institute, who organizes the Grace Hopper Conference. I was unsurprised to see major corporate executives hold the majority of Anita Borg Institute board seats. However, I was curious why the board member page had no pictures on it. I used Google Image search in combination with the board member’s name and company to create this image:

My unease was recently echoed by Cate Huston, who also noticed the trend towards corporations trying to co-opt women’s only spaces to feed women into their toxic hiring pipeline. Last week, I also found this excellent article on white feminism, and how white women need to allow people of color to speak up about the problematic aspects of women-only spaces. There was also an interesting article last week about how “women’s only spaces” can be problematic for trans women to navigate if they don’t “pass” the white-centric standard of female beauty. The article also discusses that by promoting women-only spaces as “safe”, we are unintentially promoting the assumption that women can’t be predators, unconsciously sending the message to victims of violent or abusive women that they should remain silent about their abuse.

So how to do we expand women-only spaces to be more inclusive, and move beyond white corporate feminism? It starts with recognizing the problem often lies with the white women who start initiatives, and fail to bring in partners who are people of color. We also need to find ways to fund inclusive spaces and diversity efforts without big corporate backers.

We also need to take a critical look at how well-meaning diversity efforts often center around improving tech for white women. When you hear a white male say, “We need more women in X community,” take a moment to question them on why they’re so focused on women and not also bringing in more people of color, people who are differently abled, or LGBTQ people. We need to figure out how to expand the conversation beyond white women in tech, both in external conversations, and in our own projects.

One of the projects I volunteer for is Outreachy, a three-month paid internship program to increase diversity in open source. In 2011, the coordinators were told the language around encouraging “only women” to apply wasn’t trans-inclusive, so they changed the application requirements to clarify the program was open to both cis and trans women. In 2013, they clarified that Outreachy was also open to trans men and gender queer people. Last year, we wanted to open the program to men who were traditionally underrepresented in tech. After taking a long hard look at the statistics, we expanded the program to include all people in the U.S. who are Black/African American, Hispanic/Latin@, American Indian, Alaska Native, Native Hawaiian, or Pacific Islander. We want to expand the program to additional people who are underrepresented in tech in other countries, so please contact us if you have good sources of diversity data for your country.

But most importantly, white people need to learn to listen to people of color instead of being a “white savior”. We need to believe people of color’s lived experience, amplify their voices when people of color tell us they feel isolated in tech, and stop insisting “not all white women” when people of color critique a problematic aspect of the feminist community.

Trying to move into more intersectional feminism is one of my goals, which is why I’m really excited to speak at the Richard Tapia Celebration of Diversity in Computing. I hadn’t heard of it until about a year ago (probably because they have less corporate sponsorship and less marketing), but it’s been described to me as “Grace Hopper for people of color”. I’m excited to talk to people about open source and Outreachy, but most importantly, I want to go and listen to people who have lived experiences that are different from mine, so I can promote their voices.

If you can kick in a couple dollars a month to help me cover costs for the conference, please donate on my Patreon. I’ll be writing about the people I meet at Tapia on my blog, so look for a follow-up post in late September!

by sarah at April 25, 2016 04:25 PM

April 23, 2016

That grumpy BSD guy

Does Your Email Provider Know What A "Joejob" Is?

Anecdotal evidence seems to indicate that Google and possibly other mail service providers are either quite ignorant of history when it comes to email and spam, or are applying unsavoury tactics to capture market dominance.

The first inklings that Google had reservations about delivering mail coming from my domain came earlier this year, when I was contacted by friends who have left their email service in the hands of Google, and it turned out that my replies to their messages did not reach their recipients, even when my logs showed that the Google mail servers had accepted the messages for delivery.

Contacting Google about matters like these means you first need to navigate some web forums. In this particular case (I won't give a direct reference, but a search on the likely keywords will likely turn up the relevant exchange), the denizens of that web forum appeared to be more interested in demonstrating their BOFHishness than actually providing input on debugging and resolving an apparent misconfiguration that was making valid mail disappear without a trace after it had entered Google's systems.

The forum is primarily intended as a support channel for people who host their mail at Google (this becomes very clear when you try out some of the web accessible tools to check domains not hosted by Google), so the only practial result was that I finally set up DKIM signing for outgoing mail from the domain, in addition to the SPF records that were already in place. I'm in fact less than fond of either of these SMTP addons, but there were anyway other channels for contact with my friends, and I let the matter rest there for a while.

If you've read earlier instalments in this column, you will know that I've operated with an email service since 2004 and a handful of other domains from some years before the domain was set up, sharing to varying extents the same infrastructure. One feature of the and associated domains setup is that in 2006, we started publishing a list of known bad addresses in our domains, that we used as spamtrap addresses as well as publising the blacklist that the greytrapping generates.

Over the years the list of spamtrap addresses -- harvested almost exclusively from records in our logs and greylists of apparent bounces of messages sent with forged From: addresses in our domains - has grown to a total of 29757 spamtraps, a full 7387 in the domain alone. At the time I'm writing this 31162 hosts have attempted to deliver mail to one of those spamtrap addresses. The exact numbers will likely change by the time you read this -- blacklisted addresses expire 24 hours after last contact, and new spamtrap addresses generally turn up a few more each week. With some simple scriptery, we pick them out of logs and greylists as they appear, and sometimes entire days pass without new candidates appearing. For a more general overview of how I run the blacklist, see this post from 2013.

In addition to the spamtrap addresses, the domain has some valid addresses including my own, and I've set up a few addresses for specific purposes (actually aliases), mainly set up so I can filter them into relevant mailboxes at the receiving end. Despite all our efforts to stop spam, occasionally spam is delivered to those aliases too (see eg the ethics of running the traplist page for some amusing examples).

Then this morning a piece of possibly well intended but actually quite clearly unwanted commercial email turned up, addressed to one of those aliases. For no actually good reason, I decided to send an answer to the message, telling them that whoever sold them the address list they were using were ripping them off.

That message bounced, and it turns out that the domain was hosted at Google.

Reading that bounce message is quite interesting, because if you read the page they link to, it looks very much like whoever runs Google Mail doesn't know what a joejob is.

The page, which again is intended mainly for Google's own customers, specifies that you should set up SPF and DKIM for domains. But looking at the headers, the message they reject passes both those criteria:

Received-SPF: pass ( domain of designates 2001:16d8:ff00:1a9::2 as permitted sender) client-ip=2001:16d8:ff00:1a9::2;
dkim=pass (test mode);
spf=pass ( domain of designates 2001:16d8:ff00:1a9::2 as permitted sender)
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;; s=x;
h=Content-Transfer-Encoding:Content-Type:In-Reply-To:MIME-Version:Date:Message-ID:From:References:To:Subject; bh=OonsF8beQz17wcKmu+EJl34N5bW6uUouWw4JVE5FJV8=;

Then for reasons known only to themselves, or most likely due to the weight they assign to some unknown data source, they reject the message anyway.

We do not know what that data source is. But with more than seven thousand bogus addresses that have generated bounces we've registered it's likely that the number of invalid From: addresses Google's systems has seen is far larger than the number of valid ones. The actual number of bogus addresses is likely higher, though: in the early days the collection process had enough manual steps that we're bound to have missed some. Valid addresses that do not eventually resolve to a mailbox I read are rare if not entirely non-existent. But the 'bulk mail' classification is bizarre if you even consider checking Received: headers.

The reason Google's systems most likely has seen more bogus From: addresses than valid ones is that by historical accident faking sender email addresses in SMTP dialogues is trivial.

Anecdotal evidence indicates that if a domain exists it will sooner or later be used in the from: field of some spam campaign where the messages originate somewhere else completely, and for that very reason the SPF and DKIM mechanisms were specified. I find both mechanisms slightly painful and inelegant, but used in their proper context, they do have their uses.

For the domains I've administered, we started seeing log entries, and in the cases where the addresses were actually deliverable, actual bounce messages for messages that definitely did not originate at our site and never went through our infrastructure a long time before was created. We didn't even think about recording those addresses until a practical use for them suddenly appeared with the greytrapping feature in OpenBSD 3.3 in 2003.

A little while after upgrading the relevant systems to OpenBSD 3.3, we had a functional greytrapping system going, at some point before the 2007 blog post I started publishing the generated blacklist. The rest is, well, what got us to where we are today.

From the data we see here, mail sent with faked sender addresses happens continuously and most likely to all domains, sooner or later. Joejobs that actually hit deliverable addresses happen too. Raw data from a campaign in late 2014 that used my main address as the purported sender is preserved here, collected with a mind to writing an article about the incident and comparing to a similar batch from 2008. That article could still be written at some point, and in the meantime the messages and specifically their headers are worth looking into if you're a bit like me. (That is, if you get some enjoyment out of such things as discovering the mindbogglingly bizarre and plain wrong mail configurations some people have apparently chosen to live with. And unless your base64 sightreading skills are far better than mine, some of the messages may require a bit of massaging with MIME tools to extract text you can actually read.)

Anyone who runs a mail service and bothers even occasionally to read mail server logs will know that joejobs and spam campaigns with fake and undeliverable return addresses happen all the time. If Google's mail admins are not aware of that fact, well, I'll stop right there and refuse to believe that they can be that incompentent.

The question then becomes, why are they doing this? Are they giving other independent operators the same treatment? If this is part of some kind of intimidation campaign (think "sign up for our service and we'll get your mail delivered, but if you don't, delivering to domains that we host becomes your problem"). I would think a campaign of intimidation would be a less than useful strategy when there are alread antitrust probes underway, these things can change direction as discoveries dictate.

Normally I would put oddities like the ones I saw in this case down to a silly misconfiguration, some combination of incompetence and arrogance and, quite possibly, some out of control automation thrown in. But here we are seeing clearly wrong behavior from a company that prides itself in hiring only the smartest people they can find. That doesn't totally rule out incompetence or plain bad luck, but it makes for a very strange episode. (And lest we forget, here is some data on a previous episode involving a large US corporation, spam and general silliness. (Now with the external link fixed via the wayback machine.))

One other interesting question is whether other operators, big and small behave in any similar ways. If you have seen phenomena like this involving Google or other operators, I would like to hear from you by email (easily found in this article) or in comments.

Update 2016-05-03: With Google silent on all aspects of this story, it is not possible pinpoint whether the public whining or something else that made a difference, but today a message aimed at one of those troublesome domains made it through to its intended recipient.

Much like the mysteriously disappeared messages, my logs show a "Message accepted for delivery" response from the Google servers, but unlike the earlier messages, this actually appeared in the intended inbox. In the meantime one of one of the useful bits of feedback I got on the article was that my IPv6 reverse DNS was in fact lacking. Today I fixed that, or at least made a plausible attempt to (you're very welcome to check with tools available to you). And for possibly unrelated reasons, my test message made it through all the way to the intended recipient's mailbox.

[Opinions offered here are my own and may or may not reflect the views of my employer.]

I will be giving a PF tutorial at BSDCan 2016, and I welcome your questions now that I'm revising the material for that session. See this blog post for some ideas (note that I'm only giving the PF tutorial this time around).

by Peter N. M. Hansteen ( at April 23, 2016 05:56 PM

April 21, 2016

Michael Biven

Our Resistance to Change has Sent Us the Long Way Round to Where We’re Going

Eight years after Mark Mayo wrote about how cloud computing would change our profession pretty much everything still applies and people are still trying to adapt to the changes he highlighted. The post is primarily about the changes that individual systems administrators would be facing, but it also describes how the way we work would need to adapt.

The problem is that on the operations and systems side of web development our overall reaction to this was slow and I think we confused what the actual goal should be with something that’s was only temporary.

Configuration Management is the configure; make; make install of infrastructure.

The big change that happened during that time was the public cloud became a reality and we were able to programmatically interact with the infrastructure. That change opened up the possibility of adopting software development best practices. The reaction to this was better configuration management with tools like Puppet and Chef.

Five years after (2013) Mark’s post people started working on containers again along with a new read-only OS. Just as the public cloud was important by bringing an API to interact and manage an environment, Docker and CoreOS provided the opportunity for the early adopters to start thinking about delivering artifacts instead of updating existing things with config management. Immutable infrastructure finally became practical. Years ago we gave up on manually building software and started deploying packages. Configure; make; make install was replaced with rpm, apt-get, or yum.

We stopped deploying a process that would be repeated either manually or scripted and replaced it with the results. That’s what we’ve been missing. We set our sights lower and settled on just more manual work. We would never pull down a diff and do a make install. We would just update to the latest package. Puppet, Chef, Ansible and Salt have held us back.

Wasn’t a benefit of infrastructure as code the idea that we could move the environment into a CI/CD pipeline as a first class citizen?

The build pipelines that has existed in software development is now really a thing for infrastructure. Testing and a way (Packer and Docker ) to create artifacts has been available for a while, but now we have tools ( Terraform, Mesos, and Kubernetes ) to manage and orchestrate how those things are used.

You better learn to code, because we only need so many people to write documentation.

There are only two things that we all really work on. Communication and dealing with change. The communication can be writing text or writing code. Sometimes the change is planned and sometimes it’s unplanned. But at the end of the day these two things are really the only things we do. We communicate, perform and react to changes.

If you cannot write code then you can’t communicate, perform and react as effectively as you could. I think it’s safe to say this is why Mark highlighted those three points eight years ago.

  1. We have to accept and plan for a future where getting access to bare metal will be a luxury. Clients won’t want to pay for it, and frankly we won’t have the patience or time to deal with waiting more than 5 minutes for any infrastructure soon anyways.
  2. We need to learn to program better, with richer languages.
  3. Perhaps the biggest challenge we’re going to have is one we share with developers: How to deal with a world where everything is distributed, latency is wildly variable, where we have to scale for throughput more than anything else. I believe we’re only just starting to see the need for “scale” on the web and in systems work.

– Mark Mayo, Cloud computing is a sea change - How sysadmins can prepare

April 21, 2016 02:47 PM

April 19, 2016


Hiera Node Classifier 0.7

A while ago I released a Puppet 4 Hiera based node classifier to see what is next for hiera_include(). This had the major drawback that you couldn’t set an environment with it like with a real ENC since Puppet just doesn’t have that feature.

I’ve released a update to the classifier that now include a small real ENC that takes care of setting the environment based on certname and then boots up the classifier on the node.


ENCs tend to know only about the certname, you could imagine getting most recent seen facts from PuppetDB etc but I do not really want to assume things about peoples infrastructure. So for now this sticks to supporting classification based on certname only.

It’s really pretty simple, lets assume you are wanting to classify, you just need to have a (or JSON) file somewhere in a path. Typically this is going to be in a directory environment somewhere but could of course also be a site wide hiera directory.

In it you put:

classifier::environment: development

And this will node will form part of that environment. Past that everything in the previous post just applies so you make rules or assign classes as normal, and while doing so you have full access to node facts.

The classifier now expose some extra information to help you determine if the ENC is in use and based on what file it’s classifying the node:

  • $classifier::enc_used – boolean that indicates if the ENC is in use
  • $classifier::enc_source – path to the data file that set the environment. undef when not found
  • $classifier::enc_environment – the environment the ENC is setting

It supports a default environment which you configure when configuring Puppet to use a ENC as below.

Configuring Puppet

Configuring Puppet is pretty simple for this:

node_terminus = exec
external_nodes = /usr/local/bin/classifier_enc.rb --data-dir /etc/puppetlabs/code/hieradata --node-pattern nodes/%%.yaml

Apart from these you can do –default development to default to that and not production and you can add –debug /tmp/enc.log to get a bunch of debug output.

The data-dir above is for your classic Hiera single data dir setup, but you can also use globs to support environment data like –data-dir /etc/puppetlabs/code/environments/*/hieradata. It will now search the entire glob until it finds a match for the certname.

That’s really all there is to it, it produce a classification like this:

environment: production
    enc_used: true
    enc_source: /etc/puppetlabs/code/hieradata/node.example.yaml
    enc_environment: production


That’s really all there is to it, I think this might hit a high percentage of user cases and bring a key ability to the hiera classifiers. It’s a tad annoying there is no way really to do better granularity than just per node here, I might come up with something else but don’t really want to go too deep down that hole.

In future I’ll look about adding a class to install the classifier into some path and configure Puppet, for now that’s up to the user. It’s shipped in the bin dir of the module.

by R.I. Pienaar at April 19, 2016 03:14 PM

April 18, 2016


Two Column for Loop in bash

I had this interesting question the other day. Someone had a file with two columns of data in it. They wanted to assign each column to a different variable and then take action using those two variables. Here's the example I wrote for them:

for LINE in $(cat data_file); do
    VARA=$(echo ${LINE} | awk '{ print $1}')
    VARB=$(echo ${LINE} | awk '{ print $2 }')
    echo "VARA is ${VARA}"
    echo "VARB is ${VARB}"

The key here is to set the internal field separator ($IFS) to $'n' so that the for loop interates on lines rather than words. The it's simply a matter of splitting the column into individual variables. In this case, I chose to use awk since it's a simple procedure and speed is not really an issue. Long-term, I would probably re-write this using arrays.

by Scott Hebert at April 18, 2016 01:00 PM