Planet SysAdmin

March 25, 2018

OpenStack nova get-password, set-password and post encrypted password to metadata service

When you create images for an OpenStack cloud you want to use 'cloud' features. That's a fancy term for automatic resizing of your instance disk, adding an SSH key, (re)setting passwords and executing scripts on first boot to configure your instance further. OpenStack provides the metadata service for instances, which supplies information to the instance, like its public IP, the SSH public key that was provided, and vendor- or user-provided data like scripts or information. The OpenStack metadata service also allows an instance to post data to an endpoint which can be retrieved with the 'nova get-password' command. It is meant to be an encrypted password (encrypted with the public SSH key) but it can be any plain text as well, and it doesn't have to be the root password. In this guide I'll go over the scripts I use inside Linux images to post a password to the metadata service and the 'nova' commands such as 'set-password' and 'get-password'. That includes decrypting a password with an SSH key that is password-protected (Horizon and nova don't support that) and the 'nova set-password' command, which sets the root password inside an instance when it has the 'qemu-guest-agent' installed and running.
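
As a rough illustration of the posting side, here is a minimal Python sketch (not the actual scripts from the guide); it assumes the requests library and the usual link-local metadata address, and the payload is whatever blob you want 'nova get-password' to return later, ideally encrypted with the instance's SSH public key.

# Minimal sketch: post a password blob to the OpenStack metadata service
# so that it can later be fetched with 'nova get-password'.
# Assumes the standard 169.254.169.254 metadata address; adjust if your cloud differs.
import requests

METADATA_PASSWORD_URL = "http://169.254.169.254/openstack/latest/password"

def post_password(encrypted_blob: bytes) -> None:
    # The metadata service may refuse a second POST once a password has been
    # set, so treat an error here as "already posted" rather than retrying blindly.
    resp = requests.post(METADATA_PASSWORD_URL, data=encrypted_blob, timeout=10)
    resp.raise_for_status()

if __name__ == "__main__":
    post_password(b"<blob encrypted with the instance's SSH public key>")

On the retrieval side, 'nova get-password <server> <private-key>' decrypts the stored blob with the matching private key, while 'nova set-password' works in the other direction via the qemu-guest-agent.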

March 25, 2018 12:00 AM

March 23, 2018

Chris Siebenmann

Some things about Dovecot, its index files, and the IMAP LIST command

We have a backwards compatibility issue with our IMAP server, where people's IMAP roots are $HOME, their home directory, and then clients ask the IMAP server to search all through the IMAP namespace; this causes various bad things to happen, including running out of inodes. The reason we ran out of inodes is that Dovecot maintains some index files for every mailbox it looks at.

We have Dovecot store its index files on our IMAP server's local disk, in /var/local/dovecot/<user>. Dovecot puts these in a hierarchy that mirrors the actual Unix (and IMAP) hierarchy of the mailboxes; if there is a subdirectory Mail in your home directory with a mailbox Drafts, the Dovecot index files will be in .../<user>/Mail/.imap/Drafts/. It follows that you can hunt through someone's Dovecot index files to see what mailboxes their clients have looked at and what their active mailboxes are, although this may tell you less than you think.

(One reason that Dovecot might look at a mailbox is that your client has explicitly asked it to, with an IMAP SELECT command or perhaps an APPEND, COPY, or MOVE operation. However, there are other reasons.)

When I began digging into our IMAP pain and working on our planned migration (which has drastically changed directions since then), I was operating under the charming idea that most clients used IMAP subscriptions and only a few of them asked the IMAP server to inventory everything in sight. One of the reasons for this is that only a few people had huge numbers of Dovecot index files, and I assumed that the two were tied together. It turns out that both sides of this are wrong.

Perhaps I had the idea that it was hard to do an IMAP LIST operation that asked the server to recursively descend through everything under your IMAP root. It isn't; it's trivial. Here's the IMAP command to do it:

m LIST "" "*"

That's all it takes (the unrestricted * is the important bit). The sort of good news is that this operation by itself won't cause Dovecot to actually look at those mailboxes and thus to build index files for them. However, there is a close variant of this LIST command that does force Dovecot to look at each file, because it turns out that you can ask your IMAP server to not just list all your mailboxes but to tell you which ones have unseen messages. Using the LIST-STATUS extension, that looks like this:

m LIST "" "*" RETURN (STATUS (UNSEEN))
Some clients use one LIST version, some use the other, and some seem to use both. Importantly, the standard iOS Mail app appears to use the 'LIST UNSEEN' version at least some of the time. iDevices are popular around the department, and it's not all that easy to find the magic setting for what iOS calls the 'IMAP path prefix'.
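
If you want to poke at this by hand without digging into a client's traffic, here is a small Python sketch using imaplib that issues both forms of the command; the hostname and credentials are placeholders.

# Sketch: issue the unrestricted and the prefix-limited LIST commands,
# roughly what a mail client does behind the scenes.
import imaplib

with imaplib.IMAP4_SSL("imap.example.com") as conn:
    conn.login("someuser", "somepassword")

    # The unrestricted form, equivalent to:  m LIST "" "*"
    typ, mailboxes = conn.list('""', '*')
    print(typ, len(mailboxes), "LIST responses")

    # A limited form, equivalent to:  m LIST "" "mail/*"
    typ, mailboxes = conn.list('""', 'mail/*')
    print(typ, mailboxes[:3])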

For us, a user with a lot of Dovecot index files was definitely someone who had a client with the 'search all through $HOME' problem (especially if the indexes were for things that just aren't plausible mailboxes). However, a user with only a few index files wasn't necessarily someone without the problem, because their client could be using the first version of the LIST command and thus not creating all those tell-tale index files. As far as I know, stock Dovecot has no way of letting you find out about these people.

(We hacked logging in to the Ubuntu version of Dovecot, which involved some annoyances. In theory Dovecot has a plugin system that we might have been able to use for this; in practice, figuring out the plugin API seemed likely to be at least as much work as hacking the Dovecot source directly.)

Sidebar: Limited LISTs

IMAP LIST commands can be limited in two ways, both of which have more or less the same effect for us:

m LIST "" "mail/*"
m LIST "mail/" "*"

For information on what the arguments to the basic LIST command mean, I will refer you to the IMAP RFC. The extended form is discussed in RFC 5819 and is based on things from, I believe, RFC 5258. See also RFC 6154 for the special-use stuff.

(The unofficial IMAP protocol wiki may be something I'll be consulting periodically now that I've stumbled over it, eg this matrix of all of the IMAP RFCs.)

by cks at March 23, 2018 05:54 AM

March 22, 2018

Upcoming Drupal 7 and 8 core highly critical security release: March 28th


If you're getting warnings like these, you might want to make sure you're ready to patch on March 28th.

There will be a security release of Drupal 7.x, 8.3.x, 8.4.x, and 8.5.x on March 28th 2018 between 18:00 -- 19:30 UTC, one week from the publication of this document, that will fix a highly critical security vulnerability.

The Drupal Security Team urges you to reserve time for core updates at that time because exploits might be developed within hours or days.


While Drupal 8.3.x and 8.4.x are no longer supported and we don't normally provide security releases for unsupported minor releases, given the potential severity of this issue, we are providing 8.3.x and 8.4.x releases that include the fix for sites which have not yet had a chance to update to 8.5.0.

Source: Drupal 7 and 8 core highly critical release on March 28th, 2018 PSA-2018-001


by Mattias Geniar at March 22, 2018 08:25 AM

Chris Siebenmann

Why seeing what current attributes a Python object has is hard

Back when I wrote some notes on __slots__ and class hierarchies, I said in passing that there was no simple way to see what attributes an object currently has (I was sort of talking about objects that use __slots__, but it's actually more general). Today, for reasons beyond the scope of this entry, I feel like talking about why things work out this way.

To see where things get tricky, I'll start out by talking about where they're simple. If what we have is some basic struct object and we want to see what fields it has, the most straightforward approach is to look at its __dict__. We can get the same result indirectly by taking the dir() of the object and subtracting the dir() of its class:

>>> class A:
...   def __init__(self):
...      self.a = 10
...      self.b = 20
>>> a = A()
>>> set(dir(a)) - set(dir(a.__class__))
{'b', 'a'}

(This falls out of the definition of dir(), but note that this only works on simple objects that don't do a variety of things.)

The first problem is that neither version of this approach works for instances of classes that use __slots__. Such objects have no __dict__, and if you look at dir() it will tell you that they have no attributes of their own:

>>> class B:
...   __slots__ = ('a', 'b')
...   def __init__(self):
...      self.a = 10
>>> b = B()
>>> set(dir(b)) - set(dir(b.__class__))
set()

This follows straightforwardly from how __slots__ are defined, particularly this bit:

  • __slots__ are implemented at the class level by creating descriptors (Implementing Descriptors) for each variable name. [...]

Descriptors are attributes on the class, not on instances of the class, although they create behavior in those instances. As we can see in dir(), the class itself has a and b attributes:

>>> B.a
<member 'a' of 'B' objects>

(In CPython, these are member_descriptor objects.)

For an instance of a __slots__ using class, we still have a somewhat workable definition of what attributes it has. For each __slots__ attribute, an instance has the attribute if hasattr() is true for it, which means that you can access it. Here our b instance of B has an a attribute but doesn't have a b attribute. You can at least write code that mechanically checks this, although it's a bit harder than it looks.

(One part is that you need the union of __slots__ on all base classes.)

However, we've now arrived at the tricky bit. Suppose that we have a general property on a class under the name par. When should we say that instances of this class have a par attribute? In one sense, instances never will, because at the mechanical level par will always be a class attribute and will never appear in an instance __dict__. In another sense, we could reasonably say that instances have a par attribute when hasattr() is true for it, ie when accessing inst.par won't raise AttributeError; this is the same definition as we used for __slots__ attributes. Or we might want to be more general and say that an attribute only 'exists' for our purposes when accessing it doesn't raise any errors, not just AttributeError (after all, this is when we can use the attribute). But what if this property actually computes the value for par on the fly from somewhere, in effect turning an attribute into a method; do we say that par is still an attribute of the instance, even though it doesn't really act like an attribute any more?

Python has a lot of ways to attach sophisticated behavior to instances of classes that's triggered when you try to access an attribute in some way. Once we have such sophisticated behavior in action, there's no clear or universal definition of when an instance 'has' an attribute and it becomes a matter of interpretation and opinion. This is one deep cause of why there's no simple way to see what attributes an object currently has; once we get past the simple cases, it's not even clear what the question means.

(Even if we come up with a meaning for ourselves, classes that define __getattr__ or __getattribute__ make it basically impossible to see what attribute names we want to check, as the dir() documentation gently notes. There are many complications here.)

Sidebar: The pragmatic answer

The pragmatic answer is that if it's sensible to ask this question about an object at all, we can get pretty much the answer we want by looking at the object's __dict__ (if it has one), then adding the merged __slots__ names for which hasattr() reports true.
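
To make that concrete, here is a small best-effort Python helper along those lines; it deliberately ignores properties, __getattr__ and proxy objects, so it is only as good as the pragmatic definition above.

# Best-effort listing of an instance's "own" attributes: its __dict__ (if any)
# plus any __slots__ names (merged across the class hierarchy) that hasattr()
# reports as set. Properties, __getattr__ and proxies are deliberately ignored.
def instance_attributes(obj):
    names = set(vars(obj)) if hasattr(obj, "__dict__") else set()
    for klass in type(obj).__mro__:
        slots = getattr(klass, "__slots__", ())
        if isinstance(slots, str):   # __slots__ = "a" is legal shorthand
            slots = (slots,)
        names.update(name for name in slots if hasattr(obj, name))
    return names

class B:
    __slots__ = ('a', 'b')
    def __init__(self):
        self.a = 10

print(instance_attributes(B()))   # prints {'a'}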

That this answer blows up on things like proxy objects suggests that perhaps it's not a question we should be asking in the first place, at least not outside of limited and specialized situations.

(In other words, it's possible to get entirely too entranced with the theory of Python and neglect its practical applications. I'm guilty of this from time to time.)

by cks at March 22, 2018 04:15 AM

March 21, 2018

Chris Siebenmann

You probably don't want to run Firefox Nightly any more

Some people like to run Firefox Nightly for various reasons; you can like seeing what's coming, or want to help Mozilla out by testing the bleeding edge, or various other things. I myself have in the past run a Firefox compiled from the development tree (although at the moment I'm still using Firefox 56). Unfortunately and sadly I must suggest that you not do that any more, and only run Firefox Nightly if you absolutely have to (for example to test some bleeding edge web feature that's available only in Nightly).

Let's start with @doublec's tweet:

Even if it is only Mozilla's nightly browser and for a short period of time I'm a bit disturbed about the possibility of an opt-out only "send all visited hostnames to a third party US company" study.
FYI: Short Nightly Shield Study involving DNS over HTTPs (DoH)

(via Davor Cubranic)

In the thread, it's revealed that Mozilla is planning an opt-out DNS over HTTPS study for Firefox Nightly users that will send DoH queries for all hostnames to a server implementation at Cloudflare (with some legal protections for the privacy of this information).

This by itself is not why I think you should stop running Firefox Nightly now. Instead, the reason why comes up further down the thread, in a statement by a Mozilla person which I'm going to quote from directly:

It isn't explicit right now that using nightly means opting in to participating in studies like this, and I think the text of the download page antedates our ability to do those studies. The text of the Firefox privacy page says that prerelease products "may contain different privacy characteristics" than release, but doesn't enumerate them. [...]

Let me translate this: people using Firefox Nightly get fewer privacy protections and less respect for user choice from Mozilla than people using Firefox releases. Mozilla feels free to do things to your browsing that they wouldn't do to users of regular Firefox (well, theoretically wouldn't do), and you're implicitly consenting to all of this just by using Nightly.

That's why you shouldn't use Nightly; you shouldn't agree to this. Using Nightly now is pasting a 'kick me' sign on your back. You can hope that Mozilla will kick carefully and for worthwhile things and that it won't hurt, but Mozilla is going to kick you. They've said so explicitly.

Unfortunately, Mozilla's wording on this on the current privacy page says that these 'different privacy characteristics' apply to all pre-release versions, not just Nightly. It's not clear to me if the 'Developer Edition' is considered a pre-release version for what Mozilla can do to it, but it probably is. Your only reasonably safe option appears to be to run a release version of Firefox.

(Perhaps Mozilla will clarify that, but I'm not holding my breath for Mozilla to take their hands out of the cookie jar.)

I don't know what this means for people building Firefox from source (especially from the development tree instead of a release). I also don't know what currently happens in any version (built from source or downloaded) if you explicitly turn off SHIELD studies. Regardless of what happens now, I wouldn't count on turning off SHIELD studies working in future Nightly versions; allowing you to opt out of such things runs counter to Mozilla's apparent goal of using Nightly users as a captive pool of test dummies.

(I don't know if I believe or accept Mozilla's views that existing users of Nightly have accepted this tiny print that says that Mozilla can dump them in opt-out privacy invasive studies, but it doesn't matter. It's clear that Mozilla has this view, and it's not like I expect Mozilla to pay any attention to people like me.)

PS: I had a grumpy Twitter reaction to this news, which I stand by. Mozilla knows this is privacy intrusive and questionable, they just don't care when it's Nightly users. There are even people in the discussion thread arguing that the ends justify the means. Whatever, I don't care any more; my expectations keep getting lowered.

PPS: I guess I'll have to periodically check about:studies and the Privacy preference for SHIELD studies, just to make sure.

by cks at March 21, 2018 06:19 AM

March 20, 2018

Evaggelos Balaskas

Migrating to PowerDNS

A few years ago, I migrated from the ISC BIND Authoritative Server to the PowerDNS Authoritative Server.

Here was my configuration file:

# egrep -v '^$|#' /etc/pdns/pdns.conf 




A quick reminder: a DNS server runs on TCP/UDP port 53.

I use dnsdist (a highly DNS-, DoS- and abuse-aware load balancer) in front of my pdns-auth, so my configuration file has a small change:


instead of local-address, local-ipv6

You can also use pdns without dnsdist.

My named.conf looks like this:

# cat /etc/pdns/named.conf

zone "" IN {
    type master;
    file "/etc/pdns/var/";

So in just a few minutes of work, bind was no more.
You can read more on the subject here: Migrating to PowerDNS.

Converting from Bind zone files to SQLite3

PowerDNS has many features and many backends. To use some of these features (like the HTTP REST/JSON API for automation), I suggest converting to the sqlite3 backend, especially for personal or SOHO use. The PowerDNS documentation is really simple and straightforward: SQLite3 backend


Install the generic sqlite3 backend.
On a CentOS machine type:

# yum -y install pdns-backend-sqlite


Create the directory in which we will build and store the sqlite database file:

# mkdir -pv /var/lib/pdns


You can find the initial sqlite3 schema here:


You can also review the sqlite3 database schema on GitHub.

If you can't find the schema.sqlite3.sql file, you can always download it from the web:

# curl -L -o /var/lib/pdns/schema.sqlite3.sql  \

Create the database

Time to create the database file:

# cat /usr/share/doc/pdns/schema.sqlite3.sql | sqlite3 /var/lib/pdns/pdns.db

Migrating from files

Now the difficult part:

# zone2sql --named-conf=/etc/pdns/named.conf -gsqlite | sqlite3 /var/lib/pdns/pdns.db

100% done
7 domains were fully parsed, containing 89 records

Migrating from files - an alternative way

If you have already switched to the generic sql backend on your powerdns auth setup, then you can use the pdnsutil load-zone command.

# pdnsutil load-zone /etc/pdns/var/ 

Mar 20 19:35:34 Reading random entropy from '/dev/urandom'
Creating ''


If you don't want to read error messages like the one below:

sqlite needs to write extra files when writing to a db file

give your powerdns user permissions on the directory:

# chown -R pdns:pdns /var/lib/pdns


Last thing, make the appropriate changes to the pdns.conf file: comment out the bind backend and launch the gsqlite3 backend instead.

## launch=bind
## bind-config=/etc/pdns/named.conf
launch=gsqlite3
gsqlite3-database=/var/lib/pdns/pdns.db


Reload Service

Restarting powerdns daemon:

# service pdns restart

Restarting PowerDNS authoritative nameserver: stopping and waiting..done
Starting PowerDNS authoritative nameserver: started


# dig @ -p 5353 -t soa +short

2018020107 14400 7200 1209600 86400


# dig -t soa +short

2018020107 14400 7200 1209600 86400


Using the API

Having a database as the pdns backend means that we can use the PowerDNS API.

Enable the API

In the pdns core configuration file /etc/pdns/pdns.conf, enable the API and don't forget to set a key.


The API key is used for authorization; it is sent via the HTTP headers.

Then reload the service.

Testing API

Using curl :

# curl -s -H 'X-API-Key: 0123456789ABCDEF'

The output is in JSON format, so it is preferable to use jq.

# curl -s -H 'X-API-Key: 0123456789ABCDEF' | jq .

    "zones_url": "/api/v1/servers/localhost/zones{/zone}",
    "version": "4.1.1",
    "url": "/api/v1/servers/localhost",
    "type": "Server",
    "id": "localhost",
    "daemon_type": "authoritative",
    "config_url": "/api/v1/servers/localhost/config{/config_setting}"

jq can also filter the output:

# curl -s -H 'X-API-Key: 0123456789ABCDEF' | jq .[].version


Getting the entire zone from the database and viewing all the Resource Record sets (RRsets):

# curl -s -H 'X-API-Key: 0123456789ABCDEF'

or just getting the serial:

# curl -s -H 'X-API-Key: 0123456789ABCDEF' | \
  jq .serial


or getting the content of SOA type:

# curl -s -H 'X-API-Key: 0123456789ABCDEF' | \
  jq '.rrsets[] | select( .type | contains("SOA")).records[].content '

" 2018020107 14400 7200 1209600 86400"


Creating or updating records is also trivial.
Create the Resource Record set in json format:

# cat > /tmp/test.text <<EOF
{
    "rrsets": [
        {
            "name": "",
            "type": "TXT",
            "ttl": 86400,
            "changetype": "REPLACE",
            "records": [
                {
                    "content": "\"Test, this is a test ! \"",
                    "disabled": false
                }
            ]
        }
    ]
}
EOF

and use the HTTP PATCH method to send it through the API:

# curl -s -X PATCH -H 'X-API-Key: 0123456789ABCDEF' --data @/tmp/test.text \ | jq . 
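
The same PATCH can of course be scripted; here is a hedged Python sketch with the requests library, using a placeholder endpoint, key and zone name.

# Sketch: create/replace a TXT record through the PowerDNS API.
# Endpoint, key and zone name are placeholders.
import requests

API = "http://127.0.0.1:8081/api/v1/servers/localhost"
HEADERS = {"X-API-Key": "0123456789ABCDEF"}

payload = {
    "rrsets": [{
        "name": "test.example.org.",
        "type": "TXT",
        "ttl": 86400,
        "changetype": "REPLACE",
        "records": [
            {"content": "\"Test, this is a test !\"", "disabled": False},
        ],
    }]
}

resp = requests.patch(f"{API}/zones/example.org.", json=payload,
                      headers=HEADERS, timeout=5)
resp.raise_for_status()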

Verify Record

We can query our internal server with dig:

# dig -t TXT @ -p 5353 +short
"Test, this is a test ! "

Querying public DNS servers:

$ dig txt +short @
"Test, this is a test ! "

$ dig txt +short @
"Test, this is a test ! "

or via the api:

# curl -s -H 'X-API-Key: 0123456789ABCDEF' | \
   jq '.rrsets[].records[] | select (.content | contains("test")).content'

""Test, this is a test ! ""

That’s it.

Tag(s): powerdns, sqlite, api

March 20, 2018 06:47 PM

Essential Monitoring checks

In this article I'll provide a list of checks I consider essential for monitoring and explain why they are useful. They cover different levels, ranging from your application (health checks), to the operating system (disk usage, load) and hardware (iDRAC, disks, power). Use it as a starting point when setting up your monitoring.

March 20, 2018 12:00 AM

March 19, 2018

The security footgun in etcd


Etcd is yet another highly critical piece of infrastructure that had authentication disabled by default.

I guess I'll add this one to the list of unauthenticated, unfirewalled protocols.

"etcd before 2.1 was a completely open system; anyone with access to the API could change keys. In order to preserve backward compatibility and upgradability, this feature is off by default."


Yes. The same thing: etcd has an authentication mechanism which is disabled by default, and it also has a very nice RESTful API as its main interface. What could go wrong, right? People are smart and they will keep their etcd services from leaking to the open internet.


Source: The security footgun in etcd – elweb


by Mattias Geniar at March 19, 2018 12:27 PM

Steve Kemp's Blog

Serverless deployment via docker

I've been thinking about serverless-stuff recently, because I've been re-deploying a bunch of services and some of them are almost microservices. One thing that a lot of my things have in common is that they're all simple HTTP-servers, presenting an API or end-point over HTTP. There is no state, no database, and no complex dependencies.

These should be prime candidates for serverless deployment, but at the same time I don't want to have to recode them for AWS Lamda, or any similar locked-down service. So docker is the obvious answer.

Let us pretend I have ten HTTP-based services, each of which binds to port 8000. To make these available I could just set up a simple HTTP front-end:


We'd need to route the request to the appropriate back-end, so we'd start to present URLs like:


Here any request which had the prefix steve/foo would be routed to a running instance of the docker container steve/foo. In short the name of the (first) path component performs the mapping to the back-end.

I wrote a quick hack, in golang, which would bind to port 80 and dynamically launch the appropriate containers, then proxy back and forth. I soon realized that this is a terrible idea though! The problem is a malicious client could start making requests for things like:


That would trigger my API-proxy to download the containers and spin them up, allowing arbitrary (albeit "sandboxed") code to run. So taking a step back: we want to use the path-component of a URL to decide where to route the traffic, and each container will bind to :8000 on its private (docker) IP. There's an obvious solution here: HAProxy.

So I started again: I wrote a trivial golang daemon which reacts to docker events - containers starting and stopping - and generates a suitable haproxy configuration file, which can then be used to reload haproxy.
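
Steve's golang daemon isn't shown here, but the heart of the idea is easy to sketch. Below is a hypothetical Python rendering of just the config-generation step: given a map of running containers to their addresses, emit path-based routing rules for haproxy. Names, addresses and haproxy details are illustrative only.

# Hypothetical sketch: render an haproxy config that routes /<name>/...
# to the container registered under that name. Purely illustrative.
def render_haproxy_config(containers):
    lines = [
        "frontend http-in",
        "    bind :80",
    ]
    for name in containers:
        lines.append(f"    use_backend be_{name} if {{ path_beg /{name}/ }}")
    for name, addr in containers.items():
        lines += [
            f"backend be_{name}",
            f"    server {name} {addr} check",
        ]
    return "\n".join(lines) + "\n"

print(render_haproxy_config({"foo": "172.17.0.2:8000", "bar": "172.17.0.3:8000"}))

In the real daemon this would be regenerated on every docker start/stop event and followed by an haproxy reload.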

The end result is that if I launch a container named "foo" then requests to will reach it. Success! The only downside to this approach is that you must manually launch your back-end docker containers - but if you do so they'll become immediately available.

I guess there is another advantage. Since you're launching the containers (manually) you can setup links, volumes, and what-not. Much more so than if your API layer spun them up with zero per-container knowledge.

March 19, 2018 11:00 AM

Vincent Bernat

Integration of a Go service with systemd: socket activation

In a previous post, I highlighted some useful features of systemd when writing a service in Go, notably to signal readiness and prove liveness. Another interesting bit is socket activation: systemd listens on behalf of the application and, on incoming traffic, starts the service with a copy of the listening socket. Lennart Poettering details in a blog post:

If a service dies, its listening socket stays around, not losing a single message. After a restart of the crashed service it can continue right where it left off. If a service is upgraded we can restart the service while keeping around its sockets, thus ensuring the service is continuously responsive. Not a single connection is lost during the upgrade.

This is one solution to get zero-downtime deployment for your application. Another upside is you can run your daemon with fewer privileges—losing rights is a difficult task in Go.1

The basics

Let’s take back our nifty 404-only web server:

package main

import (
    "log"
    "net"
    "net/http"
)

func main() {
    listener, err := net.Listen("tcp", ":8081")
    if err != nil {
        log.Panicf("cannot listen: %s", err)
    }
    http.Serve(listener, nil)
}

Here is the socket-activated version, using go-systemd:

package main

import (
    "log"
    "net/http"

    "github.com/coreos/go-systemd/activation"
)

func main() {
    listeners, err := activation.Listeners(true) // ❶
    if err != nil {
        log.Panicf("cannot retrieve listeners: %s", err)
    }
    if len(listeners) != 1 {
        log.Panicf("unexpected number of socket activation (%d != 1)",
            len(listeners))
    }
    http.Serve(listeners[0], nil) // ❷
}

In ❶, we retrieve the listening sockets provided by systemd. In ❷, we use the first one to serve HTTP requests. Let’s test the result with systemd-socket-activate:

$ go build 404.go
$ systemd-socket-activate -l 8000 ./404
Listening on [::]:8000 as 3.

In another terminal, we can make some requests to the service:

$ curl '[::1]':8000
404 page not found
$ curl '[::1]':8000
404 page not found

For a proper integration with systemd, you need two files:

  • a socket unit for the listening socket, and
  • a service unit for the associated service.

We can use the following socket unit, 404.socket:

[Socket]
ListenStream = 8000
BindIPv6Only = both

[Install]
WantedBy = sockets.target

The systemd.socket(5) manual page describes the available options. BindIPv6Only = both is explicitly specified because the default value is distribution-dependent. As for the service unit, we can use the following one, 404.service:

[Unit]
Description = 404 micro-service

[Service]
ExecStart = /usr/bin/404

systemd knows the two files work together because they share the same prefix. Once the files are in /etc/systemd/system, execute systemctl daemon-reload and systemctl start 404.socket. Your service is ready to accept connections!

Handling of existing connections

Our 404 service has a major shortcoming: existing connections are abruptly killed when the daemon is stopped or restarted. Let’s fix that!

Waiting a few seconds for existing connections

We can include a short grace period for connections to terminate, then kill remaining ones:

// On signal, gracefully shut down the server and wait 5
// seconds for current connections to stop.
done := make(chan struct{})
quit := make(chan os.Signal, 1)
server := &http.Server{}
signal.Notify(quit, syscall.SIGINT, syscall.SIGTERM)

go func() {
    <-quit
    log.Println("server is shutting down")
    ctx, cancel := context.WithTimeout(context.Background(),
        5*time.Second)
    defer cancel()
    if err := server.Shutdown(ctx); err != nil {
        log.Panicf("cannot gracefully shut down the server: %s", err)
    }
    close(done)
}()

// Start accepting connections.
server.Serve(listeners[0])

// Wait for existing connections before exiting.
<-done
Upon reception of a termination signal, the goroutine would resume and schedule a shutdown of the service:

Shutdown() gracefully shuts down the server without interrupting any active connections. Shutdown() works by first closing all open listeners, then closing all idle connections, and then waiting indefinitely for connections to return to idle and then shut down.

While restarting, new connections are not accepted: they sit in the listen queue associated to the socket. This queue is bounded and its size can be configured with the Backlog directive in the socket unit. Its default value is 128. You may keep this value, even when your service is expecting to receive many connections per second. When this value is exceeded, incoming connections are silently dropped. The client should automatically retry to connect. On Linux, by default, it will retry 5 times (tcp_syn_retries) in about 3 minutes. This is a nice way to avoid the herd effect you would experience on restart if you increased the listen queue to some high value.

Waiting longer for existing connections

If you want to wait for a very long time for existing connections to stop, you do not want to ignore new connections for several minutes. There is a very simple trick: ask systemd to not kill any process on stop. With KillMode = none, only the stop command is executed and all existing processes are left undisturbed:

[Unit]
Description = slow 404 micro-service

[Service]
ExecStart = /usr/bin/404
ExecStop  = /bin/kill $MAINPID
KillMode  = none

If you restart the service, the current process gracefully shuts down for as long as needed and systemd immediately spawns a new instance ready to serve incoming requests with its own copy of the listening socket. On the other hand, we lose the ability to wait for the service to come to a full stop—either by itself or forcefully after a timeout with SIGKILL.

Waiting longer for existing connections (alternative)

An alternative to the previous solution is to make systemd believe your service died during reload.

done := make(chan struct{})
quit := make(chan os.Signal, 1)
server := &http.Server{}
signal.Notify(quit,
    // for reload:
    syscall.SIGHUP,
    // for stop or full restart:
    syscall.SIGINT, syscall.SIGTERM)
go func() {
    sig := <-quit
    switch sig {
    case syscall.SIGINT, syscall.SIGTERM:
        // Shutdown with a time limit.
        log.Println("server is shutting down")
        ctx, cancel := context.WithTimeout(context.Background(),
            5*time.Second)
        defer cancel()
        if err := server.Shutdown(ctx); err != nil {
            log.Panicf("cannot gracefully shut down the server: %s", err)
        }
    case syscall.SIGHUP: // ❶
        // Execute a short-lived process and ask systemd to
        // track it instead of us.
        log.Println("server is reloading")
        pid := detachedSleep()
        daemon.SdNotify(false, fmt.Sprintf("MAINPID=%d", pid))
        time.Sleep(time.Second) // Wait a bit for systemd to check the PID

        // Wait without a limit for current connections to stop.
        if err := server.Shutdown(context.Background()); err != nil {
            log.Panicf("cannot gracefully shut down the server: %s", err)
        }
    }
    close(done)
}()

// Serve requests with a slow handler.
server.Handler = http.HandlerFunc(
    func(w http.ResponseWriter, r *http.Request) {
        time.Sleep(10 * time.Second)
        http.Error(w, "404 not found", http.StatusNotFound)
    })
server.Serve(listeners[0])

// Wait for all connections to terminate.
<-done
log.Println("server terminated")

The main difference is the handling of the SIGHUP signal in ❶: a short-lived decoy process is spawned and systemd is told to track it. When it dies, systemd will start a new instance. This method is a bit hacky: systemd needs the decoy process to be a child of PID 1 but Go cannot easily detach on its own. Therefore, we leverage a short Python helper, wrapped in a detachedSleep() function:2

// detachedSleep spawns a detached process sleeping
// one second and returns its PID.
func detachedSleep() uint64 {
    py := `
import os
import time

pid = os.fork()
if pid == 0:
    for fd in {0, 1, 2}:
        os.close(fd)
    time.sleep(1)
else:
    print(pid)
`
    cmd := exec.Command("/usr/bin/python3", "-c", py)
    out, err := cmd.Output()
    if err != nil {
        log.Panicf("cannot execute sleep command: %s", err)
    }
    pid, err := strconv.ParseUint(strings.TrimSpace(string(out)), 10, 64)
    if err != nil {
        log.Panicf("cannot parse PID of sleep command: %s", err)
    }
    return pid
}

During reload, there may be a small period during which both the new and the old processes accept incoming requests. If you don’t want that, you can move the creation of the short-lived process outside the goroutine, after server.Serve(), or implement some synchronization mechanism. There is also a possible race-condition when we tell systemd to track another PID—see PR #7816.

The 404.service unit needs an update:

[Unit]
Description = slow 404 micro-service

[Service]
ExecStart    = /usr/bin/404
ExecReload   = /bin/kill -HUP $MAINPID
Restart      = always
NotifyAccess = main
KillMode     = process

Each additional directive is significant:

  • ExecReload tells how to reload the process—by sending SIGHUP.
  • Restart tells to restart the process if it stops “unexpectedly”, notably on reload.3
  • NotifyAccess specifies which process can send notifications, like a PID change.
  • KillMode tells to only kill the main identified process—others are left untouched.

Zero-downtime deployment?

Zero-downtime deployment is a difficult endeavor on Linux. For example, HAProxy had a long list of hacks until a proper—and complex—solution was implemented in HAProxy 1.8. How do we fare with our simple implementation?

From the kernel point of view, there is only one socket with a unique listen queue. This socket is associated to several file descriptors: one in systemd and one in the current process. The socket stays alive as long as there is at least one file descriptor. An incoming connection is put by the kernel in the listen queue and can be dequeued from any file descriptor with the accept() syscall. Therefore, this approach actually achieves zero-downtime deployment: no incoming connection is rejected.

By contrast, HAProxy was using several different sockets listening to the same addresses, thanks to the SO_REUSEPORT option.4 Each socket gets its own listening queue and the kernel balances incoming connections between each queue. When a socket gets closed, the content of its queue is lost. If an incoming connection was sitting here, it would receive a reset. An elegant patch for Linux to signal that a socket should not receive new connections was rejected. HAProxy 1.8 is now recycling existing sockets to the new processes through a Unix socket.

I hope this post and the previous one show how systemd is a good sidekick for a Go service: readiness, liveness and socket activation are some of the useful features you can get to build a more reliable application.

Addendum: decoy process using Go

UPDATED (2018.03): On /r/golang, it was pointed out to me that, in the version where systemd is tracking a decoy, the helper can be replaced by invoking the main executable. By relying on a change of environment, it assumes the role of the decoy. Here is such an implementation replacing the detachedSleep() function:

func init() {
    // As early as possible, check if we should be the decoy.
    state := os.Getenv("__SLEEPY")
    switch state {
    case "1":
        // First step, fork again.
        execPath := self()
        child, err := os.StartProcess(
            execPath,
            []string{execPath},
            &os.ProcAttr{
                Env: append(os.Environ(), "__SLEEPY=2"),
            })
        if err != nil {
            log.Panicf("cannot execute sleep command: %s", err)
        }

        // Advertise child's PID and exit. Child will be
        // orphaned and adopted by PID 1.
        fmt.Printf("%d", child.Pid)
        os.Exit(0)
    case "2":
        // Sleep and exit.
        time.Sleep(time.Second)
        os.Exit(0)
    }
    // Not the sleepy helper. Business as usual.
}
// self returns the absolute path to ourselves. This relies on
// /proc/self/exe which may be a symlink to a deleted path (for
// example, during an upgrade).
func self() string {
    execPath, err := os.Readlink("/proc/self/exe")
    if err != nil {
        log.Panicf("cannot get self path: %s", err)
    }
    execPath = strings.TrimSuffix(execPath, " (deleted)")
    return execPath
}

// detachedSleep spawns a detached process sleeping one second and
// returns its PID. A full daemonization is not needed as the process
// is short-lived.
func detachedSleep() uint64 {
    cmd := exec.Command(self())
    cmd.Env = append(os.Environ(), "__SLEEPY=1")
    out, err := cmd.Output()
    if err != nil {
        log.Panicf("cannot execute sleep command: %s", err)
    }
    pid, err := strconv.ParseUint(strings.TrimSpace(string(out)), 10, 64)
    if err != nil {
        log.Panicf("cannot parse PID of sleep command: %s", err)
    }
    return pid
}

Addendum: identifying sockets by name

For a given service, systemd can provide several sockets. To identify them, it is possible to name them. Let’s suppose we also want to return 403 error codes from the same service but on a different port. We add an additional socket unit definition, 403.socket, linked to the same 404.service job:

[Socket]
ListenStream = 8001
BindIPv6Only = both
Service      = 404.service


Unless overridden with FileDescriptorName, the name of the socket is the name of the unit: 403.socket. go-systemd provides the ListenersWithNames() function to fetch a map from names to listening sockets:

package main

import (
    "log"
    "net/http"
    "sync"

    "github.com/coreos/go-systemd/activation"
)

func main() {
    var wg sync.WaitGroup

    // Map socket names to handlers.
    handlers := map[string]http.HandlerFunc{
        "404.socket": http.NotFound,
        "403.socket": func(w http.ResponseWriter, r *http.Request) {
            http.Error(w, "403 forbidden", http.StatusForbidden)
        },
    }

    // Get listening sockets.
    listeners, err := activation.ListenersWithNames(true)
    if err != nil {
        log.Panicf("cannot retrieve listeners: %s", err)
    }

    // For each listening socket, spawn a goroutine
    // with the appropriate handler.
    for name := range listeners {
        for idx := range listeners[name] {
            wg.Add(1)
            go func(name string, idx int) {
                defer wg.Done()
                http.Serve(listeners[name][idx], handlers[name])
            }(name, idx)
        }
    }

    // Wait for all goroutines to terminate.
    wg.Wait()
}

Let’s build the service and run it with systemd-socket-activate:

$ go build 404.go
$ systemd-socket-activate -l 8000 -l 8001 \
>                         --fdname=404.socket:403.socket \
>                         ./404
Listening on [::]:8000 as 3.
Listening on [::]:8001 as 4.

In another console, we can make a request for each endpoint:

$ curl '[::1]':8000
404 page not found
$ curl '[::1]':8001
403 forbidden

  1. Many process characteristics in Linux are attached to threads. Go runtime transparently manages them without much user control. Until recently, this made some features, like setuid() or setns(), unusable. ↩︎

  2. Python is a good candidate: it’s likely to be available on the system, it is low-level enough to easily implement the functionality and, as an interpreted language, it doesn’t require a specific build step.

UPDATED (2018.03): There is no need to fork twice as we only need to detach the decoy from the current process. This simplifies the Python code a bit. ↩︎

  3. This is not an essential directive as the process is also restarted through socket-activation. ↩︎

  4. This approach is more convenient when reloading since you don’t have to figure out which sockets to reuse and which ones to create from scratch. Moreover, when several processes need to accept connections, using multiple sockets is more scalable as the different processes won’t fight over a shared lock to accept connections. ↩︎

by Vincent Bernat at March 19, 2018 08:28 AM

March 18, 2018

Restic (backup) deleting old backups is extremely slow

Here's a very quick note:

I've been using the Restic backup tool with the SFTP backend for a while now, and so far it was great. Until I tried to prune some old backups. It takes two hours to prune 1 GiB of data from a 15 GiB backup. During that time, you cannot create new backups. It also consumes a huge amount of bandwidth when deleting old backups. I strongly suspect it downloads each blob from the remote storage backend, repacks it and then writes it back.

I've seen people on the internet with a few hundred GiB worth of backups having to wait 7 days to delete their old backups. Since the repo is locked during that time, you cannot create new backups.

This makes Restic completely unusable as far as I'm concerned. Which is a shame, because other than that, it's an incredible tool.

by admin at March 18, 2018 03:34 PM

Vincent Bernat

Route-based VPN on Linux with WireGuard

In a previous article, I described an implementation of redundant site-to-site VPNs using IPsec (with strongSwan as an IKE daemon) and BGP (with BIRD) to achieve this:

Redundant VPNs between 3 sites

The two strengths of such a setup are:

  1. Routing daemons distribute routes to be protected by the VPNs. They provide high availability and decrease the administrative burden when many subnets are present on each side.
  2. Encapsulation and decapsulation are executed in a different network namespace. This enables a clean separation between a private routing instance (where VPN users are) and a public routing instance (where VPN endpoints are).

As an alternative to IPsec, WireGuard is an extremely simple (less than 5,000 lines of code) yet fast and modern VPN that utilizes state-of-the-art and opinionated cryptography (Curve25519, ChaCha20, Poly1305) and whose protocol, based on Noise, has been formally verified. It is currently available as an out-of-tree module for Linux but is likely to be merged when the protocol is not subject to change anymore. Compared to IPsec, its major weakness is its lack of interoperability.

It can easily replace strongSwan in our site-to-site setup. On Linux, it already acts as a route-based VPN. As a first step, for each VPN, we create a private key and extract the associated public key:

$ wg genkey
$ echo oM3PZ1Htc7FnACoIZGhCyrfeR+Y8Yh34WzDaulNEjGs= | wg pubkey

Then, for each remote VPN, we create a short configuration file:1

[Interface]
PrivateKey = oM3PZ1Htc7FnACoIZGhCyrfeR+Y8Yh34WzDaulNEjGs=
ListenPort = 5803

[Peer]
PublicKey  = Jixsag44W8CFkKCIvlLSZF86/Q/4BovkpqdB9Vps5Sk=
Endpoint   = [2001:db8:2::1]:5801
AllowedIPs = 0.0.0.0/0,::/0

A new ListenPort value should be used for each remote VPN. WireGuard can multiplex several peers over the same UDP port but this is not applicable here, as the routing is dynamic. The AllowedIPs directive tells WireGuard to accept and send any traffic.

The next step is to create and configure the tunnel interface for each remote VPN:

$ ip link add dev wg3 type wireguard
$ wg setconf wg3 wg3.conf

WireGuard initiates a handshake to establish symmetric keys:

$ wg show wg3
interface: wg3
  public key: hV1StKWfcC6Yx21xhFvoiXnWONjGHN1dFeibN737Wnc=
  private key: (hidden)
  listening port: 5803

peer: Jixsag44W8CFkKCIvlLSZF86/Q/4BovkpqdB9Vps5Sk=
  endpoint: [2001:db8:2::1]:5801
  allowed ips: 0.0.0.0/0, ::/0
  latest handshake: 55 seconds ago
  transfer: 49.84 KiB received, 49.89 KiB sent

Like VTI interfaces, WireGuard tunnel interfaces are namespace-aware: once created, they can be moved into another network namespace where clear traffic is encapsulated and decapsulated. Encrypted traffic is routed in its original namespace. Let’s move each interface into the private namespace and assign it a point-to-point IP address:

$ ip link set netns private dev wg3
$ ip -n private addr add 2001:db8:ff::/127 dev wg3
$ ip -n private link set wg3 up

The remote end uses 2001:db8:ff::1/127. Once everything is set up, from one VPN, we should be able to ping each remote host:

$ ip netns exec private fping 2001:db8:ff::{1,3,5,7}
2001:db8:ff::1 is alive
2001:db8:ff::3 is alive
2001:db8:ff::5 is alive
2001:db8:ff::7 is alive

BIRD configuration is unmodified compared to our previous setup and the BGP sessions should establish quickly:

$ birdc6 -s /run/bird6.private.ctl show proto | grep IBGP_
IBGP_V2_1 BGP      master   up     20:16:31    Established
IBGP_V2_2 BGP      master   up     20:16:31    Established
IBGP_V3_1 BGP      master   up     20:16:31    Established
IBGP_V3_2 BGP      master   up     20:16:29    Established

Remote routes are learnt over the different tunnel interfaces:

$ ip -6 -n private route show proto bird
2001:db8:a1::/64 via fe80::5254:33ff:fe00:13 dev eth2 metric 1024 pref medium
2001:db8:a2::/64 metric 1024
        nexthop via 2001:db8:ff::1 dev wg3 weight 1
        nexthop via 2001:db8:ff::3 dev wg4 weight 1
2001:db8:a3::/64 metric 1024
        nexthop via 2001:db8:ff::5 dev wg5 weight 1
        nexthop via 2001:db8:ff::7 dev wg6 weight 1

From one site, you can ping a host on the other site through the VPNs:

$ ping -c 2 2001:db8:a3::1
PING 2001:db8:a3::1(2001:db8:a3::1) 56 data bytes
64 bytes from 2001:db8:a3::1: icmp_seq=1 ttl=62 time=1.54 ms
64 bytes from 2001:db8:a3::1: icmp_seq=2 ttl=62 time=1.67 ms

--- 2001:db8:a3::1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 1.542/1.607/1.672/0.065 ms

As with the strongSwan setup, you can easily snoop unencrypted traffic with tcpdump:

$ ip netns exec private tcpdump -c3 -pni wg5 icmp6
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on wg5, link-type RAW (Raw IP), capture size 262144 bytes
08:34:34 IP6 2001:db8:a3::1 > 2001:db8:a1::1: ICMP6, echo reply, seq 40
08:34:35 IP6 2001:db8:a3::1 > 2001:db8:a1::1: ICMP6, echo reply, seq 41
08:34:36 IP6 2001:db8:a3::1 > 2001:db8:a1::1: ICMP6, echo reply, seq 42
3 packets captured
3 packets received by filter
0 packets dropped by kernel

You can find all the configuration files for this example on GitHub.

  1. Compared to IPsec, the cryptography is not configurable and you have to use the strong provided defaults. ↩︎

by Vincent Bernat at March 18, 2018 01:29 AM

My Yubikey broke, but I had a backup. So should you with your 2FA

Today my trusty old first generation Yubikey didn't light up when I plugged it in. No problem for me, I had a backup key. But most people don't, so here's an important tip when you use two factor authentication like a Yubikey, Nitrokey or Google Authenticator (HOTP). TL;DR: Have a second hardware token stored away safely and backup your QR codes (print/screenshot) somewhere secure. Swap the hardware tokens often to make sure they both work with all services. Just as with regular data, make backups and test restores.

March 18, 2018 12:00 AM

March 17, 2018

LZone - Sysadmin

Puppet Agent Settings Issue

I experienced a strange puppet agent 4.8 configuration issue this week. To distribute the agent runs over time and even out puppet master load, I wanted to configure the splay settings properly. There are two settings:
  • A boolean "splay" to enable/disable splaying
  • A range limiter "splayLimit" to control the randomization
What first confused me was that "splay" is not on by default. Of course, when using the open source version it makes sense to have it off. Having it on by default sounds more like an enterprise feature :-)

No matter the default, after deploying an agent config with settings like this
runInterval = 3600
splay = true
splayLimit = 3600
... nothing happened. Runs were still not randomized. Checking the active configuration with
# puppet config print | grep splay
turned out that my config settings were not working at all. What was utterly confusing is that even the runInterval was reported as 1800 (which is the default value). But while the splay just did not work, the effective runInterval was 3600!

After hours of debugging it, I happened to read the puppet documentation section that covers the config sections like [agent] and [main]. It says that [main] configures global settings and other sections can override the settings in [main], which makes sense.

But it just doesn't work this way. In the end the solution was using [main] as the config section instead of [agent]:
[main]
runInterval = 3600
splay = true
splayLimit = 3600
With this config "puppet config print" finally reported the settings as effective and the runtime behaviour had the expected randomization.

Maybe I misread something somewhere, but this is really hard to debug. And INI files are not really helpful on Unix; overriding works better with default files and drop-in directories.

March 17, 2018 08:38 PM

March 16, 2018

LZone - Sysadmin

Python re.sub Examples

Example for re.sub() usage in Python


import re

result = re.sub(pattern, repl, string, count=0, flags=0)

Simple Examples

num = re.sub(r'abc', '', input)              # Delete pattern abc
num = re.sub(r'abc', 'def', input)           # Replace pattern abc -> def
num = re.sub(r'\s+', ' ', input)             # Collapse duplicate whitespace into a single space
num = re.sub(r'abc(def)ghi', r'\1', input)   # Replace a string with a part of itself

Advanced Usage

Replacement Function

Instead of a replacement string you can provide a function performing dynamic replacements based on the match string like this:
def my_replace(m):
    if <some condition on the match m>:
       return <replacement variant 1>
    return <replacement variant 2>

result = re.sub(r"\w+", my_replace, input)

Count Replacements

When you want to know how many replacements were made, use re.subn() instead:
result = re.subn(pattern, replacement, input)
print('Result: ', result[0])
print('Replacements: ', result[1])


March 16, 2018 08:08 PM

March 15, 2018

Enable the slow log in Elastic Search


Elasticsearch is pretty cool: you can just fire off HTTP commands to it to change (most of) its settings on the fly, without restarting the service. Here's how you can enable the slowlog to log queries that exceed a certain time threshold.

These are enabled per index you have, so you can be selective about it.

Get all indexes in your Elastic Search

To start, get a list of all your Elasticsearch indexes. I'm using jq here for the JSON formatting (get jq here).

$ curl -s -XGET ''
green open index1 BV8NLebPuHr6wh2qUnp7XpTLBT 2 0 425739 251734  1.3gb  1.3gb
green open index2 3hfdy8Ldw7imoq1KDGg2FMyHAe 2 0 425374 185515  1.2gb  1.2gb
green open index3 ldKod8LPUOphh7BKCWevYp3xTd 2 0 425674 274984  1.5gb  1.5gb

This shows you have 3 indexes, called index1, index2 and index3.

Enable slow log per index

Make a PUT HTTP call to change the settings of a particular index. In this case, index index3 will be changed.

$ curl -XPUT -d '{"index.search.slowlog.threshold.query.warn": "50ms","index.search.slowlog.threshold.fetch.warn": "50ms","index.indexing.slowlog.threshold.index.warn": "50ms"}' | jq

If you pretty-print the JSON payload, it looks like this:

  "" : "50ms",
  "": "50ms",
  "index.indexing.slowlog.threshold.index.warn": "50ms"

Which essentially means: log all queries, fetches and index rebuilds that exceed 50ms with a severity of "warning".
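
If you would rather script this than hand-craft curl calls, a rough Python equivalent might look like the following; the host, port and index name are placeholders.

# Sketch: apply the same slowlog thresholds to one index via the settings API.
# Host, port and index name are placeholders.
import json
import requests

ES = "http://localhost:9200"
INDEX = "index3"

settings = {
    "index.search.slowlog.threshold.query.warn": "50ms",
    "index.search.slowlog.threshold.fetch.warn": "50ms",
    "index.indexing.slowlog.threshold.index.warn": "50ms",
}

resp = requests.put(f"{ES}/{INDEX}/_settings",
                    data=json.dumps(settings),
                    headers={"Content-Type": "application/json"},
                    timeout=10)
print(resp.status_code, resp.json())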

Enable global warning logging

To make sure those warning logs get written to your logs, make sure you enable that logging in your cluster.

$ curl -XPUT -d '{"transient" : {"logger.index.search.slowlog" : "WARN", "logger.index.indexing.slowlog" : "WARN" }}' | jq

Again, pretty-printed payload:

  "transient" : {
    "" : "WARN",
    "logger.index.indexing.slowlog" : "WARN"

These settings aren't persisted on restart, they are only written to memory and active for the currently running elasticsearch instance.


by Mattias Geniar at March 15, 2018 06:30 PM

March 14, 2018


Target your damned survey report

StackOverflow has released their 2018 Developer Hiring Landscape report. (alternate source)

This is the report that reportedly is about describing the demographics and preferences of software creators, which will enable people looking to hire such creators to better tailor their offerings.

It's an advertising manual, basically. However, they dropped the ball in a few areas. One of which has been getting a lot of traction on Twitter.

It's getting traction for a good reason, and it has to do with how these sorts of reports are written. The section under discussion here is "Differences in assessing jobs by gender". They have five cross-tabs here:

  1. All respondents highest-ranked.
  2. All respondents lowest-ranks (what the above references).
  3. All men highest-ranked.
  4. All women highest-ranked.
  5. All non-binary highest-ranked (they have this. This is awesome).

I took this survey, and it was one of those classic questions like:

Rank these ten items from lowest to highest.

And yet, this report seems to ignore everything but the 1's and 10's. This is misguided, and leaves a lot of very valuable market-segment targeting information on the floor. Since 92% of respondents were men, the first and third tabs were almost identical, differing only by tenths of a percent. Likewise, the second tab is effectively a proxy for "what men don't want". We don't know how women or non-binary respondents differ in their least-liked preferences.

There is some very good data they could have presented, but chose not to. First of all, the number one, two and three priorities are the ones that people are most conscious of and may be willing to compromise one to get the other two. This should have been presented.

  1. All respondents top-3 ranked.
  2. All men top-3 ranked.
  3. All women top-3 ranked.
  4. All non-binary top-3 ranked.

Compensation/Benefits would probably be close to 100%, but we would get interesting differences in the number two and three places on that chart. This gives recruiters the information they need to construct their pitches. Top-rank is fine, but you also want to know the close-enoughs. Sometimes, if you don't hit the top spot, you can win someone by hitting everything else.

I have the same complaint for their "What Developers Value in Compensation and Benefits" cross-tab. Salary/Bonus is the top item for nearly everyone. This is kind of a gimmie. The number 2 and 3 places are very important because they're the tie-breaker. If an applicant is looking at a job that hits their pay rank, but misses on the next two most important priorities, they're going to be somewhat less enthusiastic. In a tight labor market, if they're also looking at an offer from a company that misses the pay by a bit and hits the rest, that may be the offer that gets accepted. The 2 through 9 rankings on that chart are important.

This is a company that uses proportional voting for their moderator elections. They know the value of ranked voting. Winner-takes-all surveys are missing the point, and doing their own target market, recruiters, a disservice.

They should do better.

by SysAdmin1138 at March 14, 2018 08:59 PM

The Lone Sysadmin

No VMware NSX Hardware Gateway Support for Cisco

I find it interesting, as I’m taking my first real steps into the world of VMware NSX, that there is no Cisco equipment supported as a VMware NSX hardware gateway (VTEP). According to the HCL on March 13th, 2018 there is a complete lack of “Cisco” in the “Partner” category: I wonder how that works out […]

The post No VMware NSX Hardware Gateway Support for Cisco appeared first on The Lone Sysadmin. Head over to the source to read the full post!

by Bob Plankers at March 14, 2018 05:26 PM

Evaggelos Balaskas

Let's Encrypt Wildcard Certificate

ACME v2 and Wildcard Certificate Support is Live

We have some good news: Let's Encrypt supports wildcard certificates! For more details click here.

The key phrase on the post is this:

Certbot has ACME v2 support since Version 0.22.0.

Unfortunately, at this moment, using certbot on a CentOS 6 machine is not so trivial, so here is an alternative approach using acme.sh, a pure Unix shell script implementing the ACME client protocol.

# curl -LO
# tar xf 2.7.7.tar.gz
# cd

[]# ./ --version


I have my own authoritative name server based on the PowerDNS software.

PowerDNS has an API for direct control and also a built-in web server for statistics.

To enable these features make the appropriate changes to pdns.conf


and restart your pdns service.

To read more about these capabilities, click here: Built-in Webserver and HTTP API

testing the API:

# curl -s -H 'X-API-Key: 0123456789ABCDEF' | jq .

  "zones_url": "/api/v1/servers/localhost/zones{/zone}",
  "version": "4.1.1",
  "url": "/api/v1/servers/localhost",
  "type": "Server",
  "id": "localhost",
  "daemon_type": "authoritative",
  "config_url": "/api/v1/servers/localhost/config{/config_setting}"


export PDNS_Url=""
export PDNS_ServerId="localhost"
export PDNS_Token="0123456789ABCDEF"
export PDNS_Ttl=60

Prepare Destination

I want to save the certificates under /etc/letsencrypt directory.
By default, acme.sh will save certificate files under the /root/ path.

I use SELinux and I want to save them under /etc, in a similar directory layout as before, so:

# mkdir -pv /etc/letsencrypt/

Create WildCard Certificate


# ./acme.sh --issue \
    --dns dns_pdns \
    --dnssleep 30 \
    -d * \
    --cert-file /etc/letsencrypt/ \
    --key-file  /etc/letsencrypt/ \
    --ca-file   /etc/letsencrypt/ \
    --fullchain-file /etc/letsencrypt/

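Renewals later use the same script; a minimal sketch, assuming the wildcard was issued for a hypothetical domain example.com, might be:

# ./acme.sh --renew -d example.com

acme.sh should remember the issuing parameters (DNS hook, destination files) in its per-domain configuration, so they normally do not need to be repeated.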

Using HTTP Strict Transport Security means that the browsers probably already know that you are using a single certificate for your domains. So, you need to add every domain in your wildcard certificate.

Web Server

Change your VirtualHost

from something like this:

SSLCertificateFile /etc/letsencrypt/live/
SSLCertificateKeyFile /etc/letsencrypt/live/
Include /etc/letsencrypt/options-ssl-apache.conf
SSLCertificateChainFile /etc/letsencrypt/live/

to something like this:

SSLCertificateFile    /etc/letsencrypt/
SSLCertificateKeyFile /etc/letsencrypt/
Include /etc/letsencrypt/options-ssl-apache.conf
SSLCertificateChainFile /etc/letsencrypt/

and restart your web server.




Qualys SSL Server Test



X509v3 Subject Alternative Name

# openssl x509 -text -in /etc/letsencrypt/ | egrep balaskas


March 14, 2018 12:49 PM

March 13, 2018

Errata Security

What John Oliver gets wrong about Bitcoin

John Oliver covered bitcoin/cryptocurrencies last night. I thought I'd describe a bunch of things he gets wrong.

How Bitcoin works

Nowhere in the show does it describe what Bitcoin is and how it works.

Discussions should always start with Satoshi Nakamoto's original paper. The thing Satoshi points out is that there is an important cost to normal transactions, namely, the entire legal system designed to protect you against fraud, such as the way you can reverse the transactions on your credit card if it gets stolen. The point of Bitcoin is that there is no way to reverse a charge. A transaction is done via cryptography: to transfer money to me, you decrypt it with your secret key and encrypt it with mine, handing ownership over to me with no third party involved that can reverse the transaction, and essentially no overhead.

All the rest of the stuff, like the decentralized blockchain and mining, is all about making that work.

Bitcoin crazies forget about the original genesis of Bitcoin. For example, they talk about adding features to stop fraud, reversing transactions, and having a central authority that manages that. This misses the point, because the existing electronic banking system already does that, and does a better job at it than cryptocurrencies ever can. If you want to mock cryptocurrencies, talk about the "DAO", which did exactly that -- and collapsed in a big fraudulent scheme where insiders made money and outsiders didn't.

Sticking to Satoshi's original ideas are a lot better than trying to repeat how the crazy fringe activists define Bitcoin.

How does any money have value?

Oliver's answer is currencies have value because people agree that they have value, like how they agree a Beanie Baby is worth $15,000.

This is wrong. A better way of asking the question is to ask why the value of money changes. The dollar has been losing roughly 2% of its value each year for decades. This is called "inflation": as the dollar loses value, it takes more dollars to buy things, which means the price of things (in dollars) goes up, and employers have to pay us more dollars so that we can buy the same amount of things.

The reason the value of the dollar changes is largely because the Federal Reserve manages the supply of dollars, using the same law of Supply and Demand. As you know, if a supply decreases (like oil), then the price goes up, or if the supply of something increases, the price goes down. The Fed manages money the same way: when prices rise (the dollar is worth less), the Fed reduces the supply of dollars, causing it to be worth more. Conversely, if prices fall (or don't rise fast enough), the Fed increases supply, so that the dollar is worth less.

The reason money follows the law of Supply and Demand is because people use money, they consume it like they do other goods and services, like gasoline, tax preparation, food, dance lessons, and so forth. It's not like a fine art painting, a stamp collection or a Beanie Baby -- money is a product. It's just that people have a hard time thinking of it as a consumer product since, in their experience, money is what they use to buy consumer products. But it's a symmetric operation: when you buy gasoline with dollars, you are actually selling dollars in exchange for gasoline. That you call one side in this transaction "money" and the other "goods" is purely arbitrary; you could just as well call gasoline the money and dollars the good that is being bought and sold for gasoline.

The reason dollars are a product is that trying to use gasoline as money is a pain in the neck. Storing it and exchanging it is difficult. Goods like this do become money, such as famously how prisons often use cigarettes as a medium of exchange, even for non-smokers, but it has to be a good that is fungible, storable, and easily exchanged. Dollars are the most fungible, the most storable, and the most easily exchanged, so they have the most value as "money". Sure, the mechanic can fix the farmer's car for three chickens instead, but most of the time, both parties in the transaction would rather exchange the same value using dollars than chickens.

So the value of dollars is not like the value of Beanie Babies, which people might buy for $15,000 and whose price changes purely on the whims of investors. Instead, a dollar is like gasoline, which obeys the law of Supply and Demand.

This brings us back to the question of where Bitcoin gets its value. While Bitcoin is indeed used like dollars to buy things, that's only a tiny use of the currency, so its value isn't determined by Supply and Demand. Instead, the value of Bitcoin is a lot like Beanie Babies, obeying the laws of investments. So in this respect, Oliver is right about where the value of Bitcoin comes from, but wrong about where the value of dollars comes from.

Why Bitcoin conference didn't take Bitcoin

John Oliver points out the irony of a Bitcoin conference that stopped accepting payments in Bitcoin for tickets.

The biggest reason for this is because Bitcoin has become so popular that transaction fees have gone up. Instead of being proof of failure, it's proof of popularity. What John Oliver is saying is the old joke that nobody goes to that popular restaurant anymore because it's too crowded and you can't get a reservation.

Moreover, the point of Bitcoin is not to replace everyday currencies for everyday transactions. If you read Satoshi Nakamoto's whitepaper, its only goal is to replace certain types of transactions, like purely electronic transactions where electronic goods and services are being exchanged. Where real-life goods/services are being exchanged, existing currencies work just fine. It's only the crazy activists who claim Bitcoin will eventually replace real world currencies -- the saner people see it co-existing with real-world currencies, each with a different value to consumers.

Turning a McNugget back into a chicken

John Oliver uses the metaphor that while you can process a chicken into McNuggets, you can't reverse the process and turn a McNugget back into a chicken. It's a funny metaphor.

But it's not clear what the heck this metaphor is trying to explain. That's not a metaphor for the blockchain, but a metaphor for a "cryptographic hash", where each block is a chicken, and the McNugget is the signature for the block (well, the block plus the signature of the last block, forming a chain).

Even then, that metaphor has problems. The McNugget produced from each chicken must be unique to that chicken, for the metaphor to accurately describe a cryptographic hash. You can therefore identify the original chicken simply by looking at the McNugget. A slight change in the original chicken, like losing a feather, results in a completely different McNugget. Thus, nuggets can be used to tell if the original chicken has changed.

This then leads to the key property of the blockchain: it is unalterable. You can't go back and change any of the blocks of data, because the fingerprints, the nuggets, will also change, and break the nugget chain.
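
To make that "unique McNugget" property concrete, here is a small sketch you can run in any shell with coreutils; it uses SHA-256 as the fingerprint function and shows both the sensitivity to tiny changes and the chaining of fingerprints:

# same input, same fingerprint; a one-character change gives a totally different one
printf 'chicken' | sha256sum
printf 'chicken' | sha256sum    # identical to the previous line
printf 'chickeN' | sha256sum    # completely different fingerprint

# chaining: each "block" is hashed together with the previous fingerprint,
# so altering block 1 breaks every fingerprint after it
h1=$(printf 'block 1 data' | sha256sum | cut -d' ' -f1)
h2=$(printf 'block 2 data %s' "$h1" | sha256sum | cut -d' ' -f1)
echo "$h2"    # changes if anything in block 1 changes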

The point is that John Oliver is laughing at a silly metaphor for the blockchain because he totally misses the point of the metaphor.

Oliver rightly says "don't worry if you don't understand it -- most people don't", but that includes the big companies that John Oliver names. Some companies do get it, and are producing reasonable things (like JP Morgan, by all accounts), but some don't. IBM and other big consultancies are charging companies millions of dollars to consult with them on blockchain products where nobody involved, the customer or the consultancy, actually understands any of it. That doesn't stop them from happily charging customers on one side and happily spending money on the other.

Thus, rather than Oliver explaining the problem, he's just being part of the problem. His explanation of blockchain left you dumber than before.


John Oliver mocks the Brave ICO ($35 million in 30 seconds), claiming it's all driven by YouTube personalities and people who aren't looking at the fundamentals.

And while it is true that most ICOs are bunk, the Brave ICO actually had a business model behind it. Brave is a Chrome-like web-browser whose distinguishing feature is that it protects your privacy from advertisers. If you don't use Brave or a browser with an ad block extension, you have no idea how bad things are for you. However, this presents a problem for websites that fund themselves via advertisements, which is most of them, because visitors no longer see ads. Brave has a fix for this. Most people wouldn't mind supporting the websites they visit often, like the New York Times. That's where the Brave ICO "token" comes in: it's not simply stock in Brave, but a token for micropayments to websites. Users buy tokens, then use them for micropayments to websites like New York Times. The New York Times then sells the tokens back to the market for dollars. The buying and selling of tokens happens without a centralized middleman.

This is still all speculative, of course, and it remains to be seen how successful Brave will be, but it's a serious effort. It has well-respected VCs behind the company, a well-respected founder (despite the fact he invented JavaScript), and well-respected employees. It's not a scam, it's a legitimate venture.

How do you make money from Bitcoin?

The last part of the show is dedicated to describing all the scams out there, advising people to be careful, and to be "responsible". This is garbage.

It's like my simple two step process to making lots of money via Bitcoin: (1) buy when the price is low, and (2) sell when the price is high. My advice is correct, of course, but useless. Same as "be careful" and "invest responsibly".

The truth about investing in cryptocurrencies is "don't". The only responsible way to invest is to buy low-overhead market index funds and hold for retirement. No, you won't get super rich doing this, but anything other than this is irresponsible gambling.

It's a hard lesson to learn, because everyone is telling you the opposite. The entire channel CNBC is devoted to day traders, who buy and sell stocks at a high rate based on the same principle as a ponzi scheme, basing their judgment not on the fundamentals (like long term dividends) but animal spirits of whatever stock is hot or cold at the moment. This is the same reason people buy or sell Bitcoin, not because they can describe the fundamental value, but because they believe in a bigger fool down the road who will buy it for even more.

For things like Bitcoin, the trick to making money is to have bought it over 7 years ago when it was essentially worthless, except to nerds who were into that sort of thing. It's the same trick to making a lot of money in Magic: The Gathering trading cards, which nerds bought decades ago and which are worth a ton of money now. Or, to have bought Apple stock back in 2009 when the iPhone was new, when nerds could understand the potential of real Internet access and apps that Wall Street could not.

That was my strategy: be a nerd, who gets into things. I've made a good amount of money on all these things because as a nerd, I was into Magic: The Gathering, Bitcoin, and the iPhone before anybody else was, and bought in at the point where these things were essentially valueless.

At this point with cryptocurrencies, with the non-nerds now flooding the market, there's little chance of making it rich. The lottery is probably a better bet. Instead, if you want to make money, become a nerd, obsess about a thing, understand a thing when it's new, and cash out once the rest of the market figures it out. That might be Brave, for example, but buy into it because you've spent the last year studying the browser advertisement ecosystem, the market's willingness to pay for content, and how their Basic Attention Token delivers value to websites -- not because you want in on the ICO craze.


John Oliver spends 25 minutes explaining Bitcoin, Cryptocurrencies, and the Blockchain to you. Sure, it's funny, but it leaves you worse off than when it started. He admits they "simplify" the explanation, but they simplified it so much that they removed all useful information.

by Robert Graham ( at March 13, 2018 01:26 AM

March 10, 2018

Evaggelos Balaskas

GitLab CI/CD for building RPM

Continuous Deployment with GitLab: how to build and deploy a RPM Package with GitLab CI

I would like to automate building custom rpm packages with gitlab using their CI/CD functionality. This article documents my personal notes on the matter.

[updated: 2018-03-20 gitlab-runner Possible Problems]


You can find notes on how to install gitlab-community-edition here: Installation methods for GitLab. If you are like me, then you don't run a shell script on your machines unless you are absolutely sure what it does. Assuming you have read it and you are on a CentOS 7 machine, you can follow the notes below and install gitlab-ce manually:

Import gitlab PGP keys

# rpm --import 

# rpm --import

Gitlab repo

# curl -s '' \
  -o /etc/yum.repos.d/gitlab-ce.repo 

Install Gitlab

# yum -y install gitlab-ce

Configuration File

The gitlab core configuration file is /etc/gitlab/gitlab.rb
Remember that every time you make a change, you need to reconfigure gitlab:

# gitlab-ctl reconfigure

My VM’s IP is: Update the external_url to use the same IP or add a new entry on your hosts file (eg. /etc/hosts).

external_url ''

Run: gitlab-ctl reconfigure for updates to take effect.
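
If you would rather use a hostname than the raw IP, the hosts entry on the machines that need to reach GitLab is just one line; both the address and the name below are made-up examples:

# /etc/hosts (example values)
192.168.122.10   gitlab.example.org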


To access the GitLab dashboard from your lan, you have to configure your firewall appropriately.

You can do this in many ways:

  • Accept everything on your http service
    # firewall-cmd --permanent --add-service=http

  • Accept your lan:
    # firewall-cmd --permanent --add-source=

  • Accept only tcp IPv4 traffic from a specific lan
    # firewall-cmd --permanent --direct --add-rule ipv4 filter INPUT 0 -p tcp -s -j ACCEPT

or you can completely stop firewalld (but this is not recommended)

  • Stop your firewall
    # systemctl stop firewalld

okay, I think you’ve got the idea.

Reload your firewalld after every change to its zones/sources/rules.

# firewall-cmd --reload



Point your browser to your gitlab installation:

this is how it looks the first time:


and your first action is to Create a new password by typing a password and hitting the Change your password button.



First Page


New Project

I want to start this journey with a simple-to-build project, so I will try to build libsodium,
a modern, portable, easy to use crypto library.

New project --> Blank project



I will use this libsodium.spec file as the example for the CI/CD.


The idea is to build our custom rpm package of libsodium for CentOS 6, so we want to use docker containers through the gitlab CI/CD. We want clean & ephemeral images, so we will use containers as the build environments for the GitLab CI/CD.

Installing docker is really simple.


# yum -y install docker 

Run Docker

# systemctl restart docker
# systemctl enable  docker

Download image

Download a fresh CentOS v6 image from Docker Hub:

# docker pull centos:6 
Trying to pull repository ...
6: Pulling from
ca9499a209fd: Pull complete
Digest: sha256:551de58ca434f5da1c7fc770c32c6a2897de33eb7fde7508e9149758e07d3fe3

View Docker Images

# docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
docker.io/centos    6                   609c1f9b5406        7 weeks ago         194.5 MB

Gitlab Runner

Now, it is time to install and setup GitLab Runner.

In a nutshell, this program, which is written in Go, will listen for every change on our repository and run every job that it can find in our yml file. But let's start with the installation:

# curl -s '' \
  -o /etc/yum.repos.d/gitlab-runner.repo

# yum -y install gitlab-runner

GitLab Runner Settings

We need to connect our project with the gitlab-runner.

 Project --> Settings --> CI/CD

or in our example:

click on the expand button on Runner’s settings and you should see something like this:


Register GitLab Runner

Type into your terminal:

# gitlab-runner register

following the instructions


[root@centos7 ~]# gitlab-runner register
Running in system-mode.                            

Please enter the gitlab-ci coordinator URL (e.g.

Please enter the gitlab-ci token for this runner:

Please enter the gitlab-ci description for this runner:

Please enter the gitlab-ci tags for this runner (comma separated):

Whether to lock the Runner to current project [true/false]:

Registering runner... succeeded                     runner=s6ASqkR8

Please enter the executor: docker, ssh, virtualbox, docker-ssh+machine, kubernetes, docker-ssh, parallels, shell, docker+machine:

Please enter the default Docker image (e.g. ruby:2.1):

Runner registered successfully. Feel free to start it, but if it's running already the config should be automatically reloaded!
[root@centos7 ~]#

by refreshing the previous page we will see a new active runner on our project.


The Docker executor

We are ready to setup our first executor to our project. That means we are ready to run our first CI/CD example!

In gitlab this is super easy, just add a .gitlab-ci.yml file:

New file --> Template --> gitlab-ci.yml --> based on bash

Don't forget to change the image from busybox:latest to centos:6


that will start a pipeline


GitLab Continuous Integration

Below is a gitlab ci file that builds the libsodium rpm:


image: centos:6

before_script:
  - echo "Get the libsodium version and name from the rpm spec file"
  - export LIBSODIUM_VERS=$(egrep '^Version:' libsodium.spec | awk '{print $NF}')
  - export LIBSODIUM_NAME=$(egrep '^Name:'    libsodium.spec | awk '{print $NF}')

build:
  stage: build
  artifacts:
    untracked: true
  script:
    - echo "Install rpm-build package"
    - yum -y install rpm-build
    - echo "Install BuildRequires"
    - yum -y install gcc
    - echo "Create rpmbuild directories"
    - mkdir -p rpmbuild/{BUILD,RPMS,SOURCES,SPECS,SRPMS}
    - echo "Download source file from github"
    - rpmbuild -D "_topdir `pwd`/rpmbuild" --clean -ba `pwd`/libsodium.spec

test:
  stage: test
  script:
    - echo "Test it, Just test it !"
    - yum -y install rpmbuild/RPMS/x86_64/$LIBSODIUM_NAME-$LIBSODIUM_VERS-*.rpm

deploy:
  stage: deploy
  script:
    - echo "Do your deploy here"


GitLab Artifacts

Before we continue I need to talk about artifacts

Artifacts are a list of files and directories that we produce in stage jobs and that are not part of the git repository. We can pass those artifacts between stages, but you have to remember that gitlab can only track files that exist under the cloned repository and not on the root fs of the docker image.
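
For reference, a minimal sketch of a job that explicitly lists which build output to keep as artifacts; our pipeline above simply uses untracked: true instead, and the script line, path and expiry below are placeholders:

build:
  stage: build
  script:
    - ./build.sh
  artifacts:
    paths:
      - rpmbuild/RPMS/
    expire_in: 1 week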

GitLab Continuous Delivery

We have successfully built an rpm file!! Time to deploy it to another machine. To do that, we need to add the secure shell private key to gitlab secret variables.

Project --> Settings --> CI/CD


stage: deploy

Let's re-write the gitlab deployment stage:


deploy:
  stage: deploy
  script:
    - echo "Create ssh root directory"
    - mkdir -p ~/.ssh/ && chmod 700 ~/.ssh/

    - echo "Append secret variable to the ssh private key file"
    - echo -e "$SSH_PRIVATE_test_KEY" > ~/.ssh/id_rsa
    - chmod 0600 ~/.ssh/id_rsa

    - echo "Install SSH client"
    - yum -y install openssh-clients

    - echo "Secure Copy the libsodium rpm file to the destination server"
    - scp -o StrictHostKeyChecking=no rpmbuild/RPMS/x86_64/$LIBSODIUM_NAME-$LIBSODIUM_VERS-*.rpm  $DESTINATION_SERVER:/tmp/

    - echo "Install libsodium rpm file to the destination server"
    - ssh -o StrictHostKeyChecking=no $DESTINATION_SERVER yum -y install /tmp/$LIBSODIUM_NAME-$LIBSODIUM_VERS-*.rpm

and we can see that our pipeline has passed!


Possible Problems:

That will probably fail!


because our docker images don't recognize the GitLab hostname we used in external_url.

Disclaimer: If you are using a real FQDN/IP then you will probably not face this problem. I am mentioning this issue only for people who follow this article step by step.

Easy fix:

# export -p EXTERNAL_URL="" && yum -y reinstall gitlab-ce

GitLab Runner

GitLab Runner is not running !

# gitlab-runner verify
Running in system-mode.                            

Verifying runner... is alive                        runner=e9bbcf90
Verifying runner... is alive                        runner=77701bad

#  gitlab-runner status
gitlab-runner: Service is not running.

# gitlab-runner install  -u gitlab-runner -d /home/gitlab-runner/

# systemctl is-active gitlab-runner

# systemctl enable gitlab-runner
# systemctl start gitlab-runner

# systemctl is-active gitlab-runner

# systemctl | egrep gitlab-runner
  gitlab-runner.service     loaded active running   GitLab Runner

# gitlab-runner status
gitlab-runner: Service is running!

# ps -e fuwww | egrep -i gitlab-[r]unner
root      5116  0.4  0.1  63428 16968 ?        Ssl  07:44   0:00 /usr/bin/gitlab-runner run --working-directory /home/gitlab-runner/ --config /etc/gitlab-runner/config.toml --service gitlab-runner --syslog --user gitlab-runner
Tag(s): gitlab, docker, CI/CD

March 10, 2018 11:28 PM

March 09, 2018


Making Of “Murdlok”, the new old adventure game for the C64

Recently, the 1986 adventure game "Murdlok" was published here for the first time. This is author Peter Hempel's "making-of" story, translated from the German original.

In the beginning was the breadbox: The year is 1984, or was it already 1985? I have forgotten over all these years. Computers are still a magic word, even though they have been on the market for years. By now they are so small that they can easily be put on a desk. A microprocessor! And it should have color too, not monochrome as was still common everywhere. Commodore VC20 was in the advertisement in the illustrated magazine, the Volkscomputer ("people's computer"), truly a strange name, just like the name of the company that makes it. C=Commodore, what does this computer have to do with seafaring, I ask myself? Well, at least the page had caught my eye.

We're getting that thing, but the "big one" right away, the C64 with 64 KB. We ordered it by mail order from Quelle. That's how my buddy approached me. Back then, that still came with considerable costs. The computer 799 D-Mark, the floppy drive 799 D-Mark, and a color screen on top of that. In those days, a portable TV for 599 D-Mark.

When everything had arrived, off we went! Without self-study there was nothing to be done; for me this technology was absolutely new territory. I also didn't know anyone who knew their way around it, not even my buddy. Technical books were bought! BASIC for beginners! What an exciting story. You type something in and you immediately get a result, sometimes an expected one and sometimes an unexpected one. The thing had me hooked, day and night, whenever work and my girlfriend allowed it.

At some point the adventure "Zauberschloß" by Dennis Merbach fell into my hands. This kind of game was exactly my thing! Playing and thinking! The idea of building such an adventure myself began to grow in me. "Adventures und wie man sie programmiert" ("Adventures and how to program them") was the book I consulted. I definitely wanted to have nice graphics, and of course as many rooms as possible. I then came up with the story and, over the course of programming, changed and improved it quite often. I had decided to create the graphics with a modified character set. So I typed in the character set editor from the 64'er magazine. Yes, I needed sprites too, so I typed in the sprite editor from the 64'er magazine. "Maschinensprache für Anfänger" ("Machine language for beginners"), and the small modified loading routine in the disk buffer was done. Developing the new character set was then a very tedious affair. Change a character and build it into the graphics. Change a character and build it into the graphics... and so on. If it didn't turn out nice, start over again. When the listing became too big, I could no longer manage without a printer and had to buy one. At some point I also ran out of bytes and the program code had to be optimized. Now the purchase of the printer had paid off.

While I programmed after work and at night, my girlfriend, pregnant with the twins, sat on the couch. She had to muster a lot of understanding for my hours of hacking on the breadbox. She did muster it, that understanding, and so the game could be finished in 1986. I was mighty proud of it, too. I also married my girlfriend later on, or did she marry me?

The project taught me a lot about computers and programming. That was also my main motivation for finishing the adventure. It simply gave me extraordinary joy. A few copies were made and handed out to friends. I had nothing more in mind back then.

I keep being asked the question: "Why didn't you publish your game?" Yes, in hindsight it was probably dumb, but back then I simply didn't have it on my radar. There were a great many games on the market at that time, and I didn't have the feeling that the world was waiting for mine. That was probably a misjudgment!

Sorry that you all had to wait so long for "Murdlok"!

About me: My name is Peter Hempel, but you already know that. I was born in 1957 and live in Berlin, Germany. Programming is not my profession. When I started my apprenticeship as an electronics technician in 1974, home computers were still unknown. I worked for many years as a service technician, troubleshooting and programming traffic light systems.

The game then fell into oblivion!

In the meantime I had already been playing around with an Amiga 2000.

The year is 2017, and by chance I find a C=Commodore C65. An old feeling stirs in me. What a lovely memory of days gone by. The dawn of the computer age. The C65 immediately establishes a connection to the past. The last remnants of my C64 days are dug out again. And so the adventure "Murdlok" comes back to light. The game also runs on the C65, what a nice feeling.

I then got to know Michael. We have him to thank for the publication of "Murdlok". I never would have thought that my old game would receive so much honor.


I wish everyone a lot of fun with my game and, of course, with the 8-bit hobby.

by Michael Steil at March 09, 2018 05:12 PM

Steve Kemp's Blog

A change of direction ..

In my previous post I talked about how our child-care works here in wintery Finland, and suggested there might be a change in the near future.

So here is the predictable update; I've resigned from my job and I'm going to be taking over childcare/daycare. Ideally this will last indefinitely, but it is definitely going to continue until November. (Which is the earliest any child could be moved into public day-care if there are problems.)

I've loved my job, twice, but even though it makes me happy (in a way that several other positions didn't) there is no comparison. Child-care makes me happier-still. Sure there are days when your child just wants to scream, refuse to eat, and nothing works. But on average everything is awesome.

It's a hard decision, a "brave" decision too apparently (which I read negatively!), but also an easy one to make.

It'll be hard. I'll have no free time from 7AM-5PM, except during nap-time (11AM-1PM, give or take). But it will be worth it.

And who knows, maybe I'll even get to rant at people who ask "Where's his mother?" I live for those moments. Truly.

March 09, 2018 11:00 AM

March 08, 2018

Errata Security

Some notes on memcached DDoS

I thought I'd write up some notes on the memcached DDoS. Specifically, I describe how many I found scanning the Internet with masscan, and how to use masscan as a killswitch to neuter the worst of the attacks.

Test your servers

I added code to my port scanner for this, then scanned the Internet:

masscan 0.0.0.0/0 -pU:11211 --banners | grep memcached

This example scans the entire Internet (0.0.0.0/0). Replace that with your own address range (or ranges).

This produces output that looks like this:

Banner on port 11211/udp on [memcached] uptime=230130 time=1520485357 version=1.4.13
Banner on port 11211/udp on [memcached] uptime=3935192 time=1520485363 version=1.4.17
Banner on port 11211/udp on [memcached] uptime=230130 time=1520485357 version=1.4.13
Banner on port 11211/udp on [memcached] uptime=399858 time=1520485362 version=1.4.20
Banner on port 11211/udp on [memcached] uptime=29429482 time=1520485363 version=1.4.20
Banner on port 11211/udp on [memcached] uptime=2879363 time=1520485366 version=1.2.6
Banner on port 11211/udp on [memcached] uptime=42083736 time=1520485365 version=1.4.13

The "banners" check filters out those with valid memcached responses, so you don't get other stuff that isn't memcached. To filter this output further, use  the 'cut' to grab just column 6:

... | cut -d ' ' -f 6 | cut -d: -f1

You often get multiple responses to just one query, so you'll want to sort/uniq the list:

... | sort | uniq
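
Putting those pieces together, a single pipeline that writes a deduplicated list of responding IPs to a file could look like this; the address range and the output file name are placeholders, and you should only scan ranges you are authorized to scan:

masscan 10.0.0.0/8 -pU:11211 --banners | grep memcached | \
  cut -d ' ' -f 6 | cut -d: -f1 | sort | uniq > memcached-amplifiers.txt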

My results from an Internet wide scan

I got 15181 results (or roughly 15,000).

People are using Shodan to find a list of memcached servers. They might be getting a lot of results back that respond to TCP instead of UDP. Only UDP can be used for the attack.

Other researchers scanned the Internet a few days ago and found ~31k. I don't know if this means people have been removing these from the Internet.

Masscan as exploit script

BTW, you can not only use masscan to find amplifiers, you can also use it to carry out the DDoS. Simply import the list of amplifier IP addresses, then spoof the source address as that of the target. All the responses will go back to the source address.

masscan -iL amplifiers.txt -pU:11211 --spoof-ip --rate 100000

I point this out to show how there's no magic in exploiting this. Numerous exploit scripts have been released, because it's so easy.

Why memcached servers are vulnerable

Like many servers, memcached listens on the local IP address 127.0.0.1 for local administration. By listening only on the local IP address, remote people cannot talk to the server.

However, this process is often buggy, and you end up listening on either 0.0.0.0 (all interfaces) or on one of the external interfaces. There's a common Linux network stack issue where this keeps happening, like trying to get VMs connected to the network. I forget the exact details, but the point is that lots of servers that intend to listen only on 127.0.0.1 end up listening on external interfaces instead. It's not a good security barrier.

Thus, there are lots of memcached servers listening on their control port (11211) on external interfaces.
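
If one of these servers is yours, the usual fix is to bind memcached to localhost and disable its UDP listener entirely; both options have existed in memcached for a long time, and the sysconfig path below is a typical RHEL/CentOS location that may differ on your system:

# bind to localhost only and disable UDP (a UDP port of 0 turns it off)
memcached -d -u memcache -l 127.0.0.1 -U 0

# or, on RHEL/CentOS style systems, in /etc/sysconfig/memcached:
OPTIONS="-l 127.0.0.1 -U 0"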

How the protocol works

The protocol is documented here. It's pretty straightforward.

The easiest amplification attack is to send the "stats" command. This is a 15-byte UDP packet that causes the server to send back a large response full of useful statistics about the server. You often see around 10 kilobytes of response across several packets.
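
If you just want to poke a single host by hand, you can send that same stats query with netcat, using the same eight-byte UDP frame header that the flush_all example later in this post uses; replace the placeholder host with the server you are testing:

echo -en "\x00\x00\x00\x00\x00\x01\x00\x00stats\r\n" | nc -q1 -u <memcached-host> 11211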

A harder, but more effective attack uses a two-step process. You first use the "add" or "set" commands to put chunks of data into the server, then send a "get" command to retrieve it. You can easily put 100 megabytes of data into the server this way, and cause its retrieval with a single "get" command.

That's why this has been the largest amplification ever, because a single 100-byte packet can in theory cause a 100-megabytes response.

Doing the math, the 1.3 terabit/second DDoS divided across the 15,000 servers I found vulnerable on the Internet leads to an average of 100-megabits/second per server. This is fairly minor, and is indeed something even small servers (like Raspberry Pis) can generate.

Neutering the attack ("kill switch")

If they are using the more powerful attack against you, you can neuter it: you can send a "flush_all" command back at the servers who are flooding you, causing them to drop all those large chunks of data from the cache.

I'm going to describe how I would do this.

First, get a list of attackers, meaning, the amplifiers that are flooding you. The way to do this is grab a packet sniffer and capture all packets with a source port of 11211. Here is an example using tcpdump.

tcpdump -i <interface> -w attackers.pcap src port 11211

Let that run for a while, then hit [ctrl-c] to stop, then extract the list of IP addresses in the capture file. The way I do this is with tshark (comes with Wireshark):

tshark -r attackers.pcap -Tfields -eip.src | sort | uniq > amplifiers.txt

Now, craft a flush_all payload. There are many ways of doing this. For example, if you are using nmap or masscan, you can add the bytes to the nmap-payloads.txt file. Also, masscan can read this directly from a packet capture file. To do this, first craft a packet, such as with the following command line foo:

echo -en "\x00\x00\x00\x00\x00\x01\x00\x00flush_all\r\n" | nc -q1 -u 11211

Capture this packet using tcpdump or something, and save into a file "flush_all.pcap". If you want to skip this step, I've already done this for you, go grab the file from GitHub:

Now that we have our list of attackers (amplifiers.txt) and a payload to blast at them (flush_all.pcap), use masscan to send it:

masscan -iL amplifiers.txt -pU:11211 --pcap-payload flush_all.pcap

Reportedly, "shutdown" may also work to completely shutdown the amplifiers. I'll leave that as an exercise for the reader, since of course you'll be adversely affecting the servers.

Some notes

Here is some good reading on this attack:

by Robert Graham ( at March 08, 2018 12:07 PM

March 07, 2018

Vincent Bernat

Packaging an out-of-tree module for Debian with DKMS

DKMS is a framework designed to allow individual kernel modules to be upgraded without changing the whole kernel. It is also very easy to rebuild modules as you upgrade kernels.

On Debian-like systems,1 DKMS enables the installation of various drivers, from ZFS on Linux to VirtualBox kernel modules or NVIDIA drivers. These out-of-tree modules are not distributed as binaries: once installed, they need to be compiled for your current kernel. Everything is done automatically:

# apt install zfs-dkms
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  binutils cpp cpp-6 dkms fakeroot gcc gcc-6 gcc-6-base libasan3 libatomic1 libc-dev-bin libc6-dev
  libcc1-0 libcilkrts5 libfakeroot libgcc-6-dev libgcc1 libgomp1 libisl15 libitm1 liblsan0 libmpc3
  libmpfr4 libmpx2 libnvpair1linux libquadmath0 libstdc++6 libtsan0 libubsan0 libuutil1linux libzfs2linux
  libzpool2linux linux-compiler-gcc-6-x86 linux-headers-4.9.0-6-amd64 linux-headers-4.9.0-6-common
  linux-headers-amd64 linux-kbuild-4.9 linux-libc-dev make manpages manpages-dev patch spl spl-dkms
  zfs-zed zfsutils-linux
3 upgraded, 44 newly installed, 0 to remove and 3 not upgraded.
Need to get 42.1 MB of archives.
After this operation, 187 MB of additional disk space will be used.
Do you want to continue? [Y/n]
# dkms status
spl,, 4.9.0-6-amd64, x86_64: installed
zfs,, 4.9.0-6-amd64, x86_64: installed
# modinfo zfs | head
filename:       /lib/modules/4.9.0-6-amd64/updates/dkms/zfs.ko
license:        CDDL
author:         OpenZFS on Linux
description:    ZFS
srcversion:     42C4AB70887EA26A9970936
depends:        spl,znvpair,zcommon,zunicode,zavl
retpoline:      Y
vermagic:       4.9.0-6-amd64 SMP mod_unload modversions
parm:           zvol_inhibit_dev:Do not create zvol device nodes (uint)

If you install a new kernel, a compilation of the module is automatically triggered.

Building your own DKMS-enabled package🔗

Suppose you’ve gotten your hands on an Intel XXV710-DA2 NIC. This card is handled by the i40e driver. Unfortunately, it is only supported starting with Linux 4.10, and you are using a stock 4.9 Debian Stretch kernel. DKMS provides an easy solution here!

Download the driver from Intel, unpack it in some directory and add a debian/ subdirectory with the following files:

  • debian/changelog:

    i40e-dkms (2.4.6-0) stretch; urgency=medium
      * Initial package.
     -- Vincent Bernat <>  Tue, 27 Feb 2018 17:20:58 +0100
  • debian/control:

    Source: i40e-dkms
    Maintainer: Vincent Bernat <>
    Build-Depends: debhelper (>= 9), dkms
    Package: i40e-dkms
    Architecture: all
    Depends: ${misc:Depends}
    Description: DKMS source for the Intel i40e network driver
  • debian/rules:

    #!/usr/bin/make -f
    include /usr/share/dpkg/pkg-info.mk

    %:
            dh $@ --with dkms

    override_dh_install:
            dh_install src/* usr/src/i40e-$(DEB_VERSION_UPSTREAM)/
            dh_dkms -V $(DEB_VERSION_UPSTREAM)
  • debian/i40e-dkms.dkms:

  • debian/compat:

    9

In debian/changelog, pay attention to the version. The version of the driver is 2.4.6. Therefore, we use 2.4.6-0 for the package. In debian/rules, we install the source of the driver in /usr/src/i40e-2.4.6—the version is extracted from debian/changelog.

The content of debian/i40e-dkms.dkms is described in details in the dkms(8) manual page. The i40e driver is fairly standard and dkms is able to figure out how to compile it. However, if your kernel module does not follow the usual conventions, it is the right place to override the build command.
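
For completeness, a minimal sketch of what debian/i40e-dkms.dkms could contain for a driver laid out like this one; the src/ paths and the MAKE/CLEAN lines are assumptions about how Intel's tarball is organized, and #MODULE_VERSION# is substituted by dh_dkms at build time:

PACKAGE_NAME="i40e"
PACKAGE_VERSION="#MODULE_VERSION#"
BUILT_MODULE_NAME[0]="i40e"
BUILT_MODULE_LOCATION[0]="src/"
DEST_MODULE_LOCATION[0]="/updates"
MAKE[0]="make -C src/ KSRC=/lib/modules/${kernelver}/build"
CLEAN="make -C src/ clean"
AUTOINSTALL="yes"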

Once all the files are in place, you can turn the directory into a Debian package with, for example, the dpkg-buildpackage command.2 At the end of this operation, you get your DKMS-enabled package, i40e-dkms_2.4.6-0_all.deb. Put it in your internal repository and install it on the target.

Avoiding compilation on target🔗

If you feel uncomfortable installing compilation tools on the target servers, there is a simple solution. Since version,3 thanks to Thijs Kinkhorst, dkms can build lean binary packages with only the built modules. For each kernel version, you build such a package in your CI system:

KERNEL_VERSION=4.9.0-6-amd64 # could be a Jenkins parameter
apt -qyy install \
      i40e-dkms \
      linux-image-${KERNEL_VERSION} \
      linux-headers-${KERNEL_VERSION}

DRIVER_VERSION=$(dkms status i40e | awk -F', ' '{print $2}')
dkms mkbmdeb i40e/${DRIVER_VERSION} -k ${KERNEL_VERSION}

cd /var/lib/dkms/i40e/${DRIVER_VERSION}/bmdeb/
dpkg -c i40e-modules-${KERNEL_VERSION}_*
dpkg -I i40e-modules-${KERNEL_VERSION}_*

Here is the shortened output of the two last commands:

# dpkg -c i40e-modules-${KERNEL_VERSION}_*
-rw-r--r-- root/root    551664 2018-03-01 19:16 ./lib/modules/4.9.0-6-amd64/updates/dkms/i40e.ko
# dpkg -I i40e-modules-${KERNEL_VERSION}_*
 new debian package, version 2.0.
 Package: i40e-modules-4.9.0-6-amd64
 Source: i40e-dkms-bin
 Version: 2.4.6
 Architecture: amd64
 Maintainer: Dynamic Kernel Modules Support Team <>
 Installed-Size: 555
 Depends: linux-image-4.9.0-6-amd64
 Provides: i40e-modules
 Section: misc
 Priority: optional
 Description: i40e binary drivers for linux-image-4.9.0-6-amd64
  This package contains i40e drivers for the 4.9.0-6-amd64 Linux kernel,
  built from i40e-dkms for the amd64 architecture.

The generated Debian package contains the pre-compiled driver and only depends on the associated kernel. You can safely install it without pulling dozens of packages.

  1. DKMS is also compatible with RPM-based distributions but the content of this article is not suitable for these. ↩︎

  2. You may need to install some additional packages: build-essential, fakeroot and debhelper↩︎

  3. Available in Debian Stretch and in the backports for Debian Jessie. However, for Ubuntu Xenial, you need to backport a more recent version of dkms↩︎

by Vincent Bernat at March 07, 2018 05:38 PM


50 000 Node Choria Network

I’ve been saying for a while now my aim with Choria is that someone can get a 50 000 node Choria network that just works without tuning, like, by default that should be the scale it supports at minimum.

I started working on a set of emulators to let you confirm that yourself – and for me to use it during development to ensure I do not break this promise – though that got a bit side tracked as I wanted to do less emulation and more just running 50 000 instances of actual Choria, more on that in a future post.

Today I want to talk a bit about an actual 50 000 real node deployment and how I got there – the good news is that it’s terribly boring since, as promised, it just works.



The network is pretty much just your typical DC network. Bunch of TOR switches, Distribution switches and Core switches, nothing special. Many dom0’s and many more domUs and some specialised machines. It’s flat; there are firewalls between all things but it’s all in one building.


I have 4 machines, 3 set aside for the Choria Network Broker Cluster and 1 for a client; while waiting for my firewall ports I just used the 1 machine for all the nodes as well as the client. It’s an 8GB RAM VM with 4 vCPUs, not overly fancy at all. It runs Enterprise Linux 6.

In the past I think we’d have considered this machine on the small side for an ActiveMQ network with 1000 nodes 😛

I’ll show some details of the single Choria Network Broker here and later follow up about the clustered setup.

Just to be clear, I am going to show managing 50 000 nodes on a machine that’s the equivalent of a $40/month Linode.


I run a custom build of Choria 0.0.11; I bumped the max connections up to 100k and turned off SSL since we simply can't provision certificates, so a custom build lets me get around all that.

The real reason for the custom build though is that we compile our agent into the binary, so the whole deployment that goes out to all nodes and brokers is basically what you see below, no further dependencies at all. This makes for quite a nice deployment story since we're a bit challenged in that regard.

$ rpm -ql choria

Other than this custom agent and no SSL we’re about on par what you’d get if you just install Choria from the repos.

Network Broker Setup

The Choria Network Broker is deployed basically exactly as the docs. Including setting the sysctl values to what was specified in the docs.

identity =
logfile = /var/log/choria.log
plugin.choria.stats_address = ::
plugin.choria.stats_port = 8222
plugin.choria.network.listen_address = ::
plugin.choria.network.client_port = 4222
plugin.choria.network.peer_port = 4223

Most of this isn’t even needed basically if you use defaults like you should.

Server Setup

The server setup was even more boring:

logger_type = file
logfile = /var/log/choria.log
plugin.choria.middleware_hosts =
plugin.choria.use_srv = false


So we were being quite conservative and deployed it in batches of 50 at a time; you can see the graph below of this process as seen from the Choria Network Broker (click for larger):

This is all pretty boring actually, quite predictable growth in memory, go routines, cpu etc. The messages you see being sent is me doing lots of pings and rpc’s and stuff just to check it’s all going well.

$ ps -auxw|grep choria
root     22365 12.9 14.4 2879712 1175928 ?     Sl   Mar06 241:34 /usr/choria broker --config=....
# a bit later than the image above
$ sudo netstat -anp|grep 22365|grep ESTAB|wc -l


So how does it work in practise? In the past we'd have had a lot of issues with getting consistency out of a network of even 10% this size. I was quite confident it was not the Ruby side, but you never know?

Well, lets look at this one, I set discovery_timeout = 20 in my client configuration:

$ mco rpc rpcutil ping --display failed
Finished processing 51152 / 51152 hosts in 20675.80 ms
Finished processing 51152 / 51152 hosts in 20746.82 ms
Finished processing 51152 / 51152 hosts in 20778.17 ms
Finished processing 51152 / 51152 hosts in 22627.80 ms
Finished processing 51152 / 51152 hosts in 20238.92 ms

That’s a huge huge improvement, and this is without fancy discovery methods or databases or anything – it’s the, generally fairly unreliable, broadcast based method of discovery. These same nodes on a big RabbitMQ cluster never gets a consistent result (and it’s 40 seconds slower), so this is a huge win for me.

I am still using the Ruby code here of course and it’s single threaded and stuck on 1 CPU, so in practise it’s going to have a hard ceiling of churning through about 2500 to 3000 replies/second, hence the long timeouts there.

I have a go based ping, it round trips this network in less than 3.5 seconds quite reliably – wow.

The broker peaked at 25Mbps at times when doing many concurrent RPC requests and pings etc, but it’s all just been pretty good with no surprises.

The ruby client is a bit big so as a final test I bumped the RAM on this node to 16GB. If I run 6 x RPC clients at exactly the same time doing a full estate RPC round trip (including broadcast based discovery) all 6 clients get exactly the same results consistently. So I guess I know the Ruby code was never the problem and I am very glad to see code I designed and wrote in 2009 scaling to this size – the Ruby client code has really never been touched after initial development.

So, that’s about it, I really can’t complain about this.

by R.I. Pienaar at March 07, 2018 04:50 PM

March 05, 2018


Choria Progress Update

It’s been a while since I posted about Choria and where things are. There are major changes in the pipeline so an update is well overdue.

The features mentioned here will become current in the next release cycle – about 2 weeks from now.

New choria module

The current gen Choria modules grew a bit organically and there’s a bit of confusion between the various modules. I now have a new choria module; it will consume features from the current modules and deprecate them.

On the next release it can manage:

  1. Choria YUM and APT repos
  2. Choria Package
  3. Choria Network Broker
  4. Choria Federation Broker
  5. Choria Data Adapters

Network Brokers

We have had amazing success with the NATS broker, lightweight, fast, stable. It’s perfect for Choria. While I had a pretty good module to configure it I wanted to create a more singular experience. Towards that there is a new Choria Broker incoming that manages an embedded NATS instance.

To show what I am on about, imagine this is all that is required to configure a cluster of 3 production ready brokers capable of hosting 50k or more Choria managed nodes on modestly specced machines:

plugin.choria.broker_network = true
plugin.choria.network_peers = nats://, nats://, nats://
plugin.choria.stats_address = ::

Of course there is Puppet code to do this for you in choria::broker.

That’s it, start the choria-broker daemon and you’re done – and ready to monitor it using Prometheus. Like before it’s all TLS and all that kinds of good stuff.

Federation Brokers

We had good success with the Ruby Federation Brokers but they also had issues particularly around deployment as we had to deploy many instances of them and they tended to be quite big Ruby processes.

The same choria-broker that hosts the Network Broker will now also host a new Golang based Federation Broker network. Configuration is about the same as before you don’t need to learn new things, you just have to move to the configuration in choria::broker and retire the old ones.

Unlike the past where you had to run 2 or 3 of the Federation Brokers per node, you now do not run any additional processes; you just enable the feature in the singular choria-broker and you only get 1 process. Internally each runs 10 instances of the Federation Broker; it’s much more performant and scalable.

Monitoring is done via Prometheus.

Data Adapters

Previously we had all kinds of fairly bad schemes to manage registration in MCollective. The MCollective daemon would make requests to a registration agent, you’d designate one or more nodes as running this agent and so build either a file store, mongodb store etc.

This was fine at small size but soon enough the concurrency in large networks would overwhelm what could realistically be expected from the Agent mechanism to manage.

I’ve often wanted to revisit that but did not know what approach to take. In the years since then the Stream Processing world has exploded with tools like Kafka, NATS Streaming and offerings from GCP, AWS and Azure etc.

Data Adapters are hosted in the Choria Broker and provide stateless, horizontally and vertically scalable Adapters that can take data from Choria and translate and publish them into other systems.

Today I support NATS Streaming and the code is at first-iteration quality, problems I hope to solve with this:

  • Very large global scale node metadata ingest
  • IoT data ingest – the upcoming Choria Server is embeddable into any Go project and it can exfil data into Stream Processors using this framework
  • Asynchronous RPC – replies to requests streaming into Kafka for later processing, more suitable for web apps etc
  • Adhoc asynchronous data rewrites – we have had feature requests where person one can make a request but not see replies, they go into Elastic Search


After 18 months of trying to get Puppet Inc to let me continue development on the old code base I have finally given up. The plugins are now hosted in their own GitHub Organisation.

I’ve released a number of plugins that were never released under Choria.

I’ve updated all their docs to be Choria specific rather than outdated install docs.

I’ve added Action Policy rules allowing read only actions by default – eg. puppet status will work for anyone, puppet runonce will give access denied.

I’ve started adding Playbooks; the first ones are mcollective_agent_puppet::enable, mcollective_agent_puppet::disable and mcollective_agent_puppet::disable_and_wait.

Embeddable Choria

The new Choria Server is embeddable into any Go project. This is not a new area of research for me – this was actually the problem I tried to solve when I first wrote the current gen MCollective, but I never got that far really.

The idea is that if you have some application – like my Prometheus Streams system – where you will run many instances of a specific daemon, each with different properties and areas of responsibility, you can make that daemon connect to a Choria network as if it’s a normal Choria Server. The purpose of that is to embed life cycle management into the daemon and provide an external API into this.

The above mentioned Prometheus Streams server for example has a circuit breaker that can start/stop the polling and replication of data:

$ mco rpc prometheus_streams switch -T prometheus
Discovering hosts using the mc method for 2 second(s) .... 1
 * [ ============================================================> ] 1 / 1
     Mode: poller
   Paused: true
Summary of Mode:
   poller = 1
Summary of Paused:
   false = 1
Finished processing 1 / 1 hosts in 399.81 ms

Here I am communicating with the internals of the Go process; they sit in their own Sub Collective and expose facts and RPC endpoints. I can use discovery to find only nodes in certain modes, with certain jobs etc. and perform functions you'd typically do via a REST management interface, over a more suitable interface.

Likewise I’ve embedded a Choria Server into IoT systems where it uses the above mentioned Data Adapters to publish temperature and humidity while giving me the ability to extract from those devices data in demand using RPC and do things like in-place upgrades of the running binary on my IoT network.

You can use this today in your own projects and it’s compatible with the Ruby Choria you already run. A full walk through of doing this can be found in the ripienaar/embedded-choria-sample repository.

by R.I. Pienaar at March 05, 2018 09:42 AM

March 04, 2018

Lurch: a unixy launcher and auto-typer

I cobbled together a unixy command / application launcher and auto-typer. I've dubbed it Lurch.


  • Fuzzy filtering as-you-type.
  • Execute commands.
  • Open new browser tabs.
  • Auto-type into currently focussed window
  • Auto-type TOTP / rfc6238 / two-factor / Google Authenticator codes.
  • Unixy and composable. Reads entries from stdin.

You can use and combine these features to do many things:

  • Auto-type passwords
  • Switch between currently opened windows by typing a part of its title (using wmctrl to list and switch to windows)
  • As a generic (and very customizable) application launcher by parsing .desktop entries or whatever.
  • Quickly cd to parts of your filesystem using auto-type.
  • Open browser tabs and search via google or specific search engines.
  • List all entries in your SSH configuration and quickly launch an ssh session to one of them.
  • Etc.

You'll need a way to launch it when you press a keybinding. That's usually the window manager's job. For XFCE, you can add a keybinding under the Keyboard -> Application Shortcuts settings dialog.

Here's what it looks like:

Unfortunately, due to time constraints, I cannot provide any support for this project:

NO SUPPORT: There is absolutely ZERO support on this project. Due to time constraints, I don't take bug or features reports and probably won't accept your pull requests.

You can get it from the Github page.

by admin at March 04, 2018 08:45 AM

multi-git-status can now hide repos that don't need attention

I've added an "-e" argument to my multi-git-status project. It hides repositories that have no unpushed, untracked or uncommitted changes.

Without "-e":

And with the "-e" argument:

by admin at March 04, 2018 08:29 AM

March 02, 2018

Errata Security

AskRob: Does Tor let government peek at vuln info?

On Twitter, somebody asked this question:

The question is about a blog post that claims Tor privately tips off the government about vulnerabilities, using as proof a "vulnerability" from October 2007 that wasn't made public until 2011.

The tl;dr is that it's bunk. There was no vulnerability, it was a feature request. The details were already public. There was no spy agency involved, just the agency that runs Voice of America, which tries to protect activists under repressive foreign regimes.


The issue is that Tor traffic looks like Tor traffic, making it easy to block/censor, or worse, identify users. Over the years, Tor has added features to make it look more and more like normal traffic, like the encrypted traffic used by Facebook, Google, and Apple. Tor improves this bit-by-bit over time, but short of actually piggybacking on website traffic, it will always leave some telltale signature.

An example showing how we can distinguish Tor traffic is the packet below, from the latest version of the Tor server:

Had this been Google or Facebook, the names would be something like "" or "". Or, had this been a normal "self-signed" certificate, the names would still be recognizable. But Tor creates randomized names, with letters and numbers, making it distinctive. It's hard to automate detection of this, because it's only probably Tor (other self-signed certificates look like this, too), which means you'll have occasional "false-positives". But still, if you compare this to the pattern of traffic, you can reliably detect that Tor is happening on your network.

This has always been a known issue, since the earliest days. Google the search term "detect tor traffic", and set your advanced search dates to before 2007, and you'll see lots of discussion about this, such as this post for writing intrusion-detection signatures for Tor.

Among the things you'll find is this presentation from 2006 where its creator (Roger Dingledine) talks about how Tor can be identified on the network with its unique network fingerprint. For a "vulnerability" they supposedly kept private until 2011, they were awfully darn public about it.

The above blogpost claims Tor kept this vulnerability secret until 2011 by citing this message. It's because Levine doesn't understand the terminology and is just blindly searching for an exact match for "TLS normalization". Here's an earlier proposed change for the long term goal to "make our connection handshake look closer to a regular HTTPS [TLS] connection", from February 2007. Here is another proposal from October 2007 on changing TLS certificates, from days after the email discussion (after they shipped the feature, presumably).

What we see here is a known problem from the very beginning of the project, a long-term effort to fix that problem, and a slow dribble of features added over time to preserve backwards compatibility.

Now let's talk about the original train of emails cited in the blogpost. It's hard to see the full context here, but it sounds like BBG made a feature request to make Tor look even more like normal TLS, which is hinted at by the phrase "make our funders happy". Of course the people giving Tor money are going to ask for improvements, and of course Tor would in turn discuss those improvements with the donor before implementing them. It's common in project management: somebody sends you a feature request, and you then send the proposal back to them to verify that what you are building is what they asked for.

As for the subsequent salacious paragraph about "secrecy", that too is normal. When fixing a problem, you don't want to talk about the details until after you have a fix. But note that this is largely more for PR than anything else. The details on how to detect Tor are available to anybody who looks for them -- they just aren't readily accessible to the layman. For example, Tenable Networks had announced exactly this ability to detect Tor's traffic the previous month, because any techie who wanted to could have found out how. Indeed, Tenable's announcement may have been the impetus for BBG's request to Tor: "can you fix it so that this new Tenable feature no longer works".

To be clear, there are zero secret "vulnerability details" here that some secret spy agency could use to detect Tor. They were already known, and in the Tenable product, and within the grasp of any techie who wanted to discover them. A spy agency could just buy Tenable, or copy it, instead of going through this intricate conspiracy.


The issue isn't a "vulnerability". Tor traffic is recognizable on the network, and over time, they make it less and less recognizable. Eventually they'll just piggyback on true HTTPS and convince CloudFlare to host ingress nodes, or something, making it completely undetectable. In the meantime, it leaves behind fingerprints, as I showed above.

What we see in the email exchanges is the normal interaction of a donor asking for a feature, not a private "tip off". It's likely the donor is the one who tipped off Tor, pointing out Tenable's product to detect Tor.

Whatever secrets Tor could have tipped off to the "secret spy agency" were no more than what Tenable was already doing in a shipping product.

Update: People are trying to make it look like Voice of America is some sort of intelligence agency. That's a conspiracy theory. It's not a member of the American intelligence community. You'd have to come up with a solid reason explaining why the United States is hiding VoA's membership in the intelligence community, or you'd have to believe that everything in the U.S. government is really just some arm of the C.I.A.

by Robert Graham at March 02, 2018 02:51 AM

March 01, 2018

Anton Chuvakin - Security Warrior

Monthly Blog Round-Up – February 2018

It is mildly shocking that I’ve been blogging for 13+ years (my first blog post on this blog was in December 2005, my old blog at O’Reilly predates this by about a year), so let’s spend a moment contemplating this fact.

<contemplative pause here :-)>

Here is my next monthly "Security Warrior" blog round-up of top 5 popular posts based on last month's visitor data (excluding other monthly or annual round-ups):
  1. “New SIEM Whitepaper on Use Cases In-Depth OUT!” (dated 2010) presents a whitepaper on select SIEM use cases described in depth with rules and reports [using now-defunct SIEM product]; also see this SIEM use case in depth and this for a more current list of popular SIEM use cases. Finally, see our 2016 research on developing security monitoring use cases here – and we just UPDATED IT FOR 2018.
  2. “Updated With Community Feedback SANS Top 7 Essential Log Reports DRAFT2” is about the top log reports project of 2008-2013; I think these are still very useful in response to “what reports will give me the best insight from my logs?”
  3. “Why No Open Source SIEM, EVER?” contains some of my SIEM thinking from 2009 (oh, wow, ancient history!). Is it relevant now? You be the judge. Succeeding with SIEM requires a lot of work, whether you paid for the software, or not. BTW, this post has an amazing “staying power” that is hard to explain – I suspect it has to do with people wanting “free stuff” and googling for “open source SIEM” …
  4. Again, my classic PCI DSS Log Review series is extra popular! The series of 18 posts covers a comprehensive log review approach (OK for PCI DSS 3+ even though it predates it), useful for building log review processes and procedures, whether regulatory or not. It is also described in more detail in our Log Management book and mentioned in our PCI book – note that this series is even mentioned in some PCI Council materials.
  5. “Simple Log Review Checklist Released!” is often at the top of this list – this rapidly aging checklist is still a useful tool for many people. “On Free Log Management Tools” (also aged quite a bit by now) is a companion to the checklist (updated version).
In addition, I'd like to draw your attention to a few recent posts from my Gartner blog [which, BTW, now has more than 5X the traffic of this blog]:

Critical reference posts:
Current research on testing security:
Current research on threat detection “starter kit”
Just finished research on SOAR:
Miscellaneous fun posts:

(see all my published Gartner research here)
Also see my past monthly and annual “Top Popular Blog Posts” – 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017.

Disclaimer: most content at SecurityWarrior blog was written before I joined Gartner on August 1, 2011 and is solely my personal view at the time of writing. For my current security blogging, go here.

Other posts in this endless series:

by Anton Chuvakin at March 01, 2018 05:19 PM

The Lone Sysadmin

How to Troubleshoot Unreliable or Malfunctioning Hardware

My post on Intel X710 NICs being awful has triggered a lot of emotion and commentary from my readers. One of the common questions has been: so I have X710 NICs, what do I do? How do I troubleshoot hardware that isn’t working right? 1. Document how to reproduce the problem and its severity. Is […]

The post How to Troubleshoot Unreliable or Malfunctioning Hardware appeared first on The Lone Sysadmin. Head over to the source to read the full post!

by Bob Plankers at March 01, 2018 04:55 PM


Seeking Last Group of Contributors

The following is a press release that we just put out about finishing off our relicensing effort. For the impatient: please help us find the last people; we want to change the license with our next release, which is currently in Alpha and tentatively set for May.

For background, you can see all posts in the license category.

One copy of the press release is at

OpenSSL Seeking Last Group of Contributors

Looking for programmers who contributed code to the OpenSSL project

The OpenSSL project is trying to reach the last couple-dozen people who have contributed code to OpenSSL. They are asking people to look at the list of names to see if they recognize any. If so, contact the project with any information.

This marks one of the final steps in the project’s work to change the license from its non-standard custom text, to the highly popular Apache License. This effort first started in the Fall of 2015, by requiring contributor agreements. Last March, the project made a major publicity effort, with large coverage in the industry. It also began to reach out and contact all contributors, as found by reviewing all changes made to the source. Over 600 people have already responded to emails or other attempts to contact them, and more than 98% agreed with the change. The project removed the code of all those who disagreed with the change. In order to properly respect the desires of all original authors, the project continues to make strong efforts to find everyone.

Measured purely by simple metrics, the average contribution still outstanding is not large. There are a total of 59 commits without a response, out of a history of more than 32,300. On average, each person submitted a patch that modified 3-4 files, adding 100 lines and removing 23.

“We’re very pleased to be changing the license, and I am personally happy that OpenSSL has adopted the widely deployed Apache License,” said Mark Cox, a founding member of the OpenSSL Management Committee. Cox is also a founder and former Board Member of the Apache Software Foundation.

The project hopes to conclude its two-year relicensing effort in time for the next release, which will include an implementation of TLS 1.3.

For more information, email


March 01, 2018 06:00 AM

February 28, 2018

The Lone Sysadmin

Intel X710 NICs Are Crap

(I’m grumpy this week and I’m giving myself permission to return to my blogging roots and complain about stuff. Deal with it.) In the not so distant past we were growing a VMware cluster and ordered 17 new blade servers with X710 NICs. Bad idea. X710 NICs suck, as it turns out. Those NICs do […]

The post Intel X710 NICs Are Crap appeared first on The Lone Sysadmin. Head over to the source to read the full post!

by Bob Plankers at February 28, 2018 09:29 PM

Everything Sysadmin

DevOpsDays New York City 2019: Join the planning committee!

2019 feels like a long way off, but since the conference is in January, we need to start planning soon. The sooner we start, the less rushed the planning can be.

I have to confess that working with the 2018 committee was one of the best and most professional conference planning experiences I've ever had. I've been involved with many conferences over the years and this experience was one of the best!

I invite new people to join the committee for 2019. The best way to learn about organizing is to join a committee and help out. You will be mentored and learn a lot in the process. Nothing involved in creating a conference is difficult; it just takes time and commitment.

Interested in being on the next planning committee? An informational meeting will be held via WebEx on Tuesday, March 6 at 2pm (NYC timezone, of course!).

During this kick-off meeting, the 2018 committee will review what roles they took on, what went well, what could be improved, and the timeframe for the 2019 event. Please note that attending this meeting doesn't commit you to helping organize the event; however, it is hoped that by the end we will be able to firm up who will comprise the 2019 event committee.

Hope you all can make it!

If you are interested in attending, email for connection info.

by Tom Limoncelli at February 28, 2018 04:06 PM

February 27, 2018

Everything Sysadmin

Male Ally Summit 2018 (NYC)

(quoted from the website)

The Male Ally Summit comes after a successful event on March 13, 2017 in NYC called "The Role Male Allies, Advocates, and Mentors have in Retaining Women in Tech", which had over 180 in attendance. We know this year will be just as impactful.

This year we bring back some of the same amazing speakers such as David Smith and Brad Johnson who are co-authors of, "Athena Rising: How and Why Men Should Mentor Women"; Evin Robinson (Co-founder, New York on Tech); Matt Wallaert (Chief Behavioral Officer, Clover Health). We add to the agenda, Heather Cabot (Co-author, Geek Girl Rising); Bryan Liles (Software Engineer, Heptio); Avis Yates Rivers (CEO and Author, Technology Concepts Groups International), Dennis Kozak (SVP of Next Generation Portfolio Strategy, CA Technologies) and others!

At this event, we bring Advocates, Mentors, Sponsors and many others under the umbrella, "Ally" with the vision of enlightening ourselves to be the change agents who will transform the companies and environment we interact with, and in, on a daily basis. You will leave the Summit with the following knowledge:

  • A guide to mentor women in the workplace
  • Best practices to sponsoring female employees
  • Steps you can take to become an ally for women in your workplace

For more information visit

by Tom Limoncelli at February 27, 2018 10:27 PM

February 26, 2018


Importing Pcap into Security Onion

Within the last week, Doug Burks of Security Onion (SO) added a new script that revolutionizes the use case for his amazing open source network security monitoring platform.

I have always used SO in a live production mode, meaning I deploy a SO sensor sniffing a live network interface. As the multitude of SO components observe network traffic, they generate, store, and display various forms of NSM data for use by analysts.

The problem with this model is that it could not be used for processing stored network traffic. If one simply replayed the traffic from a .pcap file, the new traffic would be assigned contemporary timestamps by the various tools observing the traffic.

While all of the NSM tools in SO have the independent capability to read stored .pcap files, there was no unified way to integrate their output into the SO platform.

Therefore, for years, there has not been a way to import .pcap files into SO -- until last week!

Here is how I tested the new so-import-pcap script. First, I made sure I was running Security Onion Elastic Stack Release Candidate 2 ( ISO) or later. Next I downloaded the script using wget from

I continued as follows:

richard@so1:~$ sudo cp so-import-pcap /usr/sbin/

richard@so1:~$ sudo chmod 755 /usr/sbin/so-import-pcap

I tried running the script against two of the sample files packaged with SO, but ran into issues with both.

richard@so1:~$ sudo so-import-pcap /opt/samples/10k.pcap


Please wait while...
...creating temp pcap for processing.
mergecap: Error reading /opt/samples/10k.pcap: The file appears to be damaged or corrupt
(pcap: File has 263718464-byte packet, bigger than maximum of 262144)
Error while merging!

I checked the file with capinfos.

richard@so1:~$ capinfos /opt/samples/10k.pcap
capinfos: An error occurred after reading 17046 packets from "/opt/samples/10k.pcap": The file appears to be damaged or corrupt.
(pcap: File has 263718464-byte packet, bigger than maximum of 262144)

Capinfos confirmed the problem. Let's try another!

richard@so1:~$ sudo so-import-pcap /opt/samples/zeus-sample-1.pcap


Please wait while...
...creating temp pcap for processing.
mergecap: Error reading /opt/samples/zeus-sample-1.pcap: The file appears to be damaged or corrupt
(pcap: File has 1984391168-byte packet, bigger than maximum of 262144)
Error while merging!

Another bad file. Trying a third!

richard@so1:~$ sudo so-import-pcap /opt/samples/evidence03.pcap


Please wait while...
...creating temp pcap for processing.
...setting sguild debug to 2 and restarting sguild.
...configuring syslog-ng to pick up sguild logs.
...disabling syslog output in barnyard.
...configuring logstash to parse sguild logs (this may take a few minutes, but should only need to be done once)...done.
...stopping curator.
...disabling curator.
...stopping ossec_agent.
...disabling ossec_agent.
...stopping Bro sniffing process.
...disabling Bro sniffing process.
...stopping IDS sniffing process.
...disabling IDS sniffing process.
...stopping netsniff-ng.
...disabling netsniff-ng.
...adjusting CapMe to allow pcaps up to 50 years old.
...analyzing traffic with Snort.
...analyzing traffic with Bro.
...writing /nsm/sensor_data/so1-eth1/dailylogs/2009-12-28/snort.log.1261958400

Import complete!

You can use this hyperlink to view data in the time range of your import:

or you can manually set your Time Range to be:
From: 2009-12-28    To: 2009-12-29

Incidentally here is the capinfos output for this trace.

richard@so1:~$ capinfos /opt/samples/evidence03.pcap
File name:           /opt/samples/evidence03.pcap
File type:           Wireshark/tcpdump/... - pcap
File encapsulation:  Ethernet
Packet size limit:   file hdr: 65535 bytes
Number of packets:   1778
File size:           1537 kB
Data size:           1508 kB
Capture duration:    171 seconds
Start time:          Mon Dec 28 04:08:01 2009
End time:            Mon Dec 28 04:10:52 2009
Data byte rate:      8814 bytes/s
Data bit rate:       70 kbps
Average packet size: 848.57 bytes
Average packet rate: 10 packets/sec
SHA1:                34e5369c8151cf11a48732fed82f690c79d2b253
RIPEMD160:           afb2a911b4b3e38bc2967a9129f0a11639ebe97f
MD5:                 f8a01fbe84ef960d7cbd793e0c52a6c9
Strict time order:   True

That worked! Now to see what I can find in the SO interface.

I accessed the Kibana application and changed the timeframe to include the timestamps in the trace.

Here's another screenshot. Again I had to adjust for the proper time range.

Very cool! However, I did not find any IDS alerts. This made me wonder if there was a problem with alert processing. I decided to run the script on a new .pcap:

richard@so1:~$ sudo so-import-pcap /opt/samples/emerging-all.pcap


Please wait while...
...creating temp pcap for processing.
...analyzing traffic with Snort.
...analyzing traffic with Bro.
...writing /nsm/sensor_data/so1-eth1/dailylogs/2010-01-27/snort.log.1264550400

Import complete!

You can use this hyperlink to view data in the time range of your import:

or you can manually set your Time Range to be:
From: 2010-01-27    To: 2010-01-28

When I searched the interface for NIDS alerts (after adjusting the time range), I found results:

The alerts show up in Sguil, too!

This is a wonderful development for the Security Onion community. Being able to import .pcap files and analyze them with the standard SO tools and processes, while preserving timestamps, makes SO a viable network forensics platform.

This thread in the mailing list covers the new script.

I suggest running it on an evaluation system, probably in a virtual machine. I did all my testing on VirtualBox. Check it out!

by Richard Bejtlich at February 26, 2018 05:12 PM

Ben's Practical Admin Blog

Active Directory & Certificates – Which One is Being Used?

So here’s a question I want you to try answering off the top of your head – Which certificate is your domain controller using for Kerberos & LDAPS and what happens when there are multiple certificates in the crypto store?

The answer is actually pretty obvious if you already know it; however, this was the question I faced recently, and I ended up having to do a little bit of poking around to answer it.

The scenario in question for me: having built a new multi-tier PKI in our environment, I have reached the point of migrating services to it, including the auto-enrolled certificate templates used on Domain Controllers.

For most contemporary active directory installs where AD certificate services is also used, there are two main certificate templates related to domain controllers:

  • Kerberos Authentication
  • Directory Email Replication

The “Kerberos Authentication” certificate template made its appearance in Windows Server 2008, replacing the “Domain Controller” and “Domain Controller Authentication” templates in earlier versions of ADCS. The “Directory Email Replication” template is used where you use email protocols to replicate AD (I am not quite sure why anyone would want to do this in this day & age).

Getting back to my scenario and question, how do you work out which certificate is in use?

In both examples, we’re interested in the certificate serial number. The first way is to use a network analyser such as Wireshark (or MS Message Analyzer) to trace a connection to port 636 of a domain controller:

Wireshark Screenshot of TLS Connection

Using a network analyser is nifty in that you can see the full handshake occurring and the data passed – something crypto-geeks can get excited about 🙂 Expanding out the information, we can obtain the serial number: 655dc58900010000e01e

Alternatively, if you have openSSL available, you can use the following commands to connect and obtain similar information:

openssl s_client -connect <LDAPS Server>:636

This will connect to the server, and amongst the output will be the offered certificate in base64 format. Copy all the text between and including -----BEGIN CERTIFICATE----- and -----END CERTIFICATE----- to a file; this gives you the certificate being offered. You can then run this command:

openssl x509 -in <certificate-file> -text -noout

To obtain all the detailed information on the certificate, including the serial number.
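If you would rather script the check, something along these lines also works; this is a sketch using Python's ssl module and the third-party cryptography package, and the domain controller name is a placeholder:

import ssl
from cryptography import x509

# Grab the certificate the domain controller offers on the LDAPS port.
pem = ssl.get_server_certificate(("dc01.example.com", 636))   # placeholder DC name
cert = x509.load_pem_x509_certificate(pem.encode())

# Print the serial in hex so it can be matched against the local certificate store.
print(format(cert.serial_number, "x"))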


From here, it's just a matter of checking the personal certificate store on the local computer account and finding the certificate with the matching serial:


What Happens for multiple Kerberos Certificates?

Again, looking back at my scenario, I now have two Kerberos Authentication certificates in my store – one from the old CA infrastructure, and the other from the new CA infrastructure, with a different template name to meet naming standards.

Using the tried and true method of “test it and see what happens”, I found that the AD DS service will always use the newest certificate available. That is, the one that has the newest validity start date. As an example, if today is February 26, the certificate which is valid from February 25th will be used over the certificate valid from February 20th.

Changing between certificates is a seamless affair. AD Domain Services doesn’t need restarting, nor does the machine in general.


So there you have it: Domain Controllers at their base use 1-2 certificate templates, based on how you replicate. There's no native way (that I found) to work out which certificate is being used, so tools like Wireshark and OpenSSL can be useful for obtaining certificate information to reference. Finally, Domain Controllers will use the Kerberos certificate with the latest validity period.

by Ben at February 26, 2018 12:55 AM

February 25, 2018

Sarah Allen

listening to very specific events

The model of declarative eventing allows for listening to very specific events and then triggering specific actions. This model simplifies the developer experience, as well as optimizing the system by reducing network traffic.

AWS S3 bucket trigger

In looking at AWS to explain how changes in S3 can trigger Lambda functions, I found that the AWS product docs focus on the GUI configuration experience. This probably makes it easy for new folks to write a specific Lambda function; however, it makes it a little harder to see the system patterns before gaining a lot of hands-on experience.

The trigger-action association can be seen more clearly in a Terraform configuration. (Under the hood, Terraform must be using AWS APIs to set up the trigger.) The configuration below specifies that whenever a json file is uploaded to a specific bucket with the path prefix “content-packages”, a specific Lambda function will be executed:

resource "aws_s3_bucket_notification" "bucket_terraform_notification" {
    bucket = "${}"
    lambda_function {
        lambda_function_arn = "${aws_lambda_function.terraform_func.arn}"
        events = ["s3:ObjectCreated:*"]
        filter_prefix = "content-packages/"
        filter_suffix = ".json"
    }
}

— via justinsoliz’ github gist
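For comparison, here is a rough boto3 sketch of the same binding made directly against the AWS API, which is more or less what Terraform is doing under the hood; the bucket name and Lambda ARN below are placeholders:

import boto3

s3 = boto3.client("s3")

# Invoke the Lambda function for every .json object created under content-packages/.
s3.put_bucket_notification_configuration(
    Bucket="my-example-bucket",
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:terraform_func",
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {
                        "FilterRules": [
                            {"Name": "prefix", "Value": "content-packages/"},
                            {"Name": "suffix", "Value": ".json"},
                        ]
                    }
                },
            }
        ]
    },
)

In either form, S3 can only deliver the events if the Lambda function also carries a resource-based permission allowing S3 to invoke it (aws_lambda_permission in Terraform).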

Google Cloud events

To illustrate an alternate developer experience, the examples below are shown with Firebase JavaScript SDK for Google Cloud Functions, which is idiomatic for JavaScript developers using the Fluent API style, popularized by jQuery. The same functionality is available via command line options using gcloud, the Google Cloud CLI.

Cloud Storage trigger

Below is an example of specifying a trigger for a change to a Google Cloud Storage object in a specific bucket:

exports.generateThumbnail = functions.storage.bucket('my-bucket').object().onChange((event) => {
  // ...
});

Cloud Firestore trigger

This approach to filtering events at their source is very powerful when applied to database operations, where a developer can listen to a specific database path, such as with Cloud Firestore events:

exports.createProduct = functions.firestore
  .document('products/{productId}')   // example path: listen for new documents in the "products" collection
  .onCreate(event => {
    // Get an object representing the document
    // e.g. {'name': 'Wooden Doll', 'description': '...'}
    var newValue = event.data.data();

    // access a particular field as you would any JS property
    var name = newValue.name;

    // perform desired operations ...
  });

by sarah at February 25, 2018 04:05 PM


Choria Playbooks DSL

I previously wrote about Choria Playbooks – a reminder they are playbooks written in YAML format and can orchestrate many different kinds of tasks, data, inputs and discovery systems – not exclusively ones from MCollective. It integrates with tools like terraform, consul, etcd, Slack, Graphite, Webhooks, Shell scripts, Puppet PQL and of course MCollective.

I mentioned in that blog post that I did not think a YAML based playbook is the way to go.

I am very pleased to announce that with the release of Choria 0.6.0 playbooks can now be written with the Puppet DSL. I am so pleased with this that, effective immediately, the YAML DSL is deprecated and set for a rather short lifetime.

A basic example can be seen here; it will:

  • Reuse a company specific playbook and notify Slack of the action about to be taken
  • Discover nodes using PQL in a specified cluster and verify they are using a compatible Puppet Agent
  • Obtain a lock in Consul, ensuring only 1 member of the team performs critical tasks related to the life cycle of the Puppet Agent at a time
  • Disable Puppet on the discovered nodes
  • Wait for up to 200 seconds for the nodes to become idle
  • Release the lock
  • Notify Slack that the task completed
# Disables Puppet and waits for all in-progress catalog compiles to end
plan acme::disable_puppet_and_wait (
  Enum[alpha, bravo] $cluster
) {
  choria::run_playbook(acme::slack_notify, message => "Disabling Puppet in cluster ${cluster}")

  $puppet_agents = choria::discover("mcollective",
    discovery_method => "choria",
    agents => ["puppet"],
    facts => ["cluster=${cluster}"],
    uses => { puppet => ">= 1.13.1" }
  )

  $ds = {
    "type" => "consul",
    "timeout" => 120,
    "ttl" => 60
  }

  choria::lock("locks/puppet.critical", $ds) || {
    # disable Puppet on the discovered nodes
    choria::task(
      "action" => "puppet.disable",
      "nodes" => $puppet_agents,
      "fail_ok" => true,
      "silent" => true,
      "properties" => {"message" => "restarting puppet server"}
    )

    # wait for the agents to become idle
    choria::task(
      "action"    => "puppet.status",
      "nodes"     => $puppet_agents,
      "assert"    => "idling=true",
      "tries"     => 10,
      "silent"    => true,
      "try_sleep" => 20,
    )
  }

  choria::run_playbook(acme::slack_notify,
    message => sprintf("Puppet disabled on %d nodes in cluster %s", $puppet_agents.count, $cluster)
  )
}

As you can see we can re-use playbooks and build up a nice cache of utilities that the entire team can use; the support for locks and data sharing ensures safe and coordinated use of this style of system.

You can get this today if you use Puppet 5.4.0 and Choria 0.6.0. Refer to the Playbook Docs for more details, especially the Tips and Patterns section.

Why Puppet based DSL?

The Plan DSL, as you'll see in the Background and History part later in this post, is something I have wanted for a long time. I think the current generation Puppet DSL is fantastic and really suited to this problem. Of course, having this in the Plan DSL means I can now also create Ruby versions of this, and I might well do that.

The Plan DSL though has many advantages:

  • Many of us already know the DSL
  • There are vast amounts of documentation and examples of Puppet code, and you can get trained to use it.
  • The other tools in the Puppet stable support plans – you can use puppet strings to document your Playbooks
  • The community around the Puppet DSL is very strong, I imagine soon rspec-puppet might support testing Plans and so by extension Playbooks. This appears to be already possible but not quite as easy as it could be.
  • We have a capable and widely used way of sharing these between us in the Puppet Forge

I could not compete with this in any language I might want to support.

Future of Choria Playbooks

As I mentioned, the YAML playbooks are not long for this world. I think they were an awesome experiment and I learned a ton from them, but these Plan based Playbooks are such a massive step forward that I just can't see the YAML ones serving any purpose whatsoever.

This release supports both YAML and Plan based Playbooks, the next release will ditch the YAML ones.

At that time a LOT of code will be removed from the repositories and I will be able to very significantly simplify the supporting code. My goal is to make it possible to add new task types, data sources, discovery sources etc. really easily, perhaps even via Puppet modules, so the ecosystem around these will grow.

I will be doing a bunch of work on the Choria Plugins (agent, server, puppet etc) and these might start shipping small Playbooks that you can use in your own Playbooks. The one that started this blog post would be a great candidate to supply as part of the Choria suite and I’d like to do that for this and many other plugins.

Background and History

For many years I have wanted Puppet to move in a direction that might one day support scripts – perhaps even become a good candidate for shell scripts, not at the expense of the CM DSL but as a way to reward people for knowing the Puppet Language. I wanted this for many reasons but a major one was because I wanted to use it as a DSL to write orchestration scripts for MCollective.

I did some proof of concepts of this late in 2012, you can see the fruits of this POC here, it allowed one to orchestrate MCollective tasks using Puppet DSL and a Ruby DSL. This was interesting but the DSL as it was then was no good for this.

I also made a pure YAML Puppet DSL that deeply incorporated Hiera and remained compatible with the Puppet DSL. This too was interesting and in hindsight given the popularity of YAML I think I should have given this a lot more attention than I did.

Neither of these really worked for what I needed. Around that time Henrik Lindberg started talking about massive changes to the Puppet DSL, and I think our first ever conversation covered this very topic – this must have been back in 2012 as well.

More recently I worked on YAML based playbooks for Choria; a sample can be seen in the old Choria docs. This is about the closest I got to something workable; we have users in the wild using it and having success with these. As an exploration they were super handy and taught me loads.

Fast forward to Puppet Conf 2017, where Puppet Inc announced something called Puppet Plans. These are basically script-like, uncompiled (kind of), top-down executed, and aimed at use within your CLI much like you would use a script. This was fantastic news; unfortunately the reality ended up with these locked up inside their new SSH based orchestrator called Bolt. Due to some very unfortunate technical direction and decision making, Plans are entirely unusable by Puppet users without Bolt. Bolt vendors its own Puppet and Facter and so it's unaware of the AIO Puppet.

Ideally I would want to use Plans as maintained by Puppet Inc for my Playbooks, but the current status of things is that the team just is not interested in moving in that direction. Thus in the latest version of Choria I have implemented my own runner, result types, error types and everything needed to write Choria Playbooks using the Puppet DSL.


I am really pleased with how these playbooks turned out and am excited for what I can provide to the community in the future. There are no doubt some rough edges today in the implementation and documentation; your continued feedback and engagement in the Choria community around these would ensure that in time we will have THE Playbook system in the Puppet ecosystem.

by R.I. Pienaar at February 25, 2018 12:02 PM

February 22, 2018

Sarah Allen

declarative eventing

An emerging pattern in server-side event-driven programming formalizes the data that might be generated by an event source; a consumer of that event source then registers for very specific events.

A declarative eventing system establishes a contract between the producer (event source) and consumer (a specific action) and allows for binding a source and action without modifying either.

Comparing this to how traditional APIs are constructed, we can think of it as a kind of reverse query — we reverse the direction of typical request-response by registering a query and then getting called back every time there’s a new answer. This new model establishes a specific operational contract for registering these queries that are commonly called event triggers.
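As a toy illustration of that contract (not any particular vendor's API), a declarative eventing layer is essentially a registry of (event filter, action) pairs, and it calls back every matching action when a new answer arrives:

# Minimal sketch: sources and actions are bound through a registry
# without either one knowing about the other.
registry = []

def on(event_type, **filters):
    """Register the decorated function as the action for matching events."""
    def register(action):
        registry.append((event_type, filters, action))
        return action
    return register

def emit(event_type, event):
    """Called by an event source; dispatches to every matching action."""
    for etype, filters, action in registry:
        if etype == event_type and all(event.get(k) == v for k, v in filters.items()):
            action(event)

@on("object.created", bucket="my-bucket")
def generate_thumbnail(event):
    print("would generate a thumbnail for", event["name"])

emit("object.created", {"bucket": "my-bucket", "name": "photo.png"})     # action fires
emit("object.created", {"bucket": "other-bucket", "name": "notes.json"}) # filtered out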

This pattern requires a transport for event delivery. While systems typically support HTTP and RPC mechanisms for local events, which might be connected point-to-point in a mesh network, they also often connect to messaging or streaming data systems such as Apache Kafka and RabbitMQ, as well as proprietary offerings.

This declarative eventing pattern can be seen in a number of serverless platforms, and is typically coupled with Functions-as-a-Service offerings, such as AWS Lambda and Google Cloud Functions.

An old pattern applied in a new way

Binding events to actions is nothing new. We have seen this pattern in various GUI programming environments for decades, and on the server-side in many Service-Oriented Architecture (SOA) frameworks. What's new is that we're seeing server-side code that can be connected to managed services in a way that is almost as simple to set up as an onClick handler in HyperCard. However, the problems that we can solve with this pattern are today's challenges of integrating data from disparate systems, often at high volume, along with custom analysis, business logic, machine learning and human interaction.

Distributed systems programming is no longer solely the domain of specialized systems engineers who create infrastructure, most applications we use every day integrate data sources from multiple systems across many providers. Distributed systems programming has become ubiquitous, providing an opportunity for interoperable systems at a much higher level.

by sarah at February 22, 2018 01:47 PM

February 20, 2018


Murdlok: A new old adventure game for the C64

Murdlok is a previously unreleased graphical text-based adventure game for the Commodore 64 written in 1986 by Peter Hempel. A German and an English version exist.

Murdlok – Ein Abenteuer von Peter Hempel

Befreie das Land von dem bösen Murdlok. Nur Nachdenken und kein Leichtsinn führen zum Ziel.


(Originalversion von 1986)

Murdlok – An Adventure by Peter Hempel

Liberate the land from the evil Murdlok! Reflection, not recklessness will guide you to your goal!


(English translation by Lisa Brodner and Michael Steil, 2018)

The great thing about a new game is that no walkthroughs exist yet! Feel free to use the comments section of this post to discuss how to solve the game. Extra points for the shortest solution – ours is 236 steps!

by Michael Steil at February 20, 2018 08:07 PM

February 19, 2018

Steve Kemp's Blog

How we care for our child

This post is a departure from the regular content, which is supposed to be "Debian and Free Software", but has accidentally turned into a hardware blog recently!

Anyway, we have a child who is now about 14 months old. The way that my wife and I care for him seems logical to us, but often amuses local people. So in the spirit of sharing this is what we do:

  • We divide the day into chunks of time.
  • At any given time one of us is solely responsible for him.
    • The other parent might be nearby, and might help a little.
    • But there is always a designated person who will be changing nappies, feeding, and playing at any given point in the day.
  • The end.

So our weekend routine, covering Saturday and Sunday, looks like this:

  • 07:00-08:00: Husband
  • 08:01-13:00: Wife
  • 13:01-17:00: Husband
  • 17:01-18:00: Wife
  • 18:01-19:30: Husband

Our child, Oiva, seems happy enough with this and he sometimes starts walking from one parent to the other at the appropriate time. But the real benefit is that each of us gets some time off - in my case I get "the morning" off, and my wife gets the afternoon off. We can hide in our bedroom, go shopping, eat cake, or do anything we like.

Week-days are similar, but with the caveat that we both have jobs. I take the morning, and the evenings, and in exchange if he wakes up overnight my wife helps him sleep and settle between 8PM-5AM, and if he wakes up later than 5AM I deal with him.

Most of the time our child sleeps through the night, but if he does wake up it tends to be in the 4:30AM/5AM timeframe. I'm "happy" to wake up at 5AM and stay up until I go to work because I'm a morning person and I tend to go to bed early these days.

Day-care is currently a complex process. There are three families with small children, and ourselves. Each day of the week one family hosts all the children, and the baby-sitter arrives there too (all the families live within a few blocks of each other).

All of the parents go to work, leaving one carer in charge of 4 babies for the day, from 08:15-16:15. On the days when we're hosting the children I greet the carer then go to work - on the days the children are at a different families house I take him there in the morning, on my way to work, and then my wife collects him in the evening.

At the moment things are a bit terrible because most of the children have been a bit sick, and the carer too. When a single child is sick it's mostly OK, unless that child's home is supposed to be the host venue. If that child is sick we have to panic and pick another house for that day.

Unfortunately if the child-carer is sick then everybody is screwed, and one parent has to stay home from each family. I guess this is the downside compared to sending the children to public-daycare.

This is private day-care, Finnish-style. The social-services (kela) will reimburse each family €700/month if you're in such a scheme, and carers are limited to a maximum of 4 children. The net result is that prices are stable, averaging €900-€1000 per-child, per month.

(The €700 is refunded after a month or two, so in real-terms people like us pay €200-€300/month for Monday-Friday day-care. Plus a bit of bureaucracy over deciding which family is hosting, and which parents are providing food. With the size being capped, and the fees being pretty standard, the carers earn €3600-€4000/month, which is a good amount. To be a school-teacher you need to be very qualified, but to do this caring is much simpler. It turns out that being an English-speaker can be a bonus too, for some families ;)

Currently our carer has a sick-note for three days, so I'm staying home today, and will likely stay tomorrow too. Then my wife will skip work on Wednesday. (We usually take it in turns but sometimes that can't happen easily.)

But all of this is due to change in the near future, because we've had too many sick days, and both of us have missed too much work.

More news on that in the future, unless I forget.

February 19, 2018 11:00 AM

February 17, 2018

Cryptography Engineering

A few notes on Medsec and St. Jude Medical

In Fall 2016 I was invited to come to Miami as part of a team that independently validated some alleged flaws in implantable cardiac devices manufactured by St. Jude Medical (now part of Abbott Labs). These flaws were discovered by a company called MedSec. The story got a lot of traction in the press at the time, primarily due to the fact that a hedge fund called Muddy Waters took a large short position on SJM stock as a result of these findings. SJM subsequently sued both parties for defamation. The FDA later issued a recall for many of the devices.

Due in part to the legal dispute (still ongoing!), I never had the opportunity to write about what happened down in Miami, and I thought that was a shame, because it's really interesting. So I'm belatedly putting up this post, which talks a bit about MedSec's findings, and implantable device security in general.

By the way: “we” in this case refers to a team of subject matter experts hired by Bishop Fox, and retained by legal counsel for Muddy Waters investments. I won’t name the other team members here because some might not want to be troubled by this now, but they did most of the work — and their names can be found in this public expert report (as can all the technical findings in this post.)

Quick disclaimers: this post is my own, and any mistakes or inaccuracies in it are mine and mine alone. I’m not a doctor so holy cow this isn’t medical advice. Many of the flaws in this post have since been patched by SJM/Abbot. I was paid for my time and travel by Bishop Fox for a few days in 2016, but I haven’t worked for them since. I didn’t ask anyone for permission to post this, because it’s all public information.

A quick primer on implantable cardiac devices 

Implantable cardiac devices are tiny computers that can be surgically installed inside a patient’s body. Each device contains a battery and a set of electrical leads that can be surgically attached to the patient’s heart muscle.

When people think about these devices, they’re probably most familiar with the cardiac pacemaker. Pacemakers issue small electrical shocks to ensure that the heart beats at an appropriate rate. However, the pacemaker is actually one of the least powerful implantable devices. A much more powerful type of device is the Implantable Cardioverter-Defibrillator (ICD). These devices are implanted in patients who have a serious risk of spontaneously entering a dangerous state in which their heart ceases to pump blood effectively. The ICD continuously monitors the patient’s heart rhythm to identify when the patient’s heart has entered this condition, and applies a series of increasingly powerful shocks to the heart muscle to restore effective heart function. Unlike pacemakers, ICDs can issue shocks of several hundred volts or more, and can both stop and restart a patient’s normal heart rhythm.

Like most computers, implantable devices can communicate with other computers. To avoid the need for external data ports – which would mean a break in the patient's skin – these devices communicate via either a long-range radio frequency ("RF") or a near-field inductive coupling ("EM") communication channel, or both. Healthcare providers use a specialized hospital device called a Programmer to update therapeutic settings on the device (e.g., program the device, turn therapy off). Using the Programmer, providers can manually issue commands that cause an ICD to shock the patient's heart. One command, called a "T-Wave shock" (or "Shock-on-T") can be used by healthcare providers to deliberately induce ventricular fibrillation. This capability is used after a device is implanted, in order to test the device and verify it's functioning properly.

Because the Programmer is a powerful tool – one that could cause harm if misused – it’s generally deployed in a physician office or hospital setting. Moreover, device manufacturers may employ special precautions to prevent spurious commands from being accepted by an implantable device. For example:

  1. Some devices require that all Programmer commands be received over a short-range communication channel, such as the inductive (EM) channel. This limits the communication range to several centimeters.
  2. Other devices require that a short-range inductive (EM) wand must be used to initiate a session between the Programmer and a particular implantable device. The device will only accept long-range RF commands sent by the Programmer after this interaction, and then only for a limited period of time.

From a computer security perspective, both of these approaches have a common feature: using either approach requires some form of close-proximity physical interaction with the patient before the implantable device will accept (potentially harmful) commands via the long-range RF channel. Even if a malicious party steals a Programmer from a hospital, she may still need to physically approach the patient – at a distance limited to perhaps centimeters – before she can use the Programmer to issue commands that might harm the patient.
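As a toy model (my own sketch, not any vendor's firmware, and with a made-up window length), the gate that both approaches boil down to looks something like this:

import time

PAIRING_WINDOW_SECONDS = 120   # placeholder; the real window length is device-specific

class ProximityGate:
    """Toy model: long-range RF commands require a recent short-range pairing."""

    def __init__(self):
        self._paired_at = None

    def em_wand_pairing(self):
        # Only possible with the inductive wand a few centimeters from the patient.
        self._paired_at = time.monotonic()

    def accept_rf_command(self, command: bytes) -> bool:
        if self._paired_at is None:
            return False   # never paired: ignore RF commands entirely
        if time.monotonic() - self._paired_at > PAIRING_WINDOW_SECONDS:
            return False   # pairing expired: require another close-range interaction
        return True        # within the window: the command would be processed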

In addition to the Programmer, most implantable manufacturers also produce some form of “telemedicine” device. These devices aren’t intended to deliver commands like cardiac shocks. Instead, they exist to provide remote patient monitoring from the patient’s home. Telematics devices use RF or inductive (EM) communications to interrogate the implantable device in order to obtain episode history, usually at night when the patient is asleep. The resulting data is uploaded to a server (via telephone or cellular modem) where it can be accessed by healthcare providers.

What can go wrong?

Before we get into specific vulnerabilities in implantable devices, it’s worth asking a very basic question. From a security perspective, what should we even be worried about?

There are a number of answers to this question. For example, an attacker might abuse implantable device systems or infrastructure to recover confidential patient data (known as PHI). Obviously this would be bad, and manufacturers should design against it. But the loss of patient information is, quite frankly, kind of the least of your worries.

A much scarier possibility is that an attacker might attempt to harm patients. This could be as simple as turning off therapy, leaving the patient to deal with their underlying condition. On the much scarier end of the spectrum, an ICD attacker could find a way to deliberately issue dangerous shocks that could stop a patient’s heart from functioning properly.

Now let me be clear: this isn't what you'd call a high-probability attack. Most people aren't going to be targeted by sophisticated technical assassins. The concerning thing is that the impact of such an attack is sufficiently terrifying that we should probably be concerned about it anyway. Indeed, some high-profile individuals have already taken precautions against it.

The real nightmare scenario is a mass attack in which a single resourceful attacker targets thousands of individuals simultaneously — perhaps by compromising a manufacturer’s back-end infrastructure — and threatens to harm them all at the same time. While this might seem unlikely, we’ve already seen attackers systematically target hospitals with ransomware. So this isn’t entirely without precedent.

Securing device interaction physically

The real challenge in securing an implantable device is that too much security could hurt you. As tempting as it might be to lard these devices up with security features like passwords and digital certificates, doctors need to be able to access them. Sometimes in a hurry.

This shouldn’t happen in the ER.

This is a big deal. If you’re in a remote emergency room or hospital, the last thing you want is some complex security protocol making it hard to disable your device or issue a required shock. This means we can forget about complex PKI and revocation lists. Nobody is going to have time to remember a password. Even merely complicated procedures are out — you can’t afford to have them slow down treatment.

At the same time, these devices obviously must perform some sort of authentication: otherwise anyone with the right kind of RF transmitter could program them — via RF, from a distance. This is exactly what you want to prevent.

Many manufacturers have adopted an approach that cuts through this knot. The basic idea is to require physical proximity before someone can issue commands to your device. Specifically, before anyone can issue a shock command (even via a long-range RF channel) they must — at least briefly — make close physical contact with the patient.

This proximity can be enforced in a variety of ways. If you remember, I mentioned above that most devices have a short-range inductive coupling ("EM") communications channel. These short-range channels seem ideal for establishing a "pairing" between a Programmer and an implantable device — via a specialized wand. Once the channel is established, of course, it's possible to switch over to long-range RF communications.

This isn't a perfect solution, but it has a lot going for it: someone could still harm you, but they would have to at least get a transmitter within a few inches of your chest before doing so. Moreover, you can potentially disable harmful commands from an entire class of device (like telemedicine monitoring devices) simply by leaving off the wand.

St. Jude Medical and MedSec


So given this background, what did St. Jude Medical do? All of the details are discussed in a full expert report published by Bishop Fox. In this post I'll focus on the most serious of MedSec's claims, which can be expressed as follows:

Using only the hardware contained within a “Merlin @Home” telematics device, it was possible to disable therapy and issue high-power “shock” commands to an ICD from a distance, and without first physically interacting with the implantable device at close range.

This vulnerability had several implications:

  1. The existence of this vulnerability implies that – through a relatively simple process of “rooting” and installing software on a Merlin @Home device – a malicious attacker could create a device capable of issuing harmful shock commands to installed SJM ICD devices at a distance. This is particularly worrying given that Merlin @Home devices are widely deployed in patients’ homes and can be purchased on eBay for prices under $30. While it might conceivably be possible to physically secure and track the location of all PCS Programmer devices, it seems challenging to physically track the much larger fleet of Merlin @Home devices.
  2. More critically, it implies that St. Jude Medical implantable devices do not enforce a close physical interaction (e.g., via an EM wand or other mechanism) prior to accepting commands that have the potential to harm or even kill patients. This may be a deliberate design decision on St. Jude Medical’s part. Alternatively, it could be an oversight. In either case, this design flaw increases the risk to patients by allowing for the possibility that remote attackers might be able to cause patient harm solely via the long-range RF channel.
  3. If it is possible – using software modifications only – to issue shock commands from the Merlin @Home device, then patients with an ICD may be vulnerable in the hypothetical event that their Merlin @Home device becomes remotely compromised by an attacker. Such a compromise might be accomplished remotely via a network attack on a single patient’s Merlin @Home device. Alternatively, a compromise might be accomplished at large scale through a compromise of St. Jude Medical’s server infrastructure.

We stress that the final scenario is strictly hypothetical. MedSec did not allege a specific vulnerability that allows for the remote compromise of Merlin @Home devices or SJM infrastructure. However, from the perspective of software and network security design, these attacks are one of the potential implications of a design that permits telematics devices to send such commands to an implantable device. It is important to stress that none of these attacks would be possible if St. Jude Medical’s design prohibited the implantable from accepting therapeutic commands from the Merlin @Home device (e.g., by requiring close physical interaction via the EM wand, or by somehow authenticating the provenance of commands and restricting critical commands to be sent by the Programmer only).

Validating MedSec’s claim

To validate MedSec’s claim, we examined their methodology from start to finish. This methodology included extracting and decompiling Java-based software from a single PCS Programmer; accessing a Merlin @Home device to obtain a root shell via the JTAG port; and installing a new package of custom software written by MedSec onto a used Merlin @Home device.

We then observed MedSec issue a series of commands to an ICD device using a Merlin @Home device that had been customized (via software) as described above. We used the Programmer to verify that these commands were successfully received by the implantable device, and physically confirmed that MedSec had induced shocks by attaching a multimeter to the leads on the implantable device.

Finally, we reproduced MedSec’s claims by opening the case of a second Merlin @Home device (after verifying that the tape was intact over the screw holes), obtaining a shell by connecting a laptop computer to the JTAG port, and installing MedSec’s software on the device. We were then able to issue commands to the ICD from a distance of several feet. This process took us less than three hours in total, and required only inexpensive tools and a laptop computer.

What are the technical details of the attack?

Simply reproducing a claim is only part of the validation process. To verify MedSec’s claims we also needed to understand why the attack described above was successful. Specifically, we were interested in identifying the security design issues that make it possible for a Merlin @Home device to successfully issue commands that are not intended to be issued from this type of device. The answer to this question is quite technical, and involves the specific way that SJM implantable devices verify commands before accepting them.

MedSec described to us the operation of SJM’s command protocol as part of their demonstration. They also provided us with Java JAR executable code files taken from the hard drive of the PCS Programmer. These files, which are not obfuscated and can easily be “decompiled” into clear source code, contain the software responsible for implementing the Programmer-to-Device communications protocol.

By examining the SJM Programmer code, we verified that Programmer commands are authenticated through the inclusion of a three-byte (24 bit) “authentication tag” that must be present and correct within each command message received by the implantable device. If this tag is not correct, the device will refuse to accept the command.

From a cryptographic perspective, 24 bits is a surprisingly short value for an important authentication field. However, we note that even this relatively short tag might be sufficient to prevent forgery of command messages – provided the tag was calculated using a secure cryptographic function (e.g., a Message Authentication Code) with a fresh secret key that cannot be predicted by the attacker.
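For contrast, here is a minimal sketch (purely illustrative, not SJM's protocol) of what a properly derived 24-bit tag could look like: a standard MAC computed under a fresh per-session key and truncated to the 3-byte field:

import hashlib
import hmac
import os

session_key = os.urandom(16)   # fresh secret agreed during a close-range pairing (assumption)

def auth_tag(command: bytes, key: bytes) -> bytes:
    """Truncate a standard HMAC-SHA256 to the 3-byte (24-bit) field size."""
    return hmac.new(key, command, hashlib.sha256).digest()[:3]

command = b"\x01\x02DISABLE_THERAPY"   # placeholder command encoding
print(auth_tag(command, session_key).hex())

Even truncated to 24 bits, an attacker who does not know the session key can do no better than guessing, and the device can rate-limit or reject repeated failures.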

Based on MedSec’s demonstration, and on our analysis of the Programmer code, it appears that SJM does not use the above approach to generate authentication tags. Instead, SJM authenticates the Programmer to the implantable with the assistance of a “key table” that is hard-coded within the Java code within the Programmer. At minimum, any party who obtains the (non-obfuscated) Java code from a legitimate SJM Programmer can gain the ability to calculate the correct authentication tags needed to produce viable commands – without any need to use the Programmer itself.

Moreover, MedSec determined – and successfully demonstrated – that there exists a “Universal Key”, i.e., a fixed three-byte authentication tag, that can be used in place of the calculated authentication tag. We identified this value in the Java code provided by MedSec, and verified that it was sufficient to issue shock commands from a Merlin @Home to an implantable device.

While these issues alone are sufficient to defeat the command authentication mechanism used by SJM implantable devices, we also analyzed the specific function that is used by SJM to generate the three-byte authentication tag.  To our surprise, SJM does not appear to use a standard cryptographic function to compute this tag. Instead, they use an unusual and apparently “homebrewed” cryptographic algorithm for the purpose.

Specifically, the PCS Programmer Java code contains a series of hard-coded 32-bit RSA public keys. To issue a command, the implantable device sends a value to the Programmer. This value is then “encrypted” by the Programmer using one of the RSA public keys, and the resulting output is truncated to produce a 24-bit output tag.

The above is not a standard cryptographic protocol, and quite frankly it is difficult to see what St. Jude Medical is trying to accomplish using this technique. From a cryptographic perspective it has several problems:

  1. The RSA public keys used by the PCS Programmers are 32 bits long. Normal RSA keys are expected to be a minimum of 1024 bits in length. Some estimates predict that a 1024-bit RSA key can be factored (and thus rendered insecure) in approximately one year using a powerful network of supercomputers. Based on experimentation, we were able to factor the SJM public keys in less than one second on a laptop computer (a short sketch of why this is so easy follows this list).
  2. Even if the RSA keys were of an appropriate length, the SJM protocol does not make use of the corresponding RSA secret keys. Thus the authentication tag is not an RSA signature, nor does it use RSA in any way that we are familiar with.
  3. As noted above, since there is no shared session key established between the specific implantable device and the Programmer, the only shared secret available to both parties is contained within the Programmer’s Java code. Thus any party who extracts the Java code from a PCS Programmer will be able to transmit valid commands to any SJM implantable device.
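On point 1, the reason factoring is instantaneous is simply that a composite 32-bit modulus always has a prime factor below 2^16, so naive trial division needs at most a few tens of thousands of divisions. A toy example (with a made-up modulus, not SJM key material):

    # Factoring a made-up 32-bit "RSA" modulus (65519 * 65521) by trial division.
    # At most ~2**16 candidate divisors are needed -- far less than a second.
    def factor(n: int):
        if n % 2 == 0:
            return 2, n // 2
        d = 3
        while d * d <= n:
            if n % d == 0:
                return d, n // d
            d += 2
        return None  # prime: not a valid RSA modulus at all

    toy_modulus = 4292870399     # hypothetical value, NOT taken from SJM's code
    print(factor(toy_modulus))   # -> (65519, 65521), effectively instantly

Once the factors of such a modulus are known, the corresponding private key follows immediately, which is what "rendered insecure" means in practice.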

Our best interpretation of this design is that the calculation is intended as a form of “security by obscurity”, based on the assumption that an attacker will not be able to reverse engineer the protocol. Unfortunately, this approach is rarely successful when used in security systems. In this case, the system is fundamentally fragile – due to the fact that code for computing the correct authentication tag is likely available in easily-decompiled Java bytecode on each St. Jude Medical Programmer device. If this code is ever extracted and published, all St. Jude Medical devices become vulnerable to command forgery.

How to remediate these attacks?

To reiterate, the fundamental security concerns with these St. Jude Medical devices (as of 2016) appeared to be problems of design. These were:

  1. SJM implantable devices did not require close physical interaction prior to accepting commands (allegedly) sent by the Programmer.
  2. SJM did not incorporate a strong cryptographic authentication mechanism in its RF protocol to verify that commands are truly sent by the Programmer.
  3. Even if the previous issue was addressed, St. Jude did not appear to have an infrastructure for securely exchanging shared cryptographic keys between a legitimate Programmer and an implantable device.

There are various ways to remediate these issues. One approach is to require St. Jude implantable devices to exchange a secret key with the Programmer through a close-range interaction involving the Programmer’s EM wand. A second approach would be to use a magnetic sensor to verify the presence of a magnet on the device, prior to accepting Programmer commands. Other solutions are also possible. I haven’t reviewed the solution SJM ultimately adopted in their software patches, and I don’t know how many users patched.


Implantable devices offer a number of unique security challenges. It’s naturally hard to get these things right. At the same time, it’s important that vendors take these issues seriously, and spend the time to get cryptographic authentication mechanisms right — because once deployed, these devices are very hard to repair, and the cost of a mistake is extremely high.

by Matthew Green at February 17, 2018 06:27 PM

That grumpy BSD guy

A Life Lesson in Mishandling SMTP Sender Verification

An attempt to report spam to a mail service provider's abuse address reveals how incompetence is sometimes indistinguishable from malice.

It all started with one of those rare spam mails that got through.

This one was hawking address lists, much like the ones I occasionally receive to addresses that I can not turn into spamtraps. The message was addressed to, of all things, the root@ address of my domain. (The message with full headers has been preserved here for reference).

Yes, that's right, they sent their spam to root@. And a quick peek at the headers revealed that like most of those attempts at hawking address lists for spamming that actually make it to a mailbox here, this one had been sent by an outlook.com customer.

The problem with spam delivered via outlook.com is that you can't usefully blacklist the sending server, since the largish chunk of the world that uses some sort of Microsoft hosted email solution (Office365 and its ilk) have their usually legitimate mail delivered via the very same infrastructure.

And since outlook.com is one of the mail providers that doesn't play well with greylisting (it spreads its retries across no less than 81 subnets; the output of 'echo "outlook.com" | doas smtpctl spf walk' is preserved here), it's fairly common practice to just whitelist all those networks and avoid the hassle of lost or delayed mail to and from Microsoft customers.

I was going to just ignore this message too, but we've seen an increasing number of spammy outfits taking advantage of outlook.com's seeming right of way to innocent third parties' mail boxes.

So I decided to try both to do my best at demoralizing this particular sender and to alert outlook.com to their problem. I wrote a message (preserved here) with a Cc: to their abuse address, where the meat is,

Ms Farell,

The root@ address has never been subscribed to any mailing list, for obvious reasons. Whoever sold you an address list with that address on it are criminals and you should at least demand your money back.

Whoever handles outlook.com's abuse mailbox will appreciate the attachment, which is a copy of the message as it arrived here with all headers intact.

Yours sincerely,
Peter N. M. Hansteen

What happened next is quite amazing.

If my analysis is correct, it may not be possible for senders who are not themselves outlook.com customers to actually reach the outlook.com abuse team.

Almost immediately after I sent the message to Ms Farell with a Cc: to the abuse address, two apparently identical messages from the outlook.com infrastructure, addressed to my domain's postmaster@ address, appeared (preserved here and here), with the main content of both stating

This is an email abuse report for an email message received from IP on Sat, 17 Feb 2018 01:59:21 -0800.
The message below did not meet the sending domain's authentication policy.
For more information about this format please see

In order to understand what happened here, it is necessary to look at the mail server log for a time interval of a few seconds (preserved here).

The first few lines describe the processing of my outgoing message:

2018-02-17 10:59:14 1emzGs-0009wb-94 <= H=( [] P=esmtps X=TLSv1.2:ECDHE-RSA-AES128-GCM-SHA256:128 CV=no S=34977 id=31b4ffcf-bf87-de33-b53a-0

My server receives the message from my laptop, and we can see that the connection was properly TLS encrypted

2018-02-17 10:59:15 1emzGs-0009wb-94 => peter <> R=localuser T=local_delivery

I had for some reason kept the original recipient among the To: addresses. Actually useless but also harmless.

2018-02-17 10:59:16 1emzGs-0009wb-94 [] SSL verify error: certificate name mismatch: DN="/C=US/ST=WA/L=Redmond/O=Microsoft Corporation/OU=Microsoft Corporation/" H=""
2018-02-17 10:59:18 1emzGs-0009wb-94 SMTP error from remote mail server after end of data: 451 4.4.0 Message failed to be made redundant due to A shadow copy was required but failed to be made with an AckStatus of Fail [] []
2018-02-17 10:59:19 1emzGs-0009wb-94 [] SSL verify error: certificate name mismatch: DN="/C=US/ST=WA/L=Redmond/O=Microsoft Corporation/OU=Microsoft Corporation/" H=""

What we see here is that even a huge corporation like Microsoft does not always handle certificates properly. The certificate they present for setting up the encrypted connection is not actually valid for the host name that the server presents.
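If you want to check this sort of thing yourself, a small sketch along the following lines will surface a certificate/host name mismatch when talking STARTTLS to a mail server. The host name is a placeholder; substitute the MX you are curious about.

    # Check whether an SMTP server's STARTTLS certificate matches its host name.
    # "mx.example.com" is a placeholder -- substitute the MX host you are testing.
    import smtplib
    import ssl

    host = "mx.example.com"
    ctx = ssl.create_default_context()   # verifies the CA chain and the host name

    try:
        with smtplib.SMTP(host, 25, timeout=30) as s:
            s.starttls(context=ctx)      # raises ssl.SSLError on a name mismatch
            print("certificate matches", host)
    except ssl.SSLError as e:
        print("certificate verification failed:", e)
    except (OSError, smtplib.SMTPException) as e:
        print("connection or SMTP problem:", e)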

There is also what I interpret as a file system related message which I assume is meaningful to someone well versed in Microsoft products, but we see that

2018-02-17 10:59:20 1emzGs-0009wb-94 => R=dnslookup T=remote_smtp [] X=TLSv1.2:ECDHE-RSA-AES256-SHA384:256 CV=yes K C="250 2.6.0 <> [InternalId=40926743365667, Hostname=BMXPR01MB0934.INDPRD01.PROD.OUTLOOK.COM] 44350 bytes in 0.868, 49.851 KB/sec Queued mail for delivery"

even though the certificate fails the verification part, the connection sets up with TLSv1.2 anyway, and the message is accepted with a "Queued mail for delivery" message.

The message is also delivered to the Cc: recipient:

2018-02-17 10:59:21 1emzGs-0009wb-94 => R=dnslookup T=remote_smtp [] X=TLSv1.2:ECDHE-RSA-AES256-SHA384:256 CV=no K C="250 2.6.0 <> [InternalId=3491808500196,] 42526 bytes in 0.125, 332.215 KB/sec Queued mail for delivery"
2018-02-17 10:59:21 1emzGs-0009wb-94 Completed

And the transactions involving my message would normally have been completed.

But ten seconds later this happens:

2018-02-17 10:59:31 1emzHG-0004w8-0l <= [] P=esmtps X=TLSv1.2:ECDHE-RSA-AES256-SHA384:256 CV=no K S=43968 id=BAY0-XMR-100m4KrfmH000a51d4@bay0-xmr-100.phx.gbl
2018-02-17 10:59:31 1emzHG-0004w8-0l => peter <> R=localuser T=local_delivery
2018-02-17 10:59:31 1emzHG-0004w8-0l => peter <> R=localuser T=local_delivery

That's the first message to my domain's postmaster@ address, followed two seconds later by

2018-02-17 10:59:33 1emzHI-0004w8-Fy <= [] P=esmtps X=TLSv1.2:ECDHE-RSA-AES256-SHA384:256 CV=no K S=43963 id=BAY0-XMR-100Q2wN0I8000a51d3@bay0-xmr-100.phx.gbl
2018-02-17 10:59:33 1emzHI-0004w8-Fy => peter <> R=localuser T=local_delivery
2018-02-17 10:59:33 1emzHI-0004w8-Fy Completed

a second, apparently identical message.

Both of those messages state that the message I sent to the abuse address had failed SPF verification, because the check happened on connections from outlook.com's own infrastructure by whatever handles incoming mail to the address that outlook.com forwards its abuse mail to.

Reading Microsoft Exchange's variant SMTP headers has never been my forte, and I won't try decoding the exact chain of events here since that would probably also require you to have fairly intimate knowledge of Microsoft's internal mail delivery infrastructure.

But even a quick glance at the messages reveals that the message passed SPF and other checks on incoming to the outlook.com infrastructure, but may have ended up not getting delivered after all since a second SPF test happened on a connection from a host that is not in the sender domain's SPF record.

In fact, that second test would only succeed for domains that include outlook.com's own networks in their SPF record, and those would presumably be outlook.com customers.

Any student or practitioner of SMTP mail delivery should know that SPF checks should only happen on ingress, that is, at the point where the mail traffic enters your infrastructure and the sender IP address is still the original one. Leave the check for later, when the message may have been forwarded, and you no longer have sufficient data to perform it correctly.
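A toy illustration of why the point of evaluation matters, with made-up addresses and an SPF "record" reduced to a simple list of networks:

    # Toy illustration: SPF only makes sense against the *original* client IP.
    # Addresses and networks are made up; real SPF evaluation is far more involved.
    from ipaddress import ip_address, ip_network

    sender_spf_networks = [ip_network("192.0.2.0/24")]   # the sending domain's SPF

    def spf_pass(client_ip: str) -> bool:
        ip = ip_address(client_ip)
        return any(ip in net for net in sender_spf_networks)

    original_client = "192.0.2.25"      # host that handed the message to the provider
    internal_forwarder = "203.0.113.7"  # provider host that forwards it internally

    print(spf_pass(original_client))    # True  -- check done on ingress
    print(spf_pass(internal_forwarder)) # False -- same check repeated after forwarding

Run the check against the forwarder's address, as appears to have happened here, and perfectly legitimate mail fails.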

Whenever I encounter incredibly stupid and functionally destructive configuration errors like this I tend to believe they're down to simple incompetence and not malice.

But this one has me wondering. If you essentially require incoming mail to include the contents of outlook.com's SPF record (currently no less than 81 subnets) as valid senders for the sending domain, you are essentially saying that only outlook.com customers are allowed to communicate.

If that restriction is a result of a deliberate choice rather than a simple configuration error, the problem moves out of the technical sphere and could conceivably become a legal matter, depending on what outlook.com have specified in the contracts they are selling to their customers.

But let us assume that this is indeed a matter of simple bad luck or incompetence and that the solution is indeed technical.

I would have liked to report this to whoever does technical things at that domain via email, but unfortunately there are indications that being their customer is a precondition for using that channel of communication to them.

I hope they fix that, and soon. And then move on to terminating their spamming customers' contracts.

The main lesson to be learned from this is that when you shop around for email service, please do yourself a favor and make an effort to ensure that your prospective providers actually understand how the modern-ish SMTP add-ons SPF, DKIM and DMARC work.

Otherwise you may end up receiving more of the mail you don't want than what you do want, and your own mail may end up not being delivered as intended.

Update 2018-02-19: Just as I was going to get ready for bed (it's late here in CET) another message from Ms Farell arrived, this time to an alias I set up in order to make it easier to filter PF tutorial related messages into a separate mailbox.

I wrote another response, and as the mail server log will show, despite the fact that a friend with an Office365 contract contacted them quoting this article, they have still not fixed the problem. Two more messages (preserved here and here) shot back here immediately.

Update 2018-02-20: A response from Microsoft, with pointers to potentially useful information.

A message from somebody identifying as working for Microsoft Online Safety arrived, apparently responding to my message dated 2018-02-19, where the main material was,


Based on the information you provided, it appears to have originated from an Office 365 or Exchange Online tenant account.

To report junk mail from Office 365 tenants, send an email to the junk@ reporting address and include the junk mail as an attachment.

This link provides further junk mail education

I have asked for clarification of some points, but no response has arrived by this getting close to bedtime in CET.

However I did take the advice and forwarded the offending messages as attachments to the junk@ address, with the abuse address in the Cc: on that message. My logs indicate that the certificate error had not gone away, but no SPF-generated bounces appeared either.

If Microsoft responds with further clarifications, I will publish a useful condensate here.

In other news, there will be a PF tutorial at the 2018 AsiaBSDCon in Tokyo. Follow the links for the most up to date information.

by Peter N. M. Hansteen ( at February 17, 2018 04:38 PM


Commodore KERNAL History

If you have ever written 6502 code for the Commodore 64, you may remember using “JSR $FFD2” to print a character on the screen. You may have read that the jump table at the end of the KERNAL ROM was designed to allow applications to run on all Commodore 8 bit computers from the PET to the C128 (and the C65!) – but that is a misconception. This article will show how

  • the first version of the jump table in the PET was designed to only hook up BASIC to the system’s features
  • it wasn’t until the VIC-20 that the jump table was generalized for application development (and the vector table introduced)
  • all later machines add their own calls, but later machines don’t necessarily support older calls.

KIM-1 (1976)

The KIM-1 was originally meant as a computer development board for the MOS 6502 CPU. Commodore acquired MOS in 1976 and kept selling the KIM-1. It contained a 2 KB ROM (“TIM”, “Terminal Interface Monitor”), which included functions to read characters from ($1E5A) and write characters to ($1EA0) a serial terminal, as well as code to load from and save to tape and support for the hex keyboard and display.

Commodore asked Microsoft to port their BASIC for 6502 to it, which interfaced with the monitor only through the two character in and out functions. The original source of BASIC shows how Microsoft adapted it to work with the KIM-1 by defining CZGETL and OUTCH to point to the monitor routines:

        CZGETL=^O17132                  ;1E5A
        OUTCH=^O17240                   ;1EA0

(The values are octal, since the assembler Microsoft used did not support hexadecimal.)

The makers of the KIM-1 never intended to change the ROM, so there was no need to have a jump table for these calls. Applications just hardcoded their offsets in ROM.

PET (1977)

The PET was Commodore’s first complete computer, with a keyboard, a display and a built-in tape drive. The system ROM (“KERNAL”) was now 4 KB and included a powerful file I/O system for tape, RS-232 and IEEE-488 (for printers and disk drives) as well as timekeeping logic. Another 2 KB ROM (“EDITOR”) handled screen output and character input. Microsoft BASIC was included in ROM and was marketed – with the name “COMMODORE BASIC” – as the actual operating system, making the KERNAL and the editor merely a device driver package.

Like with the KIM-1, Commodore asked Microsoft to port BASIC to the PET, and provided them with addresses of a jump table in the KERNAL ROM for interfacing with it. These are the symbol definitions in Microsoft’s source:

        CQOIN= ^O177706         ;OPEN CHANNEL FOR INPUT
        CQOOUT=^O177711         ;FILL FOR COMMO.
        CQINCH=^O177717         ;INCHR'S CALL TO GET A CHARACTER
        OUTCH= ^O177722
        CQSYS= ^O177736
        CZGETL=^O177744         ;CALL POINT FOR "GET"
        CQCALL=^O177747         ;CLOSE ALL CHANNELS

(The meaning of the CQ prefix is left as an exercise to the reader.)

In hex and with Commodore’s names, these are the KERNAL calls used by BASIC:

  • $FFC0: OPEN
  • $FFC3: CLOSE
  • $FFC6: CHKIN
  • $FFD2: BSOUT
  • $FFD5: LOAD
  • $FFD8: SAVE
  • $FFDE: SYS
  • $FFE1: STOP
  • $FFE4: GETIN
  • $FFE7: CLALL
  • $FFEA: UDTIM (advance clock; not used by BASIC)

At first sight, this jump table looks very similar to the one known from the C64, but it is indeed very different, and it is not generally compatible.

The following eight KERNAL routines are called from within the implementation of BASIC commands to deal with character I/O and the keyboard:

  • $FFC6: CHKIN – set channel for character input
  • $FFC9: CHKOUT – set channel for character output
  • $FFCC: CLRCHN – restore character I/O to screen/keyboard
  • $FFCF: BASIN – get character
  • $FFD2: BSOUT – write character
  • $FFE1: STOP – test for STOP key
  • $FFE4: GETIN – get character from keyboard
  • $FFE7: CLALL – close all channels

But the remaining six calls are not library calls at all, but full-fledged implementations of BASIC commands:

  • $FFC0: OPEN – open a channel
  • $FFC3: CLOSE – close a channel
  • $FFD5: LOAD – load a file into memory
  • $FFD8: SAVE – save a file from memory
  • $FFDB: VERIFY – compare a file with memory
  • $FFDE: SYS – run machine code

When compiled for the PET, Microsoft BASIC detects the extra commands “OPEN”, “CLOSE” etc., but does not provide an implementation for them. Instead, it calls out to these KERNAL functions when these commands are encountered. So these KERNAL calls have to parse the BASIC arguments, check for errors, and update BASIC’s internal data structures.

These 6 KERNAL calls are actually BASIC command extensions, and they are not useful for any other programs in machine code. After all, the whole jump table was not meant as an abstraction of the machine, but as an interface especially for Microsoft BASIC.

PET BASIC V4 (1980)

Version 4 of the ROM set, which came with significant improvements to BASIC and shipped by default with the 4000 and 8000 series, contained several additions to the KERNAL – all of which were additional BASIC commands.

  • $FF93: CONCAT
  • $FF96: DOPEN
  • $FF99: DCLOSE
  • $FFA8: COPY
  • $FFB1: DLOAD
  • $FFBD: DS$ (disk status)

Even though Commodore was doing all development on their fork of BASIC after version 2, command additions were still kept separate and developed as part of the KERNAL. In fact, for all Commodore 8-bit computers from the PET to the C65, BASIC and KERNAL were built separately, and the KERNAL jump table was their interface.

VIC-20 (1981)

The VIC-20 was Commodore’s first low-cost home computer. In order to keep the cost down, the complete ROM had to fit into 16 KB, which meant the BASIC V4 features and the machine language monitor had to be dropped and the editor was merged into the KERNAL. While reorganizing the ROM, the original BASIC command extensions (OPEN, CLOSE, …) were moved into the BASIC ROM (so the KERNAL calls for the BASIC command implementations were no longer needed).

The VIC-20 KERNAL is the first one to have a proper system call interface, which not only includes all the calls required to make BASIC hardware-independent, but also additional calls that are not used by BASIC and are intended for applications written in machine code. The VIC-20 Programmer’s Reference Manual documents these, making this the first time that machine code applications could be written for the Commodore 8 bit series in a forward-compatible way.

Old PET Calls

The following PET KERNAL calls are generally useful and therefore still supported on the VIC-20:

  • $FFC6: CHKIN
  • $FFD2: BSOUT
  • $FFE1: STOP
  • $FFE4: GETIN
  • $FFE7: CLALL

Channel I/O

The calls for the BASIC commands OPEN, CLOSE, LOAD and SAVE have been replaced by generic functions that can be called from machine code:

  • $FFC0: OPEN
  • $FFC3: CLOSE
  • $FFD5: LOAD
  • $FFD8: SAVE

(There is no separate call for VERIFY, since the LOAD function can also perform verification, depending on its inputs.)

OPEN, LOAD and SAVE take more arguments (LA, FA, SA, filename) than fit into the 6502 registers, so two more calls take these and store them temporarily.

  • $FFBA: SETLFS – set LA, FA and SA
  • $FFBD: SETNAM – set filename

Two more additions allow reading the status of the last operation and to set the verbosity of messages/errors:

  • $FFB7: READST – return status byte
  • $FF90: SETMSG – set verbosity

BASIC uses all these functions to implement the commands OPEN, CLOSE, LOAD, SAVE and VERIFY. It basically parses the arguments and then calls the KERNAL functions.


The KERNAL also exposes a complete low-level interface to the serial IEC (IEEE-488) bus used to connect printers and disk drives. None of these calls are used by BASIC though, which talks to these devices on a higher level (OPEN, CHKIN, BASIN etc.).

  • $FFB4: TALK – send TALK command
  • $FFB1: LISTEN – send LISTEN command
  • $FFAE: UNLSN – send UNLISTEN command
  • $FFAB: UNTLK – send UNTALK command
  • $FFA8: IECOUT – send byte to serial bus
  • $FFA5: IECIN – read byte from serial bus
  • $FFA2: SETTMO – set timeout
  • $FF96: TKSA – send TALK secondary address
  • $FF93: SECOND – send LISTEN secondary address


BASIC needs to know where usable RAM starts and where it ends, which is what the MEMTOP and MEMBOT functions are for. They also allow setting these values.

  • $FF9C: MEMBOT – read/write address of start of usable RAM
  • $FF99: MEMTOP – read/write address of end of usable RAM


BASIC supports the TI and TI$ variables to access the system clock. The RDTIM and SETTIM KERNAL calls allow reading and writing this clock.

  • $FFDE: RDTIM – read system clock
  • $FFDB: SETTIM – write system clock

These functions use the addresses that used to be the BASIC commands SYS and VERIFY on the PET.


Machine code applications may want to know the size of the text screen (SCREEN) and be able to read or set the cursor position (PLOT). The latter is used by BASIC to align text on tab positions.

  • $FFED: SCREEN – get the screen resolution
  • $FFF0: PLOT – read/write cursor position


On the PET, BASIC’s random number generator for the RND command was directly reading the timers in the VIA 6522 controller. Since the VIC-20, this is abstracted: The IOBASE function returns the start address of the VIA in memory, and BASIC reads from the indexes 4, 5, 8 and 9 to access the timer values.

  • $FFF3: IOBASE – return start of I/O area

The VIC-20 Programmer’s Reference Guide states: “This routine exists to provide compatibility between the VIC 20 and future models of the VIC. If the I/O locations for a machine language program are set by a call to this routine, they should still remain compatible with future versions of the VIC, the KERNAL and BASIC.”


The PET already allowed the user to override the following vectors in RAM to hook into some KERNAL functions:

  • $00E9: input from keyboard
  • $00EB: output to screen
  • $0090: IRQ handler
  • $0092: BRK handler
  • $0094: NMI handler

The VIC-20 ROM replaces these vectors with a more extensive table of addresses in RAM at $0300 to hook core BASIC and KERNAL functions. The KERNAL ones start at $0314. The first three can be used to hook IRQ, BRK and NMI:

  • $0314: CINV – IRQ handler
  • $0316: CBINV – BRK handler
  • $0318: NMINV – NMI handler

The others allow overriding the core set of KERNAL calls

  • $031A: IOPEN – indirect entry to OPEN ($FFC0)
  • $031C: ICLOSE – indirect entry to CLOSE ($FFC3)
  • $031E: ICHKIN – indirect entry to CHKIN ($FFC6)
  • $0320: ICKOUT – indirect entry to CHKOUT ($FFC9)
  • $0322: ICLRCH – indirect entry to CLRCHN ($FFCC)
  • $0324: IBASIN – indirect entry to CHRIN ($FFCF)
  • $0326: IBSOUT – indirect entry to CHROUT ($FFD2)
  • $0328: ISTOP – indirect entry to STOP ($FFE1)
  • $032A: IGETIN – indirect entry to GETIN ($FFE4)
  • $032C: ICLALL – indirect entry to CLALL ($FFE7)
  • $032E: USRCMD – “User-Defined Vector”
  • $0330: ILOAD – indirect entry to LOAD ($FFD5)
  • $0332: ISAVE – indirect entry to SAVE ($FFD8)

The “USRCMD” vector is interesting: It’s unused on the VIC-20 and C64. On all later machines, this vector is documented as “EXMON” and allows hooking the machine code monitor’s command entry. The vector was presumably meant for the monitor from the beginning, but this feature was cut from these two machines.

The KERNAL documentation warns against changing these vectors by hand. Instead, the VECTOR call allows the application to copy the complete set of KERNAL vectors ($0314-$0333) from and to private memory. The RESTOR command sets the default values.

  • $FF8D: VECTOR – read/write KERNAL vectors
  • $FF8A: RESTOR – set KERNAL vectors to defaults

Custom IRQ Handlers

If an application hooks the IRQ vector, it could just insert itself and call the code originally pointed to by the vector, or completely replace the IRQ code (and return by pulling the registers and executing RTI). In the latter case, it may still want the keyboard and the system clock to work. The PET already had the UDTIM ($FFEA) call to update the clock in the IRQ context. The VIC-20 adds SCNKEY to scan the keyboard and populate the keyboard buffer.

  • $FF9F: SCNKEY – keyboard driver

CBM-II (1982)

The CBM-II series of computers was meant as a successor of the PET 4000/8000 series. The KERNAL’s architecture was based on the VIC-20.

The vector table in RAM is compatible except for ILOAD, ISAVE and USRCMD (which is now used), whose order was changed:

  • $032E: ILOAD – indirect entry to LOAD ($FFD5)
  • $0330: ISAVE – indirect entry to SAVE ($FFD8)
  • $0332: USRCMD – machine code monitor command input

There are two new keyboard-related vectors:

  • $0334: ESCVEC – ESC key vector
  • $0336: CTLVEC – CONTROL key vector (unused)

And all IEEE-488 KERNAL calls except SETTMO can be hooked:

  • $0346: ITALK – indirect entry to TALK ($FFB4)
  • $0344: ILISTN – indirect entry to LISTEN ($FFB1)
  • $0342: IUNLSN – indirect entry to UNLSN ($FFAE)
  • $0340: IUNTLK – indirect entry to UNTLK ($FFAB)
  • $033E: ICIOUT – indirect entry to CIOUT ($FFA8)
  • $033C: IACPTR – indirect entry to ACPTR ($FFA5)
  • $033A: ITKSA – indirect entry to TKSA ($FF96)
  • $0338: ISECND – indirect entry to SECOND ($FF93)

For no apparent reason, the VECTOR and RESTOR calls have moved to different addresses:

  • $FF84: VECTOR – read/write KERNAL vectors
  • $FF87: RESTOR – set KERNAL vectors to defaults

And there are several new calls. All machines since the VIC-20 have a way to hand control to ROM cartridges instead of BASIC on system startup. At this point, no system initialization whatsoever has been done by the KERNAL, so the application or game on the cartridge can start up as quickly as possible. Applications that want to be forward-compatible can call into the following new KERNAL calls to initialize different parts of the system:

  • $FF7B: IOINIT – initialize I/O and enable timer IRQ
  • $FF7E: CINT – initialize text screen

The LKUPLA and LKUPSA calls are used by BASIC to find unused logical and secondary addresses for channel I/O, so its built-in disk commands can open channels even if the user has currently open channels – logical addresses have to be unique on the computer side, and secondary addresses have to be unique on the disk drive side.

  • $FF8D: LKUPLA – search tables for given LA
  • $FF8A: LKUPSA – search tables for given SA

It also added 6 generally useful calls:

  • $FF6C: TXJMP – jump across banks
  • $FF6F: VRESET – power-on/off vector reset
  • $FF72: IPCGO – loop for other processor
  • $FF75: FUNKEY – list/program function key
  • $FF78: IPRQST – send IPC request
  • $FF81: ALOCAT – allocate memory from MEMTOP down

C64 (1982)

Both the KERNAL and the BASIC ROM of the C64 are derived from the VIC-20, so both the KERNAL calls and the vectors are fully compatible with it, but some extensions from the CBM-II carried over: The IOINIT and CINT calls to initialize I/O and the text screen exist, but at different addresses, and a new RAMTAS call has been added, which is also useful for startup from a ROM cartridge.

  • $FF87: RAMTAS – test and initialize RAM
  • $FF84: IOINIT – initialize I/O and enable timer IRQ
  • $FF81: CINT – initialize text screen

The other CBM-II additions are missing, since they are not needed, e.g. because BASIC doesn’t have the V4 disk commands (LKUPLA, LKUPSA) and because there is only one RAM bank (TXJMP, ALOCAT).

Plus/4 (264 Series, 1985)

The next Commodore 8 bit computers in historical order are the 264 series: the C16, the C116 and the Plus/4, which share the same general architecture, BASIC and KERNAL. But they are neither meant as successors of the C64, nor of the CBM-II series – they are more like spiritual successors of the VIC-20. Nevertheless, the KERNAL jump table and vectors are based on the C64.

Since the 264 machines don’t have an NMI, the NMI vector is missing, and the remaining vectors have been moved in memory. This makes most of the vector table incompatible with their predecessors:

  • $0314: CINV – IRQ handler
  • $0316: CBINV – BRK handler
  • (NMI removed)
  • $0318: IOPEN
  • $031A: ICLOSE
  • $031C: ICHKIN
  • $031E: ICKOUT
  • $0320: ICLRCH
  • $0322: IBASIN
  • $0324: IBSOUT
  • $0326: ISTOP
  • $0328: IGETIN
  • $032A: ICLALL
  • $032C: USRCMD
  • $032E: ILOAD
  • $0330: ISAVE

The Plus/4 is the first machine from the home computer series to include the machine code monitor, so the USRCMD vector is now used for command input in the monitor.

And there is one new vector, ITIME, which is called once every frame during vertical blank.

  • $0312: ITIME – vertical blank IRQ

The Plus/4 supports all C64 KERNAL calls, plus some additions. The RESET call has been added to the very end of the table:

  • $FFF6: RESET – restart machine

There are nine more undocumented entries, which are located at lower addresses so that there is an (unused) gap between them and the remaining calls. Since the area $FF00 to $FF3F is occupied by the I/O area, these vectors are split between the areas just below and just above it. These two sets are known as the “banking routine table” and the “unofficial jump table”.

  • $FF49: DEFKEY – program function key
  • $FF4C: PRINT – print string
  • $FF4F: PRIMM – print string following the caller’s code
  • $FF52: MONITOR – enter machine code monitor

The DEFKEY call has the same functionality as FUNKEY ($FF75) call of the CBM-II series, but the two take different arguments.

C128 (1985)

The Commodore 128 is the successor of the C64. Next to a 100% compatible C64 mode that used the original ROMs, it has a native C128 mode, which is based on the C64 (not the CBM-II or the 264), so all KERNAL vectors and calls are compatible with the C64, but there are additions.

The KERNAL vectors are the same as on the C64, but again, the USRCMD vector (at the VIC-20/C64 location of $032E) is used for command input in the machine code monitor. There are additional vectors starting at $0334 for hooking editor logic as well as pointers to keyboard decode tables, but these are not part of the KERNAL vectors, since the VECTOR and RESTOR calls don’t include them.

The set of KERNAL calls has been extended by 19 entries. The LKUPLA and LKUPSA calls from the CBM-II exist (because BASIC has disk commands), but they are at different locations:

  • $FF59: LKUPLA
  • $FF5C: LKUPSA

There are also several calls known from the Plus/4, but at different addresses:

  • $FF65: PFKEY – program a function key
  • $FF7D: PRIMM – print string following the caller’s code
  • $FF56: PHOENIX – init function cartridges

And there are another 14 completely new ones:

  • $FF47: SPIN_SPOUT – setup fast serial ports for I/O
  • $FF4A: CLOSE_ALL – close all files on a device
  • $FF4D: C64MODE – reconfigure system as a C64
  • $FF50: DMA_CALL – send command to DMA device
  • $FF53: BOOT_CALL – boot load program from disk
  • $FF5F: SWAPPER – switch between 40 and 80 columns
  • $FF62: DLCHR – init 80-col character RAM
  • $FF68: SETBNK – set bank for I/O operations
  • $FF6B: GETCFG – lookup MMU data for given bank
  • $FF6E: JSRFAR – gosub in another bank
  • $FF71: JMPFAR – goto another bank
  • $FF74: INDFET – LDA (fetvec),Y from any bank
  • $FF77: INDSTA – STA (stavec),Y to any bank
  • $FF7A: INDCMP – CMP (cmpvec),Y to any bank

Interestingly, the C128 Programmer’s Reference Guide states that all calls since the C64 “are specifically for the C128 and as such should not be considered as permanent additions to the standard jump table.”

C65 (1991)

The C65 (also known as the C64DX) was a planned successor of the C64 line of computers. Several hundred prerelease devices were built, but it was never released as a product. Like the C128, it has a C64 mode, but it is not backwards-compatible with the C128. Nevertheless, the KERNAL of the native C65 mode is based on the C128 KERNAL.

Like the CBM-II, but at different addresses, all IEEE-488/IEC functions can be hooked with these 8 new vectors:

  • $0335: ITALK – indirect entry to TALK ($FFB4)
  • $0338: ILISTEN – indirect entry to LISTEN ($FFB1)
  • $033B: ITALKSA – indirect entry to TKSA ($FF96)
  • $033E: ISECND – indirect entry to SECOND ($FF93)
  • $0341: IACPTR – indirect entry to ACPTR ($FFA5)
  • $0344: ICIOUT – indirect entry to CIOUT ($FFA8)
  • $0347: IUNTLK – indirect entry to UNTLK ($FFAB)
  • $034A: IUNLSN – indirect entry to UNLSN ($FFAE)

The C128 additions of the jump table are basically supported, but three calls have been removed and one has been added. The removed ones are DMA_CALL (REU support), DLCHR (VDC support) and GETCFG (MMU support). All three are C128-specific and would make no sense on the C65. The one addition is:

  • $FF56: MONITOR_CALL – enter machine code monitor

The removals and the addition cause the addresses of the following calls to change:

  • $FF50: CLOSE_ALL
  • $FF53: C64MODE
  • $FF59: BOOT_CALL
  • $FF62: LKUPSA
  • $FF65: SWAPPER
  • $FF68: PFKEY

The C128-added KERNAL calls on the C65 can in no way be called compatible with the C128, since several of the calls take different arguments, e.g. the INDFET, INDSTA, INDCMP calls take the bank number in the 65CE02’s Z register. This shows again that the C65 is in no way a successor of the C128, but another successor of the C64.

Relationship Graph

The successorship of the Commodore 8 bit computers is messy. Most were merely spiritual successors and rarely truly compatible. The KERNAL source code and the features of the jump table mostly follow the successorship path, but some KERNAL features and jump table calls carried over between branches.

Which entries are safe?

If you want to write code that works on multiple Commodore 8 bit machines, this table will help:

[Table: “KERNAL Version” – per-machine availability of the KERNAL jump table entries; the table contents did not survive extraction.]
Code that must work on all Commodore 8 bit computers (without detecting the specific machine) is limited to the following KERNAL calls that are supported from the first PET up to the C65:

  • $FFCF: BASIN – get character
  • $FFD2: BSOUT – write character
  • $FFE1: STOP – test for STOP key
  • $FFE4: GETIN – get character from keyboard

The CHKIN, CHKOUT, CLRCHN and CLALL calls would be available as well, but they are not particularly useful, since their counterpart for opening a file is missing on the PET. The UDTIM call would be available too, but there is no standard way to hook the timer interrupt if you include the PET.

Nevertheless, the four basic calls are enough for any text mode application that doesn’t care where the line breaks are. Note that the PETSCII graphical character set and the basic PETSCII command codes e.g. for moving the cursor are supported across the whole family.

If you are limiting yourself to the VIC-20 and above (i.e. excluding the PET but including the CBM-II), you can use the basic set of 34 calls starting at $FF90.

You can only use these two vectors though – and only if you’re okay with changing them manually, without going through the VECTOR call, in order to support the CBM-II:

  • $0314: CINV – IRQ handler
  • $0316: CBINV – BRK handler

VECTOR and RESTOR are supported on the complete home computer series (i.e. if you exclude the PET and the CBM-II), and the complete set of 16 vectors can be used on all home computers except the Plus/4.

The initialization calls (CINT, IOINIT, RAMTAS) exist on all home computers since the C64. In addition, all these machines contain the version of the KERNAL at $FF80.


by Michael Steil at February 17, 2018 12:38 PM

February 12, 2018

Colin Percival

FreeBSD/EC2 history

A couple years ago Jeff Barr published a blog post with a timeline of EC2 instances. I thought at the time that I should write up a timeline of the FreeBSD/EC2 platform, but I didn't get around to it; but last week, as I prepared to ask for sponsorship for my work I decided that it was time to sit down and collect together the long history of how the platform has evolved and improved over the years.

February 12, 2018 07:50 PM

Sarah Allen

to be recognized

Some people have the privilege to be recognized in our society. We’ve started to resurrect history and tell stories of people whose contributions have been studiously omitted. In America, February is Black history month. I didn’t learn Black history in school. I value this time for remedial studies, even as I feel a bit disturbed that we need to aggregate people by race to notice their impact. I had hoped that we, as a society, would have come farther along by now, in treating each other with fairness and respect. Instead, we are encoding our bias about what is noticed and who is recognized.

At the M.I.T. Media Lab, researcher Joy Buolamwini has studied facial recognition software, finding error rates increased with darker skin (via NYT). Specifically, algorithms by Microsoft, IBM and Face++ more frequently failed to identify the gender of black women than white men.

When the person in the photo is a white man, the software is right 99 percent of the time.
But the darker the skin, the more errors arise — up to nearly 35 percent for images of darker skinned women

A lack of judgement in choosing a data set is cast as an error of omission, a small lapse in attention on the part of software developers, yet the persistence of these kinds of errors illustrates a systemic bias. The systems that we build (ones made of code and others made of people) lack checks and balances where we actively notice whether our peers and our software are exercising good judgement, which includes treating people fairly and with respect.

Errors made by humans are amplified by the software we create.

Google "knowledge card" for Bessie Blount Griffen, shows same photo for Marie Van Brittan Brown and Miriam Benjamin

The Google “knowledge card” that appears next to the search results for “Bessie Blount Griffen” shows “people also searched for” two other women who are identified with the same photo. It’s hard to tell when the error first appeared, but we can guess that it was amplified by Google search results and perhaps by image search.

I discovered this error first when reading web articles about two different inventors and noticed that the photos used in many of the articles were identical. This can be seen clearly in two examples below where the photo is composited with an image of the corresponding patent drawings. The patents were awarded to two unique humans, but somehow we, collectively, blur their individual identities, anonymizing them with a singular black female face.

I found a New York Times article about Marie Van Brittan Brown, and it seems that the oft-replicated photo is of Bessie Blount Griffen.

Additional confirmation comes from a tweet by @SamMaggs, author of “Wonder Women: 25 Innovators, Inventors, and Trailblazers Who Changed History”.

photo of black women and patent drawing of medical apparatus
source: Black Then

patent drawing with home security system, with text: Marie Van Brittan Brown invented First Home Security System in 1966
source: Circle City Alarm blog

Newspaper article with photo of black woman and man behind her, caption: Mr and Mrs Albert L Brown

by sarah at February 12, 2018 01:49 PM

toolsmith #131 - The HELK vs APTSimulator - Part 1

Ladies and gentlemen, for our main attraction, I give you...The HELK vs APTSimulator, in a Death Battle! The late, great Randy "Macho Man" Savage said many things in his day, in his own special way, but "Expect the unexpected in the kingdom of madness!" could be our toolsmith theme this month and next. Man, am I having a flashback to my college days, many moons ago. :-) The HELK just brought it on. Yes, I know, HELK is the Hunting ELK stack, got it, but it reminded me of the Hulk, and then, I thought of a Hulkamania showdown with APTSimulator, and Randy Savage's classic, raspy voice popped in my head with "Hulkamania is like a single grain of sand in the Sahara desert that is Macho Madness." And that, dear reader, is a glimpse into exactly three seconds or less in the mind of your scribe, a strange place to be certain. But alas, that's how we came up with this fabulous showcase.
In this corner, from Roberto Rodriguez, @Cyb3rWard0g, the specter in SpecterOps, it's...The...HELK! This, my friends, is the s**t, worth every ounce of hype we can muster.
And in the other corner, from Florian Roth, @cyb3rops, the Fracas of Frankfurt, we have APTSimulator. All your worst adversary apparitions in one APT mic drop. Battle!

Now with that out of our system, let's begin. There's a lot of goodness here, so I'm definitely going to do this in two parts so as not to undervalue these two offerings.
HELK is incredibly easy to install. It's also well documented, with lots of related reading material; let me propose that you take the time to review it all. Pay particular attention to the wiki, gain comfort with the architecture, then review installation steps.
On an Ubuntu 16.04 LTS system I ran:
  • git clone
  • cd HELK/
  • sudo ./ 
Of the three installation options I was presented with, pulling the latest HELK Docker Image from cyb3rward0g dockerhub, building the HELK image from a local Dockerfile, or installing the HELK from a local bash script, I chose the first and went with the latest Docker image. The installation script does a fantastic job of fulfilling dependencies for you; if you haven't installed Docker, the HELK install script does it for you. You can observe the entire install process in Figure 1.
Figure 1: HELK Installation
You can immediately confirm your clean installation by navigating to your HELK KIBANA URL, in my case
For my test Windows system I created a Windows 7 x86 virtual machine with Virtualbox. The key to success here is ensuring that you install Winlogbeat on the Windows systems from which you'd like to ship logs to HELK. More important is ensuring that you run Winlogbeat with the right winlogbeat.yml file. You'll want to modify and copy this to your target systems. The critical modification is line 123, under Kafka output, where you need to add the IP address for your HELK server in three spots. My modification appeared as hosts: ["","",""]. As noted in the HELK architecture diagram, HELK consumes Winlogbeat event logs via Kafka.
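For orientation, the Kafka output fragment of winlogbeat.yml ends up looking roughly like the sketch below. The IP, ports and topic name are placeholders and the exact layout may differ from the file HELK ships, so treat the copy bundled with HELK as authoritative.

    # Sketch of the Kafka output section of winlogbeat.yml (placeholders only).
    output.kafka:
      hosts: ["<HELK-IP>:<port>", "<HELK-IP>:<port>", "<HELK-IP>:<port>"]
      topic: "winlogbeat"   # assumed topic name; keep whatever the bundled file uses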
On your Windows systems, with a properly modified winlogbeat.yml, you'll run:
  • ./winlogbeat -c winlogbeat.yml -e
  • ./winlogbeat setup -e
You'll definitely want to set up Sysmon on your target hosts as well. I prefer to do so with the @SwiftOnSecurity configuration file. If you're doing so with your initial setup, use sysmon.exe -accepteula -i sysmonconfig-export.xml. If you're modifying an existing configuration, use sysmon.exe -c sysmonconfig-export.xml.  This will ensure rich data returns from Sysmon, when using adversary emulation services from APTsimulator, as we will, or experiencing the real deal.
With all set up and working you should see results in your Kibana dashboard as seen in Figure 2.

Figure 2: Initial HELK Kibana Sysmon dashboard.
Now for the showdown. :-) Florian's APTSimulator does some comprehensive emulation to make your systems appear compromised under the following scenarios:
  • POCs: Endpoint detection agents / compromise assessment tools
  • Test your security monitoring's detection capabilities
  • Test your SOCs response on a threat that isn't EICAR or a port scan
  • Prepare an environment for digital forensics classes 
This is a truly admirable effort, one I advocate for most heartily as a blue team leader. With particular attention to testing your security monitoring's detection capabilities, if you don't do so regularly and comprehensively, you are, quite simply, incomplete in your practice. If you haven't tested and validated, don't consider it detection, it's just a rule with a prayer. APTSimulator can be observed conducting the likes of:
  1. Creating typical attacker working directory C:\TMP...
  2. Activating guest user account
    1. Adding the guest user to the local administrators group
  3. Placing a svchost.exe (which is actually srvany.exe) into C:\Users\Public
  4. Modifying the hosts file
    1. Adding mapping to private IP address
  5. Using curl to access well-known C2 addresses
    1. C2:
  6. Dropping a Powershell netcat alternative into the APT dir
  7. Executes nbtscan on the local network
  8. Dropping a modified PsExec into the APT dir
  9. Registering mimikatz in At job
  10. Registering a malicious RUN key
  11. Registering mimikatz in scheduled task
  12. Registering cmd.exe as debugger for sethc.exe
  13. Dropping web shell in new WWW directory
A couple of notes here.
Download and install APTSimulator from the Releases section of its GitHub pages.
APTSimulator includes curl.exe, 7z.exe, and 7z.dll in its helpers directory. Be sure that you drop the correct version of 7 Zip for your system architecture. I'm assuming the default bits are 64bit; I was testing on a 32bit VM.

Let's do a fast run-through with HELK's Kibana Discover option looking for the above mentioned APTSimulator activities. Starting with a search for TMP in the sysmon-* index yields immediate results and strikes #1, 6, 7, and 8 from our APTSimulator list above, see for yourself in Figure 3.

Figure 3: TMP, PS nc, nbtscan, and PsExec in one shot
Created TMP, dropped a PowerShell netcat, nbtscanned the local network, and dropped a modified PsExec, check, check, check, and check.
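If you prefer to hunt programmatically rather than through Kibana Discover, the same free-text search can be thrown at Elasticsearch directly. A rough Python sketch follows; the address, port and index pattern are assumptions about your deployment, and HELK may not expose Elasticsearch outside its Docker network at all.

    # Rough sketch: run the same "TMP" free-text search against the sysmon-* index
    # over the Elasticsearch HTTP API. Address/port are placeholders.
    import requests

    ES = "http://192.0.2.10:9200"    # placeholder HELK/Elasticsearch address
    query = {
        "size": 10,
        "query": {"query_string": {"query": "TMP"}},   # same search as in Discover
    }

    r = requests.post(f"{ES}/sysmon-*/_search", json=query, timeout=10)
    r.raise_for_status()
    for hit in r.json()["hits"]["hits"]:
        print(hit["_index"], hit["_id"], hit["_source"].get("@timestamp"))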
How about enabling the guest user account and adding it to the local administrator's group? Figure 4 confirms.

Figure 4: Guest enabled and escalated
Strike #2 from the list. Something tells me we'll immediately find svchost.exe in C:\Users\Public. Aye, Figure 5 makes it so.

Figure 5: I've got your svchost right here
Knock #3 off the to-do list, including the process.commandline and file.creationtime references. Up next, the At job and scheduled task creation. Indeed, see Figure 6.

Figure 6. tasks OR schtasks
I think you get the point, there weren't any misses here. There are, of course, visualization options. Don't forget about Kibana's Timelion feature. Forensicators and incident responders live and die by timelines, use it to your advantage (Figure 7).

Figure 7: Timelion
Finally, for this month, under HELK's Kibana Visualize menu, you'll note 34 visualizations. By default, these are pretty basic, but you can quickly add value with sub-buckets. As an example, I selected the Sysmon_UserName visualization. Initially, it yielded a donut graph inclusive of malman (my pwned user), SYSTEM and LOCAL SERVICE. Not good enough to be particularly useful, so I added a sub-bucket to include process names associated with each user. The resulting graph is more detailed and tells us that of the 242 events in the last four hours associated with the malman user, 32 of those were specific to cmd.exe processes, or 18.6% (Figure 8).

Figure 8: Powerful visualization capabilities
This has been such a pleasure this month, I am thrilled with both HELK and APTSimulator. The true principles of blue team and detection quality are innate in these projects. The fact that Roberto considers HELK still in alpha state leads me to believe there is so much more to come. Be sure to dig deeply into APTSimulator's Advance Solutions as well, there's more than one way to emulate an adversary.
Next month Part 2 will explore the Network side of the equation via the Network Dashboard and related visualizations, as well as HELK integration with Spark, Graphframes & Jupyter notebooks.
Aw snap, more goodness to come, I can't wait.
Cheers...until next time.

by Russ McRee ( at February 12, 2018 06:56 AM

February 11, 2018

The future of configuration management (again), and a suggestion

I have attended the Config Management Camp in Gent this year, where I also presented the talk “Promise theory: from configuration management to team leadership“. A thrilling experience, considering that I was talking about promise theory at the same conference and in the same track where Mark Burgess, the inventor of promise theory, was holding one of the keynotes!

The quality of the conference was as good as always, but my experience at the conference was completely different from the past. Last time I attended, in 2016, I was actively using CFEngine and that shaped both the talks I attended and the people I hung out with the most. This year I was coming from a different work environment and a different job: I jumped a lot through the different tracks and devrooms, and talked with many people with a very different experience than mine. And that was truly enriching. I’ll focus on one experience in particular, that led me to see what the future of configuration management could be.


I attended all the keynotes. Mark Burgess’ was, as always, rich in content and a bit hard to process; lots of food for thought, but I couldn’t let it percolate in my brain until someone made it click several hours later. More on that in a minute.

Then there was Luke Kanies’ keynote, explaining where configuration management and we, CM practitioners, won the battle; and also where we lost the battle and where we are irrelevant. Again, more stuff accumulated, waiting for something to trigger the mental process to consume the information. There was also the keynote by Adam Jacob about the future of Configuration Management, great and fun as always but not part of this movie 🙂 I recommend that you enjoy it on youtube.

Later, at the social event, I had the pleasure to have a conversation with Stein Inge Morisbak, whom I knew from before as we met in Oslo several times. With his experience working on public cloud infrastructures like AWS and Google Cloud Platform, Stein Inge was one of the people who attended the conference with a sceptical eye about configuration management and, at the same time, with the open mind that you would expect from the great guy he is. In a sincere effort to understand, he couldn’t really see how CM, “a sinking ship”, could possibly be relevant in an era where public cloud, immutable infrastructure and all the tooling around are the modern technology of today.

While we were talking, another great guy chimed in, namely Ivan Rossi. If you look at Ivan’s LinkedIn page you’ll see that he’s been working in technology for a good while and has seen things from many different angles. Ivan made a few practical examples where CM is the only tooling that you can use because the cloud simply isn’t there and the tooling that you use for immutable infrastructure doesn’t work: think of networks of devices sitting in the middle of nowhere. In situations like those, with limited hardware resources and/or shitty wireless links like 2G networks, you need something that is lightweight, resilient, fault tolerant, and that can maintain the configuration, because there is no way you’re going around every other day to replace the devices with new ones with updated configurations and software.

And there, Stein Inge was the first one to make the link with Mark Burgess’ keynote and to make me part of his revelation (or his “pilgrim’s experience”, as he calls it). Mark talked about a new sprawl of hardware devices going on: they are all around us, in phones and tablets, and more and more in our domestic appliances, in smart cars, in all the “smart” devices that people are buying every day. A heap of devices that is poorly managed as of today, if at all, and where CM has definitely a place. Stein Inge talked about this experience in his blog; his post is in Norwegian so you must either know the language or ask some translation software for help, I promise it’s worth the read.

What’s the future then?

So, what’s the future of configuration management, based on Mark Burgess’ vision and these observations? A few ideas:

  • on the server side, it will be less and less relevant to the everyday user as more people will shift to private and public clouds. It will still be relevant for those who maintain hardware infrastructures; the big players will maybe decide to bake their own tools to better suit their hardware and workflows — they have the workforce and the skills in house, so why not? The smaller players will keep using “off-the-shelf” tools in the same lines of those we have today for provisioning hardware and keep their configurations in shape;
  • configuration management will become more relevant as a tool to manage fleets of hardware like company workstations and laptops, for example, to enforce policies and ensure that security measures are in place at all times; that will eventually include company-owned phones;
  • configuration management will be more and more relevant in IoT and “smart” devices in general; for those, a new generation of tools may be needed that can run on limited hardware and unreliable networks; agent-based tools will probably have the upper hand here;
  • we’ll have less and less config management on virtual machines (and possibly less and less virtual machines and more and more containers); CM on virtual machines will remain only in special cases, e.g. where you need to run a software that doesn’t lend itself to automatic installation and configuration (Atlassian, I am looking at you).

As always with future forecast, time will tell.

One word about Configuration Management Camp

I am a fan of Config Management Camp since I attended (and presented at) the first edition. I am glad to see that the scope of the conference is widening to include containers and immutable infrastructure. However, as Stein Inge says in his blog post (the translation is mine, as all mistakes thereof):

Most of the talks revolved around configuration management of servers, which is of little importance in a world where we use services on public cloud platforms on a much higher abstraction level.

Maybe, and I stress maybe, an effort should be made to reduce the focus from configuration management a bit in favour of the “rival” technologies of nowadays; not to the point that CM disappears because, as I just said, CM will still play an important part, and CfgMgmtCamp is not DevOpsDays anyway. Possibly a different name that underlines Infrastructure as Code as the real topic could help in this rebalance?

by bronto at February 11, 2018 09:03 PM

February 08, 2018

Sean's IT Blog

Moving to the Cloud? Don’t Forget End-User Experience

The cloud has a lot to offer IT departments.  It provides the benefits of virtualization in a consumption-based model, and it allows new applications to quickly be deployed while waiting for, or even completely forgoing, on-premises infrastructure.  This can provide a better time-to-value and greater flexibility for the business.  It can help organizations reduce, or eliminate, their on-premises data center footprint.

But while the cloud has a lot of potential to disrupt how IT manages applications in the data center, it also has the potential to disrupt how IT delivers services to end users.

In order to understand how cloud will disrupt end-user computing, we first need to look at how organizations are adopting the cloud.  We also need to look at how the cloud can change application development patterns, and how that will change how IT delivers services to end users.

The Current State of Cloud

When people talk about cloud, they’re usually talking about three different types of services.  These services, and their definitions, are:

  • Infrastructure-as-a-Service: Running virtual machines in a hosted, multi-tenant virtual data center.
  • Platform-as-a-Service: Allows developers to subscribe to a platform for building applications without having to build the supporting infrastructure.  The platform can include some combination of web services, application run time services (like .Net or Java), databases, message bus services, and other managed components.
  • Software-as-a-Service: Subscription to a vendor hosted and managed application.

The best analogy to explain this is comparing the different cloud offerings with different types of pizza restaurants, as in the widely shared “Pizza as a Service” graphic (the image itself is not reproduced here).

So what does this have to do with End-User Computing?

Today, it seems like enterprises that are adopting cloud are going in one of two directions.  The first is migrating their data centers into infrastructure-as-a-service offerings with some platform-as-a-service mixed in.  The other direction is replacing applications with software-as-a-service options.  The former means migrating your applications to Azure or AWS EC2; the latter means replacing on-premises services with options like ServiceNow or Microsoft Office 365.

Both options can present challenges to how enterprises deliver applications to end-users.  And the choices made when migrating on-premises applications to the cloud can greatly impact end-user experience.

The challenges around software-as-a-service deal more with identity management, so this post will focus on migrating on-premises applications to the cloud.

Know Thy Applications – Infrastructure-As-A-Service and EUC Challenges

Infrastructure-as-a-Service offerings provide IT organizations with virtual machines running in a cloud service.  These offerings provide different virtual machines optimized for different tasks, and they provide the flexibility to meet the various needs of an enterprise IT organization.  They allow IT organizations to bring their on-premises business applications into the cloud.

The lifeblood of many businesses is Win32 applications.  Whether they are commercial or developed in house, these applications are often critical to some portion of a business process.  Many of these applications were never designed with high availability or the cloud in mind, and the developer and/or the source code may be long gone.  Or they might not be easily replaced because they are deeply integrated into critical processes or other enterprise systems.

Many Win32 applications have clients that expect to connect to local servers.  But when you move those servers to a remote datacenter, including the cloud, it can introduce problems that make the application nearly unusable.  Common problems that users encounter are longer application load times, increased transaction times, and reports taking longer to preview and/or print.

These problems make employees less productive, and that has an impact on the efficiency and profitability of the business.

A few jobs ago, I was working for a company that had its headquarters, local office, and data center co-located in the same building.  They also had a number of other regional offices scattered across our state and the country.  The company had grown to the point where they were running out of space, and they decided to split the corporate and local offices.  The corporate team moved to a new building a few miles away, but the data center remained in the building.

Many of the corporate employees were users of a two-tier business application, and the application client connected directly to the database server.  Moving users of a fat client application a few miles down the road from the database server had a significant impact on application performance and user experience.  Application response suffered, and user complaints rose.  Critical business processes took longer, and productivity suffered as a result.

More bandwidth was procured. That didn’t solve the issue, and IT was sent scrambling for a new solution.  Eventually, these issues were addressed with a solution that was already in use for other areas of the business – placing the core applications into Windows Terminal Services and providing users at the corporate office with a published desktop that contained their required applications.

This solution solved their user experience and application performance problems.  But it required other adjustments to the server environment, business process workflows, and how users interact with the technology that enables them to work.  It took time for users to adjust to the changes.  Many of the issues were addressed when the business moved everything to a colocation facility a hundred miles away a few months later.

Ensuring Success When Migrating Applications to the Cloud

The business has said it’s time to move some applications to the cloud.  How do you ensure it’s a success and meets the business and technical requirements of that application while making sure an angry mob of users don’t show up at your office with torches and pitchforks?

The first thing is to understand your application portfolio.  That understanding goes beyond having visibility into what applications you have in your environment and how those applications work from a technical perspective.  You need a holistic view of your applications, and you should keep the following questions in mind:

  • Who uses the application?
  • What do the users do in the application?
  • How do the users access the application?
  • Where does it fit into business processes and workflows?
  • What other business systems does the application integrate with?
  • How is that integration handled?

Applications rarely exist in a vacuum, and making changes to one not only impacts the users, but it can impact other applications and business processes as well.

By understanding your applications, you will be able to build a roadmap of when applications should migrate to the cloud and effectively mitigate any impacts to both user experience and enterprise integrations.

The second thing is to test it extensively.  The testing needs to be more extensive than functional testing to ensure that the application will run on the server images built by the cloud providers, and it needs to include extensive user experience and user acceptance testing.  This may include spending time with users measuring tasks with a stop-watch to compare how long tasks take in cloud-hosted systems versus on-premises systems.

If application performance isn’t up to user standards and has a significant impact on productivity, you may need to start investigating solutions for bringing users closer to the cloud-hosted applications.  This includes solutions like Citrix, VMware Horizon Cloud, or Amazon WorkSpaces and AppStream. These solutions bring users closer to the applications, and they can give users an on-premises-like experience in the cloud.

The third thing is to plan ahead.  Having a roadmap and knowing your application portfolio enables you to plan for when you need capacity or specific features to support users, and it can guide your architecture and product selection.  You don’t want to get three years into a five year migration and find out that the solution you selected doesn’t have the features you require for a use case or that the environment wasn’t architected to support the number of users.

When planning to migrate applications from your on-premises datacenters to an infrastructure-as-a-service offering, it’s important to know your applications and take end-user experience into account.   It’s important to test, and understand, how these applications perform when the application servers and databases are remote to the application client.  If you don’t, you not only anger your users, but you also make them less productive and the business less profitable overall.


by seanpmassey at February 08, 2018 03:22 PM


Using TLS1.3 With OpenSSL

Note: This is an updated version of an earlier blog post available here.

The forthcoming OpenSSL 1.1.1 release will include support for TLSv1.3. The new release will be binary and API compatible with OpenSSL 1.1.0. In theory, if your application supports OpenSSL 1.1.0, then all you need to do to upgrade is to drop in the new version of OpenSSL when it becomes available and you will automatically start being able to use TLSv1.3. However there are some issues that application developers and deployers need to be aware of. In this blog post I am going to cover some of those things.

Differences with TLS1.2 and below

TLSv1.3 is a major rewrite of the specification. There was some debate as to whether it should really be called TLSv2.0 - but TLSv1.3 it is. There are major changes and some things work very differently. A brief, incomplete, summary of some things that you are likely to notice follows:

  • There are new ciphersuites that only work in TLSv1.3. The old ciphersuites cannot be used for TLSv1.3 connections.
  • The new ciphersuites are defined differently and do not specify the certificate type (e.g. RSA, DSA, ECDSA) or the key exchange mechanism (e.g. DHE or ECDHE). This has implications for ciphersuite configuration.
  • Clients provide a “key_share” in the ClientHello. This has consequences for “group” configuration.
  • Sessions are not established until after the main handshake has been completed. There may be a gap between the end of the handshake and the establishment of a session (or, in theory, a session may not be established at all). This could have impacts on session resumption code.
  • Renegotiation is not possible in a TLSv1.3 connection
  • More of the handshake is now encrypted.
  • More types of messages can now have extensions (this has an impact on the custom extension APIs and Certificate Transparency)
  • DSA certificates are no longer allowed in TLSv1.3 connections

Note that at this stage only TLSv1.3 is supported. DTLSv1.3 is still in the early days of specification and there is no OpenSSL support for it at this time.

Current status of the TLSv1.3 standard

As of the time of writing TLSv1.3 is still in draft. Periodically a new version of the draft standard is published by the TLS Working Group. Implementations of the draft are required to identify the specific draft version that they are using. This means that implementations based on different draft versions do not interoperate with each other.

OpenSSL 1.1.1 will not be released until (at least) TLSv1.3 is finalised. In the meantime the OpenSSL git master branch contains our development TLSv1.3 code which can be used for testing purposes (i.e. it is not for production use). You can check which draft TLSv1.3 version is implemented in any particular OpenSSL checkout by examining the value of the TLS1_3_VERSION_DRAFT_TXT macro in the tls1.h header file. This macro will be removed when the final version of the standard is released.

TLSv1.3 is enabled by default in the latest development versions (there is no need to explicitly enable it). To disable it at compile time you must use the “no-tls1_3” option to “config” or “Configure”.

Currently OpenSSL has implemented the “draft-23” version of TLSv1.3. Other applications that support TLSv1.3 may still be using older draft versions. This is a common source of interoperability problems. If two peers supporting different TLSv1.3 draft versions attempt to communicate then they will fall back to TLSv1.2.
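An application can check which protocol version was actually negotiated (for example, to detect a silent fall back to TLSv1.2) once the handshake has completed. A minimal C sketch, assuming ssl is a connected SSL object:

#include <stdio.h>
#include <openssl/ssl.h>

/* Print the protocol version negotiated on an established connection;
 * SSL_get_version() returns a string such as "TLSv1.2" or "TLSv1.3". */
void report_version(const SSL *ssl)
{
    printf("Negotiated protocol: %s\n", SSL_get_version(ssl));
}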


Ciphersuites

OpenSSL has implemented support for five TLSv1.3 ciphersuites as follows:

  • TLS13-AES-256-GCM-SHA384
  • TLS13-CHACHA20-POLY1305-SHA256
  • TLS13-AES-128-GCM-SHA256
  • TLS13-AES-128-CCM-8-SHA256
  • TLS13-AES-128-CCM-SHA256

Of these the first three are in the DEFAULT ciphersuite group. This means that if you have no explicit ciphersuite configuration then you will automatically use those three and will be able to negotiate TLSv1.3.

All the TLSv1.3 ciphersuites also appear in the HIGH ciphersuite alias. The CHACHA20, AES, AES128, AES256, AESGCM, AESCCM and AESCCM8 ciphersuite aliases include a subset of these ciphersuites as you would expect based on their names. Key exchange and authentication properties were part of the ciphersuite definition in TLSv1.2 and below. This is no longer the case in TLSv1.3 so ciphersuite aliases such as ECDHE, ECDSA, RSA and other similar aliases do not contain any TLSv1.3 ciphersuites.

If you explicitly configure your ciphersuites then care should be taken to ensure that you are not inadvertently excluding all TLSv1.3 compatible ciphersuites. If a client has TLSv1.3 enabled but no TLSv1.3 ciphersuites configured then it will immediately fail (even if the server does not support TLSv1.3) with an error message like this:

140399519134144:error:141A90B5:SSL routines:ssl_cipher_list_to_bytes:no ciphers available:ssl/statem/statem_clnt.c:3715:No ciphers enabled for max supported SSL/TLS version

Similarly if a server has TLSv1.3 enabled but no TLSv1.3 ciphersuites it will also immediately fail, even if the client does not support TLSv1.3, with an error message like this:

140640328024512:error:141FC0B5:SSL routines:tls_setup_handshake:no ciphers available:ssl/statem/statem_lib.c:120:No ciphers enabled for max supported SSL/TLS version

For example, setting a ciphersuite selection string of ECDHE:!COMPLEMENTOFDEFAULT will work in OpenSSL 1.1.0 and will only select those ciphersuites that are in DEFAULT and also use ECDHE for key exchange. However no TLSv1.3 ciphersuites are in the ECDHE group so this ciphersuite configuration will fail in OpenSSL 1.1.1 if TLSv1.3 is enabled.

You may want to explicitly list the TLSv1.3 ciphersuites you want to use to avoid problems. For example:
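The exact string depends on your own policy; one illustrative value, using the development TLS13- names listed above, selects ECDHE ciphersuites for TLSv1.2 and below while explicitly naming three TLSv1.3 suites:

"TLS13-CHACHA20-POLY1305-SHA256:TLS13-AES-128-GCM-SHA256:TLS13-AES-256-GCM-SHA384:ECDHE"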


You can test which ciphersuites are included in a given ciphersuite selection string using the openssl ciphers -s -v command:

$ openssl ciphers -s -v "ECDHE:!COMPLEMENTOFDEFAULT"

Ensure that at least one ciphersuite supports TLSv1.3
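Programmatically, a cipher string like the one above can be applied with the long-standing SSL_CTX_set_cipher_list() call. A minimal sketch, assuming ctx is an already-created SSL_CTX and reusing the illustrative string from earlier:

#include <openssl/ssl.h>

int configure_ciphers(SSL_CTX *ctx)
{
    /* Illustrative selection only: three TLSv1.3 suites plus ECDHE for
     * TLSv1.2 and below. Returns 1 on success, 0 if nothing matched. */
    return SSL_CTX_set_cipher_list(ctx,
        "TLS13-CHACHA20-POLY1305-SHA256:"
        "TLS13-AES-128-GCM-SHA256:"
        "TLS13-AES-256-GCM-SHA384:"
        "ECDHE");
}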


Groups

In TLSv1.3 the client selects a “group” that it will use for key exchange. At the time of writing, OpenSSL only supports ECDHE groups for this. The client then sends “key_share” information to the server for its selected group in the ClientHello.

The list of supported groups is configurable. It is possible for a client to select a group that the server does not support. In this case the server requests that the client sends a new key_share that it does support. While this means a connection will still be established (assuming a mutually supported group exists), it does introduce an extra server round trip - so this has implications for performance. In the ideal scenario the client will select a group that the server supports in the first instance.

In practice most clients will use X25519 or P-256 for their initial key_share. For maximum performance it is recommended that servers are configured to support at least those two groups and that clients use one of those two for their initial key_share. This is the default case (OpenSSL clients will use X25519).

The group configuration also controls the allowed groups in TLSv1.2 and below. If applications have previously configured their groups in OpenSSL 1.1.0 then you should review that configuration to ensure that it still makes sense for TLSv1.3. The first named (i.e. most preferred) group will be the one used by an OpenSSL client in its initial key_share.

Applications can configure the group list by using SSL_CTX_set1_groups() or a similar function (see here for further details). Alternatively, if applications use SSL_CONF style configuration files then this can be configured using the Groups or Curves command (see here).
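As a minimal C sketch (assuming ctx is your SSL_CTX and that this preference order suits your deployment), the group list could be set like this:

#include <openssl/ssl.h>

int configure_groups(SSL_CTX *ctx)
{
    /* The first entry is the most preferred group, and is the one an
     * OpenSSL client will use for its initial key_share.
     * Returns 1 on success. */
    return SSL_CTX_set1_groups_list(ctx, "X25519:P-256:P-384");
}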


Sessions

In TLSv1.2 and below a session is established as part of the handshake. This session can then be used in a subsequent connection to achieve an abbreviated handshake. Applications might typically obtain a handle on the session after a handshake has completed using the SSL_get1_session() function (or similar). See here for further details.

In TLSv1.3 sessions are not established until after the main handshake has completed. The server sends a separate post-handshake message to the client containing the session details. Typically this will happen soon after the handshake has completed, but it could be sometime later (or not at all).

The specification recommends that applications only use a session once (although this is not enforced). For this reason some servers send multiple session messages to a client. To enforce the “use once” recommendation applications could use SSL_CTX_remove_session() to mark a session as non-resumable (and remove it from the cache) once it has been used.

The old SSL_get1_session() and similar APIs may not operate as expected for client applications written for TLSv1.2 and below. Specifically if a client application calls SSL_get1_session() before the server message containing session details has been received then an SSL_SESSION object will still be returned, but any attempt to resume with it will not succeed and a full handshake will occur instead. In the case where multiple sessions have been sent by the server then only the last session will be returned by SSL_get1_session().

Client application developers should consider using the SSL_CTX_sess_set_new_cb() API instead (see here). This provides a callback mechanism which gets invoked every time a new session is established. This can get invoked multiple times for a single connection if a server sends multiple session messages.

Note that SSL_CTX_sess_set_new_cb() was also available in OpenSSL 1.1.0. Applications that already used that API will still work, but they may find that the callback is invoked at unexpected times, i.e. post-handshake.
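A minimal client-side sketch of the callback approach follows; store_session() is a hypothetical application function, and the cache-mode call is what makes the callback fire for client connections:

#include <openssl/ssl.h>

/* Hypothetical application-level session store */
extern void store_session(SSL_SESSION *sess);

static int new_session_cb(SSL *ssl, SSL_SESSION *sess)
{
    (void)ssl; /* unused in this sketch */
    /* In TLSv1.3 this can run more than once per connection, once for
     * each session message the server chooses to send. */
    store_session(sess);
    return 1; /* keep the reference; release it later with SSL_SESSION_free() */
}

void setup_session_handling(SSL_CTX *ctx)
{
    /* Cache client sessions via the callback rather than the internal cache */
    SSL_CTX_set_session_cache_mode(ctx,
        SSL_SESS_CACHE_CLIENT | SSL_SESS_CACHE_NO_INTERNAL_STORE);
    SSL_CTX_sess_set_new_cb(ctx, new_session_cb);
}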

An OpenSSL server will immediately attempt to send session details to a client after the main handshake has completed. To server applications this post-handshake stage will appear to be part of the main handshake, so calls to SSL_get1_session() should continue to work as before.

Custom Extensions and Certificate Transparency

In TLSv1.2 and below the initial ClientHello and ServerHello messages can contain “extensions”. This allows the base specifications to be extended with additional features and capabilities that may not be applicable in all scenarios or could not be foreseen at the time that the base specifications were written. OpenSSL provides support for a number of “built-in” extensions.

Additionally the custom extensions API provides some basic capabilities for application developers to add support for new extensions that are not built-in to OpenSSL.

Built on top of the custom extensions API is the “serverinfo” API. This provides an even more basic interface that can be configured at run time. One use case for this is Certificate Transparency. OpenSSL provides built-in support for the client side of Certificate Transparency but there is no built-in server side support. However this can easily be achieved using “serverinfo” files. A serverinfo file containing the Certificate Transparency information can be configured within OpenSSL and it will then be sent back to the client as appropriate.

In TLSv1.3 the use of extensions is expanded significantly and there are many more messages that can include them. Additionally some extensions that were applicable to TLSv1.2 and below are no longer applicable in TLSv1.3 and some extensions are moved from the ServerHello message to the EncryptedExtensions message. The old custom extensions API does not have the ability to specify which messages the extensions should be associated with. For that reason a new custom extensions API was required.

The old API will still work, but the custom extensions will only be added where TLSv1.2 or below is negotiated. To add custom extensions that work for all TLS versions application developers will need to update their applications to the new API (see here for details).

The “serverinfo” data format has also been updated to include additional information about which messages the extensions are relevant to. Applications using “serverinfo” files may need to update to the “version 2” file format to be able to operate in TLSv1.3 (see here and here for details).


Renegotiation

TLSv1.3 does not have renegotiation so calls to SSL_renegotiate() or SSL_renegotiate_abbreviated() will immediately fail if invoked on a connection that has negotiated TLSv1.3.

A common use case for renegotiation is to update the connection keys. The function SSL_key_update() can be used for this purpose in TLSv1.3 (see here for further details).

Another use case is to request a certificate from the client. This can be achieved by using the SSL_verify_client_post_handshake() function in TLSv1.3 (see here for further details).
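As a rough C sketch of both replacements (ssl is assumed to be an established TLSv1.3 connection; error handling is trimmed):

#include <openssl/ssl.h>

/* Refresh the traffic keys and ask the peer to update its keys as well.
 * Use SSL_KEY_UPDATE_NOT_REQUESTED to update only our own sending keys.
 * Returns 1 on success. */
int refresh_keys(SSL *ssl)
{
    return SSL_key_update(ssl, SSL_KEY_UPDATE_REQUESTED);
}

/* Server side: request a certificate from the client after the handshake
 * has completed. Returns 1 on success. */
int request_client_cert(SSL *ssl)
{
    return SSL_verify_client_post_handshake(ssl);
}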

DSA certificates

DSA certificates are no longer allowed in TLSv1.3. If your server application is using a DSA certificate then TLSv1.3 connections will fail with an error message similar to the following:

140348850206144:error:14201076:SSL routines:tls_choose_sigalg:no suitable signature algorithm:ssl/t1_lib.c:2308:

Please use an ECDSA or RSA certificate instead.

Middlebox Compatibility Mode

During development of the TLSv1.3 standard it became apparent that in some cases, even if a client and server both support TLSv1.3, connections could sometimes still fail. This is because middleboxes on the network between the two peers do not understand the new protocol and prevent the connection from taking place. In order to work around this problem the TLSv1.3 specification introduced a “middlebox compatibility” mode. This made a few optional changes to the protocol to make it appear more like TLSv1.2 so that middleboxes would let it through. Largely these changes are superficial in nature but do include sending some small but unnecessary messages. OpenSSL has middlebox compatibility mode on by default, so most users should not need to worry about this. However applications may choose to switch it off by calling the function SSL_CTX_clear_options() and passing SSL_OP_ENABLE_MIDDLEBOX_COMPAT as an argument (see here for further details).
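A minimal sketch of switching it off, assuming ctx is your SSL_CTX and that you have a specific reason to do so:

#include <openssl/ssl.h>

void disable_middlebox_compat(SSL_CTX *ctx)
{
    /* Middlebox compatibility mode is on by default; clearing the option
     * makes the TLSv1.3 handshake look less like TLSv1.2 on the wire. */
    SSL_CTX_clear_options(ctx, SSL_OP_ENABLE_MIDDLEBOX_COMPAT);
}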

If the remote peer is not using middlebox compatibility mode and there are problematic middleboxes on the network path then this could cause spurious connection failures.


Conclusion

TLSv1.3 represents a significant step forward and has some exciting new features, but there are some hazards for the unwary when upgrading. Mostly these issues have relatively straightforward solutions. Application developers should review their code and consider whether anything should be updated in order to work more effectively with TLSv1.3. Similarly, application deployers should review their configuration.

February 08, 2018 11:00 AM

February 01, 2018

Anton Chuvakin - Security Warrior

Monthly Blog Round-Up – January 2018

Here is my next monthly "Security Warrior" blog round-up of top 5 popular posts based on last month’s visitor data (excluding other monthly or annual round-ups):
  1. “New SIEM Whitepaper on Use Cases In-Depth OUT!” (dated 2010) presents a whitepaper on select SIEM use cases described in depth with rules and reports [using now-defunct SIEM product]; also see this SIEM use case in depth and this for a more current list of popular SIEM use cases. Finally, see our 2016 research on developing security monitoring use cases here – and we just UPDATED IT FOR 2018.
  2. “Why No Open Source SIEM, EVER?” contains some of my SIEM thinking from 2009 (oh, wow, ancient history!). Is it relevant now? You be the judge.  Succeeding with SIEM requires a lot of work, whether you paid for the software or not. BTW, this post has an amazing “staying power” that is hard to explain – I suspect it has to do with people wanting “free stuff” and googling for “open source SIEM” …
  3. Again, my classic PCI DSS Log Review series is extra popular! The series of 18 posts cover a comprehensive log review approach (OK for PCI DSS 3+ even though it predates it), useful for building log review processes and procedures, whether regulatory or not. It is also described in more detail in our Log Management book and mentioned in our PCI book  – note that this series is even mentioned in some PCI Council materials. 
  4. “Simple Log Review Checklist Released!” is often at the top of this list – this rapidly aging checklist is still a useful tool for many people. “On Free Log Management Tools” (also aged quite a bit by now) is a companion to the checklist (updated version).
  5. “Updated With Community Feedback SANS Top 7 Essential Log Reports DRAFT2” is about the top log reports project of 2008-2013; I think these are still very useful in response to “what reports will give me the best insight from my logs?”
In addition, I’d like to draw your attention to a few recent posts from my Gartner blog [which, BTW, now has more than 5X of the traffic of this blog]: 

A critical reference post:
Current research on testing security:
Current research on threat detection “starter kit”
Current research on SOAR:
Miscellaneous fun posts:

(see all my published Gartner research here)
Also see my past monthly and annual “Top Popular Blog Posts” – 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017.

Disclaimer: most content at SecurityWarrior blog was written before I joined Gartner on August 1, 2011 and is solely my personal view at the time of writing. For my current security blogging, go here.

Other posts in this endless series:

by Anton Chuvakin at February 01, 2018 09:05 PM

Ben's Practical Admin Blog

Impressions of Dell EMC OpenManage Enterprise

Dell EMC OpenManage Enterprise has been available as a Tech Release for a couple of months now, and I have recently had an opportunity to sit down and do some evaluation of the product at work.

The following thoughts and comments are made based on the version 1.0.0 (build 543) appliance.



OpenManage Enterprise (OMEnt) is described by Dell EMC as the next generation of their  Open Manage Essentials (OMEss) platform. At face value it has some really good features going for it:

  • System is now deployed from an appliance template (OVF, VHD etc). No more having to customise a host build for the application, and no more licensing considerations.
  • The UI is now HTML5. I can’t begin to describe how happy I am to see the end of silverlight…
  • Information and UI simplification.

Installation of the appliance was painless, and I had my test install up and running in under 10 minutes, which again is quite a welcome thing. Previous OpenManage Essentials installs usually took much longer, between ensuring the prerequisite software was installed and mucking around with the interactive install.

Upon logging in, your eyes are in for a treat. The new HTML5 UI makes OMEss look positively archaic. Dell EMC have largely adopted the interface design seen in their OpenManage Mobile (OMM) application and the whole login dashboard is refreshingly uncluttered. The same goes for the whole UI really, and I think this is best shown in the device views.

Another excellent new feature is the overhaul of the firmware management within OMEnt. It is now possible to configure multiple baselines for firmware based on network share location or by using the Dell EMC online repository. This is fantastic news for larger organisations that like to establish firmware baselines and schedule their platform updates, rather than having to do some fancy repository juggling that OMEss forced you to do.  The one issue I do see with this so far is that you cannot import firmware version information from an existing device to establish what the baseline should be.

Update: Version 2.4 of OMEss released Jan 25 supports multiple firmware baselines as well.

Many of the other features currently remain the same as OMEss, albeit in a much nicer UI, so I do not plan to go through those.

Email alerts have also had a facelift, now being full HTML, rather than the plain text of OMEss. This I have mixed feelings about because for email alerts at least, pretty formatting seems overkill and gets in the way of information. Below is an example of one of the new look alerts.


Being a version 1.0.0 Tech Release build, it definitely has a number of issues that will need resolving before the next planned release during Q2-Q3 2018. Some of the issues I have experienced include:

  • AD/LDAP integration doesn’t appear to be working
  • Logins cannot have a ‘.’ in the username
  • In order to import configuration templates from devices, SMB1 must be enabled in application settings. This one tripped me up and left me frustrated for a good day.
  • Navigation around application logs is not terribly intuitive. When the configuration template import failed (see above), there was no immediate way to find a log file to find out why it had failed. I eventually found it buried in the jobs section of the site.
  • It is not possible to archive alerts in a way that they are cleared as being active in dashboards, but remain associated with the device. This can be very useful for forensic troubleshooting of a server (e.g. oh look, it had memory issues 3 times in last 12 months). Currently the only way to “clear” an active alert is to delete it from OMEss/OMEnt.

Being a Tech Release of the software also means that there is a reasonable amount of technical information not available yet or vague. One such example was that in the build white paper, it advised to install on “fast storage”. I am pretty sure “fast storage” could mean a number of different things to people – SSD, SAN, RAID Array, High RPM disk?

Luckily, there is an active Dell EMC Community Forum where you can ask questions and leave feedback. My experience there has been mostly a positive one.

OMEnt will not be replacing Essentials any time soon though, with a 2.4 release of OMEss on the 25th of January and further releases likely to be scheduled, as OMEnt still appears to lack some features of OMEss such as integration with OpenManage Mobile; in the install I was using there were references to it in the documentation, however it did not appear in any of the application settings menus.

The appearance of OMEnt as an appliance also raises questions about the future for the suite of additional applications associated with the OpenManage suite including:

  • OpenManage Power Center
  • OpenManage License Manager
  • OpenManage Repository Manager

One of the “useful” things about the OMEss install on a host system was that you were able to install these additional tools on the same host. I am particularly hopeful that Dell EMC will look into integrating the features of these products into the appliance itself – for example, I couldn’t imagine it being a huge jump to store historical power and temperature data from the iDRAC to draw charts with, which would almost make Power Center redundant (some code would need to be written around power policy as well). License Manager was purely about exporting and storing licenses from management interfaces; that could be integrated into OMEnt as well.

Overall the Tech Release of OMEnt shows significant improvements in the UI and the ability to manage servers, and you can see that there is great potential for this to be a much better management platform than its predecessor. If you only need basic monitoring and firmware management, then you may want to start your own evaluation of the Tech Release.

However, if you are an OMEss power user, making extensive use of Configuration Management for deployment along with firmware repository management and using the secondary applications like OpenManage Mobile, Power Center and Repository Manager, I would say that you may wish to hold off on rolling out OMEnt until reviewing the next release – right now it’s not quite there yet, but many things are on the roadmap.


by Ben at February 01, 2018 09:53 AM

January 30, 2018

2017 in numbers

2017 has been a pretty good year for this blog.

The 10000 mark was passed for both the views (13790) and the visitors (11454); the previous records were established in 2015 for the views (10395) and in 2016 for the visitors (7520).

The top three visiting countries are the US (3251), Germany (1037) and France (763). My own country, Italy, didn’t make the top 10 with only 328 views.

The top three articles of the year were An init system in a Docker container with 3287 views, followed by Dates from UNIX timestamps in OpenOffice/LibreOffice (3123) and Exploring Docker overlay networks (published this year) with 1601 views.

2017 was also a year of change. In November 2016 I left Opera Software and joined Telenor Digital as Head of IT. I have more “managerial” tasks now, less time for “operations”, and the scale is definitely different from the one I was managing in Opera. That had an impact on the content I was able to post on this blog, both in terms of topics and amount. Whether the new course is better or worse, only time will tell.

Happy 2018!

by bronto at January 30, 2018 08:00 AM

January 28, 2018

Promise-based team leadership

Can Promise Theory help you shape a better, effective leadership style?

Promise-based leadership will be the topic of the talk I will hold at two conferences. The first one is the Config Management Camp in Gent, Belgium, and it’s pretty close: February 5th. The second conference is the glorious Incontro DevOps Italia 2018, the Italian DevOps Meeting in Bologna, on March 9th.

When I joined Telenor Digital as the Head of IT I had to find an unconventional leadership style, as circumstances didn’t allow for a traditional one based on the “line-of-command” approach. After so many years spent using CFEngine, it was quite natural for me to use Promise Theory to model my new “reality” and understand how I could exploit exactly those peculiarities that were making the traditional leadership approach pointless.

Promise-based leadership has clear limits in applicability. It requires the right attitude in leaders and the right culture in the company. Where the right leaders and the right culture are present, I am confident that it provides significant advantages compared to the conventional approach based on top-down imposition.

I have been doing promise-based leadership for a bit more than one year now, regardless of people being direct reports or simply colleagues at any level of the hierarchy. My talk is a report of the experience so far. I don’t have definitive answers yet and there are several unanswered questions. I will be a bit tight with my talk schedule and I won’t be able to take many questions, but I hope to have several interesting conversations “on the side” of the conference events 🙂

One fun fact for closing: when I submitted to Config Management Camp I wasn’t really confident that my talk would be accepted because, I thought, the topic was kind-of “tangent” to the conference’s, so I didn’t even plan to attend. Later on the keynotes were announced, and one of them will be held by Mark Burgess, the inventor of Promise Theory. A few weeks more and I was informed that my talk was accepted and they actually liked it. So I will be talking of promise-based leadership at the same conference, in the same track and in the same room as the inventor of Promise Theory himself. Guess how hard I am working to put together a decent talk in time… 😀

by bronto at January 28, 2018 11:38 AM

January 26, 2018

Simon Lyall 2018 – Day 5 – Light Talks and Close

Lightning Talk

  • Usability Fails
  • Etching
  • Diverse Events
  • Kids Space – fairly unstructured and self organising
  • Opening up LandSat imagery – NBAR-T available on NCI
  • Project Nacho – HTML -> VPN/RDP gateway. Apache Guacamole
  • Vocaloids
  • Blockchain
  • Using j2 to create C++ code
  • Memory model code update
  • CLIs are user interface too
  • Complicated git things
  • Mollygive -matching donations
  • Abusing Docker


  • LCA 2019 will be in Christchurch, New Zealand –
  • 700 Attendees at 2018
  • 400 talk and 36 Miniconf submissions




by simon at January 26, 2018 06:17 AM 2018 – Day 5 – Session 2

QUIC: Replacing TCP for the Web Jana Iyengar

  • History
    • Protocol for http transport
    • Deployed Inside Google 2014 and Chrome / mobile apps
    • Improved performance: Youtube rebuffers 15-18% , Google search latency 3.6 – 8 %
    • 35% of Google’s egress traffic (7% of Internet)
    • Working group started in 2016 to standardized QUIC
    • Turned off at the start of 2016 due to a security problem
    • Doubled in Sept 2016 after being turned on for the YouTube app
  • Technology
    • Previously – IP -> TCP -> TLS -> HTTP/2
    • QUIC -> udp -> QUIC -> http over QUIC
    • Includes crypto and tcp handshake
    • congestion control
    • loss recovery
    • TLS 1.3 has some of the same features that QUIC pioneered; QUIC is being updated to take it into account
  • HTTP/1
    • 1 trip for TCP
    • 2 trips for TLS
    • Single connection – Head Of Line blocking
    • Multiple TCP connections workaround.
  • HTTP/2
    • Streams within a single transport connection
    • Packet loss will stall the TCP layer
    • Unresolved problems
      • Connection setup latency
      • Middlebox interference with TCP – makes it hard to change TCP
      • Head of line blocking within TCP
  • QUIC
    • Connection setup
      • 0 round trips, handshake packet followed directly by data packet
      • 1 round trip if crypto keys are not new
      • 2 round trips if QUIC version needs renegotiation
    • Streams
      • http/2 streams are sent as quic streams
  • Aspirations of protocol
    • Deployable and evolveable
    • Low latency connection establishment
    • Stream multiplexing
    • Better loss recovery and flexible congestion control
      • richer signalling (unique packet number)
      • better RTT estimates
    • Resilience to NAT-rebinding ( UDP Nat-mapping changes often, maybe every few seconds)
  • UDP is not a transport, you put something in top of UDP to build a transport
  • Why not a new protocol instead of UDP? Almost impossible to get a new protocol in middle boxes around the Internet.
  • Metrics
    • Search Latency (see paper for other metrics)
    • Enter search term > entire page is loaded
    • Mean: desktop improve 8% , mobile 3.6 %
    • Low latency: Desktop 1% , Mobile none
    • Highest Latency 90-99% of users: Desktop & mobile 15-16%
    • Video similar
    • Big gain is from 0 RTT handshake
  • QUIC – Search Latency Improvements by Country
    • South Korea – 38ms RTT – 1% improvement
    • USA – 50ms – 2 – 3.5 %
    • India – 188ms – 5 – 13%
  • Middlebox ossification
    • Vendor ossified first byte of QUIC packet – flags byte
    • since it seemed to be the same on all QUIC packets
    • broke QUIC deployment when a flag was fixed
    • Encryption is the only way to protect against network ossification
    • “Greasing” by randomly changing options is also an option.
  • Other Protocols over QUIC?
    • Concentrating on http/2
    • Looking at Web RPC

Remote Work: My first decade working from the far end of the earth John Dalton

  • “Remote work has given me a fulfilling technical career while still being able to raise my family in Tasmania”
  • First son born in 2015; wanted to stay in Tasmania with family to raise them, rather than moving to a tech hub.
  • 2017 working with High Performance Computing at University Tasmania
  • If everything is going to be outsourced, I want to be the one they outsourced to.
  • Wanted to do big web stuff, nobody in Tasmania doing that.
  • Was a user at LibraryThing
    • They were searching for Sysadmin/DBA in Portland, Maine
    • Knew he could do the job even though was on other side of the world
    • Negotiated into it over a couple of months
    • Knew could do the work, but not sure how the position would work out


  • Discipline
    • Feels he is not organised. Doesn’t keep planner up to date or to-do lists etc
    • “You can spend a lot of time reading about time management without actually doing it”
    • Do you need to have the minimum level
  • Isolation
    • Lives 20 minutes out of Hobart
    • In semi-rural area for days at a time, doesn’t leave house all week except to ferry kids on weekends.
    • “Never considered myself an extrovert, but I do enjoy talking to people at least weekly”
    • Need to work to hook in with Hobart tech community, Goes to meetups. Plays D&D with friends.
    • Considering going to coworking space. sometimes goes to Cafes etc
  • Setting Boundaries
    • Hard to Leave work.
    • Have a dedicated work space.
  • Internet Access
    • Prioritise Coverage over cost these days for mobile.
    • Sometimes fixed provider go down, need to have a backup
  • Communication
    • Less random communication with other employees
    • Cannot assume any particular knowledge when talking with other people
    • Aware of particular cultural differences
    • More chances of miscommunication


  • Access to companies and jobs and technologies that you could not get locally
  • Access to people with a wider range of experiences and backgrounds

Finding remote work

  • Talk your way into it
  • Networking
  • Jobs BoF
  • can filter

Making it work

  • Be Visible
  • Go home at the end of the day
  • Remember real people are at the end of the email



by simon at January 26, 2018 04:23 AM 2018 – Day 5 – Session 1

Self-Documenting Coders: Writing Workshop for Devs Heidi Waterhouse

History of Technical documentation

  • Linear Writing
    • On Paper, usually books
    • Emphasis on understanding and doing
  • Task-based writing
    • Early 90s
    • DITA
    • Concept, Procedure, Reference
  • Object-orientated writing
    • High art form of tech writers
    • Content as code
    • Only works when compiled
    • Favoured by tech writers and translators. Up to $2000 per seat
  • Guerilla Writing
    • Stack Overflow
    • Wikis
    • YouTube
    • frustrated non-writers trying to help peers
  • Search-first writing
    • Every page is page one
    • Search-index driven

Writing Words

  • 5 W’s of journalism.
  • Documentation needs to be tested
  • Audiences
    • eg Users, future-self, Sysadmins, experts, End users, installers
  • Writing Basics
    • Sentences short
    • Graphics for concepts
    • Avoid screencaps (too easily outdated)
    • User style guides and linters
    • Accessibility is a real thing
  • Words with pictures
    • Never include settings only in an image ( “set your screen to look like this” is bad)
    • Use images for concepts not instructions
  • Not all your users are readers
    • Can’t see well
    • Can’t parse easily
    • Some have terrible equipment
    • Some of the “some people” is us
    • Accessibility is not a checklist, although that helps, it is us
  • Using templates to write
    • Organising your thoughts and avoid forgetting parts
    • Add a standard look at low mental cost
  • Search-first writing – page one
    • If you didn’t answer the question or point to the answer you failed
    • answer “How do I?”
  • Indexing and search
    • All the words present are indexed
    • No false pointers
    • Use words people use and search for, Don’t use just your internal names for things
  • Semantic tagging and reuse
    • Semantic text splits form and content
    • Semantic tagging allows reuse
    • Reuse saves duplication
    • Reuse requires compiling
  • Sorting topics into buckets
    • Even with search you need some organisation
    • Group items by how they get used, not by how they get programmed
    • Grouping similar items allows serendipity
  • Links, menus and flow
    • give people a next step
    • Provide related info on same page
    • show location
    • offer a chance to see the document structure

Distributing Words

  • Static Sites
  • Hosted Sites
  • Baked into the product
    • Only available to customers
    • only updates with the product
    • Hard to encourage average user to input
  • Knowledge based / CMS
    • Useful to community that known what it wants
    • Prone to aging and rot
    • Sometimes diverges from published docs or company message
  • Professional Writing Tools
    • Shiny and powerful
    • Learning Cliff
    • IDE
    • Super features
    • Not going to happen again
  • Paper-ish things
    • Essential for some topics
    • Reassuring to many people
    • touch is a sense we can bond with
    • Need to understand if people using docs will be online or offline when they want them.
  • Using templates to publish
    • Unified look and feel
    • Consistency and not missing things
    • Built-in checklist

Collaborating on Words

  • One weird trick, write it up as your best guess and let them correct it
  • Have a hack day
    • Set a goal of things to delete
    • Set a goal of things to fix
    • Keep track of debt you can’t handle today
    • team-building doesn’t have to be about activities

Deleting Words

  • What needs to go
    • Old stuff that is wrong and terrible
    • Wrong stuff that hides right stuff
  • What to delete
    • Anything wrong
    • Anything dangerous
    • Anything not used or updated in a year
  • How
    • Delete temporarily (put aside for a while)
    • Based on analytics
    • Ruthlessly
    • Delete or update

Documentation Must be

  • True
  • Timely
  • Testable
  • Tuned

Documentation Components

  • Who is reading and why
    • Assuming no one likes reading docs
    • What is driving them to be here
  • Pre Requisites
    • What does a user need to succeed
    • Can I change the product to reduce documentation
    • Is there any hazard in this process
  • How do I do this task
    • Steps
    • Results
    • Next steps
  • Test – How do I know that it worked
    • If you can’t test it, it is not a procedure
    • What will the system do, how does the state change
  • Reference
    • What other stuff that affects this
    • What are the optional settings
    • What are the related things
  • Code and code samples
    • Best: code you can modify and run in the docs
    • 2nd Best: Code you can copy easily
    • Worst: retyping code
  • Option
    • Why did we build it this way
    • What else might you want to know
    • Have other people done this
    • Lifecycle

Documentation Types

  • Instructions
  • Ideas (arch, problem space,discarded options, process)
  • Action required (release notes, updates, deprecation)
  • Historical (roads maps, projects plans, retrospective documents)
  • Invisible docs (user experience, microinteractions, error messages)
    • Error messages – Unique ID, what caused, What mitigation, optional: Link to report



by simon at January 26, 2018 01:11 AM

January 25, 2018

Sean's IT Blog

VDI in the Time of Frequent Windows 10 Upgrades

The longevity of Windows 7, and Windows XP before that, has spoiled many customers and enterprises.  It provided IT organizations with a stable base to build their end-user computing infrastructures and applications on, and users were provided with a consistent experience.  The update model was fairly well known – a major service pack with all updates and feature enhancements would come out after about one year.

Whether this stability was good for organizations is debatable.  It certainly came with trade-offs, security of the endpoint being the primary one.

The introduction of Windows 10 has changed that model, and Microsoft is continuing to refine that model.  Microsoft is now releasing two major “feature updates” for Windows 10 each year, and these updates will only be supported for about 18 months each.  Microsoft calls this the “Windows as a Service” model, and it consists of two production-ready semi-annual release channels – a targeted deployment that is used to pilot users to test applications, and a broad deployment that replaces the “Current Branch for Business” option for enterprises.

Gone are the days where the end user’s desktop will have the same operating system for its entire life cycle.

(Note: While there is still a long-term servicing branch, Microsoft has repeatedly stated that this branch is suited for appliances and “machinery” that should not receive frequent feature updates such as ATMs and medical equipment.)

In order to facilitate this new delivery model, Microsoft has refined their in-place operating system upgrade technology.  While it has been possible to do this for years with previous versions of Windows, it was often flaky.  Settings wouldn’t port over properly, applications would refuse to run, and other weird errors would crop up.  That’s mostly a thing of the past when working with physical Windows 10 endpoints.

Virtual desktops, however, don’t seem to handle in-place upgrades well.  Virtual desktops often utilize various additional agents to deliver desktops remotely to users, and the in-place upgrade process can break these agents or cause otherwise unexpected behavior.  They also have a tendency to reinstall Windows Modern Applications that have been removed or to reset settings (although Microsoft is supposed to be working on those items).

If Windows 10 feature release upgrades can break, or at least require significant rework of, existing VDI images, what is the best method for handling them in a VDI environment?

I see two main options.  The first is to manually uninstall the VDI agents from the parent VMs, take a snapshot, and then do an in-place upgrade.  After the upgrade is complete, the VDI agents would need to be reinstalled on the machine.  In my opinion, this option has a couple of drawbacks.

First, it requires a significant amount of time.  While there are a number of steps that could be automated, validating the image after the upgrade would still require an administrator.  Someone would have to log in to validate that all settings were carried over properly and that Modern Applications were not reinstalled.  This may become a significant time sink if I have multiple parent desktop images.

Second, this process wouldn’t scale well.  If I have a large number of parent images, or a large estate of persistent desktops, I have to build a workflow to remove agents, upgrade Windows, and reinstall agents after the upgrade.  Not only do I have to test this workflow significantly, but I still have to test my desktops to ensure that the upgrade didn’t break any applications.

The second option, in my view, is to rebuild the desktop image when each new version of Windows 10 is released.  This ensures that you have a clean OS and application installation with every new release, and it would require less testing to validate because I don’t have to check to see what broke during the upgrade process.

One of the main drawbacks to this approach is that image building is a time consuming process.  This is where automated deployments can be helpful.  Tools like Microsoft Deployment Toolkit can help administrators build their parent images, including any agents and required applications, automatically as part of a task sequence.  With this type of toolkit, an administrator can automate their build process so that when a new version of Windows 10 is released, or a core desktop component like the Horizon or XenDesktop agent is updated, the image will have the latest software the next time a new build is started.

(Note: MDT is not the only tool in this category.  It is, however, the one I’m most familiar with.  It’s also the tool that Trond Haavarstein, @XenAppBlog, used for his Automation Framework Tool.)

Let’s take this one step further.  As an administrator, I would be doing a new Windows 10 build every 6 months to a year to ensure that my virtual desktop images remain on a supported version of Windows.  At some point, I’ll want to do more than just automate the Windows installation so that my end result, a fully configured virtual desktop that is deployment ready, is available at the push of a button.  This can include things like bringing it into Citrix Provisioning Services or shutting it down and taking a snapshot for VMware Horizon.

Virtualization has allowed for significant automation in the data center.  Tools like VMware PowerCLI and the Nutanix REST API make it easy for administrators to deploy and manage virtual machines using a few lines of PowerShell.   Using these same tools, I can also take details from this virtual machine shell, such as the name and MAC address, and inject them into my MDT database along with a Task Sequence and role.  When I power the VM on, it will automatically boot to MDT and start the task sequence that has been defined.

This is bringing “Infrastructure as Code” concepts to end-user computing, and the results should make it easier for administrators to test and deploy the latest versions of Windows 10 while reducing their management overhead.

I’m in the process of working through the last bits to automate the VM creation and integration with MDT, and I hope to have something to show in the next couple of weeks.


by seanpmassey at January 25, 2018 04:47 PM