Planet SysAdmin

May 25, 2018

Chris Siebenmann

There are real reasons for Linux to replace ifconfig, netstat, et al

One of the ongoing system administration controversies in Linux is the effort to obsolete the old, cross-Unix standard network administration and diagnosis commands of ifconfig, netstat and the like and replace them with fresh new Linux-specific things like ss and the ip suite. Old sysadmins are generally grumpy about this; they consider it yet another sign of Linux's 'not invented here' attitude that sees Linux breaking from well-established Unix norms to go its own way. Although I'm an old sysadmin myself, I don't have this reaction. Instead, I think that it might be both sensible and honest for Linux to go off in this direction. There are two reasons for this, one ostensible and one subtle.

The ostensible surface issue is that the current code for netstat, ifconfig, and so on operates in an inefficient way. Per various people, netstat et al operate by reading various files in /proc, and doing this is not the most efficient thing in the world (either on the kernel side or on netstat's side). You won't notice this on a small system, but apparently there are real impacts on large ones. Modern commands like ss and ip use Linux's netlink sockets, which are much more efficient. In theory netstat, ifconfig, and company could be rewritten to use netlink too; in practice this doesn't seem to have happened and there may be political issues involving different groups of developers with different opinions on which way to go.

(Netstat and ifconfig are part of net-tools, while ss and ip are part of iproute2.)
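A quick way to see the difference for yourself, assuming strace is installed (the exact output varies by distribution and tool version), is to trace both tools:

 # netstat gathers its data by opening files under /proc:
 strace -e trace=open,openat netstat -tan 2>&1 | grep /proc/net

 # ss asks the kernel directly over a netlink socket:
 strace -e trace=socket ss -tan 2>&1 | grep NETLINK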

However, the deeper issue is the interface that netstat, ifconfig, and company present to users. In practice, these commands are caught between two masters. On the one hand, the information the tools present and the questions they let us ask are deeply intertwined with how the kernel itself does networking, and in general the tools are very much supposed to report the kernel's reality. On the other hand, the users expect netstat, ifconfig and so on to have their traditional interface (in terms of output, command line arguments, and so on); any number of scripts and tools fish things out of ifconfig output, for example. As the Linux kernel has changed how it does networking, this has presented things like ifconfig with a deep conflict; their traditional output is no longer necessarily an accurate representation of reality.

For instance, here is ifconfig output for a network interface on one of my machines:

 ; ifconfig -a
 em0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
    inet 128.100.3.XX  netmask  broadcast
    inet6 fe80::6245:cbff:fea0:e8dd  prefixlen 64  scopeid 0x20<link>
    ether 60:45:cb:a0:e8:dd  txqueuelen 1000  (Ethernet)

There are no other 'em0:...' devices reported by ifconfig, which is unfortunate because this output from ifconfig is not really an accurate picture of reality:

; ip -4 addr show em0
  inet 128.100.3.XX/24 brd scope global em0
    valid_lft forever preferred_lft forever
  inet 128.100.3.YY/24 brd scope global secondary em0
    valid_lft forever preferred_lft forever

This interface has an IP alias, set up through systemd's networkd. Perhaps there once was a day when all IP aliases on Linux had to be set up through additional alias interfaces, which ifconfig would show, but these days each interface can have multiple IPs and directly setting them this way is the modern approach.
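For illustration, this is roughly how such a secondary address gets added with the modern tools (192.0.2.20/24 is a placeholder, since the real addresses above are masked); note that ifconfig will still not show it:

 # add a second IPv4 address directly to the interface
 ip addr add 192.0.2.20/24 dev em0
 # verify; ifconfig -a will still show only the primary address
 ip -4 addr show em0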

This issue presents programs like ifconfig with an unappealing choice. They can maintain their traditional output, which is now sometimes a lie but which keeps people's scripts working, or they can change the output to better match reality and probably break some scripts. It's likely to be the case that the more they change their output (and arguments and so on) to match the kernel's current reality, the more they will break scripts and tools built on top of them. And some people will argue that those scripts and tools that would break are already broken, just differently; if you're parsing ifconfig output on my machine to generate a list of all of the local IP addresses, you're already wrong.

(If you try to keep the current interface while lying as little as possible, you wind up having arguments about what to lie about and how. If you can only list one IPv4 address per interface in ifconfig, how do you decide which one?)
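For scripts that genuinely need a list of local IP addresses, the reliable approach today is to ask ip directly rather than scrape ifconfig output. A sketch, assuming a reasonably recent iproute2:

 # one address (in CIDR form) per line, in a format meant for scripting
 ip -4 -o addr show scope global | awk '{print $4}'

 # recent iproute2 releases can also emit JSON for machine consumption
 ip -j -4 addr show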

In a sense, deprecating programs like ifconfig and netstat that have wound up with interfaces that are inaccurate but hard to change is the honest approach. Their interfaces can't be fixed without significant amounts of pain and they still work okay for many systems, so just let them be while encouraging people to switch to other tools that can be more honest.

(This elaborates on an old tweet of mine.)

PS: I believe that the kernel interfaces that ifconfig and so on currently use to get this information are bound by backwards compatibility issues themselves, so getting ifconfig to even know that it was being inaccurate here would probably take code changes.

by cks at May 25, 2018 05:32 AM

May 24, 2018

The Lone Sysadmin

vSphere 6.7 Will Not Run In My Lab: A Parable

“Hey Bob, I tried installing vSphere 6.7 on my lab servers and it doesn’t work right. You tried using it yet? Been beating my head against a wall here.” “Yeah, I really like it. A lot. Like, resisting the urge to be irresponsible and upgrade everything. What are your lab servers?” I knew what he […]

The post vSphere 6.7 Will Not Run In My Lab: A Parable appeared first on The Lone Sysadmin. Head over to the source to read the full post!

by Bob Plankers at May 24, 2018 03:12 PM

Chris Siebenmann

Registering for things on the Internet is dangerous these days

Back in the old days (say up through the middle of the 00s), it was easily possible to view registering for websites, registering products on the Internet, and so on as a relatively harmless and even positive activity. Not infrequently, signing up was mostly there so you could customize your site experience and preferences, and maybe so that you could get to hear about important news. Unfortunately those days are long over. On today's Internet, registration is almost invariably dangerous.

The obvious problem is that handing over your email address often involves getting spam later, but this can be dealt with in various ways. The larger and more pernicious danger is that registering invariably requires agreeing to people's terms of service. In the old days, terms of service were not all that dangerous and often existed only to cover the legal rears of the service you were registering with. Today, this is very much not the case; most ToSes are full to the brim with obnoxious and dangerous things, and are very often not to your benefit in the least. At the very least, most ToSes will have you agreeing that the service can mine as much data from you as possible and sell it to whoever it wants. Beyond that, many ToSes contain additional nasty provisions like forced arbitration, perpetual broad copyright licensing for whatever you let them get their hands on (including eg your profile picture), and so on. Some but not all of these ToS provisions can be somewhat defanged by using the service as little as possible; on the other hand, sometimes the most noxious provisions cut to the heart of why you want to use the service at all.

(If you're in the EU and the website in question wants to do business there, the EU GDPR may give you some help here. Since I'm not in the EU, I'm on my own.)

Some Terms of Service are benign, but today ToSes are so long and intricate that you can't tell whether you have a benign or a dangerous one (and anyway, many ToSes are effectively self-upgrading). Even with potentially dangerous ToSes, some companies will never exercise the freedom that their ToS nominally gives them, for various reasons. But that's not the way to bet given an arbitrary company and an arbitrary ToS. Today the only safe assumption is that agreeing to someone's Terms of Service is at least a somewhat dangerous act that may bite you at some point.

The corollary to this is that you should assume that anyone who requires registration before giving you access to things when this is not actively required by how their service works is trying to exploit you. For example, 'register to see this report' should be at least a yellow and perhaps a red warning sign. My reaction is generally that I probably don't really need to read it after all.

(Other people react by simply giving up and agreeing to everything, taking solace in the generally relatively low chance that it will make a meaningful difference in their life one way or another. I have this reaction when I'm forced to agree to ToSes; since I can neither meaningfully read the terms nor do anything about them, what they say doesn't matter and I just blindly agree. I have to trust that I'll hear about it if the terms are so bad that I shouldn't agree under any circumstances. Of course this attitude of helplessness plays into the hands of these people.)

by cks at May 24, 2018 04:21 AM

Errata Security

C is too low level

I'm in danger of contradicting myself, after previously pointing out that x86 machine code is a high-level language, but this article claiming C is not a low-level language is bunk. C certainly has some problems, but it's still the closest language to assembly. This is obvious from the fact that it's still the fastest compiled language. What we see is a typical academic out of touch with the real world.

The author makes the (wrong) observation that we've been stuck emulating the PDP-11 for the past 40 years. C was written for the PDP-11, and since then CPUs have been designed to make C run faster. The author imagines a different world, one where CPU designers instead target something like LISP or Erlang as their preferred language. This misunderstands the state of the market. CPUs do indeed support lots of different abstractions, and C has evolved to accommodate this.

The author criticizes things like "out-of-order" execution, which has led to the Spectre side-channel vulnerabilities. Out-of-order execution is necessary to make C run faster. The author claims instead that those resources should be spent on having more, slower CPUs with more threads. This sacrifices single-threaded performance in exchange for a lot more threads executing in parallel. The author cites Sparc Tx CPUs as his ideal processor.

But here's the thing, the Sparc Tx was a failure. To be fair, it's mostly a failure because most of the time, people wanted to run old C code instead of new Erlang code. But it was still a failure at running Erlang.

Time after time, engineers keep finding that "out-of-order", single-threaded performance is still the winner. A good example is ARM processors for both mobile phones and servers. All the theory points to in-order CPUs as being better, but all the products are out-of-order, because this theory is wrong. The custom ARM cores from Apple and Qualcomm used in most high-end phones are so deeply out-of-order they give Intel CPUs competition. The same is true on the server front with the latest Qualcomm Centriq and Cavium ThunderX2 processors, deeply out of order supporting more than 100 instructions in flight.

The Cavium is especially telling. Its ThunderX CPU had 48 simple cores, which were replaced in the ThunderX2 with 32 complex, deeply out-of-order cores. The performance increase was massive, even on multithread-friendly workloads. Every competitor to Intel's dominance in the server space has learned the lesson from Sparc Tx: many wimpy cores is a failure, you need fewer beefy cores. Yes, they don't need to be as beefy as Intel's processors, but they need to be close.

Even Intel's "Xeon Phi" custom chip learned this lesson. This is their GPU-like chip, running 60 cores with 512-bit wide "vector" (sic) instructions, designed for supercomputer applications. Its first version was purely in-order. Its current version is slightly out-of-order. It supports four threads and focuses on basic number crunching, so in-order cores seem to be the right approach, but Intel found in this case that out-of-order processing still provided a benefit. Practice is different from theory.

As an academic, the author of the above article focuses on abstractions. The criticism of C is that it has the wrong abstractions which are hard to optimize, and that if we instead expressed things in the right abstractions, it would be easier to optimize.

This is an intellectually compelling argument, but so far bunk.

The reason is that while the theoretical base language has issues, everyone programs using extensions to the language, like "intrinsics" (C 'functions' that map to assembly instructions). Programmers write libraries using these intrinsics, which the rest of us then use. In other words, if your criticism is that C is not itself low level enough, it still provides the best access to low level capabilities.

Given that C can access new functionality in CPUs, CPU designers add new paradigms, from SIMD to transaction processing. In other words, while in the 1980s CPUs were designed to optimize C (stacks, scaled pointers), these days CPUs are designed to optimize tasks regardless of language.

The author of that article criticizes the memory/cache hierarchy, claiming it has problems. Yes, it has problems, but only compared to how well it normally works. The author praises the many simple cores/threads idea as hiding memory latency with little caching, but misses the point that caches also dramatically increase memory bandwidth. Intel processors are optimized to read a whopping 256 bits every clock cycle from L1 cache. Main memory bandwidth is orders of magnitude slower.

The author goes on to criticize cache coherency as a problem. C uses it, but other languages like Erlang don't need it. But that's largely due to the problems each language solves. Erlang solves the problem where a large number of threads work on largely independent tasks, needing to send only small messages to each other across threads. The problem C solves is when you need many threads working on a huge, common set of data.

For example, consider the "intrusion prevention system". Any thread can process any incoming packet that corresponds to any region of memory. There's no practical way of solving this problem without a huge coherent cache. It doesn't matter which language or abstractions you use, it's the fundamental constraint of the problem being solved. RDMA is an important concept that's moved from supercomputer applications to the data center, such as with memcached. Again, we have the problem of huge quantities (terabytes worth) shared among threads rather than small quantities (kilobytes).

The fundamental issue the author of the paper is ignoring is decreasing marginal returns. Moore's Law has gifted us more transistors than we can usefully use. We can't apply those additional transistors to just one thing, because the useful returns we get diminish.

For example, Intel CPUs have two hardware threads per core. That's because there are good returns by adding a single additional thread. However, the usefulness of adding a third or fourth thread decreases. That's why many CPUs have only two threads, or sometimes four threads, but no CPU has 16 threads per core.

You can apply the same discussion to any aspect of the CPU, from register count, to SIMD width, to cache size, to out-of-order depth, and so on. Rather than focusing on one of these things and increasing it to the extreme, CPU designers make each a bit larger with every process tick that adds more transistors to the chip.

The same applies to cores. It's why the "more simpler cores" strategy fails, because more cores have their own decreasing marginal returns. Instead of adding cores tied to limited memory bandwidth, it's better to add more cache. Such cache already increases the size of the cores, so at some point it's more effective to add a few out-of-order features to each core rather than more cores. And so on.

The question isn't whether we can change this paradigm and radically redesign CPUs to match some academic's view of the perfect abstraction. Instead, the goal is to find new uses for those additional transistors. For example, "message passing" is a useful abstraction in languages like Go and Erlang that's often more useful than sharing memory. It's implemented with shared memory and atomic instructions, but I can't help but think it couldn't be done better with direct hardware support.

Of course, as soon as they do that, it'll become an intrinsic in C, then added to languages like Go and Erlang.


Academics live in an ideal world of abstractions; the rest of us live in practical reality. The reality is that the vast majority of programmers work with the C family of languages (JavaScript, Go, etc.), whereas academics love the epiphanies they learned using other languages, especially functional languages. CPUs are only superficially designed to run C and maintain "PDP-11 compatibility". Instead, they keep adding features to support other abstractions, abstractions available to C. They are driven by decreasing marginal returns -- they would love to add new abstractions to the hardware because it's a cheap way to make use of additional transistors. Academics are wrong to believe that the entire system needs to be redesigned from scratch. Instead, they just need to come up with new abstractions CPU designers can add.

by Robert Graham at May 24, 2018 01:49 AM

May 23, 2018

Errata Security

The devil wears Pravda

Classic Bond villain, Elon Musk, has a new plan to create a website dedicated to measuring the credibility and adherence to "core truth" of journalists. He is, without any sense of irony, going to call this "Pravda". This is not simply wrong but evil.

Musk has a point. Journalists do suck, and many suck consistently. I see this in my own industry, cybersecurity, and I frequently criticize them for their suckage.

But what he's doing here is not correcting them when they make mistakes (or what Musk sees as mistakes), but questioning their legitimacy. This legitimacy isn't measured by whether they follow established journalism ethics, but whether their "core truths" agree with Musk's "core truths".

An example of the problem is how the press fixates on Tesla car crashes due to its "autopilot" feature. Pretty much every autopilot crash makes national headlines, while the press ignores the other 40,000 car crashes that happen in the United States each year. Musk spies on Tesla drivers (hello, classic Bond villain everyone) so he can see the dip in autopilot usage every time such a news story breaks. He's got good reason to be concerned about this.

He argues that autopilot is safer than humans driving, and he's got the statistics and government studies to back this up. Therefore, the press's fixation on Tesla crashes is illegitimate "fake news", titillating the audience with distorted truth.

But here's the thing: that's still only Musk's version of the truth. Yes, on a mile-per-mile basis, autopilot is safer, but there's nuance here. Autopilot is used primarily on freeways, which already have a low mile-per-mile accident rate. People choose autopilot only when conditions are incredibly safe and drivers are unlikely to have an accident anyway. Musk is therefore being intentionally deceptive comparing apples to oranges. Autopilot may still be safer, it's just that the numbers Musk uses don't demonstrate this.

And then there is the issue of calling it "autopilot" to begin with, because it isn't one. The public is overrating the capabilities of the feature. It's little different from the "lane keeping" and "adaptive cruise control" you can now find in other cars. In many ways, the technology is behind -- my Tesla doesn't beep at me when a pedestrian walks behind my car while backing up, but virtually every new car on the market does.

Yes, the press unduly covers Tesla autopilot crashes, but Musk has only himself to blame by unduly exaggerating his car's capabilities by calling it "autopilot".

What's "core truth" is thus rather difficult to obtain. What the press satisfies itself with instead is smaller truths, what they can document. The facts are in such cases that the accident happened, and they try to get Tesla or Musk to comment on it.

What you can criticize a journalist for is therefore not "core truth" but whether they did journalism correctly. When such stories criticize "autopilot" but don't do their due diligence in getting Tesla's side of the story, then that's a violation of journalistic practice. When I criticize journalists for their poor handling of stories in my industry, I try to focus on which journalistic principles they got wrong. For example, the NYTimes reporters do a lot of stories quoting anonymous government sources in clear violation of journalistic principles.

If "credibility" is the concern, then it's the classic Bond villain here that's the problem: Musk himself. His track record on business statements is abysmal. For example, when he announced the Model 3 he claimed production targets that every Wall Street analyst claimed were absurd. He didn't make those targets, he didn't come close. Model 3 production is still lagging behind Musk's twice adjusted targets.

So who has a credibility gap here, the press, or Musk himself?

Not only is Musk's credibility problem ironic, so is the name he chose, "Pravda", the Russian word for truth that was the name of the Soviet Union Communist Party's official newspaper. This is so absurd it has to be a joke, yet Musk claims to be serious about all this.

Yes, the press has a lot of problems, and if Musk were some journalism professor concerned about journalists meeting the objective standards of their industry (e.g. abusing anonymous sources), then this would be a fine thing. But it's not. It's Musk who is upset the press's version of "core truth" does not agree with his version -- a version that he's proven time and time again differs from "real truth".

Just in case Musk is serious, I've already registered "" to start measuring the credibility of statements by billionaire playboy CEOs. Let's see who blinks first.

I stole the title, with permission, from this tweet:

by Robert Graham at May 23, 2018 10:45 PM

Evaggelos Balaskas

CentOS Bootstrap

CentOS 6

This approach has been suggested for building a container image from your current CentOS system.


In my case, I need to remotely upgrade a running CentOS 6 system to a new clean CentOS 7 on a test VPS, without opening the VNC console, attaching a new ISO, etc.

I am rather lucky, as I have a clean extra partition on this VPS, so I will follow the process below to remotely install a clean CentOS 7 onto that partition, then add a new grub entry and boot into it.


Current OS

# cat /etc/redhat-release
CentOS release 6.9 (Final)


Format partition

format & mount the partition:

 mkfs.ext4 -L rootfs /dev/vda5
 mount /dev/vda5 /mnt/




# yum -y groupinstall "Base" --releasever 7 --installroot /mnt/ --nogpgcheck



test it, when finished:

mount --bind /dev/  /mnt/dev/
mount --bind /sys/  /mnt/sys/
mount --bind /proc/ /mnt/proc/

chroot /mnt/

bash-4.2#  cat /etc/redhat-release
CentOS Linux release 7.5.1804 (Core)

It works!


Root Password

inside the chroot environment:

bash-4.2# passwd
Changing password for user root.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.

bash-4.2# exit



Adding the new grub entry for CentOS 7

title CentOS 7
        root (hd0,4)
        kernel /boot/vmlinuz-3.10.0-862.2.3.el7.x86_64 root=/dev/vda5 ro rhgb LANG=en_US.UTF-8
        initrd /boot/initramfs-3.10.0-862.2.3.el7.x86_64.img

by changing the default boot entry from 0 to 1:
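On a stock CentOS 6 install that means editing the grub legacy config, something like this (path assumed; entries are numbered from 0, so 1 selects the new CentOS 7 entry):

# /boot/grub/grub.conf (grub legacy on CentOS 6)
default=1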




our system will boot into CentOS 7 when rebooted!


May 23, 2018 08:28 PM

Concurrency in Go

In my quest to learn the Go language I am currently in the process of doing the Go Code Clinic. It’s taking me quite some time because instead of going through the solutions proposed in the course I try to implement a solution by myself; only when I have no idea whatsoever about how to proceed I peep into the solution to get some insight, and then work independently on my solution again.

The second problem in the Clinic is already at a non-trivial level: compare a number of images with a bigger image to check if any of them is a “clipping” of the bigger one. I confess that I would have a lot to read and work on even if I were trying to solve it in Perl!

It took some banging of my head against the wall till I eventually solved the problem. Unfortunately my program is single-threaded and the process of matching images is very expensive. For example, it took more than two hours to match a clipping sized 967×562 pixels with its “base” image sized 2048×1536. And for the whole time only one CPU thread was running at 100%; the others were barely used. If I really want to say that I solved the problem I must adapt the program to the available computing power by starting a number of subprocesses/threads (in our case: goroutines) to distribute the search across several CPU threads.

Since this was completely new to me in golang, I decided to experiment with a much simpler program: generate up to 100 random integers (say) between 0 and 10000 and run 8 workers to find if any of these random numbers is a multiple of a given number, for example 17. And of course the program must shut down gracefully, whether or not a multiple is found. This gave me a few problems to solve:

  • how do I start exactly 8 worker goroutines?
  • what’s the best way to pass them the numbers to check? what’s the best way for them to report back the result?
  • how do I tell them to stop when it’s time that they shut down?
  • how do I wait that they are actually shut down?

The result is the Go program that you can find in this gist. Assuming that it is good enough, you can use it as a skeleton for a program of yours, re-implementing the worker part and maybe the reaper part if a boolean response is not enough. Enjoy!
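In case it helps as a reference, here is a minimal sketch of one way to answer those four questions with channels and a sync.WaitGroup. It is not the code from the gist, just an illustration of the pattern; the constants and names are made up:

package main

import (
	"fmt"
	"math/rand"
	"sync"
	"time"
)

const (
	numWorkers = 8   // how many worker goroutines to start
	numNumbers = 100 // how many random numbers to generate
	divisor    = 17  // the number we look for multiples of
)

// worker receives numbers on jobs until the channel is closed and
// reports any multiple of divisor on results.
func worker(jobs <-chan int, results chan<- int, wg *sync.WaitGroup) {
	defer wg.Done()
	for n := range jobs {
		if n%divisor == 0 {
			results <- n
		}
	}
}

func main() {
	rand.Seed(time.Now().UnixNano())

	jobs := make(chan int)
	results := make(chan int, numNumbers) // buffered so workers never block on sends
	var wg sync.WaitGroup

	// Start exactly numWorkers goroutines.
	for i := 0; i < numWorkers; i++ {
		wg.Add(1)
		go worker(jobs, results, &wg)
	}

	// Pass the numbers to check over a channel; findings come back on another.
	for i := 0; i < numNumbers; i++ {
		jobs <- rand.Intn(10001)
	}

	// Closing the jobs channel tells the workers to shut down...
	close(jobs)
	// ...and the WaitGroup lets us wait until they actually have.
	wg.Wait()
	close(results)

	for n := range results {
		fmt.Printf("%d is a multiple of %d\n", n, divisor)
	}
}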

by bronto at May 23, 2018 03:34 PM

The Lone Sysadmin

Midnight is a Confusing Choice for Scheduling

Midnight is a poor choice for scheduling anything. Midnight belongs to tomorrow. It’s 0000 on the clock, which is the beginning of the next day. That’s not how humans think, though, because tomorrow is after we wake up! A great example is a statement like “proposals are due by midnight on April 15.” What you […]

The post Midnight is a Confusing Choice for Scheduling appeared first on The Lone Sysadmin. Head over to the source to read the full post!

by Bob Plankers at May 23, 2018 02:54 PM

Lightning Calendar, Google and “foreign” addresses

This is written to my older self, and to all those using Mozilla Thunderbird and the Lightning Calendar add-on with Google calendars who see this:


If you are seeing this, the solution is to change the setting to true:


and everything should work as expected:




by bronto at May 23, 2018 01:19 PM

Vincent Bernat

Multi-tier load-balancing with Linux

A common solution to provide a highly-available and scalable service is to insert a load-balancing layer to spread requests from users to backend servers.1 We usually have several expectations for such a layer:

  • It allows a service to scale by pushing traffic to newly provisioned backend servers. It should also be able to scale itself when it becomes the bottleneck.
  • It provides high availability to the service. If one server becomes unavailable, the traffic should be quickly steered to another server. The load-balancing layer itself should also be highly available.
  • It handles both short and long connections and is flexible enough to offer all the features generally expected from a load-balancer, like TLS termination or HTTP routing.
  • With some cooperation, any expected change should be seamless: rolling out new software on the backends, adding or removing backends, or scaling the load-balancing layer itself up or down.

The problem and its solutions are well known. Among recently published articles on the topic, “Introduction to modern network load-balancing and proxying” provides an overview of the state of the art. Google released “Maglev: A Fast and Reliable Software Network Load Balancer” describing their in-house solution in detail.2 However, the associated software is not available. Basically, building a load-balancing solution with commodity servers consists of assembling three components:

  • ECMP routing
  • stateless L4 load-balancing
  • stateful L7 load-balancing

In this article, I describe a multi-tier solution using Linux and only open-source components. It should give you the basis to build a production-ready load-balancing layer.

Update (2018.05)

Facebook just released Katran, an L4 load-balancer implemented with XDP and eBPF and using consistent hashing. It could be inserted in the configuration described below.

Last tier: L7 load-balancing🔗

Let’s start with the last tier. Its role is to provide high availability, by forwarding requests to only healthy backends, and scalability, by spreading requests fairly between them. Working in the highest layers of the OSI model, it can also offer additional services, like TLS termination, HTTP routing, header rewriting, rate-limiting of unauthenticated users, and so on. Being stateful, it can leverage complex load-balancing algorithms. Being the first point of contact with backend servers, it should ease maintenance and minimize impact during daily changes.

L7 load-balancers
The last tier of the load-balancing solution is a set of L7 load-balancers receiving user connections and forwarding them to the backends.

It also terminates client TCP connections. This introduces some loose coupling between the load-balancing components and the backend servers with the following benefits:

  • connections to servers can be kept open for lower resource use and latency,
  • requests can be retried transparently in case of failure,
  • clients can use a different IP protocol than servers, and
  • servers do not have to care about path MTU discovery, TCP congestion control algorithms, avoidance of the TIME-WAIT state and various other low-level details.

Many pieces of software would fit in this layer and ample literature exists on how to configure them. You could look at HAProxy, Envoy or Træfik. Here is a configuration example for HAProxy:

# L7 load-balancer endpoint
frontend l7lb
  # Listen on both IPv4 and IPv6
  bind :80 v4v6
  # Redirect everything to a default backend
  default_backend servers
  # Healthchecking
  acl dead nbsrv(servers) lt 1
  acl disabled nbsrv(enabler) lt 1
  monitor-uri /healthcheck
  monitor fail if dead || disabled

# IPv6-only servers with HTTP healthchecking and remote agent checks
backend servers
  balance roundrobin
  option httpchk
  server web1 [2001:db8:1:0:2::1]:80 send-proxy check agent-check agent-port 5555
  server web2 [2001:db8:1:0:2::2]:80 send-proxy check agent-check agent-port 5555
  server web3 [2001:db8:1:0:2::3]:80 send-proxy check agent-check agent-port 5555
  server web4 [2001:db8:1:0:2::4]:80 send-proxy check agent-check agent-port 5555

# Fake backend: if the local agent check fails, we assume we are dead
backend enabler
  server enabler [::1]:0 agent-check agent-port 5555

This configuration is the most incomplete piece of this guide. However, it illustrates two key concepts for operability:

  1. Healthchecking of the web servers is done both at HTTP-level (with check and option httpchk) and using an auxiliary agent check (with agent-check). The latter makes it easy to put a server into maintenance or to orchestrate a progressive rollout. On each backend, you need a process listening on port 5555 and reporting the status of the service (UP, DOWN, MAINT). A simple socat process can do the trick:3

    socat -ly \
      TCP6-LISTEN:5555,ipv6only=0,reuseaddr,fork \
      OPEN:/etc/lb/agent-check,rdonly
    Put UP in /etc/lb/agent-check when the service is in nominal mode. If the regular healthcheck is also positive, HAProxy will send requests to this node. When you need to put it in maintenance, write MAINT and wait for the existing connections to terminate. Use READY to cancel this mode.

  2. The load-balancer itself should provide a healthcheck endpoint (/healthcheck) for the upper tier. It will return a 503 error if either there are no backend servers available or if the enabler backend has been put down through the agent check. The same mechanism as for regular backends can be used to signal the unavailability of this load-balancer.

Additionally, the send-proxy directive enables the proxy protocol to transmit the real clients’ IP addresses. This protocol also works for non-HTTP connections and is supported by a variety of servers, including nginx:

http {
  server {
    listen [::]:80 default ipv6only=off proxy_protocol;
    root /var/www;
    set_real_ip_from ::/0;
    real_ip_header proxy_protocol;
  }
}

As is, this solution is not complete. We have just moved the availability and scalability problem somewhere else. How do we load-balance the requests between the load-balancers?

First tier: ECMP routing🔗

On most modern routed IP networks, redundant paths exist between clients and servers. For each packet, routers have to choose a path. When the cost associated with each path is equal, incoming flows4 are load-balanced among the available destinations. This characteristic can be used to balance connections among available load-balancers:

ECMP routing
ECMP routing is used as a first tier. Flows are spread among available L7 load-balancers. Routing is stateless and asymmetric. Backend servers are not represented.
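To make the idea concrete, here is what ECMP looks like when the router itself is a Linux box (a sketch only; the article assumes dedicated BGP-speaking routers, and all addresses below are documentation examples):

# equal-cost route to the service address via two load-balancers
ip route add 198.51.100.1/32 \
    nexthop via 192.0.2.11 weight 1 \
    nexthop via 192.0.2.12 weight 1

# hash flows on the full 5-tuple instead of only the IP pair (Linux 4.12+)
sysctl -qw net.ipv4.fib_multipath_hash_policy=1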

There is little control over the load-balancing, but ECMP routing brings the ability to scale both tiers horizontally. A common way to implement such a solution is to use BGP, a routing protocol to exchange routes between network equipment. Each load-balancer announces to its connected routers the IP addresses it is serving.

If we assume you already have BGP-enabled routers available, ExaBGP is a flexible solution to let the load-balancers advertise their availability. Here is a configuration for one of the load-balancers:

# Healthcheck for IPv6
process service-v6 {
  run python -m exabgp healthcheck -s --interval 10 --increase 0 --cmd "test -f /etc/lb/v6-ready -a ! -f /etc/lb/disable";
  encoder text;
}

template {
  # Template for IPv6 neighbors
  neighbor v6 {
    local-address 2001:db8::;
    local-as 65000;
    peer-as 65000;
    hold-time 6;
    family {
      ipv6 unicast;
    }
    api services-v6 {
      processes [ service-v6 ];
    }
  }
}

# First router
neighbor 2001:db8:: {
  inherit v6;
}

# Second router
neighbor 2001:db8:: {
  inherit v6;
}

If /etc/lb/v6-ready is present and /etc/lb/disable is absent, all the IP addresses configured on the lo interface will be announced to both routers. If the other load-balancers use a similar configuration, the routers will distribute incoming flows between them. Some external process should manage the existence of the /etc/lb/v6-ready file by checking for the healthiness of the load-balancer (using the /healthcheck endpoint for example). An operator can remove a load-balancer from the rotation by creating the /etc/lb/disable file.

To get more details on this part, have a look at “High availability with ExaBGP.” If you are in the cloud, this tier is usually implemented by your cloud provider, either using an anycast IP address or a basic L4 load-balancer.

Unfortunately, this solution is not resilient when an expected or unexpected change happens. Notably, when adding or removing a load-balancer, the number of available routes for a destination changes. The hashing algorithm used by routers is not consistent and flows are reshuffled among the available load-balancers, breaking existing connections:

Stability of ECMP routing 1/2
ECMP routing is unstable when a change happens. An additional load-balancer is added to the pool and the flows are routed to different load-balancers, which do not have the appropriate entries in their connection tables.

Moreover, each router may choose its own routes. When a router becomes unavailable, the second one may route the same flows differently:

Stability of ECMP routing 2/2
A router becomes unavailable and the remaining router load-balances its flows differently. One of them is routed to a different load-balancer, which does not have the appropriate entry in its connection table.

If you think this is not an acceptable outcome, notably if you need to handle long connections like file downloads, video streaming or websocket connections, you need an additional tier. Keep reading!

Second tier: L4 load-balancing🔗

The second tier is the glue between the stateless world of IP routers and the stateful land of L7 load-balancing. It is implemented with L4 load-balancing. The terminology can be a bit confusing here: this tier routes IP datagrams (no TCP termination) but the scheduler uses both destination IP and port to choose an available L7 load-balancer. The purpose of this tier is to ensure all members take the same scheduling decision for an incoming packet.

There are two options:

  • stateful L4 load-balancing with state synchronization across the members, or
  • stateless L4 load-balancing with consistent hashing.

The first option increases complexity and limits scalability. We won’t use it.5 The second option is less resilient during some changes but can be enhanced with a hybrid approach using a local state.

We use IPVS, a performant L4 load-balancer running inside the Linux kernel, with Keepalived, a frontend to IPVS with a set of healthcheckers to kick out an unhealthy component. IPVS is configured to use the Maglev scheduler, a consistent hashing algorithm from Google. Among its family, this is a great algorithm because it spreads connections fairly, minimizes disruptions during changes and is quite fast at building its lookup table. Finally, to improve performance, we let the last tier—the L7 load-balancers—send back answers directly to the clients without involving the second tier—the L4 load-balancers. This is referred to as direct server return (DSR) or direct routing (DR).

Second tier: L4 load-balancing
L4 load-balancing with IPVS and consistent hashing as a glue between the first tier and the third tier. Backend servers have been omitted. Dotted lines represent the path for the return packets.

With such a setup, we expect packets from a flow to be able to move freely between the components of the two first tiers while sticking to the same L7 load-balancer.


Assuming ExaBGP has already been configured as described in the previous section, let’s start with the configuration of Keepalived:

virtual_server_group VS_GROUP_MH_IPv6 {
  2001:db8:: 80
}

virtual_server group VS_GROUP_MH_IPv6 {
  lvs_method TUN  # Tunnel mode for DSR
  lvs_sched mh    # Scheduler: Maglev
  sh-port         # Use port information for scheduling
  protocol TCP
  delay_loop 5
  alpha           # All servers are down on start
  omega           # Execute quorum_down on shutdown
  quorum_up   "/bin/touch /etc/lb/v6-ready"
  quorum_down "/bin/rm -f /etc/lb/v6-ready"

  # First L7 load-balancer
  real_server 2001:db8:: 80 {
    weight 1
    HTTP_GET {
      url {
        path /healthcheck
        status_code 200
      }
      connect_timeout 2
    }
  }

  # Many others...
}

The quorum_up and quorum_down statements define the commands to be executed when the service becomes available and unavailable respectively. The /etc/lb/v6-ready file is used as a signal to ExaBGP to advertise the service IP address to the neighbor routers.

Additionally, IPVS needs to be configured to continue routing packets for a flow moved from another L4 load-balancer. It should also continue routing packets to destinations that have become unavailable, to ensure we can properly drain an L7 load-balancer.

# Schedule non-SYN packets
sysctl -qw net.ipv4.vs.sloppy_tcp=1
# Do NOT reschedule a connection when destination
# doesn't exist anymore
sysctl -qw net.ipv4.vs.expire_nodest_conn=0
sysctl -qw net.ipv4.vs.expire_quiescent_template=0

The Maglev scheduling algorithm will be available with Linux 4.18, thanks to Inju Song. For older kernels, I have prepared a backport.6 Use of source hashing as a scheduling algorithm will hurt the resilience of the setup.

DSR is implemented using the tunnel mode. This method is compatible with routed datacenters and cloud environments. Requests are tunneled to the scheduled peer using IPIP encapsulation. It adds a small overhead and may lead to MTU issues. If possible, ensure you are using a larger MTU for communication between the second and the third tier.7 Otherwise, it is better to explicitly allow fragmentation of IP packets:

sysctl -qw net.ipv4.vs.pmtu_disc=0

You also need to configure the L7 load-balancers to handle encapsulated traffic:8

# Setup IPIP tunnel to accept packets from any source
ip tunnel add tunlv6 mode ip6ip6 local 2001:db8::
ip link set up dev tunlv6
ip addr add 2001:db8:: dev tunlv6

Evaluation of the resilience🔗

As configured, the second tier increases the resilience of this setup for two reasons:

  1. The scheduling algorithm is using a consistent hash to choose its destination. Such an algorithm reduces the negative impact of expected or unexpected changes by minimizing the number of flows moving to a new destination. “Consistent Hashing: Algorithmic Tradeoffs” offers more details on this subject.

  2. IPVS keeps a local connection table for known flows. When a change impacts only the third tier, existing flows will be correctly directed according to the connection table.

If we add or remove an L4 load-balancer, existing flows are not impacted because each load-balancer makes the same decision, as long as they see the same set of L7 load-balancers:

L4 load-balancing instability 1/3
Losing an L4 load-balancer has no impact on existing flows. Each arrow is an example of a flow. The dots are flow endpoints bound to the associated load-balancer. If they had moved to another load-balancer, the connection would have been lost.

If we add an L7 load-balancer, existing flows are not impacted either, because only new connections will be scheduled to it. For existing connections, IPVS will look at its local connection table and continue to forward packets to the original destination. Similarly, if we remove an L7 load-balancer, only existing flows terminating at this load-balancer are impacted. Other existing connections will be forwarded correctly:

L4 load-balancing instability 2/3
Losing an L7 load-balancer only impacts the flows bound to it.

We need to have simultaneous changes on both levels to get a noticeable impact. For example, when adding both an L4 load-balancer and an L7 load-balancer, only connections moved to an L4 load-balancer without state and scheduled to the new load-balancer will be broken. Thanks to the consistent hashing algorithm, other connections will stay bound to the right L7 load-balancer. During a planned change, this disruption can be minimized by adding the new L4 load-balancers first, waiting a few minutes, then adding the new L7 load-balancers.

L4 load-balancing instability 3/3
Both an L4 load-balancer and an L7 load-balancer come back to life. The consistent hash algorithm ensures that only one fifth of the existing connections would be moved to the incoming L7 load-balancer. Some of them continue to be routed through their original L4 load-balancer, which mitigates the impact.

Additionally, IPVS correctly routes ICMP messages to the same L7 load-balancers as the associated connections. This notably ensures that path MTU discovery works and that there is no need for smart workarounds.

Tier 0: DNS load-balancing🔗

Optionally, you can add DNS load-balancing to the mix. This is useful if your setup spans multiple datacenters or multiple cloud regions, or if you want to break a large load-balancing cluster into smaller ones. It is not intended to replace the first tier as it doesn’t share the same characteristics: load-balancing is unfair (it is not flow-based) and recovery from a failure is slow.

Complete load-balancing solution
A complete load-balancing solution spanning two datacenters.

gdnsd is an authoritative-only DNS server with integrated healthchecking. It can serve zones from master files using the RFC 1035 zone format:

@ SOA ns1 1 7200 1800 259200 900
@ NS
@ NS
@ MX 10 smtp

@     60 DYNA multifo!web
www   60 DYNA multifo!web
smtp     A

The special RR type DYNA will return A and AAAA records after querying the specified plugin. Here, the multifo plugin implements an all-active failover of monitored addresses:

service_types => {
  web => {
    plugin => http_status
    url_path => /healthcheck
    down_thresh => 5
    interval => 5
  }
  ext => {
    plugin => extfile
    file => /etc/lb/ext
    def_down => false
  }
}

plugins => {
  multifo => {
    web => {
      service_types => [ ext, web ]
      addrs_v4 => [, ]
      addrs_v6 => [ 2001:db8::, 2001:db8:: ]
    }
  }
}

In nominal state, an A request will be answered with both IPv4 addresses. A healthcheck failure will update the returned set accordingly. It is also possible to administratively remove an entry by modifying the /etc/lb/ext file. For example, with the following content, the address marked DOWN will not be advertised anymore:

 => UP
 => DOWN
2001:db8::c633:6401 => UP
2001:db8::c633:6402 => UP

You can find all the configuration files and the setup of each tier in the GitHub repository. If you want to replicate this setup at a smaller scale, it is possible to collapse the second and the third tiers by using either localnode or network namespaces. Even if you don’t need its fancy load-balancing services, you should keep the last tier: while backend servers come and go, the L7 load-balancers bring stability, which translates to resiliency.

  1. In this article, “backend servers” are the servers behind the load-balancing layer. To avoid confusion, we will not use the term “frontend.” ↩︎

  2. A good summary of the paper is available from Adrian Colyer. From the same author, you may also have a look at the summary for “Stateless datacenter load-balancing with Beamer.” ↩︎

  3. If you feel this solution is fragile, feel free to develop your own agent. It could coordinate with a key-value store to determine the wanted state of the server. It is possible to centralize the agent in a single location, but you may get a chicken-and-egg problem to ensure its availability. ↩︎

  4. A flow is usually determined by the source and destination IP and the L4 protocol. Alternatively, the source and destination port can also be used. The router hashes this information to choose the destination. For Linux, you may find more information on this topic in “Celebrating ECMP in Linux.” ↩︎

  5. On Linux, it can be implemented by using Netfilter for load-balancing and conntrackd to synchronize state. IPVS only provides active/backup synchronization. ↩︎

  6. The backport is not strictly equivalent to its original version. Be sure to check the README file to understand the differences. Briefly, in Keepalived configuration, you should:

    • not use inhibit_on_failure
    • use sh-port
    • not use sh-fallback


  7. At least 1520 for IPv4 and 1540 for IPv6. ↩︎

  8. As is, this configuration is insecure. You need to ensure only the L4 load-balancers will be able to send IPIP traffic. ↩︎

by Vincent Bernat at May 23, 2018 08:59 AM

Chris Siebenmann

Almost no one wants to run their own infrastructure

Every so often, people get really enthused about the idea of a less concentrated, more distributed Internet, one where most of our email isn't inside only a few places, our online chatting happens over federated systems instead of Twitter, there are flower gardens of personal websites and web servers, there are lots of different Git servers instead of mostly Github, and so on. There are many obstacles in the way of this, including that the current large providers don't want to let people go, but over time I have come to think that a large underappreciated one is simply that people don't want to run their own infrastructure. Not even if it's free to do so.

I'm a professional system administrator. I know how to run my own mail and IMAP server, and I know that I probably should and will have to some day. Do I actually run my own server? Of course not. It's a hassle. I have things on Github, and in theory I could publish them outside Github too, on a machine where I'm already running a web server. Have I done so? No, it's not worth the effort when the payoff I'd get is basically only feeling good.

Now, I've phrased this as if running your own infrastructure is easy and the only thing keeping people from doing so is the minor effort and expense involved. We shouldn't underestimate the effects of even minor extra effort and expense, but the reality is that doing a good job of your own infrastructure is emphatically not a minor effort. There is security, TLS certificates, (offsite) backups, choosing the software, managing configuration, long term maintenance and updates, and I'm assuming that someone else has already built the system and you just have to set up an instance of it.

(And merely setting up an instance of something is often fraught with annoyance and problems, especially for a non-specialist.)

If you use someone else's infrastructure and they're decently good at it, they're worrying about all of that and more things on top (like monitoring, dealing with load surges and DDOSes, and fixing things in the dead of night). Plus, they're on the right side of the issues universities have with running their own email; many such centralized places are paying entire teams of hard-working good people to improve their services (or at least the ones that they consider strategic). I like open source, but it's fairly rare that it can compete head to head with something that a significant company considers a strategic product.

Can these problems be somewhat solved? Sure, but until we get much better 'computing as a utility' (if we ever do), a really usable solution is a single-vendor solution, which just brings us back to the whole centralization issue again. Maybe life is a bit better if we're all hosting our federated chat systems and IMAP servers and Git repo websites in the AWS cloud using canned one-click images, but it's not quite the great dream of a truly decentralized and democratic Internet.

(Plus, it still involves somewhat more hassle than using Github and Twitter and Google Mail, and I think that hassle really does matter. Convinced people are willing to fight a certain amount of friction, but to work, the dream of a decentralized Internet needs to reach even the people who don't really care.)

All of this leads me to the conclusion that any decentralized Internet future that imagines lots of people running their own infrastructure is dead on arrival. It's much more likely that any decentralized future will involve a fair amount of concentration, with many people choosing to use someone else's infrastructure and a few groups running most of it. This matters because running such a big instance for a bunch of people generally requires real money and thus some way of providing it. If there is no real funding model, the whole system is vulnerable to a number of issues.

(See, for example, Mastodon, which is fairly centralized in practice with quite a number of large instances, per the instance statistics.)

by cks at May 23, 2018 04:53 AM

May 22, 2018

Evaggelos Balaskas

Restrict email addresses for sending emails



Maintaining a (public) service can sometimes be troublesome. In the case of an email service, you often need to suspend or restrict users for reasons like SPAM, SCAM or Phishing. You have to deal with inactive or even compromised accounts. Protecting your infrastructure means protecting your active users and the service. In this article I’ll propose a way to restrict specific accounts from sending email, so that they get a bounce message explaining why their email was not sent.


Reading Material

The reference documentation when having a Directory Service (LDAP) as our user backend and using Postfix:




In this post, we will not get into openldap internals, but as a reference I’ll show an example user account (this is from my working test lab).


dn: uid=testuser2,ou=People,dc=example,dc=org
objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: inetOrgPerson
objectClass: posixAccount
smtpd_sender_restrictions: true
cn: Evaggelos Balaskas
sn: Balaskas
givenName: Evaggelos
uidNumber: 99
gidNumber: 12
uid: testuser2
homeDirectory: /storage/vhome/%d/%n
userPassword: XXXXXXXXXX

as you can see, we have a custom ldap attribute:

smtpd_sender_restrictions: true

keep that in mind for now.



The default value of smtpd_sender_restrictions is empty, which means by default the mail server has no sender restrictions. Depending on the policy, we can either whitelist or blacklist in Postfix restrictions; for the purpose of this blog post, we will only restrict (blacklist) specific user accounts.



To do that, let’s create a new file that will talk to our openldap and ask for that specific ldap attribute.

server_host = ldap://localhost
server_port = 389
search_base = ou=People,dc=example,dc=org
query_filter = (&(smtpd_sender_restrictions=true)(mail=%s))
result_attribute = uid
result_filter = uid
result_format = REJECT This account is not allowed to send emails, plz talk to
version = 3
timeout = 5

This is an anonymous bind, as we do not search for any special attribute like password.


Status Codes

The default status code will be: 554 5.7.1
Take a look here for more info: RFC 3463 - Enhanced Mail System Status Codes


Test it

# postmap -q ldap:/etc/postfix/
REJECT This account is not allowed to send emails, plz talk to

Add -v to increase verbosity

# postmap -v -q ldap:/etc/postfix/


Possible Errors

postmap: fatal: unsupported dictionary type: ldap

Check your Postfix setup with postconf -m. The result should be something like this:
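(This is an assumed example from a build with LDAP support; the exact list varies by version and build, but ldap must appear in it.)

btree
cidr
environ
hash
internal
ldap
nis
proxy
regexp
static
unix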


If not, you need to set up Postfix to support the ldap dictionary type.



Modify the Postfix main.cf to add the new sender restriction:

# applied in the context of the MAIL FROM
smtpd_sender_restrictions =
        check_sender_access ldap:/etc/postfix/

and reload postfix

# postfix reload

If you keep logs, tail them to see any errors.






May 19 13:20:26 centos6 postfix/smtpd[20905]:
NOQUEUE: reject: RCPT from XXXXXXXX[XXXXXXXX]: 554 5.7.1 <>:
Sender address rejected: This account is not allowed to send emails, plz talk to;
from=<> to=<> proto=ESMTP helo=<[]>

May 22, 2018 05:12 PM


Database schemas vs. identity

Yesterday brought this tweet up:

This is amazingly bad wording, and is the kind of thing that made the transpeople in my timeline (myself included) go "Buwhuh?" and made me wonder if this was a Snopes-worthy story.

No, actually.

The key phrase here is, "submit your prints for badges".

There are two things you should know:

  1. NASA works on National Security related things, which requires a security clearance to work on, and getting one of those requires submitting prints.
  2. The FBI is the US Government's authority in handling biometric data

Here is a chart from the Electronic Biometric Transmission Specification, which describes a kind of API for dealing with biometric data.

If Following Condition Exists                                           Enter Code
Subject's gender reported as female                                     F
Occupation or charge indicated "Male Impersonator"                      G
Subject's gender reported as male                                       M
Occupation or charge indicated "Female Impersonator" or transvestite    N
Male name, no gender given                                              Y
Female name, no gender given                                            Z
Unknown gender                                                          X

Source: EBTS Version 10.0 Final, page 118.

Yep, it really does use the term "Female Impersonator". To a transperson living in 2016 getting their first Federal job (even as a contractor), running into these very archaic terms is extremely off-putting.

As someone said in a private channel:

This looks like some 1960's bureaucrat trying to be 'inclusive'

This is not far from the truth.

This table exists unchanged in the 7.0 version of the document, dated January 1999. Previous versions are in physical binders somewhere, and not archived on the Internet; but the changelog for the V7 document indicates that this wording was in place as early as 1995. Mention is also made of being interoperable with UK law-enforcement.

The NIST standard for fingerprints issued in 1986 mentions a SEX field, but it only has M, F, and U; later NIST standards drop this field definition entirely.

As this field was defined in a standard over 20 years ago and has not changed, is used across the full breadth of the US justice system, is referenced in international communications standards including visa travel, and is used as the basis for US military standards, these field definitions are effectively immutable and will only change after concerted effort over decades.

This is what institutionalized transphobia looks like, and we will be dealing with it for another decade or two. If not longer.

The way to deal with this is to deprecate the codes in documentation, but still allow them as valid.

  • Create a deprecation notice in the definition of the field saying that the G and N values are to be considered deprecated and should not be used.
  • In the deprecation notice say that in the future, new records will not be accepted with those values.
  • Those values will remain valid for queries, because there are decades of legacy-coding in databases using them.

The failure-mode of this comes in with form designers who look at the spec and build forms based on it, like this example from Maryland. Which means we need to let the forms designers know that the spec needs to be selectively ignored. The deprecation notice does that.

At the local level, convince your local City Council to pass resolutions to modernize their Police forms to reflect modern sensibilities, and drop the G and N codes from intake forms. Do this at the County too, for the Sheriff's department.

At the state level, convince your local representatives to push resolutions to get the State Patrol to modernize their forms likewise. Drop the G and N codes from the forms.

At the Federal employee level, there is less to be done here as you're closer to the governing standards, but you may be able to convince The Powers That Be to drop the two offensive checkboxes or items from the drop-down list.

At the Federal standard level, lobby the decision makers that govern this standard and push for a deprecation notice. If any of your congress-people are on any Judiciary committees, you'll have more luck than most.

by SysAdmin1138 at May 22, 2018 03:35 PM

May 21, 2018

Steve Kemp's Blog

This month has been mostly golang-based

This month has mostly been about golang. I've continued work on the protocol-tester that I recently introduced:

This has turned into a fun project, and now all my monitoring is done with it. I've simplified the operation, such that everything uses Redis for storage, and there are now new protocol-testers for finger, nntp, and more.

Sample tests are as basic as this:

   must run smtp
   must run smtp with port 587
   must run imaps
   must run http with content 'Prayer Webmail service'

Results are stored in a redis-queue, where they can be picked off and announced to humans via a small daemon. In my case alerts are routed to a central host via HTTP-POSTs, and eventually reach me via the Pushover service.

Beyond the basic network testing though I've also reworked a bunch of code - so the markdown sharing site is now golang powered, rather than running on the previous perl-based code.

As a result of this rewrite, and a little more care, I now score 99/100 + 100/100 on Google's pagespeed testing service. A few more of my sites do the same now, thanks to inline-CSS, inline-JS, etc. Nothing I couldn't have done before, but this was a good moment to attack it.

Finally my "silly" Linux security module, for letting user-space decide if binaries should be executed, can-exec has been forward-ported to v4.16.17. No significant changes.

Over the coming weeks I'll be trying to move more stuff into the cloud, rather than self-hosting. I'm doing a lot of trial-and-error at the moment with Lambdas, containers, and dynamic-routing to that end.

Interesting times.

May 21, 2018 09:01 AM

May 20, 2018

Errata Security

masscan, macOS, and firewall

One of the more useful features of masscan is the "--banners" check, which connects to the TCP port, sends some request, and gets a basic response back. However, since masscan has its own TCP stack, it'll interfere with the operating system's TCP stack if they are sharing the same IPv4 address. The operating system will reply with a RST packet before the TCP connection can be established.

The way to fix this is to use the built-in packet-filtering firewall to block those packets in the operating-system TCP/IP stack. The masscan program still sees everything before the packet-filter, but the operating system can't see anything after the packet-filter.

Note that we are talking about the "packet-filter" firewall feature here. Remember that macOS, like most operating systems these days, has two separate firewalls: an application firewall and a packet-filter firewall. The application firewall is the one you see in System Settings labeled "Firewall", and it controls things based upon the application's identity rather than by which ports it uses. This is normally "on" by default. The packet-filter is normally "off" by default and is of little use to normal users.

Also note that macOS changed packet-filters around version 10.10.5 ("Yosemite", October 2014). The older one is known as "ipfw", which was the default firewall for FreeBSD (much of macOS is based on FreeBSD). The replacement is known as PF, which comes from OpenBSD. Whereas you used to use the old "ipfw" command on the command line, you now use the "pfctl" command, as well as the "/etc/pf.conf" configuration file.

What we need to filter is the source port of the packets that masscan will send, so that when replies are received, they won't reach the operating-system stack, and will just go to masscan instead. To do this, we need to find a range of ports that won't conflict with the operating system. Namely, when the operating system creates outgoing connections, it randomly chooses a source port within a certain range. We want masscan to use source ports in a different range.

To figure out the range macOS uses, we run the following command:

sysctl net.inet.ip.portrange.first net.inet.ip.portrange.last

On my laptop, which is probably the default for macOS, I get the following range. Sniffing with Wireshark confirms this is the range used for source ports for outgoing connections.

net.inet.ip.portrange.first: 49152
net.inet.ip.portrange.last: 65535

So this means I shouldn't use source ports anywhere in the range 49152 to 65535. On my laptop, I've decided to use for masscan the ports 40000 to 41023. The range masscan uses must be a power of 2, so here I'm using 1024 (two to the tenth power).

To configure masscan, I can either type the parameter "--source-port 40000-41023" every time I run the program, or I can add the following line to /etc/masscan/masscan.conf. Remember that by default, masscan will look in that configuration file for any configuration parameters, so you don't have to keep retyping them on the command line.

source-port = 40000-41023

Next, I need to add the following firewall rule to the bottom of /etc/pf.conf:

block in proto tcp from any to any port 40000 >< 41024

However, we aren't done yet. By default, the packet-filter firewall is off. Therefore, every time you reboot your computer, you need to enable it. The simple way to do this is to run the following on the command line:

pfctl -e

Or, if that doesn't work, try:

pfctl -E
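To double-check that the rule is actually in effect, you can reload the ruleset from the file and list the active rules. This is a minimal sketch, assuming the default /etc/pf.conf location:

pfctl -f /etc/pf.conf
pfctl -sr | grep 40000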

Ideally, you'd want it to start automatically on bootup. I haven't figured out how to do this on macOS in an approved fashion that doesn't conflict with something else. Apparently there are a few GUIs that will do this for you.
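With the configuration file and the pf rule in place, a banner scan looks the same as any other masscan run; here is a minimal sketch against a hypothetical target range:

masscan 10.0.0.0/24 -p80,443 --banners --rate 1000

The source-port setting is picked up from /etc/masscan/masscan.conf automatically, so it doesn't need to be repeated on the command line.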

by Robert Graham at May 20, 2018 09:15 PM

May 18, 2018

HP-UX 11.31 System information & find out part number of a failed disk with sasmgr

On one of my regularly scheduled datacenter visits, one of the older HP-UX Itanium machines had an orange light on the front. These systems are not (yet) monitored, but still in use, so the disk had to be replaced. Not knowing anything about this system or which parts were used, I managed to find the exact part number and device type so we could order a spare. This small guide uses sasmgr to get the data on HP-UX 11.31.

May 18, 2018 12:00 AM

May 17, 2018

Cryptography Engineering

Was the Efail disclosure horribly screwed up?

TL;DR. No. Or keep reading if you want.

On Monday a team of researchers from Münster, RUB and NXP disclosed serious cryptographic vulnerabilities in a number of encrypted email clients. The flaws, which go by the cute vulnerability name of “Efail”, potentially allow an attacker to decrypt S/MIME or PGP-encrypted email with only minimal user interaction.

By the standards of cryptographic vulnerabilities, this is about as bad as things get. In short: if an attacker can intercept and alter an encrypted email — say, by sending you a new (altered) copy, or modifying a copy stored on your mail server — they can cause many GUI-based email clients to send the full plaintext of the email to an attacker controlled-server. Even worse, most of the basic problems that cause this flaw have been known for years, and yet remain in clients.

The big (and largely under-reported) story of EFail is the way it affects S/MIME. That “corporate” email protocol is simultaneously (1) hated by the general crypto community because it’s awful and has a slash in its name, and yet (2) is probably the most widely-used email encryption protocol in the corporate world. The table at the right — excerpted from the paper — gives you a flavor of how Efail affects S/MIME clients. TL;DR it affects them very badly.

Efail also happens to affect a smaller, but non-trivial number of OpenPGP-compatible clients. As one might expect (if one has spent time around PGP-loving folks) the disclosure of these vulnerabilities has created something of a backlash on HN, and among people who make and love OpenPGP clients. Mostly for reasons that aren’t very defensible.

So rather than write about fun things — like the creation of CFB and CBC gadgets — today, I’m going to write about something much less exciting: the problem of vulnerability disclosure in ecosystems like PGP. And how bad reactions to disclosure can hurt us all.

How Efail was disclosed to the PGP community

Putting together a comprehensive timeline of the Efail disclosure process would probably be a boring, time-intensive project. Fortunately Thomas Ptacek loves boring and time-intensive projects, and has already done this for us.

Briefly, the first Efail disclosures to vendors began last October, more than 200 days prior to the agreed publication date. The authors notified a large number of vulnerable PGP GUI clients, and also notified the GnuPG project (on which many of these projects depend) by February at the latest. From what I can tell every major vendor agreed to make some kind of patch. GnuPG decided that it wasn’t their fault, and basically stopped corresponding.

All parties agreed not to publicly discuss the vulnerability until an agreed date in April, which was later pushed back to May 15. The researchers also notified the EFF and some journalists under embargo, but none of them leaked anything. On May 14 someone dumped the bug onto a mailing list. So the EFF posted a notice about the vulnerability (which we’ll discuss a bit more below), and the researchers put up a website. That’s pretty much the whole story.

There are three basic accusations going around about the Efail disclosure. They can be summarized as (1) maintaining embargoes in coordinated disclosures is really hard, (2) the EFF disclosure “unfairly” made this sound like a serious vulnerability “when it isn’t”, and (3) everything was already patched anyway so what’s the big deal.

Disclosures are hard; particularly coordinated ones

I’ve been involved in two disclosures of flaws in open encryption protocols. (Both were TLS issues.) Each one poses an impossible dilemma. You need to simultaneously (a) make sure every vendor has as much advance notice as possible, so they can patch their software. But at the same time (b) you need to avoid telling literally anyone, because nothing on the Internet stays secret. At some point you’ll notify some FOSS project that uses an open development mailing list or ticket server, and the whole problem will leak out into the open.

Disclosing bugs that affect PGP is particularly fraught. That’s because there’s no such thing as “PGP”. What we have instead is a large and distributed community that revolves around the OpenPGP protocol. The pillar of this community is the GnuPG project, which maintains the core GnuPG tool and libraries that many clients rely on. Then there are a variety of niche GUI-based clients and email plugin projects. Finally, there are commercial vendors like Apple and Microsoft. (Who are mostly involved in the S/MIME side of things, and may reluctantly allow PGP plugins.)

Then, of course there are thousands of end-users, who will generally fail to update their software unless something really bad and newsworthy happens.

The obvious solution to the disclosure problem is to use a staged disclosure. You notify the big commercial vendors first, since that’s where most of the affected users are. Then you work your way down the “long tail” of open source projects, knowing that inevitably the embargo could break and everyone will have to patch in a hurry. And you keep in mind that no matter what happens, everyone will blame you for screwing up the disclosure.

For the PGP issues in Efail, the big client vendors are Mozilla (Thunderbird), Microsoft (Outlook) and maybe Apple (Mail). The very next obvious choice would be to patch the GnuPG tool so that it no longer spits out unauthenticated plaintext, which is the root of many of the problems in Efail.

The Efail team appears to have pursued exactly this approach for the client-side vulnerabilities. Sadly, the GnuPG team made the decision that it’s not their job to pre-emptively address problems that they view as ‘clients misusing the GnuPG API’ (my paraphrase), even when that misuse appears to be rampant across many of the clients that use their tool. And so the most obvious fix for one part of the problem was not available.

This is probably the most unfortunate part of the Efail story, because in this case GnuPG is very much at fault. Their API does something that directly violates cryptographic best practices — namely, releasing unauthenticated plaintext prior to producing an error message. And while this could be understood as a reasonable API design at design time, continuing to support this API even as clients routinely misuse it has now led to flaws across the ecosystem. The refusal of GnuPG to take a leadership role in preemptively safeguarding these vulnerabilities both increases the difficulty of disclosing these flaws, and increases the probability of future issues.

So what went wrong with the Efail disclosure?

Despite what you may have heard, given the complexity of this disclosure, very little went wrong. The main issues people have raised seem to have to do with the contents of an EFF post. And with some really bad communications from Robert J. Hansen at the Enigmail (and GnuPG) project.

The EFF post. The Efail researchers chose to use the Electronic Frontier Foundation as their main source for announcing the existence of the vulnerability to the privacy community. This hardly seems unreasonable, because the EFF is generally considered a trusted broker, and speaks to the right community (at least here in the US).

The EFF post doesn’t give many details, nor does it give a list of affected (or patched) clients. It does give two pretty mild recommendations:

  1. Temporarily disable or uninstall your existing clients until you’ve checked that they’re patched.
  2. Maybe consider using a more modern cryptosystem like Signal, at least until you know that your PGP client is safe again.

This naturally led to a huge freakout by many in the PGP community. Some folks, including vendors, have misrepresented the EFF post as essentially pushing people to “permanently” uninstall PGP, which will “put lives at risk” because presumably these users (whose lives are at risk, remember) will immediately fall back to sending incriminating information via plaintext emails — rather than temporarily switching their communications to one of several modern, well-studied secure messengers, or just not emailing for a few hours.

In case you think I’m exaggerating about this, here’s one reaction from ProtonMail:

The most reasonable criticism I’ve heard of the EFF post is that it doesn’t give many details about which clients are patched, and which are vulnerable. This could presumably give someone the impression that this vulnerability is still present in their email client, and thus would cause them to feel less than secure in using it.

I have to be honest that to me that sounds like a really good outcome. The problem with Efail is that it doesn’t matter if your client is secure. The Efail vulnerability could affect you if even a single one of your communication partners is using an insecure client.

So needless to say I’m not very sympathetic to the reaction around the EFF post. If you can’t be sure whether your client is secure, you probably should feel insecure.

Bad communications from GnuPG and Enigmail. On the date of the disclosure, anyone looking for accurate information about security from two major projects — GnuPG and Enigmail — would not have been able to find it.

They wouldn’t have found it because developers from both Enigmail and GnuPG were on mailing lists and Twitter claiming that they had never heard of Efail, and hadn’t been notified by the researchers. Needless to say, these allegations took off around the Internet, sometimes in place of real information that could have helped users (like, whether either project had patched.)

It goes without saying that neither allegation was actually true. In fact, both project members soon checked with their fellow developers (and their memories) and found out that they’d both been given months of notice by the researchers, and that Enigmail had even developed a patch. (However, it turned out that even this patch may not perfectly address the issue, and the community is still working to figure out exactly what still needs to be done.)

This is an understandable mistake, perhaps. But it sure is a bad one.

PGP is bad technology and it’s making a bad community

Now that I’ve made it clear that neither the researchers nor the EFF is out to get the PGP community, let me put on my mask and horns and tell you why someone should be.

I’ve written extensively about PGP on this blog, but in the past I’ve written mostly from a technical point of view about the problems with PGP. But what’s really problematic about PGP is not just the cryptography; it’s the story it tells about path dependence and how software communities work.

The fact of the matter is that OpenPGP is not really a cryptography project. That is, it’s not held together by cryptography.  It’s held together by backwards-compatibility and (increasingly) a kind of an obsession with the idea of PGP as an end in and of itself, rather than as a means to actually make end-users more secure.

Let’s face it, as a protocol, PGP/OpenPGP is just not what we’d develop if we started over today. It was formed over the years out of mostly experimental parts, which were in turn replaced, bandaged and repaired — and then worked into numerous implementations, which all had to be insanely flexible and yet compatible with one another. The result is bad, and most of the software implementing it is worse. It’s the equivalent of a beloved antique sports car, where the electrical system is totally shot, but it still drives. You know, the kind of car where the owner has to install a hand-switch so he can turn the reverse lights on manually whenever he wants to pull out of a parking space.

If PGP went away, I estimate it would take the security community less than a year to entirely replace (the key bits of) the standard with something much better and modern. It would have modern crypto and authentication, and maybe even extensions for future post-quantum security. It would be simple. Many bright new people would get involved to help write the inevitable Rust, Go and Javascript clients and libraries.

Unfortunately for us all, (Open)PGP does exist. And that means that even fancy greenfield email projects feel like they need to support OpenPGP, or at least some subset of it. This in turn perpetuates the PGP myth, and causes other clients to use it. And as a direct result, even if some clients re-implement OpenPGP from scratch, other clients will end up using tools like GnuPG which will support unauthenticated encryption with bad APIs. And the cycle will go round and around, like a spaceship stuck near the event horizon of a black hole.

And as the standard perpetuates itself, largely for the sake of being a standard, it will fail to attract new security people. It will turn away exactly the type of people who should be working on these tools. Those people will go off and build encryption systems in a totally different area, or they’ll get into cryptocurrency. And — with some exceptions — the people who work in the community will increasingly work in that community because they’re supporting PGP, and not because they’re trying to seek out the best security technologies for their users. And the serious (email) users of PGP will be using it because they like the idea of using PGP better than they like using an actual, secure email standard.

And as things get worse, and fail to develop, people who work on it will become more dogmatic about its importance, because it’s something threatened and not a real security protocol that anyone’s using. To me that’s where PGP is going today, and that is why the community has such a hard time motivating itself to take these vulnerabilities seriously, and instead reacts defensively.

Maybe that’s a random, depressing way to end a post. But that’s the story I see in OpenPGP. And it makes me really sad.

by Matthew Green at May 17, 2018 07:06 PM

May 16, 2018


Changing the Guiding Principles in Our Security Policy

“That we remove “We strongly believe that the right to advance patches/info should not be based in any way on paid membership to some forum. You can not pay us to get security patches in advance.” from the security policy and Mark posts a blog entry to explain the change including that we have no current such service.”

At the OpenSSL Management Committee meeting earlier this month we passed the vote above to remove a section of our security policy. Part of that vote was that I would write this blog post to explain why we made this change.

At each face to face meeting we aim to ensure that our policies still match the view of the current membership committee at that time, and will vote to change those that don’t.

Prior to 2018 our Security Policy used to contain a lot of background information on why we selected the policy we did, justifying it and adding lots of explanatory detail. We included details of things we’d tried before and things that worked and didn’t work to arrive at our conclusion. At our face to face meeting in London at the end of 2017 we decided to remove a lot of the background information and stick to explaining the policy simply and concisely. I split out what were the guiding principles from the policy into their own list.

OpenSSL has some full-time fellows who are paid from various revenue sources coming into OpenSSL including sponsorship and support contracts. We’ve discussed having the option in the future to allow us to share patches for security issues in advance to these support contract customers. We already share serious issues a little in advance with some OS vendors (and this is still a principle in the policy to do so), and this policy has helped ensure that the patches and advisory get an extra level of testing before being released.

Thankfully there are relatively few serious issues in OpenSSL these days; the last one worse than Moderate severity was in February 2017.

In the vote text we wrote that we have “no current such service” and neither do we have any plan right now to create such a service. But we allow ourselves to consider such a possibility in the future now that this principle, which no longer represents the view of the OMC, is removed.

May 16, 2018 09:00 PM

Icinga2 / Nagios / Net::SNMP change the default timeout of 60 seconds

Recently a rather large amount of new infrastructure was added to one of my monitoring instances. It is monitored exclusively via SNMP, but not over the fastest network or infrastructure. The SNMP checks in the Icinga2 instance started giving timeouts, which looked like false positives and cluttered the logs. Raising the SNMP timeout for the checks above 60 seconds was not that easy since the 60 second timeout is hardcoded in the underlying library (Net::SNMP). This article shows you how to raise that timeout on an Ubuntu 16.04 system.

May 16, 2018 12:00 AM

May 15, 2018


Bejtlich Joining Splunk

Since posting Bejtlich Moves On I've been rebalancing work, family, and personal life. I invested in my martial arts interests, helped more with home duties, and consulted through TaoSecurity.

Today I'm pleased to announce that, effective Monday May 21st 2018, I'm joining the Splunk team. I will be Senior Director for Security and Intelligence Operations, reporting to our CISO, Joel Fulton. I will help build teams to perform detection and monitoring operations, digital forensics and incident response, and threat intelligence. I remain in the northern Virginia area and will align with the Splunk presence in Tyson's Corner.

I'm very excited by this opportunity for four reasons. First, the areas for which I will be responsible are my favorite aspects of security. Long-time blog readers know I'm happiest detecting and responding to intruders! Second, I already know several people at the company, one of whom began this journey by Tweeting about opportunities at Splunk! These colleagues are top notch, and I was similarly impressed by the people I met during my interviews in San Francisco and San Jose.

Third, I respect Splunk as a company. I first used the products over ten years ago, and when I tried them again recently they worked spectacularly, as I expected. Fourth, my new role allows me to be a leader in the areas I know well, like enterprise defense and digital operational art, while building understanding in areas I want to learn, like cloud technologies, DevOps, and security outside enterprise constraints.

I'll have more to say about my role and team soon. Right now I can share that this job focuses on defending the Splunk enterprise and its customers. I do not expect to spend a lot of time in sales cycles. I will likely host visitors in the Tyson's areas from time to time. I do not plan to speak as much with the press as I did at Mandiant and FireEye. I'm pleased to return to operational defense, rather than advise on geopolitical strategy.

If this news interests you, please check our open job listings in information technology. As a company we continue to grow, and I'm thrilled to see what happens next!

by Richard Bejtlich at May 15, 2018 06:40 PM

May 14, 2018

Remote Desktop error: CredSSP encryption oracle remediation


A while back, Microsoft announced it would ship updates to both its RDP client & server components to resolve a critical security vulnerability. That rollout is now happening and many clients have received auto-updates for their client.

As a result, you might see this message/error when connecting to an unpatched Windows server:

It refers to CredSSP updates for CVE-2018-0886, which further explains the vulnerability and why it's been patched now.

But here's the catch: if your client is updated but your server isn't (yet), you can no longer RDP to that machine. Here are a couple of fixes:

  1. Find an old computer/RDP client to connect with
  2. Get console access to the server to run the updates & reboot the machine

If your client has been updated, there's no way to connect to an unpatched Windows server via Remote Desktop anymore.


by Mattias Geniar at May 14, 2018 08:21 AM

May 13, 2018

Evaggelos Balaskas




One of the most common security concerns (especially when traveling) is the attachment of unknown USB devices to our systems.

There are a few ways on how to protect your system.


Hardware Protection


Cloud Storage

More and more companies are now moving from local storage to cloud storage as a way to reduce the attack surface on systems:

A few days ago, IBM banned portable storage devices.


Hot Glue on USB Ports

Also, we must not forget the old but powerful advice from security researchers & hackers:

insert glue, or use a hot glue gun, to disable the USB ports of a system.

Problem solved!



I was reading the Red Hat 7.5 release notes and I came upon usbguard:



The USBGuard software framework helps to protect your computer against rogue USB devices (a.k.a. BadUSB) by implementing basic whitelisting / blacklisting capabilities based on device attributes.


USB protection framework

So the main idea is that you run a daemon on your system that tracks USB devices via the udev monitor. The idea seems like the USB kill switch, but in a more controlled manner. You can dynamically whitelist and/or blacklist devices and change the policy on such devices more easily. You can also do all of that via a graphical interface, although I will not cover it here.


Archlinux Notes

for archlinux users, you can find usbguard in AUR (Archlinux User Repository)

AUR : usbguard

or you can try my custom PKGBUILDs files


How to use usbguard

Generate Policy

The very first thing is to generate a policy with the currently attached USB devices.

sudo usbguard generate-policy

Below is example output, showing my USB mouse & USB keyboard:

allow id 17ef:6019 serial "" name "Lenovo USB Optical Mouse" hash "WXaMPh5VWHf9avzB+Jpua45j3EZK6KeLRdPcoEwlWp4=" parent-hash "jEP/6WzviqdJ5VSeTUY8PatCNBKeaREvo2OqdplND/o=" via-port "3-4" with-interface 03:01:02

allow id 045e:00db serial "" name "Natural® Ergonomic Keyboard 4000" hash "lwGc9o+VaG/2QGXpZ06/2yHMw+HL46K8Vij7Q65Qs80=" parent-hash "kv3v2+rnq9QvYI3/HbJ1EV9vdujZ0aVCQ/CGBYIkEB0=" via-port "1-1.5" with-interface { 03:01:01 03:00:00 }

The default policy for already attached USB devices is allow.


We can create our rules configuration file by:

sudo usbguard generate-policy > /etc/usbguard/rules.conf
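You can also append your own rules to that file to tighten the policy. As a minimal sketch, a rule that rejects any device exposing a USB mass-storage interface (interface class 08) might look like this:

block with-interface equals { 08:*:* }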



Start and enable the usbguard service via systemd:

systemctl start usbguard.service

systemctl enable usbguard.service


List of Devices

You can view the list of attached USB devices and their policy:

sudo usbguard list-devices


Allow Device

Attaching a new USB device (in my case, my mobile phone):

$ sudo usbguard list-devices | grep -v allow

we will see that the default policy is to block it:

17: block id 12d1:107e serial "7BQDU17308005969" name "BLN-L21" hash "qq1bdaK0ETC/thKW9WXAwawhXlBAWUIowpMeOQNGQiM=" parent-hash "kv3v2+rnq9QvYI3/HbJ1EV9vdujZ0aVCQ/CGBYIkEB0=" via-port "2-1.5" with-interface { ff:ff:00 08:06:50 }

So we can allow it by:

sudo usbguard allow-device 17


sudo usbguard list-devices | grep BLN-L21

we can verify that it is now allowed:

17: allow id 12d1:107e serial "7BQDU17308005969" name "BLN-L21" hash "qq1bdaK0ETC/thKW9WXAwawhXlBAWUIowpMeOQNGQiM=" parent-hash "kv3v2+rnq9QvYI3/HbJ1EV9vdujZ0aVCQ/CGBYIkEB0=" via-port "2-1.5" with-interface { ff:ff:00 08:06:50 }
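Note that allow-device only changes the runtime state by default. If memory serves, adding the -p flag also makes the decision permanent by appending it to the policy; check usbguard --help on your version before relying on it:

sudo usbguard allow-device 17 -p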


Block USB on screen lock

The policy applied when you (or someone else) insert a new USB device is controlled by the InsertedDevicePolicy parameter:

sudo usbguard get-parameter InsertedDevicePolicy

By default this is apply-policy, meaning our rules file is applied. There is a way to block or reject any new USB device while your screen locker is on, as this may be a potential security attack on your system. In theory, you insert USB devices while you are working on your system, not while your screen is locked.

I use slock as my primary screen locker, via a keyboard shortcut. So the easiest way to dynamically change the default policy on usbguard is via a shell wrapper:

vim /usr/local/bin/slock

#!/bin/sh
# ebal, Sun, 13 May 2018 10:07:53 +0300
# wrapper around the real slock: reject new USB devices while the screen is locked

POLICY_LOCKED="reject"
POLICY_UNLOCKED="apply-policy"

# function to revert the policy
revert() {
  usbguard set-parameter InsertedDevicePolicy ${POLICY_UNLOCKED}
}

# reject anything inserted while the screen is locked
usbguard set-parameter InsertedDevicePolicy ${POLICY_LOCKED}

# run the actual screen locker
/usr/bin/slock

# shell function to revert reject policy
revert

(you can find the same example on redhat’s blog post).

May 13, 2018 06:42 PM

Multiple passwords for one user, UIC uniqueness and the system password on OpenVMS

In the book I bought about OpenVMS for the previous post on filesystems, 'Getting Started with OpenVMS by M. Duffy', I've read a few interesting things in the chapter that introduces user accounts and system login. Namely that a user can have multiple passwords, that user IDs are not unique and that there can be a system password. This article goes into those three topics.

May 13, 2018 12:00 AM

May 08, 2018

Sean's IT Blog

VMware Horizon and Horizon Cloud Enhancements – Part 1

This morning, VMware announced enhancements to both the on-premises Horizon Suite and Horizon Cloud product sets.  Although there are a lot of additions to all products in the Suite, the VMware blog post did not go too in-depth into many of the new features that you’ll be seeing in the upcoming editions.

VMware Horizon 7.5

Let’s start with the biggest news in the blog post – the announcement of Horizon 7.5.  Horizon 7.5 brings several new, long-awaited, features with it.  Some of these features are:

  1. Support for Horizon on VMC (VMware on AWS)
  2. The “Just-in-Time” Management Platform (JMP)
  3. Horizon 7 Extended Service Branch (ESB)
  4. Instant Clone improvements, including support for the new vSphere 6.7 Instant Clone APIs
  5. Support for IPv4/IPv6 Mixed-Mode Operations
  6. Cloud-Pod Architecture support for 200K Sessions
  7. Support for Windows 10 Virtualization-Based Security (VBS) and vTPM on Full Clone Desktops
  8. RDSH Host-based GPO Support for managing protocol settings

I’m not going to touch on all of these items.  I think the first four are the most important for this portion of the suite.

Horizon on VMC

Horizon on VMC is a welcome addition to the Horizon portfolio.  Unlike Citrix, the traditional VMware Horizon product has not had a good cloud story because it has been tightly coupled to the VMware SDDC stack.  By enabling VMC support for Horizon, customers can now run virtual desktops in AWS, or utilize VMC as a disaster recovery option for Horizon environments.

Full clone desktops will be the only desktop type supported in the initial release of Horizon on VMC.  Instant Clones will be coming in a future release, but some additional development work will be required since Horizon will not have the same access to vCenter in VMC as it has in on-premises environments.  I’m also hearing that Linked Clones and Horizon Composer will not be supported in VMC.

The initial release of Horizon on VMC will only support core Horizon, the Unified Access Gateway, and VMware Identity Manager.  Other components of the Horizon Suite, such as UEM, vRealize Operations, and App Volumes have not been certified yet (although there should be nothing stopping UEM from working in Horizon on VMC because it doesn’t rely on any vSphere components).  Security Server, Persona Management, and ThinApp will not be supported.

Horizon Extended Service Branches

Under the current release cadence, VMware targets one Horizon 7 release per quarter.  The current support policy for Horizon states that a release only continues to receive bug fixes and security patches if a new point release hasn’t been available for at least 60 days.  Let’s break that down to make it a little easier to understand.

  1. VMware will support any version of Horizon 7.x for the lifecycle of the product.
  2. If you are currently running the latest Horizon point release (ex. Horizon 7.4), and you find a critical bug/security issue, VMware will issue a hot patch to fix it for that version.
  3. If you are running Horizon 7.4, and Horizon 7.5 has been out for less than 60 days when you find a critical bug/security issue, VMware will issue a hot patch to fix it for that version.
  4. If you are running Horizon 7.4, and Horizon 7.5 has been out for more than 60 days when you find a critical bug/security issue, the fix for the bug will be applied to Horizon 7.5 or later, and you will need to upgrade to receive the fix.

In larger environments, Horizon upgrades can be non-trivial efforts that enterprises may not undertake every quarter.  There are also some verticals, such as healthcare, where core business applications are certified against specific versions of a product, and upgrading or moving away from that certified version can impact support or support costs for key business applications.

With Horizon 7.5, VMware is introducing a long-term support bundle for the Horizon Suite.  This bundle will be called the Extended Service Branch (ESB), and it will contain Horizon 7, App Volumes, User Environment Manager, and Unified Access Gateway.  The ESB will have 2 years of active support from release date where it will receive hot fixes, and each ESB will receive three service packs with critical bug and security fixes and support for new Windows 10 releases.  A new ESB will be released approximately every twelve months.

Each ESB branch will support approximately 3-4 Windows 10 builds, including any recent LTSC builds.  That means the Horizon 7.5 ESB release will support the Windows 10 1709, 1803, 1809 and 1809 LTSC builds of Windows 10.

This packaging is nice for enterprise organizations that want to limit the number of Horizon upgrades they want to apply in a year or require long-term support for core business applications.  I see this being popular in healthcare environments.

Extended Service Branches do not require any additional licensing, and customers will have the option to adopt either the current release cadence or the extended service branch when implementing their environment.

Just-in-Time Management Platform (JMP)

The Just-in-Time Management Platform, or JMP, is a new component of the Horizon Suite.  The intention is to bring together Horizon, Active Directory, App Volumes, and User Environment Manager to provide a single portal for provisioning instant clone desktops, applications, and policies to users.  JMP also brings a new, HTML5 interface to Horizon.

I’m a bit torn on the concept.  I like the idea behind JMP and providing a portal for enabling user self-provisioning.  But I’m not sure building that portal into Horizon is the right place for it.  A lot of organizations use Active Directory Groups as their management layer for Horizon Desktop Pools and App Volumes.  There is a good reason for doing it this way.  It’s easy to audit who has desktop or application access, and there are a number of ways to easily generate reports on Active Directory Group membership.

Many customers that I talk to are also attempting to standardize their IT processes around an ITSM platform that includes a Service Catalog.  The most common one I run across is ServiceNow.  The customers that I’ve talked to that want to implement self-service provisioning of virtual desktops and applications often want to do it in the context of their service catalog and approval workflows.

It’s not clear right now if JMP will include an API that will allow customers to integrate it with an existing service catalog or service desk tool.  If it does include an API, then I see it being an important part of automated, self-service end-user computing solutions.  If it doesn’t, then it will likely be another yet-another-user-interface, and the development cycles would have been better spent on improving the Horizon and App Volumes APIs.

Not every customer will be utilizing a service catalog, ITSM tool and orchestration. For those customers, JMP could be an important way to streamline IT operations around virtual desktops and applications and provide them some benefits of automation.

Instant Clone Enhancements

The release of vSphere 6.7 brought with it new Instant Clone APIs.  The new APIs bring features to VMFork that seem new to pure vSphere Admins but have been available to Horizon for some time such as vMotion.  The new APIs are why Horizon 7.4 does not support vSphere 6.7 for Instant Clone desktops.

Horizon 7.5 will support the new vSphere 6.7 Instant Clone APIs.  It is also backward compatible with the existing vSphere 6.0 and 6.5 Instant Clone APIs.

There are some other enhancements coming to Instant Clones as well.  Instant Clones will now support vSGA and Soft3D.  These settings can be configured in the parent image.  And if you’re an NVIDIA vGPU customer, more than one vGPU profile will be supported per cluster when GPU Consolidation is turned on.  NVIDIA GRID can only run a single profile per discrete GPU, so this feature will be great for customers that have Maxwell-series boards, especially the Tesla M10 high-density board that has four discrete GPUs.  However, I’m not sure how beneficial it will be with customers that adopt Pascal-series or Volta-series Tesla cards as these only have a single discrete GPU per board.  There may be some additional design considerations that need to be worked out.

Finally, there is one new Instant Clone feature for VSAN customers.  Before I explain the feature, I need to explain how Horizon utilizes VMFork and Instant Clone technology.  Horizon doesn’t just utilize VMFork – it adds its own layers of management on top of it to overcome the limitations of the first generation technology.  This is how Horizon was able to support Instant Clone vMotion when the standard VMFork could not.

This additional layer of management also allows VMware to do other cool things with Horizon Instant Clones without having to make major changes to the underlying platform.  One of the new features that is coming in Horizon 7.5 for VSAN customers is the ability to use Instant Clones across cluster boundaries.

For those who aren’t familiar with VSAN, it is VMware’s software-defined storage product.  The storage boundary for VSAN aligns with the ESXi cluster, so I’m not able to stretch a VSAN datastore between vSphere clusters.  So if I’m running a large EUC environment using VSAN, I may need multiple clusters to meet the needs of my user base.  And unlike 3-tier storage, I can’t share VSAN datastores between clusters.  Under the current setup in Horizon 7.4, I would need to have a copy of my gold/master/parent image in each cluster.

Due to some changes made in Horizon 7.5, I can now share an Instant Clone gold/master/parent image across VSAN clusters without having to make a copy of it in each cluster first.  I don’t have too many specific details on how this will work, but it could significantly reduce the management burden of large, multi-cluster Horizon environments on VSAN.

Blast Extreme Enhancements

The addition of Blast Extreme Adaptive Transport, or BEAT as it’s commonly known, provided an enhanced session remoting experience when using Blast Extreme.  It also required users and administrators to configure which transport they wanted to use in the client, and this could lead to less than optimal user experience for users who frequently moved between locations with good and bad connectivity.

Horizon 7.5 adds some automation and intelligence to BEAT with a feature called Blast Extreme Network Intelligence.  NI will evaluate network conditions on the client side and automatically choose the correct Blast Extreme transport to use.  Users will no longer have to make that choice or make changes in the client.  As a result, the Excellent, Typical, and Poor options are being removed from future versions of the Horizon client.

Another major enhancement coming to Blast Extreme is USB Redirection Port Consolidation.  Currently, USB redirection utilizes a side channel that requires an additional port to be opened in any external-facing firewalls.  Starting in Horizon 7.5, customers will have the option to utilize USB redirection over ports 443/8443 instead of the side channel.

Performance Tracker

The last item I want to cover in this post is Performance Tracker.  Performance Tracker is a tool that Pat Lee demonstrated at VMworld last year, and it is a tool to present session performance metrics to end users.  It supports both Blast Extreme and PCoIP, and it provides information such as session latency, frames per second, Blast Extreme transport type, and help with troubleshooting connectivity issues between the Horizon Agent and the Horizon Client.

Part 2

As you can see, there is a lot of new stuff in Horizon 7.5.  We’ve hit 1900 words in this post just talking about what’s new in Horizon.  We haven’t touched on client improvements, Horizon Cloud, App Volumes, UEM or Workspace One Intelligence yet.  So we’ll have to break those announcements into another post that will be coming in the next day or two.

by seanpmassey at May 08, 2018 02:09 PM

Everything Sysadmin

SO (my employer) is hiring a Windows SRE/sysadmin in NY/NJ

Come work with Stack Overflow's SRE team!

We're looking for a Windows system administrator / SRE to join our SRE team at Stack Overflow. (The downside is that I'll be your manager... ha ha ha). Anyway... the full job description is here:

A quick and unofficial FAQ:

Q: NYC/NJ? I thought Stack was a "remote first" company! Whudup with that?

A: While most of the SRE team works remotely, we like to have a few team members near each of our datacenters (Jersey City, NJ and Denver, CO). You won't be spending hours each week pulling cables, I promise you. In fact, we use remote KVMs, and a "remote hands" service for most things. Heck, a lot of our new products are running in "the cloud" (and probably more over time). That said, it's good to have 1-2 people within easy travel distance of the datacenters for emergencies.

Q: Can I work from home?

A: Absolutely. You can work from home (we'll ship you a nice desk, chair and other great stuff) or you can work from our NYC office (see the job advert for a list of perks). Either way, you will need to be able to get to the Jersey City, NJ data center in a reasonable amount of time (like... an hour).

Q: Wait... Windows?

A: Yup. We're a mixed Windows and Linux environment. We're doing a lot of cutting edge stuff with Windows. We were early adopters of PowerShell (if you love PowerShell, definitely apply!) and DSC and a number of other technologies. Microsoft's container support is starting to look good too (hint, hint).

Q: You mentioned another datacenter in Denver, CO. What if I live near there?

A: This position is designated as "NY/NJ". However, watch this space. Or, if you are impatient, contact me and I'll loop you in.

Q: Where do I get more info? How do I apply?


by Tom Limoncelli at May 08, 2018 02:02 AM

May 07, 2018


Trying Splunk Cloud

I first used Splunk over ten years ago, but the first time I blogged about it was in 2008. I described how to install Splunk on Ubuntu 8.04. Today I decided to try the Splunk Cloud.

Splunk Cloud is the company's hosted Splunk offering, residing in Amazon Web Services (AWS). You can register for a 15 day free trial of Splunk Cloud that will index 5 GB per day.

If you would like to follow along, you will need a computer with a Web browser to interact with Splunk Cloud. (There may be ways to interact via API, but I do not cover that here.)

I will collect logs from a virtual machine running Debian 9, inside Oracle VirtualBox.

First I registered for the free Splunk Cloud trial online.

After I had a Splunk Cloud instance running, I consulted the documentation for Forward data to Splunk Cloud from Linux. I am running a "self-serviced" instance and not a "managed instance," i.e., I am the administrator in this situation.

I learned that I needed to install a software package called the Splunk Universal Forwarder on my Linux VM.

I downloaded a 64 bit Linux 2.6+ kernel .deb file to the /home/richard/Downloads directory on the Linux VM.

richard@debian:~$ cd Downloads/

richard@debian:~/Downloads$ ls


With elevated permissions I created a directory for the .deb, changed into the directory, and installed the .deb using dpkg.

richard@debian:~/Downloads$ sudo bash
[sudo] password for richard: 

root@debian:/home/richard/Downloads# mkdir /opt/splunkforwarder

root@debian:/home/richard/Downloads# mv splunkforwarder-7.1.0-2e75b3406c5b-linux-2.6-amd64.deb /opt/splunkforwarder/

root@debian:/home/richard/Downloads# cd /opt/splunkforwarder/

root@debian:/opt/splunkforwarder# ls


root@debian:/opt/splunkforwarder# dpkg -i splunkforwarder-7.1.0-2e75b3406c5b-linux-2.6-amd64.deb 

Selecting previously unselected package splunkforwarder.
(Reading database ... 141030 files and directories currently installed.)
Preparing to unpack splunkforwarder-7.1.0-2e75b3406c5b-linux-2.6-amd64.deb ...
Unpacking splunkforwarder (7.1.0) ...
Setting up splunkforwarder (7.1.0) ...

root@debian:/opt/splunkforwarder# ls
bin        license-eula.txt
copyright.txt  openssl
etc        README-splunk.txt
ftr        share
include        splunkforwarder-7.1.0-2e75b3406c5b-linux-2.6-amd64.deb
lib        splunkforwarder-7.1.0-2e75b3406c5b-linux-2.6-x86_64-manifest

Next I changed into the bin directory, ran the splunk binary, and accepted the EULA.

root@debian:/opt/splunkforwarder# cd bin/

root@debian:/opt/splunkforwarder/bin# ls

btool   copyright.txt   openssl slim   splunkmon
btprobe splunk   srm
bzip2  scripts splunkd
classify   setSplunkEnv splunkdj

root@debian:/opt/splunkforwarder/bin# ./splunk start




Splunk Software License Agreement 04.24.2018

Do you agree with this license? [y/n]: y

Now I had to set an administrator password for this Universal Forwarder instance. I will refer to it as "mypassword" in the examples that follow although Splunk does not echo it to the screen below.

This appears to be your first time running this version of Splunk.

An Admin password must be set before installation proceeds.
Password must contain at least:
   * 8 total printable ASCII character(s).
Please enter a new password: 
Please confirm new password: 

Splunk> Map. Reduce. Recycle.

Checking prerequisites...
Checking mgmt port [8089]: open
Creating: /opt/splunkforwarder/var/lib/splunk
Creating: /opt/splunkforwarder/var/run/splunk
Creating: /opt/splunkforwarder/var/run/splunk/appserver/i18n
Creating: /opt/splunkforwarder/var/run/splunk/appserver/modules/static/css
Creating: /opt/splunkforwarder/var/run/splunk/upload
Creating: /opt/splunkforwarder/var/spool/splunk
Creating: /opt/splunkforwarder/var/spool/dirmoncache
Creating: /opt/splunkforwarder/var/lib/splunk/authDb
Creating: /opt/splunkforwarder/var/lib/splunk/hashDb
New certs have been generated in '/opt/splunkforwarder/etc/auth'.
Checking conf files for problems...
Checking default conf files for edits...
Validating installed files against hashes from '/opt/splunkforwarder/splunkforwarder-7.1.0-2e75b3406c5b-linux-2.6-x86_64-manifest'
All installed files intact.
All preliminary checks passed.

Starting splunk server daemon (splunkd)...  

With that done, I had to return to the Splunk Cloud Web site, and click the link to "Download Universal Forwarder Credentials" to download a splunkclouduf.spl file. As noted in the documentation, splunkclouduf.spl is a "credentials file, which contains a custom certificate for your Splunk Cloud deployment. The universal forwarder credentials are different from the credentials that you use to log into Splunk Cloud."

After downloading the splunkclouduf.spl file, I installed it. Note I pass "admin" as the user and "mypassword" as the password here. After installing I restart the universal forwarder.

root@debian:/opt/splunkforwarder/bin# ./splunk install app /home/richard/Downloads/splunkclouduf.spl -auth admin:mypassword

App '/home/richard/Downloads/splunkclouduf.spl' installed 

root@debian:/opt/splunkforwarder/bin# ./splunk restart
Stopping splunkd...
Shutting down.  Please wait, as this may take a few minutes.
Stopping splunk helpers...


Splunk> Map. Reduce. Recycle.

Checking prerequisites...
Checking mgmt port [8089]: open
Checking conf files for problems...
Checking default conf files for edits...
Validating installed files against hashes from '/opt/splunkforwarder/splunkforwarder-7.1.0-2e75b3406c5b-linux-2.6-x86_64-manifest'
All installed files intact.
All preliminary checks passed.

Starting splunk server daemon (splunkd)...  

It's time to take the final steps to get data into Splunk Cloud. I need to set up forwarder management via the Splunk Cloud Web site. Observe the host name in the command below. You obtain this (mine is masked with XXXX) from the URL of your Splunk Cloud deployment. Note that you have to add "input-" before the fully qualified domain name used by the Splunk Cloud instance.

root@debian:/opt/splunkforwarder/bin# ./splunk set deploy-poll

Your session is invalid.  Please login.
Splunk username: admin
Configuration updated.
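If you want to double-check the setting before restarting, I believe the forwarder can echo back the configured deployment server with "show deploy-poll"; treat the exact subcommand as an assumption and consult ./splunk help if it differs on your version:

root@debian:/opt/splunkforwarder/bin# ./splunk show deploy-poll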

Once again I restart the universal forwarder. I'm not sure if I could have done all these restarts at the end.

root@debian:/opt/splunkforwarder/bin# ./splunk restart
Stopping splunkd...
Shutting down.  Please wait, as this may take a few minutes.
Stopping splunk helpers...


Splunk> Map. Reduce. Recycle.

Checking prerequisites...
Checking mgmt port [8089]: open
Checking conf files for problems...
Checking default conf files for edits...
Validating installed files against hashes from '/opt/splunkforwarder/splunkforwarder-7.1.0-2e75b3406c5b-linux-2.6-x86_64-manifest'
All installed files intact.
All preliminary checks passed.

Starting splunk server daemon (splunkd)...  

Finally I need to tell the universal forwarder to watch some logs on this Linux system. I tell it to monitor the /var/log directory and restart one more time.

root@debian:/opt/splunkforwarder/bin# ./splunk add monitor /var/log
Your session is invalid.  Please login.
Splunk username: admin
Added monitor of '/var/log'.
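To confirm what the forwarder is now watching, you can list its configured monitors (it will prompt for the same admin credentials):

root@debian:/opt/splunkforwarder/bin# ./splunk list monitor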

root@debian:/opt/splunkforwarder/bin# ./splunk restart

Stopping splunkd...
Shutting down.  Please wait, as this may take a few minutes.
Stopping splunk helpers...


Splunk> Map. Reduce. Recycle.

Checking prerequisites...
Checking mgmt port [8089]: open
Checking conf files for problems...
Checking default conf files for edits...
Validating installed files against hashes from '/opt/splunkforwarder/splunkforwarder-7.1.0-2e75b3406c5b-linux-2.6-x86_64-manifest'
All installed files intact.
All preliminary checks passed.

Starting splunk server daemon (splunkd)...  

At this point I return to the Splunk Cloud Web interface and click the "search" feature. I see Splunk is indexing some data.

I run a search for "host=debian" and find my logs.

Not too bad! Have you tried Splunk Cloud? What do you think? Leave me a comment below.

Update: I installed the Universal Forwarder on FreeBSD 11.1 using the method above (except with a FreeBSD .tgz) and everything seems to be working!

by Richard Bejtlich ( at May 07, 2018 06:26 PM


Systemd dependencies

There is a lot of hate around Systemd in unixy circles. Like, a lot. There are many reasons for this, a short list:

  • For some reason they felt the need to reimplement daemons that have existed for years. And are finding the same kinds of bugs those older daemons found and squashed over a decade ago.
    • I'm looking at you Time-sync and DNS resolver.
  • It takes away an init system that everyone knows and is well documented in both the official documentation sense, and the unofficial 'millions of blog-posts' sense. Blog posts like this one.
  • It has so many incomprehensible edge-cases that make reasoning about the system even harder.
  • The maintainers are steely-eyed fundamentalists who know exactly how they want everything.
  • Because it runs so many things in parallel, bugs we've never had to worry about are now impossible to ignore.

So much hate. Having spent the last few weeks doing a sysv -> systemd migration, I've found another reason for that hate. And it's one I'm familiar with because I've spent so many years in the Puppet ecosystem.

People love to hate on puppet because of the wacky non-deterministic bugs. The order resources are declared in a module is not the order in which they are applied. Puppet uses a dependency model to determine the order of things, which leads to weird bugs where a thing has worked for two weeks but suddenly stops working that way because a new change was made somewhere that changed the order of resource-application. A large part of why people like Chef over Puppet is because Chef behaves like a scripting language, where the order of the file is the order things are done in.

Guess what? Systemd uses the Puppet model of dependency! This is why it's hard to reason about. And why I, someone who has been handling these kinds of problems for years, haven't spent much time shaking my tiny fist at an uncaring universe. There has been swearing, oh yes. But of a somewhat different sort.

The Puppet Model

Puppet has two kinds of dependency. Strict ordering, and do this if that other thing does something. Which makes for four ways of linking resources.

  • requires => Do this after this other thing.
  • before => Do this before this other thing.
  • subscribes => Do this after this other thing, but only if this other thing changes something.
  • notifies => Do this before this other thing, and tell it you changed something.

This makes for some real power, while also making the system hard to reason about.
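
A minimal sketch of what those relationships look like in a manifest (hypothetical nginx resources, not from any real module; in real code you would normally pick one direction rather than declaring both):

package { 'nginx':
  ensure => installed,
  before => File['/etc/nginx/nginx.conf'],     # strict ordering, declared forwards
}

file { '/etc/nginx/nginx.conf':
  ensure  => file,
  source  => 'puppet:///modules/nginx/nginx.conf',
  require => Package['nginx'],                 # same ordering, declared backwards
  notify  => Service['nginx'],                 # "I changed something, go restart"
}

service { 'nginx':
  ensure    => running,
  subscribe => File['/etc/nginx/nginx.conf'],  # mirror image of the notify above
}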

Thing is, systemd goes a step further.

The Systemd Model

Systemd also has dependencies, but it was also designed to run as much in parallel as possible. Puppet was written in Ruby, so it has strong single-threaded tendencies. Systemd is multi-threaded. Multi-threaded systems are harder to reason about in general. Add dependency ordering on top of multi-threading issues and you get a sheer cliff of learning before you can have a hope of following along. Even better (worse), systemd has more ways of defining relationships.

  • Before= An ordering dependency: if the named units are also being started, this unit needs to finish starting up before they do. On its own it doesn't pull them in or require them to succeed.
  • After= The mirror image: if the named units are also being started, this unit waits until they have finished starting up.
  • Requires= The named units will get started if this one is, and do so at the same time. Not only that, but if the named units are explicitly stopped, this one will be stopped as well. For puppet-heads, this breaks your mental model since it works backwards.
  • BindsTo= Does everything Requires= does, but will also stop this unit if the named unit stops for any reason, not just explicit stops.
  • Wants= Like Requires=, but less picky. The named units will get started, but this unit doesn't care if they fail to start or end up failing.
  • Requisite= Like Requires=, but fails immediately if the named units aren't already started. Think of mount units not starting unless the device unit is already started.
  • Conflicts= A negative dependency. Turn this unit off if the named unit is started, and turn the named unit off if this one is started.

There are several more I'm not going into. This is a lot, and some of these work independently. The documentation even says:

It is a common pattern to include a unit name in both the After= and Requires= options, in which case the unit listed will be started before the unit that is configured with these options.

Using both After= and Requires= means that the named units need to get all the way done (After=) before this unit is started. And if this unit is started, the named units need to get started as well (Requires=).

Hence, in many cases it is best to combine BindsTo= with After=.

Using both configures a hard dependency relationship. After= means the other unit needs to be all the way started before this one is started. BindsTo= makes it so that this unit is only ever in an active state when the unit named in both BindsTo= and After= is in an active state. If that other unit fails or goes inactive, this one will as well.
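
As a concrete sketch (hypothetical unit names and paths), a tunnel service that should only ever run while vpn.service is active would combine the two like this:

# /etc/systemd/system/tunnel.service (hypothetical example)
[Unit]
Description=Tunnel that lives and dies with vpn.service
# After= gives ordering: don't start until vpn.service has finished starting.
# BindsTo= gives the lifecycle link: stop this unit whenever vpn.service
# stops or fails, for any reason.
BindsTo=vpn.service
After=vpn.service

[Service]
ExecStart=/usr/local/bin/tunnel --up

[Install]
WantedBy=multi-user.target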

There is also a concept missing from Puppet, and that's when the dependency fires. After/Before are trailing-edge triggers: they fire on completion, which is how Puppet works. Most of the rest are leading-edge triggered, where the dependency is satisfied as soon as the named units start. This is how you get parallelism in an init-system, and why the weirder dependencies are often combined with either Before or After.

Systemd hate will continue for the next 10 or so years, at least until most Linux engineers have worked with it long enough to stop grumbling about how nice the olden days were.

It also means that fewer people will be writing startup services due to the complexity of doing anything other than 'start this after this other thing' ordering.

by SysAdmin1138 at May 07, 2018 06:20 PM

May 06, 2018

Everything Sysadmin

Launched: Stack Overflow for Teams!

I usually don't use my blog to plug my employer but I'm very excited about Stack Overflow's new "Stack Overflow for Teams" launch this week.

How would you like a private Stack Overflow area for your team? Stack Overflow for Teams allows teams of any size to use the Stack Overflow that they already know and love, but for all their proprietary information - creating a special private space just for them. It uses the same Q&A format, collaborative editing, and even recognition systems to solve the massive knowledge sharing issues that all teams have.

More info is on the overview page and our blog post, plus we got great press about it on VentureBeat and GeekWire.

The launch happened without a hitch. I'm very proud of our SRE team for all their work. They basically built the equivalent of our existing infrastructure two times over (and more if you count dev/test environments). They helped design the security and other aspects of the new service's infrastructure. It's been an impressive 9 months that has radically changed how we work. Props to everyone on the team! I'm so proud!

by Tom Limoncelli at May 06, 2018 08:40 PM

May 02, 2018

DNS Spy now checks for the “Null MX”


A small but useful addition to the scoring system of DNS Spy: support for the Null MX record.

Internet mail determines the address of a receiving server through the DNS, first by looking for an MX record and then by looking for an A/AAAA record as a fallback.

Unfortunately, this means that the A/AAAA record is taken to be mail server address even when that address does not accept mail.

The No Service MX RR, informally called "null MX", formalizes the existing mechanism by which a domain announces that it accepts no mail, without having to provide a mail server; this permits significant operational efficiencies.

Source: RFC 7505 -- A "Null MX" No Service Resource Record for Domains That Accept No Mail
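
For reference, a Null MX is just an MX record with preference 0 and a root (".") exchange. In zone-file form, for a hypothetical domain that accepts no mail, it looks like this:

; hypothetical zone snippet (RFC 7505): this domain accepts no mail
nomail.example.com.   3600  IN  MX  0  .

Querying such a domain with dig +short MX nomail.example.com would then simply return "0 .".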

Give it a try at the DNS Spy Scan page.


by Mattias Geniar at May 02, 2018 05:45 PM

Everything Sysadmin

SRECon: Operational Excellence in April Fools' Pranks

The video of my SRECon talk is finally available!

"Operational Excellence in April Fools' Pranks: Being Funny Is Serious Work!" at SREcon18 Americas is about mitigating the risk of "high stakes" launches.

The microphones didn't pick up the audience reaction. As a result it looks like I keep pausing for no reason, but really I'm waiting for the laughter to calm down. Really! (Really!)

On a personal note, I'd like to thank the co-chairs of SRECon for putting together such an excellent conference. This was my first time being the last speaker at a national conference, which was quite a thrill.

I look forward to SRECon next year in Brooklyn!

by Tom Limoncelli at May 02, 2018 03:00 PM

May 01, 2018

Anton Chuvakin - Security Warrior

Monthly Blog Round-Up – April 2018

Here is my next monthly "Security Warrior" blog round-up of top 5 popular posts based on last
month’s visitor data  (excluding other monthly or annual round-ups):
  1. “New SIEM Whitepaper on Use Cases In-Depth OUT!” (dated 2010) presents a whitepaper on select SIEM use cases described in depth with rules and reports [using now-defunct SIEM product]; also see this SIEM use case in depth and this for a more current list of popular SIEM use cases. Finally, see our research on developing security monitoring use cases here – and we just UPDATED IT FOR 2018.
  2. “Simple Log Review Checklist Released!” is often at the top of this list – this rapidly aging checklist is still a useful tool for many people. “On Free Log Management Tools” (also aged quite a bit by now) is a companion to the checklist (updated version).
  3. “Updated With Community Feedback SANS Top 7 Essential Log Reports DRAFT2” is about the top log reports project of 2008-2013; I think these are still very useful in response to “what reports will give me the best insight from my logs?”
  4. Again, my classic PCI DSS Log Review series is extra popular! The series of 18 posts cover a comprehensive log review approach (OK for PCI DSS 3+ even though it predates it), useful for building log review processes and procedures, whether regulatory or not. It is also described in more detail in our Log Management book and mentioned in our PCI book  – note that this series is even mentioned in some PCI Council materials. 
  5. “Why No Open Source SIEM, EVER?” contains some of my SIEM thinking from 2009 (oh, wow, ancient history!). Is it relevant now? You be the judge.  Succeeding with SIEM requires a lot of work, whether you paid for the software, or not. BTW, this post has an amazing “staying power” that is hard to explain – I suspect it has to do with people wanting “free stuff” and googling for “open source SIEM” … 
In addition, I’d like to draw your attention to a few recent posts from my Gartner blog [which, BTW, now has more than 7X the traffic of this blog]: 

Critical reference posts:
Current research on testing security:
Current research on threat detection “starter kit”
Just finished research on SOAR:
Miscellaneous fun posts:

(see all my published Gartner research here)
Also see my past monthly and annual “Top Popular Blog Posts” – 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017.

Disclaimer: most content at SecurityWarrior blog was written before I joined Gartner on August 1, 2011 and is solely my personal view at the time of writing. For my current security blogging, go here.

Other posts in this endless series:

by Anton Chuvakin at May 01, 2018 06:11 PM

A short security review of Bitwarden

Bitwarden is an open source online password manager:

The easiest and safest way for individuals, teams, and business organizations to store, share, and sync sensitive data.

Bitwarden offers both a cloud hosted and on-premise version. Some notes on the scope of this blog post and disclaimers:

  • I only looked at the cloud hosted version.
  • This security review is not exhaustive; I only took a few minutes to review various things.
  • I'm not a security researcher, just a paranoid enthusiast. If you find anything wrong with this blog post, please contact me at ferry DOT boender (AT) gmaildotcom.

Here are my findings:

Encryption password sent over the wire

There appears to be no distinction between the authentication password and encryption password.

When logging in, the following HTTP POST is made to Bitwarden's server:

client_id: web
grant_type: password
password: xFSJdHvKcrYQA0KAgOlhxBB3Bpsuanc7bZIKTpskiWk=
scope: api offline_access

That's a base64-encoded password. (Don't worry, I anonymized all secrets in this post; besides, it's all throw-away passwords anyway.) Let's see what it contains:

>>> import base64
>>> base64.b64decode('xFSJdHvKcrYQA0KAgOlhxBB3Bpsuanc7bZIKTpskiWk=')

Okay, at least that's not my plain text password. It is encoded, hashed or encrypted somehow, but I'm not sure how. Still, it makes me nervous that my password is being sent over the wire. The master password used for encryption should never leave a device, in any form. I would have expected two passwords here: one for authentication and one for encryption.
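
To make that distinction concrete, here is a rough sketch of the pattern I would have expected (just an illustration, not Bitwarden's actual scheme): derive two independent values from the master password, so that only the authentication value ever leaves the device.

import hashlib

def derive_keys(master_password: str, email: str):
    # Stretch the master password; the e-mail address acts as a salt.
    master_key = hashlib.pbkdf2_hmac(
        'sha256', master_password.encode(), email.lower().encode(), 100_000)
    # Encryption key: derived locally and never sent anywhere.
    enc_key = hashlib.pbkdf2_hmac('sha256', master_key, b'encryption', 1)
    # Authentication hash: a separate derivation that is safe(r) to send over the wire.
    auth_hash = hashlib.pbkdf2_hmac('sha256', master_key, b'authentication', 1)
    return enc_key, auth_hash

enc_key, auth_hash = derive_keys('correct horse battery staple', 'user@example.com')

The server would then salt and hash the authentication value again before storing it; the encryption key is what actually protects the vault and stays on the device.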

The reason it was implemented this way is probably because of the "Organizations" feature, which lets you share passwords with other people. Sharing secrets among people is probably hard to do in a secure way. I'm no cryptography expert, but there are probably ways to do this more securely using asymmetric encryption (public and private keys), which Bitwarden doesn't appear to be using.

Bitwarden has a FAQ entry about its use of encryption, which claims that passwords are never sent over the wire unencrypted or unhashed:

Bitwarden always encrypts and/or hashes your data on your local device before it is ever sent to the cloud servers for syncing. The Bitwarden servers are only used for storing encrypted data. It is not possible to get your unencrypted data from the Bitwarden cloud servers.

The FAQ entry on hashing is also relevant:

Bitwarden salts and hashes your master password with your email address on the client (your computer/device) before it is transmitted to our servers. Once the server receives the hashed password from your computer/device it is then salted again with a cryptographically secure random value, hashed again and stored in our database. This process is repeated and hashes are compared every time you log in.

The hashing functions that are used are one way hashes. This means that they cannot be reverse engineered by anyone at Bitwarden to reveal your true master password. In the hypothetical event that the Bitwarden servers were hacked and your data was leaked, the data would have no value to the hacker.

However, there's a major caveat here which they don't mention. All of the encryption is done client-side by Javascript loaded from various servers and CDNs. This means that an attacker who gains control over any of these servers (or man-in-the-middle's them somehow) can inject any javascript they like, and obtain your password that way.

Indiscriminate allowance / loading of external resources

The good news is that Bitwarden uses Content-Security-Policy. The bad news is that it allows the loading of resources from a variety of untrusted sources. uMatrix shows the type of resources it's trying to load from various sources:

Here's what the Content-Security-Policy looks like:


Roughly translated, it allows indiscriminate loading and executing of scripts, css, web workers (background threads) and inclusion of framed content from a wide variety of untrusted sources such as CDNs, Paypal, Duosecurity, Braintreegateway, Google, etc. Some of these I know, some I don't. Trust I have in none of them.

It would take too long to explain why this is a bad idea, but the gist of it is that the more resources you load and allow from different sources, the bigger the attack surface becomes. Perhaps these are perfectly secure (right now…), but an important part of security is the developers' security mindset. Some of these resources could have easily been hosted on the same origin servers. Some of these resources should only be allowed to run from payment pages. It shows sloppy configuration of the Content-Security-Policy, namely site-wide configuration in the web server (probably) rather than being determined on a URL-by-URL basis.
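
To illustrate the difference (a generic example, not Bitwarden's actual policy), compare a permissive script-src with one pinned to the site's own origin:

# Permissive: a compromise of any of these hosts can run code next to your vault
Content-Security-Policy: script-src 'self' https://cdn.example.net https://pay.example.com https://analytics.example.org

# Restrictive: only scripts served by the site itself will execute
Content-Security-Policy: script-src 'self'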

The actual client-side encryption library is loaded from, which is good. However, the (possibility of) inclusion of scripts from other sources negates any security benefits of doing so.

The inclusion of Google analytics in a password manager is, in my opinion, inexcusable. It's not required functionality for the application, so it shouldn't be in there.

New password entry is sent securely

When adding a new authentication entry, the entry appears to be client-side encrypted in some way before sending it to the server:

    "name": "2.eD4fFLYUWmM6sgVDSA9pTg==|SNzQjLitpA5K+6qrBwC7jw==|DlfVCnVdZA9+3oLej4FHSQwwdo/CbmHkL2TuwnfXAoI=", 
    "organizationId": null, 
    "fields": null, 
    "notes": null, 
    "favorite": false, 
    "login": {
        "username": null, 
        "password": "2.o4IO/yzz6syip4UEaU4QpA==|LbCyLjAOHa3m2wopsqayYK9O7Q5aqnR8nltUgylwSOo=|6ajVAh0r9OaBs+NgLKrTd+j3LdBLKBUbs/q8SE6XvUE=", 
        "totp": null
    "folderId": null, 
    "type": 1

It's base64 again, and decodes into the same kind of obscure binary string as the password sent when logging in. I have not spent time looking at how exactly the encoding / encryption happens, so I cannot claim that this is actually secure; keep that in mind. It does give credence to Bitwarden's claims that all sensitive data is encrypted client-side before it is sent to the server.

Disclosure of my email address to a third party without my consent

I clicked on the "Data breach report" link on the left, and Bitwarden immediately sent my email address to a third-party breach-checking service. No confirmation, no nothing; it was disclosed to a third party immediately. Well, actually, since I use uMatrix to firewall my browser, it wasn't, and I had to explicitly allow it to do so, but even most security nerds don't use uMatrix.

That's not cool. Don't disclose my info to third parties without my consent.

Developer mindset

One of, if not the, most important aspects is the developer mindset. That is, do they care about security and are they knowledgeable in the field?

Bitwarden appears to know what they're doing. They have a security policy and run a bug bounty program. Security incidents appear to be solved quickly. I'd like to see more documentation on how the encryption, transfer and storage of secrets works. Right now, there are some FAQ entries, but it's all promises that give me no insight into where and how the applied security might break down.

One thing that bothers me is that they do not disclose any of the security trade-offs they made and how they impact the security of your secrets. I'm always wary when claims of perfect security are made, whether explicitly or by omission of information. There are obvious problems with client-side javascript encryption, which every developer and user with a reasonable understanding of web development recognises. No mention of this is made. Instead, security concerns are waved away with "everything is encrypted on your device!". That's nice, but if attackers can control the code that does the encryption, all is lost.

Please note that I'm not saying that client-side javascript encryption is a bad decision! It's a perfectly reasonable trade-off between the convenience of being able to access your secrets on all your devices and a more secure way of managing your passwords. However, this trade-off should be disclosed prominently to users.


So, is Bitwarden (Cloud) secure, and should you use it? Unfortunately, I can't give you any advice; it all depends on your requirements. Security is always a trade-off against usability and convenience.

I did this review because my organisation is looking into a self-hosted Open Source password manager to manage our organisation's secrets. Would I use this to keep my personal passwords in? The answer is: no. I use an offline Keepass, which I manually sync from my laptop to my phone every now and then. This is still the most secure way of managing passwords that I do not need to share with anyone. However, that's not the use-case that I reviewed Bitwarden for. So would I use it to manage our organisation's secrets? Perhaps, the jury is still out on that. I'll need to look at the self-hosted version to see if it also includes Javascript from unreliable sources. If so, I'd have to say that, no, I would not recommend Bitwarden.

by admin at May 01, 2018 07:06 AM

April 30, 2018

Certificate Transparency logging now mandatory


All certificates are now required to be logged in publicly available logs (aka "Certificate Transparency").

Since January 2015, Chrome has required that Extended Validation (EV) certificates be CT-compliant in order to receive EV status.

In April 2018, this requirement will be extended to all newly-issued publicly-trusted certificates -- DV, OV, and EV -- and certificates failing to comply with this policy will not be recognized as trusted when evaluated by Chrome.

Source: Certificate Transparency Enforcement in Google Chrome -- Google Groups

In other words: if Chrome encounters a certificate issued after April 2018 that isn't logged in a public Certificate Transparency log, it will mark the certificate as insecure.
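
If you want to see what has already been logged for one of your domains, you can query the public crt.sh log search from a shell; a quick sketch, assuming its JSON output and jq are available:

$ curl -s 'https://crt.sh/?q=example.com&output=json' | jq length

A non-zero count means certificates for the domain are present in the public CT logs that crt.sh aggregates.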

Don't want to have this happen to you out of the blue? Monitor your sites and their certificate health via Oh Dear!.


by Mattias Geniar at April 30, 2018 01:42 PM

April 28, 2018


Edge Web Server Testing at Swiftype

This article has been originally posted on Swiftype Engineering blog.

For any modern technology company, a comprehensive application test suite is an absolute necessity. Automated testing suites allow developers to move faster while avoiding any loss of code quality or system stability. Software development has seen great benefit come from the adoption of automated testing frameworks and methodologies; however, the culture of automated testing has neglected one key area of the modern web application serving stack: web application edge routing and multiplexing rulesets.

From modern load balancer appliances that allow TCL-based rule sets, to locally or remotely hosted Varnish VCL rules, to the power and flexibility that Nginx and OpenResty make available through Lua, edge routing rulesets have become a vital part of application serving controls.

Over the past decade or so, it has become possible to incorporate more and more logic into edge web server infrastructures. Almost every modern web server has support for scripting, enabling developers to make their edge servers smarter than ever before. Unfortunately, the application logic configured within web servers is often much harder to test than that hosted directly in application code, and so too often software teams resort to manual testing or, worse, use customers as testers by shipping changes to production without any edge routing testing having been performed.

In this post, I would like to explain the approach Swiftype has taken to ensure that our test suites account for our use of complex edge web server logic to manage our production traffic flow, and thus that we can confidently deploy changes to our application infrastructure with little or no risk.

Our Web Infrastructure

Before I go into details of our edge web server configuration testing, it may be helpful to share an overview of the infrastructure behind our web services and applications.

Swiftype has evolved from a relatively simple Rails monolith and is still largely powered by a set of Ruby applications served by Unicorn application servers. To balance traffic between the multitude of application instances, we use Haproxy (mainly for its observability features and the fair load balancing implementation). Finally, there is an OpenResty (nginx+lua) layer at the edge of our infrastructure that is responsible for many key functions: SSL termination and enforcement, rate limiting, as well as providing flexible traffic management and routing functionality (written in Lua) customized specifically for the Swiftype API.

Here is a simple diagram of our web application infrastructure:

Swiftype web infrastructure overview

Testing Edge Web Servers

Swiftype’s edge web server configuration contains thousands of lines of code: from Nginx configs to custom templates rendered during deployment, to complex Lua logic used to manage production API traffic. Any mistake in this configuration, if not caught in testing, could lead to an outage at our edge, and considering that 100% of our API traffic is served through this layer, any outage at the edge is likely to be very impactful to our customers and our business. This is why we have invested time and resources to build a system that allows us to test our edge configuration changes in development and on CI before they are deployed to production systems.

Testing Workflow Overview

The first step in safely introducing change is ensuring that development and testing environments are quarantined from production environments. To do this we have created an “isolated” runtime mode for our edge web server stack. All changes to our edge configurations are first developed and run in this “isolated” mode. The “isolated” mode has no references to production backend infrastructure, and thus by employing the “isolated” mode, developers are able to iterate very quickly in a local environment without fear of harmful repercussions. All tests written to run as part of the “isolated” mode employ a mock server to emulate production backends, and primarily focus on the unit-testing of specific new features that are being implemented.

When we are confident enough in our unit-tested set of changes, we can run the same set of tests in an “acceptance testing” mode, in which the mock server used in isolated tests is replaced with a Haproxy load balancer with access to production networks. Working on tests and running them in this mode allows us to ensure with the highest degree of certainty that our changes will work in a real production environment, since we exercise our whole stack while running the test suite.

Testing Environment Overview

Our testing environment employs Docker containers to serve in place of our production web servers. The test environment is comprised of the following components:

  • A loopback network interface on which a full complement of production IPs are configured to account for every service we are planning to test (e.g. a service pointing to an IP address 10.1.0.x in production is tested in a local “isolated” testing environment with IP 10.1.0.x assigned to an alias on the local loopback interface). This allows us to perform end-to-end testing: DNS resolution, TCP service connections to a specific IP address, etc. without needing access to production, nor local /etc/hosts or name resolution changes.
  • For use cases where we are testing changes that are not represented in DNS (for example, when preparing edge servers for serving traffic currently handled by a different service), we may still employ local /etc/hosts entries to point the DNS name for a service to a local IP address for the period of testing. In this scenario, we ensure that our tests have been written in a way that is independent of the DNS configuration, and thus that the tests can be reused at a later date, or when the configuration has been deployed to production.
  • An OpenResty server instance with the configuration we need to test.
  • A test runner process (based on RSpec and a custom framework for writing our tests).
  • An optional mock server. (As noted above, this might be Docker-based in a local test environment or in CI, and is likely to be used as part of the test runner process, where it emulates an external application/service, serves in place of production backends, or acts as a local Haproxy instance running a production configuration that may even route traffic to real production backends.)

Isolated Testing Walkthrough

Here is how a test for a hypothetical service (registered in DNS) is performed in an isolated environment:

  1. We automatically assign the service’s production IP as an alias on a loopback interface.
  2. We start a mock server listening on the localhost configured to respond on the same port used by the Nginx server backend (in production, there would be haproxy on that port) with a specific stub response.
  3. Our test performs a DNS resolution for the service name, receives 10.1.0.x as the IP of the service, connects to the local Nginx instance listening on 10.1.0.x (bound to a loopback interface) and performs a test call.
  4. Nginx, receiving the test request, performs all configured operations and forwards the request to a backend, which in this case is handled by the local mock server. The call result is then returned by Nginx to the test runner.
  5. The test runner performs all defined testing against the server response: These tests can be very thorough, as the test runner has access to the server response code, all headers, and also the response body, and can thus confirm that all data returned meets each test’s specifications before concluding if the process as a whole has passed or failed test validation.
  6. Specific to isolated testing: In some use cases, we may validate the state of the mock server, verifying that it has received all the calls we expected it to receive and that each call carried the data and headers we expected. This can be very useful for testing changes where our web layer has been configured to alter requests (rewrite, add or remove headers, etc.) prior to passing them to a given backend.

Here is a diagram illustrating a test running in an isolated environment:

An isolated testing environment
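
To make the walkthrough above more tangible, here is a heavily stripped-down shell sketch of the same idea (hypothetical names, IPs and ports; our real tests are written against our RSpec-based framework):

# 1. Put the service's production IP on the loopback interface.
ip addr add 10.1.0.5/32 dev lo

# 2. Start a mock backend on the port the local Nginx proxies to
#    (the port haproxy would occupy in production).
python3 -m http.server 8080 --bind 127.0.0.1 &

# 3. Call the service through its real DNS name; because that name resolves
#    to 10.1.0.5, the request lands on the local Nginx, not production.
curl -s http://service.example.com/healthcheck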

Acceptance Testing Walkthrough

When all of our tests have passed in our “isolated” environment, and we want to make sure our configurations work in a non-mock, physically “production-like” environment (or during our periodic acceptance test runs that must also run in a production mirroring environment), we use an “acceptance testing” mode. In this mode, we replace our mock server with a real production Haproxy load balancer instance talking to real production backends (or a subset of backends representing a real production application).

Here is what happens during an acceptance test for the same hypothetical service:

  1. We automatically assign the service’s production IP as an alias on a loopback interface.
  2. We start a dedicated production Haproxy instance, with a configuration pointing to production backend applications, and bind this dedicated haproxy instance to localhost. (This exactly mirrors what we do in production, where haproxy is always a dedicated localhost service).
  3. Our test performs a DNS resolution for the service name, receives 10.1.0.x as the IP of the service, connects to a local Nginx instance listening on 10.1.0.x (bound to a loopback interface), and performs a test call.
  4. Nginx, receiving a test request, performs whatever operations are defined and forwards it to a local Haproxy backend, which in turn sends the request to a production application instance. When a call is complete, the result is returned by Nginx to the test runner.
  5. The test runner performs all defined checks on the response and defines whether the call and response are identified as passing or failing the test.

Here is a diagram illustrating a test call made in an acceptance testing environment:

A test call within the acceptance testing environment


Using our edge web server testing framework for the past few years, we have been able to perform hundreds of high-risk changes in our production edge infrastructure without any significant incidents caused by the deployment of an untested configuration update. Our testing framework provides the assurances we need, such that we can make very dramatic changes to our web application edge routing (services that affect every production request) and be confident in our ability to introduce these changes safely.

We highly recommend that every engineering team tasked with building or operating complex edge server configurations adopt some level of testing that allows the team to iterate faster without fear of compromising these critical components.

by Oleksiy Kovyrin at April 28, 2018 08:38 PM

April 26, 2018

Cryptography Engineering

A few thoughts on Ray Ozzie’s “Clear” Proposal

Yesterday I happened upon a Wired piece by Steven Levy that covers Ray Ozzie’s proposal for “CLEAR”. I’m quoted at the end of the piece (saying nothing much), so I knew the piece was coming. But since many of the things I said to Levy were fairly skeptical — and most didn’t make it into the piece — I figured it might be worthwhile to say a few of them here.

Ozzie's proposal is effectively a key escrow system for encrypted phones. It's receiving attention now due to the fact that Ozzie has a stellar reputation in the industry, and due to the fact that it's been lauded by law enforcement (and some famous people like Bill Gates). Ozzie's idea is just the latest bit of news in this second edition of the "Crypto Wars", in which the FBI and various law enforcement agencies have been arguing for access to end-to-end encryption technologies — like phone storage and messaging — in the face of pretty strenuous opposition by (most of) the tech community.

In this post I’m going to sketch a few thoughts about Ozzie’s proposal, and about the debate in general. Since this is a cryptography blog, I’m mainly going to stick to the technical, and avoid the policy details (which are substantial). Also, since the full details of Ozzie’s proposal aren’t yet public — some are explained in the Levy piece and some in this patent — please forgive me if I get a few details wrong. I’ll gladly correct.

[Note: I’ve updated this post in several places in response to some feedback from Ray Ozzie. For the updated parts, look for the *. Also, Ozzie has posted some slides about his proposal.]

How to Encrypt a Phone

The Ozzie proposal doesn't try to tackle every form of encrypted data. Instead it focuses like a laser on the simple issue of encrypted phone storage. This is something that law enforcement has been extremely concerned about. It also represents the (relatively) low-hanging fruit of the crypto debate, for essentially two reasons: (1) there are only a few phone hardware manufacturers, and (2) access to an encrypted phone generally only takes place after law enforcement has gained physical access to it.

I’ve written about the details of encrypted phone storage in a couple of previous posts. A quick recap: most phone operating systems encrypt a large fraction of the data stored on your device. They do this using an encryption key that is (typically) derived from the user’s passcode. Many recent phones also strengthen this key by “tangling” it with secrets that are stored within the phone itself — typically with the assistance of a secure processor included in the phone. This further strengthens the device against simple password guessing attacks.
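
A toy sketch of the idea (heavily simplified, and not Apple's actual construction): stretch the passcode, then "tangle" the result with a per-device secret that only the secure processor knows.

import hashlib, hmac

def storage_key(passcode: str, device_secret: bytes) -> bytes:
    # Stretch the short passcode so off-device guessing is slow.
    passcode_key = hashlib.pbkdf2_hmac('sha256', passcode.encode(), b'per-device-salt', 200_000)
    # Mix in a secret that never leaves the secure hardware, so the key
    # cannot be brute-forced without the physical device cooperating.
    return hmac.new(device_secret, passcode_key, hashlib.sha256).digest()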

The upshot is that the FBI and local law enforcement have not — until very recently (more on that further below) — been able to obtain access to many of the phones they've obtained during investigation. This is due to the fact that, by making the encryption key a function of the user's passcode, manufacturers like Apple have effectively rendered themselves unable to assist law enforcement.

The Ozzie Escrow Proposal

Ozzie’s proposal is called “Clear”, and it’s fairly straightforward. Effectively, it calls for manufacturers (e.g., Apple) to deliberately put themselves back in the loop. To do this, Ozzie proposes a simple form of key escrow (or “passcode escrow”). I’m going to use Apple as our example in this discussion, but obviously the proposal will apply to other manufacturers as well.

Ozzie’s proposal works like this:

  1. Prior to manufacturing a phone, Apple will generate a public and secret “keypair” for some public key encryption scheme. They’ll install the public key into the phone, and keep the secret key in a “vault” where hopefully it will never be needed.
  2. When a user sets a new passcode onto their phone, the phone will encrypt a passcode under the Apple-provided public key. This won’t necessarily be the user’s passcode, but it will be an equivalent passcode that can unlock the phone.* It will store the encrypted result in the phone’s storage.
  3. In the unlikely event that the FBI (or police) obtain the phone and need to access its files, they’ll place the phone into some form of law enforcement recovery mode. Ozzie describes doing this with some special gesture, or “twist”. Alternatively, Ozzie says that Apple itself could do something more complicated, such as performing an interactive challenge/response with the phone in order to verify that it’s in the FBI’s possession.
  4. The phone will now hand the encrypted passcode to law enforcement. (In his patent, Ozzie suggests it might be displayed as a barcode on a screen.)
  5. The law enforcement agency will send this data to Apple, who will do a bunch of checks (to make sure this is a real phone and isn’t in the hands of criminals). Apple will access their secret key vault, and decrypt the passcode. They can then send this back to the FBI.
  6. Once the FBI enters this code, the phone will be “bricked”. Let me be more specific: Ozzie proposes that once activated, a secure chip inside the phone will now permanently “blow” several JTAG fuses monitored by the OS, placing the phone into a locked mode. By reading the value of those fuses as having been blown, the OS will never again overwrite its own storage, will never again talk to any network, and will become effectively unable to operate as a normal phone again.

When put into its essential form, this all seems pretty simple. That’s because it is. In fact, with the exception of the fancy “phone bricking” stuff in step (6), Ozzie’s proposal is a straightforward example of key escrow — a proposal that people have been making in various guises for many years. The devil is always in the details.
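
Setting aside the bricking step, the cryptographic core of steps 1, 2 and 5 is plain public-key encryption of an escrowed passcode. A toy sketch with the Python cryptography package (my own illustration of the idea, not Ozzie's specification):

from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import hashes

oaep = padding.OAEP(mgf=padding.MGF1(hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

# Step 1: the manufacturer generates a keypair; the private half goes into the vault.
vault_key = rsa.generate_private_key(public_exponent=65537, key_size=3072)
device_pubkey = vault_key.public_key()          # burned into the phone at manufacture

# Step 2: the phone escrows an unlock passcode under the manufacturer's public key.
escrowed = device_pubkey.encrypt(b'recovery-passcode-123456', oaep)

# Step 5: after vetting a request, the vault decrypts the passcode and hands it over.
assert vault_key.decrypt(escrowed, oaep) == b'recovery-passcode-123456'

Every bit of difficulty in the proposal lives outside this snippet: protecting the vault key for a billion devices, deciding which requests to honor, and detecting when the key has been stolen.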

A vault of secrets

If we picture how the Ozzie proposal will change things for phone manufacturers, the most obvious new element is the key vault. This is not a metaphor. It literally refers to a giant, ultra-secure vault that will have to be maintained individually by different phone manufacturers. The security of this vault is no laughing matter, because it will ultimately store the master encryption key(s) for every single device that manufacturer ever makes. For Apple alone, that’s about a billion active devices.

Does this vault sound like it might become a target for organized criminals and well-funded foreign intelligence agencies? If it sounds that way to you, then you’ve hit on one of the most challenging problems with deploying key escrow systems at this scale. Centralized key repositories — that can decrypt every phone in the world — are basically a magnet for the sort of attackers you absolutely don’t want to be forced to defend yourself against.

So let’s be clear. Ozzie’s proposal relies fundamentally on the ability of manufacturers to secure extremely valuable key material for a massive number of devices against the strongest and most resourceful attackers on the planet. And not just rich companies like Apple. We’re also talking about the companies that make inexpensive phones and have a thinner profit margin. We’re also talking about many foreign-owned companies like ZTE and Samsung. This is key material that will be subject to near-constant access by the manufacturer’s employees, who will have to access these keys regularly in order to satisfy what may be thousands of law enforcement access requests every month.

If ever a single attacker gains access to that vault and is able to extract a few "master" secret keys (Ozzie says that these master keys will be relatively small in size*), then the attackers will gain unencrypted access to every device in the world. Even better: if the attackers can do this surreptitiously, you'll never know they did it.

Now in fairness, this element of Ozzie's proposal isn't really new. In fact, this key storage issue is an inherent aspect of all massive-scale key escrow proposals. In the general case, the people who argue in favor of such proposals typically make two arguments:

  1. We already store lots of secret keys — for example, software signing keys — and things work out fine. So this isn't really a new thing.
  2. Hardware Security Modules.

Let’s take these one at a time.

It is certainly true that software manufacturers do store secret keys, with varying degrees of success. For example, many software manufacturers (including Apple) store secret keys that they use to sign software updates. These keys are generally locked up in various ways, and are accessed periodically in order to sign new software. In theory they can be stored in hardened vaults, with biometric access controls (as the vaults Ozzie describes would have to be.)

But this is pretty much where the similarity ends. You don’t have to be a technical genius to recognize that there’s a world of difference between a key that gets accessed once every month — and can be revoked if it’s discovered in the wild —  and a key that may be accessed dozens of times per day and will be effectively undetectable if it’s captured by a sophisticated adversary.

Moreover, signing keys leak all the time. The phenomenon is so common that journalists have given it a name: it’s called “Stuxnet-style code signing”. The name derives from the fact that the Stuxnet malware — the nation-state malware used to sabotage Iran’s nuclear program — was authenticated with valid code signing keys, many of which were (presumably) stolen from various software vendors. This practice hasn’t remained with nation states, unfortunately, and has now become common in retail malware.

The folks who argue in favor of key escrow proposals generally propose that these keys can be stored securely in special devices called Hardware Security Modules (HSMs). Many HSMs are quite solid. They are not magic, however, and they are certainly not up to the threat model that a massive-scale key escrow system would expose them to. Rather than being invulnerable, they continue to cough up vulnerabilities like this one. A single such vulnerability could be game-over for any key escrow system that used it.

In some follow up emails, Ozzie suggests that keys could be “rotated” periodically, ensuring that even after a key compromise the system could renew security eventually. He also emphasizes the security mechanisms (such as biometric access controls) that would be present in such a vault. I think that these are certainly valuable and necessary protections, but I’m not convinced that they would be sufficient.

Assume a secure processor

Let’s suppose for a second that an attacker does get access to the Apple (or Samsung, or ZTE) key vault. In the section above I addressed the likelihood of such an attack. Now let’s talk about the impact.

Ozzie’s proposal has one significant countermeasure against an attacker who wants to use these stolen keys to illegally spy on (access) your phone. Specifically, should an attacker attempt to illegally access your phone, the phone will be effectively destroyed. This doesn’t protect you from having your files read — that horse has fled the stable — but it should alert you to the fact that something fishy is going on. This is better than nothing.

This measure is pretty important, not only because it protects you against evil maid attacks. As far as I can tell, this protection is pretty much the only measure by which theft of the master decryption keys might ever be detected. So it had better work well.

The details of how this might work aren't very clear in Ozzie's patent, but the Wired article describes it in terms that appear to repeat Ozzie's presentation at Columbia University.

What Ozzie appears to describe here is a secure processor contained within every phone. This processor would be capable of securely and irreversibly enforcing that, once law enforcement has accessed a phone, that phone could no longer be placed into an operational state.

My concern with this part of Ozzie's proposal is fairly simple: this processor does not currently exist. To explain why, let me tell a story.

Back in 2013, Apple began installing a secure processor in each of their phones. While this secure processor (called the Secure Enclave Processor, or SEP) is not exactly the same as the one Ozzie proposes, the overall security architecture seems very similar.

One main goal of Apple's SEP was to limit the number of passcode guessing attempts that a user could make against a locked iPhone. In short, it was designed to keep track of each (failed) login attempt and keep a counter. If the number of attempts got too high, the SEP would make the user wait a while — in the best case — or actively destroy the phone's keys. This last protection is effectively identical to Ozzie's proposal. (With some modest differences: Ozzie proposes to "blow fuses" in the phone, rather than erasing a key; and he suggests that this event would be triggered by entry of a recovery passcode.*)

For several years, the SEP appeared to do its job fairly effectively. Then in 2017, everything went wrong. Two firms, Cellebrite and Grayshift, announced that they had products that effectively unlocked every single Apple phone, without any need to dismantle the phone. Digging into the details of this exploit, it seems very clear that both firms — working independently — have found software exploits that somehow disable the protections that are supposed to be offered by the SEP.

The cost of this exploit (to police and other law enforcement)? About $3,000-$5,000 per phone. Or (if you like to buy rather than rent) about $15,000. Also, just to add an element of comedy to the situation, the GrayKey source code appears to have recently been stolen. The attackers are extorting the company for two Bitcoin. Because 2018. (🤡👞)

Let me sum up my point, in case I'm not beating you about the head quite enough:

The richest and most sophisticated phone manufacturer in the entire world tried to build a processor that achieved goals similar to those Ozzie requires. And as of April 2018, after five years of trying, they have been unable to achieve this goal, a goal that is critical to the security of the Ozzie proposal as I understand it.

Now obviously the lack of a secure processor today doesn’t mean such a processor will never exist. However, let me propose a general rule: if your proposal fundamentally relies on a secure lock that nobody can ever break, then it’s on you to show me how to build that lock.


While this mainly concludes my notes on Ozzie's proposal, I want to conclude this post with a side note, a response to something I routinely hear from folks in the law enforcement community. This is the criticism that cryptographers are a bunch of naysayers who aren't trying to solve "one of the most fundamental problems of our time", and are instead just rejecting the problem with lazy claims that it "can't work".

As a researcher, my response to this is: phooey.

Cryptographers — myself most definitely included — love to solve crazy problems. We do this all the time. You want us to deploy a new cryptocurrency? No problem! Want us to build a system that conducts a sugar-beet auction using advanced multiparty computation techniques? Awesome. We’re there. No problem at all.

But there’s crazy and there’s crazy.

The reason so few of us are willing to bet on massive-scale key escrow systems is that we've thought about it and we don't think it will work. We've looked at the threat model, the usage model, and the quality of hardware and software that exists today. Our informed opinion is that there's no detection system for key theft, there's no renewability system, HSMs are terrifically vulnerable (and the companies are largely staffed with ex-intelligence employees), and insiders can be suborned. We're not going to put the data of a few billion people on the line in an environment where we believe with high probability that the system will fail.

Maybe that’s unreasonable. If so, I can live with that.

by Matthew Green at April 26, 2018 12:26 PM

April 25, 2018


Choria Progress Update

It's been a while since my previous update and quite a bit has happened since.

Choria Server

As previously mentioned, the Choria Server aims to replace mcollectived eventually. Thus far I have focussed on its registration subsystem, Golang-based MCollective RPC compatible agents, and being able to embed it into other software for IoT and management backplanes.

Over the last few weeks I learned that MCollective will no longer be shipped in Puppet Agent version 6, which is currently due around Fall 2018. This means we have to accelerate making Choria standalone in its own right.

A number of things have to happen to get there:

  • Choria Server should support Ruby agents
  • The Ruby libraries Choria Server needs either need to be embedded and placed dynamically or provided via a Gem
  • The Ruby client needs to be provided via a Gem
  • New locations for these Ruby parts are needed outside of AIO Ruby

Yesterday I released the first step in this direction: you can now replace mcollectived with Choria Server. For now I am marking this as a preview/beta feature while we deal with issues the community finds.

The way this works is that we provide a small shim that uses just enough of MCollective to get the RPC framework running – luckily this was initially developed as a MCollective plugin and it retained its quite separate code base. When the Go code needs to invoke a Ruby agent it calls the shim, and the shim in turn provides the result from the agent – in JSON format – back to Go.

This works for me with any agent I’ve tried it with and I am quite pleased with the results:

root     10820  0.0  1.1 1306584 47436 ?       Sl   13:50   0:06 /opt/puppetlabs/puppet/bin/ruby /opt/puppetlabs/puppet/bin/mcollectived

MCollective would of course load the entirety of Puppet as soon as any agent that uses Puppet is loaded – service, package, puppet – and so over time things only get worse. Here is Choria:

root     32396  0.0  0.5 296436  9732 ?        Ssl  16:07   0:03 /usr/sbin/choria server --config=/etc/choria/server.conf

I run a couple hundred thousand instances of this and this is what you get; it never really changes. This is because Choria spawns the Ruby code and that exits when done.

This has an unfortunate side effect: the service, package and puppet agents are around 1 second slower per invocation, because loading Puppet is really slow. Agents that do not load Puppet are only marginally slower.

irb(main):002:0> Benchmark.measure { require "puppet" }.real
=> 0.619865644723177

There is a page set up dedicated to the Beta that details how to run it and what to look out for.

JSON pure protocol

Some of the reasons for breakage that you might run into – like mco facts is not working now with Choria Server – is due to a hugely significant change in the background. Choria – both plugged into MCollective and Standalone – is JSON safe. The Ruby Plugin is optionally so (and off by default) but the Choria daemon only supports JSON.

Traditionally MCollective has used YAML on the wire. MCollective is quite old: when the foundation for this choice was laid down back in the early 2000s, JSON was really not that big a deal and XML was more important. Worse, MCollective has exposed Ruby-specific data types and YAML extensions on the wire, which have made creating cross-platform support nearly impossible.

YAML is also of course capable of carrying any object – which means some agents are just never going to be compatible with anything but Ruby. This was the case with the process agent but I fixed that before shipping it in Choria. It also essentially means YAML can instantiate things you might not have anticipated, which is how big security problems happen.

For quite some time now the Choria protocol has been defined and versioned, and JSON schemas are available. The protocol makes the separation between Payload, Security, Transport and Federation much clearer, and the protocol can now be carried by anything that can move JSON – Middleware, REST, SSH, Postal Doves are all capable of carrying Choria packets.

There is a separate Golang implementation of the protocol that is transport agnostic and the schemas are there. Version 1 of the protocol is a tad skewed to MCollective but Version 2 (not yet planned) will drop those shackles. A single Choria Server is capable of serving multiple versions of the network protocol and communicate with old and new clients.

With Golang being a statically typed language and having a really solid, completely compatible implementation of the protocol, making implementations for other languages like Python will not be hard. However, I think that long term the better option for other languages is still a capable REST gateway.

I did some POC work on a very lightweight protocol suitable for devices like Arduino, and our Federation Brokers will provide bridging between the two worlds. You'll be able to run mco rpc wallplug off; your client will talk the full Choria protocol, the wall plug might speak a super lightweight MQTT-based protocol, and you will not even know this.

There are some gotchas as a result of these changes, also captured in the Choria Server evaluation documentation. To resolve some of these I need to be much more aggressive with what I do to the MCollective libraries, something I can do once they are liberated out of Puppet Agent.

by R.I. Pienaar at April 25, 2018 08:25 AM

April 23, 2018

Vincent Bernat

A more privacy-friendly blog

When I started this blog, I embraced some free services, like Disqus or Google Analytics. These services are quite invasive for users’ privacy. Over the years, I have tried to correct this to reach a point where I do not rely on any “privacy-hostile” services.


Google Analytics is a ubiquitous way to get a powerful analytics solution for free. It's also a great way to provide data about your visitors to Google—also for free. There are self-hosted solutions like Matomo—previously Piwik.

I opted for a simpler solution: no analytics. It also enables me to think that my blog attracts thousands of visitors every day.


Google Fonts is a very popular font library and hosting service, which relies on the generic Google Privacy Policy. The google-webfonts-helper service makes it easy to self-host any font from Google Fonts. Moreover, with help from pyftsubset, I include only the characters used in this blog. The font files are lighter and more complete: no problem spelling “Antonín Dvořák”.

Videos🔗
  • Before: YouTube
  • After: self-hosted

Some articles are supported by a video (like “OPL2LPT: an AdLib sound card for the parallel port”). In the past, I was using YouTube, mostly because it was the only free platform with an option to disable ads. Streaming on-demand videos is usually deemed quite difficult. For example, if you just use the <video> tag, you may push too big a video to people with a slow connection. However, it is not that hard, thanks to hls.js, which makes it possible to deliver video sliced into segments available at different bitrates. Users with JavaScript disabled still get a progressive version of medium quality.

In “Self-hosted videos with HLS”, I explain this approach in more detail.
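
The client-side part is small. A simplified sketch with placeholder file names (see the linked article for the real setup):

<video id="player" controls preload="none">
  <!-- progressive MP4 fallback for browsers without JavaScript or MSE -->
  <source src="video-720p.mp4" type="video/mp4">
</video>
<script src="hls.js"></script>
<script>
  var video = document.getElementById("player");
  if (Hls.isSupported()) {
    var hls = new Hls();
    hls.loadSource("video.m3u8");   // playlist listing several bitrates
    hls.attachMedia(video);
  } else if (video.canPlayType("application/vnd.apple.mpegurl")) {
    video.src = "video.m3u8";       // Safari plays HLS natively
  }
</script>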


Disqus is a popular comment solution for static websites. They were recently acquired by Zeta Global, a marketing company whose business model is supported only by advertisements. On the technical side, Disqus also loads several hundred kilobytes of resources. Therefore, many websites load Disqus on demand. That's what I did. This doesn't solve the privacy problem, and I had the feeling people were less eager to leave a comment if they had to perform an additional action.

For some time, I thought about implementing my own comment system around Atom feeds. Each page would get its own feed of comments. A piece of Java­Script would turn these feeds into HTML and comments could still be read without Java­Script, thanks to the default rendering provided by browsers. People could also subscribe to these feeds: no need for mail notifications! The feeds would be served as static files and updated on new comments by a small piece of server-side code. Again, this could work without Javascript.

Day Planner by Fowl Language Comics
Fowl Language Comics: Day Planner or the real reason why I didn't code a new comment system.

I still think this is a great idea. But I didn’t feel like developing and maintaining a new comment system. There are several self-hosted alternatives, notably Isso and Commento. Isso is a bit more featureful, with notably an imperfect import from Disqus. Both are struggling with maintenance and are trying to become sustainable with a paid hosted version.1 Commento is more privacy-friendly as it doesn’t use cookies at all. However, cookies from Isso are not essential and can be filtered with nginx:

proxy_hide_header Set-Cookie;
proxy_hide_header X-Set-Cookie;
proxy_ignore_headers Set-Cookie;

In Isso, there is currently no mail notifications, but I have added an Atom feed for each comment thread.

Another option would have been to not provide comments anymore. However, I had some great contributions as comments in the past and I also think they can work as some kind of peer review for blog articles: they are a weak guarantee that the content is not totally wrong.

Search engine🔗

A way to provide a search engine for a personal blog is to provide a form for a public search engine, like Google. That’s what I did. I also slapped some Java­Script on top of that to make it look like not Google.

The solution here is easy: switch to DuckDuckGo, which lets you customize a bit the search experience:

<form id="lf-search" action="">
  <input type="hidden" name="kf" value="-1">
  <input type="hidden" name="kaf" value="1">
  <input type="hidden" name="k1" value="-1">
  <input type="hidden" name="sites" value="">
  <input type="submit" value="">
  <input type="text" name="q" value="" autocomplete="off" aria-label="Search">

The JavaScript part is also removed, as DuckDuckGo doesn't provide an API. As it is unlikely that more than three people will use the search engine in a year, it seems a good idea not to spend too much time on this non-essential feature.


  • Before: RSS feed
  • After: still RSS feed but also a MailChimp newsletter

Nowadays, RSS feeds are far less popular than they were before. I am still baffled as to why a technical audience wouldn’t use RSS, but some readers prefer to receive updates by mail.

MailChimp is a common solution to send newsletters. It provides a simple integration with RSS feeds to trigger a mail each time new items are added to the feed. From a privacy point of view, MailChimp seems a good citizen: data collection is mainly limited to the amount needed to operate the service. Privacy-conscious users can still avoid this service and use the RSS feed.

Less Java­Script🔗

  • Before: third-party Java­Script code
  • After: self-hosted Java­Script code

Many privacy-conscious people are disabling Java­Script or using extensions like uMatrix or NoScript. Except for comments, I was using Java­Script only for non-essential stuff:

For mathematical formulae, I have switched from MathJax to KaTeX. The latter is faster but also enables server-side rendering: it produces the same output regardless of the browser. Therefore, client-side JavaScript is not needed anymore.

For sidenotes, I have turned the Java­Script code doing the transformation into Python code, with pyquery. No more client-side Java­Script for this aspect either.
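
A minimal sketch of what such a build-time transformation with pyquery could look like (the markup, class names and attribute values here are hypothetical placeholders, not the actual ones used on this blog):

from pyquery import PyQuery as pq

def render_sidenotes(html):
    # Hypothetical markup: sidenotes are <span class="sidenote"> elements.
    # The point is to do at build time what the old client-side code did.
    doc = pq(html)
    for i, note in enumerate(doc("span.sidenote").items(), start=1):
        note.addClass("sidenote--rendered")
        note.attr("id", "sidenote-{}".format(i))
    return doc.outer_html()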

The remaining code is still here but is self-hosted.

Memento: CSP🔗

The HTTP Content-Security-Policy header controls the resources that a user agent is allowed to load for a given page. It is a safeguard and a memento for the external resources a site will use. Mine is moderately complex and shows what to expect from a privacy point of view:3

  default-src 'self' blob:;
  script-src  'self' blob:;
  object-src  'self';
  img-src     'self' data:;
  style-src   'self' 'unsafe-inline';
  font-src    'self' about: data:;
  worker-src  blob:;
  media-src   'self' blob:;
  connect-src 'self';
  frame-ancestors 'none';

I am quite happy having been able to reach this result. 😊

  1. For Isso, look at For Commento, look at↩︎

  2. You may have noticed I am a footnote sicko and use them all the time for pointless stuff. ↩︎

  3. I don’t have an issue with using a CDN like CloudFront: it is a paid service and Amazon AWS is not in the business of tracking users. ↩︎

by Vincent Bernat at April 23, 2018 08:01 AM

April 21, 2018

Cryptography Engineering

Wonk post: chosen ciphertext security in public-key encryption (Part 1)

In general I try to limit this blog to posts that focus on generally-applicable techniques in cryptography. That is, I don’t focus on the deeply wonky. But this post is going to be an exception. Specifically, I’m going to talk about a topic that most “typical” implementers don’t — and shouldn’t — think about.

Specifically: I’m going to talk about various techniques for making public key encryption schemes chosen ciphertext secure. I see this as the kind of post that would have saved me ages of reading when I was a grad student, so I figured it wouldn’t hurt to write it all down.

Background: CCA(1/2) security

Early (classical) ciphers used a relatively weak model of security, if they used one at all. That is, the typical security model for an encryption scheme was something like the following:

  1. I generate an encryption key (or keypair for public-key encryption)
  2. I give you the encryption of some message of my choice
  3. You “win” if you can decrypt it

This is obviously not a great model in the real world, for several reasons. First off, in some cases the attacker knows a lot about the message to be decrypted. For example: it may come from a small space (like a set of playing cards). For this reason we require a stronger definition like “semantic security” that assumes the attacker can choose the plaintext distribution, and can also obtain the encryption of messages of his/her own choice. I’ve written more about this here.

More relevant to this post, another limitation of the above game is that — in some real-world examples — the attacker has even more power. That is: in addition to obtaining the encryption of chosen plaintexts, they may be able to convince the secret keyholder to decrypt chosen ciphertexts of their choice.

The latter attack is called a chosen-ciphertext (CCA) attack.

At first blush this seems like a really stupid model. If you can ask the keyholder to decrypt chosen ciphertexts, then isn’t the scheme just obviously broken? Can’t you just decrypt anything you want?

The answer, it turns out, is that there are many real-life examples where the attacker has decryption capability, but the scheme isn’t obviously broken. For example:

  1. Sometimes an attacker can decrypt a limited set of ciphertexts (for example, because someone leaves the decryption machine unattended at lunchtime.) The question then is whether they can learn enough from this access to decrypt other ciphertexts that are generated after she loses access to the decryption machine — for example, messages that are encrypted after the operator comes back from lunch.
  2. Sometimes an attacker can submit any ciphertext she wants — but will only obtain a partial decryption of the ciphertext. For example, she might learn only a single bit of information such as “did this ciphertext decrypt correctly”. The question, then, is whether she can leverage this tiny amount of data to fully decrypt some ciphertext of her choosing.

The first example is generally called a “non-adaptive” chosen ciphertext attack, or a CCA1 attack (and sometimes, historically, a “lunchtime” attack). There are a few encryption schemes that totally fall apart under this attack — the most famous textbook example is Rabin’s public key encryption scheme, which allows you to recover the full secret key from just a single chosen-ciphertext decryption.

The more powerful second example is generally referred to as an “adaptive” chosen ciphertext attack, or a CCA2 attack. The term refers to the idea that the attacker can select the ciphertexts they try to decrypt based on seeing a specific ciphertext that they want to attack, and by seeing the answers to specific decryption queries.

In this article we’re going to use the more powerful “adaptive” (CCA2) definition, because that subsumes the CCA1 definition. We’re also going to focus primarily on public-key encryption.

With this in mind, here is the intuitive definition of the experiment we want a CCA2 public-key encryption scheme to be able to survive:

  1. I generate an encryption keypair for a public-key scheme and give you the public key.
  2. You can send me (sequentially and adaptively) many ciphertexts, which I will decrypt with my secret key. I’ll give you the result of each decryption.
  3. Eventually you’ll send me a pair of messages (of equal length) M_0, M_1 and I’ll pick a bit b at random, and return to you the encryption of M_b, which I will denote as C^* \leftarrow {\sf Encrypt}(pk, M_b).
  4. You’ll repeat step (2), sending me ciphertexts to decrypt. If you send me C^* I’ll reject your attempt. But I’ll decrypt any other ciphertext you send me, even if it’s only slightly different from C^*.
  5. The attacker outputs their guess b'. They “win” the game if b'=b.

We say that our scheme is secure if the attacker cannot win with significantly greater probability than they would by simply guessing b' at random. Since they can win this game with probability 1/2 just by guessing randomly, that means we want (Probability attacker wins the game) – 1/2 to be “very small” (typically a negligible function of the security parameter).
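
One common way to write this requirement down, in the same rough LaTeX notation the post already uses (my phrasing, not necessarily the author's exact formalization), is as a bound on the attacker's advantage:

\left| \Pr[b' = b] - \tfrac{1}{2} \right| \leq {\sf negl}(\lambda)

where \lambda is the security parameter and {\sf negl} denotes a negligible function.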

You should notice two things about this definition. First, it gives the attacker the full decryption of any ciphertext they send me. This is obviously much more powerful than just giving the attacker a single bit of information, as we mentioned in the example further above. But note that powerful is good. If our scheme can remain secure in this powerful experiment, then clearly it will be secure in a setting where the attacker gets strictly less information from each decryption query.

The second thing you should notice is that we impose a single extra condition in step (4), namely that the attacker cannot ask us to decrypt C^*. We do this only to prevent the game from being “trivial” — if we did not impose this requirement, the attacker could always just hand us back C^* to decrypt, and they would always learn the value of b.

(Notice as well that we do not give the attacker the ability to request encryptions of chosen plaintexts. We don’t need to do that in the public key encryption version of this game, because we’re focusing exclusively on public-key encryption here — since the attacker has the public key, she can encrypt anything she wants without my help.)

With definitions out of the way, let’s talk a bit about how we achieve CCA2 security in real schemes.

A quick detour: symmetric encryption

This post is mainly going to focus on public-key encryption, because that’s actually the problem that’s challenging and interesting to solve. It turns out that achieving CCA2 for symmetric-key encryption is really easy. Let me briefly explain why this is, and why the same ideas don’t work for public-key encryption.

(To explain this, we’ll need to slightly tweak the CCA2 definition above to make it work in the symmetric setting. The changes here are small: we won’t give the attacker a public key in step (1), and at steps (2) and (4) we will allow the attacker to request the encryption of chosen plaintexts as well as the decryption.)

The first observation is that many common encryption schemes — particularly, the widely-used cipher modes of operation like CBC and CTR — are semantically secure in a model where the attacker does not have the ability to decrypt chosen ciphertexts. However, these same schemes break completely in the CCA2 model.

The simple reason for this is ciphertext malleability. Take CTR mode, which is particularly easy to mess with. Say we've obtained the challenge ciphertext C^* at step (3) (recall that C^* is the encryption of M_b). It's trivially easy to “maul” that ciphertext — simply by flipping, say, a bit of the message (i.e., XORing it with “1”). This gives us a new ciphertext C' = C^* \oplus 1 that the rules of the game allow us to submit for decryption; we obtain M_b \oplus 1, which we can use to figure out b.
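
If you want to see this concretely, here is a small Python sketch of the attack using the pyca/cryptography package; the key, nonce and message are made-up illustration values, not anything from a real protocol:

import os
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key, nonce = os.urandom(16), os.urandom(16)
msg = b"PAY 100 DOLLARS!"

enc = Cipher(algorithms.AES(key), modes.CTR(nonce), backend=default_backend()).encryptor()
ct = enc.update(msg) + enc.finalize()

# Maul the ciphertext: flip one bit of byte 4 ('1' becomes '9' in the plaintext)
mauled = bytearray(ct)
mauled[4] ^= 0x08

dec = Cipher(algorithms.AES(key), modes.CTR(nonce), backend=default_backend()).decryptor()
print(dec.update(bytes(mauled)) + dec.finalize())  # b'PAY 900 DOLLARS!'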

(A related, but “real world” variant of this attack is Vaudenay’s Padding Oracle Attack, which breaks actual implementations of symmetric-key cryptosystems. Here’s one we did against Apple iMessage. Here’s an older one on XML encryption.)

So how do we fix this problem? The straightforward observation is that we need to prevent the attacker from mauling the ciphertext C^*. The generic approach to doing this is to modify the encryption scheme so that it includes a Message Authentication Code (MAC) tag computed over every CTR-mode ciphertext. The key for this MAC scheme is generated by the encrypting party (me) and kept with the encryption key. When asked to decrypt a ciphertext, the decryptor first checks whether the MAC is valid. If it’s not, the decryption routine will output “ERROR”. Assuming an appropriate MAC scheme, the attacker can’t modify the ciphertext (including the MAC) without causing the decryption to fail and produce a useless result.

So in short: in the symmetric encryption setting, the answer to CCA2 security is simply for the encrypting parties to authenticate each ciphertext using a secret authentication (MAC) key they generate. Since we’re talking about symmetric encryption, that extra (secret) authentication key can be generated and stored with the decryption key. (Some more efficient schemes make this all work with a single key, but that’s just an engineering convenience.) Everything works out fine.
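
As a rough sketch of that recipe in Python, using the standard library's hmac module (key management is elided; this only shows the shape of the check, not a production design):

import hmac, hashlib, os

mac_key = os.urandom(32)  # kept alongside the symmetric encryption key

def seal(ciphertext):
    # Append a MAC tag computed over the whole ciphertext
    tag = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    return ciphertext + tag

def check_and_strip(blob):
    ciphertext, tag = blob[:-32], blob[-32:]
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("ERROR")  # refuse to decrypt a mauled ciphertext
    return ciphertext  # safe to hand to the actual decryption routine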

So now we get to the big question.

CCA security is easy in symmetric encryption. Why can’t we just do the same thing for public-key encryption?

As we saw above, it turns out that strong authenticated encryption is sufficient to get CCA(2) security in the world of symmetric encryption. Sadly, when you try this same idea generically in public key encryption, it doesn’t always work. There’s a short reason for this, and a long one. The short version is: it matters who is doing the encryption.

Let’s focus on the critical difference. In the symmetric CCA2 game above, there is exactly one person who is able to (legitimately) encrypt ciphertexts. That person is me. To put it more clearly: the person who performs the legitimate encryption operations (and has the secret key) is also the same person who is performing decryption.

Even if the encryptor and decryptor aren’t literally the same person, the encryptor still has to be honest. (To see why this has to be the case, remember that the encryptor holds the shared secret key! If that party were a bad guy, then the whole scheme would be broken, since they could just hand the secret key over to the bad guys.) And once you’ve made the stipulation that the encryptor is honest, then you’re almost all the way there. It suffices simply to add some kind of authentication (a MAC or a signature) to any ciphertext she encrypts. At that point the decryptor only needs to determine whether any given ciphertext actually came from the (honest) encryptor, and avoid decrypting the bad ones. You’re done.

Public key encryption (PKE) fundamentally breaks all these assumptions.

In a public-key encryption scheme, the main idea is that anyone can encrypt a message to you, once they get a copy of your public key. The encryption algorithm may sometimes be run by good, honest people. But it can also be run by malicious people. It can be run by parties who are adversarial. The decryptor has to be able to deal with all of those cases. One can’t simply assume that the “real” encryptor is honest.

Let me give a concrete example of how this can hurt you. A couple of years ago I wrote a post about flaws in Apple iMessage, which (at the time) used a simple authenticated (public key) encryption scheme. The basic iMessage encryption algorithm used public key encryption (actually a combination of RSA with some AES thrown in for efficiency) so that anyone could encrypt a message to my key. For authenticity, it required that every message be signed with an ECDSA signature by the sender.

When I received a message, I would look up the sender’s public key and first make sure the signature was valid. This would prevent bad guys from tampering with the message in flight — e.g., executing nasty stuff like adaptive chosen ciphertext attacks. If you squint a little, this is almost exactly a direct translation of the symmetric crypto approach we discussed above. We’re simply swapping the MAC for a digital signature.

The problems with this scheme start to become apparent when we consider that there might be multiple people sending me ciphertexts. Let’s say the adversary is on the communication path and intercepts a signed message from you to me. They want to change (i.e., maul) the message so that they can execute some kind of clever attack. Well, it turns out this is simple. They simply rip off the honest signature and replace it with one they make themselves:


The new message is identical, but now appears to come from a different person (the attacker). Since the attacker has their own signing key, they can maul the encrypted message as much as they want, and sign new versions of that message. If you plug this attack into (a version) of the public-key CCA2 game up top, you see they’ll win quite easily. All they have to do is modify the challenge ciphertext C^* at step (4) to be signed with their own signing key, then they can change it by munging with the CTR mode encryption, and request the decryption of that ciphertext.

Of course, if I only accept messages signed by some original (guaranteed-to-be-honest) sender, this scheme might work out fine. But that’s not the point of public key encryption. In a real public-key scheme — like the one Apple iMessage was trying to build — I should be able to (safely) decrypt messages from anyone, and in that setting this naive scheme breaks down pretty badly.


Ok, this post has gotten a bit long, and so far I haven’t actually gotten to the various “tricks” for adding chosen ciphertext security to real public key encryption schemes. That will have to wait until the next post, to come shortly.

by Matthew Green at April 21, 2018 03:40 PM

Vincent Bernat

OPL2 Audio Board: an AdLib sound card for Arduino

In a previous article, I presented the OPL2LPT, a sound card for the parallel port featuring a Yamaha YM3812 chip, also known as OPL2—the chip of the AdLib sound card. The OPL2 Audio Board for Arduino is another indie sound card using this chip. However, instead of relying on a parallel port, it uses a serial interface, which can be driven from an Arduino board or a Raspberry Pi. While the OPL2LPT targets retrogamers with real hardware, the OPL2 Audio Board cannot be used in the same way. Nonetheless, it can also be operated from ScummVM and DOSBox!

The OPL2 Audio Board over a “Grim Fandango” box.


The OPL2 Audio Board can be purchased on Tindie, either as a kit or fully assembled. I have paired it with a cheap clone of the Arduino Nano. A library to drive the board is available on GitHub, along with some examples.

One of them is DemoTune.ino. It plays a short tune on three channels. It can be compiled and uploaded to the Arduino with PlatformIO—installable with pip install platformio—using the following command:1

$ platformio ci \
    --board nanoatmega328 \
    --lib ../../src \
    --project-option="targets=upload" \
    --project-option="upload_port=/dev/ttyUSB0" \
PLATFORM: Atmel AVR > Arduino Nano ATmega328
Converting DemoTune.ino
Configuring upload protocol...
AVAILABLE: arduino
CURRENT: upload_protocol = arduino
Looking for upload port...
Use manually specified: /dev/ttyUSB0
Uploading .pioenvs/nanoatmega328/firmware.hex
avrdude: 6618 bytes of flash written
===== [SUCCESS] Took 5.94 seconds =====

Immediately after the upload, the Arduino plays the tune. 🎶

The next interesting example is SerialIface.ino. It turns the audio board into a sound card over the serial port. Once the code has been pushed to the Arduino, you can use the program in the same directory to play VGM files. VGM is a sample-accurate sound format for many sound chips: it logs the exact commands sent to the chip. There are many such files on VGMRips. Be sure to choose the ones for the YM3812/OPL2! Here is a small selection:

The OPL2 Audio Board playing some VGM files. It is connected to an Arduino Nano. You can see the LEDs blinking when the Arduino receives the commands from the serial port.

Usage with DOSBox & ScummVM🔗


The support for the serial protocol used in this section has not been merged yet. In the meantime, grab SerialIface.ino from the pull request: git checkout 50e1717.

When the Arduino is flashed with SerialIface.ino, the board can be driven through a simple protocol over the serial port. By patching DOSBox and ScummVM, we can make them use this unusual sound card. Here are some examples of games:

  • 0:00, with DOSBox, the first level of Doom 🎮
  • 1:06, with DOSBox, the introduction of Loom 🎼
  • 2:38, with DOSBox, the first level of Lemmings 🐹
  • 3:32, with DOSBox, the introduction of Legend of Kyrandia 🃏
  • 6:47, with ScummVM, the introduction of Day of the Tentacle ☢️
  • 11:10, with DOSBox, the introduction of Another World2 🐅


The serial protocol is described in the SerialIface.ino file:

 * A very simple serial protocol is used.
 * - Initial 3-way handshake to overcome reset delay / serial noise issues.
 * - 5-byte binary commands to write registers.
 *   - (uint8)  OPL2 register address
 *   - (uint8)  OPL2 register data
 *   - (int16)  delay (milliseconds); negative -> pre-delay; positive -> post-delay
 *   - (uint8)  delay (microseconds / 4)
 * Example session:
 * Arduino: HLO!
 * PC:      BUF?
 * Arduino: 256 (switches to binary mode)
 * PC:      0xb80a014f02 (write OPL register and delay)
 * Arduino: k
 * A variant of this protocol is available without the delays. In this
 * case, the BUF? command should be sent as B0F? The binary protocol
 * is now using 2-byte binary commands:
 *   - (uint8)  OPL2 register address
 *   - (uint8)  OPL2 register data
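
For a feel of what the delay-less variant looks like from the PC side, here is a rough Python sketch using pyserial; the port name, the exact handshake details and the register values are placeholders for illustration, not taken from the actual play script:

import serial

ser = serial.Serial("/dev/ttyUSB0", 115200, timeout=1)

# 3-way handshake: the Arduino announces itself, we ask for the delay-less
# binary mode, and it answers with its receive buffer size.
hello = ser.read(4)            # expect b"HLO!" from the Arduino after reset
ser.write(b"B0F?")             # select the 2-byte binary protocol
print(ser.readline())          # buffer size announced by the Arduino

def write_reg(address, data):
    # One 2-byte command: OPL2 register address, then register data
    ser.write(bytes([address, data]))

write_reg(0x20, 0x01)          # placeholder register write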

Adding support for this protocol in DOSBox is relatively simple (patch). For best performance, we use the 2-byte variant (5000 ops/s). The binary commands are pipelined and a dedicated thread collects the acknowledgments. A semaphore captures the number of free slots in the receive buffer. As it is not possible to read registers, we rely on DOSBox to emulate the timers, which are mostly used to let the various games detect the OPL2.

The patch is tested only on Linux but should work on any POSIX system—not Windows. To test it, you need to build DOSBox from source:

$ sudo apt build-dep dosbox
$ git clone -b feature/opl2audioboard
$ cd dosbox
$ ./
$ ./configure && make

Replace the sblaster section of ~/.dosbox/dosbox-SVN.conf:


Then, run DOSBox with ./src/dosbox. That’s it!

You will likely get the “OPL2Arduino: too slow, consider increasing buffer” message a lot. To fix this, you need to recompile SerialIface.ino with a bigger receive buffer:

$ platformio ci \
    --board nanoatmega328 \
    --lib ../../src \
    --project-option="targets=upload" \
    --project-option="upload_port=/dev/ttyUSB0" \
    --project-option="build_flags=-DSERIAL_RX_BUFFER_SIZE=512" \


The same code can be adapted for ScummVM (patch). To test, build it from source:

$ sudo apt build-dep scummvm
$ git clone -b feature/opl2audioboard
$ cd scummvm
$ ./configure --disable-all-engines --enable-engine=scumm && make

Then, you can start ScummVM with ./scummvm. Select “AdLib Emulator” as the music device and “OPL2 Arduino” as the AdLib emulator.3 Like for DOSBox, watch the console to check if you need a larger receive buffer.

Enjoy! 😍

  1. This command is valid for an Arduino Nano. For another board, take a look at the output of platformio boards arduino↩︎

  2. Another World (also known as Out of This World), released in 1991, designed by Éric Chahi, uses sampled sounds at 5 kHz or 10 kHz. With a serial port operating at 115,200 bits/s, the 5 kHz option is just within our reach. However, I have no idea if the rendering is faithful. It doesn’t sound like a SoundBlaster, but it sounds analogous to the rendering of the OPL2LPT, which sounds similar to the SoundBlaster when using the 10 kHz option. DOSBox’ AdLib emulation using Nuked OPL3—which is considered to be the best—sounds worse. ↩︎

  3. If you need to specify a serial port other than /dev/ttyUSB0, add a line opl2arduino_device= in the ~/.scummvmrc configuration file. ↩︎

by Vincent Bernat at April 21, 2018 09:19 AM

April 20, 2018

Sarah Allen

false dichotomy of control vs sharing

Email is the killer app of the Internet. Amidst many sharing and collaboration applications and services, most of us frequently fall back to email. Marc Stiegler suggests that email often “just works better”. Why is this?

Digital communication is fast across distances and allows access to incredible volumes of information, yet digital access controls typically force us into a false dichotomy of control vs sharing.

Looking at physical models of sharing and access control, we can see that we already have well-established models where we can give up control temporarily, yet not completely.

Alan Karp illustrated this nicely at last week’s Internet Identity Workshop (IIW) in a quick anecdote:

Marc gave me the key to his car so I could park it in my garage. I couldn’t do it, so I gave the key to my kid, and asked my neighbor to do it for me. She stopped by my house, got the key and used it to park Marc’s car in my garage.

The car key scenario is clear. In addition to possession of the key, there’s even another layer of control — if my kid doesn’t have a driver’s license, then he can’t drive the car, even if he holds the key.

When we translate this story to our modern digital realm, it sounds crazy:

Marc gave me his password so I could copy a file from his computer to mine. I couldn’t do it, so I gave Marc’s password to my kid, and asked my neighbor to do it for me. She stopped by my house so my kid could tell her my password, and then she used it to copy the file from Marc’s computer to mine.

After the conference, I read Marc Stiegler’s 2009 paper Rich Sharing for the Web, which details key features of sharing that we have in the real world and that are illustrated in the anecdote Alan so effectively rattled off.

These 6 features (enumerated below) enable people to create networks of access rights that implement the Principle of Least Authority (POLA). The key is to limit how much you need to trust someone before sharing. “Systems that do not implement these 6 features will feel rigid and inadequately functional once enough users are involved, forcing the users to seek alternate means to work around the limitations in those applications.”

  1. Dynamic: I can grant access quickly and effortlessly (without involving an administrator).
  2. Attenuated: To give you permission to do or see one thing, I don’t have to give you permission to do everything. (e.g. valet key allows driving, but not access to the trunk)
  3. Chained: Authority may be delegated (and re-delegated).
  4. Composable: I have permission to drive a car from the State of California, and Marc’s car key. I require both permissions together to drive the car.
  5. Cross-jurisdiction: There are three families involved, each with its own policies, yet there’s no need to communicate policies to another jurisdiction. In the example, I didn’t need to ask Marc to change his policy to grant my neighbor permission to drive his car.
  6. Accountable: If Marc finds a new scratch on his car, he knows to ask me to pay for the repair. It’s up to me to collect from my neighbor. Digital access control systems will typically record who did which action, but don’t record who asked an administrator to grant permission.

Note: Accountability is not always directly linked to delegation. Marc would likely hold me accountable if his car got scratched, even if my neighbor had damaged the car when parking it in the garage. Whereas, if it isn’t my garage, but rather a repair shop where my neighbor drops off the car for Marc, then if the repair shop damages the car, Marc would hold them responsible.

How does this work for email?

The following examples from Marc’s paper were edited for brevity:

  • Dynamic: You can send email to anyone any time.
  • Attenuated: When I email you an attachment, I’m sending a read-only copy. You don’t have access to my whole hard drive and you don’t expect that modifying it will change my copy.
  • Chained: I can forward you an email. You can then forward it to someone else.
  • Cross-Domain: I can send email to people at other companies and organizations without needing permission from their IT dept.
  • Composable: I can include an attachment from email originating at one company with text or another attachment from another email and send it to whoever I want.
  • Accountable: If Alice asks Bob to edit a file and email it back, and Bob asks Carol to edit the file, and Bob then emails it back, Alice will hold Bob responsible if the edits are erroneous. If Carol (whom Alice may not know) emails her result directly to Alice, either Alice will ask Carol who she is before accepting the changes, or if Carol includes the history of messages in the message, Alice will directly see, once again, that she should hold Bob responsible.

Further reading

Alan Karp’s IoT Position Paper compares several sharing tools across these 6 features and also discusses ZBAC (authoriZation-Based Access Control) where authorization is known as a “capability.” An object capability is an unforgeable token that both designates a resource and grants permission to access it.

by sarah at April 20, 2018 12:57 PM

April 19, 2018

Steve Kemp's Blog

A filesystem for known_hosts

The other day I had an idea that wouldn't go away, a filesystem that exported the contents of ~/.ssh/known_hosts.

I can't think of a single useful use for it, beyond simple shell-scripting, and yet I couldn't resist.

 $ go get -u
 $ go install

Now make it work:

 $ mkdir ~/knownfs
 $ knownfs ~/knownfs

Beneath our mount-point we can expect one directory for each known-host. So we'll see entries:

 ~/knownfs $ ls | grep \.vpn

 ~/knownfs $ ls | grep steve

The host-specific entries will each contain a single file, fingerprint, with the fingerprint of the remote host:

 ~/knownfs $ cd
 ~/knownfs/ $ ls
 frodo ~/knownfs/ $ cat fingerprint

I've used it in a few shell-loops to run commands against hosts matching a pattern, but beyond that I'm struggling to think of a use for it.

If you like the idea I guess have a play:

It was perhaps more useful and productive than my other recent work - which involves porting an existing network-testing program from Ruby to golang, and in the process making it much more uniform and self-consistent.

The resulting network tester is pretty good, and can now notify via MQ to provide better decoupling too. The downside is of course that nobody changes network-testing solutions on a whim, and so these things are basically always in-house only.

April 19, 2018 09:01 AM

April 11, 2018

Steve Kemp's Blog

Bread and data

For the past two weeks I've mostly been baking bread. I'm not sure what made me decide to make some the first time, but it actually turned out pretty good, so I've been baking every day or two ever since.

This is the first time I've made bread in the past 20 years or so - I recall in the past I got frustrated that it never rose, or didn't turn out well. I can't see that I'm doing anything differently, so I'll just write it off as younger-Steve being daft!

No doubt I'll get bored of the delicious bread in the future, but for the moment I've got a good routine going - juggling going to the shops, child-care, and making bread.

Bread I've made includes the following:

Beyond that I've spent a little while writing a simple utility to embed resources in golang projects, after discovering the tool I'd previously been using, go-bindata, had been abandoned.

In short you feed it a directory of files and it will generate a file static.go with contents like this:

files[ "data/index.html" ] = "<html>....
files[ "data/robots.txt" ] = "User-Agent: * ..."

It's a bit more complex than that, but not much. As expected, getting the embedded data at runtime is trivial, and it allows you to distribute a single binary even if you want/need some configuration files, templates, or media to run.

For example in the project I discussed in my previous post there is a HTTP-server which serves a user-interface based upon bootstrap. I want the HTML-files which make up that user-interface to be embedded in the binary, rather than distributing them separately.

Anyway it's not unique, it was a fun experience writing, and I've switched to using it now:

April 11, 2018 09:01 AM

April 08, 2018

Multi-git-status now shows branches with no upstream

Just a quick update on Multi-git-status. It now also shows branches with no upstream. These are typically branches created locally that haven't been configured to track a local or remote branch. Any changes in those branches are lost when the repo is removed from your machine. Additionally, multi-git-status now handles branches with slashes in them properly. For example, "feature/loginscreen". Here's how the output looks now:


You can get multi-git-status from the Github page.

by admin at April 08, 2018 06:22 PM

April 07, 2018

Sarah Allen

zero-knowledge proof: trust without shared secrets

In cryptography we typically share a secret which allows us to decrypt future messages. Commonly this is a password that I make up and submit to a Web site, then later produce to verify I am the same person.

I missed Kazue Sako’s Zero Knowledge Proofs 101 presentation at IIW last week, but Rachel Myers shared an impressively simple retelling in the car on the way back to San Francisco, which inspired me to read the notes and review the proof for myself. I’ve attempted to reproduce this simple explanation below, also noting additional sources and related articles.

Zero Knowledge Proofs (ZKPs) are very useful when applied to internet identity — with an interactive exchange you can prove you know a secret without actually revealing the secret.

Understanding Zero Knowledge Proofs with simple math:

x -> f(x)

Simple one way function. Easy to go one way from x to f(x) but mathematically hard to go from f(x) to x.

The most common example is a hash function. Wired: What is Password Hashing? provides an accessible introduction to why hash functions are important to cryptographic applications today.

f(x) = g ^ x mod p

Known(public): g, p
* g is a constant
* p has to be prime

Easy to know x and compute g ^ x mod p but difficult to do in reverse.
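
In Python, the asymmetry is easy to see with toy numbers (real systems use very large primes, where the brute-force loop below becomes hopeless):

g, p = 5, 23
x = 6                      # the secret
fx = pow(g, x, p)          # easy direction: fast modular exponentiation
# The reverse direction is the discrete logarithm problem; here we can only brute-force it
assert x == next(i for i in range(p) if pow(g, i, p) == fx)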

Interactive Proof

Alice wants to prove to Bob that she knows x without giving any information about x. Bob already knows f(x). Alice can make f(x) public and then prove that she knows x through an interactive exchange with anyone on the Internet, in this case, Bob.

  1. Alice publishes f(x): g^x mod p
  2. Alice picks random number r
  3. Alice sends Bob u = g^r mod p
  4. Now Bob has an artifact based on that random number, but can’t actually calculate the random number
  5. Bob returns a challenge e. Either 0 or 1
  6. Alice responds with v:
    If 0, v = r
    If 1, v = r + x
  7. Bob can now calculate:
    If e == 0: Bob has the random number r, as well as the publicly known variables and can check if u == g^v mod p
    If e == 1: Bob can check that u*f(x) == g^v (mod p)

I believe step 6 is true based on Congruence of Powers, though I’m not sure that I’ve transcribed e==1 case accurately with my limited ascii representation.

If r is truly random, equally distributed between zero and (p-1), this does not leak any information about x, which is pretty neat, yet not sufficient.

In order to ensure that Alice cannot be impersonated, multiple iterations are required along with the use of large numbers (see IIW session notes).
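
To make the exchange concrete, here is a sketch of a single round in Python with toy parameters; as noted above, large numbers and many repetitions are needed in practice:

import secrets

g, p = 5, 23                             # toy parameters only

def one_round(x, fx):
    r = secrets.randbelow(p - 1)         # 2. Alice picks a random number r
    u = pow(g, r, p)                     # 3. Alice sends u = g^r mod p
    e = secrets.randbelow(2)             # 5. Bob's challenge: 0 or 1
    v = r if e == 0 else r + x           # 6. Alice's response
    if e == 0:                           # 7. Bob's check
        return pow(g, v, p) == u
    return pow(g, v, p) == (u * fx) % p

x = 6
fx = pow(g, x, p)                        # 1. Alice publishes f(x) = g^x mod p
assert all(one_round(x, fx) for _ in range(20))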

Further Reading

by sarah at April 07, 2018 05:52 PM

April 05, 2018

Marios Zindilis

A small web application with Angular5 and Django

Django works well as the back-end of an application that uses Angular5 in the front-end. In my attempt to learn Angular5 well enough to build a small proof-of-concept application, I couldn't find a simple working example of a combination of the two frameworks, so I created one. I called this the Pizza Maker. It's available on GitHub, and its documentation is in the README.

If you have any feedback for this, please open an issue on GitHub.

April 05, 2018 11:00 PM

April 03, 2018


Adding rich object data types to Puppet

Extending Puppet using types, providers, facts and functions is well known and widely done. Something new is how to add entirely new data types to the Puppet DSL to create entirely new language behaviours.

I’ve done a bunch of this recently with the Choria Playbooks and some other fun experiments, today I’ll walk through building a small network wide spec system using the Puppet DSL.


A quick look at what we want to achieve here: I want to be able to do Choria RPC requests and assert their outcomes, I want to write tests using the Puppet DSL, and they should run on a specially prepared environment. In my case I have an AWS environment with CentOS, Ubuntu, Debian and Archlinux machines:

Below I test the File Manager Agent:

  • Get status for a known file and make sure it finds the file
  • Create a brand new file, ensure it reports success
  • Verify that the file exist and is empty using the status action

cspec::suite("filemgr agent tests", $fail_fast, $report) |$suite| {
  # Checks an existing file
  $"Should get file details") |$t| {
    $results = choria::task("mcollective", _catch_errors => true,
      "action" => "filemgr.status",
      "nodes" => $nodes,
      "silent" => true,
      "fact_filter" => ["kernel=Linux"],
      "properties" => {
        "file" => "/etc/hosts"
    $results.each |$result| {
      $t.assert_task_data_equals($result, $result["data"]["present"], 1)
  # Make a new file and check it exists
  $"Should support touch") |$t| {
    $fname = sprintf("/tmp/filemgr.%s", strftime(Timestamp(), "%s"))
    $r1 = choria::task("mcollective", _catch_errors => true,
      "action" => "filemgr.touch",
      "nodes" => $nodes,
      "silent" => true,
      "fact_filter" => ["kernel=Linux"],
      "fail_ok" => true,
      "properties" => {
        "file" => $fname
    $r2 = choria::task("mcollective", _catch_errors => true,
      "action" => "filemgr.status",
      "nodes" => $nodes,
      "silent" => true,
      "fact_filter" => ["kernel=Linux"],
      "properties" => {
        "file" => $fname
    $r2.each |$result| {
      $t.assert_task_data_equals($result, $result["data"]["present"], 1)
      $t.assert_task_data_equals($result, $result["data"]["size"], 0)

I also want to be able to test other things like lets say discovery:

  cspec::suite("${method} discovery method", $fail_fast, $report) |$suite| {
    $"Should support a basic discovery") |$t| {
      $found = choria::discover(
        "discovery_method" => $method,
      $t.assert_equal($found.sort, $all_nodes.sort)

So we want to make a Spec like system that can drive Puppet Plans (aka Choria Playbooks) and do various assertions on the outcome.

We want to run it with mco playbook run and it should write a JSON report to disk with all suites, cases and assertions.

Adding a new Data Type to Puppet

I’ll show how to add the Cspec::Suite data Type to Puppet. This comes in 2 parts: You have to describe the Type that is exposed to Puppet and you have to provide a Ruby implementation of the Type.

Describing the Objects

Here we create the signature for Cspec::Suite:

# modules/cspec/lib/puppet/datatypes/cspec/suite.rb
Puppet::DataTypes.create_type("Cspec::Suite") do
  interface <<-PUPPET
    attributes => {
      "description" => String,
      "fail_fast" => Boolean,
      "report" => String
    },
    functions => {
      it => Callable[[String, Callable[Cspec::Case]], Any],
    }
  PUPPET

  load_file "puppet_x/cspec/suite"

  implementation_class PuppetX::Cspec::Suite
end

As you can see from the line of code cspec::suite(“filemgr agent tests”, $fail_fast, $report) |$suite| {….}, we pass 3 arguments: a description of the test, whether the test should fail immediately on any error or keep going, and where to write the report of the suite to. This corresponds to the attributes here. A function that will be shown later takes these and makes our instance.

We then have to add our it() function, which again takes a description and yields out a `Cspec::Case`; it returns any value.

When Puppet needs the implementation of this code it will call the Ruby class PuppetX::Cspec::Suite.

Here is the same for the Cspec::Case:

# modules/cspec/lib/puppet/datatypes/cspec/case.rb
Puppet::DataTypes.create_type("Cspec::Case") do
  interface <<-PUPPET
    attributes => {
      "description" => String,
      "suite" => Cspec::Suite
    },
    functions => {
      assert_equal => Callable[[Any, Any], Boolean],
      assert_task_success => Callable[[Choria::TaskResults], Boolean],
      assert_task_data_equals => Callable[[Choria::TaskResult, Any, Any], Boolean]
    }
  PUPPET

  load_file "puppet_x/cspec/case"

  implementation_class PuppetX::Cspec::Case
end

Adding the implementation

The implementation is a Ruby class that provides the logic we want. I won’t show the entire thing with reporting and everything, but you’ll get the basic idea:

# modules/cspec/lib/puppet_x/cspec/suite.rb
module PuppetX
  class Cspec
    class Suite
      # Puppet calls this method when it needs an instance of this type
      def self.from_asserted_hash(description, fail_fast, report)
        new(description, fail_fast, report)
      end

      attr_reader :description, :fail_fast

      def initialize(description, fail_fast, report)
        @description = description
        @fail_fast = !!fail_fast
        @report = report
        @testcases = []
      end

      # what puppet file and line the Puppet DSL is on
      def puppet_file_line
        fl = Puppet::Pops::PuppetStack.stacktrace[0]

        [fl[0], fl[1]]
      end

      def outcome
        {
          "testsuite" => @description,
          "testcases" => @testcases,
          "file" => puppet_file_line[0],
          "line" => puppet_file_line[1],
          "success" => @testcases.all?{|t| t["success"]}
        }
      end

      # Writes the memory state to disk, see outcome above
      def write_report
        # ...
      end

      def run_suite
        Puppet.notice(">>> Starting test suite: %s" % [@description])

        yield(self)

        Puppet.notice(">>> Completed test suite: %s" % [@description])

        write_report
      end

      def it(description, &blk)
        require_relative "case"

        t = Case.new(self, description)
        t.run(&blk)

        @testcases << t.outcome
      end
    end
  end
end

And here is the Cspec::Case:

# modules/cspec/lib/puppet_x/cspec/case.rb
module PuppetX
  class Cspec
    class Case
      # Puppet calls this to make instances
      def self.from_asserted_hash(suite, description)
        new(suite, description)
      end

      def initialize(suite, description)
        @suite = suite
        @description = description
        @assertions = []
        @start_location = puppet_file_line
      end

      # assert 2 things are equal and show sender etc in the output
      def assert_task_data_equals(result, left, right)
        if left == right
          success("assert_task_data_equals", "%s success" % result.host)
          return true
        end

        failure("assert_task_data_equals: %s" % result.host, "%s\n\n\tis not equal to\n\n %s" % [left, right])
      end

      # checks the outcome of a choria RPC request and make sure its fine
      def assert_task_success(results)
        if results.error_set.empty?
          success("assert_task_success:", "%d OK results" % results.count)
          return true
        end

        failure("assert_task_success:", "%d failures" % [results.error_set.count])
      end

      # assert 2 things are equal
      def assert_equal(left, right)
        if left == right
          success("assert_equal", "values matches")
          return true
        end

        failure("assert_equal", "%s\n\n\tis not equal to\n\n %s" % [left, right])
      end

      # the puppet .pp file and line Puppet is on
      def puppet_file_line
        fl = Puppet::Pops::PuppetStack.stacktrace[0]

        [fl[0], fl[1]]
      end

      # show a OK message, store the assertions that ran
      def success(what, message)
        @assertions << {
          "success" => true,
          "kind" => what,
          "file" => puppet_file_line[0],
          "line" => puppet_file_line[1],
          "message" => message
        }

        Puppet.notice("✔︎ %s: %s" % [what, message])
      end

      # show a Error message, store the assertions that ran
      def failure(what, message)
        @assertions << {
          "success" => false,
          "kind" => what,
          "file" => puppet_file_line[0],
          "line" => puppet_file_line[1],
          "message" => message
        }

        Puppet.err("✘ %s: %s" % [what, @description])

        raise(Puppet::Error, "Test case %s fast failed: %s" % [@description, what]) if @suite.fail_fast
      end

      # this will show up in the report JSON
      def outcome
        {
          "testcase" => @description,
          "assertions" => @assertions,
          "success" => @assertions.all? {|a| a["success"]},
          "file" => @start_location[0],
          "line" => @start_location[1]
        }
      end

      # invokes the test case
      def run
        Puppet.notice("==== Test case: %s" % [@description])

        # runs the puppet block
        yield(self)

        success("testcase", @description)
      end
    end
  end
end

Finally, I am going to need a little function to create the suite – the cspec::suite function really just creates an instance of PuppetX::Cspec::Suite for us.

# modules/cspec/lib/puppet/functions/cspec/suite.rb
Puppet::Functions.create_function(:"cspec::suite") do
  dispatch :handler do
    param "String", :description
    param "Boolean", :fail_fast
    param "String", :report
    block_param

    return_type "Cspec::Suite"
  end

  def handler(description, fail_fast, report, &blk)
    suite = PuppetX::Cspec::Suite.new(description, fail_fast, report)
    suite.run_suite(&blk)

    suite
  end
end

Bringing it together

So that’s about it. It’s very simple really: the code above is pretty basic stuff to achieve all of this, and I hacked it together in a day, basically.

Lets see how we turn these building blocks into a test suite.

I need an entry point that drives the suite – imagine I will have many different plans to run, one per agent, and that I want to do some pre and post run tasks etc.

plan cspec::suite (
  Boolean $fail_fast = false,
  Boolean $pre_post = true,
  Stdlib::Absolutepath $report,
  String $data
) {
  $ds = {
    "type"   => "file",
    "file"   => $data,
    "format" => "yaml"
  # initializes the report
  # force a puppet run everywhere so PuppetDB is up to date, disables Puppet, wait for them to finish
  if $pre_post {
    choria::run_playbook("cspec::pre_flight", ds => $ds)
  # Run our test suite
  choria::run_playbook("cspec::run_suites", _catch_errors => true,
    ds => $ds,
    fail_fast => $fail_fast,
    report => $report
    .choria::on_error |$err| {
      err("Test suite failed with a critical error: ${err.message}")
  # enables Puppet
  if $pre_post {
    choria::run_playbook("cspec::post_flight", ds => $ds)
  # reads the report from disk and creates a basic overview structure

Here’s the cspec::run_suites Playbook that takes data from a Choria data source and drives the suite dynamically:

plan cspec::run_suites (
  Hash $ds,
  Boolean $fail_fast = false,
  Stdlib::Absolutepath $report,
) {
  $suites = choria::data("suites", $ds)

  notice(sprintf("Running test suites: %s", $suites.join(", ")))

  choria::data("suites", $ds).each |$suite| {
    choria::run_playbook($suite,
      ds => $ds,
      fail_fast => $fail_fast,
      report => $report
    )
  }
}
And finally a YAML file defining the suite. This file describes my AWS environment that I use to do integration tests for Choria; you can see there's a bunch of other tests here in the suites list, and some of them will take data like what nodes to expect etc.

suites:
  - cspec::discovery
  - cspec::choria
  - cspec::agents::shell
  - cspec::agents::process
  - cspec::agents::filemgr
  - cspec::agents::nettest
choria.version: mcollective plugin 0.7.0
nettest.port: 8140
discovery.fact_filter: operatingsystem=CentOS


So this then is a rather quick walk-through of extending Puppet in ways many of us would not have seen before. I spent about a day getting this all working, which included figuring out a way to maintain the mutating report state internally etc; the outcome is a test suite I can run that will thoroughly drive a working 5-node network and assert the outcomes against real machines running real software.

I used to have a MCollective integration test suite, but I think this is a LOT nicer mainly due to the Choria Playbooks and extensibility of modern Puppet.

$ mco playbook run cspec::suite --data `pwd`/suite.yaml --report `pwd`/report.json

The current code for this is on GitHub along with some Terraform code to stand up a test environment, it’s a bit barren right now but I’ll add details in the next few weeks.

by R.I. Pienaar at April 03, 2018 09:23 AM

toolsmith #132 - The HELK vs APTSimulator - Part 2

Continuing where we left off in The HELK vs APTSimulator - Part 1, I will focus our attention on additional, useful HELK features to aid you in your threat hunting practice. HELK offers Apache Spark, GraphFrames, and Jupyter Notebooks as part of its lab offering. These capabilities scale well beyond a standard ELK stack; this really is where parallel computing and significantly improved processing and analytics truly take hold. This is a great way to introduce yourself to these technologies, all on a unified platform.

Let me break these down for you a little bit in case you haven't been exposed to these technologies yet. First and foremost, refer to @Cyb3rWard0g's wiki page on how he's designed it for his HELK implementation, as seen in Figure 1.
Figure 1: HELK Architecture
First, Apache Spark. For HELK, "Elasticsearch-hadoop provides native integration between Elasticsearch and Apache Spark, in the form of an RDD (Resilient Distributed Dataset) (or Pair RDD to be precise) that can read data from Elasticsearch." Per the Apache Spark FAQ, "Spark is a fast and general processing engine compatible with Hadoop data" to deliver "lightning-fast cluster computing."
Second, GraphFrames. From the GraphFrames overview, "GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs. GraphFrames represent graphs: vertices (e.g., users) and edges (e.g., relationships between users). GraphFrames also provide powerful tools for running queries and standard graph algorithms. With GraphFrames, you can easily search for patterns within graphs, find important vertices, and more." 
Finally, Jupyter Notebooks to pull it all together.
From "The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more." Jupyter Notebooks provide a higher order of analyst/analytics capabilities, if you haven't dipped your toe in that water, this may be your first, best opportunity.
Let's take a look at using Jupyter Notebooks with the data populated to my Docker-based HELK instance as implemented in Part 1. I repopulated my HELK instance with new data from a different, bare metal Windows instance reporting to HELK with Winlogbeat, Sysmon enabled, and looking mighty compromised thanks to @cyb3rops's APTSimulator.
To make use of Jupyter Notebooks, you need your JUPYTER CURRENT TOKEN to access the Jupyter Notebook web interface. It was presented to you when your HELK installation completed, but you can easily retrieve it via sudo docker logs helk-analytics, then copy and paste the URL into your browser to connect for the first time with a token. It will look like this,
http://localhost:8880/?token=3f46301da4cd20011391327647000e8006ee3574cab0b163, as described in the Installation wiki. After browsing to the URL with said token, you can begin at http://localhost:8880/lab, where you should immediately proceed to the Check_Spark_Graphframes_Integrations.ipynb notebook. It's found in the hierarchy menu under training > jupyter_notebooks > getting_started. This notebook is essential to confirming you're ingesting data properly with HELK and that its integrations are fully functioning. Step through it one cell at a time with the play button, allowing each task to complete so as to avoid errors. Remember the above mentioned Resilient Distributed Dataset? This notebook will create a Spark RDD on top of Elasticsearch using the logs-endpoint-winevent-sysmon-* (Sysmon logs) index as source, and do the same thing with the logs-endpoint-winevent-security-* (Window Security Event logs) index as source, as seen in Figure 2.
Figure 2: Windows Security EVT Spark RDD
The notebook will also query your Windows security events via Spark SQL, then print the schema with:
df ="org.elasticsearch.spark.sql").load("logs-endpoint-winevent-security-*/doc")
The result should resemble Figure 3.
Figure 3: Schema
Assuming all matches with relative consistency in your experiment, let's move on to the Sysmon_ProcessCreate_Graph.ipynb notebook, found in training > jupyter_notebooks. This notebook will again call on the Elasticsearch Sysmon index and create vertices and edges dataframes, then create a graph produced with GraphFrame built from those same vertices and edges. Here's a little walk-through.
The v parameter (yes, for vertices) is populated with:
v = df.withColumn("id", df.process_guid).select("id","user_name","host_name","process_parent_name","process_name","action")
v = v.filter(v.action == "processcreate")
Showing the top three rows of that result set, with v.show(3, truncate=False), appears as Figure 4 in the notebook, with the data from my APTSimulator "victim" system, N2KND-PC.
Figure 4: WTF, Florian :-)
The epic, uber threat hunter in me believes that APTSimulator created nslookup, 7z, and regedit as processes via cmd.exe. Genius, right? :-)
The e parameter (yes, for edges) is populated with:
e = df.filter(df.action == "processcreate").selectExpr("process_parent_guid as src","process_guid as dst").withColumn("relationship", lit("spawned"))
Showing the top three rows of that result set, with e.show(3, truncate=False), produces the source and destination process IDs as it pertains to the spawning relationship.
Now, create a graph from the vertices and edges dataframes as defined in the v & e parameters with g = GraphFrame(v, e). Let's bring it home with a hunt for Process A spawning Process B AND Process B spawning Process C; the code needed, and the result, are seen from the notebook in Figure 5.
Figure 5: APTSimulator's happy spawn
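
For reference, a two-hop motif query of that shape in GraphFrames might look something like the following; this is a sketch written against the vertices and edges defined above, not necessarily the exact cell from the HELK notebook:

# Process a spawns b, and b in turn spawns c
chain = g.find("(a)-[e1]->(b); (b)-[e2]->(c)") \
         .filter("e1.relationship = 'spawned' AND e2.relationship = 'spawned'")
chain.select("a.process_name", "b.process_name", "c.process_name").show(truncate=False)
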
Oh, yes, APTSimulator fully realized in a nice graph. Great example seen in cmd.exe spawning wscript.exe, which then spawns rundll32.exe. Or cmd.exe spawning powershell.exe and schtasks.exe.
Need confirmation? Florian's CactusTorch JS dropper is detailed in Figure 6, specifically cmd.exe > wscript.exe > rundll32.exe.
Figure 6: APTSimulator source for CactusTorch
Still not convinced? How about APTSimulator's schtasks.bat, where APTSimulator kindly loads mimikatz with schtasks.exe for persistence, per Figure 7?
Figure 7: schtasks.bat
I certainly hope that the HELK's graph results matching nicely with APTSimulator source meets with your satisfaction.
The HELK vs APTSimulator ends with a glorious flourish: these two monsters in their field belong in every lab to practice red versus blue, attack and defend, compromise and detect. I haven't been this happy to be a practitioner in the defense against the dark arts in quite a while. My sincere thanks to Roberto and Florian for their great work on the HELK and APTSimulator. I can't suggest strongly enough how much you'll benefit from taking the time to run through Part 1 and 2 of The HELK vs APTSimulator for yourself. Both tools are well documented on their respective GitHubs; go now, get started, profit.
Cheers...until next time.

by Russ McRee ( at April 03, 2018 07:01 AM

April 01, 2018

That grumpy BSD guy

ed(1) mastery is a must for a real Unix person

ed(1) is the standard editor. Now there's a book out to help you master this fundamental Unix tool.

In some circles on the Internet, your choice of text editor is a serious matter.

We've all seen the threads on mailing lists, USENET news groups and web forums about the relative merits of Emacs vs vi, including endless iterations of flame wars, and sometimes even involving lesser known or non-portable editing environments.

And then of course, from the Linux newbies we have seen an endless stream of tweeted graphical 'memes' about the editor vim (aka 'vi Improved') versus the various apparently friendlier-to-some options such as GNU nano. Apparently even the 'improved' version of the classical and ubiquitous vi(1) editor is a challenge even to exit for a significant subset of the younger generation.

Yes, your choice of text editor or editing environment is a serious matter. Mainly because text processing is so fundamental to our interactions with computers.

But for those of us who keep our systems on a real Unix (such as OpenBSD or FreeBSD), there is no real contest. The OpenBSD base system contains several text editors including vi(1) and the almost-emacs mg(1), but ed(1) remains the standard editor.

Now Michael Lucas has written a book to guide the as yet uninitiated to the fundamentals of the original Unix text editor. It is worth keeping in mind that much of Unix and its original standard text editor were written back when the standard output and default user interface was more likely than not a printing terminal.

To some of us, reading and following the narrative of Ed Mastery is a trip down memory lane. To others, following along the text will illustrate the horror of the world of pre-graphic computer interfaces. For others again, the fact that ed(1) doesn't use your terminal settings much at all offers hope of fixing things when something or somebody screwed up your system so you don't have a working terminal for that visual editor.

ed(1) is a line editor. And while you may have heard mutters that 'vi is just a line editor in drag', vi(1) does offer a distinctly visual interface that only became possible with the advent of the video terminal, affectionately known as the glass teletype. ed(1) offers no such luxury, but as the book demonstrates, even ed(1) is able to display any part of a file's content for when you are unsure what your file looks like.

The book Ed Mastery starts by walking the reader through a series of editing sessions using the classical ed(1) line editing interface. To some readers the thought of editing text while not actually seeing at least a few lines at a time onscreen probably sounds scary. This book shows how it is done, and while the author never explicitly mentions it, the text aptly demonstrates how the ed(1) command set is in fact the precursor of how things are done in many Unix text processing programs.

As one might expect, the walkthrough of ed(1) text editing functionality is followed up by a sequence on searching and replacing which ultimately leads to a very readable introduction to regular expressions, which of course are part of the ed(1) package too. If you know your ed(1) command set, you are quite far along in the direction of mastering the stream editor sed(1), as well as a number of other systems where regular expressions play a crucial role.
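The carry-over is quite literal: the s/old/new/ substitution you practice inside ed is, give or take addressing, the same command you hand to sed on the command line. A trivial illustrative pair (not from the book):

 (in ed)   1,$s/colour/color/g
 $ sed 's/colour/color/g' notes.txt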

After the basic editing functionality and some minor text processing magic has been dealt with, the book then proceeds to demonstrate ed(1) as a valuable tool in your Unix scripting environment. And once again, if you can do something with ed, you can probably transfer that knowledge pretty much intact to use with other Unix tools.

The eighty-some text pages of Ed Mastery are a source of solid information on the ed(1) tool itself with a good helping of historical context that will make it clearer to newcomers why certain design choices were made back when the Unix world was new. A number of these choices influence how we interact with the modern descendants of the Unix systems we had back then.

Your choice of text editor is a serious matter. With this book, you get a better foundation for choosing the proper tool for your text editing and text processing needs. I'm not saying that you have to switch to the standard editor, but after reading Ed Mastery, your choice of text editing and processing tools will be a much better informed one.

Ed Mastery is available now directly from Michael W. Lucas' books site, and will most likely appear in other booksellers' catalogs as soon as their systems are able to digest the new data.

Do read the book, try out the standard editor and have fun!

by Peter N. M. Hansteen at April 01, 2018 10:21 AM

Colin Percival

Tarsnap pricing change

I launched the current Tarsnap website in 2009, and while we've made some minor adjustments to it over the years — e.g., adding a page of testimonials, adding much more documentation, and adding a page with .deb binary packages — the changes have overall been relatively modest. One thing people criticized the design for in 2009 was the fact that prices were quoted in picodollars; this is something I have insisted on retaining for the past eight years.

One of the harshest critics of Tarsnap's flat rate picodollars-per-byte pricing model is Patrick McKenzie — known to much of the Internet as "patio11" — who despite our frequent debates can take credit for ten times more new Tarsnap customers than anyone else, thanks to a single ten thousand word blog post about Tarsnap. The topic of picodollars has become something of an ongoing debate between us, with Patrick insisting that they communicate a fundamental lack of seriousness and sabotage Tarsnap's success as a business, and me insisting that they communicate exactly what I want to communicate, and attract precisely the customer base I want to have. In spite of our disagreements, however, I really do value Patrick's input; indeed, the changes I mentioned above came about in large part due to the advice I received from him, and for a long time I've been considering following more of Patrick's advice.

A few weeks ago, I gave a talk at the AsiaBSDCon conference about profiling the FreeBSD kernel boot. (I'll be repeating the talk at BSDCan if anyone is interested in seeing it in person.) Since this was my first time in Tokyo (indeed, my first time anywhere in Asia) and despite communicating with him frequently I had never met Patrick in person, I thought it was only appropriate to meet him for dinner; fortunately the scheduling worked out and there was an evening when he was free and I wasn't suffering too much from jetlag. After dinner, Patrick told me about a cron job he runs:

I knew then that the time was coming to make a change Patrick has long awaited: Getting rid of picodollars. It took a few weeks before the right moment arrived, but I'm proud to announce that as of today, April 1st 2018, Tarsnap's storage pricing is 8333333 attodollars per byte-day.

This addresses a long-standing concern I've had about Tarsnap's pricing: Tarsnap bills customers for usage on a daily basis, but since 250 picodollars is not a multiple of 30, usage bills have been rounded. Tarsnap's accounting code works with attodollars internally (Why attodollars? Because it's easy to have 18 decimal places of precision using fixed-point arithmetic with a 64-bit "attodollars" part.) and so during 30-day months I have in fact been rounding down and billing customers at a rate of 8333333 attodollars per byte-day for years — so making this change on the Tarsnap website brings it in line with the reality of the billing system.
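For the record, the arithmetic is simple: 250 picodollars per byte-month is 250,000,000 attodollars per byte-month, and 250,000,000 / 30 is roughly 8,333,333.33 attodollars per byte-day, which rounds down to the 8333333 attodollars per byte-day quoted above.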

Of course, there are other advantages to advertising Tarsnap's pricing in attodollars. Everything which was communicated by pricing storage in picodollars per byte-month is communicated even more effectively by advertising prices in attodollars per byte-day, and I have no doubt that Tarsnap users will appreciate the increased precision.

April 01, 2018 12:00 AM

March 27, 2018

LZone - Sysadmin

Sequence definitions with kwalify

After a lot of trial and error figuring out how to define a simple sequence in kwalify (which I use as a JSON/YAML schema validator), I want to share this solution for a YAML schema.

So my use case is whitelisting certain keys and somehow ensuring their types. Using this I want kwalify to validate YAML files. Doing this for scalars is simple, but hashes and lists of scalar elements are not. Most problematic were the lists...

Defining Arbitrary Scalar Sequences

So how to define a list in kwalify? The user guide gives this example:
  type: seq
  sequence:
     - type: str
This gives us a list of strings. But many lists also contain numbers, and some contain structured data. For my use case I want to exclude structured data AND allow numbers, so "type: any" cannot be used. "type: any" also wouldn't help because it would require defining a mapping for the elements, which we cannot know in a validation use case where we just want to ensure the type of the list. The great thing is that there is a type "text" which you can use to allow a list of strings or numbers or both, like this:
  type: seq
  sequence:
     - type: text

Building a key name + type validation schema

As already mentioned, the need for this is to have a whitelisting schema with simple type validation. Below you see an example of such a schema:
type: map
mapping:
  "default_definition": &allow_hash
    type: map
    mapping:
      =:
        type: any

  "default_list_definition": &allow_list
    type: seq
    sequence:
      # Type text means string or number
      - type: text

  "key1": *allow_hash
  "key2": *allow_list
  "key3":
    type: str

  =:
    type: number
    range: { max: 29384855, min: 29384855 }
At the top there are two dummy keys "default_definition" and "default_list_definition" which we use to define two YAML references "allow_hash" and "allow_list" for generic hashes and scalar only lists.

In the middle of the schema you see three whitelisted keys which, using the references, are typed as a hash, a list and a string respectively.

Finally, for this to be a whitelist we need to refuse all other keys. Note that '=' as a key name stands for the default definition. Now we want to say: the default is "not allowed". Sadly kwalify has no mechanism that allows expressing something like
    type: invalid
Therefore we resort to an absurd type definition (that we hopefully never match), for example a number that has to be exactly 29384855. All other keys not listed in the whitelist above will hopefully fail to be this number and cause kwalify to throw an error.

This is how the kwalify YAML whitelist works.
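For completeness, assuming the schema above is saved as schema.yaml, running the validation looks roughly like this (-m meta-validates the schema itself, -f names the schema to validate a document against, -l adds line numbers to error output):

  $ kwalify -m schema.yaml
  $ kwalify -lf schema.yaml data.yaml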

March 27, 2018 08:59 PM

PyPI does brownouts for legacy TLS

Nice! Reading through the maintenance notices on my status page aggregator I learned that PyPI started intentionally blocking legacy TLS clients as a way of getting people to switch before TLS 1.0/1.1 support is gone for real.

Here is a quote from their status page:

In preparation for our CDN provider deprecating TLSv1.0 and TLSv1.1 protocols, we have begun rolling brownouts for these protocols for the first ten (10) minutes of each hour.

During that window, clients accessing with clients that do not support TLSv1.2 will receive an HTTP 403 with the error message "This is a brown out of TLSv1 support. TLSv1 support is going away soon, upgrade to a TLSv1.2+ capable client.".

I like this action as a good balance of hurting as much as needed to help end users stop putting off updates.
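A quick way to check whether your own Python tooling will survive the cutoff is to ask the ssl module what it was built with; this is a rough heuristic rather than an official test:

  $ python -c "import ssl; print(ssl.OPENSSL_VERSION)"
  $ python -c "import ssl; print(hasattr(ssl, 'PROTOCOL_TLSv1_2'))"

If the second command prints False, that interpreter (or the OpenSSL it links against) cannot speak TLS 1.2 and will be hit by the brownouts.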

March 27, 2018 08:35 PM

March 26, 2018

Sean's IT Blog

The Virtual Horizon Podcast Episode 2 – A Conversation with Angelo Luciani

On this episode of The Virtual Horizon podcast, we’ll journey to the French Riviera for the 2017 Nutanix .Next EU conference. We’ll be joined by Angelo Luciani, Community Evangelist for Nutanix, to discuss blogging and the Virtual Design Master competition.

Nutanix has two large conferences scheduled for 2018 – .Next in New Orleans in May 2018 and .Next EU in London at the end of November 2018.

Show Credits:
Podcast music is a derivative of Boogie Woogie Bed by Jason Shaw, licensed under Creative Commons: By Attribution 3.0 License

by seanpmassey at March 26, 2018 01:00 PM

March 18, 2018

Restic (backup) deleting old backups is extremely slow

Here's a very quick note:

I've been using the Restic backup tool with the SFTP backend for a while now, and so far it has been great. Until I tried to prune some old backups. It takes two hours to prune 1 GiB of data from a 15 GiB backup. During that time, you cannot create new backups. It also consumes a huge amount of bandwidth when deleting old backups. I strongly suspect it downloads each blob from the remote storage backend, repacks it and then writes it back.
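For context, the kind of operation I mean is an ordinary forget-and-prune run against the SFTP repository, something along these lines (host, path and retention policy are made up):

  $ restic -r sftp:backup@backuphost:/srv/restic-repo forget --keep-daily 7 --keep-weekly 4 --prune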

I've seen people on the internet with a few hundred GiB worth of backups having to wait 7 days to delete their old backups. Since the repo is locked during that time, you cannot create new backups.

This makes Restic completely unusable as far as I'm concerned. Which is a shame, because other than that, it's an incredible tool.

by admin at March 18, 2018 03:34 PM

March 17, 2018

LZone - Sysadmin

Puppet Agent Settings Issue

Experienced a strange puppet agent 4.8 configuration issue this week. To distribute the agent runs over time and even out puppet master load, I wanted to configure the splay settings properly. There are two settings:
  • A boolean "splay" to enable/disable splaying
  • A range limiter "splayLimit" to control the randomization
What first confused me was that "splay" is not on by default. Of course when using the open source version it makes sense to have it off. Having it on by default sounds more like an enterprise feature :-)

No matter the default, after deploying an agent config with settings like this
runInterval = 3600
splay = true
splayLimit = 3600
... nothing happened. Runs were still not randomized. Checking the active configuration with
# puppet config print | grep splay
revealed that my config settings were not working at all. What was utterly confusing is that even the runInterval was reported as 1800 (which is the default value). But while the splay just did not work, the effective runInterval was 3600!

After hours of debugging it, I happened to read the puppet documentation section that covers the config sections like [agent] and [main]. It says that [main] configures global settings and other sections can override the settings in [main], which makes sense.

But it just doesn't work this way. In the end the solution was to use [main] as the config section instead of [agent].
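That is, something along these lines (the same settings as above, just moved under [main]):

[main]
runInterval = 3600
splay = true
splayLimit = 3600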
With this config, "puppet config print" finally reported the settings as effective and the runtime behaviour had the expected randomization.

Maybe I misread something somewhere, but this is really hard to debug. And INI files are not really helpful in Unix; overriding works better with default files and drop-in directories.

March 17, 2018 08:38 PM

March 14, 2018


Target your damned survey report

StackOverflow has released their 2018 Developer Hiring Landscape report. (alternate source)

This is the report that purports to describe the demographics and preferences of software creators, which should enable people looking to hire such creators to better tailor their offerings.

It's an advertising manual, basically. However, they dropped the ball in a few areas, one of which has been getting a lot of traction on Twitter.

It's getting traction for a good reason, and it has to do with how these sorts of reports are written. The section under discussion here is "Differences in assessing jobs by gender". They have five cross-tabs:

  1. All respondents highest-ranked.
  2. All respondents lowest-ranked (what the above references).
  3. All men highest-ranked.
  4. All women highest-ranked.
  5. All non-binary highest-ranked (they have this. This is awesome).

I took this survey, and it was one of those classic questions like:

Rank these ten items from lowest to highest.

And yet, this report seems to ignore everything but the 1's and 10's. This is misguided, and leaves a lot of very valuable market-segment targeting information on the floor. Since 92% of respondents were men, the first and third tabs were almost identical, differing only by tenths of a percent. The second tab is likewise a proxy for "what men don't want". We don't know how women or non-binary respondents differ in their least-liked preferences.

There is some very good data they could have presented, but chose not to. First of all, the number one, two and three priorities are the ones that people are most conscious of, and they may be willing to compromise on one to get the other two. This should have been presented:

  1. All respondents top-3 ranked.
  2. All men top-3 ranked.
  3. All women top-3 ranked.
  4. All non-binary top-3 ranked.

Compensation/Benefits would probably be close to 100%, but we would get interesting differences in the number two and three places on that chart. This gives recruiters the information they need to construct their pitches. Top-rank is fine, but you also want to know the close-enoughs. Sometimes, if you don't hit the top spot, you can win someone by hitting everything else.
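To illustrate the kind of cross-tab I mean, here is a minimal sketch of computing "share of each group that ranked an item in its top three" from raw ranking data, assuming a long-format table with respondent, gender, item and rank columns (the file and column names are invented):

import pandas as pd

# Hypothetical long-format survey data: one row per (respondent, item) pair,
# carrying the 1-10 rank that respondent gave the item.
df = pd.read_csv("rankings.csv")  # columns: respondent, gender, item, rank

top3 = df[df["rank"] <= 3]
group_sizes = df.groupby("gender")["respondent"].nunique()

# Share of each gender group that placed a given item in its top three.
share = (
    top3.groupby(["gender", "item"])["respondent"].nunique()
        .div(group_sizes, level="gender")
)
print(share.unstack("item").round(3))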

I have the same complaint for their "What Developers Value in Compensation and Benefits" cross-tab. Salary/Bonus is the top item for nearly everyone. This is kind of a gimme. The number 2 and 3 places are very important because they're the tie-breakers. If an applicant is looking at a job that hits their pay rank, but misses on the next two most important priorities, they're going to be somewhat less enthusiastic. In a tight labor market, if they're also looking at an offer from a company that misses the pay by a bit and hits the rest, that may be the offer that gets accepted. The 2 through 9 rankings on that chart are important.

This is a company that uses proportional voting for their moderator elections. They know the value of ranked voting. Winner-takes-all survey reporting misses the point, and does their own target market, recruiters, a disservice.

They should do better.

by SysAdmin1138 at March 14, 2018 08:59 PM

The Lone Sysadmin

No VMware NSX Hardware Gateway Support for Cisco

I find it interesting, as I’m taking my first real steps into the world of VMware NSX, that there is no Cisco equipment supported as a VMware NSX hardware gateway (VTEP). According to the HCL on March 13th, 2018 there is a complete lack of “Cisco” in the “Partner” category: I wonder how that works out […]

The post No VMware NSX Hardware Gateway Support for Cisco appeared first on The Lone Sysadmin. Head over to the source to read the full post!

by Bob Plankers at March 14, 2018 05:26 PM

March 09, 2018


Making Of “Murdlok”, the new old adventure game for the C64

Recently, the 1986 adventure game “Murdlok” was published here for the first time. This is author Peter Hempel's “making-of” story, translated from the German original.

In the beginning there was the breadbox: The year is 1984, or was it already 1985? I have forgotten over all these years. Computers are still a magic word, even though they have been on the market for years. By now they are so small that they can easily be put on a desk. Microprocessor! And it should have color too, not monochrome as was still usual everywhere. "Commodore VC20" said the ad in the illustrated magazine, the Volkscomputer ("people's computer"), truly a strange name, just like the name of the company that makes it. C=Commodore, what does this computer have to do with seafaring, I ask myself? Well, at least the page had caught my eye.

We are getting that thing, but right away the "big one", the C64 with 64 KB. We order it by mail order from Quelle. That is how my buddy approached me. Back then this still came with considerable cost: the computer 799 D-Mark, the floppy drive 799 D-Mark, and a color screen on top of that, in those days a portable TV for 599 D-Mark.

When everything had arrived, off we went! Without self-study there was nothing to be done; this technology was absolutely new territory for me. I also did not know anyone who knew their way around it, not even my buddy. Technical books were bought! BASIC for beginners! What an exciting story. You type something in and you immediately get a result, sometimes an expected one and sometimes an unexpected one. The thing had me hooked, day and night, whenever work and my girlfriend allowed it.

At some point the adventure "Zauberschloß" by Dennis Merbach fell into my hands. This kind of game was exactly my thing! Playing and thinking! The idea of building such an adventure myself began to grow in me. "Adventures und wie man sie programmiert" ("Adventures and how to program them") was the name of the book I consulted. I definitely wanted nice graphics and of course as many rooms as possible. I made up the story myself and changed and improved it quite often over the course of programming. I had decided to create the graphics with a modified character set, so I typed in the character set editor from the 64'er magazine. Yes, I needed sprites too, so I typed in the sprite editor from the 64'er magazine as well. "Maschinensprache für Anfänger" ("Machine language for beginners"), and the small modified load routine in the disk buffer was done. Developing the new character set then turned out to be a very tedious affair. Change a character and put it into the graphics. Change a character and put it into the graphics... and so on. Did not turn out nice, start over again. When the listing got too big I could no longer manage without a printer and had to buy one. At some point I also ran out of bytes and the program code had to be optimized. Now the purchase of the printer had paid off.

While I programmed after work and at night, my girlfriend sat on the couch, pregnant with the twins. She had to muster a lot of understanding for my hours of hacking away on the breadbox. She did muster it, that understanding, and so the game could be finished in 1986. I was mighty proud of it, too. I also married my girlfriend later, or did she marry me?

The project taught me a lot about computers and programming. That was also my main motivation for bringing the adventure to completion. It simply gave me extraordinary joy. A few copies were made and handed out to friends. I had nothing more in mind at the time.

I keep being asked the question: "Why didn't you publish your game?" Yes, in hindsight it was probably dumb, but back then it simply was not on my radar. There were plenty of games on the market at that time, and I did not have the feeling that the world was waiting for mine. That was probably a misjudgment!

Sorry that you all had to wait so long for "Murdlok"!

About me: My name is Peter Hempel, but you already know that. I was born in 1957 and live in Berlin, Germany. Programming is not my profession. When I started my apprenticeship as an electronics technician in 1974, home computers were still unknown. I worked for many years as a service technician, troubleshooting and programming traffic light systems.

The game then fell into oblivion!

In the meantime I had already been playing around with an Amiga 2000.

The year is 2017, and by chance I find a C=Commodore C65. An old feeling stirs in me. What a lovely memory of days gone by, the dawn of the computer age. The C65 immediately establishes a connection to the past. The last remnants of my C64 days are dug out again. And so the adventure "Murdlok" comes back to light. The game runs on the C65 too, what a nice feeling.

Then I got to know Michael. We have him to thank for the publication of "Murdlok". I would never have thought that my old game would receive this much honor again.

I wish everyone a lot of fun with my game and, of course, with the 8-bit hobby.

by Michael Steil at March 09, 2018 05:12 PM

March 07, 2018


50 000 Node Choria Network

I’ve been saying for a while now my aim with Choria is that someone can get a 50 000 node Choria network that just works without tuning, like, by default that should be the scale it supports at minimum.

I started working on a set of emulators to let you confirm that yourself – and for me to use during development to ensure I do not break this promise – though that got a bit sidetracked as I wanted to do less emulation and more just running 50 000 instances of actual Choria; more on that in a future post.

Today I want to talk a bit about an actual deployment of 50 000 real nodes and how I got there – the good news is that it's terribly boring since, as promised, it just works.



The network is pretty much just your typical DC network. Bunch of ToR switches, distribution switches and core switches, nothing special. Many dom0's and many more domUs and some specialised machines. It's flat; there are firewalls between all things but it's all in one building.


I have 4 machines, 3 set aside for the Choria Network Broker Cluster and 1 for a client; while waiting for my firewall ports I just used the 1 machine for all the nodes as well as the client. It's an 8GB RAM VM with 4 vCPUs, not overly fancy at all. It runs Enterprise Linux 6.

In the past I think we’d have considered this machine on the small side for a ActiveMQ network with 1000 nodes 😛

I’ll show some details of the single Choria Network Broker here and later follow up about the clustered setup.

Just to be clear, I am going to show managing 50 000 nodes on a machine that’s the equivalent of a $40/month Linode.


I run a custom build of Choria 0.0.11; I bumped the max connections up to 100k and turned off SSL since we simply can't provision certificates, so a custom build let me get around all that.

The real reason for the custom build though is that we compile our agent into the binary, so the whole deployment that goes out to all nodes and the broker is basically what you see below, with no further dependencies at all. This makes for quite a nice deployment story since we're a bit challenged in that regard.

$ rpm -ql choria

Other than this custom agent and no SSL we're about on par with what you'd get if you just install Choria from the repos.

Network Broker Setup

The Choria Network Broker is deployed basically exactly as the docs. Including setting the sysctl values to what was specified in the docs.

identity =
logfile = /var/log/choria.log
plugin.choria.stats_address = ::
plugin.choria.stats_port = 8222
plugin.choria.network.listen_address = ::
plugin.choria.network.client_port = 4222
plugin.choria.network.peer_port = 4223

Most of this isn’t even needed basically if you use defaults like you should.

Server Setup

The server setup was even more boring:

logger_type = file
logfile = /var/log/choria.log
plugin.choria.middleware_hosts =
plugin.choria.use_srv = false


So we were being quite conservative and deployed it in batches of 50 at a time; you can see the graph below of this process as seen from the Choria Network Broker:

This is all pretty boring actually: quite predictable growth in memory, goroutines, CPU etc. The messages you see being sent are me doing lots of pings and RPCs and stuff just to check it's all going well.

$ ps -auxw|grep choria
root     22365 12.9 14.4 2879712 1175928 ?     Sl   Mar06 241:34 /usr/choria broker --config=....
# a bit later than the image above
$ sudo netstat -anp|grep 22365|grep ESTAB|wc -l


So how does it work in practice? In the past we'd have had a lot of issues with getting consistency out of a network of even 10% this size. I was quite confident it was not the Ruby side, but you never know?

Well, lets look at this one, I set discovery_timeout = 20 in my client configuration:

$ mco rpc rpcutil ping --display failed
Finished processing 51152 / 51152 hosts in 20675.80 ms
Finished processing 51152 / 51152 hosts in 20746.82 ms
Finished processing 51152 / 51152 hosts in 20778.17 ms
Finished processing 51152 / 51152 hosts in 22627.80 ms
Finished processing 51152 / 51152 hosts in 20238.92 ms

That’s a huge huge improvement, and this is without fancy discovery methods or databases or anything – it’s the, generally fairly unreliable, broadcast based method of discovery. These same nodes on a big RabbitMQ cluster never gets a consistent result (and it’s 40 seconds slower), so this is a huge win for me.

I am still using the Ruby code here of course and it's single threaded and stuck on 1 CPU, so in practice it's going to have a hard ceiling of churning through about 2500 to 3000 replies/second, hence the long timeouts there.
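(At 2500 to 3000 replies/second, 51152 replies work out to roughly 17 to 20 seconds of client-side processing, which lines up nicely with the 20 to 22 second totals above.)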

I have a Go-based ping; it round trips this network in less than 3.5 seconds quite reliably – wow.

The broker peaked at 25Mbps at times when doing many concurrent RPC requests and pings etc, but it’s all just been pretty good with no surprises.

The Ruby client is a bit big, so as a final test I bumped the RAM on this node to 16GB. If I run 6 x RPC clients at exactly the same time doing a full estate RPC round trip (including broadcast based discovery), all 6 clients get exactly the same results consistently. So I guess I know the Ruby code was never the problem, and I am very glad to see code I designed and wrote in 2009 scaling to this size – the Ruby client code has really never been touched after initial development.

So, that’s about it, I really can’t complain about this.

by R.I. Pienaar at March 07, 2018 04:50 PM

March 01, 2018

Anton Chuvakin - Security Warrior

Monthly Blog Round-Up – February 2018

It is mildly shocking that I’ve been blogging for 13+ years (my first blog post on this blog was in December 2005, my old blog at O’Reilly predates this by about a year), so let’s spend a moment contemplating this fact.

<contemplative pause here :-)>

Here is my next monthly "Security Warrior" blog round-up of top 5 popular posts based on last month's visitor data (excluding other monthly or annual round-ups):
  1. “New SIEM Whitepaper on Use Cases In-Depth OUT!” (dated 2010) presents a whitepaper on select SIEM use cases described in depth with rules and reports [using now-defunct SIEM product]; also see this SIEM use case in depth and this for a more current list of popular SIEM use cases. Finally, see our 2016 research on developing security monitoring use cases here – and we just UPDATED IT FOR 2018.
  2. “Updated With Community Feedback SANS Top 7 Essential Log Reports DRAFT2” is about the top log reports project of 2008-2013; I think these are still very useful in response to “what reports will give me the best insight from my logs?”
  3. “Why No Open Source SIEM, EVER?” contains some of my SIEM thinking from 2009 (oh, wow, ancient history!). Is it relevant now? You be the judge. Succeeding with SIEM requires a lot of work, whether you paid for the software or not. BTW, this post has an amazing “staying power” that is hard to explain – I suspect it has to do with people wanting “free stuff” and googling for “open source SIEM” …
  4. Again, my classic PCI DSS Log Review series is extra popular! The series of 18 posts cover a comprehensive log review approach (OK for PCI DSS 3+ even though it predates it), useful for building log review processes and procedures, whether regulatory or not. It is also described in more detail in our Log Management book and mentioned in our PCI book  – note that this series is even mentioned in some PCI Council materials. 
  5. “Simple Log Review Checklist Released!” is often at the top of this list – this rapidly aging checklist is still a useful tool for many people. “On Free Log Management Tools” (also aged quite a bit by now) is a companion to the checklist (updated version)
In addition, I’d like to draw your attention to a few recent posts from my Gartner blog [which, BTW, now has more than 5X of the traffic of this blog]: 

Critical reference posts:
Current research on testing security:
Current research on threat detection “starter kit”
Just finished research on SOAR:
Miscellaneous fun posts:

(see all my published Gartner research here)
Also see my past monthly and annual “Top Popular Blog Posts” – 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017.

Disclaimer: most content at SecurityWarrior blog was written before I joined Gartner on August 1, 2011 and is solely my personal view at the time of writing. For my current security blogging, go here.

Other posts in this endless series:

by Anton Chuvakin at March 01, 2018 05:19 PM


Seeking Last Group of Contributors

The following is a press release that we just put out about finishing off our relicensing effort. For the impatient, please see to help us find the last people; we want to change the license with our next release, which is currently in Alpha, and tentatively set for May.

For background, you can see all posts in the license category.

One copy of the press release is at

OpenSSL Seeking Last Group of Contributors

Looking for programmers who contributed code to the OpenSSL project

The OpenSSL project is trying to reach the last couple-dozen people who have contributed code to OpenSSL. They are asking people to look at to see if they recognize any names. If so, contact with any information.

This marks one of the final steps in the project’s work to change the license from its non-standard custom text, to the highly popular Apache License. This effort first started in the Fall of 2015, by requiring contributor agreements. Last March, the project made a major publicity effort, with large coverage in the industry. It also began to reach out and contact all contributors, as found by reviewing all changes made to the source. Over 600 people have already responded to emails or other attempts to contact them, and more than 98% agreed with the change. The project removed the code of all those who disagreed with the change. In order to properly respect the desires of all original authors, the project continues to make strong efforts to find everyone.

Measured purely by simple metrics, the average contribution still outstanding is not large. There are a total of 59 commits without a response, out of a history of more than 32,300. On average, each person submitted a patch that modified 3-4 files, adding 100 lines and removing 23.

“We’re very pleased to be changing the license, and I am personally happy that OpenSSL has adopted the widely deployed Apache License,” said Mark Cox, a founding member of the OpenSSL Management Committee. Cox is also a founder and former Board Member of the Apache Software Foundation.

The project hopes to conclude its two-year relicensing effort in time for the next release, which will include an implementation of TLS 1.3.

For more information, email


March 01, 2018 06:00 AM