Planet SysAdmin


February 23, 2018

Chris Siebenmann

Github and publishing Git repositories

Recently I got into a discussion on Twitter where I mentioned that I'd like a simple way to publish Git repositories on my own web server. You might reasonably ask why I need such a thing, since Github exists and I even use it. For me, a significant part of the answer is social. To put it one way, Github has become a little bit too formal, or at least I perceive it as having done so.

What has done this to Github is that more and more, people will look at your Github presence and form judgements based on what they see. They will go through your list of repositories and form opinions, and then look inside some of the repositories and form more opinions. At least part of this exploration is natural and simply comes from stumbling over something interesting; more than once, I've wound up on someone's repository and wondered what else they work on and if there's anything interesting there. But a certain amount of it is the straightforward and logical consequence of the common view that Github is part of your developer resume. We curate our resumes, and if our Github presence is part of that, well, we're going to curate that too. A public portfolio of work always tries to put your best foot forward, and even if that's not necessarily my goal with my Github presence, I still know that that's how people may take it.

All of this makes me feel uncomfortable about throwing messy experiments and one-off hacks up on Github. If nothing else, they feel like clutter that gets in the way of people seeing (just) the repositories that I'm actively proud of, want to attract attention to, and think that people might find something useful in. Putting something up on Github just so people can get a copy of it feels not so much wrong as out of place; that's not what I use my Github presence for.

(A strongly related issue is the signals that I suspect your Github presence sends when you file issues in other people's Github repositories. Some of the time people are going to look at your profile, your activities, and your repositories to assess your clue level, especially if you're reporting something tangled and complex. If you want people to take your issues seriously, a presence that signals 'I probably know what I'm doing' is pretty useful.)

A separate set of Git repositories elsewhere, in a less formal space, avoids all of these issues. No one is going to mistake a set of repositories explicitly labeled 'random stuff I'm throwing up in case people want to look' for anything more than that, and to even find it in the first place they would have to go on a much more extensive hunt than it takes to get to my Github presence (which I do link in various places because, well, it's my Github presence, the official place where I publish various things).

Sidebar: What I want in a Git repository publishing program

The minimal thing I need is something you can do git clone and git pull from, because that is the very basic start of publishing a Git repository. What I'd like is something that also gave a decent looking web view as well, with a description and showing a README, so that people don't have to clone a repository just to poke around in it. Truly ideal would be also providing tarball or zip archive downloads. All of this should be read-only; accepting git push and other such operations is an anti-feature.
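
(For the bare 'clone and pull' part, Git's built-in 'dumb HTTP' support is already enough and needs no CGI at all: you publish a bare repository as static files and keep its index data current. A minimal sketch, with purely illustrative paths and assuming the web server already serves /var/www/git as /git/:

cd /var/www/git
git clone --bare /path/to/project project.git
cd project.git
git update-server-info                          # generate info/refs for dumb HTTP clients
mv hooks/post-update.sample hooks/post-update   # the stock hook re-runs update-server-info
chmod +x hooks/post-update

# read-only clones and pulls then work against the plain web server:
git clone https://www.example.org/git/project.git

This covers none of the web view or archive download wishes, though, which is where an actual publishing program comes in.)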

It would be ideal if the program ran as a CGI, because CGIs are easy to manage and I don't expect much load. I'll live with a daemon that runs via FastCGI, but it can't be its own web server unless it can work behind another web server via a reverse proxy, since I already have a perfectly good web server that is serving things I care a lot more about.

(Also, frankly I don't trust random web server implementations to do HTTPS correctly and securely, and HTTPS is no longer optional. Doing HTTPS well is so challenging that not all dedicated, full scale web servers manage it.)

It's possible that git http-backend actually does what I want here, if I can set it up appropriately. Alternately, maybe cgit is what I want. I'll have to do some experimentation.

by cks at February 23, 2018 06:00 AM

February 22, 2018

Sarah Allen

declarative eventing

An emerging pattern in server-side event-driven programming formalizes the data that might be generated by an event source; a consumer of that event source then registers for very specific events.

A declarative eventing system establishes a contract between the producer (event source) and consumer (a specific action) and allows for binding a source and action without modifying either.

Comparing this to how traditional APIs are constructed, we can think of it as a kind of reverse query — we reverse the direction of the typical request-response flow by registering a query and then getting called back every time there’s a new answer. This new model establishes a specific operational contract for registering these queries, which are commonly called event triggers.

This pattern requires a transport for event delivery. While systems typically support HTTP and RPC mechanisms for local events, which might be connected point-to-point in a mesh network, they also often connect to messaging or streaming data systems, such as Apache Kafka and RabbitMQ, as well as proprietary offerings.

This declarative eventing pattern can be seen in a number of serverless platforms, and is typically coupled with Functions-as-a-Service offerings, such as AWS Lambda and Google Cloud Functions.
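
As a concrete illustration, the binding step on one such platform is a single declarative command; here the gcloud CLI attaches a function to a Pub/Sub topic (the names are placeholders, and exact flags vary by CLI version):

# bind a function (consumer) to an event source (a Pub/Sub topic)
# without modifying either one
gcloud functions deploy handleNewOrder \
    --runtime nodejs8 \
    --trigger-topic new-orders

# every message published to the topic now triggers the function
gcloud pubsub topics publish new-orders --message '{"id": 42}'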

An old pattern applied in a new way

Binding events to actions is nothing new. We have seen this pattern in various GUI programming environments for decades, and on the server side in many Service-Oriented Architecture (SOA) frameworks. What’s new is that we’re seeing server-side code that can be connected to managed services in a way that is almost as simple to set up as an onClick handler in HyperCard. However, the problems that we can solve with this pattern are today’s challenges of integrating data from disparate systems, often at high volume, along with custom analysis, business logic, machine learning and human interaction.

Distributed systems programming is no longer solely the domain of specialized systems engineers who create infrastructure; most applications we use every day integrate data sources from multiple systems across many providers. Distributed systems programming has become ubiquitous, providing an opportunity for interoperable systems at a much higher level.

by sarah at February 22, 2018 01:47 PM

Chris Siebenmann

Sorting out what exec does in Bourne shell pipelines

Today, I was revising a Bourne shell script. The original shell script ended by running rsync with an exec like this:

exec rsync ...

(I don't think the exec was there for any good reason; it's a reflex.)

I was adding some filtering of errors from rsync, so I fed its standard error to egrep and in the process I removed the exec, so it became:

rsync ... 2>&1 | egrep -v '^(...|...)'

Then I stopped to think about this, and realized that I was working on superstition. I 'knew' that combining exec and anything else didn't work, and in fact I had a memory that it caused things to malfunction. So I decided to investigate a bit to find out the truth.

To start with, let's talk about what we could think that exec did here (and what I hoped it did when I started digging). Suppose that you end a shell script like this:

#!/bin/sh
[...]
rsync ... 2>&1 | egrep -v '...'

When you run this shell script, you'll wind up with a hierarchy of three processes; the shell is the parent process, and then generally the rsync and the egrep are siblings. Linux's pstree will show the shell as the parent with rsync and egrep as its two children, and my favorite tool shows it like so:

pts/10   |      17346 /bin/sh thescript
pts/10    |     17347 rsync ...
pts/10    |     17348 egrep ...

If exec worked here the way I was sort of hoping it would, you'd get two processes instead of three, with whatever you exec'd (either the rsync or the egrep) taking over from the parent shell process. Now that I think about it, there are some reasonably decent reasons to not do this, but let's set that aside for now.

What I had a vague superstition of exec doing in a pipeline was that it might abruptly truncate the pipeline. When it got to the exec, the shell just did what you told it to, ie exec'd the process, and since it had turned itself into that process it didn't go on to set up the rest of the pipeline. That would make 'exec rsync ... | egrep' be the same as just 'exec rsync ...', with the egrep effectively ignored. Obviously you wouldn't want that, hence me automatically taking the exec out.

Fortunately this is not what happens. What actually does happen is not quite that the exec is ignored, although that's what it looks like in simple cases. To understand what's going on, I had to start by paying careful attention to how exec is described, for example in Dash's manpage:

Unless command is omitted, the shell process is replaced with the specified program [...]

I have emphasized the important bit. The magic trick is what 'the shell process' is in a pipeline. If we write:

exec rsync ... | egrep -v ...

When the shell gets to processing the exec, what it considers 'the shell process' is actually the subshell running one step of the pipeline, here the subshell that exists to run rsync. This subshell is normally invisible here because for simple commands like this, the (sub)shell will immediately exec() rsync anyway; using exec just instructs this subshell to do what it was already going to do.
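
You can see this for yourself interactively; the pipeline runs normally and the parent shell survives, because the only thing exec replaced was the pipeline step's own subshell:

$ exec echo hi | cat
hi
$ echo "still here"
still here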

We can cause the shell to actually materialize a subshell by putting multiple commands here:

(/bin/echo hi; sleep 120) | cat

If you look at the process tree for this, you'll probably get:

pts/9    |      7481 sh
pts/9     |     7806 sh
pts/9      |    7808 sleep 120
pts/9     |     7807 cat

The subshell making up the first step of the pipeline could end by just exec()ing sleep, but it doesn't (at least in Dash and Bash); once the shell has decided to have a real subshell here, it stays a real subshell.

If you use exec in the context of such an actual subshell, it will indeed replace 'the shell process' of the subshell with the command you exec:

$ (exec echo hi; echo ho) | cat
hi
$

The exec replaced the entire subshell with the first echo, and so it never went on to run the second echo.

(Effectively you've arranged for an early termination of the subshell. There are probably times when this is useful behavior as part of a pipeline step, but I think you can generally use exit and what you're actually doing will be clearer.)
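
(For comparison, the exit version looks the same from the outside:

$ (echo hi; exit; echo ho) | cat
hi

The second echo is never reached either way; exit just states the intent more directly.)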

(I'm sure that I once knew all of this, but it fell out of my mind until I carefully worked it out again just now. Perhaps this time around it will stick.)

Sidebar: some of this behavior can vary by shell

Let's go back to '(/bin/echo hi; sleep 120) | cat'. In Dash and Bash, the first step's subshell sticks around to be the parent process of sleep, as mentioned. Somewhat to my surprise, both the Fedora Linux version of official ksh93 and FreeBSD 10.4's sh do optimize away the subshell in this situation. They directly exec the sleep, as if you wrote:

(/bin/echo hi; exec sleep 120) | cat

There's probably a reason that Bash skips this little optimization.

by cks at February 22, 2018 03:31 AM

February 21, 2018

Chris Siebenmann

How switching to uMatrix for JavaScript blocking has improved my web experience

I'm a long-term advocate of not running JavaScript. Over the years I've used a number of Firefox (and also Chrome) addons to do this, starting with a relatively simple one and then upgrading to NoScript. Recently I switched over to uMatrix for various reasons, which has generally been going well. When I switched, I didn't expect my experience of the modern web to really change, but to my surprise uMatrix is slowly enticing me into making it a clearly nicer experience. What's going on is that uMatrix's more fine-grained permissions model turns out to be a better fit for how JavaScript exists on the modern web.

NoScript and other similar addons have a simple global site permissions model; either you block JavaScript from site X or you allow JavaScript from site X. There are two problems with this model on the modern web. The first problem is that in practice a great deal of JavaScript is loaded from a few highly used websites, for example Cloudflare's CDN network. If you permit JavaScript from cdnjs.cloudflare.com to run on any site you visit, you could be loading almost anything on any specific site (really).

The second problem is that there are a number of big companies that extend their tendrils all over the web, while at the same time being places that you might want to visit directly (where they may either work better with their own JavaScript or outright require it). Globally permitting JavaScript from Twitter, Google, and so on on all sites opens me up to a lot of things that make me nervous, so in NoScript I never gave them that permission.

uMatrix's scoped permissions defang both versions of this pervasiveness. I can restrict Twitter's JavaScript to only working when I'm visiting Twitter itself, and I can allow JavaScript from Cloudflare's CDN only on sites where I want the effects it creates and I trust the site not to do abusive things (eg, where it's used as part of formatting math equations). Because I can contain the danger it would otherwise represent, uMatrix has been getting me to selectively enable JavaScript in a slowly growing number of places where it does improve my web browsing experience.
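
(For what it's worth, these scoped permissions end up as simple one-line rules in uMatrix's 'My rules' pane. Writing from memory, the format is 'source destination type action', so rules of the kind I'm describing look roughly like this, with the site name as a placeholder:

* cdnjs.cloudflare.com script block
mathsite.example cdnjs.cloudflare.com script allow
twitter.com twitter.com script allow

Treat the exact syntax as illustrative rather than authoritative.)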

(I could more or less do this before in NoScript as a one-off temporary thing, but generally it wasn't quite worth it and I always had lingering concerns. uMatrix lets me set it once and leave it, and then I get to enjoy it afterward.)

PS: I'm not actually allowing JavaScript on Twitter, at least not on a permanent basis, but there are some other places that are both JavaScript-heavy and a little bit too pervasive for my tastes where I'm considering it, especially Medium.

PPS: There are some setting differences that also turn out to matter, to my surprise. If you use NoScript in a default-block setup and almost always use temporary permissions, I suggest that you tell NoScript to only reload the current tab on permission changes so that the effects of temporarily allowing something are much more contained. If I had realized how much of a difference it makes, especially with NoScript's global permissions, I would have done it years ago.

Sidebar: Cookie handling also benefits from scoped permissions

I hate YouTube's behavior of auto-playing the next video when I've watched one, because generally I'm only on YouTube to watch exactly one video. You can turn this off, but to make it stick you need to accept cookies from YouTube, which will then quietly follow you around the web anywhere someone embeds some YouTube content. uMatrix's scoped permissions let me restrict when YouTube can see those cookies to only when I'm actually on YouTube looking at a video. I can (and do) do similar things with cookies from Google Search.

(I also have Self-Destructing Cookies set to throw out YouTube's cookies every time I close down Firefox, somewhat limiting the damage of any tracking cookies. This means I have to reset the 'no auto-play' cookie every time I restart Firefox, but I only do that infrequently.)

by cks at February 21, 2018 03:50 AM

February 20, 2018

pagetable

Murdlok: A new old adventure game for the C64

Murdlok is a previously unreleased graphical text-based adventure game for the Commodore 64 written in 1986 by Peter Hempel. A German and an English version exist.

Murdlok – Ein Abenteuer von Peter Hempel

Befreie das Land von dem bösen Murdlok. Nur Nachdenken und kein Leichtsinn führen zum Ziel.


murdlok_de.d64

(Originalversion von 1986)

Murdlok – An Adventure by Peter Hempel

Liberate the land from the evil Murdlok! Reflection, not recklessness will guide you to your goal!


murdlok_en.d64

(English translation by Lisa Brodner and Michael Steil, 2018)

The great thing about a new game is that no walkthroughs exist yet! Feel free to use the comments section of this post to discuss how to solve the game. Extra points for the shortest solution – ours is 236 steps!

by Michael Steil at February 20, 2018 08:07 PM

ma.ttias.be

Update a docker container to the latest version

The post Update a docker container to the latest version appeared first on ma.ttias.be.

Here's a simple one, but if you're new to Docker it's something you might have to look up. On this server, I run Nginx as a Docker container using the official nginx:alpine image.

I was running a fairly outdated version:

$ docker images | grep nginx
nginx    <none>              5a35015d93e9        10 months ago       15.5MB
nginx    latest              46102226f2fd        10 months ago       109MB
nginx    1.11-alpine         935bd7bf8ea6        18 months ago       54.8MB

In order to make sure I had the latest version, I ran pull:

$ docker pull nginx:alpine
alpine: Pulling from library/nginx
550fe1bea624: Pull complete
d421ba34525b: Pull complete
fdcbcb327323: Pull complete
bfbcec2fc4d5: Pull complete
Digest: sha256:c8ff0187cc75e1f5002c7ca9841cb191d33c4080f38140b9d6f07902ababbe66
Status: Downloaded newer image for nginx:alpine

Now, my local repository contains an up-to-date Nginx version:

$ docker images | grep nginx
nginx    alpine              bb00c21b4edf        5 weeks ago         16.8MB

To use it, you have to launch a new container based on that particular image. The currently running container will still be using the original (old) image.

$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED
4d9de6c0fba1        5a35015d93e9        "nginx -g 'daemon ..."   9 months ago

In my case, I re-created my HTTP/2 Nginx container like this:

$ docker stop nginx-container
$ docker rm nginx-container
$ docker run --name nginx-container \
    --net="host" \
    -v /etc/nginx/:/etc/nginx/ \
    -v /etc/ssl/certs/:/etc/ssl/certs/ \
    -v /etc/letsencrypt/:/etc/letsencrypt/ \
    -v /var/log/nginx/:/var/log/nginx/ \
    --restart=always \
    -d nginx:alpine

And the Nginx/container upgrade was completed.
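
Optionally, you can confirm that the new container is running the fresh image, and prune the old, now-untagged image once nothing references it any more:

$ docker ps --filter name=nginx-container --format '{{.Image}}'
nginx:alpine

$ docker image prune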

The post Update a docker container to the latest version appeared first on ma.ttias.be.

by Mattias Geniar at February 20, 2018 07:13 PM

February 19, 2018

ma.ttias.be

Show IDN punycode in Firefox to avoid phishing URLs

The post Show IDN punycode in Firefox to avoid phishing URLs appeared first on ma.ttias.be.

Pop quiz: can you tell the difference between these 2 domains?

Both host a version of the popular crypto exchange Binance.

The second image is the correct one; the first one is a phishing link with the letter 'n' replaced by 'n with a dot below it' (U+1E47). It's not a piece of dirt on your screen, it's an attempt to trick you into believing it's the official site.

Firefox has a very interesting option called network.IDN_show_punycode. You can enable it in about:config.

Once enabled, it'll make that phishing domain look like this:

Doesn't look that legit anymore, does it?

I wish Chrome offered a similar option; it could prevent quite a few phishing attempts.

 

The post Show IDN punycode in Firefox to avoid phishing URLs appeared first on ma.ttias.be.

by Mattias Geniar at February 19, 2018 07:52 PM

Steve Kemp's Blog

How we care for our child

This post is a departure from the regular content, which is supposed to be "Debian and Free Software", but has accidentally turned into a hardware blog recently!

Anyway, we have a child who is now about 14 months old. The way that my wife and I care for him seems logical to us, but often amuses local people. So in the spirit of sharing this is what we do:

  • We divide the day into chunks of time.
  • At any given time one of us is solely responsible for him.
    • The other parent might be nearby, and might help a little.
    • But there is always a designated person who will be changing nappies, feeding, and playing at any given point in the day.
  • The end.

So our weekend routine, covering Saturday and Sunday, looks like this:

  • 07:00-08:00: Husband
  • 08:01-13:00: Wife
  • 13:01-17:00: Husband
  • 17:01-18:00: Wife
  • 18:01-19:30: Husband

Our child, Oiva, seems happy enough with this and he sometimes starts walking from one parent to the other at the appropriate time. But the real benefit is that each of us gets some time off - in my case I get "the morning" off, and my wife gets the afternoon off. We can hide in our bedroom, go shopping, eat cake, or do anything we like.

Week-days are similar, but with the caveat that we both have jobs. I take the morning, and the evenings, and in exchange if he wakes up overnight my wife helps him sleep and settle between 8PM-5AM, and if he wakes up later than 5AM I deal with him.

Most of the time our child sleeps through the night, but if he does wake up it tends to be in the 4:30AM/5AM timeframe. I'm "happy" to wake up at 5AM and stay up until I go to work because I'm a morning person and I tend to go to bed early these days.

Day-care is currently a complex process. There are three families with small children, and ourselves. Each day of the week one family hosts all the children, and the baby-sitter arrives there too (all the families live within a few blocks of each other).

All of the parents go to work, leaving one carer in charge of 4 babies for the day, from 08:15-16:15. On the days when we're hosting the children I greet the carer then go to work - on the days the children are at a different family's house I take him there in the morning, on my way to work, and then my wife collects him in the evening.

At the moment things are a bit terrible because most of the children have been a bit sick, and the carer too. When a single child is sick it's mostly OK, unless that is the child which is supposed to be host-venue. If that child is sick we have to panic and pick another house for that day.

Unfortunately if the child-carer is sick then everybody is screwed, and one parent has to stay home from each family. I guess this is the downside compared to sending the children to public-daycare.

This is private day-care, Finnish-style. The social-services (kela) will reimburse each family €700/month if you're in such a scheme, and carers are limited to a maximum of 4 children. The net result is that prices are stable, averaging €900-€1000 per-child, per month.

(The €700 is refunded after a month or two, so in real terms people like us pay €200-€300/month for Monday-Friday day-care. Plus a bit of bureaucracy over deciding which family is hosting, and which parents are providing food. With the size being capped, and the fees being pretty standard, the carers earn €3600-€4000/month, which is a good amount. To be a school-teacher you need to be very qualified, but to do this caring is much simpler. It turns out that being an English-speaker can be a bonus too, for some families ;)

Currently our carer has a sick-note for three days, so I'm staying home today, and will likely stay tomorrow too. Then my wife will skip work on Wednesday. (We usually take it in turns but sometimes that can't happen easily.)

But all of this is due to change in the near future, because we've had too many sick days, and both of us have missed too much work.

More news on that in the future, unless I forget.

February 19, 2018 11:00 AM

February 17, 2018

Cryptography Engineering

A few notes on Medsec and St. Jude Medical

In Fall 2016 I was invited to come to Miami as part of a team that independently validated some alleged flaws in implantable cardiac devices manufactured by St. Jude Medical (now part of Abbott Labs). These flaws were discovered by a company called MedSec. The story got a lot of traction in the press at the time, primarily due to the fact that a hedge fund called Muddy Waters took a large short position on SJM stock as a result of these findings. SJM subsequently sued both parties for defamation. The FDA later issued a recall for many of the devices.

Due in part to the legal dispute (still ongoing!), I never had the opportunity to write about what happened down in Miami, and I thought that was a shame: because it’s really interesting. So I’m belatedly putting up this post, which talks a bit about MedSec’s findings, and implantable device security in general.

By the way: “we” in this case refers to a team of subject matter experts hired by Bishop Fox, and retained by legal counsel for Muddy Waters investments. I won’t name the other team members here because some might not want to be troubled by this now, but they did most of the work — and their names can be found in this public expert report (as can all the technical findings in this post.)

Quick disclaimers: this post is my own, and any mistakes or inaccuracies in it are mine and mine alone. I’m not a doctor so holy cow this isn’t medical advice. Many of the flaws in this post have since been patched by SJM/Abbot. I was paid for my time and travel by Bishop Fox for a few days in 2016, but I haven’t worked for them since. I didn’t ask anyone for permission to post this, because it’s all public information.

A quick primer on implantable cardiac devices 

Implantable cardiac devices are tiny computers that can be surgically installed inside a patient’s body. Each device contains a battery and a set of electrical leads that can be surgically attached to the patient’s heart muscle.

When people think about these devices, they’re probably most familiar with the cardiac pacemaker. Pacemakers issue small electrical shocks to ensure that the heart beats at an appropriate rate. However, the pacemaker is actually one of the least powerful implantable devices. A much more powerful type of device is the Implantable Cardioverter-Defibrillator (ICD). These devices are implanted in patients who have a serious risk of spontaneously entering a dangerous state in which their heart ceases to pump blood effectively. The ICD continuously monitors the patient’s heart rhythm to identify when the patient’s heart has entered this condition, and applies a series of increasingly powerful shocks to the heart muscle to restore effective heart function. Unlike pacemakers, ICDs can issue shocks of several hundred volts or more, and can both stop and restart a patient’s normal heart rhythm.

Like most computers, implantable devices can communicate with other computers. To avoid the need for external data ports – which would mean a break in the patient’s skin – these devices communicate via either a long-range radio frequency (“RF”) or a near-field inductive coupling (“EM”) communication channel, or both. Healthcare providers use a specialized hospital device called a Programmer to update therapeutic settings on the device (e.g., program the device, turn therapy off). Using the Programmer, providers can manually issue commands that cause an ICD to shock the patient’s heart. One command, called a “T-Wave shock” (or “Shock-on-T”) can be used by healthcare providers to deliberately induce ventricular fibrillation. This capability is used after a device is implanted, in order to test the device and verify it’s functioning properly.

Because the Programmer is a powerful tool – one that could cause harm if misused – it’s generally deployed in a physician office or hospital setting. Moreover, device manufacturers may employ special precautions to prevent spurious commands from being accepted by an implantable device. For example:

  1. Some devices require that all Programmer commands be received over a short-range communication channel, such as the inductive (EM) channel. This limits the communication range to several centimeters.
  2. Other devices require that a short-range inductive (EM) wand must be used to initiate a session between the Programmer and a particular implantable device. The device will only accept long-range RF commands sent by the Programmer after this interaction, and then only for a limited period of time.

From a computer security perspective, both of these approaches have a common feature: using either approach requires some form of close-proximity physical interaction with the patient before the implantable device will accept (potentially harmful) commands via the long-range RF channel. Even if a malicious party steals a Programmer from a hospital, she may still need to physically approach the patient – at a distance limited to perhaps centimeters – before she can use the Programmer to issue commands that might harm the patient.

In addition to the Programmer, most implantable manufacturers also produce some form of “telemedicine” device. These devices aren’t intended to deliver commands like cardiac shocks. Instead, they exist to provide remote patient monitoring from the patient’s home. Telematics devices use RF or inductive (EM) communications to interrogate the implantable device in order to obtain episode history, usually at night when the patient is asleep. The resulting data is uploaded to a server (via telephone or cellular modem) where it can be accessed by healthcare providers.

What can go wrong?

Before we get into specific vulnerabilities in implantable devices, it’s worth asking a very basic question. From a security perspective, what should we even be worried about?

There are a number of answers to this question. For example, an attacker might abuse implantable device systems or infrastructure to recover confidential patient data (known as PHI). Obviously this would be bad, and manufacturers should design against it. But the loss of patient information is, quite frankly, kind of the least of your worries.

A much scarier possibility is that an attacker might attempt to harm patients. This could be as simple as turning off therapy, leaving the patient to deal with their underlying condition. On the much scarier end of the spectrum, an ICD attacker could find a way to deliberately issue dangerous shocks that could stop a patient’s heart from functioning properly.

Now let me be clear: this isn’t what you’d call a high-probability attack. Most people aren’t going to be targeted by sophisticated technical assassins. The concerning thing is that the impact of such an attack is sufficiently terrifying that we should probably worry about it anyway. Indeed, some high-profile individuals have already taken precautions against it.

The real nightmare scenario is a mass attack in which a single resourceful attacker targets thousands of individuals simultaneously — perhaps by compromising a manufacturer’s back-end infrastructure — and threatens to harm them all at the same time. While this might seem unlikely, we’ve already seen attackers systematically target hospitals with ransomware. So this isn’t entirely without precedent.

Securing device interaction physically

The real challenge in securing an implantable device is that too much security could hurt you. As tempting as it might be to lard these devices up with security features like passwords and digital certificates, doctors need to be able to access them. Sometimes in a hurry.

This shouldn’t happen in the ER.

This is a big deal. If you’re in a remote emergency room or hospital, the last thing you want is some complex security protocol making it hard to disable your device or issue a required shock. This means we can forget about complex PKI and revocation lists. Nobody is going to have time to remember a password. Even merely complicated procedures are out — you can’t afford to have them slow down treatment.

At the same time, these devices obviously must perform some sort of authentication: otherwise anyone with the right kind of RF transmitter could program them — via RF, from a distance. This is exactly what you want to prevent.

Many manufacturers have adopted an approach that cuts through this knot. The basic idea is to require physical proximity before someone can issue commands to your device. Specifically, before anyone can issue a shock command (even via a long-range RF channel) they must — at least briefly — make close physical contact with the patient.

This proximity can be enforced in a variety of ways. If you remember, I mentioned above that most devices have a short-range inductive coupling (“EM”) communications channel. These short-range channels seem ideal for establishing a “pairing” between a Programmer and an implantable device — via a specialized wand. Once the channel is established, of course, it’s possible to switch over to long-range RF communications.

This isn’t a perfect solution, but it has a lot going for it: someone could still harm you, but they would have to at least get a transmitter within a few inches of your chest before doing so. Moreover, you can potentially disable harmful commands from an entire class of device (like telemedecine monitoring devices) simply by leaving off the wand.

St. Jude Medical and MedSec

 

So given this background, what did St. Jude Medical do? All of the details are discussed in a full expert report published by Bishop Fox. In this post I’ll focus on the most serious of MedSec’s claims, which can be expressed as follows:

Using only the hardware contained within a “Merlin @Home” telematics device, it was possible to disable therapy and issue high-power “shock” commands to an ICD from a distance, and without first physically interacting with the implantable device at close range.

This vulnerability had several implications:

  1. The existence of this vulnerability implies that – through a relatively simple process of “rooting” and installing software on a Merlin @Home device – a malicious attacker could create a device capable of issuing harmful shock commands to installed SJM ICD devices at a distance. This is particularly worrying given that Merlin @Home devices are widely deployed in patients’ homes and can be purchased on eBay for prices under $30. While it might conceivably be possible to physically secure and track the location of all PCS Programmer devices, it seems challenging to physically track the much larger fleet of Merlin @Home devices.
  2. More critically, it implies that St. Jude Medical implantable devices do not enforce a close physical interaction (e.g., via an EM wand or other mechanism) prior to accepting commands that have the potential to harm or even kill patients. This may be a deliberate design decision on St. Jude Medical’s part. Alternatively, it could be an oversight. In either case, this design flaw increases the risk to patients by allowing for the possibility that remote attackers might be able to cause patient harm solely via the long-range RF channel.
  3. If it is possible – using software modifications only – to issue shock commands from the Merlin @Home device, then patients with an ICD may be vulnerable in the hypothetical event that their Merlin @Home device becomes remotely compromised by an attacker. Such a compromise might be accomplished remotely via a network attack on a single patient’s Merlin @Home device. Alternatively, a compromise might be accomplished at large scale through a compromise of St. Jude Medical’s server infrastructure.

We stress that the final scenario is strictly hypothetical. MedSec did not allege a specific vulnerability that allows for the remote compromise of Merlin @Home devices or SJM infrastructure. However, from the perspective of software and network security design, these attacks are one of the potential implications of a design that permits telematics devices to send such commands to an implantable device. It is important to stress that none of these attacks would be possible if St. Jude Medical’s design prohibited the implantable from accepting therapeutic commands from the Merlin @Home device (e.g., by requiring close physical interaction via the EM wand, or by somehow authenticating the provenance of commands and restricting critical commands to be sent by the Programmer only).

Validating MedSec’s claim

To validate MedSec’s claim, we examined their methodology from start to finish. This methodology included extracting and decompiling Java-based software from a single PCS Programmer; accessing a Merlin @Home device to obtain a root shell via the JTAG port; and installing a new package of custom software written by MedSec onto a used Merlin @Home device.

We then observed MedSec issue a series of commands to an ICD device using a Merlin @Home device that had been customized (via software) as described above. We used the Programmer to verify that these commands were successfully received by the implantable device, and physically confirmed that MedSec had induced shocks by attaching a multimeter to the leads on the implantable device.

Finally, we reproduced MedSec’s claims by opening the case of a second Merlin @Home device (after verifying that the tape was intact over the screw holes), obtaining a shell by connecting a laptop computer to the JTAG port, and installing MedSec’s software on the device. We were then able to issue commands to the ICD from a distance of several feet. This process took us less than three hours in total, and required only inexpensive tools and a laptop computer.

What are the technical details of the attack?

Simply reproducing a claim is only part of the validation process. To verify MedSec’s claims we also needed to understand why the attack described above was successful. Specifically, we were interested in identifying the security design issues that make it possible for a Merlin @Home device to successfully issue commands that are not intended to be issued from this type of device. The answer to this question is quite technical, and involves the specific way that SJM implantable devices verify commands before accepting them.

MedSec described to us the operation of SJM’s command protocol as part of their demonstration. They also provided us with Java JAR executable code files taken from the hard drive of the PCS Programmer. These files, which are not obfuscated and can easily be “decompiled” into clear source code, contain the software responsible for implementing the Programmer-to-Device communications protocol.

By examining the SJM Programmer code, we verified that Programmer commands are authenticated through the inclusion of a three-byte (24 bit) “authentication tag” that must be present and correct within each command message received by the implantable device. If this tag is not correct, the device will refuse to accept the command.

From a cryptographic perspective, 24 bits is a surprisingly short value for an important authentication field. However, we note that even this relatively short tag might be sufficient to prevent forgery of command messages – provided the tag was calculated using a secure cryptographic function (e.g., a Message Authentication Code) with a fresh secret key that cannot be predicted by the attacker.
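
To make that concrete, here is a rough sketch (emphatically not SJM's protocol) of what a standards-based tag could look like: an HMAC over the device's challenge under a fresh per-session key, truncated to three bytes. All values below are invented for illustration:

# challenge sent by the implantable device, and a session key that would be
# established during the close-range (EM wand) pairing -- both made up
challenge="f17c09ab52d341"
session_key="8d2e6c1a94f0b37d"

# HMAC-SHA256 over the challenge, keeping the first 3 bytes (6 hex digits)
# as the 24-bit authentication tag
printf %s "$challenge" |
    openssl dgst -sha256 -hmac "$session_key" |
    awk '{print substr($NF, 1, 6)}'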

Based on MedSec’s demonstration, and on our analysis of the Programmer code, it appears that SJM does not use the above approach to generate authentication tags. Instead, SJM authenticates the Programmer to the implantable with the assistance of a “key table” that is hard-coded within the Java code within the Programmer. At minimum, any party who obtains the (non-obfuscated) Java code from a legitimate SJM Programmer can gain the ability to calculate the correct authentication tags needed to produce viable commands – without any need to use the Programmer itself.

Moreover, MedSec determined – and successfully demonstrated – that there exists a “Universal Key”, i.e., a fixed three-byte authentication tag, that can be used in place of the calculated authentication tag. We identified this value in the Java code provided by MedSec, and verified that it was sufficient to issue shock commands from a Merlin @Home to an implantable device.

While these issues alone are sufficient to defeat the command authentication mechanism used by SJM implantable devices, we also analyzed the specific function that is used by SJM to generate the three-byte authentication tag.  To our surprise, SJM does not appear to use a standard cryptographic function to compute this tag. Instead, they use an unusual and apparently “homebrewed” cryptographic algorithm for the purpose.

Specifically, the PCS Programmer Java code contains a series of hard-coded 32-bit RSA public keys. To issue a command, the implantable device sends a value to the Programmer. This value is then “encrypted” by the Programmer using one of the RSA public keys, and the resulting output is truncated to produce a 24-bit output tag.

The above is not a standard cryptographic protocol, and quite frankly it is difficult to see what St. Jude Medical is trying to accomplish using this technique. From a cryptographic perspective it has several problems:

  1. The RSA public keys used by the PCS Programmers are 32 bits long. Normal RSA keys are expected to be a minimum of 1024 bits in length. Some estimates predict that a 1024-bit RSA key can be factored (and thus rendered insecure) in approximately one year using a powerful network of supercomputers. Based on experimentation, we were able to factor the SJM public keys in less than one second on a laptop computer (see the illustration after this list).
  2. Even if the RSA keys were of an appropriate length, the SJM protocol does not make use of the corresponding RSA secret keys. Thus the authentication tag is not an RSA signature, nor does it use RSA in any way that we are familiar with.
  3. As noted above, since there is no shared session key established between the specific implantable device and the Programmer, the only shared secret available to both parties is contained within the Programmer’s Java code. Thus any party who extracts the Java code from a PCS Programmer will be able to transmit valid commands to any SJM implantable device.
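
To illustrate the first point: a 32-bit modulus falls to plain trial division, since its smaller prime factor is at most 16 bits. GNU coreutils' factor does this instantly; the modulus below is an arbitrary 32-bit semiprime chosen for illustration, not one of SJM's actual keys:

$ factor 4292870399
4292870399: 65519 65521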

Our best interpretation of this design is that the calculation is intended as a form of “security by obscurity”, based on the assumption that an attacker will not be able to reverse engineer the protocol. Unfortunately, this approach is rarely successful when used in security systems. In this case, the system is fundamentally fragile – due to the fact that code for computing the correct authentication tag is likely available in easily-decompiled Java bytecode on each St. Jude Medical Programmer device. If this code is ever extracted and published, all St. Jude Medical devices become vulnerable to command forgery.

How to remediate these attacks?

To reiterate, the fundamental security concerns with these St. Jude Medical devices (as of 2016) appeared to be problems of design. These were:

  1. SJM implantable devices did not require close physical interaction prior to accepting commands (allegedly) sent by the Programmer.
  2. SJM did not incorporate a strong cryptographic authentication mechanism in its RF protocol to verify that commands are truly sent by the Programmer.
  3. Even if the previous issue was addressed, St. Jude did not appear to have an infrastructure for securely exchanging shared cryptographic keys between a legitimate Programmer and an implantable device.

There are various ways to remediate these issues. One approach is to require St. Jude implantable devices to exchange a secret key with the Programmer through a close-range interaction involving the Programmer’s EM wand. A second approach would be to use a magnetic sensor to verify the presence of a magnet on the device, prior to accepting Programmer commands. Other solutions are also possible. I haven’t reviewed the solution SJM ultimately adopted in their software patches, and I don’t know how many users patched.

Conclusion

Implantable devices offer a number of unique security challenges. It’s naturally hard to get these things right. At the same time, it’s important that vendors take these issues seriously, and spend the time to get cryptographic authentication mechanisms right — because once deployed, these devices are very hard to repair, and the cost of a mistake is extremely high.

by Matthew Green at February 17, 2018 06:27 PM

That grumpy BSD guy

A Life Lesson in Mishandling SMTP Sender Verification

An attempt to report spam to a mail service provider's abuse address reveals how incompetence is sometimes indistinguishable from malice.

It all started with one of those rare spam mails that got through.

This one was hawking address lists, much like the ones I occasionally receive to addresses that I can not turn into spamtraps. The message was addressed to, of all things, root@skapet.bsdly.net. (The message with full headers has been preserved here for reference).

Yes, that's right, they sent their spam to root@. And a quick peek at the headers revealed that like most of those attempts at hawking address lists for spamming that actually make it to a mailbox here, this one had been sent by an outlook.com customer.

The problem with spam delivered via outlook.com is that you can't usefully blacklist the sending server, since the largish chunk of the world that uses some sort of Microsoft hosted email solution (Office365 and its ilk) have their usually legitimate mail delivered via the very same infrastructure.

And since outlook.com is one of the mail providers that doesn't play well with greylisting (it spreads its retries across no less than 81 subnets; the output of 'echo outlook.com | doas smtpctl spf walk' is preserved here), it's fairly common practice to just whitelist all those networks and avoid the hassle of lost or delayed mail to and from Microsoft customers.

I was going to just ignore this message too, but we've seen an increasing number of spammy outfits taking advantage of outlook.com's seeming right of way to innocent third parties' mail boxes.

So I decided to try both to do my best at demoralizing this particular sender and to alert outlook.com to their problem. I wrote a message (preserved here) with a Cc: to abuse@outlook.com where the meat is,

Ms Farell,

The address root@skapet.bsdly.net has never been subscribed to any mailing list, for obvious reasons. Whoever sold you an address list with that address on it are criminals and you should at least demand your money back.

Whoever handles abuse@outlook.com will appreciate the attachment, which is a copy of the message as it arrived here with all headers intact.

Yours sincerely,
Peter N. M. Hansteen

What happened next is quite amazing.

If my analysis is correct, it may not be possible for senders who are not themselves outlook.com customers to actually reach the outlook.com abuse team.

Almost immediately after I sent the message to Ms Farell with a Cc: to abuse@outlook.com, two apparently identical messages from staff@hotmail.com, addressed to postmaster@bsdly.net appeared (preserved here and here), with the main content of both stating

This is an email abuse report for an email message received from IP 216.32.180.51 on Sat, 17 Feb 2018 01:59:21 -0800.
The message below did not meet the sending domain's authentication policy.
For more information about this format please see http://www.ietf.org/rfc/rfc5965.txt.

In order to understand what happened here, it is necessary to look at the mail server log for a time interval of a few seconds (preserved here).

The first few lines describe the processing of my outgoing message:

2018-02-17 10:59:14 1emzGs-0009wb-94 <= peter@bsdly.net H=(greyhame.bsdly.net) [192.168.103.164] P=esmtps X=TLSv1.2:ECDHE-RSA-AES128-GCM-SHA256:128 CV=no S=34977 id=31b4ffcf-bf87-de33-b53a-0 ebff4349b94@bsdly.net

My server receives the message from my laptop, and we can see that the connection was properly TLS encrypted.

2018-02-17 10:59:15 1emzGs-0009wb-94 => peter <root@skapet.bsdly.net> R=localuser T=local_delivery

I had for some reason kept the original recipient among the To: addresses. Actually useless but also harmless.

2018-02-17 10:59:16 1emzGs-0009wb-94 [104.47.40.33] SSL verify error: certificate name mismatch: DN="/C=US/ST=WA/L=Redmond/O=Microsoft Corporation/OU=Microsoft Corporation/CN=mail.protection.outlook.com" H="outlook-com.olc.protection.outlook.com"
2018-02-17 10:59:18 1emzGs-0009wb-94 SMTP error from remote mail server after end of data: 451 4.4.0 Message failed to be made redundant due to A shadow copy was required but failed to be made with an AckStatus of Fail [CO1NAM03HT002.eop-NAM03.prod.protection.outlook.com] [CO1NAM03FT002.eop-NAM03.prod.protection.outlook.com]
2018-02-17 10:59:19 1emzGs-0009wb-94 [104.47.42.33] SSL verify error: certificate name mismatch: DN="/C=US/ST=WA/L=Redmond/O=Microsoft Corporation/OU=Microsoft Corporation/CN=mail.protection.outlook.com" H="outlook-com.olc.protection.outlook.com"


What we see here is that even a huge corporation like Microsoft does not always handle certificates properly. The certificate they present for setting up the encrypted connection is not actually valid for the host name that the outlook.com server presents.

There is also what I interpret as a file system related message which I assume is meaningful to someone well versed in Microsoft products, but we see that

2018-02-17 10:59:20 1emzGs-0009wb-94 => janet@prospectingsales.net R=dnslookup T=remote_smtp H=prospectingsales-net.mail.protection.outlook.com [23.103.140.138] X=TLSv1.2:ECDHE-RSA-AES256-SHA384:256 CV=yes K C="250 2.6.0 <31b4ffcf-bf87-de33-b53a-0ebff4349b94@bsdly.net> [InternalId=40926743365667, Hostname=BMXPR01MB0934.INDPRD01.PROD.OUTLOOK.COM] 44350 bytes in 0.868, 49.851 KB/sec Queued mail for delivery"

even though the certificate fails the verification part, the connection sets up with TLSv1.2 anyway, and the message is accepted with a "Queued mail for delivery" message.

The message is also delivered to the Cc: recipient:

2018-02-17 10:59:21 1emzGs-0009wb-94 => abuse@outlook.com R=dnslookup T=remote_smtp H=outlook-com.olc.protection.outlook.com [104.47.42.33] X=TLSv1.2:ECDHE-RSA-AES256-SHA384:256 CV=no K C="250 2.6.0 <31b4ffcf-bf87-de33-b53a-0ebff4349b94@bsdly.net> [InternalId=3491808500196, Hostname=BY2NAM03HT071.eop-NAM03.prod.protection.outlook.com] 42526 bytes in 0.125, 332.215 KB/sec Queued mail for delivery"
2018-02-17 10:59:21 1emzGs-0009wb-94 Completed


And the transactions involving my message would normally have been completed.

But ten seconds later this happens:

2018-02-17 10:59:31 1emzHG-0004w8-0l <= staff@hotmail.com H=bay004-omc1s10.hotmail.com [65.54.190.21] P=esmtps X=TLSv1.2:ECDHE-RSA-AES256-SHA384:256 CV=no K S=43968 id=BAY0-XMR-100m4KrfmH000a51d4@bay0-xmr-100.phx.gbl
2018-02-17 10:59:31 1emzHG-0004w8-0l => peter <postmaster@bsdly.net> R=localuser T=local_delivery
2018-02-17 10:59:31 1emzHG-0004w8-0l => peter <postmaster@bsdly.net> R=localuser T=local_delivery


That's the first message to my domain's postmaster@ address, followed two seconds later by

2018-02-17 10:59:33 1emzHI-0004w8-Fy <= staff@hotmail.com H=bay004-omc1s10.hotmail.com [65.54.190.21] P=esmtps X=TLSv1.2:ECDHE-RSA-AES256-SHA384:256 CV=no K S=43963 id=BAY0-XMR-100Q2wN0I8000a51d3@bay0-xmr-100.phx.gbl
2018-02-17 10:59:33 1emzHI-0004w8-Fy => peter <postmaster@bsdly.net> R=localuser T=local_delivery
2018-02-17 10:59:33 1emzHI-0004w8-Fy Completed


a second, apparently identical message.

Both of those messages state that the message I sent to abuse@outlook.com had failed SPF verification, because the check happened on connections from NAM03-BY2-obe.outbound.protection.outlook.com (216.32.180.51) by whatever handles incoming mail to the staff@hotmail.com address, which apparently is where the system forwards abuse@outlook.com's mail.

Reading Microsoft Exchange's variant SMTP headers has never been my forte, and I won't try decoding the exact chain of events here since that would probably also require you to have fairly intimate knowledge of Microsoft's internal mail delivery infrastructure.

But even a quick glance at the messages reveals that the message passed SPF and other checks on incoming to the outlook.com infrastructure, but may have ended up not getting delivered after all since a second SPF test happened on a connection from a host that is not in the sender domain's SPF record.

In fact, that second test would only succeed for domains that have

include:spf.protection.outlook.com

in their SPF record, and those would presumably be Outlook.com customers.
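
Checking whether a given domain publishes that include is a quick DNS lookup (yourdomain.example is a placeholder; an Office 365 customer's record will typically contain include:spf.protection.outlook.com):

$ dig +short txt yourdomain.example | grep -i spf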

Any student or practitioner of SMTP mail delivery should know that SPF checks should only happen on ingress, that is, at the point where the mail traffic enters your infrastructure and the sender IP address is the original one. If you leave the check for later, when the message may have been forwarded, you no longer have sufficient data to perform it.

Whenever I encounter incredibly stupid and functionally destructive configuration errors like this I tend to believe they're down to simple incompetence and not malice.

But this one has me wondering. If you essentially require that any domain sending you mail lists the contents of spf.protection.outlook.com (currently no less than 81 subnets) as valid senders in its SPF record, you are essentially saying that only outlook.com customers are allowed to communicate.

If that restriction is a result of a deliberate choice rather than a simple configuration error, the problem moves out of the technical sphere and could conceivably become a legal matter, depending on what outlook.com have specified in their contracts that they are selling to their customers.

But let us assume that this is indeed a matter of simple bad luck or incompetence and that the solution is indeed technical.

I would have liked to report this to whoever does technical things at that domain via email, but unfortunately there are indications that being their customer is a precondition for using that channel of communication to them.

I hope they fix that, and soon. And then move on to terminating their spamming customers' contracts.

The main lesson to be learned from this is that when you shop around for email service, please do yourself a favor and make an effort to ensure that your prospective providers actually understand how the modern-ish SMTP addons SPF, DKIM and DMARC actually work.

Otherwise you may end up receiving more of the mail you don't want than what you do want, and your own mail may end up not being delivered as intended.

Update 2018-02-19: Just as I was going to get ready for bed (it's late here in CET) another message from Ms Farell arrived, this time to an alias I set up in order to make it easier to filter PF tutorial related messages into a separate mailbox.

I wrote another response, and as the mail server log will show, despite the fact that a friend with an Office365 contract contacted them quoting this article, outlook.com have still not fixed the problem. Two more messages (preserved here and here) shot back here immediately.

Update 2018-02-20: A response from Microsoft, with pointers to potentially useful information.

A message from somebody identifying as working for Microsoft Online Safety arrived, apparently responding to my message dated 2018-02-19, where the main material was,

Hi,

Based on the information you provided, it appears to have originated from an Office 365 or Exchange Online tenant account.

To report junk mail from Office 365 tenants, send an email to junk@office365.microsoft.com   and include the junk mail as an attachment.

This link provides further junk mail education https://technet.microsoft.com/en-us/library/jj200769(v=exchg.150).aspx.

Kindly,

I have asked for clarification of some points, but no response has arrived as it gets close to bedtime here in CET.

However I did take the advice to forward the offending messages as attachment to the junk@ message, and put the outlook.com abuse address in the Cc: on that message. My logs indicate that the certificate error had not gone away, but no SPF-generated bounces appeared either.

If Microsoft responds with further clarifications, I will publish a useful condensate here.



In other news, there will be PF tutorial at the 2018 AsiaBSDCon in Tokyo. Follow the links for the most up to date information.

by Peter N. M. Hansteen (noreply@blogger.com) at February 17, 2018 04:38 PM

pagetable

Commodore KERNAL History

If you have ever written 6502 code for the Commodore 64, you may remember using “JSR $FFD2” to print a character on the screen. You may have read that the jump table at the end of the KERNAL ROM was designed to allow applications to run on all Commodore 8-bit computers from the PET to the C128 (and the C65!) – but that is a misconception. This article will show how

  • the first version of the jump table in the PET was designed to only hook up BASIC to the system’s features
  • it wasn’t until the VIC-20 that the jump table was generalized for application development (and the vector table introduced)
  • all later machines add their own calls, but later machines don’t necessarily support older calls.
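
As a reminder of what using the jump table looks like in practice, here is a minimal sketch of the canonical call mentioned above; the entry point and the register convention ($FFD2 printing the PETSCII character passed in the accumulator) are the well-known ones, the rest is just illustration:

        LDA #$41        ; PETSCII code for "A"
        JSR $FFD2       ; BSOUT/CHROUT: print the character in A
        RTS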

KIM-1 (1976)

The KIM-1 was originally meant as a computer development board for the MOS 6502 CPU. Commodore acquired MOS in 1976 and kept selling the KIM-1. It contained a 2 KB ROM (“TIM”, “Terminal Interface Monitor”), which included functions to read characters from ($1E5A) and write characters to ($1EA0) a serial terminal, as well as code to load from and save to tape and support for the hex keyboard and display.

Commodore asked Microsoft to port their BASIC for 6502 to it, which interfaced with the monitor only through the two character in and out functions. The original source of BASIC shows how Microsoft adapted it to work with the KIM-1 by defining CZGETL and OUTCH to point to the monitor routines:

IFE     REALIO-1,<GETCMD==1
        DISKO==1
        OUTCH=^O17240                   ;1EA0
        ROMLOC==^O20000
        RORSW==0
        CZGETL=^O17132>

(The values are octal, since the assembler Microsoft used did not support hexadecimal.)

The makers of the KIM-1 never intended to change the ROM, so there was no need to have a jump table for these calls. Applications just hardcoded their offsets in ROM.
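
A hypothetical sketch of what such a hard-coded call looked like; the two ROM addresses are the ones quoted above, and I am assuming the character travels in the accumulator, as it does on the later machines:

ECHO:   JSR $1E5A       ; TIM monitor: read a character from the serial terminal
        JSR $1EA0       ; TIM monitor: write the character back to the terminal
        JMP ECHO        ; hard-coded ROM addresses - there is no jump table to go through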

PET (1977)

The PET was Commodore’s first complete computer, with a keyboard, a display and a built-in tape drive. The system ROM (“KERNAL”) was now 4 KB and included a powerful file I/O system for tape, RS-232 and IEEE-488 (for printers and disk drives) as well as timekeeping logic. Another 2 KB ROM (“EDITOR”) handled screen output and character input. Microsoft BASIC was included in ROM and was marketed – with the name “COMMODORE BASIC” – as the actual operating system, making the KERNAL and the editor merely a device driver package.

Like with the KIM-1, Commodore asked Microsoft to port BASIC to the PET, and provided them with addresses of a jump table in the KERNAL ROM for interfacing with it. These are the symbol definitions in Microsoft’s source:

        CQOPEN=^O177700
        CQCLOS=^O177703
        CQOIN= ^O177706         ;OPEN CHANNEL FOR INPUT
        CQOOUT=^O177711         ;FILL FOR COMMO.
        CQCCHN=^O177714
        CQINCH=^O177717         ;INCHR'S CALL TO GET A CHARACTER
        OUTCH= ^O177722
        CQLOAD=^O177725
        CQSAVE=^O177730
        CQVERF=^O177733
        CQSYS= ^O177736
        ISCNTC=^O177741
        CZGETL=^O177744         ;CALL POINT FOR "GET"
        CQCALL=^O177747         ;CLOSE ALL CHANNELS

(The meaning of the CQ prefix is left as an exercise to the reader.)

In hex and with Commodore’s names, these are the KERNAL calls used by BASIC:

  • $FFC0: OPEN
  • $FFC3: CLOSE
  • $FFC6: CHKIN
  • $FFC9: CHKOUT
  • $FFCC: CLRCHN
  • $FFCF: BASIN
  • $FFD2: BSOUT
  • $FFD5: LOAD
  • $FFD8: SAVE
  • $FFDB: VERIFY
  • $FFDE: SYS
  • $FFE1: STOP
  • $FFE4: GETIN
  • $FFE7: CLALL
  • $FFEA: UDTIM (advance clock; not used by BASIC)

At first sight, this jump table looks very similar to the one known from the C64, but it is indeed very different, and it is not generally compatible.

The following eight KERNAL routines are called from within the implementation of BASIC commands to deal with character I/O and the keyboard:

  • $FFC6: CHKIN – set channel for character input
  • $FFC9: CHKOUT – set channel for character output
  • $FFCC: CLRCHN – restore character I/O to screen/keyboard
  • $FFCF: BASIN – get character
  • $FFD2: BSOUT – write character
  • $FFE1: STOP – test for STOP key
  • $FFE4: GETIN – get character from keyboard
  • $FFE7: CLALL – close all channels

But the remaining six calls are not library calls at all, but full-fledged implementations of BASIC commands:

  • $FFC0: OPEN – open a channel
  • $FFC3: CLOSE – close a channel
  • $FFD5: LOAD – load a file into memory
  • $FFD8: SAVE – save a file from memory
  • $FFDB: VERIFY – compare a file with memory
  • $FFDE: SYS – run machine code

When compiled for the PET, Microsoft BASIC detects the extra commands “OPEN”, “CLOSE” etc., but does not provide an implementation for them. Instead, it calls out to these KERNAL functions when these commands are encountered. So these KERNAL calls have to parse the BASIC arguments, check for errors, and update BASIC’s internal data structures.

These 6 KERNAL calls are actually BASIC command extensions, and they are not useful for any other programs in machine code. After all, the whole jump table was not meant as an abstraction of the machine, but as an interface especially for Microsoft BASIC.

PET BASIC V4 (1980)

Version 4 of the ROM set, which came with significant improvements to BASIC and shipped by default with the 4000 and 8000 series, contained several additions to the KERNAL – all of which were additional BASIC commands.

  • $FF93: CONCAT
  • $FF96: DOPEN
  • $FF99: DCLOSE
  • $FF9C: RECORD
  • $FF9F: HEADER
  • $FFA2: COLLECT
  • $FFA5: BACKUP
  • $FFA8: COPY
  • $FFAB: APPEND
  • $FFAE: DSAVE
  • $FFB1: DLOAD
  • $FFB4: CATALOG/DIRECTORY
  • $FFB7: RENAME
  • $FFBA: SCRATCH
  • $FFBD: DS$ (disk status)

Even though Commodore was doing all development on their fork of BASIC after version 2, command additions were still kept separate and developed as part of the KERNAL. In fact, for all Commodore 8-bit computers from the PET to the C65, BASIC and KERNAL were built separately, and the KERNAL jump table was their interface.

VIC-20 (1981)

The VIC-20 was Commodore’s first low-cost home computer. In order to keep the cost down, the complete ROM had to fit into 16 KB, which meant the BASIC V4 features and the machine language monitor had to be dropped and the editor was merged into the KERNAL. While reorganizing the ROM, the original BASIC command extensions (OPEN, CLOSE, …) were moved into the BASIC ROM (so the KERNAL calls for the BASIC command implementations were no longer needed).

The VIC-20 KERNAL is the first one to have a proper system call interface, which not only includes all the calls required for BASIC to be hardware-independent, but also additional calls not used by BASIC but intended for applications written in machine code. The VIC-20 Programmer’s Reference Manual documents these, making this the first time that machine code applications could be written for the Commodore 8 bit series in a forward-compatible way.

Old PET Calls

The following PET KERNAL calls are generally useful and therefore still supported on the VIC-20:

  • $FFC6: CHKIN
  • $FFC9: CHKOUT
  • $FFCC: CLRCHN
  • $FFCF: BASIN
  • $FFD2: BSOUT
  • $FFE1: STOP
  • $FFE4: GETIN
  • $FFE7: CLALL
  • $FFEA: UDTIM

Channel I/O

The calls for the BASIC commands OPEN, CLOSE, LOAD and SAVE have been replaced by generic functions that can be called from machine code:

  • $FFC0: OPEN
  • $FFC3: CLOSE
  • $FFD5: LOAD
  • $FFD8: SAVE

(There is no separate call for VERIFY, since the LOAD call can perform a verify based on its inputs.)

OPEN, LOAD and SAVE take more arguments (LA, FA, SA, filename) than fit into the 6502 registers, so two more calls take these and store them temporarily.

  • $FFBA: SETLFS – set LA, FA and SA
  • $FFBD: SETNAM – set filename

Two more additions allow reading the status of the last operation and setting the verbosity of messages/errors:

  • $FFB7: READST – return status byte
  • $FF90: SETMSG – set verbosity

BASIC uses all these functions to implement the commands OPEN, CLOSE, LOAD, SAVE and VERIFY. It basically parses the arguments and then calls the KERNAL functions.
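
To illustrate how the pieces fit together, here is a minimal sketch of loading a program from a disk drive through the generic calls; the register conventions shown are the commonly documented ones, and the file name, logical file number and error handling are of course just placeholders:

        LDA #7          ; length of the file name
        LDX #<FNAME     ; pointer to the file name, low byte
        LDY #>FNAME     ; pointer to the file name, high byte
        JSR $FFBD       ; SETNAM
        LDA #1          ; logical file number
        LDX #8          ; device number (first disk drive)
        LDY #1          ; secondary address 1: use the load address stored in the file
        JSR $FFBA       ; SETLFS
        LDA #0          ; 0 = load (1 would mean verify)
        JSR $FFD5       ; LOAD
        BCS ERROR       ; carry set on failure, error code in A
        RTS
ERROR:  RTS             ; handle the error here

FNAME:  .BYTE "PROGRAM"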

IEC

The KERNAL also exposes a complete low-level interface to the serial IEC (IEEE-488) bus used to connect printers and disk drives. None of these calls are used by BASIC though, which talks to these devices on a higher level (OPEN, CHKIN, BASIN etc.).

  • $FFB4: TALK – send TALK command
  • $FFB1: LISTEN – send LISTEN command
  • $FFAE: UNLSN – send UNLISTEN command
  • $FFAB: UNTLK – send UNTALK command
  • $FFA8: IECOUT – send byte to serial bus
  • $FFA5: IECIN – read byte from serial bus
  • $FFA2: SETTMO – set timeout
  • $FF96: TKSA – send TALK secondary address
  • $FF93: SECOND – send LISTEN secondary address
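
A hedged sketch of what driving the bus directly might look like, sending a single byte to a printer; the device number, the secondary address and the $60 offset applied to it follow the usual documented conventions, but check the Programmer’s Reference Manual before relying on them:

        LDA #4          ; device number of the printer
        JSR $FFB1       ; LISTEN - tell device 4 to listen
        LDA #$60        ; secondary address 0, ORed with $60 as the bus protocol expects
        JSR $FF93       ; SECOND
        LDA #$41        ; the byte to send ("A")
        JSR $FFA8       ; IECOUT
        JSR $FFAE       ; UNLSN - release the bus
        RTS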

Memory

BASIC needs to know where usable RAM starts and where it ends, which is what the MEMTOP and MEMBOT functions are for. They also allow setting these values.

  • $FF9C: MEMBOT – read/write address of start of usable RAM
  • $FF99: MEMTOP – read/write address of end of usable RAM
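
The read/write distinction is made through the carry flag in the commonly documented convention (carry set reads, carry clear writes, with the address passed in X/Y); a small sketch, reserving one page below the top of RAM:

        SEC
        JSR $FF99       ; MEMTOP, carry set: return the top of RAM in X (low) and Y (high)
        DEY             ; reserve 256 bytes below the old top
        CLC
        JSR $FF99       ; MEMTOP, carry clear: set the new top from X/Y
        RTS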

Time

BASIC supports the TI and TI$ variables to access the system clock. The RDTIM and SETTIM KERNAL calls allow reading and writing this clock.

  • $FFDE: RDTIM – read system clock
  • $FFDB: SETTIM – write system clock

These functions use the addresses that used to be the BASIC commands SYS and VERIFY on the PET.
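
A small sketch of the pair in use; the 24-bit jiffy count travels in A, X and Y, but since I do not want to mis-state which register holds which byte, the example only resets the clock (where the order does not matter) and reads it back:

        LDA #0
        TAX
        TAY
        JSR $FFDB       ; SETTIM - set all three bytes of the jiffy clock to zero
        ; ... do some work here ...
        JSR $FFDE       ; RDTIM - read the elapsed jiffies back into A, X and Y
        RTS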

Screen

Machine code applications may want to know the size of the text screen (SCREEN) and be able to read or set the cursor position (PLOT). The latter is used by BASIC to align text on tab positions.

  • $FFED: SCREEN – get the screen resolution
  • $FFF0: PLOT – read/write cursor position
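
The carry flag again selects between reading (carry set) and writing (carry clear), which allows a save-and-restore of the cursor position without even caring which register holds the row and which the column; a minimal sketch:

        SEC
        JSR $FFF0       ; PLOT, carry set: read the cursor position into X and Y
        TXA
        PHA
        TYA
        PHA             ; remember both coordinates
        ; ... print something elsewhere on the screen ...
        PLA
        TAY
        PLA
        TAX
        CLC
        JSR $FFF0       ; PLOT, carry clear: restore the saved cursor position
        RTS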

I/O

On the PET, BASIC’s random number generator for the RND command directly read the timers in the VIA 6522 controller. Since the VIC-20, this is abstracted: the IOBASE function returns the start address of the VIA in memory, and BASIC reads from the indexes 4, 5, 8 and 9 to access the timer values.

  • $FFF3: IOBASE – return start of I/O area

The VIC-20 Programmer’s Reference Guide states: “This routine exists to provide compatibility between the VIC 20 and future models of the VIC. If the I/O locations for a machine language program are set by a call to this routine, they should still remain compatible with future versions of the VIC, the KERNAL and BASIC.”
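
A sketch of that usage; the zero-page pointer at $FB is an assumption (pick whatever your program owns), and the offset 4 is one of the timer register indexes mentioned above:

PTR     = $FB           ; free zero-page pointer (assumption)

        JSR $FFF3       ; IOBASE - returns the I/O base address in X (low) and Y (high)
        STX PTR
        STY PTR+1
        LDY #4          ; offset of a timer register, as used by BASIC's RND
        LDA (PTR),Y     ; read a timer value without hard-coding the VIA's address
        RTS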

Vectors

The PET already allowed the user to override the following vectors in RAM to hook into some KERNAL functions:

  • $00E9: input from keyboard
  • $00EB: output to screen
  • $0090: IRQ handler
  • $0092: BRK handler
  • $0094: NMI handler

The VIC-20 ROM replaces these vectors with a more extensive table of addresses in RAM at $0300 to hook core BASIC and KERNAL functions. The KERNAL ones start at $0314. The first three can be used to hook IRQ, BRK and NMI:

  • $0314: CINV – IRQ handler
  • $0316: CBINV – BRK handler
  • $0318: NMINV – NMI handler

The others allow overriding the core set of KERNAL calls:

  • $031A: IOPEN – indirect entry to OPEN ($FFC0)
  • $031C: ICLOSE – indirect entry to CLOSE ($FFC3)
  • $031E: ICHKIN – indirect entry to CHKIN ($FFC6)
  • $0320: ICKOUT – indirect entry to CHKOUT ($FFC9)
  • $0322: ICLRCH – indirect entry to CLRCHN ($FFCC)
  • $0324: IBASIN – indirect entry to CHRIN ($FFCF)
  • $0326: IBSOUT – indirect entry to CHROUT ($FFD2)
  • $0328: ISTOP – indirect entry to STOP ($FFE1)
  • $032A: IGETIN – indirect entry to GETIN ($FFE4)
  • $032C: ICLALL – indirect entry to CLALL ($FFE7)
  • $032E: USRCMD – “User-Defined Vector”
  • $0330: ILOAD – indirect entry to LOAD ($FFD5)
  • $0332: ISAVE – indirect entry to SAVE ($FFD8)

The “USRCMD” vector is interesting: It’s unused on the VIC-20 and C64. On all later machines, this vector is documented as “EXMON” and allows hooking the machine code monitor’s command entry. The vector was presumably meant for the monitor from the beginning, but this feature was cut from these two machines.

The KERNAL documentation warns against changing these vectors by hand. Instead, the VECTOR call allows the application to copy the complete set of KERNAL vectors ($0314-$0333) from and to private memory. The RESTOR command sets the default values.

  • $FF8D: VECTOR – read/write KERNAL vectors
  • $FF8A: RESTOR – set KERNAL vectors to defaults
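
Here is a hedged sketch of hooking BSOUT the recommended way; the carry convention (set to read the table, clear to write it) is the commonly documented one, and the scratch buffer, the spare vector storage and the MYOUT routine are all made up for the example:

TABLE   = $C000         ; 32-byte scratch buffer for the vector copy (assumption)
OLDVEC  = $C020         ; two spare bytes to remember the original BSOUT vector

        LDX #<TABLE
        LDY #>TABLE
        SEC
        JSR $FF8D       ; VECTOR, carry set: copy the current vectors to TABLE
        LDA TABLE+$12   ; $0326 - $0314 = $12: the IBSOUT entry in the copy
        STA OLDVEC
        LDA TABLE+$13
        STA OLDVEC+1
        LDA #<MYOUT     ; patch in our own routine
        STA TABLE+$12
        LDA #>MYOUT
        STA TABLE+$13
        LDX #<TABLE
        LDY #>TABLE
        CLC
        JSR $FF8D       ; VECTOR, carry clear: install the patched table
        RTS

MYOUT:  ; inspect or modify the character in A here, then pass it on
        JMP (OLDVEC)    ; continue with the original BSOUT code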

Custom IRQ Handlers

If an application hooks the IRQ vector, it can either insert itself and call the code originally pointed to by the vector, or completely replace the IRQ code (and return by pulling the registers and executing RTI). In the latter case, it may still want the keyboard and the system clock to work. The PET already had the UDTIM ($FFEA) call to update the clock in the IRQ context. The VIC-20 adds SCNKEY to handle the keyboard and populate the keyboard buffer.

  • $FF9F: SCNKEY – keyboard driver
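
A minimal sketch of the second variant, a full replacement handler; it assumes the standard KERNAL IRQ stub has already pushed A, X and Y before going through the vector, and acknowledging the actual interrupt source is left as a comment because it is machine-specific:

MYIRQ:  JSR $FFEA       ; UDTIM - keep the jiffy clock ticking
        JSR $FF9F       ; SCNKEY - scan the keyboard and fill the keyboard buffer
        ; acknowledge the interrupt source here (machine-specific, e.g. a VIA/CIA timer)
        ; ... your own once-per-frame work ...
        PLA             ; restore the registers the KERNAL stub pushed
        TAY
        PLA
        TAX
        PLA
        RTI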

CBM-II (1982)

The CBM-II series of computers was meant as a successor of the PET 4000/8000 series. The KERNAL’s architecture was based on the VIC-20.

The vector table in RAM is compatible except for ILOAD, ISAVE and USRCMD (which is now used), whose order was changed:

  • $032E: ILOAD – indirect entry to LOAD ($FFD5)
  • $0330: ISAVE – indirect entry to SAVE ($FFD8)
  • $0332: USRCMD – machine code monitor command input

There are two new keyboard-related vectors:

  • $0334: ESCVEC – ESC key vector
  • $0336: CTLVEC – CONTROL key vector (unused)

And all IEEE-488 KERNAL calls except SETTMO can be hooked:

  • $0346: ITALK – indirect entry to TALK ($FFB4)
  • $0344: ILISTN – indirect entry to LISTEN ($FFB1)
  • $0342: IUNLSN – indirect entry to UNLSN ($FFAE)
  • $0340: IUNTLK – indirect entry to UNTLK ($FFAB)
  • $033E: ICIOUT – indirect entry to CIOUT ($FFA8)
  • $033C: IACPTR – indirect entry to ACPTR ($FFA5)
  • $033A: ITKSA – indirect entry to TKSA ($FF96)
  • $0338: ISECND – indirect entry to SECOND ($FF93)

For no apparent reason, the VECTOR and RESTOR calls have moved to different addresses:

  • $FF84: VECTOR – read/write KERNAL vectors
  • $FF87: RESTOR – set KERNAL vectors to defaults

And there are several new calls. All machines since the VIC-20 have a way to hand control to ROM cartridges instead of BASIC on system startup. At this point, no system initialization whatsoever has been done by the KERNAL, so the application or game on the cartridge can start up as quickly as possible. Applications that want to be forward-compatible can call into the following new KERNAL calls to initialize different parts of the system:

  • $FF7B: IOINIT – initialize I/O and enable timer IRQ
  • $FF7E: CINT – initialize text screen

The LKUPLA and LKUPSA calls are used by BASIC to find unused logical and secondary addresses for channel I/O, so its built-in disk commands can open channels even if the user has currently open channels – logical addresses have to be unique on the computer side, and secondary addresses have to be unique on the disk drive side.

  • $FF8D: LKUPLA – search tables for given LA
  • $FF8A: LKUPSA – search tables for given SA

It also added 6 generally useful calls:

  • $FF6C: TXJMP – jump across banks
  • $FF6F: VRESET – power-on/off vector reset
  • $FF72: IPCGO – loop for other processor
  • $FF75: FUNKEY – list/program function key
  • $FF78: IPRQST – send IPC request
  • $FF81: ALOCAT – allocate memory from MEMTOP down

C64 (1982)

Both the KERNAL and the BASIC ROM of the C64 are derived from the VIC-20, so both the KERNAL calls and the vectors are fully compatible with it. Some extensions from the CBM-II carried over: the IOINIT and CINT calls to initialize I/O and the text screen exist, but at different addresses, and a new RAMTAS call has been added, which is also useful for startup from a ROM cartridge.

  • $FF87: RAMTAS – test and initialize RAM
  • $FF84: IOINIT – initialize I/O and enable timer IRQ
  • $FF81: CINT – initialize text screen
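
For illustration, this is roughly what the cold-start code of a forward-compatible C64 cartridge does with these calls; the order shown is the usual one, but treat it as a sketch and check the Programmer’s Reference Guide for the details:

        SEI
        JSR $FF84       ; IOINIT - initialize I/O and enable the timer IRQ
        JSR $FF87       ; RAMTAS - test and initialize RAM
        JSR $FF8A       ; RESTOR - set the RAM vectors to their defaults
        JSR $FF81       ; CINT   - initialize the text screen
        CLI
        ; the cartridge's own code continues here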

The other CBM-II additions are missing, since they are not needed, e.g. because BASIC doesn’t have the V4 disk commands (LKUPLA, LKUPSA) and because there is only one RAM bank (TXJMP, ALOCAT).

Plus/4 (264 Series, 1985)

The next Commodore 8 bit computers in historical order are the 264 series: the C16, the C116 and the Plus/4, which share the same general architecture, BASIC and KERNAL. But they are neither meant as successors of the C64 nor of the CBM-II series – they are more like spiritual successors of the VIC-20. Nevertheless, the KERNAL jump table and vectors are based on the C64.

Since the 264 machines don’t have an NMI, the NMI vector is missing, and the remaining vectors have been moved in memory. This makes most of the vector table incompatible with their predecessors:

  • $0314: CINV – IRQ handler
  • $0316: CBINV – BRK handler
  • (NMI removed)
  • $0318: IOPEN
  • $031A: ICLOSE
  • $031C: ICHKIN
  • $031E: ICKOUT
  • $0320: ICLRCH
  • $0322: IBASIN
  • $0324: IBSOUT
  • $0326: ISTOP
  • $0328: IGETIN
  • $032A: ICLALL
  • $032C: USRCMD
  • $032E: ILOAD
  • $0330: ISAVE

The Plus/4 is the first machine from the home computer series to include the machine code monitor, so the USRCMD vector is now used for command input in the monitor.

And there is one new vector, ITIME, which is called once every frame during vertical blank.

  • $0312: ITIME – vertical blank IRQ

The Plus/4 supports all C64 KERNAL calls, plus some additions. The RESET call has been added to the very end of the table:

  • $FFF6: RESET – restart machine

There are nine more undocumented entries, which are located at lower addresses so that there is an (unused) gap between them and the remaining calls. Since the area $FD00 to $FF3F is occupied by the I/O area, these entries are split between the areas just below and just above it. These two sets are known as the “banking routine table” and the “unofficial jump table”.

  • $FCF1: CARTRIDGE_IRQ
  • $FCF4: PHOENIX
  • $FCF7: LONG_FETCH
  • $FCFA: LONG_JUMP
  • $FCFD: LONG_IRQ
  • $FF49: DEFKEY – program function key
  • $FF4C: PRINT – print string
  • $FF4F: PRIMM – print string following the caller’s code
  • $FF52: MONITOR – enter machine code monitor

The DEFKEY call has the same functionality as FUNKEY ($FF75) call of the CBM-II series, but the two take different arguments.

C128 (1985)

The Commodore 128 is the successor of the C64. Next to a 100% compatible C64 mode that uses the original ROMs, it has a native C128 mode, which is based on the C64 (not the CBM-II or the 264), so all KERNAL vectors and calls are compatible with the C64, but there are additions.

The KERNAL vectors are the same as on the C64, but again, the USRCMD vector (at the VIC-20/C64 location of $032E) is used for command input in the machine code monitor. There are additional vectors starting at $0334 for hooking editor logic as well as pointers to keyboard decode tables, but these are not part of the KERNAL vectors, since the VECTOR and RESTOR calls don’t include them.

The set of KERNAL calls has been extended by 19 entries. The LKUPLA and LKUPSA calls from the CBM-II exist (because BASIC has disk commands), but they are at different locations:

  • $FF59: LKUPLA
  • $FF5C: LKUPSA

There are also several calls known from the Plus/4, but at different addresses:

  • $FF65: PFKEY – program a function key
  • $FF7D: PRIMM – print string following the caller’s code
  • $FF56: PHOENIX – init function cartridges

And there are another 14 completely new ones:

  • $FF47: SPIN_SPOUT – setup fast serial ports for I/O
  • $FF4A: CLOSE_ALL – close all files on a device
  • $FF4D: C64MODE – reconfigure system as a C64
  • $FF50: DMA_CALL – send command to DMA device
  • $FF53: BOOT_CALL – boot load program from disk
  • $FF5F: SWAPPER – switch between 40 and 80 columns
  • $FF62: DLCHR – init 80-col character RAM
  • $FF68: SETBNK – set bank for I/O operations
  • $FF6B: GETCFG – lookup MMU data for given bank
  • $FF6E: JSRFAR – gosub in another bank
  • $FF71: JMPFAR – goto another bank
  • $FF74: INDFET – LDA (fetvec),Y from any bank
  • $FF77: INDSTA – STA (stavec),Y to any bank
  • $FF7A: INDCMP – CMP (cmpvec),Y to any bank

Interestingly, the C128 Programmer’s Reference Guide states that all calls since the C64 “are specifically for the C128 and as such should not be considered as permanent additions to the standard jump table”.

C65 (1991)

The C65 (also known as the C64DX) was a planned successor of the C64 line of computers. Several hundred prerelease devices were built, but it was never released as a product. Like the C128, it has a C64 mode, but it is not backwards-compatible with the C128. Nevertheless, the KERNAL of the native C65 mode is based on the C128 KERNAL.

Like the CBM-II, but at different addresses, all IEEE-488/IEC functions can be hooked with these 8 new vectors:

  • $0335: ITALK – indirect entry to TALK ($FFB4)
  • $0338: ILISTEN – indirect entry to LISTEN ($FFB1)
  • $033B: ITALKSA – indirect entry to TKSA ($FF96)
  • $033E: ISECND – indirect entry to SECOND ($FF93)
  • $0341: IACPTR – indirect entry to ACPTR ($FFA5)
  • $0344: ICIOUT – indirect entry to CIOUT ($FFA8)
  • $0347: IUNTLK – indirect entry to UNTLK ($FFAB)
  • $034A: IUNLSN – indirect entry to UNLSN ($FFAE)

The C128 additions to the jump table are basically supported, but three calls have been removed and one has been added. The removed ones are DMA_CALL (REU support), DLCHR (VDC support) and GETCFG (MMU support). All three are C128-specific and would make no sense on the C65. The one addition is:

  • $FF56: MONITOR_CALL – enter machine code monitor

The removals and the addition cause the addresses of the following calls to change:

  • $FF4D: SPIN_SPOUT
  • $FF50: CLOSE_ALL
  • $FF53: C64MODE
  • $FF59: BOOT_CALL
  • $FF5C: PHOENIX
  • $FF5F: LKUPLA
  • $FF62: LKUPSA
  • $FF65: SWAPPER
  • $FF68: PFKEY
  • $FF6B: SETBNK

The C128-added KERNAL calls on the C65 can in no way be called compatible with the C128, since several of the calls take different arguments, e.g. the INDFET, INDSTA and INDCMP calls take the bank number in the 65CE02’s Z register. This shows again that the C65 is in no way a successor of the C128, but another successor of the C64.

Relationship Graph

The successorship of the Commodore 8 bit computers is messy. Most were merely spiritual successors and rarely truly compatible. The KERNAL source code and the features of the jump table mostly follow the successorship path, but some KERNAL features and jump table calls carried over between branches.

Which entries are safe?

If you want to write code that works on multiple Commodore 8 bit machines, this overview of the jump table entries (compared across the PET, VIC-20, CBM-II, C64, Plus/4, C128 and C65) will help:

  • $FF80: KERNAL version
  • $FF81: CINT
  • $FF84: IOINIT
  • $FF87: RAMTAS
  • $FF8A: RESTOR
  • $FF8D: VECTOR
  • $FF90: SETMSG
  • $FF93: SECOND
  • $FF96: TKSA
  • $FF99: MEMTOP
  • $FF9C: MEMBOT
  • $FF9F: SCNKEY
  • $FFA2: SETTMO
  • $FFA5: IECIN
  • $FFA8: IECOUT
  • $FFAB: UNTLK
  • $FFAE: UNLSN
  • $FFB1: LISTEN
  • $FFB4: TALK
  • $FFB7: READST
  • $FFBA: SETLFS
  • $FFBD: SETNAM
  • $FFC0: OPEN
  • $FFC3: CLOSE
  • $FFC6: CHKIN
  • $FFC9: CHKOUT
  • $FFCC: CLRCHN
  • $FFCF: BASIN
  • $FFD2: BSOUT
  • $FFD5: LOAD
  • $FFD8: SAVE
  • $FFDB: SETTIM
  • $FFDE: RDTIM
  • $FFE1: STOP
  • $FFE4: GETIN
  • $FFE7: CLALL
  • $FFEA: UDTIM
  • $FFED: SCREEN
  • $FFF0: PLOT
  • $FFF3: IOBASE

Code that must work on all Commodore 8 bit computers (without detecting the specific machine) is limited to the following KERNAL calls that are supported from the first PET up to the C65:

  • $FFCF: BASIN – get character
  • $FFD2: BSOUT – write character
  • $FFE1: STOP – test for STOP key
  • $FFE4: GETIN – get character from keyboard
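
As a tiny example of how far these four calls go, here is a hedged sketch (in generic assembler syntax) of a prompt-and-wait routine that should run unchanged on any machine in the family; the message and the assumption that GETIN returns 0 in A when no key is pending are mine:

PROMPT: LDX #0
LOOP:   LDA MSG,X
        BEQ WAIT        ; a zero byte terminates the message
        JSR $FFD2       ; BSOUT - print one character
        INX
        BNE LOOP
WAIT:   JSR $FFE4       ; GETIN - fetch a key, assumed to return 0 if none is pending
        BEQ WAIT
        RTS

MSG:    .BYTE "PRESS ANY KEY", 0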

The CHKIN, CHKOUT, CLRCHN and CLALL calls would also be available, but they are not useful on their own, since their counterpart (opening a file) is missing on the PET. The UDTIM call would be available too, but there is no standard way to hook the timer interrupt if you include the PET.

Nevertheless, the four basic calls are enough for any text mode application that doesn’t care where the line breaks are. Note that the PETSCII graphical character set and the basic PETSCII command codes e.g. for moving the cursor are supported across the whole family.

If you are limiting yourself to the VIC-20 and above (i.e. excluding the PET but including the CBM-II), you can use the basic set of 34 calls starting at $FF90.

You can only use these two vectors though, and only if you’re okay with changing them manually, without going through the VECTOR call, in order to support the CBM-II:

  • $0314: CINV – IRQ handler
  • $0316: CBINV – BRK handler

VECTOR and RESTOR are supported on the complete home computer series (i.e. if you exclude the PET and the CBM-II), and the complete set of 16 vectors can be used on all home computers except the Plus/4.

The initialization calls (CINT, IOINIT, RAMTAS) exist on all home computers since the C64. In addition, all these machines contain the KERNAL version byte at $FF80.


by Michael Steil at February 17, 2018 12:38 PM

February 16, 2018

Steve Kemp's Blog

Updated my package-repository

Yesterday I overhauled my Debian package-hosting repository, in response to user-complaints.

I started down the rabbit hole due to:

  W: No Hash entry in Release file /.._._Release which is considered strong enough for security purposes

I fixed that by changing my hashes from SHA1 to SHA256 + SHA512, but I was only making a little progress due to the more serious problem: my repository-signing key was DSA-based and "small". I replaced it with a modern key, then changed how I generate my packages and all is well.

In the past I was generating the Release files manually, via a silly shell-script. Anyway, here is my trivial Makefile for making the per-project and per-distribution archive; no doubt it could be improved:

   # Default target: rebuild the whole repository metadata.
   all: repo

   # Remove all generated index and signature files.
   clean:
       @rm -f InRelease Packages Sources Packages.gz Sources.gz Release Release.gpg

   # Regenerate the binary package index whenever a .deb changes.
   Packages: $(wildcard *.deb)
       @apt-ftparchive packages . > Packages 2>/dev/null
       @gzip -c Packages > Packages.gz

   # Regenerate the source index whenever a source tarball changes.
   Sources: $(wildcard *.tar.gz)
       @apt-ftparchive sources . > Sources 2>/dev/null
       @gzip -c Sources > Sources.gz

   # Build the Release file and sign it (InRelease inline, Release.gpg detached).
   repo: Packages Sources
       @apt-ftparchive release . > Release
       @gpg --yes --clearsign -o InRelease Release
       @gpg --yes -abs -o Release.gpg Release

In conclusion, in the unlikely event you're using my packages please see GPG-instructions. I've also hidden any packages which were solely for Squeeze and Wheezy, but they continue to exist to avoid breaking links.

February 16, 2018 10:00 PM

February 13, 2018

ma.ttias.be

Nginx adds support for HTTP/2 Server-Side Push


Looks like they'll be implementing it based on the Link header, which should make it trivial to implement from a developer point of view.

HTTP/2: server push.

Resources to be pushed are configured with the "http2_push" directive.

Also, preload links from the Link response headers, as described in Server Push HTTP/2 ([RFC7540]), can be pushed, if enabled with the "http2_push_preload" directive.

Only relative URIs with absolute paths can be pushed.

The number of concurrent pushes is normally limited by a client, but cannot exceed a hard limit set by the "http2_max_concurrent_pushes" directive.

Source: nginx: 641306096f5b


by Mattias Geniar at February 13, 2018 08:34 AM

February 12, 2018

Colin Percival

FreeBSD/EC2 history

A couple of years ago Jeff Barr published a blog post with a timeline of EC2 instances. I thought at the time that I should write up a timeline of the FreeBSD/EC2 platform, but I didn't get around to it; last week, as I prepared to ask for sponsorship for my work, I decided that it was time to sit down and collect together the long history of how the platform has evolved and improved over the years.

February 12, 2018 07:50 PM

Sarah Allen

to be recognized

Some people have the privilege to be recognized in our society. We’ve started to resurrect history and tell stories of people whose contributions have been studiously omitted. In America, February is Black history month. I didn’t learn Black history in school. I value this time for remedial studies, even as I feel a bit disturbed that we need to aggregate people by race to notice their impact. I had hoped that we, as a society, would have come farther along by now, in treating each other with fairness and respect. Instead, we are encoding our bias about what is noticed and who is recognized.

At the M.I.T. Media Lab, researcher Joy Buolamwini has studied facial recognition software, finding error rates increased with darker skin (via NYT). Specifically, algorithms by Microsoft, IBM and Face++ more frequently failed to identify the gender of black women than white men.

When the person in the photo is a white man, the software is right 99 percent of the time.
But the darker the skin, the more errors arise — up to nearly 35 percent for images of darker skinned women

A lack of judgement in choosing a data set is cast as an error of omission, a small lapse in attention on the part of software developers, yet the persistence of these kinds of errors illustrates a systemic bias. The systems that we build (ones made of code and others made of people) lack checks and balances where we actively notice whether our peers and our software are exercising good judgement, which includes treating people fairly and with respect.

Errors made by humans are amplified by the software we create.

Google "knowledge card" for Bessie Blount Griffen, shows same photo for Marie Van Brittan Brown and Miriam Benjamin

The Google “knowledge card” that appears next to the search results for “Bessie Blount Griffen” shows “people also searched for” two other women who are identified with the same photo. It’s hard to tell when the error first appeared, but we can guess that it was amplified by Google search results and perhaps by image search.

I discovered this error first when reading web articles about two different inventors and noticed that the photos used in many of the articles were identical. This can be seen clearly in two examples below where the photo is composited with an image of the corresponding patent drawings. The patents were awarded to two unique humans, but somehow we, collectively, blur their individual identities, anonymizing them with a singular black female face.

I found a New York Times article about Marie Van Brittan Brown, and it seems that the oft-replicated photo is of Bessie Blount Griffen.

Additional confirmation came via a tweet from @SamMaggs, author of “Wonder Women: 25 Innovators, Inventors, and Trailblazers Who Changed History”

photo of black women and patent drawing of medical apparatus
source: Black Then

patent drawing with home security system, with text: Marie Van Brittan Brown invented First Home Security System in 1966
source: Circle City Alarm blog

Newspaper article with photo of black woman and man behind her, caption: Mr and Mrs Albert L Brown

by sarah at February 12, 2018 01:49 PM

HolisticInfoSec.org

toolsmith #131 - The HELK vs APTSimulator - Part 1

Ladies and gentlemen, for our main attraction, I give you...The HELK vs APTSimulator, in a Death Battle! The late, great Randy "Macho Man" Savage said many things in his day, in his own special way, but "Expect the unexpected in the kingdom of madness!" could be our toolsmith theme this month and next. Man, am I having a flashback to my college days, many moons ago. :-) The HELK just brought it on. Yes, I know, HELK is the Hunting ELK stack, got it, but it reminded me of the Hulk, and then, I thought of a Hulkamania showdown with APTSimulator, and Randy Savage's classic, raspy voice popped in my head with "Hulkamania is like a single grain of sand in the Sahara desert that is Macho Madness." And that, dear reader, is a glimpse into exactly three seconds or less in the mind of your scribe, a strange place to be certain. But alas, that's how we came up with this fabulous showcase.
In this corner, from Roberto Rodriguez, @Cyb3rWard0g, the specter in SpecterOps, it's...The...HELK! This, my friends, is the s**t, worth every ounce of hype we can muster.
And in the other corner, from Florian Roth, @cyb3rops, The Fracas of Frankfurt, we have APTSimulator. All your worst adversary apparitions in one APT mic drop. This...is...Death Battle!

Now with that out of our system, let's begin. There's a lot of goodness here, so I'm definitely going to do this in two parts so as not to undervalue these two offerings.
HELK is incredibly easy to install. It's also well documented, with lots of related reading material; let me propose that you take the time to review it all. Pay particular attention to the wiki, gain comfort with the architecture, then review installation steps.
On an Ubuntu 16.04 LTS system I ran:
  • git clone https://github.com/Cyb3rWard0g/HELK.git
  • cd HELK/
  • sudo ./helk_install.sh 
Of the three installation options I was presented with, pulling the latest HELK Docker Image from cyb3rward0g dockerhub, building the HELK image from a local Dockerfile, or installing the HELK from a local bash script, I chose the first and went with the latest Docker image. The installation script does a fantastic job of fulfilling dependencies for you; if you haven't installed Docker, the HELK install script does it for you. You can observe the entire install process in Figure 1.
Figure 1: HELK Installation
You can immediately confirm your clean installation by navigating to your HELK KIBANA URL, in my case http://192.168.248.29.
For my test Windows system I created a Windows 7 x86 virtual machine with Virtualbox. The key to success here is ensuring that you install Winlogbeat on the Windows systems from which you'd like to ship logs to HELK. More important is ensuring that you run Winlogbeat with the right winlogbeat.yml file. You'll want to modify and copy this to your target systems. The critical modification is line 123, under Kafka output, where you need to add the IP address for your HELK server in three spots. My modification appeared as hosts: ["192.168.248.29:9092","192.168.248.29:9093","192.168.248.29:9094"]. As noted in the HELK architecture diagram, HELK consumes Winlogbeat event logs via Kafka.
On your Windows systems, with a properly modified winlogbeat.yml, you'll run:
  • ./winlogbeat -c winlogbeat.yml -e
  • ./winlogbeat setup -e
You'll definitely want to set up Sysmon on your target hosts as well. I prefer to do so with the @SwiftOnSecurity configuration file. If you're doing so with your initial setup, use sysmon.exe -accepteula -i sysmonconfig-export.xml. If you're modifying an existing configuration, use sysmon.exe -c sysmonconfig-export.xml.  This will ensure rich data returns from Sysmon, when using adversary emulation services from APTsimulator, as we will, or experiencing the real deal.
With all set up and working you should see results in your Kibana dashboard as seen in Figure 2.

Figure 2: Initial HELK Kibana Sysmon dashboard.
Now for the showdown. :-) Florian's APTSimulator does some comprehensive emulation to make your systems appear compromised under the following scenarios:
  • POCs: Endpoint detection agents / compromise assessment tools
  • Test your security monitoring's detection capabilities
  • Test your SOCs response on a threat that isn't EICAR or a port scan
  • Prepare an environment for digital forensics classes 
This is a truly admirable effort, one I advocate for most heartily as a blue team leader. With particular attention to testing your security monitoring's detection capabilities, if you don't do so regularly and comprehensively, you are, quite simply, incomplete in your practice. If you haven't tested and validated, don't consider it detection, it's just a rule with a prayer. APTSimulator can be observed conducting the likes of:
  1. Creating typical attacker working directory C:\TMP...
  2. Activating guest user account
    1. Adding the guest user to the local administrators group
  3. Placing a svchost.exe (which is actually srvany.exe) into C:\Users\Public
  4. Modifying the hosts file
    1. Adding update.microsoft.com mapping to private IP address
  5. Using curl to access well-known C2 addresses
    1. C2: msupdater.com
  6. Dropping a Powershell netcat alternative into the APT dir
  7. Executes nbtscan on the local network
  8. Dropping a modified PsExec into the APT dir
  9. Registering mimikatz in At job
  10. Registering a malicious RUN key
  11. Registering mimikatz in scheduled task
  12. Registering cmd.exe as debugger for sethc.exe
  13. Dropping web shell in new WWW directory
A couple of notes here.
Download and install APTSimulator from the Releases section of its GitHub pages.
APTSimulator includes curl.exe, 7z.exe, and 7z.dll in its helpers directory. Be sure that you drop the correct version of 7-Zip for your system architecture. I'm assuming the default bits are 64-bit; I was testing on a 32-bit VM.

Let's do a fast run-through with HELK's Kibana Discover option looking for the above mentioned APTSimulator activities. Starting with a search for TMP in the sysmon-* index yields immediate results and strikes #1, 6, 7, and 8 from our APTSimulator list above, see for yourself in Figure 3.

Figure 3: TMP, PS nc, nbtscan, and PsExec in one shot
Created TMP, dropped a PowerShell netcat, nbtscanned the local network, and dropped a modified PsExec, check, check, check, and check.
How about enabling the guest user account and adding it to the local administrator's group? Figure 4 confirms.

Figure 4: Guest enabled and escalated
Strike #2 from the list. Something tells me we'll immediately find svchost.exe in C:\Users\Public. Aye, Figure 5 makes it so.

Figure 5: I've got your svchost right here
Knock #3 off the to-do, including the process.commandline, process.name, and file.creationtime references. Up next, the At job and scheduled task creation. Indeed, see Figure 6.

Figure 6. tasks OR schtasks
I think you get the point, there weren't any misses here. There are, of course, visualization options. Don't forget about Kibana's Timelion feature. Forensicators and incident responders live and die by timelines, use it to your advantage (Figure 7).

Figure 7: Timelion
Finally, for this month, under HELK's Kibana Visualize menu, you'll note 34 visualizations. By default, these are pretty basic, but you can quickly add value with sub-buckets. As an example, I selected the Sysmon_UserName visualization. Initially, it yielded a donut graph inclusive of malman (my pwned user), SYSTEM and LOCAL SERVICE. As that wasn't detailed enough to be particularly useful, I added a sub-bucket to include process names associated with each user. The resulting graph is more detailed and tells us that of the 242 events in the last four hours associated with the malman user, 32 of those were specific to cmd.exe processes, or 18.6% (Figure 8).

Figure 8: Powerful visualization capabilities
This has been such a pleasure this month; I am thrilled with both HELK and APTSimulator. The true principles of blue team and detection quality are innate in these projects. The fact that Roberto considers HELK still in alpha state leads me to believe there is so much more to come. Be sure to dig deeply into APTSimulator's Advance Solutions as well; there's more than one way to emulate an adversary.
Next month Part 2 will explore the Network side of the equation via the Network Dashboard and related visualizations, as well as HELK integration with Spark, Graphframes & Jupyter notebooks.
Aw snap, more goodness to come, I can't wait.
Cheers...until next time.

by Russ McRee (noreply@blogger.com) at February 12, 2018 06:56 AM

February 11, 2018

syslog.me

The future of configuration management (again), and a suggestion

I attended the Config Management Camp in Gent this year, where I also presented the talk “Promise theory: from configuration management to team leadership”. A thrilling experience, considering that I was talking about promise theory at the same conference and in the same track where Mark Burgess, the inventor of promise theory, was holding one of the keynotes!

The quality of the conference was as good as always, but my experience at the conference was completely different from the past. Last time I attended, in 2016, I was actively using CFEngine and that shaped both the talks I attended and the people that I hung out with the most. This year I was coming from a different work environment and a different job: I jumped a lot through the different tracks and devrooms, and talked with many people with a very different experience than mine. And that was truly enriching. I’ll focus on one experience in particular, that led me to see what the future of configuration management could be.

 

I attended all the keynotes. Mark Burgess’ was, as always, rich in content and a bit hard to process; lots of food for thought, but I couldn’t let it percolate in my brain until someone made it click several hours later. More on that in a minute.

Then there was Luke Kanies’ keynote, explaining where configuration management and we, CM practitioners, won the battle; and also where we lost the battle and where we are irrelevant. Again, more stuff accumulated, waiting for something to trigger the mental process to consume the information. There was also the keynote by Adam Jacob about the future of Configuration Management, great and fun as always but not part of this movie 🙂 I recommend that you enjoy it on youtube.

Later, at the social event, I had the pleasure to have a conversation with Stein Inge Morisbak, whom I knew from before as we met in Oslo several times. With his experience working on public cloud infrastructures like AWS and Google Cloud Platform, Stein Inge was one of the people who attended the conference with a sceptical eye about configuration management and, at the same time, with the open mind that you would expect from the great guy he is. In a sincere effort to understand, he couldn’t really see how CM, “a sinking ship”, could possibly be relevant in an era where public cloud, immutable infrastructure and all the tooling around are the modern technology of today.

While we were talking, another great guy chimed in, namely Ivan Rossi. If you look at Ivan’s LinkedIn page you’ll see that he’s been working in technology for a good while and has seen things from many different angles. Ivan made a few practical examples where CM is the only tooling that you can use, because the cloud simply isn’t there and the tooling that you use in immutable infrastructure doesn’t work: think of networks of devices sitting in the middle of nowhere. In situations like those, with limited hardware resources and/or shitty wireless links like 2G networks, you need something that is lightweight, resilient, fault tolerant, and that can maintain the configuration, because there is no way you’re going around every other day to replace the devices with new ones with updated configurations and software.

And there, Stein Inge was the first one to make the link with Mark Burgess’ keynote and to make me part of his revelation (or his “pilgrim’s experience”, as he calls it). Mark talked about a new sprawl of hardware devices going on: they are all around us, in phones and tablets, and more and more in our domestic appliances, in smart cars, in all the “smart” devices that people are buying every day. A heap of devices that is poorly managed as of today, if at all, and where CM definitely has a place. Stein Inge talked about this experience in his blog; his post is in Norwegian, so you must either know the language or ask some translation software for help. I promise it’s worth the read.

What’s the future then?

So, what’s the future of configuration management, based on Mark Burgess’ vision and these observations? A few ideas:

  • on the server side, it will be less and less relevant to the everyday user as more people will shift to private and public clouds. It will still be relevant for those who maintain hardware infrastructures; the big players will maybe decide to bake their own tools to better suit their hardware and workflows — they have the workforce and the skills in house, so why not? The smaller players will keep using “off-the-shelf” tools in the same lines of those we have today for provisioning hardware and keep their configurations in shape;
  • configuration management will become more relevant as a tool to manage fleets of hardware like company workstations and laptops, for example, to enforce policies and ensure that security measures are in place at all times; that will eventually include company-owned phones;
  • configuration management will be more and more relevant in IoT and “smart” devices in general; for those, a new generation of tools may be needed that can run on limited hardware and unreliable networks; agent-based tools will probably have the upper hand here;
  • we’ll have less and less config management on virtual machines (and possibly less and less virtual machines and more and more containers); CM on virtual machines will remain only in special cases, e.g. where you need to run a software that doesn’t lend itself to automatic installation and configuration (Atlassian, I am looking at you).

As always with future forecasts, time will tell.

One word about Configuration Management Camp

I have been a fan of Config Management Camp since I attended (and presented at) the first edition. I am glad to see that the scope of the conference is widening to include containers and immutable infrastructure. However, as Stein Inge says in his blog post (the translation is mine, as are all mistakes thereof):

Most of the talks revolved around configuration management of servers, which is of little importance in a world where we use services on public cloud platforms on a much higher abstraction level.

Maybe, and I stress maybe, an effort should be made to reduce the focus on configuration management a bit in favour of the “rival” technologies of nowadays; not to the point that CM disappears because, as I just said, CM will still play an important part, and CfgMgmtCamp is not DevOpsDays anyway. Possibly a different name that underlines Infrastructure as Code as the real topic could help in this rebalancing?

by bronto at February 11, 2018 09:03 PM

February 10, 2018

Steve Kemp's Blog

Decoding 433Mhz-transmissions with software-defined radio

This blog-post is a bit of a diversion, and builds upon my previous entry of using 433Mhz radio-transmitters and receivers with Arduino and/or ESP8266 devices.

As mentioned in my post I've recently been overhauling my in-house IoT buttons, and I decided to go down the route of using commercially-available buttons which broadcast signals via radio, rather than using IR, or WiFi. The advantage is that I don't need to build any devices, or worry about 3D-printing a case - the commercially available buttons are cheap, water-proof, portable, and reliable, so why not use them? Ultimately I bought around ten buttons, along with a radio-receiver and radio-transmitter modules for my ESP8266 device. I wrote code to run on my device to receive the transmissions, decode the device-ID, and take different actions based upon the specific button pressed.

In the gap between buying the buttons (read: radio transmitters) and waiting for the transmitter/receiver modules I intended to connect to my ESP8266/arduino device(s) I remembered that I'd previously bought a software-defined-radio receiver, and figured I could use it to receive and react to the transmissions directly upon my PC.

The dongle I'd bought in the past was a simple USB-device which identifies itself as follows when inserted into my desktop:

  [17844333.387774] usb 3-9: New USB device found, idVendor=0bda, idProduct=2838
  [17844333.387777] usb 3-9: New USB device strings: Mfr=1, Product=2, SerialNumber=3
  [17844333.387778] usb 3-9: Product: RTL2838UHIDIR
  [17844333.387778] usb 3-9: Manufacturer: Realtek
  [17844333.387779] usb 3-9: SerialNumber: 00000001

At the time I bought it I wrote a brief blog post, which described tracking aircraft, and I said "I know almost nothing about SDR, except that it can be used to let your computer do stuff with radio."

So my first step was finding some suitable software to listen to the right frequency and ideally decode the transmissions. A brief search led me to the following repository:

The RTL_433 project is pretty neat as it allows receiving transmissions and decoding them. Of course it can't decode everything, but it has the ability to recognize a bunch of commonly-used hardware, and when it does it outputs the payload in a useful way, rather than just dumping a bitstream/bytestream.

Once you've got your USB-dongle plugged in, and you've built the project you can start receiving and decoding all discovered broadcasts like so:

  skx@deagol ~$ ./build/src/rtl_433 -U -G
  trying device  0:  Realtek, RTL2838UHIDIR, SN: 00000001
  Found Rafael Micro R820T tuner
  Using device 0: Generic RTL2832U OEM
  Exact sample rate is: 250000.000414 Hz
  Sample rate set to 250000.
  Bit detection level set to 0 (Auto).
  Tuner gain set to Auto.
  Reading samples in async mode...
  Tuned to 433920000 Hz.
  ...

Here we've added flags:

  • -G
    • Enable all decoders. So we're not just listening for traffic at 433Mhz, but we're actively trying to decode the payload of the transmissions.
  • -U
    • Timestamps are in UTC

Leaving this running for a few hours I noted that there are several nearby cars which are transmitting data about their tyre-pressure:

  2018-02-10 11:53:33 :      Schrader       :      TPMS       :      25
  ID:          1B747B0
  Pressure:    2.500 bar
  Temperature: 6 C
  Integrity:   CRC

The second log is from running with "-F json" to cause output to be generated in JSON format:

  {"time" : "2018-02-10 09:51:02",
   "model" : "Toyota",
   "type" : "TPMS",
   "id" : "5e7e0637",
   "code" : "63e6026d",
   "mic" : "CRC"}

In both cases we see "TPMS", and according to wikipedia that is Tyre Pressure Monitoring System. I'm a little shocked to receive this data, unencrypted!

Other events also became visible when I left the scanner running; this one is presumably from some kind of temperature-sensor a neighbour has running:

  2018-02-10 13:19:08 : RF-tech
     Id:              0
     Battery:         LOW
     Button:          0
     Temperature:     0.0 C

Anyway I have a bunch of remote-controlled sockets, branded "NEXA", which look like this:

Radio-Controlled Sockets

When I press the remote I can see the transmissions and program my PC to react to them:

  2018-02-11 07:31:20 : Nexa
    House Code:  39920705
    Group:  1
    Channel: 3
    State:   ON
    Unit:    2

In conclusion:

  • SDR can be used to easily sniff & decode cheap and commonly available 433Mhz-based devices.
  • "Modern" cars transmit their tyre-pressure, apparently!
  • My neighbours can probably overhear my button presses.

February 10, 2018 10:00 PM

February 08, 2018

Sean's IT Blog

Moving to the Cloud? Don’t Forget End-User Experience

The cloud has a lot to offer IT departments.  It provides the benefits of virtualization in a consumption-based model, and it allows new applications to quickly be deployed while waiting for, or even completely forgoing, on-premises infrastructure.  This can provide a better time-to-value and greater flexibility for the business.  It can help organizations reduce, or eliminate, their on-premises data center footprint.

But while the cloud has a lot of potential to disrupt how IT manages applications in the data center, it also has the potential to disrupt how IT delivers services to end users.

In order to understand how cloud will disrupt end-user computing, we first need to look at how organizations are adopting the cloud.  We also need to look at how the cloud can change application development patterns, and how that will change how IT delivers services to end users.

The Current State of Cloud

When people talk about cloud, they’re usually talking about three different types of services.  These services, and their definitions, are:

  • Infrastructure-as-a-Service: Running virtual machines in a hosted, multi-tenant virtual data center.
  • Platform-as-a-Service: Allows developers to subscribe to build applications without having to build the supporting infrastructure.  The platform can include some combination of web services, application run time services (like .Net or Java), databases, message bus services, and other managed components.
  • Software-as-a-Service: Subscription to a vendor hosted and managed application.

The best analogy to explain this is comparing the different cloud offerings with different types of pizza restaurants, using the graphic below from episerver.com:

Image retrieved from: http://www.episerver.com/learn/resources/blog/fred-bals/pizza-as-a-service/

So what does this have to do with End-User Computing?

Today, it seems like enterprises that are adopting cloud are going in one of two directions.  The first is migrating their data centers into infrastructure-as-a-service offerings with some platform-as-a-service mixed in.  The other direction is replacing applications with software-as-a-service options.  The former is migrating your applications to Azure or AWS EC2, the latter is replacing on-premises services with options like ServiceNow or Microsoft Office 365.

Both options can present challenges to how enterprises deliver applications to end-users.  And the choices made when migrating on-premises applications to the cloud can greatly impact end-user experience.

The challenges around software-as-a-service deal more with identity management, so this post will focus on migrating on-premises applications to the cloud.

Know Thy Applications – Infrastructure-As-A-Service and EUC Challenges

Infrastructure-as-a-Service offerings provide IT organizations with virtual machines running in a cloud service.  These offerings provide different virtual machines optimized for different tasks, and they provide the flexibility to meet the various needs of an enterprise IT organization.  They allow IT organizations to bring their on-premises business applications into the cloud.

The lifeblood of many businesses is Win32 applications.  Whether they are commercial or developed in house, these applications are often critical to some portion of a business process.  Many of these applications were never designed with high availability or the cloud in mind, and the developer and/or the source code may be long gone.  Or they might not be easily replaced because they are deeply integrated into critical processes or other enterprise systems.

Many Win32 applications have clients that expect to connect to local servers.  But when you move those servers to a remote datacenter, including the cloud, it can introduce problems that makes the application nearly unusable.  Common problems that users encounter are longer application load times, increased transaction times, and reports taking longer to preview and/or print.

These problems make employees less productive, and it has an impact on the efficiency and profitability of the business.

A few jobs ago, I was working for a company that had its headquarters, local office, and data center co-located in the same building.  They also had a number of other regional offices scattered across our state and the country.  The company had grown to the point where they were running out of space, and they decided to split the corporate and local offices.  The corporate team moved to a new building a few miles away, but the data center remained in the building.

Many of the corporate employees were users of a two-tier business application, and the application client connected directly to the database server.  Moving users of a fat client application a few miles down the road from the database server had a significant impact on application performance and user experience.  Application response suffered, and user complaints rose.  Critical business processes took longer, and productivity suffered as a result.

More bandwidth was procured. That didn’t solve the issue, and IT was sent scrambling for a new solution.  Eventually, these issues were addressed with a solution that was already in use for other areas of the business – placing the core applications into Windows Terminal Services and providing users at the corporate office with a published desktop that provided their required applications.

This solution solved their user experience and application performance problems.  But it required other adjustments to the server environment, business process workflows, and how users interact with the technology that enables them to work.  It took time for users to adjust to the changes.  Many of the issues were addressed when the business moved everything to a colocation facility a hundred miles away a few months later.

Ensuring Success When Migrating Applications to the Cloud

The business has said it’s time to move some applications to the cloud.  How do you ensure it’s a success and meets the business and technical requirements of that application while making sure an angry mob of users don’t show up at your office with torches and pitchforks?

The first thing is to understand your application portfolio.  That understanding goes beyond having visibility into what applications you have in your environment and how those applications work from a technical perspective.  You need a holistic view of your applications and should keep the following questions in mind:

  • Who uses the application?
  • What do the users do in the application?
  • How do the users access the application?
  • Where does it fit into business processes and workflows?
  • What other business systems does the application integrate with?
  • How is that integration handled?

Applications rarely exist in a vacuum, and making changes to one not only impacts the users, but it can impact other applications and business processes as well.

By understanding your applications, you will be able to build a roadmap of when applications should migrate to the cloud and effectively mitigate any impacts to both user experience and enterprise integrations.

The second thing is to test extensively.  The testing needs to go beyond functional testing that confirms the application will run on the server images built by the cloud providers; it needs to include extensive user experience and user acceptance testing.  This may include spending time with users measuring tasks with a stopwatch to compare how long tasks take in cloud-hosted systems versus on-premises systems.

If application performance isn’t up to user standards and has a significant impact on productivity, you may need to start investigating solutions for bringing users closer to the cloud-hosted applications.  This includes solutions like Citrix, VMware Horizon Cloud, or Amazon WorkSpaces or AppStream. These solutions bring users closer to the applications, and they can give users an on-premises experience in the cloud.

The third thing is to plan ahead.  Having a roadmap and knowing your application portfolio enables you to plan for when you need capacity or specific features to support users, and it can guide your architecture and product selection.  You don’t want to get three years into a five-year migration and find out that the solution you selected doesn’t have the features you require for a use case, or that the environment wasn’t architected to support the number of users.

When planning to migrate applications from your on-premises datacenters to an infrastructure-as-a-service offering, it’s important to know your applications and take end-user experience into account.  It’s important to test, and understand, how these applications perform when the application servers and databases are remote to the application client.  If you don’t, you not only anger your users, but you also make them less productive and the business less profitable overall.

 

by seanpmassey at February 08, 2018 03:22 PM

OpenSSL

Using TLS1.3 With OpenSSL

Note: This is an updated version of an earlier blog post available here.

The forthcoming OpenSSL 1.1.1 release will include support for TLSv1.3. The new release will be binary and API compatible with OpenSSL 1.1.0. In theory, if your application supports OpenSSL 1.1.0, then all you need to do to upgrade is to drop in the new version of OpenSSL when it becomes available and you will automatically start being able to use TLSv1.3. However there are some issues that application developers and deployers need to be aware of. In this blog post I am going to cover some of those things.

Differences with TLS1.2 and below

TLSv1.3 is a major rewrite of the specification. There was some debate as to whether it should really be called TLSv2.0 - but TLSv1.3 it is. There are major changes and some things work very differently. A brief, incomplete, summary of some things that you are likely to notice follows:

  • There are new ciphersuites that only work in TLSv1.3. The old ciphersuites cannot be used for TLSv1.3 connections.
  • The new ciphersuites are defined differently and do not specify the certificate type (e.g. RSA, DSA, ECDSA) or the key exchange mechanism (e.g. DHE or ECDHE). This has implications for ciphersuite configuration.
  • Clients provide a “key_share” in the ClientHello. This has consequences for “group” configuration.
  • Sessions are not established until after the main handshake has been completed. There may be a gap between the end of the handshake and the establishment of a session (or, in theory, a session may not be established at all). This could have impacts on session resumption code.
  • Renegotiation is not possible in a TLSv1.3 connection
  • More of the handshake is now encrypted.
  • More types of messages can now have extensions (this has an impact on the custom extension APIs and Certificate Transparency)
  • DSA certificates are no longer allowed in TLSv1.3 connections

Note that at this stage only TLSv1.3 is supported. DTLSv1.3 is still in the early days of specification and there is no OpenSSL support for it at this time.

Current status of the TLSv1.3 standard

As of the time of writing TLSv1.3 is still in draft. Periodically a new version of the draft standard is published by the TLS Working Group. Implementations of the draft are required to identify the specific draft version that they are using. This means that implementations based on different draft versions do not interoperate with each other.

OpenSSL 1.1.1 will not be released until (at least) TLSv1.3 is finalised. In the meantime the OpenSSL git master branch contains our development TLSv1.3 code which can be used for testing purposes (i.e. it is not for production use). You can check which draft TLSv1.3 version is implemented in any particular OpenSSL checkout by examining the value of the TLS1_3_VERSION_DRAFT_TXT macro in the tls1.h header file. This macro will be removed when the final version of the standard is released.

TLSv1.3 is enabled by default in the latest development versions (there is no need to explicitly enable it). To disable it at compile time you must use the “no-tls1_3” option to “config” or “Configure”.

Currently OpenSSL has implemented the “draft-23” version of TLSv1.3. Other applications that support TLSv1.3 may still be using older draft versions. This is a common source of interoperability problems. If two peers supporting different TLSv1.3 draft versions attempt to communicate then they will fall back to TLSv1.2.

Ciphersuites

OpenSSL has implemented support for five TLSv1.3 ciphersuites as follows:

  • TLS13-AES-256-GCM-SHA384
  • TLS13-CHACHA20-POLY1305-SHA256
  • TLS13-AES-128-GCM-SHA256
  • TLS13-AES-128-CCM-8-SHA256
  • TLS13-AES-128-CCM-SHA256

Of these the first three are in the DEFAULT ciphersuite group. This means that if you have no explicit ciphersuite configuration then you will automatically use those three and will be able to negotiate TLSv1.3.

All the TLSv1.3 ciphersuites also appear in the HIGH ciphersuite alias. The CHACHA20, AES, AES128, AES256, AESGCM, AESCCM and AESCCM8 ciphersuite aliases include a subset of these ciphersuites as you would expect based on their names. Key exchange and authentication properties were part of the ciphersuite definition in TLSv1.2 and below. This is no longer the case in TLSv1.3 so ciphersuite aliases such as ECDHE, ECDSA, RSA and other similar aliases do not contain any TLSv1.3 ciphersuites.

If you explicitly configure your ciphersuites then care should be taken to ensure that you are not inadvertently excluding all TLSv1.3 compatible ciphersuites. If a client has TLSv1.3 enabled but no TLSv1.3 ciphersuites configured then it will immediately fail (even if the server does not support TLSv1.3) with an error message like this:

140399519134144:error:141A90B5:SSL routines:ssl_cipher_list_to_bytes:no ciphers available:ssl/statem/statem_clnt.c:3715:No ciphers enabled for max supported SSL/TLS version

Similarly if a server has TLSv1.3 enabled but no TLSv1.3 ciphersuites it will also immediately fail, even if the client does not support TLSv1.3, with an error message like this:

140640328024512:error:141FC0B5:SSL routines:tls_setup_handshake:no ciphers available:ssl/statem/statem_lib.c:120:No ciphers enabled for max supported SSL/TLS version

For example, setting a ciphersuite selection string of ECDHE:!COMPLEMENTOFDEFAULT will work in OpenSSL 1.1.0 and will only select those ciphersuites that are in DEFAULT and also use ECDHE for key exchange. However no TLSv1.3 ciphersuites are in the ECDHE group so this ciphersuite configuration will fail in OpenSSL 1.1.1 if TLSv1.3 is enabled.

You may want to explicitly list the TLSv1.3 ciphersuites you want to use to avoid problems. For example:

"TLS13-CHACHA20-POLY1305-SHA256:TLS13-AES-128-GCM-SHA256:TLS13-AES-256-GCM-SHA384:ECDHE:!COMPLEMENTOFDEFAULT"

You can test which ciphersuites are included in a given ciphersuite selection string using the openssl ciphers -s -v command:

$ openssl ciphers -s -v "ECDHE:!COMPLEMENTOFDEFAULT"

Ensure that at least one ciphersuite in the result supports TLSv1.3.
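
As a quick, hedged check (exact output depends on your OpenSSL build), you can confirm that the recommended string from above still includes TLSv1.3 ciphersuites by filtering the output:

$ openssl ciphers -s -v "TLS13-CHACHA20-POLY1305-SHA256:TLS13-AES-128-GCM-SHA256:TLS13-AES-256-GCM-SHA384:ECDHE:!COMPLEMENTOFDEFAULT" | grep TLS13

If the grep prints nothing, no TLSv1.3 ciphersuite survived your selection string and TLSv1.3 handshakes will fail as described above.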

Groups

In TLSv1.3 the client selects a “group” that it will use for key exchange. At the time of writing, OpenSSL only supports ECDHE groups for this. The client then sends “key_share” information to the server for its selected group in the ClientHello.

The list of supported groups is configurable. It is possible for a client to select a group that the server does not support. In this case the server requests that the client sends a new key_share that it does support. While this means a connection will still be established (assuming a mutually supported group exists), it does introduce an extra server round trip - so this has implications for performance. In the ideal scenario the client will select a group that the server supports in the first instance.

In practice most clients will use X25519 or P-256 for their initial key_share. For maximum performance it is recommended that servers are configured to support at least those two groups and that clients use one of them for their initial key_share. This is the default case (OpenSSL clients will use X25519).

The group configuration also controls the allowed groups in TLSv1.2 and below. If applications have previously configured their groups in OpenSSL 1.1.0 then you should review that configuration to ensure that it still makes sense for TLSv1.3. The first named (i.e. most preferred) group will be the one used by an OpenSSL client in its initial key_share.

Applications can configure the group list by using SSL_CTX_set1_groups() or a similar function (see here for further details). Alternatively, if applications use SSL_CONF style configuration files then this can be configured using the Groups or Curves command (see here).
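
For a quick command-line experiment with a different group preference, s_client can also set the supported groups list (a hedged example; “-curves” is the pre-1.1.1 name for the option and example.com is a placeholder host):

$ openssl s_client -curves X25519:P-256 -connect example.com:443

The negotiated key exchange group is typically visible in the s_client output, and putting an uncommon group first is an easy way to observe the extra server round trip described above.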

Sessions

In TLSv1.2 and below a session is established as part of the handshake. This session can then be used in a subsequent connection to achieve an abbreviated handshake. Applications might typically obtain a handle on the session after a handshake has completed using the SSL_get1_session() function (or similar). See here for further details.

In TLSv1.3 sessions are not established until after the main handshake has completed. The server sends a separate post-handshake message to the client containing the session details. Typically this will happen soon after the handshake has completed, but it could be sometime later (or not at all).

The specification recommends that applications only use a session once (although this is not enforced). For this reason some servers send multiple session messages to a client. To enforce the “use once” recommendation applications could use SSL_CTX_remove_session() to mark a session as non-resumable (and remove it from the cache) once it has been used.

The old SSL_get1_session() and similar APIs may not operate as expected for client applications written for TLSv1.2 and below. Specifically if a client application calls SSL_get1_session() before the server message containing session details has been received then an SSL_SESSION object will still be returned, but any attempt to resume with it will not succeed and a full handshake will occur instead. In the case where multiple sessions have been sent by the server then only the last session will be returned by SSL_get1_session().

Client application developers should consider using the SSL_CTX_sess_set_new_cb() API instead (see here). This provides a callback mechanism which gets invoked every time a new session is established. This can get invoked multiple times for a single connection if a server sends multiple session messages.

Note that SSL_CTX_sess_set_new_cb() was also available in OpenSSL 1.1.0. Applications that already used that API will still work, but they may find that the callback is invoked at unexpected times, i.e. post-handshake.

An OpenSSL server will immediately attempt to send session details to a client after the main handshake has completed. To server applications this post-handshake stage will appear to be part of the main handshake, so calls to SSL_get1_session() should continue to work as before.
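
A rough way to observe this behaviour from the command line is to save and then reuse a session with s_client (a sketch only; example.com is a placeholder, and because the TLSv1.3 session message arrives post-handshake the first command may need to hold the connection open briefly before the session file is useful):

$ openssl s_client -connect example.com:443 -sess_out session.pem
$ openssl s_client -connect example.com:443 -sess_in session.pem

If the second connection performs a full handshake rather than resuming, the most likely reason is exactly the one described above: the session had not yet been established when the first s_client wrote the file.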

Custom Extensions and Certificate Transparency

In TLSv1.2 and below the initial ClientHello and ServerHello messages can contain “extensions”. This allows the base specifications to be extended with additional features and capabilities that may not be applicable in all scenarios or could not be foreseen at the time that the base specifications were written. OpenSSL provides support for a number of “built-in” extensions.

Additionally the custom extensions API provides some basic capabilities for application developers to add support for new extensions that are not built-in to OpenSSL.

Built on top of the custom extensions API is the “serverinfo” API. This provides an even more basic interface that can be configured at run time. One use case for this is Certificate Transparency. OpenSSL provides built-in support for the client side of Certificate Transparency but there is no built-in server side support. However this can easily be achieved using “serverinfo” files. A serverinfo file containing the Certificate Transparency information can be configured within OpenSSL and it will then be sent back to the client as appropriate.

In TLSv1.3 the use of extensions is expanded significantly and there are many more messages that can include them. Additionally some extensions that were applicable to TLSv1.2 and below are no longer applicable in TLSv1.3 and some extensions are moved from the ServerHello message to the EncryptedExtensions message. The old custom extensions API does not have the ability to specify which messages the extensions should be associated with. For that reason a new custom extensions API was required.

The old API will still work, but the custom extensions will only be added where TLSv1.2 or below is negotiated. To add custom extensions that work for all TLS versions application developers will need to update their applications to the new API (see here for details).

The “serverinfo” data format has also been updated to include additional information about which messages the extensions are relevant to. Applications using “serverinfo” files may need to update to the “version 2” file format to be able to operate in TLSv1.3 (see here and here for details).

Renegotiation

TLSv1.3 does not have renegotiation so calls to SSL_renegotiate() or SSL_renegotiate_abbreviated() will immediately fail if invoked on a connection that has negotiated TLSv1.3.

A common use case for renegotiation is to update the connection keys. The function SSL_key_update() can be used for this purpose in TLSv1.3 (see here for further details).

Another use case is to request a certificate from the client. This can be achieved by using the SSL_verify_client_post_handshake() function in TLSv1.3 (see here for further details).
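
For manual testing, the s_client tool in the development branch can also exercise the key update mechanism: once a TLSv1.3 connection is established, typing “k” (send a key update) or “K” (send a key update and request one from the peer) followed by Enter triggers the update on the connection. A hedged example, with example.com as a placeholder:

$ openssl s_client -connect example.com:443

Then type “K” and press Enter after the handshake has completed.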

DSA certificates

DSA certificates are no longer allowed in TLSv1.3. If your server application is using a DSA certificate then TLSv1.3 connections will fail with an error message similar to the following:

140348850206144:error:14201076:SSL routines:tls_choose_sigalg:no suitable signature algorithm:ssl/t1_lib.c:2308:

Please use an ECDSA or RSA certificate instead.
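
For testing purposes, generating a replacement ECDSA key and self-signed certificate can look something like the following (illustrative only; adjust the curve, subject and validity, and use your normal CA for production certificates):

$ openssl ecparam -name prime256v1 -genkey -noout -out server-ecdsa.key
$ openssl req -new -x509 -key server-ecdsa.key -out server-ecdsa.crt -days 365 -subj "/CN=example.com"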

Middlebox Compatibility Mode

During development of the TLSv1.3 standard it became apparent that in some cases, even if a client and server both support TLSv1.3, connections could sometimes still fail. This is because middleboxes on the network between the two peers do not understand the new protocol and prevent the connection from taking place. In order to work around this problem the TLSv1.3 specification introduced a “middlebox compatibility” mode. This made a few optional changes to the protocol to make it appear more like TLSv1.2 so that middleboxes would let it through. Largely these changes are superficial in nature but do include sending some small but unnecessary messages. OpenSSL has middlebox compatibility mode on by default, so most users should not need to worry about this. However applications may choose to switch it off by calling the function SSL_CTX_clear_options() and passing SSL_OP_ENABLE_MIDDLEBOX_COMPAT as an argument (see here for further details).

If the remote peer is not using middlebox compatibility mode and there are problematic middleboxes on the network path then this could cause spurious connection failures.

Conclusion

TLSv1.3 represents a significant step forward and has some exciting new features, but there are some hazards for the unwary when upgrading. Mostly these issues have relatively straightforward solutions. Application developers should review their code and consider whether anything should be updated in order to work more effectively with TLSv1.3. Similarly, application deployers should review their configuration.

February 08, 2018 11:00 AM

February 03, 2018

Sarah Allen

making of esther music video

In 1991, the Company of Science & Art created a music video for Phish. It featured richly colored pastel drawings by Scott Nybakken with just a little bit of vector art and compositing that was likely done in Photoshop 1.0.

The Esther animation (below) was recorded in real time from an Apple Macintosh IIfx running PACo, a software-based animation engine developed by The Company of Science & Art.

At the time that this music video was made, CD-ROMs were new and Apple Macintosh computers had recently started supporting color monitors. The IIfx was the high-end of the Mac product line, and I remember that computer had 8MB of RAM, which was the most of any Mac in the office.

The sound file had to be annotated with sync points, which were produced by importing the audio into SoundEdit and adding labels using the graphical user interface; I think the labels could then be exported as their own file (or maybe they were embedded in the sound file, I don’t remember). We used HyperCard as the original user interface for PACo, so we could experiment with new features quickly. The software would combine the labels, the audio, and a folder full of images, and it also allowed you to specify, somehow, the transitions between the images.

Images were 8-bit color, which means that each pixel was stored as a number which was an index into a color palette. Some of the color shifts in the Esther flying sequence were animated by rotating the color palette while leaving the pixels on the screen unchanged. This allowed for faster transitions and smoother animations than would otherwise have been possible. Even the fastest personal computer at that time was too slow to redraw the full screen at the framerate required for animation to look smooth. Notice that where elements are animated, only a portion of the screen changes at any one time, such as with the church window sequence (2:07) and the spinning girl (3:40).

It was really fun to work with Scott Nybakken and John Greene who created effects that seemed to stretch beyond what was possible with the software and hardware.

by sarah at February 03, 2018 10:29 PM

Errata Security

Blame privacy activists for the Memo??

Former FBI agent Asha Rangappa @AshaRangappa_ has a smart post debunking the Nunes Memo, then takes it all back again with an op-ed in the NYTimes blaming us privacy activists. She presents an obviously false narrative that the FBI and FISA courts are above suspicion.

I know from first-hand experience that the FBI is corrupt. In 2007, they threatened me, trying to get me to cancel a talk that revealed security vulnerabilities in a large corporation's product. Such abuses occur because there is no transparency and oversight. FBI agents write down our conversation in their little notebooks instead of recording it, so that they can control the narrative of what happened, presenting their version of the conversation (leaving out the threats). In this day and age of recording devices, this is indefensible.

She writes "I know firsthand that it’s difficult to get a FISA warrant". Yes, the process was difficult for her, an underling, to get a FISA warrant. The process is different when a leader tries to do the same thing.

I know this first hand, having casually worked as an outsider with intelligence agencies. I saw two processes in place: one for the flunkies, and one for those above the system. The flunkies constantly complained about how much process was in place oppressing them, preventing them from getting their jobs done. The leaders understood the system and how to sidestep those processes.

That's not to say the Nunes Memo has merit, but it does point out that privacy advocates have a point in wanting more oversight and transparency in such surveillance of American citizens.

Blaming us privacy advocates isn't the way to go. It's not going to succeed in tarnishing us, but will push us more into Trump's camp, causing us to reiterate that we believe the FBI and FISA are corrupt.

by Robert Graham (noreply@blogger.com) at February 03, 2018 02:32 AM

February 01, 2018

Evaggelos Balaskas

containers containers containers

systemd

The latest systemd version now contains the systemd-importd daemon.

That means that we can use machinectl to import a tar or a raw image from the internet and use it with the systemd-nspawn command.

so here is an example

machinectl

from my archlinux box:

# cat /etc/arch-release

Arch Linux release

We can download the tarball of the centos7 docker image (from the CentOS sig-cloud-instance-images repository on GitHub):

# machinectl pull-tar --verify=no https://github.com/CentOS/sig-cloud-instance-images/raw/79db851f4016c283fb3d30f924031f5a866d51a1/docker/centos-7-docker.tar.xz

...
Created new local image 'centos-7-docker'.
Operation completed successfully.
Exiting.

we can verify that:

# ls -la /var/lib/machines/centos-7-docker

total 28
dr-xr-xr-x 1 root root   158 Jan  7 18:59 .
drwx------ 1 root root   488 Feb  1 21:17 ..
-rw-r--r-- 1 root root 11970 Jan  7 18:59 anaconda-post.log
lrwxrwxrwx 1 root root     7 Jan  7 18:58 bin -> usr/bin
drwxr-xr-x 1 root root     0 Jan  7 18:58 dev
drwxr-xr-x 1 root root  1940 Jan  7 18:59 etc
drwxr-xr-x 1 root root     0 Nov  5  2016 home
lrwxrwxrwx 1 root root     7 Jan  7 18:58 lib -> usr/lib
lrwxrwxrwx 1 root root     9 Jan  7 18:58 lib64 -> usr/lib64
drwxr-xr-x 1 root root     0 Nov  5  2016 media
drwxr-xr-x 1 root root     0 Nov  5  2016 mnt
drwxr-xr-x 1 root root     0 Nov  5  2016 opt
drwxr-xr-x 1 root root     0 Jan  7 18:58 proc
dr-xr-x--- 1 root root   120 Jan  7 18:59 root
drwxr-xr-x 1 root root   104 Jan  7 18:59 run
lrwxrwxrwx 1 root root     8 Jan  7 18:58 sbin -> usr/sbin
drwxr-xr-x 1 root root     0 Nov  5  2016 srv
drwxr-xr-x 1 root root     0 Jan  7 18:58 sys
drwxrwxrwt 1 root root   140 Jan  7 18:59 tmp
drwxr-xr-x 1 root root   106 Jan  7 18:58 usr
drwxr-xr-x 1 root root   160 Jan  7 18:58 var
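
machinectl itself can also confirm the import (shown here without output, since the exact columns and sizes depend on your systemd version and filesystem):

# machinectl list-images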

systemd-nspawn

Now we can test it:

[root@myhomepc ~]# systemd-nspawn --machine=centos-7-docker

Spawning container centos-7-docker on /var/lib/machines/centos-7-docker.
Press ^] three times within 1s to kill container.

[root@centos-7-docker ~]#
[root@centos-7-docker ~]#
[root@centos-7-docker ~]# cat /etc/redhat-release
CentOS Linux release 7.4.1708 (Core)
[root@centos-7-docker ~]#
[root@centos-7-docker ~]# exit
logout
Container centos-7-docker exited successfully.

and now returning to our system:

[root@myhomepc ~]#
[root@myhomepc ~]#
[root@myhomepc ~]# cat /etc/arch-release
Arch Linux release
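
And if we no longer need the image, machinectl can also clean it up (a small addition to the workflow above):

# machinectl remove centos-7-docker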

February 01, 2018 09:08 PM

Anton Chuvakin - Security Warrior

Monthly Blog Round-Up – January 2018

Here is my next monthly "Security Warrior" blog round-up of top 5 popular posts based on last month’s visitor data (excluding other monthly or annual round-ups):
  1. “New SIEM Whitepaper on Use Cases In-Depth OUT!” (dated 2010) presents a whitepaper on select SIEM use cases described in depth with rules and reports [using a now-defunct SIEM product]; also see this SIEM use case in depth and this for a more current list of popular SIEM use cases. Finally, see our 2016 research on developing security monitoring use cases here – and we just UPDATED IT FOR 2018.
  2. “Why No Open Source SIEM, EVER?” contains some of my SIEM thinking from 2009 (oh, wow, ancient history!). Is it relevant now? You be the judge. Succeeding with SIEM requires a lot of work, whether you paid for the software or not. BTW, this post has an amazing “staying power” that is hard to explain – I suspect it has to do with people wanting “free stuff” and googling for “open source SIEM”…
  3. Again, my classic PCI DSS Log Review series is extra popular! The series of 18 posts covers a comprehensive log review approach (OK for PCI DSS 3+ even though it predates it), useful for building log review processes and procedures, whether regulatory or not. It is also described in more detail in our Log Management book and mentioned in our PCI book – note that this series is even mentioned in some PCI Council materials.
  4. “Simple Log Review Checklist Released!” is often at the top of this list – this rapidly aging checklist is still a useful tool for many people. “On Free Log Management Tools” (also aged quite a bit by now) is a companion to the checklist (updated version).
  5. “Updated With Community Feedback SANS Top 7 Essential Log Reports DRAFT2” is about the top log reports project of 2008-2013; I think these are still very useful in response to “what reports will give me the best insight from my logs?”
In addition, I’d like to draw your attention to a few recent posts from my Gartner blog [which, BTW, now has more than 5X of the traffic of this blog]: 

A critical reference post:
Current research on testing security:
Current research on threat detection “starter kit”
Current research on SOAR:
Miscellaneous fun posts:

(see all my published Gartner research here)
Also see my past monthly and annual “Top Popular Blog Posts” – 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017.

Disclaimer: most content at SecurityWarrior blog was written before I joined Gartner on August 1, 2011 and is solely my personal view at the time of writing. For my current security blogging, go here.

Other posts in this endless series:

by Anton Chuvakin (anton@chuvakin.org) at February 01, 2018 09:05 PM

Ben's Practical Admin Blog

Impressions of Dell EMC OpenManage Enterprise

Dell EMC OpenManage Enterprise has now been available as a Tech Release for a couple of months, and I have recently had an opportunity to sit down and do some evaluation of the product at work.

The following thoughts and comments are made based on the version 1.0.0 (build 543) appliance.

[image: DellOMEnt02-blur]

 

OpenManage Enterprise (OMEnt) is described by Dell EMC as the next generation of their OpenManage Essentials (OMEss) platform. At face value it has some really good features going for it:

  • System is now deployed from an appliance template (OVF, VHD etc). No more having to customise a host build for the application, and no more licensing considerations.
  • The UI is now HTML5. I can’t begin to describe how happy I am to see the end of silverlight…
  • Information and UI simplification.

Installation of the appliance was painless, and I had my test install up and running in under 10 minutes, which again is quite a welcome thing. Previous OpenManage Essentials installs usually took much longer, between ensuring the prerequisite software was in place and mucking around with the interactive install.

Upon logging in, your eyes are in for a treat. The new HTML5 UI makes OMEss look positively archaic. Dell EMC have largely adopted the interface design seen in their OpenManage Mobile (OMM) application and the whole login dashboard is refreshingly uncluttered. The same goes for the whole UI really, and I think this is best shown in the device views.

Another excellent new feature is the overhaul of firmware management within OMEnt. It is now possible to configure multiple baselines for firmware based on a network share location or by using the Dell EMC online repository. This is fantastic news for larger organisations that like to establish firmware baselines and schedule their platform updates, rather than having to do the fancy repository juggling that OMEss forced on you.  The one issue I do see with this so far is that you cannot import firmware version information from an existing device to establish what the baseline should be.

Update: Version 2.4 of OMEss released Jan 25 supports multiple firmware baselines as well.

Many of the other features currently remain the same as OMEss, albeit in a much nicer UI, so I do not plan to go through those.

Email alerts have also had a facelift, now being full HTML rather than the plain text of OMEss. I have mixed feelings about this because, for email alerts at least, pretty formatting seems overkill and gets in the way of the information. Below is an example of one of the new look alerts.

[image: DellOMEnt-EmailAlert]

Being a version 1.0.0 Tech Release build, it definitely has a number of issues that will need resolving before the next planned release during Q2-Q3 2018. Some of the issues I have experienced include:

  • AD/LDAP integration doesn’t appear to be working
  • Logins cannot have a ‘.’ in the username
  • In order to import configuration templates from devices, SMB1 must be enabled in application settings. This one tripped me up and left me frustrated for a good day.
  • Navigation around application logs is not terribly intuitive. When the configuration template import failed (see above), there was no immediate way to find a log file to find out why it had failed. I eventually found it buried in the jobs section of the site.
  • It is not possible to archive alerts in a way that clears them as active in dashboards but keeps them associated with the device. This can be very useful for forensic troubleshooting of a server (e.g. oh look, it had memory issues 3 times in the last 12 months). Currently the only way to “clear” an active alert is to delete it from OMEss/OMEnt.

Being a Tech Release of the software also means that a reasonable amount of technical information is not available yet, or is vague. One such example was that the build white paper advised installing on “fast storage”. I am pretty sure “fast storage” could mean a number of different things to people – SSD, SAN, RAID array, high-RPM disk?

Luckily, there is an active Dell EMC Community Forum where you can ask questions and leave feedback. My experience there has been mostly a positive one.

OMEnt will not be replacing Essentials any time soon though, with a 2.4 release of OMEss on the 25th of January and further releases likely to be scheduled, as OMEnt still appears to lack some OMEss features such as integration with OpenManage Mobile – the documentation for the build I was using referenced it, but it did not appear in any of the application settings menus.

The appearance of OMEnt as an appliance also raises questions about the future for the suite of additional applications associated with the OpenManage suite including:

  • OpenManage Power Center
  • OpenManage License Manager
  • OpenManage Repository Manager

One of the “useful” things about the OMEss install on a host system was that you were able to install these additional tools on the same host. I am particularly hopeful that Dell EMC will look into integrating the features of these products into the appliance itself – for example, I can’t imagine it being a huge jump to store historical power and temperature data from the iDRAC to draw charts with, which would almost make Power Center redundant (some code would need to be written around power policy as well). License Manager was purely about exporting and storing licenses from management interfaces; that could be integrated into OMEnt as well.

Overall, the Tech Release of OMEnt shows significant improvements in the UI and the ability to manage servers, and you can see that there is great potential for this to be a much better management platform than its predecessor. If you only need basic monitoring and firmware management, then you may want to start your own evaluation of the Tech Release.

However, if you are an OMEss power user – making extensive use of Configuration Management for deployment along with firmware repository management, and using the secondary applications like OpenManage Mobile, Power Center and Repository Manager – I would say that you may wish to hold off on rolling out OMEnt until reviewing the next release. Right now it’s not quite there yet, but many things are on the roadmap.

 

by Ben at February 01, 2018 09:53 AM

Errata Security

Bitcoin: In Crypto We Trust

Tim Wu, who coined "net neutrality", has written an op-ed on the New York Times called "The Bitcoin Boom: In Code We Trust". He is wrong about "code".

The wrong "trust"

Wu builds a big manifesto about how real-world institutions can't be trusted. Certainly, this reflects the rhetoric from a vocal wing of Bitcoin fanatics, but it's not the Bitcoin manifesto.

Instead, the word "trust" in the Bitcoin paper is much narrower, referring to how online merchants can't trust credit-cards (for example). When I bought school supplies for my niece when she studied in Canada, the online site wouldn't accept my U.S. credit card. They didn't trust my credit card. However, they trusted my Bitcoin, so I used that payment method instead, and succeeded in the purchase.

Real-world currencies like dollars are tethered to the real-world, which means no single transaction can be trusted, because "they" (the credit-card company, the courts, etc.) may decide to reverse the transaction. The manifesto behind Bitcoin is that a transaction cannot be reversed -- and thus, can always be trusted.

Deliberately confusing the micro-trust in a transaction and macro-trust in banks and governments is a sort of bait-and-switch.

The wrong inspiration

Wu claims:
"It was, after all, a carnival of human errors and misfeasance that inspired the invention of Bitcoin in 2009, namely, the financial crisis."
Not true. Bitcoin did not appear fully formed out of the void, but was instead based upon a series of innovations that predate the financial crisis by a decade. Moreover, the financial crisis had little to do with "currency". The value of the dollar and other major currencies was essentially unscathed by the crisis. Certainly, enthusiasts looking backward like to cherry-pick the financial crisis as yet one more reason why the offline world sucks, but it had little to do with Bitcoin.

In crypto we trust

It's not in code that Bitcoin trusts, but in crypto. Satoshi makes that clear in one of his posts on the subject:
A generation ago, multi-user time-sharing computer systems had a similar problem. Before strong encryption, users had to rely on password protection to secure their files, placing trust in the system administrator to keep their information private. Privacy could always be overridden by the admin based on his judgment call weighing the principle of privacy against other concerns, or at the behest of his superiors. Then strong encryption became available to the masses, and trust was no longer required. Data could be secured in a way that was physically impossible for others to access, no matter for what reason, no matter how good the excuse, no matter what.
You don't possess Bitcoins. Instead, all the coins are on the public blockchain under your "address". What you possess is the secret, private key that matches the address. Transferring Bitcoin means using your private key to unlock your coins and transfer them to another. If you print out your private key on paper, and delete it from the computer, it can never be hacked.

Trust is in this crypto operation. Trust is in your private crypto key.

We don't trust the code

The manifesto "in code we trust" has been proven wrong again and again. We don't trust computer code (software) in the cryptocurrency world.

The most profound example is something known as the "DAO" on top of Ethereum, Bitcoin's major competitor. Ethereum allows "smart contracts" containing code. The quasi-religious manifesto of the DAO smart-contract is that the "code is the contract", that all the terms and conditions are specified within the smart-contract code, completely untethered from real-world terms-and-conditions.

Then a hacker found a bug in the DAO smart-contract and stole most of the money.

In principle, this is perfectly legal, because "the code is the contract", and the hacker just used the code. In practice, the system didn't live up to this. The Ethereum core developers, acting as central bankers, rewrote the Ethereum code to fix this one contract, returning the money back to its original owners. They did this because those core developers were themselves heavily invested in the DAO and got their money back.

Similar things happen with the original Bitcoin code. A disagreement has arisen about how to expand Bitcoin to handle more transactions. One group wants smaller and "off-chain" transactions. Another group wants a "large blocksize". This caused a "fork" in Bitcoin with two versions, "Bitcoin" and "Bitcoin Cash". The fork championed by the core developers (central bankers) is worth around $20,000 right now, while the other fork is worth around $2,000.

So it's still "in central bankers we trust", it's just that now these central bankers are mostly online instead of offline institutions. They have proven to be even more corrupt than real-world central bankers. It's certainly not the code that is trusted.

The bubble

Wu repeats the well-known reference to Amazon during the dot-com bubble. If you bought Amazon's stock for $107 right before the dot-com crash, it would still be one of the wisest investments you could've made. Amazon shares are now worth around $1,200 each.

The implication is that Bitcoin, too, may have such long term value. Even if you buy it today and it crashes tomorrow, it may still be worth ten-times its current value in another decade or two.

This is a poor analogy, for three reasons.

The first reason is that we knew the Internet had fundamentally transformed commerce. We knew there were going to be winners in the long run, it was just a matter of picking who would win (Amazon) and who would lose (Pets.com). We have yet to prove Bitcoin will be similarly transformative.

The second reason is that businesses are real, they generate real income. While the stock price may include some irrational exuberance, it's ultimately still based on the rational expectations of how much the business will earn. With Bitcoin, it's almost entirely irrational exuberance -- there are no long term returns.

The third flaw in the analogy is that there are an essentially infinite number of cryptocurrencies. We saw this today as Coinbase started trading Bitcoin Cash, a fork of Bitcoin. The two are nearly identical, so there's little reason one should be so much more valuable than the other. It's only a fickle fad that makes one more valuable than another, not business fundamentals. The successful future cryptocurrency is unlikely to exist today, but will be invented in the future.

The lesson of the dot-com bubble is not that Bitcoin will have long term value, but that cryptocurrency companies like Coinbase and BitPay will have long term value. Or, the lesson is that "old" companies like JPMorgan that are early adopters of the technology will grow faster than their competitors.

Conclusion

The point of Wu's paper is to distinguish trust in traditional real-world institutions and trust in computer software code. This is an inaccurate reading of the situation.

Bitcoin is not about replacing real-world institutions but about untethering online transactions.

The trust in Bitcoin is in crypto -- the power crypto gives individuals instead of third-parties.

The trust is not in the code. Bitcoin is a "cryptocurrency" not a "codecurrency".

by Robert Graham (noreply@blogger.com) at February 01, 2018 01:06 AM

January 31, 2018

Evaggelos Balaskas

Network-Bound Disk Encryption

Network-Bound Disk Encryption

I was reading the redhat release notes on 7.4 and came across: Chapter 15. Security

New packages: tang, clevis, jose, luksmeta

Network Bound Disk Encryption (NBDE) allows the user to encrypt root volumes of the hard drives on physical and virtual machines without requiring a password to be entered manually when systems are rebooted.

That means we can now have an encrypted (LUKS) volume that will be decrypted on reboot, without the need to type a passphrase!

Really, really useful on VPSes (and in cloud infrastructures in general).

Useful Links

CentOS 7.4 with Encrypted rootfs

(aka client machine)

Below is a test centos 7.4 virtual machine with an encrypted root filesystem:

/boot

[image: centos7bootfs.png]

/

[image: centos7luksrootfs.png]

Tang Server

(aka server machine)

Tang is a server for binding data to network presence. This is a different centos 7.4 virtual machine from the above.

Installation

Let’s install the server part:

# yum -y install tang

Start socket service:

# systemctl restart tangd.socket

Enable socket service:

# systemctl enable tangd.socket

TCP Port

Check that the tang server is listening:

# netstat -ntulp | egrep -i systemd

tcp6    0    0 :::80    :::*    LISTEN    1/systemd

Firewall

Don't forget the firewall:

Firewall Zones

# firewall-cmd --get-active-zones

public
  interfaces: eth0

Firewall Port

# firewall-cmd --zone=public --add-port=80/tcp --permanent

or

# firewall-cmd --add-port=80/tcp --permanent

success

Reload

# firewall-cmd --reload

success

We have finished with the server part!
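
As an optional sanity check (my own addition, using the same server address as in the client configuration below), the tang advertisement, a JSON Web Key Set, should be retrievable over plain HTTP:

# curl -s http://192.168.122.194/adv/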

Client Machine - Encrypted rootfs

Now it is time to configure the client machine, but first let's check the encrypted partition:

CryptTab

Every encrypted block device is configured in the crypttab file:

[root@centos7 ~]# cat /etc/crypttab

luks-3cc09d38-2f55-42b1-b0c7-b12f6c74200c UUID=3cc09d38-2f55-42b1-b0c7-b12f6c74200c none 

FsTab

and every filesystem that is statically mounted on boot is configured in fstab:

[root@centos7 ~]# cat /etc/fstab

UUID=c5ffbb05-d8e4-458c-9dc6-97723ccf43bc          /boot  xfs  defaults  0 0

/dev/mapper/luks-3cc09d38-2f55-42b1-b0c7-b12f6c74200c  /  xfs  defaults,x-systemd.device-timeout=0 0 0

Installation

Now let’s install the client (clevis) part that will talk with tang:

# yum -y install clevis clevis-luks clevis-dracut

Configuration

with a very simple command:

# clevis bind luks -d /dev/vda2 tang '{"url":"http://192.168.122.194"}'

The advertisement contains the following signing keys:

FYquzVHwdsGXByX_rRwm0VEmFRo

Do you wish to trust these keys? [ynYN] y

You are about to initialize a LUKS device for metadata storage.
Attempting to initialize it may result in data loss if data was
already written into the LUKS header gap in a different format.
A backup is advised before initialization is performed.

Do you wish to initialize /dev/vda2? [yn] y

Enter existing LUKS password:

we’ve just configured our encrypted volume against tang!

Luks MetaData

We can verify its LUKS metadata with:

[root@centos7 ~]# luksmeta show -d /dev/vda2

0   active empty
1   active cb6e8904-81ff-40da-a84a-07ab9ab5715e
2 inactive empty
3 inactive empty
4 inactive empty
5 inactive empty
6 inactive empty
7 inactive empty

dracut

We must not forget to regenerate the initramfs image, which on boot will try to talk to our tang server:

[root@centos7 ~]# dracut -f
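
Before rebooting, a quick check (my own addition) that the clevis module actually made it into the new initramfs; lsinitrd ships with dracut:

[root@centos7 ~]# lsinitrd | grep -i clevis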

Reboot

Now it’s time to reboot!

[image: centos7luksbooting.png]

A short message will appear on the screen but, within a few seconds, if it successfully exchanges messages with the tang server, our machine will decrypt the rootfs volume.

[image: centos7luksdf.png]

Tang messages

To finish this article, I will show you some tang messages via journalctl:

Initialization

The client fetching the signing key during setup:

Jan 31 22:43:09 centos7 systemd[1]: Started Tang Server (192.168.122.195:58088).
Jan 31 22:43:09 centos7 systemd[1]: Starting Tang Server (192.168.122.195:58088)...
Jan 31 22:43:09 centos7 tangd[1219]: 192.168.122.195 GET /adv/ => 200 (src/tangd.c:85)

reboot

The client trying to decrypt the encrypted volume on reboot:

Jan 31 22:46:21 centos7 systemd[1]: Started Tang Server (192.168.122.162:42370).
Jan 31 22:46:21 centos7 systemd[1]: Starting Tang Server (192.168.122.162:42370)...
Jan 31 22:46:22 centos7 tangd[1223]: 192.168.122.162 POST /rec/Shdayp69IdGNzEMnZkJasfGLIjQ => 200 (src/tangd.c:168)

Tag(s): NBDE, luks, centos7

January 31, 2018 11:25 PM

January 30, 2018

Errata Security

The problematic Wannacry North Korea attribution

Last month, the US government officially "attributed" the Wannacry ransomware worm to North Korea. This attribution has three flaws, which are a good lesson for attribution in general.

It was an accident

The most important fact about Wannacry is that it was an accident. We've had 30 years of experience with Internet worms teaching us that worms are always accidents. While launching worms may be intentional, their effects cannot be predicted. While they appear to have targets, like Slammer against South Korea, or Witty against the Pentagon, further analysis shows this was just a random effect that was impossible to predict ahead of time. Only in hindsight are these effects explainable.

We should hold those causing accidents accountable, too, but it's a different accountability. The U.S. has caused more civilian deaths in its War on Terror than the terrorists caused triggering that war. But we hold these to be morally different: the terrorists targeted the innocent, whereas the U.S. takes great pains to avoid civilian casualties. 

Since we are talking about blaming those responsible for accidents, we also must include the NSA in that mix. The NSA created, then allowed the release of, weaponized exploits. That's like accidentally dropping a load of unexploded bombs near a village. When those bombs are then used, those having lost the weapons are held guilty along with those using them. Yes, while we should blame the hacker who added ETERNAL BLUE to their ransomware, we should also blame the NSA for losing control of ETERNAL BLUE.


A country and its assets are different

Was it North Korea, or hackers affiliated with North Korea? These aren't the same.

It's hard for North Korea to have hackers of its own. It doesn't have citizens who grow up with computers to pick from. Moreover, an internal hacking corps would create tainted citizens exposed to dangerous outside ideas. Update: Some people have pointed out that Kim Il-sung University in the capital does have some contact with the outside world, with academics granted limited Internet access, so I guess some tainting is allowed. Still, what we know of North Korea's hacking efforts largely comes from hackers they employ outside North Korea. It was the Lazarus Group, outside North Korea, that did Wannacry.

Instead, North Korea develops external hacking "assets", supporting several external hacking groups in China, Japan, and South Korea. This is similar to how intelligence agencies develop human "assets" in foreign countries. While these assets do things for their handlers, they also have normal day jobs, and do many things that are wholly independent and even sometimes against their handler's interests.

For example, this Muckrock FOIA dump shows how "CIA assets" independently worked for Castro and assassinated a Panamanian president. That they also worked for the CIA does not make the CIA responsible for the Panamanian assassination.

That CIA/intelligence assets work this way is well-known and uncontroversial. The fact that countries use hacker assets like this is the controversial part. These hackers do act independently, yet we refuse to consider this when we want to "attribute" attacks.


Attribution is political

We have far better attribution for the nPetya attacks. It was less accidental (they clearly desired to disrupt Ukraine), and the hackers were much closer to the Russian government (Russian citizens). Yet, the Trump administration isn't fighting Russia, they are fighting North Korea, so they don't officially attribute nPetya to Russia, but do attribute Wannacry to North Korea.

Trump is in conflict with North Korea. He is looking for ways to escalate the conflict. Attributing Wannacry helps achieve his political objectives.

That it was blatantly political is demonstrated by the way it was released to the press. It wasn't released in the normal way, where the administration can stand behind it and get challenged on the particulars. Instead, it was pre-released through the usual channel of "anonymous government officials" to the NYTimes, and then backed up with an op-ed in the Wall Street Journal. The government leaks information like this when it's weak, not when it's strong.

The proper way is to release the evidence upon which the decision was made, so that the public can challenge it. Among the questions the public would ask is whether they believe it was North Korea's intention to cause precisely this effect, such as disabling the British NHS. Or, whether it was merely hackers "affiliated" with North Korea, or hackers carrying out North Korea's orders. We cannot challenge the government this way because the government intentionally holds itself above such accountability.


Conclusion

We believe hacking groups tied to North Korea are responsible for Wannacry. Yet, even if that's true, we still have three attribution problems. We still don't know if that was intentional, in pursuit of some political goal, or an accident. We still don't know if it was at the direction of North Korea, or whether their hacker assets acted independently. We still don't know if the government has answers to these questions, or whether it's exploiting this doubt to achieve political support for actions against North Korea.


by Robert Graham (noreply@blogger.com) at January 30, 2018 11:08 PM

syslog.me

syslog.me and 2017 in numbers

[image: syslog.me-stats-2017]

2017 has been a pretty good year for this blog.

The 10000 mark was passed for both the views (13790) and the visitors (11454); the previous records were established in 2015 for the views (10395) and in 2016 for the visitors (7520).

The top three visiting countries are the US (3251), Germany (1037) and France (763). My own country, Italy, didn’t make the top 10 with only 328 views.

The top three articles of the year were An init system in a Docker container with 3287 views, followed by Dates from UNIX timestamps in OpenOffice/LibreOffice (3123) and Exploring Docker overlay networks (published this year) with 1601 views.

2017 was also a year of change. In November 2016 I left Opera Software and joined Telenor Digital as Head of IT. I have more “managerial” tasks now, less time for “operations”, and the scale is definitely different from the one I was managing at Opera. That had an impact on the content I was able to post on this blog, both in terms of topics and amount. Whether the new course is better or worse, only time will tell.

Happy 2018!

by bronto at January 30, 2018 08:00 AM

January 29, 2018

Vincent Bernat

L3 routing to the hypervisor with BGP

On layer 2 networks, high availability can be achieved by:

Layer 2 networks need very little configuration but come with a major drawback in highly available scenarios: an incident is likely to bring the whole network down.2 Therefore, it is safer to limit the scope of a single layer 2 network by, for example, using one distinct network in each rack and connecting them together with layer 3 routing. Incidents are unlikely to impact a whole IP network.

In the illustration below, top of the rack switches provide a default gateway for hosts. To provide redundancy, they use an MC-LAG implementation. Layer 2 fault domains are scoped to a rack. Each IP subnet is bound to a specific rack and routing information is shared between top of the rack switches and core routers using a routing protocol like OSPF.

Legacy L2 design

There are two main issues with this design:

  1. The L2 domains are still large. A rack could host several dozen hypervisors and several thousand virtual guests. Therefore, a network incident will have a large impact.

  2. IP subnets are pinned to each rack. A virtual guest cannot move to another rack and unused IP addresses in a rack cannot be used in another one.

To solve both of these problems, it is possible to push L3 routing further south, turning each hypervisor into an L3 router. However, we need to ensure customer virtual guests are blind to this change: they should keep getting their configuration from DHCP (IP, subnet and gateway).

Hypervisor as a router

In a nutshell, for a guest with an IPv4 address:

  • the hosting hypervisor configures a /32 route with the virtual interface as next-hop, and
  • this route is distributed to other hypervisors (and routers) using BGP.

Our design also handles two routing domains: a public one (hosting virtual guests from multiple tenants with direct Internet access) and a private one (used by our own infrastructure, hypervisors included). Each hypervisor uses two routing tables for this purpose.

The following illustration shows the configuration of an hypervisor with 5 guests. No bridge is needed.

L3 routing inside an hypervisor

The complete configuration described below is also available on GitHub. In real life, a piece of software is needed to update the hypervisor configuration when an instance is added or removed. It would listen to notifications from your cloud orchestrator.

Calico is a project fulfilling the same objective (L3 routing to the hypervisor) with mostly the same ideas (except it heavily relies on Netfilter to ensure separation between administrative domains). It provides an agent (Felix) to serve as an interface with orchestrators like OpenStack or Kubernetes. Check it if you want a turnkey solution.

Routing setup

Using IP rules, each interface is “attached” to a routing table:

$ ip rule show
0:  from all lookup local
20: from all iif lo lookup main
21: from all iif lo lookup local-out
30: from all iif eth0.private lookup private
30: from all iif eth1.private lookup private
30: from all iif vnet8 lookup private
30: from all iif vnet9 lookup private
40: from all lookup public

The most important rules are the highlighted ones (priority 30 and 40): any traffic coming from a “private” interface uses the private table. Any remaining traffic uses the public table.

The two iif lo rules manage routing for packets originating from the hypervisor itself. The local-out table is a mix of the private and public tables. The hypervisor mostly needs the routes from the private table but also needs to contact local virtual guests (for example, to answer a ping request) using the public table. Both tables contain a default route (no chaining possible), so we build a third table, local-out, by copying all routes from the private table and directly connected routes from the public table.
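
A minimal sketch of how such a local-out table could be rebuilt, assuming the table names are declared in /etc/iproute2/rt_tables and that a configuration agent re-runs this whenever routes change (illustrative only, not the exact tooling used here):

# Rebuild local-out: all routes from "private", plus directly connected
# routes from "public". Word splitting of $route is intentional.
ip route flush table local-out
ip route show table private | while read -r route; do
    ip route add table local-out $route
done
ip route show table public scope link | while read -r route; do
    ip route add table local-out $route
done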

To avoid an accidental leak of traffic, public, private and local-out routing tables contain a default last-resort route with a large metric.3 On normal operations, these routes should be shadowed by a regular default route:

ip route add blackhole default metric 4294967294 table public
ip route add blackhole default metric 4294967294 table private
ip route add blackhole default metric 4294967294 table local-out

IPv6 is far simpler as we only have one routing domain. While we keep a public table, there is no need for a local-out table:

$ ip -6 rule show
0:  from all lookup local
20: from all lookup main
40: from all lookup public

As a last step, forwarding is enabled and the number of maximum routes for IPv6 is increased (default is only 4096):

sysctl -qw net.ipv4.conf.all.forwarding=1
sysctl -qw net.ipv6.conf.all.forwarding=1
sysctl -qw net.ipv6.route.max_size=4194304

Guest routes

The second step is to configure routes to each guest. For IPv6, we use the link-local address, derived from the remote MAC address, as next-hop:

ip -6 route add 2001:db8:cb00:7100:5254:33ff:fe00:f/128 \
    via fe80::5254:33ff:fe00:f dev vnet6 \
    table public

Assigning several IP addresses (or subnets) to each guest can be done by adding more routes:

ip -6 route add 2001:db8:cb00:7107::/64 \
    via fe80::5254:33ff:fe00:f dev vnet6 \
    table public

For IPv4, the route points directly to the guest interface: Linux will issue an ARP request before being able to forward the packet:4

ip route add 192.0.2.15/32 dev vnet6 \
  table public

Additional IP addresses and subnets can be configured the same way, but the guest would then have to answer ARP requests for each of them. To avoid this, additional subnets can be routed through the first IP address:5

ip route add 203.0.113.128/28 \
  via 192.0.2.15 dev vnet6 onlink \
  table public

BGP setup🔗

The third step is to share routes between hypervisors, through BGP. This part depends on how hypervisors are connected to each other.

Fabric design🔗

Several designs are possible to connect hypervisors. The most obvious one is to use a full L3 leaf-spine fabric:

Full L3 fabric design

Each hypervisor establishes an eBGP session with each of the leaf top-of-the-rack routers. These routers establish an iBGP session with their neighbor and an eBGP session with each spine router. This solution can be expensive because the spine routers need to handle all routes; with the current generation of switches and routers, the expected density quickly runs into the maximum number of routes they can handle.6 IP and BGP configuration can also be tedious unless some uncommon autoconfiguration mechanisms are used. On the other hand, leaf routers (and hypervisors) may optionally learn fewer routes, as they can push non-local traffic north.

Another potential design is to use an L2 fabric. This may sound surprising after bad-mouthing L2 networks for their unreliability, but we don’t need them to be highly available. They can provide a very scalable and cost-efficient design:7

L2 fabric design

Each hypervisor is connected to 4 distinct L2 networks, limiting the scope of a single failure to a quarter of the available bandwidth. In this design, only iBGP is used. To avoid a full-mesh topology between all hypervisors, route reflectors are used. Each hypervisor has an iBGP session with one or several route reflectors from each of the L2 networks. Route reflectors on the same L2 network share their routes using iBGP. Calico documents this design in more detail.

This is the solution described below. Public and private domains share the same infrastructure but use distinct VLANs.

Route reflectors🔗

Route reflectors are BGP-speaking boxes acting as a hub for all routes on a given L2 network but not routing any traffic. We need at least one of them on each L2 network. We can use more for redundancy.

Here is an example of configuration for JunOS:8

protocols {
    bgp {
        group public-v4 {
            family inet {
                unicast {
                    no-install; # ❶
                }
            }
            type internal;
            cluster 198.51.100.126; # ❷
            allow 198.51.100.0/25; # ❸
            neighbor 198.51.100.127;
        }
        group public-v6 {
            family inet6 {
                unicast {
                    no-install;
                }
            }
            type internal;
            cluster 198.51.100.126;
            allow 2001:db8:c633:6401::/64;
            neighbor 2001:db8:c633:6401::198.51.100.127;
        }
        ttl 255;
        bfd-liveness-detection { # ❹
            minimum-interval 100;
            multiplier 5;
        }
    }
}
routing-options {
    router-id 198.51.100.126;
    autonomous-system 65000;
}

This route reflector accepts and redistributes IPv4 and IPv6 routes. In ❶, we ensure received routes are not installed in FIB: route reflectors are not routers.

Each route reflector needs to be assigned a cluster identifier, which is used in loop detection. In our case, we use the IPv4 address for this purpose (in ❷). Using a different cluster identifier for each route reflector on the same network ensures that they share the routes they receive with each other, increasing resiliency.

Instead of explicitly declaring all hypervisors allowed to connect to this route reflector, a whole subnet is authorized in ❸.9 We also declare the second route reflector for the same network as a neighbor to ensure they connect to each other.

Another important aspect of this setup is how quickly it reacts to unavailable paths. With directly connected BGP sessions, a faulty link may be detected immediately and the associated BGP sessions brought down, but this is not always reliable. Moreover, in our case, BGP sessions are established across several switches: a link going down along the path may remain undetected until the hold timer expires. Therefore, in ❹, we enable BFD, a protocol to quickly detect faults in the path between two BGP peers (RFC 5880).

A last point to consider is whether you want to allow anycast on your network: if an IP address is advertised from more than one hypervisor, you may want to either:

  • send all flows to only one hypervisor, or
  • load-balance flows between hypervisors.

The second choice provides a scalable L3 load-balancer. With the above configuration, for each prefix, route reflectors choose one path and distribute it. Therefore, only one hypervisor will receive packets. To get load-balancing, you need to enable advertisement of multiple paths in BGP (RFC 7911):10

set protocols bgp group public-v4 family inet  unicast add-path send path-count 4
set protocols bgp group public-v6 family inet6 unicast add-path send path-count 4

Here is an excerpt of show route exhibiting “simple” routes as well as an anycast route:

> show route protocol bgp
inet.0: 6 destinations, 7 routes (7 active, 1 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

0.0.0.0/0        *[BGP/170] 00:09:01, localpref 100
                    AS path: I, validation-state: unverified
                  > to 198.51.100.1 via em1.90
192.0.2.15/32    *[BGP/170] 00:09:00, localpref 100
                    AS path: I, validation-state: unverified
                  > to 198.51.100.101 via em1.90
203.0.113.1/32   *[BGP/170] 00:09:00, localpref 100
                    AS path: I, validation-state: unverified
                  > to 198.51.100.101 via em1.90
203.0.113.6/32   *[BGP/170] 00:09:00, localpref 100
                    AS path: I, validation-state: unverified
                  > to 198.51.100.102 via em1.90
203.0.113.18/32  *[BGP/170] 00:09:00, localpref 100
                    AS path: I, validation-state: unverified
                  > to 198.51.100.103 via em1.90
203.0.113.10/32  *[BGP/170] 00:09:00, localpref 100
                    AS path: I, validation-state: unverified
                  > to 198.51.100.101 via em1.90
                  [BGP/170] 00:09:00, localpref 100
                    AS path: I, validation-state: unverified
                  > to 198.51.100.102 via em1.90

The complete configuration is available on GitHub. Configurations for GoBGP, BIRD or FRR (running on Cumulus Linux) are also available.11 The configuration for the private routing domain is similar. To avoid dedicated boxes, a solution is to run the route reflectors on some of the top-of-the-rack switches.

Hypervisor configuration🔗

So, let’s tackle the last step: the hypervisor configuration. We use BIRD (1.6.x) as a BGP daemon. It maintains three internal routing tables (public, private and local-out). We use a template with the common properties to connect to a route reflector:

template bgp rr_client {
  local as 65000;   # Local ASN matches route reflector ASN
  import all;       # Accept all received routes
  export all;       # Send all routes to remote peer
  next hop self;    # Modify next-hop with the IP used for this BGP session
  bfd yes;          # Enable BFD
  direct;           # Not a multi-hop BGP session
  ttl security yes; # GTSM is enabled
  add paths rx;     # Enable ADD-PATH reception (for anycast)

  # Low timers to establish sessions faster
  connect delay time 1;
  connect retry time 5;
  error wait time 1,5;
  error forget time 10;
}

table public;
protocol bgp RR1_public from rr_client {
  neighbor 198.51.100.126 as 65000;
  table public;
}
# […]

With the above configuration, all routes in BIRD’s public table are sent to the route reflector 198.51.100.126. Any route from the route reflector is accepted. We also need to connect BIRD’s public table to the kernel’s one:12

protocol kernel kernel_public {
  persist;
  scan time 10;
  import filter {
    # Take any route from kernel,
    # except our last-resort default route
    if krt_metric < 4294967294 then accept;
    reject;
  };
  export all;      # Put all routes into kernel
  learn;           # Learn routes not added by BIRD
  merge paths yes; # Build ECMP routes if possible
  table public;    # BIRD's table name
  kernel table 90; # Kernel table number
}

We also need to enable BFD on all interfaces:

protocol bfd {
  interface "*" {
    interval 100ms;
    multiplier 5;
  };
}

To avoid losing BFD packets when the conntrack table is full, it is safer to disable connection tracking for these datagrams:

ip46tables -t raw -A PREROUTING -p udp --dport 3784 \
  -m addrtype --dst-type LOCAL -j CT --notrack
ip46tables -t raw -A OUTPUT -p udp --dport 3784 \
  -m addrtype --src-type LOCAL -j CT --notrack

A few bits are not covered here.

Once the BGP sessions have been established, we can query the kernel for the installed routes:

$ ip route show table public proto bird
default
        nexthop via 198.51.100.1 dev eth0.public weight 1
        nexthop via 198.51.100.254 dev eth1.public weight 1
203.0.113.6
        nexthop via 198.51.100.102 dev eth0.public weight 1
        nexthop via 198.51.100.202 dev eth1.public weight 1
203.0.113.18
        nexthop via 198.51.100.103 dev eth0.public weight 1
        nexthop via 198.51.100.203 dev eth1.public weight 1
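
The BGP and BFD sessions themselves can be inspected from BIRD's command-line client, birdc (birdc6 for the IPv6 daemon with BIRD 1.6). A few example queries, reusing the protocol and table names from the snippets above; output is omitted and will obviously differ from one setup to another:

$ birdc show protocols
$ birdc show route table public count
$ birdc show bfd sessions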

Performance🔗

You may be worried about how much memory Linux uses when handling many routes. Well, don’t be:

  • 128 MiB can fit 1 million IPv4 routes, and
  • 512 MiB can fit 1 million IPv6 routes.

BIRD uses about the same amount of memory for its own needs. As for lookup times, performance is also excellent with IPv4 and still quite good with IPv6:

  • 30 ns per lookup with 1 million IPv4 routes, and
  • 1.25 µs per lookup with 1 million IPv6 routes.

Therefore, the impact of letting Linux handle many routes is very low. For more details, see “IPv4 route lookup on Linux” and “IPv6 route lookup on Linux.”

Reverse path filtering🔗

To avoid spoofing, reverse path filtering is enabled on virtual guest interfaces: Linux verifies that the source address is legitimate by checking that the answer would use the incoming interface as the outgoing interface. This effectively prevents any spoofing from guests.

For IPv4, reverse path filtering can be enabled either through a per-interface sysctl14 or through the rpfilter match of Netfilter. For IPv6, only the second method is available.

# For IPv6, use NetFilter
ip6tables -t raw -N RPFILTER
ip6tables -t raw -A RPFILTER -m rpfilter -j RETURN
ip6tables -t raw -A RPFILTER -m rpfilter --accept-local \
  -m addrtype --dst-type MULTICAST -j DROP
ip6tables -t raw -A RPFILTER -m limit --limit 5/s --limit-burst 5 \
  -j LOG --log-prefix "NF: rpfilter: " --log-level warning
ip6tables -t raw -A RPFILTER -j DROP
ip6tables -t raw -A PREROUTING -i vnet+ -j RPFILTER

# For IPv4, use sysctls
sysctl -qw net.ipv4.conf.all.rp_filter=0
for iface in /sys/class/net/vnet*; do
    sysctl -qw net.ipv4.conf.${iface##*/}.rp_filter=1
done

There is no need to prevent L2 spoofing as there is no gain for the attacker.

Keeping guests in the dark🔗

An important aspect of the solution is to ensure guests believe they are attached to a classic L2 network (with an IP in a subnet).

The first step is to provide them with a working default gateway. On the hypervisor, this can be done by assigning the default gateway IP directly to the guest interface:

ip addr add 203.0.113.254/32 dev vnet5 scope link

Our main goal is to ensure Linux answers ARP requests for the gateway IP. Configuring a /32 is enough for this; we do not want to configure a larger subnet because, by default, Linux would install a route for the whole subnet through this interface, which would be incorrect.15

For IPv6, this is not needed as we rely on link-local addresses instead.

A guest may also try to speak with other guests on the same subnet. The hypervisor will answer ARP requests on their behalf. Once it starts receiving IP traffic, it will route it to the appropriate interface. This can be done by enabling ARP proxying on the guest interface:

sysctl -qw net.ipv4.conf.vnet5.proxy_arp=1
sysctl -qw net.ipv4.neigh.vnet5.proxy_delay=0

For IPv6, Linux NDP proxying is far less convenient. Instead, ndppd can handle this task. For each interface, we use the following configuration snippet:

proxy vnet5 {
  rule 2001:db8:cb00:7100::/64 {
    static
  }
}

For DHCP, some daemons may have difficulty handling this odd configuration (with the /32 IP address on the interface), but Dnsmasq copes with it (a hedged Dnsmasq sketch follows the radvd example below). For IPv6, assuming the assigned IP address is an EUI-64 one, radvd works with the following configuration on each interface:

interface vnet5 {
  AdvSendAdvert on;
  prefix 2001:db8:cb00:7100::/64 {
    AdvOnLink on;
    AdvAutonomous on;
    AdvRouterAddr on;
  };
};
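
As for DHCP, here is what a Dnsmasq snippet for such a guest could look like. This is only a hedged sketch: the MAC address and the guest address are made up, while the gateway address and the interface match the earlier examples; the exact options depend on how addresses are assigned in your environment:

# Hypothetical Dnsmasq configuration for a guest behind vnet5
interface=vnet5
dhcp-range=203.0.113.0,static,255.255.255.0
# Pin the (made-up) guest MAC address to its IP address
dhcp-host=52:54:00:aa:bb:cc,203.0.113.20
# Advertise the gateway configured on the hypervisor side
dhcp-option=option:router,203.0.113.254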

Conclusion and future work🔗

This setup should work with BIRD 1.6.3 and a Linux 3.15+ kernel. Compared to legacy L2 networks, it brings flexibility and resiliency while keeping guests unaware of the change. By handing routing over to Linux, this design is also cheap, as existing equipment can be reused. Day-to-day operation of such a solution is simple enough once the basic concepts are understood: IP rules, routing tables and BGP sessions.

There are several potential improvements:

using VRF
Starting with Linux 4.3, L3 VRF domains make it possible to bind interfaces to routing tables. We could have three VRFs: public, private and local-out (see the sketch after this list). This would improve performance by removing most IP rules (although until Linux 4.8, performance is crippled because offloading features are not enabled, see commit 7889681f4a6c). For more information, have a look at the kernel documentation.
full L3 routing
The BGP setup can be enhanced and simplified by using an L3 fabric and using some autoconfiguration features. Cumulus published a nice book, “BGP in the datacenter,” on this topic. However, this requires all BGP speakers to support these features. On the hypervisors, this would mean using FRR while the various network equipments would need to run Cumulus Linux.
BGP resiliency with BGP LLGR
Using short BFD timers makes our network react quickly to any disruption by invalidating faulty paths without relying on link status. However, under load or congestion, BFD packets may be lost, making the whole hypervisor unreachable until the BGP sessions can be brought up again. Some BGP implementations support Long-Lived BGP Graceful Restart, an extension allowing stale routes to be retained with a lower priority (see draft-uttaro-idr-bgp-persistence-03). This would be an ideal solution to our problem: stale routes would be used only as a last resort, after all links have failed. Currently, no open source implementation supports this draft.
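
As a rough illustration of the VRF idea above, the three domains could be declared like this on a recent kernel. This is a minimal sketch only: the VRF names are invented, table 90 is the public table used earlier while 91 and 92 are assumptions, and every uplink and guest interface still has to be enslaved to the right VRF:

ip link add vrf-public type vrf table 90
ip link add vrf-private type vrf table 91
ip link add vrf-out type vrf table 92
ip link set vrf-public up
ip link set vrf-private up
ip link set vrf-out up
# Enslaving interfaces to a VRF replaces most of the IP rules
ip link set eth0.public master vrf-public
ip link set vnet6 master vrf-public
ip link set eth0.private master vrf-private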

  1. MC-LAG has been standardized in IEEE 802.1AX-2014. However, most vendors are likely to stick with their implementations. With MC-LAG, control planes remain independent. ↩︎

  2. An incident can stem from an operator mistake, but also from software bugs, which are more likely to happen in complex implementations during infrequent operations, like a software upgrade. ↩︎

  3. These routes should not be distributed over BGP. Hypervisors should receive a default route with a lower metric from edge routers. ↩︎

  4. A static ARP entry can also be added with the remote MAC address. ↩︎

  6. For example, a Juniper QFX5100 supports about 200k IPv4 routes (about $10,000, with Broadcom Trident II chipset). On the other hand, an Arista 7208SR supports 1.2M IPv4 routes (about $20,000, with Broadcom Jericho chipset), through the use of an external TCAM. A Juniper MX240 would support more than 2M IPv4 routes (about $30,000 for an empty chassis with two routing engines, with Juniper Trio chipset) with a lower density. ↩︎

  7. From a scalability point of view, with switches able to handle 32k MAC addresses, the fabric can host more than 8,000 hypervisors (more than 5 million virtual guests). Therefore, cost-effective switches can be used as both leaves and spines. Each hypervisor has to handle all routes, an easy task for Linux. ↩︎

  8. Use of routing instances would enable hosting several route reflectors on the same box. This is not used in this example but should be considered to reduce costs. ↩︎

  9. This prevents the use of any authentication mechanism: BGP usually relies on TCP MD5 signatures (RFC 2385) to authenticate BGP sessions. On most OSes, this requires knowing the allowed peers. To tighten security a bit in the absence of authentication, we use the Generalized TTL Security Mechanism (RFC 5082). For JunOS, the configuration presented here (with ttl 255) is incomplete: a firewall filter is also needed. ↩︎

  10. Unfortunately, for no good reason, JunOS doesn’t support the BGP add-path extension in a routing instance. Such a configuration is possible with Cumulus Linux. ↩︎

  11. Only BIRD comes with BFD support out of the box, but it does not support implicit peers. FRR needs Cumulus’ PTMD. If you don’t care about BFD, GoBGP is really nice as a route reflector. ↩︎

  12. Kernel tables are numbered; ip can use names declared in /etc/iproute2/rt_tables. ↩︎

  13. It should be noted that ECMP with IPv6 only works from BIRD 1.6.1. Moreover, when using Linux 4.11 or more recent, you need to apply commit 98bb80a243b5. ↩︎

  14. For a given interface, Linux uses the maximum value between the sysctl for all and the one for the interface. ↩︎

  15. It is possible to prevent Linux from installing a connected route by using the noprefixroute flag. However, this flag is only available since Linux 4.4 for IPv4. Only use this flag if your DHCP server is giving you a hard time, as it may trigger other issues (related to the promotion of secondary addresses). ↩︎

by Vincent Bernat at January 29, 2018 08:25 PM

January 28, 2018

syslog.me

Promise-based team leadership

Can Promise Theory help you shape a better, effective leadership style?

Promise-based leadership will be the topic of the talk I will hold at two conferences. The first one is the Config Management Camp in Gent, Belgium, and it’s pretty close: February 5th. The second conference is the glorious Incontro DevOps Italia 2018, the Italian DevOps Meeting in Bologna, on March 9th.

When I joined Telenor Digital as the Head of IT I had to find an unconventional leadership style, as circumstances didn’t allow for a traditional one based on the “line-of-command” approach. After so many years spent using CFEngine, it was quite natural for me to use Promise Theory to model my new “reality” and understand how I could exploit exactly those peculiarities that made the traditional leadership approach pointless.

Promise-based leadership has clear limits in applicability. It requires the right attitude in leaders and the right culture in the company. Where the right leaders and the right culture are present, I am confident that it provides significant advantages compared to the conventional approach based on top-down imposition.

I have been doing promise-based leadership for a bit more than one year now, regardless of people being direct reports or simply colleagues at any level of the hierarchy. My talk is a report of the experience so far. I don’t have definitive answers yet and there are several unanswered questions. I will be a bit tight with my talk schedule and I won’t be able to take many questions, but I hope to have several interesting conversations “on the side” of the conference events 🙂

One fun fact for closing: when I submitted to Config Management Camp I wasn’t really confident that my talk would be accepted because, I thought, the topic was kind-of “tangent” to the conference’s, so I didn’t even plan to attend. Later on the keynotes were announced, and one of them will be held by Mark Burgess, the inventor of Promise Theory. A few weeks more and I was informed that my talk was accepted and they actually liked it. So I will be talking of promise-based leadership at the same conference, in the same track and in the same room as the inventor of Promise Theory himself. Guess how hard I am working to put together a decent talk in time… 😀

by bronto at January 28, 2018 11:38 AM

January 26, 2018

Simon Lyall

Linux.conf.au 2018 – Day 5 – Light Talks and Close

Lightning Talk

  • Usability Fails
  • Etching
  • Diverse Events
  • Kids Space – fairly unstructured and self organising
  • Opening up LandSat imagery – NBAR-T available on NCI
  • Project Nacho – HTML -> VPN/RDP gateway. Apache Guacamole
  • Vocaloids
  • Blockchain
  • Using j2 to create C++ code
  • Memory model code update
  • CLIs are user interface too
  • Complicated git things
  • Mollygive – matching donations
  • Abusing Docker

Closing

  • LCA 2019 will be in Christchurch, New Zealand – http://lca2019.linux.org.au
  • 700 Attendees at 2018
  • 400 talk and 36 Miniconf submissions

 

 


by simon at January 26, 2018 06:17 AM

Linux.conf.au 2018 – Day 5 – Session 2

QUIC: Replacing TCP for the Web Jana Iyengar

  • History
    • Protocol for http transport
    • Deployed Inside Google 2014 and Chrome / mobile apps
    • Improved performance: YouTube rebuffering down 15-18%, Google search latency down 3.6-8%
    • 35% of Google’s egress traffic (7% of the Internet)
    • Working group started in 2016 to standardize QUIC
    • Turned off at the start of 2016 due to a security problem
    • Traffic doubled in Sept 2016 after being turned on for the YouTube app
  • Technology
    • Previously – IP -> TCP -> TLS -> HTTP/2
    • With QUIC – IP -> UDP -> QUIC -> HTTP over QUIC
    • Includes crypto and tcp handshake
    • congestion control
    • loss recovery
    • TLS 1.3 has some of the same features that QUIC pioneered; QUIC is being updated to take it into account
  • HTTP/1
    • 1 trip for TCP
    • 2 trips for TLS
    • Single connection – Head Of Line blocking
    • Multiple TCP connections workaround.
  • HTTP/2
    • Streams within a single transport connection
    • Packet loss will stall the TCP layer
    • Unresolved problems
      • Connection setup latency
      • Middlebox interference with TCP – makes it hard to change TCP
      • Head of line blocking within TCP
  • QUIC
    • Connection setup
      • 0 round trips, handshake packet followed directly by data packet
      • 1 round trip if crypto keys are not new
      • 2 round trips if QUIC version needs renegotiation
    • Streams
      • http/2 streams are sent as quic streams
  • Aspirations of protocol
    • Deployable and evolvable
    • Low latency connection establishment
    • Stream multiplexing
    • Better loss recovery and flexible congestion control
      • richer signalling (unique packet number)
      • better RTT estimates
    • Resilience to NAT rebinding (UDP NAT mappings change often, maybe every few seconds)
  • UDP is not a transport, you put something in top of UDP to build a transport
  • Why not a new protocol instead of UDP? Almost impossible to get a new protocol in middle boxes around the Internet.
  • Metrics
    • Search Latency (see paper for other metrics)
    • Enter search term > entire page is loaded
    • Mean: desktop improved 8%, mobile 3.6%
    • Low latency: desktop 1%, mobile none
    • Highest Latency 90-99% of users: Desktop & mobile 15-16%
    • Video similar
    • Big gain is from 0 RTT handshake
  • QUIC – Search Latency Improvements by Country
    • South Korea – 38ms RTT – 1% improvement
    • USA – 50ms – 2 – 3.5 %
    • India – 188ms – 5 – 13%
  • Middlebox ossification
    • Vendor ossified first byte of QUIC packet – flags byte
    • since it seemed to be the same on all QUIC packets
    • broke QUIC deployment when a flag was fixed
    • Encryption is the only way to protect against network ossification
    • “Greasing” by randomly changing options is also an option.
  • Other Protocols over QUIC?
    • Concentrating on http/2
    • Looking at Web RPC

Remote Work: My first decade working from the far end of the earth John Dalton

  • “Remote work has given me a fulfilling technical career while still being able to raise my family in Tasmania”
  • First son born in 2015; wanted to stay in Tasmania with family to raise them, rather than moving to a tech hub.
  • 2017: working with High Performance Computing at the University of Tasmania
  • If everything is going to be outsourced, I want to be the one they outsourced to.
  • Wanted to do big web stuff, nobody in Tasmania doing that.
  • Was a user at LibraryThing
    • They were searching for Sysadmin/DBA in Portland, Maine
    • Knew he could do the job even though he was on the other side of the world
    • Negotiated into it over a couple of months
    • Knew could do the work, but not sure how the position would work out

Challenges

  • Discipline
    • Feels he is not organised. Doesn’t keep planner up to date or todo lists, etc.
    • “You can spend a lot of time reading about time management without actually doing it”
    • Do you need to have the minimum level
  • Isolation
    • Lives 20 minutes out of Hobart
    • In a semi-rural area for days at a time; doesn’t leave the house all week except to ferry kids on weekends.
    • “Never considered myself an extrovert, but I do enjoy talking to people at least weekly”
    • Needs to work at hooking in with the Hobart tech community; goes to meetups. Plays D&D with friends.
    • Considering going to a coworking space; sometimes goes to cafes, etc.
  • Setting Boundaries
    • Hard to leave work.
    • Have a dedicated work space.
  • Internet Access
    • Prioritise Coverage over cost these days for mobile.
    • Sometimes the fixed provider goes down; need to have a backup
  • Communication
    • Less random communication with other employees
    • Cannot assume any particular knowledge when talking with other people
    • Aware of particular cultural differences
    • Multiple chances of miscommunication

Opportunities

  • Access to companies and jobs and technologies that you couldn’t get locally
  • Access to people with a wider range of experiences and backgrounds

Finding remote work

  • Talk your way into it
  • Networking
  • Jobs BoF
  • stackoverflow.com/jobs can filter
  • weworkremotely.com

Making it work

  • Be visible
  • Go home at the end of the day
  • Remember real people are at the end of the email

 


by simon at January 26, 2018 04:23 AM

Linux.conf.au 2018 – Day 5 – Session 1

Self-Documenting Coders: Writing Workshop for Devs Heidi Waterhouse

History of Technical documentation

  • Linear Writing
    • On Paper, usually books
    • Emphasis on understanding and doing
  • Task-based writing
    • Early 90s
    • DITA
    • Concept, Procedure, Reference
  • Object-orientated writing
    • High art form for tech writers
    • Content as code
    • Only works when compiled
    • Favoured by tech writers, translated. Up to $2000 per seat
  • Guerilla Writing
    • Stack Overflow
    • Wikis
    • YouTube
    • frustrated non-writers trying to help peers
  • Search-first writing
    • Every page is page one
    • Search-index driven

Writing Words

  • 5 W’s of journalism.
  • Documentation needs to be tested
  • Audiences
    • eg Users, future-self, Sysadmins, experts, End users, installers
  • Writing Basics
    • Sentences short
    • Graphics for concepts
    • Avoid screencaps (too easily outdated)
    • Use style guides and linters
    • Accessibility is a real thing
  • Words with pictures
    • Never include settings only in an image ( “set your screen to look like this” is bad)
    • Use images for concepts not instructions
  • Not all your users are readers
    • Can’t see well
    • Can’t parse easily
    • Some have terrible equipment
    • Some of the “some people” is us
    • Accessibility is not a checklist, although that helps, it is us
  • Using templates to write
    • Organising your thoughts and avoid forgetting parts
    • Add a standard look at low mental cost
  • Search-first writing – page one
    • If you didn’t answer the question or point to the answer you failed
    • answer “How do I?”
  • Indexing and search
    • All the words present are indexed
    • No false pointers
    • Use words people use and search for, Don’t use just your internal names for things
  • Semantic tagging and reuse
    • Semantic text splits form and content
    • Semantic tagging allows reuse
    • Reuse saves duplication
    • Reuse requires compiling
  • Sorting topics into buckets
    • Even with search you need some organisation
    • Group items by how they get used, not by how they get programmed
    • Grouping similar items allows serendipity
  • Links, menus and flow
    • give people a next step
    • Provide related info on same page
    • show location
    • offer a chance to see the document structure

Distributing Words

  • Static Sites
  • Hosted Sites
  • Baked into the product
    • Only available to customers
    • only updates with the product
    • Hard to encourage average user to input
  • Knowledge based / CMS
    • Useful to community that known what it wants
    • Prone to aging and rot
    • Sometimes diverges from published docs or company message
  • Professional Writing Tools
    • Shiny and powerful
    • Learning Cliff
    • IDE
    • Super features
    • Not going to happen again
  • Paper-ish things
    • Essential for some topics
    • Reassuring to many people
    • touch is a sense we can bond with
    • Need to understand if people using docs will be online or offline when they want them.
  • Using templates to publish
    • Unified look and feel
    • Consistency and not missing things
    • Built-in checklist

Collaborating on Words

  • One weird trick, write it up as your best guess and let them correct it
  • Have a hack day
    • Set a goal of things to delete
    • Set a goal of things to fix
    • Keep track of debt you can’t handle today
    • team-building doesn’t have to be about activities

Deleting Words

  • What needs to go
    • Old stuff that is wrong and terrible
    • Wrong stuff that hides right stuff
  • What to delete
    • Anything wrong
    • Anything dangerous
    • Anything not used or updated in a year
  • How
    • Delete temporarily (put aside for a while)
    • Based on analytics
    • Ruthlessly
    • Delete or update

Documentation Must be

  • True
  • Timely
  • Testable
  • Tuned

Documentation Components

  • Who is reading and why
    • Assuming no one likes reading docs
    • What is driving them to be here
  • Pre Requisites
    • What does a user need to succeed
    • Can I change the product to reduce documentation
    • Is there any hazard in this process
  • How do I do this task
    • Steps
    • Results
    • Next steps
  • Test – How do I know that it worked
    • If you can’t test it, it is not a procedure
    • What will the system do, how does the state change
  • Reference
    • What other stuff that affects this
    • What are the optional settings
    • What are the related things
  • Code and code samples
    • Best: code you can modify and run in the docs
    • 2nd Best: Code you can copy easily
    • Worst: retyping code
  • Option
    • Why did we build it this way
    • What else might you want to know
    • Have other people done this
    • Lifecycle

Documentation Types

  • Instructions
  • Ideas (arch, problem space, discarded options, process)
  • Action required (release notes, updates, deprecation)
  • Historical (roads maps, projects plans, retrospective documents)
  • Invisible docs (user experience, microinteractions, error messages)
    • Error messages – unique ID, what caused it, what mitigation, optional: link to report

 


by simon at January 26, 2018 01:11 AM

Raymii.org

Dell PowerEdge firmware upgrades via iDrac

The recent Spectre and Meltdown vulnerabilities require BIOS and firmware updates. Dell provides binaries for Windows and Linux, but the Linux ones are just for Red Hat and SUSE. Some firmware updates can be run on Ubuntu or Debian, but others fail with the error that RPM could not be found, which is correct, since it's not Red Hat. In this small article I'll show you how to upgrade the firmware via the iDrac, a method I recently discovered.

January 26, 2018 12:00 AM

January 25, 2018

Sean's IT Blog

VDI in the Time of Frequent Windows 10 Upgrades

The longevity of Windows 7, and Windows XP before that, has spoiled many customers and enterprises.  It provided IT organizations with a stable base to build their end-user computing infrastructures and applications on, and users were provided with a consistent experience.  The update model was fairly well known – a major service pack with all updates and feature enhancements would come out after about one year.

Whether this stability was good for organizations is debatable.  It certainly came with trade-offs, security of the endpoint being the primary one.

The introduction of Windows 10 has changed that model, and Microsoft continues to refine it.  Microsoft now releases two major “feature updates” for Windows 10 each year, and these updates are only supported for about 18 months each.  Microsoft calls this the “Windows as a Service” model, and it consists of two production-ready semi-annual release channels – a targeted deployment used by pilot users to test applications, and a broad deployment that replaces the “Current Branch for Business” option for enterprises.

Gone are the days when the end user’s desktop would have the same operating system for its entire life cycle.

(Note: While there is still a long-term servicing branch, Microsoft has repeatedly stated that this branch is suited for appliances and “machinery” that should not receive frequent feature updates such as ATMs and medical equipment.)

In order to facilitate this new delivery model, Microsoft has refined their in-place operating system upgrade technology.  While it has been possible to do this for years with previous versions of Windows, it was often flaky.  Settings wouldn’t port over properly, applications would refuse to run, and other weird errors would crop up.  That’s mostly a thing of the past when working with physical Windows 10 endpoints.

Virtual desktops, however, don’t seem to handle in-place upgrades well.  Virtual desktops often utilize various additional agents to deliver desktops remotely to users, and the in-place upgrade process can break these agents or cause otherwise unexpected behavior.  They also have a tendency to reinstall Windows Modern Applications that have been removed, or to reset settings (although Microsoft is supposed to be working on those items).

If Windows 10 feature release upgrades can break, or at least require significant rework of, existing VDI images, what is the best method for handling them in a VDI environment?

I see two main options.  The first is to manually uninstall the VDI agents from the parent VMs, take a snapshot, and then do an in-place upgrade.  After the upgrade is complete, the VDI agents would need to be reinstalled on the machine.  In my opinion, this option has a couple of drawbacks.

First, it requires a significant amount of time.  While there are a number of steps that could be automated, validating the image after the upgrade would still require an administrator.  Someone would have to log in to validate that all settings were carried over properly and that Modern Applications were not reinstalled.  This may become a significant time sink if I have multiple parent desktop images.

Second, this process wouldn’t scale well.  If I have a large number of parent images, or a large estate of persistent desktops, I have to build a workflow to remove agents, upgrade Windows, and reinstall agents after the upgrade.  Not only do I have to test this workflow significantly, but I still have to test my desktops to ensure that the upgrade didn’t break any applications.

The second option, in my view, is to rebuild the desktop image when each new version of Windows 10 is released.  This ensures that you have a clean OS and application installation with every new release, and it would require less testing to validate because I don’t have to check to see what broke during the upgrade process.

One of the main drawbacks to this approach is that image building is a time-consuming process.  This is where automated deployments can be helpful.  Tools like Microsoft Deployment Toolkit can help administrators build their parent images, including any agents and required applications, automatically as part of a task sequence.  With this type of toolkit, an administrator can automate their build process so that when a new version of Windows 10 is released, or a core desktop component like the Horizon or XenDesktop agent is updated, the image will have the latest software the next time a new build is started.

(Note: MDT is not the only tool in this category.  It is, however, the one I’m most familiar with.  It’s also the tool that Trond Haavarstein, @XenAppBlog, used for his Automation Framework Tool.)

Let’s take this one step further.  As an administrator, I would be doing a new Windows 10 build every 6 months to a year to ensure that my virtual desktop images remain on a supported version of Windows.  At some point, I’ll want to do more than just automate the Windows installation so that my end result, a fully configured virtual desktop that is deployment ready, is available at the push of a button.  This can include things like bringing it into Citrix Provisioning Services or shutting it down and taking a snapshot for VMware Horizon.

Virtualization has allowed for significant automation in the data center.  Tools like VMware PowerCLI and the Nutanix REST API make it easy for administrators to deploy and manage virtual machines using a few lines of PowerShell.   Using these same tools, I can also take details from this virtual machine shell, such as the name and MAC address, and inject them into my MDT database along with a Task Sequence and role.  When I power the VM on, it will automatically boot to MDT and start the task sequence that has been defined.

This is bringing “Infrastructure as Code” concepts to end-user computing, and the results should make it easier for administrators to test and deploy the latest versions of Windows 10 while reducing their management overhead.

I’m in the process of working through the last bits to automate the VM creation and integration with MDT, and I hope to have something to show in the next couple of weeks.

 

by seanpmassey at January 25, 2018 04:47 PM

January 24, 2018

Evaggelos Balaskas

Ready Player One by Ernest Cline

Ready Player One by Ernest Cline

I’ve listened to the audiobook, Narrated by Wil Wheaton.

 

The book is AMAZING! It takes a trip down memory lane through ’80s pop culture, video games, music & movies. A futuristic sci-fi book in which online gamers try to solve puzzles in an easter egg hunt for control of the OASIS, a virtual reality game.

 

readyplayerone.jpg

 

You can find more info here

January 24, 2018 10:51 AM

January 23, 2018

The Lone Sysadmin

How to Disable Windows IPv6 Temporary Addresses

The default Microsoft Windows IPv6 implementation has privacy extensions enabled, where IPv6 temporary addresses are used for client activities. The idea is that IPv6 has so many addresses available to it that we can create extra ones to help mask our activities. In practice these temporary addresses are largely pointless, and are very unhelpful if firewalls […]

The post How to Disable Windows IPv6 Temporary Addresses appeared first on The Lone Sysadmin. Head over to the source to read the full post!

by Bob Plankers at January 23, 2018 07:32 PM

January 22, 2018

TaoSecurity

Lies and More Lies

Following the release of the Spectre and Meltdown CPU attacks, the security community wondered if other researchers would find related speculative attack problems. When the following appeared, we were concerned:

"Skyfall and Solace

More vulnerabilities in modern computers.

Following the recent release of the Meltdown and Spectre vulnerabilities, CVE-2017-5175, CVE-2017-5753 and CVE-2017-5754, there has been considerable speculation as to whether all the issues described can be fully mitigated. 

Skyfall and Solace are two speculative attacks based on the work highlighted by Meltdown and Spectre.

Full details are still under embargo and will be published soon when chip manufacturers and Operating System vendors have prepared patches.

Watch this space..."

It turns out this was a hoax. The latest version of the site says, in part:

"With little more than a couple of quickly registered domain names, thousands of people were hooked...

Skyfall

The idea here was to suggest a link to Intel's Skylake processor.

Solace

The idea here was to suggest a link to the Solaris operating system.

Copy the styling of the original Meltdown and Spectre sites and add a couple of favicons based loosely on the Intel and Solaris logos and I was nearly done.

The final step was to add on https, because if a site's got an SSL certificate it must be legitimate, and the bait was set."

The problem with this "explanation" is that it wasn't just a logo, domain name and SSL certificate. The "security professional" who created this site outright lied, as shown at the top of this post. Don't fall for his false narrative.

I'm not naming names or linking to the sites here, because the person responsible already thinks he's too clever.

by Richard Bejtlich (noreply@blogger.com) at January 22, 2018 02:30 PM

January 18, 2018

Homo-Adminus

Compliance-Driven Development or the Story Behind Swiftype’s SOC2 Certification

This article was originally posted on the Swiftype Engineering blog.


Based on my experience, just a decade ago not many people within the Silicon Valley startup community considered compliance an important stepping stone in a company’s development roadmap. And when it came to compliance for startups, it was nearly synonymous with PCI/DSS — mandatory certification used by the credit card industry. Over the last few years though, the rise in the number of startups working with large amounts of private and confidential data (fintech, healthcare, etc) and subsequently the rise in the magnitude of data breaches, led our industry to accept the idea that compliance and certifications are not just for the “big guys”. Nowadays, even very small companies are pressed to go through formal certifications if they want people to trust them with private or confidential data.

That is exactly what happened to Swiftype at the beginning of 2017. While preparing for a public release of our latest product (Swiftype Enterprise Search), we understood that it was going to involve a lot of confidential information and we would need to be able to assure our customers of our capabilities to protect their data. In addition to the marketing aspect, there was a security angle to the problem as well: we were looking for a standard framework that could be used by our small team to ensure the safety of customer data, guiding us through the process. Based on those considerations, we decided to go through a formal SOC 2 certification. In this article, I will describe our journey towards the certification and our findings along the way.

 


Different Ways of Approaching Certification

Based on my experience with various certifications (during my career, I’ve had a chance to participate in PCI/DSS, ISO27001 and SOC 2), there are at least two primary ways of looking at and preparing for an audit.

A Passive Approach

You hire an auditing company, wait for them to hand you a list of requests, scramble to collect some evidence and keep going back and forth with them until you have a required minimum — just enough to pass the certification. If you miss some controls, the auditor often provides you with generic examples to be used. This approach is typical for large (and old-school) companies who view certification and compliance as an obligation.

Even though this is the cheapest (in the short term) way of achieving a certification, I am not a fan of this approach for multiple reasons:

  • You often end up with a random set of processes and controls designed to fit a generic large company that are handed over to you by an outside consultant. Those processes often feel very formal and alien to a culture of a fast-moving startup.
  • By viewing the certification process as an obligation (something “we have to do”), you miss out on the opportunity to use it as a driver for improvement in all areas of your business: from infrastructure security to onboarding processes, to accounting, HR and so on.
  • Being generic, a lot of the processes you introduce by being passive during a certification end up relying heavily on a bureaucratic approach and are very manual. This may be too painful for a small company with a limited staff juggling different roles and responsibilities.

In general, I do not believe a small to medium-sized company should use this approach when preparing for and going through any certification. By doing so, you are missing out on many potential benefits of compliance-related efforts.

An Active Approach

Another, drastically different approach to compliance is based on a simple idea — you have decided to get certified because you understand the value behind it, you’re spending resources on doing it, so you better use it as an opportunity to improve your business, your infrastructure, your team and gain as much as possible from the process. Here is how it works:

  • First, you collect the list of requirements and criteria that your company will be tested against during the audit.
  • Then, you analyze the current state of the company according to each of the criteria.
  • And, finally, you carefully map those lists to each other, looking for any gaps that you will need to fill.

By actively analyzing the state of your company, you will understand what is missing and will be in a position to fix those issues by changing your infrastructure, introducing new processes, etc. You will understand not only what needs to be done, but why it is being done and, since you know how your company operates, you should be able to tailor those changes to your unique environment and make them feel much more natural. Another very important aspect of the active approach to compliance in a modern technical startup is that you should be able to automate a lot of your compliance needs, significantly reducing the additional workload caused by compliance-related changes within your company.

I was lucky to get introduced to this approach at Eligible by Aaron Bedra, and it has changed my view on compliance forever. Applying the same method at Swiftype only reinforced my conviction that compliance-related work in a small company may be hugely beneficial while being done in a non-disruptive way.

Initial Gap Analysis

As described above, the first step in our preparation for SOC 2 certification was so-called “gap analysis” — a process of mapping the list of criteria for a certification with the current state of the company and finding all of the criteria that require additional controls and processes to be introduced within the company.

The most useful document during this step was the official list of Trust Services Criteria published on AICPA website. The document contains a list of all criteria used during the audit and, what was enormously helpful, a list of illustrative risks and illustrative controls associated with each criterion. Those illustrative examples helped us better understand the reasoning behind each criterion and define what our internal controls should look like.

Based on the official documentation from AICPA, we created a large spreadsheet, which mapped SOC 2 criteria to illustrative risks, then each risk was mapped to an internal control we already had or needed to implement.

Here is a small snippet from the spreadsheet:

SOC 2 Criteria Mapping

As you can see, control activities (specific things we do within the company to address different risks) within the document end up repeating, since one control activity often addresses multiple risks and hence applies to different criteria. After we collected a list of control activities, we grouped them into logical collections — internal controls within our company.

When the document was ready, it became the main tool used for preparation for the audit, helping us track the progress of implementation for each control activity and each control within the company.

Designing Controls

As I already mentioned before, the illustrative controls provided by AICPA within their Trust Services Criteria list were really helpful for guiding the process of designing our own internal controls. We would take each criterion, look at each illustrative risk and illustrative control and ask ourselves a simple question: how could we address the illustrative risk in the most efficient way using internal automation and other technological solutions at our disposal? The result would very often be much simpler than the illustrative control, but as long as it addressed the underlying risk, we were confident enough it would work for us.

Just as with any other aspect of building a company, there are many people who have designed and implemented compliance and security controls before, and it is always useful to understand what other people did and why before you make your own decisions. While designing our own controls, we looked at available public information on SOC 2 and ISO 27001 controls, talked to our peers within the industry and researched security controls used by our vendors (many companies publish information about their security controls or are willing to provide you with their SOC 2 reports). But we always strived to make controls our own — make sure they would fit within our existing culture, our existing processes, etc. Changing existing processes that worked for us for years was the last option, and we only did it when we could see a clear improvement for the company as the result of the change.

Tracking the Implementation

After we performed the initial gap analysis, we ended up with a very long list of control activities designed for our company. To make it easier to control the process of implementing those controls, we used Jira. Here are some ideas that helped us along the way:

  • First, we created a dedicated Jira project for tracking our internal controls — each logically grouped set of control activities from the gap analysis document ended up being represented with a single CONTROL Jira issue, that would contain information about those control activities, map them to relevant SOC 2 criteria, etc.
  • Then we created Jira issues for our engineering and operations teams (TECHOPS and ENG in our case); one issue per control activity we needed to implement — this made sure that we would not drop anything on the floor during the implementation process and would be able to incorporate compliance-related work into our normal business operations and Jira-based sprint planning. All implementation Jira issues have been tagged with a “SOC2” label to make it easier to find them across different team projects.
  • Finally, we linked implementation Jira issues to controls by making implementation issues “block” controls — this made it very easy to tell what needed to be done before a control would be considered fully implemented at Swiftype by looking at a specific CONTROL issue and seeing all blocking implementation items on it.

Here is an example snippet of a CONTROL issue from our internal Jira tracker:

Internal Pre-Audit Control Testing

As I explained at the beginning of the article, ever since the beginning of the process we aimed to pass the audit without any exceptions — we wanted to be sure our system was designed properly without any gaps between the compliance criteria and our internal controls. To ensure our audit would go as smoothly as possible, we performed internal control testing during the final stretch before our on-site visit from the auditors.

The testing was done using the following simple process:

  1. For each CONTROL issue listed in our Jira
  2. Make sure all blocking implementation issues have been completed
  3. Check each SOC 2 criteria related to a specific CONTROL, get a list of illustrative risks listed in the original document and make sure we:
  • Have relevant policies in place addressing the risk
  • Have automation in place providing us with alerts and audit trails addressing the risk
  • Could provide evidence of both the policies and automation if asked during the audit

When we tested a control, we would mark it as Done in Jira, helping us track the controls that still needed attention.

This phase is where our very active and involved process finally met the old-school evidence-based process typically used by large companies. Since certification is based on providing auditors with evidence, they always require you to upload hundreds of pieces of content (documents, screenshots, logs, etc.) to their portals before the on-site audit. In turn, while doing our internal testing, we would always make sure we could provide any evidence required. This meant that after each test had been finished, we would upload relevant evidence pieces into the auditor portal and make notes on how it was obtained so that we could quickly do it again during the on-site audit.

Our Findings and Future Plans

During the on-site audit, we were often complimented by the auditor’s staff for being so well prepared and having our controls so well laid out. But most importantly, we have not noticed any slowdown in our team’s day-to-day operations after implementing all of our internal controls. I believe that is a true testament to the active and involved preparation process we have gone through and to all the automation we put in place, allowing our small team to continue focusing on building the product and the company instead of being slowed down by the old, paper-heavy process often associated with compliance.

We believe that many aspects of our company improved thanks to the compliance-related efforts of our team and all the changes we were able to make while guided by the reliable, industry-tested framework of SOC 2. We’re looking forward to providing safe and secure services to our customers and are happy to have AICPA certify our ability to do so.

 

 

by Oleksiy Kovyrin at January 18, 2018 05:06 PM

OpenSSL

Another Face to Face: Email Changes and Crypto Policy

The OpenSSL OMC met last month for a two-day face-to-face meeting in London, and like previous F2F meetings, most of the team was present and we addressed a great many issues. This blog posts talks about some of them, and most of the others will get their own blog posts, or notices, later. Red Hat graciously hosted us for the two days, and both Red Hat and Cryptsoft covered the costs of their employees who attended.

One of the overall threads of the meeting was about increasing the transparency of the project. By default, everything should be done in public. We decided to try some major changes to email and such.

Security Releases

First, a short item. We are changing our release schedule so that unless there are extenuating circumstances, security releases will go out on a Tuesday, with the pre-notification being the previous Tuesday. We don’t see a need to have people ready to sacrifice their weekend every time a new CVE comes out (see our security policy for details).

On the other hand, a severe enough vulnerability that has known exploits would be a good example of an extenuating circumstance.

Online communication

We created a new mailing list, openssl-project, that is for discussions about the governance and policies of OpenSSL. Anyone can join this list. Only members of the OMC and committers will be able to post.
We want this to be a useful list for the OMC and committers to communicate in public – like many Parliaments, for example, where debate is public but the public doesn’t speak.

Still, not everyone is completely comfortable with this change. It’s an experiment, and we will see how it goes and adjust if necessary. Note that OMC vote results will be posted here, as will initial discussions about vote topics. One important item that will be discussed on this list is planning for upcoming releases. Also, our paid fellows will be posting monthly status reports there.

We decided to increase our use of GitHub. In addition to asking that all bug reports and enhancement requests be reported as issues, we now want all major code proposals to be discussed as issues before a large pull request shows up. This will let the community discuss the feature, offer input on design and such, before having code to look at. We hope this will let us all first look at the bigger picture, before getting bogged down in the weeds of line-by-line code reviews.

We are going to close the openssl-dev mailing list. The distinction between openssl-dev and openssl-users was often unclear, and the changes described above would make that situation worse. GitHub issues are the way most projects work these days, and with the creation of openssl-project it should be much clearer how and when to use the openssl-users mailing list.

If our expectations are wrong, of course, we’ll fix or revert these changes.

Technical Debt

We want to reduce our technical debt. This includes not only things like old open RT tickets or GitHub issues, but also things like refactoring code to make it cleaner and hopefully have fewer bugs. The recent addition of the PACKET and WPACKET APIs in libssl makes the code much clearer and also avoids hand-coded packet-processing bugs (like forgetting to check a buffer against its declared length).
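
If you haven't seen this style of parsing, here is a minimal sketch of the idea (my own simplified struct and function names, not the actual internal PACKET API): every read goes through a helper that checks the remaining byte count first, so forgetting to validate a declared length becomes hard to do.

/* Illustrative sketch only, not the OpenSSL internal PACKET API. The point
 * is the pattern: every read checks the remaining length first, so a
 * malformed length field cannot walk past the end of the buffer. */
#include <stddef.h>

typedef struct {
    const unsigned char *curr;  /* next unread byte */
    size_t remaining;           /* bytes left in the buffer */
} pkt_t;

static void pkt_init(pkt_t *p, const unsigned char *buf, size_t len)
{
    p->curr = buf;
    p->remaining = len;
}

/* Read a single byte, failing cleanly if the buffer is exhausted. */
static int pkt_get_1(pkt_t *p, unsigned int *out)
{
    if (p->remaining < 1)
        return 0;
    *out = *p->curr++;
    p->remaining--;
    return 1;
}

/* Read a one-byte length prefix followed by that many bytes of data. */
static int pkt_get_length_prefixed(pkt_t *p, const unsigned char **data,
                                   size_t *len)
{
    unsigned int n;

    if (!pkt_get_1(p, &n) || p->remaining < n)
        return 0;               /* declared length exceeds what is left */
    *data = p->curr;
    *len = n;
    p->curr += n;
    p->remaining -= n;
    return 1;
}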

We are getting better with the documentation, but we still have many APIs that aren’t documented. If you want to help fix that, pass the -u flag to the util/find-doc-nits script.

We have added a new label, technical-debt, to mark these kinds of things on GitHub.

Cryptography Policies

We also came to some decisions about a policy for cryptography, although the wording is still under discussion. The following applies to all new cryptography, and in a future release we will address the existing source.

  • Insecure configuration options will not be enabled by default but must be enabled by a compile-time switch. We had already started to do this by disabling SSLv2 and small keys. A recent change is that “multi-prime RSA” will enforce a maximum number of prime factors by default. In the future, it’s possible we’ll increase the minimum key sizes for a variety of algorithms.

  • It must be possible to disable all new algorithms at compile-time. When we extend the existing code, we’ll probably skip cases that are known to not work. Building OpenSSL without SHA will break libssl, so it’s not worth spending time on that.

  • The EVP interface is the primary interface for calling crypto operations. All new algorithms should only provide this API (see the sketch after this list). In a future release, existing APIs like AES_encrypt will be provided with a compatibility layer, perhaps separately, that wraps the EVP API.

  • All algorithms and protocols should be recognized by a national or international standards body. That is somewhat vague, but the important point is that most of us are implementors, not cryptographers, and we will defer judgement to experts.

  • The DEFAULT value for the cipher string is not the same as ALL. That is, while many ciphers will be available to the libraries, they will not be enabled at the TLS layer unless specified at run-time. This brought up the point that the syntax of the cipher string cannot support the things people need it to do, including “cipher classes,” custom keywords, and site-wide configurations.
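
Here is the sketch promised above: a minimal illustration of calling a cipher through the public EVP interface instead of a low-level routine such as AES_encrypt. This is my own example against the OpenSSL 1.1.0 public API, not project code, and error handling is kept deliberately terse.

/* Minimal sketch: AES-256-CBC through the EVP interface (OpenSSL >= 1.1.0).
 * The caller must supply an output buffer of at least inlen + block size. */
#include <openssl/evp.h>

int evp_encrypt_sketch(const unsigned char *key, const unsigned char *iv,
                       const unsigned char *in, int inlen,
                       unsigned char *out, int *outlen)
{
    EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
    int len, total = 0, ok = 0;

    if (ctx == NULL)
        return 0;
    if (EVP_EncryptInit_ex(ctx, EVP_aes_256_cbc(), NULL, key, iv) == 1
            && EVP_EncryptUpdate(ctx, out, &len, in, inlen) == 1) {
        total = len;
        if (EVP_EncryptFinal_ex(ctx, out + total, &len) == 1) {
            total += len;
            *outlen = total;
            ok = 1;
        }
    }
    EVP_CIPHER_CTX_free(ctx);
    return ok;
}

Because the algorithm is selected by the EVP_aes_256_cbc() handle rather than baked into each call site, swapping ciphers (or disabling one at compile time) does not ripple through application code, which is the point of funnelling new algorithms through EVP only.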

Roadmap

We remain committed to having TLS 1.3 be the main feature for our next release. Of course we must wait for the IETF to finish it. We’ll again point out that this is version 1.1.1, and you should get your applications ready by porting to 1.1.0 now.

We reviewed the status of our license-change work. We’ll post an update in a couple of weeks, but our goal is to change the license with our next release.

We also decided that the primary focus of the next feature release after 1.1.1 will be FIPS. We know that FIPS is very important to some, not all, members of our community and we are committed to addressing this. We don’t have much more information to share, and we know there has been some confusion and misleading communication out there. But we do want to make this strong, definitive statement: OpenSSL will implement a FIPS solution, and we expect it will be completed much sooner than previous timetables indicated.

January 18, 2018 01:00 AM

January 17, 2018

Colin Percival

Some thoughts on Spectre and Meltdown

By now I imagine that all of my regular readers, and a large proportion of the rest of the world, have heard of the security issues dubbed "Spectre" and "Meltdown". While there have been some excellent technical explanations of these issues from several sources — I particularly recommend the Project Zero blog post — I have yet to see anyone really put these into a broader perspective; nor have I seen anyone make a serious attempt to explain these at a level suited for a wide audience. While I have not been involved with handling these issues directly, I think it's time for me to step up and provide both a wider context and a more broadly understandable explanation.

January 17, 2018 02:40 AM

January 16, 2018

Cryptography Engineering

Apple in China: who holds the keys?

Last week Apple made an announcement describing changes to the iCloud service for users residing in mainland China. Beginning on February 28th, all users who have specified China as their country/region will have their iCloud data transferred to the GCBD cloud services operator in Guizhou, China.

Chinese news sources optimistically describe the move as a way to offer improved network performance to Chinese users, while Apple admits that the change was mandated by new Chinese regulations on cloud services. Both explanations are almost certainly true. But neither answers the following question: regardless of where it’s stored, how secure is this data?

Apple offers the following:

“Apple has strong data privacy and security protections in place and no backdoors will be created into any of our systems”

That sounds nice. But what, precisely, does it mean? If Apple is storing user data on Chinese services, we have to at least accept the possibility that the Chinese government might wish to access it — and possibly without Apple’s permission. Is Apple saying that this is technically impossible?

This is a question, as you may have guessed, that boils down to encryption.

Does Apple encrypt your iCloud backups?

Unfortunately there are many different answers to this question, depending on which part of iCloud you’re talking about, and — ugh — which definition you use for “encrypt”. The dumb answer is the one given in the chart on the right: all iCloud data probably is encrypted. But that’s the wrong question. The right question is: who holds the key(s)?

This kind of thing is Not Helpful.

There’s a pretty simple thought experiment you can use to figure out whether you (or a provider) control your encryption keys. I call it the “mud puddle test”. It goes like this:

Imagine you slip in a mud puddle, in the process (1) destroying your phone, and (2) developing temporary amnesia that causes you to forget your password. Can you still get your iCloud data back? If you can (with the help of Apple Support), then you don’t control the key.

With one major exception — iCloud Keychain, which I’ll discuss below — iCloud fails the mud puddle test. That’s because most Apple files are not end-to-end encrypted. In fact, Apple’s iOS security guide is clear that it sends the keys for encrypted files out to iCloud.

However, there is a wrinkle. You see, iCloud isn’t entirely an Apple service, not even here in the good-old U.S.A. In fact, the vast majority of iCloud data isn’t actually stored by Apple at all. Every time you back up your phone, your (encrypted) data is transmitted directly to a variety of third-party cloud service providers including Amazon, Google and Microsoft.

A list of HTTPS requests made during an iCloud backup from an iPhone. The bottom two addresses are Amazon and Google Cloud Services “blob” stores.

And this is, from a privacy perspective, mostly** fine! Those services act merely as “blob stores”, storing unreadable encrypted data files uploaded by Apple’s customers. At least in principle, Apple controls the encryption keys for that data, ideally on a server located in a dedicated Apple datacenter.*

So what exactly is Apple storing in China?

Good question!

You see, it’s entirely possible that the new Chinese cloud stores will perform the same task that Amazon AWS, Google, or Microsoft do in the U.S. That is, they’re storing encrypted blobs of data that can’t be decrypted without first contacting the iCloud mothership back in the U.S. That would at least be one straightforward reading of Apple’s announcement, and it would also be the most straightforward mapping from iCloud’s current architecture to whatever it is Apple is doing in China.

Of course, this interpretation seems hard to swallow. In part this is due to the fact that some of the new Chinese regulations appear to include guidelines for user monitoring. I’m no lawyer, and certainly not an expert in Chinese law — so I can’t tell you if those would apply to backups. But it’s at least reasonable to ask whether Chinese law enforcement agencies would accept the total inability to access this data without phoning home to Cupertino, not to mention that this would give Apple the ability to instantly wipe all Chinese accounts. Solving these problems (for China) would require Apple to store keys as well as data in Chinese datacenters.

The critical point is that these two interpretations are not compatible. One implies that Apple is simply doing business as usual. The other implies that they may have substantially weakened the security protections of their system — at least for Chinese users.

And here’s my problem. If Apple needs to fundamentally rearchitect iCloud to comply with Chinese regulations, that’s certainly an option. But they should say explicitly and unambiguously what they’ve done. If they don’t make things explicit, then it raises the possibility that they could make the same changes for any other portion of the iCloud infrastructure without announcing it.

It seems like it would be a good idea for Apple just to clear this up a bit.

You said there was an exception. What about iCloud Keychain?

I said above that there’s one place where iCloud passes the mud puddle test. This is Apple’s Cloud Key Vault, which is currently used to implement iCloud Keychain. This is a special service that stores passwords and keys for applications, using a much stronger protection level than is used in the rest of iCloud. It’s a good model for how the rest of iCloud could one day be implemented.

For a description, see here. Briefly, the Cloud Key Vault uses a specialized piece of hardware called a Hardware Security Module (HSM) to store encryption keys. This HSM is a physical box located on Apple property. Users can access their own keys if and only if they know their iCloud Keychain password — which is typically the same as the PIN/password on their iOS device. However, if anyone attempts to guess this PIN too many times, the HSM will wipe that user’s stored keys.

The critical thing is that the “anyone” mentioned above includes even Apple themselves. In short: Apple has designed a key vault that even they can’t be forced to open. Only customers can get their own keys.

What’s strange about the recent Apple announcement is that users in China will apparently still have access to iCloud Keychain. This means that either (1) at least some data will be totally inaccessible to the Chinese government, or (2) Apple has somehow weakened the version of Cloud Key Vault deployed to Chinese users. The latter would be extremely unfortunate, and it would raise even deeper questions about the integrity of Apple’s systems.

Probably there’s nothing funny going on, but this is an example of how Apple’s vague (and imprecise) explanations make it harder to trust their infrastructure around the world.

So what should Apple do?

Unfortunately, the problem with Apple’s disclosure of its China news is, well, really just a version of the same problem that’s existed with Apple’s entire approach to iCloud.

Where Apple provides overwhelming detail about their best security systems (file encryption, iOS, iMessage), they provide distressingly little technical detail about the weaker links like iCloud encryption. We know that Apple can access and even hand over iCloud backups to law enforcement. But what about Apple’s partners? What about keychain data? How is this information protected? Who knows.

This vague approach to security might make it easier for Apple to brush off the security impact of changes like the recent China news (“look, no backdoors!”). But it also confuses the picture, and calls into doubt any technical security improvements that Apple might be planning to make in the future. For example, this article from 2016 claims that Apple is planning stronger overall encryption for iCloud. Are those plans scrapped? And if not, will those plans fly in the new Chinese version of iCloud? Will there be two technically different versions of iCloud? Who even knows?

And at the end of the day, if Apple can’t trust us enough to explain how their systems work, then maybe we shouldn’t trust them either.

Notes:

* This is actually just a guess. Apple could also outsource their key storage to a third-party provider, even though this would be dumb.

** A big caveat here is that some iCloud backup systems use convergent encryption, also known as “message locked encryption”. The idea in these systems is that file encryption keys are derived by hashing the file itself. Even if a cloud storage provider does not possess encryption keys, it might be able to test if a user has a copy of a specific file. This could be problematic. However, it’s not really clear from Apple’s documentation if this attack is feasible. (Thanks to RPW for pointing this out.)
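
For readers who haven't met convergent (message-locked) encryption before, the key-derivation step is the whole trick. A minimal sketch using OpenSSL's one-shot SHA256(); the function and parameter names are mine, purely for illustration:

/* Convergent encryption sketch: the key is derived from the file contents,
 * so two users holding the same file derive the same key. If the IV (or the
 * whole encryption) is likewise derived deterministically, identical files
 * produce identical ciphertexts, which is what could let a storage provider
 * test whether a known file is present, as described in the caveat above. */
#include <stddef.h>
#include <openssl/sha.h>

void convergent_key(const unsigned char *file, size_t filelen,
                    unsigned char key_out[SHA256_DIGEST_LENGTH])
{
    SHA256(file, filelen, key_out);   /* key = SHA-256(plaintext) */
}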

by Matthew Green at January 16, 2018 07:44 PM

TaoSecurity

Addressing Innumeracy in Reporting

Anyone involved in cybersecurity reporting needs a strong sense of numeracy, or mathematical literacy. I see two sorts of examples of innumeracy repeatedly in the media.

The first involves the time value of money. Recently CNN claimed Amazon CEO Jeff Bezos was the "richest person in history" and Recode said Bezos was "now worth more than Bill Gates ever was." Thankfully both Richard Stiennon and Noah Kirsch recognized the foolishness of these reports, correctly noting that Bezos would only rank number 17 on a list where wealth was adjusted for inflation.

This failure to recognize the time value of money is pervasive. Just today I heard the host of a podcast claim that the 1998 Jackie Chan movie Rush Hour was "the top grossing martial arts film of all time." According to Box Office Mojo, Rush Hour earned $244,386,864 worldwide. Adjusting for inflation, in 2017 dollars that's $367,509,865.67 -- impressive!

For comparison, I researched the box office returns for Bruce Lee's Enter the Dragon. Box Office Mojo lacked data, but I found a 2017 article stating his 1973 movie earned "$25 million in the U.S. and $90 million worldwide, excluding Hong Kong." If I adjust the worldwide figure of $90 million for inflation, in 2017 dollars that's $496,864,864.86 -- making Enter the Dragon easily more successful than Rush Hour.

If you're wondering about Crouching Tiger, Hidden Dragon, that 2000 movie earned $213,525,736 worldwide. That movie earned less than Rush Hour, and arrived two years later, so it's not worth doing the inflation math.

The take-away is that any time you are comparing dollars from different time periods, you must adjust for inflation for your comparisons to have any meaning whatsoever.
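
The adjustment itself is just a multiplication by the ratio of the price indices for the two years. A quick sketch; the CPI figures below are approximate annual averages I have assumed for illustration, and they reproduce the article's figures to within rounding:

#include <stdio.h>

int main(void)
{
    /* Approximate annual-average CPI-U values (assumed for illustration). */
    const double cpi_1973 = 44.4, cpi_1998 = 163.0, cpi_2017 = 245.1;

    double rush_hour_1998    = 244386864.0;  /* worldwide gross, 1998 dollars */
    double enter_dragon_1973 = 90000000.0;   /* worldwide gross, 1973 dollars */

    /* adjusted = nominal * (CPI_target_year / CPI_original_year) */
    printf("Rush Hour in 2017 dollars:        %.0f\n",
           rush_hour_1998 * (cpi_2017 / cpi_1998));
    printf("Enter the Dragon in 2017 dollars: %.0f\n",
           enter_dragon_1973 * (cpi_2017 / cpi_1973));
    return 0;
}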

Chart by @CanadianFlags
The second sort of innumeracy I'd like to highlight today also involves money, but in a slightly different way. This involves changes in values over time.

For example, a company may grow revenue from 2015 to 2016, with 2015 revenue being $100,000 and 2016 being $200,000. That's a 100% gain.

If the company grows another $100,000 from 2016 to 2017, from $200,000 to $300,000, the growth rate has declined to 50%. To have maintained a 100% growth rate, the company needed to make $400,000 in 2017.

That same $100,000 increase isn't so great when compared to the new base value.
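
To make the arithmetic explicit: the growth rate is the change divided by the starting value, so a constant dollar increase yields a shrinking percentage as the base grows. A tiny sketch:

#include <stdio.h>

/* Growth rate = (new - old) / old, expressed as a percentage. */
static double growth_pct(double old_value, double new_value)
{
    return (new_value - old_value) / old_value * 100.0;
}

int main(void)
{
    printf("2015 -> 2016: %.0f%%\n", growth_pct(100000, 200000)); /* 100% */
    printf("2016 -> 2017: %.0f%%\n", growth_pct(200000, 300000)); /*  50% */
    printf("2017 revenue needed for 100%%: %.0f\n", 200000 * 2.0); /* 400000 */
    return 0;
}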

We see the same dynamic at play when tracking the growth of individual stocks or market indices over time.

CNN wrote a story about the 1,000 point rise in the Dow Jones Industrial Average over a period of 7 days, from 25,000 to 26,000. One person Tweeted the chart at the above right, asking "is that healthy?" My answer -- you need a proper chart!

My second reaction was "that's a jump, but it's only (1-(25000/26000)) = 3.8%." Yes, 3.8% in 7 days is a lot, but that doesn't even rate in the top 20 one-day percentage gains or losses over the life of the index.

If the DJIA gained 1,000 points in 7 days 5 years ago, when the market was at 13,649, a rise to 14,649 would be a 6.8% gain. 20 years ago the market was roughly 3,310, so a 1,000 point rise to 4,310 would be a massive 23.2% gain.

A better way to depict the growth in the DJIA would be to use a logarithmic chart. The charts below show a linear version on the top and a logarithmic version below it.



Using barcharts.com, I drew the last 30 years of the DJIA at the top using a linear Y axis, meaning there is equal distance between 2,000 and 4,000, 4,000 and 6,000, and so on. The blue line shows the slope of the growth.

I then drew the same period using a logarithmic Y axis, meaning the percentage gains from one line to another are equal. For example, a 100% increase from 1,000 to 2,000 occupies the same distance as the 100% increase from 5,000 to 10,000. The green line shows the slope of the growth.
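
The reason equal ratios occupy equal vertical distance is that distance on a log axis is proportional to the difference of logarithms, and log(b) - log(a) depends only on the ratio b/a. A two-line sanity check (my own snippet; compile with -lm):

#include <stdio.h>
#include <math.h>

int main(void)
{
    /* Both are log10(2), so a 100% gain spans the same vertical distance
     * anywhere on a logarithmic Y axis. */
    printf("%.4f\n", log10(2000.0)  - log10(1000.0));
    printf("%.4f\n", log10(10000.0) - log10(5000.0));
    return 0;
}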

I put the blue and green lines on both charts to permit comparison of the slopes. As you can see, the growth, when properly shown on the log chart with the green line, is far less dramatic than the exaggerated impression created by the linear chart's blue line.

There is indeed an upturn recently in the log chart, but the growth is probably on trend over time.

While we're talking about the market, let's take one minute to smack down the old trope that "what goes up must come down." There is no "law of gravity" in investing, at least for the US market as a whole.

The best example I have seen of the reality of the situation is this 2017 article titled The Dow’s tumultuous 120-year history, in one chart. Here is the chart:

Chart by Chris Kacher, managing director of MoKa Investors

What an amazing story. The title of the article should not be gloomy. It should be triumphant. Despite two World Wars, a Cold War, wars in Korea, Vietnam, the Middle East, and elsewhere, assassinations of world leaders, market depressions and recessions, and so on, the trend line is up, and up in a big way. While the DJIA doesn't represent the entire US market, it captures enough of it to be representative. This is why I do not bet against the US market over the long term. (And yes I recognize that the market and the economy are different.)

Individual companies may disappear, and the DJIA has indeed been changed many times over the years. However, those changes were made so that the index roughly reflected the makeup of the economy. Is it perfect? No. Does it capture the overall directional trend line since 1896? Yes.

Please keep in mind these two sorts of innumeracy -- the time value of money, and the importance of percentage changes over time -- when dealing with numbers and time.

by Richard Bejtlich (noreply@blogger.com) at January 16, 2018 05:31 PM

January 15, 2018

Michael Biven

Brain Dump

Ever wonder why you have so many ideas when you’re in the bathroom taking a shower, or a bath, brushing your teeth or that other thing we use the room for?

Ever notice that there are no screens in there to pull or hold our attention?

Take a minute and count how many screens are around you right now. How many different TVs, phones, tablets, computers, e-readers, portable video game consoles, smartwatches, or VR headsets are within your sight? How many of these are in the bathroom?

p.s. please don’t admit to owning a pair of AR glasses.

While having a discussion with my wife, who had a MacBook on her lap, an iPad closed on the ottoman next to her feet, and her iPhone within reach, I said…

“the nuance is blurred” and then I had no reply as she was still reading whatever it was she was reading.

I waited a minute and then asked her “you know I just asked you a question right?”

She replied “yes, something about being blurred.”

I repeated what I had said “yeah the nuance is blurred” as I started to walk to the bathroom I added “Blur as in the band, as in song number two, as in what I’m going to do.”

Which I did and then noticed there are no screens in the bathroom and wondered maybe this is why we have flashes of clarity and creativity in here. Which reminded me we recently joked about getting an Amazon Echo in here so we could ask Alexa to take notes so we don’t forget things.

So as I step back into the living room, even before I’m through the bathroom door I’m calling out to her “Don’t say anything! I had an idea that I need to write down before I forget!” She was looking at me as I stepped out and I was left with the impression she was holding onto something to tell me.

I grab my laptop, sit down and start writing. At this point I’ve read everything that you’ve read so far to her and she laughs.

We talk about the fact there’s a screen in every room in the house except for the bathroom. We go over the semantics of whether there really are screens in the bedroom. Because you know the iPads we both have can follow us around. She uses hers throughout the day and I leave mine on the bedside table to watch something when I go to bed. Which isn’t really going to bed as it’s just lying down and watching TV. No wonder I get so little sleep.

We talk about the screens even in our car. Hell, the car even emails and sends us text messages when it gets low on windshield wiper fluid. Our car has a drinking problem and it likes to let us know about it.

Used to be the only screen in the house was the one television set the family had and if you were well-off your parents had a second set in their bedroom to watch Johnny Carson together.

Instead we watch different things on our own screen that we hold out in front of us or sit on our bellies. Recently my wife asked if there’s an app so we can both watch the same thing synced up on our individual iPads. We laughed when we realized that yes there is and it’s called a television.

We seem to miss the opportunities we have to build something new and we underestimate the value of what we lost. Instead we build things that place each of us in our own individual world. Walled off with wireless headsets and a microphone, where you can be talking with someone who is ignoring the people around them or maybe you’re just listening to music that a machine picked to play for you.

Yes, there is usually at least one screen in the bathroom called a mirror. It’s passive and only reflects back what we bring to it. That’s the key difference. That screen is passive and is used as a tool so we can brush our teeth or have a moment of self-reflection.

Anyways, just wanted to capture this before I lost it.

I never did find out what she was holding on to tell me.

January 15, 2018 07:01 PM

January 14, 2018

TaoSecurity

Remembering When APT Became Public

Last week I Tweeted the following on the 8th anniversary of Google's blog post about its compromise by Chinese threat actors:

This intrusion made the term APT mainstream. I was the first to associate it with Aurora, in this post 

https://taosecurity.blogspot.com/2010/01/google-v-china.html

My first APT post was a careful reference in 2007, when we all feared being accused of "leaking classified" re China: 

https://taosecurity.blogspot.com/2007/10/air-force-cyberspace-report.html

I should have added the term "publicly" to my original Tweet. There were consultants with years of APT experience involved in the Google incident response, and they recognized the work of APT17 at that company and others. Those consultants honored their NDAs and have stayed quiet.

I wrote my original Tweet as a reminder that "APT" was not a popular, recognized term until the Google announcement on 12 January 2010. In my Google v China blog post I wrote:

Welcome to the party, Google. You can use the term "advanced persistent threat" (APT) if you want to give this adversary its proper name.

I also Tweeted a similar statement on the same day:

This is horrifying: http://bit.ly/7x7vVW Google admits intellectual property theft from China; it's called Advanced Persistent Threat, GOOG

I made the explicit link of China and APT because no one had done that publicly.

This slide from a 2011 briefing I did in Hawaii captures a few historical points:


The Google incident was a watershed, for reasons I blogged on 16 January 2010. I remember the SANS DFIR 2008 event as effectively "APTCon," but beyond Mandiant, Northrop Grumman, and NetWitness, no one was really talking publicly about the APT until after Google.

As I noted in the July 2009 blog post, You Down With APT? (ugh):

Aside from Northrup Grumman, Mandiant, and a few vendors (like NetWitness, one of the full capture vendors out there) mentioning APT, there's not much else available. A Google search for "advanced persistent threat" -netwitness -mandiant -Northrop yields 34 results (prior to this blog post). (emphasis added)

Today that search yields 244,000 results.

I would argue we're "past APT." APT was the buzzword for RSA and other vendor-centric events from, say, 2011-2015, with 2013 being the peak following Mandiant's APT1 report.

The threat hasn't disappeared, but it has changed. I wrote my Tweet to mark a milestone and to note that I played a small part in it.

All my APT posts here are reachable by this APT tag. Also see my 2010 article for Information Security Magazine titled What APT Is, and What It Isn't.

by Richard Bejtlich (noreply@blogger.com) at January 14, 2018 07:08 PM

January 11, 2018

Sean's IT Blog

Getting Started with VMware UEM

One of the most important aspects of any end-user computing environment is user experience, and a big part of user experience is managing the user’s Windows and application preferences.  This is especially true in non-persistent environments and published application environments where the user may not log into the same machine each time.

So why is this important?  A big part of a user’s experience on any desktop is maintaining their customizations.  Users invest time into personalizing their environment by setting a desktop background, creating an Outlook signature, or configuring the applications to connect to the correct datasets, and the ability to retain these settings makes users more productive because they don’t have to recreate them every time they log in or open the application.

User settings portability is nothing new.  Microsoft Roaming Profiles have been around for a long time.  But Roaming Profiles also have limitations, such as casting a large net by moving the entire profile (or the App Data roaming folder on newer versions of Windows) or being tied to specific versions of Windows.

VMware User Environment Manager, or UEM for short, is one of a few 3rd-party user environment management tools that can provide a lighter-weight solution than Roaming Profiles.  UEM can both manage the user’s personalization of the environment, by capturing Windows and application settings, and apply settings to the desktop or RDSH session based on the user’s context.  This can include things like setting up network drives and printers, Horizon Smart Policies to control various Horizon features, and acting as a Group Policy replacement for per-user settings.

UEM Components

There are four main components for VMware UEM.  The components are:

  • UEM Management Console – The central console for managing the UEM configuration
  • UEM Agent – The local agent installed on the virtual desktop, RDSH server, or physical machine
  • Configuration File Share – Network File Share where UEM configuration data is stored
  • User Data File Share – Network File Share where user data is stored.  Depending on the environment and the options used, this can be multiple file shares.

The UEM Console is the central management tool for UEM.  The console does not require a database; anything that is configured in the console is saved as a text file on the configuration file share.  The agent consumes these configuration files from the configuration share during logon and logoff.  It saves the application or Windows settings when the application is closed or when the user logs off, storing them on the user data share as a ZIP file.

The UEM Agent also includes a few other optional tools.  These are a Self-Service Tool, which allows users to restore application configurations from a backup, and an Application Migration Tool.  The Application Migration Tool allows UEM to convert settings from one version of an application to another when the vendor uses different registry keys and AppData folders for different versions.  Microsoft Office is the primary use case for this feature, although other applications may require it as well.

UEM also includes a couple of additional tools to assist administrators with maintaining the environment.  The first of these tools is the Application Profiler Tool.  This tool runs on a desktop or an RDSH server in lieu of the UEM Agent.  Administrators can use this tool to create UEM profiles for applications, and it does this by running the application and tracking where the application writes its settings.  It can also be used to create default settings that are applied to an application when a user launches it, which can reduce the amount of time it takes to get users’ applications configured for the first time.

The other support tool is the Helpdesk Support Tool, which allows helpdesk agents or other IT support staff to restore a backup of a user’s settings archive.

Planning for a UEM Deployment

There are a couple of questions you need to ask when deploying UEM.

  1. How many configuration shares will I have, and where will they be placed? – In multisite environments, I may need multiple configuration shares so the configs are placed near the desktop environments.
  2. How many user data shares will I need, and where will they be placed?  – This is another factor in multi-site environments.  It is also a factor in how I design my overall user data file structure if I’m using other features like folder redirection.  Do I want to keep all my user data together to make it easier to manage and back up, or do I want to place it on multiple file shares?
  3. Will I be using file replication technology? What replication technology will be used? – A third consideration for multi-site environments.  How am I replicating my data between sites?
  4. What URL/Name will be used to access the shares? – Will some sort of global namespace, like a DFS Namespace, be used to provide a single name for accessing the shares?  Or will each server be accessed individually?  This can have some implications around configuring Group Policy and how users are referred to the nearest file server.
  5. Where will I run the management console?  Who will have access to it?
  6. Will I configure UEM to create backup copies of user settings?  How many backup copies will be created?

These are the main questions that come up from an infrastructure and architecture perspective, and they influence how the UEM file shares and Group Policy objects will be configured.

Since UEM does not require a database, and it does not actively use files on a network share, planning for multisite deployments is relatively straightforward.

In the next post, I’ll talk about deploying the UEM supporting infrastructure.

by seanpmassey at January 11, 2018 01:55 PM

January 10, 2018

OpenSSL

OpenSSL Wins the Levchin Prize

Today I have had great pleasure in attending the Real World Crypto 2018 conference in Zürich in order to receive the Levchin prize on behalf of the OpenSSL team.

The Levchin prize for Real World Cryptography recognises up to two groups or individuals each year who have made significant advances in the practice of cryptography and its use in real-world systems. This year one of the two recipients is the OpenSSL team. The other recipient is Hugo Krawczyk.

The team were selected by the selection committee “for dramatic improvements to the code quality of OpenSSL”. You can read the press release here.

We have worked very hard over the last few years to build an active and engaged community around the project. I am very proud of what that community has collectively achieved. Although this prize names specific individuals in the OpenSSL team, I consider us to be just the custodians of the project. In a very real way this prize is for the whole community. It is fantastic to be recognised in this way.

The job is not done though. There is still much work we need to do. I am confident though that our community will work together to achieve what needs to be done.

January 10, 2018 07:00 PM

Cryptography Engineering

Attack of the Week: Group Messaging in WhatsApp and Signal

If you’ve read this blog before, you know that secure messaging is one of my favorite topics. However, recently I’ve been a bit disappointed. My sadness comes from the fact that lately these systems have been getting too damned good. That is, I was starting to believe that most of the interesting problems had finally been solved.

If nothing else, today’s post helped disabuse me of that notion.

This result comes from a new paper by Rösler, Mainka and Schwenk from Ruhr-Universität Bochum (affectionately known as “RUB”). The RUB paper takes a close look at the problem of group messaging, and finds that while messengers may be doing fine with normal (pairwise) messaging, group messaging is still kind of a hack.

If all you want is the TL;DR, here’s the headline finding: due to flaws in both Signal and WhatsApp (which I single out because I use them), it’s theoretically possible for strangers to add themselves to an encrypted group chat. However, the caveat is that these attacks are extremely difficult to pull off in practice, so nobody needs to panic. But both issues are very avoidable, and tend to undermine the logic of having an end-to-end encryption protocol in the first place. (Wired also has a good article.)

First, some background.

How do end-to-end encryption and group chats work?

In recent years we’ve seen plenty of evidence that centralized messaging servers aren’t a very good place to store confidential information. The good news is: we’re not stuck with them. One of the most promising advances in the area of secure communications has been the recent widespread deployment of end-to-end (e2e) encrypted messaging protocols. 

At a high level, e2e messaging protocols are simple: rather than sending plaintext to a server — where it can be stolen or read — the individual endpoints (typically smartphones) encrypt all of the data using keys that the server doesn’t possess. The server has a much more limited role, moving and storing only meaningless ciphertext. With plenty of caveats, this means a corrupt server shouldn’t be able to eavesdrop on the communications.

In pairwise communications (i.e., Alice communicates with only Bob) this encryption is conducted using a mix of public-key and symmetric key algorithms. One of the most popular mechanisms is the Signal protocol, which is used by Signal and WhatsApp (notable for having 1.3 billion users!). I won’t discuss the details of the Signal protocol here, except to say that it’s complicated, but it works pretty well.

A fly in the ointment is that the standard Signal protocol doesn’t work quite as well for group messaging, primarily because it’s not optimized for broadcasting messages to many users.

To handle that popular case, both WhatsApp and Signal use a small hack. It works like this: each group member generates a single “group key” that this member will use to encrypt all of her messages to everyone else in the group. When a new member joins, everyone who is already in the group needs to send a copy of their group key to the new member (using the normal Signal pairwise encryption protocol). This greatly simplifies the operation of group chats, while ensuring that they’re still end-to-end encrypted.

How do members know when to add a new user to their chat?

Here is where things get problematic.

From a UX perspective, the idea is that only one person actually initiates the adding of a new group member. This person is called the “administrator”. This administrator is the only human being who should actually do anything — yet, her one click must cause some automated action on the part of every other group member’s device. That is, in response to the administrator’s trigger, all devices in the group chat must send their keys to this new group member.

Notification messages in WhatsApp.

(In Signal, every group member is an administrator. In WhatsApp it’s just a subset of the members.)

The trigger is implemented using a special kind of message called (unimaginatively) a “group management message”. When I, as an administrator, add Tom to a group, my phone sends a group management message to all the existing group members. This instructs them to send their keys to Tom — and to notify the members visually so that they know Tom is now part of the group. Obviously this should only happen if I really did add Tom, and not if some outsider (like that sneaky bastard Tom himself!) tries to add Tom.

And this is where things get problematic.

Ok, what’s the problem?

According to the RUB paper, both Signal and WhatsApp fail to properly authenticate group management messages.

The upshot is that, at least in theory, this makes it possible for an unauthorized person — not a group administrator, possibly not even a member of the group — to add someone to your group chat.

The issues here are slightly different between Signal and WhatsApp. To paraphrase Tolstoy, every working implementation is alike, but every broken one is broken in its own way. And WhatsApp’s implementation is somewhat worse than Signal’s. Here I’ll break them down.

Signal. Signal takes a pragmatic (and reasonable) approach to group management. In Signal, every group member is considered an administrator — which means that any member can add a new member. Thus if I’m a member of a group, I can add a new member by sending a group management message to every other member. These messages are sent encrypted via the normal (pairwise) Signal protocol.

The group management message contains the “group ID” (a long, unpredictable number), along with the identity of the person I’m adding. Because messages are sent using the Signal (pairwise) protocol, they should be implicitly authenticated as coming from me — because authenticity is a property that the pairwise Signal protocol already offers. So far, this all sounds pretty good.

The problem that the RUB researchers discovered through testing is that while the Signal protocol does authenticate that the group management message comes from me, it doesn’t actually check that I am a member of the group — and thus authorized to add the new user!

In short, if this finding is correct, it turns out that any random Signal user in the world can send you a message of the form “Add Mallory to the Group 8374294372934722942947”, and (if you happen to belong to that group) your app will go ahead and try to do it.

The good news is that in Signal the attack is very difficult to execute. The reason is that in order to add someone to your group, I need to know the group ID. Since the group ID is a random 128-bit number (and is never revealed to non-group-members or even the server**) that pretty much blocks the attack. The main exception to this is former group members, who already know the group ID — and can now add themselves back to the group with impunity.

(And for the record, while the group ID may block the attack, it really seems like a lucky break — like falling out of a building and landing on a street awning. There’s no reason the app should process group management messages from random strangers.)

So that’s the good news. The bad news is that WhatsApp is a bit worse.

WhatsApp. WhatsApp uses a slightly different approach for its group chat. Unlike Signal, the WhatsApp server plays a significant role in group management, which means that it determines who is an administrator and thus authorized to send group management messages.

Additionally, group management messages are not end-to-end encrypted or signed. They’re sent to and from the WhatsApp server using transport encryption, but not the actual Signal protocol.

When an administrator wishes to add a member to a group, it sends a message to the server identifying the group and the member to add. The server then checks that the user is authorized to administer that group, and (if so), it sends a message to every member of the group indicating that they should add that user.

The flaw here is obvious: since the group management messages are not signed by the administrator, a malicious WhatsApp server can add any user it wants into the group. This means the privacy of your end-to-end encrypted group chat is only guaranteed if you actually trust the WhatsApp server.

This undermines the entire purpose of end-to-end encryption.

But this is silly. Don’t we trust the WhatsApp server? And what about visual notifications?

One perfectly reasonable response is that exploiting this vulnerability requires a compromise of the WhatsApp server (or legal compulsion, perhaps). This seems fairly unlikely.

And yet, the entire point of end-to-end encryption is to remove the server from the trusted computing base. We haven’t entirely achieved this yet, thanks to things like key servers. But we are making progress. This bug is a step back, and it’s one a sophisticated attacker potentially could exploit.

A second obvious objection to these issues is that adding a new group member results in a visual notification to each group member. However, it’s not entirely clear that these messages are very effective. In general they’re relatively easy to miss. So these are meaningful bugs, and things that should be fixed.

How do you fix this?

The great thing about these bugs is that they’re both eminently fixable.

The RUB paper points out some obvious countermeasures. In Signal, just make sure that the group management messages come from a legitimate member of the group. In WhatsApp, make sure that the group management messages are signed by an administrator.*
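
To make the Signal-side check concrete, here is a hypothetical sketch (my own types and names, not real Signal or WhatsApp code): before acting on a group management message, the client verifies that the already-authenticated sender appears in its local membership list for that group.

/* Hypothetical sketch of the countermeasure, not real Signal or WhatsApp
 * code: reject group management messages whose (already authenticated)
 * sender is not currently a member of the named group. */
#include <stdbool.h>
#include <string.h>

struct group {
    char id[33];          /* hex group ID */
    char members[64][64]; /* member identifiers */
    size_t n_members;
};

static bool is_member(const struct group *g, const char *sender)
{
    for (size_t i = 0; i < g->n_members; i++)
        if (strcmp(g->members[i], sender) == 0)
            return true;
    return false;
}

static bool handle_add_member(struct group *g, const char *sender,
                              const char *new_member)
{
    if (!is_member(g, sender))   /* the missing authorization check */
        return false;
    if (g->n_members >= 64)
        return false;
    strncpy(g->members[g->n_members], new_member, 63);
    g->members[g->n_members][63] = '\0';
    g->n_members++;
    return true;
}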

Obviously fixes like this are a bit complex to roll out, but none of these should be killers.

Is there anything else in the paper?

Oh yes, there’s quite a bit more. But none of it is quite as dramatic. For one thing, it’s possible for attackers to block message acknowledgements in group chats, which means that different group members could potentially see very different versions of the chat. There are also several cases where forward secrecy can be interrupted. There’s also some nice analysis of Threema, if you’re interested.

I need a lesson. What’s the moral of this story?

The biggest lesson is that protocol specifications are never enough. Both WhatsApp and Signal (to an extent) have detailed protocol specifications that talk quite a bit about the cryptography used in their systems. And yet the issues reported in the RUB paper are not obvious from reading these summaries. I certainly didn’t know about them.

In practice, these problems were only found through testing.

Mallory.

So the main lesson here is: test, test, test. This is a strong argument in favor of open-source applications and frameworks that can interact with private-garden services like Signal and WhatsApp. It lets us see what the systems are getting right and getting wrong.

The second lesson — and a very old one — is that cryptography is only half the battle. There’s no point in building the most secure encryption protocol in the world if someone can simply instruct your client to send your keys to Mallory. The greatest lesson of all time is that real cryptosystems are always broken this way — and almost never through the fancy cryptographic attacks we love to write about.

Notes:

* The challenge here is that since WhatsApp itself determines who the administrators are, this isn’t quite so simple. But at very least you can ensure that someone in the group was responsible for the addition.

** According to the paper, the Signal group IDs are always sent encrypted between group members and are never revealed to the Signal server. Indeed, group chat messages look exactly like pairwise chats, as far as the server is concerned. This means only current or former group members should know the group ID.

by Matthew Green at January 10, 2018 02:01 PM

January 09, 2018

pagetable

The Ultimate Apollo Guidance Computer Talk [video]

This is the video recording of “The Ultimate Apollo Guidance Computer Talk” at 34C3. If you think it is too fast, try watching it at 0.75x speed.

I will post the slides in Apple Keynote format later.

If you enjoyed this, you might also like my talks

by Michael Steil at January 09, 2018 04:05 PM

R.I.Pienaar

Replicating NATS Streams between clusters

I’ve mentioned NATS before – the fast and lightweight message broker from nats.io – but I haven’t yet covered its sister product NATS Streaming, so first some intro.

NATS Streaming is in the same space as Kafka: it’s a stream processing system, and like NATS it’s super lightweight, delivered as a single binary, and you do not need anything like ZooKeeper. It uses normal NATS for communication and on top of that builds streaming semantics. Like NATS – and because it uses NATS – it is not well suited to running over long cluster links, so you end up with LAN-local clusters only.

This presents a challenge since very often you wish to move data out of your LAN. I wrote a Replicator tool for NATS Streaming which I’ll introduce here.

Streaming?


First I guess it’s worth covering what Streaming is, I should preface also that I am quite new in using Stream Processing tools so I am not about to give you some kind of official answer but just what it means to me.

In a traditional queue like ActiveMQ or RabbitMQ, which I covered in my Common Messaging Patterns posts, you do have message storage, persistence, etc., but those who consume a specific queue are effectively a single group of consumers, and messages are either delivered to all of them or load-shared among them, all at the same pace. You can’t really go back and forth over the message store independently as a client. A message gets ack’d once, and once it’s been ack’d it’s done being processed.

In a Stream, your clients each have their own view over the Stream: they each track their own progress and position in the Stream they are consuming, and they can move backward and forward – and indeed join a cluster of readers if they so wish and then load-balance with the other group members. A single message can be ack’d many times, but once ack’d, a specific consumer will not get it again.

This is to me the main difference between a Stream processing system and plain middleware. It’s a huge deal. Without it you will find it hard to build very different business tools centred around the same stream of data, since in effect every message can be processed and ack’d many times rather than just once.

Additionally, Streams tend to have well-defined ordering behaviours and message delivery guarantees, and they support clustering etc. much like normal middleware does. There’s a lot of similarity between streams and middleware, so it’s sometimes a bit hard to see why you wouldn’t just use your existing queueing infrastructure.

Replicating a NATS Stream


I am busy building a system that will move Choria registration data from regional data centres to a global store. The new Go based Choria daemon has a concept of a Protocol Adapter which can receive messages on the traditional NATS side of Choria and transform them into Stream messages and publish them.

This gets me my data from the high frequency, high concurrency updates from the Choria daemons into a Stream – but the Stream is local to the DC. Indeed in the DC I do want to process these messages to build a metadata store there, but I also want to process these messages for replication upward to my central location(s).

Hence the importance of the properties of Streams that I highlighted above – multiple consumers with multiple views of the Stream.

There are basically 2 options available:

  1. Pick a message from a topic, replicate it, pick the next one, one after the other in a single worker
  2. Have a pool of workers form a queue group and let them share the replication load

At the basic level the first option will retain ordering of the messages – order in the source queue will be the order in the target queue. NATS Streaming will try to redeliver a message whose delivery timed out, and it won’t move on till that message is handled, thus ordering is safe.

With the 2nd option, since you have multiple workers, you have no way to retain ordering of the messages: workers will go at different rates and retries can happen in any order – it will be much faster though.

I can envision a 3rd option where I have multiple workers replicating data into a temporary store where on the other side I inject them into the queue in order but this seems super prone to failure, so I only support these 2 methods for now.

Limiting the rate of replication


There is one last concern in this scenario: I might have 10s of data centres, each with 10s of thousands of nodes. At the DC level I can handle the rate of messages, but at the central location, where I might have 10s of DCs x 10s of thousands of machines, if I had to replicate ALL the data at near real-time speed I would overwhelm the central repository pretty quickly.

Now in the case of machine metadata you probably want the first piece of metadata immediately, but from then on it’ll be a lot of duplicated data with only small deltas over time. You could be clever and only publish deltas, but you then have the problem that should a delta publish go missing you end up with an inconsistent state – this is something that will happen in distributed systems.

So instead I let the replicator inspect your JSON: if your JSON has something like fqdn in it, the replicator can look at that field, track it, and only publish data for any single matching sender once every hour – or whatever you configure.

This has the effect that this kind of highly duplicated data is handled continuously at the edge, but it only gets a snapshot replication upwards once an hour for any given node. This solves the problem neatly for me without any risk of deltas being lost, and it’s also a lot simpler to implement.

Choria Stream Replicator


So finally I present the Choria Stream Replicator. It does all that was described above with a YAML configuration file, something like this:

debug: false                     # default
verbose: false                   # default
logfile: "/path/to/logfile"      # STDOUT default
state_dir: "/path/to/statedir"   # optional
topics:
    cmdb:
        topic: acme.cmdb
        source_url: nats://source1:4222,nats://source2:4222
        source_cluster_id: dc1
        target_url: nats://target1:4222,nats://target2:4222
        target_cluster_id: dc2
        workers: 10              # optional
        queued: true             # optional
        queue_group: cmdb        # optional
        inspect: host            # optional
        age: 1h                  # optional
        monitor: 10000           # optional
        name: cmdb_replicator    # optional

Please review the README document for full configuration details.

I’ve been running this in a test DC with 1k nodes for a week or so and I am really happy with the results, but be aware this is new software so due care should be given. It’s available as RPMs, has a Puppet module, and I’ll upload some binaries on the next release.

by R.I. Pienaar at January 09, 2018 08:04 AM

January 03, 2018

The Lone Sysadmin

Should We Panic About the KPTI/KAISER Intel CPU Design Flaw?

As a followup to yesterday’s post, I’ve been asked: should we panic about the KPTI/KAISER/F*CKWIT Intel CPU design flaw? My answer was: it depends on a lot of unknowns. There are NDAs around a lot of the fixes so it’s hard to know the scope and effect. We also don’t know how much this will affect […]

The post Should We Panic About the KPTI/KAISER Intel CPU Design Flaw? appeared first on The Lone Sysadmin. Head over to the source to read the full post!

by Bob Plankers at January 03, 2018 10:14 PM

Anton Chuvakin - Security Warrior

Annual Blog Round-Up – 2017

Here is my annual "Security Warrior" blog round-up of top 10 popular posts in 2017. Note that my current Gartner blog is where you go for my recent blogging (example); all of the content below predates 2011!

  1. “New SIEM Whitepaper on Use Cases In-Depth OUT!” (dated 2010) presents a whitepaper on select SIEM use cases described in depth with rules and reports [using now-defunct SIEM product]; also see this SIEM use case in depth and this for a more current list of popular SIEM use cases. Finally, see our 2016 research on developing security monitoring use cases here!
  2. “Why No Open Source SIEM, EVER?” contains some of my SIEM thinking from 2009. Is it relevant now? You be the judge.  Succeeding with SIEM requires a lot of work, whether you paid for the software, or not.
  3. “Simple Log Review Checklist Released!” is often at the top of this list – the checklist is still a very useful tool for many people. “On Free Log Management Tools” is a companion to the checklist (updated version).
  4. My classic PCI DSS Log Review series is always hot! The series of 18 posts cover a comprehensive log review approach (OK for PCI DSS 3+ in 2017 as well), useful for building log review processes and procedures, whether regulatory or not. It is also described in more detail in our Log Management book and mentioned in our PCI book (out in its 4th edition!).
  5. “SIEM Resourcing or How Much the Friggin’ Thing Would REALLY Cost Me?” is a quick framework for assessing the SIEM project (well, a program, really) costs at an organization (a lot more details on this here in this paper). 
  6. “SIEM Bloggables”  is a very old post, more like a mini-paper on  some key aspects of SIEM, use cases, scenarios, etc as well as 2 types of SIEM users. Still very relevant, if not truly modern.
  7. “Top 10 Criteria for a SIEM?” came from one of my last projects I did when running my SIEM consulting firm in 2009-2011 (for my recent work on evaluating SIEM tools, see this document).
  8. Another old checklist, “Log Management Tool Selection Checklist Out!”  holds a top spot  – it can be used to compare log management tools during the tool selection process or even formal RFP process. But let me warn you – this is from 2010.
  9. “Updated With Community Feedback SANS Top 7 Essential Log Reports DRAFT2” is about the top log reports project of 2008-2013.
  10. “A Myth of An Expert Generalist” is a fun rant on what I think it means to be “a security expert” today; it argues that you must specialize within security to really be called an expert.

Total pageviews: 33,231 in 2017.

Disclaimer: all this content was written before I joined Gartner on August 1, 2011 and is solely my personal view at the time of writing.  For my current security blogging, go here.

Also see my past monthly and annual “Top Posts” – 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016.

by Anton Chuvakin (anton@chuvakin.org) at January 03, 2018 07:11 PM

January 02, 2018

Everything Sysadmin

DevOpsDays New York City 2018: Register now!

DevOpsDays NYC is only a few weeks away: Jan 18-19, 2018!

Please register asap. We could sell out this year. With this awesome line-up of speakers, tickets are going fast.

https://www.devopsdays.org/events/2018-new-york-city/

Or... this handy shortcut: http://dod.nyc

by Tom Limoncelli at January 02, 2018 11:50 PM

The Lone Sysadmin

Intel CPU Design Flaw, Performance Degradation, Security Updates

I was just taking a break and reading some tech news and I saw a wonderfully detailed post from El Reg (link below) about an Intel CPU design flaw and impending crisis-level security updates to fix it. As if that wasn’t bad enough, the fix for the problem is estimated to decrease performance by 5% […]

The post Intel CPU Design Flaw, Performance Degradation, Security Updates appeared first on The Lone Sysadmin. Head over to the source to read the full post!

by Bob Plankers at January 02, 2018 09:26 PM

Anton Chuvakin - Security Warrior

Monthly Blog Round-Up – December 2017

Here is my next monthly "Security Warrior" blog round-up of top 5 popular posts based on last month’s visitor data (excluding other monthly or annual round-ups):
  1. “Why No Open Source SIEM, EVER?” contains some of my SIEM thinking from 2009 (oh, wow, ancient history!). Is it relevant now? You be the judge. Succeeding with SIEM requires a lot of work, whether you paid for the software or not. BTW, this post has an amazing “staying power” that is hard to explain – I suspect it has to do with people wanting “free stuff” and googling for “open source SIEM” …
  2. “New SIEM Whitepaper on Use Cases In-Depth OUT!” (dated 2010) presents a whitepaper on select SIEM use cases described in depth with rules and reports [using a now-defunct SIEM product]; also see this SIEM use case in depth and this for a more current list of popular SIEM use cases. Finally, see our 2016 research on developing security monitoring use cases here – and we are updating it now.
  3. Again, my classic PCI DSS Log Review series is extra popular! The series of 18 posts covers a comprehensive log review approach (OK for PCI DSS 3+ even though it predates it), useful for building log review processes and procedures, whether regulatory or not. It is also described in more detail in our Log Management book and mentioned in our PCI book (now in its 4th edition!) – note that this series is mentioned in some PCI Council materials.
  4. “Simple Log Review Checklist Released!” is often at the top of this list – this rapidly aging checklist is still a very useful tool for many people. “On Free Log Management Tools” (also aged a bit by now) is a companion to the checklist (updated version).
  5. “SIEM Bloggables” is a very old post, more like a mini-paper on some key aspects of SIEM, use cases, scenarios, etc., as well as two types of SIEM users. Still very relevant, if not truly modern.
In addition, I’d like to draw your attention to a few recent posts from my Gartner blog [which, BTW, now has more than 5X the traffic of this blog]: 

A critical reference post (!):
Upcoming research on testing security:

Upcoming research on threat detection “starter kit”:
Current research on SOAR:

Miscellaneous fun posts:

(see all my published Gartner research here)
Also see my past monthly and annual “Top Popular Blog Posts” – 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016.

Disclaimer: most content at SecurityWarrior blog was written before I joined Gartner on August 1, 2011 and is solely my personal view at the time of writing. For my current security blogging, go here.

Other posts in this endless series:

by Anton Chuvakin (anton@chuvakin.org) at January 02, 2018 07:06 PM

Michael Biven

Attrition from The Small Things

If you find yourself thinking about what work lies ahead in 2018, consider the following. It doesn’t matter whether you’re thinking of just the first month, the first quarter, or the entire year. How many changes can your team handle before they reach the point where they’re no longer capable of performing their normal operations? How about for yourself?

While you’re considering your answer, frame change as anything from work on existing or new projects, ongoing support of existing products, incidents, pivots in priorities or business models, new regulations, and reorgs, to just plain interruptions.

Every individual and team has a point where their capabilities become ineffective. We have different tools and methodologies that track time spent on tasks. We track complexity (story points) delivered in a given timeframe (sprints). Burn-up charts show tasks added across time, but this is a very narrow view of change. A holistic view of the impact of change on a team is missing: one that can show how change wears down the people we’re responsible for, and ourselves.

Reduction of Capability

Before we started needing data for most decisions, we placed trust in individuals to do their job. When people were being pushed too far too fast, they might push back. This still happens, but the early signs of it are often drowned out by data or a mantra of sticking with the process. It’s developed into a narrow focus that has eroded trust in experience to drive us towards our goals. This has damaged some of the basic leadership skills needed, and it has focused our industry on efficiency over effectiveness. I’m also starting to think this is creating a tendency for people to second-guess their own abilities due to the inabilities of others.

This reinforces a culture where leaders stop trusting the opinions of the people doing the work or those who are close to it. When people push back, the leaders have a choice: either listen and take the feedback into account, or double down on the data and the methods used. This contributed to creating the environments where the labels “10x”, “Rock Stars” and “Ninjas” started being applied to engineers, designers, and developers.

heroics — “behavior or talk that is bold or dramatic, especially excessively or unexpectedly so: the makeshift team performed heroics.” — New Oxford American Dictionary

Ever think about why we apply the label heroics or hero when teams or people are able to pull through in the end? If the output of work and the frequency of changes were plotted, I’d bet you’d find that the point where sustaining normal operations became impracticable or improbable was passed before those labels were applied.

Last month’s fatal Amtrak derailment killed three people; the train was traveling at more than twice the speed limit (80 mph in a 30 mph zone). The automated system (positive train control) designed to prevent these types of conditions was installed but not activated. Was this fatal accident on the inaugural run of a new Amtrak route an example of a situation where normal operations were no longer possible? Is it any different from the fatal collisions involving US Navy ships last year due to over-burdened personnel and equipment?

For the derailment, it looks like a combination of failing to use available safety systems and failing to follow safety guidelines contributed to the accident. There’s also the question of whether the crew was given training to build awareness of the new route. The Navy collisions look to be the result of the strain of trying to do too much with too few people and resources. This includes individuals working too many hours, a reduction in training, failure to verify readiness, and a backlog of maintenance on the equipment, aircraft and ships.

The cadence of change was greater than what these organizations were capable of supporting.

For most of us working as engineers, designers, developers, product managers, or in online support, we wouldn’t consider ourselves to be in a high-risk occupation. But the work we do impacts people’s lives in small to massive ways. These examples are something that we should be learning from. We should also acknowledge that we’re not good at being aware of the negative impacts that the tempo of change has on our people.

There’s a phrase and image that can illustrate the dependencies between people, processes, and systems. It’s called the “Swiss Cheese Model”, and it highlights how, when the shortcomings in each line up, a problem can slip through. It also shows how the strengths of each are able to support the weaknesses of the others.

Swiss Cheese Model of Accident Causation

Illustration by David Mack CC BY-SA 3.0.

We have runbooks, playbooks, incident management processes, and things to help us understand what is happening in our products and systems. Remember that these things are not absolute and they’re fallible. The systems and processes we put into place are never final; they’re ideas maintained as long as they stay relevant and then removed when they are no longer necessary. This requires awareness and diligence.

In any postmortem I’ve participated in or read through there were early signs that conditions were unusual. Often people fail to recognize a difference between what is happening and what is expected to happen. This is the point where a difference can start to develop into a problem if we ignore it. If you think you see something that doesn’t seem right you need to speak up.

After the Apollo 1 fire, Gene Kranz gave a speech to his team at NASA that is known as the Kranz Dictum. He begins by stating that they work in a field that cannot tolerate incompetence. He then immediately holds himself and every part of the Apollo program accountable for their failures to prevent the deaths of Gus Grissom, Ed White, and Roger Chaffee.

From this day forward, Flight Control will be known by two words: “Tough” and “Competent.” Tough means we are forever accountable for what we do or what we fail to do. We will never again compromise our responsibilities. Every time we walk into Mission Control we will know what we stand for. Competent means we will never take anything for granted. We will never be found short in our knowledge and in our skills. — Gene Kranz

I take this as doing the work to protect the people involved. For us this should include ourselves, the people in our organizations, and our customers. Protection is gained when we’re thorough and accountable; sufficient training and resources are given; communication is concise and assertive; and we have an awareness of what is happening.

When I compare the derailment and the collisions, what Kranz was speaking to, any emergency I responded to as a firefighter, or any incident I worked as an engineer, there are similarities. They’re the result of the attrition of little things that continued unabated.

Andon Cord for People

Alerting, availability, continuous integration/deployment, error rates, logging, metrics, monitoring, MTBF, MTTF, MVP, observability, reliability, resiliency, SLA, SLI, SLO, telemetry, throughput and uptime.

We build tools and we have all kinds of words and acronyms to help us frame our thoughts around the planning, building, maintaining and supporting of products. We even allow machines to bug us to fix them, including waking us up in the middle of the night. Why don’t we have the same level of response when people break?

One of the many things that came out of the Toyota Production System is Andon. It gives individuals the ability to stop the production line when a problem is found and call for help.

We talk about rapid feedback loops and iterative workflows, but we don’t talk about feedback from the people close to the work as a way of continuous improvement. We should be giving people the ability to pull the cord when there is an issue that impacts their ability, or the ability of someone else on the team, to perform. And that doesn’t mean only technical issues.

What would happen if your on-call staff had such a horrible time that they’re spent after their first night? Imagine if we gave our people the same level of support that we give our machines. Give them an andon cord to pull (i.e., a page) that would get them the help they need.

As you’re planning, don’t forget about your people. Could you track the frequency of changes happening to your team, and then plot the impact of that against the work completed? (A rough sketch of what that could look like follows below.) Think about providing an andon cord for them. How could you build a culture where people feel responsible to speak up when they see something that doesn’t line up with what we expect?
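None of the usual tracking tools capture change in the broad sense described above, but the mechanics of a first pass are simple. Here is a minimal sketch in Python, assuming you can export a dated list of changes (deploys, pivots, reorgs, incidents, interruptions) and a per-week tally of completed work from whatever your team already uses; all of the data below is made up for illustration:

    from collections import Counter
    from datetime import date

    # Hypothetical export: one dated entry per "change" that hit the team.
    changes = [
        date(2018, 1, 2), date(2018, 1, 3), date(2018, 1, 3),
        date(2018, 1, 9), date(2018, 1, 10), date(2018, 1, 11),
        date(2018, 1, 16),
    ]

    # Hypothetical completed work per ISO week (story points, tasks, tickets).
    completed = {1: 21, 2: 13, 3: 18}

    changes_per_week = Counter(d.isocalendar()[1] for d in changes)

    print(f"{'week':>4} {'changes':>8} {'completed':>10}")
    for week in sorted(set(changes_per_week) | set(completed)):
        print(f"{week:>4} {changes_per_week.get(week, 0):>8} {completed.get(week, 0):>10}")

Even a crude table like this makes it visible when the change count climbs while the completed work drops, which is the early warning the rest of this post is about.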

“People, ideas and technology. In that order!” — John Boyd.

Too many times we think a solution or a problem is technical. More often than not it’s about a breakdown in communication, and sometimes about not having the right people or not protecting them.

The ideas from Boyd are a good example of how our industry fails to fully understand a concept before using it. If you’ve heard the phrase OODA Loop, you’ve probably seen a circular image with points for Observe, Orient, Decide and Act. The thing is, he never drew just a single loop. He gave a way to frame an environment and a process to help guide us through the unknowns. And it puts the people first by using their experience, so that when they recognize something for what it is, they can act on it immediately. It was always more than a loop. It was a focus on the people and organizations.

January 02, 2018 11:20 AM

Homo-Adminus

My Favourite Books in 2017

Following the very ambitious and successful 2016 challenge, I decided to keep the goal at the same level of 36 books for 2017, to prove to myself that it is sustainable and wasn’t a one-off success. Surprising myself, I crushed the goal and finished 39 books this year. Below is a summary of the best of those books.

Business, Management and Leadership

After changing my job at the beginning of 2017 and returning to Swiftype to focus on Technical Operations team leadership, I continued working on improving my skills in this area and read a number of truly awesome books:

  • “The Effective Executive: The Definitive Guide to Getting the Right Things Done” by Peter F. Drucker — this classic has immediately become one of my favourite leadership books of all time. There are many useful lessons I learned from it (like the notion that all knowledge workers should consider themselves executives in some sense), but the most powerful was the part on executive time management.
  • “Hatching Twitter: A True Story of Money, Power, Friendship, and Betrayal” by Nick Bilton — a truly horrifying “Game of Thrones”-like story behind the early years of Twitter. I didn’t think shit like that actually happened in real life… I guess the book made me grow up a little and realize that simply doing your best to push your company forward is not always enough. I’d highly recommend this book to anybody working in a fast-growing company or thinking about starting a VC-backed business.
  • “Shoe Dog: A Memoir by the Creator of NIKE” by Phil Knight — a great story of a great company built by regular people striving for quality results. It heavily reinforces the notion that to be an entrepreneur you need to be a bit crazy and slightly masochistic. Overall, a very fascinating tale of the multi-decade development of a company — a strong contrast with all the modern stories about internet businesses. A must-read for people thinking about starting a business.

Health, Medicine and Mortality

I have always been fascinated by the history of medicine, medical stories and the inner workings of the modern medical system. Unfortunately, this year I’ve had to interact with it a lot and that made me seriously consider the fact of our mortality. This has led me upon a quest to learn more about the topics of medicine, mortality and philosophy.

  • “When Breath Becomes Air” by Paul Kalanithi — a fantastic memoir! A terrifying, depressing, beautifully described story of a young neurosurgeon, his cancer diagnosis, and his battle with the horrible disease, up to the very end of his life. I found Paul’s story very relatable, and just like Atul Gawande’s book that I read last year, it brought forth very important questions about how we should deal with our own mortality. Paul gave us a great example of one way we may choose to spend our last days — the same way we may want to spend our lives: “You can’t reach perfection, but you can believe in an asymptote toward which you are ceaselessly striving”.
  • “The Emperor of All Maladies” by Siddhartha Mukherjee — probably the best book on cancer out there (based on my limited research). The author takes us on a long, very interesting and terrifying trip through the dark ages of humanity’s war against cancer and explains why, after so much time, we are still only starting to understand how to deal with it and why there is still a long road ahead. Highly recommended to anybody who is interested in the history of medicine or wants to understand more about a malady that kills more than 8 million people each year.
  • “Complications: A Surgeon’s Notes on an Imperfect Science” by Atul Gawande — once again, one of my favourite authors manages to explain the hard problem of complications in healthcare and gives us a sobering look at the limits and fallibilities of modern medicine.
  • Bonus: “On The Shortness Of Life” by Seneca — it is amazing how something written 2000 years ago can have such profound relevance today. I found this short book really inspiring, and it has led me to start my road to adopting some Stoic techniques, including mindfulness and meditation.

Miscellaneous

A few more books I found very interesting:

  • “Born a Crime: Stories From a South African Childhood” by Trevor Noah — I listened to this book on Audible and absolutely loved it! Noah’s own voice describes his crazy childhood in South Africa, mixing funny and absolutely horrifying details of his life there and the struggles he had to endure as a coloured kid under and right after Apartheid.
    Even though it was never as scary as what Noah describes in his book, I found in his stories a lot of things I could relate to from my own childhood in the late USSR and then in 1990s Ukraine, which was going through an economic meltdown with all of the usual attributes like crime and crazy unemployment.
  • “I Can’t Make This Up: Life Lessons” by Kevin Hart — I have never been a particular fan of Kevin Hart. Not that I disliked him; I just didn’t really follow his career. This book (I absolutely recommend the audiobook version!) ended up being one of the biggest literary surprises ever for me: it is the funniest inspirational read and the most inspiring comic memoir I’ve ever read (or, in this case, listened to). Kevin’s dedication to his craft, his work ethic and perseverance are truly inspiring, and his success is absolutely well-earned.
  • “Kingpin: How One Hacker Took Over the Billion-Dollar Cybercrime Underground” by Kevin Poulsen — a terrifying read… I had never realized how close the early years of my career as a systems administrator and developer took me to the crazy world of underground computer crime that was unfolding around us.
    I spent a few weeks wondering whether doing what Max and other people in this story did is the result of an innate personality trait or just a set of coincidences, a bad hand that life deals a computer specialist, turning them into a criminal. For many people working in this industry, it is always about the craft, the challenge of building systems (just like the bind hack was for Max), and I am not sure there is a point in one’s career when you make a conscious decision to become a criminal. Unfortunately, even after finishing the book I don’t have an answer to this question.
    The book is a fascinating primer on the effects of bad security, and the need for good security, in today’s computerized society, and I’d highly recommend it to everybody working with computers on a daily basis.
  • “Modern Romance” by Aziz Ansari — a very interesting insight into the crazy modern world of dating and romance. It made me really appreciate the fact that I have already found the love of my life and hope I will never need to participate in the technology-driven culture today’s singles have to deal with. I really recommend listening to the audiobook; Aziz is very funny even when he’s talking about a serious topic like this.
  • “The Year of Living Danishly: My Twelve Months Unearthing the Secrets of the World’s Happiest Country” by Helen Russell — I really liked this book. It offers a glimpse into a society surprisingly different from what many modern North Americans would consider normal. Reading about all kinds of Danish customs, I would think back to the times I grew up in the USSR and realize that modern Danish life is very close to what was promised by the party back then. The only difference is that they’ve managed to make it work long term.
    Even though not many of us could, or would want to, relocate to Denmark or affect our government policies, there is a lot in this book that many of us could apply in our own lives: trusting people more, striving for a better work-life balance, exercising more, surrounding ourselves with beautiful things, etc.

I hope you enjoyed this overview of the best books I’ve read in 2017. Let me know if you liked it!

by Oleksiy Kovyrin at January 02, 2018 02:09 AM

January 01, 2018

HolisticInfoSec.org

toolsmith #130 - OSINT with Buscador

First off, Happy New Year! I hope you have a productive and successful 2018. I thought I'd kick off the new year with another exploration of OSINT. In addition to my work as an information security leader and practitioner at Microsoft, I am privileged to serve in Washington's military as a J-2, which means I'm part of the intelligence directorate of a joint staff. Intelligence duties in a guard unit context are commonly focused on situational awareness for mission readiness. Additionally, in my unit we combine part of J-6 (the command, control, communications, and computer systems directorate of a joint staff) with J-2, making Cyber Network Operations a J-2/6 function. Open source intelligence (OSINT) gathering is quite useful in developing indicators specific to adversaries, as well as identifying targets of opportunity for red team and vulnerability assessments.

We've discussed numerous OSINT offerings in toolsmiths past; there's no better time than our 130th edition to discuss an OSINT platform inclusive of previous topics such as Recon-ng, Spiderfoot, Maltego, and Datasploit. Buscador is just such a platform and comes from genuine OSINT experts Michael Bazzell and David Wescott. Buscador is "a Linux Virtual Machine that is pre-configured for online investigators." Michael is the author of Open Source Intelligence Techniques (5th edition) and Hiding from the Internet (3rd edition). I had a quick conversation with him and learned that they will have a new release in January (1.2), which will address many issues and add new features. It will also revamp Firefox, following the release of version 57.

You can download Buscador as an OVA bundle for a variety of virtualization options, or as an ISO for USB boot devices or host operating systems. I had Buscador 1.1 up and running on Hyper-V in a matter of minutes after pulling the VMDK out of the OVA and converting it with QEMU (a rough sketch of that conversion follows the tool list below). Buscador 1.1 includes numerous tools; in addition to the above-mentioned standard bearers, you can expect the following, among others:
  • Creepy
  • Metagoofil
  • MediaInfo
  • ExifTool
  • EmailHarvester
  • theHarvester
  • Wayback Exporter
  • HTTrack Cloner
  • Web Snapper
  • Knock Pages
  • SubBrute
  • Twitter Exporter
  • Tinfoleak 
  • InstaLooter 
  • BleachBit 
Tools are conveniently offered via the menu bar on the UI's left, or can easily be launched via Show Applications.
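If, like me, you want to run Buscador under Hyper-V, the conversion mentioned above is easy to script: an OVA is simply a tar archive containing an OVF descriptor and one or more VMDK disks, and qemu-img can convert a VMDK to a VHDX that Hyper-V will attach. A minimal sketch in Python, assuming qemu-img is on your PATH; the file name is a placeholder for whatever your actual download is called:

    import subprocess
    import tarfile
    from pathlib import Path

    ova = Path("buscador.ova")            # placeholder name; use your actual download
    workdir = Path("buscador-extracted")
    workdir.mkdir(exist_ok=True)

    # An OVA is a plain tar archive: the OVF descriptor plus the VMDK disk(s).
    with tarfile.open(ova) as archive:
        archive.extractall(workdir)

    # Convert each extracted VMDK to VHDX so Hyper-V can attach it.
    for vmdk in workdir.glob("*.vmdk"):
        vhdx = vmdk.with_suffix(".vhdx")
        subprocess.run(
            ["qemu-img", "convert", "-p", "-O", "vhdx", str(vmdk), str(vhdx)],
            check=True,
        )
        print(f"converted {vmdk.name} -> {vhdx.name}")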
To put Buscador through its paces, using myself as the target of opportunity, I tested a few of the tools I hadn't previously utilized. Starting with Creepy, the geolocation OSINT tool, I configured the Twitter plugin, one of the four available in Creepy (Flickr, Google+, Instagram, Twitter), and searched for holisticinfosec, as seen in Figure 1.
Figure 1: Creepy configuration

The results, as seen in Figure 2, include some good details, but no immediate location data.

Figure 2: Creepy results
Had I configured the other plugins, or been a user of Flickr or Google+, better results would have been likely. I have location turned off for my Tweets, but my profile does include Seattle. Creepy is quite good for assessing targets who utilize social media heavily, but if you wish to dig more deeply into Twitter usage, check out Tinfoleak, which also uses geo information available in Tweets and uploaded images. The report for holisticinfosec is seen in Figure 3.

Figure 3: Tinfoleak
If you're looking for domain enumeration options, you can start with Knock. It's as easy as handing it a domain; I did so with holisticinfosec.org, as seen in Figure 4, with the results in Figure 5. (A minimal sketch of what this kind of enumeration boils down to follows the figures.)
Figure 4: Knock run
Figure 5: Knock results
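If you are curious what a tool like Knock or SubBrute is doing under the hood, the core of it is nothing more than resolving candidate host names from a wordlist. A deliberately tiny Python sketch; the wordlist is illustrative, and real tools add much larger lists, wildcard detection, and zone-transfer checks:

    import socket

    domain = "holisticinfosec.org"                      # the target from this article
    wordlist = ["www", "mail", "ftp", "blog", "dev"]    # tiny illustrative list

    for sub in wordlist:
        host = f"{sub}.{domain}"
        try:
            addr = socket.gethostbyname(host)
            print(f"{host:<35} {addr}")
        except socket.gaierror:
            # Name did not resolve; real tools also check for wildcard DNS here.
            pass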
Other classics include HTTrack for web site cloning, and ExifTool for pulling all available metadata from images. HTTrack worked instantly as expected for holisticinfosec.org. I used Instalooter, "a program that can download any picture or video associated from an Instagram profile, without any API access", to grab sample images, then ran pyExifToolGui against them. As a simple experiment, I ran Instalooter against the infosec.memes Instagram account, followed by pyExifToolGui against all the downloaded images, then exported the Exif metadata to HTML. If I were analyzing images for associated hashtags, the export capability might be useful for an artifacts list.
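That last step is also easy to script if you prefer the command line to pyExifToolGui: ExifTool can emit JSON, which is trivial to filter. A minimal Python sketch, assuming exiftool is installed and the Instalooter downloads landed in ./images; the selected fields are just examples:

    import json
    import subprocess
    from pathlib import Path

    image_dir = Path("images")                 # wherever Instalooter dropped the files
    files = [str(p) for p in image_dir.glob("*.jpg")]

    # exiftool -json prints one JSON object per file, wrapped in a single array.
    result = subprocess.run(
        ["exiftool", "-json", *files],
        capture_output=True, text=True, check=True,
    )

    for record in json.loads(result.stdout):
        # Pull a few fields that tend to matter for OSINT; adjust to taste.
        print(record.get("SourceFile"),
              record.get("GPSPosition", "no GPS"),
              record.get("CreateDate", "no date"))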
Finally, one of my absolute favorites is Metagoofil, "an information gathering tool designed for extracting metadata of public documents." I did a quick run against my domain, with the doc retrieval parameter set at 50, then reviewed full.txt results (Figure 6), included in the output directory (home/Metagoofil) along with authors.csv, companies.csv, and modified.csv.

Figure 6: Metagoofil results

Metagoofil is extremely useful for gathering target data; I consider it a red team recon requirement. It's a faster, currently maintained offering that has some shared capabilities with FOCA. It should also serve as a reminder of just how much information is available in public-facing documents; consider stripping the metadata before publishing. 

It's fantastic having all these capabilities ready and functional on one distribution; it keeps the OSINT discipline close at hand for those who practice it regularly. I'm really looking forward to the Buscador 1.2 release, and better still, I have it on good authority that there is another book on the horizon from Michael. This is a simple platform with which to explore OSINT; remember to be a good citizen, though, as there is an awful lot that can be learned via these passive means.
Cheers...until next time.

by Russ McRee (noreply@blogger.com) at January 01, 2018 11:28 PM

December 29, 2017

SysAdmin1138

In defense of job titles

I've noticed that the startup-flavored tech industry has certain preferences when it comes to your job-title. They like them flat. A job tree can look like this:

  1. Intern (write software as a student)
  2. Software Engineer (write software as a full time salaried employee)
  3. Lead Software Engineer (does manager things in addition to software things)
  4. Manager (mostly does manager things; if they used to be a Software Engineer, maybe some of that if there is time)

Short and to the point. The argument in favor of this is pretty well put this way:

A flat hierarchy keeps us from having to rank everyone against some arbitrary rules. What, really, is the quantifiable difference between a 'junior' and a 'senior' engineer? We are all engineers. If you do manager things, you're a lead. When you put Eclipse/Vim/VisualStudio behind you, then you're a manager.

No need to judge some engineers as better than other engineers. Easy. Simple. Understandable.

Over in the part of the tech-industry that isn't dominated by startups, but is instead dominated by, say, US Federal contracting rules, you have a very different hierarchy.

  1. Associate Systems Engineer
  2. Junior Systems Engineer
  3. Systems Engineer
  4. Senior Systems Engineer
  5. Lead Systems Engineer (may do some managery things, may not)
  6. Principal Systems Engineer (the top title for technical stuff)

Because civil service is like that, each of those has a defined job description, with responsibilities and skill requirements. Such job-reqs read something like:

Diagnoses and troubleshoots problems involving multiple interconnected systems. Proposes complete systems and integrates them. Works highly independently, and is effective in coordinating work with other, separate systems teams. May assume a team-lead role.

Or for a more junior role:

Diagnoses and troubleshoots problems for a single system in an interconnected ecosystem. Proposes changes to specific systems and integrates them. Follows direction when implementing new systems. Works somewhat independently, guided by senior engineers.

Due to the different incentives (winning US government contracting agreements, versus not having to judge engineers as better or worse than each other), having multiple classes of 'systems engineer' makes sense for the non-startup case.


I'm arguing that the startup-stance (flat) is more unfair. Yes, you don't have to judge people as 'better-than'.

On the job-title, at least.

Salaries are another story. Those work very much like Enterprise Pricing Agreements, where no two agreements look the same. List-price is only the opening bid of a protracted negotiation, after all. This makes sense, as hiring a tech-person is a 6-figure annual recurring cost in most large US job-markets (after you factor in fringe benefits, employer-side taxes etc). That's an Enterprise contract right there, no wonder each one is a unique snowflake of specialness.

I guarantee that the person deciding what a potential hire's salary is going to be will consider time in the field, experience with the given technologies, ability to operate in a fast-paced and changing environment, and ability to make change as the factors in the initial offer. All of these were involved in the job-req examples I posted above. Certain unconscious biases, such as race and gender, also factor in.

By the time a new Software Engineer walks in the door for their first day they've already been judged better/worse than their peers. Just, no one knows it because it isn't in the job title.

If the company is one that bases annual compensation improvements on the previous year's performance, this judgment happens every year and compounds. That is how you can get a hypothetical 7-person team that looks like this:

  1. Lead Software Engineer, $185,000/yr
  2. Software Engineer, $122,000/yr
  3. Software Engineer, $105,000/yr
  4. Software Engineer, $170,000/yr
  5. Software Engineer, $150,000/yr
  6. Software Engineer, $135,000/yr
  7. Software Engineer, $130,000/yr

Why is Engineer 4 paid so much more? Probably because they were the second hire after the Lead, meaning they have more years of increases under their belt, and possibly a guilt-raise from when Engineer 1 was picked for Lead over them, after the 3rd hire happened and the team suddenly needed a Lead.

One job-title, $65,000 spread in annual compensation. Obviously, no one has been judged better or worse than each other.

Riiiiight.

Then something like #TalkPay happens. Engineer number 4 says in Slack, "I'm making 170K. #TalkPay". Engineer number 3 chokes on her coffee. Suddenly, five engineers are now hammering to get raises because they had no idea the company was willing to pay that much for a non-Lead.

Now, if that same series were done but with a Fed-style job series?

  1. Lead Software Engineer, $185,000/yr
  2. Junior Software Engineer, $122,000/yr
  3. Associate Software Engineer, $105,000/yr
  4. Senior Software Engineer, $170,000/yr
  5. Senior Software Engineer, $150,000/yr
  6. Software Engineer, $135,000/yr
  7. Software Engineer, $130,000/yr

Only one person will be banging on doors, Engineer number 5. Having a job-series allows you to have overt pay disparity without having to pretend everyone is equal to everyone else. It makes overt the judgment that is already being made, which makes the system more fair.


Is this the best of all possible worlds?

Heck no. Balancing unconscious-bias mitigation (rigid salary schedules and titles) against compensating your high performers (individualized salary negotiations) is a fundamentally hard problem, with unhappy people no matter what you pick. But not pretending we're all the same helps keep things somewhat more transparent. It also makes it somewhat more obvious when certain kinds of people aren't getting promotions than it is when certain kinds of people are getting half the annual raises of everyone else.

by SysAdmin1138 at December 29, 2017 06:09 PM