April 21, 2018

Cryptography Engineering

Wonk post: chosen ciphertext security in public-key encryption (Part 1)

In general I try to limit this blog to posts that focus on generally-applicable techniques in cryptography. That is, I don’t focus on the deeply wonky. But this post is going to be an exception. Specifically, I’m going to talk about a topic that most “typical” implementers don’t — and shouldn’t — think about.

Specifically: I’m going to talk about various techniques for making public key encryption schemes chosen ciphertext secure. I see this as the kind of post that would have saved me ages of reading when I was a grad student, so I figured it wouldn’t hurt to write it all down.

Background: CCA(1/2) security

Early (classical) ciphers used a relatively weak model of security, if they used one at all. That is, the typical security model for an encryption scheme was something like the following:

1. I generate an encryption key (or keypair for public-key encryption)
2. I give you the encryption of some message of my choice
3. You “win” if you can decrypt it

This is obviously not a great model in the real world, for several reasons. First off, in some cases the attacker knows a lot about the message to be decrypted. For example: it may come from a small space (like a set of playing cards). For this reason we require a stronger definition like “semantic security” that assumes the attacker can choose the plaintext distribution, and can also obtain the encryption of messages of his/her own choice. I’ve written more about this here.

More relevant to this post, another limitation of the above game is that — in some real-world examples — the attacker has even more power. That is: in addition to obtaining the encryption of chosen plaintexts, they may be able to convince the secret keyholder to decrypt chosen ciphertexts of their choice.

The latter attack is called a chosen-ciphertext (CCA) attack.

At first blush this seems like a really stupid model. If you can ask the keyholder to decrypt chosen ciphertexts, then isn’t the scheme just obviously broken? Can’t you just decrypt anything you want?

The answer, it turns out, is that there are many real-life examples where the attacker has decryption capability, but the scheme isn’t obviously broken. For example:

1. Sometimes an attacker can decrypt a limited set of ciphertexts (for example, because someone leaves the decryption machine unattended at lunchtime). The question then is whether they can learn enough from this access to decrypt other ciphertexts that are generated after they lose access to the decryption machine — for example, messages that are encrypted after the operator comes back from lunch.
2. Sometimes an attacker can submit any ciphertext she wants — but will only obtain a partial decryption of the ciphertext. For example, she might learn only a single bit of information such as “did this ciphertext decrypt correctly”. The question, then, is whether she can leverage this tiny amount of data to fully decrypt some ciphertext of her choosing.

The first example is generally called a “non-adaptive” chosen ciphertext attack, or a CCA1 attack (and sometimes, historically, a “lunchtime” attack). There are a few encryption schemes that totally fall apart under this attack — the most famous textbook example is Rabin’s public key encryption scheme, which allows you to recover the full secret key from just a single chosen-ciphertext decryption.
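For the curious, the textbook attack on Rabin is easy to demonstrate. Below is a toy sketch in Python — tiny primes and made-up parameters, purely for illustration — showing how a single chosen-ciphertext decryption factors the modulus (and thus recovers the secret key) with probability 1/2 per attempt:

```python
import math
import random

p, q = 10007, 10039        # toy primes, both ≡ 3 (mod 4); real keys are enormous
N = p * q                  # the public key

def decryption_oracle(c):
    # Rabin decryption: return one square root of c modulo N,
    # computed via square roots mod p and mod q, combined with the CRT.
    mp = pow(c, (p + 1) // 4, p)
    mq = pow(c, (q + 1) // 4, q)
    return (mp * q * pow(q, -1, p) + mq * p * pow(p, -1, q)) % N

def lunchtime_attack():
    # Submit the square of a random r. With probability 1/2 the oracle
    # hands back a root s with s ≠ ±r mod N, and gcd(r - s, N) factors N.
    while True:
        r = random.randrange(2, N)
        s = decryption_oracle(pow(r, 2, N))
        if s not in (r % N, (-r) % N):
            return math.gcd(r - s, N)

factor = lunchtime_attack()
print(factor in (p, q))  # → True
```

This is exactly why textbook Rabin is the canonical victim of lunchtime attacks: the decryption oracle knows all four square roots, and the attacker only knows two of them.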

The more powerful second example is generally referred to as an “adaptive” chosen ciphertext attack, or a CCA2 attack. The term refers to the idea that the attacker can select the ciphertexts they try to decrypt based on seeing a specific ciphertext that they want to attack, and by seeing the answers to specific decryption queries.

In this article we’re going to use the more powerful “adaptive” (CCA2) definition, because that subsumes the CCA1 definition. We’re also going to focus primarily on public-key encryption.

With this in mind, here is the intuitive definition of the experiment we want a CCA2 public-key encryption scheme to be able to survive:

1. I generate an encryption keypair for a public-key scheme and give you the public key.
2. You can send me (sequentially and adaptively) many ciphertexts, which I will decrypt with my secret key. I’ll give you the result of each decryption.
3. Eventually you’ll send me a pair of messages (of equal length) $M_0, M_1$ and I’ll pick a bit $b$ at random, and return to you the encryption of $M_b$, which I will denote as $C^* \leftarrow {\sf Encrypt}(pk, M_b)$.
4. You’ll repeat step (2), sending me ciphertexts to decrypt. If you send me $C^*$ I’ll reject your attempt. But I’ll decrypt any other ciphertext you send me, even if it’s only slightly different from $C^*$.
5. The attacker outputs their guess $b'$. They “win” the game if $b'=b$.

We say that our scheme is secure if the attacker can win this game with at most negligibly greater probability than they would by simply guessing $b'$ at random. Since they win with probability 1/2 just by guessing randomly, that means we want (Probability attacker wins the game) – 1/2 to be “very small” (typically a negligible function of the security parameter).
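To make the experiment concrete, here is a toy sketch of the game in Python. Textbook (unpadded) RSA stands in for the encryption scheme — chosen precisely because it is malleable, so the mauling adversary wins every time. The parameters and names are illustrative, not any real scheme:

```python
import random

# Toy textbook-RSA parameters (tiny and insecure, for illustration only).
p, q, e = 1009, 1013, 65537
N = p * q
d = pow(e, -1, (p - 1) * (q - 1))

def encrypt(m): return pow(m, e, N)
def decrypt(c): return pow(c, d, N)

def cca2_game(adversary):
    # Steps (3)-(5) of the game: challenge, phase-2 queries, guess.
    m0, m1 = adversary["choose"]()
    b = random.randrange(2)
    c_star = encrypt([m0, m1][b])
    # The phase-2 decryption oracle answers anything except C* itself.
    oracle = lambda c: decrypt(c) if c != c_star else None
    return adversary["guess"](c_star, oracle) == b

# An adversary exploiting RSA's multiplicative malleability:
# Decrypt(C* · 2^e mod N) = 2·M_b mod N, which reveals b.
m0, m1 = 42, 1337
def guess(c_star, oracle):
    mauled = (c_star * pow(2, e, N)) % N      # ≠ C*, so the oracle answers
    m = (oracle(mauled) * pow(2, -1, N)) % N  # undo the factor of 2
    return 0 if m == m0 else 1

adversary = {"choose": lambda: (m0, m1), "guess": guess}
print(all(cca2_game(adversary) for _ in range(20)))  # → True
```

Note how the "no decrypting $C^*$" rule does nothing to save the scheme here: the mauled ciphertext is a perfectly legal query that is "only slightly different from $C^*$".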

You should notice two things about this definition. First, it gives the attacker the full decryption of any ciphertext they send me. This is obviously much more powerful than just giving the attacker a single bit of information, as we mentioned in the example further above. But note that powerful is good. If our scheme can remain secure in this powerful experiment, then clearly it will be secure in a setting where the attacker gets strictly less information from each decryption query.

The second thing you should notice is that we impose a single extra condition in step (4), namely that the attacker cannot ask us to decrypt $C^*$. We do this only to prevent the game from being “trivial” — if we did not impose this requirement, the attacker could always just hand us back $C^*$ to decrypt, and they would always learn the value of $b$.

(Notice as well that we do not give the attacker the ability to request encryptions of chosen plaintexts. We don’t need to do that in the public key encryption version of this game, because we’re focusing exclusively on public-key encryption here — since the attacker has the public key, she can encrypt anything she wants without my help.)

With definitions out of the way, let’s talk a bit about how we achieve CCA2 security in real schemes.

A quick detour: symmetric encryption

This post is mainly going to focus on public-key encryption, because that’s actually the problem that’s challenging and interesting to solve. It turns out that achieving CCA2 for symmetric-key encryption is really easy. Let me briefly explain why this is, and why the same ideas don’t work for public-key encryption.

(To explain this, we’ll need to slightly tweak the CCA2 definition above to make it work in the symmetric setting. The changes here are small: we won’t give the attacker a public key in step (1), and at steps (2) and (4) we will allow the attacker to request the encryption of chosen plaintexts as well as the decryption.)

The first observation is that many common encryption schemes — particularly, the widely-used cipher modes of operation like CBC and CTR — are semantically secure in a model where the attacker does not have the ability to decrypt chosen ciphertexts. However, these same schemes break completely in the CCA2 model.

The simple reason for this is ciphertext malleability. Take CTR mode, which is particularly easy to mess with. Say we’ve obtained the challenge ciphertext $C^*$ at step (3) (recall that $C^*$ is the encryption of $M_b$). It’s trivially easy to “maul” this ciphertext — simply by flipping, say, a single bit (i.e., XORing it with “1”). This gives us a new ciphertext $C' = C^* \oplus 1$, which the rules of the game allow us to submit for decryption. We obtain $M_b \oplus 1$, which we can use to figure out $b$.
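Here is a minimal sketch of that mauling in Python, using a toy CTR construction (a hash-based keystream standing in for AES-CTR — the attack works identically against the real thing). The attacker edits the plaintext without ever touching the key:

```python
import hashlib

def ctr_keystream(key, nonce, length):
    # Toy CTR mode: keystream blocks derived from a hash of key+nonce+counter.
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

key, nonce = b"secret key", b"nonce000"
m_b = b"pay alice $100"
c_star = xor(m_b, ctr_keystream(key, nonce, len(m_b)))   # the challenge

# The attacker knows neither key nor plaintext, but XORing a chosen
# difference into the ciphertext XORs the same difference into the
# plaintext. Here we rewrite "alice" (offset 4) into "mallo":
delta = xor(b"alice", b"mallo")
mauled = c_star[:4] + xor(c_star[4:9], delta) + c_star[9:]

# What the decryption oracle would hand back:
decrypted = xor(mauled, ctr_keystream(key, nonce, len(mauled)))
print(decrypted)  # → b'pay mallo $100'
```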

(A related, but “real world” variant of this attack is Vaudenay’s Padding Oracle Attack, which breaks actual implementations of symmetric-key cryptosystems. Here’s one we did against Apple iMessage. Here’s an older one on XML encryption.)

So how do we fix this problem? The straightforward observation is that we need to prevent the attacker from mauling the ciphertext $C^*$. The generic approach to doing this is to modify the encryption scheme so that it includes a Message Authentication Code (MAC) tag computed over every CTR-mode ciphertext. The key for this MAC scheme is generated by the encrypting party (me) and kept with the encryption key. When asked to decrypt a ciphertext, the decryptor first checks whether the MAC is valid. If it’s not, the decryption routine will output “ERROR”. Assuming an appropriate MAC scheme, the attacker can’t modify the ciphertext (including the MAC) without causing the decryption to fail and produce a useless result.
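A sketch of this encrypt-then-MAC construction, using Python’s standard-library hmac and the same toy hash-based keystream standing in for a real cipher (the structure, not the toy cipher, is the point):

```python
import hashlib
import hmac
import os

def keystream(key, nonce, n):
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:n]

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

enc_key, mac_key = os.urandom(16), os.urandom(16)   # two keys, stored together

def encrypt(m):
    nonce = os.urandom(8)
    body = nonce + xor(m, keystream(enc_key, nonce, len(m)))
    return body + hmac.new(mac_key, body, hashlib.sha256).digest()

def decrypt(c):
    body, tag = c[:-32], c[-32:]
    # Check the MAC before doing anything else; reject on mismatch.
    if not hmac.compare_digest(tag, hmac.new(mac_key, body, hashlib.sha256).digest()):
        return None                     # "ERROR"
    nonce, ct = body[:8], body[8:]
    return xor(ct, keystream(enc_key, nonce, len(ct)))

c = encrypt(b"attack at dawn")
mauled = c[:8] + bytes([c[8] ^ 1]) + c[9:]   # flip one ciphertext bit
print(decrypt(c))       # → b'attack at dawn'
print(decrypt(mauled))  # → None
```

The mauling trick from the previous section now fails: any modified ciphertext is rejected before the attacker learns anything about its decryption.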

So in short: in the symmetric encryption setting, the answer to CCA2 security is simply for the encrypting parties to authenticate each ciphertext using a secret authentication (MAC) key they generate. Since we’re talking about symmetric encryption, that extra (secret) authentication key can be generated and stored with the decryption key. (Some more efficient schemes make this all work with a single key, but that’s just an engineering convenience.) Everything works out fine.

So now we get to the big question.

CCA security is easy in symmetric encryption. Why can’t we just do the same thing for public-key encryption?

As we saw above, it turns out that strong authenticated encryption is sufficient to get CCA(2) security in the world of symmetric encryption. Sadly, when you try this same idea generically in public key encryption, it doesn’t always work. There’s a short reason for this, and a long one. The short version is: it matters who is doing the encryption.

Let’s focus on the critical difference. In the symmetric CCA2 game above, there is exactly one person who is able to (legitimately) encrypt ciphertexts. That person is me. To put it more clearly: the person who performs the legitimate encryption operations (and has the secret key) is also the same person who is performing decryption.

Even if the encryptor and decryptor aren’t literally the same person, the encryptor still has to be honest. (To see why this has to be the case, remember that the encryptor has the shared secret key! If that party were malicious, the whole scheme would be broken, since they could just hand the secret key to the bad guys.) And once you’ve made the stipulation that the encryptor is honest, you’re almost all the way there. It suffices simply to add some kind of authentication (a MAC or a signature) to any ciphertext she encrypts. At that point the decryptor only needs to determine whether a given ciphertext actually came from the (honest) encryptor, and avoid decrypting the bad ones. You’re done.

Public key encryption (PKE) fundamentally breaks all these assumptions.

In a public-key encryption scheme, the main idea is that anyone can encrypt a message to you, once they get a copy of your public key. The encryption algorithm may sometimes be run by good, honest people. But it can also be run by malicious people. It can be run by parties who are adversarial. The decryptor has to be able to deal with all of those cases. One can’t simply assume that the “real” encryptor is honest.

Let me give a concrete example of how this can hurt you. A couple of years ago I wrote a post about flaws in Apple iMessage, which (at the time) used a simple authenticated (public-key) encryption scheme. The basic iMessage encryption algorithm used public key encryption (actually a combination of RSA with some AES thrown in for efficiency) so that anyone could encrypt a message to my key. For authenticity, it required that every message be signed with an ECDSA signature by the sender.

When I received a message, I would look up the sender’s public key and first make sure the signature was valid. This would prevent bad guys from tampering with the message in flight — e.g., executing nasty stuff like adaptive chosen ciphertext attacks. If you squint a little, this is almost exactly a direct translation of the symmetric crypto approach we discussed above. We’re simply swapping the MAC for a digital signature.

The problems with this scheme start to become apparent when we consider that there might be multiple people sending me ciphertexts. Let’s say the adversary is on the communication path and intercepts a signed message from you to me. They want to change (i.e., maul) the message so that they can execute some kind of clever attack. Well, it turns out this is simple. They simply rip off the honest signature and replace it with one they make themselves.

The new message is identical, but now appears to come from a different person (the attacker). Since the attacker has their own signing key, they can maul the encrypted message as much as they want, and sign new versions of that message. If you plug this attack into (a version of) the public-key CCA2 game up top, you see they’ll win quite easily. All they have to do is modify the challenge ciphertext $C^*$ at step (4) to be signed with their own signing key; then they can change it by munging the CTR-mode encryption, and request the decryption of that ciphertext.
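Here is a toy sketch of that strip-and-re-sign attack in Python. HMAC with per-sender keys stands in for ECDSA, and a fixed keystream stands in for the public-key encryption layer; every name and value is illustrative, not iMessage’s actual format:

```python
import hashlib
import hmac
import os

# Toy "signatures": each sender holds a secret signing key, and
# verification looks up the *claimed* sender's key (a stand-in for ECDSA).
signing_keys = {"you": os.urandom(16), "mallory": os.urandom(16)}

def sign(sender, msg):
    return hmac.new(signing_keys[sender], msg, hashlib.sha256).digest()

def verify(sender, msg, sig):
    return hmac.compare_digest(sig, sign(sender, msg))

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

# "Encryption to my public key", reduced to a fixed keystream for brevity.
ks = hashlib.sha256(b"shared-with-me-only").digest()[:16]
ciphertext = xor(b"send $100 to bob", ks)
envelope = ("you", ciphertext, sign("you", ciphertext))

# Mallory intercepts the envelope, mauls the ciphertext (rewriting the
# last three plaintext bytes), strips your signature, and signs herself:
_, ct, _ = envelope
mauled_ct = ct[:13] + xor(ct[13:16], xor(b"bob", b"eve"))
forged = ("mallory", mauled_ct, sign("mallory", mauled_ct))

# The receiver's signature check passes -- valid signature, wrong signer.
sender, ct, sig = forged
print(verify(sender, ct, sig))  # → True
print(xor(ct, ks))              # → b'send $100 to eve'
```

The signature only proved that *somebody* signed the ciphertext, not that the ciphertext was untouched since the honest sender created it — which is exactly the gap the CCA2 adversary drives through.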

Of course, if I only accept messages signed by some original (guaranteed-to-be-honest) sender, this scheme might work out fine. But that’s not the point of public key encryption. In a real public-key scheme — like the one Apple iMessage was trying to build — I should be able to (safely) decrypt messages from anyone, and in that setting this naive scheme breaks down pretty badly.

Whew.

Ok, this post has gotten a bit long, and so far I haven’t actually gotten to the various “tricks” for adding chosen ciphertext security to real public key encryption schemes. That will have to wait until the next post, to come shortly.

Vincent Bernat

OPL2 Audio Board: an AdLib sound card for Arduino

In a previous article, I presented the OPL2LPT, a sound card for the parallel port featuring a Yamaha YM3812 chip, also known as OPL2—the chip of the AdLib sound card. The OPL2 Audio Board for Arduino is another indie sound card using this chip. However, instead of relying on a parallel port, it uses a serial interface, which can be driven from an Arduino board or a Raspberry Pi. While the OPL2LPT targets retrogamers with real hardware, the OPL2 Audio Board cannot be used in the same way. Nonetheless, it can also be operated from ScummVM and DOSBox!

Unboxing🔗

The OPL2 Audio Board can be purchased on Tindie, either as a kit or fully assembled. I have paired it with a cheap clone of the Arduino Nano. A library to drive the board is available on GitHub, along with some examples.

One of them is DemoTune.ino. It plays a short tune on three channels. It can be compiled and uploaded to the Arduino with PlatformIO—installable with pip install platformio—using the following command:1

$ platformio ci \
    --board nanoatmega328 \
    --lib ../../src \
    --project-option="targets=upload" \
    --project-option="upload_port=/dev/ttyUSB0" \
    DemoTune.ino
[...]
PLATFORM: Atmel AVR > Arduino Nano ATmega328
SYSTEM: ATMEGA328P 16MHz 2KB RAM (30KB Flash)
Converting DemoTune.ino
[...]
Configuring upload protocol...
AVAILABLE: arduino
CURRENT: upload_protocol = arduino
Looking for upload port...
Use manually specified: /dev/ttyUSB0
Uploading .pioenvs/nanoatmega328/firmware.hex
[...]
avrdude: 6618 bytes of flash written
[...]
===== [SUCCESS] Took 5.94 seconds =====

Immediately after the upload, the Arduino plays the tune. 🎶

The next interesting example is SerialIface.ino. It turns the audio board into a sound card over the serial port. Once the code has been pushed to the Arduino, you can use the play.py program in the same directory to play VGM files. VGM is a sample-accurate sound format for many sound chips: it logs the exact commands sent. There are many VGM files on VGMRips. Be sure to choose the ones for the YM3812/OPL2! Here is a small selection.

Usage with DOSBox & ScummVM🔗

Notice: the support for the serial protocol used in this section has not been merged yet. In the meantime, grab SerialIface.ino from the pull request: git checkout 50e1717.

When the Arduino is flashed with SerialIface.ino, the board can be driven through a simple protocol over the serial port. By patching DOSBox and ScummVM, we can make them use this unusual sound card. Here are some examples of games:

• 0:00, with DOSBox, the first level of Doom 🎮
• 1:06, with DOSBox, the introduction of Loom 🎼
• 2:38, with DOSBox, the first level of Lemmings 🐹
• 3:32, with DOSBox, the introduction of Legend of Kyrandia 🃏
• 6:47, with ScummVM, the introduction of Day of the Tentacle ☢️
• 11:10, with DOSBox, the introduction of Another World2 🐅

DOSBox🔗

The serial protocol is described in the SerialIface.ino file:

/*
 * A very simple serial protocol is used.
 *
 * - Initial 3-way handshake to overcome reset delay / serial noise issues.
 * - 5-byte binary commands to write registers.
 *   - (uint8)  OPL2 register address
 *   - (uint8)  OPL2 register data
 *   - (int16)  delay (milliseconds); negative -> pre-delay; positive -> post-delay
 *   - (uint8)  delay (microseconds / 4)
 *
 * Example session:
 *
 * Arduino: HLO!
 * PC:      BUF?
 * Arduino: 256 (switches to binary mode)
 * PC:      0xb80a014f02 (write OPL register and delay)
 * Arduino: k
 *
 * A variant of this protocol is available without the delays. In this
 * case, the BUF? command should be sent as B0F? The binary protocol
 * is now using 2-byte binary commands:
 *   - (uint8)  OPL2 register address
 *   - (uint8)  OPL2 register data
 */

Adding support for this protocol in DOSBox is relatively simple (patch). For best performance, we use the 2-byte variant (5000 ops/s). The binary commands are pipelined and a dedicated thread collects the acknowledgments. A semaphore captures the number of free slots in the receive buffer. As it is not possible to read registers, we rely on DOSBox to emulate the timers, which are mostly used to let the various games detect the OPL2. The patch is tested only on Linux but should work on any POSIX system—not Windows.

To test it, you need to build DOSBox from source:

$ sudo apt build-dep dosbox
$ git clone https://github.com/vincentbernat/dosbox.git -b feature/opl2audioboard
$ cd dosbox
$ ./autogen.sh
$ ./configure && make
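As an aside, packing one of the 5-byte commands described by the serial protocol above is straightforward. Here is a Python sketch, assuming the big-endian byte order implied by the example session (the authoritative wire format is whatever SerialIface.ino implements):

```python
import struct

def opl2_command(reg, value, delay_ms=0, delay_us4=0):
    # 5-byte command: register address, register data, signed 16-bit
    # millisecond delay, then a microseconds/4 delay byte.
    return struct.pack(">BBhB", reg, value, delay_ms, delay_us4)

# Reproduce the example session's write, 0xb80a014f02:
cmd = opl2_command(0xB8, 0x0A, 0x014F, 0x02)
print(cmd.hex())  # → 'b80a014f02'
```

Sending the bytes over /dev/ttyUSB0 (e.g. with pyserial) and waiting for the "k" acknowledgment is left out for brevity.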


Replace the sblaster section of ~/.dosbox/dosbox-SVN.conf:

[sblaster]
sbtype=none
oplmode=opl2
oplrate=49716
oplemu=opl2arduino
opl2arduino=/dev/ttyUSB0


Then, run DOSBox with ./src/dosbox. That’s it!

You will likely get the “OPL2Arduino: too slow, consider increasing buffer” message a lot. To fix this, you need to recompile SerialIface.ino with a bigger receive buffer:

$ platformio ci \
    --board nanoatmega328 \
    --lib ../../src \
    --project-option="targets=upload" \
    --project-option="upload_port=/dev/ttyUSB0" \
    --project-option="build_flags=-DSERIAL_RX_BUFFER_SIZE=512" \
    SerialIface.ino

ScummVM🔗

The same code can be adapted for ScummVM (patch). To test, build it from source:

$ sudo apt build-dep scummvm
$ git clone https://github.com/vincentbernat/scummvm.git -b feature/opl2audioboard
$ cd scummvm
$ ./configure --disable-all-engines --enable-engine=scumm && make

Then, you can start ScummVM with ./scummvm. Select “AdLib Emulator” as the music device and “OPL2 Arduino” as the AdLib emulator.3 Like for DOSBox, watch the console to check if you need a larger receive buffer. Enjoy! 😍

1. This command is valid for an Arduino Nano. For another board, take a look at the output of platformio boards arduino. ↩︎
2. Another World (also known as Out of This World), released in 1991 and designed by Éric Chahi, uses sampled sounds at 5 kHz or 10 kHz. With a serial port operating at 115,200 bits/s, the 5 kHz option is just within our reach. However, I have no idea if the rendering is faithful. It doesn’t sound like a SoundBlaster, but it sounds analogous to the rendering of the OPL2LPT, which sounds similar to the SoundBlaster when using the 10 kHz option. DOSBox’s AdLib emulation using Nuked OPL3—which is considered to be the best—sounds worse. ↩︎
3. If you need to specify a serial port other than /dev/ttyUSB0, add a line opl2arduino_device= in the ~/.scummvmrc configuration file. ↩︎

Chris Siebenmann

The increasingly surprising limits to the speed of our Amanda backups

When I started dealing with backups, the slowest part of the process was generally writing things out to tape, which is why Amanda was much happier when you gave it a ‘holding disk’ that it could stage all of the backups to before it had to write them out to tape. Once you had that in place, the speed limit was generally some mix between the network bandwidth to the Amanda server and how fast the machines being backed up could grind through their filesystems to create the backups. When networks moved to 1G, you (and we) usually wound up being limited by the speed of reading through the filesystems to be backed up.
(If you were backing up a lot of separate machines, you might initially be limited by the Amanda server’s 1G of incoming bandwidth, but once most machines started finishing their backups you usually wound up with one or two remaining machines that had larger, slower filesystems. This slow tail wound up determining your total backup times. This was certainly our pattern, especially because only our fileservers have much disk space to back up. The same has typically been true of backing up multiple filesystems in parallel from the same machine; sooner or later we wind up stuck with a few big, slow filesystems, usually ones we’re doing full dumps of.)

Then we moved our Amanda servers to 10G-T networking and, from my perspective, things started to get weird. When you have 1G networking, it is generally slower than even a single holding disk; unless something’s broken, modern HDs will generally do at least 100 Mbytes/sec of streaming writes, which is enough to keep up with a full-speed 1G network. However, this is only just over 1G data rates, which means that a single HD is vastly outpaced by a 10G network. As long as we had a number of machines backing up at once, the Amanda holding disk was suddenly the limiting factor. However, for a lot of the run time of backups we’re only backing up our fileservers, because they’re where all the data is, and for that we’re currently still limited by how fast the fileservers can do disk IO.

(The fileservers only have 1G network connections for reasons. However, usually it’s disk IO that’s the limiting factor, likely because scanning through filesystems is seek-limited. Also, I’m ignoring a special case where compression performance is our limit.)

All of this is going to change in our next generation of fileservers, which will have both 10G-T networking and SSDs.
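The rough arithmetic behind these comparisons can be sketched in a few lines, using the post’s own approximate numbers (wire rates and HD streaming speeds are ballpark figures, not measurements):

```python
# All rates in Mbytes/sec, approximated.
link_1g = 1_000 / 8         # 1G Ethernet wire rate: 125 MB/s
link_10g = 10_000 / 8       # 10G Ethernet wire rate: 1250 MB/s
hd = 125                    # one modern HD, streaming writes (100-150 MB/s)
striped_hds = 300           # roughly two striped HDs, upper end

print(hd >= link_1g)           # → True: one holding disk keeps up with 1G
print(striped_hds >= link_10g) # → False: striped HDs are far short of 10G

# Draining a day's ~1 TB of dumps off the holding disk at HD speed:
print(round(1_000_000 / hd / 3600, 1))  # → 2.2 (hours)
```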
Assuming that the software doesn’t have its own IO rate limits (which is not always a safe assumption), both the aggregate SSDs and all the networking from the fileservers to Amanda will be capable of anywhere from several hundred Mbytes/sec up to as much 10G bandwidth as Linux can deliver. At this point the limit on how fast we can do backups will be down to the disk speeds on the Amanda backup servers themselves. These will probably be significantly slower than the rest of the system, since even striping two HDs together would only get us up to around 300 Mbytes/sec at most.

(It’s not really feasible to use a SSD for the Amanda holding disk, because it would cost too much to get the capacities we need. We currently dump over a TB a day per Amanda server, and things can only be moved off the holding disk at the now-paltry HD speed of 100 to 150 Mbytes/sec.)

This whole shift feels more than a bit weird to me; it’s upended my perception of what I expect to be slow and what I think of as ‘sufficiently fast that I can ignore it’. The progress of hardware over time has made it so the one part that I thought of as fast (and that was designed to be fast) is now probably going to be the slowest.

(This sort of upset in my world view of performance happens every so often, for example with IO transfer times. Sometimes it even sticks. It sort of did this time, since I was thinking about this back in 2014. As it turned out, back then our new fileservers did not stick at 10G, so we got to sleep on this issue until now.)

April 20, 2018

Sarah Allen

false dichotomy of control vs sharing

Email is the killer app of the Internet. Amidst many sharing and collaboration applications and services, most of us frequently fall back to email. Marc Stiegler suggests that email often “just works better”. Why is this?
Digital communication is fast across distances and allows access to incredible volumes of information, yet digital access controls typically force us into a false dichotomy of control vs sharing. Looking at physical models of sharing and access control, we can see that we already have well-established models where we can give up control temporarily, yet not completely. Alan Karp illustrated this nicely at last week’s Internet Identity Workshop (IIW) in a quick anecdote:

Marc gave me the key to his car so I could park it in my garage. I couldn’t do it, so I gave the key to my kid, and asked my neighbor to do it for me. She stopped by my house, got the key and used it to park Marc’s car in my garage.

The car key scenario is clear. In addition to possession of the key, there’s even another layer of control — if my kid doesn’t have a driver’s license, then he can’t drive the car, even if he holds the key. When we translate this story to our modern digital realm, it sounds crazy:

Marc gave me his password so I could copy a file from his computer to mine. I couldn’t do it, so I gave Marc’s password to my kid, and asked my neighbor to do it for me. She stopped by my house so my kid could tell her my password, and then she used it to copy the file from Marc’s computer to mine.

After the conference, I read Marc Stiegler’s 2009 paper Rich Sharing for the Web, which details key features of sharing that we have in the real world, as illustrated in the anecdote that Alan so effectively rattled off. These 6 features (enumerated below) enable people to create networks of access rights that implement the Principle of Least Authority (POLA). The key is to limit how much you need to trust someone before sharing: “Systems that do not implement these 6 features will feel rigid and inadequately functional once enough users are involved, forcing the users to seek alternate means to work around the limitations in those applications.”

1. Dynamic: I can grant access quickly and effortlessly (without involving an administrator).
2. Attenuated: To give you permission to do or see one thing, I don’t have to give you permission to do everything. (e.g. a valet key allows driving, but not access to the trunk)
3. Chained: Authority may be delegated (and re-delegated).
4. Composable: I have permission to drive a car from the State of California, and Marc’s car key. I require both permissions together to drive the car.
5. Cross-jurisdiction: There are three families involved, each with its own policies, yet there’s no need to communicate policies to another jurisdiction. In the example, I didn’t need to ask Marc to change his policy to grant my neighbor permission to drive his car.
6. Accountable: If Marc finds a new scratch on his car, he knows to ask me to pay for the repair. It’s up to me to collect from my neighbor. Digital access control systems will typically record who did which action, but don’t record who asked an administrator to grant permission.

Note: Accountability is not always directly linked to delegation. Marc would likely hold me accountable if his car got scratched, even if my neighbor had damaged the car when parking it in the garage. Whereas, if it isn’t my garage, but rather a repair shop where my neighbor drops off the car for Marc, then if the repair shop damages the car, Marc would hold them responsible.

How does this work for email? The following examples from Marc’s paper were edited for brevity:

• Dynamic: You can send email to anyone any time.
• Attenuated: When I email you an attachment, I’m sending a read-only copy. You don’t have access to my whole hard drive and you don’t expect that modifying it will change my copy.
• Chained: I can forward you an email. You can then forward it to someone else.
• Cross-Domain: I can send email to people at other companies and organizations without needing permission from their IT dept.
• Composable: I can include an attachment from email originating at one company with text or another attachment from another email and send it to whoever I want.
• Accountable: If Alice asks Bob to edit a file and email it back, and Bob asks Carol to edit the file, and Bob then emails it back, Alice will hold Bob responsible if the edits are erroneous. If Carol (whom Alice may not know) emails her result directly to Alice, either Alice will ask Carol who she is before accepting the changes, or, if Carol includes the history of messages in the message, Alice will directly see, once again, that she should hold Bob responsible.

Further reading

Alan Karp’s IoT Position Paper compares several sharing tools across these 6 features and also discusses ZBAC (authoriZation-Based Access Control), where an authorization is known as a “capability.” An object capability is an unforgeable token that both designates a resource and grants permission to access it.

Chris Siebenmann

Spam from Yahoo Groups has quietly disappeared

Over the years I have written several times about what was, at the time, an ongoing serious and long-term spam problem with email from Yahoo Groups. Not only was spam almost all of the Groups email that we got, but it was also clear that Yahoo Groups was allowing spammers to create their own mailing lists. I was coincidentally reminded of this history recently, so I wondered how things were today.

One answer is that spam from Yahoo Groups has disappeared. Oh, it’s not completely and utterly gone; we rejected one probable spam last December and two at the end of July 2017, which is almost as far back as our readily accessible logs go (they stretch back to June 15th, 2017). But compared to what it was before, that counts as completely vanished for pretty much anyone. Certainly it counts as not having any sort of spam problem.

But this is the tip of the iceberg, because it turns out that email volume from Yahoo Groups has fallen off significantly as well.
We almost always get under ten accepted messages a day from Yahoo Groups, and some days we get none. Even after removing the spam, this is nothing like four years ago in 2014, when my entry implies that we got about 22 non-spam messages a day from Yahoo Groups.

At one level I’m not surprised. Yahoo has been visibly and loudly dying for quite a while now, so I bet that a lot of people and groups have moved away from Yahoo Groups. If you had an active group that you cared about, it was clearly time to find alternate hosting quite some time ago, and probably many people did (likely with Google Groups). At another level, I’m a bit surprised that it’s this dramatic a shift. I would have expected plenty of people and groups to stick around until the very end, out of either inertia or ignorance. Perhaps Yahoo Groups service got so bad and so unreliable that even people who don’t pay attention to computer news noticed that there was some problem.

On the other hand there’s another metric: the amount of email from Yahoo Groups that was rejected due to bad destination addresses here (and how many different addresses there are). We almost always see a small number of such rejections a day, and the evidence suggests that almost all of them are for the same few addresses. There are old, obsolete addresses here that have been rejecting Yahoo Groups email since last June, and Yahoo Groups is still trying to send email to them. Apparently they don’t even handle locally generated bounces, never mind bounces that they refuse to accept back. I can’t say I’m too surprised.

Given all of this I can’t say I regret the slow-motion demise of Yahoo Groups. At this point I’m not going to wish it was happening faster, because it’s no longer causing us problems (and clearly hasn’t been for more than half a year), but it’s also clearly still not healthy. It’s just that either the spammers abandoned it too or they finally got thrown off.

(Perhaps a combination of both.)
April 19, 2018

Steve Kemp's Blog

A filesystem for known_hosts

The other day I had an idea that wouldn't go away: a filesystem that exported the contents of ~/.ssh/known_hosts. I can't think of a single useful use for it, beyond simple shell-scripting, and yet I couldn't resist.

$ go get -u github.com/skx/knownfs
$ go install github.com/skx/knownfs

Now make it work:

$ mkdir ~/knownfs
$ knownfs ~/knownfs

Beneath our mount-point we can expect one directory for each known-host. So we'll see entries:

~/knownfs $ ls | grep \.vpn
builder.vpn
deagol.vpn
master.vpn
www.vpn

~/knownfs $ ls | grep steve
blog.steve.fi
builder.steve.org.uk
git.steve.org.uk
mail.steve.org.uk
master.steve.org.uk
scatha.steve.fi
www.steve.fi
www.steve.org.uk

The host-specific entries will each contain a single file, fingerprint, with the fingerprint of the remote host:

~/knownfs $ cd www.steve.fi
~/knownfs/www.steve.fi $ ls
fingerprint
~/knownfs/www.steve.fi $ cat fingerprint
98:85:30:f9:f4:39:09:f7:06:e6:73:24:88:4a:2c:01


I've used it in a few shell-loops to run commands against hosts matching a pattern, but beyond that I'm struggling to think of a use for it.

If you like the idea I guess have a play:

It was perhaps more useful and productive than my other recent work - which involves porting an existing network-testing program from Ruby to golang, and in the process making it much more uniform and self-consistent.

The resulting network tester is pretty good, and can now notify via MQ to provide better decoupling too. The downside is of course that nobody changes network-testing solutions on a whim, and so these things are basically always in-house only.

Chris Siebenmann

The sensible way to use Bourne shell 'here documents' in pipelines

I was recently considering a shell script where I might want to feed a Bourne shell 'here document' to a shell pipeline. This is certainly possible and years ago I wrote an entry on the rules for combining things with here documents, where I carefully wrote down how to do this and the general rule involved. This time around, I realized that I wanted to use a much simpler and more straightforward approach, one that is obviously correct and is going to be clear to everyone. Namely, putting the production of the here document in a subshell.

(
cat <<EOF
with as much as you want.
EOF
) | sed | whatever


This is not as neat and nominally elegant as taking advantage of the full power of the Bourne shell's arcane rules, and it's probably not as efficient (in at least some sh implementations, you may get an extra process), but I've come around to feeling that that doesn't matter. This may be the brute force solution, but what matters is that I can look at this code and immediately follow it, and I'm going to be able to do that in six months or a year when I come back to the script.

(Here documents are already kind of confusing as it stands without adding extra strangeness.)

Of course you can put multiple things inside the (...) subshell, such as several here documents that you output only conditionally (or chunks of always present static text mixed with text you have to make more decisions about). If you want to process the entire text you produce in some way, you might well generate it all inside the subshell for convenience.

Perhaps you're wondering why you'd want to run a here document through a pipe to something. The case that frequently comes up for me is that I want to generate some text with variable substitution but I also want the text to flow naturally with natural line lengths, and the expansion will have variable length. Here, the natural way out is to use fmt:

(
cat <<EOF
My message to $NAME goes here.
It concerns $HOST, where $PROG
died unexpectedly.
EOF
) | fmt

Using fmt reflows the text regardless of how long the variables expand out to. Depending on the text I'm generating, I may be fine with reflowing all of it (which means that I can put all of the text inside the subshell), or I may have some fixed formatting that I don't want passed through fmt (so I have to have a mix of fmt'd subshells and regular text).

Having written that out, I've just come to the obvious realization that for simple cases I can just directly use fmt with a here document:

fmt <<EOF
My message to $NAME goes here.
It concerns $HOST, where $PROG
died unexpectedly.
EOF


This doesn't work well if there's some paragraphs that I want to include only some of the time, though; then I should still be using a subshell.

(For whatever reason I apparently have a little blind spot about using here documents as direct input to programs, although there's no reason for it.)

Raymii.org

Using the apt_key module one can add an APT key with Ansible. You can get the key from a remote server or from a file, or just specify a key ID. I got the request to do some stuff on a machine which was quite restricted (so no HKP protocol) and I was asked not to place too many files on the machine. The apt_key module was needed, but the key could not be a file, so using a YAML Literal Block Scalar I was able to add the key inline in the playbook. Not the best way to do it, but one of the many ways Ansible allows it.

April 18, 2018

Vincent Bernat

Self-hosted videos with HLS

Note

This article was first published on Exoscale blog with some minor modifications.

Hosting videos on YouTube is convenient for several reasons: pretty good player, free bandwidth, mobile-friendly, network effect and, at your discretion, no ads.1 On the other hand, this is one of the least privacy-friendly solutions. Most other providers share the same characteristics—except the ability to disable ads for free.

With the <video> tag, self-hosting a video is simple:2

<video controls>
<source src="../videos/big_buck_bunny.webm" type="video/webm">
<source src="../videos/big_buck_bunny.mp4" type="video/mp4">
</video>


However, while it is possible to provide different videos depending on the screen width, adapting the video to the available bandwidth is trickier. There are two solutions: HLS and MPEG-DASH.

They are both adaptive bitrate streaming protocols: the video is sliced in small segments and made available at a variety of different bitrates. Depending on current network conditions, the player automatically selects the appropriate bitrate to download the next segment.

HLS was initially implemented by Apple but is now also supported natively by Microsoft Edge and Chrome on Android. hls.js is a JavaScript library bringing HLS support to other browsers. MPEG-DASH is technically superior (codec-agnostic) but only works through a JavaScript library, like dash.js. In both cases, support of the Media Source Extensions is needed when native support is absent. Safari on iOS doesn’t have this feature and cannot use MPEG-DASH. Consequently, the most compatible solution is currently HLS.

Encoding

To serve HLS videos, you need three kinds of files:

• the media segments (encoded with different bitrates/resolutions),
• a media playlist for each variant, listing the media segments, and
• a master playlist, listing the media playlists.
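To make the relationship between the two playlist levels concrete, here is a hypothetical minimal master playlist (index.m3u8) referencing two variants. The bandwidth and resolution attributes below are illustrative values of my choosing, not the output of any particular tool:

```
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p_1.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
720p_2.m3u8
```

Each media playlist referenced here in turn lists its own media segments along with their durations.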

Media segments can come in two formats:

• MPEG-2 Transport Streams (TS), or
• Fragmented MP4.

Fragmented MP4 media segments are supported since iOS 10. They are a bit more efficient and can be reused to serve the same content as MPEG-DASH (only the playlists are different). Also, they can be served from the same file with range requests. However, if you want to target older versions of iOS, you need to stick with MPEG-2 TS.3

FFmpeg is able to convert a video to media segments and generate the associated media playlists. Peer5's documentation explains the suitable commands. I have put together a handy (Python 3.6) script, video2hls, stitching together all the steps. After executing it on your target video, you get a directory containing:

• media segments for each resolution (1080p_1_001.ts, 720p_2_001.ts, …)
• media playlists for each resolution (1080p_1.m3u8, 720p_2.m3u8, …)
• master playlist (index.m3u8)
• progressive (streamable) MP4 version of your video (progressive.mp4)
• poster (poster.jpg)

The script accepts a lot of options for customization. Use the --help flag to discover them. Run it with --debug to get the ffmpeg commands executed with an explanation for each flag. For example, the poster is built with this command:

ffmpeg \
# seek to the given position (5%) \
-ss 4 \
# load input file \
-i ../2018-self-hosted-videos.mp4 \
# take only one frame \
-frames:v 1 \
# filter to select an I-frame and scale \
-vf 'select=eq(pict_type\,I),scale=1280:720' \
# request a JPEG quality ~ 10 \
-qscale:v 28 \
# output file \
poster.jpg


Serving

So, we got a bunch of static files we can upload anywhere. Yet two details are important:

• When serving from another domain, CORS needs to be configured to allow GET requests. Adding Access-Control-Allow-Origin: * to response headers is enough.4
• Some clients may be picky about the MIME types. Ensure files are served with the ones in the table below.
Kind Extension MIME type
Playlists .m3u8 application/vnd.apple.mpegurl
MPEG2-TS segments .ts video/mp2t
fMP4 segments .mp4 video/mp4
Progressive MP4 .mp4 video/mp4
Poster .jpg image/jpeg
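When scripting an upload, it can help to resolve these MIME types programmatically. A small Python sketch (the function name and override table are mine; Python's mimetypes database may not know the HLS-specific types, so the table above is mirrored as overrides):

```python
import mimetypes
import os

# HLS-specific types that Python's mimetypes database may not include.
HLS_OVERRIDES = {
    ".m3u8": "application/vnd.apple.mpegurl",
    ".ts": "video/mp2t",
}

def hls_mime_type(path):
    """Return the MIME type an HLS-related file should be served with."""
    ext = os.path.splitext(path)[1].lower()
    if ext in HLS_OVERRIDES:
        return HLS_OVERRIDES[ext]
    guessed, _ = mimetypes.guess_type(path)
    return guessed or "application/octet-stream"

print(hls_mime_type("index.m3u8"))  # application/vnd.apple.mpegurl
print(hls_mime_type("poster.jpg"))  # image/jpeg
```

The same mapping is what the s3cmd upload loop below feeds in via its here document.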

Let’s host our files on Exoscale’s Object Storage which is compatible with S3 and located in Switzerland. As an example, the Caminandes 3: Llamigos video is about 213 MiB (five sizes for HLS and one progressive MP4). It would cost us less than 0.01 € per month for storage and 1.42 € for bandwidth if 1000 people watch the 1080p version from beginning to end—unlikely.5

We use s3cmd to upload files. First, you need to recover your API credentials from the portal and put them in ~/.s3cfg:

[default]
host_base = sos-ch-dk-2.exo.io
host_bucket = %(bucket)s.sos-ch-dk-2.exo.io
access_key = EXO.....
secret_key = ....
use_https = True
bucket_location = ch-dk-2


The second step is to create a bucket:

$ s3cmd mb s3://hls-videos
Bucket 's3://hls-videos/' created

You need to configure the CORS policy for this bucket. First, define the policy in a cors.xml file (you may want to restrict the allowed origin):

<CORSConfiguration>
 <CORSRule>
  <AllowedOrigin>*</AllowedOrigin>
  <AllowedMethod>GET</AllowedMethod>
 </CORSRule>
</CORSConfiguration>

Then, apply it to the bucket:

$ s3cmd setcors cors.xml s3://hls-videos


The last step is to copy the static files. Playlists are served compressed to save a bit of bandwidth. For each video, inside the directory containing all the generated files, use the following command:

while read extension mime gz; do
  [ -z "$gz" ] || {
    # gzip compression (if not already done)
    for f in *.${extension}; do
      ! gunzip -t $f 2> /dev/null || continue
      gzip $f
      mv $f.gz $f
    done
  }
  s3cmd --no-preserve -F -P \
    ${gz:+--add-header=Content-Encoding:gzip} \
    --mime-type=${mime} \
    --encoding=UTF-8 \
    --exclude=* --include=*.${extension} \
    --delete-removed \
    sync . s3://hls-videos/video1/
done <<EOF
m3u8 application/vnd.apple.mpegurl true
jpg  image/jpeg
mp4  video/mp4
ts   video/mp2t
EOF

The files are now available at https://hls-videos.sos-ch-dk-2.exo.io/video1/.

HTML

We can insert our video in a document with the following markup:

<video poster="https://hls-videos.sos-ch-dk-2.exo.io/video1/poster.jpg"
       controls preload="none">
  <source src="https://hls-videos.sos-ch-dk-2.exo.io/video1/index.m3u8"
          type="application/vnd.apple.mpegurl">
  <source src="https://hls-videos.sos-ch-dk-2.exo.io/video1/progressive.mp4"
          type='video/mp4; codecs="avc1.4d401f, mp4a.40.2"'>
</video>

Browsers with native support use the HLS version while others fall back to the progressive MP4 version. However, with the help of hls.js, we can ensure most browsers benefit from the HLS version too:

<script src="https://cdn.jsdelivr.net/npm/hls.js@latest"></script>
<script>
    if (Hls.isSupported()) {
        var selector = "video source[type='application/vnd.apple.mpegurl']",
            videoSources = document.querySelectorAll(selector);
        videoSources.forEach(function(videoSource) {
            var m3u8 = videoSource.src,
                once = false;
            // Clone the video to remove any source
            var oldVideo = videoSource.parentNode,
                newVideo = oldVideo.cloneNode(false);
            // Replace video tag with our clone.
            oldVideo.parentNode.replaceChild(newVideo, oldVideo);
            // On play, initialize hls.js, once.
            newVideo.addEventListener('play', function() {
                if (once) return;
                once = true;
                var hls = new Hls({ capLevelToPlayerSize: true });
                hls.loadSource(m3u8);
                hls.attachMedia(newVideo);
                hls.on(Hls.Events.MANIFEST_PARSED, function() {
                    newVideo.play();
                });
            }, false);
        });
    }
</script>

Here is the result, featuring Caminandes 3: Llamigos, a video created by Pablo Vasquez, produced by the Blender Foundation and released under the Creative Commons Attribution 3.0 license:

Most JavaScript attributes, methods and events work just like with a plain <video> element.
For example, you can seek to an arbitrary position, like 1:00 or 2:00—but you would need to enable JavaScript to test. The player is different from one browser to another but provides the basic needs. You can upgrade to a more advanced player, like video.js or MediaElement.js. They also handle HLS videos through hls.js.

Hosting your videos on YouTube is not unavoidable: serving them yourself while offering quality delivery is technically affordable. If bandwidth requirements are modest and the network effect not important, self-hosting makes it possible to regain control of the published content and not to turn over readers to Google. In the same spirit, PeerTube offers a video sharing platform. Decentralized and federated, it relies on BitTorrent to reduce bandwidth requirements.

Addendum

Preloading

In the above example, preload="none" was used for two reasons:

• Most readers won't play the video, as it is an addon to the main content. Therefore, bandwidth is not wasted by downloading a few segments of video, at the expense of slightly increased latency on play.
• We do not want non-native HLS clients to start downloading the non-HLS version while hls.js is loading and taking over the video. This could also be done by declaring the progressive MP4 fallback from JavaScript, but this would make the video unplayable for users without JavaScript.

If preloading is important, you can remove the preload attribute from JavaScript—and not wait for the play event to initialize hls.js.

CSP

Setting up CSP correctly can be quite a pain. For browsers with native HLS support, you need the following policy, in addition to your existing policy:

• image-src https://hls-videos.sos-ch-dk-2.exo.io for the posters,
• media-src https://hls-videos.sos-ch-dk-2.exo.io for the playlists and media segments.

With hls.js, things are more complex.
Ideally, the following policy should also be applied:

• worker-src blob: for the transmuxing web worker,
• media-src blob: for the transmuxed segments,
• connect-src https://hls-videos.sos-ch-dk-2.exo.io to fetch playlists and media segments from JavaScript.

However, worker-src is quite recent. The expected fallbacks are child-src (deprecated), script-src (but not everywhere) and then default-src. Therefore, for broader compatibility, you also need to append blob: to default-src as well as to script-src and child-src if you already have them. Here is an example policy—assuming the original policy was just default-src 'self' and media, XHR and workers were not needed:

HTTP/1.0 200 OK
Content-Security-Policy:
  default-src 'self' blob:;
  image-src 'self' https://hls-videos.sos-ch-dk-2.exo.io;
  media-src blob: https://hls-videos.sos-ch-dk-2.exo.io;
  connect-src https://hls-videos.sos-ch-dk-2.exo.io;
  worker-src blob:;

1. YouTube gives you the choice to not display ads on your videos. In advanced settings, you can unselect "Allow advertisements to be displayed alongside my videos." Alternatively, you can also monetize your videos. ↩︎
2. Nowadays, everything supports MP4/H.264. It usually also brings hardware acceleration, which improves battery life on mobile devices. WebM/VP9 provides a better quality at the same bitrate. ↩︎
3. You could generate both formats and use them as variants in the master playlist. However, a limitation in hls.js prevents this option. ↩︎
4. Use https://example.org instead of the wildcard character to restrict access to your own domain. ↩︎
5. There is no need to host those files behind a (costly) CDN. Latency doesn't matter much as long as you can sustain the appropriate bandwidth. ↩︎

April 16, 2018

Errata Security

Notes on setting up Raspberry Pi 3 as WiFi hotspot

I want to sniff the packets for IoT devices.
There are a number of ways of doing this, but one straightforward mechanism is configuring a "Raspberry Pi 3 B" as a WiFi hotspot, then running tcpdump on it to record all the packets that pass through it. Google gives lots of results on how to do this, but they all demand that you have the precise hardware, WiFi hardware, and software that the authors do, so that's a pain. I got it working using the instructions here. There are a few additional notes, which is why I'm writing this blogpost, so I remember them.

https://www.raspberrypi.org/documentation/configuration/wireless/access-point.md

I'm using the RPi-3-B and not the RPi-3-B+, and the latest version of Raspbian at the time of this writing, "Raspbian Stretch Lite 2018-3-13".

Some things didn't work as described. The first is that it couldn't find the package "hostapd". The solution was to run "apt-get update" a second time. The second problem was an error message about the NAT not working when trying to set the masquerade rule. That's because the 'upgrade' updates the kernel, making the running system out-of-date with the files on the disk. The solution to that is to make sure you reboot after upgrading. Thus, what you do at the start is:

apt-get update
apt-get upgrade
apt-get update
shutdown -r now

Then it's just "apt-get install tcpdump" and start capturing on wlan0. This will get the non-monitor-mode Ethernet frames, which is what I want.

My letter urging Georgia governor to veto anti-hacking bill

February 16, 2018

Office of the Governor
206 Washington Street
111 State Capitol
Atlanta, Georgia 30334

Re: SB 315

Dear Governor Deal:

I am writing to urge you to veto SB315, the "Unauthorized Computer Access" bill.

The cybersecurity community, of which Georgia is a leader, is nearly unanimous that SB315 will make cybersecurity worse. You've undoubtedly heard from many of us opposing this bill. It does not help in prosecuting foreign hackers who target Georgian computers, such as our elections systems.
Instead, it prevents those who notice security flaws from pointing them out, thereby getting them fixed. This law violates the well-known Kerckhoffs's Principle: security is achieved through transparency and openness, not through secrecy and obscurity.

That the bill contains this flaw is no accident. The justification for this bill comes from an incident where a security researcher noticed that a Georgia state election system had made voter information public. This remained unfixed, months after the vulnerability was first disclosed, leaving the data exposed. Those in charge decided that it was better to prosecute those responsible for discovering the flaw rather than punish those who failed to secure Georgia voter information, hence this law.

Too many security experts oppose this bill for it to go forward. Signing this bill, one that is weak on cybersecurity by favoring political cover-up over the consensus of the cybersecurity community, will be part of your legacy. I urge you instead to veto this bill, commanding the legislature to write a better one, this time consulting experts, which, due to Georgia's thriving cybersecurity community, we do not lack.

Thank you for your attention.

Sincerely,
Robert Graham
(formerly) Chief Scientist, Internet Security Systems

ma.ttias.be

Upcoming presentation at LOADays: Varnish Internals – Speeding up a site x100

The post Upcoming presentation at LOADays: Varnish Internals – Speeding up a site x100 appeared first on ma.ttias.be.

I'll be speaking at LOADays next Sunday about Varnish. If you happen to be around, come say hi -- I'll be there all day!

Varnish Internals -- Speeding up a site x100

In this talk we'll look at the internals of Varnish, a reverse proxy with powerful caching abilities. We'll walk through an HTTP request end-to-end, manipulate and change it in ways that no one should ever do in production -- but it'll prove how powerful Varnish can be.
Varnish is a load balancer, a caching engine, its own scripting language, and a fun way to deep-dive into the HTTP protocol.

Errata Security

Let's stop talking about password strength

Picture from EFF -- CC-BY license

Near the top of most security recommendations is to use "strong passwords". We need to stop doing this. Yes, weak passwords can be a problem. If a website gets hacked, weak passwords are easier to crack. It's not that this is wrong advice.

On the other hand, it's not particularly good advice, either. It's far down the list of important advice that people need to remember. "Weak passwords" are nowhere near the risk of "password reuse". When your Facebook or email account gets hacked, it's because you used the same password across many websites, not because you used a weak password.

Important websites, where the strength of your password matters, already take care of the problem. They use strong, salted hashes on the backend to protect the password. On the frontend, they force passwords to be a certain length and a certain complexity. Maybe the better advice is to not trust any website that doesn't enforce stronger passwords (a minimum of 8 characters consisting of both letters and non-letters).

To some extent, this "strong password" advice has become obsolete. A decade ago, websites had poor protection (MD5 hashes) and no enforcement of complexity, so it was up to the user to choose strong passwords. Now that important websites have changed their behavior, such as using bcrypt, there is less onus on the user.

But the real issue here is that "strong password" advice reflects the evil, authoritarian impulses of the infosec community. Instead of measuring insecurity in terms of costs vs. benefits, risks vs. rewards, we insist that it's an issue of moral weakness. We pretend that flaws happen because people are greedy, lazy, and ignorant.
We pretend that security is its own goal, a benefit we should achieve, rather than a cost we must endure. We like giving moral advice because it's easy: just be "stronger". Discussing "password reuse" is more complicated, forcing us to discuss password managers, writing down passwords on paper, the fact that it's okay to reuse passwords for crappy websites you don't care about, and so on.

What I'm trying to say is that the moral weakness here is us. Rather than give pertinent advice, we give lazy advice. We give advice that victim-shames people for being weak while pretending that we are strong. So stop telling people to use strong passwords. It's crass advice on your part and largely unhelpful for your audience, distracting them from the more important things.

Raymii.org

OpenVMS 7.3 install log with simh VAX on Ubuntu 16.04

Using a guide I was able to install OpenVMS 7.3 for VAX on simh on Ubuntu 16.04. This is a copy-paste of my terminal for future reference. This is not one of my usual articles, a guide with comprehensive information and background. Just a log of my terminal.

April 15, 2018

Raymii.org

File versioning and deleting on OpenVMS with DELETE and PURGE

I'm now a few weeks into my OpenVMS adventure and my home folder on the [DECUS](http://decus.org) system is quite cluttered with files. More specifically, with different versions of files, since OpenVMS by default has file versioning built in. This means that when you edit a file, or copy a file over an existing file, the old file is not overwritten but a new file with a new version is written. The old file is still there. In my humble opinion this is one of the best things so far on OpenVMS, but it does require maintenance to keep the disk from filling up fast. This article goes into the PURGE and DELETE commands, which help you deal with file versioning and removal.
April 11, 2018

Everything Sysadmin

Accelerate NYC Launch Party, Saturday, April 21

The NYC launch event for Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations by Nicole Forsgren, PhD, Jez Humble, and Gene Kim will be held this Saturday from 11am-2pm. All are invited. Space is limited. Please RSVP at EventBrite.

I'm super excited by this book for two reasons: (1) It explains the business case for devops in a way that speaks to executives. (2) It is based on real data with statistical correlations that show real cause and effect.

I'll be at this event. I hope to see you there too!

Steve Kemp's Blog

Bread and data

For the past two weeks I've mostly been baking bread. I'm not sure what made me decide to make some the first time, but it actually turned out pretty good, so I've been doing it every day or two ever since.

This is the first time I've made bread in the past 20 years or so - I recall in the past I got frustrated that it never rose, or didn't turn out well. I can't see that I'm doing anything differently, so I'll just write it off as younger-Steve being daft! No doubt I'll get bored of the delicious bread in the future, but for the moment I've got a good routine going - juggling going to the shops, child-care, and making bread.

Bread I've made includes the following:

Beyond that I've spent a little while writing a simple utility to embed resources in golang projects, after discovering the tool I'd previously been using, go-bindata, had been abandoned. In short you feed it a directory of files and it will generate a file static.go with contents like this:

files[ "data/index.html" ] = "<html>....
files[ "data/robots.txt" ] = "User-Agent: * ..."

It's a bit more complex than that, but not much. As expected, getting the embedded data at runtime is trivial, and it allows you to distribute a single binary even if you want/need some configuration files, templates, or media to run.
For example, in the project I discussed in my previous post there is an HTTP-server which serves a user-interface based upon Bootstrap. I want the HTML-files which make up that user-interface to be embedded in the binary, rather than distributing them separately. Anyway it's not unique, it was a fun experience writing, and I've switched to using it now:

April 09, 2018

Everything Sysadmin

ZFS Users Conference, April 19-20, Norwalk, CT

Datto will be hosting the 2nd annual ZFS User Conference featuring ZFS co-creator Matt Ahrens! The date is April 19-20 at Datto HQ in Norwalk, CT. This conference will focus on the deployment, administration, features, and tuning of the ZFS filesystem. Learn about OpenZFS and network with folks running businesses and interesting projects on ZFS. For more information and registration see http://zfs.datto.com

(I won't be attending as I'm no longer using ZFS, but I'm still a ZFS fanboy so I felt like promoting this.)

April 08, 2018

Electricmonk.nl

Multi-git-status now shows branches with no upstream

Just a quick update on multi-git-status. It now also shows branches with no upstream. These are typically branches created locally that haven't been configured to track a local or remote branch. Any changes in those branches are lost when the repo is removed from your machine. Additionally, multi-git-status now handles branches with slashes in them properly. For example, "feature/loginscreen". Here's how the output looks now:

You can get multi-git-status from the Github page.

April 07, 2018

Sarah Allen

zero-knowledge proof: trust without shared secrets

In cryptography we typically share a secret which allows us to decrypt future messages. Commonly this is a password that I make up and submit to a Web site, then later produce to verify I am the same person.
I missed Kazue Sako's Zero Knowledge Proofs 101 presentation at IIW last week, but Rachel Myers shared an impressively simple retelling in the car on the way back to San Francisco, which inspired me to read the notes and review the proof for myself. I've attempted to reproduce this simple explanation below, also noting additional sources and related articles.

Zero Knowledge Proofs (ZKPs) are very useful when applied to internet identity — with an interactive exchange you can prove you know a secret without actually revealing the secret.

Understanding Zero Knowledge Proofs with simple math:

x -> f(x)

Simple one-way function. Easy to go one way from x to f(x) but mathematically hard to go from f(x) to x. The most common example is a hash function. Wired: What is Password Hashing? provides an accessible introduction to why hash functions are important to cryptographic applications today.

f(x) = g ^ x mod p

Known (public): g, p
* g is a constant
* p has to be prime

Easy to know x and compute g ^ x mod p, but difficult to do in reverse.

Interactive Proof

Alice wants to prove to Bob that she knows x without giving any information about x. Bob already knows f(x). Alice can make f(x) public and then prove that she knows x through an interactive exchange with anyone on the Internet, in this case, Bob.

1. Alice publishes f(x): g^x mod p
2. Alice picks random number r
3. Alice sends Bob u = g^r mod p
4. Now Bob has an artifact based on that random number, but can't actually calculate the random number
5. Bob returns a challenge e. Either 0 or 1
6. Alice responds with v: if e == 0, v = r; if e == 1, v = r + x
7. Bob can now calculate: if e == 0, Bob has the random number r, as well as the publicly known variables, and can check whether u == g^v mod p; if e == 1, he checks whether u*f(x) = g^v (mod p)

I believe step 6 is true based on Congruence of Powers, though I'm not sure that I've transcribed the e == 1 case accurately with my limited ascii representation.
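The exchange above is small enough to simulate directly. A toy Python sketch follows; the prime, generator and secret are made-up toy values of mine, and a real deployment would use large primes and many rounds:

```python
import random

p = 101   # public prime modulus (toy size only)
g = 2     # public generator
x = 37    # Alice's secret

fx = pow(g, x, p)  # Alice publishes f(x) = g^x mod p

def run_round():
    # Step 2-3: Alice commits to a random r and sends u = g^r mod p.
    r = random.randrange(p - 1)
    u = pow(g, r, p)
    # Step 5: Bob issues a random challenge e in {0, 1}.
    e = random.randrange(2)
    # Step 6: Alice responds with v.
    v = r if e == 0 else r + x
    # Step 7: Bob verifies without ever learning x.
    if e == 0:
        return pow(g, v, p) == u
    else:
        return pow(g, v, p) == (u * fx) % p

assert all(run_round() for _ in range(20))
```

The e == 1 check works because g^(r+x) = g^r * g^x = u * f(x) (mod p), which is the congruence the author refers to above.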
If r is truly random, equally distributed between zero and (p-1), this does not leak any information about x, which is pretty neat, yet not sufficient. In order to ensure that Alice cannot be impersonated, multiple iterations are required along with the use of large numbers (see IIW session notes).

Further Reading

Cryptography Engineering

Hash-based Signatures: An illustrated Primer

Over the past several years I've been privileged to observe two contradictory and fascinating trends. The first is that we're finally starting to use the cryptography that researchers have spent the past forty years designing. We see this every day in examples ranging from encrypted messaging to phone security to cryptocurrencies.

The second trend is that cryptographers are getting ready for all these good times to end.

But before I get to all of that — much further below — let me stress that this is not a post about the quantum computing apocalypse, nor is it about the success of cryptography in the 21st century. Instead I'm going to talk about something much more wonky. This post will be about one of the simplest (and coolest!) cryptographic technologies ever developed: hash-based signatures.

Hash-based signature schemes were first invented in the late 1970s by Leslie Lamport, and significantly improved by Ralph Merkle and others. For many years they were largely viewed as an interesting cryptographic backwater, mostly because they produce relatively large signatures (among other complications). However, in recent years these constructions have enjoyed something of a renaissance, largely because — unlike signatures based on RSA or the discrete logarithm assumption — they're largely viewed as resistant to serious quantum attacks like Shor's algorithm.

First some background.

Background: Hash functions and signature schemes

In order to understand hash-based signatures, it's important that you have some familiarity with cryptographic hash functions.
These functions take some input string (typically of arbitrary length) and produce a fixed-size "digest" as output. Common cryptographic hash functions like SHA2, SHA3 or Blake2 produce digests ranging from 256 bits to 512 bits.

In order for a function $H(\cdot)$ to be considered a 'cryptographic' hash, it must achieve some specific security requirements. There are a number of these, but here we'll just focus on three common ones:

1. Pre-image resistance (sometimes known as "one-wayness"): given some output $Y = H(X)$, it should be time-consuming to find an input $X$ such that $H(X) = Y$. (There are many caveats to this, of course, but ideally the best such attack should require a time comparable to a brute-force search of whatever distribution $X$ is drawn from.)
2. Second-preimage resistance: This is subtly different than pre-image resistance. Given some input $X$, it should be hard for an attacker to find a different input $X'$ such that $H(X) = H(X')$.
3. Collision resistance: It should be hard to find any two values $X_1, X_2$ such that $H(X_1) = H(X_2)$. Note that this is a much stronger assumption than second-preimage resistance, since the attacker has complete freedom to find any two messages of its choice.

The example hash functions I mentioned above are believed to provide all of these properties. That is, nobody has articulated a meaningful (or even conceptual) attack that breaks any of them. That could always change, of course, in which case we'd almost certainly stop using them. (We'll discuss the special case of quantum attacks a bit further below.)

Since our goal is to use hash functions to construct signature schemes, it's also helpful to briefly review that primitive.

A digital signature scheme is a public key primitive in which a user (or "signer") generates a pair of keys, called the public key and private key. The user retains the private key, and can use this to "sign" arbitrary messages — producing a resulting digital signature.
Anyone who has possession of the public key can verify the correctness of a message and its associated signature.

From a security perspective, the main property we want from a signature scheme is unforgeability, or “existential unforgeability”. This requirement means that an attacker (someone who does not possess the private key) should not be able to forge a valid signature on a message that you did not sign. For more on the formal definitions of signature security, see this page.

The Lamport One-Time Signature

The first hash-based signature scheme was invented in 1979 by a mathematician named Leslie Lamport. Lamport observed that given only a simple hash function — or really, a one-way function — it was possible to build an extremely powerful signature scheme. Powerful, that is, provided that you only need to sign one message! More on this below.

For the purposes of this discussion, let’s suppose we have the following ingredient: a hash function that takes in, say, 256-bit inputs and produces 256-bit outputs. SHA256 would be a perfect example of such a function. We’ll also need some way to generate random bits.

Let’s imagine that our goal is to sign 256-bit messages. To generate our secret key, the first thing we need to do is generate a series of 512 separate random bitstrings, each of 256 bits in length. For convenience, we’ll arrange those strings into two separate lists and refer to each one by an index as follows:

${\bf sk_0} = sk^{0}_1, sk^{0}_2, \dots, sk^{0}_{256}$

${\bf sk_1} = sk^{1}_1, sk^{1}_2, \dots, sk^{1}_{256}$

The lists $({\bf sk_0}, {\bf sk_1})$ represent the secret key that we’ll use for signing. To generate the public key, we now simply hash every one of those random strings using our function $H(\cdot)$.
This produces a second pair of lists:

${\bf pk_0} = H(sk^{0}_1), H(sk^{0}_2), \dots, H(sk^{0}_{256})$

${\bf pk_1} = H(sk^{1}_1), H(sk^{1}_2), \dots, H(sk^{1}_{256})$

We can now hand out our public key $({\bf pk_0}, {\bf pk_1})$ to the entire world. For example, we can send it to our friends, embed it into a certificate, or post it on Keybase.

Now let’s say we want to sign a 256-bit message $M$ using our secret key. The very first thing we do is break up and represent $M$ as a sequence of 256 individual bits:

$M_1, \dots, M_{256} \in \{0,1\}$

The rest of the signing algorithm is blindingly simple. We simply work through the message from the first bit to the last bit, and select a string from one of the two secret key lists. The list we choose from depends on the value of the message bit we’re trying to sign.

Concretely, for i=1 to 256: if the $i^{th}$ message bit $M_i = 0$, we grab the $i^{th}$ secret key string $(sk^{0}_i)$ from the ${\bf sk_0}$ list, and output that string as part of our signature. If the message bit $M_i = 1$ we copy the appropriate string $(sk^{1}_i)$ from the ${\bf sk_1}$ list. Having done this for each of the message bits, we concatenate all of the strings we selected. This forms our signature.

Here’s a toy illustration of the process, where (for simplicity) the secret key and message are only eight bits long. Notice that each colored box below represents a different 256-bit random string:

When a user — who already has the public key $({\bf pk_0}, {\bf pk_1})$ — receives a message $M$ and a signature, she can verify the signature easily. Let $s_i$ represent the $i^{th}$ component of the signature. For each such string, she simply checks the corresponding message bit $M_i$ and computes the hash $H(s_i)$. If $M_i = 0$ the result should match the corresponding element from ${\bf pk_0}$. If $M_i = 1$ the result should match the element in ${\bf pk_1}$.
The signature is valid if every single element of the signature, when hashed, matches the correct portion of the public key. Here’s an (admittedly) sketchy illustration of the verification process, for at least one signature component:

If your initial impression of Lamport’s scheme is that it’s kind of insane, you’re both a bit right and a bit wrong.

Let’s start with the negative. First, it’s easy to see that Lamport signatures and keys are quite large: on the order of thousands of bits. Moreover — and much more critically — there is a serious security limitation on this scheme: each key can only be used to sign one message. This makes Lamport’s scheme an example of what’s called a “one time signature”.

To understand why this restriction exists, recall that every Lamport signature reveals exactly one of the two possible secret key values at each position. If I only sign one message, the signature scheme works well. However, if I ever sign two messages that differ at any bit position $i$, then I’m going to end up handing out both secret key values for that position. This can be a problem.

Imagine that an attacker sees two valid signatures on different messages. She may be able to perform a simple “mix and match” forgery attack that allows her to sign a third message that I never actually signed. Here’s how that might look in our toy example:

The degree to which this hurts you really depends on how different the messages are and how many of them you’ve given the attacker to play with. But it’s rarely good news.

So, to sum up our observations about the Lamport signature scheme: it’s simple. It’s fast. And yet for various practical reasons it kind of sucks. Maybe we can do a little better.

From one-time to many-time signatures: Merkle’s tree-based signature

While the Lamport scheme is a good start, our inability to sign many messages with a single key is a huge drawback. Nobody was more inspired by this than Martin Hellman’s student Ralph Merkle.
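The whole scheme fits in a few lines of code. Here is a minimal Python sketch of Lamport key generation, signing, and verification (a toy for illustration only; it ignores production concerns such as constant-time comparison):

```python
import hashlib
import os

H = lambda x: hashlib.sha256(x).digest()

def keygen():
    # sk[0] signs '0' bits, sk[1] signs '1' bits: 2 x 256 random 256-bit strings.
    sk = [[os.urandom(32) for _ in range(256)] for _ in range(2)]
    pk = [[H(s) for s in row] for row in sk]
    return sk, pk

def bits(msg32):
    # Break a 32-byte message into a list of 256 bits, MSB first.
    return [(msg32[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]

def sign(sk, msg32):
    # For each message bit, reveal the matching secret string.
    return [sk[b][i] for i, b in enumerate(bits(msg32))]

def verify(pk, msg32, sig):
    # Each revealed string, hashed, must match the right public key element.
    return all(H(s) == pk[b][i]
               for i, (b, s) in enumerate(zip(bits(msg32), sig)))

sk, pk = keygen()
msg = hashlib.sha256(b"attack at dawn").digest()
sig = sign(sk, msg)
assert verify(pk, msg, sig)
```

Note how signing involves no hashing at all, just data copying; the only hash evaluations happen at key generation and verification time.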
He quickly came up with a clever way to address this problem. While we can’t exactly retrace Merkle’s steps, let’s see if we can recover some of the obvious ideas.

Let’s say our goal is to use Lamport’s signature to sign many messages — say $N$ of them. The most obvious approach is to simply generate $N$ different keypairs for the original Lamport scheme, then concatenate all the public keys together into one mega-key. (Mega-key is a technical term I just invented.)

If the signer holds on to all $N$ secret key components, she can now sign $N$ different messages by using exactly one secret Lamport key per message. This seems to solve the problem without ever requiring her to re-use a secret key. The verifier has all the public keys, and can verify all the received messages. No Lamport keys are ever used to sign twice.

Obviously this approach sucks big time. Specifically, in this naive approach, signing $N$ times requires the signer to distribute a public key that is $N$ times as large as a normal Lamport public key. (She’ll also need to hang on to a similar pile of secret keys.) At some point people will get fed up with this, and probably $N$ won’t ever get to be very large.

Enter Merkle. What Merkle proposed was a way to retain the ability to sign $N$ different messages, but without the linear-cost blowup of public keys. Merkle’s idea worked like this:

1. First, generate $N$ separate Lamport keypairs. We can call those $(PK_1, SK_1), \dots, (PK_N, SK_N)$.

2. Next, place each public key at one leaf of a Merkle hash tree (see below), and compute the root of the tree. This root will become the “master” public key of the new Merkle signature scheme.

3. The signer retains all of the Lamport public and secret keys for use in signing.

Merkle trees are described here. Roughly speaking, what they provide is a way to collect many different values such that they can be represented by a single “root” hash (of length 256 bits, using the hash function in our example).
Given this hash, it’s possible to produce a simple “proof” that an element is in a given hash tree. Moreover, this proof has size that is logarithmic in the number of leaves in the tree.

To sign the $i^{th}$ message, the signer simply selects the $i^{th}$ public key from the tree, and signs the message using the corresponding Lamport secret key. Next, she concatenates the resulting signature to the Lamport public key and tacks on a “Merkle proof” that shows that this specific Lamport public key is contained within the tree identified by the root (i.e., the public key of the entire scheme). She then transmits this whole collection as the signature of the message.

(To verify a signature of this form, the verifier simply unpacks this “signature” as a Lamport signature, Lamport public key, and Merkle proof. She verifies the Lamport signature against the given Lamport public key, and uses the Merkle proof to verify that the Lamport public key is really in the tree. With these three checks complete, she can trust the signature as valid.)

This approach has the disadvantage of increasing the “signature” size by more than a factor of two. However, the master public key for the scheme is now just a single hash value, which makes this approach scale much more cleanly than the naive solution above.

As a final optimization, the secret key data can itself be “compressed” by generating all of the various secret keys using the output of a cryptographic pseudorandom number generator, which allows for the generation of a huge number of (apparently random) bits from a single short ‘seed’.

Whew.

Making signatures and keys (a little bit) more efficient

Merkle’s approach allows any one-time signature to be converted into an $N$-time signature. However, his construction still requires us to use some underlying one-time signature like Lamport’s scheme. Unfortunately the (bandwidth) costs of Lamport’s scheme are still relatively high.
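To make the “Merkle proof” idea concrete, here is a minimal Python sketch of computing a root, producing a membership proof, and verifying it. (This assumes a power-of-two number of leaves; the "lamport-pk" byte strings below are stand-ins for real serialized Lamport public keys.)

```python
import hashlib

H = lambda x: hashlib.sha256(x).digest()

def merkle_root(leaves):
    level = [H(leaf) for leaf in leaves]
    while len(level) > 1:
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves, index):
    # Collect the sibling hash at each level on the path from leaf to root.
    level = [H(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        proof.append(level[index ^ 1])
        index //= 2
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return proof

def verify_proof(leaf, index, proof, root):
    # Recompute the path to the root; the proof is log2(#leaves) hashes long.
    node = H(leaf)
    for sib in proof:
        node = H(node + sib) if index % 2 == 0 else H(sib + node)
        index //= 2
    return node == root

# Eight stand-in "Lamport public keys".
pks = [b"lamport-pk-%d" % i for i in range(8)]
root = merkle_root(pks)              # the single 256-bit "master" public key
proof = merkle_proof(pks, 5)         # 3 hashes, since log2(8) = 3
assert verify_proof(pks[5], 5, proof, root)
```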
There are two major optimizations that can help to bring down these costs. The first was also proposed by Merkle. We’ll cover this simple technique first, mainly because it helps to explain the more powerful approach.

If you recall Lamport’s scheme, in order to sign a 256-bit message we required a vector consisting of 512 separate secret key (and public key) bitstrings. The signature itself was a collection of 256 of the secret bitstrings. (These numbers were motivated by the fact that each bit of the message to be signed could be either a “0” or a “1”, and thus the appropriate secret key element would need to be drawn from one of two different secret key lists.)

But here’s a thought: what if we don’t sign all of the message bits?

Let’s be a bit more clear. In Lamport’s scheme we sign every bit of the message — regardless of its value — by outputting one secret string. What if, instead of signing both zero values and one values in the message, we signed only the message bits that were equal to one?

This would cut the public and secret key sizes in half, since we could get rid of the ${\bf sk_0}$ list entirely. We would now have only a single list of bitstrings $sk_1, \dots, sk_{256}$ in our secret key. For each bit position of the message where $M_i = 1$ we would output a string $sk_i$. For every position where $M_i = 0$ we would output… zilch. (This would also tend to reduce the size of signatures, since many messages contain a bunch of zero bits, and those would now ‘cost’ us nothing!)

An obvious problem with this approach is that it’s horrendously insecure. Please do not implement this scheme! As an example, let’s say an attacker observes a (signed) message that begins with “1111…”, and she wants to edit the message so it reads “0000…” — without breaking the signature. All she has to do to accomplish this is to delete several components of the signature! In short, while it’s very difficult to “flip” a zero bit into a one bit, it’s catastrophically easy to do the reverse.
But it turns out there’s a fix, and it’s quite elegant. You see, while we can’t prevent an attacker from editing our message by turning one bits into zero bits, we can catch them. To do this, we tack on a simple “checksum” to the message, then sign the combination of the original message and the checksum. The signature verifier must verify the entire signature over both values, and also ensure that the received checksum is correct.

The checksum we use is trivial: it consists of a simple binary integer that represents the total number of zero bits in the original message.

If the attacker tries to modify the content of the message (excluding the checksum) in order to turn some one bit into a zero bit, the signature scheme won’t stop her. But this attack will have the effect of increasing the number of zero bits in the message. This will immediately make the checksum invalid, and the verifier will reject the signature.

Of course, a clever attacker might also try to mess with the checksum (which is also signed along with the message) in order to “fix it up” by increasing the integer value of the checksum. However — and this is critical — since the checksum is a binary integer, in order to increase the value of the checksum, she would always need to turn some zero bit of the checksum into a one bit. But since the checksum is also signed, and the signature scheme prevents this kind of change, the attacker has nowhere to go.

(If you’re keeping track at home, this does somewhat increase the size of the ‘message’ to be signed. In our 256-bit message example, the checksum will require an additional nine bits — since the zero-count can range from 0 to 256 — and a corresponding signature cost. However, if the message has many zero bits, the reduced signature size will typically still be a win.)
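The checksum trick is short enough to sketch directly in Python (using nine checksum bits, since the zero-count of a 256-bit message can range from 0 to 256):

```python
def with_checksum(msg_bits):
    # Append the count of zero bits, encoded in binary, MSB first.
    c = msg_bits.count(0)
    return msg_bits + [(c >> i) & 1 for i in range(8, -1, -1)]

# The attack: flipping one bits to zero is easy (just drop signature parts).
msg      = [1, 1, 1, 1] + [0] * 252
tampered = [0, 0, 0, 0] + [0] * 252

old, new = with_checksum(msg)[256:], with_checksum(tampered)[256:]
# The tampering increased the zero-count, so the (signed) checksum changed...
assert old != new
# ...and since the new count is larger, "fixing up" the checksum would
# require turning some zero bit of it into a one bit, which the
# signature scheme prevents.
assert any(o == 0 and n == 1 for o, n in zip(old, new))
```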
It still gives keys and signatures that are thousands of bits long. It would be nice if we could make a bigger dent in those numbers.

The final optimization we’ll talk about was proposed by Robert Winternitz as a further optimization of Merkle’s technique above. In practical use it gives a 4-8x reduction in the size of signatures and public keys — at the cost of increasing both signing and verification time. Winternitz’s idea is an example of a technique called a “time-space tradeoff”. This term refers to a class of solutions in which space is reduced at the cost of adding more computation time (or vice versa).

To explain Winternitz’s approach, it helps to ask the following question: what if, instead of signing messages composed of bits (0 or 1), we treated our messages as though they were encoded using larger symbol alphabets? For example, what if we signed four-bit ‘nibbles’? Or eight-bit bytes?

In Lamport’s original scheme, we had two lists of bitstrings as part of the signing (and public) key. One was for signing zero message bits, and the other was for one bits. Now let’s say we want to sign bytes rather than bits. An obvious idea would be to increase the number of secret key lists (and public key lists) from two to 256 — one list for each possible value of a message byte. The signer could work through the message one byte at a time, and pick from the much larger menu of key values.

Unfortunately, this solution really stinks. It reduces the size of the signature by a factor of eight, at the cost of increasing the public and secret key size by a factor of 256. Even this might be fine if the public keys could be used for many signatures, but they can’t — when it comes to key re-use, this “byte signing version of Lamport” suffers from the same limitations as the original Lamport signature.

All of which brings us to Winternitz’s idea.
Since it’s too expensive to store and distribute 256 truly random lists, what if we generated those lists programmatically only when we needed them?

Winternitz’s idea was to generate a single list of random seeds ${\bf sk_0} = (sk^0_1, \dots, sk^0_{256})$ for our initial secret key. Rather than generating the additional lists randomly, he proposed to use the hash function $H()$ on each element of that initial secret key, in order to derive the next such list for the secret key: ${\bf sk_1} = (sk^{1}_1, \dots, sk^{1}_{256}) = (H(sk^{0}_1), \dots, H(sk^{0}_{256}))$.

And similarly, one can use the hash function again on that list to get the next list ${\bf sk_2}$. And so on for many possible lists. This is helpful in that we now only need to store a single list of secret key values ${\bf sk_0}$, and we can derive all the other lists on demand just by applying the hash function.

But what about the public key? This is where Winternitz gets clever.

Specifically, Winternitz proposed that the public key could be derived by applying the hash function one more time to the final secret key list. This would produce a single public key list ${\bf pk}$. (In practice we only need 255 secret key lists, since we can treat the final secret key list as the public key.) The elegance of this approach is that given any one of the possible secret key values, it’s always possible to check it against the public key, simply by hashing forward multiple times and seeing if we reach a public key element. The whole process of key generation is illustrated below:

To sign the first byte of a message, we would pick a value from the appropriate list. For example, if the message byte was “0”, we would output a value from ${\bf sk_0}$ in our signature. If the message byte was “20”, we would output a value from ${\bf sk_{20}}$. For bytes with the maximal value “255” we don’t have a secret key list. That’s ok: in this case we can output an empty string, or we can output the appropriate element of ${\bf pk}$.
Note as well that in practice we don’t really need to store each of these secret key lists. We can derive any secret key value on demand given only the original list ${\bf sk}_0$. The verifier only holds the public key vector and (as mentioned above) simply hashes forward an appropriate number of times — depending on the message byte — to see whether the result is equal to the appropriate component of the public key.

Like the Merkle optimization discussed in the previous section, the scheme as presented so far has a glaring vulnerability. Since the secret keys are related (i.e., $sk^{1}_{1} = H(sk^{0}_1)$), anyone who sees a signature covering the message byte “0” can easily change the corresponding byte of the message to a “1”, and update the signature to match. In fact, an attacker can increment the value of any byte(s) in the message. Without some check on this capability, this would allow very powerful forgery attacks.

The solution to this problem is similar to the one we discussed just above. To prevent an attacker from modifying the signature, the signer calculates and also signs a checksum of the original message bytes. The structure of this checksum is designed to prevent the attacker from incrementing any of the bytes without invalidating the checksum. I won’t go into the gory details right now, but you can find them here.

It goes without saying that getting this checksum right is critical. Screw it up, even a little bit, and some very bad things can happen to you. This would be particularly unpleasant if you deployed these signatures in a production system.

Illustrated in one terrible picture, a 4-byte toy example of the Winternitz scheme looks like this:

What are hash-based signatures good for?

Throughout this entire discussion, we’ve mainly been talking about the how of hash-based signatures rather than the why of them. It’s time we addressed this. What’s the point of these strange constructions?
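Putting the pieces together, here is a toy Python sketch of the Winternitz hash chains for 4-byte messages. (The checksum is deliberately omitted, so this version is vulnerable to exactly the increment attack described above; the parameter names are my own.)

```python
import hashlib
import os

H = lambda x: hashlib.sha256(x).digest()

def chain(x, n):
    # Apply H() to x, n times in a row.
    for _ in range(n):
        x = H(x)
    return x

W = 255  # chain length: sk_b = H^b(sk_0), and pk = H^255(sk_0)

# One secret seed per message byte position (toy: 4-byte messages).
sk0 = [os.urandom(32) for _ in range(4)]
pk = [chain(s, W) for s in sk0]  # public key = the end of each chain

def sign_byte(seed, b):
    # Reveal the b-th link of the chain; byte 255 reveals the pk element itself.
    return chain(seed, b)

def verify_byte(sig, b, pk_elem):
    # Hash forward the remaining (255 - b) steps and compare to the public key.
    return chain(sig, W - b) == pk_elem

msg = bytes([0, 20, 255, 7])
sig = [sign_byte(sk0[i], b) for i, b in enumerate(msg)]
assert all(verify_byte(sig[i], b, pk[i]) for i, b in enumerate(msg))
```

Signing a byte costs at most 255 hash evaluations, and verification costs the remainder of the chain: that is the time half of the time-space tradeoff.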
One early argument in favor of hash-based signatures is that they’re remarkably fast and simple. Since they require only the evaluation of a hash function and some data copying, from a purely computational cost perspective they’re highly competitive with schemes like ECDSA and RSA. This could hypothetically be important for lightweight devices. Of course, this efficiency comes at a huge tradeoff in bandwidth efficiency.

However, there is a more complicated reason for the (recent) uptick in attention to hash-based signature constructions. This stems from the fact that all of our public-key crypto is about to be broken. More concretely: the imminent arrival of quantum computers is going to have a huge impact on the security of nearly all of our practical signature schemes, ranging from RSA to ECDSA and so on. This is due to the fact that Shor’s algorithm (and its many variants) provides us with a polynomial-time algorithm for solving the discrete logarithm and factoring problems, which is likely to render most of these schemes insecure.

Most implementations of hash-based signatures are not vulnerable to Shor’s algorithm. That doesn’t mean they’re completely immune to quantum computers, of course. The best general quantum attacks on hash functions are based on a search technique called Grover’s algorithm, which reduces the effective security of a hash function. However, the reduction in effective security is nowhere near as severe as Shor’s algorithm (it ranges between the square root and cube root), and so security can be retained by simply increasing the internal capacity and output size of the hash function. Hash functions like SHA3 were explicitly developed with large digest sizes to provide resilience against such attacks.

So at least in theory, hash-based signatures are interesting because they provide us with a line of defense against future quantum computers — for the moment, anyway.

What about the future?
Note that so far I’ve only discussed some of the “classical” hash-based signature schemes. All of the schemes I described above were developed in the 1970s or early 1980s. This hardly brings us up to present day.

After I wrote the initial draft of this article, a few people asked for pointers on more recent developments in the field. I can’t possibly give an exhaustive list here, but let me describe just a couple of the more recent ideas that others brought up (thanks to Zooko and Claudio Orlandi):

Signatures without state. A limitation of all the signature schemes above is that they require the signer to keep state between signatures. In the case of one-time signatures the reasoning is obvious: you have to avoid using any key more than once. But even in the multi-time Merkle signature, you have to remember which leaf public key you’re using, so you can avoid using any leaf twice. Even worse, the Merkle scheme requires the signer to construct all the keypairs up front, so the total number of signatures is bounded.

In the 1980s, Oded Goldreich pointed out that one can build signatures without these limitations. The idea is as follows: rather than generate all signatures up front, one can generate a short “certification tree” of one-time public keys. Each of these keys can be used to sign additional one-time public keys at a lower layer of the tree, and so on and so forth. Provided all of the private keys are generated deterministically using a single seed, the full tree need not be constructed at key generation time, but can be built on demand whenever a new key is generated. Each signature contains a “certificate chain” of signatures and public keys starting from the root and going down to a real signing keypair at the bottom of the tree.

This technique allows for the construction of extremely “deep” trees with a vast (exponential) number of possible signing keys.
This allows us to construct so many one-time public keys that if we pick a signing key randomly (or pseudorandomly), then with high probability the same signing key will never be used twice. This is intuition, of course. For a highly optimized and specific instantiation of this idea, see the SPHINCS proposal by Bernstein et al. The concrete SPHINCS-256 instantiation gives signatures that are approximately 41KB in size.

Picnic: post-quantum zero-knowledge based signatures. In a completely different direction lies Picnic. Picnic is based on a new non-interactive zero-knowledge proof system called ZKBoo. ZKBoo is a new ZK proof system that works on the basis of a technique called “MPC in the head”, in which the prover simulates a multi-party computation of a function entirely by himself. This is too complicated to explain in a lot of detail, but the end result is that one can prove complicated statements using only hash functions.

The long and short of it is that Picnic and similar ZK proof systems provide a second direction for building signatures out of hash functions. The cost of these signatures is still quite large — hundreds of kilobytes. But future improvements in the technique could substantially reduce this size.

Epilogue: the boring security details

If you recall a bit earlier in this article, I spent some time describing the security properties of hash functions. This wasn’t just for show. You see, the security of a hash-based signature depends strongly on which properties a hash function is able to provide. (And by implication, the insecurity of a hash-based signature depends on which properties of a hash function an attacker has managed to defeat.)

Most original papers discussing hash-based signatures generally hang their security arguments on the preimage-resistance of the hash function. Intuitively, this seems pretty straightforward. Let’s take the Lamport signature as an example.
Given a public key element $pk^{0}_1 = H(sk^{0}_1)$, an attacker who is able to compute hash preimages can easily recover a valid secret key for that component of the signature. This attack obviously renders the scheme insecure.

However, this argument considers only the case where an attacker has the public key but has not yet seen a valid signature. Once the attacker has seen a signature, she has a bit more information. She now has (for example) both the public key and a portion of the secret key: $pk^{0}_1 = H(sk^{0}_1)$ and $sk^{0}_1$. If such an attacker can find a second pre-image for the public key $pk^{0}_1$ she can’t sign a different message, but she has produced a second valid signature. In the strong definition of signature security (SUF-CMA) this is actually considered a valid attack. So SUF-CMA requires the slightly stronger property of second-preimage resistance.

Of course, there’s a final issue that crops up in most practical uses of hash-based signature schemes. You’ll notice that the description above assumes that we’re signing 256-bit messages. The problem with this is that in real applications, many messages are longer than 256 bits. As a consequence, most people use the hash function $H()$ to first hash the message as $D = H(M)$ and then sign the resulting value $D$ instead of the message. This leads to a final attack on the resulting signature scheme, since the existential unforgeability of the scheme now depends on the collision-resistance of the hash function. An attacker who can find two different messages $M_1 \ne M_2$ such that $H(M_1) = H(M_2)$ has now found a valid signature on two different messages. This leads to a trivial break of EUF-CMA security.

April 06, 2018

ma.ttias.be

Chalk Talk #2: how does Varnish work?

The post Chalk Talk #2: how does Varnish work? appeared first on ma.ttias.be.

In the first Chalk Talk video, we looked at what Varnish can do. In this second video, I explain how Varnish does this.
As usual, if you like a Dutch written version, have a look at the company blog. Next videos will focus more on the technical internals, like how the hashing works, how to optimize your content & site and how to debug Varnish.

April 05, 2018

Marios Zindilis

A small web application with Angular5 and Django

Django works well as the back-end of an application that uses Angular5 in the front-end. In my attempt to learn Angular5 well enough to build a small proof-of-concept application, I couldn't find a simple working example of a combination of the two frameworks, so I created one. I called this the Pizza Maker. It's available on GitHub, and its documentation is in the README. If you have any feedback for this, please open an issue on GitHub.

Evaggelos Balaskas

Nested Loops in Ansible

Recently I needed to create a nested loop in Ansible. One of the possible issues I had to consider was backward compatibility with both Ansible v1 and Ansible v2. A few days later, Ansible 2.5 introduced the loop keyword and you can read a comprehensive blog entry here: Loop: Plays in the future, items in the past.

So here are my notes on the subject:

Variables

Below is a variable yaml file for testing purposes:

vars.yml

---
days:
  - Monday
  - Tuesday
  - Wednesday
  - Thursday
  - Friday
  - Saturday
  - Sunday
months:
  - January
  - February
  - March
  - April
  - May
  - June
  - July
  - August
  - September
  - October
  - November
  - December

Ansible v1

Let’s start with Ansible v1:

# ansible --version
ansible 1.9.6
  configured module search path = None

Playbook

Below is a very simple ansible-playbook example that supports nested loops:

---
- hosts: localhost
  gather_facts: no
  vars_files:
    - vars.yml

  tasks:
  - name: "This is a simple test"
    debug:
      msg: "Day: {{ item[0] }} exist in Month: {{ item[1] }}"
    with_nested:
      - "{{ days }}"
      - "{{ months }}"

This playbook doesn’t do much. It prints a message for every day and every month.
Ansible-Playbook

Run the playbook locally:

# ansible-playbook nested.yml -c local -l localhost -i "localhost,"

the output:

PLAY [localhost] ******************************

TASK: [This is a simple test] *****************
ok: [localhost] => (item=['Monday', 'January']) => {
    "item": [
        "Monday",
        "January"
    ],
    "msg": "Day: Monday exist in Month: January"
}
...
ok: [localhost] => (item=['Sunday', 'December']) => {
    "item": [
        "Sunday",
        "December"
    ],
    "msg": "Day: Sunday exist in Month: December"
}

PLAY RECAP *************************************
localhost : ok=1 changed=0 unreachable=0 failed=0

Messages

There are seven (7) days and twelve (12) months, so the output must print 7*12 = 84 messages. Counting the messages:

# ansible-playbook nested.yml -c local -l localhost -i "localhost," | egrep -c msg
84

Time

Measuring the time it needs to pass through the nested loop:

# time ansible-playbook nested.yml -c local -l localhost -i "localhost," &> /dev/null

real    0m0.448s
user    0m0.406s
sys     0m0.040s

0.448s, nice!

Ansible v2

Running the same playbook in the latest ansible:

# ansible-playbook nested.yml -c local -l localhost

seems to still work! Compatibility issues: resolved!

Counting the messages:

# ansible-playbook nested.yml | egrep -c msg
84

Time

# time ansible-playbook nested.yml &> /dev/null

real    0m7.396s
user    0m7.575s
sys     0m0.172s

7.396s!!! That is 7 seconds more than ansible v1.

Complex Loops

The modern way is to use the loop keyword with the nested lookup plugin:

---
- hosts: localhost
  gather_facts: no
  vars_files:
    - vars.yml

  tasks:
  - name: "This is a simple test"
    debug:
      msg: "Day: {{ item[0] }} exist in Month: {{ item[1] }}"
    loop: "{{ lookup('nested', days, months) }}"

Time

# time ansible-playbook lookup_loop.yml &> /dev/null

real    0m7.975s
user    0m8.169s
sys     0m0.177s

7.975s

Tag(s): ansible

April 03, 2018

ma.ttias.be

Varnish: same hash, different results? Check the Vary header!

The post Varnish: same hash, different results? Check the Vary header!
appeared first on ma.ttias.be.

I'll admit I get bitten by the Vary header once every few months. It's something a lot of CMS's randomly add, and it has a serious impact on how Varnish handles and treats requests. For instance, here's a request I was troubleshooting that had this varnishlog hash() data:

- VCL_call       HASH
- Hash           "/images/path/to/file.jpg%00"
- Hash           "http%00"
- Hash           "www.yoursite.tld%00"
- Hash           "/images/path/to/file.jpg.jpg%00"
- Hash           "www.yoursite.tld%00"
- VCL_return     lookup
- VCL_call       MISS

A new request, giving the exact same hashing data, would return a different page from the cache/backend. So why does a request with the same hash return different data? Let me introduce the Vary header.

In this case, the page I was requesting added the following header:

Vary: Accept-Encoding,User-Agent

This instructs Varnish to keep a separate version of each page for every value of Accept-Encoding and User-Agent it finds. The Accept-Encoding would make sense, but Varnish already handles that internally. A gzipped/plain version will return different data, that makes sense. There's no real point in adding that header for Varnish, but other proxies in between might still benefit from it.

The User-Agent is plain nonsense: why would you serve a different version of a page per browser? If you consider a typical User-Agent string to contain text like Mozilla/5.0 (Macintosh; Intel Mac OS X...) AppleWebKit/537.xx (KHTML, like Gecko) Chrome/65.x.y.z Safari/xxx, that's practically unique per visitor you have.

So, quick hack in this case, I removed the Vary header altogether.

sub vcl_backend_response {
  unset beresp.http.Vary;
  ...
}

No more variations of the cache based on what a random CMS does or says.
Evaggelos Balaskas

How to run Ansible2.5 on CentOS 5

[notes based on a docker centos5]

# cat /etc/redhat-release
CentOS release 5.11 (Final)

Setup Environment

Install compiler:

# yum -y install gcc make

Install zlib headers:

# yum -y install zlib-devel

Install tools:

# yum -y install curl unzip

SSL/TLS Errors

If you are on a CentOS 5x machine, when trying to download files from the internet you will get error messages like this:

This is a brown out of TLSv1 support. TLSv1 support is going away soon, upgrade to a TLSv1.2+ capable client.

or

SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert protocol version

That is because CentOS 5x has an old cipher suite that doesn't work with today's standards.

OpenSSL

To bypass these SSL/TLS errors, we need to install a recent version of openssl.

# cd /root/
# curl -LO https://www.openssl.org/source/openssl-1.0.2o.tar.gz
# tar xf openssl*.tar.gz
# cd openssl*

# ./Configure shared linux-x86_64
# make
# make install

The output has a useful bit of info:

OpenSSL shared libraries have been installed in:
  /usr/local/ssl

So we have to update the system's library paths to include this one:

# echo "/usr/local/ssl/lib/" >> /etc/ld.so.conf
# /sbin/ldconfig

Python 2.7

Download the latest Python2.7:

# cd /root/
# curl -LO https://www.python.org/ftp/python/2.7.14/Python-2.7.14.tgz
# tar xf Python*.tgz
# cd Python*

Install Python:

# ./configure --prefix=/opt/Python27 --enable-shared
# make
# make install

PATH

# export PATH=/opt/Python27/bin/:$PATH

# python -c "import ssl; print(ssl.OPENSSL_VERSION)"
OpenSSL 1.0.2o  27 Mar 2018
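Beyond checking the linked OpenSSL version, it is worth confirming that the rebuilt interpreter can actually speak TLSv1.2, since that is what modern servers now require. A small sanity check of my own (not from the original notes):

```python
# Sanity check (my own addition): confirm this interpreter's ssl module was
# built against an OpenSSL that exposes TLSv1.2, which modern hosts require.
import ssl

print(ssl.OPENSSL_VERSION)

# If this attribute is missing, the interpreter is still linked against the
# old system OpenSSL and the TLS errors above will persist.
assert hasattr(ssl, "PROTOCOL_TLSv1_2")
```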

SetupTools

# cd /root/

# export PYTHONHTTPSVERIFY=0
# python -c 'import urllib; urllib.urlretrieve ("https://pypi.python.org/packages/72/c2/c09362ab29338413ab687b47dab03bab4a792e2bbb727a1eb5e0a88e3b86/setuptools-39.0.1.zip", "setuptools-39.0.1.zip")'


Install setuptools

# unzip setuptools*.zip
# cd setuptools*

# python2.7 setup.py build
# python2.7 setup.py install

PIP

Install PIP

# cd /root/

# easy_install pip

Searching for pip
Best match: pip 10.0.0b1
Processing pip-10.0.0b1-py2.py3-none-any.whl
Installing pip-10.0.0b1-py2.py3-none-any.whl to /opt/Python27/lib/python2.7/site-packages
writing requirements to /opt/Python27/lib/python2.7/site-packages/pip-10.0.0b1-py2.7.egg/EGG-INFO/requires.txt
Adding pip 10.0.0b1 to easy-install.pth file
Installing pip script to /opt/Python27/bin
Installing pip3.6 script to /opt/Python27/bin
Installing pip3 script to /opt/Python27/bin

Installed /opt/Python27/lib/python2.7/site-packages/pip-10.0.0b1-py2.7.egg
Processing dependencies for pip
Finished processing dependencies for pip


Ansible

Now, we are ready to install ansible

# pip install ansible

Collecting ansible

/opt/Python27/lib/python2.7/site-packages/pip-10.0.0b1-py2.7.egg/pip/_vendor/urllib3/util/ssl_.py:339: SNIMissingWarning: An HTTPS request has been made, but the SNI (Subject Name Indication) extension to TLS is not available on this platform. This may cause the server to present an incorrect TLS certificate, which can cause validation failures. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
SNIMissingWarning
Using cached ansible-2.5.0-py2.py3-none-any.whl
Collecting paramiko (from ansible)
Using cached paramiko-2.4.1-py2.py3-none-any.whl
Collecting cryptography (from ansible)
Using cached cryptography-2.2.2-cp27-cp27m-manylinux1_x86_64.whl
Requirement already satisfied: setuptools in /opt/Python27/lib/python2.7/site-packages/setuptools-39.0.1-py2.7.egg (from ansible) (39.0.1)
Collecting PyYAML (from ansible)
Using cached PyYAML-3.12.tar.gz
Collecting jinja2 (from ansible)
Using cached Jinja2-2.10-py2.py3-none-any.whl
Collecting pyasn1>=0.1.7 (from paramiko->ansible)
Using cached pyasn1-0.4.2-py2.py3-none-any.whl
Collecting bcrypt>=3.1.3 (from paramiko->ansible)
Using cached bcrypt-3.1.4-cp27-cp27m-manylinux1_x86_64.whl
Collecting pynacl>=1.0.1 (from paramiko->ansible)
Using cached PyNaCl-1.2.1-cp27-cp27m-manylinux1_x86_64.whl
Collecting six>=1.4.1 (from cryptography->ansible)
Using cached six-1.11.0-py2.py3-none-any.whl
Collecting cffi>=1.7; platform_python_implementation != "PyPy" (from cryptography->ansible)
Using cached cffi-1.11.5-cp27-cp27m-manylinux1_x86_64.whl
Collecting enum34; python_version < "3" (from cryptography->ansible)
Using cached enum34-1.1.6-py2-none-any.whl
Collecting asn1crypto>=0.21.0 (from cryptography->ansible)
Using cached asn1crypto-0.24.0-py2.py3-none-any.whl
Collecting idna>=2.1 (from cryptography->ansible)
Using cached idna-2.6-py2.py3-none-any.whl
Collecting ipaddress; python_version < "3" (from cryptography->ansible)
Collecting MarkupSafe>=0.23 (from jinja2->ansible)
Using cached MarkupSafe-1.0.tar.gz
Collecting pycparser (from cffi>=1.7; platform_python_implementation != "PyPy"->cryptography->ansible)
Using cached pycparser-2.18.tar.gz
Installing collected packages: pyasn1, six, pycparser, cffi, bcrypt, enum34, asn1crypto, idna, ipaddress, cryptography, pynacl, paramiko, PyYAML, MarkupSafe, jinja2, ansible
Running setup.py install for pycparser ... done
Running setup.py install for ipaddress ... done
Running setup.py install for PyYAML ... done
Running setup.py install for MarkupSafe ... done

Successfully installed MarkupSafe-1.0 PyYAML-3.12 ansible-2.5.0 asn1crypto-0.24.0 bcrypt-3.1.4 cffi-1.11.5 cryptography-2.2.2 enum34-1.1.6 idna-2.6 ipaddress-1.0.19 jinja2-2.10 paramiko-2.4.1 pyasn1-0.4.2 pycparser-2.18 pynacl-1.2.1 six-1.11.0


Version

# ansible --version

ansible 2.5.0
config file = None
configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
ansible python module location = /opt/Python27/lib/python2.7/site-packages/ansible
executable location = /opt/Python27/bin/ansible
python version = 2.7.14 (default, Mar 31 2018, 20:00:21) [GCC 4.1.2 20080704 (Red Hat 4.1.2-55)]

Ansible v2

# ansible -m ping localhost


localhost | SUCCESS => {
"changed": false,
"ping": "pong"
}


Ansible v1

or a previous version for testing

eg. 1.9.6

# pip install 'ansible==1.9.6'

# ansible --version

ansible 1.9.6
configured module search path = None

# yum -y install python-simplejson

# ansible localhost -c local -m ping -i "localhost,"

localhost | success >> {
"changed": false,
"ping": "pong"
}


Possible Building Error

When building python from source, setup.py will look in the /usr/local/ssl/ directory to find the libraries and include headers of openssl. Although this works for _ssl.c, it doesn't for _hashlib.c.

To fix this problem, you must manually edit Python-2.7.14/setup.py

 869                 ssl_incs += ['/usr/local/ssl/include']
870                 ssl_libs += ['/usr/local/ssl/lib']

the full code is:

 865         if have_any_openssl:
866             if have_usable_openssl:
867                 # The _hashlib module wraps optimized implementations
868                 # of hash functions from the OpenSSL library.
869                 ssl_incs += ['/usr/local/ssl/include']
870                 ssl_libs += ['/usr/local/ssl/lib']
871                 exts.append( Extension('_hashlib', ['_hashopenssl.c'],
872                                        include_dirs = ssl_incs,
873                                        library_dirs = ssl_libs,
874                                        libraries = ['ssl', 'crypto']) )
875             else:
876                 print ("warning: openssl 0x%08x is too old for _hashlib" %
877                        openssl_ver)
878                 missing.append('_hashlib')

hope that helps!

R.I.Pienaar

Adding rich object data types to Puppet

Extending Puppet using types, providers, facts and functions is well known and widely done. Something new is how to add entire new data types to the Puppet DSL to create entirely new language behaviours.

I’ve done a bunch of this recently with the Choria Playbooks and some other fun experiments, today I’ll walk through building a small network wide spec system using the Puppet DSL.

Overview

A quick look at what we want to achieve here: I want to be able to do Choria RPC requests and assert their outcomes, I want to write the tests using the Puppet DSL, and they should run on a specially prepared environment. In my case I have an AWS environment with CentOS, Ubuntu, Debian and Archlinux machines:

Below I test the File Manager Agent:

• Get status for a known file and make sure it finds the file
• Create a brand new file, ensure it reports success
• Verify that the file exist and is empty using the status action

cspec::suite("filemgr agent tests", $fail_fast, $report) |$suite| {
  # Checks an existing file
  $suite.it("Should get file details") |$t| {
    $results = choria::task("mcollective",
      _catch_errors => true,
      "action" => "filemgr.status",
      "nodes" => $nodes,
      "silent" => true,
      "fact_filter" => ["kernel=Linux"],
      "properties" => {
        "file" => "/etc/hosts"
      }
    )

    $t.assert_task_success($results)

    $results.each |$result| {
      $t.assert_task_data_equals($result, $result["data"]["present"], 1)
    }
  }

  # Make a new file and check it exists
  $suite.it("Should support touch") |$t| {
    $fname = sprintf("/tmp/filemgr.%s", strftime(Timestamp(), "%s"))

    $r1 = choria::task("mcollective",
      _catch_errors => true,
      "action" => "filemgr.touch",
      "nodes" => $nodes,
      "silent" => true,
      "fact_filter" => ["kernel=Linux"],
      "fail_ok" => true,
      "properties" => {
        "file" => $fname
      }
    )

    $t.assert_task_success($r1)

    $r2 = choria::task("mcollective",
      _catch_errors => true,
      "action" => "filemgr.status",
      "nodes" => $nodes,
      "silent" => true,
      "fact_filter" => ["kernel=Linux"],
      "properties" => {
        "file" => $fname
      }
    )

    $t.assert_task_success($r2)

    $r2.each |$result| {
      $t.assert_task_data_equals($result, $result["data"]["present"], 1)
      $t.assert_task_data_equals($result, $result["data"]["size"], 0)
    }
  }
}

I also want to be able to test other things, like, say, discovery:

cspec::suite("${method} discovery method", $fail_fast, $report) |$suite| {
  $suite.it("Should support a basic discovery") |$t| {
    $found = choria::discover(
      "discovery_method" => $method,
    )

    $t.assert_equal($found.sort, $all_nodes.sort)
  }
}

So we want to make a Spec-like system that can drive Puppet Plans (aka Choria Playbooks) and do various assertions on the outcome.

We want to run it with mco playbook run and it should write a JSON report to disk with all suites, cases and assertions.

Adding a new Data Type to Puppet

I’ll show how to add the Cspec::Suite data Type to Puppet. This comes in 2 parts: You have to describe the Type that is exposed to Puppet and you have to provide a Ruby implementation of the Type.

Describing the Objects

Here we create the signature for Cspec::Suite:

# modules/cspec/lib/puppet/datatypes/cspec/suite.rb
Puppet::DataTypes.create_type("Cspec::Suite") do
  interface <<-PUPPET
    attributes => {
      "description" => String,
      "fail_fast" => Boolean,
      "report" => String
    },
    functions => {
      it => Callable[[String, Callable[Cspec::Case]], Any],
    }
  PUPPET

  load_file "puppet_x/cspec/suite"

  implementation_class PuppetX::Cspec::Suite
end

As you can see from the line of code cspec::suite("filemgr agent tests", $fail_fast, $report) |$suite| { ... } we pass 3 arguments: a description of the test, whether the test should fail immediately on any error or keep going, and where to write the report of the suite. This corresponds to the attributes here; a function shown later takes these and makes our instance. We then have to add our it() function, which again takes a description and yields out a Cspec::Case; it returns any value. When Puppet needs the implementation of this code it will call the Ruby class PuppetX::Cspec::Suite. Here is the same for the Cspec::Case:

# modules/cspec/lib/puppet/datatypes/cspec/case.rb
Puppet::DataTypes.create_type("Cspec::Case") do
  interface <<-PUPPET
    attributes => {
      "description" => String,
      "suite" => Cspec::Suite
    },
    functions => {
      assert_equal => Callable[[Any, Any], Boolean],
      assert_task_success => Callable[[Choria::TaskResults], Boolean],
      assert_task_data_equals => Callable[[Choria::TaskResult, Any, Any], Boolean]
    }
  PUPPET

  load_file "puppet_x/cspec/case"

  implementation_class PuppetX::Cspec::Case
end

Adding the implementation

The implementation is a Ruby class that provides the logic we want. I won't show the entire thing with reporting and everything, but you'll get the basic idea:

# modules/cspec/lib/puppet_x/cspec/suite.rb
module PuppetX
  class Cspec
    class Suite
      # Puppet calls this method when it needs an instance of this type
      def self.from_asserted_hash(description, fail_fast, report)
        new(description, fail_fast, report)
      end

      attr_reader :description, :fail_fast

      def initialize(description, fail_fast, report)
        @description = description
        @fail_fast = !!fail_fast
        @report = report
        @testcases = []
      end

      # what puppet file and line the Puppet DSL is on
      def puppet_file_line
        fl = Puppet::Pops::PuppetStack.stacktrace[0]

        [fl[0], fl[1]]
      end

      def outcome
        {
          "testsuite" => @description,
          "testcases" => @testcases,
          "file" => puppet_file_line[0],
          "line" => puppet_file_line[1],
          "success" => @testcases.all?{|t| t["success"]}
        }
      end

      # Writes the memory state to disk, see outcome above
      def write_report
        # ...
      end

      def run_suite
        Puppet.notice(">>>")
        Puppet.notice(">>> Starting test suite: %s" % [@description])
        Puppet.notice(">>>")

        begin
          yield(self)
        ensure
          write_report
        end

        Puppet.notice(">>>")
        Puppet.notice(">>> Completed test suite: %s" % [@description])
        Puppet.notice(">>>")
      end

      def it(description, &blk)
        require_relative "case"

        t = PuppetX::Cspec::Case.new(self, description)
        t.run(&blk)
      ensure
        @testcases << t.outcome
      end
    end
  end
end

And here is the Cspec::Case:

# modules/cspec/lib/puppet_x/cspec/case.rb
module PuppetX
  class Cspec
    class Case
      # Puppet calls this to make instances
      def self.from_asserted_hash(suite, description)
        new(suite, description)
      end

      def initialize(suite, description)
        @suite = suite
        @description = description
        @assertions = []
        @start_location = puppet_file_line
      end

      # assert 2 things are equal and show sender etc in the output
      def assert_task_data_equals(result, left, right)
        if left == right
          success("assert_task_data_equals", "%s success" % result.host)
          return true
        end

        failure("assert_task_data_equals: %s" % result.host,
                "%s\n\n\tis not equal to\n\n %s" % [left, right])
      end

      # checks the outcome of a choria RPC request and make sure its fine
      def assert_task_success(results)
        if results.error_set.empty?
          success("assert_task_success:", "%d OK results" % results.count)
          return true
        end

        failure("assert_task_success:", "%d failures" % [results.error_set.count])
      end

      # assert 2 things are equal
      def assert_equal(left, right)
        if left == right
          success("assert_equal", "values matches")
          return true
        end

        failure("assert_equal", "%s\n\n\tis not equal to\n\n %s" % [left, right])
      end

      # the puppet .pp file and line Puppet is on
      def puppet_file_line
        fl = Puppet::Pops::PuppetStack.stacktrace[0]

        [fl[0], fl[1]]
      end

      # show a OK message, store the assertions that ran
      def success(what, message)
        @assertions << {
          "success" => true,
          "kind" => what,
          "file" => puppet_file_line[0],
          "line" => puppet_file_line[1],
          "message" => message
        }

        Puppet.notice("✔︎ %s: %s" % [what, message])
      end

      # show a Error message, store the assertions that ran
      def failure(what, message)
        @assertions << {
          "success" => false,
          "kind" => what,
          "file" => puppet_file_line[0],
          "line" => puppet_file_line[1],
          "message" => message
        }

        Puppet.err("✘ %s: %s" % [what, @description])
        Puppet.err(message)

        raise(Puppet::Error, "Test case %s fast failed: %s" % [@description, what]) if @suite.fail_fast
      end

      # this will show up in the report JSON
      def outcome
        {
          "testcase" => @description,
          "assertions" => @assertions,
          "success" => @assertions.all? {|a| a["success"]},
          "file" => @start_location[0],
          "line" => @start_location[1]
        }
      end

      # invokes the test case
      def run
        Puppet.notice("==== Test case: %s" % [@description])

        # runs the puppet block
        yield(self)

        success("testcase", @description)
      end
    end
  end
end

Finally I am going to need a little function to create the suite – the cspec::suite function. It really just creates an instance of PuppetX::Cspec::Suite for us:

# modules/cspec/lib/puppet/functions/cspec/suite.rb
Puppet::Functions.create_function(:"cspec::suite") do
  dispatch :handler do
    param "String", :description
    param "Boolean", :fail_fast
    param "String", :report
    block_param

    return_type "Cspec::Suite"
  end

  def handler(description, fail_fast, report, &blk)
    suite = PuppetX::Cspec::Suite.new(description, fail_fast, report)
    suite.run_suite(&blk)
    suite
  end
end

Bringing it together

So that's about it. It's very simple really, the code above is pretty basic stuff to achieve all of this; I hacked it together in a day basically. Let's see how we turn these building blocks into a test suite. I need an entry point that drives the suite – imagine I will have many different plans to run, one per agent, and that I want to do some pre and post run tasks etc.

plan cspec::suite (
  Boolean $fail_fast = false,
  Boolean $pre_post = true,
  Stdlib::Absolutepath $report,
  String $data
) {
  $ds = {
    "type" => "file",
    "file" => $data,
    "format" => "yaml"
  }

  # initializes the report
  cspec::clear_report($report)

  # force a puppet run everywhere so PuppetDB is up to date, disables Puppet, wait for them to finish
  if $pre_post {
    choria::run_playbook("cspec::pre_flight", ds => $ds)
  }

  # Run our test suite
  choria::run_playbook("cspec::run_suites",
    _catch_errors => true,
    ds => $ds,
    fail_fast => $fail_fast,
    report => $report
  )
    .choria::on_error |$err| {
      err("Test suite failed with a critical error: ${err.message}")
    }

  # enables Puppet
  if $pre_post {
    choria::run_playbook("cspec::post_flight", ds => $ds)
  }

  # reads the report from disk and creates a basic overview structure
  cspec::summarize_report($report)
}

Here’s the cspec::run_suites Playbook that takes data from a Choria data source and drives the suite dynamically:

plan cspec::run_suites (
  Hash $ds,
  Boolean $fail_fast = false,
  Stdlib::Absolutepath $report,
) {
  $suites = choria::data("suites", $ds)

  notice(sprintf("Running test suites: %s", $suites.join(", ")))

  choria::data("suites", $ds).each |$suite| {
    choria::run_playbook($suite,
      ds => $ds,
      fail_fast => $fail_fast,
      report => $report
    )
  }
}

And finally a YAML file defining the suite. This file describes my AWS environment that I use to do integration tests for Choria; you can see there are a bunch of other tests in the suites list, and some of them take data like what nodes to expect etc.

suites:
  - cspec::discovery
  - cspec::choria
  - cspec::agents::shell
  - cspec::agents::process
  - cspec::agents::filemgr
  - cspec::agents::nettest

choria.version: mcollective plugin 0.7.0

nettest.fqdn: puppet.choria.example.net
nettest.port: 8140

discovery.all_nodes:
  - archlinux1.choria.example.net
  - centos7.choria.example.net
  - debian9.choria.example.net
  - puppet.choria.example.net
  - ubuntu16.choria.example.net

discovery.mcollective_nodes:
  - archlinux1.choria.example.net
  - centos7.choria.example.net
  - debian9.choria.example.net
  - puppet.choria.example.net
  - ubuntu16.choria.example.net

discovery.filtered_nodes:
  - centos7.choria.example.net
  - puppet.choria.example.net

discovery.fact_filter: operatingsystem=CentOS

Conclusion

So this, then, is a rather quick walk through of extending Puppet in ways many of us will not have seen before. I spent about a day getting this all working, which included figuring out a way to maintain the mutating report state internally. The outcome is a test suite I can run that will thoroughly drive a working 5-node network and assert the outcomes against real machines running real software.

I used to have a MCollective integration test suite, but I think this is a LOT nicer mainly due to the Choria Playbooks and extensibility of modern Puppet.

Python3 - Yaml Example

Save the above yaml example to a file, eg. fruits.yml
Open the Python3 Interpreter and write:

$ python3.6
Python 3.6.4 (default, Jan 5 2018, 02:35:40)
[GCC 7.2.1 20171224] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from yaml import load
>>> print(load(open("fruits.yml")))
{'fruits': ['Apple', 'Orange', 'Strawberry', 'Mango']}
>>>

An alternative way is to write the above commands to a python file:

from yaml import load
print(load(open("fruits.yml")))

and run it from the console:

$ python3 test.py
{'fruits': ['Apple', 'Orange', 'Strawberry', 'Mango']}

Instead of print we can use yaml.dump, e.g.:

>>> import yaml
>>> yaml.dump(load(open("fruits.yml")))
'fruits: [Apple, Orange, Strawberry, Mango]\n'

The return type of yaml.load is a python dictionary:

>>> type(load(open("fruits.yml")))
<class 'dict'>

Have that in mind.
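Because the result is an ordinary dict, all the usual Python operations apply to it. A small self-contained sketch (it rewrites the fruits.yml example first; safe_load is used here, which is the safer choice for untrusted input):

```python
import yaml

# recreate the fruits.yml example used throughout this post
with open("fruits.yml", "w") as f:
    f.write("fruits: [Apple, Orange, Strawberry, Mango]\n")

data = yaml.safe_load(open("fruits.yml"))

# a plain dict of plain lists, so normal dict/list operations apply
print(sorted(data.keys()))
for fruit in data["fruits"]:
    print(fruit.upper())
```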

Jinja2

Jinja2 is a modern and designer-friendly templating language for Python.

As a template engine, we can use jinja2 to build complex markup (or even text) output, really fast and efficient.

Here is a jinja2 template example:

I like these tasty fruits:
* {{ fruit }}

where {{ fruit }} is a variable.
Declaring the fruit variable with some value, the jinja2 template can generate the desired output.

python-jinja

On an archlinux box, the system-wide installation of this python package can be done by typing:

$ sudo pacman -S --noconfirm python-jinja

Python3 - Jinja2 Example

Below is a python3 - jinja2 example:

import jinja2

template = jinja2.Template("""
I like these tasty fruits:
  * {{ fruit }}
""")

data = "Apple"
print(template.render(fruit=data))

The output of this example is:

I like these tasty fruits:
  * Apple

File Template

Reading the jinja2 template from a template file is a little more complicated than before. Building the jinja2 environment is step one:

env = jinja2.Environment(loader=jinja2.FileSystemLoader("./"))

and Jinja2 is ready to read the template file:

template = env.get_template("t.j2")

The template file t.j2 is a little different than before:

I like these tasty fruits:
{% for fruit in fruits -%}
  * {{ fruit }}
{% endfor %}

Yaml, Jinja2 and Python3

To render the template, a dict of global variables must be passed. And parsing the yaml file with yaml.load returns a dictionary! So everything is in place. Combining everything together:

from yaml import load
from jinja2 import Environment, FileSystemLoader

mydata = (load(open("fruits.yml")))

env = Environment(loader=FileSystemLoader("./"))
template = env.get_template("t.j2")

print(template.render(mydata))

and the result is:

$ python3 test.py

I like these tasty fruits:
* Apple
* Orange
* Strawberry
* Mango


March 30, 2018

Steve Kemp's Blog

Rewriting some services in golang

The past couple of days I've been reworking a few of my existing projects, and converting them from Perl into Golang.

Bytemark had a great alerting system for routing alerts to different engineers, via email, SMS, and chat-messages. The system is called mauvealert and is available here on github.

The system is built around the notion of alerts which have different states (such as "pending", "raised", or "acknowledged"). Each alert is submitted via a UDP packet getting sent to the server with a bunch of fields:

• Source IP of the submitter (this is implicit).
• A human-readable ID such as "heartbeat", "disk-space-/", "disk-space-/root", etc.
• A raise-field.
• More fields here ..

Each incoming submission is stored in a database, and events are considered unique based upon the source+ID pair, such that if you see a second submission from the same IP, with the same ID, then any existing details are updated. This update-on-receive behaviour is pretty crucial to the way things work, especially when coupled with the "raise"-field.
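The update-on-receive behaviour can be sketched in a few lines of Python (the field names here are illustrative, not mauvealert's actual schema): events are keyed on the (source, id) pair, so a resubmission updates the stored record instead of creating a duplicate.

```python
# Sketch of the update-on-receive behaviour (names are illustrative, not
# mauvealert's actual schema): events are keyed on the (source, id) pair,
# so a resubmission updates the stored record instead of duplicating it.
import time

events = {}

def receive(source_ip, alert_id, raise_field, detail=""):
    key = (source_ip, alert_id)
    events[key] = {
        "raise": raise_field,
        "detail": detail,
        "updated_at": time.time(),  # bumped on every resubmission
    }

receive("1.2.3.4", "heartbeat", "+5m")
receive("1.2.3.4", "heartbeat", "+5m", "still alive")  # update, not duplicate
assert len(events) == 1
```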

A raise field might have values such as:

• +5m
• This alert will be raised in 5 minutes.
• now
• This alert will be raised immediately.
• clear
• This alert will be cleared immediately.
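A hedged sketch of how such a raise-field might be interpreted (the parsing below is my own guess at the format, based only on the examples above):

```python
# Guessed interpretation of the raise-field examples above: "+5m" schedules a
# raise N minutes from now, "now" raises immediately, "clear" clears the alert.
import re
import time

def raise_at(raise_field, now=None):
    now = time.time() if now is None else now
    if raise_field == "now":
        return now
    if raise_field == "clear":
        return None  # nothing to raise
    m = re.fullmatch(r"\+(\d+)m", raise_field)
    if m:
        return now + int(m.group(1)) * 60
    raise ValueError("unsupported raise field: %r" % raise_field)

assert raise_at("+5m", now=0) == 300
assert raise_at("now", now=0) == 0
assert raise_at("clear", now=0) is None
```

The crucial point for the heartbeat pattern below is that each resubmission recomputes this raise time from the current clock.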

One simple way the system is used is to maintain heartbeat-alerts. Imagine a system sends the following message, every minute:

• id:heartbeat raise:+5m [source:1.2.3.4]
• The first time this is received by the server it will be recorded in the database.
• The next time this is received the existing event will be updated, and crucially the time to raise an alert will be bumped (i.e. it will become current-time + 5m).
• The next time the update is received the raise-time will also be bumped
• ..

At some point the submitting system crashes, and five minutes after the last submission the alert moves from "pending" to "raised" - which will make it visible in the web-based user-interface, and also notify an engineer.
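A minimal heartbeat submitter could look like the following sketch, using UDP with a JSON payload (the field names and port are hypothetical, not mauvealert's real wire format):

```python
# Hedged sketch of a heartbeat submitter: UDP with a JSON payload. The field
# names and the port are hypothetical, not mauvealert's real wire format.
import json
import socket

def send_heartbeat(server=("127.0.0.1", 30000)):
    alert = {"id": "heartbeat", "raise": "+5m"}
    payload = json.dumps(alert).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        # fire-and-forget: UDP needs no listener for the send to succeed
        s.sendto(payload, server)
    return payload

sent = send_heartbeat()
```

Run from cron every minute, this keeps pushing the raise time forward; when the machine dies, the last "+5m" deadline eventually expires and the alert fires.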

With this system you could easily write trivial and stateless ad-hoc monitoring scripts like the following, which would raise or clear an alert:

 curl https://example.com && send-alert --id http-example.com --raise clear --detail "site ok" || \
send-alert  --id http-example.com --raise now --detail "site down"


In short, mauvealert allows aggregation of events, and centralises how/when engineers are notified. There's the flexibility to look at events and send them to different people at different times of the day, decide some are urgent and must trigger SMSs, and some are ignorable and just generate emails.

(In mauvealert this routing is done by having a configuration file containing ruby; this attempts to match events, so you could do things like say "if the event-id contains failed-disc then notify a DC-person, or if the event was raised from $important-system then notify everybody".)

I thought the design was pretty cool, and wanted something similar for myself. My version, which I set up a couple of years ago, was based around HTTP+JSON rather than UDP-messages, and written in perl.

The advantage of using HTTP+JSON is that writing clients to submit events to the central system could easily and cheaply be done in multiple environments for multiple platforms. I didn't see the need for the efficiency of binary UDP-based messages for submission, given that I have ~20 servers at the most.

Anyway, the point of this blog post is that I've now rewritten my simplified personal clone as a golang project, which makes deployment much simpler. Events are stored in an SQLite database and when raised they get sent to me via pushover.

The main difference is that I don't allow you to route events to different people, or notify via different mechanisms. Every raised alert gets sent to me, and only me, regardless of time of day. (Albeit via a pluggable external process, such that you could add your own local logic.)

I've written too much already, getting sidetracked by explaining how neat mauvealert (and by extension purple) was, but I also rewrote the Perl DNS-lookup service at https://dns-api.org/ in golang. That had a couple of regressions which were soon reported and fixed by a kind contributor (lack of CORS headers, most obviously).

March 27, 2018

LZone - Sysadmin

Sequence definitions with kwalify

After a lot of guessing and trying on how to define a simple sequence in kwalify (which I use as a JSON/YAML schema validator), I want to share this solution for a YAML schema. My use case is whitelisting certain keys and somehow ensuring their types.
Using this I want to use kwalify to validate YAML files. Doing this for scalars is simple, but hashes and lists of scalar elements are not. Most problematic were the lists...

Defining Arbitrary Scalar Sequences

So how to define a list in kwalify? The user guide gives this example:

---
list:
  type: seq
  sequence:
    - type: str

This gives us a list of strings. But many lists also contain numbers, and some contain structured data. For my use case I want to exclude structured data AND allow numbers, so "type: any" cannot be used. Also, "type: any" wouldn't work because it would require defining the mapping for any, which in a validation use case, where we just want to ensure the list as a type, we cannot know. The great thing is that there is a type "text" which you can use to allow a list of strings, or numbers, or both, like this:

---
list:
  type: seq
  sequence:
    - type: text

Building a key name + type validation schema

As already mentioned, the need for this is to have a whitelisting schema with simple type validation. Below you see an example of such a schema:

---
type: map
mapping:
  "default_definition": &allow_hash
    type: map
    mapping:
      =:
        type: any
  "default_list_definition": &allow_list
    type: seq
    sequence:
      # Type text means string or number
      - type: text
  "key1": *allow_hash
  "key2": *allow_list
  "key3":
    type: str
  =:
    type: number
    range: { max: 29384855, min: 29384855 }

At the top there are two dummy keys "default_definition" and "default_list_definition" which we use to define two YAML references "allow_hash" and "allow_list" for generic hashes and scalar-only lists. In the middle of the schema you see three keys which are whitelisted and, using the references, are typed as hash/list and also as a string. Finally, for this to be a whitelist, we need to refuse all other keys. Note that '=' as a key name stands for a default definition. Now we want to say: default is "not allowed".
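As a plain-Python illustration (not kwalify itself) of what this whitelist is meant to enforce — known keys get a simple type check, and any other key is refused:

```python
# Not kwalify - a plain-Python sketch of the whitelist semantics above:
# known keys with simple type checks, everything else refused.
def validate(doc):
    scalar = (str, int, float)
    checks = {
        "key1": lambda v: isinstance(v, dict),            # like *allow_hash
        "key2": lambda v: isinstance(v, list)
                and all(isinstance(i, scalar) for i in v),  # like *allow_list
        "key3": lambda v: isinstance(v, str),
    }
    errors = []
    for key, value in doc.items():
        if key not in checks:
            errors.append("key %r is not whitelisted" % key)
        elif not checks[key](value):
            errors.append("key %r has the wrong type" % key)
    return errors

assert validate({"key1": {"a": 1}, "key2": [1, "x"], "key3": "ok"}) == []
assert validate({"evil": 1}) == ["key 'evil' is not whitelisted"]
```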
Sadly, kwalify has no mechanism for this that allows expressing something like:

---
=:
  type: invalid

Therefore we resort to an absurd type definition (that we hopefully never use), for example a number that has to be exactly 29384855. All other keys not listed in the whitelist above will hopefully fail to be this number and cause kwalify to throw an error. This is how the kwalify YAML whitelist works.

PyPI does brownouts for legacy TLS

Nice! Reading through the maintenance notices on my status page aggregator, I learned that PyPI started intentionally blocking legacy TLS clients as a way of getting people to switch before TLS 1.0/1.1 support is gone for real. Here is a quote from their status page:

In preparation for our CDN provider deprecating TLSv1.0 and TLSv1.1 protocols, we have begun rolling brownouts for these protocols for the first ten (10) minutes of each hour. During that window, clients accessing pypi.python.org with clients that do not support TLSv1.2 will receive an HTTP 403 with the error message "This is a brown out of TLSv1 support. TLSv1 support is going away soon, upgrade to a TLSv1.2+ capable client.".

I like this action as a good balance of hurting as much as needed to help end users stop putting off updates.

March 26, 2018

Sean's IT Blog

The Virtual Horizon Podcast Episode 2 – A Conversation with Angelo Luciani

On this episode of The Virtual Horizon podcast, we'll journey to the French Riviera for the 2017 Nutanix .Next EU conference. We'll be joined by Angelo Luciani, Community Evangelist for Nutanix, to discuss blogging and the Virtual Design Master competition.

Nutanix has two large conferences scheduled for 2018 – .Next in New Orleans in May 2018 and .Next EU in London at the end of November 2018.
Show Credits: Podcast music is a derivative of Boogie Woogie Bed by Jason Shaw (audionatix.com), licensed under Creative Commons: By Attribution 3.0 License http://creativecommons.org/licenses/by/3.0/

March 19, 2018

Vincent Bernat

Integration of a Go service with systemd: socket activation

In a previous post, I highlighted some useful features of systemd when writing a service in Go, notably to signal readiness and prove liveness. Another interesting bit is socket activation: systemd listens on behalf of the application and, on incoming traffic, starts the service with a copy of the listening socket. Lennart Poettering details the benefits in a blog post:

If a service dies, its listening socket stays around, not losing a single message. After a restart of the crashed service it can continue right where it left off. If a service is upgraded we can restart the service while keeping around its sockets, thus ensuring the service is continuously responsive. Not a single connection is lost during the upgrade.

This is one solution to get zero-downtime deployment for your application. Another upside is you can run your daemon with fewer privileges—losing rights is a difficult task in Go.1

The basics🔗

Let's take back our nifty 404-only web server:

package main

import (
    "log"
    "net"
    "net/http"
)

func main() {
    listener, err := net.Listen("tcp", ":8081")
    if err != nil {
        log.Panicf("cannot listen: %s", err)
    }
    http.Serve(listener, nil)
}

Here is the socket-activated version, using go-systemd:

package main

import (
    "log"
    "net/http"

    "github.com/coreos/go-systemd/activation"
)

func main() {
    listeners, err := activation.Listeners(true) // ❶
    if err != nil {
        log.Panicf("cannot retrieve listeners: %s", err)
    }
    if len(listeners) != 1 {
        log.Panicf("unexpected number of socket activation (%d != 1)", len(listeners))
    }
    http.Serve(listeners[0], nil) // ❷
}

In ❶, we retrieve the listening sockets provided by systemd. In ❷, we use the first one to serve HTTP requests.
Let’s test the result with systemd-socket-activate:

$ go build 404.go
$ systemd-socket-activate -l 8000 ./404
Listening on [::]:8000 as 3.

In another terminal, we can make some requests to the service:

$ curl '[::1]':8000
404 page not found

For a proper integration with systemd, you need two files:

• a socket unit for the listening socket, and
• a service unit for the associated service.

We can use the following socket unit, 404.socket:

[Socket]
ListenStream = 8000
BindIPv6Only = both

[Install]
WantedBy = sockets.target

The systemd.socket(5) manual page describes the available options. BindIPv6Only = both is explicitly specified because the default value is distribution-dependent. As for the service unit, we can use the following one, 404.service:

[Unit]
Description = 404 micro-service

[Service]
ExecStart = /usr/bin/404

systemd knows the two files work together because they share the same prefix. Once the files are in /etc/systemd/system, execute systemctl daemon-reload and systemctl start 404.​socket. Your service is ready to accept connections!

Handling of existing connections🔗

Our 404 service has a major shortcoming: existing connections are abruptly killed when the daemon is stopped or restarted. Let’s fix that!

Waiting a few seconds for existing connections🔗

We can include a short grace period for connections to terminate, then kill remaining ones:

// On signal, gracefully shut down the server and wait 5
// seconds for current connections to stop.
done := make(chan struct{})
quit := make(chan os.Signal, 1)
server := &http.Server{}
signal.Notify(quit, syscall.SIGINT, syscall.SIGTERM)

go func() {
    <-quit
    log.Println("server is shutting down")
    ctx, cancel := context.WithTimeout(context.Background(),
        5*time.Second)
    defer cancel()
    server.SetKeepAlivesEnabled(false)
    if err := server.Shutdown(ctx); err != nil {
        log.Panicf("cannot gracefully shut down the server: %s", err)
    }
    close(done)
}()

// Start accepting connections.
server.Serve(listeners[0])

// Wait for existing connections before exiting.
<-done

Upon reception of a termination signal, the goroutine would resume and schedule a shutdown of the service:

Shutdown() gracefully shuts down the server without interrupting any active connections. Shutdown() works by first closing all open listeners, then closing all idle connections, and then waiting indefinitely for connections to return to idle and then shut down.

While restarting, new connections are not accepted: they sit in the listen queue associated to the socket. This queue is bounded and its size can be configured with the Backlog directive in the socket unit. Its default value is 128. You may keep this value, even when your service is expecting to receive many connections per second. When this value is exceeded, incoming connections are silently dropped. The client should automatically retry to connect. On Linux, by default, it will retry 5 times (tcp_syn_retries) in about 3 minutes. This is a nice way to avoid the herd effect you would experience on restart if you increased the listen queue to some high value.

Waiting longer for existing connections🔗

If you want to wait for a very long time for existing connections to stop, you do not want to ignore new connections for several minutes. There is a very simple trick: ask systemd to not kill any process on stop. With KillMode = none, only the stop command is executed and all existing processes are left undisturbed:

[Unit]
Description = slow 404 micro-service

[Service]
ExecStart = /usr/bin/404
ExecStop  = /bin/kill $MAINPID
KillMode  = none


If you restart the service, the current process gracefully shuts down for as long as needed and systemd immediately spawns a new instance ready to serve incoming requests with its own copy of the listening socket. On the other hand, we lose the ability to wait for the service to come to a full stop—either by itself or forcefully after a timeout with SIGKILL.

Waiting longer for existing connections (alternative)🔗

done := make(chan struct{})
quit := make(chan os.Signal, 1)
server := &http.Server{}
signal.Notify(quit,
    // for reload:
    syscall.SIGHUP,
    // for stop or full restart:
    syscall.SIGINT, syscall.SIGTERM)

go func() {
    sig := <-quit
    switch sig {
    case syscall.SIGINT, syscall.SIGTERM:
        // Shutdown with a time limit.
        log.Println("server is shutting down")
        ctx, cancel := context.WithTimeout(context.Background(),
            15*time.Second)
        defer cancel()
        server.SetKeepAlivesEnabled(false)
        if err := server.Shutdown(ctx); err != nil {
            log.Panicf("cannot gracefully shut down the server: %s", err)
        }
    case syscall.SIGHUP: // ❶
        // Execute a short-lived process and ask systemd to
        // track it instead of us.
        pid := detachedSleep()
        daemon.SdNotify(false, fmt.Sprintf("MAINPID=%d", pid))
        time.Sleep(time.Second) // Wait a bit for systemd to check the PID

        // Wait without a limit for current connections to stop.
        server.SetKeepAlivesEnabled(false)
        if err := server.Shutdown(context.Background()); err != nil {
            log.Panicf("cannot gracefully shut down the server: %s", err)
        }
    }
    close(done)
}()

// Serve requests with a slow handler.
server.Handler = http.HandlerFunc(
    func(w http.ResponseWriter, r *http.Request) {
        time.Sleep(10 * time.Second)
    })
server.Serve(listeners[0])

// Wait for all connections to terminate.
<-done
log.Println("server terminated")


The main difference is the handling of the SIGHUP signal in ❶: a short-lived decoy process is spawned and systemd is told to track it. When it dies, systemd will start a new instance. This method is a bit hacky: systemd needs the decoy process to be a child of PID 1 but Go cannot easily detach on its own. Therefore, we leverage a short Python helper, wrapped in a detachedSleep() function:2

// detachedSleep spawns a detached process sleeping
// one second and returns its PID.
func detachedSleep() uint64 {
    py := `
import os
import time

pid = os.fork()
if pid == 0:
    for fd in {0, 1, 2}:
        os.close(fd)
    time.sleep(1)
else:
    print(pid)
`
    cmd := exec.Command("/usr/bin/python3", "-c", py)
    out, err := cmd.Output()
    if err != nil {
        log.Panicf("cannot execute sleep command: %s", err)
    }
    pid, err := strconv.ParseUint(strings.TrimSpace(string(out)), 10, 64)
    if err != nil {
        log.Panicf("cannot parse PID of sleep command: %s", err)
    }
    return pid
}


During reload, there may be a small period during which both the new and the old processes accept incoming requests. If you don’t want that, you can move the creation of the short-lived process outside the goroutine, after server.Serve(), or implement some synchronization mechanism. There is also a possible race condition when we tell systemd to track another PID—see PR #7816.

The 404.service unit needs an update:

[Unit]
Description = slow 404 micro-service

[Service]
ExecStart    = /usr/bin/404
ExecReload   = /bin/kill -HUP $MAINPID
Restart      = always
NotifyAccess = main
KillMode     = process

Each additional directive is significant:

• ExecReload tells how to reload the process—by sending SIGHUP.
• Restart tells to restart the process if it stops “unexpectedly”, notably on reload.3
• NotifyAccess specifies which process can send notifications, like a PID change.
• KillMode tells to only kill the main identified process—others are left untouched.

Zero-downtime deployment?🔗

Zero-downtime deployment is a difficult endeavor on Linux. For example, HAProxy had a long list of hacks until a proper—and complex—solution was implemented in HAProxy 1.8. How do we fare with our simple implementation?

From the kernel point of view, there is only one socket with a unique listen queue. This socket is associated with several file descriptors: one in systemd and one in the current process. The socket stays alive as long as there is at least one file descriptor. An incoming connection is put by the kernel in the listen queue and can be dequeued from any file descriptor with the accept() syscall. Therefore, this approach actually achieves zero-downtime deployment: no incoming connection is rejected.

By contrast, HAProxy was using several different sockets listening to the same addresses, thanks to the SO_REUSEPORT option.4 Each socket gets its own listening queue and the kernel balances incoming connections between each queue. When a socket gets closed, the content of its queue is lost. If an incoming connection was sitting here, it would receive a reset. An elegant patch for Linux to signal a socket should not receive new connections was rejected. HAProxy 1.8 is now recycling existing sockets to the new processes through a Unix socket.

I hope this post and the previous one show how systemd is a good sidekick for a Go service: readiness, liveness and socket activation are some of the useful features you can get to build a more reliable application.
Addendum: decoy process using Go🔗

Update (2018.03)

On /r/golang, it was pointed out to me that, in the version where systemd is tracking a decoy, the helper can be replaced by invoking the main executable. By relying on a change of environment, it assumes the role of the decoy. Here is such an implementation replacing the detachedSleep() function:

func init() {
    // As early as possible, check if we should be the decoy.
    state := os.Getenv("__SLEEPY")
    os.Unsetenv("__SLEEPY")
    switch state {
    case "1":
        // First step, fork again.
        execPath := self()
        child, err := os.StartProcess(
            execPath,
            []string{execPath},
            &os.ProcAttr{
                Env: append(os.Environ(), "__SLEEPY=2"),
            })
        if err != nil {
            log.Panicf("cannot execute sleep command: %s", err)
        }

        // Advertise child's PID and exit. Child will be
        // orphaned and adopted by PID 1.
        fmt.Printf("%d", child.Pid)
        os.Exit(0)
    case "2":
        // Sleep and exit.
        time.Sleep(time.Second)
        os.Exit(0)
    }
    // Not the sleepy helper. Business as usual.
}

// self returns the absolute path to ourselves. This relies on
// /proc/self/exe which may be a symlink to a deleted path (for
// example, during an upgrade).
func self() string {
    execPath, err := os.Readlink("/proc/self/exe")
    if err != nil {
        log.Panicf("cannot get self path: %s", err)
    }
    execPath = strings.TrimSuffix(execPath, " (deleted)")
    return execPath
}

// detachedSleep spawns a detached process sleeping one second and
// returns its PID. A full daemonization is not needed as the process
// is short-lived.
func detachedSleep() uint64 {
    cmd := exec.Command(self())
    cmd.Env = append(os.Environ(), "__SLEEPY=1")
    out, err := cmd.Output()
    if err != nil {
        log.Panicf("cannot execute sleep command: %s", err)
    }
    pid, err := strconv.ParseUint(strings.TrimSpace(string(out)), 10, 64)
    if err != nil {
        log.Panicf("cannot parse PID of sleep command: %s", err)
    }
    return pid
}

Addendum: identifying sockets by name🔗

For a given service, systemd can provide several sockets. To identify them, it is possible to name them.
Let’s suppose we also want to return 403 error codes from the same service but on a different port. We add an additional socket unit definition, 403.socket, linked to the same 404.service job:

[Socket]
ListenStream = 8001
BindIPv6Only = both
Service      = 404.service

[Install]
WantedBy = sockets.target

Unless overridden with FileDescriptorName, the name of the socket is the name of the unit: 403.socket. go-systemd provides the ListenersWithNames() function to fetch a map from names to listening sockets:

package main

import (
    "log"
    "net/http"
    "sync"

    "github.com/coreos/go-systemd/activation"
)

func main() {
    var wg sync.WaitGroup

    // Map socket names to handlers.
    handlers := map[string]http.HandlerFunc{
        "404.socket": http.NotFound,
        "403.socket": func(w http.ResponseWriter, r *http.Request) {
            http.Error(w, "403 forbidden",
                http.StatusForbidden)
        },
    }

    // Get listening sockets.
    listeners, err := activation.ListenersWithNames(true)
    if err != nil {
        log.Panicf("cannot retrieve listeners: %s", err)
    }

    // For each listening socket, spawn a goroutine
    // with the appropriate handler.
    for name := range listeners {
        for idx := range listeners[name] {
            wg.Add(1)
            go func(name string, idx int) {
                defer wg.Done()
                http.Serve(
                    listeners[name][idx],
                    handlers[name])
            }(name, idx)
        }
    }

    // Wait for all goroutines to terminate.
    wg.Wait()
}

Let’s build the service and run it with systemd-socket-activate:

$ go build 404.go
$ systemd-socket-activate -l 8000 -l 8001 \
>     --fdname=404.socket:403.socket \
>     ./404
Listening on [::]:8000 as 3.
Listening on [::]:8001 as 4.

In another console, we can make a request for each endpoint:

$ curl '[::1]':8000

Choria

I run a custom build of Choria 0.0.11: I bumped the max connections up to 100k and turned off SSL, since we simply can’t provision certificates. A custom build let me get around all that.

The real reason for the custom build, though, is that we compile our agent into the binary, so the whole deployment that goes out to all nodes and brokers is basically what you see below, with no further dependencies at all. This makes for quite a nice deployment story, since we’re a bit challenged in that regard.

# a bit later than the image above
$ sudo netstat -anp|grep 22365|grep ESTAB|wc -l
58319

Outcome

So how does it work in practice? In the past we’d have had a lot of issues with getting consistency out of a network of even 10% this size. I was quite confident it was not the Ruby side, but you never know? Well, let’s look at this one; I set discovery_timeout = 20 in my client configuration:

$ mco rpc rpcutil ping --display failed
Finished processing 51152 / 51152 hosts in 20675.80 ms
Finished processing 51152 / 51152 hosts in 20746.82 ms
Finished processing 51152 / 51152 hosts in 20778.17 ms
Finished processing 51152 / 51152 hosts in 22627.80 ms
Finished processing 51152 / 51152 hosts in 20238.92 ms

That’s a huge improvement, and this is without fancy discovery methods or databases or anything – it’s the generally fairly unreliable broadcast-based method of discovery. These same nodes on a big RabbitMQ cluster never get a consistent result (and it’s 40 seconds slower), so this is a huge win for me.

I am still using the Ruby code here of course, and it’s single threaded and stuck on 1 CPU, so in practice it’s going to have a hard ceiling of churning through about 2500 to 3000 replies/second, hence the long timeouts there.

I have a go based ping, it round trips this network in less than 3.5 seconds quite reliably – wow.

The broker peaked at 25Mbps at times when doing many concurrent RPC requests and pings etc, but it’s all just been pretty good with no surprises.

The Ruby client is a bit big, so as a final test I bumped the RAM on this node to 16GB. If I run 6 x RPC clients at exactly the same time doing a full estate RPC round trip (including broadcast based discovery), all 6 clients get exactly the same results consistently. So I guess I know the Ruby code was never the problem, and I am very glad to see code I designed and wrote in 2009 scaling to this size – the Ruby client code has really never been touched after initial development.

March 05, 2018

R.I.Pienaar

Choria Progress Update

It’s been a while since I posted about Choria and where things are. There are major changes in the pipeline so it’s well overdue an update.

The features mentioned here will become current in the next release cycle – about 2 weeks from now.

New choria module

The current gen Choria modules grew a bit organically and there’s a bit of confusion between the various modules. I now have a new choria module; it will consume features from the current modules and deprecate them.

On the next release it can manage:

1. Choria YUM and APT repos
2. Choria Package
3. Choria Network Broker
4. Choria Federation Broker

Network Brokers

We have had amazing success with the NATS broker, lightweight, fast, stable. It’s perfect for Choria. While I had a pretty good module to configure it I wanted to create a more singular experience. Towards that there is a new Choria Broker incoming that manages an embedded NATS instance.

To show what I am on about, imagine this is all that is required to configure a cluster of 3 production ready brokers capable of hosting 50k or more Choria managed nodes on modestly specced machines:

plugin.choria.broker_network = true
plugin.choria.network.peers = nats://choria1.example.net:4223, nats://choria2.example.net:4223, nats://choria3.example.net:4223
plugin.choria.stats_address = ::

Of course there is Puppet code to do this for you in choria::broker.

That’s it, start the choria-broker daemon and you’re done – and ready to monitor it using Prometheus. Like before it’s all TLS and all that kinds of good stuff.

Federation Brokers

We had good success with the Ruby Federation Brokers but they also had issues particularly around deployment as we had to deploy many instances of them and they tended to be quite big Ruby processes.

The same choria-broker that hosts the Network Broker will now also host a new Golang based Federation Broker network. Configuration is about the same as before, so you don’t need to learn new things; you just have to move to the configuration in choria::broker and retire the old ones.

Unlike the past, where you had to run 2 or 3 of the Federation Brokers per node, you now do not run any additional processes: you just enable the feature in the singular choria-broker and you get only 1 process. Internally it runs 10 instances of the Federation Broker; it’s much more performant and scalable.

Monitoring is done via Prometheus.

Previously we had all kinds of fairly bad schemes to manage registration in MCollective. The MCollective daemon would make requests to a registration agent, you’d designate one or more nodes as running this agent and so build either a file store, mongodb store etc.

This was fine at small size but soon enough the concurrency in large networks would overwhelm what could realistically be expected from the Agent mechanism to manage.

I’ve often wanted to revisit that but did not know what approach to take. In the years since then the Stream Processing world has exploded with tools like Kafka, NATS Streaming and offerings from GCP, AWS and Azure etc.

Data Adapters are hosted in the Choria Broker and provide stateless, horizontally and vertically scalable Adapters that can take data from Choria and translate and publish them into other systems.

Today I support NATS Streaming and the code is at first-iteration quality, problems I hope to solve with this:

• Very large global scale node metadata ingest
• IoT data ingest – the upcoming Choria Server is embeddable into any Go project and it can exfil data into Stream Processors using this framework
• Asynchronous RPC – replies to requests streaming into Kafka for later processing, more suitable for web apps etc
• Adhoc asynchronous data rewrites – we have had feature requests where person one can make a request but not see replies, they go into Elastic Search

Plugins

After 18 months of trying to get Puppet Inc to let me continue development on the old code base I have finally given up. The plugins are now hosted in their own GitHub Organisation.

I’ve released a number of plugins that were never released under Choria.

I’ve updated all their docs to be Choria specific rather than outdated install docs.

I’ve added Action Policy rules allowing read only actions by default – eg. puppet status will work for anyone, puppet runonce will give access denied.

I’ve started adding Playbooks; the first ones are mcollective_agent_puppet::enable, mcollective_agent_puppet::disable and mcollective_agent_puppet::disable_and_wait.

Embeddable Choria

The new Choria Server is embeddable into any Go project. This is not a new area of research for me – this was actually the problem I tried to solve when I first wrote the current gen MCollective, but I never got that far really.

The idea is that if you have some application – like my Prometheus Streams system – where you will run many of a specific daemon, each with different properties and areas of responsibility, you can make that daemon connect to a Choria network as if it’s a normal Choria Server. The purpose of that is to embed life cycle management into the daemon and provide an external API into it.

The above mentioned Prometheus Streams server, for example, has a circuit breaker that can start/stop the polling and replication of data:

$ mco rpc prometheus_streams switch -T prometheus
Discovering hosts using the mc method for 2 second(s) .... 1

 * [ ============================================================> ] 1 / 1

prom.example.net
   Mode: poller
   Paused: true

Summary of Mode:
   poller = 1

Summary of Paused:
   false = 1

Finished processing 1 / 1 hosts in 399.81 ms

Here I am communicating with the internals of the Go process; they sit in their own Sub Collective, expose facts and RPC endpoints. I can use discovery to find only nodes in certain modes, with certain jobs etc. and perform functions you’d typically do via a REST management interface over a more suitable interface.

Likewise I’ve embedded a Choria Server into IoT systems where it uses the above mentioned Data Adapters to publish temperature and humidity, while giving me the ability to extract data from those devices on demand using RPC and do things like in-place upgrades of the running binary on my IoT network.

You can use this today in your own projects and it’s compatible with the Ruby Choria you already run. A full walk through of doing this can be found in the ripienaar/embedded-choria-sample repository.

March 04, 2018

Electricmonk.nl

Lurch: a unixy launcher and auto-typer

I cobbled together a unixy command / application launcher and auto-typer. I've dubbed it Lurch.

Features:

• Fuzzy filtering as-you-type.
• Execute commands.
• Open new browser tabs.
• Auto-type into currently focussed window
• Auto-type TOTP / rfc6238 / two-factor / Google Authenticator codes.
• Unixy and composable. Reads entries from stdin.

You can use and combine these features to do many things:

• Auto-type passwords
• Switch between currently opened windows by typing a part of its title (using wmctrl to list and switch to windows)
• As a generic (and very customizable) application launcher by parsing .desktop entries or whatever.
• Quickly cd to parts of your filesystem using auto-type.
• Open browser tabs and search via google or specific search engines.
• List all entries in your SSH configuration and quickly launch an ssh session to one of them.
• Etc.

You'll need a way to launch it when you press a keybinding. That's usually the window manager's job. For XFCE, you can add a keybinding under the Keyboard -> Application Shortcuts settings dialog.

Here's what it looks like:

Unfortunately, due to time constraints, I cannot provide any support for this project:

NO SUPPORT: There is absolutely ZERO support on this project. Due to time constraints, I don't take bug or features reports and probably won't accept your pull requests.

You can get it from the Github page.

March 01, 2018

Anton Chuvakin - Security Warrior

Monthly Blog Round-Up – February 2018

It is mildly shocking that I’ve been blogging for 13+ years (my first blog post on this blog was in December 2005, my old blog at O’Reilly predates this by about a year), so let’s spend a moment contemplating this fact. <contemplative pause here :-)>

Here is my next monthly "Security Warrior" blog round-up of top 5 popular posts based on last month’s visitor data (excluding other monthly or annual round-ups):

1. “New SIEM Whitepaper on Use Cases In-Depth OUT!” (dated 2010) presents a whitepaper on select SIEM use cases described in depth with rules and reports [using now-defunct SIEM product]; also see this SIEM use case in depth and this for a more current list of popular SIEM use cases. Finally, see our 2016 research on developing security monitoring use cases here – and we just UPDATED IT FOR 2018.

2. “Updated With Community Feedback SANS Top 7 Essential Log Reports DRAFT2” is about the top log reports project of 2008-2013; I think these are still very useful in response to “what reports will give me the best insight from my logs?”

3. “Why No Open Source SIEM, EVER?” contains some of my SIEM thinking from 2009 (oh, wow, ancient history!). Is it relevant now? You be the judge. Succeeding with SIEM requires a lot of work, whether you paid for the software, or not.
BTW, this post has an amazing “staying power” that is hard to explain – I suspect it has to do with people wanting “free stuff” and googling for “open source SIEM”…

4. Again, my classic PCI DSS Log Review series is extra popular! The series of 18 posts cover a comprehensive log review approach (OK for PCI DSS 3+ even though it predates it), useful for building log review processes and procedures, whether regulatory or not. It is also described in more detail in our Log Management book and mentioned in our PCI book – note that this series is even mentioned in some PCI Council materials.

5. “Simple Log Review Checklist Released!” is often at the top of this list – this rapidly aging checklist is still a useful tool for many people. “On Free Log Management Tools” (also aged quite a bit by now) is a companion to the checklist (updated version).

In addition, I’d like to draw your attention to a few recent posts from my Gartner blog [which, BTW, now has more than 5X the traffic of this blog]:

Critical reference posts:
Current research on testing security:
Current research on threat detection “starter kit”
Just finished research on SOAR:
Miscellaneous fun posts:

(see all my published Gartner research here)

Also see my past monthly and annual “Top Popular Blog Posts” – 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017.

Disclaimer: most content at SecurityWarrior blog was written before I joined Gartner on August 1, 2011 and is solely my personal view at the time of writing. For my current security blogging, go here. Other posts in this endless series:

The Lone Sysadmin

How to Troubleshoot Unreliable or Malfunctioning Hardware

My post on Intel X710 NICs being awful has triggered a lot of emotion and commentary from my readers. One of the common questions has been: so I have X710 NICs, what do I do? How do I troubleshoot hardware that isn’t working right?

1. Document how to reproduce the problem and its severity.
Is […]

The post How to Troubleshoot Unreliable or Malfunctioning Hardware appeared first on The Lone Sysadmin. Head over to the source to read the full post!

OpenSSL Seeking Last Group of Contributors

The following is a press release that we just put out about finishing off our relicensing effort. For the impatient, please see https://license.openssl.org/trying-to-find to help us find the last people; we want to change the license with our next release, which is currently in Alpha, and tentatively set for May.

For background, you can see all posts in the license category. One copy of the press release is at https://www.prnewswire.com/news-releases/openssl-seeking-last-group-of-contributors-300607162.html.

OpenSSL Seeking Last Group of Contributors

Looking for programmers who contributed code to the OpenSSL project

The OpenSSL project, https://www.openssl.org, is trying to reach the last couple-dozen people who have contributed code to OpenSSL. They are asking people to look at https://license.openssl.org/trying-to-find to see if they recognize any names. If so, contact license@openssl.org with any information.

This marks one of the final steps in the project’s work to change the license from its non-standard custom text to the highly popular Apache License. This effort first started in the Fall of 2015, by requiring contributor agreements. Last March, the project made a major publicity effort, with large coverage in the industry. It also began to reach out and contact all contributors, as found by reviewing all changes made to the source. Over 600 people have already responded to emails or other attempts to contact them, and more than 98% agreed with the change. The project removed the code of all those who disagreed with the change. In order to properly respect the desires of all original authors, the project continues to make strong efforts to find everyone.
Measured purely by simple metrics, the average contribution still outstanding is not large. There are a total of 59 commits without a response, out of a history of more than 32,300. On average, each person submitted a patch that modified 3-4 files, adding 100 lines and removing 23.

“We’re very pleased to be changing the license, and I am personally happy that OpenSSL has adopted the widely deployed Apache License,” said Mark Cox, a founding member of the OpenSSL Management Committee. Cox is also a founder and former Board Member of the Apache Software Foundation.

The project hopes to conclude its two-year relicensing effort in time for the next release, which will include an implementation of TLS 1.3. For more information, email osf-contact@openssl.org.

-30-

February 28, 2018

The Lone Sysadmin

Intel X710 NICs Are Crap

(I’m grumpy this week and I’m giving myself permission to return to my blogging roots and complain about stuff. Deal with it.)

In the not so distant past we were growing a VMware cluster and ordered 17 new blade servers with X710 NICs. Bad idea. X710 NICs suck, as it turns out. Those NICs do […]

The post Intel X710 NICs Are Crap appeared first on The Lone Sysadmin. Head over to the source to read the full post!

Everything Sysadmin

DevOpsDays New York City 2019: Join the planning committee!

2019 feels like a long way off, but since the conference is in January, we need to start planning soon. The sooner we start, the less rushed the planning can be.

I have to confess that working with the 2018 committee was one of the best and most professional conference planning experiences I've ever had. I've been involved with many conferences over the years and this experience was one of the best!

I invite new people to join the committee for 2019. The best way to learn about organizing is to join a committee and help out. You will be mentored and learn a lot in the process. Nothing involved in creating a conference is difficult, it just takes time and commitment.
Interested in being on the next planning committee? An informational meeting will be held via WebEx on Tuesday, March 6 at 2pm (NYC timezone, of course!). During this kick-off meeting, the 2018 committee will review what roles they took on, what went well, what could be improved, and the timeframe for the 2019 event.

Please note, attendance at this meeting doesn't commit you to help organize the event; however, it is hoped that by the end we will be able to firm up who will comprise the 2019 event committee. Hope you all can make it!

If you are interested in attending, email devopsdaysnyc@gmail.com for connection info.

February 26, 2018

TaoSecurity

Importing Pcap into Security Onion

Within the last week, Doug Burks of Security Onion (SO) added a new script that revolutionizes the use case for his amazing open source network security monitoring platform.

I have always used SO in a live production mode, meaning I deploy a SO sensor sniffing a live network interface. As the multitude of SO components observe network traffic, they generate, store, and display various forms of NSM data for use by analysts.

The problem with this model is that it could not be used for processing stored network traffic. If one simply replayed the traffic from a .pcap file, the new traffic would be assigned contemporary timestamps by the various tools observing the traffic.

While all of the NSM tools in SO have the independent capability to read stored .pcap files, there was no unified way to integrate their output into the SO platform. Therefore, for years, there has not been a way to import .pcap files into SO -- until last week!

Here is how I tested the new so-import-pcap script. First, I made sure I was running Security Onion Elastic Stack Release Candidate 2 (14.04.5.8 ISO) or later. Next I downloaded the script using wget from https://github.com/Security-Onion-Solutions/securityonion-elastic/blob/master/usr/sbin/so-import-pcap.
I continued as follows:

richard@so1:~$ sudo cp so-import-pcap /usr/sbin/
richard@so1:~$ sudo chmod 755 /usr/sbin/so-import-pcap

I tried running the script against two of the sample files packaged with SO, but ran into issues with both.

richard@so1:~$ sudo so-import-pcap /opt/samples/10k.pcap

so-import-pcap

...creating temp pcap for processing.
mergecap: Error reading /opt/samples/10k.pcap: The file appears to be damaged or corrupt
(pcap: File has 263718464-byte packet, bigger than maximum of 262144)
Error while merging!
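That mergecap error indicates a per-packet record header claiming a length far beyond the 262144-byte limit, which typically means the file is truncated or corrupt partway through. As a rough illustration (a hypothetical standalone checker of mine, not part of so-import-pcap), one can walk the classic-pcap record headers directly:

```python
import struct

MAX_SANE_PACKET = 262144  # the same ceiling the pcap tools complain about

def find_bad_record(path):
    """Walk classic-pcap record headers; return (packet_number, claimed_length)
    for the first record with an implausible length, else None."""
    with open(path, "rb") as f:
        header = f.read(24)  # global header: magic, version, tz, sigfigs, snaplen, linktype
        magic = header[:4]
        if magic == b"\xd4\xc3\xb2\xa1":
            endian = "<"  # little-endian, microsecond timestamps
        elif magic == b"\xa1\xb2\xc3\xd4":
            endian = ">"  # big-endian
        else:
            raise ValueError("not a classic pcap file")
        count = 0
        while True:
            rec = f.read(16)  # record header: ts_sec, ts_usec, incl_len, orig_len
            if len(rec) < 16:
                return None  # clean end of file
            count += 1
            ts_sec, ts_usec, incl_len, orig_len = struct.unpack(endian + "IIII", rec)
            if incl_len > MAX_SANE_PACKET:
                return (count, incl_len)
            f.seek(incl_len, 1)  # skip the packet bytes
```

Run against a file like 10k.pcap, this stops at the first record whose claimed length is garbage, much as capinfos does below.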

I checked the file with capinfos.

richard@so1:~$ capinfos /opt/samples/10k.pcap
capinfos: An error occurred after reading 17046 packets from "/opt/samples/10k.pcap": The file appears to be damaged or corrupt.
(pcap: File has 263718464-byte packet, bigger than maximum of 262144)

Capinfos confirmed the problem. Let's try another!

richard@so1:~$ sudo so-import-pcap /opt/samples/zeus-sample-1.pcap

so-import-pcap

...creating temp pcap for processing.
mergecap: Error reading /opt/samples/zeus-sample-1.pcap: The file appears to be damaged or corrupt
(pcap: File has 1984391168-byte packet, bigger than maximum of 262144)
Error while merging!

Another bad file. Trying a third!

richard@so1:~$ sudo so-import-pcap /opt/samples/evidence03.pcap

so-import-pcap

Please wait while...
...creating temp pcap for processing.
...setting sguild debug to 2 and restarting sguild.
...configuring syslog-ng to pick up sguild logs.
...disabling syslog output in barnyard.
...configuring logstash to parse sguild logs (this may take a few minutes, but should only need to be done once)...done.
...stopping curator.
...disabling curator.
...stopping ossec_agent.
...disabling ossec_agent.
...stopping Bro sniffing process.
...disabling Bro sniffing process.
...stopping IDS sniffing process.
...disabling IDS sniffing process.
...stopping netsniff-ng.
...disabling netsniff-ng.
...adjusting CapMe to allow pcaps up to 50 years old.
...analyzing traffic with Snort.
...analyzing traffic with Bro.
...writing /nsm/sensor_data/so1-eth1/dailylogs/2009-12-28/snort.log.1261958400

Import complete!

You can use this hyperlink to view data in the time range of your import:
https://localhost/app/kibana#/dashboard/94b52620-342a-11e7-9d52-4f090484f59e?_g=(refreshInterval:(display:Off,pause:!f,value:0),time:(from:'2009-12-28T00:00:00.000Z',mode:absolute,to:'2009-12-29T00:00:00.000Z'))

or you can manually set your Time Range to be:
From: 2009-12-28 To: 2009-12-29

Incidentally, here is the capinfos output for this trace.

richard@so1:~$ capinfos /opt/samples/evidence03.pcap
File name:           /opt/samples/evidence03.pcap
File type:           Wireshark/tcpdump/... - pcap
File encapsulation:  Ethernet
Packet size limit:   file hdr: 65535 bytes
Number of packets:   1778
File size:           1537 kB
Data size:           1508 kB
Capture duration:    171 seconds
Start time:          Mon Dec 28 04:08:01 2009
End time:            Mon Dec 28 04:10:52 2009
Data byte rate:      8814 bytes/s
Data bit rate:       70 kbps
Average packet size: 848.57 bytes
Average packet rate: 10 packets/sec
SHA1:                34e5369c8151cf11a48732fed82f690c79d2b253
RIPEMD160:           afb2a911b4b3e38bc2967a9129f0a11639ebe97f
MD5:                 f8a01fbe84ef960d7cbd793e0c52a6c9
Strict time order:   True
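One detail worth noticing in the import output above is the log path: 1261958400 is midnight UTC on 2009-12-28, so the day bucket comes from the capture's own timestamps, not the wall clock. A toy sketch of that naming convention (my illustration of the observed pattern, not SO's actual code):

```python
from datetime import datetime, timezone

def daily_log_bucket(packet_ts):
    """Map a packet's epoch timestamp to the (directory, filename) pair
    seen under /nsm/sensor_data/<sensor>/dailylogs/."""
    midnight = packet_ts - (packet_ts % 86400)  # start of that UTC day
    day = datetime.fromtimestamp(midnight, tz=timezone.utc).strftime("%Y-%m-%d")
    return day, "snort.log.%d" % midnight

# A packet captured at 04:08:01 UTC on 2009-12-28 lands in:
# ("2009-12-28", "snort.log.1261958400")
```

This is what makes the imported data show up under its original 2009 dates rather than today's.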

That worked! Now to see what I can find in the SO interface.

I accessed the Kibana application and changed the timeframe to include the times in the trace.

Here's another screenshot. Again I had to adjust for the proper time range.

Very cool! However, I did not find any IDS alerts. This made me wonder if there was a problem with alert processing. I decided to run the script on a new .pcap:

richard@so1:~$ sudo so-import-pcap /opt/samples/emerging-all.pcap

so-import-pcap

Please wait while...
...creating temp pcap for processing.
...analyzing traffic with Snort.
...analyzing traffic with Bro.
...writing /nsm/sensor_data/so1-eth1/dailylogs/2010-01-27/snort.log.1264550400

Import complete!

You can use this hyperlink to view data in the time range of your import:
https://localhost/app/kibana#/dashboard/94b52620-342a-11e7-9d52-4f090484f59e?_g=(refreshInterval:(display:Off,pause:!f,value:0),time:(from:'2010-01-27T00:00:00.000Z',mode:absolute,to:'2010-01-28T00:00:00.000Z'))

or you can manually set your Time Range to be:
From: 2010-01-27 To: 2010-01-28

When I searched the interface for NIDS alerts (after adjusting the time range), I found results. The alerts show up in Sguil, too!

This is a wonderful development for the Security Onion community. Being able to import .pcap files and analyze them with the standard SO tools and processes, while preserving timestamps, makes SO a viable network forensics platform. This mailing list thread covers the new script. I suggest running it on an evaluation system, probably in a virtual machine. I did all my testing on VirtualBox. Check it out!

Ben's Practical Admin Blog

Active Directory & Certificates – Which One is Being Used?

So here's a question I want you to try answering off the top of your head: which certificate is your domain controller using for Kerberos & LDAPS, and what happens when there are multiple certificates in the crypto store? The answer is actually pretty obvious if you already know it; however, this was the question I faced recently, and I ended up having to do a little bit of poking around to answer it. The scenario for me is that, having built a new multi-tier PKI in our environment, I have reached the point of migrating services to it, including the auto-enrolling certificate templates used on Domain Controllers.
For most contemporary Active Directory installs where AD Certificate Services is also used, there are two main certificate templates related to domain controllers:

• Kerberos Authentication
• Directory Email Replication

The "Kerberos Authentication" certificate template made its appearance in Windows Server 2008, replacing the "Domain Controller" and "Domain Controller Authentication" templates in earlier versions of ADCS. The "Directory Email Replication" template is used where you use email protocols to replicate AD (I am not quite sure why anyone would want to do this in this day & age).

Getting back to my scenario and question, how do you work out which certificate is in use? In both examples, we're interested in the certificate serial number.

The first way is to use a network analyser such as Wireshark (or MS Message Analyzer) to trace a connection to port 636 of a domain controller. Using a network analyser is nifty in that you can see the full handshake occurring and the data passed (something crypto-geeks can get excited about). Expanding out the information, we can obtain the serial number: 655dc58900010000e01e

Alternatively, if you have OpenSSL available, you can use the following command to connect and obtain similar information:

openssl s_client -connect <LDAPS server>:636

This will connect to the server, and amongst the output will be the offered certificate in base64 format. Copy all text between and including -----BEGIN CERTIFICATE----- and -----END CERTIFICATE----- to a file, which gives you the certificate being offered. You can then run this command to obtain all the detailed information on the certificate, including the serial number:

openssl x509 -in <certificate-file> -noout -text

From here, it's just a matter of checking the personal certificate store on the local computer account and finding the certificate with the matching serial.

What Happens for multiple Kerberos Certificates?
Again, looking back at my scenario, I now have two Kerberos Authentication certificates in my store: one from the old CA infrastructure, and the other from the new CA infrastructure, with a different template name to meet naming standards. Using the tried and true method of "test it and see what happens", I found that the AD DS service will always use the newest certificate available, that is, the one with the most recent validity start date. As an example, if today is February 26, a certificate valid from February 25th will be used over a certificate valid from February 20th.

Changing between certificates is a seamless affair. AD Domain Services doesn't need restarting, nor does the machine in general.

Summary

So there you have it: Domain Controllers at their base use 1-2 certificate templates, depending on how you replicate. There's no native way (that I found) to work out which certificate is being used, so tools like Wireshark and OpenSSL can be useful for obtaining certificate information to reference. Finally, Domain Controllers will use the Kerberos certificate with the latest validity start date.

February 25, 2018

Sarah Allen

listening to very specific events

The model of declarative eventing allows for listening to very specific events and then triggering specific actions. This model simplifies the developer experience and optimizes the system by reducing network traffic.

AWS S3 bucket trigger

In looking at how AWS explains the way changes in S3 can trigger Lambda functions, I found that the AWS product docs focus on the GUI configuration experience. This probably makes it easy for new folks to write a specific Lambda function; however, it is a little harder to see the system patterns before gaining a lot of hands-on experience. The trigger-action association can be seen more clearly in a Terraform configuration. (Under the hood, Terraform must be using AWS APIs to set up the trigger.)
The configuration below specifies that whenever a .json file is uploaded to a specific bucket with the path prefix "content-packages", a specific Lambda function will be executed:

resource "aws_s3_bucket_notification" "bucket_terraform_notification" {
    bucket = "${aws_s3_bucket.terraform_bucket.id}"

    lambda_function {
        lambda_function_arn = "${aws_lambda_function.terraform_func.arn}"
        events              = ["s3:ObjectCreated:*"]
        filter_prefix       = "content-packages/"
        filter_suffix       = ".json"
    }
}

Google Cloud events

To illustrate an alternate developer experience, the examples below use the Firebase JavaScript SDK for Google Cloud Functions, which is idiomatic for JavaScript developers using the fluent API style popularized by jQuery. The same functionality is available via command line options using gcloud, the Google Cloud CLI.

Cloud Storage trigger

Below is an example of specifying a trigger for a change to a Google Cloud Storage object in a specific bucket:

exports.generateThumbnail = functions.storage.bucket('my-bucket').object().onChange((event) => {
  // ...
});

Cloud Firestore trigger

This approach to filtering events at their source is very powerful when applied to database operations, where a developer can listen to a specific database path, such as with Cloud Firestore events:

exports.createProduct = functions.firestore
  .document('products/{productId}')
  .onCreate(event => {
    // Get an object representing the document
    // e.g. {'name': 'Wooden Doll', 'description': '...'}
    var newValue = event.data.data();
    // access a particular field as you would any JS property
    var name = newValue.name;
    // perform desired operations ...
  });

February 20, 2018

pagetable

Murdlok: A new old adventure game for the C64

Murdlok is a previously unreleased graphical text-based adventure game for the Commodore 64, written in 1986 by Peter Hempel. A German and an English version exist.

Murdlok – Ein Abenteuer von Peter Hempel

Befreie das Land von dem bösen Murdlok. Nur Nachdenken und kein Leichtsinn führen zum Ziel.

murdlok_de.d64 (Originalversion von 1986)

Murdlok – An Adventure by Peter Hempel

Liberate the land from the evil Murdlok! Reflection, not recklessness will guide you to your goal!
murdlok_en.d64 (English translation by Lisa Brodner and Michael Steil, 2018)

The great thing about a new game is that no walkthroughs exist yet! Feel free to use the comments section of this post to discuss how to solve the game. Extra points for the shortest solution – ours is 236 steps!

February 17, 2018

Cryptography Engineering

A few notes on MedSec and St. Jude Medical

In Fall 2016 I was invited to come to Miami as part of a team that independently validated some alleged flaws in implantable cardiac devices manufactured by St. Jude Medical (now part of Abbott Labs). These flaws were discovered by a company called MedSec. The story got a lot of traction in the press at the time, primarily due to the fact that a hedge fund called Muddy Waters took a large short position on SJM stock as a result of these findings. SJM subsequently sued both parties for defamation. The FDA later issued a recall for many of the devices.

Due in part to the legal dispute (still ongoing!), I never had the opportunity to write about what happened down in Miami, and I thought that was a shame: because it's really interesting. So I'm belatedly putting up this post, which talks a bit about MedSec's findings, and implantable device security in general.

By the way: "we" in this case refers to a team of subject matter experts hired by Bishop Fox, and retained by legal counsel for Muddy Waters Investments. I won't name the other team members here because some might not want to be troubled by this now, but they did most of the work -- and their names can be found in this public expert report (as can all the technical findings in this post).

Quick disclaimers: this post is my own, and any mistakes or inaccuracies in it are mine and mine alone. I'm not a doctor, so holy cow this isn't medical advice. Many of the flaws in this post have since been patched by SJM/Abbott. I was paid for my time and travel by Bishop Fox for a few days in 2016, but I haven't worked for them since.
I didn't ask anyone for permission to post this, because it's all public information.

A quick primer on implantable cardiac devices

Implantable cardiac devices are tiny computers that can be surgically installed inside a patient's body. Each device contains a battery and a set of electrical leads that can be surgically attached to the patient's heart muscle.

When people think about these devices, they're probably most familiar with the cardiac pacemaker. Pacemakers issue small electrical shocks to ensure that the heart beats at an appropriate rate. However, the pacemaker is actually one of the least powerful implantable devices. A much more powerful type of device is the Implantable Cardioverter-Defibrillator (ICD). These devices are implanted in patients who have a serious risk of spontaneously entering a dangerous state in which their heart ceases to pump blood effectively. The ICD continuously monitors the patient's heart rhythm to identify when the patient's heart has entered this condition, and applies a series of increasingly powerful shocks to the heart muscle to restore effective heart function. Unlike pacemakers, ICDs can issue shocks of several hundred volts or more, and can both stop and restart a patient's normal heart rhythm.

Like most computers, implantable devices can communicate with other computers. To avoid the need for external data ports – which would mean a break in the patient's skin – these devices communicate via either a long-range radio frequency ("RF") or a near-field inductive coupling ("EM") communication channel, or both. Healthcare providers use a specialized hospital device called a Programmer to update therapeutic settings on the device (e.g., program the device, turn therapy off). Using the Programmer, providers can manually issue commands that cause an ICD to shock the patient's heart. One command, called a "T-Wave shock" (or "Shock-on-T") can be used by healthcare providers to deliberately induce ventricular fibrillation.
This capability is used after a device is implanted, in order to test the device and verify it's functioning properly.

Because the Programmer is a powerful tool – one that could cause harm if misused – it's generally deployed in a physician office or hospital setting. Moreover, device manufacturers may employ special precautions to prevent spurious commands from being accepted by an implantable device. For example:

1. Some devices require that all Programmer commands be received over a short-range communication channel, such as the inductive (EM) channel. This limits the communication range to several centimeters.
2. Other devices require that a short-range inductive (EM) wand must be used to initiate a session between the Programmer and a particular implantable device. The device will only accept long-range RF commands sent by the Programmer after this interaction, and then only for a limited period of time.

From a computer security perspective, both of these approaches have a common feature: using either approach requires some form of close-proximity physical interaction with the patient before the implantable device will accept (potentially harmful) commands via the long-range RF channel. Even if a malicious party steals a Programmer from a hospital, she may still need to physically approach the patient – at a distance limited to perhaps centimeters – before she can use the Programmer to issue commands that might harm the patient.

In addition to the Programmer, most implantable manufacturers also produce some form of "telemedicine" device. These devices aren't intended to deliver commands like cardiac shocks. Instead, they exist to provide remote patient monitoring from the patient's home. Telematics devices use RF or inductive (EM) communications to interrogate the implantable device in order to obtain episode history, usually at night when the patient is asleep.
The resulting data is uploaded to a server (via telephone or cellular modem) where it can be accessed by healthcare providers.

What can go wrong?

Before we get into specific vulnerabilities in implantable devices, it's worth asking a very basic question. From a security perspective, what should we even be worried about?

There are a number of answers to this question. For example, an attacker might abuse implantable device systems or infrastructure to recover confidential patient data (known as PHI). Obviously this would be bad, and manufacturers should design against it. But the loss of patient information is, quite frankly, kind of the least of your worries.

A much scarier possibility is that an attacker might attempt to harm patients. This could be as simple as turning off therapy, leaving the patient to deal with their underlying condition. On the much scarier end of the spectrum, an ICD attacker could find a way to deliberately issue dangerous shocks that could stop a patient's heart from functioning properly.

Now let me be clear: this isn't what you'd call a high-probability attack. Most people aren't going to be targeted by sophisticated technical assassins. But the impact of such an attack is sufficiently terrifying that we should probably be concerned about it anyway. Indeed, some high-profile individuals have already taken precautions against it.

The real nightmare scenario is a mass attack in which a single resourceful attacker targets thousands of individuals simultaneously — perhaps by compromising a manufacturer's back-end infrastructure — and threatens to harm them all at the same time. While this might seem unlikely, we've already seen attackers systematically target hospitals with ransomware. So this isn't entirely without precedent.

Securing device interaction physically

The real challenge in securing an implantable device is that too much security could hurt you.
As tempting as it might be to lard these devices up with security features like passwords and digital certificates, doctors need to be able to access them. Sometimes in a hurry. This is a big deal. If you're in a remote emergency room or hospital, the last thing you want is some complex security protocol making it hard to disable your device or issue a required shock. This means we can forget about complex PKI and revocation lists. Nobody is going to have time to remember a password. Even merely complicated procedures are out — you can't afford to have them slow down treatment.

At the same time, these devices obviously must perform some sort of authentication: otherwise anyone with the right kind of RF transmitter could program them — via RF, from a distance. This is exactly what you want to prevent.

Many manufacturers have adopted an approach that cuts through this knot. The basic idea is to require physical proximity before someone can issue commands to your device. Specifically, before anyone can issue a shock command (even via a long-range RF channel) they must — at least briefly — make close physical contact with the patient.

This proximity can be enforced in a variety of ways. If you remember, I mentioned above that most devices have a short-range inductive coupling ("EM") communications channel. These short-range channels seem ideal for establishing a "pairing" between a Programmer and an implantable device — via a specialized wand. Once the channel is established, of course, it's possible to switch over to long-range RF communications.

This isn't a perfect solution, but it has a lot going for it: someone could still harm you, but they would have to at least get a transmitter within a few inches of your chest before doing so. Moreover, you can potentially disable harmful commands from an entire class of device (like telemedicine monitoring devices) simply by leaving off the wand.

St. Jude Medical and MedSec

So given this background, what did St. Jude Medical do?
All of the details are discussed in a full expert report published by Bishop Fox. In this post I'll focus on the most serious of MedSec's claims, which can be expressed as follows:

Using only the hardware contained within a "Merlin @Home" telematics device, it was possible to disable therapy and issue high-power "shock" commands to an ICD from a distance, and without first physically interacting with the implantable device at close range.

This vulnerability had several implications:

1. The existence of this vulnerability implies that – through a relatively simple process of "rooting" and installing software on a Merlin @Home device – a malicious attacker could create a device capable of issuing harmful shock commands to installed SJM ICD devices at a distance. This is particularly worrying given that Merlin @Home devices are widely deployed in patients' homes and can be purchased on eBay for prices under $30. While it might conceivably be possible to physically secure and track the location of all PCS Programmer devices, it seems challenging to physically track the much larger fleet of Merlin @Home devices.
2. More critically, it implies that St. Jude Medical implantable devices do not enforce a close physical interaction (e.g., via an EM wand or other mechanism) prior to accepting commands that have the potential to harm or even kill patients. This may be a deliberate design decision on St. Jude Medical’s part. Alternatively, it could be an oversight. In either case, this design flaw increases the risk to patients by allowing for the possibility that remote attackers might be able to cause patient harm solely via the long-range RF channel.
3. If it is possible – using software modifications only – to issue shock commands from the Merlin @Home device, then patients with an ICD may be vulnerable in the hypothetical event that their Merlin @Home device becomes remotely compromised by an attacker. Such a compromise might be accomplished remotely via a network attack on a single patient’s Merlin @Home device. Alternatively, a compromise might be accomplished at large scale through a compromise of St. Jude Medical’s server infrastructure.

We stress that the final scenario is strictly hypothetical. MedSec did not allege a specific vulnerability that allows for the remote compromise of Merlin @Home devices or SJM infrastructure. However, from the perspective of software and network security design, these attacks are among the potential implications of a design that permits telematics devices to send such commands to an implantable device. It is important to stress that none of these attacks would be possible if St. Jude Medical's design prohibited the implantable from accepting therapeutic commands from the Merlin @Home device (e.g., by requiring close physical interaction via the EM wand, or by somehow authenticating the provenance of commands and restricting critical commands to be sent by the Programmer only).

Validating MedSec’s claim

To validate MedSec’s claim, we examined their methodology from start to finish. This methodology included extracting and decompiling Java-based software from a single PCS Programmer; accessing a Merlin @Home device to obtain a root shell via the JTAG port; and installing a new package of custom software written by MedSec onto a used Merlin @Home device.

We then observed MedSec issue a series of commands to an ICD device using a Merlin @Home device that had been customized (via software) as described above. We used the Programmer to verify that these commands were successfully received by the implantable device, and physically confirmed that MedSec had induced shocks by attaching a multimeter to the leads on the implantable device.

Finally, we reproduced MedSec’s claims by opening the case of a second Merlin @Home device (after verifying that the tape was intact over the screw holes), obtaining a shell by connecting a laptop computer to the JTAG port, and installing MedSec’s software on the device. We were then able to issue commands to the ICD from a distance of several feet. This process took us less than three hours in total, and required only inexpensive tools and a laptop computer.

What are the technical details of the attack?

Simply reproducing a claim is only part of the validation process. To verify MedSec’s claims we also needed to understand why the attack described above was successful. Specifically, we were interested in identifying the security design issues that make it possible for a Merlin @Home device to successfully issue commands that are not intended to be issued from this type of device. The answer to this question is quite technical, and involves the specific way that SJM implantable devices verify commands before accepting them.

MedSec described to us the operation of SJM’s command protocol as part of their demonstration. They also provided us with Java JAR executable code files taken from the hard drive of the PCS Programmer. These files, which are not obfuscated and can easily be “decompiled” into clear source code, contain the software responsible for implementing the Programmer-to-Device communications protocol.

By examining the SJM Programmer code, we verified that Programmer commands are authenticated through the inclusion of a three-byte (24 bit) “authentication tag” that must be present and correct within each command message received by the implantable device. If this tag is not correct, the device will refuse to accept the command.

From a cryptographic perspective, 24 bits is a surprisingly short value for an important authentication field. However, we note that even this relatively short tag might be sufficient to prevent forgery of command messages – provided the tag was calculated using a secure cryptographic function (e.g., a Message Authentication Code) with a fresh secret key that cannot be predicted by the attacker.
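To put the 24-bit figure in perspective, here is some back-of-the-envelope arithmetic (my own illustration; the 10-commands-per-second rate is an assumption, not a measured figure) for a blind online forgery attempt against such a tag:

```python
TAG_BITS = 24
space = 2 ** TAG_BITS        # 16,777,216 possible tags

p_single = 1 / space         # chance a single random guess is accepted

# Expected number of guesses before hitting the right tag
expected_guesses = space // 2    # 8,388,608 on average

# At an assumed 10 commands per second over RF, the average blind
# online attack takes on the order of ten days:
rate = 10
hours = expected_guesses / rate / 3600   # roughly 233 hours
```

In other words, a short tag can survive purely online guessing if guesses are slow and each session uses a fresh key; it fails completely once the tag can be computed offline from extracted code.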

Based on MedSec's demonstration, and on our analysis of the Programmer code, it appears that SJM does not use the above approach to generate authentication tags. Instead, SJM authenticates the Programmer to the implantable with the assistance of a "key table" that is hard-coded within the Programmer's Java code. At minimum, any party who obtains the (non-obfuscated) Java code from a legitimate SJM Programmer can gain the ability to calculate the correct authentication tags needed to produce viable commands – without any need to use the Programmer itself.

Moreover, MedSec determined – and successfully demonstrated – that there exists a “Universal Key”, i.e., a fixed three-byte authentication tag, that can be used in place of the calculated authentication tag. We identified this value in the Java code provided by MedSec, and verified that it was sufficient to issue shock commands from a Merlin @Home to an implantable device.

While these issues alone are sufficient to defeat the command authentication mechanism used by SJM implantable devices, we also analyzed the specific function that is used by SJM to generate the three-byte authentication tag.  To our surprise, SJM does not appear to use a standard cryptographic function to compute this tag. Instead, they use an unusual and apparently “homebrewed” cryptographic algorithm for the purpose.

Specifically, the PCS Programmer Java code contains a series of hard-coded 32-bit RSA public keys. To issue a command, the implantable device sends a value to the Programmer. This value is then “encrypted” by the Programmer using one of the RSA public keys, and the resulting output is truncated to produce a 24-bit output tag.
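Concretely, the tag generation described above amounts to raw modular exponentiation followed by truncation. Here is a sketch of that shape, using toy stand-in parameters of my own (the real hard-coded key table and exponents are SJM's, and are not reproduced here):

```python
# Toy stand-ins for the values hard-coded in the Programmer's Java code.
TOY_N = 4291428937   # an illustrative 32-bit "RSA modulus" (65497 * 65521)
TOY_E = 3            # an illustrative public exponent

def auth_tag(challenge):
    """'Encrypt' the device's challenge with the tiny RSA public key,
    then keep only the low 24 bits, as the report describes."""
    return pow(challenge, TOY_E, TOY_N) & 0xFFFFFF
```

Note that every input to this computation is public and hard-coded: anyone holding the decompiled code can compute the same tag the Programmer would, with no Programmer hardware involved.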

The above is not a standard cryptographic protocol, and quite frankly it is difficult to see what St. Jude Medical is trying to accomplish using this technique. From a cryptographic perspective it has several problems:

1. The RSA public keys used by the PCS Programmers are 32 bits long. Normal RSA keys are expected to be a minimum of 1024 bits in length. Some estimates predict that a 1024-bit RSA key can be factored (and thus rendered insecure) in approximately one year using a powerful network of supercomputers. Based on experimentation, we were able to factor the SJM public keys in less than one second on a laptop computer.
2. Even if the RSA keys were of an appropriate length, the SJM protocol does not make use of the corresponding RSA secret keys. Thus the authentication tag is not an RSA signature, nor does it use RSA in any way that we are familiar with.
3. As noted above, since there is no shared session key established between the specific implantable device and the Programmer, the only shared secret available to both parties is contained within the Programmer’s Java code. Thus any party who extracts the Java code from a PCS Programmer will be able to transmit valid commands to any SJM implantable device.
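Point 1 is easy to reproduce: factoring a 32-bit modulus requires nothing more than trial division. The sketch below uses an illustrative 32-bit modulus of my own (not an actual SJM key) and completes in well under a second:

```python
import math

def factor_u32(n):
    """Trial-divide a 32-bit RSA-style modulus. At most ~65536 iterations,
    which is why 32-bit 'RSA' offers no security at all."""
    if n % 2 == 0:
        return 2, n // 2
    for p in range(3, math.isqrt(n) + 1, 2):
        if n % p == 0:
            return p, n // p
    return None  # n is prime

# Example: a made-up 32-bit modulus built from two 16-bit primes.
print(factor_u32(4291428937))   # (65497, 65521)
```

Once the factors are known, nothing about the key is secret, even before considering that the protocol never uses the private key at all.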

Our best interpretation of this design is that the calculation is intended as a form of “security by obscurity”, based on the assumption that an attacker will not be able to reverse engineer the protocol. Unfortunately, this approach is rarely successful when used in security systems. In this case, the system is fundamentally fragile – due to the fact that code for computing the correct authentication tag is likely available in easily-decompiled Java bytecode on each St. Jude Medical Programmer device. If this code is ever extracted and published, all St. Jude Medical devices become vulnerable to command forgery.

How to remediate these attacks?

To reiterate, the fundamental security concerns with these St. Jude Medical devices (as of 2016) appeared to be problems of design. These were:

1. SJM implantable devices did not require close physical interaction prior to accepting commands (allegedly) sent by the Programmer.
2. SJM did not incorporate a strong cryptographic authentication mechanism in its RF protocol to verify that commands are truly sent by the Programmer.
3. Even if the previous issue was addressed, St. Jude did not appear to have an infrastructure for securely exchanging shared cryptographic keys between a legitimate Programmer and an implantable device.

There are various ways to remediate these issues. One approach is to require St. Jude implantable devices to exchange a secret key with the Programmer through a close-range interaction involving the Programmer’s EM wand. A second approach would be to use a magnetic sensor to verify the presence of a magnet on the device, prior to accepting Programmer commands. Other solutions are also possible. I haven’t reviewed the solution SJM ultimately adopted in their software patches, and I don’t know how many users patched.

Conclusion

Implantable devices offer a number of unique security challenges. It’s naturally hard to get these things right. At the same time, it’s important that vendors take these issues seriously, and spend the time to get cryptographic authentication mechanisms right — because once deployed, these devices are very hard to repair, and the cost of a mistake is extremely high.

That grumpy BSD guy

A Life Lesson in Mishandling SMTP Sender Verification

An attempt to report spam to a mail service provider's abuse address reveals how incompetence is sometimes indistinguishable from malice.

It all started with one of those rare spam mails that got through.

This one was hawking address lists, much like the ones I occasionally receive to addresses that I can not turn into spamtraps. The message was addressed to, of all things, root@skapet.bsdly.net. (The message with full headers has been preserved here for reference).

Yes, that's right, they sent their spam to root@. And a quick peek at the headers revealed that like most of those attempts at hawking address lists for spamming that actually make it to a mailbox here, this one had been sent by an outlook.com customer.

The problem with spam delivered via outlook.com is that you can't usefully blacklist the sending server, since the largish chunk of the world that uses some sort of Microsoft hosted email solution (Office365 and its ilk) have their usually legitimate mail delivered via the very same infrastructure.

And since outlook.com is one of the mail providers that doesn't play well with greylisting (it spreads its retries across no less than 81 subnets; the output of 'echo outlook.com | doas smtpctl spf walk' is preserved here), it's fairly common practice to just whitelist all those networks and avoid the hassle of lost or delayed mail to and from Microsoft customers.

I was going to just ignore this message too, but we've seen an increasing number of spammy outfits taking advantage of outlook.com's seeming right of way to innocent third parties' mail boxes.

So I decided to try both to do my best at demoralizing this particular sender and to alert outlook.com to their problem. I wrote a message (preserved here) with a Cc: to abuse@outlook.com where the meat is,

Ms Farell,

The address root@skapet.bsdly.net has never been subscribed to any mailing list, for obvious reasons. Whoever sold you an address list with that address on it are criminals and you should at least demand your money back.

Whoever handles abuse@outlook.com will appreciate the attachment, which is a copy of the message as it arrived here with all headers intact.

Yours sincerely,
Peter N. M. Hansteen

What happened next is quite amazing.

If my analysis is correct, it may not be possible for senders who are not themselves outlook.com customers to actually reach the outlook.com abuse team.

Almost immediately after I sent the message to Ms Farell with a Cc: to abuse@outlook.com, two apparently identical messages from staff@hotmail.com, addressed to postmaster@bsdly.net appeared (preserved here and here), with the main content of both stating

This is an email abuse report for an email message received from IP 216.32.180.51 on Sat, 17 Feb 2018 01:59:21 -0800.
The message below did not meet the sending domain's authentication policy.

In order to understand what happened here, it is necessary to look at the mail server log for a time interval of a few seconds (preserved here).

The first few lines describe the processing of my outgoing message:

2018-02-17 10:59:14 1emzGs-0009wb-94 <= peter@bsdly.net H=(greyhame.bsdly.net) [192.168.103.164] P=esmtps X=TLSv1.2:ECDHE-RSA-AES128-GCM-SHA256:128 CV=no S=34977 id=31b4ffcf-bf87-de33-b53a-0 ebff4349b94@bsdly.net

My server receives the message from my laptop, and we can see that the connection was properly TLS encrypted

2018-02-17 10:59:15 1emzGs-0009wb-94 => peter <root@skapet.bsdly.net> R=localuser T=local_delivery

I had for some reason kept the original recipient among the To: addresses. Actually useless but also harmless.

2018-02-17 10:59:16 1emzGs-0009wb-94 [104.47.40.33] SSL verify error: certificate name mismatch: DN="/C=US/ST=WA/L=Redmond/O=Microsoft Corporation/OU=Microsoft Corporation/CN=mail.protection.outlook.com" H="outlook-com.olc.protection.outlook.com"
2018-02-17 10:59:18 1emzGs-0009wb-94 SMTP error from remote mail server after end of data: 451 4.4.0 Message failed to be made redundant due to A shadow copy was required but failed to be made with an AckStatus of Fail [CO1NAM03HT002.eop-NAM03.prod.protection.outlook.com] [CO1NAM03FT002.eop-NAM03.prod.protection.outlook.com]
2018-02-17 10:59:19 1emzGs-0009wb-94 [104.47.42.33] SSL verify error: certificate name mismatch: DN="/C=US/ST=WA/L=Redmond/O=Microsoft Corporation/OU=Microsoft Corporation/CN=mail.protection.outlook.com" H="outlook-com.olc.protection.outlook.com"

What we see here is that even a huge corporation like Microsoft does not always handle certificates properly. The certificate they present for setting up the encrypted connection is not actually valid for the host name that the outlook.com server presents.

There is also what I interpret as a file system related message, which I assume is meaningful to someone well versed in Microsoft products, but we see that

2018-02-17 10:59:20 1emzGs-0009wb-94 => janet@prospectingsales.net R=dnslookup T=remote_smtp H=prospectingsales-net.mail.protection.outlook.com [23.103.140.138] X=TLSv1.2:ECDHE-RSA-AES256-SHA384:256 CV=yes K C="250 2.6.0 <31b4ffcf-bf87-de33-b53a-0ebff4349b94@bsdly.net> [InternalId=40926743365667, Hostname=BMXPR01MB0934.INDPRD01.PROD.OUTLOOK.COM] 44350 bytes in 0.868, 49.851 KB/sec Queued mail for delivery"

even though the certificate fails the verification part, the connection sets up with TLSv1.2 anyway, and the message is accepted with a "Queued mail for delivery" message.

The message is also delivered to the Cc: recipient:

2018-02-17 10:59:21 1emzGs-0009wb-94 => abuse@outlook.com R=dnslookup T=remote_smtp H=outlook-com.olc.protection.outlook.com [104.47.42.33] X=TLSv1.2:ECDHE-RSA-AES256-SHA384:256 CV=no K C="250 2.6.0 <31b4ffcf-bf87-de33-b53a-0ebff4349b94@bsdly.net> [InternalId=3491808500196, Hostname=BY2NAM03HT071.eop-NAM03.prod.protection.outlook.com] 42526 bytes in 0.125, 332.215 KB/sec Queued mail for delivery"
2018-02-17 10:59:21 1emzGs-0009wb-94 Completed

And the transactions involving my message would normally have been completed.

But ten seconds later this happens:

2018-02-17 10:59:31 1emzHG-0004w8-0l <= staff@hotmail.com H=bay004-omc1s10.hotmail.com [65.54.190.21] P=esmtps X=TLSv1.2:ECDHE-RSA-AES256-SHA384:256 CV=no K S=43968 id=BAY0-XMR-100m4KrfmH000a51d4@bay0-xmr-100.phx.gbl
2018-02-17 10:59:31 1emzHG-0004w8-0l => peter <postmaster@bsdly.net> R=localuser T=local_delivery
2018-02-17 10:59:31 1emzHG-0004w8-0l => peter <postmaster@bsdly.net> R=localuser T=local_delivery

That's the first message to my domain's postmaster@ address, followed two seconds later by

2018-02-17 10:59:33 1emzHI-0004w8-Fy <= staff@hotmail.com H=bay004-omc1s10.hotmail.com [65.54.190.21] P=esmtps X=TLSv1.2:ECDHE-RSA-AES256-SHA384:256 CV=no K S=43963 id=BAY0-XMR-100Q2wN0I8000a51d3@bay0-xmr-100.phx.gbl
2018-02-17 10:59:33 1emzHI-0004w8-Fy => peter <postmaster@bsdly.net> R=localuser T=local_delivery
2018-02-17 10:59:33 1emzHI-0004w8-Fy Completed

a second, apparently identical message.

Both of those messages state that the message I sent to abuse@outlook.com had failed SPF verification, because the check happened on connections from NAM03-BY2-obe.outbound.protection.outlook.com (216.32.180.51) by whatever handles incoming mail to the staff@hotmail.com address, which apparently is where the system forwards abuse@outlook.com's mail.

Reading Microsoft Exchange's variant SMTP headers has never been my forte, and I won't try decoding the exact chain of events here since that would probably also require you to have fairly intimate knowledge of Microsoft's internal mail delivery infrastructure.

But even a quick glance at the messages reveals that the message passed SPF and other checks on incoming to the outlook.com infrastructure, but may have ended up not getting delivered after all since a second SPF test happened on a connection from a host that is not in the sender domain's SPF record.

In fact, that second test would only succeed for domains that have

include:spf.protection.outlook.com

in their SPF record, and those would presumably be Outlook.com customers.

Any student or practitioner of SMTP mail delivery should know that SPF checks should only happen on ingress, that is, at the point where the mail traffic enters your infrastructure and the sender IP address is still the original one. If you leave the check until later, when the message may have been forwarded, you no longer have sufficient data to perform the check.
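The failure mode is easy to reproduce with a toy SPF evaluator (a sketch using Python's stdlib ipaddress module; the permitted netblock is illustrative, real SPF data comes from DNS). The original client IP passes at ingress, but once the message has been forwarded, the connecting IP is the forwarder's, and the very same check fails:

```python
import ipaddress

# Illustrative SPF data for the sending domain; a real implementation
# would fetch and expand the domain's SPF record from DNS.
SENDER_SPF_NETS = [ipaddress.ip_network("192.0.2.0/24")]

def spf_pass(connecting_ip):
    """Toy SPF check: is the connecting IP among the sender's permitted nets?"""
    ip = ipaddress.ip_address(connecting_ip)
    return any(ip in net for net in SENDER_SPF_NETS)

# At ingress, the connection comes straight from the sender's MTA:
assert spf_pass("192.0.2.25")

# After internal forwarding, the connecting IP belongs to the forwarding
# host (as with 216.32.180.51 in the bounces above), so the same message
# now fails SPF for the original sender domain:
assert not spf_pass("216.32.180.51")
```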

Whenever I encounter incredibly stupid and functionally destructive configuration errors like this I tend to believe they're down to simple incompetence and not malice.

But this one has me wondering. If you essentially require incoming mail to include the contents of spf.protection.outlook.com (currently no less than 81 subnets) as valid senders for the domain, you are in effect saying that only outlook.com customers are allowed to communicate.

If that restriction is a result of a deliberate choice rather than a simple configuration error, the problem moves out of the technical sphere and could conceivably become a legal matter, depending on what outlook.com have specified in their contracts that they are selling to their customers.

But let us assume that this is indeed a matter of simple bad luck or incompetence and that the solution is indeed technical.

I would have liked to report this to whoever does technical things at that domain via email, but unfortunately there are indications that being their customer is a precondition for using that channel of communication to them.

I hope they fix that, and soon. And then move on to terminating their spamming customers' contracts.

The main lesson to be learned from this is that when you shop around for email service, please do yourself a favor and make an effort to ensure that your prospective providers actually understand how the modern-ish SMTP addons SPF, DKIM and DMARC actually work.

Otherwise you may end up receiving more of the mail you don't want than what you do want, and your own mail may end up not being delivered as intended.

Update 2018-02-19: Just as I was going to get ready for bed (it's late here in CET) another message from Ms Farell arrived, this time to an alias I set up in order to make it easier to filter PF tutorial related messages into a separate mailbox.

I wrote another response, and as the mail server log will show, despite the fact that a friend with an Office365 contract contacted them quoting this article, outlook.com have still not fixed the problem. Two more messages (preserved here and here) shot back here immediately.

Update 2018-02-20: A response from Microsoft, with pointers to potentially useful information.

A message from somebody identifying as working for Microsoft Online Safety arrived, apparently responding to my message dated 2018-02-19, where the main material was,

Hi,

Based on the information you provided, it appears to have originated from an Office 365 or Exchange Online tenant account.

To report junk mail from Office 365 tenants, send an email to junk@office365.microsoft.com and include the junk mail as an attachment.

This link provides further junk mail education https://technet.microsoft.com/en-us/library/jj200769(v=exchg.150).aspx.

Kindly,
I have asked for clarification of some points, but no response has arrived as of close to bedtime in CET.

However I did take the advice to forward the offending messages as attachments to the junk@ address, and put the outlook.com abuse address in the Cc: on that message. My logs indicate that the certificate error had not gone away, but no SPF-generated bounces appeared either.

If Microsoft responds with further clarifications, I will publish a useful condensate here.

In other news, there will be a PF tutorial at the 2018 AsiaBSDCon in Tokyo. Follow the links for the most up to date information.

If you have ever written 6502 code for the Commodore 64, you may remember using “JSR $FFD2” to print a character on the screen. You may have read that the jump table at the end of the KERNAL ROM was designed to allow applications to run on all Commodore 8 bit computers from the PET to the C128 (and the C65!) – but that is a misconception. This article will show how

• the first version of the jump table in the PET was designed to only hook up BASIC to the system’s features
• it wasn’t until the VIC-20 that the jump table was generalized for application development (and the vector table introduced)
• all later machines add their own calls, but later machines don’t necessarily support older calls.

KIM-1 (1976)

The KIM-1 was originally meant as a computer development board for the MOS 6502 CPU. Commodore acquired MOS in 1976 and kept selling the KIM-1. It contained a 2 KB ROM (“TIM”, “Terminal Interface Monitor”), which included functions to read characters from ($1E5A) and write characters to ($1EA0) a serial terminal, as well as code to load from and save to tape and support for the hex keyboard and display. Commodore asked Microsoft to port their BASIC for 6502 to it, which interfaced with the monitor only through the two character in and out functions. The original source of BASIC shows how Microsoft adapted it to work with the KIM-1 by defining CZGETL and OUTCH to point to the monitor routines:

IFE REALIO-1,<GETCMD==1
DISKO==1
OUTCH=^O17240 ;1EA0
ROMLOC==^O20000
RORSW==0
CZGETL=^O17132>

(The values are octal, since the assembler Microsoft used did not support hexadecimal.) The makers of the KIM-1 never intended to change the ROM, so there was no need for a jump table for these calls. Applications just hardcoded their offsets in ROM.

PET (1977)

The PET was Commodore’s first complete computer, with a keyboard, a display and a built-in tape drive.
The system ROM (“KERNAL”) was now 4 KB and included a powerful file I/O system for tape, RS-232 and IEEE-488 (for printers and disk drives) as well as timekeeping logic. Another 2 KB ROM (“EDITOR”) handled screen output and character input. Microsoft BASIC was included in ROM and was marketed – with the name “COMMODORE BASIC” – as the actual operating system, making the KERNAL and the editor merely a device driver package.

Like with the KIM-1, Commodore asked Microsoft to port BASIC to the PET, and provided them with addresses of a jump table in the KERNAL ROM for interfacing with it. These are the symbol definitions in Microsoft’s source:

CQOPEN=^O177700
CQCLOS=^O177703
CQOIN= ^O177706 ;OPEN CHANNEL FOR INPUT
CQOOUT=^O177711 ;FILL FOR COMMO.
CQCCHN=^O177714
CQINCH=^O177717 ;INCHR'S CALL TO GET A CHARACTER
OUTCH= ^O177722
CQLOAD=^O177725
CQSAVE=^O177730
CQVERF=^O177733
CQSYS= ^O177736
ISCNTC=^O177741
CZGETL=^O177744 ;CALL POINT FOR "GET"
CQCALL=^O177747 ;CLOSE ALL CHANNELS

(The meaning of the CQ prefix is left as an exercise to the reader.) In hex and with Commodore’s names, these are the KERNAL calls used by BASIC:

• $FFC0: OPEN
• $FFC3: CLOSE
• $FFC6: CHKIN
• $FFC9: CHKOUT
• $FFCC: CLRCHN
• $FFCF: BASIN
• $FFD2: BSOUT
• $FFD5: LOAD
• $FFD8: SAVE
• $FFDB: VERIFY
• $FFDE: SYS
• $FFE1: STOP
• $FFE4: GETIN
• $FFE7: CLALL
• $FFEA: UDTIM (advance clock; not used by BASIC)

At first sight, this jump table looks very similar to the one known from the C64, but it is indeed very different, and it is not generally compatible.

The following eight KERNAL routines are called from within the implementation of BASIC commands to deal with character I/O and the keyboard:

• $FFC6: CHKIN – set channel for character input
• $FFC9: CHKOUT – set channel for character output
• $FFCC: CLRCHN – restore character I/O to screen/keyboard
• $FFCF: BASIN – get character
• $FFD2: BSOUT – write character
• $FFE1: STOP – test for STOP key
• $FFE4: GETIN – get character from keyboard
• $FFE7: CLALL – close all channels

But the remaining six calls are not library calls at all, but full-fledged implementations of BASIC commands:

• $FFC0: OPEN – open a channel
• $FFC3: CLOSE – close a channel
• $FFD5: LOAD – load a file into memory
• $FFD8: SAVE – save a file from memory
• $FFDB: VERIFY – compare a file with memory
• $FFDE: SYS – run machine code

When compiled for the PET, Microsoft BASIC detects the extra commands “OPEN”, “CLOSE” etc., but does not provide an implementation for them. Instead, it calls out to these KERNAL functions when these commands are encountered. So these KERNAL calls have to parse the BASIC arguments, check for errors, and update BASIC’s internal data structures.

These 6 KERNAL calls are actually BASIC command extensions, and they are not useful for any other programs in machine code. After all, the whole jump table was not meant as an abstraction of the machine, but as an interface especially for Microsoft BASIC.

PET BASIC V4 (1980)

Version 4 of the ROM set, which came with significant improvements to BASIC and shipped by default with the 4000 and 8000 series, contained several additions to the KERNAL – all of which were additional BASIC commands.

• $FF93: CONCAT
• $FF96: DOPEN
• $FF99: DCLOSE
• $FF9C: RECORD
• $FF9F: HEADER
• $FFA2: COLLECT
• $FFA5: BACKUP
• $FFA8: COPY
• $FFAB: APPEND
• $FFAE: DSAVE
• $FFB1: DLOAD
• $FFB4: CATALOG/DIRECTORY
• $FFB7: RENAME
• $FFBA: SCRATCH
• $FFBD: DS$ (disk status)

Even though Commodore was doing all development on their fork of BASIC after version 2, command additions were still kept separate and developed as part of the KERNAL. In fact, for all Commodore 8-bit computers from the PET to the C65, BASIC and KERNAL were built separately, and the KERNAL jump table was their interface.

VIC-20 (1981)

The VIC-20 was Commodore’s first low-cost home computer. In order to keep the cost down, the complete ROM had to fit into 16 KB, which meant the BASIC V4 features and the machine language monitor had to be dropped and the editor was merged into the KERNAL. While reorganizing the ROM, the original BASIC command extensions (OPEN, CLOSE, …) were moved into the BASIC ROM (so the KERNAL calls for the BASIC command implementations were no longer needed).

The VIC-20 KERNAL is the first one to have a proper system call interface, which not only includes all the calls required to make BASIC hardware-independent, but also additional calls not used by BASIC and intended for applications written in machine code. The VIC-20 Programmer’s Reference Manual documents these, making this the first time that machine code applications could be written for the Commodore 8 bit series in a forward-compatible way.
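The point of such a jump table is that the published entry addresses stay fixed while the routines behind them may move or change between ROM revisions. A rough Python model of that contract (the routine bodies are invented; only the idea is from the article):

```python
# Model of a ROM jump table: the published entry address is frozen,
# the implementation behind it may change between ROM revisions.
def chrout_rev1(c):
    return "printed " + c           # original routine

def chrout_rev2(c):
    return "printed " + c           # rewritten/relocated, same contract

JUMP_TABLE = {0xFFD2: chrout_rev1}  # $FFD2: BSOUT

def jsr(addr, *args):
    """Rough equivalent of 'JSR $FFD2': go through the fixed table slot."""
    return JUMP_TABLE[addr](*args)

assert jsr(0xFFD2, "A") == "printed A"   # application code, written once

JUMP_TABLE[0xFFD2] = chrout_rev2         # a new ROM revision ships
assert jsr(0xFFD2, "A") == "printed A"   # the old application still works
```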

Old PET Calls

The following PET KERNAL calls are generally useful and therefore still supported on the VIC-20:

• $FFC6: CHKIN
• $FFC9: CHKOUT
• $FFCC: CLRCHN
• $FFCF: BASIN
• $FFD2: BSOUT
• $FFE1: STOP
• $FFE4: GETIN
• $FFE7: CLALL
• $FFEA: UDTIM

Channel I/O

The calls for the BASIC commands OPEN, CLOSE, LOAD and SAVE have been replaced by generic functions that can be called from machine code:

• $FFC0: OPEN
• $FFC3: CLOSE
• $FFD5: LOAD
• $FFD8: SAVE

(There is no separate call for VERIFY, since the LOAD routine can perform verification based on its inputs.)

OPEN, LOAD and SAVE take more arguments (LA, FA, SA, filename) than fit into the 6502 registers, so two more calls take these and store them temporarily:

• $FFBA: SETLFS – set LA, FA and SA
• $FFBD: SETNAM – set filename

Two more additions allow reading the status of the last operation and setting the verbosity of messages/errors:

• $FFB7: READST – return status byte
• $FF90: SETMSG – set verbosity

BASIC uses all these functions to implement the commands OPEN, CLOSE, LOAD, SAVE and VERIFY. It basically parses the arguments and then calls the KERNAL functions.

IEC

The KERNAL also exposes a complete low-level interface to the serial IEC (IEEE-488) bus used to connect printers and disk drives. None of these calls are used by BASIC though, which talks to these devices on a higher level (OPEN, CHKIN, BASIN etc.).

• $FFB4: TALK – send TALK command
• $FFB1: LISTEN – send LISTEN command
• $FFAE: UNLSN – send UNLISTEN command
• $FFAB: UNTLK – send UNTALK command
• $FFA8: IECOUT – send byte to serial bus
• $FFA5: IECIN – read byte from serial bus
• $FFA2: SETTMO – set timeout
• $FF96: TKSA – send TALK secondary address
• $FF93: SECOND – send LISTEN secondary address

Memory

BASIC needs to know where usable RAM starts and where it ends, which is what the MEMTOP and MEMBOT functions are for. They also allow setting these values.

• $FF9C: MEMBOT – read/write address of start of usable RAM
• $FF99: MEMTOP – read/write address of end of usable RAM

Time

BASIC supports the TI and TI$ variables to access the system clock. The RDTIM and SETTIM KERNAL calls allow reading and writing this clock.

• $FFDB: SETTIM – write system clock
• $FFDE: RDTIM – read system clock

Screen

• $FFF0: PLOT – read/write cursor position

I/O

On the PET, BASIC’s random number generator for the RND command directly read the timers in the VIA 6522 controller. Since the VIC-20, this is abstracted: the IOBASE function returns the start address of the VIA in memory, and BASIC reads from the indexes 4, 5, 8 and 9 to access the timer values.

• $FFF3: IOBASE – return start of I/O area

The VIC-20 Programmer’s Reference Guide states: “This routine exists to provide compatibility between the VIC 20 and future models of the VIC. If the I/O locations for a machine language program are set by a call to this routine, they should still remain compatible with future versions of the VIC, the KERNAL and BASIC.”
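In other words, instead of hardcoding the VIA's absolute address, a program asks IOBASE where the I/O area lives and works with offsets from there. A small Python sketch of the idea (the addresses and the "future machine" entry are made up for illustration):

```python
# Hypothetical I/O base addresses; on the real VIC-20 IOBASE returns the
# address of the VIA, and future machines were free to return something else.
IO_BASE_BY_MACHINE = {"vic20": 0x9110, "future_vic": 0xA000}

def iobase(machine):
    """Model of the IOBASE KERNAL call: return the start of the I/O area."""
    return IO_BASE_BY_MACHINE[machine]

def read_timer_bytes(machine, memory):
    # Like BASIC's RND: read the timer registers at indexes 4, 5, 8 and 9
    # relative to IOBASE, never at hardcoded absolute addresses.
    base = iobase(machine)
    return [memory[base + i] for i in (4, 5, 8, 9)]

mem = {}
for i in (4, 5, 8, 9):
    mem[0x9110 + i] = i * 10        # fake timer contents
assert read_timer_bytes("vic20", mem) == [40, 50, 80, 90]
```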

Vectors

The PET already allowed the user to override the following vectors in RAM to hook into some KERNAL functions:

• $00E9: input from keyboard
• $00EB: output to screen
• $0090: IRQ handler
• $0092: BRK handler
• $0094: NMI handler

The VIC-20 ROM replaces these vectors with a more extensive table of addresses in RAM at $0300 to hook core BASIC and KERNAL functions. The KERNAL ones start at $0314. The first three can be used to hook IRQ, BRK and NMI:

• $0314: CINV – IRQ handler
• $0316: CBINV – BRK handler
• $0318: NMINV – NMI handler

The others allow overriding the core set of KERNAL calls

• $031A: IOPEN – indirect entry to OPEN ($FFC0)
• $031C: ICLOSE – indirect entry to CLOSE ($FFC3)
• $031E: ICHKIN – indirect entry to CHKIN ($FFC6)
• $0320: ICKOUT – indirect entry to CHKOUT ($FFC9)
• $0322: ICLRCH – indirect entry to CLRCHN ($FFCC)
• $0324: IBASIN – indirect entry to CHRIN ($FFCF)
• $0326: IBSOUT – indirect entry to CHROUT ($FFD2)
• $0328: ISTOP – indirect entry to STOP ($FFE1)
• $032A: IGETIN – indirect entry to GETIN ($FFE4)
• $032C: ICLALL – indirect entry to CLALL ($FFE7)
• $032E: USRCMD – “User-Defined Vector”
• $0330: ILOAD – indirect entry to LOAD ($FFD5)
• $0332: ISAVE – indirect entry to SAVE ($FFD8)

The “USRCMD” vector is interesting: it’s unused on the VIC-20 and C64. On all later machines, this vector is documented as “EXMON” and allows hooking the machine code monitor’s command entry. The vector was presumably meant for the monitor from the beginning, but this feature was cut from these two machines.

The KERNAL documentation warns against changing these vectors by hand. Instead, the VECTOR call allows the application to copy the complete set of KERNAL vectors ($0314-$0333) from and to private memory. The RESTOR command sets the default values.

• $FF8D: VECTOR – read/write KERNAL vectors
• $FF8A: RESTOR – set KERNAL vectors to defaults
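A sketch of what this indirection buys (Python; the vector name follows IBSOUT above, the hook itself is invented): the $FFD2 entry does not print directly but jumps through the RAM vector, so replacing that one pointer intercepts all character output, and saving and restoring the vector set models what VECTOR and RESTOR do.

```python
output = []

def rom_bsout(c):                  # the ROM's real output routine
    output.append(c)

vectors = {"IBSOUT": rom_bsout}    # RAM vector table entry ($0326)

def bsout(c):
    # The $FFD2 jump table entry dispatches through the RAM vector,
    # which is what makes hooking possible.
    vectors["IBSOUT"](c)

saved = dict(vectors)              # VECTOR: copy the vector set aside

def shouting_bsout(c):             # user hook: uppercase all output
    rom_bsout(c.upper())

vectors["IBSOUT"] = shouting_bsout
bsout("a")
bsout("b")

vectors.update(saved)              # put the saved vectors back (RESTOR-like)
bsout("c")

assert output == ["A", "B", "c"]
```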

• $FF9F: SCNKEY – keyboard driver

CBM-II (1982)

The CBM-II series of computers was meant as a successor of the PET 4000/8000 series. The KERNAL’s architecture was based on the VIC-20. The vector table in RAM is compatible except for ILOAD, ISAVE and USRCMD (which is now used), whose order was changed:

• $032E: ILOAD – indirect entry to LOAD ($FFD5)
• $0330: ISAVE – indirect entry to SAVE ($FFD8)
• $0332: USRCMD – machine code monitor command input

There are two new keyboard-related vectors:

• $0334: ESCVEC – ESC key vector
• $0336: CTLVEC – CONTROL key vector (unused)

And all IEEE-488 KERNAL calls except ACPTR can be hooked:

• $0346: ITALK – indirect entry to TALK ($FFB4)
• $0344: ILISTN – indirect entry to LISTEN ($FFB1)
• $0342: IUNLSN – indirect entry to UNLSN ($FFAE)
• $0340: IUNTLK – indirect entry to UNTLK ($FFAB)
• $033E: ICIOUT – indirect entry to CIOUT ($FFA8)
• $033C: IACPTR – indirect entry to ACPTR ($FFA5)
• $033A: ITKSA – indirect entry to TKSA ($FF96)
• $0338: ISECND – indirect entry to SECOND ($FF93)

For no apparent reason, the VECTOR and RESTOR calls have moved to different addresses:

• $FF84: VECTOR – read/write KERNAL vectors
• $FF87: RESTOR – set KERNAL vectors to defaults

And there are several new calls. All machines since the VIC-20 have a way to hand control to ROM cartridges instead of BASIC on system startup. At this point, no system initialization whatsoever has been done by the KERNAL, so the application or game on the cartridge can start up as quickly as possible. Applications that want to be forward-compatible can call into the following new KERNAL calls to initialize different parts of the system:

• $FF7B: IOINIT – initialize I/O and enable timer IRQ
• $FF7E: CINT – initialize text screen

The LKUPLA and LKUPSA calls are used by BASIC to find unused logical and secondary addresses for channel I/O, so its built-in disk commands can open channels even if the user has currently open channels – logical addresses have to be unique on the computer side, and secondary addresses have to be unique on the disk drive side.

• $FF8D: LKUPLA – search tables for given LA
• $FF8A: LKUPSA – search tables for given SA

The CBM-II KERNAL also added 6 generally useful calls:

• $FF6C: TXJMP – jump across banks
• $FF6F: VRESET – power-on/off vector reset
• $FF72: IPCGO – loop for other processor
• $FF75: FUNKEY – list/program function key
• $FF78: IPRQST – send IPC request
• $FF81: ALOCAT – allocate memory from MEMTOP down

C64 (1982)

Both the KERNAL and the BASIC ROM of the C64 are derived from the VIC-20, so both the KERNAL calls and the vectors are fully compatible with it, but some extensions from the CBM-II carried over: The IOINIT and CINT calls to initialize I/O and the text screen exist, but at different addresses, and a new RAMTAS call has been added, which is also useful for startup from a ROM cartridge.

• $FF87: RAMTAS – test and initialize RAM
• $FF84: IOINIT – initialize I/O and enable timer IRQ
• $FF81: CINT – initialize text screen

The other CBM-II additions are missing, since they are not needed, e.g. because BASIC doesn’t have the V4 disk commands (LKUPLA, LKUPSA) and because there is only one RAM bank (TXJMP, ALOCAT).

Plus/4 (264 Series, 1985)

The next Commodore 8 bit computers in historical order are the 264 series: the C16, the C116 and the Plus/4, which share the same general architecture, BASIC and KERNAL. They are neither meant as successors of the C64 nor of the CBM-II series – they are more like spiritual successors of the VIC-20. Nevertheless, the KERNAL jump table and vectors are based on the C64. Since the 264 machines don’t have an NMI, the NMI vector is missing, and the remaining vectors have been moved in memory. This makes most of the vector table incompatible with its predecessors:

• $0314: CINV – IRQ handler
• $0316: CBINV – BRK handler
• (NMI removed)
• $0318: IOPEN
• $031A: ICLOSE
• $031C: ICHKIN
• $031E: ICKOUT
• $0320: ICLRCH
• $0322: IBASIN
• $0324: IBSOUT
• $0326: ISTOP
• $0328: IGETIN
• $032A: ICLALL
• $032C: USRCMD
• $032E: ILOAD
• $0330: ISAVE

The Plus/4 is the first machine from the home computer series to include the machine code monitor, so the USRCMD vector is now used for command input in the monitor.

And there is one new vector, ITIME, which is called once every frame during the vertical blank.

• $0312: ITIME – vertical blank IRQ

The Plus/4 supports all C64 KERNAL calls, plus some additions. The RESET call has been added to the very end of the table:

• $FFF6: RESET – restart machine

There are nine more undocumented entries, which are located at lower addresses so that there is an (unused) gap between them and the remaining calls. Since the area $FD00 to $FF3F is occupied by the I/O area, these vectors are split between the areas just below and just above it. These two sets are known as the “banking routine table” and the “unofficial jump table”.

• $FCF1: CARTRIDGE_IRQ
• $FCF4: PHOENIX
• $FCF7: LONG_FETCH
• $FCFA: LONG_JUMP
• $FCFD: LONG_IRQ
• $FF49: DEFKEY – program function key
• $FF4C: PRINT – print string
• $FF4F: PRIMM – print string following the caller’s code
• $FF52: MONITOR – enter machine code monitor

The DEFKEY call has the same functionality as the FUNKEY ($FF75) call of the CBM-II series, but the two take different arguments.

C128 (1985)

The Commodore 128 is the successor of the C64. Next to a 100% compatible C64 mode that used the original ROMs, it has a native C128 mode, which is based on the C64 (not the CBM-II or the 264), so all KERNAL vectors and calls are compatible with the C64, but there are additions.

The KERNAL vectors are the same as on the C64, but again, the USRCMD vector (at the VIC-20/C64 location of $032E) is used for command input in the machine code monitor. There are additional vectors starting at $0334 for hooking editor logic as well as pointers to keyboard decode tables, but these are not part of the KERNAL vectors, since the VECTOR and RESTOR calls don’t include them.

The set of KERNAL calls has been extended by 19 entries. The LKUPLA and LKUPSA calls from the CBM-II exist (because BASIC has disk commands), but they are at different locations:

• $FF59: LKUPLA
• $FF5C: LKUPSA

There are also several calls known from the Plus/4, but at different addresses:

• $FF65: PFKEY – program a function key
• $FF7D: PRIMM – print string following the caller’s code
• $FF56: PHOENIX – init function cartridges

And there are another 14 completely new ones:

• $FF47: SPIN_SPOUT – setup fast serial ports for I/O
• $FF4A: CLOSE_ALL – close all files on a device
• $FF4D: C64MODE – reconfigure system as a C64
• $FF50: DMA_CALL – send command to DMA device
• $FF53: BOOT_CALL – boot load program from disk
• $FF5F: SWAPPER – switch between 40 and 80 columns
• $FF62: DLCHR – init 80-col character RAM
• $FF68: SETBNK – set bank for I/O operations
• $FF6B: GETCFG – lookup MMU data for given bank
• $FF6E: JSRFAR – gosub in another bank
• $FF71: JMPFAR – goto another bank
• $FF74: INDFET – LDA (fetvec),Y from any bank
• $FF77: INDSTA – STA (stavec),Y to any bank
• $FF7A: INDCMP – CMP (cmpvec),Y to any bank

Interestingly, the C128 Programmer’s Reference Guide states that all calls since the C64 “are specifically for the C128 and as such should not be considered as permanent additions to the standard jump table.”

C65 (1991)

The C65 (also known as the C64DX) was a planned successor of the C64 line of computers. Several hundred prerelease devices were built, but it was never released as a product. Like the C128, it has a C64 mode, but it is not backwards-compatible with the C128. Nevertheless, the KERNAL of the native C65 mode is based on the C128 KERNAL. Like on the CBM-II, but at different addresses, all IEEE-488/IEC functions can be hooked with these 8 new vectors:

• $0335: ITALK – indirect entry to TALK ($FFB4)
• $0338: ILISTEN – indirect entry to LISTEN ($FFB1)
• $033B: ITALKSA – indirect entry to TKSA ($FF96)
• $033E: ISECND – indirect entry to SECOND ($FF93)
• $0341: IACPTR – indirect entry to ACPTR ($FFA5)
• $0344: ICIOUT – indirect entry to CIOUT ($FFA8)
• $0347: IUNTLK – indirect entry to UNTLK ($FFAB)
• $034A: IUNLSN – indirect entry to UNLSN ($FFAE)

The C128 additions to the jump table are basically supported, but three calls have been removed and one has been added. The removed ones are DMA_CALL (REU support), DLCHR (VDC support) and GETCFG (MMU support). All three are C128-specific and would make no sense on the C65. The one addition is:

• $FF56: MONITOR_CALL – enter machine code monitor

The removals and the addition cause the addresses of the following calls to change:

• $FF4D: SPIN_SPOUT
• $FF50: CLOSE_ALL
• $FF53: C64MODE
• $FF59: BOOT_CALL
• $FF5C: PHOENIX
• $FF5F: LKUPLA
• $FF62: LKUPSA
• $FF65: SWAPPER
• $FF68: PFKEY
• $FF6B: SETBNK

The C128-added KERNAL calls on the C65 can in no way be called compatible with the C128, since several of the calls take different arguments; e.g. the INDFET, INDSTA and INDCMP calls take the bank number in the 65CE02’s Z register. This shows again that the C65 is not a successor of the C128, but another successor of the C64.

Relationship Graph

The successorship of the Commodore 8 bit computers is messy. Most were merely spiritual successors and rarely truly compatible. The KERNAL source code and the features of the jump table mostly follow the successorship path, but some KERNAL features and jump table calls carried over between branches.

Which entries are safe?

If you want to write code that works on multiple Commodore 8 bit machines, this table will help:

(Table: availability of the KERNAL jump table entries on the PET, VIC-20, C64, C128, C65, Plus/4 and CBM-II, covering $FF80 – KERNAL Version, $FF81 – CINT, $FF84 – IOINIT, $FF87 – RAMTAS, $FF8A – RESTOR, $FF8D – VECTOR, $FF90 – SETMSG, $FF93 – SECOND, $FF96 – TKSA, $FF99 – MEMTOP, $FF9C – MEMBOT, $FF9F – SCNKEY, $FFA2 – SETTMO, $FFA5 – IECIN, $FFA8 – IECOUT, $FFAB – UNTLK, $FFAE – UNLSN, $FFB1 – LISTEN, $FFB4 – TALK, $FFB7 – READST, $FFBA – SETLFS, $FFBD – SETNAM, $FFC0 – OPEN, $FFC3 – CLOSE, $FFC6 – CHKIN, $FFC9 – CHKOUT, $FFCC – CLRCHN, $FFCF – BASIN, $FFD2 – BSOUT, $FFD5 – LOAD, $FFD8 – SAVE, $FFDB – SETTIM, $FFDE – RDTIM, $FFE1 – STOP, $FFE4 – GETIN, $FFE7 – CLALL, $FFEA – UDTIM, $FFED – SCREEN, $FFF0 – PLOT and $FFF3 – IOBASE.)

Code that must work on all Commodore 8 bit computers (without detecting the specific machine) is limited to the following KERNAL calls that are supported from the first PET up to the C65:

• $FFCF: BASIN – get character
• $FFD2: BSOUT – write character
• $FFE1: STOP – test for STOP key
• $FFE4: GETIN – get character from keyboard

The CHKIN, CHKOUT, CLRCHN and CLALL calls would be available, but they are not useful, since their counterpart (opening a file) is missing on the PET. The UDTIM call would be available too, but there is no standard way to hook the timer interrupt if you include the PET.

Nevertheless, the four basic calls are enough for any text mode application that doesn’t care where the line breaks are. Note that the PETSCII graphical character set and the basic PETSCII command codes e.g. for moving the cursor are supported across the whole family.
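To make this concrete, here is a minimal sketch of such a text-mode loop that should run on anything from the PET to the C65, using only the four long-lived entries above. The label names, the zero-terminated string convention, and the ca65-style syntax are my own choices; the string bytes are assumed to be valid PETSCII.

```asm
; Portable text I/O across the Commodore 8-bit family, using only
; BSOUT ($FFD2), STOP ($FFE1) and GETIN ($FFE4).
BSOUT   = $FFD2         ; write the character in A
STOP    = $FFE1         ; Z flag set if the STOP key is pressed
GETIN   = $FFE4         ; get character from keyboard, 0 = none

start:  ldx #0
print:  lda msg,x       ; print a zero-terminated string (< 256 bytes)
        beq wait
        jsr BSOUT
        inx
        bne print
wait:   jsr STOP        ; bail out if STOP is pressed
        beq done
        jsr GETIN       ; otherwise poll the keyboard
        beq wait        ; 0 means no key yet
done:   rts

msg:    .byte "HELLO", $0D, 0
```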

If you are limiting yourself to the VIC-20 and above (i.e. excluding the PET but including the CBM-II), you can use the basic set of 34 calls starting at $FF90. You can only use these two vectors though – and only if you’re okay with changing them manually, without going through the VECTOR call, in order to support the CBM-II:

• $0314: CINV – IRQ handler
• $0316: CBINV – BRK handler

VECTOR and RESTOR are supported on the complete home computer series (i.e. if you exclude the PET and the CBM-II), and the complete set of 16 vectors can be used on all home computers except the Plus/4. The initialization calls (CINT, IOINIT, RAMTAS) exist on all home computers since the C64. In addition, all these machines contain the version of the KERNAL at $FF80.
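Changing the vectors manually is the usual pattern of writing the handler address into $0314/$0315 with interrupts disabled and chaining to the previous handler. A sketch (label names and ca65-style syntax are mine):

```asm
; Hook the IRQ vector at CINV ($0314/$0315) directly, without the
; VECTOR call (which the CBM-II lacks).
CINV    = $0314

install:
        sei                 ; no IRQs while the vector is half-written
        lda CINV            ; remember the previous handler
        sta oldirq
        lda CINV+1
        sta oldirq+1
        lda #<handler
        sta CINV
        lda #>handler
        sta CINV+1
        cli
        rts

handler:
        ; ... per-interrupt work goes here ...
        jmp (oldirq)        ; chain to the original handler

oldirq: .word 0
```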

February 12, 2018

Colin Percival

FreeBSD/EC2 history

A couple of years ago Jeff Barr published a blog post with a timeline of EC2 instances. I thought at the time that I should write up a timeline of the FreeBSD/EC2 platform, but I didn't get around to it; last week, as I prepared to ask for sponsorship for my work, I decided that it was time to sit down and collect the long history of how the platform has evolved and improved over the years.

HolisticInfoSec.org

toolsmith #131 - The HELK vs APTSimulator - Part 1

Ladies and gentlemen, for our main attraction, I give you...The HELK vs APTSimulator, in a Death Battle! The late, great Randy "Macho Man" Savage said many things in his day, in his own special way, but "Expect the unexpected in the kingdom of madness!" could be our toolsmith theme this month and next. Man, am I having a flashback to my college days, many moons ago. :-) The HELK just brought it on. Yes, I know, HELK is the Hunting ELK stack, got it, but it reminded me of the Hulk, and then, I thought of a Hulkamania showdown with APTSimulator, and Randy Savage's classic, raspy voice popped in my head with "Hulkamania is like a single grain of sand in the Sahara desert that is Macho Madness." And that, dear reader, is a glimpse into exactly three seconds or less in the mind of your scribe, a strange place to be certain. But alas, that's how we came up with this fabulous showcase.
In this corner, from Roberto Rodriguez, @Cyb3rWard0g, the specter in SpecterOps, it's...The...HELK! This, my friends, is the s**t, worth every ounce of hype we can muster.
And in the other corner, from Florian Roth, @cyb3rops, the Fracas of Frankfurt, we have APTSimulator. All your worst adversary apparitions in one APT mic drop. This...is...Death Battle!

Now with that out of our system, let's begin. There's a lot of goodness here, so I'm definitely going to do this in two parts so as not to undervalue these two offerings.
HELK is incredibly easy to install. It's also well documented, with lots of related reading material; let me propose that you take the time to review it all. Pay particular attention to the wiki, gain comfort with the architecture, then review the installation steps.
On an Ubuntu 16.04 LTS system I ran:
• git clone https://github.com/Cyb3rWard0g/HELK.git
• cd HELK/
• sudo ./helk_install.sh
Of the three installation options presented (pulling the latest HELK Docker image from cyb3rward0g's Docker Hub, building the HELK image from a local Dockerfile, or installing HELK from a local bash script), I chose the first and went with the latest Docker image. The installation script does a fantastic job of fulfilling dependencies for you; if you haven't installed Docker, the HELK install script does it for you. You can observe the entire install process in Figure 1.
 Figure 1: HELK Installation
You can immediately confirm your clean installation by navigating to your HELK KIBANA URL, in my case http://192.168.248.29.
For my test Windows system I created a Windows 7 x86 virtual machine with VirtualBox. The key to success here is ensuring that you install Winlogbeat on the Windows systems from which you'd like to ship logs to HELK. More important is ensuring that you run Winlogbeat with the right winlogbeat.yml file. You'll want to modify this and copy it to your target systems. The critical modification is line 123, under the Kafka output, where you need to add the IP address of your HELK server in three spots. My modification appeared as hosts: ["192.168.248.29:9092","192.168.248.29:9093","192.168.248.29:9094"]. As noted in the HELK architecture diagram, HELK consumes Winlogbeat event logs via Kafka.
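For reference, the relevant fragment of a winlogbeat.yml wired up this way looks roughly like the following. Only the hosts line is taken from my setup above; the event-log list and topic name are typical values and may differ from the file HELK provides.

```yaml
winlogbeat.event_logs:
  - name: Application
  - name: Security
  - name: System
  - name: Microsoft-Windows-Sysmon/Operational

output.kafka:
  # all three broker entries point at the HELK server's IP
  hosts: ["192.168.248.29:9092","192.168.248.29:9093","192.168.248.29:9094"]
  topic: "winlogbeat"
```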
On your Windows systems, with a properly modified winlogbeat.yml, you'll run:
• ./winlogbeat -c winlogbeat.yml -e
• ./winlogbeat setup -e
You'll definitely want to set up Sysmon on your target hosts as well. I prefer to do so with the @SwiftOnSecurity configuration file. If you're doing so with your initial setup, use sysmon.exe -accepteula -i sysmonconfig-export.xml. If you're modifying an existing configuration, use sysmon.exe -c sysmonconfig-export.xml.  This will ensure rich data returns from Sysmon, when using adversary emulation services from APTsimulator, as we will, or experiencing the real deal.
With everything set up and working, you should see results in your Kibana dashboard, as seen in Figure 2.

 Figure 2: Initial HELK Kibana Sysmon dashboard.
Now for the showdown. :-) Florian's APTSimulator does some comprehensive emulation to make your systems appear compromised under the following scenarios:
• POCs: Endpoint detection agents / compromise assessment tools
• Test your security monitoring's detection capabilities
• Test your SOCs response on a threat that isn't EICAR or a port scan
• Prepare an environment for digital forensics classes
This is a truly admirable effort, one I advocate for most heartily as a blue team leader. With particular attention to testing your security monitoring's detection capabilities, if you don't do so regularly and comprehensively, you are, quite simply, incomplete in your practice. If you haven't tested and validated, don't consider it detection, it's just a rule with a prayer. APTSimulator can be observed conducting the likes of:
1. Creating typical attacker working directory C:\TMP...
2. Activating guest user account
3. Placing a svchost.exe (which is actually srvany.exe) into C:\Users\Public
4. Modifying the hosts file
5. Using curl to access well-known C2 addresses
1. C2: msupdater.com
6. Dropping a Powershell netcat alternative into the APT dir
7. Executing nbtscan on the local network
8. Dropping a modified PsExec into the APT dir
9. Registering mimikatz in At job
10. Registering a malicious RUN key
11. Registering mimikatz in scheduled task
12. Registering cmd.exe as debugger for sethc.exe
13. Dropping web shell in new WWW directory
A couple of notes here.
Download and install APTSimulator from the Releases section of its GitHub pages.
APTSimulator includes curl.exe, 7z.exe, and 7z.dll in its helpers directory. Be sure that you drop in the correct version of 7-Zip for your system architecture; the default bits are 64-bit, and I was testing on a 32-bit VM.

Let's do a fast run-through with HELK's Kibana Discover option, looking for the above-mentioned APTSimulator activities. Starting with a search for TMP in the sysmon-* index yields immediate results and strikes #1, #6, #7, and #8 from our APTSimulator list above; see for yourself in Figure 3.

 Figure 3: TMP, PS nc, nbtscan, and PsExec in one shot
Created TMP, dropped a PowerShell netcat, nbtscanned the local network, and dropped a modified PsExec, check, check, check, and check.
How about enabling the guest user account and adding it to the local administrator's group? Figure 4 confirms.

 Figure 4: Guest enabled and escalated
Strike #2 from the list. Something tells me we'll immediately find svchost.exe in C:\Users\Public. Aye, Figure 5 makes it so.

 Figure 5: I've got your svchost right here
Knock #3 off the to-do list, including the process.commandline, process.name, and file.creationtime references. Up next, the At job and scheduled task creation. Indeed, see Figure 6.

I think you get the point, there weren't any misses here. There are, of course, visualization options. Don't forget about Kibana's Timelion feature. Forensicators and incident responders live and die by timelines, use it to your advantage (Figure 7).

 Figure 7: Timelion
Finally, for this month, under HELK's Kibana Visualize menu, you'll note 34 visualizations. By default, these are pretty basic, but you can quickly add value with sub-buckets. As an example, I selected the Sysmon_UserName visualization. Initially, it yielded a donut graph inclusive of malman (my pwned user), SYSTEM and LOCAL SERVICE. That wasn't detailed enough to be particularly useful, so I added a sub-bucket to include process names associated with each user. The resulting graph is more detailed and tells us that of the 242 events in the last four hours associated with the malman user, 32 of those were specific to cmd.exe processes, or 13.2% (Figure 8).

 Figure 8: Powerful visualization capabilities
This has been such a pleasure this month; I am thrilled with both HELK and APTSimulator. The true principles of blue team and detection quality are innate in these projects. The fact that Roberto considers HELK to still be in an alpha state leads me to believe there is so much more to come. Be sure to dig deeply into APTSimulator's Advanced Solutions as well; there's more than one way to emulate an adversary.
Next month, Part 2 will explore the network side of the equation via the Network Dashboard and related visualizations, as well as HELK's integration with Spark, GraphFrames & Jupyter notebooks.
Aw snap, more goodness to come, I can't wait.
Cheers...until next time.

February 11, 2018

syslog.me

The future of configuration management (again), and a suggestion

I have attended the Config Management Camp in Gent this year, where I also presented the talk “Promise theory: from configuration management to team leadership“. A thrilling experience, considering that I was talking about promise theory at the same conference and in the same track where Mark Burgess, the inventor of promise theory, was holding one of the keynotes!

The quality of the conference was as good as always, but my experience at the conference was completely different from the past. Last time I attended, in 2016, I was actively using CFEngine, and that shaped both the talks I attended and the people I hung out with the most. This year I was coming from a different work environment and a different job: I jumped a lot between the different tracks and devrooms, and talked with many people whose experience was very different from mine. And that was truly enriching. I’ll focus on one experience in particular, which led me to see what the future of configuration management could be.

I attended all the keynotes. Mark Burgess’ was, as always, rich in content and a bit hard to process; lots of food for thought, but I couldn’t let it percolate in my brain until someone made it click several hours later. More on that in a minute.

Then there was Luke Kanies’ keynote, explaining where configuration management and we, CM practitioners, won the battle; and also where we lost the battle and where we are irrelevant. Again, more stuff accumulated, waiting for something to trigger the mental process to consume the information. There was also the keynote by Adam Jacob about the future of configuration management, great and fun as always but not part of this movie; I recommend that you enjoy it on YouTube.

Later, at the social event, I had the pleasure to have a conversation with Stein Inge Morisbak, whom I knew from before as we met in Oslo several times. With his experience working on public cloud infrastructures like AWS and Google Cloud Platform, Stein Inge was one of the people who attended the conference with a sceptical eye about configuration management and, at the same time, with the open mind that you would expect from the great guy he is. In a sincere effort to understand, he couldn’t really see how CM, “a sinking ship”, could possibly be relevant in an era where public cloud, immutable infrastructure and all the tooling around are the modern technology of today.

While we were talking, another great guy chimed in, namely Ivan Rossi. If you look at Ivan’s LinkedIn page you’ll see that he’s been working in technology for a good while and has seen things from many different angles. Ivan made a few practical examples where CM is the only tooling that you can use because the cloud simply isn’t there and the tooling that you use in immutable infrastructure doesn’t work: think of networks of devices sitting in the middle of nowhere. In situations like those, with limited hardware resources and/or shitty wireless links like 2G networks, you need something that is lightweight, resilient and fault tolerant, and that can maintain the configuration, because in no way are you going around every other day to replace the devices with new ones with updated configurations and software.

And there, Stein Inge was the first one to make the link with Mark Burgess’ keynote and to make me part of his revelation (or his “pilgrim’s experience”, as he calls it). Mark talked about a new sprawl of hardware devices: they are all around us, in phones and tablets, and more and more in our domestic appliances, in smart cars, in all the “smart” devices that people are buying every day. A heap of devices that is poorly managed as of today, if at all, and where CM definitely has a place. Stein Inge talked about this experience in his blog; his post is in Norwegian, so you must either know the language or ask some translation software for help. I promise it’s worth the read.

What’s the future then?

So, what’s the future of configuration management, based on Mark Burgess’ vision and these observations? A few ideas:

• on the server side, it will be less and less relevant to the everyday user as more people shift to private and public clouds. It will still be relevant for those who maintain hardware infrastructures; the big players will maybe decide to bake their own tools to better suit their hardware and workflows — they have the workforce and the skills in house, so why not? The smaller players will keep using “off-the-shelf” tools along the same lines as those we have today for provisioning hardware and keeping their configurations in shape;
• configuration management will become more relevant as a tool to manage fleets of hardware like company workstations and laptops, for example, to enforce policies and ensure that security measures are in place at all times; that will eventually include company-owned phones;
• configuration management will be more and more relevant in IoT and “smart” devices in general; for those, a new generation of tools may be needed that can run on limited hardware and unreliable networks; agent-based tools will probably have the upper hand here;
• we’ll have less and less config management on virtual machines (and possibly less and less virtual machines and more and more containers); CM on virtual machines will remain only in special cases, e.g. where you need to run a software that doesn’t lend itself to automatic installation and configuration (Atlassian, I am looking at you).

As always with future forecasts, time will tell.

One word about Configuration Management Camp

I have been a fan of Config Management Camp since I attended (and presented at) the first edition. I am glad to see that the scope of the conference is widening to include containers and immutable infrastructure. However, as Stein Inge says in his blog post (the translation is mine, as are all mistakes thereof):

Most of the talks revolved around configuration management of servers, which is of little importance in a world where we use services on public cloud platforms at a much higher abstraction level.

Maybe, and I stress maybe, an effort should be made to reduce the focus on configuration management a bit in favour of the “rival” technologies of today; not to the point that CM disappears, because, as I just said, CM will still play an important part, and CfgMgmtCamp is not DevOpsDays anyway. Possibly a different name that underlines Infrastructure as Code as the real topic could help with this rebalancing?

February 08, 2018

Sean's IT Blog

Moving to the Cloud? Don’t Forget End-User Experience

The cloud has a lot to offer IT departments.  It provides the benefits of virtualization in a consumption-based model, and it allows new applications to quickly be deployed while waiting for, or even completely forgoing, on-premises infrastructure.  This can provide a better time-to-value and greater flexibility for the business.  It can help organizations reduce, or eliminate, their on-premises data center footprint.

But while the cloud has a lot of potential to disrupt how IT manages applications in the data center, it also has the potential to disrupt how IT delivers services to end users.

In order to understand how cloud will disrupt end-user computing, we first need to look at how organizations are adopting the cloud.  We also need to look at how the cloud can change application development patterns, and how that will change how IT delivers services to end users.

The Current State of Cloud

When people talk about cloud, they’re usually talking about three different types of services.  These services, and their definitions, are:

• Infrastructure-as-a-Service: Running virtual machines in a hosted, multi-tenant virtual data center.
• Platform-as-a-Service: Allows developers to build applications without having to build the supporting infrastructure.  The platform can include some combination of web services, application runtime services (like .NET or Java), databases, message bus services, and other managed components.
• Software-as-a-Service: Subscription to a vendor hosted and managed application.

The best analogy is to compare the different cloud offerings to different types of pizza restaurants, as in the graphic below from episerver.com:

So what does this have to do with End-User Computing?

Today, it seems like enterprises that are adopting cloud are going in one of two directions.  The first is migrating their data centers into infrastructure-as-a-service offerings with some platform-as-a-service mixed in.  The other direction is replacing applications with software-as-a-service options.  The former is migrating your applications to Azure or AWS EC2, the latter is replacing on-premises services with options like ServiceNow or Microsoft Office 365.

Both options can present challenges to how enterprises deliver applications to end-users.  And the choices made when migrating on-premises applications to the cloud can greatly impact end-user experience.

The challenges around software-as-a-service deal more with identity management, so this post will focus on migrating on-premises applications to the cloud.

Know Thy Applications – Infrastructure-As-A-Service and EUC Challenges

Infrastructure-as-a-Service offerings provide IT organizations with virtual machines running in a cloud service.  These offerings provide different virtual machines optimized for different tasks, and they provide the flexibility to meet the various needs of an enterprise IT organization.  They allow IT organizations to bring their on-premises business applications into the cloud.

The lifeblood of many businesses is Win32 applications.  Whether they are commercial or developed in house, these applications are often critical to some portion of a business process.  Many of these applications were never designed with high availability or the cloud in mind, and the developer and/or the source code may be long gone.  Or they might not be easily replaced because they are deeply integrated into critical processes or other enterprise systems.

Many Win32 applications have clients that expect to connect to local servers.  But when you move those servers to a remote datacenter, including the cloud, it can introduce problems that make the application nearly unusable.  Common problems that users encounter are longer application load times, increased transaction times, and reports taking longer to preview and/or print.
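The underlying issue is usually latency amplification rather than bandwidth: a chatty client pays one network round trip per query, so total time scales with round trips times RTT. A back-of-the-envelope sketch, where the function and all numbers are illustrative assumptions rather than measurements:

```python
def screen_load_time(round_trips, rtt_ms, server_ms=0.0):
    """Rough model: a chatty client pays one network round trip per query."""
    return round_trips * (rtt_ms + server_ms)

# Hypothetical screen that issues 200 sequential queries:
lan = screen_load_time(200, 0.5)    # same-building LAN, ~0.5 ms RTT
wan = screen_load_time(200, 30.0)   # remote datacenter, ~30 ms RTT
print(lan, wan)  # 100.0 ms vs 6000.0 ms
```

Note that doubling the bandwidth changes neither number, which is why procuring more bandwidth rarely fixes this class of problem.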

These problems make employees less productive, and it has an impact on the efficiency and profitability of the business.

A few jobs ago, I was working for a company that had its headquarters, local office, and data center co-located in the same building.  They also had a number of other regional offices scattered across our state and the country.  The company had grown to the point where they were running out of space, and they decided to split the corporate and local offices.  The corporate team moved to a new building a few miles away, but the data center remained in the building.

Many of the corporate employees were users of a two-tier business application, and the application client connected directly to the database server.  Moving users of a fat client application a few miles down the road from the database server had a significant impact on application performance and user experience.  Application response suffered, and user complaints rose.  Critical business processes took longer, and productivity suffered as a result.

More bandwidth was procured. That didn't solve the issue, and IT was sent scrambling for a new solution.  Eventually, these issues were addressed with a solution that was already in use for other areas of the business: placing the core applications into Windows Terminal Services and providing users at the corporate office with a published desktop containing their required applications.

This solution solved their user experience and application performance problems.  But it required other adjustments to the server environment, business process workflows, and how users interact with the technology that enables them to work.  It took time for users to adjust to the changes.  Many of the issues were addressed when the business moved everything to a colocation facility a hundred miles away a few months later.

Ensuring Success When Migrating Applications to the Cloud

The business has said it's time to move some applications to the cloud.  How do you ensure the move is a success and meets the business and technical requirements of each application, while making sure an angry mob of users doesn't show up at your office with torches and pitchforks?

The first thing is to understand your application portfolio.  That understanding goes beyond having visibility into what applications you have in your environment and how those applications work from a technical perspective.  You need a holistic view of your applications; keep the following questions in mind:

• Who uses the application?
• What do the users do in the application?
• How do the users access the application?
• Where does it fit into business processes and workflows?
• What other business systems does the application integrate with?
• How is that integration handled?

Applications rarely exist in a vacuum, and making changes to one not only impacts the users, but it can impact other applications and business processes as well.

By understanding your applications, you will be able to build a roadmap of when applications should migrate to the cloud and effectively mitigate any impacts to both user experience and enterprise integrations.

The second thing is to test extensively.  The testing needs to go beyond functional testing to ensure that the application will run on the server images built by the cloud providers; it needs to include extensive user experience and user acceptance testing.  This may include spending time with users, measuring tasks with a stopwatch, to compare how long tasks take in cloud-hosted systems versus on-premises systems.

If application performance isn't up to user standards and has a significant impact on productivity, you may need to start investigating solutions for bringing users closer to the cloud-hosted applications.  This includes solutions like Citrix, VMware Horizon Cloud, or Amazon WorkSpaces and AppStream. These solutions bring users closer to the applications and can give users an on-premises experience in the cloud.

The third thing is to plan ahead.  Having a roadmap and knowing your application portfolio enables you to plan for when you need capacity or specific features to support users, and it can guide your architecture and product selection.  You don’t want to get three years into a five year migration and find out that the solution you selected doesn’t have the features you require for a use case or that the environment wasn’t architected to support the number of users.

When planning to migrate applications from your on-premises datacenters to an infrastructure-as-a-service offering, it's important to know your applications and take end-user experience into account.   It's important to test, and understand, how these applications perform when the application servers and databases are remote to the application client.  If you don't, you not only anger your users, you also make them less productive and the business less profitable overall.